Providing DR By Using An Azure Service Bus In Two Regions

This is a Disaster Recovery option I used on one of my projects. A client required a DR solution which guaranteed not to lose any messages during the DR failover process. In essence, no messages were allowed to be lost in mid flight whilst the DR process was taking place.

Solution

The approach I took was to use an Azure Service Bus  in a primary Azure region and another service bus in a secondary Azure  DR region. The publisher would then send the same message to both service bus endpoints. For my solution I used Azure APIM to post the same message to both primary and secondary regions, thereby simplifying the code for the publisher. 

image

The primary region would simply process the messages as per normal using a Logic App to poll for new messages on the service bus. However in the DR region,  the equivalent Logic App would be set in a disabled state. The messages in the secondary region would remain on the bus until the Time-To-Live (TTL) threshold is reached before either being moved onto the Deadletter queue, or dropped from the queue altogether by setting the EnableDeadLetteringOnMessageExpiration property to false. Using this technique provides automatic pruning of the older messages that would have definitely been processed by the primary region before failing over.

The value chosen for the TTL property is determined by how long it would take to failover to the DR region and the time to detect an issue being realised in the primary region.

Failing over would simply involve enable the Logic App the DR region and failing back would involve disabling this Logic App again. 

Pros

  • No messages are lost during the failover process.
  • Low monetary transaction cost by the DR environment as no Logic Apps are being triggered during the normal process flow through the primary region.
  • Simplistic DR design which involves just another another queue.
  • Simple failover process.

Cons

  • There is a delay before any new messages would be processed while  the older messages on the service bus are reprocessed first.
  • The backend system processing the messages must be idempotent, meaning the same message maybe processed by the backend system multiple times and produce the same outcome.
  • Requires the publisher to send the same message to 2 different endpoints. This may be mitigated by using APIM to manage sending the messages to both service bus endpoints.

Enjoy…