Duplicate Item Message Splitter

This pattern checks incoming messages that contain multiple items for duplicate records. When duplicates are detected, the original message is split into multiple messages, each containing only unique items. The original item sequence and message order are preserved throughout, so downstream processing sees the data in the order it was sent.

A common use case is importing data into the Dynamics 365 Finance & Operations Data Management Framework (DMF). If multiple rows contain the same value for a unique key, the import process will fail with an error similar to “The file contains duplicate rows and cannot insert data into staging due to a unique key violation.” By separating duplicate records into distinct messages, this pattern helps prevent import failures and ensures successful processing of valid records.

Another common use case is enforcing a maximum number of items per message when a downstream system cannot efficiently process large collections within a single message. In such scenarios, the message is split into smaller batches that comply with the target system’s processing limitations while preserving the original item order.
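As a rough illustration of this size-limit use case, the Python sketch below splits a collection into consecutive batches of at most max_items items while keeping the original order. The function name split_by_size and the max_items parameter are assumptions made for this example and are not taken from the original solution.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def split_by_size(items: Sequence[T], max_items: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most max_items items, in source order."""
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    # Step through the sequence in strides of max_items; the last batch
    # may be shorter than the others.
    for start in range(0, len(items), max_items):
        yield list(items[start:start + max_items])


# Hypothetical example: five items, at most two per message.
size_batches = list(split_by_size([1, 2, 3, 4, 5], max_items=2))
```

Each yielded batch would then be published as its own message, in the order produced.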

An example of this pattern is illustrated below. In this scenario, the Duplicate Splitter component detects two Item records with the same primary key value (ItemId = 1) within a single message. Upon detecting the duplicate, the component partitions the original message into two separate messages (represented by the two blue messages in the diagram), ensuring that both the original item sequence and overall message ordering are preserved. 

The resulting messages are published to a secondary Service Bus topic dedicated to split-message processing. The subscribing Function App can then process each message independently, relying on Azure Service Bus message completion and abandonment semantics for reliable processing.

This processing model provides the following benefits: 

  • Ensures that each published message contains only unique items. 
  • Preserves the original sequence of items within the source collection. 
  • Maintains message ordering throughout the publishing process. 
  • Prevents downstream consumers from receiving messages containing duplicate primary keys. 
  • Supports reliable message processing through independent handling of each generated message. 

Design Overview:

The following diagram illustrates the Azure components involved in the solution and the flow of messages between them.

  1. The publishing Function App sends messages to an Azure Service Bus topic.
  2. The Message Splitter Function App subscribes to this topic, validates the incoming messages for duplicate records, and forwards the processed messages to a downstream Service Bus topic.
  3. When duplicate records are detected, the Message Splitter Function App generates multiple messages containing only unique records and publishes them to the destination topic.
  4. The consumer Function App subscribes to the destination topic and processes the messages using its standard processing logic, without requiring any changes to accommodate duplicate handling.

Splitter Processing Logic: 

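The Splitter Function App receives a message containing a collection of items to be processed. As the collection is iterated, each item is added to an in-memory dictionary using the item’s primary key as the dictionary key, which makes it cheap to detect duplicate records within the current batch. Because the published message must keep the original item order, the dictionary implementation must preserve insertion order, or it must be paired with an ordered list of the items.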

When an item’s primary key already exists in the dictionary, the item is a duplicate. At that point, all items currently stored in the dictionary are published to Azure Service Bus as a single message, which therefore contains only unique items. The dictionary is then cleared, and the duplicate item becomes the first entry in the next batch.

This process continues until all items in the source collection have been evaluated. Once the end of the collection is reached, any remaining items held in the dictionary are published as the final message. 
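The logic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the original implementation: the function name split_on_duplicates, the key callback, and the sample item shape (ItemId, Name) are all hypothetical, and publishing to Service Bus is replaced by simply yielding each batch.

```python
from typing import Callable, Dict, Hashable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def split_on_duplicates(
    items: Iterable[T], key: Callable[[T], Hashable]
) -> Iterator[List[T]]:
    """Yield batches in which every primary key occurs at most once."""
    # Python dicts keep insertion order (3.7+), so the batch preserves
    # the original item sequence.
    batch: Dict[Hashable, T] = {}
    for item in items:
        k = key(item)
        if k in batch:
            # Duplicate key: flush the current batch as one message,
            # then start a new batch with the duplicate item.
            yield list(batch.values())
            batch = {}
        batch[k] = item
    if batch:
        # Flush whatever remains once the collection is exhausted.
        yield list(batch.values())


# Hypothetical sample message: ItemId 1 appears twice.
items = [
    {"ItemId": 1, "Name": "A"},
    {"ItemId": 2, "Name": "B"},
    {"ItemId": 1, "Name": "C"},
    {"ItemId": 3, "Name": "D"},
]
batches = list(split_on_duplicates(items, key=lambda i: i["ItemId"]))
```

With this sample input the items are split into two batches: the first holds the items named A and B, the second holds C and D. In a real Function App, each yielded batch would be serialized and published as a separate message, in the order produced.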

Caveat:

While the splitter logic could be implemented inside the consumer Function App itself, doing so risks function timeouts when the consuming system needs a significant amount of time per message, because a single invocation would then have to process every split batch.

Enjoy…
