That said, an often underestimated point is the handling of relationships in Data Vault 2.0.
The following explains what to consider and how to deal with it.
There are different ways to validate relationships from source systems, depending on how the data is delivered (full extract or CDC) and how the source system delivers a delete (soft delete or hard delete).
First, let us explain the different kinds of deletes in source systems:
- Hard delete – A record is hard deleted in the source system and no longer appears in the system.
- Soft delete – The deleted record still exists in the source system's database and is flagged as deleted.
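To make the difference concrete, here is a minimal Python sketch. It assumes two snapshots of a source table keyed by business key and a hypothetical `is_deleted` flag column; neither the column name nor the function names come from any particular source system.

```python
from typing import Dict, Set

# A snapshot maps a business key to its row (illustrative structure only).
Snapshot = Dict[str, dict]


def soft_deleted_keys(current: Snapshot) -> Set[str]:
    """Keys still present in the source but flagged as deleted.

    The "is_deleted" column name is an assumption for this example.
    """
    return {key for key, row in current.items() if row.get("is_deleted", False)}


def hard_deleted_keys(previous: Snapshot, current: Snapshot) -> Set[str]:
    """Keys that existed before and no longer appear in the source at all."""
    return set(previous) - set(current)


# Made-up example data: C2 is soft deleted, C3 is hard deleted.
previous = {
    "C1": {"is_deleted": False},
    "C2": {"is_deleted": False},
    "C3": {"is_deleted": False},
}
current = {
    "C1": {"is_deleted": False},
    "C2": {"is_deleted": True},
}
```

A soft delete can be read directly from the record itself, while a hard delete only becomes visible by comparing against what was known before.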
Second, let us look at how the data arrives in the staging area:
- Full-extract – The extract can reflect the current state of the source system or be a delta/incremental extract.
- CDC (Change Data Capture) – Only new, updated, or deleted records are delivered, so the data is loaded in an incremental/delta way.
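The two delivery types lead to different ways of spotting a deleted relationship. The sketch below is an illustration under stated assumptions, not a reference implementation: a relationship is represented as a tuple of business keys, the staged full extract is assumed to contain the complete current state, and the CDC operation codes (`"I"`, `"U"`, `"D"`) are hypothetical.

```python
from typing import Iterable, Set, Tuple

# A relationship identified by its business keys, e.g. (customer_id, order_id).
LinkKey = Tuple[str, str]


def deleted_from_full_extract(active: Set[LinkKey], staged: Set[LinkKey]) -> Set[LinkKey]:
    """Relationships known as active in the vault but missing from the extract.

    Only valid if the staged extract holds the complete current state;
    in a delta extract, absence does not imply deletion.
    """
    return active - staged


def deleted_from_cdc(events: Iterable[Tuple[str, LinkKey]]) -> Set[LinkKey]:
    """Relationships explicitly delivered with a delete operation.

    The operation code "D" is an assumption for this example.
    """
    return {key for op, key in events if op == "D"}


# Made-up example data: the relationship (C1, O2) has been deleted.
active = {("C1", "O1"), ("C1", "O2"), ("C2", "O3")}
staged = {("C1", "O1"), ("C2", "O3")}
events = [("I", ("C3", "O4")), ("D", ("C1", "O2")), ("U", ("C2", "O3"))]
```

With a full extract the delete has to be derived by comparison, whereas CDC delivers the delete information directly.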
To keep the following explanation as simple as possible, our assumption is that we want to mark relationships as deleted as soon as we get the delete information, even if there is no audit trail from the source system (data aging is another topic).