Data Vault 2.0 is a concept for data warehousing, invented by Dan Linstedt. It brings many new features that help anyone concerned with Business Intelligence enter a new age of data warehousing. Data Vault 2.0 is a Big Data concept that integrates relational data warehousing with unstructured data warehousing in real time. It is an extensible data model to which new data sources are easy to add. When our founders wrote the book, they needed a visual way to model the Data Vault concepts in it. For this purpose, they developed the Visual Data Vault graphical modeling language, which focuses on the logical aspects of Data Vault. The Microsoft Visio stencils and a detailed white paper are available on www.visualdatavault.com as a free download.
SATELLITES IN VISUAL DATA VAULT
Satellites add descriptive data to hubs and links. Descriptive data is stored in attributes that are added to the satellite. The individual attributes are added to the satellite one at a time. A satellite can be attached to any hub or link, but each satellite has exactly one parent.
Figure 1: Satellite depends on hub
The connection between the satellite and the hub could be expressed by the statement “(satellite) Medicine Details depends on (hub) Medicine.”
It is also possible to add multiple satellites to one parent.
Figure 2: Multiple satellites (split by source system) depend on a hub
There is no limit to the number of satellites a hub or link can have. Figure 2 also demonstrates that satellites don’t have to show the associated attributes when presented in an overview diagram. The recommendation is to split the raw data first by source system and second by rate of change. Splitting by source system follows the data-driven approach, which eliminates re-engineering when new source systems are added and makes nearly 100% automation possible for the Raw Vault.
Once the data is split by source system, it is also best practice to further split the data by rate of change. Consider a satellite that holds information about a patient. A number of attributes change rarely or never, for example, the name or the blood group. Other attributes might change more often, for example, the total number of hospital visits.
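The split by rate of change can be sketched in a few lines of Python. This is only an illustration: the satellite names, the attribute names, and the helper function are hypothetical and are not part of Visual Data Vault or of any Data Vault tooling.

```python
# Hypothetical attribute groups for a patient; all names are illustrative.
SATELLITE_ATTRIBUTES = {
    "sat_patient_static": ["name", "blood_group"],      # change rarely or never
    "sat_patient_activity": ["total_hospital_visits"],  # changes often
}


def split_by_rate_of_change(record, groups=SATELLITE_ATTRIBUTES):
    """Return one payload per satellite, each holding only its own attributes."""
    return {
        satellite: {attr: record[attr] for attr in attrs if attr in record}
        for satellite, attrs in groups.items()
    }


patient = {"name": "Jane Doe", "blood_group": "A+", "total_hospital_visits": 7}
payloads = split_by_rate_of_change(patient)
```

With this grouping, a new hospital visit only affects the payload of the frequently changing satellite, so the rarely changing attributes are not stored again with every visit.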
Even when the satellites are arranged by the source system, a Record Source attribute is still required. The Record Source attribute can be used to identify the data source geographically or by the application. For example, the source might be an SAP source system that is distributed across more than one physical machine. Depending on the requirements of the data warehouse, we track the individual physical machine in the Record Source attribute.
In addition to the attributes that store descriptive data in a satellite, the following metadata is required:
- Load date
- Record source
- Parent hash key
- (Load end date)
The following attributes are optional to Data Vault satellites:
- Extract date
- Hash difference
The hash difference attribute is similar to the hash key in a Data Vault link. It is a hash value calculated over all the descriptive data of a satellite’s entry. Comparing this single value instead of every descriptive attribute lets you detect changes quickly and efficiently, so that a new satellite entry is added only when the descriptive data has actually changed. We covered hash keys (and hash diffs) in our newsletter in April this year.
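As a minimal sketch of this idea, the following Python code combines the metadata listed above with a hash difference and creates a new satellite entry only when the descriptive data has changed. The class and function names, the choice of SHA-256, the "|" delimiter, and the whitespace trimming are assumptions made for illustration; they are not prescribed by the text above.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class SatelliteRow:
    """One satellite entry: the metadata named in the lists above plus payload."""
    parent_hash_key: str
    load_date: datetime
    record_source: str
    hash_diff: str
    payload: dict
    load_end_date: Optional[datetime] = None


def hash_diff(payload: dict, attributes: list) -> str:
    """Hash all descriptive attributes in a fixed order (assumed conventions)."""
    parts = []
    for attr in attributes:
        value = payload.get(attr)
        parts.append("" if value is None else str(value).strip())
    # The delimiter keeps ("ab", "c") distinct from ("a", "bc").
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def new_row_if_changed(current, parent_hash_key, record_source,
                       payload, attributes, load_date):
    """Return a new SatelliteRow, or None when the hash difference is unchanged."""
    diff = hash_diff(payload, attributes)
    if current is not None and current.hash_diff == diff:
        return None  # descriptive data unchanged: no new satellite entry
    return SatelliteRow(parent_hash_key, load_date, record_source, diff,
                        dict(payload))
```

Only the two hash values are compared, so the cost of change detection does not grow with the number of descriptive attributes in the satellite.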
How to Get Updates and Support
Please send inquiries and feature requests to [email protected].
For Data Vault training and on-site training inquiries, please contact [email protected] or register at www.scalefree.com.
To support the creation of Visual Data Vault drawings in Microsoft Visio, a stencil is implemented that can be used to draw Data Vault models. The stencil is available at www.visualdatavault.com.