Splitting a Satellite entity based on the source data

June 30, 2020 | Scalefree Newsletter

Satellite splitting criteria play a vital role in a satellite’s structure. It is not recommended to store all of the descriptive data related to a business key in a single satellite structure. Instead, raw data should be split by certain criteria.

In general, we have defined the following types of satellite splits:

  1. Splitting by source system
  2. Splitting by rate of change

Additionally, we have defined two more types of splits as mentioned below:

  1. Splitting by level of security and by the level of privacy
  2. Business-driven split

A satellite split by source system is strongly recommended to prevent two issues when loading data into the enterprise data warehouse. First, if two different source systems with different relational structures are loaded into the same satellite entity, a transformation of the structure might be required. However, structural transformation requires business logic sooner or later, and that logic should be deferred to the information delivery stage to support fully auditable environments as well as the application of multiple business perspectives.

The second issue is that loading two sources into the same satellite entity leads to the so-called “flip-flop effect”: if both systems store contradicting data (e.g. because they are out of sync) about the business key being described, the satellite absorbs a delta from each source on every load (two deltas per day with daily loads), capturing both descriptions in turn. This leads to high storage consumption and data inconsistencies. Splitting the satellite by source system therefore reduces storage consumption drastically.
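The flip-flop effect can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the in-memory lists standing in for satellite tables, the helper names `hashdiff` and `load_satellite`, the business key `C-1001`, and the two out-of-sync source records. It is not the loading logic of any particular Data Vault tool.

```python
import hashlib


def hashdiff(attributes: dict) -> str:
    """Hash the descriptive attributes so that changes can be detected cheaply."""
    payload = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def load_satellite(satellite: list, business_key: str, attributes: dict, load_date: str) -> bool:
    """Append a row only if the attributes differ from the latest row for the key."""
    new_hash = hashdiff(attributes)
    latest = next(
        (row for row in reversed(satellite) if row["business_key"] == business_key),
        None,
    )
    if latest is not None and latest["hashdiff"] == new_hash:
        return False  # no delta, nothing is stored
    satellite.append(
        {"business_key": business_key, "load_date": load_date, "hashdiff": new_hash, **attributes}
    )
    return True


# Hypothetical, out-of-sync descriptions of the same customer in two sources.
crm_record = {"city": "Hannover"}
erp_record = {"city": "Hanover"}

combined_sat = []          # one satellite fed by both sources
crm_sat, erp_sat = [], []  # one satellite per source system

for day in ("2020-06-01", "2020-06-02", "2020-06-03"):
    # Combined satellite: each load overwrites the "latest" view of the other source.
    load_satellite(combined_sat, "C-1001", crm_record, day)
    load_satellite(combined_sat, "C-1001", erp_record, day)
    # Split satellites: each source is only compared with its own history.
    load_satellite(crm_sat, "C-1001", crm_record, day)
    load_satellite(erp_sat, "C-1001", erp_record, day)

print(len(combined_sat))            # 6 rows: two deltas per day
print(len(crm_sat) + len(erp_sat))  # 2 rows: one initial row per source
```

Although neither source changes during the three days, the combined satellite keeps growing, while the split satellites store each description only once.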

Splitting satellites by source system also improves parallelism, because data from multiple source systems can be loaded in parallel. In addition, it allows real-time data to be integrated without having to combine it with raw data from a batch load.

In addition to the split by source system, storage consumption can be further reduced by splitting the satellite by rate of change.

Figure: Multiple satellites (split by source system) depending on a hub

To split a satellite by rate of change, determine how frequently each attribute changes and group the attributes into those that never change, those that sometimes change, and those that change very frequently. The split separates quickly changing attributes from slowly changing ones and thereby avoids unnecessary storage consumption: when a quickly changing attribute changes, the new row no longer has to repeat the unchanged values of the slowly changing attributes.
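The storage effect of this split can be estimated with a small Python sketch. The attribute names, the five daily snapshots, and the helper `stored_values` are hypothetical and chosen only for illustration. The count covers descriptive attribute values only and ignores metadata columns such as hash keys and load dates, which each additional satellite would also carry.

```python
def stored_values(snapshots, attribute_groups):
    """Count the attribute values stored when each group is loaded as its own satellite.

    A new row is written for a group only when one of its attributes changes,
    and every new row stores one value per attribute in the group.
    """
    total = 0
    for group in attribute_groups:
        previous = None
        for snapshot in snapshots:
            current = tuple(snapshot[name] for name in group)
            if current != previous:
                total += len(group)
                previous = current
    return total


# Hypothetical daily snapshots of one customer record.
base = {"name": "Ada", "birth_date": "1990-01-01"}
snapshots = [
    {**base, "address": "Street 1", "last_login": "2020-06-01"},
    {**base, "address": "Street 1", "last_login": "2020-06-02"},
    {**base, "address": "Street 2", "last_login": "2020-06-03"},
    {**base, "address": "Street 2", "last_login": "2020-06-04"},
    {**base, "address": "Street 2", "last_login": "2020-06-05"},
]

# One satellite holding all attributes.
single = stored_values(snapshots, [("name", "birth_date", "address", "last_login")])

# Three satellites: never changing, sometimes changing, frequently changing.
split = stored_values(
    snapshots,
    [("name", "birth_date"), ("address",), ("last_login",)],
)

print(single)  # 20 values: 5 rows x 4 attributes
print(split)   # 9 values: 2 + 2 + 5
```

In the single satellite, the daily change of `last_login` forces the unchanged name, birth date and address to be stored again with every row.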

Splitting satellites by source system and, technically, by rate of change (the latter is not required when page compression is available in the database) are common and recommended practices for splitting descriptive attributes. However, we have decided to split raw data even further, both technically and by business meaning.

For the split by level of security, the security levels in our process range as follows:

  • Lowest confidentiality level (levels 0 and 1): public data, no security measures required
  • Limited access for certain internal parties (levels A, R, C and F)
  • Highest confidentiality level (level S): top secret

The business-driven satellite split, in turn, distributes raw data across different satellite tables according to the business meaning of the data content.

We have defined several classifications for this purpose, for example “contact” for contact data and “activity” for data that tracks the interactions users have had with the source record.

Additionally, data modelers can define custom business classifications for business meanings that are unique to specific business objects.

For example, all data attributes of an application installed on the CRM platform Salesforce are often stored within a single satellite structure. The main reason behind business-driven satellites is that apps can be added or removed with less impact from structural changes on the EDW.

Putting everything together, here’s an example of a satellite name in our internal EDW solution:

customer_contact_sfdc_lcp_s

The above is a satellite of a business object labelled Customer that holds customers’ contact information from the source system Salesforce. The remaining codes in the name indicate that its content has a low rate of change, has a security level of C, and contains personal data.
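As a rough illustration, such a name can be decomposed programmatically. The Python sketch below assumes a layout inferred from this single example: business object, business classification, source system, a three-letter code for rate of change, security level and privacy, and a trailing entity suffix. The field names, the `parse_satellite_name` helper and the reading of the trailing `s` as an entity-type marker are assumptions, not a documented Scalefree convention.

```python
from typing import NamedTuple


class SatelliteName(NamedTuple):
    business_object: str
    classification: str
    source_system: str
    rate_of_change: str   # "l" is read as "low" in the example above
    security_level: str   # e.g. "C"
    privacy_flag: str     # "p" is read as "contains personal data"
    entity_suffix: str    # assumed to mark the entity type


def parse_satellite_name(name: str) -> SatelliteName:
    """Split a satellite name into its parts, assuming the five-part layout above."""
    parts = name.split("_")
    if len(parts) != 5 or len(parts[3]) != 3:
        raise ValueError(f"unexpected satellite name layout: {name!r}")
    business_object, classification, source_system, codes, suffix = parts
    rate, security, privacy = codes  # one character per split criterion
    return SatelliteName(
        business_object, classification, source_system,
        rate, security.upper(), privacy, suffix,
    )


parsed = parse_satellite_name("customer_contact_sfdc_lcp_s")
print(parsed.business_object, parsed.classification, parsed.source_system)
print(parsed.rate_of_change, parsed.security_level, parsed.privacy_flag)
```

The sketch would not handle business object names that themselves contain underscores; a real convention would need a rule for that case.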

Summary

This blog post introduced the Data Vault entity Satellite and presented our basic recommendations for splitting a satellite in different ways, along with the benefits of each. We also recommended additional ways to split a satellite based on the source data, which we follow at Scalefree. In our next blog post, we will take a deeper look at satellite modelling with regard to structural changes made in the source system.

– by Samatha Balla (Scalefree)

