Satellite modeling for any structural changes in the source system

By | Scalefree Newsletter | No Comments

Modeling a Satellite in the instance of any structural changes within the source system

Over time, most source systems change.
The question is how to absorb these changes into a data warehouse based on Data Vault, especially where satellites are concerned.
When the source table structure changes, it is necessary to find a balance between reengineering effort and performance. To help those facing structural changes in their source systems, this article presents our recommendations, drawn from our knowledge base, for various types of source changes.

This article describes features embodied in the Data Vault 2.0 model: its foundation of hub, link, and satellite entities adjusts easily to changes in the source data, which reduces the cost of reengineering the enterprise data warehouse.


Splitting a Satellite entity based on the source data

By | Scalefree Newsletter | No Comments
Satellite splitting criteria play a vital role in a satellite’s structure. It is not recommended to store all of the descriptive data related to a business key in a single satellite structure. Instead, raw data should preferably be split by certain criteria.

In general, we have defined the following types of satellite splits:

  1. Splitting by source system
  2. Splitting by rate of change

Additionally, we have defined two more types of splits as mentioned below:

  1. Splitting by level of security and by the level of privacy
  2. Business-driven split

A satellite split by source system is strongly recommended because it prevents two issues when loading data into the enterprise data warehouse. First, if two source systems with different relational structures are loaded into the same satellite entity, a transformation of the structure might be required. Structural transformation, however, sooner or later requires business logic, and business logic should be deferred to the information delivery stage in order to support fully auditable environments and the application of multiple business perspectives.
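The split by source system can be illustrated with a minimal Python sketch. It is not taken from the article: the row layout, the `record_source` field, and the `sat_<hub>_<source>` naming pattern are assumptions made for illustration only. The point it shows is that each source keeps its own structure, so no structural transformation is needed while loading.

```python
from collections import defaultdict

# Hypothetical staged rows: each carries the business key, its record source,
# and whatever descriptive attributes that source happens to deliver.
staged_rows = [
    {"customer_no": "C-1", "record_source": "crm", "name": "Ann", "segment": "gold"},
    {"customer_no": "C-1", "record_source": "billing", "full_name": "Ann Lee", "iban": "DE00"},
    {"customer_no": "C-2", "record_source": "crm", "name": "Bob", "segment": "basic"},
]


def split_by_source(rows, hub="customer"):
    """Route each row to a satellite named after its source system.

    Attributes are copied exactly as delivered. Nothing is renamed or
    restructured, so no business logic enters the raw layer.
    """
    satellites = defaultdict(list)
    for row in rows:
        # Assumed naming pattern, for illustration only.
        target = f"sat_{hub}_{row['record_source']}"
        satellites[target].append(dict(row))
    return dict(satellites)


satellites = split_by_source(staged_rows)
```

Harmonizing `name` and `full_name` into a single attribute would be a business rule, and in this sketch it is deliberately left to a later stage.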

What to consider for naming conventions in Data Warehousing – Part 2

By | Scalefree Newsletter | 2 Comments
In a previous blog post, we discussed the different aspects of a naming standard documentation – from letter case types to the consideration between using prefixes or suffixes in database object names.

Throughout this article, we will continue presenting our suggestions for naming conventions in a data warehouse solution, as well as sharing examples for naming standards, which both our team and our customers utilize internally.

Layer schemas

For layer schema names, we prefer using prefixes.
As discussed in the previous blog post, this convention boosts visibility in data exploration within the Enterprise Data Warehouse for developers and business users by grouping schemas of the same data warehouse layer together.
The following is a list of common Enterprise Data Warehouse layers and our associated recommendations regarding naming conventions:

CHANGES WE ARE MAKING FOR YOU

By | Announcements | No Comments
This month, our blog post will take a slightly different tone than our normal publications due to the current Covid-19 pandemic and its paralyzing effect on many areas of public life.

That said, we at Scalefree are reflecting on how we can do our part to help contain the virus. The health of Scalefree’s customers, partners, employees, and the wider community is our highest priority. We do not want to subject anyone to undue risk, least of all those around us who could be more seriously affected by the virus.

Due to the growing concern regarding the coronavirus (COVID-19), and in line with best practices and the restrictions imposed by local authorities, we have chosen to make our training programs fully available as live and interactive online classes.

Therefore, we would like to give you an overview of the changes Scalefree is making to help stem the spread of Covid-19 while still delivering actionable insights to our partners, customers, and employees.


Delete and Change Handling Approaches in Data Vault 2.0 without a Trail

By | Scalefree Newsletter | No Comments
In January of this year, we published a piece detailing an approach to handle deletes and business key changes of relationships in Data Vault without having an audit trail in place.
This approach is an alternative to the Driving Key structure, which is part of the Data Vault standards and a valid solution.
At times, however, it can be difficult to find business keys in a relationship that will never change and can therefore serve as the anchor keys (the Link Driving Key) when querying. The presented method inserts counter records for changed or deleted records, specifically for transactional data, and is a straightforward and pragmatic approach. However, the article raised many questions and caused some confusion and disagreement.
This blog post therefore dives deeper into the technical implementation, which we were able to validate by employing it.
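As a rough illustration of the counter-record idea, the following Python sketch compares the previous load with the current one and only ever appends rows. It is not the implementation from the article. It assumes that a counter record is a copy of the old row with its numeric measure negated, and the field names `amount`, `load_date`, and `is_counter` are hypothetical.

```python
def counter_records(previous, current, load_date):
    """Return the rows to append for one load (insert-only).

    `previous` and `current` map a transaction id to its row, a dict that
    holds an 'amount'. Existing rows are never updated or deleted.
    """
    to_insert = []
    for tx_id, old in previous.items():
        new = current.get(tx_id)
        if new == old:
            continue  # unchanged, nothing to append
        # Deleted or changed: neutralize the old row with a negated copy
        # (assumed meaning of "counter record" in this sketch).
        to_insert.append(
            {**old, "amount": -old["amount"], "load_date": load_date, "is_counter": True}
        )
        if new is not None:
            # Changed: follow the counter record with the new version.
            to_insert.append({**new, "load_date": load_date, "is_counter": False})
    for tx_id, new in current.items():
        if tx_id not in previous:
            # Brand-new transaction.
            to_insert.append({**new, "load_date": load_date, "is_counter": False})
    return to_insert


previous = {
    "T1": {"tx_id": "T1", "amount": 100},
    "T2": {"tx_id": "T2", "amount": 50},
    "T3": {"tx_id": "T3", "amount": 40},
}
current = {
    "T1": {"tx_id": "T1", "amount": 100},  # unchanged
    "T2": {"tx_id": "T2", "amount": 70},   # changed
    "T4": {"tx_id": "T4", "amount": 10},   # new; T3 was deleted
}
appended = counter_records(previous, current, load_date="2020-05-01")
```

Under these assumptions, summing `amount` over the old rows plus the appended rows yields the total of the current source state, without any knowledge of which key in the relationship is the driving key.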

An Efficient Data Lake Structure

By | Scalefree Newsletter | 2 Comments
Within a hybrid data warehouse architecture, as promoted in the Data Vault 2.0 Boot Camp training, a data lake is used as a replacement for a relational staging area. Thus, to take full advantage of this architecture, the data lake is best organized in a way that allows efficient access within a persistent staging area pattern and better data virtualization.


Data Vault Use Cases Beyond Classical Reporting: Part 2

By | Scalefree Newsletter | No Comments
As we first introduced within the first part of the Data Vault Use Cases article series, the Enterprise Data Warehouse (EDW) can do more than just simple reporting and dashboarding. 

We previously explored how the EDW can help to improve data quality by implementing data cleansing rules. 

This can be applied through write-back operations that affect the source systems directly. However, this was only one example of how to add more value to the EDW.
The scalability and flexibility of Data Vault 2.0 open up a whole variety of use cases, for example optimizing and automating operational processes, predicting the future, pushing data back to operational systems as new input, or triggering events outside the data warehouse, to name a few.


Capturing Semi-Structured Descriptive Data

By | Scalefree Newsletter | No Comments
The previous articles within this series have presented hub and link entities to capture business keys as well as the relationships between business keys. To illustrate, the hub document collection in MongoDB is a distinct list of business keys used to identify customers. 

To capture the descriptive data, which describes the business keys, Data Vault uses satellite entities. Because both business keys and the relationships between them can be described by user data, satellites may be attached to hub entities as well as link entities.
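What such satellite documents might look like can be sketched in Python with plain dictionaries standing in for MongoDB documents. This is an assumption-laden illustration rather than the structure used in the series: the field names `parent_hk`, `load_date`, `record_source`, and `payload`, the MD5-based hash key, and the sample values are all hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def hash_key(*business_keys):
    """Derive a hash key from one or more business keys (assumed convention)."""
    normalized = "|".join(str(key).strip().upper() for key in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def satellite_document(parent_hash_key, payload, record_source, load_date=None):
    """Build a satellite document that describes a hub or link entry."""
    return {
        "parent_hk": parent_hash_key,  # points at the hub or link being described
        "load_date": load_date or datetime.now(timezone.utc),
        "record_source": record_source,
        "payload": payload,  # kept as delivered, nested fields included
    }


customer_hk = hash_key("C-1001")            # hub: one business key
policy_link_hk = hash_key("C-1001", "P-77")  # link: a combination of keys

hub_satellite = satellite_document(
    customer_hk,
    {"name": "Ann Lee", "address": {"city": "Hannover"}},
    record_source="home_insurance",
)
link_satellite = satellite_document(
    policy_link_hk,
    {"signed_on": "2020-01-15"},
    record_source="home_insurance",
)
```

The same builder serves both cases: the only difference between a hub satellite and a link satellite in this sketch is which parent hash key the document refers to.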


Identifying Additional Relationships between Documents

By | Scalefree Newsletter | No Comments
The previous article in our series covered the Data Vault hub entity, which is used to capture a distinct list of business keys in an enterprise data warehouse, as most integration actually occurs on these hub entities. However, there are scenarios in which integrating data solely on the hub entities is not sufficient.

Consider a sample data set from an insurance company, in which customers sign car and home insurance policies and file claims. Before moving forward with the example, it is important to note that there are relationships between the business keys involved: the customer number, the policy identifiers, and the claims.

These relationships are captured by Data Vault link entities. Just like hubs, links contain a distinct list of records, so they store no duplicates. Together, hubs and links form the skeleton of the Data Vault, which is later described by descriptive user data stored in satellites.
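The "distinct list" property of a link can be sketched as follows. This is a simplified illustration under stated assumptions, not the loading pattern from the series: the link is held in a Python dict keyed by a hash of the participating business keys, and the MD5 hashing, the key normalization, and the field names are hypothetical.

```python
import hashlib


def hash_key(*business_keys):
    """Derive a hash key from one or more business keys (assumed convention)."""
    normalized = "|".join(str(key).strip().upper() for key in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def load_link(link, relationships):
    """Add customer-to-policy relationships to the link, skipping duplicates."""
    for customer_no, policy_id in relationships:
        link_hk = hash_key(customer_no, policy_id)
        if link_hk in link:
            continue  # relationship already captured once
        link[link_hk] = {
            "link_hk": link_hk,
            "customer_hk": hash_key(customer_no),  # reference to the customer hub
            "policy_hk": hash_key(policy_id),      # reference to the policy hub
        }
    return link


staged = [
    ("C-1001", "P-77"),
    ("C-1001", "P-77"),    # exact duplicate
    ("C-1001", "P-78"),
    ("c-1001 ", "P-78"),   # same keys after normalization
]
customer_policy_link = load_link({}, staged)
```

Four staged relationships collapse into two link records here, because the link stores each combination of business keys only once.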


Integrating Documents from Heterogeneous Sources

By | Scalefree Newsletter | No Comments
Within this part of our ongoing blog series, we would like to introduce a sample data set based upon insurance data. This data set will be used to explain the concepts and patterns expanded upon further in the post. That said, please consider the following situation: an insurance company utilizes two different operational systems, let’s say, a home insurance policy system and a car insurance policy system.

Both systems should be technically integrated, which means if a new customer signs up for a home insurance policy, the customer’s data should be synchronized into the car insurance policy system as well and kept in sync at all times. Thus, when the customer relocates, the new address is updated within both systems.

In reality, however, things often do not go as expected. First of all, both systems are usually not well integrated, or not integrated at all. In some worst-case scenarios, data is manually copied from one system to the other, and updates are applied to only some of the datasets rather than consistently to all of them, leading to inconsistent, contradictory source datasets. The same situation often applies to data sets after mergers and acquisitions.
