A data warehouse is a subject-oriented, nonvolatile, integrated, time-variant collection of data to support management’s decisions.
- Inmon, W. H. (2005). Building the Data Warehouse. Indianapolis, Ind.: Wiley.
There are different ways to validate relationships from source systems, depending on how the data is delivered (full extract or CDC) and on how the source system delivers a delete (as a soft delete or a hard delete).
First, let us explain the different kinds of deletes in source systems:
Secondly, let’s explore how we find the data in the staging area:
To keep the following explanation as simple as possible, our assumption is that we want to mark relationships as deleted as soon as we get the delete information, even if there is no audit trail from the source system (data aging is another topic).
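The assumption above can be illustrated for the full-extract case with hard deletes: a relationship that is currently active in the warehouse but missing from the latest staging load is marked as deleted. The following Python snippet is only a minimal sketch of that idea. The function name, the tuple representation of a relationship and the field names are hypothetical and chosen for illustration; with CDC delivery, the delete information would arrive explicitly and this comparison would not be needed.

```python
from datetime import datetime, timezone


def detect_deleted_relationships(active_keys, staged_keys, load_date):
    """Return delete markers for relationships missing from a full extract.

    active_keys: relationships currently considered active in the warehouse
    staged_keys: relationships present in the latest full extract (staging)
    Both are iterables of hashable business-key tuples (an assumption).
    """
    missing = set(active_keys) - set(staged_keys)
    return [
        {"relationship": key, "is_deleted": True, "load_date": load_date}
        for key in sorted(missing)
    ]


# Hypothetical example: (customer, order) relationships.
active = {("C1", "O1"), ("C1", "O2"), ("C2", "O3")}
staged = {("C1", "O1"), ("C2", "O3")}  # ("C1", "O2") was hard-deleted
load_date = datetime(2020, 1, 1, tzinfo=timezone.utc)

markers = detect_deleted_relationships(active, staged, load_date)
for marker in markers:
    print(marker["relationship"], marker["is_deleted"])
```

Because the comparison relies on the extract being complete, it is not suitable for partial or incremental deliveries.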
Data Vault is capable of decoupling all the necessary business-driven changes, extensions and customizations to the platform while maintaining the ability to become the cornerstone of an integrated architecture. The decoupling is a part of our Data Vault Boot Camp and is summarized in Figure 1.
Scalefree can provide knowledge and implementation assistance in both Data Vault and Salesforce, making us the optimal partner for your Salesforce integration project.
Thus, immediate access to the data and certainty about its quality can enhance business performance immensely. The sad truth, however, is that we see bad data in operational systems. It stems from human errors such as typos, ignored standards and duplicates, and from a lack of input validation in the operational systems, such as mandatory fields that are not declared and references to other entities (primary-key/foreign-key constraints) that are not defined.
Many people already know the three-layer data warehouse architecture that is used in Data Vault 2.0. The first layer is the staging area, which holds the raw data from the source systems. The second layer is the enterprise data warehouse, which in this case contains a Data Vault 2.0 model. The third layer consists of the Information Marts, which deliver the information in various structures (star schemas, snowflake schemas, etc.).
Figure 1. Data Vault 2.0 Architecture
The scalability and flexibility of Data Vault 2.0 offer a wide variety of use cases, such as optimizing and automating operational processes, predicting the future, pushing data back to operational systems as new input, or triggering events outside the data warehouse, to name a few.
This article describes features embodied in the Data Vault 2.0 model: the foundation of hub, link, and satellite entities can adjust to changes in the source data easily, thus reducing the cost of reengineering the enterprise data warehouse.
In general, we have defined the following types of satellite splits:
Additionally, we have defined two more types of splits as mentioned below:
A satellite split by source system is strongly recommended to prevent two issues when loading data into the enterprise data warehouse. First, if two source systems with different relational structures are loaded into the same satellite entity, a transformation of the structure might be required. However, structural transformation requires business logic sooner or later, and business logic should be deferred to the information delivery stage to support fully auditable environments as well as the application of multiple business perspectives.
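As a rough illustration of a split by source system, the Python sketch below routes staged records to one satellite per record source, so that each source keeps its own attribute structure and no structural transformation is applied during loading. The naming pattern `sat_<parent>_<source>`, the `record_source` field and the sample attributes are assumptions made for this example only.

```python
from collections import defaultdict


def route_to_satellites(records, parent="customer"):
    """Group staged records into one satellite per source system.

    Each record keeps its original attributes; nothing is reshaped here.
    """
    satellites = defaultdict(list)
    for record in records:
        source = record["record_source"].lower()
        satellites[f"sat_{parent}_{source}"].append(record)
    return dict(satellites)


# Two hypothetical sources describing the same customer differently.
staged = [
    {"record_source": "CRM", "customer_id": "42", "full_name": "Ada Lovelace"},
    {"record_source": "ERP", "customer_id": "42",
     "first_name": "Ada", "last_name": "Lovelace"},
]

satellites = route_to_satellites(staged)
for name, rows in sorted(satellites.items()):
    print(name, sorted(rows[0].keys()))
```

Reconciling `full_name` with `first_name` and `last_name` would be business logic, which in this sketch is left to the information delivery stage.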
Throughout this article, we will continue presenting our suggestions for naming conventions in a data warehouse solution, as well as sharing examples for naming standards, which both our team and our customers utilize internally.
For layer schema names, we prefer using prefixes.
As discussed in the previous blog post, this convention boosts visibility in data exploration within the Enterprise Data Warehouse for developers and business users by grouping schemas of the same data warehouse layer together.
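The grouping effect of prefixes can be seen with a small Python sketch that sorts schema names alphabetically, the way most database browsers list them. The schema names used here are purely hypothetical and are not the recommended names.

```python
# Hypothetical schema names: layer as prefix vs. layer as suffix.
prefixed = ["stage_sap", "mart_sales", "stage_crm", "mart_finance"]
suffixed = ["sap_stage", "sales_mart", "crm_stage", "finance_mart"]

# Alphabetical order keeps schemas of one layer next to each other
# only when the layer name comes first.
prefixed_sorted = sorted(prefixed)
suffixed_sorted = sorted(suffixed)

print(prefixed_sorted)
print(suffixed_sorted)
```

With prefixes, the two `mart_` schemas and the two `stage_` schemas end up adjacent; with suffixes, the layers are interleaved.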
The following is a list of common Enterprise Data Warehouse layers and our associated recommendations regarding naming conventions:
That said, we at Scalefree are reflecting on how we can do our part to help contain the virus. The health of Scalefree’s customers, partners, employees, and the overall community is our highest priority. We do not want to subject any person to undue risk, and certainly not those around us who could be more seriously affected by the virus.
Due to the growing concern regarding the coronavirus (COVID-19), and in alignment with best practices and the restrictions imposed by local authorities, we have chosen to make our training programs fully available as live and interactive online classes.
Therefore, we would like to give you an overview of the changes Scalefree is making to help stem the spread of COVID-19 while continuing to deliver actionable insights to our partners, customers and employees.