Scalefree Newsletter

Still struggling with GDPR?

By | Scalefree Newsletter | No Comments

The General Data Protection Regulation (GDPR) is a regulation of the European Union (EU) that became effective on May 25, 2018. It is designed to give a high level of protection to the personal data of European citizens, which means that companies around the world have to establish transparency about, and ownership of, individuals’ data and need to obtain a clear declaration of consent from them before storing and processing their personal data. Although laws in countries outside the EU (especially the USA) tend to favor businesses over consumers, GDPR affects all companies worldwide that hold personal data of EU citizens in their databases.


Being careful with personal data is nothing new, especially not in the EU. The key change in collecting and processing personal data is that the data is now completely under the control of its owner, who can force companies to delete or anonymize it, or request copies of all of the owner’s personal data stored in the system. Personal data, or Personally Identifiable Information (PII), is data by which an individual can be identified, e.g. name, phone number or email address. Read More
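The two owner rights described above (requesting a copy, and forcing deletion or anonymization) can be sketched as follows. This is only a minimal illustration under stated assumptions: the field names in `PII_FIELDS`, the `customer_id` key and both function names are hypothetical, and records are plain in-memory dictionaries rather than rows in a real database.

```python
# Hypothetical list of PII columns; a real system would take this from its data catalog.
PII_FIELDS = ("name", "phone", "email")


def export_personal_data(records, customer_id):
    """Return copies of the PII stored for one individual (a data access request)."""
    return [
        {"customer_id": r["customer_id"], **{f: r.get(f) for f in PII_FIELDS}}
        for r in records
        if r["customer_id"] == customer_id
    ]


def anonymize_personal_data(records, customer_id):
    """Blank out the PII fields of one individual in place; return the number of records changed."""
    changed = 0
    for r in records:
        if r["customer_id"] == customer_id:
            for f in PII_FIELDS:
                if f in r:
                    r[f] = None  # non-identifying attributes are left untouched
            changed += 1
    return changed
```

Note that blanking fields is only one possible strategy; whether it is sufficient depends on what other data could still identify the individual.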

Pledge 1% – Scalefree unites with Frankfurter Ring

By | Scalefree Newsletter | No Comments

We often discuss relevant Big Data and associated Data Vault topics like “Hybrid Architecture”, “Data Lake” and “Hadoop” as we try to share our knowledge. Though this time around, we’d like to take some time to shift focus and talk about our company culture and commitment to our community.


Scalefree built itself upon the idea that success is only a true success when shared with others, and that idea has shaped every decision within the organization since we first started. So it was rather easy to finally turn that idea into a concrete commitment when presented with the growing movement, Pledge 1%, and its focus on building better bridges with those with whom we share a community. Read More

Data Warehouse and Data Lake – Do we still need a Data Warehouse?

By | Scalefree Newsletter | No Comments

“Big Data”, “Data Lake”, “Data Swamp”, “Hybrid Architecture”, “NoSQL”, “Hadoop” … these are terms you are confronted with very often these days when you are dealing with data. Furthermore, the question comes up whether you really need a data warehouse nowadays when you deal with a high variety and volume of data. We want to talk about what a data lake is, whether we need a data warehouse when using NoSQL platforms like Hadoop, and how it is combined with Data Vault.


There is a proper definition from Tamara Dull (SAS): “A data lake is a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data. The data structure and requirements are not defined until the data is needed.” 1 Read More

How to combine Managed Self-Service BI with Data Vault 2.0?

By | Scalefree Newsletter | One Comment

Last month we talked about a hybrid architecture in Data Vault 2.0, where we explained how to combine structured and unstructured data. To follow up on this topic, we now want to explain how your business users (especially power users) can benefit from it with the managed Self-Service Business Intelligence (mSSBI) approach in Data Vault 2.0.


Self-service BI allows end-users to completely circumvent IT because of the unresponsiveness of IT. In this approach, business users are left on their own with the whole process of sourcing the data from operational systems, and integrating and consolidating the raw data. There are many problems with this self-service approach without the involvement of IT:
Read More

Hybrid Architecture in Data Vault 2.0

By | Scalefree Newsletter | 2 Comments

Business users expect their data warehouse systems to load and prepare more and more data, in terms of the variety, volume, and velocity of data. Also, the workload that is put on typical data warehouse environments keeps increasing, especially if the initial version of the warehouse has become a success with its first users. Therefore, scalability has multiple dimensions. Last month we talked about Satellites, which play an important role in scalability. Now we explain how to combine structured and unstructured data with a hybrid architecture.


The Data Vault 2.0 architecture is based on three layers: the staging area which collects the raw data from the source systems, the enterprise data warehouse layer, modeled as a Data Vault 2.0 model, and the information delivery layer with information marts as star schemas and other structures. The architecture supports both batch loading of source systems and real-time loading from the enterprise service bus (ESB) or any other service-oriented architecture (SOA).

Read More

Visual Data Vault By Example: Satellites Modeling In The Health Care Industry

By | Scalefree Newsletter | One Comment

Data Vault 2.0 is a concept for data warehousing, invented by Dan Linstedt. It brings many new features that help anyone concerned with Business Intelligence enter a new age of data warehousing. Data Vault 2.0 is a Big Data concept that integrates relational data warehousing with unstructured data warehousing in real time. It is an extensible data model to which new data sources are easy to add. When our founders wrote the book, they required a visual approach to model the concepts of Data Vault in it. For this purpose, they developed the graphical modeling language, which focuses on the logical aspects of Data Vault. The Microsoft Visio stencils and a detailed white paper are available as a free download.

This year we have already written about the modeling of hubs and links in Data Vault 2.0. Now, we want to introduce you to the third standard entity, the Satellite.


Satellites add descriptive data to hubs and links. Descriptive data is stored in attributes that are added to the satellite. The individual attributes are added to the satellite one at a time. A satellite might be attached to any hub or link. However, it is only possible to attach the satellite to one parent. Read More
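The rules above (attributes are added one at a time, and a satellite has exactly one hub or link as its parent) can be expressed as a small sketch. The class and method names here are illustrative assumptions, not part of the Visual Data Vault language or of any Data Vault tooling.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Entity:
    """A possible satellite parent; kind is expected to be 'hub' or 'link'."""
    name: str
    kind: str


@dataclass
class Satellite:
    name: str
    parent: Optional[Entity] = None
    attributes: Dict[str, str] = field(default_factory=dict)

    def attach_to(self, parent: Entity) -> None:
        """Attach this satellite to a single hub or link."""
        if parent.kind not in ("hub", "link"):
            raise ValueError("a satellite can only be attached to a hub or a link")
        if self.parent is not None:
            raise ValueError("a satellite can only have one parent")
        self.parent = parent

    def add_attribute(self, name: str, data_type: str) -> None:
        """Add one descriptive attribute to the satellite."""
        self.attributes[name] = data_type
```

For example, a hypothetical `Patient` hub could carry a `PatientDetails` satellite with attributes such as `date_of_birth`; trying to attach the same satellite to a second parent raises an error.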

Visual Data Vault By Example: Links Modeling In The Banking Industry

By | Scalefree Newsletter | No Comments

With the advent of Data Vault 2.0, which adds architecture and process definitions to the Data Vault 1.0 standard, Dan Linstedt standardized the Data Vault symbols used in modeling. Based on these standardized symbols, the Visual Data Vault (VDV) modeling language was developed, which can be used by EDW architects to build Data Vault models. When our founders wrote the book, they required a visual approach to model the concepts of Data Vault in it. For this purpose, they developed the graphical modeling language, which focuses on the logical aspects of Data Vault. The Microsoft Visio stencils and a detailed white paper are available as a free download.


In June this year we published a newsletter on how hubs are modeled in the accounting industry. In this newsletter we explain the function of standard links and how they are modeled in the banking industry.

Links connect individual hubs in a Data Vault model and represent either transactions or relationships between business objects. Business objects are connected in business. No business object is entirely separate from other business objects. Instead, they are connected to each other through the operational business processes that use business objects in the execution of their tasks. The image below shows a link that connects two hubs (a standard link has to have at least two connections). Read More
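The constraint that a standard link connects at least two hubs can be illustrated with a short sketch. The `Hub` and `Link` classes and the banking names used in the usage note are hypothetical and serve only to make the rule concrete.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Hub:
    """A business object identified by its business key."""
    name: str
    business_key: str


@dataclass(frozen=True)
class Link:
    """A relationship or transaction between two or more hubs."""
    name: str
    hubs: Tuple[Hub, ...]

    def __post_init__(self) -> None:
        # A standard link has to have at least two connections.
        if len(self.hubs) < 2:
            raise ValueError("a standard link must connect at least two hubs")
```

A link such as `Link("CustomerAccount", (customer, account))` is accepted, while a link that references a single hub is rejected at construction time.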

Achieve Data Lineage in Data Vault 2.0

By | Scalefree Newsletter | No Comments

One common requirement in data warehouse projects is to provide end-to-end data lineage. However, custom solutions (for example custom Meta Marts for self-developed Data Vault generators) or tools from different vendors often break such end-to-end data lineage.

Unlike business or technical metadata, which is provided by the business or source applications, process execution metadata is generated by the data warehouse team and provides insights into the ETL processing for maintenance. The data is used by the data warehouse team or by end-users to better understand the data warehouse performance and the results presented in the information marts. One type of process execution metadata is the control flow metadata, which describes the control flow that executes one or more data flows among other tasks. Logging the process execution provides a valuable tool for maintaining or debugging the ETL processes of the data warehouse because it provides information about the data lineage of all elements of the data warehouse. Read More
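As a rough sketch of how logged process executions can yield lineage information, the example below records one entry per data flow run and then answers which sources fed a given target. All names (`execution_log`, `run_data_flow`, `upstream_sources`, and the table names in the usage note) are assumptions made for illustration; a real warehouse would write these entries to a metadata table rather than an in-memory list.

```python
from datetime import datetime, timezone

# In-memory stand-in for a process execution metadata table.
execution_log = []


def run_data_flow(control_flow, name, source, target, flow):
    """Run one data flow (a callable returning a row count) and log its execution."""
    started = datetime.now(timezone.utc)
    status = "failed"
    rows = 0
    try:
        rows = flow()
        status = "succeeded"
        return rows
    finally:
        # The entry is written whether the flow succeeded or raised an exception.
        execution_log.append({
            "control_flow": control_flow,
            "data_flow": name,
            "source": source,
            "target": target,
            "rows": rows,
            "status": status,
            "started": started,
            "finished": datetime.now(timezone.utc),
        })


def upstream_sources(target):
    """Return the sources that successfully loaded the given target, from the log."""
    return sorted({
        e["source"] for e in execution_log
        if e["target"] == target and e["status"] == "succeeded"
    })
```

For instance, after a hypothetical `nightly_load` control flow has loaded `dv.hub_customer` from `stage.customer`, `upstream_sources("dv.hub_customer")` lists `stage.customer`, which is the lineage information the log provides.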