How to implement insert only in Data Vault 2.0?

By | Scalefree Newsletter | No Comments

Skilled modeling is important to harness the full potential of Data Vault 2.0. To get the most out of the system due to scalability and performance, it also has to be built on an architecture which is completely insert only. On the way into the Data Vault, all update operations can be eliminated and loading processes simplified.

THE COMMON IMPLEMENTATION

In the common loading patterns, there are two important technical timestamps in Data Vault 2.0. The first is the load date timestamp (LDTS). This timestamp does not represent a business date that comes from the source system. Instead, it provides the information about when the data was first loaded into the data warehouse, usually the staging area. Read More

How to use Point in Time Tables (PIT) in the Insurance Industry?

By | Scalefree Newsletter | 2 Comments

A problem that occurs when querying the data out of the Raw Data Vault happens when there are multiple satellites on a hub or a link:

Figure 1: Data Vault model including PIT (logical)
In the above example, there are multiple satellites on the hub Customer and link included in the diagram. This is a very common situation for data warehouse solutions because they integrate data from multiple source systems. However, this situation increases the complexity when querying the data out of the Raw Data Vault. The problem arises because the changes to the business objects stored in the source systems don’t happen at the same time. Instead, a business object, such as a customer (an assured person), is updated in one of the many source systems at a given time, then updated in another system at another time, etc. Note that the PIT table is already attached to the hub, as indicated by the ribbon. Read More

Managed Self-Service BI – Success in spite of stringent regulation

By | Scalefree Newsletter | No Comments
The latest story to find itself added to the growing number of successful implementations of Scalefree’s services, and Data Vault 2.0 as a whole, centers around a sector known for its strict regulatory bodies in addition to high volume of data that demands the utmost in terms of privacy and security.

As the events that unfolded during the various financial crises at the start of the century left governments the world over seeking to impose stricter regulations for the financial sector. Banks within the sector were faced with a new task as they sought to continue operating with expansion in mind while still falling well within defined standards. Read More

The latest innovations of Data Vault 2.0

By | Scalefree Newsletter | No Comments

Focus on trends: Data Lake and no-sql, dwh architecture, self-service bi, modeling and gdpr

In the past, we wrote about topics we were confronted with when we consult our clients or just recognized widely occurring discussions in the web.

All these topics were already covered in Data Vault 2.0 and most of them moved into a higher focus within the last months. Coming with the trends in the private sector, NoSQL databases are now playing an important role for storing data fast from different source systems. This brings new opportunities to analyze the data, but also new challenges, i.e. how to query fast from those “semi”- and “unstructured” data, e.g. including Massive Parallel Processing (MPP). Furthermore, there is an abundance of tools to store, transport, transform and analyze the data, what often results in time and cost-intensive researching.  The knowledge about “Schema on Write” and “Schema on Read” (and their differences) became very important to build a Data “Warehouse”. A Schema has been and is still mandatory for Business Analysts when they have to tie the data to business objects for analytical reasons. Storing your data in NoSQL platforms only (let’s call it a “Data Lake”) is a good approach to capture all your company’s data, but it became much more difficult for Business User to get the data out from those platforms. A good and recommended approach is to have both, a Data Lake AND a Data Warehouse combined in a Hybrid Architecture.

Read More

How to scale in a disciplined agile manner?

By | Scalefree Newsletter | No Comments

Looking beyond Scrum and learn how to increase the value in Data Vault 2.0 projects

Earlier this year we talked about Managed Self-Service BI to explain how business users can take a benefit from this approach in Data Vault 2.0. Now we want to show you how to get there from a project management perspective, even in large companies where the standard Scrum approach often not works with the accorded deployment/release regulations and other approaches like the Disciplined Agile framework are the better fit.

Agile transformation is hard because cultural change is hard. It’s not one problem that needs to be solved, but a series of hundreds of decisions affecting lots of people over a long period of time that affects relationships, processes, and even the state of mind of those working within the change.

There are two fundamental visions about what it means to scale agile: Tailoring agile strategies to address the scaling challenges – such as geographic distribution, regulatory compliance, and large team size – faced by development teams and adopting agility across your organization. Both visions are important, but if you can’t successfully perform the former then there is little hope that you’ll be successful at the latter.

Read More

Still struggling with GDPR?

By | Scalefree Newsletter | No Comments

The new General Data Protection Regulation (GDPR) is a law by the European Union (EU) and became effective on May 25, 2018. This new regulation is designed to put a high level of protection to personal data of European citizens, what means that companies around the world have to establish transparency and ownership to the individuals’ data and need to get a clear declaration of consent from them to save and process their personal data. Though laws from countries outside the EU (especially the USA) tend to favor business over consumer, GDPR affects all companies over the world who have personal data from EU-citizens in their database.

WHAT IS NEW WITH GDPR?

To be careful with personal data is nothing new, especially not in the EU. The key change of collecting and processing personal data is that the data is now completely under control of the owner, who can force the companies to delete or anonymize their data or to request copies of all owners personal data stored in the system. Personal data or Privately Identifiable Information (PII) means data, an individual can be identified with, e.g. name, phone number or email address. Read More

Pledge 1% – Scalefree unites with Frankfurter Ring

By | Scalefree Newsletter | No Comments

We often discuss relevant Big Data and associated Data Vault topics like “Hybrid Architecture”, “Data Lake” and “Hadoop” as we try to share our knowledge. Though this time around, we’d like to take some time to shift focus and talk about our company culture and commitment to our community.

THE CULTURE OF GIVING BACK

Scalefree built itself upon the idea that success is only a true success when shared with others and that idea has shaped every decision within the organization since we’ve first started. So it was rather easy to finally put that idea into a concrete commitment when presented with the growing movement, Pledge 1%, and it’s focus on building better bridges with those in which we share a community. Read More

Data Warehouse and Data Lake – Do we still need a Data Warehouse?

By | Scalefree Newsletter | No Comments

“Big Data”, “Data Lake”, “Data Swamp”, “Hybrid Architecture”, “NoSQL”, “Hadoop” … terms you are confronted with very often these days when you are dealing with data. Furthermore, the question comes up if you really need a data warehouse nowadays when you deal with a high variety and volume of data. We want to talk about what a data lake is, if we need a data warehouse when using NoSQL platforms like Hadoop, and how it is combined with Data Vault.

WHAT IS A DATA LAKE?

There is a proper definition from Tamara Dull (SAS): “A data lake is a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data. The data structure and requirements are not defined until the data is needed.” 1 Read More

How to combine Managed Self-Service BI with Data Vault 2.0?

By | Scalefree Newsletter | One Comment

Last month we talked about a hybrid architecture in Data Vault 2.0, where we explain how to combine structured and unstructured data with a hybrid architecture. To follow up on this topic, we now want to explain how your business users (especially power users) can take a benefit from it with the managed Self-Service Business Intelligence (mSSBI) approach in Data Vault 2.0.

ABOUT SELF-SERVICE BI

Self-service BI allows end-users to completely circumvent IT due to this unresponsiveness of IT. In this approach, business users are left on their own with the whole process of sourcing the data from operational systems, integration and consolidation of the raw data. There are many problems with this self-service approach without the involvement of IT:
Read More

Hybrid Architecture in Data Vault 2.0

By | Scalefree Newsletter | 2 Comments

Business users expect from their data warehouse systems to load and prepare more and more data, regarding the variety, volume, and velocity of data. Also, the workload that is put on typical data warehouse environments is increasing more and more, especially if the initial version of the warehouse has become a success with its first users. Therefore, scalability has multiple dimensions. Last month we talked about Satellites, which play an important role regarding the scalability. Now we explain how to combine structured and unstructured data with a hybrid architecture.

LOGICAL DATA VAULT 2.0 ARCHITECTURE

The Data Vault 2.0 architecture is based on three layers: the staging area which collects the raw data from the source systems, the enterprise data warehouse layer, modeled as a Data Vault 2.0 model, and the information delivery layer with information marts as star schemas and other structures. The architecture supports both batch loading of source systems and real-time loading from the enterprise service bus (ESB) or any other service-oriented architecture (SOA).

Read More