Implementing Data Vault 2.0 ghost records

By | Scalefree Newsletter | No Comments

Throughout the development of Data Vault, from its first iteration to the latest Data Vault 2.0, we have mentioned the terms “ghost records” and “zero keys” in our literature as well as in our Data Vault 2.0 Boot Camps. Since then, we have noticed that these concepts are often used interchangeably.

In this blog entry, we’ll discuss the implementation of ghost records in Data Vault 2.0. Please note that this article is part one of a multi-part blog series clarifying ghost records vs. zero keys.

Data Warehousing and why we need it

By | Scalefree Newsletter | No Comments

A data warehouse is a subject-oriented, nonvolatile, integrated, time-variant collection of data to support management’s decisions.

  • Inmon, W. H. (2005). Building the Data Warehouse. Indianapolis, Ind.: Wiley.
It provides the technical infrastructure needed to run Business Intelligence effectively. Its purpose is to integrate data from different data sources and to provide a historicised database. A DWH makes consistent and reliable reporting possible. A standardised view of the data can prevent interpretation errors, improve data quality and lead to better decision-making. Furthermore, the historisation of data offers additional analysis possibilities and leads to (complete) auditability.

Data Vault Games with Cindi

Cynthia Meyersohn

Cindi has worked in a variety of IT realms over the past 35 years and, as of 2018, had spent the last 17 years working in applications and data engineering development within the U.S. DoD. As a Data Vault 2.0 (DV2) Solution Architect and Certified Instructor, her responsibilities and expertise range from the design, development, implementation, and technical guidance of Enterprise Data Warehouse/Big Data builds to crafting processes surrounding data acquisition and ingest, data governance and Master Data Management policy and compliance, development and team leadership.

Cindi has spent the past seven years leading the architectural design, implementation, and development of Data Vault 2.0 solutions at the U.S. DoD and Department of State. She is a Certified Authorized Data Vault 2.0 Instructor.

Cindi holds an MS in Systems Engineering from George Washington University and a BS in Information Systems from Strayer University.

Christian Kurze

MongoDB: A general purpose, distributed, and highly scalable data platform for modern applications



The database for modern applications: MongoDB is a general purpose, document-based, distributed database built for modern application developers and for the cloud era. No database is more productive to use.
MongoDB has evolved into a general-purpose database that makes it easy to build globally distributed data platforms that are highly available and scale almost indefinitely. While NoSQL is still considered a “new” technology, many Fortune 1000 companies have already migrated mission-critical workloads and decided to use MongoDB as a strategic data platform.

Due to its flexibility, the JSON-based document model supports a broad range of use cases, such as Single View, Internet of Things, Mobile, Real-Time Applications, Personalization, Content Management, Catalogs and Mainframe Offloading.

This presentation provides an overview of MongoDB, the document model, and how data can be accessed in many different ways: via native drivers in almost any programming language, via connectors such as Spark or R, and even via SQL. A practical example shows how to use MongoDB for Data Vault creation in the insurance industry.


Christian has spent the last couple of years working on data management and data integration in order to generate value out of data. At MongoDB, he works as a Principal Solutions Architect. Prior to joining MongoDB, he worked on data virtualization, data warehousing and active metadata management. He holds a PhD in data warehouse automation.

What you will learn

  • Comparison of the document model vs. the relational model
  • Native high availability, horizontal scalability, workload isolation and data locality
  • Deployment agnostic: on-prem, hybrid, cloud, Kubernetes
  • Additional features for rich data usage like S3-based data lake access, full-text search, access by analytical tools, etc.
  • An example of how to build a Data Vault in MongoDB



Neil Strange

What’s so scary? Safely migrating to a Big Data, Data Vault Solution from a legacy Kimball data warehouse


A frequent question we get asked is “how can I migrate from my existing Kimball data warehouse to a big data Data Vault solution?”
But what do we mean by migration? And what are the implications of choosing a big data architecture? Can we use Snowflake or an Azure SQL Data Warehouse to run our new system? Where do we start?
This presentation will explore the migration question and suggest some good practice for designing a big data Data Vault target architecture.


Neil is the founder and managing director of Datavault UK, a consultancy specialising in Data Vault 2.0 and Information Governance implementations and coaching. He has many years of experience working with a diverse range of clients and industries, helping organisations make the best strategic use of their IT systems and data services. Neil has presented at the previous three WWDVC events in the USA.


  • How to define your migration project.
  • Architecting your big data Data Vault target solution.
  • Working on the migration process.
  • Migration good practice.

André Dörr

Data Vault in sports analytics


Everything started with Moneyball in 2002. It is the first well-known use case in which a sports team used a data-driven approach to measure player value. Since then, many sports clubs have tried to copy this method. And with more and more technology entering sports, more and more data is being collected and analyzed to gain an edge over the competition.


This presentation will take a look at different sports analytics use cases for football clubs.
– Technical challenges in football clubs
– Building a compact analytical architecture based on Data Vault & Exasol
– Data Science in sports analytics with Data Vault & Exasol

Matthias Wegner

Data Vault + GDPR at


Matthias Wegner is a senior technical consultant for Data Warehouse platforms. He has initiated and implemented Data Warehouse platforms for multiple projects and customers in Germany using Data Vault and Talend as the main toolsets. Providing a tailored set of standards and best practices for all aspects of a Data Warehouse project is one of his main missions.
Matthias has been Head of BI at cimt AG – IT consulting for five years.
Currently he works as the architect for the Data Warehouse of where the concept of encryption of data for GDPR was developed and implemented.



In this case study we will give you an overview of the data warehouse migration project at You will see how we address GDPR requirements and which role Talend plays in this project. We’ll also show how easy it is to virtualize the access layer through the database switch to Exasol.

· Data Warehouse state-of-the-art, overview and source landscape
· Full Data Vault architecture
· Team setting
· Toolset (Talend, Exasol, Confluence)
· Loading procedures with Talend / Exasol – ELT
· GDPR requirements
· Encryption architecture and decryption approach on the fly in Exasol
· Lessons learned

Matthias Reiß

A day at the data lake

Matthias Reiß is a Senior Client Technical Professional within the IBM Cloud and Cognitive Technical Sales Team in Germany. He has more than 15 years of experience in analytics and data integration projects in heterogeneous environments.

A day at the Data Lake – Get your data working in your Data Lake and beyond
Catch the big fish faster. Get the most out of the data in your Data Lake and all the data stores connected to it. Imagine how easily you can combine the different data formats in your lake with other relational and non-relational data stores within a single query.

– IBM’s common and hybrid SQL Engine
– Data Virtualization
– Data Caching
– Polymorphic Table Functions (i. e. Apache Spark Integration)

Kent Graziano

Making Sense of Schema-On-Read



Kent Graziano is the Chief Technical Evangelist for Snowflake Computing. He is an award-winning author, speaker, and trainer in the areas of data modeling, data architecture, and data warehousing. He is a certified Data Vault Master and Data Vault 2.0 Practitioner (CDVP2), an Oracle ACE Director (Alumni), a member of the OakTable Network, and an expert data modeler and solution architect with more than 30 years of experience, including over two decades doing data warehousing and business intelligence (in multiple industries). He is an internationally recognized expert in Data Vault, Oracle SQL Developer Data Modeler, Agile Data Warehousing, and Cloud-based Data Warehousing. Mr. Graziano has created and led many successful software and data warehouse implementation teams, including multiple agile DW/BI teams. He has written numerous articles, authored three Kindle books (available on, co-authored four books (including the 1st Edition of The Data Model Resource Book), and has given hundreds of presentations, nationally and internationally. He was a co-author of the first book on Data Vault and the technical editor for Super Charge Your Data Warehouse. You can follow Kent on Twitter @KentGraziano or on his blog The Data Warrior (


With the increasing prevalence of semi-structured data from IoT devices, web logs, and other sources, data architects and modelers have to learn how to interpret and project data from things like JSON. While the concept of loading data without upfront modeling is appealing to many, ultimately, in order to make sense of the data and use it to drive business value, we have to turn that schema-on-read data into a real schema! That means data modeling! In this session I will walk through both simple and complex JSON documents, decompose them, then turn them into a representative data model using Oracle SQL Developer Data Modeler. I will show you how they might look using both traditional 3NF and data vault styles of modeling.

  1. See what a JSON document looks like
  2. Understand how to read it
  3. Learn how to convert it to a standard data model
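
As a rough illustration of the three steps above, the following Python sketch reads a small JSON document and decomposes it into flat, 3NF-style row sets. The sample document, its field names, and the `decompose` helper are assumptions made purely for illustration; they are not taken from the session, which works with Oracle SQL Developer Data Modeler.

```python
import json

# Hypothetical sample document; the field names are illustrative only.
RAW = """
{
  "order_id": 1001,
  "customer": {"id": 7, "name": "Ada"},
  "items": [
    {"sku": "A-1", "qty": 2},
    {"sku": "B-9", "qty": 1}
  ]
}
"""


def decompose(doc):
    """Split one nested order document into three flat row sets."""
    customer = doc["customer"]
    # The nested object becomes its own entity with its own key.
    customers = [{"customer_id": customer["id"], "name": customer["name"]}]
    # The parent keeps only scalar attributes plus a reference to the customer.
    orders = [{"order_id": doc["order_id"], "customer_id": customer["id"]}]
    # Each array element becomes a child row that carries the parent key.
    order_items = [
        {
            "order_id": doc["order_id"],
            "line_no": line_no,
            "sku": item["sku"],
            "qty": item["qty"],
        }
        for line_no, item in enumerate(doc["items"], start=1)
    ]
    return {"customer": customers, "order": orders, "order_item": order_items}


tables = decompose(json.loads(RAW))
for name, rows in tables.items():
    print(name, rows)
```

The same pattern carries over to other documents: nested objects tend to become separate entities, and arrays tend to become child tables keyed by the parent. A Data Vault style model would split these rows further into hubs, links and satellites, which this sketch does not attempt.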