What to consider for naming conventions in Data Warehousing – Part 2

By May 28, 2020Scalefree Newsletter

In a previous blog post, we discussed the different aspects of a naming standard documentation – from letter case types to the consideration between using prefixes or suffixes in database object names.

Throughout this article, we will continue presenting our suggestions for naming conventions in a data warehouse solution, as well as sharing examples for naming standards, which both our team and our customers utilize internally.

Layer schemas

For layer schema names, we prefer using prefixes.
As discussed in the previous blog post, this convention boosts visibility in data exploration within the Enterprise Data Warehouse for developers and business users by grouping schemas of the same data warehouse layer together.
The following is a list of common Enterprise Data Warehouse layers and our associated recommendations regarding naming conventions:

Figure 1: Recommended Naming Conventions for Common Enterprise Data Warehouse Layers

EDW/Data vault entities

As discussed in the previous section, we do not separate the Raw Data Vault and the Business Vault into different database schemas. Instead, users will be able to distinguish between a Raw Vault entity and a computed one via the entity name.

The following is a list of naming conventions for standard Data Vault entities in addition to more advanced entity types.

**Note: PIT and bridge tables are only available in Business Vault

Figure 2: Naming Conventions for Standard Data Vault Entities

More about naming conventions for satellite entities

Satellite splitting criteria play a vital role in the naming conventions for satellite structures. In fact, it is not recommended that all descriptive data of a business object be stored in a single satellite structure – instead, raw data should preferably be split by certain criteria. ( Linstedt, D., & Olschimke, M. (2016). Building a scalable Data Warehouse with Data Vault 2.0. P114-115)

 

At Scalefree, we have defined three types of satellite splits: 

  • split by source system 
  • technical split by rate of change, security level and privacy level
  • and business-driven split

A satellite split by source system and the technical split by rate of change of data are common, recommended practices when it comes to splitting descriptive attributes. However, we have decided to split raw data even further, both technically and by business meaning.
As such, our full naming convention for satellite entities is as follows:

 

[parent-obj]_[biz-split]_[src]_[tech-split]_[satellite-type-suffix]

 

With:

[parent-obj] = the name of the satellite parent’s business object

[biz-split] = the business classification, 

[src] = an abbreviation for the source system and 

[tech-split] = a combination of technical satellite splitting criteria. 

 

Please note that the split denoted by different security levels classifies data into security groups. Thus, end users are given access to only certain groups of tables corresponding to their clearance level.
As part of our process, the security levels range from:

  • the lowest confidentiality level – level 0, 1: no security measure required, for public data,
  • to limited access to certain internal parties – level A, R, C, F.
  • to the highest confidentiality level – level S: top secret.

 

It is important to take note that the final technical satellite splitting criterion within Scalefree’s SOP is defined by the split in privacy level. This split is a necessary component of the process as it separates personal and non-personal information from each other.

 

Moving onwards, the business-driven satellite split distributes raw data into different satellite tables utilizing certain business meanings of data content.
We have defined several classifications for this purpose, to name a few: “contact” for contact data and “activity” for data that tracks the interactions users have made with the source record.
Additionally, data modelers can define custom business classifications for specified unique business meanings in business objects.
For example, all data attributes of an application installed on the CRM platform Salesforce are often stored within a single satellite structure. Thus, the business-driven satellite split aims to boost usability and accessibility of the EDW for end users. It does so by assisting users in finding the data they require within the DWH in a way that is both faster and easier by classifying data into groups that have actual meaning to business users.

 

Putting everything together, here’s an example of a satellite name in our internal EDW solution:

customer_contact_sfdc_lcp_s

The above is a satellite of business object Customer and holds customers’ contact information from the source system Salesforce. Thus, its content has a low rate of change, a security level of C and contains personal data.

 

Summary

In this article, we’ve presented our recommendations regarding naming conventions for different types of data warehouse objects and offered an in-depth look into our considerations for satellite naming.

 

Are you interested in other naming standard aspects? Let us know in the comments section!

– by Trung Ta

Get Updates and Support

Please send inquiries and feature requests to [email protected]

For Data Vault training and on-site training inquiries, please contact [email protected] or register at www.scalefree.com.

To support the creation of Visual Data Vault drawings in Microsoft Visio, a stencil is implemented that can be used to draw Data Vault models. The stencil is available at www.visualdatavault.com.

Newsletter

New Data Vault insights every month

Scalefree

Join the discussion 2 Comments

  • Arne Bleijs says:

    Good overview, especially with respect to reasons behind certain aspects of naming.

    One remark:
    At ChipSoft we use database schema’s for layers, because they are a security boundary.
    One can grant permissions on schema’s to users or roles, so that they apply to all objects in the schema.
    In database tools objects (tables) are also grouped by schema, so you have the logical grouping you wish.

    However, I’m also aware that some end user tools do not always show the schema part of an object name, so for end users this can be confusing. Is that the reason why you recommend using prefixes rather than database schemas?

    Regards,

    • Sandra Krenn Sandra Krenn says:

      Hi Arne,

      thank you for your comment and remark! The reason why we’d prefer using prefixes in database schema names, is to have them neatly listed together in most end user tools. As per your concern, there are indeed some tools that hide schema name from the object name – in that case, having a well-structured naming standard for database entities themselves (hubs, links, satellites, etc., as discussed in Section “EDW/Data Vault Entities” of the article), will still do the job just as well.

      I hope that answers your question and look forward to discussing more with you!

      Thank you kindly,
      Your Scalefree Team

Leave a Reply