
How virtualisation works with TEAM

Roelant Vos edited this page Jan 29, 2019 · 1 revision

This page covers a core functionality of TEAM: how to automatically version the Data Warehouse data model in sync with the version of the ETL automation metadata.

Although versioning models and code is relevant (but rarely implemented) in traditional ETL, the requirement becomes very real when moving to a virtualised Data Warehouse / integrated model (Data Vault 2.0 in this case). This is covered in more detail in earlier posts, but suffice it to say that by removing a physical, or instantiated, core Data Warehouse layer from the design you need a way to preserve the audit trail and support any Data Mart and / or Business Intelligence components that draw information from this Data Warehouse layer. Simply put: your Data Warehouse can’t just have one implementation one day and a different one the next. Well, it actually can, if you incorporate synchronised version control.

In my mind this is very similar to how (canonical) message formats or (SOA) services are sometimes developed to be backwards compatible. In this world you need to allow for adapters (’users’) to change their configuration when a change in the message / service is deployed. For this purpose sometimes one or more older versions are supported for some time. This is more or less what I’ve been working on for ETL and Data Warehouse models and their Data / Information Marts.

In a practical sense: what I am looking for is a way to cycle through previous and current versions and see the model and metadata change (over time - no pun intended) while you look at your designs. A Data Warehouse time machine.
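
As a rough illustration of this "time machine" idea, the sketch below keeps two snapshots of a model and lists what changed between them. It is not how TEAM stores or compares versions; `ModelVersion`, `diff_versions` and the table and column names are all hypothetical, chosen only to make the concept concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVersion:
    """Hypothetical snapshot of a data model at one version."""
    version_id: int
    tables: dict  # table name -> tuple of column names


def diff_versions(old, new):
    """Return (change, table, column) tuples describing how `new` differs from `old`."""
    changes = []
    for table in sorted(set(old.tables) | set(new.tables)):
        old_cols = set(old.tables.get(table, ()))
        new_cols = set(new.tables.get(table, ()))
        for col in sorted(new_cols - old_cols):
            changes.append(("added", table, col))
        for col in sorted(old_cols - new_cols):
            changes.append(("removed", table, col))
    return changes


# Two illustrative versions of a small Data Vault style model (assumed names).
v1 = ModelVersion(1, {"HUB_CUSTOMER": ("CUSTOMER_HSH", "CUSTOMER_ID")})
v2 = ModelVersion(2, {
    "HUB_CUSTOMER": ("CUSTOMER_HSH", "CUSTOMER_ID", "RECORD_SOURCE"),
    "SAT_CUSTOMER": ("CUSTOMER_HSH", "NAME"),
})

for change in diff_versions(v1, v2):
    print(change)
```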

This itself also serves another purpose: decoupling ETL automation from its existing dependency on the implemented data model (the tables in the database). To fully support the above concepts, what is needed is a way to capture the model at a point in time. For the virtualisation development efforts this is really useful, as it allows you to generate ETL (views, insert statements, ETL outputs such as packages and mappings) without having a physical model in place. Until recently, even though the virtual (view-based) Data Warehouse was working fine, it still required a physical model to be present for automation.

This is now a thing of the past: the only database objects are views, and everything is virtual.
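
To make the idea of generating from a captured model more tangible, here is a minimal sketch that builds a view definition purely from a metadata snapshot, without querying any database catalog. The snapshot layout, the `generate_view` function and all object names are assumptions for illustration; TEAM's actual metadata format and generated SQL will differ.

```python
def generate_view(snapshot, target_table):
    """Build a CREATE VIEW statement from a captured model snapshot.

    Nothing here reads from a database: the table structure and the
    source-to-target mapping both come from the snapshot dictionary.
    """
    definition = snapshot["tables"][target_table]
    select_list = ",\n".join(
        f"    {source_expr} AS {target_col}"
        for target_col, source_expr in definition["columns"].items()
    )
    return (
        f"CREATE VIEW {target_table} AS\n"
        f"SELECT DISTINCT\n{select_list}\n"
        f"FROM {definition['source']};"
    )


# Hypothetical snapshot of model and mapping metadata at version 2.
snapshot_v2 = {
    "version": 2,
    "tables": {
        "HUB_CUSTOMER": {
            "source": "STG_CUSTOMER",
            # target column -> source expression (assumed names)
            "columns": {
                "CUSTOMER_ID": "CustomerID",
                "RECORD_SOURCE": "'CRM'",
            },
        },
    },
}

print(generate_view(snapshot_v2, "HUB_CUSTOMER"))
```

Because the generator only depends on the snapshot, pointing it at an older or newer version of the metadata would produce the views for that version of the model.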
