
Following my recent presentation at the virtual Munich Data Modeling Meetup on optimizing a Data Vault architecture on Snowflake, both the session recording and the SQL scripts are now available.
Overview
Data Vault is a modern, agile approach to building an enterprise data warehouse and analytics repository. It provides auditable historical data that can be loaded at speed.
Snowflake is a fully managed service for data warehousing, data lakes, data engineering, data science, data application development, and secure data sharing and consumption.
My session followed a series of blog posts by Kent Graziano, Chief Technical Evangelist at Snowflake. Areas covered included parallel data loading of a Data Vault on Snowflake, separation of workloads, key management, and managing big data to support schema-on-read.
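A core part of that pattern is key management: hash keys computed from business keys (rather than sequences) let Hubs, Links, and Satellites load independently and in parallel, with no lookups between tables. As a rough illustration of the idea, not taken verbatim from the scripts (the table and column names here are invented):

```sql
-- Hub load using an MD5 hash of the business key as the surrogate key.
-- No sequence and no lookup against another table is needed, so Hubs,
-- Links, and Satellites can be loaded in parallel.
-- hub_customer / staging.customer / c_custkey are illustrative names.
INSERT INTO hub_customer (hub_customer_hk, customer_bk, load_dts, rec_src)
SELECT DISTINCT
    MD5(TRIM(TO_VARCHAR(src.c_custkey))) AS hub_customer_hk,  -- hash key
    src.c_custkey                        AS customer_bk,      -- business key
    CURRENT_TIMESTAMP()                  AS load_dts,
    'STAGING'                            AS rec_src
FROM staging.customer src
WHERE NOT EXISTS (
    SELECT 1
    FROM hub_customer tgt
    WHERE tgt.hub_customer_hk = MD5(TRIM(TO_VARCHAR(src.c_custkey)))
);
```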
Session recording
A recording of the session can be found here.
SQL Scripts on GitHub
The SQL scripts used during the presentation are now available on GitHub.
SQL Scripts summary
| Script name | Description |
| --- | --- |
| 01 – DDL.sql | The DDL used to create the required Data Vault tables etc. on Snowflake. |
| 05 – Multi table inserts.sql | Loads the Data Vault from Staging (a sample Snowflake database) using multi-table INSERT statements; uses OVERWRITE ALL for testing purposes (see the first sketch below). |
| 12 – Create Warehouses.sql | Creates three Snowflake virtual warehouses so that Hubs, Links, and Satellites can be loaded separately (see below). |
| 15 – MultipleWarehouses – 01 – Load Hubs.sql | Loads Hubs on a dedicated virtual warehouse for separation of workloads. |
| 15 – MultipleWarehouses – 02 – Load Sats.sql | Loads Satellites on a dedicated virtual warehouse. |
| 15 – MultipleWarehouses – 03 – Load Links.sql | Loads Links on a dedicated virtual warehouse. |
| 30 – VARIANT – Data Load.sql | Loads JSON into the Data Vault (see the VARIANT sketch below). |
| 40 – BusinessVault.sql | Parses JSON data and optimizes the Business Vault. |
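To give a feel for what the `05 – Multi table inserts.sql` pattern looks like, here is a minimal sketch of a Snowflake multi-table INSERT; the table and column names are illustrative, not the ones in the repository. One scan of the staging table feeds several Data Vault tables at once, and OVERWRITE truncates the targets first, which is convenient when re-running tests:

```sql
-- One pass over staging populates the Hub and its Satellite together.
-- OVERWRITE clears the target tables before loading (testing only).
-- All table and column names are illustrative.
INSERT OVERWRITE ALL
    INTO hub_customer (hub_customer_hk, customer_bk, load_dts, rec_src)
        VALUES (hub_hk, customer_bk, load_dts, rec_src)
    INTO sat_customer (hub_customer_hk, customer_name, customer_address, load_dts, rec_src)
        VALUES (hub_hk, customer_name, customer_address, load_dts, rec_src)
SELECT
    MD5(TRIM(TO_VARCHAR(c_custkey))) AS hub_hk,
    c_custkey                        AS customer_bk,
    c_name                           AS customer_name,
    c_address                        AS customer_address,
    CURRENT_TIMESTAMP()              AS load_dts,
    'STAGING'                        AS rec_src
FROM staging.customer;
```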
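The `12` and `15` scripts separate workloads by giving each entity type its own compute. A sketch of the idea, with invented warehouse names and sizes:

```sql
-- One virtual warehouse per entity type: the loads scale, suspend,
-- and are billed independently of each other and of query workloads.
CREATE WAREHOUSE IF NOT EXISTS wh_load_hubs
    WAREHOUSE_SIZE = XSMALL AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;
CREATE WAREHOUSE IF NOT EXISTS wh_load_links
    WAREHOUSE_SIZE = XSMALL AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;
CREATE WAREHOUSE IF NOT EXISTS wh_load_sats
    WAREHOUSE_SIZE = SMALL  AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;

-- Each load script then switches to its own warehouse before running:
USE WAREHOUSE wh_load_hubs;
-- ... Hub INSERT statements execute here on wh_load_hubs ...
```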
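Finally, the `30` and `40` scripts lean on Snowflake's VARIANT type for schema-on-read: raw JSON lands in a Satellite untouched, and the Business Vault parses it at query time. A hedged sketch, again with invented names:

```sql
-- A Satellite that stores the raw JSON document in a VARIANT column,
-- deferring schema decisions to read time (schema-on-read).
CREATE TABLE IF NOT EXISTS sat_customer_json (
    hub_customer_hk VARCHAR(32),
    load_dts        TIMESTAMP_NTZ,
    rec_src         VARCHAR,
    payload         VARIANT
);

-- Loading: PARSE_JSON turns the raw text into a VARIANT value.
INSERT INTO sat_customer_json
SELECT
    MD5(TRIM(TO_VARCHAR(customer_id))),
    CURRENT_TIMESTAMP(),
    'JSON_FEED',
    PARSE_JSON(json_text)
FROM staging.customer_json_raw;

-- In the Business Vault, path notation parses and types the JSON
-- at query time (first level with ':', nested elements with '.').
SELECT
    hub_customer_hk,
    payload:name::STRING         AS customer_name,
    payload:address.city::STRING AS city
FROM sat_customer_json;
```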