Optimizing Data Vault on Snowflake Cloud Data Platform
A Data Vault physical data model designed for the Snowflake Cloud Data Platform using SQLDBM

Following my recent presentation at the virtual Munich Data Modeling Meetup on optimizing a Data Vault architecture on Snowflake, the SQL scripts are now available alongside the session recording.

Overview

Data Vault is a modern, agile approach to building an enterprise data warehouse and analytics repository. It provides auditable historical data which can be loaded at speed.
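
To make the shape of the model concrete, here is a minimal sketch of a Hub and a Satellite in Snowflake SQL. The table and column names are my own illustration, not the ones used in the scripts below:

-- Hypothetical Hub: one row per distinct business key
CREATE TABLE hub_customer (
    hub_customer_hk  BINARY(20)    NOT NULL,  -- hash of the business key
    customer_id      VARCHAR       NOT NULL,  -- the business key itself
    load_dts         TIMESTAMP_NTZ NOT NULL,
    record_source    VARCHAR       NOT NULL,
    CONSTRAINT pk_hub_customer PRIMARY KEY (hub_customer_hk)  -- informational; Snowflake does not enforce it
);

-- Hypothetical Satellite: descriptive attributes tracked over time
CREATE TABLE sat_customer (
    hub_customer_hk  BINARY(20)    NOT NULL,
    load_dts         TIMESTAMP_NTZ NOT NULL,
    hash_diff        BINARY(20)    NOT NULL,  -- hash of the payload, for change detection
    customer_name    VARCHAR,
    record_source    VARCHAR       NOT NULL,
    CONSTRAINT pk_sat_customer PRIMARY KEY (hub_customer_hk, load_dts)
);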

Snowflake is a fully managed service for data warehousing, data lakes, data engineering, data science, data application development, and the secure sharing and consumption of data.

My session followed a series of blog posts by Kent Graziano, Chief Technical Evangelist at Snowflake. Areas covered included parallel data loading of a Data Vault on Snowflake, separation of workloads, key management, and managing big data to support schema-on-read.
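
On the key-management point: a common approach, and the one the hash columns above assume, is to derive each Hub key deterministically from the business key at load time, so Hubs, Links, and Satellites can be loaded in parallel without lookups against a central key store. A sketch, with illustrative names:

-- Derive a deterministic hash key from the business key at load time
SELECT
    SHA1_BINARY(UPPER(TRIM(customer_id))) AS hub_customer_hk,  -- same input always yields the same key
    customer_id,
    CURRENT_TIMESTAMP()                   AS load_dts,
    'STAGING'                             AS record_source
FROM staging.customer;

Because the key is computed from the data itself, every loader can generate it independently, which is what makes the parallel, separated loads below possible.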

Session recording

A recording of the session can be found here.

SQL Scripts on GitHub

The SQL scripts used during the presentation are now available on GitHub.

SQL Scripts summary

| Script name | Description |
| --- | --- |
| 01 – DDL.sql | The DDL used to create the required Data Vault tables on Snowflake. |
| 05 – Multi table inserts.sql | Loads the Data Vault from 'Staging' (a sample Snowflake database) using Multi Table Insert statements. Uses Overwrite All for testing purposes (sketched below). |
| 12 – Create Warehouses.sql | Creates three Snowflake Virtual Warehouses so that Hubs, Links, and Satellites can be loaded separately (sketched below). |
| 15 – MultipleWarehouses – 01 – Load Hubs.sql | Loads the Hubs on a dedicated Virtual Warehouse for separation of workload. |
| 15 – MultipleWarehouses – 02 – Load Sats.sql | Loads the Satellites on a dedicated Virtual Warehouse for separation of workload. |
| 15 – MultipleWarehouses – 03 – Load Links.sql | Loads the Links on a dedicated Virtual Warehouse for separation of workload. |
| 30 – VARIANT – Data Load.sql | Loads JSON into the Data Vault (sketched below). |
| 40 – BusinessVault.sql | Parses the JSON data and optimizes the Business Vault (sketched below). |
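
For script 05, the core idea is Snowflake's multi-table INSERT: one pass over the staging data feeds several Data Vault targets at once. A hedged sketch of the pattern, reusing the illustrative tables from the Overview (the real scripts load from the sample 'Staging' database and also handle deduplication and change detection):

INSERT OVERWRITE ALL  -- OVERWRITE truncates the targets first; used only to keep test runs repeatable
    INTO hub_customer (hub_customer_hk, customer_id, load_dts, record_source)
        VALUES (hub_customer_hk, customer_id, load_dts, record_source)
    INTO sat_customer (hub_customer_hk, load_dts, hash_diff, customer_name, record_source)
        VALUES (hub_customer_hk, load_dts, hash_diff, customer_name, record_source)
SELECT
    SHA1_BINARY(UPPER(TRIM(customer_id)))    AS hub_customer_hk,
    customer_id,
    CURRENT_TIMESTAMP()                      AS load_dts,
    'STAGING'                                AS record_source,
    SHA1_BINARY(COALESCE(customer_name, '')) AS hash_diff,
    customer_name
FROM staging.customer;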
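
Script 12's pattern is one Virtual Warehouse per entity type, so the Hub, Link, and Satellite loads run on isolated compute and can be sized independently. A sketch with assumed names and sizes:

CREATE WAREHOUSE IF NOT EXISTS load_hubs_wh
    WAREHOUSE_SIZE = 'XSMALL'
    AUTO_SUSPEND = 60         -- suspend after 60 seconds of inactivity
    AUTO_RESUME = TRUE
    INITIALLY_SUSPENDED = TRUE;

CREATE WAREHOUSE IF NOT EXISTS load_links_wh
    WAREHOUSE_SIZE = 'XSMALL'
    AUTO_SUSPEND = 60
    AUTO_RESUME = TRUE
    INITIALLY_SUSPENDED = TRUE;

CREATE WAREHOUSE IF NOT EXISTS load_sats_wh
    WAREHOUSE_SIZE = 'XSMALL'
    AUTO_SUSPEND = 60
    AUTO_RESUME = TRUE
    INITIALLY_SUSPENDED = TRUE;

-- Each loading session then selects its own compute:
USE WAREHOUSE load_hubs_wh;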
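
Scripts 30 and 40 rely on Snowflake's VARIANT type for schema-on-read: raw JSON lands untouched in a single column, and the Business Vault parses it with path notation and FLATTEN at query time. A sketch, assuming a simple JSON shape of my own invention:

-- Land the raw JSON document as-is (schema-on-read)
CREATE TABLE sat_customer_json (
    hub_customer_hk BINARY(20)    NOT NULL,
    load_dts        TIMESTAMP_NTZ NOT NULL,
    payload         VARIANT       NOT NULL
);

-- Parse on read in the Business Vault
SELECT
    s.hub_customer_hk,
    s.payload:name::VARCHAR  AS customer_name,
    o.value:order_id::NUMBER AS order_id
FROM sat_customer_json AS s,
     LATERAL FLATTEN(input => s.payload:orders) AS o;

A typical optimization at this stage, in the spirit of script 40, is to materialize frequently used JSON attributes as relational columns so downstream queries avoid repeated parsing.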