Tip – Holmesian deduction on the Snowflake Data Cloud

Tips for using the Snowflake Data Cloud, ACCESS_HISTORY view, and Holmesian deduction

Sherlock Holmes, the fictional private detective, uses a combination of observation, deduction, forensic science, and logical reasoning to investigate cases. He applies these facets to great effect in solving those cases. This process is known as “Holmesian deduction”.

We can use various facets of Data Governance as inputs into investigating a user’s data processing activity. As examples, we can use logging, metadata management, and security configuration.

The Snowflake Data Cloud provides and facilitates a variety of Data Governance aspects. Some examples include:

  • Login & Query History
  • Role-based Access Control (RBAC) where access privileges are assigned to roles, which are in turn assigned to users
  • Time Travel and Table Streams to record data changes

Another example – and a new feature – is Access History. This adds to Data Governance and the possibilities for “Holmesian deduction” on the Snowflake Data Cloud.

What is Access History?

Access History tells us which columns and associated objects that a user read. It also records when that read took place. Access History is available as part of the Account Usage suite of views, in a view called ACCOUNT_USAGE.ACCESS_HISTORY.

This means:

  • You may encounter a lead time between a data read taking place and a corresponding record being available in the ACCOUNT_USAGE.ACCESS_HISTORY view
    • In my tests the delay was about 10 seconds but this may vary and can be up to 3 hours
  • The data is available for 365 days
  • The records include data related to dropped objects
  • The data by default is available to the ACCOUNTADMIN Snowflake role only (other roles can be granted access)

What does Access History look like?

An example output from the Snowflake ACCESS_HISTORY view

The ACCESS_HISTORY view includes the following columns:

  • QUERY_ID – unique system ID
  • QUERY_START_TIME – statement start time (UTC time zone)
  • USER_NAME – The user who executed the query
  • DIRECT_OBJECTS_ACCESSED – The objects that the user’s query read e.g. view and corresponding columns
  • BASE_OBJECTS_ACCESSED – The base data objects e.g. if the user read from a view, the underlying tables would be the “base objects”
An example of the JSON contents of an OBJECTS_ACCESSED column in the ACCESS_HISTORY view. It lists the table accessed along with the columns accessed on that table
An example of the JSON contents of an OBJECTS_ACCESSED column in the ACCESS_HISTORY view. It lists the table accessed along with the columns accessed on that table

Given the variable nature of the last two columns, these columns are configured as JSON arrays.

We can get into the detail of which columns the user read from, whether ‘direct’, or ‘base’ by using the Snowflake FLATTEN command. FLATTEN allows us to parse the JSON into separate rows.

I have prepared a script which:

  • Allows us to view all base columns without needing to know how many columns were read from
  • Filters out the ACCESS_HISTORY view and underlying objects from the result set (if we SELECT from the ACCESS_HISTORY view, a record is also added to the ACCESS_HISTORY view)
  • Exclude rows where the JSON columnName element is empty.
SELECT H.QUERY_START_TIME
        ,H.USER_NAME
        ,B.value:"objectName"::STRING    AS Base_Object_Name            
        ,ColB.value:"columnName"::STRING AS Base_Column_Read_From                    
FROM ACCOUNT_USAGE.ACCESS_HISTORY H
    ,LATERAL FLATTEN(H.BASE_OBJECTS_ACCESSED) B 
    ,LATERAL FLATTEN(B.value, RECURSIVE=>TRUE) ColB     
WHERE NOT (B.value:objectName::string LIKE ANY ('SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY','%TABLE_ACCESS_LOGS%')) 
    AND ColB.value:"columnName"::STRING IS NOT NULL
ORDER BY H.QUERY_START_TIME DESC;
Parsed JSON outputs from the ACCESS_HISTORY view. The base object accessed is in the BASE_OBJECT_NAME column, and the columns on that table which were read from are in the BASE_COLUMN_ACCESSED column

We can use the same approach for “Direct” objects.

Would Sherlock Holmes use Snowflake ACCESS_HISTORY?

Firstly, I suspect that Holmes would note that the Snowflake ACCESS_HISTORY view is in Preview. In other words there may be a few kinks to iron out. Secondly, he would take into account that Snowflake Enterprise Edition or higher is required.

Thirdly, he would deduce that:

  • It’s an opportunity to fish out possibly unused data
  • It can help with impact assessments on planned database changes
  • It can play a key part in any Data Governance initiatives
  • It’s a Snowflake Account Usage view – while there may be a brief lead time with the evidence – it will be available for 365 days.

Fourthly and finally, the view shows which users are querying which columns, and when. As a key input into Holmesian deduction, this is ultimately why Sherlock Holmes would use the ACCESS_HISTORY view.

In fact, I suspect he would describe it as “Elementary!”.

About the Author: Dan Galavan is a Data Architecture consultant with over 20 years experience, a Snowflake Data SuperHero, and Snowflake SnowPro Core Certified. Along with consultancy services, he provides Snowflake Data Cloud training.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.