We can use various facets of Data Governance as inputs into investigating a user’s data processing activity. As examples, we can use logging, metadata management, and security configuration.
The Snowflake Data Cloud provides and facilitates a variety of Data Governance aspects. Some examples include:
- Login & Query History
- Role-based Access Control (RBAC) where access privileges are assigned to roles, which are in turn assigned to users
- Time Travel and Table Streams to record data changes
Another example – and a new feature – is Access History. This adds to Data Governance and the possibilities for “Holmesian deduction” on the Snowflake Data Cloud.
What is Access History?
Access History tells us which columns and associated objects that a user read. It also records when that read took place. Access History is available as part of the Account Usage suite of views, in a view called
- You may encounter a lead time between a data read taking place and a corresponding record being available in the ACCOUNT_USAGE.ACCESS_HISTORY view
- In my tests the delay was about 10 seconds but this may vary and can be up to 3 hours
- The data is available for 365 days
- The records include data related to dropped objects
- The data by default is available to the
ACCOUNTADMINSnowflake role only (other roles can be granted access)
What does Access History look like?
The ACCESS_HISTORY view includes the following columns:
- QUERY_ID – unique system ID
- QUERY_START_TIME – statement start time (UTC time zone)
- USER_NAME – The user who executed the query
- DIRECT_OBJECTS_ACCESSED – The objects that the user’s query read e.g. view and corresponding columns
- BASE_OBJECTS_ACCESSED – The base data objects e.g. if the user read from a view, the underlying tables would be the “base objects”
Given the variable nature of the last two columns, these columns are configured as JSON arrays.
We can get into the detail of which columns the user read from, whether ‘direct’, or ‘base’ by using the Snowflake
FLATTEN allows us to parse the JSON into separate rows.
I have prepared a script which:
- Allows us to view all base columns without needing to know how many columns were read from
- Filters out the
ACCESS_HISTORYview and underlying objects from the result set (if we
ACCESS_HISTORYview, a record is also added to the
- Exclude rows where the JSON columnName element is empty.
,B.value:"objectName"::STRING AS Base_Object_Name
,ColB.value:"columnName"::STRING AS Base_Column_Read_From
FROM ACCOUNT_USAGE.ACCESS_HISTORY H
,LATERAL FLATTEN(H.BASE_OBJECTS_ACCESSED) B
,LATERAL FLATTEN(B.value, RECURSIVE=>TRUE) ColB
WHERE NOT (B.value:objectName::string LIKE ANY ('SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY','%TABLE_ACCESS_LOGS%'))
AND ColB.value:"columnName"::STRING IS NOT NULL
ORDER BY H.QUERY_START_TIME DESC;
We can use the same approach for “Direct” objects.
Would Sherlock Holmes use Snowflake ACCESS_HISTORY?
Firstly, I suspect that Holmes would note that the Snowflake
ACCESS_HISTORY view is in Preview. In other words there may be a few kinks to iron out. Secondly, he would take into account that Snowflake Enterprise Edition or higher is required.
Thirdly, he would deduce that:
- It’s an opportunity to fish out possibly unused data
- It can help with impact assessments on planned database changes
- It can play a key part in any Data Governance initiatives
- It’s a Snowflake
Account Usageview – while there may be a brief lead time with the evidence – it will be available for 365 days.
Fourthly and finally, the view shows which users are querying which columns, and when. As a key input into Holmesian deduction, this is ultimately why Sherlock Holmes would use the ACCESS_HISTORY view.
In fact, I suspect he would describe it as “Elementary!”.
About the Author: Dan Galavan is a Data Architecture consultant with over 22 years experience, a Snowflake Data SuperHero, and Snowflake SnowPro Core Certified. Along with consultancy services, he provides Snowflake Data Cloud training.
Need Snowflake training? Then you’ve come to the right place.