Data sharing capabilities, including cloud-based data marketplaces, are appearing at a growing rate to match data consumers with the right data suppliers. Data sharing has a long history in academic, research, and public policy circles. In more recent years it has made enormous inroads into private enterprises.
Approaches include direct data sharing between a publisher and a subscriber, and data exchanges such as privacy-safe clean rooms and data marketplaces. Data clean rooms in particular have been identified as a solution to recent marketing challenges.
Overall, businesses are increasingly seeking to augment or enrich internal datasets with external data. However, the road is paved with challenges, whether process-oriented, technical, or regulatory.
So where does one begin?
The Third Wave of Open Data
If we look at the evolution of data sharing from the perspective of the Open Data Policy Lab, we see three phases. The first wave is defined as data sharing driven by Freedom of Information. The second wave, primarily public-sector focused, saw data proactively shared with the goal of creating public value from previously siloed assets.
This brings us to the third wave. This encompasses publishing with purpose, fostering partnerships and data collaboration, advanced open data at the subnational level, and a “responsibility-by-design” approach to Open Data activities.
The technical capability (using Snowflake as a reference)
There are myriad technical considerations in the context of data sharing. Using the Snowflake Data Cloud as a reference is an insightful approach to identifying what needs to be considered. At a high level:
- Types of data sharing — direct data sharing, data exchanges, and data marketplaces
- Security — e.g. data masking, row-level security, access auditing, query auditing, and end-to-end encryption including Tri-Secret Secure (a combination of a Snowflake-maintained key and a customer-managed key in the cloud provider platform that hosts your Snowflake account, used to create a composite master key that protects your Snowflake data)
- Avoiding the need to engineer data pipelines to move data, a.k.a. ‘frictionless integration of data’
- Data classification via object tagging
- Data modeling — The shared data will typically have structure. Even where it doesn’t — e.g. semi-structured documents shared via Snowflake external tables — a logical structure is still needed to garner value from the data.
- Do we need to keep traffic within a Virtual Private Cloud, i.e. off the public internet?
- Scalability, whether scaling up, scaling out, or both
- The types of objects that can be shared including the role that secure views and secure functions can play
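To make the direct-sharing, security, and classification points above concrete, here is a minimal Snowflake SQL sketch. All object names, tag values, roles, and the `consumer_account` identifier are hypothetical placeholders, not part of any real deployment:

```sql
-- A secure view hides the underlying table definition and limits
-- the rows exposed to consumers (names below are illustrative).
CREATE SECURE VIEW sales_db.public.v_regional_sales AS
  SELECT region, product_id, sale_date, amount
  FROM sales_db.public.sales
  WHERE region = 'EMEA';

-- Data classification via object tagging on a sensitive column.
CREATE TAG IF NOT EXISTS governance.tags.pii_level;
ALTER TABLE sales_db.public.sales
  MODIFY COLUMN customer_email SET TAG governance.tags.pii_level = 'high';

-- Dynamic data masking: only a permitted role sees the raw value.
CREATE MASKING POLICY governance.policies.email_mask AS (val STRING)
  RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() IN ('ANALYST') THEN val ELSE '***MASKED***' END;
ALTER TABLE sales_db.public.sales
  MODIFY COLUMN customer_email
  SET MASKING POLICY governance.policies.email_mask;

-- Package the secure view into a share and grant it to a consumer
-- account -- no data pipeline, no data movement.
CREATE SHARE regional_sales_share;
GRANT USAGE ON DATABASE sales_db TO SHARE regional_sales_share;
GRANT USAGE ON SCHEMA sales_db.public TO SHARE regional_sales_share;
GRANT SELECT ON VIEW sales_db.public.v_regional_sales
  TO SHARE regional_sales_share;
ALTER SHARE regional_sales_share ADD ACCOUNTS = consumer_account;
```

The consumer account then creates a read-only database from the share and queries the secure view in place, which is what “frictionless integration of data” means in practice: the provider’s data is never copied or piped across.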
A new data regulation kid on the block?
When we discuss data sharing, regulatory compliance is often part of that discussion, particularly in the context of personal data. Along with regulations such as the GDPR and the CCPA (California Consumer Privacy Act), there will soon be a new data regulation kid on the block: the European Union’s Data Governance Act (DGA), which has a strong focus on data sharing.
Also, there may be licensing considerations to navigate.
As indicated above, there are a variety of considerations in the context of data sharing. These can be process-based, technical, or regulatory. More importantly, what about the data itself? Is the data available, up to date, complete, and, above all, trustworthy?
If the above challenges are addressed, is this enough to ensure data sharing success?
What does the future hold for data sharing capabilities such as clean rooms and data marketplaces?
And why are data clean rooms being promoted as a solution for the marketing industry?
To find out more, register for The Evolution of Data Sharing & Data Marketplaces, a free online event taking place at the Data Engineering and Data Architecture Group (DEDAG) on Tuesday 22nd February at 6pm GMT.
Copyright ©2022, Dan Galavan.