
Yesterday I attended the Great Data Minds debate on the pros and cons of different data architectures. The panel:
- Hans Hultgren: CEO, Genesee Academy – defending Data Vault
- Scot Reagin: CEO, Sensible Data Integrations – defending Traditional
- Tyler Allbritton: Managing Director, AEGroup – defending the Flattening
- Mike Lampa: Advisor, Great Data Minds – Moderator
Mike set the scene by remarking on the parallels with the Kimball vs. Inmon debates of days gone by. In my own career, going back to my first data warehousing project in 2007, those debates have been a recurring theme.
I really liked Scot’s point that while horsepower can overcome a lot of deficiencies, it can’t solve operational challenges or the accumulation of technical debt. His practical points on solution modularity were also notable. As Scot explained, in order to support varied needs you have to be able to apply the right combination of components – models, MDM, auditing rules, etc. – rather than being constrained within a single rigid data pipeline. He also added that, irrespective of approach, the business will change its mind, and we need to be able to respond to that.
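To illustrate that modularity in a very rough way, here is a minimal Python sketch – the component names are my own hypothetical examples, not anything from the debate – of assembling only the components a particular solution needs, rather than hard-wiring one rigid pipeline:

```python
# A minimal sketch of the modularity point: compose only the components a given
# solution needs. The component names below are hypothetical illustrations.
from dataclasses import dataclass, field
from typing import Callable, Iterable, List

Record = dict  # a single source row, kept deliberately simple


@dataclass
class Pipeline:
    """Applies an arbitrary, per-solution combination of processing steps."""
    steps: List[Callable[[Record], Record]] = field(default_factory=list)

    def run(self, records: Iterable[Record]) -> List[Record]:
        processed = []
        for record in records:
            for step in self.steps:
                record = step(record)
            processed.append(record)
        return processed


# Independent, swappable components.
def standardise_customer_name(record: Record) -> Record:
    record["customer_name"] = record["customer_name"].strip().title()
    return record


def add_audit_columns(record: Record) -> Record:
    record["load_source"] = "crm_extract"  # a simple auditing rule
    return record


# One solution needs MDM-style standardisation plus auditing; another might not.
pipeline = Pipeline(steps=[standardise_customer_name, add_audit_columns])
print(pipeline.run([{"customer_name": "  acme corp "}]))
```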
For me, this is where traditional data modelling approaches can really struggle, particularly in data warehousing. In the time it takes to model a particular area, the business requirements will probably have evolved, and we as a data architecture community need to be able to react to that. Three to six months is simply too long for any client. Having worked on a number of enterprise data modelling projects over the years in the health product regulation, finance, and investment sectors, I have seen this first-hand. Given the rate at which clients’ requirements change, this approach simply lacks the agility required. As Hans stated, “we never have time to build a big traditional logical model. Models should be able to address problems from a logical perspective in a 2-6 week timeframe, end to end”.
In defense of the complete flattening of data, Tyler made some compelling points, such as denormalized data being more portable. Also, if you get people to explain their business problems in a discrete manner, the ‘flatten very quick’ methodology can align with that. He also highlighted the risks of this approach, how they can be mitigated through discipline and automation such as continuous integration, and the benefits of artefacts such as process and data documentation that indicate grain.
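To make the ‘flatten very quick’ idea a little more tangible, here is a small Python sketch – the data, names, and grain statement are invented purely for illustration – of denormalising related records into one wide, portable row, with the grain documented alongside it:

```python
# A hypothetical sketch of flattening: denormalise related records into wide
# rows, and keep a documentation artefact stating the grain of the result.
import json

GRAIN_DOC = "One row per order line, as at the time of extract."  # documentation artefact

customers = {101: {"customer_name": "Acme Corp", "country": "IE"}}
order_lines = [
    {"order_id": 5001, "customer_id": 101, "line_no": 1, "product": "Widget", "qty": 3},
    {"order_id": 5001, "customer_id": 101, "line_no": 2, "product": "Gadget", "qty": 1},
]


def flatten(lines, customer_lookup):
    """Join customer attributes onto each order line to produce flat, portable rows."""
    flat_rows = []
    for line in lines:
        row = dict(line)  # keep the order-line attributes
        row.update(customer_lookup[line["customer_id"]])  # denormalise customer attributes
        flat_rows.append(row)
    return flat_rows


print(GRAIN_DOC)
print(json.dumps(flatten(order_lines, customers), indent=2))
```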
As I have seen over the years, these documentation artefacts are often not forthcoming. As data architects we are like doctors: we need to understand the ailments, explore the underlying business environment conditions, and prescribe the right solution.
For me, Hans made the most important point of the talk: “Ensemble Logical Modelling works like a conformed dimension; decomposed table structures which operate under the same key, and which are then built incrementally without having all the answers upfront. Taking into account schema on read, the logical data is all that remains”.
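To make that concrete for myself, here is a rough Python sketch of the pattern as I understood it. The tables, columns, and dates are hypothetical, and this is in no way a full Data Vault 2.0 implementation – just decomposed structures hanging off one shared business key, built incrementally and assembled at read time:

```python
# A rough sketch of decomposed structures sharing one business key: a hub-style
# key table plus satellite-style context tables, added incrementally. All names
# and values below are hypothetical illustrations.
customer_hub = [
    {"customer_key": "CUST-001"},  # the shared business key, and nothing else
]

# Context arrives incrementally: each new source or subject area becomes another
# satellite-style table keyed on the same business key, without reworking what exists.
customer_name_sat = [
    {"customer_key": "CUST-001", "customer_name": "Acme Corp", "load_date": "2019-06-01"},
]
customer_address_sat = [
    {"customer_key": "CUST-001", "city": "Dublin", "load_date": "2019-07-15"},  # added later
]


def customer_view(key):
    """Assemble a logical view of a customer by joining on the shared key at read time."""
    view = {"customer_key": key}
    for table in (customer_name_sat, customer_address_sat):
        for row in table:
            if row["customer_key"] == key:
                view.update({k: v for k, v in row.items() if k != "load_date"})
    return view


# Build the read-time (schema-on-read style) view for every key in the hub.
for hub_row in customer_hub:
    print(customer_view(hub_row["customer_key"]))
```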
My top three takeaways from the debate:
- You can get any of the three approaches to work, but the need to understand the data is inescapable. As Hans mentioned, you must “connect with the business, know the data, and deliver it fast”
- It is essential to understand the merits of each approach. If you are considering one of these paths, it is important to be aware of any gaps between where you, or your organization, are today and the destination
- Whichever approach is taken, the need to manage expectations remains, and that cannot be done without addressing the previous points
A big thank you to Great Data Minds for hosting the event, and to the panelists for an engaging, insightful, and entertaining discussion. I’m looking forward to the next Great Data Minds event!
Dan Galavan is a Data Architect who has been delivering data solutions to clients for 21 years, and is a Certified Data Vault 2.0 practitioner.