According to the recent messaging of many players in the data space, self-service analytics is the next-big-thing in the data space, the end-users want it, and the organizations should acquire as quickly as possible the technologies needed to deliver it.
I beg to differ on the starting assumption: no matter the definition of “self-service analytics” adopted, there are many, what the business users really want is access to the information contained in the organization’s data in a time-frame that makes said information usable to drive a business decision.
The definition of “self-service analytics” is not standardized and this can cause misunderstandings when discussing the subject.
The spectrum goes from a relatively conservative “self-service reporting capability”, where the data is integrated by the IT team (for example Gartner‘s and usually the BI tool vendors’), to a more modern “data-mesh-enabled self-service-everything-data” where simpler-to-use tools and standard interfaces are made available and allow the end-users to integrate the data as they deem appropriate without having to wait for the IT team to perform the task (TDWI‘s and Snowflake‘s definitions leans more in this direction).
Self-service analytics, in the modern definition, today might the best way to get to the desired timely insights from data given the current technological and organizational landscapes (i.e. without waiting a few months for the IT data integration process to complete).
“Best today” is not synonymous with “great” and I believe that a generalized shift of the end-users toward modern self-service analytics should not be advocated.
Organizations should keep investing toward a truly integrated data landscape.
The fact that data self-help can now be performed using big data technologies in a cloud environment doesn’t make the practice substantially different, in terms of risks, from the Excel-based “spread-marts” of the past. Even with the modern tools it is still possible to have different people label the same way data that is integrated, filtered, and aggregated in different ways at different point in time resulting in siloed, mismatching data marts.
The division of work and specialization is what enabled the standards of living to improve at an increasingly fast pace when humans started to leverage them at scale.
Mass production drove up the quality and volume of goods produced for the unit of time (and money) in exchange for a longer setup time of the production line and a reduced space for customization (“Any customer can have a car painted any colour that he wants so long as it is black” is a famous quote attributed to Henry Ford), but production techniques have evolved over time and, remaining in the car domain, today’s production give us a degree of flexibility in designing our vehicle Ford couldn’t even dream of.
This evolution didn’t completely destroy the demand for artisanal work and custom realizations, but made these realizations the exception rather than standard, and the prices for high-quality, personalized solutions are significantly higher.
When thinking about self-service analytics and data democratization organizations should be careful to properly model the associated sunken costs.
It is easy to quantify the cost of a centralized data modeling and engineering team, but is hard to model the costs of self-service analytics practices.
We should have learned from the experiments with the Hadoop-based data lakes that the apparently cheap schema-on-read approach resulted either in an increase of costs, as multiple people had to figure out the same data integration over and over, or in data swamps that no one wanted or could use.
There is no doubt that self-service analytics is more cost-effective with a proper data mesh (i.e. one where you have real data products and is based on a enterprise-wide domain driven design) than with the first-generation data lakes, but this doesn’t mean that the specialized, factory-like, creation of integrated data (products), that takes longer to build, should be abandoned.
While it’s great that today I could get the material needed to quickly replace the shattered glass of my table delivered at home with a few clicks, without having to lease a small truck and go to the shop like a decade ago, it doesn’t mean that now I should build my own kitchen.
Organizations should strive to have a faster, more flexible, mass production of high-quality insights and augment it, rather than replace it, with an efficient and effective delivery of raw data.
Just like I should buy my next kitchen here again rather than try to build it myself.