Replacing the word “source” with the word “product” is not enough to change the reality of your data.

I recently shared my perspective about the Data Mess vs. Data Mesh.
Shortly afterward I asked my friend Luca for his feedback, and his main comment was along the lines of “it’s all good, but it’s very hard to find an effective mechanism to reward and incentivize the change”.
In this post I’m sharing my view on this challenge.

Citing Zhamak Dehghani’s original work, I highlighted that “Data as a product” is one of the pillars of the Data Mesh.
To reap the benefits promised by the data mesh, organizations feel urged to morph their data sets into data products.

Unfortunately, there are many definitions of product, and they are not equally useful for transforming the organization and creating a real, useful data mesh.

One definition of product is: “a thing that is the result of an action or process”.
This is the easiest definition to apply, and also the most dangerous.
It makes it possible to quickly and automatically label every existing data source as a “data product” without changing anything in the existing processes.
It all but guarantees that the data mess will remain in place for years to come, with data remaining a by-product of the business processes rather than a “real” product.
Just like today, but with the trendy label.

A definition of product that is much more useful for incentivizing organizational change is the following: “an article or substance that is manufactured or refined for sale”.
The key part is “for sale” because it implies the existence of a historically strong product improvement driver: money changing hands and increasing a producer’s wealth in the process.

Many organizations have created, or are in the process of creating, a “data marketplace” to facilitate data product exchanges.
Unfortunately, a lack of general agreement about what a “data marketplace” should be can lead to the creation of something slightly, but significantly, different: a data catalog.
In recent implementations, hopefully, the catalog is paired with a set of tools for self-service data access and/or transfer (I’ll come to the subject of transfer vs. access in a future post).
This kind of data marketplace works nicely with the first (and lesser) definition of a data product, but it does not fully support the value creation expected from adopting the second definition.

The data marketplace that supports continuously improving data products is something slightly, but significantly different.
On top of the functional characteristics of the basic marketplace I listed earlier, it enables a low-friction exchange of (data) goods for a certain amount of an agreed currency.

The technical means to easily move money around are many, well known, and broadly available.
The tricky part is, once again, an organization and people problem: defining the “certain amount” of currency that should change hands.

Who sets the price of the data products and how?

Being fundamentally Austrian in my view of economics, my first answer was: the free market!
Unfortunately, this is a bad approach for the data market because the producers of raw data (I’ll tentatively blog about raw data/data by-products in the future) are in most cases natural monopolies: having only monopolists setting the prices would immediately lead to a complete failure of the marketplace.
Centrally regulated prices appear to be the only option.

Historically, centralized economies have trailed free economies in terms of wealth generation, and this can be concerning: how could we prevent the same from happening in our data marketplace?
The special nature of digital goods compared to physical goods (produce once, sell many times) helps us a bit in this matter.
Setting a fixed price centrally will promote efficiency in data production at the assigned quality point (by making production more efficient, the producer increases their gain), but it shouldn’t completely destroy the incentive to improve the products, because the same price is paid by each consumer rather than by the corporation through a budget allocated directly to the producer.
In this scenario the producer has an incentive to get more consumers to buy the data product by improving it (there is a quantifiable return on additional investments made on the data product) and by sharing ideas about new ways to create value from data.
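
A small numeric sketch may help make this incentive concrete. It assumes, purely for illustration, a production cost that does not grow with the number of consumers (the “produce once, sell many times” property) and a centrally set price; the function name and all figures are hypothetical.

```python
def producer_margin(fixed_price, consumers, production_cost):
    """Revenue from all paying consumers minus the (assumed fixed) production cost."""
    return fixed_price * consumers - production_cost


# Same centrally set price and same production cost: only the consumer count changes.
for consumers in (4, 5, 8):
    margin = producer_margin(fixed_price=25_000, consumers=consumers,
                             production_cost=100_000)
    print(consumers, margin)
```

Under these assumptions each additional consumer adds the full price to the producer’s margin, and lowering the production cost at the same price has the same effect.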

Deciding the prices is, once again, a non-technical problem.
I would promote value-based prices over cost-based prices any day, because with cost-based pricing an inefficient production process leads to a higher price for the same value generated (I am being Austrian again here).
There is, however, a constraint specific to data products: most organizations have a hard time materializing the expected return on their data investment.
I dare say that many can’t quantify the obtained return at all, and this makes proper value-based central pricing of data assets close to impossible.
The only option left is to set the initial price of data products based on a linear combination of incurred and recurring production costs.
The unit price of the data product (subscription) for each consumer will be calculated by dividing the (calculated) current cost by the number of current consumers, and the budgets will be aligned with the current consumers accordingly.
Organizations can (and should) apply a periodic price deflation factor to the initial prices to drive efficiency up and prevent omission bias and complacency on the producer side.
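
The pricing rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weights `w_incurred` and `w_recurring`, the compounding form of the deflation, the `deflation_rate` value, and all function names and figures are hypothetical, since the rule itself only calls for a linear combination of costs and a periodic deflation factor.

```python
def initial_unit_price(incurred_cost, recurring_cost, consumers,
                       w_incurred=1.0, w_recurring=1.0):
    """Per-consumer subscription price from a linear combination of costs.

    The weights are assumptions; any linear combination fits the rule.
    """
    if consumers <= 0:
        raise ValueError("a data product needs at least one consumer to be priced")
    calculated_cost = w_incurred * incurred_cost + w_recurring * recurring_cost
    return calculated_cost / consumers


def deflated_price(initial_price, deflation_rate, periods):
    """Apply a periodic deflation factor (assumed to compound) to the initial price."""
    if not 0 <= deflation_rate < 1:
        raise ValueError("deflation_rate must be in [0, 1)")
    return initial_price * (1 - deflation_rate) ** periods


price = initial_unit_price(incurred_cost=60_000, recurring_cost=40_000, consumers=4)
print(price)  # 25000.0

# Price after two periods with a hypothetical 5% deflation per period.
print(deflated_price(price, deflation_rate=0.05, periods=2))
```

Because the deflated price falls each period while the budgets stay tied to consumers, a producer can only keep its margin by producing more efficiently or by attracting more consumers.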