What’s the right LinkedIn connection policy for me?

I have been using LinkedIn for more than 15 years, and in all this time I have stuck to the same policy: connect only to people I know well enough to answer work-relevant questions about them.

I’m confident that 90% of my contacts met this criterion at the time I connected, but as the years passed my recall of past interactions has in most cases blurred or faded away altogether. It is also true that most people have changed, usually for the better, since we worked or studied together.
The fact is that for most of my contacts I am no longer able to say with confidence whether they would, or would not, be a good fit for a certain role.

To stay coherent with my original mental model of when I should be connected with someone on LinkedIn, I should take my list of contacts and mercilessly slash at least half of them.
Unfortunately it isn’t that easy.
This approach looks wrong to me because my current inability to answer specific questions about my contacts doesn’t make our past interactions any less real: if I thought at the time that it was right to connect, I shouldn’t assume now that it no longer is.
It would also be unfair to my connections.

While I have pretty much made up my mind about my existing contacts, I’m quite undecided about new ones.
Should I maintain my approach, simply accepting that in a few years I will not be able to answer detailed questions?
Should I accept connections from people I’ve met in person and use LinkedIn as a modern version of the business card holder?
Should I differentiate the deeper interactions, and safeguard them against the effects of time, by using recommendations more than I do today?
Please share in the comments how you manage your network and why.

P.S. I know there are many who take an open-network approach, accepting anyone reaching out on LinkedIn, and I understand why it makes sense for them, but I already know it is not something I am comfortable with.

Being a tech-savvy parent can sometimes be very frustrating.

The COVID-19 pandemic has forced a great acceleration in the adoption of information technology by schools scrambling to keep students able to learn while at home.
My children’s private school took the path of cloud technology, a very reasonable one to scale capacity rapidly, and went for Google’s ecosystem: Google Drive and Google Docs/Sheets form the backbone of their solution.

All is good as long as you have reliable connectivity, which is the case in Singapore, and the students are able to self-manage their use of non-school-related Google services.
The moment the children need help resisting the temptation to burn tens of hours per week on YouTube, things become more complicated for parents who neither have a solid understanding of the technology nor can spend all their time watching over their children’s computer use.
Having a fairly solid IT background, I thought I was in a better position than the average parent and could easily implement an automated solution to the problem.

I have a firewall at home and I thought that I could simply block YouTube through it.
Unfortunately, blocking the YouTube-related domains (m.youtube.com, www.youtube-nocookie.com, www.youtube.com, youtube.googleapis.com, youtubei.googleapis.com) not only works perfectly to block YouTube, but also works perfectly to prevent the Google Drive website from loading.
A quick search confirmed it wasn’t something I did wrong on my side, but the way Google has set up their services.
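For anyone wanting to reproduce the block at the DNS level instead of on a dedicated firewall, here is a minimal sketch: a Python snippet generating hosts-file entries for the domains listed above (the 0.0.0.0 sinkhole address is a common convention, and the function name is mine). It runs into exactly the same limitation, because the googleapis.com endpoints are shared with Drive.

```python
# Sketch: generate /etc/hosts-style sinkhole entries for the YouTube-related
# domains listed above. Any DNS-level blocker (hosts file, firewall alias,
# DNS sinkhole) works the same way, and hits the same limitation: the
# googleapis.com endpoints are shared with Google Drive.
YOUTUBE_DOMAINS = [
    "m.youtube.com",
    "www.youtube-nocookie.com",
    "www.youtube.com",
    "youtube.googleapis.com",
    "youtubei.googleapis.com",
]

def hosts_entries(domains, sinkhole="0.0.0.0"):
    """Return hosts-file lines pointing each domain at the sinkhole address."""
    return [f"{sinkhole} {d}" for d in domains]

if __name__ == "__main__":
    print("\n".join(hosts_entries(YOUTUBE_DOMAINS)))
```

Appending the output to /etc/hosts (or loading the list into your firewall) blocks YouTube, and Google Drive along with it.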
But you can use the official app to make Google Drive look like a local disk and access the files while blocking YouTube, right?
I did so and felt quite proud of it, but only for a short while: just until I realized this is not enough to edit documents created with Google’s productivity suite.
The documents on the virtual disk actually contain only the metadata needed to open the remote documents in Google’s online suite, not the documents themselves: I needed YouTube access again.
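You can verify this yourself: on my machine the files the Drive client exposes for Google-format documents are tiny JSON stubs. A sketch like the following (assuming that stub format, which is what I observed; the file name in the comment is hypothetical) shows there is no document body to edit locally:

```python
import json

def read_gdoc_stub(path):
    """Read a .gdoc stub created by the Google Drive desktop client.

    Assumption (matching what I observed): the stub is a small JSON file
    holding a "url" and usually a "doc_id" pointing at the online editor.
    The actual document content is not in the file at all.
    """
    with open(path, encoding="utf-8") as f:
        stub = json.load(f)
    return stub.get("url"), stub.get("doc_id")

# Example (hypothetical file name):
# url, doc_id = read_gdoc_stub("Homework.gdoc")
```

Opening any of these stubs offline yields only a link, which is why the virtual disk becomes useless for Google-format documents once the Google domains are blocked.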

Another quick search provided me with a manual workaround for the documents created with Google’s tools, but I find it fairly impractical:
1) Download the document created with Google’s productivity suite locally through the web interface. This automatically triggers a conversion of the downloaded copy to a standard office format.
2) Upload the converted file back, after ensuring you didn’t configure the settings to automatically convert uploads to Google’s format.
3) Edit the standard-format document both online (with Google’s tools) and locally (with LibreOffice if you don’t have a license for M365).
Because my son already had >100 documents in Google’s format, and the first 2 steps require access to YouTube every time he forgets to perform them at school, I find this route quite inconvenient.
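Step 1 can at least be scripted per document. The sketch below only builds the direct-export URLs (the docs.google.com export endpoint converts on the fly to a standard format); actually downloading still requires an authenticated session, and the endpoint sits behind the same Google domains, so the YouTube block applies to it too. Treat it as an illustration of the idea, not a turnkey fix.

```python
# Map each Google document kind to a standard office export format.
# "document" and "spreadsheets" are the URL path segments Google uses
# for Docs and Sheets respectively.
EXPORT_FORMATS = {
    "document": "docx",      # Google Docs   -> Word format
    "spreadsheets": "xlsx",  # Google Sheets -> Excel format
}

def export_url(kind, doc_id):
    """Build the docs.google.com direct-export URL for one document.

    `doc_id` is the ID found in the document's sharing URL (or in the
    .gdoc stub on the virtual disk).
    """
    fmt = EXPORT_FORMATS[kind]
    return f"https://docs.google.com/{kind}/d/{doc_id}/export?format={fmt}"
```

Fed with the IDs of all >100 documents, a loop over these URLs would replace the manual downloads of step 1, but only from a network where the Google domains are reachable.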
For the time being I am giving up on finding a way to automatically block YouTube while having full access to Google Docs.

From the business point of view I can see why Google would rather not have parents blocking children’s access to YouTube; this is one way they make money with their “free” productivity tools.
I would also have understood the business decision to go with Google’s productivity suite had it been made by a public institution in a country with financial troubles, because it’s still much better than leaving the students unable to learn.
But when a school asking for annual tuition fees in the range of tens of thousands of dollars picks Google’s productivity suite over Microsoft 365, which is also cloud-based but works by default with industry-standard formats, it simply makes no sense from a parent’s point of view.
The school should have invested adequately in tools and personnel to facilitate the remote learning needs of their students at a level matching their premium tuition fees.

If someone knows a fully automatic way to block YouTube while working seamlessly with Google’s documents, please share it in the comments: I’m sure I will not be the only parent happy to know how to do it.

A personal experience about the role of software in sustainability

A bit less than two years ago, shortly after arriving in Singapore and with the prospect of a life of intense work-related traveling, I purchased a phone from a local telco company.
I picked a device in the premium segment, expecting that this way I would be covered with frequent security updates at least for the duration of my contract.
Phone life was pretty much uneventful, with me accepting updates as soon as they were offered and quickly forgetting the matter afterward.

Until a few weeks ago when, one sunny morning, I got an email stating my smartphone was no longer authorized to access corporate resources because it was on a security patch older than 3 months.
My first thought was that I had somehow missed the patch, but I checked and the phone said I was on the latest one. Which indeed was more than 3 months old, as per the email message.
As I added fruitless calls to fruitless calls, chats, and forum reading, I grew more and more unhappy; the only silver lining was that being unable to travel freely made the problem an annoyance rather than a disruptive event.

Since then I had the opportunity to think about two things:
1) how software maintenance plays a key role in sustainability and saving natural resources
2) how different markets are treated differently by the same vendor.

If Covid-19 hadn’t grounded pretty much everyone in Singapore, how would I have coped with the crippled communication functionality of the smartphone?
Using the smartphone as a hotspot for the computer is often inconvenient and sometimes not viable at all.
I’m pretty sure I would have ended up purchasing a new device in a rush, while the old one was working perfectly except for the missing security patch and the lack of an estimated release date to help me plan my next course of action.
The end result of that would have been an unhappy customer, wasted resources and no guarantee that the problem wouldn’t resurface again in a couple of years on the new device.

While trying to find how to solve my problem I discovered that my phone model in other countries, for example Italy, hadn’t moved from monthly to quarterly security patch releases.
The temptation to download a foreign firmware was strong, but the risk of bricking the phone or, even worse, ending up with a hacked firmware containing a backdoor was too high.

Now I wonder: maybe the market in Singapore is too small to keep investing in frequent patches, even for premium phones?
I’d rather not have to pick and choose between the risk of not getting security patches and the risk of not having a local warranty.

After a couple of weeks a patch was released in Singapore and corporate access was back as before, but could a regular traveler really afford the risk of being back in the same situation in 3 months’ time?
What the vendor is teaching me, and every one of their customers, is that cheaper, more frequent purchases are a better money allocation strategy to remain covered by security patches.
And if this is a bad strategy for our planet… that is not really their problem, just bad luck for the environment and the future generations.

Synology DSM 7.0 as a simple ad-blocker

I’m OK with the ad-supported business model, but when I find 10 (ten!) ads for the same computer vendor on one screen, I feel a strong urge to get some ad-freedom.
Often the ads are not static but dynamic, and this makes older computers obsolete faster than strictly needed. That motivates me even further.

A few years back I successfully used a nice little tool to convert my DS411Slim into an ad-blocker.
The process was simple and all was good, until my DS411Slim died for good.
After recovering all the data (with a fair amount of stress and work), I kept using the same solution on the Xpenology box, running on an HP MicroServer Gen8, that replaced it.

Fast forward a few years and I moved to Singapore without bringing the server over, but still needed a baseline level of ad-freedom.
For a while I used OPNsense (running on a Kodlix GK45) for this purpose, but after a few (well, several) months I grew tired of Suricata turning itself off seemingly at random (with few usable logs to troubleshoot and no improvement from the OPNsense patches), and of the DNS service sometimes doing just the same.
The OPNsense box got decommissioned (it is waiting to be repurposed) and a Unifi USG (I had it sitting around unused since in Italy I moved to a Sophos firewall for my dual-wan needs) took over the basic firewall duties leaving me with my ad-invasion problem again.

Last Sunday I thought it was a good idea to get rid of my ad problem by leveraging my recently purchased Synology DS120J in tandem with a used DS115J (yes, I did learn from the DS411Slim experience that a NAS is a single point of failure, even when it supports multiple disks).
Because I had been through the process before, I thought it was going to be a matter of a few minutes and a good way to relax a bit during my study sessions for the AWS Certified Solutions Architect – Professional exam.

After almost 30 years of dealing with computers, I should have known that “a few minutes” often becomes “a few hours”.
In this case the time-stretching happened because I’m running DSM 7.0 instead of DSM 6.x, as on my previous NAS, and in this major version Synology changed the way some of the built-in services are managed.
Because of this change the directory structure changed too, and the original script didn’t work out of the box anymore.
The former:
/etc/zone/data/
in DSM7 becomes:
/volume1/@appstore/DNSServer/named/etc/zone/data/
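The fix in my hack boils down to probing for the right directory, something along these lines (a sketch of the idea, not the actual script; the paths are the ones from my box, and yours may differ, e.g. a different volume number):

```python
import os

# Zone-data locations: DSM 7 moved DNS Server's files under the package
# directory, while DSM 6.x kept them under /etc. Paths as seen on my NAS.
ZONE_DATA_CANDIDATES = (
    "/volume1/@appstore/DNSServer/named/etc/zone/data/",  # DSM 7.0
    "/etc/zone/data/",                                    # DSM 6.x
)

def find_zone_data(candidates=ZONE_DATA_CANDIDATES):
    """Return the first existing zone-data directory, or None if none exists."""
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None
```

With the detected path in hand, the rest of the original script can run unchanged.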

After hammering here and there for a while, I got to a hack that seems to be working for me.
If someone wants to take a look at it, I attached it to the GitHub issue here.
The usual disclaimers apply: there is no guarantee, expressed or implied, that it will work for anyone else; it’s totally unrelated to my current or past jobs and employers; and if the script is used, I bear no responsibility for anything bad happening. You can read the full, wordy details at the bottom.

Because I passed my certification exam last Friday (Friday the 17th, what an auspicious day I picked!), I now feel much less guilty about having subtracted two hours from my study time, and I hope with this post I will save some time for others trying to follow the same approach.

https://marcoullasci.com/2019/08/18/disclaimer/

Replacing the word “source” with the word “product” is not enough to change the reality of your data.

I recently shared my perspective about the Data Mess vs. Data Mesh.
Shortly afterward I was chatting with my friend Luca, asking for his feedback, and his main comment was along the lines of “it’s all good, but it’s very hard to find an effective mechanism to reward and incentivize the change”.
In this post I’m sharing my view on this challenge.

Citing Zhamak Dehghani’s original work, I highlighted that “data as a product” is one of the pillars of the data mesh.
To reap the benefits promised by the data mesh, organizations then feel urged to morph their data sets into data products.

Unfortunately there are many definitions of product, and they are not equally useful for transforming the organization and creating a real, useful data mesh.

One definition of product is: “a thing that is the result of an action or process”.
This is the easiest definition to apply, and the most dangerous too.
It makes it possible to quickly and automatically label every existing data source as a “data product” without changing anything in the existing processes.
It is a sure guarantee that the data mess will remain in place for years to come, with data remaining a by-product of the business processes rather than a “real” product.
Just like today, but with the trendy label.

A definition of product much more useful to incentivize the organizational change is the following: “an article or substance that is manufactured or refined for sale”.
The key part is “for sale” because it implies the existence of a historically strong product improvement driver: money changing hands and increasing a producer’s wealth in the process.

Many organizations have created, or are in the process of creating, a “data marketplace” to facilitate data product exchanges.
Unfortunately, the lack of a general agreement about what a “data marketplace” should be can lead to the creation of something slightly, but significantly, different: a data catalog.
In recent implementations the catalog is hopefully paired with a set of tools to self-service data access and/or transfer (I’ll come to the subject of transfer vs. access in a future post).
This kind of data marketplace works nicely with the first (and lesser) definition of a data product, but does not fully support the value creation expected from adopting the second definition.

The data marketplace that supports continuously improving data products is something slightly, but significantly, different.
On top of the functional characteristics of the basic marketplace I listed earlier, it enables a low-friction exchange of (data) goods for a certain amount of an agreed currency.

The technical means to easily move money around are many, well known, and broadly available.
The tricky part is, once again, an organization and people problem: defining the “certain amount” of currency that should change hands.

Who sets the price of the data products and how?

Being fundamentally Austrian in my vision of the economy, my first answer was: the free market!
Unfortunately this is a bad approach for the data market, because the producers of raw data (I’ll tentatively blog about raw data and data by-products in the future) are in most cases natural monopolies: having only monopolists set the prices would immediately lead to a complete failure of the marketplace.
Centrally regulated prices appear to be the only option.

Historically, centralized economies have trailed free economies in terms of wealth generation, and this can be concerning: how could we prevent the same from happening in our data marketplace?
The special nature of digital goods compared to physical goods (produce once, sell many times) helps us a bit in this matter.
Setting a fixed price centrally will promote efficiency in data production at the assigned quality point (by making production more efficient, the producer increases his gain). It shouldn’t completely destroy the incentives to improve the products either, because the same price is paid by each consumer rather than by corporate through a budget directly allocated to the producer.
In this scenario the producer has an incentive to get more consumers to buy the data product, both by improving it (there is a quantifiable return on additional investments made in the data product) and by sharing ideas about new ways to create value from data.

Deciding the prices is, once again, a non-technical problem.
I would promote value-based prices over cost-based prices any day, because inefficient production processes lead to a higher price for the same value generated (I am being Austrian again here). But there is, again, a data-product-specific constraint: most organizations have a hard time materializing the expected return on their data investments.
I dare say that many can’t even quantify the obtained return at all, and this makes proper value-based central pricing of data assets close to impossible.
The only option left is to set the initial price of data products based on a linear combination of incurred and recurring production costs.
This unit price of the data product (subscription) for each consumer is calculated by dividing the (calculated) current cost by the number of current consumers, and budgets are aligned accordingly to the current consumers.
Organizations can (and should) apply a periodic price deflation factor to the initial prices to drive efficiency up and prevent omission bias and complacency on the producer side.
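To make the mechanism concrete, here is a sketch of the pricing rule described above. The function name and the 5% deflation rate are illustrative assumptions of mine, not taken from any real implementation:

```python
def unit_price(production_cost, consumers, deflation_rate=0.05, periods=0):
    """Cost-based subscription price per consumer.

    The initial price spreads the current production cost across the
    current consumers; each subsequent period applies the deflation
    factor to push the producer toward efficiency gains.
    """
    if consumers <= 0:
        raise ValueError("a data product needs at least one consumer")
    return (production_cost / consumers) * (1 - deflation_rate) ** periods
```

With a 120,000 production cost and 12 consumers, each pays 10,000 initially and slightly less every period; if a new consumer joins, the per-consumer price drops at the next recalculation, which is exactly the producer’s incentive to attract more consumers.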

Will the Data Mesh save organizations from the Data Mess?

The “Data Mess” is almost as old as the installation of a second database within a single organization. Or maybe it’s even older, and paper-based.

Many companies, all over the world, have tried to solve the data mess problem for decades with varying degrees of success.
Which is a nice way to say: in many cases with limited or no success.
No matter the many technical silver bullets promised over the years, like the MPP databases earlier or the Hadoop-based data lakes later, the task of integrating data is still far from a trivial one.

About 6 months ago I had a chat with a friend and former Teradata colleague, and he told me he had to discuss the data mesh with the CIO of a large Italian company who was extremely excited about the subject.
Unsurprisingly, given the ripples that this post by Zhamak Dehghani made in the market, in the preceding weeks I had had several conversations about the data mesh with my teammates, and we are still debating the subject.

I’m writing today because I’m concerned that the data mesh is perceived, in almost all the conversations I have, as the (new and improved) silver bullet that will finally kill the data mess monster for good.
I think this might be the case. But only as long as the data mesh is not reduced to the technology/architecture part of the solution.

The “data mess” is generated by a combination of shortcomings in 3 key areas:
1) people
2) processes
3) technologies

The data mesh discussions I’ve had so far focus mostly, if not exclusively, on the technical solutions, with an unexpressed assumption (or hope) that removing the technical obstacles will be enough to magically fix the people and process shortcomings as well.
I guess it might be because a lot of people in IT are more comfortable dealing with technologies than with processes and other people.
Or maybe I am just perceived as too much of a geek for my counterparts to discuss the non-technical aspects of the data mesh with me.

Frankly, I hope it’s the latter scenario and that the people and processes pillars are being addressed in other streams I’m not part of.
I say this because my experience in the software quality space taught me that technologies can facilitate processes, but don’t change them (with a few notable exceptions, such as when packaged ERPs replaced custom solutions ahead of Y2K and many organizations in a hurry just had to adapt to the processes supported by the ERP they picked).
I also learned that people with enough motivation to do so can ignore, or even hijack, the best processes.

Both the first and the second post by Zhamak Dehghani touch on the process aspects multiple times.
Are processes prominently missing only from the conversations I am having and hearing about, or is it a common pattern?

I tend to think that the people pillar (a.k.a. the incentives to embrace a new way of doing things) is still not sorted out, or maybe is even perceived as too hard to approach, in many organizations, and for this reason it is simply removed from the debate.

I believe that solving the people part of the problem is strongly tied to a real transformation of data into a product, rather than leaving it as a dump of JSON by-products of the organization’s processes that the potential consumer has to figure out how to use.
What incentive is given to the marketing team (or the e-commerce one, or customer service, or the production lines…) to invest part of their limited budget to produce high-quality, easy-to-use data available in the mesh and, maybe, also increase the data’s value over time?
No ROI, no party.

In the end my answer to the question I asked in the title is:
“Building a data mesh infrastructure without creating effective processes (and the right incentives for individuals and organizations to embrace the new processes) is not going to remove the data mess from the map.”

What is a performance benchmark?

In my experience, when adopting a buyer’s perspective, I have seen 3 main classes of performance benchmarks:

  1. Check-mark benchmarks
  2. Decision confirmation benchmarks
  3. Decision driving (risk-reducing) benchmarks

 

Check-mark benchmarks are usually driven by internal processes tied to best practices associated with quality processes or required by laws and regulations.

In most cases the benchmarks falling into this class are perceived as a pure cost that needs to be minimized: an industry standard benchmark is usually the cheapest answer to the need for a benchmark result.

The Wikipedia article on the general performance benchmarking subject adopts a perspective that matches very well this type of benchmarks.

The key principle, among the 7 proposed Benchmarking Principles, that in my opinion positions the article as a description of “check-mark benchmarks” is the one called Representativeness: “Benchmark performance metrics should be broadly accepted by industry and academia”.

Several years ago Curt Monash wrote here that “The TPC-H benchmark is a blight upon the industry”.

Not only do I fully agree with him about TPC-H, but I would expand the statement further: as of today, all the industry standard benchmarks serve (some) vendors, but not the buyers.

I find the rest of the principles listed in the article sound and relevant for all the classes I’m describing in this post, but I use a slightly different definition of relevance (the test should measure metrics relevant to the business problem that needs to be solved by the technical solutions tested) and of transparency (not only should the metrics be easy to understand, but the test conditions, and how changing those conditions can influence the results, should be clear too).


Decision confirmation benchmarks are executed to demonstrate the correctness of a decision that has been already taken.

When running such a test there is a high risk of confirmation bias coming into play in the way the test is defined, with the tests favoring the technical solution that has already been selected.

Because the decision is already made, the benchmark is seen as a cost to minimize rather than an investment in this case too.


Risk-reducing benchmarks are executed, as the definition implies, to minimize the risks associated with the selection of a specific technical solution to address a set of business needs.

The costs associated with the selection of an incorrect or sub-optimal solution can be very significant for an enterprise, with the direct costs (incurred to implement the solution) usually being just a fraction of the total. The cost of lost opportunity is usually the largest part.

When looking at the performance benchmark from this perspective, the buyer sees the costs associated with its preparation and execution as an investment, as would be the case for an insurance policy.

Minimization of cost is no longer the main design criterion; it is replaced by the balance between the ability to predict the future behavior of the different technical solutions when implemented with the buyer’s specific processing pattern, and the cost of defining and running the test.


A single exercise might show characteristics of more than one of the classes, but in my experience the (mainly) risk-reducing performance benchmarks are a very small fraction.

What is your experience in this regard?

Performance benchmarks for the operational people

In August I promised (here) to post a target table of contents for this series of posts about performance benchmarks, and delivered the first part covering the subjects I believe are relevant to managers.

I label “operational people” any individual or organization that is involved hands-on in the definition, implementation, execution, and technical evaluation of a performance benchmark.

Below you will find the candidate list of subjects I believe are relevant to the operational people.


1) Knowing your starting state

  1.   The system(s) you are going to replace
  2.   The interaction between the systems
  3.   The workload on each system

2) Simulating your final state

  1.   Black box definition of the final state
  2.   Defining the workload that will hit the black box, based on the current state
  3.   Defining the workload generated by functional and/or organizational changes

3) Ensuring the technologies are set to perform the best way for the desired final state

  1.   Defining the metrics
  2.   Defining how the metrics are grouped
  3.   Sharing the metrics with the vendors/implementation teams

4)  Executing the tests and reviewing the results.


 

Performance benchmarks for the manager

I promised last week (here) to post a target table of contents for this series of posts about performance benchmarks.

It is quite a long list of topics, which I split into two main areas:

  • Topics that are relevant for everyone in the organization, labeled “for the manager”,
  • Topics that are of interest mostly to the technical people, labeled “for the operational people”.

Below you find the candidate subjects that I believe are of general interest.


1)   What is a performance benchmark

2)  Types of technical benchmark

  1.   The check mark benchmark
  2.   The confirmation benchmark
  3.   The industry standard benchmark
  4.   The valuable benchmark

3)  Organizational challenges to implement a valuable benchmark

  1.  The IT architecture perspective
  2.  The IT operations perspective
  3.  The CFO perspective
  4.  The CEO perspective

The next post will contain the list of technical subjects, while the following ones will start to dig into each subject in the two lists in an orderly fashion.

As I wrote earlier: your feedback will be key in shaping how this will move forward. Feel free to comment here or to reach me directly.

Performance benchmarks

It is time for me to give back.

Dealing with performance benchmarks has occupied a fair share of my life from my early days in the computer world in the mid ’80s.

In the beginning it was mostly reading, with just a bit of writing that today I would be ashamed of, in one of the early Italian BBS “newspapers”, called “Corriere Telematico”.

At the time I could never have imagined that benchmarks would play such a large role in my career that for about 8 years they even defined my job title.

Now, as my transition into a new role is almost complete, it feels like the right time to write something about benchmarks that can help many people in the industry.


I recall reading in one of the paper magazines of my early days something along the lines of “benchmarks don’t lie, but liars do use benchmarks”. I believe it was on MCmicrocomputer but I can’t bet on this.

This bleak statement about benchmarks was true 30+ years ago and it’s still true now, but we should not throw away the good together with the bad: proper benchmarks were, and still are, useful tools for individuals and organizations alike. It’s all about defining “proper” correctly in each context.

For a while, given the scarcity of published material on the subject, I was thinking of putting together a book, with the help of a friend of mine.

I fear I will not be able to put in all the time needed to complete it in a reasonable time frame and for this reason I decided to blog on the subject instead.

In the coming weeks (or months, I don’t know yet how this will work) I will share what I learned in many years as a source for anyone wanting to get closer to the holy grail of the “proper benchmark”.

I will be vendor and technology neutral, covering both the business and the technical sides.

Your feedback will be key in shaping how this will move forward.

In the next post I’ll share the target table of contents for this series.