There is one book I’ve read in the past months that keeps coming back to me. Currently available as a series of Medium posts by Simon Wardley, it already runs to a few hundred pages. Created as a means to map out the different parts of a company and to help develop a strategy, one of its central insights is that any piece of technology evolves over time from inception through refinement to commoditization.
The first stage might be to figure out how to create an electric light which does not break immediately. Once you have the general solution of a special kind of metal in a glass container filled with inert gas, you have to ask yourself how you can make it usable. What sockets to use. Where to put the switches, and so on. Once you have that figured out, you can think about how to build these bulbs at scale with low costs and high efficiency.
These different stages have their own distinct characteristics. The inception stage is very unpredictable: it might just not work, and you cannot really force a discovery. The refinement stage is more controlled, but may still require a bit of trial and error to get right. Commoditization, finally, is all about efficiency and optimization.
Wardley states that this pressure towards commoditization is almost a law of nature. Products invariably move along that path, although not all of them make it to the final stage.
One area I have been close to over the past few years is software tooling around machine learning. Back in the 2000s, there were some attempts at writing general-purpose machine learning libraries, but I’d say we were still in the research phase. Eventually, an ecosystem began to emerge around Python libraries like numpy, matplotlib, and scikit-learn.
In the Java world, Big Data systems like Hadoop and Spark were created, spawning a new industry. It was pretty clear that there was a lot of money to be made in a market so far dominated by database companies.
At conferences like O’Reilly’s Strata you could see how the products evolved from year to year, trying to find some way to sell these tools to data scientists. One of the challenges was that machine learning people originally worked with languages like R or MATLAB and were not really experienced in, or interested in, dealing with JVM-based tools, languages, and infrastructure. Eventually, companies like Databricks started to embrace Python and notebooks. Still, these were tools that felt more like database infrastructure. TensorFlow and PyTorch finally added real machine learning tools to the mix.
One phrase I have often heard in the past years is the “democratization of AI.” By the way, I am not sure this is the right metaphor, just like you probably wouldn’t say that cars democratized travel. Something is off, and it is not just marketing overselling the products.
Coming back to Wardley maps, I think the question is how commoditized these AI solutions really are. It is clear that companies like AWS only deal in commoditized products, but this area is still so new that we’re most likely still in the refinement phase. That means the products being pushed towards us still need a lot of work before we can buy them like light bulbs and screw them into our infrastructure. They are marketed to sound simple, but in reality each product is a very specific and intricate attempt at a solution we’re still trying to figure out.
The problem is that not everyone is able to tell the difference. More than once I’ve sat in meetings and had to explain why data scientists wouldn’t just be happy if they got a notebook server, or why not all problems would be solved if we allowed all teams to use AWS SageMaker.
I’d say that currently, it is not even clear exactly which problem is being solved by those tools. And there is also quite a range of complexity when it comes to putting machine learning into production. The easiest variant is probably taking a classification or regression model and then serving those outputs via a REST API, and this is also what most libraries that currently claim to solve all the problems are able to do.
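To make the easy variant concrete, here is a minimal sketch of serving a model’s scores over a REST API using only the Python standard library. The model is a stand-in (a hand-rolled logistic regression with made-up coefficients); in a real setup you would load a trained model from disk instead.

```python
# Minimal sketch of the "easy variant": a trained model behind a REST API.
# The weights below are hypothetical placeholders, not a real trained model.
import json
import math
from http.server import BaseHTTPRequestHandler, HTTPServer

WEIGHTS = [0.8, -0.4]  # hypothetical coefficients "learned" during training
BIAS = 0.1

def predict(features):
    """Return the positive-class probability for one feature vector."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expect a JSON body like {"features": [1.0, 2.0]}.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"score": predict(payload["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve predictions:
#   HTTPServer(("localhost", 8080), PredictHandler).serve_forever()
```

Frameworks like Flask or FastAPI make this nicer in practice, but the shape of the problem is the same: deserialize a request, call the model, serialize the score. That is the part the current crop of tools handles well.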
At the other end of the spectrum of ML solutions are complex systems that periodically retrain and update their models, involve multi-staged preprocessing and feature extraction pipelines, and require extensive monitoring in production, and where the machine learning model is only a part in a bigger system, for example, providing scores in an information retrieval application.
In many ways, what we have right now are incomplete products, each slightly different in the way they have attacked the problem. Some are quite complicated to use, and many simply provide incomplete solutions. I probably need to back up these claims with more examples, but one is the over-reliance on notebooks as the main programming interface, which in my view falls short when it comes to collaboration and turning solutions into production systems.
One interesting observation from Simon Wardley is that the biggest disruptions happen when products go from refinement to commodity. You would probably think that exciting new products are the place where disruption happens, but in terms of money to be made, this happens when you have a product and you can go into the mass market.
Another interesting claim is that this disruption rarely comes from the companies that dominated during the refinement phase. There are several reasons for this; for example, they have invested heavily in the technology they have built up and in their existing clients, so they cannot move fast enough into the next stage. One prime example is cloud computing: although companies like IBM had literally decades of experience with computing, it was AWS that disrupted the market.
We currently have many interesting products, but they haven’t really solved the full problem yet. And once we have, it’ll be interesting to see who will win the race.