Saturday, November 25, 2023

To Keep One’s Promises in a VUCA World


1. Introduction


This blog post is a return to what I used to do when I started my blogs, that is, sharing a few thoughts (“musings”) on a specific topic, as opposed to writing longer texts that are either full essays or book reviews. The topic of the day is the difficulty of keeping promises – such as meeting SLAs (Service Level Agreements, defined here as lead times) – in a VUCA world: volatile, uncertain, complex and ambiguous. This difficulty stems both from the complexity of the object that is being promised (that is, knowing beforehand how much effort, time and resources will be necessary) and from the complexity of the environment (all the concurrent pressures that make securing the necessary efforts and resources uncertain, volatile, and complex).

This topic stays top of mind for me, both professionally and personally, since we are constantly exposed to the difficulties that companies experience in keeping their promises to their customers. While preparing this blog post, I had a wealth of personal examples in mind. This week, I tried to get a birthday package delivered to my wife through one of the well-known flower delivery companies. Not only was the delivery late and half of it missing, but they simply lied (telling me that the delivery had occurred when it had not!) rather than owning their mistake. Two years ago, I ordered a large fridge online from one of the largest French appliance brands and started an amazing string of failed deliveries with a complete absence of warnings or ability to reach anyone (trying to solve a problem when the only channel is email answered within 24 hours is excruciating). Even one of the best-known furniture brands in the world failed me miserably when I moved to Clermont a couple of years ago, cancelling the sofa delivery on the very day I had driven from Paris, with zero warning. What I find striking as a customer is not simply that such companies are bad at keeping their promises, but that their reaction to the broken promise is appalling: they do not know what is happening, have no clue when the problem will be fixed, and usually keep making further promises that are no better respected. The episode of the “refrigerator delivery” is still vivid in my mind since the travel costs and lost time after three failed attempts ended up costing me more than the object itself, not to mention the anguish of waiting (of all kinds: waiting for a call, for an email, for a truck to show up, …).

Professionally, as a software and IS executive for the past 30 years, I have experienced first-hand how difficult it is to deliver information systems projects on time. Moving to agile methodologies has helped our profession recognize the VUCA difficulty, and reduce the number of promises that could not be kept. However, the complex interdependency of projects in a company still requires orchestration roadmaps, and project teams still need to agree on, and meet, synchronization milestones. Therefore, a key skill in the digital world is to balance the necessity of managing emergence (some capabilities are grown without precise control over lead time) with the control of continuous delivery.

The thesis that I will explore in this blog post is that, when both the environment and the nature of the object/product/service that is being promised become more complex (in the full VUCA sense), we need to make fewer such promises, make simpler-to-understand promises (which does not mean simpler-to-deliver) and, conversely, make promises that are strongly binding on our resources. The most important corollary of this list is that the first thing one needs to learn in a VUCA world is to say no. However, saying no all the time, or refusing to make any promises, is not a practical solution either, so the art of the VUCA promise is also a balancing act, grounded in humility and engagement.

This blog post is organized as follows. Section 2 looks at waiting queues, and how they are used everywhere as a demand management tool. We know this from the way our calls (or emails) are handled in call centers, but this is also true of an agile backlog, which is equally managed as a (sorted) queue. The queue, such as the line in front of a popular pastry shop, is the oldest form of self-stabilizing demand management (you decide how long you are ready to wait based on the value you expect to get), but not necessarily the most efficient one. Section 3 looks at alternatives for managing demand in a VUCA context, from resource management such as Kanban to adaptive prioritization policies. I will look at different situations, including the famous “Beer Game”, which shows that resilience under a VUCA load is poorly served by complexity. Section 4 takes a deeper dive into the topic of “service classes” and how to handle them using adaptive policies. I will briefly present the approach of DDMRP (Demand Driven Material Requirements Planning) as one of the leading approaches to “living with a dynamic adaptive supply chain”. There is no silver bullet, and the techniques to strengthen one’s promises should not be opposed but combined.


2. Waiting Queues as Demand Management

We have a shortage of ophthalmologists in France (among many other medical professions); consequently, it can take up to 6 months to get an appointment in many cities. It varies a lot from one place to another, depending on the acuteness of the shortage. The ordered queue of available appointments is used to regulate the flow of requests into a regular workstream. This mechanism is self-adjusting: the waiting list acts as a deterrent; depending on the urgency and the availability of other options, each future patient accepts or declines the proposed appointment time slots. This mechanism works well to spread the “load” when irregular bursts occur. On the other hand, it does a poor job in a situation of constant shortage, since everyone has to wait a long time to get an appointment (so patient satisfaction is low) while neither the workload nor the income of the practitioner is improved by such a practice.

I have always been fascinated by the use of queues as demand management devices; it is the topic of my early 2007 blog post. Queues are used everywhere – in front of shops, museums, movie theaters – because they are simple and fair (movie theaters moved to pre-reservation through digital apps a decade ago). Fairness is important here: a lot of sociology work exists about how people react to queues, depending on their nationality and culture. It comes from the transparency of the process: even if you do not like it, you can see how you are handled by the process. This is the famous FCFS (first come, first served) policy, which has been a constant topic of attention: it is fair, but it is efficient neither for the service provider nor for the customer. If you step back, the FCFS policy has two major shortcomings:

  • In most cases, including healthcare practitioners, there are different priorities, since the value associated with the upcoming scheduled appointment varies considerably. In the case of practitioners, priority comes either from the urgency of the situation or from a long-standing relationship with the patient.
  • In many cases, a customer who is refused an appointment within an “acceptable” time window may decide never to return (which is why small street businesses try never to turn down a customer). Whether this is important or not depends on the business context and the scarcity of the competing offer … obviously, this is not a big issue for ophthalmologists today. However, the problem is amplified by the variability of the demand. If the demand is stable, the long queue is a self-adjusting mechanism that shaves the excess demand. But in the case of high variability, “shaving the demand” creates oscillations, a problem that we shall meet again in the next section (due to the complex/non-linear nature of a service with a queue).
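The self-adjusting (and self-limiting) behavior of a visible queue can be sketched with a tiny simulation; the parameters and the exponential patience model are illustrative assumptions, not data:

```python
import random

def simulate_fcfs_balking(arrival_rate, service_time, patience_mean,
                          horizon=100.0, seed=1):
    """FCFS queue with a visible backlog: each arrival sees the expected
    wait and joins only if it stays below their (random) patience."""
    rng = random.Random(seed)
    backlog = 0.0                 # outstanding work, in time units
    served = balked = 0
    t = 0.0
    while True:
        gap = rng.expovariate(arrival_rate)
        t += gap
        if t > horizon:
            break
        backlog = max(0.0, backlog - gap)       # the server drains the queue
        patience = rng.expovariate(1.0 / patience_mean)
        if backlog <= patience:                 # acceptable wait: join
            backlog += service_time
            served += 1
        else:                                   # deterred by the queue: balk
            balked += 1
    return served, balked
```

With demand at twice the capacity (arrival rate 2, service time 1), the queue never explodes: the excess demand is simply shed by customers who walk away, which is exactly the “shaving” discussed above.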

Consequently, healthcare practitioners find ways to escape the FCFS policy:

  • Segmenting their agenda into different zones (which is the heart of yield management), that is, having time slots that are open to FCFS through, for instance, Doctolib (which supports a fair exposure of the agenda used as a sorted backlog), and other time slots which are reserved for emergencies (either from a medical emergency perspective or from a long-time customer relationship perspective).
  • Forcing the agenda allocation to escape FCFS, either by inserting patients “between appointments” or by shifting appointments (with or without notice). In all cases, this is a trade-off where the practitioner implicitly or explicitly breaks the original promise in order to maximize value.

The remainder of this post will discuss these two options: zoning (reserving resources for classes of service) and dynamic ordering policies. Both are a departure from fairness, but they aim to create value.

Unkept promises yield amplified demand; this is a well-known observation in many fields. In the world of call centers, when the waiting queue to reach an agent gets too long, people drop the call and come back later. When trying to forecast the volume of calls that need to be handled, the first task of the data scientist is to reconstruct the original demand from the observed number of calls. This is also true for medical practitioners: when the available appointments are too far in the future, people tend to keep looking for other options and drop their appointment later. This happens for restaurants and hotels as well, which is why you are now often charged a “reservation fee” to make sure that you are not hedging your bets with multiple reservations. This is a well-known adaptive behavior: if you expect to be under-served, you tend to amplify your demand. This is why the title of this blog post matters, “keeping your promises in a VUCA world”: if you don’t, not only do you degrade your customer satisfaction, but you promote an adaptive behavior that degrades even further your capacity to meet your promises in the future. This is not at all a theoretical concern; companies that let their customers place “future” orders know this firsthand, and it makes supply chain optimization even more complex. The additional complexity comes from the non-linear amplification: the more a customer is under-served, the more requests are “padded” with respect to the original need. If you manage to keep your promises, you avoid entering a phenomenon that is hard to analyze, since customers react differently to this “second-guessing game”.
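A toy model of this padding feedback loop (the padding factor and numbers are made-up assumptions) shows how a chronic shortage manufactures phantom demand:

```python
def padded_demand(true_need, capacity, periods=12, padding=0.5):
    """Each period, customers order their true need inflated in proportion
    to how much of last period's order went unserved (the padding factor
    is a made-up assumption)."""
    orders = []
    order = true_need
    for _ in range(periods):
        orders.append(order)
        delivered = min(order, capacity)
        shortfall = order - delivered
        order = true_need + padding * shortfall  # under-served -> pad next order
    return orders
```

With a true need of 100 and a capacity of 80, orders converge to (need − p·cap)/(1 − p) = 120: the supplier now sees 20% phantom demand on top of a real 25% shortage, and any forecast built on observed orders is biased.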

Trying to understand how your stakeholders will react to a complex situation and second-guess your behavior is a fascinating topic. In his great book, “The Social Atom”, Mark Buchanan recalls the social experiment of Richard Thaler, who proposed in 1997 a game to the readers of the Financial Times, which consists of guessing a number "that must be as close as possible to 2/3 of the average of the other players' entries." It is a wonderful illustration of bounded rationality and of second-guessing what others will do. If players are foolish, they answer 50 (the average between 0 and 100). If they think "one step ahead," they play 33. If they think hard, they answer 0 (the only fixed point of the thought process, or the unique solution to the equation X = 2/3 X). The verdict: the average entry was 18.9, and the winner chose 13. This is extremely interesting information when simulating actors or markets (it allows the calibration of a distribution between "fools" and "geniuses"). Thaler’s game gives insights into a very practical problem: how to leverage the collective intelligence of the salesforce to extract a forecast for next year’s sales when, at the same time, a part of variable compensation is linked to reaching next year's goal? This is an age-old problem, but it is made much more acute in a VUCA world. The more uncertain the market, the more it is in the sales agents’ best interest to protect their future gain by proposing a lower estimate than what they guess. This is precisely why game theory in general, and GTES in particular, is necessary to understand how multiple stakeholders react in a VUCA context.
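Thaler’s game is easy to reproduce with the classic “level-k” model of bounded rationality; the population mix in the usage note is a made-up assumption, precisely the kind of calibration mentioned above:

```python
def level_k_guess(k, anchor=50.0):
    """A 'level-k' player starts from the naive anchor (50, the mean of a
    uniform guess over 0..100) and applies the 2/3 reasoning step k times."""
    guess = anchor
    for _ in range(k):
        guess *= 2.0 / 3.0
    return guess

def population_average(shares):
    """shares maps a thinking depth k to the fraction of players at that
    depth; returns the average entry of the whole population."""
    return sum(frac * level_k_guess(k) for k, frac in shares.items())
```

A hypothetical mix such as {0: 0.2, 1: 0.3, 2: 0.3, 6: 0.2} yields an average well below 50 but far from 0, the same qualitative outcome as Thaler’s observed 18.9: most players think one or two steps ahead, few iterate to the fixed point.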

This question of meeting your promises in a VUCA context is critical to modern software development. It is actually a common issue of software development in general, but the VUCA context of digital/modern software products makes it more acute (I refer you to my last book where I discuss why the digital transformation context makes software development more volatile and uncertain, mostly because of the central role of the user, together with the fast rate of change pushed by exponential technologies). Agile methods, with their backlogs of user stories, offer a way to adapt continuously. Before each sprint starts, the prioritization of the backlog is re-evaluated dynamically according to the evolution of the context (customer feedback, technology evolution, competitors’ moves, etc.). However, as soon as we look at large-scale systems, some milestones have hard deadlines and should not be reshuffled dynamically. Not everything is “agile”: coordination with advertising campaigns, other partners, or legacy systems requires agreeing on large-scale synchronization (which is why large-scale agile methodologies such as SAFe introduced “PI planning” events). More than 15 years ago at Bouygues Telecom, when developing software for set-top boxes, we introduced “clock tags” on some user stories in the backlog. Later, at AXA’s digital agency, we talked about the “backmap”, the illegitimate daughter of the roadmap and the backlog. To make this coexistence of “hard-kept promises” with a “dynamically sorted backlog” work, two principles are critically important:

  • Do not put a clock stamp until feasibility has been proven. The only uncertainty that is acceptable in a “backmap” is about resources (how much will be necessary). This means that the design thinking phase that produces the user story card for the backlog must be augmented, when necessary, with a “technical POC”, that is, a proof of concept (there is an infinite number of reasons why feasibility may be questionable), which can be light or really complex, depending on the issue at stake.
  • Do not overuse this mechanism: 30% of capacity for timed milestones is a good ratio, 50% at worst. Agility with the rest of the backlog gives you the flexibility to adapt resources to meet milestones, but only up to a point. There is no “silver bullet” here: the ratio depends on (a) the level of uncertainty and (b) your actual, versus theoretical, agility.

An agile backlog is not a silver bullet for another reason: it takes discipline and craftsmanship to manage and sort the backlog, balancing short-term goals such as value and customer satisfaction against longer-term goals such as cleaning up the technical debt and growing the “situational potential” of the software product (its capability to produce value in the long-term future).
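As a sketch, a planning rule that combines clock tags with a value-sorted backlog might look as follows; the story fields and the 30% cap are illustrative assumptions:

```python
def plan_sprint(backlog, capacity, sprint_end, timed_ratio=0.3):
    """backlog: list of stories, each a dict with 'name', 'cost', 'value'
    and an optional 'deadline' (the clock tag). Clock-tagged stories due
    within the sprint are scheduled first, capped at timed_ratio of the
    capacity; the remaining capacity is filled by value density."""
    timed = sorted((s for s in backlog
                    if "deadline" in s and s["deadline"] <= sprint_end),
                   key=lambda s: s["deadline"])
    picked, used = [], 0.0
    timed_budget = timed_ratio * capacity
    for story in timed:                      # honor the hard promises first
        if used + story["cost"] <= timed_budget:
            picked.append(story)
            used += story["cost"]
    rest = sorted((s for s in backlog if s not in picked),
                  key=lambda s: s["value"] / s["cost"], reverse=True)
    for story in rest:                       # then fill by value, dynamically
        if used + story["cost"] <= capacity:
            picked.append(story)
            used += story["cost"]
    return picked
```

The point of the cap is exactly the second principle above: the dynamically sorted remainder is the slack that lets you absorb surprises on the timed items.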

3. Demand-Adaptive Priorities versus Kanban

More than 15 years ago, I worked on the topic of self-adaptive middleware, whose purpose is to implement “service classes” among business processes. In the context of EDA (event-driven architecture), the middleware routes asynchronous messages from one component to another to execute business processes. Each business process has a service level agreement, which describes, among other things, the expected lead time, that is, the time it takes to complete the business process from end to end. Since there are different kinds of business processes with different priorities, related to different value creation, the goal of an “adaptive middleware” is to prioritize flows to maximize value when the load is volatile and uncertain. This is a topic of interest because queuing theory, especially the theory of Jackson networks, tells us that a pipe or a network of queues is a variability amplifier. A modern distributed information system with microservices can easily be seen as a network of service nodes with handling queues. When observing a chain of queues submitted to a highly variable load, this variability can be amplified as it moves from one station to another, something that is a direct consequence of the Pollaczek-Khinchine formula. I got interested in this topic after observing a few crises, as a CIO, when the message routing infrastructure got massively congested.
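To make the variability point concrete, here is the Pollaczek-Khinchine mean waiting time for a single M/G/1 queue, computed directly from the formula (the numbers in the usage note are illustrative):

```python
def pk_mean_wait(arrival_rate, mean_service, service_variance):
    """Pollaczek-Khinchine mean waiting time for an M/G/1 queue:
    Wq = lambda * E[S^2] / (2 * (1 - rho)), with rho = lambda * E[S]."""
    rho = arrival_rate * mean_service
    if rho >= 1.0:
        return float("inf")                   # unstable queue
    second_moment = service_variance + mean_service ** 2   # E[S^2]
    return arrival_rate * second_moment / (2.0 * (1.0 - rho))
```

At utilization ρ = 0.9, a deterministic service (variance 0) waits 4.5 time units on average, while a service with variance 4 waits 22.5: same load, five times the wait. Chain several such stations and the departure stream of one becomes the (more variable) arrival stream of the next.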

I have written different articles on this work and I refer you to this 2007 presentation to find out more about policies and algorithms. Here I just want to point out three things:

  • I tried to adapt scheduling and resource reservation, since my background (at that time) was in scheduling and resource allocation optimization. It was quite tempting to import some of the smart OR (operations research) algorithms from my previous decade of work into the middleware. The short story is that it works in a volatile situation (stochastic optimization, when the load varies but the distributions are known) but it does not work in an uncertain world, when the distribution laws are unknown.
  • The best approach is actually simpler: it is based on routing policies. This is where the self-adaptive adjective comes from: declarative routing policies based on SLAs make no assumptions about the incoming distribution and prove to be resilient, whereas future resource allocation works beautifully when the future is known but fails to be resilient. Another interesting finding is that, in a time of crisis, LCFS (last come, first served) is better than FCFS (which is precisely the issue of the ophthalmologist whose entire set of customers is unhappy because of the delays).
  • The main learning is that tight systems are more resilient than loose ones. While playing with the IS/EDA simulator, I tried to compare two information systems. The first one uses asynchronous coupling and queues as a load absorber and has SLAs that are much larger than end-to-end lead times. It also has fewer computing resources allocated to the business processes, since the SLA supports a “little bit of waiting” in a processing queue. The second configuration (for the same business processes and the same components) is much tighter: SLAs are much shorter, and more resources are allocated, so the end-to-end lead time includes much less waiting time. It turns out that the second system is much more resilient! Its performance (SLA satisfaction) shows “graceful degradation” when the load grows unexpectedly (a large zone of linear behavior), while the first system is chaotic and shows exponential degradation. This should be no surprise for practitioners of lean management, since this is precisely a key lesson from lean (avoid buffers, streamline the flows, and work with just-in-time process management).
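A deliberately crude way to illustrate the tight-versus-loose finding is to use a single M/M/1 queue as a proxy for each configuration (the parameters below are illustrative, not taken from the IS/EDA simulator):

```python
def mm1_sojourn(lam, mu):
    """Mean end-to-end time (wait + service) in an M/M/1 queue."""
    return float("inf") if lam >= mu else 1.0 / (mu - lam)

def degradation(lam, mu, surge):
    """Ratio of the sojourn time after a load surge to the nominal one."""
    return mm1_sojourn(lam * surge, mu) / mm1_sojourn(lam, mu)
```

The "loose" system runs hot (λ = 0.9, μ = 1.0: long SLAs absorb the waiting) and explodes under a 20% surge, while the "tight" system (same λ, μ = 1.5: more resources, short SLAs) degrades by less than a factor of two: graceful versus chaotic degradation.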

The topic of understanding how a supply chain reacts when the input signals become volatile and uncertain is both very old and quite famous. It has led to the creation of “The Beer Game”, a great “serious game” that lets players experiment with a supply chain setting. Quoting GPT: “The Beer Game is an educational simulation that demonstrates supply chain dynamics. Originally developed at MIT in the 1960s, it's played by teams representing different stages of a beer supply chain: production, distribution, wholesale, and retail. The objective is to manage inventory and orders effectively across the supply chain. The game illustrates the challenges of supply chain management, such as delays, fluctuating demand, and the bullwhip effect, where small changes in consumer demand cause larger variations up the supply chain. It's a hands-on tool for understanding systems dynamics and supply chain management principles”. A lot has been written about this game, since it has been used to train students as well as executives for a long time. The game plays very differently according to the demand flow settings. It becomes really interesting when the volatility increases, since it almost always leads to shortages as well as overproduction. Here are the three main observations that I draw from this example (I was a happy participant during my MBA training 30+ years ago):

  • Humans are poor at managing delays. This is a much more general law that applies to many more situations, from how we react to a cold/hot shower (there is a science about the piping length between the shower head and the faucet) to how we react to global warming – I talk about this often when dealing with complex systems. In the context of “The Beer Game”, it means that players react too late and over-react, leading to amplifying oscillations, from starvation to over-production.
  • Humans usually have a hard time understanding the behavior of Jackson networks, that is, a graph (here a simple chain) of queues. This is the very same point as in the previous example. Because we do not understand, we react too late, and the oscillating cycle of over-correcting starts. This is what makes the game fun (for the observer) and memorable (for the participant).
  • Seeing the big picture matters: communication is critical to handle variability. One usually plays in two stages: first the teams (who each handle one station of the supply chain) are separated and work in front of a computer (seeing requests and sending orders). The second stage is when the teams are allowed to communicate. Communication helps dramatically, and the teams are able to stabilize the production flow according to the input demand variation.

Let me add that when I played the game, although I was quite fluent with Jackson networks and queueing theory at that time, and although I had read beforehand about the game, I was unable to help my team and we failed miserably to avoid oscillations, a lesson which I remember to this day.
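The oscillations can be reproduced with a toy bullwhip model (all parameters are hypothetical): each stage ships what is ordered, then reorders to cover demand plus a correction toward its target inventory, while a shipping delay hides the effect of its past orders:

```python
def bullwhip(consumer_demand, n_stages=4, target=20.0, delay=2, gain=0.5):
    """Toy bullwhip model: each stage ships whatever is ordered (inventory
    may go negative, i.e. backorders), then reorders to cover demand plus a
    correction toward its target inventory. The shipping delay hides the
    effect of past orders, which is what triggers over-correction.
    Upstream supply is assumed infinite; amplification appears anyway."""
    inventory = [target] * n_stages
    pipeline = [[0.0] * delay for _ in range(n_stages)]  # orders in transit
    peak_order = [0.0] * n_stages
    for demand in consumer_demand:
        incoming = demand
        for i in range(n_stages):
            inventory[i] += pipeline[i].pop(0) - incoming   # receive, ship
            order = max(0.0, incoming + gain * (target - inventory[i]))
            pipeline[i].append(order)
            peak_order[i] = max(peak_order[i], order)
            incoming = order   # this stage's order is the next stage's demand
    return peak_order
```

Even a perfectly constant consumer demand of 10 units produces peak orders well above 10 at the retailer and far larger ones upstream: the startup transient alone is enough to launch the amplifying oscillation that players experience.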


I will conclude this section with the digital twin paradox: these systems, a supply chain network or an information system delivering business processes, are easy to model and to simulate. Because of the systemic nature of the embedded queue network, simulations are very insightful. However, there are two major difficulties. First, the VUCA nature of the world makes it hard to characterize the incoming (demand) laws. It is not so hard to stress-test such a digital twin, but you must be aware of what you do not know (the “known unknowns”). Second, a model is a model, and the map is not the territory. This is the reason why techniques such as process mining, which reconstruct the actual processes from observed traces (logs), are so important. I refer you to the book “Process Mining in Action”, edited by Lars Reinkemeyer. Optimizing the policies to better serve a network of processes is of lesser use if the processes of the real world are too different from the ones in the digital twin. This is especially true for large organizations and processes such as Manufacturing or Order-to-Cash (see the examples from Siemens and BMW in the book). I am quoting here directly from the introduction: “Especially in logistics, Process Mining has been acknowledged for supporting a more efficient and sustainable economy. Logistics experts have been able to increase supply chain efficiency by reducing transportation routes, optimization of inventories, and use insights as a perfect decision base for transport modal changes”.


4. Service Classes and Policies

Before working on how to maintain SLAs within asynchronous distributed information systems, I had worked, over 20 years ago, on how to keep the target SLAs in a call center serving many types of customers with varying priorities and value creation opportunities. The goal was to implement different “service classes” associated with different SLAs (defined mostly as lead time, the sum of waiting time and handling time). I was collaborating with Contactual (a company sold to 8x8 in 2011) and my task was to design smart call routing algorithms (called SLR: Service Level Routing) that would optimize service class management in the context of call center routing. Service classes are a key characteristic of yield management or value pricing, that is, the ability to differentiate one’s promise according to the expected value of the interaction. This work led to a patent, “Declarative ACD routing with service level optimization”, which you may browse if you are curious; here I will just summarize what is relevant to the topic of this blog post. Service Level Routing (SLR) is a proposed solution for routing calls in a contact center according to Service Level Agreements (SLAs). It dynamically adjusts the group of agents available for a queue based on current SLA satisfaction, ranking agents from less to more flexible. However, while SLR is effective in meeting SLA constraints, it can lead to reduced throughput as it prioritizes SLA compliance over overall efficiency. Declarative control, where the routing algorithm is governed solely by SLAs, is an ideal approach. Reactive Stochastic Planning (RSP) is a method for this, using a planner to create and regularly update a schedule that incorporates both existing and forecasted calls. This schedule guides a best-fit algorithm aimed at fulfilling the forecast. However, RSP tends to plan for a worst-case scenario that seldom happens, leading to potential misallocation of resources.
The short summary of the SLR solution development is strikingly similar to the previous example of Section 3:

  • Being (at the time) a world-class specialist in stochastic scheduling and resource allocation, I spent quite some time designing algorithms that would pre-allocate resources (here, groups of agents with the required skills) to groups of future incoming calls. It worked beautifully when the incoming calls followed the expected distribution and failed otherwise.
  • I then looked at simpler “policy” (rule-based) algorithms and found that I could get performance results close to what I obtained when the call distribution was known beforehand (volatile but not uncertain), without making such a hypothesis (the algorithm continuously adapts to the incoming distribution and is, therefore, much more resilient).

  • The art of SLR is to balance between learning too much from the past and adapting quickly to a “new world”. I empirically rediscovered Taleb’s law: in the presence of true (VUCA) complexity, it is better to stick to simple formulas/methods/algorithms to avoid the “black swans” of the unintended consequences of complexity.
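To give the flavor of such a rule-based policy (this is a sketch, not the patented SLR algorithm), one can route by “SLA slack”: when an agent frees up, serve the queued call whose promise is most at risk rather than the one that arrived first:

```python
def slack(call, now):
    """Time left before this call breaks its service-class promise."""
    return call["sla"] - (now - call["arrived"])

def route_next(waiting, now):
    """Declarative, SLA-driven policy: serve the queued call whose promise
    is most at risk (minimum slack), instead of strict FCFS arrival order."""
    return min(waiting, key=lambda call: slack(call, now))
```

Note that the policy makes no assumption about the arrival distribution: it only reads the promises (the SLAs) and the clock, which is exactly why it stays resilient when the incoming distribution changes.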

Since then, I have experienced the same situation and conclusion in many different environments. I participated, a long time ago, in the introduction of yield management into the advertising grid of TF1, using stochastic algorithms to open and close tariff classes based on current reservations and expected future requests. Simplicity of the allocation schema is the only way to obtain resilience: as soon as one “overfits”, that is, creates too many classes, which renders forecasting brittle, the benefits of yield management become very uncertain. I have also witnessed how over-optimization of supply chain management with pre-reservation (slices of inventories allocated to service classes) also fails to show resilience to the current string of crises that is a signature of our VUCA world. For the exact same reasons as in the previous domains, pre-reservation works beautifully in the lab, when the load distribution follows past (observed) distributions. But in the real world, these reservations create shortages and oscillations, as noticed with the Beer Game. The state of the art today is to use a mix of kanban (reactive) and stochastic optimization called the Demand-Driven Supply Chain. If you want to dive into this topic, I recommend “Demand Driven Material Requirements Planning” by Carol Ptak and Chad Smith. The book starts with the observation that we made about our VUCA world: “All forecasts start out with some inherent level of inaccuracy. Any prediction about the future carries with it some margin of error. This is especially true in the more complex and volatile New Normal”. It warns us about “overfitting” (too detailed a forecast): “The more detailed or discrete the forecast is, the less accurate it is. There is definitely a disparity in the accuracy between an aggregate-level forecast (all products or parts), a category-level forecast (a subgroup of products or parts), and a SKU-level forecast (single product or part) … Today many forecasting experts admit that 70 to 75 percent accuracy is the benchmark for the SKU level”. Using these approximate forecasts to manage the supply chain produces a “bimodal distribution” of shortages and overproduction: “This bimodal distribution is rampant throughout industry. It can be very simply described as “too much of the wrong and too little of the right” at any point in time and “too much in total” over time. In the same survey noted earlier, taken between 2011 and 2014 by the Demand Driven Institute, 88 percent of companies reported that they experienced this bimodal inventory pattern. The sample set included over 500 organizations around the world”. Batching policies should be avoided since they amplify variability (cf. what we said about the Beer Game and Jackson networks): “The distortion to relevant information and material inherent in the bullwhip is amplified due to batching policies. Batching policies are determined outside of MRP and are typically formulated to produce better-unitized cost performance or are due to process restrictions or limitations”. DDMRP “combines some of the still relevant aspects of Material Requirements Planning (MRP) and Distribution Requirements Planning (DRP) with the pull and visibility emphases found in Lean and the Theory of Constraints and the variability reduction emphasis of Six Sigma”. Here are a few thoughts from the book that I want to underline because they match precisely what I have seen in the previously mentioned examples:

  • To reduce the variability, one needs to work on the flow (cf. the adaptive middleware example): “The need for flow is obvious in this framework since improved flow results from less variability”.
  • Frequent reaction yields better adaptation (the heart of agility): “This may seem counterintuitive for many planners and buyers, but the DDMRP approach forces as frequent ordering as possible for long lead time parts (until the minimum order quantity or an imposed order cycle becomes a constraining factor)”.
  • Beware of global planning methods that bring “nervousness” (“This constant set of corrections brings us to another inherent trait of MRP called nervousness”: small changes in the input demand producing large changes in the output plan) – I have told enough of my war stories in this post, but this is exactly the issue that I worked on more than 30 years ago, when scheduling fleets of repair trucks for the US telcos at Bellcore.
  • DDMRP is a hybrid approach that combines the lean tradition of demand-driven pull management with Kanban and the stochastic sizing of buffers (“The protection at the decoupling point is called a buffer. Buffers are the heart of a DDMRP system”): “What if both camps are right? What if in many environments today the traditional MRP approach is too complex, and the Lean approach is too simple?”
  • However, stochastic sizing also needs frequent updates to gain resilience and adaptability: “Yet we know that those assumptions are extremely short-lived, as conventional MRP is highly subject to nervousness (demand signal distortion and change) and supply continuity variability (delay accumulation)”.
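To give a flavor of the buffer logic, here is a drastically simplified sketch of DDMRP's red/yellow/green buffers and net flow equation; the factors are invented for illustration and do not reproduce the published sizing tables:

```python
def buffer_zones(adu, dlt, lt_factor=0.5, var_factor=0.5,
                 moq=0.0, order_cycle=0.0):
    """Simplified DDMRP-style buffer sizing (illustrative factors only).
    adu: average daily usage; dlt: decoupled lead time (days).
    Returns the (red, yellow, green) zone sizes."""
    yellow = adu * dlt                      # demand coverage over lead time
    red_base = yellow * lt_factor
    red = red_base * (1.0 + var_factor)     # safety, inflated for variability
    green = max(moq, adu * order_cycle, red_base)   # order-size zone
    return red, yellow, green

def replenishment_order(on_hand, on_order, qualified_demand, zones):
    """Order when the net flow position falls to the top of yellow or below,
    ordering back up to the top of green."""
    red, yellow, green = zones
    nfp = on_hand + on_order - qualified_demand     # net flow position
    top_of_yellow = red + yellow
    top_of_green = top_of_yellow + green
    return max(0.0, top_of_green - nfp) if nfp <= top_of_yellow else 0.0
```

The key property is the one underlined above: the decision reacts to the current net flow position (pull, updated at every check), while the zone sizes carry the stochastic part (lead time and variability factors), which can be resized as often as needed.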

The main idea from the book is that we need to distinguish between volatility and uncertainty, and the hard part is adapting to uncertainty. One may add that all stochastic optimization methods tend to suffer from a classical weakness: they make assumptions about the statistical independence of many input stochastic variables and fare poorly when these variables are bound to a common root cause that we usually call a crisis. This applies to the 2008 subprime crisis as well as to many other situations.

To end this blog post, I will return to the practical issue of filling one’s calendar in a VUCA world. This is an old favorite of mine, which I had already discussed in my second book (the English edition may be found here). The problem that I want to solve when managing my calendar is, without surprise, to maximize the expected value of what gets in while keeping the agility (the flexibility of non-assigned time slots) to also maximize the hypothetical value of high-priority opportunities that would come later. It is easy to adapt the two principles that we have discussed in this blog post:

  • Reservation is “zoning”, that is, defining temporal zones that you reserve for special types of activities. For many years, as a CIO, I reserved the late hours of the day (6pm-8pm) for crisis management. This supports a 24h SLA when a crisis occurs, and the capability to keep following the crisis with a daily frequency. Zoning works, but at the expense of flexibility when many service classes are introduced (overfitting of the model). It also comes at the expense of the global service level (as with any reservation policy). In addition to its capacity to optimize the value extracted from your agenda, zoning has the additional benefit of team/organization synchronization: if your zoning rules are shared, you gain a new level of efficiency (a topic not covered today but a key stake in my books and other blog posts).
  • Dynamic routing means assigning different SLAs according to the service class, that is, finding an appointment in the near future for high value/priority jobs but using a longer time horizon as the priority declines. The goal here is to avoid the terrible shortcoming of first-come, first-served, which inevitably leads to the “full agenda syndrome” (so easy when you are a CIO or an ophthalmologist). Dynamic routing is less effective than zoning, but it is adaptive, and works well (i.e., compared to doing nothing) even when the incoming rate of requests shifts to a completely different distribution.
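As an illustration, here is a minimal sketch of both principles applied to a calendar (my own toy code; the service classes, the reserved evening slots and the SLA horizons are arbitrary choices, not a prescription):

```python
from datetime import date, timedelta

# Hypothetical service classes: maximum lead time (in days) by priority.
SLA_DAYS = {"high": 2, "medium": 7, "low": 21}

class Calendar:
    def __init__(self):
        self.booked = {}                    # (day, hour) -> meeting label
        self.reserved = {"18:00", "19:00"}  # zoning: late hours kept for crises

    def book(self, label, priority, today):
        # Dynamic routing: the search horizon grows as the priority declines.
        for d in range(1, SLA_DAYS[priority] + 1):
            day = today + timedelta(days=d)
            for h in range(9, 20):
                slot = (day, f"{h:02d}:00")
                if slot[1] in self.reserved and priority != "high":
                    continue                # zoned slots only serve the crisis class
                if slot not in self.booked:
                    self.booked[slot] = label
                    return slot
        return None                         # "saying no": nothing within the SLA horizon
```

Zoning shows up as the `reserved` set (slots that ordinary requests skip), dynamic routing as the priority-dependent horizon; returning `None` is the “infinite lead time” way of saying no.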

As mentioned in the introduction, my current policy, after many years of tinkering, is to apply both. Reservation should be used sparingly (with the constant reminder to yourself that you do not know the future) because it reduces agility, but it is very effective. The more the agenda is overbooked, the more reservations are required. Dynamic routing, on the other hand, is what works best in uncertain times, when the overbooking is less acute. It blends fluidly with the art of saying no, since saying no is the special case of returning an infinite lead time for the next appointment. It is by no means a silver bullet, since we find here the weakness of all sorting methods (such as the agile backlog): assigning a future value to a meeting opportunity is an art.

5. Conclusion


I will conclude this blog post with three “laws” that one should keep in mind when managing promises, such as service level agreements, in a VUCA world:

  • Forecasting is difficult in a VUCA world, a little bit because of volatility and uncertainty (by definition) but mostly because of complexity and non-linearity between causal factors. We could call this the “Silberzann law”; it becomes more important the longer the time horizon you consider (as told in another blog post, the paradox of the modern world is both the increasing relevance of short-term forecasts, because of data and algorithms, and the increasing irrelevance of long-term forecasts).
  • Complex and uncertain situations are best managed with simple formulas and simple policies, which we could call the “Taleb Law”. This is both a consequence of the mathematics behind stochastic processes (cf. the Pollaczek–Khinchine formula) and of complex systems with feedback loops. Here, the rule is to be humble and beware of our own hubris as system designers.
  • When trying to secure resources to deliver promises made in a VUCA context, beware of hard resource reservation and favor adaptive policies, while keeping the previous law in mind. Aim for “graceful degradation” and keep Lean “system thinking” in mind when designing your supply / procurement / orchestration processes.

Wednesday, May 31, 2023

Adding Language Fluency and Knowledge Compression to the AI Toolbox


1. Introduction


The last six months have been very busy on the Artificial Intelligence front. The National Academy of Technologies of France (NATF) has just issued a short position paper discussing some aspects of LLMs (large language models) and conversational agents such as ChatGPT. Although much has happened very recently, reading the yearly report from Stanford, the 2023 AI Index Report, is a good way to reflect on the constant stream of AI innovations. While some fields show a stabilization of the achieved performance level, others, such as visual reasoning, have kept progressing over the last few years (now better than human performance on the VQA challenge). The report points out the rise of multi-modal reasoning systems. Another trend that has grown for some time is the use of AI to improve AI algorithms, such as PaLM being used by Google to improve itself. This report is also a great tool to evaluate the constant progress of the underlying technology (look at the page about GPUs, which have shown “constant exponential progress” over the past 20 years) or the relative position of countries on AI research and value generation.


It is really interesting to compare this “2023 AI Index Report” with the previous edition of 2022. In a previous post last year, the importance of transformers / attention-based training was already visible, but the yet-to-come explosive success of really large LLMs coupled with reinforcement learning was nowhere in sight. In a similar way, it is interesting to re-read the synthesis of “Architects of Intelligence – The Truth about AI from the People Building It”, since the bulk of this summary still stands but the warnings about the impossibility of forecasting have proven to be “right on target”.

Despite what the press coverage portrays, AI is neither a single technology nor a unique tool. It is a family of techniques, embedded in a toolbox of algorithms, components and solutions. When making the shortcut of talking about AI as a single technique, it is almost inevitable to get it wrong about what AI can or cannot do. In the 2017 NATF report (see figure below) we summarize the toolbox by separating along two axes: do you have a precise/closed question to solve (classification, optimization) or an open problem? Do you have lots of tagged data from previous problems or not? The point being that “one size does not fit all”: lots of techniques have different purposes. For instance, the most relevant techniques for forecasting are based on using very large volumes of past data and correlations. However, in the past few “extra VUCA years” of COVID, supply chain crises and wars, forecasting based on correlation has not done so well. On the other hand, causality AI and simulation have a lot to offer in this new world. As noticed by Judea Pearl or Yann LeCun, one of the most exciting frontiers for AI is world modeling and counterfactual reasoning (which is why “digital twins” are such an interesting concept). It is important to notice that exponential progress fueled by Moore’s Law happens everywhere. Clearly, on the map below, deep learning is the field that has seen the most spectacular progress in the past 20 years. However, what can be done with agent communities to simulate large cities, or what you can expect from the more classical statistical machine learning algorithms, has also changed a lot compared to what was feasible 10 years ago. The following figure is an updated version of the NATF figure, reflecting the arrival of “really large and uniquely capable” LLMs into the AI toolbox. The idea of LLMs, as recalled by Geoffrey Hinton, is pretty old, but many breakthroughs have occurred that make LLMs a central component of the 2023 AI toolbox.
As Satya Nadella said: “thanks to LLM, natural language becomes the natural interface to perform most sequences of tasks”.


Figure 1: The revised vision of the NATF 2017 Toolbox



This blog post is organized as follows. Section 2 looks at generative AI in general and LLMs in particular, as a major ongoing breakthrough. Considering what has happened in the past few months, this section is different from what I would have written in January, and will probably become obsolete soon. However, six months after the explosive introduction of ChatGPT, it makes sense to draw a few observations. Section 3 takes a fresh look at the “System of Systems” hypothesis, namely that we need to combine various forms of AI (components in the toolbox, assembled with “combining meta-heuristics”) to deliver truly intelligent/remarkable systems. Whenever a new breakthrough appears, it gets confused with “the AI technology”, the approach that will subsume all others and … will soon become AGI. Section 4 looks at the AI toolbox from an ecosystem perspective, trying to assess how to leverage the strengths of the outside world without losing your competitive advantage in the process. The world of AI is moving so fast that the principles of “Exponential Organizations” hold truer than ever: there are more smart people outside than inside your company, you cannot afford to build exponential tech (only) by yourself, you must organize to benefit from the constant flow of technology innovation, and so on. There are implicitly two tough questions to answer: (1) how do you organize yourself to benefit from the constant progress of the AI toolbox (and 2023 is clearly the perfect year to ask this question)? (2) how do you do this while keeping your IP and your proprietary knowledge about your processes, considering how good AI has become at reverse-engineering practices from data?


2. Large Languages Models and Generative AI


Although LLMs have been around for a while, four breakthroughs have happened recently, which (among other things) explain why the generative AI revolution is happening in 2023:

  1. The first breakthrough is the transformer neural network architecture, which started (as explained in last year’s post) with the famous 2017 article “Attention is all you need”. The breakthrough is simplicity: training an RNN (recurrent neural network) that operates on a sequence of inputs (speech, text, video) has been an active but difficult field for many years. The idea of “attention” is to encode/compress what an RNN must carry from the analysis of the past section of a sequence to interpret the next token. Here simplicity means scale: a transformer network can be grown to very large sizes because it is easier to train (more modular, in a way) than previous RNN architectures.
  2. The second breakthrough is the emergence of knowledge “compression” when the size grows over a few thresholds (over 5 and then 50 billion parameters). The NATF had the pleasure of interviewing Thomas Wolf, a leader of the team that developed BLOOM, and he told us about this “emergence”: you start to observe a behavior of a different nature when the size grows (and if this sounds vague, it is because it is precisely hard to characterize). Similar observations may be found while listening to Geoffrey Hinton or Sam Altman on the Lex Fridman podcast. The fun fact is that we still do not understand why this happens, but the emergence of this knowledge compression created the concept of prompt engineering, since the LLM is able to do much more than stochastic reconstruction. So, beware of anyone who would tell you that generative AI is nothing more than a “stochastic parrot” (it is too easy to belittle what you do not understand).
  3. As these LLMs are trained on very large corpora of generic texts, you would expect to have to retrain them on domain-specific data to get precise answers relevant to your field. The third breakthrough (still not really explained) is that some form of transfer learning occurs and that the LLM, using your domain-specific knowledge as its input, is able to combine its general learning with your domain into a relevant answer. This is especially spectacular when using a code generation tool such as GitHub Copilot. From experience, because my Visual Studio plugin uses the open files as the context, GitHub Copilot generates code that is amazingly customized to my style and my ongoing project. This also explains why the length of the context (32k tokens with GPT4 today) is such an important parameter. We should get ready for the 1M-token contexts that have already been discussed, which would support giving a full book as part of your prompt.
  4. The last breakthrough is the very fast improvement of RLHF (reinforcement learning with human feedback), which has itself been accelerated by the incredible success of ChatGPT’s adoption rate. As told in the MIT Technology Review article, the growth of the ChatGPT user base came as a surprise to the ChatGPT team itself. Transforming an LLM into a capable conversational agent is (still) not an easy task, and although powerful LLMs were already in use in many research labs as of two years ago, the major contribution of OpenAI is to have successfully curated a complete process (from fine-tuning and training the reward model through RLHF to guide the LLM to produce more relevant outputs, to inner prompt engineering, such as Chain-of-Thought prompting, which is very effective in GPT4).
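To make the first breakthrough more concrete, here is a toy, single-head version of the scaled dot-product attention at the core of the 2017 paper, written in pure Python on plain lists of vectors (a real transformer adds learned projection matrices, multiple heads, positional encodings and much more):

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention: each output is a softmax-weighted
    mix of the values, with weights given by query/key similarity."""
    d_k = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
        weights = [e / sum(exps) for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

The “compression” intuition is visible even in this toy: the output for each position is a context-dependent summary of all the other positions, with no recurrent state to carry along, which is what makes the architecture so easy to parallelize and scale.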


It is probably too early to stand back and see all that we can learn from the past few months. However, this is a great illustration of many digital innovation principles that have been illustrated in this blog over the past 10 years, such as the importance of engineering (thinking is great but doing is what matters), the emergence mindset (as Kevin Kelly said over 20 years ago, “intelligent systems are grown, not designed”) and the absolute necessity of an experimentation mindset (together with the means to execute, since, as was beautifully explained by Kai-Fu Lee in his book “AI Superpowers”, “size matters in AI development”). This is a key point, even though the nature of engineering is then to learn to scale down, that is, to reproduce with less effort what you found with a large-scale experiment. The massive development of open-source LLMs, thanks to the LLaMA code released by Meta and optimization techniques such as LoRA (Low-Rank Adaptation of LLMs), gives a perfect example of this trend. A lot of debate about this topic was generated by the article leaked by Luke Sernau from Google. Although the commoditization of LLMs is undeniable, and the performance gap between large and super-large LLMs is closing, there is still a size advantage for growing market-capable conversational agents. Also, as told by Sam Altman, there is a “secret sauce” for ChatGPT, which is “made of hundreds of things” … and lots of accumulated experience in RLHF tuning. If you still have doubts about what GPT4 can do, I strongly recommend listening to Sebastien Bubeck. From all the multiple “human intelligence assessments” performed with GPT4 during the past few weeks, it is clear (to me) that LLMs work beautifully even though we do not fully understand why.
As will become clear while reading the rest of the post, I do not completely agree with Luke Sernau’s article (size matters, there is a secret sauce), but I recognize two key ideas: dissemination will happen (it is more likely that we shall have many LLMs of various kinds than a few large general-purpose ones) and size is not all that matters. For instance, DeepMind with its Chinchilla LLM focuses on a “smaller LLM” (70B parameters) that may be trained on a much larger corpus. Smaller LLMs outperform larger ones in some contexts because they are easier to train, which is what happened when Meta’s LLaMA (open sourced) was compared with GPT3. Another trend that favors the “distributed/specialized” vision is the path that Google is taking, with multiple “flavors” of its LaMDA (LLM for Dialog Applications) and PaLM (Pathways LLM), the latter being specialized into Med-PaLM, Sec-PaLM and others.


For many reasons, labelling LLMs as “today’s artificial intelligence” is somehow misleading, but they certainly constitute a new and exciting form of “Artificial Knowledge”. There are many known limits to LLMs, which are somehow softened with RLHF and prompt engineering, but nevertheless strong enough to keep in mind at all times. First, hallucinations (when the LLM does not have the proper knowledge embedded in its original training set, it finds the most likely match, which is not only wrong but very often plausible, hence confusing) mean that you need to be in control of the output (unless you are looking for random output). Hallucinations are tricky because, by construction, when the knowledge does not exist in the training corpus, the LLM “invents” the most plausible completion, which is false but designed to look plausible. There are many interesting examples when you play with GPT to learn about laws, norms or regulations. It does very well at first on generic or common questions but starts inventing subsections or paragraphs of existing documents while quoting them with (what is perceived as) authority. The same thing happens with my own biography: it is a mix of real jobs, articles and references, intermixed with books that I did not write (with interesting titles, though) and positions that I did not have (but could have had, considering my background).


This is why the “copilot” metaphor is useful: generative AI agents such as ChatGPT are great tools as co-pilots, but you need to “be the pilot”, that is, be capable of checking the validity and the relevance of the output. ChatGPT has proven to be a great tool for creative sessions, but when some innovation occurs, the pilot (you) is doing the innovation, not the machine. As pointed out in the “Stanford AI report”, generative AI is not the proper technique for scheduling or planning, nor is it a forecasting or simulation tool (with the exception of providing a synthesis of available forecasts or simulations that have been published earlier). This is precisely the value of Figure 1: to remind oneself that specific problems are better solved by specific AI techniques. Despite the media hype about LLMs being the “democratization of AI available for all”, I find it easier to see these tools as “artificial knowledge” rather than “intelligence”. If you have played with asking GPT4 to solve simple math problems, you were probably impressed, but there is already more than LLMs at work: preprocessing through “chain of thoughts” prompt engineering adds an abstraction layer that is not a native feature of LLMs. We shall return to how generative AI will evolve through hybrid combination and API extensions in the next section. By using the “artificial knowledge” label, I see GPT4 as a huge body of compressed knowledge that may be queried with natural language.


In this blog post, I focus on the language capabilities brought by LLMs, but generative AI is a much broader discipline since GPT (generative pre-trained transformer) networks can operate on many other inputs. Also, there are many other techniques, such as stable diffusion for images, to develop “generative capabilities”. As pointed out in the AI report quoted in the introduction, multi-modal prompting is already here (as shown by GPT4). In a reciprocal manner, an LLM can transform words into words … but it can also transform words into many other things such as programming languages or scripts, or 3D models of objects (hence the Satya Nadella quote). Besides the use of knowledge assistants to retrieve information, it is likely that no-code/low-code platforms (such as Microsoft Power Apps) will be one of the key vectors of generative AI introduction into companies in the years to come. The toolbox metaphor becomes truly relevant to see the various “deep learning components” that transform “embeddings” (compressed input) as “Lego bricks” in a truly multi-modal playground (video or image to text, text to image/video/model, model/signals to text/image, etc.). Last, we have not seen the end of the applicability of the transformer architecture to other streams of input. Some of the complex adaptive process optimization problems of digital manufacturing (operating in an optimal state space from the input of a large set of IOT sensors) are prime candidates for replacing the “statistical machine learning” techniques (cf. Figure 1) with transformer deep neural nets.


3. Hybrid AI and Systems of Systems


A key observation from the past 15 years is that state-of-the-art intelligent systems are hybrid systems that combine many different techniques, whether they are elementary components or the assembly of components with meta-heuristics (GAN, reinforcement learning, Monte-Carlo Tree Search, evolutionary agent communities, to give a few examples). Todai Robot is a good example of the former, while DeepMind’s many successes are good examples of the latter (I refer you to the excellent Hannah Fry podcast on DeepMind, which I have often advertised in my blog posts). DeepMind is constantly updating its reinforcement learning knowledge in the form of a composable agent. The aforementioned report from NATF gives other examples, such as the use of an encoder/decoder architecture to recognize defects on manufactured products while training on a large volume of pieces without defects (a useful use case since pieces with defects are usually rare). The ability to combine, or to enrich, elementary AI techniques with others or with meta-heuristics is the reason for sticking with the “AI toolbox” metaphor. The subliminal message is not to specialize too much, but rather to develop skills with a larger set of techniques. The principle of “hybrid AI” generalizes to “systems of systems”, where multiple components collaborate, using different forms of AI. Until we find a truly generic technique, this approach is a “best-of-breed” system engineering method to pick from the toolbox the best that each technique can bring. As noted in the NATF report, “system of systems” engineering is also a way to design certifiable AI if the “black box” components are controlled with (provable) “white box” ones.


ChatGPT is a hybrid system in many ways. The most obvious way is the combination of an LLM and RLHF (reinforcement learning with human feedback). If you look closer at the multiple steps, many techniques are used to grow, through reinforcement learning, a reward model that is then used to shape the LLM’s output. In a nutshell, once a first step of fine-tuning is applied, reinforcement learning with human operators is used to grow a reward model (a meta-heuristic that the LLM-based chatbot can later use to select the most relevant answer). OpenAI has worked for quite some time on various reinforcement learning methods such as PPO (Proximal Policy Optimization), and the successive versions of RLHF have grown to be large, sophisticated and hybrid in their own way, taking advantage of the large training set brought by the massive adoption (cf. point #4 of Section 2). As explained by Sam Altman in his YouTube interview, there are many other optimizations that have been added to reach the performance level of GPT4, especially with Chain-of-Thought extensions. It is quite different to think about LLMs as a key component rather than as a new form of all-purpose AI. The title of this blog post is trying to make this point. First, let me emphasize that “English fluency” and “knowledge compression” are huge breakthroughs in terms of capabilities. We will most certainly see multiple impressive consequences in the years to come, beyond the marvels of what GPT4 can do today. Thinking in terms of toolbox and capabilities means that “English fluency” – that is, both the capacity to understand questions in their natural language format and the capacity to respond through well-formed and well-balanced English sentences – can be added to almost any computer tool (as we are about to see soon). English is just a language example here, although my own experience tells me that GPT is better with English than French.
However, when you consider the benefits of being able to query any application in natural language versus following the planned user interface, one can see how “English fluency” (show me your data, explain this output, justify your reasoning …) might become a user requirement for most applications of our information systems. On the other hand, recognizing that GPT is not a general-purpose AI engine (cf. the previous comment about planning, forecasting and the absence of a world model other than the compression of experiences embedded into the training set) has led OpenAI to move pretty fast on opening GPT4 as a component, which is a sure way to promote hybridization.


The opening of GPT4 with inbound and outbound APIs is happening fast considering the youth of the OpenAI software components. Inbound APIs help you use GPT4 as a component to give dialog capabilities to your own system. Think of it as “prompt engineering as code”, that is, using the expressive power of computer languages to probe GPT4 in the directions that suit your needs (and yes, that covers the “chain of thoughts” approach mentioned earlier, that is, instructing GPT4 to solve a problem step by step, under the supervision of another algorithm – yours – to implement another kind of knowledge processing). Outbound APIs mean letting ChatGPT call your own knowledge system to extend its reasoning capabilities or to get access to better forms of information. Here the best example to look at is the combination of Wolfram Alpha with GPT. Another interesting example is the interplay between GPT and knowledge graphs. If you remember the toolbox of Figure 1, there is an interesting hybrid combination to explore, that of semantic tools (such as ontologies and knowledge graphs) with LLM capabilities from tools such as ChatGPT. Thus, a reason for selecting this post’s title was to draw the reader’s attention to the fast-growing field of GPT APIs, versus thinking of GPT as a stand-alone conversational agent.
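The shape of this hybridization can be sketched in a few lines. In this toy loop, `llm_complete` is a faked stand-in for any chat-completion API call (it is not the OpenAI API, and both the JSON tool-request convention and the `calculator` tool are my own illustrative assumptions): the LLM delegates what it does poorly (here, arithmetic) to a tool, exactly the Wolfram Alpha pattern mentioned above.

```python
import json

def llm_complete(prompt):
    """Stand-in for a real chat-completion API call (any vendor).
    Faked here: when the question contains digits, the "LLM" requests
    a tool instead of guessing the arithmetic itself."""
    if any(c.isdigit() for c in prompt):
        return json.dumps({"tool": "calculator", "expression": "17 * 23"})
    return "I can answer that directly."

# Outbound side: the tools the agent is allowed to call back into.
TOOLS = {"calculator": lambda expr: eval(expr, {"__builtins__": {}})}

def hybrid_answer(question):
    reply = llm_complete(f"Answer, or request a tool as JSON: {question}")
    try:
        call = json.loads(reply)  # the LLM delegated the hard part to a tool
        return TOOLS[call["tool"]](call["expression"])
    except (json.JSONDecodeError, KeyError):
        return reply              # plain natural-language answer
```

The supervising loop (yours) stays in control of which tools exist and of the final answer, which is the “be the pilot” stance applied to system design.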


The idea that we need a system-of-systems approach to build “general purpose AI” (not AGI, “just” complex intelligent systems with a large range of adaptive behavior) is only a hypothesis, but one that seems to hold for the moment. I am reproducing below a figure that I have overused in my blog posts and talks, but that illustrates this point pretty well. The question is how to design a smart autonomous robot for a factory, one that is able to learn on its own but also able to learn as a community, from similar robots deployed in the factory or in similar factories. Community learning is something that Tesla cars are doing by sharing vehicle data so that experience grows much faster. Any smart robot would build on the huge neural-net AI progress made a decade ago on perception and recently (this post’s topic) on natural language interaction. On the other hand, a robot needs to have a world model, to generate autonomous goals (from larger goals, by adapting to context) and then to plan and schedule. The robot (crudely) depicted in the picture illustrates the combination of many forms of AI represented in Figure 1. Security in a factory is a must; hence the system of systems is the framework of choice to include “black box” components under the supervision of certifiable/explainable AI modules (from rules to statistical ML inference, there are many auditable techniques). Similarly, this figure illustrates the dual need for real-time “reflex” action and “long-term” learning (which can be distributed on the cloud, because latency and security requirements are less stringent).

Figure 2: Multiple AI bricks to build a smart robot community


4. AI Ecosystem Playbook



This last section is about “playing” the AI ecosystem, that is, taking advantage of the constant flow of innovation in AI technology while keeping one’s differentiating know-how safe. 2023, with the advent of the multiple versions and variants of GPT, makes this question very acute. On the one hand, you cannot afford to miss what OpenAI and Microsoft (and Google, and many others) are bringing to the world. On the other hand, these capabilities are proposed “as a service”, and require a flow of information from your company to someone else’s cloud. The smarter you get with your usage, the more your context/prompts grow, the more you tell about yourself. I also want to emphasize that this section deals with only one (very salient but limited) aspect of protecting the company when using outside AI tools. For instance, when using GPT4 or GitHub Copilot to generate code, the question of the IP status of the fragments “synthesized” from the “open source” training data is a tough one. Until we have (source) attribution as a new feature of generative AI, one has to be careful with the commercial use of “synthetic answers” (a large part of open-source code fragments requires the explicit mention of their provenance).


The following figure is a simplified abstraction of how we see the question of protecting our know-how at Michelin. It is based on recognizing three AI domains:

  • “Core” AI: when the algorithm reproduces a differentiating process of the company (this is obvious for manufacturing companies but is much more widely applicable). What defines “core AI” is that the flow of information (data or algorithm) can only be from outside to inside. In many cases, telling a partner (a research lab or a solution vendor) about your digital traces (from the machines, connected products or IOT-enriched processes) is enough to let others become experts in your own field with the benefit of your own experience that is embedded into your data. Deciding that a domain is “core” is likely to slow you down because it puts a large burden on playing “the ecosystem game”, but it is sometimes wiser to be late rather than be disrupted.
  • “Industry AI” is what you do together with your competitors, but is specific to your industry. This is where there is more to be gained by reusing solutions or techniques that have been developed by the outside ecosystem to solve problems that you share with others. Even though there are always aspects that are unique to each manufacturing, distribution or supply chain situation, the nature of the problems is common enough that “industry solutions” exist, and sharing your associated data is no longer a differentiation risk.
  • “Commodity AI” represents the solutions for problems that are shared across all industries, activities that are generic (for instance for “knowledge workers”) and offer similar optimization and automation opportunities across the globe. Because of economies of scale, “commodity AI” is by no means lower quality: quite the opposite, commodity AI is developed by very large players (such as the GAFAM) on very large sets of data and represents the state of the art of the methods in Figure 1.


Figure 3: AI Ecosystem – Trade-off between differentiation and leverage



This distinction is a little bit rough, but it yields a framework about what to use and what not to use from the outside AI ecosystem. As shown in Figure 3, core AI is where you make your own solutions. This does not mean that you do not need to learn about the state of the art, by reading the relevant literature or implementing some of the new algorithms, but the company is in charge of making its own domain-specific AI. It also requires extra care from a network isolation and cybersecurity perspective, because when your process/product know-how is embedded into a piece of AI software, the risk of both IP theft and very significant cyber-attacks grows. Industry AI is the realm of integration of “best of breed” solutions. The main task is to identify the best solutions (which requires a large exchange of data) and to integrate them, to build your own “systems of systems”. Customization to your needs often requires writing a little bit of code of your own, such as your own meta-heuristics, or your own data processing/filtering. These solutions also need to be protected from cybersecurity threats (for the same reasons: the more you digitize your manufacturing, the more exposed you are), but IP theft from data leaks is less of a problem (by construction). Industry AI is based on trust with your partners, so selecting them is critical. Commodity AI consists of solutions that already existed before you considered them; they are often proposed as a service, and it is wise to use them while recognizing that your level of control and protection is much lower. This is the current ChatGPT situation: you cannot afford to miss the opportunity, but you must remember that your prompt data goes to enrich the cloud base and may be distributed – including to competitors – later on.
Since commodity AI has the largest R&D engine in the world (tens of billions of dollars), it has to be a key part of your AI strategy, but learning to use “AI as a service” with data and API call threads that do not reveal too much is the associated skill that you must develop.
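What “not revealing too much” can look like in practice is scrubbing sensitive tokens before a prompt leaves your perimeter. The sketch below is purely illustrative (the masking rules, identifier format and function names are my own assumptions, not a reference to any specific vendor API):

```python
import re

# Hypothetical masking rules: patterns that should never leave the company
# perimeter when calling an external "AI as a service" endpoint.
MASKS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),       # e-mail addresses
    (re.compile(r"\b(?:PRJ|CTR)-\d{4,}\b"), "<INTERNAL-ID>"),  # internal project ids
]

def scrub(prompt: str) -> str:
    """Replace sensitive tokens with neutral placeholders before the API call."""
    for pattern, placeholder in MASKS:
        prompt = pattern.sub(placeholder, prompt)
    return prompt

raw = "Summarize the delay on PRJ-20231 and notify jane.doe@acme.com"
print(scrub(raw))  # Summarize the delay on <INTERNAL-ID> and notify <EMAIL>
```

A real deployment would maintain a reversible mapping (so that placeholders can be re-substituted in the answer) and a much richer rule set, but the principle is the same: the external service sees the structure of the request, not your confidential specifics.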


Figure 3 also represents a key idea, that of “public training sets”, which we implemented at Bouygues 25 years ago. Training sets are derived from your most significant problems, using either data from “industry AI” problems that has been cleaned or, sometimes, data from “Core AI” problems that has been transformed significantly enough that the problem is still there but reverse engineering of your own IP is no longer possible. Training sets are used internally to evaluate the solutions of outside vendors, and they can be shared to facilitate and accelerate that evaluation. As pointed out in the conclusion that follows, knowing how your internal solutions and pieces of system stand against the state of the art is a must for any AI strategy. Curating “training sets” (we used to call them “test sets” when the preferred optimization technique was OR algorithms, and moved to “training sets” with the advent of machine learning) keeps the focus where it belongs: it is easy for technical teams to concentrate on delivering “code that works”, but the purpose of the AI strategy is to deliver as much competitive value as possible. Training sets may be used to organize public hackathons, such as the ROADEF challenge or the famous competition that Netflix organized, more than a decade ago, on recommendation algorithms. Experience shows that learning to curate the training sets for your most relevant problems is a great practice, as any company that has submitted a problem to the ROADEF challenge knows. It forces communication between the teams and is more demanding than it sounds. Foremost, it embodies the attitude that open innovation (looking out for what others are doing) is better than the (in)famous “ivory tower” mentality.
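The mechanics of such an evaluation are simple to sketch. In the toy harness below, every name, instance and score is made up for illustration: each instance of the curated set carries its input data and a best known score, and any candidate solver (internal or from a vendor) is measured by its average gap to that reference:

```python
# Toy benchmark harness (all names and data are illustrative): each instance
# of the curated training set carries its input and the best known score.
TRAINING_SET = [
    {"name": "routing-small", "demand": [3, 1, 4], "best_known": 7},
    {"name": "routing-large", "demand": [5, 9, 2, 6], "best_known": 20},
]

def naive_solver(demand):
    """Baseline candidate: serve demands one by one (cost = sum of demands)."""
    return sum(demand)

def evaluate(solver, training_set):
    """Return the average gap (%) of a solver to the best known solutions."""
    gaps = []
    for inst in training_set:
        cost = solver(inst["demand"])
        gaps.append(100.0 * (cost - inst["best_known"]) / inst["best_known"])
    return sum(gaps) / len(gaps)

print(f"naive solver average gap: {evaluate(naive_solver, TRAINING_SET):.1f}%")
```

The hard part is not this harness but the curation itself: choosing instances that are representative, anonymized, and stable enough that a gap measured today still means something tomorrow.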


5. Conclusion: Beware of Exponential Debt


The summary for this blog post is quite simple:

  • Many breakthroughs have happened in the field of LLMs and conversational agents. It is a transformative revolution you cannot afford to miss. Generative-AI augmentation will make you more productive, provided that you remain “the pilot”.
  • Think of LLMs as unique capabilities: language fluency and knowledge management. They work as standalone tools, but much more value is available if you think in “systems of systems” and start playing with in/out APIs and extended contexts.
  • You cannot afford to go it alone; you must play the ecosystem, but find out how to benefit from external solutions without losing control of your internal knowledge.

I will not attempt to conclude with a synthesis about the state of AI that could be proven wrong in a few months. Instead, I will underline a fascinating consequence of the exponential rhythm of innovation: whichever piece of code you write, whichever algorithm you use, it becomes obsolete very fast, since its competitive performance follows a law of exponential decay. In a tribute to the reference book “Exponential Organizations”, I call this phenomenon exponential debt, which is a form of technical debt. The following figure (borrowed here) illustrates the dual concepts of exponential growth and exponential decay. What exponential debt means is that, when AI capabilities grow at an exponential rate, any frozen piece of code has a relative performance (compared with the state of the art) that decays exponentially.
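A back-of-the-envelope computation makes the decay concrete (the 30% yearly improvement rate is an illustrative assumption, not a measured figure): if the state of the art improves by a rate g per year while a frozen piece of code stays constant, its relative performance after t years is 1/(1+g)^t.

```python
# Illustrative computation of "exponential debt": a frozen piece of code keeps
# an absolute performance of 1.0 while the state of the art grows by g per
# year, so its relative performance decays as 1 / (1 + g) ** t.
g = 0.30  # assumed yearly improvement rate of the state of the art

for t in range(6):
    relative = 1.0 / (1.0 + g) ** t
    print(f"year {t}: relative performance = {relative:.2f}")
```

With this assumed rate, the frozen code is worth barely a quarter of the state of the art after five years, which is exactly why “write once and forget” is no longer a viable strategy.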




This remark is a nice way to loop back to the necessity of building exponential information systems, that is, modular systems that can be changed constantly through “software as flows” processes. As pointed out by Kai-Fu Lee, AI requires both science and engineering, because AI is deployed as a “modality” of software. Scientific knowledge is easily shared; engineering requires experience and practice. Being aware of exponential debt is one thing; being able to deal with it requires great software engineering skills.

This is one of the key topics of my upcoming Masterclass on June 23rd.

