1. Introduction
The past 16 months have been really exciting. The progress of generative AI has been so impressive that newspapers and media tend to confuse genAI with AI as a whole, which is a shame: there is much more to AI than genAI, and many problems have beautiful AI solutions that are not meant for genAI (planning and scheduling, for instance). Still, we have seen a constant stream of breakthroughs and innovations:
- Large language models are evolving very fast. Their size has grown rapidly over the past three years, and this is not over. Architectures are also progressing quickly, with “society of minds” or “mixture of experts” approaches (Mistral.ai or Gemini 1.5, for instance). The race between the big players and the open-source approach is also fascinating to watch.
- We have seen LLM-based tools explode with creativity, starting with ChatGPT and RLHF, through many forms of context extension such as RAG (Retrieval-Augmented Generation) using a wide variety of embeddings, all the way to hybridizations of LLMs with other knowledge engineering tools (ontologies, semantic graphs, etc.).
- As a consequence, hybridizations of LLMs with other AI techniques are everywhere. When you play with the more advanced tools such as GPT-4 or Gemini 1.5, chain of thought (CoT) is used extensively to mix different kinds of reasoning tools with the raw power of LLMs. The availability of APIs (both ways: to use an LLM as a slave of a higher-level knowledge assistant, or to use a specialized solver as a slave for your LLM) has boosted the creativity and the scope of what a knowledge assistant means today.
If you feel dizzy, with your mind spinning from so much news, so many new techniques and so much to read to stay afloat, you are not alone. This is a good moment to step back and work a little on “your foundations”, that is, the mental model you use to sort out all these innovations and techniques.
The good news is that I have just the right book for this. This blog post is centered around “A Brief History of Intelligence” by Max Bennett. This is the perfect book to refresh and improve your understanding of natural intelligence and to see the multiple links with artificial intelligence, both the big picture over the past decades and the more recent tumultuous developments of generative AI. For my regular blog readers, let me say that this is my “book of the decade”. It goes into my pantheon of exceptional books, together with “A Treatise on Efficacy” by François Jullien and “Fooled by Randomness” by Nassim Taleb.
This is an exceptional book because it mixes two perspectives. On the one hand, it is a popular-science book about natural intelligence, about how our brain works. Max Bennett is not a neuroscientist or a biologist, but he has compiled a vast body of knowledge and managed to enroll the best experts to help him write a narrative that is both fascinating and easy to read. As he puts it, “A Brief History of Intelligence is a synthesis of the work of many others. At its heart, it is merely an attempt to put together the pieces that were already there”. On the other hand, Max Bennett is a computer scientist by training, and this book provides a systemic vision of natural intelligence (a crude abstraction, by construction) with all the related links to the progress of artificial intelligence in the past decade. This means that this reading gives you a great mental model to sort out the key concepts of neural networks, deep learning and many related AI techniques such as reinforcement learning, generative adversarial networks or generative AI.
2. A Brief History of Intelligence
2.1 Steering
The first breakthrough, that of primitive animals, was the ability to steer in complex environments to find what they needed (food, water, etc.) through simple heuristics that process the stimuli produced by the animal’s senses. Max Bennett explains, through precise examples of the paths of these simple animals (as simple as the nematode C. elegans in a petri dish) navigating towards food, which simple set of rules is implemented in these primitive brains. It can be summarized as: if the positive stimuli increase, keep going, otherwise make a turn. “This was the breakthrough of steering. It turns out that to successfully navigate in the complicated world of the ocean floor, you don’t actually need an understanding of that two-dimensional world. You don’t need an understanding of where you are, where food is, what paths you might have to take, how long it might take, or really anything meaningful about the world”. This strategy is similar to how the Roomba, the vacuum-cleaner robot co-created by Rodney Brooks two decades ago, operates: “Whenever the Roomba hit a wall, it would perform a random turn and try to move forward again. When it was low on battery, the Roomba searched for a signal from its charging station, and when it detected the signal, it simply turned in the direction where the signal was strongest, eventually making it back to its charging station”.
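If you like to see ideas as code, the steering rule fits in a few lines. Here is a toy simulation under my own assumptions (a food source at the origin, a “smell” signal that decays with distance), not anything taken from the book:

```python
import math
import random

def smell(x, y):
    # The stimulus gets stronger (less negative) as we approach the food at (0, 0).
    return -math.hypot(x, y)

def steer(x=10.0, y=10.0, heading=0.0, steps=2000):
    """The nematode/Roomba rule: if the positive stimulus increases,
    keep going; otherwise, make a random turn."""
    previous = smell(x, y)
    for _ in range(steps):
        x += math.cos(heading)
        y += math.sin(heading)
        current = smell(x, y)
        if current < previous:  # the signal weakened: we are off course
            heading += random.uniform(-math.pi, math.pi)
        previous = current
    return x, y

print(steer())  # typically ends up hovering near the food at (0, 0)
```

No map, no plan, no model of the world: the agent reaches its goal with nothing more than a comparison between two successive samples of the stimulus.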
The first step of this steering breakthrough, initially based on simple stimuli and greedy optimization, evolved with bilaterian animals (which have a symmetry axis) into associative learning, thanks to neurons and neuromodulators. “The ubiquitous presence of associative learning within Bilateria and the notable absence of it outside Bilateria suggests that associative learning first emerged in the brains of early bilaterians. It seems that at the same time valence—the categorizing of things in the world into good and bad—emerged, so too did the ability to use experience to change what is considered good and bad in the first place”. To make associative learning possible, the first step is to grow goals and rewards, which are achieved through emotions. For rudimentary animals, these emotions can be characterized through valence and arousal: “Neuroscientists and psychologists use the word affect to refer to these two attributes of emotions; at any given point, humans are in an affective state represented by a location across these two dimensions of valence and arousal”. Max Bennett explains how bilaterians developed the two most famous neuromodulators, dopamine and serotonin, to grow an adaptive steering behavior that creates associative memories of the signals that are positive indicators of goal completion (for instance, when searching for food in a maze, an example used throughout the book to show the progressive stages of natural intelligence development). In one such famous experiment, conducted by Berridge on rats, it was found that dopamine is not an indicator of actual satisfaction but of expected satisfaction to come, a building block for the plan-do-reward-learn cycle: “Dopamine is not a signal for pleasure itself; it is a signal for the anticipation of future pleasure”. I will not try to reproduce here the careful reconstruction of associative learning in the rudimentary brain; you should read the book, because this first section about the evolution of associative learning, and about optimizing a goal under a series of possibly conflicting stimuli, is fascinating. A key challenge, in real life as well as in machine learning, is the “credit assignment problem”: if you have a history of weak signals that may indicate parts of the good decisions that led you (the animal) to an ultimate reward, how do you decide, through learning algorithms, which factors were significant and which were minor? As Max Bennett states, “The ancient bilaterian brain, which was capable of only the simplest forms of learning, employed four tricks to solve the credit assignment problem. These tricks were both crude and clever, and they became foundational mechanisms for how neurons make associations in all their bilaterian descendants”. Here is a simplified list of these four “tricks” (a toy sketch of the first one follows the list):
- The first trick used what are called eligibility traces: associations between events are only kept if the events occur within a given time window.
- The second trick is overshadowing: the brain picks the strongest clues and ignores the weaker ones.
- The third is latent inhibition: frequent stimuli are flagged as irrelevant.
- The fourth is blocking: once a predictive association has been established, other possible causes are blocked away, to avoid rule conflicts.
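To make the first trick concrete, here is a minimal sketch of eligibility traces in the machine learning sense. Everything here (the decay constant, the tiny API, the stimuli names) is illustrative, not taken from the book:

```python
class EligibilityLearner:
    """Toy associative learner: a stimulus only receives credit for a
    reward if its eligibility trace (a decaying memory of how recently
    it occurred) is still significant when the reward arrives."""

    def __init__(self, decay=0.8, learning_rate=0.1):
        self.decay = decay
        self.rate = learning_rate
        self.traces = {}    # stimulus -> recency trace
        self.weights = {}   # stimulus -> learned association strength

    def observe(self, stimuli):
        # Decay every trace, then refresh the stimuli we just saw.
        for s in self.traces:
            self.traces[s] *= self.decay
        for s in stimuli:
            self.traces[s] = 1.0

    def reward(self, value):
        # Credit is assigned in proportion to each trace: stimuli seen
        # too long ago (trace near zero) are effectively ignored.
        for s, trace in self.traces.items():
            self.weights[s] = self.weights.get(s, 0.0) + self.rate * value * trace

learner = EligibilityLearner()
learner.observe({"light"})
learner.observe({"tone"})   # the "light" trace has already decayed once
learner.reward(1.0)
print(learner.weights)      # "tone" gets more credit than "light"
```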
2.2 Reinforcement
Whereas the behavior we saw in the previous section may be qualified as an adaptive algorithm with associative learning, the second breakthrough came from reinforcing, that is, “learning to repeat behaviors that historically have led to positive valence and inhibit behaviors that have led to negative valence”. The second breakthrough is a learning breakthrough: extracting more knowledge from trial and error. This behavior is exhibited by many animals, including obviously mice, but also fish, which research scientists have long exposed to quite complex puzzles and mazes: “What was most surprising was how much intelligent behavior emerged from something as simple as trial-and-error learning. After enough trials, these animals could effortlessly perform incredibly complex sequences of actions … Fish can learn to find and push a specific button to get food; fish can learn to swim through a small escape hatch to avoid getting caught in a net; and fish can even learn to jump through hoops to get food. Fish can remember how to do these tasks for months or even years after being trained”. These smart behaviors emerge from reinforcement learning, a foundational concept both for artificial intelligence (Max Bennett quotes the first reinforcement learning algorithm, by Marvin Minsky in 1951) and for natural intelligence: “Responses that produce a satisfying effect in a particular situation become more likely to occur again in that situation, and responses that produce a discomforting effect become less likely to occur again in that situation”. The progress compared to the simpler steering strategies of the previous section comes from a more sophisticated strategy to solve the credit assignment problem.
This issue was actually noted in the artificial intelligence world: “Minsky realized that reinforcement learning would not work without a reasonable strategy for assigning credit across time; this is called the temporal credit assignment problem”. Max Bennett explains how Richard Sutton found a solution in the algorithm world, while at the same time neuroscientists found that similar approaches were at play in the brain of the fish or mouse trying to solve its puzzle. The key idea, which we will see in many places, is the training of a predictive model in an adversarial manner: the brain or the algorithm constructs a model of the expected future rewards, and the learning algorithm modifies this model through observation, with a focus on the time difference between possible causes and signals: “The signal on which the actor learns is not rewards, per se, but the temporal difference in the predicted reward from one moment in time to the next. Hence Sutton’s name for his method: temporal difference learning”. From a machine learning perspective, this adversarial pattern is everywhere (for instance, Generative Adversarial Networks): “In his simulations, however, Sutton found that by training an actor and a critic simultaneously, a magical bootstrapping occurs between them”. This focus on time differences between causes and effects is also a first step into the search for causality, which is how a “world model” gets built. The dopamine neuromodulator plays a critical role in the brain for this temporal difference learning: “As Sutton found, reinforcement and reward must be decoupled for reinforcement learning to work. To solve the temporal credit assignment problem, brains must reinforce behaviors based on changes in predicted future rewards, not actual rewards”. The link between natural and artificial intelligence science is beautifully illustrated by this topic: “In 1997 Dayan and Montague published a landmark paper, coauthored with Schultz, titled “A Neural Substrate of Prediction and Reward.” To this day, this discovery represents one of the most famous and beautiful partnerships between AI and neuroscience”.
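To fix ideas, here is a minimal tabular sketch of temporal difference learning (my own toy version, not Sutton’s original formulation nor anything from the book): the learning signal is the change in predicted future reward, playing the role that the dopamine signal plays in the brain:

```python
def td_update(values, state, next_state, reward, alpha=0.1, gamma=0.9):
    """One temporal-difference step: learn from the *difference* between
    successive predictions, not from the raw reward."""
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error  # in the brain analogy, the dopamine-like signal

# A four-state corridor whose last transition pays off.
values = [0.0, 0.0, 0.0, 0.0]
for _ in range(100):                # repeated walks down the corridor
    for s in range(3):
        r = 1.0 if s == 2 else 0.0  # reward only on the final step
        td_update(values, s, s + 1, r)
print(values)  # the predicted reward propagates backward to earlier states
```

After enough walks, early states inherit (discounted) value from the final reward, which is exactly how credit gets assigned across time.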
Emotions emerge, through evolution, from temporal difference learning. For instance, “Both disappointment and relief are emergent properties of a brain designed to learn by predicting future rewards”. Curiosity (and its associated behavior, which we call a “sense of humor”) is also critical to improving the performance of learning. Max Bennett gives several examples of machine learning algorithms and how they grew to solve more and more sophisticated problems (like the animals in the mazes): “It wasn’t until 2018 when an algorithm was developed that finally completed level one of Montezuma’s Revenge. This new algorithm, developed by Google’s DeepMind, accomplished this feat by adding something familiar that was missing from Sutton’s original TD learning algorithm: curiosity”. Curiosity is an emotion that adds both valence and arousal to discovery and to surprise. Its use in artificial intelligence is growing – especially with games, as we shall see in the next section: “The approach is to make AI systems explicitly curious, to reward them for exploring new places and doing new things, to make surprise itself reinforcing. The greater the novelty, the larger the compulsion to explore it. When AI systems playing Montezuma’s Revenge were given this intrinsic motivation to explore new things, they behaved very differently—indeed, more like a human player.
One explanation for this is that vertebrates get an extra boost of reinforcement when something is surprising”. This reminds me strongly of the observations made by Michio Kaku in his great book, The Future of the Mind: our curiosity and our love of surprise (two sides of the same coin) form an evolutionary trait that rewards the constant updating and improving of our mental model of the world (following the thinking of Yann Le Cun, to whom I return in the next section). I quote here from an earlier blog post: “A great illustration of this idea proposed by Michio Kaku is the sense of humor, which may be described as our ability to appreciate the difference between what we expect (the outcome of our own world model simulation) and what happens. This is how magic tricks and jokes work. Because we value this difference, we are playful creatures: we love to explore, to be surprised, to play games. Kaku makes a convincing argument that the sense of humor is a key evolutionary trait”.
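A simple way to picture how surprise becomes reinforcing in an algorithm is a count-based novelty bonus (a deliberately naive sketch of intrinsic motivation; the DeepMind agent mentioned above is far more sophisticated):

```python
from collections import Counter

visit_counts = Counter()

def curious_reward(state, extrinsic_reward, bonus_scale=0.5):
    """Add a novelty bonus to the game's own reward: rarely visited
    states pay extra, and the bonus fades as they become familiar."""
    visit_counts[state] += 1
    novelty_bonus = bonus_scale / visit_counts[state] ** 0.5
    return extrinsic_reward + novelty_bonus

print(curious_reward("room_1", 0.0))  # first visit: large bonus
print(curious_reward("room_1", 0.0))  # already smaller the second time
```

Even when the game pays nothing for a long stretch (the Montezuma’s Revenge problem), the agent still gets rewarded for reaching places it has never seen.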
As noticed by Max Bennett, the brains of simple animals such as fish are still full of mysteries. AI has made significant progress in computer vision through convolutional neural networks (CNN), but the simpler visual brain of a fish is able to do wonders without CNNs: “CNNs were inspired by the mammal visual cortex, which is much more complex than the simpler visual cortex of fish; and yet the fish brain—lacking any obvious hierarchy or the other bells and whistles of the mammalian cortex—is still eminently capable of solving the invariance problem. … How the fish brain does this is not understood. While auto-association captures some principles of how pattern recognition works in the cortex, clearly even the cortex of fish is doing something far more sophisticated”.
2.3 Simulating
The third breakthrough in the evolution of natural intelligence occurred around one hundred million years ago, when our “four-inch-long mammal ancestors” became able to run mental simulations, both of stimuli and of actions, developing subregions of the cortex that would become the modern neocortex. “If the reinforcement-learning early vertebrates got the power of learning by doing, then early mammals got the even more impressive power of learning before doing—of learning by imagining”. This is another fascinating part of the book, because brain simulation is essential to who we are and how we interact with the world. As Max Bennett puts it: “You can imagine the dinner you ate last night or imagine what you will be doing later today. What are you doing when you are imagining something? This is just your neocortex in generation mode. You are invoking a simulated reality in your neocortex”.
This simulation ability has two striking properties. First, it is very large in scope; the brain is able to exercise its simulation abilities over a very broad set of capabilities: “Further, if Mountcastle’s theory is correct, it suggests that the neocortical column implements some algorithm that is so general and universal that it can be applied to extremely diverse functions such as movement, language, and perception across every sensory modality”. Second, simulation is based on the capacity to flow information both ways, from reality to model and from model to reality: “But unlike other neural networks, it also had backward connections that flowed the opposite way—from the end to the beginning”. This also applies to perception: we do not “simply see or hear” the real world, we match a “computer model of the world” against a flow of stimuli: “Helmholtz suggested that much of human perception is a process of inference—a process of using a generative model to match an inner simulation of the world to the sensory evidence presented”. You may pause on this: it means that generative AI is not a new idea; your brain does it all the time. It is more efficient and more powerful, but also the source of biases, illusions and hallucinations: “Indeed, the neocortex as a generative model explains more than just visual illusions—it also explains why humans succumb to hallucinations, why we dream and sleep, and even the inner workings of imagination itself”.
To follow up on the pattern presented in the previous section, of constant updating of our “world model”: “The neocortex is continuously comparing the actual sensory data with the data predicted by its simulation. This is how you can immediately identify anything surprising that occurs in your surroundings”. Interestingly, I also learned from this book that information cannot flow in both directions at once: if you start to imagine, it blocks the capacity to process outside stimuli … so if you are in deep thinking and creative mode, you cannot listen to other people (or to your wife, in my case) at the same time. Something that I have noticed (and that my family has reproached me for, for decades) … but now I know why 😊 This constant interplay between what we model/predict and what we see explains the power of narratives: “This is also why generative models are said to try to explain their input—your neocortex attempts to render a state of the world that could produce the picture that you are seeing (e.g., if a frog was there, it would “explain” why those shadows look the way they do)”. What I find fascinating here is that neuroscience gives a powerful explanation of what Nassim Taleb calls the “narrative fallacy” (trying to build a model from meaningless input).
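Here is one way to picture this comparison loop in code, as a toy predictive-coding sketch (my own illustration with made-up numbers; real cortical models are of course far richer): a running prediction is nudged by its own error, and large errors are flagged as surprise:

```python
def predictive_loop(observations, gain=0.3, surprise_threshold=2.0):
    """Toy predictive-coding loop: predict the next input, compare it
    with the actual input, flag large errors as surprise, and use the
    error to update the inner model."""
    estimate = observations[0]
    for obs in observations[1:]:
        error = obs - estimate                 # actual minus predicted
        if abs(error) > surprise_threshold:
            print(f"surprise! expected {estimate:.1f}, saw {obs:.1f}")
        estimate += gain * error               # update the inner model
    return estimate

predictive_loop([10.0, 10.1, 10.2, 15.0, 10.3])  # the 15.0 stands out
```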
The beauty of this book is that Max Bennett tells us how neuroscientists have been able to confirm this happening in mammals’ brains: “How groundbreaking this was cannot be overstated—neuroscientists were peering directly into the brain of a rat, and directly observing the rat considering alternative futures”. This third breakthrough is what separates us mammals from other animals such as fish: “This was the gift the neocortex gave to early mammals. It was imagination—the ability to render future possibilities and relive past events—that was the third breakthrough in the evolution of human intelligence”. For lack of space, I will not reproduce the detailed explanation of the aPFC (agranular prefrontal cortex) and how it helps the rat make better choices by simulating the possible consequences of its choices (e.g., when navigating a maze). In the metaphor of adversarial training explained earlier, the aPFC trains the basal ganglia, which is where reinforcement learning occurs: “The emergent effect of all this is that the aPFC vicariously trained the basal ganglia that left was the better option … This is consistent with the idea that the neocortex enables even simple mammals such as rats to vicariously simulate future choices and change their behaviors based on the imagined consequences”. The four main functions of the frontal neocortex are attention, working memory, executive control and planning. These functions are controlled by the aPFC because they are different manifestations of our brain running a simulation to optimize a decision: “Without a neocortex to pause and vicariously consider options, the only way lizards learn this task is through endless real trial and error. In contrast, rats learn to inhibit their hardwired responses much more rapidly, an advantage that disappears if you damage a rat’s aPFC”.
Imagination and simulation support counterfactual reasoning, which is critical to understanding causality. This means that simulation offers a new set of tools to build a causal model of the world that is richer and deeper than what could be learned through reinforcement learning alone: “What fish are missing is the ability to learn from counterfactuals. A counterfactual is what the world would be now if you had made a different choice in the past”. This brings us to the seminal book of Judea Pearl, “The Book of Why”, which I have briefly commented on in an old blog post about “hunting for causality”. As Max Bennett says, without counterfactuals there is no way to distinguish between causation and correlation … “Causation is constructed by our brains to enable us to learn vicariously from alternative past choices”. This also leads to the distinction between model-free and model-based learning. Reinforcement learning agents do not require a causal world model to operate; they grow through experience a reward model mapping stimuli to actions: “Most of the reinforcement learning models employed in modern technology are model-free. The famous algorithms that mastered various Atari games and many self-driving-car algorithms are model-free”. Another way to say it is that the machine learning code that made Demis Hassabis famous, because it became able to play arcade games at a superhuman level, achieved this without “understanding the rules of the game that it was playing”, that is, without a causal model of the game (the ball, the bricks, the rebounds, etc.). The duality between model-based and model-free decision-making methods shows up in different forms across different fields. In animal psychology, this same duality is described as goal-driven behavior versus habitual behavior. And in behavioral economics, as in Daniel Kahneman’s famous book Thinking, Fast and Slow, this same duality is described as “system 2” (thinking slow) versus “system 1” (thinking fast).
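The model-free versus model-based distinction is easy to see in code (a schematic sketch with invented states and rewards; real systems are far more elaborate): the model-free agent only caches action values, while the model-based agent keeps a transition model it can roll forward, which is what makes counterfactual questions possible:

```python
# Model-free: a cached value per (state, action), updated by experience only.
q_values = {}

def model_free_update(state, action, reward, alpha=0.1):
    key = (state, action)
    old = q_values.get(key, 0.0)
    q_values[key] = old + alpha * (reward - old)
    # No idea *why* the reward happened, only that this action paid off.

# Model-based: an explicit world model that can be simulated before acting.
transition = {("fork", "left"): "food", ("fork", "right"): "empty"}
reward_of = {"food": 1.0, "empty": 0.0}

def model_based_choice(state, actions):
    # Imagine each action's consequence first: counterfactuals for free.
    return max(actions, key=lambda a: reward_of[transition[(state, a)]])

print(model_based_choice("fork", ["left", "right"]))  # -> "left"
```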
2.4 Mentalizing
The fourth breakthrough identified by Max Bennett is called “mentalizing”: the ability to model one’s own mind, and consequently the minds of one’s neighbors. He sees this capacity as emerging sometime around ten to thirty million years ago, when new regions of neocortex evolved in early primates that build a reflective model of older brain capabilities. This idea that the primate neocortex is associated with the socializing capabilities of primates is famously linked to Robin Dunbar, who was the first to notice a correlation between the size of primate brains and the size of their social networks: “This correlation has been confirmed across many primates: the bigger the neocortex of a primate, the bigger its social group”. The famous “Dunbar number” – commonly estimated at 150 meaningful and rich social connections for a human – is quoted everywhere, including in my own blog posts. After the breakthrough of simulation, this is both a breakthrough of reflection (building a model of oneself) and a breakthrough of “theory of mind”: “This act of inferring someone’s intent and knowledge is called “theory of mind”—so named because it requires us to have a theory about the minds of others. It is a cognitive feat that evidence suggests emerged in early primates. And as we will see, theory of mind might explain why primates have such big brains and why their brain size correlates with group size”. This “cognitive feat” is made possible by the extension of the prefrontal cortex in primates: “prefrontal cortex becomes uniquely active during tasks that require self-reference, such as evaluating your own personality traits, general self-related mind wandering, considering your own feelings, thinking about your own intentions, and thinking about yourself in general”. This prefrontal extension comes with the capacity to trigger neurons associated with an action when we see others performing that action: “the neurons in the premotor and motor areas of a monkey’s neocortex—those that control a monkey’s own movements—not only activated when they performed those specific fine motor skills, but also when they merely watched others perform them. Rizzolatti called these “mirror neurons.””. I will let you dive into the fascinating topic of mirror neurons by reading the book; there is still much that is unexplained, and there are many competing theories about their actual functioning.
To help you visualize the content of this blog post, I have borrowed one of the great illustrations by Rebeca Gelenter from the book. These illustrations are a huge contribution to Max Bennett’s book and one more reason why you should get your own copy. As noted by the author, these illustrations and explanations are a crude abstraction of reality – “We are in abstract land here—this is hardly a detailed algorithmic blueprint for how to build an AI system with theory of mind. But the idea of bootstrapping theory of mind by first modeling one’s own inner simulation, of modeling yourself to model others, provides an interesting waypoint” – but they are definitely useful.
This capability of reflectively modeling our own goals and thoughts, and those of others, is critical for the AI systems of the future, as pointed out by Max Bennett: “If we want AI systems and robots that can live alongside us, understand the type of people we are, deduce what we don’t know that we want to know, infer what we intend by what we say, anticipate what we need or want before we tell them, navigate social relationships with groups of humans, with all of their hidden rules and etiquettes—in other words, if we want true humanlike AI systems, theory of mind will undeniably be an essential component of that system”. Max Bennett then tells the story of Dean Pomerleau and Chuck Thorpe developing ALVINN, an AI system for autonomous driving. After trying to let the system learn by itself, the scientists moved to active teaching, where a human driver would let the machine drive but correct the steering wheel whenever a mistake was noticed. “This strategy of active teaching worked fantastically. When only directly copying driving (like ALVINN was trained), Ross’s AI system was still crashing cars after a million frames of expert data. In contrast, with this new strategy of active teaching, his AI system was driving almost perfectly after only a handful of laps”. The development of the prefrontal cortex, social skills and mirror neurons has helped primates develop another trait of mentalizing, which is to learn very quickly through imitation. In the conclusion of the book, Max Bennett summarizes the three traits of the “mentalizing” breakthrough as follows:
- Theory of mind: inferring intent and knowledge of others.
- Imitation learning: acquiring novel skills through observation.
- Anticipating future needs: taking an action now to satisfy a want in the future, even though one does not want it now.
This third capability is something that differentiates primates from other mammals: “Fascinatingly, squirrel monkeys learn to select the low treat option, while rats continue to select the high treat option. Squirrel monkeys are capable of resisting the temptation to have treats now, in anticipation of something—water—that they don’t even want yet”. This third capability is also illustrated with another AI example, that of Pieter Abbeel, Adam Coates, and Andrew Ng, who developed an AI system to autonomously fly a remote-controlled helicopter. They found that the most effective strategy was to separate the learning of the lower level of behavior, that of flying the helicopter, from the learning of the higher-level concepts, such as what flying the helicopter properly means: “This technique is called “inverse reinforcement learning” because these systems first try to learn the reward function they believe the skilled expert is optimizing for (i.e., their “intent”), and then these systems learn by trial and error, rewarding and punishing themselves using this inferred reward function”. Inverse reinforcement learning is an extension of the adversarial pattern of two layers of learning that we mentioned earlier, and a key component of RLHF (reinforcement learning from human feedback), the technique that helped OpenAI grow ChatGPT from GPT.
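In code, the general recipe of inverse reinforcement learning can be caricatured in a few lines (a deliberately naive sketch with invented states and candidate rewards, nothing like the actual helicopter system): first infer the reward the expert seems to be optimizing, then train against it:

```python
def score(trajectories, reward):
    # How well a candidate reward "explains" the expert: total reward the
    # expert's own state visits would have earned under it.
    return sum(reward[s] for traj in trajectories for s in traj)

expert_demos = [["steady", "steady", "land"], ["steady", "land"]]
candidates = [
    {"steady": 1.0, "land": 1.0, "spin": 0.0},  # "fly the helicopter properly"
    {"spin": 1.0, "steady": 0.0, "land": 0.0},  # "do stunts"
]
inferred_intent = max(candidates, key=lambda r: score(expert_demos, r))
print(inferred_intent)  # -> the "fly properly" reward function
# Step 2 (omitted): run ordinary reinforcement learning against inferred_intent.
```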
2.5 Language
The last section of this fifth part deals with LLMs (large language models). Although LLMs have been around for a while – Max Bennett quotes the example of Jeffrey Elman: “In the 1990s, a linguist and professor of cognitive sciences at UC San Diego, Jeffrey Elman, was one of the first to use neural networks to try to predict the next word in a sentence given the previous words” – it is clear, as we noticed in the introduction and as is underlined in this book, that we are seeing a phenomenal acceleration of their capabilities: “Language models have been around for a long time, but LLMs like GPT-3 are unique in their almost inconceivable scale”. The book was written when GPT-4 was just being released, and the author notices that GPT-4 fixed many of the shortcomings of its predecessor – “Amazingly, each question that I designed in this chapter to demonstrate a lack of common sense and physical intuition in GPT-3 was answered flawlessly by GPT-4” – but there is more to LLM growth; there are also many other mechanisms at play, such as Chain of Thought, as we shall see briefly in the next section: “By training GPT-4 to not just predict the answer, but to predict the next step in reasoning about the answer, the model begins to exhibit emergent properties of thinking, without, in fact, thinking—at least not in the way that a human thinks by rendering a simulation of the world”. Max Bennett gives some interesting examples of systems that are trained by playing against themselves: “TD-Gammon was trained by playing against itself. TD-Gammon always had an evenly matched player. This is the standard strategy for training reinforcement learning systems. Google’s AlphaZero was also trained by playing itself. The curriculum used to train a model is as crucial as the model itself”. He concludes the book with the observation that LLMs are an incredible step forward, and a building block for the future AI systems to come: “I think most would agree that the humanlike artificial intelligences we will one day create will not be LLMs; language models will be merely a window to something richer that lies beneath”.
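Elman’s task, predicting the next word given the previous ones, is easy to state in code. Here is a toy bigram counter (nowhere near a neural network, let alone an LLM, but the same objective that LLMs scale up by many orders of magnitude):

```python
from collections import Counter, defaultdict

next_word_counts = defaultdict(Counter)

def train(corpus):
    for sentence in corpus:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            next_word_counts[current][following] += 1

def predict_next(word):
    candidates = next_word_counts[word]
    return candidates.most_common(1)[0][0] if candidates else None

train(["the cat sat on the mat", "the cat sat down"])
print(predict_next("cat"))  # -> "sat"
```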
3. Hybrid AI of Tomorrow
This second section proposes a few thoughts about the future of AI, based on some of the ideas exposed in this book. It will be much shorter, since this blog post is long enough already, and is offered more as “food for thought” than as a self-contained essay. The general theme here is the combination of different AI techniques (heuristics, algorithms, or meta-heuristics) and the important role that LLMs can play in such hybrid combinations, which should not come as a surprise after reading Max Bennett: “In the human brain, language is the window to our inner simulation. Language is the interface to our mental world. And language is built on the foundation of our ability to model and reason about the minds of others—to infer what they mean and figure out exactly which words will produce …”.
3.1 LLMs as Software Components
As was explained in section 2.5, LLMs should not be seen as “the future of AI”, but definitely as one of its key components. This component view helps to emphasize that we see constant progress along three dimensions (as mentioned in the introduction):
- LLMs are getting more powerful: they are getting bigger (although the billions of parameters of the larger models, such as GPT-4, Gemini Ultra or Claude 3, are not always shared or exact), they support larger contexts, and their structure is becoming more complex (a federation of smaller LLMs). It started with GPT-4 and evolved into MoE (Mixture of Experts), a pattern used both by Mistral and Gemini. This competition (see the recent performance of Claude 3 Opus) is pushing OpenAI to accelerate the release of the long-awaited (because test results have leaked) GPT-5.
- The techniques to get more value from a given LLM, through prompting, fine-tuning or mixed-embedding prompting (see below about RAG), are improving constantly. This is an “orthogonal progress” axis, since it means that with the same LLM we can get answers that are more and more relevant to a specific domain.
- LLMs may be used as Lego bricks and combined with other techniques, including other forms of AI. The CoT (Chain of Thought) pattern, where another AI algorithm breaks a larger task down into smaller tasks, is a good illustration. When playing logical or numerical puzzles with advanced genAI tools (GPT-4 or Gemini, as I do in this column), one can see the power of CoT at work.
Here, for lack of time, I am focusing on LLMs, but there are many other foundation models that work on images, videos or other forms of time-dependent material such as time series.
Sometimes, these different approaches seem to overlap. For instance, to add a body of domain-specific knowledge to an existing LLM, should one fine-tune, use a large context (leveraging the fact that, as mentioned earlier, new LLMs accept very large contexts), or use RAG (Retrieval-Augmented Generation)? As explained in this blog post, RAG is an architecture that provides the most relevant and contextually important proprietary, private or dynamic data to your LLM when it is performing tasks, to enhance its accuracy and performance, using an embedding model to translate your private data into vectors that the LLM can use in a customized prompt. What is fascinating with RAG is that every month a new extended technique is proposed, which we could see as “enhanced RAG”, with ontologies, knowledge graphs or other forms of semantic networks. Here again, we see that not only are LLMs powerful tools, but they can easily be combined (through the sharing of embeddings) with other knowledge management techniques. For instance, ontologies have been developed for many decades as semantic hierarchies that represent the synonymy and inclusion relationships between terms. For plain English, the corpus used to train the LLMs is large enough to capture these relations, but this is not necessarily the case for a domain-specific language. Combining LLMs with ontologies is a way to improve the RAG search capabilities, as shown with the OLaLa system: “with only a handful of examples and a well-designed prompt, it is possible to achieve results that are on par with supervised matching systems which use a much larger portion of the ground truth”. Ontologies are one kind of tree-structured knowledge graph; there are many ways to build and use the “semantic net” (to use a term from the 80s) structure of conceptual knowledge. In the same manner, knowledge graphs may be combined with LLMs and with RAG. This article shows how knowledge graphs may be used in the domain of medical data.
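Stripped of all the engineering, the RAG loop fits in a few lines. In this sketch, `embed`, `vector_store`, and `llm` are placeholders for whatever embedding model, vector database, and LLM client you actually use; this is the shape of the pattern, not a specific library’s API:

```python
def answer_with_rag(question, embed, vector_store, llm, k=5):
    """Minimal RAG loop: embed the question, retrieve the k most relevant
    private passages, and pack them into the prompt as context."""
    query_vector = embed(question)
    passages = vector_store.search(query_vector, top_k=k)
    prompt = (
        "Answer the question using only the context below.\n\n"
        "Context:\n" + "\n---\n".join(passages) +
        f"\n\nQuestion: {question}"
    )
    return llm(prompt)
```

The “enhanced RAG” variants mentioned above mostly change the retrieval step (walking an ontology or a knowledge graph instead of, or in addition to, a plain vector search) while keeping the same overall loop.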
3.2 Augmented Simulation, Digital Twin and Hybrid Agents
The second key idea from Max Bennett’s book is that intelligence requires simulation, causality and counterfactual reasoning, which has been a key claim of Yann Le Cun for many years. The use of simulation and digital twins corresponds precisely to the construction and use of a causal model of the world. There is a continuum of techniques in the world of artificial intelligence to model causality, from Bayesian inference to causal links. The necessity of causal models to perform advanced and innovative tasks does not take away the fact that inductive (data-driven) models can do wonders in many cases. The following figure is a slide taken from a presentation written at the time when DeepMind announced the great forecasting results of GraphCast, its weather model based on deep learning. When a problem is stable enough, experience shows that deep learning is able to make predictions both accurately and with better speed than “traditional” model-based approaches (such as finite-element simulations). However, these inductive models are good at reproducing “truths from the past”, not at exploring completely new situations. This is where causal models are required: they separate causal links that remain accurate in “the new world” from correlations that are no longer relevant. What this slide says is that the two approaches should not be opposed but combined. This is what we do at Michelin when we use deep learning to accelerate the performance of finite element simulations.
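The acceleration idea can be sketched very simply (a schematic illustration: `slow_simulation` stands in for a finite-element solver, and the polynomial fit for whatever deep model one would really train):

```python
import numpy as np

def slow_simulation(x):
    # Placeholder for an expensive physics solver (minutes or hours per run).
    return 3.0 * x + 2.0

# Offline: run the expensive solver to build a training set.
X = np.linspace(0.0, 10.0, 50)
Y = np.array([slow_simulation(x) for x in X])

# Fit a cheap surrogate (here a degree-1 polynomial; in practice, a deep net).
surrogate = np.poly1d(np.polyfit(X, Y, deg=1))

print(surrogate(4.2), slow_simulation(4.2))  # fast prediction vs ground truth
```

The surrogate is only trustworthy inside the regime covered by the training runs, which is exactly the “truths from the past” limitation mentioned above.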
I have quoted Yann Le Cun many times when he says that forecasting is the true measure of intelligence (from a practical perspective, which is very much what Max Bennett says). I definitely agree about the importance of common sense, causal models and the simulation of an “inner world model” as a foundation for the next generation of artificial intelligence. Does it mean, as is reported about Yann Le Cun in this article, that generative AI as we know it is a dead end? Max Bennett’s book helps to understand Yann Le Cun’s criticism of generative AI, and there seems to be a consensus (recall the quote made at the end of Section 2) that we need more than LLMs to build the “AI of the future”. This being said, there are three reasons to be less affirmative about the fate of LLMs:
- LLMs are pieces of the puzzle and can be assembled into powerful hybrid combinations. As represented by breakthrough #5, there is good reason to believe that language will continue to play a key role in these hybrid combinations (cf. the link between language and imagination).
- LLMs have emergent properties when used in a puzzle, as shown by some of the progress with MoE, CoT or extended RAG. This is one of the most interesting claims about the future GPT-5: “An apparent focus for GPT-5 is the incorporation of extended thinking capabilities. OpenAI aims to enable the model to lay out reasoning steps before solving a challenge, with internal or external checks on each step’s accuracy. This represents a significant shift towards enhancing the model’s reliability and reasoning prowess”.
- The effect of scale is unknown (and not yet properly understood). True, LLMs had been around for a while, but nobody expected the emergent change of capabilities when the scale grew past 10B parameters. As the search for super-massive LLMs continues, it is difficult to form an educated guess of what these new foundation models will be able to do.
Hybridization of AI is not a new idea. It was the backbone of the NATF report on AI, and a common theme of my blog posts (drawing my inspiration from DeepMind and the great podcast by Hannah Fry). However, it is clearly an idea that is getting more traction. I strongly recommend reading the article “The Shift from Models to Compound AI Systems” to get an updated view of the forms hybrid AI can take. A classical form of hybridization in AI is to use communities of agents, allowing different forms and parametrizations of an AI technique to collaborate, as a federation of agents, on a common goal; this applies perfectly to LLMs as well. “LLM agent” has become a buzzword that can designate either the addition of LLM capabilities to an existing AI agent (electronic games being a field where AI agents have been used for a long time), or the packaging of an LLM with its context, parameters and techniques such as extended RAG to specialize it for a specific task. In the second case, LLM agents are a way to specialize LLMs for specific tasks, for better efficiency and with the further possibility to use them as a federation to achieve higher-level tasks. In the first case, agents are used to combine LLMs with other capabilities such as planning or action, especially through the medium of code generation (recall the example of LLMs using Python programming as a problem-solving technique). You should take a look at the great article “How LLM Agents are Unlocking New Possibilities” to see that “LLM agents can proactively call external APIs or vector stores for additional information, based on dynamic decision-making. By calling different tools and using semantic search and vector databases, LLMs agents can provide precise answers according to search results. This also avoids common LLM issues such as inaccuracy and hallucinations”.
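The agent pattern quoted above boils down to a simple loop. In this sketch, `llm` is any callable returning a JSON decision and `tools` a dictionary of callables (an API client, a vector search, a code runner); the names and the JSON protocol are my own assumptions, not those of a specific framework:

```python
import json

def run_agent(task, llm, tools, max_steps=10):
    """Schematic LLM agent loop: at each step the model decides whether
    to call a tool or to produce the final answer."""
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        decision = json.loads(llm("\n".join(history)))
        # Expected shape: {"action": <tool name or "final_answer">, ...}
        if decision["action"] == "final_answer":
            return decision["answer"]
        observation = tools[decision["action"]](decision["input"])
        history.append(f"Called {decision['action']}, got: {observation}")
    return "Gave up after too many steps."
```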
As mentioned in the introduction, the speed at which the technology is evolving, and the richness of this evolution – as shown with the three axes that we described in Section 3.1 – may be overwhelming. It is true that one of our challenges as technology professionals is to stay fluent with the AI toolbox, but this is better done in a “test and learn” approach than by trying to read everything that is published. It is also important to note that not all progress is relevant to every use case. When I manage “genAI” as a CIO, I tend to separate four groups of use cases:
- "Whatever Copilot”: how to get the best from the constant flow of genAI assistants. In the make vs buy vs rent, this is clearly a rent (SaaS) choice, where the main focus is adoption, training and proper sourcing at the right cost.
- “Augmented software development”: how to leverage genAI for software engineering, from writing code to improving software quality (testing, security, technical debt, etc.). The impact of generative AI on software engineering is hard to overestimate, since we are entering the Software 2.0 world, where programmers will train AI to grow source code.
- “Generative AI customer dialog”: how to develop the next generation of chatbots (and other forms of personalized content) using the language fluency provided by LLMs. Here, the sensitivity (see the Air Canada story) and the risk management require hybrid (multiple-tier) architectures (to separate intent / answer / checking) and a mixture of make, buy or rent.
- “Proprietary knowledge engineering”, where a company wants to apply the power of LLMs and RAG to its own proprietary information without risking its IP. In this last case, the “make” part is more important, and hosting and cybersecurity are primary concerns. The goal is not to reproduce the state-of-the-art MMLU results of the best models, but to ensure the full protection of the company’s know-how while leveraging the knowledge management wonders of Section 3.1.
4. Conclusion
It is time to conclude this long blog post, once I have restated that “A Brief History of Intelligence” is one of my favorite books and that you should really get your own copy. The first obvious conclusion from this post is that we will see LLMs as ubiquitous components of future AI systems, because of the importance of breakthrough #5, in the words of Max Bennett. LLMs will not only be part of systems specialized in knowledge and dialog; they will also bring both interaction capabilities and knowledge embedding to many other digital and information systems.
The second conclusion that I draw from this book, based on breakthroughs #3 and #4, is that the future will be invented with digital twins that combine advanced simulation with reflective world-modeling capabilities. Simulation is the natural approach to develop and use causal models, which we need to advance the frontiers of artificially intelligent systems. Breakthrough #4 lights the path towards many forms of AI that aim at capturing role playing, game-theoretic decision making and intelligent agent communities (a natural way to develop “society of minds” approaches).