Tuesday, April 9, 2019

Hunting for Causality in Short Time Series

1. Introduction

This post is about the search for sense in a small data set, such as the few measures that one accumulates through self-tracking. Most commonly, finding sense in a small set of data means either seeing regular patterns or detecting causality. Many writers have argued that our brains are hardwired for detecting patterns and causality. Causality is our basic ingredient for modelling “how the world works”. Inferring causality from our world experience is also a way of “compressing” our knowledge: once you understand that an open flame hurts, you don’t need to recall the individual experiences (and you don’t need many of them to detect this causality). The reason for selecting this topic for today’s blog post is my recent participation in the ROADEF 2019 conference, where I had the pleasure of chairing the machine learning session and the opportunity to present my own work on machine learning for self-tracking data.

We are so good at detecting causality that we are often fooled by random situations and tend to see patterns when there are none. This is a common theme of Nassim Taleb’s many books, especially his masterful first book “Fooled by Randomness”. The concept of “narrative fallacy” is critical when trying to extract sense from observation: we need to remember that we love to see “stories” that make sense because this is how our brain best remembers. There are two types of issues when trying to mine short data sets for sense: the absence of statistical significance because the data set is too small, and the existence of our own narrative fallacy and other cognitive biases. Today I will talk about data sets collected from self-tracking (i.e., the continuous measurement of some of your characteristics, either explicitly by logging observations or implicitly with connected sensors such as a connected watch). The challenge for scientific methods when searching for sense in such short time series is to know when to say “I don’t know” when presented with a data set showing no more pattern or correlation than what could be expected from any random distribution, without falling into the pitfall of narrative fallacy. In short, the “Turing test” of causality hunting is to reject random or quasi-random data input.
On the other hand, it is tempting to look for algorithms that could learn and extract sense from short time series precisely because humans are good at it. Humans are actually very good at short-term forecasting and quick learning, which is without a doubt a consequence of evolution: learning quickly to forecast the path of a predator or prey was shaped by “survival of the fittest” reinforcement. The topic of this blog post – which I discussed at ROADEF – is how to make sense of a set of short time series using machine learning algorithms. “Making sense” here is a combination of forecasting and causality analysis, which I will discuss later.
The second reason for this blog post is the wonderful book by Judea Pearl, “The Book of Why”, a masterpiece about causality. The central idea of the book is that causality does not “jump from the data” but requires an active role from the observer. Judea Pearl introduces concepts that are deeply relevant to this search for sense with small data sets. Hunting for causality is a “dangerous sport” for many reasons: most often you come back empty-handed, sometimes you catch your own tail … and when successful, you most often have little to show for your efforts. The two central ideas of causality diagrams and the role of active observers are keys for unlocking some of the difficulties of causality hunting with self-tracking data.

This post is organised as follows. Section 2 is a very short and partial review of “The Book of Why”. I will try to explain why Judea Pearl’s concepts are critical to causality hunting with small data sets. These principles have been applied to the creation of a mobile application that generated the data sets onto which the machine learning algorithms of Section 4 have been applied. This application uses the concept of a causal diagram (renamed as a quest) to embody the user’s prior knowledge and assumptions. Self-measurement follows the principle of the “active observer” of Judea Pearl’s P(X | do(Y)) definition. Section 3 dives into causality hunting through two other books and introduces the concept of Granger causality, which binds forecasting and causality detection. It also links the concepts of pleasure and surprise with self-learning, a topic that I borrow from Michio Kaku and which also creates a strong relationship between forecasting and causality hunting. As noted by many scholars, “the ability to forecast is the most common form of intelligence”. Section 4 talks briefly about machine learning algorithms for short time-series forecasting. Without diving too deep into the technical aspects, I show why prediction from small data sets is difficult and what success could look like, considering all the pitfalls that we have presented before. Machine learning from small data is not a topic for deep learning, so I present an approach based on code generation and reinforcement learning.

2. Causality Diagrams - Learn by Playing

Judea Pearl is an amazing scientist with a long career in logic, models and causality that earned him the Turing Award in 2011. His book reminds me of “Thinking, Fast and Slow” by Daniel Kahneman, a fantastic effort to summarise decades of research into a book that is accessible and very deep at the same time. “The Book of Why – The New Science of Cause and Effect”, by Judea Pearl and Dana Mackenzie, is a masterpiece about causality. It requires careful reading if one wants to extract the full value of the content, but it can also be enjoyed as a simple, exciting read. A great part of the book deals with paradoxes of causality and confounders, the variables that hide or explain causal relationships. In this section I will only talk about four key ideas that are relevant to hunting causality from small data.

The first key idea of this book is that causality is not a cold, objective fact that one can extract from data without prior knowledge. He refutes a “Big Data hypothesis” which would assume that once you have enough data, you can extract all necessary knowledge. He proposes a model for understanding causality with three levels: the first level is association, what we learn from observation; the second level is intervention, what we learn by doing things; and the third level is counterfactuals, what we learn by imagining what-if scenarios. Trying to assess causality from observation only (for instance through conditional probabilities) is both very limited (it ignores the two top levels) and quite tricky since, as Persi Diaconis recalls: “Our brains are just not wired to do probability problems, so I am not surprised there were mistakes”. Judea Pearl talks in depth about the Monty Hall problem, a great puzzle/paradox popularised by Marilyn vos Savant, which has tricked many of the most educated minds. I urge you to read the book to learn for yourself from this great example. The author’s conclusion is: “Decades’ worth of experience with this kind of questions has convinced me that, in both a cognitive and philosophical sense, the idea of causes and effects is much more fundamental than the idea of probability”.
Judea Pearl introduces the key concept of a causal diagram to represent our prior preconception of causality, which may be reinforced or invalidated by observation, following a true Bayesian model. A causal diagram is a directed graph that represents your prior assumptions, as a network of factors/variables that have causal influence on each other. A causal diagram is a hypothesis that actual data from observation will validate or invalidate. The central idea here is that you cannot extract a causal diagram from the data; you need to formulate a hypothesis that you will keep or reject later, because the causal diagram gives you a scaffold to analyse your data. This is why any data collection with the Knomee mobile app that I mentioned earlier starts with a causal diagram (a “quest”).
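To make this concrete, a causal diagram can be sketched as a small directed graph that records the hypotheses before any data arrives. This is only an illustrative sketch (the variable names are mine, not Knomee’s actual data model):

```python
# A causal diagram ("quest") as a tiny directed graph of prior hypotheses,
# declared BEFORE any data is collected. Variable names are illustrative.
causal_diagram = {
    "stress": ["sleep", "coffee"],  # hypothesis: stress influences both
    "coffee": ["sleep"],            # hypothesis: coffee intake influences sleep
    "sleep": [],                    # the target variable of the quest
}

def hypothesized_causes(diagram, target):
    """Return the factors that the diagram claims influence `target`."""
    return sorted(f for f, effects in diagram.items() if target in effects)

print(hypothesized_causes(causal_diagram, "sleep"))   # ['coffee', 'stress']
print(hypothesized_causes(causal_diagram, "coffee"))  # ['stress']
```

The data collected later will only ever confirm or weaken the edges declared here; it cannot invent them.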
Another key insight from the author is the active, participating role of the user who asks the causality question, which is represented through the notation P(X | do(Y)). Where the conditional probability P(X | Y) is the probability of X being true when Y is observed, P(X | do(Y)) is the probability of X when the user chooses to “do Y”. The simple example of learning that a flame burns your hand is actually meaningful for understanding the power of “learning by doing”. One or two experiences would not be enough to infer the knowledge from the conditional probability P(hurts | hand in flame), while the experience do(hand in flame) means that you become very sure, very quickly, about P(hurts | do(hand in flame)). This observation is at the heart of personal self-tracking. The user is active and is not simply collecting data. She decides to do or not to do things that may influence the desired outcome. A user who is trying to decide whether drinking coffee affects her sleep is actually computing P(sleep | do(coffee)). Data collection is an experience, and it has a profound impact on the knowledge that may be extracted from the observations. This is very similar to the key idea that data is a circular flow in most AI smart systems. Smart systems are cybernetic systems with “a human inside”, not deductive linear systems that derive knowledge from static data. One should recognise here a key finding from the NATF reports on Artificial Intelligence and Machine Learning (see “Artificial Intelligence Applications Ecosystem: How to Grow Reinforcing Loops”).
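A small simulation can illustrate the gap between observing and doing. In the hypothetical model below (my own toy example, not data from the app), stress is a confounder that drives both coffee drinking and poor sleep, while coffee itself has no effect; observation suggests a strong link, intervention dissolves it:

```python
import random

random.seed(0)

def sample(intervene_coffee=None):
    # Hypothetical model: stress drives both coffee drinking and poor sleep,
    # while coffee itself has NO real effect on sleep.
    stress = random.random() < 0.5
    if intervene_coffee is None:
        coffee = random.random() < (0.8 if stress else 0.2)  # observational regime
    else:
        coffee = intervene_coffee                            # do(coffee): forced choice
    poor_sleep = random.random() < (0.7 if stress else 0.1)
    return coffee, poor_sleep

N = 100_000
obs = [sample() for _ in range(N)]
p_obs = sum(s for c, s in obs if c) / sum(c for c, _ in obs)    # P(poor sleep | coffee)
p_do = sum(s for _, s in (sample(True) for _ in range(N))) / N  # P(poor sleep | do(coffee))

# Observation suggests coffee hurts sleep (~0.58); intervention reveals the
# marginal rate (~0.40): the difference is entirely due to the stress confounder.
print(round(p_obs, 2), round(p_do, 2))
```

This is the whole point of the do() notation: the two numbers answer different questions, and only the second one speaks about causation.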

The role of the participant is especially important because there is a fair amount of subjectivity when hunting for causality. Judea Pearl gives many examples where the controlling factors should be influenced by the “prior belief” of the experimenters, at the risk of misreading the data. He writes: “When causation is concerned, a grain of wise subjectivity tells us more about the real world than any amount of objectivity”. He also insists on the importance of the data collection process. For him, one of the reasons statisticians are often the most puzzled by the Monty Hall paradox is the habit of looking at data as a flat static table: “No wonder statisticians found this puzzle hard to comprehend. They are accustomed to, as R.A. Fisher (1922) put it, “the reduction of the data” and ignoring the data-generation process”. As said earlier, I strongly encourage you to read the book to learn about “confounders” – which are easy to explain with causal diagrams – and how they play a critical role in these types of causality paradoxes where intuition is easily fooled. This is the heart of the book: “I consider the complete solution of the confounders problem one of the main highlights of the Causal Revolution because it has ended an era of confusion that has probably resulted in many wrong decisions in the past”.

3. Finding a Diamond in the Rough

Another interesting book about hunting for causality is “Why: A Guide to Finding and Using Causes” by Samantha Kleinberg. This book starts with the idea that causality is hard to understand and hard to establish. Saying that “correlation is not causation” is not enough; understanding causation is more complex. Statistics do help to establish correlation, but people are prone to see correlation when none exists: “many cognitive biases lead to us seeing correlations where none exist because we often seek information that confirms our beliefs”. And once we validate a correlation with statistical tools, we need to be careful, because even seasoned statisticians “cannot resist treating correlations as if they were causal”.
Samantha Kleinberg talks about Granger causality: “one commonly used method for inference with continuous-valued time series data is Granger”, the idea that a time delay observed within a correlation may be a hint of causality. Judea Pearl warns us that this may simply be the case of a confounder with asymmetric delays, but in practice the Granger test is not a proof, only a good indicator; the proper wording is that it indicates “predictive causality”. More generally, if predicting a value Y from the past of X up to a non-null delay does a good job, we may say that there is a good chance of “predictive causality” from X to Y. This links the tool of forecasting to our goal of causality hunting. Forecasting is an interesting tool since it may be used with non-linear models (contrary to Granger causality) and multivariate analysis. If we start from a causal diagram in Pearl’s sense, we may check whether the root nodes (the hypothetical causes) can be used successfully to predict the future of the target nodes (the hypothetical “effects”). This is, in a nutshell, how the Knomee mobile app operates: it collects data associated with a causal diagram and uses forecasting as a possible indicator of “predictive causality”.
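A minimal sketch of this “predictive causality” test, in the spirit of Granger’s idea (synthetic data and plain least squares instead of the proper statistical test): if adding the lagged past of X to the past of Y reduces the forecast error on Y, then X is a candidate cause.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: x influences y with a 2-step delay, plus autoregression and noise.
n, delay = 300, 2
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + (0.8 * x[t - delay] if t >= delay else 0.0) + 0.3 * rng.normal()

def residual_mse(target, predictors):
    """Least-squares fit of target on predictors plus an intercept; mean squared residual."""
    A = np.column_stack(predictors + [np.ones(len(target))])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return float(np.mean((target - A @ coef) ** 2))

t0 = delay + 1
restricted = residual_mse(y[t0:], [y[t0 - 1:-1]])                     # past of y only
full = residual_mse(y[t0:], [y[t0 - 1:-1], x[t0 - delay:n - delay]])  # plus lagged x

print(full < restricted)  # lagged x improves the forecast of y: "predictive causality"
```

A real test would compare the two residuals with an F-statistic; the comparison above only conveys the idea.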
The search for “why” with self-tracking data is quite interesting because most values (heart rate, mood, weight, number of steps, etc.) are nonstationary on a short time scale, but bounded on a long time horizon while exhibiting a lot of daily variation. This makes detecting patterns more difficult, since it is quite different from extrapolating the movement of a predator from its previous positions (another short time series). We are much better at “understanding” patterns that derive from linear relations than those that emerge from complex causality loops with delays. The analysis of delays between two observations (at the heart of Granger causality) is also a key tool in complex system analysis. We must, therefore, bring it with us when hunting for causality. This is why the Knomee app includes multiple correlation/delay analyses to confirm or invalidate the causal hypothesis.
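Delay analysis itself can be sketched as a scan over candidate lags, keeping the one with the strongest correlation (a toy version; a real analysis would also test statistical significance):

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy signals: y echoes x three steps later, plus noise.
n, true_delay = 200, 3
x = rng.normal(size=n)
y = np.roll(x, true_delay) + 0.5 * rng.normal(size=n)
y[:true_delay] = rng.normal(size=true_delay)  # drop the wrapped-around values

def best_delay(x, y, max_delay=10):
    """Return the lag d in [0, max_delay] maximizing |corr(x[t - d], y[t])|."""
    scores = [abs(np.corrcoef(x[: len(x) - d], y[d:])[0, 1]) if d else
              abs(np.corrcoef(x, y)[0, 1])
              for d in range(max_delay + 1)]
    return int(np.argmax(scores))

print(best_delay(x, y))  # recovers the 3-step delay
```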

A few other pearls of wisdom about causality hunting with self-tracking may be found in “Self-Tracking”, the book by Gina Neff and Dawn Nafus. This reference book on the quantified self echoes a number of ideas that we have already discussed, such as the critical importance of the user in the tracking and learning process. Self-tracking – a practice which is both very ancient and has shown value repeatedly – is usually boring if no sense is derived from the experiment. Making sense is either positive, such as finding causality, or negative, such as disproving a causality hypothesis. Because we can collect data more efficiently in the digital world, the quest for sense is even more important: “Sometimes our capacity to gather data outpaces our ability to make sense of it”. In the first part of this book we find this statement, which echoes nicely the principles of Judea Pearl: “A further goal of this book is to show how self-experimentation with data forces us to wrestle with the uncertain line between evidence and belief, and how we come to decisions about what is and is not legitimate knowledge”. We have talked about small data and short time series from the beginning because experience shows that most users collect data over short periods of time: “Self-tracking projects should start out as brief experiments that are done, say, over a few days or a few weeks. While there are different benefits to tracking over months or years, a first project should not commit you for the long haul”. This is why we shall focus in the next section on algorithms that can work robustly with a small amount of data.
Self-tracking is foremost a learning experiment: “The norm within QS is that “good” self-tracking happens when some learning took place, regardless of what kind of learning it was”. A further motive for self-tracking is often behavioural change, which is also a form of self-learning. As biologists tell us, learning is most often associated with pleasure and reward. As pointed out in a previous post, there is a continuous cycle – pleasure to desire to plan to action to pleasure – that is a common foundation for most learning in living creatures. Therefore, there is a dual dependency between pleasure and learning when self-tracking: one must learn (make sense out of the collected data) to stay motivated and pursue the self-tracking experience (which is never very long), and this experience should reward the user with some form of pleasure, from surprise and fun to the satisfaction of learning something about yourself.

Forecasting is a natural part of the human learning process. We constantly forecast what will happen and learn by reacting to the difference. As explained by Michio Kaku, our sense of humour and the pleasure that we associate with surprises is a Darwinian mechanism that pushes us to constantly improve our forecasting (and modelling) abilities. We forecast continuously, we experience reality, and we enjoy the surprise (the difference between what happens and what we expect) as an opportunity to learn in a Bayesian way, that is, to revise our prior assumptions (our model of the world). The importance of curiosity as a key factor for learning is now widely accepted in the machine learning community, as illustrated by the ICML 2017 paper “Curiosity-driven Exploration by Self-supervised Prediction”. The role of surprise and fun in learning is another reason to be interested in forecasting algorithms. Forecasting the future, even if unreliable, creates positive emotions around self-tracking. This is quite general: we enjoy forecasts, which we see as games (in addition to their intrinsic value) – one can think of sports or politics as examples. A self-tracking forecasting algorithm that does a decent job (i.e., neither too wrong nor wrong too often) works in a way similar to our brain: it is invisible but acts as a time saver most of the time, and when wrong it signals a moment of interest. We shall now come back to the topic of forecasting algorithms for short time series, since we have established that they could play an interesting role in causality hunting.

4. Machine Generation of Robust Algorithms

Our goal in this last section is to look at the design of robust algorithms for short time-series forecasting. Let us first define what I mean by robust, which will explain the metaphor proposed in the introduction. The following figure is extracted from my ROADEF presentation; it represents two possible types of “quests” (causal diagrams). Think of a quest as a variable that we try to analyse, together with other variables (the “factors”) which we think might explain the main variable. The vertical axis represents a classification of the observed variation into three categories: the random noise in red, the variation due to factors that were not collected in the sample in orange, and the part that we may associate with the factors in green. A robust algorithm is a forecasting algorithm that accepts an important part of randomness, to the point that many quests are “pointless” (remember the “Turing test” of causality hunting). A robust algorithm should be able to exploit the positive influence of the factors (in green) when and if it exists. The picture makes it clear that we should not expect miracles: a good forecasting algorithm can only improve by a few percent over the simple prediction of the average value. What is actually difficult is to design an algorithm that is not worse – because of overfitting – than average prediction when given a quasi-random input (right column of the picture).
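The robustness requirement can be expressed as a simple test harness: on a quasi-random “pointless quest”, a candidate forecaster should not lose to the mean baseline. A sketch (my own illustration, not the Knomee code):

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_baseline(history):
    return float(np.mean(history))

def fragile_forecaster(history):
    # Deliberately fragile: extrapolates the last two points, i.e. fits the noise.
    return float(2 * history[-1] - history[-2])

def walk_forward_mse(series, forecaster, start):
    """Forecast each point from its own past, as a user would experience it."""
    errs = [(forecaster(series[:t]) - series[t]) ** 2 for t in range(start, len(series))]
    return float(np.mean(errs))

noise = rng.normal(size=120)  # a "pointless quest": pure randomness
mse_mean = walk_forward_mse(noise, mean_baseline, start=80)      # score on the last third
mse_fragile = walk_forward_mse(noise, fragile_forecaster, start=80)

# On quasi-random input, the average is hard to beat; the extrapolator loses badly.
print(mse_fragile > mse_mean)
```

Passing this kind of check on random input is exactly what distinguishes a robust forecaster from an overfitted one.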

As the title of the section suggests, I have experimented with machine generation of forecasting algorithms. This technique is also called meta-programming: a first algorithm produces code that represents a forecasting algorithm. I have used this approach many times in the past decades, from complex optimization problems to evolutionary game theory. I found it valuable many years ago when working on TV audience forecasting, because it is a good way to avoid over-fitting – a common plague when doing machine learning over a small data set – and to control the robustness properties thanks to evolutionary meta-techniques. The principle is to create a term algebra that represents instantiations and combinations of simpler algorithms. Think of it as a tool box. One lever of control (for robustness and over-fitting) is to make sure that you only select “robust tools” to put in the box. This means that you may not obtain the best or most complex machine learning algorithms, such as deep learning, but you ensure both “explainability” and control. The meta-algorithm is an evolutionary randomised search algorithm (similar to the Monte-Carlo Tree Search of AlphaZero) that may be sophisticated (using genetic combinations of terms) or simple (which is what we use for short time series).
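Here is a deliberately tiny illustration of the idea: a term algebra of three robust building blocks, combined through random convex weights, with plain random search standing in for the evolutionary meta-algorithm (all names and choices are mine, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Term algebra: each "term" maps a history (1-D array) to a one-step forecast.
def mean_term(h):
    return float(np.mean(h))

def moving_avg_term(h, k=5):
    return float(np.mean(h[-k:]))

def trend_term(h):
    slope, intercept = np.polyfit(np.arange(len(h)), h, 1)
    return float(slope * len(h) + intercept)

BLOCKS = [mean_term, moving_avg_term, trend_term]

def random_candidate():
    """A candidate algorithm: a random convex combination of the building blocks."""
    w = rng.dirichlet(np.ones(len(BLOCKS)))
    return lambda h, w=w: sum(wi * b(h) for wi, b in zip(w, BLOCKS))

def score(forecaster, series, start=20):
    """Walk-forward mean squared error of one-step forecasts."""
    errs = [(forecaster(series[:t]) - series[t]) ** 2 for t in range(start, len(series))]
    return float(np.mean(errs))

series = 5 + 0.1 * np.cumsum(rng.normal(size=80))  # a short, noisy "quest"
best = min((random_candidate() for _ in range(200)), key=lambda f: score(f, series))
print(score(best, series) < score(mean_term, series))  # the search beats the naive mean
```

In the real setting, the random draw is replaced by an evolutionary loop, and the score is regularized to penalise over-fitting.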

The forecasting algorithm used by the Knomee app is produced locally on the user’s phone from the collected data. To test robustness, we have collected self-tracking data over the past two years – for those of you who are curious to apply other techniques, the data is available on GitHub. The forecasting algorithm is the fixed point of an evolutionary search. This is very similar to reinforcement learning in the sense that each iteration is directed by a fitness function that describes the accuracy of the forecast (modulo regularization, as explained in the presentation). The training protocol is defined as running the resulting forecasting algorithm on each sample of the data set (a quest) and for each time position from 2/3 to 3/3 of the ordered time series. In other words, the score that we use is the average precision of the forecasts that a user would experience during the last third of the data collection process. The term algebra used to represent and generate forecasting algorithms is made of simple heuristics such as regression and movingAverage, of weekly and hourly time patterns, and of correlation analysis with threshold, cumulative and delay options. With the proper choice of meta-parameters to tune the evolutionary search (such as the fitness function or the depth and scope of local optimisation), this approach is able to generate a robust algorithm, that is, one that (1) generates better forecasts than the average (although not by much) and (2) is not thrown off by pseudo-random time series. Let me state clearly that this approach is not a “silver bullet”. I have compared the algorithm produced by this evolutionary search with the classical, simple machine learning approaches that one would use for time series: regression, k-means clustering and ARMA. I refer you to the great book “Machine Learning for the Quantified Self” by M. Hoogendoorn and B. Funk for a complete survey of how to use machine learning with self-tracking data.
On regular data (such as sales time series), the classical algorithms perform slightly better than evolutionary code generation. However, when real self-tracking data is used with all its randomness, evolutionary search manages to synthesise robust algorithms, which none of the three classical algorithms turn out to be.

5. Conclusion

This topic is more complex than many of the subjects that I address here. I have tried to stay away from the truly technical aspects, at the expense of scientific precision. I will conclude this post with a very short summary:
  1. Causality hunting is a fascinating topic. As we accumulate more and more data, and as Artificial Intelligence tools become more powerful, it is quite logical to hunt for causality and to build models that represent a fragment of our world knowledge through machine learning. This is, for instance, the heart of the Causality Link startup led by my friend Pierre Haren, which automatically builds knowledge graphs from textual data while extracting causal links, which are then used for deep situation analysis with scenarios.
  2. Causality hunting is hard, especially with small data and even more with “Quantified Self” data, because of the random nature of many of the time series that are collected with connected devices. It is also hard because we cannot track everything, and quite often what we are looking for depends on other variables (the orange part of the previous picture).
  3. Forecasting is an interesting tool for causality hunting. This is counter-intuitive since forecasting is close to impossible with self-tracking data. A better formulation would be: “a moderate amount of robust forecasting may help with causality hunting”. Forecasting gives a hint of “predictive causality”, in the sense of Granger causality, and it also serves to enrich the pleasure-surprise-discovery learning loop of self-tracking.
  4. Machine code generation through reinforcement learning is a powerful technique for short time-series forecasting. Code-generating algorithms try to assemble building blocks from a given set to match a given output. When applied to self-tracking forecasting, this technique makes it possible to craft algorithms that are robust to random noise (able to recognise such data for what it is) and able to extract a weak correlative signal from a complex (although short) data set.

Tuesday, January 29, 2019

What Today’s AI can and cannot do (Part 2)

1. Introduction

This is a sequel to the previous post, which was mostly about Kai-Fu Lee’s book “AI Superpowers”. Very simply stated, that book develops two threads of thought:
  1. AI technology has reached a maturity level where it is ready for massive deployment and application. This should change the way we operate many processes/services/products in our world.
  2. The AI tool box is now ready; what matters is scaling and engineering, more than new science. Hence the race will favour those ecosystems where speed and efficiency meet with software engineering and business skills.

The first part should come as no surprise. There are a number of other good books, such as “The Mathematical Corporation – where machine intelligence + human ingenuity achieve the impossible” and “Human+AI – reimagining work in the age of AI” that make the same point with lots of convincing examples. The report of the National Academy of Technologies of France on Artificial Intelligence and Machine Learning is saying exactly the same thing: we have reached a tipping point and the time to act is now.
The second part is more controversial. There have been a lot of heated reactions to Kai-Fu Lee’s statement about the state of AI and the chances of Europe being among the winning players in the years to come. This debate is part of a larger one about the hype and the fake statements about what is possible today. We may summarize the “AI paradoxes” or “open questions” as follows:
  • Is today’s AI ready for wonders, or are there so many impossibilities today that many claims are hyped?
  • Is the next generation of autonomous AI around the corner, or is AGI a pure fiction that is totally out of reach?
  • Should one just focus on data to build the best AI strategy (i.e., become your own business’ best data source), or is there more to AI mastery than data?
  • Will, as Kai-Fu Lee seems to suggest, only massive players dominate, or should we expect to see some breakthroughs from small players?

To try to shed some light on those questions, I propose a short synthesis of Martin Ford’s book, “Architects of Intelligence – The Truth about AI from the People Building It”, where 23 world experts share their views about the future of AI. At the time of this writing, this book is the best source to search for answers to the previous four open questions. The thesis of this post is that, while we have indeed reached a tipping point with AI, and while the “current level of AI” technology enables a world race of investment and development, there is a larger field of “tomorrow’s AI” for which predictions are hazardous at best. Martin Ford’s book is an absolute must-read for anyone who is interested in AI. As I said in the previous post, I find it a great source to explore the questions and issues raised by Kai-Fu Lee’s book, but there are many other topics addressed in this book that I will not touch today.

2. Architects of Intelligence

Martin Ford is a well-known futurist and author who has worked extensively on topics such as AI, automation, robots and the future of work. His previous book, “Rise of the Robots – Technology and the Threat of a Jobless Future”, is a thought-provoking essay that addresses the issues of “AI and the future of work” and which I consider a personal reference on this topic. His new book, “Architects of Intelligence”, is a collection of 23 interviews with the world’s best-known scientists in the field of Artificial Intelligence and Machine Learning. You may think of it as an extended version of the Wired article “How to teach Artificial Intelligence common sense”. Although each interview is different, Martin Ford uses a common canvas of questions that clearly intersect with the four introductory issues. The exceptional quality of the book comes both from the very distinguished list of scientists and from the talent and knowledge of the interviewer.
In his first chapter Martin Ford says: “All would acknowledge the remarkable achievements of deep neural networks over the past decade, but they would likely argue that deep learning is just “one tool in the toolbox” and that continued progress will require integrating ideas from other spheres of artificial intelligence”. This book provides a remarkable synthesis of the AI topic, but I should say beforehand that you should read it, because this post only covers a small part of the content. A summary is next to impossible since, even though there is a strong common thread of ideas shared by the majority of experts, there are also dissenting opinions. Therefore, what follows is my own synthesis, representing an editor’s choice of both the topics and the selected voices, even though I try to be as faithful as possible. Because of the multiple opinions and the dissenting topics, I have decided to include a fair number of quotes and to attribute them, consistently, to one of the interviewed scientists. A synthesis of so many different viewpoints is biased by nature. I try to stay faithful to the spirit of both the scientists from whom I borrow the quotations and Martin Ford as the book editor, but you may disagree.

2.1 Even the best experts are very careful about what tomorrow’s AI will be and will do: we do not know what’s ahead.

This is one of the most consensual statements I will make in this synthesis: all experts are very careful about what the future of AI will look like. Yoshua Bengio insists that each new discovery changes the landscape of what will be possible next: “As we reach this satisfying improvement that we are getting in our techniques—we reach the top of the first hill—we also see the limitations, and then we see another hill that we have to climb, and once we climb that one we’ll see another one, and so on. It’s impossible to tell how many more breakthroughs or significant advances are going to be needed before we reach human-level intelligence.” Fei-Fei Li explains that this is only the beginning, that convolutional networks and deep learning are not the final tools that will solve all problems. She warns us that “Dating when a breakthrough will come, is much harder to predict. I learned, as a scientist, not to predict scientific breakthroughs, because they come serendipitously, and they come when a lot of ingredients in history converge. But I’m very hopeful that in our lifetime we’ll be seeing a lot more AI breakthroughs given the incredible amount of global investment in this area”. Many other scientists use the same language: we don’t know, the path is unclear, etc. There is a strong worry about the hype and exaggeration that could cause a new winter or unsubstantiated fears, as said by Andrew Ng: “A lot of the hype about superintelligence and exponential growth were based on very naive and very simplistic extrapolations. It’s easy to hype almost anything. I don’t think that there is a significant risk of superintelligence coming out of nowhere and it happening in a blink of an eye, in the same way that I don’t see Mars becoming overpopulated overnight”. Rodney Brooks explains that hundreds of new algorithms need to be invented before we can address all the limitations of current AI.
He also notices that even the technology trends may become more difficult to forecast when we enter the end of Moore’s Law: “We’re used to exponentials because we had exponentials in Moore’s Law, but Moore’s Law is slowing down because you can no longer halve the feature size. What it’s leading to though is a renaissance of computer architecture. For 50 years, you couldn’t afford to do anything out of the ordinary because the other guys would overtake you, just because of Moore’s Law”.

2.2 Even though there is no consensus on what “hybrid” may mean, it is most likely that a “system of systems” approach will prevail to solve the current challenges of AI.

Even the fathers of modern deep learning are looking for a way to add structure and architecture to neural nets in order to address larger challenges than perception and recognition. Yoshua Bengio says: “Note that your brain is all neural networks. We have to come up with different architectures and different training frameworks that can do the kinds of things that classical AI was trying to do, like reasoning, inferring an explanation for what you’re seeing and planning”. When we look at the human brain, there seems to be much structure and specialization that occurs before birth. Here is what Joshua Tenenbaum says: “Elizabeth Spelke is one of the most important people that anybody in AI should know if they’re going to look to humans. She has very famously shown that from the age of two to three months, babies already understand certain basic things about the world…. It used to be thought that that was something that kids came to and learned by the time they were one year old, but Spelke and others have shown that in many ways our brains are born already prepared to understand the world in terms of physical objects, and in terms of what we call intentional agents.” The debate starts when it comes to defining what the best paradigm could be to add this structure. For Yann Lecun, “Everybody agrees that there is a need for some structure, the question is how much, and what kind of structure is needed. I guess when you say that some people believe that there should be structures such as logic and reasoning, you’re probably referring to Gary Marcus and maybe Oren Etzioni”.

The majority of scientists advocate for a hybrid approach that combines different forms of AI. Stuart Russell explains that “Carnegie Mellon’s Libratus poker AI was another very impressive hybrid AI example: it was a combination of several different algorithmic contributions that were pieced together from research that’s happened over the last 10 or 15 years”. He explains the value of randomized algorithms, a technique applied universally from AI (such as AlphaGo and MCTS) to operations research, network and cryptography algorithms. According to him, “The only way that humans and robots can operate in the real world is to operate at multiple scales of abstraction”. Andrew Ng acknowledges that hybrid combination is de facto a standard for many systems: “At Landing AI we use hybrids all the time to build solutions for industrial partners. There’s often a hybrid of deep learning tools together with, say, traditional computer vision tools because when your datasets are small, deep learning by itself isn’t always the best tool”. Judea Pearl makes a great argument about the constraints imposed by small data sets but then extends it to the problem of understanding causality: “Even today, people are building hybrid systems when you have sparse data. There’s a limit, however, to how much you can extrapolate or interpolate sparse data if you want to get cause-effect relationships. Even if you have infinite data, you can’t tell the difference between A causes B and B causes A”. Evolutionary algorithms – where machine learning tries to simulate and reproduce evolution – are a plausible path to develop hybrid architectures for AI, as illustrated by Joshua Tenenbaum: “Evolution does a lot of architecture search; it designs machines. It builds very differently, structured machines across different species or over multiple generations. We can see this most obviously in bodies, but there’s no reason to think it’s any different in brains.
The idea that evolution builds complex structures that have complex functions, and it does it by a process which is very different to gradient descent, but rather something more like search in the space of developmental programs, is very inspiring to me.”
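To make the idea of evolution as architecture search concrete, here is a deliberately tiny sketch of my own (an illustration, not anyone’s actual method): the “architecture” is reduced to a single gene, the degree of a polynomial regressor, and a small population evolves by mutation and selection against held-out error.

```python
import random
import numpy as np

rng = np.random.default_rng(3)

# Target function is a cubic; the "architecture" being searched is simply
# the degree of the polynomial used by the inner least-squares learner.
x = rng.uniform(-1, 1, 200)
y = 2 * x**3 - x + 0.1 * rng.normal(size=200)
x_tr, y_tr, x_va, y_va = x[:150], y[:150], x[150:], y[150:]

def fitness(degree):
    """Validation error of a least-squares polynomial fit of that degree
    (lower is better)."""
    coefs = np.polyfit(x_tr, y_tr, degree)
    return np.mean((np.polyval(coefs, x_va) - y_va) ** 2)

random.seed(3)
population = [random.randint(1, 15) for _ in range(6)]
for _ in range(10):  # generations
    # Selection: keep the three fittest architectures...
    population.sort(key=fitness)
    survivors = population[:3]
    # ...and refill the population with mutated copies (+/- 1 degree).
    children = [max(1, min(15, d + random.choice((-1, 1)))) for d in survivors]
    population = survivors + children

best = min(population, key=fitness)
print("evolved polynomial degree:", best)
```

On this toy problem, selection settles on a degree of at least three, the minimum capacity needed to capture the cubic target; the same select-and-mutate loop is, conceptually, what genuine neural architecture search applies to much richer genomes.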

2.3 Deep Learning is “the technology advance of the decade” and we are only seeing the beginning of the consequences, but it is not a universal problem-solving technique.

There is more to AI than deep learning. Stuart Russell recalls that Deep Learning is a strict subset of Machine Learning, which is only one kind of AI: “it would be a huge mistake for someone to think that deep learning is the same thing as artificial intelligence, because the ability to distinguish Dalmatian dogs from bowls of cherries is useful but it is still only a very small part of what we need to give an artificial intelligence in order for it to be successful”. He recalls that AlphaGo is a hybrid of a classical search-based randomized algorithm and a deep learning algorithm for position evaluation. This is also what Demis Hassabis says: “Deep learning is amazing at scaling, so combining that with reinforcement learning allowed it to scale to these large problems that we’ve now tackled in AlphaGo and DQN—all of these things that people would have told you was impossible 10 years ago”.

Gary Marcus is famous for his position that we need more than Deep Learning, especially because it requires a huge amount of data and delivers low levels of abstraction from that data. This means that these algorithms are well suited to “the few big common problems” – such as vision or speech – and less so to the large number of less frequent ones: “Neural networks are able to capture a lot of the garden-variety cases, but if you think about a long-tail distribution, they’re very weak at the tail.” Oren Etzioni also sees Deep Learning as one tool in the toolbox, very good at what it does but with a rather narrow scope: “we really have a long way to go and there are many unsolved problems. In that sense, deep learning is very much overhyped. I think the reality is that deep learning, and neural networks are particularly nice tools in our toolbox, but it’s a tool that still leaves us with a number of problems like reasoning, background knowledge, common sense, and many others largely unsolved”. I would argue, differently, that the problems that Deep Learning allows us to solve – perception – had been plaguing the scientific community for decades, and that solving them makes Deep Learning more than a “really nice tool”. Like many other scientists, I believe that the availability of DL for perception is opening a new era of hybrid approaches (as is precisely demonstrated by AlphaGo).

2.4 Scientific investigation is running at full speed: many domains are progressing fast and new techniques are constantly added to the toolbox.

Natural Language Processing is a perfect example of a field that is making constant progress, fueled by the lower-level advances in speech and text recognition brought by Deep Learning. We have made great progress – translation being a prime example – but the consensus is that we are reaching the barrier of semantics (there is only so much you can do with automatic translation without understanding). This is especially critical for dialogue (think chatbots), as explained by Barbara Grosz: “If you consider any of the systems that purport to carry on dialogues, however, the bottom line is they essentially don’t work. They seem to do well if the dialogue system constrains the person to following a script, but people aren’t very good at following a script. There are claims that these systems can carry on a dialogue with a person, but in truth, they really can’t.” Yoshua Bengio explains the search for semantic understanding: “There’s a lot of research in grounded language learning now trying to build an understanding of language, even if it’s a small subset of the language, where the computer actually understands what those words mean, and it can act in correspondence to those words”. David Ferrucci, who was part of the Watson team, works along the same path: “Elemental Cognition is an AI research venture that’s trying to do real language understanding. It’s trying to deal with that area of AI that we still have not cracked, which is, can we create an AI that reads, dialogs, and builds understanding”.

It is well recognized that Deep Learning works better than we are able to explain or understand, maybe for good reasons. This opens a huge research field: better understanding and characterizing deep learning and neural network techniques. The work of Yoshua Bengio on autoencoders is a good example of the multiple possible applications of these techniques besides pure pattern recognition: “There are two parts to an autoencoder, an encoder and a decoder. The idea is that the encoder part takes an image, for example, and tries to represent it in a compressed way, such as a verbal description.” Better explainability is another hot research topic; for instance, James Manyika talks about LIME (Local Interpretable Model-Agnostic Explanations): “LIME tries to identify which particular data sets a trained model relies on most to make a prediction. Another promising technique is the use of Generalized Additive Models, or GAMs. These use single feature models additively and therefore limit interactions between features, and so changes in predictions can be determined as features are added”.
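Bengio’s encoder/decoder description can be made concrete in a few lines. The sketch below is my own toy illustration (all names and hyperparameters are arbitrary): a linear autoencoder trained by gradient descent to push 3-D points that secretly live on a line through a 1-D bottleneck and reconstruct them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 points in 3-D that really live on a 1-D line, plus noise.
t = rng.normal(size=(200, 1))
X = t @ np.array([[1.0, 2.0, -1.0]]) + 0.05 * rng.normal(size=(200, 3))

# Linear autoencoder: encoder W_e (3 -> 1), decoder W_d (1 -> 3).
W_e = rng.normal(scale=0.1, size=(3, 1))
W_d = rng.normal(scale=0.1, size=(1, 3))

lr = 0.05
for _ in range(5000):
    Z = X @ W_e            # compressed code (the "verbal description")
    X_hat = Z @ W_d        # reconstruction
    err = X_hat - X
    # Gradients of the mean squared reconstruction error.
    grad_d = Z.T @ err / len(X)
    grad_e = X.T @ (err @ W_d.T) / len(X)
    W_d -= lr * grad_d
    W_e -= lr * grad_e

mse = np.mean((X @ W_e @ W_d - X) ** 2)
print(f"reconstruction error through the 1-D bottleneck: {mse:.4f}")
```

A linear autoencoder like this one recovers (up to scaling) the principal component of the data; nonlinear encoders and decoders generalize the same reconstruction objective.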

Yann Lecun makes a very interesting distinction between three levels (on a continuum) of learning: reinforcement learning, supervised learning, and self-supervised learning. Self-supervised learning supports the constant enrichment of the learning model, as shown in another example brought up by Oren Etzioni: “Tom Mitchell’s work, with lifelong learning at CMU, is also very interesting—they’re trying to build a system that looks more like a person: it doesn’t just run through a dataset and build a model and then it’s done. Instead, it continually operates and continually tries to learn, and then learn based on that, over a longer extended period of time”. Yann Lecun is famous for his argument against the hype around reinforcement learning, saying that you could not learn how to drive this way without too many – possibly fatal – crashes. As a reinforcement learning practitioner, I find this argument biased: one could grow the speed very progressively while the evaluation function penalizes small deviations from a safe situation, which is more or less the way we learn to drive.
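My counter-argument can be illustrated with a toy experiment (my own sketch, not a claim about how autonomous driving is actually trained): tabular Q-learning on a one-dimensional lane-keeping task, where random drift plays the role of speed and is increased progressively, while the reward penalizes small deviations from the centre long before a “crash” occurs.

```python
import random

# States: lane offset -3..3 (|3| = off the road). Actions: steer -1, 0, +1.
STATES = range(-3, 4)
ACTIONS = (-1, 0, 1)

def step(pos, action, drift_prob):
    """One time step: steering plus random drift whose strength plays
    the role of 'speed' in the curriculum."""
    pos += action
    if random.random() < drift_prob:
        pos += random.choice((-1, 1))
    pos = max(-3, min(3, pos))
    # Small deviations cost a little, going off-road costs a lot.
    reward = -abs(pos) - (10 if abs(pos) == 3 else 0)
    return pos, reward

Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
alpha, gamma, eps = 0.2, 0.9, 0.1

random.seed(0)
# Curriculum: train at gradually increasing drift ("speed").
for drift_prob in (0.1, 0.3, 0.5):
    for _ in range(3000):
        pos = random.choice([-1, 0, 1])
        for _ in range(20):
            a = (random.choice(ACTIONS) if random.random() < eps
                 else max(ACTIONS, key=lambda x: Q[(pos, x)]))
            nxt, r = step(pos, a, drift_prob)
            best_next = max(Q[(nxt, b)] for b in ACTIONS)
            Q[(pos, a)] += alpha * (r + gamma * best_next - Q[(pos, a)])
            pos = nxt

# The learned policy steers back toward the centre from either side.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
print(policy)
```

Because deviations are penalized from the start and the difficulty only grows once safe steering has been learned, the agent reaches a sensible policy without ever needing catastrophic failures as its main training signal.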

Many large players such as Google or Salesforce are investing massively in AutoML, the addition of another layer of machine learning methods to automatically tune the parameters of lower-level machine learning techniques. Fei-Fei Li says: “An example of what we’re doing is a product we created that’s called AutoML. This is a unique product on the market to really lower the entry barrier of AI as much as possible—so that AI can be delivered to people who don’t do AI”. AutoML is a critical tool for continuous learning, and thus much more than a time saver: it supports continuous adaptation to an environment that may be changing. This is why it is a core technology for Digital Manufacturing, as shown by a startup like TellMePlus. AutoML is clearly a technology to watch according to Jeffrey Dean: “We also have a suite of AutoML products, which are essentially designed for people who may not have as much machine learning expertise, but want a customized solution for a particular problem they have. Imagine if you have a set of images of parts that are going down your assembly line and there are 100 kinds of parts, and you want to be able to identify what part it is from the pixels in an image. There, we can actually train you a custom model without you having to know any machine learning through this technique called AutoML.”
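Stripped to its core, AutoML is an outer optimization loop wrapped around an inner learner. Here is a minimal sketch of that idea (my own illustration, unrelated to Google’s or TellMePlus’s actual products): random search over the regularization strength of a ridge regression, scored on held-out data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression data split into train / validation.
X = rng.normal(size=(120, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + 0.3 * rng.normal(size=120)
X_tr, y_tr, X_va, y_va = X[:80], y[:80], X[80:], y[80:]

def fit_ridge(X, y, lam):
    """Closed-form ridge regression: the 'inner' learner."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# The 'outer' loop: random search over the regularization strength,
# scored on held-out data -- the essence of hyperparameter-level AutoML.
best_lam, best_err = None, float("inf")
for _ in range(30):
    lam = 10 ** rng.uniform(-4, 2)        # sample log-uniformly
    w = fit_ridge(X_tr, y_tr, lam)
    err = np.mean((X_va @ w - y_va) ** 2)
    if err < best_err:
        best_lam, best_err = lam, err

print(f"selected lambda={best_lam:.4g}, validation MSE={best_err:.3f}")
```

Production AutoML systems replace the random sampler with smarter strategies (Bayesian optimization, learned controllers) and search over whole model families, but the structure – an outer loop tuning an inner learner against validation performance – is the same.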

Last, I want to point out a key insight from Judea Pearl: there are many things that you cannot learn by simply watching; you have to act – causality cannot be learned through simple observation. I will return to this idea when I write about Judea Pearl’s book “The Book of Why – the new science of causes and effects”, which is probably the deepest book I read in 2018. I believe that this insight will have a profound effect on future machine learning algorithms, especially for autonomous robots: “This is how a child learns causal structure, by playful manipulation, and this is how a scientist learns causal structure—playful manipulation. But we have to have the abilities and the template to store what we learn from this playful manipulation so we can use it, test it, and change it. Without the ability to store it in a parsimonious encoding, in some template in our mind, we cannot utilize it, nor can we change it or play around with it”. I see a delightful parallel there with Nassim Taleb’s idea that you cannot really learn without “skin in the game”.
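Pearl’s point that observation alone cannot reveal causal direction is easy to demonstrate with a short simulation (my own toy example): two opposite causal models that produce exactly the same joint distribution, which only an intervention can tell apart.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Model 1: A causes B.
A1 = rng.normal(size=n)
B1 = 0.8 * A1 + 0.6 * rng.normal(size=n)

# Model 2: B causes A, with coefficients chosen to give the same joint law.
B2 = rng.normal(size=n)
A2 = 0.8 * B2 + 0.6 * rng.normal(size=n)

# Observationally the two models are indistinguishable: same correlation.
print(np.corrcoef(A1, B1)[0, 1], np.corrcoef(A2, B2)[0, 1])

# An intervention separates them: force A to a constant, do(A = 2).
B1_do = 0.8 * 2.0 + 0.6 * rng.normal(size=n)  # in model 1, B responds to A
B2_do = rng.normal(size=n)                    # in model 2, B is untouched
print(B1_do.mean(), B2_do.mean())
```

Both models yield the same correlation (about 0.8), but forcing A to a fixed value moves B only in the model where A is actually the cause: this is the difference between seeing and doing.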

2.5 Although the AGI concept is difficult to pin down, there is a consensus that a “spectacularly better” level of AI performance will be achieved this century.

Stuart Russell reminds us that the goal of AI has always been to create general-purpose intelligent machines. Because this is hard, most of the work has been applied to more specific subtasks and application tasks, but “arguably, some of the conceptual building blocks for AGI have already been here for decades. We just haven’t figured out yet how to combine those with the very impressive learning capacities of deep learning.” Fei-Fei Li’s position is not that different: “So, let’s first define AGI, because this isn’t about AI versus AGI: it’s all on one continuum. We all recognize today’s AI is very narrow and task specific, focusing on pattern recognition with labeled data, but as we make AI more advanced, that is going to be relaxed, and so in a way, the future of AI and AGI is one blurred definition”. Some research scientists, like Demis Hassabis, have no difficulty recognizing their end goal: “From the beginning, we were an AGI company, and we were very clear about that. Our mission statement of solving intelligence was there from the beginning. As you can imagine, trying to pitch that to standard venture capitalists was quite hard”.

As noted in the first section, it is impossible to speculate about future scientific discoveries, so many scientists prefer to stay vague about the timeline. For instance, Stuart Russell says: “So that is why most AI researchers have a feeling that AGI is something in the not-too-distant future. It’s not thousands of years in the future, and it’s probably not even hundreds of years in the future”. Yann Lecun has a similar position: “How much prior structure do we need to build into those systems for them to actually work appropriately and be stable, and for them to have intrinsic motivations so that they behave properly around humans? There’s a whole lot of problems that will absolutely pop up, so AGI might take 50 years, it might take 100 years, I’m not too sure”. One thing seems clear though: AGI is not around the corner, as Andrew Ng says: “The honest answer is that I really don’t know. I would love to see AGI in my lifetime, but I think there’s a good chance it’ll be further out than that. … Frankly, I do not see much progress. Other than having faster computers and data, and progress at a very general level, I do not see specific progress toward AGI”.

It is hard to speculate about the “when”, but many scientists have an opinion about what the path towards AGI could look like. For Stuart Russell, “many of the conceptual building blocks needed for AGI or human-level intelligence are already here. But there are some missing pieces. One of them is a clear approach to how natural language can be understood to produce knowledge structures upon which reasoning processes can operate.” For Gary Marcus, capturing common sense requires adding other techniques than Deep Learning: “Another way to put it is that humans have all kinds of common-sense reasoning, and that has to be part of the solution. It’s not well captured by deep learning. In my view, we need to bring together symbol manipulation, which has a strong history in AI, with deep learning. They have been treated separately for too long, and it’s time to bring them together”. Joshua Tenenbaum makes a compelling argument that deep understanding of natural language (which has to be measured in other ways than the Turing test, which most scientists recognize as too easy to fool) is on the critical path towards AGI: “Language is absolutely at the heart of human intelligence, but I think that we have to start with the earlier stages of intelligence that are there before language, but that language builds on. If I was to sketch out a high-level roadmap to building some form of AGI of the sort you’re talking about, I would say you could roughly divide it into three stages corresponding to three rough stages of human cognitive development”.

2.6 Our future will be strongly impacted by the constant progress of AI in applicability and performance.

In a way very similar to Kai-Fu Lee or the two books that I quoted in the introduction, most scientists interviewed in Martin Ford’s book see AI as a powerful force of transformation. Stuart Russell says: “what’s likely to happen is that machines will far exceed human capabilities along various important dimensions. There may be other dimensions along which they’re fairly stunted and so they’re not going to look like humans in that sense”. Yoshua Bengio also foresees a very strong impact of AI to come: “I don’t think it’s overhyped. The part that is less clear is whether this is going to happen over a decade or three decades. What I can say is that even if we stop basic research in AI and deep learning tomorrow, the science has advanced enough that there’s already a huge amount of social and economic benefit to reap from it simply by engineering new services and new products from these ideas”. Geoffrey Hinton explains that there are many more dangerous technologies out there (such as molecular biology) compared to the threat of “ultra-intelligent systems”. He sees the probable outcome as positive, even though he raises the need for social regulation, which we will discuss later: “I hope the rewards will outweigh the downsides, but I don’t know whether they will, and that’s an issue of social systems, not with the technology”. Andrew Ng is confident that AI technology has reached the critical mass for delivering value and that we should not experience a new AI winter: “In the earlier AI winters, there was a lot of hype about technologies that ultimately did not really deliver. The technologies that were hyped were really not that useful, and the amount of value created by those earlier generations of technology was vastly less than expected. I think that’s what caused the AI winters”.

Artificial Intelligence is very much seen as “Augmented Intelligence”, a technique that will help humans be more efficient. For instance, Rana el Kaliouby says: “My thesis is that this kind of interface between humans and machines is going to become ubiquitous, that it will just be ingrained in the future human-machine interfaces, whether it’s our car, our phone or smart devices at our home or in the office. We will just be coexisting and collaborating with these new devices, and new kinds of interfaces”. Barbara Grosz sees a similar pattern in the world of multi-agent systems: “This whole area of multi-agent systems now addresses a wide range of situations and problems. Some work focuses on strategic reasoning; other on teamwork. And, I’m thrilled to say, more recently, much of it is now really looking at how computer agents can work with people, rather than just with other computer agents”. David Ferrucci at Elemental Cognition envisions a future where human and machine intelligence collaborate tightly and fluently: “Through thought-partnership with machines that can learn, reason, and communicate, humans can do more because they don’t need as much training and as much skill to get access to knowledge and to apply it effectively. In that collaboration, we are also training the computer to be smarter and more understanding of the way we think”.

A key point made by many scientists is that AI is part of a larger ecosystem – software, digitalization, connected objects and robots – and will not move forward independently but together with other technology advances. For Daniela Rus, “the advances in navigation were enabled by hardware advances. When the LIDAR sensor—the laser scanner—was introduced, all of a sudden, the algorithms that didn’t work with sonar started working, and that was transformational”. Rodney Brooks insists that we need to take a larger perspective when we want to understand the future transformation: “I don’t deny that, but what I do deny is when people say, oh that’s AI and robots doing that. As I say, I think this is more down to digitalization”.

2.7 Faced with the speed of change, it is likely that our mature economies will need adaptive mechanisms, similar to a “universal basic income”.

The ideal mechanism to help society adapt to an era of fast transformation driven by increased automation is yet to be determined, but there is a consensus among the scientists interviewed by Martin Ford that something is necessary. Yann Lecun says: “Economic disruption is clearly an issue. It’s not an issue without a solution, but it’s an issue with considerable political obstacles, particularly in cultures like the US where income and wealth redistribution are not something that’s culturally accepted”. James Manyika has a strong opinion on the possible scope of AI-fueled automation: “By looking at all this, we have concluded that on a task level in the US economy, roughly about 50% of activities—not jobs, but tasks, and it’s important to emphasize this—that people do now are, in principle, automatable”. Gary Marcus has a similar view, even if the timeline is stretched: “Driverless cars are harder than we thought, so paid drivers are safe for a while, but fast-food workers and cashiers are in deep trouble, and there’s a lot of them in the workplace. I do think these fundamental changes are going to happen. Some of them will be slow, but in the scale of, say, 100 years, if something takes an extra 20 years, it’s nothing”. The conclusion, as pointed out by Daphne Koller, is that society needs to think hard about the disruption that is coming: “Yes, I think that we are looking at a big disruption on the economic side. The biggest risk/opportunity of this technology is that it will take a lot of jobs that are currently being done by humans and have those be taken over to a lesser or greater extent by machines. There are social obstacles to adoption in many cases, but as robust increased performance is demonstrated, it will follow the standard disruptive innovation cycle”.

For many scientists, the solution will look very much like Universal Basic Income (UBI), from Geoffrey Hinton: “Yes, I think a basic income is a very sensible idea” to Yoshua Bengio: “I think a basic income could work, but we have to take a scientific view on this to get rid of our moral priors that say if a person doesn’t work, then they shouldn’t have an income. I think it’s crazy”. Gary Marcus sees UBI as the only solution to the inevitable job loss: “I see no real alternative. We will get there, but it’s a question of whether we get there peacefully through a universal agreement or whether there are riots on the street and people getting killed. I don’t know the method, but I don’t see any other ending”. Ray Kurzweil provides the optimistic version, since he expects the overall wealth created by AI to make this affordable: “I made a prediction at TED that we will have universal basic income, which won’t actually need to be that much to provide a very high standard of living, as we get into the 2030s”.

The exact formula is not clear, and some object to a universal income as an “inactivity subsidy”, but everyone is calling for something to soften the blow of the upcoming transformation. For instance, Demis Hassabis says: “I think that’s the key thing, whether that’s universal basic income, or it’s done in some other form. There are lots of economists debating these things, and we need to think very carefully about how everyone in society will benefit from those presumably huge productivity gains, which must be coming in, otherwise it wouldn’t be so disruptive”. James Manyika welcomes the debate about UBI because he agrees that something has to happen, although he points out that work brings more than income, so the replacement of jobs cannot be about income alone: “My issue with it is that I think it misses the wider role that work plays. Work is a complicated thing because while work provides income, it also does a whole bunch of other stuff. It provides meaning, dignity, self-respect, purpose, community and social effects, and more. By going to a UBI-based society, while that may solve the wage question, it won’t necessarily solve these other aspects of what work brings”. David Ferrucci also welcomes the need for regulation, while pointing out how hard it is to regulate without slowing down the advances and the benefits of technology for society. Ultimately, what is needed is not only universal basic income, but universal contribution to society and universal training. Joshua Tenenbaum is more open about possible solutions in the future: “We should think about a basic income, yes, but I don’t think anything is inevitable. Humans are a resilient and flexible species. Yes, it might be that our abilities to learn and retrain ourselves have limitations to them.
If technology keeps advancing, especially at this pace, it might be that we might have to do things like that. But again, we’ve seen that happen in previous eras of human history. It’s just unfolded more slowly”. Yann Lecun emphasizes the importance of training when society needs to undergo a strong technology shift: “You would think that as technological progress accelerates, there’d be more and more people left behind, but what the economists say is that the speed at which a piece of technology disseminates in the economy is actually limited by the proportion of people who are not trained to use it”. A similar view is expressed by Andrew Ng: “I don’t support a universal basic income, but I do think a conditional basic income is a much better idea. There’s a lot about the dignity of work and I actually favor a conditional basic income in which unemployed individuals can be paid to study”.

2.8 Most AI experts are optimistic about the future that AI will make possible for our societies, despite the complex transformation journey.

The consensus is rather positive about what AI will enable society to accomplish in the years to come. For instance, Stuart Russell says: “As an optimist, I can also see a future where AI systems are well enough designed that they’re saying to humans, “Don’t use us. Get on and learn stuff yourself. Keep your own capabilities, propagate civilization through humans, not through machines.” Stuart Russell explains that AI should be controllable and safe by design, otherwise it fails its purpose. For Yann Lecun, the rise of automation will create a premium on meaningful human interaction: “Everything that’s done by machine is going to get a lot cheaper, and anything that’s done by humans is going to get more expensive. We’re going to pay more for authentic human experience, and the stuff that can be done by machine is going to get cheap”. For James Manyika, the race for AI applications has already started (“the genie is out of the bottle”) and this is good because “we’re about to enter a new industrial revolution. I think these technologies are going to have an enormous, transformative and positive impact on businesses, because of their efficiency, their impact on innovation, their impact on being able to make predictions and to find new solutions to problems, and in some case go beyond human cognitive capabilities”. Oren Etzioni quotes one of his colleagues, Eric Horvitz, to point out that the advances that AI will enable are badly needed, that is, the risk of not using AI is higher than the risk of moving forward: “He has a great quote when he responds to people who are worried about AI taking lives. He says that actually, it’s the absence of AI technology that is already killing people. The third-leading cause of death in American hospitals is physician error, and a lot of that could be prevented using AI. So, our failure to use AI is really what’s costing lives”.

Although I use the term “race” to reflect the intensity of the competition – and also as a link to Kai-Fu Lee’s book – most scientists do not see the international competition as a zero-sum game. On the contrary, there is room for collaboration as well. For instance, Andrew Ng says: “AI is an amazing capability, and I think every country should figure out what to do with this new capability, but I think that it is much less of a race than the popular press suggests”. Still, others see the competition as a race, as pointed out by David Ferrucci: “To stay competitive as a nation you do have to invest in AI to give a broad portfolio. You don’t want to put all your eggs in one basket. You have to attract and maintain talent to stay competitive, so I think there’s no question that national boundaries create a certain competition because of how much it affects competitive economics and security”. This intense competition should receive the attention of public offices, and some form of regulation is called for. For instance, Oren Etzioni says: “Yes, I think that regulation is both inevitable and appropriate when it comes to powerful technologies. I would focus on regulating the applications of AI—so AI cars, AI clothes, AI toys, and AI in nuclear power plants, rather than the field itself. Note that the boundary between AI and software is quite murky!” We see once more the idea that AI is not an isolated technology, but a feature of advanced software systems. As noted earlier, many scientists insist on the critical need for training to help people adapt to this new technology wave. To end on a positive note, Daniela Rus quotes the example of BitSource: “BitSource was launched a couple of years back in Kentucky, and this company is retraining coal miners into data miners and has been a huge success. This company has trained a lot of the miners who lost their jobs and who are now in a position to get much better, much safer and much more enjoyable jobs”.

3. Conclusion  

 This post is already quite long, due to the richness of the source and the complexity of the questions. I will simply end with a small summary of my key convictions (hence my key biases) about AI’s future in the years to come:
  1. Nothing summarizes my way of thinking better than Pedro Domingos’s great quote that I used earlier: “People worry that computers will get too smart and take over the world, but the real problem is that they’re too stupid and they’ve already taken over the world”. This is the first part of the introduction statement: the AI revolution – with all its current shortcomings – has already started.
  2. “Unit techniques” – such as deep neural nets or Bayesian networks – will continue to improve (because the domain is still young, and because there is a lot that we do not understand about our own successes with these techniques), but I expect to see more of the improvement coming from a “system of systems” approach. Hence system engineering is a critical part of AI science and technology, as the reading of Martin Ford’s book makes abundantly clear. However, the better understanding of deep learning is a fascinating quest (as shown in the book or in this great blog post from the Google Brain team) that may bring revolutionary advances of its own.
  3. It is very hard, not to say impossible, to forecast the next advances in AI methods. This makes predicting China’s future superiority, or Europe’s impediment, a risky proposition. Europe has a lot of strength in its science, academic and innovation networks. At the same time (“en même temps”), extracting the value from what is currently at hand requires a level of investment, business drive, access to data and computing resources, and software engineering skills that Europe is comparatively lacking, as pointed out by Kai-Fu Lee. If you have a hard time understanding why massive computing power matters in AI, listen to Azeem Azhar’s conversation with Jack Clark in this great podcast.
  4. I am a strong believer that biomimicry will continue to lead AI research towards new techniques and paradigms. For instance, it is safe to bet that progress in perception will lead to progress in reasoning. Because there is a continuum between narrow and general AI, or between weak and strong AI, progress will surprise us, even though the “endgame ambition” of AGI is probably far away. The excessive hype about what AI can do today, and the heated debates about “superintelligence”, have created a reaction, especially from research scientists over 50, claiming that “there is nothing new nor truly exciting about AI today”. I most definitely disagree. Though no one is able to foresee the next advances in AI science, I have a strong conviction that the accumulation of advances over the past decade will create amazing new combinations and applications in the decade to come.
