Sunday, June 30, 2019

The Future of Work and the Transformation of Jobs

1. Introduction

Today I return to the “future of work” topic, especially in the face of automation, from robots to AI services. I have already written about the future of work on my other blog, in a piece that was reproduced on another website. During the last three years, I have given multiple talks about the future of work and engaged with various audiences. I have found that some of my arguments needed rewriting or better explanation to be understood. During the same three years, we have witnessed numerous examples of the continuous progress of automation. What I have seen in the technology field has strengthened my conviction that a really big shift is coming: automation in the next two decades will transform the world. The AI revolution has started, but the real consequences will occur as the component AI technologies are assembled into systems of systems, such as robots.
This post has a structure and thesis similar to the previous one. It is both more concise and deeper; I will focus on what are usually the most heated topics of discussion when I present this content:

  • Universal Basic Income, and what forms it could take to open a wide area of micro-entrepreneurship opportunities without being perceived as an invitation to idleness
  • The power of localization and multi-scale, interaction-based business opportunities. Automation will dramatically transform the landscape of value creation opportunities. What can be automated and digitalized will fall into concentration cycles, while activities that require physical interaction will undergo a localization trend (for many reasons, including climate change and the scarcity of natural resources).
  • The network between large-scale multi-national companies that act as platforms and a marketplace of very small players. This mesh is the real subject of this blog post, i.e., the future of work.
  • The constant rise of complexity and uncertainty does not lead to the complete domination of marketplaces over integrated companies. If the “uberization” of complicated tasks is a definite trend that merges with automation, the rising complexity of other tasks requires integration (communication and collaboration).

Since I wrote my last blog post, I read Martin Ford’s book “Rise of the Robots: Technology and the Threat of a Jobless Future” and found a striking similarity to my own way of thinking. I do not plan to write a summary, but I wholeheartedly recommend reading this book, as well as his more recent one, “Architects of Intelligence”, which is crucially relevant to the topics discussed here.
This post is organized as follows. Section 2 states the problem: the probable massive job destruction due to automation, because of smarter robots and artificial intelligence. Although it is very hard to make forecasts and to pinpoint a date when so many jobs would disappear, it is very likely that the compounded progress of AI and robots will yield more job destruction than job creation. This will shift the nature of most future jobs towards what is “better left to humans”: managing emotion and embodied interaction. Section 3 tries to describe what jobs and companies would look like once this AI/robotic/digital transformation is completed. My line of thought follows the model of the “iconomy”, the state where the economy has absorbed the full benefits of information technology. Section 4 looks at the transition from today’s economy towards what might be considered a positive future reconciled with technology. Even if we subscribe to a positive and optimistic outlook on what AI, automation and digital technology will create, the transition is risky at best. This last part makes a few proposals about what society should do to protect its citizens from too brutal an emergence of this new world.

2. Automation, AI and Job Destruction


2.1 The technology revolution is destroying more jobs than creating new ones

The issue of forecasting how many jobs are at stake is quite controversial; according to the Frey/Osborne report, 47% of jobs are at risk within a few decades. There have been so many discussions about the 2013 report “The Future of Employment: How Susceptible Are Jobs to Computerisation?” by Carl B. Frey and Michael A. Osborne that I won’t say much more than the fact that I share their viewpoint. This forecast is not precisely situated in the future, and it takes into account the continuous progress of automation technology. Most of the people who look hard at what automation is likely to bring in the next decades, such as Andrew McAfee, Erik Brynjolfsson or Martin Ford, come up with similar conclusions. The OECD has taken a different perspective, where far fewer jobs seem to be at stake, but their conservative analysis is based on the state of technology today. This may be seen as a safe forecasting approach, but I think that it misses the point, and I prefer the viewpoint of technology futurists, even though it is by construction more speculative.

If we avoid trying to pinpoint a precise date, the compounded effect of automation is likely to touch as much as 50% of today’s jobs. Once we understand that we are talking about tomorrow’s AI and automation capabilities, it becomes clear that it is hard to be specific about the “when”. Another defensive line of thinking that came along in the past few years says that automation will eliminate tasks, not jobs. More precisely, many tasks could be automated, but the jobs would still be necessary because some of their tasks could not be automated for a long time. This is certainly true, but somewhat naïve. Once many tasks are automated, the machine is much faster, and companies need fewer humans to do the jobs. This has been the story of automation over the past centuries.

Factories with almost no humans are already here for stable and decomposable processes; AI is a complexity absorber and will extend the field of what can be done without human intervention. This mega-trend of full automation is important: it supports moving factories back closer to consumers, and it has geopolitical consequences (it is a great equalizer of worldwide salary differences). There are already a number of such factories – I visited a Sharp LCD factory 10 years ago with close to no human operators – but the upcoming revolution of AI, machine vision and smarter robots with better sensors will spectacularly expand the scope of what we can produce in these robotized factories.

2.2 It will take time to remove humans from processes

The path to automation is complex and hard to forecast. Foxconn gives us a good example: they announced to the world in 2014 that they would replace their human workers with 1 million robots within 3 years, which has not happened. There are many Kiva robots in Amazon’s warehouses, but there are also many humans because, for the time being, they do a better and cheaper job than robots when packing items into boxes. Deep learning is doing a great job at machine vision and complex sorting tasks, but only when the goal is well defined and stable. When “common sense” is required to do a lot of menial tasks, humans are still the way to go.

As a consequence of the current state of AI, it is more likely that specialized jobs will be automated before generic ones. This is beautifully explained by Brynjolfsson and McAfee in their book “The Second Machine Age”, with the following quote: “As the cognitive scientist Steven Pinker puts it, ‘The main lesson of thirty-five years of AI research is that the hard problems are easy and the easy problems are hard. . . . As the new generation of intelligent devices appears, it will be the stock analysts and petrochemical engineers and parole board members who are in danger of being replaced by machines. The gardeners, receptionists, and cooks are secure in their jobs for decades to come.’” I heard exactly the same message at Singularity University, where I attended their great executive program in 2016: automation starts with expert jobs, because AI today is quite narrow.

The emphasis on robots that will replace humans is misplaced: the whole environment will become smart (sensors, networks, ML, AI) and the increased efficiency will gradually reduce the number of jobs. Another lesson from the Singularity University curriculum is not to focus only on AI and software, but also on NBIC progress, which translates into tremendous improvements in sensors, networks and manufacturing devices. The first step of automation, where a big robot is installed to take your job and replace you, has already happened. What is coming now is much more subtle: the complete environment around you becomes a robot that assists you – a form of ubiquitous robotization. Thanks to smart technologies, all your tools and surroundings become smart and adaptive to help you do a better job.

2.3 A new job landscape

McKinsey sees the future of production in the hands of robots, transactions performed by artificial intelligence, and interaction left to humans. For the past few years, I have been quoting heavily the great article “Preparing for a new era of work” by Susan Lund, James Manyika and Sree Ramaswamy at the McKinsey Global Institute. They propose a simple yet powerful framework where jobs are separated into three groups: production, transaction and interaction. These are not absolute categories – there is some overlap – but they do work: production is manufacturing, focused on products; transaction is an umbrella for a large class of services, from customer service to financial services; and interaction here means an experience that requires the use of your body to carry emotions. For instance, a chatbot that answers your queries is seen as a service, not an interaction (obviously debatable, but it helps with what follows). The framework proposed by the article can be summarized as: jobs in the production sector will move to robots, jobs in the transaction sector will be performed by artificial intelligence, and the interaction sector is where humans will continue to add value.

A key insight is that what can be automated will eventually become a commodity; thus the value is mostly in emotions and interactions. This is not a new idea: it was brilliantly expressed by Daniel Pink in his bestseller “A Whole New Mind: Why Right-Brainers Will Rule the Future”, which sees the jobs of tomorrow as driven by creativity, storytelling, design and emotions. He characterizes many of today’s high-value activities as left-brain activities, such as planning, computing and problem solving, which will fall into the realm of automation thanks to artificial intelligence. It is not a coincidence that the word “experience” has become the buzzword of business. In a world of abundant intelligence and smart production, the experience proposed to the customer is the heart of differentiation. Everything else will become a commodity.

What defines the economy of the next decades is the “experience” economy; storytelling will become a critical skill for most jobs. If we think about the gardener that you may hire in 2040, she is probably going to use robots, such as an evolved version of the lawn mower that we see today or a more interesting hedge-trimmer robot. She will leverage automated technology, but the main service that she will propose will be storytelling, that is, a discussion about your garden, how you live with it and what your aspirations are, from, say, aesthetic delight to deep relaxation. The tremendous progress that we see today with deep learning (from vision to speech recognition) means that our interaction with smart robots and systems will evolve from “coding” to “conversing”. The gardener will not program her automated helpers, she will train them. The gardener example could be transposed to many “manual” interaction jobs: as technology becomes ubiquitous in the way the service is rendered, the weight of emotions, storytelling and human connection will rise.

3. What the Future of Work Might Look Like


3.1 Why companies will continue to exist

The rise of complexity and uncertainty creates new transaction costs, such as communication or training, which ensures the need for companies for many decades. Coase’s theory of the firm tells us that companies exist because of transaction costs. Some think that smart automation and digitalization will dramatically reduce these costs and produce massive “uberization”: replacing companies with marketplaces. This is shortsighted, as explained in the book “How Google Works”, where Eric Schmidt explains that teams work better if their members live, play, eat and work together. There are many forms of “transaction costs” that will not disappear in the future, even with the prevalence of digital technologies. By definition, “complexity” in tasks means that communication and collaboration are required. These are “transaction” costs that give an advantage to an integrated team over a virtual one. Uncertainty and constant change require a lot of continuous learning (learning by doing, with trial and error), which is another form of transaction cost: if one outsources the “trying”, the “learning from trying” occurs elsewhere.

Task platforms/marketplaces require simple interfaces, even for tasks that are difficult to perform. Things that can be broken into individual components and then dealt with through a marketplace are complicated, not complex (this is the definition of complexity). During the past 10 years, I have read many articles about the future of work that see the marketplace as the future dominant pattern and the (regular) integrated company as the exception. I totally disagree, because of the rising complexity of both the products and services that companies are trying to deliver, and even more of the ecosystems that constitute the company’s environment. I see the rising attention given to “synchronicity” as a signal that the distributed, asynchronous marketplace model is too limited to be general. Synchronicity, that is, the importance of sharing the same time structure and the necessity to reconnect frequently, is everywhere in the “new ways of working”, from SCRUM rituals to the team structures of empowered organizations. I strongly recommend reading Bruce Daisley’s book “The Joy of Work” to understand the importance of synchronicity in the face of complexity and uncertainty.

Our VUCA world offers multiple opportunities for companies that can master complexity: a large world of opportunities will open as technology progresses. Companies that leverage AI to perform more complex projects will explore new frontiers, while those that can’t will be stuck in a red ocean of simpler commodity products and services. Somehow, the playground of future companies will be defined by the maximum amount of complexity that they are able to manage. The introduction of AI, smart communication tools and smart, sensitive environments that act as cobots will grow this capability continuously. Our teams in 2030, then in 2040, will have collaborative abilities vastly enhanced by artificial intelligence. They will be able to solve challenges that require better synchronization, orchestration and communication.

3.2 A Mesh of Platform Companies and Micro Companies

Our way of working in large companies will change massively through the collaboration of AI and cognitive tools, as well as computer-aided collaboration: for instance, Google search will be dramatically more powerful in the decades to come. We are only at the beginning of the journey, since semantic search tools such as Watson are still primitive, while speech recognition and translation tools use pattern recognition through deep learning with little help from semantic tools. I am convinced that this will change within one, at most two, decades. The ubiquitous presence of smart assistants that are “really smart” will completely change the life of tomorrow’s knowledge workers. As Erik Brynjolfsson points out, we will work seamlessly together with the machines.

AI will give rise to augmented collaboration. The rise of complexity translates into the growth of the context that is necessary for collaboration. AI, as a complexity sponge, will turn part of this context from explicit to implicit. Put differently, it will become easier to collaborate with the mediation of a smart assistant. AI will massively increase the network effects of platforms. This is not a forecast; it has already happened: most platforms, from Uber and Facebook to Airbnb, already rely on smart matching algorithms. The increasing power and flexibility of future AI will mean that most companies will act as platforms and develop both internal and external networks of collaboration and partnership.

Globalization and digitalization have a demonstrated concentration effect – what is often called the “Matthew effect”: the winner takes all. This is a key idea from the creators of the “iconomy” concept, such as Michel Volle. Digitalization and globalization tend toward concentration. The more AI and automation are used, and the more value creation is based on accumulated data, the more concentration is likely to occur. This has definitely occurred with the first wave of the digital economy, leading to the creation of the GAFAM and BATX, and it could continue to spread as “software is eating the world”. Needless to say, this trend does nothing to offset the destruction of jobs.

The end game may be a small group of global actors who open the opportunity of “last mile” customization through a platform approach, with many small partners who are closer to local constraints, needs and culture. The creation of large platforms tends to produce ecosystems of small players that build bridges between the technical power of the centralized platform and the specialized needs of local markets. The Apple iPhone and its millions of apps are a good illustration. Even though Steve Jobs would have preferred to keep a walled garden of Apple-sanctioned apps, the App Store and its developer ecosystem have proven to be much more effective.

3.3 A Human and Social Services Economy

The “personal services” economy is bound to grow: the territory is large because of population growth, global aging, and because many needs are underserved today. Service-to-person is an “experience economy” including products (technology) and services (interaction). If we follow the analysis proposed by McKinsey, human interaction is the growth area for future economic services. Shifting from production to “personal services” is complex for many reasons. First, jobs in the service economy, such as welcoming customers at Home Depot or taking care of the elderly in a retirement home, pay far less than the manufacturing jobs that are being replaced. Second, the skills and the value system – the way our society recognizes and values contributions – have to change dramatically in the decades to come for this shift to be accepted.

There is a long list of interaction services where humans are better suited than robots: restaurants, clothing, medicine, personal care, education, entertainment, art, law & order, etc. These domains already provide a large share of the total number of jobs, and there is room for growth because many needs are not served as well as they could be. For most of these domains, there is a choice to be made. The associated jobs are not the best candidates for automation, from both a technical-feasibility and a desirability point of view. We find a similar analysis in Erik Brynjolfsson and Andrew McAfee’s book: “Results like these indicate that cooks, gardeners, repairmen, carpenters, dentists, and home health aides are not about to be replaced by machines in the short term”. But, as we will discuss in the next section, this could change. Human interaction as a growth/replacement sector for employment will only occur if there is a strong political will to promote and protect it.

States will have to solve the paradox of less public spending (because of growing deficits) with more public services (because states will need to play a role in the production-to-interaction transition). Civil-servant jobs are clearly part of this “interaction domain”. The “Gilets jaunes” crisis in France was a clear illustration of this tension between the need for cost reductions and the need for more local human interaction. It is easy to forecast that, although the total budget of nation-states is constrained, many of the human-facing jobs are here to stay, if only to avoid a massive social upheaval. If we look at the most under-served needs and the most pressing global factor, which is the aging of populations (in developed countries), states will be the major player in this job transformation. To put it bluntly, since there is no way to massively increase the salaries of the high-value, high-interaction jobs in personal care, there is a societal mandate to increase their recognition and social attractiveness.

The future of personalized health may be an increase in the standard deviation, more than the mean, of life expectancy. Mass medicine (vaccination, hygiene, procedures) helped move the mean upward for everyone in the 20th century; “4P” medicine (predictive, preventive, personalized and participatory) has a more complex and debatable outcome (we are not all equally suited to benefit from it). This means that the aging economy may become more complex and that the concept of “retirement age” is hard to define. This is a debated topic and I have no claim to expertise or forecasting ability. Yet there are now many signs that life expectancy is stalling. It has also always been the case that life expectancy has a large standard deviation, driven by many socio-economic factors. There are many possible causes for this stalling, but it looks like we have exhausted the systemic factors that improved life expectancy for the whole population, and we are entering a time where progress is slow and specialized. The standard deviation has shrunk over the past decades, but this trend may stall or even reverse in the future. NBIC technology improvements will improve our longevity, but not necessarily equally, which does not help with equal access to retirement benefits. This is something that regulators must consider very carefully before deciding that everyone should work longer in the decades to come.
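This mean-versus-spread distinction can be sketched with a toy simulation (all lifespans and bonuses below are invented for illustration): a uniform improvement, like mass medicine, raises the mean without touching the standard deviation, while an unevenly distributed improvement, like 4P medicine, widens the spread even if the mean barely moves.

```python
import statistics

# Hypothetical lifespans of six population groups (invented numbers).
baseline = [68, 72, 75, 78, 82, 85]

# Mass medicine: a uniform +5 years for everyone.
mass = [x + 5 for x in baseline]

# 4P medicine: benefits concentrated on those best placed to use it.
four_p = [x + b for x, b in zip(baseline, [0, 0, 1, 3, 6, 10])]

for label, data in [("baseline", baseline), ("mass", mass), ("4P", four_p)]:
    print(f"{label:>8}: mean={statistics.mean(data):.1f}  "
          f"stdev={statistics.stdev(data):.1f}")
```

The uniform shift leaves the standard deviation exactly unchanged; the concentrated one grows it, which is the scenario that makes a single “retirement age” harder to define.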

3.4 Hyper-scale concentration and localized distribution

The digital economy follows a concentration logic, but the interaction economy leads to a geographical distribution of opportunities (trip versus value). Because digital services are immaterial, there is the “winner takes all” effect that we mentioned earlier. However, interaction – in the restricted sense presented in Section 2.3 – requires the provider and the customer to meet, with one of them having to travel. This creates a multi-scale geographic distribution: high value creation (because of talent, skill scarcity or fame) will justify longer trips, yielding a large “opportunity zone”, whereas many lesser value creation opportunities will need to develop on smaller territories. This geographical structure is well known (it has been the basis for setting up shops in past centuries), but it now applies to many more activities than commerce.

The value geography is a multi-scale distribution with lots of opportunities at the local and micro-local level. There is a “power law” of business opportunities with a “fat tail” as the attraction zone becomes smaller. If we take the classical example of a restaurant, the “attraction zone” varies according to the talent of the chef (and location, fame, marketing, etc.). An exceptional restaurant operates nationally, even internationally. A great restaurant works at the province level, whereas a good restaurant works at the scale of a city. A mediocre restaurant plays on convenience and serves a neighborhood. In many countries, street cooking is available at a much smaller scale, with an even bigger focus on convenience over quality. The nested geographical structure yields the power law: there are few spots for a Michelin-starred restaurant, but a huge number of opportunities for neighborhood or street cooking.
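The nested structure above can be made concrete with a back-of-the-envelope sketch (the territory size and all attraction radii are invented for illustration): if a provider’s attraction zone is roughly a disc of radius r, the number of such zones that tile a fixed territory grows as (R/r)², which is what produces the fat tail of micro-local opportunities.

```python
# Toy model of the "power law of opportunities" (all numbers are
# hypothetical): the smaller the attraction zone, the more viable
# spots a territory can host, roughly as (R / r) ** 2.

TERRITORY_RADIUS_KM = 500  # hypothetical country-sized territory

tiers = {  # hypothetical attraction-zone radii, in km
    "Michelin-starred (national)": 500,
    "great (province)": 100,
    "good (city)": 20,
    "mediocre (neighborhood)": 2,
    "street cooking": 0.5,
}

for name, radius in tiers.items():
    spots = (TERRITORY_RADIUS_KM / radius) ** 2
    print(f"{name:>28}: ~{spots:,.0f} viable spots")
```

The exact exponent is debatable (real territories are not uniform discs), but any such geometric nesting yields vastly more neighborhood-scale spots than national-scale ones.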

The value distribution of the “interaction economy” is sensitive to population density; this rule reinforces urban concentration and creates a more stringent problem in rural areas. This is a simple corollary of the geographical value distribution model. It explains why personal services thrive in an urban setting and in more densely populated areas. It also explains why the damage of job destruction due to automation is felt more harshly in rural areas (another lesson from the “Gilets jaunes” crisis in France). This leads us naturally to the next section and the need for regulation and incentives to free the micro-business opportunities for “interaction jobs” that may exist at the local level. Large cities may let the Darwinian play of evolution run the transformation from 20th-century jobs to this new world of interaction jobs, but states need a framework to manage the transition in less densely populated areas.

4. The Transition Challenge: How to Soften the Civilization Shift

4.1 The uncanny and dark valley of human-like robots

Human-like interaction is perfectly accessible to robots; as a matter of fact, it is one of the “hot” areas of robotics. The rise of companion robots in Japan (and Asia at large) shows that human interaction is definitely a possible field for robots. Japan is working very hard on companion robots because of its age-pyramid imbalance, but we can expect to see multiple similar developments everywhere, because developing interactions between humans and robots is both possible and exciting. It is surprisingly easy because we are easy to fool. Our mirror neurons are very quick to project emotions onto an artificial head with the proper eye, lip and eyebrow movements. I was amazed by a simplistic robotic head in an IBM research lab 10 years ago: it was simply programmed to look at me and mimic my expression, and I already felt a strong connection. There is no doubt that a high-quality silicone head like the ones developed in Japan, with the proper smart software, will be able to perform human-like interaction with the illusion of artificial emotions.

If the field of human interaction jobs is not protected from automation, the rapid job destruction of the next decades will create massive social unrest. This is a consequence of the analysis proposed by McKinsey: if production and transaction are heavily automated, we need to protect interaction jobs. Regulation is possible, because interaction (in the sense of Section 2.3) is local and physical. There are many possible paths, from interdiction (like the ban on self-service gasoline pumps in some US states) to taxes and economic incentives. I have no clue about the best solution; every solution that I can think of has many problems. However, I have a strong conviction that democratic states will have to act and protect their interaction jobs. Somehow, what has happened in the US (Trump’s election) or the UK (Brexit) is an illustration of what is to come.

Whereas production and transaction activities are subject to globalization (and therefore a “prisoner’s dilemma” that prevents states from acting separately), the interaction economy may and should be protected through regulation. It is close to impossible to fight against automation, robots and AI in the worlds of production and transaction. Any state that moves with its own agenda will create a competitive disadvantage and will lose business to others. Manufacturing companies compete on a worldwide basis and are required to operate at the best possible level of efficiency. Transaction and digital service companies deliver immaterial services that are hard to track, constrain or regulate. Human embodied interaction, by contrast, can be defined legally and tracked, and it falls under the local scope of state political power. A country may decide to be more protective than others without being at risk of losing jobs.

4.2 Universal Basic Income versus social challenges

Universal Basic Income is almost there and will be necessary to smooth the transition: to avoid “precarity as a life condition” (what Guy Standing calls the precariat). When I talk about Universal Basic Income (UBI), I do not necessarily mean the same amount given to everyone. Universal Basic Income is any scheme that ensures that every citizen has access to a basic income guaranteeing “a basic standard of living”. Many countries, including France with its “RSA” (active solidarity revenue), have a safety-net income, so the gap to UBI is not necessarily huge, although the ambition is to keep every citizen out of poverty. UBI is a common topic today and is often associated with job destruction and automation because of the massive transition that is ahead of us. When I reviewed “Architects of Intelligence”, I found it interesting that, although nobody has a clear proposal about what the ideal UBI should be, most people who work on the cutting edge of AI and automation are convinced that some form of UBI will become necessary.

UBI does not need to be stigmatized as “giving a salary for doing nothing”; it may be used as a lever to create local micro-enterprise opportunities. This may be seen as the extension, to many fields of activity, of the French “intermittent du spectacle” status offered to part-time live-performance workers. This status is meant to allow many to live from their talent and passion for the arts (theater, music, etc.) even though their yearly workload would not yield a sufficient income. There is a group of thinkers, both at MIT and in Silicon Valley, who see UBI as a way to promote “micro-entrepreneurship”, with the argument that for the average citizen there is too much risk and not enough expected income to make entrepreneurship a viable choice. Their reasoning is that UBI could serve as revenue insurance and convince people to take the risk of living from what they care about. This is the exact opposite of giving UBI so that people can stay home doing nothing.

To paraphrase Richard Feynman, “there is plenty of room at the bottom”: there are many interaction micro-business opportunities that do not materialize because of risk or low revenue. For instance, UBI makes living from selling your paintings a much more practical calling. There are many artists who are forced to consider their practice a hobby who could reconsider given the proper incentive scheme, and society would fare better if many had access to their own works of art. UBI may be seen as an economic attempt to change the “opportunity landscape”, along the two previously mentioned axes of risk and expected revenue. There are many activities with great social value, especially in the field of personal care such as elderly care, or in the field of neighborhood assistance, that do not create enough value to produce jobs but that could become life-supporting given the proper economic incentive (UBI). It does not have to be a monthly check that you get, no questions asked. It could be a supplementary income that you receive based on the accepted social value of your work (e.g., helping kids do their homework in your apartment block).
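The “revenue insurance” reading of UBI can be illustrated with a deliberately simplistic model (every figure below is hypothetical): what deters the would-be micro-entrepreneur is not the expected income but the probability of falling below subsistence, and a basic-income floor changes exactly that number.

```python
# Toy model (all figures hypothetical): UBI as revenue insurance for
# a micro-entrepreneur facing uncertain yearly earnings.

SUBSISTENCE = 12_000   # hypothetical yearly subsistence threshold
UBI = 7_000            # hypothetical yearly basic income

# Possible yearly earnings of the micro-business, with probabilities:
# bad year, average year, good year.
scenarios = [(4_000, 0.3), (9_000, 0.4), (20_000, 0.3)]

def risk_of_poverty(extra_income=0):
    """Probability that total income falls below subsistence."""
    return sum(p for earn, p in scenarios if earn + extra_income < SUBSISTENCE)

print(f"without UBI: {risk_of_poverty():.0%} chance below subsistence")
print(f"with UBI:    {risk_of_poverty(UBI):.0%} chance below subsistence")
```

In this made-up example the expected income is unchanged by the decision to try, but the downside risk drops sharply once the floor exists, which is the whole argument of the micro-entrepreneurship advocates.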

Citizens do not only need UBI, they need universal access to opportunities to contribute. This is beautifully explained by Pierre-Noël Giraud in his book “L’Homme inutile” (“The Useless Man”): people who live far below the poverty threshold express that it is better to be exploited than to be useless. This concept of the useless man is also described by Yuval Harari in “Homo Deus”. Therefore, the goal of UBI is not only to provide a basic way of life but also a “basic inclusive contributor status”. Today, millions of people already work not for a salary but for a social purpose. Advocates of UBI see this mechanism as a way to extend this possibility to a much larger group of people (beyond those who are retired or have additional revenue and can afford to work for free on causes that they believe in).

4.3 Craftsmanship and Mass Personalization

Mass production may be a “historical parenthesis” brought by the industrial revolution before we move back from standardized to personalized products, thanks to technology (3D printing) and interaction opportunities. I am borrowing this idea and this image from Avi Reichental, whose 2014 TED talk on 3D printing describes a world of “makers”, where everyone may have access to custom-made products. Technology and localization can work hand in hand: as explained earlier, micro-entrepreneurs can benefit from large platforms to produce or deliver locally exactly what their customers require, in a way that a giant company could not achieve. 3D printing brings a tremendous advantage of speed and personalization. It works together with mass production, because the economy of scale still applies (3D printing is not destined to mass-manufacture standard objects). This is what creates the network/platform structure that we discussed earlier: the combination of the strength of massive hubs with the flexibility of “last mile” shops. The “bits shoe” designed by Earl Stewart is a great illustration of the combination of technology (3D printing unique to each foot) with local craftsmanship (the leather cover of the shoe).

New “digital creative” jobs in large companies will be too few to offset job destruction; local use of creativity and design skills, at a smaller scale of value creation, is more realistic. In the words of techno-optimists who love to quote Schumpeter and his “creative destruction”, automation will yield the creation of many designer and creative jobs. The arguments of the previous sections make me quite dubious: the number of future digital creative jobs is not commensurate with the number of jobs destroyed through smart automation. However, the future of local interaction “micro-entrepreneurs” most certainly includes creativity and design.

We could envision in the future a service economy that brings back craftsmanship for clothing, custom-made pieces of furniture, cooking, home hairdressing or massage, as well as painting or sculpture. There is a form of paradox here: many of the services that were reserved for the affluent class in past centuries, such as made-to-order customized products (clothing, furniture, …) and services (home cook, hairdresser), could be revived at a micro-local scale (hence creating a very large number of jobs) and offered to a much larger group of customers, provided the proper economic conditions, such as a form of universal basic income. Next to the previously mentioned weekend painter who could start making a living from their craft, you could add an amateur gardener, the neighbor who has a woodshop in his basement, the lady who likes to sew dresses, etc. This kind of micro-economy is actually more resilient, uses fewer natural resources, and provides more social links than the mass production of the past century.

4.4 From the End of Jobs to Renewed Forms of Work: New Labor Contracts

Freelancing and the “gig economy” are already here; the trend is strong in the US and rising in Europe. The reduction of jobs for the same amount of work has already started. Even Google makes use of a large number of temporary workers. A growing number of talents in the software industry prefer to work freelance. According to the experts at Singularity University, freelancing accounted for 35% of work in 2015 and could reach 50% as early as 2020. This is a complex evolution: on the one hand, companies are betting on “strong ties” (synchronicity, co-located teams, context sharing), and on the other hand, they are building networks, platforms and marketplaces to hire the best talents, anywhere, anytime, betting on “weak ties” (people who are further from the company’s social network but uniquely qualified for the task). This brings us to the central idea expressed in Section 3: future companies as platforms that develop an ecosystem of partners. The Uber example shows why we need regulation to protect the smaller “players”, but the efficiency of this networked model should not be underestimated.

A new way of working based on tasks, projects and short-term commitments is both aspirational (it fits the desire of many young workers) and a curse when imposed on workers without social protection. This new form of project-based work without the ties of a permanent work contract matches the aspirations of new generations of workers. It fits their desire to balance multiple goals and to pursue different paths at the same time (the so-called slashers). At the same time, for many other workers, this new way of working is imposed by the platforms that we mentioned earlier. Thus, regulation is necessary, as is protection from the precarity of the gig economy for those whose talent is not so rare, which brings us back to the need for Universal Basic Income.

Labor protection needs reinvention both within and outside companies. Here is a great quote from Nathaniel Calhoun at Singularity University: “Exponential organizations worsen the fate of labor”. “Exponential Organizations” is a best-seller that describes new organizational trends for companies to adapt to new technology opportunities and to the exponential rhythm of change. This quote echoes what sociology experts have said before: new ways of synchronized working are not always less stressful or more comfortable. Labor contracts and work conditions require an overhaul to avoid precarity, also for those who work for large companies. If we combine the networked platform trend, the need to continuously retrain for new skills in a world where technology changes at an accelerating speed, and the desire for a better professional/personal and corporate/societal balance, jobs and careers – or the lack of them – will evolve in ways that regulation cannot afford to miss.

5. Conclusion

This is a partial view, focused on technology, companies and job markets. There are other, bigger issues at stake that interact with the future of work: global warming, natural resource depletion, the rise of pollution, and geostrategic and political unrest.

I make no pretense of incorporating these views into a global perspective about the future of work, which would become the future of the world. This blog post should not be seen as a prediction about the future, but rather as an essay whose goal is to propose a few key ideas that may serve as “food for thought”.

This being said, I am convinced that the push towards localization and human interaction emphasized and promoted in this article resonates with the changes that will be forced onto our world to reduce carbon emissions, air and water pollution, and the consumption of non-renewable natural resources.

The key message – similar to the “iconomy vision” – is that among the multiple possible futures of AI, robots and work automation, some may fit harmoniously with the long-term aspiration and requirements of a sustainable society. However, we will not get there by chance; a long-term systemic vision of what state policies may do is badly needed.

Tuesday, April 9, 2019

Hunting for Causality in Short Time Series

1. Introduction

This post is about the search for sense in a small data set, such as the few measures that one accumulates through self-tracking. Most commonly, finding sense in a small set of data means either seeing regular patterns or detecting causality. Many writers have argued that our brains are hardwired for detecting patterns and causality. Causality is our basic ingredient for modelling “how the world works”. Inferring causality from our world experience is also a way of “compressing” our knowledge: once you understand that an open flame hurts, you don’t need to recall the experiences (and you don’t need many of them to detect this causality). The reason for selecting this topic for today’s blog post is my recent participation in the ROADEF 2019 conference. I had the pleasure of chairing the machine learning session and the opportunity to present my own work on machine learning for self-tracking data.

We are so good at detecting causality that we are often fooled by random situations and tend to see patterns where there are none. This is a common theme of Nassim Taleb’s many books, especially his masterful first book “Fooled by Randomness”. The concept of “narrative fallacy” is critical when trying to extract sense from observation: we need to remember that we love to see “stories” that make sense, because this is how our brain best remembers. There are two types of issues when trying to mine short data sets for sense: the absence of statistical significance because the data set is too small, and our own narrative fallacy and other cognitive biases. Today I will talk about data sets collected from self-tracking (i.e., the continuous measurement of some of your characteristics, either explicitly by logging observations or implicitly with connected sensors such as a connected watch). The challenge for scientific methods when searching for sense in such short time series is to know when to say “I don’t know” when presented with a data set that shows no more patterns or correlation than could be expected in any random distribution, without falling into the pitfall of narrative fallacy. In short, the “Turing test” of causality hunting is to reject random or quasi-random data input.
On the other hand, it is tempting to look for algorithms that could learn and extract sense from short time series precisely because humans are good at it. Humans are actually very good at short-term forecasting and quick learning which is without a doubt the consequence of evolution. Learning quickly to forecast the path of a predator or a prey has been resolved with reinforcement learning through “survival of the fittest” evolution. The topic of this blog post – which I discussed at ROADEF – is how to make sense of a set of short time series using machine learning algorithms. "Making sense" here is a combination of forecasting and causality analysis which I will discuss later.
The second reason for this blog post is the wonderful book by Judea Pearl, “The Book of Why”, a masterpiece about causality. The central idea of the book is that causality does not “jump out of the data” but requires an active role from the observer. Judea Pearl introduces concepts that are deeply relevant to this search for sense in small data sets. Hunting for causality is a “dangerous sport” for many reasons: most often you come back empty-handed, sometimes you catch your own tail … and when successful, you often have little to show for your efforts. The two central ideas of causality diagrams and the active observer are keys to unlocking some of the difficulties of causality hunting with self-tracking data.

This post is organised as follows. Section 2 is a very short and partial review of “The Book of Why”. I will try to explain why Judea Pearl’s concepts are critical to causality hunting with small data sets. These principles have been applied to the creation of a mobile application that generated the data sets to which the machine learning algorithms of Section 4 have been applied. This application uses the concept of a causal diagram (renamed “quest”) to embody the user’s prior knowledge and assumptions. The self-measurement follows the principle of the “active observer” of Judea Pearl’s P(X | do(Y)) definition. Section 3 dives into causality hunting through two other books and introduces the concept of Granger causality, which binds forecasting and causality detection. It also links the concepts of pleasure and surprise with self-learning, a topic that I borrow from Michio Kaku and which also creates a strong relationship between forecasting and causality hunting. As noted by many scholars, “the ability to forecast is the most common form of intelligence”. Section 4 talks briefly about machine learning algorithms for short time-series forecasting. Without diving too deep into the technical aspects, I show why prediction from small data sets is difficult and what success could look like, considering all the pitfalls presented before. Machine learning from small data is not a topic for deep learning, so I present an approach based on code generation and reinforcement learning.

2. Causality Diagrams - Learn by Playing

Judea Pearl is an amazing scientist with a long career in logic, models and causality, which earned him the Turing Award in 2011. His book reminds me of “Thinking, Fast and Slow” by Daniel Kahneman, a fantastic effort of summarising decades of research into a book that is accessible and very deep at the same time. “The Book of Why – The New Science of Cause and Effect” by Judea Pearl and Dana Mackenzie is a masterpiece about causality. It requires careful reading if one wants to extract the full value of its content, but it can also be enjoyed as a simple, exciting read. A great part of the book deals with paradoxes of causality and confounders, the variables that hide or explain causality relationships. In this section I will only talk about four key ideas that are relevant to hunting causality in small data.

The first key idea of this book is that causality is not a cold, objective fact that one can extract from data without prior knowledge. He refutes a “Big Data hypothesis” that would assume that once you have enough data, you can extract all necessary knowledge. He proposes a model for understanding causality with three levels: the first level is association, what we learn from observation; the second level is intervention, what we learn by doing things; and the third level is counterfactuals, what we learn by imagining what-if scenarios. Trying to assess causality from observation only (for instance through conditional probabilities) is both very limited (it ignores the two top levels) and quite tricky since, as Persi Diaconis recalled: “Our brains are just not wired to do probability problems, so I am not surprised there were mistakes”. Judea Pearl talks in depth about the Monty Hall problem, a great puzzle/paradox popularized by Marilyn vos Savant, that has tricked many of the most educated minds. I urge you to read the book to learn for yourself from this great example. The author’s conclusion is: “Decades’ worth of experience with these kinds of questions has convinced me that, in both a cognitive and a philosophical sense, the idea of causes and effects is much more fundamental than the idea of probability”.
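For readers who want to convince themselves, the Monty Hall result is easy to verify by simulation (this little sketch is mine, not from the book): switching wins about two times out of three.

```python
import random
random.seed(0)

def monty_hall(switch, trials=100_000):
    """Simulate the Monty Hall game and return the win frequency."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)   # door hiding the car
        pick = random.randrange(3)  # contestant's first choice
        # The host opens a door that is neither the pick nor the car.
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            # Switch to the one remaining closed door.
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print(round(monty_hall(switch=False), 2))  # ≈ 0.33
print(round(monty_hall(switch=True), 2))   # ≈ 0.67
```

Note that the simulation itself is a small intervention: we learn the answer by (virtually) doing, which is exactly Pearl’s point about paying attention to the data-generation process.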
Judea Pearl introduces the key concept of a causal diagram to represent our prior preconception of causality, which may be reinforced or invalidated by observation, following a true Bayesian model. A causal diagram is a directed graph that represents your prior assumptions, as a network of factors/variables that have causal influence on each other. A causal diagram is a hypothesis that actual data from observation will validate or invalidate. The central idea here is that you cannot extract a causal diagram from the data; you need to formulate a hypothesis that you will keep or reject later, because the causal diagram gives you a scaffold to analyse your data. This is why any data collection with the Knomee mobile app that I mentioned earlier starts with a causal diagram (a "quest").
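As a minimal illustration of this idea (the variable names and the dictionary encoding are my own invention, not Knomee’s actual format), a quest can be encoded as a tiny directed graph of hypothesized influences:

```python
# A causal diagram ("quest") encoded as a directed graph: each key is a
# variable, each value the list of variables it is hypothesized to influence.
quest = {
    "coffee": ["sleep_quality"],
    "steps": ["sleep_quality", "mood"],
    "sleep_quality": ["mood"],
}

def causes_of(diagram, target):
    """Return the hypothesized direct causes of a target variable."""
    return sorted(v for v, effects in diagram.items() if target in effects)

print(causes_of(quest, "mood"))           # ['sleep_quality', 'steps']
print(causes_of(quest, "sleep_quality"))  # ['coffee', 'steps']
```

The diagram is pure hypothesis: the data collected afterwards can only support or weaken each edge, never conjure the graph by itself.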
Another key insight from the author is to emphasise the participating role of the user asking the causality question, which is represented through the notation P(X | do(Y)). Where the conditional probability P(X | Y) is the probability of X being true when Y is observed, P(X | do(Y)) is the probability of X when the user chooses to “do Y”. The simple example of learning that a flame burns your hand is actually meaningful for understanding the power of “learning by doing”. One or two experiences would not be enough to infer the knowledge from the conditional probability P(hurts | hand in flame), while the experience do(hand in flame) means that you become very sure, very quickly, about P(hurts | do(hand in flame)). This observation is at the heart of personal self-tracking. The user is active and is not simply collecting data. She decides to do or not to do things that may influence the desired outcome. A user who is trying to decide whether drinking coffee affects her sleep is actually computing P(sleep | do(coffee)). Data collection is an experience, and it has a profound impact on the knowledge that may be extracted from the observations. This is very similar to the key idea that data is a circular flow in most smart AI systems. Smart systems are cybernetic systems with “a human inside”, not deductive linear systems that derive knowledge from static data. One should recognise here a key finding from the NATF reports on Artificial Intelligence and Machine Learning (see “Artificial Intelligence Applications Ecosystem: How to Grow Reinforcing Loops”).
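The coffee/sleep example can be made concrete with a toy simulation of mine (the probabilities are arbitrary): a hidden “stress” confounder makes coffee look harmful in observational data, while intervening (forcing do(coffee)) reveals that coffee has no effect at all in this toy world.

```python
import random
random.seed(0)

def sample(do_coffee=None):
    """One simulated day. 'stress' is a confounder: it drives both coffee
    drinking and poor sleep. Coffee itself has no direct effect here
    (a deliberately extreme toy assumption)."""
    stress = random.random() < 0.5
    if do_coffee is None:
        coffee = random.random() < (0.9 if stress else 0.1)  # observation
    else:
        coffee = do_coffee                                   # intervention
    bad_sleep = random.random() < (0.8 if stress else 0.2)
    return coffee, bad_sleep

# P(bad_sleep | coffee): condition on the days where coffee was observed
obs = [sample() for _ in range(100_000)]
p_cond = sum(b for c, b in obs if c) / sum(1 for c, b in obs if c)

# P(bad_sleep | do(coffee)): force coffee, breaking the stress->coffee link
inter = [sample(do_coffee=True) for _ in range(100_000)]
p_do = sum(b for _, b in inter) / len(inter)

print(round(p_cond, 2))  # ≈ 0.74: conditioning inherits the confounder
print(round(p_do, 2))    # ≈ 0.50: the intervention shows no causal effect
```

The gap between the two numbers is exactly the gap between seeing and doing that Pearl’s notation captures.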

The role of the participant is especially important because there is a fair amount of subjectivity when hunting for causality. Judea Pearl gives many examples where the choice of controlling factors should be influenced by the “prior belief” of the experimenters, at the risk of misreading the data. He writes: “When causation is concerned, a grain of wise subjectivity tells us more about the real world than any amount of objectivity”. He also insists on the importance of the data collection process. For him, one of the reasons statisticians are often the most puzzled by the Monty Hall paradox is the habit of looking at data as a flat static table: “No wonder statisticians found this puzzle hard to comprehend. They are accustomed to, as R.A. Fisher (1922) put it, ‘the reduction of data’ and ignoring the data-generation process”. As I said earlier, I strongly encourage you to read the book to learn about “confounders” – which are easy to explain with causal diagrams – and how they play a critical role in these types of causality paradoxes where intuition is easily fooled. This is the heart of this book: “I consider the complete solution of the confounders problem one of the main highlights of the Causal Revolution because it has ended an era of confusion that has probably resulted in many wrong decisions in the past”.

3. Finding a Diamond in the Rough

Another interesting book about hunting for causality is “Why: A Guide to Finding and Using Causes” by Samantha Kleinberg. This book starts with the idea that causality is hard to understand and hard to establish. Saying that “correlation is not causation” is not enough; understanding causation is more complex. Statistics do help to establish correlation, but people are prone to seeing correlation where none exists: “many cognitive biases lead to us seeing correlations where none exist because we often seek information that confirms our beliefs”. And once a correlation is validated with statistical tools, one still needs to be careful, because even seasoned statisticians “cannot resist treating correlations as if they were causal”.
Samantha Kleinberg talks about Granger causality: “one commonly used method for inference with continuous-valued time series data is Granger”, the idea that if a time delay is observed within a correlation, this may be a hint of causality. Judea Pearl warns us that this may simply be the case of a confounder with asymmetric delays, but in practice the test of Granger causality is not a proof, only a good indicator of causality. The proper wording is that this test is a good indicator of “predictive causality”. More generally, if predicting a value Y from the past of X up to a non-null delay does a good job, there is a good chance of “predictive causality” from X to Y. This links the tool of forecasting to our goal of causality hunting. It is an interesting tool since it may be used with non-linear models (contrary to Granger causality) and multi-variate analysis. If we start from a causal diagram in Pearl’s sense, we may check whether the root nodes (the hypothetical causes) can be used successfully to predict the future of the target nodes (the hypothetical “effects”). This is, in a nutshell, how the Knomee mobile app operates: it collects data associated with a causal diagram and uses forecasting as a possible indicator of “predictive causality”.
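Here is a minimal sketch of that “predictive causality” test (my own toy construction, not Knomee’s code): compare forecasting y from its own past against forecasting y from its own past plus the lagged factor x. A clear reduction of the error is the hint we are looking for.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data: x influences y with a one-step delay, plus noise.
n = 300
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal(scale=0.5)

def forecast_mse(target, lagged_inputs):
    """Fit target[t] from the given lagged series by least squares,
    return the in-sample mean squared error."""
    A = np.column_stack(lagged_inputs + [np.ones(len(target))])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return float(np.mean((target - A @ coef) ** 2))

# Restricted model: y's own past only. Full model: add x's past.
mse_own = forecast_mse(y[1:], [y[:-1]])
mse_full = forecast_mse(y[1:], [y[:-1], x[:-1]])

# A clear error reduction hints at "predictive causality" from x to y.
print(mse_full < 0.8 * mse_own)  # True for this construction
```

This is the Granger idea in one sentence: x “predictively causes” y if the past of x improves forecasts of y beyond what y’s own past achieves.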
The search for “why” with self-tracking data is quite interesting because most values (heart rate, mood, weight, number of steps, etc.) are nonstationary on a short time scale but bounded on a long time horizon, while exhibiting a lot of daily variation. This makes detecting patterns more difficult, since it is quite different from extrapolating the movement of a predator from its previous positions (another short time series). We are much better at “understanding” patterns that derive from linear relations than those that emerge from complex causality loops with delays. The analysis of delays between two observations (at the heart of Granger causality) is also a key tool in complex system analysis. We must, therefore, bring it with us when hunting for causality. This is why the Knomee app includes multiple correlation/delay analyses to confirm or invalidate the causal hypothesis.
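A simple correlation/delay analysis of the kind mentioned above can be sketched as follows (again an illustrative toy of mine, with a known delay planted in synthetic data): scan candidate lags and keep the one with the strongest correlation.

```python
import numpy as np

rng = np.random.default_rng(7)

# Two series where the second echoes the first with a 3-step delay.
n = 200
a = rng.normal(size=n)
b = np.roll(a, 3) + rng.normal(scale=0.5, size=n)
b[:3] = rng.normal(size=3)  # overwrite the wrapped-around values

def best_delay(x, y, max_lag=10):
    """Return the lag in [1, max_lag] maximizing corr(x[t-lag], y[t])."""
    scores = {lag: np.corrcoef(x[:-lag], y[lag:])[0, 1]
              for lag in range(1, max_lag + 1)}
    return max(scores, key=scores.get)

print(best_delay(a, b))  # 3
```

On real self-tracking data the winning lag is rarely this clean, which is why such a scan should only confirm or weaken a prior causal hypothesis, never generate one.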

A few other pearls of wisdom about causality hunting with self-tracking may be found in the book by Gina Neff and Dawn Nafus. This reference book on the quantified self and self-tracking crosses a number of ideas that we have already exposed, such as the critical importance of the user in the tracking and learning process. Self-tracking – a practice which is both very ancient and has shown value repeatedly – is usually boring if no sense is derived from the experiment. Making sense is either positive, such as finding causality, or negative, such as disproving a causality hypothesis. Because we can collect data more efficiently in the digital world, the quest for sense is even more important: “Sometimes our capacity to gather data outpaces our ability to make sense of it”. In the first part of this book we find this statement, which echoes nicely the principles of Judea Pearl: “A further goal of this book is to show how self-experimentation with data forces us to wrestle with the uncertain line between evidence and belief, and how we come to decisions about what is and is not legitimate knowledge”. We have talked about small data and short time series from the beginning because experience shows that most users collect data over short periods of time: “Self-tracking projects should start out as brief experiments that are done, say, over a few days or a few weeks. While there are different benefits to tracking over months or years, a first project should not commit you for the long haul”. This is why we shall focus in the next section on algorithms that can work robustly with a small amount of data.
Self-tracking is foremost a learning experiment: “The norm within QS is that ‘good’ self-tracking happens when some learning took place, regardless of what kind of learning it was”. A further motive for self-tracking is often behavioural change, which is also a form of self-learning. As biologists tell us, learning is most often associated with pleasure and reward. As pointed out in a previous post, there is a continuous cycle: pleasure to desire to plan to action to pleasure, which is a common foundation for most learning in living creatures. Therefore, there is a dual dependency between pleasure and learning when self-tracking: one must learn (make sense out of the collected data) to stay motivated and to pursue the self-tracking experience (which is never very long), and this experience should reward the user with some form of pleasure, from surprise and fun to the satisfaction of learning something about yourself.

Forecasting is a natural part of the human learning process. We constantly forecast what will happen and learn by reacting to the difference. As explained by Michio Kaku, our sense of humour and the pleasure that we associate with surprises is a Darwinian mechanism to push us to constantly improve our forecasting (and modelling) abilities. We forecast continuously, we experience reality, and we enjoy the surprise (the difference between what happens and what we expected) as an opportunity to learn in a Bayesian way, that is, to revise our prior assumptions (our model of the world). The importance of curiosity as a key factor for learning is now widely accepted in the machine learning community, as illustrated by the ICML 2017 paper “Curiosity-driven Exploration by Self-supervised Prediction”. The role of surprise and fun in learning is another reason to be interested in forecasting algorithms. Forecasting the future, even if unreliable, creates positive emotions around self-tracking. This is quite general: we enjoy forecasts, which we see as games (in addition to their intrinsic value) – one can think of sports or politics as examples. A self-tracking forecasting algorithm that does a decent job (i.e., not too wrong, not too often) works in a way similar to our brain: it is invisible but acts as a time saver most of the time, and when wrong it signals a moment of interest. We shall now come back to the topic of forecasting algorithms for short time series, since we have established that they could play an interesting role in causality hunting.

4. Machine Generation of Robust Algorithms

Our goal in this last section is to look at the design of robust algorithms for short time-series forecasting. Let us first define what I mean by robust, which will explain the metaphor proposed in the introduction. The following figure, extracted from my ROADEF presentation, represents two possible types of “quests” (causal diagrams). Think of a quest as a variable that we try to analyse, together with other variables (the “factors”) which we think might explain the main variable. The vertical axis represents a classification of the observed variation into three categories: the random noise in red, the variation due to factors that were not collected in the sample in orange, and in green the part that we may associate with the factors. A robust algorithm is a forecasting algorithm that accepts an important part of randomness, to the point that many quests are “pointless” (remember the “Turing test” of causality hunting). A robust algorithm should be able to exploit the positive influence of the factors (in green) when and if it exists. The picture makes it clear that we should not expect miracles: a good forecasting algorithm can only improve by a few percent over the simple prediction of the average value. What is actually difficult is to design an algorithm that is not worse – because of overfitting – than average prediction when given a quasi-random input (the right column of the picture).
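The “not worse than average on quasi-random input” requirement can be expressed as a small test harness (a sketch under my own simplifying assumptions, not the actual Knomee protocol): walk-forward evaluation over the last third of a series, with the mean predictor as the baseline.

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_forecaster(history):
    """The baseline: simply predict the average of what was seen so far."""
    return float(np.mean(history))

def moving_avg_forecaster(history):
    """A candidate 'tool': the mean of the last 7 observations."""
    return float(np.mean(history[-7:]))

def walk_forward_mse(forecaster, series, start_frac=2 / 3):
    """Forecast each point of the last third from the preceding history
    and return the mean squared error."""
    start = int(len(series) * start_frac)
    errs = [(forecaster(series[:t]) - series[t]) ** 2
            for t in range(start, len(series))]
    return float(np.mean(errs))

# A quasi-random "quest": pure noise, where no forecaster should win.
noise = rng.normal(size=300)
baseline = walk_forward_mse(mean_forecaster, noise)
candidate = walk_forward_mse(moving_avg_forecaster, noise)

# Robustness check: the candidate must stay close to the baseline.
print(candidate < 1.5 * baseline)  # True
```

Any candidate that fails this kind of check on noise is overfitting and should be discarded, whatever it does on structured data.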

As the title of the section suggests, I have experimented with machine generation of forecasting algorithms. This technique is also called meta-programming: a first algorithm produces code that represents a forecasting algorithm. I have used this approach many times in past decades, from complex optimization problems to evolutionary game theory. I found it interesting many years ago when working on TV audience forecasting, because it is a good way to avoid over-fitting – a common plague when doing machine learning on a small data set – and to control the robustness properties thanks to evolutionary meta-techniques. The principle is to create a term algebra that represents instantiations and combinations of simpler algorithms. Think of it as a toolbox. One lever of control (over robustness and over-fitting) is to make sure that you only put “robust tools” in the box. This means that you may not obtain the best or most complex machine learning algorithms such as deep learning, but you ensure both “explainability” and control. The meta-algorithm is an evolutionary randomised search algorithm (similar to the Monte-Carlo Tree Search of AlphaZero) that may be sophisticated (using genetic combinations of terms) or simple (which is what we use for short time series).
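To make the idea tangible, here is a deliberately tiny version of such a term algebra (the building blocks and the generate-and-select loop are illustrative simplifications of mine, far smaller than the real thing): robust terms are sampled and combined, then the fittest candidate is kept.

```python
import random

random.seed(3)

# --- the "toolbox": simple, robust forecasting terms -------------------
def moving_average(k):
    return lambda h: sum(h[-k:]) / min(k, len(h))

def damped_drift(k):
    # extrapolate the recent trend, damped to stay cautious
    return lambda h: h[-1] + 0.5 * (h[-1] - h[-min(k, len(h))])

def blend(f, g, w):
    return lambda h: w * f(h) + (1 - w) * g(h)

def random_term():
    """Sample one candidate forecaster from the term algebra."""
    base = random.choice([moving_average(random.randint(2, 8)),
                          damped_drift(random.randint(2, 5))])
    if random.random() < 0.5:  # optionally combine two terms
        base = blend(base, moving_average(random.randint(2, 8)),
                     random.random())
    return base

def fitness(f, series, start=20):
    """Negative walk-forward squared error: higher is better."""
    return -sum((f(series[:t]) - series[t]) ** 2
                for t in range(start, len(series)))

# --- evolutionary search reduced to its simplest form: generate & select
series = [0.1 * t + random.gauss(0, 1) for t in range(60)]
candidates = [random_term() for _ in range(200)]
best = max(candidates, key=lambda f: fitness(f, series))
```

Because every term in the box is individually cautious, the generated forecaster inherits that caution, which is precisely the lever of control over over-fitting described above; a real implementation would add regularization and genetic recombination on top of this generate-and-select skeleton.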

The forecasting algorithm used by the Knomee app is produced locally on the user’s phone from the collected data. To test robustness, we have collected self-tracking data over the past two years – for those of you who are curious to apply other techniques, the data is available on GitHub. The forecasting algorithm is the fixed point of an evolutionary search. This is very similar to reinforcement learning in the sense that each iteration is directed by a fitness function that describes the accuracy of the forecast (modulo regularization, as explained in the presentation). The training protocol runs the resulting forecasting algorithm on each sample of the data set (a quest), for each time position from 2/3 to 3/3 of the ordered time series. In other words, the score that we use is the average precision of the forecasts that a user would experience during the last third of the data collection process. The term algebra used to represent and generate forecasting algorithms is made of simple heuristics such as regression and movingAverage, of weekly and hourly time patterns, and of correlation analysis with threshold, cumulative and delay options. With the proper choice of meta-parameters to tune the evolutionary search (such as the fitness function or the depth and scope of local optimisation), this approach is able to generate a robust algorithm, that is, one that (1) generates better forecasts than the average (although not by much) and (2) is not thrown off by pseudo-random time series. Let me state clearly that this approach is not a “silver bullet”. I have compared the algorithm produced by this evolutionary search with the classical, simple machine learning approaches that one would use for time series: regression, k-means clustering and ARMA. I refer you to the great book “Machine Learning for the Quantified Self” by M. Hoogendoorn and B. Funk for a complete survey on how to use machine learning with self-tracking data.
On regular data (such as sales time series), the classical algorithms perform slightly better than evolutionary code generation. However, when real self-tracking data is used with all its randomness, evolutionary search manages to synthesise robust algorithms, which none of the three classical algorithms are.

5. Conclusion

This topic is more complex than many of the subjects that I address here. I have tried to stay away from the truly technical aspects, at the expense of scientific precision. I will conclude this post with a very short summary:
  1. Causality hunting is a fascinating topic. As we accumulate more and more data, and as Artificial Intelligence tools become more powerful, it is quite logical to hunt for causality and to build models that represent a fragment of our world knowledge through machine learning. This is, for instance, the heart of the Causality Link startup led by my friend Pierre Haren, which automatically builds knowledge graphs from textual data while extracting causal links, which are then used for deep situation analysis with scenarios.
  2. Causality hunting is hard, especially with small data, and even more so with “Quantified Self” data, because of the random nature of many of the time series collected with connected devices. It is also hard because we cannot track everything, and quite often what we are looking for depends on other variables (the orange part of the previous picture).
  3. Forecasting is an interesting tool for causality hunting. This is counter-intuitive since forecasting is close to impossible with self-tracking data. A better formulation would be: “a moderate amount of robust forecasting may help with causality hunting”. Forecasting gives a hint of “predictive causality”, in the sense of Granger causality, and it also serves to enrich the pleasure-surprise-discovery learning loop of self-tracking.
  4. Machine code generation through reinforcement learning is a powerful technique for short time-series forecasting. Code-generating algorithms try to assemble building blocks from a given set to match a given output. When applied to self-tracking forecasting, this technique allows us to craft algorithms that are robust to random noise (able to recognise the data as such) and able to extract a weak correlative signal from a complex (although short) data set.