Biology of Distributed Information Systems

Sunday, December 20, 2015

Event-Driven Architecture and Biomimicry

1. Introduction

Ten years ago I simultaneously discovered the concepts of Autonomic Computing and the fascinating book “Out of Control – the New Biology of Machines, Social Systems and the Economic World” by Kevin Kelly. This came at a moment when I was still the CIO of Bouygues Telecom and was puzzling over the idea of “organic operations”. I had become keenly aware that high availability and reliability were managed – on paper – using a mechanistic and probabilistic vision of systems engineering, while real-life crises were solved using a more “organic” approach to how systems worked. This is described in more detail in my first book. Autonomic computing gave me a conceptual framework for thinking about organic and self-repairing systems design. I then had the chance to learn about Google operations in 2006, including a long discussion with Urs Hölzle, and found that many of these ideas were already applied. It became clear that complex properties such as high-availability, adaptability or smart behavior could be seen as emergent properties that were grown and not designed, and this led to the opening of this blog.

I decided to end this year with a post that fits squarely into this blog’s positioning – i.e., what can we learn from biological systems to design distributed information systems? – with a focus on event-driven architectures. The starting point for this post is the reading of the report “Inside the Internet of Things (IoT)” from Deloitte University Press. This is an excellent document, which I found interesting from a technology perspective, but which I thought could be expanded with a more “organic” vision of IoT systems design. The “Information Value Loop” proposed by Deloitte advocates for augmented intelligence and augmented behavior, which is very much aligned with my previous post on the topic of IoT and Smart Systems. The following schema is extracted from this report; it shows a stack of technology capabilities that may be added to the stream of information collected from connected objects. From a technologist’s standpoint, I like this illustration: it captures a lot of key capabilities without losing clarity. However, it portrays a holistic, unified, structured vision which is too far, in my opinion, from the organic nature of the Systems of Systems that will likely emerge in the years to come.

The first section of this post will cover event-driven architectures, which are a natural framework for such systems. They also make perfect instances of the “Distributed Information Systems” to which this blog is dedicated. The next section will introduce Complex Event Processing (CEP) as a platform for smart and adaptive behavior. I will focus mostly on how such systems should be grown and not designed, following in the footsteps of Kevin Kelly. The last section will deal with the “cognitive computing” ambition of tomorrow’s smart systems. I will first propose a view that complements what is shown in the document from Deloitte, borrowing from biology to enrich the pattern of a “smart adaptive IoT system”. I will also advocate for a more organic, recursive, fractal vision of System of Systems design, in the spirit of the IRT SystemX.

I use the concept of biomimicry in a loose sense here, which is not as powerful or elegant as the real thing, as explained by Idriss Aberkane. In this blog, biomimicry (which I have also labelled as “biomimetism” in the past) means looking to nature as a source of inspiration for complex system design – hence the title of the blog. In today’s post, I will borrow a number of design principles for “smart systems of systems” from what I have read in biology about the brain or the human body, but a few of these principles come directly from readings about complex systems.

2. Event-Driven Architectures

Event-Driven Architectures (EDA) are well suited to designing systems around smart objects, such as smart homes. Event-driven architectures are naturally scalable, open and distributed. The “Publication/Subscription” pattern is the cornerstone of application integration and modular system design. This was incidentally the foundation of application integration two decades ago, so it is no surprise that EDA has found its way back into SOA 2.0. I will not talk about technology solutions in this post, but a number of technologies that fit EDA systems, such as Kafka or Samza, have appeared in the open source community. There is a natural fit with the Internet of Things (IoT) – the need for scalability, openness, decoupling – which is illustrated, for instance, in Cisco’s paper “Enriching Business Process through Internet of Everything”. Its reference to IfThisThenThat (IFTTT), one of the most popular smart object ecosystems, is a perfect example: IFTTT has built its strategy on an open, API-based, event-driven architecture. The smart home protection service provided by myLively.com is another great instance of event-driven architecture at work to deliver a “smart experience” using sensors and connected devices.
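To make the “Publication/Subscription” pattern concrete, here is a minimal in-process sketch in Python. It is only an illustration under my own assumptions: the `EventBus` class, the topic name `door/opened` and the two handlers are hypothetical, and a real system would rely on a broker rather than on an in-memory dictionary.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

# A handler receives the topic name and the event payload.
Handler = Callable[[str, Any], None]


class EventBus:
    """Minimal in-process publish/subscribe bus (illustrative only)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: Any) -> int:
        """Deliver the payload to every subscriber of the topic; return the delivery count."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(topic, payload)
        return len(handlers)


# Hypothetical smart-home wiring: the door sensor knows nothing about its consumers.
bus = EventBus()
log: List[str] = []
bus.subscribe("door/opened", lambda topic, p: log.append(f"light on in {p['room']}"))
bus.subscribe("door/opened", lambda topic, p: log.append(f"notify owner: {p['room']}"))
delivered = bus.publish("door/opened", {"room": "hall"})
```

The point of the sketch is the decoupling: new behaviors are added by subscribing, without touching the publisher.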

In a smart system that adapts continuously to its environment, the preferred architecture is to distribute control and analytics. This is our first insight, drawn both from complex systems and biological systems analysis. There are multiple possible reasons for this (the variety of control & analytics needs, the need for redundancy and reliability, performance constraints, …), but this should be taken more as an observation than a rational law (and it is more powerful as such). It is clear that “higher-level” control functions are more prone to errors and failure and that they typically react slower, which is why nature seems to favor redundant designs with multiple control engines and failover modes. Translated into the smart systems world, it means that we should avoid both single points of failure (SPOF) and single points of decision (SPOD). In a smart home system, it is good to keep control of the command layer if the automated system is down, and to keep the automated system on if the “smart” system is not operating properly. In contrast to a single central decision engine, the distributed decision architecture described decades ago by Marvin Minsky in his Society of Mind is a better pattern for robust smart systems. From a System of Systems design perspective, distributed control and analytics is indeed a way to ensure better performance (place the decision closer to where the action is, which recalls the trend towards edge computing, as exemplified by Cisco’s fog computing). It is also a way to adapt the choice of technology and analytics paradigms to the multiple situations that occur in a large distributed system.
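The smart home failover idea above can be sketched as a chain of controllers, each one taking over when the layer above it fails or abstains. This is a minimal sketch under my own assumptions: the three controllers, the state keys (`model_available`, `predicted_arrival_min`, `temperature_c`, `manual_switch`) and the thresholds are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# A controller returns a command, or None when it cannot decide.
Controller = Callable[[dict], Optional[str]]


def smart_controller(state: dict) -> Optional[str]:
    # Hypothetical "smart" layer: depends on a learned model that may be unavailable.
    if not state.get("model_available", False):
        raise RuntimeError("smart layer down")
    return "heat_on" if state["predicted_arrival_min"] < 30 else "heat_off"


def rule_controller(state: dict) -> Optional[str]:
    # Plain automation: a fixed thermostat rule, abstains without a temperature reading.
    if "temperature_c" not in state:
        return None
    return "heat_on" if state["temperature_c"] < 19.0 else "heat_off"


def manual_controller(state: dict) -> Optional[str]:
    # Command layer: the wall switch is the last resort and always answers.
    return state.get("manual_switch", "heat_off")


def decide(state: dict, layers: List[Tuple[str, Controller]]) -> Tuple[str, str]:
    """Try each layer in order; skip layers that fail or abstain."""
    for name, controller in layers:
        try:
            command = controller(state)
        except Exception:
            continue  # a failing layer must not block the layers below it
        if command is not None:
            return name, command
    raise RuntimeError("no controller could decide")


LAYERS = [("smart", smart_controller), ("rules", rule_controller), ("manual", manual_controller)]
```

For instance, `decide({"temperature_c": 17.5}, LAYERS)` falls through the unavailable smart layer and is answered by the rule layer. No layer is a single point of decision.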

A natural consequence of control distribution is the occurrence of redundant distributed storage. Although this is implicit in the Deloitte document, it is worth underlining. Most complex control and decision systems require efficient access to data, hence distribution and redundancy are a matter of fact. This leaves us with age-old data flow and synchronization issues (I write “age-old” since both Brewer’s theorem and snapshot complexity show that these problems are here to stay). This topic is out of the scope of this post, but I strongly suggest reading the Accenture document “Data Acceleration: Architecture for the Modern Data Supply Chain”. Not only does the document illustrate the “flow dimension” of the data architecture, which is critical to designing adaptive and responsive systems based on EDA, but it also explains the concept of data architecture patterns that may be used in various pieces of a system of systems. It makes a very good argument, if one were needed, for data caching, including main-memory systems. There are two pitfalls that must be avoided when dealing with data architecture design issues: focusing on a static view of data as an asset, and searching for a unifying holistic design (more about this in the next section: hierarchical decomposition and encapsulation still have merit in the 21st century).

Smart biological systems operate on a multiplicity of time scales, irrespective of the degree of “smartness”. What I mean by this is that smart living systems have developed control capabilities that operate on different time horizons: they differ not because of their deductive/inductive capabilities, but because their decision cycles run at completely different frequencies. A very crude illustration of this idea could distinguish between reaction (short-term, emphasis on guaranteed latency), decision (still fast but less deterministic) and adaptation (longer term). We shall see in the next section that the same distinction applies to learning, where adaptation can be distinguished from learning and reflection. Using the vocabulary of optimization theory, adaptation learns about the problem through variable adjustment, learning produces a new problem formulation and reflection improves the satisfaction function. It is important to understand that really complex – or simple – approaches may be applied to each time scale: short-term is not a degraded version of long-term decision, nor is long-term an amplified and improved version of short-term. This is the reason for the now common pattern of the lambda architecture, which combines both hot and cold analytics. This understanding of multiple time scales is critical to smart System of Systems design. It has deep consequences, since most of what will be described later (goals, satisfaction criteria, learning feedback loops, emotion/pleasure/desire cycles) needs to be thought about at different time scales. In a smart home, we equally need fast, secure and deterministic response times for user commands, smart behavior that requires complex decision capability, and longer-term adaptive learning capabilities such as those of the ADHOCO system which I have quoted previously in this blog.
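The combination of hot and cold analytics mentioned above can be illustrated with a toy counter. This is a deliberately simplified sketch of the lambda-architecture idea, not a reference implementation: the `HotColdCounter` class and the `(timestamp, sensor_id)` event shape are my own assumptions, and the “batch” step is just a recount of an in-memory log.

```python
from collections import Counter
from typing import List, Tuple

Event = Tuple[float, str]  # (timestamp, sensor_id); the shape is illustrative


class HotColdCounter:
    """Counts events per sensor with a cold (batch) view and a hot (speed) view."""

    def __init__(self) -> None:
        self.master_log: List[Event] = []     # append-only record of everything seen
        self.batch_view: Counter = Counter()  # recomputed occasionally from the full log
        self.speed_view: Counter = Counter()  # incremented on every event since the last batch

    def ingest(self, event: Event) -> None:
        # Hot path: cheap, per-event update.
        self.master_log.append(event)
        self.speed_view[event[1]] += 1

    def run_batch(self) -> None:
        # Cold path: recompute from scratch, then drop the increments it now covers.
        self.batch_view = Counter(sensor for _, sensor in self.master_log)
        self.speed_view.clear()

    def query(self, sensor: str) -> int:
        # Serving step: merge both views.
        return self.batch_view[sensor] + self.speed_view[sensor]


counter = HotColdCounter()
for ts in (1.0, 2.0, 3.0):
    counter.ingest((ts, "door"))
before_batch = counter.query("door")
counter.run_batch()
counter.ingest((4.0, "door"))
after_batch = counter.query("door")
```

The two paths run on different clocks (per event versus per batch), which is the time-scale separation discussed in this section; neither is a degraded version of the other.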

In this post I consider a single system of its kind (even if it is a system of systems), but this should be further developed if the system is part of a population, which leads to collective learning (think of Tesla cars learning from one another) and population evolution (cf. Michio Kaku’s vision of emotion as a Darwinian expression of population learning).

3. Emergent EDA Systems

Most systems produced by nature are hierarchical, and this also applies to event architectures, which must distinguish between different levels of events. Failure to do so results in systems that are too expensive (for instance, too much is stored) and too difficult to operate. For the architects reading this, please note that an event “system hierarchy” is not an “event taxonomy” (which is supported out of the box by most frameworks): it is an abstraction hierarchy, not a specialization hierarchy (both are useful). A living organism uses a full hierarchy of events: some are very local, some get propagated, some get escalated to another scale of the system, etc. To distinguish between different levels of events, we need to introduce into smart systems what is known as Complex Event Processing (CEP). CEP is able to analyze and aggregate events to produce simple decisions which may trigger other events. You may find a more complete description of CEP in the following pages taken from theCEPblog, from which I have borrowed the illustration on the right. Similarly, you can learn a lot by watching YouTube videos of related open source technology platforms such as Storm.

A key feature of CEP is the ability to analyze and correlate events from a lower level to produce a higher-level event. It is the foundation for event control logic in a “system of systems” architecture, moving from one level of abstraction to another. This is not, however, the sole responsibility of the CEP system. True to our “analytics everywhere” philosophy, “smarter” analytics systems, such as Big Data machine learning systems, need to be integrated into the EDA to participate in the smart behavior of the global system, in a fashion that is very similar to the organization of a living being.
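As an illustration of correlating lower-level events into a higher-level one, here is a minimal sliding-window sketch. It is not how any particular CEP engine works: the `WindowCorrelator` class, the event fields and the `intrusion_suspected` event name are hypothetical, and the sketch assumes that events arrive in timestamp order.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds
    kind: str
    source: str


class WindowCorrelator:
    """Emits one higher-level event when `threshold` distinct sources
    report `kind` within `window` seconds (events assumed time-ordered)."""

    def __init__(self, kind: str, window: float, threshold: int, emits: str) -> None:
        self.kind = kind
        self.window = window
        self.threshold = threshold
        self.emits = emits
        self._recent: Deque[Event] = deque()

    def feed(self, event: Event) -> Optional[Event]:
        if event.kind != self.kind:
            return None  # not our level of abstraction; left to other processors
        self._recent.append(event)
        # Drop events that have fallen out of the time window.
        while event.timestamp - self._recent[0].timestamp > self.window:
            self._recent.popleft()
        sources = {e.source for e in self._recent}
        if len(sources) >= self.threshold:
            self._recent.clear()  # escalate once, then start over
            return Event(event.timestamp, self.emits, "cep")
        return None


cep = WindowCorrelator(kind="motion", window=10.0, threshold=3, emits="intrusion_suspected")
results = [
    cep.feed(Event(0.0, "motion", "hall")),
    cep.feed(Event(5.0, "motion", "kitchen")),
    cep.feed(Event(8.0, "motion", "stairs")),
]
```

Only the third call returns an event; the three local motion events stay local, and a single escalated event moves up one level of abstraction.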

Kevin Kelly’s advice for growing, rather than designing, emergent systems becomes especially relevant as soon as there is a human in the loop. A key insight from smart system design is to let the system learn about the user and not the opposite (although one may argue that both are necessary in most cases, cf. the fourth section of my previous post). Systems that learn from their users’ behaviors are hard to design: it is easier to start from user feedback and satisfaction and let adaptation run its course than to get the “satisfaction equation” right from the start. This is a key area of expertise of the IRT SystemX, whose scientific and technology council I have the pleasure to lead. A number of the ideas expressed here may be found in my inaugural talk of 2013. Emergence derives from feedback loops, which may be construed as “conversations”. CEP is the proper technology to develop a conversation with the user in the loop, following the insight of Chris Taylor, who is obviously referring to the Cluetrain Manifesto’s “Markets are conversations”. The “complex” element of CEP is what makes the difference between a conversation (with the proper level of abstraction, listening and silence) and an automated response.

Another lesson from complex systems is that common goals should be reified (made first-class objects of the system) and distributed across smart adaptive distributed systems. There are two aspects to this rule. First, it states that complex systems with distributed control are defined by their “finality”, which must be uniquely defined and shared. Second, the finality is transformed into actions locally, according to the local context and environment. This is both a key principle for designing Systems of Systems and a rule which has found its way into modern management theory. This is a lesson that has been discovered by distributed systems practitioners over and over. I found a vivid demonstration when working on OAI (optimization of application integration) over a decade ago. The best way to respect centrally-defined SLAs (service level agreements) is through policies that are distributed over the whole system and interpreted locally, as opposed to implementing a centralized monitoring system. This may be found in my paper about self-adaptive middleware. In the inaugural IRT speech that I mentioned earlier, I talked about SlapOS, the cloud programming OS, because Jean-Paul Smets told me a very similar story about the SlapOS mechanism for maintaining SLAs, which is also based on the distribution of goals, not commands. Commands are issued locally, in the proper context and environment, which is perfectly aligned with the control distribution strategy described in the previous section.
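The idea of distributing goals rather than commands can be sketched as follows. This is a toy illustration under my own assumptions, not the mechanism of the self-adaptive middleware or of SlapOS: the `Goal` and `Node` classes, the budget shares and the remedy names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    """A reified, shared goal: an end-to-end latency SLA."""
    name: str
    max_latency_ms: float


@dataclass
class Node:
    """A component that interprets the shared goal in its own local context."""
    name: str
    budget_share: float  # fraction of the end-to-end budget this node may consume
    local_remedy: str    # the command this node issues to itself when over budget

    def interpret(self, goal: Goal, observed_ms: float) -> str:
        local_budget = goal.max_latency_ms * self.budget_share
        return self.local_remedy if observed_ms > local_budget else "no_action"


# The same goal object is distributed to every node; no central monitor issues commands.
sla = Goal("order_processing", max_latency_ms=200.0)
queue = Node("queue", budget_share=0.25, local_remedy="add_consumer")
database = Node("database", budget_share=0.5, local_remedy="enable_cache")

queue_action = queue.interpret(sla, observed_ms=80.0)
database_action = database.interpret(sla, observed_ms=80.0)
```

The same observation (80 ms) leads to different local decisions, because each node translates the shared finality into action according to its own context.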

We should build intelligent capabilities the way nature builds muscles: by growing areas that are getting used. In the world of digital innovation, learning happens by doing. This simple but powerful idea is a roadmap to growing emergent systems: start simple, observe what is missing, but mostly reinforce what gets used. Examples of reinforcement learning abound in biology, from ant stigmergy to muscle growth through adaptation to effort. Learning by doing is the heart of the lean startup approach, but it also applies to complex system design and engineering. This biology metaphor is well suited to avoiding the pitfall of top-down feature-based design. Smart (hence emergent, if we follow Kevin Kelly’s axiom) systems must be grown in a bottom-up manner, by gradually reinforcing what matters most. This is especially useful when designing truly complex systems with cognitive capabilities (the topic of the next section). Nature tells us to think recursively (think fractal) and to grow from reinforcement (strengthening what is useful). If we throw a little bit of agile thinking into the picture, we understand why it’s better to start small when building an adaptive event-driven system.
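A very small sketch of “reinforce what gets used”: every use strengthens one capability while all capabilities slowly decay, in the spirit of pheromone trails. This is a simple usage-weighting heuristic, not a reinforcement learning algorithm in the machine learning sense; the class name, the `gain` and `decay` parameters and the capability names are my own assumptions.

```python
from typing import Dict, Iterable, List


class UsageReinforcer:
    """Strengthens capabilities that are used; unused ones fade (illustrative only)."""

    def __init__(self, capabilities: Iterable[str], gain: float = 1.0, decay: float = 0.9) -> None:
        self.gain = gain
        self.decay = decay
        self.strength: Dict[str, float] = {c: 1.0 for c in capabilities}

    def use(self, capability: str) -> None:
        # Everything decays a little, like an evaporating pheromone trail...
        for name in self.strength:
            self.strength[name] *= self.decay
        # ...and what was just used is reinforced.
        self.strength[capability] += self.gain

    def ranked(self) -> List[str]:
        """Capabilities ordered from most to least reinforced."""
        return sorted(self.strength, key=self.strength.get, reverse=True)


home = UsageReinforcer(["voice_control", "presence_heating", "door_alerts"])
for _ in range(3):
    home.use("presence_heating")
home.use("door_alerts")
investment_order = home.ranked()
```

The ranking suggests where to invest next: the capability that users actually exercise, rather than the one that looked best on the feature list.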

4. Cognitive EDA Systems

As John E. Kelly from IBM rightly points out, we are entering the new era of cognitive computing, with systems which grow by machine learning, not by programmatic design. This is precisely the vision that Kevin Kelly had two decades ago. Cognitive systems, John Kelly tells us, “reason from a purpose”, which means that emergent systems emerge from their finality. The more the “how” is grown from experience (for instance, from data analysis in a Big Data setting), the more the definition and reification of goals become important (cf. previous section). One could argue that this is already embedded in the Deloitte picture that I showed in the introduction, but there is a deeper transformation at work, which is why machine learning will play a bigger and more central role in IoT EDA architecture. I strongly suggest that you watch Dario Gil’s video about the rise of cognitive computing for IoT. His arguments about the usefulness of complex inferred computer models with no causality validation are very similar to what is said in the NATF’s recently issued report on Big Data.

Biology obviously has a lot to teach us about cognitive, smart and adaptive systems. A simplistic view of our brain and nervous system distinguishes between different zones:

Reflexes (medulla oblongata and cerebellum) – these parts of the brain handle unconscious regulation and fine motor skills (cerebellum).
Emotions (amygdala) play a critical role in our decision process. There is an interplay between rational and emotional thoughts that has been popularized by Antonio Damasio’s best-seller. In a previous post, I referred to Michio Kaku’s analysis which makes emotion the equivalent of stored evaluation functions, honed through the evolution process.
Inductive thinking (cortex), since the brain is foremost a large associative memory.
Deductive thinking (frontal cortex), with a part of the brain that came later in the evolution of the species and which is the last to grow in our own development process.

You may look at the previous link or at this one for more detailed information. I take this simplified view as input for the following pattern for cognitive event-driven architecture (see below). This is my own version of the introduction schema, with a few differences. Event-Driven architecture is the common glue and Complex Event Processing is the common routing technology. CEP is used to implement reflexes for the smart adaptive system that is connected with its environment (bottom part of the schema). Reflex decisions are based on rules wired with CEP but also on “emotions”, that is, valuation heuristics that are applied to input signals. Actions are either the result of reflexes or the result of planning. Goals are reified, as was explained in the previous section. This architecture pattern distinguishes between many different kinds of analytics and control capabilities. It would be even richer if the multiple time-scale aspect were clearly shown. As said earlier, a number of these components (goals, emotions, anticipation) should be further specialized according to the time horizon under which they operate. Roughly speaking, one may recognize the earlier distinction between reflexes (CEP), decisions (with a separation between decision and planning, because planning is a specialized skill whereas decision could be left to a wide range of Artificial Intelligence technologies) and learning. Learning – which is meant to be covered by Big Data and Machine Learning capabilities – produces both adaptation (optimizing existing concepts) and “deep learning” (deriving new concepts). Learning is also leveraged to produce anticipation (forecasting), which is a key capability of advanced living beings.
A specialized form of long-term learning, called reflection, is applied to question emotions versus long-term goals (reflection is a long-term process that assesses the validity of the heuristic cost functions used to make short-term decisions with respect to longer-term goals). Although this schema is a very simplified form of a learning system, it already shows multiple levels of feedback learning loops (meant to be used with different time scales).
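The reflex/emotion/decision split of this pattern can be sketched in a few lines. This is only a toy reading of the pattern under my own assumptions: the smoke threshold, the `comfort_value` heuristic standing in for an “emotion”, the reified goal of 21 °C and the three candidate actions are all hypothetical, and learning and reflection (which would tune the heuristic over longer time scales) are left out.

```python
from typing import Dict, Optional, Tuple

Signal = Dict[str, float]

TARGET_TEMPERATURE_C = 21.0  # a reified goal (hypothetical value)
ACTION_EFFECT_C = {"heat": 1.0, "idle": 0.0, "cool": -1.0}  # assumed effect of each action


def reflex(signal: Signal) -> Optional[str]:
    """Hard-wired rule: fast, deterministic, no deliberation."""
    if signal.get("smoke", 0.0) > 0.8:
        return "sound_alarm"
    return None


def comfort_value(signal: Signal, action: str) -> float:
    """'Emotion' as a stored valuation heuristic: higher is better."""
    temperature_after = signal["temperature_c"] + ACTION_EFFECT_C[action]
    return -abs(TARGET_TEMPERATURE_C - temperature_after)


def step(signal: Signal) -> Tuple[str, str]:
    """Reflexes pre-empt decisions; otherwise pick the best-valued action."""
    reflex_action = reflex(signal)
    if reflex_action is not None:
        return "reflex", reflex_action
    best = max(ACTION_EFFECT_C, key=lambda action: comfort_value(signal, action))
    return "decision", best


alarm = step({"smoke": 0.9, "temperature_c": 18.0})
cold_room = step({"temperature_c": 18.0})
```

In this reading, reflection would be a slower loop that checks whether `comfort_value` still serves the longer-term goals and adjusts it if not.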

It is important to notice that the previous picture is an incomplete representation of what was said in this post. The picture represents a pattern, which is meant to be instantiated in a “multi-scale” / “fractal” design, as opposed to a holistic system design view. The fractal architecture pattern was a core concept of the enterprise architecture book which I wrote in 2004. An organic design for enterprise architecture creates buffers, “isolation gateways” and redundancy that make the overall system more robust than a fully integrated design.

It is easier to build really smart smaller objects than large systems, thus they will appear first and “intelligence” will come locally before coming globally. This is the Darwinian consequence of the organic design principle. When one tries to develop a complex system in the spirit of the previous pattern, it is easier to do so with a more limited scope (input events, intended behaviors, …). Why, you may ask? Because intelligence comes from feedback loop analysis, and it is easier to design and operate such a loop in a closed system with a unique designer than in a larger-scope open system. Nothing in the previous schema says that it describes a big system. It could apply to a smart sensor or an intelligent camera. As a matter of fact, smart cameras such as Canary or Netatmo Welcome are good examples of advanced cognitive functions integration. A consequence is that the “System of Systems” organic approach is more likely to leverage advanced cognitive capabilities than more traditional integrated or functionally specialized designs (which one might infer from the Deloitte picture in the introduction). Fog computing makes a good case for edge computing, but it also promotes a functional architecture which I believe to be too homogeneous and too global.

Sunday, October 18, 2015

Lean Startup and Lean Software Factory

1. Introduction

I had the pleasure of attending and speaking at the Lean IT Summit in Paris last week. The first keynote, entitled “Lean Journey: what have we learned?”, was given by Dan Jones, the world-famous author of The Machine That Changed the World, the “book that introduced lean production to the world”.

I encourage you to watch the video as soon as it becomes available on the Lean IT Summit web site. Meanwhile, here is a short selection of some of the most salient points:

Convergence is happening between lean and software because, as Marc Andreessen said, “software is eating the world”. Convergence means interplay: lean software development has been a key theme of this conference for many years, while software development issues are becoming central to any production process, making the convergence between lean and IT critical. Dan Jones notices that lean works very well at the team level for IT: agile, scrum, Kanban … or Devops, seen as a great example of lean single-flow continuous stream.
Lean is, foremost, a management system about learning. The imperative of continuous learning applies to all, from front-line operators to all levels of managers. Dan quoted Amazon as an example of a learning company that is trying things all the time. Learning from the customers is critical, from their feedback and using short cycles. One learns from daily practice, using measurement to build scientific discipline and to avoid jumping to conclusions.
Dan made a great review of the lean foundations: understand the work, break it into small increments, standardize, that is, codify best practices to define the baseline for continuous improvement (standards are different from one team to another), etc. Tools are important: they provide the scaffolding for learning, such as visual management (critical to creating the see-the-problem culture). Lean is something that is self-learned: you cannot simply invite experts to come and show it to you.
While visiting a back office in a bank, Dan Jones noticed how much rework is one of the major plagues of IT departments (something that I have seen firsthand), due to silo mentality and organizations still being jealous of their boundaries. When technical people do not know about their customers, inefficiency is unavoidable. This does not do justice to Dan’s testimony; when I heard it I tweeted “customer-centric software is eating the world while siloed software is eating dirt …”.

2. Lean Startup – A Global Perspective

Lean Startup was a common theme of the first half day of the summit. Dan Jones defined Lean Startup as the combination of design thinking and the use of small increments to get to production fast, in order to put enough design into a first product to get feedback and start continuous learning and improvement. I gave my own lecture later on about the use of Lean Startup at AXA Digital Agency. I spoke after Susana Jurado, who talked about her experience with Lean Startup at Telefonica, and before Pierre Pezziardi, who talked about the French government incubator.

My vision of Lean Startup is broad, as shown in the following illustration which is inspired by a schema from Dave Landis which we have extended at AXA together with Stephane Delbecque. We see the Lean Startup cycle covering from problem definition (with a clear influence from Design Thinking) to continuous growth (Growth Hacking) through the definition of a Minimum Viable Product. This broad vision is coherent with the scope of the seminal book by Eric Ries.

The slides and the video of my talk will also be available soon. Here is a short overview of the three parts which are portrayed in this schema:

Design thinking is about identifying a problem that matters and crafting, through dialog and prototyping, a promise called a Unique Value Proposition. Pain points must be collected and analyzed through a lot of observation, resulting in the definition of the “job to be done”.
The central idea is to build a true product “that does not do much but does it very well”. The minimalism of the MVP is what guarantees the lean principle of getting to meet the customers as soon as possible to learn from them. But the lean principle “right the first time” applies as well: customers have no patience for half-baked stuff. The solution to this conundrum is to focus: select very few user stories and deliver an amazing solution. This is what Nathan Furr and Jeff Dyer are calling the “Minimum Awesome Product”.
Once the MVP is out, real life starts – one must grow customer satisfaction. Growing customer usage and satisfaction is a co-learning process where feedback and iteration play the central role. By listening and measuring, one may validate or invalidate the assumptions made during the design phase. By harvesting and improving what works for the customer, the goal is to reach the “product/market fit” and then scale (fuel growth through proactive & viral marketing) – what is described as “nail it then scale it”.

3. Dual Processes from Customer to Code and Code to Customer

I concluded my talk by stating that “Lean Startup must co-exist with the new requirements of producing software in the digital world”, which is another way of expressing the “convergence idea” of Dan Jones. Two years earlier, I had given a talk at the Lean IT Summit about “Lean Software Factories”. Lean Startup and Lean Software Factories are closely related: they are two faces of the same “digital innovation question”. The following picture expresses this as a set of dual processes:

A product design process (from customer to code), that is embodied by the Lean Startup approach as described earlier
A product delivery process (from code to customer) that is embodied by the Lean Software Factory metaphor, although I could also have selected Devops to capture the continuous build and delivery.

Obviously, the representation with arrows is misleading because, as shown in the previous picture, neither of these two processes is linear. They are sequences of intertwined iteration loops. You do not apply Devops once you have built an MVP, but precisely to deliver this MVP and its successive evolutions.

What this picture expresses is that a great digital company must excel in both dimensions: to produce digital innovation and customer satisfaction through a customer-centric co-development process (Lean Startup), and to achieve excellence and speed in iterative delivery. A digital company produces experiences that are delivered through lines of code. The main KPI is speed, which is necessary to be innovative (first on the market) and, foremost, relevant (customer satisfaction stems from iteration, which requires speed).

Another way to say this is that you cannot be great at Lean Startup if you do not master the fast software delivery process, and that you cannot reap the rewards of agile development if your company does not embrace the iterative, customer-centric, lean principles of Lean Startup product development. This is explicitly explained in OCTO’s book, “Les Géants du Web”, where digital innovation best practices are expressed both along the lean startup / culture and the DevOps / technology mastery axes.

4. Multi-Stage Learning Engines

As Dan Jones mentioned in his keynote, or as Michael Ballé explains in his books, the lean journey is mostly about learning. What makes the dual processes of the previous section work well, and work well together, is the skills of the teams. These skills require effort and time to be learned. The following picture shows a complex system with three actors: the consumer, the product and the team. There is learning and adaptation everywhere:

The consumer learns from the product: it takes time for the consumer to discover and understand the product. Good design - which is about reducing friction and increasing pleasure - makes this as fast and as pleasurable as possible, but it still takes time. There is a learning curve associated with each innovative usage.
The product will “learn from” the consumer through feedback (the heart of the lean product development cycle). The core principle of Lean Startup is that one needs this feedback to manufacture really great products. Some of it is old-fashioned (listening to customer complaints and solving problems), some of it is more “digital” (machine learning and big data from real-time analytics).
The team learns from building the product: there are skills that you only learn by doing. You can think, design and train all you like: it is only when you start the production adventure that real life happens and that some of the hard skills may be learned.
The product learns from the team: the product can only improve when the team grows its skills and knowledge.

In these four learning loops, there are delays everywhere, and obviously interaction (co-dependence of these learning loops). Quite logically, this is hard for upper management to grasp, understand and accept. There is always a delay between cause and effect. Good decisions usually create the context (empower the team or the product) so that learning may occur, and value (customer satisfaction) happens later. Because of this complexity and these delays, I have seen projects killed at the wrong moment many times: the hard point had been passed successfully, but the benefits were not yet visible. This is clearly related to what makes complex systems complex, especially our difficulty in understanding loops and delays (cf. the sixth key in my previous post about complex systems).

This schema may not speak to every reader, but it encapsulates two key ideas. First, there is a lot of learning that must take place before Lean Startup and Lean Software start to deliver massive improvements, both in time-to-market and customer satisfaction. Second, there are too many loops and delays involved to maintain the “illusion of control”. The art of lean product development is an emergent practice, something that you grow over time, not something that is decided and rolled out.
