Biology of Distributed Information Systems: 2013

Tuesday, December 31, 2013

Seven Keys for Complex Systems Engineering

I gave a talk early this year at the “IRT SystemX” inauguration about the challenges that occur when engineering “Systems of Systems”. This talk is a quick introduction to what we can learn from complex systems when designing large-scale interactive industrial systems. Complex systems are defined by their goals (purpose) and a set of sub-systems with rich interactions. The complexity of these interactions yields the concept of emergent behavior. Complex systems have a fractal nature, that is, they exhibit multiple scales, both at the physical/descriptive level and at the temporal level. Complex systems embed memory and have the capability to learn, which makes them both dynamic and adaptive. They interact constantly with their environment, which means that a dynamic vision of flows is more relevant than a static description of their top-down decomposition. Most complex systems renew their low-level components in a continuous process. Teleonomy and process analysis are, therefore, the most useful approaches to capture the essence of a complex system.

I have become gradually fascinated by the topic of complex systems because I find it everywhere in my job and in my own research. Complex systems theory is the right framework to understand the management and the organization of modern enterprises; this is the topic of my other blog. Everything said about complex systems in the previous paragraph applies to a company. I also found that it applies to information systems. The main reason for creating this blog was the realization that the proper control for information systems has to be emergent, following the lead of Kevin Kelly and the intuition behind Autonomic Computing. Last, complex systems are everywhere when one tries to understand the most common business ecosystems, such as smartphone application development, smart homes or smart grids. I have talked about Smart Grids Players as a Complex System in this blog. More examples may be found in my keynote at CSDM 2012.

There is a paradox with the popularity of “complex systems science” in today’s business culture. On the one hand, the importance of complex systems concepts is obvious everywhere: systems of systems, enterprises, markets. On the other hand, the practical insights are not so clear. « Systems thinking » has become a buzzword and the word “complexity” is everywhere; still, many textbooks and articles which claim to apply “the latest of complex science theory” to business and management problems are either obscure or shallow. This is not to say that the complex systems literature lacks knowledge and practical insights. On the contrary, the following is a selection of some of the books which I have found useful during the last few years.

Today’s post is a crude and preliminary attempt to pick seven keys that I have found in these books which, to me at least, are practical in the sense that they unlock some of the complexity – or mystery – of the practical complex systems which I have encountered. There is no claim of completeness or rigorous selection. This is clearly a personal and subjective list which I consider a « work in progress ». This is just a list, so I will not develop each of the seven keys here, although each would deserve a blog post of its own.

Complexity means that forecasting is at best extremely slippery and difficult, and most often outright impossible. This is, for instance, the key lesson from Nassim Taleb’s books, such as The Black Swan. The non-linearity of complex system interactions causes the famed butterfly effect, in all kinds of disciplines. If you line up a series of queues, such as in the Beer Game supply chain example, each queue amplifies the variations produced by the previous one and the result is very hard to forecast, hence to control (this depends, obviously, on the system load). This does not mean that simulation of complex systems is useless; it means that it must be used for training rather than forecasting. Following Sun Tzu or François Jullien, one must practice “serious games” (such as war games) to learn about complex systems from experience. This complexity also means that one needs as much data as possible to understand what is happening, and should beware of simplified/abstract descriptions. “God is in the detail” has become a very popular business idiom in recent decades.
Complex systems most often live in a complex environment, which makes homeostasis an (increasingly) complex feat of change management. Homeostasis describes the process through which a complex system continuously adapts to its changing environment. The characteristic of successful complex systems, in a business context, is the ability to react quickly, with a large range of possible reactions. This applies both to what the system does and to what it is capable of doing, as illustrated by the rise of the word “agility” in the business vocabulary. The law of requisite variety tells us why detailed perception is crucial for a complex system (which is clearly exemplified by recent robots): the system’s representation of the environment should be as detailed/varied as the sub-space of the outside environment that the homeostasis process needs to react to.
Complex systems, because of non-linear interactions in general, and because their components have both memory and the capability to learn, exhibit statistical behaviors which are quite different from “classical” (Gaussian) distributions. This is one of the most fascinating insights from complex systems theory: fat tails (power laws) are the signature of intelligent behavior (such as learning). In classical physics or statistics, individual events are (most often) assumed to be independent, which yields the law of large numbers and Gaussian distributions. But when the individual events are caused by actors who can learn or influence each other, this is no longer true. Rather than the obvious reference to Nassim Taleb, the best book I have read on this is The Physics of Wall Street. This works both ways: it warns us that “black swans” should be expected from complex systems, but it also tells us that some form of coordinated behavior is probably at work when we observe a fat tail. There is another interesting consequence: small may be beautiful with complex systems, if adding many similar sub-systems creates unforeseen complexity! Classical statistics is all in favor of large scale and centralization (reduction of variability) whereas complex behavior may be better understood with a decentralized approach. This is precisely one of the most interesting debates about smart grids: if there is no feedback, learning or user behavior change, the linear nature of electricity consumption favors centralization (and large networks); if the opposite is true, a system-of-systems approach may be the best one.
Resilience in complex systems often comes from the distribution of the whole system’s purpose to each of its subcomponents. This is another great insight from complex systems theory: control needs to be not only distributed (to sub-systems) but also declarative, that is, the system’s purpose is distributed and the control (deriving the action from the purpose) is done “locally” (at the sub-system level). This idea of embedding the whole system’s purpose into each component is often referred to as the holographic principle, after a nice hologram metaphor (each piece of a hologram contains a “picture” of the whole object). This principle has been validated experimentally many times in information systems design: it has produced “policy-based control”, where the goals/SLA/purposes are distributed in a declarative form (hence the word “policy”) to all sub-components. I gave the example of SlapOS in my IRT talk as a great illustration of this principle. This is also closely related to the need for fast reaction in the homeostasis process: agility requires distribution of control, with a bottom-up / networked organization similar to living organisms (for most critical functions). One of my favorite books that applies this to the world of enterprise organization is “Managing the Evolving Corporation” by Langdon Morris.
Efficiency in a complex system is strongly related to the capability to support information exchange flows. There is a wealth of knowledge about the structure of the information networks that best support these flows. Scale-free networks, for instance, occur in many complex systems, ranging from the Web to the molecular interactions in living cells, including social networks. Among other interesting properties, scale-free networks reduce the average diameter, which can be linked to avoiding long paths in communication chains, both for agility and resilience. The challenge that these information flows produce is represented by the product of the interaction richness (the essence of complexity in a complex system) and the high frequency of these interactions (the fast reaction required by homeostasis, key #2), the product of two large numbers being an even larger number. My other blog is dedicated to the idea that managing information flows is the most critical management challenge of the 21st century (an idea borrowed from “Organizations” by March & Simon). For instance, the necessity to avoid long paths translates into versatility: complexity works against specialization, because too much specialization generates even more synchronization flows. This communication challenge is not simply about capacity (“the size of the communication pipes”); it is also about semantics and meaning. A common vocabulary is essential to most “systems of systems”, whether they are industrial systems or companies.
Complexity in time is difficult for humans to appreciate. Among the most critical aspects of complex systems are loops, mostly feedback loops. Peter Senge and John Sterman have written famous books about this. Reinforcing and stabilizing loops are what matter the most when trying to describe a complex system, precisely because of their non-linear nature. The combination of loops, memory and delays causes surprises to human observers. John Sterman gives many examples of overshooting, which happens when humans over-react because of delays. Kevin Kelly gives similar examples related to the management of wildlife ecosystems. The lesson from nature is a lesson of humility: we are not good at understanding delays and their systemic effects in a loop. In the world of business, we have difficulty understanding the long-term consequences of our actions, or simply visualizing long-term equilibriums. Many people think that user market share and sales market share should converge, given enough years, without seeing the bigger picture and the influence of the attrition rate (churn). Even simple laws such as Little’s Law may produce counter-intuitive behaviors.
Efficient control for complex systems is an emergent property. Control strategies must be grown and learned, in a bottom-up approach as opposed to a top-down design. We are back to autonomic computing: top-down or centralized control does not work. It may be seen as another consequence of Ross Ashby’s law of requisite variety: complete control is simply impossible. Adaptive control requires autonomy and learning. This is, in my view, the key insight from Kevin Kelly’s book, Out of Control: « Investing machines with the ability to adapt on their own, to evolve in their own directions, and grow without human oversight is the next great advance in technology. Giving machines freedom is the only way we can have intelligent control ». This insight is closely related to key #4 (distributed, declarative control): autonomy and learning progressively transform distributed policies into emergent control. There is another corollary of this principle: such policies, or rules, should be simple, and the more complex the system, the simpler the rules. One could say that this is nothing more than the old KISS idiom, a battlefield lesson from engineering lore. But there is more to it: there seems to be a systemic law, supported by business experience, that only simple explicit rules provide long-term value to complex systems. Any rule that is complex has to be implicit, that is, constantly challenged and re-learned.
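The queue amplification mentioned in the key on forecasting (the Beer Game example) is easy to reproduce. The following Python sketch does not implement the Beer Game rules. It assumes a hypothetical naive ordering policy: each stage forwards the orders it receives plus an over-reaction `alpha` to their latest change. Customer demand is Gaussian, and there are no inventories or shipping delays. `order_stream` and `bullwhip` are illustrative names, and nothing prevents negative orders in this linear toy model.

```python
import random
import statistics


def order_stream(incoming, alpha=0.5):
    """Orders one stage sends upstream, given the orders it receives.

    Hypothetical naive policy: pass on the observed demand plus an extra
    reaction, weighted by alpha, to its most recent change.
    """
    orders = []
    previous = incoming[0]
    for demand in incoming:
        orders.append(demand + alpha * (demand - previous))
        previous = demand
    return orders


def bullwhip(n_stages=4, periods=5000, alpha=0.5, seed=1):
    """Variance of the order stream seen at each level of the chain."""
    rng = random.Random(seed)
    # Level 0 is end-customer demand: independent noise around 100 units.
    streams = [[rng.gauss(100, 5) for _ in range(periods)]]
    for _ in range(n_stages):
        streams.append(order_stream(streams[-1], alpha))
    return [statistics.pvariance(s) for s in streams]


for level, variance in enumerate(bullwhip()):
    print(f"level {level}: order variance {variance:.1f}")
```

With `alpha = 0`, every stage passes demand through unchanged and the variance stays flat. Any positive reaction amplifies the variations level after level: with `alpha = 0.5`, the variance grows by well over an order of magnitude after four levels.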
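The key on statistical behavior claims that independent events produce Gaussian-like spreads while actors who influence each other produce fat tails. A toy comparison illustrates this. The sketch swaps in Herbert Simon’s classic “rich get richer” model as a stand-in for actors who influence each other; this model is my choice and is not taken from the books above. Both runs spread the same number of events over the same number of items, and only imitation differs. The function names are hypothetical.

```python
import random
from collections import Counter


def independent_choices(n_events, n_items, rng):
    """Each event picks an item uniformly, ignoring all other events."""
    return Counter(rng.randrange(n_items) for _ in range(n_events))


def imitative_choices(n_events, p_new, rng):
    """Simon-style 'rich get richer' process.

    With probability p_new an event introduces a new item; otherwise it
    copies a randomly chosen past event, i.e. it picks an existing item
    with probability proportional to its current popularity.
    """
    history = [0]
    n_items = 1
    for _ in range(n_events - 1):
        if rng.random() < p_new:
            history.append(n_items)
            n_items += 1
        else:
            history.append(rng.choice(history))
    return Counter(history)


def compare(n_events=20000, p_new=0.1, seed=7):
    """Size of the most popular item under each process."""
    rng = random.Random(seed)
    imitative = imitative_choices(n_events, p_new, rng)
    # Same number of events spread over the same number of items.
    independent = independent_choices(n_events, len(imitative), rng)
    return max(independent.values()), max(imitative.values())


indep_max, imit_max = compare()
print("largest item, independent events:", indep_max)
print("largest item, imitative events:  ", imit_max)
```

With independent choices, the most popular item stays close to the average of about 10 events per item. With imitation, a few early items capture far more, a spread that a Gaussian model would treat as practically impossible.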
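The key on resilience describes policy-based control: one declarative goal is handed unchanged to every sub-system, and each local controller derives its own action from it. This minimal Python sketch assumes a hypothetical latency SLA and a scale-up/scale-down action. `Policy` and `LocalController` are illustrative names, unrelated to SlapOS’s actual interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Declarative goal distributed unchanged to every sub-system.

    It states *what* is wanted (a latency band), not *how* to get there.
    """
    max_latency_ms: float
    min_latency_ms: float


class LocalController:
    """Derives actions from the shared policy using only local observations."""

    def __init__(self, name, policy, workers=1):
        self.name = name
        self.policy = policy
        self.workers = workers

    def decide(self, observed_latency_ms):
        if observed_latency_ms > self.policy.max_latency_ms:
            self.workers += 1
            return "scale_up"
        if observed_latency_ms < self.policy.min_latency_ms and self.workers > 1:
            self.workers -= 1
            return "scale_down"
        return "hold"


sla = Policy(max_latency_ms=200, min_latency_ms=50)
nodes = [LocalController(name, sla) for name in ("front", "billing", "search")]
# Each node reacts to its own measurement; there is no central coordinator.
for node, latency in zip(nodes, (320, 120, 30)):
    print(node.name, node.decide(latency), node.workers)
```

Changing the goal means publishing a new `Policy`. The control logic in each component stays the same.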
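The key on information flows links network structure to path length, and a small experiment can check this. This sketch grows the scale-free network with the Barabási-Albert preferential-attachment model, a standard model swapped in for illustration. It compares that network with a regular ring lattice that has about the same number of links. All function names are hypothetical.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Each node linked to its k nearest neighbours on each side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def preferential_attachment(n, m, rng):
    """Barabasi-Albert style growth: each new node links to m existing
    nodes chosen with probability proportional to their degree."""
    adj = {i: set() for i in range(n)}
    endpoints = []  # node i appears once per incident edge
    for i in range(m + 1):  # small fully connected seed
        for j in range(i):
            adj[i].add(j)
            adj[j].add(i)
            endpoints += [i, j]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            endpoints += [new, t]
    return adj


def average_path_length(adj):
    """Mean shortest-path length over all connected pairs (one BFS per node)."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


n = 500
ring = ring_lattice(n, k=2)  # about 2n links
hubs = preferential_attachment(n, m=2, rng=random.Random(42))  # about 2n links
print("ring lattice:", round(average_path_length(ring), 1))
print("scale-free:  ", round(average_path_length(hubs), 1))
```

On 500 nodes, the ring lattice averages about 63 hops between two nodes. The preferential-attachment network needs only a handful, because its hubs act as shortcuts.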
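Little’s Law, L = λW, makes the churn argument in the key on time concrete. In steady state, the installed base equals the acquisition rate times the average customer lifetime, that is, sales divided by the churn rate. The sketch below uses two hypothetical operators with identical monthly sales but different churn rates. All figures are made up for illustration.

```python
def installed_base(monthly_sales, monthly_churn, months, initial_base=0.0):
    """Customer base after `months`, losing a fixed fraction each month."""
    base = initial_base
    for _ in range(months):
        base = base * (1 - monthly_churn) + monthly_sales
    return base


# Hypothetical operators: (monthly sales, monthly churn rate).
operators = {"A": (100.0, 0.02), "B": (100.0, 0.04)}
bases = {name: installed_base(s, c, months=600) for name, (s, c) in operators.items()}
total_sales = sum(s for s, _ in operators.values())
total_base = sum(bases.values())
for name, (sales, churn) in operators.items():
    # Little's Law steady state: base = sales * (1 / churn).
    print(f"{name}: sales share {sales / total_sales:.2f}, "
          f"user share {bases[name] / total_base:.2f}, "
          f"steady state {sales / churn:.0f}")
```

The sales shares stay at 50/50 forever, yet the user base converges to a two-thirds/one-third split, because the low-churn operator keeps its customers twice as long. The two market shares converge only if the churn rates do.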

Sunday, October 20, 2013

Lean Startup & Lean Innovation Factory

I had the privilege of attending the Lean IT Summit in Paris a week ago, and was pleased to hear “The Lean Startup” mentioned in almost half of the talks. Actually, the Lean Startup is so popular that some are getting annoyed :) I co-wrote the preface of the French edition because I am a strong believer in the principles that Eric Ries explains in his book. However, with popularity comes exaggeration and re-interpretation. Here are two things I heard during the Lean IT Summit that got me annoyed as well:

The Lean Startup is what the lean community has expressed for a long time, in better words. Kudos to Eric Ries for being such a great communicator !
The Lean Startup is a lean reformulation of well-known innovation practices. Actually, innovation is in the genes of lean manufacturing, so no surprise there !

I disagree on both counts:

The Lean Startup is not a book about lean; it’s a book about innovation, mostly for startups but also relevant to larger companies, which is why I am such a strong advocate. After writing part of the preface, I ordered many dozens of copies, which I have distributed freely in my own company. Sure, the lean framework gives a lot of meaning to the overall contribution, but this is not the point.
Although many of the key ideas have been around for a while, the combination of these principles into a well-defined innovation process is a true contribution. It definitely goes against what most people believed innovation to be in larger companies. I had heard Eric Ries’s ideas expressed by a few VCs from Silicon Valley, but they were anything but mainstream.

Hence this short post is about two things. The first part is a “Lean Startup for dummies” summary. It is by no means thorough or complete (my French post from two years ago did a better job), but it is written for the corporate world and emphasizes what may be seen as “different”, at least compared with how “innovation” was described ten years ago, when we talked about “ideation factories”. The second part describes what I call the “Lean Innovation Factory”, that is, the application of Lean Startup principles to the innovation division of a large company.

1. Lean Startup for dummies

Eric Ries’s book deserves to be read because it is filled with meaningful examples. Therefore, a short summary cannot do justice to its content. Here I will only pick three key principles:

(a) Innovation is about doing, not about producing ideas

This principle is very similar to what the pretotyping manifesto promotes. The pretotyping manifesto gave us these mottos: innovators beat ideas, pretotypes beat productypes, building beats talking… which all say that the key part of innovation is the doing. This is especially true in the digital world, and is acknowledged by similar mottos from Google (“Focus on the user and all else will follow”, “Fast is better than slow”) or Facebook (“code wins”, “done is better than perfect”). To innovate means, most of the time and above everything else, to address a customer problem and to remove a pain point. Value creation occurs at the contact with the customer, not in a brainstorming room. This does not mean that ideation tools and techniques are not useful or important; it means that only “on the gemba” can we check that innovation actually works.

This is more revolutionary than it may sound for larger and older companies, which have associated the word “innovation” with “great ideas”. I have in my library dozens of books about innovation that distinguish between all kinds of innovation (according to the source of the “newness”) and that propose many processes for reaching all kinds of customers. The beauty of the lean startup framework is to simplify (so to speak, since value-creation-at-the-hands-of-the-customer is indeed hard) and to get rid of all the innovation funnels and ideation laboratory paraphernalia. What is clear to me after 15 years in the world of telecommunication service innovation is that everyone has the same ideas; the difference between success, failure and doing nothing (the most frequent case) is the quality of the execution process.

(b) Innovation requires iteration since nobody gets it right the first time.

This principle is often associated with the motto: fail fast to succeed sooner. In the Lean Startup world, it leads to the MVP: minimum viable product. Each word is important: an MVP is a product that may be placed in our customers’ hands (this is not a prototype; it may be simple but it should not be fragile). An MVP is “viable” when it solves the customer’s problem. Its role is to jumpstart an iterative process of feedback collection, which may only happen if the customer finds practical value in the MVP from the first day. An MVP is “minimal” because it is “as simple as possible but not simpler”, to paraphrase Einstein. This allows us to start the iteration as soon as possible, but not sooner. This emphasis on iteration echoes what a venture capitalist from Silicon Valley told me six years ago: there is no correlation between the success of a software startup and the quality of the piece of code that is shown to the early investors. On the other hand, there is a clear correlation between success and the ability to listen to the feedback of early customers and turn it into improvements.

This is also a bigger departure than one may think from the prevailing culture of large companies. It goes against the myth “you must get it right the first time; you have only one chance to make the right impression”. The common culture of detailed market studies, coupled with the practice of lengthy marketing requirements, is replaced by a “hands-on” culture. The MVP is a process that co-constructs software code, requirements and detailed specifications at the same time.

(c) A successful business model is built iteratively using customer feedback.

A successful business model is not a pre-condition but a post-condition of the innovation process. A startup is a “business model factory”; this is well understood today by the various startup “incubators” and “accelerators”, and it may be acknowledged as one of Eric Ries’ contributions. To make a “business model factory” deliver, one needs three things. First, we need to set up measurement points in our MVP. We need to measure usage and value creation, that is, how the problem is being solved. Second, we need to build and then validate a value creation model, which Eric Ries calls innovation accounting. This is the direct application of the old saying “a measure is worth nothing without a model” (without a model, one does not know how to interpret a measure). This is an iterative process and not an exact science, where trial and error is the common approach. One formulates hypotheses, which are either validated or invalidated by the collected measurements. Eric Ries is adamant in his book about preferring facts to opinions :). Last, when the model fails, the startup needs to “pivot”, that is, to formulate a new value creation hypothesis. A key contribution of The Lean Startup is the wealth of examples and explanations regarding business models and pivoting.

This third principle is no less of a rupture with respect to the sanctity of the business case and its return on investment (RoI) that is observed in many large companies. It is simply not possible to formulate a credible business case when one starts to innovate. Obviously, one needs to start somewhere, hence there must be some initial hypotheses regarding value creation. However, the business model for the MVP is the result of an iterative process; the good news is that it comes with the validation provided by usage measures.
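The hypothesis-and-measure loop of innovation accounting, described in principle (c), can be sketched in code. Eric Ries does not prescribe a particular statistical test. This minimal Python sketch assumes a value hypothesis expressed as a rate, such as “at least 30% of trial users come back within a week”, which is a made-up example. It uses a Wilson score interval (my choice) to decide whether the collected measures validate the hypothesis, invalidate it, or are not yet sufficient. The function names are hypothetical.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% confidence interval for an observed rate."""
    if trials <= 0:
        raise ValueError("need at least one trial")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half


def judge(hypothesis_rate, successes, trials):
    """Confront a value hypothesis ('at least X% of users do Y') with measures."""
    low, high = wilson_interval(successes, trials)
    if low > hypothesis_rate:
        return "validated"
    if high < hypothesis_rate:
        return "invalidated: time to consider a pivot"
    return "inconclusive: keep measuring"


print(judge(0.30, 45, 100))  # validated
print(judge(0.30, 12, 100))  # invalidated: time to consider a pivot
print(judge(0.30, 3, 10))    # inconclusive: keep measuring
```

The third case shows why measurement points matter: ten users are not enough to tell facts from opinions, whatever the observed rate.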

2. Lean Innovation Factory

I have started to use the term « Lean Innovation Factory » as a way to encapsulate principles from The Lean Startup applied to the innovation division of a large company, such as the one that I manage at Bouygues Telecom. The name Lean Innovation Factory (LIF) captures three ambitions:

(1) It is an innovation factory.

A “lean innovation factory” is a process that produces innovations. An innovation is a product or service that solves a problem, which is demonstrated in the hands of a customer. The process does not need to deliver a full-scale solution to prove its effectiveness; it can operate on a smaller set of customers, but only the “monitored feedback” of real users will validate the creation of innovative value. The emphasis is on “doing” and “building”; ideas have no glorified status in the Lean Innovation Factory: we strive for physical products and running software. We make W. Edwards Deming’s words our own: “In God we trust, all others must bring data”.

(2) It follows the “Lean Startup” principles.

The engine for creating value is the iteration of MVP feedback, which means that we strive to build the first MVP as quickly as possible (fail fast to succeed sooner), while keeping the meaning of “viable” in mind: the MVP is not a prototype, it is a product. We implement the heart of innovation accounting, in the sense that we measure feedback and continuously build a value creation model that is validated or invalidated by our users.

(3) As a “factory”, the process is as important as the end result, because the result keeps changing while the strengths and the skills of the “factory workers” may build up.

This is the same pitch that I made for the “lean software factory”, and a reason for choosing a similar name :) Building a lean innovation factory is not only about building great product or service innovations; it means building an organization that learns to do this better and better over time. This is clearly what Eric Ries tries to teach from his own experience with many startups, and where the link with “The Toyota Way” is the most evident.

Sunday, July 14, 2013

Follow-up on Lean Architecture

This post is a sequel to the previous post regarding the lean architect. Its main topic is a review of “Lean Architecture – for Agile Software Development” by James O. Coplien & Gertrud Bjørnvig. This is really a follow-up in the sense that I have found most of the ideas from my previous post expressed in this book, but more thoughtfully presented :). It also goes further than my previous analysis, which is why I am writing this quick (and incomplete) book review. James Coplien is both a prolific author and a serious expert on software architecture, with a path that is not so different from mine, especially with respect to object-oriented programming, which used to be my own domain of expertise twenty years ago. Similarly, the 10 years which I spent working on information systems architecture, from 1997 to 2007, are well mirrored by Coplien’s track record in systems architecture in the 90s and 00s.

The first key idea of “Lean Architecture” is the reconciliation of agile and architecture, because of the increasing scale of the projects that agile methods address today. “Extreme Programming (XP) started out in part by consciously trying to do exactly the opposite of what conventional wisdom recommended, and in part by limiting itself to small-scale software development. Over time, we have come in full circle, and many of the old practices are being restored, even in the halls and canon of Agiledom”. On page 161, one finds a nice graph from Boehm and Turner that shows the effect of size on the need for architecture. As usual with Boehm, this is a data-derived graph that captures one’s intuition: once a software project becomes large, anticipation and forethinking are required.

There is no surprise here. On page 15 we read “Ignoring architecture in the long term increases long term costs”, which anyone with gray hair knows firsthand. The insight from this book is that the lean contribution to agile is to bring the long-term and systemic focus back into agile, hence the need for architecture.

Coplien and Bjørnvig bring a fresh and simple vision of lean software development, which is summarized in the book by two formulations of the same principle:

Lean = All hands on deck + long term planning
Lean = Everybody, All together, Early on

I would not say that “The Toyota Way” could fit into such an equation, but it captures an important part of what lean management is about, and it is relevant to emphasize the long-term vision, which is indeed a key aspect of lean. It also helps the authors to introduce a (creative) tension between lean and agile:

Agile is oriented towards change and organic complexity. For instance, it is good to defer decisions (this is not new or “agile per se”, since Knuth is quoted as saying “premature optimization is the root of all evil”).
Lean is focused on large-scale complexity (including “complication”) and long-term resilience. Hence the focus on standardization and bringing decisions forward.

This distinction is presented in a full table on page 12, and, frankly, I disagree with most of its rows. For instance, team versus individual is not a good characterization of lean versus agile. Nor is the complicated versus complex debate … and the “high throughput (lean)” versus “low latency (agile)” debate is too restrictive. My views are expressed in a previous post and I think that there is much more in common between lean thinking and agile thinking. Still, I would definitely agree that the management of time, the systemic and holistic view, and the overall long-term perspective differ between lean and agile. Clearly, the goal of lean software development is to combine both, and this is where the extra emphasis on architecture comes from. It also explains the emphasis on maintainability, as expressed by this quote from Jeffrey Liker about lean: “a culture of stopping or slowing down to get quality right the first time to enhance productivity in the long run”. The tension between lean and agile is helpful to explain why one needs both the systemic practice of “Five Whys” in the search for root causes while problem solving, and frequent feedback and adaptive cycles. Indeed, the complexity and the rate of change of our current environment require both: complexity demands the 5 whys, but the rate of change means that this is not enough; quick feedback loops are required.

An interesting development in this book deals with Conway’s law. This principle states that there is a strong dependency between architecture and organization. That is, the system architecture (module organization) will be strongly influenced by the management organization (teams & departments) that produces the piece of software. Conversely, things work best (from a management perspective) when the people organization follows the system architecture. Coplien derives two consequences from this law, which strike me as truly relevant:

“Manager are über-architects, a responsibility not to be taken lightly” … there goes the myth that one may do a good job at managing software developers without a keen insight into software architecture :)
“There is a need for modularity in large-scale system” – this is obviously true … and indeed a consequence of Conway’s law. The goal of modularity is to keep changes local, and since modularity is unfortunately mandatory on the people side as soon as the scale of the project tips to the “large side”, modularity in the organization must translate into modularity in software, hence the need for architecture :)

This post will not do justice to this excellent book, which is full of wisdom. I do not have the time to collect all the pearls, such as “Remember that architecture is mainly about people” or “Software development is rarely a matter of having enough muscle to get the job done, but rather of having the right skill sets present”. I refer you to my book for my own two cents of wisdom about information system architecture and management. Still, it is worth repeating that architecture is foremost a communication tool to manage change, and not a blueprint for a better world. One of my favorite quotes from “Lean Architecture” is the following: “The customer has a larger stake in the development process than in the service that your software provides”. This is the crucial idea that is at the heart of the “Lean Software Factory”. In a dynamic and changing world, software is not an object, it is a process. Qualities such as evolvability, modularity and openness come from the people and the development process, much more than they apply to a finished product.

This brings me to my second favorite quote, on page 131: “The essence of “Lean” in Lean architecture is to take careful, well-considered analysis and distill it into APIs written in everyday programming language”. Here we see the long-term/resilience thread of lean thinking brought into software development in a very practical manner. This quote expresses two key ideas in one sentence:

Being “future-proof” is having the right APIs, so that the code may be both extended easily and reused in ways different from those originally intended.
Defining, I would rather say “growing”, the right set of APIs is an art; it requires practice, wisdom and aesthetic judgment. Still, it requires forethinking and analysis, which is architectural thinking.

This should be enough to reconcile agile programming with architecture, if there ever was a need. Coplien and Bjørnvig spend a few delightful pages debunking “Agile Myths” such as:

“You can always refactor your way to a better architecture”. Refactoring is crucial because software is a live object that evolves constantly as its environment changes. Agile methods, like any iterative development processes, are bound to produce “accumulations” that need to be cleaned up.
“We should do things at the last moment”.
Here we find the “anticipation versus as-late-as-possible” debate that was mentioned earlier. Many authors, including Mary Poppendieck, consider that lean thinking translates into taking design decisions as late as possible. This book does a good job of balancing the arguments. There is no single answer, but there is a strong plea for “thinking ahead” and “preparing oneself” through architectural design.
Agile = “don’t do documentation”.
The agile tradition of avoiding comprehensive documentation that becomes obsolete before it is used still stands. However, as soon as scale grows, and as soon as the life expectancy of the software grows, there is a need for serious documentation. There are many tools to automate part of the documentation task, especially the part which is closely linked to the code. A large piece of software requires storytelling, and this has nothing to do with software; it’s a consequence of human nature and of what is needed to motivate a large group of people (see Daniel Pink).

User stories are an integral part of Agile/SCRUM development methods. There are a few interesting pages in this book that show the link between user stories and business processes, which are commonly associated with large-scale waterfall development processes. The last part of the book deals with DCI (Data, Context and Interaction), a framework that extends MVC and which is proposed as the proper foundation for designing modern distributed object-oriented systems. This topic is out of the scope of this post, although I find many similarities with the design philosophy of the CLAIRE programming language. Some of the key insights may be summarized as follows. The first is the need to recognize complex algorithms and separate them from object classes (“Does this mean that procedural design is completely dead ? Of course not. There are algorithms that are just algorithms”). The second is the reification of roles and business processes. The third is the use of context objects to develop functional polymorphism (hence to share and reuse more code). This last practice reminds me of my early days at Bouygues Telecom, when we created the PTEs (Processus Techniques Elementaires, i.e. elementary technical processes); cf. my book on Urbanization.
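Here is a deliberately simplified Python sketch of the classic DCI money-transfer example. The data object knows only its state. The context binds objects to roles for one use case, and the interaction (the algorithm) is written once in terms of those roles. Fuller DCI implementations inject role methods into the data objects at run time. Keeping them in the context, as done here, is a simplifying assumption, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Data: what the object *is*; no use-case logic lives here."""
    owner: str
    balance: int


class TransferMoney:
    """Context: binds data objects to roles for one use case."""

    def __init__(self, source: Account, sink: Account, amount: int):
        self.source = source  # plays the "source" role
        self.sink = sink      # plays the "sink" role
        self.amount = amount

    # Role methods: behaviour that belongs to the use case, not to Account.
    def _source_withdraw(self) -> None:
        if self.source.balance < self.amount:
            raise ValueError("insufficient funds")
        self.source.balance -= self.amount

    def _sink_deposit(self) -> None:
        self.sink.balance += self.amount

    def execute(self) -> None:
        """Interaction: the algorithm, expressed in terms of roles."""
        self._source_withdraw()
        self._sink_deposit()


alice, bob = Account("alice", 100), Account("bob", 0)
TransferMoney(alice, bob, 30).execute()
print(alice.balance, bob.balance)  # 70 30
```

The transfer algorithm lives in the context, so the same `Account` data can take part in other use cases without accumulating their logic. This is the separation of algorithms from object classes mentioned above.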

I will conclude with two quotes from page 92:

“Let the human considerations drive the partitioning [architecture], with software engineering concerns secondary”.
“A complex system might use several paradigms”.

Friday, May 10, 2013

Systemic Simulation of Smart Grids (S3G) - Part III

This post concludes the first phase of my computational experiments with S3G (Systemic Simulation of Smart Grids), which I ran during the summers of 2011 and 2012. I presented the results at the 2013 ROADEF conference a few months ago, and I have made the extended set of slides available on my box account (left of the blog page – My Shared Documents).

1. S3G Experiments

A simple description of S3G is available in a previous post. It should help to understand what is presented in the slides, since many of those slides were included in that post. The objective of S3G is to simulate the production and consumption of electricity throughout a long period of time (15 years) with a global “ecosystem” perspective.

I will first add some explanations about three topics: the set of models, the satisfaction model and the GTES search for equilibriums. It is important to understand the limits of the current experiment before giving out the preliminary findings, since they need to be taken “with a grain of salt”.

As mentioned earlier, S3G uses very simple models (i.e., simple equations and few parameters) for the components of the “energy production & consumption” complex system. This is a deliberate choice, because I lack the expertise to produce more complex sub-models, and, mostly, because I want to focus on the overall system complexity (that is, what happens when all these simple subsystems are put together). This is clearly explained in my SystemX IRT introduction keynote. Still, it’s worth taking a look at each of these sub-models:

The energy demand generation is quite simple. I start with daily and yearly patterns, obtained by cutting & pasting historic curves found on the web, and I add random noise, which I can control (time- or geography-dependent). I don’t think that this is a limitation for this experiment.
I have a crude vision of “NegaWatts”, which represent energy consumption that may be saved through energy-saving investments. NegaWatts are virtuous: there is no reduction of economic output, but they require money. Here my model is really too simple, but this falls somewhat outside the scope of what I was trying to accomplish. I use a simple hyperbolic function to represent the fact that, as electricity prices grow, people are likely to invest to try to reduce their consumption. Since it is very difficult to foresee negawatt development over the next 20 years, it is better to use a single parameter (the slope of the hyperbola) and make it vary to cover all kinds of scenarios.
I have an equally crude model for demand/response, which, contrary to Negawatts, is instantaneous but affects economic output. In my model (a simple S-curve), demand is reduced when the peak price becomes too high. We’ll see later on that this is indeed too simple and that it should be further developed in a future step.
The market share model – to determine the market share of the grid operator against the incumbent – is a simple/classic S-curve. My previous experience with similar economic simulations tells me that this is enough to produce realistic behavior (this is not where the systemic complexity lies).
On the other hand, the dynamic pricing model – how the incumbent modulates its wholesale/retail price – is the heart of the relationship between the local and the national operators, and my current version is too simple. I assume that the price is a function of the output (demand), so that peak consumption yields a higher price. I have chosen a very simple function for my dynamic price equation: a piecewise-linear function, with a constant price up to a fixed (constant) production level, and then a linear surcharge when the production is higher than this constant. Obviously, one would like to test and analyze more complex dynamic pricing schemes, since dynamic price and demand/response behaviors are a key engine for smart grids. The reason why I used a simple model is that pricing is precisely the complexity generator of the whole model, and an arbitrarily complex pricing function would make the results very difficult to analyze. This pricing structure is under the control of regulation, and I am waiting to better understand what our political authorities have in mind before encoding a richer model (see later).
Last, the smart grid electricity production model is reasonably detailed for such an experiment. The decision about which source of electricity to use is actually straightforward due to the mix of production constraints (one must use the electricity that is produced) and economic goals (when sourcing, pick the cheapest source). The only tricky part is the management of storage. I use simple rules, with a number of parameters that are tuned within the machine learning loop of the GTES simulation. Hence I let the simulation engine discover how best to use local storage. For instance, the local operator can use storage either as a “buffer” for its own production or as a “reserve” to play the market (buy when cheap and sell when expensive). Considering the small amount of storage that is actually used (because of storage price), this part of the model is quite satisfactory.
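The three behavioral sub-models above (piecewise-linear dynamic price, S-curve demand/response, hyperbolic negawatts) can each be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the logistic form chosen for the S-curve, the exact hyperbolic form of the negawatt curve and every numeric default are hypothetical, not the actual S3G equations or constants.

```python
import math


def dynamic_price(production_mw: float, base_price: float = 50.0,
                  threshold_mw: float = 60_000.0, surcharge: float = 0.002) -> float:
    """Piecewise-linear price: constant up to threshold_mw, then a linear surcharge per extra MW."""
    excess = max(0.0, production_mw - threshold_mw)
    return base_price + surcharge * excess


def demand_response(demand_mw: float, price: float, max_cut: float = 0.05,
                    mid_price: float = 150.0, width: float = 20.0) -> float:
    """Logistic S-curve: the shaved fraction of demand rises toward max_cut as price passes mid_price."""
    cut = max_cut / (1.0 + math.exp(-(price - mid_price) / width))
    return demand_mw * (1.0 - cut)


def negawatt_share(avg_price: float, slope: float = 0.3, ref_price: float = 50.0) -> float:
    """Hyperbolic curve: no saving at or below ref_price, then a share tending to `slope` as price grows."""
    if avg_price <= ref_price:
        return 0.0
    return slope * (1.0 - ref_price / avg_price)


# Hypothetical usage: an off-peak hour and a peak hour.
for demand in (50_000.0, 80_000.0):
    price = dynamic_price(demand)
    print(demand, price, demand_response(demand, price), negawatt_share(price))
```

Varying the single `slope` parameter of `negawatt_share` is the kind of scenario sweep described above; the pricing function is deliberately kept piecewise-linear so that its effect on the rest of the system stays analyzable.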

A GTES run is the simulation of an optimization loop that tries both to maximize each player’s satisfaction and to find a Nash equilibrium. Hence, defining each player’s satisfaction is a critical part of S3G. Let us first recall that there are four players (actually, sets of players) in this “game”. Each player has three goals, with a target value associated with each goal. We define a “strategy” as a triplet of pairs (goal, target). The satisfaction with respect to each goal is then expressed with a pseudo-linear function: 100% if the target value is reached and a linear fraction otherwise. The overall satisfaction for a strategy is the average satisfaction with respect to the three goals of the strategy. In the GTES method, we separate the parameters that represent these goals (grouped as a strategy) from the other parameters that the player may change to adjust its results, which we call the “tactical play”. The principle of the GTES game is that each player tries to adjust its “tactical parameters” to maximize its “satisfaction” (w.r.t. its strategy). Here is a short description of the four players in the S3G game:

The “regulator” (political power) whose goal is to reduce CO2 emissions while preserving economic output and keeping a balanced budget (between taxes and incentives). Its three “goals” are, therefore, the total output (consumed electricity + negaWatts), the amount of CO2 and the state budget (taxes - subsidies). Its tactical play includes setting up a CO2 tax, regulating the wholesale price for the suppliers and creating a discount incentive for renewable energies.
The existing energy companies, here called “suppliers”, whose goal is to maintain their market-share against newcomers, maintain revenue and reduce exposure to consumption peaks. Their tactical play is mostly through pricing (dynamic), but they also control investment into new production facilities on a yearly basis.
The new local energy operators, who see “smart grids” as a differentiating technology to compete against incumbents. Their goal is to grow turnover, EBITDA and market-share. Their real-time tactical play is dynamic pricing, and they may invest into renewable and fossil energy production units, as well as storage units.
The consumers are grouped into cities, whose goal is to procure electricity at the lowest average price, while avoiding peak prices and preserving their comfort. The cities’ tactical play is mostly to switch their energy supplier (on a yearly basis) and to invest in “negaWatts”, which are energy-saving investments (more energy-efficient homes, etc.).
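The satisfaction formula described before this list can be sketched as follows. This is a minimal illustration, assuming that every goal is expressed so that “higher is better” (goals such as CO2 or peak exposure would have to be reformulated, e.g. as reductions) and reading “a linear fraction” as value/target; the Goal class, the goal names and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    name: str      # hypothetical goal identifier, e.g. "turnover"
    target: float  # strictly positive target value; "higher is better" is assumed


def goal_satisfaction(value: float, target: float) -> float:
    """Pseudo-linear: 1.0 once the target is reached, otherwise the linear fraction value/target."""
    if value >= target:
        return 1.0
    return max(0.0, value / target)


def strategy_satisfaction(strategy: tuple, results: dict) -> float:
    """A strategy is a triplet of (goal, target) pairs; its satisfaction is the plain average."""
    scores = [goal_satisfaction(results[goal.name], goal.target) for goal in strategy]
    return sum(scores) / len(scores)


# Hypothetical local-operator strategy and simulated outcome.
operator_strategy = (Goal("turnover", 100.0), Goal("ebitda", 20.0), Goal("market_share", 0.30))
outcome = {"turnover": 120.0, "ebitda": 10.0, "market_share": 0.15}
print(strategy_satisfaction(operator_strategy, outcome))
```

In this example the turnover goal is fully met while the two others are half met, so the average lands at two thirds; the tactical parameters of the player are what GTES tunes to push this number up.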

GTES stands for Game-Theoretical Evolutionary Simulation. I have talked about it in various posts, and a summary may be found here. I gave a keynote talk about GTES at CSDM 2012; the slides are available here. GTES is a framework designed to study a model through simulation, in order to extract a few properties from this model (learning through examples), either explicitly or implicitly. GTES is based upon the combination of three techniques:

Sampling: since some parameters that occur in the economic equations are unknown, we draw them randomly from a confidence interval, using a Monte-Carlo approach. Monte-Carlo simulation has become quite popular (especially in the finance world) over the last decades, as computers became more powerful. The need for Monte-Carlo is a signature of complexity and non-linearity: simulation becomes necessary when one cannot reason with averages. The beauty of linear equations is precisely that one may work with average values. In a complex non-linear system, deviations are amplified and there is no other way to predict their effect than to look at them, case by case (hence the sampling approach).
Search for Nash Equilibrium in a repeated game: We set the parameters that define the players’ objective functions and look for an equilibrium using an iterative fixed-point approach (in the tradition of the Cournot Adjustment). The good news with S3G is that it is a “simple” complex system, hence finding a Nash equilibrium is easy. However, it is easy precisely because of the simple pricing model (cf. previous discussion).
Local Search as a machine learning technique: once the parameters that define the objective function are set, the other parameters that define the behavior of each player may be computed to find each player’s “best response” to the current situation. We use a simple local search (“local moves” = dichotomic search for the best value of each tactical parameter), coupled with “2-opt”: the random exploration of moving two parameters at the same time, using “hill climbing” as a meta-heuristic. From an OR point of view, these are rudimentary techniques, but they seem to do the job. The complexity of the optimization engine that one must embed into GTES depends on the complexity of the model. If the dynamic pricing model were made more complex, a stronger local search metaheuristic would be necessary.
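The way these three techniques fit together can be illustrated on a deliberately tiny game. The sketch below is not S3G: it swaps in a textbook two-player Cournot duopoly (linear inverse demand, constant unit cost, all parameter values hypothetical) so that the result can be checked analytically. Each best response is found by a ternary search, a dichotomic-style local search that works here because the payoff is concave in the player’s own quantity; players then take turns best-responding (Cournot adjustment) until a fixed point, the Nash equilibrium, is reached; and the unknown demand intercept is sampled Monte-Carlo style.

```python
import random


def profit(q_own: float, q_other: float, a: float, b: float = 1.0, c: float = 10.0) -> float:
    """Toy Cournot payoff: linear inverse demand P = a - b*(q1 + q2), constant unit cost c."""
    price = a - b * (q_own + q_other)
    return q_own * (price - c)


def best_response(q_other: float, a: float, lo: float = 0.0, hi: float = 200.0,
                  iters: int = 60) -> float:
    """Local search on one tactical parameter: ternary search on a unimodal (concave) payoff."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if profit(m1, q_other, a) < profit(m2, q_other, a):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0


def cournot_equilibrium(a: float, tol: float = 1e-4, max_rounds: int = 200) -> tuple[float, float]:
    """Cournot adjustment: players take turns playing their best response until a fixed point."""
    q1 = q2 = 0.0
    for _ in range(max_rounds):
        new_q1 = best_response(q2, a)
        new_q2 = best_response(new_q1, a)
        if abs(new_q1 - q1) < tol and abs(new_q2 - q2) < tol:
            return new_q1, new_q2
        q1, q2 = new_q1, new_q2
    return q1, q2


def monte_carlo(n_samples: int = 100, seed: int = 42) -> float:
    """Sample the unknown demand intercept a in [90, 110] and average the total equilibrium output."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        q1, q2 = cournot_equilibrium(rng.uniform(90.0, 110.0))
        total += q1 + q2
    return total / n_samples


# Should be close to the analytic Nash point q1 = q2 = (a - c) / (3b) = 30.
print(cournot_equilibrium(100.0))
print(monte_carlo())
```

The toy game converges in a dozen rounds because the best-response map is a contraction; as noted above for S3G, a richer pricing model would make both the fixed-point search and the local search harder.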

Explaining GTES will take many years … I was invited to ROADEF’s yearly event last month to present some of the successes that I have had with this approach over the past 10 years. I have a book, “Enterprises and Complexity: Simulation, Games and Learning”, in my “pipe”, but I expect that at least five more years of work will be needed to get it to a decent state (in terms of ease of understanding).

2. Most interesting findings with S3G experiments

An “S3G session” is made of interactive runs of “experiments”, which are GTES computational executions. More precisely, an experiment is defined by two things:

- The randomization boundaries, for those parameters that will be sampled.

- Some specific values for some parameters, since the goal of a “serious game” is to play “what-if scenarios”, by explicitly changing these parameters. For instance, we may play with the investment cost of storage, to see if storage is or will be critical to smart grids.

Multiple scenarios have been played to evaluate the sensitivity to “environmental” parameters, such as the variability of energy consumption (globally or locally), the fossil energy price (gas and coal), the possible reduction of the nuclear assets, the impact of carbon taxes or the impact of wholesale price regulation. Here is a short summary of the main findings that were presented at ROADEF:

Smart Grids and variability. One theoretical advantage of smart grid operators is that they could react better to variations. Simulation does show a somewhat better reaction from the local operator than from the national operator to both fluctuation (electricity demand that varies compared to the historical forecast) and local variation (for instance, through local changes of climatic conditions). However, the difference is very small, and could be dismissed as insignificant from a statistical perspective. This result depends on storage price (see later) and on the wholesale price structure. With the current values, one of my key arguments in favor of smart grids (systems of distributed systems are expected to be more flexible and reactive) does not seem to hold.
Carbon tax and Nuclear strategy.
I played with the carbon tax to see if raising it would have an effect, and it does, but a negative one, since it favors nuclear energy and green energy is still too expensive. On the other hand, the decision to reduce the share of nuclear energy for the national supplier (either a long-term withdrawal or a long-term cap as announced by the French government) quite logically creates favorable conditions for smart grid operators. However, simulation shows that the results are weak (small advantage) and unstable (they depend heavily on the overall systemic equation of wholesale prices coupled with “environment” variables such as energy prices).
Storage and Photo-voltaic costs. As explained earlier, I used the Web as my main information source, and got unit prices for storage (per MWh) and photo-voltaic energy that vary considerably according to the source. I designed a number of scenarios to see what would happen if prices fell, as is expected by a number of “green experts”. The availability of cheap storage has an important impact, but one needs to see a price reduction by a factor of 5 to 10 (depending on where you place the starting point) to see this impact materialize. The simple rule seems to be that storage TCO (total cost of ownership) should get as low as 50% of the wholesale price to shift the system’s behavior (quite logical if you think about it). A similar remark may be made about photo-voltaic energy, whose price is still far too high to change the smart grid operator’s economic model.
Wholesale & retail price structure. This is the heart of the smart grid ecosystem: the rules/regulation that govern wholesale pricing – which control the “coopetition” between supplier & operator – and the dynamic pricing for retail, which controls the benefits derived both from demand/response and negawatts. In the game theory tradition, we have built a strategy matrix that shows the result of conflicting strategies between the supplier and the operators, ranging from bold (focus on market-share) to aggressive (focus on revenue) through “soft” (more conservative). The sensitivity to the price regulation structure is such that it does not make sense to draw too much out of my simulations, except the fact that this is the critical part.
Sensitivity to oil price. I have played with a number of scenarios regarding fossil energy price trends in the next 15 years. The sensitivity is much lower than expected, when comparing suppliers against operators. There is a clear impact on consumers, but the benefit in favor of green energy and smart grid operators is offset by the advantage in favor of nuclear energy. One may add that with shale gas, non-conventional oil and coal, this type of scenario is not likely in the next 15 years. What we see in the US is precisely the opposite.
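The storage finding can be read through a simple break-even calculation for the “reserve” use of storage (buy when cheap, sell when expensive). The sketch below is a back-of-the-envelope illustration, not the S3G storage rules: the function names, the round-trip efficiency and the prices are hypothetical, and the storage TCO is assumed to be amortized over each MWh cycled. It shows why storage only changes the operator’s behavior once its cost per MWh falls below the achievable price spread.

```python
def arbitrage_margin(off_peak_price: float, peak_price: float,
                     storage_cost_per_mwh: float, round_trip_efficiency: float = 0.8) -> float:
    """Margin on one MWh bought off-peak, stored, and sold back at peak (EUR per MWh bought)."""
    return peak_price * round_trip_efficiency - off_peak_price - storage_cost_per_mwh


def break_even_storage_cost(off_peak_price: float, peak_price: float,
                            round_trip_efficiency: float = 0.8) -> float:
    """Highest storage cost per MWh cycled at which the arbitrage still pays off."""
    return peak_price * round_trip_efficiency - off_peak_price


# Hypothetical prices in EUR/MWh; storage cost is the TCO spread over each MWh cycled.
off_peak, peak = 40.0, 100.0
print(break_even_storage_cost(off_peak, peak))
for cost in (150.0, 50.0, 20.0):
    print(cost, arbitrage_margin(off_peak, peak, cost))
```

With these made-up numbers, a storage cost several times above the break-even point yields a strongly negative margin, which is consistent with the observation that only a large drop in storage prices makes storage worth using.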

3. Limits of current approach and next steps

Let me first summarize three obvious limits to the S3G approach:

As explained, both wholesale and retail dynamic pricing models are too simple. Not only is the shape of the curve simplistic, but the assumption that price depends only on total demand is also unrealistic (taking production costs into account is a must).
One of the expected benefits of smart grids is improved resilience, both to catastrophic events and to significant internal failures. I did not try to evaluate resilience, because I did not have enough data to generate meaningful scenarios. If you look at what is happening in Japan, local storage is deployed to increase resilience in the event of a natural catastrophe, with good reasons (together with HEMS: home energy management systems).
My demand/response model is equally too simple, from two separate perspectives. First, I only look at “shaving”, that is, electricity that is not consumed because the usage is forsaken for price/availability reasons. Another interesting alternative is to look at demand displacement, where consumption is “shifted” instead of “shaved”. Many usages, mostly related to heating, have enough inertia to be shifted by a few minutes. The other simplifying dimension is that I only look at the instantaneous benefit brought by demand/response, that is, the non-consumption of electricity at a time when prices are high. However, market prices do not rise that much, nor for long enough, to make this “shaving” worth a lot of effort. On the other hand, it may help to avoid investing in excessive marginal capacity, which has a higher payoff.

This last argument is pointed out in the « Loi Brottes ». This article explains clearly the difference between “capacity adjustment” value creation and “production adjustment”.

"Under current law, no mechanism is provided to remunerate load shedding for its capacity value between the end of 2013 and the winter of 2015-2016, other than through the adjustment mechanism, which limits the development of load-shedding capacities," according to the explanatory memorandum. This amendment therefore aims to ensure, as soon as the law comes into force, "the development of load shedding through a call-for-tenders scheme, pending the implementation of the permanent capacity mechanism that will allow the actors concerned to develop production and consumption-shedding capacities".

The reason why I focus on « production adjustment » is that it is much easier to simulate. Capacity adjustment is a three-party value proposition (the user, the demand-response operator, and the producer whose capacity may be reduced). It requires regulation (hence the Brottes law) to turn the avoided capacity into operational benefits for the operator, who will eventually share them with the user.
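The difference between “shaving” and “shifting” discussed in the demand/response limits above can be made concrete with a toy load profile. This is a minimal sketch, assuming discrete periods (each standing for a few minutes), a fixed price threshold for shaving and a maximum deferral window for shifting; the function names, prices and loads are hypothetical and this is not part of the current S3G model.

```python
def shave(load: list, prices: list, threshold: float) -> list:
    """Shaving: consumption is dropped in every period whose price exceeds the threshold."""
    return [0.0 if p > threshold else q for q, p in zip(load, prices)]


def shift(load: list, prices: list, window: int) -> list:
    """Shifting: each period's consumption moves to the cheapest period at most `window` steps later."""
    shifted = [0.0] * len(load)
    for t, q in enumerate(load):
        candidates = range(t, min(len(load), t + window + 1))
        best = min(candidates, key=lambda s: prices[s])
        shifted[best] += q
    return shifted


def cost(load: list, prices: list) -> float:
    """Total energy bill of a load profile."""
    return sum(q * p for q, p in zip(load, prices))


# Hypothetical prices (EUR/MWh) and flexible load (MWh) over four short periods.
prices = [40.0, 120.0, 60.0, 50.0]
load = [1.0, 1.0, 1.0, 1.0]
for name, profile in (("baseline", load), ("shaved", shave(load, prices, 100.0)),
                      ("shifted", shift(load, prices, 1))):
    print(name, profile, sum(profile), cost(profile, prices))
```

Shaving lowers the bill by giving up energy (and the corresponding usage), while shifting keeps the total energy unchanged and only moves it away from the price peak, which is why it preserves economic output.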

I will leave the S3G code alone for a while. When I resume this work (2014), I plan to take the following next steps:

Improve the satisfaction formula, using a product form instead of a sum. This is a classic technique when defining KPIs for performance measurement. A product (i.e., multiplying the various sub-terms of section 1 instead of summing them) yields a more truthful representation of strategic satisfaction (it is much better to reach all three goals at 60% than to get 100% on two and totally miss the third).
Introduce parallelism (with a MapReduce architecture) to reach more stable results with more samples. Monte-Carlo simulation is designed for easy parallelization.
Enrich the dynamic pricing model (while sticking to piecewise-linear formulas) and re-evaluate the “model constants” (energy production and storage prices, which constantly evolve).
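The arithmetic behind the first item can be checked directly. The sketch below compares the current average with the proposed product form on the example given above (60% on all three goals versus 100%, 100% and 0%). The geometric mean (cube root of the product) is included only as an optional variant, a common normalization that keeps the score on the same 0 to 1 scale as the individual goals; it is not part of the plan above, and the function names are hypothetical.

```python
import math


def additive_satisfaction(scores: list) -> float:
    """Current formula: plain average of the per-goal satisfactions."""
    return sum(scores) / len(scores)


def product_satisfaction(scores: list) -> float:
    """Proposed product form: any goal that is totally missed drives the result to zero."""
    return math.prod(scores)


def geometric_satisfaction(scores: list) -> float:
    """n-th root of the product: same ranking as the product, back on the 0..1 scale of the goals."""
    return math.prod(scores) ** (1.0 / len(scores))


balanced = [0.6, 0.6, 0.6]  # every goal reached at 60%
lopsided = [1.0, 1.0, 0.0]  # two goals fully met, the third totally missed
for name, scores in (("balanced", balanced), ("lopsided", lopsided)):
    print(name, additive_satisfaction(scores), product_satisfaction(scores),
          geometric_satisfaction(scores))
```

The average prefers the lopsided outcome (about 0.67 against 0.6), whereas the product (0.216 against 0) and its geometric mean (0.6 against 0) both prefer the balanced one, which is the behavior the product form is meant to capture.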

Sunday, January 6, 2013

« The Lean Architect » - Do we need Abstraction and Planning « on the Gemba » ?

Today’s post is a first attempt at defining the role of a software architect in a “Lean Software Factory”, that is, an agile development organization built around lean software principles. There is a natural tension between the two words “lean” and “architect”, so one may think that this title is an oxymoron: “lean management”, in general, tries to stay away from abstraction and too much planning, and promotes practical, concrete team-oriented problem solving “on the gemba”, where the action takes place. Agile development says that conception (requirements and design) evolves continuously while a software product is being developed. This is in contradiction with the architect’s role as defined in the V software development model, which is the product of Taylorism applied to development, where the Architect (capital A) conceives a grand design that is built by others. The team-oriented approach of lean software development relies on democracy and consensus, where each voice is heard, and where no “design architecture” can be accepted as a “non-negotiable” requirement. Similarly, a key practice of lean/agile software development is synchronicity (everyone working on the same “takt time”), which is implemented through stand-up meetings, common project rooms and intense real-time discussion with the product owner, which means that the very concepts of “planning” and “design” seem in jeopardy.

The first question I’d like to ask is, therefore, “do we need an architectural process when software development is led in a lean/agile mode ?”. My answer is a resounding “YES”, with the following three arguments:

Agile or lean software development does not prevent teams from building software products that are actually complex and require a carefully crafted form of internal organization. Let us recall that architecture is foremost about communication: its role is to provide organizations and actions with a narrative sense. In a lean software development process, architecture is a “co-creation”: architecture and design continuously evolve together with product development; they are not “completed” until the product is.
In a continuous development and integration process, we need an architectural framework to prevent divergence (too many additions that move the product in different and opposite directions) and to maintain a continuous and efficient refactoring. Software code is a living object, which evolves with the rhythm of sprints and update deliveries. As a living object, it requires constant “gardening”, guided by a vision (of what the garden should be), which is precisely what an “architecture” should be. As Jean-Claude Ameisen explains in his wonderful book “La Sculpture du Vivant”, the birth of a living organism requires as much pruning and removal as creation and addition; this applies to software: there should be as much thought and care put into code pruning and removal as there is into adding lines of code.
A key principle from Toyota/Lean Management is the importance of a systemic vision that is shared by all actors (sharing a systemic understanding is a key role of the Obeya room). Architecture in a lean software development process is not restricted to a few hand-picked specialists; it is everyone’s responsibility, especially developers’. A major feedback from SCRUM practitioners is the importance of “design reviews”. Design reviews are best conducted by “experts”, that is, people with experience (architecture is an art, much more than a science – one learns from previous mistakes). These are precisely “architects”, people who are well qualified to lead a successful review and to encourage learning/capitalization. Agile does not mean that everything needs to be rediscovered “on the gemba”; 30 years of software architecture have produced both useful principles and tools.

The second question to ask is then « What role should an architect play in this emergent software development process ? ». Part of the answer is common to the previous question and has been touched upon in the previous three bullet points. The architect is required to shepherd the continuous refactoring and to lead the regular design reviews. This is quite different from the classical role as defined by the V-model. Let us emphasize the practical and cultural differences through three critical insights:

An architect needs to be a story teller. Agile processes suggest working on requirements expressed as “story boards”. Story boards make for “units of requirement” which are gathered into “backlogs”. The “architectural vision” also needs to be expressed as a set of “stories”. Stories are a great form for expressing a requirement because they both carry meaning and are robust. The heart of the agile process is that the volatility, randomness and complexity of the environment will force the product developers to adapt continuously until the product is completed. Strict and formal requirements tend to become obsolete or incompatible with one another too soon! Stories are robust because they can evolve; the combination of their meaning is context-dependent. I would qualify stories as “antifragile” following Nassim Taleb’s insight: the volatility and variations that will necessarily occur will “shake” the story and often produce even more value than the original thought contained. Serendipity is produced by the random shocks between stories from the backlog and continuous feedback from the users.
An architect works in a “pull mode”: she is pulled by the need for advice, assistance or explanation from developers. This is very different from the “push mode” of Taylor-style organizations, where the architect would hand out a document to the developers: “please implement the following architecture requirements R1 to R112”. This requires some form of change management, but this new way of working is quite fulfilling for an architect. She may feel the pleasure of participating in the actual creation, instead of being frustrated to see her “great principles, well-written in a forward-looking design document” ignored. This is the instantiation of a key lean principle that says that each actor in a process is at the service of the next actor (in the value creation process, from a chronological viewpoint). Kanban, in both its industrial and software versions, is the tool that puts this principle into practice.
Architects are stakeholders in the SCRUM backlog management: they ensure that performance tuning and bug fixing (especially “non-functional” ones) are kept at the top of the stack. Obviously the product owner is the key stakeholder, and these concerns should be hers/his as well. An architect is not a “chief engineer” in the Toyota sense (it is unclear to me if there is such a role in a lean software development process, although the term is often used in the Devops context). As with any complex project, there exists a recursive/fractal structure associated with a product (systems and sub-systems). There are a number of “product owners” (for subsystems) and a master “product owner” who plays a role similar to that of a chief engineer. The architect is a member of the engineering team, a voice that reflects the value of long-term thinking, simplicity and “well-oiled” running. Using lean/agile does not remove the age-old conflict between short-term and long-term goals. Products tend to last (at least, customers wish they would), hence long-term characteristics are essential to a well-designed and well-built product.

To conclude this post, I would like to stand back and ask a more general question « What is the role of the architect from a systemic viewpoint ? ».

At the side of the product owner, the lean architect carries the critical mission to build the “situation potential” in Sun Tzu’s sense. The concept of “situation potential” is the cornerstone of François Jullien’s book on efficiency. In a complex world, both François Jullien and Nassim Taleb tell us that forecasting (as in “strategic planning”) no longer works well and one should become “opportunistic”, in the (positive) Chinese sense (i.e., be prepared to jump on opportunities). Being able to leverage the future, whatever it holds, is the heart of this “Art of War” strategy. It relies on agility: to react quickly as soon as the opportunity arises. It also relies on building the right set of capabilities: the situation potential. A “good architecture” is precisely a “situation potential”. This becomes even more obvious as soon as the necessity of a platform strategy arises – the core message of “What would Google do ?” by Jeff Jarvis. Most successful software products are platforms (cf. the famous post by Steve Yegge). Designing a platform requires every developer to be eager to contribute to an “open API culture” and to be ready “to eat your own dogfood” (the best way to implement the feedback circle that is necessary to reach quality). However, designing a powerful platform is precisely an architectural task.
A key tenet of “lean software”, as explained by Mary & Tom Poppendieck, is to free up creativity and innovation through cooperation. Architecture may be seen as the “grammar” for this cooperation. This is the core of “service-oriented architecture” (SOA). A software architect, irrespective of the software development method (agile or not), aims to develop sharing and reuse of software components. This brings us back to the critical role of architecture in developing a software platform. This is lean & pretotyping at work: designing a platform is not (only) about thinking right, it is mostly about feedback loops and listening to users/customers. The value of a grammar is judged by the beautiful sentences that others write with it.
The more evolution we see in our current world, the more we need solid foundations to avoid losing our time building, destroying and re-building over and over. Layered architecture remains a relevant paradigm in the world of “lean software” because the layer structure tells us the order in which things should be built and re-built. Experience shows that one cannot rely only on “auto-organization”: too many iterations – that entail massive rework – are necessary until a fixed point is reached. Also, the practice of 5S (sorting, set in order, systematic cleaning, standardizing, and sustaining), applied to software (as explained by Mary Poppendieck for instance), is relevant both to code development and to software architecture. There are many other interesting architectural contributions to a lean software product, such as the DCI architecture, which may be found in the book “Lean Software Architecture”.

This short post does not do justice to the complex topic of “lean software development” mixed with “software architecture”. The title, “Lean Architect”, applies to a long-term goal, which is to reconcile large-scale systems engineering with agile development methods. My own conviction is that “lean” – in the “Toyota Way” sense – is the appropriate framework for such an integration. Lean adds to agile methods the combination of a global/systemic vision, together with a practical focus on concrete issues (“on the gemba”). This is actually the core of Nassim Taleb’s “Antifragile” book: complex systems are learned by doing, practice is superior to thinking – and this obviously applies to architecture. Because it emphasizes practice and respect for people, lean management is geared to address the complex challenges of our environment.
