Saturday, November 26, 2011

Lean IT, Devops and Cloud Programming


I have become more and more interested in lean IT over the years. It started with the book "The Art of Lean Software Development" by Curt Hibbs.
I enjoyed this book because its introduction to lean is fully compatible with what I learned from Jeffrey Liker and other great authors about TPS (the Toyota Production System). This simple book helped me draw the lines between good software development methods, such as extreme programming or agile programming, and lean management. For those of you who are not familiar with lean or extreme programming, here is a very crude summary of some of the most salient points of lean software development:
  • No delays: avoid, as much as possible, work that sits idle between two steps of the development process (what is called WIP, work in process, in lean jargon). This is true at the process level (the goal is to design a streamlined, “single-piece flow” development organization) and at the developer level. A practical goal is to avoid switching tasks as much as possible: focus on one thing and do it right!
  • Quality in (right first time): this lean principle translates into testing as early as possible (a tenet of agile programming), but also into using all techniques that improve the quality of the code, even at the expense of source code productivity, since we all know that it is cheaper not to produce a bug than to remove it later. This is where pair programming and code reviews come in, along with “good practices” such as programming guidelines and standards.
  • Fast delivery: the lean principle is to reduce the “lead time” of the software development process, which requires working on all stages. Removing in-between delays (cf. the first point) is necessary but not enough. Continuous integration is a core technique to achieve this goal, as are fast deployment techniques.
  • Short deliveries: it is more efficient to produce small pieces of software at a high rate than bigger pieces at a lower rate. This is another key principle from lean (“small batches”), which is doubly true for software development: not only are smaller batches easier to build (a well-known law of software engineering), but the continuous evolution of customers’ needs and environment is also easier to follow with a small-batch approach.
  • Less code: this is the application of the KISS principle! Lean software development tries to stay away from complexity (see later in this post). Unnecessary code may pop up from many sources; to track it down, lean applies a technique called VSM (Value-Stream Mapping) and a posture of “muda removal”. Muda (waste) removal means going through the process with the “eyes of the customer” and removing everything that does not produce value from her perspective. VSM is a tool that tracks value creation and assigns it to each step of the process. Lean software development aims at producing the right product, without unnecessary features. It is also an architecture principle (stay away from complexity) aimed at simpler and faster maintenance over the years.
  • Customer participation: the best way to produce only what is necessary from the user’s perspective is to ask her frequently! This is why end-user/customer participation is a tenet of agile programming. When the customer is not available, the principle mixes with “small batches” to become: deliver fast, fail faster to succeed sooner.
  • Regular work effort: the leveling of the effort is a key principle of extreme programming, the equivalent of “heijunka” in lean. A few years ago, when I was still a CIO, I started thinking about “extreme IT” (applying extreme programming at the information system level), and the sustainability of the effort is a crucial point. Regularity has a counterpart, which is discipline. Using methods and tools (such as configuration management, source code versioning, automated testing) is crucial. One should take a look at the wonderful talk from Mark Striebeck on “Creating a testing culture” at Google.
  • Minimize handovers: complex tasks, such as writing software, are better accomplished when the responsibility chain is not cut into too many segments. Working as a team is the only way to deliver complex projects (this is the topic of my third book). This is another insight from the Agile Manifesto, and it is why today’s best practice is to assemble multi-disciplinary teams including software developers, marketers and designers.
There are a number of interesting books on this topic. The most classical ones are those from Mary Poppendieck, such as “Implementing Lean Software Development: from Concept to Cash”. I had the pleasure of presenting my last book at the Lean IT Summit a month ago and of meeting Mary Poppendieck (a number of great talks, including hers, Michael Ballé’s and Mark Striebeck’s, are now available online).
There are also books which are not about software development per se but very closely related. I am especially fond of two books which I have reviewed in my French blog:
Thanks to Guillaume Fortaine, I have come to learn about Devops. Quoting from Wikipedia, “DevOps” is an emerging set of principles, methods and practices for communication, collaboration and integration between software development (application/software engineering) and IT operations (systems administration/infrastructure) professionals. The more I read about Devops (Devops.com is a treasure trove of great articles), the more I find that Devops is the missing link between Agile/Extreme Programming and lean management (Toyota-style, hence Lean-Startup-style). Although Devops claims to “help finishing what Agile Development started”, there are different flavors in what one may read on the web: bridging the development/admin & operations gap, making lean IT happen, making collaboration happen, and bringing agility to operations, which in turn makes it easier to leverage the benefits of new computing resources such as cloud programming. My first obvious interest in Devops has been the practical deployment of lean IT, as a follow-up to what I explained earlier. For instance, Devops promotes a number of interesting tools related to deployment automation, such as GLU. It turns out that the “Cloud and Devops intersection” is also quite promising. Indeed, leveraging the strengths of cloud programming requires a shift in development/architecture culture, which in turn calls for a “Devops-like” approach.

It happens that I gave a talk last week at the Model-Driven Day about “Complexity, Modularity and Abstraction” (the talk is available in the “My Shared Document” box on the left). The talk is about, among other things:
  • Complexity (why it is important, how to measure it and how to tame it, since avoiding complexity altogether is not an option at the IS level)
  •  Sustainability: how to transform enterprise architecture into a regular practice. This is related to the concept of “extreme IT” : avoid the “heroic struggle” to move towards the “continuous transformation of the information system”. This is a major reason why I have been advocating for SOA as a company-wide enterprise architecture practice for many years.
  • “Architecture-oriented services”: I made this pun to emphasize how difficult it is to produce the “right” services through SOA. “Architecture-oriented” means services that have the right level of abstraction, that are modular and “easy-to-compose”. To my knowledge, there is no easy recipe for this, but the wisdom and folklore of 40 years of software architecture design apply.
  •  Cloud computing: I added a short section about “cloud-ready architecture” to the 4th edition of my first book. I strongly believe that Information Systems will change in a spectacular manner when we learn how to exploit massively parallel architecture (using tools/approaches such as Hadoop/MapReduce). Using Cloud Computing to provide a few SaaS-based front-office services is nice and relevant, but the big change comes when cloud computing is applied to back-office services (provisioning, billing, data mining). This requires an architecture change, but mostly it requires a culture change.
The thought that prompted me to write this post is as follows. Devops is the missing link between the themes of my MDD talk: managing complexity, delivering agility/modularity and moving to the new century of massively parallel computing (including, but not restricted to, cloud computing). I speak of "Cloud Programming" in the title because I agree with a comment made by George Reese: the Cloud is a computing resource defined by its API, i.e., a resource that is managed through programming. The exact quote is:
Cloud is, for the purposes of this discussion, is sort of the pinnacle of SOA in that it makes everything controllable through an API. If it has no API it is not Cloud. If you buy into this then you agree that everything that is Cloud is ultimately programmable.
Not only is the cloud more agile (up/down scalability), it may also be controlled by the very piece of software that is using it. From a systemic perspective, it's a "whole new game" :)
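To make the idea of a resource "controlled by the piece of software which is using it" concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: `FakeCloud` is an in-memory stand-in, not the API of any real provider, and the capacity figure is arbitrary.

```python
import math


class FakeCloud:
    """Hypothetical in-memory stand-in for a cloud provider API (not a real SDK)."""

    def __init__(self, instances: int = 1) -> None:
        self.instances = instances

    def set_instance_count(self, count: int) -> None:
        # Always keep at least one instance running.
        self.instances = max(1, count)


def autoscale(cloud: FakeCloud, requests_per_second: float,
              capacity_per_instance: float = 100.0) -> int:
    """Let the application resize its own infrastructure from its observed load."""
    needed = math.ceil(requests_per_second / capacity_per_instance)
    cloud.set_instance_count(needed)
    return cloud.instances


cloud = FakeCloud()
print(autoscale(cloud, 950))  # scales up for a traffic burst
print(autoscale(cloud, 20))   # scales back down when the burst is over
```

The point is only that the scaling decision lives in the program itself, which is what an API-defined resource makes possible.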

Sunday, September 25, 2011

Systemic Simulation of Smart Grids

This blog has been extremely quiet for two years, mostly because I was busy with other topics, including Lean and Enterprise 2.0. In my previous post, I mentioned that I was looking at “Smart Distributed Things” through two instances, Smart Grids and Smart Home Networks. I will not talk about the second in this blog, since it is work-related (we’ve come up with the acronym BAHN : Bouygues Autonomous Home Network).
Smart Grids make an interesting instance of the “smart distributed network” because of the extraordinary amount of interest/excitement/hype that exists around this topic. Opinions conflict, ranging from “this is a straightforward and marginal improvement of existing networks which are already quite smart” to “smart grids are the backbone of a new sustainable society, based on communities, subsidiarity (organic, multi-scale, resilient organization) and self-aware optimal management of resources”.
I am not an expert in any of these topics: electricity, power grids, energy, storage … but like any citizen, I would like to build my own opinion about the future of our country and our planet, as far as energy and global warming are concerned. Hence the idea of a small systemic simulation, which emerged during my vacation last month. I tried to assemble a crude model of all the different aspects of the “smart grid ecosystem” as we may understand it, without too much detail, just the broad principles. Each aspect is actually quite simple to explain, if taken in isolation (e.g., anyone may understand why it is smart to couple a solar panel with local storage). It is the combination of all viewpoints, together with the huge amount of uncertainty (about the economy, the speed at which technology will get cheaper, the speed at which behaviors will change, etc.), which makes this a hard topic.
This is where I got this insight: I actually have a method for complex embedded models where half is unknown and the other half is unclear: GTES (Game-Theoretical Evolutionary Simulation). GTES is the perfect platform to assemble conflicting views of what a smart grid should be and conflicting views about how the actors should behave, and to try to make some sense of them. In a word, I am planning to build a “serious game” to have a closer look at smart grids.
Let me first clarify what I said about the range of opinions and introduce three views of the smart grid, which I could label: the “Utility view”, the “Google view” and the “Japanese view”:

  1. The “Utility view” defines a smart grid as adapting the power network to local sources (as opposed to a one-way distribution network that goes from a few large GW production units towards millions of consumers), adapting to intermittent production sources (through storage and by favoring flexible production units that can adapt to the power surges of intermittent sources like solar or wind) and using price incentives to “shave” demand peaks. This is a “no-brainer” program (“what to do” is clear), where the major issue is price: most techniques show a cost/benefit ratio that is worse than current practice. To believe in this approach, you must believe that new technology prices will go down (e.g., solar, storage), or that electricity prices will go “through the roof” in the next 20 years, or that global warming fears will drive a significant price for CO2.
  2. The “Google view” defines a smart grid as a change from a tree structure to a network structure (centralized to de-centralized), the use of market forces to create a dynamic and more efficient equilibrium between supply and demand, and the use of IT to provide information to all actors, including end consumers. Signals (pricing and power grid control) are so important in this view that it is often said that telecom, IT and energy networks will merge (something which I don’t believe for a second, but it shows the importance of “smart” in “smart network”). Calling this the “Google view” is a friendly reference to “What Would Google Do?” :) The core of this approach is the principle that a significant amount of efficiency would be obtained with a “hyper fluid market”, made possible through IT.
  3. The “Japanese view” is human-centered instead of being techno-centered. The goal is to change human behavior to adapt to new challenges (lack of resources, global warming, …). Smart grids are the backbone of a multi-scale architecture (smart home, neighborhood, city, region, country) where each level has its own resources and autonomy, resulting in a system that is more resilient and with more engagement as a consequence of more responsibilities. Smart grids support the necessary behavioral transformation through communities and constant feedback. I call this a “Japanese vision” because I have heard it beautifully explained in Tokyo, but the systemic approach is common to Asia as a whole. Because active communities are “engaging”, Smart Grids help get rid of waste (muda in the lean sense) such as useless transportation, unnecessary usage, etc.

These views do not necessarily conflict; it is possible to envision the union of all these ideas. However, as soon as one tries to imagine a convincing deployment scenario, there are a number of questions that pop to mind. Here are a few examples:
  • What part does local storage play? Can there be a smart grid without distributed storage? What price hypotheses are necessary to make this realistic, considering that current energy storage prices are too high to justify large-scale deployment? Even if solar or wind becomes free, having to store the energy with today’s techniques makes this approach uncompetitive (with today’s parameters, obviously).
  • What CO2 price would significantly change the cost/benefit analysis? Actually, this extends more generally to the pair energy prices + CO2 price, but the CO2 price is especially significant. Because of the abundance of coal, and because of the increasing availability of new forms of gas, the CO2 price is, from a naïve point of view, the key factor that could change the economic analysis of introducing alternative intermittent sources of energy.
  • What is the systemic benefit of local management, that is, giving autonomy to local communities to handle a part of their energy decisions, at different scales? This is a “system dynamics” issue, related to the handling of peaks, shortages and crises. In a regular mode, there are obvious benefits to the centralized approach, from economies of scale to averaging pseudo-independent demand. The “pseudo” is a tiny word but full of consequences: the absence of independence is what produces complex systems and disasters, from the financial crisis to industrial accidents. It is the study of bursts and dynamic scenarios, with feedback loops, that may show the benefit of a “smart system” with faster counter-measures and learning abilities.
  • What could be the large-scale effect of dynamic pricing on self-optimization of customer demand? The “marginal story” of “peak demand shaving” is beautiful: remove the few hours of peak production with gas/oil turbines, which produce CO2 at a high marginal cost, by returning some of the value to the consumer to convince her to postpone part of her usage. The most common example is heating (house or water), since inertia makes postponing a viable alternative. However, the story does not necessarily scale so well, nor does it necessarily address the “resilience issue”. If additional capacity is needed no matter what to cope with some kinds of peaks, its marginal cost for dealing with the “other types” becomes quite small. We are back to a “system dynamics” issue that cannot be resolved with a “back-of-the-envelope” ROI computation, but requires a large-scale simulation.
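The “marginal story” of peak shaving in the last question can be sketched in a few lines of Python. This is only an illustration under strong assumptions that are mine, not part of any model: all demand above a cap is postponable (as with water heating), consumers always accept the delay, and postponed energy is served as soon as there is room under the cap.

```python
def shave_peak(load, cap):
    """Postpone demand above `cap` to later time steps.

    Returns the shaved load curve and the energy still unserved at the end.
    """
    shaved = []
    carry = 0.0  # energy postponed from earlier time steps
    for demand in load:
        wanted = demand + carry
        served = min(wanted, cap)
        carry = wanted - served
        shaved.append(served)
    return shaved, carry


hourly_load = [1.0, 2.0, 5.0, 4.0, 1.0, 1.0]  # arbitrary units
shaved, unserved = shave_peak(hourly_load, cap=3.0)
print(shaved, unserved)
```

The total energy is unchanged; only its timing moves. What such a sketch cannot show is precisely the point made above: whether the capacity saved at the peak would have been needed anyway for other kinds of peaks.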

My goal is to get a first simple set of answers to these questions. Obviously, using GTES is not going to give me a price or a definite answer, but rather a “sense of how things are interrelated and react as a whole (system)”. Here is a simplified description of the model that I built last month:

-          This is a “game theory model” with four actors:

(1)    The regulator (government), which controls the price of CO2 and may either favor renewable energy sources (investment incentive through tax breaks) or regulate them (impose joint storage with new intermittent sources). The long term goal (what we call a strategy in GTES) of the regulator is to reduce CO2 while maintaining the economic output of the country.
(2)    The utility (national energy supplier), which runs its production assets, distributes electricity and sets its price dynamically according to demand and primary energy (oil / gas) price variations. One will notice that I implement an ideal world where prices can be set freely and change constantly, which is very far from the truth, but I want to address question #4. The long term goal of the utility is to make money, deliver a proper return on investment for its new acquisitions (if any) and ensure resilience (the ability to serve the necessary amount of energy in the future).
(3)    The local operator, who operates a smart grid associated with a city or a county. The operator manages all local production and storage capacity that is linked to the grid (wind turbines, solar panels, storage, etc.). It also operates a small-scale fossil fuel plant to provide additional electricity when required, although it may also buy it at wholesale prices from the utility. The long term goal of the operator is simply to make money :)
(4)    The end consumer who tries to reduce her electricity bill (both its average value and its expected maximum value) while preserving her comfort. Optionally, this may mean to reduce her CO2 emissions :)
-          The smart grid architecture is pretty naïve. I simulate a country that could be France or another European country, with one electricity utility, one thousand smart cities/smart communities (each with its own local operator), and twenty million households. The national supplier produces most of the electricity when the game starts, using a mix of nuclear and fossil plants. The electricity is either sold to the local operators, or sold directly to the consumers (hence each operator has a given market share, that is, the percentage of households that get their energy from this alternative supplier).

-          The heart of the model is the demand generation. Running the model once demand is established is not difficult, although the operational mode of the operator requires a careful description, since many choices need to be made (local production vs. wholesale buying, using energy from storage or storing energy for future use, etc.). The demand generator is actually crude and starts from a yearly and a daily pattern, to which a lot of random noise is added. The model allows both for independent noise (each consumer is different) and dependent variation (weather variations are shared by everyone in the same city).

-          Each actor can play its game through a few decisions (what we call its tactic in GTES). The utility’s most important decision is its pricing tactic. Prices are defined through a bunch of linear formulas, whose coefficients are GTES tactical parameters. For those who are not familiar with GTES, an evolutionary algorithm (local optimization) is run to find the best values for these tactical parameters. Other decisions from the utility involve further investments in its production plants. The local operator has a similar range of choices to make. It needs to define its operational production mode; it also needs to set up its pricing scheme. Once a year, it needs to decide about new investments, whether they are additional fossil fuel production capacities, renewable energy investments or additional storage. The end consumer can decide to reduce her demand when the price gets too high, at the expense of comfort (the tradeoff between the two being a design parameter of the model). The consumer may also switch from national to local providers and back. Last, the consumer may invest in “negawatt equipment” (such as house insulation or more energy-efficient equipment).
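The demand generation described above (a yearly and a daily pattern, plus independent and shared noise) could be sketched as follows in Python. This is not the actual model: the cosine shapes, the peak positions (mid-winter, 19:00) and the noise levels are all assumptions picked for illustration.

```python
import math
import random


def base_load(hour_of_year: int) -> float:
    """Deterministic pattern: a yearly cycle multiplied by a daily cycle."""
    day, hour = divmod(hour_of_year, 24)
    yearly = 1.0 + 0.3 * math.cos(2 * math.pi * day / 365)         # assumed winter peak
    daily = 1.0 + 0.4 * math.cos(2 * math.pi * (hour - 19) / 24)   # assumed 19:00 peak
    return yearly * daily


def generate_demand(n_cities, households_per_city, hours,
                    weather_sigma=0.1, household_sigma=0.2, seed=0):
    """Return demand[city][household][hour] in arbitrary units."""
    rng = random.Random(seed)
    demand = [[[0.0] * hours for _ in range(households_per_city)]
              for _ in range(n_cities)]
    for t in range(hours):
        base = base_load(t)
        for c in range(n_cities):
            weather = rng.gauss(0.0, weather_sigma)    # shared by the whole city
            for h in range(households_per_city):
                own = rng.gauss(0.0, household_sigma)  # independent, per household
                demand[c][h][t] = max(0.0, base * (1.0 + weather + own))
    return demand


sample = generate_demand(n_cities=2, households_per_city=3, hours=48)
print(sample[0][0][:5])
```

Setting `household_sigma` to zero leaves only the shared weather term, which makes every household of a city identical: a quick way to see the difference between dependent and independent variation.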

This whole model defines a “game”, which could actually be turned into a real game, SIM-style. Each actor is trying to maximize its long-term goal while making the proper “tactical choices”.
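As an illustration of what a “tactical choice” looks like, here is a hedged Python sketch of a linear pricing tactic tuned by a simple local search. It is not GTES code: the profit function, the demand response (`elasticity`) and the scenarios are toy assumptions, and the search is a basic mutate-and-keep-if-better loop standing in for the evolutionary algorithm.

```python
import random


def price(tactic, demand, fuel_price):
    """Linear pricing formula; (a, b, c) are the tactical parameters."""
    a, b, c = tactic
    return max(0.0, a + b * demand + c * fuel_price)


def profit(tactic, scenarios, elasticity=0.5):
    """Toy objective: margin over fuel cost, with demand shrinking as price rises."""
    total = 0.0
    for demand, fuel_price in scenarios:
        p = price(tactic, demand, fuel_price)
        sold = max(0.0, demand * (1.0 - elasticity * p))
        total += sold * (p - fuel_price)
    return total


def local_search(tactic, scenarios, steps=500, sigma=0.05, seed=0):
    """Mutate the parameters at random and keep a mutation only if profit improves."""
    rng = random.Random(seed)
    best, best_score = tuple(tactic), profit(tactic, scenarios)
    for _ in range(steps):
        candidate = tuple(x + rng.gauss(0.0, sigma) for x in best)
        score = profit(candidate, scenarios)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


scenarios = [(1.0, 0.2), (1.5, 0.3), (0.8, 0.25)]  # (demand, fuel price) pairs
start = (0.1, 0.0, 0.0)
print(profit(start, scenarios), local_search(start, scenarios))
```

In the full game each actor would run such an optimization against the other actors' tactics, which is where the game-theoretical part comes in.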
What are the next steps?
  1. The code was written last month, but I still need to run “20-year simulations” and make sure that the model is credible (the story told by one simulation run makes sense).
  2. I then need to “explore the tactics”, which means checking that when one of the actors makes a decision, the impact on the game outcome is credible. This is the most time-consuming part, even with a simple model like this. This ensures that the game is “realistic”, even if it is obviously naïve.
  3. Apply game theory to find (Nash) equilibria. This is the fun part, since I have little left to do and will leverage code that I have written for other problems. This is what GTES is designed for: looking at the conflicting strategies of the different actors.
  4. The last step of GTES is to randomize the “design parameters”. The model relies on a number of design parameters such as the demand-generation curve, the sensitivity to price or the efficiency of peak shaving, to name a few. I have no way to calibrate the S-curves that I am using, so I randomize the choice of these design parameters (Monte-Carlo simulation) to see if their value changes the conclusion that I would like to draw from these repeated simulations.
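The randomization in step 4 can be sketched as a small Monte-Carlo loop in Python. The names and the toy model below are assumptions for illustration only: design parameters are drawn uniformly from assumed ranges, and the loop reports in what share of the draws a given conclusion still holds.

```python
import random


def monte_carlo_robustness(simulate, param_ranges, conclusion, runs=1000, seed=0):
    """Share of random design-parameter draws for which `conclusion` holds."""
    rng = random.Random(seed)
    holds = 0
    for _ in range(runs):
        params = {name: rng.uniform(low, high)
                  for name, (low, high) in param_ranges.items()}
        if conclusion(simulate(params)):
            holds += 1
    return holds / runs


def toy_model(params):
    """Hypothetical stand-in for a full simulation run: fraction of the peak removed."""
    return params["price_sensitivity"] * params["shaving_efficiency"]


ranges = {"price_sensitivity": (0.0, 1.0), "shaving_efficiency": (0.0, 1.0)}
share = monte_carlo_robustness(toy_model, ranges, lambda peak_cut: peak_cut > 0.1)
print(f"conclusion holds in {share:.0%} of the draws")
```

A conclusion that holds in nearly all draws does not depend much on the uncalibrated parameters; one that flips frequently should not be drawn from the simulation.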
I’ll post a summary of my results if and when the computational experiments are successful. Today’s goal was just to share my overall analysis of the “smart grids” domain.

Sunday, January 2, 2011

Darwin, Lamarck and Service-Oriented Architecture

This blog was asleep in 2010 since I was writing a third book on « Business Processes and Enterprise 2.0 », an attempt to capture my involvement over the past years with lean management and information flows. Now that the book is finished (I expect it to be published this spring), I am turning my attention back to autonomic/autonomous systems, networks and grids.

Although one could say that the promises of "Autonomic computing" (circa 2003/2004) have not materialized in the world of IT, the premises remain valid. My belief is that it will simply take longer to get effective technology in place in the world of corporate IT. As a research theme (which started much earlier and was very active in the 90s), "autonomous information technologies" (the combination of artificial intelligence, distributed control, adaptive software, quality of service monitoring … to name a few) is still very active.

I predict that there will be significant additional R&D efforts deployed in the coming decade, because of two related fields which are becoming "extremely hot", while requiring the same kind of scientific advances to transform hype into practical innovations:

  • Smart Grids, where the ambition captured by the word "smart" mirrors the goal of autonomic computing: self-adaptive, self-organizing and self-healing. There is no need to explain why smart grids are strategic to this century, but it is also easy to recognize the implicit difficulty of the endeavor. The heart of the smart grid principle is to evolve from the centralized management of current power networks towards a distributed and adaptive design, whose feasibility remains to be proven on a large scale.
  • Home Networks, which are growing like mushrooms in our houses, and which have already reached a complexity level that is unacceptable to most households. "Smart houses" are necessary to fulfill the promises of multiple industries: energy (where smart, energy-efficient houses are actually parts of the previously mentioned smart grids), content and entertainment (IP content anywhere, any time, on any device), home security and management, healthcare (for instance, out-care within the home), etc. Although the various control/distribution networks may all share the IP protocol, the complexity of provisioning, pairing, routing and interconnecting is rapidly becoming an impossible burden. Here also, the words "self-provisioning", "self-repair" and "self-discovery" are quickly becoming requirements from the customers.

It would not be difficult to define an SDT (Smart Distributed Thing) that brings together the challenges of distributed information systems, smart grids and smart home networks … With this first post of the year, I'd like to introduce three ideas which I have been toying with during the past few months and which I intend to explore more seriously in the future.

  1. There is a lot of wisdom in applying biomimetics to replicate evolution to produce autonomous systems. This is especially true for smart home automation networks. Rather than designing "a grand scheme" of "the smart house's nervous system", it is much safer to start with simpler subsystems, add a first layer of local control, then add a few reflexes (limited form of autonomy), then add a second layer of global control … and end up with a "cortex" of advanced "intelligent" functions. Multi-layered redundant designs such as those produced by evolution are more robust (a key feature for a home automation control network), more stable (a key insight from complex system theory which is worth a post by itself) and more manageable. The need for recursive/fractal architecture is nothing new: I wrote about it with respect to information system architecture many years ago in my first book. I went from a global architecture (which was common when EAI was a catchword :)) to a loosely coupled collection of subsystems (so-called fractal enterprise architecture), for the same reasons: increase robustness, reduce operational complexity and, most importantly, increase manageability (the rate of evolution is not constant over a large information system). There is much more than fractal design involved here: the hierarchy of cognitive functions, from low-level pulse and reflexes to skills and then "creative" thinking, is equally suited to the design of an SDT.
  2. Autonomous systems tend to scare end-users unless they embody the principles of calm computing. Calm computing is derived from the concept of ubiquitous computing (cf. the pioneering work of Mark Weiser at Xerox PARC), and addresses the concerns that emerge when "computers are everywhere (ubiquitous)". Calm computing is very relevant to SDT. I could summarize its three main principles as follows: a smart ubiquitous system must act "in the background" and not "in your face" (it must be discreet :)), it needs to be adaptive and learn from the interaction with its users (the complexity of the users must be recognized in the overall system) and, most importantly (from the user's perspective), it should be stoppable (you must be able to shut it down easily at any time). This becomes much easier with a fractal/layered design (previous point) and more difficult with a monolithic global design. There is a wealth of ideas in the early papers about calm technology, such as minimizing the consumption of the user's attention.
  3. The emergence of software ecosystems most often needs to be guided/shepherded, and rarely occurs in the wild as random events. This is a key point since it is widely acknowledged that software ecosystems (such as iPhone applications) are where innovation occurs (and where value is created from the end-user's point of view). In the realm of home networks, I have been advocating for open architecture, open standards and (web) service exposition for many years, thinking that open standards for the "Home Service Bus" would attract an ecosystem of service providers. You create the opportunity and evolution/selection of the fittest does the rest (Darwin). The last two years spent thinking about sustainable development (i.e., analyzing complex systems' architectures) and looking at successful software ecosystems have made me reconsider my Darwinian position. I am much more a follower of Lamarck these days: I see a "grand architect" in the success of many application stores, iPhone being the obvious example. Open standards and open APIs are not enough. The spectacular failure of major Web Service exposure programs from large telcos is a good example. You need to provide SDKs (more assistance for the developer), a "soul" (a common programming model) and a sense of excitement/challenge/cool (which obviously requires some marketing).


 

This third point is precisely the case for SOA (Service-Oriented Architecture). This is an observation that I have made earlier: reuse in the world of corporate IT does not occur easily or randomly; it requires serious work. To put it differently, coming up with a catalog of reusable services is not the same as deploying a service-oriented architecture (with a Darwinian hope that the "fittest" services would survive). To make SOA work, you need to organize (hence the word "architecture"), promote, plan and communicate. There is a need for a "grand architect" and a "common sense of destiny" for SOA to bring its expected benefits of sharing, reuse and cost reduction.

Monday, January 25, 2010

Taming Information Systems Complexity

This blog has been silent for a long time. I'll resume with a topic which is drawn from my course at Ecole Polytechnique: measuring and mastering the complexity of information systems. I have written many times, including in this blog, that the first job of a CIO is to master the complexity of her/his company's information system.

1. Which complexity?

Although the difference between complex and complicated is fuzzy and varies according to the source (for instance, it is not supported by the TLF, the official dictionary of the French language), it has emerged as follows: complicated is a matter of size and scope, whereas complex describes the nature of the relationships between the components of a system. Complexity (in the sense of complex systems) arises when the finality and the behavior of a system cannot be derived from those of its components (hence the concept of emergence). The three most common ingredients in a complex system are:

  • Feedback loops (and the non-linear behavior that results from amplification)
  • Delays (especially long-term delays) that generate "temporal complexity" which can easily puzzle us.
  • Human factor, that is, the presence of humans as components of the global system.
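The first two ingredients can be seen interacting in a tiny Python experiment. It is a toy illustration, not a model of an information system: a corrective feedback loop pushes a deviation back towards zero, and the only thing that changes between the two runs is how stale the observed value is. The gain of 0.8 and the delay of 2 steps are arbitrary assumptions.

```python
def simulate_feedback(gain, delay, steps=50, start=1.0):
    """Deviation from a target under a correction based on a delayed observation."""
    # The first delay + 1 entries model a system resting at `start`.
    history = [start] * (delay + 1)
    for _ in range(steps):
        observed = history[-1 - delay]  # the controller sees a stale value
        history.append(history[-1] - gain * observed)
    return history


prompt = simulate_feedback(gain=0.8, delay=0)
delayed = simulate_feedback(gain=0.8, delay=2)
print(abs(prompt[-1]))                    # the deviation has vanished
print(max(abs(x) for x in delayed[-10:]))  # the same loop now oscillates and grows
```

With the same correction rule, a delay of two steps is enough to turn a stabilizing loop into a growing oscillation, which is a small-scale picture of why delays "can easily puzzle us".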

Information systems are both complicated and complex. The fact that information systems are complex systems is something that I have touched upon in previous posts. I will give examples of "complexity and emergence" in a future post; here I want to address complexity from a practical angle, as it appears to the CIO. Here is a summary of what makes information systems complex:

  • Too many things: the sheer number of components, of apps, of interfaces. Although standardization and automation of component management help to master this dimension, it is obviously part of the problem (i.e. the information system of a small company is neither complex nor complicated).
  • Too many interactions: these numerous components interact in many ways, both explicit and implicit. Reducing the number of explicit interactions is the goal of enterprise architecture, and technology (integration middleware) may help. Implicit interactions, such as the use of a common resource, are more subtle to track. Reducing implicit interactions is the goal of a modular architecture, which is more an art than a science.
  • Temporal complexity: many relevant time scales coexist, with both very short-term delays, which require mastering so-called "real time" behavior, and long-term life cycles that demand stepping back and anticipating.
  • Human complexity: information systems are centered (or should be) around human users. This is the source of uncertainty, of plain errors (e.g. typing errors) and interaction errors when users try to second-guess the system (which is unavoidable since humans are intelligent – cf. Charles Perrow's remarkable book "Normal Accidents").


2. Measuring Complexity?

Measuring complexity is indeed difficult and I know of fewer measures than there exist "dimensions of complexity" as explained previously.

The first dimension (size) means to associate a weight to the information systems, which is a combination of counting and associating a weight to each component. This is the best understood part:

  • Applications, or software components, can be measured using function points
  • Computing resources may be measured using TPM-C (the most obvious choice for "commercial software"), but other, more specialized metrics/benchmarks are available for specific purposes.
  • Storage resources are easily measured in Terabytes or Petabytes.


The second dimension is structural complexity, which measures the richness of explicit interactions between components. The most common example of such a measure is cyclomatic complexity, which counts the number of elementary cycles within a graph (the interaction graph). Cyclomatic complexity was popularized a few decades ago and found useful to measure software architecture. A better approach for information systems is Euclidian Scalar Complexity (ESC). Given an architecture diagram with n objects with associated weights (w1, … wn) and m edges between these objects, the Euclidian scalar complexity is defined as:

  • The square root of the sum of the products (wi x wj), taken over all pairs (i, j) such that i = j or components i and j are linked through an edge.

It is one of the rare metrics that is scale invariant (insensitive to the "zoom effect") and invariant to extension without information loss. For more information, you may download a research article from Caseau, Krob and Peyronnet.
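The ESC definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each undirected edge is counted once, the i = j terms contribute the squared weight of each component, and the component names and weights are made up for the example.

```python
import math

def euclidian_scalar_complexity(weights, edges):
    """Square root of the sum of wi * wj over pairs that are
    identical (i = j) or linked by an edge.

    Assumption: edges are undirected and each one is counted once.
    """
    # i = j terms: each component contributes its squared weight
    total = sum(w * w for w in weights.values())
    seen = set()
    for a, b in edges:
        pair = frozenset((a, b))
        if a == b or pair in seen:
            continue  # ignore self-loops and duplicate edges
        seen.add(pair)
        total += weights[a] * weights[b]
    return math.sqrt(total)

# Hypothetical three-component diagram
weights = {"crm": 2.0, "billing": 3.0, "erp": 1.0}
edges = [("crm", "billing"), ("billing", "erp")]
esc_linked = euclidian_scalar_complexity(weights, edges)
esc_isolated = euclidian_scalar_complexity(weights, [])
```

With no edges the measure reduces to the Euclidean norm of the weights, so every added interaction can only increase it.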

The third dimension is the complexity of implicit interaction, which is precisely the definition of modularity. Although one may define a co-evolution distance as the probability (among all possible changes) that an impact on component A also yields a change for component B, this definition is too theoretical to be useful. My own experience suggests making specialized architectural diagrams for co-evolution (called "coupling") and using ESC to measure the complexity of the resulting diagram. What are the possible causes for co-evolution? Here is a short and incomplete list to explain this concept:

  • Objects: components that share business objects are co-dependent.
  • Processes: similarly, the existence of a business process that uses both components A and B makes these two components linked (even if no direct reciprocal calls are made)
  • User interface coherence: for instance, the requirement for a coherent multi-channel access may create dependencies among components that are functionally independent.


    To summarize, this is the "measuring discipline" that I suggest to my students:
  • To be performed continuously : counting, sorting, weighing components (e.g.: function points)
  • To be performed once in a while: applying ESC to the usual architecture diagrams and maps, producing coupling specialized architectural charts (i.e., process interaction, business object lifecycles …)
  • To be performed "on demand": a detailed complexity analysis to decide between two architectural options


3. Taming Complexity

I have collected the following list of approaches, from simpler to more complex, which is actually quite thorough and effective, while still being quite practical (any comments on how to extend the list are welcome!). It is not a list drawn from "complex system theory", but rather from practical experience.
  • Simple approach: draw diagrams and maps (cartography). This may sound silly, but drawing architecture diagrams is still the best way to cope with complexity, assuming that the meta-diagram (the meaning of the graphical conventions) is well understood. This is what makes UML2 so useful.
  • Systematic approach: Enterprise Architecture (what we French call urbanization). Enterprise Architecture is, by construction, a method geared to reduce the information system complexity.
  • Technology approach: Infrastructure (middleware). As mentioned earlier, integration infrastructures have a clear benefit on structural complexity. It can actually be proven using ESC (one of my course's favorite exercises).
  • Common sense approach: Energetic Standardization. Reducing the heterogeneity of the components effectively reduces the complexity.
  • Hardest approach: modularity (de-coupling), that is, producing a modular architecture. As explained earlier, there is no guaranteed method, but it is a skill that is learned through trial and error.
  • Strategic approach: SOA (governance) as a strategic answer to complexity. SOA has a very positive impact on modularity and favors mutualization and reuse (hence a mechanical reduction of complexity). It also plays a crucial role in the governance of information systems, reducing the human complexity of satisfying the complete range of stakeholders.
  • Sustainable development of the Information System. This is a topic which I have already covered in a previous post. Sustainable development, as advocated by SITA, is a way to master temporal complexity and to avoid painful paradoxes.

If this is the practical list, what would the "theoretical" one add? Clearly, I would add the influence of biology (hence the theme of this blog) and "autonomic computing" to build systems that self-organize and self-manage their own complexity. This is an ongoing topic of reflection, to be covered in a future post.

Saturday, September 19, 2009

New Shared Document

For reasons explained in my other blog, I am keeping quiet for a while. However, I have added the "Shared Documents" gadget on this blog and added a new presentation about SOA and BPM.

I gave this invited talk at the SOA & BPM IDC Conference on September 17th.
As usual, it is offered under creative commons rules.

Saturday, July 11, 2009

Kolmogorov and the measure of competitive value


I was fortunate enough to attend USI and, although I could not participate in all the sessions, it has been quite fruitful. The (summer) "University of the Information System" is organized by Octo, BCG, le Monde Informatique and TV4IT. It is a great gathering for "bosses and geeks", with lots of opportunities for networking (and meeting old friends as far as I am concerned) and learning exciting stuff (the list of keynotes is amazing).

1. Complexity

I'll start with the key idea that came out of a brainstorming session moderated by Luc de Brabandere. It may be stated in a pompous manner as:

  • CompetitiveValue(IS) = f(complexity)
    The competitive value from Information Systems is a function of their complexity (in the Kolmogorov sense)

It starts as follows: what is not complex is easily reproduced and becomes a commodity, something that anyone can use and that may not, therefore, be seen as a competitive advantage. For instance, the chisel in the hand of the stone carver is such a commodity tool. Although it is crucial to the task, and is taken great care of, the chisel is not a differentiating factor. Anyone can get a great chisel. What makes a great stone statue is the talent and the craft from the hands of the stone carver. For those companies for which IT is a differentiating factor, there is a fair amount of complexity that has been mastered, from a size, a technological or a business integration perspective.

This is actually very close to the concept of information measure as defined by Kolmogorov. Let's recall that Kolmogorov measures the complexity of an information sequence as the size of the smallest program that can generate the sequence. Anything that is very rich but generated from a small set of rules has a small Kolmogorov complexity, while chaotic and random structures have a high Kolmogorov complexity. Here the measure of the information system is precisely what cannot be reduced to a set of rules and a few enabling technologies. If you have a large information system which is using standard tools and standard techniques in a usual manner, its complexity measure will be small. If you have an information system that is uniquely tuned to the business, where the practical know-how built over years has helped to resolve technical difficulties, its complexity measure is high.
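Kolmogorov complexity itself cannot be computed, but the intuition can be illustrated with a rough proxy: the size of a sequence after compression is an upper bound on the size of a program that regenerates it. The sketch below uses Python's standard zlib module; the sample data and the function name are purely illustrative.

```python
import os
import zlib

def compressed_size(data: bytes) -> int:
    """Size in bytes after zlib compression: a crude, computable
    stand-in for Kolmogorov complexity (an upper bound, not the real thing)."""
    return len(zlib.compress(data, 9))

# A long sequence generated from one simple rule (repeat a short pattern)
regular = b"standard-tool;" * 1000

# A sequence of the same length with no exploitable structure
random_like = os.urandom(len(regular))

size_regular = compressed_size(regular)
size_random = compressed_size(random_like)
```

The repeated pattern shrinks to a small fraction of its length, while the random bytes barely compress at all, which mirrors the contrast between an information system assembled from standard parts and one shaped by years of specific know-how.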

This approach is strikingly compatible with Nicholas Carr's position on "IT does not matter", that is, IT without complexity is a commodity. One must read the original article or the book to see that Carr is talking precisely about the competitive value of information systems. His vision, which is fairly optimistic in its timing but generally accepted as target architecture, is that Web Service mash-ups will transform IT into a commodity. This Web Service / Cloud IT will not be without value (as necessary as electricity) but without differentiating value. I disagree about the availability of this “commodity IT” (cf. my book “Information Technology for the Chief Executive”, whose second chapter talks about N. Carr’s position), but I definitely agree with the (obvious) statement that there is no differentiating value with a commodity service.

One could say IT without complexity does not exist, but it is not true. Software as a service, for instance, is clearly one direction to remove a fair amount of complexity. Really simple IT exists; unfortunately it cannot solve all problems. At the end of the day, there remains a lot of complexity, irrespective of the technology or the procurement options that are chosen (cf. the Web site of the SITA: Sustainable IT Architecture). This is why companies need a CIO in the first place :) For me, the first job of the CIO is to manage complexity. This includes:

  • Reducing complexity through an Enterprise Architecture approach,
  • Removing complexity whenever possible, that is, empowering users to manage their own information system;
  • Taming complexity through collaboration and training

The paradox is that the CIO’s mission is to constantly reduce the perimeter of her/his job. But since the mission of any enterprise is to be smarter than its competitors, new challenges keep being thrown at the CIO …

Cloud computing is about complexity “de-materialization”: the complexity does not vanish (cf. SOA is not scale free), only its nature changes. If the IS is managing a complex set of business processes with a lot of service interactions, rapid changes and performance constraints, moving the IT to the cloud does not make the complexity disappear.

Back to this idea of Enterprise Architecture, the challenge is to produce a flexible architecture, which may sound like an oxymoron. More precisely, one must create structure without rigidity. This prompts a few suggestions:

  • Be wary of “invariants” - there are few of them. So-called invariants are traps to install rigidity (for French readers, this is a topic covered in my first book)
  • Reference designs are living objects. Architecture relies on a number of reference designs: data models, service catalog, integration framework, etc. Any good Enterprise Architecture methodology will tell you how to build extensible designs. It looks like design violation is closely associated with innovation (?)
  • Diversity is key (a theme from the second day of the USI, I’ll be back in a moment).

This line of thought brings us back to biology and the general theme of this blog. In the living world, “invariants” (building blocks) are small and they are versatile (better than flexible). As explained by Albert Jacquard during his magnificent talk, diversity comes from reproduction, hence from randomness.

2. Uncertainty

The brainstorming session which generated all these ideas was about uncertainty. How to live, how to create, how to be relevant in an uncertain world? Luc de Brabandere used a different set of scenarios, which are somewhat similar to the four scenarios of Dan Rasmus in his book “Listening to the Future”.

I am a big believer in this approach to define a proper strategy for information systems (cf. previous reference to the second chapter of my book). A scenario is not a forecast, it is a virtual situation designed to foster creativity. This is a key point: the scenario’s value is not to be as close as possible to what will happen in the future, it is to help build skills that will prove useful in the future (for the information systems or for the employees). In a word, the goal of the “scenario exercise” is to develop one's situation potential.

I have developed a “theory” over the years (cf. my other French blog), that the only tool to master uncertainty is gaming (as in "serious gaming"). Games are based on virtual scenarios but may develop true skills or help better understand the possibilities of the future. I will return to this idea in a future post. What came to me as a conclusion of this afternoon session is that “Participants must participate”: passive viewing is of (almost) no value. This is deeper than it looks: since the scenario is not interesting per se, the value is the thought experiment and the collaboration that occurs between the participants while they play with the scenario. If a summary is proposed to a set of external listeners, most of the time it sounds dull or strange. I came to express this as a communication rule: if you need to report the result of a scenario-brainstorming session to your managers (or some other managers), it must keep the form of a role-playing exercise where the audience is actively engaged.

3. Value

The first section of this post dealt with “differentiation value”. What about “regular” value? The classical issue of the value that is produced by information systems was, as one would expect, central to this year’s USI, with one dedicated session on this topic. A good reference on this topic, by the way, is Ahmed Bounfour’s book “Organizational Capital”.

That session started with an outlook on the issue, stating that there is neither consensus nor any method that would be applicable to the whole spectrum of issues (I agree :) and wrote as much in the previously mentioned book). Octo’s proposed approach is to define a “usage value”, very similar to Adam Smith’s definition: the value of a component of the information system is the additional amount of time it would take to perform the task without this component. It is expressed as a monetary value and discounted over a given period, such as the life expectancy of the software component.

It is a convenient measure, because it is easy to understand and relatively easy to evaluate, at least when an order of magnitude is concerned. Obviously the value must be capped by the total amount of money generated by the associated business process, in order to cover activities that would not exist without an IT platform (e.g., something that would require a million hands and generates little value). It has a few nice properties: it takes the quality of service into account, as well as the true deployment of the component. A beautiful application that is almost never used has zero value with this approach, a desirable property which is not true of all methods!
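As a rough illustration of the usage value described above, here is a minimal sketch. Everything specific in it is an assumption made for the example, not Octo's actual method: the parameter names, the yearly discounting, and the choice to apply the business-process cap year by year.

```python
def usage_value(hours_saved_per_year, hourly_cost, years,
                discount_rate, process_revenue_per_year):
    """Discounted usage value of a component over its expected life.

    All parameters are hypothetical: time saved is converted to money,
    capped each year by what the business process generates, then discounted.
    """
    total = 0.0
    for year in range(1, years + 1):
        yearly_gain = hours_saved_per_year * hourly_cost
        # Cap: a component cannot be worth more than the process it serves
        yearly_gain = min(yearly_gain, process_revenue_per_year)
        total += yearly_gain / (1.0 + discount_rate) ** year
    return total

# An application that nobody uses saves no time, hence has no value
unused_app = usage_value(0, 50.0, 5, 0.08, 1_000_000.0)

# "A million hands" case: the cap, not the time saved, sets the value
capped_app = usage_value(1_000_000, 50.0, 1, 0.0, 1_000.0)
```

The two sample calls correspond to the two properties discussed above: an unused application is worth nothing, and an activity that could not exist without IT is valued at what its process generates.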

It is also a shortsighted measure, proving once again that it is hard to reconcile all objectives (cf. the introductory point). This measure does not take the future into account and how the information system is ready to embrace change. One could argue that “usage value” could be made future-oriented with a scenario approach, following the tracks of the previous section … and that’s true but that’s hard.

Adam Smith’s definition was quoted one day earlier by Daniel Cohen during a great evening speech. He talked about Philippe Askenazy's work on Solow’s paradox (the absence of evidence of productivity gains due to IT). Philippe Askenazy, through a careful study of a large sample of data points, was able to show that IT can bring value only in conjunction with re-organization:

  • Value = Information System + Re-engineering

Those companies that decided to reorganize themselves as they introduced IT in their processes showed significant returns after a few years, while those that embraced IT but did not change had nothing to show but costs. This is a great piece of evidence since it supports a claim that is easy to understand for any CIO: the IT revolution only works if used as a lever to re-organize and re-optimize work (hence the importance of business processes).

I will conclude with a great idea from Pierre Pezziardi and Laurent Avignon: foster innovation through opening new territories, places where anyone can contribute to the information system. Obviously one cannot allow anyone to touch anything anywhere, hence the concept of “new territories”. This would be a “zone” where almost anyone can make a contribution (a piece of software) that may be used by others. There is a lot of value there:

  • Foster creativity and innovation
  • Improve the image of the information system, from a dark mystery to a friendly tool :)
  • Support collaboration and engage a dialog with users
  • Find all the talents that hide in the company (i.e., not in the IT department) and who could contribute
  • Introduce diversity: the peaceful coexistence of safe, structured islands with active (even chaotic) agile platforms

For instance, one could open a few web services that give access to the heart of the information system, in read mode for a first experiment :), and pick a sandbox (such as Excel, Microsoft, Salesforce.com, Google App Engine) that is exposed with a SaaS (Software as a Service) philosophy. The ease of deployment (and adoption) is crucial here to make this a meaningful event. This should turn into a big innovation contest!

The general theme of Pierre and Laurent’s show “L’informatique conviviale” (which was nicely executed on stage in a very lively manner) was:

Diversity is necessary, Pleasure is key.

The first point is well developed in OCTO’s book “Une politique du Système d’Information”. Their dialectic analysis is too rich to be reported here, but may be summarized as follows: there are too many conflicting constraints on the information systems to adopt a unique set of policies. Different parts of the information system must be governed with different approaches, to reconcile the needs for innovation, agility, safety, performance, reliability, etc. This “zoning” of the information system requires clear “passage rules” to support the exchange flows between the different zones.

The second point was the heart of their talk: build an information system that brings pleasure to the users and pleasure to the developers. This is a great insight since biologists tell us that there is no learning (continuous improvement) without pleasure. I will conclude with something that I learned a year ago at a conference about complex systems, on the process of learning for all living organisms, from the smallest to the most complex (us).

I had learned ten years ago about the PDCA cycle of continuous improvement from Deming: Plan, Do, Check, Act. There is a similar cycle that nature has invented for learning: Desire, Plan, Execute and Please. Desire creates the will to plan, to formulate a goal (for conscious living beings), to get ready. The execution yields pleasure, which strengthens the desire and reinforces the cycle. I believe that pleasure is an integral component of corporate/collective learning and continuous improvement. Pleasure can take many forms, from pride (in Japan) to simple fun (in a Silicon Valley start-up).

Sunday, June 7, 2009

Sustainable IT Budget in an Equation

This post is going to be technical; if you are allergic to math formulas, I suggest that you skip it or download my talk (in French) about IT value :)

The topic is the search for a sustainable IT budget, a thought that occurred to me during the SITA conference on April 30th. I figured that I could get some characteristic equation of a sustainable state as a fixed point of the IT budget equations. I have spent quite some time (over 10 years) modelling IT costs. Some of it may be found in my last book; it is also a key part of my Polytechnique course. This course is focused on Enterprise Architecture, following the analysis of the CEISAR, but is also heavily oriented towards economic analysis (cf. the previously published bibliography).

Before I dive into the technical analysis, I'd like to stress that I am not looking for a stable information system. Information systems are living objects, with new parts being constantly added and (hopefully) other parts being removed. What I am looking for is homeostasis, that is, a property that is verified by the IS considered as an open system, even though the parts are changing constantly. Here, the property that I consider is that the budget grows at the same rate as the turnover of the company.

What I did is to take the cost model that is published in my book, and simplify it to reduce the number of necessary variables. The biggest simplification comes from using acquisition costs as opposed to function points to measure the application portfolio. I won't go into details today, it actually makes sense.

The budget is defined as the sum of the project budget P and the operation budget O. Projects are used either to add new applications or to improve/maintain/renew/update existing ones. The two ratios that are of interest (based on what one hears at IT conferences or in benchmarking) are:
  1. R = P / (P + O) = the % of the IT budget spent on projects (we also use r = P/O)
  2. n = Pn/P = % of the project budget spent on creating new applications
The model that I use is straightforward:
  • each year, Pn € is spent acquiring/building new applications
  • a given percentage (d) of the application portfolio is removed/destroyed
  • (P - Pn) is spent renewing a fraction of the remaining apps, so that the average life expectancy of apps is A (measured in years, or we can say that the renewal rate is 1/A)
  • We suppose that the operation cost for an application that costs 1k€ is w k€ - A typical value is 25% (once again the references may be found here).
  • We also suppose that there is a productivity gain over the years of p%. That is, until the application is renewed (or discarded) its operation cost decreases by p% each year.
The IT budget is considered stable if both P and O grow at the same rate g as the turnover (such that r is also a constant). Looking for a stable state yields three equations:
  1. stability of the application portfolio S (dS/S = g)
    we get: w* = (g + d) / (n r) (weighted operation cost, smaller than w because of p)
  2. renewal rate of the apps (P - Pn = S/A)
    we get: n = 1 - 1/(w* A r)
  3. stability of the operation expenses (dO/O = g)
    we get a huge formula:
    dO/O = [(1-p)(1-d)(1-1/A) - 1] + (w/w*)(1-d)/A + w n r
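As a worked step, the first two results are enough to eliminate w* and r. The short derivation below assumes that the first result reads w* = (g + d)/(n r), with the whole sum g + d in the numerator and the product n r in the denominator; under that reading it leads to the closed form for n stated further down.

```latex
\begin{align*}
n\,r\,w^{*} &= g + d && \text{(portfolio stability)}\\
(1-n)\,r\,w^{*} &= \frac{1}{A} && \text{(renewal rate)}\\
\frac{n}{1-n} &= A\,(g+d) && \text{(ratio of the two lines)}\\
n &= \frac{A\,(g+d)}{1 + A\,(g+d)}
\end{align*}
```

Neither w nor p appears in the result: in this model the share of new applications in the project budget depends only on growth, decommissioning and application lifetime.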

The math, although high-school grade, is a little tricky. I implemented an Excel version of this model, in order to get a first-hand opinion about the behaviour of this simple model and also to check that my equation algebra was correct!
The good news is that I managed to get it right eventually, that is, the equations may be solved to express r and n as a function of the other parameters. Here are my two theorems (I will write a paper with the proofs one day):
  • n = [A (g + d) ] / (1 + (g+d) A)
  • r = [g +d +p +(1-d)/A] / [w (1 - d + A(g+d))]
The second formula is approximated (I left aside a few "second-order" subterms) but is still quite accurate when compared with the real value from the simulation. Note that if you suppose that g = 0 (i.e., a stable IT budget for a stable company), the equations become even simpler.
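The two formulas are easy to evaluate directly. The sketch below transcribes them as written above; the function names are made up, the sample value g = 0 is an assumption, and the conversion R = r / (1 + r) simply follows from the definitions of R and r. No expected figures are given, since the r formula is stated to be an approximation.

```python
def new_app_share(A, g, d):
    """n: share of the project budget spent on new applications."""
    x = A * (g + d)
    return x / (1.0 + x)

def project_to_operations_ratio(A, g, d, p, w):
    """r: project budget divided by operation budget (approximate formula)."""
    return (g + d + p + (1.0 - d) / A) / (w * (1.0 - d + A * (g + d)))

def project_share(r):
    """R: share of the total IT budget spent on projects, from r."""
    return r / (1.0 + r)

# Illustrative parameters; g = 0 (no growth) is an assumption
A, g, d, p, w = 10, 0.0, 0.05, 0.04, 0.25

n = new_app_share(A, g, d)
r = project_to_operations_ratio(A, g, d, p, w)
R = project_share(r)
print(f"n = {n:.1%}, r = {r:.1%}, R = {R:.1%}")
```

Varying A while keeping the other parameters fixed is a quick way to see how strongly both ratios depend on application lifetime.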

What can be derived from these formulas?
First, they actually provide useful values. If w = 25%, d = 5%, p = 4%, A = 10, we get R = 42% and n = 33%. For someone who has been a CIO for many years, these values make sense. It is important to realize that if n is, say, 50%, then the IS is unstable and will grow!

Second, they show something that I have claimed for a long time: you cannot compare companies using R as a benchmarking ratio if you do not know A! I have heard so many hours of bullshit SOA infrastructure presentations, which tell how to increase R without taking the proper parameters into account (this is actually why I decided to write my first book: I was fed up listening to consultants who had no clue what they were talking about). I heard the same mistake made by experts (at the VP level of large US software companies). Hence I am glad to have such a compact counter-argument, something I have been looking for over ten years.

Last, the r formula gives a good summary of what can be done to increase the amount of money that can be spent on projects:
  • clean-up old applications
  • increase productivity for operations

Nothing new here, but it is nice to see that age-old sound advice (often ignored) may actually be proven.
 