Saturday, November 29, 2008

Information Technology for the Chief Executive

At last, the English edition of my second book is available. It is somewhat simplified (no technical appendix and a shorter, leaner introduction), but the body is the same as "Performance du Système d'Information".

At the same time, I just received the "Best IT book award" from the AFISI, the French association for engineering and information systems. I am quite honored to receive this prize when I look at the list of previous winners.

Since I have used a self-publishing company and performed the translation myself (with the help of a professional interpreter), trying to promote this book is going to be an adventure.

Sunday, November 2, 2008

Very Cloudy Weather: will it rain on the « Cloud Computing » Parade?

This blog has been very quiet for the last few months … and will remain so until I am finished preparing my lecture about Information Systems for Polytechnique (2009). I should have, by that point, accumulated quite a lot of material for future posts, since I am digging quite hard into the theoretical foundations of "Information Systems".

However, I have been reading so many newspaper articles about "cloud computing" during the last few weeks that I feel like putting a few thoughts on paper. The turning point was the article in "The Economist" last week: as it rightly notices, too much hype will likely cause "disillusion". I am actually totally convinced that "cloud computing" is relevant to many businesses and that it represents a revolution in the making. I have actually explained some of this in my other blog. If you are new to this, a really simple description of the benefits could be the following:

  • Fault-tolerance through implicit redundancy. This is, however, an architectural issue. It is not enough to rent your computing infrastructure from Amazon, Google or Microsoft to get this benefit. You also need to implement your information system with an architectural paradigm that takes advantage of the availability of multiple servers.
  • Super-computing performance through parallelization. The same remark applies: it is true that new techniques for data mining or real-time event processing (two examples) may be tried successfully on the cloud through a MapReduce approach; it also requires a significant amount of work if you start from your legacy application.
  • Reduced cost of operations (TCO) through the use of standardized and mass-produced units. Look at the price per tpmC according to the type of hardware and you will get the idea. I won't dwell on this; it is explained everywhere in the newspaper articles I was mentioning.
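To make the parallelization point concrete, here is a minimal, single-machine Python sketch of the MapReduce pattern (word count, the canonical example). A real cloud framework would distribute the map and reduce phases across many servers; this just shows the shape of the computation:

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map: emit (word, 1) pairs for each word in a document chunk.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group intermediate values by key, as the framework would.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the values for each key.
    return {key: sum(values) for key, values in groups.items()}

chunks = ["the cloud the grid", "the cloud"]
# Each chunk could be mapped on a different server in parallel.
pairs = list(chain.from_iterable(map_phase(c) for c in chunks))
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'the': 3, 'cloud': 2, 'grid': 1}
```

The point of the pattern is that map calls are independent, so adding servers speeds things up almost linearly — but only if the legacy application has been restructured into this shape first.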

There is an implicit warning here: although "cloud computing" is "the way to go" in many cases and for many enterprises, there is a learning curve and a price to be paid. In particular, it will take time and energy to move from an existing architecture to one that can be "migrated onto the cloud". However, "Cloud Computing" is not the "ultimate solution" for everything, and I do not believe for a second that "software is dead". Let me give three simple reasons for which one may decide to stay on "firm ground" as opposed to "moving to the cloud":

  1. The risk of losing privacy and control (cf. R. Stallman's reaction). There are many aspects, including legal and societal ones, to this issue. This is the part that is reasonably well covered in the papers (such as The Economist). It is clearly a valid point, but technology and service segmentation might alleviate this issue in the future. There may exist private clouds, secure clouds, encrypted clouds (where the encryption is managed by a third party), etc.
  2. Latency: accessing the cloud is not instantaneous. Even if the protocols were truly optimal (and they are not; web service invocation carries a significant overhead), it takes some time to reach a distant data center (light takes 10 ms to travel 3000 km, and 10 ms is significant for many high-performance computations or transactions). This is why MMORPGs rely on an RCA (rich client architecture): a significant part of the work is done locally.
  3. Computational overhead: making each service invocation a web service invocation is not practical for high-performance computing. There is a proper level of granularity for encapsulating a piece of computation/transaction into a cloud service. This is actually something for which there exists a significant amount of history: when companies try to develop a SOA architecture, they have to get the service granularity right. Otherwise, they "discover" that application servers cannot carry an infinite load and that they exhibit a performance overhead when compared with more traditional approaches (an RPC – remote procedure call – is more expensive than a local procedure call, and a WS call is not the cheapest RPC).
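The latency argument can be checked with a back-of-the-envelope calculation. The sketch below computes the propagation delay alone, at the speed of light in vacuum; real networks are worse (light in fiber travels roughly a third slower, and routing and protocol overhead come on top):

```python
SPEED_OF_LIGHT_KM_S = 300_000  # approximate, in vacuum; fiber is slower

def one_way_latency_ms(distance_km):
    # Lower bound on latency: propagation delay only,
    # ignoring routing, serialization and protocol overhead.
    return distance_km / SPEED_OF_LIGHT_KM_S * 1000

# 3000 km to a distant data center: 10 ms one way, 20 ms round trip,
# before any protocol overhead is even counted.
print(one_way_latency_ms(3000))      # 10.0
print(2 * one_way_latency_ms(3000))  # 20.0
```

For an interactive application that makes several sequential remote calls per user action, these lower bounds add up quickly, which is exactly why part of the work must stay on the client.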

Hence we get two negative answers to the following questions:

  • "Can I move all my IT onto the cloud without changing my apps and their architecture?" (the cloud as the universal IT outsourcer). The answer is negative when performance (throughput or latency) constraints exist.
  • "Can I migrate all my apps towards a SOA architecture that will live in the cloud, made from SaaS (Software as a Service)?" The answer is negative because some services are better produced locally. The economy of scale of SaaS requires some form of mutualization (running a service remotely for a unique client is not a clear winner). Moreover, performance constraints may prevent breaking an application into "tiny components" (because of the cost of recomposition), and many apps are too specific to be run as SaaS.

To summarize, I would propose two "axioms/theorems/conjectures" about "cloud computing":

  1. "Cloud Computing" will succeed because of the combination of two facts:
    1. The grid architecture is the best architecture to provide true scalability and superior availability (fault-tolerance)
    2. Grid architecture exhibits "economy of scale", that is, can be run more efficiently when operated on a very large scale. The more identical servers one operates, the lower the TCO per server is.
  2. "Service Oriented Architecture" (SOA) is not scale-free: there is a balance to be found between "recomposability" and performance. Hence the cloud may not host all the necessary services that any enterprise may need.

The first "theorem" could be labeled the "Google Theorem", since Google's operations are an existence proof of these two affirmations. They could be justified independently … and they make perfect sense to anyone who has been in charge of running the IT department of a large-scale company. However, the same experience of being the CIO of a large company also suggests that performance and data synchronization/distribution issues will prevent quite a piece of the IS application portfolio from being moved to a cloud architecture, at least for the next few years.

Monday, June 2, 2008

Flexibility and Biology

In a previous post I listed flexibility as one of the four pillars of agility. I defined flexibility very loosely with the following sentence: "the technical ability to cover a large span of business situations and processes with the existing set of IT services and capabilities."

This requires a little further explanation. In particular, there are two ways to obtain this ability, which are both relevant:

  • The ability to implement change in order to achieve functional flexibility. This is foremost a matter of architecture. Hence it is the first thing that comes to mind in the context of "SOA and Agility". This flexibility is expressed at the component level (the ability to parameterize: change the functional behavior through a change of parameters – cf. the 12th chapter of my book on SOA), and at the "integration level", i.e. the ability to integrate a new component quickly and efficiently.
  • The ability to adapt to a new situation "without change", that is, without having to stop and modify one or more components. This kind of flexibility is a combination of a "meta-data architecture" (the ability to change the behavior of the information system through a change of parameters) and the ability to do so dynamically. The latter is the most important part, and the one that is most often overlooked. In many cases, changing parameters is still a new project that requires testing, synchronization and going live with a new version.

We can summarize by defining flexibility as the combination of functional flexibility (ease of change) and operational flexibility (at run-time, without change to the existing binaries).

The first kind may be labeled "mechanical flexibility". The analogy makes sense because when you modify a mechanical device, you must (most often) shut it down before changing or adding a piece. Most of what you can read about IT agility is actually of this kind. New technologies, rule-based engines and meta-data parsers are all designed to extend functional flexibility. This is definitely true of the integration infrastructure technology that I have come across during the last few years. There are a few notable exceptions (such as using UDDI for dynamic discovery of services) but … surprisingly … they have not made it to mainstream IT yet.

I will call the second kind "organic flexibility" for obvious reasons: changing the IS without shutting it down (without the "go live" step) reminds us of living organisms, which grow, change and adapt themselves without having to halt their biological processes. The boundary is subtle: every day, running systems are modified by the input of their users. So what? It turns out that for the same systems (at least those that I know of) there also exist "hidden"/admin parameters that cannot be changed without stopping and restarting the system. It also turns out that most major changes require writing new code, re-compiling, re-linking or re-connecting, hence a new "go live".

I must confess that I have been trained for twenty years to think about flexibility in a mechanical way. However, I now believe that the flexibility that matters most is the organic kind. This will become truer and truer as the stringent requirements of real-time digital business demand no shut-downs of any kind.

Organic flexibility is a matter of organization, people and processes as much as of technology. However, there is a technological part: one may not achieve this behavior (easily) with legacy technology. A key is obviously the availability of dynamic meta-data that is interpreted as parameters and modifies the behavior in a deep way. The challenge starts when different components must share a change of parameters to achieve a new behavior. This requires that propagating new parameters becomes an IT run-time service. The technology for this type of propagation is there; once again we find rule-based engines, scripting engines, and automation tools such as process flows. If I reflect on the IT projects that I have been a part of, 80% of the energy is spent making sure that there is enough flexibility ("enough parameters") to make the component "future-proof", and 20% is spent thinking about the processes that are used to change these parameters. Designing an organic information system requires reversing this ratio.
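As a minimal sketch of this idea of run-time parameter propagation (all names here are hypothetical, invented for illustration): a component reads its parameters from a shared store on every call, so pushing a new parameter value changes its behavior without any stop, redeploy or "go live":

```python
import threading

class ParameterStore:
    # Hypothetical run-time parameter service: components read current
    # values on each call, so an update changes behavior with no restart.
    def __init__(self, **initial):
        self._lock = threading.Lock()
        self._params = dict(initial)

    def get(self, name):
        with self._lock:
            return self._params[name]

    def update(self, **changes):
        # Propagating new parameters is itself a run-time service.
        with self._lock:
            self._params.update(changes)

class DiscountService:
    # The business behavior is driven by meta-data, not hard-coded.
    def __init__(self, store):
        self.store = store

    def price(self, base):
        return base * (1 - self.store.get("discount_rate"))

store = ParameterStore(discount_rate=0.10)
service = DiscountService(store)
print(service.price(100.0))       # 90.0

store.update(discount_rate=0.20)  # no stop, no redeploy, no "go live"
print(service.price(100.0))       # 80.0
```

The hard part in a real information system is not this toy mechanism but making the update atomic and consistent across the many components that share the parameter, which is exactly why propagation must be treated as a first-class run-time service.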

Thursday, May 8, 2008

Software Development Productivity Gains ?

As I am close to getting my last book published, I figured I should publish a few pages once in a while to trigger interest and gather a few comments.

This first extract deals with the matter of productivity gains in the information system. This is a complex topic, and it is really the whole of this book that provides an answer to it. We will merely highlight the most important points and indicate the chapters where they are described in greater detail. In this initial overview we will distinguish the project aspect, dealt with in this section, from the operations aspect, dealt with in the following section. These can then be broken down into three major parts: technology, industry and architecture.
Since the dawn of IT, every year has seen the advent of new technologies that promise spectacular gains in productivity[1]. In the past, people talked about programming languages with a high level of abstraction, artificial intelligence, computer-assisted graphic design, program synthesis and automatic program checks, etc. Today, the hot topics in the literature are meta-programming, service architecture and component assembly, to name but a few. We will go back over these technologies and their promises in chapter 9. There is certainly progress, in terms of function points per dollar and in terms of function points per man-day, but this progress is coming at a very slow rate. If I had to provide an order of magnitude based on a comparative analysis of the productivity figures from the late 80s and those that are found today, I would say that we have gained a factor of 2 in 20 years[2]. This is a matter of overall productivity (on the project perimeter) along the technical axis (in FP/man-day). The economic aspect is different, because the IT industry has undergone significant changes.
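A quick sanity check puts that "factor of 2 in 20 years" estimate in perspective: expressed as a compound annual growth rate, it amounts to only about 3.5% per year.

```python
# A factor of 2 over 20 years as a compound annual growth rate:
# (1 + r)^20 = 2  =>  r = 2**(1/20) - 1
annual_rate = 2 ** (1 / 20) - 1
print(f"{annual_rate:.1%}")  # about 3.5% per year
```

Compared with the productivity gains seen in hardware over the same period, this is a strikingly slow pace, which is the point of the argument.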

The software and system integration industry brings with it structural gains in productivity according to the following factors:
  • Globalization, which makes it possible to move a part of production into countries where there is a lower cost of labor. This delocalization produces a substantial economic advantage, when it is relevant, particularly for stable needs. We shall come back to this in chapter 9.
  • Mutualization of needs, which leads to the continuous release of software packages that capitalize on the needs shared by different companies. The associated gains may be quite sizeable at the level of a single project, but are only moderate when averaged across the ISD.
  • Maturing and professionalization of development practices, in relation to the development of tools. The continued formalization and optimization of development processes, using standardized repositories as support[3], is a good practice that is vital for an ISD and its providers.

These productivity factors may have greater effects than technology factors, but their implementation is not easy, and requires discernment in terms of the scope of application.
The third area of progress is the information system architecture, which we distinguish from the technology section, because it involves what is known as “enterprise architecture”, a subject which spans information technology, business processes and “strategic alignment” of the information system. Enterprise architecture is used as a complexity control strategy so that the integration costs of a new application do not reach an irreversible level as the IS grows. This is mostly a defensive strategy that is used when the complexity is precisely such that integration becomes the predominant part of IT projects. As a result, the effect that is noticed is not a “productivity miracle” but the return to a “comfort zone” where integration costs are proportional to the functional complexity of the application.
The conclusion we can draw at this stage of the analysis is that there are undisputed productivity factors in terms of information system development, but these are not necessarily easily deployed; the gains remain modest.

The visible part of these gains is all the more modest because another part is consumed by three strong trends of enterprise IT[4]:

  • Additional requirements, as regards availability, security, ergonomics, etc., which induce more complex IT solutions for an equivalent functional perimeter.
  • Inter-operability requirements, whether within the company or in conjunction with partner companies. Enterprise IT has been experiencing a double revolution for 20 years. Firstly, its functional scope grew along with inter-application connectivity requirements (unlike departmental IT in the 70s and 80s). Secondly, the concept of the extended company arose, i.e. the necessity to connect the information systems of companies that are collaborating on processes.
  • The abstraction of software functions to control complexity. Controlling complexity leads to the creation of layered software, with the layers corresponding to different levels of abstraction, and makes it possible to have them evolve without having to grasp them as a whole[5]. These methods make it possible to control application systems that are very rich and complex, but they generate additional costs (the actual size exceeds the expected size).

Users of general-public software such as office suite tools will have noticed that more and more resources are required to run their favorite applications (e.g. word processing), mostly due to these three reasons.

[1] We have already referred to F. Brooks’ article entitled “No Silver Bullet: Essence and Accidents of Software Engineering”, which examines the intrinsic difficulties in terms of software productivity. This article can be found in T. DeMarco and T. Lister’s Software State-of-the-art, which includes a selection of the best articles from the late 80s.
[2] This assessment is an average ratio which hides a great deal of variability, according to the type of problems to be dealt with. This becomes clearer when looking at the inter-company benchmark results that are published every year.
[3] The repository which is gradually becoming the standard one in IT project development is CMMI (Capability Maturity Model Integration). For more on CMMI, see the first section of M.B. Chrissis, M. Konrad and S. Shrum’s CMMI – Guidelines for Process Integration and Product Improvement. You can also find many sources online, including those of the SEI (Software Engineering Institute).
[4] This phenomenon is more commonly known in technology circles as the “rebound effect”. For example, the efficiency increase for heating a square foot of living space is offset by the increase in the average house size; the progress in car fuel consumption has helped to develop the usage, instead of simply generating savings, etc. In IT, the “complexity race” absorbs a proportion of the gains by the rebound effect.
[5] The concept of "layers" is hard for non-computer specialists to grasp. It is quite relevant to compare it to geology: layers corresponding to age, with a successive deposit of "functions". For example, I once analyzed in detail the code of a major American publisher, focusing on typical functions whose performance seemed disappointing. Using the debugger as an analysis tool revealed several "layers" which had been built up as the software underwent different changes (from an initial effective kernel to multiple generic interfaces). This concept of "sedimentation" will be introduced in section 2.4.1.

Sunday, April 13, 2008

Agility, Biology and SOA


I have just finished writing a new chapter for the third edition of my book « Entreprise Architecture & Business Process Management » (which is only published in French, unfortunately). This new chapter deals with "SOA and Agility", two buzzwords of these IT times.

A part of this chapter deals with the concept of "sustainable architecture" following what I wrote in my previous post. I have proposed to paraphrase the definition from the Brundtland report:

"Sustainable development is development that meets the needs of the present without compromising the ability of future generations to meet their own needs."

into a definition that spells out the "sustainable development" of the IT architecture:

"Sustainable architecture is the Enterprise Architecture that enables the development of today's IT services without compromising the ability to develop tomorrow's services because of an ever-increasing complexity or the induced scarcity of resources (money or skills)."

To gain this sustainability, I have been advocating for quite a while for "fractal" architectures (hence, "fractal" methods). I mean methods that can be broken into pieces and applied differently (and at different rates) to different situations. This fractal view applies to middleware and to enterprise architecture, but also to data models and to process models. The information system (as a whole) needs to grow (like a living object) according to the different rates of pressure that are applied to its different parts. The CEISAR architecture documents make a very elegant and clear point about the value of breaking long "end-to-end" business processes into smaller, more manageable pieces.

The major part of this new chapter is devoted to agility (mostly because I hear too much nonsense about solving the agility issues with technology). To give a short summary, I see four forces that contribute to agility:

  • Anticipation: being agile is implementing changes rapidly enough to meet TTM (time-to-market) constraints. A smart way to think about it is to start early. It is not a "solution in itself" (since many changes may not be anticipated), but it is definitely part of the "whole package", since many "smart decisions" (the ones that will increase agility) take time to get implemented.
  • Flexibility: the technical ability to cover a large span of business situations and processes with the existing set of IT services and capabilities.
  • Leanness: the process of applying changes to the information system (for instance, developing a new IT project) needs to be as lean as possible (cf. my other blog for more details, although I will most certainly be back on this topic). Lean obviously means short from an organizational perspective (the longer the chain between the developer and the problem owner, the less agility is observed). The ultimate goal is to see IT projects disappear. The "absolutely agile" IT department is one that is not needed: users adapt their information system to business changes by themselves. Although it may sound silly, it is actually a sound goal, but one that requires a lot of anticipation.
  • Skills: one needs to be competent to be agile! This also sounds dumb, but it is actually a rather profound observation, backed by experience. The fear of failure and the pressure in today's modern organizations mean that any lack of full understanding or full confidence will translate into multiple redundant checks, back-and-forth, "safety precautions", etc. Another dimension is the ability to profit from the opportunities provided by the constant progress of technologies, from new UI techniques (Ajax, mash-ups, etc.) to new middleware tools, including more radical options such as leveraging the value in SaaS (software as a service).

Notice that, except for the second one, these are organizational traits and not technical ones. This, by the way, is also true of being able to leverage the value of SaaS. The software services provided by SaaS are not especially great. Using them is smart from a business and an organizational perspective, and requires "business agility/flexibility" even more than technical skills or abilities.

As I wrote earlier, SOA's main contribution is to provide a framework for sharing and reusing services. It also helps to achieve agility, but more as a second-order consequence than a direct cause-effect path, and not without effort and dedication. The first-order benefits of reuse and sharing are cost and complexity reduction. On the other hand, as I have already explained, SOA is foremost a governance principle before being an architecture framework. If the stakeholders do not play "the game", the existence of a beautiful ESB (enterprise service bus) will not provide as many benefits as expected. As a governance principle, SOA sets principles and rules that must be followed.

If this sounds very different from a crude idea of what a "biology-inspired" IS could be … it is indeed! The strength of biomimicry as a design principle is found in what nature does very well:

  • Flexibility/ adaptability (to new situations)
  • Reliability / Survival ability

One thing that nature does not do so well is the "economy principle", the "simplicity of design". When biologists study mechanisms in living organisms, they are often surprised by the level of redundancy and by the complexity of the processes (which have been obtained through evolution). By the way, when we look at legacy code, we often get the same impression :)

There are two consequences:

  • Biomimicry is not enough as a design principle. The economic reality of business requires implementing a "principle of simplicity" and a constant drive to reduce costs. I advocate for a "fractal architecture", which may sound like an oxymoron to some, as a weakened form of the necessary dictatorship of principles and rules. Rules are necessary to produce simplicity.
  • For all the good reasons mentioned in earlier posts, it is a good idea to borrow from Nature to design systems which are at the same time complex, reliable and flexible (achieving any two is OK; achieving all three is a challenge). However, this will come at a cost (mostly redundancy). To justify the effort, one must look at the "big picture", consider the full "life cycles" of IT services and take multiple business scenarios into account. This means that the projects that will bring biomimicry into the IS architecture will not have a simple ROI justification, but rather a sophisticated one. This is discussed in my book, which should be made available in English (at last) in a few months.

Nature is organized around a simple principle that says that evolution cannot be stopped, which also applies to information systems. Hence the only mechanism to cope with complexity is to clean, to remove unnecessary things. Nature does a great job of cleaning (and recycling) unnecessary things in living organisms. This is another aspect which may be borrowed, especially when managing a SOA. A key issue is to master the complexity (and the sheer number) of services. The book will mention a few tools and techniques to manage this catalog of shared services, but the first principle is to limit its size. Since evolution cannot be stopped, old services need to be removed.

This blog is moving very slowly (although one would think that there are enough provoking statements in this message to generate a few comments). I actually thought that it was dead, but there are a few readers coming every day (that's what the stats tools say). I may accelerate the posting rhythm in a few months, when I'll be preparing a new course on the "Theory of Information Systems".
