Sunday, December 17, 2006

Strong AI and Information Systems

The following is a translation of a discussion with three students from ESIEE: Philippe Glories, Erwan Le Guennec, Yoann Champeil.

- In your opinion, where does one mostly find AI in everyday life?

There are two types of answers, depending on the scale of the systems. Large-scale systems require distributed AI (multi-agent systems, etc.), whereas smaller-scale systems may rely on the many technologies that have been developed so far for “local use” (expert systems, constraints, knowledge bases, inductive or case-based reasoning, fuzzy logic, etc.). Actually, AI is already everywhere! In cameras, cars, central heating… and equally in existing applications of current Information Systems. As far as embedded systems are concerned, fuzzy logic (and neural nets) are heavily used for intelligent control. Information Systems applications make use of rule-based systems (sales rules, monitoring rules, and so on).
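To make the last point concrete, here is a minimal sketch of the kind of rule-based logic (“sales rules”, “monitoring rules”) that such applications embed. All rule names, thresholds and fields are invented for illustration; a real system would use a dedicated rule engine rather than this toy loop.

```python
# A toy rule-based sketch (illustrative only; names and thresholds are invented)
# of the "sales rules" / "monitoring rules" mentioned above.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]   # predicate over a dictionary of facts
    action: Callable[[dict], None]      # side effect applied when the rule fires


def run_rules(rules: List[Rule], facts: dict) -> None:
    """Naive forward evaluation: fire every rule whose condition holds."""
    for rule in rules:
        if rule.condition(facts):
            rule.action(facts)


# Hypothetical business and monitoring rules
rules = [
    Rule("volume-discount",
         lambda f: f.get("order_amount", 0) > 10_000,
         lambda f: f.update(discount=0.05)),
    Rule("latency-alert",
         lambda f: f.get("response_time_ms", 0) > 500,
         lambda f: print("ALERT: response time above threshold")),
]

order = {"order_amount": 12_000, "response_time_ms": 620}
run_rules(rules, order)
print(order.get("discount"))   # -> 0.05
```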

Large-scale applications of distributed AI are scarcer, as far as I know.


- May one talk about “strong AI” (as opposed to “weak AI”, see http://en.wikipedia.org/wiki/Strong_AI) when dealing with autonomy? (Autonomy would require self-awareness, so that the system can manage itself.)

This is both a possible and useful distinction, although it is difficult to manage because of the continuous nature of autonomy. However, one may distinguish between AI based on applying rules to a situation and AI that uses a model of the problem it tries to solve, together with a model of its own action capabilities (a meta-representation of oneself being a first step towards consciousness), so that it may adapt and devise different appropriate reactions.

I would present a different type of distinction: made (or “built”) AI vs. emergent (or “born”) AI. In the first case, a piece of software produces (intelligent) solutions that are predictable as a consequence of the original design. In the second case, the nature of the solutions is harder to foresee: they emerge from the components together with the reflexive nature of the application (its meta-model). It is another way to look at the weak/strong difference.

- Is the creation of autonomous AI already feasible? Do we have the technical means? If not, will we have them within a few years? Are any specific theories required to develop it?

I am no longer enough of an expert to answer this question with any form of authority. I believe that the answer is positive, in the sense that we have all that we need, from a technology standpoint, to create autonomous AI. It is mostly a knowledge representation issue.

- Do you feel that the creation of autonomous AI is advisable and desirable? From an industrial perspective? From a societal perspective? From a scientific perspective?

This is a large and difficult question!

I would answer positively, since I believe that only a strong AI approach will enable us to break the complexity barrier and to attack distributed AI problems. This is especially true for Information Systems issues, but I believe it holds for a more general class of problems. To put it differently, successfully solving distributed problems may require relinquishing explicit control and adopting an autonomous strategy (this is obviously the topic of this blog and of Kelly’s book).

There are associated risks, but one may hope that a rigorous definition of the meta-model, together with some form of certification, may help to master those risks.

Obviously, one of the risks, from both an industrial and a social perspective, is to see the emergence of systems with “too much autonomy”. As a consequence, a research field that needs to be investigated is the qualification of the “degrees of freedom” granted to autonomous systems. A precise answer will collide with classical undecidability problems; however, abstract and “meta” answers may be reachable.

- From a philosophical point of view, do you see autonomous artificial intelligence as a threat to mankind?

No, from a philosophical point of view, autonomous AI is an opportunity. There is a danger, however, from both an ethical and a practical standpoint. Practically, the abuse of autonomy without control may have negative consequences. From an ethical point of view, there is a potential impact on society and the economics of work, as the delicate balance between production and consumption roles may be affected (which is true, by the way, for any method of automation).

- To summarize, would you qualify yourself as an opponent or an advocate of autonomous AI?

Without a doubt, I see myself as a proponent of AI! The reasons are, in part, those expressed in this blog: autonomous AI is the only approach to solve complex problems for which a solution is really needed. I see delivering the appropriate level of quality of service in an information system as an example of such a worthy cause :)

A last remark: the scale issue is really key here. The same rules should not apply at both scales:

(1) On the small scale, components should be built with a “mechanical vision”: proper specifications, (automated) testing and industrial quality, using rigorous methods. When “intelligent” behaviour is needed, classical AI techniques such as rules or constraints should be used, for which the “behavioural space” may be inferred. Although this is just an intuition, I suspect that components should come with a certification of what they can and cannot do.

(2) On the other hand, large-scale systems, made of a distributed network of many components, should be assembled with “biomimetic” technology, where the overall behaviour emerges, as opposed to being designed. My intuition is that declarative, or policy-based, assembly rules should be used so that an “overall behavioural space” may be defined and preserved (which is why we need certified components to start with). The issue here is “intelligent control”, which requires self-awareness and “freedom” (autonomy).
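As a rough illustration of this two-scale idea, here is a hedged sketch: small-scale components declare a “certificate” of what they may do, and a large-scale assembler checks a declarative policy over those certificates before wiring components together. All component names, capabilities and the policy format are hypothetical.

```python
# Hedged sketch: components carry a certified "behavioural space" (what they may do),
# and assembly is governed by a declarative policy over those certificates.

from dataclasses import dataclass
from typing import List, Set


@dataclass
class Component:
    name: str
    provides: Set[str]          # the certified behavioural space: actions this component may perform


@dataclass
class Policy:
    """A declarative assembly rule over component certificates."""
    required: Set[str]          # capabilities the assembly must offer
    banned: Set[str]            # behaviours the assembly must never make possible

    def admits(self, components: List[Component]) -> bool:
        offered = set().union(*(c.provides for c in components))
        return self.required <= offered and not (self.banned & offered)


billing = Component("billing", provides={"charge", "refund"})
monitor = Component("monitor", provides={"measure_latency", "raise_alert"})
rogue = Component("rogue", provides={"raise_alert", "delete_account"})

policy = Policy(required={"charge", "raise_alert"}, banned={"delete_account"})
print(policy.admits([billing, monitor]))         # True: required capabilities present, nothing banned
print(policy.admits([billing, monitor, rogue]))  # False: a banned behaviour would become possible
```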

Sunday, December 3, 2006

Autonomic Computing

Since "Autonomic Computing" is a key concept related to the topic of this blog, I have translated an extract from my book ("Urbanisation et BPM"). It is not the best reference on the topic :) but it will give an overview for readers who are not from the IT community. This extract was written in 2004 (therefore, it is no longer precisely up to date ...)


Autonomic computing (AC) is the name given to a large-scale research initiative by IBM, based on the following thesis: mastering the ever-increasing complexity of IT requires making IT systems « autonomous ». More precisely, « autonomic » is defined as the conjunction of four properties (a minimal sketch combining them is given after the list):

(1) self-configuring: the system adapts itself automatically and dynamically to changes in its environment. This requires a « declarative » configuration: a statement of goals rather than a description of means. A good example of such a declarative statement is the use of SLAs as parameters. When something changes, the system tunes its internal parameters to keep satisfying its SLA policy.

(2) self-healing: most incidents are managed automatically. The discovery, diagnosis and repair of an incident are made by the system itself, which presupposes a capacity to reason about itself. Such a system therefore holds a model of its own behaviour, as well as reasoning tools similar to so-called “expert systems”. This is often seen as the comeback of Artificial Intelligence, although with a rich model, simple and proven techniques are enough to produce an action plan from incident detection.

(3) self-optimizing: the system continuously monitors the state of its resources and optimizes their usage. One may see this as the generalization of load-balancing mechanisms to the whole IT system. This requires a complex performance model, which may be used both in a reactive (balancing) and a proactive (capacity planning) manner (cf. Chapter 8).

(4) self-protecting: the system protects itself from different attacks, both in a defensive manner, by controlling and checking accesses, and in a proactive manner, through a constant search for intrusions.

(note: for an excellent introduction to « autonomic computing », one should read the article « The dawning of the autonomic computing era » by A.G. Ganek and T.A. Corbi, which may be found easily on the Web)
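The sketch announced above ties the four properties together in one toy control loop: the goal is stated declaratively (an SLA), and the loop monitors, diagnoses and acts on its own. Every metric, threshold and action name here is invented; real autonomic managers rely on far richer models and instrumentation.

```python
# A toy "autonomic manager" loop under invented metrics and thresholds:
# declarative SLA goal (self-configuring), rule-based diagnosis and repair
# (self-healing), and simple capacity adjustment (self-optimizing).

import random

SLA = {"max_response_ms": 300, "min_availability": 0.99}          # declarative goal
state = {"servers": 2, "response_ms": 0.0, "availability": 1.0}   # managed resource


def monitor() -> None:
    """Stand-in for real instrumentation: fabricate measurements."""
    state["response_ms"] = random.uniform(100, 600) / state["servers"]
    state["availability"] = random.choice([1.0, 1.0, 1.0, 0.95])


def analyze_and_plan() -> list:
    """Simple rules turn incident detection into an action plan."""
    plan = []
    if state["availability"] < SLA["min_availability"]:
        plan.append("restart_failed_instance")                    # healing action
    if state["response_ms"] > SLA["max_response_ms"]:
        plan.append("add_server")                                  # optimization action
    elif state["response_ms"] < SLA["max_response_ms"] / 4 and state["servers"] > 1:
        plan.append("remove_server")                               # release unused capacity
    return plan


def execute(plan: list) -> None:
    for action in plan:
        if action == "add_server":
            state["servers"] += 1
        elif action == "remove_server":
            state["servers"] -= 1
        elif action == "restart_failed_instance":
            state["availability"] = 1.0


for tick in range(5):
    monitor()
    actions = analyze_and_plan()
    execute(actions)
    print(tick, state, actions)
```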

The interest of these properties and their relevance to the management of IT systems are self-evident. One may wonder, however, why a dedicated research initiative is called for, instead of simply leveraging the constant progress of the associated fields of computer science. The foundation of the Autonomic Computing initiative is the belief that we have reached a complexity barrier: the management cost of IT infrastructure (installation, configuration, maintenance, …) has grown relentlessly with its size and power, and now represents more than half of the total cost. The next generations of infrastructure will be even more complex and powerful. IBM’s thesis is that their management is possible only if it becomes automated. The article just quoted gives a wealth of statistics showing the ever-increasing share of operational tasks, while at the same time the reliance of the business on its IT systems keeps growing, so that the financial consequences of IT outages are becoming disastrous.

Therefore, a breakthrough is necessary to overcome this complexity barrier: in the future, IT systems will need to manage and repair themselves automatically, with as little human intervention as possible. This may sound far too ambitious, closer to a “marketing and public relations initiative” than to a directed research project: after the eras of distributed computing, autonomous computing and ubiquitous computing, here comes autonomic computing. Yet this initiative flows continuously from the computer science research themes of the past 30 years, and some of the problems and solution directions are actually old. However, two paradigm shifts have occurred. First, the center of attention has moved in the last few years from software (components, applications) to software systems (hence the focus on enterprise architecture and business processes). In the 80s and 90s, the focus on problems similar to those of AC gave birth to the concept of intelligent agents, but their application at the full scale of corporate IT has proven difficult. From a CIO perspective, the focus on enterprise architecture and infrastructure is a welcome shift, especially since the research budgets are impressive. On IBM’s side, this is the main theme of an R&D budget of over 5 billion dollars. Most other major players are also working on similar initiatives, even though the vocabulary may vary.

The second paradigm shift is the emergence of a biological model of incident processing. This is a departure from the endless search for methods that would produce bug-free software and for multiple back-ups that would guarantee complete availability of infrastructures. Autonomic Computing applies to the “real world”, where software contains many defects and runs on computers that experience all sorts of outages, and it draws on the analogy with “living organisms” to deliver fault-tolerant availability. Living organisms cope with a large spectrum of aggressions and incidents (bacteria, viruses, …) with a number of techniques: redundancy, regeneration, self-amputation, reactive aggression, etc. Similarly, autonomous IT systems need to be designed to perform in an adverse environment. The analogy with, and inspiration from, biology is a long-lived trend in computer science. For instance, the use of the “theory of evolution” as an optimization strategy has produced genetic algorithms and swarm approaches. To quote a NASA expert who applies the concept of swarms to micro-robots, part of the design is replaced by “evolution as an optimization strategy”.

Even though there is a form of utopia in this field and many goals are definitely long-term, this is not science fiction. There is an evolutionary course towards autonomic computing, and the first steps correspond to technologies that have already been demonstrated in research labs. Actually, there is a symmetry in the founding argument: if we are not able to give our new systems more autonomy, their progress in scale and complexity will soon reach a manageability limit. It is, therefore, logical to bet on the forthcoming availability of autonomic capabilities. The issue is not “if” systems become autonomous, it is “when” and “how”.

Autonomic Computing is not a philosophy, it is an attitude, according to the experts. Systems should be designed to be autonomous from the very early design stages, even though some of the choices and options may still be very rudimentary. An illustration drawn from the practice of real-time systems is the heartbeat principle. A heartbeat is a simple periodic signal that each component broadcasts as a testimony to its status (alive, weakened, distressed, etc.). The « heartbeat » principle comes from the real-time systems community; for instance, it is used for NASA’s satellites under the « beacon » designation. On this topic, one may read the papers from the NASA research center in the proceedings of the autonomic computing workshop that was part of EASE’04.
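A minimal sketch of the heartbeat principle, assuming invented component names and an in-process queue standing in for a real broadcast channel: each component periodically reports a simple status, and a watcher raises an incident for anything that falls silent.

```python
# Heartbeat sketch (illustrative only): components broadcast a periodic status
# signal; a watcher flags any component whose heartbeat goes missing.

import queue
import threading
import time

heartbeats = queue.Queue()          # stand-in for a real broadcast channel


def component(name: str, period: float, beats: int) -> None:
    """Broadcast a simple status signal every `period` seconds, then fall silent."""
    for _ in range(beats):
        heartbeats.put((name, "alive", time.time()))
        time.sleep(period)


def watcher(timeout: float, duration: float) -> None:
    """Raise an incident for any component whose last heartbeat is older than `timeout`."""
    last_seen, flagged = {}, set()
    deadline = time.time() + duration
    while time.time() < deadline:
        try:
            name, status, ts = heartbeats.get(timeout=0.1)
            last_seen[name] = ts
        except queue.Empty:
            pass
        for name, ts in last_seen.items():
            if name not in flagged and time.time() - ts > timeout:
                flagged.add(name)
                print(f"{name}: heartbeat missing, raising an incident")


threading.Thread(target=component, args=("billing", 0.2, 5), daemon=True).start()
watcher(timeout=0.5, duration=2.0)   # flags "billing" once its heartbeat stops
```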

This proactive attitude seems relevant to the design of information systems. The paper from Ganek and Corbi makes a strong argument in favor of evolution, as opposed to revolution, based on a roadmap that goes from “basic” to “autonomous”. The progression is labeled through the steps: managed, predictive, adaptive and autonomous. It is completely relevant to the transformation of business process management, as described in Chapter 8.

AC and Enterprise Architecture

The foundations of autonomic computing are well suited to the field of Enterprise Architecture, since IT components form a large and complex system. Hence, many aspects of the AC approach are relevant to the re-engineering of IT. We shall consider three of them: autonomic computing infrastructure, autonomic business process monitoring and “biological” incident management. There is a clear overlap, but we shall move from the most complex to the most realistic.

The concept of an autonomic IT infrastructure has received the most media attention, since it is at the heart of IBM’s strategy, being a cornerstone of on-demand computing (cf. 11.3.2). An autonomic infrastructure is based on the virtualization of computing resources; it manages a pool of servers that is allocated to application needs on a dynamic basis. Global monitoring allows load balancing, reaction to incidents through reactive redistribution, and dynamic reconfiguration of resources to adapt to new needs. The ultimate model of such an infrastructure is the GRID model, which makes grid computing de facto a chapter of autonomic computing. A grid is a set of identical, anonymous servers that are managed as one large parallel computing resource. The management of grid computing has already produced valuable contributions to the field of AC. For instance, the most accomplished effort to standardize “quality of service” in the world of Web Services, WSLA, comes from the world of grid computing. One may read the « Web Service Level Agreement (WSLA) Language Specification » and see, for instance, how performance-related SLAs are represented (under the measurement heading, 2.4.8). One may see the grid as a metaphor for tomorrow’s data center. Therefore, even though the grid is often reduced to the idea of using the sleeping power of a farm of PCs during the night, most CIOs should look into this field as a guideline for their own infrastructures.
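To illustrate the virtualized pool idea in the simplest possible terms, here is a hedged sketch: anonymous, identical servers are (re)allocated to applications in proportion to their current demand. The application names, loads and allocation rule are invented; a real autonomic infrastructure would of course use a far more sophisticated placement strategy.

```python
# Toy allocation of a pool of anonymous, identical servers to applications
# according to their current demand (all workloads and names are invented).

def allocate(pool_size: int, demand: dict) -> dict:
    """Share `pool_size` identical servers roughly proportionally to each application's demand."""
    total = sum(demand.values()) or 1
    allocation = {app: max(1, round(pool_size * load / total)) for app, load in demand.items()}
    # trim if rounding over-allocated the pool
    while sum(allocation.values()) > pool_size:
        biggest = max(allocation, key=allocation.get)
        allocation[biggest] -= 1
    return allocation


print(allocate(10, {"billing": 30, "web_front": 60, "reporting": 10}))
# -> {'billing': 3, 'web_front': 6, 'reporting': 1}
```

Re-running the same function whenever the demand figures change is, in miniature, the “dynamic reconfiguration” that the paragraph above describes.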

A grid is by construction a resource that may be shared. It has interesting properties of robustness, fault-tolerance and flexibility. It is, therefore, a convenient infrastructure for bridging the gap between the autonomic and on-demand computing concepts. An “on-demand” infrastructure is a flexible computing infrastructure whose capacity may be increased or decreased dynamically, according to the needs of the company’s business processes. Such an infrastructure may be outsourced, which yields the concept of “utility computing” as a service, or it may be managed as an internal utility. There is more to “on-demand computing” than the infrastructure aspects, which will be covered at the end of this chapter. The synergy with autonomic computing is obvious: the business flexibility requirements of the on-demand approach demand a breakthrough in terms of technical flexibility, which translates into autonomic computing and is well illustrated by the grid concept.

This vision is shared by most major players in the IT industry. However, we consider it a long-term target, since the majority of today’s application portfolios in most companies cannot be migrated smoothly onto this type of infrastructure. This does not mean, as was previously stated, that some of the features are not available today. For instance, “blade server” infrastructures deliver some of the autonomic benefits as far as self-configuration, self-management and self-healing are concerned.

In the mid term, one may expect the field of autonomic computing to have an impact on business process management. The field of OAI (Optimisation of Application Integration, cf. Chapter 8) is equally a long-term, complex research topic, but one which will benefit from small-step improvements. Two families of software tools are currently evolving towards autonomous behavior: integration middleware and monitoring software. BAM (business activity monitoring) software is integrating capabilities to model and to simulate the occurrence of business incidents. Adaptive or “fault-tolerant” middleware is emerging. For instance, one may look at the Chameleon project, which belongs to the ARMOR approach (Adaptive, Reconfigurable and Mobile Objects for Reliability). Chameleon is an integration infrastructure that combines adaptability and fault-tolerance.

In the short term, one may apply the “biology metaphor” of autonomic computing to formalize the management of incidents as it often occurs in the “real world”. To summarize, one may say that system operations rely on two visions, a mechanical one and an organic one, to deliver the continuity of service required by their clients. The mechanical vision is based upon redundancy, with back-ups and spare copies of everything. When something fails, the faulty component is replaced with a spare. Depending on the recovery time constraints, this approach translates into a disaster recovery plan, clusters, fault-tolerant hardware, etc. For instance, one may look at the speeches of Carly Fiorina, HP’s former CEO, which focus on mutualization, consolidation and virtualization under the « Adaptive Enterprise » label. The same keywords appear in Sun’s or Veritas’s strategy for servers and data centers.
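The mechanical vision can be caricatured in a few lines: every critical role has spares, and recovery simply promotes the next spare. The roles and instance names below are invented; the point is only the swap-for-a-spare reflex.

```python
# Hedged sketch of the "mechanical" vision: redundancy plus spare replacement.

active = {"db": "db-primary", "queue": "mq-1"}
spares = {"db": ["db-standby"], "queue": ["mq-2", "mq-3"]}


def fail_over(role: str) -> str:
    """Replace the failed instance of `role` with the next available spare."""
    if not spares[role]:
        raise RuntimeError(f"no spare left for {role}: escalate to the disaster recovery plan")
    failed, active[role] = active[role], spares[role].pop(0)
    print(f"{role}: {failed} failed, promoting {active[role]}")
    return active[role]


fail_over("db")     # -> db: db-primary failed, promoting db-standby
```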

The organic vision is based upon alternate scenarios and re-routing. It is derived from an intimate knowledge of the business and its processes, and consists of a network of “alternate sub-processes”. It requires strong cooperation between operations managers, application developers and business owners. Only a precise knowledge of the business and its priorities enables a transverse team to find valid “alternate approaches”. We named this approach « organic operations »; it may be described with the following goals:

  1. Create an operations model which supports the definition of scenarios and takes all operations tools and methods into account (some of which are represented with hatched boxes in the book’s figure). Stating existing recovery procedures in a formal and shared manner is both a first stage of maturity and a possible step towards automation.
  2. Create multiple levels of “reflexes and reactions”, i.e. automated incident management rules. An interesting contribution of the biological metaphor is the separation between reflexes (simple, distributed) and reactions (centralized, requiring a « conscious » intelligence). A minimal sketch follows this list.
  3. Create the tools which allow us to make “intelligent” decisions. These may be representation/cartography tools (which links to the topic of business activity monitoring) or planning/simulation tools: what would happen if the message flow from component A were re-directed towards B?
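The sketch announced in point 2 ties the three goals together: an explicit model of nominal and alternate routes, fast local “reflexes”, slower centralized “reactions” that consult the model, and a simple “what-if” check in the spirit of point 3. All component names, routes and health states are invented for illustration.

```python
# Hedged sketch of "organic operations": alternate sub-processes are modelled,
# reflexes act locally, reactions consult the model before re-routing a flow.

routes = {                       # the operations model: nominal and alternate paths
    "order_flow": ["component_A", "component_B"],
    "order_flow_alternate": ["component_A", "component_C"],
}
health = {"component_A": "ok", "component_B": "ok", "component_C": "ok"}


def reflex(component: str, status: str) -> None:
    """Reflex: local, immediate, needs no global knowledge."""
    health[component] = status
    if status == "degraded":
        print(f"reflex: throttling traffic into {component}")


def reaction(process: str) -> list:
    """Reaction: centralized, uses the model to pick a valid alternate route."""
    nominal = routes[process]
    if all(health[c] == "ok" for c in nominal):
        return nominal
    alternate = routes.get(process + "_alternate", nominal)
    print(f"reaction: re-routing {process} via {alternate}")
    return alternate


def what_if(component: str, process: str) -> list:
    """Planning/simulation: what happens to `process` if `component` fails?"""
    saved = health[component]
    health[component] = "down"
    route = reaction(process)
    health[component] = saved
    return route


reflex("component_B", "degraded")
print(what_if("component_B", "order_flow"))
# -> order_flow is re-routed through component_C while B is (hypothetically) down
```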
 