Tuesday, May 15, 2007

Multilevel Architectures - Bane or Boon?




Marsden Bloise once described life as having a "curiously laminated quality."

Life on earth does have levels, and they have important mathematical consequences.

In fact, the multi-level model is one we find reasonably familiar and can live with. We structure our corporations and governments to have layers and levels, with people on "one level" reporting to people on "a higher level."

Not only are there levels, there are gaps between the layers. It is almost like a quantum mechanical model, where there are allowed levels and forbidden zones between them.

In the world of large-scale enterprise computing, there are officially defined levels (the OSI model is one example): a hardware level, a messaging level, an application level, and so on. The goal of each level is to function so well that it essentially becomes a perfectly flat, stable platform or metric on which the higher levels can be built. A perfect level "goes away" and "falls out" of the equations.

So, in the best world, when nothing is going wrong, an application such as Microsoft Word can say "save this file!" and, behold, it happens. The application doesn't need to concern itself with the details of what brand of disk drive is in the computer, or how many empty slots of what size are there, or how to chain them together and break up the document into chunks of that size for storage and later retrieval.
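The "save this file!" example can be sketched as three small layers. This is only an illustration under stated assumptions, not how Word or any real file system works: the names BlockDevice, FileSystem, and application_save are hypothetical, and the "disk" is just a Python list of fixed-size slots.

```python
class BlockDevice:
    """Lowest layer (hypothetical): a 'disk' that only knows numbered, fixed-size slots."""

    def __init__(self, block_size, block_count):
        self.block_size = block_size
        self.blocks = [None] * block_count  # None marks an empty slot

    def free_slots(self):
        return [i for i, block in enumerate(self.blocks) if block is None]

    def write(self, index, data):
        if len(data) > self.block_size:
            raise ValueError("chunk is larger than one block")
        self.blocks[index] = data

    def read(self, index):
        return self.blocks[index]

    def release(self, index):
        self.blocks[index] = None


class FileSystem:
    """Middle layer (hypothetical): breaks a document into chunks and chains them."""

    def __init__(self, device):
        self.device = device
        self.chains = {}  # file name -> ordered list of slot numbers

    def save(self, name, content):
        size = self.device.block_size
        chunks = [content[i:i + size] for i in range(0, len(content), size)]
        old_chain = self.chains.get(name, [])
        # Slots held by an older version of the file may be reused.
        if len(chunks) > len(self.device.free_slots()) + len(old_chain):
            raise OSError("disk full")
        for index in old_chain:
            self.device.release(index)
        chain = self.device.free_slots()[:len(chunks)]
        for index, chunk in zip(chain, chunks):
            self.device.write(index, chunk)
        self.chains[name] = chain

    def load(self, name):
        return b"".join(self.device.read(i) for i in self.chains[name])


def application_save(fs, name, text):
    """Top layer: the application just says 'save this file!'."""
    fs.save(name, text.encode("utf-8"))


# Usage: the application never sees slot sizes, slot counts, or chaining.
disk = BlockDevice(block_size=4, block_count=8)
fs = FileSystem(disk)
application_save(fs, "memo.txt", "hello world")
print(fs.load("memo.txt").decode("utf-8"))
```

When everything works, the two lower layers "fall out" of the application's view entirely. The only time they become visible is when something goes wrong, for example when the disk-full error has to travel back up through the layers.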

Or, in business, workers and "the boss" or the next level of management up have a functioning gap between them. The boss doesn't really want to know the details of how something happens, and only wants a simplified, almost cartoon-level sketch, and mostly cares, yes or no, did that happen. The employees see all the details and prefer the boss not "micromanage". The employees have little idea what the boss does all day - so long as reasonable work tasks come down the pike in reasonable order, it's good. The boss has little idea of the complexity of many tasks, or the pains that have to be taken to accomplish them.

On the upside, this makes "management" even possible, because otherwise the world would rapidly become way too complex for anyone to ever comprehend, and the largest business would probably be something like 200 people.

And, if perfectly managed, lower level computer "infrastructure", like plumbing or electrical wiring, should be completely invisible. The thousand upgrades a day, putting in new hardware, swapping out old networks, installing new security patches, upgrading the database or operating system, should all be done "seamlessly" and at most result in a slight slowing down of normal response time.

One downside of this is that it is very easy for the upper levels to mistake perception for reality. The classic problem in preventive maintenance is that, if perfectly done, all problems are seen coming in advance, headed off, and so "nothing ever breaks" -- and consequently upper management, at the next budget crunch, decides they can lay off the maintenance department because, who needs them, nothing ever breaks! So, they do, and only later discover what it was that the department did.

A second downside is that upper management is shielded from details by multiple layers of oversimplified sketches, to the extent that they mistakenly believe that the tasks done by people at the front, or on the bottom, are actually easy, or even trivial. It then seems to follow that the people doing them are really only one step above morons, and that failure to do the tasks must be due not only to incompetence but to bad attitudes, because anyone can see the work is trivial.

Thus we have what I call "wicked-II" (wicked two) problems - where the tasks may be enormously difficult, but from above or outside they appear to be simple or trivial.

The immediate consequence of those misperceptions is that management may decide, in its infinite wisdom, to undertake some new task, or "put in" a new computer system that, from their very limited depth model, should be "easy." First, they seriously lowball the associated work and costs. Then, they interpret reports of trouble from below as being obviously due to incompetence, laziness, or, worse, enemy action that demands instant retaliation: disciplining or firing the idiots who resist. Management says "I don't want to hear about problems! Don't tell me you can't do that!" That directive appears to be successful, as complaints drop to zero, until the whole project finally crashes on the rocks the employees were trying to warn management about when they got fired. Management blames the employees for failing to do what they were told to do. And everyone loses.

This model of operation appears to be the norm, and enormously easy to slip into, even if management is trying hard not to. It is what "safety cultures" and "high reliability organizations" have to try to overcome in order to work.

So, we also expect to find, throughout history, a vague awareness of this type of problem and hard-won advice on how to avoid falling into that same pitfall in the future - advice typically ignored as old wives' tales, so that future generations end up rediscovering it through hard knocks.

In some ways, this is like the brain-body dichotomy, where our conscious selves are able to think deep thoughts, like what movie to go to, and be generally unaware of all the hard work going on in the body below: synthesizing enzymes, digesting food, managing pathogen invasions, etc. It is all too easy, not seeing those details, to take "the body" for granted and neglect or abuse it. And, as with management, complaints can be suppressed and we can continue on deep into fatigue and exhaustion because of higher goals, until some physiological system that was trying to warn us finally collapses. (Recall the old rule of thumb - the time to furl your mainsail is the first time it occurs to you that maybe you should furl your mainsail. Those who forget it as the wind picks up rediscover it after their mast snaps or the boat overturns.)

Similarly, "upper" levels of society are reminded in all religious literature to "remember the poor" and take care of the powerless "below" them. This advice is often neglected for short-run gain, at the cost of long-run disaster.

Similarly, "upper" structures, such as corporations, can easily forget that their existence depends on the lower-level existence of a healthy workforce and community, and a stable ecology and climate. Again, industry can take actions for short-term gain that undermine workforce health or environmental stability, with catastrophic long-term results. It's very easy to do, and very easy to suppress complaints.

Similarly, "upper" levels of the military, or civilian government, can suppress dissent and ride roughshod over the key needs and observations of their own staff, often without realizing they are doing it. The result is being surrounded by "yes men", being cut off from reality into a fantasy shell, and making terrible mistakes that end up being catastrophic.

The problems listed above are all the "same" problem mathematically. Interlevel communication, and the tradeoff between "invisibility / detail hiding" and the constant needs that have to be met, remain an open problem.

Note: I originally posted this on Nov 21, 2006 on my weblog mawba.blogspot.com,
where mawba = "M.ight A.s W.ell B.e A.live", and it got this comment:
------------------------
Frank said...

Wade:
A truly great analysis. Are you aware of Macintosh, Moffat, Atkinson's works on resilience and networks? They are going to the High Reliability Organizations conference in Deauville next May (http://www.hro2007.org/index.html). I find Atkinson's book in particular quite congruent with your analyses. There is a link on that page where you can download it: http://www.hro2007.org/speakers.html
Frank H. Wilson

5:42 AM

-----------------
