Saturday, June 02, 2007

Zink's Ink Factory - what a mess!


This simple story illustrates a lot of what goes wrong when one task gets spread out over multiple people.
=========
Fred Zink used to make drawing ink and sell it one bottle at a time; business was slow, but that was not a problem.

He made really good ink, and as his business grew, he started running into problems.

He had automated the factory. Ink was made on the second floor, where worker #1 put 3 liters at a time into oversized 5-liter pails. The pails traveled down a conveyor belt and dumped all 3 liters into a funnel, which fed a hose that came out on the lower floor. There, worker #2 quickly moved the hose to fill three 1-liter bottles, moved them out of the way, and set out 3 new empty bottles.

There used to be a shut-off valve downstairs, but the workers would simply close it when they went to lunch or goofed off, so management removed it, and the workers had to pace themselves to keep up with the ink as it arrived.

There were some complaints, but that worked fine. If the worker really needed a break, he'd go to the intercom and call upstairs and stop the whole assembly line until he got back.

Sometimes, a little ink got spilled in the process, or a little too much was put into the first bottle and the third one came out a little short, and customers complained. A quality improvement (QI) person (#3 in the picture) was hired to watch the bottle-filler and try to fix the problem.

So, if ink got spilled in the first bottle, worker #3 would call upstairs and ask them to quickly run over and add a little more ink to the funnel, so that the third bottle wouldn't be short.

But the QI job was boring, so that person quit, and a replacement was hired who didn't know why the job existed, only that if he saw spilled ink, he should call upstairs and tell them.

While that was going on, worker #1 decided he'd heard enough complaints about short ink bottles, so he changed his equipment to put a little more into each big pail, maybe 3.1 liters.

Then the problems started getting out of hand. No matter how carefully worker #2 filled each bottle, some ink would spill. The QI guy would call upstairs and report the spill. Worker #1 would add a little more ink by hand. But he got tired of getting up to add ink to almost every batch, so he changed his machine to put in a little more, now 3.2 liters per pail.

Now, despite that extra ink, the guy downstairs kept calling to report spilled ink, so he moved it up to 3.3 liters. At this point the boss started yelling at the worker below to be more careful and not waste so much ink, and that person quit and another was hired to do job #2.

He couldn't seem to get the job right either, and kept spilling and wasting ink. The boss decided this entire group of people was too dim-witted to manage even the simplest work, so he closed the plant and moved to Florida.

============
Analysis
============

These three workers were caught in a "systems problem" that couldn't be identified by looking at each job separately. After all, each of them did the job as well as they could, and in some cases went beyond it.

But the job of filling ink had been distributed over three people on two floors, and now none of them could see the entire process anymore. Each just had a ritual job to do, blindly.

The killer in this case was adding the intercom. Up to that point, if each person did a good job, it would work. Once the intercom was added, a feedback loop was created between levels, and that destabilized the happy situation and started to make things worse, instead of better.

At that point, nothing that worker #2 could do would work. Every batch was more than he could fit in 3 bottles, and he'd spill some, and be called incompetent by the QI person and the boss.

There was no way he could solve his local problems of filling bottles, because there was now a global problem at a higher level, where too much ink was being put into each pail on the top floor.
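To make the feedback loop concrete, here is a toy simulation. It is only a sketch under made-up assumptions, not a model of any real factory: the bottles downstairs hold exactly 3.0 liters per batch, worker #2 loses a hypothetical fixed amount to ordinary handling, and worker #1 answers every spill report by adding a hypothetical 0.1 liters to each later pail. The function `simulate` and its parameters are invented for illustration.

```python
# Toy model of the ink-factory feedback loop (all numbers are hypothetical).
CAPACITY_L = 3.0  # three 1-liter bottles per batch downstairs
STEP_L = 0.1      # assumed extra ink worker #1 adds per pail after a spill report


def simulate(batches: int, handling_spill: float, feedback: bool,
             start_pail: float = 3.0) -> list[tuple[float, float]]:
    """Return (liters in pail, liters spilled) for each batch."""
    pail = start_pail
    history = []
    for _ in range(batches):
        # Ink beyond bottle capacity cannot be bottled, so it spills,
        # on top of whatever is lost through ordinary handling.
        spilled = max(0.0, pail - CAPACITY_L) + handling_spill
        history.append((round(pail, 2), round(spilled, 2)))
        # The intercom loop: any spill gets reported upstairs, and
        # worker #1 responds by filling every later pail fuller.
        if feedback and spilled > 0:
            pail += STEP_L
    return history


if __name__ == "__main__":
    for pail, spilled in simulate(5, handling_spill=0.05, feedback=True):
        print(f"pail {pail:.2f} L, spilled {spilled:.2f} L")
```

With the loop switched on, the pail climbs 3.0, 3.1, 3.2, ... liters and the spill grows with it, because each "fix" upstairs guarantees a bigger overflow downstairs. With `feedback=False`, the spill stays at the small handling loss. Calling `simulate(4, 0.0, True, start_pail=3.1)` shows the point of the analysis: even a perfectly careful worker #2 spills ink on every batch, because the cause sits upstairs.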

Like M. C. Escher's lithograph "Waterfall," each step seemed OK and each worker seemed well intentioned and competent, but something subtle and structural was wrong.

This is a "systems problem" and needs someone to get the "big picture" and see why the parts don't add up correctly to a well-functioning production process. If "big picture" people are not available, it could help if the 3 workers at least talked at break and lunch to discuss each other's problems. If workers aren't allowed to talk to each other, even that won't work.

You can probably see how, in an even larger process with more steps, everyone could lose sight of the big picture and not realize that their efforts to help are actually hurting. The story also illustrates a common result: various people are blamed, fired, and replaced, and then, when the problems simply won't go away, everyone is fired.

This helps illustrate my point in an earlier post that it is not a very safe strategy to say "We will solve all the small, 1-person problems first, before we look at larger problems involving work teams!" In this case, the problem is at the team level, isn't visible at the individual level, and what is visible at the individual level can't be fixed at that level.

It is remarkably easy for such things to happen, and they tend to be made suddenly worse by a very "small" change, such as adding an intercom, that closes some new loop and destabilizes everything else.

==========
moral of our tale
==========
When problems occur in organizations, there are many possible causes besides "bad workers." Problems of this kind are difficult to analyze, so they tend to persist. Meanwhile, simpler one-person-scale problems tend to get fixed.

Over time, that shifts the mix toward more and more unsolved global-scale problems, which show up as more and more issues that appear to be local and intractable.
The truth is, these problems cannot be solved, ever, if you only look at the local level.
And one way to evolve a global perspective is to let everyone talk at lunch. If the employees in one function work in different buildings, even that won't work well.

Actually, the sooner you start developing wisdom at a global level, the faster the "local" problems will magically disappear. Management doesn't even have to solve the problem; it just has to stop blocking the workers from talking and sharing stories, which is often enough on its own.

This turns out to be a larger problem than is generally realized in places like hospitals, which are broken into specialty clinics. The Toyota "Lean" production model has the entire production line stop when a worker can't do his job correctly, which focuses everyone in the whole company on what might be contributing to that problem. But in a hospital, it's not possible to "stop the line" and put patients into suspended animation while you figure things out.

So some alternative way to bring larger and larger fractions of people together in massively parallel ways needs to be found, or such problems will simply crash through the "lean" safety net, as a post in the last day or so illustrated. If people are so busy blaming each other that they don't even want to talk, the scenario is even worse.

And if people are so busy fighting small problems that seem intractable that they never move on to consider "larger ones," the trap is complete.

The only way out is to have someone look at large-scale problems, or to let everyone talk so that the equivalent effect emerges.

2 comments:

Wade said...

A few afterthoughts.
1) Life would also be better without the quality improvement person "helping". This illustrates principles of "unintended effects" and John Gall's rule "The item most likely to be causing a problem is the last thing you put in to prevent problems from forming."

I've found that guideline to be amazingly helpful at telling me where to look first when things go wrong. In fact, I came up with a clever last-minute idea on Friday that made a new problem while it solved the one I was focusing on.
Same principles. Same result.

2) I totally forgot to make the whole point I set out to make - namely, that people think that "BIG" problems are simply the sum of a lot of "LITTLE" problems. Therefore, if you solve the little problems you'll be done.

This is SOooooo NOT TRUE.

That reasoning neglects two really important things.

A) While the efforts of humans may not add up to more than the sum of the parts, problems in systems almost always seem, perversely, to add up to worse than the sum of the individual badness. They are "synergic in reverse," or "anti-synergic." (Term?)

B) If that weren't enough, there's a second effect: persistent "emergent effects" really do seem to enjoy existing, and effectively take on a life of their own and dig in for the long haul, modifying their environment to support themselves. Then, when you remove all the small problems (if you could), the BIG problem is still lurking up there, now self-sustaining and no longer NEEDING the little problems to hold it up.

(This is the same way a company can take on a life of its own, and then turn around and fire the woman who created it, and yet keep on going.)

The emergent problem has, in some sense, been "RADIATED" by the motion of all the little problems, and is now traveling on its own, like a "ring vortex" shot from a "ring vortex cannon." (I'll dig up the reference. Great fun to build one of those.)

Anyway, like water or radio waves, once the thing has been RADIATED, it has left the antenna, (Elvis has left the building), and it's not coming back, even if you shut off the power to the transmitter.

Worse, if you did succeed in removing the little problems that generated the big one, you would also lose your trail of where the big one ever came from in the first place.

Since these things are so hard to conceptualize and "see", they have a survival advantage and can persist essentially forever (a "soliton"), so these kinds of problems just keep on increasing in number and interactive complexity.

It's like the "spirit" of the organization "goes bad" or something, because it's just not clear what is deteriorating and where or how.

Like "bigotry" or "hatred" or "stereotypes", many of these bad things become feeders, tending to cause behavior that nurtures them, and thereby persist. Those three are perceptual blinders, altering vision, so even if nothing is going on to support them, SOMETHING will be perceived as supporting them.

If everyone could stop that emotion for even one second simultaneously, they'd go away, but that is not a natural activity.

The process of getting such racism or bigotry "solitons" (self-sustaining waves) to die out is complex, and is helped a lot if even one side is forgiving and doesn't keep retaliating for retaliation. (Otherwise, the negative energy keeps being passed back and forth and growing, as the Israelis and Palestinian Arabs seem to do, each side self-righteously retaliating for the last thing the other side did to retaliate for ... etc. From either side, they appear to be getting attacked out of the blue for nothing.)

In large organizations, waves of animosity and bitterness can linger for years or decades or probably centuries, long after the original incident that triggered the wave is gone.

Identifying and rooting these out is a hard process. Prejudice tends to build a nest of similar-minded people who reinforce one another in the stereotype, so it cannot be addressed one person at a time, a lot like smoking. It has to be addressed a whole population at a time.

And since each side sees itself as righteous and the other side as the personification of evil, it's hard to get the process rolling.

That's the challenge for "peace studies": finding ways to overcome these persistent waves of negative energy in organizational substrates, waves that have become detached from their original radiating source and are now free-floating, self-feeding, self-repairing, self-extending, and terraforming the region around them to be more supportive of them.

Wade said...

Another thought on these persistent problems that survive long after the generator has left the scene...

Maybe that's an apt description of the structural confusion that builds up on many computer operating systems, such as Windows, and that requires the box to be restarted (rebooted) to get rid of it.

In fact, many help desks have given up diagnosing problems and simply "reimage" a machine that doesn't work, because these residual problems can be so complicated.

Maybe the only reasonable design pattern is to have the micro-scale units expire, or laws sunset, on a regular basis and be replaced by fresh ones, because it's just not possible to undo the damage, and the damage accumulates over time.