Thursday, July 05, 2007

Why are so many flights delayed?




Although my flight home from Baltimore made it, my flight there was canceled, and on the way back the two flights at the gates adjacent to mine were canceled.

Northwest Airlines, with a hub in Detroit, seems to have led the pack, with 14% of its flights canceled two weeks ago, stranding over 100,000 passengers.

An article in this morning's paper confirms that it's getting worse. It also notes something I realize I should have seen myself, since it's one of those scale-dependent thingies: the delays counted by airplane are nothing compared with the delays experienced by passengers. The airline calls it a 1-hour delay, but it causes a missed connection and an overnight stay, or even longer, waiting to get re-booked, because all the other flights are already full too, and you're not the only one who got bumped.

Here are some numbers from the Times:

Ugly Airline Math: Planes late, fliers even later
New York Times
Jeff Bailey and Nate Schweber
July 5, 2007

As anyone who has flown recently can probably tell you, delays are getting worse this year. The on-time performance of airlines has reached an all-time low, but even the official numbers do not begin to capture the severity of the problem.

That is because these statistics track how late airplanes are, not how late passengers are. The longest delays — those resulting from missed connections and canceled flights — involve sitting around for hours or even days in airports and hotels and do not officially get counted.

Researchers at the Massachusetts Institute of Technology ... determined that as planes become more crowded — and jets have never been as jammed as they are today — the delays grow much longer because it becomes harder to find a seat on a later flight.

But with domestic flights running 85 to 90 percent full, meaning that virtually all planes on desirable routes are full, Cynthia Barnhart, an M.I.T. professor who studies transportation systems, has a pretty good idea of what the new research will show when it is completed this fall: “There will be severe increases in delays,” she said.

Over all, this could be a dreadful summer to fly. In the first five months of 2007, more than a quarter of all flights within the United States arrived at least 15 minutes late. And more of those flights were delayed for long stretches, an average of 39 percent longer than a year earlier.

Moreover, in addition to crowded flights, the usual disruptive summer thunderstorms and an overtaxed air traffic control system, travelers could encounter some very grumpy airline employees; after taking big pay cuts and watching airline executives reap some big bonuses, many workers are fed up.

If a flight taxies out, sits for hours, and then taxies back in and is canceled, the delay is not recorded. Likewise, flights diverted to cities other than their destination are not figured into delay statistics.

About 30 percent to 35 percent of Continental’s passengers make connections between flights.

A spokeswoman, .. added that many delays are caused by weather and thus do not reflect the airline’s performance.

...That is a typical level of missed connections, but Continental’s flights that day were 89.6 percent full, so finding seats on later flights for some passengers was difficult.

Continental also has a new system that sends e-mail messages — and, beginning next month, text messages to cellphones — informing connecting passengers on late flights how they have been re-booked.

It also is moving ticket kiosks inside the security area so passengers can print new boarding passes without going out to the main ticketing area or having to wait in line for a gate agent to help them.

The system, however, re-books people on the next available flight with a confirmed open seat and that is not always as soon as people might expect. Some are told their new departure is in three days.

“That causes them to go berserk,” said David Grizzle, a senior vice president at Continental. Often, on standby, people get out sooner, he said.
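To get a feel for why a nearly full system turns one cancellation into days of waiting, here is a back-of-the-envelope sketch in Python. It is my own toy arithmetic, not anything from the Times article or from Continental: it assumes every flight on the route has the same number of seats (a hypothetical 150), that the canceled flight and every later flight were booked at the same load factor, and that nobody else is competing for the open seats. The function name `later_flights_needed` is made up for this illustration.

```python
def later_flights_needed(seats, load_factor):
    """Count how many later flights it takes to absorb one canceled flight.

    Toy assumptions (not from the article): identical aircraft, the same
    load factor on every flight, and no other stranded passengers.
    """
    booked = round(seats * load_factor)   # passengers stranded by the cancellation
    free = seats - booked                 # open seats on each later flight
    if free <= 0:
        raise ValueError("later flights have no open seats at all")
    return -(-booked // free)             # ceiling division on whole passengers


# Hypothetical 150-seat flights at several load factors.
for load in (0.50, 0.70, 0.85, 0.896):
    print(f"{load:.1%} full -> {later_flights_needed(150, load)} later flights")
```

Under these assumptions, a route running half full clears a canceled flight with the very next departure, while at the 89.6 percent figure quoted above it takes 9 later departures. On a route with three flights a day, that is where "your new departure is in three days" comes from.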


I also noticed that Northwest Airlines had attempted to solve this problem by institutionalizing the response. They now had entire special carts to make it easier for large numbers of passengers to attempt to make new bookings faster.



From the point of view of "lean" practices, and the Toyota Production System, this represents one of the worst wastes possible: trying to become more efficient at doing work that shouldn't even be done in the first place. The risk is that the "workaround" will partly work, and then dig in for the long haul and become part of the new "normal" process, replicated 500 times in other places. New vendors will spring up to build even "more convenient" re-booking carts, and to lobby for sustaining this practice.



What might be done instead?

The first thing is to identify what the problem is. The problem is not thunderstorms or a feud between the traffic controllers and the FAA, although those contribute. The problem here is one of those pesky physical laws that I've been writing about, and what "the Yarn Harlot" pointed out as man's persistent desire to make "ten less than nine" and the delusion that maybe it just hasn't been rotated the right way yet and somehow this will "fit."

The law in question is called "Little's Law", and it looks innocent enough. Strictly speaking, it only ties together throughput, work in process, and cycle time, but combined with the way queues behave it has a nasty consequence: for any system, the "cycle time" to process one unit (or passenger) goes up towards infinity as the system becomes full, and goes up much faster if there is more variability in the processing time for any individual step.

I can't easily find an authoritative textbook online, but here's the key info from a wafer fabrication newsletter, "FabTime". (The same law applies to semiconductors as to passengers. WIP = Work in process.)

The relationship between cycle time and WIP was first documented in 1961 by J. D. C. Little. Little’s Law states that at a given throughput level, the ratio of WIP to cycle time equals throughput, as shown in the formulas below:

Throughput = WIP / Cycle Time

In other words, for a factory with constant throughput, WIP and cycle time are proportional. Keep in mind that Little’s Law doesn’t say that WIP and cycle time are independent of start rate. Little’s Law just says if you have two of these three numbers, you should be able to solve for the remaining one. The tricky part is that cycle time and WIP are really functions of the start rate.

Oh, and that tricky part is where the devil hides in the details. What this really says is that as you try to jam more and more stuff through the same process, as it fills up the process starts to run into conflict and congestion costs, and the actual throughput starts declining rapidly, while "work in process" (passengers waiting for a flight) climbs towards the sky.
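For anyone who wants to play with the formula, here is a minimal Python sketch of the "have two, solve for the third" idea. The function name `solve_littles_law` and the passenger numbers are my own hypothetical illustration, not anything from FabTime.

```python
def solve_littles_law(*, throughput=None, wip=None, cycle_time=None):
    """Given exactly two of the three quantities, return the missing one.

    Little's Law: throughput = wip / cycle_time.
    """
    given = [v is not None for v in (throughput, wip, cycle_time)]
    if sum(given) != 2:
        raise ValueError("provide exactly two of throughput, wip, cycle_time")
    if throughput is None:
        return wip / cycle_time
    if wip is None:
        return throughput * cycle_time
    return wip / throughput


# Hypothetical numbers: a terminal moving 100 passengers per hour, where each
# passenger spends 3 hours from arrival to departure, holds 300 passengers.
print(solve_littles_law(throughput=100, cycle_time=3))

# If the same 100 per hour are still getting out but 900 people are in the
# building, each of them is now spending 9 hours there.
print(solve_littles_law(throughput=100, wip=900))
```

Note that the formula alone never tells you why the building filled up. That part comes from how cycle time and WIP respond to the start rate.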

FabTime's tutorial shows a graph of this effect.

What this shows is that the "cycle time" expected for a unit in this system (a passenger) to be processed (get home) doesn't just go up, it goes up rapidly, to multiples of the time it would take on an uncrowded system. So, for a very consistent, low-variability process (the blue line), trying to operate at about 90% of full capacity will cause the process time to be six times the time it would take at 10% full. If the process has more variability (thunderstorms), this knee can be reached much sooner, at about 65% capacity.
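The shape of that curve can be sketched with a standard textbook approximation for a single-server queue (Kingman's formula), in which the waiting time is roughly variability x u / (1 - u) x processing time. To be clear, this is not how FabTime drew its graph, as far as I know, so the numbers below won't match their lines exactly. The variability values 0.5, 1.0, and 2.0 are hypothetical stand-ins for "consistent", "typical", and "thunderstorm season".

```python
def cycle_time_multiple(utilization, variability=1.0):
    """Cycle time as a multiple of the bare processing time.

    Uses Kingman's approximation for a single-server queue:
        waiting time ~= variability * u / (1 - u) * processing time
    where variability stands for (ca^2 + cs^2) / 2, the average of the
    squared coefficients of variation of arrivals and of service.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be at least 0 and below 1")
    return 1.0 + variability * utilization / (1.0 - utilization)


# Print the multiple for a few fill levels and three hypothetical
# variability settings (low, medium, high).
print("full   low   med   high")
for u in (0.10, 0.50, 0.65, 0.85, 0.90, 0.95):
    row = [cycle_time_multiple(u, v) for v in (0.5, 1.0, 2.0)]
    print(f"{u:4.0%}  " + "  ".join(f"{x:5.1f}" for x in row))
```

With variability set to 1.0, the multiple is 2 at 50% full and about 10 at 90% full, and it heads for infinity as the system approaches 100%. Turning the variability up moves the whole curve upward, so the painful part arrives at a lower fill level.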

This is as true for service work and management work as for producing widgets or silicon wafers. Past a certain point, trying to shove more work through the system only slows down everything. So, the right thing to do is to find the sweet spot where the most work actually gets done, and resist the temptation to now try to fill every open space with more work. For wafer fabrication, this is about 85% "full". In other words, at the maximum throughput, 15% of the system will be empty, just "sitting there". This drives management crazy.

What typically happens is that people don't believe this result, even if they know it. (The delusion factor is strong, and surely 10 can be made less than 9.) None of the outside stakeholders or visiting brass from the parent company understands this law, and to them a piece of idle equipment is surely a mistake and needs to be doing work! Or so it seems.

So, now, our friend, the feedback loop, comes into play. Once this knee in the curve is passed, and output starts to slow down due to congestion, the typical response of management is to go ballistic and push harder, trying to jam even more work through the system. This slows the system down more, which leads to management pushing even harder and starting even more work in process.

Then psychosocial factors come into play. Management becomes convinced that the employees must be goofing off, and become irate. "Surely that is true, because the total throughput is going down!" they think. Meanwhile, the swamped employees, seeing more in their in-boxes than ever and becoming exhausted trying to deal with all the internal delays at getting the simplest thing done, also become testy and hostile.

Meetings are held to discuss why so little is getting done, which takes more time, further slowing down the process. Labor strikes. Management retaliates, further cutting production and sales and revenue, which makes stockholders even more desperate to make up the losses with even more bookings. We end up with a positive feedback loop that rises until something breaks.

That's where things appear to be today.
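The "push harder" loop can be sketched as a toy simulation in Python. Everything in it is a made-up assumption for illustration: a capacity of 100 units per period, a congestion knee at 300 units of WIP, a three-period storm that shuts down output, and a management rule that raises the start rate by 10% whenever output falls short of target. It is not a calibrated model of any airline.

```python
from dataclasses import dataclass


@dataclass
class Period:
    starts: float   # work started this period
    wip: float      # work left in process at the end of the period
    output: float   # work finished this period


def effective_capacity(wip, capacity, knee):
    """Full capacity up to the knee, then shrinking as congestion builds."""
    return capacity if wip <= knee else capacity * knee / wip


def simulate(push_harder, periods=40, capacity=100.0, knee=300.0,
             base_starts=85.0, target=85.0, push=0.10, storm=(5, 6, 7)):
    """Run the toy system; all default numbers are hypothetical."""
    starts, wip, history = base_starts, 0.0, []
    for t in range(periods):
        wip += starts
        # During the storm nothing gets out at all.
        cap = 0.0 if t in storm else effective_capacity(wip, capacity, knee)
        output = min(wip, cap)
        wip -= output
        history.append(Period(starts, wip, output))
        # Management's reflex: output is short, so start even more work.
        if push_harder and output < target:
            starts *= 1.0 + push
    return history


steady = simulate(push_harder=False)
pushed = simulate(push_harder=True)
print(f"hold steady: final WIP {steady[-1].wip:.0f}, output {steady[-1].output:.0f}")
print(f"push harder: final WIP {pushed[-1].wip:.0f}, output {pushed[-1].output:.0f}")
```

In this sketch the policy that holds its start rate steady works off the storm backlog and returns to normal, while the policy that pushes harder drives WIP past the knee, which cuts output, which triggers more pushing. Same storm, same employees, very different ending.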



If you click on that diagram, you can zoom it up to a readable size.

(That diagram is most of a Causal Loop Diagram, as developed by System Dynamics folks such as those at Worcester Polytechnic Institute, or MIT Professor John Sterman (author of the tome Business Dynamics). It was drawn with Ventana Systems' Vensim software, which could put in numbers and actually run the simulation to see how this unfolds. This sort of reasoning is described by Peter Senge in his book The Fifth Discipline, where he uses an example of a beer production and distribution system to show how things can fall apart even when everyone is doing a good job, as they see it, because of "system factors" and "feedback loops". People interested in that would be interested in the whole System Dynamics Society.)

This occurs in a great many companies today. Unfamiliarity with Little's Law loads the gun, psychosocial factors cock the hammer, and every new thunderstorm or glitch pulls the trigger, as everyone involved (stockholders, management, labor, and passengers) blames everyone else for the problem, which is really a "system problem" not a "bad person" problem.

When this kind of thing happens to any health care delivery system, such as a hospital, it becomes a public health problem. When this kind of thing damages nerves and business effectiveness which leads to more pressure which damages nerves and leads to obesity and heart attacks and layoffs and no health insurance, it becomes a public health problem.

This is the kind of "systems thinking" competency I'm hoping the new ASPH Core MPH Competencies will lead to, so people can see this effect and head it off at the pass.

The biggest single controllable step here is to lower the blame factor, and realize we're all in this together. Myths and delusions and a norm that management's job is to crack a whip and push harder and harder come into play in a bad way when it is the system that is slowing down, not the employees. (Take it up with God, I guess, if you don't like Little's Law.)

Going around and around this loop is one of the major factors winding us all too tight these days, both humans and corporations. Maybe understanding what we've run up against can help defuse it and lead us back to a saner world for everyone.

At least, that's what Public Health hopes, in my view.

Wade
