Monday, November 15, 2010

Supercomputer - bah humbug!

The latest news on the supercomputer front is that the Chinese have passed the Americans with a "faster machine".

Tianhe-1, meaning Milky Way, achieved a computing speed of 2,570 trillion calculations per second, earning it the number one spot in the Top 500 (www.top500.org) survey of supercomputers.
The Jaguar computer at a US government facility in Tennessee, which had held the top spot, was ranked second with a speed of 1,750 trillion calculations per second.
Well, I'd like to suggest that the benchmark is pretty bogus and, like most of the rest of our educational system, is measuring the wrong thing.

There are problems that no speed of computation along those lines is sufficient to solve. The reason is that the solution requires leaping out of the frame of mind in which the problem was posed, and reframing it in order to solve either it or a better problem.

What I'm looking for is a "computer" that gets part way into a problem,  goes "AHA!", has a brilliant flash of insight, realizes it's approaching the problem from the entirely wrong direction, reframes the problem, and then solves it.

And I'm also suggesting that 2,000 trillion calculations per second should be way more than enough to support this pivot in direction.    Stop aiming at MORE OF THE SAME and start aiming our resources at better use of what we have already.

Going for more "supercomputer speed" is like taking a car capable of going 700 miles per hour and trying to get it to go 800 miles per hour.   My point is ... so what?   It's useless to me at either speed.

This is not just complaining -- it's my method of going "AHA! You guys are solving the wrong problem!"  Because I'm not a supercomputer, I can do that.

Here's the key.  99.999% of getting a useful supercomputer means solving the operating system problem of how to get the individual sub-computers,  now close to a million of them,  to work with each other in tackling different parts of the problem in such a way that their tiny efforts add up and amount to anything. (sound familiar?)
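Just to make that coordination problem concrete, here is a minimal Python sketch -- nothing to do with any real supercomputer scheduler; the task, chunk size, and worker count are invented for illustration -- of splitting one big job across workers so that their tiny efforts actually add up to one answer:

from multiprocessing import Pool

def partial_work(chunk):
    # each "sub-computer" handles one small piece of the problem
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i:i + 10_000] for i in range(0, len(data), 10_000)]
    with Pool(processes=8) as pool:
        pieces = pool.map(partial_work, chunks)   # farm the pieces out
    total = sum(pieces)                           # ...and make the tiny efforts add up
    print(total)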

The Chinese got the parts to talk TO each other faster.  Well, good, as far as it goes. But we've gotten past sending surface mail, as it were, between the sub-computers,  and are past sending email between them, and are up to instant-messaging and screen-sharing between them (you can see what I see.).

Now, time to stop increasing the SPEED of the communication, and start increasing the VALUE of the communication.  Instead of sending the same old messages, but faster,  we need to reconsider the whole point of WHY we are sending messages in the first place.

On a theoretical basis, we also need to stop obsessing about sending symbol-string linear messages between the parts, and start sending multidimensional IMAGES between the parts. These may be "Turing equivalent" in the fantasy world where we have infinite time and messages that don't degenerate with time, but in the real world we have finite time and noisy messages, so it DOES matter which mode we use.

For one thing, if we use images to communicate, we can get vastly more robust against noise. You can take a picture of anything and toss in "salt and pepper noise", changing random bits to black or white, for a long time before the image itself is actually degraded and no longer conveys what it was supposed to picture.
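If you want to see that claim in action, here is a small Python sketch. The toy image, the 20% corruption rate, and the median filter are my own illustrative choices, not anything from the supercomputer world:

import numpy as np
from scipy.ndimage import median_filter

rng = np.random.default_rng(0)
img = np.zeros((100, 100), dtype=np.uint8)
img[30:70, 30:70] = 200                          # the "content": a bright square

noisy = img.copy()
mask = rng.random(img.shape) < 0.20              # corrupt 20% of the pixels
noisy[mask] = rng.choice(np.array([0, 255], dtype=np.uint8), size=int(mask.sum()))

cleaned = median_filter(noisy, size=3)           # cheap local "consensus" among neighboring pixels
err = np.mean(np.abs(cleaned.astype(int) - img.astype(int)))
print(f"mean error after cleanup: {err:.1f} gray levels out of 255")

Even with a fifth of the pixels trashed, the square is still plainly there, which is the whole point.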

What THAT, in turn, means, is that it's OK if some of the pixels are wrong. It's OK if some of the computing components flake out midway through the computation and start generating garbage output. That's a GOOD THING. (Attached is a picture of my uncle Roger Schuette with his early computer, which he said tended to need to be shut down to replace a bad component about every 3 minutes. It only had a thousand vacuum tubes. Once you get to a million subcomputers, this reliability tends to be an issue again.)

Still, the key point is to get past processing CONTENT and start processing CONTEXT. What needs to be recomputed is the REFERENCE FRAME, not the data in the frame. This is a classic technique physicists and mathematicians use. They ask, in what reference frame would the problem I'm attempting to solve become TRIVIAL (or linear)? Then they solve (using Green's functions or something like that) for the reference frame to USE, and THEN, having done that, return to addressing "the problem" which, when phrased in those terms (in that basis set), is now "obvious" and "trivial" (linear). In some cases, this is solving for the eigenvalues and eigenvectors, if those terms are familiar.
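For anyone who would rather see it than read it, here is a tiny numpy illustration of the frame-change idea (the 2x2 matrix and right-hand side are just made-up numbers): a coupled system A x = b looks tangled in its original coordinates, but in the eigenvector basis each coordinate decouples and the "hard" problem becomes a couple of one-line divisions.

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])              # the coupled problem: solve A x = b
b = np.array([1.0, 2.0])

vals, vecs = np.linalg.eigh(A)          # find the better reference frame (the eigenvectors)
b_rot = vecs.T @ b                      # express b in that frame
x_rot = b_rot / vals                    # in that frame, each component is trivial
x = vecs @ x_rot                        # translate the answer back to the original frame

print(x, np.linalg.solve(A, b))         # same answer, two very different framings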

The point is that most of the computational energy has to be spent, and should be spent, mulling over how the problem has been posed, refusing to accept that the accidental framework in which this appears to be a problem is the BEST possible framework to think about it within, and searching for an astoundingly BETTER vantage point and framework, within which the "problem" essentially "goes away." In some ways it's searching for a better algorithm, but the idea of a better handle on the PROBLEM is stronger than, and different from, a better algorithm for attacking the same old problem.

Anyway, here's my thinking on it, which is a little silly, since no one is listening, but that's never stopped me before. Because we don't know in advance WHAT the problem REALLY is, or how it REALLY should be phrased, parsed, and decomposed into sub-tasks, we don't really know in advance which subcomputer should be doing what. We have to assume that, midcourse, a given subcomputer may decide to retask itself entirely, change what it is working on, and change which other subcomputer clusters it is therefore speaking to, and about what, as they spontaneously reassemble around a new approach and see whether THAT new agile idea lets them stretch out to span the space covered by the problem, with each subcomputer cluster attempting to span the smallest part of the problem that, in the current reference frame, is closest to it and easiest to get to from here.
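A very crude way to picture "don't fix the assignment in advance" in code: instead of pre-assigning sub-problems to workers, let every worker pull whatever piece is available next. This Python sketch is purely illustrative -- the "work" is a stand-in, and real mid-course re-tasking would be far richer than a shared queue:

import queue
import threading

tasks = queue.Queue()
for piece in range(20):                  # twenty stand-in sub-problems
    tasks.put(piece)

results = []
lock = threading.Lock()

def worker(name):
    while True:
        try:
            piece = tasks.get_nowait()   # grab whatever piece is nearest/available
        except queue.Empty:
            return                       # nothing left: this worker stands down
        value = piece * piece            # stand-in for the real sub-problem work
        with lock:
            results.append((name, piece, value))

threads = [threading.Thread(target=worker, args=(f"w{i}",)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results), "pieces done, with the assignment decided on the fly")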

In other words, the computer components have to become like a liquid, or one of those neat things with a grid of wires that can take on the shape of whatever it is pressed against. (picture?) It has to FLOW, like a ball of tiny spheres (new computerized robotic "hand"), attempting to get all the way around, and get a "grip on" the problem, before it can THEN tighten up its joints and DRAG the parts of the problem around to some better position. The current approach is more like trying to figure out which orientation of each of the joints of a computerized arm and hand will let the hand grasp the problem. Typically NONE of the orientation combinations works well, and there is a literal combinatorial explosion of ways in which each joint might twist to get the hand around all the obstacles to the right point, so it becomes computationally unbounded, requiring a "supercomputer" with superspeed to attempt to "brute-force" the way through to a solution. That's the HARD way to solve the problem.
FORGET the robotic arm, in this metaphor.   Let the computer REINVENT the concept of "arm" instead of trying to IMPOSE an arm and ask the computer how to MOVE THAT ARM to the right place. 
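The back-of-the-envelope arithmetic behind that combinatorial explosion is easy to run yourself; the joint counts and angle resolutions below are just illustrative numbers, not specs of any real arm:

for joints in (3, 6, 10):
    for angles in (10, 100):
        print(f"{joints} joints x {angles} candidate angles each -> "
              f"{angles ** joints:,} arm configurations to test")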

In many ways this requires "swarm" intelligence, because what each computer is doing now becomes a function of what the other computers "near it" that it SEES are doing. (The swarms of different types of birds may intersect and pass through each other and operate in the same part of the sky. Fun to watch.)
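A boids-style flocking rule is the textbook example of that kind of swarm behavior, so here is a minimal Python version. The field size, vision radius, and steering weights are arbitrary choices of mine, not a claim about how the subcomputers should actually be wired:

import numpy as np

rng = np.random.default_rng(1)
pos = rng.random((50, 2)) * 10.0                 # 50 agents scattered on a 10x10 field
vel = rng.normal(0, 0.1, (50, 2))

for step in range(100):
    for i in range(len(pos)):
        d = np.linalg.norm(pos - pos[i], axis=1)
        near = (d < 2.0) & (d > 0)               # only the agents this one can SEE
        if near.any():
            vel[i] += 0.05 * (vel[near].mean(axis=0) - vel[i])   # match the neighbors' heading
            vel[i] += 0.01 * (pos[near].mean(axis=0) - pos[i])   # drift toward the local group
    pos += vel

print("spread of the flock after 100 steps:", pos.std(axis=0))

Each agent consults only its neighbors, yet the group as a whole pulls together -- which is the "unity above diversity" point of the next paragraph.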

 And now we get back to our standard question of achieving "unity above diversity" -- letting the parts each go their own way and still having an overarching emergent consequence of that, as well as an overarching authoritative pushing and gentle nudging of that, in a highly iterated loop,  forming a sort of slime-mold metabeing that can reach out on a large scale and engulf and subsume the problem.

======


Side bar on Uncle Roger and early computer work at Barber Coleman in 1952.
From  invention.smithsonian.org/downloads/fa_cohc_abstracts_s-z.pdf
Computer Oral History Collection, 1969-1973, 1977
Interviewee: Roger E. Schuette
Interviewer: Henry Tropp
Date: June 20, 1972
Repository: Archives Center, National Museum of American History
Description: Transcript, 87 pp.
Abstract:
Schuette began working at Barber-Colman in 1939 and soon was involved in anti-aircraft and anti-tank projects, including design of a calculating device for a rangefinder made by Eastman Kodak. During World War II, Duncan Stewart of Barber-Colman met George Stibitz while working for the National Defense Research Committee. Stewart decided his company should build a commercial computer. By 1946, Stibitz had designed a small scale serial computer with error checking circuits. The first model was completed in 1948 and a second, much superior version, the Simple Electronic Digital Computer (SEDC), in 1955. Schuette worked on the memory and an input device for the SEDC. Others introduced a compiler and improved programming. This machine was used at Barber-Colman for a few months, but the company decided not to invest in either manufacturing facilities or a national service network and did not market it. They did sell patent rights to its system for arithmetic calculations and error checking.
[Comment - I think they sold the patents to a company called International Business Machines, which DID decide to go ahead and try to build a commercial version of this bug-laden computational thingie... :)  ]

Picture of the Barber-Colman Computer with uncle Roger at the helm
http://newbricks.blogspot.com/2007/04/capstone-slide-6.html


Uncle Roger was instrumental in my 6th grade class getting a kit for an analog computer which could play tic-tac-toe, and, surprise, I was selected as the student to assemble it. So I got to build my very first computer in 1957, and it definitely got under the skin and into my blood! The thing had a front panel with six or nine (I forget) wooden disks that could be rotated to indicate which move the person had selected at the corresponding step, and some version of lights that indicated the next move "the computer" was making. Underneath was a maze of wires connecting the parts.

It worked way better and longer than my first Heathkit radio transmitter, a 150-watt job that took several months to assemble, and which I decided to turn on "just to see it light up" before I bothered to complete the antenna. Well, minus the load of the antenna, the power stage went ballistic and exploded, thus ending in 2 seconds THAT effort to communicate with the world!

Anyway, the analogy to corporate life is pretty good, in terms of supercomputer issues. What they are trying to do is find ever faster and "more efficient" ways for employees to carry out the directives of the boss, and move the fixed robotic arm around to grapple with a continuing flux of new problems. This turns out to be HARD, and so employees are driven harder and harder to be more "productive".

At the same time,  what the employees want to do, and should be able to do, is get OUTSIDE the box,  dissolve the robot arm, RECONCEPTUALIZE the problem the company faces, and SOLVE THAT, then bring that back into the original framework and see what it works out to.   Then we are talking about measuring EFFECTIVENESS not EFFICIENCY.   Then we will see quantum leaps in "productivity" and the bottom line.

As things stand now, adding employees or computers to problem-solving teams only MAKES THINGS WORSE, because the number of communication pathways needed to get everyone "working on the same page" grows roughly as the square of the head count (N(N-1)/2 pairwise links). The problem with that method of thinking is that there IS a "page" already predefined that the agents need to be brought TO. So we have the wisdom that "adding staff to a project that is late will only make it later."
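The arithmetic is quick to check for yourself; the team sizes below are picked arbitrarily:

def links(n):
    return n * (n - 1) // 2              # pairwise communication links in a team of n

for n in (5, 10, 20, 50, 100):
    print(f"{n:>3} people -> {links(n):>5} pairwise links to keep in sync")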

Clearly, we are on the "wrong side of the power curve" here. We need a better approach or distributed problem-solving algorithm, one with the property THAT adding more people INCREASES the power of the group to solve problems. Clearly, and admittedly, the "STAR architecture", where all communications have to go through a central point, suffers from congestion issues and fails this test -- not because adding new agents is a bad idea, but because having them go through "the boss" to get anything done is a BAD CONSTRAINT on a good idea.

This is reminiscent of the fundamental problems in the design of the virtual reality world Second Life, shared by most of the alternatives, or all of them at this point -- namely, that the focus is on the ENTITIES, not the RELATIONSHIPS, and the focus is on people (as problems) moving across solution space (FIXED grids of computers) instead of letting the COMPUTERS reshuffle themselves to adapt to the problem of the moment.

So we have the silly and dysfunctional result that 100 people can each own a simulated island (sim), and operate just fine, but if they all want to talk to each other and congregate on one sim, the poor sim-server trying to do all the work is overloaded, while 99 sim-servers sit around bored. There is a strong emphasis in that architecture on LIMITING attendance at meetings, or, if you come, you have to leave all your fun stuff at home.

Consider instead an architecture where the computational power flows WITH the avatar, and can be pooled, so that the more people who show up at a meeting, the MORE CPU power there is to do fun stuff with! 100 attendees means 100 servers are all focused on making that one virtual meeting space fully empowered to do neat stuff!
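To make the contrast concrete, here is a toy Python comparison of the two architectures. The per-server capacity and the attendance numbers are invented purely for illustration, not anything measured from Second Life:

PER_SERVER_CAPACITY = 40                 # avatars one server handles comfortably (invented figure)

for attendees in (10, 40, 100, 400):
    fixed_load = attendees / PER_SERVER_CAPACITY                 # one sim-server hosts everyone
    pooled_load = attendees / (attendees * PER_SERVER_CAPACITY)  # each avatar brings a server along
    print(f"{attendees:>3} attendees: fixed-sim load {fixed_load:5.2f}x capacity, "
          f"pooled load {pooled_load:5.3f}x capacity")

In the fixed-sim column the load climbs right past 1.0x and falls over; in the pooled column it stays flat no matter how many people show up.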

This is the basic transition we need to make, the corner we need to turn. We don't need "faster supercomputers"; we need wiser supercomputers. We don't need "brighter employees"; we need management to open up the constraints and let the employees BE collectively wise, PRECISELY by coloring outside the lines, thinking outside the box, stepping outside the preconceived constraints, and doing things THEIR way, not the "boss's way". We don't need Linden Labs to take over the entire power of the Hoover Dam to run even more powerful servers -- we only need to let the servers flex so they can flow to the problem, not sit and wait for the problem to flow to them.

We have the resources we need. We don't need more speed. We just need to reconceptualize our problem.
