
An FPS Forerunner: The Atanasoff-Berry Computer

John L. Gustafson

Question: What computer is specially designed for solving large sets of linear equations, has 15-decimal precision, performs linear algebra operations with parallel multiply-adders, is targeted at solving problems in structural analysis and quantum theory, and was built in 1940?

With the exception of the last qualifier, the computer described above could be the FPS-164/MAX scientific computer. In fact, the description is that of the very first electronic digital computer, built by Professor John Atanasoff and his graduate student Clifford Berry at Iowa State University during the late 1930s. The two machines were conceived almost exactly 50 years apart (1933 and 1982), and the FPS-164/MAX is roughly 90 million times faster than the A-B Computer; the parallels between them are nonetheless remarkable.

History often repeats itself in social and economic affairs, but one tends to think of technology as progressive rather than repetitive. There are actually some very good reasons why the fastest machines now being built echo the concepts used in the earliest computers. The inventors of the 1930s and 1940s faced performance barriers caused by the limits of available technology, whereas today's designers face performance barriers caused by the laws of physics. Many old techniques are now being reinvented to circumvent those limits, with Atanasoff's machine providing an excellent case in point.

The prototype of the A-B Computer is described in Atanasoff's paper [1]. The A-B computer was conceived around 1933 and prototyped in late 1940 for a total cost of $1,460. It made several astonishing breakthroughs, and is now credited with being the first computer that was both digital (not analog) and electronic (not mechanical). It certainly marks the beginning of modern scientific computing, which today includes a range of products, from desktop machines scarcely more powerful than Atanasoff's to special-purpose supercomputers such as the FPS-164/MAX.

 

Applications

The following quotation from Atanasoff's paper could have been taken from FPS literature:

 

Utility. In the treatment of many mathematical problems one requires the solution of systems of linear simultaneous algebraic equations. The occurrence of such systems is especially frequent in the applied fields of statistics, physics and technology. The following list indicates the range of problems in which the solution of systems of linear algebraic equations constitutes an essential part of the mathematical difficulty:
  1. Multiple correlation.
  2. Curve fitting.
  3. Method of least squares.
  4. Vibration problems including the vibrational Raman effect.
  5. Electrical circuit analysis.
  6. Analysis of elastic structures.
  7. Approximate solution of many problems of elasticity.
  8. Approximate solution of problems of quantum mechanics.
  9. Perturbation theories of mechanics, astronomy and the quantum theory.

This list could be expanded very considerably, for linear algebraic systems are found in all applications of mathematics which possess a linear aspect. [2]

Today the major uses of large-scale scientific computers include electronic circuit analysis, structural analysis, and quantum chemistry and physics. As the size of problems increases, the portion of a run which involves matrix algebra becomes disproportionately larger, so there is still the need for special matrix algebra accelerators like those used in the FPS-164/MAX.

 

Architectural Strategy

Atanasoff recognized that linear algebra required both high speed and high precision. The speed requirement led to the use of electronic storage and switching throughout, since the mechanical gearing of existing calculators was hundreds of times slower than even 1930s electronics. The precision requirement ruled out analog methods, which are limited to two or three decimal digits of accuracy. Atanasoff determined that a binary machine would have advantages of speed, efficiency, and simplicity over a decimal machine attempting to imitate grade-school arithmetic. He was the first to recognize the value of "on-off" representations of numbers, which he called "abacus elements." There have been a few attempts at non-binary electronic digital computers, but virtually all machines made in the last two decades have operated in binary at the lowest hardware level. Decimal/binary converters were added to the A-B Computer to facilitate initial input and final output of data: an early "formatted I/O" processor.
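The job those converters did is easy to picture in modern terms. The following sketch (my illustration in C, not Atanasoff's circuit) accepts a decimal string, holds the value internally in binary, and converts it back to decimal for output:

    #include <stdio.h>
    #include <stdlib.h>

    /* A toy decimal/binary converter in the spirit of the A-B Computer's
       formatted I/O: decimal text in, binary representation out, and back
       to decimal again.  (An illustration only.) */
    int main(void)
    {
        const char *decimal_in = "1940";
        unsigned long value = strtoul(decimal_in, NULL, 10); /* decimal -> binary */

        printf("%s in binary: ", decimal_in);
        for (int bit = 15; bit >= 0; bit--)                  /* MSB first */
            putchar(((value >> bit) & 1) ? '1' : '0');
        putchar('\n');

        printf("back to decimal: %lu\n", value);             /* binary -> decimal */
        return 0;
    }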

Atanasoff decided that 50-bit representations of numbers, plus a sign bit, would be sufficient, providing roughly fifteen decimals of accuracy. Half a century later, both FPS and the IEEE committee chose a floating-point arithmetic standard which uses a 52-bit representation, plus a sign bit and an 11-bit binary exponent. (The VLSI parts used in the FPS-164/MAX use "one's complement" arithmetic very similar to that used in the A-B Computer.) Some scientific computers use 48-bit representations of the significand, the significant part of the number. In a very real sense, the A-B Computer was a "double-precision" machine similar to FPS scientific computers.
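Those field widths are easy to verify on any machine with IEEE 64-bit doubles. A minimal sketch (mine, not from the original article), assuming the compiler's double is the IEEE format:

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    /* Unpack the three fields of an IEEE 64-bit double: 1 sign bit,
       11 exponent bits, 52 significand bits.  (A sketch; nearly all
       modern machines use this format for double.) */
    int main(void)
    {
        double x = 1.0;
        uint64_t bits;
        memcpy(&bits, &x, sizeof bits);   /* reinterpret the bit pattern */

        uint64_t sign     = bits >> 63;
        uint64_t exponent = (bits >> 52) & 0x7FF;        /* 11 bits */
        uint64_t fraction = bits & 0xFFFFFFFFFFFFFULL;   /* 52 bits */

        /* For 1.0: sign 0, biased exponent 1023, fraction 0. */
        printf("sign=%llu  exponent=%llu  fraction=0x%013llx\n",
               (unsigned long long)sign,
               (unsigned long long)exponent,
               (unsigned long long)fraction);
        return 0;
    }

The 52-bit significand, with its implicit leading bit, yields the same fifteen or so decimals of accuracy that Atanasoff chose as his target.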

 

Main Memory

The A-B Computer used dynamic storage for its main memory, requiring periodic "refresh" to remind it of its binary state, as do today's dynamic RAM chips. Atanasoff considered relays, magnetic core memory, vacuum tubes, and charged capacitors for storing each bit of memory; he finally settled on charged capacitors, on the basis of cost/performance. His design called for 3200 bits of storage, so cost per bit was by far the deciding factor.

The FPS-164/MAX has a memory capacity of fifteen megawords, where each word has 64 bits plus an eight-bit error-correcting code, for a total of 1,132,462,080 bits of main memory. To this day, dynamic storage is the most cost-effective memory available, using capacitors printed on silicon at a density of 262,144 capacitors per chip (the 256K DRAM).

The A-B Computer organized memory into two banks of 32 words each (two of these words were spares). The banks were actually registers for the arithmetic unit, corresponding, coincidentally, to the two register banks DPX and DPY in the scalar unit of the FPS-164/MAX, which also hold 32 words each. Each bank in the A-B Computer resided in a cylinder the size of a large coffee can.

 

Parallel Arithmetic

The basic operation of the A-B Computer was a parallel multiplication followed by a parallel addition or subtraction. On each clock cycle of one second, the computer could compute 30 simultaneous additions or subtractions, making it the first "vector computer." The idea was to perform Gaussian row elimination by scaling a row of the matrix representing the equations and adding it to another row. A multiplication was performed by repeatedly shifting and adding, requiring fifteen one-second cycles. The peak speed of the A-B Computer was therefore 2 x 30 calculations in 16 seconds, or 3.75 calculations per second.
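The shift-and-add scheme itself is unchanged to this day; it is how simple hardware multipliers are still built. A minimal sketch in C (an illustration of the method, not a simulation of the actual hardware):

    #include <stdio.h>
    #include <stdint.h>

    /* Multiplication by repeated shifting and adding, one multiplier bit
       per "cycle," as in the A-B Computer's fifteen-cycle multiply.
       (A sketch for 16-bit operands.) */
    static uint32_t shift_add_multiply(uint16_t a, uint16_t b)
    {
        uint32_t product = 0;
        for (int cycle = 0; cycle < 16; cycle++) {
            if (b & 1)                            /* low multiplier bit set? */
                product += (uint32_t)a << cycle;  /* add shifted multiplicand */
            b >>= 1;                              /* advance to the next bit */
        }
        return product;
    }

    int main(void)
    {
        printf("%u\n", (unsigned)shift_add_multiply(1933, 1982)); /* 3831206 */
        return 0;
    }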

The FPS-164/MAX can perform up to 341 million calculations per second on 15-digit numbers, where the basic operation is 31 parallel multiplications followed by 31 parallel additions or subtractions. It is designed either for Gaussian row elimination, like the A-B Computer, or for dot products, which can be used even more efficiently to solve systems of equations.

Although the primitive nature of early computer designs is often belittled, the A-B Computer commands respect for the sophistication of its architecture. Its compact arrangement of vacuum tubes reflected an early recognition of the value of replication and simplicity in computer design. The FPS-164/MAX uses very-large-scale integration (VLSI) to replicate high-speed multiplier-adders (which are millions of times faster and somewhat smaller than the one-bit addition/subtraction units in the A-B Computer), but the architecture has changed far less than the switching technology. If the tubes were primitive, the way they were connected clearly was not.

 

Linear Algebra

Let us translate the speed ratio of the two machines into performance on linear algebra problems. The ratio of FPS-164/MAX peak speed to A-B Computer peak speed is 91 million/1. [Note that this corresponds to an improvement of about 50-60% per year since 1940, similar to the current rate of improvement in computer performance and cost/performance.]

Atanasoff estimated that a human, using pencil and paper, would require eight hours to solve eight equations in eight unknowns. Clearly, he assumed calculations of fewer than fifteen digits, since at least ten minutes are required to multiply two fifteen-digit numbers by hand, and solving eight equations involves 232 multiplications in general (the exact count for elimination with back substitution is n^3/3 + n^2 - n/3, which gives 232 for n = 8), requiring closer to 40 hours. This estimate presupposes no errors; any error would necessitate repeating the process from the point at which the mistake was made.

What operations could the A-B Computer perform in eight hours? To solve n equations in n unknowns requires about n^3/3 multiplications and n^3/3 additions in the worst case, where every equation involves every unknown. [In practice, many large problems are "sparse," in that each equation involves only a few of the unknowns, making computation much faster.] At 3.75 calculations per second, the A-B Computer could perform about 54,000 multiplications and 54,000 additions in eight hours, enough in principle for about 50 equations in 50 unknowns; the operation count is verified in the sketch below. In practice, efficiency fell short of peak speed because fewer than 30 simultaneous multiplication-additions generally could be performed, but it appears safe to say that the A-B Computer could solve 30 simultaneous equations in one day.
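The n^3/3 rule of thumb is easy to check by instrumenting a textbook forward elimination. The following sketch (an illustration, not A-B Computer code) counts the multiplications for a dense n-by-n system with one right-hand side:

    #include <stdio.h>

    /* Count the multiplications in dense Gaussian forward elimination,
       confirming the n^3/3 rule of thumb quoted above.  (Illustrative
       sketch; pivoting and back substitution are omitted.) */
    int main(void)
    {
        long n = 35, mults = 0;

        for (long k = 0; k < n - 1; k++)           /* pivot column         */
            for (long i = k + 1; i < n; i++) {     /* each row below pivot */
                mults++;                           /* forming the row multiplier */
                for (long j = k + 1; j <= n; j++)  /* row update, incl. RHS */
                    mults++;
            }

        printf("n = %ld: %ld multiplications (n^3/3 = %ld)\n",
               n, mults, n * n * n / 3);           /* 14875 vs. 14291 */
        return 0;
    }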

By contrast, the FPS-164/MAX can solve about 24,000 equations in eight hours, where each equation involves all 24,000 unknown quantities. Because the work grows as the cube of the number of equations, the performance ratio of 91 million to 1 translates into a problem-size ratio of only about 500 to 1 (roughly the cube root of 91 million).

It is remarkable that the applications for the FPS-164/MAX are so similar to those which Atanasoff envisioned: structural analysis (linear and nonlinear), quantum chemistry, and electromagnetic boundary value problems constitute most of the demand for linear algebra computation today. Circuit analysis is extremely important, but involves many types of computation in addition to solving systems of equations. Of Atanasoff's list, only curve fitting and the method of least squares are specific to small systems of equations; most curve-fitting problems become numerically meaningless when the number of degrees of freedom in the curve grows beyond thirty.

 

Mass Storage

Older computers are associated with mechanical card readers and punched paper tapes for recording the bulk of data being processed. The A-B Computer was considerably ahead of its time in that it used electrical means to write to, and read from, mass storage. The recording medium was cards, but holes were made electrically, using a 5,000-volt spark. The holes could then be read using a much lower voltage, at a speed greater than the mechanical devices of the time, with each number punched or read at 60 bits per second. Here again, parallelism comes into play: 30 bit streams were read and written simultaneously, for an overall I/O rate of 1800 bits per second both in and out. This is faster than the rate at which today's personal computers read cassette tapes, and was fast enough to match the computing rate.

Besides bit parallelism, older computers also possessed functional parallelism, in that numbers could be read, processed and written simultaneously. In modern parlance, "I/O was overlapped with the CPU operation."

The mass storage on the FPS-164/MAX is disk storage, with a capacity of up to sixteen billion numbers. This capacity is sufficient to store a full matrix, representing 126,000 equations (which would take the FPS-164/MAX two months to solve). The hardware and software are designed to overlap I/O and computation, much as Atanasoff's computer did, permitting the solution of much larger problems than would have fit into the main storage of the machine.
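The overlap technique survives essentially unchanged. The double-buffered loop below is a minimal sketch of the idea (my illustration; not the actual FPS-164/MAX system software), with synchronous stubs standing in for asynchronous disk transfers:

    #include <stdio.h>

    #define BLOCK 4

    /* Double-buffered I/O overlap: while block b is being processed,
       block b+1 is already being fetched.  start_read/wait_read stand
       in for asynchronous disk transfers; here they are synchronous
       stubs so the sketch runs anywhere. */
    static void start_read(double *buf, int block)
    {
        for (int i = 0; i < BLOCK; i++)   /* pretend DMA transfer */
            buf[i] = block * BLOCK + i;
    }

    static void wait_read(double *buf)
    {
        (void)buf;                        /* would block until the transfer is done */
    }

    static void process(const double *buf)
    {
        double sum = 0;
        for (int i = 0; i < BLOCK; i++)
            sum += buf[i];
        printf("processed block, sum = %g\n", sum);
    }

    int main(void)
    {
        double bufA[BLOCK], bufB[BLOCK];
        double *current = bufA, *next = bufB;
        int nblocks = 3;

        start_read(current, 0);           /* prime the pipeline */
        for (int b = 0; b < nblocks; b++) {
            wait_read(current);
            if (b + 1 < nblocks)
                start_read(next, b + 1);  /* overlap the next read with compute */
            process(current);
            double *tmp = current; current = next; next = tmp;  /* swap buffers */
        }
        return 0;
    }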

 

Historical Perspective

Clever computer designs have emerged whenever there were great difficulties to overcome. When luxurious bursts of technology have occurred, the designs have tended to turn the speed improvements into ease of design or ease of use, rather than stretching performance to the breaking point.

Consider the recent furor over Reduced Instruction Set Computers (RISC). The early machines had tiny instruction sets for obvious reasons. As parts became highly integrated, large instruction sets proliferated and made computers more versatile, with features ranging from bit manipulation to printer control to floating-point calculations. Starting about 1976, there was a resurgence of small-instruction-set architectures among "supercomputer" designs; it is now well recognized that large instruction sets cost extra hardware or slower operation (or both), and the cycle is repeating.

Early computers preferred serial access to random-access storage. Building a memory in which every value is equally accessible is not nearly as simple as keeping a set of values quickly revolving in some type of circle, which can be spooled in or out. Mercury delay lines on early computers are an example of serial-type storage. Random-access memory so greatly simplifies programming that the market for these memory chips is now valued at billions of dollars per year. But where performance is being pushed to the limit, designers have rediscovered the serial form of storing data. The terminology, of course, is different: vector memory, shift registers, bit streams, and pipelines all refer to the advantage of storing or using data in a particular order rather than randomly.

Multiprocessing, which is the use of multiple computing elements to work on a task, is frequently referred to as "a new trend in computing." In fact, it is difficult to find a time in the history of computers when there was not a multiprocessing approach.

What is new in multiprocessing is the concept of using hundreds or thousands of processors at once. Integration and the physical limits of uniprocessors have both made "massively parallel" approaches a logical direction. Yet, many of the algorithms which can be used on such parallel schemes closely resemble the way problems were solved decades ago, using dozens of humans as processing elements.

 

Conclusion

The comparison of the FPS-164/MAX and the Atanasoff-Berry Computer contains an unusual number of parallels. In applications, architectural strategy, main memory, parallel arithmetic, linear algebra, and mass storage, the FPS-164/MAX echoes its predecessor. In a larger perspective, however, the achievement of both machines is strikingly similar: they allowed the solution of larger problems than was otherwise possible at the time.

 

Acknowledgement

The author wishes to thank Edward Borasky, Senior Staff Analyst in the Marketing Department of Floating Point Systems, Inc. for the idea which inspired this paper.

 

References

[1] "Computing Machine for the Solution of Large Systems of Linear Algebraic Equations," reprinted in The Origins of Digital Computers, Brian Randell, Ed., Springer-Verlag, New York, 1982, p. 315.

[2] Ibid., p. 315.


Dr. John Gustafson is Senior Staff Scientist in the Engineering Department of Floating Point Systems, Inc., where he is working on the T Series and future computer architectures. His previous positions at FPS include Product Development Manager for Scientific/Engineering Products and Senior Applications Specialist. Before joining FPS, Dr. Gustafson was a software engineer at the Jet Propulsion Laboratory in Pasadena, California. He has a BS degree from Caltech, and MS and PhD degrees from Iowa State University. His research interests include numerical analysis, algorithm theory, and special functions.
