The nug30 Computation

The nug30 Computational Pool

Using the flocking and glide-in mechanisms provided by Condor, we assembled a computational pool of 2510 processors at eight sites, spanning several architectures and operating systems. Table 1 shows the number and type of processors at each participating site.

Processors  Arch/OS        Location
       414  Intel/Linux    Argonne
        96  SGI/Irix       Argonne
      1024  SGI/Irix       NCSA
        16  Intel/Linux    NCSA
        45  SGI/Irix       NCSA
       246  Intel/Linux    Wisconsin
       146  Intel/Solaris  Wisconsin
       133  Sun/Solaris    Wisconsin
       190  Intel/Linux    Georgia Tech
        94  Intel/Solaris  Georgia Tech
        54  Intel/Linux    Italy (INFN)
        25  Intel/Linux    New Mexico
        12  Sun/Solaris    Northwestern
         5  Intel/Linux    Columbia U.
        10  Sun/Solaris    Columbia U.
Table 1: Computational Pool

Interesting facts about the participating machines:



Graphs of the nug30 Computation



The Evolution of the nug30 Computation

On June 8, 2000, at 11:05:36 CDT, Jean-Pierre Goux and Jeff Linderoth started the MW-QAP code on nug30, logging in remotely to Jeff's Personal Condor pool at the University of Wisconsin-Madison. The computation completed on June 15 at 21:20:07 CDT, after which there was much rejoicing.

The nug30 computation was stopped five times during the week for various reasons:

After each stoppage, the computation was restarted from the most recent checkpoint. Checkpoints were written every 15 minutes during the run, so at most 15 minutes of computation was lost each time. No one is happy when bugs or human errors occur, but these things will happen. Adding robustness to the program in the form of checkpointing was critical to successfully solving nug30.
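The restart procedure can be sketched as a work loop that periodically writes its full state (the list of unexplored nodes plus counters) to disk and reloads it on startup. This is a minimal Python sketch on a hypothetical toy tree with a hypothetical JSON file format; it is not the MW-QAP checkpoint code, whose format is not described here. Writing to a temporary file and then renaming it means a crash in the middle of a write leaves the previous checkpoint intact. The final save in `run` stands in for a clean shutdown; after a crash, only the last periodic checkpoint would survive.

```python
import json
import os
import tempfile
import time

CHECKPOINT_INTERVAL_S = 15 * 60  # the nug30 run checkpointed every 15 minutes


def save_checkpoint(state, path):
    """Write state to a temporary file, then atomically replace the old checkpoint."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Resume from the last checkpoint, or start at the root if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"pending": [0], "explored": 0}  # hypothetical state: root node only


def run(path, stop_after, interval=CHECKPOINT_INTERVAL_S, clock=time.monotonic):
    """Explore a toy tree; stop_after caps the total node count to simulate a stop."""
    state = load_checkpoint(path)
    last_save = clock()
    while state["pending"] and state["explored"] < stop_after:
        node = state["pending"].pop()
        state["explored"] += 1
        if node < 6:  # hypothetical branching rule: nodes 0..5 have two children
            state["pending"].extend([2 * node + 1, 2 * node + 2])
        if clock() - last_save >= interval:  # periodic checkpoint
            save_checkpoint(state, path)
            last_save = clock()
    save_checkpoint(state, path)
    return state


path = os.path.join(tempfile.mkdtemp(), "toy_checkpoint.json")
run(path, stop_after=5)                          # simulated interruption after 5 nodes
print(run(path, stop_after=10**6)["explored"])  # resumes from the checkpoint: 13
```

Because the unexplored nodes are part of the saved state, the second call finishes the same 13-node tree without redoing the 5 nodes explored before the simulated stop.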



nug30 Computation Statistics

The optimal solution to the nug30 QAP instance is: 14,5,28,24,1,3,16,15,10,9,21,2,4,29,25,22,13,26,17,30,6,20,19,8,18,7,27,12,11,23
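The solution above is a permutation of 1 through 30. In the Koopmans-Beckmann form of the QAP, a permutation p assigns facility i to location p(i), and its cost is the sum over all pairs i, j of flow[i][j] times dist[p(i)][p(j)]. The sketch below checks that the listed vector is a valid permutation and evaluates that objective. The nug30 matrices are not reproduced here, so the cost is shown on a hypothetical 3x3 instance; the orientation (which matrix is the flow, and whether p maps facilities to locations or the reverse) is an assumption.

```python
from itertools import permutations

# Optimal nug30 assignment as listed above (1-based)
NUG30_OPT = [14, 5, 28, 24, 1, 3, 16, 15, 10, 9, 21, 2, 4, 29, 25,
             22, 13, 26, 17, 30, 6, 20, 19, 8, 18, 7, 27, 12, 11, 23]


def is_permutation(p):
    """True if p contains each of 1..n exactly once."""
    return sorted(p) == list(range(1, len(p) + 1))


def qap_cost(flow, dist, p):
    """Koopmans-Beckmann objective: sum of flow[i][j] * dist[p(i)][p(j)].

    p is 1-based: facility i+1 is assigned to location p[i] (assumed orientation).
    """
    q = [x - 1 for x in p]  # convert to 0-based indices
    n = len(q)
    return sum(flow[i][j] * dist[q[i]][q[j]] for i in range(n) for j in range(n))


def brute_force(flow, dist):
    """Exhaustive search; only feasible for tiny n (30! is about 2.65e32)."""
    n = len(flow)
    return min(permutations(range(1, n + 1)), key=lambda p: qap_cost(flow, dist, p))


# Hypothetical 3x3 instance, not nug30 data
FLOW = [[0, 3, 1], [3, 0, 2], [1, 2, 0]]
DIST = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]

print(is_permutation(NUG30_OPT))        # True
print(qap_cost(FLOW, DIST, [1, 2, 3]))  # 14
best = brute_force(FLOW, DIST)
print(best, qap_cost(FLOW, DIST, best))  # (1, 2, 3) 14
```

Exhaustive enumeration is what branch and bound avoids: with 30! candidate permutations, optimality has to be proved by bounding and pruning subtrees rather than by listing every assignment.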

To prove the optimality of this solution, 11,892,208,412 nodes of the branch-and-bound tree were explored. Solving the associated node subproblems and computing the branching information required 574,254,156,532 Frank-Wolfe iterations.

On average, 653 machines participated in the computation, with a maximum of 1009. One of the most remarkable features of the run was that almost 1 million linear assignment problems (LAPs) were solved each second throughout the run. (One LAP must be solved in each Frank-Wolfe iteration.) Table 2 shows a number of other interesting statistics about the nug30 run and the computational pool. Machine speeds are normalized to an HP-C3000 workstation by comparing the time each participating machine needed to compute the same portion of the branch-and-bound tree with the time the HP-C3000 needed. Thus the "average" machine used in the nug30 computation was 56% as fast as an HP-C3000.
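The normalization can be sketched as a ratio of benchmark times: a machine that needs twice as long as the HP-C3000 on the same portion of the tree gets speed 0.5, so the slowest machine in Table 2 (0.045) took roughly 22 times as long. The sketch also shows one plausible way to turn per-machine CPU time into HP-C3000-equivalent time. The record format and the per-machine summation are assumptions; the exact aggregation behind Table 2 is not described. Note that total CPU time times the average speed (346,640,860 × 0.560, about 194 million seconds) does not reproduce the table's 218,823,577 seconds, which is consistent with, though not proof of, the average speed being an unweighted mean over machines.

```python
def relative_speed(t_reference, t_machine):
    """Speed relative to the reference machine, which is defined to have speed 1.0.

    Both arguments are times to compute the same portion of the search tree.
    """
    if t_machine <= 0:
        raise ValueError("benchmark time must be positive")
    return t_reference / t_machine


def equivalent_reference_time(records, t_reference):
    """Total CPU time rescaled to the reference machine.

    records: iterable of (cpu_seconds, benchmark_seconds), one per machine
    (hypothetical format).
    """
    return sum(cpu * relative_speed(t_reference, bench) for cpu, bench in records)


# Hypothetical example: the reference machine needs 100 s for the benchmark portion.
machines = [(1000.0, 200.0), (500.0, 100.0)]  # (CPU seconds used, benchmark seconds)
print(relative_speed(100.0, 200.0))                # 0.5
print(equivalent_reference_time(machines, 100.0))  # 1000.0
```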

Average number of available workers                     652.7
Maximum number of available workers                     1,009
Running wall clock time (sec)                         597,872
Total CPU time (sec)                              346,640,860
Average machine speed (HP-C3000 = 1.0)                  0.560
Minimum machine speed (HP-C3000 = 1.0)                  0.045
Maximum machine speed (HP-C3000 = 1.0)                  1.074
Equivalent CPU time on an HP-C3000 (sec)          218,823,577
Parallel efficiency                                       93%
Number of times a machine joined the computation       19,063
Table 2: nug30 Run Statistics
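Several headline figures follow directly from Table 2 and the counts given earlier. The arithmetic below confirms the "almost 1 million LAPs per second" claim (about 960,000) and puts the totals in more familiar units: roughly 48 Frank-Wolfe iterations per node, about 11 CPU-years of total computation, and about 6.9 years of equivalent HP-C3000 time, all within a running wall clock time of just under 7 days. The 365.25-day year is an assumption of this sketch.

```python
# Figures from the nug30 run (Table 2 and the statistics text)
NODES = 11_892_208_412
FW_ITERATIONS = 574_254_156_532      # one LAP is solved per Frank-Wolfe iteration
WALL_CLOCK_S = 597_872
TOTAL_CPU_S = 346_640_860
EQUIV_HP_C3000_S = 218_823_577
SECONDS_PER_YEAR = 365.25 * 86_400   # assumption: Julian year

laps_per_second = FW_ITERATIONS / WALL_CLOCK_S
fw_per_node = FW_ITERATIONS / NODES
cpu_years = TOTAL_CPU_S / SECONDS_PER_YEAR
hp_c3000_years = EQUIV_HP_C3000_S / SECONDS_PER_YEAR
wall_days = WALL_CLOCK_S / 86_400

print(f"LAPs solved per second:        {laps_per_second:,.0f}")
print(f"Frank-Wolfe iterations / node: {fw_per_node:.1f}")
print(f"Total CPU time:                {cpu_years:.1f} years")
print(f"Equivalent HP-C3000 time:      {hp_c3000_years:.1f} years")
print(f"Running wall clock time:       {wall_days:.2f} days")
```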

Table 3 shows the percentage of the work done at each participating location.

Location      Work (%)
Argonne          42.27
Wisconsin        33.69
Georgia Tech     11.90
Italy (INFN)      5.65
NCSA              2.74
New Mexico        1.42
Columbia U.       1.23
Northwestern      1.10
Table 3: Percentage of Work Done at Each Location


Table 4 shows the percentage of the work done by machines of each operating system and architecture type.

Arch/OS        Work (%)
Intel/Linux       79.57
SGI/Irix           8.76
Sun/Solaris        6.17
Intel/Solaris      5.50
Table 4: Percentage of Work Done by Each Architecture/Operating System
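Tables 3 and 4 can be set beside the processor counts in Table 1. The sketch below totals Table 1 by location and by architecture (confirming the 2510-processor total) and lists each group's share of processors next to its share of the work. For example, NCSA supplied 1085 of the 2510 processors (about 43%) but 2.74% of the work. Processor counts record pool membership, not how long each machine was actually available, so the comparison describes the numbers without explaining them. Site names are spelled as in Table 1; the data layout is this sketch's own choice.

```python
from collections import defaultdict

# Table 1: (processors, arch/OS, location)
POOL = [
    (414, "Intel/Linux", "Argonne"), (96, "SGI/Irix", "Argonne"),
    (1024, "SGI/Irix", "NCSA"), (16, "Intel/Linux", "NCSA"),
    (45, "SGI/Irix", "NCSA"), (246, "Intel/Linux", "Wisconsin"),
    (146, "Intel/Solaris", "Wisconsin"), (133, "Sun/Solaris", "Wisconsin"),
    (190, "Intel/Linux", "Georgia Tech"), (94, "Intel/Solaris", "Georgia Tech"),
    (54, "Intel/Linux", "Italy (INFN)"), (25, "Intel/Linux", "New Mexico"),
    (12, "Sun/Solaris", "Northwestern"), (5, "Intel/Linux", "Columbia U."),
    (10, "Sun/Solaris", "Columbia U."),
]

# Tables 3 and 4: percentage of the work done
WORK_BY_SITE = {
    "Argonne": 42.27, "Wisconsin": 33.69, "Georgia Tech": 11.90,
    "Italy (INFN)": 5.65, "NCSA": 2.74, "New Mexico": 1.42,
    "Columbia U.": 1.23, "Northwestern": 1.10,
}
WORK_BY_ARCH = {
    "Intel/Linux": 79.57, "SGI/Irix": 8.76,
    "Sun/Solaris": 6.17, "Intel/Solaris": 5.50,
}


def processors_by(field):
    """Total processors per value of field (1 = arch/OS, 2 = location)."""
    totals = defaultdict(int)
    for row in POOL:
        totals[row[field]] += row[0]
    return dict(totals)


def compare(field, work_pct):
    """Rows of (name, % of processors, % of work), largest share of work first."""
    counts = processors_by(field)
    total = sum(counts.values())
    rows = [(name, 100 * counts[name] / total, pct) for name, pct in work_pct.items()]
    return sorted(rows, key=lambda r: r[2], reverse=True)


print("total processors:", sum(row[0] for row in POOL))  # 2510
for name, proc_pct, work_pct in compare(2, WORK_BY_SITE) + compare(1, WORK_BY_ARCH):
    print(f"{name:<14} {proc_pct:6.2f}% of processors {work_pct:6.2f}% of work")
```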


Some Historic Photos





metaneos@mcs.anl.gov
Last modified: Mon Jul 3 23:17:42 CDT 2000