Our goal is to make this cluster as modular and scalable as possible, so it will not just be one monolithic 1024-node cluster. We'll call the cluster GAEA and break it into parts as follows:
As you can see there are two types of GENUSES, COMPUTE and THOUGHT. The difference is that the COMPUTE GENUS is composed entirely of COMPUTE SPECIES, while the THOUGHT GENUS contains the THOUGHT SPECIES needed to power the FAMILY, eleven COMPUTE SPECIES, and the network gear needed to run the FAMILY. Now that you know what the basic building blocks are, we'll quickly run through the rest of the classifications:
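The classification hierarchy can be sketched in code. This is a minimal model, not part of the actual cluster software: the class and field names are our own, and the counts (31 COMPUTE SPECIES plus 1 THOUGHT SPECIES per FAMILY, filling a 32-port switch) are taken from the figures given later in this section.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Species:
    """One node in the cluster; kind is 'COMPUTE' or 'THOUGHT'."""
    name: str
    kind: str

@dataclass
class Family:
    """A FAMILY: the sub-cluster unit that jobs are scheduled into."""
    number: int
    species: List[Species] = field(default_factory=list)

    @property
    def compute(self) -> List[Species]:
        return [s for s in self.species if s.kind == "COMPUTE"]

    @property
    def thought(self) -> List[Species]:
        return [s for s in self.species if s.kind == "THOUGHT"]

def build_family(number: int, computes: int = 31) -> Family:
    # One THOUGHT SPECIES plus 31 COMPUTE SPECIES = 32 nodes,
    # matching the 32-port ethernet switch per FAMILY.
    fam = Family(number)
    fam.species.append(Species(f"thought{number}", "THOUGHT"))
    for i in range(computes):
        fam.species.append(Species(f"node{number}-{i}", "COMPUTE"))
    return fam

fam = build_family(1)
assert len(fam.species) == 32  # fills one 32-port switch
```

The hostname scheme (`thought1`, `node1-0`, ...) is purely illustrative.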
This diagram shows only 8 COMPUTE SPECIES and 1 THOUGHT SPECIES, but the layout holds for all SPECIES in the FAMILY. All the SPECIES in the FAMILY are connected to one 32-port ethernet switch, and all are connected to a 32-port serial expander (actually two 16-port serial hubs linked together). The THOUGHT SPECIES is connected to the serial expander via the ethernet uplink so that it acts as a console server for all the COMPUTE SPECIES.
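The console-server role amounts to a mapping between serial-expander ports and the COMPUTE SPECIES whose consoles they carry. A sketch of that mapping, with entirely hypothetical port numbering and node names (a real deployment would configure something like a console-server daemon with equivalent information):

```python
# Two 16-port serial hubs linked together give 32 expander ports per
# FAMILY. The port assignment below (port i -> compute node i) is an
# illustrative assumption, not the documented wiring.
NUM_PORTS = 32

def port_map(family: int, computes: int = 31) -> dict:
    """Map expander port number -> compute node whose console it carries."""
    return {port: f"node{family}-{port}" for port in range(computes)}

consoles = port_map(1)
# e.g. to reach the console of node1-5, the THOUGHT SPECIES opens
# expander port 5 on FAMILY 1's serial expander.
```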
The next diagram shows how the FAMILIES are connected together to form the complete cluster:
The previous two diagrams are the building blocks for our cluster. This approach breaks the cluster down into sub-clusters, and job handling will be done on a per-FAMILY basis. The theory is that 31 COMPUTE SPECIES should be adequate for most jobs; if a job actually needs more than that, it takes a performance hit because it must leave the FAMILY. Optimal job times will be achieved when a job runs strictly within a single FAMILY. The THOUGHT SPECIES in each FAMILY will handle job scheduling for the FAMILY, handle inter-FAMILY communication, act as console server for the FAMILY, provide the FAMILY's distributed filesystem/RAID service, handle other FAMILY-specific services (NIS/YP, NFS, NTP, etc.), and be integral to the installation process.
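The placement policy described above can be sketched as follows: prefer the tightest single FAMILY that can hold the whole job, and only spill across FAMILIES (accepting the performance hit) when no single FAMILY can. This is our own illustration of the policy, not the actual scheduler; the function and its spill heuristic are assumptions.

```python
def place_job(nodes_needed: int, free_nodes: dict):
    """free_nodes maps family_id -> count of free COMPUTE SPECIES.
    Returns (placement dict family_id -> nodes used, spans_families flag)."""
    # First choice: the smallest single FAMILY that fits the whole job,
    # which keeps large FAMILIES free for large jobs.
    fits = [f for f, free in free_nodes.items() if free >= nodes_needed]
    if fits:
        best = min(fits, key=lambda f: free_nodes[f])
        return {best: nodes_needed}, False
    # Otherwise spill across FAMILIES, largest free pools first, and
    # accept the inter-FAMILY communication penalty.
    placement, remaining = {}, nodes_needed
    for f, free in sorted(free_nodes.items(), key=lambda kv: -kv[1]):
        if remaining == 0:
            break
        take = min(free, remaining)
        if take:
            placement[f] = take
            remaining -= take
    if remaining:
        raise RuntimeError("not enough free COMPUTE SPECIES in the cluster")
    return placement, True

# A 20-node job fits inside one FAMILY; a 40-node job must span FAMILIES.
```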