#############################################################################
#
# This Cplant(TM) source code is the property of Sandia National
# Laboratories.
#
# This Cplant(TM) source code is copyrighted by Sandia National
# Laboratories.
#
# The redistribution of this Cplant(TM) source code is subject to the
# terms of the GNU Lesser General Public License
# (see cit/LGPL or http://www.gnu.org/licenses/lgpl.html)
#
# Cplant(TM) Copyright 1998, 1999, 2000, 2001, 2002, 2003, 2004
# Sandia Corporation.
# Under the terms of Contract DE-AC04-94AL85000, there is a non-exclusive
# license for use of this work by or on behalf of the US Government.
# Export of this program may require a license from the United States
# Government.
#
#############################################################################

CLUSTER INTEGRATION CHECKLIST:
-----------------------------------

This document provides a top-level checklist for a general cluster
integration.

Prior to accessing the admin node:
-----------------------------------

* Verify that your network switch is supported by auto-discovery, and
  obtain the MAC address of the switch. (If your switch does not
  support auto-discovery, obtain the individual MAC addresses of the
  nodes.)
  - To employ auto-discovery, first create a switch object, define the
    switch and switch_port attributes for the interface of your
    node(s), and then execute the discover command with the following
    flags (where 'nodes' represents a predefined collection):

      #discover --useswitch --nopowercycle nodes

* Determine the type of device drivers needed for your network devices
  (to be used when rebuilding the kernel for the nodes).

Admin only:
-----------------------------------

* Rebuild the admin node with a supported Linux distribution.
  - Typically /cluster should be its own partition.
  - If the cluster is diskless and not using udev, then mkfs with lots
    of inodes.
* Set up ssh access.
* Make sure the software distribution is available (copy the source
  media to disk, e.g. /home/suse/CD?, or have ISO files available).
* Make sure the dhcp server is installed and running.
* Make sure tftp is installed and running.
* Make sure the NFS server is installed and running.
* Make sure that the time service is enabled.
* Install syslog-ng, an audit trail tool. CIT requires that syslog-ng
  be up and running. You can download this software from the following
  site: http://www.balabit.com/products/syslog_ng/
* If you have downloaded a tarball from the cit website, begin by
  installing the base module: untar the module and consult the INSTALL
  document in that directory. If you plan to check a version out of
  Sandia's repository, install subversion and check out cit.
* Install the base module; this is the foundation for CIT and the best
  place to get started.
* Install additional modules as required. Which modules? CIT does not
  dictate that you set up your cluster in any particular way, so you
  can download modules as needed. Generally, however, if given the
  choice, we recommend that you run your cluster diskless, in which
  case you will also need to download the diskless and distros modules
  and follow the same install procedure for those modules
  (INSTALL.distros, INSTALL.diskless).
* Create the database. This step can be executed at any time, but by
  the time you want to use the CIT tools from the distros and diskless
  modules to create your image and build your diskless nodes, you must
  have a working database. Granted, this step is tricky. For more
  details on how to create a database, the best places to start are
  the ACME-TUTORIAL and configure-ACME-library-style.pl, a database
  creation script that can be used as a template for your database.
* After you have created a working database and have the appropriate
  modules installed, you can execute build_distro and build_diskless.
* Build a kernel for the nodes. Although CIT does not provide
  ready-made kernels for your compute nodes, we do provide a number of
  kernel configuration files to help you get started. These are
  located within the diskless module (../diskless/kernel.configs).
  Choose the one that most closely matches your hardware and copy it
  into the .../usr/src/linux directory in the image for your compute
  nodes. We recommend that you use the CIT chroot_distro command to
  build the new kernel for the compute nodes within their native
  environment. Be sure to add support for your network devices. (A
  sketch follows this section.)
* Link /tftpboot on the admin node to the corresponding directory
  under the image for your compute nodes (see the example following
  this section).
* Regenerate configuration files (local and image) using the mk_conf
  tool, e.g.:

      #mk_conf --pxe --hosts --local --dhcpd.conf --syslog-ng nodes

* Set up the /etc/exports file on the admin node. /etc/exports is used
  to make directories available for NFS mounting. In general you will
  want to export the CIT_HOME directory to the compute nodes (in
  general this is /cluster, but you may have installed into an
  alternate path). (See the example following this section.)
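For the kernel-build step above, one possible sequence is sketched
below. The image root (/cluster/images/default) and the config name
are placeholders for your actual paths, and the exact chroot_distro
invocation is described in the diskless module's documentation:

      #cp .../diskless/kernel.configs/<closest-match> \
          /cluster/images/default/usr/src/linux/.config

  Then, from within the image's native environment (entered via the
  CIT chroot_distro command):

      #cd /usr/src/linux
      #make oldconfig && make bzImage modules modules_install

  (On a 2.4-series kernel, run 'make dep' before 'make bzImage'.)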
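For the /tftpboot link step above, assuming the same placeholder image
root:

      #ln -s /cluster/images/default/tftpboot /tftpboot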
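For the /etc/exports step above, a minimal sketch, assuming CIT_HOME
is /cluster and that your compute nodes resolve with names matching
n-* (substitute your own path, host pattern, and mount options):

      /cluster    n-*(rw,no_root_squash,sync)

  After editing the file, re-export with:

      #exportfs -ra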
All nodes:
-----------------------------------

* If necessary, change the BIOS settings of the nodes to boot off the
  network.
* Boot the nodes. If you have trouble booting, check the log files on
  your admin node under /var/log/syslog-ng, or use the taillogs
  command provided by CIT (see the example after this list).
* Build any additional modules that you require; for example, mpich or
  torque-maui.
* Back up the database.
* Install the diag module and run its tests to verify that the
  hardware is working.
* Set up user accounts.
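For the boot step above, you can follow the admin node's logs while
the nodes come up (the exact file names under /var/log/syslog-ng
depend on your syslog-ng configuration; the CIT taillogs command is
the ready-made alternative):

      #tail -f /var/log/syslog-ng/*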