#############################################################################
#
# This Cplant(TM) source code is the property of Sandia National
# Laboratories.
#
# This Cplant(TM) source code is copyrighted by Sandia National
# Laboratories.
#
# The redistribution of this Cplant(TM) source code is subject to the
# terms of the GNU Lesser General Public License
# (see cit/LGPL or http://www.gnu.org/licenses/lgpl.html)
#
# Cplant(TM) Copyright 1998, 1999, 2000, 2001, 2002, 2003, 2004
# Sandia Corporation.
# Under the terms of Contract DE-AC04-94AL85000, there is a non-exclusive
# license for use of this work by or on behalf of the US Government.
# Export of this program may require a license from the United States
# Government.
#
#############################################################################

Creating support for a new system architecture (Distro) is a fairly
advanced topic and requires a good understanding of the CIToolkit's
framework and ideas. The basic procedure is outlined below.

(Examples below are for creating Fedora Core 3)

* Identify where in the Distro hierarchy you want to add the new
  distribution.
    # distro_mgr --types

* Compare your new distribution to currently supported ones that are
  close in both time and heritage. For example, RedHat Enterprise 3.0
  on i386 was based on RedHat 9.0 on i386, and we will base Fedora
  Core-3 on Fedora Core-1.
    # cd $CIT_DIST/distros/lib/Distro/RedHat/Fedora
    # find * -type f | grep -v svn

* Duplicate the class library and supporting tree of the related Distro
  to get started:
    # cp Core_1.pm Core_3.pm
    # mkdir Core_3
    # cp Core_1/i386.pm Core_3
    # cd Core_1
    # for i in `find * -type d | grep -v svn`; do mkdir ../Core_3/$i; done

* Edit the new class libraries...
    - fix class names
    - get rid of stuff you know is different
    - (more fine tuning will be done later)

* Create the "pkglist" file

  Method 1:
    This method is simple and reliable, but may be time consuming and
    may require resources which are not available for a diskless
    cluster.

    - Install a diskfull node from CD (or FTP, kickstart, etc.) which is
      representative of a generic cluster node. This can be virtually
      any system and does not even have to be part of the cluster, as
      long as the software distribution and system architecture are the
      same.
    - Log in to the node and save the list of installed packages to a
      text file named "pkglist" (this is rpm -qa on an RPM based
      distro).

  Method 2:
    This method is usually quicker, keeps a new distro's package list
    closer to an existing distro, and works if you do not have a
    diskfull node available for method 1. However, you must be creating
    a Distro that is _very_ similar to an existing one and must be
    comfortable with RPM tools and resolving dependencies "by hand".

    - Copy the "pkglist" file from an existing Distro
    - Push the changes from CIT_DIST to the install path
        # make update
    - Create a new Distro object
        # distro_mgr --new --type Distro::RPM::RedHat::Fedora::Core_3::i386 fc3
    - Point the object at the distribution source
        # distro_mgr --set --source_dir /scratch/distros/fc3/Fedora fc3
    - Try creating the image...
        # build_distro --image fc3
    - If you get "No RPM file found" errors, or encounter packages for
      which the selection menu does not provide a suitable match, then
      remove the corresponding packages from the "pkglist" file.
    - It is very likely that the resulting list will have unresolved
      dependencies. You will need to either remove the dependent
      packages from the "pkglist" file, or search the distribution
      source for packages which satisfy the dependencies and add those
      package names to the "pkglist" file.
      Here are examples of some commands which you might find useful:
        # rpm -q --whatprovides libselinux
        # find /home/distros/fc3 -name "*selinux*"
        # rpm -qp rhpl-0.148-1.i386.rpm -i --list
    - When you think you have resolved all the dependencies, try
      creating the image again...
        # build_distro --image --force fc3
    - It will probably take 3-5 iterations to get the dependencies
      resolved. If you are editing the "pkglist" file in the CIT_DIST
      directory, remember to run "make update" before "build_distro".
    - For easier maintenance, sort the pkglist file. :)

* Customize Files in the "image" and "overlay" directories
    # find Core_1 -type f | grep -v svn

    - Compare the files in the existing Distro object tree with those
      from the new distro image you just created in the step above.
    - Files should ONLY be in "image" if they have to be (i.e. some
      files need to be in the base image for a diskless node to boot
      properly). Most of the files in "image" have very minor changes
      from the distro defaults. For example, a key file for CIT
      integration is etc/rc.d/rc.sysinit (which must have a hook for
      the rc.cit script).
    - Most files can go in the "overlay" and are modified only
      slightly. Examples are the files in etc/xinetd.d where services
      are enabled, or the 'halt' script, which is changed to work in a
      diskless environment.

  IMPORTANT:
    - If you are a CIT developer and checking this into the CIToolkit
      source code repository, be sure to check in the original file
      first so we can more easily track changes from the stock distro.

* Customize the Services to be started:
    # ls /cluster/machine/fc3/image/etc/rc.d/rc{0,3}.d

    - Determine which additional services should be started, and which
      ones need to be disabled.
    - Check which ones are already modified by higher layers in the
      Distro tree.
        # distro_mgr --service fc3
    - Edit the class library files and modify the "service" directives.

* Test the new Distro

  At this point, you should have enough done to get a node booted
  diskless using the sysarch in progress.
  So, customize the image, create the overlays, and create the
  individual diskless hierarchy for one of the nodes:
    # build_distro --custom --overlays --configs fc3
    # device_mgr --set --sysarch fc3 --bootmode diskless node1
    # build_diskless node1
    # mk_conf --dhcp --local

  Now boot the node. It is probably easiest to use whatever kernel you
  have available that is close in distribution heritage and has
  CONFIG_ROOT_NFS capability. Otherwise you can build a new diskless
  capable kernel (see the diskless/doc/diskless_kernel file for help).

  Watch the boot on the console to see which startup scripts run. Tweak
  the class library files and add to the overlays as necessary. It can
  be useful to check /etc/pam.d and /etc/xinetd.d on a booted diskless
  node (or this can be done from the node's diskless boot tree,
  /cluster/machine/SYSARCH/nodename) for configs that were included or
  excluded incorrectly.

  Hopefully you have a mostly working Distro at this point. Boot a node
  in each of the overlay configs supported by the sysarch (diskless,
  diskfull, leader, login, root_ssh_login, etc...) to check that all the
  scripts, configs, and running services are what you want. Don't forget
  to do 'init 0' and 'init 6' on the node to test the shutdown/reboot
  path as well.

* If you are having problems...

  Make sure to read ALL of the available documentation:
    # distro_mgr --help
    # distro_mgr --man
    # build_distro --help
    # less diskless/doc/diskless.hierarchy
    # perldoc base/doc/Attribute.defs.pod
    # less base/doc/CI.checklist

  Take advantage of all the debugging information you have available,
  and concentrate on getting that debug/log information working if it
  isn't.

    - syslog: for both the admin and the compute nodes it should be
      collected in /var/log/syslog-ng
    - console output: via serial console, console-over-lan, or if all
      else fails go plug in a VGA monitor.
    - rsh/ssh/console login: however you can get in, a login should
      help you set up and debug the other methods, even if it will be
      turned off later for security purposes
    - server daemon stats: checking the dhcp, tftp, and nfs logs should
      help indicate how far a node gets in the boot process.

  Check configs and permissions. A few common problems are:
    - CIT directory access to the nfs server
    - /etc/exports
    - /etc/passwd, /etc/shadow, /etc/group
    - PAM and xinetd

  Use the "--debug" and "--libdebug" flags with CIT commands.

  If none of the above points you to the source of the problem, then
  there is likely a bug or the documentation is insufficient. Please
  contact the CIToolkit development team for help.
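
* Appendix: tracking "pkglist" changes (sketch)

  The "pkglist" iterations under Method 2 involve repeatedly editing a
  list copied from an existing Distro, and the advice there is to keep
  the file sorted. The sketch below is one possible way to see which
  entries were dropped from or added to the copied list. It is a
  hypothetical helper, not a CIToolkit command: the function name
  pkglist_diff is made up for illustration, and it assumes the
  "pkglist" file holds one package entry per line.

```shell
#!/bin/sh
# Hypothetical helper (not part of CIToolkit).
# Assumption: each pkglist file holds one package entry per line.
#
# pkglist_diff OLD NEW
#   Prints "- entry" for entries only in OLD and "+ entry" for entries
#   only in NEW. Both lists are sorted and de-duplicated first, since
#   comm(1) needs sorted input.
pkglist_diff() {
    old_sorted=$(mktemp) || return 1
    new_sorted=$(mktemp) || { rm -f "$old_sorted"; return 1; }

    # Use the C locale so sort and comm agree on ordering.
    LC_ALL=C sort -u "$1" > "$old_sorted"
    LC_ALL=C sort -u "$2" > "$new_sorted"

    # Column 1 of comm: lines unique to the first file.
    LC_ALL=C comm -23 "$old_sorted" "$new_sorted" | sed 's/^/- /'
    # Column 2 of comm: lines unique to the second file.
    LC_ALL=C comm -13 "$old_sorted" "$new_sorted" | sed 's/^/+ /'

    rm -f "$old_sorted" "$new_sorted"
}
```

  Usage, with placeholder file names for the list copied from the
  existing Distro and the list being edited:
    # pkglist_diff pkglist.orig pkglist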