Enstore Server Upgrades v0.5
- KickStart HowTo
- twiki log
- How to use the install CD:
- If you use an unmodified CD:
- Using an Enstore Kickstart CD:
- Using an Enstore Floppy install disk:
- Before Rebooting:
- Begin Install:
- Floppy disk
- CD
- Poking around during the install
- When the install Completes
- Post install Reboot
- Trouble Shooting
- Fdisk Errors
- You are unable to login
- SYSCONNECT NIC Errors:
- Instructions & Additions to troubleshoot sysconnect
- the sk98lin.o modules file
- Unwind Motherboard Vendor Bios Firmware
- Downtime rules:
- Partial Server Upgrade Plan
- General List
- Chih-Hao's notes:
- Upgrade meeting notes from 03/08/2006 11:55 AM
- ~srv0
- ~srv1
- Issues to verify after a ~srv1 install
- stop dcache
- start dcache
- update farmlets
- ~srv2
- What issues remain with remedy_api
- apache
- Copy the correct files over
- Copy cgi scripts
- ~srv3
- things tweaked after install of stkensrv3.
- CRON and histograms
- dCache install issues:
- pageDcacheCms*
- dcache_page_dccpcms
- PageDcacheSRM & pageDcacheKftp
- dcap and kftp products
- globus - grid certs
- Certificates
- Installs for pageDcache cronjobs
- ~srv4
- ~srv6
- ~srv5 & ~srv7
- Outstanding Questions?
- tcp-wrappers are installed
- Are we correctly setting the hosts.allow ?
- cron.daily & logrotate.d
- The correct configuration of send mail.
- Any questions about PostgreSQL & PyGreSQL
- How should servers have xinetd.d/ftp set?
- Which setting should we use for /etc/xinetd.conf ?
- SDS2 and the sk98lin driver
- some additions regarding SD-2 systems
- To use sysconnect as a primary NIC disable onboard NIC's in bios.
- Do not use old reference to sk98lin driver .
- Should we improve the /etc/resolv.conf file?
- Change settings for "ups product ipmi" ?
- Dimitri will install gnuplot-4.0
- crontab notes
- Postgres and PNFS boot script
KickStart HowTo
twiki log
Dmitri kept a twiki log of the project notes here:
http://ncdf68.fnal.gov/twiki/bin/view/Main/MoversUpgrade
How to use the install CD:
More to follow
If you use an unmodified CD:
More to follow
Using an Enstore Kickstart CD:
More to follow
Using an Enstore Floppy install disk:
More to follow
Before Rebooting:
- Get a copy of the kickstart CD, or the Enstore and driver floppy disks.
- Select the node to upgrade.
- Schedule Enstore services "--down" for this node (i.e. mover processes, pnfs, library_managers & media_changers, postgres or whatever). This step is not necessary if all of Enstore is down.
- Schedule the node and its Enstore processes down:
[enstore@stkenmvr19a enstore] enstore-boot stop
%enstore sched --down 994019.mover --reason "upgrade to LTS"
- Run the backup scripts (essential, special, home, & system) to diskc of ~srv3, after mounting ~srv3:/diskc on /backup.
- Plug the keyboard cord into the mover machine.
- Plug the blue display 9-pin (??) connector into the mover machine.
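The scheduling step can be wrapped in a small dry-run helper so the commands are reviewed before anyone runs them as the enstore user. This is only a sketch: `mover_down_cmds` is a hypothetical name, not an Enstore tool, and it does nothing except print the two commands quoted in the list for a given mover.

```shell
# Hypothetical dry-run helper (not part of Enstore): print the commands
# used to take one mover down before an upgrade.
#   $1 = mover name as used by "enstore sched", e.g. 994019.mover
#   $2 = free-text reason recorded with the schedule entry
mover_down_cmds() {
    mover="$1"
    reason="$2"
    printf 'enstore-boot stop\n'
    printf 'enstore sched --down %s --reason "%s"\n' "$mover" "$reason"
}

# Example: mover_down_cmds 994019.mover "upgrade to LTS"
```

Printing instead of executing keeps the helper safe to run on any node; the output can be pasted into the mover's shell once it looks right.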
Begin Install:
Log onto the console as yourself; if you have a valid ticket, ksu to root. Or just log in as root.
Place the Enstore kickstart CD or floppy into the appropriate device and reboot using one of:
/sbin/shutdown -r now
reboot
or CTRL-ALT-DELETE
Floppy disk
If you are using the Enstore kickstart floppy disk you will be prompted for a driver disk.
- If you have a driver disk, hit YES.
- Remove the kickstart floppy disk.
- Insert the appropriate driver disk and hit YES again. The driver disk will begin to load.
- When the disk has loaded you will be prompted again.
- If you wish to load additional driver disks, continue.
- If you do not have additional driver disks, tab to NO and hit enter.
- Otherwise load the next disk and hit enter.
- You do not need to reinsert the Enstore kickstart disk.
- You may remove the driver disk.
CD
If you are using the Enstore kickstart CD:
- For a StkEnstore install just press `enter`.
- For one of the other instances of Enstore, type one of:
- d0en_ks
- cdfen_ks
- stken_ks
I hope to make CDs that default to the local instance of Enstore. So, on Monday the disks will default to the D0 Enstore kickstart.
We do have a caveat: using the stken_ks dd option seems to fail in its purpose of stalling long enough to allow GB ethernet to negotiate a network connection.
More to follow.
Poking around during the install
- After the GUI pops up you can poke around in the background with CTRL-ALT-F2 thru CTRL-ALT-F6:
Alt-F2 is the only interactive screen
Alt-F3 is the anaconda stdout
Alt-F4 & F5 are system or network stdout
Alt-F7 returns you to the anaconda GUI
If the install of the ups products seems to be going slowly, check whether the network is set to half or full duplex.
Wait for the ups install progress line to jump from a thin vertical line to a 1 or 2 inch horizontal bar.
- Type CTRL-ALT-F2 to get to the background interactive screen.
- Type:
ethtool eth0
- And look for entries like this:
=== %< ===
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full ( This may report Half )
=== %< ===
- If duplex is Half then set it to Full:
ethtool -s eth0 autoneg off
ethtool -s eth0 duplex full
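The duplex check can also be scripted rather than read off the screen. The sketch below assumes the `ethtool eth0` output has the "Duplex: <value>" line shown in the sample; `duplex_of` is a hypothetical helper name, and it only parses text, it changes nothing on the interface.

```shell
# Hypothetical helper: read `ethtool <iface>` output on stdin and print
# the value of its "Duplex:" line (for example Full or Half).
duplex_of() {
    awk '$1 == "Duplex:" { print $2; exit }'
}

# Example on the install console (Alt-F2):
#   [ "$(ethtool eth0 | duplex_of)" = "Half" ] && echo "eth0 is half duplex"
```

If the helper reports Half, the two `ethtool -s` commands from the list are the fix described in these notes.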
When the install Completes
You may return to the anaconda GUI and be congratulated on a successful install.
But wait, there is more. You may feel that you should check:
- Read through the anaconda log files in /tmp.
- Verify the state of vital files and directories.
Tab to the SPOT & press Enter to reboot.
When the GRUB splash screen pops up verify that the smp kernel will boot. Use the arrow keys to move to it if the smp kernel is not already highlighted.
Post install Reboot
%ksu enstore
%EPS
- If no appropriate processes are listed or some are missing, attempt to start them.
%enstore start
%enstore sched --up 994019.mover
%enstore mover --online 994019
%enstore mover --status 994019
- Check that products are installed, properly configured & running on all nodes: ngop (-q agent), python (enstore version). Test by typing:
. /fnal/ups/etc/setups.sh
setup ngop
ngop status
Trouble Shooting
Fdisk Errors
Could not read partition table. May be a bad disk.
%fdisk
%fdisk -l
%hdparm -t -T /dev/hda
You may insert a diagnostics CD or floppy disk (QuickTech Pro), then reboot.
You are unable to login
At the grub prompt hit E for edit.
Add to the line:
linux init=/bin/sh
Then after a reboot:
mount -o remount,rw /dev/hda1 /
Edit /etc/shadow and remove the root password.
Reboot.
Log in as root and reset the password.
Copy these files over again:
% mount d0ensrv3:/diskc /backup
% cp /backup/backup/passwd /etc/passwd
% cp /backup/backup/shadow /etc/shadow
% cp /backup/backup/group /etc/group
SYSCONNECT NIC Errors:
- cp /lib/modules/2.4.21-32.0.1.ELsmp/unsupported/drivers/net/sk98lin/sk98lin.o /lib/modules/2.4.21-32.0.1.ELsmp/kernel/drivers/net/
- vi /etc/modules.conf
#alias eth0 e1000
alias eth0 sk98lin
alias eth1 e100
alias scsi_hostadapter qla2200
alias eth2 sk98lin
alias usb-controller usb-uhci
reboot
- If the mover cannot start, check that the sudoers file is correct. It should have a line in it about enstore.
- If not, copy a new one over.
% ksu
% rcp d0enmvr7a:/etc/sudoers /etc/sudoers
% exit
- and/or start the monitor server to see if it starts
%ksu
%monitor_server-boot start
- and/or check that the farmlets are ok
$farmlets -f
- and/or something about %/etc/rc.d/rc.local
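After editing /etc/modules.conf it can help to confirm which driver each interface is now bound to. A minimal sketch, assuming the "alias <name> <driver>" line format shown in the sample file; `driver_for` is a hypothetical helper that reads the file contents on stdin and skips commented-out aliases.

```shell
# Hypothetical helper: print the driver bound to an alias name in
# modules.conf text read from stdin. Lines starting with "#alias" do not
# match because their first field is "#alias", not "alias".
driver_for() {
    awk -v ifc="$1" '$1 == "alias" && $2 == ifc { print $3 }'
}

# Example: driver_for eth0 < /etc/modules.conf
```

With the sample file from these notes, eth0 should come back as sk98lin rather than the commented-out e1000.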
Instructions & Additions to troubleshoot sysconnect
Attached are the instructions & my additions to troubleshoot sysconnect.
I made the following links in an attempt to satisfy ~enstore/enstore/bin/Linux/wget:
/lib/libssl.so.2 -> libssl.so.0.9.7a
/lib/libcrypto.so.2 -> libcrypto.so.0.9.7a
However, we get an undefined symbol error for OpenSSL_add_all_algorithms, and a 37 MB log file (it can't find any tape). Perhaps wget should be recompiled and linked? Or should we use the system's wget? In the meantime, I've made a link to the system's wget in $ENSTORE_DIR/bin/Linux and burn-rate is working.
the sk98lin.o modules file
The sk98lin.o modules from SLF3.0.5 are pretty old and exhibit the old arp resolution problems with V2.0 cards. Fortunately, we don't yet have that many of them installed. stkenmvr5a is the only updated system so far, but there will be more.
I've made new drivers for the smp and uniprocessor kernels. They are on the *srv3 nodes. The file names, which show where the files need to go, are:
/diskc/backup/share/lib_modules_2.4.21-32.0.1.EL_kernel_drivers_net_sk98lin_sk98lin.o -> lib/modules/2.4.21-32.0.1.EL/kernel/drivers/net/sk98lin/sk98lin.o
/diskc/backup/share/lib_modules_2.4.21-32.0.1.ELsmp_kernel_drivers_net_sk98lin_sk98lin.o -> lib/modules/2.4.21-32.0.1.ELsmp/kernel/drivers/net/sk98lin/sk98lin.o
So how do I get it? Do I need to load using the ethernet connection and then copy the driver as I did before, or do you have an updated drivers diskette?
Unwind Motherboard Vendor Bios Firmware
# copy DOS fs image
if [ -r $RESTORE_FIRMWARE_FILE ]
then
    echo "`date +%H:%M:%S` -[ Restoring DOS/$firmware image to /dev/${disk}1 ]-"
    # unpack the compressed image straight onto the first partition
    (gzip -d -c $RESTORE_FIRMWARE_FILE | dd bs=8k of=/dev/${disk}1)
    mount -t msdos /dev/${disk}1 $RESTORE_TO
    echo "`date +%H:%M:%S` -[ Setting firmware nodename to $hostname ]-"
    [ -f $RESTORE_TO/nodename.txt ] && rm -f $RESTORE_TO/nodename.txt
    echo "$hostname" > $RESTORE_TO/nodename.txt
    echo "`date +%H:%M:%S` -[ Setting firmware flavor to $FIRMWARE_FLAVOR ]-"
    [ -f $RESTORE_TO/flavor.txt ] && rm -f $RESTORE_TO/flavor.txt
    echo "$FIRMWARE_FLAVOR" > $RESTORE_TO/flavor.txt
    umount /dev/${disk}1
else
    echo "`date +%H:%M:%S` -[ Skipping DOS/$firmware firmware load ]-"
fi
Downtime rules:
- Don't perform anything not in the plan no matter how trivial.
- Any system reboots, even if for minor things, should be planned as if the system will crash, so the time to recover should be factored in.
- In addition to capturing the state of the servers across upgrades, we need to capture this state weekly to srv3 or some other node via cron.
- We don't have a clue on what to do about pnfsmanager on the pnfs servers.
- dasadmin (and ?) was missing from kickstart for stkensrv4
- We shouldn't burden Terry or Michael with anything else until they get all servers' state understood, incorporated into kickstart, and backed up to srv3.
Partial Server Upgrade Plan
General List
*srv1:
- pnfs - locally compiled but placed into ups (configuration?)
- postgres - locally compiled but placed into ups (what version?)
- java (with dCache) - From Sun? What version? Configuration?
*srv2:
*srv3:
- java
- srmcp, dcap, kftp
- console servers
*srv4:
- aci - we need to try this on LTS 3.0.5
*srv0 and *srv6:
- postgres - locally compiled but placed into ups (what version?)
Other:
- gnuplot - 4.0 rpm? -- Dimitri
- console servers
Chih-Hao's notes:
-- Chih-Hao writes:
In light of April 3 being:
- the first working day after the daylight saving time change.
- the first working day after Dan Ryan reconstruction begins ...
this is what I'll do for d0 upgrade:
I will start the jobs from home ...
- I assume that I will get an e-mail notice of d0en being paused by
8:00 a.m. Please do not touch d0ensrv[036] ... until I send out a
notification
- I'll wait for the 8:10 backup to run its natural course.
- After the backup finishes (should be in 10 minutes), I'll stop
file_clerk, volume_clerk, info_server, accounting_server, and
drivestat_server.
- I'll dump the current databases ... should be done in half an
hour.
- I'll shutdown database servers.
- I'll send out e-mail notification to the ring master and cc: to
enstore-admin
- Then, ISA may shutdown the machines and do the OS upgrade.
- I'll beat the traffic to get here ...
- After getting the go-ahead for d0ensrv[036], I'll do the rest.
The estimated time is about 4 hours.
Upgrade meeting notes from 03/08/2006 11:55 AM
d0en Apr 3,4 upgrade list from the board
Monday and Tuesday:
Start 8am
backup pnfs database (vp)
backup f/v database (ch)
backup acc database (ch)
backup servers' state to srv2 raid (TJ+MZ)
backup servers' state to srv3 raid (TJ+MZ)
Upgrade srv4 (IA)
After (acc db bup) upgrade srv6 (IA+MZ)
After (f/v db bup) upgrade srv0 (IA+MZ)
After (pnfs db bup) upgrade srv1 (IA+MZ) - don't delay this
After (up srv6) upgrade pg srv6 (CH)
After (up srv0) upgrade pg srv0 (CH)
After (up srv1) upgrade pg srv1 (VP)
upgrade srv2 (IA)
upgrade srv3 (IA)
upgrade postgres clients (CH)
Tuesday:
LTO bin installation in ADIC robot
srv5 and srv7 to be upgraded independently at another time (before or after)
David - QA on upgrades
Pre-stuff
Send out email about home areas and ask to clean up home areas on the srv machines (TJ)
write backup script (TJ+MZ)
kickstart cleanup (TJ+MZ)
HW inventory (TJ+MZ)
Procedure for each SRV (TJ+MZ)
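The "After (...)" entries in the Monday and Tuesday list form a small dependency graph, and the standard `tsort` utility can print one order that respects it. This is a sketch only: the task labels are hypothetical names invented here, and only the constraints written on the board are encoded (srv2, srv3, srv4 and the postgres clients have no stated prerequisite, so they are left out).

```shell
# Print one valid order for the upgrade tasks, using only the "After ..."
# constraints from the meeting notes. Each input line is
# "<prerequisite> <task>". Task names are hypothetical labels.
upgrade_order() {
    tsort <<'EOF'
backup_acc_db upgrade_srv6
backup_fv_db upgrade_srv0
backup_pnfs_db upgrade_srv1
upgrade_srv6 upgrade_pg_srv6
upgrade_srv0 upgrade_pg_srv0
upgrade_srv1 upgrade_pg_srv1
EOF
}

# Example: upgrade_order
```

`tsort` may emit any order consistent with the pairs, so the output is a check on the plan rather than a schedule; the people and times stay as written in the notes.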
~srv0
- postgres - locally compiled but placed into ups (what version?)
- pnfs - locally compiled but placed into ups (configuration?)
- postgres - locally compiled but placed into ups (what version?)
- java (with dCache) - From Sun? What version? Configuration?
~srv1
~srv1 is a pnfs server node. I built the postgres and pnfs for it. If we want to upgrade the postgres to the latest version (8.1.2 for now) we need a downtime (~6-8 hours) to convert the databases to the new format.
- pnfs - locally compiled but placed into ups (configuration?)
- postgres - locally compiled but placed into ups (what version?)
- java (with dCache) - From Sun? What version? Configuration?
Issues to verify after a ~srv1 install
( This applies anywhere the pnfs database server lives )
- ls -l .bashrc ~enstore/.bashrc
-rw-r--r-- 1 root    root    6340 Jan 28 2003 .bashrc
-rw-r--r-- 1 enstore enstore 3703 Mar 10 2005 /home/enstore/.bashrc
- Does ~srv1 have python v2_2_3_E_1?
- postgres v8_1_3, which has nothing to do with pnfs ... -- Chih-Hao
- The PNFS startup script "/etc/rc.d/init.d/pnfs" was missing. I was able to extract it from the system tar on stkensrv3, using the command:
tar zxf /mnt/backup/backup/stken/srv/1/etc.tar.gz etc/rc.d/init.d/pnfs
- I have also removed the *local* files under /backup on stkensrv1. This is a single file, /backup/enstore/stken/special.
stop dcache
shut down stken dcache with:
- /sbin/service dcache-boot stop
start dcache
To start dcache on stkensrv1 after the OS upgrade we:
- restored missing links in the /usr/java area (CURRENT and OLD),
- removed gcj java and libs, and rpms depending on it:
- rpm -e libgcj-3.2.3-52 libgcj-devel-3.2.3-52 gettext-0.11.4-7
- redhat-lsb
- rpm -e gcc-java-3.2.3-52
- restored missing symbolic links in the ~enstore area:
- cd ~enstore/dcache-deploy/dcache-fermi-config
- ln -s ../classes .
- ln -s ../config .
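Because the missing links had to be restored by hand, a re-runnable version of the `ln -s` step may be useful for the other srv1 nodes. The sketch below is an assumption about how one might do that; `ensure_link` is a hypothetical helper that creates the link only when it is absent, so running it twice is harmless.

```shell
# Hypothetical helper: create a symlink named after the target's last path
# component inside directory $2, unless a link of that name already exists.
#   $1 = link target, e.g. ../classes
#   $2 = directory that should contain the link
ensure_link() {
    target="$1"
    dir="$2"
    name=$(basename "$target")
    if [ -L "$dir/$name" ]; then
        return 0
    fi
    ln -s "$target" "$dir/$name"
}

# Example (paths from the list above):
#   cd ~enstore/dcache-deploy/dcache-fermi-config
#   ensure_link ../classes . && ensure_link ../config .
```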
update farmlets
Update farmlets on stkensrv1 and stkensrv4. On stkensrv1, only the stken files were there, and they were old. On stkensrv4, they were all there, but also old.
~srv2
What issues remain with remedy_api
What to install for the remedy api.
apache
I have a couple of questions about apache running on stkensrv2.
- Yesterday, when I was updating the enstore files on stkensrv2, I wanted to shut down httpd before installing the updated cgi.py files in the appropriate product directory. I did "ups stop apache", which simply returned after half a minute or so, but all the httpd processes were still running.
- Q.) How do we stop apache?
- A.) You did the right thing; that is supposed to be the way to stop apache. At least that is what I have done in the past. I do not know why it did not work.
- I've been comparing the crontabs on the three srv2 nodes in order to reconstruct what was on the original stkensrv2 before the swap, because I discovered a little while ago that only the root crontab was brought over. The enstore crontab, for example, dates from 2003, and resurrected some jobs that have been doing "ls -lR /pnfs/cms" and "ls -lR /pnfs/minos" since midnight last night. I killed those, but it raised the question, what else is different? More on that in a separate email.
Isn't the original machine still available? It should be, I think. All it needs is to have its network and keytab set up as stkensrv9, then we should be able to recover whatever original files we need.
What I noticed is that the new stkensrv2 is running apache v1_3_26b, while the old d0ensrv2 and cdfensrv2 are running apache v1_3_31.
- Q.) Why is the new system running an older version of apache than the others?
- A.) You need to ask Terry this, as he installed the version. I did not look at the version on the new node, just tailored it. I agree that it should be the same version as on all the other nodes. We should install the latest soon, as this one may have security problems.
Which is the correct version?
Copy the correct files over
STKlog
/local/ups/prd/www_pages/enstore/log/STK-log.html
Copy cgi scripts
None of the cgi scripts had been copied into the correct area.
Gene Oleynik wrote:
Link fails for Tape inventory summary page and tape inventory. lqcd dcache (maybe it is down). So far that is all.
~srv3
things tweaked after install of stkensrv3.
- Wayne started up iptables because of the 3dm webserver (system disk mirror, same as newer stkendca nodes), but he forgot to chkconfig iptables on.
- The combination of a slightly newer version of bash and the particular server it is running on exposed an unbound variable in netscan that was causing it to die on stkensrv3, where previously it was ignored.
- netscan is complaining about the above mentioned webserver, listening on port "webcache" and possibly others. I will modify it to allow it.
- netscan is also complaining about rpc.rquotad (nfs quotas daemon). This has no separate chkconfig entry to turn it off; it is started along with nfs if the variable RQUOTAD is undefined. I have added the line "RQUOTAD=no" to /etc/sysconfig/network so it won't start next time, and killed the daemon.
- ngop did not start at boot time. I renamed /fnal/ups/db/.upsfiles/{startup,shutdown}/stkensrv8.fnal.gov.products to /fnal/ups/db/.upsfiles/{startup,shutdown}/stkensrv3.products
Dmitri installed ImageMagick on stkensrv3
- Burn rate plots on stken suffered from missing /usr/X11R6/bin/convert on newly configured stkensrv3 ... (d0en and cdfen were fine). Bytes per day plots are generated from a different source and the same cause affected all systems.
- Not finding xemacs on stkensrv3. Is xemacs installed on new nodes?
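The RQUOTAD tweak is the kind of one-line setting that gets lost on the next reinstall, so it may be worth scripting. A minimal sketch, assuming a plain KEY=value file such as /etc/sysconfig/network; `set_var` is a hypothetical helper that adds the line when missing and rewrites it when present, so repeated runs leave a single entry.

```shell
# Hypothetical helper: make sure FILE contains exactly one KEY=VALUE line.
#   $1 = file, $2 = key, $3 = value
# Note: the rewrite goes through a temporary file, so ownership and mode
# of the original should be checked afterwards.
set_var() {
    file="$1"
    key="$2"
    value="$3"
    if grep -q "^${key}=" "$file" 2>/dev/null; then
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$file.new" &&
            mv "$file.new" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example (as root): set_var /etc/sysconfig/network RQUOTAD no
```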
CRON and histograms
Copy of the ~enstore/CRON and ~root/CRON files on ~srv3
We forgot to make a fresh copy of the ~enstore/CRON and ~root/CRON files from the old to the new stkensrv3 systems. I have copied over the output files that had changed between 12/9 and today, and that hadn't already been superseded. And I've merged the histogram file data.
dCache install issues:
pageDcacheCms*
chmod 666 /var/log/messages* to allow pageDcacheCms* jobs to run.
/home/enstore/enstore/sbin/pageDcache dccp cms
++ /home/enstore/enstore/sbin/pageDcache dccp cms
cmsdcdr2.fnal.gov: Connection refused
trying normal rcp (/usr/bin/rcp) WARNING: NO ENCRYPTION!
cmsdcdr2.fnal.gov: Connection refused
=================== output from /tmp/dcache_page_output_dccpcms_11046 =======================
INFORMATIONAL: Product 'kftp' (with qualifiers ''), has no current chain (or may not exist)
INFORMATIONAL: Product 'dcap' (with qualifiers ''), has no current chain (or may not exist)
INFORMATIONAL: Product 'dcap' (with qualifiers 'unsecured'), has no current chain (or may not exist)
Moved the /etc/exports from stkensrv8 to stkensrv3.
dcache_page_dccpcms
INFORMATIONAL: Product 'kftp' (with qualifiers ''), has no current chain (or may not exist)
INFORMATIONAL: Product 'dcap' (with qualifiers ''), has no current chain (or may not exist)
INFORMATIONAL: Product 'dcap' (with qualifiers 'unsecured'), has no current chain (or may not exist)
PageDcacheSRM & pageDcacheKftp
pageDcacheSRM hasn't run since the upgrade because srmcp is missing.
pageDcacheKftp hasn't run since the upgrade; gssmodule.so is unhappy:
Traceback (most recent call last):
  File "/fnal/ups/prd/kftp/v3_6/NULL/bin/ftpcp.py", line 1, in ?
    from gssftp import GSSFtpClient, FTPError
  File "/fnal/ups/prd/kftp/v3_6/NULL/lib/gssftp.py", line 1, in ?
    import gss
ImportError: /fnal/ups/prd/gsspy_krb/v1_0b+p2_3/Linux/lib/gssmodule.so: undefined symbol: PyType_IsSubtype
dcap and kftp products
The current versions of dcap and kftp were installed in UPS but were not declared. I have declared them.
ups declare -f Linux+2.4 -c dcap v2_32_f0408
ups declare -f NULL -c kftp v3_6
globus - grid certs
I have looked briefly at the globus - grid certs on stkensrv8 and stkensrv3. It looks like stkensrv3 was copied to stkensrv8.
/home/enstore/.globus/certificates/
--- snip -->% --Too many to list here --->%---
/home/enstore/globus/
total 64
drwxrwxr-x 13 enstore enstore 4096 May 28  2003 .
drwxr-xr-x 27 enstore enstore 4096 Dec  9 15:49 ..
drwxrwxr-x  2 enstore enstore 4096 May 28  2003 bin
drwxrwxr-x  6 enstore enstore 4096 May 28  2003 etc
-rw-r--r--  1 enstore enstore 6715 Apr 24  2002 GLOBUS_LICENSE
drwxrwxr-x  4 enstore enstore 4096 May 28  2003 include
drwxrwxr-x  3 enstore enstore 8192 May 28  2003 lib
drwxrwxr-x  3 enstore enstore 4096 May 28  2003 libexec
drwxrwxr-x  6 enstore enstore 4096 May 28  2003 man
drwxrwxr-x  2 enstore enstore 4096 May 28  2003 sbin
drwxrwxr-x  3 enstore enstore 4096 May 28  2003 setup
drwxrwxr-x  5 enstore enstore 4096 May 28  2003 share
drwxrwxrwx  2 enstore enstore 4096 May 28  2003 tmp
drwxrwxr-x  2 enstore enstore 4096 May 28  2003 var
Certificates
ls -l /etc/grid-security/*
/usr/krb5/bin/rcp -pr root@stkensrv3:/etc/grid-security .
chkconfig --level 345 nfs on
chkconfig --level 345 netfs on
chkconfig --level 345 smartd on
chkconfig --level 345 portmap on
chkconfig gpm off
chkconfig microcode_ctl off
chkconfig iptables off
chkconfig ip6tables off
Installs for pageDcache cronjobs
ups declare -f Linux+2.4 -c dcap v2_32_f0408
ups declare -f NULL -c kftp v3_6
upd list gsspy_krb
DATABASE=/ftp/upsdb
Product=gsspy_krb Version=v1_0b+p2_3 Flavor=Linux
Qualifiers="" Chain=current
upd install gsspy_krb v1_0b+p2_3
upd list gsspy_gsi
DATABASE=/ftp/upsdb
Product=gsspy_gsi Version=v1_0b Flavor=Linux
Qualifiers="" Chain=current
upd install -G "-c" gsspy_gsi v1_0b
informational: installed gsspy_gsi v1_0b.
upd install succeeded.
ups list -a gsspy_gsi
DATABASE=/local/ups/db
Product=gsspy_gsi Version=v1_0b Flavor=Linux
Qualifiers="" Chain=""
upd install -c blt
informational: installed tcl v7_4dfa.
informational: installed tk v4_0dfa.
informational: installed blt v1_9.
ups declare -c srmcp v1_20
<stkensrv3.fnal.gov> ups list -a srmcp
DATABASE=/local/ups/db
Product=srmcp Version=v1_20 Flavor=NULL
Qualifiers="" Chain=current
ls /usr/java/j2sdk1.4.2_01/bin/
ls: /usr/java/j2sdk1.4.2_01/bin/: No such file or directory
ls /usr/java/j2sdk1.4.1/bin/
ls: /usr/java/j2sdk1.4.1/bin/: No such file or directory
However,
ls /usr/java/j2sdk1.4.2_08/
chmod 755 /var/spool/mqueue/
~srv4
ACI product may not install properly.
David packaged aci v3_1_2 along with v3_1_0 into the tar files on stkensrv3:/diskc/backup. It includes statically linked executables, both the archives and the shared libraries, and some utilities that weren't in the previous version, all built ostensibly for Linux 8.0
- Who declared aci 3_1_2, & why?
- Whoever did this should do the rest (modify enstore.table) if this was necessary.
I do not recall any developer declaring aci 3_1_2
~srv6
- postgres - locally compiled but placed into ups (what version?)
- pnfs - locally compiled but placed into ups (configuration?)
- postgres - locally compiled but placed into ups (what version?)
- java (with dCache) - From Sun? What version? Configuration?
The only glitch that I have encountered was the database server
startup scripts in /etc/rc.d/init.d (and links in rc3.d and rc6.d) were
missing. I guess they were not preserved during the upgrade and I
imagine that stkensrv0 might suffer the same. I'll pay attention to
stkensrv0 this time. However, in the future upgrade, we should remember
to preserve all relevant scripts in /etc/rc.d ...
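One way to act on this before the next upgrade is to archive /etc/rc.d ahead of the reinstall, in the same tar layout that the pnfs script was later recovered from. The sketch below is an assumption, not the backup script used for these upgrades; `save_rc_scripts` is a hypothetical name, and the root directory is a parameter so it can be tried on a scratch tree first.

```shell
# Hypothetical helper: archive the etc/rc.d tree found under a root
# directory into a gzip-compressed tar file, with paths stored relative
# to that root (etc/rc.d/init.d/..., etc/rc.d/rc3.d/..., and so on).
#   $1 = root directory (use / on a live system)
#   $2 = archive file to write
save_rc_scripts() {
    root="$1"
    archive="$2"
    tar czf "$archive" -C "$root" etc/rc.d
}

# Example (as root, with the backup area mounted on /backup):
#   save_rc_scripts / /backup/rc.d-$(hostname).tar.gz
```

A single script can then be pulled back out by its relative path, for example etc/rc.d/init.d/pnfs.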
~srv5 & ~srv7
Outstanding Questions?
tcp-wrappers are installed
Looking in Installed Packages:
Name                    Arch    Version   Repo
--------------------------------------------------------------------------------
tcp_wrappers            i386    7.6-34.1  db
zz_tcp_wrappers_change  noarch  3.0-2     db
Tcp_wrappers does install; I am not sure what rpm package it is in.
rpm -ql zz_tcp_wrappers_change-3.0-2
/etc/banners
/etc/banners/fingerd
/etc/banners/ftpd
/etc/banners/in.fingerd
/etc/banners/in.ftpd
/etc/banners/in.rlogind
/etc/banners/in.telnetd
/etc/banners/rlogind
/etc/banners/telnetd
/etc/doe.motd
Are we correctly setting the hosts.allow ?
Are the restrictions put in hosts.allow more than .fnal.gov? This will mess up the scanning from randy.
Here are the entries in the hosts.allow file.
# Loopback interface
ALL: localhost 127.0.0.0/255.0.0.0: banners /etc/banners
# FermiLab Network
ALL: .fnal.gov: banners /etc/banners
ALL: 131.225.0.0/255.255.0.0: banners /etc/banners
# Minos Soudan (only needed for STKEn)
ALL: 198.124.212.0/255.255.255.0: banners /etc/banners
ALL: 198.124.213.0/255.255.255.0: banners /etc/banners
# Enstore Private Network
ALL: 192.168.19.0/255.255.255.0: banners /etc/banners
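To answer whether a given scanner or client address is covered by the net/mask entries, the match can be checked by hand with shell arithmetic. This is a rough sketch of the net/mask comparison only; it does not reproduce tcp_wrappers itself, and hostname patterns such as .fnal.gov are outside its scope. `ip_to_int` and `in_net` are hypothetical helper names.

```shell
# Convert a dotted-quad IPv4 address to an integer.
# Assumes plain decimal octets without leading zeros.
ip_to_int() {
    IFS=. read -r a b c d <<EOF
$1
EOF
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed if address $1 lies inside NET/MASK given as $2,
# e.g. in_net 131.225.13.7 131.225.0.0/255.255.0.0
in_net() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    mask=$(ip_to_int "${2#*/}")
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Example: in_net 198.124.212.40 198.124.212.0/255.255.255.0 && echo allowed
```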
I have sent this note to Troy and Connie.
We may have questions about the sendmail config files.
D0enmvr7a uses the installed defaults.
cron.daily & logrotate.d
I copied the /etc/logrotate.d dir to /etc/logrotate.d.backup
I then removed psacct and yum.rpm from /etc/logrotate.d
Make a copy of /etc/cron.daily:
cp -pr /etc/cron.daily /etc/cron.daily.backup
And moved these files into cron.daily.backup:
mv /etc/cron.monthly/0anacron /etc/cron.daily.backup/monthly.0anacron
mv /etc/cron.weekly/0anacron /etc/cron.daily.backup/weekly.0anacron
/etc/cron.daily/tetex.cron
/etc/cron.daily/yum.cron
Should we remove or modify this link?
lrwxrwxrwx 1 root root 28 Aug 8 2005 /etc/cron.daily/00-logwatch -> ../log.d/scripts/logwatch.pl
The correct configuration of send mail.
The default install of sendmail should be correct.
On movers, sendmail is running but shouldn't accept mail:
UNAUTHORIZED NETWORK SERVICE, type 2! stkenmvr17a LISTEN sendmail 1095 root 4u IPv4 1427 TCP localhost:smtp (LISTEN) root 1095 0.0 0.0 6132 388 ? S Oct11 0:02 sendmail: accepting connections
On the movers we removed these two files:
/etc/mail/sendmail.cf
/etc/mail/sendmail.mc
We may want to change that.
Any questions about PostgreSQL & PyGreSQL
PostgreSQL v8_0_3 and PyGreSQL 3.6.2 have been built and installed on stkensrv9 ...
Run the tailor script. There will need to be a symlink made to connect the web area (currently /local/ups/prd/httpd/servers/stken/html) to /diska/www_pages when the raid is hooked up. It has the same alias as the current stken so I did not run it.
PostgreSQL & PyGreSQL
I have built and installed PostgreSQL 8.0.3 and PyGreSQL 3.6.2 on stkensrv6.
The only glitch that I have encountered was the database server startup scripts in /etc/rc.d/init.d (and links in rc3.d and rc6.d) were missing. I guess they were not preserved during the upgrade and I imagine that stkensrv0 might suffer the same. I'll pay attention to stkensrv0 this time. However, in the future upgrade, we should remember to preserve all relevant scripts in /etc/rc.d ...
I have restarted the database servers, accounting_server and drivestat_server. They all look fine.
PostgreSQL and PyGreSQL are installed on *srv[01236].
Current PostgreSQL version used by ENSTORE is 8.0.3 and PyGreSQL version is 3.6.2.
I guess I need to package PyGreSQL somehow ...
-- Chih-Hao
How should servers have xinetd.d/ftp set?
We decided that movers do not need ftp running. Netscan has been comparing the xinetd.d/ftp files and warns that ftp in /etc/xinetd.d and /home/enstore/enstore/etc/ do not match.
It appears that this isn't correct. I cd'd to /home/enstore/enstore/etc as enstore and entered these commands:
- cvs commit ftp
- $ENSTORE_DIR/tools/bless.py ftp
One problem with this: these same files in enstore/etc are used by both movers and servers. All the other services are the same for both. If this service is different, we need a mechanism to distinguish the config files.
I have disabled ftp on stkensrv1. If we eventually decide we need it, I'll modify netscan to allow it. Otherwise, let's assume it should be disabled.
Which setting should we use for /etc/xinetd.conf ?
Definition:
cps
Limits the rate of incoming connections. Takes two
arguments. The first argument is the number of
connections per second to handle. If the rate of
incoming connections is higher than this, the service will
be temporarily disabled. The second argument is the
number of seconds to wait before re-enabling the service
after it has been disabled. The default for this setting is
50 incoming connections and the interval is 10 seconds.
cat /home/enstore/enstore/etc/xinetd.conf /etc/xinetd.conf
#
# Simple configuration file for xinetd
#
# Some defaults, and include /etc/xinetd.d/
defaults
{
        instances       = 60
        log_type        = SYSLOG authpriv
        log_on_success  = HOST PID
        log_on_failure  = HOST
        cps             = 1000 30
}
includedir /etc/xinetd.d
#
# Simple configuration file for xinetd
#
# Some defaults, and include /etc/xinetd.d/
defaults
{
        instances       = 60
        log_type        = SYSLOG authpriv
        log_on_success  = HOST PID
        log_on_failure  = HOST
        cps             = 25 30
}
includedir /etc/xinetd.d
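The two files differ only in the cps line (1000 30 in the enstore copy, 25 30 in /etc), which is easy to miss in a long listing. A small sketch for pulling that setting out of each file, assuming the "cps = <rate> <wait>" layout shown above; `cps_of` is a hypothetical helper name.

```shell
# Hypothetical helper: print the two cps arguments (connections per
# second, then seconds to wait before re-enabling) from an xinetd.conf
# style file.
cps_of() {
    awk '$1 == "cps" && $2 == "=" { print $3, $4; exit }' "$1"
}

# Example:
#   cps_of /home/enstore/enstore/etc/xinetd.conf
#   cps_of /etc/xinetd.conf
```

Comparing the two outputs shows at a glance whether a node is still running with the stock limit or with the enstore setting.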
SDS2 and the sk98lin driver
some additions regarding SD-2 systems
- sysconnect 9D. These cards are obsolete and SL305 does not have a driver for them.
- What to do: replace the NIC with a newer one.
To use sysconnect as a primary NIC disable onboard NIC's in bios.
On boot:
- Enter into setup mode.
- Select Advanced.
- Select PCI Configuration.
- Set both on board cards "disabled".
- Press F10 to save and exit.
Do not use old reference to sk98lin driver .
Use what Wayne set up:
/diskc/backup/share/lib_modules_2.4.21-32.0.1.EL_kernel_drivers_net_sk98lin_sk98lin.o -> lib/modules/2.4.21-32.0.1.EL/kernel/drivers/net/sk98lin/sk98lin.o
/diskc/backup/share/lib_modules_2.4.21-32.0.1.ELsmp_kernel_drivers_net_sk98lin_sk98lin.o -> lib/modules/2.4.21-32.0.1.ELsmp/kernel/drivers/net/sk98lin/sk98lin.o
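Since the file names encode the destination, the source and target for a given kernel can be derived from the kernel release string. The sketch below is an assumption about how one might automate the copy: `sk98lin_paths` is a hypothetical helper, the share directory is a parameter because the mount point of the *srv3 disk varies, and the target is taken to be relative to / as the listing suggests.

```shell
# Hypothetical helper: print "<source> <destination>" for the rebuilt
# sk98lin.o matching a kernel release.
#   $1 = kernel release, e.g. 2.4.21-32.0.1.ELsmp (as printed by uname -r)
#   $2 = directory holding the rebuilt drivers, e.g. /diskc/backup/share
sk98lin_paths() {
    rel="$1"
    share="$2"
    src="$share/lib_modules_${rel}_kernel_drivers_net_sk98lin_sk98lin.o"
    dst="/lib/modules/${rel}/kernel/drivers/net/sk98lin/sk98lin.o"
    printf '%s %s\n' "$src" "$dst"
}

# Example (review the output, then copy as root):
#   sk98lin_paths "$(uname -r)" /diskc/backup/share
```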
Should we improve the /etc/resolv.conf file?
By adding lines for a secondary name server.
cat /etc/resolv.conf
search fnal.gov
nameserver 131.225.8.120
nameserver 131.225.17.150
nameserver 131.225.5.16
Change settings for "ups product ipmi" ?
What can be done to improve the install of ipmi? We need to set up the correct ipmi.
ups list -a ipmi
DATABASE=/local/ups/db
Product=ipmi Version=v1.5 Flavor=Linux+2.4
Qualifiers="" Chain=current
Product=ipmi Version=devel Flavor=Linux+2
Qualifiers="" Chain=""
cd ~enstore/isa-tools/bin
Dimitri will install gnuplot-4.0
Also, gnuplot-4.0 is available in /root/gnuplot-4.0 on *ensrv2 and *ensrv3.
After the install of LTS 3.x one should do (as root):
- rpm -e gnuplot
- cd gnuplot-4.0
- make clean
- make distclean
- ./configure --prefix=/usr
- make
- make install
crontab notes
If you edit a crontab in /var/spool/cron, and leave behind a renamed version of the original, or otherwise create files that don't belong there, they will try to run. The general rule is that anything in there or /etc/cron.d, etc. is considered a crontab.
For example, if you edit root and leave root.bck, both root and root.bck will try to run. Recent releases of Linux are smart enough to not actually run a file in /var/spool/cron whose name is not in the passwd file, but in general I wouldn't count on it. That's why there's a /var/spool/cron.disable on many of our systems.
If you look in /var/log/cron on stkensrv2, there are "ORPHAN" entries
for enstore.old, baisley.save, and root.incomplete. I moved those files
to cron.disable.
The same principle applies to /etc/xinetd.d and /etc/xinetd.d.backup.
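Stray files like these can be found before cron logs them as ORPHAN. A minimal sketch, assuming the spool holds one file per user and that account names are the first colon-separated field of a passwd-format file; `orphan_crontabs` is a hypothetical helper, and it only lists names, leaving the move to cron.disable to the operator.

```shell
# Hypothetical helper: list files in a cron spool directory whose names
# are not account names in a passwd-format file.
#   $1 = spool directory, e.g. /var/spool/cron
#   $2 = passwd-format file, e.g. /etc/passwd
orphan_crontabs() {
    spool="$1"
    passwd="$2"
    for f in "$spool"/*; do
        [ -f "$f" ] || continue
        name=$(basename "$f")
        # awk exits 0 only if some line has this name as its first field
        awk -F: -v n="$name" '$1 == n { found = 1 } END { exit !found }' \
            "$passwd" || echo "$name"
    done
}

# Example (as root): orphan_crontabs /var/spool/cron /etc/passwd
```

On nodes that take their accounts from a network directory rather than the local passwd file, the second argument would need to be a dump of that source instead.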
Postgres and PNFS boot script
Where do we get these scripts? Vladmir will work on this.