Primary Report for the Week of March 28, 2005
	*********************************************



Installs:
  0 installs  0 decommissions
Replaced drives:
  0 9940B  0 9940A  1 LTO1/LTO2  0 other
Replaced servers/movers/fileserver nodes:
  0 server  0 mover  0 fileservers
1 Robot hardware maintenance service calls
0 Server/Mover/Fileserver maintenance (repairs/parts replacement)
Investigations/Interventions:
 0 mover  1 server  1 tape drive  0 fileserver  10 tape  0 file  0 library
tape operations
224 tapes clobbered/recycled (143 CDF, 81 CMS)
  0 tapes labeled/entered/removed
  0 tape MIRs fixed
  1 tape MIRs not fixed
  0 tape drive firmware updates
  8 tapes cloned
  0 quota increase requests serviced
  0 raid disk replacements/interventions
  0 enstore service requests
  0 new muon pages
2 off-hour calls/interventions


                 Sunday
                 ------


                 Monday
                 ------
D0    Tapes PRO818L1, PRO835L1, PRO836L1, and PRO899L1 were copied.
      Last week tapes PRO270L1, PRO774L1, PRO797L1, and PRO808L1
      were copied.  An additional attempt to rewrite the TOC was
      made with 4772 firmware, but it fared no better than 4APO.
      Drive dumps were obtained after the failed TOC updates, but
      proved inconclusive.  Tape PRO808L1 was given to ADIC to
      analyze.  We will try writing new data to one of the other
      problem tapes, to help confirm our growing suspicion that
      the problem is media.  We are preparing a quick order of
      replacements in anticipation of returning the recent TDK tapes.
      More related trouble followed on Thursday.

      Efforts are continuing to clone 9940A tape PRM335 so the original
      can be sent to STK for possible data recovery.  The tape, dating
      from July 30 - August 1, 2002, seems to have a vertical stripe
      erasure, giving EBLANK errors on 8 more-or-less regularly spaced
      files.

SDSS  Various attempts were made to deal with a set of duplicate file
      names for 3 (or more) DLT tapes, especially education.

KTeV  A complaint about a lack of data delivery was investigated; it
      turned out to be caused by the analysis code vetoing events.

STK   Network cabling for 4 DLT movers was reworked to provide slack
      for the nodes' move to the top shelf as Xeons are replaced by
      SDS2s.  Cable tidies are also being installed.


                 Tuesday
                 -------
STK   Dcache headnode stkendca3a was found to have a network cabling
      problem.  Elimination of the intervening 4-wire patch panel allowed
      the network to negotiate a gigabit connection without a reboot.
      This in turn allowed the exp-db ftp puts to finish within the
      300-second window imposed by the CDF online firewall.

      Mover DG4ADLT went offline with error "mover stuck" and current
      volume "None".  There were 6 "too long in setup for none" alarms,
      followed by what appear to be mover stuck alarms with no volumes.
      There were 4 noaccess alarms for volume "None".

      Server stkensrv0 crashed at about 18:03 on Tuesday while running
      tarit.  We hadn't seen that problem since December.  The system
      clock consistently loses about 19 seconds over a few minutes, as
      it has since the problem first showed up last November.  None of
      the other srv0 nodes, which are much slower systems, have this
      problem.  Something is clearly blocking interrupts during the
      system disk tar/gzip.  At this point the network interface is a
      suspect; ntp's "synchronization lost" message would be consistent
      with that hypothesis.  We will switch to the other onboard
      interface at the next opportunity.

      David removed at least a dozen stale plots from the enstore web
      pages, along with other cleanup.


                 Wednesday
                 ---------
CDF   143 CDF tape volumes were recycled and checked for flipped tabs.

D0    Drive D41DLTO was replaced again.  It had problems back on the 21st
      and was replaced then.  The replacement drive had a write error on
      tape PRO897L1, which remained in the drive over the weekend due to a
      mover bug.

      Cloning of tape PRO899L1 produced a strange result: the copy went
      fine, but the verify pass was unacceptably slow for many files.
      That may be a SCSI/hardware problem; there are many reset
      messages on the console, but nothing in syslog (?!).
      Reseating the SCSI cables and recopying to a different tape went
      much better, though there were still a few resets.

      Terry cleaned up wiring and console port connections/assignments
      on the northeast ADIC shelves.


                 Thursday
                 --------
D0    Wayne was paged at 3:30 AM because a couple of tapes, PRO797L1 and
      PRO906L1, caused 11 movers to go offline between 18:53 Wednesday
      and 4:01 Thursday.  The drives reported positioning errors after
      encountering FTT_EBLANK and FTT_EUNRECOVERED errors.  The tapes
      were not marked NOACCESS because FTT_EUNRECOVERED errors have
      indicated drive hardware problems in the past.  Instead, each tape
      was retried in other drives, eventually taking enough movers
      offline to cause a red ball.

      Incidentally, PRO797L1 had been cloned last Thursday and the
      copy put back in service Friday.  Since it was almost half empty,
      Wayne chose not to mark it read-only.  Between Friday 3/25 and
      3:06 Thursday morning (3/31), an additional 1016 files were
      written to the tape without incident.  The problem with the clone
      is that even though all of its files are readable, seeking to the
      last file or to the last filemark fails.  It will likely need to
      be cloned again.  This appears to be more bad media.


                 Friday
                 ------


                 Cron jobs
                 ---------
CDF    pageDcacheDccp had hung last week.  SOP.

STK    tarit killed stkensrv0 Tuesday evening.
       offline_inventory hung when stkensrv0 died.  SOP.
       pageDcacheGridftp hung around 00:14 Wednesday morning.  SOP.
       pageDcacheKftp hung around 00:24 Wednesday morning.  SOP.
       pageDcacheSRM had a couple of errors on Tuesday.

CMS    Lots of jobs failed during Thursday's downtime.

SOP (standard operating procedure) here means that the most distant
descendant in the hung job's process list is killed, which usually
results in a retry that succeeds.
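The SOP could in principle be automated along these lines.  This is only a sketch, assuming the process table has already been read into a PID-to-parent-PID mapping (for example from `ps -eo pid,ppid`); the function name deepest_descendant and the lowest-PID tie-break are illustrative choices, not part of any existing enstore tool.

```python
from collections import defaultdict, deque
from typing import Dict


def deepest_descendant(parent_of: Dict[int, int], root: int) -> int:
    """Return the PID furthest below `root` in the process tree.

    `parent_of` maps each PID to its parent PID.  If `root` has no
    children, `root` itself is returned.  Ties at the same depth are
    broken by choosing the lowest PID.
    """
    children = defaultdict(list)
    for pid, ppid in parent_of.items():
        if pid != ppid:  # skip self-parented entries to avoid a loop
            children[ppid].append(pid)

    best_pid, best_depth = root, 0
    queue = deque([(root, 0)])  # breadth-first walk: (pid, depth)
    while queue:
        pid, depth = queue.popleft()
        if depth > best_depth or (depth == best_depth and pid < best_pid):
            best_pid, best_depth = pid, depth
        for child in children[pid]:
            queue.append((child, depth + 1))
    return best_pid


# Hypothetical tree: cron job 100 -> shell 200 -> pageDcacheDccp 300 -> 400
tree = {200: 100, 300: 200, 400: 300, 500: 1}
print(deepest_descendant(tree, 100))  # 400
```

The kill itself (for instance `os.kill(pid, signal.SIGTERM)`) is deliberately left out so the sketch can be run safely.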


                 Questions
                 ---------
A question needs to be resolved about deleted files and migration.
Should tapes that are migrated automatically (for excessive mounts,
say) get recycled?  If so, deleted files on them will become
unrecoverable with less deliberate consideration.

Also, tapes with bad MIR/TOCs will be difficult to handle if
significant file skipping is required (due to deleted and/or
already-copied files at the front of the tape, for example).

Should FTT_EUNRECOVERED errors take the tape out of service, along
with the drive?
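One possible answer, sketched purely as a hypothetical: count the distinct drives on which a tape has hit FTT_EUNRECOVERED, and take the tape out of service once it has failed in more than one drive, since a single failing drive may still be a drive problem.  The class name, the threshold of 2 drives, and the drive names in the example are assumptions, not existing enstore behavior.

```python
from collections import defaultdict


class EunrecoveredTracker:
    """Hypothetical policy: flag a tape for NOACCESS once it has hit
    FTT_EUNRECOVERED in `max_drives` distinct drives.

    A single failing drive may still be a drive hardware problem; the
    same tape failing in several drives points at the media instead.
    """

    def __init__(self, max_drives: int = 2) -> None:
        self.max_drives = max_drives
        # tape label -> set of drives where the error was seen
        self.drives_by_tape = defaultdict(set)

    def record(self, tape: str, drive: str) -> bool:
        """Record one error; return True if the tape should go NOACCESS."""
        self.drives_by_tape[tape].add(drive)
        return len(self.drives_by_tape[tape]) >= self.max_drives


tracker = EunrecoveredTracker()
print(tracker.record("PRO797L1", "driveA"))  # False: may still be the drive
print(tracker.record("PRO797L1", "driveB"))  # True: second drive, suspect the tape
```

Counting distinct drives rather than raw errors keeps repeated retries in one bad drive from condemning a good tape, while still stopping a bad tape before it takes down a whole row of movers.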

Dcache questions:  How do we find log entries for users' Dcache
accesses, successful and otherwise?  How do files transition from
write to read pools?  How long do they remain precious?  When are
they flushed from write and read pools?

D0 downtime next Tuesday?














Black day as EU fools with place names

EUROPEAN bureaucrats will push forward legislation today to force
the Scottish Executive to change place-names that offend or
discriminate on the grounds of race and gender.

In a move the Nationalists described as the "ultimate madness in
political correctness", it has taken only a quorum of four Euro
commissioners from Italy, Germany, France and Spain to redraw
Scotland's map.

The German commissioner, Arlo Pilof, the architect of the 2006
Race and Gender Equality Imposition Code (conformity), an amendment
to existing rules, said: "We believe many names do not conform,
and we started with Scotland because it is the worst of the
culprits with offensive names such as Skinflats, near Grangemouth."

However, he promised the Scottish Executive could apply for grants
of up to 43.6 million euros (28 million pounds) to facilitate change.

That was dismissed yesterday by the Scottish Chambers of Commerce as
a "drop in the ocean". A spokesman said: "Changing stationery and
business cards could cost that alone."

The commissioners in Brussels have demanded "race and gender-
sensitive" names found for towns such as Motherwell, Blackburn,
Helensburgh, Fort William, Campbeltown, Peterhead, Lewis and
Fraserburgh be changed.

A Scottish parliamentary group, set up in anticipation of the
legislation, has made a start. Fort William, in the shadow of
Britain's highest mountain, would become Fort Nevis by 2006, under
one suggestion.

Edinburgh City Council is considering revising Arthur's Seat because
the commissioners said its ancient name contained sexual undertones
"likely to offend those visiting Edinburgh".

Under the new amendment the word "Glen" could be banned as gender-
biased. Scotland Office officials have suggested a change to Vale,
as in Valecoe and the Great Vale.

An SNP spokesperson said: "This is monstrous buffoonery, an
outrageous waste of resources and politically correct madness.

"I understand, for example, that North Lanarkshire Council will
consider plans to change Motherwell to Parentwell," the spokesperson
said. "What is Dunbartonshire going to do with Helensburgh?"

Under European rules going back to 1986, a quorum of four member
state commissioners have the right to table what is known as a
"L.I.L Proof A", a prelude to any legislation which proposes to
amend or remove a name or description "relating to a city, town
or centre of habitation with more than eight people of voting age".

The four commissioners tabled the L.I.L Proof A in December and today
the legislation will go before a committee of ten commissioners.
It is expected to be law by 1 April, 2006.

The Scottish Executive had sought to win exemptions for places
beginning with "Black", but the bureaucrats were adamant they
were racist.

"We could hardly have places like Colouredford or the Coloured
Isle, the Coloured Cuillins," said a spokesman.

However, the Executive has come up with an alternative, to revert
to the Gaelic rendition of black - dubh - which it believes will
be acceptable.

The spokesman added: "They won't know the difference, hopefully.
And Burndubh and Dubhford don't sound too bad."

However, the greatest difficulty will be experienced by the
producers of Ordnance Survey maps.

A spokesman said: "This is a nightmare, amending every map. I
understand there will be a hiatus, where old maps are acceptable.
But new maps will have to be in place by 2007.

"More cartographers will be needed and the process of re-tooling
machines will begin next year.

"Inevitably, the cost will be high and prices will go up. We
estimate, for example, a map such as the Landranger series for
North Skye will retail at £94.20 by 2007."

Mr Pilof revealed that England would be next on the agenda, citing
the Isle of Man as particularly worthy of change.

A Manx spokesman said yesterday: "I hope this is a long way off.
We are two-time losers, what with the island's name and Douglas
as the capital. It's ridiculous, isn't it?

"It's as if these people sat there all day and made up this stuff."

By: PAUL DRURY AND JIM MCBETH -- 01-Apr-05