From sullivan Fri Oct 22 14:53:21 MDT 1999
From: John Sullivan <sullivan@lanl.gov>
To: Glenn Young, Jehanne Simon-Gillo, Hubert van Hecke, Bernd Schlei,
    Chuck Britton, Melissa Smith, Nance Ericson, Mike Emery, Miljko Bobrek,
    lind@icsun1.ic.ornl.gov, Cheng-Yi Chi, Jamie Nagle, Chris Witzig,
    John Haggerty, Mickey Chiu
Cc: sullivan, Young Gook Kim
Date: Fri, 22 Oct 1999 14:53:18 MDT
Subject: chaintest results

Hi,

I have been working on the MVD chaintest with mixed results. There is a
web page describing the setup, with a crib sheet on how to run the programs:

http://p25ext.lanl.gov/phenix/mvd/elect/chaintest/chain.html

I take this summary of results from the end of that page:

Results so far

So far, I have not been able to get the system to run correctly for a
large number of events. I was able to run the program overnight (about
15 hours) and the dcm _seemed_ to still be taking data -- it still put
out a line saying it had collected XXX thousand events every now and
then, and the count of events was "correct" -- it agreed with the scaler
I attached to the pulser generating level-1's. Level-1 was only being
sent at a few Hz (because I thought I would fill up the disk). The
system "ran" for about 120K events. However, I got lots (roughly one per
1000 events) of error messages like this:

0xb51f30 (tStartRun): memPartAlloc: block too big - 524 in partition 0x1bb80c.

I do not know what this means. I tried running in multiplexed (2 MCMs
using one Glink fiber) and non-multiplexed mode with the same result.
The output file did not contain any data from this overnight run. The
dcm program appeared to crash (or maybe it took so long to close the
file that I thought it had crashed).

Today, I ran a series of tests with a few tens of thousands of events
using various trigger rates. In every case, I got the first error like
the one above between events 8000 and 9000 (according to the output from
the dcm program, which writes out the event number every 1000 events).
In each case, when I stopped the run, the dcm program reported the
number of events, and it was "correct". When I tried to read the data
back, I could only read some of the events from each file. In the best
case, I was able to read 7905 events from the file. I got this result
three times out of 4. The 4th time, I could only read 617 events from
the file. In two cases, all 7905 events had the correct contents and
format. In the third case, there were 7905 events, but it looks like the
MCMs were not sending data to the DCIM -- the DCIM sent empty data
packets. This is what it is supposed to do when it receives ENDDAT0/1
but no data packet from the MCMs. In the case where I could only read
617 events, all the events were also "good". You can check the
definition of a "good" event if you are willing to look at a C program,
which is available on the web page. It means parity and a lot of other
stuff was OK. All of the output files contained exactly 9331200 bytes
(8E6200 hex).
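To give a rough idea of what the format part of that check does, here is
a stripped-down sketch in C. It is not the real program: the word
offsets, and whether the "1st 16 bits" are the high or low half of each
word, are placeholders here -- the real definitions are in the C program
on the web page.

    /* sketch of the per-packet format checks; offsets and bit layout are
       placeholders, not the actual MVD packet definition */
    /* returns 0 if the packet format looks good, a nonzero code otherwise */
    int check_format(const unsigned long *word)
    {
        if ((word[0] & 0xffffUL) != 0xffffUL)  /* first word: 1st 16 bits must all be 1 */
            return 1;
        if (word[1] != 2UL)                    /* detector ID word must be 2 (= MVD) */
            return 2;
        /* user word 8 (constructed by the DCIM): bits 10-13 flag error conditions */
        if (word[8] & (1UL << 10)) return 3;   /* no valid data packet present in DCIM */
        if (word[8] & (1UL << 11)) return 4;   /* packet too short */
        if (word[8] & (1UL << 12)) return 5;   /* no stop sequence seen */
        if (word[8] & (1UL << 13)) return 6;   /* 2 consecutive start sequences seen */
        return 0;
    }

The parity checks ("horizontal" on each word, plus a "vertical" parity
word over the whole packet) and the ADC test-pattern comparison are
separate loops over the packet; a sketch of those appears with the
December summary below.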
I suspect that some of these problems (i.e. the MCMs stopped sending
packets in one case) are due to noise on our serial control lines -- I
saw some of this on the scope. We will have to modify our "homemade
arcnet" board. In other cases (all output files truncated and exactly
the same length), I suspect the dcm software.

I will not be in Los Alamos next week (Oct 25-29). I will try to read my
email. Comments and suggestions would be appreciated,

John
--
phone: (505) 665-5963 fax: (505) 665-7920 email: sullivan@lanl.gov

From chiu@nevis1.nevis.columbia.edu Fri Oct 22 18:38:00 MDT 1999
From: Mickey Chiu <chiu@nevis1.nevis.columbia.edu>
To: John Sullivan
Cc: Mickey Chiu, Jamie Nagle, Cheng-Yi Chi, Young Gook Kim
Date: Fri, 22 Oct 1999 20:38:26 -0400
Subject: Re: mvd chaintest

hi john,

are you using the latest version of the dcm code, the one i had
downloaded when i logged on to the system? i think i had named it
libdcm.o.new. some of the older versions of the code had memory leak
problems.

On Thu, 21 Oct 1999, John Sullivan wrote:

> Hi,
> You guys may be interested in knowing that I ran about 50K
> packets through the MVD DAQ chain into the DCM. The chain is as
> I described it in a message to the MVD list-server and to a few
> other interested parties on Monday, October 18. Briefly, it
> contained 2 MCMs, power/comm board, motherboard, TCIM, DCIM,
> a homemade card for Arcnet, mini-DAQ, a DCM+controller+crate,
> 2 PCs, one hp-unix system, misc cables, assorted software. I
> wrote the data to disk (on the hp) and analyzed it (with a simple
> program I wrote) for
>
> 1) "horizontal" and "vertical" parity errors
> 2) correct format of packets, including:
>    a) the first word has its 1st 16 bits all = 1
>    b) the detector ID word (=2) is correct
>    c) user word 8 (a word constructed by the DCIM) has the correct
>       settings of these bits:
>       i)   bit 10 = 0 (means valid data packet present in DCIM)
>       ii)  bit 11 = 0 (1 means packet too short)
>       iii) bit 12 = 0 (1 means no stop sequence seen)
>       iv)  bit 13 = 0 (1 means 2 consecutive start sequences seen)
> 3) ADC values returned in each packet are 0,1,2,...,255 for every
>    event. This should be true because I put the MCMs in a test mode
>    where they send out these well-defined results in response to
>    any level-1.
>
> I did this run in "multiplexed mode" -- the one DCIM fiber I was
> using sent out data from two MCMs on the same fiber. This means that
> each level-1 trigger results in two packets sent to the DCM.
> I ran at a very low (and fixed) trigger frequency (3 Hz) because I
> wanted to run for a long time and had finite disk space.
> The total number of "events" reported by the dcm program was 49282,
> which is exactly consistent with 2 packets per event and the 24641
> level-1 triggers I sent. However, the output file contained only 1012
> valid events; the size of the file makes it clear that the remaining
> data was not in the file. There were no problems with the events which
> were in the output file.
>
> I got error messages from the "dcm" program which runs on the VME
> crate controller -- it gave messages which said:
>
> 0xbc1de0 (tStartRun): memPartAlloc: block too big - 524 in partition 0x1bb80c.
>
> approximately once per 1000 packets (the frequency looked random to
> me) while it was running. When I tried to stop the run, it gave the
> same message again and then seemed to crash.
>
> Do you know anything about this?
>
> Regards,
> John
>
> PS -- I will leave it to run overnight tonight.

From young@her004.phy.ornl.gov Mon Oct 25 09:05:28 MDT 1999
From: glenn young <young@her004.phy.ornl.gov>
To: John Sullivan
Date: Mon, 25 Oct 1999 11:05:31 -0400 (EDT)
Subject: Re: chaintest results

Dear John,

Speaking as a theorist (ha ha), in theory once you have things started
running, you could pull the ARCnet cable loose. I have no idea if this
maps to your present hardware setup, but if glitches on serial lines are
suspects in stopping your many-event tests, have you tried pulling the
serial interface off once you start the run? Or does that cause
something else to freeze up?

Glenn

From chiu@nevis1.nevis.columbia.edu Mon Oct 25 09:38:59 MDT 1999
From: Mickey Chiu <chiu@nevis1.nevis.columbia.edu>
To: John Sullivan
Cc: Mickey Chiu, Jamie Nagle, Cheng-Yi Chi, Young Gook Kim
Date: Mon, 25 Oct 1999 11:39:02 -0400
Subject: Re: mvd chaintest

hi john,

i was just wondering if the memPartAlloc messages have gone away? if
you're still having problems let us know and we'll try to see if we can
figure out what could be wrong.
mickey

From sullivan Mon Nov 1 10:05:30 MST 1999
From: John Sullivan <sullivan@lanl.gov>
To: glenn young
Cc: sullivan
Date: Mon, 01 Nov 1999 10:05:30 MST
Subject: Re: chaintest results

Hi Glenn,

Sorry for the slow response. I was out of town and forgot my
"smartcard", which left me with no way through our firewall to my email.

Disconnecting the "arcnet" cable sounds like a good idea. I am certain
that there are some glitches being sent, but I am not sure that they are
the only problem. I think you are correct in saying that we do not need
it once everything is downloaded. I'll try it and get back to you.
Thanks,
John
--
phone: (505) 665-5963 fax: (505) 665-7920 email: sullivan@lanl.gov

From sullivan Mon Nov 1 10:07:12 MST 1999
From: John Sullivan <sullivan@lanl.gov>
To: Mickey Chiu
Cc: sullivan
Date: Mon, 01 Nov 1999 10:07:11 MST
Subject: Re: mvd chaintest

Hi Mickey,

Sorry, I was out of town and forgot my "smartcard", which left me with
no way through the firewall to my email. I think I am using the new
libraries, but I will check on it later this afternoon.

Regards,
John
--
phone: (505) 665-5963 fax: (505) 665-7920 email: sullivan@lanl.gov

From sullivan Mon Nov 1 16:39:04 MST 1999
From: John Sullivan <sullivan@lanl.gov>
To: Mickey Chiu
Cc: sullivan, Cheng-Yi Chi, Jamie Nagle, ygim
Date: Mon, 01 Nov 1999 16:38:59 MST
Subject: Re: mvd chaintest

Hi Mickey,

As far as I know, I am using the file you called libdcm.o.new. However,
to confuse things, I changed the file name from libdcm.o.new to
libdcm.o. It should be the same file, however. If it helps, the creation
date is 7-Sep-1999 at 12:13. I am not sure where this date comes from,
since the file was transferred to this computer after that date (unless
you assume it is the wrong file). The size is 196375 bytes. If you think
I have the wrong file, I will try a new one if you put it someplace
where I can find it.

Thanks,
John
--
phone: (505) 665-5963 fax: (505) 665-7920 email: sullivan@lanl.gov

From chiu@nevis1.nevis.columbia.edu Mon Nov 1 19:20:57 MST 1999
From: Mickey Chiu <chiu@nevis1.nevis.columbia.edu>
To: John Sullivan
Cc: Mickey Chiu, sullivan, Cheng-Yi Chi, Jamie Nagle, ygim
Date: Mon, 1 Nov 1999 21:21:06 -0500
Subject: Re: mvd chaintest

hi john,

yes, the september 7 version is the new version. can you check whether
you are using NFS to mount your disk? you can type

    nfsDevShow

to see which disks are NFS mounted.
you should see some lines with the nfs server name and the mounted
disks. if not, try mounting the disks by typing

    nfsMount "nfs_server", "/remote/disk", "/local/mount/point"

the return value should be 0. if you get a return value of -1, then the
mount failed (which you can check anyway with nfsDevShow).

the disk needs to be nfs mounted because otherwise the crate controller
caches all the data in memory until the end of the run, and then dumps
it all using rcp. if the data goes above 5 or 10 megabytes, then all the
memory in the controller is used up.

let me know if this helps get rid of the memPartAlloc problems.

mickey

From sullivan Thu Nov 4 09:28:12 MST 1999
From: John Sullivan <sullivan@lanl.gov>
To: Mickey Chiu
Cc: sullivan
Date: Thu, 04 Nov 1999 9:28:11 MST
Subject: Re: mvd chaintest

Hi Mickey,

As always, I'll start by saying sorry it took me so long to get to this.

You figured out the problem, I think. The disk I am using does not seem
to be nfs mounted. Somehow the crate controller can still see the disk
(it does read files from it), but nothing is shown when I do nfsDevShow.
My next problem is that I cannot get it to do the nfs mount. I tried
commands of the form:

    nfsMount "mvdonl", "/usr1", "/usr1"
and
    nfsMount "128.165.205.20", "/usr1", "/usr1"

where mvdonl.lanl.gov is the name of the computer on which the disk I
want to mount lives. /usr1 is the disk's name. I always get the
following result (after about 30 seconds):

    value = -1 = 0xffffffff = DAQSemaphore + 0xff48e0ff

Am I doing something obviously stupid?

Thanks,
John
--
phone: (505) 665-5963 fax: (505) 665-7920 email: sullivan@lanl.gov
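For reference, the whole sequence on a vxWorks crate controller looks
roughly like the sketch below. This is a sketch, not something verified
on this setup: "vxcrate" and the uid/gid values are placeholders, and
the hostAdd/nfsAuthUnixSet calls are only needed if the server's name
and the NFS credentials are not already set up in the boot parameters.

    -> hostAdd "mvdonl", "128.165.205.20"          /* make the server's name known */
    -> nfsAuthUnixSet "vxcrate", 1001, 100, 0, 0   /* uid/gid must be acceptable to the server */
    -> nfsMount "mvdonl", "/usr1", "/usr1"         /* value = 0 on success, -1 on failure */
    -> nfsDevShow                                  /* "/usr1" should now appear in the list */
    -> memShow                                     /* free vs. allocated bytes in the system partition */

A -1 from nfsMount with otherwise correct syntax commonly means the
server refused the request: the host name is unknown, the directory is
not exported to this client, or the credentials were rejected -- which
is why the export list on mvdonl is the next thing to check, as in the
exchange below. The earlier "memPartAlloc: block too big" messages also
fit mickey's explanation: with no NFS mount, the event data piles up in
controller memory, and memPartAlloc prints that message when an
allocation request is bigger than any free block left in the partition;
memShow makes the shrinking free space visible during a run.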
From chiu@nevis1.nevis.columbia.edu Thu Nov 4 09:58:25 MST 1999
From: Mickey Chiu <chiu@nevis1.nevis.columbia.edu>
To: John Sullivan
Date: Thu, 4 Nov 1999 11:58:18 -0500
Subject: Re: mvd chaintest

i'm not sure.. did you export the disk on mvdonl to the crate
controller? the "exportfs" command exports the disks that are listed in
the /etc/exports file. you may need to add an entry to the /etc/exports
file.

mickey

From sullivan Thu Nov 4 10:27:38 MST 1999
From: John Sullivan <sullivan@lanl.gov>
To: Mickey Chiu
Cc: sullivan
Date: Thu, 04 Nov 1999 10:27:37 MST
Subject: Re: mvd chaintest

Hi Mickey,

Yes, I did add an entry to the /etc/exports file, and I just looked and
it says the disk is exported (according to the SAM utility). I guess I
should try nfs mounting this disk from some other system which I
understand better, to make sure the problem is not in the way the disk
is exported.

Regards,
John
--
phone: (505) 665-5963 fax: (505) 665-7920 email: sullivan@lanl.gov
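For completeness, the by-hand version of what the SAM utility sets up on
an HP-UX server is roughly the following sketch ("vxcrate" is a
placeholder for whatever name mvdonl knows the crate controller by; with
no -access option the directory is exported to everyone). Add a line to
/etc/exports:

    /usr1 -access=vxcrate

then run:

    exportfs -a           # (re)export everything listed in /etc/exports
    exportfs              # with no arguments: list what is currently exported
    showmount -e mvdonl   # from another unix system: show mvdonl's export list

Test-mounting from another system, as suggested above, separates the two
failure modes: if showmount -e shows /usr1 but the vxWorks mount still
fails, the problem is on the client side (host name or credentials); if
/usr1 does not show up, the export itself is wrong.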
From sullivan Fri Dec 3 17:36:07 MST 1999
From: John Sullivan <sullivan@lanl.gov>
To: phenix-mvd-l@bnl.gov, John Haggerty, miljko, Glenn Young,
    Jamie Nagle, Cheng-Yi Chi
Cc: sullivan
Date: Fri, 03 Dec 1999 17:36:05 MST
Subject: mvd chaintest results

Hi,

This message goes to the MVD listserver and a few other possibly
interested parties. I do not know if the "interested parties" are on the
mvd listserver or not. If you get it twice, sorry.

Abstract:
The MVD chaintest seems to work (mainly) for at least 100K events. One
problem is that the number of triggers sent to the system is a few
percent more than the number of events seen in the output file. Perhaps
this difference is caused by the limited speed of writing the files to
disk. There are parity errors about once per 50K data packets.

relevant webpage: "Notes about MVD chaintest"
http://p25ext.lanl.gov/phenix/mvd/elect/chaintest/chain.html

____________________________________________________________
The long version:
____________________________________________________________

With recent help from Sangkoo Hahn, Younggook Kim, and Mickey Chiu, the
MVD DAQ chaintest now seems to work. SangYeol Kim (now back in Korea)
and Miljko Bobrek (now back in the thick air of Oak Ridge) also played
key roles in putting this chaintest together. Honorable mention goes to
a variety of people at ORNL, Nevis, and BNL for past help on this setup.

We have not run 1 million events through the system, as Glenn and Chi
would like, but I have run more than 100K events. Due to lack of disk
space, combined with my ignorance of any other way to do the test, I am
unable to run 1 million events without stopping the DAQ system a few
times and cleaning up the disk.
The chaintest consists of:

 1) 6 MCMs (but I only read out two of them right now)
 2) 1 power/communication board
 3) 1 motherboard
 4) 1 phenix low voltage crate
 5) 1 MVD prototype data collection interface module (DCIM)
 6) 1 MVD prototype timing and control interface module (TCIM)
 7) 1 homemade (by SangYeol Kim) replacement for the arcnet interface module
 8) 1 9U VME "interface crate"
 9) 1 miniDAQ used as a substitute for a granule timing module
10) 1 6U VME crate
11) 1 Phenix data collection module (DCM)
12) 1 VME crate controller
13) 1 Phenix partition module
14) 2 PCs and 1 HP/unix system
15) a variety of computer programs
16) a variety of cables, copper and optical
17) a few NIM modules
18) part of the MVD cooling system

In short, a lot of stuff. There is a sketch of this on the web page I
mentioned at the top of this message.

The data output packets go through a "real" Phenix DAQ chain:

    MCM --> power/comm board --> motherboard --> DCIM --> DCM --> crate controller

The crate controller writes the data to a Phenix Raw Data Format (PRDF)
file on the disk of the HP/unix system. This disk is what limits the
number of events -- the disk gets full after a few 100K events.

The timing information uses only part of the "real" chain. A mini-daq
system is set up via labview code on the PC. It sends the timing
information (level-1's, clocks, ...) out on an optical fiber. The
minidaq replaces a granule timing module in the "real" system. The rest
of the chain is like the real system:

    minidaq --> TCIM --> motherboard --> power/comm board --> MCM

I should mention that the minidaq + TCIM part of the setup is extremely
stable. I started it Monday morning and it was still running fine Friday
morning without my ever touching it.

The arcnet setup uses less of the real system, since the arcnet
interface module does not yet exist. The arcnet information (programs
for the FPGAs in the DCIM and MCMs; serial control bits for the MCMs,
TCIM, and DCIM) is sent out of a PC running labwindows to a homemade
"arcnet" board which is in the interface crate. It sends the data out on
the "real" DAQ system path:

    homemade arcnet <--> motherboard <--> power/comm board <--> MCM

The arrows go both ways since some of the data can be read back through
the chain.

The tests so far have all used the "test" mode in the MCMs. In this mode
each MCM sends out ADC data consisting of 1, 2, 3, ..., 256 for the 256
ADCs in the MCM. This allows us to check the validity of the data
received. The tests used "duplex mode" into the DCM -- each fiber into
the DCM carries data from two MCMs, controlled via the ENDDAT0 and
ENDDAT1 signals. The trigger (into the minidaq) is a pulser (NIM module)
running at rates from a few Hz to a few tens of Hz. If the rate is too
high, the system seems to fall behind the trigger rate -- I assume the
limit is the speed of writing to the hp/unix disk.

The data is analyzed with a simple program I wrote. It checks the parity
bit on each data word, the "vertical parity" word, and about 6 other
details of the data packet format.

The longest test so far ran for 202 minutes. The "run" was stopped by me
-- it did not crash. About 100K events (each consisting of 2 MCM data
packets) were collected. The scaler attached to the pulser said there
should be 105520 triggers. The program on the crate controller (which I
got from Mickey and Jamie) said it had seen 211040 packets (= 2*105520).
The PRDF file contained 205190 data packets -- 102710 packets from the
first MCM and 102467 packets from the 2nd MCM.
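The parity and test-pattern checks in that program are, in outline, the
sketch below (this completes the format-check sketch given earlier in
this file). Again, the details -- 32-bit words, even parity, the
vertical parity word sitting right after the data words -- are
placeholders, not the actual packet layout.

    /* sketch of the parity and MCM test-pattern checks; word size, parity
       convention, and position of the vertical parity word are placeholders */

    static int parity32(unsigned long w)       /* 1 if an odd number of bits are set */
    {
        int p = 0;
        while (w) { p ^= (int)(w & 1UL); w >>= 1; }
        return p;
    }

    /* nwords data words followed by one vertical parity word */
    int check_parity(const unsigned long *word, int nwords)
    {
        unsigned long vert = 0;
        int i;
        for (i = 0; i < nwords; i++) {
            if (parity32(word[i])) return 1;   /* "horizontal" parity on each word */
            vert ^= word[i];                   /* column-wise XOR accumulates vertical parity */
        }
        if (vert != word[nwords]) return 2;    /* must match the vertical parity word */
        return 0;
    }

    /* in MCM test mode the 256 ADCs should read back the ramp 1, 2, ..., 256 */
    int check_ramp(const unsigned short *adc)
    {
        int i;
        for (i = 0; i < 256; i++)
            if (adc[i] != (unsigned short)(i + 1)) return 1;
        return 0;
    }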
I do not know if the small (0.2%) difference in the number of packets
from the two MCMs is a problem of the DAQ chain or of my code. The code
also reported 3 events with parity errors (in the "vertical parity") and
one event with an incorrect detector ID (it should be 2 for the MVD) in
the data packet. All other details of the data packets (including the
ADC values) were correct for all packets.

The first MCM only got 97.3% of its events into the output file, and the
second MCM only got 97.1% of its events into the output file. I do not
yet know the cause of these discrepancies. In shorter runs of a few
thousand events, I did not see such differences. I hypothesize that the
difference between the number of triggers and the number of events in
the file is related to the speed at which the events can be stored on
disk -- but I have not tested this. However, if you look at the table
below, there seems to be an anti-correlation between the fraction of the
events which get into the PRDF file and the event rate.

Here are a few similar statistics from the test described above and a
few other tests (actually, I divided the number of "events" reported by
the crate controller by two to convert packets to events):

                                     Test1   Test2   Test3   Test4
  triggers (from NIM scaler)        105520   12125   45816     535
  events reported by "dcm" program
    in crate controller             105520   12125   45816     535
  MCM1 packets in PRDF file         102719   12097   44163     535
  MCM2 packets in PRDF file         102467   12097   44047     535
  packets with parity errors
    or other format problems             4       0       2       0
  length of run (minutes)              202      31      89     2.5
  rate: events/sec                     8.7     6.5     8.6     3.5

Regards,
John
--
phone: (505) 665-5963 fax: (505) 665-7920 email: sullivan@lanl.gov