From sullivan Fri Oct 22 14:53:21 MDT 1999
From: John Sullivan <sullivan@lanl.gov>
To: Glenn Young, Jehanne Simon-Gillo, Hubert van Hecke, Bernd Schlei,
    Chuck Britton, Melissa Smith, Nance Ericson, Mike Emery, Miljko Bobrek,
    lind@icsun1.ic.ornl.gov, Cheng-Yi Chi, Jamie Nagle, Chris Witzig,
    John Haggerty, Mickey Chiu
Cc: sullivan, Young Gook Kim
Date: Fri, 22 Oct 1999 14:53:18 MDT
Subject: chaintest results

Hi,

I have been working on the MVD chaintest with mixed results. There is a
web page describing the setup, with a crib sheet on how to run the programs:

http://p25ext.lanl.gov/phenix/mvd/elect/chaintest/chain.html

I take this summary of results from the end of that page:

Results so far

So far, I have not been able to get the system to run correctly for a
large number of events. I was able to run the program overnight (about
15 hours) and the dcm _seemed_ to still be taking data -- it still put
out a line saying it had collected XXX thousand events every now and
then, and the count of events was "correct" -- it agreed with the scaler
I attached to the pulser generating level-1's. Level-1 was only being
sent at a few Hz (because I thought I would fill up the disk). The
system "ran" for about 120K events. However, I got lots (roughly one per
1000 events) of error messages like this:

0xb51f30 (tStartRun): memPartAlloc: block too big - 524 in partition 0x1bb80c.

I do not know what this means. I tried running in multiplexed (2 MCMs
using one Glink fiber) and non-multiplexed mode with the same result.
The output file did not contain any data from this overnight run. The
dcm program appeared to crash (or maybe it took so long to close the
file that I thought it had crashed).

Today, I ran a series of tests with a few tens of thousands of events
using various trigger rates. In every case, I got the first error like
the one above between events 8000 and 9000 (according to the output from
the dcm program, which writes out the event number every 1000 events).
In each case, when I stopped the run, the dcm program reported the
number of events, and it was "correct". When I tried to read the data
back, I could only read some of the events from each file. In the best
case, I was able to read 7905 events from the file. I got this result
three times out of 4. The 4th time, I could only read 617 events from
the file. In two cases, all 7905 events had the correct contents and
format. In the third case, there were 7905 events, but it looks like the
MCMs were not sending data to the DCIM -- the DCIM sent empty data
packets. This is what it is supposed to do when it receives ENDDAT0/1
but no data packet from the MCMs. In the case where I could only read
617 events, all the events were also "good". You can check the
definition of a "good" event if you are willing to look at a C program,
which is available on the web page. It means parity and a lot of other
stuff was OK. All of the output files contained exactly 9331200 bytes
(8E6200 hex).
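To give a rough idea of what the format part of that check does, here is
a stripped-down sketch in C. It is not the real program: the word
offsets, and whether the "1st 16 bits" are the high or low half of each
word, are placeholders here -- the real definitions are in the C program
on the web page.

    /* sketch of the per-packet format checks; offsets and bit layout are
       placeholders, not the actual MVD packet definition */
    /* returns 0 if the packet format looks good, a nonzero code otherwise */
    int check_format(const unsigned long *word)
    {
        if ((word[0] & 0xffffUL) != 0xffffUL)  /* first word: 1st 16 bits must all be 1 */
            return 1;
        if (word[1] != 2UL)                    /* detector ID word must be 2 (= MVD) */
            return 2;
        /* user word 8 (constructed by the DCIM): bits 10-13 flag error conditions */
        if (word[8] & (1UL << 10)) return 3;   /* no valid data packet present in DCIM */
        if (word[8] & (1UL << 11)) return 4;   /* packet too short */
        if (word[8] & (1UL << 12)) return 5;   /* no stop sequence seen */
        if (word[8] & (1UL << 13)) return 6;   /* 2 consecutive start sequences seen */
        return 0;
    }

The parity checks ("horizontal" on each word, plus a "vertical" parity
word over the whole packet) and the ADC test-pattern comparison are
separate loops over the packet; a sketch of those appears with the
December summary below.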
I suspect that some of these problems (i.e. the MCMs stopped sending
packets in one case) are due to noise on our serial control lines -- I
saw some of this on the scope. We will have to modify our "homemade
arcnet" board. In other cases (all output files truncated and exactly
the same length), I suspect the dcm software.

I will not be in Los Alamos next week (Oct 25-29). I will try to read my
email. Comments and suggestions would be appreciated,

John
--
phone: (505) 665-5963 fax: (505) 665-7920 email: sullivan@lanl.gov

From chiu@nevis1.nevis.columbia.edu Fri Oct 22 18:38:00 MDT 1999
From: Mickey Chiu <chiu@nevis1.nevis.columbia.edu>
To: John Sullivan
Cc: Mickey Chiu, Jamie Nagle, Cheng-Yi Chi, Young Gook Kim
Date: Fri, 22 Oct 1999 20:38:26 -0400
Subject: Re: mvd chaintest

hi john,

are you using the latest version of the dcm code, the one i had
downloaded when i logged on to the system? i think i had named it
libdcm.o.new. some of the older versions of the code had memory leak
problems.

On Thu, 21 Oct 1999, John Sullivan wrote:

> Hi,
> You guys may be interested in knowing that I ran about 50K
> packets through the MVD DAQ chain into the DCM. The chain is as
> I described it in a message to the MVD list-server and to a few
> other interested parties on Monday, October 18. Briefly, it
> contained 2 MCMs, power/comm board, motherboard, TCIM, DCIM,
> a homemade card for Arcnet, mini-DAQ, a DCM+controller+crate,
> 2 PCs, one hp-unix system, misc cables, assorted software. I
> wrote the data to disk (on the hp) and analyzed it (with a simple
> program I wrote) for
>
> 1) "horizontal" and "vertical" parity errors
> 2) correct format of packets, including:
>    a) the first word has its 1st 16 bits all = 1
>    b) the detector ID word (=2) is correct
>    c) user word 8 (a word constructed by the DCIM) has the correct
>       settings of these bits:
>       i)   bit 10 = 0 (means valid data packet present in DCIM)
>       ii)  bit 11 = 0 (1 means packet too short)
>       iii) bit 12 = 0 (1 means no stop sequence seen)
>       iv)  bit 13 = 0 (1 means 2 consecutive start sequences seen)
> 3) ADC values returned in each packet are 0,1,2,...,255 for every
>    event. This should be true because I put the MCMs in a test mode
>    where they send out these well-defined results in response to
>    any level-1.
>
> I did this run in "multiplexed mode" -- the one DCIM fiber I was
> using sent out data from two MCMs on the same fiber. This means that
> each level-1 trigger results in two packets sent to the DCM.
> I ran at a very low (and fixed) trigger frequency (3 Hz) because I
> wanted to run for a long time and had finite disk space.
> The total number of "events" reported by the dcm program was 49282,
> which is exactly consistent with 2 packets per event and the 24641
> level-1 triggers I sent. However, the output file contained only 1012
> valid events; the size of the file makes it clear that the remaining
> data was not in the file. There were no problems with the events which
> were in the output file.
>
> I got error messages from the "dcm" program which runs on the VME
> crate controller -- it gave messages which said:
>
> 0xbc1de0 (tStartRun): memPartAlloc: block too big - 524 in partition 0x1bb80c.
>
> approximately once per 1000 packets (the frequency looked random to
> me) while it was running. When I tried to stop the run, it gave the
> same message again and then seemed to crash.
>
> Do you know anything about this?
>
> Regards,
> John
>
> PS -- I will leave it to run overnight tonight.

From young@her004.phy.ornl.gov Mon Oct 25 09:05:28 MDT 1999
From: glenn young <young@her004.phy.ornl.gov>
To: John Sullivan
Date: Mon, 25 Oct 1999 11:05:31 -0400 (EDT)
Subject: Re: chaintest results

Dear John,

Speaking as a theorist (ha ha), in theory once you have things started
running, you could pull the ARCnet cable loose. I have no idea if this
maps to your present hardware setup, but if glitches on serial lines are
suspects in stopping your many-event tests, have you tried pulling the
serial interface off once you start the run? Or does that cause
something else to freeze up?

Glenn

From chiu@nevis1.nevis.columbia.edu Mon Oct 25 09:38:59 MDT 1999
From: Mickey Chiu <chiu@nevis1.nevis.columbia.edu>
To: John Sullivan
Cc: Mickey Chiu, Jamie Nagle, Cheng-Yi Chi, Young Gook Kim
Date: Mon, 25 Oct 1999 11:39:02 -0400
Subject: Re: mvd chaintest

hi john,

i was just wondering if the memPartAlloc messages have gone away? if
you're still having problems let us know and we'll try to see if we can
figure out what could be wrong.
mickey

From sullivan Mon Nov 1 10:05:30 MST 1999
From: John Sullivan <sullivan@lanl.gov>
To: glenn young
Cc: sullivan
Date: Mon, 01 Nov 1999 10:05:30 MST
Subject: Re: chaintest results

Hi Glenn,

Sorry for the slow response. I was out of town and forgot my
"smartcard", which left me with no way through our firewall to my email.

Disconnecting the "arcnet" cable sounds like a good idea. I am certain
that there are some glitches being sent, but I am not sure that they are
the only problem. I think you are correct in saying that we do not need
it once everything is downloaded. I'll try it and get back to you.
Thanks,
John
--
phone: (505) 665-5963 fax: (505) 665-7920 email: sullivan@lanl.gov

From sullivan Mon Nov 1 10:07:12 MST 1999
From: John Sullivan <sullivan@lanl.gov>
To: Mickey Chiu
Cc: sullivan
Date: Mon, 01 Nov 1999 10:07:11 MST
Subject: Re: mvd chaintest

Hi Mickey,

Sorry, I was out of town and forgot my "smartcard", which left me with
no way through the firewall to my email. I think I am using the new
libraries, but I will check on it later this afternoon.

Regards,
John
--
phone: (505) 665-5963 fax: (505) 665-7920 email: sullivan@lanl.gov

From sullivan Mon Nov 1 16:39:04 MST 1999
From: John Sullivan <sullivan@lanl.gov>
To: Mickey Chiu
Cc: sullivan, Cheng-Yi Chi, Jamie Nagle, ygim
Date: Mon, 01 Nov 1999 16:38:59 MST
Subject: Re: mvd chaintest

Hi Mickey,

As far as I know, I am using the file you called libdcm.o.new. However,
to confuse things, I changed the file name from libdcm.o.new to
libdcm.o. It should be the same file, however. If it helps, the creation
date is 7-Sep-1999 at 12:13. I am not sure where this date comes from,
since the file was transferred to this computer after that date (unless
you assume it is the wrong file). The size is 196375 bytes. If you think
I have the wrong file, I will try a new one if you put it someplace
where I can find it.

Thanks,
John
--
phone: (505) 665-5963 fax: (505) 665-7920 email: sullivan@lanl.gov

From chiu@nevis1.nevis.columbia.edu Mon Nov 1 19:20:57 MST 1999
From: Mickey Chiu <chiu@nevis1.nevis.columbia.edu>
To: John Sullivan
Cc: Mickey Chiu, sullivan, Cheng-Yi Chi, Jamie Nagle, ygim
Date: Mon, 1 Nov 1999 21:21:06 -0500
Subject: Re: mvd chaintest

hi john,

yes, the september 7 version is the new version. can you check whether
you are using NFS to mount your disk? you can type

    nfsDevShow

to see which disks are NFS mounted.
you should see some lines with the nfs server name and the mounted
disks. if not, try mounting the disks by typing

    nfsMount "nfs_server", "/remote/disk", "/local/mount/point"

the return value should be 0. if you get a return value of -1, then the
mount failed (which you can check anyway with nfsDevShow).

the disk needs to be nfs mounted because otherwise the crate controller
caches all the data in memory until the end of the run, and then dumps
it all using rcp. if the data goes above 5 or 10 megabytes, then all the
memory in the controller is used up.

let me know if this helps get rid of the memPartAlloc problems.

mickey

From sullivan Thu Nov 4 09:28:12 MST 1999
From: John Sullivan <sullivan@lanl.gov>
To: Mickey Chiu
Cc: sullivan
Date: Thu, 04 Nov 1999 9:28:11 MST
Subject: Re: mvd chaintest

Hi Mickey,

As always, I'll start by saying sorry it took me so long to get to this.

You figured out the problem, I think. The disk I am using does not seem
to be nfs mounted. Somehow the crate controller can still see the disk
(it does read files from it), but nothing is shown when I do nfsDevShow.
My next problem is that I cannot get it to do the nfs mount. I tried
commands of the form:

    nfsMount "mvdonl", "/usr1", "/usr1"
and
    nfsMount "128.165.205.20", "/usr1", "/usr1"

where mvdonl.lanl.gov is the name of the computer on which the disk I
want to mount lives. /usr1 is the disk's name. I always get the
following result (after about 30 seconds):

    value = -1 = 0xffffffff = DAQSemaphore + 0xff48e0ff

Am I doing something obviously stupid?

Thanks,
John
--
phone: (505) 665-5963 fax: (505) 665-7920 email: sullivan@lanl.gov
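For reference, the whole sequence on a vxWorks crate controller looks
roughly like the sketch below. This is a sketch, not something verified
on this setup: "vxcrate" and the uid/gid values are placeholders, and
the hostAdd/nfsAuthUnixSet calls are only needed if the server's name
and the NFS credentials are not already set up in the boot parameters.

    -> hostAdd "mvdonl", "128.165.205.20"          /* make the server's name known */
    -> nfsAuthUnixSet "vxcrate", 1001, 100, 0, 0   /* uid/gid must be acceptable to the server */
    -> nfsMount "mvdonl", "/usr1", "/usr1"         /* value = 0 on success, -1 on failure */
    -> nfsDevShow                                  /* "/usr1" should now appear in the list */
    -> memShow                                     /* free vs. allocated bytes in the system partition */

A -1 from nfsMount with otherwise correct syntax commonly means the
server refused the request: the host name is unknown, the directory is
not exported to this client, or the credentials were rejected -- which
is why the export list on mvdonl is the next thing to check, as in the
exchange below. The earlier "memPartAlloc: block too big" messages also
fit mickey's explanation: with no NFS mount, the event data piles up in
controller memory, and memPartAlloc prints that message when an
allocation request is bigger than any free block left in the partition;
memShow makes the shrinking free space visible during a run.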
From chiu@nevis1.nevis.columbia.edu Thu Nov 4 09:58:25 MST 1999
From: Mickey Chiu <chiu@nevis1.nevis.columbia.edu>
To: John Sullivan
Date: Thu, 4 Nov 1999 11:58:18 -0500
Subject: Re: mvd chaintest

i'm not sure.. did you export the disk on mvdonl to the crate
controller? the "exportfs" command exports the disks that are listed in
the /etc/exports file. you may need to add an entry to the /etc/exports
file.

mickey

From sullivan Thu Nov 4 10:27:38 MST 1999
From: John Sullivan <sullivan@lanl.gov>
To: Mickey Chiu
Cc: sullivan
Date: Thu, 04 Nov 1999 10:27:37 MST
Subject: Re: mvd chaintest

Hi Mickey,

Yes, I did add an entry to the /etc/exports file, and I just looked and
it says the disk is exported (according to the SAM utility). I guess I
should try nfs mounting this disk from some other system which I
understand better, to make sure the problem is not in the way the disk
is exported.

Regards,
John
--
phone: (505) 665-5963 fax: (505) 665-7920 email: sullivan@lanl.gov
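For completeness, the by-hand version of what the SAM utility sets up on
an HP-UX server is roughly the following sketch ("vxcrate" is a
placeholder for whatever name mvdonl knows the crate controller by; with
no -access option the directory is exported to everyone). Add a line to
/etc/exports:

    /usr1 -access=vxcrate

then run:

    exportfs -a           # (re)export everything listed in /etc/exports
    exportfs              # with no arguments: list what is currently exported
    showmount -e mvdonl   # from another unix system: show mvdonl's export list

Test-mounting from another system, as suggested above, separates the two
failure modes: if showmount -e shows /usr1 but the vxWorks mount still
fails, the problem is on the client side (host name or credentials); if
/usr1 does not show up, the export itself is wrong.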
From sullivan Fri Dec 3 17:36:07 MST 1999
From: John Sullivan <sullivan@lanl.gov>
To: phenix-mvd-l@bnl.gov, John Haggerty, miljko, Glenn Young,
    Jamie Nagle, Cheng-Yi Chi
Cc: sullivan
Date: Fri, 03 Dec 1999 17:36:05 MST
Subject: mvd chaintest results

Hi,

This message goes to the MVD listserver and a few other possibly
interested parties. I do not know if the "interested parties" are on the
mvd listserver or not. If you get it twice, sorry.

Abstract:
The MVD chaintest seems to work (mainly) for at least 100K events. One
problem is that the number of triggers sent to the system is a few
percent more than the number of events seen in the output file. Perhaps
this difference is caused by the limited speed of writing the files to
disk. There are parity errors about once per 50K data packets.

relevant webpage: "Notes about MVD chaintest"
http://p25ext.lanl.gov/phenix/mvd/elect/chaintest/chain.html

____________________________________________________________
The long version:
____________________________________________________________

With recent help from Sangkoo Hahn, Younggook Kim, and Mickey Chiu, the
MVD DAQ chaintest now seems to work. SangYeol Kim (now back in Korea)
and Miljko Bobrek (now back in the thick air of Oak Ridge) also played
key roles in putting this chaintest together. Honorable mention goes to
a variety of people at ORNL, Nevis, and BNL for past help on this setup.

We have not run 1 million events through the system, as Glenn and Chi
would like, but I have run more than 100K events. Due to lack of disk
space, combined with my ignorance of any other way to do the test, I am
unable to run 1 million events without stopping the DAQ system a few
times and cleaning up the disk.
The chaintest consists of:

 1) 6 MCMs (but I only read out two of them right now)
 2) 1 power/communication board
 3) 1 motherboard
 4) 1 phenix low voltage crate
 5) 1 MVD prototype data collection interface module (DCIM)
 6) 1 MVD prototype timing and control interface module (TCIM)
 7) 1 homemade (by SangYeol Kim) replacement for the arcnet interface module
 8) 1 9U VME "interface crate"
 9) 1 miniDAQ used as a substitute for a granule timing module
10) 1 6U VME crate
11) 1 Phenix data collection module (DCM)
12) 1 VME crate controller
13) 1 Phenix partition module
14) 2 PCs and 1 HP/unix system
15) a variety of computer programs
16) a variety of cables, copper and optical
17) a few NIM modules
18) part of the MVD cooling system

In short, a lot of stuff. There is a sketch of this on the web page I
mentioned at the top of this message.

The data output packets go through a "real" Phenix DAQ chain:

    MCM --> power/comm board --> motherboard --> DCIM --> DCM --> crate controller

The crate controller writes the data to a Phenix Raw Data Format (PRDF)
file on the disk of the HP/unix system. This disk is what limits the
number of events -- the disk gets full after a few 100K events.

The timing information uses only part of the "real" chain. A mini-daq
system is set up via labview code on the PC. It sends the timing
information (level-1's, clocks, ...) out on an optical fiber. The
minidaq replaces a granule timing module in the "real" system. The rest
of the chain is like the real system:

    minidaq --> TCIM --> motherboard --> power/comm board --> MCM

I should mention that the minidaq + TCIM part of the setup is extremely
stable. I started it Monday morning and it was still running fine Friday
morning without my ever touching it.

The arcnet setup uses less of the real system, since the arcnet
interface module does not yet exist. The arcnet information (programs
for the FPGAs in the DCIM and MCMs; serial control bits for the MCMs,
TCIM, and DCIM) is sent out of a PC running labwindows to a homemade
"arcnet" board which is in the interface crate. It sends the data out on
the "real" DAQ system path:

    homemade arcnet <--> motherboard <--> power/comm board <--> MCM

The arrows go both ways since some of the data can be read back through
the chain.

The tests so far have all used the "test" mode in the MCMs. In this mode
each MCM sends out ADC data consisting of 1, 2, 3, ..., 256 for the 256
ADCs in the MCM. This allows us to check the validity of the data
received. The tests used "duplex mode" into the DCM -- each fiber into
the DCM carries data from two MCMs, controlled via the ENDDAT0 and
ENDDAT1 signals. The trigger (into the minidaq) is a pulser (NIM module)
running at rates from a few Hz to a few tens of Hz. If the rate is too
high, the system seems to fall behind the trigger rate -- I assume the
limit is the speed of writing to the hp/unix disk.

The data is analyzed with a simple program I wrote. It checks the parity
bit on each data word, the "vertical parity" word, and about 6 other
details of the data packet format.

The longest test so far ran for 202 minutes. The "run" was stopped by me
-- it did not crash. About 100K events (each consisting of 2 MCM data
packets) were collected. The scaler attached to the pulser said there
should be 105520 triggers. The program on the crate controller (which I
got from Mickey and Jamie) said it had seen 211040 packets (= 2*105520).
The PRDF file contained 205190 data packets -- 102710 packets from the
first MCM and 102467 packets from the 2nd MCM.
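The parity and test-pattern checks in that program are, in outline, the
sketch below (this completes the format-check sketch given earlier in
this file). Again, the details -- 32-bit words, even parity, the
vertical parity word sitting right after the data words -- are
placeholders, not the actual packet layout.

    /* sketch of the parity and MCM test-pattern checks; word size, parity
       convention, and position of the vertical parity word are placeholders */

    static int parity32(unsigned long w)       /* 1 if an odd number of bits are set */
    {
        int p = 0;
        while (w) { p ^= (int)(w & 1UL); w >>= 1; }
        return p;
    }

    /* nwords data words followed by one vertical parity word */
    int check_parity(const unsigned long *word, int nwords)
    {
        unsigned long vert = 0;
        int i;
        for (i = 0; i < nwords; i++) {
            if (parity32(word[i])) return 1;   /* "horizontal" parity on each word */
            vert ^= word[i];                   /* column-wise XOR accumulates vertical parity */
        }
        if (vert != word[nwords]) return 2;    /* must match the vertical parity word */
        return 0;
    }

    /* in MCM test mode the 256 ADCs should read back the ramp 1, 2, ..., 256 */
    int check_ramp(const unsigned short *adc)
    {
        int i;
        for (i = 0; i < 256; i++)
            if (adc[i] != (unsigned short)(i + 1)) return 1;
        return 0;
    }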
I do not know if the small (0.2%) difference in the number of packets
from the two MCMs is a problem of the DAQ chain or of my code. The code
also reported 3 events with parity errors (in the "vertical parity") and
one event with an incorrect detector ID (it should be 2 for the MVD) in
the data packet. All other details of the data packets (including the
ADC values) were correct for all packets.

The first MCM only got 97.3% of its events into the output file, and the
second MCM only got 97.1% of its events into the output file. I do not
yet know the cause of these discrepancies. In shorter runs of a few
thousand events, I did not see such differences. I hypothesize that the
difference between the number of triggers and the number of events in
the file is related to the speed at which the events can be stored on
disk -- but I have not tested this. However, if you look at the table
below, there seems to be an anti-correlation between the fraction of the
events which get into the PRDF file and the event rate.

Here are a few similar statistics from the test described above and a
few other tests (actually, I divided the number of "events" reported by
the crate controller by two to convert packets to events):

                                     Test1   Test2   Test3   Test4
  triggers (from NIM scaler)        105520   12125   45816     535
  events reported by "dcm" program
    in crate controller             105520   12125   45816     535
  MCM1 packets in PRDF file         102719   12097   44163     535
  MCM2 packets in PRDF file         102467   12097   44047     535
  packets with parity errors
    or other format problems             4       0       2       0
  length of run (minutes)              202      31      89     2.5
  rate: events/sec                     8.7     6.5     8.6     3.5

Regards,
John
--
phone: (505) 665-5963 fax: (505) 665-7920 email: sullivan@lanl.gov