SAMGrid Plone

#250: Problems with CRCmatch

issuedata
Topic: D0
Classification: User Problem
Importance: Medium
Status: Resolved
Assigned to:
Created by: plone
Created at: 2005-02-14
contact
Name: plone
Description:
Transcript
#10: 2005-03-10 02:19 PM (plone)
status: "accepted" -> "resolved"
assignees: "[]" -> "['nicholls', 'nicholls']"

#9: 2005-02-16 10:47 AM (plone)
Date:         Tue, 15 Feb 2005 15:25:45 -0600
Reply-To:     Lauri Loebel Carpenter <lauri@FNAL.GOV>
a) The problem boils down to a change that was not propagated
    to the db.  Sometime around June 2001 we changed the spelling
    of an unknown crc from 'unknown value' to 'unknown crc value'.
    All files declared since then have 'unknown crc value' (or a real
    crc value); files before then seem to have 'unknown value'.
b) I am making the global change right now, should be done in about
    10 minutes, so that it becomes 'unknown crc value' as it should;
    the station will then recognize that the value is unknown and
    should not be compared to anything.
c) Then I will begin the task of setting the *real* crc value
    for as many of these files as I can (in dataTier 'raw' and in
    the other data tiers as well).   This will probably take some
    time, as there are 500K+ files suffering from this symptom.
-- lauri
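
Editorial sketch (not part of the original message): step (b) above amounts to a single bulk rewrite of the old sentinel spelling. The Python below illustrates the idea on an in-memory SQLite table as a stand-in; the table and column names data_files and crc_value are taken from the query in comment #5, while the file_name column, the sample rows, and the helper names are assumptions made up for the example. It is not the statement that was actually run against the production database.

```python
import sqlite3

# Sentinel spellings quoted in the thread.
OLD_SENTINEL = "unknown value"
NEW_SENTINEL = "unknown crc value"


def normalize_unknown_crc(conn):
    """Rewrite the old sentinel spelling to the new one.

    Returns the number of rows that were changed.
    """
    cur = conn.execute(
        "UPDATE data_files SET crc_value = ? WHERE crc_value = ?",
        (NEW_SENTINEL, OLD_SENTINEL),
    )
    conn.commit()
    return cur.rowcount


def make_demo_db():
    """Build a toy in-memory table; the rows are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data_files (file_name TEXT, crc_value TEXT)")
    conn.executemany(
        "INSERT INTO data_files VALUES (?, ?)",
        [
            ("old_a.raw", OLD_SENTINEL),   # declared before the spelling change
            ("old_b.raw", OLD_SENTINEL),
            ("new_c.raw", NEW_SENTINEL),   # declared after the spelling change
            ("new_d.raw", "2798025815"),   # file with a real CRC
        ],
    )
    return conn
```

Rows that already carry the new spelling or a real CRC are left untouched, so running the update a second time changes nothing.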

#8: 2005-02-16 10:46 AM (plone)
Date:         Tue, 15 Feb 2005 12:34:50 -0600
Reply-To:     Andrew Baranovski <abaranov@FNAL.GOV>
Hello,
There is another problem:
The file all_0000162577_072.raw has "unknown value" set as its CRC where
it should be "unknown crc value" to be recognized as unknown. I changed
it for this particular file.
However, there are 469856 other files that do not have the correct
"unknown" CRC value.
So before I change the DB I'd like to understand whether it was a
recent error or whether these files were not properly stored from the start.
Regards,
Andrew
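
Editorial sketch (not part of the original message): the behaviour described above can be pictured as a check that skips the comparison only when the stored value is exactly the recognized sentinel. The function below is a hypothetical Python illustration and not the station's checkCRCmatch code; the name crc_matches and its signature are assumptions. With the old spelling "unknown value" the stored string is neither the sentinel nor a number, so the check fails, which is consistent with the "Expecting unknown value CRC" / "Got :2798025815L" log lines in the issue description.

```python
# Sentinel that, per the thread, marks a CRC as unknown.
UNKNOWN_CRC = "unknown crc value"


def crc_matches(expected, computed):
    """Decide whether a delivered file passes the CRC check.

    expected: CRC string stored in the metadata (hypothetical representation)
    computed: integer CRC computed for the delivered file
    """
    if expected == UNKNOWN_CRC:
        # Nothing to compare against, so the delivery is accepted.
        return True
    try:
        return int(expected) == computed
    except ValueError:
        # Unrecognized text such as the old spelling 'unknown value'
        # cannot match any computed CRC.
        return False
```

Under these assumptions, crc_matches("unknown value", 2798025815) is False, while the same call with "unknown crc value" is True.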

#7: 2005-02-16 10:10 AM (plone)
status: "resolved" -> "accepted"
Date:         Tue, 15 Feb 2005 13:57:07 +0100
Reply-To:     Thomas Nunnemann <garzoglio@fnal.gov>
Hi -
I upgraded to version v4_2_1_77 but the problem is not fixed in that
version either!
The station still rejects the delivery since no CRC is stored in the
meta-data. Is there another version which can/should be used?
Thanks -
Thomas

#6: 2005-02-14 04:01 PM (plone)
Date:         Mon, 14 Feb 2005 21:48:12 +0100
Reply-To:     Thomas Nunnemann <Thomas.Nunnemann@PHYSIK.UNI-MUENCHEN.DE>
Hi Robert and Andrew -
thanks for the suggestion! I will upgrade to version x77.
Cheers -
Thomas

#5: 2005-02-14 11:18 AM (plone)
Date:         Mon, 14 Feb 2005 10:52:09 -0600
Reply-To:     Robert Illingworth <illingwo@FNAL.GOV>
--On Monday, February 14, 2005 5:31 PM +0100 Thomas Nunnemann
<Thomas.Nunnemann@Physik.Uni-Muenchen.DE> wrote:
> Hi Robert -
>
> thanks for finding a problem!
> I tried copying the remaining 9 raw data files from run 162577, which all
> failed with this error, so I didn't consider that the CRC
> might in fact be missing from the DB.
> Do you have an idea how many files might be affected? Is this a known
> problem?
Not to me.
SQL> select count(*) from data_files, data_tiers
     where data_files.data_tier_id = data_tiers.data_tier_id
       and data_tiers.data_tier = 'raw'
       and data_files.crc_value = 'unknown value';
  COUNT(*)
----------
      3604
Enstore also seems a bit confused about the CRC:
$ enstore pnfs --info /pnfs/sam/dzero/copy1/datalogger/initial_runs/datalogger/all/all/all_0000162577_072.raw
volume: PRM027
location_cookie: 0000_000000000_0000078
size: 686013806
file_family: datalogger_mezsilo_copy1
original_name: /pnfs/sam/dzero/copy1/datalogger/initial_runs/datalogger/all/all/all_0000162577_072.raw
map_file:
pnfsid_file: 000F00000000000000472368
pnfsid_map:
bfid: D0MS103024501800000
origdrive: d0enmvr12a:/dev/rmt/tps2d1n:4560020042
crc: unknown
but:
$ enstore info --bfid=D0MS103024501800000
{'bfid': 'D0MS103024501800000',
 'complete_crc': 2798025815L,
 'deleted': 'no',
 'drive': 'd0enmvr12a:/dev/rmt/tps2d1n:4560020042',
 'external_label': 'PRM027',
 'gid': -1,
 'location_cookie': '0000_000000000_0000078',
 'pnfs_name0': '/pnfs/sam/dzero/copy1/datalogger/initial_runs/datalogger/all/all/all_0000162577_072.raw',
 'pnfsid': '000F00000000000000472368',
 'sanity_cookie': (65536L, 1283684201L),
 'size': 686013806L,
 'uid': -1}
Different commands are giving different opinions on the CRC.
> P.S.: The station was recently upgraded with Gabriele's help. Is there a
> reason we should move to a newer version?
Apart from any issues with CRC checks, v4_2_1_64 doesn't work properly on
glibc 2.3 systems and, if I remember correctly, has a bug where trying to
start projects with duplicate names can crash the station. There were other
improvements, but I think they were mostly minor. v4_2_1_77 has been
running on the FNAL stations for many months now and is extremely stable.
Robert

#4: 2005-02-14 11:18 AM (plone)
Date:         Mon, 14 Feb 2005 10:33:31 -0600
Reply-To:     Andrew Baranovski <abaranov@FNAL.GOV>
Hello,
It shouldn't have attempted to compute the CRC if it was missing in the DB.
I remember this as an issue in older station releases. Please upgrade to
x77.
Regards,
Andrew

#3: 2005-02-14 11:17 AM (plone)
status: "accepted" -> "resolved"
Date:         Mon, 14 Feb 2005 17:31:25 +0100
Reply-To:     Thomas Nunnemann
Hi Robert -
thanks for finding a problem!
I tried copying the remaining 9 raw data files from run 162577, which all
failed with this error, so I didn't consider that the CRC
might in fact be missing from the DB.
Do you have an idea how many files might be affected? Is this a known
problem?
Thanks -
Thomas
P.S.: The station was recently upgraded with Gabriele's help. Is there a
reason we should move to a newer version?

#2: 2005-02-14 11:13 AM (plone)
Date:         Mon, 14 Feb 2005 10:01:57 -0600
Reply-To:     Robert Illingworth <illingwo@FNAL.GOV>
Hi Thomas,
It looks like the CRC is missing for this file (and presumably others if
you say all transfers are failing: are these all raw data files from around
the same time?)
octarine-clued0:~> sam get metadata --file=all_0000162577_072.raw
             File Type:  Raw Data File
             File Name:  all_0000162577_072.raw
               File ID:  1869978
             File Size:  669936 [KB]
              CRC Data:  unknown value [unknown crc type]
       File Start Time:  08/24/2002 19:58:35
         File End Time:  08/24/2002 20:01:26
       Physical Stream:  all
      File Format Info:  dspack
           First Event:  6674253
            Last Event:  6684466
          Total Events:  2245
        Process Family:  datalogger
          Process Name:  datalogger
       Process Version:  0.0
             Node Name:  d0olc.fnal.gov
            Work Group:  online daq
             User Name:  d0run
            Run Number:  162577
              Run Type:  physics data taking
    Minimum Luminosity:  1497364
    Maximum Luminosity:  1497365
        File Partition:  0
(Incidentally, why upgrade to v4_2_1_64? It's not the most recent version.)
Robert

#1: 2005-02-14 11:12 AM (plone)
status: "pending" -> "accepted"
assignees: "[]" -> "['listserv1']"
title: "" -> "Problems with CRCmatch"
description: "" -> "Date: Mon, 14 Feb 2005 13:51:05 +0100
Reply-To: Thomas Nunnemann <Thomas.Nunnemann@PHYSIK.UNI-MUENCHEN.DE>
Hi -
we recently upgraded to sam_station v4_2_1_64 at GridKa. All file transfers fail due to a CRC mismatch, which is apparently not due to the file transfer itself, but the reporting of the correct CRC (see below). Are there any known issues with this version?
Thanks -
Thomas
02/14/05 11:50:45 d0karlsruhe.EWORKER 24341: Expecting unknown value CRC
02/14/05 11:51:15 d0karlsruhe.EWORKER 24341: Got :2798025815L
02/14/05 11:51:15 d0karlsruhe.EWORKER 24341: Error: CRC mismatch
source: remote-production-router:d0rsam01.fnal.gov:/sam/cache2/remote-production-router01/boo
target: d0karlsruhe:d0.fzk.de:/grid/fzk.de/d0/d0-1/boo/all_0000162577_072.raw
[...]
Code: CRC check failed (Category SAM Internal)
Severity level: ERROR
Generated on 14 Feb 11:51:15 by eworker
In the context: local, method name: checkCRCmatch
Recommended action: Please contact sam-admin@fnal.gov"
