#250: Problems with CRCmatch
issuedata
Topic:
D0
Classification:
User Problem
Importance:
Medium
Assigned to:
Created by: plone
Created at: 2005-02-14
contact
Name:
plone
Description:
Date: Mon, 14 Feb 2005 13:51:05 +0100
Reply-To: Thomas Nunnemann <Thomas.Nunnemann@PHYSIK.UNI-MUENCHEN.DE>
Hi -
we recently upgraded to sam_station v4_2_1_64 at GridKa. All file transfers fail due to a CRC mismatch, which is apparently not due to the file transfer itself, but the reporting of the correct CRC (see below). Are there any known issues with this version?
Thanks -
Thomas
02/14/05 11:50:45 d0karlsruhe.EWORKER 24341: Expecting unknown value CRC
02/14/05 11:51:15 d0karlsruhe.EWORKER 24341: Got :2798025815L
02/14/05 11:51:15 d0karlsruhe.EWORKER 24341: Error: CRC mismatch source: remote-production-router:d0rsam01.fnal.gov:/sam/cache2/remote-production-router01/boo target: d0karlsruhe:d0.fzk.de:/grid/fzk.de/d0/d0-1/boo/all_0000162577_072.raw
[...]
Code: CRC check failed (Category SAM Internal)
Severity level: ERROR
Generated on 14 Feb 11:51:15 by eworker
In the context: local, method name: checkCRCmatch
Recommended action: Please contact sam-admin@fnal.gov
Transcript
#10:
2005-03-10 02:19 PM (plone)
status: "accepted" ->
"resolved"
assignees: "[]" ->
"['nicholls', 'nicholls']"
#9:
2005-02-16 10:47 AM (plone)
Date: Tue, 15 Feb 2005 15:25:45 -0600
Reply-To: Lauri Loebel Carpenter <lauri@FNAL.GOV>
a) The problem boils down to a change that was not propagated to the db. Sometime around June 2001 we changed the spelling of an unknown crc from 'unknown value' to 'unknown crc value'. All files declared since then have 'unknown crc value' (or a real crc value); files before then seem to have 'unknown value'.
b) I am making the global change right now, should be done in about 10 minutes, so that it becomes 'unknown crc value' as it should; the station will then recognize that the value is unknown and should not be compared to anything.
c) Then I will begin the task of setting the *real* crc value for as many of these files as I can (in dataTier 'raw' and in the other data tiers as well). This will probably take some time, as there are 500K+ files suffering from this symptom.
-- lauri
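The behaviour described in (a) and (b) can be illustrated with a small Python sketch. This is not the sam_station source: the function name `crc_matches` and the `accept_legacy` switch are hypothetical, and only the two sentinel spellings come from this thread.

```python
# Hypothetical sketch of a sentinel-aware CRC check; not the sam_station code.
UNKNOWN_CRC = "unknown crc value"      # spelling the station recognizes (per this thread)
LEGACY_UNKNOWN_CRC = "unknown value"   # older spelling still present in the DB


def crc_matches(expected, computed, accept_legacy=False):
    """Return True when a transferred file should be accepted.

    expected: CRC string stored in the metadata (a number or a sentinel).
    computed: CRC calculated for the delivered file.
    """
    sentinels = {UNKNOWN_CRC}
    if accept_legacy:
        sentinels.add(LEGACY_UNKNOWN_CRC)
    if expected in sentinels:
        # The stored value is "unknown": there is nothing to compare against.
        return True
    try:
        return int(expected) == int(computed)
    except (TypeError, ValueError):
        # An unrecognized, non-numeric stored value is treated as a mismatch.
        return False
```

With `accept_legacy=False` the old spelling falls through to the numeric comparison and is rejected, which mirrors the "Expecting unknown value CRC" failure in the log. Renaming the value in the database, as in (b), removes the need for any such switch.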
#8:
2005-02-16 10:46 AM (plone)
Date: Tue, 15 Feb 2005 12:34:50 -0600
Reply-To: Andrew Baranovski <abaranov@FNAL.GOV>
Hello,
There is another problem: the file all_0000162577_072.raw has "unknown value" set as its CRC, where it should be "unknown crc value" to be recognized as unknown. I changed it for this particular file. However, there are 469856 other files that do not have the correct "unknown" CRC. So before I change the DB I'd like to understand whether this was a recent error or these files were not properly stored from the start.
Regards,
Andrew
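One way to size the problem before touching the database is to count the legacy spelling per data tier, then apply the rename in a single statement. The sketch below is an assumption-laden illustration: it uses an in-memory SQLite database as a stand-in for the real SAM database, borrows the table and column names from the query Robert ran in entry #5, and invents the sample rows and the 'reconstructed' tier name.

```python
import sqlite3

LEGACY = "unknown value"       # old spelling reported in this thread
CURRENT = "unknown crc value"  # spelling the station recognizes


def count_legacy_by_tier(conn):
    """Return {data_tier: number of files still using the legacy spelling}."""
    rows = conn.execute(
        "SELECT t.data_tier, COUNT(*) FROM data_files f "
        "JOIN data_tiers t ON f.data_tier_id = t.data_tier_id "
        "WHERE f.crc_value = ? GROUP BY t.data_tier ORDER BY t.data_tier",
        (LEGACY,),
    ).fetchall()
    return dict(rows)


def normalize_legacy(conn):
    """Rename the legacy sentinel everywhere; return the number of rows changed."""
    cur = conn.execute(
        "UPDATE data_files SET crc_value = ? WHERE crc_value = ?",
        (CURRENT, LEGACY),
    )
    conn.commit()
    return cur.rowcount


def demo_db():
    """Build a tiny stand-in database; schema and rows are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE data_tiers (data_tier_id INTEGER PRIMARY KEY, data_tier TEXT);
        CREATE TABLE data_files (file_name TEXT, data_tier_id INTEGER, crc_value TEXT);
        INSERT INTO data_tiers VALUES (1, 'raw'), (2, 'reconstructed');
        INSERT INTO data_files VALUES
            ('a.raw', 1, 'unknown value'),
            ('b.raw', 1, 'unknown crc value'),
            ('c.raw', 1, '2798025815'),
            ('d.reco', 2, 'unknown value');
    """)
    return conn
```

Calling `count_legacy_by_tier(demo_db())` before and after `normalize_legacy` shows which tiers are affected and confirms that none remain afterwards.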
#7:
2005-02-16 10:10 AM (plone)
status: "resolved" ->
"accepted"
Date: Tue, 15 Feb 2005 13:57:07 +0100
Reply-To: Thomas Nunnemann <garzoglio@fnal.gov>
Hi -
I upgraded to version v4_2_1_77 but the problem is not fixed in that version either! The station still rejects the delivery since no CRC is stored in the meta-data. Is there another version which can/should be used?
Thanks -
Thomas
#6:
2005-02-14 04:01 PM (plone)
Date: Mon, 14 Feb 2005 21:48:12 +0100 Reply-To: Thomas Nunnemann <Thomas.Nunnemann@PHYSIK.UNI-MUENCHEN.DE> Hi Robert and Andrew - thanks for the suggestion! I will upgrade to version x77. Cheers - Thomas
#5:
2005-02-14 11:18 AM (plone)
Date: Mon, 14 Feb 2005 10:52:09 -0600
Reply-To: Robert Illingworth <illingwo@FNAL.GOV>
--On Monday, February 14, 2005 5:31 PM +0100 Thomas Nunnemann <Thomas.Nunnemann@Physik.Uni-Muenchen.DE> wrote:
> Hi Robert -
>
> thanks for finding a problem!
> I tried copying the remaining 9 raw data files from run 162577, which all
> failed with this error, so I didn't consider that the CRC
> might in fact be missing in the DB.
> Do you have an idea how many files might be affected? Is this a known
> problem?
Not to me.
SQL> select count(*) from data_files,data_tiers
     where data_files.data_tier_id = data_tiers.data_tier_id
     and data_tiers.data_tier = 'raw'
     and data_files.crc_value = 'unknown value';
  COUNT(*)
----------
      3604
Enstore also seems a bit confused about the CRC:
$ enstore pnfs --info /pnfs/sam/dzero/copy1/datalogger/initial_runs/datalogger/all/all/all_0000162577_072.raw
volume: PRM027
location_cookie: 0000_000000000_0000078
size: 686013806
file_family: datalogger_mezsilo_copy1
original_name: /pnfs/sam/dzero/copy1/datalogger/initial_runs/datalogger/all/all/all_0000162577_072.raw
map_file:
pnfsid_file: 000F00000000000000472368
pnfsid_map:
bfid: D0MS103024501800000
origdrive: d0enmvr12a:/dev/rmt/tps2d1n:4560020042
crc: unknown
but:
$ enstore info --bfid=D0MS103024501800000
{'bfid': 'D0MS103024501800000',
 'complete_crc': 2798025815L,
 'deleted': 'no',
 'drive': 'd0enmvr12a:/dev/rmt/tps2d1n:4560020042',
 'external_label': 'PRM027',
 'gid': -1,
 'location_cookie': '0000_000000000_0000078',
 'pnfs_name0': '/pnfs/sam/dzero/copy1/datalogger/initial_runs/datalogger/all/all/all_0000162577_072.raw',
 'pnfsid': '000F00000000000000472368',
 'sanity_cookie': (65536L, 1283684201L),
 'size': 686013806L,
 'uid': -1}
Different commands are giving different opinions on the CRC.
> P.S.: The station was recently upgraded with Gabriele's help. Is there a
> reason we should move to a newer version?
Apart from any issues with CRC checks, v4_2_1_64 doesn't work properly with glibc 2.3 systems and, if I remember correctly, has a bug where trying to start projects with duplicate names can crash the station. There were other improvements, but I think they were mostly minor. v4_2_1_77 has been running on the FNAL stations for many months now and is extremely stable.
Robert
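To compare the CRC that `enstore info` reports with the value in the station log, the dictionary printed above can be parsed in Python. The sketch below assumes the output is a Python 2 style literal, as it appears in this message; the helper name `parse_enstore_info` is hypothetical, and the `L` suffix stripping is a naive regex that could also alter a quoted string ending in a digit followed by `L`.

```python
import ast
import re


def parse_enstore_info(text):
    """Parse the dict literal printed by `enstore info` (Python 2 long syntax)."""
    # Drop the trailing 'L' of long integers such as 2798025815L, which
    # Python 3 does not accept. Naive: assumes no quoted value ends in <digit>L.
    cleaned = re.sub(r"(?<=\d)L\b", "", text)
    return ast.literal_eval(cleaned)


# Abbreviated sample taken from the output quoted in this message.
sample = (
    "{'bfid': 'D0MS103024501800000', 'complete_crc': 2798025815L, "
    "'sanity_cookie': (65536L, 1283684201L), 'size': 686013806L}"
)
info = parse_enstore_info(sample)
```

Here `info["complete_crc"]` is the integer 2798025815, the same number the eworker log shows after "Got :", so the file-level CRC in Enstore agrees with what the station computed.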
#4:
2005-02-14 11:18 AM (plone)
Date: Mon, 14 Feb 2005 10:33:31 -0600
Reply-To: Andrew Baranovski <abaranov@FNAL.GOV>
Hello,
It shouldn't have attempted to compute CRC if it was missing in the DB. I remember this as an issue in older station releases. Please upgrade to x77.
Regards,
Andrew
#3:
2005-02-14 11:17 AM (plone)
status: "accepted" ->
"resolved"
Date: Mon, 14 Feb 2005 17:31:25 +0100
Reply-To: Thomas Nunnemann
Hi Robert -
thanks for finding a problem! I tried copying the remaining 9 raw data files from run 162577, which all failed with this error, so I didn't consider that the CRC might in fact be missing in the DB. Do you have an idea how many files might be affected? Is this a known problem?
Thanks -
Thomas
P.S.: The station was recently upgraded with Gabriele's help. Is there a reason we should move to a newer version?
#2:
2005-02-14 11:13 AM (plone)
Date: Mon, 14 Feb 2005 10:01:57 -0600
Reply-To: Robert Illingworth <illingwo@FNAL.GOV>
Hi Thomas,
It looks like the CRC is missing for this file (and presumably others if you say all transfers are failing: are these all raw data files from around the same time?)
octarine-clued0:~> sam get metadata --file=all_0000162577_072.raw
File Type: Raw Data File
File Name: all_0000162577_072.raw
File ID: 1869978
File Size: 669936 [KB]
CRC Data: unknown value [unknown crc type]
File Start Time: 08/24/2002 19:58:35
File End Time: 08/24/2002 20:01:26
Physical Stream: all
File Format Info: dspack
First Event: 6674253
Last Event: 6684466
Total Events: 2245
Process Family: datalogger
Process Name: datalogger
Process Version: 0.0
Node Name: d0olc.fnal.gov
Work Group: online daq
User Name: d0run
Run Number: 162577
Run Type: physics data taking
Minimum Luminosity: 1497364
Maximum Luminosity: 1497365
File Partition: 0
(Incidentally, why upgrade to v4_2_1_64? It's not the most recent version.)
Robert
#1:
2005-02-14 11:12 AM (plone)
status: "pending" ->
"accepted"
assignees: "[]" ->
"['listserv1']"
title: "" ->
"Problems with CRCmatch"
description: "" ->
"Date: Mon, 14 Feb 2005 13:51:05 +0100
Reply-To: Thomas Nunnemann <Thomas.Nunnemann@PHYSIK.UNI-MUENCHEN.DE>
Hi -
we recently upgraded to sam_station v4_2_1_64 at GridKa. All file
transfers fail due to a CRC mismatch, which is apparently not due to the
file transfer itself, but the reporting of the correct CRC (see below).
Are there any known issues with this version?
Thanks -
Thomas
02/14/05 11:50:45 d0karlsruhe.EWORKER 24341: Expecting unknown value CRC
02/14/05 11:51:15 d0karlsruhe.EWORKER 24341: Got :2798025815L
02/14/05 11:51:15 d0karlsruhe.EWORKER 24341: Error: CRC mismatch source: remote-production-router:d0rsam01.fnal.gov:/sam/cache2/remote-production-router01/boo target: d0karlsruhe:d0.fzk.de:/grid/fzk.de/d0/d0-1/boo/all_0000162577_072.raw
[...]
Code: CRC check failed (Category SAM Internal)
Severity level: ERROR
Generated on 14 Feb 11:51:15 by eworker
In the context: local, method name: checkCRCmatch
Recommended action: Please contact sam-admin@fnal.gov
"
PloneCollectorNG (C) 2003-2004 by ZOPYX - Software Development and Consulting Andreas Jung