Last update: Mon 02/04/02 16:57
a_entry a_owner a_short_description a_status long_description_of_problem package sam_vsn xest_pers_days zdate_added zdate_finished zz_notes
D0RACE tutorials  tbd  Prepare SAM tutorials for the Feb 11-12 D0RACE workshop. Include setting up and testing a station. SAM shifter training as needed.
D0 support  Lauri + others    ongoing    NA  NA         
sam user  Lauri  Take over sam_user with Carmenita. Clean up commands and error messages; add new commands.  ongoing    sam_user  NA
builds  Lauri  Standardize builds.  evolutionary  Requires a lot of design, which is the bulk of the job. The benefit is that anyone can easily build any piece of sam.
sam parser  Lauri  Convert all sam commands to use the new command parser and clean up the commands. Provides new help and will make writing the unit test suite easier.  done    sam_user  v4.0
get run  Lauri  sam get run command (needs further specification).  2002    sam_user,sam_db_server
diskcache  Lauri,Igor  samadmin remove diskcache command. Need to uncache file, disk, group cache, and station.  done  Mark the disk and the files on it as not available. Need to go over use cases.  sam_admin  v3.2.2
archive logs  Sinisa,Lee,Lauri  Archiving of log files (waiting on sam on Sun). Only the stager and encp are needed; the station could run on another node. Sinisa will try to build the stager on Sun; Lee will get encp for Sun (it seems to be available). Lauri will do the final setup.
sam-at-a-glance  Lauri, Diana  Improve sam-at-a-glance so that it runs on ora 1 and provides more up-to-date information. May require sam user to run on SunOS, or a conversion to use the name service status info (just ping the stations instead of doing a sam dump). Need to add information to the database: 1. known down, and 2. monitoring level: high, medium, and low availability systems (see Lauri's mail describing this in detail).
unit tests  Lauri,Chris  Produce unit tests for the sam user interface. Tied to the sam parser task.      sam_user  v4.1
clued0  Chris + Igor  Continue testing of distributed sam on ClueD0. Include implementing a batch system and load testing with an additional desktop node.  ongoing      v3.2.1, 3.2.2
quick station revival  Matt,Igor  Decrease station startup time. Matt is picking up the ball on this, but it will require station changes; it is Matt's top priority. A FNORB limitation has stopped this until it is figured out; Steve, Sinisa, and Matt will try to sort it out. May require another way to send file lists in smaller blocks due to the FNORB restriction.  problems      v3.2.2
eworker/sam_cp  Lauri,Igor  Needs to be able to recognize intra-node, intra-domain, and extra-domain transfers. Includes using rcp, Kerberized rcp, bbrcp, and GridFTP.  done    sam_cp,sam_station  v4.0      10/19
file-status  Lauri, Steve, Diana, Matt  Add a 'crummy' file status and the needed support features. Could use a more enduring name, such as unofficial or suspect. Matt's second priority. Needs a response to Matt's mail from 11/13/01. Held a brainstorming session Thurs Jan 17; Diana wrote notes.  in design  Storing of 'crummy' half-finished files: proposal on how to use the file status. Investigate what code would need to change in sam store (or whether it is just a small samadmin command you are allowed to run right after the store has succeeded). Investigate how to deal with --resubmit, which wants to overwrite a crummy file; it needs to call another samadmin command to first delete the file in pnfs space. Additional thought and discussion indicate that the way we use the current file status is incorrect, and some current statuses should be moved to file@location status. Additional statuses discussed at d0 include: incomplete, obsolete, superseded, user-added, unofficial. There may be more.    v4.1
MC dimension code  Matt, Carmenita  Need dimension query ability for MC request parameters  close               
MC  Carmenita  Ongoing support for MC request features.  test. Still need get metadata.  Metadata is there; need a response from Dave concerning the MC_Runjob description files he has produced as examples. Need a way to add support entries. Store works, but need to be able to initialize a request and get a request ID back, then validate attributes when processed-data metadata returns from remote MC centers. Some of the server interfaces are there; need to build the client part in sam user.  sam_db_server,sam_user  v4.0
MC request end-to-end  Carmenita, Dave Evans, Iain Bertram  End-to-end test of the MC request facility: 1. Create a request using an MDC file, 2. Store the request and get the request ID, 3. Retrieve the request and generate a macro, 4. Modify the status to "running", 5. Run the request at a remote processing farm, 6. Declare / store files into SAM, 7. Translate constraints w/ request parameters, 8. Create a dataset to retrieve files, 9. Modify the state to "done".
app_family + param type/name/value  Steve,Lauri,Carmenita,Diana  Link application name/version with MC param type/name/value to provide a way to record generalized processing attributes.  needs design    sam_user,sam_db_server, sam_db  v4.1
Documentation    Look through the documentation and fix problems. Need a sam quick reference page to replace the obsolete quick start guide.  after Thanksgiving  sam get metadata, list definition --keywords, sam create dataset --keyword???, sam run project, sam submit may have problems, mc runjob new metadata, auto dest "sam store --descrip=...", add new phase needs to be documented. Need to document metadata for luminosity and archive files, sam batch commands, psusp, files not delivered, python api, new dimensions and examples. Translation of status block. sam toonl should be documented. Sam station starting options through sam_bootstrap startup. New flags need to be documented. Questions about groups need to be answered in the documentation.  sam_doc
omniORB.py  Steve  Continue to understand the issues of using omniORB.py with sam. Steve to provide a detailed list of work to be done; Steve will produce the list for discussion 12/04/2001. Steve has made some progress and can describe where he feels the problems are (1/28/2002).
autodest  Carmenita  Autodestination with processed files needs to be resolved: a bug in the server when constructing the path pulls info from the parent that it should not.  done, needs test on farms    sam_user,sam_db_server  v3.2
get num copies  Carmenita  Get the number of copies of each file from the sam database. Need to decide where this is kept in sam.      sam_user
file_family  Carmenita  Add code to sam autodest so that the proposed path string uses an optional entry for "file_family=..." appended to the stream field. This has been requested by Gerry for the online direction of files to tape. There is still some debate, but this will provide flexibility for streaming decisions to be made later.      sam_user
samadmin  Lauri  Mark an entire station as down; might also want node down, station down, and fss down.  not critical    sam_admin
new d0 robot  Chris, Lee  Prepare for use of the new d0 robot as details emerge; need new sam on d0lxac1 and d0lxbld1.  done
Jenny xmas tasks  Jenny Chen + Diana  Tasks for Jenny over Christmas: 1. complete tasklist formatting script, 2. shifter log tool, 3. shifter notification script. Task 2 needs a 2-table schema that Diana will build.  #3 done, started #2, not finished    sam_shift_tools, + others
consumer.py  Chris  finish and test consumer.py. Work needed to go to a null flavor sam_user. (Lauri and Chris)  done, in production    sam_user  v4.0         
sam manager  Sinisa  Possible sam_manager work that may be needed: a pingable client. Check that the restart option works with --CPID on the command line. Also want to reuse Gabriele's api for ROOT; Gabriele might be able to do this.  eventually, not high priority    sam_manager  v3.2.1
Opt bug  Sinisa, Igor  There is an optimizer communication problem with FSS that shows up occasionally. Need to fix FSS to retry, add some debugging messages, and reduce the registration time. Scheduled to be done the week of Jan 14.      sam_station
Name service  Sinisa  Finish testing latest fixes to naming server and move to production.  done, extreme success!    sam_name_server           
SUN station  Sinisa  Compile SAM for SUN. Need to compile Orbacus with Kai 4.0f. Need to modify some of the stager code to work there.  done    sam_station
SAM Manager  Sinisa  Need to build with new IDL products to catch the new exceptions introduced by Igor. Need to coordinate sam_station_idl.  done    sam_manager  v4.0         
Monitor ClueD0  Sinisa  Work to upgrade the backend of the SC2001 info-gathering scripts to load information into the new oracle tables using dcoracle. Maybe some changes to sam_admin tools for mining log files. May also want to break up log files daily to avoid long processing times when extracting information.
Restart    Need to be able to recover projects after station crash. 1. application must be restartable, 2. batch system must coordinate with projects, 3. projects are restarted.                 
d0mino-sam  lauri  Add the ability for remotely initiated transfers to use the dedicated d0mino-sam interface on d0mino. We do not believe this involves any modifications to bbftp.      sam_cp
cleanup FSS cache cron  Carmenita  Create a cron job that runs the cleanup script on d0mino once per day; the cleanup is needed if the FSS dies and leaves orphaned files in the cache.  done
remote staging  Chris, Igor, Others  Need to set up stagers on d0mino for remote stations so they can access files from the fnal MSS. Need to test this and begin to deploy for remote sites. Should work with existing station code. Need to set up a cache area for the remote stations to be set up at the D0Race Workshop.  tested
data routing  Igor + New person  Need a design for ultimate file routing. May include incorporating FSS into the station server, which brings other important features like fss cache management and persistency. Refer to Igor's email on the topic.      sam_station
db upkeep  diana  continue upkeep and monitoring of d0 db instances                 
Q management    Batch queue management and restrictions that limit a single user to a maximum number of jobs.  deferred
Helpdesk Followup  Lauri,Carmenita  Follow up on helpdesk tickets assigned to sam; resolve and close them out.        ongoing
Vicky's list  Need to decide  List from Vicky from November.    Known issues/operations/testing items: a) clueD0 and other Linux stations show strange things with PM; b) restarts: are they working? Tell the users how to do it; c) writing out root-tuples at the end of an input file: tell users how to do it. Jim K was going to write a mail about this; the root-tuple writer package needs to catch the framework 'event' that the input file has been closed, just like sam_manager catches it; d) remote stations getting files from tape via their own stager: need to test; e) stken: need to test; f) routing and use of Gb interfaces: needs more discussion and a written understanding of what we are going to do; g) sam submit: not allowing users to run in Farm-like mode; h) testing from Nikhef: running an analysis project on d0mino to use files from the SARA robot, and also the inverse, running a project there and pulling files from d0mino with bbftp.
/sam/cache  Lee, Dave F.  Re-organize /sam/cache disks.  needs more detailed plan  Re-organize /sam/cache disks into the following: 1. /sam/cache - used for sam station cache, 2. /sam/route - used for remote station routing to enstore, 3. /sam/remote - used for remote station access to fnal MSS, and 4. /sam/external - used as "external" locations (e.g. farm and online data). Would like the following arrangement: as many 273GB disks as needed for /sam/cache and /sam/external, and one 273GB disk each for /sam/route and /sam/remote. Under the covers, the disks will be named /sam/genericNN and have symlinks that assign each one to a particular usage and group. For example, /sam/route/uta -> /sam/generic05/A, /sam/cache28 -> /sam/generic06, /sam/remote/in2p3 -> /sam/generic07/A, /sam/cache25 -> /sam/generic08.
TH upgrades  Chris  Improve the test harness to reflect behaviour more consistent with central-analysis. For example, need simulated users to kill their jobs in the middle, and need many tens of thousands of small files cached and reused many times. This will test station revival more completely.
FRH 7.1 on SAM cluster  Chris, Operations group  Need to install RH7.1 on SAM cluster                 
monitoring use cases  ppdg-sinisa  Prepare and present monitoring use cases at the D0Grid meeting.
SAM CDF  ppdg-sinisa (or other)  Split sam_config and sam_boot_strap so that SAM deployments other than D0 can run completely independent db_servers, naming_service, optimizer, and data logger.
Pick events design    design for pick events using existing sam tools, and additional features for pooling requests, caching events, and cataloging.
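The /sam/cache entry describes usage paths (/sam/route/uta, /sam/cache28, ...) that are symlinks onto physical "generic" disks. Below is a minimal Python sketch of that layout. It assumes a POSIX host and builds the links under a scratch root instead of the real /sam tree. The mapping is copied from the entry's examples. The helper names build_layout and current_usage are hypothetical and are not part of any SAM tool.

```python
from pathlib import Path
import tempfile

# Usage path -> physical ("generic") disk path, copied from the /sam/cache entry.
LAYOUT = {
    "/sam/route/uta": "/sam/generic05/A",
    "/sam/cache28": "/sam/generic06",
    "/sam/remote/in2p3": "/sam/generic07/A",
    "/sam/cache25": "/sam/generic08",
}


def build_layout(root: Path, layout: dict) -> None:
    """Create each generic disk directory and point its usage symlink at it.

    `root` stands in for "/" so the sketch can run in a scratch directory
    instead of touching a real /sam tree. Re-running is safe: existing
    symlinks are replaced. A real directory at a link path raises an error.
    """
    for link, target in layout.items():
        target_path = root / target.lstrip("/")
        link_path = root / link.lstrip("/")
        target_path.mkdir(parents=True, exist_ok=True)       # e.g. <root>/sam/generic05/A
        link_path.parent.mkdir(parents=True, exist_ok=True)  # e.g. <root>/sam/route
        if link_path.is_symlink():
            link_path.unlink()                                # re-point an existing link
        link_path.symlink_to(target_path)


def current_usage(root: Path, layout: dict) -> dict:
    """Return the generic disk that each usage path currently resolves to."""
    real_root = root.resolve()
    return {
        link: "/" + (root / link.lstrip("/")).resolve().relative_to(real_root).as_posix()
        for link in layout
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_layout(Path(tmp), LAYOUT)
        for link, disk in current_usage(Path(tmp), LAYOUT).items():
            print(f"{link} -> {disk}")
```

Because the usage names are only links, a disk can be reassigned to another usage or group by re-pointing one symlink. Files do not need to move.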
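The eworker/sam_cp entry requires sam_cp to tell intra-node, intra-domain, and extra-domain transfers apart. Below is a minimal Python sketch of that classification. It rests on two assumptions: the decision compares fully qualified host names, and a host's domain is everything after its first label. The enum and function names are hypothetical. The host names are illustrative; the fnal.gov suffix on d0lxac1 and station.example.org are assumed. The entry does not say which of rcp, Kerberized rcp, bbrcp, or GridFTP goes with each kind, so that mapping is left out.

```python
from enum import Enum


class TransferKind(Enum):
    INTRA_NODE = "intra-node"
    INTRA_DOMAIN = "intra-domain"
    EXTRA_DOMAIN = "extra-domain"


def _normalize(host: str) -> str:
    # Host names are case-insensitive; drop surrounding blanks and a trailing root dot.
    return host.strip().lower().rstrip(".")


def domain_of(host: str) -> str:
    """Assumed rule: the domain is everything after the first label.

    d0mino.fnal.gov -> fnal.gov; an unqualified name has no known domain ("").
    """
    _, sep, rest = _normalize(host).partition(".")
    return rest if sep else ""


def classify(src_host: str, dst_host: str) -> TransferKind:
    """Classify a copy by comparing source and destination host names.

    Names are compared textually; aliases are not resolved through DNS.
    """
    src, dst = _normalize(src_host), _normalize(dst_host)
    if src == dst:
        return TransferKind.INTRA_NODE
    src_domain = domain_of(src)
    if src_domain and src_domain == domain_of(dst):
        return TransferKind.INTRA_DOMAIN
    # Unqualified names or differing domains are treated conservatively as extra-domain.
    return TransferKind.EXTRA_DOMAIN


if __name__ == "__main__":
    print(classify("d0mino.fnal.gov", "d0lxac1.fnal.gov").value)  # intra-domain
```

A production version would likely resolve aliases or compare addresses before classifying, because two different names can refer to the same node.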