a_entry |
a_owner |
a_short_description |
a_status |
long_description_of_problem |
package |
sam_vsn |
xest_pers_days |
zdate_added |
zdate_finished |
zz_notes |
D0RACE tutorials  |
tbd  |
Prepare for the Feb 11 + 12 D0RACE workshop SAM tutorials.
Includes setting up and testing a station. SAM shifter training as needed.  |
  |
  |
  |
  |
  |
  |
  |
  |
D0 support  |
Lauri + others  |
  |
ongoing  |
  |
NA  |
NA  |
  |
  |
  |
  |
sam user  |
Lauri  |
Take over sam_user with Carmenita. Clean up commands and error messages; add new commands.  |
ongoing  |
  |
sam_user  |
NA  |
  |
  |
  |
  |
builds  |
Lauri  |
standardizing builds  |
evolutionary  |
requires a lot of design which is the bulk of the job. Benefit is anyone can build any piece of sam easily.  |
  |
  |
  |
  |
  |
  |
sam parser  |
Lauri  |
Convert all sam commands to use the new command parser and clean up commands. Adds new help and will make writing the unit test suite easier.  |
done  |
  |
sam_user  |
v4.0  |
  |
  |
  |
  |
get run  |
Lauri  |
sam get run command (needs further specification)  |
2002  |
  |
sam_user,sam_db_server  |
  |
  |
  |
  |
  |
diskcache  |
Lauri,Igor  |
samadmin remove diskcache command. Need uncache file, disk, group cache, station.  |
done  |
Mark disk and files on it as not available. Need to go over use cases.  |
sam_admin  |
v3.2.2  |
  |
  |
  |
  |
archive logs  |
Sinisa,Lee,Lauri  |
archiving of log files (waiting on sam on sun). Need stager and encp only, could have station running on other node. Sinisa will try to build stager on SUN, Lee will get encp for sun (seems to be available). Lauri will do final set up.  |
  |
  |
  |
  |
  |
  |
  |
  |
sam-at-a-glance  |
Lauri, Diana  |
Improve sam-at-a-glance so it runs on ora 1 and provides
more up-to-date information. May require sam user to run on sun OS, or conversion
to use the name service status info (just ping the stations instead of sam dump). Need to add additional information to database: 1. known down, and 2. monitoring level: high, medium, and low availability systems (see Lauri's mail describing this in detail).  |
  |
  |
  |
  |
  |
  |
  |
  |
unit tests  |
Lauri,Chris  |
Produce unit tests for sam user interface. Tied to the sam parser task.  |
  |
  |
sam_user  |
v4.1  |
  |
  |
  |
  |
clued0  |
Chris + Igor  |
Continue testing of distributed sam on Clued0. Includes
implementing the batch system and load testing with an additional desktop node
included.  |
ongoing  |
  |
  |
v3.2.1, 3.2.2  |
  |
  |
  |
  |
quick station revival  |
Matt,Igor  |
Decrease station startup time. Matt is picking up the ball on that, but it will require station changes. Matt's top priority. A FNORB limitation has stopped this until it is figured out. Steve, Sinisa, and Matt will try to sort this out. May require another way to send files, in smaller blocks of lists, due to the restriction in FNORB.  |
problems  |
  |
  |
v3.2.2  |
  |
  |
  |
  |
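The "smaller blocks of lists" idea in the quick station revival task can be sketched as below. This is only an illustration under stated assumptions: `split_into_blocks` and the block size are hypothetical names and values, and the sketch does not model FNORB or the station interface.

```python
from typing import Iterator, List, Sequence


def split_into_blocks(file_names: Sequence[str], block_size: int) -> Iterator[List[str]]:
    """Yield consecutive blocks of at most block_size file names.

    Hypothetical helper: the real block size would depend on the
    (unspecified) FNORB restriction mentioned in the task.
    """
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    for start in range(0, len(file_names), block_size):
        # Each slice is one block that could be sent as a separate message.
        yield list(file_names[start:start + block_size])


if __name__ == "__main__":
    # Made-up file names, for illustration only.
    names = ["file_%03d" % i for i in range(5)]
    for block in split_into_blocks(names, 2):
        print(block)
```

The last block may be shorter than the others; a receiver would need to reassemble the blocks in order.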
eworker/sam_cp  |
Lauri,Igor  |
Needs to be able to recognize intra-node, intra-domain, and
extra-domain transfers. Includes using rcp, kerberized rcp, bbrcp, and GridFTP.  |
done  |
  |
sam_cp,sam_station  |
v4.0  |
  |
  |
10/19  |
  |
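The three transfer classes named in the eworker/sam_cp task can be illustrated with a small classifier. This is a sketch, not the sam_cp logic: `classify_transfer` and `domain_of` are hypothetical names, and treating "domain" as everything after the first dot of a host name is an assumption. Which copy tool goes with which class is not stated in the task, so the sketch does not choose one.

```python
def domain_of(host: str) -> str:
    """Return the part of a host name after the first dot ('' if unqualified)."""
    _, _, domain = host.lower().partition(".")
    return domain


def classify_transfer(src_host: str, dst_host: str) -> str:
    """Classify a transfer as intra-node, intra-domain, or extra-domain."""
    src, dst = src_host.lower(), dst_host.lower()
    if src == dst:
        return "intra-node"
    src_domain, dst_domain = domain_of(src), domain_of(dst)
    # Unqualified host names have no known domain, so they are treated
    # conservatively as extra-domain (an assumption of this sketch).
    if src_domain and src_domain == dst_domain:
        return "intra-domain"
    return "extra-domain"


if __name__ == "__main__":
    # Host names below are placeholders, not real SAM nodes.
    print(classify_transfer("node1.example.org", "node1.example.org"))
    print(classify_transfer("node1.example.org", "node2.example.org"))
    print(classify_transfer("node1.example.org", "node9.example.net"))
```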
file-status  |
Lauri, Steve, Diana, Matt  |
Add crummy file status and needed support features. Could use more enduring name, like unofficial or suspect. Matt's second priority.
Needs response to Matt's mail from 11/13/01. Held brainstorming session
Thurs Jan 17; Diana wrote notes.  |
in design  |
Storing of 'crummy' half-finished files: proposal on how to use status of file. Investigation of what code would need to change in sam store (or whether it is just a little samadmin command you are allowed to do right after the store has succeeded). Investigate how to deal with --resubmit
which wants to overwrite a crummy file - needs to call another samadmin
command to first delete the file in pnfs space. Additional thought and
discussion indicates that the way we use the current file status is
incorrect, and some current statuses should be moved to file@location status.
Additional statuses discussed at d0 include: incomplete, obsolete, superseded,
user-added, unofficial. There may be more or others.  |
  |
v4.1  |
  |
  |
  |
  |
MC dimension code  |
Matt, Carmenita  |
Need dimension query ability for MC request parameters  |
close  |
  |
  |
  |
  |
  |
  |
  |
MC  |
Carmenita  |
Ongoing support for MC request features.  |
test. Still need get meta data.  |
Meta data is there, need response from Dave
concerning the MC_Runjob description files he has produced as examples.
Need way to add support entries.
Store works, but need to be able to initialize request and get a
request ID back, then validate attributes when processed data metadata
returns from remote MC centers. Some of the server interfaces are there,
need to build client part in sam user.  |
sam_db_server,sam_user  |
v4.0  |
  |
  |
  |
  |
MC request end-to-end  |
Carmenita, Dave Evans, Iain Bertram  |
End-to-end test of MC request facility. 1. Create request using MDC file, 2. Store the request and get the request ID, 3. Retrieve the request and generate macro, 4. Modify the status to "running", 5. Run the request at a remote processing farm, 6. Declare / store files into SAM, 7. Translate constraints w/ request parameters, 8. Create data set to retrieve files, 9. Modify state to "done".  |
  |
  |
  |
  |
  |
  |
  |
  |
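The status changes in the MC request end-to-end steps (to "running" in step 4, to "done" in step 9) can be sketched as a tiny state machine. The statuses "running" and "done" come from the task text; the initial status "new", the class name `MCRequest`, and the transition table are assumptions for illustration and do not reflect the sam_db_server interface.

```python
from dataclasses import dataclass

# Allowed status transitions. "new" is a hypothetical initial status.
ALLOWED_TRANSITIONS = {
    "new": {"running"},
    "running": {"done"},
    "done": set(),
}


@dataclass
class MCRequest:
    """Hypothetical stand-in for a stored MC request."""
    request_id: int
    status: str = "new"

    def set_status(self, new_status: str) -> None:
        """Move to new_status, rejecting transitions not in the table."""
        if new_status not in ALLOWED_TRANSITIONS[self.status]:
            raise ValueError(
                "cannot go from %r to %r" % (self.status, new_status)
            )
        self.status = new_status


if __name__ == "__main__":
    request = MCRequest(request_id=1)   # request stored, ID returned
    request.set_status("running")       # status set to "running"
    request.set_status("done")          # state set to "done"
    print(request)
```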
app_family + param type/name/value  |
Steve,Lauri,Carmenita,Diana  |
Link application name/version with MC param type/name/value to provide a way to record generalized processing attributes.  |
needs design  |
  |
sam_user,sam_db_server, sam_db  |
v4.1  |
  |
  |
  |
  |
Documentation  |
  |
Look through documentation and fix problems. Need sam quick reference page, to replace the quick start guide that is obsolete.  |
after Thanksgiving  |
sam get metadata,list definition --keywords, sam create dataset --keyword???, sam run project, sam submit may have problems, mc runjob new metadata, auto dest "sam store --descrip=...", add new phase needs to be documented.
need to document metadata for luminosity and archive files, sam batch
commands, psusp, files not delivered. python api, new dimensions and
examples. Translation of status block. sam toonl should be documented.
Sam station starting options through sam_bootstrap startup. New flags need to be documented. Questions about groups need to be answered in documentation.  |
sam_doc  |
  |
  |
  |
  |
  |
omniORB.py  |
Steve  |
Continue to understand issues of omniORB.py use with sam.
Steve to provide detailed list of work to be done. Steve will produce list for
discussion 12/04/2001. Steve has made some progress and can describe where he feels the problems are 1/28/2002.  |
  |
  |
  |
  |
  |
  |
  |
  |
autodest  |
Carmenita  |
Autodestination with processed files needs to be resolved:
bug in the server in constructing the path, pulling info from the
parent that it should not.  |
done, needs test on farms  |
  |
sam_user,sam_db_server  |
v3.2  |
  |
  |
  |
  |
get num copies  |
Carmenita  |
Get the number of copies for each file from the sam database.
Need to decide where this is kept in sam.  |
  |
  |
sam_user  |
?  |
  |
  |
  |
  |
file_family  |
Carmenita  |
Add code to sam autodest so that the proposed path string
uses an optional entry for "file_family=..." appended to the stream field.
This has been requested by Gerry for the online direction of files to tape.
Still some debate, but will provide flexibility for streaming decisions
to be made later.  |
  |
  |
sam_user  |
?  |
  |
  |
  |
  |
samadmin  |
Lauri  |
mark entire station as down, also might want node down, station down, fss down.  |
not critical  |
  |
sam_admin  |
  |
  |
  |
  |
  |
new d0 robot  |
Chris, Lee  |
Need to prepare for use of new d0 robot as details emerge,
need new sam on d0lxac1 and d0lxbld1.  |
done  |
  |
  |
  |
  |
  |
  |
  |
Jenny xmas tasks  |
Jenny Chen + Diana  |
Tasks for Jenny over Christmas: 1. complete tasklist formatting script, 2. shifter log tool, 3. shifter notification script. Task 2 needs a two-table schema that Diana will build.  |
#3 done, started #2, not finished  |
  |
sam_shift_tools, + others  |
  |
  |
  |
  |
  |
consumer.py  |
Chris  |
finish and test consumer.py. Work needed to go to a null flavor sam_user. (Lauri and Chris)  |
done, in production  |
  |
sam_user  |
v4.0  |
  |
  |
  |
  |
sam manager  |
Sinisa  |
possible sam_manager work that may be needed. Pingable
client. Check restart option works with --CPID on command line. Also desire to reuse Gabriele's api for ROOT. Gabriele might be able to do this.  |
eventually, not high priority  |
  |
sam_manager  |
v3.2.1  |
  |
  |
  |
  |
Opt bug  |
Sinisa, Igor  |
There is an optimizer communication problem with FSS that shows up occasionally. Need to fix FSS to retry, add some debugging messages, and reduce the registration time. Scheduled to be done week of Jan 14.  |
  |
  |
sam_station  |
  |
  |
  |
  |
  |
Name service  |
Sinisa  |
Finish testing latest fixes to naming server and move to production.  |
done, extreme success!  |
  |
sam_name_server  |
  |
  |
  |
  |
  |
SUN station  |
Sinisa  |
Compile SAM for SUN. Need to compile Orbacus with Kai 4.0f. Need to modify some of the stager code to work there.  |
done  |
  |
sam_station  |
  |
  |
  |
  |
  |
SAM Manager  |
Sinisa  |
Need to build with new IDL products to catch the new exceptions introduced by Igor. Need to coordinate sam_station_idl.  |
done  |
  |
sam_manager  |
v4.0  |
  |
  |
  |
  |
Monitor ClueD0  |
Sinisa  |
Work to upgrade the backend of the SC2001 info gathering scripts to load information into the new Oracle tables using dcoracle. Maybe some changes to sam_admin tools for mining log files. May also want to break log files daily to avoid long processing times to extract information.  |
  |
  |
  |
  |
  |
  |
  |
  |
Restart  |
  |
Need to be able to recover projects after station crash.
1. application must be restartable, 2. batch system must coordinate with projects, 3. projects are restarted.  |
  |
  |
  |
  |
  |
  |
  |
  |
d0mino-sam  |
Lauri  |
Add ability for remotely-initiated transfers to use d0mino-sam dedicated interface on d0mino. Do not believe this involves any mods to bbftp.  |
  |
  |
sam_cp  |
  |
  |
  |
  |
  |
cleanup FSS cache cron  |
Carmenita  |
Create a cron job that runs once per day to run the cleanup script on d0mino needed if the FSS dies with orphaned files in the cache.  |
done  |
  |
  |
  |
  |
  |
  |
  |
remote staging  |
Chris, Igor, Others  |
Need to set up stagers on d0mino for remote stations so they can access files from fnal MSS. Need to test this and begin to deploy for remote sites. Should work with existing station code. Need to set up cache area for remote stations to be set up at D0Race Workshop.  |
tested  |
  |
  |
  |
  |
  |
  |
  |
data routing  |
Igor + New person  |
need design for ultimate file routing. May include incorporation of FSS into station server which brings other important features like fss cache management and persistency. Refer to Igor's email concerning the topic.  |
  |
  |
sam_station  |
  |
  |
  |
  |
  |
db upkeep  |
Diana  |
continue upkeep and monitoring of d0 db instances  |
  |
  |
  |
  |
  |
  |
  |
  |
Q management  |
  |
Batch queue management and restrictions to hold a single user to a limited number of jobs  |
deferred  |
  |
  |
  |
  |
  |
  |
  |
Helpdesk Followup  |
Lauri,Carmenita  |
Need to follow up HD tickets assigned to sam, resolve them, and close them out  |
  |
  |
  |
ongoing  |
  |
  |
  |
  |
Vicky's list  |
Need to decide  |
list from Vicky from November.  |
  |
Known issues/operations/testing stuff: a) clueD0 and other linux stations strange things with PM, b) restarts - are they working - tell the users how to do it, c) writing out root-tuples at end of input file - tell users how to do it - Jim K was going to write a mail about this - root-tuple writer package needs to catch framework 'event' that input file has been closed, just like sam_manager catches it, d) remote stations getting files through from tape via their own stager need to test, e) stken need to test, f) routing and use of Gb interfaces - needs more discussion and a written understanding of what we are going to do, g) sam submit - not allowing users to run in Farm-like mode, h) testing from Nikhef - running analysis project on d0mino to use files from SARA robot. Also the inverse - running project there and pulling files from d0mino with bbftp.  |
  |
  |
  |
  |
  |
  |
/sam/cache  |
Lee, Dave F.  |
Re-organize /sam/cache disks.  |
needs more detailed plan  |
Re-organize /sam/cache disks into following:
1. /sam/cache - used for sam station cache, 2. /sam/route - used for
remote station routing to enstore, 3. /sam/remote - used for remote
station access to fnal MSS, and 4. /sam/external - used as "external"
locations (e.g. farm and online data). Would like to have following
arrangement: as many 273GB disks as needed for /sam/cache, and
/sam/external. Allocate one 273GB disk for each /sam/route, and
/sam/remote. Under the covers, disks will be called /sam/generic,
and have sym links that assign to a particular usage, and group.
For example, /sam/route/uta -> /sam/generic05/A, /sam/cache28 ->
/sam/generic06, /sam/remote/in2p3 -> /sam/generic07/A, /sam/cache25 ->
/sam/generic08.  |
  |
  |
  |
  |
  |
  |
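The symlink arrangement proposed for /sam/cache can be tried out safely in a scratch directory. The link-to-target pairs below are the four examples given in the task; the function name `build_layout` and the use of a scratch root instead of the real `/sam` tree are assumptions of this sketch, which is not the actual reorganization procedure.

```python
import os
import tempfile

# Usage path -> generic disk path, taken from the examples in the task.
EXAMPLE_LINKS = {
    "sam/route/uta": "sam/generic05/A",
    "sam/cache28": "sam/generic06",
    "sam/remote/in2p3": "sam/generic07/A",
    "sam/cache25": "sam/generic08",
}


def build_layout(root, links=None):
    """Create generic directories under root and symlinks that point at them."""
    links = EXAMPLE_LINKS if links is None else links
    for link_rel, target_rel in links.items():
        target = os.path.join(root, target_rel)
        link = os.path.join(root, link_rel)
        os.makedirs(target, exist_ok=True)                  # the "generic" area
        os.makedirs(os.path.dirname(link), exist_ok=True)   # e.g. sam/route
        if not os.path.islink(link):
            os.symlink(target, link)                        # usage name -> generic


if __name__ == "__main__":
    # Build the layout in a temporary directory, never in the real /sam.
    with tempfile.TemporaryDirectory() as scratch:
        build_layout(scratch)
        for link_rel in EXAMPLE_LINKS:
            link = os.path.join(scratch, link_rel)
            print(link_rel, "->", os.path.relpath(os.path.realpath(link),
                                                  os.path.realpath(scratch)))
```

Keeping the generic names separate from the usage names means a disk can be reassigned by changing one link.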
TH upgrades  |
Chris  |
Improve test harness to reflect behaviour more consistent with central-analysis. For example, need simulated users to kill their jobs in the middle, and need many tens of thousands of small files cached and reused many times. This will test the station revival more completely.  |
  |
  |
  |
  |
  |
  |
  |
  |
FRH 7.1 on SAM cluster  |
Chris, Operations group  |
Need to install RH7.1 on SAM cluster  |
  |
  |
  |
  |
  |
  |
  |
  |
monitoring use cases  |
ppdg-sinisa  |
prepare and present monitoring use cases at D0Grid meeting  |
  |
  |
  |
  |
  |
  |
  |
  |
SAM CDF  |
ppdg-sinisa (or other)  |
Split sam_config and sam_boot_strap so we can run
completely independent db_servers, naming_service, optimizer, and data logger for SAM deployments other than D0.  |
  |
  |
  |
  |
3  |
  |
  |
  |
Pick events design  |
  |
design for pick events using existing sam tools, and additional
features for pooling requests, caching events, and cataloging.  |
  |
  |
  |
  |
  |
  |
  |
  |