site name | job manager address | ip range | status | contact person(s) | CPU available |
Clermont Ferrand |
clrlcgce01.in2p3.fr:2119/jobmanager-lcgpbs-dzero clrlcgce02.in2p3.fr:2119/jobmanager-lcgpbs-dzero |
stress test MC requests run successfully. | kurca@in2p3.fr lebrun@in2p3.fr |
160 |
|
IN2P3 |
cclcgceli02.in2p3.fr:2119/jobmanager-bqs-short cclcgceli02.in2p3.fr:2119/jobmanager-bqs-medium cclcgceli02.in2p3.fr:2119/jobmanager-bqs-long |
stress test MC requests run successfully. |
kurca@in2p3.fr lebrun@in2p3.fr |
1500 *shared |
|
NIKHEF |
tbn20.nikhef.nl:2119/jobmanager-pbs-qshort tbn20.nikhef.nl:2119/jobmanager-pbs-qlong |
192.16.186.128 | 192.16.186.256 |
stress test MC requests run successfully. Data access to IN2P3 from NIKHEF was observed to be 10x slower than from CF and IN2P3. |
a03@nikhef.nl templon@nikhef.nl |
200 *shared |
Imperial |
gw39.hep.ph.ic.ac.uk:2119/jobmanager-lcgpbs-dzero | 148.88.81.99 148.88.81.100 , 155.198.216.111 | 155.198.216.149 |
Jobs failed due to lack of scratch space. | f.villeneuve@imperial.ac.uk d.colling@imperial.ac.uk |
56 |
Manchester |
bohr0001.tier2.hep.man.ac.uk:2119/jobmanager-lcgpbs-dzero | 195.194.104.0/24, 195.194.105.0/24, 195.194.106.0/24, 195.194.107.0/24, 195.194.108.0/24, 195.194.109.0/24, 195.194.110.0/24, 194.36.3.0/24 |
Trying to contact Sabah for more information on cluster status. |
sabah@fnal.gov |
2000 |
Lancaster |
fal-pygrid-18.lancs.ac.uk:2119/jobmanager-lcgpbs-dzero |
stress test MC requests run successfully. | p.love@lancaster.ac.uk | 394 |
|
Prague |
golias25.farm.particle.cz:2119/jobmanager-lcgpbs-lcgd0prod | - |
upgrade scheduled on Dec 12. |
svecj@fzu.cz, kurca@in2p3.fr |
100 |
Wuppertal |
grid-ce.physik.uni-wuppertal.de:2119/jobmanager-lcgpbs-an_long |
- |
started testing |
meder@physik.uni-wuppertal.de |
512 |
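The ip range column above lists CIDR blocks for some sites (for example Manchester). As a minimal sketch, assuming Python's standard `ipaddress` module and a hypothetical helper name `in_site_ranges` (not part of SAMGrid or LCG), one way to check whether a worker node address falls inside a site's listed blocks:

```python
import ipaddress

# CIDR blocks copied from the Manchester row of the site table.
MANCHESTER_RANGES = [
    "195.194.104.0/24", "195.194.105.0/24", "195.194.106.0/24",
    "195.194.107.0/24", "195.194.108.0/24", "195.194.109.0/24",
    "195.194.110.0/24", "194.36.3.0/24",
]


def in_site_ranges(address, cidr_blocks):
    """Return True if `address` lies inside any of the given CIDR blocks.

    Hypothetical helper for illustration only.
    """
    ip = ipaddress.ip_address(address)
    return any(ip in ipaddress.ip_network(block) for block in cidr_blocks)
```

Sites that list a start and end address instead of CIDR blocks would need a separate comparison; that case is not covered here.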
Milestone name | Status | Expected date |
Complete extension of the test bed to production size. |
Completed stress test with 330 jobs running at the CF, IN2P3, and NIKHEF clusters. Sub-stress tests for Prague and Lancaster. downloads/OctSAMGridLCGStressTest.html |
Mid Nov. |
Consistent efficiency of the Monte Carlo production jobs. |
Mid Jan. |
|
Continuous usage of the infrastructure by experiment operators |
task name | description | status | start date | tentative release date | software releases/deliverable | contributor | Pri.* |
Station polling interfaces -> production |
Improve the pilot implementation of the SAM polling interfaces to support better diagnostics and to increase tolerance of central system failures (name server). |
Mid Oct |
Mid Dec |
Andrew |
4 |
||
Scalability. New DB server |
Decouple LCG and SAMGrid production activities. The premise is the increased cost of diagnosing LCG-submitted jobs. |
done |
current |
End Oct |
Andrew, Steeve White |
2 |
|
Mission critical. MC merge output storage selection |
Be able to store merged MC data using SAMGrid storage selection configuration. |
done | End Oct |
Start Nov |
Andrew |
1 |
|
Manageability. Credential management. Integration with MyProxy | Avoid shipping the user proxy with the job. Reuse the MyProxy solution to delegate the task. |
done |
End Nov |
End Dec |
Sudhamsh, Jeff Templon, sites |
4 |
|
Prod quality. Regression testing. Certification. |
The high-level SAMGrid support model seems to call for a centralized support center that needs tools to certify and test individual sites in order to resolve ongoing operational claims. It is also important to keep site status up to date with respect to changes in LCG and SAMGrid software. |
The jim_stats package is about to be released with changes that enable automatic profiling of the submitted jobs. | Mid Oct. |
Mid November |
Sudhamsh/Andrew/Torsten/Sites |
2 |
|
Prod quality. Stress testing. |
Identify bottlenecks in submission and data handling components. Prerequisite for procuring optional deployments of additional bridge nodes if results fail to satisfy production throughput requirements. |
done |
Start Nov |
Mid Nov |
Set of JDLs that schedule MC jobs to all LCG sites. Results and analysis. OctSAMGridLCGStressTest.html |
Sudhamsh/Andrew/Torsten/Sites | 3 |
Manageability. Cluster tagging. |
LCG software and hardware changes/upgrades make it difficult to track the LCG resource pool for D0. Need a "tag" layer to isolate the LCG job manager name/address from the LCG broker string. Also need to sub-select the pre-certified sites for a particular task (MC / merge / reco / reco merge). |
planning |
Start Nov |
? |
Andrew/Torsten/Sites |
4 |
|
Passing LCG scheduling parameters from SAM JDL to LCG JDL. |
Important deployment feature to enable individual regression/stress testing of an LCG site. |
done |
Start Oct |
10/09/05 |
jim_client v2_1_30, sam_batch_adapters x47 |
Sudhamsh |
2 |
Verbose LCG output retrieval. |
Store as much persistent diagnostic output as possible for debugging/problem resolution. |
done |
Start Oct |
10/07/05 | sam_batch_adapters x47 |
Sudhamsh |
2 |
Sys. expansion. New station storage at ccin2p3-grid1. |
Pending on the stress test results. The additional storage bandwidth may be required to sustain production rates. |
done |
Start Dec |
Mid Feb |
Andrew/Tibor |
4 |
|
Set number of LCG jobs that failed due to the LCG output handling. |
WBS 3.1.1.2 |
in progress |
Start End Oct. |
Mid Nov. | Sudhamsh/Parag/Andrew | 3 |
|
Job termination on deadline. |
To ensure predictability of the recovery process, LCG jobs should not run past the point at which the recovery procedure is known to take over. |
done |
Mid Nov. |
End Nov. |
Sudhamsh | 4 |
|
Manchester storage element. |
Scale our storage element infrastructure (currently deployed at IN2P3) by adding a 2 TB storage element in Manchester for SAMGrid/LCG. |
done |
End Oct. |
Mid Feb. |
Sabah,Andrew,Tibor |
3 |
|
Be able to select the "closest" storage element with respect to job running location. |
To optimize usage of network bandwidth, a job should select the storage element "closest" to the site where it is currently running. | not started |
Beg. Nov | End Dec. |
Andrew, Sudhamsh, Gabrielle |
3 |
|
Accounting |
Report LCG cluster resource usage to the collaboration. |
done |
Beg Nov. |
End Dec. |
MC_LCG_Accounting.doc |
Gavin/Jeff Templon/Mike Diesburg |
3 |
SAMGrid Production LCG forwarding node deployment. |
Separate production and test bed development activities by dedicating a new head node to production. |
done |
Mid Nov. |
Mid Dec. |
Old forwarding node is available as "test_prd" |
Torsten LCG, SAMGrid team |
2 |
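The "closest" storage element task in the table above does not define how closeness is measured. As a minimal sketch only, assuming closeness can be approximated by the number of trailing DNS labels a job host shares with a storage element host, the selection could look like the following. The function names and the storage element host names (`se.in2p3.fr`, `se.hep.man.ac.uk`) are hypothetical placeholders, not real SAMGrid configuration; the job hosts are taken from the site table.

```python
def shared_suffix_labels(host_a, host_b):
    """Count the trailing DNS labels that two host names have in common."""
    labels_a = host_a.lower().split(".")[::-1]
    labels_b = host_b.lower().split(".")[::-1]
    count = 0
    for a, b in zip(labels_a, labels_b):
        if a != b:
            break
        count += 1
    return count


def closest_storage(job_host, storage_hosts):
    """Pick the storage host sharing the longest DNS suffix with job_host.

    Assumption: DNS suffix overlap stands in for network proximity.
    On a tie, the first entry in storage_hosts wins (it acts as the default).
    """
    return max(storage_hosts, key=lambda se: shared_suffix_labels(job_host, se))


# Hypothetical storage element names, for illustration only.
STORAGE_HOSTS = ["se.in2p3.fr", "se.hep.man.ac.uk"]
```

A measured metric such as observed transfer rate would likely be a better proxy than DNS names; the stress test note about slow NIKHEF access to IN2P3 suggests that kind of data exists.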
Date |
Location |
25 Oct 2005 |
downloads/OctSAMGridLCGStressTest.html |
Package name |
release |
jim_client (upgrade instructions) |
v2_1_53 |
---- LCG forwarding node software releases, for SAMGrid/LCG admins only ---------- |
|
jim_job_managers |
v2_2_82 |
sam_client |
v1_0_66_poll |
sam_fcp |
v1_0_25 |
jim_config |
v1_2_18 |
$Id: SAMGridLCGStatus.html,v 1.17 2006/03/09 23:13:52 abaranov Exp $,