CDFOI v2.0.2 "Scheduled Tasks" List - WBS (23 October 2008)
ID WBS Name Notes
0 CDF Offline Initiative This plan is defined to start on 31 March 2008....
1 1 CDF Offline Architecture
2 1.1 Strategy Sheets
3 1.1.1 CDF Grid Infrastructure
4 1.1.1.1 CDF Grid Infra Strategy: Version 1 vetted by Initiative
5 1.1.1.2 CDF Grid Infra Strategy: Version 2 vetted by CDF Spokespeople CDF DocDB #2688
6 1.1.1.3 CDF Grid Infra Strategy: Version 3 posted for review
7 1.1.1.4 "CDF Grid Infra Strategy: Review by CDF Spokes, Initiative Mgmt" "V3 presented to CDF Spokespeople on 6/5/2008, with general agreement on strategies and qualified agreement on the Project Organization component. Declared done due to lack of dissent at CDF Week presentation."
8 1.1.1.5 "CDF Grid Infra Strategy: Review by CD Mgmt, Initiative Mgmt"
9 1.1.2 CDF CAF-Grid Instances
10 1.1.2.1 CDF CAF-Grid Instances Strategy: Version 2 vetted by CDF Spokespeople CDF DocDB #2688
11 1.1.2.2 CDF CAF-Grid Instances Strategy: Version 3 posted for review
12 1.1.2.3 "CDF CAF-Grid Instances Strategy: Review by CDF Spokes, Initiative Mgmt" "V3 presented to CDF Spokespeople on 6/5/2008, with general agreement on strategies and qualified agreement on the Project Organization component. Declared done due to lack of dissent at CDF Week presentation."
13 1.1.2.4 "CDF CAF-Grid Instances Strategy: Review by CD Mgmt, Initiative Mgmt"
14 1.1.3 MILE 28: High Priority Strategy Sheets approved
15 1.1.5 CDF Disk Space Strategy
16 1.1.5.1 "CDF Disk Space Strategy: Version 1 vetted by Offline, REX Dept"
17 1.1.5.2 CDF Disk Space Strategy: Version 2 vetted by CDF Spokespeople
18 1.1.4 CDF Offline Infrastructure Strategy
19 1.1.4.1 "CDF Offline Infrastructure Strategy: Version 1 vetted by Offline, REX Dept"
20 1.1.4.2 CDF Offline Infrastructure Strategy: Version 2 vetted by CDF Spokespeople
21 1.1.6 CDF Data Handling Strategy
22 1.1.6.1 "CDF Data Handling Strategy: Version 1 vetted by Offline, REX Dept"
23 1.1.6.2 CDF Data Handling Strategy: Version 2 vetted by CDF Spokespeople
24 1.1.7 MILE 7: Strategy Sheets completed
25 1.3 Project Organization Chart Org Chart *PLUS* clear responsibilities and boundaries defined among the sub-divisions of the Offline Project
26 1.3.10 Project Org Chart: Basic version capturing current state
27 1.3.11 Project Org Chart: Proposal development
28 1.3.12 Project Org Chart: Detailed Responsibility Assignments in Org Chart This includes any shifting of responsibilities that may be considered.
29 1.3.13 "Project Org Chart: Review by CDF Spokes, CD Mgmt, Initiative Mgmt" "V3 presented to CDF Spokespeople on 6/5/2008, with general agreement on strategies and qualified agreement on the Project Organization component."
30 1.3.14 MILE 27: Project Org Chart completed
31 1.2 Offline Services Design Document
32 1.2.1 High-Level Arch Diagram for CAF and Grid
33 1.2.5 "GroupCAF, FermiGrid CAF Components Diagrams"
34 1.2.6 Modern FermiGrid CAF Component Diagrams RDK: Choose to defer this until after the long-term system configurations are attained to avoid rework. See CDF Offline Operations reports by SL in October 2008 for preliminary system diagrams during the GroupCAF to FermiGrid CAF Migration.
35 1.2.9 "NAMCAF, TestCAFs Components Diagrams"
36 1.2.3 Services-Hardware Map
37 1.2.7 CAF Job Submission WorkFlow Diagrams
38 1.2.2 Production Processing Workflow Diagrams
39 1.2.8 MILE 3: Offline Production Services documented
40 1.5 High-Level Offline Architecture Document
41 1.6 MILE 15: CD Offline Architecture defined
42 2 Shared Operations Management Topics
43 2.1 Low-level Monitoring and Alarms (Zabbix)
44 2.1.1 Zabbix: Evaluation Requirements Document
45 2.1.2 Zabbix: Evaluation and Evaluation Summary Document
46 2.1.3 Zabbix: System Specifications Document
47 2.1.4 Zabbix: Multi-Phase Deployment and Short-term Support Plan
48 2.1.5 Milestone: Zabbix System specified
49 2.1.7 Zabbix: Negotiate service hosting and support
50 2.1.8 Zabbix: Setup service hosting
51 2.1.6 Initial Production-quality Zabbix Agents
52 2.1.6.1 Zabbix: CDF-Independent Agent scripting and testing
53 2.1.6.2 CLOSED: Zabbix: CDF-specific Agents code "RDK (9/12/2008): Federica Moscato is and has been dedicated 100% to the GroupCAF to FermiGrid CAF migration, the highest priority for CDF, and thus unable to work this task."
54 2.1.25 CLOSED: Replan Zabbix Monitoring and Alarms
55 2.1.24 CLOSED: Zabbix: CDF-specific Agents code
56 2.1.9 CLOSED: Zabbix: Integration of available agents on hosted service
57 2.1.10 CLOSED: MILE 5: Zabbix Phase 1 Deployment successful
58 2.1.11 CLOSED: Zabbix: Configuration Testing and Tuning
59 2.1.12 CLOSED: Zabbix: Develop Complete Suite of Production Agents (by expert)
60 2.1.13 CLOSED: Zabbix: Integration and Reporting for new agents (by expert)
61 2.1.14 CLOSED: Milestone: Zabbix Phase 2 Deployment successful
62 2.1.15 CLOSED: Zabbix: Agents Development and Deployment Doc
63 2.1.16 CLOSED: Zabbix: Operations and Configuration Management Doc
64 2.1.17 CLOSED: Zabbix: Develop new agents for another system (by CAF ops team)
65 2.1.18 CLOSED: Zabbix: Integrate new agents for another system (by CAF ops team)
66 2.1.19 CLOSED: Milestone: Zabbix Phase 3 Deployment successful
67 2.1.20 CLOSED: Zabbix: Long-term Platform and Service Support Agreement
68 2.1.21 CLOSED: Zabbix: Hand-off to CDF CAF Operations team
69 2.1.22 CLOSED: MILE 20: Zabbix production deployment achieved
70 2.2 Issue/bug Tracking (Jira)
71 2.2.1 Jira Evaluation: Beta Config on Beta Platform
72 2.2.2 Jira Configuration Review
73 2.2.3 Jira Evaluation: Prod Config on Beta Platform
74 2.2.4 Jira: Beta Configuration Document
75 2.2.5 Jira: Requirements Document
76 2.2.6 Jira: Evaluation Document
77 2.2.7 Jira: Initial Integration into Existing Support Processes
78 2.2.8 Jira: Create and Deliver a User Tutorial
79 2.2.9 Jira: Refine Integration into Existing CDF Support Processes "MeV (6/3/2008): The ultimate goal is to have *only* issues@fnal.gov in the email lists. As of this weekend, this is true for all of the lists except cdf_caf and cdfdb-support. The latter is scope creep, and I hope progress will be made on the former th..."
80 2.2.10 MILE 1: Jira Production Service on Beta Platform successful
81 2.2.14 INPUT: Choose to outsource Jira production platform
82 2.2.15 Jira: Purchase Requisition development
83 2.2.16 Jira: Purchase Requisition into Lab System
84 2.2.17 Jira: Initial Access to N-user Jira Service (purchase eff. completed)
85 2.2.18 Milestone: N-user Jira Service purchased and accessible
86 2.2.19 Jira: Migration from Beta to Production Platform
87 2.2.20 Jira: Adapt Most Configuration to Specifics of Production Platform
88 2.2.29 Milestone: Switch Service to Jira on Production Platform
89 2.2.26 Jira: Refine CDF Support Process Integration - Last Loose E-mail Lists "MeV (6/3/2008): The ultimate goal is to have *only* issues@fnal.gov in the email lists. As of this weekend, this is true for all of the lists except cdf_caf and cdfdb-support. The latter is scope creep, and I hope progress will be made on the former th..."
90 2.2.12 Jira: Metrics and Reports Based on Issue Tracking Content
91 2.2.22 Jira: Configuration Fine-Tuning MeV (6/10/2008) - rephrased:...
92 2.2.21 MILE 8: Jira Migrated from Beta to Production Platform
93 2.2.30 Jira: Updated User Documentation Presented by MeV at the CDF Collaboration Meeting on 6/18/2008.
94 2.2.23 Jira: Configuration Management and Application Support Guidance Doc
95 2.2.24 Jira: Hand-off to Operations team Operations team in this case is REX/Ops (Adam Lyon)
96 2.2.25 MILE 13: Jira Issue Tracker production deployment achieved
97 2.2.27 Jira: LDAP Integration for Production Platform - NOT COMPLETED MeV (6/10/2008): waiting on approval from computer security
98 2.2.27.1 "Jira: LDAP Integration - Vendor, Central Services Discussions"
99 2.2.27.2 CLOSED: Jira: LDAP Integration - Deployment for externally-hosted production platform MeV (9/12/2008): We won't get these. We might get LDAP someday. I think we take them off the list....
100 2.2.28 Jira: PIX Email Handler Plug-in Integration for Production Platform - NOT COMPLETED "MeV (6/10/2008): Glenn is testing the new version of PIX on the evaluation license, but it seems to be eating email off the imapserver and not generating tickets. He is trying to get help from the company in Germany, but no one is responding (bad sign..."
101 2.2.28.1 Jira: PIX Email Handler Plug-in - Defect reproduced by vendor "MeV (6/10/2008): Glenn is testing the new version of PIX on the evaluation license, but it seems to be eating email off the imapserver and not generating tickets. He is trying to get help from the company in Germany, but no one is responding (bad sign..."
102 2.2.28.2 CLOSED: Jira: PIX Email Handler Plug-in - Vendor fixes defect "MeV (6/10/2008): Glenn is testing the new version of PIX on the evaluation license, but it seems to be eating email off the imapserver and not generating tickets. He is trying to get help from the company in Germany, but no one is responding (bad sign..."
103 2.2.28.3 MOOT: Jira: PIX Email Handler Plug-in - Deploy fixed version "MeV (6/10/2008): Glenn is testing the new version of PIX on the evaluation license, but it seems to be eating email off the imapserver and not generating tickets. He is trying to get help from the company in Germany, but no one is responding (bad sign..."
104 2.2.31 Milestone: Jira Issue Tracker deployment closed
105 2.3 Downtime Planning and Recovery
106 2.3.1 Root Cause Analysis for late March downtime
107 2.4 Code Repository
108 2.4.1 Assess Risk of CVS-to-SVN Migration during Initiative
118 2.7 Milestone: Shared Operations Management Issues addressed
125 3.3 INPUT: Issue Tracking (Jira) Integration with Support Process
126 3.8 Web Server Migration: Switchover "To be done at next Major downtime, probably August"
127 3.4 Code Server Node Upgrades
128 3.5 SL4 Migration "While this activity is active, Stephan Lammel is keeping an updated task list at:..."
129 3.5.1 Basic Migration Planning RS/LG (6/10/2008) paraphrased:...
130 3.5.2 Phase 1: Implement Modifications to Infrastructure
131 3.5.2.1 INPUT: Implement changes to SRT Already done by the time the SL4 Migration Plan was defined.
132 3.5.2.2 Test UPS v4.7.4 with 64 bit support "Lynn Garren (6/25/2008): The basic tests of ups 4.7.4 are done and it is available on the cpd build machines. I have found one problem that will need fixing, but it doesn't affect this stage of deployment. You can't make ups 4.7.4 the default version o..."
133 3.5.2.3 Fix ability to make UPS v4.7.4 the default (setup scripts)
134 3.5.2.4 Deploy UPS v4.7.4 in development
135 3.5.2.5 Update External Products
136 3.5.2.6 milestone: SL4 Phase 1: Dev environment prepared (confirmation date)
137 3.5.2.7 Deploy Stand-alone Xrootd - Coupled Part 1 05 Aug 2008: xrootd deployment was preempted by the FCC to GCC machine move. - Stephan...
138 3.5.3 INPUT: ICHEP Activity sufficiently ended
139 3.5.4 Phase 2: Configure Major Releases
140 3.5.4.1 Setup dev build machine (fcdfbld4dev)
141 3.5.4.2 "Code Server Preparation, Mounts Cleanup" Tied to October 6 site power outage.
143 3.5.4.3 Switch development to new build scheme
144 3.5.4.4 Configure Major Releases to Build under SL4 and new build scheme
145 3.5.4.5 MILE 23: Major Releases Ready for SL4 and new build scheme
146 3.5.5 Phase 3: Deployment
147 3.5.5.1 Migrate Remaining SL3 Machines to SL4: ILP
148 3.5.5.2 Migrate Remaining SL3 Machines to SL4: Desktops
149 3.5.6 Milestone: SL4 Migration completed
150 3.6 CVS Service Migration
151 3.7 "MILE 24: Code Server, SL4, CVS Migrations completed"
158 4 CDF Grid Infrastructure
159 4.1 INPUT: Low-Level Monitoring (Zabbix) Integration with Operations
160 4.2 INPUT: Issue Tracking (Jira) Integration with Support Process
161 4.3 Condor User Monitoring [Needs FEF-Approved Production Deployment Plan] Migrate CAF monitoring from Python dict to RDB-backed
162 4.3.1 Identify and gain access to hardware for evaluation work
163 4.3.2 Setup Zabbix (distinct from FEF/GuG service) Hans Wenzel (6/4/2008):...
164 4.3.3 Setup Condor Quill++ Hans Wenzel (6/4/2008):...
165 4.3.12 CLOSED: Resolve Condor Quill++/CAF Authentication Problem Federica Moscato (6/3/2008) paraphrased:...
166 4.3.4 CLOSED: Alternative: Recreate some aspects of User Monitoring based on Quill and Zabbix Hans Wenzel (9/18/2008): I am still working on the document. For various reasons it took me much longer than expected: the draft is...
167 4.3.5 CLOSED: MILE 9: Demonstrate User Monitoring based on RDB-backed system
168 4.3.6 CLOSED: Specify Production System
169 4.3.7 CLOSED: Develop/configure Production System
170 4.3.8 CLOSED: Deploy Production System
171 4.3.9 CLOSED: Document Production System
172 4.3.10 CLOSED: Hand-off Production System to Operations
173 4.3.11 CLOSED: MILE 21: User Monitoring migrated to RDB-backed system
174 4.3.13 Document design of User Monitoring based on Quill and Zabbix Hans Wenzel (9/18/2008): I am still working on the document. For various reasons it took me much longer than expected: the draft is...
175 4.3.15 Present design of User Monitoring based on Quill and Zabbix
176 4.4 FNAL KCA Upgrade
177 4.4.1 Plan and prepare for new KCA turn-on "New KCA Turn-on: week of May 15, probably May 16..."
178 4.4.2 Assess impact of KCA upgrade on GroupCAF
179 4.4.3 Simple Test Trial 1 against Test KCA - OSG Stack
180 4.4.4 Simple Test Trial 1 against Test KCA - LCG Stack
181 4.4.5 "Investigate test failures, identify cause(s)"
182 4.4.6 Determine and deploy best remedy for test failures
183 4.4.7 Simple Test Trial 2 against Test KCA - OSG Stack
184 4.4.8 Simple Test Trial 2 against Test KCA - LCG Stack
185 4.4.9 Submission Tests against Test KCA - OSG Stack
186 4.4.10 Submission Tests against Test KCA - LCG Stack
187 4.4.11 Tests for fix to whitespace and other character insertion to DN Donatella Lucchesi (6/4/2008):...
188 4.4.12 MILE 4: KCA Upgrade achieves goals
189 4.4.13 "Plan CDF VOMS work required, discuss with VOMS service providers"
190 4.4.14 VOMS service providers prepare for adaptation
191 4.4.15 Milestone: CDF VOMS service providers ready for adaptation
192 4.4.17 LCGCAF VOMS Entries introduced
193 4.4.18 CNAF VOMS Entries introduced
194 4.4.19 NAMCAF VOMS Entries introduced
195 4.4.20 PACCAF VOMS Entries introduced
196 4.4.21 FermiGrid CAF VOMS Entries introduced
197 4.4.16 EXTERNAL: FNAL Production KCA Upgrade
198 4.4.25 FNAL KCA Service Switchover (overlapped with 1/2 day downtime)
199 4.4.23 MILE 14: FNAL KCA Upgrade adaptations tested and deployed
200 4.4.22 KCA Upgrade: Remove unneeded VOMS entries - Main and synchronized VOMS servers
201 4.4.26 KCA Upgrade: Remove unneeded VOMS entries - FNAL VOMS server
202 4.4.24 Milestone: FNAL KCA Upgrade task closed
203 4.5 Migrate GroupCAF to FermiGrid CAF
204 4.5.1 "Procure and prepare fcdfhead10,11"
205 4.5.2 "Install and configure fcdfhead10,11 as stable FermiGrid head nodes - Unsuccessful" Federica Moscato (6/3/2008):...
206 4.5.3 INPUT: Shadow CAF prepared for CDF use "Steve Timm reported to CAF_DEVELOPERS that the ""sleeper"" CAF is ready for CDF use. While it is initially setup for fewer slots, it has scripts for 10k slots. This hand-off will need some follow-up to ensure mutual understanding of what is ready."
207 4.5.4 CDF Initial Acceptance Test for ShadowCAF: usable at modest scale
208 4.5.5 Organize Head Node Task Force
209 4.5.6 "Review situation with fcdfhead10,11 and CAF code issues; Re-plan"
210 4.5.7 "Re-organize Condor CAF code, related software, and config with fcdfhead12,13" Margaret Votava (7/8/2008): VDT has been packaged in a UPS format so it can be setup/unsetup. I will move to condor next.
211 4.5.8 Create Production Release of CAF Software for Local Use
212 4.5.8.1 CafCondorDev: First tagged reproducible release of CAF software
213 4.5.8.2 "Adapt software, configuration to support multiple collectors"
214 4.5.8.3 Fix the Mailer Bug
215 4.5.8.4 Document CafChecklist http://www-cdf.fnal.gov/htbin/twiki/bin/view/Main/CAFChecklist
216 4.5.8.5 CafCondor Configuration Packaging
217 4.5.8.6 Prepare and Iterate Version Release Requirements Plan
218 4.5.8.7 "CafCondorConfig b2_0: tag and test on int21,22"
219 4.5.8.8 CafCondorConfig b2_0: load and test on head10 (head11 offline)
220 4.5.8.9 MILE 29: CafCondor Beta v2.0 for FNAL local use
221 4.5.8.10 CafCondorConfig b2_0: Compare/test on CNAF
222 4.5.8.11 CafCondorConfig b2_1: Works at FNAL and CNAF
223 4.5.8.19 "Krb5 Python Module Upgraded, Supportable, Deployed"
224 4.5.8.20 MILE 31: CafCondorConfig Package Ready for Long-term Support at All Sites
225 4.5.9 "Resolve Technical Issues preventing fcdfhead10,11 use as FermiGrid head nodes" "Establish a working system at any scale, based on software as much the same as the production cut release as possible."
226 4.5.10 CDF Intermediate Acceptance Test for ShadowCAF: usable at initial 3k scale This task is being done as part of the GlideinWMS initial adaptation work.
227 4.5.11 MILE 2: Shadow CAF ready for use at useful scale "RDK (10/07/2008): The initial goal was to prove the ShadowCAF (a.k.a. sleeper pool) to be operable at the 10k slot level. This scale was simply not achievable with the infrastructure in place. Each schedd could only support so many slots, and setting u..."
228 4.5.12 "Scale Testing of fcdfhead10,11-based FermiGrid CAF system"
229 4.5.12.1 "Scale Testing: Part 1 - Test Framework Done, 5k Slots Stable"
230 4.5.12.2 "Scale Testing: Part 2 - 5k Slots, 90-99% Success Rate"
231 4.5.12.3 "Scale Testing: Part 3 - 5k Slots, No >99% success due to ShadowCAF Limitations"
232 4.5.12.4 "Scale Testing: Part 4 - 3k Slots, Multi-Users, >99.9% Success Rate" RDK: Attempts to create and use 300 fake user certs in the PILOT realm to test multi-user capacity turned into a dead-end. The realm was not working as expected and unlikely to be supported at this level in short-term. Decided to use a number of friend...
233 4.5.12.5 "Scale Testing: Part 5 - Same, with CafCondorConfig b2_0 on head10"
234 4.5.12.6 MILE 26: Establish production FermiGrid CAF head node configuration
236 4.5.14 MILE 16: FermiGrid CAF Head Nodes Upgraded (old nodes in place still) "RDK (9/30/2008): This is interpreted as putting the new head node ""head10"" in production in parallel with the old head nodes. Head11 hardware is still not trusted. Head10 has expanded memory and will host all ""head node"" services. There will then be a pr..."
237 4.5.17 Test Production Jobs on FermiGrid CAF
238 4.5.17.1 Run Calibration Jobs at Small-Scale: Working at All?
239 4.5.17.2 Run Ntuple Jobs at Modest Scale: Working at All?
240 4.5.17.3 Run Calibration Jobs at Modest to Large Scale: 85-90% Success Level
241 4.5.17.4 Run Calibration Jobs at Large Scale: 99% Success Level
242 4.5.17.5 Run Calibration Jobs at Large Scale: > 99.9% Success Level
243 4.5.17.6 Run Production (p19) Jobs at Large Scale: > 99.9% Success Level
244 4.5.17.7 Official Sign-Off for Running Production Jobs on FermiGrid CAF
245 4.5.17.8 MILE 30: Production Jobs Run Robustly on FermiGrid CAF
246 4.5.18 Final Migration Plan (with phased WN and Production User Migration) Review Coupled to glideinWMS preparedness since it is assumed this will be needed for scaling reasons....
247 4.5.19 MILE 11: Final Plan for GroupCAF to FermiGrid CAF Migration This plan will revise the work between this milestone and the next milestone (FermiGrid Scheduling released).
248 4.5.21 INPUT: Production p17 processing completed
249 4.5.22 MILE 18: FermiGrid CAF Ready to Absorb GroupCAF
250 4.5.23 Phase 1 Implicit Migration of WNs from old to new FermiGridCAF Head Node
251 4.5.23.2 Phase 1: Production Processing migrated to FermiGrid CAF
253 4.5.24 Phase 2 Migration of WNs from GroupCAF to FermiGridCAF
254 4.5.24.1 Phase 2: N2 racks of WN updated and migrated to FermiGrid CAF
255 4.5.24.2 Phase 2: Class Y of Production Users migrated to FermiGrid CAF
256 4.5.25 Phase X Migration of WNs from GroupCAF to FermiGridCAF
257 4.5.25.1 Phase X: N3 racks of WN updated and migrated to FermiGrid CAF
258 4.5.25.2 Phase X: Class Z of Production Users migrated to FermiGrid CAF
259 4.5.26 Phase Y Migration of WNs from GroupCAF to FermiGridCAF
260 4.5.26.1 Phase Y: N4 racks of WN updated and migrated to FermiGrid CAF
261 4.5.26.2 Phase Y: Remaining Production Users migrated to FermiGrid CAF
262 4.5.27 Post-Migration FermiGrid CAF Tuning
263 4.5.20 FermiGrid CAF Service Finalization
264 4.5.20.1 Integration of FermiGrid CAF/NAMCAF across OSG1-4
265 4.5.20.2 Configure OSG1-4 to establish FermiGrid CAF/NAMCAF priorities
266 4.5.20.4 FermiGrid CAF: Disaster Recovery Plan for critical nodes
267 4.5.20.5 Production/Calibrations: Update operations procedures (if necessary)
268 4.5.20.6 Long-term Support Agreements
269 4.5.28 Switch off GroupCAF
270 4.5.28.1 "Migrate ""Add New Users"" to new host"
271 4.5.28.2 GroupCAF NFS File Server also an ICAF node?
272 4.5.28.3 Decommission Old GroupCAF Head Nodes
273 4.5.28.4 Decommission old FermiGrid head nodes
274 4.5.29 MILE 22: GroupCAF to FermiGrid CAF Migration completed
275 4.6 Adopt glideinWMS [Loosely Scheduled Plan]
276 4.6.1 INPUT: GlideinWMS proven mature enough for CDF production use
277 4.6.2 Determine Ops responsibilities and support for glideinWMS "Meeting on 6/2/2008: CDF Offline, Initiative, and Grid Service Dept mgmt - discussed responsibilities based on Initiative worklist and note in DocDB written by Keith Chadwick (defines requirements of FermiGrid customer service). Igor Sfiligoi may be avai..."
278 4.6.8 DECISION: Establish responsibilities among glideinWMS and CAFexe
279 4.6.3 Determine Hardware needed to support glideinWMS (estimate 2 nodes/CAF)
280 4.6.4 Milestone: glideinWMS Hardware Decision
281 4.6.21 Initial Adaptation of CAF software to use glideinWMS
282 4.6.22 Scale testing of glideinWMS CAF software
283 4.6.20 Reassessment of GlideinWMS Status and Replanning
285 4.6.9 Initial Rework of Glidecaf Code based on Re-adaptation + CondorCAFConfig v2_0
287 4.6.5 Procure and Prepare Hardware needed to support glideinWMS
288 4.6.6 Disaster Recovery Plan for glideinWMS nodes
289 4.6.7 Specifications and Test Plan document for glideinWMS
290 4.6.10 Establish Test CAF system for glideinWMS testing
291 4.6.10.1 Upgrade to latest glideinWMS code version on gFactory node
292 4.6.10.2 Upgrade and configure VO front-end machine
293 4.6.11 Test Setup and Initial glideinWMS Use
294 4.6.12 "Test Features, Robustness, and Scalability: 12,13"
295 4.6.13 Refinement of Glidecaf Code
296 4.6.14 Tune glideinWMS to CDF needs
297 4.6.15 MILE 10: Ready to deploy glideinWMS to first production system
298 4.6.16 "Install, Configure glideinWMS Software System for Production Use (Decision Point)" "DECISION: Install glideinWMS services on test or prod h/w, and use glideinWMS in production system?"
299 4.6.17 Deploy glideinWMS on Production System (Decision point; downtime req'd) DECISION: Install on NAMCAF or FermiGrid CAF?
300 4.6.18 MILE 17: GlideinWMS Ready for Full Deployment
302 4.7 Modify CAF code: Support multiple distinct schedd hosts [May Not Be Needed]
303 4.7.1 Adjust CAF Software to accommodate multiple schedd hosts
304 4.7.2 Test multiple schedd hosts in realistic environment
305 4.7.3 Deployment of multiple schedd hosts in production (CAF downtime req'd?)
306 4.7.4 MILE 19: Multiple Schedd Hosts in CAF production
346 5 CDF CAF-Grid Instances
347 5.1 TestCAFs Work
349 5.2 Fermigrid CAF Work
350 5.2.1 Fermigrid CAF: Critical Node Upgrades
358 5.3 NAMCAF Work
359 5.3.1 NAMCAF: Critical Node Upgrades
384 5.7 FNAL CAF Team Operations
385 5.7.1 CAF Operations Shift Level of Effort
386 5.7.1.1 Shift 1 - Downtime Recovery
387 5.7.1.2 Shift 2
388 5.7.1.3 Shift 3 - CAF Attack 1
389 5.7.1.4 Shift 4
390 5.7.1.5 Shift 5
391 5.7.1.6 Shift 6 - CAF Attack 2
392 5.7.1.7 Shift 7
393 5.7.1.8 Shift 8
394 5.7.1.9 Shift 9
395 5.7.1.10 Shift 10
396 5.7.1.11 Shift 11
397 5.7.1.12 Shift 12
398 5.7.1.13 Shift 13
399 5.7.1.14 Shift 14
400 5.7.1.15 Shift 15
401 5.7.1.16 Shift 16
402 5.7.1.17 Shift 17
403 5.7.1.18 Shift 18
404 5.7.1.19 Shift 19
405 5.7.1.20 Shift 20
406 5.7.1.21 Shift 21
407 5.7.1.22 Shift 22
408 5.7.1.23 Shift 23
409 5.7.1.24 Shift 24
410 5.7.1.25 Shift 25
411 5.7.1.26 Shift 26
412 5.7.1.27 Shift 27
413 5.7.2 CAF Operations Off-Shift Level of Effort
414 5.7.2.1 Off-shift Ops: April 2008
415 5.7.2.2 Off-shift Ops: Early May 2008
416 5.7.2.3 Off-shift Ops: Late May - June 2008
417 5.7.2.6 Off-shift Ops: July 2008
418 5.7.2.5 Off-shift Ops: August 2008
419 5.7.2.4 Off-shift Ops: September 2008
420 5.7.2.7 Off-Shift Ops: Leader "RDK (6/30/2008) per RS e-mail: KG is assigned to CAF Operations 100% as leader. The off-shift + on-shift level of effort = 100%, whatever the sum may be shown here. I am leaving this off-shift LOE as a single 100% task rather than break up this LOE tas..."
421 5.7.2.8 Off-Shift Ops: Leader "RDK (6/30/2008) per RS e-mail: KG is assigned to CAF Operations 100% as leader. The off-shift + on-shift level of effort = 100%, whatever the sum may be shown here. I am leaving this off-shift LOE as a single 100% task rather than break up this LOE tas..."
422 6 CDF Disk Space
423 6.1 Decommission/Upgrade ICAF Nodes
424 6.1.1 INPUT: ICAF User Account Migration procedure
425 6.1.2 INPUT: ICAF Hardware Replacement Decision
426 6.1.3 ICAF Upgrade: Develop ICAF Hardware Replacement Plan
427 6.1.4 ICAF Upgrade: Identify or specify replacement User Space and Service hardware
428 6.1.5 MOOT: ICAF Upgrade: Simplify ICAF user account management It was discovered while developing the ICAF upgrade plan and hardware specification that FEF had already taken on user account management and was using YP. There may be some modest changes desired (not required immediately) to ICAF infrastructure to ac...
429 6.1.6 ICAF Upgrade: Setup and preparation of User Space and Service hardware
430 6.1.7 Milestone: ICAF ready for user space migration There may still be a user account management script to modify to accommodate the revised approach... but this does not need to hold up the migration.
431 6.1.12 ICAF Upgrade: Identify or specify replacement Backup Space hardware
432 6.1.13 ICAF Upgrade: Selection and Setup of Backup Space's backup application
433 6.1.20 ICAF Upgrade: Rework Adjustments to ICAF Gui and related infrastructure code
434 6.1.14 ICAF Upgrade: Migrate Backup Space -- MOOT "Per Stephan Lammel (6/23/2008): There is no backup content to migrate, so this is moot."
436 6.1.21 REPLAN: ICAF User Migration Process "See: https://fermilab.onjira.com/browse/CDFINFRA-45 for plan, and https://fermilab.onjira.com/browse/CDFCAFS-1107 for Joe's notes related to execution of the plan."
437 6.1.22 ICAF Upgrade: Migrate User Space to new hardware - Announce Plan to Users
438 6.1.8 ICAF Upgrade: Migrate User Space to new hardware - User Group A Migration "RDK/SL (8/6/2008): Current estimate of KLG's time on this task is 25%, in large part due to operations responsibilities and short-term assignments."
439 6.1.9 ICAF Upgrade: Migrate User Space to new hardware - User Group B Migration
440 6.1.16 MILE 12: ICAF Nodes Upgrade deployed
441 6.1.23 ICAF Upgrade: Group A Servers Forced Read-only
442 6.1.24 ICAF Upgrade: Group B Servers Forced Read-only
443 6.1.25 ICAF Upgrade: Decommission Group A Servers
444 6.1.17 ICAF Upgrade: Decommission Group B Servers
445 6.1.18 Milestone: ICAF Nodes Upgrade closed
451 6.4 INPUT: Issue Tracking (Jira) Integration with Support Process
454 7 CDF Data Handling
460 7.3 INPUT: Issue Tracking (Jira) Integration with Support Process
464 7.5 Production dCache File Server Deployments
465 7.5.1 "dCache File Servers: Determine Supported Platform (OS, file system) to use" "6/6/2008: New deployments delayed by conflict: Scientific Linux dropped support for XFS, but dCache only supporting use on XFS file systems. RS states REX/Ops should do the dCache deployment, but on what file system since the OS must be Scientific Linu..."
466 7.5.2 "dCache File Servers: Prepare New File Servers (OS, file system)"
467 7.5.3 dCache File Servers: Install New File Servers (dCache application) Resource: Alex Kulyatsev
468 7.5.4 dCache File Servers: Deployment of Available New File Servers
469 7.5.7 MILE 25: dCache File Servers deployed
470 7.5.10 dCache File Servers: Move Pool Servers from FCC to GCC
471 7.5.5 dCache File Servers: Extraction of Old File Servers
472 7.5.8 dCache File Servers: Retirement of Old File Servers
473 7.5.6 Milestone: dCache File Servers upgrade closed
474 8 Project Management
475 8.1 Project Communications
476 8.1.1 Executive Meetings
477 8.1.1.1 Joint Executive Meeting 1
478 8.1.2 Management Meetings "Weekly, generally on Fridays at 11:30am->..."
479 8.1.3 Coordination Meetings "Overlay on CDF Offline Operations meetings, weekly on Wednesdays at 10am."
480 8.2 Project Planning
481 8.2.1 WBS v1.0
482 8.2.2 Organization Management Plan Embedded in Slides presented at the 4/9 CDF Offline Operations meeting.
483 8.2.3 Communications Management Plan Embedded in Slides presented at the 4/9 CDF Offline Operations meeting.
484 8.2.4 WBS v1.1
485 8.2.5 WBS v1.2
486 8.2.6 Early Schedule v0.9
487 8.2.7 Baseline Early High Priority Tasks Schedule v1.0
488 8.2.8 Resource-Loaded Complete Initiative Schedule v2.0 "This will require several steps in replanning portions of the Initiative. Schedules v1.1, v1.2, and maybe v1.3 will be intermediate schedules that introduce resource-loading and expand to lower priority work done in parallel with higher priority using..."
490 8.2.9 MILE 6: Project Planning completed
491 8.3 Project Administration
492 8.3.1 Planning Phase (Level of Effort = 10%)
493 8.3.2 Transition Phase (Level of Effort = 15%)
494 8.3.3 Execution Phase (Level of Effort = 25%)
496 8.3.4 milestone: Project Execution completed
497 8.4 Project Close-out
498 8.4.1 Project Closure Report: Draft covering Objectives
499 8.4.2 Project Internal Close-out Meeting
500 8.4.3 Project Closure Report: Final
501 8.4.4 "Project Executive Close-out Meeting: CDF, CD"
502 8.4.5 Archive project artifacts
503 8.4.6 milestone: Initiative Closed-out
504 9 END MILESTONE: CDF Offline Initiative completed