RHIC AFS cell unavailable
2/19/2009
Thu Feb 19 17:28:25 EST 2009
This item has been posted to rhic-rcf-l@lists.bnl.gov
Summary:
The RHIC AFS cell is currently unavailable.
Duration: 2/19/09 17:09 - 2/19/09 17:30
Group Responsible:
GCE
Affected Area:
AFS
Expected User Impact:
AFS currently down
Maintenance Type:
Downtime
Submitted By:
John McCarthy (mccarthy@bnl.gov)
Description:
Service on one of the RHIC AFS file servers
went down. A salvage process is currently
running and AFS should be available after
the salvage process finishes.
FTS upgrade tomorrow, February 19, 2009
2/18/2009
Wed Feb 18 12:10:13 EST 2009
This item has been posted to rhic-rcf-l@lists.bnl.gov, usatlas-computing-l@lists.bnl.gov, usatlas-grid-l@lists.bnl.gov, usatlas-prodsys-l@lists.bnl.gov, atlas-project-adc-operations@cern.ch
Summary:
FTS will be relocated to BNL LHCOPN (192.12.15.0/24) and upgraded to FTS 2.1
Duration:
9:00AM, Thursday morning - 1:00PM, Thursday afternoon
Group Responsible:
Grid/GCE
Affected Area:
USATLAS Data transfer from Tier 2 back to BNL
Expected User Impact:
No data transfer from Tier 2 sites back to BNL during this period
Maintenance Type:
"Downtime"
Submitted By:
Hiro Ito, John Hover, Dantong Yu
Description:
Last week, John Hover sent out an email inquiry about FTS maintenance, and it was agreed to perform the upgrade tomorrow. Details follow:
Here is a proposed sequence of steps for the FTS switchover at 10:00am Thursday, February 19th. The purpose is to move FTS to RHEL4/gLite 3.1 from RHEL3/gLite 3.0.
(anytime)
-1) Config all FTAs (FTM?) on fts02.usatlas.bnl.gov
(10:00am)
0) Last moment production DB backup (needed?)
1) Switch oracle client on fts02 from testdb to proddb
2) Perform 3.1 -> 3.2 schema upgrade using sqlplus
3) Start service on fts02. Confirm function.
(~10:30, all at once)
4) Switch DNS alias for fts.usatlas.bnl.gov to fts02 from lcg03.
5) Switch GOCDB FTS entry from lcg03 to fts.usatlas.
6) Switch site-BDII config from lcg03 to fts.usatlas.
(several days later)
7) Upgrade fts01 to RHEL4/gLite 3.1, config as fts02.
8) Move portion of FTAs from fts02 to fts01.
Please review the sequence and email with problems, additions, clarifications, re-orderings, etc. Once we have a firm plan we'll prepare the announcement.
Cheers,
--john
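Step 4 of the sequence above (the DNS alias switch) can be checked after the fact with a short script. The following is a minimal Python sketch and not part of the original plan: the helper name alias_points_to and its injectable resolve parameter are assumptions made for illustration, and the addresses in the demo table are placeholders from the documentation range 192.0.2.0/24, not real BNL addresses.

```python
import socket
from typing import Callable


def alias_points_to(alias: str, target: str,
                    resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """Return True when `alias` and `target` resolve to the same IPv4 address.

    `resolve` defaults to a real DNS lookup; pass a stand-in for offline checks.
    """
    return resolve(alias) == resolve(target)


# Offline demonstration with a placeholder lookup table (hypothetical addresses).
fake_dns = {
    "fts.usatlas.bnl.gov": "192.0.2.12",
    "fts02.usatlas.bnl.gov": "192.0.2.12",
    "lcg03": "192.0.2.3",
}
switched = alias_points_to("fts.usatlas.bnl.gov", "fts02.usatlas.bnl.gov",
                           resolve=fake_dns.__getitem__)
print("alias switched to fts02:", switched)
```

Calling alias_points_to without the resolve argument performs a live lookup, so the answer reflects whatever the local resolver currently has cached.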
RCF: Network maintenance completed
2/18/2009
Wed Feb 18 11:25:42 EST 2009
This item has been posted to rhic-rcf-l@lists.bnl.gov
Summary: Migration from the old core RCF switch to the new core RCF switch has been completed
Duration:
February 18 11:00AM EST
Group Responsible: GCE/Network group
Affected Area:
Internal RCF connectivity
Expected User Impact: Internal RCF connectivity restored
Maintenance Type: Service interruption
Submitted By:
Shigeki Misawa (misawa@bnl.gov)
Description:
Core RCF network functionality has been migrated from the old RCF core switch (SW13) to the new RCF core switch.
RCF staff is checking various internal systems to determine whether any residual cleanup is needed.
ATLAS dCache upgrade has ended
2/17/2009
Tue Feb 17 17:44:59 EST 2009
This item has been posted to racf-wlcg-announce-l@lists.bnl.gov, usatlas-users-l@lists.bnl.gov, usatlas-computing-l@lists.bnl.gov, usatlas-ddm-l@lists.bnl.gov, usatlas-grid-l@lists.bnl.gov, usatlas-prodsys-l@lists.bnl.gov, atlas-project-adc-operations@cern.ch
Summary:
ATLAS dCache upgrade
Duration:
17 Feb 2009 08h00 - 17 Feb 2009 17h00
Group Responsible:
Storage
Affected Area:
dCache
Expected User Impact:
No user or system will be able to use BNL's dCache, to read or write data.
Maintenance Type:
Downtime
Submitted By:
Pedro Salgado, psalgado@bnl.gov
Description:
dCache
* upgrade to dCache 1.9.0-9
SRM database
* upgrade to 64-bit & 8.3.5
billing database
* postgres upgrade to 8.3.5
* apply partitioning
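The notice does not say how partitioning was applied to the billing database. In PostgreSQL 8.3, partitioning is typically built from table inheritance with CHECK constraints, so one plausible shape is a child table per month. The Python sketch below only generates such a statement; the table name billinginfo, the column name datestamp, and the monthly granularity are assumptions for illustration, not details from this notice.

```python
from datetime import date


def month_partition_ddl(parent: str, column: str, year: int, month: int) -> str:
    """Build a CREATE TABLE statement for one monthly child partition.

    The child inherits from `parent` and carries a CHECK constraint that
    limits `column` to the half-open range [first of month, first of next month).
    """
    start = date(year, month, 1)
    # Roll over to January of the following year after December.
    end = date(year + (1 if month == 12 else 0), month % 12 + 1, 1)
    child = f"{parent}_{start:%Y_%m}"
    return (
        f"CREATE TABLE {child} (\n"
        f"    CHECK ({column} >= DATE '{start}' AND {column} < DATE '{end}')\n"
        f") INHERITS ({parent});"
    )


# Hypothetical parent table and column names.
ddl = month_partition_ddl("billinginfo", "datestamp", 2009, 2)
print(ddl)
```

With inheritance-based partitioning, rows still have to be routed to the right child (for example by a trigger on the parent), which this sketch does not cover.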
RCF: (Reminder) Major network maintenance on Wed. Feb 18
2/17/2009
Tue Feb 17 14:17:05 EST 2009
This item has been posted to rhic-rcf-l@lists.bnl.gov
Summary: Major network maintenance that is likely to cause significant disruptions within the RCF
Duration: Wednesday February 18 10:00AM EST to 12:00PM EST (2 hours)
Group Responsible: GCE/Network Group
Affected Area: All RCF internal systems
Expected User Impact: Potential for major disruptions to connectivity to the RCF and disruptions to connectivity between systems within the RCF. More likely, however, is a series of short network outages that, taken together, will appear as an extended period of network instability.
Maintenance Type: Changes to the routing structure within the RCF network fabric, potentially causing major disruptions.
Submitted By: Shigeki Misawa (misawa@bnl.gov)
Description: The network group will be making the final changes to the RCF internal network to bring the new core switch (SW33) into full production and turn the old core switch (SW13) into an "edge" switch. This will involve moving routing functionality from SW13 to SW33, which is likely to cause sufficient network disruption to affect running applications.
Prior to the network maintenance, the RCF Linux farm will be drained of jobs to decrease network traffic.
END: US ATLAS FTS and LFC Oracle Cluster Database maintenance on 02/17/2009
2/17/2009
Tue Feb 17 12:24:05 EST 2009
This item has been posted to usatlas-computing-l@lists.bnl.gov, usatlas-ddm-l@lists.bnl.gov, usatlas-grid-l@lists.bnl.gov
The Oracle CPU January 2009, the CRS BUNDLE2 patch (Patch 7493592), and the upgrade of the Oracle Automatic Storage Management libraries were successfully deployed on the FTS and LFC cluster database.
The latest OS kernel patches were deployed as well.
No service interruption was observed during this intervention.
Submitted By:
Carlos Fernando Gamboa, cgamboa@bnl.gov
RCF Linux Farm closing to Condor jobs
2/17/2009
Tue Feb 17 11:32:36 EST 2009
This item has been posted to rhic-rcf-l@lists.bnl.gov, racf-wlcg-announce-l@lists.bnl.gov
Summary:
The RCF portion of the Linux Farm will be
closed to Condor jobs ahead of the major network
infrastructure outage announced last week.
Duration:
4 pm Feb. 17 (today) - 12 noon Feb. 18 (tomorrow)
Group Responsible:
Linux Farm
Affected Area:
Condor jobs
Expected User Impact:
Current user jobs will continue to run to
completion, but no new jobs will be scheduled
until the network maintenance work is done.
Maintenance Type:
Downtime
Submitted By:
Tony Chan, tony@bnl.gov
Description:
All Condor queues for BRAHMS, PHENIX, PHOBOS,
STAR and LSST will be closed at 4 pm today
(Feb. 17, 2009) to allow currently running
jobs to drain out. Queued jobs will remain
in the system and will be scheduled to execute
by Condor when the network maintenance work
is completed.
REMINDER: US ATLAS FTS and LFC Oracle Cluster Database maintenance on 02/17/2009
2/17/2009
Tue Feb 17 09:36:55 EST 2009
This item has been posted to usatlas-computing-l@lists.bnl.gov, usatlas-ddm-l@lists.bnl.gov, usatlas-grid-l@lists.bnl.gov
Summary:
-Apply Oracle CPU 2009
-Apply Oracle Real Application Clusters bundle 2 patch
-Upgrade Oracle Automatic Storage Management libraries
-Update latest kernel and system software security patches.
Duration: 02/17/2009 10:00 EST - 02/17/2009 13:00 EST
Group Responsible: Grid
Affected Area: US ATLAS FTS and LFC Oracle Cluster Database
Expected User Impact:
No service disruption during this intervention is expected.
Maintenance Type: Transparent
Submitted By: Carlos Fernando Gamboa, cgamboa@bnl.gov
Description:
Several patches will be applied to the BNL US ATLAS FTS and LFC Oracle cluster database and to the operating system:
-Apply Oracle Critical Patch Update January 2009
-Apply Oracle 10.2.0.4 CRS BUNDLE2 patch (Patch 7493592)
-Upgrade Oracle Automatic Storage Management libraries
-Update the kernel and system software with the latest security patches.
These patches will be deployed on one node of the cluster at a time, so no downtime of the database service is expected.
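The node-by-node approach described above can be pictured as a simple loop that patches one node, confirms it is healthy, and only then moves on. The following Python sketch is illustrative only: rolling_apply, the patch and is_healthy callables, and the node names are hypothetical stand-ins, not the actual procedure or tooling used for this intervention.

```python
from typing import Callable, Iterable, List


def rolling_apply(nodes: Iterable[str],
                  patch: Callable[[str], None],
                  is_healthy: Callable[[str], bool]) -> List[str]:
    """Patch nodes strictly one at a time, checking each before moving on.

    Raises RuntimeError at the first node that fails its health check, so
    the remaining nodes are left unpatched and can keep serving.
    """
    patched: List[str] = []
    for node in nodes:
        patch(node)
        if not is_healthy(node):
            raise RuntimeError(f"{node} unhealthy after patching; stopping rollout")
        patched.append(node)
    return patched


# Demonstration with stand-in callables and hypothetical node names.
log: List[str] = []
result = rolling_apply(["dbnode1", "dbnode2"],
                       patch=lambda n: log.append(f"patched {n}"),
                       is_healthy=lambda n: True)
print(result)
```

Stopping at the first unhealthy node is a design choice in this sketch: it keeps at least one untouched node in service while the failure is investigated.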
ATLAS dCache scheduled upgrade has started
2/17/2009
Tue Feb 17 08:00:45 EST 2009
This item has been posted to racf-wlcg-announce-l@lists.bnl.gov, usatlas-users-l@lists.bnl.gov, usatlas-computing-l@lists.bnl.gov, usatlas-ddm-l@lists.bnl.gov, usatlas-grid-l@lists.bnl.gov, usatlas-prodsys-l@lists.bnl.gov, atlas-project-adc-operations@cern.ch
Summary:
ATLAS dCache upgrade
Duration:
17 Feb 2009 08h00 - 17 Feb 2009 17h00
Group Responsible:
Storage
Affected Area:
dCache
Expected User Impact:
No user or system will be able to use BNL's dCache, to read or write data.
Maintenance Type:
Downtime
Submitted By:
Pedro Salgado, psalgado@bnl.gov
Description:
dCache
* upgrade to dCache server 1.9.0-9
SRM database
* upgrade to 64-bit & 8.3.5
billing database
* postgres upgrade to 8.3.5
* apply partitioning
[REMINDER] ATLAS dCache upgrade (20090217)
2/16/2009
Mon Feb 16 10:56:56 EST 2009
This item has been posted to racf-wlcg-announce-l@lists.bnl.gov, usatlas-users-l@lists.bnl.gov, usatlas-computing-l@lists.bnl.gov, usatlas-ddm-l@lists.bnl.gov, usatlas-grid-l@lists.bnl.gov, usatlas-prodsys-l@lists.bnl.gov, atlas-project-adc-operations@cern.ch
Summary:
ATLAS dCache upgrade
Duration:
17 Feb 2009 08h00 - 17 Feb 2009 17h00
Group Responsible:
Storage
Affected Area:
dCache
Expected User Impact:
No user or system will be able to use BNL's dCache, to read or write data.
Maintenance Type:
Downtime
Submitted By:
Pedro Salgado, psalgado@bnl.gov
Description:
dCache
* upgrade to dCache server 1.9.0-9
SRM database
* upgrade to 64-bit & 8.3.5
billing database
* postgres upgrade to 8.3.5
* apply partitioning