March 13, 2009
RHIC/ATLAS Computing Facility 
What's New at the RCF/ACF
Week of
Mon, Mar 9

  • Bluearc firmware upgrade Monday 10AM-12AM 3/16/2009    3/13/2009
    Fri Mar 13 10:59:34 EDT 2009

    This item has been posted to rhic-rcf-l@lists.bnl.gov

    Summary: A firmware upgrade will be done on the Bluearc system this Monday 3/16/2009.

    Duration: 3/16/2009 10AM - 12AM

    Group Responsible: Central Storage.

    Affected Area: Bluearc NFS file systems.

    Expected User Impact: Users may notice two service interruptions, each lasting less than 2 minutes.

    Maintenance Type: Upgrade.

    Submitted By: David Free, dfree@bnl.gov

    Description: Two service interruptions may be noticed while transferring NFS service to partner hardware. One will be before the upgrade to idle the server. The other will be after the upgrade to move the NFS service back to the original server.

  • RCF/US Atlas: HPSS problems resolved    3/12/2009
    Thu Mar 12 15:28:38 EDT 2009

    This item has been posted to rhic-rcf-l@lists.bnl.gov, usatlas-users-l@lists.bnl.gov, usatlas-ddm-l@lists.bnl.gov, usatlas-grid-l@lists.bnl.gov, usatlas-prodsys-l@lists.bnl.gov, atlas-project-adc-operations@cern.ch

    Summary: HPSS is back in full operation

    Duration: Thursday March 12 1:30PM - 2:30PM

    Group Responsible: HPSS

    Affected Area: HPSS services

    Expected User Impact: Full HPSS access has been restored.

    Submitted By: Shigeki Misawa (misawa@bnl.gov)

    Description: HPSS is back in full operation. Problems associated with file retrieval through the HPSS batch mechanism, particularly Atlas and Phenix file retrievals through their respective dCache systems, have been fixed.

  • RCF/US Atlas: Possible HPSS problems    3/12/2009
    Thu Mar 12 13:12:15 EDT 2009

    This item has been posted to rhic-rcf-l@lists.bnl.gov, usatlas-users-l@lists.bnl.gov, usatlas-ddm-l@lists.bnl.gov, usatlas-grid-l@lists.bnl.gov, usatlas-prodsys-l@lists.bnl.gov, atlas-project-adc-operations@cern.ch

    Summary: We are experiencing some problems with HPSS.

    Duration: Thursday March 12 1:25PM EDT - 1:25PM EDT

    Group Responsible: HPSS

    Affected Area: HPSS services

    Expected User Impact: No access to HPSS

    Maintenance Type: Service interruption

    Submitted By: Shigeki Misawa (misawa@bnl.gov)

    Description: We are experiencing problems with HPSS. We will be restarting the system at 1:25PM EDT. All access to HPSS will be interrupted at this time.



  • CERN Disruptive Network Intervention on March 19 affecting Oracle Streams replication to BNL    3/12/2009
    Thu Mar 12 09:31:39 EDT 2009

    This item has been posted to usatlas-users-l@lists.bnl.gov, usatlas-computing-l@lists.bnl.gov

    Summary: Downtime of the Conditions Database streams data replication from CERN to BNL due to a network intervention at CERN.

    Duration:

    03/19/09 5:45 AM (CET) - 03/19/09 8:00 AM (CET)

    Group Responsible: CERN IT

    Affected Area: Oracle Conditions Database streams data replication

    Expected User Impact: The Conditions Database at BNL will remain available, serving the latest replicated data.

    Submitted By: Carlos Fernando Gamboa, cgamboa@bnl.gov

    Description:
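
    As an illustration for readers working on BNL local time, the CET window listed under Duration can be converted to EDT with a few lines of Python. This is a minimal sketch, not part of the original announcement. It assumes fixed offsets for 19 March 2009 (CET = UTC+1, EDT = UTC-4); the helper name cet_to_edt is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets valid for 19 March 2009.
# CERN was on CET (UTC+1); European summer time began on 29 March 2009.
# BNL was on EDT (UTC-4); US daylight time began on 8 March 2009.
CET = timezone(timedelta(hours=1), "CET")
EDT = timezone(timedelta(hours=-4), "EDT")


def cet_to_edt(year, month, day, hour, minute):
    """Convert a CET wall-clock time to the equivalent EDT time."""
    return datetime(year, month, day, hour, minute, tzinfo=CET).astimezone(EDT)


start = cet_to_edt(2009, 3, 19, 5, 45)
end = cet_to_edt(2009, 3, 19, 8, 0)
fmt = "%m/%d/%y %H:%M %Z"
print(start.strftime(fmt), "-", end.strftime(fmt))
# prints: 03/19/09 00:45 EDT - 03/19/09 03:00 EDT
```

    Fixed offsets are used instead of a time zone database so the example has no external dependencies; they would be wrong for dates on which either site is on a different offset.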

  • Major interruption to BNL Tier 1 Facility services on March 24     3/12/2009
    Thu Mar 12 09:17:21 EDT 2009

    This item has been posted to usatlas-users-l@lists.bnl.gov, usatlas-computing-l@lists.bnl.gov, usatlas-ddm-l@lists.bnl.gov, usatlas-grid-l@lists.bnl.gov, usatlas-prodsys-l@lists.bnl.gov, atlas-project-adc-operations@cern.ch

    Summary: Major electrical power and cooling interruption to BNL Tier 1 Facility

    Duration: Tue., March 24 at 8 am - Wed., March 25 at 8 am

    Group Responsible: Linux Farm

    Affected Area: Linux Farm

    Expected User Impact: 60% of Linux Farm cluster will be unavailable to Tier 1 user community. A total of 1636 computing cores will be unavailable, affecting all Condor queues (production, analysis, short, long, etc). Users should expect considerable delays in job execution at the BNL Tier 1 facility. The affected systems will be closed to new jobs on Monday, March 23 at 8 am, ahead of the outage to allow currently running jobs to end gracefully. Affected systems will be powered up and reopened to jobs after the maintenance period ends.

    Maintenance Type: Service Interruption

    Submitted By: Tony Chan, tony@bnl.gov

    Description: A major interruption of electrical power and cooling is scheduled for March 24 and will result in a loss of 60% of the Tier 1 Linux cluster computing capacity. The electrical and cooling outage is required to integrate a new RACF data center into the infrastructure grid at BNL. The new data center is expected to be available to the RACF in July 2009. No other facility services (dCache, AFS, NFS, Condor batch, etc) will be affected during this outage.

  • US Atlas AFS DB servers to be replaced - completed    3/11/2009
    Wed Mar 11 10:53:12 EDT 2009

    This item has been posted to usatlas-users-l@lists.bnl.gov

    Summary: The US Atlas AFS DB servers will be replaced with faster and more reliable hardware.

    Duration:

    Monday March 9, 10:00 EDT - 10:30 EDT - completed
    Tuesday March 10, 14:30 EDT - 15:00 EDT - completed
    Wednesday March 11, 10:00 EDT - 10:30 EDT - completed

    Group Responsible: GCE

    Affected Area: AFS

    Expected User Impact: Should be transparent. Worst case is a short interruption on the order of a few minutes sometime during the maintenance window.

    Maintenance Type: Upgrade

    Submitted By: John McCarthy (mccarthy@bnl.gov)

    Description: The last of the three US Atlas AFS DB servers was replaced this morning. All US Atlas AFS DB servers are now running on the new hardware.

  • RHIC AFS Cell back up    3/10/2009
    Tue Mar 10 17:10:01 EDT 2009

    This item has been posted to rhic-rcf-l@lists.bnl.gov

    Summary: The RHIC AFS cell is back up

    Duration: 3/10/09 14:30 EDT - 16:45 EDT

    Group Responsible: GCE

    Affected Area: RHIC AFS

    Expected User Impact: RHIC AFS was down during the above listed period

    Maintenance Type: Service interruption

    Submitted By: John McCarthy (mccarthy@bnl.gov)

    Description: The RHIC AFS cell went down at 14:25 EDT today. Electrical work being done in the area caused power interruptions/fluctuations to both RHIC AFS file server systems, causing them to go down. Once power to the systems was stabilized they came up and went through a salvaging process (an AFS fsck). All salvaging processes have finished, and both RHIC AFS file server systems have been up and available since 16:45 EDT. AFS is now online. Sorry for the unexpected outage.

  • RHIC AFS CELL DOWN    3/10/2009
    Tue Mar 10 16:03:12 EDT 2009

    This item has been posted to rhic-rcf-l@lists.bnl.gov

    Summary: RHIC AFS CELL DOWN

    Duration: 3/10/09 14:30 -

    Group Responsible: GCE

    Affected Area: AFS

    Expected User Impact: AFS is unavailable in RHIC

    Maintenance Type: Down

    Submitted By: John McCarthy

    Description: RHIC AFS fileservers have crashed. Trying to get them back up. No time of resolution yet.

  • US Atlas AFS DB servers to be replaced - update    3/10/2009
    Tue Mar 10 15:17:12 EDT 2009

    This item has been posted to usatlas-users-l@lists.bnl.gov

    Summary: The US Atlas AFS DB servers will be replaced with faster and more reliable hardware.

    Duration:

    Monday March 9, 10:00 EDT - 10:30 EDT - completed
    Tuesday March 10, 14:30 EDT - 15:00 EDT - completed
    Wednesday March 11, 10:00 EDT - 10:30 EDT

    Group Responsible: GCE

    Affected Area: AFS

    Expected User Impact: Should be transparent. Worst case is a short interruption on the order of a few minutes sometime during the maintenance window.

    Maintenance Type: Upgrade

    Submitted By: John McCarthy (mccarthy@bnl.gov)

    Description: The second of the three US Atlas AFS DB servers was replaced this afternoon. The last server will be replaced tomorrow morning (see the above schedule).

  • US Atlas AFS DB servers to be replaced - update    3/9/2009
    Mon Mar 9 11:24:07 EDT 2009

    This item has been posted to usatlas-users-l@lists.bnl.gov

    Summary: The US Atlas AFS DB servers will be replaced with faster and more reliable hardware.

    Duration:

    Monday March 9, 10:00 EDT - 10:30 EDT - completed
    Tuesday March 10, 14:30 EDT - 15:00 EDT
    Wednesday March 11, 10:00 EDT - 10:30 EDT

    Group Responsible: GCE

    Affected Area: AFS

    Expected User Impact: Should be transparent. Worst case is a short interruption on the order of a few minutes sometime during the maintenance window.

    Maintenance Type: Upgrade

    Submitted By: John McCarthy (mccarthy@bnl.gov)

    Description: The first of the three US Atlas AFS DB servers was replaced this morning. The next one will be replaced tomorrow afternoon (see the above schedule).

Week of
Mon, Mar 2

  • US Atlas AFS DB servers to be replaced    3/6/2009
    Fri Mar 6 17:36:02 EST 2009

    This item has been posted to usatlas-users-l@lists.bnl.gov

    Summary: The US Atlas AFS DB servers will be replaced with faster and more reliable hardware.

    Duration:

    Monday March 9, 10:00 EDT - 10:30 EDT
    Tuesday March 10, 14:30 EDT - 15:00 EDT
    Wednesday March 11, 10:00 EDT - 10:30 EDT

    Group Responsible: GCE

    Affected Area: AFS

    Expected User Impact: Should be transparent. Worst case is a short interruption on the order of a few minutes sometime during the maintenance window.

    Maintenance Type: Upgrade

    Submitted By: John McCarthy (mccarthy@bnl.gov)

    Description: We will be moving the US Atlas DB service to newer, faster, and more reliable hardware. There are three redundant servers - hence the three maintenance windows. Due to the redundancy of the servers no interruption of AFS service is anticipated.

  • Major interruption to Tier 1 Facility services on March 24    3/4/2009
    Wed Mar 4 10:35:16 EST 2009

    This item has been posted to usatlas-users-l@lists.bnl.gov, usatlas-computing-l@lists.bnl.gov, usatlas-ddm-l@lists.bnl.gov, usatlas-grid-l@lists.bnl.gov, usatlas-prodsys-l@lists.bnl.gov, atlas-project-adc-operations@cern.ch

    Summary: Major electrical power and cooling interruption to Tier 1 Facility

    Duration: Tue., March 24 at 8 am - Wed., March 25 at 8 am

    Group Responsible: Linux Farm

    Affected Area: Linux Farm

    Expected User Impact: 60% of Linux Farm cluster will be unavailable to Tier 1 user community

    Maintenance Type: Service Interruption

    Submitted By: Tony Chan, tony@bnl.gov

    Description: A major interruption of electrical power and cooling is scheduled for March 24 and will result in a loss of 60% of the Tier 1 Linux cluster computing capacity. The electrical and cooling outage is required to integrate a new RACF data center into the infrastructure grid at BNL. The new data center is expected to be available to the RACF in July 2009. No other facility services (dCache, AFS, NFS, Condor batch, etc) will be affected during this outage.

  • RCF: Problems at the RCF resolved    3/3/2009
    Tue Mar 3 12:52:35 EST 2009

    This item has been posted to rhic-rcf-l@lists.bnl.gov

    Summary: Problems at the RCF have been resolved

    Duration: Tuesday March 3 12:14PM

    Group Responsible: GCE group

    Affected Area: All services

    Maintenance Type: Facility is back to normal operations.

    Submitted By: Shigeki Misawa (misawa@bnl.gov)

    Description: Problems within the RCF facility have been resolved. The source of the problem was traced to interactions between AFS clients, an AFS server and LDAP service. Problem was resolved by restarting the errant AFS server.

    The exact trigger for this runaway interaction between the AFS clients, AFS server and LDAP has yet to be determined.

  • RCF: Multiple problems at the RCF    3/3/2009
    Tue Mar 3 10:46:23 EST 2009

    This item has been posted to rhic-rcf-l@lists.bnl.gov

    Summary: We are experiencing problems within the RCF facility.

    Duration: Tuesday March 3 6:00AM - Now

    Group Responsible: GCE group

    Affected Area: All services

    Expected User Impact: All services

    Maintenance Type: Severe degradation in services.

    Submitted By: Shigeki Misawa (misawa@bnl.gov)

    Description: We are experiencing problems within the RCF that are affecting most if not all services. We are currently experiencing slowness/heavy load on RHIC AFS and LDAP services. Whether this is a cause or effect is being investigated.

    More information as it becomes available.

  • GUMS service was recovered.     3/2/2009
    Mon Mar 2 18:13:47 EST 2009

    This item has been posted to racf-wlcg-announce-l@lists.bnl.gov, usatlas-computing-l@lists.bnl.gov, usatlas-ddm-l@lists.bnl.gov, usatlas-grid-l@lists.bnl.gov, usatlas-prodsys-l@lists.bnl.gov, atlas-project-adc-operations@cern.ch

    Summary: GUMS service had flaky performance and a period of downtime. It was recovered at 5:30PM.

    Duration: March/02/2009 10:00AM - March/02/2009 5:30PM

    Group Responsible: Grid

    Affected Area: Grid Job Submission and dCache data transfer.

    Experienced User Impact: Flaky job submission and a period of downtime.

    Submitted By: Dantong Yu

    Description:

    The newly installed F5 switch did not work properly with the backend GUMS servers. It left many open TCP connections on the two GUMS servers. GUMS memory consumption reached the hardware limit and caused occasional authentication failures for computing jobs and data transfer requests. We decided to swap out the F5 switch. This caused an hour of downtime because we had to reconfigure the backend database and clean up the stuck TCP connections and authentication requests.

    The service came back around 5:30PM.

    Sorry for the inconvenience. If you continue to experience authentication errors, please submit RT tickets or call the RACF operators.
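
    A build-up of open TCP connections like the one described above can be spotted by counting connections per state. The following is a minimal sketch, not the procedure the Grid group used. It assumes Linux `netstat -tan`-style text as input; the function name count_tcp_states and the sample rows (addresses, ports, states) are invented for illustration, and the announcement does not say which state the stuck connections were in.

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP connections per state in `netstat -tan`-style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows start with "tcp" (or "tcp6"); the state is the last column.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[-1]] += 1
    return states


# Hypothetical sample; addresses and counts are invented for illustration.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address    Foreign Address   State
tcp        0      0 10.0.0.5:8443    10.0.0.9:51234    ESTABLISHED
tcp        0      0 10.0.0.5:8443    10.0.0.9:51240    CLOSE_WAIT
tcp        0      0 10.0.0.5:8443    10.0.0.9:51241    CLOSE_WAIT
tcp        0      0 0.0.0.0:8443     0.0.0.0:*         LISTEN
"""

counts = count_tcp_states(SAMPLE)
print(counts["CLOSE_WAIT"])  # prints: 2
```

    Tracking these counts over time, rather than looking at a single snapshot, is what would show connections accumulating.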



Week of
Mon, Feb 23

  • Star: filesystem maintenance completed    2/26/2009
    Thu Feb 26 12:36:16 EST 2009

    This item has been posted to rhic-rcf-l@lists.bnl.gov

    Summary: Maintenance on Bluearc NFS filesystems is done

    Duration: completed 12:20

  • AFS emergency maintenance completed successfully    2/26/2009
    Thu Feb 26 10:53:20 EST 2009

    This item has been posted to rhic-rcf-l@lists.bnl.gov

    Summary: A failed system disk in one of the RHIC AFS DB servers needs to be replaced

    Duration: Today February 26, 2009 10:30 EST - 11:00 EST

    Group Responsible: GCE

    Affected Area: RHIC AFS

    Expected User Impact: Should be transparent. In the event things don't go as planned some clients may see slow response or hanging performance during the maintenance period.

    Maintenance Type: Hopefully transparent

    Submitted By: John McCarthy (mccarthy@bnl.gov)

    Description: The maintenance was completed as expected. No (or minimal) interruptions should have been observed.

  • Star: Filesystem downtime - 12 noon today 2/26     2/26/2009
    Thu Feb 26 10:42:56 EST 2009

    This item has been posted to rhic-rcf-l@lists.bnl.gov

    Summary: upgrading firmware on disk storage

    Duration: 12 noon, 2/26/09 for about 1/2 hr

    Group Responsible: Central Storage

    Affected Area: These filesystems will be unavailable for 20-30 minutes:

    /star/data25 /star/data26 /star/data31 /star/data36 /star/data46 /star/data47 /star/data48 /star/data18 /star/data19 /star/data20 /star/data21 /star/data40 /star/data41 /star/data43

    Expected User Impact: It has been our experience that all network mounts of these filesystems and most batch jobs survive an outage of this type and duration.

    Maintenance Type: Filesystems unavailable

    Submitted By: Maurice askinazi@bnl.gov

  • AFS emergency maintenance    2/26/2009
    Thu Feb 26 10:04:02 EST 2009

    This item has been posted to rhic-rcf-l@lists.bnl.gov

    Summary: A failed system disk in one of the RHIC AFS DB servers needs to be replaced

    Duration: Today February 26, 2009 10:30 EST - 11:00 EST

    Group Responsible: GCE

    Affected Area: RHIC AFS

    Expected User Impact: Should be transparent. In the event things don't go as planned some clients may see slow response or hanging performance during the maintenance period.

    Maintenance Type: Hopefully transparent

    Submitted By: John McCarthy (mccarthy@bnl.gov)

    Description: A system disk has failed in one of the RHIC AFS DB servers (there are three for redundancy). The server is currently online but the disk needs to be replaced. At a minimum, the system needs to be rebooted to ensure that it can reboot unattended. If RAID failover doesn't proceed as expected there will be a longer outage of the system. Again, clients should fail over to the other DB servers and this work should be transparent.
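
    The failover behavior described above (a client moving on to another of the three redundant DB servers) can be pictured with a generic sketch. This is not how the AFS client is implemented; it only illustrates the idea of trying redundant servers in turn. The function name first_available, the server names, and the probe callback are all hypothetical.

```python
def first_available(servers, probe):
    """Return the first server for which probe(server) succeeds, else None."""
    for server in servers:
        try:
            if probe(server):
                return server
        except OSError:
            continue  # treat connection errors as "server down"
    return None


# Hypothetical server names; "db1" plays the server being rebooted.
servers = ["db1", "db2", "db3"]
chosen = first_available(servers, probe=lambda s: s != "db1")
print(chosen)  # prints: db2
```

    With three servers, the work on one of them stays transparent as long as at least one of the remaining two answers.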

  • Need to power cycle the F5 switch at 10:45AM February/25/2009    2/25/2009
    Wed Feb 25 10:32:48 EST 2009

    This item has been posted to rhic-rcf-l@lists.bnl.gov, usatlas-ddm-l@lists.bnl.gov, usatlas-grid-l@lists.bnl.gov, usatlas-prodsys-l@lists.bnl.gov

    Summary: F5 switch needs to be power cycled in order to add the second power supply

    Duration: February/25/2009 10:45AM - February/25/2009 10:50AM

    Group Responsible: Grid

    Affected Area: HPSS services, etc.

    Expected User Impact: There will be a one-minute glitch for ATLAS LFC and USATLAS Panda servers.

    Maintenance Type: "Transparent"

    Submitted By: Dantong Yu, dtyu@bnl.gov

    Description: The F5 switch will be power cycled after we add the second power supply.

Week of
Mon, Feb 16

  • RHIC AFS cell unavailable    2/19/2009
    Thu Feb 19 17:28:25 EST 2009

    This item has been posted to rhic-rcf-l@lists.bnl.gov

    Summary: The RHIC AFS cell is currently unavailable.

    Duration: 2/19/09 17:09 - 2/19/09 17:30

    Group Responsible: GCE

    Affected Area: AFS

    Expected User Impact: AFS currently down

    Maintenance Type: Downtime

    Submitted By: John McCarthy (mccarthy@bnl.gov)

    Description: Service on one of the RHIC AFS file servers went down. A salvage process is currently running and AFS should be available after the salvage process finishes.

  • FTS upgrade Tomorrow, February/19/2009    2/18/2009
    Wed Feb 18 12:10:13 EST 2009

    This item has been posted to rhic-rcf-l@lists.bnl.gov, usatlas-computing-l@lists.bnl.gov, usatlas-grid-l@lists.bnl.gov, usatlas-prodsys-l@lists.bnl.gov, atlas-project-adc-operations@cern.ch

    Summary: FTS will be relocated to BNL LHCOPN (192.12.15.0/24) and upgraded to FTS 2.1

    Duration: 9:00AM, Thursday morning - 1:00PM, Thursday afternoon

    Group Responsible: Grid/GCE

    Affected Area: USATLAS Data transfer from Tier 2 back to BNL

    Expected User Impact: No data transfer from Tier 2 sites back to BNL during this period

    Maintenance Type: "Downtime"

    Submitted By: Hiro Ito, John Hover, Dantong Yu

    Description: Last week, John Hover sent out an email inquiry about FTS maintenance, and it was agreed to perform the upgrade tomorrow. Details follow:

    Here is a proposed sequence of steps for the FTS switchover at 10:00am Thursday, February 19th. The purpose is to move FTS to RHEL4/gLite 3.1 from RHEL3/gLite 3.0.

    (anytime)
    -1) Config all FTAs (FTM?) on fts02.usatlas.bnl.gov

    (10:00am)
    0) Last moment production DB backup (needed?)
    1) Switch oracle client on fts02 from testdb to proddb
    2) Perform 3.1 -> 3.2 schema upgrade using sqlplus
    3) Start service on fts02. Confirm function.

    (~10:30, all at once)
    4) Switch DNS alias for fts.usatlas.bnl.gov to fts02 from lcg03.
    5) Switch GOCDB FTS entry from lcg03 to fts.usatlas.
    6) Switch site-BDII config from lcg03 to fts.usatlas.

    (several days later)
    7) Upgrade fts01 to RHEL4/gLite 3.1, config as fts02.
    8) Move portion of FTAs from fts02 to fts01.

    Please review the sequence and email with problems, additions, clarifications, re-orderings, etc. Once we have a firm plan we'll prepare the announcement.

    Cheers,

    --john
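
    Step 4 of the sequence above (switching the DNS alias) is the kind of change that can be checked from a client once it has propagated. The following is a minimal sketch of such a check, not part of the original plan. The helper name alias_points_to is hypothetical, and the stand-in resolver and its address are placeholders so the example runs without network access. In real use the default resolver, socket.gethostbyname_ex, would be queried, and whether it reports the canonical name depends on the local resolver configuration.

```python
import socket


def alias_points_to(alias, expected_host, resolver=socket.gethostbyname_ex):
    """Return True if `alias` resolves to the canonical name `expected_host`."""
    # gethostbyname_ex returns (canonical_name, alias_list, address_list).
    canonical, _aliases, _addresses = resolver(alias)
    return canonical.rstrip(".").lower() == expected_host.rstrip(".").lower()


# Stand-in resolver so the example runs offline; the address is a
# documentation placeholder, not a real BNL address.
def fake_resolver(name):
    return ("fts02.usatlas.bnl.gov", [name], ["192.0.2.10"])


switched = alias_points_to(
    "fts.usatlas.bnl.gov", "fts02.usatlas.bnl.gov", fake_resolver
)
print(switched)  # prints: True
```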



  • RCF: Network maintenance completed    2/18/2009
    Wed Feb 18 11:25:42 EST 2009

    This item has been posted to rhic-rcf-l@lists.bnl.gov

    Summary: Migration from the old core RCF switch to the new core RCF switch has been completed

    Duration: February 18 11:00AM EST

    Group Responsible: GCE/Network group

    Affected Area: Internal RCF connectivity

    Expected User Impact: Internal RCF connectivity restored

    Maintenance Type: Service interruption

    Submitted By: Shigeki Misawa (misawa@bnl.gov)

    Description: Core RCF network functionality has been migrated from the old RCF core switch (SW13) to the new RCF core switch.

    RCF staff are checking various internal systems to determine whether any residual cleanup is needed.

  • ATLAS dCache upgrade has ended    2/17/2009
    Tue Feb 17 17:44:59 EST 2009

    This item has been posted to racf-wlcg-announce-l@lists.bnl.gov, usatlas-users-l@lists.bnl.gov, usatlas-computing-l@lists.bnl.gov, usatlas-ddm-l@lists.bnl.gov, usatlas-grid-l@lists.bnl.gov, usatlas-prodsys-l@lists.bnl.gov, atlas-project-adc-operations@cern.ch

    Summary: ATLAS dCache upgrade

    Duration: 17 Feb 2009 08h00 - 17 Feb 2009 17h00

    Group Responsible: Storage

    Affected Area: dCache

    Expected User Impact: No user or system will be able to use BNL's dCache, to read or write data.

    Maintenance Type: Downtime

    Submitted By: Pedro Salgado, psalgado@bnl.gov

    Description: dCache

    * upgrade to dCache 1.9.0-9

    SRM database

    * upgrade to 64bits & 8.3.5

    billing database

    * postgres upgrade to 8.3.5
    * apply partitioning

  • RCF: (Reminder) Major network maintenance on Wed. Feb 18    2/17/2009
    Tue Feb 17 14:17:05 EST 2009

    This item has been posted to rhic-rcf-l@lists.bnl.gov

    Summary: Major network maintenance that is likely to cause significant disruptions within the RCF

    Duration: Wednesday February 18 10:00AM EST to 12:00PM EST (2 hours)

    Group Responsible: GCE/Network Group

    Affected Area: All RCF internal systems

    Expected User Impact: Potential for major disruptions to connectivity to the RCF and disruptions to connectivity between systems within the RCF. More likely, however, is a series of short network outages that, when combined, can be seen as an extended period of network instability.

    Maintenance Type: Changes to the routing structure within the RCF network fabric, potentially causing major disruptions.

    Submitted By: Shigeki Misawa (misawa@bnl.gov)

    Description: The network group will be making the final changes to the RCF internal network to bring the new core switch (SW33) into full production and turn the old core switch (SW13) into an "edge" switch. This will involve moving routing functionality from SW13 to SW33 which is likely to cause sufficient network disruption to affect running applications.

    Prior to the network maintenance, the RCF linux farm will be drained of jobs to decrease network traffic.

  • END: US ATLAS FTS and LFC Oracle Cluster Database maintenance on 02/17/2009    2/17/2009
    Tue Feb 17 12:24:05 EST 2009

    This item has been posted to usatlas-computing-l@lists.bnl.gov, usatlas-ddm-l@lists.bnl.gov, usatlas-grid-l@lists.bnl.gov

    The Oracle CPU January 2009, the CRS BUNDLE2 patch (Patch 7493592) and the upgrade of the Oracle Automatic Storage Management libraries were successfully deployed on the FTS and LFC cluster database. The latest OS kernel patches were deployed as well. No service interruption was observed during this intervention.

    Submitted By: Carlos Fernando Gamboa, cgamboa@bnl.gov

  • RCF Linux Farm closing to Condor jobs    2/17/2009
    Tue Feb 17 11:32:36 EST 2009

    This item has been posted to rhic-rcf-l@lists.bnl.gov, racf-wlcg-announce-l@lists.bnl.gov

    Summary:

    The RCF portion of the Linux Farm will be closed to Condor jobs ahead of the major network infrastructure outage announced last week.

    Duration: 4 pm Feb. 17 (today) - 12 noon Feb. 18 (tomorrow)

    Group Responsible: Linux Farm

    Affected Area: Condor jobs

    Expected User Impact: Current user jobs will continue to run to completion, but no new jobs will be scheduled until the network maintenance work is done.

    Maintenance Type: Downtime

    Submitted By: Tony Chan, tony@bnl.gov

    Description:

    All Condor queues for BRAHMS, PHENIX, PHOBOS, STAR and LSST will be closed at 4 pm today (Feb. 17, 2009) to allow currently running jobs to drain out. Queued jobs will remain in the system and will be scheduled to execute by Condor when the network maintenance work is completed.

  • REMINDER: US ATLAS FTS and LFC Oracle Cluster Database maintenance on 02/17/2009    2/17/2009
    Tue Feb 17 09:36:55 EST 2009

    This item has been posted to usatlas-computing-l@lists.bnl.gov, usatlas-ddm-l@lists.bnl.gov, usatlas-grid-l@lists.bnl.gov

    Summary:

    -Apply Oracle CPU 2009
    -Apply Oracle Real Application Clusters bundle 2 patch
    -Upgrade Oracle Automatic Storage Management libraries
    -Update latest kernel and system software security patches.

    Duration: 02/17/2009 10:00 EST - 02/17/2009 13:00 EST

    Group Responsible: Grid

    Affected Area: US Atlas FTS and LFC Oracle Cluster Database

    Expected User Impact:

    No service disruption during this intervention is expected.

    Maintenance Type: Transparent

    Submitted By: Carlos Fernando Gamboa, cgamboa@bnl.gov

    Description:

    Several patches will be applied to the BNL US ATLAS FTS and LFC Oracle cluster database and to the operating system:

    -Apply Oracle Critical Patch Update January 2009
    -Apply Oracle 10.2.0.4 CRS BUNDLE2 patch (Patch 7493592)
    -Upgrade Oracle Automatic Storage Management libraries
    -Update the kernel and system software with the latest security patches.

    These patches will be deployed on one node of the cluster at a time, so no downtime of the database service is expected.
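
    The one-node-at-a-time approach described above can be sketched as a simple loop. This is an illustrative outline only, not the procedure or tooling used for the Oracle cluster. The function name rolling_patch, the node names, and the apply_patch and is_healthy callbacks are hypothetical.

```python
def rolling_patch(nodes, apply_patch, is_healthy):
    """Patch nodes one at a time, stopping at the first unhealthy node.

    Returns the list of nodes that were patched and verified.
    """
    patched = []
    for node in nodes:
        apply_patch(node)          # only this node is out of service
        if not is_healthy(node):   # verify before touching the next node
            break
        patched.append(node)
    return patched


# Hypothetical node names and trivial callbacks, for illustration only.
log = []
done = rolling_patch(
    ["node1", "node2"],
    apply_patch=lambda n: log.append(n),
    is_healthy=lambda n: True,
)
print(done)  # prints: ['node1', 'node2']
```

    Stopping at the first node that fails its health check leaves the remaining nodes unpatched and still serving, which is what keeps the database service available.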

  • ATLAS dCache scheduled upgrade has started    2/17/2009
    Tue Feb 17 08:00:45 EST 2009

    This item has been posted to racf-wlcg-announce-l@lists.bnl.gov, usatlas-users-l@lists.bnl.gov, usatlas-computing-l@lists.bnl.gov, usatlas-ddm-l@lists.bnl.gov, usatlas-grid-l@lists.bnl.gov, usatlas-prodsys-l@lists.bnl.gov, atlas-project-adc-operations@cern.ch

    Summary: ATLAS dCache upgrade

    Duration: 17 Feb 2009 08h00 - 17 Feb 2009 17h00

    Group Responsible: Storage

    Affected Area: dCache

    Expected User Impact: No user or system will be able to use BNL's dCache, to read or write data.

    Maintenance Type: Downtime

    Submitted By: Pedro Salgado, psalgado@bnl.gov

    Description: dCache

    * upgrade to dCache server 1.9.0-9

    SRM database

    * upgrade to 64bits & 8.3.5

    billing database

    * postgres upgrade to 8.3.5
    * apply partitioning

  • [REMINDER] ATLAS dCache upgrade (20090217)    2/16/2009
    Mon Feb 16 10:56:56 EST 2009

    This item has been posted to racf-wlcg-announce-l@lists.bnl.gov, usatlas-users-l@lists.bnl.gov, usatlas-computing-l@lists.bnl.gov, usatlas-ddm-l@lists.bnl.gov, usatlas-grid-l@lists.bnl.gov, usatlas-prodsys-l@lists.bnl.gov, atlas-project-adc-operations@cern.ch

    Summary: ATLAS dCache upgrade

    Duration: 17 Feb 2009 08h00 - 17 Feb 2009 17h00

    Group Responsible: Storage

    Affected Area: dCache

    Expected User Impact: No user or system will be able to use BNL's dCache, to read or write data.

    Maintenance Type: Downtime

    Submitted By: Pedro Salgado, psalgado@bnl.gov

    Description: dCache

    * upgrade to dCache server 1.9.0-9

    SRM database

    * upgrade to 64bits & 8.3.5

    billing database

    * postgres upgrade to 8.3.5
    * apply partitioning

Last Modified March 13, 2009
RACF Staff