09FEB2009

nomad1 is in a state where all the data areas are showing read_only. I have sent a message to the helpdesk.

-------- Original Message --------
Subject: [EMC #10515]: nomad1 problem
Date: Mon, 09 Feb 2009 11:54:32 -0500
From: EMC Helpdesk
Reply-To: EMC Helpdesk
To: Jordan.Alpert@noaa.gov
References: <200902091453.n19Er0wk030565@mailrt1.ncep.noaa.gov>

Jordan,

The disk arrays ran into issues because of an overflow of I/O to the RAIDs which caused the controller to shut them off. I have rebooted the system and they are now back online. At this time it is functioning normally; however, if this issue occurs again, it will require firmware updates which we will coordinate at that time.

-Kyle

Jordan

---------
18DEC2008

The testing for the change (see 05DEC 2008) had system/ops taking the development supercomputer for their work on Dec 17. As the message below states, the system has returned, and we have restarted cron. "Transfer" jobs refers to the development mirror from which the experimental NOMADS servers nomad[1,3,5] get their data. NOMADS jobs need to be started manually, which Jun and I have already done. Even though the system came back early this morning, it takes a day to get the mirror replenished, so all should use the new high availability server, 24/7, at http://nomads.ncep.noaa.gov

Below is the message from the supercomputer system.

-------- Original Message --------
Subject: [NCEP.List.SP-Announce] SP-ANNOUNCE LIST NOTIFICATION - Planned Dew maintenance work
Date: Thu, 18 Dec 2008 05:50:59 -0500
From: SDM
To: _NCEP.List SP-Announce

All,

MIST is now available for developers. All upgrading and testing has been done, all transfer jobs have been restarted.

SDM Mike Wooldridge

Correction: Mist is expected to be returned to development approximately 07:30 18 Dec 08.
-Don

-----Original Message-----
From: ncep.list.sp-announce-bounces@lstsrv.ncep.noaa.gov [mailto:ncep.list.sp-announce-bounces@lstsrv.ncep.noaa.gov] On Behalf Of Don Avart (sysadmin)
Sent: Wednesday, December 17, 2008 5:54 AM
To: ncep.list.sp-announce@noaa.gov
Subject: [NCEP.List.SP-Announce] 17 Dec 08: Mist Maintenance Wed. Dec 17 2008: 24 hours

Beginning 07:30 on Wednesday 17 Dec 08 Mist will be unavailable for system maintenance. At 06:30 all development LoadLeveler queues will be drained. At 07:30 any remaining jobs will be cancelled and all users will be logged off. Once system maintenance has been completed Mist will be turned over to production for parallel operations and testing.

Mist is expected to be returned to development approximately 07:30 17 Dec 08.

_______________________________________________
NCEP.List.SP-Announce mailing list
NCEP.List.SP-Announce@lstsrv.ncep.noaa.gov
https://lstsrv.ncep.noaa.gov/mailman/listinfo/ncep.list.sp-announce

----------
05DEC 2008

{This means that data flow may begin on nomad[1,3,5]. Note also that there will be another data flow interruption when Dew and Mist are interchanged later this month -- not yet announced.}

Subject: [NCEP.List.SP-Announce] 5 Dec 08: Dew testing complete. Cron Available
Date: Fri, 05 Dec 2008 05:11:15 -0500
From: Don Avart (sysadmin)
To: NCEP.List.SP-Announce@noaa.gov

Production testing of Dew is complete. Users are now free to restore crontabs.

----------
01DEC 2008

Beginning 07:30 on Wednesday 3 Dec 08 Dew will be unavailable for system maintenance. At 06:30 all development LoadLeveler queues will be drained. At 07:30 any remaining jobs will be cancelled and all users will be logged off. Once system maintenance has been completed Dew will be turned over to production for parallel operations and testing.
Dew is expected to be returned to development by 07:30 4 Dec 08.

{This means that the data flow to nomad[1,3,5] may be interrupted or late but the flow to http://nomads.ncep.noaa.gov should be OK.}

----------
25NOV 2008

I hope everybody is aware of the switch of Operations to Dew on 3 December as part of the quarterly OS upgrade. Production will be on Dew for 2 weeks, so please check to be sure you are ready for the switch.

{This means that the data flow to nomad[1,3,5] may be interrupted or late but the flow to http://nomads.ncep.noaa.gov should be OK.}

----------
21NOV 2008

The tenure of nomads6.ncdc.noaa.gov as a backup server will end beginning on December 1!

I have been informed by the NCDC group that, due to security limits on the nomads6 server, the backup server will be turned off shortly. It will be unavailable for a time (most of December) but will return with http and ftp service only -- at least at first. Some NOMADS applications and GDS might be returned at some point but it will no longer be the backup server. By the end of this year, NOMADS real time model files will be on the high availability server at the WOC so we should not need such a backup. I encourage all to use that server, http://nomads.ncep.noaa.gov; the development servers are also in operation.

Other applications on the backup server in the last week have made it impossible to update the GDS server as it is too busy, so GDS is unlikely to return at all. Many of the problems you have encountered this week have been due to competition (high load average) from other applications on the server in anticipation of the renovation. In the future, the server will continue to run ftp and http services. For now, pdisp, ftp2u, http and GDS are operating, but updating of new files will be erratic, and will continue that way through the rest of this month. A copy of the reanalysis and other data sets should remain and still be available when the system is returned to operation in 2009.
----------
12NOV 2008

Update #3

All,

Dew has returned to service and developers can use DEW. Transfer jobs are currently running and may take some time to catch up (overnight) with all model products.

SDM Grant Newby

----------
12NOV 2008

This just in from action director GWCB:

1600Z:

Hi Folks,

Power was lost to Dew this morning. IBM will have to fsck the file system once power is restored. I would not expect Dew to become available until very late today or tomorrow morning. DevonProd will also be unavailable until /com is synced between Mist & Dew.

John

Another outage for the supercomputer....

-------- Original Message --------
Subject: [NCEP.List.SP-Announce] DEW down due to power problems
Date: Wed, 12 Nov 2008 09:28:21 -0500
From: SDM
To: _NCEP.List SP-Announce

All,

The Dew supercomputer is currently down due to power problems at the Fairmont Facility. It is currently unknown how long Dew will be down. Updates will be provided as more info becomes available.

SDM - Joey

----------
08NOV 2008

Sorry I did not get this out on time ..... (*j*)

-------- Original Message --------
Subject: [NCEP.List.SP-Announce] DEW is now available to developers
Date: Thu, 06 Nov 2008 17:31:33 -0500
From: SDM
To: _NCEP.List SP-Announce

All,

Dew is now available to developers. Please note that over 24 hours of production data needs to be mirrored over from Mist to Dew...this will take a while. Therefore a full current set of production data in /com on Dew will not be available until tomorrow sometime.

-------- Original Message --------
Subject: [NCEP.List.SP-Announce] Dew GPFS problems - update #5
Date: Wed, 27 Aug 2008 14:13:28 -0400
From: SDM
To: _NCEP.List SP-Announce

All,

Dew continues to be unavailable to development..production baseline testing will begin shortly. We expect testing and data syncing will take most of the rest of the day to accomplish, so we expect DEW will be available to developers no earlier than 12z tomorrow morning. Sorry for any inconvenience this may cause.
SDM - Mark Shirey/Grant Newby

IBM has no estimate of when Dew will be back in service. It is highly likely that this could possibly be an extended outage. Also, because NCEP is in a critical weather day, all developers will be taken off of Mist in order to minimize risk to Mist. Sorry for any inconvenience this may cause.

02OCT 2008

We will be shutting down Nomad1 on Monday, October 6th, at 15Z to set up additional storage. It should be down for a few hours. The GFS and NAM will be on the http://nomads6.ncdc.noaa.gov/ncep_data backup. All other data sets should be available during this period.

----------
08AUG 2008

The following means no data flow for 8/26-27 for nomad[1,3,5] -- backups at nomads6.ncdc.noaa.gov/ncep_data or nomads.ncdc.noaa.gov and ftpprd.

-------- Original Message --------
Subject: [NCEP.List.SP-Announce] Dew GPFS problems - update #4
Date: Wed, 27 Aug 2008 09:38:08 -0400
From: SDM
To: _NCEP.List SP-Announce

All,

GPFS is still not available on Dew...the file system check continues to run on Dew...it is believed that the fsck is running successfully...once the fsck is done and analyzed a more firm time will be able to be provided as to when Dew will be available again. Sorry for any inconvenience this may cause.

SDM - Mark Shirey

-------- Original Message --------
Subject: [NCEP.List.SP-Announce] Dew GPFS problems - update #3
Date: Tue, 26 Aug 2008 21:56:28 -0400
From: SDM
To: _NCEP.List SP-Announce

All,

IBM has no estimate of when Dew will be back in service. It is highly likely that this could possibly be an extended outage. Also, because NCEP is in a critical weather day, all developers will be taken off of Mist in order to minimize risk to Mist. Sorry for any inconvenience this may cause.

SDM - Joe Carr

-----------
18 AUG 2008

Attention NCDC-NOMADS users,

The NCDC NOMADS servers will soon undergo a reconfiguration that will change the way users access data.
These changes will simplify and stabilize the Uniform Resource Locators (URLs) used across the NOMADS systems; and most importantly will remove the need for specific port numbers to access data. This way future NOMADS systems changes will be transparent to users. Users will need to modify any stored URLs they have for accessing the NCDC NOMADS suite of servers which contain specific references to port numbers. (Note: these changes will have no impact on the NCEP suite of NOMADS servers.)

A transition period will be used to allow users to modify their access scripts. From the period Tuesday, August 19th, 2008 to September 01, 2008, the existing access points will remain in parallel with the new configuration, which is currently in place. On September 02, 2008 all URLs that contain port numbers will be discontinued. We urge users now to change their bookmarks, OPeNDAP applications, URL references in upcoming publications, or access scripts of any kind to remove all port numbers from their links and substitute the following:

Service                     Current URL                                            New URL
Ensemble Probability Tool   http://nomads.ncdc.noaa.gov:9091/EnsProb/              http://nomads.ncdc.noaa.gov/EnsProb/
GrADS Data Server (GDS)     http://nomads.ncdc.noaa.gov:9090/dods/                 http://nomads.ncdc.noaa.gov/dods/
                            http://nomads.ncdc.noaa.gov:9091/dods/
Live Access Server (LAS)    http://nomads.ncdc.noaa.gov:8085/las/servlets/dataset  http://nomads.ncdc.noaa.gov/las/servlets/dataset
SRRS / NCEP Charts          http://nomads.ncdc.noaa.gov:9091/ncep/NCEP             http://nomads.ncdc.noaa.gov/ncep/NCEP
Thredds Data Server (TDS)   http://nomads.ncdc.noaa.gov:8085/thredds/              http://nomads.ncdc.noaa.gov/thredds/

-----------
07 JUL 2008

The message below means there could be data delays on Wednesday, 7/9 for nomad[1,3,5], and a week later when the production switch is reversed.
The backup http://nomads6.ncdc.noaa.gov/ncep_data and http://nomads.ncdc.noaa.gov are on a separate data flow and should not be affected (*j*)

-------- Original Message --------
Subject: [NCEP.List.SP-Announce] 9 Jul 08: Production Switch to Dew
Date: Mon, 07 Jul 2008 08:40:44 -0400
From: Don Avart (sysadmin)
To: NCEP.List.SP-Announce@noaa.gov

Production will be switched from Mist to Dew beginning 07:30 local Wednesday 9 July. All non-production classes on Mist will be drained at 06:30. At 07:30 all development users will be logged off of Dew, their LoadLeveler jobs will be cancelled, and their crontabs will be moved out of /var/spool/cron/crontabs and placed into their home directories. Once production has switched to Dew, all non-production classes will be resumed.

-----------
01 JUL 2008

This means that the data flow on nomad[1,3,5] could be delayed.

-------- Original Message --------
Subject: [NCEP.List.SP-Announce] 2 Jul 08: Dew Maintenance 2 Jul 08: 24 hours
Date: Tue, 01 Jul 2008 08:56:41 -0400
From: Don Avart (sysadmin)
To: NCEP.List.SP-Announce@noaa.gov

Beginning 07:30 on Wednesday 2 Jul 08 Dew will be unavailable for system maintenance. At 06:30 all development LoadLeveler queues will be drained. At 07:30 any remaining jobs will be cancelled and all users will be logged off. Once system maintenance has been completed Dew will be turned over to production for parallel operations and testing.

Dew is expected to be returned to development by 07:30 3 Jul 08.

-----------
10 JUN 2008

It appears that the cache on Nomad1 was overloaded and as a result cut off communications to the storage array. I have cleared the cache, updated the kernel in order to prevent a similar situation from occurring, performed disk checks and rebooted the system, and it appears to be in proper working order now. The system had been online for 145 days, which may have contributed to the issue occurring.

-Kyle

-------- Original Message --------
Subject: [EMC #8905]: nomad1 problem?
Date: Tue, 10 Jun 2008 09:26:52 -0400
From: EMC Helpdesk
Reply-To: EMC Helpdesk
To: Jordan.Alpert@noaa.gov
References: <200806101217.m5ACH4kx028481@mailrt1.ncep.noaa.gov>

Jordan,

For some reason, Nomad1 is not recognizing the storage array attached to nomad1. We have checked the connections and the hardware indicates everything is fine. I am going to unmount the drives in a short while and run some disk checks. This will require a reboot of the system. I will perform the unmounts at 9:45 a.m. this morning and the reboot shortly thereafter.

-Kyle

-----------
23 MAY 2008

This means that NOMADS may not have data flow for all or part of this weekend...

All,

What: Production will switch from Mist to Dew.
When: 2345Z (7:45 PM EDT) Fri May 23.
Why: Due to planned power maintenance on the IBM campus in Gaithersburg, Mist will be placed on back up generator at 10 PM Fri May 23 and remain on generator through Sat May 24 at midday. It is anticipated that Mist will remain up and viable through the period. A Critical Weather Day remains in place through 12Z (8 AM EDT) Sat morning. Due to the above factors, production will be switched to Dew.
Developer Impact: Developers will be switched from Dew to Mist beginning at 7:45 PM Fri May 23 and remain on Mist through 7:45 AM Tue May 27.
It is anticipated that production will switch back to Mist Tue morning at 7:45 AM.
Duration: 84 hours

SDM - Joe Carr
Senior Duty Meteorologist
NCEP Central Operations
Production Management Branch

-------- Original Message --------
Subject: [Fwd: warning: Production may switchover to dew this weekend.]
Date: Fri, 23 May 2008 14:48:01 -0400
From: Tammy Braun
Organization: NOAA
To: _NCEP All EMC

FROM GEOFF DIMEGO:

It looks like there may be a switchover between mist and dew this weekend. Eric saw a message while logging in and he confirmed it with Doris Pan. IBM is doing work on the power system in Gaithersburg. We knew this because they are taking haze & hpss down. Apparently, they are worried the power work will affect mist and want to (AT THE LAST MINUTE) move production to dew. I've complained to Don Avart ... Since we are in Critical Weather Day, they won't be able to do the change until it is lifted - maybe Saturday! I can't change this. I am powerless.

If you have critical jobs or crons that have to be switched by hand when there is a switchover, you might want to look in on the machine situation this weekend.

----------
08 MAY 2008

-------- Original Message --------
Subject: [NCEP.List.SP-Announce] 8 May 08: Production Bufr Lib Test on Dew 18:00 - 03:00
Date: Wed, 07 May 2008 21:20:30 -0400
From: Don Avart (sysadmin)
To: NCEP.List.SP-Announce@noaa.gov

Beginning 18:00 local 8 May 2008 until 03:00 9 May production will be conducting a parallel bufr lib test. During this time Dew will be inaccessible to development users. Beginning at 17:00 local all LoadLeveler classes will be drained. At 18:00 any remaining jobs will be cancelled. Once maintenance has been completed and all systems testing and validation have completed all LoadLeveler classes will be resumed.

-----------
07 MAY 2008

The following means that there may be some disruption in the data flow for nomad1, 3, 5 on 9MAY2008:

-------- Original Message --------
Subject: [NCEP.List.SP-Announce] SP-ANNOUNCE LIST NOTIFICATION - Production switch to Dew from Mist
Date: Wed, 07 May 2008 06:49:23 -0400
From: SDM
To: _NCEP.List SP-Announce

All,

What: Production will switch from Mist to Dew.
When: 1045Z (645 AM) Wed May 7 through 13Z (9 AM) Fri May 9.
Why: NOAA COOP Exercise.
Developer Impact: Developers will switch from Dew to Mist for the duration of the period.
Duration: ~50 hours

NOTE: More information is forthcoming on the scheduled bufr library test on Dew which was scheduled from 6 PM Wed May 7 through 3 AM Thu May 8.

SDM - Joe Carr

-----------
28 APR 2008

A (premature) switch back to dew development after emergency:

-------- Original Message --------
Subject: [NCEP.List.SP-Announce] SP-ANNOUNCE LIST NOTIFICATION - Availability of Dew
Date: Mon, 28 Apr 2008 14:23:17 -0400
From: SDM
To: _NCEP.List SP-Announce

All,

What: The mirroring of production data from Mist to Dew continues.
When: The mirroring process is expected to last until about 29/0000Z.
Why: The process is required as a result of an emergency switch to Mist Sunday morning April 27, and the power down of Dew at that time.
Developer Impact: Developers are not expected to have complete access to all data until the mirroring process is complete.

SDM - Bill Kneas

-----------
27 APR 2008

The following from Central Operations indicates that there will be no data flow for nomad1, 3, 5.
Use nomads6.ncdc.noaa.gov and nomads.ncdc.noaa.gov

From: SDM
Sent: Sunday, April 27, 2008 12:29 pm
To: "_NCEP.List SP-Announce"
Subject: [NCEP.List.SP-Announce] SP-ANNOUNCE LIST NOTIFICATION - Dew Out of Service

All,

What: Due to a power problem at the Fairmont Site the Dew computer has been powered down.
When: Power loss was approximately 1130Z (7:30 A.M) Sunday April 27, 2008
Why: Dew was shut down. The power interruption caused a loss of cooling to the facility.
Developer Impact: Developers will not have access to Dew until further notice.
Duration: Unknown

SDM - Bill Kneas

-----------
09 APR 2008

The message below implies that on 20080409 nomad1, 3, 5 will not receive data. We switched development and production machines last week (sorry I did not announce this), and switching back will cause no data access for a day.

-------- Original Message --------
Subject: [NCEP.List.SP-Announce] SP-ANNOUNCE LIST NOTIFICATION - Planned Mist maintenance work today
Date: Wed, 09 Apr 2008 07:19:45 -0400
From: SDM
To: _NCEP.List SP-Announce

All,

What: Mist maintenance work has begun. IBM began draining the Mist nodes at 06:30 AM and will take the entire system by 08:00 AM.
When: 06:30 AM Wed Apr 9 through approximately 08:00 AM Thu Apr 10.
Why: Quarterly maintenance on Mist
Developer Impact: Developers will not have access to Mist during the maintenance window.
Duration: ~24 hours

----------
25 MAR 2008

nomad3 is returned to service with a grib2 feature for ftp2u called g2sub which we are testing on GFS output. The grib1 holdings are still present as before.

Also:

-------- Original Message --------
Subject: [EMC #8090]: stale nfs handle
Date: Tue, 25 Mar 2008 12:02:54 -0400
From: EMC Helpdesk
Reply-To: EMC Helpdesk
To: Jordan.Alpert@noaa.gov
References:

Jordan,

Nomad5 has been updated and rebooted.

-Kyle

-----------
21 MAR 2008

nomad5 will be rebooted (after over 285 days of running) on Tuesday, March 25, 2008 at 1300Z.
We did not boot it last Dec because nomad3 was rebooted and did not come back and we had to deal with it. (*j*)

-----------
18 MAR 2008

All,

(The following means that nomad1,3,5 data sets will be delayed/missing 3/20-21. Use nomads6.ncdc.noaa.gov/ncep_data or nomads.ncdc.noaa.gov.)

Dew maintenance work scheduled for Wed Mar 19 is delayed by one day. Dew will not be available to developers during the maintenance window.

When: 10Z (6 AM) Thu Mar 20 through 13Z (9 AM) Fri Mar 21.
Why: Critical Weather Day was declared through 12Z (8 AM) Thu Mar 20.
Developer Impact: Developers will not have access to Dew from 10Z (6 AM) Thu Mar 20 through 13Z (9 AM) Fri Mar 21.
Duration: 27 hours

SDM - Joe Carr
Senior Duty Meteorologist
NCEP Central Operations
Production Management Branch

-----------
29 JAN 2008

After security/firewall problems are worked out (any day now) a new nomads server is coming up at address: http://nomad1.ncep.noaa.gov

nomad1 contains its own independent copy of the NCEP reanalysis (unlike nomad5 which pointed to nomad3). 0.5 degree GFS and SREF are already present and operating with more data sets to follow. Tests have been completed with these datasets, and we will work to get the rest of the datasets on nomad1 as well as resolving security/firewall problems so outside users can use the server. The server should be accessible soon as it is in the hands of sys admin security.

We hope nomad3 will return to service but we do not know what is keeping it from restructuring to raid5 with new drives. The nomad3 server, which holds 2/3 of NOMADS real time data, has not been working since Dec 24 2007. nomad3 has been "broken" since xmas when the power was found off and a subsequent restart showed a bad drive. New disks were placed in the raid5 but the raid would not restructure the disk, meaning that the system was no longer a "raid(5)" and the next disk that was lost would cause all the system and data to be lost.
We have saved off the code/data and the sys admins are working to report a new system. nomad5 continues to hold most of the data but reanalysis and some other data sets are not present. It has been running as a lone server since Dec 24.

NCO (last June) decreased the bandwidth of all NOMADS servers because of the possibility of NOMADS interfering with operations whenever an IBM-SP swap of prod and dev needs to be done. Even though an IBM-SP swap does not happen often, NCO felt that the increased all around usage of the network required that NOMADS bandwidth remain throttled. This may contribute to users having problems downloading data that is present. Some data like SREF is not on the nomads6 backup at NCDC since bandwidth has been decreased.

nomad3 continues to be worked on by EMC sys admin. Efforts to make NOMADS operational and move applications to the WOC/ftpprd with 24/7 service and improved reliability and bandwidth continue, and implementation is on schedule for the end of this summer.

-----------
11 JAN 2008

-------- Original Message --------
Subject: [EMC #7042]: nomad3 not responding
Date: Thu, 10 Jan 2008 23:44:54 -0500
From: EMC Helpdesk
Reply-To: EMC Helpdesk
To: Jordan.Alpert@noaa.gov, Kyle.Nevins@noaa.gov
References:

Jordan,

Just an update, after talking to Yinka up in NCO, he agreed that one of the disks should be replaced. The replacement disk that I put into Nomad3 was a used disk that was labeled as a replacement. He and I are going to wait until tomorrow to see if the disks arrive, if they do not, then we are going to recompile the driver and reinstall it.

-Kyle

-------------------------
07 JAN 2008

Sorry. On Friday PM nomad3 would not answer or allow a login. A message was sent to emc.helpdesk@cerberus.ncep.noaa.gov.

--------------------------
04 JAN 2008 16Z

nomad3 is operating. GDS/OPENDAP(DODS) will come up in a few hours (waiting for the data logjam to ease).
--------------------------
31 DEC 2007

-------- Original Message --------
Subject: [EMC #7042]: nomad3 not responding
Date: Mon, 31 Dec 2007 11:11:56 -0500
From: EMC Helpdesk
Reply-To: EMC Helpdesk
To: Jordan.Alpert@noaa.gov, Kyle.Nevins@noaa.gov
References:

We are currently rebuilding the two raid5s on the system as the disks were reporting not in use and not that they were dead. We are also running another fsck on /raid2. We have shut down all network connections on the machine to ensure that no outside interference occurs. We will keep you updated upon further details.

-Kyle

------------
27 DEC 2007

nomad3 status:

From EMC Helpdesk: 15:34EST: The system was rebooted again and the root filesystem and the raid1 filesystem checked out as clean but the raid2 is still running the file system check. We will let that run overnight and may need some input tomorrow. Once that has completed the machine should be back online. So our target time for Nomad3 to be back online is tomorrow afternoon.

fsck is still running as of 0800 Thursday on file system #2. Unfortunately, I cannot give you an accurate time frame for the disk repair. However, Kyle should be in by 0930; once he arrives we will make this issue our focal point today.

-----------
26 DEC 2007

-------- Original Message --------
Subject: [EMC #7042]: nomad3 not responding
Date: Wed, 26 Dec 2007 09:45:58 -0500
From: EMC Helpdesk
Reply-To: EMC Helpdesk
To: Jordan.Alpert@noaa.gov
References: <200712261259.lBQCx78F028930@mailrt1.ncep.noaa.gov>

When I arrived this morning, I received warnings regarding nomad3. After inspection in the server room, I noticed that nomad3 was not powered on. Nomad3 is currently powered up; however, disk checks will delay the progress of it being reachable for now.

-----------
28 NOV 2007

28 Nov 07 Mist outage extended 6 hours

Due to unforeseen circumstances, development access to Mist will be delayed an additional 6 hours.
Upon completion of testing, notification will go out via ncep.list.sp-announce@noaa.gov.

-----------
27 NOV 2007

From ncep.list.sp-announce@noaa.gov ....

24 hour scheduled outage on Mist 11/27/07. Beginning 06:30 on 11/27/07 all jobs on Mist will be drained. At 07:30 all users will be logged off and all remaining LoadLeveler jobs will be cancelled. Upon completion of maintenance and testing, a parallel production test will be run. Development access to Mist will be restored approximately 07:30 11/28/07. Notification will go out via ncep.list.sp-announce@noaa.gov.

(This means that on 27NOV2007, real time NCEP NOMADS servers, nomad3 & 5 may have an interruption in data flow during this time.)

-----------
06 NOV 2007

This (below) means data flow on nomad5 and 3 may be late on 11/07/2007 for a number of hours before 12z:

Dew will be unavailable beginning 04:00 on 11/07/07. All non-production jobs on Dew will be drained beginning 03:00 local. Beginning 04:00 any remaining jobs will be cancelled and all users on Dew will be logged off. Upon completion of maintenance and system testing and validation, a 6 hour parallel production test will be run for the 12Z cycle. Upon completion of the 12Z test cycle users will be allowed on Dew. Notification will go out via sp-announce@noaa.gov upon completion of maintenance and testing.

-----------
31 OCT 2007

nomad5 has been up/running 139 days and nomad3 has been up 98 days. We like to reboot servers every quarter so at 3PM today we will reboot nomad3 and then nomad5.

-----------
15 OCT 2007

Change of date/time see below and 09 OCT 2007...

********* UPDATE: System Maintenance on Dew Pushed Back 1 Week ***************

A 24 hour maintenance period is scheduled for Dew on 10/23/07. Non-Production jobs on Dew will be drained beginning 07:00 local. Beginning 08:00 any remaining jobs will be cancelled and all users on Dew will be logged off.
Upon completion of maintenance and system testing and validation, users will be allowed back on Dew. This work is not anticipated to take the entire 24 hour maintenance period. Notification will go out via sp-announce@noaa.gov upon completion of maintenance.

-----------
09 OCT 2007

The following means that on 16OCT2007, NCEP NOMADS servers nomad3 & 5 most likely will have an interruption in data flow during this time. http://nomads6.ncdc.noaa.gov and http://nomads.ncdc.noaa.gov should be unaffected:

-------------
Subject: [NCEP.List.SP-Announce] 16 Oct 07: Dew Scheduled Maintenance 16 Oct 2007
Date: Tue, 09 Oct 2007 08:45:58 -0400
-------------

A 24 hour maintenance period is scheduled for Dew on 10/16/07. Non-Production jobs on Dew will be drained beginning 07:00 local. Beginning 08:00 any remaining jobs will be cancelled and all users on Dew will be logged off. Upon completion of maintenance and system testing and validation, users will be allowed back on Dew. This work is not anticipated to take the entire 24 hour maintenance period. Notification will go out via sp-announce@noaa.gov upon completion of maintenance.

------------------------------------------------------------------
11 SEP 2007

There are changes to the GFS post on 9/25. See http://www.nws.noaa.gov/om/notification/tin07-59gfs_upgrade_unifiedpost.txt for an official statement.

The GFS 0.5 degree "master" file, on 9/25, which was a GRIB1 file, will not be made in the same way anymore, but there will be a new file to replace it on the IBM-SP. The file that is currently copied from the dev IBM-SP machine known as "0.5 degree master" with 48 levels and land surface and other fields will not be there any more.... but there will be a replacement.
There will be a feed of a 0.5 degree file to NCDC through ftpprd. The file on ftpprd will be composed of the ...0p5... file (sometimes called the "military" file), which has 28 layers compared to the 48 layers of the nomad3 master file (and some land surface fields), plus the difference between these two files, combined in one ftpprd file, so it will be -- should be -- the same, except the new file is in GRIB2. NCDC should get this on their ingest system, with the potential for, and planning for, a 0.5 degree data set archive there, the first of its kind for this data set. In addition, I hope to have a copy on the real time backup server, nomads6.ncdc.noaa.gov, in GRIB1 so ftp2/4u and DODS work, as well as for real time backup of nomad3. These files will not be available to the public from ftpprd.

On nomad3 & 5, starting 9/25, the old 0.5 degree master file will be replaced by the ...pgrb2... file (the "military" 0.5 degree file) together with the difference between this file and the old master, which comes in a separate file, .....pgrb2b...., that is being placed on the IBM-SP. Our plan is to get both GRIB2 files, change them to grib1, append them, and name the result so the same "master" file data set will continue on nomad3.

The name "master" now refers to an internal (native model vertical and horizontal grid) GFS gaussian model (hybrid) vertical coordinate grid or GFS "physics grid" file (this is not a lon/lat pressure GRIB1 file!). It is unfortunate that we also used that name for the 0.5 degree pressure lon/lat grid. Ultimately, all GFS files/products will be posted/made from this master, unifying the post processing code for all NCEP models.

The NOMADS goal here is to make this transition transparent. We will keep our 0.5 master file name the same and the contents should also be the same.

------------------------------------------------------------------
12 July 2007

Recalling the 03 July 2007 announcement from the SP:

> 16 July 07:30 local, production will switch from Dew Back to Mist.
This means that on July 16 at 0730 the data flow will be interrupted, and data may be delayed or unavailable for a time on nomad3 and nomad5.

-------------------------------------------------------------------
03 July 2007

This means data may not be present on NOMADS for this period! The http://nomads6.ncdc.noaa.gov/ncep_data backup server will continue to operate.

-------- Original Message --------
Subject: [NCEP.List.SP-Announce] 10 Jul 07: Updated CCS Maintenance Schedule
Date: Tue, 03 Jul 2007 10:38:13 -0400
From: Don Avart (sysadmin)
To: NCEP.List.SP-Announce@noaa.gov

Beginning 07:30 local Tuesday July 10 through 07:30 local Thursday July 12, Mist will be unavailable for scheduled system maintenance. In the event that work concludes early, the system will be returned to the users and notification will go out via ncep.list.sp-announce@noaa.gov. Upon completion of this maintenance, Mist will continue to operate as the development cluster.

16 July 07:30 local, production will switch from Dew Back to Mist.

23 July 07:30 local, LSI patch will be applied to Dew Storage. This work is concurrent and should not impact users on Dew.

--------------------------------------------------------------------
21 JUN 2007: Updated ftp2u to 0.8.0 beta

(1) reduce incidence of premature "done" of web pages (I think the server should be updated to fix this problem.)
(2) code cleanup
(3) remove option to send files to user

Updated reanalyses and gdas ftp2u only. Other updated code is in !wd23ja/cgi/

Wesley

--------------------------------------------------------------------
15 JUN 2007

Subject: Re: NOMADS Network Usage
Date: Thu, 14 Jun 2007 16:29:21 -0400
From: Luis Cano
Organization: DOC/NOAA/NWS/NCEP

Louis,

Here is our current status.
We have implemented the rate-limiting between the WWB NOMADS and the NOAA NOC (Internet). We see relief with the infrastructure and this component of the infrastructure is now better configured to allow proper sharing of resources.

Jordan, please let me know if there is any feedback from customers of degraded services.

Thank you,
Lou

Luis Cano wrote:
> Louis and etal:
>
> We are experiencing a two-fold increase of WWB NOMADS traffic to the
> Internet that started two weeks ago. This usage is placing other
> requirements that share the same networks to NOAA NOC at risk. In
> addition, we are also experiencing higher-than-expected latencies with the
> CCS production dataflow to the TOC.
>
> Here is our plan:
>
> 1. Today at 11:00 Eastern, we will conduct a test of rate-limiting the
> NOMADS (DMZ) to an acceptable rate. This will allow NOMADS to better
> share common infrastructure with other requirements. This change has the
> potential of increasing transfer times to NOMADS customers. The change
> will become permanent assuming a valid solution.
>
> 2. In parallel, we are investigating the lower latency issues with the
> TOC. We will have a better understanding of this problem by this afternoon.
>
> I'll send a follow-up status Email by 3:00. Please call my cell if there
> are questions: 202-345-7384.
>
> Thank you,
>
> Lou
>

--------------------------------------------------------------------
31 MAY 2007

20070531: The nomad3 and nomad5 servers are back on line; that is, access to the servers has been restored. Data was being transmitted to nomad3 and nomad5 during the outage period. Most of the model data is present, back to (and before) May 14, except for a few missing days. These gaps appear to come from external problems with the IBM-SP when operations had to move to the development system, or when system administration had taken the servers. In all, the problem seems to have been conflicts in the firewall settings.
Some items are still not operating: network communications between the servers nomad5 and nomad3 are not yet up, so a few data sets that are shared between the servers, like the 0.5 degree master, are not yet available on both. Please check both nomad3 and nomad5 for data sets until this can be resolved.

Join the NOMADS (NCEP) list server https://lstsrv.ncep.noaa.gov/mailman/listinfo/ncep.list.nomads-announce to get updates about problems and changes.

--------------------------------------------------------------------
01 MAY 2007

GFS implementation day, 01MAY2007, in case you forgot.

GFS native history file changes: The GFS restart or history file (also called sigma or sigma spectral) is changing due to changes in the vertical coordinate as described below. This file is considered an internal file and is not recommended for public use. These changes should not impact most NOMADS users. This implementation of the GFS goes into operations 01MAY2007 12Z.

An excerpt from "A guide to using the new GFS history file" located at http://wwwt.emc.ncep.noaa.gov/gmb/para/guidehistory/ is below:

The vertical coordinate of the operational GFS forecast model will become a hybrid sigma-pressure coordinate in 2007. This will affect the file structure of the native GFS history files used by many other applications. In addition, the GRIB surface flux files will have several more fields. The implementation will not affect the GFS surface files or the posted pressure files {pressure-GRIB files}. The GFS restart files will be in an even newer format to accommodate coming anticipated changes to the GFS in succeeding implementations. No application outside of the global system needs to read GFS restart files.

In the near future, the GFS will output Gaussian grid files as the history files. Unfortunately, we are not ready yet to make them operational, so yet another conversion will be necessary when these files are implemented.
--------------------------------------------------------------------
23 APR 2007

There was an unannounced (power) outage in our central computer, and all development and data flow was down most of the day. The following day some model output files were also missing. It caused a gap in some data on NOMADS. We mention it here, a week later, for completeness.

--------------------------------------------------------------------
21 FEB 2007

All on the list:

NCEP Operations switched to the development system and is having a problem resetting the firewall access for NOMADS data flow. NCO is working on the problem. I have shifted into a backup mode (ftpprd) and will try to get the 0.5 and nam fields operating tomorrow (2/22). The 1x1 should be OK on nomad3 and 5.

Jordan

--------------------------------------------------------------------
25 JAN 2007

Large scale supercomputer changes are taking place at NCEP. As you can see from the message below (date stamp included), it is out with the old supercomputer system and in with the new. The new Dew supercomputer, as it is called, did not have access to NOMADS servers until Jan 24, so we are working to get the data flow moving again. It would have been better to have the data flow running on the old Blue supercomputer for a few days of overlap with the new system so we could make the move transparent, but as NOMADS is an experimental prototype this was not to be.

I can report that there has been progress on making NCEP Real Time NOMADS servers have operational data flow and operational user client applications. This may happen by 2008. NOMADS has tried to keep the most used data sets, like GFS (1x1) (0.5) and NAM, up to date first, but some of the less used data sets, like the MRF (legacy) 2.5 degree data set, will not get updated for a while.
-------- Original Message --------
Subject: nomads
Date: Wed, 24 Jan 2007 14:43:57 -0500
From: Joe Carr
To: Jordan Alpert
CC: Brent Gordon, John Ward

Jordan,

NOMADS has been turned off on both Blue and Mist. It is allowed on Dew. If you have any problems, please contact Matt Springer or Cameron Shelton.

Thanks,
Joey

--------------------------------------------------------------------
06 Dec 06

NOMADS issues.

> 1. NCO will switch to Blue for operations tomorrow [06 Dec see below] and when that
> happens we will not have enough bandwidth to support both operations
> and NOMADS traffic. NOMADS will be out of service until Friday. Even
> when NOMADS is back on line, only a little more than 50% of the NOMADS
> data is available [on alternate official servers].
>
> 2. This will be an ongoing problem until the "new" TOC is up and
> running. They are currently using their old system. Once the TOC is
> up and running the NOMADS data will be stored at the TOC's Web
> Operations Center and NCDC can pull nearly 100% of the NOMADS data
> from the Web Operations Center.
>
> 3. The CIO is having major problems getting the new system fully on
> line. If I recall correctly, the new system was supposed to be fully
> operational in Jan. 06. Ben (NCO) has agreed to send people to the
> TOC to help them resolve their problems.
>

-----------------------------------------------------------------------
05 Dec 2006 -- NOMADS DATA FLOW OUTAGE

-------- Original Message --------
Subject: [NCEP.List.SP-Announce] Blue/White: Production Switch to Blue
Date: Tue, 05 Dec 2006 15:22:18 -0500
From: Don Avart
To: NCEP.List.SP-Announce@noaa.gov

Due to a required network outage in Fairmont between midnight and 6 am on December 8, production will be switched from white to blue beginning 7 am local on 6 Dec until 7 am 8 Dec. Beginning at 5 am 6 Dec, LoadLeveler queues will be drained; blue will be rebooted at 6 am.
Only NCEP Production and operational accounts for the NCEP Service Centers will be permitted on the system. All user accounts, cron, and interactive access will be denied. NFS will only be mounted on interactive nodes. White will remain available for user access except during the network outage.

----------------------------------------------------------------------
11 Nov 06 IBM-SP (Blue) data flow returns to nomad3 and nomad5

Many of the data sets you need are now available on nomad3 and nomad5. Data flow ramped up to almost normal for nomad3 and nomad5 on 6NOV. All parties have agreed to a long term plan for making NOMADS Operational.

Some data sets are not yet transmitted, such as olr, sst, rtofs, sref etc., and we are working to get these back to normal, perhaps in a week. Check data on nomad3 or nomad5 before giving up. I cannot promise that missed data will be replenished in all cases, but we will see what we can do.

In the short term, the data flow to the backup server at the National Climatic Data Center (NCDC) will not resume from the "dev" machine, but can be pulled from the ftpprd service. nomads6.ncdc.noaa.gov will still operate for archived data. (We hope that) ftpprd holdings will be improved to have more complete data sets, with the goal of duplicating the content of variables, levels, and times that NOMADS presented before the outage. We will write programs and attempt to populate nomads6.ncdc... from ftpprd, but it will take a little more time. Having data at the backup server, nomads6.ncdc..., in real time, as well as on the NCEP servers, nomad3 and 5, kept these systems from becoming overextended.

Thank you all for your support and patience. The message I want to send is that NOAA management recognizes the importance of getting data out to users of all categories and is committed to making NOMADS Operational, 24/7/365. It is your requirements that are driving this process.
----------------------------------------------------------------------
26 Oct 06 Blue/White: Production Switch to Blue

A production switch from White to Blue will occur beginning 6 am Thursday October 26 and ending 2 pm Thursday October 26 (8 hours). During this time period, only NCEP production and operational accounts for the NCEP Service Centers will be permitted on the system. All nfs mounted filesystems will be dismounted from compute nodes, including /u (user home directories). NFS will be available on Interactive and Class 1 nodes. White will not be available for development use during this time period.

--------------------------------------------------------------------
announce.txt : 20061016

-------- Original Message --------
Subject: Production Switch to White
Date: Sun, 15 Oct 2006 17:46:02 -0400
From: Susan.Fenwick@noaa.gov
To: ncep.all.hands@noaa.gov
CC: John.Ward@noaa.gov

> Corruption of the GPFS file system on White prevented the scheduled
> switch of Production to White on Saturday. GPFS has been restored, but
> the entire Production file system was lost. The file system is
> currently being mirrored from Blue.
>
> Production is expected to be switched to White by 12Z on Tuesday, 17
> October.
>

--------------------------------------------------------------------
announce.txt : 20061012

FYI
SJL

-------- Original Message --------
Subject: Access To Blue
Date: Thu, 12 Oct 2006 06:57:02 -0400
From: John Ward
Organization: NCEP/NCO/Production Management Branch
To: Stephen Lord, Jim Laver

Steve & Jim,

We were not able to turn on the limited list of users on Blue yesterday. We have been pushing the limit on the system this week, with on time delivery at only 94%.
In addition, we have had unexpected network contention with Mist, which caused lengthy delays in delivering products. We feel that adding any additional load to Blue will cause additional delays in production and on the network.

The good news is that work is ahead of schedule in Fairmont. There is a chance we will have White back on line 24 hours earlier than expected. We'll have a better estimate later this morning.

John

-------------------------------------------------------
announce.txt : 20061009

All:

Dave Michaud has informed me that the earliest date when EMC jobs will be turned on is Tuesday 10 October. Earlier dates proposed by Dave were rejected by the NCO Configuration Board.

Dave will contact you individually regarding turning on your jobs. If he doesn't contact you, your stuff will not run.

I'm sorry for this situation. It is out of my control.

Please pass the word if I have left someone off this email list.

SJL