Integration plan for the off-line database servers
===================================================
August 4, 1999

Current configuration (July '99)
--------------------------------
The current configurations of the production and development Oracle servers
are outlined below. The production server is a Sun E3500, which has 5
backplane slots, and the development server is a Sun E4500 with 8 slots. We
originally deployed the larger machine as the development server, intending
to test various configurations of CPU, I/O, and memory. The two machines have
approximately 80 GB (development) and 100 GB (production) of storage on
fast/wide SCSI. Both systems run Sun Solaris 2.6.

There are currently instances of Oracle 8 and 8i running on the development
server, and production and integration instances of Oracle 8 on the
production machine. One Apache web server runs on the development machine,
and two Apache servers (integration and production) run on the production
server.

A full hot backup of the production database is done daily, and cold backups
of the development database instances are done once a week. The current
backup strategy uses the DLT 4000 stacker connected to d0ora2. The hot
backups use Oracle's RMAN to write a backup image to disk; FMB is then used
to copy the disk image to tape.

Following is a summary of each hardware configuration.

d0ora1: E4500 development server
  Storage  16 slots of a shared 24-bay 3.5" disk chassis
           scsi#1 has 5 x 9 GB drives
           scsi#2 has 4 x 9 GB drives
  CPU      2 x UltraSPARC I
  Memory   1 GB
  I/O      2 x I/O S-Bus boards w/ 2 x f/w diff. SCSI host adapters
           (each I/O card has 2 S-Bus channels, with 1 on one and 2 on
           the other)

d0ora2: E3500 production server
  Storage  8 slots of a shared 24-bay 3.5" disk chassis
           scsi#1 has 5 x 9 GB drives
           scsi#2 has 1 x DLT 4000 stacker (for backup)
                    1 x 24 GB drive in a shoebox
           scsi#3 has a 9-bay LVD chassis on a Rancho converter
                    1 x 36 GB drive
  CPU      4 x UltraSPARC I
  Memory   1 GB
  I/O      1 x I/O S-Bus board w/ 4 x f/w diff. SCSI host adapters
           (each I/O card has 2 S-Bus channels, with 1 on one and 2 on
           the other)

Plan through March 1, 2000
--------------------------
We plan to continue to grow and test the system over the next several months.
We will add a FC RAID disk subsystem and a backup robot to the E4500, and
reconfigure the current SCSI disk so that it is primarily on the E3500.

The RAID disk system will consist of a Fibre Channel attached, dual RAID
controller system with Fibre Channel disks. The system will be configured
with RAID 0, 1, and 5 components suitable for the range of performance needed
to meet the database requirements. We will conduct systematic tests to
confirm the redundancy, RAID disk rebuild, and hot spare features, and to
better understand the memory, CPU, and I/O needs.

The current plan is to have 3 x (4+1) RAID 5 sets (216 GB usable), one 18 GB
RAID 1 mirrored set (18 GB usable), and 3 hot spares, for a total of 20 x 18
GB disks (see the capacity check below). In addition, the planned system will
have 12 x 9 GB disks, which will be configured into 4 mirrored sets and a
2-disk stripe. The 18 GB spares can also be used as spares for the 9 GB
drives if one of them fails.

The backup robot will be a 120-slot Exabyte robot with 2 Mammoth I drives. It
is expandable to 200 slots and 10 Mammoth II drives. All of the tape drives
will be attached to the E4500, and the Legato software will be installed so
that this machine controls the robot. Backup of the E3500 will be done over
the network.
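The capacity check referred to above is the following short Python sketch.
The disk counts and sizes are those stated in the plan; the calculation is
purely illustrative.

    # Rough usable-capacity check for the proposed FC RAID layout
    # (illustrative only; disk counts and sizes come from the plan above).
    DISK_GB = 18                                 # one Fibre Channel disk

    raid5_sets = 3                               # 3 x (4+1) RAID 5 sets
    raid5_usable = raid5_sets * 4 * DISK_GB      # parity excluded -> 216 GB
    raid1_usable = 1 * DISK_GB                   # one mirrored pair -> 18 GB
    hot_spares = 3                               # contribute no usable space

    disks_needed = raid5_sets * 5 + 2 + hot_spares   # 20 x 18 GB disks
    print(f"RAID 5 usable space: {raid5_usable} GB")
    print(f"RAID 1 usable space: {raid1_usable} GB")
    print(f"18 GB disks required: {disks_needed}")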
If the 10/100 Base-T network interface proves too small, we may need to
upgrade it.

Following is a rough plan of activities to be pursued over the next 9 months.
Additional details about the systems and strategies are included in the next
section.

8/1/99   Develop test suite to explore memory, CPU, and I/O usage.
         Evaluate hardware configurations and review all components.

9/1/99   Detailed plan for
         -E4500 initial RAID configuration
         -E4500 tape drive and robot configuration
         -E3500 SCSI disk configuration

10/1/99  RAID system delivered.
         -Connect to E4500 machine.
         -Configure RAID 0, 1, and 5 disk.
         -Test various configurations for throughput.
         -Test dual redundancy.
         -Test RAID fault recovery.
         -Test hot swap features.
         Backup ATL delivered.
         -Install on E4500 and test.
         Web server node delivered or identified.
         -Prepare web server node.

11/1/99  E4500 - RAID system and ATL ready for use.
         -Deploy RAID disk sets.
         -Begin setting up backup to use ATL and Legato software.
         -Begin CPU, memory, I/O, and transactions-per-second tests.
         -Populate large test db.
         E3500
         -Re-deploy current SCSI disk as it becomes available.
          (Work toward moving all SCSI disk to the E3500 so the disk chassis
          is no longer shared between the 2 machines. A plan is needed; this
          may be difficult.)

12/1/99  Backup strategy using Legato in place for E3500 and E4500.
         E3500 has sufficient SCSI disk for production needs through March
         2000.

1/15/00  Make preliminary estimates for memory, CPU, and I/O needs.
         Start working on upgrade plans for the E4500.
         Determine if additional servers are needed.

3/15/00  Begin E3500 <-> E4500, production <-> development transition.

4/1/00   Start requisitioning additional CPU, memory, et cetera.
         E4500 transitioned to production and integration server.
         E3500 transitioned to development server.

4/15/00  Revisit db size requirements.
         -Determine how much additional disk is needed in FY00.
         -Start requisitioning disk.

The following studies are needed to properly understand the system and plan
for additional hardware (a sketch of a simple load-generation harness for
items 1 and 2 follows this list):
1. Tests to understand CPU needs. Submit multiple transactions/queries.
2. Tests to understand memory needs. Submit multiple transactions/queries.
3. Sizing and use of the integration db.
4. Understand the resources needed to run the web servers.
5. Determine which additional servers, SAM and other, need to run on the
   production server machine.
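The CPU and memory studies above amount to submitting many concurrent
transactions and queries against a test instance and observing the machine.
As a rough illustration of the kind of harness we have in mind, the following
Python sketch submits a placeholder query from several concurrent clients and
reports elapsed times. The connection details, schema, and query are
hypothetical, and any Oracle client interface could be substituted; this is
not the actual test suite.

    # Illustrative load-generation harness (not the actual D0 test suite).
    # Connection details and the test query are placeholders.
    import time
    import threading
    import oracledb  # any Oracle client library would do

    DSN = "d0ora1-test/orcl"                    # hypothetical dev instance
    TEST_QUERY = "SELECT COUNT(*) FROM events"  # placeholder query
    N_CLIENTS = 10                              # concurrent users simulated
    N_QUERIES = 20                              # queries per simulated user

    def worker(results, idx):
        # Each simulated user opens its own connection and runs the query
        # repeatedly, recording its total elapsed time.
        conn = oracledb.connect(user="test", password="test", dsn=DSN)
        cur = conn.cursor()
        start = time.time()
        for _ in range(N_QUERIES):
            cur.execute(TEST_QUERY)
            cur.fetchall()
        results[idx] = time.time() - start
        conn.close()

    results = [0.0] * N_CLIENTS
    threads = [threading.Thread(target=worker, args=(results, i))
               for i in range(N_CLIENTS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print(f"{N_CLIENTS * N_QUERIES} queries from {N_CLIENTS} clients")
    print(f"mean per-client elapsed: {sum(results) / N_CLIENTS:.1f} s")

While such a harness runs, CPU, memory, and I/O usage on the server would be
recorded with the usual Solaris monitoring tools.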
Three year growth and usage plan
--------------------------------
Following is a description of the plan for the storage, backup, CPU, memory,
and I/O for the Ora systems. The production server will be a critical device
for the D0 off-line, and its availability target is 99.9%, not including
carefully scheduled down times for upgrades. In order to achieve this level
of availability, and to make the system more easily maintained, we are paying
close attention to redundancy (dual power supplies, dual redundant RAID
controllers, dual I/O boards, multiple CPU boards, etc.). We also plan to
have the entire system on a UPS (uninterruptible power supply). In order to
reduce security risks, we need to move the web servers off of the production
Oracle machine to a dedicated server node.

Growth needs for the system are projected in Table I. There are still many
poorly understood and/or missing details. However, this is an attempt to draw
a road map for the next three years, which we will continue to update as we
better understand the problem.

DISK
----
We plan to add a FC RAID disk system to the Oracle servers. This system will
be equipped with dual RAID controllers and configured to remain fully
functional, at reduced throughput, if any element of one FC I/O channel (RAID
controller, FC connection, host adapter, I/O board, etc.) fails.

All data and index table space will be RAID 5 with a 4+1 (4 data, 1 parity)
disk configuration. The total usable RAID 5 space needed is projected to be
1.5 TB by the end of the running period. The size of the RAID 1 areas should
not grow beyond their initial allocations, unless we decide that certain
heavily used table or index spaces require higher performance than RAID 5
offers.

Oracle requires the following special administrative storage locations. This
disk should be mirrored (RAID 1) to provide a highly available and lower
maintenance system.

Control files:  Files of a few hundred MB which can be configured to have
                copies on multiple disks. The typical arrangement is to use
                3 disks.
Redo logs:      Very small (~2 MB) files, typically copied to two disks.
                These disks can be shared with the control file disks.
Archived redo:  The archived version of the redo logs, used to recover the db
                in case of failure. Usually kept for 2 or 3 days. Size
                depends on activity level and can be large, perhaps many GB.
                An 18 GB disk is good; 9 GB may be adequate.
Rollback:       Storage for uncommitted data. May be a few hundred MB.

Temporary storage is also needed for the temporary table space. This disk is
used as swap space when memory is exceeded, so it should be highly performant
and consequently might be 2 or more disks striped together. The system disk
will also be mirrored (exact configuration to be determined).

Backup
------
The data will be fully backed up on a weekly basis, with daily incrementals.
There will be 4 sets of weekly backup tapes, so every 5th week the media used
4 weeks prior will be reused. If the database grows to 1.5 TB, each full
backup will require 30 tapes, and the full set of backup tapes for the entire
4-week cycle will be 120 tapes. A similar rotation will be used for the daily
backups. A single 50 GB cartridge is expected to be sufficient for each daily
backup, so there will be 24 daily tapes. Therefore, a tape library with at
least 144 slots (120 full-backup tapes plus 24 daily tapes), plus a few for
cleaning tapes and possibly system backup, will be needed. The tape and
duration arithmetic is sketched below.

We plan to use Oracle RMAN (Recovery Manager), which enables us to perform
"hot" backups and also reduces the storage needed significantly (the exact
factor is not yet known). This configuration requires that a separate machine
running the RMAN repository (catalog) server be provided and maintained. This
is not a major resource machine, but it must be reliable and on UPS
(uninterruptible power supply).

We are currently purchasing Legato software to manage the file and volume
locations in the library for the backup. The software being purchased
includes the following products (my understanding of this is limited, so
details are sketchy and may be incorrect):
1. Network Edition for Unix, with update service
   This allows one host (the E4500) to back up data stored on it and, over
   the network, data stored on up to 5 clients.
2. Auto-changer Software Module, 1-256 slots, with update service
   This is needed to use the 200-slot robot.
3. Oracle Module - UNIX client, with update service
   This is the part which interfaces with Oracle. One license is needed per
   client (per machine, I believe). We need 2 licenses.
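As a cross-check of the tape counts above and of the backup rates and
durations listed in Table I, the short Python sketch below recomputes them
from the stated assumptions (50 GB cartridges, a 4-week rotation, 24 daily
tapes, and the aggregate drive rates given in the table). Nothing here is
measured.

    # Backup sizing cross-check (illustrative; numbers come from the plan
    # text and Table I, not from measurements).
    CARTRIDGE_GB = 50          # assumed capacity of one tape cartridge
    FULL_SETS = 4              # weekly full backups kept in rotation
    DAILY_TAPES = 24           # 6 daily tapes in a 4-week rotation

    def tapes_per_full(db_gb):
        """Whole tapes needed for one full backup (ceiling division)."""
        return -(-db_gb // CARTRIDGE_GB)

    def backup_hours(db_gb, rate_mb_s):
        """Hours to stream db_gb at an aggregate drive rate of rate_mb_s."""
        return db_gb * 1024.0 / rate_mb_s / 3600.0

    phases = [("Phase 1", 20, 12), ("Phase 2", 100, 2 * 12),
              ("Phase 3", 1000, 10 * 12), ("Phase 4", 1500, 10 * 12)]
    for phase, db_gb, rate in phases:
        full = tapes_per_full(db_gb)
        slots = full * FULL_SETS + DAILY_TAPES
        print(f"{phase}: {full} tapes/full, {slots} library slots, "
              f"{backup_hours(db_gb, rate):.1f} h per full backup")

For the 1.5 TB Phase 4 database this reproduces 30 tapes per full backup,
144 library slots, and roughly 3.5 hours per full backup, in agreement with
Table I.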
All of the backup drives in the robot will be connected via SCSI (LVD Ultra2
SCSI for Mammoth II??). The E3500 (to be used for development) server will be
backed up via the network. Additional servers can be backed up over the
network for an additional license fee of about $5k each. (The hardware
configuration for the tape drives needs to be worked out; we need to be sure
there are enough SCSI busses for the drive throughput.)

CPU, Memory, and I/O
--------------------
We can only guess at the needs for CPU, memory, and I/O. Some details about
the anticipated usage patterns are included in the table, but it is difficult
to understand how these translate into actual hardware without testing. In
the next several months, we plan to rigorously explore our needs in these
areas by processing the test transactions and queries discussed previously.

We can upgrade the UltraSPARC 336 MHz CPUs to UltraSPARC II 400 MHz CPUs, but
the two cannot be mixed in the same machine. Two CPUs and up to 1 GB of
memory can be added to each CPU/memory board. Therefore, one might imagine a
maximally configured E4500 with 2 I/O boards and 6 CPU/memory boards, giving
12 CPUs and 6 GB of memory.

Growth strategy and usage patterns
----------------------------------
Table I shows the anticipated growth strategy and estimated usage patterns
for the next 3 years. Phase 1 is the period leading up to the run, during
which the system will be used for Monte Carlo data and testing. By Phase 2,
the system will be populated with a few hundred million events in 100,000
files, and backup and performance will be critical. Phase 3 will begin to
push the size limitations of the system and really test the performance for
reads and writes. The final phase is expected to begin in January of 2003, at
which point most of the activity should be reading the database; however,
should the data-taking run continue beyond that point, this will change.

The table shows several parameters which characterize the size and usage
patterns. The sizes are estimates based on Oracle's Designer 2000 tool, using
the estimated number of rows per table and the current table design for the
SAM event and file database. For example, the events table, which is
calculated as holding 999,999,999 rows, maxes out at 692 GB, which is about
half of all the space required. It is assumed that the additional storage
needed for the on-line, calibration, trigger, etc. databases will be
relatively small, probably less than 100 GB total.

Usage applications are divided into three categories: transaction processing,
batch processing, and query processing. Typical examples of each are included
below the table. Additional work is needed to understand how these
applications translate into disk I/Os and throughput. The number of
concurrent users is a very rough guess, but it accounts for the fact that we
expect most of the db access to be through middle-ware servers (for example
the sam_db_server), reducing the number of concurrent database users from the
number one might naively expect.

Table I. Growth and usage patterns for the D0 off-line database.

Parameter           Phase 1      Phase 2      Phase 3      Phase 4
                    Pre-Run      Early Run    Late Run     Post Run
                    (Jul '00)    (Jan '01)    (Jan '02)    (Jan '03 ->)
------------------------------------------------------------------------------
USERS (concurrent)
==================
 Conc. Users        10           20           30           20

TABLE & INDEX SPACE (Production db)
===================================
(RAID 5 Disk)
Total
 total table (GB)   20           100          1000         1500
 data table (GB)    10           50           500          750
 index table (GB)   10           50           500          750
 #tables            100          100          100          100
 #datafiles         12           50           500          750
Largest Table (events)
 Size (GB)          10           50           500          750
 #rows (@300 B)     10^5         10^8         10^9         1.5*10^9
 #datafiles         4            25           250          400
2nd Largest (data_files)
 Size (GB)          2            10           100          150
 #rows (@2 kB)      10^4         5*10^5       5*10^6       7*10^6
 #datafiles         1            5            50           75
3rd Largest (processes)
 Size (GB)          2            10           100          150
 #rows (@2 kB)      10^4         5*10^5       5*10^6       7*10^6
 #datafiles         1            5            50           75

TABLE & INDEX SPACE (Integration db - assume 10% of production db)
==================================================================
(RAID 5 Disk)
 total table (GB)   2            10           100          150
 data table (GB)    1            5            50           75
 index table (GB)   1            5            50           75
 #tables            100          100          100          100
 #datafiles         12           50           500          750

BACKUP
======
Full Backup (hot, 1/week, tapes in 4-week rotation)
(assume recovery takes 2x longer than backup)
 size (GB)          20           100          1000         1500
 tapes (@50 GB)     1            2            20           30
 rate (MB/s)        12           2x12         10x12        10x12
 duration (hr)      0.5          1.2          2.3          3.5
Incr. Backup (hot, 1/day, 6 daily tapes in 4-week rotation)
 size (GB)          20           20           20           20
 tapes (@50 GB)     1            1            1            1
 rate (MB/s)        12           12           12           12
 duration (hr)      0.5          0.5          0.5          0.5

Admin. DISK
===========
(RAID 1 - mirrored)
 System             9 GB         9            9            9
 Control Files      3x9 GB       3x9          3x9          3x9
 Redo logs          2x9 (w/ CF)  2x9 (w/ CF)  2x9 (w/ CF)  2x9 (w/ CF)
 Archived redo      18 GB        18           18           18
 Rollback           9 GB         9            9            9
(RAID 0 - striped)
 Temp table         2x18 GB      2x18         2x18         2x18

APPLICATIONS
============
Transactions (1) (OLTP)
 Rate (/min)        <1           10           >10          <1
 Size
 #Tables/columns
Batch (2)
 Rate (/day)        1            10           >10          <1
 Size
 #Tables/columns
Queries (3) (Decision Support)
 Rate (/min)        <1           1            5            1
 Size
 #Tables/columns

CPU & MEMORY
============
 UltraSPARC CPUs    4            4            6            6
 Memory est. (GB)   1            2            3            3

Notes:

1. Typical transactions (assume worst case for transfer: blk size x #tables x 2)

   a. Adding raw data files, including events
      -Transaction rate = 1/min
      -Data transfer = 1.2 MB + 8k x 10 x 2 = 1.36 MB
      -Required bandwidth =
      -Each transaction will add to the following tables:
                                 avg. Bytes  Columns
       +events:                  300         10      (4000 events = 1.2 MB)
       +data_files:              84          19
       +file_locations:          30          1
       +file_lineage:            22          5
       +processes:               51          11
       +import_processes??:      85          8
       +data_file_locations:     24          5
       +event_catalogs:          21          3
       +physical_data_streams:   23          3
       +volumes:                 53?
       +trigger_lists?:
       +trigger_streams?:
      -Total transfer ~750 bytes in 11 tables

   b. Adding processed files with processing information for consumers
      -Transaction rate = 10/min
      -Data transfer = 8k x 10 x 2 = 160 kB
      -Required bandwidth =
      -Each transaction will add to the following tables:
                                 avg. Bytes  Columns
       +data_files:              84
       +file_locations:          30
       +file_lineage:            22
       +processes:               51
       +analyse_processes:       11
       +consumers:               22
       +project_files:           15
       +consumed_files:          35
       +volumes:                 53?
      -Total transfer ~300 bytes in 9 tables

   c. Imported data from remote sites
      -Transaction rate = 1/day
      -Data transfer = 8k x 10 x 2 = 160 kB
      -Required bandwidth =
      -Each transaction will add to or update the following tables:
                                 Bytes       Columns
       +runs:                    38
       +data_files:              84
       +file_locations:          30
       +file_lineage:            22
       +processes:               51
       +analyse_processes:       11
       +volumes:                 53?
       +import_processes:        85
      -Total transfer ~300 bytes in 8 tables

   d. Adding and/or manipulating projects
      -Transaction rate = 10/day
      -Data transfer =
      -Required bandwidth =
      -Each transaction will add to or update one of the following tables:
                                 Bytes       Columns
       +project_definition:      135
       +project_snapshots:       67
       +analysis_projects:       69
      -Total transfer ~100 bytes in 1 table

2. Batch activities
   a. Incorporating on-line db info
      -Transaction rate = 8/day (assume run = 4 hrs)
      -Data transfer =
      -Required bandwidth =
      -Each transaction will add to or update the following tables:

   b. Synchronizing SAM db with Enstore
      -Transaction rate = 1/day
      -Data transfer =
      -Required bandwidth =
      -Each transaction will add to or update the following tables:

   c. Data processing summaries
      -Transaction rate = 1/day
      -Data transfer =
      -Required bandwidth =
      -Each transaction will add to or update the following tables:

3. Typical queries

   a. File query
      -Rate = 1/min
      -Data transfer =
      -Required bandwidth =
      -Each query may touch any table in the SAM db, primarily:
       +runs:
       +data_files:
       +data_file_locations:
       +physical_data_streams:
       +application_family:
       +physical_dataset:
       +file_lineage:
       +processes:

   b. Event query (assume run partitioning, searching 10% of data)
      -Rate = 1/hr
      -Data transfer =
      -Required bandwidth =
      -Each query will touch primarily one table:
       +events table: 10^8 rows x 300 B = 30,000 MB

   c. Calibration data access
      -Rate = 1/min (??) assuming one access per farm-processed file
      -Rate = 1/hr (??) assuming cached and updated only when needed
      -Data transfer = 200 MB
      -Required bandwidth =
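The "-Required bandwidth =" entries above remain to be filled in once the
transfer sizes are pinned down. Purely as an illustration of how they would
follow from the worst-case rule in Note 1 (8 kB block size x 10 tables x 2
per transaction, plus the 1.2 MB event payload for case 1a), the following
Python sketch computes the average sustained bandwidth for the transaction
types whose rates are given. These are back-of-the-envelope averages, not
peak requirements.

    # Worst-case transfer/bandwidth estimate per the rule in Note 1:
    # data transfer = 8 kB block size x 10 tables x 2, plus any bulk payload
    # (1.2 MB of event rows for transaction 1a). Rates come from the notes.
    BLOCK_KB = 8
    TABLES = 10
    PER_TXN_KB = BLOCK_KB * TABLES * 2      # 160 kB, as used in the notes

    cases = [
        # (label, extra payload in kB, transactions per minute)
        ("1a raw data files",  1200, 1.0),
        ("1b processed files",    0, 10.0),
        ("1c remote imports",     0, 1.0 / (24 * 60)),
    ]

    for label, extra_kb, per_min in cases:
        transfer_kb = PER_TXN_KB + extra_kb
        bandwidth_kb_s = transfer_kb * per_min / 60.0
        print(f"{label:20s} {transfer_kb:6.0f} kB/transaction, "
              f"avg {bandwidth_kb_s:8.3f} kB/s")

Even the heaviest case (1a at 1/min) averages only a few tens of kB/s, which
suggests that peak and concurrent loads, not averages, will drive the I/O
requirements; the tests planned above should measure those directly.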