Tuesday, September 18, 2007, 11 AM
Bldg. 38A, B2 Library

Title: Overview of Some Database Projects in the Structure Group
Speaker: Siqian He

Abstract:
The design and content of several core databases in the Structure group will be discussed: PubStruct (MMDB), PubVast (VAST), CDTrack (CDD), PigPen (Protein Identity Group), PCSubstance, and PCCompound (PubChem). The features and capabilities of the shared database access library, shdb, will also be discussed.

shdb is a wrapper on top of Microsoft ODBC, Merant ODBC, and DBAPI-FTDS. It lets users access the Structure group's databases (all on MS SQL servers) or IEB's databases (some on Sybase servers) from Unix, Linux, and NT machines through the same interface. It has many specialized features designed to improve the 24/7 availability of front-end web services. For instance, it can automatically switch to a mirror database server when a web query is answered too slowly by the current dbserver.

The library also includes built-in dbserver toggle-awareness logic. Because of the huge volume of data in some of our databases, especially the PubChem db cluster, a backup/restore mechanism has replaced the previously used transaction-based replication mechanism for pushing data from the master dbserver to the eight public-facing dbservers that serve the front-end CGIs. For this to work, any open database connection reading data from a public dbserver must regularly check whether it will soon be out of service (because of, say, an imminent database restore on the current dbserver) and, if so, bail out and switch to another dbserver.

shdb is now used by all database applications (over 200 libraries and applications) in the Structure group.

Siqian He, Sc.D.
CBB/NCBI/NLM/NIH
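
Illustrative note: the two availability behaviors described in the abstract (switching to a mirror after a slow response, and bailing out before an imminent restore) can be sketched roughly as follows. This is not shdb's actual interface, which the announcement does not show. Every name here (Server, FailoverConnection, restore_pending, slow_threshold) is a hypothetical stand-in, and query latency is simulated rather than measured against a real database.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    """Hypothetical stand-in for one public-facing dbserver."""
    name: str
    latency: float = 0.0           # simulated seconds taken to answer a query
    restore_pending: bool = False  # True when a restore to this server is imminent


class FailoverConnection:
    """Sketch of a connection that moves between mirror servers (assumed design)."""

    def __init__(self, servers: List[Server], slow_threshold: float = 1.0):
        if not servers:
            raise ValueError("at least one server is required")
        self.servers = list(servers)
        self.slow_threshold = slow_threshold
        self.current = self._pick(exclude=None)

    def _pick(self, exclude: Optional[Server]) -> Server:
        # Choose the first server that is not excluded and not about to be restored.
        for server in self.servers:
            if server is not exclude and not server.restore_pending:
                return server
        raise RuntimeError("no dbserver available")

    def check_self(self) -> None:
        # Self-awareness check: bail out if the current server will soon go away.
        if self.current.restore_pending:
            self.current = self._pick(exclude=self.current)

    def query(self, sql: str) -> str:
        self.check_self()
        answered_by = self.current
        elapsed = answered_by.latency  # a real client would time the actual call
        if elapsed > self.slow_threshold:
            # The answer came back too slowly: point later queries at a mirror.
            try:
                self.current = self._pick(exclude=answered_by)
            except RuntimeError:
                pass  # no mirror available, so stay on the current server
        return f"{answered_by.name}: {sql}"
```

In this sketch the check runs before every query. A long-lived connection would presumably also need to run it on a timer, since the abstract says the checks must be regular.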