ACToR – Aggregated Computational Toxicology Resource

Release v2008Q2 (October 2008)

Produced by the U.S. Environmental Protection Agency, National Center for Computational Toxicology

 

Table of Contents

Introduction

Data Collections

Data Collection View

Generic Chemical View

Substance View

Search by Name

Search by CAS Number

Search by Structure

Browsing Assays

Assay View

Glossary

 

Introduction

ACToR (Aggregated Computational Toxicology Resource) is a collection of databases collated or developed by the US EPA National Center for Computational Toxicology (NCCT). More than 200 sources of publicly available data on environmental chemicals have been brought together and made searchable by chemical name and other identifiers, and by chemical structure. Data includes chemical structure, physico-chemical values, in vitro assay data, exposure data, and in vivo toxicology data. Chemicals include, but are not limited to, high and medium production volume industrial chemicals, pesticides (active and inert ingredients), and potential ground and drinking water contaminants.

At present, chemical toxicity data resides in a variety of specialized databases, in many different and incompatible formats and in many different locations. Up to now, in order to compile all information on a given chemical, one needed to search multiple databases and then manually compile the resulting data. While this is possible to do for specific chemicals, it is very difficult to compile comprehensive data sets on chemically-similar sets of compounds using structure searching tools. By bringing together data from a large number of sources and making the data structure-searchable, ACToR will facilitate searches that transcend available data and chemical number. As such, it will be an important tool for the advancement of computational toxicology, which requires evaluation of information across broad scales of chemical class, use, structure and biological activity.

The ACToR project is compiling data (both quantitative and qualitative) from a large number of sources (called data collections), including EPA databases, PubChem, other NIH, USDA and FDA databases, state and other national sources, and academic groups. One novel data collection is ToxRefDB (Toxicology Reference Database), which includes detailed information on in vivo guideline study results for pesticides and other potentially toxic chemicals and which has been assembled by the National Center for Computational Toxicology. ACToR is also the primary repository of data being produced by the EPA ToxCast chemical prioritization program.

The majority of chemicals in ACToR have chemical structures, which will facilitate studies of structure-function relationships in sets of environmental chemicals. The DSSTox Program in the NCCT is responsible for structure annotation in ACToR.

Adding new data into ACToR is straightforward. We are always interested in obtaining other data collections that could be incorporated into the system.

ACToR is organized into a series of domains, linked together by chemical.

Domain | Description | Location
Chemicals | Structure, names and other basic chemical information | Main ACToR Database
Assays | Quantitative and other tabular data on chemicals | Main ACToR Database
Toxicology | In vivo study data from multiple domains | ACToR / ToxRefDB Database
Genomics Microarray Data | Full microarray data sets, in both original and transformed versions | Under Development
Biological Reference Data | Information on genes, proteins and pathways, downloaded from public sources | Main ACToR Database
ToxMiner | Detailed data from the ToxCast and ToxRefDB programs, used for ToxCast analyses | Separate ToxMiner database, linked to ACToR by chemical ID

 


Chemicals are organized into three main classes, the first two of which are modeled closely after the corresponding PubChem data model:

  • Substance: A substance is the article that was tested and provides a link to assay and other test data
  • Compound: A compound holds chemical structure information
  • Generic Chemical: A generic chemical aggregates a chemical structure plus all of the corresponding substances. The common link is that all substances share the same CAS registry number.

Assays are composed of a set of assay components. These can be quantitative measurements, annotations, or URLs to other sources.

All data is initially compiled as part of a set of Data Collections. A data collection is at minimum a set of substances with corresponding CAS registry numbers and names. Additional information may include chemical structures and assays. As mentioned above, a generic chemical links together data from many data collections on all substances that share a common CAS registry number.
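The relationship between substances and generic chemicals can be sketched in a few lines of Python. This is only an illustration of the aggregation rule described above, not ACToR's actual schema: the class names, field names, collection names and sample records are all assumptions made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Substance:
    """The article that was tested, as listed by one data collection."""
    source_name_sid: str          # hypothetical label from the source
    data_collection: str          # hypothetical collection name
    name: str
    casrn: Optional[str] = None   # some substances have no CAS number


@dataclass
class GenericChemical:
    """All substances that share one CAS registry number."""
    casrn: str
    substances: list = field(default_factory=list)

    @property
    def synonyms(self):
        # Synonyms are derived from the names of the linked substances.
        return sorted({s.name for s in self.substances})


def aggregate(substances):
    """Group substances by CASRN; substances without a CASRN are skipped."""
    by_cas = defaultdict(list)
    for s in substances:
        if s.casrn:
            by_cas[s.casrn].append(s)
    return {cas: GenericChemical(cas, subs) for cas, subs in by_cas.items()}


# Made-up sample records from three imaginary data collections.
substances = [
    Substance("A-1", "Collection A", "Lead", "7439-92-1"),
    Substance("B-7", "Collection B", "Lead, elemental", "7439-92-1"),
    Substance("B-8", "Collection B", "Copper", "7440-50-8"),
    Substance("C-3", "Collection C", "Unidentified mixture"),
]
generic = aggregate(substances)
print(generic["7439-92-1"].synonyms)
```

In this sketch the two lead records from different collections end up under one generic chemical, while the record without a CAS number is not aggregated at all.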

 


 

Data Collections

 

 

All data within ACToR is organized by Data Collections. A data collection contains substances and, optionally, chemical structures and assays. The entire list of data collections in ACToR can be seen by selecting Data Collections in the left-hand navigation bar. For each collection, the following data are presented: the name, a description, the institutional source, the type, and the number of substances, generic chemicals and assay results. An assay result is one data value from one assay, for one substance. The number of generic chemicals may be less than the number of substances if some substances do not have CAS numbers or if there are multiple substances with the same CAS number. To view the list of chemicals in a data collection, select the details link at the left. This will take you to the Data Collection View. To navigate to the external site from which the data was taken, select the Link Out hyperlink at the far right of the data collections table.
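The following small Python sketch shows why the two counts can differ. The function name and the sample rows are hypothetical; the only rule taken from the text is that generic chemicals are counted by distinct CAS number and that substances without a CAS number do not contribute.

```python
def collection_counts(rows):
    """rows: list of (substance_id, casrn_or_None) pairs for one data collection."""
    n_substances = len(rows)
    # Distinct CAS numbers only; substances with no CAS number are ignored.
    n_generic = len({cas for _, cas in rows if cas})
    return n_substances, n_generic


rows = [
    ("S1", "79-34-5"),
    ("S2", "79-34-5"),    # same CAS number as S1
    ("S3", None),         # no CAS number
    ("S4", "7440-50-8"),
]
print(collection_counts(rows))  # (4, 2)
```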


 

Data Collection View

 

 

The data collection page shows all the information within a single data collection. Here, the information is divided into three parts: overview (the information within the box), chemical table, and assay table.

 

Overview

The top chart provides a brief overview.  This includes:

 

Name - name of the data collection

Link out - provides a direct path to the data source

Description - description of the data collection

ID - the internal id number of the data collection

Institutional Source - the name of the institution that provided the data

Source Type - shows if there is assay data or just a chemical list

Number of Substances - the number of substances in the data collection

Number of Generic Chemicals - the number of generic chemicals in the database.  This number may be less than the number of substances if some substances do not have CAS numbers or if there are multiple substances with the same CAS number.

 

Show Chemical Table

 

 

To see the data, click on “Show Chemical Table”. 

 

Structure - contains a diagram of the chemical

CASRN - The CAS registry number

Name - the name of the chemical
Generic - a link that will take you to the generic chemical view

Phenotype Summaries – The following fields indicate whether or not there is any information in the ACToR database for the current chemical for a series of broad toxicity phenotypes. These are general chemical hazard (typically indicating that acute toxicity studies are available), carcinogenicity, genotoxicity, developmental toxicity, reproductive toxicity, chronic toxicity and food safety. This last covers a variety of food safety studies, for instance whether the chemical has been allowed or banned from contact with food, whether it is considered safe, and if so at what level. A red box under one of the phenotypes simply indicates that data is available, and not that the chemical is recognized to cause that particular type of toxicity.

Show Assay Table

After “Show Assay Table” has been clicked, a chart will appear. This lists the name of each assay associated with the current data collection, along with the number of chemical substances and assay components associated with the assay.

 

Clicking on the link in the first column will take you to the Assay View page.

 


 

Generic Chemical View

 

 

This page presents all the information on a specific generic chemical. Data has been aggregated from all substances with a specific CASRN from all data collections.

 

CASRN - the CAS (Chemical Abstracts Service) Registry Number

Formula - the chemical formula

MW - molecular weight

SMILES - Simplified Molecular Input Line Entry System, a line notation used for representing molecules

InChI - the IUPAC International Chemical Identifier (pronounced "INchee"), an alphanumeric identifier for chemicals used to encode information about the molecule in a standard way

 

Show Substances - provides a list of all the data collections from which information on this chemical was derived. Recall that chemicals are aggregated by CASRN.

Show Synonyms - provides alternate names for this chemical

Data By Toxicology Phenotype - selecting one of these links allows one to view the detailed data for this chemical for each of the major phenotypes.

Data by Toxicology Data Category - selecting one of these links allows one to see the data by the assay data category rather than by phenotype. In particular, this separately displays tabular, quantitative data vs. summary calls of toxicity vs. URL links to external data sources.

Non-Toxicology Data – Allows the user to see a variety of specific non-toxicology data on the chemical.

  • Physico-Chemical Data – this is largely computed data on solubility, melting point, etc. based on chemical structure.
  • Biochemical Assays – The results of quantitative in vitro assays (cell-based or cell-free)
  • Links to chemical summary reports on the web – These are URLs describing the chemical, but typically not including toxicology information.
  • Chemical Categories – these are chemical structure categories used, for instance, to make initial predictions of toxicity of a new compound based on similarity with compounds which have already been tested.
  • Chemical Manufacturing and Use Levels – This is data compiled by the EPA on industrial chemicals that are subject to TSCA (Toxic Substances Control Act)
  • Descriptive Data – a variety of tabulated descriptive data, for instance on intended use of the compound
  • Pesticidal Mode of Action – If the compound is a pesticidal active ingredient, this will provide the intended biological mode of action
  • Material Safety Data Sheet – if available, a link to the International Chemical Safety Card, which summarizes information from the Material Safety Data Sheet.
  • Regulations to Which the Chemical is Subject – This is a listing of U.S. Federal and state regulations to which this chemical is subject.
  • PubMed via MeSH – If available, a link to the literature in PubMed based on the MeSH term.
  • Notes – notes provided on the chemical by any of the data collections.
  • External Searches by Name or CAS – this is a list of pre-computed URL links to search external, on-line databases based on the CASRN and preferred name of the chemical

 

Viewing Assay Data

 

Data is organized into assays and assays into assay components. One can think of an assay as a spreadsheet where there is one row per chemical. The columns are the “assay components”. In practice, a given chemical can have more than one row in the data table, and each of these is termed a “result group”.

 

In the tree that is displayed when one of the top level links is selected, the first level of information is the assay, showing the name and providing a link to the assay definition page. Down one level in the tree are one or more result groups (rows in the assay table) showing the name of the assay component and the value.
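The assay, result group and component hierarchy can be pictured as a nested mapping. The Python sketch below is purely illustrative: the record layout, the assay name "Assay X" and the values are invented for the example and do not reflect how ACToR stores data internally.

```python
from collections import defaultdict

# Each record is one assay result:
# (assay name, CASRN, result group number, assay component, value).
# All values here are made up for illustration.
results = [
    ("Assay X", "79-34-5", 1, "dose", 10.0),
    ("Assay X", "79-34-5", 1, "effect", "none"),
    ("Assay X", "79-34-5", 2, "dose", 100.0),
    ("Assay X", "79-34-5", 2, "effect", "observed"),
]


def build_tree(records, casrn):
    """Return {assay: {result_group: {component: value}}} for one chemical."""
    tree = defaultdict(lambda: defaultdict(dict))
    for assay, cas, group, component, value in records:
        if cas == casrn:
            tree[assay][group][component] = value
    return tree


tree = build_tree(results, "79-34-5")
for assay, groups in tree.items():
    print(assay)                                  # first level: the assay
    for group, components in sorted(groups.items()):
        for name, value in components.items():    # second level: result groups
            print(f"  result group {group}: {name} = {value}")
```

Here the chemical has two rows (result groups) in the same assay, each with a value for both components.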

Expanding/ Collapsing List

ACToR frequently uses expanding lists to organize data. An expanding list has a green triangle next to it. To expand the list, click on “Show ________”. To collapse the list, click on “Hide ____”. After clicking on “Show ____”, “Collapse All” and “Expand All” buttons may appear (depending on how long the list is). Clicking on the “Collapse All” button will show only assay names, which are both green and underlined. Assay names are direct links to the Assay View page. Clicking on the “Expand All” button will show the results from all of the studies. Many of these charts are so large that they contain multiple “pages”. See Moving through the charts for more information.

 


 

Substance View    

The Substance View provides detailed information on the chemical substance from a specific data collection. In particular, details of the corresponding database IDs and substance-specific parameters are provided.

 

 

The information shown includes:

 

CASRN – CAS number

Data Collection – the name of the data collection that supplied the substance

Mixture – indicates whether the substance is pure or a mixture (not currently implemented)

Substance Type – DSSTox substance type (not currently implemented)

Synonyms – alternative chemical names

Parameters – a variety of name-value pairs for the substance as provided by the data collection.

 


 

Search by Name

 

 

 

To search for a chemical, type the full or partial name in the text box. Select either “exact match” or “any match”. “Exact match” will find the chemical whose name matches what you typed in. “Any match” will find names that are similar to what you typed in. The search is performed against all of the synonyms that have been compiled for each generic chemical. Click on search, and a standard chemical list chart appears with the results. Note that the search by name does not accept SMILES or InChI notation. To use SMILES or InChI, see the Search by Structure page.
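The difference between the two match modes can be sketched as follows. This is a guess at the behavior, not ACToR's implementation: it assumes that matching is case-insensitive and that “any match” means a substring match against each synonym. The function name and the small synonym table are hypothetical.

```python
def search_by_name(query, synonyms_by_cas, exact=True):
    """Return the CAS numbers whose synonym list matches the query.

    Assumption: matching ignores case, and "any match" is a substring test.
    """
    q = query.strip().lower()
    hits = []
    for cas, synonyms in synonyms_by_cas.items():
        names = [s.lower() for s in synonyms]
        if exact:
            matched = q in names                  # whole name must be equal
        else:
            matched = any(q in n for n in names)  # partial name is enough
        if matched:
            hits.append(cas)
    return hits


# Made-up synonym table keyed by CAS number.
synonyms_by_cas = {
    "7439-92-1": ["Lead", "Plumbum"],
    "7440-50-8": ["Copper", "Cuprum"],
}
print(search_by_name("lead", synonyms_by_cas))            # exact match
print(search_by_name("um", synonyms_by_cas, exact=False)) # any match
```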

 


 

Search by CAS Number

 

   

         

Using CAS numbers is another way of locating and identifying chemicals. Type one or more CAS numbers in the text box, separated by either commas or new lines. After search has been clicked, a standard chemical chart will appear.
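As an illustration of the accepted input format, the sketch below splits a block of text on commas and new lines. The function name is hypothetical, and how ACToR itself handles stray whitespace or empty entries is not documented here, so trimming and dropping empty entries is an assumption.

```python
import re


def parse_cas_input(text):
    """Split user input on commas or new lines and drop empty entries."""
    tokens = [t.strip() for t in re.split(r"[,\n]", text)]
    return [t for t in tokens if t]


print(parse_cas_input("7439-92-1, 7440-50-8\n79-34-5\n"))
```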

 


 

Search by Structure

 

     

 

There are two main ways to construct a molecule:

1. The typical way to construct a molecule is to select a template (see arrow 1), bonds (arrow 2), and atoms (arrows 3 and 4). First, select a template, then click on the canvas (see 1). Then, click on button 2 and select the bond type. To attach a bond, place the cursor over the molecule until a purple circle appears. If there is a need to connect two molecules together, click and hold the left mouse button and drag the other end of the bond to the other molecule until another purple circle appears before letting go of the left mouse button. To add atoms, either click on one of the “quick add” buttons (3) or select button 4. Button 4 causes a small window to appear with the periodic table on it. Select an element and then click close. Clicking on the “Query” tab gives some more options that do not appear on the periodic table.



2. The fastest way to make a molecule is to copy the molecule’s SMILES or InChI string and click on button 5.

 

For a more in-depth tutorial for this program, click on button 6 on the upper right hand corner. This takes you to the ChemAxon Help Site.

 


 

Browsing Assays

 

  

 

Sometimes a data collection contains an assay.  An assay is usually information within a data collection and comes from a single source.  It is usually arranged in a table, where the rows represent substances and the columns are assay components.  Each cell within this chart is called an assay result.  This page contains two sets of expandable/collapsing lists of assays: phenotypes and categories.

Phenotypes

An assay may contain multiple phenotypes. In other words, an assay may contain information about both genotoxicity and neurotoxicity.  To the right of each phenotype name is the number of assays that contain that phenotype. This chart contains the headings: details, name, category, data collection and substance. See Moving through the charts to learn more about this chart.

Categories

An assay may have only one category, which describes the assay in a broad sense. The number next to the category name is the number of assays that fall under that category. The chart contains the headings: assay id, source name aid, category, data collection id, substance count, and component count. See Moving through the charts to learn more about this chart.

Moving through the charts

If the chart contains more than 10 rows, a list box and a “Next 10” link will appear in the upper right-hand corner of the chart. Clicking on “Next 10” shows the next set of rows in the chart; if fewer than 10 rows remain in the next set, that number replaces the 10 in the link. To see all the data at once, select the “Show all” option in the list box. To “jump” to another set of rows, select one of the list box options under “Show all”. The list box shows which set is displayed and the total number of rows.
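The paging rule can be summarized with a short Python sketch. The function names and the zero-based page index are assumptions chosen for the example; only the page size of 10 and the way the link label shrinks on the last page come from the description above.

```python
def page_rows(rows, page, size=10):
    """Return the rows shown on a zero-based page."""
    start = page * size
    return rows[start:start + size]


def next_link_label(total, page, size=10):
    """Label of the 'Next' link on a zero-based page, or None on the last page."""
    remaining = total - (page + 1) * size
    if remaining <= 0:
        return None
    return f"Next {min(size, remaining)}"


rows = list(range(1, 24))                # a chart with 23 rows
print(next_link_label(len(rows), 0))     # Next 10
print(next_link_label(len(rows), 1))     # Next 3
print(page_rows(rows, 2))                # [21, 22, 23]
```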

 


 

Assay View

 

Assay data is where most of the in-depth information about chemicals is kept in the ACToR database. The Assay View has two parts – the overview table and the Assay Component table.

 

The assay overview table contains the headings: Assay ID, Data Collection ID and link, Name, Description, Short Name, URL link to the data source, Assay Category and number of components.

 

The component table contains the headings: Source ID, component name, component description, units, value type, and component type. To see the assay data, scroll down and click on “Show Assay Data Page”. A new page containing a chart will open. In this chart, the column headers are the assay components and the rows are the substances. The cells of this table are the assay results.
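The layout of the assay data page can be sketched by pivoting a flat list of assay results into a table. This is a simplified illustration with invented names and values; it assumes one row per substance, whereas a real assay may have several rows for the same substance, and it leaves cells empty (None) when no result exists.

```python
def pivot(results, components):
    """Build {substance: {component: value}} from (substance, component, value) triples."""
    table = {}
    for substance, component, value in results:
        # Start each new row with every component empty.
        row = table.setdefault(substance, {c: None for c in components})
        row[component] = value
    return table


components = ["dose", "effect"]          # column headers (assay components)
results = [                              # made-up assay results
    ("Substance 1", "dose", 10.0),
    ("Substance 1", "effect", "none"),
    ("Substance 2", "dose", 100.0),
]
table = pivot(results, components)
print("substance", *components, sep="\t")
for substance, row in table.items():
    print(substance, *(row[c] for c in components), sep="\t")
```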

 


 

Glossary

Active ingredients

An active ingredient is one that prevents, destroys, repels or mitigates a pest, or is a plant regulator, defoliant, desiccant or nitrogen stabilizer.

ACToR

ACToR (Aggregated Computational Toxicology Resource) is a collection of databases collated or developed by the EPA National Center for Computational Toxicology.

Assay

An assay is a collection of data for substances from one data collection. Currently, an assay can be thought of as a simple table with rows being chemicals and columns being assay components. An assay falls into one data type category but may have multiple phenotypes. The data can have more than one row or entry for the same substance, and elements in the data matrix can be empty.

Assay Category

Assays are organized into a number of categories that describe the broad type of data presented. Several of these categories describe the level of biological organization being probed, while others describe the class of information being presented. The current set of categories is:

  • PhysicoChemical
  • Biochemical
  • Genomics
  • Cellular
  • Tissue
  • Organ
  • Organism
  • In vivo toxicology (tabular primary)
  • In vivo toxicology (study listing primary)
  • In vivo toxicology (tabular secondary)
  • In vivo toxicology (summary calls)
  • In vivo toxicology (summary report via URL)
  • General Descriptive information
  • Regulation
  • Chemical Category
  • Chemical Summary URL
  • Chemical Use Level
  • Pesticidal mode of action (MOA)

Assay Component

An assay component defines one column or element of an assay. A component has a unique ID, a name, a description, a data type, and optionally units.

Assay Phenotype

Some assays are characterized by toxicology phenotypes. This allows one to organize the data in ACToR into broad toxicity areas. The current set of phenotypes is:

  • General chemical hazard
  • Acute Toxicity
  • Subchronic Toxicity
  • Chronic Toxicity
  • Carcinogenicity
  • GeneTox
  • Developmental Toxicity
  • Reproductive Toxicity
  • Neurotoxicity
  • Developmental-Neurotoxicity
  • Immunotoxicity
  • Dermal Toxicity
  • Respiratory Toxicity
  • Nephrotoxicity
  • Hepatotoxicity
  • Endocrine-Related effects
  • Cardiotoxicity
  • Ecotoxicity
  • Food Safety
  • Toxicity (Other)
  • PK / Metabolism

 

Assay Result

An assay result is one data point for a single substance and a single assay component.

Assay Types

There are two main types of assays: phenotypes and categories.

CAS

CAS (Chemical Abstracts Service) Registry Number



Some examples of numbers in CAS format are:

7439-92-1

7440-50-8

79-34-5

39001-02-0
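The last digit of a CAS number is a check digit: multiply each of the other digits by its position counted from the right (1, 2, 3, ...), add the products, and take the result modulo 10. The Python sketch below applies this standard rule to the examples above. The function name is hypothetical, and whether ACToR itself validates check digits on input is not stated in this document.

```python
import re


def is_valid_casrn(casrn):
    """Check the format and check digit of a CAS number such as '7439-92-1'."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", casrn)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight each digit by its position counted from the right, starting at 1.
    total = sum(i * int(d) for i, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


for cas in ["7439-92-1", "7440-50-8", "79-34-5", "39001-02-0"]:
    print(cas, is_valid_casrn(cas))
```

For 7439-92-1, the weighted sum is 2×1 + 9×2 + 9×3 + 3×4 + 4×5 + 7×6 = 121, and 121 modulo 10 is 1, which matches the final digit.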

Chemical

A chemical is defined by a unique chemical ID in the database and can be either a substance or a compound.

Chemical Structure

A diagram of a chemical, which can be used to search for information about chemicals.

Compound

A compound is an entity with a chemical ID and chemical structure information, which may be a 2- or 3-dimensional molfile or a string representation. The string representation can be SMILES or InChI.

Data Collection

A collection of chemical and assay data from a single source

Generic Chemical

A generic chemical aggregates all data from all data collections for substances with a single given CAS number. It will have links to one or more substances and all of their related assay data, as well as all synonyms derived from the substances.

InChI

The IUPAC International Chemical Identifier (InChI™) is a non-proprietary identifier for chemical substances that can be used in printed and electronic data sources. It was developed under IUPAC Project 2000-025-1-800 during the period 2000-2004.

Inert ingredients

An inert ingredient means any substance (or group of structurally similar substances if designated by the Agency), other than an active ingredient, which is intentionally included in a pesticide product.

In Vitro

An experiment that is performed outside of a living organism (for example, in test tubes)

In Vivo

Experimentation done on or inside of living organisms, otherwise known as animal testing

SMILES

SMILES (Simplified Molecular Input Line Entry System) is a line notation (a typographical method using printable characters) for entering and representing molecules and reactions.

Substance

A substance is an entity with a chemical ID, one or more names (including a CAS number) and potentially a URL pointing to primary data. One special name for the substance is the “source name sid” which is a unique alphanumeric label from the source, which allows a unique link back to the source.

 
