Added by Sangeetha Rajagopal, last edited by Ann Wiley on Jul 03, 2008


CGEMS to dbGaP Data Submission:       

Six items are prepared for the data submission:

       *  Manifest

       *  Study Description

       *  Data File

       *  Data Dictionary

       *  Subject File

       *  Sample Subject Mapping

*  Manifest: lists all of the files that the submitter intends to send. It serves as both an inventory and a checklist for dbGaP to ensure that all of the files are received. The study logo must be included in the submission. The following manifest template can be used:

Submitted File Name | File Type | File Description | File Size (in kb) | Comments
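As an illustration of how the manifest could be filled in from a folder of submission files, here is a minimal sketch. Only the column headers come from the template above. The tab-delimited output, the rounding of sizes up to whole kilobytes (1 kb taken as 1024 bytes), the use of the file extension as the file type, and the names build_manifest_rows and manifest_as_text are all assumptions made for this example, not dbGaP requirements.

```python
import csv
import io
import math
from pathlib import Path

# Column headers copied from the manifest template.
MANIFEST_COLUMNS = [
    "Submitted File Name",
    "File Type",
    "File Description",
    "File Size (in kb)",
    "Comments",
]


def build_manifest_rows(folder, descriptions):
    """Return one manifest row per regular file in `folder`, sorted by name.

    `descriptions` maps a file name to its description (hypothetical input).
    """
    rows = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        # Assumption: sizes are reported in whole kb, rounded up.
        size_kb = math.ceil(path.stat().st_size / 1024)
        rows.append({
            "Submitted File Name": path.name,
            # Assumption: the file type is taken from the extension.
            "File Type": path.suffix.lstrip(".").lower(),
            "File Description": descriptions.get(path.name, ""),
            "File Size (in kb)": size_kb,
            "Comments": "",
        })
    return rows


def manifest_as_text(rows):
    """Render the rows as tab-delimited text under the template header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=MANIFEST_COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Generating the manifest from the folder contents, rather than typing it by hand, keeps the inventory in step with the files that are actually sent.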
         


*  Study Description: describes study-related information including the following items:

  • Study name - limited to 75 characters, including spaces
  • Study report name - comprehensive study name
  • Abstract study description
  • Study URL
  • Type
  • Disease name(s) linked to Entrez MeSH
  • Inclusion/Exclusion criteria for participants (case-, control-, trio-subjects, etc.)
  • Study History
  • Relevant Publications: PubMed IDs of most recent related articles
  • Attribution: Title/Role of person in the study, Name, Institute of Affiliation (Institute Name, City, State, Country)

    

Study Description Template

Entrez Study Name (character limit is 75 with spaces): a short study name that will appear in Entrez.  The short Study Name should be relatively stable between study versions.
CGEMS Breast Cancer GWAS (Illumina 550K)
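The 75-character limit is easy to check before submission. The sketch below counts every character, spaces included, as the limit above describes. The function name check_entrez_study_name is hypothetical.

```python
ENTREZ_NAME_LIMIT = 75  # limit stated above; spaces count toward it


def check_entrez_study_name(name):
    """Return (ok, length) for a proposed Entrez Study Name."""
    length = len(name)
    return length <= ENTREZ_NAME_LIMIT, length


# The example name above fits comfortably within the limit.
ok, length = check_entrez_study_name("CGEMS Breast Cancer GWAS (Illumina 550K)")
print(ok, length)
```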
 
Webpage Study Name (no character limit): a comprehensive study name that will appear in the upper left-hand corner of the study webpage. This name can be longer than the Entrez Study Name. It can also differ between study versions, since each study version has its own webpage.
National Cancer Institute Cancer Genetics Markers of Susceptibility (CGEMS) Breast Cancer GWAS (Illumina 550K)
   
Description: an original summary description of the study. If the description is taken verbatim from a published or soon-to-be-published article, please submit copyright permission from the journal. Summaries with copyrighted material must include the following within the description: "Reprinted from [Article Citation], with permission from [Publisher]."
     Cancer Genetic Markers of Susceptibility (CGEMS) Phase 1: Breast Cancer Whole Genome Association Scan is being conducted to identify genetic variants that influence susceptibility to breast cancer.  Using the Illumina HumanHap550 assay, this phase screens 550,000 SNP markers from across the genome that were typed on approximately 1,140 breast cancer cases and an equivalent number of controls.  The goal of this phase is to scan the genome to find genetic variants to aid in the prevention and treatment of breast cancer.  Approximately 5% of the most promising variants will be carried forward to further replication and fine-mapping phases.
Study URL: the study URL(s) if applicable.
http://cgems.cancer.gov/
https://caintegrator.nci.nih.gov/cgems/
Type: the study type(s) (Longitudinal, Case-control, Case-set, Control-set, Trio, Cohort, etc).
CASE
CONTROL

     

Disease name(s): any number of disease name(s) associated with this study.  The disease name must be a MeSH term (http://www.ncbi.nlm.nih.gov/sites/entrez?db=mesh).  To check, type in the MeSH search box: disease of interest [mh].  Disease names will be ordered as submitted.
 
Inclusion/Exclusion Criteria: the inclusion and exclusion criteria for cases, controls, trios, participants as applicable.
     The Nurses' Health Study (NHS) is a longitudinal study of 121,700 women enrolled in 1976. The CGEMS case-control study is derived from 32,826 participants who provided a blood sample between 1989 and 1990, were free of diagnosed breast cancer at blood collection, and were followed for incident disease until May 2004. Cancer follow-up in the NHS was conducted by personal mailings and searches of the National Death Index. It is estimated that the percentage of true cancers captured by this system is greater than 90%.
Permission was requested from all participants diagnosed with cancer to review medical records to confirm the diagnoses and obtain additional information on tumor histology, staging, and other characteristics. All study participants who were menopausal at blood draw, had a confirmed diagnosis of invasive breast cancer, and had sufficient stored blood available for DNA extraction at the time of case and control selection were included as cases in the CGEMS project. Controls were matched to cases based on age, blood collection variables (time, date, and year of blood collection, as well as recent (<3 months) use of postmenopausal hormones), ethnicity (all cases and controls are self-reported Caucasians), and menopausal status (all cases and controls were menopausal at blood draw).
Informed consent was obtained from all participants. The study was approved by the Institutional Review Board of the Brigham and Women's Hospital, Boston, MA, USA.
History: the study history as applicable.
   
Relevant Publications: use PubMed IDs (http://www.ncbi.nlm.nih.gov/PubMed/).  References will appear in the order submitted.
 
Study Attribution: will appear as submitted.
Header | Name | Affiliation
Principal Investigator | |
Institute | |
Funding Source | |

      

*  Data File: the following phenotypes are included for CGEMS data:

  • Age (5-year intervals)
  • Case control status
  • Gender
  • Family history (+/-)

      The SQL statement to generate the phenotype data:

                select PARTICIPANT_DID,
                       AGE_AT_ENROLL_MIN || '-' || AGE_AT_ENROLL_MAX,
                       CASE_CONTROL_STATUS, GENDER, FAMILY_HISTORY
                from STUDY_PARTICIPANT
                where STUDY_ID = 3
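To see what the query produces, it can be run against a small stand-in table. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The column types, the toy rows, and the helper name run_phenotype_query are assumptions made for illustration; the real STUDY_PARTICIPANT table lives in the CGEMS database and may be defined differently. An order by clause is appended only so that the toy output is deterministic.

```python
import sqlite3

# The phenotype query from above, unchanged apart from layout.
PHENOTYPE_SQL = """
select PARTICIPANT_DID,
       AGE_AT_ENROLL_MIN || '-' || AGE_AT_ENROLL_MAX,
       CASE_CONTROL_STATUS, GENDER, FAMILY_HISTORY
from STUDY_PARTICIPANT
where STUDY_ID = 3
"""


def run_phenotype_query(rows):
    """Load toy rows into an in-memory table and run the phenotype query."""
    conn = sqlite3.connect(":memory:")
    # Assumed column types; the production schema is not shown in this page.
    conn.execute(
        "create table STUDY_PARTICIPANT ("
        "PARTICIPANT_DID integer, STUDY_ID integer, "
        "AGE_AT_ENROLL_MIN integer, AGE_AT_ENROLL_MAX integer, "
        "CASE_CONTROL_STATUS text, GENDER text, FAMILY_HISTORY text)"
    )
    conn.executemany(
        "insert into STUDY_PARTICIPANT values (?, ?, ?, ?, ?, ?, ?)", rows
    )
    # Ordering added only to make the toy result deterministic.
    result = conn.execute(PHENOTYPE_SQL + " order by PARTICIPANT_DID").fetchall()
    conn.close()
    return result


# Made-up rows: two participants in study 3 and one in another study.
toy_rows = [
    (101, 3, 55, 59, "CASE", "F", "+"),
    (102, 3, 60, 64, "CONTROL", "F", "-"),
    (201, 1, 50, 54, "CASE", "F", "-"),
]
for row in run_phenotype_query(toy_rows):
    print(row)
```

The concatenation with || is what turns the minimum and maximum enrollment ages into the 5-year interval listed among the phenotypes, and the where clause restricts the output to the one study.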

                 

*  Data Dictionary file: describes the variables that are included in the phenotype data file, for example:

VARNAME | VARDESC | DOCFILE | TYPE | UNITS | COMMENT | VALUES
PARTICIPANT_DID | Deidentified Participant's ID | cgems_data_phenotypes.xls (the data file name) | integer | | Deidentified ID |
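Since the data dictionary is meant to describe every variable in the data file, a quick cross-check between the two can catch omissions. The following is a minimal sketch; the function name undocumented_columns and the case-insensitive comparison are assumptions for this example.

```python
def undocumented_columns(data_header, dictionary_varnames):
    """Return data-file columns with no VARNAME entry, in file order.

    Assumption: names are compared case-insensitively after trimming spaces.
    """
    known = {name.strip().upper() for name in dictionary_varnames}
    return [col for col in data_header if col.strip().upper() not in known]


# Hypothetical header row of a phenotype file and a partial dictionary.
header = ["PARTICIPANT_DID", "CASE_CONTROL_STATUS", "GENDER"]
varnames = ["PARTICIPANT_DID", "GENDER"]
print(undocumented_columns(header, varnames))
```

An empty result means every column in the data file has a dictionary entry.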
 

                 

*  Subject File: for CGEMS data, the following attributes are collected and saved as a .txt file:

  • Subject ID (SUBJID)
  • Consent group (CONSENT)
  • Subject source* (SUBJ_SOURCE)
  • Source SUBJID* (SOURCE_SUBJID)

      The SQL statement to generate the subject data file:

                select PARTICIPANT_DID as Subject_ID,
                       'NCI IRB NHS' as Consent_Group,
                       PARTICIPANT_DID as Subject_Source,
                       CASE_CONTROL_STATUS
                from STUDY_PARTICIPANT
                where STUDY_ID = 3

                 

*  Subject Data Dictionary: describes the variables that are included in the subject data file. The template is as follows:

VARNAME | VARDESC | TYPE
SUBJID | Subject ID | integer
CONSENT | Consent group as determined by DAC | encoded value
SUBJ_SOURCE | Source repository where subjects originate | string
SOURCE_SUBJID | Subject ID used in the Source Repository | integer
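The subject file can be written as plain text with the four VARNAME values from the template as its header. The sketch below assumes a tab-delimited layout, which this page does not specify. The function name subject_file_text and the sample record are hypothetical.

```python
import csv
import io

# Header names taken from the subject data dictionary template.
SUBJECT_COLUMNS = ["SUBJID", "CONSENT", "SUBJ_SOURCE", "SOURCE_SUBJID"]


def subject_file_text(records):
    """Render subject records (dicts keyed by VARNAME) as tab-delimited text."""
    buffer = io.StringIO()
    # Assumption: tab-delimited with a single header row.
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(SUBJECT_COLUMNS)
    for record in records:
        writer.writerow([record[column] for column in SUBJECT_COLUMNS])
    return buffer.getvalue()


# Made-up record for illustration only.
sample = [{"SUBJID": 101, "CONSENT": 1, "SUBJ_SOURCE": "NHS", "SOURCE_SUBJID": 101}]
print(subject_file_text(sample), end="")
```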

                 

*  Sample Subject Mapping: lists the sample ID of each individual DNA sample for which genotype information has been submitted, together with the corresponding subject ID for which phenotype data has been submitted. The following columns are included:

  • Specimen id
  • Subject id

    The SQL statement to generate the sample subject mapping data file:

                select SPECIMEN_ID, PARTICIPANT_DID
                from SPECIMEN
                where PARTICIPANT_DID in (select PARTICIPANT_DID
                                          from STUDY_PARTICIPANT
                                          where STUDY_ID = 3)
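Because the mapping ties genotype samples to subjects that carry phenotype data, it is worth confirming that the mapping and the subject file agree before submission. The sketch below reports subject IDs that appear in the mapping but not in the subject file, and specimen IDs listed more than once. Treating a repeated specimen ID as a problem is an assumption made here, and the function name check_sample_subject_mapping is hypothetical.

```python
from collections import Counter


def check_sample_subject_mapping(mapping, subject_ids):
    """Check (specimen_id, subject_id) pairs against the known subject IDs.

    Returns a dict with two sorted lists:
      unknown_subjects    - subject IDs in the mapping but not in subject_ids
      duplicate_specimens - specimen IDs that occur more than once
    """
    known = set(subject_ids)
    specimen_counts = Counter(specimen for specimen, _ in mapping)
    return {
        "unknown_subjects": sorted(
            {subject for _, subject in mapping if subject not in known}
        ),
        "duplicate_specimens": sorted(
            specimen for specimen, count in specimen_counts.items() if count > 1
        ),
    }


# Made-up IDs for illustration only.
pairs = [("S1", 101), ("S2", 102), ("S2", 102), ("S3", 999)]
print(check_sample_subject_mapping(pairs, [101, 102]))
```

Two empty lists indicate that every sample points at a submitted subject and that no specimen is listed twice.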

