|
Submitting mass spectrometry proteomics data to GEO
|
Introduction
|
|
GEO can accept high-throughput proteomic data generated by mass spectrometry technologies.
We aim to capture results and conclusion-level information with sufficient data and descriptive information that
would enable understanding of the experiment and analysis of the underlying data, including:
Lists of identified proteins
Lists of identified peptides used in protein identification
Any additional information such as scores, significance or quality information
Relevant peak lists
Standardized input and output from search engines
Relevant descriptive information about the biological samples, instrumentation, and informatics
Our procedures for submission and display of proteomic data are currently under development.
Nevertheless, we can still accept and issue accession numbers for this data type once all required files have been provided. These accession numbers are stable
and will not change, so you can cite them in your manuscript.
Data may be held private until published, and reviewer access to private data is supported.
|
Submission requirements
|
|
A standard submission has three required components as summarized in the following table;
follow the details links for important information about each component:
Metadata spreadsheet (details) |
'Metadata' refers to descriptive information and protocols for the overall experiment and individual Samples,
as well as references to associated files.
Metadata is supplied by completing all fields of the Metadata template spreadsheet within the
NCBI_Peptides_Submission_Template Excel file.
Metadata content guidelines are provided within the template spreadsheet and in the table below.
|
Data files (details) |
Raw data files and any supporting peptide identification output files that describe the link
between the raw data and the results.
|
Results spreadsheet (details) |
The list of proteins discovered for each Sample, the peptides used to
identify those proteins, and the spectra used to identify the peptides. Modifications can also be specified.
The required format is shown in the Results spreadsheet within the
NCBI_Peptides_Submission_Template Excel file, and in the table below.
|
The metadata spreadsheet fields and content guidelines are as follows:
SERIES This section describes the overall experiment.
|
title |
Unique title (less than 120 characters) that describes the overall study. |
summary |
A thorough description of the goals and objectives of this study. The abstract from the associated publication may be suitable. Include as much text as necessary to thoroughly describe the study. |
overall design |
Indicate how many Samples are analyzed, whether replicates are included, whether there are control and/or reference Samples, etc. |
type |
Keyword(s) that generally describe the type of study. Examples include: time course, dose response, disease state analysis, tissue comparison, stress response, genetic modification, etc. |
contributor |
Each contributor is listed on a separate line as "Firstname,Initial,Lastname", for example, "John,H,Smith" or "Jane,Doe" |
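The title and contributor rules above can be checked before the spreadsheet is filled in. The following is a minimal sketch, assuming the only constraints are those stated in the table; the function names `check_series_title` and `parse_contributor` are hypothetical and are not part of any GEO tool.

```python
def check_series_title(title: str) -> bool:
    """Return True if the title is non-empty and shorter than 120 characters."""
    return 0 < len(title.strip()) < 120


def parse_contributor(entry: str) -> dict:
    """Split a 'Firstname,Initial,Lastname' or 'Firstname,Lastname' entry."""
    parts = [p.strip() for p in entry.split(",")]
    if len(parts) == 3:
        first, initial, last = parts
    elif len(parts) == 2:
        first, last = parts
        initial = ""  # the middle initial is optional in the documented examples
    else:
        raise ValueError(
            f"expected 2 or 3 comma-separated parts, got {len(parts)}: {entry!r}"
        )
    if not first or not last:
        raise ValueError(f"first and last name must be non-empty: {entry!r}")
    return {"first": first, "initial": initial, "last": last}


if __name__ == "__main__":
    print(parse_contributor("John,H,Smith"))
    print(parse_contributor("Jane,Doe"))
```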
PROTOCOLS This section includes protocols and fields which are common to all Samples.
Protocols which are applicable to specific Samples should be included in the SAMPLES section instead.
|
growth protocol |
The conditions that were used to grow or maintain organisms or cells prior to protein preparation. |
treatment protocol |
The treatments applied to the biological material prior to protein extraction. |
extract protocol |
The protocol used to extract and prepare the protein. |
digestion protocol |
The enzyme used to digest the sample, duration of digestion, whether in gel or in solution, temperature. |
separation method |
The method(s) used to separate the protein mixtures (e.g., column chromatography, gel electrophoresis, capillary electrophoresis), with sufficient supporting details. In general, follow the appropriate MIAPE guidelines. |
mass spectrometer |
The characteristics, ion sources, fragmentation method, and major components of the instrument used. In general, follow the MIAPE: Mass Spectrometry guidelines. |
quantification protocol |
Describe any protocol used to quantify the peptides/proteins. |
data processing |
Provide details of any post-processing performed upon the raw data. |
platform |
The generic instrument type. |
SAMPLES This section lists and describes each of the biological Samples under investigation.
|
sample ID |
Unique identifier for each biological sample. This is a local ID that will not appear on the final records. |
title |
Unique title that describes the Sample. We suggest that you use the convention: [biomaterial]-[condition(s)]-[replicate number], e.g., Muscle_exercised_60min_rep2. |
source name |
Briefly identify the biological material and the experimental variable(s), e.g., vastus lateralis muscle, exercised, 60 min. |
organism |
Organism from which the biological material was derived. Use standard NCBI Taxonomy nomenclature. |
characteristics |
List all available characteristics of the biological source, including factors not necessarily under investigation, e.g., Strain: C57BL/6, Gender: female, Age: 45 days, Tissue: bladder tumor, Tumor stage: Ta. Multiple 'characteristics' columns can be included. |
description |
Additional information not provided in the other fields, or paste in broad descriptions that cannot be easily dissected into the other fields. |
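The 'characteristics' example above is written as comma-separated "Tag: value" pairs. As a rough sketch, assuming that layout (and assuming no tag or value itself contains a comma), such a string could be split into individual pairs for review before entry into separate 'characteristics' columns. The function name `parse_characteristics` is hypothetical.

```python
def parse_characteristics(text: str) -> dict:
    """Split 'Tag: value, Tag: value' text into a dict of tag -> value."""
    result = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the first colon only, so values may contain further colons.
        tag, sep, value = item.partition(":")
        if not sep or not tag.strip():
            raise ValueError(f"expected 'Tag: value', got {item!r}")
        result[tag.strip()] = value.strip()
    return result


if __name__ == "__main__":
    example = "Strain: C57BL/6, Gender: female, Age: 45 days, Tissue: bladder tumor, Tumor stage: Ta"
    for tag, value in parse_characteristics(example).items():
        print(f"{tag} -> {value}")
```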
SAMPLE FILES This section lists all of the files associated with the experiment and their relationship to each other. Each Sample may have multiple rows, one for each file.
|
sample ID |
Unique identifier for each biological sample. This is a local ID that will not appear on the final records. |
results file |
File that lists all of the proteins, peptides, and matching spectra for each Sample. See the Results spreadsheet for the required format. Each Sample must have only one Results file. |
fraction |
An ordinal number for the gel slice, or an "x,y" coordinate for 2D gels. |
raw file |
The name of the file containing the instrument generated (raw) data for each fraction. |
raw file type |
The raw data type (e.g., mzData, mzXML, mzML, mgf, pkl, sqt). Note: Separate dta files are not accepted. |
peptide identification output file |
The name of the peptide identification search output file for each raw file, matching spectra to peptides. Can be from a protein sequence library search, a spectral library search, or other means of matching spectra to peptides. There may be more than one per raw file. |
peptide identification file type |
The algorithm or method that generated the peptide identification output file, e.g. OMSSA, Mascot, X!Tandem, Sequest, NIST MS, PEAKS, manual inspection, etc. |
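Two of the SAMPLE FILES rules above lend themselves to an automated check: each Sample must have only one Results file, and separate dta files are not accepted. The sketch below assumes the rows have already been read into dictionaries keyed by the column names shown in the table; the function name `check_sample_files` and the example file names are hypothetical.

```python
from collections import defaultdict

# Raw file types the text explicitly says are not accepted.
REJECTED_RAW_TYPES = {"dta"}


def check_sample_files(rows):
    """Return a list of human-readable problems found in SAMPLE FILES rows."""
    problems = []
    results_by_sample = defaultdict(set)
    for i, row in enumerate(rows, start=1):
        results_by_sample[row["sample ID"]].add(row["results file"])
        raw_type = row["raw file type"].strip().lower()
        if raw_type in REJECTED_RAW_TYPES:
            problems.append(f"row {i}: raw file type {raw_type!r} is not accepted")
    for sample, files in results_by_sample.items():
        # A Sample may span several rows, but all must name the same Results file.
        if len(files) != 1:
            problems.append(
                f"sample {sample}: expected one results file, found {sorted(files)}"
            )
    return problems


if __name__ == "__main__":
    rows = [
        {"sample ID": "S1", "results file": "S1_results.xls", "raw file type": "mzXML"},
        {"sample ID": "S1", "results file": "S1_results.xls", "raw file type": "mzXML"},
        {"sample ID": "S2", "results file": "S2_a.xls", "raw file type": "dta"},
        {"sample ID": "S2", "results file": "S2_b.xls", "raw file type": "mzML"},
    ]
    for problem in check_sample_files(rows):
        print(problem)
```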
Two types of supplementary data files are required
with each submission:
Raw data:
The raw data containing the MS1 and
MS2 information from the instrument. The preferred raw data format is mzXML or mzML that contains both
the MS1 and MS2 data from a single fraction. Alternatively, text-based
formats such as MGF or PKL may be accepted if the original
data is no longer available. We cannot accept the binary data
file from the instrument (e.g., .raw or .wiff) since it is
proprietary and we are unable to process it.
Peptide identification output:
The peptide identification output files
from any program used to match the MS2 spectra to the peptides.
We accept Mascot DAT files, OMSSA ASN.1 or XML formatted files, or
any search engine output that has been converted to PepXML. If no
search engine was used, or the format is not yet supported, then
the Results spreadsheet must include spectra references (see below).
A Results file must list the proteins discovered in one Sample in the
experiment. A separate file must be generated for each Sample. For each protein, the peptides must be
listed, and for each peptide, the matching spectra must
be listed. If matching spectra are omitted, then
every matching spectrum in the peptide identification output files is assumed
to be correct. The spectrum_list is a comma-separated list of spectrum_file_name:id entries (where 'id' is the spectrum number). The spectrum file
extension may be omitted from the file name.
The required format is shown in the Results spreadsheet within the
NCBI_Peptides_Submission_Template Excel file and in the following table:
Protein | Peptide | Spectrum_list |
CATA_MOUSE | FSTVAGESGSADTVRDPR | 07FEB15_ABRF_FT_100a:2171, 07FEB15_ABRF_FT_100a:2177, 07FEB15_ABRF_FT_100a:2183 |
| GPLLVQDVVFTDEMAHFDR | 07FEB15_ABRF_FT_100a:3653, 07FEB15_ABRF_FT_100a:3660 |
| GPLLVQDVVFTDEMAHFDRER | 07FEB15_ABRF_FT_100a:3231, 07FEB15_ABRF_FT_100a:3495, 07FEB15_ABRF_FT_100a:3499 |
| LCENIAGHLKDAQLFIQK | 07FEB15_ABRF_FT_100a:2967, 07FEB15_ABRF_FT_100a:2968 |
| LFAYPDTHR | 07FEB15_ABRF_FT_50a:2395 |
| LVNADGEAVYCK | 07FEB15_ABRF_FT_100a:2151, 07FEB15_ABRF_FT_100a:2157, 07FEB15_ABRF_FT_100a:2161 |
| VWPHKDYPLIPVGK | 07FEB15_ABRF_FT_100a:2768, 07FEB15_ABRF_FT_100a:2774, 07FEB15_ABRF_FT_50a:2808 |
CATD_HUMAN | AIGAVPLIQGEYMIPCEK | 07FEB15_ABRF_FT_100a:3305, 07FEB15_ABRF_FT_100a:3310 |
| FDGILGMAYPR | 07FEB15_ABRF_FT_100a:3258, 07FEB15_ABRF_FT_10a:3109, 07FEB15_ABRF_FT_10a:3111 |
| ISVNNVLPVFDNLMQQK | 07FEB15_ABRF_FT_100a:3771, 07FEB15_ABRF_FT_100a:3775 |
| LVDQNIFSFYLSR | 07FEB15_ABRF_FT_100a:3705, 07FEB15_ABRF_FT_100a:3711, 07FEB15_ABRF_FT_100a:3716 |
| QVFGEATKQPGITFIAAK | 07FEB15_ABRF_FT_100a:2882 |
| VSTLPAITLK | 07FEB15_ABRF_FT_100a:2857 |
HBA3_PANTR | VGAHAGZYGAEALER | 07FEB15_ABRF_FT_100a:2245, 07FEB15_ABRF_FT_25a:2301, 07FEB15_ABRF_FT_50a:2289 |
| VLSPADKTNVK | 07FEB15_ABRF_FT_100a:1892 |
KCRM_HUMAN | FEEILTR | 07FEB15_ABRF_FT_5a_070216183448:2394 |
| FKLNYKPEEEYPDLSK | 07FEB15_ABRF_FT_100a:2690 |
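The Spectrum_list cells in the table above follow the spectrum_file_name:id pattern. A minimal sketch of how such a cell could be split and sanity-checked before submission is given here; the function name `parse_spectrum_list` is hypothetical, and the code assumes that every id is an integer spectrum number.

```python
def parse_spectrum_list(cell: str):
    """Parse 'file:id, file:id, ...' into a list of (file_name, spectrum_id)."""
    refs = []
    for item in cell.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the last colon so the id is always the trailing number.
        name, sep, spec_id = item.rpartition(":")
        if not sep or not name:
            raise ValueError(f"expected 'spectrum_file_name:id', got {item!r}")
        refs.append((name, int(spec_id)))  # int() raises ValueError on a non-numeric id
    return refs


if __name__ == "__main__":
    cell = "07FEB15_ABRF_FT_100a:2768, 07FEB15_ABRF_FT_100a:2774, 07FEB15_ABRF_FT_50a:2808"
    for file_name, spectrum_id in parse_spectrum_list(cell):
        print(file_name, spectrum_id)
```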
If the peptide identification output files are in a supported format, then
modification information need not be listed. Modifications are
listed using the UNIMOD ID number. Fixed modifications for
given residues are listed separately and are assumed to apply to
all residues of that type. Each modified peptide string is
given for each applicable spectrum. In the modified peptide
strings, each modified residue is followed by its UNIMOD ID in parentheses;
fixed modifications need not be listed in these strings.
Example table listing fixed modifications:
Modification | Residues |
5 | K, R, C |
34 | C, R |
Example table listing variable modifications:
Peptide | Mod String | Spectrum File | Spectrum ID |
LSVEALNSLTGEFK | LSV(18)EALNSL(24)TGEFK | 07FEB15_ABRF_FT_100a | 3350 |
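A Mod String such as the one in the variable modifications example can be checked against its Peptide column by stripping the parenthesized UNIMOD IDs. The following sketch assumes that residues are single uppercase letters and that each ID is a plain integer in parentheses directly after the modified residue; the function name `parse_mod_string` is hypothetical.

```python
import re

# One residue letter, optionally followed by a UNIMOD ID in parentheses.
_TOKEN = re.compile(r"([A-Z])(?:\((\d+)\))?")


def parse_mod_string(mod_string: str):
    """Return (plain_sequence, {1-based residue position: UNIMOD ID})."""
    residues = []
    mods = {}
    pos = 0
    for match in _TOKEN.finditer(mod_string):
        if match.start() != pos:
            # Something between tokens did not fit the expected pattern.
            raise ValueError(f"unexpected text at offset {pos} in {mod_string!r}")
        pos = match.end()
        residues.append(match.group(1))
        if match.group(2) is not None:
            mods[len(residues)] = int(match.group(2))
    if pos != len(mod_string):
        raise ValueError(f"unexpected text at offset {pos} in {mod_string!r}")
    return "".join(residues), mods


if __name__ == "__main__":
    sequence, mods = parse_mod_string("LSV(18)EALNSL(24)TGEFK")
    print(sequence)  # plain peptide, comparable to the Peptide column
    print(mods)      # positions of modified residues and their UNIMOD IDs
```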
|
Deposit instructions
|
|
Zip or tar all files into one archive and transfer to us using the 'other' option on the Direct deposit page.
If you find that your files are too large to transfer in this manner, please email us at geo@ncbi.nlm.nih.gov and we will send you FTP instructions.
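For the archiving step, one possible approach is to build a gzip-compressed tar file with Python's standard `tarfile` module. This is only a sketch: the function name `build_archive` and the file names in the usage example are hypothetical, and storing files flat (without directory paths) is an assumption rather than a stated requirement.

```python
import tarfile
from pathlib import Path


def build_archive(files, archive_path):
    """Write all given files into one .tar.gz archive, stored by base name."""
    seen = set()
    with tarfile.open(str(archive_path), "w:gz") as tar:
        for f in files:
            path = Path(f)
            if path.name in seen:
                # Flat storage would make two files with the same name collide.
                raise ValueError(f"duplicate file name in archive: {path.name}")
            seen.add(path.name)
            tar.add(str(path), arcname=path.name)
    return Path(archive_path)


if __name__ == "__main__":
    # Hypothetical file names; replace with the files of the actual submission.
    candidates = ["metadata.xls", "sample1_results.xls", "sample1.mzML"]
    existing = [f for f in candidates if Path(f).is_file()]
    if existing:
        print(build_archive(existing, "geo_submission.tar.gz"))
```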
This submission procedure is currently under development and is subject to change. However, the accession numbers we assign to your data are stable and will not change,
so there will be no need to resubmit your data once development is complete. If you have any questions or concerns about these instructions, please do not hesitate to contact us at geo@ncbi.nlm.nih.gov.
|