NCBI C Toolkit Cross Reference

C/data/sequin.hlp

1 <HTML> <HEAD>
2
3 <TITLE>Sequin help documentation</TITLE>
4
5 
9 <link rel="stylesheet" href="ncbi_sequin.css">
10
11 </HEAD>
12
13 <body bgcolor="#FFFFFF" text="#000000" link="#0033CC" vlink="#0033CC">
14 
15
16 
17 <table border="0" width="600" cellspacing="0" cellpadding="0">
18 <tr>
19 <td width="140"><a href="http://www.ncbi.nlm.nih.gov"> <img src="http://www.ncbi.nlm.nih.gov/corehtml/left.GIF" width="130" height="45" border="0"></a></td>
20 <td width="360" class="head1" valign="BOTTOM"> <span class="H1">Sequin Help Documentation</span></td>
21 
22 </tr>
23 </table>
24
25 
26 <table CLASS="TEXT" border="0" width="600" cellspacing="0" cellpadding="3" bgcolor="#000000">
27 <tr CLASS="TEXT" align="CENTER">
28 <td width="100"><a href="index.html" class="BAR">Sequin</a></td>
29 <td width="100"><a href="http://www.ncbi.nlm.nih.gov/Entrez/" class="BAR">Entrez</a></td>
30 <td width="100"><a href="http://www.ncbi.nlm.nih.gov/BLAST/" class="BAR">BLAST</a></td>
31 <td width="100"><a href="http://www.ncbi.nlm.nih.gov/omim/" class="BAR">OMIM</a></td>
32 <td width="100"><a href="http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html" class="BAR">Taxonomy</a></td>
33 <td width="100"><a href="http://www.ncbi.nlm.nih.gov/Structure/" class="BAR">Structure</a></td>
34 </tr>
35 </table>
36
37 
38 <P>&nbsp;
39
40 <H2>Table of Contents</H2>
41
42 <HR>
43
44 >Introduction
45
46 #Sequin is a program designed to aid in the submission of sequences to
47 the GenBank, EMBL, and DDBJ sequence databases. It was written at the
48 National Center for Biotechnology Information, part of the National
49 Library of Medicine at the National Institutes of Health. This section
50 of the help document provides a basic overview of how to submit
51 sequences using the Sequin forms. Subsequent sections provide detailed
52 instructions for entering information on each form.
53
54 *The Help Documentation
55
56 #The Sequin help documentation is available in both on-line and World
57 Wide Web (http://www.ncbi.nlm.nih.gov/Sequin/sequin.hlp.html) formats.
58 The text of the on-line version scrolls as you progress through the
59 Sequin forms. Specific words or phrases can be identified with the
60 "find" command at the top of the window. The on-line document can also
61 be saved as a text file, or printed directly to a printer. Click on the
62 window that contains the help documentation. Under the Sequin File
63 menu, choose Export Help... to save the documentation as a text file.
64 To print the documentation without saving it first, click on the help
65 window, and choose Print from the Sequin File menu.
66
67 *Organization of Forms
68
69 #Information is entered into Sequin on a number of different forms. Each
70 form is made up of pages, which are indicated by folder tabs at the top
71 of the form. You can move to the desired page by clicking on the
72 appropriate folder tab. You can also move between pages of a form by
73 clicking on the "Next page" or "Prev page" buttons at the bottom of the
74 screen. You can move to the previous form or the next form by clicking
75 on the "Prev form" or "Next form" buttons on the first or last pages of
76 a form, respectively.
77
78 #There are numerous ways to enter information onto a page of a form,
79 including text fields, radio buttons, check boxes, scrolling boxes,
80 pop-up menus and spreadsheets.
81
82 #You may also use tables to import annotation of source information.
83 The formatting of these tables will be discussed below.
84
85 *Overview of Sequin
86
87 #If you are using Sequin for the first time, you will be prompted to
88 fill out four forms: the Welcome to Sequin form, the Submitting
89 Authors Form, the Sequence Format form, and the Organism and Sequences
90 Form. After you have filled out these forms, a window will appear that
91 contains the Sequin record viewer. This viewer allows you to access
92 many other forms in which you can edit fields filled out in the four
93 initial forms, as well as add additional information. Detailed
94 instructions on how to fill out the forms and use the record viewer are
95 presented below.
96
97 >Welcome to Sequin Form
98
99 #First, indicate with one of the three radio buttons whether you are
100 submitting the sequence to the GenBank, EMBL, or DDBJ database. If you
101 are working on a sequence submission for the first time, click on
102 "Start New Submission". If you are modifying an existing submission
103 record, click on "Read Existing Record". If you would like to quit from
104 Sequin, click on "Quit Program".
105
106 #You can also use "Read Existing Record" to read in a FASTA-formatted sequence
107 file for analysis purposes. The sequence will be displayed in Sequin and can
108 be analyzed with tools such as CDD Search, but it should not be submitted
109 because it does not have the appropriate annotations.
110
111 #If you are running Sequin in its network-aware mode, you will see
112 another button labeled "Download from Entrez". This option allows you
113 to update an existing database record using Sequin. The record will be
114 downloaded from GenBank into Sequin using NCBI's Entrez retrieval
115 system. The contents of the record will appear in Sequin, and you can
116 edit them by updating the sequence or the annotations, as necessary. If
117 you do not see the button labeled "Download from Entrez" on the Welcome
118 to Sequin form, you are not running Sequin in its network-aware mode.
119 To make Sequin network-aware, see the
120 <A HREF="#NetConfigure">
121 instructions
122 </A>
123 later in the help documentation.
124
125 #You can update only those records that you have submitted, not those
126 submitted by others. To update an existing record, first select which
127 of the databases you will be sending the update to. This should be the
128 database to which the original record was submitted. If you do not
129 know which database to use, send the record to GenBank and the NCBI
130 staff will forward it to the appropriate database. Next, click on the
131 button "Download from Entrez". Enter the nucleotide Accession number or
132 GI of the sequence on the first form. Then enter "yes" if you are
133 planning to submit the record as an update to one of the databases.
134 Fill out the Submitting Authors form.
135
136 <A HREF="#EditSubmitterInfo">
137 Instructions
138 </A>
139
140 for this form are found in the Sequin help documentation under "Edit
141 Submitter Info" under the Sequin File menu. The record will then open
142 in the record viewer. Explanations of how to add annotations or update
143 sequences are presented in the documentation entitled
144
145 <A HREF="#EditingtheRecord">
146 "Editing the record"
147 </A>
148 and
149 <A HREF="#SequenceEditor">
150 Sequence Editor
151 </A>
152
153 respectively. You will not see the Submitting Authors Form, the
154 Sequence Format Form, or the Organism and Sequences Form. Note that
155 updates, as well as new records, must be emailed to the appropriate
156 database. Sequin does not support direct submission of records over the
157 Internet.
158
159 #Additional configuration options are available under the Misc menu.
160 You can toggle between the stand-alone and network-aware modes of
161 Sequin. The default mode of Sequin, which is sufficient for most
162 sequence submissions, is stand-alone. In its network-aware mode, Sequin
163 can exchange data with NCBI and, for example, retrieve sequences
164 from Entrez and perform Taxonomy searches. The network-aware mode of
165 Sequin is described in detail in the
166 <A HREF="#NetConfigure">
167 Net Configure
168 </A>
169 section below. You can also start the NCBI DeskTop, which is for
170 advanced Sequin users only.
171
172 >Submitting Authors Form
173
174 #Information from this form will be used as a citation for the sequence
175 entry itself. It can contain the same information found in citations
176 associated with the formal publication of the sequence.
177
178 #On the bottom of each form are two buttons. Click "Prev form" (first
179 page in a form) or "Prev page" (subsequent pages in a form) to go to the
180 previous form or page. Click "Next Form" (last page on a form) or "Next
181 Page" (earlier pages on a form) to move to the next form or page.
182
183 #Form pages can also be saved individually by using the "Export" function
184 under the File menu. If you are processing multiple submissions, you
185 can use the "Import" function under the File menu to paste previously
186 entered information directly on the page.
187
188 #The Contact, Authors, and Affiliation pages can be saved as a block so
189 that you can use this information for your next submission. For your
190 first Sequin submission, fill in the requested information on the
191 Submitting Authors form and proceed with the preparation of the
192 submission. Choose Export Submitter Info under the File menu to export
193 this to a file. You can then import this information in subsequent
194 submissions using the Import Submitter Info in the File menu. You will
195 need to fill in the manuscript title for each submission, however.
196
197 *Submission Page
198
199 **When May We Release Your Sequence Record?
200
201 #Please select one of the two radio buttons. If you select
202 "Immediately After Processing", the
203 entry will be released to the public after the database staff has added
204 it to the database. If you select "Release Date", fields will appear in
205 which you can indicate the date on which the sequences should be
206 released to the public. The submission will then be held back until
207 formal publication of the sequence or GenBank Accession number, or
208 until the release date, whichever comes first. The maximum hold
209 time is five years.
210
211 **Tentative Title for Manuscript
212
213 #Please enter a title that appropriately describes the sequence entry.
214 Later in the submission process, you will have the
215 opportunity to change this information and add details for published
216 or in press references.
217
218 *Contact Page
219
220 #Please enter the name, telephone and fax numbers, and email address of
221 the person who is submitting the sequence. This is the person who will
222 be contacted regarding the sequence submission. The phone, fax, and
223 email address will not be visible in the database record, but are
224 essential for contact by the database staff.
225
226 *Authors Page
227
228 #Please enter the names of the people who should receive scientific
229 credit for the generation of sequences in this entry. The person on
230 the Contact page is automatically listed as the first author. This
231 information can be changed if necessary. The author names should be
232 entered in the order first name, middle initial, surname. You can add
233 as many authors to this page as you wish. After you type in the name
234 of the third author, the box becomes a spreadsheet, and you can scroll
235 down to the next line by using the space bar. The consortium box
236 should only be used for consortium names, not institute or department
237 names.
238
239 *Affiliation Page
240
241 #Please enter information about the principal institution where the
242 sequencing was performed. This is not necessarily the same as the
243 workplace of the person described on the Contact page. This information
244 will show up in the reference section of the record, with the title
245 Direct Submission.
246
247 >Sequence Format Form
248
249 #Use this form to indicate the type, format and category of sequence
250 you are submitting.
251
252 #Sequin can process single nucleotide sequences, gapped sequences and
253 sets of related sequences. If the sequences are related in terms of
254 coming from the same publication, or the same organism, they may be
255 candidates for a Batch submission. Biologically related sequences may
256 be classified as environmental samples, population, phylogenetic,
257 mutation, or segmented sets as appropriate. Segmented sets consist of
258 a collection of non-overlapping sequences covering a specific genetic
259 region. In all cases, although the sequences are handled as a single
260 submission, each sequence in a set will receive its own database
261 Accession number and can be annotated independently.
262
263 #Sequin can display the alignments of sequences that are submitted as
264 part of an aligned phylogenetic, population, mutation, or
265 environmental samples set. Such sequences can be submitted in FASTA,
266 Contiguous (FASTA+GAP, NEXUS, MACAW), or Interleaved (PHYLIP, NEXUS)
267 formats. If the sequences are in FASTA format, Sequin can generate an
268 alignment. If the sequences have already been aligned in FASTA+GAP,
269 PHYLIP, MACAW, or NEXUS, Sequin will not change the alignment. If one
270 of the sequences in your alignment is already present in the
271 GenBank/EMBL/DDBJ database, you must mark that sequence so that it does
272 not receive a new Accession number. Instead of supplying that sequence
273 with a new Sequence Identifier, give it the identifier accU12345, where
274 U12345 is the Accession number of the sequence.
275
276 #Single sequences, gapped sequences, segmented sequences, and batch
277 submissions must be submitted in FASTA format.
278
279 *Submission Type
280
281 #Use the radio buttons to indicate which of the following types of
282 submissions you are creating:
283
284 #-Single sequence: a single mRNA or genomic DNA sequence. If you are
285 submitting multiple sequences from the same publication, consider a
286 Batch Submission. If you decide to submit multiple Sequin files, each
287 with one or more sequences, please send each file in a separate email
288 message.
289
290 #-Segmented sequence: a collection of non-overlapping, non-contiguous
291 sequences that cover a specified genetic region. A standard example is a set
292 of genomic DNA sequences that encode exons from a gene along with fragments of
293 their flanking introns. If the segmented set is part of an alignment,
294 however, select the appropriate Population, Phylogenetic, or Mutation study
295 button. The Gapped sequence option may better represent the biology of
296 these types of records.
297
298 #-Gapped sequence: a single, non-contiguous mRNA or genomic DNA sequence.
299 A gapped sequence contains specified gaps of known or unknown length
300 where the exact nucleotide sequence has not been determined. The FASTA
301 format for gapped sequences is slightly different and is explained
302 below.
303
304 #-Population study: a set of sequences that were derived by sequencing
305 the same gene from different isolates of the same organism.
306
307 #-Phylogenetic study: a set of sequences that were derived by sequencing
308 the same gene from different organisms.
309
310 #-Mutation study: a set of sequences that were derived by sequencing
311 multiple mutations of a single gene.
312
313 #-Environmental samples: a set of sequences that were derived by
314 sequencing the same gene from a population of unclassified or unknown
315 organisms.
316
317 #-Batch submission: a set of related sequences that are not part of a
318 population, mutation, or phylogenetic study. The sequences should be
319 related in some way, such as coming from the same publication or
320 organism. You should plan that all sequences will be released to the
321 public on the same date.
322
323 *Sequence Data Format
324
325 #If you are submitting a single, gapped, or segmented sequence, or a
326 batch submission, your sequence must be in FASTA format, described
327 below. If you are submitting a set of sequences as part of a
328 population, phylogenetic, or mutation study, you have a choice of
329 sequence formats. You may submit the set as individual sequences in
330 FASTA format. Alternatively, you can submit the sequences as part of
331 an alignment. Sequin currently accepts the alignment formats
332 FASTA+GAP, PHYLIP, MACAW, NEXUS Interleaved, and NEXUS Contiguous.
333
334 *Submission Category
335
336 #Use the radio buttons to indicate whether your sequence corresponds to
337 an original submission or a third-party annotation submission. If you
338 have directly sequenced the nucleotide sequence in your laboratory,
339 your submission would be considered an original submission.
340
341 #If you have downloaded the sequence from GenBank and added to it your
342 own annotations, your entry may be eligible for submission to the
343 Third-Party Annotation Database
344
345 <A HREF="http://www.ncbi.nlm.nih.gov/Genbank/TPA.html">
346 (TPA)
347 </A>
348 .
349
350 #In order to be released into the TPA database, the sequence must appear
351 in a peer-reviewed publication in a biological journal. If you select
352 this option, a pop-up box will appear upon the completion of the
353 Sequence Format form. You must provide some description of the
354 biological experiments used as evidence for the annotation of your TPA
355 submission in this box.
356
357 #You will be asked later in the submission process to provide the GenBank
358 Accession number(s) of the primary sequence(s) from which your TPA
359 submission was derived.
360
361 >Organism and Sequences Form
362
363 #This form is made up of four pages. If your sequences are imported as
364 properly formatted FASTA files, minimal input will be necessary
365 on these pages.
366
367 >FASTA Format for Nucleotide Sequences
368
369 #In FASTA format the line before the nucleotide sequence, called the
370 FASTA definition line, must begin with a greater-than sign (">"), followed by a
371 unique SeqID (sequence identifier). The SeqID must be unique for each
372 nucleotide sequence and should not contain any spaces. Use of brackets
373 ("[]") in the SeqID is also prohibited. The identifier will be
374 replaced with an Accession number by the database staff when your
375 submission is processed.
376
377 #Information about the source organism from which the sequence was
378 obtained follows the SeqID and must be in the format [modifier=text].
379 Do not put spaces around the "=". At minimum, the scientific name of
380 the organism should be included. Optional modifiers can be added to
381 provide additional information. A complete list of available source
382 <A HREF="http://www.ncbi.nlm.nih.gov/Sequin/modifiers.html">
383 modifiers
384 </A>
385 and their format is available.
386
387 #The final optional component of the FASTA definition line is the
388 sequence title, which will be used as the DEFINITION field in the final
389 flatfile. The title should contain a brief description of the
390 sequence. There is a preferred format for nucleotide and protein
391 titles and Sequin can generate them automatically using the Generate
392 Definition Line function under the Annotate menu in the record viewer.
393
394 #Note that in all cases, the FASTA definition line must not contain any hard
395 returns. All information must be on a single line of text. If you
396 have trouble importing your FASTA sequences, please double check that
397 no returns were added to the FASTA definition line by your editing
398 software.
399
400 #Examples of properly formatted FASTA definition lines for nucleotide
401 sequences are:
402
403 <KBD><PRE>>Seq1 [organism=Mus musculus] [strain=C57BL/6] Mus musculus neuropilin 1 (Nrp1) mRNA, complete cds.
404 </PRE></KBD>
405 <KBD><PRE>>ABCD [organism=Plasmodium falciparum] [isolate=ABCD] Plasmodium falciparum isolate ABCD merozoite surface protein 2 (msp2) gene, partial cds.
406 </PRE></KBD>
407 <KBD><PRE>>DNA.new [organism=Homo sapiens] [chromosome=17] [map=17q21] [moltype=mRNA] Homo sapiens breast and ovarian cancer susceptibility protein (BRCA1) mRNA, complete cds.
408 </PRE></KBD>
409 #The line after the FASTA definition line begins the nucleotide
410 sequence. Unlike the FASTA definition line, the nucleotide sequence
411 itself can contain returns. It is recommended that each line of
412 sequence be no longer than 80 characters. Please only use IUPAC
413 symbols within the nucleotide sequence. For sequences that are not
414 contained within an alignment, do not use "?" or "-" characters. These
415 will be stripped from the sequence. Use the IUPAC approved symbol "N"
416 for ambiguous characters instead.
417
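#The definition-line layout described above (SeqID, bracketed modifiers, optional title) can be sketched in Python. This is a minimal illustration, not Sequin's own parser: the function name parse_defline and the regular expression are assumptions made for this example, and the sketch checks only the rules stated in this section.

```python
import re

# One bracketed modifier such as [organism=Mus musculus] (illustrative pattern).
MODIFIER_RE = re.compile(r"\[([^=\[\]]+)=([^\[\]]*)\]")


def parse_defline(line):
    """Split a FASTA definition line into (seqid, modifiers, title)."""
    line = line.strip()
    if "\n" in line or "\r" in line:
        raise ValueError("definition line must be a single line of text")
    if not line.startswith(">"):
        raise ValueError("definition line must begin with '>'")
    # The SeqID runs from '>' to the first space and may not contain brackets.
    seqid, _, rest = line[1:].partition(" ")
    if not seqid:
        raise ValueError("missing SeqID")
    if "[" in seqid or "]" in seqid:
        raise ValueError("SeqID must not contain brackets")
    modifiers = dict(MODIFIER_RE.findall(rest))
    # Whatever remains once the modifiers are removed is treated as the title.
    title = " ".join(MODIFIER_RE.sub(" ", rest).split())
    return seqid, modifiers, title


if __name__ == "__main__":
    example = (">Seq1 [organism=Mus musculus] [strain=C57BL/6] "
               "Mus musculus neuropilin 1 (Nrp1) mRNA, complete cds.")
    print(parse_defline(example))
```

#A check like this can be run on a file before import to catch a SeqID with brackets or a definition line that was wrapped by an editor.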
418 #A single file containing multiple FASTA sequences can be imported into
419 Sequin in order to create a
420 <A HREF="#SubmissionType">
421 Batch Submission
422 </A>
423 . Make sure that the FASTA definition line for each sequence is
424 formatted as above.
425
426 #If the FASTA definition line is not properly formatted a pop-up box
427 will appear upon importing the nucleotide FASTA. The top box in this
428 pop-up will list any errors in the FASTA definition lines, including
429 missing SeqIDs, duplicate SeqIDs for different sequences, or improperly
430 formatted modifiers. You can add or edit this information in the
431 spreadsheet provided. The toggle at the bottom of the pop-up allows
432 you to select whether all sequences or only those with errors are
433 listed in the spreadsheet above. After making changes, click on Refresh
434 Error List to ensure that all errors have been corrected. You must
435 correct any errors involving the SeqID in order to proceed with your
436 submission.
437
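#The SeqID errors listed above (missing SeqIDs and duplicate SeqIDs for different sequences) can be checked before import. The following Python sketch is an assumption-laden illustration, not the logic Sequin uses: the function name seqid_errors and the message wording are hypothetical.

```python
from collections import Counter


def seqid_errors(fasta_text):
    """Return a list of SeqID problems found in multi-sequence FASTA text."""
    seqids = []
    errors = []
    for number, line in enumerate(fasta_text.splitlines(), start=1):
        if not line.startswith(">") or line.startswith(">?"):
            continue  # sequence data, or a gap line in a gapped sequence
        seqid = line[1:].split(" ", 1)[0]
        # A definition line that goes straight into "[modifier=text]" has no SeqID.
        if not seqid or seqid.startswith("["):
            errors.append(f"line {number}: missing SeqID")
        else:
            seqids.append(seqid)
    for seqid, count in Counter(seqids).items():
        if count > 1:
            errors.append(f"SeqID {seqid!r} is used by {count} sequences")
    return errors
```

#An empty list means that no SeqID problems of these two kinds were found. Improperly formatted modifiers are not checked here.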
438 *FASTA Format for Gapped Sequence
439
440 #The FASTA definition line for a gapped sequence follows the same format
441 as above. To indicate a gap within the sequence, enter a hard return
442 within the sequence at the point of the gap, then insert an extra line
443 starting with a greater-than sign (">") and a question mark ("?"). If the gap size
444 is unknown, enter "unk100" after the question mark. If the gap size is
445 known, enter the length of the gap after the question mark. For
446 example,
447
448 !>Dobi [organism=Canis familiaris] [breed=Doberman pinscher]
449 !AAATGCATGGGTAAAAGTAGTAGAAGAGAAGGCTTTTAGCCCAGAAGTAATACCCATGTTTTCAGCATTA
450 !GGAAAAAGGGCTGTTG
451 !>?unk100
452 !TGGATGACAGAAACCTTGTTGGTCCAAAATGCAAACCCAGATKGTAAGACCATTTTAAAAGCATTGGGTC
453 !TTAGAAATAGGGCAACACAGAACAAAAAT
454 !>?234
455 !AAAAATAAAAGCATTAGTAGAAATTTGTACAGAACTGGAAAAGGAAGGAAAAATTTCAAAAATTGGGCCT
456 !GAAAACCCATACAATACTCCGGG
457
458 will generate a sequence containing two gaps. The first gap is of
459 unknown length, the second is 234 nucleotides long.
460
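#The gap convention above can be read back with a few lines of Python. This is a sketch under stated assumptions: the function name parse_gapped_fasta and the ("gap", length) tuples are invented for this illustration, and only a single gapped sequence per input is handled.

```python
def parse_gapped_fasta(text):
    """Return (defline, parts); parts alternate sequence strings and gaps.

    A gap is represented as ("gap", length), where length is None for a
    ">?unk100" line (unknown size) and an integer otherwise.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">") or lines[0].startswith(">?"):
        raise ValueError("expected a FASTA definition line first")
    defline, parts, current = lines[0], [], []
    for line in lines[1:]:
        if line.startswith(">?"):
            # Close the sequence piece collected so far, then record the gap.
            parts.append("".join(current))
            current = []
            size = line[2:]
            parts.append(("gap", None if size == "unk100" else int(size)))
        elif line.startswith(">"):
            raise ValueError("only one gapped sequence is handled in this sketch")
        else:
            current.append(line)
    parts.append("".join(current))
    return defline, parts
```

#Applied to the example above (without the leading "!" markers), parts would hold three sequence pieces separated by one gap of unknown length and one gap of 234 nucleotides.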
461 *FASTA+GAP Format for Aligned Nucleotide Sequences
462
463 #A number of programs output sets of aligned sequences in FASTA format.
464 Frequently, to align these sequences, gaps must be inserted. The
465 default alignment settings should correctly interpret gap and ambiguous
466 characters in most cases. If Sequin cannot read your alignment, you
467 may need to change these settings using the Optional Alignment Settings
468 button on the
469 <A HREF="#NucleotidePage">
470 Nucleotide Page
471 </A>
472 form. Each sequence, including gaps, must be the same length. The
473 gaps will only show up in the alignment, not in the individual sequence
474 in the database.
475
476 #Sequences in FASTA+GAP format resemble FASTA sequences. The previous
477 section on
478
479 <A HREF="#FASTAFormatforNucleotideSequences">
480 FASTA Format for Nucleotide Sequences
481 </A>
482
483 has instructions for formatting FASTA sequences. If one of the
484 sequences in your alignment is already present in the GenBank/EMBL/DDBJ
485 database, you must mark that sequence so that it does not receive a new
486 Accession number. To do this, use a SeqID in the format accU12345,
487 where U12345 is the Accession number of the pre-existing sequence. All
488 sequences in FASTA+GAP format should be in the same file.
489
490 #The following is an example of FASTA+GAP format:
491
492 !>A-0V-1-A [organism=Gallus gallus] [clone=C]
493 !TCACTCTTTGGCAACGACCCGTCGTCATAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
494 !TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
495 !
496 !>A-0V-2-A [organism=Drosophila melanogaster] [strain=D]
497 !TCACTCTTTGGCAAC---GCGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
498 !TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
499 !
500 !>A-0V-3-A [organism=Caenorhabditis elegans] [strain=E]
501 !TCACTCTTTGGCAAC---GCGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
502 !TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
503 !
504 !>A-0V-4-A [organism=Rattus norvegicus] [strain=F]
505 !TCACTCTTTGGCAACGACCCGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
506 !TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
507 !
508 !>A-0V-7-A [organism=Aspergillus nidulans] [strain=G]
509 !TCACTCTTTGGCAACGACCAGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
510 !TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
511
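#The requirement that every sequence in a FASTA+GAP file, gaps included, has the same length can be verified before import. The Python sketch below is illustrative only: read_aligned_fasta and check_alignment are hypothetical helper names, and "-" is assumed to be the gap character.

```python
def read_aligned_fasta(text):
    """Return {seqid: aligned_sequence} from FASTA+GAP text, in file order."""
    records, seqid = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            seqid = line[1:].split(" ", 1)[0]
            records[seqid] = ""
        elif seqid is None:
            raise ValueError("sequence data found before any definition line")
        else:
            records[seqid] += line
    return records


def check_alignment(records, gap="-"):
    """Verify equal aligned lengths, then return the sequences without gaps."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) > 1:
        raise ValueError(f"aligned sequences differ in length: {sorted(lengths)}")
    # The gaps belong to the alignment only, not to the individual sequences.
    return {seqid: seq.replace(gap, "") for seqid, seq in records.items()}
```

#The second function mirrors the statement that gaps show up only in the alignment: the returned sequences have the gap characters removed.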
512 *PHYLIP Format for Aligned Nucleotide Sequences
513
514 #A number of programs output sets of aligned sequences in PHYLIP format.
515
516 #The following is an example of PHYLIP format.
517
518 ! 5 100
519 !A-0V-1-A TCACTCTTTG GCAACGACCC GTCGTCATAA TAAAGATAGA GGGGCAACTA
520 !A-0V-2-A TCACTCTTTG GCAAC---GC GTCGTCACAA TAAAGATAGA GGGGCAACTA
521 !A-0V-3-A TCACTCTTTG GCAAC---GC GTCGTCACAA TAAAGATAGA GGGGCAACTA
522 !A-0V-4-A TCACTCTTTG GCAACGACCC GTCGTCACAA TAAAGATAGA GGGGCAACTA
523 !A-0V-7-A TCACTCTTTG GCAACGACCA GTCGTCACAA TAAAGATAGA GGGGCAACTA
524 !
525 !
526 ! AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
527 ! AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
528 ! AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
529 ! AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
530 ! AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
531
532 #In this example, the first line indicates that there are 5 sequences,
533 each with 100 nt of sequence. The following five lines contain the
534 Sequence IDs, followed by the sequences. Specifically, the sequence
535 identifier for the first sequence is A-0V-1-A. Note that subsequent
536 blocks of sequence do not contain the Sequence ID. If one of the
537 sequences in your alignment is already present in the GenBank/EMBL/DDBJ
538 database, you must mark that sequence so that it does not receive a new
539 Accession number. To do this, use a SeqID in the format accU12345,
540 where U12345 is the Accession number of the pre-existing sequence.
541
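#The interleaved layout described above (a count line, a first block with Sequence IDs, then blocks without them) can be reassembled as follows. This Python sketch assumes that each Sequence ID is separated from the sequence by white space, as in the example; strict PHYLIP files with fixed-width names would need different handling. The function name parse_phylip_interleaved is hypothetical, and any source modifier lines appended to the file are not handled.

```python
def parse_phylip_interleaved(text):
    """Return {seqid: sequence} from interleaved PHYLIP text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # First line: number of sequences and number of characters per sequence.
    ntax, nchar = (int(field) for field in lines[0].split())
    ids, chunks = [], []
    # First block: each line starts with the Sequence ID.
    for line in lines[1:1 + ntax]:
        fields = line.split()
        ids.append(fields[0])
        chunks.append("".join(fields[1:]))
    # Later blocks: sequence only, in the same order as the first block.
    for index, line in enumerate(lines[1 + ntax:]):
        chunks[index % ntax] += "".join(line.split())
    sequences = dict(zip(ids, chunks))
    for seqid, seq in sequences.items():
        if len(seq) != nchar:
            raise ValueError(f"{seqid}: expected {nchar} characters, got {len(seq)}")
    return sequences
```

#The final length check compares each reassembled sequence against the character count given on the first line of the file.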
542 #The default alignment settings should correctly interpret gap and
543 ambiguous characters in most cases. If Sequin cannot read your
544 alignment, you may need to change these settings using the Optional
545 Alignment Settings button on the
546 <A HREF="#NucleotidePage">
547 Nucleotide Page
548 </A>
549 form.
550
551 #You can modify the PHYLIP format so that Sequin can
552 determine the correct organism and any other modifiers for each
553 sequence. An example of such modifications is below in the section on
554 <A HREF="#SourceModifiersforPHYLIPandNEXUS">
555 Source Modifiers for PHYLIP and NEXUS
556 </A>
557 .
558 #Alternatively, you can leave your sequence alignment in
559 standard PHYLIP format and enter the organism, strain, chromosome, etc.
560 information on the following
561
562 <A HREF="#ImportSourceModifiers">
563 Source Modifiers form
564 </A>
565 .
566
567 *NEXUS Format for Aligned Nucleotide Sequences
568
569 #A number of programs output sets of aligned sequences in one of two
570 NEXUS formats, NEXUS Interleaved and NEXUS Contiguous.
571
572 #NEXUS files can contain ? for "missing" at the 5' and 3' ends of
573 sequences, as long as this parameter is properly defined within the
574 header of the NEXUS file.
575
576 #The following is an example of NEXUS Interleaved format.
577
578 !#NEXUS
579 !
580 !begin data;
581 ! dimensions ntax=5 nchar=100;
582 ! format datatype=dna missing=? gap=- interleave;
583 ! matrix
584 !
585 !A-0V-1-A TCACTCTTTG GCAACGACCC GTCGTCATAA TAAAGATAGA GGGGCAACTA
586 !A-0V-2-A TCACTCTTTG GCAAC---GC GTCGTCACAA TAAAGATAGA GGGGCAACTA
587 !A-0V-3-A TCACTCTTTG GCAAC---GC GTCGTCACAA TAAAGATAGA GGGGCAACTA
588 !A-0V-4-A TCACTCTTTG GCAACGACCC GTCGTCACAA T????ATAGA GGGGCAACTA
589 !A-0V-7-A TCACTCTTTG GCAACGACCA GTCGTCACAA TAAAGATAGA GGGGCAACTA
590 !
591 !
592 !A-0V-1-A AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
593 !A-0V-2-A AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
594 !A-0V-3-A AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
595 !A-0V-4-A AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
596 !A-0V-7-A AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
597
598 #In this example, the first few lines provide information about the data
599 in the sequence alignment. The following five lines contain the
600 Sequence IDs, followed by the sequences. Specifically, the sequence
601 identifier for the first sequence is A-0V-1-A. Note that subsequent
602 blocks of sequence also contain the Sequence ID. If one of the
603 sequences in your alignment is already present in the GenBank/EMBL/DDBJ
604 database, you must mark that sequence so that it does not receive a new
605 Accession number. To do this, use a SeqID in the format accU12345,
606 where U12345 is the Accession number of the pre-existing sequence.
607 Also, Sequin will replace the "?" characters in the sequences with "N"s
608 since they are defined as "missing" data in the header. The default
609 alignment settings should correctly interpret gap and ambiguous
610 characters in most cases. If Sequin cannot read your alignment, you
611 may need to change these settings using the Optional Alignment Settings
612 button on the
613 <A HREF="#NucleotidePage">
614 Nucleotide Page
615 </A>
616 form.
617
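#The replacement of "?" with "N" described above depends on the symbols declared in the NEXUS header. A rough Python sketch of reading those symbols follows. It is not Sequin's reader: nexus_symbols and missing_to_n are hypothetical names, and the regular expressions cover only simple format statements like those in the examples in this section.

```python
import re


def nexus_symbols(nexus_text):
    """Return the (missing, gap) characters declared in the format statement."""
    match = re.search(r"^\s*format\b([^;]*);", nexus_text,
                      re.IGNORECASE | re.MULTILINE)
    if match is None:
        raise ValueError("no format statement found")
    # Collect key=value settings such as missing=? or GAP=- (keys lowercased).
    settings = {
        key.lower(): value
        for key, value in re.findall(r"(\w+)\s*=\s*(\S+)", match.group(1))
    }
    return settings.get("missing"), settings.get("gap")


def missing_to_n(sequence, missing):
    """Replace the declared 'missing' character with N, if one was declared."""
    return sequence.replace(missing, "N") if missing else sequence
```

#If the header does not define a missing character, the first function returns None for it and the sequence is left unchanged by the second.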
618 #You can modify either NEXUS format so that Sequin can
619 determine the correct organism and any other modifiers for each
620 sequence. An example of such modifications is below in the section on
621 <A HREF="#SourceModifiersforPHYLIPandNEXUS">
622 Source Modifiers for PHYLIP and NEXUS
623 </A>
624 .
625 #Alternatively, you can leave your sequence alignment in
626 standard NEXUS format and enter the organism, strain, chromosome, etc.
627 information on the following
628
629 <A HREF="#SourceModifiersForm">
630 Source Modifiers form
631 </A>
632 .
633 #The following is an example of NEXUS Contiguous format.
634
635 !#NEXUS
636 !BEGIN DATA;
637 !DIMENSIONS NTAX=5 NCHAR=100;
638 !FORMAT MISSING=? GAP=- DATATYPE=DNA ;
639 !MATRIX
640 !
641 !A-0V-1-A
642 !TCACTCTTTGGCAACGACCCGTCGTCATAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
643 !TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
644 !
645 !A-0V-2-A
646 !TCACTCTTTGGCAAC---GCGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
647 !TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
648 !
649 !A-0V-3-A
650 !TCACTCTTTGGCAAC---GCGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
651 !TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
652 !
653 !A-0V-4-A
654 !TCACTCTTTGGCAACGACCCGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
655 !TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
656 !
657 !A-0V-7-A
658 !TCACTCTTTGGCAACGACCAGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
659 !TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
660
661 #In this example, the first few lines provide information about the data
662 in the sequence alignment. In the Contiguous format, each Sequence ID
663 appears on its own line, followed by the complete sequence for that
664 entry. Specifically, the sequence identifier for the first sequence is
665 A-0V-1-A. If one of the
666 sequences in your alignment is already present in the GenBank/EMBL/DDBJ
667 database, you must mark that sequence so that it does not receive a
668 new Accession number. To do this, use a SeqID in the format accU12345,
669 where U12345 is the Accession number of the pre-existing sequence.
670
671 #You can modify either NEXUS format so that Sequin can
672 determine the correct organism and any other modifiers for each
673 sequence. An example of such modifications is below in the section on
674 <A HREF="#SourceModifiersforPHYLIPandNEXUS">
675 Source Modifiers for PHYLIP and NEXUS
676 </A>
677 .
678 #Alternatively, you can leave your sequence alignment in
679 standard NEXUS format and enter the organism, strain, chromosome, etc.
680 information on the following
681
682 <A HREF="#SourceModifiersForm">
683 Source Modifiers form
684 </A>
685 .
686
687 **Source Modifiers for PHYLIP and NEXUS
688
689 #You can modify the PHYLIP or NEXUS formats so that Sequin can determine
690 the correct organism and any other modifiers for each sequence by
691 adding lines at the end of the file. The first line applies to the
692 first sequence, the second line to the second sequence, and so on. You
693 must have one line for each sequence. These inserted lines contain
694 modifiers formatted as in the FASTA definition line, but do not begin
695 with a SeqID. Instead, the SeqID is present at the beginning of the
696 sequence lines as shown above.
697
698 #Each of these added lines starts with the character ">". The
699 scientific organism name follows in brackets. Optional modifiers also
700 follow in brackets. For further information on the data that can go in
701 these lines, see the instructions entitled "FASTA
702 Format for Nucleotide Sequences",
703
704 <A HREF="#FASTAFormatforNucleotideSequences">
705 above.
706 </A>
707
708 #The following lines indicating the organisms and strain of each sequence
709 would follow immediately after the sequence in the PHYLIP and NEXUS
710 examples, above.
711
712 !;
713 !END;
714 !
715 !begin ncbi;
716 !sequin
717 !>[organism=Gallus gallus] [clone=C]
718 !>[organism=Drosophila melanogaster] [strain=D]
719 !>[organism=Caenorhabditis elegans] [strain=E]
720 !>[organism=Rattus norvegicus] [strain=F]
721 !>[organism=Aspergillus nidulans] [strain=G]
722 !;
723 !end;
724
725 #The number of lines of source information must exactly match the number
726 of sequences provided. Complete examples can be found in the
727 <A HREF="http://www.ncbi.nlm.nih.gov/Sequin/QuickGuide/sequin.htm#AlignmentFormats">
728 Alignment Formats
729 </A>
730 section of the Sequin Quick Guide.
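
#The block layout shown above can be generated programmatically. The
following Python code is a minimal sketch, not part of Sequin: the
function names build_ncbi_block and check_source_count are hypothetical,
and the code assumes that each sequence's source information is held in
a dictionary with an "organism" key plus any optional modifiers. It
produces one ">" line per sequence and checks that the number of source
lines matches the number of sequences.

```python
def build_ncbi_block(sources):
    """Return the lines of a 'begin ncbi' block, one '>' line per sequence.

    sources is a list of dicts in alignment order; each dict needs an
    'organism' key and may carry other modifiers such as 'strain'.
    (Hypothetical helper, not a Sequin API.)
    """
    lines = ["begin ncbi;", "sequin"]
    for src in sources:
        if not src.get("organism"):
            raise ValueError("every sequence needs an organism name")
        # Organism first, then the remaining modifiers in the order given.
        parts = ["[organism=%s]" % src["organism"]]
        parts += ["[%s=%s]" % (k, v) for k, v in src.items() if k != "organism"]
        lines.append(">" + " ".join(parts))
    lines += [";", "end;"]
    return lines


def check_source_count(seq_ids, sources):
    """Raise if the number of source lines differs from the number of sequences."""
    if len(seq_ids) != len(sources):
        raise ValueError(
            "%d sequences but %d source lines" % (len(seq_ids), len(sources))
        )


seq_ids = ["A-0V-1-A", "A-0V-2-A"]
sources = [
    {"organism": "Gallus gallus", "clone": "C"},
    {"organism": "Drosophila melanogaster", "strain": "D"},
]
check_source_count(seq_ids, sources)
print("\n".join(build_ncbi_block(sources)))
```

#The resulting lines would be appended after the closing ";" and "END;"
of the alignment data, as in the example above.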
731
732 #Alternatively, you can leave your sequence alignment in
733 standard NEXUS or PHYLIP format and enter the organism, strain, chromosome, etc.
734 information on the following
735
736 <A HREF="#OrganismPage">
737 Organism Page
738 </A>
739 .
740
741 >Nucleotide Page
742
743 #The options on this page will vary depending on the
744 <A HREF="#SubmissionType">
745 Submission Type
746 </A>
747 and
748 <A HREF="#SequenceDataFormat">
749 Sequence Data Format
750 </A>
751 selected earlier. Segmented sets and gapped sequences must be imported
752 as properly formatted FASTA files. Details about importing alignment
753 files are
754 <A HREF="#NucleotidePageforAlignedDataFormats">
755 below
756 </A>
757 .
758
759 *Nucleotide Page for FASTA Data Format
760
761 **Create Alignment
762
763 #If you have selected a Population study, Phylogenetic study, Mutation
764 study, or Environmental samples set as a
765 <A HREF="#SubmissionType">
766 Submission Type
767 </A>
768 , a check box will appear at the top of the Nucleotide Page. If you
769 check 'Create Alignment', Sequin will attempt to generate an alignment
770 of the sequences within your submission.
771
772 **Import Nucleotide FASTA
773
774 #Use this button to import your properly formatted
775 <A HREF="#FASTAFormatforNucleotideSequences">
776 FASTA file
777 </A>
778 . You will see a window containing information about the imported
779 sequence(s). Please check the number of sequences, Sequence IDs
780 (SeqIDs) and length of each sequence to make sure they are correct. If
781 you have included source information within the FASTA definition line,
782 this will also be listed.
783
784 **Add/Modify Sequences
785
786 #This option allows you to add or modify sequences without using a
787 previously formatted FASTA file, but is not available if you have
788 selected a Segmented sequence or Gapped sequence as a
789 <A HREF="#SubmissionType">
790 Submission Type
791 </A>
792 . In the Specify Sequences box you can either import a nucleotide FASTA
793 or add a new sequence. If you choose Add New Sequence, a new box will
794 pop up where you can either import an existing sequence file or
795 directly paste or type the nucleotide sequence.
796
797 #If you add a sequence where the FASTA definition line is not properly
798 formatted, a pop-up box will appear. The top box in this pop-up will
799 list any errors in the FASTA definition lines, including missing
800 SeqIDs, duplicate SeqIDs for different sequences, or improperly
801 formatted modifiers. You can add or edit this information in the
802 spreadsheet provided. The toggle at the bottom of the pop-up allows
803 you to select whether all sequences or only those with errors are
804 listed in the spreadsheet above. After making changes, click on
805 Refresh Error List to ensure that all errors have been corrected. You
806 must correct any errors involving the SeqID in order to proceed with
807 your submission. Click on Accept to save your sequences and return to
808 the Specify Sequences box.
809
810 #In the Specify Sequences box, you can choose to add another sequence or
811 select a sequence from the list and choose to edit or delete it. You
812 can also delete all sequences at this point. You will need to click on
813 Done to save your sequences and return to the Nucleotide Page.
814
815 **Clear Sequences
816
817 #This option will remove all imported nucleotide sequences.
818
819 **Specify Molecule
820
821 #A database sequence can represent one of several different molecule
822 types. The default molecule is genomic DNA. If the sequence was not
823 derived from genomic DNA, you can edit that information here. If you
824 are submitting multiple sequences you can apply one molecule type to
825 all sequences or apply the molecule type to each sequence individually.
826 Enter in the Molecule pop-up menu the type of molecule that was
827 sequenced.
828
829 #-Genomic DNA: Sequence derived directly from the DNA of an organism.
830 Note: The DNA sequence of an rRNA gene has this molecule type, as does
831 that from a naturally-occurring plasmid.
832
833 #-Genomic RNA: Sequence derived directly from the genomic RNA of certain
834 organisms, such as viruses.
835
836 #-Precursor RNA: An RNA transcript before it is processed into mRNA,
837 rRNA, tRNA, or other cellular RNA species.
838
839 #-mRNA[cDNA]: A cDNA sequence derived from mRNA.
840
841 #-Ribosomal RNA: A sequence derived from the RNA in ribosomes. This
842 should only be selected if the RNA itself was isolated and sequenced.
843 If the gene for the ribosomal RNA was sequenced, select Genomic DNA.
844
845 #-Transfer RNA: A sequence derived from the RNA in a transfer RNA, for
846 example, the sequence of a cDNA derived from tRNA.
847
848 #-Small nuclear RNA: A sequence derived from small nuclear RNA, for
849 example, the sequence of a cDNA derived from snRNA.
850
851 #-Small cytoplasmic RNA: A sequence derived from small cytoplasmic RNA,
852 for example, the sequence of a cDNA derived from small cytoplasmic RNA.
853
854 #-Other-Genetic: A synthetically derived sequence including cloning
855 vectors and tagged fusion constructs.
856
857 #-cRNA: A sequence derived from complementary RNA transcribed from DNA,
858 mainly used for viral submissions.
859
860 #-Small nucleolar RNA: A sequence derived from small nucleolar RNA, for
861 example, the sequence of a cDNA derived from snoRNA.
862
863 #-Transcribed RNA: A sequence derived from any transcribed RNA not
864 listed above.
865
866 #-Transfer-messenger RNA: A sequence derived from transfer-messenger RNA,
867 which acts as a tRNA first and then an mRNA that encodes a peptide tag.
868 If the gene for the tmRNA was sequenced, use genomic DNA.
869
870 **Specify Topology
871
872 #Most sequences have a Linear topology and this is the default. You
873 should change this setting to Circular only if the sequence is complete
874 and it has a circular topology. For example, a complete plasmid or a
875 complete mitochondrial genome would have a Circular topology, but a
876 single gene from a plasmid or mitochondrion would have a Linear
877 topology. If you are submitting multiple sequences you can apply one
878 topology to all sequences or set the topology for each sequence
879 individually.
880
881 *Nucleotide Page for Aligned Data Formats
882
883 **Sequence Characters
884
885 #If you are submitting a set of aligned sequences, you can specify sequence
886 characters used in your alignment here. Sequin requires that you
887 define any non-IUPAC nucleotide characters in your alignment file. The
888 five types of variable characters are listed under Sequence Characters.
889
890 #Every sequence within an alignment file must contain the same number of
891 characters (nucleotides + gaps). Gap characters are used to represent the
892 spaces between contiguous nucleotides in an alignment. Gaps that appear at
893 the beginning or end of a sequence are treated differently than gaps that
894 appear between nucleotides and each must be defined. GenBank prefers to
895 use a hyphen (-) to represent gaps. If you use a different character to
896 represent a gap, you will need to add this character to the list in the
897 Beginning Gap, Middle Gap, or End Gap boxes.
898
899 #Ambiguous characters represent nucleotides that are known to exist, but
900 whose identity has not been experimentally validated. GenBank prefers to
901 use 'n' to represent any ambiguous nucleotides. If you are using a
902 different character to represent an ambiguous base, you will need to add
903 this character to the list in the Ambiguous/Unknown box. Sequin will
904 convert these characters to 'n's when your file is imported.
905
906 #Match characters denote nucleotides that are identical in every member of
907 an alignment. GenBank prefers the use of a colon (:) to represent match
908 characters. If you are using a different character to represent a match
909 character, you will need to add this character to the list in the Match box.
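
#The two requirements above (equal row lengths and declared special
characters) can be checked before import. The following Python code is
a minimal sketch under stated assumptions: the function check_alignment
is hypothetical, the alignment is assumed to be already parsed into a
mapping of SeqID to sequence string, and the default gap, ambiguous,
and match characters are the ones GenBank prefers as described above.

```python
# IUPAC nucleotide codes, upper case (U is included for RNA).
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def check_alignment(rows, gap="-", ambiguous="n", match=":"):
    """Check an alignment held as {SeqID: aligned sequence}.

    Returns (lengths_ok, extras): lengths_ok is True when every row has
    the same number of characters, and extras is the sorted list of
    characters that are neither IUPAC nucleotide codes nor one of the
    declared gap, ambiguous, or match characters.
    (Hypothetical helper, not a Sequin API.)
    """
    lengths = {len(seq) for seq in rows.values()}
    allowed = IUPAC_NUCLEOTIDES | set((gap + ambiguous + match).upper())
    used = {c for seq in rows.values() for c in seq.upper()}
    return len(lengths) <= 1, sorted(used - allowed)


rows = {
    "A-0V-1-A": "TCACTCTTTGGCAACGACCCGTCG",
    "A-0V-2-A": "TCACTCTTTGGCAAC---GCGTCG",
}
print(check_alignment(rows))
```

#Any character reported in the second element of the result would need
to be added to the appropriate Sequence Characters box before the
alignment is imported.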
910
911 **Import Nucleotide Alignment
912
913 #Once you have imported the alignment using the Import Nucleotide
914 Alignment button, you can edit the molecule information using the
915 <A HREF="#SpecifyMolecule">
916 Specify Molecule
917 </A>
918 and
919 <A HREF="#SpecifyTopology">
920 Specify Topology
921 </A>
922 buttons explained above. Note that you cannot access the
923 <A HREF="#Add/ModifySequences">
924 Add/Modify Sequences
925 </A>
926 dialog for submissions of aligned sequences.
927
928 >Organism Page
929
930 #Information about the organism from which the sequence was derived
931 should be entered or edited on this page. If there are any potential
932 problems with the organism information previously provided in either
933 the
934 <A HREF="#FASTAFormatforNucleotideSequences">
935 FASTA definition line
936 </A>
937 or entered in the
938 <A HREF="#Add/ModifySequences">
939 Add/Modify Sequences
940 </A>
941 dialog, a window listing these problems will appear at the top of the
942 form. Please review these problems and edit using the
943 <A HREF="#AddSourceModifiers">
944 Add Source Modifiers
945 </A>
946 button as necessary. At minimum, you must supply
947 the scientific name of the organism from which the sequence was
948 obtained in order to proceed with your submission.
949
950 #The second window is a summary of the organism information provided so
951 far. Double clicking on a line of text within this window will launch a
952 modifier-specific editing window. In each of these windows, you can
953 edit the available information for the specific modifier. In most
954 cases, you have the choice to edit the modifier for each sequence
955 separately, or to enter text and select Apply above value to all
956 sequences. These changes will be reflected in the windows of the
957 Organism page immediately upon closing the modifier-specific editor.
958
959 *Add Organisms, Locations, and Genetic Codes
960
961 #If you have not added organism information using either the
962 <A HREF="#FASTAFormatforNucleotideSequences">
963 FASTA definition line
964 </A>
965 or the
966 <A HREF="#Add/ModifySequences">
967 Add/Modify Sequences
968 </A>
969 dialog, you can use the Add Organisms, Locations, and Genetic Codes
970 button to do so at this point. This button will launch the Multiple
971 Organism Editor pop-up where you may add or edit existing information
972 concerning the
973 <A HREF="#Organism">
974 Organism
975 </A>
976 name,
977 <A HREF="#Location">
978 Location
979 </A>
980 and
981 <A HREF="#GeneticCode">
982 Genetic Code
983 </A>
984 . The SeqID of each sequence is listed at the left of the spreadsheet
985 format. You can change the information in the spreadsheet individually
986 or globally for all sequences.
987
988 **Organism
989
990 #The scrollable list at the top of the pop-up contains the scientific
991 names of many organisms. To reach a name on the list, type the first
992 few letters of the scientific name into the box above the list or the
993 appropriate box in the spreadsheet. The list will scroll to the names
994 beginning with those letters, and you can select the organism within
995 the list itself. You can then use the arrow button to copy this name
996 into the appropriate box in the spreadsheet.
997
998 #To apply the same scientific name to all sequences in the submission,
999 click on the Organism button in the spreadsheet column header. A
1000 separate pop-up box will appear with the same organism list. You can
1001 select a name from this list and choose Accept to apply this name to
1002 all sequences.
1003
1004 #If you have any questions about the scientific name of an organism, see
1005 the NCBI
1006 <A HREF="http://www.ncbi.nlm.nih.gov/Taxonomy/tax.html">
1007 Taxonomy Browser
1008 </A>
1009 http://www.ncbi.nlm.nih.gov/Taxonomy/tax.html
1010
1011 #If the name of the organism is not on the list, type it in directly. If
1012 you do not know the scientific name, please be as specific as you can
1013 and include a unique identifier, such as a clone, isolate, strain or
1014 voucher number, or cultivar name, e.g., Nostoc ATCC29106, uncultured
1015 spirochete Im403, Lauraceae sp. Vásquez 25230 (MO), Rosa hybrid
1016 cultivar 'Kazanlik'. Also, if applicable, indicate if the name is
1017 unpublished as of the time of submission. Additional information such
1018 as strain, isolate, or serotype can be entered later in the submission
1019 process.
1020
1021 **Location
1022
1023 #The default Location for all sequences is "Genomic". If the sequence
1024 is not genomic, select the alternative location (i.e., organelle) from
1025 the pull-down list. You can change the location of all sequences
1026 globally by clicking on the Location button in the spreadsheet header.
1027 The following is a brief description of the choices in this list:
1028
1029 #-Apicoplast: a reduced plastid characteristic of apicomplexans
1030 (e.g., Plasmodium). NOTE: apicoplast should be applied ONLY to
1031 members of the Apicomplexa.
1032
1033 #-Chloroplast: a chlorophyllous plastid.
1034
1035 #-Chromatophore: a membrane-bound vesicle containing photosynthetic pigments
1036 in bacteria.
1037
1038 #-Chromoplast: a non-chlorophyllous, pigmented plastid, found in
1039 fruits and flowers.
1040
1041 #-Cyanelle: a specialized type of plastid found exclusively in
1042 glaucocystophytes (e.g., Cyanophora). NOTE: cyanelle should be
1043 applied ONLY to members of the Glaucocystophyceae.
1044
1045 #-Endogenous_virus: a virus that has integrated permanently into the
1046 host genome, and which is inherited vertically through the
1047 germline of the host.
1048
1049 #-Extrachromosomal: other extrachromosomal elements not listed here,
1050 such as a B chromosome or an F factor.
1051
1052 #-Genomic: chromosome. This category includes
1053 mitochondrial and chloroplast proteins that are encoded by the nuclear
1054 genome.
1055
1056 #-Hydrogenosome: an organelle that produces hydrogen and ATP and is
1057 found mainly in ciliates, fungi and trichomonads. Hydrogenosomes may
1058 be reduced mitochondria.
1059
1060 #-Kinetoplast: a specialized type of mitochondrion found exclusively
1061 in Kinetoplastida (e.g., Leishmania). NOTE: kinetoplast should
1062 be applied ONLY to members of the Kinetoplastida (trypanosomes and
1063 bodonids).
1064
1065 #-Leucoplast: a plastid lacking pigments of any type.
1066
1067 #-Macronuclear: a specialized type of nucleus found exclusively in the
1068 ciliated protists (e.g., Tetrahymena). NOTE: macronucleus
1069 should be applied ONLY to members of the Ciliophora.
1070
1071 #-Mitochondrion: a semi-autonomous, self-reproducing organelle that
1072 occurs in the cytoplasm of most eukaryotic cells.
1073
1074 #-Nucleomorph: a reduced nuclear remnant found in Chlorarachniophyceae
1075 (e.g., Chlorarachnion) and Cryptophyta (e.g., Cryptomonas). NOTE:
1076 nucleomorph should be applied ONLY to members of the
1077 Chlorarachniophyceae or Cryptophyta.
1078
1079 #-Plasmid: extrachromosomal genetic element found in bacterial species.
1080 Note this does not include the cloning vector used to propagate
1081 the sequence of interest.
1082
1083 #-Plastid: any of a class of double membrane-bound, light-harvesting
1084 organelles (or derived from same). NOTE: plastid should be used
1085 ONLY when a more precise term, e.g., chloroplast, is not
1086 applicable.
1087
1088 #-Proplastid: an immature plastid.
1089
1090 #-Proviral: a virus that is integrated into a host cell chromosome.
1091
1092
1093 **Genetic Code
1094
1095 #If you selected a scientific organism name from the scrollable list
1096 described above, this field will be filled out automatically. However,
1097 if the organism is not on the list, this field will default to the
1098 "Standard" genetic code. If this is incorrect, you can select the
1099 correct genetic code from the pull-down list. To globally change the
1100 genetic code for all sequences which are not automatically filled out,
1101 click on the Genetic Code button in the spreadsheet header.
1102
1103 #For more information regarding the genetic codes available, see the NCBI
1104 <A HREF="http://www.ncbi.nlm.nih.gov/htbin-post/Taxonomy/wprintgc?mode=c">
1105 Taxonomy page
1106 </A>.
1107 http://www.ncbi.nlm.nih.gov/htbin-post/Taxonomy/wprintgc?mode=c
1108
1109 *Import Source Modifiers
1110
1111 #Using this button allows you to import a tab-delimited table of source
1112 modifiers. The first column in the table must contain the Sequence
1113 Identifiers (SeqIDs) used earlier in the submission and each subsequent
1114 column must contain a different source modifier. The first row in the
1115 table must contain the labels for each column. The label for the
1116 Sequence Identifiers column should be in the format "Seq_ID". A list
1117 of
1118 <A HREF="http://www.ncbi.nlm.nih.gov/Sequin/modifiers.html">
1119 modifiers
1120 </A>
1121 in the format to be used in the column headers is available.
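
#A table in this layout can be written with any spreadsheet or script.
The following Python code is a minimal sketch: the function
source_table is hypothetical, and the column labels "organism" and
"strain" are assumptions used only for illustration. Check the
modifiers list linked above for the exact labels to use. Only the
"Seq_ID" label and the tab-delimited layout come from the description
above.

```python
import csv
import io


def source_table(rows, modifiers):
    """Return tab-delimited text with a 'Seq_ID' column and one column per modifier.

    rows maps each SeqID to a dict of modifier values; modifiers lists
    the column labels in the order they should appear.
    (Hypothetical helper, not a Sequin API.)
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # The first row holds the column labels; the first column holds the SeqIDs.
    writer.writerow(["Seq_ID"] + list(modifiers))
    for seq_id, values in rows.items():
        writer.writerow([seq_id] + [values.get(m, "") for m in modifiers])
    return out.getvalue()


table = source_table(
    {
        "Seq1": {"organism": "Gallus gallus", "strain": "C"},
        "Seq2": {"organism": "Drosophila melanogaster", "strain": "D"},
    },
    ["organism", "strain"],
)
print(table, end="")
```

#The SeqIDs in the first column must be the same ones used earlier in
the submission; a missing value is written as an empty cell.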
1122
1123 *Add Source Modifiers
1124
1125 #Using this button will launch the Specify Source Modifiers pop-up box
1126 where you can add or edit any source modifier. You can also import a
1127 source modifier table or export the existing source modifiers in table
1128 format from this page.
1129
1130 #The Select Modifier dialog allows you to select a modifier from the
1131 pull-down list and edit the value of this modifier for each sequence or
1132 globally add a value to all sequences.
1133
1134 #The two windows in this pop-up provide information about the current
1135 source modifiers for the sequences in your submission. The top window
1136 provides a summary of these modifiers and the lower window lists the
1137 values of each modifier for each sequence. If any sequences have
1138 missing organism names or have source information that is identical to
1139 another sequence, the SeqIDs will be shown in red in this window.
1140 Double-clicking on a modifier value in this window will launch a pop-up
1141 where you can edit this value. Double-clicking on the modifier name
1142 used in the header will launch a modifier-specific pop-up where you can
1143 globally edit the modifier value for all sequences or change the value
1144 for individual sequences.
1145
1146 *Clear All Source Modifiers
1147
1148 #This button will clear all modifiers previously entered in either the
1149 FASTA definition lines or the submission dialogs. This includes the
1150 organism name which is required for submission.
1151
1152 >Protein Page
1153
1154 #This page allows you to provide the protein sequence translated from
1155 the nucleotide sequence that you just entered. If the nucleotide
1156 sequence is alternatively spliced or contains multiple open reading
1157 frames, enter all of the protein sequences on this page. Each protein
1158 sequence will appear in the database record as a coding sequence (CDS)
1159 feature. Sequin will automatically determine which nucleotide
1160 sequences code for the protein and indicate the nucleotide sequence
1161 interval on the database record. Sequin also provides tools that allow
1162 you to view a graphical representation of all the open reading frames
1163 in your nucleotide sequence and to convert these reading frames into
1164 CDS features. These tools are described later in the help
1165 documentation under the
1166
1167 <A HREF="#ORFFinder">
1168 ORF Finder.
1169 </A>
1170
1171 *Conceptual Translation Confirmed by Peptide Sequencing
1172
1173 #Most protein entries are computer-generated conceptual translations of
1174 a nucleic acid sequence. If you have confirmed this translation by
1175 direct sequencing either of the entire protein or of peptides derived
1176 from the protein, please check this box.
1177
1178 *Incomplete at NH3 end/Incomplete at COOH end
1179
1180 #If the sequence is lacking amino acids at the amino- or
1181 carboxy-terminal end of the protein, please check the appropriate box.
1182
1183 *Create Initial mRNA with CDS Intervals
1184
1185 #If you check this box, Sequin will make an mRNA feature with the same
1186 initial intervals (i.e., range of sequence) as the CDS feature. After
1187 the record has been assembled, you should edit the mRNA feature location
1188 to add the 5' UTR and 3' UTR intervals. This may be done either in the
1189 mRNA editor or in the sequence editor.
1190
1191 *Import Protein FASTA
1192
1193 #You can import one or more protein sequences contained within
1194 a previously generated protein FASTA file.
1195
1196 **FASTA Format for Protein Sequences
1197
1198 #The basic FASTA format is the same as that used for
1199 <A HREF="#FASTAFormatforNucleotideSequences">
1200 nucleotide sequences
1201 </A>
1202 , with a FASTA definition line followed by the sequence itself.
1203
1204 #In order to match the protein sequence to the correct nucleotide
1205 sequence, you must use the same Sequence Identifier (SeqID) that you
1206 used to identify the nucleotide sequence. Thus in cases of
1207 alternatively spliced genes, a single protein FASTA file can contain
1208 two unique sequences that have the same SeqID. Both coding regions
1209 will be added to the same nucleotide sequence.
1210
1211 #The available modifiers for use in a protein FASTA definition line are
1212 different than those for a nucleotide FASTA definition line and are
1213 limited to information about the protein or gene itself and are
1214 contained within the examples below. The format remains [modifier=text].
1215
1216 #Note that in all cases, the FASTA definition line must not contain any hard
1217 returns. All information must be on a single line of text.
1218
1219 #Examples of properly formatted protein FASTA definition lines are:
1220
1221 <KBD><PRE>>Seq1 [protein=neuropilin 1] [gene=Nrp1]</PRE></KBD>
1222
1223 <KBD><PRE>>ABCD [protein=merozoite surface protein 2] [gene=msp2] [protein_desc=MSP2]</PRE></KBD>
1224
1225 <KBD><PRE>>DNA.new [protein=breast and ovarian cancer susceptibility protein] [gene=BRCA1] [note=breast cancer 1, early onset]</PRE></KBD>
1226
1227 #The protein name should be included in the entry; all other fields are
1228 optional.
1229
1230 #The line after the FASTA definition line begins the amino acid
1231 sequence. It is recommended that each line of sequence be no longer
1232 than 80 characters. Please only use IUPAC symbols within the amino
1233 acid sequence. Non-IUPAC amino acid symbols will be stripped from the
1234 sequence.
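
#The formatting rules above can be applied with a short script. The
following Python code is a minimal sketch: the function protein_fasta
is hypothetical, and the sequence in the usage example is a placeholder
rather than real data. It writes the definition line on a single line,
requires a protein name, and breaks the sequence into lines of at most
80 characters.

```python
def protein_fasta(seq_id, sequence, protein, **modifiers):
    """Format one protein FASTA record.

    seq_id must match the SeqID of the nucleotide sequence; protein is
    the protein name; other modifiers (for example gene) are optional.
    (Hypothetical helper, not a Sequin API.)
    """
    parts = [">" + seq_id, "[protein=%s]" % protein]
    parts += ["[%s=%s]" % (key, value) for key, value in modifiers.items()]
    defline = " ".join(parts)
    # The definition line must not contain any hard returns.
    if "\n" in defline or "\r" in defline:
        raise ValueError("definition line must be a single line")
    # Break the sequence into lines of no more than 80 characters.
    body = [sequence[i:i + 80] for i in range(0, len(sequence), 80)]
    return "\n".join([defline] + body)


# Placeholder sequence of 100 residues, for illustration only.
record = protein_fasta("Seq1", "M" * 100, "neuropilin 1", gene="Nrp1")
print(record)
```

#For an alternatively spliced gene, the same function could be called
twice with the same SeqID and the two records written to one file.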
1235
1236 #After you import your sequence, a window will appear with information
1237 about the sequence. The first line will describe the number of protein
1238 sequences imported and the total length in amino acids of
1239 all sequences. Each sequence is numbered, and its length,
1240 unique identifier (SeqID), Gene symbol, Protein name, and title
1241 (Definition line) as supplied in the FASTA definition line are listed.
1242
1243 >Annotation Page
1244
1245 #Note: This page will not be available if you have selected a segmented
1246 or gapped sequence as the
1247 <A HREF="#SubmissionType">
1248 Submission Type
1249 </A>
1250 .
1251
1252 #On this page, you can add a
1253 <A HREF="#gene">
1254 gene
1255 </A>
1256 ,
1257 <A HREF="#rRNA">
1258 ribosomal RNA
1259 </A>
1260 or
1261 <A HREF="#CDS">
1262 CDS
1263 </A>
1264 feature across the entire span of each sequence you are submitting.
1265 You cannot specify locations within each sequence using this page.
1266 More options are available under the
1267
1268 <A HREF="#AnnotateMenu">
1269 Annotate Menu
1270 </A>
1271 in the record viewer.
1272
1273 #If the feature should be partial at one or both ends, check the
1274 appropriate box and then fill in the text boxes for the relevant
1275 feature.
1276
1277 #You may add a title to all sequences if this was not included in the
1278 FASTA definition line. This will be used as the DEFINITION field in
1279 the final flatfile. The title should contain a brief description of
1280 the sequence. There is a preferred format for nucleotide and protein
1281 titles and Sequin can generate them automatically using the Generate
1282 Definition Line function under the Annotate menu in the record viewer.
1283
1284 >Assembly Tracking
1285
1286 #You will only see this form if you had previously indicated that the
1287 entry is a Third-Party Annotation submission. You must provide the
1288 GenBank Accession number(s) of the primary sequence used to assemble
1289 your TPA sequence. We cannot accept primary sequences corresponding
1290 to Reference Sequences or those from proprietary databases. More
1291 information about this can be found on the
1292
1293 <A HREF="http://www.ncbi.nlm.nih.gov/Genbank/TPA.html">
1294 TPA
1295 </A>
1296 home page.
1297
1298 #If a proper GenBank Accession is entered in the first column of the
1299 Assembly Tracking form, the GenBank staff can map the coordinates for
1300 you. You do not need to fill out the 'from' and 'to' columns. Note
1301 that multiple accessions may be entered to provide full coverage of the
1302 assembled sequence.
1303
1304 #If the accession entered is not recognized as a GenBank Accession
1305 number, a pop-up box is generated requesting that you edit the numbers
1306 listed. Sequences from the trace archive can be used as primary sequence
1307 data for TPA records but must be entered in the format "TI123456789".
1308
1309 #You may also generate an Assembly Tracking form in the record viewer
1310 under the Annotate menu. Select Descriptors and TPA Assembly from the
1311 pull-down menu in order to generate the Assembly Tracking form.
1312
1313 >Editing the Record
1314
1315 *Overview
1316
1317 #After you finish the Organism and Sequences Form, Sequin will process
1318 your entry based on the information you have entered. The window you
1319 see now is called the record viewer. This is also the window you will
1320 see if you are submitting an update to an existing record. The
1321 instructions after this point are the same whether you are submitting a
1322 new record or an update.
1323
1324 #In the default window of the record viewer, you will see your entry
1325 approximately as it would appear in the database. Most of the
1326 information that you entered earlier in the submission process is
1327 present in the viewer; other information, such as the contact, is still
1328 present in the record but will not be visible in the database entry. If
1329 you have provided a conceptual translation of the nucleotide sequence,
1330 the translation will be listed as a CDS Feature. Sequin automatically
1331 determines which nucleotides encode the protein, and lists them,
1332 even if the nucleotide sequence contains introns and exons.
1333
1334 #You can save the entry to a file by selecting Save or Save As under the
1335 File menu. This is not the same as saving the entry for submission to
1336 the database. It is a good idea to save the file at this point so that
1337 if you make any unwanted changes during the editing process you can
1338 revert to the original copy. If you wish to edit the entry later, click
1339 on "Read Existing Record" on the Welcome to Sequin form and choose
1340 the file.
1341
1342 #It is likely that the entry could be processed now for submission to
1343 the database. However, you may wish to add information to
1344 the entry. This information may be in the form of Descriptors or
1345 Features. Descriptors are annotations that apply to an
1346 entire sequence, or an entire set of sequences, and Features are
1347 annotations that apply to a specific sequence interval. For example,
1348 you may want to change the Reference Descriptor to add a published
1349 manuscript, or to annotate the sequence by adding features such as a
1350 signal peptide or polyA signal.
1351
1352 #Information in the record viewer can be edited in different ways. One
1353 way to modify information is to double click within the block of
1354 information you wish to edit. Many blocks, such as "Definition",
1355 "Source", "Reference", or "Features" can be edited.
1356
1357 #To add information, create a new descriptor
1358 or feature by selecting the appropriate form from the Misc or Features
1359 menus. These options are described later in this help document.
1360
1361 #Finally, you may need to edit the sequence itself.
1362 <A HREF="#SequenceEditor">
1363 Instructions
1364 </A>
1365 for working with the sequence are presented in the documentation for the
1366 Sequence Editor.
1367
1368 *Submitting the Finished Record to the Database
1369
1370 #Once you are satisfied that you have added all the appropriate
1371 information, you must process your entry for submission to the database.
1372 Select "Validate" under the Search menu. This function detects
1373 discrepancies between the format of your submission and that required by
1374 the database selected for entry.
1375
1376 #If Sequin detects problems with the format of your record, you will see a
1377 screen listing the validation errors as well as suggestions for how to fix the
1378 discrepancies. Single clicking on an error message scrolls the record viewer
1379 to the feature that is causing the error. Double clicking on the error
1380 message launches the relevant feature editor on which you can correct the
1381 problem. If you are annotating a set of multiple sequences, shift-click to
1382 scroll to the target sequence and feature. When you think you have corrected
1383 all the problems, click on "Revalidate". You can submit files with errors,
1384 but it is strongly recommended that you correct as many errors as possible
1385 prior to submission.
1386
1387 #Message: Select Verbose, Normal, Terse, or Table. Verbose gives a more
1388 detailed explanation of the problem.
1389
1390 #Filter: Select the error messages you wish to see. You can select
1391 ALL, SEQ_INST (errors regarding the sequence itself, its type, or
1392 length), SEQ_DESCR (descriptor errors), SEQ_FEAT (feature errors), or
1393 errors specific to your record.
1394
1395 #Severity: Select the types of error messages you wish to see. You
1396 will see the type of message selected, as well as any messages warning
1397 of more serious problems.
1398
1399 #There are four types of error messages: Info, Warning, Error, and
1400 Reject. Info is the least severe, and Reject is the most severe. You
1401 may submit the record even if it does contain errors. However, we
1402 encourage you to fix as many problems as possible. Note that some
1403 messages may be merely suggestions, not discrepancies. A possible
1404 Warning message is that a splice site does not match the consensus.
1405 This may be a legitimate result, but you may wish to recheck the
1406 sequence. A possible Error message is that the conceptual translation
1407 of the sequence that you supplied does not encode an open reading
1408 frame. In this case, you should check that you translated the sequence
1409 in the correct reading frame. A possible Reject message is that you
1410 neglected to include the name of the organism from which the sequence
1411 was derived. The name of the organism is absolutely required for a
1412 database entry.
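
#The Severity rule described above (show the selected type plus anything more serious) can be sketched as follows. This is a minimal illustration, not Sequin's own code: the `Severity` enum, the `filter_messages` function, and the sample messages are hypothetical names that paraphrase the examples in the text.

```python
from enum import IntEnum


class Severity(IntEnum):
    """The four message types, ordered from least to most severe."""
    INFO = 0
    WARNING = 1
    ERROR = 2
    REJECT = 3


def filter_messages(messages, selected):
    """Keep messages at the selected severity or at any more serious level."""
    return [(sev, text) for sev, text in messages if sev >= selected]


# Hypothetical messages paraphrasing the examples in the help text.
messages = [
    (Severity.WARNING, "splice site does not match the consensus"),
    (Severity.ERROR, "conceptual translation does not encode an open reading frame"),
    (Severity.REJECT, "organism name is missing"),
]

for sev, text in filter_messages(messages, Severity.ERROR):
    print(f"{sev.name}: {text}")
```

#Selecting Error in this sketch hides the Warning but still shows the Reject message, which matches the behavior described for the Severity control.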
1413
1414 #If Sequin does not detect any problems with the format of your record,
1415 you will see a message stating "Validation test succeeded".
1416
1417 #To prepare the submission, click the "Done" button on the record
1418 viewer, or select "Prepare Submission" under the File menu. You will be
1419 prompted to save the file. Email this file to the database at the
1420 address shown. You MUST email the file; Sequin does not submit the
1421 file automatically over the network. The email addresses for the
1422 databases are:
1423
1424 !-GenBank: gb-sub@ncbi.nlm.nih.gov
1425 !-EMBL: datasubs@ebi.ac.uk
1426 !-DDBJ: ddbjsub@ddbj.nig.ac.jp
1427
1428 #After your entry is complete, close the record viewer. You will be
1429 returned to the Welcome to Sequin form and can begin another entry.
1430
1431 >The Record Viewer
1432
1433 *Target Sequence
1434
1435 #This pop-up menu shows a list of SeqIDs of all nucleotide and protein
1436 sequences associated with the Sequin entry. Use the menu to select the
1437 sequences displayed in the record viewer, as well as the sequences you
1438 want to "target", that is, the sequences to which you want to apply a
1439 descriptor (see
1440 <A HREF="#Descriptors">
1441 Descriptors
1442 </A>
1443 in the Sequin help documentation). You may select either an individual
1444 sequence by name or a set of sequences, such as All Sequences, or
1445 SEG_dna if you have a segmented nucleotide set. You may change the
1446 selection at any time.
1447
1448 *Display Format
1449
1450 #You may change the display format of the record viewer to any of the
1451 formats described below. Editing a field in one display format will
1452 change that field in all formats. Subsequent pop-up menus will appear
1453 depending on which format is selected.
1454
1455 **GenBank
1456
1457 #This display format allows you to see the submission as it would appear
1458 as a GenBank or DDBJ entry. It is the default format.
1459
1460 #The Mode pop-up default setting is Sequin. Release mode shows certain
1461 qualifiers and db_xrefs in RefSeq entries which are non-collaborative.
1462 Entrez mode is used for web display and can show new elements that have
1463 not yet finished their four-month quarantine period. Dump mode requires
1464 that the accession slot be populated. In most cases, there is no need
1465 to change from the default Sequin mode.
1466
1467 #The Style pop-up allows different views of segmented records. The
1468 default is Normal. Segment style is the traditional representation of
1469 segmented sequences, while Contig style displays a CONTIG line with a
1470 join of accessions instead of raw sequence. Master style shows
1471 features mapped to the segmented sequence coordinates instead of the
1472 coordinates of the individual parts.
1473
1474 **Graphic
1475
1476 #This display format shows the entry in a graphical view. The top bar
1477 represents the nucleotide sequence. Lower arrows or bars represent
1478 different features on the sequence. Double click on an arrow or bar to
1479 launch the appropriate editing window. Any sequence highlighted in the
1480 Sequence Editor will be boxed on the graphical view of the sequence.
1481 To see a graphical representation of a segmented set (see
1482
1483 <A HREF="#Submissiontype">
1484 Submission type
1485 </A>,
1486 above), the Target Sequence must be set to
1487 SEG_dna.
1488
1489 #The Style pop-up menu allows you to see the display in different styles
1490 and colors.
1491
1492 #The Scale pop-up menu allows you to see the display in different sizes.
1493 The smaller the number, the larger the display.
1494
1495 **Sequence
1496
1497 #This display format shows the nucleotide sequence in the record along
1498 with any annotated features (such as CDS or mRNA). You can only view a
1499 single sequence at a time with this option. You can use the Features
1500 pop-up menu to change the display of the features. With the Numbering
1501 pop-up menu, select where you want the sequence numbers to be
1502 indicated: at the side of the window, at the top of each sequence line,
1503 or not at all.
1504
1505 **Alignment
1506
1507 #This display format shows sets of aligned sequences, such as those
1508 imported as part of a population, phylogenetic, mutation, or
1509 environmental samples set. When toggled to All Sequences in the Target
1510 Sequence pop-up, the alignment of all entries will be displayed. To
1511 more closely analyze similarities, you can select a single entry in the
1512 Target Sequence pop-up. The complete sequence of the entry selected
1513 will be displayed. Any nucleotides in the other sequences that differ
1514 from that selected will be displayed, while identical nucleotides will
1515 be displayed as a period. You can also display features annotated on
1516 the selected target sequence or all sequences using the Feature display
1517 toggle. To launch the alignment editor, select
1518 <A HREF="#AlignmentAssistant">
1519 Alignment Assistant
1520 </A>
1521 from the record viewer Edit menu.
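
#The period convention described above can be sketched in a few lines. This is only an illustration of the display rule, assuming the sequences are already aligned to equal length; `dot_view` is a hypothetical helper, not part of Sequin.

```python
def dot_view(target, others):
    """Show the target in full; in each other row, print '.' where it matches."""
    rows = [target]
    for seq in others:
        if len(seq) != len(target):
            raise ValueError("aligned sequences must have equal length")
        rows.append("".join("." if a == b else b for a, b in zip(target, seq)))
    return rows


for row in dot_view("ATGCCGTA", ["ATGACGTA", "ATGCCGTT"]):
    print(row)
# ATGCCGTA
# ...A....
# .......T
```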
1522
1523 **EMBL
1524
1525 #This display format allows you to see the submission as it would appear
1526 as an EMBL entry.
1527
1528 **Table
1529
1530 #This display format shows the annotation in a five-column, tab-delimited
1531 <A HREF="table.html">table</A>
1532 format. This format can be imported to add annotation to a record that
1533 has none.
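
#As a rough sketch of what a five-column, tab-delimited table looks like, the code below writes one. It assumes the layout documented on the linked table page: a `>Feature` header line naming the SeqID, feature lines that fill columns 1 to 3 (start, stop, feature key), and qualifier lines that fill columns 4 and 5 (qualifier name, value). The function name and the sample features are hypothetical.

```python
def feature_table(seq_id, features):
    """Render features as a five-column, tab-delimited table (assumed layout)."""
    lines = [f">Feature {seq_id}"]
    for start, stop, key, qualifiers in features:
        # Columns 1-3: start, stop, and feature key.
        lines.append(f"{start}\t{stop}\t{key}")
        for name, value in qualifiers:
            # Columns 4-5: qualifier name and value; columns 1-3 stay empty.
            lines.append(f"\t\t\t{name}\t{value}")
    return "\n".join(lines)


table = feature_table(
    "Seq1",
    [
        (1, 1050, "gene", [("gene", "abcA")]),
        (1, 1050, "CDS", [("product", "example protein")]),
    ],
)
print(table)
```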
1534
1535 **FASTA
1536
1537 #This display shows the sequence and Definition line only, without any
1538 annotations, in a format called the FASTA format. This is a format used
1539 by many molecular biology analysis programs. You cannot edit in this
1540 display mode.
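
#For readers unfamiliar with FASTA, a minimal sketch of the format follows: one line beginning with ">" that carries the SeqID and Definition line, then the sequence wrapped onto several lines. The `to_fasta` helper and the line width of 70 are assumptions made for illustration; Sequin's own output may wrap differently.

```python
def to_fasta(seq_id, definition, sequence, width=70):
    """Format one sequence as FASTA: a '>' header line, then wrapped sequence."""
    header = f">{seq_id} {definition}".rstrip()
    body = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + body)


print(to_fasta("Seq1", "Example definition line", "ACGT" * 20, width=60))
```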
1541
1542 **Quality
1543
1544 #This display format shows quality score data if it has been included in
1545 the submission.
1546
1547 **ASN.1
1548
1549 #This display shows the entry in Abstract Syntax Notation 1, a data
1550 description language used by the NCBI. You cannot edit in this display
1551 mode.
1552
1553 **XML
1554
1555 #This display format shows the entry in XML, a format used by some
1556 databases. You cannot edit in this display mode.
1557
1558 **INSDSeq
1559
1560 #This display format shows the entry in the XML format used by the INSD.
1561 You cannot edit in this display mode.
1562
1563 **Desktop
1564
1565 #The NCBI DeskTop displays the internal
1566 structure of the record being viewed in Sequin. The
1567 <A HREF="#NCBIDeskTop">
1568 DeskTop
1569 </A>
1570 is explained under the Misc menu.
1571
1572 *Done
1573
1574 #This button allows you to validate the entry when you are finished with
1575 the submission. See
1576 <A HREF="#SubmittingtheFinishedRecordtotheDatabase">
1577 Submitting the Finished Record to the Database
1578 </A>
1579 in the Sequin help documentation.
1580
1581 *Controls for Downloaded Entries
1582
1583 #If you have downloaded a sequence from Entrez, you will see an
1584 additional button labeled PubMed. This button will launch a web
1585 browser containing the target sequence as it appears in Entrez. From
1586 here, you can access any Entrez-supported Links, including related
1587 sequences and associated references in PubMed.
1588
1589 >Descriptors
1590
1591 *Overview
1592
1593 #Descriptors are annotations that apply to an entire sequence, or an
1594 entire set of sequences, in a given entry. They do not have a specific
1595 location on a sequence, as they apply to the entire sequence. They can
1596 be contrasted to
1597 <A HREF="#Features">
1598 Features,
1599 </A>
1600 which apply to a specific interval of the sequence.
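
#The distinction between descriptors and features can be pictured as a small data model. This is a conceptual sketch only; the class and field names are hypothetical and do not reflect how Sequin stores a record internally.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Descriptor:
    """Applies to a whole sequence or set, so it carries no location."""
    kind: str
    value: str


@dataclass
class Feature:
    """Applies to one or more intervals, given as 1-based (start, stop) pairs."""
    key: str
    intervals: List[Tuple[int, int]]


@dataclass
class SequenceRecord:
    seq_id: str
    length: int
    descriptors: List[Descriptor] = field(default_factory=list)
    features: List[Feature] = field(default_factory=list)


record = SequenceRecord("Seq1", 1200)
record.descriptors.append(Descriptor("title", "Example definition line"))
record.features.append(Feature("CDS", [(101, 400), (600, 900)]))
```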
1601
1602 #You may edit descriptors in one of two ways.
1603
1604 #(1) In the record viewer, double click within the text of the
1605 descriptor to bring up a form on which information can be added.
1606
1607 #(2) Choose the option Descriptors from the Annotate menu.
1608
1609 *Annotate Menu - Descriptors
1610
1611 #This menu allows you either to create new descriptors or to modify
1612 existing ones. Select the descriptor that you wish to modify.
1613
1614 #When you first select a descriptor, you will see a window called
1615 "Descriptor Target Control". Using the target control pop-up menu,
1616 select the sequences you wish this descriptor to cover. The name(s)
1617 listed correspond to the SeqID(s) given to the nucleotide or amino acid
1618 sequences when they were imported into Sequin. The default
1619 selection for this menu is set in the Target Sequence pop-up menu on
1620 the record viewer. You may choose to have the descriptor cover just
1621 one sequence, or a set of sequences in your entry. If you are creating
1622 a new descriptor, select "Create New". If you wish to modify a
1623 previous descriptor, select "Edit Old".
1624
1625 #The following is a list of some of the descriptors that can be added.
1626 Two additional descriptors, those for
1627 <A HREF="#Publications">
1628 Publications
1629 </A>
1630 and
1631 <A HREF="#BiologicalSourceDescriptororFeature">
1632 Biological Source,
1633 </A>
1634 are described in other sections.
1635
1636 **TPA Assembly
1637
1638 #If you indicated that your sequence is a TPA submission, a
1639 <A HREF="#AssemblyTracking">
1640 TPA Assembly
1641 </A>
1642 was created from the information regarding primary accession numbers.
1643 This Assembly information can be edited here. Note that it is not
1644 necessary to enter nucleotide location in the "from" and "to" columns.
1645
1646 **Update Date
1647
1648 #This is for database staff use only. Please do not modify the date.
1649
1650 **Create Date
1651
1652 #This is for database staff use only. Please do not modify the date.
1653
1654 **Region
1655
1656 #This descriptor provides general information about the genetic context
1657 of the sequence. For example, if your nucleotide sequence is cloned
1658 from the region surrounding the Huntington's Disease gene, you could
1659 enter that information here. Providing information for this descriptor
1660 is optional.
1661
1662 **Name
1663
1664 #This descriptor is an alternative place for a descriptive name for the
1665 sequence. This information will not appear in the flatfile view, but
1666 will be maintained in the ASN.1.
1667
1668 **Comment
1669
1670 #This descriptor is used to list any additional information that you
1671 wish to provide about the sequence. Use of this descriptor is optional.
1672 Most information can be better annotated using the appropriate
1673 features and qualifiers rather than a generic comment descriptor.
1674
1675 **Title
1676
1677 #This descriptor contains the information that will go on the Definition
1678 line of the database entry. If you supplied a title for your
1679 nucleotide sequence when you imported it into Sequin, that information
1680 is here. If you wish to change the Definition line, or if you did not
1681 supply a title when you submitted the sequence, edit this Descriptor.
1682
1683 **Molecule Description
1684
1685 #This descriptor indicates the characteristics of the molecule from
1686 which the sequence was derived. The information that you have already
1687 entered can be edited here. In most cases, the molecule and class are
1688 the only choices which should be edited from the default values.
1689
1690 ***Molecule
1691
1692 #A GenBank sequence can represent one of several different molecule
1693 types. Enter in the Molecule pop-up menu the type of molecule that was
1694 sequenced. Brief descriptions of the choices in this pop-up menu were
1695 listed previously.
1696
1697 ***Completedness
1698
1699 #Choose the appropriate option from the pop-up menu.
1700
1701 #-Complete: Use this designation when a complete molecule, such as a
1702 complete mitochondrial genome, is being submitted.
1703
1704 #-Partial: Use this designation when an incomplete unit, such as the
1705 partial coding sequence of a gene, is being submitted.
1706
1707 #-No left: Use this designation when an incomplete unit, such as the
1708 partial coding sequence of a gene, or a partial protein sequence, is
1709 being submitted. The sequence has no left if it is incomplete on the
1710 5', or amino-terminal, end.
1711
1712 #-No right: Use this designation when an incomplete unit, such as the
1713 partial coding sequence of a gene, or a partial protein sequence, is
1714 being submitted. The sequence has no right if it is incomplete on the
1715 3', or carboxy-terminal, end.
1716
1717 #-No ends: Use this designation when an incomplete unit, such as the
1718 partial coding sequence of a gene, or a partial protein sequence, is
1719 being submitted. The sequence has no ends if it is incomplete at both
1720 the 5' and 3', or amino- and carboxy- terminal, ends.
1721
1722 #-Other: Use this designation when none of the above descriptions apply.
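
#The three end-specific choices above follow a simple rule, sketched here. The function name and its two boolean parameters are hypothetical; Partial and Other are left out because the text gives no mechanical rule for choosing them over the end-specific values.

```python
def completedness(missing_left, missing_right):
    """Pick the end-specific Completedness choice from which ends are missing.

    missing_left:  incomplete at the 5' (or amino-terminal) end
    missing_right: incomplete at the 3' (or carboxy-terminal) end
    """
    if missing_left and missing_right:
        return "No ends"
    if missing_left:
        return "No left"
    if missing_right:
        return "No right"
    return "Complete"


print(completedness(missing_left=True, missing_right=False))  # No left
```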
1723
1724 ***Technique
1725
1726 #From the pop-up menu, select the technique that was used to generate the
1727 sequence.
1728
1729 #-Standard: standard sequencing technique.
1730
1731 #-EST:
1732 <A HREF="http://www.ncbi.nlm.nih.gov/dbEST/index.html">
1733 Expressed Sequence Tag
1734 </A>
1735 : single-pass, low-quality mRNA sequences
1736 derived from cDNAs. These sequences will appear in the EST division.
1737
1738 #-STS:
1739 <A HREF="http://www.ncbi.nlm.nih.gov/dbSTS/index.html">
1740 Sequence Tagged Site
1741 </A>
1742 : short sequences that are operationally
1743 unique in a genome and that define a specific position on the physical
1744 map. These sequences will appear in the STS division.
1745
1746 #-Survey:
1747 <A HREF="http://www.ncbi.nlm.nih.gov/dbGSS/index.html">
1748 single-pass genomic sequence
1749 </A>
1750 . These sequences will appear in
1751 the Genome Survey Sequence (GSS) division.
1752
1753 #-Genetic Map: Genetic map information, for example, in the Genomes division.
1754
1755 #-Physical Map: Physical map information, for example, in the Genomes division.
1756
1757 #-Derived: A sequence assembled into a contig from shorter sequences.
1758
1759 #-Concept-trans: A protein translation generated with the appropriate
1760 genetic code.
1761
1762 #-Seq-pept: Protein sequence was generated by direct sequencing of a
1763 peptide.
1764
1765 #-Both: Protein sequence was generated by conceptual translation and
1766 confirmed by peptide sequencing.
1767
1768 #-Seq-pept-Overlap: Protein sequence was generated by sequencing
1769 multiple peptides, and the order of peptides was determined by overlap
1770 in their sequences.
1771
1772 #-Seq-pept-Homol: Protein sequence was generated by sequencing
1773 multiple peptides, and the order of peptides was determined by homology
1774 with another protein.
1775
1776 #-Concept-Trans-A: Conceptual translation of the nucleotide sequence
1777 provided by the author of the entry.
1778
1779 #-HTGS 0:
1780 <A HREF="http://www.ncbi.nlm.nih.gov/HTGS/">
1781 High Throughput Genome Sequence
1782 </A>
1783 , Phase 0. These sequences
1784 are produced by high-throughput sequencing projects and will be in the
1785 HTG division.
1786
1787 #-HTGS 1:
1788 <A HREF="http://www.ncbi.nlm.nih.gov/HTGS/">
1789 High Throughput Genome Sequence
1790 </A>
1791 , Phase 1. These sequences
1792 are produced by high-throughput sequencing projects and will be in the
1793 HTG division.
1794
1795 #-HTGS 2:
1796 <A HREF="http://www.ncbi.nlm.nih.gov/HTGS/">
1797 High Throughput Genome Sequence
1798 </A>
1799 , Phase 2. These sequences
1800 are produced by high-throughput sequencing projects and will be in the
1801 HTG division.
1802
1803 #-HTGS 3:
1804 <A HREF="http://www.ncbi.nlm.nih.gov/HTGS/">
1805 High Throughput Genome Sequence
1806 </A>
1807 , Phase 3. These sequences
1808 are produced by high-throughput sequencing projects and will be in the
1809 HTG division.
1810
1811 #-FLI_cDNA: Full Length Insert cDNA. Sequence corresponds to entire cDNA but
1812 not necessarily entire transcript. These sequences are produced by large
1813 sequencing projects.
1814
1815 #-HTC: High Throughput cDNA. These sequences are produced by large sequencing
1816 projects.
1817
1818 #-WGS:
1819 <A HREF="http://www.ncbi.nlm.nih.gov/Genbank/wgs.html">
1820 Whole Genome Shotgun
1821 </A>
1822 . These sequences are produced by large sequencing projects and follow a
1823 separate submission process.
1824
1825 #-Barcode: Nucleotide sequence is part of the Barcodes of Life project. This
1826 selection should only be used by members of the Consortium for the
1827 Barcodes of Life.
1828
1829 #-Composite-WGS-HTGS: Nucleotide sequence has been assembled by large
1830 sequencing centers using a combination of whole genome shotgun and BAC-based
1831 sequencing.
1832
1833 #-TSA: Transcriptome Shotgun Assembly. Shotgun assemblies of mRNA sequences
1834 from primary data submitted to dbEST, the short read archive (SRA) or the
1835 trace archive.
1836
1837 #-Other: Do not use this designation.
1838
1839 ***Class
1840
1841 #From the pop-up menu, select the type of molecule that was sequenced.
1842
1843 #-DNA: DNA
1844
1845 #-RNA: RNA
1846
1847 #-Protein: Protein
1848
1849 #-Nucleotide: Do not select this item
1850
1851 #-Other: Do not select this item
1852
1853 ***Topology
1854
1855 #From the pop-up menu, select the topology of the sequenced molecule.
1856
1857 #-Linear: Linear molecule (most sequences).
1858
1859 #-Circular: Circular molecule (such as a complete plasmid or mitochondrion).
1860
1861 #-Tandem: Do not select this item.
1862
1863 #-Other: Do not select this item.
1864
1865 ***Strand
1866
1867 #From the pop-up menu, select whether the sequence was derived from an
1868 organism with a single- or double-stranded genome. This is used primarily for
1869 viral submissions.
1870
1871 #-Single: The organism contains only a single-stranded genome, for
1872 example, ssRNA viruses.
1873
1874 #-Double: The organism contains only a double-stranded genome, for
1875 example, dsDNA viruses.
1876
1877 #-Mixed: Do not select this item.
1878
1879 #-Mixed Rev: Do not select this item.
1880
1881 #-Other: Do not select this item.
1882
1883 **Biological Source
1884
1885 #The Biological Source descriptor is described in more detail
1886 <A HREF="#BiologicalSourceDescriptororFeature">
1887 below.
1888 </A>
1889
1890 >Features
1891
1892 *Overview
1893
1894 #Features are annotations which apply to one or more intervals on a
1895 sequence. They can be contrasted to
1896 <A HREF="#Descriptors">
1897 Descriptors,
1898 </A>
1899 which apply to an entire sequence or an entire set of sequences.
1900 Features will be added to the Target Sequence selected in the record
1901 viewer pop-up menu.
1902
1903 #You may add or modify features in one of three ways.
1904
1905 #(1) In the record viewer, double click on the text of an existing
1906 feature to bring up a form on which information can be added or edited.
1907
1908 #(2) Choose the feature from the Annotate menu to add a new feature.
1909
1910 #(3) Choose the feature from the Sequence Editor Features menu to add a
1911 new feature.
1912
1913 #The features listed in the Annotate menu and the Sequence Editor
1914 Features menu are identical, and the instructions for adding them are
1915 the same, with one exception. If you annotate them in the Annotate
1916 menu, you must provide the nucleotide sequence location of the feature.
1917 However, if you add features from the Sequence Editor, you can
1918 highlight the sequence that the feature covers, and the location of the
1919 sequence will be automatically entered in the feature location box.
1920
1921 *Annotate Menu - Features
1922
1923 #This menu allows you to add or modify features on the sequence selected
1924 in the Target Sequence pop-up menu of the record viewer. Features are
1925 grouped into six categories. Select the feature that you would like to
1926 mark on your sequence. A new form will appear.
1927
1928 #Feature forms share a common design. The first page is specific to the
1929 particular feature, e.g., Coding Region or Gene. The second page lists
1930 Properties of the Feature. The third page describes the Location of the
1931 feature. Details about the common second and third pages are provided
1932 below.
1933
1934 **Properties Page
1935
1936 ***General Subpage
1937
1938 #Enter general comments about the feature here.
1939
1940 #Select any of the flags if necessary. If this sequence contains only a
1941 partial representation of the feature you are describing, check the
1942 "Partial" box. Check the "Exception" box if the feature annotates a
1943 post-transcriptional modification of the nucleotide sequence, such as
1944 ribosomal slippage or RNA editing. This is generally used only on CDS
1945 features. The evidence dialogs will only be editable if information
1946 has been entered in the Evidence subpage.
1947
1948 #If a gene feature overlaps the feature you are editing, the gene symbol
1949 will appear in the pull-down menu. If you want to add the name of a
1950 new gene, select new, and enter its name and optional description. By
1951 default, mapping between the feature and the gene is done by overlap,
1952 that is, the gene associated with the feature is the gene whose
1953 location overlaps with the location of the feature. Under some
1954 circumstances, for example, if the sequences of two genes overlap, you
1955 may wish the feature to apply to a different gene. In this case,
1956 select cross-reference, and select the name of the new gene in the
1957 pop-up menu. If you do not want the feature to map to any existing
1958 gene, select suppress. You may also edit information on the Gene
1959 feature form by clicking on Edit Gene Feature.
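
#The three gene-mapping choices (overlap, cross-reference, suppress) can be sketched as below. This is an illustration under stated assumptions, not Sequin's algorithm: `gene_for_feature` and the sample genes are hypothetical, and picking the shortest overlapping gene when several overlap is an assumption added only to make the example deterministic.

```python
def overlaps(a, b):
    """True if two 1-based inclusive (start, stop) spans share any position."""
    return a[0] <= b[1] and b[0] <= a[1]


def gene_for_feature(feature_span, genes, mode="overlap", xref=None):
    """Return the gene symbol mapped to a feature, or None."""
    if mode == "suppress":
        return None          # the feature maps to no gene
    if mode == "cross-reference":
        return xref          # the user names the gene explicitly
    # Default: map by overlap. Tie-break on shortest gene (an assumption).
    hits = [
        (stop - start, name)
        for name, (start, stop) in genes.items()
        if overlaps(feature_span, (start, stop))
    ]
    return min(hits)[1] if hits else None


genes = {"abcA": (1, 1000), "abcB": (900, 1500)}
print(gene_for_feature((950, 1400), genes))                           # abcB
print(gene_for_feature((950, 1400), genes, "cross-reference", "abcA"))  # abcA
```

#The second call shows why cross-reference exists: when two genes overlap, the default mapping may not pick the gene you intend.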
1960
1961 ***Comment Subpage
1962
1963 #Add any comments about the feature here, especially if you checked the
1964 "Exception" box on the General Subpage.
1965
1966 ***Citations Subpage
1967
1968 #This page is used to list any citations that specifically apply to the
1969 feature you are annotating. The citation must have already been entered
1970 into the record (see
1971 <A HREF="#Publications">
1972 Publications
1973 </A>
1974 in the Sequin help documentation). Click on Edit Citations, and
1975 place a check mark in the box next to the publication you want to cite.
1976 However, we discourage the use of citations on features.
1977
1978 ***Cross-Refs Subpage
1979
1980 #This is a read-only page used to cross-reference this entry to entries
1981 in external databases (databases other than GenBank, EMBL/EBI, and
1982 DDBJ), such as dbEST or FLYBASE. For more information on this topic,
1983 see the International Nucleotide Sequence Database Collaboration
1984
1985 <A HREF="http://www.ncbi.nlm.nih.gov/collab/db_xref.html">
1986 page
1987 </A>.
1988 http://www.ncbi.nlm.nih.gov/collab/db_xref.html
1989
1990 ***Evidence Subpage
1991
1992 #This page is primarily used by large sequencing centers to explain
1993 annotation prediction methods and its use is optional. More details
1994 about these qualifiers can be found in the
1995 <A HREF="http://www.ncbi.nlm.nih.gov/Genbank/genomesubmit_annotation.html#Evidence_Qualifiers">
1996 genome submission guidelines
1997 </A>.
1998 The two choices of evidence are Experiment or Inference.
1999
2000 #Wet-bench, experimental evidence can be entered as free text in the
2001 Experiment section. Please be as brief as possible.
2002
2003 #The Inference section allows for information to be added in cases where
2004 the feature is annotated based solely on sequence similarity or
2005 prediction software. In order to fill in text, you must select one of
2006 the options from the Category pull-down menu. Different pull-down and
2007 text boxes will appear depending on the selection you choose from the
2008 Category menu. If you select one of the 'similar to' categories, you
2009 must include the name of the database and the corresponding accession
2010 number of the sequence used as the basis for the annotation. If you
2011 choose one of the prediction categories, you must include the name and
2012 version of the prediction program used as the basis for the annotation.
2013
2014 #For example, if your annotation of a coding region was based on
2015 similarity to the sequence and annotation in GenBank Accession number
2016 AY411252, you would select "similar to DNA sequence" from the pull-down
2017 menu and then select "INSD" in the Database pull-down. You would then
2018 type "AY411252.1" in the Accession text box. If the annotation is
2019 based on the Genscan prediction algorithm, you would select "ab initio
2020 prediction" from the pull-down menu, select "Genscan" in the Program
2021 pull-down and enter 2.0 in the Program Version text box. If the
2022 database or program used is not listed in the appropriate pull-down
2023 list, select Other from the list. A new text box will appear where you
2024 can enter the name of the database or program used. You still must
2025 include the appropriate accession number or version in the subsequent
2026 text box.
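
#The two worked examples above each combine three pieces: a category, a database or program, and an accession or version. The sketch below joins them with colons, which is how such values commonly appear in an inference qualifier in the flatfile; treat that layout, and the helper name `inference_value`, as assumptions rather than a description of what Sequin writes.

```python
def inference_value(category, source, identifier):
    """Join category, database/program, and accession/version with colons."""
    parts = [category, source, identifier]
    if not all(p and p.strip() for p in parts):
        raise ValueError("category, source, and identifier are all required")
    return ":".join(p.strip() for p in parts)


print(inference_value("similar to DNA sequence", "INSD", "AY411252.1"))
print(inference_value("ab initio prediction", "Genscan", "2.0"))
```

#The check for empty parts mirrors the requirement that an accession number or program version must always be supplied.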
2027
2028 ***Identifiers Subpage
2029
2030 #This is a read-only page used by the database staff for tracking
2031 features within the record.
2032
2033 **Location Page
2034
2035 #This page allows you to select the location of the feature you are
2036 annotating. Each feature must have a sequence interval associated with it.
2037 In most cases, Sequin will limit the option to the nucleic acid or
2038 protein sequence as appropriate.
2039
2040 #Check the 5' Partial or 3' Partial box if the feature in your nucleic
2041 acid sequence is missing residues at the 5' or 3' ends, respectively.
2042 Check the NH2 Partial or COOH Partial if the feature in your amino acid
2043 sequence is missing residues at the amino- or carboxy-terminal ends,
2044 respectively. If you checked "Partial" on the Properties page, you
2045 must check the 5' Partial box, the 3' Partial box, or both.
2046
2047 #Enter the sequence range of the feature. The numbers should correspond
2048 to the nucleotide sequence interval if the SeqID is set to a nucleotide
2049 sequence, and to an amino acid sequence interval if the SeqID is set to
2050 a protein sequence. If the feature spans multiple, non-contiguous
2051 intervals on the sequence, indicate the beginning and end points of each
2052 interval. If each interval is separate, and should not be joined with
2053 the others to describe the feature, check the Intersperse intervals with
2054 gaps box (for example, when annotating multiple primer binding sites).
2055 If the feature is composed of several intervals that should all be
2056 joined together, do not check the box (for example, when annotating mRNA
2057 on a genomic DNA sequence).
2058
2059 #For nucleic acid Features only: From the pop-up menu, select the
2060 strand on which the feature is found.
2061
2062 #-Plus: Plus strand, or coding strand.
2063
2064 #-Minus: Minus strand, or non-coding strand.
2065
2066 #-Both: Both strands.
2067
2068 #-Reverse: Do not select this item.
2069
2070 #-Other: Do not select this item.
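
#The interval and strand choices on this page determine how a location reads in the flatfile. The sketch below is a simplified rendering: `location_string` is a hypothetical helper, and mapping the Intersperse intervals with gaps box to order() and its absence to join() is an assumption based on the flatfile location operators, not something stated in this help text.

```python
def location_string(intervals, minus_strand=False, intersperse_gaps=False):
    """Render 1-based inclusive (start, stop) intervals as a location string."""
    spans = [f"{start}..{stop}" for start, stop in intervals]
    if len(spans) == 1:
        loc = spans[0]
    else:
        # Joined intervals form one feature; interspersed ones stay separate.
        op = "order" if intersperse_gaps else "join"
        loc = f"{op}({','.join(spans)})"
    return f"complement({loc})" if minus_strand else loc


print(location_string([(1, 100), (200, 300)]))                         # join(1..100,200..300)
print(location_string([(1, 100), (200, 300)], intersperse_gaps=True))  # order(1..100,200..300)
print(location_string([(5, 50)], minus_strand=True))                   # complement(5..50)
```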
2071
2072 #Use the pop-up menu to select the SeqID of the sequence you are
2073 describing by the location. Clicking on the X button to the left will clear
2074 location spans, strand, and SeqID from that row.
2075
2076 #If you are working on a set of sequences which contain an alignment,
2077 you will see a toggle at the bottom of the Location Page where you can
2078 select to add or view the location of the feature using the Sequence
2079 Coordinates of the target sequence or the Alignment Coordinates. In
2080 either case, the feature will only be added to the target sequence. If
2081 you want to add features to all members of the set using the alignment
2082 coordinates, you must use the
2083
2084 <A HREF="http://www.ncbi.nlm.nih.gov/Sequin/sequin.hlp.html#Workingwithsetsofalignedsequences">
2085 Alignment Assistant
2086 </A>
2087 .
2088 #A brief description of the available features follows. A detailed
2089 explanation of how to use the coding region (CDS) feature is included.
2090 The DDBJ/EMBL/GenBank feature table definition
2091 <A HREF="http://www.ncbi.nlm.nih.gov/collab/FT/index.html">
2092 page
2093 </A>
2094 http://www.ncbi.nlm.nih.gov/collab/FT/index.html
2095 provides detailed information about other features.
2096
2097 *attenuator
2098
2099 #1) region of DNA at which regulation of termination of transcription
2100 occurs, which controls the expression of some bacterial operons; 2)
2101 sequence segment located between the promoter and the first structural
2102 gene that causes partial termination of transcription.
2103
2104 *C_region
2105
2106 #Constant region of immunoglobulin light and heavy chains, and T-cell
2107 receptor alpha, beta, and gamma chains. Includes one or more exons,
2108 depending on the particular chain.
2109
2110 *CAAT_signal
2111
2112 #CAAT box; part of a conserved sequence located about 75 bp upstream of
2113 the start point of eukaryotic transcription units that may be involved
2114 in RNA polymerase binding; consensus=GG(C or T)CAATCT.
2115
2116 *CDS
2117
2118 #coding sequence; sequence of nucleotides that corresponds with the
2119 sequence of amino acids in a protein (location includes stop codon).
2120 Feature includes amino acid conceptual translation.
2121
2122 **Coding Region Page
2123
2124 #Most users add a coding region to their sequence when they fill out the
2125 Organism and Sequences form. However, you may need to edit the coding
2126 region, or add additional ones. Choose CDS under the Coding Regions
2127 and Transcripts submenu of the Features menu, or to edit an existing
2128 CDS, double click on the record viewer. If you appended the partial
2129 sequence of a coding region to the Organism and Sequences form, you will
2130 probably need to edit the Coding Region feature to avoid validation
2131 error messages about the location of the coding region.
2132
2133 ***General (Product) Subpage
2134
2135 #Choose the genetic code that should be used to translate the
2136 nucleotide sequence. For more information, and for the translation
2137 tables themselves, see the NCBI Taxonomy
2138 <A HREF="http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c">
2139 page
2140 </A>.
2141 If the genetic code is already populated from the taxonomy database, do
2142 not change this selection.
2143
2144 #Choose the reading frame in which to translate the sequence. Do not
2145 fill in the Protein Product or SeqID selections.
2146
2147 #Supply additional information about the protein by clicking on Edit
2148 Protein Information to launch the Protein feature forms. The protein
2149 name must have already been filled out on the Protein subpage.
2150
2151 #Checking retranslate on accept will translate the nucleotide sequence
2152 according to the interval(s) indicated on the Locations page when you
2153 click on Accept to exit the editor. This new translation will replace
2154 any earlier translations you have supplied. This should not be a
2155 problem if the interval was indicated appropriately.
2156
2157 #If the coding sequence that you supply is a partial sequence and you
2158 have checked a Partial box on the Location subpage, it is a good idea to
2159 check the Synchronize Partials box. In this case, Sequin will ensure
2160 that all other appropriate features (such as protein) are also marked as
2161 partial.
2162
2163 #When editing existing CDS features, choose the sequence you want to
2164 view by selecting its name under the Product pop-up menu. You may also
2165 import a new protein sequence by selecting Import Protein FASTA under
2166 the File menu. The sequence should be formatted as described above for
2167 the Organism and Sequences form.
2168
2169 #After you have imported a protein sequence, click on Predict Interval.
2170 This function will predict the interval on the nucleotide sequence to
2171 which the coding region applies. If you do not select this function,
2172 the interval will likely be wrong, and you will get an error message
2173 when you attempt to validate the record. If your sequence is a 5' or 3'
2174 partial, you must first indicate this manually on the Location Page.
2175
2176 #You may also have Sequin generate the protein sequence from the
2177 nucleotide sequence by clicking on Translate Product. However, you must
2178 first indicate the location and partialness of the coding region on the
2179 Location page in order to obtain the correct translation.
2180
2181 #The Edit Protein Sequence button will launch an amino acid
2182 <A HREF="#SequenceEditor">
2183 Sequence Editor
2184 </A>
2185 as discussed below.
2186
2187 #The Adjust for Stop Codon button will truncate a displayed translation
2188 at the first stop codon. If no stop codon is present in the current
2189 translation, this function will extend the translation to the first stop
2190 codon or to the end of the sequence. In both cases, the spans of the
2191 coding region will be automatically updated on the Location Page to
2192 reflect the new translation.
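
#As an illustration only, the behavior described above can be sketched in Python. This is not Sequin's implementation: the function name, the restriction to the standard genetic code, and the assumption of a single interval on the plus strand are all choices made for this example.

```python
# Assumption: standard genetic code (translation table 1). Other tables
# use different stop codons, so this set would need to change.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def adjust_for_stop_codon(seq, start, frame=1):
    """Return (end, found_stop) for a coding region on the plus strand.

    seq   -- nucleotide sequence as a string
    start -- 1-based position of the first base of the coding region
    frame -- reading frame (1, 2, or 3) relative to start

    end is the 1-based position of the last base of the first in-frame
    stop codon. If no in-frame stop codon exists, end is the last base of
    the sequence and found_stop is False.
    """
    seq = seq.upper()
    first_codon = (start - 1) + (frame - 1)  # 0-based index of first full codon
    for i in range(first_codon, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i + 3, True
    return len(seq), False
```

#With the hypothetical sequence ATGAAATAGCCC and a start of 1, the sketch returns an end of 9, which includes the stop codon TAG in the coding region span.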
2193
2194 ***Protein Subpage
2195
2196 #Use this page to enter or edit a name or description of the protein
2197 product. For a new sequence, enter information directly into the
2198 boxes. You can edit descriptions of an existing sequence by clicking
2199 on Edit Protein Feature, which will bring up the Protein feature form.
2200 The Launch Product Viewer button displays the flatfile view of the protein
2201 record generated from the information in the CDS feature.
2202
2203 ***Exceptions Subpage
2204
2205 #Exceptions describe places where there is a posttranslational
2206 modification. Enter the amino acid position at which the modification
2207 occurs, and select the amino acid that is actually represented in the
2208 protein from the pop-up list. Sequin will change the amino acid number
2209 to a nucleotide interval. Please provide some explanation for the
2210 exception in a comment.
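
#The conversion from an amino acid number to a nucleotide interval is simple arithmetic, sketched below in Python. The function name is hypothetical, and the sketch assumes a coding region with a single continuous interval on the plus strand; Sequin itself also handles multi-interval and minus-strand locations.

```python
def residue_to_interval(cds_start, aa_position, frame=1):
    """Map a 1-based amino acid position to the 1-based nucleotide
    interval of its codon.

    cds_start   -- 1-based position of the first base of the coding region
    aa_position -- 1-based position of the amino acid in the translation
    frame       -- reading frame (1, 2, or 3) of the coding region
    """
    if aa_position < 1:
        raise ValueError("amino acid positions are 1-based")
    first_base = cds_start + (frame - 1) + 3 * (aa_position - 1)
    return first_base, first_base + 2
```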
2211
2212 *conflict
2213
2214 #Independent determinations of the "same" sequence differ at this site
2215 or region.
2216
2217 *D-loop
2218
2219 #Displacement loop; a region within mitochondrial DNA in which a short
2220 stretch of RNA is paired with one strand of DNA, displacing the
2221 original partner DNA strand in this region; also used to describe the
2222 displacement of a region of one strand of duplex DNA by a single
2223 stranded invader in the reaction catalyzed by RecA protein.
2224
2225 *D_segment
2226
2227 #Diversity segment of immunoglobulin heavy chain, and T-cell receptor
2228 beta chain.
2229
2230 *enhancer
2231
2232 #A cis-acting sequence that increases the utilization of (some)
2233 eukaryotic promoters and can function in either orientation and in any
2234 location (upstream or downstream) relative to the promoter.
2235
2236 *exon
2237
2238 #Region of genome that codes for portion of spliced mRNA; may contain
2239 5' UTR, all CDSs, and 3' UTR.
2240
2241 *gap
2242
2243 #Gap in the sequence, only applied to gaps of unknown length. The
2244 location span of the gap feature is 100 base pairs, indicated by 100 "n"s
2245 in the sequence. The qualifier /estimated_length=unknown is mandatory.
2246
2247 *GC_signal
2248
2249 #GC box; a conserved GC-rich region located upstream of the start point
2250 of eukaryotic transcription units that may occur in multiple copies or
2251 in either orientation; consensus=GGGCGG.
2252
2253 *gene
2254
2255 #Region of biological interest identified as a gene and for which a name
2256 has been assigned.
2257
2258 *iDNA
2259
2260 #Intervening DNA; DNA which is eliminated through any of several kinds
2261 of recombination.
2262
2263 *intron
2264
2265 #A segment of DNA that is transcribed, but removed from within the
2266 transcript, by splicing together the sequences (exons) on either side of
2267 it.
2268
2269 *J_segment
2270
2271 #Joining segment of immunoglobulin light and heavy chains, and T-cell
2272 receptor alpha, beta, and gamma chains.
2273
2274 *LTR
2275
2276 #Long terminal repeat, a sequence directly repeated at both ends of a
2277 defined sequence, of the sort typically found in retroviruses.
2278
2279 *mat_peptide
2280
2281 #Mature peptide or protein coding sequence; coding sequence for the
2282 mature or final peptide or protein product following post-translational
2283 modification. The location does not include the stop codon (unlike the
2284 corresponding CDS).
2285
2286 *misc_binding
2287
2288 #Site in nucleic acid that covalently or non-covalently binds another
2289 moiety that cannot be described by any other Binding key (primer_bind or
2290 protein_bind).
2291
2292 *misc_difference
2293
2294 #Feature sequence is different from that presented in the entry and
2295 cannot be described by any other Difference key (conflict, unsure,
2296 mutation, variation, allele, or modified_base).
2297
2298 *misc_feature
2299
2300 #Region of biological interest which cannot be described by any other
2301 feature key.
2302
2303 *misc_recomb
2304
2305 #Site of any generalized, site-specific, or replicative recombination
2306 event where there is a breakage and reunion of duplex DNA that cannot be
2307 described by other recombination keys (iDNA and virion) or qualifiers of
2308 source key (/proviral).
2309
2310 *misc_RNA
2311
2312 #Any transcript or RNA product that cannot be defined by other RNA keys
2313 (prim_transcript, precursor_RNA, mRNA, 5'UTR, 3'UTR,
2314 exon, transit_peptide, polyA_site, rRNA, tRNA, and ncRNA).
2315
2316 *misc_signal
2317
2318 #Any region containing a signal controlling or altering gene function or
2319 expression that cannot be described by other Signal keys (promoter,
2320 CAAT_signal, TATA_signal, -35_signal, -10_signal, GC_signal, RBS,
2321 polyA_signal, enhancer, attenuator, terminator, and rep_origin).
2322
2323 *misc_structure
2324
2325 #Any secondary or tertiary structure or conformation that cannot be
2326 described by other Structure keys (stem_loop and D-loop).
2327
2328 *modified_base
2329
2330 #The indicated nucleotide is a modified nucleotide and should be
2331 substituted for by the indicated molecule (given in the mod_base
2332 qualifier value).
2333
2334 *mRNA
2335
2336 #messenger RNA; includes 5' untranslated region (5' UTR), coding sequences
2337 (CDS, exon) and 3' untranslated region (3' UTR).
2338
2339 *ncRNA
2340
2341 #non-coding RNA; a non-protein-coding transcript other than ribosomal RNA and
2342 transfer RNA, including antisense RNA, guide RNA, scRNA, siRNA, miRNA, piRNA,
2343 snoRNA, and snRNA. The specific type of ncRNA must be specified in the
2344 /ncRNA_class qualifier.
2345
2346 *N_region
2347
2348 #Extra nucleotides inserted between rearranged immunoglobulin segments.
2349
2350 *operon
2351
2352 #Region containing polycistronic transcript under the control of the same
2353 regulatory sequences.
2354
2355 *oriT
2356
2357 #Origin of transfer; region of DNA where transfer is initiated during the
2358 process of conjugation or mobilization.
2359
2360 *polyA_signal
2361
2362 #Recognition region necessary for endonuclease cleavage of an RNA
2363 transcript that is followed by polyadenylation; consensus=AATAAA.
2364
2365 *polyA_site
2366
2367 #Site on an RNA transcript to which adenine residues will be added by
2368 post-transcriptional polyadenylation.
2369
2370 *precursor_RNA
2371
2372 #Any RNA species that is not yet the mature RNA product; may include 5'
2373 clipped region (5' clip), 5' untranslated region (5' UTR), coding
2374 sequences (CDS, exon), intervening sequences (intron), 3' untranslated
2375 region (3' UTR), and 3' clipped region (3' clip).
2376
2377 *prim_transcript
2378
2379 #Primary (initial, unprocessed) transcript; includes 5' clipped region
2380 (5' clip), 5' untranslated region (5' UTR), coding sequences (CDS, exon),
2381 intervening sequences (intron), 3' untranslated region (3' UTR), and 3'
2382 clipped region (3' clip).
2383
2384 *primer_bind
2385
2386 #Non-covalent primer binding site for initiation of replication,
2387 transcription, or reverse transcription. Includes site(s) for synthetic
2388 elements, e.g., PCR primers.
2389
2390 *promoter
2391
2392 #Region on a DNA molecule involved in RNA polymerase binding to initiate
2393 transcription.
2394
2395 *protein_bind
2396
2397 #Non-covalent protein binding site on nucleic acid.
2398
2399 *RBS
2400
2401 #Ribosome binding site.
2402
2403 *repeat_region
2404
2405 #Region of genome containing repeating units. Some qualifiers such as
2406 rpt_type, mobile_element and satellite have controlled vocabularies. These
2407 qualifiers have check boxes or pull-down menus to ensure that the
2408 correct format is used.
2409
2410 *rep_origin
2411
2412 #Origin of replication; starting site for duplication of nucleic acid to
2413 give two identical copies.
2414
2415 *rRNA
2416
2417 #Mature ribosomal RNA; the RNA component of the ribonucleoprotein
2418 particle (ribosome) that assembles amino acids into proteins.
2419
2420 *S_region
2421
2422 #Switch region of immunoglobulin heavy chains. Involved in the
2423 rearrangement of heavy chain DNA leading to the expression of a
2424 different immunoglobulin class from the same B-cell.
2425
2426 *sig_peptide
2427
2428 #Signal peptide coding sequence; coding sequence for an N-terminal
2429 domain of a secreted protein; this domain is involved in attaching
2430 nascent polypeptide to the membrane; leader sequence.
2431
2432 *source
2433
2434 #Identifies the biological source of the specified span of the sequence.
2435 This key is mandatory. Every entry will have, as a minimum, a single
2436 source key spanning the entire sequence. More than one source key per
2437 sequence is permitted.
2438
2439 *stem_loop
2440
2441 #Hairpin; a double-helical region formed by base-pairing between
2442 adjacent (inverted) complementary sequences in a single strand of RNA or
2443 DNA.
2444
2445 *STS
2446
2447 #Sequence Tagged Site. Short, single-copy DNA sequence that
2448 characterizes a mapping landmark on the genome and can be detected by
2449 PCR. A region of the genome can be mapped by determining the order of a
2450 series of STSs.
2451
2452 *TATA_signal
2453
2454 #TATA box; Goldberg-Hogness box; a conserved AT-rich heptamer found
2455 about 25 bp before the start point of each eukaryotic RNA polymerase II
2456 transcript unit that may be involved in positioning the enzyme for
2457 correct initiation; consensus=TATA(A or T)A(A or T).
2458
2459 *terminator
2460
2461 #Sequence of DNA located either at the end of the transcript or adjacent
2462 to a promoter region that causes RNA polymerase to terminate
2463 transcription; may also be site of binding of repressor protein.
2464
2465 *tmRNA
2466
2467 #Transfer messenger RNA; acts as a tRNA first, then an mRNA that encodes a
2468 peptide tag.
2469
2470 *transit_peptide
2471
2472 #Transit peptide coding sequence; coding sequence for an N-terminal
2473 domain of a nuclear-encoded organellar protein; this domain is involved
2474 in post-translational import of the protein into the organelle.
2475
2476 *tRNA
2477
2478 #Mature transfer RNA, a small RNA molecule (75-85 bases long) that
2479 mediates the translation of a nucleic acid sequence into an amino acid
2480 sequence.
2481
2482 *unsure
2483
2484 #Author is unsure of exact sequence in this region.
2485
2486 *V_region
2487
2488 #Variable region of immunoglobulin light and heavy chains, and T-cell
2489 receptor alpha, beta, and gamma chains. Codes for the variable amino
2490 terminal portion. Can be made up from V_segments, D_segments,
2491 N_regions, and J_segments.
2492
2493 *V_segment
2494
2495 #Variable segment of immunoglobulin light and heavy chains, and T-cell
2496 receptor alpha, beta, and gamma chains. Codes for most of the variable
2497 region (V_region) and the last few amino acids of the leader peptide.
2498
2499 *variation
2500
2501 #A related strain contains stable mutations from the same gene (e.g.,
2502 RFLPs, polymorphisms, etc.) that differ from the presented sequence at
2503 this location (and possibly others).
2504
2505 *3'UTR
2506
2507 #Region near or at the 3' end of a mature transcript (usually following
2508 the stop codon) that is not translated into a protein; trailer.
2509
2510 *5'UTR
2511
2512 #Region near or at the 5' end of a mature transcript (usually preceding
2513 the initiation codon) that is not translated into a protein; leader.
2514
2515 * -10_signal
2516
2517 #Pribnow box; a conserved region about 10 bp upstream of the start point
2518 of bacterial transcription units that may be involved in binding RNA
2519 polymerase; consensus=TAtAaT.
2520
2521 * -35_signal
2522
2523 #A conserved hexamer about 35 bp upstream of the start point of
2524 bacterial transcription units; consensus=TTGACa or TGTTGACA.
2525
2526 >Biological Source Descriptor or Feature
2527
2528 #This annotation is very important, as an entry cannot be processed by
2529 the databases unless it includes some basic information about the
2530 organism from which the sequence was derived. This basic information was
2531 entered previously in the submission, in the Organism and Sequences
2532 Form. The more detailed Organism Information form allows you to alter
2533 or add to the data you entered earlier.
2534
2535 *Overview: Descriptor or Feature?
2536
2537 #Sequin allows two types of biological source information to be entered,
2538 Biological Source Descriptors and Biological Source Features. Biological
2539 Source Descriptors, like other descriptors, provide organism information
2540 about an entire sequence, or an entire set of sequences, in an entry.
2541 Biological Source Features, like other features, provide organism
2542 information about a specific interval on a given sequence.
2543
2544 #In most cases, you will want to use a Biological Source Descriptor, because
2545 all the sequences in the entry will derive from the same source. However, if
2546 you have sequenced a transgenic molecule, for example, one that is part plant
2547 and part bacterial, you would use Biological Source Features to annotate which
2548 sequence was derived from plant and which from bacteria.
2549
2550 #To add a Biological Source Descriptor, select Biological Source under
2551 the Descriptor section of the Annotate menu. To add a Biological
2552 Source Feature, select Biological Source under the Bibliographic and
2553 Comments section of the Annotate menu.
2554
2555 #Annotating a Biological Source Descriptor or Feature is similar to
2556 annotating any descriptor or feature. For help in creating descriptors
2557 and features, see the appropriate section of the help documentation.
2558 The following are instructions for filling out Biological
2559 Source-specific forms.
2560
2561 *Organism Page
2562
2563 **Names Subpage
2564
2565 #The scrollable list contains the scientific names of many organisms.
2566 To reach a name on the list, either type the first few letters of the
2567 scientific name, or use the thumb bar. Click on a name from the list to
2568 fill out the scientific name field. If there is a common name for the
2569 organism, that field will be filled out automatically. You may also
2570 directly type in the scientific name. If you have any questions about
2571 the scientific or common name of an organism, see the NCBI
2572 <A HREF="http://www.ncbi.nlm.nih.gov/Taxonomy/tax.html">
2573 taxonomy browser
2574 </A>
2575 (http://www.ncbi.nlm.nih.gov/Taxonomy/tax.html).
2576
2577 **Location Subpage
2578
2579 ***Location of Sequence
2580
2581 #From the selection list, please choose the location of the genome that
2582 contains your sequence. Most entries will have a "Genomic" location.
2583 Brief descriptions of the choices in this pop-up menu were given
2584 previously.
2585
2586 ***Origin of Sequence
2587
2588 #This menu is for the use of database personnel. Please leave this
2589 field empty. The Biological focus box should be checked in rare cases
2590 where multiple source features are annotated.
2591
2592 **Genetic Codes Subpage
2593
2594 #Please use these fields to select the nuclear and mitochondrial genetic
2595 code that should be used to translate the nucleic acid sequence. The
2596 nuclear genetic code for most eukaryotes is "Standard". If you selected
2597 an organism name from the scrollable list described above, this field
2598 was filled out automatically. Do not change these fields if they have
2599 been filled out automatically.
2600
2601 #For more information regarding the translation tables available, see
2602 the NCBI Taxonomy
2603
2604 <A HREF="http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c">
2605 page
2606 </A>.
2607
2608 **Lineage Subpage
2609
2610 #This information is normally entered by the database staff. They will
2611 use the
2612 <A HREF="http://www.ncbi.nlm.nih.gov/Taxonomy/tax.html">
2613 Taxonomy database
2614 http://www.ncbi.nlm.nih.gov/Taxonomy/tax.html
2615 </A>
2616 maintained by the NCBI/GenBank.
2617
2618 #If you disagree with the lineage supplied, please notify the database
2619 staff.
2620
2621 #If you are running Sequin in its
2622 <A HREF="#NetConfigure">
2623 network-aware
2624 </A>
2625 mode, you will see a button labeled "Lookup Taxonomy". Click on this
2626 button to perform an automatic look-up of the taxonomic lineage of the
2627 organism. Sequin will perform the look-up by accessing the Taxonomy
2628 database and will fill out the Taxonomic Lineage and
2629 Division fields.
2630
2631 #If you have any comments about the taxonomic lineage determined by
2632 Sequin, please submit these comments with your entry. Under the Sequin
2633 File menu, select Edit Submitter Info. Enter your comments in the box
2634 entitled "Special Instructions to Database Staff", on the Submission
2635 page.
2636
2637 *Modifiers Page
2638
2639 #This page allows you to enter additional information about the source
2640 and/or organism. Entering information is optional.
2641
2642 **Source Subpage
2643
2644 #Choose a modifier from the pull-down menu on the left side of the page
2645 and type the appropriate name on the right side of the page. If you do
2646 not find an appropriate modifier in the pull-down list, you can enter
2647 additional source information as text in the field at the bottom of the
2648 page. You may add multiple modifiers to describe the source organism.
2649
2650 #Clicking on the X button to the right of the text box will remove the
2651 text and clear the modifier from the pull-down in that line.
2652
2653 #The following is a description of the available modifiers:
2654
2655 #-Cell-line: Cell line from which sequence derives.
2656
2657 #-Cell-type: Type of cell from which sequence derives.
2658
2659 #-Chromosome: Chromosome to which the gene maps.
2660
2661 #-Clone: Name of clone from which sequence was obtained.
2662
2663 #-Clone-lib: Name of library from which sequence was obtained.
2664
2665 #-Collected-by: Name of person who collected sample. Do not use
2666 accented or non-ASCII characters.
2667
2668 #-Collection-date: Date sample was collected. Must use format
2669 23-Mar-2005, Mar-2005, or 2005.
2670
2671 #-Country: The country of origin of DNA samples used for epidemiological
2672 or population studies. A list of approved country designations can
2673 be found on the
2674 <A HREF="http://www.ncbi.nlm.nih.gov/projects/collab/country.html">
2675 INSDC web pages</A>. Additional text may be added after a colon. For example,
2676 /country="USA: Bethesda, MD".
2677
2678 #-Dev-stage: Developmental stage of organism.
2679
2680 #-Endogenous-virus-name: Name of inactive virus that is integrated into
2681 the chromosome of its host cell and can therefore exhibit vertical
2682 transmission.
2683
2684 #-Environmental-sample: Identifies sequence derived by direct molecular
2685 isolation from an unidentified organism. You cannot include extra text when
2686 using this modifier; the text box will change to TRUE upon selection of this
2687 modifier from the pull-down list.
2688
2689 #-Frequency: Frequency of occurrence of a feature.
2690
2691 #-Fwd-PCR-primer-name: Name or designation of forward primer used for
2692 amplification.
2693
2694 #-Fwd-PCR-primer-seq: Sequence of forward primer used for amplification.
2695
2696 #-Genotype: Genotype of the organism.
2697
2698 #-Germline: If the sequence shown is DNA and a member of the
2699 immunoglobulin family, this qualifier is used to denote that the sequence
2700 is from unrearranged DNA. You cannot include extra text when using this
2701 modifier; the text box will change to TRUE upon selection of this modifier
2702 from the pull-down list.
2703
2704 #-Haplogroup: Combination of stable polymorphic variants clustered together
2705 in a specific combination which can indicate a common ancestor.
2706
2707 #-Haplotype: Haplotype of the organism.
2708
2709 #-Identified-by: Name of person who identified sample. Do not use
2710 accented or non-ASCII characters.
2711
2712 #-Isolation-source: Describes the local geographical source of the organism
2713 from which the sequence was derived.
2714
2715 #-Lab-host: Laboratory host used to propagate the organism from which
2716 the sequence was derived.
2717
2718 #-Lat-Lon: Latitude and longitude of location where sample was
2719 collected. Mandatory format is decimal degrees N/S E/W. Selecting this
2720 modifier in the pull-down list will generate separate boxes for entering the
2721 information in the mandatory format.
2722
2723 #-Linkage-group: Group of genes whose loci are physically connected and tend
2724 to segregate together during meiosis.
2725
2726 #-Map: Map location of the gene.
2727
2728 #-Mating-type: Designation of individual single-celled organisms and protists
2729 based on mating behavior.
2730
2731 #-Metagenomic: Identifies sequence from a culture-independent genomic
2732 analysis of an environmental sample submitted as part of a whole genome
2733 shotgun project. You may not include extra text when using this modifier;
2734 instead, the text box will change to TRUE upon selection.
2735
2736 #-Plasmid-name: Name of plasmid from which the sequence was obtained.
2737
2738 #-Pop-variant: Name of the population variant from which the sequence was
2739 obtained.
2740
2741 #-Rearranged: If the sequence shown is DNA and a member of the
2742 immunoglobulin family, this qualifier is used to denote that the sequence
2743 is from rearranged DNA. You cannot include extra text when using this
2744 modifier; the text box will change to TRUE upon selection of this modifier
2745 from the pull-down list.
2746
2747 #-Rev-PCR-primer-name: Name or designation of reverse primer used for
2748 amplification.
2749
2750 #-Rev-PCR-primer-seq: Sequence of reverse primer used for amplification.
2751
2752 #-Segment: Name of a segment of a viral genome that is fragmented into two
2753 or more nucleic acid molecules.
2754
2755 #-Sex: Sex of the organism from which the sequence derives.
2756
2757 #-Subclone: Name of subclone from which sequence was obtained.
2758
2759 #-Tissue-lib: Tissue library from which the sequence was obtained.
2760
2761 #-Tissue-type: Type of tissue from which sequence derives.
2762
2763 #-Transgenic: Identifies organism that was the recipient of transgenic
2764 DNA. You cannot include extra text when using this modifier; the text box
2765 will change to TRUE upon selection of this modifier from the pull-down list.
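
#The Collection-date and Lat-Lon formats listed above can be checked before the values are typed into the form. The following Python sketch is illustrative only and is not part of Sequin. The function names are hypothetical, the two-digit day and English month abbreviations are inferred from the example 23-Mar-2005, and the exact spacing of the Lat-Lon string (for instance "38.98 N 77.11 W") is an assumption.

```python
import calendar
import re

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Accepts DD-Mon-YYYY, Mon-YYYY, or YYYY.
_DATE_RE = re.compile(r"^(?:(?:(\d{2})-)?([A-Z][a-z]{2})-)?(\d{4})$")

# Assumed layout: decimal degrees, a space, N or S, a space,
# decimal degrees, a space, E or W.
_LAT_LON_RE = re.compile(r"^(\d+(?:\.\d+)?) ([NS]) (\d+(?:\.\d+)?) ([EW])$")


def is_valid_collection_date(text):
    """Return True if text looks like 23-Mar-2005, Mar-2005, or 2005."""
    match = _DATE_RE.match(text)
    if not match:
        return False
    day, month, year = match.groups()
    if month is not None and month not in MONTHS:
        return False
    if day is not None:
        # Reject impossible days such as 30-Feb.
        last_day = calendar.monthrange(int(year), MONTHS.index(month) + 1)[1]
        return 1 <= int(day) <= last_day
    return True


def is_valid_lat_lon(text):
    """Return True if text is decimal degrees N/S followed by E/W."""
    match = _LAT_LON_RE.match(text)
    if not match:
        return False
    latitude = float(match.group(1))
    longitude = float(match.group(3))
    return latitude <= 90.0 and longitude <= 180.0
```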
2766
2767 **Organism Subpage
2768
2769 #Choose a modifier from the pull-down menu on the left side of the page
2770 and type the appropriate name on the right side of the page. If you do
2771 not find an appropriate modifier in the pull-down list, you can enter
2772 additional organism information as text in the field at the bottom of
2773 the page. You may add multiple modifiers to describe the source organism.
2774
2775 #Clicking on the X button to the right of the text box will remove the text
2776 and clear the modifier from the pull-down in that line.
2777
2778 #The following is a description of the available modifiers:
2779
2780 #-Acronym: Standard synonym (usually of a virus) based on the initials
2781 of the formal name. An example is HIV-1.
2782
2783 #-Anamorph: The scientific name applied to the asexual phase of a fungus.
2784
2785 #-Authority: The author or authors of the name of the organism from which
2786 sequence was obtained.
2787
2788 #-Bio-material: An identifier of the stored biological material from which
2789 the sequence was obtained. This qualifier should be used to cite collections
2790 that are not appropriate in specimen-voucher or culture-collection. Examples
2791 include stock centers and seed banks. Mandatory format is "institution
2792 code:collection code:material_id". However, only material_id is required.
2793 Selecting this modifier in the pull-down list will generate separate boxes for
2794 entering the information in the correct format.
2795
2796 #-Biotype: See biovar.
2797
2798 #-Biovar: Variety of a species (usually a fungus, bacterium, or virus)
2799 characterized by some specific biological property (often geographical,
2800 ecological, or physiological). Same as biotype.
2801
2802 #-Breed: The named breed from which sequence was obtained (usually applied
2803 to domesticated mammals).
2804
2805 #-Chemovar: Variety of a species (usually a fungus, bacterium, or virus)
2806 characterized by its biochemical properties.
2807
2808 #-Common: Common name of the organism from which sequence was obtained.
2809
2810 #-Cultivar: Cultivated variety of plant from which sequence was obtained.
2811
2812 #-Culture-collection: Identifier and institution code of the microbial or
2813 viral culture or stored cell-line from which the sequence was obtained. This
2814 qualifier should be used to cite the collection where the author has deposited
2815 the culture or from which the culture was obtained. Personal library
2816 collections should be annotated in strain and not in culture-collection.
2817 Mandatory format is "institution code:collection code:culture_id". However,
2818 collection code is not required. Selecting this modifier in the pull-down
2819 list will generate separate boxes for entering the information in the correct
2820 format.
2821
2822 #-Ecotype: The named ecotype (population adapted to a local habitat) from
2823 which sequence was obtained (customarily applied to populations of
2824 Arabidopsis thaliana).
2825
2826 #-Forma: The forma (lowest taxonomic unit governed by the nomenclatural
2827 codes) of organism from which sequence was obtained. This term is usually
2828 applied to plants and fungi.
2829
2830 #-Forma-specialis: The physiologically distinct form from which sequence
2831 was obtained (usually restricted to certain parasitic fungi).
2832
2833 #-Group: Do not select this item.
2834
2835 #-Host: Natural (as opposed to laboratory) host to the organism from which
2836 sequenced molecule was obtained. Use of the Latin name of the host organism
2837 is preferred.
2838
2839 #-Isolate: Identification or description of the specific individual
2840 from which this sequence was obtained. An example is Patient X14.
2841
2842 #-Metagenome-source: Used only for genome projects. Do not select this item.
2843
2844 #-Pathovar: Variety of a species (usually a fungus, bacterium, or virus)
2845 characterized by the biological target of the pathogen. Examples
2846 include Pseudomonas syringae pathovar tomato and Pseudomonas syringae
2847 pathovar tabaci.
2848
2849 #-Serogroup: See serotype.
2850
2851 #-Serotype: Variety of a species (usually a fungus, bacterium, or virus)
2852 characterized by its antigenic properties. Same as serogroup and
2853 serovar.
2854
2855 #-Serovar: See serotype.
2856
2857 #-Specimen-voucher: Identifier of the physical specimen from which the
2858 sequence was obtained. The qualifier is intended for use where the sample is
2859 still available in a curated museum, herbarium, frozen tissue collection, or
2860 personal collection. Mandatory format is "institution code:collection
2861 code:specimen_id". However, only specimen_id is required. Selecting this
2862 modifier in the pull-down list will generate separate boxes for entering the
2863 information in the correct format.
2864
2865 #-Strain: Strain of organism from which sequence was obtained.
2866
2867 #-Subgroup: Do not select this item.
2868
2869 #-Sub-species: Subspecies of organism from which sequence was obtained.
2870
2871 #-Substrain: Sub-strain of organism from which sequence was obtained.
2872
2873 #-Subtype: Subtype of organism from which sequence was obtained.
2874
2875 #-Synonym: A synonym (alternate scientific name) for the organism
2876 from which sequence was obtained.
2877
2878 #-Teleomorph: The scientific name applied to the sexual phase of a fungus.
2879
2880 #-Type: Type of organism from which sequence was obtained.
2881
2882 #-Variety: Variety of organism from which sequence was obtained.
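
#The structured format shared by Bio-material, Culture-collection, and Specimen-voucher can be split into its parts as sketched below in Python. This is an illustration, not Sequin's own parsing code. The names Voucher and parse_voucher are hypothetical, the sample values in the comments are made up, and reading a two-part value as "institution code:identifier" is an assumption based on the note that the collection code is optional.

```python
from typing import NamedTuple, Optional


class Voucher(NamedTuple):
    institution: Optional[str]  # institution code, if given
    collection: Optional[str]   # collection code, if given
    identifier: str             # material_id, culture_id, or specimen_id


def parse_voucher(text):
    """Split "institution code:collection code:identifier" into parts.

    Accepted shapes (values are made-up examples):
        "INST:COLL:12345" -> all three parts
        "INST:12345"      -> institution and identifier (assumed meaning)
        "12345"           -> identifier only
    """
    parts = [part.strip() for part in text.split(":")]
    if len(parts) > 3:
        raise ValueError("too many ':'-separated parts: %r" % text)
    if not parts[-1]:
        raise ValueError("the identifier is required: %r" % text)
    if len(parts) == 3:
        return Voucher(parts[0], parts[1], parts[2])
    if len(parts) == 2:
        return Voucher(parts[0], None, parts[1])
    return Voucher(None, None, parts[0])
```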
2883
2884 **GenBank Subpage
2885
2886 #Please do not use this form. This field is reserved for information from
2887 NCBI's taxonomy database.
2888
2889 *Miscellaneous Page
2890
2891 **Synonyms Subpage
2892
2893 #If there are alternative names for the organism from which the sequence
2894 was derived, enter them here. Please be aware that this is the
2895 appropriate field only for alternative names for the organism, not for
2896 alternative gene or protein names.
2897
2898 **Cross-Refs Subpage
2899
2900 #This page is for use by database staff only.
2901
2902 >Publications
2903
2904 *Overview: Descriptor or Feature?
2905
2906 #Sequin allows two types of publications to be entered, Publication
2907 Descriptors and Publication Features. Publication Descriptors are
2908 bibliographic references that, like other descriptors, cover an entire
2909 sequence, or an entire set of sequences, in an entry. Publication
2910 Features are bibliographic references that, like other features, cover
2911 a specific interval on a given sequence.
2912
2913 #Publications are entered into the Reference field of the database
2914 entry. References are citations of unpublished, in press, or published
2915 works that are relevant to the submitted sequence. Publications
2916 should provide information regarding the principal cloning and
2917 determination of the sequence within the record.
2918
2919 #In general, there is one publication describing a sequence, and a
2920 Publication Descriptor should be used. To enter a Publication
2921 Descriptor, select Publications under the Annotate menu and click on
2922 Publication Descriptor.
2923
2924 #However, if one publication describes the cloning of the 5' end of a
2925 gene, and another publication describes the cloning of the 3' end of
2926 the gene, Publication Features may be used. To make a Publication
2927 Feature, choose Publication Feature in the Publications section of the
2928 Annotate menu. Enter the information about the publication, and then
2929 enter the nucleotide interval to which the publication refers on the
2930 Location page.
2931
2932 *Citation on Entry Form
2933
2934 **Status
2935
2936 #Using the radio buttons, select one of the three options:
2937
2938 #-Unpublished: Select this option if a manuscript has been written but
2939 not yet submitted or has been submitted for publication but has not yet
2940 been accepted.
2941
2942 #-In Press: The article has been accepted for publication but is not yet
2943 in print.
2944
2945 #-Published: The article has been published.
2946
2947 **Class
2948
2949 #Using the radio buttons, select the type of publication in which the
2950 sequence will appear.
2951
2952 #-Journal
2953
2954 #-Book Chapter
2955
2956 #-Book
2957
2958 #-Thesis/Monograph
2959
2960 #-Proceedings Chapter: Abstract from a meeting
2961
2962 #-Proceedings: A meeting
2963
2964 #-Patent
2965
2966 #-Online Publication: Used for journals which publish strictly online and
2967 do not issue print copies.
2968
2969 #-Submission
2970
2971 **Scope
2972
2973 #Using the radio buttons, select one of the options.
2974
2975 #-Refers to the entire sequence: Most publications should be classified
2976 as such.
2977
2978 #-Refers to part of the sequence: For use only when a publication
2979 discusses only part of the presented sequence. You must enter the
2980 locations in the location tab in later forms. This selection is only
2981 valid when adding a Publication Feature, not a Publication Descriptor.
2982
2983 #-Cites a feature on the sequence: This selection should only be made in
2984 limited cases. Its use must coincide with the use of the /citation
2985 qualifier on the given feature.
2986
2987 #After you have filled out the Citation on Entry form, click on
2988 "Proceed" to see the next form.
2989
2990 *Citation Information Form (General)
2991
2992 **Authors Page
2993
2994 ***Names Subpage
2995
2996 #Please enter the names of the authors. Note that the first name of the
2997 author is listed first. You can add as many authors to this page as
2998 necessary. After you type in the name of the third author, the box
2999 becomes a spreadsheet, and you can scroll down to the next line by
3000 using the thumb bar. The suffix toggle allows the addition of common
3001 suffixes to the author name. The consortium field should be used when
3002 a consortium is responsible for the sequencing or publication of the
3003 data. The consortium should not be the department or institute
3004 affiliation of the authors. Individual authors may be listed along
3005 with a consortium name.
3006
3007 ***Affiliation Subpage
3008
3009 #Please enter information about the institution where the sequencing was
3010 performed.
3011
3012 #Other pages in the Citation Information Form will be different,
3013 depending on the Class of publication selected in the Citation on Entry
3014 Form. Instructions for filling out the Citation Information Form for
3015 Journals are included here.
3016
3017 *Citation Information Form (If Selected Class Was Journal)
3018
3019 **Title Page
3020
3021 #Enter the title of the manuscript in the box.
3022
3023 **Journal Page
3024
3025 #Fill in the appropriate Journal, Volume, Issue, Pages, Day, and Year
3026 fields by typing information into the boxes. Select the month with the
3027 pop-up menu. If necessary, choose an option from the Erratum pop-up
3028 menu and explain the erratum.
3029
3030 #If you are running Sequin in its
3031 <A HREF="#NetConfigure">
3032 network-aware
3033 </A>
3034 mode, the program will look up the Title, Author, and Journal
3035 information in the MEDLINE database if you supply it with some minimal
3036 information. For example, if you know the MUID (MEDLINE Unique
3037 Identifier) of the publication, enter it in the appropriate box and
3038 select "Lookup By MUID." Sequin will automatically retrieve the rest
3039 of the information. One way to find the MUID of the publication is to
3040 look up the publication with the NCBI's
3041
3042 <A HREF="http://www.ncbi.nlm.nih.gov/Entrez">
3043 Entrez
3044 </A>
3045 service. Alternatively, if you do not know the MUID, enter the Journal,
3046 Volume, Pages, and Year. Then select "Lookup Article". Sequin will
3047 retrieve the missing Title and Author information.
3048
3049 #The PubStatus toggle is used by database staff. If you have used the
3050 "Lookup by MUID" or "Lookup by PMID" functions, this field may be
3051 populated. Please do not edit the information.
3052
3053 **Remark Page
3054
3055 #This page is reserved for use by the database staff.
3056
3057 >File Menu
3058
3059 *About Sequin
3060
3061 #Details about the current version of Sequin.
3062
3063 *Help
3064
3065 #Launches the help documentation.
3066
3067 *Open
3068
3069 #Open an existing entry. This option will open a record that has been
3070 previously saved in Sequin. For analysis purposes, it can also open a
3071 FASTA-formatted sequence file. The sequence will be displayed in
3072 Sequin and can be analyzed with tools such as CDD Search, but it
3073 should not be submitted, because it does not have the appropriate
3074 annotations.
3075
3076 *Close
3077
3078 #Close this entry.
3079
3080 *Export GenBank
3081
3082 #Exports the currently displayed format to a file. Do not use export
3083 ASN1 for submission of sequences to the database.
3084
3085 *Duplicate View
3086
3087 #Duplicates the entry. You can then view the entry simultaneously in
3088 different Display Formats.
3089
3090 *Save
3091
3092 #Saves the entry. Note: This merely saves the entry so you can go back
3093 and edit it. It does not prepare the entry for submission to the
3094 database, that is, it does not validate the entry.
3095
3096 *Save As
3097
3098 #See Save.
3099
3100 *Save as Binary Seq-entry
3101
3102 #Saves the file in a compressed format and should be used only when the
3103 file is to be imported into other analysis programs. Do not use this
3104 option to save files for submission directly to GenBank.
3105
3106 *Restore
3107
3108 #Replaces the displayed record with a previously saved version. This
3109 feature is useful if you have made unwanted changes since you last saved
3110 the record.
3111
3112 *Prepare Submission
3113
3114 #Prepares the entry for submission to the database. See
3115 <A HREF="#SubmittingtheFinishedRecordtotheDatabase">
3116 Submitting the Finished Record to the Database
3117 </A>
3118 in the Sequin help documentation.
3119
3120 *Print
3121
3122 #Prints the window that is currently selected. The selected window can
3123 be one of the Sequin forms or pages, or the help documentation.
3124
3125 *Quit
3126
3127 #Exit from Sequin.
3128
3129 >Edit Menu
3130
3131 *Copy
3132
3133 #Copy the selected item.
3134
3135 *Clear
3136
3137 #Clear the selected item.
3138
3139 *Edit Sequence
3140
3141 #To edit a single sequence, select the sequence identifier in the Target
3142 Sequence pop-up menu, and click on Edit Sequence. The sequence editor
3143 will be launched for that sequence. The
3144 <A HREF="#SequenceEditor">
3145 sequence editor
3146 </A>
3147 is discussed in more detail below.
3148
3149 *Alignment Assistant
3150
3151 #This option will launch the Alignment Assistant which is discussed in
3152 more detail
3153
3154 <A HREF="#Workingwithsetsofalignedsequences">
3155 below
3156 </A>
3157 .
3158
3159 *Edit Submitter Info
3160
3161 #Opens up the Submission Instructions form, which allows you to enter
3162 additional information about the person submitting the record. Much of
3163 this information was entered on the first form in Sequin, the Submitting
3164 Authors form.
3165
3166 #You can also save the information from the Submitting Authors form
3167 here, so that you can use it in subsequent Sequin submissions. Click
3168 on "Edit Submitter Info" and, under the File menu in the resulting
3169 Submission Instructions form, click on Export Submitter Info to save
3170 the information to a file. For subsequent Sequin submissions, if you
3171 have already saved the submitter information, click on Import Submitter
3172 Info under the File menu on the Submission page of the Submitting
3173 Authors form.
3174
3175 **Submission Page
3176
3177 #Indicate the type of submission. If it is a new submission, select
3178 New. If you are updating an existing submission in order to resubmit it
3179 to the databases, select Update. Check either the "Yes" or "No" radio
3180 button to indicate if the record should be released before publication.
3181 If you select "Yes", the entry will be released to the public after the
3182 database staff has added it to the database. If you select "No", fields
3183 will appear in which you can indicate the date on which the sequences
3184 should be released to the public. The submission will then be held back
3185 until formal publication of the sequence or
3186 GenBank Accession number, or until the Release Date, whichever comes
3187 first. If you have any special instructions, enter them in the box at
3188 the bottom of the page.
3189
3190 **Contact Page
3191
3192 #Update the name, affiliation, or contact numbers of the person
3193 submitting the record. Please supply a fax number to facilitate
3194 communication with database staff.
3195
3196 **Citation Page
3197
3198 #Update the names and affiliation of the people who should receive
3199 scientific credit for the generation of sequences in this entry. The
3200 address should list the principal institution in which the sequencing
3201 and/or analysis was carried out. If you are submitting the record as
3202 an update to the databases, explain the reason for the update on the
3203 Description subpage.
3204
3205 *Update Sequence
3206
3207 #This selection allows you to replace a sequence with another sequence,
3208 merge two sequences that overlap at their ends, 'patch' a corrected
3209 fragment of a sequence to the current sequence, or copy features from
3210 one sequence to another.
3211
3212 #Use Single Sequence to import a sequence in FASTA or ASN.1 format (for
3213 example, a sequence record that has already been saved in Sequin). If
3214 you are running Sequin in
3215
3216 <A HREF="#NetConfigure">
3217 Network Aware mode,
3218 </A>
3219 you can use Download Accession to import a record from Entrez. The
3220 Multiple Sequences option allows you to update multiple sequences using
3221 either FASTA or ASN.1 formats. In either format, each sequence
3222 identifier must be identical in the new and old sequences.
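
#Because the identifiers must match exactly, it can help to compare the two
files before importing a multi-sequence update. The following Python sketch
is not part of Sequin. It assumes that the SeqID is the first
whitespace-delimited token after the ">" on each FASTA definition line, and
the function names and sample data are hypothetical.

```python
def fasta_ids(text):
    """Return the sequence identifiers found on FASTA definition lines."""
    ids = []
    for line in text.splitlines():
        if line.startswith(">"):
            # Assumption: the SeqID is the first token after '>'.
            tokens = line[1:].split()
            ids.append(tokens[0] if tokens else "")
    return ids


def compare_ids(old_text, new_text):
    """Return (missing_from_new, unexpected_in_new) as sorted lists."""
    old_ids = set(fasta_ids(old_text))
    new_ids = set(fasta_ids(new_text))
    return sorted(old_ids - new_ids), sorted(new_ids - old_ids)


# Hypothetical existing set and update file.
old = ">seq1 original\nACGT\n>seq2 original\nGGCC\n"
new = ">seq1 corrected\nACGTA\n>seq3 corrected\nGGCCA\n"

missing, unexpected = compare_ids(old, new)
print("missing from update:", missing)
print("not in existing set:", unexpected)
```

#If both lists are empty, every identifier in the update file has a
counterpart in the existing set. Any identifier reported here would need to
be corrected in the update file before import.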
3223
3224 #After you import the updated sequence, a new window will open that
3225 displays two graphical views and the text of the alignment of the new
3226 and old sequence. The first graphic displays the relative length of the
3227 two sequences and the length of the overlapping region between
3228 sequences. The second graphic represents any inserts, deletions, or
3229 point changes within the aligned region between the new and old
3230 sequences. Clicking on a region in this graphic will scroll to the
3231 corresponding nucleotide sequence in the alignment text below.
3232
3233 #The Sequence Update box to the right shows the action that will be
3234 performed upon updating the sequence, i.e., no change, replace, extend
3235 5', extend 3', or patch. The patch function allows you to replace an
3236 internal fragment of the sequence without affecting flanking regions.
3237 You can also override the alignment between the new and old sequence
3238 using the Ignore alignment checkbox to force a sequence change of
3239 replace, extend 5', or extend 3'. This option allows you to append new
3240 sequence to the existing sequence when there is no overlap.
3241
3242 #If the current sequence has annotation, you can use the Existing
3243 Features box to determine whether the annotation should remain or be
3244 removed upon updating the sequence. The Do not remove option is the
3245 default. However, you may choose to remove annotated features only in
3246 the aligned area, outside the aligned area, or to remove all currently
3247 annotated features.
3248
3249 #When updating via Download Accession or an ASN.1 file, the Import
3250 Features box allows you to specify whether features from the new file
3251 should be imported to the existing record. The dialog offers
3252 different options for cases where the features on the new file are
3253 identical to those on the existing record.
3254
3255 #If you are using the Multiple Sequences option, you may choose to
3256 review the sequences and update them one by one using the Update this
3257 Sequence box at the bottom of the window. You may skip a sequence
3258 update or choose to update all sequences at once without reviewing them
3259 in the Update Sequence dialog.
3260
3261 #In any case, please carefully review the sequence and annotation in the
3262 record viewer after using the Update Sequence function.
3263
3264 *Extend Sequence
3265
3266 #This selection functions similarly to the
3267
3268 <A HREF="#UpdateSequence">
3269 Update Sequence
3270 </A>
3271
3272 function. However, you can only extend the existing sequence in either
3273 the 5' or 3' direction in cases with no overlap between the existing
3274 and new sequences.
3275
3276 *Feature Propagate
3277
3278 #This selection allows you to propagate any annotated feature from one
3279 sequence in an aligned set to other sequences within the set. For
3280 example, if one nucleotide sequence in the alignment contains a CDS
3281 feature, you can annotate a similar CDS on the other nucleotide
3282 sequences in the set.
3283
3284 #The default source of features to be propagated is the first member
3285 of the set. If you would like to use a different entry as the source of
3286 the features, scope to that entry in the Target Sequence menu before
3287 selecting Feature Propagate from the Edit menu.
3288
3289 #The Feature Propagate window allows you to select which sequences
3290 should receive the new annotation and which features will be
3291 propagated. You can also select whether the features will be extended
3292 or split at gaps in the alignment. The split at gaps selection will
3293 produce two features, one on either side of the gap within the
3294 alignment. If you are propagating a CDS feature, you may specify that
3295 the translation end at or extend through internal stop codons. You may
3296 also extend the translation after the stop codon on the source entry by
3297 choosing to translate the CDS after a partial 3' boundary. If the CDS
3298 that you are propagating to other records is partial on either end, you
3299 should select the 'Cleanup CDS partials after propagation' check box.
3300 This will retain the partial nature of the CDS features on all records.
3301 The fuse adjacent propagated intervals function will create one
3302 feature from two of the same type that contain abutting nucleotide
3303 intervals due to the nature of the alignment used for propagation.
3304
3305 *Add Sequence
3306
3307 #This selection allows you to add a new sequence to an existing
3308 population, mutation, phylogenetic, or environmental sample set.
3309 You may import the new entry in FASTA format or ASN.1 format (for
3310 example, a sequence record that has been saved in Sequin).
3311
3312 *Parse File to Source
3313
3314 #This selection allows you to add unique information for one source
3315 qualifier for each of the records in a batch or set. The input file
3316 must be in the format of a tab-delimited, two column table. The first
3317 column should list the SeqID exactly as it was listed in the original
3318 FASTA file. The second column should list the text value for the
3319 desired source qualifier for each record. Once the file has been
3320 imported, a pop-up box will appear with the source qualifiers listed in
3321 the pull down menus. The qualifiers are separated into three menus:
3322 one for taxonomic information, one for the Organism modifiers and one
3323 for the Source modifiers. For example, in order to add the clone
3324 designations 57 and 49 to the sequences labeled seq1 and seq2, the table
3325
3326 seq1 57
3327 seq2 49
3328
3329 should be used and clone should be selected from the Source modifiers
3330 pull-down menu.
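
#If the table is produced by a script rather than typed by hand, a sketch
such as the following can write it and check it against the SeqIDs from the
FASTA file. This is an illustration only, written in Python with its standard
csv module. The function names are hypothetical, and the checks shown (two
columns per row, SeqID present in the FASTA file) are assumptions drawn from
the description above, not Sequin's own validation.

```python
import csv
import io


def write_source_table(values):
    """Return tab-delimited text with one 'SeqID<TAB>value' row per record."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for seq_id, value in values:
        writer.writerow([seq_id, value])
    return buffer.getvalue()


def check_source_table(text, fasta_ids):
    """Return a list of problems found in a two-column source table."""
    problems = []
    known = set(fasta_ids)
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    for number, row in enumerate(rows, 1):
        if len(row) != 2:
            problems.append(f"line {number}: expected 2 columns, found {len(row)}")
        elif row[0] not in known:
            problems.append(f"line {number}: SeqID {row[0]!r} is not in the FASTA file")
    return problems


# The clone example from the text: seq1 -> 57, seq2 -> 49.
table = write_source_table([("seq1", "57"), ("seq2", "49")])
print(table, end="")
print(check_source_table(table, ["seq1", "seq2"]))
```

#The text returned by write_source_table can be saved to a file and imported
with Parse File to Source. The qualifier itself (clone, in this example) is
still chosen in the pull-down menu, not in the table.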
3331
3332 >Search Menu
3333
3334 *Find ASN.1
3335
3336 #Under this command, you can find and replace strings of letters in
3337 those fields of your submission that contain manually entered data.
3338 The fields that can be altered are Locus, Definition, Accession,
3339 Keywords, Source, Reference, and Features. To use this option, select
3340 Find and fill the Find and Replace lines with the appropriate text.
3341 Note that you cannot edit the sequence in this way.
3342
3343 *Find FlatFile
3344
3345 #Under this command, you can find strings of letters in all fields of
3346 your submission. You can use the Find First and Find Next buttons to
3347 identify the specified text sequentially through the flatfile.
3348
3349 *Find by Gene
3350
3351 #This option allows you to move quickly in the record viewer to a gene
3352 feature containing the specified gene symbol.
3353
3354 *Find by Protein
3355
3356 #This option allows you to move quickly in the record viewer to a CDS
3357 feature containing the specified product name.
3358
3359 *Find by Position
3360
3361 #This option allows you to move quickly in the record viewer to any
3362 feature annotated at the specified nucleotide location.
3363
3364 *Validate
3365
3366 #This option detects discrepancies between the format of your submission
3367 and that required by the database selected for entry. If discrepancies
3368 are present, it suggests ways in which to correct them. See the topic on
3369
3370 <A HREF="#SubmittingtheFinishedRecordtotheDatabase">
3371 Submitting the Finished Record to the Database
3372 </A>
3373 in the Sequin help documentation.
3374
3375 *CDD Search
3376
3377 #Performs a CDD BLAST search of the selected sequence against the
3378 NCBI's
3379 <A HREF="http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml">
3380 Conserved Domain Database
3381 </A>
3382 . To do a CDD BLAST search, Sequin must be in its network aware mode.
3383
3384 #CDD currently contains domains derived from two popular collections,
3385 Smart and Pfam, plus contributions from colleagues at NCBI. The source
3386 databases also provide descriptions and links to citations. Since
3387 conserved domains correspond to compact structural units, CDs contain
3388 links to 3D-structure via Cn3D whenever possible.
3389
3390 #The results of the CDD search will be displayed in the record
3391 viewer. These results are for your use only and should be removed
3392 from the record before submission.
3393
3394 *Vector Screen
3395
3396 #This option allows you to run a BLAST search of your nucleotide
3397 sequence(s) against NCBI's
3398 <A HREF="http://www.ncbi.nlm.nih.gov/VecScreen/UniVec.html">
3399 UniVec
3400 </A>
3401 database. We highly recommend that you run this analysis and remove
3402 any vector contamination before submission. The UniVec database
3403 contains only one copy of every unique sequence segment from a large
3404 number of vectors. It also contains sequences for commonly used
3405 adapters, linkers, and primers.
3406
3407 #To run Vector Screen on a submission containing multiple sequences,
3408 scope to ALL SEQUENCES in the Target Sequence pull-down before running
3409 the analysis. If there are many sequences, a status bar will appear
3410 indicating the progress of the search. If no contamination is found, a
3411 pop-up box will appear to notify you. If contamination is found, a
3412 miscellaneous feature will be annotated on the flatfile with the
3413 location of the contamination. Details will include the relative
3414 strength of the BLAST hit. You must trim the nucleotide sequence to
3415 remove this feature before submission.
3416
3417 *ORF Finder
3418
3419 #The ORF Finder shows a graphical representation of all open reading
3420 frames (ORFs) in the nucleotide sequence. This tool allows you to
3421 select ORFs and have them appear as coding sequence (CDS) features on
3422 the sequence record.
3423
3424 #The ORFs, indicated by colored boxes, are defined as the longest sequence
3425 containing a start codon and stop codon. All six reading frames are shown as
3426 separate lines in the graphical view. The top three lines represent the plus
3427 strands, and the bottom three lines the minus strands. In the default view,
3428 the nucleotide sequence intervals of the ORFs are displayed in descending
3429 length order on the right side of the window. Intervals on the complementary
3430 (minus) strand are indicated by a 'c'. Selecting 'Order by Start' will
3431 reorder the list based on the nucleotide location of the start codon.
3432
3433 #Clicking on the box labelled ORF changes the graphical display so that the
3434 potential start codons are indicated in white, and stop codons in red.
3435
3436 #The default settings display only those ORFs which contain an ATG start
3437 codon. Selecting 'Alternative' in the 'Initiation Codon' box will also
3438 include ORFs beginning with all valid alternative start codons as determined
3439 by the genetic code listed in the source feature. If the genetic code for the
3440 source organism has not been specified, the default
3441 <A HREF="http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c#SG1">
3442 standard genetic code
3443 </A>
3444 will be used.
3445
3446 #The ORF length button sets the minimum length of ORFs that are
3447 displayed. For example, the default of 10 shows all ORFs that are
3448 greater than 10 nucleotides in length.
3449
3450 #Checking the Show Partial ORFs box will display ORFs that extend to the
3451 end(s) of the nucleotide sequence but are 5' or 3' partial.
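
#The behavior described above can be approximated outside Sequin with a short
Python sketch. It is not Sequin's implementation. It assumes ATG as the only
start codon, the three stop codons of the standard genetic code, and complete
ORFs only (no partial ORFs and no alternative initiation codons). All function
names are hypothetical. Coordinates are reported 1-based on the plus strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_length):
    """Yield (start, end) 0-based half-open ORF intervals in three frames."""
    for frame in range(3):
        start = None  # first ATG seen since the last stop codon
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if codon == "ATG" and start is None:
                start = pos
            elif codon in STOPS:
                # Keep ORFs longer than min_length nucleotides.
                if start is not None and pos + 3 - start > min_length:
                    yield start, pos + 3
                start = None


def find_orfs(seq, min_length=10):
    """Return (strand, start, end) tuples, longest ORF first."""
    seq = seq.upper()
    n = len(seq)
    found = []
    for start, end in orfs_in_strand(seq, min_length):
        found.append(("+", start + 1, end))
    for start, end in orfs_in_strand(reverse_complement(seq), min_length):
        # Map minus-strand positions back onto plus-strand coordinates.
        found.append(("-", n - end + 1, n - start))
    return sorted(found, key=lambda orf: orf[2] - orf[1], reverse=True)


print(find_orfs("CCATGAAATTTGGGTAACC"))
```

#Using the first ATG after the previous stop codon in each frame yields the
longest ORF ending at a given stop, which matches the definition given above.
The min_length default of 10 mirrors the default ORF length setting.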
3452
3453 #ORFs can be selected by clicking on the graphical representation or on the
3454 sequence interval. Once an ORF has been selected, its location and amino acid
3455 sequence will automatically be entered in the
3456 <A HREF="#CDS">
3457 CDS feature editor
3458 </A>
3459 accessed under the Annotate menu.
3460
3461 *Select Target
3462
3463 #This option changes the sequence that is selected in the Target
3464 Sequence pop-up. Type the SeqID of the sequence in the box, and the
3465 record viewer will be updated to display that sequence.
3466
3467 >Misc Menu
3468
3469 *Style Manager
3470
3471 #The Style Manager allows you to choose between different formats in
3472 which to view the Graphical Display Format. The graphical display is
3473 selected by choosing the Graphic display format on the record viewer.
3474 Using the Style Manager, you can also copy the style or modify it to
3475 suit your needs.
3476
3477 *Net Configure
3478
3479 #As a default, Sequin is available as a stand-alone program. However,
3480 the program can also be configured to exchange information with the NCBI
3481 (GenBank) over the Internet. The network-aware mode of Sequin is
3482 identical to the stand-alone mode, but it contains some additional
3483 useful options.
3484
3485 #Sequin will only function in its network-aware mode if the computer on
3486 which it resides has a direct Internet connection. Electronic mail
3487 access to the Internet is insufficient. In general, if you can install
3488 and use a WWW browser on your system, you should be able to install and
3489 use network-aware Sequin. Check with your system administrator or
3490 Internet provider if you are uncertain as to whether you have direct
3491 Internet connectivity.
3492
3493 #There are two ways to change Sequin into its network-aware mode. If
3494 you are still on the initial Welcome to Sequin form, select Net
3495 Configure under the Misc menu. If you have already worked on a Sequin
3496 submission and are looking at the record in the record viewer, select
3497 the Net Configure option from the Misc menu.
3498
3499 #Most users will be able to use the default (Normal) settings on the
3500 Network Configuration page; select Accept to complete the configuration
3501 process.
3502
3503 #If a "Normal" Connection does not work, you may need to select the
3504 Firewall Connection. Contact your system administrator to determine
3505 what to enter into the Proxy and Port fields. If you do not have
3506 access to the domain name server (DNS), uncheck this box.
3507
3508 #The Timeout pop-up selects the length of time that your local copy of
3509 Sequin will wait for a reply from the NCBI server. You may need to set
3510 this number higher (e.g., 60 seconds or 5 minutes) if you are outside
3511 of the United States or have a slow or unreliable Internet connection.
3512
3513 #If you have problems setting up the network configuration, contact
3514
3515 <a href="mailto:info@ncbi.nlm.nih.gov">
3516 info@ncbi.nlm.nih.gov.
3517 </a>
3518
3519 #If you would like to change Sequin back to its stand-alone mode, select
3520 Net Configure again from the Misc menu. Click on Connection: None.
3521
3522 #The network-aware mode of Sequin allows you to perform a number of
3523 additional, important functions. These functions all appear as
3524 additional menu items. A brief description of these functions follows.
3525 Further descriptions are available as indicated elsewhere in the help
3526 documentation.
3527
3528 **Updating Existing GenBank Records
3529
3530 #Using Sequin in its network-aware mode, you can download an existing
3531 GenBank record from Entrez using the GenBank accession number or GI
3532 identification number (NID). You can then use Sequin to make any
3533 necessary changes to the record, and resubmit it to GenBank as a
3534 sequence update.
3535
3536 <A HREF="#WelcometoSequinForm">
3537 Instructions
3538 </A>
3539 for submitting sequence updates are presented under the Welcome to
3540 Sequin Form. You can download any record from Entrez and look at it in
3541 Sequin. However, you can only formally update those records which you
3542 have submitted since submitters retain editorial control of their
3543 records.
3544
3545 **Performing a PubMed Look-Up
3546
3547 #In its network-aware mode, Sequin can import the relevant sections of a
3548 PubMed record directly into a sequence submission record. Rather than
3549 typing in the entire citation, you can enter minimal information, such
3550 as the PubMed Unique Identifier (PMID), or Journal name, volume, year,
3551 and pages. The
3552
3553 <A HREF="#JournalPage">
3554 PubMed lookup
3555 </A>
3556 is explained in the section of the documentation entitled Publications.
3557
3558 **Performing a Taxonomy Look-up
3559 #In its network-aware mode, Sequin can look
3560 up the taxonomic lineage of an organism from the NCBI's Taxonomy
3561 database. This look-up is normally performed by the NCBI database staff
3562 after the record has been submitted to GenBank. If you look up the
3563 taxonomy before submitting the sequence, you can make a note in the
3564 record of any disagreements. The
3565 <A HREF="#LineageSubpage">
3566 taxonomy lookup
3567 </A>
3568 is explained in the section of the documentation covering
3569 Biological Source: Organism page: Lineage subpage.
3570
3571 **Accessing the NCBI DeskTop
3572 #The NCBI DeskTop displays the internal
3573 structure of the record being viewed in Sequin. The
3574 <A HREF="#NCBIDeskTop">
3575 DeskTop
3576 </A>
3577 is explained under the Misc menu.
3578
3579 *NCBI DeskTop
3580
3581 #This option is only available if you are running Sequin in its
3582 <A HREF="#NetConfigure">
3583 network-aware
3584 </A>
3585 mode.
3586
3587 #The NCBI DeskTop provides a view of the internal structure of the
3588 Sequin record, the ASN.1. Its display resembles a Venn diagram and
3589 shows all the structures represented in the ASN.1 data model.
3590
3591 #In addition, a number of undocumented software tools from the NCBI can
3592 be accessed from the DeskTop. These tools are components of the NCBI
3593 portable software Toolkit. You can also customize these functions using
3594 the Toolkit with your own software tools. The Toolkit and its
3595 documentation are available from the NCBI by anonymous
3596 <A HREF="ftp://ftp.ncbi.nih.gov/toolbox/README">
3597 FTP.
3598 </A>
3599
3600 #The DeskTop should only be used by very seasoned users. At this time,
3601 we are not providing any documentation for these specialized functions.
3602
3603 >Annotate Menu
3604
3605 #This menu allows you to enter features and descriptors on the sequence.
3606
3607 #The first six options, Genes and Named Regions, Coding Regions and
3608 Transcripts, Structural RNAs, Bibliographic and Comments, Sites and
3609 Bonds, and Remaining Features refer to types of Features that can be
3610 added to the sequence. Features are described in more detail in the
3611 above section entitled
3612 <A HREF="#Features">
3613 Features.
3614 </A>
3615
3616 #If you are submitting a set of similar sequences, you can add the same
3617 feature across the entire span of each by using the Batch Feature Apply
3618 option. The feature must span the entire nucleotide sequence of each
3619 member; you cannot annotate specific nucleotide locations using this
3620 option (for this, see
3621
3622 <A HREF="#FeaturePropagate">Feature Propagate</A>).
3623
3624 For each feature, you will be prompted to designate whether the feature
3625 is 5' or 3' partial and whether it is on the plus or minus strand. You
3626 may also add a comment or other qualifier to the feature. The Add
3627 Qualifier option allows you to add a qualifier to an existing feature.
3628 You must specify the feature and qualifier in the Add Qualifier pop-up
3629 box. Source qualifiers can be added to all entries using the Add
3630 Source Qualifier option. Qualifiers specific to the CDS and gene can
3631 be added using Add CDS-Gene-Prot-mRNA and RNA qualifiers using Add RNA
3632 Qual. In each case, a pop-up box appears with qualifier options
3633 appropriate for that feature.
3634
3635 #The Batch Feature Edit function allows you to edit existing qualifiers.
3636 For each menu choice, a pop-up box allows you to select the feature
3637 containing the qualifier and the specific qualifier to be edited. You
3638 can use the Find/Replace text boxes to edit the information contained
3639 within the qualifier.
3640
3641 #The Publications option allows you to add a Publication Feature or
3642 Publication Descriptor to the record. Publications are described in
3643 more detail in the above section entitled
3644
3645 <A HREF="#Publications">
3646 Publications.
3647 </A>
3648
3649 #The Descriptors option allows you to add Descriptors to the
3650 record. Descriptors are described in more detail in the section
3651 entitled
3652 <A HREF="#Descriptors">
3653 Descriptors,
3654 </A>
3655 above.
3656
3657 #The Generate Definition Line option will generate a title for your
3658 sequence based on the information provided in the record. This option
3659 will work for single sequences as well as sets of sequences, and can
3660 handle complex annotations with multiple features. The title will
3661 follow GenBank conventions, but may be modified by the database staff
3662 if it is not appropriate. The title you enter here will replace any
3663 title you entered elsewhere in the submission, for example, any title
3664 that was attached to the nucleotide sequence.
3665
3666 #The Advanced Table Readers option imports a properly formatted structured
3667 comment table. Please contact us if you wish to use this option.
3668
3669 #Sort Unique Count by Group opens a new window that displays, for your
3670 record(s), the number of times each individual line appears in the
3671 flatfile(s). This is particularly useful when checking that all records in
3672 a large set contain the required source or feature information.
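
#The same kind of count can be reproduced on exported flatfile text with a
few lines of Python. The sketch below only illustrates the idea of counting
identical lines. It does not reproduce Sequin's grouping or display, and the
function names and sample qualifiers are hypothetical.

```python
from collections import Counter


def count_unique_lines(flatfile_text):
    """Count how often each non-blank line occurs, ignoring indentation."""
    lines = (line.strip() for line in flatfile_text.splitlines())
    return Counter(line for line in lines if line)


def lines_not_in_every_record(flatfile_text, record_count):
    """Return lines that occur fewer times than there are records."""
    counts = count_unique_lines(flatfile_text)
    return sorted(line for line, n in counts.items() if n < record_count)


# Hypothetical source qualifiers copied from three records in a set.
sample = '''
/organism="Homo sapiens"
/clone="57"
/organism="Homo sapiens"
/clone="49"
/organism="Homo sapiens"
'''

for line, n in count_unique_lines(sample).most_common():
    print(n, line)
print(lines_not_in_every_record(sample, 3))
```

#A qualifier that is expected on every record but appears fewer times than
the number of records points to records that are missing that information.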
3673
3674 >Options Menu
3675
3676 #This menu is only available when using Sequin in its network-aware mode.
3677 *Font
3678
3679 #Use this item to change the display font. From the pop-up menus,
3680 choose the style and size of type. For additional changes, mark the
3681 Bold, Italic, or Underline check boxes. The default font is 10-point
3682 Courier.
3683
3684 >Sequence Editor
3685
3686 #This editor allows you to modify the nucleotide or amino acid sequences
3687 and corresponding annotation in your entry. Although the Sequence Editor
3688 does allow you to undo changes you make to the sequence, we strongly
3689 suggest that you save a copy of the entry before launching the Sequence
3690 Editor so that you can revert to it if necessary.
3691
3692 *Starting the Sequence Editor
3693
3694 #The sequence that appears in the editor is dependent on the sequence(s)
3695 selected in the Target Sequence pull-down list. There are two ways to
3696 launch the sequence editor for nucleotide sequences. First, you can
3697 double-click within the sequence in any display format of the record viewer.
3698 A window containing the DNA sequence will appear. Second, in the record
3699 viewer, select the sequence that you would like to edit in the Target
3700 Sequence pop-up menu. Click on Edit Sequence under the Edit menu. You
3701 can launch the editor for protein sequences by selecting the protein
3702 sequence in the Target Sequence pop-up menu and double clicking within
3703 the protein sequence. A window containing the protein sequence will
3704 appear.
3705
3706 *Moving around the Sequence Editor
3707
3708 #The cursor can be moved with the mouse or the arrow keys. The display
3709 window will change to show the position of the cursor. The sequence
3710 location of the first residue on each line is indicated on the left side
3711 of the window. The cursor location, or the range of sequence selected
3712 by the mouse, is shown in the upper left corner of the window. If you
3713 want to move the cursor to a specific location, type the number in the
3714 box on the top left of the sequence editor window, and hit the Go to
3715 button. If you want to look at a specific sequence, but not move the
3716 cursor to it, type the number in the upper right box of the window and
3717 hit the Look at button.
3718
3719 *Editing Sequence and Existing Annotation
3720
3721 #Select a piece of sequence by highlighting it with the mouse. To
3722 select the entire sequence, click on a sequence location number on the
3723 left side of the window. Any sequence that is highlighted in the
3724 Sequence Editor will show up as a box on the sequence when it is viewed
3725 in the Graphic Display Format.
3726
3727 #One way to insert and delete residues is with the mouse. Move the
3728 cursor to the appropriate location and type. Text will be inserted to
3729 the left of the cursor. Delete sequence with the backspace or delete
3730 key. Text will be deleted to the left of the cursor. To delete a block
3731 of sequence, highlight it with the mouse and use the delete or backspace
3732 key.
3733
3734 #Another way to insert and delete residues is with options under the Edit
3735 menu of the Sequence Editor. Use Cut to remove, or Copy to copy,
3736 highlighted residues. Copied residues can then be pasted elsewhere
3737 within the sequence by using the Paste option.
3738
3739 #Features annotated via the record viewer will be displayed in a
3740 graphical format within the sequence editor. CDS features will be
3741 displayed as a blue line across the appropriate nucleotide location. All
3742 other features will be displayed as a black line. To the left of the
3743 line, the name of the feature is displayed. In the case of CDS or mRNA
3744 features, the product name is shown. For gene features, the gene locus
3745 is shown.
3746
3747 #Double-clicking on the feature will launch the feature editor just as in
3748 the record viewer. However, you can also change the nucleotide location
3749 of any feature within the graphical view. To move the entire feature,
3750 select the feature and drag it to the appropriate location while holding
3751 down the mouse button. To alter the 5' or 3' end of a feature, click on
3752 the feature's end and drag to the new location while holding down the
3753 mouse button.
3754
3755 #Before moving the nucleotide locations of a CDS feature, it may be
3756 useful to view the codons in the current translation. You can do this by
3757 clicking on the feature line and releasing the mouse button. A grid will
3758 be displayed that shows the triplet location for the current annotation.
3759 Once you have changed the nucleotide location of a CDS feature in the
3760 graphical view, you can see the new translation by using the Translate
3761 CDS button at the bottom of the window.
3762
3763 #To save changes you have made to the sequence, press the Accept button
3764 at the bottom of the Sequence Editor display window. If you do not want
3765 to save the changes, press the Cancel button at the bottom of the
3766 Sequence Editor display window. Selecting either Accept or Cancel will
3767 quit the Sequence Editor and return you to the record viewer. Any
3768 changes you make will not become a permanent part of the Sequin record
3769 until you Save the record in the record viewer.
3770
3771 #New features can be added using the Features menu.
3772
3773 *Sequence Editor Window Buttons
3774
3775 **Go to
3776
3777 #Moves the cursor to the indicated location.
3778
3779 **Look at
3780
3781 #Moves the window to the indicated location without moving the cursor.
3782
3783 **Merge Feature Mode/Split Feature Mode
3784
3785 #In merge mode, any new sequence that is entered into a region spanned
3786 by an existing feature becomes part of that feature. For example, if
3787 you enter new sequence in the middle of a CDS, that sequence will be
3788 translated as part of the CDS. In split mode, the new sequence
3789 interrupts the feature. For example, if you enter new sequence in the
3790 middle of a CDS, the CDS will be interrupted by that sequence (see the
3791 location of the CDS in the record viewer).
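
#The difference between the two modes can be illustrated with a small
Python sketch. It is a simplified model, not Sequin's implementation:
the function name insert_into_feature, the 0-based half-open interval
convention, and the handling of insertions exactly at a feature boundary
are all assumptions chosen for the example.

```python
def insert_into_feature(intervals, pos, length, mode):
    """Adjust feature intervals after `length` residues are inserted before position `pos`.

    intervals: list of (start, end) pairs, 0-based and half-open (assumed convention).
    mode: "merge" extends a spanning interval; "split" breaks it around the insertion.
    """
    if mode not in ("merge", "split"):
        raise ValueError("mode must be 'merge' or 'split'")
    result = []
    for start, end in intervals:
        if pos <= start:
            # Insertion lies upstream of the interval: shift the whole interval.
            result.append((start + length, end + length))
        elif pos >= end:
            # Insertion lies downstream: the interval is unchanged.
            result.append((start, end))
        elif mode == "merge":
            # New residues become part of the feature.
            result.append((start, end + length))
        else:
            # New residues interrupt the feature.
            result.append((start, pos))
            result.append((pos + length, end + length))
    return result


cds = [(10, 40)]
merged = insert_into_feature(cds, pos=20, length=6, mode="merge")
split = insert_into_feature(cds, pos=20, length=6, mode="split")
print(merged)
print(split)
```

#In merge mode the single interval simply grows by the inserted length;
in split mode it becomes two intervals that skip the inserted residues.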
3792
3793 **Numbering
3794
3795 #Allows the sequence location numbering to be hidden, displayed on the
3796 side, or displayed on the top of the sequence.
3797
3798 **Grid
3799
3800 #Allows the display to show a grid separating each feature and sequence
3801 for easier viewing.
3802
3803 **Show/Hide Features
3804
3805 #This box toggles between hiding and showing the features on a sequence.
3806
3807 **Accept
3808
3809 #Closes the Sequence Editor after saving all of the changes made to
3810 sequences and features.
3811
3812 **Cancel
3813
3814 #Closes the Sequence Editor without saving any changes made to sequences or
3815 features.
3816
3817 **Translate CDS
3818
3819 #Allows translation of coding region features after the location has been
3820 changed within the graphical view.
3821
3822 *Sequence Editor File Menu
3823
3824 **Export
3825
3826 #Allows the export of a range of sequence as a FASTA file or text file.
3827 Using the text option will also export overlapping features if they are
3828 displayed. If the features are first hidden, only the sequence will be
3829 exported. All protein translations displayed at the time of export will
3830 be exported as well.
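
#As a rough illustration of what a FASTA export of a sequence range
contains, here is a minimal Python sketch. The function name
export_fasta_range, the 1-based inclusive range, the line width, and the
form of the definition line are assumptions for the example; the file
Sequin writes may differ in these details.

```python
def export_fasta_range(seq_id, sequence, start, end, width=70):
    """Return FASTA text for residues start..end (1-based, inclusive) of `sequence`."""
    sub = sequence[start - 1:end]
    # Hypothetical definition line: sequence identifier plus the exported range.
    lines = [">{}:{}-{}".format(seq_id, start, end)]
    for i in range(0, len(sub), width):
        lines.append(sub[i:i + width])
    return "\n".join(lines) + "\n"


fasta_text = export_fasta_range("seq1", "ACGTACGTAC", 3, 8, width=4)
print(fasta_text, end="")
```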
3831
3832 **Accept
3833
3834 #Closes the Sequence Editor after saving all of the changes made to
3835 sequence and features.
3836
3837 **Cancel
3838
3839 #Closes the Sequence Editor without saving any changes made to sequences
3840 or features.
3841
3842 *Sequence Editor Edit Menu
3843
3844 **Undo
3845
3846 #Undoes all actions performed in the Sequence Editor since the last save.
3847
3848 **Redo
3849
3850 #Restores changes removed with the Undo option.
3851
3852 **Cut
3853
3854 #Removes the highlighted sequence. This sequence can be pasted elsewhere.
3855
3856 **Paste
3857
3858 #Pastes a cut or copied sequence to the right of the cursor.
3859
3860 **Copy
3861
3862 #Copies the highlighted sequence. This sequence can be pasted elsewhere.
3863
3864 **Find
3865
3866 #Allows you to find DNA or amino acid sequence patterns in your sequence.
3867 The search is case insensitive. To find an exact match to a DNA
3868 sequence pattern, type the pattern in the box. The number of items found
3869 will be displayed and you can toggle through each instance with the Find
3870 Next button. To find the reverse complement of the pattern, click on
3871 the reverse complement box at the bottom of the pop-up box.
3872
3873 #To find an exact match to an amino acid sequence pattern, type that
3874 sequence in the box, and click on "translate sequence". Sequin will look
3875 for all occurrences of that pattern in all six reading frames. The
3876 DNA sequence encoding that protein sequence in any of the six reading
3877 frames will be highlighted.
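
#The two kinds of search described above can be sketched in Python as
follows. This is an illustrative sketch rather than Sequin's search
code: the function names, the 0-based half-open coordinates, the frame
labels such as "+3", and the use of the standard genetic code are
assumptions made for the example.

```python
BASES = "TCAG"
# Standard genetic code, with codons ordered by BASES in each position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unrecognized codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def find_dna(seq, pattern, both_strands=False):
    """Return sorted 0-based start positions of case-insensitive exact matches."""
    seq = seq.upper()
    patterns = {pattern.upper()}
    if both_strands:
        # A reverse-complement match is reported at its forward-strand position.
        patterns.add(reverse_complement(pattern))
    hits = set()
    for p in patterns:
        start = seq.find(p)
        while start != -1:
            hits.add(start)
            start = seq.find(p, start + 1)
    return sorted(hits)


def find_protein(seq, peptide):
    """Return (frame, start, end) forward-strand spans that encode `peptide` in any of six frames."""
    seq = seq.upper()
    peptide = peptide.upper()
    n = len(seq)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            protein = translate(s[offset:])
            i = protein.find(peptide)
            while i != -1:
                start = offset + 3 * i
                end = start + 3 * len(peptide)
                if strand == "-":
                    # Convert reverse-strand coordinates back to the forward strand.
                    start, end = n - end, n - start
                hits.append((strand + str(offset + 1), start, end))
                i = protein.find(peptide, i + 1)
    return hits


seq = "ccATGGCCTAAgg"
print(find_dna(seq, "atg"))
print(find_dna(seq, "atg", both_strands=True))
print(find_protein(seq, "MA"))
```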
3878
3879 **Translate CDS
3880
3881 #Allows translation of coding region features after the location has been
3882 changed within the graphical view.
3883
3884 **Complement
3885
3886 #Shows the complement of the submitted strand underneath the original.
3887
3888 **Reading Frames
3889
3890 #Shows the indicated phase translation of the selected coding sequence.
3891 You can select any individual reading frame, all six reading frames, or
3892 all positive or all negative frames.
3893
3894 **Protein Mismatches
3895
3896 #Indicates any amino acid that does not match the conceptual translation
3897 following a nucleotide sequence change. The original amino acid sequence
3898 will be displayed until the Translate CDS function is used. Differences
3899 will be indicated by a red box around the amino acid abbreviation.
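
#The comparison behind this display can be sketched in Python. This is a
simplified illustration, not Sequin's code: the function names, the use
of the standard genetic code, and the example sequences are assumptions,
and only positions present in both proteins are compared.

```python
BASES = "TCAG"
# Standard genetic code, with codons ordered by BASES in each position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate complete codons; unrecognized codons become 'X'."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def protein_mismatches(original_protein, edited_cds):
    """Return (position, original, new) for each 1-based residue that differs."""
    new_protein = translate(edited_cds)
    return [
        (i + 1, old, new)
        for i, (old, new) in enumerate(zip(original_protein, new_protein))
        if old != new
    ]


original_protein = translate("ATGGCCAAATAA")
# One nucleotide changed (C to A at the fifth position of the CDS).
mismatches = protein_mismatches(original_protein, "ATGGACAAATAA")
print(original_protein)
print(mismatches)
```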
3900
3901 **On-the-fly Protein Translations
3902
3903 #Creates a second amino acid sequence in the display which retranslates
3904 as the nucleotide sequence is changed to allow side-by-side comparison to
3905 the original amino acid sequence.
3906
3907 *Sequence Editor Features Menu
3908
3909 #The menu contains a long list of all features that can be annotated on a
3910 sequence. These features are the same as those that are accessible
3911 through the main Sequin Annotate menu.
3912
3913 #You can annotate features either in the Annotate menu or in the Sequence
3914 Editor. If you annotate them in the Annotate menu, you must type in the
3915 nucleotide sequence location of the feature. However, if you add
3916 features from the Sequence Editor, you can highlight the sequence that
3917 the feature covers, and the location of the sequence will be
3918 automatically entered in the feature location box. Additional
3919 explanations of how to annotate features are provided in the section on
3920 <A HREF="#Features">
3921 Features.
3922 </A>
3923
3924 >Working with Sets of Aligned Sequences
3925
3926 #Sequin allows you to work with aligned sets of closely related
3927 nucleotide sequences that are part of a population, phylogenetic, or
3928 mutation study. If the sequences are imported in a pre-aligned format,
3929 such as PHYLIP, Sequin uses this alignment. If the sequences are
3930 imported individually in FASTA format, Sequin can generate its own
3931 alignment.
3932
3933 #You can view the aligned sequences in the Sequence Alignment Editor. In
3934 the record viewer, select All Sequences in the Target Sequence menu,
3935 and select the Alignment Display Format.
3936
3937 #The Alignment Assistant is launched by selecting Alignment Assistant
3938 from the Edit menu in the record viewer. It can be used to apply
3939 features to the whole set of sequences using the alignment coordinates.
3940 Rather than calculating the nucleotide coordinates for every feature on
3941 every nucleotide sequence, you may select the feature's location using
3942 its alignment coordinates and apply it to every member of the set
3943 simultaneously. Sequin will calculate the nucleotide locations as they
3944 apply to each member of the set.
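
#The conversion from alignment coordinates to the coordinates of each
sequence can be sketched in Python. This is an illustrative sketch under
stated assumptions, not Sequin's algorithm: the function name, the use
of "-" as the gap character, and the 1-based inclusive coordinates are
choices made for the example.

```python
def alignment_to_sequence_range(aligned_row, aln_start, aln_end):
    """Map a 1-based inclusive alignment column range onto one row's own coordinates.

    Returns (start, end) in 1-based sequence coordinates, or None if the row
    contains only gaps within the requested columns.
    """
    residues_before = sum(1 for ch in aligned_row[:aln_start - 1] if ch != "-")
    residues_inside = sum(1 for ch in aligned_row[aln_start - 1:aln_end] if ch != "-")
    if residues_inside == 0:
        return None
    return (residues_before + 1, residues_before + residues_inside)


# Toy alignment invented for illustration.
alignment = {
    "seqA": "ATG--CCTGA",
    "seqB": "ATGAACC-GA",
}
locations = {
    seq_id: alignment_to_sequence_range(row, 3, 8)
    for seq_id, row in alignment.items()
}
print(locations)
```

#The same alignment columns (3 to 8) yield a different nucleotide range
on each sequence because the gaps fall in different places.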
3945
3946 *Alignment Assistant Window Buttons
3947
3948 **Go to
3949
3950 #The Go to alignment position and Go to sequence position buttons both
3951 scroll the Alignment Assistant so that the requested position is
3952 visible. If the requested position is already visible, nothing will
3953 happen. Unlike in the Sequence Editor window, the Go to button does not
3954 control the cursor position.
3955
3956 **Numbering
3957
3958 #Allows the sequence location numbering to be hidden, displayed on the
3959 side, or displayed on the top of the sequence.
3960
3961 **Grid
3962
3963 #Allows the display to show a grid separating each feature and sequence for
3964 easier viewing.
3965
3966 **Features Toggle
3967
3968 #It is possible to view annotated features in the Alignment Assistant.
3969 The features are displayed as a bar underneath the coordinates for that
3970 feature. The identity of the feature is displayed in the left-hand
3971 column. The default selection is to have the features Hidden. You may
3972 display the features associated only with the Target Sequence or
3973 features annotated on All Sequences in the alignment.
3974
3975 *Alignment Assistant File Menu
3976
3977 **Export
3978
3979 #Allows you to export the alignment to a file in three different
3980 formats. The contiguous and interleaved options export the alignment
3981 accordingly in FASTA+GAP format. The text representation option saves
3982 the alignment as it appears in the Alignment Assistant. Note that with
3983 this option features are included if they are displayed at the time of
3984 export.
3985
3986 **Close
3987
3988 #Closes the Alignment Assistant window and saves any changes made.
3989
3990 *Alignment Assistant Edit Menu
3991
3992 **Remove Sequences from Alignment
3993
3994 #Allows you to remove selected sequence(s) from the alignment. Select
3995 the sequence by clicking on it. You can select multiple sequences by
3996 holding down the control key. The sequence will then be highlighted in
3997 grey. Note that this option will remove the sequence from the
3998 alignment, but it is still present in your submission.
3999
4000 **Validate Alignment
4001
4002 #Checks for problems with the alignment. If errors are reported, please
4003 review and attempt to fix your alignment before submission.
4004
4005 **Propagate Features
4006
4007 #This function is the same as that available under the Edit Menu in the
4008 record viewer. A full description is available
4009
4010 <A HREF="#FeaturePropagate">
4011 above
4012 </A>
4013 .
4014
4015 *Alignment Assistant View Menu
4016
4017 **Target
4018
4019 #Allows you to select a sequence within the alignment as the target
4020 sequence. This can also be done by double-clicking on the sequence
4021 within the alignment. The SeqID of the target sequence will be
4022 displayed in red. Features can be displayed on the target sequence
4023 only and it is the sequence used for comparison in the
4024
4025 <A HREF="#ShowSubstitutions">
4026 Show Substitutions
4027 </A>
4028 view.
4029
4030 **Show Substitutions
4031
4032 #Changes the alignment view so that identities are replaced with a "."
4033 and only substitutions are shown. The substitutions and identities are
4034 relative to the selected target sequence.
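
#The substitution view can be sketched in Python as follows. This is
only an illustration of the idea: the function name show_substitutions
and the decision to leave gap characters unchanged are assumptions, and
the rows are assumed to be of equal length.

```python
def show_substitutions(rows, target_index=0):
    """Replace residues identical to the target row with '.', keeping substitutions and gaps."""
    target = rows[target_index]
    view = []
    for i, row in enumerate(rows):
        if i == target_index:
            # The target sequence itself is shown in full.
            view.append(row)
        else:
            view.append(
                "".join("." if a == b and a != "-" else a for a, b in zip(row, target))
            )
    return view


rows = ["ATGCCT", "ATGACT", "ATG-CT"]
view = show_substitutions(rows, target_index=0)
for line in view:
    print(line)
```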
4035
4036 *Alignment Assistant Features Menu
4037
4038 #Allows the annotation of features to a single sequence or all sequences
4039 within the alignment. All features available in this menu are
4040 discussed in the section on the main Sequin Annotate menu.
4041
4042 #Select the feature location by clicking the start location on one of
4043 the sequences and, keeping the mouse button depressed, dragging the cursor
4044 to the end of the feature location. The selected area will now be
4045 underlined and shown in red, and the alignment coordinates of this area will be
4046 displayed in the upper left of the Alignment Assistant window.
4047
4048 **Apply to Target Sequence
4049
4050 #Allows you to choose a feature to be applied only to the target
4051 sequence. The locations may be entered manually or can be determined
4052 based on highlighting the sequence as described above.
4053
4054 **Apply to Alignment
4055
4056 #Allows you to add the selected feature to all sequences within your
4057 alignment based on the alignment coordinates you have selected. Note
4058 that in the feature pop-up boxes in this menu, the Location will always
4059 be entered as the location relative to the alignment coordinates.
4060
4061 <HR>
4062
4063 <CENTER>
4064 <P>&nbsp
4065 <P CLASS=medium1><B>Questions or Comments?</B>
4066 <BR>Write to the <A HREF="mailto:info@ncbi.nlm.nih.gov">NCBI Service
4067 Desk</A></P>
4068 <P CLASS=medium1>Revised November 17, 2008
4069
4070 </CENTER>
4071
4072 
4073
4074 </body>
4075 </html>
4076
