<HTML> <HEAD>

<TITLE>Sequin help documentation</TITLE>

<!-- if you use the following meta tags, uncomment them.
<meta name="author" content="sequindoc">
<META NAME="keywords" CONTENT="national center for biotechnology information, ncbi, national library of medicine, nlm, national institutes of health, nih, database, archive, bookshelf, pubmed, pubmed central, bioinformatics, biomedicine, sequence submission, sequin, bankit, submitting sequences">
<META NAME="description" CONTENT="Sequin is a stand-alone software tool developed by the National Center for Biotechnology Information (NCBI) for submitting and updating entries to the GenBank, EMBL, or DDBJ sequence databases. "> -->
<link rel="stylesheet" href="ncbi_sequin.css">

</HEAD>

<body bgcolor="#FFFFFF" text="#000000" link="#0033CC" vlink="#0033CC">
<!-- change the link and vlink colors from the original orange (link="#CC6600" vlink="#CC6600") -->

<!-- the header -->
<table border="0" width="600" cellspacing="0" cellpadding="0">
<tr>
<td width="140"><a href="http://www.ncbi.nlm.nih.gov"> <img src="http://www.ncbi.nlm.nih.gov/corehtml/left.GIF" width="130" height="45" border="0"></a></td>
<td width="360" class="head1" valign="BOTTOM"> <span class="H1">Sequin Help Documentation</span></td>
<!-- <td width="100" valign="BOTTOM">Your Logo</td> -->
</tr>
</table>

<!-- the quicklinks bar -->
<table CLASS="TEXT" border="0" width="600" cellspacing="0" cellpadding="3" bgcolor="#000000">
<tr CLASS="TEXT" align="CENTER">
<td width="100"><a href="index.html" class="BAR">Sequin</a></td>
<td width="100"><a href="http://www.ncbi.nlm.nih.gov/Entrez/" class="BAR">Entrez</a></td>
<td width="100"><a href="http://www.ncbi.nlm.nih.gov/BLAST/" class="BAR">BLAST</a></td>
<td width="100"><a href="http://www.ncbi.nlm.nih.gov/omim/" class="BAR">OMIM</a></td>
<td width="100"><a href="http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html"
class="BAR">Taxonomy</a></td>
<td width="100"><a href="http://www.ncbi.nlm.nih.gov/Structure/" class="BAR">Structure</a></td>
</tr>
</table>

<!-- the contents -->
<P>

<H2>Table of Contents</H2>

<HR>

>Introduction

#Sequin is a program designed to aid in the submission of sequences to
the GenBank, EMBL, and DDBJ sequence databases. It was written at the
National Center for Biotechnology Information, part of the National
Library of Medicine at the National Institutes of Health. This section
of the help document provides a basic overview of how to submit
sequences using the Sequin forms. Subsequent sections provide detailed
instructions for entering information on each form.

*The Help Documentation

#The Sequin help documentation is available in both on-line and World
Wide Web (http://www.ncbi.nlm.nih.gov/Sequin/sequin.hlp.html) formats.
The text of the on-line version scrolls as you progress through the
Sequin forms. Specific words or phrases can be identified with the
"find" command at the top of the window. The on-line document can also
be saved as a text file, or printed directly to a printer. Click on the
window that contains the help documentation. Under the Sequin File
menu, choose Export Help... to save the documentation as a text file.
To print the documentation without saving it first, click on the help
window, and choose Print from the Sequin File menu.

*Organization of Forms

#Information is entered into Sequin on a number of different forms. Each
form is made up of pages, which are indicated by folder tabs at the top
of the form. You can move to the desired page by clicking on the
appropriate folder tab. You can also move between pages of a form by
clicking on the "Next page" or "Prev page" buttons at the bottom of the
screen.
You can move to the previous form or the next form by clicking
on the "Prev form" or "Next form" buttons on the first or last pages of
a form, respectively.

#There are numerous ways to enter information onto a page of a form,
including text fields, radio buttons, check boxes, scrolling boxes,
pop-up menus, and spreadsheets.

#You may also use tables to import annotation or source information.
The formatting of these tables is discussed below.

*Overview of Sequin

#If you are using Sequin for the first time, you will be prompted to
fill out four forms: the Welcome to Sequin form, the Submitting
Authors Form, the Sequence Format form, and the Organism and Sequences
Form. After you have filled out these forms, a window will appear that
contains the Sequin record viewer. This viewer gives you access to
many other forms in which you can edit the information entered on the
initial forms and add further information. Detailed
instructions on how to fill out the forms and use the record viewer are
presented below.

>Welcome to Sequin Form

#First, use one of the three radio buttons to indicate whether you are
submitting the sequence to the GenBank, EMBL, or DDBJ database. If you
are working on a sequence submission for the first time, click on
"Start New Submission". If you are modifying an existing submission
record, click on "Read Existing Record". To quit
Sequin, click on "Quit Program".

#You can also use "Read Existing Record" to read in a FASTA-formatted sequence
file for analysis purposes. The sequence will be displayed in Sequin and can
be analyzed with tools such as CDD Search, but it should not be submitted
because it does not have the appropriate annotations.

#If you are running Sequin in its network-aware mode, you will see
another button labeled "Download from Entrez".
This option allows you
to update an existing database record using Sequin. The record will be
downloaded from GenBank into Sequin using NCBI's Entrez retrieval
system. The contents of the record will appear in Sequin, and you can
edit them by updating the sequence or the annotations, as necessary. If
you do not see the button labeled "Download from Entrez" on the Welcome
to Sequin form, you are not running Sequin in its network-aware mode.
To make Sequin network-aware, see the
<A HREF="#NetConfigure">
instructions
</A>
later in the help documentation.

#You can update only those records that you have submitted, not those
submitted by others. To update an existing record, first select the
database to which you will send the update. This should be the
database to which the original record was submitted. If you do not
know which database to use, send the record to GenBank, and the NCBI
staff will forward it to the appropriate database. Next, click on the
"Download from Entrez" button. Enter the nucleotide Accession number or
GI of the sequence on the first form. Then enter "yes" if you are
planning to submit the record as an update to one of the databases.
Fill out the Submitting Authors form.

<A HREF="#EditSubmitterInfo">
Instructions
</A>

for this form are found in the Sequin help documentation under "Edit
Submitter Info" under the Sequin File menu. The record will then open
in the record viewer. Explanations of how to add annotations or update
sequences are presented in the sections entitled

<A HREF="#EditingtheRecord">
"Editing the record"
</A>
and
<A HREF="#SequenceEditor">
"Sequence Editor"
</A>

respectively. You will not see the Submitting Authors Form, the
Sequence Format Form, or the Organism and Sequences Form.
Note that
updates, as well as new records, must be emailed to the appropriate
database. Sequin does not support direct submission of records over the
Internet.

#Additional configuration options are available under the Misc menu.
You can toggle between the stand-alone and network-aware modes of
Sequin. The default mode of Sequin, which is sufficient for most
sequence submissions, is stand-alone. In its network-aware mode, Sequin
can exchange data with NCBI and, for example, retrieve sequences
from Entrez and perform Taxonomy searches. The network-aware mode of
Sequin is described in detail in the
<A HREF="#NetConfigure">
Net Configure
</A>
section below. You can also start the NCBI DeskTop, which is intended for
advanced Sequin users only.

>Submitting Authors Form

#Information from this form will be used as a citation for the sequence
entry itself. It can contain the same information found in citations
associated with the formal publication of the sequence.

#At the bottom of each form are two buttons. Click "Prev form" (first
page of a form) or "Prev page" (subsequent pages of a form) to go to the
previous form or page. Click "Next form" (last page of a form) or "Next
page" (earlier pages of a form) to move to the next form or page.

#Form pages can also be saved individually by using the "Export" function
under the File menu. If you are processing multiple submissions, you
can use the "Import" function under the File menu to paste previously
entered information directly into the page.

#The Contact, Authors, and Affiliation pages can be saved as a block so
that you can reuse this information for your next submission. For your
first Sequin submission, fill in the requested information on the
Submitting Authors form and proceed with the preparation of the
submission.
Choose Export Submitter Info under the File menu to export
this information to a file. You can then import it into subsequent
submissions using Import Submitter Info under the File menu. You will,
however, need to fill in the manuscript title for each submission.

*Submission Page

**When May We Release Your Sequence Record?

#Please select one of the two radio buttons. If you select
"Immediately After Processing", the
entry will be released to the public after the database staff has added
it to the database. If you select "Release Date", fields will appear in
which you can indicate the date on which the sequences should be
released to the public. The submission will then be held back until
formal publication of the sequence or its GenBank Accession number, or
until the release date, whichever comes first. The maximum hold
time is five years.

**Tentative Title for Manuscript

#Please enter a title that appropriately describes the sequence entry.
Later in the submission process, you will have the
opportunity to change this information and add details for published
or in-press references.

*Contact Page

#Please enter the name, telephone and fax numbers, and email address of
the person who is submitting the sequence. This is the person who will
be contacted regarding the sequence submission. The phone, fax, and
email address will not be visible in the database record, but are
essential for contact by the database staff.

*Authors Page

#Please enter the names of the people who should receive scientific
credit for the generation of sequences in this entry. The person on
the Contact page is automatically listed as the first author. This
information can be changed if necessary. The author names should be
entered in the order first name, middle initial, surname. You can add
as many authors to this page as you wish.
After you type in the name
of the third author, the box becomes a spreadsheet, and you can scroll
down to the next line by using the space bar. The consortium box
should be used only for consortium names, not for institute or department
names.

*Affiliation Page

#Please enter information about the principal institution where the
sequencing was performed. This is not necessarily the same as the
workplace of the person described on the Contact page. This information
will appear in the reference section of the record, with the title
Direct Submission.

>Sequence Format Form

#Use this form to indicate the type, format, and category of sequence
you are submitting.

#Sequin can process single nucleotide sequences, gapped sequences, and
sets of related sequences. If the sequences are related by
coming from the same publication or the same organism, they may be
candidates for a Batch submission. Biologically related sequences may
be classified as environmental samples, population, phylogenetic,
mutation, or segmented sets as appropriate. Segmented sets consist of
a collection of non-overlapping sequences covering a specific genetic
region. In all cases, although the sequences are handled as a single
submission, each sequence in a set will receive its own database
Accession number and can be annotated independently.

#Sequin can display the alignments of sequences that are submitted as
part of an aligned phylogenetic, population, mutation, or
environmental samples set. Such sequences can be submitted in FASTA,
Contiguous (FASTA+GAP, NEXUS, MACAW), or Interleaved (PHYLIP, NEXUS)
formats. If the sequences are in FASTA format, Sequin can generate an
alignment. If the sequences have already been aligned in FASTA+GAP,
PHYLIP, MACAW, or NEXUS, Sequin will not change the alignment.
If one
of the sequences in your alignment is already present in the
GenBank/EMBL/DDBJ database, you must mark that sequence so that it does
not receive a new Accession number. Instead of supplying that sequence
with a new Sequence Identifier, give it the identifier accU12345, where
U12345 is the Accession number of the existing sequence.

#Single sequences, gapped sequences, segmented sequences, and batch
submissions must be submitted in FASTA format.

*Submission Type

#Use the radio buttons to indicate which of the following types of
submission you are creating:

#-Single sequence: a single mRNA or genomic DNA sequence. If you are
submitting multiple sequences from the same publication, consider a
Batch Submission. If you decide to submit multiple Sequin files, each
with one or more sequences, please send each file in a separate email
message.

#-Segmented sequence: a collection of non-overlapping, non-contiguous
sequences that cover a specified genetic region. A standard example is a set
of genomic DNA sequences that encode exons from a gene along with fragments of
their flanking introns. If the segmented set is part of an alignment,
however, select the appropriate Population, Phylogenetic, or Mutation study
button. The Gapped sequence option may represent the biology of
these types of records better.

#-Gapped sequence: a single, non-contiguous mRNA or genomic DNA sequence.
A gapped sequence contains specified gaps of known or unknown length
where the exact nucleotide sequence has not been determined. The FASTA
format for gapped sequences is slightly different and is explained
below.

#-Population study: a set of sequences that were derived by sequencing
the same gene from different isolates of the same organism.

#-Phylogenetic study: a set of sequences that were derived by sequencing
the same gene from different organisms.

#-Mutation study: a set of sequences that were derived by sequencing
multiple mutations of a single gene.

#-Environmental samples: a set of sequences that were derived by
sequencing the same gene from a population of unclassified or unknown
organisms.

#-Batch submission: a set of related sequences that are not part of a
population, mutation, or phylogenetic study. The sequences should be
related in some way, such as coming from the same publication or
organism. You should plan for all sequences to be released to the
public on the same date.

*Sequence Data Format

#If you are submitting a single, gapped, or segmented sequence, or a
batch submission, your sequence must be in FASTA format, described
below. If you are submitting a set of sequences as part of a
population, phylogenetic, or mutation study, or an environmental samples
set, you have a choice of sequence formats. You may submit the set as
individual sequences in FASTA format. Alternatively, you can submit the
sequences as part of an alignment. Sequin currently accepts the alignment
formats FASTA+GAP, PHYLIP, MACAW, NEXUS Interleaved, and NEXUS Contiguous.

*Submission Category

#Use the radio buttons to indicate whether your sequence corresponds to
an original submission or a third-party annotation submission. If you
have directly sequenced the nucleotide sequence in your laboratory,
your submission is considered an original submission.

#If you have downloaded the sequence from GenBank and added your
own annotations to it, your entry may be eligible for submission to the
Third-Party Annotation Database

<A HREF="http://www.ncbi.nlm.nih.gov/Genbank/TPA.html">
(TPA)
</A>
.

#To be released in the TPA database, the sequence must appear
in a peer-reviewed publication in a biological journal. If you select
this option, a pop-up box will appear upon completion of the
Sequence Format form. In this box, you must describe the
biological experiments used as evidence for the annotation of your TPA
submission.

#You will be asked later in the submission process to provide the GenBank
Accession number(s) of the primary sequence(s) from which your TPA
submission was derived.

>Organism and Sequences Form

#This form is made up of four pages. If your sequences are imported as
properly formatted FASTA files, little additional input will be needed
on these pages.

>FASTA Format for Nucleotide Sequences

#In FASTA format, the line before the nucleotide sequence, called the
FASTA definition line, must begin with a greater-than sign (">"), followed by a
unique SeqID (sequence identifier). The SeqID must be unique for each
nucleotide sequence and should not contain any spaces. Use of brackets
("[]") in the SeqID is also prohibited. The identifier will be
replaced with an Accession number by the database staff when your
submission is processed.

#Information about the source organism from which the sequence was
obtained follows the SeqID and must be in the format [modifier=text].
Do not put spaces around the "=". At minimum, the scientific name of
the organism should be included. Optional modifiers can be added to
provide additional information. A complete list of available source
<A HREF="http://www.ncbi.nlm.nih.gov/Sequin/modifiers.html">
modifiers
</A>
and their format is available.

#The final optional component of the FASTA definition line is the
sequence title, which will be used as the DEFINITION field in the final
flatfile.
The title should contain a brief description of the
sequence. There is a preferred format for nucleotide and protein
titles, and Sequin can generate them automatically using the Generate
Definition Line function under the Annotate menu in the record viewer.

#Note that in all cases, the FASTA definition line must not contain any hard
returns. All information must be on a single line of text. If you
have trouble importing your FASTA sequences, please double-check that
your editing software did not add returns to the FASTA definition line.

#Examples of properly formatted FASTA definition lines for nucleotide
sequences are:

<KBD><PRE>>Seq1 [organism=Mus musculus] [strain=C57BL/6] Mus musculus neuropilin 1 (Nrp1) mRNA, complete cds.
</PRE></KBD>
<KBD><PRE>>ABCD [organism=Plasmodium falciparum] [isolate=ABCD] Plasmodium falciparum isolate ABCD merozoite surface protein 2 (msp2) gene, partial cds.
</PRE></KBD>
<KBD><PRE>>DNA.new [organism=Homo sapiens] [chromosome=17] [map=17q21] [moltype=mRNA] Homo sapiens breast and ovarian cancer susceptibility protein (BRCA1) mRNA, complete cds.
</PRE></KBD>
#The line after the FASTA definition line begins the nucleotide
sequence. Unlike the FASTA definition line, the nucleotide sequence
itself can contain returns. It is recommended that each line of
sequence be no longer than 80 characters. Use only IUPAC
symbols within the nucleotide sequence. For sequences that are not
contained within an alignment, do not use "?" or "-" characters; these
will be stripped from the sequence. Use the IUPAC-approved symbol "N"
for ambiguous characters instead.

#A single file containing multiple FASTA sequences can be imported into
Sequin to create a
<A HREF="#SubmissionType">
Batch Submission
</A>
. Make sure that the FASTA definition line for each sequence is
formatted as above.
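#The definition-line and sequence-line rules above can be checked before import with a small script. The following Python sketch is hypothetical: parse_defline and check_sequence_line are illustrative names, not Sequin functions. It assumes that the SeqID ends at the first space, that modifiers are bracketed [name=value] pairs, and that any text after the last modifier is the title. It only approximates Sequin's own checks.

```python
import re

# Hypothetical checker for the FASTA rules described above; not part of Sequin.
# Assumption: a modifier looks like [name=value] with no nested brackets.
MODIFIER = re.compile(r"\[([^\[\]=]+)=([^\[\]]*)\]")

# IUPAC nucleotide symbols, including ambiguity codes such as N, R, and K.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def parse_defline(line):
    """Split a definition line into (seqid, modifiers, title, problems)."""
    if not line.startswith(">"):
        return None, {}, "", ["definition line must begin with '>'"]
    problems = []
    # Assumption: the SeqID runs from just after '>' up to the first space.
    seqid, _, rest = line[1:].partition(" ")
    if not seqid:
        problems.append("missing SeqID")
    if "[" in seqid or "]" in seqid:
        problems.append("SeqID must not contain brackets")
    modifiers = {}
    title_start = 0
    for match in MODIFIER.finditer(rest):
        name, value = match.group(1), match.group(2)
        # A space just before or just after '=' is not allowed.
        if name.endswith(" ") or value.startswith(" "):
            problems.append(f"spaces around '=' in [{name}={value}]")
        modifiers[name.strip()] = value.strip()
        title_start = match.end()
    if "organism" not in modifiers:
        problems.append("no [organism=...] modifier")
    # Assumption: everything after the last modifier is the title.
    title = rest[title_start:].strip()
    return seqid, modifiers, title, problems


def check_sequence_line(line, max_len=80):
    """Flag sequence lines that are too long or use non-IUPAC symbols."""
    problems = []
    if len(line) > max_len:
        problems.append(f"line longer than {max_len} characters")
    bad = sorted(set(line.upper()) - IUPAC_NUCLEOTIDES)
    if bad:
        problems.append("non-IUPAC characters: " + "".join(bad))
    return problems
```

#Applied to the first example definition line above, parse_defline returns the SeqID Seq1, the organism and strain modifiers, and the title, with no problems reported. The sequence check flags "?" and "-" rather than silently stripping them, so you can decide whether to replace them with "N".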

#If the FASTA definition line is not properly formatted, a pop-up box
will appear when you import the nucleotide FASTA file. The top box in this
pop-up will list any errors in the FASTA definition lines, including
missing SeqIDs, duplicate SeqIDs for different sequences, or improperly
formatted modifiers. You can add or edit this information in the
spreadsheet provided. The toggle at the bottom of the pop-up allows
you to select whether all sequences or only those with errors are
listed in the spreadsheet above. After making changes, click on Refresh
Error List to ensure that all errors have been corrected. You must
correct any errors involving the SeqID in order to proceed with your
submission.

*FASTA Format for Gapped Sequence

#The FASTA definition line for a gapped sequence follows the same format
as above. To indicate a gap within the sequence, enter a hard return
within the sequence at the point of the gap, then insert an extra line
starting with a greater-than sign (">") and a question mark ("?"). If the gap size
is unknown, enter "unk100" after the question mark. If the gap size is
known, enter the length of the gap after the question mark. For
example, the following input

!>Dobi [organism=Canis familiaris] [breed=Doberman pinscher]
!AAATGCATGGGTAAAAGTAGTAGAAGAGAAGGCTTTTAGCCCAGAAGTAATACCCATGTTTTCAGCATTA
!GGAAAAAGGGCTGTTG
!>?unk100
!TGGATGACAGAAACCTTGTTGGTCCAAAATGCAAACCCAGATKGTAAGACCATTTTAAAAGCATTGGGTC
!TTAGAAATAGGGCAACACAGAACAAAAAT
!>?234
!AAAAATAAAAGCATTAGTAGAAATTTGTACAGAACTGGAAAAGGAAGGAAAAATTTCAAAAATTGGGCCT
!GAAAACCCATACAATACTCCGGG

will generate a sequence containing two gaps. The first gap is of
unknown length, and the second is 234 nucleotides long.

*FASTA+GAP Format for Aligned Nucleotide Sequences

#A number of programs output sets of aligned sequences in FASTA format.
Frequently, gaps must be inserted to align these sequences.
The
default alignment settings should correctly interpret gap and ambiguous
characters in most cases. If Sequin cannot read your alignment, you
may need to change these settings using the Optional Alignment Settings
button on the
<A HREF="#NucleotidePage">
Nucleotide Page
</A>
form. All sequences, including gaps, must be the same length. The
gaps will appear only in the alignment, not in the individual sequences
in the database.

#Sequences in FASTA+GAP format resemble FASTA sequences. The previous
section on

<A HREF="#FASTAFormatforNucleotideSequences">
FASTA Format for Nucleotide Sequences
</A>

has instructions for formatting FASTA sequences. If one of the
sequences in your alignment is already present in the GenBank/EMBL/DDBJ
database, you must mark that sequence so that it does not receive a new
Accession number. To do this, use a SeqID in the format accU12345,
where U12345 is the Accession number of the pre-existing sequence. All
sequences in FASTA+GAP format should be in the same file.

#The following is an example of FASTA+GAP format:

!>A-0V-1-A [organism=Gallus gallus] [clone=C]
!TCACTCTTTGGCAACGACCCGTCGTCATAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
!TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
!
!>A-0V-2-A [organism=Drosophila melanogaster] [strain=D]
!TCACTCTTTGGCAAC---GCGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
!TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
!
!>A-0V-3-A [organism=Caenorhabditis elegans] [strain=E]
!TCACTCTTTGGCAAC---GCGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
!TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
!
!>A-0V-4-A [organism=Rattus norvegicus] [strain=F]
!TCACTCTTTGGCAACGACCCGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
!TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
!
!>A-0V-7-A [organism=Aspergillus nidulans] [strain=G]
!TCACTCTTTGGCAACGACCAGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
!TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT

*PHYLIP Format for Aligned Nucleotide Sequences

#A number of programs output sets of aligned sequences in PHYLIP format.

#The following is an example of PHYLIP format.

! 5 100
!A-0V-1-A TCACTCTTTG GCAACGACCC GTCGTCATAA TAAAGATAGA GGGGCAACTA
!A-0V-2-A TCACTCTTTG GCAAC---GC GTCGTCACAA TAAAGATAGA GGGGCAACTA
!A-0V-3-A TCACTCTTTG GCAAC---GC GTCGTCACAA TAAAGATAGA GGGGCAACTA
!A-0V-4-A TCACTCTTTG GCAACGACCC GTCGTCACAA TAAAGATAGA GGGGCAACTA
!A-0V-7-A TCACTCTTTG GCAACGACCA GTCGTCACAA TAAAGATAGA GGGGCAACTA
!
!
! AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
! AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
! AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
! AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
! AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT

#In this example, the first line indicates that there are 5 sequences,
each with 100 nt of sequence. The following five lines contain the
Sequence IDs, followed by the sequences. Specifically, the sequence
identifier for the first sequence is A-0V-1-A. Note that subsequent
blocks of sequence do not contain the Sequence ID. If one of the
sequences in your alignment is already present in the GenBank/EMBL/DDBJ
database, you must mark that sequence so that it does not receive a new
Accession number. To do this, use a SeqID in the format accU12345,
where U12345 is the Accession number of the pre-existing sequence.

#The default alignment settings should correctly interpret gap and
ambiguous characters in most cases.
If Sequin cannot read your
alignment, you may need to change these settings using the Optional
Alignment Settings button on the
<A HREF="#NucleotidePage">
Nucleotide Page
</A>
form.

#You can modify the PHYLIP format so that Sequin can
determine the correct organism and any other modifiers for each
sequence. An example of such modifications is given below in the section on
<A HREF="#SourceModifiersforPHYLIPandNEXUS">
Source Modifiers for PHYLIP and NEXUS
</A>
.
#Alternatively, you can leave your sequence alignment in
standard PHYLIP format and enter the organism, strain, chromosome, etc.
information on the following

<A HREF="#ImportSourceModifiers">
Source Modifiers form
</A>
.

*NEXUS Format for Aligned Nucleotide Sequences

#A number of programs output sets of aligned sequences in one of two
NEXUS formats, NEXUS Interleaved and NEXUS Contiguous.

#NEXUS files can contain "?" for "missing" data at the 5' and 3' ends of
sequences, as long as this parameter is properly defined within the
header of the NEXUS file.

#The following is an example of NEXUS Interleaved format.

!#NEXUS
!
!begin data;
! dimensions ntax=5 nchar=100;
! format datatype=dna missing=? gap=- interleave;
! matrix
!
!A-0V-1-A TCACTCTTTG GCAACGACCC GTCGTCATAA TAAAGATAGA GGGGCAACTA
!A-0V-2-A TCACTCTTTG GCAAC---GC GTCGTCACAA TAAAGATAGA GGGGCAACTA
!A-0V-3-A TCACTCTTTG GCAAC---GC GTCGTCACAA TAAAGATAGA GGGGCAACTA
!A-0V-4-A TCACTCTTTG GCAACGACCC GTCGTCACAA T????ATAGA GGGGCAACTA
!A-0V-7-A TCACTCTTTG GCAACGACCA GTCGTCACAA TAAAGATAGA GGGGCAACTA
!
!
!A-0V-1-A AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
!A-0V-2-A AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
!A-0V-3-A AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
!A-0V-4-A AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
!A-0V-7-A AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT

#In this example, the first few lines provide information about the data
in the sequence alignment. The following five lines contain the
Sequence IDs, followed by the sequences. Specifically, the sequence
identifier for the first sequence is A-0V-1-A. Note that subsequent
blocks of sequence also contain the Sequence ID. If one of the
sequences in your alignment is already present in the GenBank/EMBL/DDBJ
database, you must mark that sequence so that it does not receive a new
Accession number. To do this, use a SeqID in the format accU12345,
where U12345 is the Accession number of the pre-existing sequence.
Also, Sequin will replace the "?" characters in the sequences with "N"s
because they are defined as "missing" data in the header. The default
alignment settings should correctly interpret gap and ambiguous
characters in most cases. If Sequin cannot read your alignment, you
may need to change these settings using the Optional Alignment Settings
button on the
<A HREF="#NucleotidePage">
Nucleotide Page
</A>
form.

#You can modify either NEXUS format so that Sequin can
determine the correct organism and any other modifiers for each
sequence. An example of such modifications is given below in the section on
<A HREF="#SourceModifiersforPHYLIPandNEXUS">
Source Modifiers for PHYLIP and NEXUS
</A>
.
#Alternatively, you can leave your sequence alignment in
standard NEXUS format and enter the organism, strain, chromosome, etc.
information on the following

<A HREF="#SourceModifiersForm">
Source Modifiers form
</A>
.
#The following is an example of NEXUS Contiguous format.

!#NEXUS
!BEGIN DATA;
!DIMENSIONS NTAX=5 NCHAR=100;
!FORMAT MISSING=? GAP=- DATATYPE=DNA ;
!MATRIX
!
!A-0V-1-A
!TCACTCTTTGGCAACGACCCGTCGTCATAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
!TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
!
!A-0V-2-A
!TCACTCTTTGGCAAC---GCGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
!TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
!
!A-0V-3-A
!TCACTCTTTGGCAAC---GCGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
!TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
!
!A-0V-4-A
!TCACTCTTTGGCAACGACCCGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
!TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
!
!A-0V-7-A
!TCACTCTTTGGCAACGACCAGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
!TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT

#In this example, the first few lines provide information about the data
in the sequence alignment. After the MATRIX line, each Sequence ID
appears on its own line, followed by the complete sequence for that ID.
Specifically, the sequence identifier for the first sequence is A-0V-1-A.
If one of the sequences in your alignment is already present in the
GenBank/EMBL/DDBJ database, you must mark that sequence so that it does
not receive a new Accession number. To do this, use a SeqID in the format
accU12345, where U12345 is the Accession number of the pre-existing
sequence.

#You can modify either NEXUS format so that Sequin can
determine the correct organism and any other modifiers for each
sequence. An example of such modifications is given below in the section on
<A HREF="#SourceModifiersforPHYLIPandNEXUS">
Source Modifiers for PHYLIP and NEXUS
</A>
.
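#If you prepare NEXUS Contiguous files with your own tools, a quick consistency check can catch two common problems before import: a sequence count that differs from NTAX, and sequences whose length differs from NCHAR. The Python sketch below is hypothetical (read_nexus_contiguous and to_database_sequence are illustrative names, not part of Sequin) and is not a full NEXUS parser. It assumes the layout of the NEXUS Contiguous example above, with each Sequence ID on its own line and records separated by blank lines. It also mimics the behavior described in this document by removing gap symbols and turning the declared "missing" symbol into N.

```python
import re


def _header_value(text, key, default=None):
    """Return the value of KEY=value in the NEXUS header, case-insensitively."""
    match = re.search(key + r"\s*=\s*([^\s;]+)", text, re.IGNORECASE)
    return match.group(1) if match else default


def read_nexus_contiguous(text):
    """Parse a NEXUS Contiguous DATA block into ({seqid: aligned}, problems).

    Assumes each Sequence ID is on its own line, followed by that sequence's
    lines, with records separated by blank lines (as in the example above).
    """
    ntax = int(_header_value(text, "NTAX"))
    nchar = int(_header_value(text, "NCHAR"))
    # Everything between MATRIX and the terminating ';' (or the end of text).
    match = re.search(r"MATRIX(.*?)(?:;|\Z)", text, re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("no MATRIX command found")
    records = {}
    for block in re.split(r"\n\s*\n", match.group(1).strip()):
        if not block.strip():
            continue
        # First token is the Sequence ID; the rest is the aligned sequence.
        seqid, *lines = block.split()
        records[seqid] = "".join(lines)
    problems = []
    if len(records) != ntax:
        problems.append(f"expected {ntax} sequences, found {len(records)}")
    for seqid, aligned in records.items():
        if len(aligned) != nchar:
            problems.append(f"{seqid}: expected {nchar} characters, found {len(aligned)}")
    return records, problems


def to_database_sequence(aligned, text):
    """Drop gap symbols and turn the declared 'missing' symbol into N."""
    gap = _header_value(text, "GAP", "-")
    missing = _header_value(text, "MISSING", "?")
    return aligned.replace(gap, "").replace(missing, "N")
```

#Splitting records on blank lines keeps the sketch short, but it is an assumption about layout; a NEXUS file that puts IDs and sequences on the same line should be read with a dedicated NEXUS parser instead.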
#Alternatively, you can leave your sequence alignment in
standard NEXUS format and enter the organism, strain, chromosome, etc.
information on the following

<A HREF="#SourceModifiersForm">
Source Modifiers form
</A>
.

**Source Modifiers for PHYLIP and NEXUS

#You can modify the PHYLIP or NEXUS formats so that Sequin can determine
the correct organism and any other modifiers for each sequence by
adding lines at the end of the file. The first line applies to the
first sequence, the second line to the second sequence, and so on. You
must have one line for each sequence. These inserted lines contain
modifiers formatted as in the FASTA definition line, but do not begin
with a SeqID. Instead, the SeqID is present at the beginning of the
sequence lines, as shown above.

#Each of these inserted lines starts with the character ">". The
scientific organism name follows in brackets. Optional modifiers also
follow in brackets. For further information on the modifiers that can
go in these lines, see the instructions entitled "FASTA Format for
Nucleotide Sequences",

<A HREF="#FASTAFormatforNucleotideSequences">
above.
</A>

#The following lines, which indicate the organism and the strain or
clone of each sequence, would follow immediately after the sequence data
in the PHYLIP and NEXUS examples above.

!;
!END;
!
!begin ncbi;
!sequin
!>[organism=Gallus gallus] [clone=C]
!>[organism=Drosophila melanogaster] [strain=D]
!>[organism=Caenorhabditis elegans] [strain=E]
!>[organism=Rattus norvegicus] [strain=F]
!>[organism=Aspergillus nidulans] [strain=G]
!;
!end;

#The number of lines of source information must exactly match the number
of sequences provided.
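#When you prepare many alignments, the ncbi block can also be generated by a script. The Python sketch below is hypothetical and not part of Sequin; the function names ncbi_block and append_ncbi_block are illustrative. It assumes you supply one dictionary of modifiers per sequence, in alignment order, and it writes the organism first on each ">" line. Following the rule above, append_ncbi_block refuses to add the block when the number of source lines differs from the number of sequences (NTAX).

```python
# Hypothetical helpers (not part of Sequin) for writing the trailing
# "begin ncbi; sequin ... end;" block of a PHYLIP or NEXUS alignment file.


def ncbi_block(modifiers):
    """Return the ncbi block text.

    modifiers: one dict per sequence, in the same order as the sequences
    in the alignment. Each dict must contain an 'organism' entry.
    """
    lines = ["begin ncbi;", "sequin"]
    for number, mods in enumerate(modifiers, start=1):
        if "organism" not in mods:
            raise ValueError(f"sequence {number} has no organism name")
        # Organism first, then any optional modifiers, each as [name=value].
        parts = [f"[organism={mods['organism']}]"]
        parts += [f"[{name}={value}]" for name, value in mods.items()
                  if name != "organism"]
        lines.append(">" + " ".join(parts))
    lines += [";", "end;"]
    return "\n".join(lines) + "\n"


def append_ncbi_block(alignment_text, ntax, modifiers):
    """Append the block after the alignment; ntax is the number of sequences."""
    # Exactly one line of source information per sequence.
    if len(modifiers) != ntax:
        raise ValueError(
            f"{len(modifiers)} lines of source information for {ntax} sequences")
    return alignment_text.rstrip("\n") + "\n\n" + ncbi_block(modifiers)


if __name__ == "__main__":
    print(ncbi_block([
        {"organism": "Gallus gallus", "clone": "C"},
        {"organism": "Drosophila melanogaster", "strain": "D"},
    ]), end="")
```

Called with the five organism and strain/clone pairs from the example, ncbi_block produces the same "begin ncbi;" through "end;" lines shown above.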
#Complete examples can be found in the
<A HREF="http://www.ncbi.nlm.nih.gov/Sequin/QuickGuide/sequin.htm#AlignmentFormats">
Alignment Formats
</A>
section of the Sequin Quick Guide.

#Alternatively, you can leave your sequence alignment in
standard NEXUS or PHYLIP format and enter the organism, strain, chromosome, etc.
information on the following

<A HREF="#OrganismPage">
Organism Page
</A>
.

>Nucleotide Page

#The options on this page will vary depending on the
<A HREF="#SubmissionType">
Submission Type
</A>
and
<A HREF="#SequenceDataFormat">
Sequence Data Format
</A>
selected earlier. Segmented sets and gapped sequences must be imported
as properly formatted FASTA files. Details about importing alignment
files are
<A HREF="#NucleotidePageforAlignedDataFormats">
below
</A>
.

*Nucleotide Page for FASTA Data Format

**Create Alignment

#If you have selected a Population study, Phylogenetic study, Mutation
study, or Environmental samples set as a
<A HREF="#SubmissionType">
Submission Type
</A>
, a check box will appear at the top of the Nucleotide Page. If you
check 'Create Alignment', Sequin will attempt to generate an alignment
of the sequences within your submission.

**Import Nucleotide FASTA

#Use this button to import your properly formatted
<A HREF="#FASTAFormatforNucleotideSequences">
FASTA file
</A>
. You will see a window containing information about the imported
sequence(s). Please check the number of sequences, Sequence IDs
(SeqIDs) and length of each sequence to make sure they are correct. If
you have included source information within the FASTA definition line,
this will also be listed.
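#The same details can be checked before import. The Python sketch below is a hypothetical pre-check, not Sequin's own parser: it assumes that the SeqID is the first word after ">" and that modifiers are written as [name=value] pairs, and it does not handle segmented or gapped FASTA files. It reports each record's SeqID, length, and modifiers, and flags missing or duplicate SeqIDs, which Sequin reports as definition line errors.

```python
import re
from collections import Counter

# Hypothetical pre-check (not part of Sequin) for a nucleotide FASTA file.
# Matches [name=value] modifiers such as [organism=Mus musculus].
MODIFIER = re.compile(r"\[\s*([^=\]]+?)\s*=\s*([^\]]*?)\s*\]")


def summarize_fasta(text):
    """Return (seqid, length, modifiers) for each record in a FASTA text."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            defline = line[1:].strip()
            first = defline.split()[0] if defline else ""
            # A definition line that starts with "[" has no SeqID.
            seqid = "" if first.startswith("[") else first
            records.append([seqid, 0, dict(MODIFIER.findall(defline))])
        elif records:
            # Count sequence letters only; spaces and digits are ignored.
            records[-1][1] += sum(ch.isalpha() for ch in line)
        else:
            raise ValueError("sequence data found before the first '>' line")
    return [tuple(record) for record in records]


def seqid_problems(records):
    """List missing or duplicated SeqIDs among the summarized records."""
    counts = Counter(seqid for seqid, _, _ in records)
    problems = ["missing SeqID"] if counts.get("", 0) else []
    problems += [f"duplicate SeqID {seqid}" for seqid, n in counts.items()
                 if seqid and n > 1]
    return problems


if __name__ == "__main__":
    fasta = (">Seq1 [organism=Mus musculus] [strain=C57BL/6]\n"
             "ACGTACGTAC\nGTA\n"
             ">Seq2 [organism=Mus musculus]\nAAAATTTT\n")
    recs = summarize_fasta(fasta)
    print(len(recs), "sequences")
    for seqid, length, mods in recs:
        print(seqid, length, mods)
    print(seqid_problems(recs) or "no SeqID problems")
```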

**Add/Modify Sequences

#This option allows you to add or modify sequences without using a
previously formatted FASTA file, but it is not available if you have
selected a Segmented sequence or Gapped sequence as a
<A HREF="#SubmissionType">
Submission Type
</A>
. In the Specify Sequences box, you can either import a nucleotide FASTA
file or add a new sequence. If you choose Add New Sequence, a new box will
pop up where you can either import an existing sequence file or
directly paste or type the nucleotide sequence.

#If you add a sequence whose FASTA definition line is not properly
formatted, a pop-up box will appear. The top box in this pop-up will
list any errors in the FASTA definition lines, including missing
SeqIDs, duplicate SeqIDs for different sequences, or improperly
formatted modifiers. You can add or edit this information in the
spreadsheet provided. The toggle at the bottom of the pop-up allows
you to select whether all sequences or only those with errors are
listed in the spreadsheet above. After making changes, click on
Refresh Error List to ensure that all errors have been corrected. You
must correct any errors involving the SeqID in order to proceed with
your submission. Click on Accept to save your sequences and return to
the Specify Sequences box.

#In the Specify Sequences box, you can choose to add another sequence or
select a sequence from the list and choose to edit or delete it. You
can also delete all sequences at this point. You will need to click on
Done to save your sequences and return to the Nucleotide Page.

**Clear Sequences

#This option will remove all imported nucleotide sequences.

**Specify Molecule

#A database sequence can represent one of several different molecule
types. The default molecule is genomic DNA.
If the sequence was not
derived from genomic DNA, you can edit that information here. If you
are submitting multiple sequences, you can apply one molecule type to
all sequences or set the molecule type for each sequence individually.
Select the type of molecule that was sequenced from the Molecule
pop-up menu.

#-Genomic DNA: Sequence derived directly from the DNA of an organism.
Note: The DNA sequence of an rRNA gene has this molecule type, as does
that from a naturally occurring plasmid.

#-Genomic RNA: Sequence derived directly from the genomic RNA of certain
organisms, such as viruses.

#-Precursor RNA: An RNA transcript before it is processed into mRNA,
rRNA, tRNA, or other cellular RNA species.

#-mRNA[cDNA]: A cDNA sequence derived from mRNA.

#-Ribosomal RNA: A sequence derived from the RNA in ribosomes. This
should only be selected if the RNA itself was isolated and sequenced.
If the gene for the ribosomal RNA was sequenced, select Genomic DNA.

#-Transfer RNA: A sequence derived from transfer RNA, for
example, the sequence of a cDNA derived from tRNA.

#-Small nuclear RNA: A sequence derived from small nuclear RNA, for
example, the sequence of a cDNA derived from snRNA.

#-Small cytoplasmic RNA: A sequence derived from small cytoplasmic RNA,
for example, the sequence of a cDNA derived from small cytoplasmic RNA.

#-Other-Genetic: A synthetically derived sequence, including cloning
vectors and tagged fusion constructs.

#-cRNA: A sequence derived from complementary RNA transcribed from DNA,
mainly used for viral submissions.

#-Small nucleolar RNA: A sequence derived from small nucleolar RNA, for
example, the sequence of a cDNA derived from snoRNA.

#-Transcribed RNA: A sequence derived from any transcribed RNA not
listed above.

#-Transfer-messenger RNA: A sequence derived from transfer-messenger RNA
(tmRNA), which acts first as a tRNA and then as an mRNA that encodes a
peptide tag. If the gene for the tmRNA was sequenced, select Genomic DNA.

**Specify Topology

#Most sequences have a Linear topology, and this is the default. You
should change this setting to Circular only if the sequence is complete
and has a circular topology. For example, a complete plasmid or a
complete mitochondrial genome would have a Circular topology, but a
single gene from a plasmid or mitochondrion would have a Linear
topology. If you are submitting multiple sequences, you can apply one
topology to all sequences or set the topology for each sequence
individually.

*Nucleotide Page for Aligned Data Formats

**Sequence Characters

#If you are submitting a set of aligned sequences, you can specify the
sequence characters used in your alignment here. Sequin requires that you
define any non-IUPAC nucleotide characters in your alignment file. The
five types of variable characters are listed under Sequence Characters.

#Every sequence within an alignment file must contain the same number of
characters (nucleotides + gaps). Gap characters fill the positions in
the alignment where a sequence has no nucleotide. Gaps that appear at
the beginning or end of a sequence are treated differently from gaps that
appear between nucleotides, and each type must be defined. GenBank prefers
a hyphen (-) to represent gaps. If you use a different character to
represent a gap, you will need to add this character to the list in the
Beginning Gap, Middle Gap, or End Gap boxes.

#Ambiguous characters represent nucleotides that are known to exist, but
whose identity could not be determined. GenBank prefers to
use 'n' to represent any ambiguous nucleotides.
If you are using a
different character to represent an ambiguous base, you will need to add
this character to the list in the Ambiguous/Unknown box. Sequin will
convert these characters to 'n's when your file is imported.

#Match characters denote nucleotides that are identical in every member of
an alignment. GenBank prefers the use of a colon (:) to represent match
characters. If you are using a different character to represent a match
character, you will need to add this character to the list in the Match box.

**Import Nucleotide Alignment

#Once you have imported the alignment using the Import Nucleotide
Alignment button, you can edit the molecule information using the
<A HREF="#SpecifyMolecule">
Specify Molecule
</A>
and
<A HREF="#SpecifyTopology">
Specify Topology
</A>
buttons explained above. Note that you cannot access the
<A HREF="#Add/ModifySequences">
Add/Modify Sequences
</A>
dialog for submissions of aligned sequences.

>Organism Page

#Information about the organism from which the sequence was derived
should be entered or edited on this page. If there are any potential
problems with the organism information previously provided in either
the
<A HREF="#FASTAFormatforNucleotideSequences">
FASTA definition line
</A>
or the
<A HREF="#Add/ModifySequences">
Add/Modify Sequences
</A>
dialog, a window listing these problems will appear at the top of the
form. Please review these problems and edit them using the

<A HREF="#AddSourceModifiers">
Add Source Modifiers
</A>
button as necessary. At minimum, you must supply
the scientific name of the organism from which the sequence was
obtained in order to proceed with your submission.

#The second window is a summary of the organism information provided so
far.
Double-clicking on a line of text within this window will launch a
modifier-specific editing window. In each of these windows, you can
edit the available information for the specific modifier. In most
cases, you have the choice to edit the modifier for each sequence
separately, or to enter text and select Apply above value to all
sequences. These changes will be reflected in the windows of the
Organism page immediately upon closing the modifier-specific editor.

*Add Organisms, Locations, and Genetic Codes

#If you have not added organism information using either the
<A HREF="#FASTAFormatforNucleotideSequences">
FASTA definition line
</A>
or the
<A HREF="#Add/ModifySequences">
Add/Modify Sequences
</A>
dialog, you can use the Add Organisms, Locations, and Genetic Codes
button to do so at this point. This button will launch the Multiple
Organism Editor pop-up, where you may add or edit existing information
concerning the
<A HREF="#Organism">
Organism
</A>
name,
<A HREF="#Location">
Location
</A>
and
<A HREF="#GeneticCode">
Genetic Code
</A>
. The SeqID of each sequence is listed in the left column of the
spreadsheet. You can change the information in the spreadsheet
individually or globally for all sequences.

**Organism

#The scrollable list at the top of the pop-up contains the scientific
names of many organisms. To reach a name on the list, type the first
few letters of the scientific name into the box above the list or the
appropriate box in the spreadsheet. The list will scroll to the names
beginning with those letters, and you can select the organism within
the list itself. You can then use the arrow button to copy this name
into the appropriate box in the spreadsheet.

#To apply the same scientific name to all sequences in the submission,
click on the Organism button in the spreadsheet column header. A
separate pop-up box will appear with the same organism list. You can
select a name from this list and choose Accept to apply this name to
all sequences.

#If you have any questions about the scientific name of an organism, see
the NCBI
<A HREF="http://www.ncbi.nlm.nih.gov/Taxonomy/tax.html">
Taxonomy Browser
</A>
http://www.ncbi.nlm.nih.gov/Taxonomy/tax.html

#If the name of the organism is not on the list, type it in directly. If
you do not know the scientific name, please be as specific as you can
and include a unique identifier, such as a clone, isolate, strain or
voucher number, or cultivar name, e.g., Nostoc ATCC29106, uncultured
spirochete Im403, Lauraceae sp. Vásquez 25230 (MO), Rosa hybrid
cultivar 'Kazanlik'. Also, if applicable, indicate whether the name is
unpublished as of the time of submission. Additional information such
as strain, isolate, or serotype can be entered later in the submission
process.

**Location

#The default Location for all sequences is "Genomic". If the sequence
is not genomic, select the alternative location (e.g., an organelle) from
the pull-down list. You can change the location of all sequences
globally by clicking on the Location button in the spreadsheet header.
The following is a brief description of the choices in this list:

#-Apicoplast: a reduced plastid characteristic of apicomplexans
(e.g., Plasmodium). NOTE: apicoplast should be applied ONLY to
members of the Apicomplexa.

#-Chloroplast: a chlorophyllous plastid.

#-Chromatophore: a membrane-bound vesicle containing photosynthetic pigments
in bacteria.

#-Chromoplast: a non-chlorophyllous, pigmented plastid, found in
fruits and flowers.

#-Cyanelle: a specialized type of plastid found exclusively in
glaucocystophytes (e.g., Cyanophora). NOTE: cyanelle should be
applied ONLY to members of the Glaucocystophyceae.

#-Endogenous_virus: a virus that has integrated permanently into the
host genome, and which is inherited vertically through the
germline of the host.

#-Extrachromosomal: other extrachromosomal elements not listed here,
such as a B chromosome or an F factor.

#-Genomic: chromosome. This category includes
mitochondrial and chloroplast proteins that are encoded by the nuclear
genome.

#-Hydrogenosome: an organelle that produces hydrogen and ATP and is
found mainly in ciliates, fungi and trichomonads. Hydrogenosomes may
be reduced mitochondria.

#-Kinetoplast: a specialized type of mitochondrion found exclusively
in Kinetoplastida (e.g., Leishmania). NOTE: kinetoplast should
be applied ONLY to members of the Kinetoplastida (trypanosomes and
bodonids).

#-Leucoplast: a plastid lacking pigments of any type.

#-Macronuclear: a specialized type of nucleus found exclusively in the
ciliated protists (e.g., Tetrahymena). NOTE: macronucleus
should be applied ONLY to members of the Ciliophora.

#-Mitochondrion: a semi-autonomous, self-reproducing organelle that
occurs in the cytoplasm of most eukaryotic cells.

#-Nucleomorph: a reduced nuclear remnant found in Chlorarachniophyceae
(e.g., Chlorarachnion) and Cryptophyta (e.g., Cryptomonas). NOTE:
nucleomorph should be applied ONLY to members of the
Chlorarachniophyceae or Cryptophyta.

#-Plasmid: extrachromosomal genetic element found in bacterial species.
Note that this does not include the cloning vector used to propagate
the sequence of interest.

#-Plastid: any of a class of double membrane-bound, light-harvesting
organelles (or derived from same). NOTE: plastid should be used
ONLY when a more precise term, e.g., chloroplast, is not
applicable.

#-Proplastid: an immature plastid.

#-Proviral: a virus that is integrated into a host cell chromosome.

**Genetic Code

#If you selected a scientific organism name from the scrollable list
described above, this field will be filled out automatically. However,
if the organism is not on the list, this field will default to the
"Standard" genetic code. If this is incorrect, you can select the
correct genetic code from the pull-down list. To globally change the
genetic code for all sequences that were not filled out automatically,
click on the Genetic Code button in the spreadsheet header.

#For more information regarding the genetic codes available, see the NCBI
<A HREF="http://www.ncbi.nlm.nih.gov/htbin-post/Taxonomy/wprintgc?mode=c">
Taxonomy page
</A>.
http://www.ncbi.nlm.nih.gov/htbin-post/Taxonomy/wprintgc?mode=c

*Import Source Modifiers

#This button allows you to import a tab-delimited table of source
modifiers. The first column in the table must contain the Sequence
Identifiers (SeqIDs) used earlier in the submission, and each subsequent
column must contain a different source modifier. The first row in the
table must contain the labels for each column. The label for the
Sequence Identifiers column should be "Seq_ID". A list
of
<A HREF="http://www.ncbi.nlm.nih.gov/Sequin/modifiers.html">
modifiers
</A>
in the format to be used in the column headers is available.

*Add Source Modifiers

#This button will launch the Specify Source Modifiers pop-up box,
where you can add or edit any source modifier.
You can also import a
source modifier table or export the existing source modifiers in table
format from this page.

#The Select Modifier dialog allows you to select a modifier from the
pull-down list and edit the value of this modifier for each sequence or
globally add a value to all sequences.

#The two windows in this pop-up provide information about the current
source modifiers for the sequences in your submission. The top window
provides a summary of these modifiers and the lower window lists the
values of each modifier for each sequence. If any sequences have
missing organism names or have source information that is identical to
another sequence, the SeqIDs will be shown in red in this window.
Double-clicking on a modifier value in this window will launch a pop-up
where you can edit this value. Double-clicking on the modifier name
used in the header will launch a modifier-specific pop-up where you can
globally edit the modifier value for all sequences or change the value
for individual sequences.

*Clear All Source Modifiers

#This button will clear all modifiers previously entered in either the
FASTA definition lines or the submission dialogs. This includes the
organism name, which is required for submission.

>Protein Page

#This page allows you to provide the protein sequence translated from
the nucleotide sequence that you just entered. If the nucleotide
sequence is alternatively spliced or contains multiple open reading
frames, enter all of the protein sequences on this page. Each protein
sequence will appear in the database record as a coding sequence (CDS)
feature. Sequin will automatically determine which nucleotide
sequences code for the protein and indicate the nucleotide sequence
interval on the database record.
Sequin also provides tools that allow
you to view a graphical representation of all the open reading frames
in your nucleotide sequence and to convert these reading frames into
CDS features. These tools are described later in the help
documentation under the

<A HREF="#ORFFinder">
ORF Finder.
</A>

*Conceptual Translation Confirmed by Peptide Sequencing

#Most protein entries are computer-generated conceptual translations of
a nucleic acid sequence. If you have confirmed this translation by
direct sequencing of either the entire protein or peptides derived
from the protein, please check this box.

*Incomplete at NH3 end/Incomplete at COOH end

#If the sequence is lacking amino acids at the amino- or
carboxy-terminal end of the protein, please check the appropriate box.

*Create Initial mRNA with CDS Intervals

#If you check this box, Sequin will make an mRNA feature with the same
initial intervals (i.e., range of sequence) as the CDS feature. After
the record has been assembled, you should edit the mRNA feature location
to add the 5' UTR and 3' UTR intervals. This may be done either in the
mRNA editor or in the sequence editor.

*Import Protein FASTA

#You can import one or more protein sequences contained within
a previously generated protein FASTA file.

**FASTA Format for Protein Sequences

#The basic FASTA format is the same as that used for
<A HREF="#FASTAFormatforNucleotideSequences">
nucleotide sequences
</A>
, with a FASTA definition line followed by the sequence itself.

#In order to match the protein sequence to the correct nucleotide
sequence, you must use the same Sequence Identifier (SeqID) that you
used to identify the nucleotide sequence.
Thus, in cases of
alternatively spliced genes, a single protein FASTA file can contain
two unique sequences that have the same SeqID. Both coding regions
will be added to the same nucleotide sequence.

#The available modifiers for use in a protein FASTA definition line
differ from those for a nucleotide FASTA definition line. They are
limited to information about the protein or gene itself, as shown in
the examples below. The format remains [modifier=text].

#Note that in all cases, the FASTA definition line must not contain any
hard returns. All information must be on a single line of text.

#Examples of properly formatted protein FASTA definition lines are:

<PRE><KBD>>Seq1 [protein=neuropilin 1] [gene=Nrp1]</KBD></PRE>

<PRE><KBD>>ABCD [protein=merozoite surface protein 2] [gene=msp2] [protein_desc=MSP2]</KBD></PRE>

<PRE><KBD>>DNA.new [protein=breast and ovarian cancer susceptibility protein] [gene=BRCA1] [note=breast cancer 1, early onset]</KBD></PRE>

#The protein name should be included in the entry; all other fields are
optional.

#The line after the FASTA definition line begins the amino acid
sequence. It is recommended that each line of sequence be no longer
than 80 characters. Please use only IUPAC symbols within the amino
acid sequence. Non-IUPAC amino acid symbols will be stripped from the
sequence.

#After you import your sequence, a window will appear with information
about the sequence. The first line will describe the number of protein
sequences imported and the total length in amino acids of
all sequences. Each sequence is numbered, and its length,
unique identifier (SeqID), Gene symbol, Protein name, and title
(Definition line) as supplied in the FASTA definition line are listed.
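#These rules can also be applied when a protein FASTA file is written by a script. The Python sketch below is a hypothetical helper, not part of Sequin. It keeps the definition line on one line, places [protein=...] first, wraps the sequence at 60 characters (within the recommended 80), and drops characters outside an assumed set of IUPAC amino acid codes, mirroring the stripping described above; adjust that set if your data require other symbols.

```python
import textwrap

# Hypothetical helper (not part of Sequin) for writing protein FASTA records.
# Assumption: the 20 standard one-letter codes plus B, Z, J, X (ambiguity)
# and U, O (selenocysteine, pyrrolysine). Adjust this set if needed.
IUPAC_AA = set("ACDEFGHIKLMNPQRSTVWY" "BZJXUO")


def protein_fasta(seqid, protein, sequence, gene=None, width=60, **extra):
    """Return one protein FASTA record for import on the Protein Page.

    seqid must be the SeqID of the nucleotide sequence that encodes the
    protein; extra keyword arguments become additional [name=value] pairs.
    """
    if not 0 < width <= 80:
        raise ValueError("sequence lines should be 1 to 80 characters long")
    values = [seqid, protein, gene or ""] + [str(v) for v in extra.values()]
    if any("\n" in v or "\r" in v for v in values):
        raise ValueError("the definition line must not contain hard returns")
    mods = [f"[protein={protein}]"]
    if gene:
        mods.append(f"[gene={gene}]")
    mods += [f"[{name}={value}]" for name, value in extra.items()]
    defline = f">{seqid} " + " ".join(mods)
    # Upper-case the sequence and keep only the assumed IUPAC letters.
    residues = "".join(ch for ch in sequence.upper() if ch in IUPAC_AA)
    # textwrap splits the unbroken residue string into fixed-width lines.
    lines = textwrap.wrap(residues, width)
    return "\n".join([defline] + lines) + "\n"


if __name__ == "__main__":
    print(protein_fasta("ABCD", "merozoite surface protein 2", "MKVIFLF",
                        gene="msp2", protein_desc="MSP2"), end="")
```

Passing protein_desc or note as keyword arguments yields definition lines in the same form as the second and third examples above.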

>Annotation Page

#Note: This page will not be available if you have selected a segmented
or gapped sequence as the
<A HREF="#SubmissionType">
Submission Type
</A>
.

#On this page, you can add a
<A HREF="#gene">
gene
</A>
,
<A HREF="#rRNA">
ribosomal RNA
</A>
or
<A HREF="#CDS">
CDS
</A>
feature across the entire span of each sequence you are submitting.
You cannot specify locations within each sequence using this page.
More options are available under the

<A HREF="#AnnotateMenu">
Annotate Menu
</A>
in the record viewer.

#If the feature should be partial at one or both ends, check the
appropriate box and then fill in the text boxes for the relevant
feature.

#You may add a title to all sequences if this was not included in the
FASTA definition line. This will be used as the DEFINITION field in
the final flatfile. The title should contain a brief description of
the sequence. There is a preferred format for nucleotide and protein
titles, and Sequin can generate them automatically using the Generate
Definition Line function under the Annotate menu in the record viewer.

>Assembly Tracking

#You will only see this form if you had previously indicated that the
entry is a Third-Party Annotation (TPA) submission. You must provide the
GenBank Accession number(s) of the primary sequences used to assemble
your TPA sequence. We cannot accept primary sequences corresponding
to Reference Sequences or those from proprietary databases. More
information about this can be found on the

<A HREF="http://www.ncbi.nlm.nih.gov/Genbank/TPA.html">
TPA
</A>
home page.

#If a proper GenBank Accession is entered in the first column of the
Assembly Tracking form, the GenBank staff can map the coordinates for
you.
You do not need to fill out the 'from' and 'to' columns. Note
that multiple accessions may be entered to provide full coverage of the
assembled sequence.

#If the accession entered is not recognized as a GenBank Accession
number, a pop-up box is generated requesting that you edit the numbers
listed. Sequences from the Trace Archive can be used as primary sequence
data for TPA records but must be entered in the format "TI123456789".

#You may also generate an Assembly Tracking form in the record viewer
under the Annotate menu. Select Descriptors and TPA Assembly from the
pull-down menu in order to generate the Assembly Tracking form.

>Editing the Record

*Overview

#After you finish the Organism and Sequences Form, Sequin will process
your entry based on the information you have entered. The window you
see now is called the record viewer. This is also the window you will
see if you are submitting an update to an existing record. The
instructions after this point are the same whether you are submitting a
new record or an update.

#In the default window of the record viewer, you will see your entry
approximately as it would appear in the database. Most of the
information that you entered earlier in the submission process is
present in the viewer; other information, such as the contact, is still
present in the record but will not be visible in the database entry. If
you have provided a conceptual translation of the nucleotide sequence,
the translation will be listed as a CDS Feature. Sequin automatically
determines which nucleotides encode the protein and lists them,
even if the nucleotide sequence contains introns and exons.

#You can save the entry to a file by selecting Save or Save As under the
File menu. This is not the same as saving the entry for submission to
the database.
It is a good idea to save the file at this point so that
if you make any unwanted changes during the editing process you can
revert to the original copy. If you wish to edit the entry later, click
on "Read Existing Record" on the Welcome to Sequin form and choose
the file.

#Your entry could probably be processed for submission to the
database now. However, you may wish to add information to
the entry. This information may be in the form of Descriptors or
Features. Descriptors are annotations that apply to an
entire sequence, or an entire set of sequences, and Features are
annotations that apply to a specific sequence interval. For example,
you may want to change the Reference Descriptor to add a published
manuscript, or to annotate the sequence by adding features such as a
signal peptide or polyA signal.

#Information in the record viewer can be edited in different ways. One
way to modify information is to double-click within the block of
information you wish to edit. Many blocks, such as "Definition",
"Source", "Reference", or "Features", can be edited.

#To add information, create a new descriptor
or feature by selecting the appropriate form from the Misc or Features
menus. These options are described later in this help document.

#Finally, you may need to edit the sequence itself.
<A HREF="#SequenceEditor">
Instructions
</A>
for working with the sequence are presented in the documentation for the
Sequence Editor.

*Submitting the Finished Record to the Database

#Once you are satisfied that you have added all the appropriate
information, you must process your entry for submission to the database.
Select "Validate" under the Search menu. This function detects
discrepancies between the format of your submission and that required by
the database selected for entry.

#If Sequin detects problems with the format of your record, you will see a
screen listing the validation errors as well as suggestions for how to fix the
discrepancies. Single-clicking on an error message scrolls the record viewer
to the feature that is causing the error. Double-clicking on the error
message launches the relevant feature editor, in which you can correct the
problem. If you are annotating a set of multiple sequences, shift-click to
scroll to the target sequence and feature. When you think you have corrected
all the problems, click on "Revalidate". You can submit files with errors,
but it is strongly recommended that you correct as many errors as possible
prior to submission.

#Message: Select Verbose, Normal, Terse, or Table. Verbose gives a more
detailed explanation of the problem.

#Filter: Select the error messages you wish to see. You can select
ALL, SEQ_INST (errors regarding the sequence itself, its type, or
length), SEQ_DESCR (descriptor errors), SEQ_FEAT (feature errors), or
errors specific to your record.

#Severity: Select the types of error messages you wish to see. You
will see the type of message selected, as well as any messages warning
of more serious problems.

#There are four types of error messages: Info, Warning, Error, and
Reject. Info is the least severe, and Reject is the most severe. You
may submit the record even if it does contain errors. However, we
encourage you to fix as many problems as possible. Note that some
messages may be merely suggestions, not discrepancies. A possible
Warning message is that a splice site does not match the consensus.
This may be a legitimate result, but you may wish to recheck the
sequence. A possible Error message is that the conceptual translation
of the sequence that you supplied does not encode an open reading
frame.
In this case, you should check that you translated the sequence 1409 in the correct reading frame. A possible Reject message is that you 1410 neglected to include the name of the organism from which the sequence 1411 was derived. The name of the organism is absolutely required for a 1412 database entry. 1413 1414 #If Sequin does not detect any problems with the format of your record, 1415 you will see a message stating "Validation test succeeded". 1416 1417 #To prepare the submission, click the "Done" button on the record 1418 viewer, or select "Prepare Submission" under the File menu. You will be 1419 prompted to save the file. Email this file to the database at the 1420 address shown. You MUST email the file; Sequin does not submit the 1421 file automatically over the network. The email addresses for the 1422 databases are: 1423 1424 !-GenBank: gb-sub@ncbi.nlm.nih.gov 1425 !-EMBL: datasubs@ebi.ac.uk 1426 !-DDBJ: ddbjsub@ddbj.nig.ac.jp 1427 1428 #After your entry is complete, close the record viewer. You will be 1429 returned to the Welcome to Sequin form and can begin another entry. 1430 1431 >The Record Viewer 1432 1433 *Target Sequence 1434 1435 #This pop-up menu shows a list of SeqIDs of all nucleotide and protein 1436 sequences associated with the Sequin entry. Use the menu to select the 1437 sequences displayed in the record viewer, as well as the sequences you 1438 want to "target", that is, the sequences to which you want to apply a 1439 descriptor (see 1440 <A HREF="#Descriptors"> 1441 Descriptors 1442 </A> 1443 in the Sequin help documentation). You may select either an individual 1444 sequence by name or a set of sequences, such as All Sequences, or 1445 SEG_dna if you have a segmented nucleotide set. You may change the 1446 selection at any time. 1447 1448 *Display Format 1449 1450 #You may change the display format of the record viewer to any of the 1451 formats described below. 
Editing a field in one display format will 1452 change that field in all formats. Subsequent pop-up menus will appear 1453 depending on which format is selected. 1454 1455 **GenBank 1456 1457 #This display format allows you to see the submission as it would appear 1458 as a GenBank or DDBJ entry. It is the default format. 1459 1460 #The Mode pop-up default setting is Sequin. Release mode shows certain 1461 qualifiers and db_xrefs in RefSeq entries which are non-collaborative. 1462 Entrez mode is used for web display and can show new elements that have 1463 not yet finished their four month quarentine period. Dump mode requires 1464 that the accession slot be populated. In most cases, there is no need 1465 to change from the default Sequin mode. 1466 1467 #The Style pop-up allows different views of segmented records. The 1468 default is Normal. Segment style is the traditional representation of 1469 segmented sequences, while Contig style displays a CONTIG line with a 1470 join of accessions instead of raw sequence. Master style shows 1471 features mapped to the segmented sequence coordinates instead of the 1472 coordinates of the individual parts. 1473 1474 **Graphic 1475 1476 #This display format shows the entry in a graphical view. The top bar 1477 represents the nucleotide sequence. Lower arrows or bars represent 1478 different features on the sequence. Double click on an arrow or bar to 1479 launch the appropriate editing window. Any sequence highlighted in the 1480 Sequence Editor will be boxed on the graphical view of the sequence. 1481 To see a graphical representation of a segmented set (see 1482 1483 <A HREF="#Submissiontype"> 1484 Submission type 1485 </A>, 1486 above), the Target Sequence must be set to 1487 SEG_dna. 1488 1489 #The Style pop-up menu allows you to see the display in different styles 1490 and colors. 1491 1492 #The Scale pop-up menu allows you to see the display in different sizes. 1493 The smaller the number, the larger the display. 

**Sequence

#This display format shows the nucleotide sequence in the record along with any annotated features (such as CDS or mRNA). You can only view a single sequence at a time with this option. You can use the Features pop-up menu to change the display of the features. With the numbering pop-up menu, select where you want the sequence numbers to be indicated: at the side of the window, at the top of each sequence line, or not at all.

**Alignment

#This display format shows sets of aligned sequences, such as those imported as part of a population, phylogenetic, mutation, or environmental samples set. When All Sequences is selected in the Target Sequence pop-up, the alignment of all entries will be displayed. To analyze similarities more closely, you can select a single entry in the Target Sequence pop-up. The complete sequence of the selected entry will be displayed; in the other sequences, nucleotides that differ from the selected entry are shown, while identical nucleotides are shown as periods. You can also display features annotated on the selected target sequence or on all sequences using the Feature display toggle. To launch the alignment editor, select
<A HREF="#AlignmentAssistant">
Alignment Assistant
</A>
from the record viewer Edit menu.

**EMBL

#This display format allows you to see the submission as it would appear as an EMBL entry.

**Table

#This display format shows the annotation in a five-column, tab-delimited
<A HREF="table.html">table</A>
format. This format can be imported to add annotation to a record that has none.

**FASTA

#This display shows the sequence and Definition line only, without any annotations, in a format called the FASTA format. This format is used by many molecular biology analysis programs. You cannot edit in this display mode.

**Quality

#This display format shows quality score data if it has been included in the submission.

**ASN.1

#This display shows the entry in Abstract Syntax Notation 1, a data description language used by the NCBI. You cannot edit in this display mode.

**XML

#This display format shows the entry in XML, a format used by various databases. You cannot edit in this display mode.

**INSDSeq

#This display format shows the entry in the XML format used by the INSD. You cannot edit in this display mode.

**Desktop

#The NCBI DeskTop displays the internal structure of the record being viewed in Sequin. The
<A HREF="#NCBIDeskTop">
DeskTop
</A>
is explained under the Misc menu.

*Done

#This button allows you to validate the entry when you are finished with the submission. See
<A HREF="#SubmittingtheFinishedRecordtotheDatabase">
Submitting the Finished Record to the Database
</A>
in the Sequin help documentation.

*Controls for Downloaded Entries

#If you have downloaded a sequence from Entrez, you will see an additional button labeled PubMed. This button will open a web browser showing the target sequence as it appears in Entrez. From there, you can access any Entrez-supported Links, including related sequences and associated references in PubMed.

>Descriptors

*Overview

#Descriptors are annotations that apply to an entire sequence, or an entire set of sequences, in a given entry. Because they apply to the entire sequence, they do not have a specific location on it. They can be contrasted with
<A HREF="#Features">
Features,
</A>
which apply to a specific interval of the sequence.

#You may edit descriptors in one of two ways.

#(1) In the record viewer, double click within the text of the descriptor to bring up a form on which information can be added.

#(2) Choose the option Descriptors from the Annotate menu.

*Annotate Menu - Descriptors

#This menu allows you either to create new descriptors or to modify existing ones. Select the descriptor that you wish to modify.

#When you first select a descriptor, you will see a window called "Descriptor Target Control". Using the target control pop-up menu, select the sequences you wish this descriptor to cover. The name(s) listed correspond to the SeqID(s) given to the nucleotide or amino acid sequences when they were imported into Sequin. The default selection for this menu is set in the Target Sequence pop-up menu on the record viewer. You may choose to have the descriptor cover just one sequence, or a set of sequences in your entry. If you are creating a new descriptor, select "Create New". If you wish to modify a previous descriptor, select "Edit Old".

#The following is a list of some of the descriptors that can be added. Two additional descriptors, those for
<A HREF="#Publications">
Publications
</A>
and
<A HREF="#BiologicalSourceDescriptororFeature">
Biological Source,
</A>
are described in other sections.

**TPA Assembly

#If you indicated that your sequence is a TPA submission, a
<A HREF="#AssemblyTracking">
TPA Assembly
</A>
was created from the information regarding primary accession numbers. This Assembly information can be edited here. Note that it is not necessary to enter nucleotide location in the "from" and "to" columns.

**Update Date

#This is for database staff use only. Please do not modify the date.

**Create Date

#This is for database staff use only. Please do not modify the date.

**Region

#This descriptor provides general information about the genetic context of the sequence. For example, if your nucleotide sequence is cloned from the region surrounding the Huntington's disease gene, you could enter that information here. Providing information for this descriptor is optional.

**Name

#This descriptor is an alternative place for a descriptive name for the sequence. This information will not appear in the flatfile view, but will be maintained in the ASN.1.

**Comment

#This descriptor is used to list any additional information that you wish to provide about the sequence. Use of this descriptor is optional. Most information is better annotated using the appropriate features and qualifiers rather than a generic comment descriptor.

**Title

#This descriptor contains the information that will go on the Definition line of the database entry. If you supplied a title for your nucleotide sequence when you imported it into Sequin, that information is here. If you wish to change the Definition line, or if you did not supply a title when you imported the sequence, edit this Descriptor.

**Molecule Description

#This descriptor indicates the characteristics of the molecule from which the sequence was derived. The information that you have already entered can be edited here. In most cases, the molecule and class are the only choices which should be changed from the default values.

***Molecule

#A GenBank sequence can represent one of several different molecule types. Select the type of molecule that was sequenced from the Molecule pop-up menu. The choices in this pop-up menu were described previously.

***Completedness

#Choose the appropriate option from the pop-up menu.

#-Complete: Use this designation when a complete molecule, such as a complete mitochondrial genome, is being submitted.

#-Partial: Use this designation when an incomplete unit, such as the partial coding sequence of a gene, is being submitted.

#-No left: Use this designation when an incomplete unit, such as the partial coding sequence of a gene, or a partial protein sequence, is being submitted. The sequence has no left if it is incomplete on the 5', or amino-terminal, end.

#-No right: Use this designation when an incomplete unit, such as the partial coding sequence of a gene, or a partial protein sequence, is being submitted. The sequence has no right if it is incomplete on the 3', or carboxy-terminal, end.

#-No ends: Use this designation when an incomplete unit, such as the partial coding sequence of a gene, or a partial protein sequence, is being submitted. The sequence has no ends if it is incomplete at both the 5' and 3', or amino- and carboxy-terminal, ends.

#-Other: Use this designation when none of the above descriptions apply.

***Technique

#From the pop-up menu, select the technique that was used to generate the sequence.

#-Standard: Standard sequencing technique.

#-EST:
<A HREF="http://www.ncbi.nlm.nih.gov/dbEST/index.html">
Expressed Sequence Tag
</A>
: single-pass, low-quality mRNA sequences derived from cDNAs. These sequences will appear in the EST division.

#-STS:
<A HREF="http://www.ncbi.nlm.nih.gov/dbSTS/index.html">
Sequence Tagged Site
</A>
: short sequences that are operationally unique in a genome and that define a specific position on the physical map. These sequences will appear in the STS division.

#-Survey:
<A HREF="http://www.ncbi.nlm.nih.gov/dbGSS/index.html">
single-pass genomic sequence
</A>
. These sequences will appear in the Genome Survey Sequence (GSS) division.

#-Genetic Map: Genetic map information, for example, in the Genomes division.

#-Physical Map: Physical map information, for example, in the Genomes division.

#-Derived: A sequence assembled into a contig from shorter sequences.

#-Concept-trans: A protein translation generated with the appropriate genetic code.

#-Seq-pept: Protein sequence was generated by direct sequencing of a peptide.

#-Both: Protein sequence was generated by conceptual translation and confirmed by peptide sequencing.

#-Seq-pept-Overlap: Protein sequence was generated by sequencing multiple peptides, and the order of peptides was determined by overlap in their sequences.

#-Seq-pept-Homol: Protein sequence was generated by sequencing multiple peptides, and the order of peptides was determined by homology with another protein.

#-Concept-Trans-A: Conceptual translation of the nucleotide sequence, supplied by the author of the entry.

#-HTGS 0:
<A HREF="http://www.ncbi.nlm.nih.gov/HTGS/">
High Throughput Genome Sequence
</A>
, Phase 0. These sequences are produced by high-throughput sequencing projects and will be in the HTG division.

#-HTGS 1:
<A HREF="http://www.ncbi.nlm.nih.gov/HTGS/">
High Throughput Genome Sequence
</A>
, Phase 1. These sequences are produced by high-throughput sequencing projects and will be in the HTG division.

#-HTGS 2:
<A HREF="http://www.ncbi.nlm.nih.gov/HTGS/">
High Throughput Genome Sequence
</A>
, Phase 2. These sequences are produced by high-throughput sequencing projects and will be in the HTG division.

#-HTGS 3:
<A HREF="http://www.ncbi.nlm.nih.gov/HTGS/">
High Throughput Genome Sequence
</A>
, Phase 3. These sequences are produced by high-throughput sequencing projects and will be in the HTG division.

#-FLI_cDNA: Full Length Insert cDNA. The sequence corresponds to the entire cDNA but not necessarily the entire transcript. These sequences are produced by large sequencing projects.

#-HTC: High Throughput cDNA. These sequences are produced by large sequencing projects.

#-WGS:
<A HREF="http://www.ncbi.nlm.nih.gov/Genbank/wgs.html">
Whole Genome Shotgun
</A>
. These sequences are produced by large sequencing projects and follow a separate submission process.

#-Barcode: Nucleotide sequence is part of the Barcode of Life project. This selection should only be used by members of the Consortium for the Barcode of Life.

#-Composite-WGS-HTGS: Nucleotide sequence has been assembled by large sequencing centers using a combination of whole genome shotgun and BAC-based sequencing.

#-TSA: Transcriptome Shotgun Assembly. Shotgun assemblies of mRNA sequences from primary data submitted to dbEST, the Short Read Archive (SRA), or the Trace Archive.

#-Other: Do not use this designation.

***Class

#From the pop-up menu, select the type of molecule that was sequenced.

#-DNA: DNA

#-RNA: RNA

#-Protein: Protein

#-Nucleotide: Do not select this item.

#-Other: Do not select this item.

***Topology

#From the pop-up menu, select the topology of the sequenced molecule.

#-Linear: Linear molecule (most sequences).

#-Circular: Circular molecule (such as a complete plasmid or mitochondrion).

#-Tandem: Do not select this item.

#-Other: Do not select this item.

***Strand

#From the pop-up menu, select whether the sequence was derived from an organism with a single- or double-stranded genome. This is used primarily for viral submissions.

#-Single: The organism contains only a single-stranded genome, for example, ssRNA viruses.

#-Double: The organism contains only a double-stranded genome, for example, dsDNA viruses.

#-Mixed: Do not select this item.

#-Mixed Rev: Do not select this item.

#-Other: Do not select this item.

**Biological Source

#The Biological Source descriptor is described in more detail
<A HREF="#BiologicalSourceDescriptororFeature">
below.
</A>

>Features

*Overview

#Features are annotations which apply to one or more intervals on a sequence. They can be contrasted with
<A HREF="#Descriptors">
Descriptors,
</A>
which apply to an entire sequence or an entire set of sequences. Features will be added to the Target Sequence selected in the record viewer pop-up menu.

#You may add or modify features in one of three ways.

#(1) In the record viewer, double click on the text of an existing feature to bring up a form on which information can be added or edited.

#(2) Choose the feature from the Annotate menu to add a new feature.

#(3) Choose the feature from the Sequence Editor Features menu to add a new feature.

#The features listed in the Annotate menu and the Sequence Editor Features menu are identical, and the instructions for adding them are the same, with one exception. If you add features from the Annotate menu, you must provide the nucleotide sequence location of the feature. However, if you add features from the Sequence Editor, you can highlight the sequence that the feature covers, and its location will be entered automatically in the feature location box.

*Annotate Menu - Features

#This menu allows you to add or modify features on the sequence selected in the Target Sequence pop-up menu of the record viewer. Features are grouped into six categories. Select the feature that you would like to mark on your sequence. A new form will appear.

#Feature forms share a common design. The first page is specific to the particular feature, e.g., Coding Region or Gene. The second page lists Properties of the Feature. The third page describes the Location of the feature. Details about the common second and third pages are provided below.

**Properties Page

***General Subpage

#Enter general comments about the feature here.

#Select any of the flags if necessary. If this sequence contains only a partial representation of the feature you are describing, check the "Partial" box. Check the "Exception" box if the feature annotates a post-transcriptional modification of the nucleotide sequence, such as ribosomal slippage or RNA editing. This is generally used only on CDS features. The evidence dialogs will only be editable if information has been entered in the Evidence subpage.

#If a gene feature overlaps the feature you are editing, the gene symbol will appear in the pull-down menu. If you want to add the name of a new gene, select new, and enter its name and optional description. By default, mapping between the feature and the gene is done by overlap; that is, the gene associated with the feature is the gene whose location overlaps the location of the feature. Under some circumstances, for example, if the sequences of two genes overlap, you may wish the feature to apply to a different gene. In this case, select cross-reference, and select the name of the other gene in the pop-up menu. If you do not want the feature to map to any existing gene, select suppress. You may also edit information on the Gene feature form by clicking on Edit Gene Feature.

***Comment Subpage

#Add any comments about the feature here, especially if you checked the "Exception" box on the General Subpage.

***Citations Subpage

#This page is used to list any citations that specifically apply to the feature you are annotating. The citation must already have been entered into the record (see
<A HREF="#Publications">
Publications
</A>
in the Sequin help documentation). Click on Edit Citations, and place a check mark in the box next to the publication you want to cite. However, we discourage the use of citations on features.

***Cross-Refs Subpage

#This is a read-only page used to cross-reference this entry to entries in external databases (databases other than GenBank, EMBL/EBI, and DDBJ), such as dbEST or FLYBASE. For more information on this topic, see the International Nucleotide Sequence Database Collaboration
<A HREF="http://www.ncbi.nlm.nih.gov/collab/db_xref.html">
page
</A>
(http://www.ncbi.nlm.nih.gov/collab/db_xref.html).

***Evidence Subpage

#This page is primarily used by large sequencing centers to explain annotation prediction methods; its use is optional. More details about these qualifiers can be found in the
<A HREF="http://www.ncbi.nlm.nih.gov/Genbank/genomesubmit_annotation.html#Evidence_Qualifiers">
genome submission guidelines
</A>.
The two choices of evidence are Experiment and Inference.

#Wet-bench experimental evidence can be entered as free text in the Experiment section. Please be as brief as possible.

#The Inference section allows information to be added in cases where the feature is annotated based solely on sequence similarity or prediction software. Before you can fill in any text, you must select one of the options from the Category pull-down menu. Different pull-down menus and text boxes will appear depending on the category you choose. If you select one of the 'similar to' categories, you must include the name of the database and the corresponding accession number of the sequence used as the basis for the annotation. If you choose one of the prediction categories, you must include the name and version of the prediction program used as the basis for the annotation.

#For example, if your annotation of a coding region was based on similarity to the sequence and annotation in GenBank Accession number AY411252, you would select "similar to DNA sequence" from the pull-down menu and then select "INSD" in the Database pull-down. You would then type "AY411252.1" in the Accession text box. If the annotation is based on the Genscan prediction algorithm, you would select "ab initio prediction" from the pull-down menu, select "Genscan" in the Program pull-down, and enter 2.0 in the Program Version text box. If the database or program used is not listed in the appropriate pull-down list, select Other from the list. A new text box will appear where you can enter the name of the database or program used. You must still include the appropriate accession number or version in the subsequent text box.

***Identifiers Subpage

#This is a read-only page used by the database staff for tracking features within the record.
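
#The Inference choices made on the Evidence subpage end up as one colon-separated value. As we understand the INSDC feature table convention, the /inference qualifier has the shape CATEGORY:DATABASE:ACCESSION for the 'similar to' categories and CATEGORY:PROGRAM:VERSION for the prediction categories. The Python sketch below only illustrates that shape for the two examples given above; the helper name inference_value is hypothetical and is not part of Sequin, which assembles the value itself from the pull-down and text box entries.

```python
# Sketch only: shows how the Evidence subpage choices map onto an
# INSDC-style /inference value. inference_value() is a hypothetical helper.


def inference_value(category: str, basis_name: str, basis_id: str) -> str:
    """Join the category, the database or program name, and the
    accession or version, separated by colons."""
    parts = [category.strip(), basis_name.strip(), basis_id.strip()]
    if not all(parts):
        # The help text requires both the database/program and the
        # accession/version once a category has been chosen.
        raise ValueError("category, database/program, and accession/version are all required")
    return ":".join(parts)


if __name__ == "__main__":
    # The two examples from the Evidence subpage text.
    print(inference_value("similar to DNA sequence", "INSD", "AY411252.1"))
    # prints: similar to DNA sequence:INSD:AY411252.1
    print(inference_value("ab initio prediction", "Genscan", "2.0"))
    # prints: ab initio prediction:Genscan:2.0
```

#Rejecting empty parts mirrors the rule above that the database or program and the accession or version are mandatory once a category is chosen.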

**Location Page

#This page allows you to select the location of the feature you are annotating. Each feature must have a sequence interval associated with it. In most cases, Sequin will limit the option to the nucleic acid or protein sequence as appropriate.

#Check the 5' Partial or 3' Partial box if the feature in your nucleic acid sequence is missing residues at the 5' or 3' ends, respectively. Check the NH2 Partial or COOH Partial box if the feature in your amino acid sequence is missing residues at the amino- or carboxy-terminal ends, respectively. If you checked "Partial" on the Properties page, you must also check at least one of the partial boxes on this page.

#Enter the sequence range of the feature. The numbers should correspond to a nucleotide sequence interval if the SeqID is set to a nucleotide sequence, and to an amino acid sequence interval if the SeqID is set to a protein sequence. If the feature spans multiple, non-contiguous intervals on the sequence, indicate the beginning and end points of each interval. If each interval is separate, and should not be joined with the others to describe the feature, check the Intersperse intervals with gaps box (for example, when annotating multiple primer binding sites). If the feature is composed of several intervals that should all be joined together, do not check the box (for example, when annotating mRNA on a genomic DNA sequence).

#For nucleic acid Features only: From the pop-up menu, select the strand on which the feature is found.

#-Plus: Plus strand, or coding strand.

#-Minus: Minus strand, or non-coding strand.

#-Both: Both strands.

#-Reverse: Do not select this item.

#-Other: Do not select this item.

#Use the pop-up menu to select the SeqID of the sequence you are describing by the location. Clicking on the X button to the left will clear the location spans, strand, and SeqID from that row.

#If you are working on a set of sequences which contain an alignment, you will see a toggle at the bottom of the Location Page where you can choose to add or view the location of the feature using either the Sequence Coordinates of the target sequence or the Alignment Coordinates. In either case, the feature will only be added to the target sequence. If you want to add features to all members of the set using the alignment coordinates, you must use the
<A HREF="http://www.ncbi.nlm.nih.gov/Sequin/sequin.hlp.html#Workingwithsetsofalignedsequences">
Alignment Assistant
</A>
.

#A brief description of the available features follows. A detailed explanation of how to use the coding region (CDS) feature is included. The DDBJ/EMBL/GenBank feature table definition
<A HREF="http://www.ncbi.nlm.nih.gov/collab/FT/index.html">
page
</A>
(http://www.ncbi.nlm.nih.gov/collab/FT/index.html) provides detailed information about other features.

*attenuator

#1) region of DNA at which regulation of termination of transcription occurs, which controls the expression of some bacterial operons; 2) sequence segment located between the promoter and the first structural gene that causes partial termination of transcription.

*C_region

#Constant region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains. Includes one or more exons, depending on the particular chain.

*CAAT_signal

#CAAT box; part of a conserved sequence located about 75 bp upstream of the start point of eukaryotic transcription units that may be involved in RNA polymerase binding; consensus=GG(C or T)CAATCT.
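
#As a rough illustration of what a consensus such as GG(C or T)CAATCT means, the Python sketch below writes it as the regular expression GG[CT]CAATCT and reports exact matches on the plus strand as 1-based, inclusive spans, the same numbering used on the Location Page. This is not a Sequin function; the name find_caat_boxes is hypothetical, and real CAAT boxes often deviate from the consensus, so a match (or its absence) is at most a hint about where a CAAT_signal feature might be annotated.

```python
import re

# GG(C or T)CAATCT from the CAAT_signal definition, as a regular expression.
CAAT_CONSENSUS = re.compile(r"GG[CT]CAATCT")


def find_caat_boxes(seq: str) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, stop) spans of exact consensus
    matches on the plus strand only."""
    return [(m.start() + 1, m.end()) for m in CAAT_CONSENSUS.finditer(seq.upper())]


if __name__ == "__main__":
    print(find_caat_boxes("aaaggccaatctaaa"))  # [(4, 12)]
```

#Converting a 0-based match start to a 1-based start while keeping the exclusive end unchanged gives the inclusive stop coordinate directly, which is the form the Location Page expects.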

*CDS

#Coding sequence; sequence of nucleotides that corresponds with the sequence of amino acids in a protein (location includes the stop codon). Feature includes amino acid conceptual translation.

**Coding Region Page

#Most users add a coding region to their sequence when they fill out the Organism and Sequences form. However, you may need to edit the coding region or add additional ones. To add a coding region, choose CDS under the Coding Regions and Transcripts submenu of the Features menu; to edit an existing CDS, double click on it in the record viewer. If you appended the partial sequence of a coding region to the Organism and Sequences form, you will probably need to edit the Coding Region feature to avoid validation error messages about the location of the coding region.

***General (Product) Subpage

#Choose the genetic code that should be used to translate the nucleotide sequence. For more information, and for the translation tables themselves, see the NCBI Taxonomy
<A HREF="http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c">
page
</A>.
If the genetic code is already populated from the taxonomy database, do not change this selection.

#Choose the reading frame in which to translate the sequence. Do not fill in the Protein Product or SeqID selections.

#Supply additional information about the protein by clicking on Edit Protein Information to launch the Protein feature forms. The protein name must already have been filled out on the Protein subpage.

#Checking retranslate on accept will translate the nucleotide sequence according to the interval(s) indicated on the Location page when you click on Accept to exit the editor. This new translation will replace any earlier translations you have supplied. If the interval was entered correctly, this should not cause a problem.
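
#To make the relationship between the genetic code, the reading frame, and the resulting translation concrete, the Python sketch below translates a plus-strand coding sequence using the Standard genetic code (translation table 1 on the NCBI Taxonomy genetic codes page). It is an illustration under simplifying assumptions, not Sequin's translation routine: it handles only table 1, treats reading frames 1 to 3 as offsets of 0 to 2 bases from the start of the coding region, writes stop codons as "*" and ambiguous codons as "X", and ignores alternative start codons and translation exceptions. The function name translate is our own.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), listed in TCAG order:
# the first codon position varies slowest and the third varies fastest.
STANDARD_AAS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
                "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product("TCAG", repeat=3), STANDARD_AAS)}


def translate(cds: str, frame: int = 1) -> str:
    """Translate a plus-strand coding sequence read in frame 1, 2, or 3."""
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2, or 3")
    cds = cds.upper().replace("U", "T")
    # Step through complete codons only; a trailing partial codon is ignored.
    codons = (cds[i:i + 3] for i in range(frame - 1, len(cds) - 2, 3))
    # Codons containing N or other ambiguity codes are translated as X.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))            # MA*
    print(translate("CATGGCCTAA", frame=2))  # MA*
```

#Shifting the frame by one base, as in the second call, recovers the same protein from a sequence with one extra leading base; choosing the wrong frame is exactly the kind of mistake the validator's open reading frame error is meant to catch.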

#If the coding sequence that you supply is a partial sequence and you have checked a Partial box on the Location Page, it is a good idea to check the Synchronize Partials box. In this case, Sequin will ensure that all other appropriate features (such as the protein) are also marked as partial.

#When editing existing CDS features, choose the sequence you want to view by selecting its name under the Product pop-up menu. You may also import a new protein sequence by selecting Import Protein FASTA under the File menu. The sequence should be formatted as described above for the Organism and Sequences form.

#After you have imported a protein sequence, click on Predict Interval. This function will predict the interval on the nucleotide sequence to which the coding region applies. If you do not use this function, the interval will likely be wrong, and you will get an error message when you attempt to validate the record. If your sequence is a 5' or 3' partial, you must first indicate this manually on the Location Page.

#You may also have Sequin generate the protein sequence from the nucleotide sequence by clicking on Translate Product. However, you must first indicate the location and partialness of the coding region on the Location Page in order to obtain the correct translation.

#The Edit Protein Sequence button will launch an amino acid
<A HREF="#SequenceEditor">
Sequence Editor
</A>
as discussed below.

#The Adjust for Stop Codon button will truncate a displayed translation at the first stop codon. If no stop codon is present in the current translation, this function will extend the translation to the first stop codon or to the end of the sequence. In both cases, the spans of the coding region will be automatically updated on the Location Page to reflect the new translation.
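
#The behavior of Adjust for Stop Codon described above can be pictured as a single scan: starting at the first base of the coding region, read codons in frame until the first stop codon. If the current translation contains an internal stop, the scan ends early (truncation); if it does not, the scan runs past the current end (extension). The Python sketch below shows that idea under stated assumptions: plus strand only, the Standard code's stop codons TAA, TAG, and TGA, and 1-based inclusive coordinates. The function name adjust_for_stop and its found flag are hypothetical and do not describe Sequin's implementation.

```python
# Stop codons of the Standard genetic code (translation table 1) only.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def adjust_for_stop(seq: str, start: int) -> tuple[int, bool]:
    """Return (stop, found) for a plus-strand CDS that begins at the 1-based
    position `start`.

    `stop` is the 1-based, inclusive end of the first in-frame stop codon,
    so the location includes the stop codon. If no in-frame stop exists,
    `stop` is the end of the sequence and `found` is False, which suggests
    the coding region is 3' partial.
    """
    seq = seq.upper().replace("U", "T")
    if not 1 <= start <= len(seq):
        raise ValueError("start must lie within the sequence")
    for i in range(start - 1, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i + 3, True
    return len(seq), False


if __name__ == "__main__":
    print(adjust_for_stop("ATGAAATAGCCCTAA", 1))  # (9, True): truncated at TAG
    print(adjust_for_stop("ATGAAACCC", 1))        # (9, False): no stop found
```

#Because the scan always begins at the start of the coding region rather than at its current end, the same loop covers both the truncation and the extension cases.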
***Protein Subpage

#Use this page to enter or edit a name or description of the protein
product. For a new sequence, enter information directly into the
boxes. You can edit descriptions of an existing sequence by clicking
on Edit Protein Feature, which will bring up the Protein feature form.
The Launch Product Viewer button displays the flatfile view of the
protein record generated from the information in the CDS feature.

***Exceptions Subpage

#Exceptions describe places where there is a post-translational
modification. Enter the amino acid position at which the modification
occurs, and select the amino acid that is actually represented in the
protein from the pop-up list. Sequin will convert the amino acid
position into the corresponding nucleotide interval. Please explain
the exception in a comment.

*conflict

#Independent determinations of the "same" sequence differ at this site
or region.

*D-loop

#Displacement loop; a region within mitochondrial DNA in which a short
stretch of RNA is paired with one strand of DNA, displacing the
original partner DNA strand in this region; also used to describe the
displacement of a region of one strand of duplex DNA by a
single-stranded invader in the reaction catalyzed by RecA protein.

*D_segment

#Diversity segment of immunoglobulin heavy chain, and T-cell receptor
beta chain.

*enhancer

#A cis-acting sequence that increases the utilization of (some)
eukaryotic promoters and can function in either orientation and in any
location (upstream or downstream) relative to the promoter.

*exon

#Region of genome that codes for a portion of spliced mRNA; may contain
5' UTR, all CDSs, and 3' UTR.

*gap

#Gap in the sequence; applied only to gaps of unknown length.
The location span of the gap feature is 100 base pairs, indicated by
100 "n"s in the sequence. The qualifier /estimated_length=unknown is
mandatory.

*GC_signal

#GC box; a conserved GC-rich region located upstream of the start point
of eukaryotic transcription units that may occur in multiple copies or
in either orientation; consensus=GGGCGG.

*gene

#Region of biological interest identified as a gene and for which a name
has been assigned.

*iDNA

#Intervening DNA; DNA which is eliminated through any of several kinds
of recombination.

*intron

#A segment of DNA that is transcribed, but removed from within the
transcript, by splicing together the sequences (exons) on either side of
it.

*J_segment

#Joining segment of immunoglobulin light and heavy chains, and T-cell
receptor alpha, beta, and gamma chains.

*LTR

#Long terminal repeat, a sequence directly repeated at both ends of a
defined sequence, of the sort typically found in retroviruses.

*mat_peptide

#Mature peptide or protein coding sequence; coding sequence for the
mature or final peptide or protein product following post-translational
modification. The location does not include the stop codon (unlike the
corresponding CDS).

*misc_binding

#Site in nucleic acid that covalently or non-covalently binds another
moiety that cannot be described by any other Binding key (primer_bind or
protein_bind).

*misc_difference

#Feature sequence is different from that presented in the entry and
cannot be described by any other Difference key (conflict, unsure,
mutation, variation, allele, or modified_base).

*misc_feature

#Region of biological interest which cannot be described by any other
feature key.
*misc_recomb

#Site of any generalized, site-specific, or replicative recombination
event where there is a breakage and reunion of duplex DNA that cannot be
described by other recombination keys (iDNA and virion) or qualifiers of
the source key (/proviral).

*misc_RNA

#Any transcript or RNA product that cannot be defined by other RNA keys
(prim_transcript, precursor_RNA, mRNA, 5'UTR, 3'UTR,
exon, transit_peptide, polyA_site, rRNA, tRNA, and ncRNA).

*misc_signal

#Any region containing a signal controlling or altering gene function or
expression that cannot be described by other Signal keys (promoter,
CAAT_signal, TATA_signal, -35_signal, -10_signal, GC_signal, RBS,
polyA_signal, enhancer, attenuator, terminator, and rep_origin).

*misc_structure

#Any secondary or tertiary structure or conformation that cannot be
described by other Structure keys (stem_loop and D-loop).

*modified_base

#The indicated nucleotide is a modified nucleotide and should be
substituted for by the indicated molecule (given in the mod_base
qualifier value).

*mRNA

#Messenger RNA; includes 5' untranslated region (5' UTR), coding sequences
(CDS, exon), and 3' untranslated region (3' UTR).

*ncRNA

#Non-coding RNA; a non-protein-coding transcript other than ribosomal RNA
and transfer RNA, including antisense RNA, guide RNA, scRNA, siRNA, miRNA,
piRNA, snoRNA, and snRNA. The specific type of ncRNA must be specified in
the /ncRNA_class qualifier.

*N_region

#Extra nucleotides inserted between rearranged immunoglobulin segments.

*operon

#Region containing a polycistronic transcript under the control of the
same regulatory sequences.
*oriT

#Origin of transfer; region of DNA where transfer is initiated during the
process of conjugation or mobilization.

*polyA_signal

#Recognition region necessary for endonuclease cleavage of an RNA
transcript that is followed by polyadenylation; consensus=AATAAA.

*polyA_site

#Site on an RNA transcript to which adenine residues will be added by
post-transcriptional polyadenylation.

*precursor_RNA

#Any RNA species that is not yet the mature RNA product; may include 5'
clipped region (5' clip), 5' untranslated region (5' UTR), coding
sequences (CDS, exon), intervening sequences (intron), 3' untranslated
region (3' UTR), and 3' clipped region (3' clip).

*prim_transcript

#Primary (initial, unprocessed) transcript; includes 5' clipped region
(5' clip), 5' untranslated region (5' UTR), coding sequences (CDS, exon),
intervening sequences (intron), 3' untranslated region (3' UTR), and 3'
clipped region (3' clip).

*primer_bind

#Non-covalent primer binding site for initiation of replication,
transcription, or reverse transcription. Includes site(s) for synthetic
primer elements, e.g., PCR primers.

*promoter

#Region on a DNA molecule involved in RNA polymerase binding to initiate
transcription.

*protein_bind

#Non-covalent protein binding site on nucleic acid.

*RBS

#Ribosome binding site.

*repeat_region

#Region of genome containing repeating units. Some qualifiers, such as
rpt_type, mobile_element, and satellite, have controlled vocabularies.
These qualifiers have check boxes or pull-down menus to ensure that the
correct format is used.

*rep_origin

#Origin of replication; starting site for duplication of nucleic acid to
give two identical copies.
*rRNA

#Mature ribosomal RNA; the RNA component of the ribonucleoprotein
particle (ribosome) that assembles amino acids into proteins.

*S_region

#Switch region of immunoglobulin heavy chains. Involved in the
rearrangement of heavy chain DNA leading to the expression of a
different immunoglobulin class from the same B-cell.

*sig_peptide

#Signal peptide coding sequence; coding sequence for an N-terminal
domain of a secreted protein; this domain is involved in attaching the
nascent polypeptide to the membrane; leader sequence.

*source

#Identifies the biological source of the specified span of the sequence.
This key is mandatory. Every entry will have, as a minimum, a single
source key spanning the entire sequence. More than one source key per
sequence is permitted.

*stem_loop

#Hairpin; a double-helical region formed by base-pairing between
adjacent (inverted) complementary sequences in a single strand of RNA or
DNA.

*STS

#Sequence Tagged Site. Short, single-copy DNA sequence that
characterizes a mapping landmark on the genome and can be detected by
PCR. A region of the genome can be mapped by determining the order of a
series of STSs.

*TATA_signal

#TATA box; Goldberg-Hogness box; a conserved AT-rich heptamer found
about 25 bp before the start point of each eukaryotic RNA polymerase II
transcript unit that may be involved in positioning the enzyme for
correct initiation; consensus=TATA(A or T)A(A or T).

*terminator

#Sequence of DNA located either at the end of the transcript or adjacent
to a promoter region that causes RNA polymerase to terminate
transcription; may also be site of binding of repressor protein.
*tmRNA

#Transfer messenger RNA; acts as a tRNA first, then an mRNA that encodes a
peptide tag.

*transit_peptide

#Transit peptide coding sequence; coding sequence for an N-terminal
domain of a nuclear-encoded organellar protein; this domain is involved
in post-translational import of the protein into the organelle.

*tRNA

#Mature transfer RNA, a small RNA molecule (75-85 bases long) that
mediates the translation of a nucleic acid sequence into an amino acid
sequence.

*unsure

#Author is unsure of exact sequence in this region.

*V_region

#Variable region of immunoglobulin light and heavy chains, and T-cell
receptor alpha, beta, and gamma chains. Codes for the variable amino
terminal portion. Can be made up from V_segments, D_segments,
N_regions, and J_segments.

*V_segment

#Variable segment of immunoglobulin light and heavy chains, and T-cell
receptor alpha, beta, and gamma chains. Codes for most of the variable
region (V_region) and the last few amino acids of the leader peptide.

*variation

#A related strain contains stable mutations from the same gene (e.g.,
RFLPs, polymorphisms, etc.) that differ from the presented sequence at
this location (and possibly others).

*3'UTR

#Region near or at the 3' end of a mature transcript (usually following
the stop codon) that is not translated into a protein; trailer.

*5'UTR

#Region near or at the 5' end of a mature transcript (usually preceding
the initiation codon) that is not translated into a protein; leader.

* -10_signal

#Pribnow box; a conserved region about 10 bp upstream of the start point
of bacterial transcription units that may be involved in binding RNA
polymerase; consensus=TAtAaT.
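#The consensus sequences listed for GC_signal, polyA_signal, TATA_signal,
and -10_signal can be written as regular expressions to flag candidate
sites before annotating them. This Python sketch is a screening aid under
stated assumptions, not a feature predictor: it searches only the strand
given, converts "(A or T)" to the character class [AT], and treats the
lowercase (less conserved) positions of TAtAaT as an exact match to
TATAAT, so real signals that deviate from the consensus will be missed.
The function name find_consensus_sites is hypothetical.

```python
import re

# Consensus patterns from the feature definitions, as regular expressions.
# "(A or T)" becomes [AT]; TAtAaT is simplified to an exact TATAAT.
CONSENSUS = {
    "GC_signal": "GGGCGG",
    "polyA_signal": "AATAAA",
    "TATA_signal": "TATA[AT]A[AT]",
    "-10_signal": "TATAAT",
}


def find_consensus_sites(seq: str) -> list[tuple[str, int, int]]:
    """Return (feature_key, start, end) hits in 1-based, inclusive coordinates.

    The pattern is wrapped in a lookahead so that overlapping matches are
    all reported. Only the given strand is searched. Hypothetical helper.
    """
    seq = seq.upper()
    hits = []
    for key, pattern in CONSENSUS.items():
        for match in re.finditer(f"(?=({pattern}))", seq):
            start = match.start() + 1
            hits.append((key, start, start + len(match.group(1)) - 1))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


if __name__ == "__main__":
    print(find_consensus_sites("CCTATAAAAGG"))   # [('TATA_signal', 3, 9)]
    print(find_consensus_sites("gggcggaataaa"))  # GC_signal 1..6, polyA_signal 7..12
```

#Hits are reported in the 1-based, inclusive coordinates used for feature
locations, so a hit such as TATA_signal 3..9 can be entered directly as a
candidate interval and then checked against experimental evidence.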
* -35_signal

#A conserved hexamer about 35 bp upstream of the start point of
bacterial transcription units; consensus=TTGACa or TGTTGACA.

>Biological Source Descriptor or Feature

#This annotation is very important, as an entry cannot be processed by
the databases unless it includes some basic information about the
organism from which the sequence was derived. This basic information was
entered previously in the submission, in the Organism and Sequences
form. The more detailed Organism Information form allows you to alter
or add to the data you entered earlier.

*Overview: Descriptor or Feature?

#Sequin allows two types of biological source information to be entered:
Biological Source Descriptors and Biological Source Features. Biological
Source Descriptors, like other descriptors, provide organism information
about an entire sequence, or an entire set of sequences, in an entry.
Biological Source Features, like other features, provide organism
information about a specific interval on a given sequence.

#In most cases, you will want to use a Biological Source Descriptor,
because all the sequences in the entry will derive from the same source.
However, if you have sequenced a transgenic molecule (for example, one
that is part plant and part bacterial), you would use Biological Source
Features to annotate which part of the sequence was derived from the
plant and which from the bacterium.

#To add a Biological Source Descriptor, select Biological Source under
the Descriptor section of the Annotate menu. To add a Biological
Source Feature, select Biological Source under the Bibliographic and
Comments section of the Annotate menu.

#Annotating a Biological Source Descriptor or Feature is similar to
annotating any descriptor or feature.
For help in creating descriptors and features, see the appropriate
section of the help documentation. The following are instructions for
filling out the forms specific to Biological Source.

*Organism Page

**Names Subpage

#The scrollable list contains the scientific names of many organisms.
To reach a name on the list, either type the first few letters of the
scientific name, or use the thumb bar. Click on a name from the list to
fill out the scientific name field. If there is a common name for the
organism, that field will be filled out automatically. You may also
type in the scientific name directly. If you have any questions about
the scientific or common name of an organism, see the NCBI
<A HREF="http://www.ncbi.nlm.nih.gov/Taxonomy/tax.html">
taxonomy browser
</A>
http://www.ncbi.nlm.nih.gov/Taxonomy/tax.html

**Location Subpage

***Location of Sequence

#From the selection list, please choose the location of the genome that
contains your sequence. Most entries will have a "Genomic" location.
The choices in this pop-up menu were briefly described previously.

***Origin of Sequence

#This menu is for the use of database personnel. Please leave this
field empty. The Biological Focus box should be checked only in the rare
cases where multiple source features are annotated.

**Genetic Codes Subpage

#Please use these fields to select the nuclear and mitochondrial genetic
codes that should be used to translate the nucleic acid sequence. The
nuclear genetic code for most eukaryotic organisms is "Standard". If you
selected an organism name from the scrollable list described above,
these fields were filled out automatically. Do not change these fields
if they have been filled out automatically.
#For more information regarding the available translation tables, see
the NCBI Taxonomy
<A HREF="http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c">
page
</A>.

**Lineage Subpage

#This information is normally entered by the database staff. They will
use the
<A HREF="http://www.ncbi.nlm.nih.gov/Taxonomy/tax.html">
Taxonomy database
http://www.ncbi.nlm.nih.gov/Taxonomy/tax.html
</A>
maintained by NCBI/GenBank.

#If you disagree with the lineage supplied, please notify the database
staff.

#If you are running Sequin in its
<A HREF="#NetConfigure">
network-aware
</A>
mode, you will see a button labeled "Lookup Taxonomy". Click on this
button to perform an automatic look-up of the taxonomic lineage of the
organism. Sequin will perform the look-up by accessing the Taxonomy
database and will fill out the Taxonomic Lineage and Division fields.

#If you have any comments about the taxonomic lineage determined by
Sequin, please submit these comments with your entry. Under the Sequin
File menu, select Edit Submitter Info. Enter your comments in the box
entitled "Special Instructions to Database Staff" on the Submission
page.

*Modifiers Page

#This page allows you to enter additional information about the source
and/or organism. Entering information is optional.

**Source Subpage

#Choose a modifier from the pull-down menu on the left side of the page
and type the appropriate value on the right side of the page. If you do
not find appropriate modifiers in the pull-down list, you can enter
additional source information as text in the field at the bottom of the
page. You may add multiple modifiers to describe the source organism.
#Clicking on the X button to the right of a text box will remove the
text and clear the modifier from the pull-down menu on that line.

#The following is a description of the available modifiers:

#-Cell-line: Cell line from which sequence derives.

#-Cell-type: Type of cell from which sequence derives.

#-Chromosome: Chromosome to which the gene maps.

#-Clone: Name of clone from which sequence was obtained.

#-Clone-lib: Name of library from which sequence was obtained.

#-Collected-by: Name of the person who collected the sample. Do not use
accented or non-ASCII characters.

#-Collection-date: Date the sample was collected. Must use the format
23-Mar-2005, Mar-2005, or 2005.

#-Country: The country of origin of DNA samples used for epidemiological
or population studies. A list of approved country designations can
be found on the
<A HREF="http://www.ncbi.nlm.nih.gov/projects/collab/country.html">
ISDC web pages.</A> Additional text may be added after a colon. For example,
/country="USA: Bethesda, MD"

#-Dev-stage: Developmental stage of organism.

#-Endogenous-virus-name: Name of inactive virus that is integrated into
the chromosome of its host cell and can therefore exhibit vertical
transmission.

#-Environmental-sample: Identifies sequence derived by direct molecular
isolation from an unidentified organism. You cannot include extra text when
using this modifier; the text box will change to TRUE upon selection of this
modifier from the pull-down list.

#-Frequency: Frequency of occurrence of a feature.

#-Fwd-PCR-primer-name: Name or designation of forward primer used for
amplification.

#-Fwd-PCR-primer-seq: Sequence of forward primer used for amplification.

#-Genotype: Genotype of the organism.
#-Germline: If the sequence shown is DNA and a member of the
immunoglobulin family, this qualifier is used to denote that the sequence
is from unrearranged DNA. You cannot include extra text when using this
modifier; the text box will change to TRUE upon selection of this modifier
from the pull-down list.

#-Haplogroup: Combination of stable polymorphic variants clustered together
in a specific combination which can indicate a common ancestor.

#-Haplotype: Haplotype of the organism.

#-Identified-by: Name of the person who identified the sample. Do not use
accented or non-ASCII characters.

#-Isolation-source: Describes the local geographical source of the organism
from which the sequence was derived.

#-Lab-host: Laboratory host used to propagate the organism from which
the sequence was derived.

#-Lat-Lon: Latitude and longitude of location where sample was
collected. Mandatory format is decimal degrees N/S E/W. Selecting this
modifier in the pull-down list will generate separate boxes for entering the
information in the mandatory format.

#-Linkage-group: Group of genes whose loci are physically connected and tend
to segregate together during meiosis.

#-Map: Map location of the gene.

#-Mating-type: Designation of individual single-celled organisms and protists
based on mating behavior.

#-Metagenomic: Identifies sequence from a culture-independent genomic
analysis of an environmental sample submitted as part of a whole genome
shotgun project. You may not include extra text when using this modifier;
instead, the text box will change to TRUE upon selection.

#-Plasmid-name: Name of plasmid from which the sequence was obtained.

#-Pop-variant: Name of the population variant from which the sequence was
obtained.
#-Rearranged: If the sequence shown is DNA and a member of the
immunoglobulin family, this qualifier is used to denote that the sequence
is from rearranged DNA. You cannot include extra text when using this
modifier; the text box will change to TRUE upon selection of this modifier
from the pull-down list.

#-Rev-PCR-primer-name: Name or description of reverse primer used for
amplification.

#-Rev-PCR-primer-seq: Sequence of reverse primer used for amplification.

#-Segment: Name of the segment of a viral genome that is fragmented into
two or more nucleic acid molecules.

#-Sex: Sex of the organism from which the sequence derives.

#-Subclone: Name of subclone from which sequence was obtained.

#-Tissue-lib: Tissue library from which the sequence was obtained.

#-Tissue-type: Type of tissue from which sequence derives.

#-Transgenic: Identifies organism that was the recipient of transgenic
DNA. You cannot include extra text when using this modifier; the text box
will change to TRUE upon selection of this modifier from the pull-down list.

**Organism Subpage

#Choose a modifier from the pull-down menu on the left side of the page
and type the appropriate value on the right side of the page. If you do
not find appropriate modifiers in the pull-down list, you can enter
additional organism information as text in the field at the bottom of
the page. You may add multiple modifiers to describe the source organism.

#Clicking on the X button to the right of a text box will remove the text
and clear the modifier from the pull-down menu on that line.

#The following is a description of the available modifiers:

#-Acronym: Standard synonym (usually of a virus) based on the initials
of the formal name. An example is HIV-1.

#-Anamorph: The scientific name applied to the asexual phase of a fungus.
#-Authority: The author or authors of the organism name from which sequence
was obtained.

#-Bio-material: An identifier of the stored biological material from which
the sequence was obtained. This qualifier should be used to cite collections
that are not appropriate for the specimen-voucher or culture-collection
modifiers. Examples include stock centers and seed banks. Mandatory format
is "institution code:collection code:material_id". However, only
material_id is required. Selecting this modifier in the pull-down list will
generate separate boxes for entering the information in the correct format.

#-Biotype: See biovar.

#-Biovar: Variety of a species (usually a fungus, bacterium, or virus)
characterized by some specific biological property (often geographical,
ecological, or physiological). Same as biotype.

#-Breed: The named breed from which sequence was obtained (usually applied
to domesticated mammals).

#-Chemovar: Variety of a species (usually a fungus, bacterium, or virus)
characterized by its biochemical properties.

#-Common: Common name of the organism from which sequence was obtained.

#-Cultivar: Cultivated variety of plant from which sequence was obtained.

#-Culture-collection: Identifier and institution code of the microbial or
viral culture or stored cell-line from which the sequence was obtained. This
qualifier should be used to cite the collection where the author has deposited
the culture or from which the culture was obtained. Personal library
collections should be annotated with the strain modifier, not with
culture-collection. Mandatory format is "institution code:collection
code:culture_id". However, collection code is not required. Selecting this
modifier in the pull-down list will generate separate boxes for entering the
information in the correct format.
#-Ecotype: The named ecotype (population adapted to a local habitat) from
which sequence was obtained (customarily applied to populations of
Arabidopsis thaliana).

#-Forma: The forma (lowest taxonomic unit governed by the nomenclatural
codes) of organism from which sequence was obtained. This term is usually
applied to plants and fungi.

#-Forma-specialis: The physiologically distinct form from which sequence
was obtained (usually restricted to certain parasitic fungi).

#-Group: Do not select this item.

#-Host: Natural (as opposed to laboratory) host to the organism from which
the sequenced molecule was obtained. Use of the Latin name of the host
organism is preferred.

#-Isolate: Identification or description of the specific individual
from which this sequence was obtained. An example is Patient X14.

#-Metagenome-source: Used only for genome projects. Do not select this item.

#-Pathovar: Variety of a species (usually a fungus, bacterium, or virus)
characterized by the biological target of the pathogen. Examples
include Pseudomonas syringae pathovar tomato and Pseudomonas syringae
pathovar tabaci.

#-Serogroup: See serotype.

#-Serotype: Variety of a species (usually a fungus, bacterium, or virus)
characterized by its antigenic properties. Same as serogroup and
serovar.

#-Serovar: See serotype.

#-Specimen-voucher: Identifier of the physical specimen from which the
sequence was obtained. The qualifier is intended for use where the sample is
still available in a curated museum, herbarium, frozen tissue collection, or
personal collection. Mandatory format is "institution code:collection
code:specimen_id". However, only specimen_id is required. Selecting this
modifier in the pull-down list will generate separate boxes for entering the
information in the correct format.
#-Strain: Strain of organism from which sequence was obtained.

#-Subgroup: Do not select this item.

#-Sub-species: Subspecies of organism from which sequence was obtained.

#-Substrain: Sub-strain of organism from which sequence was obtained.

#-Subtype: Subtype of organism from which sequence was obtained.

#-Synonym: The synonym (alternate scientific name) of the organism name
from which sequence was obtained.

#-Teleomorph: The scientific name applied to the sexual phase of a fungus.

#-Type: Type of organism from which sequence was obtained.

#-Variety: Variety of organism from which sequence was obtained.

**GenBank Subpage

#Please do not use this form. This field is reserved for information from
NCBI's taxonomy database.

*Miscellaneous Page

**Synonyms Subpage

#If there are alternative names for the organism from which the sequence
was derived, enter them here. Please be aware that this field is
appropriate only for alternative names for the organism, not for
alternative gene or protein names.

**Cross-Refs Subpage

#This page is for use by database staff only.

>Publications

*Overview: Descriptor or Feature?

#Sequin allows two types of publications to be entered: Publication
Descriptors and Publication Features. Publication Descriptors are
bibliographic references that, like other descriptors, cover an entire
sequence, or an entire set of sequences, in an entry. Publication
Features are bibliographic references that, like other features, cover
a specific interval on a given sequence.

#Publications are entered into the Reference field of the database
entry. References are citations of unpublished, in press, or published
works that are relevant to the submitted sequence.
Publications should provide information regarding the principal cloning
and determination of the sequence within the record.

#In general, there is one publication describing a sequence, and a
Publication Descriptor should be used. To enter a Publication
Descriptor, select Publications under the Annotate menu and click on
Publication Descriptor.

#However, if one publication describes the cloning of the 5' end of a
gene, and another publication describes the cloning of the 3' end of
the gene, Publication Features may be used. To make a Publication
Feature, choose Publication Feature in the Publications section of the
Annotate menu. Enter the information about the publication, and then
enter the nucleotide interval to which the publication refers on the
Location page.

*Citation on Entry Form

**Status

#Using the radio buttons, select one of the three options:

#-Unpublished: Select this option if a manuscript has been written but
not yet submitted, or has been submitted for publication but not yet
accepted.

#-In Press: The article has been accepted for publication but is not yet
in print.

#-Published: The article has been published.

**Class

#Using the radio buttons, select the type of publication in which the
sequence will appear.

#-Journal

#-Book Chapter

#-Book

#-Thesis/Monograph

#-Proceedings Chapter: Abstract from a meeting

#-Proceedings: A meeting

#-Patent

#-Online Publication: Used for journals which publish strictly online and
do not issue print copies.

#-Submission

**Scope

#Using the radio buttons, select one of the options.

#-Refers to the entire sequence: Most publications should be classified
as such.
#-Refers to part of the sequence: For use only when a publication
discusses only part of the presented sequence. You must enter the
locations on the Location tab in later forms. This selection is valid
only when adding a Publication Feature, not a Publication Descriptor.

#-Cites a feature on the sequence: This selection should only be made in
limited cases. Its use must coincide with the use of the /citation
qualifier on the given feature.

#After you have filled out the Citation on Entry form, click on
"Proceed" to see the next form.

*Citation Information Form (General)

**Authors Page

***Names Subpage

#Please enter the names of the authors. Note that the first name of the
author is listed first. You can add as many authors to this page as
necessary. After you type in the name of the third author, the box
becomes a spreadsheet, and you can scroll down to the next line by
using the thumb bar. The suffix toggle allows the addition of common
suffixes to the author name. The consortium field should be used when
a consortium is responsible for the sequencing or publication of the
data. The consortium should not be the department or institute
affiliation of the authors. Individual authors may be listed along
with a consortium name.

***Affiliation Subpage

#Please enter information about the institution where the sequencing was
performed.

#Other pages in the Citation Information Form will be different,
depending on the Class of publication selected in the Citation on Entry
Form. Instructions for filling out the Citation Information Form for
Journals are included here.

*Citation Information Form (If Selected Class Was Journal)

**Title Page

#Enter the title of the manuscript in the box.
**Journal Page

#Fill in the appropriate Journal, Volume, Issue, Pages, Day, and Year
fields by typing information into the boxes. Select the month with the
pop-up menu. If necessary, choose an option from the Erratum pop-up
menu and explain the erratum.

#If you are running Sequin in its
<A HREF="#NetConfigure">
network-aware
</A>
mode, the program will look up the Title, Author, and Journal
information in the MEDLINE database if you supply it with some minimal
information. For example, if you know the MUID (MEDLINE Unique
Identifier) of the publication, enter it in the appropriate box and
select "Lookup By MUID." Sequin will automatically retrieve the rest
of the information. One way to find the MUID of the publication is to
look up the publication with NCBI's
<A HREF="http://www.ncbi.nlm.nih.gov/Entrez">
Entrez
</A>
service. Alternatively, if you do not know the MUID, enter the Journal,
Volume, Pages, and Year, and then select "Lookup Article." Sequin will
retrieve the missing Title and Author information.

#The PubStatus toggle is used by database staff. If you have used the
"Lookup by MUID" or "Lookup by PMID" functions, this field may already
be populated. Please do not edit this information.

**Remark Page

#This page is reserved for use by the database staff.

>File Menu

*About Sequin

#Details about the current version of Sequin.

*Help

#Launches the help documentation.

*Open

#Open an existing entry. This option will open a record that has been
previously saved in Sequin. For analysis purposes, it can also open
a FASTA-formatted sequence file.
The sequence will be displayed in Sequin and
can be analyzed with tools such as CDD Search. However, it should not be
submitted, because it does not have the appropriate annotations.

*Close

#Close this entry.

*Export GenBank

#Exports the currently displayed format to a file. Do not use an
exported ASN.1 file to submit sequences to the database.

*Duplicate View

#Duplicates the entry. You can then view the entry simultaneously in
different Display Formats.

*Save

#Saves the entry. Note: This merely saves the entry so you can go back
and edit it. It does not prepare the entry for submission to the
database; that is, it does not validate the entry.

*Save As

#See Save.

*Save as Binary Seq-entry

#Saves the file in a compressed format. Use this option only when the
file is to be imported into other analysis programs. Do not use this
option to save files for submission directly to GenBank.

*Restore

#Replaces the displayed record with a previously saved version. This
feature is useful if you have made unwanted changes since you last saved
the record.

*Prepare Submission

#Prepares the entry for submission to the database. See
<A HREF="#SubmittingtheFinishedRecordtotheDatabase">
Submitting the Finished Record to the Database
</A>
in the Sequin help documentation.

*Print

#Prints the window that is currently selected. The selected window can
be one of the Sequin forms or pages, or the help documentation.

*Quit

#Exit from Sequin.

>Edit Menu

*Copy

#Copy the selected item.

*Clear

#Clear the selected item.

*Edit Sequence

#To edit a single sequence, select the sequence identifier in the Target
Sequence pop-up menu, and click on Edit Sequence.
The sequence editor
will be launched for that sequence. The
<A HREF="#SequenceEditor">
sequence editor
</A>
is discussed in more detail below.

*Alignment Assistant

#This option will launch the Alignment Assistant, which is discussed in
more detail
<A HREF="#Workingwithsetsofalignedsequences">
below
</A>.

*Edit Submitter Info

#Opens the Submission Instructions form, which allows you to enter
additional information about the person submitting the record. Much of
this information was entered on the first form in Sequin, the Submitting
Authors form.

#You can also save the information from the Submitting Authors form
here, so that you can use it in subsequent Sequin submissions. Click
on "Edit Submitter Info" and, under the File menu in the resulting
Submission Instructions form, click on Export Submitter Info to save
the information to a file. For subsequent Sequin submissions, if you
have already saved the submitter information, click on Import Submitter
Info under the File menu on the Submission page of the Submitting
Authors form.

**Submission Page

#Indicate the type of submission. If it is a new submission, select
New. If you are updating an existing submission in order to resubmit it
to the databases, select Update. Check either the "Yes" or "No" radio
button to indicate whether the record should be released before
publication. If you select "Yes", the entry will be released to the
public after the database staff has added it to the database. If you
select "No", fields will appear in which you can indicate the date on
which the sequences should be released to the public. The submission
will then be held until the sequence or its GenBank accession number is
published, or until the release date, whichever comes first.
If you have any special instructions, enter them in the box at
the bottom of the page.

**Contact Page

#Update the name, affiliation, or contact numbers of the person
submitting the record. Please supply a fax number to facilitate
communication with database staff.

**Citation Page

#Update the names and affiliation of the people who should receive
scientific credit for the generation of sequences in this entry. The
address should list the principal institution in which the sequencing
and/or analysis was carried out. If you are submitting the record as
an update to the databases, explain the reason for the update on the
Description subpage.

*Update Sequence

#This selection allows you to replace a sequence with another sequence,
merge two sequences that overlap at their ends, 'patch' a corrected
fragment into the current sequence, or copy features from one sequence
to another.

#Use Single Sequence to import a sequence in FASTA or ASN.1 format (for
example, a sequence record that has already been saved in Sequin). If
you are running Sequin in
<A HREF="#NetConfigure">
Network Aware mode,
</A>
you can use Download Accession to import a record from Entrez. The
Multiple Sequences option allows you to update multiple sequences using
either FASTA or ASN.1 format. In either format, each sequence
identifier must be identical in the new and old sequences.

#After you import the updated sequence, a new window will open that
displays two graphical views and the text of the alignment of the new
and old sequences. The first graphic displays the relative length of the
two sequences and the length of the overlapping region between the
sequences.
The second graphic represents any insertions, deletions, or
point changes within the aligned region between the new and old
sequences. Clicking on a region in this graphic will scroll to the
corresponding nucleotide sequence in the alignment text below.

#The Sequence Update box to the right shows the action that will be
performed upon updating the sequence, i.e., no change, replace, extend
5', extend 3', or patch. The patch function allows you to replace an
internal fragment of the sequence without affecting flanking regions.
You can also override the alignment between the new and old sequences
by checking the Ignore alignment checkbox, which forces a sequence change
of Replace, Extend 5', or Extend 3'. This option allows you to append
new sequence to the existing sequence even when the two do not overlap.

#If the current sequence has annotation, you can use the Existing
Features box to determine whether the annotation should remain or be
removed upon updating the sequence. The Do not remove option is the
default. However, you may choose to remove annotated features only in
the aligned area, only outside the aligned area, or to remove all
currently annotated features.

#When updating via Download Accession or an ASN.1 file, the Import
Features box allows you to specify whether features from the new file
should be imported into the existing record. The dialog offers
different options for cases where the features in the new file are
identical to those in the existing record.

#If you are using the Multiple Sequences option, you may choose to
review the sequences and update them one by one using the Update this
Sequence box at the bottom of the window. You may skip a sequence
update, or choose to update all sequences at once without reviewing them
in the Update Sequence dialog.
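#The Sequence Update actions listed above (no change, replace, extend 5',
extend 3', patch) can be illustrated with plain string operations. This
is a minimal Python sketch, not Sequin's implementation. The function
name apply_update and its parameters are hypothetical. It assumes the
extend actions are used with Ignore alignment (no overlap) and that the
coordinates of the fragment to patch on the old sequence are already
known. Sequin derives those coordinates from the alignment it displays.

```python
def apply_update(old: str, new: str, action: str,
                 start: int = 0, end: int = 0) -> str:
    """Return the updated sequence for one Sequence Update action.

    Hypothetical helper. `start` and `end` are 0-based, end-exclusive
    coordinates on `old` and are used only by the "patch" action.
    """
    if action == "no change":
        return old
    if action == "replace":
        return new
    if action == "extend 5'":
        return new + old      # no overlap assumed: new sequence is prepended
    if action == "extend 3'":
        return old + new      # no overlap assumed: new sequence is appended
    if action == "patch":
        if not 0 <= start <= end <= len(old):
            raise ValueError("patch interval lies outside the old sequence")
        # Only the internal fragment changes; flanking regions are kept.
        return old[:start] + new + old[end:]
    raise ValueError(f"unknown action: {action!r}")


old = "AAAACCCCGGGG"
print(apply_update(old, "TTTT", "patch", 4, 8))  # AAAATTTTGGGG
print(apply_update(old, "TT", "extend 3'"))      # AAAACCCCGGGGTT
```

#The patch example replaces the internal CCCC fragment while leaving the
AAAA and GGGG flanks untouched, which is the behavior described for the
patch function.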
#In any case, please carefully review the sequence and annotation in the
record viewer after using the Update Sequence function.

*Extend Sequence

#This selection functions similarly to the
<A HREF="#UpdateSequence">
Update Sequence
</A>
function. However, it can only extend the existing sequence in the 5'
or 3' direction, in cases where the existing and new sequences do not
overlap.

*Feature Propagate

#This selection allows you to propagate any annotated feature from one
sequence in an aligned set to other sequences within the set. For
example, if one nucleotide sequence in the alignment contains a CDS
feature, you can annotate a similar CDS on the other nucleotide
sequences in the set.

#The default source of features to be propagated is the first member
of the set. If you would like to use a different entry as the source of
the features, scope to that entry in the Target Sequence menu before
selecting Feature Propagate from the Edit menu.

#The Feature Propagate window allows you to select which sequences
should receive the new annotation and which features will be
propagated. You can also select whether the features will be extended
or split at gaps in the alignment. The split at gaps selection will
produce two features, one on either side of the gap within the
alignment. If you are propagating a CDS feature, you may specify
whether the translation should stop at, or extend through, internal stop
codons. You may also extend the translation past the stop codon on the
source entry by choosing to translate the CDS after the partial 3'
boundary. If the CDS that you are propagating to other records is
partial on either end, you should select the 'Cleanup CDS partials after
propagation' check box.
This will retain the partial nature of the CDS features on all records.
The 'Fuse adjacent propagated intervals' option will create a single
feature from two features of the same type whose nucleotide intervals
abut as a result of the alignment used for propagation.

*Add Sequence

#This selection allows you to add a new sequence to an existing
population, mutation, phylogenetic, or environmental sample set.
You may import the new entry in FASTA format or ASN.1 format (for
example, a sequence record that has been saved in Sequin).

*Parse File to Source

#This selection allows you to add unique information for one source
qualifier to each of the records in a batch or set. The input file
must be a tab-delimited, two-column table. The first column should
list the SeqID exactly as it was listed in the original FASTA file. The
second column should list the text value of the desired source
qualifier for each record. Once the file has been imported, a pop-up
box will appear with the source qualifiers listed in pull-down menus.
The qualifiers are separated into three menus: one for taxonomic
information, one for the Organism modifiers, and one for the Source
modifiers. For example, in order to add the clone designations 57 and
49 to the sequences labeled seq1 and seq2, the table

seq1 57
seq2 49

should be used, and clone should be selected from the Source modifiers
pull-down menu.

>Search Menu

*Find ASN.1

#Under this command, you can find and replace strings of letters in
those fields of your submission that contain manually entered data.
The fields that can be altered are Locus, Definition, Accession,
Keywords, Source, Reference, and Features. To use this option, select
Find and fill in the Find and Replace lines with the appropriate text.
Note that you cannot edit the sequence in this way.
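#The two-column, tab-delimited table that Parse File to Source expects
(described above) can be checked before import. The minimal Python
sketch below parses such a table and reports any SeqIDs that do not
match the IDs in the original FASTA file. The function names are
hypothetical and are not part of Sequin. It assumes, as is usual for
FASTA, that the SeqID is the first word after '>' on each definition
line.

```python
def parse_source_table(text: str) -> dict[str, str]:
    """Parse 'SeqID<TAB>value' lines into a {SeqID: value} dictionary."""
    table = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 tab-separated columns")
        table[fields[0].strip()] = fields[1].strip()
    return table


def fasta_ids(fasta: str) -> set[str]:
    """Collect SeqIDs: the first word after '>' on each definition line."""
    return {line[1:].split()[0] for line in fasta.splitlines()
            if line.startswith(">") and line[1:].strip()}


fasta = ">seq1\nACGT\n>seq2\nGGCC\n"
table = parse_source_table("seq1\t57\nseq2\t49\n")
print(table)                          # {'seq1': '57', 'seq2': '49'}
print(set(table) - fasta_ids(fasta))  # set(), so every SeqID matches
```

#Note that the two columns must be separated by a tab character. A line
such as "seq1 57" with only a space is rejected by this sketch.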
*Find FlatFile

#Under this command, you can find strings of letters in all fields of
your submission. You can use the Find First and Find Next buttons to
step through each occurrence of the specified text in the flatfile.

*Find by Gene

#This option allows you to move quickly in the record viewer to a gene
feature containing the specified gene symbol.

*Find by Protein

#This option allows you to move quickly in the record viewer to a CDS
feature containing the specified product name.

*Find by Position

#This option allows you to move quickly in the record viewer to any
feature annotated at the specified nucleotide location.

*Validate

#This option detects discrepancies between the format of your submission
and that required by the database selected for entry. If discrepancies
are present, it suggests ways to correct them. See the topic on
<A HREF="#SubmittingtheFinishedRecordtotheDatabase">
Submitting the Finished Record to the Database
</A>
in the Sequin help documentation.

*CDD Search

#Performs a CDD BLAST search of the selected sequence against NCBI's
<A HREF="http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml">
Conserved Domain Database
</A>.
To do a CDD BLAST search, Sequin must be in its network-aware mode.

#CDD currently contains domains derived from two popular collections,
SMART and Pfam, plus contributions from colleagues at NCBI. The source
databases also provide descriptions and links to citations. Because
conserved domains correspond to compact structural units, conserved
domain (CD) entries contain links to 3D structures via Cn3D whenever
possible.

#The results of the CDD search will be displayed in the record
viewer. These results are for your use only and should be removed
from the record before submission.
*Vector Screen

#This option allows you to run a BLAST search of your nucleotide
sequence(s) against NCBI's
<A HREF="http://www.ncbi.nlm.nih.gov/VecScreen/UniVec.html">
UniVec
</A>
database. We highly recommend that you run this analysis and remove
any vector contamination before submission. The UniVec database
contains only one copy of every unique sequence segment from a large
number of vectors. It also contains commonly used adapter, linker, and
primer sequences.

#To run Vector Screen on a submission containing multiple sequences,
scope to ALL SEQUENCES in the Target Sequence pull-down before running
the analysis. If there are many sequences, a status bar will appear
indicating the progress of the search. If no contamination is found, a
pop-up box will appear to notify you. If contamination is found, a
miscellaneous feature will be annotated on the flatfile at the
location of the contamination. Its details will include the relative
strength of the BLAST hit. Before submission, you must trim the
contaminating region from the nucleotide sequence and remove this
feature.

*ORF Finder

#The ORF Finder shows a graphical representation of all open reading
frames (ORFs) in the nucleotide sequence. This tool allows you to
select ORFs and have them appear as coding sequence (CDS) features on
the sequence record.

#The ORFs, indicated by colored boxes, are each defined as the longest
sequence in a reading frame that begins with a start codon and ends with
a stop codon. All six reading frames are shown as separate lines in the
graphical view: the top three lines represent the three frames on the
plus strand, and the bottom three lines the three frames on the minus
strand. In the default view, the nucleotide sequence intervals of the
ORFs are listed in order of descending length on the right side of the
window. Intervals on the complementary
(minus) strand are indicated by a 'c'.
Selecting 'Order by Start' will
reorder the list based on the nucleotide location of the start codon.

#Clicking on the box labelled ORF changes the graphical display so that
potential start codons are indicated in white, and stop codons in red.

#The default settings display only those ORFs that begin with an ATG
start codon. Selecting 'Alternative' in the 'Initiation Codon' box will
also include ORFs beginning with any valid alternative start codon, as
determined by the genetic code listed in the source feature. If the
genetic code for the source organism has not been specified, the default
<A HREF="http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c#SG1">
standard genetic code
</A>
will be used.

#The ORF length button sets the minimum length of ORFs that are
displayed. For example, the default of 10 shows all ORFs that are
longer than 10 nucleotides.

#Checking the Show Partial ORFs box will display ORFs that extend to the
end(s) of the nucleotide sequence but are 5' or 3' partial.

#ORFs can be selected by clicking on the graphical representation or on the
sequence interval. Once an ORF has been selected, its location and amino acid
sequence will automatically be filled in in the
<A HREF="#CDS">
CDS feature editor
</A>
accessed under the Annotate menu.

*Select Target

#This option changes the sequence that is selected in the Target
Sequence pop-up. Type the SeqID of the sequence in the box, and the
record viewer will be updated to display that sequence.

>Misc Menu

*Style Manager

#The Style Manager allows you to choose between different styles in
which to view the Graphical Display Format. The graphical display is
selected by choosing the Graphic display format in the record viewer.
Using the Style Manager, you can also copy a style or modify it to
suit your needs.

*Net Configure

#By default, Sequin runs as a stand-alone program. However, the program
can also be configured to exchange information with NCBI (GenBank) over
the Internet. The network-aware mode of Sequin works the same way as
the stand-alone mode, but it adds some useful options.

#Sequin will only function in its network-aware mode if the computer on
which it resides has a direct Internet connection. Electronic mail
access to the Internet is insufficient. In general, if you can install
and use a WWW browser on your system, you should be able to install and
use network-aware Sequin. Check with your system administrator or
Internet provider if you are uncertain whether you have direct
Internet connectivity.

#You can switch Sequin to its network-aware mode either from the initial
Welcome to Sequin form or, if you have already worked on a Sequin
submission, from the record viewer. In both cases, select Net
Configure under the Misc menu.

#Most users will be able to use the default (Normal) settings on the
Network Configuration page; select Accept to complete the configuration
process.

#If a "Normal" Connection does not work, you may need to select the
Firewall Connection. Contact your system administrator to determine
what to enter into the Proxy and Port fields. If you do not have
access to the domain name server (DNS), uncheck the DNS check box.

#The Timeout pop-up sets the length of time that your local copy of
Sequin will wait for a reply from the NCBI server.
You may need to set
this value higher (e.g., 60 seconds or 5 minutes) if you are outside
of the United States or have a slow or unreliable Internet connection.

#If you have problems setting up the network configuration, contact
<a href="mailto:info@ncbi.nlm.nih.gov">
info@ncbi.nlm.nih.gov
</a>.

#If you would like to change Sequin back to its stand-alone mode, select
Net Configure again from the Misc menu and click on Connection: None.

#The network-aware mode of Sequin allows you to perform a number of
additional, important functions. These functions all appear as
additional menu items. A brief description of these functions follows.
Further descriptions are available as indicated elsewhere in the help
documentation.

**Updating Existing GenBank Records

#Using Sequin in its network-aware mode, you can download an existing
GenBank record from Entrez using the GenBank accession number or GI
identification number (NID). You can then use Sequin to make any
necessary changes to the record, and resubmit it to GenBank as a
sequence update.
<A HREF="#WelcometoSequinForm">
Instructions
</A>
for submitting sequence updates are presented under the Welcome to
Sequin Form. You can download any record from Entrez and look at it in
Sequin. However, you can formally update only records that you
submitted, because submitters retain editorial control of their
records.

**Performing a PubMed Look-Up

#In its network-aware mode, Sequin can import the relevant sections of a
PubMed record directly into a sequence submission record. Rather than
typing in the entire citation, you can enter minimal information, such
as the PubMed Unique Identifier (PMID), or the journal name, volume,
year, and pages.
The
<A HREF="#JournalPage">
PubMed lookup
</A>
is explained in the section of the documentation entitled Publications.

**Performing a Taxonomy Look-up

#In its network-aware mode, Sequin can look
up the taxonomic lineage of an organism in NCBI's Taxonomy
database. This look-up is normally performed by the NCBI database staff
after the record has been submitted to GenBank. If you look up the
taxonomy before submitting the sequence, you can make a note in the
record of any disagreements. The
<A HREF="#LineageSubpage">
taxonomy lookup
</A>
is explained in the section of the documentation covering
Biological Source: Organism page: Lineage subpage.

**Accessing the NCBI DeskTop

#The NCBI DeskTop displays the internal
structure of the record being viewed in Sequin. The
<A HREF="#NCBIDeskTop">
DeskTop
</A>
is explained under the Misc menu.

*NCBI DeskTop

#This option is only available if you are running Sequin in its
<A HREF="#NetConfigure">
network-aware
</A>
mode.

#The NCBI DeskTop provides a view of the internal structure of the
Sequin record, the ASN.1. Its display resembles a Venn diagram and
shows the record's structures as defined by the ASN.1 data model.

#In addition, a number of undocumented software tools from NCBI can
be accessed from the DeskTop. These tools are components of the NCBI
portable software Toolkit. You can also use the Toolkit to customize
these functions with your own software tools. The Toolkit and its
documentation are available from NCBI by anonymous
<A HREF="ftp://ftp.ncbi.nih.gov/toolbox/README">
FTP
</A>.

#The DeskTop should only be used by very experienced users. At this
time, we are not providing any documentation for these specialized
functions.
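#The ORF Finder rules described under the Search menu (six reading
frames, ATG starts by default, a minimum length, and a 'c' prefix for
minus-strand intervals) can be sketched in Python as below. This is an
illustrative sketch, not Sequin's code. Several choices are assumptions:
it uses only ATG as the start codon and the standard-code stop codons
TAA, TAG, and TGA; it reports complete ORFs only (no partials); it
measures length in nucleotides including the stop codon; and the
minus-strand label format is assumed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq: str, min_len: int = 10) -> list[str]:
    """List complete ATG...stop ORFs in all six frames, longest first.

    Intervals are 1-based on the plus strand; minus-strand ORFs are written
    as 'c<start codon>..<stop codon end>' (an assumed label format).
    """
    seq = seq.upper()
    n = len(seq)
    hits = []  # (length in nt, interval label)
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # the first ATG gives the longest ORF
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3  # include the stop codon
                    if end - start > min_len:
                        if strand == "+":
                            label = f"{start + 1}..{end}"
                        else:  # map back to plus-strand coordinates
                            label = f"c{n - start}..{n - end + 1}"
                        hits.append((end - start, label))
                    start = None
    hits.sort(key=lambda hit: -hit[0])  # descending length, as in the default view
    return [label for _, label in hits]


print(find_orfs("ATGAAATTTGGGTAA"))  # ['1..15']
```

#Passing a smaller min_len mimics lowering the ORF length setting. For
example, find_orfs("ATGTAA", min_len=5) also reports the 6-nucleotide
ORF 1..6.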
>Annotate Menu

#This menu allows you to enter features and descriptors on the sequence.

#The first six options (Genes and Named Regions, Coding Regions and
Transcripts, Structural RNAs, Bibliographic and Comments, Sites and
Bonds, and Remaining Features) refer to types of Features that can be
added to the sequence. Features are described in more detail in the
above section entitled
<A HREF="#Features">
Features
</A>.

#If you are submitting a set of similar sequences, you can add the same
feature across the entire span of each by using the Batch Feature Apply
option. The feature must span the entire nucleotide sequence of each
member; you cannot annotate specific nucleotide locations using this
option (for this, see
<A HREF="#FeaturePropagate">Feature Propagate</A>).

#For each feature, you will be prompted to designate whether the feature
is 5' or 3' partial and whether it is on the plus or minus strand. You
may also add a comment or other qualifier to the feature. The Add
Qualifier option allows you to add a qualifier to an existing feature.
You must specify the feature and qualifier in the Add Qualifier pop-up
box. Source qualifiers can be added to all entries using the Add
Source Qualifier option. Qualifiers specific to CDS and gene features
can be added using Add CDS-Gene-Prot-mRNA, and RNA qualifiers using Add
RNA Qual. In each case, a pop-up box appears with qualifier options
appropriate for that feature.

#The Batch Feature Edit function allows you to edit existing qualifiers.
For each menu choice, a pop-up box allows you to select the feature
containing the qualifier and the specific qualifier to be edited. You
can use the Find/Replace text boxes to edit the information contained
within the qualifier.
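#The Batch Feature Edit find/replace described above amounts to a text
substitution applied to one qualifier on every feature of a chosen
type. The Python sketch below models features as plain dictionaries.
This representation and the function name batch_edit_qualifier are
hypothetical and do not reflect Sequin's internal ASN.1 data
structures.

```python
def batch_edit_qualifier(features, feature_type, qualifier, find, replace):
    """Replace `find` with `replace` in one qualifier of matching features.

    Returns the number of features whose qualifier value changed.
    """
    changed = 0
    for feat in features:
        if feat.get("type") != feature_type:
            continue  # only the selected feature type is edited
        value = feat.get("quals", {}).get(qualifier)
        if value is None or find not in value:
            continue  # qualifier absent, or nothing to replace
        feat["quals"][qualifier] = value.replace(find, replace)
        changed += 1
    return changed


features = [
    {"type": "CDS", "quals": {"product": "cytochrome oxidase subunit 1"}},
    {"type": "CDS", "quals": {"product": "cytochrome b"}},
    {"type": "gene", "quals": {"gene": "COX1"}},
]
n = batch_edit_qualifier(features, "CDS", "product", "subunit 1", "subunit I")
print(n, features[0]["quals"]["product"])  # 1 cytochrome oxidase subunit I
```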
#The Publications option allows you to add a Publication Feature or
Publication Descriptor to the record. Publications are described in
more detail in the above section entitled
<A HREF="#Publications">
Publications
</A>.

#The Descriptors option allows you to add Descriptors to the
record. Descriptors are described in more detail in the section
entitled
<A HREF="#Descriptors">
Descriptors,
</A>
above.

#The Generate Definition Line option will generate a title for your
sequence based on the information provided in the record. This option
works for single sequences as well as sets of sequences, and can
handle complex annotations with multiple features. The title will
follow GenBank conventions, but may be modified by the database staff
if it is not appropriate. The generated title will replace any title
you entered elsewhere in the submission, for example, any title that
was attached to the nucleotide sequence.

#The Advanced Table Readers option imports a properly formatted structured
comment table. Please contact us if you wish to use this option.

#Sort Unique Count by Group opens a new window that shows, for your
record(s), the number of times each individual line appears in the
flatfile(s). This is particularly useful when checking that all records
in a large set contain the required source or feature information.

>Options Menu

#This menu is only available when using Sequin in its network-aware mode.

*Font

#Use this item to change the display font. From the pop-up menus,
choose the style and size of type. For additional changes, mark the
Bold, Italic, or Underline check boxes. The default font is 10-point
Courier.
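#The line counting behind the Sort Unique Count by Group option (Annotate
menu) can be approximated with a counter over flatfile lines. This
Python sketch is only an illustration. How it groups lines is an
assumption: it compares exact text after stripping leading and trailing
whitespace. Sequin's own grouping may differ.

```python
from collections import Counter


def unique_line_counts(flatfiles: list[str]) -> list[tuple[str, int]]:
    """Count how often each distinct, non-blank line occurs across flatfiles.

    Lines are compared after stripping leading/trailing whitespace; results
    are sorted by descending count, then alphabetically.
    """
    counts = Counter(
        line.strip()
        for text in flatfiles
        for line in text.splitlines()
        if line.strip()
    )
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


records = [
    '/organism="Homo sapiens"\n/clone="57"',
    '/organism="Homo sapiens"\n/clone="49"',
    '/organism="Homo sapiens"',
]
for line, n in unique_line_counts(records):
    print(n, line)
```

#Here the /organism line occurs 3 times, once per record. The /clone
lines occur only once each, which shows that the third record lacks a
clone designation.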
>Sequence Editor

#This editor allows you to modify the nucleotide or amino acid sequences
and corresponding annotation in your entry. Although the Sequence Editor
does allow you to undo changes you make to the sequence, we strongly
suggest that you save a copy of the entry before launching the Sequence
Editor so that you can revert to it if necessary.

*Starting the Sequence Editor

#The sequence that appears in the editor depends on the sequence(s)
selected in the Target Sequence pull-down list. There are two ways to
launch the sequence editor for nucleotide sequences. First, you can
double-click within the sequence in any display format of the record
viewer. A window containing the DNA sequence will appear. Second, in the
record viewer, select the sequence that you would like to edit in the
Target Sequence pop-up menu, and click on Edit Sequence under the Edit
menu. You can launch the editor for protein sequences by selecting the
protein sequence in the Target Sequence pop-up menu and double-clicking
within the protein sequence. A window containing the protein sequence
will appear.

*Moving around the Sequence Editor

#The cursor can be moved with the mouse or the arrow keys. The display
window will change to show the position of the cursor. The sequence
location of the first residue on each line is indicated on the left side
of the window. The cursor location, or the range of sequence selected
with the mouse, is shown in the upper left corner of the window. If you
want to move the cursor to a specific location, type the number in the
box at the top left of the sequence editor window, and press the Go to
button. If you want to look at a specific position without moving the
cursor to it, type the number in the upper right box of the window and
press the Look at button.
*Editing Sequence and Existing Annotation

#Select a piece of sequence by highlighting it with the mouse. To
select the entire sequence, click on a sequence location number on the
left side of the window. Any sequence that is highlighted in the
Sequence Editor will show up as a box on the sequence when it is viewed
in the Graphic Display Format.

#One way to insert and delete residues is to type directly. Use the
mouse to move the cursor to the appropriate location and type. Text will
be inserted to the left of the cursor. Delete sequence with the
backspace or delete key. Text will be deleted to the left of the cursor.
To delete a block of sequence, highlight it with the mouse and press the
delete or backspace key.

#Another way to insert and delete residues is with options under the Edit
menu of the Sequence Editor. Use Cut to remove, or Copy to copy,
highlighted residues. Cut or copied residues can then be pasted
elsewhere within the sequence by using the Paste option.

#Features annotated via the record viewer will be displayed in a
graphical format within the sequence editor. CDS features will be
displayed as a blue line across the appropriate nucleotide location. All
other features will be displayed as a black line. The name of the
feature is displayed to the left of the line. For CDS or mRNA
features, the product name is shown. For gene features, the gene locus
is shown.

#Double-clicking on the feature will launch the feature editor, just as in
the record viewer. However, you can also change the nucleotide location
of any feature within the graphical view. To move the entire feature,
select the feature and drag it to the appropriate location while holding
down the mouse button.
To alter the 5' or 3' end of a feature, click on the feature's end and
drag it to the new location while holding down the mouse button.

#Before moving the nucleotide location of a CDS feature, it may be
useful to view the codons in the current translation. To do this, click
on the feature line and release the mouse button. A grid will be
displayed that shows the triplet locations for the current annotation.
Once you have changed the nucleotide location of a CDS feature in the
graphical view, you can see the new translation by using the Translate
CDS button at the bottom of the window.

#To save the changes you have made to the sequence, press the Accept
button at the bottom of the Sequence Editor window. If you do not want
to save the changes, press the Cancel button at the bottom of the
window. Selecting either Accept or Cancel closes the Sequence Editor and
returns you to the record viewer. Changes do not become a permanent part
of the Sequin record until you save the record in the record viewer.

#New features can be added using the Features menu.

*Sequence Editor Window Buttons

**Go to

#Moves the cursor to the indicated location.

**Look at

#Moves the window to the indicated location without moving the cursor.

**Merge Feature Mode/Split Feature Mode

#In merge mode, any new sequence that is entered into a region spanned
by an existing feature becomes part of that feature. For example, if you
enter new sequence in the middle of a CDS, that sequence will be
translated as part of the CDS. In split mode, the new sequence
interrupts the feature. For example, if you enter new sequence in the
middle of a CDS, the CDS will be interrupted by that sequence (see the
location of the CDS in the record viewer).
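
#The following Python sketch illustrates how a feature location could change in each mode when bases are inserted. It assumes that a location is a list of 1-based, inclusive (start, stop) intervals on the plus strand and that n new bases are inserted at a cursor position pos, so that they occupy pos..pos+n-1. The function name insert_bases is hypothetical; this illustrates the idea and is not Sequin's own code.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, stop) on the plus strand


def insert_bases(location: List[Interval], pos: int, n: int, merge: bool) -> List[Interval]:
    """Hypothetical helper illustrating merge vs. split mode (not Sequin's code).

    The n new bases occupy pos..pos+n-1; the old base at pos moves to pos+n.
    """
    result: List[Interval] = []
    for start, stop in location:
        if stop < pos:
            # Interval lies entirely before the insertion: unchanged.
            result.append((start, stop))
        elif start >= pos:
            # Interval lies entirely after the insertion: shift right by n.
            result.append((start + n, stop + n))
        elif merge:
            # Merge mode: the new bases become part of the interval.
            result.append((start, stop + n))
        else:
            # Split mode: the new bases interrupt the interval.
            result.append((start, pos - 1))
            result.append((pos + n, stop + n))
    return result


print(insert_bases([(101, 400)], 201, 3, merge=True))
print(insert_bases([(101, 400)], 201, 3, merge=False))
```

#For a CDS at 101..400, inserting three bases at position 201 gives 101..403 in merge mode, and the two intervals 101..200 and 204..403 in split mode.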

**Numbering

#Allows the sequence location numbering to be hidden, displayed on the
side, or displayed on the top of the sequence.

**Grid

#Displays a grid separating each feature and sequence for easier
viewing.

**Show/Hide Features

#This box toggles between hiding and showing the features on a sequence.

**Accept

#Closes the Sequence Editor after saving all of the changes made to
sequences and features.

**Cancel

#Closes the Sequence Editor without saving any changes made to sequences
or features.

**Translate CDS

#Translates coding region features after their location has been
changed within the graphical view.

*Sequence Editor File Menu

**Export

#Exports a range of sequence as a FASTA file or a text file. The text
option also exports overlapping features if they are displayed; if the
features are hidden first, only the sequence is exported. All protein
translations displayed at the time of export are exported as well.

**Accept

#Closes the Sequence Editor after saving all of the changes made to
sequences and features.

**Cancel

#Closes the Sequence Editor without saving any changes made to sequences
or features.

*Sequence Editor Edit Menu

**Undo

#Undoes all actions performed in the Sequence Editor since the last save.

**Redo

#Restores changes removed with the Undo option.

**Cut

#Removes the highlighted sequence. This sequence can be pasted elsewhere.

**Paste

#Pastes cut or copied sequence to the right of the cursor.

**Copy

#Copies the highlighted sequence. This sequence can be pasted elsewhere.
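
#The Export option works on a 1-based, inclusive range of residues. The Python sketch below shows one way such a range maps onto a FASTA record. The function name export_fasta, the defline layout, and the 60-residue line width are assumptions for illustration, not Sequin's exact output format.

```python
def export_fasta(seq_id: str, sequence: str, start: int, stop: int, width: int = 60) -> str:
    """Hypothetical helper: return residues start..stop (1-based, inclusive)
    as a FASTA record. Not Sequin's actual exporter."""
    if not 1 <= start <= stop <= len(sequence):
        raise ValueError("range lies outside the sequence")
    # Convert the 1-based inclusive range to a 0-based, half-open Python slice.
    region = sequence[start - 1:stop]
    lines = [f">{seq_id} {start}-{stop}"]  # assumed defline layout
    # Wrap the residues at a fixed line width.
    lines += [region[i:i + width] for i in range(0, len(region), width)]
    return "\n".join(lines) + "\n"


print(export_fasta("seq1", "ACGTACGTAC", 3, 8, width=4), end="")
```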

**Find

#Allows you to find DNA or amino acid sequence patterns in your
sequence. The search is case-insensitive. To find an exact match to a
DNA sequence pattern, type the pattern in the box. The number of matches
found will be displayed, and you can step through each one with the Find
Next button. To find the reverse complement of the pattern, click on the
reverse complement box at the bottom of the pop-up box.

#To find an exact match to an amino acid sequence pattern, type that
sequence in the box and click on "translate sequence". Sequin will look
for all occurrences of that pattern in all six reading frames. The DNA
sequence encoding that protein sequence in any of the six reading frames
will be highlighted.

**Translate CDS

#Translates coding region features after their location has been
changed within the graphical view.

**Complement

#Shows the complement of the submitted strand underneath the original.

**Reading Frames

#Shows the translation of the selected coding sequence in the indicated
reading frame(s). You can select any of the six reading frames
individually, all six reading frames, or all positive or all negative
frames.

**Protein Mismatches

#Marks amino acids that do not match the conceptual translation after a
nucleotide sequence change. The original amino acid sequence is
displayed until the Translate CDS function is used. Differences are
indicated by a red box around the amino acid abbreviation.

**On-the-fly Protein Translations

#Adds a second amino acid sequence to the display that is retranslated
as the nucleotide sequence is changed, allowing side-by-side comparison
with the original amino acid sequence.
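
#The searches performed by the Find option can be sketched in Python as follows. This is a minimal illustration, not Sequin's implementation. It assumes the standard genetic code (NCBI translation table 1), reports overlapping matches, and returns 1-based, inclusive plus-strand coordinates. The function names find_dna and find_protein are hypothetical.

```python
from typing import List, Tuple

# Standard genetic code (NCBI translation table 1); codons indexed in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

Hit = Tuple[int, int, str]  # (start, stop, strand or frame), 1-based inclusive


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; codons containing non-ACGT bases become 'X'."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if all(base in BASES for base in codon):
            index = 16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1]) + BASES.index(codon[2])
            protein.append(AMINO_ACIDS[index])
        else:
            protein.append("X")
    return "".join(protein)


def occurrences(text: str, pattern: str) -> List[int]:
    """0-based start of every exact, possibly overlapping, match."""
    hits, i = [], text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def find_dna(seq: str, pattern: str, reverse_complement_too: bool = False) -> List[Hit]:
    """Hypothetical DNA pattern search (case-insensitive)."""
    seq, pattern = seq.upper(), pattern.upper()
    hits = [(i + 1, i + len(pattern), "+") for i in occurrences(seq, pattern)]
    if reverse_complement_too:
        rc = reverse_complement(pattern)
        hits += [(i + 1, i + len(rc), "-") for i in occurrences(seq, rc)]
    return hits


def find_protein(seq: str, peptide: str) -> List[Hit]:
    """Hypothetical search for the DNA encoding `peptide` in all six frames."""
    seq, peptide = seq.upper(), peptide.upper()
    n, k = len(seq), 3 * len(peptide)
    hits = []
    for strand, dna in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for j in occurrences(translate(dna[frame:]), peptide):
                start0 = frame + 3 * j  # 0-based start on `dna`
                if strand == "+":
                    hits.append((start0 + 1, start0 + k, f"+{frame + 1}"))
                else:
                    # Map minus-strand positions back to plus-strand coordinates.
                    hits.append((n - start0 - k + 1, n - start0, f"-{frame + 1}"))
    return hits


print(find_dna("ATGAAATTT", "aaa", reverse_complement_too=True))
print(find_protein("ATGAAATTT", "MK"))
```

#In this example, "aaa" matches positions 4..6, and its reverse complement TTT matches positions 7..9. The peptide MK is encoded by positions 1..6 in the first plus-strand frame.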

*Sequence Editor Features Menu

#This menu contains a long list of all features that can be annotated on
a sequence. These features are the same as those that are accessible
through the main Sequin Annotate menu.

#You can annotate features either from the Annotate menu or from the
Sequence Editor. If you annotate them from the Annotate menu, you must
type in the nucleotide location of the feature. If you add features from
the Sequence Editor, however, you can highlight the sequence that the
feature covers, and its location will be entered automatically in the
feature location box. Additional explanations of how to annotate
features are provided in the section on
<A HREF="#Features">
Features.
</A>

>Working with Sets of Aligned Sequences

#Sequin allows you to work with aligned sets of closely related
nucleotide sequences that are part of a population, phylogenetic, or
mutation study. If the sequences are imported in a pre-aligned format,
such as PHYLIP, Sequin uses this alignment. If the sequences are
imported individually in FASTA format, Sequin can generate its own
alignment.

#You can view the aligned sequences in the Sequence Alignment Editor. In
the record viewer, select All Sequences in the Target Sequences menu and
select the Alignment Display Format.

#The Alignment Assistant is launched by selecting Alignment Assistant
from the Edit menu in the record viewer. It can be used to apply
features to the whole set of sequences using the alignment coordinates.
Rather than calculating the nucleotide coordinates for every feature on
every nucleotide sequence, you can select the feature's location using
its alignment coordinates and apply it to every member of the set
simultaneously.
Sequin will calculate the nucleotide locations as they apply to each
member of the set.

*Alignment Assistant Window Buttons

**Go to

#The Go to alignment position and Go to sequence position buttons both
scroll the Alignment Assistant so that the requested position is
visible. If the requested position is already visible, nothing happens.
Unlike in the Sequence Editor window, these Go to buttons do not control
the cursor position.

**Numbering

#Allows the sequence location numbering to be hidden, displayed on the
side, or displayed on the top of the sequence.

**Grid

#Displays a grid separating each feature and sequence for easier
viewing.

**Features Toggle

#Annotated features can be viewed in the Alignment Assistant. Each
feature is displayed as a bar underneath the coordinates for that
feature, and the identity of the feature is displayed in the left-hand
column. By default, features are Hidden. You can display only the
features associated with the Target Sequence, or the features annotated
on All Sequences in the alignment.

*Alignment Assistant File Menu

**Export

#Exports the alignment to a file in one of three formats. The contiguous
and interleaved options export the alignment in contiguous or
interleaved FASTA+GAP format, respectively. The text representation
option saves the alignment as it appears in the Alignment Assistant;
with this option, features are included if they are displayed at the
time of export.

**Close

#Closes the Alignment Assistant window and saves any changes made.

*Alignment Assistant Edit Menu

**Remove Sequences from Alignment

#Allows you to remove selected sequence(s) from the alignment. Select
the sequence by clicking on it.
You can select multiple sequences by holding down the control key.
Selected sequences are highlighted in grey. Note that this option
removes the sequence from the alignment, but the sequence is still
present in your submission.

**Validate Alignment

#Checks for problems with the alignment. If errors are reported, please
review and attempt to fix your alignment before submission.

**Propagate Features

#This function is the same as the one available under the Edit menu in
the record viewer. A full description is available

<A HREF="#FeaturePropagate">
above
</A>
.

*Alignment Assistant View Menu

**Target

#Allows you to select a sequence within the alignment as the target
sequence. This can also be done by double-clicking on the sequence
within the alignment. The SeqID of the target sequence is displayed in
red. Features can be displayed on the target sequence only, and the
target is the sequence used for comparison in the

<A HREF="#ShowSubstitutions">
Show Substitutions
</A>
view.

**Show Substitutions

#Changes the alignment view so that identities are replaced with a "."
and only substitutions are shown. The substitutions and identities are
relative to the selected target sequence.

*Alignment Assistant Features Menu

#Allows features to be annotated on a single sequence or on all
sequences within the alignment. The features available in this menu are
the same as those discussed for the main Sequin Annotate menu.

#Select the feature location by clicking the start position on one of
the sequences and, while keeping the mouse button depressed, dragging
the cursor to the end of the feature location. The selected area will be
underlined and shown in red, and the alignment coordinates of this area
will be displayed in the upper left of the Alignment Assistant window.
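
#Applying a feature to the alignment means converting a range of alignment columns into a location on each sequence, and Show Substitutions compares every row with the target sequence. The Python sketch below illustrates both ideas. It assumes that gaps are written as "-", that all rows have the same length, that coordinates are 1-based and inclusive, and that a sequence with only gaps in the selected columns receives no feature. The function names are hypothetical; this is not Sequin's implementation.

```python
from typing import Dict, Tuple

Alignment = Dict[str, str]  # sequence ID -> aligned row ('-' marks a gap)


def project_feature(alignment: Alignment, start_col: int, stop_col: int) -> Dict[str, Tuple[int, int]]:
    """Hypothetical helper: convert alignment columns start_col..stop_col
    (1-based, inclusive) into a (start, stop) location on every sequence."""
    locations = {}
    for seq_id, row in alignment.items():
        # Residues before the selected columns give the offset of the feature.
        before = sum(1 for c in row[:start_col - 1] if c != "-")
        # Residues inside the columns are contiguous in the ungapped sequence.
        inside = sum(1 for c in row[start_col - 1:stop_col] if c != "-")
        if inside:  # rows that are all gaps in this range get no feature
            locations[seq_id] = (before + 1, before + inside)
    return locations


def show_substitutions(alignment: Alignment, target_id: str) -> Alignment:
    """Hypothetical helper: replace characters identical to the target row
    with '.' (case-insensitive; identical gaps are also shown as '.')."""
    target = alignment[target_id].upper()
    view = {}
    for seq_id, row in alignment.items():
        if seq_id == target_id:
            view[seq_id] = row  # the target itself is shown unchanged
        else:
            view[seq_id] = "".join("." if a.upper() == t else a for a, t in zip(row, target))
    return view


aln = {"seq1": "ATG-CCGTA", "seq2": "ATGACCGTA", "seq3": "ACG--CGTT"}
print(project_feature(aln, 3, 7))
print(show_substitutions(aln, "seq2"))
```

#Here, alignment columns 3..7 become 3..6 on seq1, 3..7 on seq2, and 3..5 on seq3, because each sequence has a different number of gaps in that range.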

**Apply to Target Sequence

#Allows you to choose a feature to be applied only to the target
sequence. The location may be entered manually or determined by
highlighting the sequence as described above.

**Apply to Alignment

#Allows you to add the selected feature to all sequences within your
alignment based on the alignment coordinates you have selected. Note
that in the feature pop-up boxes in this menu, the Location is always
entered relative to the alignment coordinates.

<HR>

<CENTER>
<P> 
<P CLASS=medium1><B>Questions or Comments?</B>
<BR>Write to the <A HREF="mailto:info@ncbi.nlm.nih.gov">NCBI Service
Desk</A></P>
<P CLASS=medium1>Revised November 17, 2008

</CENTER>

<!-- end of content -->

</body>
</html>