Workshop Agenda

Seminars

Monday

Time

Subject

Speaker

 

8:30 AM

Registration

   
 

Session I - Introduction to JGI

   

09:00 - 09:15

Welcome and Overview of the Workshop

Nikos Kyrpides

 

09:15 - 09:45

Introduction to the JGI

The powerful high-throughput DNA sequencing technologies catalyzed by the Human Genome Project, which have contributed to dramatic advances in biomedicine, are now being directed to characterizing the genomes of plants and microbes. Leading this effort is the US Department of Energy (DOE) Joint Genome Institute (JGI), a national user facility that unites the expertise of five national laboratories to advance genomics in support of the DOE mission areas of bioenergy, carbon cycling, and bioremediation.

Jim Bristow

09:45 - 10:15

New Sequencing Technologies

JGI’s future depends on new sequencing technologies. We are currently evaluating, validating, and developing applications for three next-generation sequencing technologies: Roche’s GS FLX, Illumina’s Genome Analyzer, and AB’s SOLiD system. All three technologies will be introduced, and their advantages and disadvantages will be compared and discussed. Examples of applications of these new technologies in genomic research will be presented.

Feng Chen

10:15 - 10:45

Microbial Genome Assembly and Finishing

The US DOE Joint Genome Institute's mission is to provide the scientific community with high-quality finished genomes. Approximately 400 microbial genomes are currently in the JGI pipeline, and to date 166 have been completed. Scientists have come to recognize the value of a fully complete microbial genome. Draft sequences are usually of sufficient quality to determine the basic genetic and metabolic parameters of an organism, but finished genomes allow, for example, the study of genome-level evolution, and some interesting traits can be lost when working only from a draft. Computational and laboratory approaches to assembly and finishing will be discussed.

Alla Lapidus

10:45 - 11:00

Break

11:00 - 11:30

Single cell genomics

The bulk of finished microbial genomes to date are derived from bacteria and archaea that can be readily grown in culture. However, the vast majority of microorganisms on this planet elude current culturing attempts, severely limiting access to their genomes. While various enrichment methods as well as metagenomic approaches have been successfully applied to aid the genome analysis of such non-cultivable environmental microbes, these methodologies are not suitable for countless community members of interest. Single cell genomics is a new approach that aims to access the genome of an individual microbial cell. Single cells can be isolated from the community using optical tweezers, micromanipulators, flow sorting, or serial dilution. After cell lysis, the microbial genome is amplified using multiple displacement amplification (MDA), which enables random shotgun sequencing of the genome. The advantages of the single cell genomics approach, as well as the problems associated with it, will be discussed.

Tanja Woyke

and

Jan-Fang Cheng

11:30 - 12:00

JGI Eukaryotic Microbial Annotation of Eukaryotic Genomes

Over 50 eukaryotic genomes from different taxonomic groups are annotated at JGI using the JGI Annotation pipeline. The pipeline integrates several gene prediction, annotation, and analysis tools to annotate a diverse set of genomes in a high-throughput but genome-specific manner. To address the gene prediction challenges posed by eukaryotic genomes, which often display high repeat content, low gene density, and complex gene structure, we combine different gene predictors with available experimental data and comparative genomics analysis. The JGI Eukaryotic Portal provides web-based tools that enable user communities to perform comprehensive genome analysis and manual curation of predicted genes and functions.

Igor Grigoriev

12:00 - 13:00

Lunch - JGI Facilities Tour

 

Session II - JGI Science Projects

   

13:00 - 13:30

Microbial Genomics

Since the release of the first completely sequenced microbial genome more than a decade ago, the genomics world has been changing rapidly, as microbial sequencing data have been accumulating at an exponential rate. Fueled by recent advances in sequencing technology, microbial genomics now plays a central role in medicine and biotechnology and has greatly expanded our understanding of phylogenetic and metabolic complexity. Where are we going next? The past, present, and future of microbial genomics will be discussed.

Nikos Kyrpides

 

13:30 - 14:00

Archaeal Genomics

Archaea are the least well-characterized organisms of the three domains of life. Yet they share many important features with eukaryotes and are key to understanding the origins and nature of the last common ancestor. JGI has a strong interest in Archaea because of their broad biotechnological applications and their relevance to energy production, and a large number of archaeal sequencing projects are therefore under way. The analysis of two completely sequenced crenarchaeal genomes will be presented, with examples of how unique genes, and genes uniquely missing from these genomes, can be identified and characterized.

Iain Anderson

14:00 - 14:30

Phylogenomics

Clarifying the relationships among bacterial lineages is important to provide a phylogenetic framework on which to trace the evolution of bacterial diversity. The large number of complete genomes now available from numerous bacterial lineages has greatly augmented the power of phylogenetic analyses. Large-scale protein alignments and other approaches are revealing the topology of many areas of the bacterial tree, although others still remain controversial. We’ll discuss how to use genomic information to investigate bacterial phylogeny, in spite of horizontal transfer and other phenomena that entangle phylogenetic reconstruction, and how the phylogenies obtained can advance our understanding of genome evolution in bacteria.

Pilar Francino

14:30 - 15:00

Break

15:00 - 15:30

Introduction to Metagenomics

Metagenomics, the application of high-throughput sequencing to environmental samples, is an emerging field that is rapidly advancing our understanding of how microbial communities function and evolve. This introductory talk will trace the roots of metagenomics, describe its current practice, and speculate on future developments in the field.

Phil Hugenholtz

15:30 - 16:00

Metagenomics of Hypersaline mats

The Guerrero Negro hypersaline microbial mat in Baja California is one of the most complex and diverse microbial communities yet described. We generated shotgun sequence from 10 successive layers of a ~6 cm thick mat core for comparative analysis. Millimeter-scale functional gradients were inferred from gene and pathway frequency distributions, which often tracked the physicochemical profile of the mat. The environment and the results of the metagenome analysis will be presented and discussed.

Victor Kunin

16:00 - 16:30

Discovery of feedstock-targeted glycosyl hydrolases by Metatranscriptomics

The lack of highly active and stable cellulolytic enzymes is a major bottleneck for the efficient large-scale production of biofuels from lignocellulose. Complex microbial habitats such as the bovine foregut are known to harbor fibrolytic microbes and represent promising sources of novel biocatalysts for lignocellulose degradation. We employed high-throughput pyrosequencing to identify feedstock-targeted enzymes within the transcriptome of bovine rumen microbial communities.

Matthias Hess

16:30 - 17:00

Pre-processing of metagenomic datasets

The rapid increase in the number of metagenomic projects is leading to exponential growth of sequence data, which in turn creates new challenges for efficient data storage and analysis. This problem is expected to become more prominent as new sequencing technologies are adopted and large-scale sequencing projects are carried out (e.g. HMP, GOS). The Genome Biology Group at the DOE-JGI is developing methods to address these challenges, which allow metagenomic datasets to be compressed and represented efficiently without loss of sequence, contextual, or functional information. These methods include metafolds and proxygene clusters for Sanger-based and 454-based metagenomic datasets, respectively. In many cases these approaches allow the extraction of previously undetected information, such as genomic variation and relationships among members of a group.

Kostas Mavrommatis

17:30 - 19:00

JGI tour - Poster session and reception

   


Tuesday

Time

Subject

Speaker

 

8:30 AM

Registration

   
 

Session III - Microbial Genomics

   

09:00 - 09:30

Basic Bioinformatics Tools

An introduction to the concepts behind the most essential tools in computational biology and bioinformatics. Topics will include BLAST alignments, hidden Markov models, sequence analysis, multiple sequence alignment, protein family classification, and the basics of phylogenetics.

Amrita Pati

09:30 - 10:00

Data Sources

Genome analysis and gene function prediction depend on comparing sequences against the information stored in existing databases. These databases can be simple repositories of nucleotide or protein sequences, or they can contain curated information related to the function of genetic elements. Used in combination, bioinformatics databases constitute the most powerful resource for gene function prediction. This presentation will discuss the databases commonly used for genome analysis.

Kostas Mavrommatis

10:00 - 10:30

Sequence space Gene Clustering

Clustering techniques are often used to visualize and condense complex relational information. We will discuss two approaches, the purely mathematical and the sequence-based, along with examples of both. K-means and spectral clustering of similarity matrices will be examined, along with protein family methods (Pfam, SCOP).

Sean Hooper

10:30 - 11:00

Break

11:00 - 11:30

Finding the genes in microbial genomes

Annotation of microbial genomes usually starts with finding the genes that encode stable RNAs (rRNA and tRNA) and the protein-coding genes (CDSs). The principles underlying gene prediction in microbial genomes, different implementations of these algorithms, and the most popular gene-finding tools will be discussed.

Natalia Ivanova

11:30 - 12:00

Gene models Quality Control (Gene QC)

Accurate gene prediction is essential for correct downstream genome analysis. All currently available automatic gene-finding tools have an error rate of 10-15%. A methodology for gene model validation and manual curation will be presented.

Athanasios Lykidis

12:00 - 13:00

Lunch

13:00 - 13:30

Annotation: Function prediction and Metabolic Reconstruction

In this session we will discuss methodologies for assigning functions to gene products. Methods based on homology, common motif occurrence, and chromosomal context will be presented, along with the steps necessary to reconstruct the metabolic network of an organism.

Athanasios Lykidis

13:30 - 14:00

IMG Terms and Pathways

Description of the controlled vocabularies used for annotations in IMG (IMG Terms) and of the curation of the IMG pathway database (IMG Pathways).

Natalia Ivanova

14:00 - 14:45

Introduction to IMG

Nikos Kyrpides

14:45 - 15:15

Break

15:15 - 16:00

IMG Systems and Design Walk Through

Victor Markowitz

16:00 - 17:00

IMG-ER data submission - Hands-on

 

 

 

Tutorials

Wednesday

Time

Subject

Speaker

 

8:30 AM

Registration

   
 

IMG Tutorial (Genome annotation and analysis)

   

09:00 - 10:00

IMG - Genes and Genomes

Microbial genome data analysis in IMG is set in the comparative context of multiple microbial genomes. IMG allows users to navigate the microbial genome data space along three key dimensions: genomes (organisms), functions (terms and pathways), and genes. In this session, IMG-based comparative analysis of gene families and genomes will be presented. Tools to be discussed include phylogenetic profiles and occurrences, homology-based and chromosomal context analysis, VISTA, abundance profiles, and genome clustering.

Athanasios Lykidis

10:00 - 11:15

Hands-on IMG (exercises)

Users

 

11:15 - 12:00

Exercise solutions

Athanasios Lykidis

12:00 - 13:00

Lunch

13:00 - 13:45

IMG - Functions and Pathways

IMG offers several ways for users to interact with protein functions and pathways, including Clusters of Orthologous Groups (COGs) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. In addition, JGI is developing a controlled vocabulary for representing functions and pathways, known as IMG Terms and Pathways. The use of these functional groups and pathways and their importance in comparative genome analysis will be presented and discussed.

Iain Anderson

13:45 - 14:15

MyIMG

The functional annotation of individual genes can be modified using the MyIMG Annotations features of MyIMG. In addition to curation of functional annotations, MyIMG supports uploading genome selections previously saved from the Genome Browser or Genome Statistics, and setting system-wide user preferences. The use and functionality of MyIMG features will be discussed.

Iain Anderson

14:15 - 14:45

Gene Context Analysis in IMG

Kostas Mavrommatis

14:45 - 16:00

Hands-on IMG (exercises)

Users

 

16:00 - 17:00

Exercise solutions

Iain Anderson


Thursday

Time

Subject

Speaker

 
 

IMG/M Tutorial (metagenome analysis)

   

09:00 - 10:00

IMG/M

Introduction to Metagenome analysis

A snapshot of microbial community structure can be derived from analysis of metagenomic data. IMG/M methods and tools for establishing the taxonomic identity of community members will be presented along with tools for determining the fine population structure, genetic variation and genome dynamics of the dominant populations. Methods for assessing the diversity and abundance of microbial communities will be discussed.

Natalia Ivanova

10:00 - 10:30

Statistical analysis of metagenomic datasets

Systematically evaluating the relative abundances of individual protein functions, and of sets of protein functions, across metagenomic datasets can yield statistically significant conclusions about the over- and under-representation of protein functions and biological pathways in these communities. Statistical methods can be derived for comparing the relative abundances of both individual protein families and sets of protein families in two given metagenomic datasets. Statistical models of individual abundances, and methods for identifying protein families whose differences in abundance are statistically significant, will be presented.

Amrita Pati

10:30 - 12:00

Hands-on IMG (exercises)

Users

 

12:00 - 13:00

Lunch - Hands-on continues

13:00 - 14:00

Exercise solutions

Natalia Ivanova

14:00 - 14:30

A Genome Analysis test case

The methodology and steps for analyzing a genome in IMG will be presented through a use case.

Kostas Mavrommatis

14:30 - 15:00

A Metagenome Analysis test case

The methodology and steps for analyzing a metagenome in IMG/M will be presented through a use case.

Athanasios Lykidis

15:00 - 15:30

Break

 

15:30 - 17:00

General Discussion & Users’ Feedback

 


Friday

Time

Subject

Speaker

 
 

CAMERA, Greengenes & JGI Eukaryotic tutorials

   

09:15 - 10:00

CAMERA - I

CAMERA (http://camera.calit2.net/) stands for Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis. The aim of this project is to serve the needs of the microbial ecology research community by creating a rich, distinctive data repository and a bioinformatics tools resource that will address many of the unique challenges of metagenomic analysis.

Paul Gilna

10:00 - 10:15

Break

10:15 - 12:00

CAMERA - II

CAMERA Tutorial

Michael Chiu

and

Shulei Sun

 

12:00 - 13:00

Lunch

13:00 - 14:15

Greengenes

Greengenes (http://greengenes.lbl.gov) is a web application that assists molecular ecologists with data analysis. Aligning 16S rRNA gene sequences, removing chimeras, and classifying the members of a microbial community against all five dominant bacterial and archaeal taxonomies will be covered. Two advanced methods will also be discussed: integrating PhyloChip community analysis with sequencing data, and importing Greengenes pre-processed data into ARB for visualization. Participants may preview the online tutorial on the Greengenes website.

Todd DeSantis

14:15 - 14:30

Break

14:30 - 15:15

VISTA

The VISTA portal (http://genome.lbl.gov/vista) is a comprehensive comparative genomics resource that provides scientists with a single unified framework to generate and download multiple sequence alignments, visualize the results in the context of existing annotations, and analyze comparative results to search for important sequence signals in alignments. Servers for user-submitted sequences include GenomeVISTA for aligning a sequence (draft or finished) against whole-genome assemblies; mVISTA and wgVISTA for globally aligning sequences of different species up to 10 Mb long; rVISTA, which uses conservation among species to improve prediction of transcription factor binding sites; and Phylo-VISTA for visualizing multiple alignments together with a phylogenetic tree.

Inna Dubchak

15:15 - 15:30

Break

15:30 - 17:00

Eukaryotic Tutorial

Andrea Aerts

 

End of Workshop

   

 

Online Tools

IMG

IMG/M

IMG-ER

IMG-EDU

Artemis

VISTA

Greengenes

ARB

GOLD

BLAST

ClustalX

COGs

EBI

Eukaryotic Portal

InterPro

KEGG

NCBI/GenBank

Pfam

PIR

Sequencher