Plant Cell. 2001 April; 13(4): 979–988.
PMCID: PMC135537
Comparative Sequence Analysis Reveals Extensive Microcolinearity in the Lateral Suppressor Regions of the Tomato, Arabidopsis, and Capsella Genomes
Mathias Rossberg,1,2a Klaus Theres,1bc Adile Acarkan,a Rubén Herrero,3b Thomas Schmitt,c Karin Schumacher,4c Gregor Schmitz,b and Renate Schmidt5a
aMax-Delbrück-Laboratorium in der Max-Planck-Gesellschaft, 50829 Cologne, Germany
bMax-Planck-Institut für Züchtungsforschung, 50829 Cologne, Germany
cInstitut für Genetik, Universität zu Köln, 50829 Cologne, Germany
1These authors contributed equally to this work.
2Current address: AMGEN GmbH, Riesstrasse 25, 80992 München, Germany.
3Current address: Instituto Valenciano de Investigaciones Agrarias, Apartado Oficial, 46113 Moncada, Valencia, Spain.
4Current address: Zentrum für Molekularbiologie der Pflanzen–Pflanzenphysiologie, Universität Tübingen, Auf der Morgenstelle 1, 72076 Tübingen, Germany.
5To whom correspondence should be addressed. E-mail rschmidt@mpiz-koeln.mpg.de; fax 49-221-5062-613.
Received October 16, 2000; Accepted January 28, 2001.
Abstract
A 57-kb region of tomato chromosome 7 harboring five different genes was compared with the sequence of the Arabidopsis genome to search for microsynteny between the genomes of these two species. For all five genes, homologous sequences could be identified in a 30-kb region located on Arabidopsis chromosome 1. Only two inversion events distinguish the arrangement of the five genes in tomato from that in Arabidopsis. Inversions were not detected when the arrangement of the five Arabidopsis genes was compared with the arrangement in the orthologous region of Capsella, a plant closely related to Arabidopsis. These results provide evidence for microcolinearity between closely and distantly related dicotyledonous species. The degree of microcolinearity found can be exploited to localize orthologous genes in Arabidopsis and tomato in an unambiguous way.
INTRODUCTION

Arabidopsis, a small crucifer, has been adopted as a model in plant genome analysis. The small genome size of 125 Mbp and a low number of repetitive elements facilitated the assembly of comprehensive molecular marker and clone contig maps for the five Arabidopsis chromosomes (reviewed in Schmidt, 1998). Sequence analysis of the genome has been completed (The Arabidopsis Genome Initiative, 2000). The 430-Mbp rice genome is a model for monocotyledonous plants, and the rice genome project aims to decipher the entire genomic sequence for this species (Sasaki and Burr, 2000). Equally detailed studies are not feasible for many other plant genomes at present, especially given the large genome sizes of most crop plants. It needs to be established if and how information generated on the Arabidopsis and rice genomes can be used for the study of other plant genomes.

Comparative genetic mapping experiments yielded evidence for the conservation of gene repertoire and colinear chromosome segments for related species. An extensive conservation of marker order was found for the 12 tomato and potato chromosomes, and five chromosomal inversions could explain differences in marker organization (Tanksley et al., 1992). For the Poaceae family, a remarkable degree of genome conservation could be established even between species that diverged as long as 60 million years ago and that differ considerably in genome size (reviewed in Gale and Devos, 1998). Comparing the genetic maps of Arabidopsis and different Brassica species also has revealed many colinear chromosome segments for species belonging to the Brassicaceae family (reviewed in Schmidt, 2000). The results of the first microsynteny studies using sequence-level resolution in the Poaceae (Chen et al., 1997; Messing and Llaca, 1998; Tikhonov et al., 1999) and Brassicaceae families (Grant et al., 1998; Acarkan et al., 2000) support the view that genome colinearity can be observed at the level of genes.

Few attempts to analyze genome colinearity between more distantly related species have been reported (Paterson et al., 1996; Devos et al., 1999; van Dodeweerd et al., 1999; Ku et al., 2000). The low degree of sequence homology in distantly related species hampers the unambiguous recognition of orthologous sequences, a prerequisite for studying colinearity relationships between species. With the completion of the Arabidopsis genomic sequencing project, comparisons with distantly related species can now rely on sequence homology if sequence information is generated for the other species. Tomato and Arabidopsis were chosen for such a comparative analysis because these species are representatives of two major clades of the eudicots, the asterids and rosids, respectively (Soltis et al., 1999). Colinear chromosome segments between distantly related species are expected to be small (Paterson et al., 1996); therefore, a microsynteny approach was taken to search for colinearity between the tomato and Arabidopsis genomes. The sequence of a region of the tomato genome spanning five genes including the Lateral suppressor gene (Schumacher et al., 1999) was determined. Sequence comparisons were performed to identify orthologous gene sequences in two species of the Brassicaceae family, Arabidopsis and Capsella. We sought to determine to what degree gene repertoire, order, spacing, and orientation were conserved in the three species analyzed.

RESULTS

Identification of Coding Sequences in a 57-kb Region of Tomato Chromosome 7
To search for microsynteny between the Arabidopsis and tomato genomes, we chose a region located between molecular markers CD61 (GenBank accession number AA824678) and CD65 (GenBank accession number AA824680) of tomato chromosome 7. The sequence of a cosmid contig spanning 57 kb of genomic DNA was determined (Figure 1; EMBL accession number AJ303345). Cosmid clones and a yeast artificial chromosome clone spanning this area were used as probes to isolate cDNAs corresponding to four different genes (Le-A, GenBank accession number AF098674; Le-B, EMBL accession number AJ303342; Le-D, EMBL accession number AJ303343; and Le-E, EMBL accession number AJ303344) from a tomato cDNA library (Schumacher et al., 1999). Sequence comparisons revealed that the cDNAs represent cognate sequences for genes on the cosmid contig that have between 99.8 and 100% identity to the tomato genomic DNA sequence. Analysis of the sequenced cosmid contig with the help of gene prediction programs (Eukaryotic GeneMark.hmm: http://dixie.biology.gatech.edu/GeneMark/eukhmm.cgi; Genscan: http://genes.mit.edu/GENSCAN.html) provided evidence for one additional gene in the region of interest, Le-C. Figure 1 shows the positions and orientations of the five different genes in the 57-kb tomato region.
Figure 1. Arrangement of Genes in a 57-kb Genomic Region of Tomato.

A comparison of the genomic sequence with the tomato gene index (Quackenbush et al., 2000; http://www.tigr.org/tdb/lgi/) identified cognate expressed sequence tag (EST) sequences for two of the previously identified genes. One EST sequence (GenBank accession number AI772724) represents the 3′ untranslated region of cDNA Le-E, whereas six EST sequences (GenBank accession numbers BE459792, AW220369, BE458609, AW040531, AW441812, and AI485506) correspond to parts of cDNA Le-D. EST sequences AW220369 and BE458609 differ in the 5′ untranslated leader region compared with EST sequence BE459792 and cDNA Le-D. Sequence comparisons of these different EST sequences with the tomato genomic DNA sequence provide evidence for differential splicing in the 5′ untranslated region of gene Le-D (data not shown).

Le-A corresponds to the Lateral suppressor gene of tomato (GenBank accession number AF098674; Schumacher et al., 1999), Le-B represents a protein of unknown function, Le-C has similarity to receptor kinase–like proteins, Le-D contains a domain that is characteristic of the WRKY transcription factor family (Eulgem et al., 2000), and Le-E shows similarity to chloride channels.

Sequences Corresponding to Genes of the Tomato Cosmid Contig Map to Arabidopsis Bacterial Artificial Chromosome F20N2
Coding sequences that map to the 57-kb tomato region were used for a FASTA analysis (Pearson and Lipman, 1988) to search for corresponding sequences in the Arabidopsis genome. cDNAs A and B have the highest FASTA scores with sequences of bacterial artificial chromosomes (BACs) F20N2 (GenBank accession number AC002328) and T5A14 (GenBank accession number AC005223); cDNA E has the highest score with BAC F20N2. One of the molecular markers flanking the cosmid contig, CD65, also displays sequence homology with BAC F20N2. Gene Le-C has similar FASTA scores with sequences of BAC F20N2 and the P1 clone MRP15 (GenBank accession number AP000603). cDNA D shows homology with the sequence of BAC F20N2; however, higher matches are found for several other chromosomal regions of the Arabidopsis genome, all of which contain sequences homologous with the domain characteristic of the WRKY transcription factor family.

BACs F20N2 and T5A14 are partially overlapping clones (Figure 2) and map near molecular marker nga280 (83.8 centimorgans) to chromosome 1 of Arabidopsis (http://www.arabidopsis.org/). The homologies of the five tomato genes with Arabidopsis BAC clone F20N2 can be localized to a region spanning <30 kb (GenBank accession number AC002328; base pair 1 to 30,000). This region was subjected to a detailed analysis (see below). The area of sequence homology of CD65 with BAC F20N2 maps ~11 kb distant from this area.

Figure 2. Gene At-C Is Part of a Large-Scale Duplication in the Arabidopsis Genome.

Arabidopsis Gene Sequences Map to the Region of Interest on Clone F20N2
For the region of Arabidopsis BAC F20N2 that shows homology with the tomato genes, five genes are predicted (http://mips.gsf.de/proj/thal/db/index.html: At1g55580, At1g55590, At1g55600, At1g55610, and At1g55620). The areas of homology between the different tomato gene sequences and the sequences of BAC F20N2 coincide with the predicted gene sequences in Arabidopsis. Accordingly, the Arabidopsis genes are designated A, B, C, D, and E, like their tomato counterparts (Figure 3B).
Figure 3. Comparison of the Arrangement of Genes in Genomic Regions of Arabidopsis, Capsella, and Tomato.

In the immediate vicinity of gene At-E, two more genes have been predicted that do not show homology with the sequenced region of the tomato genome: a gene putatively coding for tRNASer and a gene of unknown function, At1g55630 (At-F; Figure 3B).

A comparison of the genomic sequence with Arabidopsis EST collections (Höfte et al., 1993; Newman et al., 1994) was performed. Four different EST sequences (21484, GenBank accession number N96681; 5787, GenBank accession number T42524; 22916, GenBank accession number W43308; and 701673971, GenBank accession number AI995414) were identified that showed homologies of ≥92% with the Arabidopsis region. Sequence analysis of clones 21484 (EMBL accession number AJ303346), 5787 (EMBL accession number AJ303347), and 22916 (EMBL accession number AJ303348) confirmed that they represent cognate cDNAs for genes At-B and At-E. cDNA clones 21484 and 5787 correspond to gene model At1g55590 (At-B). Clone 21484 is missing the first 18 nucleotides of the open reading frame (ORF), whereas clone 5787 is lacking the first 1481 nucleotides of the ORF. cDNA clone 22916 (At-E) shows similarity to gene model At1g55620; however, the cDNA sequence differs in the 3′ region from the predicted gene. EST 701673971 is a cognate sequence for gene At-F. Thus, for three of the seven predicted genes, experimental evidence could be obtained in Arabidopsis. Figure 3B shows the positions and orientations of the seven different genes in the 31,500-bp Arabidopsis region.

Evidence for a Duplication Event in the Arabidopsis Genome
BLAST (Altschul et al., 1990) and FASTA (Pearson and Lipman, 1988) analyses revealed that for gene Le-C, two homologs could be identified in Arabidopsis: At-C1, which maps to chromosome 1, and At-C2, which maps to chromosome 3 (MRP15; http://www.arabidopsis.org/). To determine whether gene C is part of a larger duplicated segment in the Arabidopsis genome, we used sequences of overlapping BAC clones T5A14 and F20N2 for BLAST searches with genomic Arabidopsis sequences. Clones mapping to chromosome 3 near molecular marker nga162 at 20.5 centimorgans (http://www.arabidopsis.org/) showed multiple matches. Figure 2 shows that 14 gene predictions of chromosome 1 clones T5A14 and F20N2 are homologous with gene predictions of clones MRP15 (GenBank accession number AP000603) and MDC11 (GenBank accession number AB024034) located on chromosome 3, which is indicative of a duplication event in the Arabidopsis genome. However, only one of the genes from the region of interest (gene C) is duplicated; genes A, B, D, E, and F and the putative tRNASer gene are absent from the region located on Arabidopsis chromosome 3 (Figure 2). For the copy of gene At-C, which is located on chromosome 3, a cognate cDNA sequence could be identified (EST RZL15e10F; GenBank accession number AV546538).

Identification of a Capsella Region Orthologous with the Arabidopsis Region
Capsella was included in the microsynteny studies to establish the degree of microcolinearity for the region of interest between Arabidopsis and a closely related species. Polymerase chain reaction (PCR) products corresponding to Arabidopsis genes At-A, At-B, At-D, and At-E were used as probes in colony hybridization experiments to identify Capsella cosmid clones containing sequences homologous with the region located on Arabidopsis chromosome 1.

On the basis of results from DNA gel blot hybridization experiments with the PCR products as probes, the resulting eight cosmids were arranged into a contig. Sequence analysis of part of the Capsella cosmid contig spanning 27,056 bp (EMBL accession number AJ303349) was performed to allow detailed comparisons of the genomic regions in Arabidopsis and Capsella. Alignments of predicted Arabidopsis genes (At-A, At1g55580; At-C, At1g55610; At-D, At1g55600; tRNASer; and At-F, At1g55630) and cDNA sequences (At-B, At1g55590; and At-E, EMBL accession number AJ303348) with the Capsella genomic sequence (EMBL accession number AJ303349) established the positions of genes Cr-A to Cr-F; however, only part of gene Cr-F is represented in the sequenced 27-kb region (Figure 3C).

Comparison of Gene Arrangements in the Tomato Cosmid Contig and in the Corresponding Genomic Regions in Arabidopsis and Capsella
All seven genes are arranged in the same order in Capsella and Arabidopsis. The orientation of the genes relative to each other is maintained, and intergenic regions are of similar size in both species. Thus, complete microcolinearity could be established for the region of interest in Arabidopsis and Capsella. Five of the genes also are present in tomato in close physical proximity; however, the order of genes in tomato differs from that in Arabidopsis and Capsella. Gene Le-A and gene pair Le-C and Le-D are present in an inverted orientation with respect to neighboring genes compared with the corresponding genes in Arabidopsis and Capsella. The region is approximately twofold larger in tomato than in the cruciferous species (Figure 3).
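
The kind of comparison described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in this study: the function name `inverted_blocks` and the strand assignments in the example are hypothetical (the real layouts are those of Figure 3), and both regions are assumed to be drawn in the same overall orientation. Each gene is given as a name with a strand, and runs of neighboring genes whose strand differs from the reference are reported as candidate inverted blocks.

```python
def inverted_blocks(reference, query):
    """Return runs of adjacent query genes whose strand differs from the reference.

    Both arguments are lists of (gene_name, strand) tuples, strand being +1 or -1.
    Genes that are missing from the reference are ignored.
    """
    ref_strand = dict(reference)
    blocks, current = [], []
    for gene, strand in query:
        if gene not in ref_strand:
            continue  # no counterpart in the reference region
        if strand != ref_strand[gene]:
            current.append(gene)      # extend the current flipped run
        elif current:
            blocks.append(current)    # a gene in reference orientation ends the run
            current = []
    if current:
        blocks.append(current)
    return blocks


# Hypothetical layouts chosen only to mirror the wording of the text:
# one single-gene flip (A) and one flipped gene pair (C and D).
reference_region = [("A", 1), ("B", 1), ("C", 1), ("D", 1), ("E", 1)]
query_region = [("A", -1), ("B", 1), ("D", -1), ("C", -1), ("E", 1)]

print(inverted_blocks(reference_region, query_region))
```

With these illustrative inputs, two blocks are reported, one containing gene A alone and one containing the pair D and C. A comparison of a region with itself returns no blocks, which corresponds to the complete microcolinearity described for Arabidopsis and Capsella.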

Sequence alignments of the tomato genes (Le-A to Le-E) and the predicted genes in Arabidopsis and Capsella were performed to compare gene structures and sequence identities of exon sequences. Numbers of exons are generally conserved, although for gene E an additional intron is found in tomato compared with Arabidopsis and Capsella. Pronounced differences in exon length are restricted to the 5′ and 3′ regions of genes B and E. In contrast, exons of gene D differ considerably in size in all three species (Figure 4).

Figure 4. Comparisons of Gene Structures.

Sequence identities of >91% are observed at the nucleotide and amino acid levels for genes A, B, and E in Arabidopsis and Capsella (Table 1). The tRNASer genes are identical in sequence in both species. Only small stretches of sequences similarly highly conserved can be identified if intron and intergenic sequences are compared between Arabidopsis and Capsella. Tomato genes A, B, and E show between 56 and 69% sequence identity to the corresponding Arabidopsis and Capsella genes at the nucleotide level. For gene D, much lower levels of sequence identity are observed. The copies of Arabidopsis gene C show very similar sequence identity values to Le-C (Table 1). The sequence comparisons reveal that At-C1 is more closely related to Cr-C than to At-C2.

Table 1. Comparison of Exon Sequences of Genes A, B, C, D, and E in Arabidopsis, Capsella, and Tomato

Sizes of introns vary in the three species (Figure 4 and Table 2). Sizes of introns and intergenic sequences in tomato are on average two- to threefold larger than those in Arabidopsis or Capsella (Table 2).

Table 2. Composition of Regions Encompassing Genes A to E in Arabidopsis, Capsella, and Tomato

Hallmarks of retroelements were not found in any of the three genomic regions analyzed. In the tomato region, several perfect or imperfect tandem repeats with sizes of 31 to 85 bp and two to three copies were found in intergenic sequences. BLAST analyses (Altschul et al., 1990) revealed stretches of sequences (<300 bp) in intergenic regions and introns that show homologies of ≥80% with sequences in the analyzed region of the tomato genome and/or other genomic sequences for species of the Solanaceae family that are available in the databases. These results indicate the repetitive nature of these sequences.

DISCUSSION

The comparison of the tomato chromosome 7 region with the corresponding regions of the Arabidopsis and Capsella genomes indicates that microcolinearity can be established if species belonging to different families are studied at the sequence level.

Five genes are present in close physical proximity in all three species. For another two genes, a colinear arrangement could be shown in Arabidopsis and Capsella, but these two genes are not present in the sequenced region of the tomato cosmid contig. Most of the intergenic regions in tomato are expanded in size compared with those in Arabidopsis and Capsella; thus, it cannot be excluded that these genes also are present in the vicinity of gene Le-E in a colinear arrangement on tomato chromosome 7. It has not been established how large the colinear regions are in tomato and the cruciferous species. Molecular marker CD65 resides together with the analyzed 57-kb contig on a yeast artificial chromosome that spans 320 kb (Schumacher et al., 1999); hence, it should be separated by at most 267 kb from gene Le-E on the cosmid contig. Sequences homologous with CD65 in Arabidopsis are located <13 kb from gene At-E on chromosome 1. This region encompasses another five predicted genes in Arabidopsis. If the arrangement of all genes is conserved in tomato, a set of at least 11 genes would be colinear between these two species.

Ku et al. (2000) recently studied a region of tomato chromosome 2 and found evidence for microcolinearity when comparing the region at sequence level with the Arabidopsis genome. An average gene density of one gene per 6.2 kb was calculated for this segment of tomato chromosome 2 (Ku et al., 2000). For the region on tomato chromosome 7 described here, a lower gene density was found (Figure 3A and Table 2). Together, these data indicate that microcolinearity might be found in many different areas of the Arabidopsis and tomato genomes, even if the regions vary with respect to features such as gene density.
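
As a rough check on the density comparison, region size can simply be divided by gene count. The sketch below is a crude whole-region estimate under stated assumptions, not the values of Table 2: it uses the 57-kb tomato contig with its five genes and the 31,500-bp Arabidopsis region with its seven predicted genes (the tRNASer gene included), and the helper name `kb_per_gene` is introduced here for illustration only.

```python
def kb_per_gene(region_kb, gene_count):
    """Average number of kilobases per gene over a whole region."""
    if gene_count <= 0:
        raise ValueError("gene_count must be positive")
    return region_kb / gene_count


tomato_chr7 = kb_per_gene(57.0, 5)        # 57-kb contig, five genes
arabidopsis_chr1 = kb_per_gene(31.5, 7)   # 31,500-bp region, seven predicted genes
tomato_chr2_reported = 6.2                # value quoted from Ku et al. (2000)

print(tomato_chr7, arabidopsis_chr1, tomato_chr2_reported)
```

On this simple measure the tomato chromosome 7 region carries about one gene per 11.4 kb, which is consistent with the statement that it is less gene dense than the chromosome 2 segment.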

Interestingly, Ku et al. (2000) found that some homologous ORFs reside in reversed orientation in the region of tomato chromosome 2 and its Arabidopsis counterparts. The same situation is observed if the orientation of genes A and B relative to each other is compared in the tomato chromosome 7 region and the homologous Arabidopsis region (Figure 3). Ku et al. (2000) propose that genes present in reverse orientation despite residing in otherwise conserved colinear regions might have resulted from inverted gene duplications followed by loss of the gene copy in the original orientation. Alternatively, such changes in the arrangement of genes could be explained by inversion events. Such a mechanism is likely for the observed differences in gene order of genes B, C, D, and E in Arabidopsis and tomato described here (Figure 3).

Tomato and the two species of the Brassicaceae family, Arabidopsis and Capsella, are representative of two major clades of the eudicots (Soltis et al., 1999). The extensive colinearity seen for these distantly related species in the region of interest suggests that such a pattern could also be observed if genomic regions derived from other dicotyledonous plants were compared. Comparative analysis of sequence information generated for many different genomic regions of various dicotyledonous and monocotyledonous plants is needed to assess the degree of microcolinearity between distantly related species in a more comprehensive and systematic manner.

The microcolinearity study presented here for regions of the Arabidopsis and Capsella genomes shows a very extensive conservation of genome structure. Gene repertoire, order, and orientation in the studied 31.5-kb region of the Arabidopsis genome are identical to those in Capsella; furthermore, the genes are present in a similarly sized region in both species (Figure 3). These data are in agreement with results obtained by comparing a region of Arabidopsis chromosome 4 with its counterpart in Capsella (Acarkan et al., 2000) and provide additional evidence for genome colinearity of diploid species of the Brassicaceae family. Extensive colinearity at the gene level also was found in detailed microsynteny studies in the Poaceae family (Chen et al., 1997; Messing and Llaca, 1998; Tikhonov et al., 1999).

In Arabidopsis, gene C is present in two copies. The duplicated gene is part of a recently described large duplicated segment located on Arabidopsis chromosomes 1 and 3 (The Arabidopsis Genome Initiative, 2000; Blanc et al., 2000). The gene repertoire in the Arabidopsis chromosome 1, Capsella, and tomato regions studied is identical but different from that on Arabidopsis chromosome 3 (Figures 2 and 3). Furthermore, analysis of those amino acid positions, which differ in genes At-C1, At-C2, and Cr-C, clearly reveals that gene At-C1, which is located on chromosome 1, is more similar to Cr-C than to At-C2, which maps to chromosome 3 (data not shown). Results of hybridization experiments show that sequences homologous with gene C are found not only in the sequenced region of the Capsella genome (Figure 3C) but also in the immediate vicinity of sequences homologous with genes 8 and 9 (Figure 2). This arrangement is similar or identical to that found for the Arabidopsis chromosome 3 region (data not shown). These data indicate that the duplication event of the region harboring gene C most likely took place before the progenitors of Arabidopsis and Capsella diverged. Comparative physical mapping studies using Arabidopsis and Brassica oleracea also revealed that another duplication event in the Arabidopsis genome predates the divergence of the progenitors of those two species (O'Neill and Bancroft, 2000). Similarly, comparative studies of soybean and Arabidopsis (Grant et al., 2000) as well as tomato and Arabidopsis (Ku et al., 2000) have confirmed the presence of large duplicated segments in the Arabidopsis genome. Comparative analysis of duplicated segments in the Arabidopsis and Brassica genomes has revealed that they show differences in gene repertoire; however, the order of the genes that are in common in both regions was found to be very similar or identical (The Arabidopsis Genome Initiative, 2000; Blanc et al., 2000; O'Neill and Bancroft, 2000). The same pattern was observed in the duplicated region of the Arabidopsis genome studied here (Figure 2). The results of a comparison of the gene arrangement in triplicated Brassica segments among each other and with the corresponding regions in Arabidopsis suggested that differences in gene repertoire are caused by deletions of genes (The Arabidopsis Genome Initiative, 2000; O'Neill and Bancroft, 2000). The same conclusion was reached when it was established that a region of tomato chromosome 2 showed conservation of gene repertoire and order with four different segments of the Arabidopsis genome (Ku et al., 2000).

A comparison of Arabidopsis cDNA sequences or gene predictions with genomic sequences of Arabidopsis and Capsella identified conserved exon sizes and sequence identities of >90% for exon sequences of four of the five genes analyzed, as has been established for a set of four different genes (Acarkan et al., 2000). In contrast, exon sizes of gene D differ in both species, and sequence identities of exon sequences are only ~80% at the nucleotide level. This low degree of sequence conservation coincides with the finding that gene D belongs to the WRKY family of transcription factors, which is composed of >100 representatives in the Arabidopsis genome. The different Arabidopsis gene copies show highly divergent structures but a strong conservation of the WRKY domain (Eulgem et al., 2000). The Le-D gene harbors two WRKY domains (designated D1 and D2), whereas the Arabidopsis (At-D) and Capsella (Cr-D) genes contain only one. Comparison of the sequences of only the WRKY domains (WRKYGQK…HXH) of the D genes of Arabidopsis, Capsella, and tomato shows that the C-terminal domain (D2) of the tomato gene is more similar to the domains in Arabidopsis and Capsella than is the N-terminal domain (sequence identities at the amino acid level: At-D to Cr-D, 86.5%; At-D to Le-D1, 56.9%; At-D to Le-D2, 71.2%; Cr-D to Le-D1, 58.8%; Cr-D to Le-D2, 69.2%; and Le-D1 to Le-D2, 58.8%). Furthermore, the position of the fourth intron is conserved in the C-terminal WRKY domain in the tomato gene and in the WRKY domains of the Arabidopsis and Capsella genes; in contrast, the N-terminal WRKY domain (Le-D1) is not interrupted by an intron.
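
Identity values of the kind listed above are computed from pairwise alignments. The following Python sketch shows one common way to do this for two sequences that have already been aligned. It rests on an assumption that is not taken from this study: identity is counted only over columns in which neither sequence has a gap. The function name `percent_identity` and the toy peptide strings are hypothetical and are not the WRKY domains analyzed here.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the ungapped columns of two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns that contain a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned peptides (hypothetical): one substitution and one gapped column.
print(round(percent_identity("WRKYGQKA-C", "WRKYGEKAGC"), 1))
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so values obtained in this way are comparable only if the same denominator is used throughout.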

The example of gene D indicates the utility of microcolinearity studies. For large gene families, it can be difficult to define unambiguously the orthology of genes derived from different species, especially if only subsets of sequences are known. FASTA searches were performed with all genes of the tomato chromosome 7 region to find the corresponding sequences in the Arabidopsis genome. Interestingly, for all genes, with the exception of gene Le-D, the homologous Arabidopsis chromosome 1 or the duplicated chromosome 3 region was identified unambiguously. For the much faster evolving gene D, several additional regions were identified, some of which even showed higher FASTA scores. This finding shows that for such genes orthology can be defined unambiguously only if sequence information is combined with data on neighboring sequences.
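
The reasoning in this paragraph, that similarity scores alone can mislead for members of large gene families and that neighboring genes settle the question, can be expressed as a simple ranking rule. The Python sketch below is an illustration under stated assumptions and not the analysis performed here: candidate loci are ranked first by the number of neighboring gene families they share with the query region and only then by similarity score. The function name `pick_ortholog`, the locus labels, the scores, and the family labels are all hypothetical.

```python
def pick_ortholog(candidates, query_neighbors):
    """Choose the candidate with the most shared neighbors, then the best score.

    candidates: list of dicts with keys "locus", "score", and "neighbors"
                (a set of gene-family labels found next to the candidate).
    query_neighbors: set of gene-family labels found next to the query gene.
    """
    def support(candidate):
        shared = len(query_neighbors & candidate["neighbors"])
        return (shared, candidate["score"])  # synteny first, score second

    return max(candidates, key=support)["locus"]


# Hypothetical hits for a fast-evolving family member: the best-scoring hit
# lies in an unrelated neighborhood, whereas a weaker hit has conserved neighbors.
hits = [
    {"locus": "locus_1", "score": 900, "neighbors": {"family_X"}},
    {"locus": "locus_2", "score": 700, "neighbors": {"family_C", "family_E"}},
    {"locus": "locus_3", "score": 650, "neighbors": set()},
]

print(pick_ortholog(hits, {"family_C", "family_E"}))
```

With these inputs the lower-scoring `locus_2` is selected because both of its neighbors match the query region. When no candidate shares a neighbor with the query, the rule falls back to the similarity score alone.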

This approach was followed for the Lateral suppressor gene, Le-A (Schumacher et al., 1999). Le-A is a member of the GRAS gene family, for which several members with regulatory functions have been described in Arabidopsis (GAI [Peng et al., 1997], RGA [Silverstone et al., 1998], SCR [Di Laurenzio et al., 1996]). Several sequences of unknown function also are found in the Arabidopsis genome (Pysh et al., 1999). However, only one of these Arabidopsis gene sequences, At-A, is located adjacent to a gene homologous with the neighboring gene Le-B in tomato. Thus, it can be concluded that Le-A and At-A are derived from the same ancestral gene. Analysis of gene function was performed for gene At-A to determine whether At-A and the Lateral suppressor gene perform similar or identical functions in Arabidopsis and tomato. Such studies indicate that Le-A and At-A are indeed functional orthologs (K. Theres, unpublished results). Thus, microcolinearity studies can be used successfully to locate orthologous gene sequences in distantly related plant species.

In the tomato genomic region analyzed, intergenic regions are much larger than the corresponding regions of the Arabidopsis and Capsella genomes. The abundance of retroelement-like sequences in the maize genome (SanMiguel et al., 1996) is correlated with the size differences observed in intergenic sequences of the sorghum and maize genomes (Chen et al., 1997; Tikhonov et al., 1999). However, none of the intergenic sequences analyzed here for Arabidopsis, Capsella, and tomato show the hallmarks of retroelement-like sequences. The absence of such sequences in the Arabidopsis region is in accordance with the low abundance of these elements in the genome of Arabidopsis (reviewed in Schmidt, 1998).

METHODS

Isolation of Tomato Cognate cDNAs
A tomato (Lycopersicon esculentum cv VFNT Cherry) shoot tip cDNA library was screened using yeast artificial chromosome 61-5, the whole cosmid contig, or the insert of cosmid G as a probe (Figure 1). Partial cDNA sequences of genes Le-A, Le-B, Le-D, and Le-E were complemented with the rapid amplification of cDNA ends technique (Life Technologies, Eggenstein, Germany) using specific primers deduced from the respective cDNAs. Amplified fragments were cloned into the pGEM-T (Promega, Mannheim, Germany) plasmid vector.

The genomic tomato sequence of the region of interest was used for a BLAST analysis (Altschul et al., 1990) to search for corresponding tomato expressed sequence tag (EST) sequences.

Isolation of Capsella Cosmid Clones
A library of 46,000 Capsella rubella cosmid DNA clones was screened by colony hybridization. Preparation of library filters, hybridization, and washing conditions were as described by Acarkan et al. (2000).

Isolation of Arabidopsis Cognate cDNAs
The genomic Arabidopsis thaliana sequence of the region of interest (F20N2; GenBank accession number AC002328) was used for a BLAST analysis (Altschul et al., 1990) to search for corresponding Arabidopsis EST sequences. Putative cognate cDNA clones were obtained from the Arabidopsis Biological Resource Center (Ohio State University, Columbus).

Subcloning and Sequencing
Cosmid clones were restricted with appropriate enzymes and cloned into pGEM vectors (Promega). DNA sequences of subclones and polymerase chain reaction (PCR) products were determined on Applied Biosystems (Weiterstadt, Germany) Abi Prism 377 and 3700 sequencers using BigDye-terminator chemistry by the DNA core facility of the Max-Planck-Institut für Züchtungsforschung (Cologne, Germany). Premixed reagents were from Applied Biosystems. Oligonucleotides were purchased from Life Technologies or Metabion (Martinsried, Germany). Gene sequences were sequenced on both strands. Analysis of sequences was performed using the Wisconsin Package (version 10.0-UNIX; Genetics Computer Group, Madison, WI). Sequence alignments of gene sequences were restricted to regions from start to stop codons and were determined with the GAP program using default parameters (for alignments of nucleotide sequences, gap creation penalty 50 and gap extension penalty 3; for alignments of amino acid sequences, gap creation penalty 8 and gap extension penalty 2).
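
To make the role of the two gap parameters concrete, the following Python sketch scores a global alignment with affine gap penalties by dynamic programming. It is an independent illustration and not the GAP program of the Wisconsin Package. It assumes that a gap of length k costs the creation penalty plus k times the extension penalty, and its match and mismatch scores (10 and 0) are placeholders rather than the scoring matrix used by GAP. Only the nucleotide penalties of 50 and 3 are taken from the text.

```python
NEG_INF = float("-inf")


def global_affine_score(a, b, match=10, mismatch=0, gap_open=50, gap_extend=3):
    """Best global alignment score of a and b under affine gap penalties."""
    n, m = len(a), len(b)
    # M: alignment ends in an aligned pair; X: ends with a residue of a
    # opposite a gap; Y: ends with a residue of b opposite a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + i * gap_extend)   # leading gap in b
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + j * gap_extend)   # leading gap in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + pair
            X[i][j] = max(
                M[i - 1][j] - gap_open - gap_extend,   # open a new gap
                Y[i - 1][j] - gap_open - gap_extend,   # open after the other gap type
                X[i - 1][j] - gap_extend,              # extend the existing gap
            )
            Y[i][j] = max(
                M[i][j - 1] - gap_open - gap_extend,
                X[i][j - 1] - gap_open - gap_extend,
                Y[i][j - 1] - gap_extend,
            )
    return max(M[n][m], X[n][m], Y[n][m])


print(global_affine_score("ACGT", "ACGT"))  # four matches
print(global_affine_score("ACGT", "ACT"))   # three matches and one single-base gap
```

Because opening a gap is far more expensive than extending one, this scheme favors a few long gaps over many short ones, which is a sensible choice when coding sequences that differ by intron-sized or exon-sized insertions are compared.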

Acknowledgments

We are grateful to E. Tillmann and E. Schäfer for excellent technical assistance. This work would not have been possible without the data on the Arabidopsis genome that was generated by the Arabidopsis genome initiative. Arabidopsis ESTs were provided by the Arabidopsis Biological Resource Center (Ohio State University, Columbus). This research was funded by grants from the Deutsche Forschungsgemeinschaft to K.T., by Grant 0311107 from the Bundesministerium für Bildung und Forschung to R.S., and by a European Community Marie Curie Research Training Grant to R.H.

References
  • Acarkan, A., Rossberg, M., Koch, M., and Schmidt, R. (2000). Comparative genome analysis reveals extensive conservation of genome organisation for Arabidopsis thaliana and Capsella rubella. Plant J. 23, 55–62.
  • Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990). Basic local alignment search tool. J. Mol. Biol. 215, 403–410.
  • The Arabidopsis Genome Initiative. (2000). Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815.
  • Blanc, G., Barakat, A., Guyot, R., Cooke, R., and Delseny, M. (2000). Extensive duplication and reshuffling in the Arabidopsis genome. Plant Cell 12, 1093–1102.
  • Chen, M., SanMiguel, P., de Oliveira, A.C., Woo, S.-S., Zhang, H., Wing, R.A., and Bennetzen, J.L. (1997). Microcolinearity in sh2-homologous regions of the maize, rice, and sorghum genomes. Proc. Natl. Acad. Sci. USA 94, 3431–3435.
  • Devos, K.M., Beales, J., Nagamura, Y., and Sasaki, T. (1999). Arabidopsis-rice: Will colinearity allow gene prediction across the eudicot–monocot divide? Genome Res. 9, 825–829.
  • Di Laurenzio, L., Wysocka-Diller, J., Malamy, J.E., Pysh, L., Helariutta, Y., Freshour, G., Hahn, M.G., Feldmann, K.A., and Benfey, P.N. (1996). The SCARECROW gene regulates an asymmetric cell division that is essential for generating the radial organization of the Arabidopsis root. Cell 86, 423–433.
  • Eulgem, T., Rushton, P.J., Robatzek, S., and Somssich, I.E. (2000). The WRKY superfamily of plant transcription factors. Trends Plant Sci. 5, 199–206.
  • Gale, M.D., and Devos, K.M. (1998). Comparative genetics in the grasses. Proc. Natl. Acad. Sci. USA 95, 1971–1974.
  • Grant, D., Cregan, P., and Shoemaker, R.C. (2000). Genome organization in dicots: Genome duplication in Arabidopsis and synteny between soybean and Arabidopsis. Proc. Natl. Acad. Sci. USA 97, 4168–4173.
  • Grant, M.R., McDowell, J.M., Sharpe, A.G., de Torres Zabala, M., Lydiate, D.J., and Dangl, J.L. (1998). Independent deletions of a pathogen-resistance gene in Brassica and Arabidopsis. Proc. Natl. Acad. Sci. USA 95, 15843–15848.
  • Höfte, H., et al. (1993). An inventory of 1152 expressed sequence tags obtained by partial sequencing of cDNAs from Arabidopsis thaliana. Plant J. 4, 1051–1061.
  • Ku, H.-M., Vision, T., Liu, J., and Tanksley, S.D. (2000). Comparing sequenced segments of the tomato and Arabidopsis genomes: Large-scale duplication followed by selective gene loss creates a network of synteny. Proc. Natl. Acad. Sci. USA 97, 9121–9126.
  • Messing, J., and Llaca, V. (1998). Importance of anchor genomes for any plant genome project. Proc. Natl. Acad. Sci. USA 95, 2017–2020.
  • Newman, T., de Bruijn, F.J., Green, P., Keegstra, K., Kende, H., McIntosh, L., Ohlrogge, J., Raikhel, N., Somerville, S., Thomashow, M., Retzel, E., and Somerville, C. (1994). Genes galore: A summary of methods for accessing results from large-scale partial sequencing of anonymous Arabidopsis cDNA clones. Plant Physiol. 106, 1241–1255.
  • O'Neill, C.M., and Bancroft, I. (2000). Comparative physical mapping of segments of the genome of Brassica oleracea var. alboglabra that are homoeologous to sequenced regions of chromosomes 4 and 5 of Arabidopsis thaliana. Plant J. 23, 233–244.
  • Paterson, A.H., et al. (1996). Toward a unified genetic map of higher plants, transcending the monocot–dicot divergence. Nat. Genet. 14, 380–382.
  • Pearson, W.R., and Lipman, D.J. (1988). Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85, 2444–2448.
  • Peng, J., Carol, P., Richards, D.E., King, K.E., Cowling, R.J., Murphy, G.P., and Harberd, N.P. (1997). The Arabidopsis GAI gene defines a signaling pathway that negatively regulates gibberellin responses. Genes Dev. 11, 3194–3205.
  • Pysh, L.D., Wysocka-Diller, J.W., Camilleri, C., Bouchez, D., and Benfey, P.N. (1999). The GRAS gene family in Arabidopsis: Sequence characterization and basic expression analysis of the SCARECROW-LIKE genes. Plant J. 18, 111–119.
  • Quackenbush, J., Liang, F., Holt, I., Pertea, G., and Upton, J. (2000). The TIGR gene indices: Reconstruction and representation of expressed gene sequences. Nucleic Acids Res. 28, 141–145.
  • SanMiguel, P., Tikhonov, A., Jin, Y.-K., Motchoulskaia, N., Zakharov, D., Melake-Berhan, A., Springer, P.S., Edwards, K.J., Lee, M., Avramova, Z., and Bennetzen, J.L. (1996). Nested retrotransposons in the intergenic regions of the maize genome. Science 274, 765–768.
  • Sasaki, T., and Burr, B. (2000). International genome sequencing project: The effort to completely sequence the rice genome. Curr. Opin. Plant Biol. 3, 138–141.
  • Schmidt, R. (1998). The Arabidopsis thaliana genome: Towards a complete physical map. In Arabidopsis: Annual Plant Reviews, Vol. I, M. Anderson and J.A. Roberts, eds (Sheffield, UK: Sheffield Academic Press), pp. 1–30.
  • Schmidt, R. (2000). Synteny: Recent advances and future prospects. Curr. Opin. Plant Biol. 3, 97–102.
  • Schumacher, K., Schmitt, T., Rossberg, M., Schmitz, G., and Theres, K. (1999). The Lateral suppressor (Ls) gene of tomato encodes a new member of the VHIID protein family. Proc. Natl. Acad. Sci. USA 96, 290–295.
  • Silverstone, A.L., Ciampaglio, C.N., and Sun, T.-P. (1998). The Arabidopsis RGA gene encodes a transcriptional regulator repressing the gibberellin signal transduction pathway. Plant Cell 10, 155–169.
  • Soltis, P.S., Soltis, D.E., and Chase, M.W. (1999). Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology. Nature 402, 402–404.
  • Tanksley, S.D., et al. (1992). High density molecular linkage maps of the tomato and potato genomes. Genetics 132, 1141–1160.
  • Tikhonov, A.P., SanMiguel, P.J., Nakajima, Y., Gorenstein, N.M., Bennetzen, J.L., and Avramova, Z. (1999). Colinearity and its exceptions in orthologous adh regions of maize and sorghum. Proc. Natl. Acad. Sci. USA 96, 7409–7414.
  • van Dodeweerd, A.M., Hall, C.R., Bent, E.G., Johnson, S.J., Bevan, M.W., and Bancroft, I. (1999). Identification and analysis of homoeologous segments of the genomes of rice and Arabidopsis thaliana. Genome 42, 887–892.