Genetics. 2006 April; 172(4): 2269–2281.
doi: 10.1534/genetics.105.052746.
PMCID: PMC1456376
Concerted Evolution of Two Novel Protein Families in Caenorhabditis Species
James H. Thomas1
Department of Genome Sciences, University of Washington, Seattle, Washington 98195
1Address for correspondence: Department of Genome Sciences, Box 357360, University of Washington, Seattle, WA 98195. E-mail: jht/at/u.washington.edu
Communicating editor: S. Yokoyama
Received October 22, 2005; Accepted January 16, 2006.
Abstract
Among a large number of homologous gene clusters in C. elegans, two gene families that appear to undergo concerted evolution were studied in detail. Both gene families are nematode specific and encode small secreted proteins of unknown function. For both families in three Caenorhabditis species, concerted groups of genes are characterized by close genomic proximity and by genes in inverted orientation. The rate of protein evolution in one of the two families could be calibrated by comparison with a closely related nonconcerted singleton gene with one-to-one orthologs in all three species. This comparison suggests that protein evolution in concerted gene clusters is two- to sevenfold accelerated. A broader survey of clustered gene families, focused on adjacent inverted gene pairs, identified an additional seven families in which concerted evolution probably occurs. All nine identified families encode relatively small proteins, eight of them encode putative secreted proteins, and most of these have very unusual amino acid composition or sequence. I speculate that these genes encode rapidly evolving antimicrobial peptides.
 
THE genomes of plants and animals contain an abundance of gene families that arose by duplication and divergence of ancestral genes. In many cases, some or all members of a gene family are closely related to each other in sequence, suggesting that they arose by recent duplication events, are subject to intense purifying selection, or undergo concerted evolution as a result of intergenic recombination. These explanations can be difficult to distinguish, yet their mechanism of origin and their evolutionary implications are very different. The increasingly common availability of multiple related whole-genome sequence assemblies offers a potentially general method for identifying cases of concerted evolution. If a group of closely related genes is identified in one species and a corresponding group of genes is identified in other related species, these genes are candidates for undergoing concerted evolution. If the groups of genes in all the species have diverged very little from each other, this situation can be explained, without invoking concerted evolution, by strong purifying selection resulting in a low rate of sequence evolution. This explanation suffices to explain evolution in many well-known families, including histone and ubiquitin gene families (Nei and Rooney 2004). However, if the groups of genes in each species have diverged substantially from each other, yet have retained a high degree of sequence identity within each species, the situation is best explained by concerted evolution. In effect, due to ongoing genetic exchange among different genes within each species, the genes as a group can behave as if they were alleles of a single gene. Upon speciation, the genes can continue to evolve as groups in species-specific lineages in a manner similar to divergence of single genes.

In Caenorhabditis elegans, gene duplications occur predominantly in tandem and locally, resulting in two nearby identical (or nearly identical) copies of the gene (Semple and Wolfe 1999). The proximity and identity of new duplicate genes is likely to permit genetic exchange between the two genes, either by gene conversion or by unequal crossing over (the latter will also result in unstable gene copy number). When such genetic exchange occurs over a sufficiently long time, it results in concerted evolution of the duplicate genes. It is unknown how often such exchange happens, how long it persists over time, and how much it contributes to the pattern of divergence of new duplicates. The duration of incipient concerted evolution of duplicate genes will depend on factors including the frequency of intergenic recombination, the rate of mutation, the degree of dependence of intergenic recombination on sequence identity, and the time of retention of both copies of the duplicate genes. It is clear that duplicate genes frequently escape from concerted evolution even when they remain adjacent in the genome, as evidenced by hundreds of cases of local duplicates that have diverged from each other substantially (Semple and Wolfe 1999; Robertson 2000, 2001; Coghlan and Wolfe 2004; Chen et al. 2005; Thomas 2005; Thomas et al. 2005). Indeed, only a single case of clear concerted evolution has been described in nematodes, involving the Hsp70-7 and Hsp70-8 gene pair (Nikolaidis and Nei 2004). From a survey of clustered homologous genes in C. elegans (Thomas 2005) and comparison of those genes with the sequences of the related nematodes C. briggsae and C. remanei, I report several additional cases of clustered genes that probably undergo concerted evolution. One of these cases permits a comparison of evolutionary rate in the concerted genes and nonconcerted relatives; this comparison suggests that protein sequence evolution in the concerted genes is accelerated two- to sevenfold.

MATERIALS AND METHODS

Gene annotation:
A few of the gene models in C. elegans and C. briggsae required correction and all of the genes in C. remanei were new predictions. These predictions were based on protein motif searches, tblastn searches, and conserved intron position essentially as previously described (Thomas et al. 2005). All genes in the Nspb and Nspc families were identified and annotated, with the possible exception of unsequenced regions in C. briggsae and C. remanei. Because of the high degree of similarity in each family and the availability of multiple gene family members, the gene models could be derived with considerable confidence; in addition, several genes in the Nspc family have EST sequence support. A few genes in each family appeared to extend into unsequenced gaps between contigs and a few others had clear defects, mostly deletions at one end of the gene. These were classified as “incomplete sequence” and pseudogenes, respectively. Phylogenetic trees and alignments are shown only for the putative functional genes with complete sequence, and in some figures the other types of genes are shown marked “i” (incomplete sequence) or “ψ” (pseudogene). Nspb proteins correspond to PFAM07312 (DUF1459), which is annotated as nematode specific (http://www.sanger.ac.uk/software/pfam/2005). Psi-blast searches on 7/13/2005 on the NCBI nr database initiated with NSPB or NSPC proteins failed to identify any members in species outside of Caenorhabditis. Full annotation information was provided to WormBase (http://wormbase.org/).

The unusually high degree of nucleotide similarity among genes, which also includes introns, raised the possibility that these are not protein-coding genes but instead are aberrant gene predictions on the basis of a midrepeat DNA sequence that has fortuitous features consistent with a coding gene. Several analyses taken together rule this possibility out: (1) comparative codon analysis indicates low dN/dS ratios characteristic of coding sequence; (2) several of the genetic variants in the Nspb family are indels that are invariably in frame; (3) intron positions within groups are perfectly conserved; (4) several genes in the Nspc family are abundantly transcribed, as indicated by multiple EST sequences (http://wormbase.org/); and (5) patterns of gene conservation across the three species are consistent with coding sequence (e.g., retention of intron position but not intron sequence).

DNA distances and dN/dS analysis:
DNA distances were computed for genomic DNA extending from the ATG start codon through the stop codon, including intron sequence. DNA sequences were aligned using ClustalX with default settings (Jeanmougin et al. 1998) and distances were computed from the multiple alignment by the dnadist program from PHYLIP (Felsenstein 1993), using the Kimura 2-parameter model and transition/transversion ratio of 1.7 (Denver et al. 2004). dN/dS values were computed by the codeml program from PAML 3.14 (Yang 1997). Proteins were aligned by ClustalX (default settings) and in some cases were hand adjusted with Bonsai 1.1 (Thomas 2004). The protein alignment was used to generate the corresponding codon alignment and codeml was run in pairwise mode, with a transition/transversion ratio of 1.7 (Denver et al. 2004).
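As an illustration of the distance measure named above, the following is a minimal Python sketch of the standard closed-form Kimura 2-parameter estimator for one pair of aligned sequences. It is not the PHYLIP dnadist implementation: dnadist was run here with a fixed transition/transversion ratio of 1.7, whereas this sketch estimates the transition and transversion proportions directly from the pair. The function name and the decision to skip gapped or ambiguous columns are assumptions made for the example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Columns containing a gap or ambiguity code in either sequence are
    skipped (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguity codes
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    # d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)); raises ValueError when the
    # sequences are too divergent for the estimate to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For a pair of closely related genes the estimate is close to the raw proportion of differing sites; the correction grows as the sequences diverge.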

Shared DNA:
VISTA plots (AVID alignment algorithm) were generated, comparing genomic sequence for each gene in a cluster to each other cluster member. The points at which the VISTA plot rose above 75% sequence identity were used as the endpoints of shared sequence. In most cases, the change to shared sequence was abrupt and the boundary was clear; in a few cases there was a region of variable alignment (none > ~100 nt), apparently where the last genetic exchange was old. In these cases, the region of contiguous high identity was used as the length of shared sequence.
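The endpoint rule described above can be sketched in Python as follows. This is only an approximation of the procedure: it does not reproduce the AVID aligner or the VISTA smoothing, and the window size, the function names, and the half-open coordinate convention are assumptions introduced for the example. It computes percent identity in sliding windows over a gapped pairwise alignment and then reports the longest contiguous run of windows above the 75% threshold.

```python
def window_identity(aln_a, aln_b, window=20):
    """Percent identity in each sliding window of a gapped pairwise alignment.

    The window size is a hypothetical choice, not a value from the text.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # A column counts as a match only if both residues agree and are not gaps.
    matches = [int(x == y and x != "-") for x, y in zip(aln_a, aln_b)]
    return [
        100.0 * sum(matches[i:i + window]) / window
        for i in range(len(matches) - window + 1)
    ]


def longest_run_above(values, threshold=75.0):
    """Return (start, end) of the longest contiguous run with value > threshold.

    Indices are half-open; returns None if no value exceeds the threshold.
    """
    best = None
    start = None
    # The trailing sentinel closes a run that extends to the last value.
    for i, v in enumerate(list(values) + [float("-inf")]):
        if v > threshold:
            if start is None:
                start = i
        elif start is not None:
            if best is None or i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best
```

Taking the longest contiguous run mirrors the handling of the few cases with a region of variable alignment, where only the contiguous high-identity region was counted as shared sequence.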

Identification of additional candidates for concerted evolution:
An all-by-all blastp search was conducted using the complete set of predicted proteins from WormBase release WS148. All matches with 90% amino acid identity or higher were collected. Annotated transposases and a few known families were removed (histone, MSP, collagen, tubulin) to simplify analysis; concerted evolution in these large families is thus possible and was not investigated. The remaining matches were screened manually for cases that involved nearby inverted genes and each of these cases was investigated individually. A few cases were discarded when gene model conflicts or other complications appeared too severe to easily resolve. For the remaining candidate duplicate genes, predictions were generated for C. briggsae and C. remanei using a combination of existing C. briggsae predictions, a GeneWise (Birney et al. 2004) prediction pipeline guided by the C. elegans candidate proteins, and hand prediction. Probable concerted evolution was inferred when sequences from the three species clearly formed species-specific groups, consistent with a pattern similar to that documented for the Nspb and Nspc families. Most cases did not meet this criterion: the genes from the three species formed ortholog matches or more complex relationships suggesting birth–death evolution rather than concerted evolution.
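The first screening steps above (matches at 90% identity or higher, then nearby genes in inverted orientation) could be approximated programmatically as in the sketch below. In the study itself the positional screen was done manually, so this is only an illustration. The `Gene` record, the input format of `(query, subject, percent identity)` tuples, and the 10-kb `max_gap` cutoff are all hypothetical; the text specifies only "nearby."

```python
from typing import NamedTuple


class Gene(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def inverted_neighbor_pairs(hits, genes, min_identity=90.0, max_gap=10_000):
    """Return sorted gene pairs that are highly similar, nearby, and inverted.

    hits:  iterable of (query, subject, percent_identity) tuples.
    genes: dict mapping gene name -> Gene coordinates.
    max_gap is a hypothetical distance cutoff in nucleotides.
    """
    pairs = set()
    for query, subject, identity in hits:
        if query == subject or identity < min_identity:
            continue  # drop self-matches and weak matches
        a, b = genes.get(query), genes.get(subject)
        if a is None or b is None or a.chrom != b.chrom:
            continue
        if a.strand == b.strand:
            continue  # keep only pairs on opposite strands
        gap = max(a.start, b.start) - min(a.end, b.end)
        if gap <= max_gap:
            pairs.add(tuple(sorted((query, subject))))
    return sorted(pairs)
```

Each pair returned this way would still need the individual follow-up described above, including gene prediction in the other two species and a test for species-specific grouping.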

RESULTS

Gene identification:
Using comparative genomics, genes undergoing concerted evolution can be recognized by the presence of groups of genes whose sequences appear to evolve as if they were alleles of a single gene (Figure 1 shows two real examples, marked with bars). Genetic exchange among genes in such groups is probably facilitated by physical clustering of the genes. In previous work, I identified a large number of gene families that include clustered genes in the C. elegans genome (Thomas 2005). Most such gene clusters appear to arise by local gene duplication followed by divergence, without any obvious indication of concerted evolution among duplicate copies (data not shown). However, two gene families consisted of clusters of genes encoding proteins with unusually high protein sequence similarity. Findings presented below confirm that clustered genes within these two families undergo concerted evolution. Both families are predicted to encode small secreted proteins that are unrelated to each other and unrelated to any known proteins outside of nematodes. They have been assigned the gene family names nspb and nspc (nematode-specific peptide families b and c). The specific function of the genes is unknown. RNA-interference-mediated knockdown of most genes caused no gross phenotype, and the one group with a phenotype (nspb-1–5) varied from lethal to sterile (Kamath and Ahringer 2003). Several of the genes are known to be transcribed on the basis of EST analysis, but none have a known tissue expression pattern and other annotations provide no obvious clue to function (http://wormbase.org/).
Figure 1.
Protein tree and alignment for the Nspb family. Unrooted protein distance tree and alignments for all full-length NSPB proteins from C. elegans, C. briggsae, and C. remanei. In the tree, C. elegans proteins are labeled green, C. briggsae blue, and C. (more ...)

Figure 1 shows a protein distance tree for all full-length NSPB proteins from C. elegans, C. briggsae, and C. remanei. NSPB proteins fall into two subfamilies that are related to each other but align poorly; a multiple alignment within each subfamily is shown next to the tree. Figure 2 shows a similar tree and alignment for NSPC proteins, all of which align well with each other. In both families, a prominent feature of the protein trees is that each of the three species encodes groups of proteins that are more closely related to each other than they are to any protein from the other species. Although it is formally possible that multiple recent duplications occurred independently in each species, results presented below confirm that the pattern results from concerted evolution of physically clustered genes.
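The pattern described above, in which proteins from one species are more closely related to each other than to any protein from another species, can be expressed as a simple check on a pairwise distance matrix. The Python sketch below is a crude proxy only; the conclusions in this article rest on trees with bootstrap support, not on this test. The function name and the `frozenset`-keyed distance dictionary are assumptions chosen for the example.

```python
from itertools import combinations


def species_specific_groups(dist, species_of):
    """For each species, test whether its genes form a tight group.

    dist:       dict mapping frozenset({gene_a, gene_b}) -> distance.
    species_of: dict mapping gene name -> species label.

    A species passes if its largest within-species distance is smaller than
    the smallest distance from any of its genes to a gene of another species.
    A species with a single gene passes trivially.
    """
    within, between = {}, {}
    for a, b in combinations(sorted(species_of), 2):
        d = dist[frozenset((a, b))]
        if species_of[a] == species_of[b]:
            sp = species_of[a]
            within[sp] = max(within.get(sp, 0.0), d)
        else:
            for sp in (species_of[a], species_of[b]):
                between[sp] = min(between.get(sp, float("inf")), d)
    return {
        sp: within.get(sp, 0.0) < between.get(sp, float("inf"))
        for sp in set(species_of.values())
    }
```

Under one-to-one orthology or birth–death evolution, the closest relative of a gene is typically found in another species, so the check fails for the species involved.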

Figure 2.
Protein tree and alignment for the Nspc family. Unrooted protein distance tree and alignments for a subset of full-length NSPC proteins from C. elegans, C. briggsae, and C. remanei. Orange squares mark 10 conserved Cys residues, which are likely to form (more ...)

Nspb genomic clusters:
In C. elegans, the 12 nspb genes fall into two subfamilies by various criteria, including protein similarity and the position of their single intron (Figure 1). I will refer to these subfamilies as group A and group B. All six genes in group A are on chromosome IV, and five of these (nspb-1–5) are in a compact genomic cluster (Figure 3) and are very closely related in sequence. The sixth gene in group A, nspb-6, is ~690 kb away from this cluster and is substantially divergent from them in sequence. All six group A genes have a single predicted intron in the same position relative to the aligned proteins (Figure 1). Four of the six genes have 195 nt of coding sequence in exon 1 and 51 nt of coding sequence in exon 2; exons in the other two genes differ by one codon in length from this norm. A strikingly similar pattern is seen for the five group B genes on chromosome II: four genes (nspb-7–10) are in a compact cluster and are very similar in sequence, and one gene (nspb-11) is located ~165 kb away and is substantially divergent in sequence. All the group B genes have 14 nt of coding sequence in exon 1, all four compactly clustered genes have 223 nt of coding sequence in exon 2, and the distant gene has a single extra codon. Clustering patterns in both groups in C. briggsae and C. remanei are similar to those in C. elegans, except that all of the genes are in tight clusters and C. remanei group B genes are in two separate clusters (Figure 3). The final nspb gene, nspb-12, is on a chromosome different from either cluster and is considered separately below.
Figure 3.
Schematic of Nspb clusters. All clustered nspb genes from C. elegans, C. briggsae, and C. remanei. nspb genes on the plus strand are pink, nspb genes on the minus strand are blue green, and non-nspb genes are white (shown only for C. elegans). For C. (more ...)

I interpret the Nspb cluster arrangements in C. elegans in the following way. Each gene cluster undergoes occasional genetic exchange among genes in its cluster, thereby maintaining a high degree of sequence similarity. The two genes that are farther away on the same chromosome are singleton escapers from their nearby cluster, which were once part of the cluster but were separated by genome rearrangement and subsequently underwent individual evolution. Further support for these interpretations is presented below.

Nspc genomic clusters:
The genomic arrangement of the nspc genes in C. elegans follows a pattern similar to that of the nspb genes. The 18 full-length nspc genes are arranged in three clusters of five, six, and seven genes (Figure 4). Gene arrangements in C. briggsae and C. remanei are broadly similar (data not shown). All the nspc genes have identical intron positions, and genes within each cluster have high nucleotide identity. A few differences from the nspb genes are apparent as well: there are no escaper singleton genes for any of the clusters, the genomic arrangement and divergence of genes in C. briggsae and C. remanei differ more from C. elegans than for nspb genes, and there are no internal indel differences among proteins (compare Figures 1 and 2). I conclude that there are several core features that characterize both Nspb and Nspc gene families: strong gene clustering, probable genetic exchange among clustered genes, and divergence among physically separated genes.
Figure 4.
Schematic of Nspc clusters. All nspc genes from C. elegans. Markings are as in Figure 3. Clusters in C. briggsae and C. remanei had generally similar arrangements, except that there are more clusters in C. remanei.

Inverted orientation of gene pairs:
Gene clusters in both the Nspb and Nspc families have a strong tendency to consist of two or more pairs of closely spaced genes with inverted orientation (Figures 3 and 4). In some clusters, one or two additional nspb or nspc genes are added to this configuration, with the additional gene often located at a greater distance from the main cluster. In other clusters, one gene from this standard configuration is missing or is a probable pseudogene. I interpret both types of exceptional cases to represent genome rearrangements or mutations that occurred relatively recently. The inverted pairs of genes are typically within a few kilobases of each other, and there is often an unrelated gene or two in between adjacent pairs (in Figure 3 other predicted genes are shown only for C. elegans). It seems likely that these arrangements are important for intracluster genetic exchange or stability because they characterize both gene families and persist across all three species. A similar arrangement of local inverted genes was found for the Hsp70-7 and Hsp70-8 genes, the only other case of probable concerted evolution described in C. elegans (Nikolaidis and Nei 2004).

Concerted DNA sequence evolution:
To provide sufficient variation for statistical analysis, partitioning of genes on trees was tested using DNA sequence multiple alignments. Trees were constructed by the maximum-likelihood method and bootstrapping as implemented in PHYLIP (Felsenstein 1993). The two Nspb subfamilies were tested separately because they align poorly to each other. Intron sequences were included for the Nspb family, but introns were excluded for the Nspc family because alignment quality in introns across clusters was poor. For each Nspb subfamily, physically clustered genes grouped together on trees with bootstrap support of at least 90% in all cases (supplemental Figures 1 and 2 at http://www.genetics.org/supplemental/). For the Nspc family, physically clustered genes grouped together on the tree in all cases, with high bootstrap support in all but a few cases (supplemental Figure 3 at http://www.genetics.org/supplemental/). Other conserved features of clustered genes, including indel positions and positions of translation start and stop (see Figures 1 and 2), are not included in these bootstrap tests and strongly corroborate them. These results confirm that the intuitively apparent results observed in the protein trees and alignments are statistically significant.

The fully assembled C. elegans genome was used to assess the relationship between genome position and degree of sequence similarity among gene family members. Detailed views of this information are given for one Nspb gene cluster (Figure 5) and for one Nspc gene cluster (Figure 6). The only strong correlation was that clustered genes are closely related and are divergent from genes that are physically distant, as described above. Attempts to correlate nucleotide divergence among genes within each cluster with other gene features gave variable results. Specifically, correlations with length of shared sequence, relative gene orientation, and distance between genes were all weak. Of these, the best correlate was the length of shared sequence (supplemental Table 1 at http://www.genetics.org/supplemental/), supporting a homology-dependent mechanism of concerted evolution. However, the correlation was imperfect, as expected if concerted genetic exchange were rare and stochastic.

Figure 5.
Detailed schematic of Nspb cluster A. Gene schematic markings are as in Figure 3 except that Nspb genes are shaded. (Bottom) A nucleotide-based maximum-likelihood gene tree and tables of nucleotide identity and length of shared nucleotide sequence. The (more ...)
Figure 6.
Detailed schematic of Nspc cluster C. Gene schematic markings are as in Figure 5. (Bottom) A nucleotide-based maximum-likelihood gene tree and tables of nucleotide identity and length of shared nucleotide sequence. The nucleotide alignments on which the (more ...)

Shared nucleotide sequences among cluster genes were visualized using pairwise VISTA and DNA dot plots (Sonnhammer and Durbin 1995; Mayor et al. 2000). Pairwise and multiple DNA alignments confirmed the same patterns (data not shown). An example is shown in Figure 7, which is a dot plot of genomic DNA that includes the five clustered genes from nspb group A in C. elegans. For most gene pairs, shared sequence extended <100 nucleotides beyond the coding sequence on either end, sometimes ending very close to the end of the coding sequence. Two exceptions are apparent for the genes shown: nspb-1 and -4 share ~550 nt of sequence downstream from their coding regions, and nspb-2 and -3 share ~320 nt of sequence upstream from their coding regions. These two pairs are also least divergent in nucleotide sequence within the genes (Figure 5).

Figure 7.
Dot plot view of Nspb cluster A. Dot plot self-comparison of a 10-kb genomic segment containing the five tightly clustered nspb genes from C. elegans cluster A. The axes are decorated with schematics of genes in the region, with markings as in Figure (more ...)

Synteny:
nspb cluster A in C. elegans (Figure 3) is syntenic with nspb cluster A in C. briggsae and nspb cluster A in C. remanei. The elegans–briggsae synteny is supported by extensive homology of single-copy genes inside and on both sides of the nspb genes (supplemental Figure 4 at http://www.genetics.org/supplemental/). The elegans–remanei synteny is extensive on one side of the nspb cluster but ends after one gene on the other side (data not shown). Concerted evolution makes it impossible to assign individual nspb genes as ortholog pairs by direct comparison of protein or DNA sequences. However, other than a single nspb duplication in C. elegans (or deletion in C. briggsae and C. remanei), no within-cluster rearrangements are needed to explain the current gene configuration. Single-copy orthologous genes immediately to the left and right and one unique orthologous gene within the cluster are all oriented similarly in C. elegans and C. briggsae, suggesting that no other rearrangements occurred within the cluster. The nspb genes within each species are extremely similar to each other, yet each set has become substantially different from the set in the other species (Figure 1 and supplemental Figure 1 at http://www.genetics.org/supplemental/). The only plausible explanation for these results is concerted evolution of genes within the cluster in all three lineages. Since the total divergence time among the three species may be as much as 200 MY (Stein et al. 2003), the state of concerted evolution must be evolutionarily stable for this cluster.

VISTA, dot plot, and blast analysis yielded no clear evidence of extended synteny of other C. elegans clusters in the Nspb and Nspc families with C. briggsae and C. remanei. Nevertheless, inspection of the protein and DNA trees (Figures 1 and 2 and supplemental Figures 1–3 at http://www.genetics.org/supplemental/) suggests that some of these clusters share an evolutionary root. Nspb cluster B from C. elegans and cluster B from C. remanei also had synteny to one side of the cluster, suggesting that they are orthologous (data not shown). By extension, some of the other clusters are probably orthologous clusters that have undergone a similar process of concerted evolution in each species, but genome rearrangements after speciation have disconnected the clusters from flanking unique genes.

Purifying selection:
Although genes within each cluster undergo sequence homogenization, in many cases they differ enough to obtain valid dN and dS values for within-cluster comparisons. These values are given in Table 1, averaged for each C. elegans gene cluster for all pairwise gene comparisons; the values should be interpreted with caution since recent genetic exchange among genes will cause overcounting of some events. Complete data for individual pairwise comparisons in C. elegans clusters are given in supplemental Table 2 at http://www.genetics.org/supplemental/. On the basis of the low dN/dS ratios, it is clear that purifying selection is acting on within-cluster variation. Presumably the rate of nucleotide substitution at a given site is substantially higher than the rate of genetic exchange at that site, and the resulting individual gene divergence remains subject to purifying selection that eliminates most nonsynonymous changes. Maximum-likelihood analysis of codon alignments found no direct evidence for positive selection within or between gene clusters in any of the three species (data not shown).
TABLE 1
Summary of dN and dS values within and among C. elegans clusters

Nonconcerted ortholog trio:
One C. elegans gene, nspb-12, is a singleton on chromosome III. This gene is singularly informative because it has one close relative in both C. briggsae [cb12(CBG08980)] and C. remanei [cr15(contig5.108.1)] and the three genes appear to be one-to-one orthologs. In support of their orthology, C. elegans nspb-12 has synteny to C. remanei on both sides of cr15 and synteny to C. briggsae on one side of cb12 (there is an apparent synteny break on the other side). A dot plot showing the synteny between the C. elegans and C. remanei regions is shown in supplemental Figure 5 at http://www.genetics.org/supplemental/. These data indicate that there was a singleton nspb gene present in the shared common ancestor of the three species and that this gene has persisted as a single gene in each lineage. The proteins for the orthologous trio are too similar to produce a meaningful protein phylogeny, but, using synonymous-site nucleotide changes, the trio has relative distances typical for these three species (synonymous site changes: c-b 47.1, c-r 50.6, b-r 37.1; see materials and methods). There is no indication that these genes have undergone concerted evolution with other nspb genes and none are located near nspb clusters in any of the species. These results are significant because they suggest that nspb genes, when divorced from nearby nspb genes, evolve in a manner typical for single-copy genes in these organisms. Furthermore, these genes provide a molecular clock that can be used to calibrate divergence among nspb gene clusters undergoing concerted evolution.
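To make the relative distances quoted above easier to read, the three pairwise values can be split into one branch length per lineage with the standard three-point formula. This small Python sketch assumes the distances are additive on an unrooted three-taxon tree, which is an assumption of the example and not a claim made in the text; the function name is hypothetical and the input values are the three synonymous-site figures given above, in whatever units they are reported.

```python
def three_taxon_branch_lengths(d_ab, d_ac, d_bc):
    """Branch lengths (a, b, c) of an unrooted three-taxon tree.

    Assumes additive pairwise distances: d_ab = a + b, d_ac = a + c,
    d_bc = b + c.
    """
    a = (d_ab + d_ac - d_bc) / 2
    b = (d_ab + d_bc - d_ac) / 2
    c = (d_ac + d_bc - d_ab) / 2
    return a, b, c


# Synonymous-site values from the text: c-b 47.1, c-r 50.6, b-r 37.1
elegans, briggsae, remanei = three_taxon_branch_lengths(47.1, 50.6, 37.1)
# approximately 30.3, 16.8, and 20.3, respectively
```

On this reading the C. elegans branch is the longest of the three, in line with the shorter distance between C. briggsae and C. remanei.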

Evolution in concerted Nspb clusters is accelerated:
Using the singleton orthologs nspb-12, cb12, and cr15 for comparison, I measured the rates of synonymous and nonsynonymous codon evolution between matching nspb cluster genes from C. elegans, C. briggsae, and C. remanei (see materials and methods). These comparisons are summarized in Table 2 and full data are given in supplemental Table 3 at http://www.genetics.org/supplemental/. The results are striking: the rates of nonsynonymous change are two- to sevenfold higher in genes undergoing concerted evolution. In addition, the frequency and length of indel mutations are substantially higher in genes undergoing concerted evolution. Rates of synonymous change are probably also modestly accelerated, but less so than rates of nonsynonymous change. All of the genes, whether from concerted clusters or not, are subject to strong purifying selection, as indicated by low dN/dS ratios. If we assume that the selection pressures acting on nspb genes are similar for the singleton orthologs and the clustered genes, these results suggest that the process of concerted evolution accelerates the rate of protein change. The cluster B comparisons may be more meaningful than the cluster A comparisons, since the cluster B genes are more closely related to the orthologs that calibrate the clock. The Nspc genes lack any singleton orthologs for comparison with clustered genes, so a similar analysis could not be done for that family.
TABLE 2
Rates of evolution of cluster genes compared to singletons

Other probable cases of concerted evolution:
The patterns observed for the Nspb and Nspc families were used as guides to identify other cases of probable concerted evolution (see materials and methods). Briefly, C. elegans protein predictions were systematically tested to identify pairs of proteins with >90% amino acid identity encoded by physically clustered genes in inverted orientation. These C. elegans genes were used to guide gene predictions in C. briggsae and C. remanei, and coding DNA sequences from the three species were aligned and treed to test for species-specific grouping of genes. The method was sensitive since it readily identified concerted evolution in the Nspb, Nspc, and Hsp70 gene families. The method additionally identified seven other families that probably undergo concerted evolution, as summarized in Table 3. For two of the families, maximum-likelihood DNA trees for the three species and genome arrangement in C. elegans are shown in Figure 8. The related genes undergoing putative concerted evolution in C. briggsae and C. remanei were also invariably clustered in the genome and usually in alternating orientation. Analysis of these families was less extensive and the inference of concerted evolution was based largely on clusters of species-specific relatives. A few cases of gene pairs or clusters in tandem orientation in C. elegans were also investigated and no similar evidence of concerted evolution was found, but this analysis was far from exhaustive. Including the previously documented Hsp70 genes, 8 of the 10 concerted gene families encode probable secreted proteins, many of which are small and contain very unusual amino acid compositions or sequences (supplemental Table 4 at http://www.genetics.org/supplemental/).
TABLE 3
Other probable concerted gene clusters
Figure 8.
Other concerted gene cluster trees. Two examples from the additional seven identified concerted evolution candidates are shown. For each example, the top portion is a maximum-likelihood tree of the coding DNA for clustered genes from the three species, (more ...)

DISCUSSION

Origin and maintenance of concerted clusters:
It is likely that the Nspb and Nspc concerted gene clusters arose from local tandem gene duplication events followed by small inversions (Semple and Wolfe 1999). At least one of the concerted clusters appears to be extremely stable, since it is orthologous and similarly structured in the three species examined, with a cumulative divergence of ~200 MY. How the shared sequence is stably maintained is not immediately apparent. It is expected that the length of shared sequence will shrink with time due to stochastic divergence near the ends, which will exclude further gene conversion. Although the regions of shared sequence in the Nspb and Nspc families are fairly well defined (for example, see Figure 7), in some cases there are short gray zones at the end of the shared sequence. These may be regions that were once subject to gene conversion but are now excluded and are drifting apart. In addition to gradual loss of shared sequence at the ends, simulation studies show that entire concerted duplicates will eventually escape gene conversion when they stochastically drift apart enough that conversion rates fall (Teshima and Innan 2004). How then is a length of stable shared sequence maintained? One simple possibility is that concerted clusters have limited life spans determined by the rate at which gene conversion is lost over time. With appropriate parameters, this life span could account for the observed long-term stability. Alternatively, an occasional new local gene duplication may generate new gene pairs with a longer region of shared sequence, while genes whose shared sequence shrinks or diverges too far to support frequent gene conversion are eventually lost. The striking parallels between related gene clusters in the three nematode species argue that gene duplications and losses are infrequent, but there are some differences in cluster gene number that indicate that they do occur. If the generation of new genes by duplication were balanced with gene loss, the result could be a fairly stable gene configuration.

Adjacent genes in all of the gene clusters are predominantly in alternating orientation. These configurations appear to persist over long periods and genetic exchange occurs among all of the genes, implying that the mechanism of concerted evolution in these families is predominantly gene conversion rather than unequal crossing over. Since most new gene duplicates in C. elegans are tandemly oriented, the inverted pattern of concerted gene clusters begs explanation. If there is a selective advantage to having a cluster of genes undergo concerted evolution over long periods of evolutionary time, then stable arrangements of those genes may emerge. I hypothesize that alternating gene orientation stabilizes the cluster arrangement. A simple mechanism would be that clusters with tandemly oriented genes are unstable over the long term due to unequal crossing over. Such tandem genes may undergo concerted evolution initially, but this condition is less likely to have persisted over the long time periods that separate the three species studied here.

A previous study of the positions and orientations of genes with high sequence identity (interpreted as recent duplicates) concluded that a substantial fraction of duplication events in C. elegans result in inverted gene orientation (Katju and Lynch 2003). It is possible that most such apparent inverted duplication events are actually older duplications that are undergoing concerted evolution.

Why concerted evolution?
One explanation for the observed patterns is that they are a consequence of genome dynamics with little or no selective significance. We might imagine a recently duplicated pair of genes located close to each other. If no special recombination signals or chromosomal characteristics are required for gene conversion, the stability of concerted evolution between the duplicates will depend on the stability of the gene pair and the frequency and length of gene conversion events relative to the mutation rate (Teshima and Innan 2004). If the mutation rate predominates, the duplicate genes will soon lose sufficient nucleotide similarity for gene conversion and they will subsequently evolve separately. If gene conversion predominates, then evolution of the duplicates will be concerted for a substantial period of time without any need to invoke selection. Concerted evolution will also end when one member of the gene pair is deleted or a genome rearrangement separates the two genes.
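The balance described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the model of Teshima and Innan (2004): the function name `simulate_pair` and the parameters `mu` (per-site mutation probability per generation), `conv_rate` (probability of a conversion attempt per generation), `tract` (conversion tract length), and `min_identity` (similarity required for conversion to occur) are all hypothetical, and the default values are chosen only to make the two regimes visible.

```python
import random


def simulate_pair(length=300, mu=0.002, conv_rate=0.0, tract=50,
                  min_identity=0.8, generations=1000, seed=0):
    """Toy model: return the final fraction of identical sites
    between two duplicated genes (all parameters are illustrative)."""
    rng = random.Random(seed)
    bases = "ACGT"
    a = [rng.choice(bases) for _ in range(length)]
    b = list(a)  # a fresh duplicate starts as an identical copy

    for _ in range(generations):
        # Point mutations hit each copy independently.
        for seq in (a, b):
            for i in range(length):
                if rng.random() < mu:
                    seq[i] = rng.choice(bases)

        identity = sum(x == y for x, y in zip(a, b)) / length

        # Assumption: conversion is possible only while the copies are
        # still similar enough; below the threshold they evolve separately.
        if identity >= min_identity and rng.random() < conv_rate:
            start = rng.randrange(length - tract + 1)
            donor, recipient = (a, b) if rng.random() < 0.5 else (b, a)
            recipient[start:start + tract] = donor[start:start + tract]

    return sum(x == y for x, y in zip(a, b)) / length
```

Calling `simulate_pair(conv_rate=1.0)` and `simulate_pair(conv_rate=0.0)` contrasts a conversion-dominated pair, which stays nearly identical, with a mutation-dominated pair, which drifts toward the identity expected for unrelated sequences. In this sketch, sites near the ends of the gene are covered by fewer possible tracts, so they accumulate differences faster than central sites.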

My finding of an increased rate of amino acid change in concerted Nspb genes compared to nonconcerted Nspb orthologs suggests that another possibility may apply in some cases. For a gene that is subject to long-term positive selection driven by some external influence, concerted gene clusters might serve to increase the rate of evolution. Specifically, I speculate that each gene in a concerted cluster independently explores protein evolution space. As usual, most mutations will be deleterious and will be eliminated by purifying selection (before or after the variant allele spreads in the cluster by conversion). Occasionally, one gene may change to confer an advantage due to changing selection pressure, and this change may drive a selective sweep in the population. Subsequent or concurrent gene conversion events that spread the novel allele to other members of the concerted cluster will be favored because the reverse conversion will be selected against (or possibly because of selection for increased dosage of the favorable allele). This hypothesis can explain both an increased general rate of evolution and a greater increase in nonsynonymous changes relative to synonymous changes (if gene conversion tracts are short). A more prosaic alternative explanation cannot be ruled out: if the orthologous Nspb gene used for comparison had acquired a specific unique function prior to speciation, it might be subject to stronger purifying selection than the concerted genes, slowing its evolution and giving the appearance of accelerated evolution in other genes.

Ironically in light of these findings, concerted evolution has been documented and studied best in multicopy rDNA genes, in which repeat orientations are tandem and the rate of nucleotide evolution within the rRNA coding regions is very slow (Brown et al. 1972). These differences can be reconciled as follows. First, the copy number of rDNA genes is indeed unstable (Michel et al. 2005), as expected for large tandem arrays that undergo unequal crossing over. For rDNA genes, strong selection for an approximately optimal copy number presumably counteracts this instability. For other gene families, stability may arise instead from a stable genome structure involving genes in inverted orientation. Second, the slow rate of evolution of coding rDNA probably results from strong purifying selection rather than from concerted evolution itself (Nei and Rooney 2004). If advantageous rDNA coding variants arose at an appreciable frequency, presumably these could sweep through the rDNA repeats and result in accelerated evolution, as I hypothesize is the case for the Nspb gene family. The fact that such variants do not arise is a consequence of the stable function of ribosomal RNAs.

Eight of the nine gene families that probably undergo concerted evolution encode relatively small secreted proteins, most of which have unusual amino acid composition or sequence (supplemental Table 4 at http://www.genetics.org/supplemental/). These are common characteristics of antimicrobial peptide families (Brogden 2005). I speculate that many of these genes encode secreted antimicrobial proteins and that concerted evolution permits rapid evolution in response to changing pathogen pressure.

Acknowledgments

I thank Paul Davis for pointing out the Nspa gene family and Emily Rocke and Zhirong Bao for useful discussions.

References
  • Birney, E., M. Clamp and R. Durbin, 2004 GeneWise and genomewise. Genome Res. 14: 988–995.
  • Brogden, K. A., 2005 Antimicrobial peptides: Pore formers or metabolic inhibitors in bacteria? Nat. Rev. Microbiol. 3: 238–250.
  • Brown, D. D., P. C. Wensink and E. Jordan, 1972 A comparison of the ribosomal DNA's of Xenopus laevis and Xenopus mulleri: the evolution of tandem genes. J. Mol. Biol. 63: 57–73.
  • Chen, N., S. Pai, Z. Zhao, A. Mah, R. Newbury et al., 2005 Identification of a nematode chemosensory gene family. Proc. Natl. Acad. Sci. USA 102: 146–151.
  • Coghlan, A., and K. H. Wolfe, 2004 Origins of recently gained introns in Caenorhabditis. Proc. Natl. Acad. Sci. USA 101: 11362–11367.
  • Denver, D. R., K. Morris, M. Lynch and W. K. Thomas, 2004 High mutation rate and predominance of insertions in the Caenorhabditis elegans nuclear genome. Nature 430: 679–682.
  • Felsenstein, J., 1993 PHYLIP (Phylogeny Inference Package), Version 3.6a2. Department of Genome Sciences, University of Washington, Seattle.
  • Jeanmougin, F., J. D. Thompson, M. Gouy, D. G. Higgins and T. J. Gibson, 1998 Multiple sequence alignment with Clustal X. Trends Biochem. Sci. 23: 403–405.
  • Kamath, R. S., and J. Ahringer, 2003 Genome-wide RNAi screening in Caenorhabditis elegans. Methods 30: 313–321.
  • Katju, V., and M. Lynch, 2003 The structure and early evolution of recently arisen gene duplicates in the Caenorhabditis elegans genome. Genetics 165: 1793–1803.
  • Mayor, C., M. Brudno, J. R. Schwartz, A. Poliakov, E. M. Rubin et al., 2000 VISTA: visualizing global DNA sequence alignments of arbitrary length. Bioinformatics 16: 1046–1047.
  • Michel, A. H., B. Kornmann, K. Dubrana and D. Shore, 2005 Spontaneous rDNA copy number variation modulates Sir2 levels and epigenetic gene silencing. Genes Dev. 19: 1199–1210.
  • Nei, M., and A. P. Rooney, 2004 Concerted and birth-and-death evolution of multigene families. Annu. Rev. Genet. 39: 121–152.
  • Nikolaidis, N., and M. Nei, 2004 Concerted and nonconcerted evolution of the Hsp70 gene superfamily in two sibling species of nematodes. Mol. Biol. Evol. 21: 498–505.
  • Robertson, H. M., 2000 The large srh family of chemoreceptor genes in Caenorhabditis nematodes reveals processes of genome evolution involving large duplications and deletions and intron gains and losses. Genome Res. 10: 192–203.
  • Robertson, H. M., 2001 Updating the str and srj (stl) families of chemoreceptors in Caenorhabditis nematodes reveals frequent gene movement within and between chromosomes. Chem. Senses 26: 151–159.
  • Semple, C., and K. H. Wolfe, 1999 Gene duplication and gene conversion in the Caenorhabditis elegans genome. J. Mol. Evol. 48: 555–564.
  • Sonnhammer, E. L., and R. Durbin, 1995 A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis. Gene 167: GC1–10.
  • Stein, L. D., Z. Bao, D. Blasiar, T. Blumenthal, M. R. Brent et al., 2003 The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics. PLoS Biol. 1: E45.
  • Teshima, K. M., and H. Innan, 2004 The effect of gene conversion on the divergence between duplicated genes. Genetics 166: 1553–1560.
  • Thomas, J. H., 2004 Bonsai 1.1.4 download, March 2004 (http://calliope.gs.washington.edu/software/index.html).
  • Thomas, J. H., 2005 Analysis of homologous gene clusters in Caenorhabditis elegans reveals striking regional cluster domains. Genetics 172: 127–143.
  • Thomas, J. H., J. L. Kelley, H. M. Robertson, K. Ly and W. J. Swanson, 2005 Adaptive evolution in the SRZ chemoreceptor families of Caenorhabditis elegans and Caenorhabditis briggsae. Proc. Natl. Acad. Sci. USA 102: 4476–4481.
  • Yang, Z., 1997 PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13: 555–556.