
Fourth Annual DNA Grantees' Workshop

Wednesday, June 25, 2003

MORNING SESSION

Development of Multiplexed SNP Assays from Mitochondrial and Y-Chromosome DNA for Human Identity Testing
Peter M. Vallone

MR. STAPLES: Our next speaker is Dr. Vallone. In 1999 he went to work at NIST (National Institute of Standards and Technology) as a National Research Council Fellow, working with Dennis Rieter. For more than 3 years, Dr. Vallone has worked in the DNA Technologies Group at NIST with John Butler. He has developed multiplex assays for detection of SNPs (single nucleotide polymorphisms) located on the Y chromosome and mitochondrial genome. He received his Ph.D. in chemistry from the University of Illinois. Dr. Vallone.

DR. VALLONE: Thank you. I'm going to talk about some of the benefits of multiplexing and some of the samples that we're typing at NIST (National Institute of Standards and Technology), U.S. population samples, the assays and instrumentation that we're using, markers on the Y chromosome and the mitochondrial genome, and results from the mitochondrial 11plex assay and some Y–SNP (Y-chromosome single nucleotide polymorphism) multiplexes we've been working with, both commercial ones and those designed in house. Vallone: Slides 1 and 2

First, I'll go over some of the advantages of multiplexing: Vallone: Slide 3

  • You're obtaining more information per unit of time.
  • In terms of forensics, you're using less sample, which is important.
  • You save on reagents, enzymes, buffers, and things like that.
  • You have a reduction in labor.
  • It streamlines your data analysis.
  • It's essential for certain markers, especially when there is no recombination, like on the Y chromosome or mitochondria. In SNPs, for example, you generally have to look at more markers because they're biallelic.
  • Overall, multiplexing coincides with high-capacity instrumentation and new SNP typing technologies—things like chip-based technologies or bead arrays.

In our group, we have several goals for multiplex assay development. One of them is working with collaborators who have markers of forensic interest, and we also look at the literature for markers of interest. We want to evaluate the forensic utility of newly discovered markers. For that, we use medium-size multiplexes, 5 to 10 loci. The larger your multiplex, the harder it is to design. Plus, you're not always sure if the markers will be useful, so we're taking a low-to-medium approach to multiplex size. Vallone: Slide 4

We also want to further the understanding of developing these assays in terms of primer design, selection, and quality control. Then we publish the assays for others to evaluate, for commercial purposes, or for just research.

I can give you some background information on some of the samples that I will be talking about. We have more than 600 samples of males with self-identified ethnicities. Right now the breakdown is 260 Caucasians, 260 African Americans, 140 Hispanics, and 3 Asians. These were extracted in our lab, and we have about 80 micrograms of total extracted genomic DNA. Vallone: Slide 5

We have these diluted down into plates at 1 nanogram per microliter, so typing is typically done on 1 nanogram of genomic DNA. To date we have made more than 35,000 allele calls. These samples have been typed with Identifiler, Roche Mito-strips, and 27 different Y–STR (short tandem repeat) markers.

We have different types of instrumentation at our lab. You're familiar with the ABI (Applied Biosystems) 3100. We also have a mass spectrometer, real-time PCR (polymerase chain reaction), and a Luminex 100 flow cytometer. The SNP assays that I will be talking about today are based on the ABI 3100 with a primer extension-based assay. The work on the Luminex flow cytometer is done by straight hybridization with commercial kits. Vallone: Slide 6

Now, I just want to review briefly the allele-specific primer extension. What you're going to have is your PCR-amplified DNA template with your SNP site of interest. You've got your SNP primer, or primer extension primer, which typically binds with its 3' end immediately adjacent to the SNP site of interest. A tail, typically a poly-T tail, is then added to the back end of this primer to vary its electrophoretic mobility. Vallone: Slide 7
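
As a rough illustration of the tailing idea, the sketch below assigns poly-T tail lengths so that the tailed primers come out evenly spaced in size. The function names, the 20-nucleotide starting length, and the 4-nucleotide spacing are assumptions made for this example, not values from the talk.

```python
def assign_tails(binding_lengths, first_total=20, spacing=4):
    """Return one poly-T tail length per primer so total lengths are evenly spaced.

    binding_lengths: template-binding length of each SNP primer (nucleotides).
    first_total and spacing are hypothetical design values.
    """
    tails = []
    target = first_total
    for length in binding_lengths:
        # A tail cannot be negative, so move to the next size slot if needed.
        while target < length:
            target += spacing
        tails.append(target - length)
        target += spacing
    return tails


def tailed_primer(binding_seq, tail_len):
    # The tail goes on the 5' (back) end so the 3' end still anneals next to the SNP.
    return "T" * tail_len + binding_seq


tails = assign_tails([18, 20, 22])
total_lengths = [b + t for b, t in zip([18, 20, 22], tails)]
```

With these inputs the three tailed primers land in separate size slots, which is what lets a single capillary run resolve several loci at once.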

We use the ABI Prism SNaPshot multiplex system, which consists of the fluorescently labeled dideoxynucleotide triphosphates (ddNTPs) and the polymerase, so they supply you with those reagents and you do all the primer design and informatics yourself. You basically do cycle sequencing, and your SNP primer is extended by one base unit. You have four different colors. It takes about an hour or so for the cycling to occur, and the identity of the SNP is elucidated by the ddNTP with the fluorescent label.
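
The single-base extension step can be sketched in a few lines. The base-to-color mapping below (A green, C black, G blue, T red) is the assignment commonly described for the SNaPshot kit and matches the colors mentioned later in this talk, but it is stated here as an assumption, as are the function and variable names.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Assumed display colors for the four labeled ddNTPs.
DYE_COLOR = {"A": "green", "C": "black", "G": "blue", "T": "red"}


def extend_one_base(template, snp_index):
    """Return (incorporated base, color) for a primer whose 3' end abuts the SNP.

    template: the strand the primer anneals to, as a string of A/C/G/T.
    snp_index: position of the SNP base in that string.
    """
    # The polymerase adds the ddNTP complementary to the template base at the SNP.
    base = COMPLEMENT[template[snp_index]]
    return base, DYE_COLOR[base]
```

For example, if the template carries a G at the SNP site, the primer is extended by a labeled ddCTP and the peak appears in the C color.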

Next, I just want to briefly describe what's going on with the Luminex bead array system. You have the cytometer, and Luminex will supply the beads—they have 100 different colored beads right now. For a biallelic system, you could use this for typing up to 50 different SNPs, so each of these beads can be spectrally resolved from one another. Vallone: Slide 8

In the simple case of an assay, you might have two probes of 20 or 30 bases, although typically they're much longer. Most of the time, the site you're probing will be in the center of this hybridization oligo. You have two different colored beads, with one being synthesized for allele A and the other being synthesized for allele B.

In your PCR step, you've amplified your template, and you do another step where you dye-label it. Next, you mix these all together in solution at 55°C for about 30 minutes. The proper probe will then bind to the allele on the PCR product.

These then go through the flow cytometer, typically one bead at a time, where both the bead and any PCR product bound to it are read. The red laser will give you the identity of the bead or the probe, and the green laser will detect the PCR products.

This is what the data looks like. It's usually just a set of numbers, but you can plot it as a histogram. So just for an example, this is Y–SNP M2. In this case you would have an allele call of an A, so you do get some background from the nonpresent allele, and of course on the Y chromosome, they're going to be homozygous. It takes approximately 30 seconds (plus or minus 15 seconds) to process each sample, so you can run a plate in less than 1 hour.
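
The bead arithmetic, the allele call, and the plate time can be put together in a small sketch. The 96-well plate, the minimum signal ratio of 2.0, and all function names are assumptions for illustration; the talk gives only the 100 bead colors and the roughly 30 seconds per sample.

```python
def max_biallelic_snps(bead_colors):
    # Each biallelic SNP needs two beads, one per allele.
    return bead_colors // 2


def call_bead_allele(signals, min_ratio=2.0):
    """Call the allele whose bead gives the stronger signal.

    signals: {allele: fluorescence} for the two beads of one SNP.
    Returns None when the stronger signal is not clearly above background
    from the other bead (min_ratio is a hypothetical cutoff).
    """
    ranked = sorted(signals.items(), key=lambda item: item[1], reverse=True)
    (top_allele, top_signal), (_, second_signal) = ranked[0], ranked[1]
    if second_signal > 0 and top_signal / second_signal < min_ratio:
        return None
    return top_allele


def plate_minutes(wells=96, seconds_per_sample=30):
    # Nominal read time for a full plate, ignoring setup and wash steps.
    return wells * seconds_per_sample / 60
```

At the nominal 30 seconds per sample, a 96-well plate takes 48 minutes, consistent with a plate in under an hour; at the slow end of the stated range (45 seconds) the same plate would take longer than that.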

The SNPs that we'll be talking about are typed with the Luminex system. Forty-two Y–SNPs are available from Marligen Bioscience.

So from an assay standpoint, what are the advantages of typing SNPs? Well, probing a single base change usually only requires amplification of the region immediately surrounding the site, so you have small PCR amplicons, which should result in success with degraded samples and a higher sensitivity overall. For the most part with SNPs, you have biallelic markers, which makes testing much simpler than testing by using length polymorphisms. But you get less information, so you have to type more of them. Vallone: Slide 9

In terms of where things need to go for SNPs, we need to improve multiplex assay development, on both the PCR side and the SNP detection side. For serious forensic usage, I think that parallel high-throughput methods will be required for typing at high levels of multiplexing.

What are the markers of interest that we talked about? Mitochondrial DNA, just briefly, is maternally inherited. Typically you have discrimination made through sequencing HVI and HVII in the D-loop. It's used because it survives well. You have 500 to 2,000 copies per cell. I'll be showing some information done in collaboration with Tom Parsons of SNPs in the coding region. Vallone: Slide 10

The Y chromosome, on the other hand, is paternally inherited. There is also a variety of Y–STR and Y–SNP markers (I'll be talking about the Y–SNP markers), and you generate a haplotype rather than a genotype. We require large databases because recombination does not occur.

Tom may have shown this slide or a slide similar to this when talking about the current strategy of sequencing HVI and HVII and making comparisons. Now, the problem that you run into in this case is that approximately 7 percent of HVI and HVII regions in Caucasians are identical, so you need more SNP sites or more regions to look at for discrimination. Vallone: Slide 11

This is where full mitochondrial genome polymorphisms come into play. Mitochondrial genome sequencing has revealed numerous SNP sites that can help distinguish Caucasians who share common HVI and HVII types. Vallone: Slide 12

Tom Parsons gave us 11 sites that were selected to help resolve Caucasians having the most common HVI and HVII types. He has talked about the criteria for those. Our goal was to put these in a multiplex assay that can be run on a common forensic instrumental platform.

This, in cartoon format, was the primer extension assay. What you have here is the 11 SNP sites and their identity. They're all biallelic and located throughout the mitochondrial genome. Now, I'm not going to go into the strategy for it because we're publishing it soon, but part of the strategy for designing the multiplex PCR was to coamplify all the regions at once. So we have an 11plex PCR to amplify these different regions. Vallone: Slide 13

The PCR sizes were kept under 150 base pairs to enable success with degraded samples. Typically they're about 120 base pairs. Also with some in-house software, we designed the extension primers for probing all these regions to do the SNaPshot assay in multiplex. After your cycle sequencing, then, hopefully you'll detect all the different SNPs.

Here are some of the initial results of this assay. Just to clarify, here you have RFUs (relative fluorescent units) versus the sizing standard. The size of the fragment correlates to what locus you're looking at, because you know what size tail you put on each SNP primer. The color, then, allows you to probe the SNP identity. Vallone: Slide 14

What I have here, I say, is equimolar and balanced, and those aren't the PCR products. For PCR, we use the exact same primer concentrations and then empirically balance the PCR reaction just by raising and lowering the temperature after doing multiple experiments to try to get a better optimized signal. That's what I'm showing here at the bottom, but you can see the assay is capable of detecting all 11 SNPs.

Here is the 11plex run again on seven unique samples. These samples were provided by Tom Parsons' lab because they show variation at each allele. The point is that it's important to confirm that your assay accurately detects each variant—and you can go down each row to check for the proper color to see where the variant is for each one—to make sure that you're not getting a type you shouldn't be seeing. Vallone: Slide 15

The collection of this information, the sizing of all these, can be used to develop a macro for automated typing. So in the software, you can say where the fragments should come out—plus or minus an error—and what color they should be, so you can just take the data out of Genescan and get a table of allele calls.
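
A minimal sketch of such a typing macro is shown below. It assumes each locus has an expected fragment size, a sizing tolerance, and a set of expected colors; the locus names, sizes, tolerance, and RFU threshold are all hypothetical and are not the values used for the 11plex.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    locus: str      # label for the SNP site
    size: float     # expected fragment size (nucleotides)
    tol: float      # allowed sizing error, plus or minus
    alleles: dict   # dye color -> base expected at this locus


def call_peaks(peaks, bins, min_rfu=100):
    """Turn sized, colored peaks into a table of allele calls.

    peaks: iterable of (size, color, rfu) tuples exported from the sizing software.
    Returns {locus: [bases]}; two bases at one locus would suggest a mixture.
    """
    calls = {}
    for size, color, rfu in peaks:
        if rfu < min_rfu:
            continue  # ignore peaks below the (hypothetical) detection threshold
        for b in bins:
            if abs(size - b.size) <= b.tol and color in b.alleles:
                calls.setdefault(b.locus, []).append(b.alleles[color])
    return calls


bins = [
    Bin("site_1", 24.0, 1.0, {"black": "C", "red": "T"}),
    Bin("site_2", 30.0, 1.0, {"blue": "G", "green": "A"}),
]
peaks = [(24.3, "red", 1200), (29.6, "blue", 900), (40.0, "green", 50)]
table = call_peaks(peaks, bins)
```

Peaks that fall outside every size window, or that show a color a locus should never produce, simply get no call, which is one way to flag a type you shouldn't be seeing.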

This is an initial study for sensitivity. I'm showing concentrations of genomic DNA from 100 picograms down to 1 picogram. This is probably pristine genomic DNA; it's not degraded. I believe in Tom's lab they're starting to do some work with degraded samples to show how the assay performs, but this gives you an idea of what's going on with the sensitivity with a good sample. Vallone: Slide 16

One thing we were interested in looking at was how the SNaPshot assay will deal with mixtures. This data was run by Rebecca Hamm in Tom's lab. Here, you have two different samples that vary at four loci. On the right is the ratio in which they were mixed together, and as you can see, it's definitely capable of detecting mixtures. Vallone: Slide 17

Let's look at the one with the 50:50 ratio. The black and the red are CT polymorphisms that are approximately equal in peak height, as you'd expect for a 50:50 ratio. But for 16519, the blue peak is almost twice the size of the green peak. There could be various reasons for this. It could be a sequencing artifact or have something to do with the sequence context or even with something specific about the locus 16519.

It's just important to note that it's not enough just to think of the ratio of the peaks. You probably would have to run some type of standard curve or something like that ahead of time to truly understand what's going on with the locus and the different dyes.
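
One simple way to account for this, sketched below, is to measure the peak-height ratio at a known 50:50 mixture and use it to correct later observations. This is a single-point calibration rather than a full standard curve, and the function names and peak heights are made up for illustration.

```python
def dye_factor(height_a_5050, height_b_5050):
    # How much taller allele a's peak is than allele b's at a true 50:50 mixture.
    return height_a_5050 / height_b_5050


def allele_a_fraction(height_a, height_b, factor):
    """Estimate the fraction of template carrying allele a, after correction."""
    corrected_a = height_a / factor
    return corrected_a / (corrected_a + height_b)


# Hypothetical locus where one dye's peak is twice the other's at 50:50.
factor = dye_factor(2000, 1000)
naive = 3000 / (3000 + 500)                      # uncorrected peak-height fraction
corrected = allele_a_fraction(3000, 500, factor)
```

In this made-up case the uncorrected peak heights would suggest about 86 percent of allele a, while the corrected estimate is 75 percent.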

Some of the more than 600 samples were run by Margaret Kline and Jan Redman on the Roche Mito-strips. Here is just an example of some of the data on 10 regions—8 regions within HVI and HVII and 2 outside sites. This was run on all our population samples. Vallone: Slide 18

The point of showing this is that mito type 11111111AT was the most common mito type found in our U.S. Caucasian population. It showed up in 15 percent of the samples. I planned to run the 11plex on these samples to see if I could discriminate them even more. Just a note: Full HVI and HVII sequences are not available for these population samples. We had just the Mito-strips, but I thought it would be a good start to see if we could separate these out with the 11plex assay.

Here are some initial results from 44 samples that I ran recently by the mito strip assay. We observed 10 different haplogroups, of which 3 were unique, and 2 of 11 sites didn't vary at all: 10211 and 7202. As I said, this data was just collected, so we're still looking at the allele frequencies and things like that. Vallone: Slide 19

I'm going to break there with the mito data and move on to some of our Y–SNP data. Just briefly, what's the forensic utility of Y-chromosome SNPs? They're useful for human identification purposes, paternity tests, evolutionary studies, and population studies. They're also useful markers in a mixed male-female sample. The haplogroups are nonrandomly distributed among populations, so the potential exists for predicting the population of origin. Furthermore, the nonrecombining region is similar to the 250 SNPs referred to earlier today by Alan Redd in his discussion about the Y Chromosome Consortium (YCC). Vallone: Slide 20

So, what are the Y–SNPs that we've typed at NIST? We've got 18 SNPs in three 6plexes on the SNaPshot assays (allele-specific primer extension). We've also run a commercial kit from Marligen for which 42 Y–SNPs and Amelogenin are present in five multiplexes and that's a hybridization-based assay. Vallone: Slide 21

Ten of the SNP sites were redundant between the two methodologies. So you have a total of 50 Y–SNPs that have been typed. For these SNPs, I'm going to show you results for 115 African Americans and 114 Caucasians.

This is the map or the tree that Alan showed earlier from the YCC paper. He mentioned there are 153 total, but the markers that we look at will have the potential for defining 45 haplogroups. I've highlighted the ones from the Marligen kit, which just show the level of multiplexing for each: You have a 9plex and a 12-, 7-, 8-, and 7plex. Vallone: Slide 22

These are NIST's in-house assays that were developed for typing 18 Y–SNPs. Multiplexing was done at both PCR and SNP levels. We've run these SNP assays on 229 samples. These are 6plexes. They were designed quite rapidly—in a week or so. The red markers indicate those that are unique to the set. The grey ones overlap with the Marligen kit. Vallone: Slide 23

So, to summarize the SNP data: A total of 16 nanograms of genomic DNA were consumed in the eight multiplexes. Out of the 45 haplogroups, we had 18 that were observed for our 229 samples. We have over a 99-percent success rate for allele calls, and that's with both methods. Variation was only observed in 24 of the 50 Y–SNPs, so 26 of the 50 Y–SNPs were monomorphic. Vallone: Slide 24

We saw 100-percent concordance for the 10 overlapping markers, and we got the same results from Marligen and SNaPshot—that's more than 2,000 allele calls. Among the 50 markers, 12 haplogroups were found in the African-American population, of which 6 were unique. Twelve haplogroups, 6 of which were unique, were also present for Caucasians, and 6 of the haplogroups were shared, which equals 18.
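
The counts quoted in this part of the talk can be cross-checked with simple arithmetic. The sketch below only restates numbers given in the talk; the function and key names are made up for the example.

```python
def ysnp_counts():
    in_house, marligen_ysnps, overlap = 18, 42, 10
    marligen_plexes = [9, 12, 7, 8, 7]       # five Marligen multiplexes
    samples = 115 + 114                      # African Americans + Caucasians
    return {
        # Markers typed by either method, counting shared markers once.
        "total_ysnps": in_house + marligen_ysnps - overlap,
        # 42 Y-SNPs plus Amelogenin across the five Marligen multiplexes.
        "marligen_targets": sum(marligen_plexes),
        # Three in-house 6plexes plus five Marligen multiplexes.
        "multiplexes": 3 + len(marligen_plexes),
        "samples": samples,
        "monomorphic": 50 - 24,
        # Overlapping markers typed by both methods on every sample.
        "concordance_calls": overlap * samples,
        # Unique to each population plus the shared haplogroups.
        "haplogroups": 6 + 6 + 6,
    }


counts = ysnp_counts()
```

The 10 overlapping markers across 229 samples give 2,290 paired calls, which is the "more than 2,000" figure, and the 6 + 6 + 6 haplogroups add up to the 18 observed.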

Breaking that data out, you can see that E3a was the main haplogroup observed among the 115 African Americans, which is to be expected. R1b, which is the European haplogroup, was found in 47 percent of the 114 Caucasian samples. In terms of admixture, it's important to point out that the R1b haplogroup is present in 23 percent of the African-American samples, but the E3a haplogroup cannot be found in the Caucasian samples. Vallone: Slides 25 and 26

P25 was an important marker because it designated the R1b haplogroup. But when typing it initially with the Marligen kit, the derived allele A was never observed. In talking to Alan Redd, he informed us that P25 is a multicopy locus; that is, it's found multiple times on the Y chromosome. So after further review of our data, we were able to make the correct allele call for the Y–SNP P25 marker based on a signal intensity ratio, and BLAST results indicate that the region surrounding P25 is present three times on the Y chromosome.

Here is how we were able to type P25 with our system. First, I'm going to show the results from the Y–SNP M2, which sort of followed normal behavior. The solid bar represents the allele calls. As you can see, the G peak is higher than the A peak. This is sort of the background. Vallone: Slide 27

Then for an A allele call, the A peak is stronger than the G peak. You still have quite a bit of background, but these error bars represent all the experiments averaged together, so it's still pretty straightforward to make an allele call. Vallone: Slide 28

Moving on to the P25, you see that the peaks on the right are always higher than the peaks on the left. That is, the C peak always dominates the A peak. But if you take the ratios of all of them, you get rather tight numbers: A ratio of 1.7 plus or minus 0.1 for the allele call of A to 3.2 plus or minus 2 for the allele call of C. Vallone: Slide 29
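
A ratio-based call of this kind can be sketched as follows. The cutoff of 2.4 is a hypothetical value chosen to sit between the two ratio clusters quoted above; it is not a threshold given in the talk, and the signal values are invented.

```python
def naive_call(signal_c, signal_a):
    # Picks the stronger signal; for P25 this always reports C.
    return "C" if signal_c > signal_a else "A"


def call_p25(signal_c, signal_a, cutoff=2.4):
    """Call P25 from the C-to-A signal ratio instead of the taller peak.

    cutoff is a hypothetical value between the two observed ratio clusters
    (about 1.7 for allele A samples and about 3.2 for allele C samples).
    """
    ratio = signal_c / signal_a
    return ("C" if ratio >= cutoff else "A"), ratio


derived_call, derived_ratio = call_p25(1700.0, 1000.0)
ancestral_call, ancestral_ratio = call_p25(3200.0, 1000.0)
```

Here the naive rule reports C for both samples, while the ratio rule separates them, which is why knowing that the locus is multicopy matters.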

So if you're not aware that it's a multicopy locus, then it can be a problem. Also, if you're thinking about looking at mixtures or heterozygous samples, then this type of background is going to be an issue too.

This slide sort of compares two assays in terms of signal-to-noise ratio. For the SNaPshot assay, I have two different samples run as a sort of concordance check; that is, the same samples run by the different assays for the same marker. For Y–SNP M172 on top, you have a G in blue and a T in red. You can see that the alternative allele is not observed. If it were, then it would show up in blue or red. With very little background, the signal-to-noise ratio is quite high. Vallone: Slide 30

You can see here for a hybridization-based assay that the calls are still being made. Here, however, you have quite a bit of background, so the signal-to-noise ratio is quite low. Again, that could be an issue for mixtures or for heterozygous samples.
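
The comparison can be made concrete with a simple ratio of the called allele's signal to the signal from the allele that is not present. The numbers below are invented for illustration and are not measurements from either assay.

```python
def signal_to_noise(called_signal, background_signal):
    """Ratio of the called allele's signal to the non-present allele's signal."""
    if background_signal == 0:
        return float("inf")  # no measurable background at all
    return called_signal / background_signal


# Hypothetical readings for the same marker on the two platforms.
primer_extension_sn = signal_to_noise(2000.0, 20.0)
hybridization_sn = signal_to_noise(1500.0, 500.0)
```

A low ratio leaves little room to tell a true second allele from background, which is the concern for mixtures and heterozygous samples.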

In conclusion, the 11plex can accurately detect 11 SNPs in a single assay by multiplexing the PCR and the primer extension. We hope to submit a manuscript describing the assay in a week or two, and we're going to look into developing more mito-SNP assays in collaboration with AFDIL (Armed Forces DNA Identification Laboratory) for separating out other common HVI and HVII types. Vallone: Slide 31

In terms of Y–SNP assays, we're looking at additional Y–SNP markers with the help of Alan Redd and Mike Hammer. We would also maybe like to try allele-specific primer extension with the Luminex beads. They have some universal beads out there, and using primer extension with those in the place of hybridization may result in lower backgrounds.

We're also going to type the additional NIST population samples. I only showed a fraction of those today, but the results that I showed for the 50 Y–SNPs will be submitted as a manuscript shortly.

I'd like to thank NIJ for funding and allowing me to talk; project leader John Butler; Margaret Kline and Jan Redman, who did all of the mito-strip work and the DNA extraction; and my collaborators, Tom Parsons and Rebecca Hamm, who did a lot of work with me on the 11plex; Mike Coble at AFDIL; Dave Carlson at Marligen; and Mike Hammer and Alan Redd at the University of Arizona. Thank you for your attention. Vallone: Slide 32


Date Entered: January 17, 2008