A ribosome doing its thing.
So a few weeks ago I wrote a post about how to identify wild yeasts based on their biochemical and morphological features. As I mentioned, this is a tedious, time-consuming, and not particularly accurate process. It also requires access to specialized growth media, agar plates, high-powered microscopes, etc. Even though I have access to these items at work, their cost is prohibitive, especially when you consider that my wild yeast project aims to purify & characterize a number of strains.
In that same article I mentioned my plans to use a high-tech, and yet strangely cheaper, method, which I outline in this post. It relies on the “polymerase chain reaction” (PCR), a technique for copying specific DNA sequences. We amplify a specific portion of the yeast (or bacterial) genome and then sequence the amplified DNA. The resulting sequence is then compared against a DNA database to identify the yeast/bacteria in the sample. Despite the high-tech nature of the method, it is relatively cheap: low-cost PCR kits, crude DNA isolations, and low-cost sequencing keep prices down to roughly $10/strain. While you wouldn’t want to screen hundreds of strains this way, the method is more than affordable for identifying the final candidates in the wild yeast project.
Early Publication Note:
Due to repeated emails (seriously guys/gals, it’s OK to post comments), I am posting this before running any actual tests. Hopefully the real-world sequencing tests will work out well; I hope to have the real-world examples completed in the next few weeks.
Brief Outline of the Method
Before going into the nitty-gritty, I think it’s worth briefly explaining how this all works, without the technical terminology. If you find some of the terminology confusing, it may be worth going back and reading the Taxonomy section of my first post on identifying yeasts.
This method is simple in concept, if not in implementation. The end goal is to sequence a small part of the yeast’s genome and then use that sequence to identify the yeast. Not just any sequence will do: we need something conserved enough between different types of yeast/bacteria that PCR can amplify it, but with enough differences that we can tell different species (genera, families, etc.) apart. The best part of the genome for this is the part that encodes ribosomes, for reasons outlined below.
To make this happen, we start by growing up a bit of yeast (or bacteria) and then crudely purifying the DNA. A crude purification is enough, because the next step uses a technology called PCR to amplify the part of the ribosomal DNA that we will sequence. We then send off the PCR product for sequencing. Finally, we take the sequence and use it to search a DNA sequence database, hopefully matching it up with a known yeast/bacterial genome. Voilà, we’ve identified the yeast.
WTFerment is PCR?
So at this point you may be asking, “that’s nice, but what the ferment is PCR?” PCR is short for “polymerase chain reaction”, which is a longer way of saying “DNA photocopier”. But unlike a photocopier, PCR can very specifically amplify (copy) exact portions of the genome. This selective amplification lets us take a very small amount of crudely purified DNA and produce a large quantity of the desired sequence from it.
So how does it work?
PCR starts by heating the DNA to just below the boiling point of water (usually 95-96C). This causes the DNA’s normally double-stranded structure to ‘unwind’ into single strands, a process we refer to as ‘DNA melting’ or ‘denaturation’. In the same solution as our melting DNA are primers: short pieces of single-stranded DNA that match the two ends of the portion of the yeast/bacterial genome we wish to amplify. We then cool the sample to an annealing temperature suited to the primers (usually between 45C and 60C, though the bacterial protocol below uses 66C), which lets the single strands pair up again (‘annealing’). However, we set up our initial conditions so that there are far more primer molecules than genomic DNA molecules. As a result, most of the double-stranded DNA that forms consists of primers bound to the ends of the sequence we are trying to amplify, rather than the original genomic strands re-pairing. We then warm the sample slightly, usually to around 72C, the temperature at which the heat-stable DNA-copying enzyme (the polymerase) works best. The enzyme extends the primers, copying the DNA between them. The process is then repeated, with every cycle ideally doubling the number of copies of the desired DNA sequence.
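Because each cycle can at most double the target, the yield grows exponentially with the number of cycles. The sketch below is a minimal illustration, assuming a constant per-cycle efficiency (1.0 means perfect doubling); the function name `pcr_copies` is hypothetical. Real reactions fall short of perfect doubling and eventually plateau as primers and nucleotides run out, so treat these numbers as an upper bound.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Theoretical number of target copies after a given number of PCR cycles.

    efficiency is the (assumed constant) fraction of templates copied each cycle:
    1.0 means perfect doubling; real reactions fall below this and eventually plateau.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


# 100 starting copies of the target, 30 ideal cycles
print(f"{pcr_copies(100, 30):.3g}")  # → 1.07e+11
# the same reaction at an assumed 90% efficiency per cycle
print(f"{pcr_copies(100, 30, efficiency=0.9):.3g}")
```

Even a few hundred copies of a genome in a crude DNA prep can therefore give far more product than is needed for sequencing.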
Graphical Description of PCR
Once the PCR amplification is complete, we send the DNA to a DNA sequencing facility. $5 later, we have the sequence of our DNA region of interest, in this case the ribosomal ITS region.
WTFerment is a Ribosome & why do we use it?
Ribosomes are a key biological machine, without which we cannot live. These machines allow our genes (which are essentially blueprints for making proteins) to be turned into functional proteins. The overall process is simple: our DNA is copied into a similar chemical called RNA. The RNA copy is then “read” by the ribosome, which, using the genetic code, builds a protein with the desired structure. That’s the Coles Notes version; it is, of course, far more complex than that.
So why target ribosomes? Ribosomes themselves are a bit unusual: they are made of a mix of RNA and protein. The RNA portion is so important to the functioning of the ribosome that mutations in it are usually harmful, often lethally so. As a result, ribosomal RNA evolves very slowly, except for one part. During ribosome production, most of the ribosomal RNA is copied from the genomic DNA as one long strand, which is then cut into separate pieces (in yeast, the 18S, 5.8S, and 25S RNAs; the ribosome’s fourth RNA, the 5S, is made separately). Between these pieces are short segments that are not part of the finished ribosome, termed “internal transcribed spacer” (ITS) regions. Since these spacers are simply cut out and discarded, mutations in them are rarely harmful, so they evolve quite quickly.
So why does this make ribosomes an ideal choice for identifying yeast/bacteria by DNA sequencing? The answer has to do with the primers we use for PCR: to work (i.e. to bind to the right portion of DNA), the sequence of a primer needs to match the sequence of the DNA very closely. But to identify species, we need to sequence a region that differs between species. So we design our primers to match the ribosomal RNA genes on either side of the ITS, which end up in the final ribosome and thus vary little between species, and use them to amplify the ITS from nearly any yeast (similar primers are possible for bacteria). Since the ITS is highly variable, we can then use its sequence to identify the species of yeast!
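To make “match very closely” concrete, the sketch below slides a primer along a stretch of DNA and counts mismatched bases at each position. It is a toy illustration with made-up sequences, and the function names are hypothetical: it checks only one strand, and real primer binding also depends on temperature and on where the mismatches fall (those near the primer’s 3′ end are the most disruptive), none of which is modelled here.

```python
def mismatches(primer: str, site: str) -> int:
    """Number of positions where a primer and an equal-length DNA site differ."""
    if len(primer) != len(site):
        raise ValueError("primer and site must be the same length")
    return sum(1 for p, s in zip(primer.upper(), site.upper()) if p != s)


def best_binding_site(primer: str, template: str) -> tuple[int, int]:
    """Slide the primer along the template (one strand only).

    Returns (0-based position, mismatch count) of the closest match;
    ties go to the leftmost position.
    """
    n = len(primer)
    if n > len(template):
        raise ValueError("primer is longer than the template")
    windows = ((i, mismatches(primer, template[i:i + n]))
               for i in range(len(template) - n + 1))
    return min(windows, key=lambda hit: hit[1])


template = "AAAACGTACGTTTTT"                     # made-up sequence
print(best_binding_site("CGTACG", template))    # → (4, 0): perfect match
print(best_binding_site("CGTTCG", template))    # → (4, 1): one mismatch
```

A primer designed against a conserved region finds a near-perfect site in almost every species, which is exactly why the conserved ribosomal genes make good primer targets.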
Yeast ribosome gene structure.
So we are targeting a very specific part of the ribosomal DNA. Specifically, we use primers that bind to the end of the 18S ribosomal RNA gene and to the start of the 5.8S ribosomal RNA gene, thus amplifying the region between them (i.e. the ITS). These structures can be seen in the figure to the right: the leftmost thick black bar is the 18S, and the next (very short) black bar is the 5.8S. The ITS is the gap between them. By amplifying, and then sequencing, the ITS, we can figure out what species of yeast we have.
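Amplifying whatever lies between two conserved primer sites can also be imitated on a computer (“in silico PCR”). A minimal sketch, assuming exact primer matches and a made-up template whose two ends stand in for the conserved 18S and 5.8S sites; the real yeast primer sequences are not used, and `in_silico_pcr` is a hypothetical name. Because the reverse primer binds the opposite strand, the code looks for its reverse complement on the top strand.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A<->T, C<->G, then reversed)."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template: str, forward: str, reverse: str) -> Optional[str]:
    """Predicted amplicon (primers included), or None if either primer site is missing.

    Exact matches only; the reverse primer's site is searched for downstream
    of the forward primer, as its reverse complement on the top strand.
    """
    template = template.upper()
    fwd_site = template.find(forward.upper())
    if fwd_site == -1:
        return None
    rev_site = template.find(reverse_complement(reverse.upper()), fwd_site + len(forward))
    if rev_site == -1:
        return None
    return template[fwd_site:rev_site + len(reverse)]


# made-up template: conserved site + variable "spacer" + conserved site
template = "TTT" + "ACGGTA" + "AAACCCTTTGGG" + "GCTAGT" + "TTT"
print(in_silico_pcr(template, forward="ACGGTA", reverse="ACTAGC"))
# → ACGGTAAAACCCTTTGGGGCTAGT
```

Swapping in a different “spacer” changes the amplicon but not the primer sites, which mirrors why one primer pair can amplify the ITS from many different yeasts.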
These protocols are based on those described in Brewhouse-Resident Microbiota Are Responsible for Multi-Stage Fermentation of American Coolship Ale.
First, the appropriate primers need to be ordered: two for yeast, and two different ones for bacteria. Two are needed for each because one primer binds on each side of the ITS region. The ribosomes of bacteria & yeast are different enough that we cannot use the same primers for both (note: 5′ and 3′ mark the direction of the DNA strand). The bacterial primers don’t technically amplify an ITS region, but the principle is the same: our primers bind conserved regions and amplify a section that varies between species.
ITS Primers For Yeast:
To begin: mix the purified DNA, primers, and reagents that come in your PCR kit, as per the manufacturer’s instructions. Generally, you want to use 0.5ul to 1ul of the purified DNA in a 30ul to 50ul PCR reaction. Then amplify the DNA using the following program:
- Activate your PCR enzyme as per the manufacturer’s instructions; this is usually an initial heat of 2-3 min at 95C.
- Heat to 95C for 30-60 seconds to melt the DNA.
- Let the primers bind the DNA by cooling to 50C (yeast) or 66C (bacteria) for 30-60 seconds.
- Warm to the working temperature of your PCR enzyme, usually 72C. Hold at this temperature long enough to copy 1000bp of DNA (usually 30-60 seconds).
- Repeat steps 2-4 (melt, bind, warm) 44 more times, for 45 cycles in total.
- Perform a final extension at 72C for 5-10 minutes.
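The cycling program can also be written down as data, which makes the cycle count explicit and gives a rough run time. A sketch that takes the upper end of each time range in the protocol (3 min activation, 60 s per step, 10 min final extension) and counts the first cycle plus 44 repeats; `Step` and `program` are illustrative names, not any thermocycler’s software, and the estimate ignores the time spent ramping between temperatures, so an actual run will take longer.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    temp_c: float
    seconds: int


def program(anneal_c: float, cycles: int = 45) -> list[Step]:
    """Step list for the protocol; anneal_c is 50 (yeast primers) or 66 (bacterial primers)."""
    steps = [Step("activate enzyme", 95, 180)]
    for _ in range(cycles):  # first cycle + 44 repeats = 45 cycles
        steps += [
            Step("melt", 95, 60),
            Step("anneal", anneal_c, 60),
            Step("extend", 72, 60),
        ]
    steps.append(Step("final extension", 72, 600))
    return steps


def hold_time_minutes(steps: list[Step]) -> float:
    """Total time spent holding at temperature (ramping time not included)."""
    return sum(s.seconds for s in steps) / 60


yeast = program(anneal_c=50)
print(hold_time_minutes(yeast))  # → 148.0
```

At these settings the holds alone come to about two and a half hours; using the shorter ends of the ranges roughly halves the cycling time.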
Check/Purify DNA Sample: The next step is to use a DNA gel to check that the PCR worked and to purify the DNA. There are so many ways to do this that I cannot mention them all here. In short, run the PCR reaction out on a 2% agarose gel, post-stain with ethidium bromide, and image on a UV imager. If the PCR was successful, cut out the band(s) (sometimes there will be 2) corresponding to the amplified region (usually 400 to 1000 base pairs in size) and purify the DNA into 12ul water with a glassmilk kit (e.g. the GENECLEAN II kit).
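To decide whether a band falls in the expected 400 to 1000 bp window, its size is usually read off a DNA ladder run in an adjacent lane. Over the middle of a gel, the log of fragment size falls roughly linearly with the distance migrated, so a common approach is to fit log10(size) against distance for the ladder bands. A sketch with made-up ladder measurements (`estimate_band_size` is a hypothetical helper); your ladder and measured distances will differ, and the fit is least reliable near the top and bottom of the gel.

```python
import math


def estimate_band_size(ladder: list[tuple[float, float]], band_distance_mm: float) -> float:
    """Estimate fragment size (bp) from migration distance using a ladder.

    ladder holds (size_bp, distance_mm) pairs measured on the same gel.
    Fits log10(size) = slope * distance + intercept by least squares.
    """
    if len(ladder) < 2:
        raise ValueError("need at least two ladder bands")
    xs = [d for _, d in ladder]
    ys = [math.log10(size) for size, _ in ladder]
    n = len(ladder)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("ladder bands must have different distances")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return 10 ** (slope * band_distance_mm + intercept)


# made-up ladder: (size in bp, distance migrated in mm)
ladder = [(1000, 18.0), (500, 27.0), (250, 36.0)]
print(round(estimate_band_size(ladder, 22.5)))  # → 707
```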
Sequencing: Sequencing is done by contract facilities. Mine charges $5 per sequence. Samples need to be prepared according to the facility’s instructions.
Protocol: Using BLAST to Analyze the DNA Sequence
BLAST analysis of the published Saccharomyces cerevisiae ITS region
In the picture to the right, I show the results of BLASTing the published Saccharomyces cerevisiae ITS region. The red bars indicate the highest-scoring matches; in all cases, clicking on a red bar will take you to additional information. Every one of these matches is to a strain of Saccharomyces cerevisiae.
To try this yourself:
- Browse to NCBI BLAST, then click the “Nucleotide” link halfway down the page. This will take you to a search engine for DNA sequences.
- Paste your DNA sequence into the box titled ‘Enter accession number(s), gi(s), or FASTA sequence(s)’. As an example, copy the published Saccharomyces cerevisiae ITS DNA sequence from this link.
- Leave the rest of the options on the BLAST page as they are, and click the blue “BLAST” button.
- After a few seconds, you will get your list of matched sequences.
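The BLAST box accepts FASTA format: a ‘>’ header line followed by the sequence. If you want to paste several sequences at once, or keep your results organized, a small helper can tidy raw sequence text into FASTA. A sketch with hypothetical names (`clean_sequence`, `to_fasta`, the `isolate_01` label); it strips spaces, line breaks and position numbers (as found in GenBank-style listings) and rejects anything that isn’t an IUPAC nucleotide code.

```python
import re
import textwrap

VALID_BASES = set("ACGTRYSWKMBDHVN")  # IUPAC nucleotide codes


def clean_sequence(raw: str) -> str:
    """Drop whitespace and digits, upper-case, and reject non-IUPAC characters."""
    seq = re.sub(r"[\s\d]", "", raw).upper()
    bad = set(seq) - VALID_BASES
    if bad:
        raise ValueError(f"unexpected characters in sequence: {sorted(bad)}")
    return seq


def to_fasta(records: dict[str, str], width: int = 60) -> str:
    """Format {name: raw sequence} as multi-record FASTA with wrapped sequence lines."""
    lines = []
    for name, raw in records.items():
        lines.append(f">{name}")
        lines.extend(textwrap.wrap(clean_sequence(raw), width))
    return "\n".join(lines) + "\n"


print(to_fasta({"isolate_01": "acgt acgt\n 1 ttaa"}), end="")
# >isolate_01
# ACGTACGTTTAA
```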
Of course, our environmental samples are unlikely to match this well. Instead, we’ll probably get something more like this sequence (which I generated by adding in 50 mutations):
Data Table
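A mutated test sequence like this can be imitated with a few lines of code: change randomly chosen positions to a different base, then measure how much identity remains. This is a sketch, not necessarily how the test sequence here was made; it assumes substitutions only (real differences between strains also include insertions and deletions, which need a proper alignment rather than a position-by-position comparison), and `mutate` and `percent_identity` are hypothetical names. The 800 bp length in the example is arbitrary.

```python
import random
from typing import Optional

BASES = "ACGT"


def mutate(seq: str, n_mutations: int, seed: Optional[int] = None) -> str:
    """Introduce n point substitutions at distinct random positions (no insertions/deletions)."""
    if n_mutations > len(seq):
        raise ValueError("more mutations than positions")
    rng = random.Random(seed)
    bases = list(seq.upper())
    for pos in rng.sample(range(len(bases)), n_mutations):
        # always change to a *different* base, so each mutation is a real difference
        bases[pos] = rng.choice([b for b in BASES if b != bases[pos]])
    return "".join(bases)


def percent_identity(a: str, b: str) -> float:
    """Percent identical positions between two equal-length, ungapped sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and the same length")
    same = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * same / len(a)


original = "ACGT" * 200  # arbitrary 800 bp stand-in
print(percent_identity(original, mutate(original, 50, seed=1)))  # → 93.75
```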
The key numbers are the ‘Query Cover’ and ‘Max Ident’ columns. ‘Query Cover’ indicates the percentage of the sequence we entered into BLAST (the query) that matched a sequence in the database. 100% means everything we entered matched up; sometimes only part of the sequence will match, and this number will drop. ‘Max Ident’ indicates the percentage of identical bases within the part of the query that aligned to the database sequence. So a 50% ‘Query Cover’ with 99% ‘Max Ident’ means that half of the sequence we entered matched the database almost perfectly. Realistically, our ‘Query Cover’ will be close to 100%, and our ‘Max Ident’ will be somewhere between 95% and 100%.
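Both columns can be computed from the aligned segments BLAST reports for a single database hit. A sketch following NCBI’s descriptions (query cover: the share of the query length included in the aligned segments; max ident: the highest percent identity among those segments); the `HSP` class and its field names are hypothetical, and the example numbers are invented to reproduce the 50% / 99% case described above.

```python
from dataclasses import dataclass


@dataclass
class HSP:
    """One aligned segment (high-scoring pair) between the query and a database sequence."""
    q_start: int     # 1-based, inclusive
    q_end: int       # 1-based, inclusive
    identities: int  # identical positions in the alignment
    align_len: int   # alignment length, including gaps


def query_cover(hsps: list[HSP], query_len: int) -> float:
    """Percent of query positions covered by at least one segment (overlaps counted once)."""
    covered = set()
    for h in hsps:
        lo, hi = sorted((h.q_start, h.q_end))
        covered.update(range(lo, hi + 1))
    return 100.0 * len(covered) / query_len


def max_ident(hsps: list[HSP]) -> float:
    """Highest percent identity among the aligned segments."""
    return max(100.0 * h.identities / h.align_len for h in hsps)


# invented example: an 800 bp query whose first 400 bp align with 4 mismatches
hit = [HSP(q_start=1, q_end=400, identities=396, align_len=400)]
print(query_cover(hit, query_len=800), max_ident(hit))  # → 50.0 99.0
```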
So which one is our yeast? It’s probably the one at the top of the list. By default the list is sorted by E value, which is derived from the alignment score; that score rewards long, accurate matches, so it reflects both the ‘Query Cover’ and ‘Max Ident’ values.
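If you want to compare candidate hits outside the browser, web BLAST can download results as a tab-separated “Hit Table (text)”. A sketch that parses such a table and ranks hits with the lowest E value first, breaking ties by the higher bit score (the tie-break rule is an assumption). It assumes the standard 12-column layout, the same one command-line BLAST produces with `-outfmt 6`; the `Hit` class and function names are hypothetical, and `subject_A`/`subject_B` are placeholders, not real accessions or results.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject: str
    pct_identity: float
    align_len: int
    evalue: float
    bitscore: float


def parse_hit_table(text: str) -> list[Hit]:
    """Parse 12-column tabular BLAST output; blank lines and '#' comment lines are skipped."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 12:
            raise ValueError(f"expected 12 columns, got {len(cols)}: {line!r}")
        hits.append(Hit(
            subject=cols[1],
            pct_identity=float(cols[2]),
            align_len=int(cols[3]),
            evalue=float(cols[10]),
            bitscore=float(cols[11]),
        ))
    return hits


def rank_hits(hits: list[Hit]) -> list[Hit]:
    """Best first: lowest E value, ties broken by highest bit score."""
    return sorted(hits, key=lambda h: (h.evalue, -h.bitscore))


# placeholder table: query, subject, %id, length, mismatches, gaps,
# q.start, q.end, s.start, s.end, evalue, bit score
table = (
    "# placeholder hit table\n"
    "query1\tsubject_B\t99.10\t400\t4\t0\t1\t400\t1\t400\t1e-180\t710\n"
    "query1\tsubject_A\t97.50\t800\t20\t0\t1\t800\t1\t800\t0.0\t1350\n"
)
for hit in rank_hits(parse_hit_table(table)):
    print(hit.subject, hit.pct_identity, hit.evalue)
```

Note that in this made-up table the longer alignment ranks first even though its identity is lower, which is why it is worth checking ‘Query Cover’ and ‘Max Ident’ together rather than either alone.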
- PCR Primers That Amplify Fungal rRNA Genes from Environmental Samples – Free scientific article outlining a method to ID fungi (including yeast) based on their ribosomal sequences.
- Conserved primer sequences for PCR amplification and sequencing from nuclear ribosomal RNA – webpage outlining primers and methods to ID yeast by sequencing.
- Brewhouse-Resident Microbiota Are Responsible for Multi-Stage Fermentation of American Coolship Ale – Free scientific article on the yeasts/bacteria found in lambic-style beer & the use of sequencing primers to ID the species within the sample.
- List of yeast protocols, including the DNA isolation method mentioned here.
- NCBI BLAST – search multiple DNA databases for genome sequences.
- Yeast Genome Database – database of yeast & other fungi genomes, includes a BLAST feature.