Table of contents

Letter from the Editor
Index of Experts

For Roche/454 Users:
Q1: How do you ensure accuracy and reproducibility when you isolate genomic regions of interest to be sequenced?
Q2: How do you optimize the amount of input DNA?
Q3: What steps do you take to ensure a time-effective sample preparation protocol?

For Illumina/Solexa Users:
Q4: How do you ensure accuracy and reproducibility when you isolate genomic regions of interest to be sequenced?
Q5: How do you optimize the amount of input DNA?
Q6: What steps do you take to ensure a time-effective sample preparation protocol?

List of Resources

Letter from the editor

For the first installment in what we envision to be a series, GT looks to the future of next-gen sequencing. By the end of last year, three next-gen platforms had made it to market: Roche/454's Genome Sequencer FLX (an upgrade of the Genome Sequencer 20); Illumina's Genome Analyzer; and Applied Biosystems's SOLiD sequencer. For the purposes of this guide, we've focused on Roche and Illumina, the two platforms that have been around for a year or more, to ensure that our experts have had enough time to refine their protocols.

While there's been less demand in the research market for CE instruments, next-gen platforms have roared to life for a number of applications, including de novo genome sequencing, gene expression profiling, ChIP sequencing, small RNA analysis, metagenomics, and resequencing. And considering the ever-declining prices, no doubt scientists will continue to use them for efficient, high-throughput sequencing analyses.

One area that seems to present the most difficulty in these early days is sample preparation. To that end, we've gathered experts familiar with both platforms to lend their insight to the challenge of maintaining efficient, standardized procedures. In this guide, users offer advice on isolating genomic regions of interest, maximizing the amount of input DNA, and ensuring timely preparation procedures. As always, don't miss our resources section, which lists additional places to go for advice on how to keep your next-gen runs as accurate and reproducible as possible.
— Jeanene Swanson

Index of experts

Genome Technology would like to thank the following contributors for taking the time to respond to the questions in this tech guide.

Ghia Euskirchen, (Mike Snyder's lab) Yale University
Yuan Gao, Virginia Commonwealth University
Neil Hall, University of Liverpool
Stephen Kingsmore, National Center for Genome Resources
Matthias Meyer, Max Planck Institute for Evolutionary Anthropology
Kenneth Nelson, (Mike Snyder's lab) Yale University
Anoja Perera, Stowers Institute for Medical Research
Richard Reinhardt, Max Planck Institute for Molecular Genetics
Bruce Roe, University of Oklahoma
Agnes Viale, Memorial Sloan-Kettering Cancer Center

For Roche/454 Users:

How do you ensure accuracy and reproducibility when you isolate genomic regions of interest to be sequenced?

We don't do much of this. So far we have only isolated genomic regions using high fidelity PCR.
— Neil Hall

Apart from whole genome shotgun sequencing we are currently targeting small genomic regions, which can be easily enriched through preamplification by PCR or long-range PCR. In our hands, the success of long-range PCR greatly varies not only with DNA quality, but also with the PCR system, and we found it helpful to evaluate the performance of kits from different suppliers.
— Matthias Meyer

We don't use the 454 for re-sequencing but mainly for de novo sequencing (based on pooled BACs, which have been individually measured and adjusted) and cDNA/microRNA [libraries].
— Richard Reinhardt

We actually rarely focus on specific genomic regions, but when we do, we use "Touchdown" PCR coupled with a second round of nested primers and "Touchdown" PCR to amplify genomic DNA regions of interest.
— Bruce Roe

We require core facility users to provide purified genomic DNA. For whole genome sequencing, we have obtained good quality 454 data using DNA extracted by several different methods (e.g., DNeasy and Proteinase K/phenol-chloroform extraction kits from Qiagen). We do not believe the purification method is a critical parameter provided that the resultant DNA is high molecular weight and very clean (260/230 > 1.7). For amplicon resequencing, a proofreading polymerase should be used during the amplification. We routinely purify the PCR product using the AMPure Agencourt kit. We have not yet optimized protocols for resequencing long-range PCR products.
— Agnes Viale

How do you optimize the amount of input DNA?

We use a 2:1 template-to-bead ratio. We don't do titrations and have had consistent runs between 100 Mb and 150 Mb.
— Neil Hall

The material requirements for 454 sequencing are very low; 1 nanogram or less starting material will usually produce sufficient library for sequencing, and there is, in principle, no requirement for optimizing the amount of input DNA. However, this is only true if quantitative PCR is used to estimate the copy number in the sequencing library. The quantification methods suggested in Roche's library preparation protocol are not sufficiently sensitive, and micrograms of input material are required to detect resulting libraries on Agilent chips or in RiboGreen assays. I generally recommend implementing the quantitative PCR when working with the 454 platform. It not only drastically reduces the material requirements to nanograms or picograms, but in our experience also gives more consistent sequence yields. From 100 or so libraries we quantified with this method, most gave optimal sequence numbers without further titration runs.
When using the method for the first time, it is advisable to include an existing, well-titrated sequencing library into the measurements for use as an initial reference point.
— Matthias Meyer

The titration step is the most accurate and best method to optimize input DNA. However, to get into the ballpark, we use RiboGreen and PicoGreen (Invitrogen) assays for quantity and Agilent Bioanalyzer for sizing. (These are the standard 454 methods, but we find that they are essential and cannot be skipped.) A German group recently published a method using qPCR, but we have not tried that yet.
— Kenneth Nelson

We generally check the quality using the Agilent system from which we extract empirical factors, and in some cases we use titration runs.
— Richard Reinhardt

We typically begin making our library with 5 to 10 µg input DNA, and at various stages we quantitate the DNA on the Caliper AMS-90. In the emPCR step we use less input DNA (0.8 molecules of DNA/molecule of beads) rather than the 1.0 to 1.2 molecules of DNA recommended by Roche/454.
— Bruce Roe

This step is crucial. An inadequate copy-per-bead ratio can completely spoil a run. If the DNA is a discrete band, we use a PicoGreen-based quantification method to calculate the molarity of the sample. If the starting material is a smear (e.g., cDNA), we use the PicoGreen results but we size-weight the value according to the Agilent Bioanalyzer DNA 1000 Assay results. This approach was developed empirically but it works fairly well.
— Agnes Viale

What steps do you take to ensure a time-effective sample preparation protocol?

Really we only use the manufacturer's protocol. Shortcuts such as centrifuging to break emulsions have not worked for us. At the moment, we find that shortcuts have reduced our throughput.
— Neil Hall

Sample preparation for 454 sequencing in our lab often involves barcoding of multiple samples before the construction of a single sequencing library. This adapts the 454 technology for use with multiple samples and in many cases better exploits the sequencing resources. Since the barcoding reactions add to the time required for sample preparation, we have developed a protocol for multichannel setup in plates, allowing for partial automation on a pipetting robot. Once the samples are barcoded, Roche's standard protocol for sequencing library preparation only takes some hours. However, we have observed that sequencing libraries degrade very rapidly. Freezing libraries in aliquots immediately after their production is very helpful to decrease the risk of failed or suboptimal sequencing runs, and can therefore save a lot of time and money on this side.
— Matthias Meyer

Since the library prep usually yields enough DNA for multiple sequencing runs, one careful library prep is very time effective. We have not found any real shortcuts to the Roche/454 protocols.
— Kenneth Nelson

We consequently stick to protocols supplied by Roche.
— Richard Reinhardt

We adhere to a strict time schedule for the library and emPCR protocols that has been established over the past two-plus years. My technicians and students doing these protocols also work in teams and that helps keep to the set schedule.
— Bruce Roe

At this point, we are still processing our samples manually. To reduce reagent cost, we first set up two or three emPCR per sample with different copy-per-bead ratio. Then, based on the percentage of bead recovery, we select an optimal ratio and process the remaining samples using this ratio for the emPCR. This process bypasses the titration on PTP, but does not reduce the processing time (in general, we perform sample preparation/processing Monday through Thursday and run the 454 Thursday nights).
— Agnes Viale

For Illumina Users:

How do you ensure accuracy and reproducibility when you isolate genomic regions of interest to be sequenced?

Most of our Solexa (Illumina) work is ChIP sequencing. Many of the standards that were developed for ChIP-chip also apply to ChIP-seq, with antibody validation being critical to all ChIP experiments. We validate antibodies by IP-western as well as by mass spectrometry. For reproducibility we perform and evaluate three biological replicates, zeroing in on control loci if they are known for a given factor.
— Ghia Euskirchen

We pretty much check the accuracy and reproducibility by:
• mapping the reads to the regions of our interest
• using Sanger sequencing to confirm
• performing technical replicates to see correlation
— Yuan Gao

The National Center for Genome Resources currently has two Solexa-Illumina sequencers in full-time operation and a third on its way.
About one half of our throughput represents in-house samples and the other half are provided by academic and industry clients nationwide. To date, we have brought two applications into full production: genomic DNA sequencing and messenger RNA sequencing. The mRNA protocol was developed by Gary Schroth's group at Illumina and has been tweaked by Jim Huntley at NCGR, while our genomic DNA protocol is standard. For these sample types, we have developed standard procedures and a LIMS system to ensure accuracy and reproducibility. It tracks each sample through the Solexa sequencing process and Joann Mudge at NCGR has been working hard to validate quality metrics at various stages of the process. The standard yield that passes quality control from seven channels is ~1 gigabase of singleton reads. Our standard read length is 36 bp, although we've recently been extending this to 46 bp. One neat accuracy check that we've done is to run a set of samples both on the Solexa sequencer and on Infinium HapMap 550K genotyping chips. This has helped us tremendously to validate raw and bioinformatically filtered SNP detection accuracy. For nucleotide variant detection and management of case-control association studies we are using a software system we've developed called Alpheus (http://alpheus.ncgr.org/). For other sample types, such as isolated genomic regions of interest, we ask clients to do the isolation and first steps in the library preparation. They ship us libraries and we generate clusters and sequence them. The yield and quality of these libraries vary.
— Stephen Kingsmore

So far we have not isolated genomic regions. We have only performed whole genome-wide experiments. In the future, if we do isolate regions we will have to perform validation experiments. The type of validation experiment will depend on what regions are isolated and the techniques used to isolate.
For instance, if we do long-range PCR to isolate a small region, we could run a gel to ensure we are amplifying the expected size. Also, we can perform Sanger sequencing with the PCR primers to confirm the amplified region.
— Anoja Perera

Any kind of UV- or gel-based measurements are used to determine the amount of PCR-amplified samples, cDNA [libraries] for expression profiling or ChIP-based experiments.
— Richard Reinhardt

How do you optimize the amount of input DNA?

Library size is an important parameter in obtaining good quality data. We monitor library performance in part by examining sequence data for identical reads which can be generated during the PCR amplification step if insufficient starting material was used. Additionally, if there is an excess of adapters relative to input material, the adapters ligate to each other without an insert and yield a large number of adapter reads.
— Ghia Euskirchen

We have used different amounts of input DNA to make libraries and then determine which concentration yields better results. We found out that the most important optimization is the input library concentration. We usually use 3 pM to 4 pM of library DNA to generate clusters. There are many ways to measure the concentration of the library. We used a combination of measuring the amount of input DNA by Nanodrop and running against a quantitative marker on a gel. We highly recommend doing both, as this may be the most important factor to determine your final sequencing output.
— Yuan Gao

There are two points at which we seek to optimize the amount of input material. The first is at the time of RNA library generation, when many clients want to generate sequence from as little as 1 microgram of total RNA. The second point is at cluster generation. Addition of either too much or too little library results in fewer sequence reads. The optimal number of clusters will generate almost 5 million passing reads per channel. We use an Agilent Bioanalyzer to determine the library concentration and typically load 1 pM to 3.5 pM.
— Stephen Kingsmore

Quantity as well as quality matters when it comes to input DNA. Here, an efficient cleanup technique is a must!
— Anoja Perera

We try to determine the amount of clusters generated by fluorescent measurement, but mainly it is based on empirical feeling and empirical factors.
— Richard Reinhardt

What steps do you take to ensure a time-effective sample preparation protocol?

We find the genomic and ChIP DNA library preparation to be quite straightforward. Mostly we try to space out our samples during the library preparation to avoid any cross-contamination.
— Ghia Euskirchen

Solexa sample preparation is easy enough. We pretty much follow Illumina's protocol.
— Yuan Gao

The Solexa-Illumina sample preparation protocol is fast (~a day) and several libraries can be generated simultaneously. The bottlenecks in the process are not at sample preparation, but at cluster generation (we have two cluster stations for two sequencers to alleviate this), sequence generation (particularly when we are generating 46-bp reads), basecalling, and genomic alignments.
— Stephen Kingsmore

Plan ahead of time, set up a schedule, and organize yourself. Familiarize yourself with the protocols beforehand. Make sure all reagents and supplies are available to work with. Have a backup plan! For example, have extra supplies in case something goes wrong. We have had two faulty amplification manifolds in the past, and if we didn't have backup ones our experiments would have been delayed. Read your protocols and draw out timelines next to the steps. The gene expression protocols take three full days and without proper preparation you will be putting in more than eight hours. Look to the next steps while you are on a waiting step to see what needs to be thawed to cut out down time. Arrange your work area to maximize workflow.
— Anoja Perera

We use the cluster station from Illumina but try to consequently follow the protocols.
— Richard Reinhardt

List of resources

Our panel of experts referred to a number of publications and online tools that may be able to help you get a handle on sample preparation for next-generation sequencing. Whether you're a novice or pro at this new technology, these resources are sure to come in handy.

Publications

Brockman W, Alvarez P, Young S, Garber M, Giannoukos G, Lee WL, Russ C, Lander ES, Nusbaum C, Jaffe DB. Quality scores and SNP detection in sequencing-by-synthesis systems. Genome Res. Jan 22, 2008 [Epub ahead of print].

Don RH, Cox PT, Wainwright BJ, Baker K, Mattick JS. 'Touchdown' PCR to circumvent spurious priming during gene amplification. Nucleic Acids Res. 19(14):4008 (1991).

Fahlgren N, Howell MD, Kasschau KD, Chapman EJ, Sullivan CM, Cumbie JS, Givan SA, Law TF, Grant SR, Dangl JL, Carrington JC. High-throughput sequencing of Arabidopsis microRNAs: evidence for frequent birth and death of MIRNA genes. PLoS ONE. 2(2):e219 (2007).

Hafner M, Landgraf P, Ludwig J, Rice A, Ojo T, Lin C, Holoch D, Lim C, Tuschl T. Identification of microRNAs and other small regulatory RNAs using cDNA library sequencing. Methods. 44(1):3-12 (2008).

Hillier LW, Marth GT, Quinlan AR, Dooling D, Fewell G, Barnett D, Fox P, Glasscock JI, Hickenbotham M, Huang W, Magrini VJ, Richt RJ, Sander SN, Stewart DA, Stromberg M, Tsung EF, Wylie T, Schedl T, Wilson RK, Mardis ER. Whole-genome sequencing and variant discovery in C. elegans. Nat Methods. 5(2):183-8 (2008).

Meyer M, Briggs AW, Maricic T, Höber B, Höffner B, Krause J, Weihmann, Pääbo S, Hofreiter M. From micrograms to picograms: quantitative PCR reduces the material demands of high-throughput sequencing. Nucleic Acids Res. 36(1):e5 (2008).

Meyer M, Stenzel U, Hofreiter M. Parallel tagged sequencing on the 454 platform. Nature Protocols. 3:267-278 (2008).

Robertson G, Hirst M, Bainbridge M, Bilenky M, Zhao Y, Zeng T, Euskirchen G, Bernier B, Varhol R, Delaney A, Thiessen N, Griffith OL, He A, Marra M, Snyder M, Jones S. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat Methods. 4(8):651-7 (2007).

Rusk N, Kiermer V. Primer: Sequencing — the next generation. Nat Methods. 5(1):15 (2008).

Schuster SC. Next-generation sequencing transforms today's biology. Nat Methods. 5(1):16-8 (2008).

Tarasov V, Jung P, Verdoodt B, Lodygin D, Epanchintsev A, Menssen A, Meister G, Hermeking H. Differential regulation of microRNAs by p53 revealed by massively parallel sequencing: miR-34a is a p53 target that induces apoptosis and G1-arrest. Cell Cycle. 6(13):1586-93 (2007).

Wold B, Myers RM. Sequence census methods for functional genomics. Nat Methods. 5(1):19-21 (2008).

Conferences

Next Generation Sequencing: Platforms, Applications, and Case Studies (CHI conference)
http://www.healthtech.com/2008/seq/index.asp

Next Generation Sequencing Symposium
http://www.nminbre.org/pages/events/nmbis/2008/

Next-Generation Sequencing Data Management
http://blog.bioteam.net/2008/01/15/workshopnext-generation-sequencing-data-management/