Sequence data evaluation

For each sample, two files are generated: filename.ab1 and filename.seq. The filename.seq file is a plain-text output of the sequence and can be used for BLAST searches. The filename.ab1 file contains the annotation of the sample, the raw data trace and the analysed electropherogram. Basecalling and analysis algorithms are applied to the raw data to create the analysed data trace. When evaluating or trouble-shooting sequence data, it is important to look at the raw and analysed data traces in addition to the data values (signal strength and start/end points) displayed in the annotation.

Software available to open the filename.ab1 file and view the data includes:
• Sequence Scanner (for Windows) – http://www.appliedbiosystems.com/sequencescanner
• ApE (for Windows and Macintosh) – http://www.biology.utah.edu/jorgensen/wayned/ape

Figure 1. Analysed and raw data views of a LongRead control sample opened in Sequence Scanner. Callouts:
• Tabs allow you to view raw, analysed, annotation or sequence views of the data
• The analysed data trace should show sharp, evenly spaced peaks across the read and a clear baseline
• Vertical QV bars indicate the probability of error (Pe) for each base call. Blue = QV > 20 = Pe < 1% (refer to Figure 2)
• The raw data should show an even distribution of peaks across the read and no residual dyes

Figure 2. The Quality Value chart indicates the probability of error for each base call. The analysis program is set to call an “N” when QV < 15 for a base call.

The base call start indicates the scan point at which the read commences and should be ~600 to 800.
The end point should be ~13,000 to 14,000 or at the end of the read.
The average signal-to-noise ratio indicates labelling efficiency and should be 100 to 750.
The number of bases with QV >= 20 should be ~950 to 1000 (fewer for shorter PCR fragments).

Figure 3. Annotation of the sample and expected values for signal strength and start/end points.

Figure 4. Printed electropherogram. Labelled items:
• Filename and plate position
• Start and end points
• HiSQV, the number of bases called that have QV > 20
• Client name and unique laboratory number
• Instrument name
• Date run commenced and finished
• Plate name

Trouble-shooting (Figure 5)

For each sample, approximately 1000 bases of sequence can be read; however, the accurate read length depends on a variety of factors. The analysed electropherogram may display:
1. No recognisable sequence
2. Poor data and weak signal
3. Top-heavy sequence
4. Abrupt signal loss
5. Multiple sequences
6. Repeat sequences
7. Slippage after homopolymer regions
8. Delayed migration
9. Excess dye peaks
10. Pull-up peaks and very strong signal

1. No recognisable sequence

Failed reactions are characterised by:
• The absence of clearly defined peaks in the raw data trace
• The absence of base calls in the analysed electropherogram
• Very low signal-to-noise ratios (S/N G:<25 A:<25 T:<25 C:<25)

A failed reaction may be caused by:
• Insufficient or poor quality template and/or primer
• Absence of primer annealing site or mutation in primer binding site
• Failed sequencing reaction or clean-up

Recommended actions include:
• Check template and/or primer concentrations and quality
• Check primer binding site and primer design
• Check ethanol concentrations and centrifugation speed and times

Figure 6. No analysed data is present because the signal-to-noise level is below the threshold for bases to be called. Callouts: only background noise is present in the analysed trace; peaks are absent from the raw data profile.

2. Poor data and weak signal

Reactions displaying low signal strength are characterised by:
• Very low peak height in the raw data trace and the presence of dye blobs
• In the analysed electropherogram, base calls fade off before the end of the read
• Very low signal-to-noise ratios (S/N G:<50 A:<50 T:<50 C:<50)

Low signal strength may be the result of:
• Insufficient or poor quality template and/or primer
• Poor primer design (low Tm) or mutation in primer binding site
• Inferior reagents used or poor clean-up

Recommended actions include:
• Check template and/or primer concentrations and quality
• Check primer binding site and primer design
• Check sequencing reagents and clean-up protocol

Figure 7. Poor data, weak signal and the presence of dye blobs are often a result of low template concentration.

3. Top-heavy sequence

Reactions displaying top-heavy sequence are characterised by:
• Very high peaks in the raw data trace that fade off abruptly
• In the analysed electropherogram, base calls fade off before the end of the read
• An excess of short fragments, which are preferentially injected into the capillary

Top-heavy sequence may be the result of:
• Too much template used in the sequencing reaction
• Too much primer used in the sequencing reaction

Recommended actions include:
• Check template concentration
• Check primer concentration. Use 3.2 pmol

Figure 8. Sample set up with too much template. Template and primers are exhausted at the beginning of cycle sequencing, creating an excess of short fragments.

4. Abrupt signal loss

Abrupt signal loss is often characterised by:
• Very high peaks in the raw data trace that stop abruptly
• In the analysed electropherogram, base calls suddenly stop before the end of the read

Abrupt signal loss may be the result of:
• Secondary structure in the template
• High GC content
• Primer dimer contamination

Recommended actions include:
• Sequence the complementary strand
• Use a primer that anneals at a different position
• Incubate the reaction at 96 °C for 10 minutes before cycling
• Increase the extension temperature by 2 to 3 °C
• Increase the denaturation temperature to 98 °C
• Add DMSO to a final concentration of 5%
• Double all reaction components and incubate at 98 °C for 10 minutes before cycling
• Linearise the DNA with a restriction enzyme
• Shear the insert into smaller fragments (<200 bp) and subclone
• Redesign the primer to avoid primer dimer formation

Figure 9. Sample displays abrupt signal loss due to the presence of secondary structure.

5. Multiple sequences

Reactions displaying multiple sequences are characterised by:
• Lower peaks in the raw data trace
• More than one sequence trace in the analysed data trace
• More than one sequence commencing after base 50 to 100 (the multiple cloning site, MCS)

Multiple sequences may be the result of:
• Mixed plasmid preparation
• Multiple PCR products
• Frame shift mutation
• Primer-dimer contamination
• Multiple priming sites
• Multiple primers in reaction
• Primer with N-1 contamination
• Slippage after homopolymer or repeat regions in the template

Recommended actions include:
• Re-isolate the DNA from a pure colony and re-sequence
• Check the PCR template on a gel for a single band
• Use a different primer after the mutation or sequence the complementary strand
• Optimise PCR amplification or redesign the primer
• Make sure the primer has only one priming site
• Ensure only one primer has been used

Figure 10. Multiple sequences – plasmid template. In this example, overlapping sequences start after the multiple cloning region in the vector because more than one colony was purified.

Figure 11. Multiple sequences – PCR template. The presence of more than one PCR template in a reaction will result in overlapping sequences being generated.

6. Repeat sequences

Reactions displaying repeat sequences are characterised by:
• The gradual decrease of peak height in the raw data trace after the repeat region
• In the analysed electropherogram, base calls fade off after the repeat region

Recommended actions include:
• Sequence the complementary strand
• Use a primer that anneals at a different position

Figure 12. Sample displays signal loss due to the presence of a repetitive sequence.

7. Homopolymer regions

Sequence data containing homopolymer regions display:
• Overlapping sequence following a homopolymer region due to slippage of the enzyme

Recommended actions include:
• Sequence the complementary strand
• Use a primer that anneals at a different position
• Use an anchored primer (i.e. a poly-T sequencing primer with an A, C or G base at its 3’ end, positioned at the end of a poly-A region). The 3’ base anchors the primer in place at the end of the homopolymer region

Figure 13. Long homopolymer T regions (or A regions) can cause problems due to enzyme slippage. For example, in a homopolymer region of 20 “T” bases, 20 “T” bases as well as 21 or 22 “T” bases may be incorporated, causing overlapping sequence after the homopolymer region.

8. Delayed migration

Sequence data displaying delayed migration show:
• Peaks that commence after the usual start point of 600 to 800 in the raw data trace
• Peaks that are not evenly spaced in the raw and analysed traces
• Poor base calls in the analysed electropherogram

Delayed migration may be the result of:
• Contaminating negative ions (salts or other contaminants) in the sample being injected in preference to the labelled fragments
• Heavily overloaded samples (an excess of template used during sequencing)

Recommended actions include:
• Dilute the sample in deionised formamide and rerun it; this often corrects the problem and yields good data

Figure 14. Delayed samples often result from an excess of salt in the sample.

9. Excess dye peaks at beginning of sequence

Sequence data displaying excess dye peaks have:
• Peaks of excess dye present in the raw data trace
• Dye blobs in the analysed data trace at positions 80, 120 and 190
• Low signal-to-noise ratios (S/N G:<50 A:<50 T:<50 C:<50)

The presence of dye blobs in the data may be the result of:
• Incorrect estimation of template concentration (i.e. insufficient template used)
• Poor removal of unincorporated dye terminators

Recommended actions include:
• Check template concentration by agarose gel
• Use fresh ethanol and sodium acetate (at room temperature) at the correct concentrations
• With microfuge tubes, aspirate the supernatant rather than decanting
• Do not use denatured alcohol
• Do not leave reactions precipitating overnight

Figure 15. Incomplete removal of excess dyes during the post cycle sequencing clean-up can obscure data at the beginning of the sequence.

10. Pull-up peaks and very strong signal

Sequence data displaying pull-up peaks are characterised by:
• Very high peaks in the raw data trace
• Very high peaks in the analysed data trace, with pull-up peaks and poor base calls
• Very high signal-to-noise ratios (S/N G:>750 A:>750 T:>750 C:>750)

Pull-up peaks may be the result of:
• Incorrect estimation of template concentration (i.e. too much template used)

Recommended actions include:
• Dilute the sample in deionised formamide and rerun it; this often corrects the problem and yields good data
• Reduce the amount of template used in sequencing

Figure 16. Pull-up peaks and very high signal may result from use of too much template during cycle sequencing.
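
The numeric thresholds quoted in this guide can be used as a first-pass screen before the traces are inspected. The following is a minimal sketch in Python, not part of any sequencing software. It assumes the four per-channel S/N values and the base call start have already been read from the annotation view. The names `triage` and `error_probability` are hypothetical, and treating a threshold as met only when all four channels satisfy it is an assumption. `error_probability` uses the standard Phred relation Pe = 10^(-QV/10), which is consistent with the "QV > 20 = Pe < 1%" rule given for the QV bars. The result only suggests which section to read first; the raw and analysed traces still need to be checked.

```python
def error_probability(qv):
    """Probability of error for a base call with quality value qv (Phred scale)."""
    return 10 ** (-qv / 10)


def triage(sn, base_call_start=None):
    """Suggest which trouble-shooting sections to check first.

    sn: dict mapping each channel ("G", "A", "T", "C") to its S/N value.
    base_call_start: scan point at which the read commences, if known.
    Thresholds are the ones quoted in this guide; the mapping from
    values to sections is an illustrative assumption.
    """
    values = [sn[channel] for channel in "GATC"]
    hints = []
    if all(v < 25 for v in values):
        hints.append("1. No recognisable sequence")
    elif all(v < 50 for v in values):
        # Both sections list S/N below 50 in every channel as a symptom.
        hints.append("2. Poor data and weak signal")
        hints.append("9. Excess dye peaks at beginning of sequence")
    elif all(v > 750 for v in values):
        hints.append("10. Pull-up peaks and very strong signal")
    # Peaks normally commence at a scan point of about 600 to 800.
    if base_call_start is not None and base_call_start > 800:
        hints.append("8. Delayed migration")
    return hints


if __name__ == "__main__":
    print(triage({"G": 40, "A": 30, "T": 45, "C": 26}, base_call_start=700))
```

An empty list means none of the numeric thresholds was triggered. Problems such as multiple sequences, repeat sequences or abrupt signal loss are not reflected in these values and can only be recognised from the traces themselves.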
© Copyright 2025