POINTS OF VIEW

Syst. Biol. 56(2):355-363, 2007
Copyright © Society of Systematic Biologists
ISSN: 1063-5157 print / 1076-836X online
DOI: 10.1080/10635150701294733
Downloaded from http://sysbio.oxfordjournals.org/ by guest on October 6, 2014

How Many Genes Should a Systematist Sample? Conflicting Insights from a Phylogenomic Matrix Characterized by Replicated Incongruence

JOHN GATESY,1 ROB DESALLE,2 AND NIKLAS WAHLBERG3

1 Department of Biology, University of California Riverside, Spieth Hall, Riverside, California 92521, USA; E-mail: john.gatesy@ucr.edu
2 Division of Invertebrates and Molecular Systematics Laboratory, American Museum of Natural History, Central Park West at 79th Street, New York, New York 10024, USA; E-mail: desalle@amnh.org
3 Department of Zoology, Stockholm University, S-106 91, Stockholm, Sweden and Laboratory of Genetics, University of Turku, 20014 Turku, Finland; E-mail: niklas.wahlberg@utu.fi

The average size of molecular systematic data sets has grown steadily over the past 20 years. Combined phylogenetic matrices that include multiple genetic loci currently are the norm, and in many cases, rapid compilation of extremely large DNA data sets is feasible. Thus, a frequently asked question is "How many genes should a systematist sequence in order to generate a robust phylogenetic hypothesis?" This query generally has been addressed by computer simulation, where the amount of virtual DNA sequence data that can be generated is unlimited (e.g., Huelsenbeck and Hillis, 1993). Genomic data, however, provide systematists with a multitude of empirical molecular data for phylogenetic analysis, and several authors have taken advantage of this resource to examine the effects of increasing the number of genes to quantities that seemed impossible in the recent past (e.g., Cummings et al., 1995; Bapteste et al., 2002; Goremykin, 2004).

In one noteworthy study, Rokas et al. (2003) compiled a large systematic matrix of 127,026 nucleotide positions from 106 genes for 7 species of Saccharomyces yeast and an outgroup (Candida albicans). Maximum likelihood (ML) and parsimony analyses of this large data set produced congruent, well-supported results with bootstrap scores of 100% for all clades (Fig. 1a). In spite of this overwhelming support, Rokas et al. (2003) noted that there was widespread topological conflict among gene trees. Separate analyses of individual genes produced various topologies that contradicted all nodes in the tree based on concatenation of 106 genes (Fig. 1a). Pairwise comparisons of gene trees showed extensive incongruence, and one conflicting clade, S. kudriavzevii + S. bayanus, was supported by a very large percentage of the gene trees (Fig. 1a). Replicated support for this anomalous clade was apparent in analyses of nucleotides, transversions, codons, and amino acids for a variety of systematic methods (Rokas et al., 2003; also see Holland et al., 2004, 2006; Phillips et al., 2004; Taylor and Piel, 2004; Collins et al., 2005; Gatesy et al., 2005; Ren et al., 2005; Hedtke et al., 2006).

FIGURE 1. (a) The tree supported by the concatenation of 106 genes from Rokas et al. (2003). ML bootstrap scores are above internodes, and the percentage of ML gene trees that strictly supported a particular clade is indicated below internodes. The most common, conflicting clade, Skud+Sbay, also is shown. (b) Branch lengths for the optimal ML tree for the concatenation of 106 yeast genes; scale bar (0.10 substitution/site) shows expected numbers of substitutions per site (length of outgroup branch is indicated). All phylogenetic analyses in this paper were branch and bound searches executed in PAUP* 4.0b10 (Swofford, 2002). All ML models were chosen by likelihood ratio tests as in Rokas et al. (2003) using PAUP* and ModelTest 3.06 (Posada and Crandall, 1998). Bootstrap analyses (Felsenstein, 1985) were as in Gatesy and Baker (2005). Scer = Saccharomyces cerevisiae; Spar = S. paradoxus; Smik = S. mikatae; Skud = S. kudriavzevii; Sbay = S. bayanus; Scas = S. castellii; Sklu = S. kluyveri; and Calb = Candida albicans.

By examining correlations between bootstrap scores and possible confounding factors, however, Rokas et al. (2003) concluded that "... none of the factors known or predicted to cause phylogenetic error could systematically account for the observed incongruence, suggesting that there may be no good predictor of the phylogenetic informativeness of genes" (p. 802). Therefore, many randomly selected genes were necessary to overwhelm conflicting signals. In this case study, very large concatenated data sets of ~20 genes were required to provide 95% bootstrap support for all nodes in the combined data tree, "substantially more genes than commonly used but a small fraction of any genome" (p. 799). Rokas et al. (2003) concluded that "These results have important implications for resolving branches of the tree of life" (p. 799) and "... important implications for many current practices in molecular phylogenetics" (p. 802), points that were reasserted in a commentary by Gee (2003). Specifically, if 20 or more genes generally are required to yield robust support, then most previous phylogenetic analyses are inadequate in terms of character sampling. This assertion is based on the assumption that the 8 taxa analyzed by Rokas et al. (2003) represent a typical systematic problem. Rokas et al. (2003) considered this issue, noting that "It is possible that the 8 yeast taxa we have analyzed represent a very difficult phylogenetic case, atypical of the situations found in other groups. However, the widespread occurrence of incongruence at all taxonomic levels argues strongly against such a view. Rather, we believe that this group is a representative model for key issues that researchers in phylogenetics are confronting" (p. 802). Large matrices that combine information from 20 or more gene fragments are rare (e.g., Murphy et al., 2001; Bapteste et al., 2002; Gatesy et al., 2002; Goremykin, 2004); therefore, if the test case of Rokas et al. (2003) is representative, most published molecular systematic studies are, at best, preliminary efforts.

Rokas et al. (2003) primarily used the nonparametric bootstrap (Felsenstein, 1985) to assess support and to search for correlates of incongruence in the yeast matrix. Recent reanalyses have utilized a variety of techniques to further characterize conflicting signals in the yeast data set. These approaches included Bayesian analysis (Taylor and Piel, 2004; Jeffroy et al., 2006), transversion coding (Phillips et al., 2004; Jeffroy et al., 2006), removal of rapidly evolving third codon positions (Collins et al., 2005; Jeffroy et al., 2006), partitioned Bremer support scores (Collins et al., 2005; Gatesy et al., 2005), consensus networks (Holland et al., 2004, 2006), isolation of genes with shifting base compositional biases (Collins et al., 2005), supertree bootstrapping (Burleigh et al., 2006), increased taxon sampling (Rokas and Carroll, 2005; Hedtke et al., 2006), and better fitting models of molecular evolution (Ren et al., 2005). Alternatively, several authors have suggested that reducing the number of taxa included in analysis can yield insights regarding the stability of phylogenetic hypotheses (Lanyon, 1985; Philippe and Douzery, 1994; Siddall, 1995; Brochu, 1997; Poe, 1998; Siddall and Whiting, 1999; Holland et al., 2003).

Here, we use selected removal of taxa to explore patterns of incongruence in the yeast data set. In particular, we analyze different subsets of species to determine whether disagreements among gene trees are tempered or accentuated by altering taxonomic representation. In combination with documentation of branch lengths for individual gene trees, our subsampling results show that the set of species analyzed by Rokas et al. (2003) is not representative of most published systematic studies. We suggest that the yeast matrix does not provide a coherent, general recommendation for how many genes to sample in future molecular systematic studies. However, patterns of conflict for different subsets of species offer a very simple explanation for replication of the discrepant S. kudriavzevii + S. bayanus clade in many gene trees (Fig. 1a).

EXCEPTIONALLY LONG BRANCHES

Examination of the optimal topology for the concatenation of 106 genes showed a striking difference between branches that connected 5 closely related Saccharomyces species (S. cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, S. bayanus) and branches that led to S. castellii, S. kluyveri, and the outgroup C. albicans (Fig. 1b; Hedtke et al., 2006; Jeffroy et al., 2006). For the ML model utilized by Rokas et al. (2003), the branches that connected to S. castellii, S. kluyveri, and C. albicans ranged from 0.31 to 1.58 expected substitutions per site, whereas the branches that joined the remaining, closely related Saccharomyces species were from 0.03 to 0.08 substitutions per site. Only 15% of the inferred nucleotide substitutions occurred on branches that linked these 5 species (Fig. 1b).

[Figure 2 not reproduced. Panels: YJL085W, YIL090W, YDL006W, YPL210C, YDR484W, YML096W, YDL116W, and YKL034W; scale bar = 1.00 substitution/site.]

FIGURE 2. The 8 yeast genes with the longest ML branch lengths for the tree supported by the concatenation of 106 genes. The scale bar represents 1.00 expected substitution per site; branches that connect the 5 most closely related Saccharomyces species are tiny at this scale. The length of the longest branch in each yeast gene tree is indicated. Note that some topologies show more than one branch that is >1.00 expected substitution per site. Abbreviations for yeast species are as in Figure 1.

Consistent with an estimated Precambrian (~723 Mya) divergence of Candida albicans from Saccharomyces cerevisiae (Hedges et al., 2004), the outgroup branch in the yeast tree was exceptionally long. Each site in the concatenated data set is expected to change 1.58 times on this branch according to the ML estimate (Fig. 1b).
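To give a feel for what branch lengths of this magnitude imply, the following minimal sketch converts a branch length (expected substitutions per site) into the expected proportion of sites that differ between the two ends of the branch. It assumes the Jukes-Cantor (JC69) model purely for illustration; this is not the ML model selected with ModelTest for the analyses described here, and the function name `jc69_expected_difference` is hypothetical.

```python
import math


def jc69_expected_difference(branch_length):
    """Expected proportion of differing sites across one branch under JC69.

    branch_length is in expected substitutions per site. JC69 is an
    illustrative assumption, not the model used in the original analyses.
    """
    return 0.75 * (1.0 - math.exp(-4.0 * branch_length / 3.0))


if __name__ == "__main__":
    # Branch lengths quoted in the text for the concatenated 106-gene tree.
    for d in (0.03, 0.08, 0.31, 1.58):
        p = jc69_expected_difference(d)
        print(f"{d:5.2f} substitutions/site -> {p:.3f} of sites expected to differ")
```

Under this simplification the proportion of differing sites can never exceed 0.75, so as a branch grows longer the observed differences carry less and less information about how much change actually occurred.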
For the topology supported by the concatenation of 106 genes, 43% of the yeast genes had one branch that was >2.00 expected substitutions per site, and 79% of the yeast genes had at least one branch that was >1.00 substitution per site (also see Hedtke et al., 2006). For comparison, in an often cited discussion of long branches in a 28S rDNA tree of holometabolous insects, Huelsenbeck (1998) remarked that two branches in his analysis were "among the longest ever observed (approximately 1.0 substitution per site)" (p. 530). However, branches in many of the yeast gene trees dwarfed those in the insect rDNA tree and were up to 95 times longer (Fig. 2). From another perspective, the longest branches in the yeast data set exceeded those in a tree based on mitochondrial genomes from 5 animal phyla (Naylor and Brown, 1998) and also were much longer than branches in simulations designed to assess misplacement of long branches (e.g., Anderson and Swofford, 2004). Although it has been suggested that the set of species in the yeast data set represents a typical phylogenetic problem (Rokas et al., 2003), the extraordinarily long branch lengths in most yeast gene trees demonstrate that this is not the case (e.g., Fig. 2).

COMPLETE CONGRUENCE FOR FIVE CLOSELY RELATED Saccharomyces SPECIES

For the yeast data set, gene trees that included all 8 species showed many conflicts among the 5 most closely related Saccharomyces species (Fig. 1a). In ML analyses, 11% of gene trees conflicted with the S. cerevisiae + S. paradoxus clade, 28% conflicted with the S. cerevisiae + S. paradoxus + S. mikatae clade, and 45% conflicted with the S. cerevisiae + S. paradoxus + S. mikatae + S. kudriavzevii clade. Given the moderate lengths of branches that linked S. cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, and S. bayanus (Fig. 1b), we were surprised by the widespread discrepancies among genes at this level.

To further explore differences among gene trees, we reanalyzed the 5 closely related species of Saccharomyces in isolation from their distant relatives, S. castellii, S. kluyveri, and C. albicans. Given the diversity of gene trees for all 8 taxa, we expected to find many conflicting topologies but were shocked by complete congruence among the 106 gene trees in ML analyses (Fig. 3). There are 15 possible bifurcating trees (unrooted) for a data set of 5 taxa; assuming an equal probability for each topology a priori, the chance of recovering the same tree 106 straight times is astronomically low (P = 3.24 × 10^-124). Ironically, a systematic data set that has been presented as a prime example of pervasive, inexplicable conflict among genes (Rokas et al., 2003) can be transformed, with the removal of 3 species, into a remarkably congruent data set that shows 100% agreement among 106 genes. For the set of 5 closely related Saccharomyces species, 20 genes were not necessary to resolve relationships; basically any gene will do (Fig. 3c). Subsets of only 600 randomly resampled nucleotides from the yeast data set were sufficient to support the 2 nodes in the 5 taxon tree in >95% of replicates, and only 200 nucleotides recovered each clade >75% of the time (Fig. 3a, b). These are very small samples of characters relative to most modern systematic studies.

[Figure 3 not reproduced. Panel (a): the topology shared by all genes for Scer, Spar, Smik, Skud, and Sbay. Panel (b): support for each clade plotted against base pairs sampled (200 to 1000); open circle = Scer+Spar, filled circle = Scer+Spar+Smik. Panel (c): the 106 ML gene trees, each labeled with gene name, length in bp, and ML bootstrap scores.]

FIGURE 3. One hundred percent congruence among 106 gene trees for the 5 closely related Saccharomyces species. In separate ML analyses, all genes supported the same topology (a) that included Scer+Spar (white circle) and Scer+Spar+Smik (black circle). Very few randomly sampled nucleotides were required to consistently recover these clades in analyses of the 5 closely related Saccharomyces species (b); subsamples of only 600 nucleotides supported each clade >95% of the time (dotted line = 95%). The 106 completely congruent ML gene trees for Scer, Spar, Smik, Skud, and Sbay are shown in (c); ML bootstrap scores and the number of nucleotides for each gene are indicated. Species abbreviations are as in Figure 1. For all genes, ML models for 5 species were chosen as in Rokas et al. (2003) using PAUP* and ModelTest 3.06. Analyses of randomly sampled sites from the yeast data set (b) were done in PAUP* and included 500 replicates for each sample size (200, 400, 600, 800, and 1000 nucleotide sites). ML searches were branch and bound.

INCONGRUENCE AMONG GENES WITH THE ADDITION OF DIVERGENT TAXA

In ML analyses of S. cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, and S. bayanus, there were no topological discrepancies among gene trees (Fig. 3c), but when more distantly related taxa (S. castellii, S. kluyveri, and C. albicans) were included, gene trees showed extensive conflicts regarding relationships among the 5 closely related Saccharomyces species. Previously, we noted such conflicts for the full complement of 8 species (Fig. 1a). Analyses of the 7 Saccharomyces species, excluding the outgroup C. albicans, also revealed widespread incongruence among genes (Rokas et al., 2003), as did analyses of 6 species (Fig. 4).

For the 6 species set composed of 5 closely related Saccharomyces species and C. albicans, the disparity in length between the outgroup branch and ingroup branches was greatest. Approximately 87% of the expected character change was restricted to the outgroup branch (3.04 substitutions per site), and the concatenation of 106 genes supported a grouping of S. kudriavzevii + S. bayanus with an ML bootstrap score of 76% (Fig. 4). This relationship was incompatible with the S. cerevisiae + S. paradoxus + S. mikatae + S. kudriavzevii clade that had 100% bootstrap support in the analysis of 106 genes for 8 species (Fig. 1a). Thus, a systematic data set of 8 species, in which 20 genes were considered sufficient for robust phylogenetic support (Rokas et al., 2003), can be transformed, with the deletion of 2 species, into a data set of 106 genes that yields a contradictory tree; >100 concatenated genes did not provide >95% bootstrap support at all nodes for this set of 6 species (Fig. 4). Phylogenetic analyses of taxon subsamples (Figs. 3, 4) clearly show that the number of genes required to yield strong bootstrap scores is highly dependent on the particulars of a given systematic problem and suggest that long branches (Fig. 2) explain much of the incongruence among genes in the yeast data set (Taylor and Piel, 2004; Hedtke et al., 2006; Jeffroy et al., 2006).

FIGURE 4. Trees supported by the concatenation of 106 genes in ML analyses of 6 species. Note the very long outgroup branches; expected substitutions per site for the outgroup branches are indicated. In the topology rooted with Calb, the conflicting Skud+Sbay clade was supported. ML bootstrap scores are above internodes, and the percentage of ML gene trees that strictly supported a particular clade is indicated below internodes. For the top and middle trees, numbers in parentheses indicate the percentage of times that the Skud+Sbay clade was supported; for the bottom tree, the number in parentheses is the percentage of gene trees that supported the Scer+Spar+Smik+Skud clade. Species abbreviations are as in Figure 1. For each gene and for the concatenation of 106 genes, ML models for each set of 6 species were chosen as in Rokas et al. (2003) using PAUP* and ModelTest 3.06.

HIGHLY REPLICATED INCONGRUENCE WITH THE ADDITION OF DISTANT TAXA

In ML analyses of the 5 closely related Saccharomyces species, all gene trees had the same 7 branches (Fig. 3a). Assignment of the root to 5 of these 7 branches will yield the incongruent S. kudriavzevii + S. bayanus clade (Fig. 5a). When the distantly related C. albicans was added to a matrix that included S. cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, and S. bayanus, all ML gene trees were consistent with the unrooted topology for these 5 Saccharomyces species (Fig. 3a), but rooting position was scattered among the 7 ingroup branches (Fig. 5b). The most common root position was on the "correct" branch (S. bayanus; 31 times). For the other 75 genes, the root was distributed across the remainder of the topology, and the majority of gene trees (57 of 106) supported the S. kudriavzevii + S. bayanus clade (Figs. 4 and 5b). Assuming an equal a priori probability of recovering the S. kudriavzevii + S. bayanus clade or the S. cerevisiae + S. paradoxus + S. mikatae + S. kudriavzevii clade, it would be highly unlikely for one of these groups to be supported in ≥57 of 88 gene trees, as was the case here (binomial probability of 0.007). Analogous but less extreme patterns were observed in ML gene trees for other combinations of 6 taxa, in which the 5 closely related Saccharomyces species were rooted with either S. castellii or S. kluyveri. S. kudriavzevii + S. bayanus was always the most common, conflicting clade (Fig. 4). Likewise, Rokas et al. (2003) documented the same pattern of replicated support for the conflicting S. kudriavzevii + S. bayanus in gene trees for the 7 Saccharomyces species, excluding the outgroup C. albicans.

Of the 10,395 possible bifurcating topologies for all 8 species, the S. kudriavzevii + S. bayanus bipartition is found in only 9% of all trees. However, ML analyses of 8 species showed that S. kudriavzevii + S. bayanus was recovered 32 times in 106 gene trees (Fig. 1a).
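The combinatorial figures quoted in the preceding sections (15 unrooted trees for 5 taxa, P = 3.24 × 10^-124, 5 of 7 root positions, 9% of the 10,395 trees for 8 taxa, and the binomial probability of 0.007) can be checked with a short script. This is a minimal sketch rather than the procedure used in the original analyses. The function names are hypothetical, and two interpretations are assumed: that P for 106 identical trees is (1/15)^105, since the first tree may be any topology, and that the binomial probability is two-sided for at least 57 of 88 trees with equal odds.

```python
from math import comb

TAXA = frozenset({"Scer", "Spar", "Smik", "Skud", "Sbay"})

# One side of each of the 7 branches of the unrooted 5-species tree:
# 5 terminal branches plus the 2 internal branches.
BRANCHES = [frozenset({t}) for t in sorted(TAXA)] + [
    frozenset({"Scer", "Spar"}),
    frozenset({"Skud", "Sbay"}),
]


def num_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating trees, (2n - 5)!!, for n >= 3 taxa."""
    count = 1
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count


def clades_when_rooted_on(root_side):
    """Clades of the rooted tree obtained by placing the root on one branch.

    For every branch, the clade is the side that does not contain the root,
    i.e. the side nested within one side of the root branch. The root branch
    itself contributes both of its sides.
    """
    root_sides = (root_side, TAXA - root_side)
    clades = set()
    for side in BRANCHES:
        for candidate in (side, TAXA - side):
            if any(candidate <= r for r in root_sides):
                clades.add(candidate)
    return clades


def rootings_with_clade(clade):
    """Count the root positions (out of 7) that yield the given clade."""
    return sum(clade in clades_when_rooted_on(b) for b in BRANCHES)


def two_sided_binomial(k, n):
    """P(either of two equally likely outcomes occurs at least k of n times)."""
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return 2 * upper_tail


if __name__ == "__main__":
    print("unrooted trees, 5 taxa:", num_unrooted_trees(5))
    print("unrooted trees, 8 taxa:", num_unrooted_trees(8))
    # Trees containing Skud+Sbay: treat the pair as one leaf, leaving 7 taxa.
    print("fraction with Skud+Sbay:", num_unrooted_trees(7) / num_unrooted_trees(8))
    print("P(106 identical trees):", (1 / 15) ** 105)
    print("rootings giving Skud+Sbay:", rootings_with_clade(frozenset({"Skud", "Sbay"})))
    print("binomial, >=57 of 88:", two_sided_binomial(57, 88))
```

Under these assumptions the script reproduces the 15 and 10,395 tree counts, the roughly 9% share of 8-taxon trees containing Skud+Sbay, and the 5 of 7 root positions, and it returns a binomial probability below 0.01, in line with the value reported in the text.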
56 SYSTEMATIC BIOLOGY Seer a) Spar Smik Skud Sbay 0.02 substitutions/site b) ITTT" nmmmmnr mrmmmmm' Seer Spar Smik Skud Sbay nnmmSpar mmtmmmm r J"L £v Lr ^Seer Skud Sbay Smik Seer Spar r J~L - [P ^ " T- beer Spar Skud Sbay — Smik J«— Smik (DJ1— Seer 1 Spar c) rtmr Spar TTTT mr trm Smik mmmmnn Skud Sbay r Skud Jl-Sbay nnmmim mnmiTTTmtmmr TTTmTTTTT Seer Smik Skud Sbay FIGURE 5. (a) ML tree supported by concatenation of 106 genes for the 5 closely related Saccharomyces species, with possible root placements shown. Five of the 7 roots yield the conflicting Skud+Sbay clade. (b) Placement of the root in 106 ML gene trees for 6 species (Seer, Spar, Smik, Skud, Sbay, and Calb). (c) Placement of the root in 106 ML gene trees for all 8 yeast species. Arrows on branches indicate the number of times a particular branch was rooted by Calb (b) or by Calb, Seas, and Sklu (c). For 3 genes, there were 2 optimal rootings; truncated arrows on branches in (b) indicate that in 1 of the 2 best trees, the root was assigned to that branch. Species abbreviations are as in Figure 1. previous parsimony, ML, and Bayesian results showed How MANY GENES ARE ENOUGH? this same pattern of replicated incongruence whether Rokas et al. (2003) suggested that their analyses of 106 nucleotides, transversions, codons, or amino acids were genes from 8 species had important implications for reanalyzed (Rokas et al, 2003; Phillips et al., 2004; solving the tree of life. In particular, they argued that Taylor and Piel, 2004; Collins et al., 2005; Gatesy et al., 20 or more genes might be required to garner robust 2005; Ren et al., 2005; Burleigh et al., 2006; Holland et al., support for phylogenetic relationships. This assertion 2004,2006). Repeated recovery of the incongruent S. ku- is based on two critical assumptions: (1) There are no driavzevii + S. bayanus clade in ~30% of our ML gene good predictors for the utility of different genes, and trees strongly suggested an underlying bias. 
Once again, (2) the 8 species in their data set represent a typical phyall ML gene trees for 8 species were compatible with re- logenetic problem. Recent reanalyses of the yeast data lationships in the unrooted tree for the 5 closely related set have contested both of these assumptions. Phillips Saccharomyces species (Fig. 3a), but different placements et al. (2004) noted that differences in G-C content might of the 3 long branch taxa (Fig. 6a, b) resulted in many explain non-historical signals in the yeast matrix. Subsegene trees that supported the S. kudriavzevii + S. bayanus quently, Collins et al. (2005) found that shifts in base comclade (Figs, la and 5c). As in the analyses of 6 taxa (Fig. 4), position were most prominent at third codon positions replicated support for the conflicting S. kudriavzevii + (also see Jeffroy, 2006). When Collins et al. (2005) resamS. bayanus grouping was due to erratic rooting of the pled genes with stationary base compositions, only 10 uniformly supported, pectinate topology for S. cerevisiae, genes were required to record high bootstrap percentS. paradoxus, S. mikatae, S. kudriavzevii, and S. bayanus ages for relationships supported by the concatenated (Fig. 3a) by the very distantly related S. castellii, S. data set. By contrast, 23 genes characterized by large kluyveri, and C. albicans (Figs. 4 to 6; for discussion of shifts in base composition were necessary to yield the distant outgroups see Wheeler, 1990; Huelsenbeck et al., same level of support (Collins et al., 2005). In a study 2002; Holland et al., 2003; Anderson and Swofford, 2004; focused on Bayesian support measures, Taylor and Piel Bergsten, 2005; Susko et al., 2005; Goloboff and Pol, 2005; (2004) found that, "Overall the external/internal branch Hedkeetal.,2006). 
length ratios were greater for trees that were incongruent Downloaded from http://sysbio.oxfordjournals.org/ by guest on October 6, 2014 r _P"1— p "I 2007 361 POINTS OF VIEW ® | Skud Sbay Smik Jl-Sm i 1 1 Spar Spar Seer I- Seer Seas Sklu Calb 0.10 0.10 YBR039W (72%) YDL215C (47%) | — 0.10 0.10 YGR194C (50%) Calb 0.10 YEL037C (72%) 0.10 YNL155W (77%) Skud Sbay Smik J-Srr © | L Spar ** Seer Seas Sklu Calb YLL029W (75%) pO-HZ i— 1 — Sklu Calb Calb 0.10 YNL201C (60%) b) L — 0.10 YDL195W — 0.10 YOR025W (54%) Skud Sbay Smik Seer f<6>f Si i Spar Spi Seas - Sklu Calb 0.10 YJR072C (68%) Smik Calb Calb 0.10 YGL192W 0.10 YKL120W FIGURE 6. Optimal ML topologies for 12 yeast genes. All gene trees are consistent with the unrooted topology for the 5 closely related Saccharomyces species (Fig. 3); branches that connect these 5 species are shown as thick lines. Numbers in circles indicate rooting position according to Figure 5. The 9 genes in (a) all supported the Skud+Sbay clade (bootstrap support for Skud+Sbay is shown in parentheses). When these 9 genes were combined, Skud+Sbay was not supported, and the topology favored by the concatenation of 106 genes was optimal (see Gatesy and Baker, 2005). The three gene trees in (b) support alternative topologies. Two of these genes did not support monophyly of the 5 closely related Saccharomyces species but were compatible with the unrooted tree for these species (Fig. 3a). with the reference tree [our Fig. la, b ] . . . " (p. 1536), a result that was statistically significant. In sum, the contention that there are no good predictors of phylogenetic utility for particular genes does not seem to hold for this phylogenomic data set. Following Taylor and Piel (2004), Ren et al. (2005), Hedtke et al. (2006), Jeffroy et al. (2006), and Holland et al. (2006) noted the presence of exceptionally long branches and argued that a high level of divergence and associated branch length inequalities (e.g., Fig. 
2) were determinants of conflict among genes in the yeast data set. Here, we extended these arguments and concluded that the 8 species in the yeast data set do not represent a "typical" phylogenetic problem. The tree based on the concatenated matrix of 106 genes showed great disparities in branch lengths (Fig. 1b), but individual gene trees had some truly extraordinary branches that were up to 95.31 expected substitutions per site (Fig. 2). This saturation of nucleotide substitution does not represent a typical phylogenetic problem; many systematists acknowledge that this degree of divergence is a very difficult problem (Felsenstein, 1978; Hendy and Penny, 1989; Wheeler, 1990; Huelsenbeck, 1998; Pol and Siddall, 2001; Holland et al., 2003; Anderson and Swofford, 2004; Bergsten, 2005; Susko et al., 2005). S. castellii, S. kluyveri, and C. albicans are very genetically distant from each other and from the 5 most closely related Saccharomyces species (Figs. 1, 2, 4, and 6). Therefore, it was not surprising that there were wholesale conflicts among gene trees in parsimony, Bayesian, and ML analyses (Taylor and Piel, 2004; Hedtke et al., 2006). In fact, the 3 divergent taxa, which also were characterized by the largest shifts in base composition (Collins et al., 2005), accounted for all conflicts among genes in ML analyses.

Because of extensive incongruence, Rokas et al. (2003) found that 20 randomly sampled genes from the yeast matrix were required for a robustly supported tree of 8 species, but this result has no generality. Even within this 106 gene matrix, it is clear that some systematic problems are much more difficult to solve relative to others. For the 5 closely related Saccharomyces species, one gene might be sufficient (Fig. 3c). ML analyses of individual genes produced the same tree 106 straight times, and sets of 600 randomly sampled nucleotides consistently supported this topology (Fig. 3b). By contrast, in ML analyses of these 5 species plus C. albicans, 106 concatenated genes (127,026 nucleotides) apparently were insufficient; the optimal ML tree (Fig. 4) contradicted the best tree for all 8 yeast species, a topology that was thought to show an unprecedented level of support (Fig. 1a, b; Gee, 2003; Rokas et al., 2003).

Clearly, the quantity of genes that is required to robustly resolve relationships will be dependent on the specifics of the phylogenetic problem at hand (Cummings and Meyer, 2005; Hedtke et al., 2006), as well as a particular researcher's definition of "adequate support" (e.g., Satta et al., 2000; Zander, 2001; Siddall, 2002; Grant and Kluge, 2003; Soltis et al., 2004; Taylor and Piel, 2004; Jeffroy et al., 2006). For easy phylogenetic problems where divergence among taxa is not great and internodes are moderately long, a single gene might provide high bootstrap support (e.g., Fig. 3). However, even in this situation, sequencing 2 or more genes may be justified, given that tightly linked nucleotides do not necessarily provide independent evidence for phylogenetic relationships (Doyle, 1992). For cases where exceptionally long branches are apparent, even 106 genes might not be enough (e.g., Fig. 4). When faced with extreme branch lengths like these, increased taxonomic sampling (e.g., Zwickl and Hillis, 2002), evidence from the fossil record (e.g., Brochu, 1997), or a set of more slowly evolving genes (e.g., Springer et al., 2001) with stationary base frequencies (e.g., Collins et al., 2005) may be required. Educated guesses can be made, but in the end, the amount of character data needed to arrive at a stable, well-supported phylogenetic hypothesis can only be quantified by adding new data to existing data and then reassessing the results.

ITERATED CONFLICT IN PHYLOGENOMIC MATRICES

Our reanalyses provided a very simple explanation for replicated support of the conflicting S. kudriavzevii + S. bayanus clade in many yeast gene trees (Fig. 1a). Misrooting of a stable topology for 5 close relatives (Fig. 3a) by 3 genetically distant taxa (Fig. 4) can account for this iterated pattern. Because the topology for S. cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, and S. bayanus was pectinate (Fig. 3a), erratic placement of the root (Fig. 5) repeatedly yielded the discrepant S. kudriavzevii + S. bayanus clade. In the most extreme case that we examined (a data set that included 6 of 8 species in the yeast matrix), the "wrong clade" was preferred over the "right clade" in the majority of ML gene trees (57 of 106 = 54%) and in the concatenated analysis of 106 genes (Fig. 4).

This result represents a cautionary tale for phylogenomic studies, in which >100 genes from relatively few taxa may be sampled, and where congruence among individual gene trees has been used to assess support (e.g., Rokas et al., 2003; Holland et al., 2004, 2006; Burleigh et al., 2006). Previously, many authors have argued that large concatenations of genes can provide strong, but spurious, bootstrap support because of model misspecification, inadequate taxon sampling, or both (e.g., Philippe and Douzery, 1994; Naylor and Brown, 1998; Holland et al., 2004, 2006; Phillips et al., 2004; Soltis et al., 2004; Stefanovic et al., 2004; Hedtke et al., 2006; Jeffroy et al., 2006). Here, we documented an exceptional pattern of replicated conflict in which a consensus derived from separate analyses of >100 genes failed to give the right result; nearly twice as many gene trees favored the wrong grouping of S. kudriavzevii + S. bayanus over the right S. cerevisiae + S. paradoxus + S. mikatae + S. kudriavzevii clade (Fig. 4). In comparison to concatenation of genes, it might be expected that partitioned phylogenetic analyses of individual genes should be less prone to highly supported but spurious results. Unfortunately, this is not always the case (Fig. 4), and a simple compilation of many genes for very few taxa (Rokas et al., 2003; Rokas and Carroll, 2004) cannot be trusted as a general solution for "ending incongruence" (Gee, 2003).

ACKNOWLEDGEMENTS

We thank R. Baker, T. Collins, A. de Queiroz, J. Garb, C. Hayashi, M. R. McGowen, R. Page, and two anonymous reviewers for comments on different versions of the manuscript. J. Gatesy was supported by NSF (USA) DEB-0212572, DEB-0213171, and EAR-0228629; R. DeSalle was supported by the Lewis B. and Dorothy Cullman Program in Molecular Systematics at the American Museum of Natural History and by NSF (USA) DBI-0421604; N. Wahlberg was supported by the Swedish Research Council 621-2004-2853. G. Naylor provided alignments of animal mitochondrial genomes. A. Rokas provided published multiple sequence alignments and supporting materials that made the present study possible.

REFERENCES

Anderson, F. E., and D. L. Swofford. 2004. Should we be worried about long-branch attraction in real data sets? Investigations using metazoan 18S rDNA. Mol. Phylogenet. Evol. 33:440-451.
Bapteste, E., H. Brinkmann, J. A. Lee, D. V. Moore, C. W. Sensen, P. Gordon, L. Duruflé, T. Gaasterland, P. Lopez, M. Müller, and H. Philippe. 2002. The analysis of 100 genes supports the grouping of three highly divergent amoebae: Dictyostelium, Entamoeba, and Mastigamoeba. Proc. Natl. Acad. Sci. USA 99:1414-1419.
Bergsten, J. 2005. A review of long-branch attraction. Cladistics 21:163-193.
Brochu, C. 1997. Morphology, fossils, divergence timing, and the phylogenetic relationships of Gavialis. Syst. Biol. 46:479-522.
Burleigh, J. G., A. C. Driskell, and M. J. Sanderson. 2006. Supertree bootstrapping methods for assessing phylogenetic variation among genes in genome-scale data sets. Syst. Biol. 55:426-440.
Collins, T. M., O. Fedrigo, and G. J. P. Naylor. 2005. Choosing the best genes for the job: The case for stationary genes in genome-scale phylogenies. Syst. Biol. 54:493-500.
Cummings, M. P., and A. Meyer. 2005. Magic bullets and golden rules: Data sampling in molecular phylogenetics. Zoology 108:329-336.
Cummings, M. P., S. P. Otto, and J. Wakeley. 1995. Sampling properties of DNA sequence data in phylogenetic analysis. Mol. Biol. Evol. 12:814-822.
Doyle, J. J. 1992. Gene trees and species trees: Molecular systematics as one-character taxonomy. Syst. Bot. 17:144-163.
Felsenstein, J. 1978. Cases in which parsimony and compatibility methods will be positively misleading. Syst. Zool. 27:401-410.
Felsenstein, J. 1985. Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39:783-791.
Gatesy, J., and R. H. Baker. 2005. Hidden likelihood support in genomic data: Can forty-five wrongs make a right? Syst. Biol. 54:483-492.
Gatesy, J., C. Matthee, R. DeSalle, and C. Hayashi. 2002. Resolution of a supertree/supermatrix paradox. Syst. Biol. 51:652-664.
Gee, H. 2003. Ending incongruence. Nature 425:782.
Goloboff, P. A., and D. Pol. 2005. Parsimony and Bayesian phylogenetics. Pages 148-159 in Parsimony, phylogeny, and genomics (V. A. Albert, ed.). Oxford University Press, Oxford, UK.
Goremykin, V. V. 2004. The chloroplast genome of Nymphaea alba: Whole-genome analyses and the problem of identifying the most basal angiosperm. Mol. Biol. Evol. 21:1445-1454.
Grant, T., and A. G. Kluge. 2003. Data exploration in phylogenetic inference: Scientific, heuristic, or neither. Cladistics 19:379-418.
Hedges, S. B., J. E. Blair, M. L. Venturi, and J. L. Shoe. 2004. A molecular timescale of eukaryote evolution and the rise of complex multicellular life. BMC Evol. Biol. 4:2.
Hedtke, S. M., T. M. Townsend, and D. M. Hillis. 2006. Resolution of phylogenetic conflict in large data sets by increased taxon sampling. Syst. Biol. 55:522-529.
Hendy, M. D., and D. Penny. 1989. A framework for the quantitative study of evolutionary trees. Syst. Zool. 38:297-309.
Holland, B. R., K. T. Huber, V. Moulton, and P. J. Lockhart. 2004. Using consensus networks to visualize contradictory evidence for species phylogeny. Mol. Biol. Evol. 21:1459-1461.
Holland, B. R., L. S. Jermiin, and V. Moulton. 2006. Improved consensus network techniques for genome-scale phylogeny. Mol. Biol. Evol. 23:848-855.
Holland, B. R., D. Penny, and M. D. Hendy. 2003. Outgroup misplacement and phylogenetic inaccuracy under a molecular clock—A simulation study. Syst. Biol. 52:229-238.
Huelsenbeck, J. P. 1998. Systematic bias in phylogenetic analysis: Is the Strepsiptera problem solved? Syst. Biol. 47:519-537.
Huelsenbeck, J. P., J. P. Bollback, and A. M. Levine. 2002. Inferring the root of a phylogenetic tree. Syst. Biol. 51:32-43.
Huelsenbeck, J. P., and D. M. Hillis. 1993. Success of phylogenetic methods in the four-taxon case. Syst. Biol. 42:247-264.
Jeffroy, O., H. Brinkmann, F. Delsuc, and H. Philippe. 2006. Phylogenomics: The beginning of incongruence. Trends Genet. 22:225-231.
Lanyon, S. 1985. Detecting internal inconsistencies in distance data. Syst. Zool. 34:397-403.
Murphy, W. J., E. Eizirik, S. J. O'Brien, O. Madsen, M. Scally, C. J. Douady, E. Teeling, O. A. Ryder, M. J. Stanhope, W. W. de Jong, and M. S. Springer. 2001. Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science 294:2348-2351.
Naylor, G. J. P., and W. M. Brown. 1998. Amphioxus mitochondrial DNA, chordate phylogeny, and the limits of inference based on comparisons of sequences. Syst. Biol. 47:61-76.
Philippe, H., and E. Douzery. 1994. The pitfalls of molecular phylogeny based on four species, as illustrated by the Cetacea/Artiodactyla relationship. J. Mamm. Evol. 2:133-152.
Phillips, M. J., F. Delsuc, and D. Penny. 2004. Genome-scale phylogeny and the detection of systematic biases. Mol. Biol. Evol. 21:1455-1458.
Poe, S. 1998. Sensitivity of phylogeny estimation to taxonomic sampling. Syst. Biol. 47:18-31.
Pol, D., and M. E. Siddall. 2001. Biases in maximum likelihood and parsimony: A simulation approach to a 10-taxon case. Cladistics 17:266-281.
Posada, D., and K. Crandall. 1998. ModelTest: Testing the model of DNA substitution. Bioinformatics 14:817-818.
Ren, F., H. Tanaka, and Z. Yang. 2005. An empirical examination of the utility of codon-substitution models in phylogeny reconstruction. Syst. Biol. 54:808-818.
Rokas, A., and S. B. Carroll. 2005. More genes or more taxa? The relative contribution of gene number and taxon number to phylogenetic accuracy. Mol. Biol. Evol. 22:1337-1344.
Rokas, A., B. Williams, N. King, and S. B. Carroll. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798-804.
Satta, Y., J. Klein, and N. Takahata. 2000. DNA archives and our nearest relative: The trichotomy problem revisited. Mol. Phylogenet. Evol. 14:259-275.
Siddall, M. 1995. Another monophyly index: Revisiting the jackknife. Cladistics 11:33-56.
Siddall, M., and M. Whiting. 1999. Long-branch abstractions. Cladistics 15:9-24.
Siddall, M. E. 2002. Measures of support. Pages 80-101 in Methods and tools in biosciences and medicine: Techniques in molecular systematics and evolution (R. DeSalle, G. Giribet, and W. Wheeler, eds.). Birkhäuser Verlag, Basel, Switzerland.
Soltis, D. E., V. A. Albert, V. Savolainen, K. Hilu, Y.-L. Qiu, M. W. Chase, J. S. Farris, S. Stefanovic, D. W. Rice, J. D. Palmer, and P. S. Soltis. 2004. Genome-scale data, angiosperm relationships, and 'ending congruence': A cautionary tale in phylogenetics. Trends Plant Sci. 9:477-483.
Springer, M. S., R. W. DeBry, C. Douady, H. M. Amrine, O. Madsen, W. W. de Jong, and M. J. Stanhope. 2001. Mitochondrial versus nuclear gene sequences in deep-level mammalian phylogeny reconstruction. Mol. Biol. Evol. 18:132-143.
Stefanovic, S., D. W. Rice, and J. D. Palmer. 2004. Long branch attraction, taxon sampling, and the earliest angiosperms: Amborella or monocots? BMC Evol. Biol. 4:35.
Susko, E., M. Spencer, and A. J. Roger. 2005. Biases in phylogenetic estimation can be caused by random sequence segments. J. Mol. Evol. 61:351-359.
Swofford, D. L. 2002. PAUP*. Phylogenetic analysis using parsimony (*and other methods). Version 4.0b10. Sinauer Associates, Sunderland, Massachusetts.
Taylor, D. J., and W. H. Piel. 2004. An assessment of accuracy, error, and conflict with support values from genome-scale phylogenetic data. Mol. Biol. Evol. 21:1534-1537.
Wheeler, W. C. 1990. Nucleic acid sequence phylogeny and random outgroups. Cladistics 6:363-367.
Zander, R. H. 2001. A conditional probability of reconstruction measure for internal cladogram branches. Syst. Biol. 50:425-437.
Zwickl, D. J., and D. M. Hillis. 2002. Increased taxon sampling greatly reduces phylogenetic error. Syst. Biol. 51:588-598.

First submitted 28 April 2006; reviews returned 7 July 2006; final acceptance 19 October 2006
Associate Editor: Tim Collins