
2007
POINTS OF VIEW
First submitted 9 May 2006; reviews returned 7 July 2006;
final acceptance 15 October 2006
Associate Editor: Allan Baker
Syst. Biol. 56(2):355-363, 2007
Copyright © Society of Systematic Biologists
ISSN: 1063-5157 print / 1076-836X online
DOI: 10.1080/10635150701294733
How Many Genes Should a Systematist Sample? Conflicting Insights from a Phylogenomic Matrix Characterized by Replicated Incongruence
JOHN GATESY,1 ROB DESALLE,2 AND NIKLAS WAHLBERG3
1Department of Biology, University of California Riverside, Spieth Hall, Riverside, California 92521, USA; E-mail: john.gatesy@ucr.edu
2Division of Invertebrates and Molecular Systematics Laboratory, American Museum of Natural History, Central Park West at 79th Street, New York, New York 10024, USA; E-mail: desalle@amnh.org
3Department of Zoology, Stockholm University, S-106 91, Stockholm, Sweden and Laboratory of Genetics, University of Turku, 20014 Turku, Finland; E-mail: niklas.wahlberg@utu.fi
Downloaded from http://sysbio.oxfordjournals.org/ by guest on October 6, 2014
The average size of molecular systematic data sets has grown steadily over the past 20 years. Combined phylogenetic matrices that include multiple genetic loci currently are the norm, and in many cases, rapid compilation of extremely large DNA data sets is feasible. Thus, a frequently asked question is "How many genes should a systematist sequence in order to generate a robust phylogenetic hypothesis?" This query generally has been addressed by computer simulation, where the amount of virtual DNA sequence data that can be generated is unlimited (e.g., Huelsenbeck and Hillis, 1993). Genomic data, however, provide systematists with a multitude of empirical molecular data for phylogenetic analysis, and several authors have taken advantage of this resource to examine the effects of increasing the number of genes to quantities that seemed impossible in the recent past (e.g., Cummings et al., 1995; Bapteste et al., 2002; Goremykin, 2004).
In one noteworthy study, Rokas et al. (2003) compiled a large systematic matrix of 127,026 nucleotide positions from 106 genes for 7 species of Saccharomyces yeast and an outgroup (Candida albicans). Maximum likelihood (ML) and parsimony analyses of this large data set produced congruent, well-supported results with bootstrap scores of 100% for all clades (Fig. 1a). In spite of this overwhelming support, Rokas et al. (2003) noted that there was widespread topological conflict among gene trees. Separate analyses of individual genes produced various topologies that contradicted all nodes in the tree based on concatenation of 106 genes (Fig. 1a). Pairwise comparisons of gene trees showed extensive incongruence, and one conflicting clade, S. kudriavzevii + S. bayanus, was supported by a very large percentage of the gene trees (Fig. 1a). Replicated support for this anomalous clade was apparent in analyses of nucleotides, transversions, codons, and amino acids for a variety of systematic methods (Rokas et al., 2003; also see Holland et al., 2004, 2006; Phillips et al., 2004; Taylor and Piel, 2004; Collins et al., 2005; Gatesy et al., 2005; Ren et al., 2005; Hedtke et al., 2006).
By examining correlations between bootstrap scores and possible confounding factors, however, Rokas et al. (2003) concluded that "... none of the factors known or predicted to cause phylogenetic error could systematically account for the observed incongruence, suggesting that there may be no good predictor of the phylogenetic informativeness of genes" (p. 802). Therefore, many randomly selected genes were necessary to overwhelm conflicting signals. In this case study, very large concatenated data sets of ~20 genes were required to provide 95% bootstrap support for all nodes in the combined data tree, "substantially more genes than commonly used but a small fraction of any genome" (p. 799). Rokas et al. (2003) concluded that "These results have important implications for resolving branches of the tree of life" (p. 799) and "... important implications for many current practices in
molecular phylogenetics" (p. 802), points that were reasserted in a commentary by Gee (2003).
Specifically, if 20 or more genes generally are required to yield robust support, then most previous phylogenetic analyses are inadequate in terms of character sampling. This assertion is based on the assumption that the 8 taxa analyzed by Rokas et al. (2003) represent a typical systematic problem. Rokas et al. (2003) considered this issue, noting that "It is possible that the 8 yeast taxa we have analyzed represent a very difficult phylogenetic case, atypical of the situations found in other groups. However, the widespread occurrence of incongruence at all taxonomic levels argues strongly against such a view. Rather, we believe that this group is a representative model for key issues that researchers in phylogenetics are confronting" (p. 802). Large matrices that combine information from 20 or more gene fragments are rare (e.g., Murphy et al., 2001; Bapteste et al., 2002; Gatesy et al., 2002; Goremykin, 2004); therefore, if the test case of Rokas et al. (2003) is representative, most published molecular systematic studies are, at best, preliminary efforts.
Rokas et al. (2003) primarily used the nonparametric bootstrap (Felsenstein, 1985) to assess support and to search for correlates of incongruence in the yeast matrix. Recent reanalyses have utilized a variety of techniques to further characterize conflicting signals in the yeast data set. These approaches included Bayesian analysis (Taylor and Piel, 2004; Jeffroy et al., 2006), transversion coding (Phillips et al., 2004; Jeffroy et al., 2006), removal of rapidly evolving third codon positions (Collins et al., 2005; Jeffroy et al., 2006), partitioned Bremer support scores (Collins et al., 2005; Gatesy et al., 2005), consensus networks (Holland et al., 2004, 2006), isolation of genes with shifting base compositional biases (Collins et al., 2005), supertree bootstrapping (Burleigh et al., 2006), increased taxon sampling (Rokas and Carroll, 2005; Hedtke et al., 2006), and better-fitting models of molecular evolution (Ren et al., 2005). Alternatively, several authors have suggested that reducing the number of taxa included in analysis can yield insights regarding the stability of phylogenetic hypotheses (Lanyon, 1985; Philippe and Douzery, 1994; Siddall, 1995; Brochu, 1997; Poe, 1998; Siddall and Whiting, 1999; Holland et al., 2003).
Here, we use selective removal of taxa to explore patterns of incongruence in the yeast data set. In particular, we analyze different subsets of species to determine whether disagreements among gene trees are tempered or accentuated by altering taxonomic representation. In combination with documentation of branch lengths for individual gene trees, our subsampling results show that the set of species analyzed by Rokas et al. (2003) is not representative of most published systematic studies. We suggest that the yeast matrix does not provide a coherent, general recommendation for how many genes to sample in future molecular systematic studies. However, patterns of conflict for different subsets of species offer a very simple explanation for replication of the discrepant S. kudriavzevii + S. bayanus clade in many gene trees (Fig. 1a).
VOL. 56
SYSTEMATIC BIOLOGY
FIGURE 1. (a) The tree supported by the concatenation of 106 genes from Rokas et al. (2003). ML bootstrap scores are above internodes, and the percentage of ML gene trees that strictly supported a particular clade is indicated below internodes. The most common, conflicting clade, Skud+Sbay, also is shown. (b) Branch lengths for the optimal ML tree for the concatenation of 106 yeast genes; scale bar shows expected numbers of substitutions per site (length of outgroup branch is indicated). All phylogenetic analyses in this paper were branch and bound searches executed in PAUP* 4.0b10 (Swofford, 2002). All ML models were chosen by likelihood ratio tests as in Rokas et al. (2003) using PAUP* and ModelTest 3.06 (Posada and Crandall, 1998). Bootstrap analyses (Felsenstein, 1985) were as in Gatesy and Baker (2005). Scer = Saccharomyces cerevisiae; Spar = S. paradoxus; Smik = S. mikatae; Skud = S. kudriavzevii; Sbay = S. bayanus; Scas = S. castellii; Sklu = S. kluyveri; and Calb = Candida albicans.
EXCEPTIONALLY LONG BRANCHES
Examination of the optimal topology for the concatenation of 106 genes showed a striking difference between branches that connected 5 closely related Saccharomyces species (S. cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, S. bayanus) and branches that led to S. castellii, S. kluyveri, and the outgroup C. albicans (Fig. 1b; Hedtke et al., 2006; Jeffroy et al., 2006). For the ML model utilized by Rokas et al. (2003), the branches that connected to S. castellii, S. kluyveri, and C. albicans ranged from 0.31 to 1.58 expected substitutions per site, whereas the branches that joined the remaining, closely related Saccharomyces species ranged from 0.03 to 0.08 substitutions per site. Only 15% of the inferred nucleotide substitutions occurred on branches that linked these 5 species (Fig. 1b).
Consistent with an estimated Precambrian (~723 Mya) divergence of Candida albicans from Saccharomyces cerevisiae (Hedges et al., 2004), the outgroup branch in the yeast tree was exceptionally long. Each site in the concatenated data set is expected to change 1.58 times on this
branch according to the ML estimate (Fig. 1b). For the topology supported by the concatenation of 106 genes, 43% of the yeast genes had one branch that was >2.00 expected substitutions per site, and 79% of the yeast genes had at least one branch that was >1.00 substitution per site (also see Hedtke et al., 2006). For comparison, in an often-cited discussion of long branches in a 28S rDNA tree of holometabolous insects, Huelsenbeck (1998) remarked that two branches in his analysis were "among the longest ever observed (approximately 1.0 substitution per site)" (p. 530). However, branches in many of the yeast gene trees dwarfed those in the insect rDNA tree and were up to 95 times longer (Fig. 2). From another perspective, the longest branches in the yeast data set exceeded those in a tree based on mitochondrial genomes from 5 animal phyla (Naylor and Brown, 1998) and also were much longer than branches in simulations designed to assess misplacement of long branches (e.g., Anderson and Swofford, 2004). Although it has been suggested that the set of species in the yeast data set represents a typical phylogenetic problem (Rokas et al., 2003), the extraordinarily long branch lengths in most yeast gene trees demonstrate that this is not the case (e.g., Fig. 2).
FIGURE 2. The 8 yeast genes with the longest ML branch lengths for the tree supported by the concatenation of 106 genes. The scale bar represents 1.00 expected substitution per site; branches that connect the 5 most closely related Saccharomyces species are tiny at this scale. The length of the longest branch in each yeast gene tree is indicated. Note that some topologies show more than one branch that is >1.00 expected substitution per site. Abbreviations for yeast species are as in Figure 1.
[Figure 2 panels: gene trees for YJL085W, YIL090W, YDL006W, YPL210C, YDR484W, YML096W, YDL116W, and YKL034W; scale bar 1.00 substitution/site.]
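Branch lengths above are expected substitutions per site, which can exceed 1 because repeated changes at the same site leave no trace in an alignment. As a rough illustration only, and assuming the simple Jukes-Cantor (JC69) model rather than the better-fitting models that Rokas et al. (2003) selected with ModelTest, the expected proportion of observably different sites along a branch of length d is p = 3/4 (1 - e^(-4d/3)), which approaches a ceiling of 3/4 as d grows. The sketch below evaluates this for branch lengths quoted in the text; it is not part of the authors' analysis.

```python
import math


def jc69_p_distance(d: float) -> float:
    """Expected proportion of observably different sites after a branch of
    length d (expected substitutions per site) under the Jukes-Cantor (JC69) model."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


# Branch lengths quoted in the text, in expected substitutions per site.
BRANCHES = {
    "shortest ingroup branch (concatenated tree)": 0.03,
    "longest ingroup branch (concatenated tree)": 0.08,
    "outgroup branch (concatenated tree)": 1.58,
    "longest branch in any single gene tree": 95.31,
}

if __name__ == "__main__":
    for label, d in BRANCHES.items():
        # p can never exceed 0.75 (four equiprobable bases), so very long
        # branches become indistinguishable from one another: the signal saturates.
        print(f"{label:45s} d = {d:6.2f}   p = {jc69_p_distance(d):.3f}")
```

Under this simplified model, a branch of 1.58 substitutions per site already leaves about two-thirds of sites different, close to the 0.75 ceiling, and at 95.31 the expected difference cannot be told apart from that between unrelated random sequences.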
COMPLETE CONGRUENCE FOR FIVE CLOSELY RELATED Saccharomyces SPECIES
For the yeast data set, gene trees that included all 8 species showed many conflicts among the 5 most closely related Saccharomyces species (Fig. 1a). In ML analyses, 11% of gene trees conflicted with the S. cerevisiae + S. paradoxus clade, 28% conflicted with the S. cerevisiae + S. paradoxus + S. mikatae clade, and 45% conflicted with the S. cerevisiae + S. paradoxus + S. mikatae + S. kudriavzevii clade. Given the moderate lengths of branches that linked S. cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, and S. bayanus (Fig. 1b), we were surprised by the widespread discrepancies among genes at this level.
To further explore differences among gene trees, we reanalyzed the 5 closely related species of Saccharomyces in isolation from their distant relatives, S. castellii, S. kluyveri, and C. albicans. Given the diversity of gene trees for all 8 taxa, we expected to find many conflicting topologies but were shocked by complete congruence among the 106 gene trees in ML analyses (Fig. 3). There are 15 possible bifurcating trees (unrooted) for a data set of 5 taxa; assuming an equal probability for each topology a priori, the chance of recovering the same tree 106 straight times is astronomically low (P = 3.24 × 10^-124). Ironically, a systematic data set that has been presented as a prime example of pervasive, inexplicable conflict among genes (Rokas et al., 2003) can be transformed, with the removal of 3 species, into a remarkably congruent data set that shows 100% agreement among 106 genes. For the set of 5 closely related Saccharomyces species, 20 genes were not necessary to resolve relationships; basically any gene will do (Fig. 3c). Subsets of only 600 randomly resampled nucleotides from the yeast data set were sufficient to support the 2 nodes in the 5 taxon tree in >95% of replicates, and only 200 nucleotides recovered each clade >75% of the time (Fig. 3a, b). These are very small samples of characters relative to most modern systematic studies.
FIGURE 3. One hundred percent congruence among 106 gene trees for the 5 closely related Saccharomyces species. In separate ML analyses, all genes supported the same topology (a) that included Scer+Spar (white circle) and Scer+Spar+Smik (black circle). Very few randomly sampled nucleotides were required to consistently recover these clades in analyses of the 5 closely related Saccharomyces species (b); subsamples of only 600 nucleotides supported each clade >95% of the time (dotted line = 95%). The 106 completely congruent ML gene trees for Scer, Spar, Smik, Skud, and Sbay are shown in (c); ML bootstrap scores and the number of nucleotides for each gene are indicated. Species abbreviations are as in Figure 1. For all genes, ML models for 5 species were chosen as in Rokas et al. (2003) using PAUP* and ModelTest 3.06. Analyses of randomly sampled sites from the yeast data set (b) were done in PAUP* and included 500 replicates for each sample size (200, 400, 600, 800, and 1000 nucleotide sites). ML searches were branch and bound.
[Figure 3 panels: (a) unrooted ML tree for Scer, Spar, Smik, Skud, and Sbay; (b) percentage of replicates supporting Scer+Spar and Scer+Spar+Smik versus base pairs sampled (200 to 1000); (c) the 106 individual gene trees, labeled by gene name and length in bp.]
INCONGRUENCE AMONG GENES WITH THE ADDITION OF DIVERGENT TAXA
In ML analyses of S. cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, and S. bayanus, there were no topological discrepancies among gene trees (Fig. 3c), but when more distantly related taxa (S. castellii, S. kluyveri, and C. albicans) were included, gene trees showed extensive conflicts regarding relationships among the 5 closely related Saccharomyces species. Previously, we noted such conflicts for the full complement of 8 species (Fig. 1a). Analyses of the 7 Saccharomyces species, excluding the outgroup C. albicans, also revealed widespread incongruence among genes (Rokas et al., 2003), as did analyses of 6 species (Fig. 4).
For the 6 species set composed of 5 closely related Saccharomyces species and C. albicans, the disparity in length between the outgroup branch and ingroup branches was greatest. Approximately 87% of the expected character change was restricted to the outgroup branch (3.04 substitutions per site), and the concatenation of 106 genes supported a grouping of S. kudriavzevii + S. bayanus with an ML bootstrap score of 76% (Fig. 4). This relationship was incompatible with the S. cerevisiae + S. paradoxus + S. mikatae + S. kudriavzevii clade that had 100% bootstrap support in the analysis of 106 genes for 8 species (Fig. 1a). Thus, a systematic data set of 8 species, in which 20 genes were considered sufficient for robust phylogenetic support (Rokas et al., 2003), can be transformed, with the deletion of 2 species, into a data set of 106 genes that yields a contradictory tree; >100 concatenated genes did not provide >95% bootstrap support at all nodes for this set of 6 species (Fig. 4). Phylogenetic analyses of taxon subsamples (Figs. 3, 4) clearly show that the number of genes required to yield strong bootstrap scores is highly dependent on the particulars of a given systematic problem and suggest that long branches (Fig. 2) explain much of the incongruence among genes in the yeast data set (Taylor and Piel, 2004; Hedtke et al., 2006; Jeffroy et al., 2006).
HIGHLY REPLICATED INCONGRUENCE WITH THE ADDITION OF DISTANT TAXA
In ML analyses of the 5 closely related Saccharomyces species, all gene trees had the same 7 branches (Fig. 3a). Assignment of the root to 5 of these 7 branches will yield the incongruent S. kudriavzevii + S. bayanus clade (Fig. 5a). When the distantly related C. albicans was added to a matrix that included S. cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, and S. bayanus, all ML gene trees were consistent with the unrooted topology for these 5 Saccharomyces species (Fig. 3a), but rooting position was scattered among the 7 ingroup branches (Fig. 5b). The most common root position was on the "correct" branch (S. bayanus; 31 times). For the other 75 genes, the root was distributed across the remainder of the topology, and the majority of gene trees (57 of 106) supported the S. kudriavzevii + S. bayanus clade (Figs. 4 and 5b). Assuming an equal a priori probability of recovering the S. kudriavzevii + S. bayanus clade or the S. cerevisiae + S. paradoxus + S. mikatae + S. kudriavzevii clade, it would be highly unlikely for one of these groups to be supported in 57 or more of 88 gene trees, as was the case here (binomial probability of 0.007). Analogous but less extreme patterns were observed in ML gene trees for other combinations of 6 taxa, in which the 5 closely related Saccharomyces species were rooted with either S. castellii or S. kluyveri. S. kudriavzevii + S. bayanus was always the most common, conflicting clade (Fig. 4). Likewise, Rokas et al. (2003) documented the same pattern of replicated support for the conflicting S. kudriavzevii + S. bayanus clade in gene trees for the 7 Saccharomyces species, excluding the outgroup C. albicans.
Of the 10,395 possible bifurcating topologies for all 8 species, the S. kudriavzevii + S. bayanus bipartition is found in only 9% of all trees. However, ML analyses of 8 species showed that S. kudriavzevii + S. bayanus was recovered 32 times in 106 gene trees (Fig. 1a).
FIGURE 4. Trees supported by the concatenation of 106 genes in ML analyses of 6 species. Note the very long outgroup branches; expected substitutions per site for the outgroup branches are indicated. In the topology rooted with Calb, the conflicting Skud+Sbay clade was supported. ML bootstrap scores are above internodes, and the percentage of ML gene trees that strictly supported a particular clade is indicated below internodes. For the top and middle trees, numbers in parentheses indicate the percentage of times that the Skud+Sbay clade was supported; for the bottom tree, the number in parentheses is the percentage of gene trees that supported the Scer+Spar+Smik+Skud clade. Species abbreviations are as in Figure 1. For each gene and for the concatenation of 106 genes, ML models for each set of 6 species were chosen as in Rokas et al. (2003) using PAUP* and ModelTest 3.06.
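Several counts and probabilities in the preceding sections follow from standard tree combinatorics and a binomial test. The hypothetical Python check below is not the authors' code; it reproduces them under stated assumptions: (2n - 5)!! unrooted bifurcating topologies for n taxa (15 for 5 taxa, 10,395 for 8); a fixed 2 | 6 bipartition present in (1 × 945)/10,395, about 9%, of 8-taxon trees; P = (1/15)^105 for 106 identical gene trees, reading the reported 3.24 × 10^-124 as "the first gene may take any of the 15 topologies and each of the other 105 must match it"; and the 7 possible root placements on the unrooted five-species tree. For the binomial, we assume that 88 = 57 + 31, i.e., the gene trees supporting one of the two rival clades. Because the text does not state whether its 0.007 is one- or two-sided, the sketch prints both tails rather than claiming to reproduce that value.

```python
import math


def double_factorial(n: int) -> int:
    """n!! for odd n; by convention (-1)!! = 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def unrooted_trees(n: int) -> int:
    """Unrooted bifurcating topologies for n >= 3 labeled taxa: (2n - 5)!!."""
    return double_factorial(2 * n - 5)


def rooted_trees(n: int) -> int:
    """Rooted bifurcating topologies for n >= 1 labeled taxa: (2n - 3)!!."""
    return double_factorial(2 * n - 3)


def split_fraction(k: int, n: int) -> float:
    """Fraction of unrooted n-taxon trees containing one fixed k | (n - k) split.

    Cutting the edge that carries the split leaves two rooted subtrees, so the
    number of trees containing it is rooted_trees(k) * rooted_trees(n - k).
    """
    return rooted_trees(k) * rooted_trees(n - k) / unrooted_trees(n)


# Unrooted tree shared by all 106 five-species gene trees: ((Scer,Spar),Smik,(Skud,Sbay)).
TAXA = frozenset({"Scer", "Spar", "Smik", "Skud", "Sbay"})
# Its 7 edges, each recorded as one side of the bipartition the edge induces.
EDGES = [frozenset({t}) for t in sorted(TAXA)] + [
    frozenset({"Scer", "Spar"}),
    frozenset({"Skud", "Sbay"}),
]
SKUD_SBAY = frozenset({"Skud", "Sbay"})


def clades_when_rooted_on(root_edge: frozenset) -> set:
    """Clades of the rooted tree obtained by placing the root on root_edge."""
    left, right = root_edge, TAXA - root_edge
    clades = {left, right}
    for edge in EDGES:
        if edge == root_edge:
            continue
        # For every other edge exactly one side lies wholly inside one side of
        # the root edge; that side points away from the root and is a clade.
        for side in (edge, TAXA - edge):
            if side <= left or side <= right:
                clades.add(side)
                break
    return clades


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    print(unrooted_trees(5))               # 15
    print((1 / 15) ** 105)                 # about 3.24e-124
    print(unrooted_trees(8))               # 10395
    print(round(split_fraction(2, 8), 3))  # 0.091
    rootings = [e for e in EDGES if SKUD_SBAY in clades_when_rooted_on(e)]
    print(len(rootings), "of", len(EDGES), "root placements give Skud+Sbay")  # 5 of 7
    one_sided = binomial_upper_tail(57, 88)
    print("one-sided:", one_sided, "two-sided:", min(1.0, 2 * one_sided))
```

The rooting function treats each edge as a bipartition, so the same approach would apply to any unrooted topology written as a list of splits; here it confirms that only the two root placements on the Skud and Sbay terminal branches avoid a Skud+Sbay clade.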
Downloaded from http://sysbio.oxfordjournals.org/ by guest on October 6, 2014
360
VOL. 56
SYSTEMATIC BIOLOGY
Seer
a)
Spar
Smik
Skud
Sbay
0.02 substitutions/site
b)
ITTT"
nmmmmnr
mrmmmmm'
Seer
Spar
Smik
Skud
Sbay
nnmmSpar
mmtmmmm
r
J"L
£v Lr
^Seer
Skud
Sbay
Smik
Seer
Spar
r
J~L
- [P
^ " T-
beer
Spar
Skud
Sbay
— Smik
J«— Smik
(DJ1— Seer
1
Spar
c)
rtmr Spar
TTTT
mr
trm
Smik
mmmmnn
Skud
Sbay
r Skud
Jl-Sbay
nnmmim
mnmiTTTmtmmr
TTTmTTTTT
Seer
Smik
Skud
Sbay
FIGURE 5. (a) ML tree supported by concatenation of 106 genes for the 5 closely related Saccharomyces species, with possible root placements
shown. Five of the 7 roots yield the conflicting Skud+Sbay clade. (b) Placement of the root in 106 ML gene trees for 6 species (Seer, Spar, Smik,
Skud, Sbay, and Calb). (c) Placement of the root in 106 ML gene trees for all 8 yeast species. Arrows on branches indicate the number of times a
particular branch was rooted by Calb (b) or by Calb, Seas, and Sklu (c). For 3 genes, there were 2 optimal rootings; truncated arrows on branches
in (b) indicate that in 1 of the 2 best trees, the root was assigned to that branch. Species abbreviations are as in Figure 1.
previous parsimony, ML, and Bayesian results showed
How MANY GENES ARE ENOUGH?
this same pattern of replicated incongruence whether
Rokas et al. (2003) suggested that their analyses of 106
nucleotides, transversions, codons, or amino acids were genes from 8 species had important implications for reanalyzed (Rokas et al, 2003; Phillips et al., 2004; solving the tree of life. In particular, they argued that
Taylor and Piel, 2004; Collins et al., 2005; Gatesy et al., 20 or more genes might be required to garner robust
2005; Ren et al., 2005; Burleigh et al., 2006; Holland et al., support for phylogenetic relationships. This assertion
2004,2006). Repeated recovery of the incongruent S. ku- is based on two critical assumptions: (1) There are no
driavzevii + S. bayanus clade in ~30% of our ML gene good predictors for the utility of different genes, and
trees strongly suggested an underlying bias. Once again, (2) the 8 species in their data set represent a typical phyall ML gene trees for 8 species were compatible with re- logenetic problem. Recent reanalyses of the yeast data
lationships in the unrooted tree for the 5 closely related set have contested both of these assumptions. Phillips
Saccharomyces species (Fig. 3a), but different placements et al. (2004) noted that differences in G-C content might
of the 3 long branch taxa (Fig. 6a, b) resulted in many explain non-historical signals in the yeast matrix. Subsegene trees that supported the S. kudriavzevii + S. bayanus quently, Collins et al. (2005) found that shifts in base comclade (Figs, la and 5c). As in the analyses of 6 taxa (Fig. 4), position were most prominent at third codon positions
replicated support for the conflicting S. kudriavzevii + (also see Jeffroy, 2006). When Collins et al. (2005) resamS. bayanus grouping was due to erratic rooting of the pled genes with stationary base compositions, only 10
uniformly supported, pectinate topology for S. cerevisiae, genes were required to record high bootstrap percentS. paradoxus, S. mikatae, S. kudriavzevii, and S. bayanus ages for relationships supported by the concatenated
(Fig. 3a) by the very distantly related S. castellii, S. data set. By contrast, 23 genes characterized by large
kluyveri, and C. albicans (Figs. 4 to 6; for discussion of shifts in base composition were necessary to yield the
distant outgroups see Wheeler, 1990; Huelsenbeck et al., same level of support (Collins et al., 2005). In a study
2002; Holland et al., 2003; Anderson and Swofford, 2004; focused on Bayesian support measures, Taylor and Piel
Bergsten, 2005; Susko et al., 2005; Goloboff and Pol, 2005; (2004) found that, "Overall the external/internal branch
Hedkeetal.,2006).
length ratios were greater for trees that were incongruent with the reference tree [our Fig. 1a, b] . . ." (p. 1536), a result that was statistically significant. In sum, the contention that there are no good predictors of phylogenetic utility for particular genes does not seem to hold for this phylogenomic data set.

[Figure 6: (a) ML gene trees for YBR039W (72%), YDL215C (47%), YGR194C (50%), YEL037C (72%), YNL155W (77%), YLL029W (75%), YNL201C (60%), YOR025W (54%), and YJR072C (68%); (b) ML gene trees for YDL195W, YGL192W, and YKL120W. Taxa: Scer, Spar, Smik, Skud, Sbay, Scas, Sklu, Calb; scale bars 0.10.]

FIGURE 6. Optimal ML topologies for 12 yeast genes. All gene trees are consistent with the unrooted topology for the 5 closely related Saccharomyces species (Fig. 3); branches that connect these 5 species are shown as thick lines. Numbers in circles indicate rooting position according to Figure 5. The 9 genes in (a) all supported the Skud+Sbay clade (bootstrap support for Skud+Sbay is shown in parentheses). When these 9 genes were combined, Skud+Sbay was not supported, and the topology favored by the concatenation of 106 genes was optimal (see Gatesy and Baker, 2005). The three gene trees in (b) support alternative topologies. Two of these genes did not support monophyly of the 5 closely related Saccharomyces species but were compatible with the unrooted tree for these species (Fig. 3a).
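The misrooting argument (a stable, pectinate topology for 5 close relatives, rooted erratically by 3 distant taxa) can be checked by brute force. The Python sketch below is a toy illustration, not the analysis performed here. It assumes the unrooted topology ((Scer,Spar),Smik,(Skud,Sbay)), our reading of Fig. 3a, and treats each of its 7 branches as a possible attachment point for the distant outgroups. For each root position it then lists whether the S. kudriavzevii + S. bayanus clade appears. It counts root positions only. It does not model how likely each misrooting is, and its branch labels do not follow the numbering in Fig. 5.

```python
# Toy illustration (not the authors' analysis): which root positions on an
# ASSUMED unrooted 5-taxon tree produce a Skud+Sbay clade?
TAXA = frozenset({"Scer", "Spar", "Smik", "Skud", "Sbay"})
TARGET = frozenset({"Skud", "Sbay"})  # the conflicting "wrong clade"


def split(side):
    """Bipartition of TAXA induced by one branch: (side, complement)."""
    side = frozenset(side)
    return (side, TAXA - side)


# ASSUMPTION: unrooted pectinate topology ((Scer,Spar),Smik,(Skud,Sbay)),
# encoded by the bipartitions of its 7 branches (5 terminal + 2 internal).
EDGES = [split({t}) for t in sorted(TAXA)] + [
    split({"Scer", "Spar"}),          # internal branch: Scer+Spar | rest
    split({"Scer", "Spar", "Smik"}),  # internal branch: rest | Skud+Sbay
]


def rooted_clades(root_edge):
    """Non-trivial ingroup clades when the outgroups attach to root_edge."""
    c, d = root_edge
    clades = {c, d}  # the two sides of the root branch become sister groups
    for a, b in EDGES:
        if (a, b) == root_edge:
            continue
        # Every other branch lies on one side of the root; the side of that
        # branch wholly contained in c or in d points away from the root,
        # so it is a clade in the rooted tree.
        clades.add(a if (a <= c or a <= d) else b)
    # Drop single taxa and the full ingroup; keep informative clades only.
    return {x for x in clades if 1 < len(x) < len(TAXA)}


for edge in EDGES:
    label = sorted(min(edge, key=len))  # name each branch by its smaller side
    print(f"root on branch to {label}: Skud+Sbay = {TARGET in rooted_clades(edge)}")
```

Under these assumptions, only roots on the terminal branches leading to Sbay or Skud avoid the Skud+Sbay grouping. The other 5 of the 7 positions produce it. This is consistent with the claim that erratic rooting of a pectinate topology repeatedly regenerates the same conflicting clade. Rooting on the Sbay branch recovers the ingroup relationships of the concatenated tree.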
Following Taylor and Piel (2004), Ren et al. (2005),
Hedtke et al. (2006), Jeffroy et al. (2006), and Holland et al.
(2006) noted the presence of exceptionally long branches
and argued that a high level of divergence and associated branch length inequalities (e.g., Fig. 2) were determinants of conflict among genes in the yeast data set.
Here, we extended these arguments and concluded that
the 8 species in the yeast data set do not represent a "typical" phylogenetic problem. The tree based on the concatenated matrix of 106 genes showed great disparities in branch lengths (Fig. 1b), but individual gene trees had some truly extraordinary branches of up to 95.31 expected substitutions per site (Fig. 2). Saturation of nucleotide substitution on this scale is not typical; many systematists acknowledge that this degree of divergence poses a very difficult problem (Felsenstein, 1978; Hendy and Penny, 1989; Wheeler,
1990; Huelsenbeck, 1998; Pol and Siddall, 2001; Holland
et al., 2003; Anderson and Swofford, 2004; Bergsten, 2005;
Susko et al., 2005). S. castellii, S. kluyveri, and C. albicans
are very genetically distant from each other and from
the 5 most closely related Saccharomyces species (Figs. 1,
2, 4, and 6). Therefore, it was not surprising that there
were wholesale conflicts among gene trees in parsimony,
Bayesian, and ML analyses (Taylor and Piel, 2004; Hedtke
et al., 2006). In fact, the 3 divergent taxa, which also were characterized by the largest shifts in base composition (Collins et al., 2005), accounted for all conflicts among genes in ML analyses.

Because of extensive incongruence, Rokas et al. (2003) found that 20 randomly sampled genes from the yeast matrix were required for a robustly supported tree of 8 species, but this result has no generality. Even within this 106-gene matrix, it is clear that some systematic problems are much more difficult to solve than others. For the 5 closely related Saccharomyces species, one gene might be sufficient (Fig. 3c). ML analyses of individual genes produced the same tree 106 straight times, and sets of 600 randomly sampled nucleotides consistently supported this topology (Fig. 3b). By contrast, in ML analyses of these 5 species plus C. albicans, 106 concatenated genes (127,026 nucleotides) apparently were insufficient; the optimal ML tree (Fig. 4) contradicted the best tree for all 8 yeast species, a topology that was thought to show an unprecedented level of support (Fig. 1a, b; Gee, 2003; Rokas et al., 2003).

Clearly, the number of genes required to robustly resolve relationships will depend on the specifics of the phylogenetic problem at hand (Cummings and Meyer, 2005; Hedtke et al., 2006), as well as on a particular researcher's definition of "adequate support" (e.g., Satta et al., 2000; Zander, 2001; Siddall, 2002; Grant and Kluge, 2003; Soltis et al., 2004; Taylor and Piel, 2004; Jeffroy et al., 2006). For easy phylogenetic problems where divergence among taxa is not great and internodes are moderately long, a single gene might provide high bootstrap support (e.g., Fig. 3). However, even in this situation, sequencing 2 or more genes may be justified, given that tightly linked nucleotides do not necessarily provide independent evidence for phylogenetic relationships (Doyle, 1992). For cases where exceptionally long branches are apparent, even 106 genes might not be enough (e.g., Fig. 4). When faced with extreme branch lengths like these, increased taxonomic sampling (e.g., Zwickl and Hillis, 2002), evidence from the fossil record (e.g., Brochu, 1997), or a set of more slowly evolving genes (e.g., Springer et al., 2001) with stationary base frequencies (e.g., Collins et al., 2005) may be required. Educated guesses can be made, but in the end, the amount of character data needed to arrive at a stable, well-supported phylogenetic hypothesis can only be quantified by adding new data to existing data and then reassessing the results.

ITERATED CONFLICT IN PHYLOGENOMIC MATRICES

Our reanalyses provided a very simple explanation for replicated support of the conflicting S. kudriavzevii + S. bayanus clade in many yeast gene trees (Fig. 1a). Misrooting of a stable topology for 5 close relatives (Fig. 3a) by 3 genetically distant taxa (Fig. 4) can account for this iterated pattern. Because the topology for S. cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, and S. bayanus was pectinate (Fig. 3a), erratic placement of the root (Fig. 5) repeatedly yielded the discrepant S. kudriavzevii + S. bayanus clade. In the most extreme case that we examined (a data set that included 6 of the 8 species in the yeast matrix), the "wrong clade" was preferred over the "right clade" in the majority of ML gene trees (57 of 106 = 54%) and in the concatenated analysis of 106 genes (Fig. 4).

This result represents a cautionary tale for phylogenomic studies, in which >100 genes from relatively few taxa may be sampled, and where congruence among individual gene trees has been used to assess support (e.g., Rokas et al., 2003; Holland et al., 2004, 2006; Burleigh et al., 2006). Previously, many authors have argued that large concatenations of genes can provide strong, but spurious, bootstrap support because of model misspecification, inadequate taxon sampling, or both (e.g., Philippe and Douzery, 1994; Naylor and Brown, 1998; Holland et al., 2004, 2006; Phillips et al., 2004; Soltis et al., 2004; Stefanovic et al., 2004; Hedtke et al., 2006; Jeffroy et al., 2006). Here, we documented an exceptional pattern of replicated conflict in which a consensus derived from separate analyses of >100 genes failed to give the right result; nearly twice as many gene trees favored the wrong grouping of S. kudriavzevii + S. bayanus over the right S. cerevisiae + S. paradoxus + S. mikatae + S. kudriavzevii clade (Fig. 4). In comparison to concatenation of genes, it might be expected that partitioned phylogenetic analyses of individual genes should be less prone to highly supported but spurious results. Unfortunately, this is not always the case (Fig. 4), and a simple compilation of many genes for very few taxa (Rokas et al., 2003; Rokas and Carroll, 2005) cannot be trusted as a general solution for "ending incongruence" (Gee, 2003).

ACKNOWLEDGEMENTS

We thank R. Baker, T. Collins, A. de Queiroz, J. Garb, C. Hayashi, M. R. McGowen, R. Page, and two anonymous reviewers for comments on different versions of the manuscript. J. Gatesy was supported by NSF (USA) DEB-0212572, DEB-0213171, and EAR-0228629; R. DeSalle was supported by the Lewis B. and Dorothy Cullman Program in Molecular Systematics at the American Museum of Natural History and by NSF (USA) DBI-0421604; N. Wahlberg was supported by the Swedish Research Council 621-2004-2853. G. Naylor provided alignments of animal mitochondrial genomes. A. Rokas provided published multiple sequence alignments and supporting materials that made the present study possible.

REFERENCES

Anderson, F. E., and D. L. Swofford. 2004. Should we be worried about long-branch attraction in real data sets? Investigations using metazoan 18S rDNA. Mol. Phylogenet. Evol. 33:440-451.
Bapteste, E., H. Brinkmann, J. A. Lee, D. V. Moore, C. W. Sensen, P. Gordon, L. Durufle, T. Gaasterland, P. Lopez, M. Müller, and H. Philippe. 2002. The analysis of 100 genes supports the grouping of three highly divergent amoebae: Dictyostelium, Entamoeba, and Mastigamoeba. Proc. Natl. Acad. Sci. USA 99:1414-1419.
Bergsten, J. 2005. A review of long-branch attraction. Cladistics 21:163-193.
Brochu, C. 1997. Morphology, fossils, divergence timing, and the phylogenetic relationships of Gavialis. Syst. Biol. 46:479-522.
Burleigh, J. G., A. C. Driskell, and M. J. Sanderson. 2006. Supertree bootstrapping methods for assessing phylogenetic variation among genes in genome-scale data sets. Syst. Biol. 55:426-440.
Collins, T. M., O. Fedrigo, and G. J. P. Naylor. 2005. Choosing the best genes for the job: The case for stationary genes in genome-scale phylogenies. Syst. Biol. 54:493-500.
Cummings, M. P., and A. Meyer. 2005. Magic bullets and golden rules: Data sampling in molecular phylogenetics. Zoology 108:329-336.
Cummings, M. P., S. P. Otto, and J. Wakeley. 1995. Sampling properties of DNA sequence data in phylogenetic analysis. Mol. Biol. Evol. 12:814-822.
Doyle, J. J. 1992. Gene trees and species trees: Molecular systematics as one-character taxonomy. Syst. Bot. 17:144-163.
Felsenstein, J. 1978. Cases in which parsimony and compatibility methods will be positively misleading. Syst. Zool. 27:401-410.
Felsenstein, J. 1985. Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39:783-791.
Gatesy, J., and R. H. Baker. 2005. Hidden likelihood support in genomic data: Can forty-five wrongs make a right? Syst. Biol. 54:483-492.
Gatesy, J., C. Matthee, R. DeSalle, and C. Hayashi. 2002. Resolution of a supertree/supermatrix paradox. Syst. Biol. 51:652-664.
Gee, H. 2003. Ending incongruence. Nature 425:782.
Goloboff, P. A., and D. Pol. 2005. Parsimony and Bayesian phylogenetics. Pages 148-159 in Parsimony, phylogeny, and genomics (V. A. Albert, ed.). Oxford University Press, Oxford, UK.
Goremykin, V. V. 2004. The chloroplast genome of Nymphaea alba: Whole-genome analyses and the problem of identifying the most basal angiosperm. Mol. Biol. Evol. 21:1445-1454.
Grant, T., and A. G. Kluge. 2003. Data exploration in phylogenetic inference: Scientific, heuristic, or neither. Cladistics 19:379-418.
Hedges, S. B., J. E. Blair, M. L. Venturi, and J. L. Shoe. 2004. A molecular timescale of eukaryote evolution and the rise of complex multicellular life. BMC Evol. Biol. 4:2.
Hedtke, S. M., T. M. Townsend, and D. M. Hillis. 2006. Resolution of phylogenetic conflict in large data sets by increased taxon sampling. Syst. Biol. 55:522-529.
Hendy, M. D., and D. Penny. 1989. A framework for the quantitative study of evolutionary trees. Syst. Zool. 38:297-309.
Holland, B. R., K. T. Huber, V. Moulton, and P. J. Lockhart. 2004. Using consensus networks to visualize contradictory evidence for species phylogeny. Mol. Biol. Evol. 21:1459-1461.
Holland, B. R., L. S. Jermiin, and V. Moulton. 2006. Improved consensus network techniques for genome-scale phylogeny. Mol. Biol. Evol. 23:848-855.
Holland, B. R., D. Penny, and M. D. Hendy. 2003. Outgroup misplacement and phylogenetic inaccuracy under a molecular clock: A simulation study. Syst. Biol. 52:229-238.
Huelsenbeck, J. P. 1998. Systematic bias in phylogenetic analysis: Is the Strepsiptera problem solved? Syst. Biol. 47:519-537.
Huelsenbeck, J. P., J. P. Bollback, and A. M. Levine. 2002. Inferring the root of a phylogenetic tree. Syst. Biol. 51:32-43.
Huelsenbeck, J. P., and D. M. Hillis. 1993. Success of phylogenetic methods in the four-taxon case. Syst. Biol. 42:247-264.
Jeffroy, O., H. Brinkmann, F. Delsuc, and H. Philippe. 2006. Phylogenomics: The beginning of incongruence. Trends Genet. 22:225-231.
Lanyon, S. 1985. Detecting internal inconsistencies in distance data. Syst. Zool. 34:397-403.
Murphy, W. J., E. Eizirik, S. J. O'Brien, O. Madsen, M. Scally, C. J. Douady, E. Teeling, O. A. Ryder, M. J. Stanhope, W. W. de Jong, and M. S. Springer. 2001. Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science 294:2348-2351.
Naylor, G. J. P., and W. M. Brown. 1998. Amphioxus mitochondrial DNA, chordate phylogeny, and the limits of inference based on comparisons of sequences. Syst. Biol. 47:61-76.
Philippe, H., and E. Douzery. 1994. The pitfalls of molecular phylogeny based on four species, as illustrated by the Cetacea/Artiodactyla relationship. J. Mamm. Evol. 2:133-152.
Phillips, M. J., F. Delsuc, and D. Penny. 2004. Genome-scale phylogeny and the detection of systematic biases. Mol. Biol. Evol. 21:1455-1458.
Poe, S. 1998. Sensitivity of phylogeny estimation to taxonomic sampling. Syst. Biol. 47:18-31.
Pol, D., and M. E. Siddall. 2001. Biases in maximum likelihood and parsimony: A simulation approach to a 10-taxon case. Cladistics 17:266-281.
Posada, D., and K. Crandall. 1998. ModelTest: Testing the model of DNA substitution. Bioinformatics 14:817-818.
Ren, F., H. Tanaka, and Z. Yang. 2005. An empirical examination of the utility of codon-substitution models in phylogeny reconstruction. Syst. Biol. 54:808-818.
Rokas, A., and S. B. Carroll. 2005. More genes or more taxa? The relative contribution of gene number and taxon number to phylogenetic accuracy. Mol. Biol. Evol. 22:1337-1344.
Rokas, A., B. Williams, N. King, and S. B. Carroll. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798-804.
Satta, Y., J. Klein, and N. Takahata. 2000. DNA archives and our nearest relative: The trichotomy problem revisited. Mol. Phylogenet. Evol. 14:259-275.
Siddall, M. 1995. Another monophyly index: Revisiting the jackknife. Cladistics 11:33-56.
Siddall, M., and M. Whiting. 1999. Long-branch abstractions. Cladistics 15:9-24.
Siddall, M. E. 2002. Measures of support. Pages 80-101 in Methods and tools in biosciences and medicine: Techniques in molecular systematics and evolution (R. DeSalle, G. Giribet, and W. Wheeler, eds.). Birkhäuser Verlag, Basel, Switzerland.
Soltis, D. E., V. A. Albert, V. Savolainen, K. Hilu, Y.-L. Qiu, M. W. Chase, J. S. Farris, S. Stefanovic, D. W. Rice, J. D. Palmer, and P. S. Soltis. 2004. Genome-scale data, angiosperm relationships, and 'ending congruence': A cautionary tale in phylogenetics. Trends Plant Sci. 9:477-483.
Springer, M. S., R. W. DeBry, C. Douady, H. M. Amrine, O. Madsen, W. W. de Jong, and M. J. Stanhope. 2001. Mitochondrial versus nuclear gene sequences in deep-level mammalian phylogeny reconstruction. Mol. Biol. Evol. 18:132-143.
Stefanovic, S., D. W. Rice, and J. D. Palmer. 2004. Long branch attraction, taxon sampling, and the earliest angiosperms: Amborella or monocots? BMC Evol. Biol. 4:35.
Susko, E., M. Spencer, and A. J. Roger. 2005. Biases in phylogenetic estimation can be caused by random sequence segments. J. Mol. Evol. 61:351-359.
Swofford, D. L. 2002. PAUP*. Phylogenetic analysis using parsimony (*and other methods). Version 4.0b10. Sinauer Associates, Sunderland, Massachusetts.
Taylor, D. J., and W. H. Piel. 2004. An assessment of accuracy, error, and conflict with support values from genome-scale phylogenetic data. Mol. Biol. Evol. 21:1534-1537.
Wheeler, W. C. 1990. Nucleic acid sequence phylogeny and random outgroups. Cladistics 6:363-367.
Zander, R. H. 2001. A conditional probability of reconstruction measure for internal cladogram branches. Syst. Biol. 50:425-437.
Zwickl, D. J., and D. M. Hillis. 2002. Increased taxon sampling greatly reduces phylogenetic error. Syst. Biol. 51:588-598.

First submitted 28 April 2006; reviews returned 7 July 2006; final acceptance 19 October 2006
Associate Editor: Tim Collins