Epistasis between antibiotic resistance mutations drives the

65
Evolution, Medicine, and Public Health [2013] pp. 65–74
doi:10.1093/emph/eot003
Epistasis between antibiotic
resistance mutations drives
the evolution of extensively
drug-resistant tuberculosis
So`nia Borrell1,2, Youjin Teo1,2, Federica Giardina1,2, Elizabeth M. Streicher3,
¨ller1,2,3, Tommie C. Victor3 and
Marisa Klopper3, Julia Feldmann1,2, Borna Mu
1,2
Sebastien Gagneux*
1
Department of Medical Parasitology and Infection Biology, Swiss Tropical and Public Health Institute, 4002 Basel,
Switzerland; 2University of Basel, 4003 Basel, Switzerland and 3DST/NRF Centre of Excellence for Biomedical
Tuberculosis Research/MRC Centre for Molecular and Cellular Biology, Division of Molecular Biology and Human
Genetics, Faculty of Health Sciences, Stellenbosch University, 7505 Cape Town, South Africa
*Corresponding author. Department of Medical Parasitology and Infection Biology, Swiss Tropical and Public Health
Institute (Swiss TPH), Socinstrasse 57, CH-4002 Basel, Switzerland. Tel:+41-61-284-8369; Fax:+41-61-284-8101;
E-mail: sebastien.gagneux@unibas.ch
The first two authors contributed equally to this work.
Received 20 December 2012; revised version accepted 5 March 2013
ABSTRACT
Background and objectives: Multidrug resistant (MDR) bacteria are a growing threat to global health.
Studies focusing on single antibiotics have shown that drug resistance is often associated with a fitness
cost in the absence of drug. However, little is known about the fitness cost associated with resistance to
multiple antibiotics.
Methodology: We used Mycobacterium smegmatis as a model for human tuberculosis (TB) and an in
vitro competitive fitness assay to explore the combined fitness effects and interaction between mutations conferring resistance to rifampicin (RIF) and ofloxacin (OFX); two of the most important first- and
second-line anti-TB drugs, respectively.
Results: We found that 4 out of 17 M. smegmatis mutants (24%) resistant to RIF and OFX showed a
statistically significantly higher or lower competitive fitness than expected when assuming a multiplicative model of fitness effects of each individual mutation. Moreover, 6 out of the 17 double drug-resistant
mutants (35%) had a significantly higher fitness than at least one of the corresponding single drugresistant mutants. The particular combinations of resistance mutations associated with no fitness
deficit in M. smegmatis were the most frequent among 151 clinical isolates of MDR and extensively
drug-resistant (XDR) Mycobacterium tuberculosis from South Africa.
Conclusions and implications: Our results suggest that epistasis between drug resistance mutations in
mycobacteria can lead to MDR strains with no fitness deficit, and that these strains are positively
ß The Author(s) 2013. Published by Oxford University Press on behalf of the Foundation for Evolution, Medicine, and Public Health. This is an Open
Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits
unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
orig inal
research
article
66
| Borrell et al.
Evolution, Medicine, and Public Health
selected in settings with a high burden of drug-resistant TB. Taken together, our findings support a role
for epistasis in the evolution and epidemiology of MDR- and XDR-TB.
K E Y W O R D S : microbiology; antimicrobial; epidemiology; infection
BACKGROUND AND OBJECTIVES
Epistasis refers to the phenomenon where the phenotypic effect of one mutation differs depending on the
presence of another mutation [1]. The importance of
epistasis for our understanding of biology is increasingly recognized; it has been implicated in many
processes, ranging from pathway organization, the
evolution of sexual reproduction, mutational load,
and genomic complexity, to speciation and the origin
of life [2]. Moreover, recent studies have reported a
role for epistasis in the evolution of antibiotic resistance [3–6]. Multidrug-resistant (MDR) bacteria are
emerging worldwide, in some cases leading to incurable disease. Although new antibiotics are urgently
needed, a better understanding of the forces that lead
to the emergence of drug resistance would help prolong the lifespan of existing drugs.
Studies in various bacterial species have
shown that the acquisition of antibiotic resistance
often imposes a physiological cost on the bacteria
in absence of the drug [7–9]. However, some drug
resistance conferring mutations have been
associated with low or no fitness cost, and compensatory evolution can mitigate some of the initial
fitness defects associated with particular drug resistance conferring mutations [10]. Most of these
studies have focussed on resistance to a single drug.
Given the public health threat posed by MDR bacteria, there is a need to understand the factors that
influence the emergence of resistance to multiple
drugs.
Recent studies in model organisms have
shown that mutations conferring resistance to different drugs can interact epistatically. A study in
Pseudomonas aeruginosa found that the relative fitness of certain strains resistant to streptomycin and
rifampicin (RIF) [4,6] was lower than expected based
on the fitness of the corresponding single-resistant
mutants. Similarly, a study in Escherichia coli [3]
showed that strains resistant to two drugs can have
a higher fitness than strains resistant to only one
drug; a phenomenon referred to as ‘sign epistasis’
[11]. However, whether such epistatic interactions
play any role in the emergence and spread of MDR
bacteria in clinical settings has not been determined.
Multidrug resistance is a particular problem in
human tuberculosis (TB) [12]. Recent surveillance
data showed the highest rates of resistance
ever documented with some Eastern European
countries reporting up to 50% of TB cases as MDR
[13]. In Mycobacterium tuberculosis, the main causative agent of human TB, drug resistance is
chromosomally encoded and results from de novo
acquisition of mutations in particular genes [14].
These mutations are acquired sequentially, giving
rise to MDR and extensively drug-resistant (XDR)
strains [15,16]. MDR-TB is defined as strains resistant to at least RIF and isoniazid, the two most important first-line anti-TB drugs. XDR-TB is caused by
strains that, in addition to being MDR, are also resistant to ofloxacin (OFX), or any other fluoroquinolone, and to at least one of the injectable second-line
drugs [17].
In this study, we used Mycobacterium smegmatis as
a model for M. tuberculosis to investigate putative
epistatic interactions between mutations conferring
resistance to RIF and OFX, two of the most
widely used first- and second-line anti-TB drugs,
respectively. M. smegmatis is used widely in the TB
research community because it is non-pathogenic, in
contrast to M. tuberculosis, which requires biosafetylevel 3 containment. Moreover, M. smegmatis forms
visible colonies in 2–3 days, compared with 3–4
weeks for M. tuberculosis. We then compared our experimental data generated with M. smegmatis to the
clinical frequency of particular combinations of RIF
and OFX resistance conferring mutations in a panel
of MDR and XDR M. tuberculosis clinical strains from
South Africa.
METHODOLOGY
Bacterial strains and growing conditions
All strains used for the competitive fitness experiments were derived from the wild-type M. smegmatis
strain mc2155. Bacteria were grown in Middlebrook
7H9 broth supplemented with ADC or on
Middlebrook 7H11 agar plates supplemented with
Borrell et al. |
Epistasis in drug-resistant tuberculosis
OADC. The culture tubes were incubated in standard
conditions and the optical density (OD600) was
recorded daily to measure the growth.
Selection of single- and double-resistant M.
smegmatis mutants
Independent RIF- and OFX-resistant M. smegmatis
single mutants were isolated as follows. A starting
culture of M. smegmatis mc2155 was prepared
from wild-type M. smegmatis and adjusted to
300 bacilli/ml (OD600 0.01). Ten milliliter of
culture was transferred into 14 individual 50 ml
falcon tubes. When the bacteria reached end of
log-phase (OD600 3.00), the cultures were
concentrated by centrifugation at 1500 rpm for 5
min, the supernatant discarded, and the bacteria
resuspended in 500 ml Middlebrook 7H9 media.
This concentrated bacterial culture was plated
onto Middlebrook 7H11 media containing 200 mg
RIF/ml for the isolation of RIF-resistant colonies,
and 2 mg OFX/ml for the isolation of OFX-resistant colonies. The plates were incubated for 3–5
days at 37 C until colonies became visible. One
colony from each plate was picked and subcultured in antibiotic-free Middelbrook 7H9 broth.
For the isolation of double-resistant mutants, different rpoB- and gyrA-mutants were used to
generate different combinations of mutations
conferring resistance to both antibiotics. Some
double-resistant mutants were selected by plating
on Middlebrook 7H11-OADC media containing
both 200 mg/ml of RIF and 2 mg/ml of OFX.
Mutation identification
The main target genes for resistance to RIF and
OFX are rpoB and gyrA, respectively. To detect the
relevant drug resistance conferring mutations, the
rpoB and gyrA genes were amplified by PCR using
DNA extracted from the single- and the doubleresistant mutants. The primers used to amplify
the portion of the rpoB gene encoding the main
set of mutations conferring resistance to RIF were
50 -GGA CGT GGA GGC GAT CAC ACC-30 . For
amplification of the gyrA gene, the primers 50 CAT GAG CGT GAT CGT GGG CCG-30 and 50 CAG AAC CGT GGG CTC CTG CAC-30 were used.
The same primers were used for direct DNA
sequencing from the PCR product.
Fitness assay and calculation of fitness ratio
The rpoB-, gyrA- and rpoB–gyrA-mutants were
competed against the wild-type antibiotic-susceptible strain in antibiotic-free Middlebrook 7H9
media. A total of 100 CFU of bacteria/ml were
inoculated in 10 ml of Middlebrook 7H9 media in a
1:1 ratio. For each wild-type-mutant pair, between
four and eight replicate competition assays were performed. At the start of the experiment (t = 0 h),
50 ml from each competition culture was plated on
both antibiotic-free- and antibiotic-containing
Middlebrook 7H11 plates in triplicates to estimate
the baseline CFU counts. The competition cultures
were incubated at standard conditions on a shaking
incubator at 100 rpm, and the optical densities
(OD600) were recorded daily. After 72 h, the same
competition cultures were diluted 105- to 106-fold
and plated on both selective and non-selective
Middlebrook 7H11 media to obtain the endpoint
CFU counts. For both competing strains, the
Malthusian parameters were calculated by taking
the natural log of the endpoint CFU over the baseline
CFU [7]. The mean CFU count of the three replicates
was used for the calculation of the relative competitive fitness. This gave the Malthusian parameters
(ms and mr) for both strains, which correspond to
the number of doublings (generations) that each
strain went through during the observed time
period. Finally, the relative fitness of the drug-resistant strain relative to the wild-type was determined
using Wrs = mr/ms [7]. Shapiro–Wilk test evidenced
the normality of the fitness data (P = 0.3). Student’s
t-test was used to detect differences in the mean
fitness and the limit for statistical significance was
set at P = 0.05. Test statistics and estimates were
based on 1000 bootstrap replicates. Statistical analysis was performed with STATA SE/10.
Measuring epistasis
To explore putative genetic interactions between
drug resistance mutations, pairwise epistasis (e)
was measured assuming a multiplicative model in
which e = WABWab WAbWaB, where Wab is the fitness of the clone carrying alleles a and b, and capital
letters represent the wild-type sensitive alleles [3].
Following this model, values of e > 0.0 indicate that
the fitness of the double mutant is higher than
expected based on the fitness values of the individual single mutants. Similarly, values of e < 0.0 indicate that the fitness of the double mutant is lower
67
68
| Borrell et al.
Evolution, Medicine, and Public Health
than expected based on the fitness values of the individual single mutants. We tested the normality of
the epistasis data with a Shapiro–Wilk test. To test
whether epistasis values were significantly different
from zero, we used the error-propagation method
described by Trindade et al. [3]. We considered that
alleles a and b showed significant epistasis whenever the calculated error was smaller than the average value of e (Fig. 3).
To detect the presence of sign epistasis, we performed pairwise comparisons between the fitness of
each double-resistant mutant and the corresponding single-resistant mutants using a one-sided bootstrap Student’s t-test with 1000 replicates (Fig. 5).
The combined P-values were obtained using Fisher’s
method.
Clinical frequency of rpoB and gyrA mutation
combinations in M. tuberculosis
A total of 151 clinical MDR- and XDR-TB M. tuberculosis isolates were included in this study. These were
collected in the Eastern (N = 99) and Western Cape
(N = 52) Provinces of South Africa between 2008–
2009 and 2001–2008, respectively. RIF and OFX resistance determining regions in the rpoB and gyrA
genes were analysed using standardized PCR and
sequencing [18,19]. Amplification products were
sequenced using an ABI 3130XL genetic analyzer,
and the resulting chromatograms were analysed
using Chromas software.
RESULTS
Fitness cost of single drug-resistant mutants
We first determined the relative fitness of M.
smegmatis mutants resistant to a single drug. To this
end, we selected a series of spontaneous M.
smegmatis mutants resistant to RIF or OFX. From
the RIF-selected mutants, we used five clones with
rpoB mutations for further analysis (H526R, H526P,
H526Y, S531W and S531L) (Supplementary Table
S1). These mutants were competed in vitro against
their RIF-susceptible ancestor as described previously [7]. We found that S531L, S531W and H526Y
showed no difference in relative fitness compared
with the ancestor (Fig. 1A), while H526R and
H526P showed a significantly lower relative fitness
(Bootstrap P = 0.02 and P < 0.01, respectively).
Similar to previous work in M. tuberculosis [7], we
found a strong correlation between fitness cost of
rpoB mutations in M. smegmatis and the frequency of
these mutations in clinical isolates of M. tuberculosis
(Spearman’s Rank coefficient 0.9, P = 0.04;
Supplementary Table S1). Individually, S531L and
H526Y that showed no fitness cost in our M.
smegmatis model are the most frequent RIF resistance conferring mutations in clinical settings,
whereas S526P that had the lowest relative fitness
of all mutants occurs only in 0.1% of clinical strains
(Supplementary Table S1). We found no correlation
between the spontaneous mutation frequency of
rpoB mutations and the clinical frequency of these
mutations (Supplementary Table S1).
From the OFX-selected mutants, we selected four
that carried distinct gyrA mutations for further analysis (D94G, G88C, D94N and D94Y). In vitro competition against the OFX-susceptible ancestor
revealed that mutants carrying D94G and D94Y
had no fitness defect, while D94N and G88C had a
significantly lower relative fitness (Bootstrap
P = 0.02 and P < 0.01, respectively) (Fig. 1B). We
compared our fitness measures with the frequency
of gyrA mutations found in M. tuberculosis clinical
isolates using data from a recently published review
based on 1220 OFX-resistant M. tuberculosis isolates
[20] (Supplementary Table S2). Similar to our
findings with RIF-resistant mutants, we found that
mutations at codon position 94 of gyrA, which
showed overall the highest in vitro fitness in M.
smegmatis, were the most common mutations in
M. tuberculosis clinical strains. By contrast, gyrA
G88C that had the lowest fitness is only rarely
(1.6%) found in clinical settings (Supplementary
Table S2). In contrast to the rpoB mutations, mutations at codon position 94 of gyrA were also the most
frequent during the in vitro selection (Supplementary
Table S2).
Evidence for epistasis between rpoB and gyrA
mutations
To test for possible epistatic interactions between
mutations conferring RIF and OFX resistance, we
selected for spontaneous mutants resistant to both
drugs. These double mutants harbouring a mutation
in rpoB and gyrA were selected starting from the
available single drug-resistant mutants. A total of
17 rpoB–gyrA double mutants were generated out
of the 20 possible combinations (Supplementary
Table S3). The relative fitness of the double mutants
was determined by standard competition assays
against the pan-susceptible ancestor strain and
Borrell et al. |
Epistasis in drug-resistant tuberculosis
B
A 1.2
Relative fitness
1
0.8
0.6
0.4
0.2
94
D
94
D
G
Y
C
N
88
G
L
31
W
31
6Y
6P
6R
94
D
S5
S5
52
H
52
H
52
H
gyrA mutants
rpoB mutants
Figure 1. Relative fitness of M. smegmatis mutants resistant to a single drug compared with their pan-susceptible ancestor. Bars
represent 95% confidence intervals. (A) Relative fitness of rpoB single mutants resistant to RIF. (B) Relative fitness of gyrA single
mutants resistant to OFX
B
1.2
rpoB
531L
1.1
1.0
88C
531W
526P
526R
526Y
-
-
+
-
+
+
+
0.9
0.8
gyrA
Observed fitness
A
0.7
94N
-
94Y
-
+
+
94G
+
-
+
0.6
0.5
0.70 0.75 0.80 0.85 0.90 0.95 1.00 1.05 1.10
-
-
Expected fitness
Figure 2. Evidence of epistasis between mutations conferring resistance to RIF and OFX. (A) Relationship between observed
and expected multiplicative fitness for the 17 double-resistant mutants (data point above/below the bar). The solid line represents the null hypothesis of multiplicative fitness effects. Deviations from this line arise as a consequence of epistatic fitness
effects. (B) Allelic combination analysed and the corresponding sign of epistasis. The grey squares correspond to the pairs of
mutations showing statistically significant epistasis
compared with the fitness of the corresponding single-resistant mutants. We compared the observed
fitness of each double mutant with the expected fitness assuming no epistasis based on a multiplicative model (Fig. 2, see ‘Methodology’ section for
details). We found that in 11/17 (65%) of the double
mutants, the observed fitness was different from the
expected, suggesting either negative or positive
epistasis between particular RIF and OFX resistance
conferring mutations (Fig. 2A).
To measure epistasis quantitatively, we measured
pairwise epistasis (e) between all the different single-mutant pairs we had fitness data for, assuming a
multiplicative model (Supplementary Table S4);
positive and negative values of e indicate positive
or negative epistasis, respectively [3]. Overall, the
e-values across all mutant pairs followed a normal
distribution (Shapiro–Wilk, P = 0.062) with an average positive value of 0.027 (95% confidence interval
0.02, 0.08) (Supplementary Table S4). Four out of
17 (24%) double mutants showed statistically significant positive or negative epistasis between RIF
and OFX resistance conferring mutations. Moreover
as shown in Fig. 2B, these epistatic interactions were
allele-specific, showing differences in the sign (i.e.
positive versus negative) of the e-value depending
on the specific amino acid change at a particular
codon position.
Theoretical and experimental evidence predicts a
correlation between the average deleterious effect of
a single mutation and the strength of epistasis
[21–23]. Hence, we tested whether this relationship
holds for drug-resistant mycobacteria. In agreement
with these predictions, we found a negative
69
| Borrell et al.
Evolution, Medicine, and Public Health
MF ε
0.6
0.5
0.4
Average epistasis
70
0.3
R 2= 0.78
0.2
0.1
0
0.75
0.8
0.85
0.9
0.95
1
1.05
1.1
Expected fitness
Figure 3. Correlation between the average expected fitness and the strength of epistasis. Average epistasis was measured as
deviation from a multiplicative model of double-resistant mutant fitness scores estimated by head-to-head competition in
Middlebrook 7H9 broth. MFe: minimum fitness for e
correlation between the expected fitness of our
double mutants and the strength of epistasis between the respective RIF and OFX resistance
conferring mutations (R2 = 0.78; P < 0.001) (Fig. 3).
However, this correlation was only observed above a
particular threshold of expected fitness, which we
refer to as ‘minimal fitness for epistasis’ (MFe).
Above MFe, epistasis tended to be positive when
individual mutations were costly and negative when
individual mutations were beneficial [21,23]. Below
MFe, the correlation was lost (R2 = 0.07; P = 0.304),
likely because these data points were all derived
from mutants carrying the G88C mutation in gyrA,
which was associated with a high fitness defect.
Evidence for sign epistasis in rpoB/gyrA double
mutants
Sign epistasis refers to the case where a particular
mutation that is deleterious on its own is beneficial
in the presence of another mutation [3]. In the context of drug resistance, sign epistasis occurs when
the fitness of the double-resistant mutant is higher
than at least one of the corresponding singleresistant mutants. We found that 6 out of 17 double
mutants (35%) showed statistically significant evidence of sign epistasis (Fig. 4). In addition, the
observed sign epistatis was allele specific, i.e. the
epistatic effects varied according to the specific
alleles of the same gene. For example, D94N in
gyrA led to the conversion of the fitness sign in the
S526P RIF-resistant background but not in the
S531L RIF-resistant background (Fig. 4).
Role of epistasis in clinical XDR-TB
Given the evidence for epistasis between RIF and
OFX resistance mutations in M. smegmatis, we
investigated how fitness changes along the mutational pathway leading from MDR-TB to XDR-TB
might be influenced by corresponding epistatic
interactions in M. tuberculosis (Fig. 5A). In the standard treatment protocols for TB [17], RIF is an essential part of the first-line regimen for drug-susceptible
disease, and OFX is a part of the second-line regimen when resistance against first-line drugs has developed. Thus, rpoB mutations are generally
acquired first and gyrA mutations second.
Following this trajectory, selection by RIF will occur
first, and the RIF-resistant mutants that survive will
exhibit heterogeneous fitness in the absence of the
drug depending on their rpoB mutations (Fig. 1)
[7,24]. At this point, MDR-TB has developed and second-line treatment is initiated. Selection for OFX
resistance begins, but the fitness levels of the
emerging double mutants can still be positively or
negatively affected depending on which gyrA mutation is acquired. Our M. smegmatis data showed that
Borrell et al. |
Epistasis in drug-resistant tuberculosis
1.4
Relative fitness
1.2
1.0
0.8
0.6
0.4
0.2
88
52
C
6Y
/G
S5
88
C
31
W
/G
88
S5
C
31
L/
D9
H
4N
52
6P
/D
94
H
52
N
6R
/D
94
H
52
N
6Y
/D
94
S5
N
31
L/
D9
H
4Y
52
6P
/D
94
H5
Y
26
Y/
D
S5
9
31 4Y
W
/D
94
S5
Y
31
L/
D
S5
94
G
31
W
/D
94
H
52
G
6P
/D
94
H
52
G
6R
/D
94
H
G
52
6Y
/D
94
G
H
R/
G
52
6
H
H
52
6
P/
G
88
C
0.0
rpoB/gyrA mutations
Figure 4. Evidence for sign epistasis between mutations conferring resistance to RIF and OFX. Sign epistasis occurs when the
fitness of the double-resistant mutant (pink bar) is greater than the fitness of at least one corresponding single-resistant mutant
[purple-(RIF) and blue-(OFX) bars]. The bars represent the standard deviation of the values. Double-resistant mutants with a
bootstrapped P < 0.05 are highlighted with a star
the gyrA D94G mutation was associated with improved fitness in all of the double mutants, irrespective of the rpoB mutation (pink bars compared with
purple bars in Fig. 4). This was statistically significant in two of the five corresponding double mutants
tested. Hence, based on the most likely clinical scenario of moving from MDR- to XDR-TB (Fig. 5A), we
would expect the gyrA D94G mutation to be the most
commonly found mutation in XDR-TB strains, and
also to be found in combination with many different
rpoB mutations. By contrast, we would expect gyrA
G88C, which was consistently associated with negative epistasis in our M. smegmatis model (Figs 3, 4
and 5A), to show the opposite trend. To test these
predictions, we analysed 151 MDR- and XDR-TB
clinical isolates from South Africa. Sequencing of
the relevant genes revealed that 71/151 (47%)
harboured gyrA D94G whereas G88C occurred only
once (0.7%). Moreover, among the gyrA mutations
represented in the M. smegmatis dataset, gyrA D94G
was the only mutation that occurred in combination
with four different rpoB mutations in clinical strains
(Fig. 5B). Taken together, our results show that experimental fitness data generated with M. smegmatis
can be predictive of clinical TB. Moreover, these
findings support a role for epistasis in the progression of M. tuberculosis from MDR to XDR.
CONCLUSION AND IMPLICATIONS
In this study, we used M. smegmatis as a model to
show that epistasis can occur between mutations
conferring resistance to RIF and OFX, which are
two of the most important anti-TB drugs.
Specifically, in several of the mutants resistant to
both of these drugs, some of the mutations
conferring resistance to one drug mitigated the
negative fitness effects of some of the mutations
conferring resistance to the other drug (or vice
versa). Moreover, we found clear evidence of sign
epistasis, showing that in some cases, the doubleresistant mutants had a higher relative fitness than
at least one of the corresponding single-resistant
mutants. In the context of MDR, sign epistasis between different drug resistance conferring mutations represent the worst case scenario; instead of
accumulating fitness defects with each additional
drug resistance, MDR strains manage to increase
their relative fitness by acquiring additional drug resistance determinants. One limitation of our study is
that we cannot exclude the possibility that additional
mutation(s) could have arisen during the selection
of our mutants, which may compensate for the initial
fitness defects associated with the individual resistance mutations.
More work is needed to elucidate the mechanisms
involved in the interaction between mutations in
rpoB and gyrA. Yet, several features make such interactions biologically plausible. GyrA encodes one of
the subunits of DNA gyrase which is involved in the
introduction of negative supercoiling to doublestranded DNA, thereby relaxing the positive supercoils that form during DNA replication [25]. RpoB
encodes a part of the RNA polymerase and therefore
71
72
| Borrell et al.
Evolution, Medicine, and Public Health
B
A
Figure 5. (A) Mutational pathway leading to rpoB–gyrA double mutants when a patient undergoes standard TB treatment. RpoB
mutations are generally acquired first, followed by gyrA mutations. The relative fitness of the various double-resistant mutants is
indicated as determined by in vitro competition using the M. smegmatis model. wt—drug-susceptible wild-type strain; rpoB—
point mutations in rpoB conferring RIF resistance; gyrA—point mutations in gyrA conferring OFX resistance. (B) Frequency of
rpoB–gyrA mutation pairs found in MDR- and XDR-TB clinical isolates from the Eastern Cape and Western Cape Provinces of
South Africa (only considering pairs including gyrA mutants for which M. smegmatis fitness data were available; N = 89)
important for the transcription of DNA to RNA [26].
Although these two pathways are separate [27,28],
GyrA and RpoB are both involved in the fundamental
flow from DNA to RNA. Intriguingly, Gupta et
al. isolated an ‘RNA-polymerase-DNA gyrase complex’ in M. smegmatis that exhibited both DNA
supercoiling and transcriptional activities. The authors also found that DNA gyrase inhibitors not only
reduced DNA gyrase activity but also reduced transcriptional activity indicating a role of DNA gyrase in
transcription [29]. Finally, it has been shown that
during transcription, RNA polymerase introduces
positive supercoiling ahead as it slides along its template DNA. This leads to a reduced accessibility as
supercoiling increases, further supporting a potential role for DNA gyrase in transcription [25].
Our study also showed that experimental data obtained from M. smegmatis are relevant for our understanding of clinical TB. Not only did we observe the
same drug resistance conferring mutations in
M. smegmatis as routinely encountered in clinical
strains of M. tuberculosis, but similar to previous
studies, we found a good correlation for both RIF
and OFX between the fitness cost observed in vitro
in M. smegmatis mutants and the relative clinical
frequency of the corresponding mutations in M. tuberculosis [20,24]. Our M. smegmatis data showed
particular relevance when focusing on MDR- and
XDR-TB. Based on the most probable mutational
pathway leading from MDR to XDR, our
M. smegmatis fitness data predicted particular combinations of rpoB and gyrA mutations to be more
frequent than others in clinical settings. This prediction was confirmed when screening a large panel of
MDR and XDR M. tuberculosis clinical strains from
South Africa, which is one of the regions with the
highest burden of XDR-TB in the world [17].
Our mutational pathway analysis also showed
that in some cases, if certain mutations are acquired
first, the fitness of these drug-resistant strains is permanently set at a high baseline that cannot be drastically affected regardless of the individual fitness
cost associated with the second mutation.
Moreover, some gyrA mutations can act as ‘fitness
safety nets’ offering the bacteria the possibility to
recover from loss of fitness caused by any of the
Borrell et al. |
Epistasis in drug-resistant tuberculosis
initial rpoB mutations. Taken together, our results
suggest that although evolution towards MDRand XDR-TB can follow multiple trajectories, these
are likely to be influenced by epistatic interactions
between the particular drug resistance conferring
mutations. This will constrain the particular mutational combinations to those that either increase or
at least maintain fitness at a minimum level (Fig. 4).
Above this minimum level of fitness, our study indicates that the strength of epistasis between gyrA and
rpoB will be stronger when the individual mutations
are associated with large fitness defects. Although
the fitness measures reported here were generated
during in vitro growth, M. tuberculosis is facing
harsher environments during human infection. The
fitness effects of drug resistance mutations have
been shown to vary in different environments
[6,30]. Hence, it would be interesting to explore
how host immune pressure, oxidative and other
stresses might influence epitasis between drug resistance mutations.
Our finding that a specific gyrA mutation (i.e.
D94G; Figs 4 and 5A) can restore the fitness of
strains carrying different rpoB mutations has implications for the development of new TB treatment
regimens. So far, OFX and other fluoroquinolones
have primarily been used as second-line drugs to
treat MDR-TB [31]. However, because of their potential to shorten TB chemotherapy, they are currently
being evaluated in the context of new first-line treatment regimens for drug-susceptible TB [32]. Our results highlight that using fluoroquinolones as firstline treatment is likely to result in the early selection
of fluoroquinolone resistance conferring mutations
such as D94G gyrA that not only confer resistance
but might promote also the acquisition of additional
drug resistance while maintaining bacterial fitness
at an advantageous level, either through positive
epistasis with mutations conferring resistance to
RIF or other drugs, or by establishing a higher baseline fitness [33]. Moreover, exposure to fluoroquinolones induces the bacterial SOS response which
leads to the induction of error-prone DNA polymerases, thereby increasing the bacterial mutation rate
and the propensity of acquiring additional drug resistance conferring mutations [34]. Interestingly, we
found that resistance mutations at codon position
94 of gyrA were also most frequent during in vitro
selection, suggesting that in addition to epistatic
interactions between rpoB and gyrA mutations, other
mechanisms might influence the frequency of
particular combinations of drug resistance mutations in clinical settings.
In conclusion, our study together with previous
findings demonstrates that epistasis between different drug resistance conferring mutations occurs
across several bacterial species. Although our study
focused on the interaction between mutations in
rpoB and gyrA, further work should explore possible
similar effects in resistance to other anti-TB drugs,
both existing as well as those currently under development [35] (http://www.newtbdrugs.org/pipeline.
php). Three new drug candidates have shown
promising results in recent clinical trials of MDRTB treatment [32]. However, how these new compounds should best be deployed, and in what combinations, remains unclear. Our study suggests that
considering putative epistasis between the relevant
drug resistance conferring mutations could help optimize treatment regimens. For example, combining
drugs in which the resistance conferring mutations
interact negatively would reduce the probability of
resistance emerging.
supplementary data
Supplementary data is available at EMPH online.
acknowledgements
We thank all the other members of our group for
the stimulating discussions.
funding
This work was supported by the Swiss National
Science Foundation (grant number PP0033119205) and the National Institutes of Health
(AI090928 and HHSN266200700022C). Funding to
pay the Open Access publication charges for this
article was provided by the Swiss National Science
Foundation (PP0033-119205).
Conflict of interest: None declared.
references
1. Lehner B. Molecular mechanisms of epistasis within and
between genes. Trends Genet 2011;27:323–31.
2. Breen MS, Kemena C, Vlasov PK et al. Epistasis as the
primary factor in molecular evolution. Nature 2012;490:
535–8.
73
74
| Borrell et al.
Evolution, Medicine, and Public Health
3. Trindade S, Sousa A, Xavier KB et al. Positive epistasis
drives the acquisition of multidrug resistance. PLoS
Genet 2009;5:e1000578.
genetics-based second-line drug resistance testing.
Antimicrob Agents Chemother 2012;56:2420–7.
20. Maruri F, Sterling TR, Kaiga AW et al. A systematic review
4. Ward H, Perron GG, Maclean RC. The cost of multiple
of gyrase mutations associated with fluoroquinolone-re-
drug resistance in Pseudomonas aeruginosa. J Evol Biol
sistant Mycobacterium tuberculosis and a proposed gyrase
2009;22:997–1003.
¨rkman J, Hughes D, Andersson DI. Virulence of antibi5. Bjo
numbering system. J Antimicrob Chemother 2012;67:
819–31.
otic-resistant Salmonella typhimurium. Proc Natl Acad Sci
21. Hall AR, Iles JC, MacLean RC. The fitness cost of rifampicin
U S A 1998;95:3949–53.
6. Hall AR, Maclean RC. Epistasis buffers the fitness effects
of rifampicin-resistance mutations in Pseudomonas
aeruginosa. Evolution 2011;65:2370–9.
7. Gagneux S, Long CD, Small PM et al. The competitive cost
of antibiotic resistance in Mycobacterium tuberculosis.
Science 2006;312:1944–6.
8. Shcherbakov D, Akbergenov R, Matt T et al. Directed mutagenesis of Mycobacterium smegmatis 16S rRNA to recon-
resistance in Pseudomonas aeruginosa depends on demand for RNA polymerase. Genetics 2011;187:817–22.
22. Lalic´ J, Elena SF. Magnitude and sign epistasis among
deleterious mutations in a positive-sense plant RNA virus.
Heredity (Edinb) 2012;109:71–7.
23. Wilke CO, Adami C. Interaction between directional epistasis and average mutational effects. Proc Biol Sci 2001;
268:1469–74.
24. O’Sullivan DM, McHugh TD, Gillespie SH. Analysis of
struct the in-vivo evolution of aminoglycoside resistance
rpoB and pncA mutations in the published literature: an
in Mycobacterium tuberculosis. Mol Microbiol 2010;77:
insight into the role of oxidative stress in Mycobacterium
830–40.
9. Andersson DI, Levin BR. The biological cost of antibiotic
tuberculosis evolution? J Antimicrob Chemother 2005;55:
674–9.
resistance. Curr Opin Microbiol 1999;2:489–93.
10. Andersson DI, Hughes D. Antibiotic resistance and its
25. Reece RJ, Maxwell A. DNA gyrase: structure and function.
Crit Rev Biochem Mol Biol 1991;26:335–75.
cost: is it possible to reverse resistance? Nat Rev
26. Miller LP, Crawford JT, Shinnick TM. The rpoB gene of
Microbiol 2010;8:260–71.
11. Weinreich DM, Watson RA, Chao L. Perspective: sign epistasis and genetic constraint on evolutionary trajectories.
Evolution 2005;59:1165–74.
Mycobacterium tuberculosis. Antimicrob Agents Chemother
1994;38:805–11.
27. McClure WR, Cech CL. On the mechanism of rifampicin
inhibition of RNA synthesis. J Biol Chem 1978;253:8949–56.
12. Muller B, Borrell S, Rose G et al. The heterogeneous evo-
28. Almeida Da Silva PEA, Palomino JC. Molecular basis and
lution of multidrug-resistant Mycobacterium tuberculosis.
Trends Genet 2013;29:160–9.
mechanisms of drug resistance in Mycobacterium tuberculosis: classical and new drugs. J Antimicrob Chemother
13. World Health Organization. Multidrug and extensively
2011;66:1417–30.
drug-resistant TB (M/XDR-TB). Global Report on
29. Gupta R, China A, Manjunatha UH et al. A complex of DNA
Surveillance and Response. World Health Organization:
gyrase and RNA polymerase fosters transcription in
Geneva, 2010.
Mycobacterium smegmatis. Biochem Biophys Res Commun
14. Sandgren A, Strong M, Muthukrishnan P et al.
2006;343:1141–5.
Tuberculosis drug resistance mutation database. PLoS
Med 2009;6:e1000002.
15. Perdiga˜o J, Macedo R, Silva C et al. From multidrug-resist-
30. Miskinyte M, Gordo I. Increased survival of antibiotic-re-
ant to extensively drug-resistant tuberculosis in Lisbon,
31. Ginsburg AS, Grosset JH, Bishai WR. Fluoroquinolones,
sistant Escherichia coli inside macrophages. Antimicrob
Agents Chemother 2013;57:189–95.
Portugal: the stepwise mode of resistance acquisition. J
tuberculosis, and resistance. Lancet Infect Dis 2003;3:
Antimicrob Chemother 2013;68:27–33.
432–42.
16. Sun G, Luo T, Yang C et al. Dynamic population changes
32. Zumla A, Hafner R, Lienhardt C et al. Advancing the devel-
in Mycobacterium tuberculosis during acquisition and fix-
opment of tuberculosis therapy. Nat Rev Drug Discov 2012;
ation of drug resistance in patients. J Infect Dis 2012;206:
11:171–2.
1724–33.
17. World Health Organization. (2012) Global tuberculosis con-
33. Luo N, Pereira S, Sahin O et al. Enhanced in vivo fitness of
fluoroquinolone-resistant Campylobacter jejuni in the ab-
trol: surveillance, planning, financing. World Health
sence of antibiotic selection pressure. Proc Natl Acad Sci U
Organization: Geneva, 2012.
18. Van Der Zanden AGM, Te Koppele-Vije EM, Vijaya Bhanu
S A 2005;102:541–6.
34. Ysern P, Clerch B, Castan´o M et al. Induction of SOS genes
N et al. Use of DNA extracts from Ziehl–Neelsen-stained
in Escherichia coli and mutagenesis in Salmonella
slides for molecular detection of rifampin resistance and
typhimurium by fluoroquinolones. Mutagenesis 1990;5:
spoligotyping of Mycobacterium tuberculosis. J Clin
Microbiol 2003;41:1101–8.
19. Streicher EM, Bergval I, Dheda K et al. Mycobacterium tuberculosis population structure determines the outcome of
63–6.
35. Ma Z, Lienhardt C, McIlleron H et al. Global tuberculosis
drug development pipeline: the need and the reality.
Lancet 2010;375:2100–9.
241
Evolution, Medicine, and Public Health [2013] pp. 241–253
doi:10.1093/emph/eot013
Genetic links between
post-reproductive lifespan
and family size in
Framingham
Xiaofei Wang*1, Sean G. Byars2 and Stephen C. Stearns3
1
Department of Statistics, Yale University, New Haven, CT 06520-8102, USA, 2Department of Biology, Copenhagen
University, Universitetsparken 15, 2100 Copenhagen, Denmark and 3Department of Ecology and Evolutionary Biology,
Yale University, New Haven, CT 06520-8102, USA
*Correspondence address. Department of Statistics, Yale University, New Haven, CT 06520-8102, USA. Tel: 203 432 0666;
Fax: 203 432 0633; E-mail: xiaofei.wang@yale.edu
Received 7 December 2012; revised version accepted 17 June 2013
ABSTRACT
Background and objectives: Is there a trade-off between children ever born (CEB) and post-reproductive
lifespan in humans? Here, we report a comprehensive analysis of reproductive trade-offs in the
Framingham Heart Study (FHS) dataset using phenotypic and genotypic correlations and a genomewide association study (GWAS) to look for single-nucleotide polymorphisms (SNPs) that are related to
the association between CEB and lifespan.
Methodology: We calculated the phenotypic and genetic correlations of lifespan with CEB for men and
women in the Framingham dataset, and then performed a GWAS to search for SNPs that might affect
the relationship between post-reproductive lifespan and CEB.
Results: We found significant negative phenotypic correlations between CEB and lifespan in both
women (rP = 0.133, P < 0.001) and men (rP = 0. 079, P = 0.036). The genetic correlation was large,
highly significant and strongly negative in women (rG = 0.877, P = 0.009) in a model without covariates,
but not in men (P = 0.777). The GWAS identified five SNPs associated with the relationship between
CEB and post-reproductive lifespan in women; some are near genes that have been linked to cancer.
None were identified in men.
Conclusions and implications: We identified several SNPs for which the relationship between CEB and
post-reproductive lifespan differs by genotype in women in the FHS who were born between 1889 and
1958. That result was not robust to changes in the sample. Further studies on larger samples are needed
to validate the antagonistic pleiotropy of these genes.
K E Y W O R D S : genome-wide association study; longevity; trade-off; family size
ß The Author(s) 2013. Published by Oxford University Press on behalf of the Foundation for Evolution, Medicine, and Public Health. This is an Open
Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits
unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
orig inal
research
article
242
| Wang et al.
Evolution, Medicine, and Public Health
BACKGROUND AND OBJECTIVES
Both the theory of life-history evolution and the
evolutionary theory of aging assume a trade-off between reproduction and survival: a cost of
reproduction paid in lifespan [1–4]. Although well
documented in model organisms, the existence of
this trade-off in humans has been controversial (e.g.
[5]). Negative [6–11], positive [12–17], U-shaped
[18–20] and mixed or insignificant [21–27] relationships between completed family size and lifespan
have all been found. Some results have been
criticized on statistical grounds; some authors
doubt that the trade-off exists at all (e.g. [28–32]).
Two papers suggest that the cost is only expressed in
women of low social class or nutritional status; a
similar effect has been found in model organisms
[5, 21, 27].
Although most of the attempts to measure the
trade-off in humans are based on phenotypic correlations, the standard of evidence for the existence of a
trade-off in evolutionary analyses of model organisms is a negative genetic correlation demonstrated
as a correlated response to selection (e.g. [5, 33]).
Such experiments reveal genetic relationships often
hidden by phenotypic plasticity. This standard cannot be met in humans, where experimental evolution
is not possible.
Two other types of genetic evidence, however, are
available in humans. First, genetic correlations can
be measured with pedigree analysis using methods
developed for animal breeding. Using such
¨gele et al. [34] found a significantly
methods, Go
‘positive’ genetic correlation between completed
family size and lifespan in a sample of more than
5100 men and women who lived between 1658 and
1907 in South Tyrol, Italy.
Second, genome-wide association studies
(GWAS) can be done on populations where both
the relevant traits and the single-nucleotide polymorphisms (SNPs) have been measured. In a
GWAS done on more than 3500 women from
Rotterdam, Kuningas et al. [35] found four chromosomal regions that influenced completed family
size; none of them appeared also to affect lifespan.
The aims of this analysis of men and women in the
Framingham Heart Study (FHS) were to add to the
genetic information on reproductive trade-offs in
humans by (i) first measuring the phenotypic correlation of lifespan with children ever born (CEB), (ii)
second estimating the genetic correlation of lifespan
with CEB and (iii) performing a GWAS to search for
SNPs with effects on the relationship of lifespan to
CEB. We found significantly negative phenotypic and
genetic correlations between post-reproductive lifespan and CEB in women. We also found five chromosomal regions mediating the trade-off that were
genome-wide significant in several statistical
models but not when we added smoking as a
covariate. Some of the genes in those five regions
are associated with increased risk of cancer.
METHODOLOGY
The Framingham Heart Study
Initiated in 1948 in the town of Framingham (MA),
the FHS includes three generations of participants
that continue to be measured. Beginning with 5209
men and women initially enrolled in the originalcohort, the study added 5124 offspring-cohort participants in 1971 that were mostly offspring of the
original-cohort. In 2002, a third-cohort was added
consisting of offspring of the second cohort.
Original-cohort participants have been examined
every 2 years (28 exams in total to date), the offspring-cohort every 4 years (eight exams in total).
Participants are mostly of European ancestry (20%
UK, 40% Ireland, 10% Italy and 10% Quebec). Data
were de-identified by the FHS. Data-use and human
subjects’ approval were obtained from the National
Institutes of Health (dbGaP) and the Yale
Institutional Review Board.
Phenotypic correlations
Our sample included men and women who were
born between the 1890s and the 1950s, except for
age at menarche where the available sample was
much smaller (i.e. 1923–56). Cox regression was
used to calculate risk of death depending on age
at first birth (nmen = 2579; nwomen = 2193), CEB
(nmen = 3833; nwomen = 3658), and age at menarche
(n = 1355) and menopause (n = 2415) in women. In
each regression, potentially confounding effects in
lifespan were controlled by including education,
country of origin and smoking status. To test for
potential nonlinear effects, a separate regression
was run with a quadratic term included for the main
predictor traits. If quadratic terms were significant,
this was explored further by examining the Cox
Wang et al. |
Genetic links in Framingham
regression model (from the survival library in R)
using penalized splines (with 4 df) [36, 37].
The Cox proportional hazards model is a standard
tool for survival analysis, in which the log of the
hazard function h(t) is assumed to be a linear combination of the covariates. Specifically, for a model
containing p covariates x1 , . . . ,xp ; the fitted model
takes the form of
hðtÞ ¼ hðt0 Þexpð1 x1+ +p xp Þ,
where i is the coefficient fit to covariate xi and
hðt0 Þ is the unknown baseline hazard function.
Equivalently, this equation can be expressed as
hðtÞ
ln
¼ 1 x1+ +p xp :
hðt0 Þ
Note that FHS reports CEB as a value from ‘0’ to
‘5’, where ‘5’ indicates having had five or more children. Several variables were pre-adjusted for age and
year measured. For body mass index (BMI), systolic
blood pressure (SBP) and total cholesterol, age and
year effects were removed by taking residuals of each
trait against age (measures between 20 and 60 years
old) and year measured using a generalized additive model (locally weighted scatterplot smoothing,
LOESS). All residuals for a subject were then
averaged to obtain an average residual for each trait,
which were then used for modelling. As
demonstrated previously, the surface of the
generalized additive model can be accurately
estimated due to the large number of trait measurements [38].
Our initial sample included 4123 women for
whom data on age at death, CEB, education level,
smoking history, estrogen use and BMI were available. We then removed 941 women who were born in
or after 1941, a period when the correlation between
lifespan and CEB was weaker, possibly because of
the improvement of health care after World War II.
We did so because to have a chance of detecting any
significantly correlated SNPs in the GWAS, we
needed to focus on a period where the phenotypic
correlation is relatively strong. Nineteen women
who died before the age of 50 years were also
excluded, because their CEB records might represent incomplete observations. Because we excluded
women who died before the age of 50 years, we are
specifically studying the relationship of CEB to postreproductive mortality. Of the remaining 3163
women, keeping only those who had genotype data
reduced our sample size to 1810. We required this
sample to have associated genotype data because
we later used the same sample for the GWAS. Note
that our phenotypic analysis used the year 1919 as a
cut-off because the yearly ratio of individuals alive to
individuals deceased increased to about 50% in
1919, and continued to rise thereafter.
For illustrative purposes, we also ran a multiple
linear regression on a smaller sample for women,
including only the deceased subjects who were born
prior to 1919 (n = 680) out of a total of 1810 who
satisfied specific criteria outlined above.
We similarly ran a regression model on a smaller
sample of men who have died (n = 712) out of a total
of 1474 men satisfying similar criteria.
Genetic correlations and heritabilities
We estimated heritabilities and genetic correlations
for traits from pedigrees using a mixed effects
restricted maximum likelihood (REML) model in
ASReml version 3.0 [39]. We considered models in
which there were no covariates as well as adjusted
models where phenotypic variation was partitioned
into additive genetic, residual variance and a single
random effect (maternal ID, paternal ID or education level). To be consistent with the phenotypic correlation models, we also considered models in
which fixed effects (smoking status and country of
origin) and both random effects for maternal ID and
education level were included. Sex was not included
as a fixed effect as male and female estimates were
obtained separately. Smoking status (0/1, nonsmoker/smoker) and country of origin (0/1, US
born/foreign born) were coded as binary variables.
Education described number of years completed,
with missing values coded as 8 years (the minimum). Maternal variance components ranged from
0.0 (age at first birth) to 0.12 ± 0.04 (lifespan) and 0.0
(age at first birth) to 0.20 ± 0.03 (lifespan) for female
and male analyses, respectively. Education variance
components ranged from 0.0 (age at menarche) to
0.06 ± 0.03 (CEB) and 0.0 (age at first birth) to
0.014 ± 0.009 (CEB) for female and male analyses,
respectively. The Framingham pedigree totals
15 877 individuals in 1538 pedigrees consisting of
both immediate and extended family. Heritability
estimates were tested for significance with likelihood ratios that compared full models with reduced
ones (i.e. 21DF = 2 (LogLFULL LogLREDUCED))
lacking the additive genetic component. Genetic
correlations were also tested for significance by
243
244
| Wang et al.
Evolution, Medicine, and Public Health
comparing likelihood values from full models to
ones where the genetic covariance was fixed at zero.
Our genetic correlation analysis between CEB and
lifespan included a total of 5133 females for whom
age at death and CEB information were available.
Supplementary Fig. S4 summarizes the pedigree information for these women, grouped by cohort via
the ‘pedantics’ package in R [40]. Pedigree depths
(computed using the same package) for the
Framingham dataset range from 0 to 4, with mean
1.02 (±1.06). On average, each woman had 2.38
(±1.59) children in her lifetime and lived 77.21
(±12.73) years. The average level of education in
years was 11.66. The average age at menarche was
12.81 (±1.54), average age at first birth was 26.49
(±4.81) and average age at menopause was 49.20
(±4.10).
Genome-wide association study
Our association results are based on 444 205 SNPs
from the 500 K and 50 K Affymetrix samples that
satisfied the following criteria: call rate >90%,
Hardy–Weinberg equilibrium P-value >0.00001,
Mendel error rate <2% and minor allele frequency
>0.01. These SNP selection criteria are further discussed in the Supplementary Information.
We used Cox proportional hazards models, as
done in the phenotypic correlation analysis, to estimate the interactions between survival time past age
50 years, CEB and genotype. For censored individuals, we used their times of last observation past age
50 years as their censoring time.
Several models were run under this setup, which
we number to emphasize that they are nested
models. Model 1 did not adjust for any covariates.
We then added covariates to reduce confounding by
variables that may be correlated with lifespan and
CEB. Model 2 used education level. Model 3 further
added BMI, estrogen use and cohort as covariates.
Models 4a–d were intermediate steps in which one
of the four additional covariates was added: blood
pressure treatment indicator (Model 4a), total cholesterol (Model 4b), SBP (Model 4c) and smoking
indicator (Model 4d). Model 5 included all four of
these additional covariates. Models 4a–d were run
retrospectively to pinpoint which covariate, when
added, resulted in removing significance from all
SNPs. A summary of the models fitted can be found
in the Supplementary Information.
Both genotypes and CEB were included as continuous variables to model an additive effect of the
minor allele. We used both the raw genotypes
provided by FHS as well as an imputed dataset.
The imputation was done in several stages. First,
we incorporated values imputed by MACH that were
included in the FHS dataset. The MACH algorithm
imputes missing genotypes based on shared haplotype stretches between subjects and HapMap data
[41]. Of the remaining missing values, we sampled
among the possible genotypes given the genotypes
of parents, when parent genotypes were available.
Any remaining missing values were simply sampled
according to genotype proportions of the entire
group. This sequence of operations created a full
set of genotypes that had no missing values.
Cohort was defined as a categorical variable
computed from the year of birth: born before or in
1917 and born in or after 1918.
In addition to running the above five models on
the full sample of 1810, we tested our models for
robustness by mimicking an out-of-sample analysis.
To that end, we randomly divided our sample into
two equal parts and fitted Models 1–5 to each part
separately to check for consistency in significance of
the top performing SNPs. A true out-of-sample performance check would include the calculation of prediction error based on a model fitted on a training
set. Our method does not aim to validate prediction
out of sample, but rather to ensure that a SNP discovered to be significant in one sample ought to be
significant in another sample—a less stringent, but
still important requirement of consistency. To minimize the effects of missing genotypes on each subsample, which would further lower our sample size
in each of the two separate runs, we only used the
imputed genotypes for this portion of our analysis.
The downside of using imputed genotypes is the risk
of imputation error. To verify that our risk of imputation error is low, we used the imputed SNP data to
repeat our full-sample analyses for Models 1–5. Our
aim was to show that our results for these models
are similar, regardless of whether we used imputed
or raw SNP data.
To explore possible non-additive genotypic
effects, we ran a separate Model 6 that used genotype as a categorical variable. The covariates used in
Model 6 are identical to those used in Model 3, and
any SNPs for which the homozygous minor genotype had fewer than 20 counts were excluded. We
did not apply the half-sample testing to Model 6,
because in many cases, the genotype counts in the
homozygous minor allele category were too small to
Wang et al. |
Genetic links in Framingham
further subdivide the group for categorical
modelling.
Finally, we ran two additional models that are outside of the nested framework given above on the raw
data only (and therefore, they are not numbered).
A quadratic model was run to search for a possible
nonlinear effect by adding a quadratic CEB term
along with its interaction with genotype to
Model 1. The ‘matching covariates’ model was run
to provide a frame of reference to the reader; this
model uses exactly the same covariates that were
included in the phenotypic and genotypic correlation
analyses—education, smoking indicator and country of origin.
RESULTS
Phenotypic correlations
In the Cox regression analysis where as many men
and women were included as possible (birth-year
range 1889–1958), censoring was used to account
for those who were still alive according to the latest
medical records. Risk of mortality beyond age 50
years increased if women (adjusted incidence rate
ratio (RR) = 1.045, P = 0.030) had more children
(Table 1). When a nonlinear term for CEB was
included, it significantly improved the model fit
and became more significant than the linear term.
245
Penalized splines for unadjusted mortality risk
(Fig. 1) support a predominantly U-shaped pattern
for the association between CEB and lifespan, similar to that found in some other studies (e.g. [19]).
This is consistent with a cost of reproduction that is
experienced by women with three or more children
and with a benefit of reproduction to those who have
one or two children. Highest mortality risk occurred
in women with no children or more than three to four
children, with lowest risk for those with approximately two. Mortality risk decreased if the first child
was born later (women, unadjusted RR = 0.971,
P < 0.001; men, adjusted RR = 0.985, P = 0.011; see
Supplementary Fig. S1), but the significance of this
effect depended on whether estimates were adjusted
or not (Table 1). Mortality risk was also reduced if
menopause occurred later in women (unadjusted
RR = 0.970, P = 0.003), although this effect disappeared when other effects were controlled for
(Table 1). Full model results can be seen in
Supplementary Table S1.
In the analysis where only the 680 women were
included in the range of birth years 1889–1918 in
which all had died, the phenotypic correlation between CEB and lifespan was highly significant and
negative (r = 0.133, P = 0.0005; Fig. 2). Linear regression indicated that every additional child cost
0.74 years of lifespan (standard error (SE) = 0.21
years). There was, however, significant variation in
Table 1. Incidence RR (±95% confidence interval) for age at death due to stroke, heart attack or cancer
(beyond age 50 years)
Trait
Women
Men
Unadjusted
Adjusted
Unadjusted
Adjusted
CEB
1.050*
(1.011–1.092)NL**
n = 3729
1.045*
(1.005–1.087)NL***
0.995
(0.960–1.033)
n = 3888
1.031
(0.993–1.071)
Age first birth
0.971***
(0.955–0.988)NL**
n = 2236
0.977*
(0.960–0.994)NL*
0.990
(0.979–1.001)
n = 2613
0.985**
(0.974–0.995)
Menarche
0.891
(0.757–1.050)
n = 1367
0.917
(0.782–1.077)
Menopause
0.970**
(0.951–0.990)
n = 2461
0.984
(0.965–1.005)
Unadjusted Cox regression estimates included only the main predictor trait. Cultural effects (smoking, education and country-of-origin) were accounted
for in adjusted estimates. ‘NL’ indicates that a significant nonlinear effect was also detected for the association between this trait and longevity.
*P < 0.05, **P < 0.01, ***P < 0.001.
246
| Wang et al.
Evolution, Medicine, and Public Health
Figure 1. Summary of CEB and mortality risk in Framingham
women. A histogram of CEB and log-relative mortality risk
values for each CEB value with 95% confidence bands
Figure 3. Correlation between CEB and lifespan by birth year
(n = 5133)
for women. Women (n = 680) were grouped by overlapping
10-year intervals of birth year, and the correlation between
CEB and lifespan was computed for each group. Individual
points indicate the sample size of each 10-year group, with
the mean birth year plotted on the x-axis and correlation
plotted on the y-axis
phenotypic correlations are dependent on birth
year is consistent with previous findings that
selection pressures changed over time in
Framingham [38].
Heritabilities and genetic correlations
Figure 2. Relationship between CEB and lifespan for women.
Scatterplot illustrating correlation between CEB and lifespan
(r = 0.133, P < 0.001) (n = 680). Both variables have been
jittered to minimize overlap of points
the phenotypic correlation by birth year (Fig. 3); it
was positive (with one exception) from 1893 to 1907
and negative from 1908 to 1913. Many in the earlier
group were giving birth before the Great Depression
and World War II. Some of the latter group encountered those two major environmental perturbations.
The correlation between CEB and lifespan for the 712
men was slightly negative (r = 0.079, P = 0.0355;
Supplementary Fig. S2). An additional child cost
0.54 years of male lifespan (SE = 0.26 years).
Again, the correlation varied by birth year, but the
variations were less pronounced than for females
(Supplementary Fig. S3). The observation that
In women (Table 2), the heritabilities of most major
life-history traits differed significantly from zero,
including age at death (h2 = 0.12, P = 0.01), CEB
(h2 = 0.09, P = 0.03), age at first birth (h2 = 0.18,
P < 0.001) and menopause (h2 = 0.44, P < 0.001).
In women, the genetic correlation of CEB with age
at death was large, negative and significant
(rG = 0.88, P = 0.01) in a model without covariates
(Supplementary Table S2). When we included education as a random effect, the genetic correlation
decreased to 0.70 but was still significant
(P = 0.02). When we included either the mother or
the father identifiers in place of education as a random effect, the genetic covariance remained large
and negative, but was no longer significant (mother:
rG = 1.58, P = 0.11; father: rG = 1.46, P = 0.15). The
model in which we adjusted for education, smoking
status and country of origin also produced a large
negative genetic correlation, but the correlation was
not significant (rG = 0.69, P = 0.14).
The correlation between the quadratic term CEB2
and lifespan was large, negative and significant in
Wang et al. |
Genetic links in Framingham
247
Table 2. Heritabilities (h2, on the diagonal) and genetic correlations (rG, off the diagonal) of life history
traits (±SE)
Women
Age at death
Age at death
CEB
Age first birth
Menarche
Menopause
0.12 ± 0.08
P = 0.0176
n = 3010
0.69 ± 0.52
P = 0.1420
0.20 ± 0.25
P = 0.2083
0.07 ± 0.23
P = 0.3886
0.15 ± 0.17
P = 0.1917
0.09 ± 0.05
P = 0.0394
n = 4123
0.40 ± 0.35
P = 0.1545
0.31 ± 0.24
P < 0.0001
0.21 ± 0.21
P = 0.1377
0.18 ± 0.06
P = 0.0008
n = 2912
0.38 ± 0.33
P = 0.0911
0.06 ± 0.14
P = 0.3541
0.16 ± 0.13
P = 0.0948
n = 1638
0.10 ± 0.21
P = 0.3121
CEB
Age first birth
Menarche
Menopause
Men
Age at death
CEB
0.44 ± 0.06
P < 0.0001
n = 3400
<0.01 ± <0.01
P = 0.8875
n = 2963
<0.01 ± <0.01
P = 0.7773
<0.01 ± <0.01
P = 0.6101
<0.01 ± <0.01
P = 0.5485
n = 4051
<0.01 ± <0.01
P = 0.3884
Age first birth
0.12 ± 0.07
P = 0.0300
n = 2688
SEs and P-values were obtained from maximum-likelihood estimates. Cultural (smoking, education and country-of-origin) and maternal effects were
accounted for in all estimates. P-values < 0.05 are in bold.
three of four models (no covariates: rG = 1.09,
P = 0.003, only mother identifier as random effect:
rG = 1.73, P = 0.04, only education as random effect: rG = 0.85, P = 0.01), and borderline non-significant in the model with only the father identifier
(rG = 1.61, P = 0.06).
Furthermore, we looked to see if the genetic correlation between CEB and lifespan was robust to
pedigree depth in the simplest model where no
covariates were included. Including only those
women with pedigree depth of 1 or higher (n = 2540),
we got rG = 0.46 (P = 0.14) and including only those
women with pedigree depth of 2 or higher (n = 948),
we got rG = 0.21 (P = 0.60); both correlations were
no longer significant in the reduced samples.
The genetic correlation of CEB with age at menarche was relatively large, positive and highly significant (rG = 0.31, P < 0.001). In men (Table 2), the
heritability of age at first birth (inferred from their
spouses) was small and only just significant
(h2 = 0.12, P = 0.03). All other male heritability and
genetic correlation estimates were non-significant.
Full model results for heritability can be seen in
Supplementary Table S2.
Genome-wide association study
GWAS results are summarized in Tables 3–10; the
birth years for the 1810 women included in
the GWAS are shown in the Supplementary
Information. We deemed a SNP to be genome-wide
significant if its interaction coefficient with CEB had
a P-value that was less than a Bonferroni-adjusted
threshold of 1.13 107 ( = 0.05), unless otherwise
indicated. For females, we found two SNPs that attained genome-wide significance using the full
248
| Wang et al.
Evolution, Medicine, and Public Health
Table 3. GWAS for SNPs that affect the relationship between CEB and lifespan: summary of significant
SNPs in Models 1–3 and 5 (full sample)
Ssid
Rsid
Chr
Position
P-values (genotype CEB)
Near
Model 1
ss66450977 rs6768456 3
ss66475987 rs2575533 4
Model 2
Model 3
Model 4
Model 5
Matching
covariates
7.99E07 4.93E08a
27867272 EOMES 4.03E10a 4.38E10a 8.40E09a
(see Table 4)
42432336 ATP8A1 8.02E08a 5.30E08a 3.06E06
2.49E05 2.11E07
n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3.
a
SNP attained genome-wide significance.
Table 4. GWAS for SNPs that affect the relationship between CEB and lifespan: summary of significant
SNPs in Models 4 (full sample)
Ssid
Rsid
Chr
Position
P-values (genotype CEB)
Near
Model 4a
ss66450977
ss66475987
rs6768456
rs2575533
3
4
27867272
42432336
EOMES
ATP8A1
1.40E09
1.02E05
Model 4b
a
a
7.44E09
3.56E06
Model 4c
8.65E09
5.23E06
Model 4d
a
4.02E07
1.35E05
n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3.
a
SNP attained genome-wide significance.
Table 5. GWAS for SNPs that affect the relationship between CEB and lifespan: summary of nominally
significant SNPs in Model 6
Ssid
Rsid
Chr
Position
Near
P-value
Aa CEB
P-value
aa CEB
Homozygous minor
genotype count
ss66450977
ss66500131
ss66392234
ss66495977
rs6768456
rs1777023
rs7132724
rs2180957
3
9
12
14
27867272
92008266
65001044
68238574
EOMES
OR7E31P
HELB
RAD51B
1.00E07
1.00E01
1.30E01
1.20E01
2.40E03
3.00E07
9.60E08
8.70E07
21
26
102
21
n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3.
Table 6. GWAS for SNPs that affect the relationship between CEB and lifespan: re-evaluating significant
SNPs in Models 1–3 and 5 (split samples)
Ssid
ss66450977
ss66475987
Sample half 1
Sample half 2
P-values (genotype CEB)
P-values (genotype CEB)
Model 1
Model 2
Model 3
Model 5
Model 1
Model 2
Model 3
Model 5
0.00032
0.0002
0.00041
0.00012
0.00097
0.0021
0.007
0.001
9.39E08a
5.46E04
7.04E08a
4.46E04
1.36E06
1.56E03
4.58E06
1.39E02
n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3.
a
SNP attained genome-wide significance.
Wang et al. |
Genetic links in Framingham
249
Table 7. GWAS for SNPs that affect the relationship between CEB and lifespan: re-evaluating significant
SNPs in Models 4a–d (split samples)
Ssid
ss66450977
ss66475987
Sample half 1
Sample half 2
P-values (genotype CEB)
P-values (genotype CEB)
Model 4a
Model 4b
Model 4c
Model 4d
Model 4a
Model 4b
Model 4c
Model 4d
8.40E04
3.00E03
1.30E03
9.40E04
8.00E04
1.80E03
7.40E03
3.70E03
3.33E06
2.00E03
1.19E06
2.30E03
1.32E06
3.80E03
3.35E06
3.40E03
n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3.
Table 8. GWAS for SNPs that affect the relationship between CEB and lifespan: top SNPs in Model 5
(split sample)
Ssid
Rsid
ss66092635
ss66508254
ss66392234
ss66328248
ss66531142
ss74823403
ss66231005
ss66273879
ss66526690
ss66490007
Chr
rs6581676
rs2961258
rs7132724
rs13248967
rs11219832
rs7860830
rs10899741
rs1728810
rs1602160
rs11009744
P-values (genotype CEB)
Position
12
7
12
8
11
9
7
3
6
10
64992353
15150223
65001044
114920075
124272500
26882137
52215028
10992443
94277193
34675601
Sample 1
Sample 2
9.12E06
1.41E05
1.82E05
2.81E05
3.65E05
3.27E01
4.62E01
4.15E01
9.00E01
9.86E01
4.58E01
7.86E01
4.86E01
6.86E01
1.79E01
7.19E10a
9.84E08a
1.07E07a
1.57E07
2.37E07
n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3.
a
SNP attained genome-wide significance.
Table 9. GWAS for SNPs that affect the relationship between CEB and lifespan: summary of significant
SNPs in Models 1–3 and 5 (full sample) (imputed SNPs)
Ssid
ss66450977
ss66475987
Rsid
rs6768456
rs2575533
Chr
3
4
Position
27867272
42432336
P-values (genotype CEB)
Near
EOMES
ATP8A1
Model 1
Model 2
Model 3
Model 4
Model 5
2.91E10a
1.50E07
2.20E10a
6.57E08a
6.44E09a
5.03E06
(see Table 10)
5.56E07
2.94E05
n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3.
a
SNP attained genome-wide significance.
sample: ss66450977 on Chromosome 3 (close to
EOMES) and ss66475987 on Chromosome 4 (close
to ATP8A1). Their levels of significance decreased as
additional covariates were included in the model;
however, these SNPs were also significant in the
matching covariates model (Tables 3 and 4). We
also found two nominally significant SNPs that
exhibited possibly non-additive effects: ss66392234
250
| Wang et al.
Evolution, Medicine, and Public Health
Table 10. GWAS for SNPs that affect the relationship between CEB and lifespan: summary of
significant SNPs in Model 4 (full sample) (imputed SNPs)
Ssid
Rsid
Chr
Position
P-values (genotype CEB)
Near
Model 4a
ss66450977
ss66475987
rs6768456
rs2575533
3
4
27867272
42432336
EOMES
ATP8A1
1.40E08
1.02E05
Model 4b
a
a
6.30E09
5.80E06
Model 4c
4.30E09
5.40E06
Model 4d
a
3.87E07
2.30E05
n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3.
a
SNP attained genome-wide significance.
on Chromosome 12 (in HELB) and ss66500131
on Chromosome 9 (close to the pseudogene
OR7E31P) (Table 5). Nearby genes/pseudogenes
were determined based on a radius of 150 kb from
each SNP.
In the split-sample analysis using imputed SNP
data (see ‘Methodology’ section regarding details
on imputation), no SNPs were found to be significant for females (Tables 6–8), even when the randomization used in the split-sample assignment
was replicated 100 times. We verified that using
the imputed data for the full-sample analysis would
have yielded comparable levels of significance for
the two SNPs previously discovered in Models 1–5
(Tables 9 and 10).
No significant SNPs were detected for males in
Models 1–3. As in the GWAS for females, the addition of more covariates decreased levels of significance, and therefore no further models were run.
No significant SNPs were detected in a model that
included a quadratic effect of CEB. Further details on
the GWAS for females are in the Supplementary
Information.
CONCLUSIONS AND IMPLICATIONS
Phenotypic and genetic correlations
The phenotypic correlation between CEB and
lifespan in women differed with birth year,
demonstrating the importance of phenotypic plasticity on the relationships among life-history traits.
Secular cultural and environmental changes affect
that correlation and probably account for much of
the variation among studies [6, 15, 19, 21, 22]. The
estimate of a negative genetic correlation in women
when not accounting for covariates (rG = 0.88) was
large. The effects of shared environment reduced the
strength of the linear correlation and increased the
strength of the quadratic correlation, and education
mimicked the effects of a cost of reproduction in that
increased level of education was associated with
both fewer children and longer life: including education decreased the estimate of the genetic
correlation.
Some of our genetic correlation estimates were
below 1. This indicates that the estimated variance
component is negative, known to be a possible result of REML estimation [42].
When we controlled for the effects of smoking,
education, country of origin and maternal effects,
the correlation was still negative (rG = 0.69) yet
no longer significant. This mirrors the pattern we
observed in the GWAS; as covariates were
introduced into the model, associations became
insignificant.
The mean pedigree depth of 1.02 implies that our
pedigree is dominated by parent–offspring relationships. This may result in some difficulty distinguishing
parental, environmental and additive genetic effects.
For example, cultural and lifestyle habits that are
unique to nuclear families (such as diet) are known
to affect lifespan, but these habits are not recorded,
and therefore the genetic correlations that we see may
be confounded by these unobservable factors.
One can only find a genetic correlation when the
phenotypic correlation is significant, and one can
only find significant effects of SNPs on a phenotypic
correlation when it differs from zero. Our chain of
inference thus depends on genetic effects not being
too masked by phenotypic plasticity.
Gene functions
We found several SNPs with nominally significant
effects on the correlation of CEB with post-reproductive lifespan; two of them are near EOMES and
RAD51B, genes that are related to cancer when
under-expressed. The effect of the SNP close to
EOMES reached genome-wide significance. The
Wang et al. |
Genetic links in Framingham
EOMES gene has been associated with multiple
sclerosis and bladder cancer [43, 44]. RAD51B, a
gene involved in encoding proteins that participate
in DNA repair, has been linked to breast cancer and
brain cancer [45–48]. Further details on the genes in
proximity to the SNPs found significant in our GWAS
are included in the Supplementary Information.
Although these SNPs were close in physical distance
to their respective genes (<130 kb), further study of
linkage disequilibrium would help to understand
their possible association.
Other studies
Voorhuis et al. [49] collated the results of many genetic studies of age at natural menopause. None of
the SNPs that we discovered were found in the
studies included in their summary.
Several other recent genetic studies relate fertility
to genotype. Kosova et al. [50] found 41 SNPs
(P < 104) that were associated with decreased male
fertility. Adachi et al. [51] found 36 SNPs (P < 104)
with possible links to endometriosis in Japanese females. Both were GWAS studies that did not find any
genome-wide significant SNPs. Murray et al. [52] reported confirmations for four SNPs previously
identified as associated with age at menopause.
Ewens et al. [53] examined 15 SNPs linked with obesity to evaluate possible associations with polycystic
ovary syndrome, the cause of a form of infertility in
women; only one SNP had a nominal level of significance, and the significance did not hold up in another case–control study. Our methods differ
fundamentally from these four studies in that we
considered lifespan in conjunction with fertility,
and the significant SNPs we found were not reported
in their analyses [50–53].
Although the Kuningas Rotterdam study
incorporated mortality in its analysis and was therefore more similar to our study [35], it differs from our
approach in three ways: (i) our analysis included
many more SNPs (444 205 versus their 1664), (ii)
we adjusted for the effects of several direct mortality-affecting covariates such as smoking and SBP,
(iii) Kuningas used an initial screening of the 1664
SNPs with a set-based test (with a threshold of
P < 0.05), whereas we started with a GWAS across
444 205 SNPs in models that relate each SNP to
both CEB and lifespan (with a threshold of
P < 1.13 107). We did not find Bonferroni-level
significance with SNPs near the four gene regions
identified in [35].
Summary
We have analysed phenotypic and genetic correlations between reproductive success and survival
and have identified a small set of genes that may
mediate a trade-off between them. This warrants further studies in other samples.
The Framingham dataset has some shortcomings. In particular, women born before the start
of the study would only have been included
in the study if they survived until 1948–52
(when the study began). Therefore, our dataset
does not include anyone who died during
World War I, the 1918 flu pandemic, the
Great Depression and World War II. If these catastrophic events affected women differently depending on their fertility and lifespan, then excluding
these women from our analysis would bias our
results. The issue is inherent in such observational
studies of humans, and unfortunately cannot be
avoided.
We failed to find any significant SNPs when
covariates (i.e. smoking, country of origin and
average cholesterol levels) were included and
when we did a rough check for consistency out
of sample. It is unknown how often such checks
modify significance of SNP associations, for many
other published GWAS studies do not account for
the effects of covariates or do out-of-sample
predictions.
AUTHOR CONTRIBUTIONS
S.G.B. and X.W. jointly worked on processing and
cleaning the data and phenotypic correlation calculations. S.G.B. further calculated the genetic correlations and heritabilities. X.W. performed the GWAS.
S.C.S. conceived of the study and drafted the initial
manuscript. All authors contributed to the final
manuscript.
supplementary data
Supplementary data is available at EMPH online.
acknowledgements
The authors thank Drs John W. Emerson and Andrew
Pakstis for their feedback and insight on the project and
the three anonymous reviewers for their constructive
feedback.
251
252
| Wang et al.
Evolution, Medicine, and Public Health
funding
16. Muller HG, Chiou JM, Carey JR et al. Fertility and life span:
late children enhance female longevity. J Gerontol Ser A
The study was supported by the Yale University and the Marie
Curie International Incoming Fellowship FP7-PEOPLE-2010IIF-276565.
Conflict of interest: None declared.
Biol Sci Med Sci 2002;57:B202–6.
17. Sear R. The impact of reproduction on Gambian
women: does controlling for phenotypic quality reveal
costs of reproduction? Am J Phys Anthropol 2007;132:
632–41.
18. Lawlor DA, Emberson JR, Ebrahim S et al. Is the
references
association between parity and coronary heart disease
due to biological effects of pregnancy or adverse lifestyle
1. Williams G. Pleiotropy, natural selection, and the evolution of senescence. Evolution 1957;11:398–411.
2. Williams G. Natural selection, costs of reproduction,
and a refinement of Lack’s principle. Am Nat 1966;100:
687–90.
3. Roff D. The Evolution of Life Histories. New York: Chapman
and Hall, 1992.
4. Stearns SC. The Evolution of Life Histories. Oxford
University Press: Oxford, 1992, 249.
5. Stearns SC, Partridge L (2001) The genetics of aging in
Drosophila. In: Masoro EJ, Austad SN (eds), Handbook
of the Bioloy of Aging, 5th edn. 2001, 353–68.
6. Doblhammer G, Oeppen J. Reproduction and longevity
among the British peerage: the effect of frailty and health
selection. Proc Biol Sci 2003;270:1541–7.
risk factors associated with child-rearing? Findings
from the British Women’ Heart and Health Study and
the British Regional Heart Study. Circulation 2003;107:
1260–4.
19. Lund EE, Arnesen EE, Borgan JKJ. Pattern of childbearing
and mortality in married women—a national prospective
study from Norway. J Epidemiol Commun Health 1990;44:
237–40.
20. Manor OO, Eisenbach ZZ, Israeli AA et al. Mortality differentials among women: the Israel Longitudinal Mortality
Study. Soc Sci Med (1967) 2000;51:1175–88.
21. Dribe M. Long-term effects of childbearing on mortality:
evidence from pre-industrial Sweden. Popul Stud 2004;58:
297–310.
22. Gavrilova N, Gavrilov L, Semyonova VG et al. Does excep
7. Gagnon A, Smith KR, Tremblay M et al. Is there a trade-
tional human longevity come with a high cost of infertility?
off between fertility and longevity? A comparative
Testing the evolutionary theories of aging. Ann N Y Acad
study of women from three large historical databases
accounting for mortality selection. Am J Hum Biol 2009;
21:533–40.
8. Maklakov AA. Sex difference in life span affected by female
birth rate in modern humans. Evol Hum Behav 2008;29:
444–9.
9. Tabatabaie V, Atzmon G, Rajpathak SN et al. Exceptional
longevity is associated with decreased reproduction. Aging
2011;3:1202–5.
10. Thomas F, Teriokhin A, Renaud F et al. Human longevity at
the cost of reproductive success: evidence from global
data. J Evol Biol 2000;13:409–14.
11. Westendorp R, Kirkwood T. Human longevity at the cost of
reproductive success. Nature 1998;396:743–6.
12. Fuster V. Widowhood, illegitimacy, marital reproduction
and female longevity in a rural Spanish population. Homo
2011;62:500–9.
13. Helle S, Lummaa V, Jokela J. Are reproductive and somatic
senescence coupled in humans? Late, but not early,
Sci 2004;1019:513–7.
¨¨
23. Helle S, Ka
ar P, Jokela J. Human longevity and early reproduction in pre-industrial Sami populations. J Evol Biol
2002;15:803–7.
24. Jacobsen BK, Knutsen SF, Oda K et al. Parity and total,
ischemic heart disease and stroke mortality. The
Adventist Health Study, 1976–1988. Eur J Epidemiol
2011;26:711–8.
25. Jasienska G, Nenko I, Jasienski M. Daughters increase
longevity of fathers, but daughters and sons equally
reduce longevity of mothers. Am J Hum Biol 2006;18:
422–5.
26. Korpelainen H. Fitness, reproduction and longevity
among
European
aristocratic
and
rural
Finnish
families in the 1700s and 1800s. Proc Biol Sci 2000;267:
1765–70.
27. Lycett JE, Dunbar RIM, Voland E. Longevity and the costs
of reproduction in a historical human population. Proc Biol
Sci 2000;267:31–5.
reproduction correlated with longevity in historical sami
28. Cesarini D, Lindqvist E, Wallace B. Maternal longevity and
women. Proc Biol Sci 2005;272:29–37.
14. Le Bourg E, Thon B, Le´gare´ J et al. Reproductive life of
the sex of offspring in pre-industrial Sweden. Ann Hum
French–Canadians in the 17–18th centuries: a search for
29. Gavrilov L, Gavrilova N. Is there a reproductive cost for
a trade-off between early fecundity and longevity. Exp
Gerontol 1993;28:217–32.
15. McArdle P, Pollin T, O’Connell J et al. Does having children
Biol 2007;34:535–46.
human longevity? J Anti-Aging Med 1999;2:121–3.
30. Jasienska G. Reproduction and lifespan: trade-offs,
overall energy budgets, intergenerational costs, and
extend life span? A genealogical study of parity and lon-
costs neglected by research. Am J Hum Biol 2009;21:524–32.
gevity in the Amish. J Gerontol Ser A Biol Sci Med Sci 2006;
31. Mitteldorf J. Female fertility and longevity. Age 2010;32:
61:190–5.
79–84.
Wang et al. |
Genetic links in Framingham
32. Le Bourg E. Does reproduction decrease longevity in
human beings? Ageing Res Rev 2007;6:141–9.
application of these as urinary tumor markers. Clin
Cancer Res 2011;17:5582–92.
33. Stearns S, Ackermann M, Doebeli M et al. Experimental
44. Patsopoulos NA, de Bakker PIW. Genome-wide meta-
evolution of aging, growth, and reproduction in fruitflies.
analysis identifies novel multiple sclerosis susceptibility
Proc Natl Acad Sci USA 2000;97:3309–13.
¨gele M, Pattaro C, Fuchsberger C et al. Heritability
34. Go
analysis of life span in a semi-isolated population followed
45. Liu Y, Shete S, Wang LE et al. Gamma-radiation sensitivity
and polymorphisms in RAD51L1 modulate glioma risk.
across four centuries reveals the presence of pleiotropy
between life span and reproduction. J Gerontol Ser A Biol
Sci Med Sci 2011;66:26–37.
loci. Ann Neurol 2011;70:897–912.
Carcinogenesis 2010;31:1762–9.
46. Figueroa JD, Garcia-Closas M, Humphreys M et al.
Associations of common variants at 1p11. 2 and 14q24. 1
¨e S, Uitterlinden AG et al. The relation35. Kuningas M, Altma
(RAD51L1) with breast cancer risk and heterogeneity
ship between fertility and lifespan in humans. Age 2011;33:
by tumor subtype: findings from the Breast Cancer
Association Consortium. Hum Mol Genet 2011;20:4693–706.
615–22.
36. R Core Team. (2009) R: A Language and Environment
for Statistical Computing. R Foundation for Statistical
47. Shu XO, Long J, Lu W et al. Novel genetic markers of breast
cancer survival identified by a genome-wide association
37. Therneau T. (2013) A Package for Survival Analysis in S.
study. Cancer Res 2012;72:1182–9.
¨stro
¨m S, Henriksson R et al. DNA-repair
48. Wibom C, Sjo
R package version 2.37–4. http://CRAN.R-project.org/
gene variants are associated with glioblastoma survival.
Computing: Vienna, Austria, 2009.
Acta Oncol 2012;51:325–32.
package=survival.
al.
49. Voorhuis M, Onland-Moret NC, van der Schouw YT et al.
Colloquium papers: natural selection in a contemporary
Human studies on genetics of the age at natural meno-
human population. Proc Natl Acad Sci 2010;107(Suppl. 1),
1787–92.
pause: a systematic review. Hum Reprod Update 2010;16:
364–77.
39. Gilmour A, Gogel B, Cullis B. ASReml user guide release 3.0.
50. Kosova G, Scott NM, Niederberger C et al. Genome-
38. Byars
SG,
Ewbank
D,
Govindaraju
DR
et
wide association study identifies candidate genes for
VSN International Ltd, 2002.
40. Morrissey MB, Wilson A. Pedantics: an R package for
pedigree-based
genetic
simulation
and
pedigree
manipulation, characterization and viewing. Molecular
Ecology Resources 2010;10:711–9.
41. Li Y, Abecasis GR. Mach 1.0: rapid haplotype reconstruction and missing genotype inference. Am J Hum Genet
2006;S79:2290.
male fertility traits in humans. Am J Hum Genet 2012;90:
950–61.
51. Adachi S, Tajima A, Quan J et al. Meta-analysis of genomewide association scans for genetic susceptibility to endometriosis in Japanese population. J Hum Genet 2010;55:
816–21.
52. Murray A, Bennett CE, Perry JRB et al. Common genetic
42. Thompson WA Jr. The problem of negative esti-
variants are significant risk factors for early menopause:
mates of variance components. Ann Math Stat 1962;33:
results from the Breakthrough Generations Study. Hum
273–89.
Mol Genet 2010;20:186–92.
43. Reinert T, Modin C, Castano FM et al. Comprehensive
53. Ewens KG, Jones MR, Ankener W et al. FTO and MC4R
genome methylation analysis in bladder cancer: identifi-
gene variants are associated with obesity in polycystic
cation and validation of novel methylated genes and
ovary syndrome. PLoS One 2011;6:e16390.
253
241
Evolution, Medicine, and Public Health [2013] pp. 241–253
doi:10.1093/emph/eot013
Genetic links between
post-reproductive lifespan
and family size in
Framingham
Xiaofei Wang*1, Sean G. Byars2 and Stephen C. Stearns3
1
Department of Statistics, Yale University, New Haven, CT 06520-8102, USA, 2Department of Biology, Copenhagen
University, Universitetsparken 15, 2100 Copenhagen, Denmark and 3Department of Ecology and Evolutionary Biology,
Yale University, New Haven, CT 06520-8102, USA
*Correspondence address. Department of Statistics, Yale University, New Haven, CT 06520-8102, USA. Tel: 203 432 0666;
Fax: 203 432 0633; E-mail: xiaofei.wang@yale.edu
Received 7 December 2012; revised version accepted 17 June 2013
ABSTRACT
Background and objectives: Is there a trade-off between children ever born (CEB) and post-reproductive
lifespan in humans? Here, we report a comprehensive analysis of reproductive trade-offs in the
Framingham Heart Study (FHS) dataset using phenotypic and genotypic correlations and a genomewide association study (GWAS) to look for single-nucleotide polymorphisms (SNPs) that are related to
the association between CEB and lifespan.
Methodology: We calculated the phenotypic and genetic correlations of lifespan with CEB for men and
women in the Framingham dataset, and then performed a GWAS to search for SNPs that might affect
the relationship between post-reproductive lifespan and CEB.
Results: We found significant negative phenotypic correlations between CEB and lifespan in both
women (rP = 0.133, P < 0.001) and men (rP = 0. 079, P = 0.036). The genetic correlation was large,
highly significant and strongly negative in women (rG = 0.877, P = 0.009) in a model without covariates,
but not in men (P = 0.777). The GWAS identified five SNPs associated with the relationship between
CEB and post-reproductive lifespan in women; some are near genes that have been linked to cancer.
None were identified in men.
Conclusions and implications: We identified several SNPs for which the relationship between CEB and
post-reproductive lifespan differs by genotype in women in the FHS who were born between 1889 and
1958. That result was not robust to changes in the sample. Further studies on larger samples are needed
to validate the antagonistic pleiotropy of these genes.
K E Y W O R D S : genome-wide association study; longevity; trade-off; family size
ß The Author(s) 2013. Published by Oxford University Press on behalf of the Foundation for Evolution, Medicine, and Public Health. This is an Open
Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits
unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
orig inal
research
article
242
| Wang et al.
Evolution, Medicine, and Public Health
BACKGROUND AND OBJECTIVES
Both the theory of life-history evolution and the
evolutionary theory of aging assume a trade-off between reproduction and survival: a cost of
reproduction paid in lifespan [1–4]. Although well
documented in model organisms, the existence of
this trade-off in humans has been controversial (e.g.
[5]). Negative [6–11], positive [12–17], U-shaped
[18–20] and mixed or insignificant [21–27] relationships between completed family size and lifespan
have all been found. Some results have been
criticized on statistical grounds; some authors
doubt that the trade-off exists at all (e.g. [28–32]).
Two papers suggest that the cost is only expressed in
women of low social class or nutritional status; a
similar effect has been found in model organisms
[5, 21, 27].
Although most of the attempts to measure the
trade-off in humans are based on phenotypic correlations, the standard of evidence for the existence of a
trade-off in evolutionary analyses of model organisms is a negative genetic correlation demonstrated
as a correlated response to selection (e.g. [5, 33]).
Such experiments reveal genetic relationships often
hidden by phenotypic plasticity. This standard cannot be met in humans, where experimental evolution
is not possible.
Two other types of genetic evidence, however, are
available in humans. First, genetic correlations can
be measured with pedigree analysis using methods
developed for animal breeding. Using such
¨gele et al. [34] found a significantly
methods, Go
‘positive’ genetic correlation between completed
family size and lifespan in a sample of more than
5100 men and women who lived between 1658 and
1907 in South Tyrol, Italy.
Second, genome-wide association studies
(GWAS) can be done on populations where both
the relevant traits and the single-nucleotide polymorphisms (SNPs) have been measured. In a
GWAS done on more than 3500 women from
Rotterdam, Kuningas et al. [35] found four chromosomal regions that influenced completed family
size; none of them appeared also to affect lifespan.
The aims of this analysis of men and women in the
Framingham Heart Study (FHS) were to add to the
genetic information on reproductive trade-offs in
humans by (i) first measuring the phenotypic correlation of lifespan with children ever born (CEB), (ii)
second estimating the genetic correlation of lifespan
with CEB and (iii) performing a GWAS to search for
SNPs with effects on the relationship of lifespan to
CEB. We found significantly negative phenotypic and
genetic correlations between post-reproductive lifespan and CEB in women. We also found five chromosomal regions mediating the trade-off that were
genome-wide significant in several statistical
models but not when we added smoking as a
covariate. Some of the genes in those five regions
are associated with increased risk of cancer.
METHODOLOGY
The Framingham Heart Study
Initiated in 1948 in the town of Framingham (MA),
the FHS includes three generations of participants
that continue to be measured. Beginning with 5209
men and women initially enrolled in the originalcohort, the study added 5124 offspring-cohort participants in 1971 that were mostly offspring of the
original-cohort. In 2002, a third-cohort was added
consisting of offspring of the second cohort.
Original-cohort participants have been examined
every 2 years (28 exams in total to date), the offspring-cohort every 4 years (eight exams in total).
Participants are mostly of European ancestry (20%
UK, 40% Ireland, 10% Italy and 10% Quebec). Data
were de-identified by the FHS. Data-use and human
subjects’ approval were obtained from the National
Institutes of Health (dbGaP) and the Yale
Institutional Review Board.
Phenotypic correlations
Our sample included men and women who were
born between the 1890s and the 1950s, except for
age at menarche where the available sample was
much smaller (i.e. 1923–56). Cox regression was
used to calculate risk of death depending on age
at first birth (nmen = 2579; nwomen = 2193), CEB
(nmen = 3833; nwomen = 3658), and age at menarche
(n = 1355) and menopause (n = 2415) in women. In
each regression, potentially confounding effects in
lifespan were controlled by including education,
country of origin and smoking status. To test for
potential nonlinear effects, a separate regression
was run with a quadratic term included for the main
predictor traits. If quadratic terms were significant,
this was explored further by examining the Cox
Wang et al. |
Genetic links in Framingham
regression model (from the survival library in R)
using penalized splines (with 4 df) [36, 37].
The Cox proportional hazards model is a standard
tool for survival analysis, in which the log of the
hazard function h(t) is assumed to be a linear combination of the covariates. Specifically, for a model
containing p covariates x1 , . . . ,xp ; the fitted model
takes the form of
hðtÞ ¼ hðt0 Þexpð1 x1+ +p xp Þ,
where i is the coefficient fit to covariate xi and
hðt0 Þ is the unknown baseline hazard function.
Equivalently, this equation can be expressed as
hðtÞ
ln
¼ 1 x1+ +p xp :
hðt0 Þ
Note that FHS reports CEB as a value from ‘0’ to
‘5’, where ‘5’ indicates having had five or more children. Several variables were pre-adjusted for age and
year measured. For body mass index (BMI), systolic
blood pressure (SBP) and total cholesterol, age and
year effects were removed by taking residuals of each
trait against age (measures between 20 and 60 years
old) and year measured using a generalized additive model (locally weighted scatterplot smoothing,
LOESS). All residuals for a subject were then
averaged to obtain an average residual for each trait,
which were then used for modelling. As
demonstrated previously, the surface of the
generalized additive model can be accurately
estimated due to the large number of trait measurements [38].
Our initial sample included 4123 women for
whom data on age at death, CEB, education level,
smoking history, estrogen use and BMI were available. We then removed 941 women who were born in
or after 1941, a period when the correlation between
lifespan and CEB was weaker, possibly because of
the improvement of health care after World War II.
We did so because to have a chance of detecting any
significantly correlated SNPs in the GWAS, we
needed to focus on a period where the phenotypic
correlation is relatively strong. Nineteen women
who died before the age of 50 years were also
excluded, because their CEB records might represent incomplete observations. Because we excluded
women who died before the age of 50 years, we are
specifically studying the relationship of CEB to postreproductive mortality. Of the remaining 3163
women, keeping only those who had genotype data
reduced our sample size to 1810. We required this
sample to have associated genotype data because
we later used the same sample for the GWAS. Note
that our phenotypic analysis used the year 1919 as a
cut-off because the yearly ratio of individuals alive to
individuals deceased increased to about 50% in
1919, and continued to rise thereafter.
For illustrative purposes, we also ran a multiple
linear regression on a smaller sample for women,
including only the deceased subjects who were born
prior to 1919 (n = 680) out of a total of 1810 who
satisfied specific criteria outlined above.
We similarly ran a regression model on a smaller
sample of men who have died (n = 712) out of a total
of 1474 men satisfying similar criteria.
Genetic correlations and heritabilities
We estimated heritabilities and genetic correlations
for traits from pedigrees using a mixed effects
restricted maximum likelihood (REML) model in
ASReml version 3.0 [39]. We considered models in
which there were no covariates as well as adjusted
models where phenotypic variation was partitioned
into additive genetic, residual variance and a single
random effect (maternal ID, paternal ID or education level). To be consistent with the phenotypic correlation models, we also considered models in
which fixed effects (smoking status and country of
origin) and both random effects for maternal ID and
education level were included. Sex was not included
as a fixed effect as male and female estimates were
obtained separately. Smoking status (0/1, nonsmoker/smoker) and country of origin (0/1, US
born/foreign born) were coded as binary variables.
Education described number of years completed,
with missing values coded as 8 years (the minimum). Maternal variance components ranged from
0.0 (age at first birth) to 0.12 ± 0.04 (lifespan) and 0.0
(age at first birth) to 0.20 ± 0.03 (lifespan) for female
and male analyses, respectively. Education variance
components ranged from 0.0 (age at menarche) to
0.06 ± 0.03 (CEB) and 0.0 (age at first birth) to
0.014 ± 0.009 (CEB) for female and male analyses,
respectively. The Framingham pedigree totals
15 877 individuals in 1538 pedigrees consisting of
both immediate and extended family. Heritability
estimates were tested for significance with likelihood ratios that compared full models with reduced
ones (i.e. 21DF = 2 (LogLFULL LogLREDUCED))
lacking the additive genetic component. Genetic
correlations were also tested for significance by
243
244
| Wang et al.
Evolution, Medicine, and Public Health
comparing likelihood values from full models to
ones where the genetic covariance was fixed at zero.
Our genetic correlation analysis between CEB and
lifespan included a total of 5133 females for whom
age at death and CEB information were available.
Supplementary Fig. S4 summarizes the pedigree information for these women, grouped by cohort via
the ‘pedantics’ package in R [40]. Pedigree depths
(computed using the same package) for the
Framingham dataset range from 0 to 4, with mean
1.02 (±1.06). On average, each woman had 2.38
(±1.59) children in her lifetime and lived 77.21
(±12.73) years. The average level of education in
years was 11.66. The average age at menarche was
12.81 (±1.54), average age at first birth was 26.49
(±4.81) and average age at menopause was 49.20
(±4.10).
Genome-wide association study
Our association results are based on 444 205 SNPs
from the 500 K and 50 K Affymetrix samples that
satisfied the following criteria: call rate >90%,
Hardy–Weinberg equilibrium P-value >0.00001,
Mendel error rate <2% and minor allele frequency
>0.01. These SNP selection criteria are further discussed in the Supplementary Information.
We used Cox proportional hazards models, as
done in the phenotypic correlation analysis, to estimate the interactions between survival time past age
50 years, CEB and genotype. For censored individuals, we used their times of last observation past age
50 years as their censoring time.
Several models were run under this setup, which
we number to emphasize that they are nested
models. Model 1 did not adjust for any covariates.
We then added covariates to reduce confounding by
variables that may be correlated with lifespan and
CEB. Model 2 used education level. Model 3 further
added BMI, estrogen use and cohort as covariates.
Models 4a–d were intermediate steps in which one
of the four additional covariates was added: blood
pressure treatment indicator (Model 4a), total cholesterol (Model 4b), SBP (Model 4c) and smoking
indicator (Model 4d). Model 5 included all four of
these additional covariates. Models 4a–d were run
retrospectively to pinpoint which covariate, when
added, resulted in removing significance from all
SNPs. A summary of the models fitted can be found
in the Supplementary Information.
Both genotypes and CEB were included as continuous variables to model an additive effect of the
minor allele. We used both the raw genotypes
provided by FHS as well as an imputed dataset.
The imputation was done in several stages. First,
we incorporated values imputed by MACH that were
included in the FHS dataset. The MACH algorithm
imputes missing genotypes based on shared haplotype stretches between subjects and HapMap data
[41]. Of the remaining missing values, we sampled
among the possible genotypes given the genotypes
of parents, when parent genotypes were available.
Any remaining missing values were simply sampled
according to genotype proportions of the entire
group. This sequence of operations created a full
set of genotypes that had no missing values.
Cohort was defined as a categorical variable
computed from the year of birth: born before or in
1917 and born in or after 1918.
In addition to running the above five models on
the full sample of 1810, we tested our models for
robustness by mimicking an out-of-sample analysis.
To that end, we randomly divided our sample into
two equal parts and fitted Models 1–5 to each part
separately to check for consistency in significance of
the top performing SNPs. A true out-of-sample performance check would include the calculation of prediction error based on a model fitted on a training
set. Our method does not aim to validate prediction
out of sample, but rather to ensure that a SNP discovered to be significant in one sample ought to be
significant in another sample—a less stringent, but
still important requirement of consistency. To minimize the effects of missing genotypes on each subsample, which would further lower our sample size
in each of the two separate runs, we only used the
imputed genotypes for this portion of our analysis.
The downside of using imputed genotypes is the risk
of imputation error. To verify that our risk of imputation error is low, we used the imputed SNP data to
repeat our full-sample analyses for Models 1–5. Our
aim was to show that our results for these models
are similar, regardless of whether we used imputed
or raw SNP data.
To explore possible non-additive genotypic
effects, we ran a separate Model 6 that used genotype as a categorical variable. The covariates used in
Model 6 are identical to those used in Model 3, and
any SNPs for which the homozygous minor genotype had fewer than 20 counts were excluded. We
did not apply the half-sample testing to Model 6,
because in many cases, the genotype counts in the
homozygous minor allele category were too small to
Wang et al. |
Genetic links in Framingham
further subdivide the group for categorical
modelling.
Finally, we ran two additional models that are outside of the nested framework given above on the raw
data only (and therefore, they are not numbered).
A quadratic model was run to search for a possible
nonlinear effect by adding a quadratic CEB term
along with its interaction with genotype to
Model 1. The ‘matching covariates’ model was run
to provide a frame of reference to the reader; this
model uses exactly the same covariates that were
included in the phenotypic and genotypic correlation
analyses—education, smoking indicator and country of origin.
RESULTS
Phenotypic correlations
In the Cox regression analysis where as many men
and women were included as possible (birth-year
range 1889–1958), censoring was used to account
for those who were still alive according to the latest
medical records. Risk of mortality beyond age 50
years increased if women (adjusted incidence rate
ratio (RR) = 1.045, P = 0.030) had more children
(Table 1). When a nonlinear term for CEB was
included, it significantly improved the model fit
and became more significant than the linear term.
245
Penalized splines for unadjusted mortality risk
(Fig. 1) support a predominantly U-shaped pattern
for the association between CEB and lifespan, similar to that found in some other studies (e.g. [19]).
This is consistent with a cost of reproduction that is
experienced by women with three or more children
and with a benefit of reproduction to those who have
one or two children. Highest mortality risk occurred
in women with no children or more than three to four
children, with lowest risk for those with approximately two. Mortality risk decreased if the first child
was born later (women, unadjusted RR = 0.971,
P < 0.001; men, adjusted RR = 0.985, P = 0.011; see
Supplementary Fig. S1), but the significance of this
effect depended on whether estimates were adjusted
or not (Table 1). Mortality risk was also reduced if
menopause occurred later in women (unadjusted
RR = 0.970, P = 0.003), although this effect disappeared when other effects were controlled for
(Table 1). Full model results can be seen in
Supplementary Table S1.
In the analysis where only the 680 women were
included in the range of birth years 1889–1918 in
which all had died, the phenotypic correlation between CEB and lifespan was highly significant and
negative (r = 0.133, P = 0.0005; Fig. 2). Linear regression indicated that every additional child cost
0.74 years of lifespan (standard error (SE) = 0.21
years). There was, however, significant variation in
Table 1. Incidence RR (±95% confidence interval) for age at death due to stroke, heart attack or cancer
(beyond age 50 years)
Trait
Women
Men
Unadjusted
Adjusted
Unadjusted
Adjusted
CEB
1.050*
(1.011–1.092)NL**
n = 3729
1.045*
(1.005–1.087)NL***
0.995
(0.960–1.033)
n = 3888
1.031
(0.993–1.071)
Age first birth
0.971***
(0.955–0.988)NL**
n = 2236
0.977*
(0.960–0.994)NL*
0.990
(0.979–1.001)
n = 2613
0.985**
(0.974–0.995)
Menarche
0.891
(0.757–1.050)
n = 1367
0.917
(0.782–1.077)
Menopause
0.970**
(0.951–0.990)
n = 2461
0.984
(0.965–1.005)
Unadjusted Cox regression estimates included only the main predictor trait. Cultural effects (smoking, education and country-of-origin) were accounted
for in adjusted estimates. ‘NL’ indicates that a significant nonlinear effect was also detected for the association between this trait and longevity.
*P < 0.05, **P < 0.01, ***P < 0.001.
246
| Wang et al.
Evolution, Medicine, and Public Health
Figure 1. Summary of CEB and mortality risk in Framingham
women. A histogram of CEB and log-relative mortality risk
values for each CEB value with 95% confidence bands
Figure 3. Correlation between CEB and lifespan by birth year
(n = 5133)
for women. Women (n = 680) were grouped by overlapping
10-year intervals of birth year, and the correlation between
CEB and lifespan was computed for each group. Individual
points indicate the sample size of each 10-year group, with
the mean birth year plotted on the x-axis and correlation
plotted on the y-axis
phenotypic correlations are dependent on birth
year is consistent with previous findings that
selection pressures changed over time in
Framingham [38].
Heritabilities and genetic correlations
Figure 2. Relationship between CEB and lifespan for women.
Scatterplot illustrating correlation between CEB and lifespan
(r = 0.133, P < 0.001) (n = 680). Both variables have been
jittered to minimize overlap of points
the phenotypic correlation by birth year (Fig. 3); it
was positive (with one exception) from 1893 to 1907
and negative from 1908 to 1913. Many in the earlier
group were giving birth before the Great Depression
and World War II. Some of the latter group encountered those two major environmental perturbations.
The correlation between CEB and lifespan for the 712
men was slightly negative (r = 0.079, P = 0.0355;
Supplementary Fig. S2). An additional child cost
0.54 years of male lifespan (SE = 0.26 years).
Again, the correlation varied by birth year, but the
variations were less pronounced than for females
(Supplementary Fig. S3). The observation that
In women (Table 2), the heritabilities of most major
life-history traits differed significantly from zero,
including age at death (h2 = 0.12, P = 0.01), CEB
(h2 = 0.09, P = 0.03), age at first birth (h2 = 0.18,
P < 0.001) and menopause (h2 = 0.44, P < 0.001).
In women, the genetic correlation of CEB with age
at death was large, negative and significant
(rG = 0.88, P = 0.01) in a model without covariates
(Supplementary Table S2). When we included education as a random effect, the genetic correlation
decreased to 0.70 but was still significant
(P = 0.02). When we included either the mother or
the father identifiers in place of education as a random effect, the genetic covariance remained large
and negative, but was no longer significant (mother:
rG = 1.58, P = 0.11; father: rG = 1.46, P = 0.15). The
model in which we adjusted for education, smoking
status and country of origin also produced a large
negative genetic correlation, but the correlation was
not significant (rG = 0.69, P = 0.14).
The correlation between the quadratic term CEB2
and lifespan was large, negative and significant in
Wang et al. |
Genetic links in Framingham
247
Table 2. Heritabilities (h2, on the diagonal) and genetic correlations (rG, off the diagonal) of life history
traits (±SE)
Women
Age at death
Age at death
CEB
Age first birth
Menarche
Menopause
0.12 ± 0.08
P = 0.0176
n = 3010
0.69 ± 0.52
P = 0.1420
0.20 ± 0.25
P = 0.2083
0.07 ± 0.23
P = 0.3886
0.15 ± 0.17
P = 0.1917
0.09 ± 0.05
P = 0.0394
n = 4123
0.40 ± 0.35
P = 0.1545
0.31 ± 0.24
P < 0.0001
0.21 ± 0.21
P = 0.1377
0.18 ± 0.06
P = 0.0008
n = 2912
0.38 ± 0.33
P = 0.0911
0.06 ± 0.14
P = 0.3541
0.16 ± 0.13
P = 0.0948
n = 1638
0.10 ± 0.21
P = 0.3121
CEB
Age first birth
Menarche
Menopause
Men
Age at death
CEB
0.44 ± 0.06
P < 0.0001
n = 3400
<0.01 ± <0.01
P = 0.8875
n = 2963
<0.01 ± <0.01
P = 0.7773
<0.01 ± <0.01
P = 0.6101
<0.01 ± <0.01
P = 0.5485
n = 4051
<0.01 ± <0.01
P = 0.3884
Age first birth
0.12 ± 0.07
P = 0.0300
n = 2688
SEs and P-values were obtained from maximum-likelihood estimates. Cultural (smoking, education and country-of-origin) and maternal effects were
accounted for in all estimates. P-values < 0.05 are in bold.
three of four models (no covariates: rG = 1.09,
P = 0.003, only mother identifier as random effect:
rG = 1.73, P = 0.04, only education as random effect: rG = 0.85, P = 0.01), and borderline non-significant in the model with only the father identifier
(rG = 1.61, P = 0.06).
Furthermore, we looked to see if the genetic correlation between CEB and lifespan was robust to
pedigree depth in the simplest model where no
covariates were included. Including only those
women with pedigree depth of 1 or higher (n = 2540),
we got rG = 0.46 (P = 0.14) and including only those
women with pedigree depth of 2 or higher (n = 948),
we got rG = 0.21 (P = 0.60); both correlations were
no longer significant in the reduced samples.
The genetic correlation of CEB with age at menarche was relatively large, positive and highly significant (rG = 0.31, P < 0.001). In men (Table 2), the
heritability of age at first birth (inferred from their
spouses) was small and only just significant
(h2 = 0.12, P = 0.03). All other male heritability and
genetic correlation estimates were non-significant.
Full model results for heritability can be seen in
Supplementary Table S2.
Genome-wide association study
GWAS results are summarized in Tables 3–10; the
birth years for the 1810 women included in
the GWAS are shown in the Supplementary
Information. We deemed a SNP to be genome-wide
significant if its interaction coefficient with CEB had
a P-value that was less than a Bonferroni-adjusted
threshold of 1.13 107 ( = 0.05), unless otherwise
indicated. For females, we found two SNPs that attained genome-wide significance using the full
248
| Wang et al.
Evolution, Medicine, and Public Health
Table 3. GWAS for SNPs that affect the relationship between CEB and lifespan: summary of significant
SNPs in Models 1–3 and 5 (full sample)
Ssid
Rsid
Chr
Position
P-values (genotype CEB)
Near
Model 1
ss66450977 rs6768456 3
ss66475987 rs2575533 4
Model 2
Model 3
Model 4
Model 5
Matching
covariates
7.99E07 4.93E08a
27867272 EOMES 4.03E10a 4.38E10a 8.40E09a
(see Table 4)
42432336 ATP8A1 8.02E08a 5.30E08a 3.06E06
2.49E05 2.11E07
n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3.
a
SNP attained genome-wide significance.
Table 4. GWAS for SNPs that affect the relationship between CEB and lifespan: summary of significant
SNPs in Models 4 (full sample)
Ssid
Rsid
Chr
Position
P-values (genotype CEB)
Near
Model 4a
ss66450977
ss66475987
rs6768456
rs2575533
3
4
27867272
42432336
EOMES
ATP8A1
1.40E09
1.02E05
Model 4b
a
a
7.44E09
3.56E06
Model 4c
8.65E09
5.23E06
Model 4d
a
4.02E07
1.35E05
n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3.
a
SNP attained genome-wide significance.
Table 5. GWAS for SNPs that affect the relationship between CEB and lifespan: summary of nominally
significant SNPs in Model 6
Ssid
Rsid
Chr
Position
Near
P-value
Aa CEB
P-value
aa CEB
Homozygous minor
genotype count
ss66450977
ss66500131
ss66392234
ss66495977
rs6768456
rs1777023
rs7132724
rs2180957
3
9
12
14
27867272
92008266
65001044
68238574
EOMES
OR7E31P
HELB
RAD51B
1.00E07
1.00E01
1.30E01
1.20E01
2.40E03
3.00E07
9.60E08
8.70E07
21
26
102
21
n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3.
Table 6. GWAS for SNPs that affect the relationship between CEB and lifespan: re-evaluating significant
SNPs in Models 1–3 and 5 (split samples)
Ssid
ss66450977
ss66475987
Sample half 1
Sample half 2
P-values (genotype CEB)
P-values (genotype CEB)
Model 1
Model 2
Model 3
Model 5
Model 1
Model 2
Model 3
Model 5
0.00032
0.0002
0.00041
0.00012
0.00097
0.0021
0.007
0.001
9.39E08a
5.46E04
7.04E08a
4.46E04
1.36E06
1.56E03
4.58E06
1.39E02
n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3.
a
SNP attained genome-wide significance.
Wang et al. |
Genetic links in Framingham
249
Table 7. GWAS for SNPs that affect the relationship between CEB and lifespan: re-evaluating significant
SNPs in Models 4a–d (split samples)
Ssid
ss66450977
ss66475987
Sample half 1
Sample half 2
P-values (genotype CEB)
P-values (genotype CEB)
Model 4a
Model 4b
Model 4c
Model 4d
Model 4a
Model 4b
Model 4c
Model 4d
8.40E04
3.00E03
1.30E03
9.40E04
8.00E04
1.80E03
7.40E03
3.70E03
3.33E06
2.00E03
1.19E06
2.30E03
1.32E06
3.80E03
3.35E06
3.40E03
n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3.
Table 8. GWAS for SNPs that affect the relationship between CEB and lifespan: top SNPs in Model 5
(split sample)
Ssid
Rsid
ss66092635
ss66508254
ss66392234
ss66328248
ss66531142
ss74823403
ss66231005
ss66273879
ss66526690
ss66490007
Chr
rs6581676
rs2961258
rs7132724
rs13248967
rs11219832
rs7860830
rs10899741
rs1728810
rs1602160
rs11009744
P-values (genotype CEB)
Position
12
7
12
8
11
9
7
3
6
10
64992353
15150223
65001044
114920075
124272500
26882137
52215028
10992443
94277193
34675601
Sample 1
Sample 2
9.12E06
1.41E05
1.82E05
2.81E05
3.65E05
3.27E01
4.62E01
4.15E01
9.00E01
9.86E01
4.58E01
7.86E01
4.86E01
6.86E01
1.79E01
7.19E10a
9.84E08a
1.07E07a
1.57E07
2.37E07
n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3.
a
SNP attained genome-wide significance.
Table 9. GWAS for SNPs that affect the relationship between CEB and lifespan: summary of significant
SNPs in Models 1–3 and 5 (full sample) (imputed SNPs)
Ssid
ss66450977
ss66475987
Rsid
rs6768456
rs2575533
Chr
3
4
Position
27867272
42432336
P-values (genotype CEB)
Near
EOMES
ATP8A1
Model 1
Model 2
Model 3
Model 4
Model 5
2.91E10a
1.50E07
2.20E10a
6.57E08a
6.44E09a
5.03E06
(see Table 10)
5.56E07
2.94E05
n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3.
a
SNP attained genome-wide significance.
sample: ss66450977 on Chromosome 3 (close to
EOMES) and ss66475987 on Chromosome 4 (close
to ATP8A1). Their levels of significance decreased as
additional covariates were included in the model;
however, these SNPs were also significant in the
matching covariates model (Tables 3 and 4). We
also found two nominally significant SNPs that
exhibited possibly non-additive effects: ss66392234
250
| Wang et al.
Evolution, Medicine, and Public Health
Table 10. GWAS for SNPs that affect the relationship between CEB and lifespan: summary of
significant SNPs in Model 4 (full sample) (imputed SNPs)
Ssid
Rsid
Chr
Position
P-values (genotype CEB)
Near
Model 4a
ss66450977
ss66475987
rs6768456
rs2575533
3
4
27867272
42432336
EOMES
ATP8A1
1.40E08
1.02E05
Model 4b
a
a
6.30E09
5.80E06
Model 4c
4.30E09
5.40E06
Model 4d
a
3.87E07
2.30E05
n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3.
a
SNP attained genome-wide significance.
on Chromosome 12 (in HELB) and ss66500131
on Chromosome 9 (close to the pseudogene
OR7E31P) (Table 5). Nearby genes/pseudogenes
were determined based on a radius of 150 kb from
each SNP.
In the split-sample analysis using imputed SNP
data (see ‘Methodology’ section regarding details
on imputation), no SNPs were found to be significant for females (Tables 6–8), even when the randomization used in the split-sample assignment
was replicated 100 times. We verified that using
the imputed data for the full-sample analysis would
have yielded comparable levels of significance for
the two SNPs previously discovered in Models 1–5
(Tables 9 and 10).
No significant SNPs were detected for males in
Models 1–3. As in the GWAS for females, the addition of more covariates decreased levels of significance, and therefore no further models were run.
No significant SNPs were detected in a model that
included a quadratic effect of CEB. Further details on
the GWAS for females are in the Supplementary
Information.
CONCLUSIONS AND IMPLICATIONS
Phenotypic and genetic correlations
The phenotypic correlation between CEB and
lifespan in women differed with birth year,
demonstrating the importance of phenotypic plasticity on the relationships among life-history traits.
Secular cultural and environmental changes affect
that correlation and probably account for much of
the variation among studies [6, 15, 19, 21, 22]. The
estimate of a negative genetic correlation in women
when not accounting for covariates (rG = 0.88) was
large. The effects of shared environment reduced the
strength of the linear correlation and increased the
strength of the quadratic correlation, and education
mimicked the effects of a cost of reproduction in that
increased level of education was associated with
both fewer children and longer life: including education decreased the estimate of the genetic
correlation.
Some of our genetic correlation estimates were
below 1. This indicates that the estimated variance
component is negative, known to be a possible result of REML estimation [42].
When we controlled for the effects of smoking,
education, country of origin and maternal effects,
the correlation was still negative (rG = 0.69) yet
no longer significant. This mirrors the pattern we
observed in the GWAS; as covariates were
introduced into the model, associations became
insignificant.
The mean pedigree depth of 1.02 implies that our
pedigree is dominated by parent–offspring relationships. This may result in some difficulty distinguishing
parental, environmental and additive genetic effects.
For example, cultural and lifestyle habits that are
unique to nuclear families (such as diet) are known
to affect lifespan, but these habits are not recorded,
and therefore the genetic correlations that we see may
be confounded by these unobservable factors.
One can only find a genetic correlation when the
phenotypic correlation is significant, and one can
only find significant effects of SNPs on a phenotypic
correlation when it differs from zero. Our chain of
inference thus depends on genetic effects not being
too masked by phenotypic plasticity.
Gene functions
We found several SNPs with nominally significant
effects on the correlation of CEB with post-reproductive lifespan; two of them are near EOMES and
RAD51B, genes that are related to cancer when
under-expressed. The effect of the SNP close to
EOMES reached genome-wide significance. The
Wang et al. |
Genetic links in Framingham
EOMES gene has been associated with multiple
sclerosis and bladder cancer [43, 44]. RAD51B, a
gene involved in encoding proteins that participate
in DNA repair, has been linked to breast cancer and
brain cancer [45–48]. Further details on the genes in
proximity to the SNPs found significant in our GWAS
are included in the Supplementary Information.
Although these SNPs were close in physical distance
to their respective genes (<130 kb), further study of
linkage disequilibrium would help to understand
their possible association.
Other studies
Voorhuis et al. [49] collated the results of many genetic studies of age at natural menopause. None of
the SNPs that we discovered were found in the
studies included in their summary.
Several other recent genetic studies relate fertility
to genotype. Kosova et al. [50] found 41 SNPs
(P < 104) that were associated with decreased male
fertility. Adachi et al. [51] found 36 SNPs (P < 104)
with possible links to endometriosis in Japanese females. Both were GWAS studies that did not find any
genome-wide significant SNPs. Murray et al. [52] reported confirmations for four SNPs previously
identified as associated with age at menopause.
Ewens et al. [53] examined 15 SNPs linked with obesity to evaluate possible associations with polycystic
ovary syndrome, the cause of a form of infertility in
women; only one SNP had a nominal level of significance, and the significance did not hold up in another case–control study. Our methods differ
fundamentally from these four studies in that we
considered lifespan in conjunction with fertility,
and the significant SNPs we found were not reported
in their analyses [50–53].
Although the Kuningas Rotterdam study
incorporated mortality in its analysis and was therefore more similar to our study [35], it differs from our
approach in three ways: (i) our analysis included
many more SNPs (444 205 versus their 1664), (ii)
we adjusted for the effects of several direct mortality-affecting covariates such as smoking and SBP,
(iii) Kuningas used an initial screening of the 1664
SNPs with a set-based test (with a threshold of
P < 0.05), whereas we started with a GWAS across
444 205 SNPs in models that relate each SNP to
both CEB and lifespan (with a threshold of
P < 1.13 107). We did not find Bonferroni-level
significance with SNPs near the four gene regions
identified in [35].
Summary
We have analysed phenotypic and genetic correlations between reproductive success and survival
and have identified a small set of genes that may
mediate a trade-off between them. This warrants further studies in other samples.
The Framingham dataset has some shortcomings. In particular, women born before the start
of the study would only have been included
in the study if they survived until 1948–52
(when the study began). Therefore, our dataset
does not include anyone who died during
World War I, the 1918 flu pandemic, the
Great Depression and World War II. If these catastrophic events affected women differently depending on their fertility and lifespan, then excluding
these women from our analysis would bias our
results. The issue is inherent in such observational
studies of humans, and unfortunately cannot be
avoided.
We failed to find any significant SNPs when
covariates (i.e. smoking, country of origin and
average cholesterol levels) were included and
when we did a rough check for consistency out
of sample. It is unknown how often such checks
modify significance of SNP associations, for many
other published GWAS studies do not account for
the effects of covariates or do out-of-sample
predictions.
AUTHOR CONTRIBUTIONS
S.G.B. and X.W. jointly worked on processing and
cleaning the data and phenotypic correlation calculations. S.G.B. further calculated the genetic correlations and heritabilities. X.W. performed the GWAS.
S.C.S. conceived of the study and drafted the initial
manuscript. All authors contributed to the final
manuscript.
supplementary data
Supplementary data is available at EMPH online.
acknowledgements
The authors thank Drs John W. Emerson and Andrew
Pakstis for their feedback and insight on the project and
the three anonymous reviewers for their constructive
feedback.
251
252
| Wang et al.
Evolution, Medicine, and Public Health
funding
16. Muller HG, Chiou JM, Carey JR et al. Fertility and life span:
late children enhance female longevity. J Gerontol Ser A
The study was supported by the Yale University and the Marie
Curie International Incoming Fellowship FP7-PEOPLE-2010IIF-276565.
Conflict of interest: None declared.
Biol Sci Med Sci 2002;57:B202–6.
17. Sear R. The impact of reproduction on Gambian
women: does controlling for phenotypic quality reveal
costs of reproduction? Am J Phys Anthropol 2007;132:
632–41.
18. Lawlor DA, Emberson JR, Ebrahim S et al. Is the
references
association between parity and coronary heart disease
due to biological effects of pregnancy or adverse lifestyle
1. Williams G. Pleiotropy, natural selection, and the evolution of senescence. Evolution 1957;11:398–411.
2. Williams G. Natural selection, costs of reproduction,
and a refinement of Lack’s principle. Am Nat 1966;100:
687–90.
3. Roff D. The Evolution of Life Histories. New York: Chapman
and Hall, 1992.
4. Stearns SC. The Evolution of Life Histories. Oxford
University Press: Oxford, 1992, 249.
5. Stearns SC, Partridge L (2001) The genetics of aging in
Drosophila. In: Masoro EJ, Austad SN (eds), Handbook
of the Bioloy of Aging, 5th edn. 2001, 353–68.
6. Doblhammer G, Oeppen J. Reproduction and longevity
among the British peerage: the effect of frailty and health
selection. Proc Biol Sci 2003;270:1541–7.
risk factors associated with child-rearing? Findings
from the British Women’ Heart and Health Study and
the British Regional Heart Study. Circulation 2003;107:
1260–4.
19. Lund EE, Arnesen EE, Borgan JKJ. Pattern of childbearing
and mortality in married women—a national prospective
study from Norway. J Epidemiol Commun Health 1990;44:
237–40.
20. Manor OO, Eisenbach ZZ, Israeli AA et al. Mortality differentials among women: the Israel Longitudinal Mortality
Study. Soc Sci Med (1967) 2000;51:1175–88.
21. Dribe M. Long-term effects of childbearing on mortality:
evidence from pre-industrial Sweden. Popul Stud 2004;58:
297–310.
22. Gavrilova N, Gavrilov L, Semyonova VG et al. Does excep
7. Gagnon A, Smith KR, Tremblay M et al. Is there a trade-
tional human longevity come with a high cost of infertility?
off between fertility and longevity? A comparative
Testing the evolutionary theories of aging. Ann N Y Acad
study of women from three large historical databases
accounting for mortality selection. Am J Hum Biol 2009;
21:533–40.
8. Maklakov AA. Sex difference in life span affected by female
birth rate in modern humans. Evol Hum Behav 2008;29:
444–9.
9. Tabatabaie V, Atzmon G, Rajpathak SN et al. Exceptional
longevity is associated with decreased reproduction. Aging
2011;3:1202–5.
10. Thomas F, Teriokhin A, Renaud F et al. Human longevity at
the cost of reproductive success: evidence from global
data. J Evol Biol 2000;13:409–14.
11. Westendorp R, Kirkwood T. Human longevity at the cost of
reproductive success. Nature 1998;396:743–6.
12. Fuster V. Widowhood, illegitimacy, marital reproduction
and female longevity in a rural Spanish population. Homo
2011;62:500–9.
13. Helle S, Lummaa V, Jokela J. Are reproductive and somatic
senescence coupled in humans? Late, but not early,
Sci 2004;1019:513–7.
¨¨
23. Helle S, Ka
ar P, Jokela J. Human longevity and early reproduction in pre-industrial Sami populations. J Evol Biol
2002;15:803–7.
24. Jacobsen BK, Knutsen SF, Oda K et al. Parity and total,
ischemic heart disease and stroke mortality. The
Adventist Health Study, 1976–1988. Eur J Epidemiol
2011;26:711–8.
25. Jasienska G, Nenko I, Jasienski M. Daughters increase
longevity of fathers, but daughters and sons equally
reduce longevity of mothers. Am J Hum Biol 2006;18:
422–5.
26. Korpelainen H. Fitness, reproduction and longevity
among
European
aristocratic
and
rural
Finnish
families in the 1700s and 1800s. Proc Biol Sci 2000;267:
1765–70.
27. Lycett JE, Dunbar RIM, Voland E. Longevity and the costs
of reproduction in a historical human population. Proc Biol
Sci 2000;267:31–5.
reproduction correlated with longevity in historical sami
28. Cesarini D, Lindqvist E, Wallace B. Maternal longevity and
women. Proc Biol Sci 2005;272:29–37.
14. Le Bourg E, Thon B, Le´gare´ J et al. Reproductive life of
the sex of offspring in pre-industrial Sweden. Ann Hum
French–Canadians in the 17–18th centuries: a search for
29. Gavrilov L, Gavrilova N. Is there a reproductive cost for
a trade-off between early fecundity and longevity. Exp
Gerontol 1993;28:217–32.
15. McArdle P, Pollin T, O’Connell J et al. Does having children
Biol 2007;34:535–46.
human longevity? J Anti-Aging Med 1999;2:121–3.
30. Jasienska G. Reproduction and lifespan: trade-offs,
overall energy budgets, intergenerational costs, and
extend life span? A genealogical study of parity and lon-
costs neglected by research. Am J Hum Biol 2009;21:524–32.
gevity in the Amish. J Gerontol Ser A Biol Sci Med Sci 2006;
31. Mitteldorf J. Female fertility and longevity. Age 2010;32:
61:190–5.
79–84.
Wang et al. |
Genetic links in Framingham
32. Le Bourg E. Does reproduction decrease longevity in
human beings? Ageing Res Rev 2007;6:141–9.
application of these as urinary tumor markers. Clin
Cancer Res 2011;17:5582–92.
33. Stearns S, Ackermann M, Doebeli M et al. Experimental
44. Patsopoulos NA, de Bakker PIW. Genome-wide meta-
evolution of aging, growth, and reproduction in fruitflies.
analysis identifies novel multiple sclerosis susceptibility
Proc Natl Acad Sci USA 2000;97:3309–13.
¨gele M, Pattaro C, Fuchsberger C et al. Heritability
34. Go
analysis of life span in a semi-isolated population followed
45. Liu Y, Shete S, Wang LE et al. Gamma-radiation sensitivity
and polymorphisms in RAD51L1 modulate glioma risk.
across four centuries reveals the presence of pleiotropy
between life span and reproduction. J Gerontol Ser A Biol
Sci Med Sci 2011;66:26–37.
loci. Ann Neurol 2011;70:897–912.
Carcinogenesis 2010;31:1762–9.
46. Figueroa JD, Garcia-Closas M, Humphreys M et al.
Associations of common variants at 1p11. 2 and 14q24. 1
¨e S, Uitterlinden AG et al. The relation35. Kuningas M, Altma
(RAD51L1) with breast cancer risk and heterogeneity
ship between fertility and lifespan in humans. Age 2011;33:
by tumor subtype: findings from the Breast Cancer
Association Consortium. Hum Mol Genet 2011;20:4693–706.
615–22.
36. R Core Team. (2009) R: A Language and Environment
for Statistical Computing. R Foundation for Statistical
47. Shu XO, Long J, Lu W et al. Novel genetic markers of breast
cancer survival identified by a genome-wide association
37. Therneau T. (2013) A Package for Survival Analysis in S.
study. Cancer Res 2012;72:1182–9.
¨stro
¨m S, Henriksson R et al. DNA-repair
48. Wibom C, Sjo
R package version 2.37–4. http://CRAN.R-project.org/
gene variants are associated with glioblastoma survival.
Computing: Vienna, Austria, 2009.
Acta Oncol 2012;51:325–32.
package=survival.
al.
49. Voorhuis M, Onland-Moret NC, van der Schouw YT et al.
Colloquium papers: natural selection in a contemporary
Human studies on genetics of the age at natural meno-
human population. Proc Natl Acad Sci 2010;107(Suppl. 1),
1787–92.
pause: a systematic review. Hum Reprod Update 2010;16:
364–77.
39. Gilmour A, Gogel B, Cullis B. ASReml user guide release 3.0.
50. Kosova G, Scott NM, Niederberger C et al. Genome-
38. Byars
SG,
Ewbank
D,
Govindaraju
DR
et
wide association study identifies candidate genes for
VSN International Ltd, 2002.
40. Morrissey MB, Wilson A. Pedantics: an R package for
pedigree-based
genetic
simulation
and
pedigree
manipulation, characterization and viewing. Molecular
Ecology Resources 2010;10:711–9.
41. Li Y, Abecasis GR. Mach 1.0: rapid haplotype reconstruction and missing genotype inference. Am J Hum Genet
2006;S79:2290.
male fertility traits in humans. Am J Hum Genet 2012;90:
950–61.
51. Adachi S, Tajima A, Quan J et al. Meta-analysis of genomewide association scans for genetic susceptibility to endometriosis in Japanese population. J Hum Genet 2010;55:
816–21.
52. Murray A, Bennett CE, Perry JRB et al. Common genetic
42. Thompson WA Jr. The problem of negative esti-
variants are significant risk factors for early menopause:
mates of variance components. Ann Math Stat 1962;33:
results from the Breakthrough Generations Study. Hum
273–89.
Mol Genet 2010;20:186–92.
43. Reinert T, Modin C, Castano FM et al. Comprehensive
53. Ewens KG, Jones MR, Ankener W et al. FTO and MC4R
genome methylation analysis in bladder cancer: identifi-
gene variants are associated with obesity in polycystic
cation and validation of novel methylated genes and
ovary syndrome. PLoS One 2011;6:e16390.
253
27
Evolution, Medicine, and Public Health [2013] pp. 27–36
doi:10.1093/emph/eot001
orig inal
research
article
Identifying future zoonotic
disease threats
Where are the gaps in our
understanding of primate infectious
diseases?
1
School of Natural Sciences; 2Trinity Centre for Biodiversity Research, Trinity College Dublin, Dublin 2, Ireland and
3
Department of Human Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA.
*Corresponding author. School of Natural Sciences, Trinity College Dublin, Dublin 2, Ireland. Tel: +353-(0)1-896-1926;
Fax:+353-1-6778094; E-mail: ncooper@tcd.ie; cnunn@oeb.harvard.edu.
Received 30 October 2012; revised version accepted 1 January 2013.
ABSTRACT
Background and objectives: Emerging infectious diseases often originate in wildlife, making it important
to identify infectious agents in wild populations. It is widely acknowledged that wild animals are incompletely sampled for infectious agents, especially in developing countries, but it is unclear how much
more sampling is needed, and where that effort should focus in terms of host species and geographic
locations. Here, we identify these gaps in primate parasites, many of which have already emerged as
threats to human health.
Methodology: We obtained primate host–parasite records and other variables from existing databases.
We then investigated sampling effort within primates relative to their geographic range size, and within
countries relative to their primate species richness. We used generalized linear models, controlling for
phylogenetic or spatial autocorrelation, to model variation in sampling effort across primates and
countries. Finally, we used species richness estimators to extrapolate parasite species richness.
Results: We found uneven sampling effort within all primate groups and continents. Sampling effort
among primates was influenced by their geographic range size and substrate use, with terrestrial species receiving more sampling. Our parasite species richness estimates suggested that, among the best
sampled primates and countries, almost half of primate parasites remain to be sampled; for most
primate hosts, the situation is much worse.
Conclusions and implications: Sampling effort for primate parasites is uneven and low. The sobering
message is that we know little about even the best studied primates, and even less regarding the spatial
and temporal distribution of parasitism within species.
K E Y W O R D S : sampling events; parasite species richness; Global Mammal Parasite Database;
relative sampling effort
ß The Author(s) 2013. Published by Oxford University Press on behalf of the Foundation for Evolution, Medicine, and Public Health. This is an Open Access
article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits
unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Natalie Cooper*1,2 and Charles L. Nunn3
28
Evolution, Medicine, and Public Health
| Cooper and Nunn
INTRODUCTION
we can do this, we need to identify gaps in our
knowledge of wildlife infectious diseases.
Here, we investigate gaps in our knowledge of
primate parasites. We chose primates because they
are our closest relatives, and partly as a consequence, many of humanity’s biggest killers have
originated in wild primates (e.g. HIV [4]). In addition, much is known about primate parasites. We
acknowledge at the outset, however, that many other
vertebrates have been sources of emerging
infectious diseases in humans, and are thus suitable
for extensions of the effort conducted here. We use
the word parasite in a general sense, referring to
both microparasites such as viruses, bacteria, fungi
and protozoa, and macroparasites such as helminths and arthropods. To assess gaps in our understanding of primate parasites, we examined records
from the GMPD [12]—a large-scale compilation of
parasite records from wild mammals—and use
these data to quantify and model variation in
sampling effort.
METHODOLOGY
Data collection
We obtained host–parasite records from the
GMPD (accessed 15 October 2012; [12]), geographic
range maps from IUCN [14] and the dated consensus phylogeny from ‘10kTrees’ version 3 [15]. For
consistency across our analyses, we only included
primate species found in both the range maps and
phylogeny, and that we could identify to the species
level using the taxonomy of Wilson and Reeder [16].
For analyses of geographical sampling gaps, we
obtained latitude and longitude coordinates for each
host–parasite record with locality data from the
GMPD.
For each primate species, we collated data on
adult body mass (g) from Jones et al. [17]. We also
defined the substrate use of each species as terrestrial (>90% of time on ground), semi-terrestrial
(<90% but >50% of time on ground), semi-arboreal
(<90% but >50% of time in trees) or arboreal
(>90% of time in trees) using Nowak [18], and
treated this as a continuously varying character in
the analyses. For each country, we assembled data
on gross domestic product (GDP) per capita (USD),
land area (km2), and the number of airports from
Central Intelligence Agency [19] and Emerson et al.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Many of the most devastating infectious diseases in
humans have origins in wildlife [1–3]. For example,
the global AIDS pandemic originated through
human contact with wild African primates [4] and
influenza viruses circulate among wild bird populations [5]. These are not only historical occurrences.
Recently, for example, rodents were identified as the
source of a hantavirus outbreak in Yosemite
National Park, USA [6] and a novel rhabdovirus
(Bas-Congo virus) of probable animal origin
emerged in the Democratic Republic of Congo [7].
As human populations continue to expand into new
areas and global changes in temperature and habitat
alter the distributions of wild animals, humans
around the world will have greater contact with wildlife [8]. Thus, understanding which infectious agents
have the potential to spread from animals to
humans is crucial for preventing future human disease outbreaks. Here, we outline current gaps in our
knowledge of primate infectious diseases at phylogenetic and geographic scales. By doing so, we provide new directions for sampling wild primates and a
statistical framework to address this issue in other
groups.
The first step in predicting zoonotic disease risks
to humans is to identify the animal hosts of infectious agents. This information provides several insights. First, it gives information on the host range
and specificity of the infectious agent. Second, it
provides information on the geographic distribution
of the infectious agent in wildlife, which can be
compared with human population density. Finally,
knowing the hosts of an infectious agent also provides information on risks for host shifts to humans
[9, 10]. For example, a host living at high density is
likely to exhibit higher prevalence of the infectious
disease and to have more contact with humans or
domesticated animals.
Many efforts are being made to document and
collate information on wildlife and human diseases
(e.g. HealthMap [11], EID Event Database [2] and
Global Mammal Parasite Database (GMPD) [12]).
Unfortunately, large-scale analyses of this type have
revealed major variations in sampling effort among
hosts and geographic regions, with some species
and areas being sampled rarely or not at all [10, 13].
If we hope to use wildlife disease data to make reliable predictions about future risks to humans, we
need to increase sampling in potential hosts and
the areas in which they are found. However, before
Identifying future zoonotic disease threats
Sampling effort
Our measure of sampling effort is the number of
sampling events for each primate. We define a
sampling event as one primate species being
sampled for one parasite species in one location
in one paper. The number of sampling events in a
paper depends on how the results were reported in
the paper, and hence how they were added to the
GMPD. For example, a paper reporting that Pan
troglodytes is infected by Ascaris lumbricoides represents one sampling event; a paper reporting that P.
troglodytes is infected by A. lumbricoides and SIVcpz
in Location A and Location B represents four
sampling events. This method assumes that each
sampling event represents equivalent research effort; however, some sampling events may represent
multiple years, multiple populations and/or multiple individuals, while others represent only one
individual sampled once. Other samples may be
counted multiple times, for example one fecal sample may reveal several parasites. However, in
general, we believe that our definition of sampling
events should give us a conservative estimate of
sampling effort. Note that we included sampling
events with zero prevalence for the parasite
sampled because these still represent valid
sampling effort.
In total, our host–parasite data consisted of 5459
sampling events, which we used to quantify relative
sampling of primate species. Of these sampling
events, 4067 have georeferenced localities in the
GMPD and thus we also used these to quantify relative sampling of geographic regions. As mentioned
above, we only included parasites we could identify
to species or strain for species accumulation curves,
leaving us with 3999 sampling events in these analyses. These criteria meant our species accumulation curves only use around 75% of our sampling
events for some analyses, but they are necessary to
ensure that we are using the highest quality data in
the analyses of specific areas. It also further highlights the need for more research into primate parasites. We deposited all data in the Dryad repository:
doi:10.5061/dryad.510sb.
Analyses
Variation in sampling effort among primate
species
All else being equal, primates should be sampled in
proportion to their abundance, so we used ln(geographic range size) as a proxy for abundance and
assumed primates with the largest geographic range
sizes should be sampled more than primates with
small ranges. We estimated sampling relative
to geographic range size using the residuals from
a phylogenetic generalized least squares (PGLS)
model of ln(sampling events) against ln(geographic
range size), fitted using the R package caper [21]
(Appendix 1). We considered primates with positive residuals as being relatively better sampled
given their geographic range size than primates
with negative residuals, and displayed these results
on the phylogeny. We also expect great apes
(Hominoidea) to be better sampled than other
primates, so we tested this using phylogenetic
analysis of variance (ANOVA; Appendix 1).
Variation in sampling effort among geographic
regions
We assumed that countries should be sampled in
proportion to the number of primates found within
the country, i.e. countries with high primate species
29
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
[20]. We estimated airport density (airport/km2) by
dividing the number of airports by the land area of
the country.
Overall, our dataset contained 228 primate species from 89 countries. We located host–parasite
data for 166 of these species from 57 countries.
The remaining 62 species and 32 countries have
no records in the GMPD and were listed as
‘unsampled’ in our analyses. Our parasite data contain 651 unique parasite species or genera with
non-zero prevalence in primates (87 arthropods,
50 bacteria, 6 fungi, 326 helminths, 115 protozoa
and 67 viruses). Our dataset also contains 46 unique
parasite species or genera that have been looked for
in primates but never found (these data are important for estimating sampling effort, see below). We
included all parasites when quantifying the relative
amount of sampling by species and country, even
those we could only identify to the generic level; for
species accumulation curves, however, we only
included parasites we could identify to species or
strain to avoid double counting parasite species.
This left us with 161 primate species and 502 parasite species with non-zero prevalence in primates
(73 arthropods, 32 bacteria, 4 fungi, 242 helminths,
93 protozoa and 58 viruses) for these analyses. We
also excluded 22 ‘unsampled’ primates from our
models of variation in sampling effort among primate species because we were unable to locate life
history data for them.
Cooper and Nunn |
30
Evolution, Medicine, and Public Health
| Cooper and Nunn
richness should be sampled more than countries
with low primate species richness. We therefore
estimated sampling relative to primate species richness within each country using the residuals from a
spatial generalized least squares (GLS) model of
ln(sampling events) against ln(primate species richness) using the R package nlme [22] (Supplementary
Data, Appendix 1). We considered countries with
positive residuals as relatively better sampled given
their primate species richness than countries with
negative residuals and displayed these results on a
world map.
Sampling events per primate
¼ f (geographic range size
þ phylogenetic distance from humans
þ substrate use + body sizeÞ
ð1Þ
We fit PGLS models for the 205 primate species for
which we had data (including 40 ‘unsampled’ primates). All variables except substrate use were natural log transformed prior to analysis. We also used
caper [21] to estimate phylogenetic signal (i.e. l,
Supplementary Data, Appendix 1) in the number of
sampling events across primates. Phylogenetic signal is the tendency for related species to resemble
each other more than they resemble species drawn
at random from a phylogenetic tree [23]. High phylogenetic signal, i.e. l values close to 1, indicates that
closely related species have similar numbers of
sampling events, whereas low phylogenetic signal,
i.e. l values close to 0, indicates that the number of
sampling events varies randomly across the phylogeny. We acknowledge that many of our variables—such as sampling effort and geographic
Modeling geographic variation in sampling
effort
We predicted that the following variables would influence sampling effort among countries: (i) primate
species richness (countries with more primates are
likely to be sampled more often than countries with
fewer primates because there are more primates to
sample); (ii) GDP (we expect countries with a high
GDP to have more resources for disease monitoring
and hence to be sampled more often than countries
with a lower GDP) and (iii) airport density (countries
with more airports given their area are likely to be
easier to visit, and hence disease monitoring should
be more frequent). We therefore fit the following
model:
Sampling events per country
¼ f ðGDP + primate species richness
þ airport densityÞ
ð2Þ
We fit spatial GLS models for the 89 countries
that contain primates (including 32 ‘unsampled’
countries). All variables were natural log transformed prior to analysis. Note that the results were
almost identical when using a spherical rather than
an exponential correlation structure, so we only
report the exponential correlation structure results.
Extrapolating parasite species richness for
primates and countries
We first used the R package vegan [24] to plot species
accumulation curves [25] of cumulative parasite
species richness against sampling events for each
primate species (N = 41) and country (N = 21) with
30 or more sampling events. To reduce the effects of
inter-sampling event heterogeneity on the shapes of
the curves, we used rarefaction (Supplementary
Data, Appendix 1) to produce smooth mean species
accumulation curves, with confidence intervals
2 standard deviations from the mean.
Next, we used the data from our curves to predict
parasite species richness for these 41 primates
and 21 countries. We used two nonparametric algorithms, Chao2 and first-order Jackknife (Jackknife1),
which have been recommended in this context
[25–27] (Supplementary Data, Appendix 1). We also
estimated standard errors for our extrapolated parasite species richness values based on references in
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Modeling variation in sampling effort among
primate species
We predicted that the following variables would
influence sampling effort among primate species:
(i) geographic range size (we expect primates with
larger geographic ranges to be sampled more often
than primates with smaller ranges); (ii) phylogenetic
distance from humans (medical research is likely to
focus on our closest relatives, thus we expect them
to be sampled more often); (iii) body size (small
species are easier to capture and so likely to be
sampled more than larger species) and (iv) substrate use (terrestrial species are easier to sample
than arboreal species and thus should be sampled
more often). We therefore fit the following model:
range size—are not biological traits subject to normal evolutionary change. However, they may
still show phylogenetic non-independence, and l
enables quantification of that non-independence regardless of which underlying process generates it.
Identifying future zoonotic disease threats
Oksanen et al. [24], and used these to calculate
upper and lower bounds on extrapolated parasite
species richness. Finally, we plotted species accumulation curves of cumulative parasite species richness for all primate species combined, first using all
parasites and then using arthropods, helminths,
protozoa and viruses separately. We did not use
bacteria and fungi as we had very few of these parasites in our dataset (bacteria = 32 species; fungi = 4
species).
We used R version 2.15.0 [28] for all the analyses
above.
Cooper and Nunn |
31
and Supplementary Fig. S1). As predicted, we found
that the Hominoidea (great apes) were relatively well
sampled in relation to their geographic range size
and were sampled significantly more than other primates (F1,225 = 12.01, P = 0.002). We also found
great heterogeneity in the degree of parasite
sampling across all other major groups of primates,
i.e. Old World monkeys (Cercopithecoidea), New
World monkeys (Platyrrhini) and strepsirrhines
(Strepsirrhini, i.e. lemurs and galagos).
Variation in sampling effort among geographic
regions
RESULTS
Sampling effort was unevenly distributed among primates and ranged from 0 (62 ‘unsampled’ species)
to 630 sampling events (P. troglodytes), with a mean
of 30.71 ± 4.970. We plotted sampling effort in relation to the primates’ geographic range sizes (Fig. 1
Cercopithecoidea
Hominoidea
Platyrrhini
Strepsirrhini
Figure 1. Sampling effort for parasites across the primate phylogeny, assuming that primates should be sampled in proportion
to their geographic range size. Species names have been omitted for clarity (see Supplementary Fig. S1 for a larger version with
species names). Relative sampling effort was quantified using the residuals from a generalized linear model of ln(geographic
range size) against the number of sampling events for each primate species. Gray circles indicate primates with poor sampling
relative to their geographic range size (lower 25% of model residuals), black circles indicate primates with better sampling
relative to their geographic range size (upper 25% of model residuals).
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Variation in sampling effort among primate
species
Sampling effort was also unevenly distributed geographically and ranged from 0 (32 ‘unsampled’
countries) to 416 sampling events (Uganda), with
a mean of 42.70 ± 9.131. Many countries were poorly
sampled in relation to their primate species richness
(Fig. 2), with particularly low levels of sampling in
parts of South East Asia (including China, Thailand,
Cambodia, Myanmar, Laos and Vietnam), Central
and Western Africa (including Sudan, Somalia,
32
Evolution, Medicine, and Public Health
| Cooper and Nunn
Low
Sampling relave to primate species richness
High
Figure 2. Sampling effort for parasites across the world, assuming that countries should be sampled in proportion to their
mate species richness) against the number of sampling events for each country. The colors indicate whether countries are poorly
sampled (low; red) or better sampled (high; yellow) relative to their primate species richness.
Table 1. PGLS model for explaining variation in sampling effort among
primate species
Variable
Slope ± SE
t201
Geographic range size (km2)
Phylogenetic distance (My)
Substrate use
Body size (g)
0.347 ± 0.056
0.189 ± 0.729
0.864 ± 0.155
0.409 ± 0.158
6.222***
0.260
5.572***
2.597*
l = 0.322; r2 = 0.333. Phylogenetic distance is measured as phylogenetic distance from humans in millions of years.
Substrate use is a four-state-ordered variable ranging from fully terrestrial to fully arboreal, with more arboreal species
scored higher. *P < 0.05; ***P < 0.001.
Angola, Zambia, Guinea and Ghana) and South
America (including Bolivia, Ecuador, Venezuela,
Guyana and Suriname).
Modeling variation in sampling effort among
primate species
Sampling effort for primate parasites covaried with
primate geographic range size, body mass and
substrate use: the most sampled primates tend to
have larger geographic ranges, to be larger in
body mass and to be more terrestrial (Table 1 and
Supplementary Table S1). We found no significant
effect of phylogenetic distance between humans and
primates, indicating that both our close relatives and
more distantly related species show evidence for
sampling gaps. Our overall model explained around
a third of the variation in sampling effort (r2 = 0.333),
most of which appears to relate to the geographic
range size and terrestriality of the primates (in single
predictor models, geographic range size: r2 = 0.175;
body size: r2 = 0.061; substrate use: r2 = 0.163).
The number of sampling events for primates
showed significant, but intermediate, levels of phylogenetic signal (N = 228, l = 0.589). This value was
significantly different from both l = 0 and l = 1
(P < 0.001), indicating moderate phylogenetic
non-independence.
Modeling geographic variation in
sampling effort
Predictably, sampling effort across countries
increased with primate species richness. However,
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
primate species richness. Relative sampling effort was quantified using the residuals from a generalized linear model of ln(pri-
Identifying future zoonotic disease threats
Cooper and Nunn |
neither GDP nor airport density significantly predicted sampling effort (Table 2 and Supplementary
Table S2).
Extrapolating parasite species richness for
primates and countries
Figure 3 shows the parasite species accumulation
curve for all primates combined, and for arthropods,
helminths, protozoa and viruses. Across primates
and countries, most parasite species accumulation
curves were starting to show some downward curvature, indicating a declining rate of parasite species
discovery, but in no cases had they approached an
asymptote. Interestingly, the different types of parasites accumulated at different rates, with arthropods
and helminths accumulating at a much faster rate
than protozoa and viruses (see slope differences in
Fig. 3). The parasite species accumulation curve for
P. troglodytes is shown in Supplementary Fig. S2.
Our estimates of parasite species richness using
Chao2 and Jackknife1 are shown in Supplementary
Tables S3–S5. Using Jackknife1, which appears to
give more reasonable values, across the 41 best
sampled primates, on average, we predict that there
should be between 38 and 79% more parasites than
currently recorded in the GMPD (Supplementary
Table S3). For countries, on average, we predict
that there should be between 29 and 40% more
parasites than currently recorded in the GMPD
(Supplementary Table S4). For all 161 primates in
our study combined, we should find between 685
Variable
Slope ± SE
t84
Primate species richness
GDP per capita (USD)
Airport density (airport/km2)
1.241 ± 0.290
0.437 ± 0.246
3.559 ± 3.041
4.273***
1.778
1.170
parasites
= 1.938; GDP = gross domestic product; ***P < 0.001.
500
250
400
200
300
150
200
100
100
50
0
0
1000
2000
3000
sampling events
4000
Arthropods
Helminths
Protozoa
Viruses
0
500
1000
1500
sampling events
Figure 3. Parasite species accumulation curve for all 161 primates combined and all parasites (left-hand side). Parasite species
accumulation curve for all 161 primates combined and helminths, protozoa and viruses separately (right-hand side).
Parasites = cumulative parasite species richness. Arthropods = orange curve; helminths = blue curve, protozoa = green curve
and viruses = red curve. For each curve, the darker line shows the mean curve and the lighter shaded region shows 2 standard
deviations from the mean curve, each obtained from 1000 random permutations of the data. Note that the axes sizes are different
on the left- and right-hand plots.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Table 2. Spatial GLS model with an exponential correlation structure,
explaining variation in sampling effort among countries
0
33
34
Evolution, Medicine, and Public Health
| Cooper and Nunn
and 713 parasites, i.e. between 36 and 42% more
parasites than the 502 parasites identified to species
level that are currently reported in the GMPD for
these 161 primates (Supplementary Table S5).
DISCUSSION
Variation in sampling effort among primate
species and geographic regions
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Virtually every broadscale comparison of sampling
effort, whether sampling disease agents in epidemiology or species in biodiversity studies, reveals bias
in what is sampled. In epidemiology, for example,
sampling may be highest for diseases with easily
detectable symptoms and for areas easily accessed
by medical personnel. Here, we showed that primate
parasites are also unevenly sampled across both primate species and space. This supports previous
studies of sampling gaps in primate parasites that
used an earlier version of the GMPD data [13], but
unlike previous studies, we also investigated the
drivers of sampling effort variation across primates
and geographic areas.
For our analyses of variation in sampling effort
among primate species, we predicted that our
closest relatives (chimpanzees, gorillas and
orangutans) would be relatively well sampled because a great deal of research has focused on these
species. We expected most other primate species to
be comparatively poorly sampled, except when they
are more terrestrial or larger in body mass. As predicted, chimpanzees, gorillas and orangutans were
generally better sampled. However, we found incredible variation in sampling among all other major
primate groups, intermediate phylogenetic signal
in sampling effort and no significant relationship
between sampling effort and the phylogenetic distance from humans to the primate in question.
Instead, our models suggested that most variation
in sampling effort among primates can be explained
by the geographic range size and level of terrestriality
of the primates. Put more simply, the primates that
researchers sample most are the species they encounter most often, including those that are more
likely to be on the ground than in the trees. This is
also supported by the low sampling of nocturnal
primates.
However, our models only explained 30% of
the variation in sampling effort across primates,
indicating that we did not capture every explanation
for this variation. Some primates may be sampled
because they are already intensively studied for
infectious disease, with researchers building on
previous knowledge rather than starting from
scratch. Other species may be sampled thoroughly
because they live in frequently used and wellequipped field sites. Some of the variation in
sampling may have more idiosyncratic explanations; for example, the extensive sampling of some
Macaca species likely reflects their use in medical
research.
We also identified great heterogeneity in sampling
among countries, even among those in the same
region. We found particularly low sampling in parts
of South East Asia, Central and Western Africa, and
South America, and better sampling in Eastern
Africa and Brazil. However, the only variable in our
statistical models that predicted sampling effort
among countries was the primate species richness
of the country, with parasite sampling highest in
countries that have more primates to sample. We
expected that the GDP of the countries would also
positively affect sampling effort, but we found no
evidence for this in our analyses, possibly because
much of the research is not funded by the country in
which the research takes place. In fact, on average,
only 22% of tropical biological field station funding
comes from the host country [29]. Perhaps a better
predictor of sampling effort would be the number of
research stations in a country.
Our parasite species accumulation curves, for
both primate species and countries, were starting
to show some downward curvature, but in no cases
had they approached an asymptote. In these analyses, we only used species or countries with at least
30 sampling events. This indicates that, at least for
these fairly well-sampled primates and countries,
sampling is slowly approaching levels sufficient to
quantify parasite species richness. However, when
we extended these analyses to extrapolate parasite
species richness values, we found that even within
our best sampled primates and countries, we are
missing substantial parasite diversity. On average,
we predicted that 38–79% more parasite species
than currently reported in the GMPD should be
found in our best sampled primate species and
29–40% more parasite species than currently reported in the GMPD should be found in our best
sampled countries. This emphasizes exactly how
poor our sampling is across all primates and
countries. The other primates and countries obviously represent even larger gaps in our knowledge.
Identifying future zoonotic disease threats
Priorities for future research
Identifying parasite sampling gaps across primate
species and geographic regions is only the first step,
we need to find strategies to minimize these
sampling gaps if we are to predict which primate
infectious diseases may emerge in humans. One solution is to set research priorities based on the
sampling gaps [13], for example, by focusing effort
and funding on relatively poorly sampled primate
species, arboreal primates, those with small geographic ranges or those found in relatively poorly
sampled regions of South East Asia, Central and
Western Africa, and South America.
Focusing on relatively poorly sampled primate
species and areas may improve our general understanding of primate parasites, but it is only one factor in predicting risk to humans. For example, hosts
are more likely to share parasites with their close
relatives than with more distant relatives [9, 10].
Thus, continuing to focus our sampling efforts on
parasites of our closest relatives (chimpanzees, gorillas and orangutans) may provide the greatest return in the case of risks to humans. This is
particularly important because we found that chimpanzees are expected to have 33–50% more
35
parasites than currently found in the GMPD. In addition, ecological similarities also influence parasite
sharing among primates, and humans share more
parasites with terrestrial than arboreal primate species [9, 10]. As with sampling effort, this probably
reflects higher contact rates among humans and terrestrial primates compared with arboreal primates.
As a related issue, a host living at higher density is
expected to have higher prevalence of parasites and
may have more contact with human populations or
our domesticated animals, thus increasing
opportunities for host shifts to humans. The large
numbers of zoonotic emerging infectious diseases
with rodent or domesticated animal sources also
highlight the importance of rates of contact and host
density for disease emergence in humans [2, 3, 5].
CONCLUSIONS AND IMPLICATIONS
The aim of this study was to identify where the gaps
lie in our knowledge of primate parasites. We found
that sampling effort was unevenly distributed across
primate species and countries, and that the best
predictors of sampling effort were the geographic
range size or terrestriality of the primate species,
or the primate species richness of the country. We
also found that, according to our extrapolations of
parasite species richness, even our best sampled
primates and countries were still vastly undersampled, typically with only a quarter to two-thirds
of their parasites documented, and possibly even
less given that fungi and bacteria are so underrepresented in current records. This implies that if
we want to predict primate disease emergence in
humans, more sampling for parasites is needed
across all primate species and countries. This is especially important as human populations grow and
spread into new areas where they will encounter
more primates and consequently more diseases.
supplementary data
Supplementary data is available at EMPH online and
the Dryad repository: doi:10.5061/dryad.510sb.
Conflict of interest: none declared.
references
1. Wolfe ND, Dunavan CP and Diamond J. Origins of major
human infectious diseases. Nature 2007;447:279–83.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Sampling was also uneven across types of parasites; when we analyzed arthropods, helminths,
protozoa and viruses separately, we saw faster rates
of parasite accumulation in arthropods and helminths but with little evidence for sufficient
sampling in these species. This is interesting given
that helminths make up 48% of the parasite species
in our study (arthropods = 33%; bacteria = 6%;
fungi = 0.8%; protozoa = 19% and viruses = 12%).
We were not able to fit species accumulation curves
to bacteria or fungi because we have so few bacteria
and fungi species in our dataset. Given the importance of bacterial and fungal emerging diseases in
humans [2, 30], this lack of sampling in wild primates is of concern. Another concern is that although viruses make up only 12% of the parasites
in our dataset, viruses arguably present the greatest
zoonotic disease threat to humans because their
fast rates of evolution will allow them to easily adapt
to new hosts [3, 5]. The relatively low number of
viruses probably reflects detection bias. Viruses are
very hard to detect, and even when detected prove
difficult to classify. Therefore, our results probably
grossly underestimate the number of viruses present in primates.
Cooper and Nunn |
36
Evolution, Medicine, and Public Health
| Cooper and Nunn
2. Jones KE, Patel NG, Levy MA et al. Global trends in
emerging infectious diseases. Nature 2008;451:990–3.
3. Woolhouse MEJ and Gowtage-Sequeria S. Host range and
emerging and reemerging pathogens. Emerg Infect Dis
2005;11:1842–7.
4. Hahn BH, Shaw GM, De KM et al. AIDS as a zoonosis:
scientific and public health implications. Science 2000;287:
607–14.
5. Parrish CR, Holmes EC, Morens DM et al. Cross-species
virus transmission and the emergence of new epidemic
diseases. Microbiol Mol Biol Rev 2008;72:457–70.
6. Roehr B. US officials warn 39 countries about risk of
hantavirus among travellers to Yosemite. BMJ 2012;345:
16. Wilson DE and Reeder DAM. Mammal Species of the
World: A Taxonomic and Geographic Reference, 3rd edn.
Smithsonian Institution Press: Washington, DC, 2005.
17. Jones KE, Bielby J, Cardillo M et al. PanTHERIA: a
species-level database of life-history, ecology and geography of extant and recently extinct mammals. Ecology
2009;90:2648.
18. Nowak RM. Walker’s Mammals of the World. The Johns
Hopkins University Press: Baltimore, MD, 1999.
19. Central Intelligence Agency. (2009) The World Factbook
2009. Washington, DC, 2009, https://www.cia.gov/library/publications/the-world-factbook/index.html
(21
August 2012, date last accessed).
e6054.
7. Grard G, Fair JN, Lee D et al. A novel rhabdovirus
20. Emerson JW, Hsu A, Levy MA et al. Environmental
Performance Index and Pilot Trend Environmental
associated with acute hemorrhagic fever in central
Performance Index. Yale Center for Environmental Law
Africa. PLoS Pathog 2012;8:e1002924.
8. Kilpatrick AM. Globalization, land use, and the invasion of
Comparative Analyses of Phylogenetics and Evolution in R,
R package version 0.5, 2012.
predict pathogen community similarity in wild primates
22. Pinheiro J, Bates D, DebRoy S et al. nlme: Linear and
and humans. Proc R Soc Lond B Biol Sci 2008;275:
1695–701.
Nonlinear Mixed Effects Models, R package version
3.1-103, 2012.
10. Cooper N, Griffin R, Franz M et al. Phylogenetic host spe-
23. Blomberg SP, Garland T and Ives AR. Testing for phylo-
cificity and understanding parasite sharing in primates.
genetic signal in comparative data: behavioral traits are
Ecol Lett 2012;15:1370–7.
11. Brownstein JS, Freifeld CC, Reis BY et al (2007)
HealthMap: internet-based emerging infectious disease
more labile. Evolution 2003;57:717–45.
24. Oksanen J, Blanchet FG, Kindt R et al. vegan: Community
Ecology Package, R package version 2.0-3. 2012.
intelligence. In: Medicine,IO (ed.), Infectious Disease
25. Dove ADM and Cribb TH. Species accumulation curves
Surveillance and Detection: Assessing the Challenges—
Finding Solutions. The National Academies Press:
and their applications in parasite ecology. Trends Parasit
2006;22:568–74.
Washington, DC, 2007, 183–204.
12. Nunn CL and Altizer S. The Global Mammal Parasite
Database: an online resource for infectious disease
records in wild primates. Evol Anthr 2005;14:1–2.
13. Hopkins ME and Nunn CL (2010) Gap analysis and the
geographical distribution of parasites. In: Morand,S and
Krasnov,B (eds), The Biogeography of Host–Parasite
Interactions. Oxford University Press: Oxford, 2010,
129–42.
14. IUCN. IUCN Red List of Threatened Species. Version 2010.4,
26. Poulin R, Krasnov BR and Mouillot D. Host specificity in
phylogenetic and geographic space. Trends Parasit 2011;
27:355–61.
27. Walther BA and Morand S. Comparative performance of
species richness estimation methods. Parasitology 1998;
116:395–405.
28. R Development Core Team. (2012) R: A Language and
Environment for Statistical Computing. R Foundation for
Statistical Computing: Vienna, Austria, 2012.
29. Whitesell S, Lilieholm RJ and Sharik TL. A global survey of
2010. http://www.iucnredlist.org (25 July 2011, date last
tropical biological field stations. BioScience 2002;52:
accessed).
55–64.
15. Arnold C, Matthews LJ and Nunn CL. The 10kTrees web-
30. Fisher MC, Henk DA, Briggs CJ et al. Emerging fungal
site: a new online resource for primate phylogeny. Evol
threats to animal, plant and ecosystem health. Nature
Anthr 2010;19:114–8.
2012;484:186–94.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
West Nile virus. Science 2011;334:323–7.
9. Davies TJ and Pedersen AB. Phylogeny and geography
and Policy: New Haven, CT, 2012.
21. Orme CDL, Freckleton RP, Thomas GH et al. caper:
173
Evolution, Medicine, and Public Health [2013] pp. 173–186
doi:10.1093/emph/eot015
orig inal
research
article
Hygiene and the world
distribution of Alzheimer’s
disease
Molly Fox*1, Leslie A. Knapp1,2, Paul W. Andrews3 and Corey L. Fincher4
1
Division of Biological Anthropology, Department of Anthropology and Archaeology, University of Cambridge, Pembroke
Street, Cambridge CB2 3QY, UK, 2Department of Anthropology, University of Utah, 270 S 1400 E, Salt Lake City, UT 84112,
USA, 3Department of Psychology, Neuroscience & Behaviour, McMaster University, 1280 Main Street W, Hamilton, ON
L8S 4K1, Canada and 4Institute of Neuroscience and Psychology, University of Glasgow, 58 Hillhead Street, Glasgow G12
8QB, UK
*Correspondence address: Department of Biological Anthropology, University of Cambridge, Pembroke Street,
Cambridge CB2 3QY, UK. Tel:+44(0)1223 335638; Fax:+44(0)1223 335460; E-mail: mf392@cam.ac.uk
Received 13 February 2013; revised version accepted 6 August 2013
ABSTRACT
Background and objectives: Alzheimer’s disease (AD) shares certain etiological features with
autoimmunity. Prevalence of autoimmunity varies between populations in accordance with variation
in environmental microbial diversity. Exposure to microorganisms may improve individuals’
immunoregulation in ways that protect against autoimmunity, and we suggest that this may also be
the case for AD. Here, we investigate whether differences in microbial diversity can explain patterns of
age-adjusted AD rates between countries.
Methodology: We use regression models to test whether pathogen prevalence, as a proxy for microbial
diversity, across 192 countries can explain a significant amount of the variation in age-standardized AD
disability-adjusted life-year (DALY) rates. We also review and assess the relationship between pathogen
prevalence and AD rates in different world populations.
Results: Based on our analyses, it appears that hygiene is positively associated with AD risk. Countries
with greater degree of sanitation and lower degree of pathogen prevalence have higher age-adjusted AD
DALY rates. Countries with greater degree of urbanization and wealth exhibit higher age-adjusted AD
DALY rates.
Conclusions and implications: Variation in hygiene may partly explain global patterns in AD rates.
Microorganism exposure may be inversely related to AD risk. These results may help predict AD burden
ß The Author(s) 2013. Published by Oxford University Press on behalf of the Foundation for Evolution, Medicine, and Public Health. This is an Open
Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits
unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Epidemiological evidence for a
relationship between microbial
environment and age-adjusted
disease burden
174
| Fox et al.
Evolution, Medicine, and Public Health
in developing countries where microbial diversity is rapidly diminishing. Epidemiological forecasting is
important for preparing for future healthcare needs and research prioritization.
K E Y W O R D S : Alzheimer’s disease; hygiene hypothesis; inflammation; dementia; pathogen preva-
lence; Darwinian medicine
INTRODUCTION
in the household [22], exhibiting lower rates of
atopic and autoimmune disorders.
Low-level persistent stimulation of the immune
system leads naı¨ve T-cells to take on a suppressive
regulatory phenotype [23] necessary for regulation of
both type-1 inflammation (e.g. autoimmunity) and
type-2 inflammation (e.g. atopy) [12, 24, 25].
Individuals with insufficient immune stimulation
may experience insufficient proliferation of regulatory T-cells (TRegs) [26, 27]. AD has been described
as a disease of systemic inflammation [28], with the
AD brain and periphery exhibiting upregulated type1 dominant inflammation [29]: a possible sign of
TReg deficiency. Immunodysregulation as a consequence of low immune stimulation may contribute
to AD risk through the T-cell system. Altogether, we
suggest that a hygiene hypothesis for Alzheimer’s
disease (HHAD) predicts that AD incidence may
be positively correlated to hygiene.
The period from gestation through childhood is
typically thought to be a critical window of time during which the immune system is established [14, 30,
31], with some authors limiting this critical window
to the first 2 years of life [32]. However, proliferation
of TRegs occurs throughout the life course: there are
age-related increases in number of TRegs [30, 33] with
peaks at adolescence and in the sixth decade [34].
Therefore, it may be not only early-life immune
stimulation that affects AD risk (and perhaps
risk of other types of immunodysregulation) but also
immune stimulation throughout life. Our study is
designed based on the hypothesis that microorganism exposure across the lifespan may be related
to AD risk.
At an epidemiological level, our prediction is
opposite to Finch’s [6] hypothesis that early-life
pathogen exposure should be positively correlated
to AD risk. Both their and our predictions are based
on speculation about T-cell differentiation, although
we reached opposite suppositions. It is clear
that inflammation is upregulated in AD, and Finch
suggested that pathogen exposure, which is proinflammatory, may increase AD risk [6]. We propose
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Exposure to microorganisms is critical for the
regulation of the immune system. The immunodysregulation of autoimmunity has been associated
with insufficient microorganism exposure [1].
Global incidence patterns of autoimmune diseases
reflect this aspect of their etiology: autoimmunity
is inversely correlated to microbial diversity [1, 2].
The inflammation characteristic of Alzheimer’s
disease (AD) shares important similarities with
autoimmunity [3, 4]. The similarity in immunobiology may lead to similarity in epidemiological patterns. For this reason, here we test the hypothesis
that AD incidence may be positively correlated
to hygiene. The possibility that AD incidence is
related to environmental sanitation was previously
introduced by other authors [5, 6], and remains as of
yet untested.
The ‘hygiene hypothesis’ [7] suggests that certain
aspects of modern life (e.g. antibiotics, sanitation,
clean drinking-water, paved roads) are associated
with lower rates of exposure to microorganisms
such as commensal microbiota, environmental
saprophytes and helminths than would have been
omnipresent during the majority of human history
[8]. Low amount of microbe exposure leads to low
lymphocyte turnover in the developing immune system, which can lead to immunodysregulation.
Epidemiological studies have found that populations exposed to higher levels of microbial diversity
exhibit lower rates of autoimmunities as well as
atopies [9], a pattern that holds for countries
with differing degrees of development [10, 11].
Differences in environmental sanitation can partly
explain the patterns of autoimmunity and atopy
across history and across world regions [2, 12].
Patient-based studies have demonstrated that individuals whose early-life circumstances were
characterized by less exposure to benign infectious
agents exhibit higher rates of autoimmune and
atopic disorders. This pattern has been
demonstrated for farm living versus rural non-farm
living [13–15], daycare attendance [16, 17], more
siblings [7, 18], later birth order [19–21] and pets
Fox et al. |
Hygiene and Alzheimer’s epidemiology
METHODS
In order to test whether there is epidemiological evidence for an HHAD, data for a wide range of
countries were compared. Age-standardized disability-adjusted life-year (DALY) rates (henceforth ‘AD
rates’) in 2004 were evaluated in light of proxies for
microbial diversity across a range of years selected
to fully encompass lifespans of individuals in the
AD-risk age group in 2004.
Prevalence of Alzheimer’s and other dementias
We utilized the WHO’s Global Burden of Disease
(GBD) report published in 2009, which presents
data for 2004 [36]. The WHO only reports information for AD and other dementias across different
countries, rather than AD alone. This variable does
not include Parkinsonism [37]. While there are other
types of dementia, AD accounts for between 60%
and 80% of all dementia cases [38], and the
neurobiological distinctions between some of the
subtypes may be vague [39, 40]. The WHO report
presents three variables related to AD: agestandardized DALY, age-standardized deaths, and
DALY for age 60+. There is low correlation between
these three measures (linear regressions after necessary data transformations had R-squared values
0.040; 0.089; 0.041).
The WHO’s GBD report includes the following
ICD-10 codes as ‘Alzheimer’s and other dementias:’
F01, F03, G30–G31, uses an incidence-based approach, 3% time discounting, the West Level 26
and 25 life tables for all countries assuming global
standard life expectancy at birth [41], standard GBD
disability weights [42] and non-uniform age-weights
[43, 44]. DALYs are calculated from years of lost life
(YLL) and years lost due to disability (YLD). YLL data
sources for AD included death registration records
for 112 countries, population-based epidemiological
studies, disease registers and notification systems
[45]. Vital registration data with coverage over 85%
were available for 76 countries, and information for
the remaining 114 countries was calculated using a
combination of cause-of-death modeling, regional
patterns and cause-specific estimates (see [45] for
details). YLD data sources for AD included disease
registers, population surveys and existing epidemiological studies [43]. When only prevalence data were
available, incidence statistics were modeled from
estimates of prevalence, remission, fatality and
background mortality using the WHO’s DisMod II
software [43].
Age-standardized rates were calculated by
adjusting the crude AD DALY for 5-year age groups
by age-weights reflecting the age-distribution of the
standard population [35]. The new WHO World
Standard was developed in 2000 to best reflect
projections of world age-structures for the period
2000–25 [35], and particularly closely reflects the
population age-structures of low- and middleincome countries [46].
DALYs as better measure than death rates.
AD as cause of death is a particularly flawed measurement across different countries, as certain
countries rarely attribute mortality to AD or recognize dementia as abnormal aging [47], and other
causes of death may occur at far greater frequencies
masking AD prevalence. The DALY measurement is
the sum of years lost due to premature mortality and
years spent in disability. The number of years lost
due to premature mortality is based on the standard
life expectancy at the age when death due to AD
occurs [48]. Therefore, this measurement omits the
effects of differential mortality rates previous to
age 65, which accounts for the vast majority of differences in life expectancy between developed and
developing countries [49], opportunely isolating the
effects of later life mortality causes. Therefore, DALY
includes but is not limited to AD as official cause of
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
that at the population level, pathogen exposure,
as a proxy for benign microorganism exposure,
may be protective against AD.
Griffin and Mrak [5] suggested that an HHAD
is justified because hygiene-related changes in
immune development are probably related to AD
etiology, but it is not possible to predict the directionality of this effect. They focus on how hygiene
might influence microglial activation and IL-1
expression, and the opposing effects these changes
may have on AD-relevant pathways.
Our prediction, Finch’s contrary [35] prediction,
and the assertion of Griffin and Mrak that further
research is needed to establish the directionality
of hygiene’s effect on AD motivated this empirical
investigation of whether pathogen prevalence was
correlated to AD rates at the country level. This
information could lead to a better understanding
of the environmental influences on AD etiology
and could help judge the accuracy of existing predictions for future AD burden.
175
176
| Fox et al.
Evolution, Medicine, and Public Health
death, making it a more inclusive variable than AD
as cause of death.
Predictive variables
The hygiene hypothesis is sometimes referred to as
the ‘old friends hypothesis’ [52]. This alternative title
highlights the fact that for the vast majority of our
species’ history, humans would have been regularly
exposed to a high degree of microbial diversity,
and the human immune system was shaped through
natural selection in these circumstances [53]. With
rapidly increasing global urbanization beginning
in the early nineteenth century, individuals began
to experience diminishing exposure to these
‘friendly’ microbes due to diminishing contact with
animals, feces and soil [1]. The microbes that were
our ‘old friends’ previous to this epidemiological
transition whose absence may lead to immunodysregulation in modern environments included gut,
skin, lung and oral microbiota; orofecally transmitted bacteria, viruses and protozoa; helminthes;
environmental saprophytes; and ectoparasites
[1, 52]. The predictive variables in our analysis were
selected for their relevance to these ‘old friends’.
These variables do not specifically reflect exposure
to those microbiota and commensal microorganisms that were omnipresent during our evolutionary
history, but rather cover a more general collection
of microbial exposures. This inclusive approach is
both because of limitations of available datasets,
and because it is not known if particular microbial
elements specifically relate to AD etiology.
Statistical methods
All variables were transformed to optimize distribution symmetry to avoid undue influence by
countries with extreme values for each variable
(Supplementary Section S1). Linear regression was
used to determine whether there was a relationship
between each of the microbial exposure proxies and
AD rate between countries. As there was reason to
suspect predictive overlap among the variables, a
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Age-standardized data better indicator than
age 60+ data.
Age-standardized rates represent what the burden
of AD would be if all countries had the same agedistribution [35]. While the WHO GBD report
also provides DALY for ages 60+, these data are
not age-standardized. Using DALY for ages 60+, a
country with a significant population over age 85,
for instance, would always appear to have greater
AD incidence than a country with few people over
85, regardless of differences in age-specific AD
incidence in the two countries. This is due to the
exponential way in which AD risk increases with
age [50, 51]. Differences in age-specific AD incidence
would be masked by differences in population
age structure. See Supplementary Section S6 for
examples and further explanation.
Murray and Schaller [54] assessed historical disease prevalence for the years 1944–61. Their ‘sevenitem index of historical disease prevalence’ included
leishmanias, schistosomes, trypanosomes, malaria,
typhus, filariae and dengue. Their ‘nine-item index’
included the above diseases plus leprosy and a contemporary measure of tuberculosis. Another measure of pathogen exposure [55] combines data from
the WHO and the ‘Global Infectious Diseases and
Epidemiology Network’ (GIDEON) for 2002 and
2009. The WHO reports countries’ ‘percent population using improved sanitation facilities’, which
‘separate human excreta from human contact’ [56],
and ‘improved drinking-water sources’, which protect the source from contamination [56]. Of 3 years
for which sufficient data were available, we chose to
look at 1995 (Supplementary Section S1). ‘Infant
mortality rate’ (IMR) is measured as number of infant mortalities per 1000 live births [57]. We consulted historical IMR statistics [58] for years 1900–
2002. The World Bank reports countries’ ‘gross national income’ (GNI) per capita at purchasing power
parity (PPP) [57], the economic variable that contributes to the composite Human Development Index
and is therefore an indicator of the contribution of
wealth to quality of life. We looked at both the earliest
year with sufficient data available, 1970 and 2004.
‘Gross domestic product’ (GDP) per capita (PPP)
is an economic measure of a country divided by its
mid-year population [57], often used as a measure of
standard of living, although it is not a direct assessment of this. We consulted Maddison’s historical
economic statistics [59] for GDP from 1900 to
2002. Children growing up in rural areas may be
more exposed to pathogens due to factors including
unpaved roads, contact with livestock and animal
feed, and consumption of unprocessed milk [13,
14]. The World Bank reports the ‘percent of a country’s population living in urban areas’ [57], and we
looked at both the earliest year with data available,
1960 and 2004.
Fox et al. |
Hygiene and Alzheimer’s epidemiology
principal component regression was conducted
(Supplementary Section S4).
To explore which part of the lifespan has the most
bearing on AD, we measure the strength of the same
proxy across several years as a predictor of 2004 AD
prevalence. We compare historical and contemporary disease prevalence, and GNI and urban living.
Also, two of the proxies had more detailed historical
information available: GDP [59] and IMR [58]. GDPs
and IMRs from years spanning 1900–2002 were
compared. Each year’s GDP and IMR were
compared to 2004 AD rates by linear regression.
The resulting regression coefficients were interpreted as the degree to which each year’s GDP or
IMR and 2004 AD were correlated (Supplementary
Section S5).
Statistical analyses revealed highly significant
relationships between various measures of hygiene
and age-adjusted AD DALY. High levels of pathogen
exposure were associated with lower AD rates.
Countries with higher disease and pathogen prevalence and IMR had lower 2004 AD rates (Table 1,
Figs 1–4, Fig. S1). These results are consistent with
a protective role of exposure to microbial diversity
against AD, and support an HHAD.
Greater degree of hygiene, and therefore potentially lower degree of microorganism exposure, was
associated with higher AD rates. Countries with
a higher percent of the population using improved
sanitation facilities, improved drinking-water
sources, living in urbanized areas, higher GNI and
GDP per capita (PPP) had higher rates of AD in 2004
(Table 1, Fig. 3, Figs S2–S5). The existence of a
strong positive correlation between historical levels
of sanitation and AD rate in 2004 is consistent with
the predictions of the HHAD.
Comparing predictive power of data from
different years
Our results indicate that microbe exposure across
the lifespan, not necessarily just during early-life,
is associated with AD burden. Predictive variables
reflecting the years during which elderly people in
2004 would have experienced their earlier years of
life were not consistently better indicators of 2004
AD rate compared to years during which they would
have spent their mid or later life years, consistent
with observations of lifelong TReg proliferation
[30, 33, 34]. Contemporary measures were more
powerful indicators in the cases of contemporary
parasite stress versus historical disease prevalence,
and GNI in 2004 versus 1970, while percent of
population living in urban areas in 1960 versus
2004 was a slightly better indicator of AD in 2004
(Fig. 3).
A series of linear regressions measuring the relationship between IMRs from 1900 to 2002 and AD
in 2004 revealed that IMR was significantly
Table 1. An evaluation of proxies for hygiene as predictors of Alzheimer burden
Variable
Year
N
R2
P
Direction
consistent
with HHAD?
Historical disease prevalence nine-item
Historical disease prevalence seven-item
Contemporary parasite stress
Improved sanitation facilities
Improved drinking-water sources
GNI
1944–61
1944–61
2002, 2009
1995
1995
1970
2004
148
191
190
177
178
171
174
0.358
0.242
0.373
0.334
0.327
0.296
0.328
****
****
****
****
****
****
****
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Urban %
1960
2004
187
187
0.282
0.204
****
****
Yes
Yes
Linear regression for each predictive variable after transformation to optimize symmetry (Supplementary Section S1).
Principal component regression for parasite stress, sanitation facilities, drinking-water sources, infant mortality,
GNI and urbanization had a higher R2 than any predictive variable on its own (Supplementary Section S4).
****P < 0.0000.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
RESULTS
177
178
| Fox et al.
Evolution, Medicine, and Public Health
Countries with higher contemporary parasite prevalence have lower age-standardized rates of Alzheimer’s in 2004. N = 190,
R2 = 0.373, P < 0.0000. Contemporary parasite stress [55] combines years 2002 and 2009, and the variable is transformed by
adding a constant and taking square root. The Alzheimer variable is transformed by adding a constant and taking the natural log
of 2004 Alzheimer age-standardized DALY [36]
negatively correlated to 2004 AD rates with an
increasing annual ability to explain variance
(Fig. 4, Supplementary Section S5). This analysis
was restricted to countries with IMR data across
1900–2002 (N = 45) and thus was free from sample
size biases.
Similarly, a series of linear regressions measuring
the relationship between GDPs from 1900 to 2002
and AD in 2004 revealed that GDP was significantly
positively correlated to 2004 AD rates, although the
increasing annual ability to explain variance was only
consistently true for years 1940–2002 (Fig. S7).
Principal component analysis
A principal component regression analysis
demonstrated that the combined statistical impact
of the proxies for microbial diversity had a
significant effect on AD rates. There were high degrees of correlation between disease prevalence,
parasite stress, sanitation facilities, improved
drinking water, GNI and urban population
(Table S1). There was a significant relationship between the principal component and 2004 AD rate
(N = 159, R2 = 0.425, P < 0.0000) (Supplementary
Section S4, Fig. S6).
DISCUSSION
Inflammation plays an important role in AD pathogenesis, and previous authors have hypothesized
that immune stimulation could increase AD risk
through T-cell [6] or microglial action [5, 60].
However, our results indicate that some immune
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Figure 1. Countries’ parasite stress negatively correlated to Alzheimer’s burden
Fox et al. |
Hygiene and Alzheimer’s epidemiology
Countries with historically more infectious disease have lower age-standardized rates of Alzheimer’s in 2004. N = 148, R2 = 0.358,
P < 0.0000. Historical disease prevalence 9-item data compiled by Murray and Schaller for years 1944–61 [54]. Alzheimer prevalence variable is transformed by adding a constant and taking the natural log of 2004 Alzheimer age-standardized DALY [36]
stimulation may protect against AD risk. The traditional hygiene hypothesis has effectively explained
prevalence rates of autoimmunity and atopy in
many populations. We have found that the hygiene
hypothesis can help explain some patterns in rates
of AD, as predicted previously [5, 8] with a direction
of correlation consistent with our predictions based
on T-cell differentiation.
In support of the HHAD, we found that proxies
for immune stimulation correlated to AD rates
across countries. These proxies included historical
disease prevalence, parasite stress, access to sanitation facilities, access to improved drinking-water
sources, IMR, GNI, GDP and urbanization. These
variables’ relationships to AD rate are independent
of countries’ age structures.
Further evidence supporting this hypothesis
from previous studies includes higher incidence of
AD in developed compared to developing countries,
comparison to incidence patterns of other disorders
with similar immunobiology and consideration of
the etiology of inflammation in AD.
Higher Alzheimer’s risk in industrialized
countries and urban environments
People living in developed compared to developing
countries have higher rates of AD. AD incidence at
age 80 is higher in North America and Europe than
in other countries [51]. A meta-analysis found
that dementia incidence doubled every 5.8 years in
high income countries and every 6.7 years in lowand middle-income countries, where the overall
incidence of dementia was 36% lower [39] (but
see [61]). Another meta-analysis found that agestandardized AD prevalences in Latin America,
China and India were all lower than in Europe,
and within those regions, lower in rural compared
to urban settings [62]. In a meta-analysis of
Asian countries, the wealthier ones had higher
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Figure 2. Countries’ historical disease prevalence negatively correlated to Alzheimer’s burden 2004
179
180
| Fox et al.
Evolution, Medicine, and Public Health
Countries with more of the population living in urban areas in 1960 have higher age-standardized rates of Alzheimer’s in 2004.
N = 187, R2 = 0.282, P < 0.0000. Urbanization data [57] are transformed by adding a constant and taking the square root. The
Alzheimer prevalence variable is transformed by adding a constant and taking the natural log of 2004 Alzheimer age-standardized
DALY [36]
age-standardized rates of AD prevalence than
the poorer countries [63]. A meta-analysis of
rural versus urban AD incidence demonstrated
that, overall, rural living was associated with higher
incidence, but when only high-quality studies were
considered, rural living was associated with reduced
incidence [47].
In developing countries and rural environments,
there are higher rates of microbial diversity,
exposure and infection. It is well-documented
that rates of atopies including allergies, hay fever,
eczema and asthma are lower in developing than
developed countries [12, 64], as are autoimmunities
such as multiple sclerosis, type-1 diabetes mellitus
and Crohn’s disease [65], and the same pattern
holds for rural versus urban environments [2, 14,
15] (but see [66]). We found that wealthier countries
(Figs S3 and S4) and more urbanized countries
(Fig. 2) had greater AD rates after adjustment for
population age-structures.
Alzheimer’s risk changes with environment
Previous studies have found that individuals from
similar ethnic backgrounds living in low versus high
sanitation environments exhibit low versus high risk
of AD. African Americans in Indiana had higher agespecific mortality-adjusted incidence rates of AD
than Yoruba in Nigeria [67], and while there is no
indication that the sampled African Americans had
any Yoruba or even Nigerian heritage, the results
are at least consistent with our predictions for the
HHAD. There is also evidence that immigrant populations exhibit AD rates intermediate between their
home country and adopted country [63, 68–70].
Moving from a high-sanitation country to a
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Figure 3. Countries’ urbanization 1960 positively correlated to Alzheimer’s burden 2004
Fox et al. |
Hygiene and Alzheimer’s epidemiology
For each year (x), the regression coefficient (y) of the correlation between year’s (x) IMR and AD burden in 2004. IMR for the
various years were transformed by square root and natural log. See Supplementary Section S5. All correlations were significant
besides the years 1900, 1901 and 1911. Significance: 1902–19, P < 0.05; 1920–44, P < 0.01; 1945–2002, P < 0.000
low-sanitation country can decrease immigrants’ AD
risk [47] [e.g. Italy to Argentina [71] (but see [72])].
Our results are consistent with the possibility that
living in different countries confers different AD risk,
stratified by sanitation (Figs 1 and 2).
Alzheimer’s risk and family size
Having relatively more siblings would be expected
to correspond to higher rates of early-life immune
activation because of more interaction with
other children who may harbor ‘friendly’ commensal
microorganisms. Individuals with more siblings
have lower atopy prevalence [7, 18, 20]. Certain
studies have found that individuals with AD have
fewer siblings than controls [73, 74], but Moceri
and coworkers found no difference [75], and the
opposite [76].
Younger siblings compared to first-borns would
be expected to have higher rates of microorganism
exposure because this would indicate higher proportion of childhood spent in contact with siblings.
Later-born siblings have lower rates of atopy [7, 20,
66, 77, 78]. No birth order effect has been observed
for AD [79].
Lymphocytes and Alzheimer’s
T cells are important modulators of immune function and have been identified as the major affected
system in trends attributable to the hygiene hypothesis [8]. TReg cells become more abundant with age
in healthy subjects [80]. Whether the proliferation
of T cells in AD [80, 81] is effector or regulatory has
been the subject of controversy. Recently, Larbi et al.
re-analyzed their earlier postulation that TReg cells
may be upregulated in AD [82]. When a marker for
TRegs, FoxP3 [83], was considered, the authors
determined that the upregulation occurring was proliferation of activated effector T cells [84], consistent
with our predictions for an HHAD.
One study found that TReg cells were increased in
mild cognitive impairment (MCI) and TReg-induced
immunosuppression was stronger in MCI than in
AD or controls [85]. This could be evidence that
among people with predisposition toward AD (i.e.
genetic or environmental risk factors), those with
adequate TReg function may merely develop MCI,
while those with inadequate TReg function may
develop AD. Also, it may be that during AD pathogenesis, those with adequate TReg function may
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Figure 4. Contemporary environment may be better indicator of Alzheimer’s burden than early-life environment
181
182
| Fox et al.
Evolution, Medicine, and Public Health
ApoE
ApoE-e4 is an allele that has pro-inflammatory
effects, and increases BBB permeability [95].
Compromise of the BBB would make other proinflammatory mechanisms become exacerbated risk
factors for AD, when otherwise, inflammation might
have been limited to the periphery. There is already
evidence that ApoE alleles confer different degrees
of AD risk in different environments. While the e4 allele was an AD risk factor among African Americans,
this was not the case among the Yoruba in Nigeria
[96], nor is e4 associated with increased risk among
Nyeri Kenyans, Tanzanians [97], Wadi Ara Arab
Israelis [98], Khoi San [99], Bantu and Nilotic
African cohorts [100]. These patterns support the
possibility that environmental factors such as microbial diversity interact with inflammatory pathways to
influence AD risk.
Darwinian medicine
One of the major aims of Darwinian medicine is
understanding human health and disease within
the context of our species’ evolutionary history
[101]. In their formative 1991 paper, Williams and
Nesse discussed the promise for Darwinian medicine to make strides in aging research [102]. There is
growing enthusiasm for Darwinian medicine
approaches to understanding aging [103], specifically neurodegeneration [104] and more specifically
AD (Supplementary Section S7).
The hygiene hypothesis is an important contribution from Darwinian medicine [52]. Recently,
authors have suggested extending the hygiene hypothesis toward explaining obesity [105] and certain
cancers [106, 107]. Thus far, the hygiene hypothesis
has been discussed mostly in the context of consequences for early and mid-life pathologies, and
we feel that more attention should be paid to its
potential to explain patterns of disease in later life.
Study limitations
Some critics suggest that age-weighting is not reflective of social values [42, 108]. A systematic review
of GBD methodologies determined that the WHO’s
report [42] represented a rigorously comprehensive
methodology, but is still limited by the fact that many
developing countries use paper-based health surveillance based more on estimates and projections
than actual counts. It is also possible that certain
predictive variables, especially urbanization, are
related to surveillance accuracy.
Large-scale epidemiological studies often suffer
from lack of surveillance and statistical limitations.
Much of public health and epidemiological research
is based on correlational studies, inherently
limited by the inability to demonstrate causality.
Nonetheless, these types of investigation provide
necessary perspective of environmental influences
on biological mechanisms, and help evaluate public
health burden and predict future healthcare needs.
Importance and applications
As AD becomes an increasingly global epidemic,
there is growing need to be able to predict AD rates
across world regions in order to prepare for the
future public health burden [39]. It is in low- and
middle-income countries that the sharpest rise is
predicted to occur in the coming decades [109].
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
linger in the MCI phase for longer, while those with
inadequate TReg function may progress to AD more
rapidly.
It has already been well established that insufficient TReg numbers lead to excessive TH1 inflammation in autoimmunity and TH2 inflammation in atopy
[52, 86]. The inflammation characteristic of AD is
TH1 dominant [29, 87, 88]. In AD, amyloid-b
activates microglia and astrocytes, which stimulate
preferential TH1 proliferation [89], and AD patients
have elevated levels of TH1-associated cytokines
[90–92]. Non-steroidal anti-inflammatory drugs
have been demonstrated to be protective against
AD [93].
T cells may influence AD pathogenesis not only
from within the brain but also from the periphery.
Activated TH1 cells in the periphery could secrete
pro-inflammatory cytokines, which cross the blood–
brain barrier (BBB) and directly activate microglia
and astrocytes in the brain, as well as indirectly induce inflammation by activating dendritic cells [89].
It should be investigated whether hygiene directly
affects the development of microglia. Microglia
sometimes exhibit a non-inflammatory phenotype,
in contrast to the pro-inflammatory phenotype
typical of the activated state in the context of
AD and other brain insults [5, 94], It is unknown
whether hygiene would promote development of
the inflammatory or non-inflammatory phenotypes
[5], and further research is needed to establish
the effect of microbial deprivation on microglial
development.
Fox et al. |
Hygiene and Alzheimer’s epidemiology
We suggest that ecological changes that will occur in
low- and middle-income countries as they become
more financially developed may affect AD rates in
ways not currently appreciated. Other authors have
also discussed the ways in which infectious disease
may affect patterns of non-communicable chronic
disease [110]. Disease predictions affect public
policy, healthcare prioritization, research funding
and resource allocation [111]. Better methods for
estimating AD rates could affect these policies and
strategies [112].
The next step in testing the HHAD should be
to directly investigate patients’ immune activity
throughout life and AD risk, preliminarily in a
cross-sectional study utilizing medical records,
serum analysis, or reliable interview techniques.
nutrition.
Proc
Natl
Acad
Sci
USA
2010;
107(Suppl. 1):1718–24.
7. Strachan DP. Hay fever, hygiene, and household size.
Br Med J 1989;299:1259–60.
8. Rook GAW (ed.). Introduction: The changing microbial
environment, Darwinian medicine and the hygiene hypothesis,
The
Hygiene
Hypothesis
and
Darwinian
Medicine. Springer: Basel, Switzerland, 2009, 1–27.
9. Lynch NR, Hagel I, Perez M et al. Effect of anthelmintic
treatment on the allergic reactivity of children in a tropical
slum. J Allergy Clin Immunol 1993;92:404–11.
10. Von Mutius E, Fritzsch C, Weiland SK et al. Prevalence of
asthma and allergic disorders among children in united
Germany: a descriptive comparison. Br Med J 1992;305:
1395–9.
11. Cookson WOCM, Moffatt MF. Asthma—an epidemic in
the absence of infection? Science 1997;275:41–2.
12. Romagnani S. The increased prevalence of allergy and the
Supplementary data are available at EMPH online.
immune suppression, or both? Immunology 2004;112:
352–63.
13. Leynaert B, Neukirch C, Jarvis D et al. Does living on a farm
during childhood protect against asthma, allergic rhinitis,
acknowledgements
and atopy in adulthood? Am J Respir Crit Care Med 2001;
164:1829–34.
The authors would like to thank the Gates Cambridge Trust
14. von Mutius E, Vercelli D. Farm living: effects on child-
and Tom LeDuc. They also appreciate the insights of the
hood asthma and allergy. Nat Rev Immunol 2010;10:
associate editor and anonymous reviewers who handled this
861–8.
manuscript.
15. Kilpelainen M, Terho E, Helenius H et al. Farm environment in childhood prevents the development of allergies.
funding
Clin Exp Allergy 2000;30:201–8.
16. Haby MM, Marks GB, Peat JK et al. Daycare attend-
M.F. is supported by Gonville & Caius College, Cambridge.
ance before the age of two protects against atopy
C.L.F. is supported by ESRC grant ES/I031022/1.
in preschool age children. Pediatr Pulmonol 2000;30:
377–84.
Conflict of interest: None declared.
17. Kramer U, Heinrich J, Wjst M et al. Age of entry to day
nursery and allergy in later childhood. Lancet 1999;353:
references
450–4.
18. Von Mutius E, Martinez FD, Fritzsch C et al. Skin test re-
1. Rook GAW. Hygiene hypothesis and autoimmune
diseases. Clin Rev Allergy Immunol 2012;42:5–15.
activity and number of siblings. Br Med J (Clin Res Ed)
1994;308:692–5.
2. Rook GAW. The hygiene hypothesis and the increasing
19. Westergaard T, Rostgaard K, Wohlfahrt J et al. Sibship
prevalence of chronic inflammatory disorders. Trans R
characteristics and risk of allergic rhinitis and asthma.
Soc Trop Med Hyg 2007;101:1072–4.
3. D’Andrea MR. Add Alzheimer’s disease to the list of
autoimmune diseases. Med Hypotheses 2005;64:458–63.
4. D’Andrea MR. Evidence linking neuronal cell death to
autoimmunity in Alzheimer’s disease. Brain Res 2003;
982:19–30.
Am J Epidemiol 2005;162:125–132.
20. Matricardi PM, Franzinelli F, Franco A et al. Sibship size,
birth order, and atopy in 11,371 Italian young men. J Allergy
Clin Immunol 1998;101:439–44.
21. Strachan DP, Taylor EM, Carpenter RG. Family structure,
neonatal infection, and hay fever in adolescence. Arch Dis
medicine and the hygiene hypothesis in Alzheimer patho-
Child 1996;74:422–6.
22. Apter AJ. Early exposure to allergen: is this the cat’s meow,
genesis? In: Graham AWR (ed.) The Hygiene Hypothesis
or are we barking up the wrong tree? J Allergy Clin Immunol
5. Griffin WST, Mrak RE. Is there room for Darwinian
and Darwinian Medicine. Springer: Basel, Switzerland,
2009, 257–78.
6. Finch CE. Evolution of the human lifespan and
diseases of aging: roles of infection, inflammation, and
2003;111:938–46.
23. Abbas AK, Lohr J, Knoechel B. Balancing autoaggressive
and protective T cell responses. J Autoimmun 2007;28:
59–61.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
hygiene hypothesis: missing immune deviation, reduced
supplementary data
183
184
| Fox et al.
Evolution, Medicine, and Public Health
24. Yazdanbakhsh M, Kremsner PG, Van Ree R. Allergy,
parasites, and the hygiene hypothesis. Science 2002;296:
490–4.
25. Chang JS, Wiemels JL, Buffler PA. Allergies and childhood
leukemia. Blood Cells Mol Dis 2009;42:99–104.
Evidence for Health Policy. World Health Organization:
Geneva, 2001.
42. Polinder S, Haagsma JA, Stein C et al. Systematic review
of general burden of disease studies using disabilityadjusted life years. Popul Health Metr 2012;10:1–15.
26. Vukmanovic-Stejic M, Zhang Y, Cook JE et al. Human
43. Mathers CD, Lopez AD, Murray CJ. The burden of disease
CD4+ CD25hi Foxp3+ regulatory T cells are derived by
and mortality by condition: data, methods, and results for
rapid turnover of memory populations in vivo. J Clin
2001. In: Lopez AD, Mathers CD, Ezzati M, Jamison DT
Invest 2006;116:2423–33.
27. Sakaguchi S, Miyara M, Costantino CM et al. FOXP3+
regulatory T cells in the human immune system. Nat Rev
Immunol 2010;10:490–500.
28. Pellicano M, Larbi A, Goldeck D et al. Immune profiling of
Alzheimer patients. J Neuroimmunol 2011;242:52–9.
et al. (eds) Global Burden of Disease and Risk Factors. World
Bank: Washington, DC, 2006, 45–241.
44. Williams A. Calculating the global burden of disease: time
for a strategic reappraisal? Health Econ 1999;8:1–8.
45. Mathers C, Fat DM, Boerma J. The Global Burden of Disease:
29. Schwarz MJ, Chiang S, Muller N et al. T-helper-1 and
2004 Update. World Health Organization: Geneva, 2008.
46. World Health Organization. Information on Estimation
T-helper-2 responses in psychiatric disorders. Brain
Methods. Global Health Observatory (GHO) 2013. http://
Behav Immun 2001;15:340–70.
www.who.int/gho/ncd/methods/en/ (2 August 2013,
30. Prescott SL. Promoting tolerance in early life: pathways
date last accessed).
47. Russ TC, Batty GD, Hearnshaw GF et al. Geographical
31. Smith M, Tourigny MR, Noakes P et al. Children with egg
variation in dementia: systematic review with meta-
allergy have evidence of reduced neonatal CD4+ CD25+
analysis. Int J Epidemiol 2012;41:1012–32.
48. World Health Organization. Mortality and Burden of
CD127 lo/- regulatory T cell function. J Allergy Clin
Immunol 2008;121:1460–6.
32. Nesse RM, Williams GC. Allergy. Why We Get Sick:
The New Science of Darwinian Medicine. Chapter 11.
New York: Random House, 1996, 158–70.
Disease Estimates for WHO Member States in 2004.
Geneva: WHO, 2009. http://www.who.int/healthinfo/
global_burden_disease/estimates_country/en/index.
html (29 July 2013, date last accessed).
33. Gregg R, Smith C, Clark F et al. The number of human
49. Kalaria RN, Maestre GE, Arizaga R et al. Alzheimer’s
peripheral blood CD4+ CD25high regulatory T cells
disease and vascular dementia in developing countries:
increases with age. Clin Exp Immunol 2005;140:540–6.
prevalence, management, and risk factors. Lancet Neurol
34. Caetano Faria AM, Monteiro de Moraes S, Ferreira de
2008;7:812–26.
Freitas LH et al. Variation rhythms of lymphocyte subsets
50. Brookmeyer R, Gray S, Kawas C. Projections of
during healthy aging. Neuroimmunomodulation 2008;15:
Alzheimer’s disease in the United States and the public
365–79.
health impact of delaying disease onset. Am J Public Health
35. Ahmad OB, Boschi-Pinto C, Lopez AD et al. Age
Standardization of Rates: A New WHO Standard. World
Health Organization: Geneva, 2001.
1998;88:1337–42.
51. Ziegler-Graham K, Brookmeyer R, Johnson E et al.
Worldwide variation in the doubling time of Alzheimer’s
36. World Health Organization. Burden of Disease. Health
disease incidence rates. Alzheimers Dement 2008;4:
Statistics and Health Information Systems: Disease and
316–23.
52. Rook G. 99th Dahlem conference on infection, inflamma-
Injury Country Estimates. Geneva: WHO, 2009. http://
www.who.int/healthinfo/global_burden_disease/
tion and chronic inflammatory disorders: Darwinian
estimates_country/en/index.html (29 July 2013, date last
medicine and the ‘‘hygiene’’ or ‘‘old friends’’ hypothesis.
accessed).
Clin Exp Immunol 2010;160:70–9.
37. World Health Organization. Deaths and DALYs 2004:
53. Nesse RM, Williams GC. Why We Get Sick: The
Analysis Categories and Mortality Data Sources (Annex C).
New Science of Darwinian Medicine. Random House:
Global Burden of Disease: 2004 Update 2009. http://www.
New York, 1996.
54. Murray DR, Schaller M. Historical prevalence of infectious
who.int/ (29 August 2013, date last accessed).
38. Alzheimer’s Association. 2012 Alzheimer’s disease facts
and figures. Alzheimer Dement 2012;8:131–68.
39. Sosa-Ortiz AL, Acosta-Castillo I, Prince MJ. Epidemiology
of dementias and Alzheimer’s disease. Arch Med Res 2012;
43:600–8.
40. Ince P. Pathological correlates of late-onset dementia in a
multicentre, community-based population in England and
Wales. Lancet 2001;357:169–75.
41. Mathers C, Vos T, Lopez A et al. National Burden of Disease
Studies: A Practical Guide. WHO Global Program on
diseases within 230 geopolitical regions: A tool for
investigating origins of culture. J Cross Cult Psychol 2010;
41:99–108.
55. Fincher CL, Thornhill R. Parasite-stress promotes in-group
assortative sociality: the cases of strong family ties and
heightened religiosity. Behav Brain Sci 2012;35:61–79.
56. World Health Organization. MDG 7: Environment
Sustainability. Global Health Observatory Data Repository,
2011. http://apps.who.int/ghodata (2 August 2013, date
last accessed).
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
and pitfalls. Curr Allergy Clin Immunol 2008;21:64–9.
Fox et al. |
Hygiene and Alzheimer’s epidemiology
57. The World Bank. World Development Indicators, 2011.
74. Alafuzoff I, Almqvist E, Adolfsson R et al. A comparison of
http://data.worldbank.org (29 July 2013, date last
multiplex and simplex families with Alzheimer’s disease/
accessed).
senile dementia of Alzheimer type within a well defined
58. Abouharb MR, Kimball AL. A new dataset on infant mortality rates, 1816–2002. J Peace Res 2007;44:743–754.
for
Economic
Co-operation
and
Development (OECD): Paris, 2003.
neuropathogenic factors in HIV infection: pathogenic
similarities to Alzheimer’s disease. J Neuropathol Exp
Neurol 1994;53:231–8.
data and birth certificates to reconstruct the early-life
opment of Alzheimer’s disease. Epidemiology 2001;12:
383–9.
76. Moceri VM, Kukull W, Emanuel I et al. Early-life risk factors
and the development of Alzheimer’s disease. Neurology
61. World Health Organization. Dementia: A Public Health
WHO–Alzheimer’s
75. Moceri VM, Kukull WA, Emanual I et al. Using census
socioeconomic environment and the relation to the devel-
60. Stanley LC, Mrak RE, Woody RC et al. Glial cytokines as
Priority.
population. J Neural Transm Park Dis Dement Sect 1994;
7:61–72.
59. Maddison A. The World Economy: Historical Statistics.
Organisation
Disease
International:
Geneva, 2012.
62. Rodriguez JJL, Ferri CP, Acosta D et al. Prevalence of dementia in Latin America, India, and China: a populationbased cross-sectional survey. Lancet 2008;372:464–74.
2000;54:415–20.
77. Strachan DP, Harkins LS, Johnston IDA et al. Childhood
antecedents of allergic sensitization in young British
adults. J Allergy Clin Immunol 1997;99:6–12.
78. Jarvis D, Chinn S, Luczynska C et al. The association of
family size with atopy and atopic disease. Clin Exp Allergy
1997;27:240–5.
Interethnic differences in dementia epidemiology: global
79. Knesevich JW, LaBarge E, Martin RL et al. Birth order and
and Asia-Pacific perspectives. Dement Geriatr Cogn Disord
maternal age effect in dementia of the Alzheimer type.
2010;30:492–8.
Psychiatry Res 1982;7:345–50.
64. Beasley R. Worldwide variation in prevalence of symptoms
80. Rosenkranz D, Weyer S, Tolosa E et al. Higher frequency of
of asthma, allergic rhinoconjunctivitis, and atopic
regulatory T cells in the elderly and increased suppressive
eczema: ISAAC. Lancet 1998;351:1225–32.
65. Bach JF. The effect of infections on susceptibility to
autoimmune and allergic diseases. New Engl J Med
2002;347:911–20.
66. Tallman PS, Kuzawa C, Adair L et al. Microbial exposures in
activity in neurodegeneration. J Neuroimmunol 2007;188:
117–27.
81. Togo T, Akiyama H, Iseki E et al. Occurrence of T cells in
the brain of Alzheimer’s disease and other neurological
diseases. J Neuroimmunol 2002;124:83–92.
infancy predict levels of the immunoregulatory cytokine
82. Larbi A, Pawelec G, Witkowski JM et al. Dramatic
interleukin-4 in filipino young adults. Am J Hum Biol
shifts in circulating CD4 but not CD8 T cell subsets
2012;24:446–53.
67. Hendrie HC, Osuntokun BO, Hall KS et al. Prevalence of
in mild Alzheimer’s disease. J Alzheimers Dis 2009;17:
91–103.
Alzheimer’s disease and dementia in two communities:
83. Vaeth M, Schliesser U, Muller G et al. Dependence on
Nigerian Africans and African Americans. Am J Psychiatry
nuclear factor of activated T-cells (NFAT) levels discrim-
1995;152:1485–92.
inates conventional T cells from Foxp3+regulatory T cells.
68. Yamada T, Kadekaru H, Matsumoto S et al. Prevalence of
dementia in the older Japanese-Brazilian population.
Psychiatry Clin Neurosci 2002;56:71–5.
69. Graves A, Larson E, Edland S et al. Prevalence of dementia
Proc Natl Acad Sci 2013;109:16258–63.
84. Pellicano M, Larbi A, Goldeck D et al. Immune
profiling of Alzheimer patients. J Neuroimmunol 2012;
242:52–9.
and its subtypes in the Japanese American population of
85. Saresella M, Calabrese E, Marventano I et al. PD1 negative
King County, Washington State The Kame Project. Am J
and PD1 positive CD4+T regulatory cells in mild cognitive
Epidemiol 1996;144:760–71.
70. Hendrie HC, Ogunniyi A, Hall KS et al. Incidence of
impairment and Alzheimer’s disease. J Alzheimers Dis
dementia and Alzheimer disease in 2 communities.
86. Poojary KV, Yi-chi MK, Farrar MA. Control of Th2-mediated
JAMA 2001;285:739–47.
71. Roman G. Gene-environment interactions in the ItalyArgentina ‘‘Colombo 2000’’ project. Funct Neurol 1998;
13:249–52.
72. Adelman S, Blanchard M, Rait G et al. Prevalence of
dementia in African-Caribbean compared with UK-born
White older people: two-stage cross-sectional study. Br J
Psychiatry 2011;199:119–25.
73. Kim J, Stewart R, Shin I et al. Limb length and dementia in
an older Korean population. J Neurol Neurosurg Psychiatry
2010;21:927–38.
inflammation by regulatory T cells. Am J Pathol 2010;177:
525–31.
87. Remarque E, Bollen E, Weverling-Rijnsburger A et al.
Patients with Alzheimer’s disease display a pro-inflammatory phenotype. Exp Gerontol 2001;36:171–6.
88. Singh V, Mehrotra S, Agarwal S. The paradigm of Th1 and
Th2 cytokines. Immunol Res 1999;20:147–161.
89. Town T, Tan J, Flavell RA et al. T-cells in Alzheimer’s
disease. Neuromolecular Med 2005;7:255–64.
90. Heneka MT, O’Banion MK. Inflammatory processes in
Alzheimer’s disease. J Neuroimmunol 2007;184:69–91.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
63. Venketasubramanian N, Sahadevan S, Kua E et al.
2003;74:427–32.
185
186
| Fox et al.
Evolution, Medicine, and Public Health
91. Swardfager W, Lanctot K, Rothenburg L et al. A metaanalysis of cytokines in Alzheimer’s disease. Biol
Psychiatry 2010;68:930–41.
102. Williams GC, Nesse RM. The dawn of Darwinian
medicine. Q Rev Biol 1991;66:1–22.
103. Williams G. Pleiotropy, natural selection, and the
92. Huberman M, Shalit F, Roth-Deri I et al. Correlation of
evolution of senescence. Evolution 1957;11:398–411.
cytokine secretion by mononuclear cells of Alzheimer
104. Frank SA. Somatic evolutionary genomics: mutations
patients and their disease stage. J Neuroimmunol 1994;
52:147–52.
93. McGeer PL, Schulzer M, McGeer EG. Arthritis and anti-
during development cause highly variable genetic mosaicism with risk of cancer and neurodegeneration.
Proc Natl Acad Sci 2010;107(Suppl. 1):1725–30.
inflammatory agents as possible protective factors for
105. Isolauri E, Kalliomaki M, Rautava S et al. Obesity—
94. Morgan D. Modulation of microglial activation state fol-
extending the hygiene hypothesis. In: Microbial–Host
Interaction: Tolerance versus Allergy, Vol. 64. Nestle´
lowing passive immunization in amyloid depositing
Nutrition Institute Workshop Series: Pediatric Program.
Alzheimer’s disease. Neurology 1996;47:425–32.
transgenic mice. Neurochem Int 2006;49:190–4.
University of Turku: Turku, Finland, 2010, 75–89.
95. Donahue JE, Johanson CE. Apolipoprotein E, amyloid[beta], and blood-brain barrier permeability in
Alzheimer disease. J Neuropathol Exp Neurol 2008;67:
2010;126:1651–65.
107. Erdman SE, Poutahidis T. Roles for inflammation and
261–70.
96. Osuntokun BO, Sahota A, Ogunniyi A et al. Lack of an
between
apolipoprotein
E
e4
and
Alzheimer’s disease in elderly Nigerians. Ann Neurol
2004;38:463–5.
97. Sayi J, Patel N, Premukumar D et al. Apolipoprotein E
polymorphism in elderly east Africans. East Afr Med J
1997;74:668–70.
regulatory T cells in colon cancer. Toxicol Pathol 2010;
38:76–87.
108. Anand S, Hanson K. Disability-adjusted life years: a
critical review. J Health Econ 1997;16:685–702.
109. Prince M, Jackson J. World Alzheimer’s Report
Executive Summary. Alzheimer’s Disease International
2009, 1–24.
98. Farrer LA, Friedland RP, Bowirrat A et al. Genetic and
110. Remais JV, Zeng G, Li G et al. Convergence of non-
environmental epidemiology of Alzheimer’s disease in
communicable and infectious diseases in low- and
Arabs residing in Israel. J Mol Neurosci 2003;20:207–12.
middle-income countries. Int J Epidemiol 2013;42:221–7.
99. Sandholzer C, Delport R, Vermaak H et al. High frequency
111. Lopez AD, Mathers CD, Ezzati M et al. Measuring the
of the apo e4 allele in Khoi San from South Africa. Hum
global burden of disease and risk factors, 1990–2001.
Genet 1995;95:46–8.
100. Chen CH, Mizuno T, Elston R et al. A comparative study to
In: Lopez AD, Mathers CD, Ezzati M et al. (eds) Global
Burden of Disease and Risk Factors. World Bank:
screen dementia and APOE genotypes in an ageing East
African population. Neurobiol Aging 2010;31:732–40.
101. Nesse RM. How is Darwinian medicine useful? West J
Med 2001;174:358–60.
Washington, DC, 2006, 1–14.
112. Bloom BS, de Pouvourville N, Straus WL. Cost of illness
of Alzheimer’s disease: how useful are current estimates?
Gerontologist 2003;43:158–64.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
association
106. Erdman SE, Rao VP, Olipitz W et al. Unifying roles for
regulatory T cells and inflammation in cancer. Int J Cancer
3
Evolution, Medicine, and Public Health [2012] pp. 3–13
article
doi:10.1093/emph/eos003
Why are male malaria
parasites in such a rush?
Sex-specific evolution and
host–parasite interactions
1
Leiden Malaria Research Group, Department of Parasitology, LUMC, Albinusdreef 2, 2333 ZA Leiden, The Netherlands;
2
Centre for Immunity, Infection and Evolution, Institutes of Evolution, Infection and Immunity, School of Biological
Sciences, University of Edinburgh, Edinburgh EH9 3JT, UK; 3Division of Infection and Immunity, Institute of Biomedical
Life Sciences & Wellcome Centre for Molecular Parasitology, University of Glasgow, Glasgow G12 8TA, UK; 4Department
of Bioinformatics, Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Pawinskiego 5a, 02-106
Warszawa, Poland.
*Corresponding author. E-mail: sarah.reece@ed.ac.uk; tel:+44-131-650-5547; fax:+44-131-650-6564.
y
These authors contributed equally to this work.
Received 30 August 2012; revised version accepted 11 September 2012.
ABSTRACT
Background: Disease-causing organisms are notorious for fast rates of molecular evolution and the
ability to adapt rapidly to changes in their ecology. Sex plays a key role in evolution, and recent studies,
in humans and other multicellular organisms, document that genes expressed principally or exclusively
in males exhibit the fastest rates of adaptive evolution. However, despite the importance of sexual
reproduction for many unicellular taxa, sex-biased gene expression and its evolutionary implications
have been overlooked.
Methods: We analyse genomic data from multiple malaria parasite (Plasmodium) species and proteomic
data sets from different parasite life cycle stages.
Results: The accelerated evolution of male-biased genes has only been examined in multicellular taxa,
but our analyses reveal that accelerated evolution in genes with male-specific expression is also a
feature of unicellular organisms. This ‘fast-male’ evolution is adaptive and likely facilitated by the
male-biased sex ratio of gametes in the mating pool. Furthermore, we propose that the exceptional
rates of evolution we observe are driven by interactions between males and host immune responses.
Conclusions: We reveal a novel form of host–parasite coevolution that enables parasites to evade host
immune responses that negatively impact upon fertility. The identification of parasite genes with
accelerated evolution has important implications for the identification of drug and vaccine targets.
Specifically, vaccines targeting males will be more vulnerable to parasite evolution than those targeting
females or both sexes.
K E Y W O R D S : sex-specific selection; Plasmodium; host–parasite coevolution; gene expression
ß The Author(s) 2012. Published by Oxford University Press on behalf of the Foundation for Evolution, Medicine, and Public Health. This is an Open Access
article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits
unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Shahid M. Khan1,y, Sarah E. Reece2,*,y, Andrew P. Waters3, Chris J. Janse1 and
Szymon Kaczanowski4,y
4
| Reece et al.
Evolution, Medicine, and Public Health
INTRODUCTION
all life cycle stages, except for a brief zygote phase,
are haploid, and sex determination does not involve
sex chromosomes or regions of contiguous genes
[15]. Therefore, sexual dimorphism in Plasmodium
is generated solely by differential gene expression,
which is observed throughout the genome (Fig. S1).
Depending on population structure, the ratio of
male to female gametes may be male-biased in
Plasmodium, but because gametogenesis involves
only three more rounds of mitosis for each male
compared with each female, the opportunity for mutational bias to result in fast-male evolution is much
lower than for many multicellular organisms.
Here, we integrate advances in genomics and
proteomics to examine the evolutionary forces on
genes encoding features of Plasmodium sexual
cells. We compare divergence between closely
related pairs of species and reveal rapid, adaptive
evolution of male-biased genes in Plasmodium. The
consequences of host–parasite interactions for the
evolution of parasite sexual cells are poorly understood, but our analyses suggest a role for host
immune responses in driving ‘fast-male’ evolution.
Our findings demonstrate the necessity of an evolutionary framework for the development of medical
interventions that disrupt disease transmission by
preventing parasites from mating.
METHODOLOGY
We take advantage of proteomic data for malaria
parasites [16] to analyse the rate of change of
genes that are expressed in different life cycle
stages, across several Plasmodium species. First,
we test whether the rapid evolution of genes exclusively expressed in males occurs in unicellular parasites. Second, having found that male-biased genes
evolve more rapidly than female-biased genes, we
asked whether selective forces resulting from
host–parasite interactions could influence this
sex-specific evolution. Third, we investigated
whether the rapid evolution of male-biased genes
is due to the relaxation of selection constraints or
adaptive evolution.
Generation of data sets
Plasmodium parasites replicate asexually in the circulation of the host and must undergo a round of
sexual reproduction in the mosquito vector to be
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Explaining variation in rates of molecular evolution
is fundamental to evolutionary biology and is especially important for organisms that respond rapidly
to shifts in the environment, such as diseasecausing parasites and microbes that perform ecosystem services. In multicellular organisms, genes
expressed principally or exclusively in one sex evolve
at an accelerated rate compared with genes expressed in both sexes [1]. In particular, rapid
adaptive evolution occurs in male-specific genes
whose expression is associated with tissues and
traits that underpin mating success and fertility
[2–5]. This solution, to the problem of different
optimal phenotypes in males and females, has
been extensively documented for multicellular taxa
[6, 7], but whether males and females of dioecious
unicellular taxa are also subject to different, or
opposing, selection pressures has been overlooked.
This is surprising since sexual reproduction is an
obligate feature of the life cycles of many parasite
species.
There is increasing interest in the reciprocal
approach of using an evolutionary framework to
understand the biology of parasites and exploiting
the novel experimental opportunities provided by
these species to test the generality of evolutionary
theories. For example, blocking the fertility of
gametes to prevent sexual reproduction is a
priority target for the development of transmissionblocking interventions against malaria [8–12].
Understanding how selection shapes the evolution
of parasite sexes and mating systems is therefore
central to identifying the most ‘evolution-proof’
transmission-blocking targets and implementation
strategies. For evolutionary biology, distinguishing
between the footprints of natural and sexual selection pressures and identifying the biological mechanisms that drive the accelerated evolution
observed in male-biased genes has proved difficult.
For example, most males produce large numbers of
sperm, and so multiple rounds of mitosis could
result in a higher mutation rate in males than
females. Alternatively, in mammals, genes on the Y
chromosome could evolve rapidly simply because
the lack of repair during recombination facilitates
mutation [1], but X-linked genes could also evolve
quickly because recessive mutations are exposed to
selection in males [13, 14]. Using unicellular organisms such as malaria (Plasmodium) parasites overcomes many of these problems of inference because
Reece et al.
Fast-male malaria parasites
|
5
each proteome. Proteins are subdivided into putative membrane and non-membrane proteins. Numbers inside the circles refer to the number of putative
non-membrane proteins (solid background) and putative membrane proteins (shaded background) detected exclusively in proteomes of Males, Females, Asexual
Stages or detected in all three stages (All Stages). (B–D) Rates of evolution determined by comparing genes from each closely related pair of Plasmodium species.
The genes used for this analysis are the orthologs of the P. berghei genes identified in (A). P. berghei and P. yoelii (B); P. falciparum and P. reichenowi (C); P. vivax and
P. knowlesi (D).
transmitted [17, 18]. A small percentage of asexual
stages differentiate into sexual stages, termed gametocytes. Male and female gametocytes differentiate
into male and female gametes as soon as they are
taken up in a mosquito blood meal. Within
10–12 min, each male gametocyte undergoes three
rounds of mitosis, producing up to eight gametes,
and each female gametocyte differentiates into a
single gamete.
Male and female gametocytes each express a
specific and distinct set of proteins [16]. Using the
proteomes of males, females and asexual blood
stages, we distinguished Plasmodium berghei
genes based on their expression [unambiguous
identification of two or more high scoring
(MASCOT score >15) peptides] in different
stages during intra-erythrocytic development.
We classified genes (Fig. 1A; Table S1, PB) as
having male-biased expression (termed Male),
female-biased expression (Female), expression in
asexual and gametocyte stages (All Stages) and expression in asexual blood stages but not in gametocytes (Asexual Blood).
We identified orthologs of the P. berghei genes
using reciprocal BLAST for the following species:
Plasmodium yoelii, Plasmodium vivax, Plasmodium
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Figure 1. Stage-specific rates of evolution. Estimated as dN/dS, for genes expressed in sexual (male, female) and asexual blood stage malaria parasites. (A) Pattern
of ‘stage-specific’ expression of genes based on proteomes of P. berghei sexual and asexual blood stages. Red represents the total number of proteins identified in
6
| Reece et al.
Evolution, Medicine, and Public Health
knowlesi, Plasmodium reichenowi and Plasmodium falciparum (Table S1, PF and PV). We assume stage
specificity of gene expression is similar across
Plasmodium species, as previously suggested for
sporozoites [19], asexual stages [20] and gametocytes [21]. We also subdivided each of the four
categories into putative membrane and nonmembrane proteins, based on the presence of predicted signal peptides, transmembrane domains
and glycosylphosphatidylinositol anchors (Table
S1, PB, PF and PV).
Plasmodium berghei gene models
Protein selection
We used proteins identified by Khan et al. [16] and
describe the mapping of these data onto the new
release for P. berghei in Table S6. We assume that
membrane proteins contain more than one TM
segment or signal peptide, predicted using hidden
Markov models (http://www.cbs.dtu.dk/services/
SignalP/) [22]. Orthologous proteomes were predicted using reciprocal BLAST with e-value
<1 106 [23]. Sets of proteins containing
epitopes were obtained from PlasmoDB [24] as
were values of dN, dS and pN/pS for the alignment
of 3D7 versus Ghana isolates and the other P. falciparum strains (IT, DD2 and HB3).
Codon alignments
We used predicted sequences of P. berghei and
P. vivax and genomes of P. yoelii and P. knowlesi
obtained from PlasmoDB [23]. Genome regions containing orthologs of analysed sequences were predicted using the BLASTN program. Exons of genes
were predicted using the sim4 program [25].
Sequences of exons were aligned using the
dN/dS analysis
For each expression class, we compared the rate of
non-synonymous (dN) and synonymous (dS) nucleotide substitutions of the residue encoding sites of
the same gene from pairs of closely related malaria
parasites (Table S2). We calculated dN/dS between
the rodent malaria parasites P. berghei versus P.
yoelii, the human and non-human primate malaria
parasites P. falciparum versus P. reichenowi and P.
vivax versus P. knowlesi. We used these pairs of
closely related species to compare the speed of evolution of different protein classes. The evolution of
the gene classes we analyse occurred independently
in each pair, and our observations are supported
across all species pairs. Estimates of dN and dS
were each obtained using PAML version 3.13d,
under a codon-based model with average nucleotide
frequencies estimated from the data at each codon
position. We calculated dN/dS for genes for which the
alignment between the two species was longer
than 20 codons. For each pair of orthologous
genes, the value of dN/dS was estimated from the
ratio of maximum likelihood estimates of the
number of non-synonymous substitutions per nonsynonymous nucleotide site (dN) and the number of
synonymous substitutions per synonymous nucleotide site (dS). We also investigated whether any gene
models used in our analysis are fragments that have
been concatenated with others to form larger and
more accurate gene models in the 8X P. berghei
ANKA release. Of the 809 genes analysed, only
three (one expressed in All Stages and two expressed
in Males) are now predicted to be part of larger
genes. Furthermore, for comparison with the
results presented in Table S2, PB (for P. berghei
versus P. yoelii), we have repeated the analyses
using the new gene models and the results do not
differ (Table S6).
McDonald–Kreitman test
To formally test for a role of positive selection, we
used a modified version of the McDonald–Kreitman
test [27, 28]. The parameter refers to the fraction
of amino acid substitutions driven by positive
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
The gene models deposited in PlasmoDB are constantly being improved, and during the preparation
of this article, a new release of the P. berghei genome
became available. Therefore, we have adopted the
naming convention based on the 8X P. berghei
ANKA release 2010-06-01 (available at http://
plasmodb.org/plasmo/showXmlDataContent.do;
jsessionid=3A2D8DAC94289AB19ADA50C7810AD
CE?name=XmlQuestions.DataSources). The annotation for P. berghei was obtained, with permission,
from the Pathogen Sequencing Unit at the Wellcome
Trust Sanger Institute.
Needelman and Wunsch algorithm [26]. Codons
neighbouring the gaps were removed from alignments. All calculations were performed using perl
pipeline. Alignments containing stop codons or
<60 codons were removed.
Reece et al.
Fast-male malaria parasites
Statistical significance
Statistical significance was based on 10 000 mean
values (dN, dS, dN/dS and pN/pS) for two sets
bootstrapped independently and than compared.
For the McDonald–Kreitman test, we compared
frequencies using the DoFE-all method, which
returns a value of alpha and 95% credibility
intervals.
RESULTS
Fast-male evolution in Plasmodium
The P. berghei versus P. yoelii (P = 0.01) and P. falciparum versus P. reichenowi (P = 0.001) comparisons
revealed that Male genes are evolving significantly
faster than Female genes (Fig. 1B and C). A similar
trend, although not significant (P = 0.11), is
observed in the Male versus Female genes in the
P. vivax and P. knowlesi comparison (Fig. 1D).
Values for dN and dS for all comparisons are
plotted in Supplementary Fig. S2, and we obtain
qualitatively similar results when analysing dN
alone. Interestingly, in all comparisons (all
P > 0.08), Male genes show high average dN/dS
values that are comparable to those of Asexual
Blood genes. Proteins of Asexual Blood stages
have been shown to evolve rapidly due to selective
pressures from the host immune system [29]. As
expected from observations of multicellular organisms [4], genes commonly expressed (e.g. housekeeping proteins) in All Stages show significantly
lower dN/dS values than stage-specific genes, for all
comparisons (all P < 0.03).
Genes encoding proteins with a membrane
location in Asexual Blood stages have been
shown to accumulate mutations faster than
non-membrane proteins, which is thought to be
due to selective pressures resulting from host
immunity [29]. We tested whether the accumulation
of mutations was also greater for genes encoding
membrane proteins and particularly in genes
encoding Male proteins (Fig. S2A–C; Table S2, PB,
PF and PV). We find the following patterns (Fig. 2).
First, as expected, membrane proteins of Asexual
Blood and All Stages evolve faster than their
non-membrane proteins (P = 0.03 and P = 0.01, respectively, for P. berghei versus P. yoelii, and the
same but non-significant trend for the other
species comparisons). Second, for Females, genes
encoding membrane proteins evolve faster
(P < 0.01) than genes encoding non-membrane
proteins, except in the P. falciparum versus P.
reichenowi comparison where this trend is reversed
(P = 0.01). However, only eight genes are included in
the Female membrane subset for P. falciparum
versus P. reichenowi, giving this analysis the least
power (>20 genes are compared for the other
species pairs). Third, Male genes encoding
non-membrane genes are exceptional because
there is no significant difference in rates of evolution
between non-membrane and membrane proteins
(all comparisons, P > 0.22). Furthermore, genes
encoding Male non-membrane proteins are
evolving faster than genes encoding Female nonmembrane proteins (P < 0.01 for both P. berghei
versus P. yoelii and P. knowlesi versus P. vivax;
P = 0.01 for P. falciparum versus P. reichenowi). In
contrast, when comparing only genes encoding
membrane proteins, Males are not evolving significantly faster than Females for P. berghei versus P.
yoelii (P = 0.46), and Females are evolving faster
than Males for the P. falciparum versus P. reichenowi
(P = 0.04) and P. knowlesi versus P. vivax comparisons (P = 0.03).
7
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
selection which is estimated from polymorphism
and divergence. The effects of positive selection
() can be distinguished from purifying selection
and neutrality by comparing divergence (rate of
fixation of non-synonymous mutations; dN/dS)
between species to the observed relative rate of
non-synonymous polymorphisms (pN/pS) in
natural populations within a species. The data for
each gene comprise a 2 2 table with rows corresponding to putative selected or non-selected sites
and columns corresponding to polymorphism or
divergence. We applied a maximum likelihood
approach using the ‘distribution of fitness effects’
program (DoFE; [27, 28]). The DoFE-all method
uses a maximum likelihood approach that maximizes the number of genes that can be analysed
(even those with no polymorphism) and does not
sum DN, DS, PN and PS values across genes. This
is important for selecting genes under positive selection when signals are weak because it also avoids
underestimating when mildly deleterious mutations obscure divergence by inflating polymorphism. A negative value for indicates that there are
no amino acid substitutions driven by positive selection and that the majority of polymorphisms are
removed due to purifying selection.
|
8
| Reece et al.
Evolution, Medicine, and Public Health
Host–parasite interactions
Figure 2. Stage- and location-specific rates of evolution.
Estimated as dN/dS, for predicted non-membrane (solid
bars) and membrane (shaded bars) proteins expressed in
sexual (male, female) and asexual blood stage malaria parasites. Genes were classified according to their exclusive detection in P. berghei proteomes of Males, Females, Asexual Stages
or All Stages (see Fig. 1), The rates of evolution were
determined by comparison of genes of the following pairs of
Plasmodium species: P. berghei and P. yoelii (A); P. falciparum
and P. reichenowi (B); P. vivax and P. knowlesi (C).
Relaxation of constraints
Next, we investigated whether a relaxation of constraints resulting in genetic drift or the fixation of
deleterious mutations could explain the accelerated
evolution of genes coding for Male non-membrane
proteins. First, we compared the rate of synonymous
We next examined what aspects of parasite ecology
could explain the accelerated evolution of Male
compared with Female genes, and particularly the
exceptional rates of evolution in Male nonmembrane genes. When taken up in a blood meal,
male and female gametocytes must rapidly differentiate into gametes and mate. Host immune factors
(resulting from exposure to circulating gametocytes) are also taken up in the blood meal and can
reduce fertility, especially of male gametes, and
block transmission [30–32]. In the host, male and
female gametocytes are protected inside red blood
cells, but in the blood meal, gametocytes become
more vulnerable to immune factors because they
must exit red blood cells during gametogenesis. In
contrast to females, male gametogenesis is complex
and involves rapid DNA replication, flagella and
gamete construction, extrusion and detachment of
gametes [33–35]. Male gametes must then travel
through the blood meal to locate and fertilize
females.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
mutations (dS) according to whether gene expression is Male- or Female-biased and encodes
membrane or non-membrane proteins (Table S2,
PB, PK and PV). There are no significant differences
in dS when comparing genes coding for nonmembrane proteins of Females with both nonmembrane and membrane proteins of Males
(P > 0.09). This is also the case for genes coding
membrane proteins of Females when compared
with non-membrane and membrane proteins of
Males in the P. berghei versus P. yoelii comparison
(P > 0.1). Second, we tested whether the frequency
of nonsense mutations (i.e. premature termination
and stop codons) differs according to the four different expression categories: Male, Female, All
Stages and Asexual Blood. We also tested 789
genes of P. berghei and P. yoelii for which >20
codons were available (Table S3). A total of 10
(1.3%) contained nonsense mutations. The
number of nonsense mutations in Male and
Female genes do not significantly differ (P > 0.05)
from that expected by chance (1.3%), which is consistent with random occurrence of nonsense mutations. The lack of variation in dS and the random
occurrence of nonsense mutations suggest that
the rapid evolution of Male genes is not due to a
relaxation of constraints and thus that selection
may play a role.
Reece et al.
Fast-male malaria parasites
proteins experience greater levels of immune recognition than Female proteins and that the immunogenicity of Males is equivalent to Asexual Blood
stages.
We then tested whether the rate of nonsynonymous versus synonymous substitutions is
higher in genes encoding Male proteins containing
epitopes compared with those without epitopes. We
analysed the polymorphisms (pN/pS) in gene sequences for Male membrane and non-membrane
proteins, with and without epitopes, from the two
most comprehensively sequenced P. falciparum
strains/isolates (3D7 and Ghana; Table S4).
Consistent with other studies [34], we find that
pN/pS is high in Plasmodium and that genes
encoding Male proteins containing epitopes do
have a significantly higher pN/pS than Females for
both non-membrane (Fig. 3B; P = 0.01) and
membrane encoding genes (Fig. 3C; P < 0.01).
Furthermore, the proportion of non-synonymous
versus synonymous substitutions is particularly
high for Male non-membrane proteins containing
epitopes involved in cellular motor machinery
such as kinesins or dyneins (Table S4). This
suggests that interactions with the host immune
system shape genes encoding Male non-membrane
proteins.
Adaptive ‘fast-male’ evolution
Our analyses suggest that accelerated evolution of
Male-biased genes encoding both non-membrane
and membrane proteins is better explained by
selection pressures resulting from interactions
with host immunity than a relaxation of constraints.
To test if Male-biased genes are evolving under
positive selection, we used a modified version of
the McDonald–Kreitman test (the ‘DoFE-all
method’ [27, 28]) and data from natural isolates
of P. falciparum (3D7, Dd2, HB3, 7G8, D10, D6,
Santa Lucia, K1, RO-33, IT, FCC-2/Hainan,
Senegal, IGH-CR14 [41–43]) compared with P.
reichenowi, and we assume that mutations manifesting as polymorphism occur in more than four
isolates. We calculated the fraction of residues
fixed by positive selection () for all genes expressed
in Male, Female, All Stages and Asexual Blood
classes (Table 1), assuming that all nonsynonymous polymorphisms are neutral. When
non-synonymous polymorphisms are deleterious
and removed from the population, takes a
negative value that is not expressed as a proportion
9
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
The more complex activities of males are predicted to make them more vulnerable to host
immune factors taken up in the blood meal [36–
39]. This could occur because immune factors
interact with male proteins with both membrane
and non-membrane locations (non-membrane
proteins are not exposed in females) and/or
immune attack may occur for both sexes, but male
proteins may be more easily damaged and the consequences for fertility may be more severe. For
example, male fertility is reduced by exposure to
transmission-blocking immune factors, such as
reactive oxygen species, that affect intracellular
processes [40]. Interactions between male proteins
and immune factors may also be indirect; for
example, antibodies that agglutinate the surface
proteins of male gametes may select for faster performance of intracellular proteins involved in flagella
formation. Furthermore, senescence or damage
may result in male gametocytes inappropriately
being activated in the host circulation, which will
present both their intracellular and membrane
proteins to the host immune system (activated
females may only present membrane proteins).
To investigate whether interactions between parasites and host immune responses could drive the
accelerated evolution of genes encoding Male
proteins, we first tested whether Male proteins are
more immunogenic than those of Females. Because
the membrane or non-membrane location of
proteins is not a definitive predictor of exposure to
host immunity, we classified proteins based on
whether they are known immune epitopes. We
focused on P. falciparum because epitopes for this
species have been characterized and experimentally
confirmed. These data are available from the
Immune Epitope Database and Analysis Resource
(http://www.immuneepitope.org/;
Table
S4).
Specifically, we tested whether the proportion of
‘immune epitopes’ of P. falciparum proteins
differed between Males and Females and their predicted membrane or non-membrane location. This
analysis reveals that Male non-membrane proteins
have a significantly higher percentage of epitopes
compared with those of Females and All Stages
(Fig. 3A; Fisher exact test P = 0.045 and
P < 0.0001). Furthermore, the percentage of Male
non-membrane proteins with epitopes is comparable to that of Asexual Blood’ stages (Fisher exact
test P = 0.098). Similar, but non-significant trends
are present in the percent of Male membrane
proteins with epitopes. This suggests that Male
|
10
| Reece et al.
Evolution, Medicine, and Public Health
suggesting that the accelerated evolution of Male
genes is driven by positive selection. This is the
pattern observed in multicellular taxa: male-biased
mutations manifesting as polymorphisms are often
subject to positive selection [2–5].
DISCUSSION
pN/pS, for P. falciparum proteins with (solid bars) or without
predicted immune epitopes (shaded bars) expressed in
sexual (male, female) and asexual blood stage malaria parasites. (A) Percentage of proteins containing immune epitopes
identified from the Immune Epitope Database and Analysis
Resource. Genes were classified according to their exclusive
detection in P. berghei proteomes of Males, Females, Asexual
Stages or All Stages (see Fig. 1). The strength of diversifying
selection on P. falciparum predicted non-membrane (B) and
membrane (C) proteins.
(i.e. when is positive, it is expressed as a proportion, but when is negative, it can vary between 0
and infinity). For all expression categories—except
for Male genes— is negative (Table 1; Table S5),
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Figure 3. The strength of diversifying selection. Estimated as
Our analyses reveal that the accelerated evolution of
male-biased genes is not exclusive to multicellular
taxa and that sex-specific selection occurs in dioecious single-celled organisms without sex chromosomes. The biology of Plasmodium also sheds light
on how the rapid and adaptive evolution of
male-biased genes occurs. We can rule out a role
of sex chromosomes and mutational bias because
male gametogenesis only involves three mitoses.
However, the ratio of male to female gametes will
often be slightly male biased, depending on the
multiplicity of infections and presence of
transmission-blocking factors [37, 39]. This could
result in male gametes behaving as large population
(compared with females), resulting in faster removal
of deleterious mutations and rapid fixation of beneficial mutations associated with male performance.
Multicellular taxa have traditionally been the focus
for investigating sex-dependent rates of evolution
and their underlying causes, but we demonstrate
that Plasmodium parasites are also a useful model
for research in this area.
Rapid evolution of male traits is often expected to
be a response to male–male competition or adaptations to attract females [6, 44], but we show that
natural selection driven by host–parasite interactions can shape the evolutionary trajectories of
the sexes. How natural and sexual selection
interact to affect fitness in a changing environment
is a fundamental question in evolutionary biology
and has important implications for adaptation
and speciation [45]. Studies revealing fast-male evolution in multicellular taxa have implicated sexual
selection as a driver [1], but processes such as
intra-sex competition, sexual antagonism (conflict)
and natural selection can all contribute to driving
sex-dependent selection [46]. While our results
show that natural selection pressures, in the form
of host immune factors, shape sex-dependent selection in Plasmodium, sexual selection could also
make a contribution. Plasmodium populations
span from clonal to genetically diverse [47] and infections containing multiple con-specific genotypes
provide the opportunity for ‘sperm’ competition
Reece et al. |
Fast-male malaria parasites
11
Table 1. Adaptive Male evolution
Number of genes
Male
All
Membrane excluded
Female
All
Membrane excluded
Asexual blood
All
Membrane excluded
All stages
All
Membrane excluded
Genes with frequent polymorphisms
; for frequent polymorphisms
163
139
21
17
0.07
0.26
74
57
8
5
0.94
0.41
137
109
25
19
0.52
0.79
237
211
18
15
5.35
7.07
between unrelated males. This could be evaluated by
comparing male-biased rates of evolution
across populations with different inbreeding rates.
Another approach lies in the potential offered by
unicellular organisms to genetically manipulate
fast evolving genes to test whether their functions
are adaptations for immune evasion or sperm
competition.
How the interplay between phenotypic plasticity
and microevolution shapes phenotypes is a key
question in evolutionary biology [48] and has implications for the development of transmissionblocking interventions [49, 50]. Evolutionary theory
predicts that malaria parasites facultatively increase
their investment in males during periods in infections when the host produces immune factors that
reduce the fertility of males more than females [36,
51]. By revealing that males are more immunogenic
than females, we support this theory and show that,
in addition to evading host immunity through
phenotypic plasticity in sex ratios [52], parasites
can also respond with rapid microevolution.
Moreover, by adjusting sex ratios to produce more
males when conditions for mating are unfavourable
[36, 51], parasites may benefit from maximizing both
their immediate mating success and the potential
for adaptive evolution in response to the conditions
they experience. There is a drive to develop
transmission-blocking interventions that, when administered to hosts, kill gametocytes (e.g.
gametocidal drugs) or produce immune responses
that are taken up in the blood meals of vectors
and prevent parasites from mating (e.g. by
vaccinating against the antigens of one or both
sexes). Our results suggest that a vaccine harnessing host immunity to target males is unlikely to
be a novel selection pressure for malaria parasites
and that phenotypic plasticity and rapid microevolution could quickly undermine such an
intervention.
Conclusions and implications
Because male-biased genes evolve at an accelerated
rate, our results predict that interventions specifically targeting males are more vulnerable to parasite
counter-evolution than interventions targeting
female or universally expressed genes. Furthermore,
our approach, coupled with phenotypic analyses,
provides a powerful way to assess the evolutionary
potential of candidate antigens and identify slower
evolving targets.
supplementary data
Supplementary data are available at EMPH online.
acknowledgements
We thank P.R. Haddrill, D.J. Obbard and R.S. Ramiro for
discussions and D.L. Hartl for advice. Thanks to M.
Berriman and colleagues at the Wellcome Trust Sanger
Institute for permitting us to use the latest iteration of the
P. berghei genome and to A Eyre-Walker for advice on using
DoFE software.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Estimates of the fraction () of fixed non-synonymous polymorphism due to positive selection, with or without restriction to frequent polymorphisms.
Natural isolates of P. falciparum (Dd2, HB3, 7G8, D10, D6, Santa Lucia, K1, RO-33, IT, FCC-2/Hainan, 3D7, Senegal, IGH-CR14) and P. reichenowi are
compared. We classify frequent polymorphisms as those observed in four or more isolates.
12
| Reece et al.
Evolution, Medicine, and Public Health
funding
S.E.R. is supported by a Wellcome Trust Fellowship
(WT082234MA); S.K. by a Columb fellowship from
the Foundation for Polish Science and supporting
grant (1/722/N-COST/2010/0) COST Action of the
Polish Ministry of Science); C.J.J. by the EU Seventh
Framework Program (FP7/2007-2013; 242095);
S.E.R., S.K. and S.M.K. are also supported by the
COST Action (BM0802), and S.M.K., A.P.W. and
C.J.J. received support from the European Commission (FP7, EVIMalaR Network of Excellence).
Funding to pay the Open Access publication
charges for this article was provided by xxxxx.
Conflict of interest: none declared.
13. Vicoso B and Charlesworth B. Evolution on the X chromosome: unusual patterns and processes. Nat Rev Genet
2006;7:645–3.
14. Khaitovich P, Hellmann I, Enard W et al. Parallel patterns
of evolution in the genomes and transcriptomes of
humans and chimpanzees. Science 2005;309:1850–4.
15. Paul REL, Brey PT and Robert V. Plasmodium sex determination and transmission to mosquitoes. Trends Parasitol
2002;18:32–8.
16. Khan SM, Franke-Fayard B, Mair GR et al. Proteome analysis
of separated male and female gametocytes reveals novel
sex-specific Plasmodium biology. Cell 2005;121:675–87.
17. Sinden RE. Sexual development of malarial parasites.
Advan Parasitol 1983;22:153–216.
18. Sinden RE, Canning EU, Bray RS et al. Gametocyte and
gamete development in Plasmodium falciparum. Proc Biol
Sci 1978;201:375–99.
19. Lasonder E, Ishihama Y, Andersen JS et al. Analysis of the
Plasmodium falciparum proteome by high-accuracy mass
spectrometry. Nature 2002;419:537–42.
1. Ellegren H and Parsch J. The evolution of sex-biased genes
20. Zhou Y, Ramachandran V, Kumar KA et al. Evidence-based
and sex-biased gene expression. Nat Rev Genet 2007;8:
annotation of the malaria parasite’s genome using com-
689–98.
parative expression profiling. PLoS One 2008;3:e1570.
2. Ranz JM, Castillo-Davis CI, Meiklejohn CD et al. Sex-
21. Silvestrini F, Lasonder E, Olivieri A et al. Protein export
dependent gene expression and evolution of the
marks the early phase of gametocytogenesis of the
Drosophila transcriptome. Science 2003;300:1742–5.
human malaria parasite Plasmodium falciparum. Mol Cell
3. Wyckoff GJ, Wang W and Wu CI. Rapid evolution of male
reproductive genes in the descent of man. Nature 2000;
403:304–9.
4. Zhang Z, Hambuch TM and Parsch J. Molecular evolution
Proteomics 2010;9:1437–48.
22. Emanuelsson O, Brunak S, von Heijne G et al. Locating
proteins in the cell using TargetP, SignalP and related
tools. Nat Protocol 2007;2:953–71.
of sex-biased genes in Drosophila. Mol Biol Evol 2004;21:
23. Altschul SF, Gish W, Miller W et al. Basic local alignment
2130–9.
5. Good JM and Nachman MW. Rates of protein evolution
search tool. J Mol Biol 1990;215:403–10.
24. Aurrecoechea C, Brestelli J, Brunk BP et al. PlasmoDB: a
are positively correlated with developmental timing of ex-
functional genomic database for malaria parasites.
pression during mouse spermatogenesis. Mol Biol Evol
2005;22:1044–52.
6. Andersson M. Sexual Selection. Princeton University Press:
Princeton, 1994.
7. Darwin C. The Descent of Man and Selection in Relation to
Sex. Murray: London, 1871.
8. Carter R. Transmission blocking malaria vaccines. Vaccine
2001;19:2309–14.
9. Williamson K. Pfs230: from malaria transmissionblocking vaccine candidate toward function. Parasit
Immunol 2003;25:351–9.
10. Malkin EM, Durbin AP, Diemert DJ et al. Phase 1 vaccine
trial of Pvs25H: a transmission blocking vaccine for
Plasmodium vivax malaria. Vaccine 2005;23:3131–8.
11. Ramjanee S, Robertson JS, Franke-Fayard B et al. The use
Nucleic Acids Res 2009;37:D539–43.
25. Ogasawara J and Morishita S. A fast and sensitive algorithm for aligning ESTs to the human genome. J Bioinform
Comput Biol 2003;1:363–86.
26. Needleman SB and Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 1970;48:443–53.
27. Bierne N and Eyre-Walker A. The genomic rate of adaptive
amino acid substitution in Drosophila. Mol Biol Evol 2004;
21:1350–60.
28. Eyre-Walker A. The genomic rate of adaptive evolution.
Trends Ecol Evol 2006;21:569–75.
29. Volkman SK, Barry AE, Lyons EJ et al. Recent origin of
Plasmodium falciparum from a single progenitor. Science
of transgenic Plasmodium berghei expressing the
2001;293:482–4.
30. Aikawa M, Rener J, Carter R et al. An electron microscop-
Plasmodium vivax antigen P25 to determine the
ical study of the interaction of monoclonal antibodies with
transmission-blocking activity of sera from malaria
gametes of the malaria parasite Plasmodium gallinaceum. J
vaccine trials. Vaccine 2007;25:886–94.
12. Sutherland C. Surface antigens of Plasmodium falciparum
31. Carter R and Chen DH. Malaria transmission blocked by
gametocytes—a new class of transmission-blocking
immunisation with gametes of the malaria parasite.
vaccine targets? Mol Biochem Parasitol 2009;166:93–8.
Nature 1976;263:57–60.
Protozool 1981;28:383–8.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
references
Reece et al. |
Fast-male malaria parasites
32. Targett GAT. Plasmodium falciparum—natural and experi-
42. Gardner MJ, Hall N, Fung E et al. Genome sequence of
mental transmission-blocking immunity. Immunol Lett
the human malaria parasite Plasmodium falciparum.
1988;19:235–40.
Nature 2002;419:498–511.
33. Janse CJ, Vanderklooster PFJ, Vanderkaay HJ et al. Rapid
43. Volkman SK, Sabeti PC, DeCaprio D et al. A genome-wide-
repeated DNA-replication during microgametogenesis
map of diversity in Plasmodium falciparum. Nat Genet
and DNA-synthesis in young zygotes of Plasmodium
berghei. Trans R Soc Trop Med Hyg 1986;80:154–7.
2007;39:113–9.
34. Kawamoto F, Alejoblanco R, Fleck SL et al. Plasmodium
berghei—ionic regulation and the induction of gametogenesis. Exp Parasitol 1991;72:33–42.
35. Sinden RE, Canning EU and Spain B. Gametogenesis
and fertilisation in Plasmodium yeolli nigeriensis: a trans-
44. Clutton-Brock T. Sexual selection in males and females.
Science 2007;318:1882–5.
45. Ritchie MG. Sexual selection and speciation. Ann Rev Ecol
Evol Systemat 2007;38:79–102.
46. Carranza J. Defining sexual selection and sex-dependent
selection. An Behav 2009;77:749–51.
mission electron microscope study. Proc Biol Sci 1976;
47. Nkhoma SC, Nair S, Cheeseman IH et al. Close kinship
193:55–76.
36. Gardner A, Reece SE and West SA. Even more extreme
within multiple-genotype malaria parasite infections. Proc
fertility insurance and the sex ratios of protozoan blood
48. Chevin L-M, Lande R and Mace GM. Adaptation, plasticity,
parasites. J Theor Biol 2003;223:515–21.
37. Paul REL, Ariey F and Robert V. The evolutionary ecology of
38. Paul REL, Coulson TN, Raibaud A et al. Sex determination
in malaria parasites. Science 2000;287:128–31.
39. West SA, Reece SE and Read AF. Evolution of gametocyte
sex ratios in malaria and related apicomplexan (protozoan) parasites. Trends Parasitol 2001;17:525–31.
Biol Sci 2012;279:2589–98.
and extinction in a changing environment: towards a predictive theory. PLoS Biol 2010;8:e1000357.
49. Mideo N and Reece SE. Plasticity in parasite phenotypes:
evolutionary and ecological implications for disease.
Future Microbiol 2012;7:17–24.
50. van Dijk MR, van Schaijk BC, Khan SM et al. Three
members of the 6-cys protein family of Plasmodium play
a role in gamete fertility. PLoS Pathog 2010;6:e1000853.
40. Ramiro RS, Alpedrinha Jo, Carter L et al. Sex and death:
51. West SA, Smith TG, Nee S et al. Fertility insurance and
the effects of innate immune factors on the sexual repro-
the sex ratios of malaria and related hemospororin blood
duction of malaria Parasites. PLoS Pathog 2011;7:
e1001309.
41. Jeffares DC, Pain A, Berry A et al. Genome variation and
evolution of the malaria parasite Plasmodium falciparum.
Nat Genet 2007;39:120–5.
parasites. J Parasitol 2002;88:258–63.
52. Reece SE, Drew DR and Gardner A. Sex ratio adjustment
and kin discrimination in malaria parasites. Nature 2008;
453:609–15.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Plasmodium. Ecol Lett 2003;6:866–80.
13
148
o r i gi na l
research
article
Evolution, Medicine, and Public Health [2013] pp. 148–160
doi:10.1093/emph/eot012
Ademir Jesus Martins*1,2, Luiz Paulo Brito1, Jutta Gerlinde Birggitt Linss1,
Gustavo Bueno da Silva Rivas3, Ricardo Machado3, Rafaela Vieira Bruno2,3,
Jose´ Bento Pereira Lima1, Denise Valle2,4 and Alexandre Afranio Peixoto2,3,y
1
Laborato´rio de Fisiologia e Controle de Artro´podes Vetores, Instituto Oswaldo Cruz—FIOCRUZ and Laborato´rio de
Entomologia, Instituto de Biologia do Exe´rcito, Rio de Janeiro, RJ, 21040-360, Brazil, 2Instituto Nacional de Cieˆncia e
Tecnologia em Entomologia Molecular, Brazil, 3Laborato´rio de Biologia Molecular de Insetos, Instituto Oswaldo Cruz—
FIOCRUZ, Rio de Janeiro, RJ, 21040-360, Brazil and 4Laborato´rio de Biologia Molecular de Flavivirus, Instituto Oswaldo
Cruz—FIOCRUZ, Rio de Janeiro, RJ, 21040-360, Brazil
*Correspondence address. Laborato´rio de Fisiologia e Controle de Artro´podes Vetores, Instituto Oswaldo Cruz—
FIOCRUZ and Laborato´rio de Entomologia, Instituto de Biologia do Exe´rcito, Rio de Janeiro, RJ, 21040-360, Brazil.
Tel:+55 21 25621398; Fax:+55 21 25621308; E-mail: ademirjr@ioc.fiocruz.br
y
In memoriam.
Received 9 March 2013; revised version accepted 9 June 2013
ABSTRACT
Background and objectives: Mutations in the voltage-gated sodium channel gene (NaV), known as kdr
mutations, are associated with pyrethroid and DDT insecticide resistance in a number of species. In the
mosquito dengue vector Aedes aegypti, besides kdr, other polymorphisms allowed grouping AaNaV
sequences as type ‘A’ or ‘B’. Here, we point a series of evidences that these polymorphisms are actually
involved in a gene duplication event.
Methodology: Four series of methods were employed: (i) genotypying, with allele-specific PCR (AS-PCR),
of two AaNaV sites that can harbor kdr mutations (Ile1011Met and Val1016Ile), (ii) cloning and
sequencing of part of the AaNaV gene, (iii) crosses with specific lineages and analysis of the offspring
genotypes and (iv) copy number variation assays, with TaqMan quantitative real-time PCR.
Results: kdr mutations in 1011 and 1016 sites were present only in type ‘A’ sequences, but never in the
same haplotype. In addition, although the 1011Met-mutant allele is widely disseminated, no homozygous (1011Met/Met) was detected. Sequencing revealed three distinct haplotypes in some individuals,
raising the hypothesis of gene duplication, which was supported by the genotype frequencies in the
offspring of specific crosses. Furthermore, it was estimated that a laboratory strain selected for
ß The Author(s) 2013. Published by Oxford University Press on behalf of the Foundation for Evolution, Medicine, and Public Health. This is an Open
Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits
unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Evidence for gene
duplication in the
voltage-gated sodium
channel gene of Aedes
aegypti
Sodium channel gene duplication in Aedes aegypti
Martins et al. |
149
insecticide resistance had 5-fold more copies of the sodium channel gene compared with a susceptible
reference strain.
Conclusions and implications: The AaNaV duplication here found might be a recent adaptive response
to the intense use of insecticides, maintaining together wild-type and mutant alleles in the same
organism, conferring resistance and reducing some of its deleterious effects.
K E Y W O R D S : gene duplication; kdr mutation; sodium channel; pyrethroid resistance; Aedes aegypti
BACKGROUND AND OBJECTIVES
Several mutations have been identified in the
Ae. aegypti NaV gene (AaNaV) comprising the
IIS5–S6 region: Gly923Val, Leu982Trp, Ile1011Met,
Ile1011Val, Val1016Ile and Val1016Gly [12–16]. The
Ile1011Met substitution was associated with low
sensitivity to pyrethroids evidenced by electrophysiological assays [12] and was the most frequent
in a resistant Brazilian natural Ae. aegypti population
[14]. However, substitutions in another position,
1016 (Val/Ile in South and Central America and
Val/Gly in Thailand), are presently attributed with a
more important role in pyrethroid resistance, the
1016 substitutions appearing as a recessive trait
[13, 16–18]. Outside domain II, a Phe1534Cys substitution in the IIIS6 region was also related to pyrethroid resistance [19]. Besides amino acid changes,
nucleotide and insertion/deletion polymorphisms
have been detected in intron 20 in the AaNaV IIS6
genomic region that enable grouping the sequences
in two categories, type ‘A’ or type ‘B’. The Ile1011Met
and Val1016Ile mutations are found only in type ‘A’
sequences [14].
Herein, we further investigated the nature of this
polymorphism. Sequencing of the AaNaV IIS6 genomic region and alelle specific-PCR (AS-PCR) typing
of the 1011 and 1016 sites revealed, in several cases,
three haplotypes in the same mosquito. Besides, in
no case were homozygous specimens for the
1011Met mutation in natural populations detected.
Crosses between laboratory-selected genotypes and
copy number variation assays strongly suggested the
occurrence of duplication events in the sodium channel gene, at least for the studied genomic region.
MATERIALS AND METHODS
Mosquitoes
Rockefeller strain, continuously reared in the laboratory as a standard for insecticide susceptibility and
life-history trait parameters, was used as reference
for wild-type alleles for the voltage-gated sodium
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
The use of DDT as public health insecticide was one
of the factors responsible for the yellow fever mosquito eradication in many Latin American countries
in the 1950s [1]. Since the reintroduction of Aedes
aegypti to South America, organophosphates and,
subsequently, pyrethroid insecticides have been extensively used in governmental campaigns as well
as in residential or private services. Pyrethroids have
similar effects as DDT but with a lower residual effect
in the environment, and they represent nowadays the
main class of insecticide against arthropods, not only
those of medical and veterinary importance but also
in relation to agriculture and livestock [2]. In Brazil,
despite the recent introduction of pyrethroids in campaigns for dengue control throughout the whole
country, resistance to these compounds has already
been detected in many Ae. aegypti populations [3, 4].
Pyrethroids and DDT have a rapid effect on the
insect central nervous system, leading to repetitive
and involuntary muscular contractions, followed by
paralysis and death, commonly reported as
knockdown effect [5, 6]. Accordingly, resistance to
this is referred to as knockdown resistance (kdr),
the principal cause being a mutation in the pyrethroid/DDT target site, the voltage-gated sodium
channel (NaV). The NaV is an axonic transmembrane
protein composed of four homologous domains
(I–IV), each one with six hydrophobic segments
(S1–S6) [7]. To date, most of the kdr mutations
described lie in the NaV IIS6 region, and the Leu/
Phe substitution in the 1014 site (numbered according to the Musca domestica amino acid primary sequence) is by far the most common among all
studied insects. Relatively recent analyses of kdr
mutations in a series of arthropod species
contributed to the knowledge concerning evolution
and dynamics of pyrethroid resistance in natural
populations. This effort is essential to formulate
strategies able to prolong the effectiveness of pyrethroids in the field and to develop new compounds
targeting the sodium channel [8, 9]. Some extensive
reviews of kdr mutations are available [2, 10, 11].
150
| Martins et al.
Evolution, Medicine, and Public Health
channel gene. The EE lineage was originated from
laboratory selection pressure for nine consecutive
generations with the pyrethroid deltamethrin using
a sample of a natural population from Natal (a locality from the Northeast of Brazil) that did not harbor the mutation in the 1016 site [20]. Rearing and
maintenance of the colonies were conducted according to standard laboratory conditions [21].
Field populations were obtained by sampling as
described elsewhere [13].
Molecular assays
Crossing experiments
Crosses were performed between mosquitoes from
Rockefeller and EE strains, respectively, homozygous (Ile/Ile) and apparently ‘heterozygous’
(Ile/Met) for the 1011 site. Each couple of one male
and one virgin female was maintained for at least 3
days in conical 50 ml tubes covered with a mesh tulle
under a cotton wool soaked in sugar solution.
Females were then blood-fed on anesthetized mice,
24 h after sugar removal. Individual females were
induced to lay eggs in small Petri dishes lined with
wet filter paper [22]. Resulting F1 larvae were reared
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Genotyping by allele-specific PCR (AS-PCR) for the
AaNaV 1011 site and sequencing of the IIS6 genomic
region were performed with the DNA from the same
specimens genotyped for the 1016 alleles, described
in a previous report [13]. PCR discriminating type ‘A’
or ‘B’ sequences (see [14]) was carried out in 12.5 ml
reactions containing 1 mM of each primer ‘forward’
(50 -AGGCTGACTGAAAGTAAATTGG-30 ) and ‘reverse’
(50 -CAAAAGCAAGGCTAAGAAAAGG-30 ),
6.25 ml of GoTaq Green Master Mix 2X (Promega)
and 0.5 ml of genomic DNA, submitted for 30 denaturation, annealing and extension cycles under,
respectively, 94 C/30", 60 C/1’ and 72 C/45". The
amplified region includes the intron 20, polymorphic
in size, in the AaNaV IIS6 region. For the 1011 site
genotyping, PCR with 0.24 mM of common and
0.12 mM of each of the two specific primers [17]
was performed as above, with 30 cycles of denaturation, annealing and extension under, respectively,
94 C/30", 57 C/1’ and 72 C/45’’ conditions. The
PCR products were analyzed in 10% polyacrylamide
gel electrophoresis stained in 1 mg/ml ethidium
bromide solution. The AaNaV IIS6 region was
amplified, cloned and sequenced as previously reported [14] in individual specimens from Uberaba,
Cuiaba´, Aparecida de Goiaˆnia, Maceio´ and Fortaleza. Sequences of at least eight clones of each insect
were analyzed.
The numbers of copies of the AaNaV IIS6 genomic
region were compared among the Rockefeller strain,
the EE lineage and their F1 offspring (Hyb). DNA
was extracted from pools of 10 L3 larvae (20 mg)
with the kit Insect DNA Extraction (Zymo Research)
according to the manufacturer’s instructions,
brought to 5 ng/ml in H2O and aliquoted. Real-time
PCR reactions were carried out based on instructions of customized TaqMan Copy Number Assay
(Applied Biosystems) in 15 ml, containing 7.5 ml of
2 TaqMan Genotyping Master Mix (Applied
Biosystems), 0.75 ml of 20 mix composed of primers and probes for both target and reference
genes, 20 ng of DNA and H2O. The chosen single
copy reference was the ribosomal gene RP49
(GenBank accession number AY539746), with primers AaRP49_F: 50 -ACATCGGTTACGGATCGAACA
AG-30 , AaRP49_R: 50 -TGTGGACCAGGAACTTCTTG
AAG-30 and probe AaRP49_M: 50 -VIC-CACCCGCCA
TATGCT-MGB-NFQ-30 . The target was determined
based on the AaNaV IIS6 region (GenBank accession
number FJ479613) with primers AaNaVex20_F: 50 ACCGACTTCATGCACTCATTCAT-30 , AaNaVex20_R:
50 -ACAAGCATACAATCCCACATGGA-30 and probe
AaNaVex20_M:
50 -FAM-CCACTCGCCGCATAAT0
MGB-NFQ-3 . Three assays were performed with
DNA from three distinct pools of each lineage, in
triplicate/assay. Reactions were conducted in an
ABI StepOne Thermocycler (Applied Biosystems),
following standard cycling conditions for TaqMan
Genotyping assays. The CTs for the target (AaNaV)
and reference (RP49) genes were determined based
on automatic threshold indicated by the StepOne
Software v2.0. Given the CT of each sample, their
CTs were established, intended to normalize the
amount of amplified products from AaNaV by RP49,
and then the average of the replicates from each pool
CT ([CT]) was calculated. The CT of the test
lineages (EE and Hyb) were obtained by the difference between their [CT] and that of Rockefeller.
Finally, the average of CTs from the three assays
([CT]) was calculated in order to estimate the
number of AaNaV copies, normalized by RP49,
related to Rockefeller. The diploid number of the target sequence of the tested sample was determined
by the formula: cnc2CT, where cnc is the copy number of the target sequence in the reference sample
and CT is the difference between the CT for the
tested sample and the reference sample.
Sodium channel gene duplication in Aedes aegypti
until adults for genotyping by AS-PCR or for subsequent crossings to obtain F2, performed as above.
Ethics statement
Mosquito blood feeding
Aedes aegypti females were fed on anesthetized mice
(ketamine:xylazine 80–120:10–16 mg/kg), according to institutional procedures, oriented by the national guideline ‘the Brazilian legal framework on
the scientific use of animals’ [23]. This study was
reviewed and approved by the Fiocruz Ethics
Committee on Animal Use (CEUA/FIOCRUZ),
license number: L-011/09.
RESULTS
Typing of 1011 and 1016 sodium channel sites
in Ae. aegypti natural populations by AS-PCR
The allele frequencies of the AaNaV 1011 site were
evaluated in the same mosquitoes which had the
1016 site analyzed previously, belonging to samples
from 15 Brazilian localities [13]. The 1011Met-mutant allele was found in all localities, except in Boa
Vista. In seven localities, specimens were divided
into pyrethroid susceptible (S) or resistant (R) [13].
Table 1 shows allele frequencies considering both
1011 and 1016 sites together, combined in six molecular phenotypes, derived from three potential
haplotypes (1011Ile+1016Val, 1011Ile+1016Ile
and 1011Met+1016Val). We assumed that the recombinant haplotype containing both mutant alleles
(1011Met+1016Ile) was not expected, because
these sites are very close in the genome and both
mutations are likely to be very recent. We observed
that the 1011Ile/Ile+1016Ile/Ile combination, i.e.
homozygous for the wild-type and for the mutant
allele, respectively, in the 1011 and 1016 sites, was
far more frequent among resistant than susceptible
insects. This suggests that the 1016 site is probably
more important for pyrethroid resistance than the
1011 site.
Two other striking results can also be observed.
First, we did not detect any specimen ‘homozygous’
for the 1011Met (1011Met/Met+1016Val/Val) mutation. Second, there is a higher than expected frequency of the 1011Ile/Met+1016Val/Val molecular
phenotype in all samples, except the near monomorphic Boa Vista population (Table 1). Although
the individual tests of the Hardy–Weinberg expectations for each sample were significant only in four
cases, likely due to the small sample sizes, the lack of
the 1011Met/Met+1016Val/Val molecular phenotype and the excess of 1011Ile/Met+1016Val/Val
were observed in almost all populations. Two simple
hypotheses were considered to explain this pattern.
One possibility is that the 1011Met mutation is
involved in a gene duplication, carrying both the mutant (1011Met+1016Val) and the wild-type allele
(1011Ile+1016Val). In this case, the 1011Met/Met
genotype would never be detected by the AS-PCR,
because that duplication would generate a molecular phenotype mimicking a heterozygous 1011Ile/
Met. Alternatively, one might argue that the
1011Met mutation is lethal when in homozygosis.
However, this is not the case ([16], see ‘Discussion’
section herein), and it does not explain the increased
frequency of 1011Ile/Met+1016Val/Val, unless one
also assumes this particular combination has a
higher fitness. In order to better understand these
data, we cloned and sequenced the IIS6 region from
a number of mosquitoes.
Sequencing of the IIS6 region of the Ae. aegypti
sodium channel gene
We obtained sequences of the AaNaV IIS6 region
from a number of mosquitoes from five Brazilian
populations (see ‘Materials and Methods’ section
for details) and confirmed the polymorphism in this
genomic region. Figure 1 shows the haplotypes and
their respective submission numbers in GenBank.
Sequences were classified as ‘A’ or ‘B’, according
to two synonymous substitutions in exon 20 and
differences in the intron (see [14] for details). The
Ile1011Met substitution was seen in all studied
populations, whereas Val1016Ile was not detected
in the Northeastern localities (Maceio´ and
Fortaleza). Both substitutions were present only in
sequences type ‘A’, and among sequences from 40
individuals, no haplotype shared substitutions in
both the 1011 and 1016 sites, indicating no recombinants between the two mutations. As mentioned
above, this was expected considering that these sites
are very close, and the mutations are likely to be very
recent. Hence, only four haplotypes were observed
151
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Entomological survey
All field egg collections were conducted by agents
from each respective State Health Secretariat, following procedures designed by the National
Program of Dengue Control/Brazilian Ministry of
Health. All ovitraps were installed and collected in
the houses with residents’ permission.
Martins et al. |
R
S
R
S
R
S
R
S
R
S
R
S
R
S
*
*
*
*
*
*
*
*
Aparecida de Goiaˆnia
0.056 (0.094)
0.105 (0.305)
0.045 (0.052)
0.118 (0.221)
0.231 (0.148)
0.571 (0.617)
0 (0.035)
0.250 (0.303)
0.250 (0.391)
0.313 (0.431)
0.467 (0.538)
0.333 (0.444)
0.043 (0.030)
0.300 (0.276)
0.950 (0.930)
0.200 (0.090)
0 (0.191)
0 (0.003)
0.900 (0.903)
0.300 (0.423)
0.938 (0.938)
0.650 (0.681)
1016Val/Val
0 (0.204)
0.053 (0.029)
0.273 (0.299)
0.118 (0.138)
0 (0.325)
0.143 (0.112)
0.063 (0.223)
0.250 (0.275)
–
–
–
–
0.087 (0.204)
0.050 (0.184)
0 (0.095)
0 (0.255)
0.250 (0.191)
0.053 (0.078)
–
–
–
–
1016Val/Ile
1011Ile/Ile
0.222 (0.111)
0 (0.001)
0.455 (0.435)
0 (0.022)
0.385 (0.179)
0 (0.005)
0.500 (0.353)
0.100 (0.063)
–
–
–
–
0.391 (0.345)
0.050 (0.031)
0.050 (0.003)
0.250 (0.181)
0.063 (0.048)
0.526 (0.543)
–
–
–
–
1016Ile/Ile
0.500
0.842
0.091
0.588
0.308
0.286
0.313
0.350
0.750
0.688
0.533
0.667
0.174
0.400
–
0.200
0.625
0.053
0.100
0.700
0.063
0.350
(0.165)
(0.301)
(0.022)
(0.095)
(0.455)
(0.061)
(0.289)
(0.221)
(0.465)
(0.052)
(0.360)
(0.148)
(0.020)
(0.260)
(0.220)
(0.469)
(0.451)
(0.391)
(0.444)
(0.083)
(0.315)
1016Val/Val
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0.222 (0.240)
0 (0.022)
0.136 (0.150)
0.176 (0.112)
0.077 (0.163)
0 (0.020)
0.125 (0.048)
0.050 (0.100)
–
–
–
–
0.304 (0.281)
0.200 (0.240)
–
0.350 (0.234)
0.063 (0.150)
0.368 (0.310)
–
–
–
–
(0.130)
(0.177)
(0.013)
(0.146)
(0.037)
(0.037)
(0.037)
(0.040)
(0.141)
(0.118)
(0.071)
(0.111)
(0.057)
(0.090)
–
(0.076)
(0.118)
(0.044)
(0.003)
(0.123)
(0.001)
(0.031)
1016Val/Val
1011Met/Met
1016Val/Ile
1011Ile/Met
Frequency of genotypes: observed (and expected assuming Hardy–Weinberg equilibrium)
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
14.7, 5, 0.0119
2.9, 5, 0.7204
1.1, 5, 0.9571
6.8, 5, 0.2347
11.2, 5, 0.0473
1.0, 5, 0.9589
15.6, 5, 0.0080
3.5, 5, 0.6213
5.8, 2, 0.0561
4.4, 2, 0.1114
2.0, 2, 0.3709
3.8, 2, 0.1534
5.5, 5, 0.3619
6.2, 5, 0.2860
1.9, 2, 0.3772
11.1, 5, 0.0487
11.7, 5, 0.0388
2.1, 5, 0.8408
0.06, 5, 0.9727
5.8, 2, 0.0551
0.02, 2, 0.9917
0.9, 2, 0.6377
2, df, P
HWE
Frequencies observed and expected (for Hardy–Weinberg equilibrium) of the molecular phenotypes derived by AS-PCR for the sites 1011 and 1016 in the same insects. In the header, the mutant
alleles are underlined. Some populations are divided regarding their resistant (R) or susceptible (S) status to pyrethroid resistance. Populations whose individuals were not divided in R or S are
marked with an asterisk (*) in status. The absence of the mutations 1011Ile/Met and 1016Val/Ile in a population is represented as endash (–). The last column gives the result of 2 analyses for
testing Hardy–Weinberg equilibrium (HWE). The 1016 genotyping data were already presented elsewhere [13].
18
19
22
17
13
14
16
20
16
16
15
15
23
20
20
20
16
19
20
20
16
20
n
| Martins et al.
Boa Vista
Cachoeiro do Itapemirim
Colatina
Foz do Iguac¸u
Ijuı´
Macapa´
Santa Ba´rbara
Santa Rosa
Uberaba
Maceio´
Fortaleza
Dourados
Cuiaba´
Campo Grande
Status
Locality
Table 1. Phenotypic frequency, considering AaNaV 1011 and 1016 sites, of Ae. aegypti natural populations from Brazil
152
Evolution, Medicine, and Public Health
Sodium channel gene duplication in Aedes aegypti
Figure 1. Diversity of a voltage-gated sodium channel gene
region observed in Ae. aegypti Brazilian populations. Part of
the region corresponding to the AaNaV exons 20 and 21, and
the intron between them, are represented. A and B indicate the
type of intron, as previously stated [14]. In red, the presumed
amino acids for the sites 1011 and 1016. Genomic sequences
representative for each haplotype were submitted to GenBank:
1011Ile+B+1016Val
(GenBank
accession
number:
FJ479613), 1011Ile+A+1016Val (FJ479611), 1011Met+A+
1016Val (FJ479612) and 1011Ile+A+1016Ile (JX275501).
TIGR = sequence
from
Ae.
aegypti
genome
project
(Vectorbase)
153
does not occur in all individuals, being therefore a
polymorphic trait. In the samples analyzed, we detected mosquitoes ‘homozygous’ for the 1011Ile+
B+1016Val, 1011Ile+A+1016Val and 1011Ile+
A+1016Ile haplotypes, all having the wild-type allele
for the 1011 site. However, the ‘1011Met+
A+1016Val’ (mutant in the 1011 site) haplotype
was never detected in ‘homozygosis’, but always in
association with ‘1011Ile+B+1016Val’, suggesting
that the duplication involves these two variants
(Table 2). Figure 2 presents a schematic representation of AaNaV haplotypes proposed for the populations analyzed based on our duplication hypothesis.
The offspring of crosses between some combinations of parental genotypes was further analyzed
in order to test this hypothesis.
Crossing experiments
In order to test the duplication hypothesis, we performed crosses between specimens with known
molecular phenotypes (based on AS-PCR) and
determined the frequency of the variants in the
AaNaV 1011 site in their offspring. Initially, we
evaluated the F1 of seven couples, each composed
of a homozygous wild-type (1011Ile/Ile) and a putative heterozygous or duplicated (1011Ile/Met) progenitor, belonging, respectively, to the Rockefeller
and the EE lineages. The latter originated from a laboratory population selection for pyrethroid resistance using a sample from a natural population that
did not harbor the mutation Val1016Ile [20]. The results are shown in Table 3, with expected values and
the Fisher tests for the three different hypotheses in
Fig. 3, assuming either a duplication or no duplication. If the 1011Ile/Met parent did not harbor the
duplicated haplotype, the offspring would present
the Ile/Ile and Ile/Met genotypes in equal
frequencies (Hypothesis 1). Assuming the occurrence of a duplication, one would expect the offspring genotyped as either 100% Ile/Met or
alternatively Ile/Ile and Ile/Met in equal frequencies,
respectively, if the parent was homozygous
(Hypothesis 2a) or heterozygous (Hypothesis 2b)
for the duplicated haplotype (Fig. 3).
Two out of seven crosses (#3 and #4) had the
1011Ile/Ile genotype in around half of their offspring, which was thus not informative. In these
two cases, this could be explained if the progenitor
harboring the 1011Met mutation was heterozygous
for the duplication (1011Ile/Ile_Met) as well as if it
was heterozygous for non-duplicated haplotypes.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
(1011Ile+A+1016Val,
1011Ile+A+1016Ile,
1011Ile+B+1016Val and 1011Met+A+1016Val)
out of six possibilities, considering the type of sequence (‘A’ or ‘B’) and the sites 1011 (Ile or Met) and
1016 (Val or Ile) (Table 2). Moreover, the
1011Met+A+1016Val haplotype was only present
in specimens which also harbored the 1011Ile+
B+1016Val haplotype, therefore, classified as ‘heterozygous’. Accordingly, typing of various natural
populations had revealed the absence of ‘homozygous’ for the 1011Met mutation (Table 1). Curiously,
some specimens presented three haplotypes, which
were in all cases: 1011Met+A+1016Val, 1011Ile+
A+1016Ile and 1011Ile+B+1016Val (Table 2). It is
important to mention that females had their abdomen removed prior to DNA extraction in order to
avoid eventual amplification of DNA from spermatozoids stored in the spermatechae, and there was no
evidence of contamination in PCR negative controls.
The last column of Table 2 presents the expected
‘genotypes’ through sequence typing (A or B) and
the 1011 and 1016 sites. Sequencing confirmed the
results for all insects genotyped by AS-PCR (data not
shown).
The presence of three alleles in one specimen suggests the gene duplication, at least in the genomic
region analyzed. However, search in the Ae. aegypti
genome project database (http://aaegypti.
vectorbase.org/) did not indicate any evidence that
the original Liverpool strain has more than one copy
of any part, let alone the whole voltage-gated sodium
channel gene. Based on the available sequences, this
strain would be classified as homozygous for the
1011Ile+B+1016Val allele, just like the Rockefeller
strain used here. Hence, the putative duplication
Martins et al. |
154
| Martins et al.
Evolution, Medicine, and Public Health
Table 2. Sequencing of the AaNaV IIS6 genomic region of specimens from
Ae. aegypti Brazilian natural populations
Locality
Sample
Haplotype (1011+intron+1016)
Ile
+
A
+
Val
Uberaba
Ap Goiaˆnia
Maceio´
Fortaleza
Ile
+
A
+
Ile
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
Ile
+
B
+
Val
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
Met
+
B
+
Val
Ile
+
B
+
Ile
Ile/Ile+AB+Val/Val
Ile/Met+AB+Val/Val
Ile/Met+AB+Val/Ile
Ile/Ile+AA+Val/Ile
Ile/Ile+AA+Val/Ile
Ile/Ile+AA+Ile/Ile
Ile/Ile+AA+Ile/Ile
Ile/Ile+BB+Val/Val
Ile/Ile+AB+Val/Ile
Ile/Met+AB+Val/Ile
Ile/Ile+AA+Ile/Ile
Ile/Ile+AA+Ile/Ile
Ile/Ile+AA+Ile/Ile
Ile/Ile+AB+Val/Ile
Ile/Ile+AA+Val/Val
Ile/Ile+AB+Val/Val
Ile/Ile+AB+Val/Val
Ile/Ile+AB+Val/Val
Ile/Ile+AB+Val/Val
Ile/Ile+AB+Val/Val
Ile/Ile+AB+Val/Val
Ile/Met+AB+Val/Val
Ile/Ile+AA+Val/Ile
Ile/Met+AB+Val/Ile
Ile/Met+AB+Val/Val
Ile/Met+AB+Val/Val
Ile/Met+AB+Val/Val
Ile/Met+AB+Val/Ile
Ile/Met+AB+Val/Val
Ile/Met+AB+Val/Val
Ile/Met+AB+Val/Val
Ile/Met+AB+Val/Val
Ile/Met+AB+Val/Val
Ile/Met+AB+Val/Val
Ile/Ile+BB+Val/Val
Ile/Ile+BB+Val/Val
Ile/Met+AB+Val/Val
Ile/Ile+AA+Val/Val
Ile/Ile+BB+Val/Val
Ile/Ile+AB+Val/Val
Identification of each sample corresponds to the sampling locality: UBR, Uberaba; CUI, Cuiaba´; APG, Aparecida de
Goiaˆnia; COM, Maceio´ and hrjg, Henrique Jorge (a district of Fortaleza). ‘Haplotypes’ indicate the combination among
site 1011 (Ile or Met)+type of intron (A or B)+site 1016 (Val or Ile). The haplotype observed for each insect is marked
by an ‘X’. In the header, the mutations are indicated in bold letters. The last column shows the phenotypic classification, confirmed by AS-PCR.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Cuiaba´
UBR-04
UBR-08
UBR-10
UBR-S25
UBR-S26
UBR-R1
UBR-R3
UBR-R10
UBR-R11
UBR-R13
UBR-R20
UBR-R22
UBR-R26
CUI-01
CUI-02
CUI-03
CUI-04
CUI-07
CUI-08
CUI-12
CUI-R16
CUI-S15
APG-01
APG-02
APG-04
APG-05
APG-06
APG-07
APG-08
APG-09
APG-10
APG-11
APG-12
COM-02
COM-07
COM-09
hrjg-21
hrjg-22
hrjg-23
hrjg-28
Met
+
A
+
Val
Molecular phenotype
(1011+intron+1016)
Sodium channel gene duplication in Aedes aegypti
Martins et al. |
155
Figure 2. Schematic representation of AaNaV haplotypes. Blue boxes indicate exons 20 and 21 with the intron between them, the latter used to classify the
haplotypes as A (orange) or B (green). Sites 1011 and 1016 are represented by the variant wild-type (blue box) or mutant (red box). According to our hypothesis,
there is a duplication in some populations, comprised of haplotypes 1011Ile+B+1016Val and 1011Met+A+1016Val. Dashed line suggests linkage of the
haplotypes, but which one is upstream was not determined
Table 3. Testing the gene duplication hypothesis: molecular phenotype frequencies for the AaNav 1011
site in F1 offspring from crossings between Ae. aegypti Ile/Ile X Ile/Met
Crossings
Hypothesesa
F1 observed (n)
With duplication
Hypothesis 1
#1
#2
#3
#4
#5
#6
#7
(, Ile/Met x < Ile/Ile)
(, Ile/Met x < Ile/Ile)
(, Ile/Met x < Ile/Ile)
(,Ile/Ile x < Ile/Met)
(, Ile/Ile x < Ile/Met)
(, Ile/Met x < Ile/Ile)
(, Ile/Met x < Ile/Ile)
Hypothesis 2a
Hypothesis 2b
Ile/Ile
Ile/Met
Ile/Ile
Ile/Met
P
Ile/Ile
Ile/Met
P
Ile/Ile
Ile/Met
P
0
0
8
9
0
0
0
20
20
12
9
30
30
22
10
10
10
9
15
15
11
10
10
10
9
15
15
11
***
***
NS
NS
***
***
***
0
0
0
0
0
0
0
20
20
20
18
30
30
22
NS
NS
**
***
NS
NS
***
10
10
10
9
15
15
11
10
10
10
9
15
15
11
***
***
NS
NS
***
NS
NS
Molecular phenotype frequencies were determined by AS-PCR for the AaNaV 1011 site (see ‘Materials and Methods’ section). aExpected numbers of F1
individuals of each molecular phenotype based on the three hypotheses of parental haplotype constitution (Fig. 3). Significance of the deviations of the
tested hypotheses obtained through Fisher’s exact test: NS = non-significant, **P < 0.01, ***P < 0.001.
However, as all the offspring from the other five
crosses were 1011Ile/Met, the progenitor who harbored the mutation was necessarily homozygous for
the duplication (Ile_Met/Ile_Met) (Fig. 3). In
addition, the F2 offspring from crosses #1 (#1.1)
and #2 (#2.1) revealed segregation in the
approximated proportion of 3Ile/Met:1Ile/Ile
(Table 4), corroborating the duplication hypothesis.
Figure 3. Three hypotheses with the expected genotypes and
molecular phenotypes in the AaNaV 1011 site for the parental
and their respective expected frequency in the F1 offspring.
Copy number assay
We analyzed the AaNaV copy number variation
through molecular assays using DNA from pools
of larvae from the Rockefeller reference strain,
homozygous for the wild-type alleles, and a strain
(EE) selected in the laboratory for pyrethroid resistance [20] and harboring the putative duplication in
The 1011Met mutation is shown in red. See text for further
details
the AaNaV, as suggested by the assays described
above. In this sense, we assessed the relative
amount of DNA molecules containing the genomic
region spanning the AaNaV 1011 site normalized by
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Without duplication
156
| Martins et al.
Evolution, Medicine, and Public Health
a reference gene (RP49). Assuming that the
Rockefeller strain has only two copies of AaNaV as
expected for a diploid with a single copy gene, the EE
lineage selected for resistance and ‘homozygous’ for
the duplication, revealed to have in fact 10 copies
(Table 5 and Supplementary Table S1). Accordingly,
the F1 resulting from Rockefeller and EE had six
copies. The results therefore indicate further duplication events and amplification in this locus.
DISCUSSION
Table 4. Testing the gene duplication hypothesis: molecular phenotype
frequencies for the AaNav 1011 site in F2 offspring from crosses #1 and #2
(Table 3)
Crossings (F1)
F2 (n)
Observed
#1.1 (, Ile/Met x < Ile/Met)
#2.1 (, Ile/Met x < Ile/Met)
Expected
Ile/Ile
Ile/Met
Ile/Ile
Ile/Met
P
5
7
25
23
8
8
22
22
NS
NS
Observed and expected numbers for each molecular phenotype in the F2 of crosses #1 and #2 (Table 3) assuming
parents carry the following haplotypes Ile/Ile_Met Ile/Ile_Met, in agreement with the duplication hypothesis (Fig. 3).
The expected frequencies are 0.25 Ile/Ile and 0.75 Ile/Met (0.50Ile/Ile_Met+0.25Ile_Met/Ile_Met). Deviations from the
proposed hypotheses are non-significant (Fisher’s exact test; P > 0.05).
Table 5. Copy number variation assay for AaNaV
Assay
1
2
3
Rock
EE
Hib
[CT]
(SD)
Cq
[CT]
(SD)
Cq
[CT]
(SD)
Cq
0.4
0
-0.7
(0.09)
(0.11)
(0.07)
0
0
0
2.7
2.4
3
(0.03)
(0.04)
(0.04)
2.3
2.4
2.4
2
1.7
-2.4
(0.07)
(0.05)
(0.06)
1.6
1.6
1.7
[CT] (SD)
0
Cn
2
(0.03)
2.3
10
(0.07)
1.6
6
Average and standard deviation CT (target reference) followed by the Cq (lineage test Rock) values from each lineage in each assay. Bottom:
mean and standard deviation of CT from the three assays and the resulting number of copies (cn) of AaNaV relative to rp49.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
DDT and pyrethroids target the voltage-gated sodium channel (NaV) of insects, a key component of
axon membranes exhibiting a fundamental physiological function in neural current propagation, with a
complex but highly conserved structure among animals [24]. Vertebrate genomes present 6–10 NaVcoding genes, whereas invertebrate classes, such
as Cnidaria and Annelida, have only 2–4 NaV genes
[25]. In insects, there is only one NaV, also commonly
referred to as ‘paralytic’ (para), due to its relationship with the phenotype of reversible paralysis under
high temperatures in Drosophila melanogaster-mutant lineages [26, 27]. An important source of NaV
protein variability in different tissues relies on alternative splicing and RNA editing [28]. However, to
date no association between pyrethroid resistance
and variation derived from post-transcriptional
modifications in the Ae. aegypti NaV gene has been
uncovered [18]. Another possible source of molecular diversity might be polymorphism generated by
recent gene duplications. Putative additional NaV
in insects (the orthologous channels DSC1 in D.
melanogaster and BSC1 in Blattella germanica) were
later grouped close to calcium channels, both functionally and evolutionarily [29, 30]. Recently, two NaV
distantly related proteins were characterized in the
Periplaneta americana cockroach, coded by the
PaNaV and PaFPC para-like genes, a finding that
Sodium channel gene duplication in Aedes aegypti
metabolic resistance was demonstrated in
Caribbean Ae. aegypti populations. Compared with
the susceptible strain, two genes (CYP9J26 and the
ABC transporter ABCB4) were amplified up to eight
and seven copies, respectively [44].
Besides insecticide resistance, duplication of
metabolic-resistance genes may also be selectively
advantageous to the organism by increasing its general ability of detoxify xenobiotics. Moreover, new
functions might be generated due to accumulation
of substitutions in duplicated genes [45]. Such
events would be more ‘free’ to occur, since the detoxifying enzyme system is redundant, reliant upon
different enzymes with a similar function. Hence, the
accumulation of potential loss of function alterations might not significantly compromise the metabolism [46].
By contrast, gene duplication events in molecules
which are targets of neurotoxic insecticides are
thought to be less likely, since they carry out very
specific and essential activities, highly conserved
throughout evolution. The increase in number might
compromise the neurological functioning of the organism, an event described as dosage-balance hypothesis [47]. For instance, a Culex pipiens lineage
with an acetilcolinesterase gene (ace-1) duplication
presents 60% increase in enzyme activity. However,
the acquired organophosphate resistance status is
accompanied by an elevated cost of several life-history trait parameters [48]. Indeed, in a number of Cx.
pipiens populations, the frequency of the ace-1R-mutant allele decays quickly in the absence of insecticide [49, 50], the same tendency observed for ace-1R
in An. gambiae [51].
However, Cx. pipiens’ natural populations with a
putative recent ace-1 gene duplication (<40 years)
have also been described. In these cases, both
copies, with and without the mutation selected for
organophosphate resistance, lie in the same
chromosome. These mosquitoes, with a ‘heterozygous’ molecular phenotype, are resistant to organophosphates but have a lower fitness loss [52],
suggesting a mechanism which favors the occurrence of duplications in neurotoxic insecticide target-coding genes.
Herein, we initially hypothesized a duplication in a
region of the NaV gene of Ae. aegypti (AaNaV) as a
polymorphic trait in natural populations of this important vector, which would include one-mutant
haplotype for the 1011 site together with one wildtype for both sites, 1011Met+1016Val and
1011Ile+1016Val, respectively, supported by a fund
157
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
suggested a possible early duplication event and
subsequent loss of the NaV gene in some lineages
[31].
The role of gene duplication and/or amplification in insecticide resistance has been described
in at least 10 arthropod species, including
mosquitoes [32]. The most classic case involves
overexpression of Culex Esterase genes, leading
to organophosphate resistance. This is the consequence of duplication of two genes (named esterase A and esterase B) or at least the esterase B [33–
35]. Amplification of esterase B1 in Californian
Culex mosquitoes was the first event described in
this context [36]. Variation in the number of copies
among insects was also observed, being directly
proportional to organophosphate resistance levels
[37]. In agreement, laboratory insecticide selection
pressure resulted in an increase in the gene copy
numbers. However, it is likely that this process has
a limit, since gene amplification is associated with
a high fitness cost [38]. In fact, unequal crossingover in the duplicated locus [37] may cause a reduction in copy number over time in the absence
of insecticide pressure.
Gene duplication was also associated with another class of enzymes related to metabolic resistance, the multi-function oxidases (MFOs) or P450
[39]. Two genes of this class (CYP6P9 and CYP6P4)
were overexpressed in pyrethroid-resistant lineages
of the malaria vector, Anopheles funestus. This
overexpression is associated within tandem gene duplications, mapped in a quantitative trait locus
(QTL locus rp1) and responsible for 87% of the genetic variation for pyrethroid resistance in this lineage.
Besides, single nucleotide polimorphisms (SNPs)
observed in these genes were described as insecticide-resistance markers [39]. Another gene duplication event was associated with overexpression of a
P450 gene (CYP9M10) in a pyrethroid-resistant strain
of Culex quinquefasciatus [40]. Duplications in genes
coding for enzymes involved in metabolic resistance
are somewhat expected, since they are components
of supergene families bearing many paralogous
genes, generally organized in genome clusters [41].
These are rapidly evolving families and few orthologs
are identified among insect species [42]. In the Ae.
aegypti genome, at least 26, 49 and 160 genes of the
main detoxifying enzymes were identified corresponding, respectively, to GST, Esterases and MFO.
These numbers represent an increase of 36%
compared with Anopheles gambiae [43]. Recently, the
importance of gene amplification for pyrethroid
Martins et al. |
158
| Martins et al.
Evolution, Medicine, and Public Health
when pools of 10 larvae were employed. The variation
in the number of copies in natural populations remains to be investigated as an important clue for this
evolutionary process.
Amplification of the NaV gene was also recently
demonstrated in a pyrethroid-resistant C. quinquefasciatus lineage. The classical kdr mutation
(Leu1014Phe), strongly associated to pyrethroid resistance, was present in one type of sequence. The
other type of sequence lacked the intron close to the
1014 site and was not related to resistance. This
haplotype was suggested to be a pseudogene [55].
To the best of our knowledge, we present here the
first evidence of a duplication event in the sodium
channel gene of the dengue vector, Ae. aegypti.
Although the available data point to a more important role of the mutations in the 1016 site for pyrethroid resistance, there is clear evidence that the
1011Met mutation, which is associated with the duplication/amplification event(s), is also associated
with some resistance [12, 14]. Therefore, the gene
duplication and amplification in the Ae. aegypti
NaV gene might be a recent adaptive response to
the intense use of insecticides, maintaining together
wild-type and mutant alleles in the same organism
conferring some resistance at the same time as
reducing some of its deleterious effects on other
aspects of fitness. It will be very interesting to investigate how much diversity in copy number variation
there is in natural populations, besides its possible
association with pyrethroid resistance and fitness
cost. It is also intriguing whether the mosquito sodium channel gene is more prone to duplications
than that of other pyrethroid-selected insects as well
as what the potential evolutionary interpretation and
implications of this process are.
supplementary data
Supplementary data is available at EMPH online.
acknowledgments
The authors thank Dr Alexandre Afranio Peixoto for his
friendship and orientation throughout this study. This work
is dedicated to his memory. They also thank Andre Torres
and Heloisa Diniz for their assistance with the figures, the
DNA sequencing facility of FIOCRUZ (Plataforma de
Sequenciamento/PDTIS/Fiocruz)
and to
the
Brazilian
Dengue Control Program that allowed utilization of samples
collected in the scope of the Brazilian A. aegypti Insecticide
Resistance Monitoring Network (MoReNAa).
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
of evidence. AS-PCR genotyping confirmed that all
individuals carrying the 1011Met mutation were
(phenotypically) ‘heterozygous’. In addition,
sequencing of the AaNaV IIS6 genomic region revealed some individuals with three haplotypes, suggesting the existence of a duplication with the
proposed aforementioned composition. Similar
results of mosquitoes harboring three alleles were
recently reported for the An. gambiae acetilcolinesterase ace-1 gene and interpreted as evidence
of a gene duplication event [53].
Saavedra-Rodriguez et al. [16] evaluated the role
of AaNaV mutations in pyrethroid resistance by
analyzing the susceptibility of the F3 offspring from
the parental crossing ,1011Ile/Met+1016Ile/Ile
(from Isla Mujeres, Mexico) <1011Ile/Ile+
1016Val/Val (from New Orleans, lineage control of
susceptibility). Interestingly, if the presence of a
duplicated sodium channel had been considered,
interpretation of some results would have been
made easier since they would have better explained
the different genotypes in the crosses. In addition, it
is remarkable that the Ile1011Met substitution
seems to appear in ‘homozygosis’ (1011Met/Met)
in high frequency in other localities in Latin
America [16, 54], indicating that this mutation is
not recessive-lethal and that different types of
duplicated haplotypes probably coexist in Ae. aegypti
populations. This might also suggest that the
gene duplication in the Ae. aegypti NaV gene we
observed in Brazilian populations is a relatively
recent event.
Our initial hypothesis was that, at least for the
Ae. aegypti populations studied herein, the 1011Met
mutation occurs only in a duplicated haplotype containing a type ‘A’ sequence and the 1016Val wild-type
allele, together and in linkage disequilibrium with a
type ‘B’ sequence, containing the wild-type allele for
both the 1011 and 1016 positions (Fig. 2). The high
frequency of ‘heterozygous’ A/B, the lack of 1011Met/
Met specimens, 1011Ile/Met+1016Ile/Ile genotypes
and the molecular phenotype of the offspring
analyzed here support this hypothesis. However, the
results obtained by the copy number variation assay
show a ratio of five copies of the AaNaV gene in the EEselected lineage when compared with the Rockefeller
strain, indicating that further duplication events
might have taken place, possibly as a result of unequal crossing-over. Moreover, it is presumed that
the number of copies is a polymorphic trait, given
the large variation observed when using single mosquito DNA (data not shown), which was diminished
Sodium channel gene duplication in Aedes aegypti
funding
Martins et al. |
159
13. Martins AJ, Lima JB, Peixoto AA et al. Frequency of
Val1016Ile mutation in the voltage-gated sodium channel
This work was supported by the Conselho Nacional de
Desenvolvimento Cientı´fico e Tecnolo´ico (CNPq - Pronex
gene of Aedes aegypti Brazilian populations. Trop Med Int
Dengue), Fundac¸a˜o Carlos Chagas Filho de Amparo a`
14. Martins AJ, Lins RM, Linss JG et al. Voltage-gated sodium
Pesquisa do Estado do Rio de Janeiro (FAPERJ - Cientistas
channel polymorphism and metabolic resistance in pyr-
do nosso estado), the Howard Hughes Medical Institute
(HHMI) and the Instituto Nacional de Cieˆmcia e Tecnologia
ethroid-resistant Aedes aegypti from Brazil. Am J Trop Med
- Entomologia Molecular (INCT-EM). The funders had no role
15. Rajatileka S, Black WC 4th, Saavedra-Rodriguez K et al.
in study design, data collection and analysis, decision to
Development and application of a simple colorimetric
publish, or preparation of the manuscript. Funding to pay
assay reveals widespread distribution of sodium channel
Health 2009;14:1351–5.
Hyg 2009;81:108–15.
the Open Access publication charges for this article was
mutations in Thai populations of Aedes aegypti. Acta Trop
provided by Fundac¸a˜o Oswaldo Cruz (FIOCRUZ).
2008;108:54–7.
Conflict of interest: None declared.
16. Saavedra-Rodriguez K, Urdaneta-Marquez L, Rajatileka S
et al. A mutation in the voltage-gated sodium channel gene
associated with pyrethroid resistance in Latin American
Aedes aegypti. Insect Mol Biol 2007;16:785–98.
references
17. Garcia GP, Flores AE, Fernandez-Salas I et al. Recent rapid
Epidemiol Serv Sau´de 2007;16:295–302.
2. Soderlund DM. Pyrethroids, knockdown resistance and
sodium channels. Pest Manag Sci 2008;64:610–6.
aegypti in Mexico. PLoS Negl Trop Dis 2009;3:e531.
18. Chang C, Shen WK, Wang TT et al. A novel amino acid
substitution in a voltage-gated sodium channel is
associated with knockdown resistance to permethrin in
Aedes aegypti. Insect Biochem Mol Biol 2009;39:272–8.
3. da-Cunha MP, Lima JB, Brogdon WG et al. Monitoring of
19. Harris AF, Rajatileka S, Ranson H. Pyrethroid resistance in
resistance to the pyrethroid cypermethrin in Brazilian Aedes
Aedes aegypti from Grand Cayman. Am J Trop Med Hyg
aegypti (Diptera: Culicidae) populations collected between
2001 and 2003. Mem Inst Oswaldo Cruz 2005;100:441–4.
2010;83:277–84.
20. Martins AJ, Ribeiro CD, Bellinato DF et al. Effect of insecti-
4. Montella IR, Martins AJ, Viana-Medeiros PF et al.
cide resistance on development, longevity and reproduc-
Insecticide resistance mechanisms of Brazilian Aedes
tion of field or laboratory selected Aedes aegypti
aegypti populations from 2001 to 2004. Am J Trop Med
populations. PLoS One 2012;7:e31889.
Hyg 2007;77:467–77.
5. Busvine JR. Mechanism of resistance to insecticide in
21. Lima JB, Da-Cunha MP, Da Silva RC et al. Resistance of
houseflies. Nature 1951;168:193–5.
6. Harrison CM. Inheritance of resistance of DDT in the
municipalities in the State of Rio de Janeiro and Espirito
housefly, Musca domestica L. Nature 1951;167:855–6.
7. Catterall WA. From ionic currents to molecular mechan-
22. Belinato TA, Martins AJ, Lima JB et al. Effect of the chitin
isms: the structure and function of voltage-gated sodium
bility and reproduction of Aedes aegypti. Mem Inst Oswaldo
channels. Neuron 2000;26:13–25.
8. Du Y, Nomura Y, Luo N et al. Molecular determinants on
23. Filipecki AT, Machado CJ, Valle S et al. The Brazilian legal
Aedes
aegypti
to
organophosphates
in
several
Santo, Brazil. Am J Trop Med Hyg 2003;68:329–33.
synthesis inhibitor triflumuron on the development, viaCruz 2009;104:43–7.
the insect sodium channel for the specific action of type II
framework on the scientific use of animals. ILAR J 2011;52:
pyrethroid insecticides. Toxicol Appl Pharmacol 2009;234:
E8–15.
266–72.
9. O’Reilly AO, Khambay BP, Williamson MS et al. Modelling
insecticide-binding sites in the voltage-gated sodium
channel. Biochem J 2006;396:255–63.
10. Davies TE, O’Reilly AO, Field LM et al. Knockdown resist-
24. Martins AJ, Valle D. The pyrethroid knockdown resistance.
In: Soloneski,S, Larramendy,MS (eds), Insecticides—Basic
and Other Applications. Rijeka: InTech, 2012, 17–38.
25. Goldin AL. Evolution of voltage-gated Na+ channels. J Exp
Biol 2002;205(Pt 5): 575–84.
ance to DDT and pyrethroids: from target-site mutations
26. Suzuki DT, Grigliatti T, Williamson R. Temperature-sensi-
to molecular modelling. Pest Manag Sci 2008;64:1126–30.
tive mutations in Drosophila melanogaster. VII. A mutation
11. Davies TG, Field LM, Usherwood PN et al. DDT, pyreth-
(para-ts) causing reversible adult paralysis. Proc Natl Acad
rins, pyrethroids and insect sodium channels. IUBMB Life
2007;59:151–62.
12. Brengues C, Hawkes NJ, Chandre F et al. Pyrethroid and
DDT cross-resistance in Aedes aegypti is correlated with
novel mutations in the voltage-gated sodium channel
gene. Med Veterinary Entomol 2003;17:87–94.
Sci USA 1971;68:890–3.
27. Loughney K, Kreber R, Ganetzky B. Molecular analysis of
the para locus, a sodium channel gene in Drosophila. Cell
1989;58:1143–54.
28. Davies TG, Field LM, Usherwood PN et al. A comparative
study of voltage-gated sodium channels in the Insecta:
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
rise of a permethrin knock down resistance allele in Aedes
1. Braga IA, Valle D. Aedes aegypti: vigilaˆncia, monitoramento
da resisteˆncia e alternativas de controle no Brasil.
160
| Martins et al.
Evolution, Medicine, and Public Health
implications for pyrethroid resistance in Anopheline and
other Neopteran species. Insect Mol Biol 2007;16:361–75.
environmental response in the honeybee. Insect Mol Biol
2006;15:615–36.
29. Zhou W, Chung I, Liu Z et al. A voltage-gated calcium-
43. Strode C, Wondji CS, David JP et al. Genomic analysis of
selective channel encoded by a sodium channel-like gene.
detoxification genes in the mosquito Aedes aegypti. Insect
Neuron 2004;42:101–12.
30. Cui YJ, Yu LL, Xu HJ et al. Molecular characterization of
DSC1 orthologs in invertebrate species. Insect Biochem
Mol Biol 2012;42:353–9.
31. Moignot B, Lemaire C, Quinchard S et al. The discovery of
a novel sodium channel in the cockroach Periplaneta
americana: evidence for an early duplication of the paralike gene. Insect Biochem Mol Biol 2009;39:814–23.
32. Bass C, Field LM. Gene ampliEcation and insecticide resistance. Pest Manag Sci 2011;67:886–9.
33. Raymond M, Chevillon C, Guillemaud T et al. An overview
of the evolution of overproduced esterases in the mosquito Culex pipiens. Philos Trans R Soc Lond B Biol Sci
1998;353:1707–11.
esterase A and B genes as a single unit in Culex pipiens
mosquitoes. Heredity (Edinb) 1996;77(Pt 5): 555–61.
ing the molecular basis of pyrethroid resistance in the
dengue vector, Aedes aegypti. PLoS Negl Trop Dis 2012;6:
e1692.
45. Kimura M, King JL. Fixation of a deleterious allele at one of
two ‘‘duplicate’’ loci by mutation pressure and random
drift. Proc Natl Acad Sci USA 1979;76:2858–61.
46. Conant GC, Wolfe KH. Turning a hobby into a job: how
duplicated genes find new functions. Nat Rev Genet 2008;
9:938–50.
47. Papp B, Pal C, Hurst LD. Dosage sensitivity and the
evolution of gene families in yeast. Nature 2003;424:
194–7.
48. Bourguet D, Raymond M, Fournier D et al. Existence of two
acetylcholinesterases in the mosquito Culex pipiens
35. Montella IR, Schama R, Valle D. The classification of esterases: an important gene family involved in insecticide
(Diptera:Culicidae). J Neurochem 1996;67:2115–23.
49. Berticat C, Boquien G, Raymond M et al. Insecticide resist-
resistance—a review. Mem Inst Oswaldo Cruz 2012;107:
ance genes induce a mating competition cost in Culex
36. Mouches C, Pasteur N, Berge JB et al. Amplification of an
pipiens mosquitoes. Genet Res 2002;79:41–7.
50. Berticat C, Duron O, Heyse D et al. Insecticide resistance
esterase gene is responsible for insecticide resistance in a
genes confer a predation cost on mosquitoes, Culex
437–49.
California Culex mosquito. Science 1986;233:778–80.
pipiens. Genet Res 2004;83:189–96.
37. Guillemaud T, Lenormand T, Bourguet D et al. Evolution of
51. Alout H, Djogbenou L, Berticat C et al. Comparison of
resistance in Culex pipiens: Allele replacement and
changing environment. Evolution 1998;52:443–53.
Anopheles gambiae and Culex pipiens Acetycholinesterase
1 biochemical properties. Comp Biochem Physiol B
38. Raymond M, Poulin E, Boiroux V et al. Stability of insecti-
Biochem Mol Biol 2008;150:271–7.
cide resistance due to amplification of esterase genes in
52. Labbe P, Berthomieu A, Berticat C et al. Independent
Culex pipiens. Heredity 1993;70:301–7.
39. Wondji CS, Irving H, Morgan J et al. Two duplicated P450
duplications of the acetylcholinesterase gene conferring
genes are associated with pyrethroid resistance in
insecticide resistance in the mosquito Culex pipiens.
Mol Biol Evol 2007;24:1056–67.
Anopheles funestus, a major malaria vector. Genome Res
53. Djogbenou L, Chandre F, Berthomieu A et al. Evidence of
2009;19:452–9.
40. Itokawa K, Komagata O, Kasai S et al. Genomic structures
introgression of the ace-1R mutation and of the ace-1 duplication in West African Anopheles gambiae s. s. PLoS One
of Cyp9m10 in pyrethroid resistant and susceptible strains
of Culex quinquefasciatus. Insect Biochem Mol Biol 2010;40:
631–40.
41. Ranson H, Claudianos C, Ortelli F et al. Evolution of supergene families associated with insecticide resistance.
Science 2002;298:179–81.
42. Claudianos C, Ranson H, Johnson RM et al. A deficit of
detoxification enzymes: pesticide sensitivity and
2008;3:e2172.
54. Lima EP, Paiva MH, de Araujo AP et al. Insecticide resistance in Aedes aegypti populations from Ceara, Brazil.
Parasit Vectors 2011;4:5.
55. Xu Q, Tian L, Zhang L et al. Sodium channel genes and
their differential genotypes at the L-to-F kdr locus in the
mosquito Culex quinquefasciatus. Biochem Biophys Res
Commun 2011;407:645–9.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
34. Rooker S, Guillemaud T, Berge J et al. Coamplification of
Biochem Mol Biol 2008;38:113–23.
44. Bariami V, Jones CM, Poupardin R et al. Gene amplification, ABC transporters and cytochrome P450s: unravel-
187
Evolution, Medicine, and Public Health [2013] pp. 187–196
doi:10.1093/emph/eot016
orig inal
research
article
Patterns of physical and
psychological development
in future teenage mothers
Daniel Nettle1, Thomas E. Dickins*2, David A. Coall3,4 and Paul de Mornay Davies2
Centre for Behaviour and Evolution, Institute of Neuroscience, Newcastle University, Newcastle, UK; 2Department of
Clinical Neurosciences, University of Western Australia, Crawley, Australia; and 4School of Medical Sciences, Edith Cowan
University, Joondalup, Australia
*Corresponding author. Department of Psychology, Middlesex University, London, UK. Tel:+44(0)2084114588;
E-mail: t.dickins@mdx.ac.uk
Received 3 June 2013; revised version accepted 13 August 2013
ABSTRACT
Background and objectives: Teenage childbearing may have childhood origins and can be viewed as the
outcome of a coherent reproductive strategy associated with early environmental conditions. Life-history
theory would predict that where futures are uncertain fitness can be maximized through diverting effort
from somatic development into reproduction. Even before the childbearing years, future teenage
mothers differ from their peers both physically and psychologically, indicating early calibration to key
ecological factors. Cohort data have not been deliberately collected to test life-history hypotheses within
Western populations. Nonetheless, existing data sets can be used to pursue relevant patterns using
socioeconomic variables as indices of relevant ecologies.
Methodology: We examined the physical and psychological development of 599 young women from the
National Child Development Study who became mothers before age 20, compared to 599
socioeconomically matched controls.
Results: Future young mothers were lighter than controls at birth and shorter at age 7. They had earlier
menarche and accelerated breast development, earlier cessation of growth and shorter adult stature.
Future young mothers had poorer emotional and behavioural adjustment than controls at age 7 and
especially 11, and by age 16, idealized younger ages for marriage and parenthood than did the controls.
Conclusions and implications: The developmental patterns we observed are consistent with the idea
that early childbearing is a component of an accelerated reproductive strategy that is induced by earlylife conditions. We discuss the implications for the kinds of interventions likely to affect the rate of
teenage childbearing.
K E Y W O R D S : life history theory; development; early reproduction; reproductive strategy
ß The Author(s) 2013. Published by Oxford University Press on behalf of the Foundation for Evolution, Medicine, and Public Health. This is an Open
Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits
unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
1
Psychology, Middlesex University, London, UK; 3Community, Culture, and Mental Health Unit, School of Psychiatry and
188
| Nettle et al.
Evolution, Medicine, and Public Health
INTRODUCTION
Whether or not an organism is high or low on
plasticity, their phenotype is regarded as the outcome of selection operating within the parameters
of key trade-offs. ‘Trade-offs represent the costs paid
in the currency of fitness when a beneficial change in
one trait is linked to a detrimental change in another’
[21]. One key trade-off is that between current and
future reproduction. Physiologically this amounts to
a decision about when to stop investing in somatic
capital (growth and maintenance) and divert energy
into reproduction [17, 22]. Some species have a total
commitment to this decision, including Pacific salmon, whose bodies deteriorate during spawning as
they divert all of their somatic capital into reproduction. They die immediately after this event. Other
species, including our own, have a mixed allocation
across lifespan, and in our case we have a lengthy
pre- and post-reproduction life [23].
Within species variation in timing of first reproduction should be sensitive to local ecology. A resource rich ecology will enable a relatively lengthy
investment in somatic capital and a consequent
delay in reproduction. Where the ecology is stressed,
and resource acquisition uncertain, the somatic investment should stop sooner, and reproduction will
commence earlier [24]. The trade-off between quality
and quantity of offspring will also provide selection
pressure. Ecological stress can lead to increased reproduction, effectively as a bet-hedging strategy.
Better resources allow for investment in more robust, higher quality offspring [25].
Human populations in the developed world are
not uniform in their ecological niche, and do not
have equal access to resources. This leads to distinct
life-history differences in terms of morbidity and
mortality across socioeconomic gradients [26].
There are also differences in reproductive strategy,
such that low socioeconomic status neighbourhoods carry a higher risk of teenage pregnancy and
motherhood [3, 13, 27–29]. Life-history theory leads
us to expect key individual differences in behaviour
and physical growth between those who engage in
early reproduction compared with those who are
relatively delayed. Thus, teenage motherhood can
be seen as an extreme end of a niche-specific early
fertility strategy. The average age of first birth in
poorer neighbourhoods will be lower than that in
wealthier boroughs, but not all reproduction will
begin during teenage years in deprived areas [30].
For those who do reproduce during their teenage
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Most women in Western populations delay the onset
of childbearing. However, there is a small minority
who become mothers before the age of 20. This
‘teenage childbearing’ phenomenon continues to
attract public health interest and policy interventions [1–3], although the basis for considering it a
major problem is debatable [4–6]. Policy makers
often regard teenage childbearing as a mistake,
stemming from lack of skills and knowledge surrounding contraception and sexual relationships
[2, 7]. However, the contention that contraceptive
behaviour or knowledge is a major causal factor is
not well supported by evidence [1, 8, 9]. Moreover,
programmes of intervention that provide contraceptive education to adolescents have been found to
have no effect on the rate of teenage childbearing
[10–12].
Policy makers have viewed this phenomenon as
the outcome of ‘poor’ reasoning, and it is assumed
that better reasoning will lead to delayed reproduction [13]. An alternative perspective holds that early
childbearing is part of a coherent reproductive strategy for some women. Indeed, women’s ideal age for
parenthood, surveyed at age 16 in the National Child
Development Study (NCDS) (see below), is generally a good predictor of their subsequent actual age
at first pregnancy [14]. Such desires could be seen as
indicative of peer pressure imposing a social norm
within such populations, but stable pro-natal attitudes of this sort also require an explanation, and
could easily be symptomatic of a reproductive strategy [13]. In addition, teenage mothers reach menarche relatively early [15], suggesting more rapid
maturation.
Reproductive strategies differ between and within
species. Life-history theory captures these differences [16]. A key assumption is that organisms will
act to maximize their average lifetime inclusive fitness, and that selection will have led to the evolution
of proximate mechanisms that enable physiological
and behavioural calibration to local ecological
contingencies [17]. The degree of calibration will vary
across species from fixed to more plastic strategies.
Those that inhabit relatively stable ecological niches
are more likely to have low levels of plasticity
compared with generalists or those from stochastic
ecologies [18–20]. Within a species, where different
ecologies are populated, we should expect to see
different phenotypic responses to maximize inclusive fitness.
Physical and psychological development in teenage mothers
examining emotional and behavioural adjustment
earlier in childhood in future teenage mothers. The
strategic view of teenage childbearing also suggests
that future teenage mothers should have a motivational orientation towards early childbearing, and
this should be significantly before first conception.
Consistent with this view, Maestripieri et al. [53]
found that adolescent women from father-absent
households, who are prone to show accelerated reproductive strategies, show a greater preference for
images of infants than their peers.
In this article, we use longitudinal data from the
NCDS to compare the developmental profiles of a
group of young women who became teenage
mothers with those of a control group who did not.
We examine physical variables (weight and height,
weight and height gain, pubertal development,
timing of menarche), and psychological variables
(psychological adjustment in childhood, reproductive intentions at adolescence). As outlined earlier,
we predict that the future young mothers will be
characterized by poorer growth very early in life,
rapid weight gain in middle childhood, early menarche and pubertal maturation and the early cessation
of growth. Psychologically, we would expect to see
negative emotional symptoms and behavioural adjustment problems in childhood, and a motivational
orientation to early parenthood that is detectable by
adolescence. We also investigate exposure to
contraceptive education at age 16, to test for effects
of lack of knowledge.
Several of the developmental differences we predict have been found in previous research (e.g. early
menarche [14], reduced adult stature [54], unhappiness in childhood [55] and idealization of early parenthood [28] are all associated with teenage
childbearing). However, not all studies control rigorously for socio-economic position. This is important, as teenage childbearing is concentrated in the
poorest social strata [56], and thus future teenage
mothers will differ from the rest of the population in
many ways that are related to poverty, but not directly related to their reproductive schedules. In this
study, we compare future young mothers only to a
socioeconomically matched control group to mitigate this problem, and to identify precursors that
are specific to teenage childbearing. Moreover, no
previous study has examined all the physical and
psychological antecedents in a single investigation.
The NCDS has exceptionally rich longitudinal data,
including a wide variety of different measures,
allowing this order of analysis. We can therefore
189
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
years we must look to additional differences between mothers, and idiosyncratic ecological issues,
beyond a general socioeconomic categorization.
Belsky et al. [31] proposed that adverse early-life
conditions—specifically, low parental investment
and family stress—induce accelerated reproductive
strategies as an adaptive response. Many studies
have observed associations consistent with this
hypothesis, such as those between low birthweight
and early menarche [32–34], poor parent–child relationships and early menarche [35–38], or between
stressful family environment and age at first sexual
activity or conception [39, 40]. It is hard to separate
out genetic and environmental explanations for these
associations, given that there are established heritable effects on pubertal maturation [41], and there
could be genetic correlations between these factors
and parenting behaviours [42, 43]. However, evidence
from genetically informative study designs [36], and
experimental animal models [44, 45], suggests that
the relationship between early-life inputs and subsequent reproductive strategies may be partly causal.
Gene Environment interactions, whereby people
with some genotypes are more responsive than
others to the effect of rearing conditions, are also
plausible [46].
If teenage childbearing is the outcome of a coherent reproductive strategy, and if that strategy is
induced by early environmental conditions, then
we can predict that future teenage mothers will differ
from their peers in many ways beyond their knowledge about contraception. Moreover, these differences should be evident well before the childbearing
years. Physically, we should expect relatively poor
growth very early in life, since growth immediately
before and after birth is highly sensitive to maternal
investment [47, 48]. This should however be
coupled with earlier puberty, and because of the
relationship between pubertal maturation and stature increase [49], also with earlier cessation of stature growth. Early puberty requires rapid weight gain
in middle childhood [50, 51], and thus we might additionally predict this pattern in future young
mothers.
At the psychological level, Belsky et al. [31] suggested that adverse rearing conditions should be
reflected in increased levels of emotional and behavioural problems in childhood, and that these mediate the acceleration of reproductive strategy.
Associations have been reported between teenage
childbearing and conduct problems in adolescence
[52], but there is a paucity of quantitative research
Nettle et al. |
190
| Nettle et al.
Evolution, Medicine, and Public Health
compare the strength of association across different
types of variables to investigate the relative
strengths of say, depression in late childhood, early
menarche and lack of contraceptive education, as
individual predictors of teenage childbearing.
METHODS
No separate ethical approval was required for this
research, as it was based on a secondary analysis of
an existing, anonymous data set. Written consent for
the storage of data was given by the parents of all
cohort members (CMs), and, in adulthood, by the
CMs themselves.
Study population and design
Measures
Physical development
Our physical development measures include birthweight (oz), weight (kg) and height (m) measured at
the ages of 7, 11, 16 and 23. We also used these
variables to calculate the gains in weight and height
between 7 and 11, 11 and 16 and 16 and 23. Pubertal
development was assessed at 11 and 16, with physicians assessing breast development (scales 1–5 at
age 11, absent/intermediate/adult at age 16) and
pubic hair (scales 1–5 at age 11, absent/sparse/
intermediate/adult at age 16). We treat the age 11
pubertal development variables as continuous, and
for the age 16 variables, we contrast ‘adult’ (the
modal response) with ‘non-adult’ (the other options
combined). Age at onset of menses is reported twice
in the NCDS data: by the girl being asked during
physician examination at age 16, and by mother’s
report in an interview at age 16. Once responses of
‘Not yet started’ and ‘Age unknown’ have been
deleted from both variables, the two correlate at
r = 0.72 (P < 0.001). Here, we use the mother report
as it has more than 100 more complete records for
our case group.
Psychological development
At ages 7 and 11, CMs’ teachers assessed their
behaviour using items from the Bristol Social
Adjustment Guides (BSAG) [58]. The teachers
indicated whether a large number of classes of behaviour indicating poor adjustment were present
(yes = 1/no = 0). These ratings give an overall
maladjustment score (BSAG total; higher score
indicates worse adjustment), and scores for
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
We used data from the NCDS, a longitudinal study of
all children born in the UK between 3 March and 9
March 1958. Extensive medical and sociological
data were gathered at the time of birth, at 7, 11, 16
and 23 years, using perinatal hospital data, physician examination and interviews with parents,
teachers and the CMs themselves. The NCDS is
ongoing.
We employed a case control design for the following reasons. First, it is advantageous for studying
dynamic populations in which follow-up is difficult.
Second, it is effective for examining outcomes with a
long latency period between exposure and manifestation—in this study this is up to 20 years. Third, it
can be used to examine multiple risk factors for development of the focal variable. Given that
longitudinal data have not been collected with our
specific hypotheses in mind we recognize that total
control is impossible to achieve. To this end
we regard this study as an exploratory proof of
concept.
Our initial sample included all female CMs whose
gestational age was known and was >259 days
(term), and who were still in the study at age 23.
From these 5152 women, 600 reported having a
child before their 20th birthday (the ‘case’ group).
Socioeconomic position in 1958 was primarily
measured using the Registrar General’s social class
framework [57], a five-point scale based on occupational ranking.
To control for family socioeconomic position, we
selected a set of controls such that the frequency
distribution of the social class of the CM’s mother’s
husband (variable n492), and the social class of
CM’s mother’s father (variable n526), was the same
in the case and control groups. This included selecting controls with missing values of these variables to
correspond to cases with missing values. Selection
of controls where there were more than needed who
met the criteria was done by lowest NCDS serial
number. One case could not be matched due to a
unique combination of social class variables and
was excluded from the study. Thus, the ‘case’ and
‘control’ groups (n = 599) are identical in terms of
their distributions of household social class at the
time of birth, and social class background of the
CM’s mother, although they are unrepresentative
of the NCDS women as a whole (see Table 1). The
case and control groups do not differ in gestational
age (cases: mean 283.31, SD 10.35; controls: mean
283.05, SD 9.70, t1196 = 0.46, n.s.).
Nettle et al. |
Physical and psychological development in teenage mothers
191
Table 1. Frequencies (percentages) of different social classes of mother’s husband, and mother’s father, in the case and control groups, and in women meeting
the inclusion criteria from the NCDS cohort as a whole
Class category
Cases and controls
229
687
3010
601
409
4
114
1
97
(4.4)
(13.3)
(58.4)
(11.7)
(7.9)
(0.1)
(2.2)
(0.01)
(1.9)
3
35
346
105
70
0
25
0
15
(0.5)
(5.8)
(57.8)
(17.5)
(11.7)
(0)
(4.2)
(0)
(2.5)
115
673
2266
633
586
36
394
60
289
(2.2)
(13.1)
(44.0)
(12.3)
(11.4)
(0.7)
(7.7)
(1.2)
(7.6)
3
47
236
103
95
3
52
6
54
(0.5)
(7.9)
(39.4)
(17.2)
(15.9)
(0.5)
(8.7)
(1.0)
(9.0)
12 subscales (unforthcomingness, withdrawal, depression, anxiety about acceptance by adults, hostility towards adults, writing off adults and standards,
anxiety about acceptance by children, hostility
towards children, restlessness, inconsequential
behaviour, miscellaneous symptoms and miscellaneous nervous symptoms). The subscale scores all
had a strong mode at zero, and so we have
treated them as dichotomous (zero score/non-zero
score). The BSAG total scores did not have a
mode at zero, but were skewed, and so we have
square root transformed them for the purposes of
t-tests.
At age 16, CMs were asked in an interview to state
the ideal age to get married, and the ideal age to start
a family. Responses were coded using a series
of categories (16 or 17, 18 or 19, 20 or 21, 22–25,
26–30, over 30). We have reconverted these
categories into ages using category mid-points (30
for ‘Over 30’), but since the resulting distribution is
non-normal, we use non-parametric statistics to test
for differences in these variables. In the same interview, CMs were asked whether they had lessons
about conception in the context of sex and
relationships education at school, and whether they
felt that they had been provided with enough information about conception.
Analysis
As our design controls for socioeconomic position,
and the CMs do not differ in age, our statistical analyses are very simple. We compare variables between
the case and control groups, reporting odds ratios
(ORs) and their confidence intervals (CIs) for dichotomous variables, and t-tests or non-parametric
Mann–Whitney U-tests for continuous ones. We
report Cohen’s d [59] as a measure of effect size
where appropriate. Note that we do not use paired
statistics. Since around 150 cases have a father and a
maternal grandfather from class III, for example, it
would be arbitrary to match each case to one particular control for statistical purposes (and there
would be many thousands of equally valid
matchings). Instead, our design ensures that the
overall socioeconomic profiles of the case and control groups do not differ, but the comparisons are
between the group means or frequencies.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Mother’s husband
I
II
III
IV
V
Students
Single, dead, away
Retired
Missing data
Mother’s father
I
II
III
IV
V
Unemployed, sick
Dead, away
Retired
Missing data
Whole cohort
192
| Nettle et al.
Evolution, Medicine, and Public Health
RESULTS
Growth and physical development
Psychological development
At age 7, the cases had higher total BSAG scores
than the controls (t1095 = 5.77, P < 0.01, d = 0.35).
At age 11, the difference had become more marked
(t1034 = 7.25, P < 0.01, d = 0.45). Table 3 shows the
OR for having a non-zero score on each of the BSAG
subscales. At age 7, cases were significantly more
likely to have a non-zero score than controls for
unforthcomingness, depression, hostility towards
adults, writing off adults and standards, inconsequential behaviour, and miscellaneous symptoms.
At age 11, cases were significantly more likely to have
a non-zero score than controls on all subscales except for withdrawal and anxiety about acceptance by
adults. Effect sizes for the BSAG subscales were generally substantial, with a mean OR of 1.82 at age 11
(Table 3).
The case group gave a significantly lower mean
ideal age for marriage than the controls (Table 4;
Mann–Whitney U-test: z = 7.77, P < 0.01). The case
group also had significantly lower mean ideal ages
DISCUSSION
Our results indicate that the differences between
British women who initiate childbearing early, and
their peers who do not, are apparent well before adolescence. Future young mothers in the NCDS cohort
were significantly lighter than their peers at birth,
and by age 7, lagged behind their peers in terms of
height. Between 7 and 16, future young mothers
caught up somewhat in terms of height, and particularly in terms of weight, though the difference in
weight gain between 7 and 16 was not statistically
significant. We note the similarity here to the growth
profile of those at risk for cardiovascular and metabolic problems later in life; low weight at birth and in
early childhood, followed by relatively rapid weight
gain in middle childhood [60]. Thus, accelerated reproductive schedules may have similar developmental origins. Our future young mothers also showed
signs of accelerated pubertal maturation, with more
adult breast development at 16, and an average age
at menarche around 4 months younger than the controls. They also gained very little height after 16
compared to their peers, suggesting early termination of growth and an accelerated transition from
adolescence to adulthood. The effect sizes for physical differences between future young mothers and
controls were generally small [59], with the difference
in timing of menarche providing the largest effect.
The psychological variables reveal increased
levels of emotional and behavioural disturbance at
age 7 and, more strongly, at age 11. In contrast to the
physical differences, the effect sizes for the psychological variables are substantial, with the odds of
depression and hostility at age 11, for example,
being over twice as high in the future young mothers
as in the control group. Previous research has found
that conduct disorder, but not affective problems
such as depression, in adolescence, is predictive of
teenage pregnancy [52]. However, using a psychological assessment in childhood, we found that both
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
The cases were on average significantly lighter than
the controls at birth (Table 2), and tended to be
lighter at age 7 (P = 0.06). All differences in weight
and also in weight gain were non-significant after age
7. The cases were significantly shorter than the controls at 7 and 11, and then again at 23. The height
gain 7–11 and 11–16 was no different for cases and
controls (data not shown). However, the height gain
between 16 and 23 was significantly less for the
cases than controls (t788 = 4.49, P < 0.01,
d = 0.32). The mean height gain 16–23 for the
cases was 0.7 cm, compared to 1.5 cm for the
controls.
There was no difference in ratings of breast or
pubic hair development at age 11 between cases
and controls (t946 = 0.92, n.s.; t945 = 0.05, n.s.).
However, at age 16, cases were more likely to be
judged to have adult breasts than the controls (marginally significant: OR = 1.34, 95% CI 1.00–1.81,
P = 0.05). The odds of being judged to have adult
pubic hair were not significantly different between
cases and controls (OR = 1.18, 95% CI 0.88–1.57).
Menarche was significantly earlier in the cases than
controls (t859 = 3.35, P < 0.01, d = 0.23; Table 2),
with a mean difference of 0.29 years.
for starting a family than the controls (Mann–
Whitney U-test: z = 7.07, P < 0.01). Within the case
group, 15.8% reported having had no sex education
lessons about conception, compared to 12.8% of the
controls (difference not significant: OR 1.28, 95% CI
0.87–1.89). Asked whether they needed more information about conception, 34.3% of the cases answered ‘yes’ or ‘maybe’. This compared to 30.7%
of the matched controls (difference not significant:
OR 1.12, 95% CI 0.95–1.49).
Nettle et al. |
Physical and psychological development in teenage mothers
193
Table 2. Comparison of the case and control groups for physical development variables
NCDS variable
Cases
Controls
Effect size
Birthweight (oz)
Weight, age 7 (kg)
Weight, age 11 (kg)
Weight, age 16 (kg)
Weight, age 23 (kg)
Height, age 7 (m)
Height, age 11 (m)
Height, age 16 (m)
Height, age 23 (m)
Breast development, age 11
Pubic hair, age 11
Breast development, age 16
Pubic hair, age 16
Age at menarche
n574
dvwt07
dvwt11
dvwt16
dvwt23
dvht07
dvht11
dvht16
dvht23
n1531
n1532
From n2005
From n2006
From n2648
114.81 (6.93)
23.12 (3.46)
36.73 (7.69)
54.52 (8.83)
58.16 (10.03)
1.208 (0.057)
1.436 (0.071)
1.600 (0.061)
1.605 (0.065)
1.98 (0.93)
1.86 (0.93)
Adult 258/non-adult 111
Adult 222/non-adult 133
12.57 (1.33)
116.81 (16.91)
23.55 (3.68)
37.54 (7.52)
54.19 (8.29)
58.37 (8.96)
1.220 (0.060)
1.447 (0.073)
1.607 (0.064)
1.621 (0.069)
2.04 (0.95)
1.86 (0.89)
Adult 268/non-adult 155
Adult 244/non-adult 172
12.86 (1.25)
0.12*
0.12
0.11
0.04
0.02
0.21*
0.15*
0.11
0.25*
0.06
0
OR 1.34*
OR 1.18
0.23*
Given are descriptive statistics for each group (means and standard deviations or frequencies, as appropriate), and effect size of the case–control
comparison (Cohen’s d or OR, as appropriate). *P < 0.05.
Table 3. OR (95% CIs) for receiving a non-zero score on each of the BSAG
subscales, for cases versus controls, at ages 7 and 11
Scale
Age 7
Age 11
Unforthcomingness
Withdrawal
Depression
Anxious accept. adults
Host. adults
Writing off adults
Anxious children
Host. children
Restlessness
Incons. behaviour
Misc. symptoms
Misc. nervous
1.50* (1.18–1.90)
1.00 (0.72–1.38)
1.64* (1.29–2.09)
1.11 (0.87–1.41)
1.95* (1.49–2.56)
1.79* (1.32–2.19)
1.11 (0.78–1.72)
1.22 (0.90–1.72)
1.30 (0.94–1.79)
1.68* (1.32–1.85)
1.45* (1.13–1.85)
1.12 (0.74–1.70)
1.30* (1.02–1.66)
1.34 (0.99–1.83)
2.28* (1.78–2.93)
1.29 (0.99–1.67)
2.00* (1.52–2.62)
1.54* (1.20–1.97)
1.59* (1.12–2.25)
2.62* (1.87–3.68)
2.43* (1.67–3.34)
1.75* (1.37–2.24)
1.69* (1.31–2.17)
1.97* (1.19–3.26)
*P < 0.05.
conduct problems and affective problems were
more prevalent in future young mothers than in controls. In fact, increased emotional and behavioural
disturbance in the future young mothers was consistent across all the subscales of the BSAG at age
11. Coupled with this was an idealization of earlier
marriage and earlier childbearing by age 16. Thus,
the psychological variables suggest a picture of poor
adjustment and negative emotionality in mid- to
late-childhood, associated with a tendency to reproduce young that is already in place by age 16. This
evidence accords with recent qualitative studies,
which have suggested that unhappiness in childhood is often a precursor to teenage motherhood,
and that it is generally experienced as a positive life
development [4, 5, 61].
The pattern of psychological development—unhappiness in childhood alongside a desire for parenthood—neatly mirrors the physical one of poorer
childhood growth, but precocious development at
and after puberty. Taken together, the physical and
psychological trajectories are consistent with the
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Measure
194
| Nettle et al.
Evolution, Medicine, and Public Health
Table 4. Comparison of the case and control groups for psychological development variables
Variable
NCDS
variable
Cases
Controls
Effect
size
BSAG total score, age 7
BSAG total score, age 11
Ideal age for marriage
Ideal age for family
No lessons about conception
Needs more info about
conception
n455
n1008
From n2809
From n2810
From n2825
From n2858
9.08 (8.29)
10.17 (9.53)
20.66 (2.54)
22.67 (2.75)
Yes 63/no 335
Yes 129/no 247
6.62 (7.36)
6.43 (7.10)
21.81 (2.26)
23.96 (2.55)
Yes 58/no 396
Yes 135/no 305
0.35*
0.45*
0.48*
0.49*
OR 1.28
OR 1.12
Given are descriptive statistics for each group (means and standard deviations or frequencies, as appropriate), and
effect size of the case–control comparison (Cohen’s d or OR, as appropriate). *P < 0.05.
that mothers who gave birth at or before age 20 were
more socioeconomically deprived, had reduced
human and social capital and experienced significantly more mental health problems than mothers
who delayed childbearing.
The current research is valuable for two reasons.
First, it allows us to clearly identify individual-level
developmental precursors of early childbearing,
above and beyond socioeconomic background.
Our results suggest that young women who physically mature earlier in comparison to their peers, and
especially those whose emotional and behavioural
adjustment before puberty is poor, are at substantially increased likelihood of seeking early parenthood. Second, it has implications for the design of
interventions. One of the few respects in which the
future young mothers did not, on aggregate, differ
significantly from the controls is in their exposure to
sex education lessons about conception, or their satisfaction with those lessons (cf. [1]). Moreover, the
finding that future young mothers had earlier ideal
ages for parenthood undermines the view that teenage pregnancy is generally caused by mistakes
stemming from poor contraceptive skills. Instead,
teenage childbearing generally occurs in the context
of early target ages for conception, and stands at the
culmination of a long developmental trajectory that
begins as early as in utero. It is quite plausible that
interventions that improve birthweight or early
growth, or reduce emotional distress in childhood,
would disrupt this developmental trajectory, and
have the eventual effect of reducing teenage pregnancy rates, while merely improving knowledge of
contraception is unlikely to have much effect. This
suggestion is borne out by the literature on the effectiveness of different kinds of intervention
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
idea of a facultative accelerated reproductive strategy being triggered by adverse early experience [31].
However, we note that with our current data, we can
only document the different developmental trajectory of future young mothers; we cannot separate
out the possible genetic and environmental
influences causing it. There is good evidence
for both genetic and environmental influences on,
for example, age at menarche [36, 41], and Gene
Environment interactions are also likely to be
important.
We should note by way of caution that the case–
control comparisons reported here aggregate all the
future young mothers together, and all the controls
together. Thus, our analyses do not reflect the fact
that there may be multiple pathways to teenage
childbearing. Some cases of teenage childbearing
may indeed reflect lack of contraceptive education;
our results merely show that this is not generally the
case in this cohort. Moreover, we have not
discriminated the possibility that, for example, one
subset of teenage conceptions is preceded by depression in childhood, while a different subset is
preceded by early menarche, from the possibility
that depression in childhood causes early menarche
which leads to early parenthood. Our data are also
relatively old, with the NCDS young mothers having
their babies in the 1970s. Although the UK rate of
teenage childbearing has declined since that time
[28], there is no reason to believe that fundamental
socioeconomic or psychosocial determinants have
altered significantly in recent decades [62]. Indeed,
one influential study of teenaged mothers in contemporary Britain noted that they continue to experience difficulties similar to those reported for earlier
cohorts. Moffitt and E-Risk Study Team [63] reported
Nettle et al. |
Physical and psychological development in teenage mothers
programme, which shows that interventions aimed
at increasing childhood well-being do tend to have
an impact [55], whereas sex education programmes
aimed at adolescents do not [10–12].
195
delivered by teachers on NHS registered conceptions
and terminations: final results of cluster randomised trial.
BMJ 2007;334:133–6.
12. Stephenson J, Strange V, Allen E et al. The long-term
effects of a peer-led sex education programme (RIPPLE):
a cluster randomised trial in schools in England. PLoS Med
acknowledgements
2008;5:e224.
13. Dickins T, Johns S, Chipman A. Teenage pregnancy in the
The NCDS is run by the Centre for Longitudinal Studies,
United Kingdom: a behavioral ecological perspective. J Soc
Institute of Education, London (www.cls.ioe.ac.uk), and data
are made available to registered researchers via the UK Data
Evol Cult Psychol 2012;6:344–59.
14. Nettle D, Coall DA, Dickins TE. Birthweight and paternal
Archive (www.data-archive.ac.uk). We should like to thank
involvement predict early reproduction in British women:
two anonymous reviewers for useful comments made on
evidence from the National Child Development Study. Am
an earlier draft of this paper, and also the editorial team
for their useful input. Conceived the study: D.N., T.E.D.
J Hum Biol 2009;22:172–9.
15. Buston K, Williamson L, Hart G. Young women under
and D.A.C.; obtained and screened data: D.N. and D.A.C.;
16 years with experience of sexual intercourse: who be-
analysed data: D.N.; wrote and revised the paper: D.N.,
comes pregnant? J Epidemiol Community Health 2007;61:
T.E.D., D.A.C. and P.D.M.D.
221–5.
references
and prospects. Naturwissenschaften 2000;87:476–86.
17. Kaplan H, Gangestad S. Life history theory and
1. Allen E, Bonell C, Strange V et al. Does the UK govern-
evolutionary psychology. In: Buss DM (ed). The
ment’s teenage pregnancy strategy deal with the correct
Handbook of Evolutionary Psychology. Hoboken, N.J.:
risk factors? Findings from a secondary analysis of data
John Wiley and Sons, 2005, 68–95.
from a randomised trial of sex education and their impli-
18. Sol D. Revisiting the cognitive buffer hypothesis for the
cations for policy. J Epidemiol Community Health 2007;61:
evolution of large brains. Biol Lett 2009;5:130–3.
19. Stearns SC. The evolutionary significance of phenotypic
20–7.
2. SEU. Teenage Pregnancy. London: Social Exclusion Unit/
HMSO, 1999.
3. Paranjothy S, Broughton H, Adappa R et al. Teenage pregnancy: who suffers? Arch Dis Child 2009;94:239–45.
plasticity: phenotypic sources of variation among organisms can be described by developmental switches and reaction norms. BioScience 1989;436:1–10.
20. Leimar O. Environmental and genetic cues in the evolu-
4. Arai L. Teenage Pregnancy: The Making and Unmaking of a
tion of phenotypic polymorphism. Evol Ecol 2007;23:
Problem. Policy Press: Bristol, 2009.
5. Duncan S. What’s the problem with teenage parents? And
125–35.
what’s the problem with policy? Crit Soc Policy 2007;27:
307–34.
21. Stearns SC. Trade-offs in life-history evolution. Funct Ecol
1989;3:259–68.
22. Chisholm JS, Ellison PT, Evans J et al. Death, hope, and sex.
maternal age adversely affect child development?
Curr Anthropol 1993;34:1–24.
23. Kaplan H, Hill KIM, Lancaster J et al. A theory of human life
Evidence from cousin comparisons in the United States.
history evolution: diet, intelligence, and longevity. Evol
6. Geronimus AT, Korenman S, Hillemeier MM. Does young
Popul Dev Rev 1994;20:585–609.
Anthropol Issues News Rev 2000;9:156–85.
7. Wight D, Abraham C. From psycho-social theory to sus-
24. Lawson DW, Mace R. Parental investment and the opti-
tainable classroom practice: developing a research-based
mization of human family size. Philos Trans R Soc Lond B
teacher-delivered sex education programme. Health Educ
Biol Sci 2011;366:333–43.
25. Borgerhoff Mulder M. Optimizing offspring: the quantity–
Res 2000;15:25–38.
8. Harvey N, Gaudoin M. Teenagers requesting pregnancy
termination are no less responsible about contraceptive
use at the time of conception than older women. BJOG
2007;114:226–9.
9. Seamark C. Design or accident? The natural history of
teenage pregnancy. J R Soc Med 2001;94:282–5.
10. DiCenso A, Guyatt G, Willan A et al. Interventions to
reduce unintended pregnancies among adolescents: systematic review of randomised controlled trials. BMJ 2002;
324:1426–30.
11. Henderson M, Wight D, Raab GM et al. Impact of a theoretically based sex education programme (SHARE)
quality tradeoff in agropastoral Kipsigis. Evol Hum Behav
2000;21:391–410.
26. Marmot M. Fair society, healthy lives. Public Health 2010;
126(Suppl.):S4–10.
27. Johns SE. Perceived environmental risk as a predictor of
teenage motherhood in a British population. Health Place
2011;17:122–31.
28. Kiernan KE. Becoming a young parent: a longitudinal
study of associated factors. Br J Sociol 1997;48:406–28.
29. Smith DM, Elander J. Effects of area and family deprivation
on risk factors for teenage pregnancy among 13-15-yearold girls. Psychol Health Med 2006;11:399–410.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
16. Stearns SC. Life history evolution: successes, limitations,
196
| Nettle et al.
Evolution, Medicine, and Public Health
30. Nettle D. Flexibility in reproductive timing in human
46. Belsky J, Pluess M. Beyond diathesis stress: differential
females: integrating ultimate and proximate explan-
susceptibility to environmental influences. Psychol Bull
ations. Philos Trans R Soc Lond B Biol Sci 2011;366:
357–65.
2009;135:885–908.
47. Wells JCK. Maternal capital and the metabolic ghetto: an
31. Belsky J, Steinberg L, Draper P. Childhood experience,
evolutionary perspective on the transgenerational basis of
interpersonal development, and reproductive strategy:
an evolutionary theory of socialization. Child Dev 1991;
health inequalities. Am J Hum Biol 2010;22:1–17.
48. Bogin B. Patterns of Human Growth. Cambridge University
62:647–70.
32. Adair LS. Size at birth predicts age at menarche. Pediatrics
2001;107:e59.
33. Sloboda DM, Hart R, Doherty DA et al. Age at menarche:
influences of prenatal and postnatal growth. J Clin
Endocrinol Metab 2007;92:46–50.
Press: Cambridge, 1988.
49. Nettle D. Women’s height, reproductive success and the
evolution of sexual dimorphism in modern humans. Proc
Biol Sci 2002;269:1919–23.
50. Blell M, Pollard TM, Pearce MS. Predictors of the age at
first menarche in the Newcastle Thousand Families Study.
34. Opdahl S, Nilsen TIL, Romundstad PR et al. Association of
size at birth with adolescent hormone levels, body size and
J Biosoc Sci 2008;40:563–75.
51. Silva IdS, De Stavola BL, Mann V et al. Prenatal factors,
age at menarche: relevance for breast cancer risk. Br J
childhood growth trajectories and age at menarche. Int J
Cancer 2008;99:201–6.
Epidemiol 2002;31:405–12.
52. Maughan B, Lindelow M. Secular change in psychosocial
relationships and individual differences in the timing
risks: the case of teenage motherhood. Psychol Med 1997;
of pubertal maturation in girls: a longitudinal test
27:1129–44.
of an evolutionary model. J Pers Soc Psychol 1999;77:
53. Maestripieri D, Roney JR, DeBias N et al. Father absence,
387–401.
36. Tither JM, Ellis BJ. Impact of fathers on daughters’ age at
menarche and interest in infants among adolescent girls.
Dev Sci 2004;7:560–6.
menarche: a genetically and environmentally controlled
54. Brennan L, McDonald J, Shlomowitz R. Teenage births and
sibling study. Dev Psychol 2008;44.
37. Bogaert AF. Menarche and father absence in a national
probability sample. J Biosoc Sci 2008;40:623–36.
38. Alvergne A, Faurie C, Raymond M. Developmental plasti-
final adult height of mothers in India, 1998-1999. J Biosoc
Sci 2005;37:185–91.
55. Harden A, Brunton G, Fletcher A et al. Teenage pregnancy
and social disadvantage: systematic review integrating
city of human reproductive development: effects of early
controlled trials and qualitative studies. BMJ 2009;339:b4254.
family environment in modern-day France. Physiol Behav
2008;95:625–32.
56. Imamura M, Tucker J, Hannaford P et al. Factors associated
with teenage pregnancy in the European Union countries: a
39. Ellis BJ, Bates JE, Dodge KA et al. Does father absence
place daughters at special risk for early sexual activity
and teenage pregnancy? Child Dev 2003;74:801–21.
40. Chisholm J, Quinlivan J, Petersen R et al. Early stress predicts age at menarche and first birth, adult attachment,
and expected lifespan. Hum Nat 2005;16:233–65.
41. Hartge P. Genetics of reproductive lifespan. Nat Rev Genet
2009;41:637–8.
42. Moffitt TE, Caspi A, Belsky J et al. Childhood experience
and the onset of menarche: a test of a sociobiological
model. Child Dev 1992;63:47–58.
43. Comings DE, Muhleman D, Johnson JP et al. Parent–
systematic review. Eur J Public Health 2007;17:630–6.
57. Office for Population Censuses and Surveys. Classification
of Occupations and Coding Index. Her Majesty’s Stationery
Office: London, 1980.
58. Stott DH. The Social-adjustment of Children: Manual to the
Bristol Social Adjustment Guides. University of London
Press: London, 1965.
59. Cohen J. Statistical Power Analysis for the Behavioral Sciences.
Lawrence Erlbaum Associates: Hillsbaum, NJ, 1988.
60. Barker DJP, Osmond C, Forse´n TJ et al. Trajectories of
growth among children who have coronary events as
adults. N Engl J Med 2005;353:1802–9.
daughter transmission of the androgen receptor gene as
61. Coleman L, Cater S. ‘Planned’ teenage pregnancy: per-
an explanation of the effect of father absence on age of
spectives of young women from disadvantaged back-
menarche. Child Dev 2002;73:1046–51.
44. Cameron NM, Fish EW, Meaney MJ. Maternal influences
on the sexual behavior and reproductive success of the
grounds in England. J Youth Stud 2006;9:593–614.
62. Hobcraft J. The timing and partnership context of
female rat. Horm Behav 2008;54:178–84.
45. Cameron NM, Shahrokh D, Del Corpo A et al.
Epigenetic programming of phenotypic variations in re-
becoming a parent: cohort and gender commonalities
and differences in childhood antecedents. Demogr Res
2008;19:1281–322.
63. Moffitt TE, E-Risk Study Team. Teen-aged mothers in
productive strategies in the rat through maternal care.
contemporary Britain. J Child Psychol Psychiatr 2002;43:
J Neuroendocrinol 2008;20:795–801.
727–42.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
35. Ellis B, McFadyen-Ketchum S. Quality of early family
273
Evolution, Medicine, and Public Health [2013] pp. 273–288
doi:10.1093/emph/eot022
orig inal
research
a r t i c le
Evidence for independent
evolution of functional
progesterone withdrawal in
primates and guinea pigs
1
Yale Systems Biology Institute and Department of Ecology and Evolutionary Biology, Yale University, New Haven,
CT, USA; 2Perinatology Research Branch, Program for Perinatal Research and Obstetrics, Division of Intramural Research,
Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH, Bethesda, MD, USA;
3
Department of Obstetrics and Gynecology, University of Michigan, Ann Arbor, MI, USA; 4Department of Epidemiology
and Biostatistics, Michigan State University, East Lansing, MI, USA; 5Department of Obstetrics and Gynecology,
Wayne State University, Detroit, MI, USA
*Correspondence address. Yale Systems Biology Institute, Yale University West Campus, Advance Biosciences Center,
Room 272B, PO BOX 27388, West Haven, CT 06516-7388, USA. Tel: þ1 203 737 3091; Fax: þ1 203 737 3109;
E-mail: gunter.wagner@yale.edu
Received 8 May 2013; revised version accepted 21 November 2013
ABSTRACT
Background and objectives: Cervix remodeling (CRM) is a critical process in preparation for parturition.
Early cervix shortening is a powerful clinical predictor of preterm birth, and thus understanding how
CRM is regulated is important for the prevention of prematurity. Humans and other primates differ from
most other mammals by the maintenance of high levels of systemic progesterone concentrations.
Humans have been hypothesized to perform functional progesterone withdrawal (FPW). Guinea pigs
are similar to humans in maintaining high-progesterone concentrations through parturition, thus
making them a prime model for studying CRM. Here, we analyze the phylogenetic history of FPW
and document gene expression in the guinea pig uterine cervix.
Methodology: Data on progesterone withdrawal were collected from the literature, and character
evolution was analyzed. Uterine cervix samples were collected from non-pregnant, mid-pregnant and
late pregnant guinea pigs. RNA was extracted and sequenced. Relative transcript levels were estimated
and compared among sample groups.
Results: The phylogenetic analysis shows that FPW evolved independently in primates and guinea
pigs. The transcriptome data confirms that guinea pigs down-regulate progesterone receptor toward
parturition, in contrast to humans. Some of the similarities between human and guinea pig are: downregulation of estrogen receptor, up-regulation of VCAN and IGFBP4 as well as likely involvement
of prostaglandins.
ß The Author(s) 2013. Published by Oxford University Press on behalf of the Foundation for Evolution, Medicine, and Public Health. This is an Open
Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits
unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
¨nter P. Wagner*1,5
Mauris C. Nnamani1, Silvia Plaza1, Roberto Romero2,3,4 and Gu
274
| Nnamani et al.
Evolution, Medicine, and Public Health
Conclusions and implications: (i) FPW in guinea pigs evolved independently from that in primates.
(ii) A small set of conserved gene regulatory changes has been detected.
K E Y W O R D S : guinea pig cervix; gene expression; functional progesterone withdrawal; evolution of
parturition
INTRODUCTION
In this communication we investigate gene
expression in the cervix of guinea pigs to assess
the similarity between human and guinea pig
CRM. Our aim is to answer the question of whether
FPW in primates and guinea pigs is homologous,
meaning that it was already present in the most
recent common ancestor of humans and guinea
pig. This question is important because molecular
mechanisms are more likely shared among homologous characters than among characters that evolved
independently. Guinea pig belongs to a basal rodent
lineage [14], and it is thus possible that FPW in primates and guinea pigs could be homologous.
Here, we report that there are extensive differences between the human and guinea pig cervical
gene expression dynamics, which makes it unlikely
that the mechanisms of ‘FPW’ are homologous
between the two species. This inference is also
supported by a phylogenetic analysis of serum
progesterone concentrations at parturition among
Archontolgires, the clade uniting primates and
rodents. Nevertheless, it is still possible that there
are conserved molecular mechanisms shared
between primates and other placental mammals,
as for instance the down-regulation of estrogen
receptor alpha (ESR1), as reported in our previous
paper for humans [17] and here for guinea pigs.
METHODOLOGY
Tissue harvesting and RNA extraction
Hartley guinea pigs (pregnant and non-pregnant)
were obtained from a commercial supplier
(Charles River, Wilmington, MA, USA). The animals
were housed in individual plastic cages with hay,
branches and other environment enhancing objects.
Non-pregnant females were examined daily for
vulva occlusion membrane. These animals were
sacrificed the day after the vulva occlusion disappeared indicating entry into estrus. Pregnant females were obtained with uncertain pregnancy dates
and were euthanized based on assessing overall
body weight of the female with gestation range
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Cervical remodeling (CRM) is a necessary step
in preparation for successful parturition [1].
Premature softening and shorting of the cervix
is also a powerful predictor of preterm birth,
and thus understanding the molecular mechanisms
regulating CRM are critical for successful intervention to prevent prematurity [2]. A major obstacle
in unraveling the mechanisms of human cervical
ripening is that mammals are highly variable with
respect to the mechanisms underlying parturition.
Most notable is the fact that in humans and other
primates the placenta produces large amounts of
progesterone through pregnancy and labor. In contrast, most other placental mammals, including
model organisms such as sheep, mouse, rabbit
and rat, have systemic progesterone withdrawal,
meaning that the serum concentration of progesterone declines toward term before the onset of
labor. Declining progesterone concentrations are
thought to remove the ‘progesterone block’ from
the uterus and the cervix and thus allow progression
toward parturition [3–5]. To explain parturition
in humans, it has been suggested that there have
to be changes downstream of the progesterone
signal that undercut the progesterone block in its
target tissues. Several mechanisms have been
proposed for this so-called functional progesterone
withdrawal (FPW) [6–9], but none of them have yet
received decisive support.
A notable exception among model organisms
is the guinea pig that, like the human, maintains
high concentrations of serum progesterone
through parturition [10]. It has thus been suggested that guinea pig is a potential model organism to study the mechanisms of FPW as well as
prematurity, which is very rare in model species
with fast gestation like mice [11]. This is an attractive possibility, as guinea pigs belong to the same
clade as other major model organisms, mice, rat
and rabbit, the so-called Glires [12–15]. It is also a
model organism with a long tradition of experimental research [16] and a sequenced genome
(http://useast.ensembl.org/Cavia_porcellus/Info/
Index).
Independent evolution of functional progesterone withdrawal
Sequencing and data processing
RNA library preparation and high-throughput
sequencing were performed on an Illumina HiSeq
2000 sequencing system following the protocol recommended by Illumina for sequencing total RNA
samples. Sequencing was done for each biological
replicates at 1 75 bp strand specific by the Yale
Center for Genome Analysis. Sequence reads were
aligned to the Cavia porcellus reference genome
(cavPor3.69) using the splice junction mapper for
RNA-seq reads TopHat2 [21–23]. Sequencing depth
for RNA-seq samples averaged 45 million reads per
biological sample with >80% overall alignment rate.
After alignment, read counts were determined with
HTSeq-count v0.5.4p1 as described by the authors.
All RNA-seq data are deposited in GEO under accession number GSE47986.
Data analysis
Relative RNA abundance was measured as transcripts per million (TPM) transcripts as recommended in [24, 25] rather than RPKM or FPKM.
The reason is that units of RPKM are not consistent
between samples and are thus problematic when
comparing RNA abundance between samples of
different transcript composition [25].
In comparing transcript abundances we distinguish two different kinds of events. At the one hand
are induction and repression of gene expression
(turning gene expression ON or OFF). On the other
hand are modulations of gene expression. We refer
to induction when the transcript abundance of a
gene is below the operational threshold of three
TPM [26, 27] in the earlier stage of the reproductive
cycle but above the threshold in the later stage. The
threshold of three TPM corresponds to 1 RPKM in
terms of the traditional RNA abundance measure
[26, 27]. The threshold is based on association between expression level and chromatin modification
status as well as a statistical model of transcript
abundance. In contrast, we call changes of transcript
abundance ‘expression modulation’ if the estimated
transcript abundance is above the operational
threshold in both stages compared. When we refer
to up or down regulation, we mean gene expression
modulation.
The motivation for making the (operational) distinction between gene expression modulation and
induction/repression is that they represent two different biological events. Induction/repression is
associated with qualitative changes in the nature
of the chromatin modification and the kind of transcription factors and co-factors associated with the
cis-regulatory elements [27]. On the other hand, gene
expression modulation can have a variety of causes,
from phosphorylation status of transcription factors
to the expression level of upstream regulators.
Statistics
When correlations were calculated or differential
expression was tested, we transformed TPM data
by square root transformation rather than the more
widely used log-transformation for two reasons.
First, TPM as well as RPKM data usually contains
zero values. In a log-transformation these data
points will be transformed into minus infinity, with
consequences for data handling and interpretation.
pffiffiffi
In contrast, 0 ¼ 0 and no further problems arise.
The second reason to prefer square root transformation is that it is in fact variance stabilizing [26], while
the log-transformation leads to an inflation of variance at low abundance values. Tests for differential
expression were not used as a discovery tool but to
test specific hypotheses. For instant, we tested differences between stages for those genes that have
been found differentially expressed in humans. For
this reason, we did not use multiple comparison
275
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
provided by the suppliers. For ‘post-mortem’ assessment of the duration of pregnancy, we took measurements of both the fetal supra-occipital-premaxilla
length and fetal weight at harvest (Supplementary
Table S1). To estimate gestational age we used
previously published data [18–20]. Estimated gestational stage for mid-pregnant and late pregnant (but
not in labor) animals was 43.6 ± 0.5 (mean ± SD)
days and 65.2 ± 1.2 days, respectively (average
gestation period of 65.5 ± 6.5 days range [16]).
Tissue harvesting focused on the vaginal portion of
the cervix to minimize contamination with uterine
myometrial tissue. Samples were collected from
the three groups in triplicates for non-pregnant
and mid-pregnant females and four replicates
for late pregnant females. Cervical tissue was immediately stored in RNAlater Solution (Ambion, cat#
AM7020) until further processing. Total RNA was
extracted using the RNeasy extraction kit (Qiagen,
cat# 75142) following manufactures instructions
that included an in-column DNAse digestion.
Quality of the RNA was then confirmed using
Agilent 2100 Bioanalyzer (Santa Clara, CA, USA).
Nnamani et al. |
276
| Nnamani et al.
Evolution, Medicine, and Public Health
corrections. We used one-way ANOVA and t-tests on
square root transformed TPM data.
BrdU labeling
RESULTS
Evolution of functional progesterone
withdrawal
The guinea pig has attracted interest because
it shares with humans the feature of maintaining
high concentrations of serum progesterone (P4)
(>200 ng/ml) through parturition [10]. For both species it has been suggested that some mechanism
Figure 1. Evolution of FPW. Phylogenetic reconstruction of serum progesterone concentration at parturition in Euarchontoglires, the clade uniting primates and
rodents. Species and lineages with sustained high systemic progesterone concentrations are indicated in white, species and lineages with a drop of systemic
progesterone concentrations are indicated in black. The root state was fixed as progesterone withdrawal based on outgroup comparison with hoofed animals and
carnivores. Tree topology in Glires follows fig. 6 in [28]. (a) Scenario that assumes that the mouse lemur has progesterone withdrawal. Under this assumption the
lack of systemic progesterone withdrawal is an ancestral state of primates and even the Euarchonta. (b) Scenario that assumes that the mouse lemur does not have
progesterone withdrawal. Under this scenario the ancestral state for Euarchonta and primates is ambiguous. Note that in both scenarios guinea pig is the only
lineage without progesterone withdrawal within the Glires and that guinea pig is deeply nested within this clade. This phylogenetic distribution suggests that FPW
evolved independently in primates (Euarchonta) and guinea pig within the Glires.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Three guinea pigs 35–38 weeks pregnant for
mid-pregnant and also three females 58–63 weeks
pregnant for late pregnancy were obtained from
Charles River laboratories. Two females from each
group were injected intraperitoneal with 10 ml of
Bromodeoxyuridine (BrdU) labeling reagent Life
Technology (Cat # 0103), and one control sample
with 10 ml PBS. After 24 h the animal was sacrificed,
and cervix was recovered and fixed in 4% paraformaldehyde (24–48 h). The tissue was processed for
paraffin embedding and serial section of 6 mm was
produced. The slides were processed following the
Life Technology protocol for BrdU (Cat # 93-3943).
downstream of the progesterone signal is responsible for CRM, called FPW. It is thus interesting to
ask whether FPW in the guinea pig is homologous to
that in humans.
In Fig. 1a and b, a survey of progesterone withdrawal within Euarchontoglires is presented based
on the phylogenetic hypothesis in [28]. Lack of systemic progesterone withdrawal is found in humans,
apes (chimpanzee [29], gorilla [30, 31]) and old world
monkeys (rhesus monkey [32, 33], baboon [34]). The
situation in new world moneys is complicated.
The detailed study by Chambers and Hearn on the
common marmoset (Callithrix jacchus) shows a
steady decline in serum progesterone levels in the
4 days leading up to parturition [35]. Other authors
did not report such a drop but also did not explicitly
document the peripartum period [36]. In contrast,
Corousos and collaborators report the peripheral
progesterone levels of two pregnant squirrel monkeys (Saimiri sciureus) but do not show a decline
of progesterone levels in the days leading up to
parturition [37], even though the levels at the day
of parturition are reported to be close to zero.
Nevertheless, it is clear that the withdrawal, if any,
would have to be very precipitous or absent. Based
on comparison with the rate of decline in the
marmoset we concluded that the squirrel monkeys
do not have systemic progesterone withdrawal. We
could not find data about tarsiers and some limited
Independent evolution of functional progesterone withdrawal
277
were mapped to the guinea pig genome version
cavPor3.69 and read counts transformed to TPM
transcripts [25]. Below we focus on two comparisons
of transcript abundance. One is the differences between non-pregnant and mid-pregnant stages and
the other the comparison of mid-pregnancy and late
pregnancy.
Gene expression changes in the cervix from
estrus to mid-pregnancy
In the comparison from non-pregnant and mid-pregnant cervices, 195 genes cross the operational criterion from being repressed to being expressed.
Figure 2a shows the expression levels of genes operationally non-expressed in estrus but expressed in
mid-pregnancy. Two genes stand out, CLCA1 and
PLA2G10 (see Supplementary Table S2). CLCA1 is
a component of Ca2þ sensitive chlorine channels
and is involved in mucus production in other organs
[42]. PLA2G10 is a phospholipase catalyzing the ratelimiting step in prostaglandin synthesis [43]. The
other highly expressed and induced genes were:
two members of the sodium-dependent decarboxylase transporters (SLC13A2 and SLC36A2), IGFBP1,
as well as a tumor necrosis factor, TNFSF11.
There are 428 genes expressed during estrus
but operationally OFF at mid-pregnancy. Figure 2b
shows their expression level during estrus. There
are 11 genes highly expressed during estrus but
operationally turned OFF in mid-pregnancy with a
discontinuity of lower expression level (big bold
arrow in Fig. 2b). The most highly expressed gene
turned off in pregnancy is interleukin 1a (IL1A). The
other genes are TMPRSS11D, a trans-membrane
serine protease involved in gland secretory activity,
interferon kappa, IFNK and an antagonist of IL1A,
IL1RN and others more (see list in Supplementary
Table S3). The gene ontology categories most
over-represented are related to inhibition of cell
proliferation.
Gene expression in guinea pig cervix
Gene expression changes in the cervix from
mid- to late-pregnancy
Cervical tissue was harvested from guinea pigs in
three reproductive stages: non-pregnant and in estrus (NP), in mid-pregnancy (MT) and late pregnancy/term (LT) (see ‘Methodology’ for details
on the timing of sampling). The samples were
transcribed and sequences with 1 75 bp to an average sequencing depth of 40–50 Mio reads. Reads
In the transition to late term 195 genes are turned
on (Fig. 2c). The strongest expressed gene, which
was not expressed in mid-pregnancy, is called
VISG8, V-set and immunoglobulin domain containing 8. Very little is known about its function
(Supplementary Table S4). The second most
expressed gene turned on toward term is SPC24, a
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
information about lemurs. Blanco and Meyer report
progesterone levels in various stages of estrus and
pregnancy of the brown mouse lemur (Microcebus
rufus) but the gestational stages in this study are very
unreliable [38]. In spite of that, the pre-partum cohort shows higher progesterone than the pre-estrus
cohort, making it possible that the mouse lemur lack
systemic progesterone withdrawal. Thus, we provide
two reconstructions one with and one without systemic progesterone withdrawal for the mouse lemur
(Fig. 1a and b).
An outgroup clade of primates are the Tupaias
(Scandentia). In Tupaia belaneri, there is a drop in
serum P4 only after parturition [39] but this paper
reports unpublished data from Elger and Hasan
about a single animal. However, the peak progesterone level is reported to occur 2 days before parturition and we thus classify it as having no systemic
progesterone withdrawal. Within the Glires, the
clade including rodents and rabbits and squirrels,
all lineages have progesterone withdrawal except
guinea pigs [40, 41].
In Fig. 1a and b, we provide two parsimony-based
ancestral character state reconstructions reflecting
the ambiguity about the situation in the lemur.
Under either scenario the lack of systemic progesterone withdrawal in the guinea pig is reconstructed
as independently derived (Fig. 1a and b). The difference between the two scenarios, lemurs with or
without progesterone withdrawal, only affects the
reconstruction of the ancestral character state in
the primate lineage (Euarchonta). Assuming that
the lemur has no systemic progesterone withdrawal
(Fig. 1a) leads to a reconstruction where this condition is ancestral for the Euarchonta, and the situation in the marmoset is inferred to be a reversal.
Assuming that the lemur has systemic progesterone withdrawal (Fig. 1b) leads to a reconstruction
where the ancestral state of the stem lineage of
primates is ambiguous. From these data we conclude that FPW likely has evolved independently
in primates and guinea pigs.
Nnamani et al. |
278
| Nnamani et al.
Evolution, Medicine, and Public Health
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Figure 2. Expression levels in TPM of genes that are turned on or off during pregnancy in the guinea pig cervix. (a) Expression
level of genes in mid-pregnancy of genes that are not expressed in non-pregnant cervix and expressed in mid-pregnancy. Note
that there are two genes much more highly expressed than the others, CLCA1 and PLA2G10. (b) Expression level of genes in
non-pregnant cervix that are not expressed in mid-pregnant cervix. There is a distinct difference in expression level between the
11 highest expressed genes and the rest (large arrow). (c) Expression level of genes in late pregnancy of genes not expressed
in mid-pregnancy. (d) Expression level of genes during mid-pregnancy that are not expressed in late pregnancy. Four genes
stand out: MGAM is the maltase-glucoamylase or alpha-glucosidase, an enzyme involved in starch digestion; SCXB the basic
helix-loop-helix transcription factor; COL9A2 is a type IX collagen mostly known from hyaline cartilage; and CSPG5 is the
chondroitin sulfate proteoglycan 5 (neuroglycan C). (e) Cervical stroma in mid-pregnancy labeled with BrdU. Note small dense
nuclei with very few BrdU labeled cells. (f) Cervical stroma in late pregnancy labeled with BrdU. The cell nuclei are less dense
and there are abundant cells labeled with BrdU, consistent with the finding that proliferation related genes are preferentially
expressed in late pregnancy
(continued)
Independent evolution of functional progesterone withdrawal
kinetochore complex component, foreshadowing
the result of the gene ontology analysis. The overwhelming majority of genes are related to enhancement of cell proliferation, suggesting a very active
proliferative state of the guinea pig cervix at term.
These conclusions are supported by the induction or
strong up-regulation of genes known to promote or
are associated with tumor growth. For instance,
among the induced genes is FOXM1 and among
the strongest up-regulated genes are ARG2,
arginase type II and FGFBP1. The high expression
Nnamani et al. |
279
of ARG2 [44–46], FGFBP1 [47, 48] and FOXM1
[49–53], are associated with several cancers.
To determine whether the late term cervical
stroma has proliferating cells we injected guinea
pigs in mid- and late-term with BrdU and harvested
the tissue 24 h later (Fig. 2e and f). While at mid-term
the nuclei of stromal cells are small and dense, in
late term the cell nuclei are larger and less dense.
BrdU stained cells are abundant in latter (Fig. 2f)
consistent with the transcriptomic data suggesting
higher proliferative activity toward late term.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Figure 2. (Continued)
(continued)
280
| Nnamani et al.
Evolution, Medicine, and Public Health
Figure 2. (Continued)
Expression dynamics of candidate genes
Steroid receptors
During estrus the expression of estrogen receptor,
ESR1, is high (>240 TPM) and is decreasing to <150
TPM in mid-pregnancy and further decreases to
<100 TPM toward term. In contrast, the progesterone receptor (PR) mRNA, PGR, is low during estrus
(27 TPM) and rises to a value of 56 TPM in midpregnancy but falls close to pre-pregnancy levels of
30 TPM toward the end of pregnancy (Fig. 3a).
Extracellular matrix
The dominant collagen mRNA in the guinea pig cervix is type 1 collagen, COL1A1 and COL1A2.
Expression is high for both type I collagen genes,
5000 TPM and 2000 TPM, respectively, and does
not change among stages of the reproductive cycle.
COL3A1, coding for collagen III identified in human
cervix, was not mapped in our transcriptome data.
The dominant small proteoglycan is fibromodulin
(FMOD) (Fig. 3b), which is known to associate with
type I collagen and is also able to sequester TGF-beta
in the extracellular matrix. FMOD shows a 2.2-fold
increase from estrus to mid-pregnant stage and then
a 0.55-fold decrease toward term, returning to about
the same level as in estrus. A similar dynamics is
observed for another small proteoglycan, testican
2, SPOCK2, although at lower levels of expression.
It increases 2.1 from estrus to mid-pregnancy and
toward term is almost completely suppressed, going
from 122 TPM at MT to 9 TPM at LT (0.07-fold
change). Repression of both FMOD and SPOCK2
transcription may play a role in extracellular matrix
remodeling in preparation for parturition. In addition, the expression dynamics of FMOD suggests
lower efficiency of TGF-beta signaling during pregnancy than before and toward term.
We found substantial expression for two large proteoglycans, versican (VCAN) and perlecan (HSPG2,
heparan sulfate proteoglycan 2) (Fig. 3c). VCAN decreases from estrus to mid-pregnant by 0.38-fold
and then increases 1.6-fold again toward term. In
contrast, HSPG2 is high in estrus but declines slowly
toward term to 59% from its value before pregnancy.
The dominant matrix metallopeptidase mRNA
is MMP2 (Fig. 4a), which is known to break down
collagen IV and gelatin. There is an apparent
increase in expression of MMP2 between mid- and
late-term, but the statistical support is weak
(P ¼ 0.056). The next highest expression was found
for MMP14, which is activating MMP2 and is likely
membrane bound. MMP14 expression in estrus
is 340 TPM and decreases to mid-pregnancy by
0.55-fold (P ¼ 0.016) and shows a slight increase of
1.18-fold toward late term (P ¼ 0.05). In contrast,
MMP11 has a moderate expression in estrus (109
TPM), decreases toward mid-pregnancy by 90%
(0.1-fold) and is barely above the operational threshold for expressed genes at term (3.75 TPM). This
MMP11 is not effective in extracellular matrix breakdown, but is reported to cleave protease inhibitor
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
In the transition to term 449 genes meeting the
operational criterion of expression in mid-pregnancy
are turned OFF toward term. Four of them have
higher expression than the rest of the distribution
(Fig. 2d). These are: MGAM, SCXB, COL9A2 and
CSPG5 (Supplementary Table S5). MGAM is the
maltase-glucoamylase or alpha-glucosidase, an enzyme involved in starch digestion; SCXB the basic
helix-loop-helix transcription factor; COL9A2 is a
type IX collagen mostly known from hyaline cartilage
and CSPG5 is the chondroitin sulfate proteoglycan 5
(neuroglycan C). Repression of these genes suggests a change in the extra-cellular matrix composition during transition to term.
Independent evolution of functional progesterone withdrawal
dynamics of large proteoglycan genes: VCAN and perlecan (HSPG2). Values at the right of each graph give the P-value of an
ANOVA test including all three samples; the valued above the graphs give t-test based P-values for two-tailed pairwise
comparisons
A1AT, which is not included in our mapped
transcripts.
Among the ADAM metallopeptidases with
thrombospondin motif, ADAMTS metallopeptidases, ADAMTS1 and -2 are most expressed.
ADAMTS1 shows a slow but insignificant increase
from non-pregnant to term, while ADAMTS2 is
increasing from non- to mid-pregnant and decreases
at late term (Fig. 4b).
The most expressed inhibitor of matrix metallopeptidase is TIMP2, which is highly expressed during
estrus and shows a slow but insignificant
decline toward term (Fig. 4c). The only highly
expressed protease inhibitor that shows a significant increase from mid- to late-term is TIMP1
(1.9, P ¼ 0.006). TIMP1 is associated with cell
proliferation and may have anti-apoptotic effects
consistent with other results suggesting proliferation
in later term cervix (see above and Fig. 2e and f). In
contrast, TIMP3 shows a 50% reduction in expression between mid- and late-term (0.51, P ¼ 0.026).
Signaling
Prostaglandin signaling, in particular PGE2
and PGF2a, is associated with many aspects of
female reproductive physiology including parturition [54]. A rate-limiting step in prostaglandin
synthesis is the liberation of arachidonic acid from
phospholipids by phospholipase A and C. In the
guinea pig cervix we find a substantial increase
in PLA2G10 just before parturition (Fig. 5a).
Interestingly, the expression of cyclo-oxygenase II,
catalyzing another key step in prostaglandin synthesis, is not increased toward term.
281
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Figure 3. Expression dynamics of steroid receptors and proteoglycans. (a) Expression of ESR1 and PR. ESR1 is high during
estrus and is steadily declining toward parturition. PGR is up-regulated in mid-pregnancy but returns to pre-pregnancy levels
toward term. (b) mRNA expression dynamics of small proteoglycan genes: FMOD, testican 2 (SPOCK2) and BGN. (c) mRNA
Nnamani et al. |
282
| Nnamani et al.
Evolution, Medicine, and Public Health
of genes coding for ADAM metallopeptidases with thrombospondin motif, ADAMTS metallopeptidases. (c) mRNA expression of genes coding for inhibitors of matrix metallopeptidases, TIMPs. Values at the right of each graph give the P-value of
an ANOVA test including all three samples; the valued above the graphs give t-test based P-values for two-tailed pairwise
comparisons
The expression of IGF signaling molecules came
to our attention though the conserved up-regulation
of IGFBP4 in both human and guinea pig cervices
(see below). mRNA expression of IGF1 and IGF2
toward term follow different trends, with IGF1 being
up-regulated (1.57, P ¼ 0.0074) and IGF2 downregulated (0.66, P ¼ 0.0038). In contrast, there
is no substantial change in IGF receptor mRNA
expression.
There is high expression of most IGF binding
proteins, with three having a dramatic regulatory
change (Fig. 5b). IGFBP5 is the most expressed
IGFBP overall but does not show a significant difference between pregnancy stages. IGFBP4 shows
the most dramatic up-regulation (2.75, P ¼
0.0022). In contrast, IGFBP3 is expressed at about
the same level as IGFBP5 at estrus and then shows
a gradual decline during pregnancy with very low
expression at term (from 1500 TPM during estrus
to 140 TPM at term). These results could point to
an attenuation of IGF2 signaling in the term cervix
of the guinea pig because two binding proteins are
increasing, and IGF2 expression is decreasing (but
IGF1 mRNA is increasing).
DISCUSSION
In this study we addressed two questions. First, we
investigated whether FPW in primates and guinea
pigs has evolved independently or whether these
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Figure 4. Expression dynamics of proteases. (a) mRNA expression of matrix metalloprotease genes. (b) mRNA expression
Independent evolution of functional progesterone withdrawal
Nnamani et al. |
283
Figure 5. Expression dynamics of genes related to prostaglandin and IGF signaling. (a) mRNA expression of genes coding for
PLA2 gamma 4A and 10, PLA2G4A and PLAG210. Note the steep increase in the expression level of PLA2G10, which suggests
(b) mRNA expression of genes coding for insulin like growth factor binding proteins, IGFBP3, IGFBP4 and IGFBP5. Values at
the right of each graph give the P-value of an ANOVA test including all three samples; the valued above the graphs give t-test
based P-values for two-tailed pairwise comparisons
are homologous features. Based on a parsimony
reconstruction of ancestral character states we
inferred that FPW in guinea pigs evolved independently of that in primates (see Fig. 1). Second, we
investigated the gene regulatory dynamics during
pregnancy in the uterine cervix using RNAseq to
assess how CRM is regulated in the presence of
sustained progesterone levels in the guinea pig.
Our data suggest two mechanisms involved in
CRM, down-regulation of PR expression and, potentially, the intracervical activation of prostaglandin
synthesis, that are discussed in more detail below.
Evolution of functional progesterone
withdrawal
Here, we use the term FPW as meaning a situation in
which the circulating progesterone levels do not substantially fall prior to parturition. At this point, we do
not ascribe a specific mechanistic meaning to the
term FPW. The reason is that it is still unclear how
different the regulation of CRM is between species
with systemic progesterone withdrawal and those
without systemic progesterone withdrawal; with
some authors clearly favoring the view that the
mechanisms of CRM are conserved between mouse
and human [55], while the question still remains
whether sustained high levels of circulating progesterone have consequences for the regulation of CRM
and parturition in general [56].
The question whether FPW in guinea pigs and primates is homologous, requires a detailed investigation because the guinea pig represents a basal
lineage among the rodents and FPW is widespread
in Euarchonta, the clade consisting of primates, tree
shrews and flying lemurs (for references see above).
It is thus possible that FPW could have been an ancestral state within the Archontoglires, the clade
uniting primates and rodents and their close relatives. In Euarchonta, the most basal lineage with
data on progesterone levels at parturition is the
northern tree shrew (Tupaia belangeri). For this
species, FPW has been documented, and thus it is
possible that FPW can be quite old.
Data about progesterone levels at parturition in
non-model organisms are sparse, but sufficient data
of the basal lineages of the Euarchontoglires are
available to allow a unique parsimony reconstruction (Fig. 1). The conclusion that FPW evolved twice
independently is based on the serum progesterone
withdrawal in rabbits and Sciuridae (here represented by ground squirrel and woodchucks), which
are two lineages more basal than the most recent
common ancestor of guinea pigs and other rodents
(Fig. 1). The outgroup of the Euarchontoglires is
Lauresiatheria, which is a large group containing
as diverse animals as bats, horses, cows, hedgehogs
and carnivores and others. We did not perform a
detailed analysis of data on circulating progesterone
levels during pregnancy in this group, but we are not
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
an increased paracrine expression of prostaglandin, since PLA2 is catalyzes a rate-limiting step in prostaglandin biosynthesis.
284
| Nnamani et al.
Evolution, Medicine, and Public Health
aware of any member of this group to show FPW. It is
widely acknowledged that sheep, cow, pig, horse,
dog and cat have circulating serum progesterone
withdrawal at parturition. We thus infer that the ancestral state for Euarchontoglires is withdrawal of
serum progesterone. Based on these data we consider the conclusion that primates and guinea pigs
evolved FPW independently as robust.
Physiology of cervix remodeling in guinea pig
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
An intriguing finding about guinea pig CRM was
reported by Rodrı´guez et al. [57]. The authors
quantified the density of PR and ESR1 positive cells
in the cervix and the uterus and reported a 50% drop
in the number of PR positive cells in the sub-epithelial portion of the cervix, i.e. the cervical stroma, 1–
2 days before parturition. The authors also found a
negative correlation between PR and collagen integrity, suggestive of a causal link between loss of PR
expression and CRM. This finding is intriguing as it
provides a mechanistic model for FPW, at least for
the cervical stroma (no comparable drop in PR expression was found in the myometrium). Our data
are consistent with that of Rodrı´guez et al. as we find
also a 50% drop in PR mRNA expression between
mid- and late-pregnancy. Interestingly, though, no
comparable decrease in PR mRNA abundance was
reported for human cervix [17, 58, 59]. Rodrı´guez
et al. also reported only small if any increase
in ESR1 expression toward term. In our data we
find a modest 0.7, but significant (P < 0.01)
decrease in ESR1 mRNA abundance, which is similar to the 0.84 decrease of ESR1 mRNA in human
cervix [17].
Another factor implicated in guinea pig CRM
is phospholipase A2 (PLA2). PLA2 catalyzes a ratelimiting step in prostaglandin synthesis by liberating arachidonic acid from phospholipids [43].
Prostaglandin E2 and F2-alpha have been shown to
stimulate collagenase activity in confluent guinea
pig cervical cell culture [60], and it is also long established that prostaglandins can induce cervical
ripening [1]. PLA2 activity increases 20-fold between non-pregnant to later term cervix extractions
[61]. The PLA2 activity measured by Rajabi and collaborators was mostly due to the 85 kD cytosolic
PLA2 (cPLA2), but contributions of lower molecular
weight secreted PLA2 was also detected, with a distinct protein peak around 20 kD [61]. Surprisingly, no
evidence for different protein levels was found between non-pregnant and term cervices by Rajabi and
Cybulsky but this was based on protein gel band
intensities. In our data, the mRNA abundance for
cPLA2 (PLA2G4A) was at moderate but significant
15 TPM at late term, and at the same level for
cervices of non-pregnant guinea pigs. In mid-term,
the level was lower at 7 TPM, which is 2 different
from late-term (P ¼ 8.88 104). However, the
most dramatic change we found was in a lowmolecular weight secreted PLA2 (PLA2G10, 18 kD),
which is not expressed before pregnancy and increases to 200 and 400 TPM at mid- and late-term
pregnancy, respectively. Hence, our data are consistent with the results of Rajabi et al. with respect to
protein levels of cPLA2 in non-pregnant and late
term cervices, but also shows a dramatic induction
of a secreted PLA2, PLA2G10, which is not expressed
before pregnancy. These data suggest an active
role of prostaglandin signaling in normal CRM in
guinea pigs.
In guinea pigs, CRM is in part achieved through
collagen degradation effected by collagenases [62,
63]. Rajabi et al. reported a 5-fold increase in
procollagenase activity from Day 50 to parturition.
In our data the highest RNA expression level was
found for MMP2, with 1000 TPM in late pregnancy
and a 1.6-fold increase from mid-term to late-term.
Among the ADAMTS metalloproteases, the highest
expression was found to be ADAMTS1, with a 1.28
increase relative to mid-term and a final 181 TPM.
The highest fold increase was recorded for
ADAMTS8 with a 6.7 increase consistent with the
finding of Hassan et al., who reported a 2.9
increase in human cervix [17]. The final RNA
abundance of ADAMTS8 (14 TPM), however, was
more than one order of magnitude lower than that
of ADAMTS1 (181 TPM) and even smaller than
the 1000 TPM found for MMT2. This comparison
illustrates the danger of sole reliance on fold change
measures to identify biologically important factors.
Overall expression of type I collagen, collagenases
and inhibitors of matrix metallopeptidases do not
show a clear picture that would explain extracellular
matrix remodeling. In contrast, the proteoglycan
expression suggests a role in CRM with increased
expression of the large proteoglycan VCAN.
Small proteoglycans, FMOD, testican, SPOCK2
and biglycan (BGN), in contrast, all show considerable decrease toward term. FMOD is interacting
with type I and type II collagens. BGN plays a role
in assembly of collagen fibrils, and testican is also
involved in extracellular matrix organization.
Independent evolution of functional progesterone withdrawal
Comparison to human cervix
285
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
There are no data available to directly compare the
results reported in the guinea pig with those in
humans. The only data set we are aware of from
humans that has some relationship to the guinea
pig data presented here is Affymetrix based data
from ripe and unripe cervices at term [17]. Ideally,
both human and the guinea pig tissues should be
obtained from corresponding stages of gestation.
However, there are methodological and ethical
considerations to prevent such a comparison.
Obtaining cervix samples from women with ongoing
gestations in the mid-trimester is not feasible
because of the risk of causing a spontaneous abortion. On the other hand, the diagnosis of cervical
ripening in the guinea pig is difficult. Changes in
collagen have been quantitated using a non-invasive
approach, which is not widely available to investigators [64, 65]. Therefore, a large number of guinea
pigs at term would need to be euthanized to obtain
material that would include both ripe and unripe
cervices.
One way of relating the observations made in this
study to data derived from humans is to assume
that an unripe cervix at term is a cervix that did not
make the transition from mid-pregnant state to a
compliant, ripe cervix. From this perspective, an
unripe cervix at term is probably closer to a midgestation pregnant cervix than to a ripe cervix.
This assumption can be challenged; yet, despite
the limitations, we believe that this comparison
may offer insight into the comparative biology of
cervical ripening, which may be difficult to gather
otherwise at this time.
Another methodological issue is the comparison
of transcriptomic data derived from microarrays
(Affymetrix) and RNAseq data. Studies in humans
have been based on Affymetrix technology [17], while
our data were obtained using RNAseq. Whether and
how Affymetrix and RNAseq data can be compared
relies upon whether the different scales on which
Affymetrix and RNAseq measures reside affect our
conclusion. A key question when exploring this comparison is whether, on average, the direction of
changes in gene expression between the two data
sets corresponded. To address this, we first only
considered statistically significant changes from
the human data, and asked whether the RNAseq
data (for the same genes) varied in the same direction. We considered fold changes estimated by
Affymetrix and RNAseq technology. Fold changes
are scale less quantities and thus, the original scales
of RNA abundance measures became irrelevant,
assuming their approximate linearity. The comparisons between the human and guinea pig samples
were determined by calculating the correlation
among the fold change data. This essentially tested
for congruence of the direction of change rather than
the magnitude of fold change. The only scenario
where the technological difference would matter is
if a change in one direction detected by Affimetrix
technology would consistently lead to a signal in the
opposite direction in RNAseq data. A substantial
degree of methodological pessimism would be
required to accept this scenario.
We find that there is the potential of shared
conserved regulatory pathways—most notably, the
down-regulation of ESR1 and Polybromo 1 (PBRM1)
as well as the up-regulation of IGFBP4, and the
increased expression of VCAN. The other changes
reported in the comparison of human ripe and unripe cervical tissue are not found in our comparison
of mid- and late-gestation samples from guinea pig.
There are methodological limitations that could explain these differences, and thus, the interpretation
needs to proceed cautiously. At the very least, these
data do not support a strong similarity of gene regulatory dynamics in humans and guinea pigs
(Supplementary Fig. S1). Hassan et al. reported 89
genes with significant differences between ripe and
unripe cervices. Most of these are up-regulated
(n ¼ 81). Of these 89 genes 76 where detected or
expressed in our guinea pig samples. The correlation
between the log 2-fold differences in the Hassan
data and the log 2-fold expression data from guinea
pig is low, r ¼ 0.054. We then determined the set of
genes that have a fold change in the same direction
as in human samples and are significantly different
between MT and LT (concordant genes). The list of
concordant genes is short, comprising only nine
genes: ESR1, PBRM1, SIGLEC1, CALD1, TPM1,
IGFBP4, PLN, COL4A2 and VCAN. In contrast, there
are 27 genes that are discordant, found to differ in
the opposite direction between unripe and ripe
human cervices at term and between MT and LT
guinea pig cervix.
More comparative data are necessary to assess
the existence of conserved core-regulatory mechanisms shared among placental mammals, or at least
among the Archontoglires, the clade combining
humans and mice. Any mechanistic interpretation
of transcriptomic data, however, will require followup studies to exclude the possibility that the changes
Nnamani et al. |
286
| Nnamani et al.
Evolution, Medicine, and Public Health
in RNA abundance are due to changes in tissue
composition, e.g. through the recruitment of
leucocytes.
5. Csapo AI, Pulkkinen MO, Wiest WG. Effects of
luteectomy and progesterone replacement therapy in
early pregnant patients. Am J Obstet Gynecol 1973;115:
759–65.
6. Casey ML, MacDonald PC. The endocrinology of human
CONCLUSIONS AND IMPLICATIONS
parturition. Ann N Y Acad Sci 1997;828:273–84.
7. Merlino AA, Welsh TN, Tan H et al. Nuclear progesterone
receptors in the human pregnancy myometrium:
evidence that parturition involves functional progesterone withdrawal mediated by increased expression of
progesterone receptor-A. J Clin Endocrinol Metab 2007;
92:1927–33.
8. Mesiano S. Myometrial progesterone responsiveness and
the control of human parturition. J Soc Gynecol Investig
2004;11:193–202.
9. Wagner GP, Tong Y, Emera D et al. An evolutionary test
of the isoform switching hypothesis of functional
progesterone withdrawal for parturition: humans have a
weaker repressive effect of PR-A than mice. J Perinat Med
2012;40:345–51.
10. Heap RB, Deanesly R. Progesterone in systemic blood and
placentae of intact and ovariectomized pregnant guineapigs. J Endocrinol 1966;34:417–23.
11. Mitchell BF, Taggart MJ. Are animal models relevant to key
supplementary data
aspects of human parturition? Am J Physiol Regul Integr
Comp Physiol 2009;297:R525–45.
Supplementary data are available at EMPH online.
12. Cao Y, Adachi J, Yano T et al. Phylogenetic place of guinea
pigs: no support of the rodent-polyphyly hypothesis
from maximum-likelihood analyses of multiple protein
funding
This research was supported, in part, by the Perinatology
Research Branch, Division of Intramural Research, Eunice
Kennedy Shriver National Institute of Child Health and
Human Development, National Institutes of Health,
Department of Health and Human Services (NICHD/NIH);
and, in part, with Federal funds from NICHD, NIH under
sequences. Mol Biol Evol 1994;11:593–604.
13. Murphy WJ, Eizirik E, O’Brien SJ et al. Resolution of the
early placental mammalian radiation using Bayesian
phylogenetics. Science 2001;294:2348–51.
14. Lin YH, McLenachan PA, Gore AR et al. Four new
mitochondrial genomes and the increased stability of
evolutionary trees of mammals from improved taxon
sampling. Mol Biol Evol 2002;19:2060–70.
Contract No. HSN275201300006C.
15. dos Reis M, Inoue J, Hasegawa M et al. Phylogenomic
Conflict of interest: None declared.
datasets provide both precision and accuracy in
estimating the timescale of placental mammal phylogeny.
Proc Biol Sci 2012;279:3491–500.
references
16. Wagner JE, Manning PJ. The Biology of the Guinea Pig. New
York: Academic Press, 1976.
1. Word RA, Li XH, Hnat M et al. Dynamics of cervical
17. Hassan SS, Romero R, Tarca AL et al. The transcriptome
remodeling during pregnancy and parturition: mechan-
of cervical ripening in human pregnancy before the
isms and current concepts. Semin Reprod Med 2007;25:
onset of labor at term: identification of novel molecular
69–79.
functions involved in this process. J Matern Fetal Neonatal
2. Romero R, Espinoza J, Kusanovic JP et al. The preterm
parturition syndrome. BJOG 2006;113(Suppl. 3), 17–42.
3. Csapo AI. The ‘See-Saw’ theory of parturition. In: Knight J,
O’Connor M (eds). The Fetus and Birth. New York: Wiley
Publishers, 2008, 159–95.
4. Csapo AI, Pulkkinen MO, Ruttner B et al. The significance
Med 2009;22:1183–93.
18. Draper RL. The prenatal growth of the guinea-pig. Anat Rec
1920;18:369–92.
19. Ibsen HL. Prenatal growth in guinea-pigs with special
refernce to environmental factors affecting weight at birth.
J Exp Zool 1928;51:51–93.
of the human corpus luteum in pregnancy maintenance.
20. Kihlstrom I. A simple method for the assessment of
I. Preliminary studies. Am J Obstet Gynecol 1972;112:
fetal age in guinea pig litters of various sizes, using a stat-
1061–7.
istical approach. Z Versuchstierkd 1985;27:227–31.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
(1) Lack of systemic progesterone withdrawal
at parturition evolved independently in
the primate and the guinea pig lineages.
(2) In guinea pig CRM at term is associated
with changes in proteoglycan expression
and less so with changes in the expression
of collagen or collagenases.
(3) We confirm previous evidence for a downregulation of PR mRNA expression in the
guinea pig cervix at term, a feature that has
not been described in humans.
(4) Potentially conserved mechanisms of CRM
include down-regulation of ESR1 and the
nuclear receptor PBRM1 as well as the
up-regulation of VCAN and IGFBP4.
Independent evolution of functional progesterone withdrawal
21. Langmead B, Trapnell C, Pop M et al. Ultrafast and mem-
37. Chrousos GP, Renquist D, Brandon D et al. The
ory-efficient alignment of short DNA sequences to the
squirrel monkey: receptor-mediated end-organ resist-
human genome. Genome Biol 2009;10:R25.
22. Langmead B, Salzberg SL. Fast gapped-read alignment
ance to progesterone? J Clin Endocrinol Metab 1982;55:
287
364–8.
38. Blanco MB, Meyer JS. Assessing reproductive profiles in
with Bowtie 2. Nat Methods 2012;9:357–9.
23. Trapnell C, Pachter L, Salzberg SL. TopHat: discovering
female brown mouse lemurs (Microcebus rufus) from
splice junctions with RNA-Seq. Bioinformatics 2009;25:
Ranomafana National Park, southeast Madagascar, using
fecal hormone analysis. Am J Primatol 2009;71:439–46.
1105–11.
24. Li B, Ruotti V, Stewart RM et al. RNA-Seq gene expression
39. Luckhardt M, Kaufmann P, Elger W. The structure of the
estimation with read mapping uncertainty. Bioinformatics
tupaia placenta. I. Histology and vascularisation. Anat
Embryol (Berl) 1985;171:201–10.
2010;26:493–500.
25. Wagner GP, Kin K, Lynch VJ. Measurement of mRNA abun-
40. Concannon P, Baldwin B, Tennant B. Serum progesterone
dance using RNA-seq data: RPKM measure is inconsistent
profiles and corpora lutea of pregnant, postpartum, barren
among samples. Theory Biosci 2012;131:281–5.
26. Wagner GP, Kin K, Lynch VJ. A model based criterion for
and isolated females in a laboratory colony of woodchucks
gene expression calls using RNA-squ data. Theory Biosci
41. Holekamp KE, Nunes S, Talamantes F. Patterns of
(Marmota monax). Biol Reprod 1984;30:945–51.
progesterone secretion in free-living California ground
2013;132:159–64.
27. Hebenstreit D, Fang M, Gu M et al. RNA sequencing re-
squirrels (Spermophilus beecheyi). Biol Reprod 1988;39:
1051–9.
42. Hauber HP, Goldmann T, Vollmer E et al. LPS-induced
zoan cells. Mol Syst Biol 2011;7:497.
28. Murphy WJ, Pringle TH, Crider TA et al. Using genomic
data to unravel the root of the placental mammal phyl-
mucin expression in human sinus mucosa can be
attenuated by hCLCA inhibitors. J Endotoxin Res 2007;13:
29. Reyes FI, Winter JS, Faiman C et al. Serial serum levels
109–16.
43. Kramer RM. Structure, function and regulation of mam-
of gonadotropins, prolactin and sex steroids in the non-
malian phospholipase A2. In: Brown BL, Dobson RM
pregnant and pregnant chimpanzee. Endocrinology 1975;
(eds). Advances in Second Messenger and Phosphoprotein
ogeny. Genome Res 2007;17:413–21.
Research. New York: Raven Press, 1993, 81–9.
96:1447–55.
30. Smith R, Wickings EJ, Bowman ME et al. Corticotropinhormone
in
chimpanzee
and
gorilla
pregnancies. J Clin Endocrinol Metab 1999;84:2820–5.
31. Bahr NI, Martin RD, Pryce CR. Peripartum sex steroid pro-
44. Bron L, Jandus C, Andrejevic-Blant S et al. Prognostic value
of arginase-II expression and regulatory T-cell infiltration
in head and neck squamous cell carcinoma. Int J Cancer
2013;132:E85–93.
files and endocrine correlates of postpartum maternal
45. Sigstad E, Paus E, Bjøro T et al. The new molecular markers
behavior in captive gorillas (Gorilla gorilla gorilla). Horm
DDIT3, STT3A, ARG2 and FAM129A are not useful in
Behav 2001;40:533–41.
32. Bosu WT, Johansson ED, Gemzell C. Peripheral plasma
diagnosing thyroid follicular tumors. Mod Pathol 2012;
25:537–47.
levels of oestrone, oestradiol-17beta and progesterone
46. Gannon PO, Godin-Ethier J, Hassler M et al. Androgen-
during ovulatory menstrual cycles in the rhesus monkey
regulated expression of arginase 1, arginase 2 and
with special reference to the onset of menstruation. Acta
interleukin-8 in human prostate cancer. PLoS One 2010;
Endocrinol (Copenh) 1973;74:732–42.
5:e12107.
33. Walsh SW, Stanczyk FZ, Novy MJ. Daily hormonal changes
47. Zheng HQ, Zhou Z, Huang J et al. Kruppel-like factor 5
in the maternal, fetal, and amniotic fluid compartments
promotes breast cell proliferation partially through
before parturition in a primate species. J Clin Endocrinol
upregulating the transcription of fibroblast growth factor
Metab 1984;58:629–39.
binding protein 1. Oncogene 2009;28:3702–13.
34. Albrecht ED, Townsley JD. Serum progesterone in the
48. Abuharbeid S, Czubayko F, Aigner A. The fibroblast growth
pregnant baboon (Papio papio). Biol Reprod 1976;14:
factor-binding protein FGF-BP. Int J Biochem Cell Biol
610–2.
35. Chambers PL, Hearn JP. Peripheral plasma levels of
2006;38:1463–8.
49. Llaurado´ M, Majem B, Castellvı´ J et al. Analysis of gene
progesterone, oestradiol-17 beta, oestrone, testosterone,
expression regulated by the ETV5 transcription factor
androstenedione and chorionic gonadotrophin during
in OV90 ovarian cancer cells identifies FOXM1 over-
pregnancy in the marmoset monkey, Callithrix jacchus.
expression in ovarian cancer. Mol Cancer Res 2012;10:
J Reprod Fertil 1979;56:23–32.
914–24.
36. Steinetz BG, Randolph C, Mahoney CJ. Patterns of relaxin
50. Chu XY, Zhu ZM, Chen LB et al. FOXM1 expression
and steroids in the reproductive cycle of the common
correlates with tumor invasion and a poor prognosis
marmoset (Callithrix jacchus): effects of prostaglandin
of colorectal cancer. Acta Histochem 2012;114:755–62.
F2 alpha on relaxin and progesterone secretion during
51. Xia JT, Wang H, Liang LJ et al. Overexpression of FOXM1 is
pregnancy. Biol Reprod 1995;53:834–9.
associated with poor prognosis and clinicopathologic
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
veals two major classes of gene expression levels in meta-
releasing
Nnamani et al. |
288
| Nnamani et al.
Evolution, Medicine, and Public Health
stage of pancreatic ductal adenocarcinoma. Pancreas
2012;41:629–35.
52. Xue YJ, Xiao RH, Long DZ et al. Overexpression of FoxM1
is associated with tumor progression in patients with clear
cell renal cell carcinoma. J Transl Med 2012;10:200.
59. Stjernholm-Vladic Y, Wang H, Stygar D et al. Differential
regulation of the progesterone receptor A and B in the
human uterine cervix at parturition. Gynecol Endocrinol
2004;18:41–6.
60. Rajabi M, Solomon S, Poole AR. Hormonal regulation of
53. Dai B, Gong A, Jing Z et al. Forkhead box M1 is regulated by
heat shock factor 1 and promotes glioma cells sur-
interstitial collagenase in the uterine cervix of the pregnant
guinea pig. Endocrinology 1991;128:863–71.
vival under heat shock stress. J Biol Chem 2013;288:
61. Rajabi MR, Cybulsky AV. Phospholipase A2 activity is
1634–42.
increased in guinea pig uterine cervix in late pregnancy
54. Snegovskikh V, Park JS, Norwitz ER. Endocrinology of
and at parturition. Am J Physiol 1995;269(5 Pt 1), E940–7.
parturition. Endocrinol Metab Clin North Am 2006;35:
62. Rajabi MR, Dodge GR, Solomon S et al. Immunochemical
173–91, viii.
and
immunohistochemical
evidence
of
estrogen-
mediated collagenolysis as a mechanism of cervical
during pregnancy and parturition. Trends Endocrinol Metab
2010;21:353–61.
dilatation in the guinea pig at parturition. Endocrinology
1991;128:371–8.
56. Mahendroo M. Cervical remodeling in term and preterm
63. Hegele-Hartung C, Chwalisz K, Beier HM et al. Ripening of
birth: insights from an animal model. Reproduction 2012;
the uterine cervix of the guinea-pig after treatment with the
143:429–38.
57. Rodrı´guez HA, Kass L, Varayoud J et al. Collagen
progesterone antagonist onapristone (ZK 98.299): an
remodelling in the guinea-pig uterine cervix at term is
associated with a decrease in progesterone receptor
64. Chwalisz K, Shao-Qing S, Garfield RE et al. Cervical
ripening in guinea-pigs after a local application of nitric
expression. Mol Hum Reprod 2003;9:807–13.
58. Stjernholm Y, Sahlin L, Akerberg S et al. Cervical ripening
oxide. Hum Reprod 1997;12:2093–101.
65. Fittkow CT, Shi SQ, Bytautiene E et al. Changes in light-
in humans: potential roles of estrogen, progesterone, and
induced fluorescence of cervical collagen in guinea
insulin-like growth factor-I. Am J Obstet Gynecol 1996;174:
pigs during gestation and after sodium nitroprusside
1065–71.
treatment. J Perinat Med 2001;29:535–43.
electron microscopic study. Hum Reprod 1989;4:369–77.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
55. Timmons B, Akins M, Mahendroo M. Cervical remodeling
75
Evolution, Medicine, and Public Health [2013] pp. 75–85
doi:10.1093/emph/eot005
orig inal
research
article
The adolescent transition
under energetic stress
Body composition tradeoffs among
adolescent women in The Gambia
1
Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA; 2MRC Keneba, MRC Unit,
The Gambia; 3MRC International Nutrition Group, London School of Hygiene and Tropical Medicine, London, WC1E 7HT,
UK; and 4MRC Human Nutrition Research, Elsie Widdowson Laboratory, Cambridge, CB1 9NL, UK
*Correspondence address. Department of Human Evolutionary Biology, Harvard University, 11 Divinity Avenue,
Cambridge, MA 02138, USA. Tel:+1-617-495-1679; Fax:+1-617-496-8041; E-mail mreiches@fas.harvard.edu
Received 30 January 2013; revised version accepted 23 February 2013
ABSTRACT
Background and objectives: Life history theory predicts a shift in energy allocation from growth to
reproductive function as a consequence of puberty. During adolescence, linear growth tapers off and,
in females, ovarian steroid production increases. In this model, acquisition of lean mass is associated
with growth while investment in adiposity is associated with reproduction. This study examines the
chronological and developmental predictors of energy allocation patterns among adolescent women
under conditions of energy constraint.
Methodology: Fifty post-menarcheal adolescent women between 14 and 20 years old were sampled for
weight and body composition at the beginning and end of 1 month in an energy-adequate season and 1
month in the subsequent energy-constrained season in a rural province of The Gambia.
Results: Chronologically and developmentally younger adolescent girls gain weight in the form of lean
mass in both energy-adequate and energy-constrained seasons, whereas older adolescents lose lean
mass under conditions of energetic stress (generalized estimating equation (GEE) Wald chi-square
comparing youngest tertile with older two tertiles 9.750, P = 0.002; GEE Wald chi-square comparing
fast- with slow-growing individuals for growth rate 19.806, P < 0.001). When energy is limited, younger
adolescents lose and older adolescents maintain fat (GEE Wald chi-square for interaction of age and
season 6.568, P = 0.010; GEE Wald chi-square comparing fast- with slow-growing individuals for interaction of growth rate and season 7.807, P = 0.005).
Conclusions and implications: When energy is constrained, the physiology of younger adolescents
invests in growth while that of older adolescent females privileges reproductively valuable adipose
tissue.
K E Y W O R D S : life history theory; puberty; body composition; reproductive ecology
ß The Author(s) 2013. Published by Oxford University Press on behalf of the Foundation for Evolution, Medicine, and Public Health. This is an Open
Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits
unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Meredith W. Reiches*1, Sophie E. Moore2,3, Andrew M. Prentice2,3, Ann Prentice2,4,
Yankuba Sawo2 and Peter T. Ellison1
76
| Reiches et al.
Evolution, Medicine, and Public Health
BACKGROUND AND OBJECTIVES
equates to 300 average kilocalories per day for the
9 months of gestation (calculated with the equation
from Aiello and Key [13] for a 42.2 kg !Kung woman)
and 640 kcal per day for 6 months of lactation [14].
At the same time, adiposity is not the only somatic
reproductive asset in women: in some developing
world populations, female height correlates positively with marriageability and with reproductive success [15–17], suggesting that somatic investments
in linear growth—or in one of its correlates, such as
pelvic growth—may yield reproductive dividends. It
is important to keep in mind, however, that greater
height may indicate that growth has already ceased
and the individual is prepared to invest in
reproduction.
This study investigated the determinants of somatic allocation strategy in energetically constrained
adolescent women, many of whom have not
completed linear growth. We asked the question:
What developmental and chronological markers
predict the transition from preferential investment
in growth in the form of lean mass to investment in
reproduction in the form of fat mass? We predicted
that, in an energetically constrained population of
adolescent women in The Gambia, developmental
age would predict somatic energy allocation strategy. The answer to this question will contribute to
our understanding of what constitutes evolutionarily
relevant cues to modulating the tempo of reproductive maturation in females.
METHODOLOGY
Study subjects and field site
Participants were 67 adolescent females between 14
and 19 years at enrolment, born to mothers enrolled
in a 1989–94 protein, energy and micronutrient supplementation trial conducted across the rural West
Kiang Region of The Gambia and co-ordinated by the
Medical Research Council (MRC) field station in
Keneba, The Gambia. Half the mothers received
pregnancy supplements from 20 weeks gestation
until delivery whereas the other half received supplements from delivery for 20 weeks. Daily supplements
were 4250 kJ and 22 g protein [18]. Subjects enrolled
in this study were resident in West Kiang, a rural
province of The Gambia where the highly seasonal
environment consists of a hungry season (June to
October) characterized by population weight loss
and a harvest season in which weight is gained
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Puberty is the transition from non-reproductive juvenility to reproductively capable maturity. In the
terms of life history theory [1], puberty represents a
transition in energy allocation: during the juvenile
period, energy available beyond the requirements
of maintenance is used for growth, as demonstrated
by accelerated growth rates in well-nourished populations relative to energy-constrained populations
[2]. At puberty, this surplus energy begins to be invested in reproductive function [3]. For human females, reproductive function is reflected by ovarian
steroid production. Ovarian estradiol promotes the
conversion of energy into adipose tissue [4], which is
mobilized during gestation and lactation [5]. In the
adolescent female body, therefore, acquisition of
lean mass, comprising bones, muscle, water and
organs, equates, in life history terms, to investment
in growth, while acquisition or preferential maintenance of adipose tissue can be understood as investment in reproduction.
Although puberty itself begins with a specific endocrine event, the initiation of pulsatile gonadotropin
releasing hormone (GnRH) secretion from the arcuate nucleus of the hypothalamus [6, 7], the transition
from juvenility to maturity occurs over the course of
several years (Supplementary Fig. S1). Linear growth
continues during this time, with height velocity in females generally peaking a year prior to menarche [8].
The overlap of these two phenomena, the adolescent
linear height spurt and the externally visible sign of
maturing gonadal function, indicates that adolescent
physiology must allocate energy simultaneously to
growth and reproductive function. This functional
overlap is in keeping with the constrained fecundity
seen in the years immediately after menarche, a
phenomenon often referred to as ‘adolescent sterility’ but more accurately termed ‘adolescent
subfecundity’ [9, 10].
The role of energy availability in mediating the
timing of pubertal maturation in traditional and
industrialized populations has been documented:
while a high ratio of adult to juvenile extrinsic mortality risk promotes early age at maturity even when
energy is limited [11, 12], there is generally a negative
relationship between energy availability and juvenile
growth rates on the one hand and age at puberty on
the other hand [2]. Less is known, however, about the
determinants of somatic energy allocation during
puberty. This is a particularly relevant question for
females, for whom a single reproductive event
Reiches et al. |
Female adolescent transition in The Gambia
[19]. All daughters enrolled in this study were born in
the hungry season, June through October inclusive,
the time of year when the pregnancy supplement had
the greatest impact on birth weight [18]. Participants
were post-menarcheal, not pregnant, had reported
at least one period since parturition if lactating and
were not using hormonal contraception.
Study design
Anthropometry
Height, weight and triceps skinfold thickness were
measured in triplicate by the same trained observers. Height was measured in barefoot participants
to the nearest millimeter with a stadiometer
(Leicester height measure, Seca 214), calibrated
daily with a wooden rod of known length. Weight
was measured in light clothing in triplicate to the
nearest 0.1 kg on a battery-operated scale (Tanita
Corporation, Japan), placed on a level surface and
calibrated daily with a 10-kg weight. Skinfold measures were taken to the nearest 2 mm with Holtain
calipers (Holtain).
Body composition
Body composition was measured with the Tanita
BC-418MA segmental body composition analyzer
at the beginning and end of each sampling season.
Prins et al. [20] validated the Tanita inbuilt prediction
equation estimates against total body water estimates of body composition in Gambian children
and developed a population-specific equation for
the estimation of percent fat-free mass. The correlation of this estimation equation with estimates
from deuterium was R = 0.84 (95% CI 0.79–0.89)
[20]. A modified version of this equation, which includes using triceps skinfold measures, was used to
convert Tanita impedance readouts in Ohms into
estimates of fat and lean mass [21].
Biological samples
Please see Supplementary Materials for detailed collection and analysis methods for C-peptide of insulin
and leptin.
Statistical analysis
Statistical analysis was performed in IBM SPSS
Statistics Version 19.0. GEEs were used to assess
the effect of chronological age, gynecological age
and height velocity on within-season changes in
weight and body composition (fat mass and lean
mass). Marginal means reflect uneven sample sizes
in groups. Potential covariates included in the
models were age, height and weight (for analyses
of fat and lean mass). Results were considered significant at P < 0.05.
Ethics
Ethics approval for the study in The Gambia was
granted by the joint Gambia Government/MRC
The Gambia Ethics Committee (proposal SCC
1169). Permission for Harvard personnel to conduct
the study was granted at Harvard by the Committee
for the Use of Human Subjects (application
#F-17744-102).
RESULTS
Study subject characteristics
Chronological, developmental and anthropometric
characteristics of study participants are detailed in
Table 1. Because no effect of maternal treatment
group was found in analysis, treatment group was
not included as a term (data not presented). Of the
67 women enrolled, data for this analysis were available for 50. The other 17 women could not be located
for end-of-month data collection. Differences in
sample size between the harvest and hungry season
are due to participants becoming pregnant,
transferring out of the study area or withdrawing
from the study.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Data reported here were collected in March during
the 2010 harvest season and during a 30-day period
spanning July and August in the 2010 hungry season.
Anthropometric measurements, weight and body
composition were measured at the beginning of
each data collection period at MRC Keneba, using
standard procedures with regularly validated equipment (see below). Weight and body composition
were measured again at the end of the data collection period in participants’ villages. First morning
fasting urine samples and non-fasted morning
blood spots were collected approximately weekly at
participants’ homes and transported to MRC
Keneba laboratory for processing.
77
78
| Reiches et al.
Evolution, Medicine, and Public Health
Predictor variables
Age
Age refers to chronological age at the beginning of
the study season. In the present analysis, the
youngest tertile was compared with the group
comprising the older two tertiles. This is because
analysis revealed clear biological differences between those closer to the beginning of puberty and
thus in the midst of pubertal growth relative to those
who were nearer completion of growth and maturation processes.
Height velocity
Height velocity in cm/year was estimated in all individuals who were present for anthropometric measurement in at least two sampling seasons. Analysis
compared individuals whose growth rate met or exceeded 1.0 cm/year, ‘fast growers’, with those whose
growth rates were <1.0 cm/year, ‘slow growers’.
Height velocity is a better proxy of maturity than
height in the post-menarcheal period when agerelated height differences are less important than
final height differences.
Relationship among predictor variables
Chronological and developmental variables are
correlated both biologically and statistically. Each
is presented separately here because each tracks a
slightly different biological process: while age marks
the passage of time, with which the probability of
maturational events increases, gynecological age
signifies distance from a threshold reproductive
event in an individual’s unique maturational history.
Height velocity, meanwhile, eventually reaches zero
in all individuals; a snapshot of height velocity therefore allows an estimate of how close to this
predetermined endpoint a given adolescent may be.
Table 1. Participant characteristics by season
Age (years)
Gynecological age (years)
Height (cm)
Start of season weight (kg)
End of season weight (kg)
Start of season % fat (Tanita derived)
End of season % fat (Tanita derived)
Log leptin (ng/ml)
Log C-peptide of insulin
ng/creatinine (mg)
Harvest season
(mean ± SE),
n = 47
Hungry season
(mean ± SE),
n = 29
Season
Wald-chi square
and P-value
17.30 (0.21)
2.2 (0.2)
161.0 (0.8)
52.7 (1.1)
52.6 (1.1)
21.8% (0.6)
21.8% (0.6)
1.1 (0.0)
1.2 (0.0)
17.98 (0.28)
2.9 (0.3)
161.4 (1.2)
55.8 (1.6)
54.9 (1.5)
24.0% (0.7)
23.3% (0.8)
1.0 (0.0)
0.9 (0.1)
5700, P < 0.001***
5710, P < 0.001***
10.7, P = 0.001**
40.2, P < 0.001***
6.29, P = 0.012*
67.9, P < 0.001***
10.2, P = 0.001**
4.32, P = 0.038*
12.8, P < 0.001***
This table represents the subset of 50 individuals for whom beginning and end of season weight and body composition
data are available. *P < 0.05. **P < 0.01. ***P < 0.001.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Gynecological age
Gynecological age is years since menarche.
Menarcheal age, as self-reported recall data, is prone
to errors of memory [22], which are compounded, in
this case, by differences between researchers’ and
participants’ concepts of time. Nonetheless, three
lines of evidence indicate that menarcheal ages reported here are biologically relevant. First, median
age at menarche in the study population was 15.00
years (95% confidence interval 14.92–15.42), while a
recent probit analysis of age at menarche in
the same population found a similar median
menarcheal age of 14.90 (95% confidence interval
14.52–15.28) [23]. Second, there was no significant
variation in reported menarcheal age relative to date
of birth: an ANOVA assessing age at menarche by
year of birth was non-significant (F = 1.64, P = 0.16).
Third, we would predict that developmentally
younger individuals have greater height velocity. As
expected, average height velocity across the study
period was associated negatively with gynecological
age (GEE estimated marginal means of height velocity for gynecological age tertiles 1.0 cm/year in the
youngest group, 0.9 cm/year in the middle group
and 0.7 cm/year in the oldest group; Wald chi-square
for gynecological age by tertile 14.6, P = 0.001,
n = 52). As with age, the youngest tertile was
compared with the older two tertiles in GEE
analyses.
Reiches et al. |
Female adolescent transition in The Gambia
Age and gynecological age tertiles were assigned
across the full sample, of which a subset is represented in within-season weight and body composition measurements. Therefore, there were different
numbers of individuals in the age and gynecological
age tertiles. Overlap among age and gynecological
age tertiles and height velocity categories in the harvest and hungry seasons is illustrated in
Supplementary Fig. S2.
Predictors of within-season weight change
On all measures of chronological and developmental age, younger and faster growing individuals
gained more weight than older, slower growing individuals in both the harvest season and the hungry
season (Fig. 1). (For this analysis and for those
below, Wald chi-square statistics for factors and
covariates are available in Table 2 and estimated
marginal means of group differences are in
Table 3.) Both the gynecological age and height velocity models confirmed the finding that weight gain
was more positive in the harvest season (Table 2).
Height at entry into the study was positively
associated with weight gain when age was a predictor (Table 2). This finding does not necessarily
contradict the association between youth and weight
gain, because height and age were not correlated in
the sample population (GEE ns).
Predictors of within-season lean mass change
In both hungry and harvest seasons, younger and
more rapidly growing individuals gained more lean
mass than older and less rapidly growing individuals
(Fig. 2 and Tables 2 and 3). The youngest tertile in
gynecological age gained more lean mass than older
tertiles in both seasons (Table 2). When participants
were divided by age, older individuals lost lean mass
in the hungry season whereas the youngest tertile
did not (Tables 2 and 3). Similarly, slower but not
faster growing individuals lost lean mass in the hungry season (Tables 2 and 3).
Predictors of within-season fat mass change
Chronologically younger individuals lost significantly more fat in the hungry season than in the harvest season (Fig. 2 and Tables 2 and 3). Although the
mean fat change among older individuals in the hungry season was likewise negative, it was not significantly different from this group’s within-season fat
change in the harvest season (Table 2). This pattern
was echoed in height velocity groups: while fat mass
remained constant for both fast and slow growers in
the harvest season, the slow growers maintained fat
mass in the hungry season while fast growers lost it
(Fig. 2 and Tables 2 and 3). Gynecological age alone
did not predict within-season fat mass change.
CONCLUSIONS AND IMPLICATIONS
Data from the study sample indicate that, during the
adolescent life history transition, the bodies of postmenarcheal adolescent women in The Gambia responded to energetic stress with somatic energy
allocation strategies that appeared to differ by age
and developmental stage. Those who were
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Differences in energy availability between
harvest and hungry seasons
The current data indicate that harvest season was
characterized by greater energy availability than the
hungry season. Three lines of evidence demonstrate
this difference. First, within season weight change in
the study population was positive in the harvest season and negative in the hungry season (weight
change in kg estimated marginal mean in the harvest
season 0.2 (SE 0.2), hungry season 0.3 (SE 0.1),
Wald chi-square of season 7.84, P = 0.005,
age covariate Wald chi-square 7.47, P = 0.006,
= 0.210, n = 49).
Second, the population as a whole maintained fat
mass in the harvest season and lost fat mass in the
hungry season (fat mass change in kg estimated
marginal mean in the harvest season 0.0 (SE 0.1),
hungry season 0.4 (SE 0.1), Wald chi-square of
season 12.0, P = 0.001).
Finally, leptin and C-peptide of insulin, endocrine
markers of long- and short-term energy status,
respectively, were significantly higher in the harvest
season than in the hungry season (log leptin in
ng/ml estimated marginal mean in the harvest season 1.1 (SE 0.0), hungry season 0.95 (SE 0.0), GEE
Wald chi-square for season 39.6 P < 0.001, fat mass
covariate Wald chi-square 104.5, P < 0.001,
= 7.107 E 6, n = 53; log C-peptide ng/Cr mg
estimated marginal mean in the harvest season 1.1
(SE 0.0), hungry season 0.93 (SE 0.0), Wald chisquare of season 11.9, P = 0.001, age covariate
Wald chi-square 6.58, P = 0.01, = 0.072, n = 52).
Taken together, these results indicate that energy
was limited in the hungry season relative to the harvest season, and the impact of energy constraint on
weight and on C-peptide of insulin was greater in
older individuals. Height and weight were not significant in any of the models and thus were not
included as covariates.
79
| Reiches et al.
Evolution, Medicine, and Public Health
Within Season Weight Change
age: youngest tertile
age: older two tertiles
gyn age: youngest tertile
gyn age: older two tertiles
height vel: ≥ 1 cm/yr
height vel: < 1 cm/yr
1
0.8
0.6
0.4
0.2
Mass kg
80
0
0
0.5
1
1.5
2
2.5
3
-0.2
-0.4
-0.6
-0.8
-1
Harvest Season
Hungry Season
Figure 1. Within season weight change in the harvest and hungry seasons as predicted by age (P = 0.005), gynecological age
(P = 0.002) and height velocity (P = 0.005). Wald chi-square statistics and additional P-values are in Table 2. Estimated marginal
means and standard errors are in Table 3
chronologically younger and gaining significant
height preferentially acquired lean mass, both in a
season of relative energy abundance and in a season
of relative energy constraint, mobilizing adipose tissue during the hungry season. In contrast, adolescent women who had neared or reached the end of
linear growth and those who were chronologically
older lost lean mass during the hungry season.
These results are consistent with earlier findings that
pregnant women who are still growing allocate a
higher proportion of energy to maternal relative to
fetal tissue than do comparably aged non-growing
pregnant women [24] (though see also [25]). The
current findings are also consistent with life history
theory, suggesting that shifts in somatic priorities of
energy allocation occur progressively during puberty. Given that all participants were postmenarcheal and height velocities were low across
the sample, it is possible that differences in intrasomatic allocation strategy would be even more apparent in a sample including younger adolescents.
When height velocity alone was considered, it appeared that slower growing adolescent women
mobilized lean mass and preserved fat mass, suggesting that ovarian function in this subset of individuals may have been more robust than in
faster growing women, with higher estradiol
levels promoting maintenance of adipose depots
important to gestation and lactation. Additional
research will be needed to establish where on the
body adipose tissue is preferentially maintained in
developmentally and chronologically older adolescents under conditions of energy constraint. The
authors hypothesize that gluteofemoral adipose
depots will be favored, as these reserves
have been shown to support gestation and lactation
[26].
The findings reported here differ from and contribute to previous research in significant ways.
Although cross-sectional patterns in female body
composition with age and parity have been reported
[27], short-term longitudinal shifts in intra-somatic
allocation among fat and lean mass during puberty
have not. Prior analyses of body composition in nonpregnant, non-lactating Gambian women found
preferential mobilization of fat tissue during the
hungry season in study participants 20–35 years of
age [28], indicating that the result reported here may
be specific to the pubertal period. Second, we did not
detect energy sparing in activity during the hungry
season relative to the harvest season, in contrast to
data from a similar population in Senegal [29].
One finding that requires further exploration is the
relative importance of height velocity compared with
gynecological age in predicting within-season somatic energy allocation strategies in adolescent
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
-1.2
Reiches et al. |
Female adolescent transition in The Gambia
81
Table 2. GEE Wald chi-square and P-values for analyses of age, gynecological
age and height velocity as predictors of within-season change in weight and body
composition
Wald chi-square and P-values
Lean
Fat
7.89, P = 0.005**
3.53, P = 0.060
0.150, P = 0.699
Height 3.95,
P = 0.047*,
= 0.032
9.75, P = 0.002**
0.18, P = 0.672
4.255, P = 0.039*
Height 4.71,
P = 0.030*,
= 0.036
0.56, P = 0.453
19, P < 0.001***
6.568, P = 0.010*
NS
Gynecological age
Gynecological age
Season
Interaction
Covariate and b
9.45, P = 0.002**
7.11, P = 0.008**
0.01, P = 0.905
NS
4.53, P = 0.033*
0.10, P = 0.750
0.28, P = 0.599
Weight 4.36,
P = 0.037*,
= 0.025
NS
NS
NS
NS
Height velocity
Height velocity
Season
Interaction
Covariate and b
7.79, P = 0.005**
8.68, P = 0.003**
0.03, P = 0.854
NS
19.8, P < 0.001***
3.16, P = 0.076
3.63, P = 0.057
NS
0.67, P = 0.412
5.76, P = 0.016*
7.81, P = 0.005**
Age 4.26,
P = 0.039*,
= 0.089
Age
Age
Season
Interaction
Covariate and b
Age and gynecological age comparisons are between the youngest tertile and older two tertiles, while height velocity
comparisons are between individuals growing 1 cm/year and those growing <1 cm/year. *P < 0.05. **P < 0.01.
***P < 0.001.
Table 3. Estimated marginal means and standard error for within-season
weight change
Weight, kg (SE)
Harvest
Age
Youngest tertile
Older two tertiles
Gynecological age
Youngest tertile
Older two tertiles
Height velocity
1 cm/year
<1 cm/year
Hungry
Lean mass, kg (SE)
Harvest
Hungry
0.5 (0.2)
01 (0.2)
0.2 (0.2)
0.5 (0.2)
0.4 (0.2)
0.2 (0.2)
0.9 (0.2)
0.2 (0.2)
0.7 (0.2)
0.0 (0.2)
0.2 (0.2)
0.5 (0.2)
0.5 (0.2)
0.1 (0.2)
0.5 (0.2)
0.0 (0.2)
0.6 (0.2)
0.1 (0.3)
0.1 (0.1)
0.8 (0.3)
0.5 (0.2)
0.0 (0.2)
0.6 (0.2)
0.7 (0.2)
Weight changes occur over a month. Harvest season n = 47. Hungry season n = 29.
Fat mass, kg (SE)
Harvest
0.1 (0.1)
0.1 (0.1)
NS
NS
0.0 (0.1)
0.1 (0.1)
Hungry
0.8 (0.2)
0.4 (0.1)
NS
NS
0.6 (0.1)
0.0 (0.1)
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Weight
82
| Reiches et al.
Evolution, Medicine, and Public Health
(age P = 0.002, season NS, interaction P = 0.039). Wald chi-square statistics in Table 2; estimated marginal means in Table 3. (b)
Comparison of lean mass change in the harvest season and the hungry season in individuals growing 1 cm/year relative to those
growing <1 cm/year (height velocity P < 0.001, season NS, interaction NS). Wald chi-square statistics in Table 2; estimated
marginal means in Table 3. (c) Comparison of fat mass change in the harvest season and the hungry season in the youngest and
older two tertiles (age NS, season P < 0.001, interaction P = 0.010). Wald chi-square statistics in Table 2; estimated marginal
means in Table 3. (d) Comparison of fat mass change in the harvest season and the hungry season in individuals growing 1 cm/
year relative to those growing <1 cm/year (height velocity NS, season P = 0.016, interaction P = 0.005). Wald chi-square statistics
and additional P-values in Table 2; estimated marginal means in Table 3
women. Specifically, why was height velocity important while distance from menarche appeared to be
less salient as a determinant of fat mass change? We
will consider two complementary explanations for
the importance of height in this study, since
prioritizing investment in linear growth suggests
that height itself is biologically significant. (Recall
that height and age were not correlated in the sample
population, suggesting that younger, faster growing
individuals were not merely approaching a height
threshold but rather were investing in height on its
own behalf, or on behalf of a physiologically relevant
trait that correlates with height.)
The lesser explanatory power of gynecological age
may be understood by keeping in mind two characteristics of pubertal maturation: first, menarche
represents not a point of biological transition, as
from sterility to fecundability, but rather a threshold
at which the functioning of the hypothalamic–pituitary–ovarian axis becomes visible. Follicular estradiol
production has reached a level at which the products
of endometrial lining proliferation can no longer be
resorbed and must be shed [30]. This proliferation,
however, is no guarantee that ovulation has
occurred [9]. The relationship of menarche to somatic energy allocation, therefore, is not completely
clear: while there must be pubertal levels of
circulating estradiol for menarche to occur, estradiol
contributes both to reproductive function and to linear growth [31], making its relative importance in
these two processes at the adolescent threshold
point indeterminate. Differences in the relationship
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Figure 2. (a) Comparison of lean mass change in the harvest season and the hungry season in the youngest and older two tertiles
Reiches et al. |
Female adolescent transition in The Gambia
reproduction, but it must prioritize which type of
somatic store to preserve and which to mobilize
when energy is limiting. The ability to negotiate
these tradeoffs adaptively during the pubertal transition is necessary to acquiring the somatic capital
that underwrites reproduction while taking advantage of reproductive opportunities at energetically
favorable moments.
In considering these results it is important to note
that they focus on individuals in the 15–20-year age
range who are not pregnant or in lactational amenorrhea. It is possible that there were physiological
differences in somatic energy allocation strategy between this group and their age- and size-matched
peers who were pregnant or in lactational amenorrhea and thus were not eligible for inclusion in this
study. Additional limitations include unequal sample sizes in hungry and harvest seasons and a relatively small overall sample size.
In summary, changes in weight and body composition over the course of an energetically constrained
hungry season and a less energetically constrained
harvest season indicated that chronologically and
developmentally younger adolescent women preferentially allocate somatic resources to growth, while
their older and more developed peers preserve
reproductively valuable adipose tissue when energy
is limiting. These results support an understanding
of adolescence as a period of life history transition
from juvenile growth to mature reproductive investment. The importance of height velocity as a predictor of somatic allocation strategy underscores
its status as proxy for the degree to which the growth
period is complete and the reproductive period
begun.
supplementary data
Supplementary data is available at EMPH online.
acknowledgments
The authors wish to thank the women of West
Kiang whose participation made the study possible. Special thanks go also to the field, Calcium,
Vitamin D and Bone Health (CDBH), and data
management teams, and to laboratory technicians
at MRC Keneba, particularly Musa Colley. In the
USA, the assistance of Dr Susan Lipson and
Megan Verlage in the laboratory was invaluable.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
of height velocity, final height and menarche in resource-constrained and resource-abundant populations affirm the variability of the menarche-growth
relationship [11]. Second, menarche typically occurs
about a year following peak height velocity [8]. Given
that there is inter-individual variation in the magnitude and duration of the pubertal spurt in linear
growth (Supplementary Fig. S3), gynecological age
or distance from menarche may correspond to different points in the individual’s linear growth trajectory [32]. That growth trajectory itself may be more
theoretically and practically robust as a predictor of
somatic energy allocation.
Second, height velocity may be a signal of structural maturation that is easy to measure outwardly
and that serves as a visible indicator of other dimensions of skeletal maturation, such as remodeling of
the interior dimensions of the bony pelvis in females,
which, even more than increases in bi-iliac breadth
or hip circumference, is the axis of anatomical
change most important to successful parturition in
humans [33]. A related and not mutually exclusive
possibility is that height is one marker of reproductive value in resource-scarce ecologies, perhaps
constituting a metric of the quality of the individual’s
developmental environment and thus the somatic
resources that she will be able to invest in reproduction [3]. Height associates positively with marriageability [15] and reproductive success [16] in many
non-Western populations, including The Gambia
[17], though the significant outcome measure in
the Gambian population is not number of births
but survival of offspring, indicating that women of
different heights may be able to allocate different
amounts of energy to fetal or infant growth or
immune function.
It is worthwhile to note that there was strong
evidence of growth and maturation in height,
weight and adiposity in the study population
across seasons (Table 1), even as within-season
changes in weight and fat mass tended to be negative in the hungry season. The shorter term patterns detected through within-season analysis
revealed responsiveness to energetic stress that
was not discernible from data collected at less frequent intervals. Subtle seasonal changes in weight
and body composition of the kind documented
here likely reflect the type of facultative shifts in
somatic energy allocation that conferred a selective
advantage over the course of human evolution: the
body must not only invest available resources in
the most beneficial life history category, growth or
83
84
| Reiches et al.
Evolution, Medicine, and Public Health
nourished lactating women. Am J Clin Nutr 1991;54:
funding
The research was supported by a Doctoral
Dissertation Improvement Grant (BCS-0925768)
and by a Senior Research Grant (BCS-0921237) from
the National Science Foundation of the USA. This
work was supported by the UK Medical Research
Council under program numbers MC-A760-5QX00,
U105960371 and U123261351.
788–98.
15. Smits J, Monden CWS. Taller indian women are more successful at the marriage market. Am J Hum Biol 2012;24:
473–8.
16. Stulp G, Verhulst S, Pollet TV et al. The effect of female
height on reproductive success is negative in western
populations, but more variable in non-western populations. Am J Hum Biol 2012;24:486–94.
17. Sear R, Allal N, Mace R et al. Height and reproductive
Conflict of interest: None declared.
success among Gambian women. Am J Hum Biol 2004;
16:223.
18. Ceesay SM, Prentice AM, Cole TJ et al. Effects on birth
weight and perinatal mortality of maternal dietary supple-
references
ments in rural Gambia: 5 year randomised controlled trial.
1. Gadgil M, Bossert WH. Life historical consequences of
natural selection. Am Nat 1970;104:1–24.
2. Eveleth PB, Tanner JM. Worldwide Variation in Human
1990.
energy balance in child-bearing Gambian women. Am J
Clin Nutr 1981;34:2790–9.
20. Prins M, Hawkesworth S, Wright A et al. Use of
3. Charnov EL, Berrigan D. Why do primates have such long
bioelectrical impedance analysis to assess body compos-
lifespans and so few babies?. Evol Anthropol 1993;1:191–4.
ition in rural Gambian children. Eur J Clin Nutr 2008;62:
4. Pallottini V, Bulzomi P, Galluzzo P et al. Estrogen regula-
1065–74.
tion of adipose tissue functions: involvement of estrogen
21. Hawkesworth S, Prentice AM, Fulford AJ et al. Dietary sup-
receptor isoforms. Infect Disord Drug Targets 2008;8:
plementation of rural Gambian women during pregnancy
52–60.
does not affect body composition in offspring at 11–17
5. Demmelmair H, Baumheuer M, Koletzko B et al.
Metabolism of U13C-labeled linoleic acid in lactating
women. J Lipid Res 1998;39:1389–96.
years of age. J Nutr 2008;138:2468–73.
22. Koo MM, Rohan TE. Accuracy of short-term recall of age at
menarche. Ann Hum Biol 1997;24:61–4.
6. Knobil E. The neuroendocrine control of the menstrual
cycle. Recent Prog Horm Res 1980;36:53–88.
23. Prentice S, Fulford AJ, Jarjou LM et al. Evidence for a downward secular trend in age of menarche in a rural Gambian
7. Plant TM, Barker-Gibb ML. Neurobiological mechanisms
of puberty in higher primates. Hum Reprod Update 2004;
10:67–77.
population. Ann Hum Biol 2010;37:717–21.
24. Scholl TO, Hediger ML, Schall JI et al. Maternal growth
during pregnancy and the competition for nutrients.
8. Tanner JM. Growth at Adolescence; with a General
and
Am J Clin Nutr 1994;60:183–8.
25. Jones RL, Cederberg HM, Wheeler SJ et al. Relationship
Environmental Factors upon Growth and Maturation from
between maternal growth, infant birthweight and nutrient
Birth to Maturity. Springfield, III, C. C. Thomas, 1962.
partitioning in teenage pregnancies. Int J Obstet Gynaecol
Consideration
of
the
Effects
of
Hereditary
9. Gray SH, Ebe LK, Feldman HA et al. Salivary progesterone
levels before menarche: a prospective study of adolescent
girls. J Clin Endocrinol Metab 2010;95:3507–11.
10. Apter D, Raisanen I, Ylostalo P et al. Follicular growth in
relation to serum hormonal patterns in adolescent
compared with adult menstrual cycles. Fertil Steril 1987;
47:82–8.
11. McIntyre MH, Kacerosky PM. Age and size at maturity in
women: a norm of reaction?. Am J Hum Biol 2011;23:
305–12.
12. Kramer KL, Greaves RD, Ellison PT. Early reproductive ma-
2010;117:200–11.
26. Rebuffe-Scrive M, Enk L, Crona N et al. Fat cell metabolism in different regions in women. Effect of menstrual
cycle, pregnancy, and lactation. J Clin Invest 1985;75:
1973–6.
27. Wells JC, Griffin L, Treleaven P. Independent changes in
female body shape with parity and age: a life-history
approach to female adiposity. Am J Hum Biol 2010;22:
456–62.
28. Lawrence M, Coward WA, Lawrence F et al. Fat gain
during pregnancy in rural African women: the effect of
turity among Pume Foragers: implications of a pooled en-
season and dietary status. Am J Clin Nutr 1987;45:
ergy model to fast life histories. Am J Hum Biol 2009;21:
1442–50.
430–7.
13. Aiello LC, Key C. Energetic consequences of being a Homo
erectus female. Am J Hum Biol 2002;14:551–65.
14. Goldberg GR, Prentice AM, Coward WA et al. Longitudinal
assessment of the components of energy balance in well-
29. Benefice E, Garnier D, Ndiaye G. High levels of habitual
physical activity in west African adolescent girls and relationship to maturation, growth, and nutritional status: results from a 3-year prospective study. Am J Hum Biol 2001;
13:808–20.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
Growth. Cambridge University Press: Cambridge, UK,
Br Med J 1997;315:786–90.
19. Prentice AM, Whitehead RG, Roberts SB et al. Long-term
Reiches et al. |
Female adolescent transition in The Gambia
30. Salamonsen LA. Current concepts of the mechanisms of
32. Tanner JM. Foetus into Man: Physical Growth from
menstruation: a normal process of tissue destruction.
Conception to Maturity. Harvard University Press:
Trends Endocrinol Metab 1998;9:305–9.
31. Juul A. The effects of oestrogens on linear bone growth.
Hum Reprod Update 2001;7:303–13.
85
Cambridge, MA, 1990.
33. Moerman ML. Growth of the birth canal in adolescent
girls. Am J obstet Gynecol 1982;143:528–32.
Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015
37
Evolution, Medicine, and Public Health [2013] pp. 37–45
doi:10.1093/emph/eot002
Socioeconomic status
determines sex-dependent
survival of human offspring
David van Bodegom*1,2, Maarten P. Rozing1, Linda May3, Hans J. Meij1,4,
Fleur Thome´se5, Bas J. Zwaan6,7 and Rudi G. J. Westendorp1,2
1
Department of Gerontology and Geriatrics, Leiden University Medical Center, PO Box 9600, 2300 RC Leiden; 2Leyden
Academy on Vitality and Ageing, Poortgebouw LUMC, Rijnsburgerweg 10, 2333 AA Leiden; 3Department of Parasitology,
Leiden University Medical Center, PO Box 9600, 2300 RC Leiden; 4Amphia Hospital, Postbus 90157, 4800 RL Breda;
5
Department of Sociology, VU University Amsterdam, De Boelelaan 1081, 1081HV Amsterdam; 6Laboratory of Genetics,
Wageningen University, PO Box 309, 6700 AH Wageningen and 7Institute of Biology, Leiden University, PO Box 9505, 2300
RA Leiden, The Netherlands.
*Corresponding author. Leyden Academy on Vitality and Ageing, Poortgebouw LUMC, Rijnsburgerweg 10, 2333 AA
Leiden, The Netherlands. Tel:+31-71-5240960; Fax:+31-71-5248159; E-mail: bodegom@leydenacademy.nl.
Received 31 August 2012; revised version accepted 26 January 2013.
ABSTRACT
Background and objectives: In polygynous societies, rich men have many offspring through the marriage
of multiple wives. Evolutionary, rich households would therefore benefit more from sons, and according
to the Trivers–Willard hypothesis, parents invest more in offspring of the sex that has the best reproductive prospects. We determined the sex differences in number of offspring, sex ratio of offspring,
offspring survival and offspring weight in rich and poor households in a polygynous population.
Methodology: We studied a population of 28 994 individuals in Northern Ghana during an 8-year prospective follow-up. We determined the fertility rate for both men and women, sex ratio of 3511 newborn
offspring and offspring survival in 16 632 offspring up to reproductive age (18 years). Also, we collected 9842 weight measurements of 1470 offspring up to the age of 3 years from growth charts of local
clinics.
Results: In rich households, men have a lifetime number of 6.0 offspring, while for women this was 3.1.
In line with evolutionary predictions, the male:female sex ratio was higher in rich households (0.52; poor
households 0.49), sons had lower mortality in rich households (hazard ratio male versus female 1.06,
P = 0.64; poor households: hazard ratio male versus female 1.46, P = 0.01) and sons also had higher
weights in rich households (P = 0.008).
Conclusions and implications: In rich households, men have higher reproductive prospects in this
polygynous society and, in line with Trivers–Willard, we registered more sons in rich households, sons
had lower mortality and higher weights, maximizing the reproductive output in this society.
K E Y W O R D S : sex differences; Trivers–Willard; reproduction; offspring survival; offspring weight;
Africa
ß The Author(s) 2013. Published by Oxford University Press on behalf of the Foundation for Evolution, Medicine, and Public Health. This is an Open Access
article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits
unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
orig inal
research
article
38
| van Bodegom et al.
Evolution, Medicine, and Public Health
BACKGROUND AND OBJECTIVES
In polygynous societies, richer men can afford to
marry multiple wives and consequently increase
their reproductive success. In terms of Darwinian
fitness, rich households would therefore benefit
more from sons with their higher reproductive prospects. According to the Trivers–Willard hypothesis,
parents invest more in offspring of the sex that has
the best reproductive prospects [1].
Although Trivers–Willard effects have been found
in many animals, they are highly debated in humans.
In a recent review of 422 studies in mammals, which
investigated sex ratios at birth, excluding humans, a
Trivers–Willard effect was consistently found in several species, while in other species, including nonhuman primates, more contradictory findings are
found [2]. An important consideration here is that
many human studies were performed in monogamous populations [3–5]. Here, large effects are not
expected since in a monogamous society, there will
mostly not be large differences in reproductive output of sons and daughters. In polygynous societies,
however, a subset of more successful sons can have
large reproductive output through the marriage of
multiple wives. Previous studies that have examined
sex ratios and the Trivers–Willard effect in polygynous human populations found no sex-specific survival differences dependent on status among the
Bari of South America, nor among the Gabbra and
Kipsigis of Kenya [6–8].
We studied reproductive output of men and
women in poor and rich households in a large population of 28 994 individuals in a rural African society
in the Upper East Region of Ghana with a high degree of polygyny. Second, we investigated the differences in offspring sex ratio, sex differences in
offspring survival and offspring weight in poor and
rich households.
METHODOLOGY
Study area
This study was conducted in the Garu-Tempane district in the Upper East region of Ghana. General fertility and mortality patterns have been described
elsewhere [9]. The characteristics of the study population are presented in Table 1. The people are patriarchal, patrilineal and patrilocal and live in extended
families, of which 48% are polygynous. During 8
years of follow-up from 2002 to 2010, we assessed
reproduction and survival among 28 994 participants. The area is currently undergoing an epidemiological transition [10]. Drinking water was assessed
on household level, water from boreholes was considered safe drinking water and water drawn from
either open wells or from rivers was considered unsafe drinking water [11].
Socioeconomic status
In 2007, we designed a DHS-type questionnaire to
assess the socioeconomic status (SES) of the households of the study participants using a free listing
technique whereby we asked people from different
villages of the research area, both male and female,
in focus group discussions to list the household
items of most value [12]. These self-listed property
questionnaires are reported to be highly correlated
to longer property questionnaires [13]. The resulting
list of valuable items was comparable to part of the
core welfare indications questionnaire from the
World Bank and to the extended DHS asset list,
adapted to our region. The list included different
items, including mainly domestic livestock and different valuable household items comprising motorbikes, bicycles and iron roofing. The average wealth
of the household possessions in market value of
2007 was 1063 US dollar with a SD of 1021 US dollar.
The distribution was skewed to the right.
From these assets, a DHS wealth index was
calculated. This was done as explained in paragraph
2.2 of the DHS wealth index comparative report [14].
Using SPSS factor analysis, the indicator variables
were first standardized by calculating z-scores.
Second, the factor coefficient scores or factor
loadings are calculated. The DHS wealth index is
the sum of the indicator values multiplied by the
loadings. This index is itself a standardized score
with a mean of 0 and a SD of 1. We defined poor
and rich as the poorest 50% of households and the
richest 50% of households divided by the median of
the DHS wealth index.
Fertility
From the registered newborn offspring and the
observed person-years of fertile men and women
during our 8-year follow up, we calculated the agespecific fertility rates. Next, we multiplied the agespecific fertility rates with the fraction of surviving
van Bodegom et al. |
SES determines sex-dependent survival of human offspring
Table 1. Characteristics of the study population
Participants (n)
Male (%)
Female (%)
28 994
46
54
Tribe
Bimoba (%)
Kusasi (%)
Other (%)
66
26
8
Households (n)
Polygynous households (%)
Mean value of household possessions in US$ (mean (SD))
Safe drinking water (%)
1703
48
1063 (1021)
80
Number offspring
Numbers of offspring registered 2002–2010 (n)
SES available (n)
3645
3511
Offspring survival
Offspring 18 years (n)
Follow-up (calenderyears)
Person years (n)
Mean follow-up (years)
Deaths during follow-up (n)
16 632
2002–2010
91 256
5.5
471
Weights of offspring
Offspring 3 years with growth chart (n)
Weight measurements (n)
Average number of measurements per child (n)
men and women of these ages to calculate the number of offspring of each age group per year. The
lifetime number of offspring was calculated as the
sum of these numbers of offspring per year for all age
groups multiplied by 5 since all age groups are 5-year
age groups.
1470
9842
7
those cases only the person-years observed below
age 18 years were included in the analysis. In total,
we followed 16 632 individuals for 91 256 personyears which makes an average of 5.5 years follow-up
below the age of 18 years per individual observed.
During our follow-up, we observed 471 deaths below
the age of 18 years.
Survival
The survival analysis used a multivariable lefttruncated Cox regression analysis adjusted for sex,
tribe and drinking source. We found no evidence
that the assumption of proportionality of hazards
was violated. The left-truncated plots represent
age-specific survival probabilities calculated from
the 8-year follow-up rather than a prospective
lifetime follow-up. For the survival analysis up to reproductive age, we included all offspring up to 18
years. This survival analysis was performed on all
person-years observed 18 years during our 8-year
follow-up. Some individuals were followed 8 years
below the age of 18 years; some individuals were
followed both below age 18 years and above and in
Weights
The weights of the offspring were obtained from
growth charts of local health clinics in 2008. The
clinics use hanging scales to measure the weight
and use growth charts from the Ghana Health
Service, adapted from the World Health Organization. For the separate sexes of each age, we
standardized the weights on age and sex by
calculating SDS or z-scores by subtracting the mean
from the observed weight and dividing by the standard deviation.
On average, we had seven measurements per
child during their first 3 years of life. To take these
repeated measures into account and not treat them
39
40
| van Bodegom et al.
Evolution, Medicine, and Public Health
as independent measures, we used a linear mixed
model. In the model, we adjusted/corrected for tribe
of the offspring. The offspring from different tribes in
the area have very different biometrics. Some tribes
have cows, and the offspring of these tribes drink
milk. Therefore, these offspring have less stunted
growth and also do not suffer from (protein) malnutrition. We also adjusted the model for drinking
source and the month and year of measurement,
as weights fluctuated dependent on the season
and year (Supplementary Fig. S1). The point estimates presented in this article are estimates derived
from this model and therefore do not always add up
to zero for each age.
Ethics
Ethical approval was given by the Ethical Review
Committee of the Ghana Health Service, the Medical
Ethical Committee of the Leiden University Medical
Centre in Leiden, The Netherlands, and by the local
chiefs and elders of the research area.
All analyses were performed with Stata 11.0
(StataCorp LP, TX, USA).
RESULTS
We visited the research area annually from 2002 to
2010. Each year we registered the deaths, migration
and newborn offspring. Figure 1 shows the cumulative survival, age-specific fertility rates and number
of offspring for men and women of different age
groups from poor and rich households.
Figure 2a compares these numbers of offspring
born to fathers and mothers of different ages in poor
and rich households. The people in the research area
are polygynous and the man must pay a bride price
of four cows to arrange a marriage. Consequently,
richer men are able to increase their number of wives
and hence offspring. Taking the age-specific fertility
rate and survival to these ages into account, in poor
households the total number of lifetime offspring,
represented by the area under the curve in the figure,
was 3.4 offspring for men and 2.7 offspring for
women. In rich households, the total number of
lifetime offspring was 6.0 offspring for men, whereas
it was 3.1 offspring for women.
Studies have shown a strong heritability of SES in
pre-transitional societies [15]. This seems applicable
to this population also, since income is generated
largely through agriculture and sons inherit the cattle and land of their fathers. If offspring inherits the
SES from their parents and rich men have better reproductive prospects, one could hypothesize that
rich households would benefit more from sons,
which would create an opportunity for selection on
sex-specific survival dependent on SES. We
compared the sex ratio of offspring, offspring survival and offspring weights in poor and rich
households.
Figure 2b shows the sex ratio of the registered
offspring in the research area. Of all 3685 offspring,
we had socioeconomic information on 3511 offspring. In poor households, we registered 544 male
offspring and 565 female offspring (male:female sex
ratio 0.49). In rich households, we registered 1240
male offspring and 1162 female offspring (male:female sex ratio 0.52). Since we did not register the
offspring at birth, but during the annual field visit,
these sex ratios are secondary sex ratios at an average age of 6 months.
Second, we studied survival of 16 632 offspring up
to reproductive age (18 years) (Fig. 2c). In poor
households, sons had much higher mortality risk
compared with daughters (hazard ratio (HR) 1.46
[95% CI 1.08–1.96]; P = 0.01). In rich households,
however, mortality risk of sons was similar to that
of daughters (HR 1.06 [95% CI 0.84–1.33]; P = 0.64,
P for interaction = 0.09).
To further investigate the observed sex differences, we also looked at the survival differences in
different strata of SES. Figure 3 shows the survival
differences for male and female offspring stratified
in different strata of SES. The accompanying HRs are
reported in Table 2. These analyses show that the
sex-specific survival differences dependent on SES
are largely due to a reduced survival of male offspring in the poorest households.
Third, we analyzed the weights of offspring using
repeated measurements from growth charts of the
local health clinics. In an analysis of 9842 age and sex
standardized measurements among 1470 offspring
up to the age of 3 years, daughters had higher
weights in poor households, whereas sons had
higher weights in rich households (Fig. 2d). These
differences in sex-specific weight gain were significantly different (P for interaction = 0.008).
CONCLUSIONS AND IMPLICATIONS
We observed sex-specific effects of SES on the sex
ratio of offspring, offspring survival and offspring
weight. Several points should be discussed when
interpreting these results.
van Bodegom et al. |
SES determines sex-dependent survival of human offspring
Poor men
0.4
0.1
0.2
Offspring / (person)year
0.6
0.0
0.6
0.4
0.1
0.2
Age group (years)
Age group (years)
Cumulative survival (right y-axis, proportion)
Age specific reproductive rate (offspring/personyear)
Cumulative survival (right y-axis, proportion)
Offspring per agegroup/year (offspring/year)
Offspringper agegroup/year (offspring/year)
Age specific reproductive rate (offspring/personyear)
Poor women
Rich women
0.2
0.0
0
5
10
15
20
25
30
35
40
45
50
55
60
65
70
75
80
85
90
9
105
0
0.00
Age group (years)
0.8
0.15
0.6
0.10
0.4
0.05
0.2
0.00
0.0
0
5
10
15
20
25
30
35
40
45
50
55
60
65
70
75
80
85
90
9
105
0
0.4
0.05
1.0
0.20
Cumulative survival
0.6
0.10
Cumulative survival
0.8
0.15
Offspring / (person)year
1.0
0.20
Offspring / (person)year
0.0
0
5
10
15
20
25
30
35
40
45
50
55
60
65
70
75
80
85
90
9
105
0
0.0
0
5
10
15
20
25
30
35
40
45
50
55
60
65
70
75
80
85
90
9
105
0
0.0
0.8
0.2
Cumulative survival
0.8
0.2
1.0
0.3
Cumulative survival
Offspring / (person)year
Rich men
1.0
0.3
Age group (years)
Cumulative survival (right y-axis, proportion)
Age specific reproductive rate (offspring/personyear)
Cumulative survival (right y-axis, proportion)
Age specific reproductive rate (offspring/personyear)
Offspring per agegroup/year (offspring/year)
Offspring per agegroup/year (offspring/year)
Figure 1. Cumulative survival, age-specific fertility rate and offspring per year for poor and rich men and women of different age
groups.
First, concerning the high ages of continued reproduction in this area. Since there is no official
registration of births in this area, the ages are
estimated ages by three independent observers,
both local and Dutch fieldworkers. We used all information available to come to a best estimate, most
notably the relation to other family members with
known ages, but some ages could be estimated
too low and some too high. Although we did our best
to come to an objective estimate, old age carries a
certain status in this area, and it is possible that
more ages are overestimations than underestimations. This could explain the unusual high age of
retained fertility for some women and it is also possible that the high reproductive output of some old
men could be a little less extreme. Although misclassification of ages does not change the interaction of wealth and sex as we describe in this
article, it is important to recognize this when interpreting the fertility data.
Second, the sex ratios are sex ratios during registration at our annual field visit. Therefore, they are
secondary sex ratios at an average age of 6 months
and they do not necessarily reflect sex ratios at birth.
Therefore, they could be the result of early mortality
differences instead. We have observed mortality differences up to 18 years and it is expected that these
differences also exist in the first 6 months of life.
Another important point to discuss in this polygynous society is that men that fail to marry migrate
to the south of Ghana to work in poor conditions in
large cities or large-scale agricultural plantations.
We have no estimate of their reproductive output
but it is possible that this is low. Since the men that
migrate are preferential poor males, the fertility figures for poor males in the research area are most
probably overestimations of the reproductive output
of all men born in poor households in the area. The
contrast in reproduction between poor and rich is
therefore probably even stronger than presented
here. It is even possible that the lifetime number of
offspring of poor women is greater than the lifetime
number of offspring of poor men if this would be
taken into account. However, it is not possible to
41
42
| van Bodegom et al.
Evolution, Medicine, and Public Health
Figure 2. Offspring per year (a), sex of offspring (b), offspring survival (c) and offspring weight (d) in poor and rich households.
Error bars indicate standard errors. SDS = Standard Deviation Score.
0
5
Cumulative survival
10
Age (years)
0.85
0.90
0.95
1.00
0
15
Male
20
Female
5
Cumulative survival
0
0.85
0.90
0.95
1.00
10
Age (years)
0.85
0.90
0.95
1.00
0
15
Male
5
20
Female
5
0.85
0.90
0.95
0
10
Age (years)
0.85
0.90
0.95
1.00
10
Age (years)
Cumulative survival
0
15
15
5
20
Male
20
Female
Male
0.85
0.90
0.95
1.00
10
Age (years)
0.85
0.90
0.95
15
0
0
15
1.00
10
Age (years)
Female
5
20
5
Male
Female
5
20
Female
Male
10
Age (years)
0.85
0.90
0.95
1.00
0
0
10
Age (years)
0.85
0.90
0.95
1.00
15
15
20
Male
20
0.85
0.90
0.95
0.85
0.90
0.95
1.00
10
Age (years)
Female
5
Male
1.00
10
Age (years)
Female
5
0
0
15
15
5
20
10
Age (years)
0.85
0.90
0.95
1.00
Female
Male
10
Age (years)
5
20
Female
Male
0
15
15
Male
20
Female
5
20
Female
Male
0.85
0.90
0.95
1.00
0
10
Age (years)
15
5
10
Age (years)
20
Female
Male
15
20
Female
Male
Figure 3. Offspring survival 18 years dependent on socioeconomic status (SES) in different strata of wealth (DHS wealth index). (a) Split by median (poor versus rich), (b) tertiles of SES, (c) quartiles of SES
and (d) quintiles of SES.
0.85
0.90
0.95
1.00
(d)
(c)
(b)
Cumulative survival
1.00
Cumulative survival
Cumulative survival
Cumulative survival
(a)
Cumulative survival
Cumulative survival
Cumulative survival
Cumulative survival
Cumulative survival
Cumulative survival
Richer
Cumulative survival
Poorer
SES determines sex-dependent survival of human offspring
van Bodegom et al. |
43
44
| van Bodegom et al.
Evolution, Medicine, and Public Health
Table 2. Hazard ratios for mortality 18 years (male versus female)
HR
95% CI
P
Poorest 50%
Richest 50%
1.46
1.05
(1.08–1.96)
(0.84–1.33)
0.01
0.64
First tertile
Second tertile
Third tertile
1.51
0.96
1.15
(1.11–2.05)
(0.70–1.33)
(0.83–1.59)
0.008
0.81
0.4
First quartile
Second quartile
Third quartile
Fourth quartile
1.43
0.97
1.24
1.11
(1.03–2.00)
(0.67–1.41)
(0.86–1.81)
(0.75–1.65)
0.03
0.89
0.25
0.59
First quintile
Second quintile
Third quintile
Fourth quintile
Fifth quintile
1.58
1.35
0.68
1.44
1.09
(1.09–2.29)
(0.88–2.07)
(0.45–1.03)
(0.92–2.25)
(0.72–1.76)
0.02
0.17
0.07
0.11
0.68
Bold values indicate significance at p < 0.05
calculate this without knowing the exact fertility
characteristics of the men that migrate. This does
not change our conclusions, however, and in fact, it
is possible that the Trivers–Willard effect could even
be stronger than presented here.
Concerning the mechanism behind the observed
sex differences dependent on SES, two possible explanations exist. First, they could be a reflection of
higher intrinsic vulnerability of sons to poor conditions. Looking at the mortality patterns in poor and
rich households, Fig. 3 shows that the differences
are largely determined by a higher mortality of sons
in poor conditions. It is known that men have higher
mortality risks throughout life in almost all countries
and in this harsh environment, this could be the
principle mechanism behind the observed survival
differences dependent on SES [16]. Second, our observations are also in line with differences in parental
investment as hypothesized by Trivers and Willard.
The observed sex differences in weight could reflect
differences in parental nursing habits; sex differences in breastfeeding have previously been
observed in Poland and the Caribbean [17, 18].
These differences in parental behavior do not have
to be based on conscious decisions. Previous
studies among the Mukogodo of Kenya also showed
that in a male-centered society, parental behavior
can, maybe not even always consciously, be female
oriented in a society where all Mukogodo are poor in
relation to the Masaai [19]. We do not have observations on parental behavior in our study. Although
this would be interesting, from an evolutionary perspective not the mechanism but the number of
surviving male and female offspring is most
relevant.
A last thing to consider is a potential effect that
birth order could have on the observed patterns. It
could be expected that the first-born son would be
preferred; because he would inherit the wealth and
therefore have high reproductive prospects while
later born sons would be less favored. Unfortunately,
we do not have reliable data on this, but we are
planning to collect this in the future. On the other
side, although the oldest son inherits the house, his
brothers together with their wive(s) will often live
with him in his household. Also, it is important to
realize that in this society, possessions are not
owned individually but are shared to a high degree
among the (male) kin of the household.
Whether the sex differences that we have observed
in our study reflect the higher vulnerability of sons to
poor conditions, or reflect a sex-specific parental investment as proposed by Trivers and Willard, the net
result is the same; sons are better off in richer households which maximizes the reproductive prospects
of households in this polygynous society. In fact, the
two explanations are not mutually exclusive. The
Trivers–Willard hypothesis refers to an ultimate
van Bodegom et al. |
SES determines sex-dependent survival of human offspring
explanation in terms of evolutionary optimization.
Differential vulnerability to poor conditions is a proximate explanation referring to a potential mechanism, even if unspecified.
6. Zaldivar ME, Lizarralde R and Beckerman S. Unbiased sex
ratios among the Bari: an evolutionary interpretation.
Hum Ecol 1991;19:469–98.
7. Mace R. Biased parental investment and reproductive success in Gabbra pastoralists. Behav Ecol Sociobiol 1996;38:
75–81.
8. Borgerhoff Mulder M. Brothers and sisters. How sibling
supplementary data
interactions affect optimal parental allocations. Hum Nat
Supplementary data are available at EMPH online.
1998;9:119–61, doi: 10.1007/s12110-998-1001-6.
9. Meij JJ, van Bodegom D, Ziem JB et al. Quality–quantity
trade-off of human offspring under adverse environmental
funding
conditions. J Evol Biol 2009;22:1014–23, doi: 10.1111/
This research was supported by the Netherlands
Foundation for the advancements of Tropical
Research (WOTRO); the Netherlands Organization
for Scientific Research (NWO); the EU-funded
Network of Excellence LifeSpan, an unrestricted
grant of the Board of the Leiden University Medical
Center and the Association Dioraphte. None of
these organizations had any role in the design, analysis, interpretation or report of the study.
j.1420-9101.2009.01713.x.
10. Meij JJ, de Craen AJ, Agana J et al. Low-cost interventions
accelerate epidemiological transition in Upper East
Ghana. Trans R Soc Trop Med Hyg 2009;103:173–8,
doi: 10.1016/j.trstmh.2008.09.015.
11. Kuningas M, May L, Tamm R et al. Selection for genetic
variation inducing pro-inflammatory responses under adverse environmental conditions in a Ghanaian population.
PLoS One 2009;4:e7795, doi: 10.1371/journal.pone.
0007795.
12. van Bodegom D, May L, Kuningas M et al. Socio-economic
Conflict of interest: none declared.
status by rapid appraisal is highly correlated with mortality
Funding to pay the Open Access publication charges
for this article was provided by the Leyden Academy
on Vitality & Ageing.
risks in rural Africa. Trans R Soc Trop Med Hyg 2009;103:
795–800, doi: 10.1016/j.trstmh.2008.12.003.
13. Morris SS, Carletto C, Hoddinott J et al. Validity of rapid
estimates of household wealth and income for health surveys in rural Africa. J Epidemiol Commun Health 2000;54:
381–7.
references
14. Rutstein SO and Johnson K. The DHS Wealth Index. DHS
1. Trivers RL and Willard DE. Natural selection of parental
ability to vary the sex ratio of offspring. Science 1973;179:
Comparative reports No. 6. Calverton MD: ORC Macro,
2004.
90–2.
2. Cameron EZ. Facultative adjustment of mammalian sex
15. Borgerhoff Mulder M, Bowles S, Hertz T et al.
ratios in support of the Trivers–Willard hypothesis: evi-
of inequality in small-scale societies. Science 2009;326:
dence for a mechanism. Proc Roy Soc Lond B 2004;271:
Intergenerational wealth transmission and the dynamics
682–8, doi: 10.1126/science.1178336.
16. Kalben B. Why men die younger: causes of mortality dif-
1723–8, doi: 10.1098/rspb.2004.2773.
3. Cameron EZ and Dalerum FA. Trivers–Willard effect in
contemporary humans: male-biased sex ratios among bil-
ferences by sex. North Am Actuarial J 2000;4:83–111.
17. Koziel S and Ulijaszek SJ. Waiting for Trivers and Willard:
10.1371/
do the rich really favor sons?. Am J Phys Anthropol 2001;
4. Chacon-Puignau GC and Jaffe K. Sex ratio at birth devi-
18. Quinlan RJ, Quinlan MB and Flinn MV. Local resource
lionaires.
PLoS
One
2009;4:e4195,
doi:
journal.pone.0004195.
115:71–9, doi: 10.1002/ajpa.1058.
ations in modern Venezuela: the Trivers–Willard effect.
enhancement
Soc Biol 1996;43:257–70.
Caribbean community. Curr Anthropol 2008;46:471–80.
and
sex-biased
breastfeeding
in
a
5. Gaulin SJ and Robbins CJ. Trivers–Willard effect in con-
19. Cronk L. Intention versus behaviour in parental sex pref-
temporary North American society. Am J Phys Anthropol
erences among the Mukogodo of Kenya. Biosoc Sci 1991;
1991;85:61–9.
23:229–40, doi: 10.1017/S0021932000019246.
45