65 Evolution, Medicine, and Public Health [2013] pp. 65–74 doi:10.1093/emph/eot003 Epistasis between antibiotic resistance mutations drives the evolution of extensively drug-resistant tuberculosis So`nia Borrell1,2, Youjin Teo1,2, Federica Giardina1,2, Elizabeth M. Streicher3, ¨ller1,2,3, Tommie C. Victor3 and Marisa Klopper3, Julia Feldmann1,2, Borna Mu 1,2 Sebastien Gagneux* 1 Department of Medical Parasitology and Infection Biology, Swiss Tropical and Public Health Institute, 4002 Basel, Switzerland; 2University of Basel, 4003 Basel, Switzerland and 3DST/NRF Centre of Excellence for Biomedical Tuberculosis Research/MRC Centre for Molecular and Cellular Biology, Division of Molecular Biology and Human Genetics, Faculty of Health Sciences, Stellenbosch University, 7505 Cape Town, South Africa *Corresponding author. Department of Medical Parasitology and Infection Biology, Swiss Tropical and Public Health Institute (Swiss TPH), Socinstrasse 57, CH-4002 Basel, Switzerland. Tel:+41-61-284-8369; Fax:+41-61-284-8101; E-mail: sebastien.gagneux@unibas.ch The first two authors contributed equally to this work. Received 20 December 2012; revised version accepted 5 March 2013 ABSTRACT Background and objectives: Multidrug resistant (MDR) bacteria are a growing threat to global health. Studies focusing on single antibiotics have shown that drug resistance is often associated with a fitness cost in the absence of drug. However, little is known about the fitness cost associated with resistance to multiple antibiotics. Methodology: We used Mycobacterium smegmatis as a model for human tuberculosis (TB) and an in vitro competitive fitness assay to explore the combined fitness effects and interaction between mutations conferring resistance to rifampicin (RIF) and ofloxacin (OFX); two of the most important first- and second-line anti-TB drugs, respectively. Results: We found that 4 out of 17 M. smegmatis mutants (24%) resistant to RIF and OFX showed a statistically significantly higher or lower competitive fitness than expected when assuming a multiplicative model of fitness effects of each individual mutation. Moreover, 6 out of the 17 double drug-resistant mutants (35%) had a significantly higher fitness than at least one of the corresponding single drugresistant mutants. The particular combinations of resistance mutations associated with no fitness deficit in M. smegmatis were the most frequent among 151 clinical isolates of MDR and extensively drug-resistant (XDR) Mycobacterium tuberculosis from South Africa. Conclusions and implications: Our results suggest that epistasis between drug resistance mutations in mycobacteria can lead to MDR strains with no fitness deficit, and that these strains are positively ß The Author(s) 2013. Published by Oxford University Press on behalf of the Foundation for Evolution, Medicine, and Public Health. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. orig inal research article 66 | Borrell et al. Evolution, Medicine, and Public Health selected in settings with a high burden of drug-resistant TB. Taken together, our findings support a role for epistasis in the evolution and epidemiology of MDR- and XDR-TB. K E Y W O R D S : microbiology; antimicrobial; epidemiology; infection BACKGROUND AND OBJECTIVES Epistasis refers to the phenomenon where the phenotypic effect of one mutation differs depending on the presence of another mutation [1]. The importance of epistasis for our understanding of biology is increasingly recognized; it has been implicated in many processes, ranging from pathway organization, the evolution of sexual reproduction, mutational load, and genomic complexity, to speciation and the origin of life [2]. Moreover, recent studies have reported a role for epistasis in the evolution of antibiotic resistance [3–6]. Multidrug-resistant (MDR) bacteria are emerging worldwide, in some cases leading to incurable disease. Although new antibiotics are urgently needed, a better understanding of the forces that lead to the emergence of drug resistance would help prolong the lifespan of existing drugs. Studies in various bacterial species have shown that the acquisition of antibiotic resistance often imposes a physiological cost on the bacteria in absence of the drug [7–9]. However, some drug resistance conferring mutations have been associated with low or no fitness cost, and compensatory evolution can mitigate some of the initial fitness defects associated with particular drug resistance conferring mutations [10]. Most of these studies have focussed on resistance to a single drug. Given the public health threat posed by MDR bacteria, there is a need to understand the factors that influence the emergence of resistance to multiple drugs. Recent studies in model organisms have shown that mutations conferring resistance to different drugs can interact epistatically. A study in Pseudomonas aeruginosa found that the relative fitness of certain strains resistant to streptomycin and rifampicin (RIF) [4,6] was lower than expected based on the fitness of the corresponding single-resistant mutants. Similarly, a study in Escherichia coli [3] showed that strains resistant to two drugs can have a higher fitness than strains resistant to only one drug; a phenomenon referred to as ‘sign epistasis’ [11]. However, whether such epistatic interactions play any role in the emergence and spread of MDR bacteria in clinical settings has not been determined. Multidrug resistance is a particular problem in human tuberculosis (TB) [12]. Recent surveillance data showed the highest rates of resistance ever documented with some Eastern European countries reporting up to 50% of TB cases as MDR [13]. In Mycobacterium tuberculosis, the main causative agent of human TB, drug resistance is chromosomally encoded and results from de novo acquisition of mutations in particular genes [14]. These mutations are acquired sequentially, giving rise to MDR and extensively drug-resistant (XDR) strains [15,16]. MDR-TB is defined as strains resistant to at least RIF and isoniazid, the two most important first-line anti-TB drugs. XDR-TB is caused by strains that, in addition to being MDR, are also resistant to ofloxacin (OFX), or any other fluoroquinolone, and to at least one of the injectable second-line drugs [17]. In this study, we used Mycobacterium smegmatis as a model for M. tuberculosis to investigate putative epistatic interactions between mutations conferring resistance to RIF and OFX, two of the most widely used first- and second-line anti-TB drugs, respectively. M. smegmatis is used widely in the TB research community because it is non-pathogenic, in contrast to M. tuberculosis, which requires biosafetylevel 3 containment. Moreover, M. smegmatis forms visible colonies in 2–3 days, compared with 3–4 weeks for M. tuberculosis. We then compared our experimental data generated with M. smegmatis to the clinical frequency of particular combinations of RIF and OFX resistance conferring mutations in a panel of MDR and XDR M. tuberculosis clinical strains from South Africa. METHODOLOGY Bacterial strains and growing conditions All strains used for the competitive fitness experiments were derived from the wild-type M. smegmatis strain mc2155. Bacteria were grown in Middlebrook 7H9 broth supplemented with ADC or on Middlebrook 7H11 agar plates supplemented with Borrell et al. | Epistasis in drug-resistant tuberculosis OADC. The culture tubes were incubated in standard conditions and the optical density (OD600) was recorded daily to measure the growth. Selection of single- and double-resistant M. smegmatis mutants Independent RIF- and OFX-resistant M. smegmatis single mutants were isolated as follows. A starting culture of M. smegmatis mc2155 was prepared from wild-type M. smegmatis and adjusted to 300 bacilli/ml (OD600 0.01). Ten milliliter of culture was transferred into 14 individual 50 ml falcon tubes. When the bacteria reached end of log-phase (OD600 3.00), the cultures were concentrated by centrifugation at 1500 rpm for 5 min, the supernatant discarded, and the bacteria resuspended in 500 ml Middlebrook 7H9 media. This concentrated bacterial culture was plated onto Middlebrook 7H11 media containing 200 mg RIF/ml for the isolation of RIF-resistant colonies, and 2 mg OFX/ml for the isolation of OFX-resistant colonies. The plates were incubated for 3–5 days at 37 C until colonies became visible. One colony from each plate was picked and subcultured in antibiotic-free Middelbrook 7H9 broth. For the isolation of double-resistant mutants, different rpoB- and gyrA-mutants were used to generate different combinations of mutations conferring resistance to both antibiotics. Some double-resistant mutants were selected by plating on Middlebrook 7H11-OADC media containing both 200 mg/ml of RIF and 2 mg/ml of OFX. Mutation identification The main target genes for resistance to RIF and OFX are rpoB and gyrA, respectively. To detect the relevant drug resistance conferring mutations, the rpoB and gyrA genes were amplified by PCR using DNA extracted from the single- and the doubleresistant mutants. The primers used to amplify the portion of the rpoB gene encoding the main set of mutations conferring resistance to RIF were 50 -GGA CGT GGA GGC GAT CAC ACC-30 . For amplification of the gyrA gene, the primers 50 CAT GAG CGT GAT CGT GGG CCG-30 and 50 CAG AAC CGT GGG CTC CTG CAC-30 were used. The same primers were used for direct DNA sequencing from the PCR product. Fitness assay and calculation of fitness ratio The rpoB-, gyrA- and rpoB–gyrA-mutants were competed against the wild-type antibiotic-susceptible strain in antibiotic-free Middlebrook 7H9 media. A total of 100 CFU of bacteria/ml were inoculated in 10 ml of Middlebrook 7H9 media in a 1:1 ratio. For each wild-type-mutant pair, between four and eight replicate competition assays were performed. At the start of the experiment (t = 0 h), 50 ml from each competition culture was plated on both antibiotic-free- and antibiotic-containing Middlebrook 7H11 plates in triplicates to estimate the baseline CFU counts. The competition cultures were incubated at standard conditions on a shaking incubator at 100 rpm, and the optical densities (OD600) were recorded daily. After 72 h, the same competition cultures were diluted 105- to 106-fold and plated on both selective and non-selective Middlebrook 7H11 media to obtain the endpoint CFU counts. For both competing strains, the Malthusian parameters were calculated by taking the natural log of the endpoint CFU over the baseline CFU [7]. The mean CFU count of the three replicates was used for the calculation of the relative competitive fitness. This gave the Malthusian parameters (ms and mr) for both strains, which correspond to the number of doublings (generations) that each strain went through during the observed time period. Finally, the relative fitness of the drug-resistant strain relative to the wild-type was determined using Wrs = mr/ms [7]. Shapiro–Wilk test evidenced the normality of the fitness data (P = 0.3). Student’s t-test was used to detect differences in the mean fitness and the limit for statistical significance was set at P = 0.05. Test statistics and estimates were based on 1000 bootstrap replicates. Statistical analysis was performed with STATA SE/10. Measuring epistasis To explore putative genetic interactions between drug resistance mutations, pairwise epistasis (e) was measured assuming a multiplicative model in which e = WABWab WAbWaB, where Wab is the fitness of the clone carrying alleles a and b, and capital letters represent the wild-type sensitive alleles [3]. Following this model, values of e > 0.0 indicate that the fitness of the double mutant is higher than expected based on the fitness values of the individual single mutants. Similarly, values of e < 0.0 indicate that the fitness of the double mutant is lower 67 68 | Borrell et al. Evolution, Medicine, and Public Health than expected based on the fitness values of the individual single mutants. We tested the normality of the epistasis data with a Shapiro–Wilk test. To test whether epistasis values were significantly different from zero, we used the error-propagation method described by Trindade et al. [3]. We considered that alleles a and b showed significant epistasis whenever the calculated error was smaller than the average value of e (Fig. 3). To detect the presence of sign epistasis, we performed pairwise comparisons between the fitness of each double-resistant mutant and the corresponding single-resistant mutants using a one-sided bootstrap Student’s t-test with 1000 replicates (Fig. 5). The combined P-values were obtained using Fisher’s method. Clinical frequency of rpoB and gyrA mutation combinations in M. tuberculosis A total of 151 clinical MDR- and XDR-TB M. tuberculosis isolates were included in this study. These were collected in the Eastern (N = 99) and Western Cape (N = 52) Provinces of South Africa between 2008– 2009 and 2001–2008, respectively. RIF and OFX resistance determining regions in the rpoB and gyrA genes were analysed using standardized PCR and sequencing [18,19]. Amplification products were sequenced using an ABI 3130XL genetic analyzer, and the resulting chromatograms were analysed using Chromas software. RESULTS Fitness cost of single drug-resistant mutants We first determined the relative fitness of M. smegmatis mutants resistant to a single drug. To this end, we selected a series of spontaneous M. smegmatis mutants resistant to RIF or OFX. From the RIF-selected mutants, we used five clones with rpoB mutations for further analysis (H526R, H526P, H526Y, S531W and S531L) (Supplementary Table S1). These mutants were competed in vitro against their RIF-susceptible ancestor as described previously [7]. We found that S531L, S531W and H526Y showed no difference in relative fitness compared with the ancestor (Fig. 1A), while H526R and H526P showed a significantly lower relative fitness (Bootstrap P = 0.02 and P < 0.01, respectively). Similar to previous work in M. tuberculosis [7], we found a strong correlation between fitness cost of rpoB mutations in M. smegmatis and the frequency of these mutations in clinical isolates of M. tuberculosis (Spearman’s Rank coefficient 0.9, P = 0.04; Supplementary Table S1). Individually, S531L and H526Y that showed no fitness cost in our M. smegmatis model are the most frequent RIF resistance conferring mutations in clinical settings, whereas S526P that had the lowest relative fitness of all mutants occurs only in 0.1% of clinical strains (Supplementary Table S1). We found no correlation between the spontaneous mutation frequency of rpoB mutations and the clinical frequency of these mutations (Supplementary Table S1). From the OFX-selected mutants, we selected four that carried distinct gyrA mutations for further analysis (D94G, G88C, D94N and D94Y). In vitro competition against the OFX-susceptible ancestor revealed that mutants carrying D94G and D94Y had no fitness defect, while D94N and G88C had a significantly lower relative fitness (Bootstrap P = 0.02 and P < 0.01, respectively) (Fig. 1B). We compared our fitness measures with the frequency of gyrA mutations found in M. tuberculosis clinical isolates using data from a recently published review based on 1220 OFX-resistant M. tuberculosis isolates [20] (Supplementary Table S2). Similar to our findings with RIF-resistant mutants, we found that mutations at codon position 94 of gyrA, which showed overall the highest in vitro fitness in M. smegmatis, were the most common mutations in M. tuberculosis clinical strains. By contrast, gyrA G88C that had the lowest fitness is only rarely (1.6%) found in clinical settings (Supplementary Table S2). In contrast to the rpoB mutations, mutations at codon position 94 of gyrA were also the most frequent during the in vitro selection (Supplementary Table S2). Evidence for epistasis between rpoB and gyrA mutations To test for possible epistatic interactions between mutations conferring RIF and OFX resistance, we selected for spontaneous mutants resistant to both drugs. These double mutants harbouring a mutation in rpoB and gyrA were selected starting from the available single drug-resistant mutants. A total of 17 rpoB–gyrA double mutants were generated out of the 20 possible combinations (Supplementary Table S3). The relative fitness of the double mutants was determined by standard competition assays against the pan-susceptible ancestor strain and Borrell et al. | Epistasis in drug-resistant tuberculosis B A 1.2 Relative fitness 1 0.8 0.6 0.4 0.2 94 D 94 D G Y C N 88 G L 31 W 31 6Y 6P 6R 94 D S5 S5 52 H 52 H 52 H gyrA mutants rpoB mutants Figure 1. Relative fitness of M. smegmatis mutants resistant to a single drug compared with their pan-susceptible ancestor. Bars represent 95% confidence intervals. (A) Relative fitness of rpoB single mutants resistant to RIF. (B) Relative fitness of gyrA single mutants resistant to OFX B 1.2 rpoB 531L 1.1 1.0 88C 531W 526P 526R 526Y - - + - + + + 0.9 0.8 gyrA Observed fitness A 0.7 94N - 94Y - + + 94G + - + 0.6 0.5 0.70 0.75 0.80 0.85 0.90 0.95 1.00 1.05 1.10 - - Expected fitness Figure 2. Evidence of epistasis between mutations conferring resistance to RIF and OFX. (A) Relationship between observed and expected multiplicative fitness for the 17 double-resistant mutants (data point above/below the bar). The solid line represents the null hypothesis of multiplicative fitness effects. Deviations from this line arise as a consequence of epistatic fitness effects. (B) Allelic combination analysed and the corresponding sign of epistasis. The grey squares correspond to the pairs of mutations showing statistically significant epistasis compared with the fitness of the corresponding single-resistant mutants. We compared the observed fitness of each double mutant with the expected fitness assuming no epistasis based on a multiplicative model (Fig. 2, see ‘Methodology’ section for details). We found that in 11/17 (65%) of the double mutants, the observed fitness was different from the expected, suggesting either negative or positive epistasis between particular RIF and OFX resistance conferring mutations (Fig. 2A). To measure epistasis quantitatively, we measured pairwise epistasis (e) between all the different single-mutant pairs we had fitness data for, assuming a multiplicative model (Supplementary Table S4); positive and negative values of e indicate positive or negative epistasis, respectively [3]. Overall, the e-values across all mutant pairs followed a normal distribution (Shapiro–Wilk, P = 0.062) with an average positive value of 0.027 (95% confidence interval 0.02, 0.08) (Supplementary Table S4). Four out of 17 (24%) double mutants showed statistically significant positive or negative epistasis between RIF and OFX resistance conferring mutations. Moreover as shown in Fig. 2B, these epistatic interactions were allele-specific, showing differences in the sign (i.e. positive versus negative) of the e-value depending on the specific amino acid change at a particular codon position. Theoretical and experimental evidence predicts a correlation between the average deleterious effect of a single mutation and the strength of epistasis [21–23]. Hence, we tested whether this relationship holds for drug-resistant mycobacteria. In agreement with these predictions, we found a negative 69 | Borrell et al. Evolution, Medicine, and Public Health MF ε 0.6 0.5 0.4 Average epistasis 70 0.3 R 2= 0.78 0.2 0.1 0 0.75 0.8 0.85 0.9 0.95 1 1.05 1.1 Expected fitness Figure 3. Correlation between the average expected fitness and the strength of epistasis. Average epistasis was measured as deviation from a multiplicative model of double-resistant mutant fitness scores estimated by head-to-head competition in Middlebrook 7H9 broth. MFe: minimum fitness for e correlation between the expected fitness of our double mutants and the strength of epistasis between the respective RIF and OFX resistance conferring mutations (R2 = 0.78; P < 0.001) (Fig. 3). However, this correlation was only observed above a particular threshold of expected fitness, which we refer to as ‘minimal fitness for epistasis’ (MFe). Above MFe, epistasis tended to be positive when individual mutations were costly and negative when individual mutations were beneficial [21,23]. Below MFe, the correlation was lost (R2 = 0.07; P = 0.304), likely because these data points were all derived from mutants carrying the G88C mutation in gyrA, which was associated with a high fitness defect. Evidence for sign epistasis in rpoB/gyrA double mutants Sign epistasis refers to the case where a particular mutation that is deleterious on its own is beneficial in the presence of another mutation [3]. In the context of drug resistance, sign epistasis occurs when the fitness of the double-resistant mutant is higher than at least one of the corresponding singleresistant mutants. We found that 6 out of 17 double mutants (35%) showed statistically significant evidence of sign epistasis (Fig. 4). In addition, the observed sign epistatis was allele specific, i.e. the epistatic effects varied according to the specific alleles of the same gene. For example, D94N in gyrA led to the conversion of the fitness sign in the S526P RIF-resistant background but not in the S531L RIF-resistant background (Fig. 4). Role of epistasis in clinical XDR-TB Given the evidence for epistasis between RIF and OFX resistance mutations in M. smegmatis, we investigated how fitness changes along the mutational pathway leading from MDR-TB to XDR-TB might be influenced by corresponding epistatic interactions in M. tuberculosis (Fig. 5A). In the standard treatment protocols for TB [17], RIF is an essential part of the first-line regimen for drug-susceptible disease, and OFX is a part of the second-line regimen when resistance against first-line drugs has developed. Thus, rpoB mutations are generally acquired first and gyrA mutations second. Following this trajectory, selection by RIF will occur first, and the RIF-resistant mutants that survive will exhibit heterogeneous fitness in the absence of the drug depending on their rpoB mutations (Fig. 1) [7,24]. At this point, MDR-TB has developed and second-line treatment is initiated. Selection for OFX resistance begins, but the fitness levels of the emerging double mutants can still be positively or negatively affected depending on which gyrA mutation is acquired. Our M. smegmatis data showed that Borrell et al. | Epistasis in drug-resistant tuberculosis 1.4 Relative fitness 1.2 1.0 0.8 0.6 0.4 0.2 88 52 C 6Y /G S5 88 C 31 W /G 88 S5 C 31 L/ D9 H 4N 52 6P /D 94 H 52 N 6R /D 94 H 52 N 6Y /D 94 S5 N 31 L/ D9 H 4Y 52 6P /D 94 H5 Y 26 Y/ D S5 9 31 4Y W /D 94 S5 Y 31 L/ D S5 94 G 31 W /D 94 H 52 G 6P /D 94 H 52 G 6R /D 94 H G 52 6Y /D 94 G H R/ G 52 6 H H 52 6 P/ G 88 C 0.0 rpoB/gyrA mutations Figure 4. Evidence for sign epistasis between mutations conferring resistance to RIF and OFX. Sign epistasis occurs when the fitness of the double-resistant mutant (pink bar) is greater than the fitness of at least one corresponding single-resistant mutant [purple-(RIF) and blue-(OFX) bars]. The bars represent the standard deviation of the values. Double-resistant mutants with a bootstrapped P < 0.05 are highlighted with a star the gyrA D94G mutation was associated with improved fitness in all of the double mutants, irrespective of the rpoB mutation (pink bars compared with purple bars in Fig. 4). This was statistically significant in two of the five corresponding double mutants tested. Hence, based on the most likely clinical scenario of moving from MDR- to XDR-TB (Fig. 5A), we would expect the gyrA D94G mutation to be the most commonly found mutation in XDR-TB strains, and also to be found in combination with many different rpoB mutations. By contrast, we would expect gyrA G88C, which was consistently associated with negative epistasis in our M. smegmatis model (Figs 3, 4 and 5A), to show the opposite trend. To test these predictions, we analysed 151 MDR- and XDR-TB clinical isolates from South Africa. Sequencing of the relevant genes revealed that 71/151 (47%) harboured gyrA D94G whereas G88C occurred only once (0.7%). Moreover, among the gyrA mutations represented in the M. smegmatis dataset, gyrA D94G was the only mutation that occurred in combination with four different rpoB mutations in clinical strains (Fig. 5B). Taken together, our results show that experimental fitness data generated with M. smegmatis can be predictive of clinical TB. Moreover, these findings support a role for epistasis in the progression of M. tuberculosis from MDR to XDR. CONCLUSION AND IMPLICATIONS In this study, we used M. smegmatis as a model to show that epistasis can occur between mutations conferring resistance to RIF and OFX, which are two of the most important anti-TB drugs. Specifically, in several of the mutants resistant to both of these drugs, some of the mutations conferring resistance to one drug mitigated the negative fitness effects of some of the mutations conferring resistance to the other drug (or vice versa). Moreover, we found clear evidence of sign epistasis, showing that in some cases, the doubleresistant mutants had a higher relative fitness than at least one of the corresponding single-resistant mutants. In the context of MDR, sign epistasis between different drug resistance conferring mutations represent the worst case scenario; instead of accumulating fitness defects with each additional drug resistance, MDR strains manage to increase their relative fitness by acquiring additional drug resistance determinants. One limitation of our study is that we cannot exclude the possibility that additional mutation(s) could have arisen during the selection of our mutants, which may compensate for the initial fitness defects associated with the individual resistance mutations. More work is needed to elucidate the mechanisms involved in the interaction between mutations in rpoB and gyrA. Yet, several features make such interactions biologically plausible. GyrA encodes one of the subunits of DNA gyrase which is involved in the introduction of negative supercoiling to doublestranded DNA, thereby relaxing the positive supercoils that form during DNA replication [25]. RpoB encodes a part of the RNA polymerase and therefore 71 72 | Borrell et al. Evolution, Medicine, and Public Health B A Figure 5. (A) Mutational pathway leading to rpoB–gyrA double mutants when a patient undergoes standard TB treatment. RpoB mutations are generally acquired first, followed by gyrA mutations. The relative fitness of the various double-resistant mutants is indicated as determined by in vitro competition using the M. smegmatis model. wt—drug-susceptible wild-type strain; rpoB— point mutations in rpoB conferring RIF resistance; gyrA—point mutations in gyrA conferring OFX resistance. (B) Frequency of rpoB–gyrA mutation pairs found in MDR- and XDR-TB clinical isolates from the Eastern Cape and Western Cape Provinces of South Africa (only considering pairs including gyrA mutants for which M. smegmatis fitness data were available; N = 89) important for the transcription of DNA to RNA [26]. Although these two pathways are separate [27,28], GyrA and RpoB are both involved in the fundamental flow from DNA to RNA. Intriguingly, Gupta et al. isolated an ‘RNA-polymerase-DNA gyrase complex’ in M. smegmatis that exhibited both DNA supercoiling and transcriptional activities. The authors also found that DNA gyrase inhibitors not only reduced DNA gyrase activity but also reduced transcriptional activity indicating a role of DNA gyrase in transcription [29]. Finally, it has been shown that during transcription, RNA polymerase introduces positive supercoiling ahead as it slides along its template DNA. This leads to a reduced accessibility as supercoiling increases, further supporting a potential role for DNA gyrase in transcription [25]. Our study also showed that experimental data obtained from M. smegmatis are relevant for our understanding of clinical TB. Not only did we observe the same drug resistance conferring mutations in M. smegmatis as routinely encountered in clinical strains of M. tuberculosis, but similar to previous studies, we found a good correlation for both RIF and OFX between the fitness cost observed in vitro in M. smegmatis mutants and the relative clinical frequency of the corresponding mutations in M. tuberculosis [20,24]. Our M. smegmatis data showed particular relevance when focusing on MDR- and XDR-TB. Based on the most probable mutational pathway leading from MDR to XDR, our M. smegmatis fitness data predicted particular combinations of rpoB and gyrA mutations to be more frequent than others in clinical settings. This prediction was confirmed when screening a large panel of MDR and XDR M. tuberculosis clinical strains from South Africa, which is one of the regions with the highest burden of XDR-TB in the world [17]. Our mutational pathway analysis also showed that in some cases, if certain mutations are acquired first, the fitness of these drug-resistant strains is permanently set at a high baseline that cannot be drastically affected regardless of the individual fitness cost associated with the second mutation. Moreover, some gyrA mutations can act as ‘fitness safety nets’ offering the bacteria the possibility to recover from loss of fitness caused by any of the Borrell et al. | Epistasis in drug-resistant tuberculosis initial rpoB mutations. Taken together, our results suggest that although evolution towards MDRand XDR-TB can follow multiple trajectories, these are likely to be influenced by epistatic interactions between the particular drug resistance conferring mutations. This will constrain the particular mutational combinations to those that either increase or at least maintain fitness at a minimum level (Fig. 4). Above this minimum level of fitness, our study indicates that the strength of epistasis between gyrA and rpoB will be stronger when the individual mutations are associated with large fitness defects. Although the fitness measures reported here were generated during in vitro growth, M. tuberculosis is facing harsher environments during human infection. The fitness effects of drug resistance mutations have been shown to vary in different environments [6,30]. Hence, it would be interesting to explore how host immune pressure, oxidative and other stresses might influence epitasis between drug resistance mutations. Our finding that a specific gyrA mutation (i.e. D94G; Figs 4 and 5A) can restore the fitness of strains carrying different rpoB mutations has implications for the development of new TB treatment regimens. So far, OFX and other fluoroquinolones have primarily been used as second-line drugs to treat MDR-TB [31]. However, because of their potential to shorten TB chemotherapy, they are currently being evaluated in the context of new first-line treatment regimens for drug-susceptible TB [32]. Our results highlight that using fluoroquinolones as firstline treatment is likely to result in the early selection of fluoroquinolone resistance conferring mutations such as D94G gyrA that not only confer resistance but might promote also the acquisition of additional drug resistance while maintaining bacterial fitness at an advantageous level, either through positive epistasis with mutations conferring resistance to RIF or other drugs, or by establishing a higher baseline fitness [33]. Moreover, exposure to fluoroquinolones induces the bacterial SOS response which leads to the induction of error-prone DNA polymerases, thereby increasing the bacterial mutation rate and the propensity of acquiring additional drug resistance conferring mutations [34]. Interestingly, we found that resistance mutations at codon position 94 of gyrA were also most frequent during in vitro selection, suggesting that in addition to epistatic interactions between rpoB and gyrA mutations, other mechanisms might influence the frequency of particular combinations of drug resistance mutations in clinical settings. In conclusion, our study together with previous findings demonstrates that epistasis between different drug resistance conferring mutations occurs across several bacterial species. Although our study focused on the interaction between mutations in rpoB and gyrA, further work should explore possible similar effects in resistance to other anti-TB drugs, both existing as well as those currently under development [35] (http://www.newtbdrugs.org/pipeline. php). Three new drug candidates have shown promising results in recent clinical trials of MDRTB treatment [32]. However, how these new compounds should best be deployed, and in what combinations, remains unclear. Our study suggests that considering putative epistasis between the relevant drug resistance conferring mutations could help optimize treatment regimens. For example, combining drugs in which the resistance conferring mutations interact negatively would reduce the probability of resistance emerging. supplementary data Supplementary data is available at EMPH online. acknowledgements We thank all the other members of our group for the stimulating discussions. funding This work was supported by the Swiss National Science Foundation (grant number PP0033119205) and the National Institutes of Health (AI090928 and HHSN266200700022C). Funding to pay the Open Access publication charges for this article was provided by the Swiss National Science Foundation (PP0033-119205). Conflict of interest: None declared. references 1. Lehner B. Molecular mechanisms of epistasis within and between genes. Trends Genet 2011;27:323–31. 2. Breen MS, Kemena C, Vlasov PK et al. Epistasis as the primary factor in molecular evolution. Nature 2012;490: 535–8. 73 74 | Borrell et al. Evolution, Medicine, and Public Health 3. Trindade S, Sousa A, Xavier KB et al. Positive epistasis drives the acquisition of multidrug resistance. PLoS Genet 2009;5:e1000578. genetics-based second-line drug resistance testing. Antimicrob Agents Chemother 2012;56:2420–7. 20. Maruri F, Sterling TR, Kaiga AW et al. A systematic review 4. Ward H, Perron GG, Maclean RC. The cost of multiple of gyrase mutations associated with fluoroquinolone-re- drug resistance in Pseudomonas aeruginosa. J Evol Biol sistant Mycobacterium tuberculosis and a proposed gyrase 2009;22:997–1003. ¨rkman J, Hughes D, Andersson DI. Virulence of antibi5. Bjo numbering system. J Antimicrob Chemother 2012;67: 819–31. otic-resistant Salmonella typhimurium. Proc Natl Acad Sci 21. Hall AR, Iles JC, MacLean RC. The fitness cost of rifampicin U S A 1998;95:3949–53. 6. Hall AR, Maclean RC. Epistasis buffers the fitness effects of rifampicin-resistance mutations in Pseudomonas aeruginosa. Evolution 2011;65:2370–9. 7. Gagneux S, Long CD, Small PM et al. The competitive cost of antibiotic resistance in Mycobacterium tuberculosis. Science 2006;312:1944–6. 8. Shcherbakov D, Akbergenov R, Matt T et al. Directed mutagenesis of Mycobacterium smegmatis 16S rRNA to recon- resistance in Pseudomonas aeruginosa depends on demand for RNA polymerase. Genetics 2011;187:817–22. 22. Lalic´ J, Elena SF. Magnitude and sign epistasis among deleterious mutations in a positive-sense plant RNA virus. Heredity (Edinb) 2012;109:71–7. 23. Wilke CO, Adami C. Interaction between directional epistasis and average mutational effects. Proc Biol Sci 2001; 268:1469–74. 24. O’Sullivan DM, McHugh TD, Gillespie SH. Analysis of struct the in-vivo evolution of aminoglycoside resistance rpoB and pncA mutations in the published literature: an in Mycobacterium tuberculosis. Mol Microbiol 2010;77: insight into the role of oxidative stress in Mycobacterium 830–40. 9. Andersson DI, Levin BR. The biological cost of antibiotic tuberculosis evolution? J Antimicrob Chemother 2005;55: 674–9. resistance. Curr Opin Microbiol 1999;2:489–93. 10. Andersson DI, Hughes D. Antibiotic resistance and its 25. Reece RJ, Maxwell A. DNA gyrase: structure and function. Crit Rev Biochem Mol Biol 1991;26:335–75. cost: is it possible to reverse resistance? Nat Rev 26. Miller LP, Crawford JT, Shinnick TM. The rpoB gene of Microbiol 2010;8:260–71. 11. Weinreich DM, Watson RA, Chao L. Perspective: sign epistasis and genetic constraint on evolutionary trajectories. Evolution 2005;59:1165–74. Mycobacterium tuberculosis. Antimicrob Agents Chemother 1994;38:805–11. 27. McClure WR, Cech CL. On the mechanism of rifampicin inhibition of RNA synthesis. J Biol Chem 1978;253:8949–56. 12. Muller B, Borrell S, Rose G et al. The heterogeneous evo- 28. Almeida Da Silva PEA, Palomino JC. Molecular basis and lution of multidrug-resistant Mycobacterium tuberculosis. Trends Genet 2013;29:160–9. mechanisms of drug resistance in Mycobacterium tuberculosis: classical and new drugs. J Antimicrob Chemother 13. World Health Organization. Multidrug and extensively 2011;66:1417–30. drug-resistant TB (M/XDR-TB). Global Report on 29. Gupta R, China A, Manjunatha UH et al. A complex of DNA Surveillance and Response. World Health Organization: gyrase and RNA polymerase fosters transcription in Geneva, 2010. Mycobacterium smegmatis. Biochem Biophys Res Commun 14. Sandgren A, Strong M, Muthukrishnan P et al. 2006;343:1141–5. Tuberculosis drug resistance mutation database. PLoS Med 2009;6:e1000002. 15. Perdiga˜o J, Macedo R, Silva C et al. From multidrug-resist- 30. Miskinyte M, Gordo I. Increased survival of antibiotic-re- ant to extensively drug-resistant tuberculosis in Lisbon, 31. Ginsburg AS, Grosset JH, Bishai WR. Fluoroquinolones, sistant Escherichia coli inside macrophages. Antimicrob Agents Chemother 2013;57:189–95. Portugal: the stepwise mode of resistance acquisition. J tuberculosis, and resistance. Lancet Infect Dis 2003;3: Antimicrob Chemother 2013;68:27–33. 432–42. 16. Sun G, Luo T, Yang C et al. Dynamic population changes 32. Zumla A, Hafner R, Lienhardt C et al. Advancing the devel- in Mycobacterium tuberculosis during acquisition and fix- opment of tuberculosis therapy. Nat Rev Drug Discov 2012; ation of drug resistance in patients. J Infect Dis 2012;206: 11:171–2. 1724–33. 17. World Health Organization. (2012) Global tuberculosis con- 33. Luo N, Pereira S, Sahin O et al. Enhanced in vivo fitness of fluoroquinolone-resistant Campylobacter jejuni in the ab- trol: surveillance, planning, financing. World Health sence of antibiotic selection pressure. Proc Natl Acad Sci U Organization: Geneva, 2012. 18. Van Der Zanden AGM, Te Koppele-Vije EM, Vijaya Bhanu S A 2005;102:541–6. 34. Ysern P, Clerch B, Castan´o M et al. Induction of SOS genes N et al. Use of DNA extracts from Ziehl–Neelsen-stained in Escherichia coli and mutagenesis in Salmonella slides for molecular detection of rifampin resistance and typhimurium by fluoroquinolones. Mutagenesis 1990;5: spoligotyping of Mycobacterium tuberculosis. J Clin Microbiol 2003;41:1101–8. 19. Streicher EM, Bergval I, Dheda K et al. Mycobacterium tuberculosis population structure determines the outcome of 63–6. 35. Ma Z, Lienhardt C, McIlleron H et al. Global tuberculosis drug development pipeline: the need and the reality. Lancet 2010;375:2100–9. 241 Evolution, Medicine, and Public Health [2013] pp. 241–253 doi:10.1093/emph/eot013 Genetic links between post-reproductive lifespan and family size in Framingham Xiaofei Wang*1, Sean G. Byars2 and Stephen C. Stearns3 1 Department of Statistics, Yale University, New Haven, CT 06520-8102, USA, 2Department of Biology, Copenhagen University, Universitetsparken 15, 2100 Copenhagen, Denmark and 3Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06520-8102, USA *Correspondence address. Department of Statistics, Yale University, New Haven, CT 06520-8102, USA. Tel: 203 432 0666; Fax: 203 432 0633; E-mail: xiaofei.wang@yale.edu Received 7 December 2012; revised version accepted 17 June 2013 ABSTRACT Background and objectives: Is there a trade-off between children ever born (CEB) and post-reproductive lifespan in humans? Here, we report a comprehensive analysis of reproductive trade-offs in the Framingham Heart Study (FHS) dataset using phenotypic and genotypic correlations and a genomewide association study (GWAS) to look for single-nucleotide polymorphisms (SNPs) that are related to the association between CEB and lifespan. Methodology: We calculated the phenotypic and genetic correlations of lifespan with CEB for men and women in the Framingham dataset, and then performed a GWAS to search for SNPs that might affect the relationship between post-reproductive lifespan and CEB. Results: We found significant negative phenotypic correlations between CEB and lifespan in both women (rP = 0.133, P < 0.001) and men (rP = 0. 079, P = 0.036). The genetic correlation was large, highly significant and strongly negative in women (rG = 0.877, P = 0.009) in a model without covariates, but not in men (P = 0.777). The GWAS identified five SNPs associated with the relationship between CEB and post-reproductive lifespan in women; some are near genes that have been linked to cancer. None were identified in men. Conclusions and implications: We identified several SNPs for which the relationship between CEB and post-reproductive lifespan differs by genotype in women in the FHS who were born between 1889 and 1958. That result was not robust to changes in the sample. Further studies on larger samples are needed to validate the antagonistic pleiotropy of these genes. K E Y W O R D S : genome-wide association study; longevity; trade-off; family size ß The Author(s) 2013. Published by Oxford University Press on behalf of the Foundation for Evolution, Medicine, and Public Health. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. orig inal research article 242 | Wang et al. Evolution, Medicine, and Public Health BACKGROUND AND OBJECTIVES Both the theory of life-history evolution and the evolutionary theory of aging assume a trade-off between reproduction and survival: a cost of reproduction paid in lifespan [1–4]. Although well documented in model organisms, the existence of this trade-off in humans has been controversial (e.g. [5]). Negative [6–11], positive [12–17], U-shaped [18–20] and mixed or insignificant [21–27] relationships between completed family size and lifespan have all been found. Some results have been criticized on statistical grounds; some authors doubt that the trade-off exists at all (e.g. [28–32]). Two papers suggest that the cost is only expressed in women of low social class or nutritional status; a similar effect has been found in model organisms [5, 21, 27]. Although most of the attempts to measure the trade-off in humans are based on phenotypic correlations, the standard of evidence for the existence of a trade-off in evolutionary analyses of model organisms is a negative genetic correlation demonstrated as a correlated response to selection (e.g. [5, 33]). Such experiments reveal genetic relationships often hidden by phenotypic plasticity. This standard cannot be met in humans, where experimental evolution is not possible. Two other types of genetic evidence, however, are available in humans. First, genetic correlations can be measured with pedigree analysis using methods developed for animal breeding. Using such ¨gele et al. [34] found a significantly methods, Go ‘positive’ genetic correlation between completed family size and lifespan in a sample of more than 5100 men and women who lived between 1658 and 1907 in South Tyrol, Italy. Second, genome-wide association studies (GWAS) can be done on populations where both the relevant traits and the single-nucleotide polymorphisms (SNPs) have been measured. In a GWAS done on more than 3500 women from Rotterdam, Kuningas et al. [35] found four chromosomal regions that influenced completed family size; none of them appeared also to affect lifespan. The aims of this analysis of men and women in the Framingham Heart Study (FHS) were to add to the genetic information on reproductive trade-offs in humans by (i) first measuring the phenotypic correlation of lifespan with children ever born (CEB), (ii) second estimating the genetic correlation of lifespan with CEB and (iii) performing a GWAS to search for SNPs with effects on the relationship of lifespan to CEB. We found significantly negative phenotypic and genetic correlations between post-reproductive lifespan and CEB in women. We also found five chromosomal regions mediating the trade-off that were genome-wide significant in several statistical models but not when we added smoking as a covariate. Some of the genes in those five regions are associated with increased risk of cancer. METHODOLOGY The Framingham Heart Study Initiated in 1948 in the town of Framingham (MA), the FHS includes three generations of participants that continue to be measured. Beginning with 5209 men and women initially enrolled in the originalcohort, the study added 5124 offspring-cohort participants in 1971 that were mostly offspring of the original-cohort. In 2002, a third-cohort was added consisting of offspring of the second cohort. Original-cohort participants have been examined every 2 years (28 exams in total to date), the offspring-cohort every 4 years (eight exams in total). Participants are mostly of European ancestry (20% UK, 40% Ireland, 10% Italy and 10% Quebec). Data were de-identified by the FHS. Data-use and human subjects’ approval were obtained from the National Institutes of Health (dbGaP) and the Yale Institutional Review Board. Phenotypic correlations Our sample included men and women who were born between the 1890s and the 1950s, except for age at menarche where the available sample was much smaller (i.e. 1923–56). Cox regression was used to calculate risk of death depending on age at first birth (nmen = 2579; nwomen = 2193), CEB (nmen = 3833; nwomen = 3658), and age at menarche (n = 1355) and menopause (n = 2415) in women. In each regression, potentially confounding effects in lifespan were controlled by including education, country of origin and smoking status. To test for potential nonlinear effects, a separate regression was run with a quadratic term included for the main predictor traits. If quadratic terms were significant, this was explored further by examining the Cox Wang et al. | Genetic links in Framingham regression model (from the survival library in R) using penalized splines (with 4 df) [36, 37]. The Cox proportional hazards model is a standard tool for survival analysis, in which the log of the hazard function h(t) is assumed to be a linear combination of the covariates. Specifically, for a model containing p covariates x1 , . . . ,xp ; the fitted model takes the form of hðtÞ ¼ hðt0 Þexpð1 x1+ +p xp Þ, where i is the coefficient fit to covariate xi and hðt0 Þ is the unknown baseline hazard function. Equivalently, this equation can be expressed as hðtÞ ln ¼ 1 x1+ +p xp : hðt0 Þ Note that FHS reports CEB as a value from ‘0’ to ‘5’, where ‘5’ indicates having had five or more children. Several variables were pre-adjusted for age and year measured. For body mass index (BMI), systolic blood pressure (SBP) and total cholesterol, age and year effects were removed by taking residuals of each trait against age (measures between 20 and 60 years old) and year measured using a generalized additive model (locally weighted scatterplot smoothing, LOESS). All residuals for a subject were then averaged to obtain an average residual for each trait, which were then used for modelling. As demonstrated previously, the surface of the generalized additive model can be accurately estimated due to the large number of trait measurements [38]. Our initial sample included 4123 women for whom data on age at death, CEB, education level, smoking history, estrogen use and BMI were available. We then removed 941 women who were born in or after 1941, a period when the correlation between lifespan and CEB was weaker, possibly because of the improvement of health care after World War II. We did so because to have a chance of detecting any significantly correlated SNPs in the GWAS, we needed to focus on a period where the phenotypic correlation is relatively strong. Nineteen women who died before the age of 50 years were also excluded, because their CEB records might represent incomplete observations. Because we excluded women who died before the age of 50 years, we are specifically studying the relationship of CEB to postreproductive mortality. Of the remaining 3163 women, keeping only those who had genotype data reduced our sample size to 1810. We required this sample to have associated genotype data because we later used the same sample for the GWAS. Note that our phenotypic analysis used the year 1919 as a cut-off because the yearly ratio of individuals alive to individuals deceased increased to about 50% in 1919, and continued to rise thereafter. For illustrative purposes, we also ran a multiple linear regression on a smaller sample for women, including only the deceased subjects who were born prior to 1919 (n = 680) out of a total of 1810 who satisfied specific criteria outlined above. We similarly ran a regression model on a smaller sample of men who have died (n = 712) out of a total of 1474 men satisfying similar criteria. Genetic correlations and heritabilities We estimated heritabilities and genetic correlations for traits from pedigrees using a mixed effects restricted maximum likelihood (REML) model in ASReml version 3.0 [39]. We considered models in which there were no covariates as well as adjusted models where phenotypic variation was partitioned into additive genetic, residual variance and a single random effect (maternal ID, paternal ID or education level). To be consistent with the phenotypic correlation models, we also considered models in which fixed effects (smoking status and country of origin) and both random effects for maternal ID and education level were included. Sex was not included as a fixed effect as male and female estimates were obtained separately. Smoking status (0/1, nonsmoker/smoker) and country of origin (0/1, US born/foreign born) were coded as binary variables. Education described number of years completed, with missing values coded as 8 years (the minimum). Maternal variance components ranged from 0.0 (age at first birth) to 0.12 ± 0.04 (lifespan) and 0.0 (age at first birth) to 0.20 ± 0.03 (lifespan) for female and male analyses, respectively. Education variance components ranged from 0.0 (age at menarche) to 0.06 ± 0.03 (CEB) and 0.0 (age at first birth) to 0.014 ± 0.009 (CEB) for female and male analyses, respectively. The Framingham pedigree totals 15 877 individuals in 1538 pedigrees consisting of both immediate and extended family. Heritability estimates were tested for significance with likelihood ratios that compared full models with reduced ones (i.e. 21DF = 2 (LogLFULL LogLREDUCED)) lacking the additive genetic component. Genetic correlations were also tested for significance by 243 244 | Wang et al. Evolution, Medicine, and Public Health comparing likelihood values from full models to ones where the genetic covariance was fixed at zero. Our genetic correlation analysis between CEB and lifespan included a total of 5133 females for whom age at death and CEB information were available. Supplementary Fig. S4 summarizes the pedigree information for these women, grouped by cohort via the ‘pedantics’ package in R [40]. Pedigree depths (computed using the same package) for the Framingham dataset range from 0 to 4, with mean 1.02 (±1.06). On average, each woman had 2.38 (±1.59) children in her lifetime and lived 77.21 (±12.73) years. The average level of education in years was 11.66. The average age at menarche was 12.81 (±1.54), average age at first birth was 26.49 (±4.81) and average age at menopause was 49.20 (±4.10). Genome-wide association study Our association results are based on 444 205 SNPs from the 500 K and 50 K Affymetrix samples that satisfied the following criteria: call rate >90%, Hardy–Weinberg equilibrium P-value >0.00001, Mendel error rate <2% and minor allele frequency >0.01. These SNP selection criteria are further discussed in the Supplementary Information. We used Cox proportional hazards models, as done in the phenotypic correlation analysis, to estimate the interactions between survival time past age 50 years, CEB and genotype. For censored individuals, we used their times of last observation past age 50 years as their censoring time. Several models were run under this setup, which we number to emphasize that they are nested models. Model 1 did not adjust for any covariates. We then added covariates to reduce confounding by variables that may be correlated with lifespan and CEB. Model 2 used education level. Model 3 further added BMI, estrogen use and cohort as covariates. Models 4a–d were intermediate steps in which one of the four additional covariates was added: blood pressure treatment indicator (Model 4a), total cholesterol (Model 4b), SBP (Model 4c) and smoking indicator (Model 4d). Model 5 included all four of these additional covariates. Models 4a–d were run retrospectively to pinpoint which covariate, when added, resulted in removing significance from all SNPs. A summary of the models fitted can be found in the Supplementary Information. Both genotypes and CEB were included as continuous variables to model an additive effect of the minor allele. We used both the raw genotypes provided by FHS as well as an imputed dataset. The imputation was done in several stages. First, we incorporated values imputed by MACH that were included in the FHS dataset. The MACH algorithm imputes missing genotypes based on shared haplotype stretches between subjects and HapMap data [41]. Of the remaining missing values, we sampled among the possible genotypes given the genotypes of parents, when parent genotypes were available. Any remaining missing values were simply sampled according to genotype proportions of the entire group. This sequence of operations created a full set of genotypes that had no missing values. Cohort was defined as a categorical variable computed from the year of birth: born before or in 1917 and born in or after 1918. In addition to running the above five models on the full sample of 1810, we tested our models for robustness by mimicking an out-of-sample analysis. To that end, we randomly divided our sample into two equal parts and fitted Models 1–5 to each part separately to check for consistency in significance of the top performing SNPs. A true out-of-sample performance check would include the calculation of prediction error based on a model fitted on a training set. Our method does not aim to validate prediction out of sample, but rather to ensure that a SNP discovered to be significant in one sample ought to be significant in another sample—a less stringent, but still important requirement of consistency. To minimize the effects of missing genotypes on each subsample, which would further lower our sample size in each of the two separate runs, we only used the imputed genotypes for this portion of our analysis. The downside of using imputed genotypes is the risk of imputation error. To verify that our risk of imputation error is low, we used the imputed SNP data to repeat our full-sample analyses for Models 1–5. Our aim was to show that our results for these models are similar, regardless of whether we used imputed or raw SNP data. To explore possible non-additive genotypic effects, we ran a separate Model 6 that used genotype as a categorical variable. The covariates used in Model 6 are identical to those used in Model 3, and any SNPs for which the homozygous minor genotype had fewer than 20 counts were excluded. We did not apply the half-sample testing to Model 6, because in many cases, the genotype counts in the homozygous minor allele category were too small to Wang et al. | Genetic links in Framingham further subdivide the group for categorical modelling. Finally, we ran two additional models that are outside of the nested framework given above on the raw data only (and therefore, they are not numbered). A quadratic model was run to search for a possible nonlinear effect by adding a quadratic CEB term along with its interaction with genotype to Model 1. The ‘matching covariates’ model was run to provide a frame of reference to the reader; this model uses exactly the same covariates that were included in the phenotypic and genotypic correlation analyses—education, smoking indicator and country of origin. RESULTS Phenotypic correlations In the Cox regression analysis where as many men and women were included as possible (birth-year range 1889–1958), censoring was used to account for those who were still alive according to the latest medical records. Risk of mortality beyond age 50 years increased if women (adjusted incidence rate ratio (RR) = 1.045, P = 0.030) had more children (Table 1). When a nonlinear term for CEB was included, it significantly improved the model fit and became more significant than the linear term. 245 Penalized splines for unadjusted mortality risk (Fig. 1) support a predominantly U-shaped pattern for the association between CEB and lifespan, similar to that found in some other studies (e.g. [19]). This is consistent with a cost of reproduction that is experienced by women with three or more children and with a benefit of reproduction to those who have one or two children. Highest mortality risk occurred in women with no children or more than three to four children, with lowest risk for those with approximately two. Mortality risk decreased if the first child was born later (women, unadjusted RR = 0.971, P < 0.001; men, adjusted RR = 0.985, P = 0.011; see Supplementary Fig. S1), but the significance of this effect depended on whether estimates were adjusted or not (Table 1). Mortality risk was also reduced if menopause occurred later in women (unadjusted RR = 0.970, P = 0.003), although this effect disappeared when other effects were controlled for (Table 1). Full model results can be seen in Supplementary Table S1. In the analysis where only the 680 women were included in the range of birth years 1889–1918 in which all had died, the phenotypic correlation between CEB and lifespan was highly significant and negative (r = 0.133, P = 0.0005; Fig. 2). Linear regression indicated that every additional child cost 0.74 years of lifespan (standard error (SE) = 0.21 years). There was, however, significant variation in Table 1. Incidence RR (±95% confidence interval) for age at death due to stroke, heart attack or cancer (beyond age 50 years) Trait Women Men Unadjusted Adjusted Unadjusted Adjusted CEB 1.050* (1.011–1.092)NL** n = 3729 1.045* (1.005–1.087)NL*** 0.995 (0.960–1.033) n = 3888 1.031 (0.993–1.071) Age first birth 0.971*** (0.955–0.988)NL** n = 2236 0.977* (0.960–0.994)NL* 0.990 (0.979–1.001) n = 2613 0.985** (0.974–0.995) Menarche 0.891 (0.757–1.050) n = 1367 0.917 (0.782–1.077) Menopause 0.970** (0.951–0.990) n = 2461 0.984 (0.965–1.005) Unadjusted Cox regression estimates included only the main predictor trait. Cultural effects (smoking, education and country-of-origin) were accounted for in adjusted estimates. ‘NL’ indicates that a significant nonlinear effect was also detected for the association between this trait and longevity. *P < 0.05, **P < 0.01, ***P < 0.001. 246 | Wang et al. Evolution, Medicine, and Public Health Figure 1. Summary of CEB and mortality risk in Framingham women. A histogram of CEB and log-relative mortality risk values for each CEB value with 95% confidence bands Figure 3. Correlation between CEB and lifespan by birth year (n = 5133) for women. Women (n = 680) were grouped by overlapping 10-year intervals of birth year, and the correlation between CEB and lifespan was computed for each group. Individual points indicate the sample size of each 10-year group, with the mean birth year plotted on the x-axis and correlation plotted on the y-axis phenotypic correlations are dependent on birth year is consistent with previous findings that selection pressures changed over time in Framingham [38]. Heritabilities and genetic correlations Figure 2. Relationship between CEB and lifespan for women. Scatterplot illustrating correlation between CEB and lifespan (r = 0.133, P < 0.001) (n = 680). Both variables have been jittered to minimize overlap of points the phenotypic correlation by birth year (Fig. 3); it was positive (with one exception) from 1893 to 1907 and negative from 1908 to 1913. Many in the earlier group were giving birth before the Great Depression and World War II. Some of the latter group encountered those two major environmental perturbations. The correlation between CEB and lifespan for the 712 men was slightly negative (r = 0.079, P = 0.0355; Supplementary Fig. S2). An additional child cost 0.54 years of male lifespan (SE = 0.26 years). Again, the correlation varied by birth year, but the variations were less pronounced than for females (Supplementary Fig. S3). The observation that In women (Table 2), the heritabilities of most major life-history traits differed significantly from zero, including age at death (h2 = 0.12, P = 0.01), CEB (h2 = 0.09, P = 0.03), age at first birth (h2 = 0.18, P < 0.001) and menopause (h2 = 0.44, P < 0.001). In women, the genetic correlation of CEB with age at death was large, negative and significant (rG = 0.88, P = 0.01) in a model without covariates (Supplementary Table S2). When we included education as a random effect, the genetic correlation decreased to 0.70 but was still significant (P = 0.02). When we included either the mother or the father identifiers in place of education as a random effect, the genetic covariance remained large and negative, but was no longer significant (mother: rG = 1.58, P = 0.11; father: rG = 1.46, P = 0.15). The model in which we adjusted for education, smoking status and country of origin also produced a large negative genetic correlation, but the correlation was not significant (rG = 0.69, P = 0.14). The correlation between the quadratic term CEB2 and lifespan was large, negative and significant in Wang et al. | Genetic links in Framingham 247 Table 2. Heritabilities (h2, on the diagonal) and genetic correlations (rG, off the diagonal) of life history traits (±SE) Women Age at death Age at death CEB Age first birth Menarche Menopause 0.12 ± 0.08 P = 0.0176 n = 3010 0.69 ± 0.52 P = 0.1420 0.20 ± 0.25 P = 0.2083 0.07 ± 0.23 P = 0.3886 0.15 ± 0.17 P = 0.1917 0.09 ± 0.05 P = 0.0394 n = 4123 0.40 ± 0.35 P = 0.1545 0.31 ± 0.24 P < 0.0001 0.21 ± 0.21 P = 0.1377 0.18 ± 0.06 P = 0.0008 n = 2912 0.38 ± 0.33 P = 0.0911 0.06 ± 0.14 P = 0.3541 0.16 ± 0.13 P = 0.0948 n = 1638 0.10 ± 0.21 P = 0.3121 CEB Age first birth Menarche Menopause Men Age at death CEB 0.44 ± 0.06 P < 0.0001 n = 3400 <0.01 ± <0.01 P = 0.8875 n = 2963 <0.01 ± <0.01 P = 0.7773 <0.01 ± <0.01 P = 0.6101 <0.01 ± <0.01 P = 0.5485 n = 4051 <0.01 ± <0.01 P = 0.3884 Age first birth 0.12 ± 0.07 P = 0.0300 n = 2688 SEs and P-values were obtained from maximum-likelihood estimates. Cultural (smoking, education and country-of-origin) and maternal effects were accounted for in all estimates. P-values < 0.05 are in bold. three of four models (no covariates: rG = 1.09, P = 0.003, only mother identifier as random effect: rG = 1.73, P = 0.04, only education as random effect: rG = 0.85, P = 0.01), and borderline non-significant in the model with only the father identifier (rG = 1.61, P = 0.06). Furthermore, we looked to see if the genetic correlation between CEB and lifespan was robust to pedigree depth in the simplest model where no covariates were included. Including only those women with pedigree depth of 1 or higher (n = 2540), we got rG = 0.46 (P = 0.14) and including only those women with pedigree depth of 2 or higher (n = 948), we got rG = 0.21 (P = 0.60); both correlations were no longer significant in the reduced samples. The genetic correlation of CEB with age at menarche was relatively large, positive and highly significant (rG = 0.31, P < 0.001). In men (Table 2), the heritability of age at first birth (inferred from their spouses) was small and only just significant (h2 = 0.12, P = 0.03). All other male heritability and genetic correlation estimates were non-significant. Full model results for heritability can be seen in Supplementary Table S2. Genome-wide association study GWAS results are summarized in Tables 3–10; the birth years for the 1810 women included in the GWAS are shown in the Supplementary Information. We deemed a SNP to be genome-wide significant if its interaction coefficient with CEB had a P-value that was less than a Bonferroni-adjusted threshold of 1.13 107 ( = 0.05), unless otherwise indicated. For females, we found two SNPs that attained genome-wide significance using the full 248 | Wang et al. Evolution, Medicine, and Public Health Table 3. GWAS for SNPs that affect the relationship between CEB and lifespan: summary of significant SNPs in Models 1–3 and 5 (full sample) Ssid Rsid Chr Position P-values (genotype CEB) Near Model 1 ss66450977 rs6768456 3 ss66475987 rs2575533 4 Model 2 Model 3 Model 4 Model 5 Matching covariates 7.99E07 4.93E08a 27867272 EOMES 4.03E10a 4.38E10a 8.40E09a (see Table 4) 42432336 ATP8A1 8.02E08a 5.30E08a 3.06E06 2.49E05 2.11E07 n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3. a SNP attained genome-wide significance. Table 4. GWAS for SNPs that affect the relationship between CEB and lifespan: summary of significant SNPs in Models 4 (full sample) Ssid Rsid Chr Position P-values (genotype CEB) Near Model 4a ss66450977 ss66475987 rs6768456 rs2575533 3 4 27867272 42432336 EOMES ATP8A1 1.40E09 1.02E05 Model 4b a a 7.44E09 3.56E06 Model 4c 8.65E09 5.23E06 Model 4d a 4.02E07 1.35E05 n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3. a SNP attained genome-wide significance. Table 5. GWAS for SNPs that affect the relationship between CEB and lifespan: summary of nominally significant SNPs in Model 6 Ssid Rsid Chr Position Near P-value Aa CEB P-value aa CEB Homozygous minor genotype count ss66450977 ss66500131 ss66392234 ss66495977 rs6768456 rs1777023 rs7132724 rs2180957 3 9 12 14 27867272 92008266 65001044 68238574 EOMES OR7E31P HELB RAD51B 1.00E07 1.00E01 1.30E01 1.20E01 2.40E03 3.00E07 9.60E08 8.70E07 21 26 102 21 n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3. Table 6. GWAS for SNPs that affect the relationship between CEB and lifespan: re-evaluating significant SNPs in Models 1–3 and 5 (split samples) Ssid ss66450977 ss66475987 Sample half 1 Sample half 2 P-values (genotype CEB) P-values (genotype CEB) Model 1 Model 2 Model 3 Model 5 Model 1 Model 2 Model 3 Model 5 0.00032 0.0002 0.00041 0.00012 0.00097 0.0021 0.007 0.001 9.39E08a 5.46E04 7.04E08a 4.46E04 1.36E06 1.56E03 4.58E06 1.39E02 n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3. a SNP attained genome-wide significance. Wang et al. | Genetic links in Framingham 249 Table 7. GWAS for SNPs that affect the relationship between CEB and lifespan: re-evaluating significant SNPs in Models 4a–d (split samples) Ssid ss66450977 ss66475987 Sample half 1 Sample half 2 P-values (genotype CEB) P-values (genotype CEB) Model 4a Model 4b Model 4c Model 4d Model 4a Model 4b Model 4c Model 4d 8.40E04 3.00E03 1.30E03 9.40E04 8.00E04 1.80E03 7.40E03 3.70E03 3.33E06 2.00E03 1.19E06 2.30E03 1.32E06 3.80E03 3.35E06 3.40E03 n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3. Table 8. GWAS for SNPs that affect the relationship between CEB and lifespan: top SNPs in Model 5 (split sample) Ssid Rsid ss66092635 ss66508254 ss66392234 ss66328248 ss66531142 ss74823403 ss66231005 ss66273879 ss66526690 ss66490007 Chr rs6581676 rs2961258 rs7132724 rs13248967 rs11219832 rs7860830 rs10899741 rs1728810 rs1602160 rs11009744 P-values (genotype CEB) Position 12 7 12 8 11 9 7 3 6 10 64992353 15150223 65001044 114920075 124272500 26882137 52215028 10992443 94277193 34675601 Sample 1 Sample 2 9.12E06 1.41E05 1.82E05 2.81E05 3.65E05 3.27E01 4.62E01 4.15E01 9.00E01 9.86E01 4.58E01 7.86E01 4.86E01 6.86E01 1.79E01 7.19E10a 9.84E08a 1.07E07a 1.57E07 2.37E07 n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3. a SNP attained genome-wide significance. Table 9. GWAS for SNPs that affect the relationship between CEB and lifespan: summary of significant SNPs in Models 1–3 and 5 (full sample) (imputed SNPs) Ssid ss66450977 ss66475987 Rsid rs6768456 rs2575533 Chr 3 4 Position 27867272 42432336 P-values (genotype CEB) Near EOMES ATP8A1 Model 1 Model 2 Model 3 Model 4 Model 5 2.91E10a 1.50E07 2.20E10a 6.57E08a 6.44E09a 5.03E06 (see Table 10) 5.56E07 2.94E05 n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3. a SNP attained genome-wide significance. sample: ss66450977 on Chromosome 3 (close to EOMES) and ss66475987 on Chromosome 4 (close to ATP8A1). Their levels of significance decreased as additional covariates were included in the model; however, these SNPs were also significant in the matching covariates model (Tables 3 and 4). We also found two nominally significant SNPs that exhibited possibly non-additive effects: ss66392234 250 | Wang et al. Evolution, Medicine, and Public Health Table 10. GWAS for SNPs that affect the relationship between CEB and lifespan: summary of significant SNPs in Model 4 (full sample) (imputed SNPs) Ssid Rsid Chr Position P-values (genotype CEB) Near Model 4a ss66450977 ss66475987 rs6768456 rs2575533 3 4 27867272 42432336 EOMES ATP8A1 1.40E08 1.02E05 Model 4b a a 6.30E09 5.80E06 Model 4c 4.30E09 5.40E06 Model 4d a 3.87E07 2.30E05 n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3. a SNP attained genome-wide significance. on Chromosome 12 (in HELB) and ss66500131 on Chromosome 9 (close to the pseudogene OR7E31P) (Table 5). Nearby genes/pseudogenes were determined based on a radius of 150 kb from each SNP. In the split-sample analysis using imputed SNP data (see ‘Methodology’ section regarding details on imputation), no SNPs were found to be significant for females (Tables 6–8), even when the randomization used in the split-sample assignment was replicated 100 times. We verified that using the imputed data for the full-sample analysis would have yielded comparable levels of significance for the two SNPs previously discovered in Models 1–5 (Tables 9 and 10). No significant SNPs were detected for males in Models 1–3. As in the GWAS for females, the addition of more covariates decreased levels of significance, and therefore no further models were run. No significant SNPs were detected in a model that included a quadratic effect of CEB. Further details on the GWAS for females are in the Supplementary Information. CONCLUSIONS AND IMPLICATIONS Phenotypic and genetic correlations The phenotypic correlation between CEB and lifespan in women differed with birth year, demonstrating the importance of phenotypic plasticity on the relationships among life-history traits. Secular cultural and environmental changes affect that correlation and probably account for much of the variation among studies [6, 15, 19, 21, 22]. The estimate of a negative genetic correlation in women when not accounting for covariates (rG = 0.88) was large. The effects of shared environment reduced the strength of the linear correlation and increased the strength of the quadratic correlation, and education mimicked the effects of a cost of reproduction in that increased level of education was associated with both fewer children and longer life: including education decreased the estimate of the genetic correlation. Some of our genetic correlation estimates were below 1. This indicates that the estimated variance component is negative, known to be a possible result of REML estimation [42]. When we controlled for the effects of smoking, education, country of origin and maternal effects, the correlation was still negative (rG = 0.69) yet no longer significant. This mirrors the pattern we observed in the GWAS; as covariates were introduced into the model, associations became insignificant. The mean pedigree depth of 1.02 implies that our pedigree is dominated by parent–offspring relationships. This may result in some difficulty distinguishing parental, environmental and additive genetic effects. For example, cultural and lifestyle habits that are unique to nuclear families (such as diet) are known to affect lifespan, but these habits are not recorded, and therefore the genetic correlations that we see may be confounded by these unobservable factors. One can only find a genetic correlation when the phenotypic correlation is significant, and one can only find significant effects of SNPs on a phenotypic correlation when it differs from zero. Our chain of inference thus depends on genetic effects not being too masked by phenotypic plasticity. Gene functions We found several SNPs with nominally significant effects on the correlation of CEB with post-reproductive lifespan; two of them are near EOMES and RAD51B, genes that are related to cancer when under-expressed. The effect of the SNP close to EOMES reached genome-wide significance. The Wang et al. | Genetic links in Framingham EOMES gene has been associated with multiple sclerosis and bladder cancer [43, 44]. RAD51B, a gene involved in encoding proteins that participate in DNA repair, has been linked to breast cancer and brain cancer [45–48]. Further details on the genes in proximity to the SNPs found significant in our GWAS are included in the Supplementary Information. Although these SNPs were close in physical distance to their respective genes (<130 kb), further study of linkage disequilibrium would help to understand their possible association. Other studies Voorhuis et al. [49] collated the results of many genetic studies of age at natural menopause. None of the SNPs that we discovered were found in the studies included in their summary. Several other recent genetic studies relate fertility to genotype. Kosova et al. [50] found 41 SNPs (P < 104) that were associated with decreased male fertility. Adachi et al. [51] found 36 SNPs (P < 104) with possible links to endometriosis in Japanese females. Both were GWAS studies that did not find any genome-wide significant SNPs. Murray et al. [52] reported confirmations for four SNPs previously identified as associated with age at menopause. Ewens et al. [53] examined 15 SNPs linked with obesity to evaluate possible associations with polycystic ovary syndrome, the cause of a form of infertility in women; only one SNP had a nominal level of significance, and the significance did not hold up in another case–control study. Our methods differ fundamentally from these four studies in that we considered lifespan in conjunction with fertility, and the significant SNPs we found were not reported in their analyses [50–53]. Although the Kuningas Rotterdam study incorporated mortality in its analysis and was therefore more similar to our study [35], it differs from our approach in three ways: (i) our analysis included many more SNPs (444 205 versus their 1664), (ii) we adjusted for the effects of several direct mortality-affecting covariates such as smoking and SBP, (iii) Kuningas used an initial screening of the 1664 SNPs with a set-based test (with a threshold of P < 0.05), whereas we started with a GWAS across 444 205 SNPs in models that relate each SNP to both CEB and lifespan (with a threshold of P < 1.13 107). We did not find Bonferroni-level significance with SNPs near the four gene regions identified in [35]. Summary We have analysed phenotypic and genetic correlations between reproductive success and survival and have identified a small set of genes that may mediate a trade-off between them. This warrants further studies in other samples. The Framingham dataset has some shortcomings. In particular, women born before the start of the study would only have been included in the study if they survived until 1948–52 (when the study began). Therefore, our dataset does not include anyone who died during World War I, the 1918 flu pandemic, the Great Depression and World War II. If these catastrophic events affected women differently depending on their fertility and lifespan, then excluding these women from our analysis would bias our results. The issue is inherent in such observational studies of humans, and unfortunately cannot be avoided. We failed to find any significant SNPs when covariates (i.e. smoking, country of origin and average cholesterol levels) were included and when we did a rough check for consistency out of sample. It is unknown how often such checks modify significance of SNP associations, for many other published GWAS studies do not account for the effects of covariates or do out-of-sample predictions. AUTHOR CONTRIBUTIONS S.G.B. and X.W. jointly worked on processing and cleaning the data and phenotypic correlation calculations. S.G.B. further calculated the genetic correlations and heritabilities. X.W. performed the GWAS. S.C.S. conceived of the study and drafted the initial manuscript. All authors contributed to the final manuscript. supplementary data Supplementary data is available at EMPH online. acknowledgements The authors thank Drs John W. Emerson and Andrew Pakstis for their feedback and insight on the project and the three anonymous reviewers for their constructive feedback. 251 252 | Wang et al. Evolution, Medicine, and Public Health funding 16. Muller HG, Chiou JM, Carey JR et al. Fertility and life span: late children enhance female longevity. J Gerontol Ser A The study was supported by the Yale University and the Marie Curie International Incoming Fellowship FP7-PEOPLE-2010IIF-276565. Conflict of interest: None declared. Biol Sci Med Sci 2002;57:B202–6. 17. Sear R. The impact of reproduction on Gambian women: does controlling for phenotypic quality reveal costs of reproduction? Am J Phys Anthropol 2007;132: 632–41. 18. Lawlor DA, Emberson JR, Ebrahim S et al. Is the references association between parity and coronary heart disease due to biological effects of pregnancy or adverse lifestyle 1. Williams G. Pleiotropy, natural selection, and the evolution of senescence. Evolution 1957;11:398–411. 2. Williams G. Natural selection, costs of reproduction, and a refinement of Lack’s principle. Am Nat 1966;100: 687–90. 3. Roff D. The Evolution of Life Histories. New York: Chapman and Hall, 1992. 4. Stearns SC. The Evolution of Life Histories. Oxford University Press: Oxford, 1992, 249. 5. Stearns SC, Partridge L (2001) The genetics of aging in Drosophila. In: Masoro EJ, Austad SN (eds), Handbook of the Bioloy of Aging, 5th edn. 2001, 353–68. 6. Doblhammer G, Oeppen J. Reproduction and longevity among the British peerage: the effect of frailty and health selection. Proc Biol Sci 2003;270:1541–7. risk factors associated with child-rearing? Findings from the British Women’ Heart and Health Study and the British Regional Heart Study. Circulation 2003;107: 1260–4. 19. Lund EE, Arnesen EE, Borgan JKJ. Pattern of childbearing and mortality in married women—a national prospective study from Norway. J Epidemiol Commun Health 1990;44: 237–40. 20. Manor OO, Eisenbach ZZ, Israeli AA et al. Mortality differentials among women: the Israel Longitudinal Mortality Study. Soc Sci Med (1967) 2000;51:1175–88. 21. Dribe M. Long-term effects of childbearing on mortality: evidence from pre-industrial Sweden. Popul Stud 2004;58: 297–310. 22. Gavrilova N, Gavrilov L, Semyonova VG et al. Does excep 7. Gagnon A, Smith KR, Tremblay M et al. Is there a trade- tional human longevity come with a high cost of infertility? off between fertility and longevity? A comparative Testing the evolutionary theories of aging. Ann N Y Acad study of women from three large historical databases accounting for mortality selection. Am J Hum Biol 2009; 21:533–40. 8. Maklakov AA. Sex difference in life span affected by female birth rate in modern humans. Evol Hum Behav 2008;29: 444–9. 9. Tabatabaie V, Atzmon G, Rajpathak SN et al. Exceptional longevity is associated with decreased reproduction. Aging 2011;3:1202–5. 10. Thomas F, Teriokhin A, Renaud F et al. Human longevity at the cost of reproductive success: evidence from global data. J Evol Biol 2000;13:409–14. 11. Westendorp R, Kirkwood T. Human longevity at the cost of reproductive success. Nature 1998;396:743–6. 12. Fuster V. Widowhood, illegitimacy, marital reproduction and female longevity in a rural Spanish population. Homo 2011;62:500–9. 13. Helle S, Lummaa V, Jokela J. Are reproductive and somatic senescence coupled in humans? Late, but not early, Sci 2004;1019:513–7. ¨¨ 23. Helle S, Ka ar P, Jokela J. Human longevity and early reproduction in pre-industrial Sami populations. J Evol Biol 2002;15:803–7. 24. Jacobsen BK, Knutsen SF, Oda K et al. Parity and total, ischemic heart disease and stroke mortality. The Adventist Health Study, 1976–1988. Eur J Epidemiol 2011;26:711–8. 25. Jasienska G, Nenko I, Jasienski M. Daughters increase longevity of fathers, but daughters and sons equally reduce longevity of mothers. Am J Hum Biol 2006;18: 422–5. 26. Korpelainen H. Fitness, reproduction and longevity among European aristocratic and rural Finnish families in the 1700s and 1800s. Proc Biol Sci 2000;267: 1765–70. 27. Lycett JE, Dunbar RIM, Voland E. Longevity and the costs of reproduction in a historical human population. Proc Biol Sci 2000;267:31–5. reproduction correlated with longevity in historical sami 28. Cesarini D, Lindqvist E, Wallace B. Maternal longevity and women. Proc Biol Sci 2005;272:29–37. 14. Le Bourg E, Thon B, Le´gare´ J et al. Reproductive life of the sex of offspring in pre-industrial Sweden. Ann Hum French–Canadians in the 17–18th centuries: a search for 29. Gavrilov L, Gavrilova N. Is there a reproductive cost for a trade-off between early fecundity and longevity. Exp Gerontol 1993;28:217–32. 15. McArdle P, Pollin T, O’Connell J et al. Does having children Biol 2007;34:535–46. human longevity? J Anti-Aging Med 1999;2:121–3. 30. Jasienska G. Reproduction and lifespan: trade-offs, overall energy budgets, intergenerational costs, and extend life span? A genealogical study of parity and lon- costs neglected by research. Am J Hum Biol 2009;21:524–32. gevity in the Amish. J Gerontol Ser A Biol Sci Med Sci 2006; 31. Mitteldorf J. Female fertility and longevity. Age 2010;32: 61:190–5. 79–84. Wang et al. | Genetic links in Framingham 32. Le Bourg E. Does reproduction decrease longevity in human beings? Ageing Res Rev 2007;6:141–9. application of these as urinary tumor markers. Clin Cancer Res 2011;17:5582–92. 33. Stearns S, Ackermann M, Doebeli M et al. Experimental 44. Patsopoulos NA, de Bakker PIW. Genome-wide meta- evolution of aging, growth, and reproduction in fruitflies. analysis identifies novel multiple sclerosis susceptibility Proc Natl Acad Sci USA 2000;97:3309–13. ¨gele M, Pattaro C, Fuchsberger C et al. Heritability 34. Go analysis of life span in a semi-isolated population followed 45. Liu Y, Shete S, Wang LE et al. Gamma-radiation sensitivity and polymorphisms in RAD51L1 modulate glioma risk. across four centuries reveals the presence of pleiotropy between life span and reproduction. J Gerontol Ser A Biol Sci Med Sci 2011;66:26–37. loci. Ann Neurol 2011;70:897–912. Carcinogenesis 2010;31:1762–9. 46. Figueroa JD, Garcia-Closas M, Humphreys M et al. Associations of common variants at 1p11. 2 and 14q24. 1 ¨e S, Uitterlinden AG et al. The relation35. Kuningas M, Altma (RAD51L1) with breast cancer risk and heterogeneity ship between fertility and lifespan in humans. Age 2011;33: by tumor subtype: findings from the Breast Cancer Association Consortium. Hum Mol Genet 2011;20:4693–706. 615–22. 36. R Core Team. (2009) R: A Language and Environment for Statistical Computing. R Foundation for Statistical 47. Shu XO, Long J, Lu W et al. Novel genetic markers of breast cancer survival identified by a genome-wide association 37. Therneau T. (2013) A Package for Survival Analysis in S. study. Cancer Res 2012;72:1182–9. ¨stro ¨m S, Henriksson R et al. DNA-repair 48. Wibom C, Sjo R package version 2.37–4. http://CRAN.R-project.org/ gene variants are associated with glioblastoma survival. Computing: Vienna, Austria, 2009. Acta Oncol 2012;51:325–32. package=survival. al. 49. Voorhuis M, Onland-Moret NC, van der Schouw YT et al. Colloquium papers: natural selection in a contemporary Human studies on genetics of the age at natural meno- human population. Proc Natl Acad Sci 2010;107(Suppl. 1), 1787–92. pause: a systematic review. Hum Reprod Update 2010;16: 364–77. 39. Gilmour A, Gogel B, Cullis B. ASReml user guide release 3.0. 50. Kosova G, Scott NM, Niederberger C et al. Genome- 38. Byars SG, Ewbank D, Govindaraju DR et wide association study identifies candidate genes for VSN International Ltd, 2002. 40. Morrissey MB, Wilson A. Pedantics: an R package for pedigree-based genetic simulation and pedigree manipulation, characterization and viewing. Molecular Ecology Resources 2010;10:711–9. 41. Li Y, Abecasis GR. Mach 1.0: rapid haplotype reconstruction and missing genotype inference. Am J Hum Genet 2006;S79:2290. male fertility traits in humans. Am J Hum Genet 2012;90: 950–61. 51. Adachi S, Tajima A, Quan J et al. Meta-analysis of genomewide association scans for genetic susceptibility to endometriosis in Japanese population. J Hum Genet 2010;55: 816–21. 52. Murray A, Bennett CE, Perry JRB et al. Common genetic 42. Thompson WA Jr. The problem of negative esti- variants are significant risk factors for early menopause: mates of variance components. Ann Math Stat 1962;33: results from the Breakthrough Generations Study. Hum 273–89. Mol Genet 2010;20:186–92. 43. Reinert T, Modin C, Castano FM et al. Comprehensive 53. Ewens KG, Jones MR, Ankener W et al. FTO and MC4R genome methylation analysis in bladder cancer: identifi- gene variants are associated with obesity in polycystic cation and validation of novel methylated genes and ovary syndrome. PLoS One 2011;6:e16390. 253 241 Evolution, Medicine, and Public Health [2013] pp. 241–253 doi:10.1093/emph/eot013 Genetic links between post-reproductive lifespan and family size in Framingham Xiaofei Wang*1, Sean G. Byars2 and Stephen C. Stearns3 1 Department of Statistics, Yale University, New Haven, CT 06520-8102, USA, 2Department of Biology, Copenhagen University, Universitetsparken 15, 2100 Copenhagen, Denmark and 3Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06520-8102, USA *Correspondence address. Department of Statistics, Yale University, New Haven, CT 06520-8102, USA. Tel: 203 432 0666; Fax: 203 432 0633; E-mail: xiaofei.wang@yale.edu Received 7 December 2012; revised version accepted 17 June 2013 ABSTRACT Background and objectives: Is there a trade-off between children ever born (CEB) and post-reproductive lifespan in humans? Here, we report a comprehensive analysis of reproductive trade-offs in the Framingham Heart Study (FHS) dataset using phenotypic and genotypic correlations and a genomewide association study (GWAS) to look for single-nucleotide polymorphisms (SNPs) that are related to the association between CEB and lifespan. Methodology: We calculated the phenotypic and genetic correlations of lifespan with CEB for men and women in the Framingham dataset, and then performed a GWAS to search for SNPs that might affect the relationship between post-reproductive lifespan and CEB. Results: We found significant negative phenotypic correlations between CEB and lifespan in both women (rP = 0.133, P < 0.001) and men (rP = 0. 079, P = 0.036). The genetic correlation was large, highly significant and strongly negative in women (rG = 0.877, P = 0.009) in a model without covariates, but not in men (P = 0.777). The GWAS identified five SNPs associated with the relationship between CEB and post-reproductive lifespan in women; some are near genes that have been linked to cancer. None were identified in men. Conclusions and implications: We identified several SNPs for which the relationship between CEB and post-reproductive lifespan differs by genotype in women in the FHS who were born between 1889 and 1958. That result was not robust to changes in the sample. Further studies on larger samples are needed to validate the antagonistic pleiotropy of these genes. K E Y W O R D S : genome-wide association study; longevity; trade-off; family size ß The Author(s) 2013. Published by Oxford University Press on behalf of the Foundation for Evolution, Medicine, and Public Health. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. orig inal research article 242 | Wang et al. Evolution, Medicine, and Public Health BACKGROUND AND OBJECTIVES Both the theory of life-history evolution and the evolutionary theory of aging assume a trade-off between reproduction and survival: a cost of reproduction paid in lifespan [1–4]. Although well documented in model organisms, the existence of this trade-off in humans has been controversial (e.g. [5]). Negative [6–11], positive [12–17], U-shaped [18–20] and mixed or insignificant [21–27] relationships between completed family size and lifespan have all been found. Some results have been criticized on statistical grounds; some authors doubt that the trade-off exists at all (e.g. [28–32]). Two papers suggest that the cost is only expressed in women of low social class or nutritional status; a similar effect has been found in model organisms [5, 21, 27]. Although most of the attempts to measure the trade-off in humans are based on phenotypic correlations, the standard of evidence for the existence of a trade-off in evolutionary analyses of model organisms is a negative genetic correlation demonstrated as a correlated response to selection (e.g. [5, 33]). Such experiments reveal genetic relationships often hidden by phenotypic plasticity. This standard cannot be met in humans, where experimental evolution is not possible. Two other types of genetic evidence, however, are available in humans. First, genetic correlations can be measured with pedigree analysis using methods developed for animal breeding. Using such ¨gele et al. [34] found a significantly methods, Go ‘positive’ genetic correlation between completed family size and lifespan in a sample of more than 5100 men and women who lived between 1658 and 1907 in South Tyrol, Italy. Second, genome-wide association studies (GWAS) can be done on populations where both the relevant traits and the single-nucleotide polymorphisms (SNPs) have been measured. In a GWAS done on more than 3500 women from Rotterdam, Kuningas et al. [35] found four chromosomal regions that influenced completed family size; none of them appeared also to affect lifespan. The aims of this analysis of men and women in the Framingham Heart Study (FHS) were to add to the genetic information on reproductive trade-offs in humans by (i) first measuring the phenotypic correlation of lifespan with children ever born (CEB), (ii) second estimating the genetic correlation of lifespan with CEB and (iii) performing a GWAS to search for SNPs with effects on the relationship of lifespan to CEB. We found significantly negative phenotypic and genetic correlations between post-reproductive lifespan and CEB in women. We also found five chromosomal regions mediating the trade-off that were genome-wide significant in several statistical models but not when we added smoking as a covariate. Some of the genes in those five regions are associated with increased risk of cancer. METHODOLOGY The Framingham Heart Study Initiated in 1948 in the town of Framingham (MA), the FHS includes three generations of participants that continue to be measured. Beginning with 5209 men and women initially enrolled in the originalcohort, the study added 5124 offspring-cohort participants in 1971 that were mostly offspring of the original-cohort. In 2002, a third-cohort was added consisting of offspring of the second cohort. Original-cohort participants have been examined every 2 years (28 exams in total to date), the offspring-cohort every 4 years (eight exams in total). Participants are mostly of European ancestry (20% UK, 40% Ireland, 10% Italy and 10% Quebec). Data were de-identified by the FHS. Data-use and human subjects’ approval were obtained from the National Institutes of Health (dbGaP) and the Yale Institutional Review Board. Phenotypic correlations Our sample included men and women who were born between the 1890s and the 1950s, except for age at menarche where the available sample was much smaller (i.e. 1923–56). Cox regression was used to calculate risk of death depending on age at first birth (nmen = 2579; nwomen = 2193), CEB (nmen = 3833; nwomen = 3658), and age at menarche (n = 1355) and menopause (n = 2415) in women. In each regression, potentially confounding effects in lifespan were controlled by including education, country of origin and smoking status. To test for potential nonlinear effects, a separate regression was run with a quadratic term included for the main predictor traits. If quadratic terms were significant, this was explored further by examining the Cox Wang et al. | Genetic links in Framingham regression model (from the survival library in R) using penalized splines (with 4 df) [36, 37]. The Cox proportional hazards model is a standard tool for survival analysis, in which the log of the hazard function h(t) is assumed to be a linear combination of the covariates. Specifically, for a model containing p covariates x1 , . . . ,xp ; the fitted model takes the form of hðtÞ ¼ hðt0 Þexpð1 x1+ +p xp Þ, where i is the coefficient fit to covariate xi and hðt0 Þ is the unknown baseline hazard function. Equivalently, this equation can be expressed as hðtÞ ln ¼ 1 x1+ +p xp : hðt0 Þ Note that FHS reports CEB as a value from ‘0’ to ‘5’, where ‘5’ indicates having had five or more children. Several variables were pre-adjusted for age and year measured. For body mass index (BMI), systolic blood pressure (SBP) and total cholesterol, age and year effects were removed by taking residuals of each trait against age (measures between 20 and 60 years old) and year measured using a generalized additive model (locally weighted scatterplot smoothing, LOESS). All residuals for a subject were then averaged to obtain an average residual for each trait, which were then used for modelling. As demonstrated previously, the surface of the generalized additive model can be accurately estimated due to the large number of trait measurements [38]. Our initial sample included 4123 women for whom data on age at death, CEB, education level, smoking history, estrogen use and BMI were available. We then removed 941 women who were born in or after 1941, a period when the correlation between lifespan and CEB was weaker, possibly because of the improvement of health care after World War II. We did so because to have a chance of detecting any significantly correlated SNPs in the GWAS, we needed to focus on a period where the phenotypic correlation is relatively strong. Nineteen women who died before the age of 50 years were also excluded, because their CEB records might represent incomplete observations. Because we excluded women who died before the age of 50 years, we are specifically studying the relationship of CEB to postreproductive mortality. Of the remaining 3163 women, keeping only those who had genotype data reduced our sample size to 1810. We required this sample to have associated genotype data because we later used the same sample for the GWAS. Note that our phenotypic analysis used the year 1919 as a cut-off because the yearly ratio of individuals alive to individuals deceased increased to about 50% in 1919, and continued to rise thereafter. For illustrative purposes, we also ran a multiple linear regression on a smaller sample for women, including only the deceased subjects who were born prior to 1919 (n = 680) out of a total of 1810 who satisfied specific criteria outlined above. We similarly ran a regression model on a smaller sample of men who have died (n = 712) out of a total of 1474 men satisfying similar criteria. Genetic correlations and heritabilities We estimated heritabilities and genetic correlations for traits from pedigrees using a mixed effects restricted maximum likelihood (REML) model in ASReml version 3.0 [39]. We considered models in which there were no covariates as well as adjusted models where phenotypic variation was partitioned into additive genetic, residual variance and a single random effect (maternal ID, paternal ID or education level). To be consistent with the phenotypic correlation models, we also considered models in which fixed effects (smoking status and country of origin) and both random effects for maternal ID and education level were included. Sex was not included as a fixed effect as male and female estimates were obtained separately. Smoking status (0/1, nonsmoker/smoker) and country of origin (0/1, US born/foreign born) were coded as binary variables. Education described number of years completed, with missing values coded as 8 years (the minimum). Maternal variance components ranged from 0.0 (age at first birth) to 0.12 ± 0.04 (lifespan) and 0.0 (age at first birth) to 0.20 ± 0.03 (lifespan) for female and male analyses, respectively. Education variance components ranged from 0.0 (age at menarche) to 0.06 ± 0.03 (CEB) and 0.0 (age at first birth) to 0.014 ± 0.009 (CEB) for female and male analyses, respectively. The Framingham pedigree totals 15 877 individuals in 1538 pedigrees consisting of both immediate and extended family. Heritability estimates were tested for significance with likelihood ratios that compared full models with reduced ones (i.e. 21DF = 2 (LogLFULL LogLREDUCED)) lacking the additive genetic component. Genetic correlations were also tested for significance by 243 244 | Wang et al. Evolution, Medicine, and Public Health comparing likelihood values from full models to ones where the genetic covariance was fixed at zero. Our genetic correlation analysis between CEB and lifespan included a total of 5133 females for whom age at death and CEB information were available. Supplementary Fig. S4 summarizes the pedigree information for these women, grouped by cohort via the ‘pedantics’ package in R [40]. Pedigree depths (computed using the same package) for the Framingham dataset range from 0 to 4, with mean 1.02 (±1.06). On average, each woman had 2.38 (±1.59) children in her lifetime and lived 77.21 (±12.73) years. The average level of education in years was 11.66. The average age at menarche was 12.81 (±1.54), average age at first birth was 26.49 (±4.81) and average age at menopause was 49.20 (±4.10). Genome-wide association study Our association results are based on 444 205 SNPs from the 500 K and 50 K Affymetrix samples that satisfied the following criteria: call rate >90%, Hardy–Weinberg equilibrium P-value >0.00001, Mendel error rate <2% and minor allele frequency >0.01. These SNP selection criteria are further discussed in the Supplementary Information. We used Cox proportional hazards models, as done in the phenotypic correlation analysis, to estimate the interactions between survival time past age 50 years, CEB and genotype. For censored individuals, we used their times of last observation past age 50 years as their censoring time. Several models were run under this setup, which we number to emphasize that they are nested models. Model 1 did not adjust for any covariates. We then added covariates to reduce confounding by variables that may be correlated with lifespan and CEB. Model 2 used education level. Model 3 further added BMI, estrogen use and cohort as covariates. Models 4a–d were intermediate steps in which one of the four additional covariates was added: blood pressure treatment indicator (Model 4a), total cholesterol (Model 4b), SBP (Model 4c) and smoking indicator (Model 4d). Model 5 included all four of these additional covariates. Models 4a–d were run retrospectively to pinpoint which covariate, when added, resulted in removing significance from all SNPs. A summary of the models fitted can be found in the Supplementary Information. Both genotypes and CEB were included as continuous variables to model an additive effect of the minor allele. We used both the raw genotypes provided by FHS as well as an imputed dataset. The imputation was done in several stages. First, we incorporated values imputed by MACH that were included in the FHS dataset. The MACH algorithm imputes missing genotypes based on shared haplotype stretches between subjects and HapMap data [41]. Of the remaining missing values, we sampled among the possible genotypes given the genotypes of parents, when parent genotypes were available. Any remaining missing values were simply sampled according to genotype proportions of the entire group. This sequence of operations created a full set of genotypes that had no missing values. Cohort was defined as a categorical variable computed from the year of birth: born before or in 1917 and born in or after 1918. In addition to running the above five models on the full sample of 1810, we tested our models for robustness by mimicking an out-of-sample analysis. To that end, we randomly divided our sample into two equal parts and fitted Models 1–5 to each part separately to check for consistency in significance of the top performing SNPs. A true out-of-sample performance check would include the calculation of prediction error based on a model fitted on a training set. Our method does not aim to validate prediction out of sample, but rather to ensure that a SNP discovered to be significant in one sample ought to be significant in another sample—a less stringent, but still important requirement of consistency. To minimize the effects of missing genotypes on each subsample, which would further lower our sample size in each of the two separate runs, we only used the imputed genotypes for this portion of our analysis. The downside of using imputed genotypes is the risk of imputation error. To verify that our risk of imputation error is low, we used the imputed SNP data to repeat our full-sample analyses for Models 1–5. Our aim was to show that our results for these models are similar, regardless of whether we used imputed or raw SNP data. To explore possible non-additive genotypic effects, we ran a separate Model 6 that used genotype as a categorical variable. The covariates used in Model 6 are identical to those used in Model 3, and any SNPs for which the homozygous minor genotype had fewer than 20 counts were excluded. We did not apply the half-sample testing to Model 6, because in many cases, the genotype counts in the homozygous minor allele category were too small to Wang et al. | Genetic links in Framingham further subdivide the group for categorical modelling. Finally, we ran two additional models that are outside of the nested framework given above on the raw data only (and therefore, they are not numbered). A quadratic model was run to search for a possible nonlinear effect by adding a quadratic CEB term along with its interaction with genotype to Model 1. The ‘matching covariates’ model was run to provide a frame of reference to the reader; this model uses exactly the same covariates that were included in the phenotypic and genotypic correlation analyses—education, smoking indicator and country of origin. RESULTS Phenotypic correlations In the Cox regression analysis where as many men and women were included as possible (birth-year range 1889–1958), censoring was used to account for those who were still alive according to the latest medical records. Risk of mortality beyond age 50 years increased if women (adjusted incidence rate ratio (RR) = 1.045, P = 0.030) had more children (Table 1). When a nonlinear term for CEB was included, it significantly improved the model fit and became more significant than the linear term. 245 Penalized splines for unadjusted mortality risk (Fig. 1) support a predominantly U-shaped pattern for the association between CEB and lifespan, similar to that found in some other studies (e.g. [19]). This is consistent with a cost of reproduction that is experienced by women with three or more children and with a benefit of reproduction to those who have one or two children. Highest mortality risk occurred in women with no children or more than three to four children, with lowest risk for those with approximately two. Mortality risk decreased if the first child was born later (women, unadjusted RR = 0.971, P < 0.001; men, adjusted RR = 0.985, P = 0.011; see Supplementary Fig. S1), but the significance of this effect depended on whether estimates were adjusted or not (Table 1). Mortality risk was also reduced if menopause occurred later in women (unadjusted RR = 0.970, P = 0.003), although this effect disappeared when other effects were controlled for (Table 1). Full model results can be seen in Supplementary Table S1. In the analysis where only the 680 women were included in the range of birth years 1889–1918 in which all had died, the phenotypic correlation between CEB and lifespan was highly significant and negative (r = 0.133, P = 0.0005; Fig. 2). Linear regression indicated that every additional child cost 0.74 years of lifespan (standard error (SE) = 0.21 years). There was, however, significant variation in Table 1. Incidence RR (±95% confidence interval) for age at death due to stroke, heart attack or cancer (beyond age 50 years) Trait Women Men Unadjusted Adjusted Unadjusted Adjusted CEB 1.050* (1.011–1.092)NL** n = 3729 1.045* (1.005–1.087)NL*** 0.995 (0.960–1.033) n = 3888 1.031 (0.993–1.071) Age first birth 0.971*** (0.955–0.988)NL** n = 2236 0.977* (0.960–0.994)NL* 0.990 (0.979–1.001) n = 2613 0.985** (0.974–0.995) Menarche 0.891 (0.757–1.050) n = 1367 0.917 (0.782–1.077) Menopause 0.970** (0.951–0.990) n = 2461 0.984 (0.965–1.005) Unadjusted Cox regression estimates included only the main predictor trait. Cultural effects (smoking, education and country-of-origin) were accounted for in adjusted estimates. ‘NL’ indicates that a significant nonlinear effect was also detected for the association between this trait and longevity. *P < 0.05, **P < 0.01, ***P < 0.001. 246 | Wang et al. Evolution, Medicine, and Public Health Figure 1. Summary of CEB and mortality risk in Framingham women. A histogram of CEB and log-relative mortality risk values for each CEB value with 95% confidence bands Figure 3. Correlation between CEB and lifespan by birth year (n = 5133) for women. Women (n = 680) were grouped by overlapping 10-year intervals of birth year, and the correlation between CEB and lifespan was computed for each group. Individual points indicate the sample size of each 10-year group, with the mean birth year plotted on the x-axis and correlation plotted on the y-axis phenotypic correlations are dependent on birth year is consistent with previous findings that selection pressures changed over time in Framingham [38]. Heritabilities and genetic correlations Figure 2. Relationship between CEB and lifespan for women. Scatterplot illustrating correlation between CEB and lifespan (r = 0.133, P < 0.001) (n = 680). Both variables have been jittered to minimize overlap of points the phenotypic correlation by birth year (Fig. 3); it was positive (with one exception) from 1893 to 1907 and negative from 1908 to 1913. Many in the earlier group were giving birth before the Great Depression and World War II. Some of the latter group encountered those two major environmental perturbations. The correlation between CEB and lifespan for the 712 men was slightly negative (r = 0.079, P = 0.0355; Supplementary Fig. S2). An additional child cost 0.54 years of male lifespan (SE = 0.26 years). Again, the correlation varied by birth year, but the variations were less pronounced than for females (Supplementary Fig. S3). The observation that In women (Table 2), the heritabilities of most major life-history traits differed significantly from zero, including age at death (h2 = 0.12, P = 0.01), CEB (h2 = 0.09, P = 0.03), age at first birth (h2 = 0.18, P < 0.001) and menopause (h2 = 0.44, P < 0.001). In women, the genetic correlation of CEB with age at death was large, negative and significant (rG = 0.88, P = 0.01) in a model without covariates (Supplementary Table S2). When we included education as a random effect, the genetic correlation decreased to 0.70 but was still significant (P = 0.02). When we included either the mother or the father identifiers in place of education as a random effect, the genetic covariance remained large and negative, but was no longer significant (mother: rG = 1.58, P = 0.11; father: rG = 1.46, P = 0.15). The model in which we adjusted for education, smoking status and country of origin also produced a large negative genetic correlation, but the correlation was not significant (rG = 0.69, P = 0.14). The correlation between the quadratic term CEB2 and lifespan was large, negative and significant in Wang et al. | Genetic links in Framingham 247 Table 2. Heritabilities (h2, on the diagonal) and genetic correlations (rG, off the diagonal) of life history traits (±SE) Women Age at death Age at death CEB Age first birth Menarche Menopause 0.12 ± 0.08 P = 0.0176 n = 3010 0.69 ± 0.52 P = 0.1420 0.20 ± 0.25 P = 0.2083 0.07 ± 0.23 P = 0.3886 0.15 ± 0.17 P = 0.1917 0.09 ± 0.05 P = 0.0394 n = 4123 0.40 ± 0.35 P = 0.1545 0.31 ± 0.24 P < 0.0001 0.21 ± 0.21 P = 0.1377 0.18 ± 0.06 P = 0.0008 n = 2912 0.38 ± 0.33 P = 0.0911 0.06 ± 0.14 P = 0.3541 0.16 ± 0.13 P = 0.0948 n = 1638 0.10 ± 0.21 P = 0.3121 CEB Age first birth Menarche Menopause Men Age at death CEB 0.44 ± 0.06 P < 0.0001 n = 3400 <0.01 ± <0.01 P = 0.8875 n = 2963 <0.01 ± <0.01 P = 0.7773 <0.01 ± <0.01 P = 0.6101 <0.01 ± <0.01 P = 0.5485 n = 4051 <0.01 ± <0.01 P = 0.3884 Age first birth 0.12 ± 0.07 P = 0.0300 n = 2688 SEs and P-values were obtained from maximum-likelihood estimates. Cultural (smoking, education and country-of-origin) and maternal effects were accounted for in all estimates. P-values < 0.05 are in bold. three of four models (no covariates: rG = 1.09, P = 0.003, only mother identifier as random effect: rG = 1.73, P = 0.04, only education as random effect: rG = 0.85, P = 0.01), and borderline non-significant in the model with only the father identifier (rG = 1.61, P = 0.06). Furthermore, we looked to see if the genetic correlation between CEB and lifespan was robust to pedigree depth in the simplest model where no covariates were included. Including only those women with pedigree depth of 1 or higher (n = 2540), we got rG = 0.46 (P = 0.14) and including only those women with pedigree depth of 2 or higher (n = 948), we got rG = 0.21 (P = 0.60); both correlations were no longer significant in the reduced samples. The genetic correlation of CEB with age at menarche was relatively large, positive and highly significant (rG = 0.31, P < 0.001). In men (Table 2), the heritability of age at first birth (inferred from their spouses) was small and only just significant (h2 = 0.12, P = 0.03). All other male heritability and genetic correlation estimates were non-significant. Full model results for heritability can be seen in Supplementary Table S2. Genome-wide association study GWAS results are summarized in Tables 3–10; the birth years for the 1810 women included in the GWAS are shown in the Supplementary Information. We deemed a SNP to be genome-wide significant if its interaction coefficient with CEB had a P-value that was less than a Bonferroni-adjusted threshold of 1.13 107 ( = 0.05), unless otherwise indicated. For females, we found two SNPs that attained genome-wide significance using the full 248 | Wang et al. Evolution, Medicine, and Public Health Table 3. GWAS for SNPs that affect the relationship between CEB and lifespan: summary of significant SNPs in Models 1–3 and 5 (full sample) Ssid Rsid Chr Position P-values (genotype CEB) Near Model 1 ss66450977 rs6768456 3 ss66475987 rs2575533 4 Model 2 Model 3 Model 4 Model 5 Matching covariates 7.99E07 4.93E08a 27867272 EOMES 4.03E10a 4.38E10a 8.40E09a (see Table 4) 42432336 ATP8A1 8.02E08a 5.30E08a 3.06E06 2.49E05 2.11E07 n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3. a SNP attained genome-wide significance. Table 4. GWAS for SNPs that affect the relationship between CEB and lifespan: summary of significant SNPs in Models 4 (full sample) Ssid Rsid Chr Position P-values (genotype CEB) Near Model 4a ss66450977 ss66475987 rs6768456 rs2575533 3 4 27867272 42432336 EOMES ATP8A1 1.40E09 1.02E05 Model 4b a a 7.44E09 3.56E06 Model 4c 8.65E09 5.23E06 Model 4d a 4.02E07 1.35E05 n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3. a SNP attained genome-wide significance. Table 5. GWAS for SNPs that affect the relationship between CEB and lifespan: summary of nominally significant SNPs in Model 6 Ssid Rsid Chr Position Near P-value Aa CEB P-value aa CEB Homozygous minor genotype count ss66450977 ss66500131 ss66392234 ss66495977 rs6768456 rs1777023 rs7132724 rs2180957 3 9 12 14 27867272 92008266 65001044 68238574 EOMES OR7E31P HELB RAD51B 1.00E07 1.00E01 1.30E01 1.20E01 2.40E03 3.00E07 9.60E08 8.70E07 21 26 102 21 n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3. Table 6. GWAS for SNPs that affect the relationship between CEB and lifespan: re-evaluating significant SNPs in Models 1–3 and 5 (split samples) Ssid ss66450977 ss66475987 Sample half 1 Sample half 2 P-values (genotype CEB) P-values (genotype CEB) Model 1 Model 2 Model 3 Model 5 Model 1 Model 2 Model 3 Model 5 0.00032 0.0002 0.00041 0.00012 0.00097 0.0021 0.007 0.001 9.39E08a 5.46E04 7.04E08a 4.46E04 1.36E06 1.56E03 4.58E06 1.39E02 n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3. a SNP attained genome-wide significance. Wang et al. | Genetic links in Framingham 249 Table 7. GWAS for SNPs that affect the relationship between CEB and lifespan: re-evaluating significant SNPs in Models 4a–d (split samples) Ssid ss66450977 ss66475987 Sample half 1 Sample half 2 P-values (genotype CEB) P-values (genotype CEB) Model 4a Model 4b Model 4c Model 4d Model 4a Model 4b Model 4c Model 4d 8.40E04 3.00E03 1.30E03 9.40E04 8.00E04 1.80E03 7.40E03 3.70E03 3.33E06 2.00E03 1.19E06 2.30E03 1.32E06 3.80E03 3.35E06 3.40E03 n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3. Table 8. GWAS for SNPs that affect the relationship between CEB and lifespan: top SNPs in Model 5 (split sample) Ssid Rsid ss66092635 ss66508254 ss66392234 ss66328248 ss66531142 ss74823403 ss66231005 ss66273879 ss66526690 ss66490007 Chr rs6581676 rs2961258 rs7132724 rs13248967 rs11219832 rs7860830 rs10899741 rs1728810 rs1602160 rs11009744 P-values (genotype CEB) Position 12 7 12 8 11 9 7 3 6 10 64992353 15150223 65001044 114920075 124272500 26882137 52215028 10992443 94277193 34675601 Sample 1 Sample 2 9.12E06 1.41E05 1.82E05 2.81E05 3.65E05 3.27E01 4.62E01 4.15E01 9.00E01 9.86E01 4.58E01 7.86E01 4.86E01 6.86E01 1.79E01 7.19E10a 9.84E08a 1.07E07a 1.57E07 2.37E07 n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3. a SNP attained genome-wide significance. Table 9. GWAS for SNPs that affect the relationship between CEB and lifespan: summary of significant SNPs in Models 1–3 and 5 (full sample) (imputed SNPs) Ssid ss66450977 ss66475987 Rsid rs6768456 rs2575533 Chr 3 4 Position 27867272 42432336 P-values (genotype CEB) Near EOMES ATP8A1 Model 1 Model 2 Model 3 Model 4 Model 5 2.91E10a 1.50E07 2.20E10a 6.57E08a 6.44E09a 5.03E06 (see Table 10) 5.56E07 2.94E05 n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3. a SNP attained genome-wide significance. sample: ss66450977 on Chromosome 3 (close to EOMES) and ss66475987 on Chromosome 4 (close to ATP8A1). Their levels of significance decreased as additional covariates were included in the model; however, these SNPs were also significant in the matching covariates model (Tables 3 and 4). We also found two nominally significant SNPs that exhibited possibly non-additive effects: ss66392234 250 | Wang et al. Evolution, Medicine, and Public Health Table 10. GWAS for SNPs that affect the relationship between CEB and lifespan: summary of significant SNPs in Model 4 (full sample) (imputed SNPs) Ssid Rsid Chr Position P-values (genotype CEB) Near Model 4a ss66450977 ss66475987 rs6768456 rs2575533 3 4 27867272 42432336 EOMES ATP8A1 1.40E08 1.02E05 Model 4b a a 6.30E09 5.80E06 Model 4c 4.30E09 5.40E06 Model 4d a 3.87E07 2.30E05 n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3. a SNP attained genome-wide significance. on Chromosome 12 (in HELB) and ss66500131 on Chromosome 9 (close to the pseudogene OR7E31P) (Table 5). Nearby genes/pseudogenes were determined based on a radius of 150 kb from each SNP. In the split-sample analysis using imputed SNP data (see ‘Methodology’ section regarding details on imputation), no SNPs were found to be significant for females (Tables 6–8), even when the randomization used in the split-sample assignment was replicated 100 times. We verified that using the imputed data for the full-sample analysis would have yielded comparable levels of significance for the two SNPs previously discovered in Models 1–5 (Tables 9 and 10). No significant SNPs were detected for males in Models 1–3. As in the GWAS for females, the addition of more covariates decreased levels of significance, and therefore no further models were run. No significant SNPs were detected in a model that included a quadratic effect of CEB. Further details on the GWAS for females are in the Supplementary Information. CONCLUSIONS AND IMPLICATIONS Phenotypic and genetic correlations The phenotypic correlation between CEB and lifespan in women differed with birth year, demonstrating the importance of phenotypic plasticity on the relationships among life-history traits. Secular cultural and environmental changes affect that correlation and probably account for much of the variation among studies [6, 15, 19, 21, 22]. The estimate of a negative genetic correlation in women when not accounting for covariates (rG = 0.88) was large. The effects of shared environment reduced the strength of the linear correlation and increased the strength of the quadratic correlation, and education mimicked the effects of a cost of reproduction in that increased level of education was associated with both fewer children and longer life: including education decreased the estimate of the genetic correlation. Some of our genetic correlation estimates were below 1. This indicates that the estimated variance component is negative, known to be a possible result of REML estimation [42]. When we controlled for the effects of smoking, education, country of origin and maternal effects, the correlation was still negative (rG = 0.69) yet no longer significant. This mirrors the pattern we observed in the GWAS; as covariates were introduced into the model, associations became insignificant. The mean pedigree depth of 1.02 implies that our pedigree is dominated by parent–offspring relationships. This may result in some difficulty distinguishing parental, environmental and additive genetic effects. For example, cultural and lifestyle habits that are unique to nuclear families (such as diet) are known to affect lifespan, but these habits are not recorded, and therefore the genetic correlations that we see may be confounded by these unobservable factors. One can only find a genetic correlation when the phenotypic correlation is significant, and one can only find significant effects of SNPs on a phenotypic correlation when it differs from zero. Our chain of inference thus depends on genetic effects not being too masked by phenotypic plasticity. Gene functions We found several SNPs with nominally significant effects on the correlation of CEB with post-reproductive lifespan; two of them are near EOMES and RAD51B, genes that are related to cancer when under-expressed. The effect of the SNP close to EOMES reached genome-wide significance. The Wang et al. | Genetic links in Framingham EOMES gene has been associated with multiple sclerosis and bladder cancer [43, 44]. RAD51B, a gene involved in encoding proteins that participate in DNA repair, has been linked to breast cancer and brain cancer [45–48]. Further details on the genes in proximity to the SNPs found significant in our GWAS are included in the Supplementary Information. Although these SNPs were close in physical distance to their respective genes (<130 kb), further study of linkage disequilibrium would help to understand their possible association. Other studies Voorhuis et al. [49] collated the results of many genetic studies of age at natural menopause. None of the SNPs that we discovered were found in the studies included in their summary. Several other recent genetic studies relate fertility to genotype. Kosova et al. [50] found 41 SNPs (P < 104) that were associated with decreased male fertility. Adachi et al. [51] found 36 SNPs (P < 104) with possible links to endometriosis in Japanese females. Both were GWAS studies that did not find any genome-wide significant SNPs. Murray et al. [52] reported confirmations for four SNPs previously identified as associated with age at menopause. Ewens et al. [53] examined 15 SNPs linked with obesity to evaluate possible associations with polycystic ovary syndrome, the cause of a form of infertility in women; only one SNP had a nominal level of significance, and the significance did not hold up in another case–control study. Our methods differ fundamentally from these four studies in that we considered lifespan in conjunction with fertility, and the significant SNPs we found were not reported in their analyses [50–53]. Although the Kuningas Rotterdam study incorporated mortality in its analysis and was therefore more similar to our study [35], it differs from our approach in three ways: (i) our analysis included many more SNPs (444 205 versus their 1664), (ii) we adjusted for the effects of several direct mortality-affecting covariates such as smoking and SBP, (iii) Kuningas used an initial screening of the 1664 SNPs with a set-based test (with a threshold of P < 0.05), whereas we started with a GWAS across 444 205 SNPs in models that relate each SNP to both CEB and lifespan (with a threshold of P < 1.13 107). We did not find Bonferroni-level significance with SNPs near the four gene regions identified in [35]. Summary We have analysed phenotypic and genetic correlations between reproductive success and survival and have identified a small set of genes that may mediate a trade-off between them. This warrants further studies in other samples. The Framingham dataset has some shortcomings. In particular, women born before the start of the study would only have been included in the study if they survived until 1948–52 (when the study began). Therefore, our dataset does not include anyone who died during World War I, the 1918 flu pandemic, the Great Depression and World War II. If these catastrophic events affected women differently depending on their fertility and lifespan, then excluding these women from our analysis would bias our results. The issue is inherent in such observational studies of humans, and unfortunately cannot be avoided. We failed to find any significant SNPs when covariates (i.e. smoking, country of origin and average cholesterol levels) were included and when we did a rough check for consistency out of sample. It is unknown how often such checks modify significance of SNP associations, for many other published GWAS studies do not account for the effects of covariates or do out-of-sample predictions. AUTHOR CONTRIBUTIONS S.G.B. and X.W. jointly worked on processing and cleaning the data and phenotypic correlation calculations. S.G.B. further calculated the genetic correlations and heritabilities. X.W. performed the GWAS. S.C.S. conceived of the study and drafted the initial manuscript. All authors contributed to the final manuscript. supplementary data Supplementary data is available at EMPH online. acknowledgements The authors thank Drs John W. Emerson and Andrew Pakstis for their feedback and insight on the project and the three anonymous reviewers for their constructive feedback. 251 252 | Wang et al. Evolution, Medicine, and Public Health funding 16. Muller HG, Chiou JM, Carey JR et al. Fertility and life span: late children enhance female longevity. J Gerontol Ser A The study was supported by the Yale University and the Marie Curie International Incoming Fellowship FP7-PEOPLE-2010IIF-276565. Conflict of interest: None declared. Biol Sci Med Sci 2002;57:B202–6. 17. Sear R. The impact of reproduction on Gambian women: does controlling for phenotypic quality reveal costs of reproduction? Am J Phys Anthropol 2007;132: 632–41. 18. Lawlor DA, Emberson JR, Ebrahim S et al. Is the references association between parity and coronary heart disease due to biological effects of pregnancy or adverse lifestyle 1. Williams G. Pleiotropy, natural selection, and the evolution of senescence. Evolution 1957;11:398–411. 2. Williams G. Natural selection, costs of reproduction, and a refinement of Lack’s principle. Am Nat 1966;100: 687–90. 3. Roff D. The Evolution of Life Histories. New York: Chapman and Hall, 1992. 4. Stearns SC. The Evolution of Life Histories. Oxford University Press: Oxford, 1992, 249. 5. Stearns SC, Partridge L (2001) The genetics of aging in Drosophila. In: Masoro EJ, Austad SN (eds), Handbook of the Bioloy of Aging, 5th edn. 2001, 353–68. 6. Doblhammer G, Oeppen J. Reproduction and longevity among the British peerage: the effect of frailty and health selection. Proc Biol Sci 2003;270:1541–7. risk factors associated with child-rearing? Findings from the British Women’ Heart and Health Study and the British Regional Heart Study. Circulation 2003;107: 1260–4. 19. Lund EE, Arnesen EE, Borgan JKJ. Pattern of childbearing and mortality in married women—a national prospective study from Norway. J Epidemiol Commun Health 1990;44: 237–40. 20. Manor OO, Eisenbach ZZ, Israeli AA et al. Mortality differentials among women: the Israel Longitudinal Mortality Study. Soc Sci Med (1967) 2000;51:1175–88. 21. Dribe M. Long-term effects of childbearing on mortality: evidence from pre-industrial Sweden. Popul Stud 2004;58: 297–310. 22. Gavrilova N, Gavrilov L, Semyonova VG et al. Does excep 7. Gagnon A, Smith KR, Tremblay M et al. Is there a trade- tional human longevity come with a high cost of infertility? off between fertility and longevity? A comparative Testing the evolutionary theories of aging. Ann N Y Acad study of women from three large historical databases accounting for mortality selection. Am J Hum Biol 2009; 21:533–40. 8. Maklakov AA. Sex difference in life span affected by female birth rate in modern humans. Evol Hum Behav 2008;29: 444–9. 9. Tabatabaie V, Atzmon G, Rajpathak SN et al. Exceptional longevity is associated with decreased reproduction. Aging 2011;3:1202–5. 10. Thomas F, Teriokhin A, Renaud F et al. Human longevity at the cost of reproductive success: evidence from global data. J Evol Biol 2000;13:409–14. 11. Westendorp R, Kirkwood T. Human longevity at the cost of reproductive success. Nature 1998;396:743–6. 12. Fuster V. Widowhood, illegitimacy, marital reproduction and female longevity in a rural Spanish population. Homo 2011;62:500–9. 13. Helle S, Lummaa V, Jokela J. Are reproductive and somatic senescence coupled in humans? Late, but not early, Sci 2004;1019:513–7. ¨¨ 23. Helle S, Ka ar P, Jokela J. Human longevity and early reproduction in pre-industrial Sami populations. J Evol Biol 2002;15:803–7. 24. Jacobsen BK, Knutsen SF, Oda K et al. Parity and total, ischemic heart disease and stroke mortality. The Adventist Health Study, 1976–1988. Eur J Epidemiol 2011;26:711–8. 25. Jasienska G, Nenko I, Jasienski M. Daughters increase longevity of fathers, but daughters and sons equally reduce longevity of mothers. Am J Hum Biol 2006;18: 422–5. 26. Korpelainen H. Fitness, reproduction and longevity among European aristocratic and rural Finnish families in the 1700s and 1800s. Proc Biol Sci 2000;267: 1765–70. 27. Lycett JE, Dunbar RIM, Voland E. Longevity and the costs of reproduction in a historical human population. Proc Biol Sci 2000;267:31–5. reproduction correlated with longevity in historical sami 28. Cesarini D, Lindqvist E, Wallace B. Maternal longevity and women. Proc Biol Sci 2005;272:29–37. 14. Le Bourg E, Thon B, Le´gare´ J et al. Reproductive life of the sex of offspring in pre-industrial Sweden. Ann Hum French–Canadians in the 17–18th centuries: a search for 29. Gavrilov L, Gavrilova N. Is there a reproductive cost for a trade-off between early fecundity and longevity. Exp Gerontol 1993;28:217–32. 15. McArdle P, Pollin T, O’Connell J et al. Does having children Biol 2007;34:535–46. human longevity? J Anti-Aging Med 1999;2:121–3. 30. Jasienska G. Reproduction and lifespan: trade-offs, overall energy budgets, intergenerational costs, and extend life span? A genealogical study of parity and lon- costs neglected by research. Am J Hum Biol 2009;21:524–32. gevity in the Amish. J Gerontol Ser A Biol Sci Med Sci 2006; 31. Mitteldorf J. Female fertility and longevity. Age 2010;32: 61:190–5. 79–84. Wang et al. | Genetic links in Framingham 32. Le Bourg E. Does reproduction decrease longevity in human beings? Ageing Res Rev 2007;6:141–9. application of these as urinary tumor markers. Clin Cancer Res 2011;17:5582–92. 33. Stearns S, Ackermann M, Doebeli M et al. Experimental 44. Patsopoulos NA, de Bakker PIW. Genome-wide meta- evolution of aging, growth, and reproduction in fruitflies. analysis identifies novel multiple sclerosis susceptibility Proc Natl Acad Sci USA 2000;97:3309–13. ¨gele M, Pattaro C, Fuchsberger C et al. Heritability 34. Go analysis of life span in a semi-isolated population followed 45. Liu Y, Shete S, Wang LE et al. Gamma-radiation sensitivity and polymorphisms in RAD51L1 modulate glioma risk. across four centuries reveals the presence of pleiotropy between life span and reproduction. J Gerontol Ser A Biol Sci Med Sci 2011;66:26–37. loci. Ann Neurol 2011;70:897–912. Carcinogenesis 2010;31:1762–9. 46. Figueroa JD, Garcia-Closas M, Humphreys M et al. Associations of common variants at 1p11. 2 and 14q24. 1 ¨e S, Uitterlinden AG et al. The relation35. Kuningas M, Altma (RAD51L1) with breast cancer risk and heterogeneity ship between fertility and lifespan in humans. Age 2011;33: by tumor subtype: findings from the Breast Cancer Association Consortium. Hum Mol Genet 2011;20:4693–706. 615–22. 36. R Core Team. (2009) R: A Language and Environment for Statistical Computing. R Foundation for Statistical 47. Shu XO, Long J, Lu W et al. Novel genetic markers of breast cancer survival identified by a genome-wide association 37. Therneau T. (2013) A Package for Survival Analysis in S. study. Cancer Res 2012;72:1182–9. ¨stro ¨m S, Henriksson R et al. DNA-repair 48. Wibom C, Sjo R package version 2.37–4. http://CRAN.R-project.org/ gene variants are associated with glioblastoma survival. Computing: Vienna, Austria, 2009. Acta Oncol 2012;51:325–32. package=survival. al. 49. Voorhuis M, Onland-Moret NC, van der Schouw YT et al. Colloquium papers: natural selection in a contemporary Human studies on genetics of the age at natural meno- human population. Proc Natl Acad Sci 2010;107(Suppl. 1), 1787–92. pause: a systematic review. Hum Reprod Update 2010;16: 364–77. 39. Gilmour A, Gogel B, Cullis B. ASReml user guide release 3.0. 50. Kosova G, Scott NM, Niederberger C et al. Genome- 38. Byars SG, Ewbank D, Govindaraju DR et wide association study identifies candidate genes for VSN International Ltd, 2002. 40. Morrissey MB, Wilson A. Pedantics: an R package for pedigree-based genetic simulation and pedigree manipulation, characterization and viewing. Molecular Ecology Resources 2010;10:711–9. 41. Li Y, Abecasis GR. Mach 1.0: rapid haplotype reconstruction and missing genotype inference. Am J Hum Genet 2006;S79:2290. male fertility traits in humans. Am J Hum Genet 2012;90: 950–61. 51. Adachi S, Tajima A, Quan J et al. Meta-analysis of genomewide association scans for genetic susceptibility to endometriosis in Japanese population. J Hum Genet 2010;55: 816–21. 52. Murray A, Bennett CE, Perry JRB et al. Common genetic 42. Thompson WA Jr. The problem of negative esti- variants are significant risk factors for early menopause: mates of variance components. Ann Math Stat 1962;33: results from the Breakthrough Generations Study. Hum 273–89. Mol Genet 2010;20:186–92. 43. Reinert T, Modin C, Castano FM et al. Comprehensive 53. Ewens KG, Jones MR, Ankener W et al. FTO and MC4R genome methylation analysis in bladder cancer: identifi- gene variants are associated with obesity in polycystic cation and validation of novel methylated genes and ovary syndrome. PLoS One 2011;6:e16390. 253 27 Evolution, Medicine, and Public Health [2013] pp. 27–36 doi:10.1093/emph/eot001 orig inal research article Identifying future zoonotic disease threats Where are the gaps in our understanding of primate infectious diseases? 1 School of Natural Sciences; 2Trinity Centre for Biodiversity Research, Trinity College Dublin, Dublin 2, Ireland and 3 Department of Human Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA. *Corresponding author. School of Natural Sciences, Trinity College Dublin, Dublin 2, Ireland. Tel: +353-(0)1-896-1926; Fax:+353-1-6778094; E-mail: ncooper@tcd.ie; cnunn@oeb.harvard.edu. Received 30 October 2012; revised version accepted 1 January 2013. ABSTRACT Background and objectives: Emerging infectious diseases often originate in wildlife, making it important to identify infectious agents in wild populations. It is widely acknowledged that wild animals are incompletely sampled for infectious agents, especially in developing countries, but it is unclear how much more sampling is needed, and where that effort should focus in terms of host species and geographic locations. Here, we identify these gaps in primate parasites, many of which have already emerged as threats to human health. Methodology: We obtained primate host–parasite records and other variables from existing databases. We then investigated sampling effort within primates relative to their geographic range size, and within countries relative to their primate species richness. We used generalized linear models, controlling for phylogenetic or spatial autocorrelation, to model variation in sampling effort across primates and countries. Finally, we used species richness estimators to extrapolate parasite species richness. Results: We found uneven sampling effort within all primate groups and continents. Sampling effort among primates was influenced by their geographic range size and substrate use, with terrestrial species receiving more sampling. Our parasite species richness estimates suggested that, among the best sampled primates and countries, almost half of primate parasites remain to be sampled; for most primate hosts, the situation is much worse. Conclusions and implications: Sampling effort for primate parasites is uneven and low. The sobering message is that we know little about even the best studied primates, and even less regarding the spatial and temporal distribution of parasitism within species. K E Y W O R D S : sampling events; parasite species richness; Global Mammal Parasite Database; relative sampling effort ß The Author(s) 2013. Published by Oxford University Press on behalf of the Foundation for Evolution, Medicine, and Public Health. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Natalie Cooper*1,2 and Charles L. Nunn3 28 Evolution, Medicine, and Public Health | Cooper and Nunn INTRODUCTION we can do this, we need to identify gaps in our knowledge of wildlife infectious diseases. Here, we investigate gaps in our knowledge of primate parasites. We chose primates because they are our closest relatives, and partly as a consequence, many of humanity’s biggest killers have originated in wild primates (e.g. HIV [4]). In addition, much is known about primate parasites. We acknowledge at the outset, however, that many other vertebrates have been sources of emerging infectious diseases in humans, and are thus suitable for extensions of the effort conducted here. We use the word parasite in a general sense, referring to both microparasites such as viruses, bacteria, fungi and protozoa, and macroparasites such as helminths and arthropods. To assess gaps in our understanding of primate parasites, we examined records from the GMPD [12]—a large-scale compilation of parasite records from wild mammals—and use these data to quantify and model variation in sampling effort. METHODOLOGY Data collection We obtained host–parasite records from the GMPD (accessed 15 October 2012; [12]), geographic range maps from IUCN [14] and the dated consensus phylogeny from ‘10kTrees’ version 3 [15]. For consistency across our analyses, we only included primate species found in both the range maps and phylogeny, and that we could identify to the species level using the taxonomy of Wilson and Reeder [16]. For analyses of geographical sampling gaps, we obtained latitude and longitude coordinates for each host–parasite record with locality data from the GMPD. For each primate species, we collated data on adult body mass (g) from Jones et al. [17]. We also defined the substrate use of each species as terrestrial (>90% of time on ground), semi-terrestrial (<90% but >50% of time on ground), semi-arboreal (<90% but >50% of time in trees) or arboreal (>90% of time in trees) using Nowak [18], and treated this as a continuously varying character in the analyses. For each country, we assembled data on gross domestic product (GDP) per capita (USD), land area (km2), and the number of airports from Central Intelligence Agency [19] and Emerson et al. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Many of the most devastating infectious diseases in humans have origins in wildlife [1–3]. For example, the global AIDS pandemic originated through human contact with wild African primates [4] and influenza viruses circulate among wild bird populations [5]. These are not only historical occurrences. Recently, for example, rodents were identified as the source of a hantavirus outbreak in Yosemite National Park, USA [6] and a novel rhabdovirus (Bas-Congo virus) of probable animal origin emerged in the Democratic Republic of Congo [7]. As human populations continue to expand into new areas and global changes in temperature and habitat alter the distributions of wild animals, humans around the world will have greater contact with wildlife [8]. Thus, understanding which infectious agents have the potential to spread from animals to humans is crucial for preventing future human disease outbreaks. Here, we outline current gaps in our knowledge of primate infectious diseases at phylogenetic and geographic scales. By doing so, we provide new directions for sampling wild primates and a statistical framework to address this issue in other groups. The first step in predicting zoonotic disease risks to humans is to identify the animal hosts of infectious agents. This information provides several insights. First, it gives information on the host range and specificity of the infectious agent. Second, it provides information on the geographic distribution of the infectious agent in wildlife, which can be compared with human population density. Finally, knowing the hosts of an infectious agent also provides information on risks for host shifts to humans [9, 10]. For example, a host living at high density is likely to exhibit higher prevalence of the infectious disease and to have more contact with humans or domesticated animals. Many efforts are being made to document and collate information on wildlife and human diseases (e.g. HealthMap [11], EID Event Database [2] and Global Mammal Parasite Database (GMPD) [12]). Unfortunately, large-scale analyses of this type have revealed major variations in sampling effort among hosts and geographic regions, with some species and areas being sampled rarely or not at all [10, 13]. If we hope to use wildlife disease data to make reliable predictions about future risks to humans, we need to increase sampling in potential hosts and the areas in which they are found. However, before Identifying future zoonotic disease threats Sampling effort Our measure of sampling effort is the number of sampling events for each primate. We define a sampling event as one primate species being sampled for one parasite species in one location in one paper. The number of sampling events in a paper depends on how the results were reported in the paper, and hence how they were added to the GMPD. For example, a paper reporting that Pan troglodytes is infected by Ascaris lumbricoides represents one sampling event; a paper reporting that P. troglodytes is infected by A. lumbricoides and SIVcpz in Location A and Location B represents four sampling events. This method assumes that each sampling event represents equivalent research effort; however, some sampling events may represent multiple years, multiple populations and/or multiple individuals, while others represent only one individual sampled once. Other samples may be counted multiple times, for example one fecal sample may reveal several parasites. However, in general, we believe that our definition of sampling events should give us a conservative estimate of sampling effort. Note that we included sampling events with zero prevalence for the parasite sampled because these still represent valid sampling effort. In total, our host–parasite data consisted of 5459 sampling events, which we used to quantify relative sampling of primate species. Of these sampling events, 4067 have georeferenced localities in the GMPD and thus we also used these to quantify relative sampling of geographic regions. As mentioned above, we only included parasites we could identify to species or strain for species accumulation curves, leaving us with 3999 sampling events in these analyses. These criteria meant our species accumulation curves only use around 75% of our sampling events for some analyses, but they are necessary to ensure that we are using the highest quality data in the analyses of specific areas. It also further highlights the need for more research into primate parasites. We deposited all data in the Dryad repository: doi:10.5061/dryad.510sb. Analyses Variation in sampling effort among primate species All else being equal, primates should be sampled in proportion to their abundance, so we used ln(geographic range size) as a proxy for abundance and assumed primates with the largest geographic range sizes should be sampled more than primates with small ranges. We estimated sampling relative to geographic range size using the residuals from a phylogenetic generalized least squares (PGLS) model of ln(sampling events) against ln(geographic range size), fitted using the R package caper [21] (Appendix 1). We considered primates with positive residuals as being relatively better sampled given their geographic range size than primates with negative residuals, and displayed these results on the phylogeny. We also expect great apes (Hominoidea) to be better sampled than other primates, so we tested this using phylogenetic analysis of variance (ANOVA; Appendix 1). Variation in sampling effort among geographic regions We assumed that countries should be sampled in proportion to the number of primates found within the country, i.e. countries with high primate species 29 Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 [20]. We estimated airport density (airport/km2) by dividing the number of airports by the land area of the country. Overall, our dataset contained 228 primate species from 89 countries. We located host–parasite data for 166 of these species from 57 countries. The remaining 62 species and 32 countries have no records in the GMPD and were listed as ‘unsampled’ in our analyses. Our parasite data contain 651 unique parasite species or genera with non-zero prevalence in primates (87 arthropods, 50 bacteria, 6 fungi, 326 helminths, 115 protozoa and 67 viruses). Our dataset also contains 46 unique parasite species or genera that have been looked for in primates but never found (these data are important for estimating sampling effort, see below). We included all parasites when quantifying the relative amount of sampling by species and country, even those we could only identify to the generic level; for species accumulation curves, however, we only included parasites we could identify to species or strain to avoid double counting parasite species. This left us with 161 primate species and 502 parasite species with non-zero prevalence in primates (73 arthropods, 32 bacteria, 4 fungi, 242 helminths, 93 protozoa and 58 viruses) for these analyses. We also excluded 22 ‘unsampled’ primates from our models of variation in sampling effort among primate species because we were unable to locate life history data for them. Cooper and Nunn | 30 Evolution, Medicine, and Public Health | Cooper and Nunn richness should be sampled more than countries with low primate species richness. We therefore estimated sampling relative to primate species richness within each country using the residuals from a spatial generalized least squares (GLS) model of ln(sampling events) against ln(primate species richness) using the R package nlme [22] (Supplementary Data, Appendix 1). We considered countries with positive residuals as relatively better sampled given their primate species richness than countries with negative residuals and displayed these results on a world map. Sampling events per primate ¼ f (geographic range size þ phylogenetic distance from humans þ substrate use + body sizeÞ ð1Þ We fit PGLS models for the 205 primate species for which we had data (including 40 ‘unsampled’ primates). All variables except substrate use were natural log transformed prior to analysis. We also used caper [21] to estimate phylogenetic signal (i.e. l, Supplementary Data, Appendix 1) in the number of sampling events across primates. Phylogenetic signal is the tendency for related species to resemble each other more than they resemble species drawn at random from a phylogenetic tree [23]. High phylogenetic signal, i.e. l values close to 1, indicates that closely related species have similar numbers of sampling events, whereas low phylogenetic signal, i.e. l values close to 0, indicates that the number of sampling events varies randomly across the phylogeny. We acknowledge that many of our variables—such as sampling effort and geographic Modeling geographic variation in sampling effort We predicted that the following variables would influence sampling effort among countries: (i) primate species richness (countries with more primates are likely to be sampled more often than countries with fewer primates because there are more primates to sample); (ii) GDP (we expect countries with a high GDP to have more resources for disease monitoring and hence to be sampled more often than countries with a lower GDP) and (iii) airport density (countries with more airports given their area are likely to be easier to visit, and hence disease monitoring should be more frequent). We therefore fit the following model: Sampling events per country ¼ f ðGDP + primate species richness þ airport densityÞ ð2Þ We fit spatial GLS models for the 89 countries that contain primates (including 32 ‘unsampled’ countries). All variables were natural log transformed prior to analysis. Note that the results were almost identical when using a spherical rather than an exponential correlation structure, so we only report the exponential correlation structure results. Extrapolating parasite species richness for primates and countries We first used the R package vegan [24] to plot species accumulation curves [25] of cumulative parasite species richness against sampling events for each primate species (N = 41) and country (N = 21) with 30 or more sampling events. To reduce the effects of inter-sampling event heterogeneity on the shapes of the curves, we used rarefaction (Supplementary Data, Appendix 1) to produce smooth mean species accumulation curves, with confidence intervals 2 standard deviations from the mean. Next, we used the data from our curves to predict parasite species richness for these 41 primates and 21 countries. We used two nonparametric algorithms, Chao2 and first-order Jackknife (Jackknife1), which have been recommended in this context [25–27] (Supplementary Data, Appendix 1). We also estimated standard errors for our extrapolated parasite species richness values based on references in Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Modeling variation in sampling effort among primate species We predicted that the following variables would influence sampling effort among primate species: (i) geographic range size (we expect primates with larger geographic ranges to be sampled more often than primates with smaller ranges); (ii) phylogenetic distance from humans (medical research is likely to focus on our closest relatives, thus we expect them to be sampled more often); (iii) body size (small species are easier to capture and so likely to be sampled more than larger species) and (iv) substrate use (terrestrial species are easier to sample than arboreal species and thus should be sampled more often). We therefore fit the following model: range size—are not biological traits subject to normal evolutionary change. However, they may still show phylogenetic non-independence, and l enables quantification of that non-independence regardless of which underlying process generates it. Identifying future zoonotic disease threats Oksanen et al. [24], and used these to calculate upper and lower bounds on extrapolated parasite species richness. Finally, we plotted species accumulation curves of cumulative parasite species richness for all primate species combined, first using all parasites and then using arthropods, helminths, protozoa and viruses separately. We did not use bacteria and fungi as we had very few of these parasites in our dataset (bacteria = 32 species; fungi = 4 species). We used R version 2.15.0 [28] for all the analyses above. Cooper and Nunn | 31 and Supplementary Fig. S1). As predicted, we found that the Hominoidea (great apes) were relatively well sampled in relation to their geographic range size and were sampled significantly more than other primates (F1,225 = 12.01, P = 0.002). We also found great heterogeneity in the degree of parasite sampling across all other major groups of primates, i.e. Old World monkeys (Cercopithecoidea), New World monkeys (Platyrrhini) and strepsirrhines (Strepsirrhini, i.e. lemurs and galagos). Variation in sampling effort among geographic regions RESULTS Sampling effort was unevenly distributed among primates and ranged from 0 (62 ‘unsampled’ species) to 630 sampling events (P. troglodytes), with a mean of 30.71 ± 4.970. We plotted sampling effort in relation to the primates’ geographic range sizes (Fig. 1 Cercopithecoidea Hominoidea Platyrrhini Strepsirrhini Figure 1. Sampling effort for parasites across the primate phylogeny, assuming that primates should be sampled in proportion to their geographic range size. Species names have been omitted for clarity (see Supplementary Fig. S1 for a larger version with species names). Relative sampling effort was quantified using the residuals from a generalized linear model of ln(geographic range size) against the number of sampling events for each primate species. Gray circles indicate primates with poor sampling relative to their geographic range size (lower 25% of model residuals), black circles indicate primates with better sampling relative to their geographic range size (upper 25% of model residuals). Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Variation in sampling effort among primate species Sampling effort was also unevenly distributed geographically and ranged from 0 (32 ‘unsampled’ countries) to 416 sampling events (Uganda), with a mean of 42.70 ± 9.131. Many countries were poorly sampled in relation to their primate species richness (Fig. 2), with particularly low levels of sampling in parts of South East Asia (including China, Thailand, Cambodia, Myanmar, Laos and Vietnam), Central and Western Africa (including Sudan, Somalia, 32 Evolution, Medicine, and Public Health | Cooper and Nunn Low Sampling relave to primate species richness High Figure 2. Sampling effort for parasites across the world, assuming that countries should be sampled in proportion to their mate species richness) against the number of sampling events for each country. The colors indicate whether countries are poorly sampled (low; red) or better sampled (high; yellow) relative to their primate species richness. Table 1. PGLS model for explaining variation in sampling effort among primate species Variable Slope ± SE t201 Geographic range size (km2) Phylogenetic distance (My) Substrate use Body size (g) 0.347 ± 0.056 0.189 ± 0.729 0.864 ± 0.155 0.409 ± 0.158 6.222*** 0.260 5.572*** 2.597* l = 0.322; r2 = 0.333. Phylogenetic distance is measured as phylogenetic distance from humans in millions of years. Substrate use is a four-state-ordered variable ranging from fully terrestrial to fully arboreal, with more arboreal species scored higher. *P < 0.05; ***P < 0.001. Angola, Zambia, Guinea and Ghana) and South America (including Bolivia, Ecuador, Venezuela, Guyana and Suriname). Modeling variation in sampling effort among primate species Sampling effort for primate parasites covaried with primate geographic range size, body mass and substrate use: the most sampled primates tend to have larger geographic ranges, to be larger in body mass and to be more terrestrial (Table 1 and Supplementary Table S1). We found no significant effect of phylogenetic distance between humans and primates, indicating that both our close relatives and more distantly related species show evidence for sampling gaps. Our overall model explained around a third of the variation in sampling effort (r2 = 0.333), most of which appears to relate to the geographic range size and terrestriality of the primates (in single predictor models, geographic range size: r2 = 0.175; body size: r2 = 0.061; substrate use: r2 = 0.163). The number of sampling events for primates showed significant, but intermediate, levels of phylogenetic signal (N = 228, l = 0.589). This value was significantly different from both l = 0 and l = 1 (P < 0.001), indicating moderate phylogenetic non-independence. Modeling geographic variation in sampling effort Predictably, sampling effort across countries increased with primate species richness. However, Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 primate species richness. Relative sampling effort was quantified using the residuals from a generalized linear model of ln(pri- Identifying future zoonotic disease threats Cooper and Nunn | neither GDP nor airport density significantly predicted sampling effort (Table 2 and Supplementary Table S2). Extrapolating parasite species richness for primates and countries Figure 3 shows the parasite species accumulation curve for all primates combined, and for arthropods, helminths, protozoa and viruses. Across primates and countries, most parasite species accumulation curves were starting to show some downward curvature, indicating a declining rate of parasite species discovery, but in no cases had they approached an asymptote. Interestingly, the different types of parasites accumulated at different rates, with arthropods and helminths accumulating at a much faster rate than protozoa and viruses (see slope differences in Fig. 3). The parasite species accumulation curve for P. troglodytes is shown in Supplementary Fig. S2. Our estimates of parasite species richness using Chao2 and Jackknife1 are shown in Supplementary Tables S3–S5. Using Jackknife1, which appears to give more reasonable values, across the 41 best sampled primates, on average, we predict that there should be between 38 and 79% more parasites than currently recorded in the GMPD (Supplementary Table S3). For countries, on average, we predict that there should be between 29 and 40% more parasites than currently recorded in the GMPD (Supplementary Table S4). For all 161 primates in our study combined, we should find between 685 Variable Slope ± SE t84 Primate species richness GDP per capita (USD) Airport density (airport/km2) 1.241 ± 0.290 0.437 ± 0.246 3.559 ± 3.041 4.273*** 1.778 1.170 parasites = 1.938; GDP = gross domestic product; ***P < 0.001. 500 250 400 200 300 150 200 100 100 50 0 0 1000 2000 3000 sampling events 4000 Arthropods Helminths Protozoa Viruses 0 500 1000 1500 sampling events Figure 3. Parasite species accumulation curve for all 161 primates combined and all parasites (left-hand side). Parasite species accumulation curve for all 161 primates combined and helminths, protozoa and viruses separately (right-hand side). Parasites = cumulative parasite species richness. Arthropods = orange curve; helminths = blue curve, protozoa = green curve and viruses = red curve. For each curve, the darker line shows the mean curve and the lighter shaded region shows 2 standard deviations from the mean curve, each obtained from 1000 random permutations of the data. Note that the axes sizes are different on the left- and right-hand plots. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Table 2. Spatial GLS model with an exponential correlation structure, explaining variation in sampling effort among countries 0 33 34 Evolution, Medicine, and Public Health | Cooper and Nunn and 713 parasites, i.e. between 36 and 42% more parasites than the 502 parasites identified to species level that are currently reported in the GMPD for these 161 primates (Supplementary Table S5). DISCUSSION Variation in sampling effort among primate species and geographic regions Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Virtually every broadscale comparison of sampling effort, whether sampling disease agents in epidemiology or species in biodiversity studies, reveals bias in what is sampled. In epidemiology, for example, sampling may be highest for diseases with easily detectable symptoms and for areas easily accessed by medical personnel. Here, we showed that primate parasites are also unevenly sampled across both primate species and space. This supports previous studies of sampling gaps in primate parasites that used an earlier version of the GMPD data [13], but unlike previous studies, we also investigated the drivers of sampling effort variation across primates and geographic areas. For our analyses of variation in sampling effort among primate species, we predicted that our closest relatives (chimpanzees, gorillas and orangutans) would be relatively well sampled because a great deal of research has focused on these species. We expected most other primate species to be comparatively poorly sampled, except when they are more terrestrial or larger in body mass. As predicted, chimpanzees, gorillas and orangutans were generally better sampled. However, we found incredible variation in sampling among all other major primate groups, intermediate phylogenetic signal in sampling effort and no significant relationship between sampling effort and the phylogenetic distance from humans to the primate in question. Instead, our models suggested that most variation in sampling effort among primates can be explained by the geographic range size and level of terrestriality of the primates. Put more simply, the primates that researchers sample most are the species they encounter most often, including those that are more likely to be on the ground than in the trees. This is also supported by the low sampling of nocturnal primates. However, our models only explained 30% of the variation in sampling effort across primates, indicating that we did not capture every explanation for this variation. Some primates may be sampled because they are already intensively studied for infectious disease, with researchers building on previous knowledge rather than starting from scratch. Other species may be sampled thoroughly because they live in frequently used and wellequipped field sites. Some of the variation in sampling may have more idiosyncratic explanations; for example, the extensive sampling of some Macaca species likely reflects their use in medical research. We also identified great heterogeneity in sampling among countries, even among those in the same region. We found particularly low sampling in parts of South East Asia, Central and Western Africa, and South America, and better sampling in Eastern Africa and Brazil. However, the only variable in our statistical models that predicted sampling effort among countries was the primate species richness of the country, with parasite sampling highest in countries that have more primates to sample. We expected that the GDP of the countries would also positively affect sampling effort, but we found no evidence for this in our analyses, possibly because much of the research is not funded by the country in which the research takes place. In fact, on average, only 22% of tropical biological field station funding comes from the host country [29]. Perhaps a better predictor of sampling effort would be the number of research stations in a country. Our parasite species accumulation curves, for both primate species and countries, were starting to show some downward curvature, but in no cases had they approached an asymptote. In these analyses, we only used species or countries with at least 30 sampling events. This indicates that, at least for these fairly well-sampled primates and countries, sampling is slowly approaching levels sufficient to quantify parasite species richness. However, when we extended these analyses to extrapolate parasite species richness values, we found that even within our best sampled primates and countries, we are missing substantial parasite diversity. On average, we predicted that 38–79% more parasite species than currently reported in the GMPD should be found in our best sampled primate species and 29–40% more parasite species than currently reported in the GMPD should be found in our best sampled countries. This emphasizes exactly how poor our sampling is across all primates and countries. The other primates and countries obviously represent even larger gaps in our knowledge. Identifying future zoonotic disease threats Priorities for future research Identifying parasite sampling gaps across primate species and geographic regions is only the first step, we need to find strategies to minimize these sampling gaps if we are to predict which primate infectious diseases may emerge in humans. One solution is to set research priorities based on the sampling gaps [13], for example, by focusing effort and funding on relatively poorly sampled primate species, arboreal primates, those with small geographic ranges or those found in relatively poorly sampled regions of South East Asia, Central and Western Africa, and South America. Focusing on relatively poorly sampled primate species and areas may improve our general understanding of primate parasites, but it is only one factor in predicting risk to humans. For example, hosts are more likely to share parasites with their close relatives than with more distant relatives [9, 10]. Thus, continuing to focus our sampling efforts on parasites of our closest relatives (chimpanzees, gorillas and orangutans) may provide the greatest return in the case of risks to humans. This is particularly important because we found that chimpanzees are expected to have 33–50% more 35 parasites than currently found in the GMPD. In addition, ecological similarities also influence parasite sharing among primates, and humans share more parasites with terrestrial than arboreal primate species [9, 10]. As with sampling effort, this probably reflects higher contact rates among humans and terrestrial primates compared with arboreal primates. As a related issue, a host living at higher density is expected to have higher prevalence of parasites and may have more contact with human populations or our domesticated animals, thus increasing opportunities for host shifts to humans. The large numbers of zoonotic emerging infectious diseases with rodent or domesticated animal sources also highlight the importance of rates of contact and host density for disease emergence in humans [2, 3, 5]. CONCLUSIONS AND IMPLICATIONS The aim of this study was to identify where the gaps lie in our knowledge of primate parasites. We found that sampling effort was unevenly distributed across primate species and countries, and that the best predictors of sampling effort were the geographic range size or terrestriality of the primate species, or the primate species richness of the country. We also found that, according to our extrapolations of parasite species richness, even our best sampled primates and countries were still vastly undersampled, typically with only a quarter to two-thirds of their parasites documented, and possibly even less given that fungi and bacteria are so underrepresented in current records. This implies that if we want to predict primate disease emergence in humans, more sampling for parasites is needed across all primate species and countries. This is especially important as human populations grow and spread into new areas where they will encounter more primates and consequently more diseases. supplementary data Supplementary data is available at EMPH online and the Dryad repository: doi:10.5061/dryad.510sb. Conflict of interest: none declared. references 1. Wolfe ND, Dunavan CP and Diamond J. Origins of major human infectious diseases. Nature 2007;447:279–83. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Sampling was also uneven across types of parasites; when we analyzed arthropods, helminths, protozoa and viruses separately, we saw faster rates of parasite accumulation in arthropods and helminths but with little evidence for sufficient sampling in these species. This is interesting given that helminths make up 48% of the parasite species in our study (arthropods = 33%; bacteria = 6%; fungi = 0.8%; protozoa = 19% and viruses = 12%). We were not able to fit species accumulation curves to bacteria or fungi because we have so few bacteria and fungi species in our dataset. Given the importance of bacterial and fungal emerging diseases in humans [2, 30], this lack of sampling in wild primates is of concern. Another concern is that although viruses make up only 12% of the parasites in our dataset, viruses arguably present the greatest zoonotic disease threat to humans because their fast rates of evolution will allow them to easily adapt to new hosts [3, 5]. The relatively low number of viruses probably reflects detection bias. Viruses are very hard to detect, and even when detected prove difficult to classify. Therefore, our results probably grossly underestimate the number of viruses present in primates. Cooper and Nunn | 36 Evolution, Medicine, and Public Health | Cooper and Nunn 2. Jones KE, Patel NG, Levy MA et al. Global trends in emerging infectious diseases. Nature 2008;451:990–3. 3. Woolhouse MEJ and Gowtage-Sequeria S. Host range and emerging and reemerging pathogens. Emerg Infect Dis 2005;11:1842–7. 4. Hahn BH, Shaw GM, De KM et al. AIDS as a zoonosis: scientific and public health implications. Science 2000;287: 607–14. 5. Parrish CR, Holmes EC, Morens DM et al. Cross-species virus transmission and the emergence of new epidemic diseases. Microbiol Mol Biol Rev 2008;72:457–70. 6. Roehr B. US officials warn 39 countries about risk of hantavirus among travellers to Yosemite. BMJ 2012;345: 16. Wilson DE and Reeder DAM. Mammal Species of the World: A Taxonomic and Geographic Reference, 3rd edn. Smithsonian Institution Press: Washington, DC, 2005. 17. Jones KE, Bielby J, Cardillo M et al. PanTHERIA: a species-level database of life-history, ecology and geography of extant and recently extinct mammals. Ecology 2009;90:2648. 18. Nowak RM. Walker’s Mammals of the World. The Johns Hopkins University Press: Baltimore, MD, 1999. 19. Central Intelligence Agency. (2009) The World Factbook 2009. Washington, DC, 2009, https://www.cia.gov/library/publications/the-world-factbook/index.html (21 August 2012, date last accessed). e6054. 7. Grard G, Fair JN, Lee D et al. A novel rhabdovirus 20. Emerson JW, Hsu A, Levy MA et al. Environmental Performance Index and Pilot Trend Environmental associated with acute hemorrhagic fever in central Performance Index. Yale Center for Environmental Law Africa. PLoS Pathog 2012;8:e1002924. 8. Kilpatrick AM. Globalization, land use, and the invasion of Comparative Analyses of Phylogenetics and Evolution in R, R package version 0.5, 2012. predict pathogen community similarity in wild primates 22. Pinheiro J, Bates D, DebRoy S et al. nlme: Linear and and humans. Proc R Soc Lond B Biol Sci 2008;275: 1695–701. Nonlinear Mixed Effects Models, R package version 3.1-103, 2012. 10. Cooper N, Griffin R, Franz M et al. Phylogenetic host spe- 23. Blomberg SP, Garland T and Ives AR. Testing for phylo- cificity and understanding parasite sharing in primates. genetic signal in comparative data: behavioral traits are Ecol Lett 2012;15:1370–7. 11. Brownstein JS, Freifeld CC, Reis BY et al (2007) HealthMap: internet-based emerging infectious disease more labile. Evolution 2003;57:717–45. 24. Oksanen J, Blanchet FG, Kindt R et al. vegan: Community Ecology Package, R package version 2.0-3. 2012. intelligence. In: Medicine,IO (ed.), Infectious Disease 25. Dove ADM and Cribb TH. Species accumulation curves Surveillance and Detection: Assessing the Challenges— Finding Solutions. The National Academies Press: and their applications in parasite ecology. Trends Parasit 2006;22:568–74. Washington, DC, 2007, 183–204. 12. Nunn CL and Altizer S. The Global Mammal Parasite Database: an online resource for infectious disease records in wild primates. Evol Anthr 2005;14:1–2. 13. Hopkins ME and Nunn CL (2010) Gap analysis and the geographical distribution of parasites. In: Morand,S and Krasnov,B (eds), The Biogeography of Host–Parasite Interactions. Oxford University Press: Oxford, 2010, 129–42. 14. IUCN. IUCN Red List of Threatened Species. Version 2010.4, 26. Poulin R, Krasnov BR and Mouillot D. Host specificity in phylogenetic and geographic space. Trends Parasit 2011; 27:355–61. 27. Walther BA and Morand S. Comparative performance of species richness estimation methods. Parasitology 1998; 116:395–405. 28. R Development Core Team. (2012) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing: Vienna, Austria, 2012. 29. Whitesell S, Lilieholm RJ and Sharik TL. A global survey of 2010. http://www.iucnredlist.org (25 July 2011, date last tropical biological field stations. BioScience 2002;52: accessed). 55–64. 15. Arnold C, Matthews LJ and Nunn CL. The 10kTrees web- 30. Fisher MC, Henk DA, Briggs CJ et al. Emerging fungal site: a new online resource for primate phylogeny. Evol threats to animal, plant and ecosystem health. Nature Anthr 2010;19:114–8. 2012;484:186–94. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 West Nile virus. Science 2011;334:323–7. 9. Davies TJ and Pedersen AB. Phylogeny and geography and Policy: New Haven, CT, 2012. 21. Orme CDL, Freckleton RP, Thomas GH et al. caper: 173 Evolution, Medicine, and Public Health [2013] pp. 173–186 doi:10.1093/emph/eot015 orig inal research article Hygiene and the world distribution of Alzheimer’s disease Molly Fox*1, Leslie A. Knapp1,2, Paul W. Andrews3 and Corey L. Fincher4 1 Division of Biological Anthropology, Department of Anthropology and Archaeology, University of Cambridge, Pembroke Street, Cambridge CB2 3QY, UK, 2Department of Anthropology, University of Utah, 270 S 1400 E, Salt Lake City, UT 84112, USA, 3Department of Psychology, Neuroscience & Behaviour, McMaster University, 1280 Main Street W, Hamilton, ON L8S 4K1, Canada and 4Institute of Neuroscience and Psychology, University of Glasgow, 58 Hillhead Street, Glasgow G12 8QB, UK *Correspondence address: Department of Biological Anthropology, University of Cambridge, Pembroke Street, Cambridge CB2 3QY, UK. Tel:+44(0)1223 335638; Fax:+44(0)1223 335460; E-mail: mf392@cam.ac.uk Received 13 February 2013; revised version accepted 6 August 2013 ABSTRACT Background and objectives: Alzheimer’s disease (AD) shares certain etiological features with autoimmunity. Prevalence of autoimmunity varies between populations in accordance with variation in environmental microbial diversity. Exposure to microorganisms may improve individuals’ immunoregulation in ways that protect against autoimmunity, and we suggest that this may also be the case for AD. Here, we investigate whether differences in microbial diversity can explain patterns of age-adjusted AD rates between countries. Methodology: We use regression models to test whether pathogen prevalence, as a proxy for microbial diversity, across 192 countries can explain a significant amount of the variation in age-standardized AD disability-adjusted life-year (DALY) rates. We also review and assess the relationship between pathogen prevalence and AD rates in different world populations. Results: Based on our analyses, it appears that hygiene is positively associated with AD risk. Countries with greater degree of sanitation and lower degree of pathogen prevalence have higher age-adjusted AD DALY rates. Countries with greater degree of urbanization and wealth exhibit higher age-adjusted AD DALY rates. Conclusions and implications: Variation in hygiene may partly explain global patterns in AD rates. Microorganism exposure may be inversely related to AD risk. These results may help predict AD burden ß The Author(s) 2013. Published by Oxford University Press on behalf of the Foundation for Evolution, Medicine, and Public Health. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Epidemiological evidence for a relationship between microbial environment and age-adjusted disease burden 174 | Fox et al. Evolution, Medicine, and Public Health in developing countries where microbial diversity is rapidly diminishing. Epidemiological forecasting is important for preparing for future healthcare needs and research prioritization. K E Y W O R D S : Alzheimer’s disease; hygiene hypothesis; inflammation; dementia; pathogen preva- lence; Darwinian medicine INTRODUCTION in the household [22], exhibiting lower rates of atopic and autoimmune disorders. Low-level persistent stimulation of the immune system leads naı¨ve T-cells to take on a suppressive regulatory phenotype [23] necessary for regulation of both type-1 inflammation (e.g. autoimmunity) and type-2 inflammation (e.g. atopy) [12, 24, 25]. Individuals with insufficient immune stimulation may experience insufficient proliferation of regulatory T-cells (TRegs) [26, 27]. AD has been described as a disease of systemic inflammation [28], with the AD brain and periphery exhibiting upregulated type1 dominant inflammation [29]: a possible sign of TReg deficiency. Immunodysregulation as a consequence of low immune stimulation may contribute to AD risk through the T-cell system. Altogether, we suggest that a hygiene hypothesis for Alzheimer’s disease (HHAD) predicts that AD incidence may be positively correlated to hygiene. The period from gestation through childhood is typically thought to be a critical window of time during which the immune system is established [14, 30, 31], with some authors limiting this critical window to the first 2 years of life [32]. However, proliferation of TRegs occurs throughout the life course: there are age-related increases in number of TRegs [30, 33] with peaks at adolescence and in the sixth decade [34]. Therefore, it may be not only early-life immune stimulation that affects AD risk (and perhaps risk of other types of immunodysregulation) but also immune stimulation throughout life. Our study is designed based on the hypothesis that microorganism exposure across the lifespan may be related to AD risk. At an epidemiological level, our prediction is opposite to Finch’s [6] hypothesis that early-life pathogen exposure should be positively correlated to AD risk. Both their and our predictions are based on speculation about T-cell differentiation, although we reached opposite suppositions. It is clear that inflammation is upregulated in AD, and Finch suggested that pathogen exposure, which is proinflammatory, may increase AD risk [6]. We propose Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Exposure to microorganisms is critical for the regulation of the immune system. The immunodysregulation of autoimmunity has been associated with insufficient microorganism exposure [1]. Global incidence patterns of autoimmune diseases reflect this aspect of their etiology: autoimmunity is inversely correlated to microbial diversity [1, 2]. The inflammation characteristic of Alzheimer’s disease (AD) shares important similarities with autoimmunity [3, 4]. The similarity in immunobiology may lead to similarity in epidemiological patterns. For this reason, here we test the hypothesis that AD incidence may be positively correlated to hygiene. The possibility that AD incidence is related to environmental sanitation was previously introduced by other authors [5, 6], and remains as of yet untested. The ‘hygiene hypothesis’ [7] suggests that certain aspects of modern life (e.g. antibiotics, sanitation, clean drinking-water, paved roads) are associated with lower rates of exposure to microorganisms such as commensal microbiota, environmental saprophytes and helminths than would have been omnipresent during the majority of human history [8]. Low amount of microbe exposure leads to low lymphocyte turnover in the developing immune system, which can lead to immunodysregulation. Epidemiological studies have found that populations exposed to higher levels of microbial diversity exhibit lower rates of autoimmunities as well as atopies [9], a pattern that holds for countries with differing degrees of development [10, 11]. Differences in environmental sanitation can partly explain the patterns of autoimmunity and atopy across history and across world regions [2, 12]. Patient-based studies have demonstrated that individuals whose early-life circumstances were characterized by less exposure to benign infectious agents exhibit higher rates of autoimmune and atopic disorders. This pattern has been demonstrated for farm living versus rural non-farm living [13–15], daycare attendance [16, 17], more siblings [7, 18], later birth order [19–21] and pets Fox et al. | Hygiene and Alzheimer’s epidemiology METHODS In order to test whether there is epidemiological evidence for an HHAD, data for a wide range of countries were compared. Age-standardized disability-adjusted life-year (DALY) rates (henceforth ‘AD rates’) in 2004 were evaluated in light of proxies for microbial diversity across a range of years selected to fully encompass lifespans of individuals in the AD-risk age group in 2004. Prevalence of Alzheimer’s and other dementias We utilized the WHO’s Global Burden of Disease (GBD) report published in 2009, which presents data for 2004 [36]. The WHO only reports information for AD and other dementias across different countries, rather than AD alone. This variable does not include Parkinsonism [37]. While there are other types of dementia, AD accounts for between 60% and 80% of all dementia cases [38], and the neurobiological distinctions between some of the subtypes may be vague [39, 40]. The WHO report presents three variables related to AD: agestandardized DALY, age-standardized deaths, and DALY for age 60+. There is low correlation between these three measures (linear regressions after necessary data transformations had R-squared values 0.040; 0.089; 0.041). The WHO’s GBD report includes the following ICD-10 codes as ‘Alzheimer’s and other dementias:’ F01, F03, G30–G31, uses an incidence-based approach, 3% time discounting, the West Level 26 and 25 life tables for all countries assuming global standard life expectancy at birth [41], standard GBD disability weights [42] and non-uniform age-weights [43, 44]. DALYs are calculated from years of lost life (YLL) and years lost due to disability (YLD). YLL data sources for AD included death registration records for 112 countries, population-based epidemiological studies, disease registers and notification systems [45]. Vital registration data with coverage over 85% were available for 76 countries, and information for the remaining 114 countries was calculated using a combination of cause-of-death modeling, regional patterns and cause-specific estimates (see [45] for details). YLD data sources for AD included disease registers, population surveys and existing epidemiological studies [43]. When only prevalence data were available, incidence statistics were modeled from estimates of prevalence, remission, fatality and background mortality using the WHO’s DisMod II software [43]. Age-standardized rates were calculated by adjusting the crude AD DALY for 5-year age groups by age-weights reflecting the age-distribution of the standard population [35]. The new WHO World Standard was developed in 2000 to best reflect projections of world age-structures for the period 2000–25 [35], and particularly closely reflects the population age-structures of low- and middleincome countries [46]. DALYs as better measure than death rates. AD as cause of death is a particularly flawed measurement across different countries, as certain countries rarely attribute mortality to AD or recognize dementia as abnormal aging [47], and other causes of death may occur at far greater frequencies masking AD prevalence. The DALY measurement is the sum of years lost due to premature mortality and years spent in disability. The number of years lost due to premature mortality is based on the standard life expectancy at the age when death due to AD occurs [48]. Therefore, this measurement omits the effects of differential mortality rates previous to age 65, which accounts for the vast majority of differences in life expectancy between developed and developing countries [49], opportunely isolating the effects of later life mortality causes. Therefore, DALY includes but is not limited to AD as official cause of Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 that at the population level, pathogen exposure, as a proxy for benign microorganism exposure, may be protective against AD. Griffin and Mrak [5] suggested that an HHAD is justified because hygiene-related changes in immune development are probably related to AD etiology, but it is not possible to predict the directionality of this effect. They focus on how hygiene might influence microglial activation and IL-1 expression, and the opposing effects these changes may have on AD-relevant pathways. Our prediction, Finch’s contrary [35] prediction, and the assertion of Griffin and Mrak that further research is needed to establish the directionality of hygiene’s effect on AD motivated this empirical investigation of whether pathogen prevalence was correlated to AD rates at the country level. This information could lead to a better understanding of the environmental influences on AD etiology and could help judge the accuracy of existing predictions for future AD burden. 175 176 | Fox et al. Evolution, Medicine, and Public Health death, making it a more inclusive variable than AD as cause of death. Predictive variables The hygiene hypothesis is sometimes referred to as the ‘old friends hypothesis’ [52]. This alternative title highlights the fact that for the vast majority of our species’ history, humans would have been regularly exposed to a high degree of microbial diversity, and the human immune system was shaped through natural selection in these circumstances [53]. With rapidly increasing global urbanization beginning in the early nineteenth century, individuals began to experience diminishing exposure to these ‘friendly’ microbes due to diminishing contact with animals, feces and soil [1]. The microbes that were our ‘old friends’ previous to this epidemiological transition whose absence may lead to immunodysregulation in modern environments included gut, skin, lung and oral microbiota; orofecally transmitted bacteria, viruses and protozoa; helminthes; environmental saprophytes; and ectoparasites [1, 52]. The predictive variables in our analysis were selected for their relevance to these ‘old friends’. These variables do not specifically reflect exposure to those microbiota and commensal microorganisms that were omnipresent during our evolutionary history, but rather cover a more general collection of microbial exposures. This inclusive approach is both because of limitations of available datasets, and because it is not known if particular microbial elements specifically relate to AD etiology. Statistical methods All variables were transformed to optimize distribution symmetry to avoid undue influence by countries with extreme values for each variable (Supplementary Section S1). Linear regression was used to determine whether there was a relationship between each of the microbial exposure proxies and AD rate between countries. As there was reason to suspect predictive overlap among the variables, a Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Age-standardized data better indicator than age 60+ data. Age-standardized rates represent what the burden of AD would be if all countries had the same agedistribution [35]. While the WHO GBD report also provides DALY for ages 60+, these data are not age-standardized. Using DALY for ages 60+, a country with a significant population over age 85, for instance, would always appear to have greater AD incidence than a country with few people over 85, regardless of differences in age-specific AD incidence in the two countries. This is due to the exponential way in which AD risk increases with age [50, 51]. Differences in age-specific AD incidence would be masked by differences in population age structure. See Supplementary Section S6 for examples and further explanation. Murray and Schaller [54] assessed historical disease prevalence for the years 1944–61. Their ‘sevenitem index of historical disease prevalence’ included leishmanias, schistosomes, trypanosomes, malaria, typhus, filariae and dengue. Their ‘nine-item index’ included the above diseases plus leprosy and a contemporary measure of tuberculosis. Another measure of pathogen exposure [55] combines data from the WHO and the ‘Global Infectious Diseases and Epidemiology Network’ (GIDEON) for 2002 and 2009. The WHO reports countries’ ‘percent population using improved sanitation facilities’, which ‘separate human excreta from human contact’ [56], and ‘improved drinking-water sources’, which protect the source from contamination [56]. Of 3 years for which sufficient data were available, we chose to look at 1995 (Supplementary Section S1). ‘Infant mortality rate’ (IMR) is measured as number of infant mortalities per 1000 live births [57]. We consulted historical IMR statistics [58] for years 1900– 2002. The World Bank reports countries’ ‘gross national income’ (GNI) per capita at purchasing power parity (PPP) [57], the economic variable that contributes to the composite Human Development Index and is therefore an indicator of the contribution of wealth to quality of life. We looked at both the earliest year with sufficient data available, 1970 and 2004. ‘Gross domestic product’ (GDP) per capita (PPP) is an economic measure of a country divided by its mid-year population [57], often used as a measure of standard of living, although it is not a direct assessment of this. We consulted Maddison’s historical economic statistics [59] for GDP from 1900 to 2002. Children growing up in rural areas may be more exposed to pathogens due to factors including unpaved roads, contact with livestock and animal feed, and consumption of unprocessed milk [13, 14]. The World Bank reports the ‘percent of a country’s population living in urban areas’ [57], and we looked at both the earliest year with data available, 1960 and 2004. Fox et al. | Hygiene and Alzheimer’s epidemiology principal component regression was conducted (Supplementary Section S4). To explore which part of the lifespan has the most bearing on AD, we measure the strength of the same proxy across several years as a predictor of 2004 AD prevalence. We compare historical and contemporary disease prevalence, and GNI and urban living. Also, two of the proxies had more detailed historical information available: GDP [59] and IMR [58]. GDPs and IMRs from years spanning 1900–2002 were compared. Each year’s GDP and IMR were compared to 2004 AD rates by linear regression. The resulting regression coefficients were interpreted as the degree to which each year’s GDP or IMR and 2004 AD were correlated (Supplementary Section S5). Statistical analyses revealed highly significant relationships between various measures of hygiene and age-adjusted AD DALY. High levels of pathogen exposure were associated with lower AD rates. Countries with higher disease and pathogen prevalence and IMR had lower 2004 AD rates (Table 1, Figs 1–4, Fig. S1). These results are consistent with a protective role of exposure to microbial diversity against AD, and support an HHAD. Greater degree of hygiene, and therefore potentially lower degree of microorganism exposure, was associated with higher AD rates. Countries with a higher percent of the population using improved sanitation facilities, improved drinking-water sources, living in urbanized areas, higher GNI and GDP per capita (PPP) had higher rates of AD in 2004 (Table 1, Fig. 3, Figs S2–S5). The existence of a strong positive correlation between historical levels of sanitation and AD rate in 2004 is consistent with the predictions of the HHAD. Comparing predictive power of data from different years Our results indicate that microbe exposure across the lifespan, not necessarily just during early-life, is associated with AD burden. Predictive variables reflecting the years during which elderly people in 2004 would have experienced their earlier years of life were not consistently better indicators of 2004 AD rate compared to years during which they would have spent their mid or later life years, consistent with observations of lifelong TReg proliferation [30, 33, 34]. Contemporary measures were more powerful indicators in the cases of contemporary parasite stress versus historical disease prevalence, and GNI in 2004 versus 1970, while percent of population living in urban areas in 1960 versus 2004 was a slightly better indicator of AD in 2004 (Fig. 3). A series of linear regressions measuring the relationship between IMRs from 1900 to 2002 and AD in 2004 revealed that IMR was significantly Table 1. An evaluation of proxies for hygiene as predictors of Alzheimer burden Variable Year N R2 P Direction consistent with HHAD? Historical disease prevalence nine-item Historical disease prevalence seven-item Contemporary parasite stress Improved sanitation facilities Improved drinking-water sources GNI 1944–61 1944–61 2002, 2009 1995 1995 1970 2004 148 191 190 177 178 171 174 0.358 0.242 0.373 0.334 0.327 0.296 0.328 **** **** **** **** **** **** **** Yes Yes Yes Yes Yes Yes Yes Urban % 1960 2004 187 187 0.282 0.204 **** **** Yes Yes Linear regression for each predictive variable after transformation to optimize symmetry (Supplementary Section S1). Principal component regression for parasite stress, sanitation facilities, drinking-water sources, infant mortality, GNI and urbanization had a higher R2 than any predictive variable on its own (Supplementary Section S4). ****P < 0.0000. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 RESULTS 177 178 | Fox et al. Evolution, Medicine, and Public Health Countries with higher contemporary parasite prevalence have lower age-standardized rates of Alzheimer’s in 2004. N = 190, R2 = 0.373, P < 0.0000. Contemporary parasite stress [55] combines years 2002 and 2009, and the variable is transformed by adding a constant and taking square root. The Alzheimer variable is transformed by adding a constant and taking the natural log of 2004 Alzheimer age-standardized DALY [36] negatively correlated to 2004 AD rates with an increasing annual ability to explain variance (Fig. 4, Supplementary Section S5). This analysis was restricted to countries with IMR data across 1900–2002 (N = 45) and thus was free from sample size biases. Similarly, a series of linear regressions measuring the relationship between GDPs from 1900 to 2002 and AD in 2004 revealed that GDP was significantly positively correlated to 2004 AD rates, although the increasing annual ability to explain variance was only consistently true for years 1940–2002 (Fig. S7). Principal component analysis A principal component regression analysis demonstrated that the combined statistical impact of the proxies for microbial diversity had a significant effect on AD rates. There were high degrees of correlation between disease prevalence, parasite stress, sanitation facilities, improved drinking water, GNI and urban population (Table S1). There was a significant relationship between the principal component and 2004 AD rate (N = 159, R2 = 0.425, P < 0.0000) (Supplementary Section S4, Fig. S6). DISCUSSION Inflammation plays an important role in AD pathogenesis, and previous authors have hypothesized that immune stimulation could increase AD risk through T-cell [6] or microglial action [5, 60]. However, our results indicate that some immune Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Figure 1. Countries’ parasite stress negatively correlated to Alzheimer’s burden Fox et al. | Hygiene and Alzheimer’s epidemiology Countries with historically more infectious disease have lower age-standardized rates of Alzheimer’s in 2004. N = 148, R2 = 0.358, P < 0.0000. Historical disease prevalence 9-item data compiled by Murray and Schaller for years 1944–61 [54]. Alzheimer prevalence variable is transformed by adding a constant and taking the natural log of 2004 Alzheimer age-standardized DALY [36] stimulation may protect against AD risk. The traditional hygiene hypothesis has effectively explained prevalence rates of autoimmunity and atopy in many populations. We have found that the hygiene hypothesis can help explain some patterns in rates of AD, as predicted previously [5, 8] with a direction of correlation consistent with our predictions based on T-cell differentiation. In support of the HHAD, we found that proxies for immune stimulation correlated to AD rates across countries. These proxies included historical disease prevalence, parasite stress, access to sanitation facilities, access to improved drinking-water sources, IMR, GNI, GDP and urbanization. These variables’ relationships to AD rate are independent of countries’ age structures. Further evidence supporting this hypothesis from previous studies includes higher incidence of AD in developed compared to developing countries, comparison to incidence patterns of other disorders with similar immunobiology and consideration of the etiology of inflammation in AD. Higher Alzheimer’s risk in industrialized countries and urban environments People living in developed compared to developing countries have higher rates of AD. AD incidence at age 80 is higher in North America and Europe than in other countries [51]. A meta-analysis found that dementia incidence doubled every 5.8 years in high income countries and every 6.7 years in lowand middle-income countries, where the overall incidence of dementia was 36% lower [39] (but see [61]). Another meta-analysis found that agestandardized AD prevalences in Latin America, China and India were all lower than in Europe, and within those regions, lower in rural compared to urban settings [62]. In a meta-analysis of Asian countries, the wealthier ones had higher Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Figure 2. Countries’ historical disease prevalence negatively correlated to Alzheimer’s burden 2004 179 180 | Fox et al. Evolution, Medicine, and Public Health Countries with more of the population living in urban areas in 1960 have higher age-standardized rates of Alzheimer’s in 2004. N = 187, R2 = 0.282, P < 0.0000. Urbanization data [57] are transformed by adding a constant and taking the square root. The Alzheimer prevalence variable is transformed by adding a constant and taking the natural log of 2004 Alzheimer age-standardized DALY [36] age-standardized rates of AD prevalence than the poorer countries [63]. A meta-analysis of rural versus urban AD incidence demonstrated that, overall, rural living was associated with higher incidence, but when only high-quality studies were considered, rural living was associated with reduced incidence [47]. In developing countries and rural environments, there are higher rates of microbial diversity, exposure and infection. It is well-documented that rates of atopies including allergies, hay fever, eczema and asthma are lower in developing than developed countries [12, 64], as are autoimmunities such as multiple sclerosis, type-1 diabetes mellitus and Crohn’s disease [65], and the same pattern holds for rural versus urban environments [2, 14, 15] (but see [66]). We found that wealthier countries (Figs S3 and S4) and more urbanized countries (Fig. 2) had greater AD rates after adjustment for population age-structures. Alzheimer’s risk changes with environment Previous studies have found that individuals from similar ethnic backgrounds living in low versus high sanitation environments exhibit low versus high risk of AD. African Americans in Indiana had higher agespecific mortality-adjusted incidence rates of AD than Yoruba in Nigeria [67], and while there is no indication that the sampled African Americans had any Yoruba or even Nigerian heritage, the results are at least consistent with our predictions for the HHAD. There is also evidence that immigrant populations exhibit AD rates intermediate between their home country and adopted country [63, 68–70]. Moving from a high-sanitation country to a Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Figure 3. Countries’ urbanization 1960 positively correlated to Alzheimer’s burden 2004 Fox et al. | Hygiene and Alzheimer’s epidemiology For each year (x), the regression coefficient (y) of the correlation between year’s (x) IMR and AD burden in 2004. IMR for the various years were transformed by square root and natural log. See Supplementary Section S5. All correlations were significant besides the years 1900, 1901 and 1911. Significance: 1902–19, P < 0.05; 1920–44, P < 0.01; 1945–2002, P < 0.000 low-sanitation country can decrease immigrants’ AD risk [47] [e.g. Italy to Argentina [71] (but see [72])]. Our results are consistent with the possibility that living in different countries confers different AD risk, stratified by sanitation (Figs 1 and 2). Alzheimer’s risk and family size Having relatively more siblings would be expected to correspond to higher rates of early-life immune activation because of more interaction with other children who may harbor ‘friendly’ commensal microorganisms. Individuals with more siblings have lower atopy prevalence [7, 18, 20]. Certain studies have found that individuals with AD have fewer siblings than controls [73, 74], but Moceri and coworkers found no difference [75], and the opposite [76]. Younger siblings compared to first-borns would be expected to have higher rates of microorganism exposure because this would indicate higher proportion of childhood spent in contact with siblings. Later-born siblings have lower rates of atopy [7, 20, 66, 77, 78]. No birth order effect has been observed for AD [79]. Lymphocytes and Alzheimer’s T cells are important modulators of immune function and have been identified as the major affected system in trends attributable to the hygiene hypothesis [8]. TReg cells become more abundant with age in healthy subjects [80]. Whether the proliferation of T cells in AD [80, 81] is effector or regulatory has been the subject of controversy. Recently, Larbi et al. re-analyzed their earlier postulation that TReg cells may be upregulated in AD [82]. When a marker for TRegs, FoxP3 [83], was considered, the authors determined that the upregulation occurring was proliferation of activated effector T cells [84], consistent with our predictions for an HHAD. One study found that TReg cells were increased in mild cognitive impairment (MCI) and TReg-induced immunosuppression was stronger in MCI than in AD or controls [85]. This could be evidence that among people with predisposition toward AD (i.e. genetic or environmental risk factors), those with adequate TReg function may merely develop MCI, while those with inadequate TReg function may develop AD. Also, it may be that during AD pathogenesis, those with adequate TReg function may Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Figure 4. Contemporary environment may be better indicator of Alzheimer’s burden than early-life environment 181 182 | Fox et al. Evolution, Medicine, and Public Health ApoE ApoE-e4 is an allele that has pro-inflammatory effects, and increases BBB permeability [95]. Compromise of the BBB would make other proinflammatory mechanisms become exacerbated risk factors for AD, when otherwise, inflammation might have been limited to the periphery. There is already evidence that ApoE alleles confer different degrees of AD risk in different environments. While the e4 allele was an AD risk factor among African Americans, this was not the case among the Yoruba in Nigeria [96], nor is e4 associated with increased risk among Nyeri Kenyans, Tanzanians [97], Wadi Ara Arab Israelis [98], Khoi San [99], Bantu and Nilotic African cohorts [100]. These patterns support the possibility that environmental factors such as microbial diversity interact with inflammatory pathways to influence AD risk. Darwinian medicine One of the major aims of Darwinian medicine is understanding human health and disease within the context of our species’ evolutionary history [101]. In their formative 1991 paper, Williams and Nesse discussed the promise for Darwinian medicine to make strides in aging research [102]. There is growing enthusiasm for Darwinian medicine approaches to understanding aging [103], specifically neurodegeneration [104] and more specifically AD (Supplementary Section S7). The hygiene hypothesis is an important contribution from Darwinian medicine [52]. Recently, authors have suggested extending the hygiene hypothesis toward explaining obesity [105] and certain cancers [106, 107]. Thus far, the hygiene hypothesis has been discussed mostly in the context of consequences for early and mid-life pathologies, and we feel that more attention should be paid to its potential to explain patterns of disease in later life. Study limitations Some critics suggest that age-weighting is not reflective of social values [42, 108]. A systematic review of GBD methodologies determined that the WHO’s report [42] represented a rigorously comprehensive methodology, but is still limited by the fact that many developing countries use paper-based health surveillance based more on estimates and projections than actual counts. It is also possible that certain predictive variables, especially urbanization, are related to surveillance accuracy. Large-scale epidemiological studies often suffer from lack of surveillance and statistical limitations. Much of public health and epidemiological research is based on correlational studies, inherently limited by the inability to demonstrate causality. Nonetheless, these types of investigation provide necessary perspective of environmental influences on biological mechanisms, and help evaluate public health burden and predict future healthcare needs. Importance and applications As AD becomes an increasingly global epidemic, there is growing need to be able to predict AD rates across world regions in order to prepare for the future public health burden [39]. It is in low- and middle-income countries that the sharpest rise is predicted to occur in the coming decades [109]. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 linger in the MCI phase for longer, while those with inadequate TReg function may progress to AD more rapidly. It has already been well established that insufficient TReg numbers lead to excessive TH1 inflammation in autoimmunity and TH2 inflammation in atopy [52, 86]. The inflammation characteristic of AD is TH1 dominant [29, 87, 88]. In AD, amyloid-b activates microglia and astrocytes, which stimulate preferential TH1 proliferation [89], and AD patients have elevated levels of TH1-associated cytokines [90–92]. Non-steroidal anti-inflammatory drugs have been demonstrated to be protective against AD [93]. T cells may influence AD pathogenesis not only from within the brain but also from the periphery. Activated TH1 cells in the periphery could secrete pro-inflammatory cytokines, which cross the blood– brain barrier (BBB) and directly activate microglia and astrocytes in the brain, as well as indirectly induce inflammation by activating dendritic cells [89]. It should be investigated whether hygiene directly affects the development of microglia. Microglia sometimes exhibit a non-inflammatory phenotype, in contrast to the pro-inflammatory phenotype typical of the activated state in the context of AD and other brain insults [5, 94], It is unknown whether hygiene would promote development of the inflammatory or non-inflammatory phenotypes [5], and further research is needed to establish the effect of microbial deprivation on microglial development. Fox et al. | Hygiene and Alzheimer’s epidemiology We suggest that ecological changes that will occur in low- and middle-income countries as they become more financially developed may affect AD rates in ways not currently appreciated. Other authors have also discussed the ways in which infectious disease may affect patterns of non-communicable chronic disease [110]. Disease predictions affect public policy, healthcare prioritization, research funding and resource allocation [111]. Better methods for estimating AD rates could affect these policies and strategies [112]. The next step in testing the HHAD should be to directly investigate patients’ immune activity throughout life and AD risk, preliminarily in a cross-sectional study utilizing medical records, serum analysis, or reliable interview techniques. nutrition. Proc Natl Acad Sci USA 2010; 107(Suppl. 1):1718–24. 7. Strachan DP. Hay fever, hygiene, and household size. Br Med J 1989;299:1259–60. 8. Rook GAW (ed.). Introduction: The changing microbial environment, Darwinian medicine and the hygiene hypothesis, The Hygiene Hypothesis and Darwinian Medicine. Springer: Basel, Switzerland, 2009, 1–27. 9. Lynch NR, Hagel I, Perez M et al. Effect of anthelmintic treatment on the allergic reactivity of children in a tropical slum. J Allergy Clin Immunol 1993;92:404–11. 10. Von Mutius E, Fritzsch C, Weiland SK et al. Prevalence of asthma and allergic disorders among children in united Germany: a descriptive comparison. Br Med J 1992;305: 1395–9. 11. Cookson WOCM, Moffatt MF. Asthma—an epidemic in the absence of infection? Science 1997;275:41–2. 12. Romagnani S. The increased prevalence of allergy and the Supplementary data are available at EMPH online. immune suppression, or both? Immunology 2004;112: 352–63. 13. Leynaert B, Neukirch C, Jarvis D et al. Does living on a farm during childhood protect against asthma, allergic rhinitis, acknowledgements and atopy in adulthood? Am J Respir Crit Care Med 2001; 164:1829–34. The authors would like to thank the Gates Cambridge Trust 14. von Mutius E, Vercelli D. Farm living: effects on child- and Tom LeDuc. They also appreciate the insights of the hood asthma and allergy. Nat Rev Immunol 2010;10: associate editor and anonymous reviewers who handled this 861–8. manuscript. 15. Kilpelainen M, Terho E, Helenius H et al. Farm environment in childhood prevents the development of allergies. funding Clin Exp Allergy 2000;30:201–8. 16. Haby MM, Marks GB, Peat JK et al. Daycare attend- M.F. is supported by Gonville & Caius College, Cambridge. ance before the age of two protects against atopy C.L.F. is supported by ESRC grant ES/I031022/1. in preschool age children. Pediatr Pulmonol 2000;30: 377–84. Conflict of interest: None declared. 17. Kramer U, Heinrich J, Wjst M et al. Age of entry to day nursery and allergy in later childhood. Lancet 1999;353: references 450–4. 18. Von Mutius E, Martinez FD, Fritzsch C et al. Skin test re- 1. Rook GAW. Hygiene hypothesis and autoimmune diseases. Clin Rev Allergy Immunol 2012;42:5–15. activity and number of siblings. Br Med J (Clin Res Ed) 1994;308:692–5. 2. Rook GAW. The hygiene hypothesis and the increasing 19. Westergaard T, Rostgaard K, Wohlfahrt J et al. Sibship prevalence of chronic inflammatory disorders. Trans R characteristics and risk of allergic rhinitis and asthma. Soc Trop Med Hyg 2007;101:1072–4. 3. D’Andrea MR. Add Alzheimer’s disease to the list of autoimmune diseases. Med Hypotheses 2005;64:458–63. 4. D’Andrea MR. Evidence linking neuronal cell death to autoimmunity in Alzheimer’s disease. Brain Res 2003; 982:19–30. Am J Epidemiol 2005;162:125–132. 20. Matricardi PM, Franzinelli F, Franco A et al. Sibship size, birth order, and atopy in 11,371 Italian young men. J Allergy Clin Immunol 1998;101:439–44. 21. Strachan DP, Taylor EM, Carpenter RG. Family structure, neonatal infection, and hay fever in adolescence. Arch Dis medicine and the hygiene hypothesis in Alzheimer patho- Child 1996;74:422–6. 22. Apter AJ. Early exposure to allergen: is this the cat’s meow, genesis? In: Graham AWR (ed.) The Hygiene Hypothesis or are we barking up the wrong tree? J Allergy Clin Immunol 5. Griffin WST, Mrak RE. Is there room for Darwinian and Darwinian Medicine. Springer: Basel, Switzerland, 2009, 257–78. 6. Finch CE. Evolution of the human lifespan and diseases of aging: roles of infection, inflammation, and 2003;111:938–46. 23. Abbas AK, Lohr J, Knoechel B. Balancing autoaggressive and protective T cell responses. J Autoimmun 2007;28: 59–61. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 hygiene hypothesis: missing immune deviation, reduced supplementary data 183 184 | Fox et al. Evolution, Medicine, and Public Health 24. Yazdanbakhsh M, Kremsner PG, Van Ree R. Allergy, parasites, and the hygiene hypothesis. Science 2002;296: 490–4. 25. Chang JS, Wiemels JL, Buffler PA. Allergies and childhood leukemia. Blood Cells Mol Dis 2009;42:99–104. Evidence for Health Policy. World Health Organization: Geneva, 2001. 42. Polinder S, Haagsma JA, Stein C et al. Systematic review of general burden of disease studies using disabilityadjusted life years. Popul Health Metr 2012;10:1–15. 26. Vukmanovic-Stejic M, Zhang Y, Cook JE et al. Human 43. Mathers CD, Lopez AD, Murray CJ. The burden of disease CD4+ CD25hi Foxp3+ regulatory T cells are derived by and mortality by condition: data, methods, and results for rapid turnover of memory populations in vivo. J Clin 2001. In: Lopez AD, Mathers CD, Ezzati M, Jamison DT Invest 2006;116:2423–33. 27. Sakaguchi S, Miyara M, Costantino CM et al. FOXP3+ regulatory T cells in the human immune system. Nat Rev Immunol 2010;10:490–500. 28. Pellicano M, Larbi A, Goldeck D et al. Immune profiling of Alzheimer patients. J Neuroimmunol 2011;242:52–9. et al. (eds) Global Burden of Disease and Risk Factors. World Bank: Washington, DC, 2006, 45–241. 44. Williams A. Calculating the global burden of disease: time for a strategic reappraisal? Health Econ 1999;8:1–8. 45. Mathers C, Fat DM, Boerma J. The Global Burden of Disease: 29. Schwarz MJ, Chiang S, Muller N et al. T-helper-1 and 2004 Update. World Health Organization: Geneva, 2008. 46. World Health Organization. Information on Estimation T-helper-2 responses in psychiatric disorders. Brain Methods. Global Health Observatory (GHO) 2013. http:// Behav Immun 2001;15:340–70. www.who.int/gho/ncd/methods/en/ (2 August 2013, 30. Prescott SL. Promoting tolerance in early life: pathways date last accessed). 47. Russ TC, Batty GD, Hearnshaw GF et al. Geographical 31. Smith M, Tourigny MR, Noakes P et al. Children with egg variation in dementia: systematic review with meta- allergy have evidence of reduced neonatal CD4+ CD25+ analysis. Int J Epidemiol 2012;41:1012–32. 48. World Health Organization. Mortality and Burden of CD127 lo/- regulatory T cell function. J Allergy Clin Immunol 2008;121:1460–6. 32. Nesse RM, Williams GC. Allergy. Why We Get Sick: The New Science of Darwinian Medicine. Chapter 11. New York: Random House, 1996, 158–70. Disease Estimates for WHO Member States in 2004. Geneva: WHO, 2009. http://www.who.int/healthinfo/ global_burden_disease/estimates_country/en/index. html (29 July 2013, date last accessed). 33. Gregg R, Smith C, Clark F et al. The number of human 49. Kalaria RN, Maestre GE, Arizaga R et al. Alzheimer’s peripheral blood CD4+ CD25high regulatory T cells disease and vascular dementia in developing countries: increases with age. Clin Exp Immunol 2005;140:540–6. prevalence, management, and risk factors. Lancet Neurol 34. Caetano Faria AM, Monteiro de Moraes S, Ferreira de 2008;7:812–26. Freitas LH et al. Variation rhythms of lymphocyte subsets 50. Brookmeyer R, Gray S, Kawas C. Projections of during healthy aging. Neuroimmunomodulation 2008;15: Alzheimer’s disease in the United States and the public 365–79. health impact of delaying disease onset. Am J Public Health 35. Ahmad OB, Boschi-Pinto C, Lopez AD et al. Age Standardization of Rates: A New WHO Standard. World Health Organization: Geneva, 2001. 1998;88:1337–42. 51. Ziegler-Graham K, Brookmeyer R, Johnson E et al. Worldwide variation in the doubling time of Alzheimer’s 36. World Health Organization. Burden of Disease. Health disease incidence rates. Alzheimers Dement 2008;4: Statistics and Health Information Systems: Disease and 316–23. 52. Rook G. 99th Dahlem conference on infection, inflamma- Injury Country Estimates. Geneva: WHO, 2009. http:// www.who.int/healthinfo/global_burden_disease/ tion and chronic inflammatory disorders: Darwinian estimates_country/en/index.html (29 July 2013, date last medicine and the ‘‘hygiene’’ or ‘‘old friends’’ hypothesis. accessed). Clin Exp Immunol 2010;160:70–9. 37. World Health Organization. Deaths and DALYs 2004: 53. Nesse RM, Williams GC. Why We Get Sick: The Analysis Categories and Mortality Data Sources (Annex C). New Science of Darwinian Medicine. Random House: Global Burden of Disease: 2004 Update 2009. http://www. New York, 1996. 54. Murray DR, Schaller M. Historical prevalence of infectious who.int/ (29 August 2013, date last accessed). 38. Alzheimer’s Association. 2012 Alzheimer’s disease facts and figures. Alzheimer Dement 2012;8:131–68. 39. Sosa-Ortiz AL, Acosta-Castillo I, Prince MJ. Epidemiology of dementias and Alzheimer’s disease. Arch Med Res 2012; 43:600–8. 40. Ince P. Pathological correlates of late-onset dementia in a multicentre, community-based population in England and Wales. Lancet 2001;357:169–75. 41. Mathers C, Vos T, Lopez A et al. National Burden of Disease Studies: A Practical Guide. WHO Global Program on diseases within 230 geopolitical regions: A tool for investigating origins of culture. J Cross Cult Psychol 2010; 41:99–108. 55. Fincher CL, Thornhill R. Parasite-stress promotes in-group assortative sociality: the cases of strong family ties and heightened religiosity. Behav Brain Sci 2012;35:61–79. 56. World Health Organization. MDG 7: Environment Sustainability. Global Health Observatory Data Repository, 2011. http://apps.who.int/ghodata (2 August 2013, date last accessed). Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 and pitfalls. Curr Allergy Clin Immunol 2008;21:64–9. Fox et al. | Hygiene and Alzheimer’s epidemiology 57. The World Bank. World Development Indicators, 2011. 74. Alafuzoff I, Almqvist E, Adolfsson R et al. A comparison of http://data.worldbank.org (29 July 2013, date last multiplex and simplex families with Alzheimer’s disease/ accessed). senile dementia of Alzheimer type within a well defined 58. Abouharb MR, Kimball AL. A new dataset on infant mortality rates, 1816–2002. J Peace Res 2007;44:743–754. for Economic Co-operation and Development (OECD): Paris, 2003. neuropathogenic factors in HIV infection: pathogenic similarities to Alzheimer’s disease. J Neuropathol Exp Neurol 1994;53:231–8. data and birth certificates to reconstruct the early-life opment of Alzheimer’s disease. Epidemiology 2001;12: 383–9. 76. Moceri VM, Kukull W, Emanuel I et al. Early-life risk factors and the development of Alzheimer’s disease. Neurology 61. World Health Organization. Dementia: A Public Health WHO–Alzheimer’s 75. Moceri VM, Kukull WA, Emanual I et al. Using census socioeconomic environment and the relation to the devel- 60. Stanley LC, Mrak RE, Woody RC et al. Glial cytokines as Priority. population. J Neural Transm Park Dis Dement Sect 1994; 7:61–72. 59. Maddison A. The World Economy: Historical Statistics. Organisation Disease International: Geneva, 2012. 62. Rodriguez JJL, Ferri CP, Acosta D et al. Prevalence of dementia in Latin America, India, and China: a populationbased cross-sectional survey. Lancet 2008;372:464–74. 2000;54:415–20. 77. Strachan DP, Harkins LS, Johnston IDA et al. Childhood antecedents of allergic sensitization in young British adults. J Allergy Clin Immunol 1997;99:6–12. 78. Jarvis D, Chinn S, Luczynska C et al. The association of family size with atopy and atopic disease. Clin Exp Allergy 1997;27:240–5. Interethnic differences in dementia epidemiology: global 79. Knesevich JW, LaBarge E, Martin RL et al. Birth order and and Asia-Pacific perspectives. Dement Geriatr Cogn Disord maternal age effect in dementia of the Alzheimer type. 2010;30:492–8. Psychiatry Res 1982;7:345–50. 64. Beasley R. Worldwide variation in prevalence of symptoms 80. Rosenkranz D, Weyer S, Tolosa E et al. Higher frequency of of asthma, allergic rhinoconjunctivitis, and atopic regulatory T cells in the elderly and increased suppressive eczema: ISAAC. Lancet 1998;351:1225–32. 65. Bach JF. The effect of infections on susceptibility to autoimmune and allergic diseases. New Engl J Med 2002;347:911–20. 66. Tallman PS, Kuzawa C, Adair L et al. Microbial exposures in activity in neurodegeneration. J Neuroimmunol 2007;188: 117–27. 81. Togo T, Akiyama H, Iseki E et al. Occurrence of T cells in the brain of Alzheimer’s disease and other neurological diseases. J Neuroimmunol 2002;124:83–92. infancy predict levels of the immunoregulatory cytokine 82. Larbi A, Pawelec G, Witkowski JM et al. Dramatic interleukin-4 in filipino young adults. Am J Hum Biol shifts in circulating CD4 but not CD8 T cell subsets 2012;24:446–53. 67. Hendrie HC, Osuntokun BO, Hall KS et al. Prevalence of in mild Alzheimer’s disease. J Alzheimers Dis 2009;17: 91–103. Alzheimer’s disease and dementia in two communities: 83. Vaeth M, Schliesser U, Muller G et al. Dependence on Nigerian Africans and African Americans. Am J Psychiatry nuclear factor of activated T-cells (NFAT) levels discrim- 1995;152:1485–92. inates conventional T cells from Foxp3+regulatory T cells. 68. Yamada T, Kadekaru H, Matsumoto S et al. Prevalence of dementia in the older Japanese-Brazilian population. Psychiatry Clin Neurosci 2002;56:71–5. 69. Graves A, Larson E, Edland S et al. Prevalence of dementia Proc Natl Acad Sci 2013;109:16258–63. 84. Pellicano M, Larbi A, Goldeck D et al. Immune profiling of Alzheimer patients. J Neuroimmunol 2012; 242:52–9. and its subtypes in the Japanese American population of 85. Saresella M, Calabrese E, Marventano I et al. PD1 negative King County, Washington State The Kame Project. Am J and PD1 positive CD4+T regulatory cells in mild cognitive Epidemiol 1996;144:760–71. 70. Hendrie HC, Ogunniyi A, Hall KS et al. Incidence of impairment and Alzheimer’s disease. J Alzheimers Dis dementia and Alzheimer disease in 2 communities. 86. Poojary KV, Yi-chi MK, Farrar MA. Control of Th2-mediated JAMA 2001;285:739–47. 71. Roman G. Gene-environment interactions in the ItalyArgentina ‘‘Colombo 2000’’ project. Funct Neurol 1998; 13:249–52. 72. Adelman S, Blanchard M, Rait G et al. Prevalence of dementia in African-Caribbean compared with UK-born White older people: two-stage cross-sectional study. Br J Psychiatry 2011;199:119–25. 73. Kim J, Stewart R, Shin I et al. Limb length and dementia in an older Korean population. J Neurol Neurosurg Psychiatry 2010;21:927–38. inflammation by regulatory T cells. Am J Pathol 2010;177: 525–31. 87. Remarque E, Bollen E, Weverling-Rijnsburger A et al. Patients with Alzheimer’s disease display a pro-inflammatory phenotype. Exp Gerontol 2001;36:171–6. 88. Singh V, Mehrotra S, Agarwal S. The paradigm of Th1 and Th2 cytokines. Immunol Res 1999;20:147–161. 89. Town T, Tan J, Flavell RA et al. T-cells in Alzheimer’s disease. Neuromolecular Med 2005;7:255–64. 90. Heneka MT, O’Banion MK. Inflammatory processes in Alzheimer’s disease. J Neuroimmunol 2007;184:69–91. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 63. Venketasubramanian N, Sahadevan S, Kua E et al. 2003;74:427–32. 185 186 | Fox et al. Evolution, Medicine, and Public Health 91. Swardfager W, Lanctot K, Rothenburg L et al. A metaanalysis of cytokines in Alzheimer’s disease. Biol Psychiatry 2010;68:930–41. 102. Williams GC, Nesse RM. The dawn of Darwinian medicine. Q Rev Biol 1991;66:1–22. 103. Williams G. Pleiotropy, natural selection, and the 92. Huberman M, Shalit F, Roth-Deri I et al. Correlation of evolution of senescence. Evolution 1957;11:398–411. cytokine secretion by mononuclear cells of Alzheimer 104. Frank SA. Somatic evolutionary genomics: mutations patients and their disease stage. J Neuroimmunol 1994; 52:147–52. 93. McGeer PL, Schulzer M, McGeer EG. Arthritis and anti- during development cause highly variable genetic mosaicism with risk of cancer and neurodegeneration. Proc Natl Acad Sci 2010;107(Suppl. 1):1725–30. inflammatory agents as possible protective factors for 105. Isolauri E, Kalliomaki M, Rautava S et al. Obesity— 94. Morgan D. Modulation of microglial activation state fol- extending the hygiene hypothesis. In: Microbial–Host Interaction: Tolerance versus Allergy, Vol. 64. Nestle´ lowing passive immunization in amyloid depositing Nutrition Institute Workshop Series: Pediatric Program. Alzheimer’s disease. Neurology 1996;47:425–32. transgenic mice. Neurochem Int 2006;49:190–4. University of Turku: Turku, Finland, 2010, 75–89. 95. Donahue JE, Johanson CE. Apolipoprotein E, amyloid[beta], and blood-brain barrier permeability in Alzheimer disease. J Neuropathol Exp Neurol 2008;67: 2010;126:1651–65. 107. Erdman SE, Poutahidis T. Roles for inflammation and 261–70. 96. Osuntokun BO, Sahota A, Ogunniyi A et al. Lack of an between apolipoprotein E e4 and Alzheimer’s disease in elderly Nigerians. Ann Neurol 2004;38:463–5. 97. Sayi J, Patel N, Premukumar D et al. Apolipoprotein E polymorphism in elderly east Africans. East Afr Med J 1997;74:668–70. regulatory T cells in colon cancer. Toxicol Pathol 2010; 38:76–87. 108. Anand S, Hanson K. Disability-adjusted life years: a critical review. J Health Econ 1997;16:685–702. 109. Prince M, Jackson J. World Alzheimer’s Report Executive Summary. Alzheimer’s Disease International 2009, 1–24. 98. Farrer LA, Friedland RP, Bowirrat A et al. Genetic and 110. Remais JV, Zeng G, Li G et al. Convergence of non- environmental epidemiology of Alzheimer’s disease in communicable and infectious diseases in low- and Arabs residing in Israel. J Mol Neurosci 2003;20:207–12. middle-income countries. Int J Epidemiol 2013;42:221–7. 99. Sandholzer C, Delport R, Vermaak H et al. High frequency 111. Lopez AD, Mathers CD, Ezzati M et al. Measuring the of the apo e4 allele in Khoi San from South Africa. Hum global burden of disease and risk factors, 1990–2001. Genet 1995;95:46–8. 100. Chen CH, Mizuno T, Elston R et al. A comparative study to In: Lopez AD, Mathers CD, Ezzati M et al. (eds) Global Burden of Disease and Risk Factors. World Bank: screen dementia and APOE genotypes in an ageing East African population. Neurobiol Aging 2010;31:732–40. 101. Nesse RM. How is Darwinian medicine useful? West J Med 2001;174:358–60. Washington, DC, 2006, 1–14. 112. Bloom BS, de Pouvourville N, Straus WL. Cost of illness of Alzheimer’s disease: how useful are current estimates? Gerontologist 2003;43:158–64. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 association 106. Erdman SE, Rao VP, Olipitz W et al. Unifying roles for regulatory T cells and inflammation in cancer. Int J Cancer 3 Evolution, Medicine, and Public Health [2012] pp. 3–13 article doi:10.1093/emph/eos003 Why are male malaria parasites in such a rush? Sex-specific evolution and host–parasite interactions 1 Leiden Malaria Research Group, Department of Parasitology, LUMC, Albinusdreef 2, 2333 ZA Leiden, The Netherlands; 2 Centre for Immunity, Infection and Evolution, Institutes of Evolution, Infection and Immunity, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JT, UK; 3Division of Infection and Immunity, Institute of Biomedical Life Sciences & Wellcome Centre for Molecular Parasitology, University of Glasgow, Glasgow G12 8TA, UK; 4Department of Bioinformatics, Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Pawinskiego 5a, 02-106 Warszawa, Poland. *Corresponding author. E-mail: sarah.reece@ed.ac.uk; tel:+44-131-650-5547; fax:+44-131-650-6564. y These authors contributed equally to this work. Received 30 August 2012; revised version accepted 11 September 2012. ABSTRACT Background: Disease-causing organisms are notorious for fast rates of molecular evolution and the ability to adapt rapidly to changes in their ecology. Sex plays a key role in evolution, and recent studies, in humans and other multicellular organisms, document that genes expressed principally or exclusively in males exhibit the fastest rates of adaptive evolution. However, despite the importance of sexual reproduction for many unicellular taxa, sex-biased gene expression and its evolutionary implications have been overlooked. Methods: We analyse genomic data from multiple malaria parasite (Plasmodium) species and proteomic data sets from different parasite life cycle stages. Results: The accelerated evolution of male-biased genes has only been examined in multicellular taxa, but our analyses reveal that accelerated evolution in genes with male-specific expression is also a feature of unicellular organisms. This ‘fast-male’ evolution is adaptive and likely facilitated by the male-biased sex ratio of gametes in the mating pool. Furthermore, we propose that the exceptional rates of evolution we observe are driven by interactions between males and host immune responses. Conclusions: We reveal a novel form of host–parasite coevolution that enables parasites to evade host immune responses that negatively impact upon fertility. The identification of parasite genes with accelerated evolution has important implications for the identification of drug and vaccine targets. Specifically, vaccines targeting males will be more vulnerable to parasite evolution than those targeting females or both sexes. K E Y W O R D S : sex-specific selection; Plasmodium; host–parasite coevolution; gene expression ß The Author(s) 2012. Published by Oxford University Press on behalf of the Foundation for Evolution, Medicine, and Public Health. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Shahid M. Khan1,y, Sarah E. Reece2,*,y, Andrew P. Waters3, Chris J. Janse1 and Szymon Kaczanowski4,y 4 | Reece et al. Evolution, Medicine, and Public Health INTRODUCTION all life cycle stages, except for a brief zygote phase, are haploid, and sex determination does not involve sex chromosomes or regions of contiguous genes [15]. Therefore, sexual dimorphism in Plasmodium is generated solely by differential gene expression, which is observed throughout the genome (Fig. S1). Depending on population structure, the ratio of male to female gametes may be male-biased in Plasmodium, but because gametogenesis involves only three more rounds of mitosis for each male compared with each female, the opportunity for mutational bias to result in fast-male evolution is much lower than for many multicellular organisms. Here, we integrate advances in genomics and proteomics to examine the evolutionary forces on genes encoding features of Plasmodium sexual cells. We compare divergence between closely related pairs of species and reveal rapid, adaptive evolution of male-biased genes in Plasmodium. The consequences of host–parasite interactions for the evolution of parasite sexual cells are poorly understood, but our analyses suggest a role for host immune responses in driving ‘fast-male’ evolution. Our findings demonstrate the necessity of an evolutionary framework for the development of medical interventions that disrupt disease transmission by preventing parasites from mating. METHODOLOGY We take advantage of proteomic data for malaria parasites [16] to analyse the rate of change of genes that are expressed in different life cycle stages, across several Plasmodium species. First, we test whether the rapid evolution of genes exclusively expressed in males occurs in unicellular parasites. Second, having found that male-biased genes evolve more rapidly than female-biased genes, we asked whether selective forces resulting from host–parasite interactions could influence this sex-specific evolution. Third, we investigated whether the rapid evolution of male-biased genes is due to the relaxation of selection constraints or adaptive evolution. Generation of data sets Plasmodium parasites replicate asexually in the circulation of the host and must undergo a round of sexual reproduction in the mosquito vector to be Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Explaining variation in rates of molecular evolution is fundamental to evolutionary biology and is especially important for organisms that respond rapidly to shifts in the environment, such as diseasecausing parasites and microbes that perform ecosystem services. In multicellular organisms, genes expressed principally or exclusively in one sex evolve at an accelerated rate compared with genes expressed in both sexes [1]. In particular, rapid adaptive evolution occurs in male-specific genes whose expression is associated with tissues and traits that underpin mating success and fertility [2–5]. This solution, to the problem of different optimal phenotypes in males and females, has been extensively documented for multicellular taxa [6, 7], but whether males and females of dioecious unicellular taxa are also subject to different, or opposing, selection pressures has been overlooked. This is surprising since sexual reproduction is an obligate feature of the life cycles of many parasite species. There is increasing interest in the reciprocal approach of using an evolutionary framework to understand the biology of parasites and exploiting the novel experimental opportunities provided by these species to test the generality of evolutionary theories. For example, blocking the fertility of gametes to prevent sexual reproduction is a priority target for the development of transmissionblocking interventions against malaria [8–12]. Understanding how selection shapes the evolution of parasite sexes and mating systems is therefore central to identifying the most ‘evolution-proof’ transmission-blocking targets and implementation strategies. For evolutionary biology, distinguishing between the footprints of natural and sexual selection pressures and identifying the biological mechanisms that drive the accelerated evolution observed in male-biased genes has proved difficult. For example, most males produce large numbers of sperm, and so multiple rounds of mitosis could result in a higher mutation rate in males than females. Alternatively, in mammals, genes on the Y chromosome could evolve rapidly simply because the lack of repair during recombination facilitates mutation [1], but X-linked genes could also evolve quickly because recessive mutations are exposed to selection in males [13, 14]. Using unicellular organisms such as malaria (Plasmodium) parasites overcomes many of these problems of inference because Reece et al. Fast-male malaria parasites | 5 each proteome. Proteins are subdivided into putative membrane and non-membrane proteins. Numbers inside the circles refer to the number of putative non-membrane proteins (solid background) and putative membrane proteins (shaded background) detected exclusively in proteomes of Males, Females, Asexual Stages or detected in all three stages (All Stages). (B–D) Rates of evolution determined by comparing genes from each closely related pair of Plasmodium species. The genes used for this analysis are the orthologs of the P. berghei genes identified in (A). P. berghei and P. yoelii (B); P. falciparum and P. reichenowi (C); P. vivax and P. knowlesi (D). transmitted [17, 18]. A small percentage of asexual stages differentiate into sexual stages, termed gametocytes. Male and female gametocytes differentiate into male and female gametes as soon as they are taken up in a mosquito blood meal. Within 10–12 min, each male gametocyte undergoes three rounds of mitosis, producing up to eight gametes, and each female gametocyte differentiates into a single gamete. Male and female gametocytes each express a specific and distinct set of proteins [16]. Using the proteomes of males, females and asexual blood stages, we distinguished Plasmodium berghei genes based on their expression [unambiguous identification of two or more high scoring (MASCOT score >15) peptides] in different stages during intra-erythrocytic development. We classified genes (Fig. 1A; Table S1, PB) as having male-biased expression (termed Male), female-biased expression (Female), expression in asexual and gametocyte stages (All Stages) and expression in asexual blood stages but not in gametocytes (Asexual Blood). We identified orthologs of the P. berghei genes using reciprocal BLAST for the following species: Plasmodium yoelii, Plasmodium vivax, Plasmodium Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Figure 1. Stage-specific rates of evolution. Estimated as dN/dS, for genes expressed in sexual (male, female) and asexual blood stage malaria parasites. (A) Pattern of ‘stage-specific’ expression of genes based on proteomes of P. berghei sexual and asexual blood stages. Red represents the total number of proteins identified in 6 | Reece et al. Evolution, Medicine, and Public Health knowlesi, Plasmodium reichenowi and Plasmodium falciparum (Table S1, PF and PV). We assume stage specificity of gene expression is similar across Plasmodium species, as previously suggested for sporozoites [19], asexual stages [20] and gametocytes [21]. We also subdivided each of the four categories into putative membrane and nonmembrane proteins, based on the presence of predicted signal peptides, transmembrane domains and glycosylphosphatidylinositol anchors (Table S1, PB, PF and PV). Plasmodium berghei gene models Protein selection We used proteins identified by Khan et al. [16] and describe the mapping of these data onto the new release for P. berghei in Table S6. We assume that membrane proteins contain more than one TM segment or signal peptide, predicted using hidden Markov models (http://www.cbs.dtu.dk/services/ SignalP/) [22]. Orthologous proteomes were predicted using reciprocal BLAST with e-value <1 106 [23]. Sets of proteins containing epitopes were obtained from PlasmoDB [24] as were values of dN, dS and pN/pS for the alignment of 3D7 versus Ghana isolates and the other P. falciparum strains (IT, DD2 and HB3). Codon alignments We used predicted sequences of P. berghei and P. vivax and genomes of P. yoelii and P. knowlesi obtained from PlasmoDB [23]. Genome regions containing orthologs of analysed sequences were predicted using the BLASTN program. Exons of genes were predicted using the sim4 program [25]. Sequences of exons were aligned using the dN/dS analysis For each expression class, we compared the rate of non-synonymous (dN) and synonymous (dS) nucleotide substitutions of the residue encoding sites of the same gene from pairs of closely related malaria parasites (Table S2). We calculated dN/dS between the rodent malaria parasites P. berghei versus P. yoelii, the human and non-human primate malaria parasites P. falciparum versus P. reichenowi and P. vivax versus P. knowlesi. We used these pairs of closely related species to compare the speed of evolution of different protein classes. The evolution of the gene classes we analyse occurred independently in each pair, and our observations are supported across all species pairs. Estimates of dN and dS were each obtained using PAML version 3.13d, under a codon-based model with average nucleotide frequencies estimated from the data at each codon position. We calculated dN/dS for genes for which the alignment between the two species was longer than 20 codons. For each pair of orthologous genes, the value of dN/dS was estimated from the ratio of maximum likelihood estimates of the number of non-synonymous substitutions per nonsynonymous nucleotide site (dN) and the number of synonymous substitutions per synonymous nucleotide site (dS). We also investigated whether any gene models used in our analysis are fragments that have been concatenated with others to form larger and more accurate gene models in the 8X P. berghei ANKA release. Of the 809 genes analysed, only three (one expressed in All Stages and two expressed in Males) are now predicted to be part of larger genes. Furthermore, for comparison with the results presented in Table S2, PB (for P. berghei versus P. yoelii), we have repeated the analyses using the new gene models and the results do not differ (Table S6). McDonald–Kreitman test To formally test for a role of positive selection, we used a modified version of the McDonald–Kreitman test [27, 28]. The parameter refers to the fraction of amino acid substitutions driven by positive Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 The gene models deposited in PlasmoDB are constantly being improved, and during the preparation of this article, a new release of the P. berghei genome became available. Therefore, we have adopted the naming convention based on the 8X P. berghei ANKA release 2010-06-01 (available at http:// plasmodb.org/plasmo/showXmlDataContent.do; jsessionid=3A2D8DAC94289AB19ADA50C7810AD CE?name=XmlQuestions.DataSources). The annotation for P. berghei was obtained, with permission, from the Pathogen Sequencing Unit at the Wellcome Trust Sanger Institute. Needelman and Wunsch algorithm [26]. Codons neighbouring the gaps were removed from alignments. All calculations were performed using perl pipeline. Alignments containing stop codons or <60 codons were removed. Reece et al. Fast-male malaria parasites Statistical significance Statistical significance was based on 10 000 mean values (dN, dS, dN/dS and pN/pS) for two sets bootstrapped independently and than compared. For the McDonald–Kreitman test, we compared frequencies using the DoFE-all method, which returns a value of alpha and 95% credibility intervals. RESULTS Fast-male evolution in Plasmodium The P. berghei versus P. yoelii (P = 0.01) and P. falciparum versus P. reichenowi (P = 0.001) comparisons revealed that Male genes are evolving significantly faster than Female genes (Fig. 1B and C). A similar trend, although not significant (P = 0.11), is observed in the Male versus Female genes in the P. vivax and P. knowlesi comparison (Fig. 1D). Values for dN and dS for all comparisons are plotted in Supplementary Fig. S2, and we obtain qualitatively similar results when analysing dN alone. Interestingly, in all comparisons (all P > 0.08), Male genes show high average dN/dS values that are comparable to those of Asexual Blood genes. Proteins of Asexual Blood stages have been shown to evolve rapidly due to selective pressures from the host immune system [29]. As expected from observations of multicellular organisms [4], genes commonly expressed (e.g. housekeeping proteins) in All Stages show significantly lower dN/dS values than stage-specific genes, for all comparisons (all P < 0.03). Genes encoding proteins with a membrane location in Asexual Blood stages have been shown to accumulate mutations faster than non-membrane proteins, which is thought to be due to selective pressures resulting from host immunity [29]. We tested whether the accumulation of mutations was also greater for genes encoding membrane proteins and particularly in genes encoding Male proteins (Fig. S2A–C; Table S2, PB, PF and PV). We find the following patterns (Fig. 2). First, as expected, membrane proteins of Asexual Blood and All Stages evolve faster than their non-membrane proteins (P = 0.03 and P = 0.01, respectively, for P. berghei versus P. yoelii, and the same but non-significant trend for the other species comparisons). Second, for Females, genes encoding membrane proteins evolve faster (P < 0.01) than genes encoding non-membrane proteins, except in the P. falciparum versus P. reichenowi comparison where this trend is reversed (P = 0.01). However, only eight genes are included in the Female membrane subset for P. falciparum versus P. reichenowi, giving this analysis the least power (>20 genes are compared for the other species pairs). Third, Male genes encoding non-membrane genes are exceptional because there is no significant difference in rates of evolution between non-membrane and membrane proteins (all comparisons, P > 0.22). Furthermore, genes encoding Male non-membrane proteins are evolving faster than genes encoding Female nonmembrane proteins (P < 0.01 for both P. berghei versus P. yoelii and P. knowlesi versus P. vivax; P = 0.01 for P. falciparum versus P. reichenowi). In contrast, when comparing only genes encoding membrane proteins, Males are not evolving significantly faster than Females for P. berghei versus P. yoelii (P = 0.46), and Females are evolving faster than Males for the P. falciparum versus P. reichenowi (P = 0.04) and P. knowlesi versus P. vivax comparisons (P = 0.03). 7 Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 selection which is estimated from polymorphism and divergence. The effects of positive selection () can be distinguished from purifying selection and neutrality by comparing divergence (rate of fixation of non-synonymous mutations; dN/dS) between species to the observed relative rate of non-synonymous polymorphisms (pN/pS) in natural populations within a species. The data for each gene comprise a 2 2 table with rows corresponding to putative selected or non-selected sites and columns corresponding to polymorphism or divergence. We applied a maximum likelihood approach using the ‘distribution of fitness effects’ program (DoFE; [27, 28]). The DoFE-all method uses a maximum likelihood approach that maximizes the number of genes that can be analysed (even those with no polymorphism) and does not sum DN, DS, PN and PS values across genes. This is important for selecting genes under positive selection when signals are weak because it also avoids underestimating when mildly deleterious mutations obscure divergence by inflating polymorphism. A negative value for indicates that there are no amino acid substitutions driven by positive selection and that the majority of polymorphisms are removed due to purifying selection. | 8 | Reece et al. Evolution, Medicine, and Public Health Host–parasite interactions Figure 2. Stage- and location-specific rates of evolution. Estimated as dN/dS, for predicted non-membrane (solid bars) and membrane (shaded bars) proteins expressed in sexual (male, female) and asexual blood stage malaria parasites. Genes were classified according to their exclusive detection in P. berghei proteomes of Males, Females, Asexual Stages or All Stages (see Fig. 1), The rates of evolution were determined by comparison of genes of the following pairs of Plasmodium species: P. berghei and P. yoelii (A); P. falciparum and P. reichenowi (B); P. vivax and P. knowlesi (C). Relaxation of constraints Next, we investigated whether a relaxation of constraints resulting in genetic drift or the fixation of deleterious mutations could explain the accelerated evolution of genes coding for Male non-membrane proteins. First, we compared the rate of synonymous We next examined what aspects of parasite ecology could explain the accelerated evolution of Male compared with Female genes, and particularly the exceptional rates of evolution in Male nonmembrane genes. When taken up in a blood meal, male and female gametocytes must rapidly differentiate into gametes and mate. Host immune factors (resulting from exposure to circulating gametocytes) are also taken up in the blood meal and can reduce fertility, especially of male gametes, and block transmission [30–32]. In the host, male and female gametocytes are protected inside red blood cells, but in the blood meal, gametocytes become more vulnerable to immune factors because they must exit red blood cells during gametogenesis. In contrast to females, male gametogenesis is complex and involves rapid DNA replication, flagella and gamete construction, extrusion and detachment of gametes [33–35]. Male gametes must then travel through the blood meal to locate and fertilize females. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 mutations (dS) according to whether gene expression is Male- or Female-biased and encodes membrane or non-membrane proteins (Table S2, PB, PK and PV). There are no significant differences in dS when comparing genes coding for nonmembrane proteins of Females with both nonmembrane and membrane proteins of Males (P > 0.09). This is also the case for genes coding membrane proteins of Females when compared with non-membrane and membrane proteins of Males in the P. berghei versus P. yoelii comparison (P > 0.1). Second, we tested whether the frequency of nonsense mutations (i.e. premature termination and stop codons) differs according to the four different expression categories: Male, Female, All Stages and Asexual Blood. We also tested 789 genes of P. berghei and P. yoelii for which >20 codons were available (Table S3). A total of 10 (1.3%) contained nonsense mutations. The number of nonsense mutations in Male and Female genes do not significantly differ (P > 0.05) from that expected by chance (1.3%), which is consistent with random occurrence of nonsense mutations. The lack of variation in dS and the random occurrence of nonsense mutations suggest that the rapid evolution of Male genes is not due to a relaxation of constraints and thus that selection may play a role. Reece et al. Fast-male malaria parasites proteins experience greater levels of immune recognition than Female proteins and that the immunogenicity of Males is equivalent to Asexual Blood stages. We then tested whether the rate of nonsynonymous versus synonymous substitutions is higher in genes encoding Male proteins containing epitopes compared with those without epitopes. We analysed the polymorphisms (pN/pS) in gene sequences for Male membrane and non-membrane proteins, with and without epitopes, from the two most comprehensively sequenced P. falciparum strains/isolates (3D7 and Ghana; Table S4). Consistent with other studies [34], we find that pN/pS is high in Plasmodium and that genes encoding Male proteins containing epitopes do have a significantly higher pN/pS than Females for both non-membrane (Fig. 3B; P = 0.01) and membrane encoding genes (Fig. 3C; P < 0.01). Furthermore, the proportion of non-synonymous versus synonymous substitutions is particularly high for Male non-membrane proteins containing epitopes involved in cellular motor machinery such as kinesins or dyneins (Table S4). This suggests that interactions with the host immune system shape genes encoding Male non-membrane proteins. Adaptive ‘fast-male’ evolution Our analyses suggest that accelerated evolution of Male-biased genes encoding both non-membrane and membrane proteins is better explained by selection pressures resulting from interactions with host immunity than a relaxation of constraints. To test if Male-biased genes are evolving under positive selection, we used a modified version of the McDonald–Kreitman test (the ‘DoFE-all method’ [27, 28]) and data from natural isolates of P. falciparum (3D7, Dd2, HB3, 7G8, D10, D6, Santa Lucia, K1, RO-33, IT, FCC-2/Hainan, Senegal, IGH-CR14 [41–43]) compared with P. reichenowi, and we assume that mutations manifesting as polymorphism occur in more than four isolates. We calculated the fraction of residues fixed by positive selection () for all genes expressed in Male, Female, All Stages and Asexual Blood classes (Table 1), assuming that all nonsynonymous polymorphisms are neutral. When non-synonymous polymorphisms are deleterious and removed from the population, takes a negative value that is not expressed as a proportion 9 Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 The more complex activities of males are predicted to make them more vulnerable to host immune factors taken up in the blood meal [36– 39]. This could occur because immune factors interact with male proteins with both membrane and non-membrane locations (non-membrane proteins are not exposed in females) and/or immune attack may occur for both sexes, but male proteins may be more easily damaged and the consequences for fertility may be more severe. For example, male fertility is reduced by exposure to transmission-blocking immune factors, such as reactive oxygen species, that affect intracellular processes [40]. Interactions between male proteins and immune factors may also be indirect; for example, antibodies that agglutinate the surface proteins of male gametes may select for faster performance of intracellular proteins involved in flagella formation. Furthermore, senescence or damage may result in male gametocytes inappropriately being activated in the host circulation, which will present both their intracellular and membrane proteins to the host immune system (activated females may only present membrane proteins). To investigate whether interactions between parasites and host immune responses could drive the accelerated evolution of genes encoding Male proteins, we first tested whether Male proteins are more immunogenic than those of Females. Because the membrane or non-membrane location of proteins is not a definitive predictor of exposure to host immunity, we classified proteins based on whether they are known immune epitopes. We focused on P. falciparum because epitopes for this species have been characterized and experimentally confirmed. These data are available from the Immune Epitope Database and Analysis Resource (http://www.immuneepitope.org/; Table S4). Specifically, we tested whether the proportion of ‘immune epitopes’ of P. falciparum proteins differed between Males and Females and their predicted membrane or non-membrane location. This analysis reveals that Male non-membrane proteins have a significantly higher percentage of epitopes compared with those of Females and All Stages (Fig. 3A; Fisher exact test P = 0.045 and P < 0.0001). Furthermore, the percentage of Male non-membrane proteins with epitopes is comparable to that of Asexual Blood’ stages (Fisher exact test P = 0.098). Similar, but non-significant trends are present in the percent of Male membrane proteins with epitopes. This suggests that Male | 10 | Reece et al. Evolution, Medicine, and Public Health suggesting that the accelerated evolution of Male genes is driven by positive selection. This is the pattern observed in multicellular taxa: male-biased mutations manifesting as polymorphisms are often subject to positive selection [2–5]. DISCUSSION pN/pS, for P. falciparum proteins with (solid bars) or without predicted immune epitopes (shaded bars) expressed in sexual (male, female) and asexual blood stage malaria parasites. (A) Percentage of proteins containing immune epitopes identified from the Immune Epitope Database and Analysis Resource. Genes were classified according to their exclusive detection in P. berghei proteomes of Males, Females, Asexual Stages or All Stages (see Fig. 1). The strength of diversifying selection on P. falciparum predicted non-membrane (B) and membrane (C) proteins. (i.e. when is positive, it is expressed as a proportion, but when is negative, it can vary between 0 and infinity). For all expression categories—except for Male genes— is negative (Table 1; Table S5), Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Figure 3. The strength of diversifying selection. Estimated as Our analyses reveal that the accelerated evolution of male-biased genes is not exclusive to multicellular taxa and that sex-specific selection occurs in dioecious single-celled organisms without sex chromosomes. The biology of Plasmodium also sheds light on how the rapid and adaptive evolution of male-biased genes occurs. We can rule out a role of sex chromosomes and mutational bias because male gametogenesis only involves three mitoses. However, the ratio of male to female gametes will often be slightly male biased, depending on the multiplicity of infections and presence of transmission-blocking factors [37, 39]. This could result in male gametes behaving as large population (compared with females), resulting in faster removal of deleterious mutations and rapid fixation of beneficial mutations associated with male performance. Multicellular taxa have traditionally been the focus for investigating sex-dependent rates of evolution and their underlying causes, but we demonstrate that Plasmodium parasites are also a useful model for research in this area. Rapid evolution of male traits is often expected to be a response to male–male competition or adaptations to attract females [6, 44], but we show that natural selection driven by host–parasite interactions can shape the evolutionary trajectories of the sexes. How natural and sexual selection interact to affect fitness in a changing environment is a fundamental question in evolutionary biology and has important implications for adaptation and speciation [45]. Studies revealing fast-male evolution in multicellular taxa have implicated sexual selection as a driver [1], but processes such as intra-sex competition, sexual antagonism (conflict) and natural selection can all contribute to driving sex-dependent selection [46]. While our results show that natural selection pressures, in the form of host immune factors, shape sex-dependent selection in Plasmodium, sexual selection could also make a contribution. Plasmodium populations span from clonal to genetically diverse [47] and infections containing multiple con-specific genotypes provide the opportunity for ‘sperm’ competition Reece et al. | Fast-male malaria parasites 11 Table 1. Adaptive Male evolution Number of genes Male All Membrane excluded Female All Membrane excluded Asexual blood All Membrane excluded All stages All Membrane excluded Genes with frequent polymorphisms ; for frequent polymorphisms 163 139 21 17 0.07 0.26 74 57 8 5 0.94 0.41 137 109 25 19 0.52 0.79 237 211 18 15 5.35 7.07 between unrelated males. This could be evaluated by comparing male-biased rates of evolution across populations with different inbreeding rates. Another approach lies in the potential offered by unicellular organisms to genetically manipulate fast evolving genes to test whether their functions are adaptations for immune evasion or sperm competition. How the interplay between phenotypic plasticity and microevolution shapes phenotypes is a key question in evolutionary biology [48] and has implications for the development of transmissionblocking interventions [49, 50]. Evolutionary theory predicts that malaria parasites facultatively increase their investment in males during periods in infections when the host produces immune factors that reduce the fertility of males more than females [36, 51]. By revealing that males are more immunogenic than females, we support this theory and show that, in addition to evading host immunity through phenotypic plasticity in sex ratios [52], parasites can also respond with rapid microevolution. Moreover, by adjusting sex ratios to produce more males when conditions for mating are unfavourable [36, 51], parasites may benefit from maximizing both their immediate mating success and the potential for adaptive evolution in response to the conditions they experience. There is a drive to develop transmission-blocking interventions that, when administered to hosts, kill gametocytes (e.g. gametocidal drugs) or produce immune responses that are taken up in the blood meals of vectors and prevent parasites from mating (e.g. by vaccinating against the antigens of one or both sexes). Our results suggest that a vaccine harnessing host immunity to target males is unlikely to be a novel selection pressure for malaria parasites and that phenotypic plasticity and rapid microevolution could quickly undermine such an intervention. Conclusions and implications Because male-biased genes evolve at an accelerated rate, our results predict that interventions specifically targeting males are more vulnerable to parasite counter-evolution than interventions targeting female or universally expressed genes. Furthermore, our approach, coupled with phenotypic analyses, provides a powerful way to assess the evolutionary potential of candidate antigens and identify slower evolving targets. supplementary data Supplementary data are available at EMPH online. acknowledgements We thank P.R. Haddrill, D.J. Obbard and R.S. Ramiro for discussions and D.L. Hartl for advice. Thanks to M. Berriman and colleagues at the Wellcome Trust Sanger Institute for permitting us to use the latest iteration of the P. berghei genome and to A Eyre-Walker for advice on using DoFE software. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Estimates of the fraction () of fixed non-synonymous polymorphism due to positive selection, with or without restriction to frequent polymorphisms. Natural isolates of P. falciparum (Dd2, HB3, 7G8, D10, D6, Santa Lucia, K1, RO-33, IT, FCC-2/Hainan, 3D7, Senegal, IGH-CR14) and P. reichenowi are compared. We classify frequent polymorphisms as those observed in four or more isolates. 12 | Reece et al. Evolution, Medicine, and Public Health funding S.E.R. is supported by a Wellcome Trust Fellowship (WT082234MA); S.K. by a Columb fellowship from the Foundation for Polish Science and supporting grant (1/722/N-COST/2010/0) COST Action of the Polish Ministry of Science); C.J.J. by the EU Seventh Framework Program (FP7/2007-2013; 242095); S.E.R., S.K. and S.M.K. are also supported by the COST Action (BM0802), and S.M.K., A.P.W. and C.J.J. received support from the European Commission (FP7, EVIMalaR Network of Excellence). Funding to pay the Open Access publication charges for this article was provided by xxxxx. Conflict of interest: none declared. 13. Vicoso B and Charlesworth B. Evolution on the X chromosome: unusual patterns and processes. Nat Rev Genet 2006;7:645–3. 14. Khaitovich P, Hellmann I, Enard W et al. Parallel patterns of evolution in the genomes and transcriptomes of humans and chimpanzees. Science 2005;309:1850–4. 15. Paul REL, Brey PT and Robert V. Plasmodium sex determination and transmission to mosquitoes. Trends Parasitol 2002;18:32–8. 16. Khan SM, Franke-Fayard B, Mair GR et al. Proteome analysis of separated male and female gametocytes reveals novel sex-specific Plasmodium biology. Cell 2005;121:675–87. 17. Sinden RE. Sexual development of malarial parasites. Advan Parasitol 1983;22:153–216. 18. Sinden RE, Canning EU, Bray RS et al. Gametocyte and gamete development in Plasmodium falciparum. Proc Biol Sci 1978;201:375–99. 19. Lasonder E, Ishihama Y, Andersen JS et al. Analysis of the Plasmodium falciparum proteome by high-accuracy mass spectrometry. Nature 2002;419:537–42. 1. Ellegren H and Parsch J. The evolution of sex-biased genes 20. Zhou Y, Ramachandran V, Kumar KA et al. Evidence-based and sex-biased gene expression. Nat Rev Genet 2007;8: annotation of the malaria parasite’s genome using com- 689–98. parative expression profiling. PLoS One 2008;3:e1570. 2. Ranz JM, Castillo-Davis CI, Meiklejohn CD et al. Sex- 21. Silvestrini F, Lasonder E, Olivieri A et al. Protein export dependent gene expression and evolution of the marks the early phase of gametocytogenesis of the Drosophila transcriptome. Science 2003;300:1742–5. human malaria parasite Plasmodium falciparum. Mol Cell 3. Wyckoff GJ, Wang W and Wu CI. Rapid evolution of male reproductive genes in the descent of man. Nature 2000; 403:304–9. 4. Zhang Z, Hambuch TM and Parsch J. Molecular evolution Proteomics 2010;9:1437–48. 22. Emanuelsson O, Brunak S, von Heijne G et al. Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protocol 2007;2:953–71. of sex-biased genes in Drosophila. Mol Biol Evol 2004;21: 23. Altschul SF, Gish W, Miller W et al. Basic local alignment 2130–9. 5. Good JM and Nachman MW. Rates of protein evolution search tool. J Mol Biol 1990;215:403–10. 24. Aurrecoechea C, Brestelli J, Brunk BP et al. PlasmoDB: a are positively correlated with developmental timing of ex- functional genomic database for malaria parasites. pression during mouse spermatogenesis. Mol Biol Evol 2005;22:1044–52. 6. Andersson M. Sexual Selection. Princeton University Press: Princeton, 1994. 7. Darwin C. The Descent of Man and Selection in Relation to Sex. Murray: London, 1871. 8. Carter R. Transmission blocking malaria vaccines. Vaccine 2001;19:2309–14. 9. Williamson K. Pfs230: from malaria transmissionblocking vaccine candidate toward function. Parasit Immunol 2003;25:351–9. 10. Malkin EM, Durbin AP, Diemert DJ et al. Phase 1 vaccine trial of Pvs25H: a transmission blocking vaccine for Plasmodium vivax malaria. Vaccine 2005;23:3131–8. 11. Ramjanee S, Robertson JS, Franke-Fayard B et al. The use Nucleic Acids Res 2009;37:D539–43. 25. Ogasawara J and Morishita S. A fast and sensitive algorithm for aligning ESTs to the human genome. J Bioinform Comput Biol 2003;1:363–86. 26. Needleman SB and Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 1970;48:443–53. 27. Bierne N and Eyre-Walker A. The genomic rate of adaptive amino acid substitution in Drosophila. Mol Biol Evol 2004; 21:1350–60. 28. Eyre-Walker A. The genomic rate of adaptive evolution. Trends Ecol Evol 2006;21:569–75. 29. Volkman SK, Barry AE, Lyons EJ et al. Recent origin of Plasmodium falciparum from a single progenitor. Science of transgenic Plasmodium berghei expressing the 2001;293:482–4. 30. Aikawa M, Rener J, Carter R et al. An electron microscop- Plasmodium vivax antigen P25 to determine the ical study of the interaction of monoclonal antibodies with transmission-blocking activity of sera from malaria gametes of the malaria parasite Plasmodium gallinaceum. J vaccine trials. Vaccine 2007;25:886–94. 12. Sutherland C. Surface antigens of Plasmodium falciparum 31. Carter R and Chen DH. Malaria transmission blocked by gametocytes—a new class of transmission-blocking immunisation with gametes of the malaria parasite. vaccine targets? Mol Biochem Parasitol 2009;166:93–8. Nature 1976;263:57–60. Protozool 1981;28:383–8. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 references Reece et al. | Fast-male malaria parasites 32. Targett GAT. Plasmodium falciparum—natural and experi- 42. Gardner MJ, Hall N, Fung E et al. Genome sequence of mental transmission-blocking immunity. Immunol Lett the human malaria parasite Plasmodium falciparum. 1988;19:235–40. Nature 2002;419:498–511. 33. Janse CJ, Vanderklooster PFJ, Vanderkaay HJ et al. Rapid 43. Volkman SK, Sabeti PC, DeCaprio D et al. A genome-wide- repeated DNA-replication during microgametogenesis map of diversity in Plasmodium falciparum. Nat Genet and DNA-synthesis in young zygotes of Plasmodium berghei. Trans R Soc Trop Med Hyg 1986;80:154–7. 2007;39:113–9. 34. Kawamoto F, Alejoblanco R, Fleck SL et al. Plasmodium berghei—ionic regulation and the induction of gametogenesis. Exp Parasitol 1991;72:33–42. 35. Sinden RE, Canning EU and Spain B. Gametogenesis and fertilisation in Plasmodium yeolli nigeriensis: a trans- 44. Clutton-Brock T. Sexual selection in males and females. Science 2007;318:1882–5. 45. Ritchie MG. Sexual selection and speciation. Ann Rev Ecol Evol Systemat 2007;38:79–102. 46. Carranza J. Defining sexual selection and sex-dependent selection. An Behav 2009;77:749–51. mission electron microscope study. Proc Biol Sci 1976; 47. Nkhoma SC, Nair S, Cheeseman IH et al. Close kinship 193:55–76. 36. Gardner A, Reece SE and West SA. Even more extreme within multiple-genotype malaria parasite infections. Proc fertility insurance and the sex ratios of protozoan blood 48. Chevin L-M, Lande R and Mace GM. Adaptation, plasticity, parasites. J Theor Biol 2003;223:515–21. 37. Paul REL, Ariey F and Robert V. The evolutionary ecology of 38. Paul REL, Coulson TN, Raibaud A et al. Sex determination in malaria parasites. Science 2000;287:128–31. 39. West SA, Reece SE and Read AF. Evolution of gametocyte sex ratios in malaria and related apicomplexan (protozoan) parasites. Trends Parasitol 2001;17:525–31. Biol Sci 2012;279:2589–98. and extinction in a changing environment: towards a predictive theory. PLoS Biol 2010;8:e1000357. 49. Mideo N and Reece SE. Plasticity in parasite phenotypes: evolutionary and ecological implications for disease. Future Microbiol 2012;7:17–24. 50. van Dijk MR, van Schaijk BC, Khan SM et al. Three members of the 6-cys protein family of Plasmodium play a role in gamete fertility. PLoS Pathog 2010;6:e1000853. 40. Ramiro RS, Alpedrinha Jo, Carter L et al. Sex and death: 51. West SA, Smith TG, Nee S et al. Fertility insurance and the effects of innate immune factors on the sexual repro- the sex ratios of malaria and related hemospororin blood duction of malaria Parasites. PLoS Pathog 2011;7: e1001309. 41. Jeffares DC, Pain A, Berry A et al. Genome variation and evolution of the malaria parasite Plasmodium falciparum. Nat Genet 2007;39:120–5. parasites. J Parasitol 2002;88:258–63. 52. Reece SE, Drew DR and Gardner A. Sex ratio adjustment and kin discrimination in malaria parasites. Nature 2008; 453:609–15. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Plasmodium. Ecol Lett 2003;6:866–80. 13 148 o r i gi na l research article Evolution, Medicine, and Public Health [2013] pp. 148–160 doi:10.1093/emph/eot012 Ademir Jesus Martins*1,2, Luiz Paulo Brito1, Jutta Gerlinde Birggitt Linss1, Gustavo Bueno da Silva Rivas3, Ricardo Machado3, Rafaela Vieira Bruno2,3, Jose´ Bento Pereira Lima1, Denise Valle2,4 and Alexandre Afranio Peixoto2,3,y 1 Laborato´rio de Fisiologia e Controle de Artro´podes Vetores, Instituto Oswaldo Cruz—FIOCRUZ and Laborato´rio de Entomologia, Instituto de Biologia do Exe´rcito, Rio de Janeiro, RJ, 21040-360, Brazil, 2Instituto Nacional de Cieˆncia e Tecnologia em Entomologia Molecular, Brazil, 3Laborato´rio de Biologia Molecular de Insetos, Instituto Oswaldo Cruz— FIOCRUZ, Rio de Janeiro, RJ, 21040-360, Brazil and 4Laborato´rio de Biologia Molecular de Flavivirus, Instituto Oswaldo Cruz—FIOCRUZ, Rio de Janeiro, RJ, 21040-360, Brazil *Correspondence address. Laborato´rio de Fisiologia e Controle de Artro´podes Vetores, Instituto Oswaldo Cruz— FIOCRUZ and Laborato´rio de Entomologia, Instituto de Biologia do Exe´rcito, Rio de Janeiro, RJ, 21040-360, Brazil. Tel:+55 21 25621398; Fax:+55 21 25621308; E-mail: ademirjr@ioc.fiocruz.br y In memoriam. Received 9 March 2013; revised version accepted 9 June 2013 ABSTRACT Background and objectives: Mutations in the voltage-gated sodium channel gene (NaV), known as kdr mutations, are associated with pyrethroid and DDT insecticide resistance in a number of species. In the mosquito dengue vector Aedes aegypti, besides kdr, other polymorphisms allowed grouping AaNaV sequences as type ‘A’ or ‘B’. Here, we point a series of evidences that these polymorphisms are actually involved in a gene duplication event. Methodology: Four series of methods were employed: (i) genotypying, with allele-specific PCR (AS-PCR), of two AaNaV sites that can harbor kdr mutations (Ile1011Met and Val1016Ile), (ii) cloning and sequencing of part of the AaNaV gene, (iii) crosses with specific lineages and analysis of the offspring genotypes and (iv) copy number variation assays, with TaqMan quantitative real-time PCR. Results: kdr mutations in 1011 and 1016 sites were present only in type ‘A’ sequences, but never in the same haplotype. In addition, although the 1011Met-mutant allele is widely disseminated, no homozygous (1011Met/Met) was detected. Sequencing revealed three distinct haplotypes in some individuals, raising the hypothesis of gene duplication, which was supported by the genotype frequencies in the offspring of specific crosses. Furthermore, it was estimated that a laboratory strain selected for ß The Author(s) 2013. Published by Oxford University Press on behalf of the Foundation for Evolution, Medicine, and Public Health. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Evidence for gene duplication in the voltage-gated sodium channel gene of Aedes aegypti Sodium channel gene duplication in Aedes aegypti Martins et al. | 149 insecticide resistance had 5-fold more copies of the sodium channel gene compared with a susceptible reference strain. Conclusions and implications: The AaNaV duplication here found might be a recent adaptive response to the intense use of insecticides, maintaining together wild-type and mutant alleles in the same organism, conferring resistance and reducing some of its deleterious effects. K E Y W O R D S : gene duplication; kdr mutation; sodium channel; pyrethroid resistance; Aedes aegypti BACKGROUND AND OBJECTIVES Several mutations have been identified in the Ae. aegypti NaV gene (AaNaV) comprising the IIS5–S6 region: Gly923Val, Leu982Trp, Ile1011Met, Ile1011Val, Val1016Ile and Val1016Gly [12–16]. The Ile1011Met substitution was associated with low sensitivity to pyrethroids evidenced by electrophysiological assays [12] and was the most frequent in a resistant Brazilian natural Ae. aegypti population [14]. However, substitutions in another position, 1016 (Val/Ile in South and Central America and Val/Gly in Thailand), are presently attributed with a more important role in pyrethroid resistance, the 1016 substitutions appearing as a recessive trait [13, 16–18]. Outside domain II, a Phe1534Cys substitution in the IIIS6 region was also related to pyrethroid resistance [19]. Besides amino acid changes, nucleotide and insertion/deletion polymorphisms have been detected in intron 20 in the AaNaV IIS6 genomic region that enable grouping the sequences in two categories, type ‘A’ or type ‘B’. The Ile1011Met and Val1016Ile mutations are found only in type ‘A’ sequences [14]. Herein, we further investigated the nature of this polymorphism. Sequencing of the AaNaV IIS6 genomic region and alelle specific-PCR (AS-PCR) typing of the 1011 and 1016 sites revealed, in several cases, three haplotypes in the same mosquito. Besides, in no case were homozygous specimens for the 1011Met mutation in natural populations detected. Crosses between laboratory-selected genotypes and copy number variation assays strongly suggested the occurrence of duplication events in the sodium channel gene, at least for the studied genomic region. MATERIALS AND METHODS Mosquitoes Rockefeller strain, continuously reared in the laboratory as a standard for insecticide susceptibility and life-history trait parameters, was used as reference for wild-type alleles for the voltage-gated sodium Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 The use of DDT as public health insecticide was one of the factors responsible for the yellow fever mosquito eradication in many Latin American countries in the 1950s [1]. Since the reintroduction of Aedes aegypti to South America, organophosphates and, subsequently, pyrethroid insecticides have been extensively used in governmental campaigns as well as in residential or private services. Pyrethroids have similar effects as DDT but with a lower residual effect in the environment, and they represent nowadays the main class of insecticide against arthropods, not only those of medical and veterinary importance but also in relation to agriculture and livestock [2]. In Brazil, despite the recent introduction of pyrethroids in campaigns for dengue control throughout the whole country, resistance to these compounds has already been detected in many Ae. aegypti populations [3, 4]. Pyrethroids and DDT have a rapid effect on the insect central nervous system, leading to repetitive and involuntary muscular contractions, followed by paralysis and death, commonly reported as knockdown effect [5, 6]. Accordingly, resistance to this is referred to as knockdown resistance (kdr), the principal cause being a mutation in the pyrethroid/DDT target site, the voltage-gated sodium channel (NaV). The NaV is an axonic transmembrane protein composed of four homologous domains (I–IV), each one with six hydrophobic segments (S1–S6) [7]. To date, most of the kdr mutations described lie in the NaV IIS6 region, and the Leu/ Phe substitution in the 1014 site (numbered according to the Musca domestica amino acid primary sequence) is by far the most common among all studied insects. Relatively recent analyses of kdr mutations in a series of arthropod species contributed to the knowledge concerning evolution and dynamics of pyrethroid resistance in natural populations. This effort is essential to formulate strategies able to prolong the effectiveness of pyrethroids in the field and to develop new compounds targeting the sodium channel [8, 9]. Some extensive reviews of kdr mutations are available [2, 10, 11]. 150 | Martins et al. Evolution, Medicine, and Public Health channel gene. The EE lineage was originated from laboratory selection pressure for nine consecutive generations with the pyrethroid deltamethrin using a sample of a natural population from Natal (a locality from the Northeast of Brazil) that did not harbor the mutation in the 1016 site [20]. Rearing and maintenance of the colonies were conducted according to standard laboratory conditions [21]. Field populations were obtained by sampling as described elsewhere [13]. Molecular assays Crossing experiments Crosses were performed between mosquitoes from Rockefeller and EE strains, respectively, homozygous (Ile/Ile) and apparently ‘heterozygous’ (Ile/Met) for the 1011 site. Each couple of one male and one virgin female was maintained for at least 3 days in conical 50 ml tubes covered with a mesh tulle under a cotton wool soaked in sugar solution. Females were then blood-fed on anesthetized mice, 24 h after sugar removal. Individual females were induced to lay eggs in small Petri dishes lined with wet filter paper [22]. Resulting F1 larvae were reared Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Genotyping by allele-specific PCR (AS-PCR) for the AaNaV 1011 site and sequencing of the IIS6 genomic region were performed with the DNA from the same specimens genotyped for the 1016 alleles, described in a previous report [13]. PCR discriminating type ‘A’ or ‘B’ sequences (see [14]) was carried out in 12.5 ml reactions containing 1 mM of each primer ‘forward’ (50 -AGGCTGACTGAAAGTAAATTGG-30 ) and ‘reverse’ (50 -CAAAAGCAAGGCTAAGAAAAGG-30 ), 6.25 ml of GoTaq Green Master Mix 2X (Promega) and 0.5 ml of genomic DNA, submitted for 30 denaturation, annealing and extension cycles under, respectively, 94 C/30", 60 C/1’ and 72 C/45". The amplified region includes the intron 20, polymorphic in size, in the AaNaV IIS6 region. For the 1011 site genotyping, PCR with 0.24 mM of common and 0.12 mM of each of the two specific primers [17] was performed as above, with 30 cycles of denaturation, annealing and extension under, respectively, 94 C/30", 57 C/1’ and 72 C/45’’ conditions. The PCR products were analyzed in 10% polyacrylamide gel electrophoresis stained in 1 mg/ml ethidium bromide solution. The AaNaV IIS6 region was amplified, cloned and sequenced as previously reported [14] in individual specimens from Uberaba, Cuiaba´, Aparecida de Goiaˆnia, Maceio´ and Fortaleza. Sequences of at least eight clones of each insect were analyzed. The numbers of copies of the AaNaV IIS6 genomic region were compared among the Rockefeller strain, the EE lineage and their F1 offspring (Hyb). DNA was extracted from pools of 10 L3 larvae (20 mg) with the kit Insect DNA Extraction (Zymo Research) according to the manufacturer’s instructions, brought to 5 ng/ml in H2O and aliquoted. Real-time PCR reactions were carried out based on instructions of customized TaqMan Copy Number Assay (Applied Biosystems) in 15 ml, containing 7.5 ml of 2 TaqMan Genotyping Master Mix (Applied Biosystems), 0.75 ml of 20 mix composed of primers and probes for both target and reference genes, 20 ng of DNA and H2O. The chosen single copy reference was the ribosomal gene RP49 (GenBank accession number AY539746), with primers AaRP49_F: 50 -ACATCGGTTACGGATCGAACA AG-30 , AaRP49_R: 50 -TGTGGACCAGGAACTTCTTG AAG-30 and probe AaRP49_M: 50 -VIC-CACCCGCCA TATGCT-MGB-NFQ-30 . The target was determined based on the AaNaV IIS6 region (GenBank accession number FJ479613) with primers AaNaVex20_F: 50 ACCGACTTCATGCACTCATTCAT-30 , AaNaVex20_R: 50 -ACAAGCATACAATCCCACATGGA-30 and probe AaNaVex20_M: 50 -FAM-CCACTCGCCGCATAAT0 MGB-NFQ-3 . Three assays were performed with DNA from three distinct pools of each lineage, in triplicate/assay. Reactions were conducted in an ABI StepOne Thermocycler (Applied Biosystems), following standard cycling conditions for TaqMan Genotyping assays. The CTs for the target (AaNaV) and reference (RP49) genes were determined based on automatic threshold indicated by the StepOne Software v2.0. Given the CT of each sample, their CTs were established, intended to normalize the amount of amplified products from AaNaV by RP49, and then the average of the replicates from each pool CT ([CT]) was calculated. The CT of the test lineages (EE and Hyb) were obtained by the difference between their [CT] and that of Rockefeller. Finally, the average of CTs from the three assays ([CT]) was calculated in order to estimate the number of AaNaV copies, normalized by RP49, related to Rockefeller. The diploid number of the target sequence of the tested sample was determined by the formula: cnc2CT, where cnc is the copy number of the target sequence in the reference sample and CT is the difference between the CT for the tested sample and the reference sample. Sodium channel gene duplication in Aedes aegypti until adults for genotyping by AS-PCR or for subsequent crossings to obtain F2, performed as above. Ethics statement Mosquito blood feeding Aedes aegypti females were fed on anesthetized mice (ketamine:xylazine 80–120:10–16 mg/kg), according to institutional procedures, oriented by the national guideline ‘the Brazilian legal framework on the scientific use of animals’ [23]. This study was reviewed and approved by the Fiocruz Ethics Committee on Animal Use (CEUA/FIOCRUZ), license number: L-011/09. RESULTS Typing of 1011 and 1016 sodium channel sites in Ae. aegypti natural populations by AS-PCR The allele frequencies of the AaNaV 1011 site were evaluated in the same mosquitoes which had the 1016 site analyzed previously, belonging to samples from 15 Brazilian localities [13]. The 1011Met-mutant allele was found in all localities, except in Boa Vista. In seven localities, specimens were divided into pyrethroid susceptible (S) or resistant (R) [13]. Table 1 shows allele frequencies considering both 1011 and 1016 sites together, combined in six molecular phenotypes, derived from three potential haplotypes (1011Ile+1016Val, 1011Ile+1016Ile and 1011Met+1016Val). We assumed that the recombinant haplotype containing both mutant alleles (1011Met+1016Ile) was not expected, because these sites are very close in the genome and both mutations are likely to be very recent. We observed that the 1011Ile/Ile+1016Ile/Ile combination, i.e. homozygous for the wild-type and for the mutant allele, respectively, in the 1011 and 1016 sites, was far more frequent among resistant than susceptible insects. This suggests that the 1016 site is probably more important for pyrethroid resistance than the 1011 site. Two other striking results can also be observed. First, we did not detect any specimen ‘homozygous’ for the 1011Met (1011Met/Met+1016Val/Val) mutation. Second, there is a higher than expected frequency of the 1011Ile/Met+1016Val/Val molecular phenotype in all samples, except the near monomorphic Boa Vista population (Table 1). Although the individual tests of the Hardy–Weinberg expectations for each sample were significant only in four cases, likely due to the small sample sizes, the lack of the 1011Met/Met+1016Val/Val molecular phenotype and the excess of 1011Ile/Met+1016Val/Val were observed in almost all populations. Two simple hypotheses were considered to explain this pattern. One possibility is that the 1011Met mutation is involved in a gene duplication, carrying both the mutant (1011Met+1016Val) and the wild-type allele (1011Ile+1016Val). In this case, the 1011Met/Met genotype would never be detected by the AS-PCR, because that duplication would generate a molecular phenotype mimicking a heterozygous 1011Ile/ Met. Alternatively, one might argue that the 1011Met mutation is lethal when in homozygosis. However, this is not the case ([16], see ‘Discussion’ section herein), and it does not explain the increased frequency of 1011Ile/Met+1016Val/Val, unless one also assumes this particular combination has a higher fitness. In order to better understand these data, we cloned and sequenced the IIS6 region from a number of mosquitoes. Sequencing of the IIS6 region of the Ae. aegypti sodium channel gene We obtained sequences of the AaNaV IIS6 region from a number of mosquitoes from five Brazilian populations (see ‘Materials and Methods’ section for details) and confirmed the polymorphism in this genomic region. Figure 1 shows the haplotypes and their respective submission numbers in GenBank. Sequences were classified as ‘A’ or ‘B’, according to two synonymous substitutions in exon 20 and differences in the intron (see [14] for details). The Ile1011Met substitution was seen in all studied populations, whereas Val1016Ile was not detected in the Northeastern localities (Maceio´ and Fortaleza). Both substitutions were present only in sequences type ‘A’, and among sequences from 40 individuals, no haplotype shared substitutions in both the 1011 and 1016 sites, indicating no recombinants between the two mutations. As mentioned above, this was expected considering that these sites are very close, and the mutations are likely to be very recent. Hence, only four haplotypes were observed 151 Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Entomological survey All field egg collections were conducted by agents from each respective State Health Secretariat, following procedures designed by the National Program of Dengue Control/Brazilian Ministry of Health. All ovitraps were installed and collected in the houses with residents’ permission. Martins et al. | R S R S R S R S R S R S R S * * * * * * * * Aparecida de Goiaˆnia 0.056 (0.094) 0.105 (0.305) 0.045 (0.052) 0.118 (0.221) 0.231 (0.148) 0.571 (0.617) 0 (0.035) 0.250 (0.303) 0.250 (0.391) 0.313 (0.431) 0.467 (0.538) 0.333 (0.444) 0.043 (0.030) 0.300 (0.276) 0.950 (0.930) 0.200 (0.090) 0 (0.191) 0 (0.003) 0.900 (0.903) 0.300 (0.423) 0.938 (0.938) 0.650 (0.681) 1016Val/Val 0 (0.204) 0.053 (0.029) 0.273 (0.299) 0.118 (0.138) 0 (0.325) 0.143 (0.112) 0.063 (0.223) 0.250 (0.275) – – – – 0.087 (0.204) 0.050 (0.184) 0 (0.095) 0 (0.255) 0.250 (0.191) 0.053 (0.078) – – – – 1016Val/Ile 1011Ile/Ile 0.222 (0.111) 0 (0.001) 0.455 (0.435) 0 (0.022) 0.385 (0.179) 0 (0.005) 0.500 (0.353) 0.100 (0.063) – – – – 0.391 (0.345) 0.050 (0.031) 0.050 (0.003) 0.250 (0.181) 0.063 (0.048) 0.526 (0.543) – – – – 1016Ile/Ile 0.500 0.842 0.091 0.588 0.308 0.286 0.313 0.350 0.750 0.688 0.533 0.667 0.174 0.400 – 0.200 0.625 0.053 0.100 0.700 0.063 0.350 (0.165) (0.301) (0.022) (0.095) (0.455) (0.061) (0.289) (0.221) (0.465) (0.052) (0.360) (0.148) (0.020) (0.260) (0.220) (0.469) (0.451) (0.391) (0.444) (0.083) (0.315) 1016Val/Val 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.222 (0.240) 0 (0.022) 0.136 (0.150) 0.176 (0.112) 0.077 (0.163) 0 (0.020) 0.125 (0.048) 0.050 (0.100) – – – – 0.304 (0.281) 0.200 (0.240) – 0.350 (0.234) 0.063 (0.150) 0.368 (0.310) – – – – (0.130) (0.177) (0.013) (0.146) (0.037) (0.037) (0.037) (0.040) (0.141) (0.118) (0.071) (0.111) (0.057) (0.090) – (0.076) (0.118) (0.044) (0.003) (0.123) (0.001) (0.031) 1016Val/Val 1011Met/Met 1016Val/Ile 1011Ile/Met Frequency of genotypes: observed (and expected assuming Hardy–Weinberg equilibrium) Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 14.7, 5, 0.0119 2.9, 5, 0.7204 1.1, 5, 0.9571 6.8, 5, 0.2347 11.2, 5, 0.0473 1.0, 5, 0.9589 15.6, 5, 0.0080 3.5, 5, 0.6213 5.8, 2, 0.0561 4.4, 2, 0.1114 2.0, 2, 0.3709 3.8, 2, 0.1534 5.5, 5, 0.3619 6.2, 5, 0.2860 1.9, 2, 0.3772 11.1, 5, 0.0487 11.7, 5, 0.0388 2.1, 5, 0.8408 0.06, 5, 0.9727 5.8, 2, 0.0551 0.02, 2, 0.9917 0.9, 2, 0.6377 2, df, P HWE Frequencies observed and expected (for Hardy–Weinberg equilibrium) of the molecular phenotypes derived by AS-PCR for the sites 1011 and 1016 in the same insects. In the header, the mutant alleles are underlined. Some populations are divided regarding their resistant (R) or susceptible (S) status to pyrethroid resistance. Populations whose individuals were not divided in R or S are marked with an asterisk (*) in status. The absence of the mutations 1011Ile/Met and 1016Val/Ile in a population is represented as endash (–). The last column gives the result of 2 analyses for testing Hardy–Weinberg equilibrium (HWE). The 1016 genotyping data were already presented elsewhere [13]. 18 19 22 17 13 14 16 20 16 16 15 15 23 20 20 20 16 19 20 20 16 20 n | Martins et al. Boa Vista Cachoeiro do Itapemirim Colatina Foz do Iguac¸u Ijuı´ Macapa´ Santa Ba´rbara Santa Rosa Uberaba Maceio´ Fortaleza Dourados Cuiaba´ Campo Grande Status Locality Table 1. Phenotypic frequency, considering AaNaV 1011 and 1016 sites, of Ae. aegypti natural populations from Brazil 152 Evolution, Medicine, and Public Health Sodium channel gene duplication in Aedes aegypti Figure 1. Diversity of a voltage-gated sodium channel gene region observed in Ae. aegypti Brazilian populations. Part of the region corresponding to the AaNaV exons 20 and 21, and the intron between them, are represented. A and B indicate the type of intron, as previously stated [14]. In red, the presumed amino acids for the sites 1011 and 1016. Genomic sequences representative for each haplotype were submitted to GenBank: 1011Ile+B+1016Val (GenBank accession number: FJ479613), 1011Ile+A+1016Val (FJ479611), 1011Met+A+ 1016Val (FJ479612) and 1011Ile+A+1016Ile (JX275501). TIGR = sequence from Ae. aegypti genome project (Vectorbase) 153 does not occur in all individuals, being therefore a polymorphic trait. In the samples analyzed, we detected mosquitoes ‘homozygous’ for the 1011Ile+ B+1016Val, 1011Ile+A+1016Val and 1011Ile+ A+1016Ile haplotypes, all having the wild-type allele for the 1011 site. However, the ‘1011Met+ A+1016Val’ (mutant in the 1011 site) haplotype was never detected in ‘homozygosis’, but always in association with ‘1011Ile+B+1016Val’, suggesting that the duplication involves these two variants (Table 2). Figure 2 presents a schematic representation of AaNaV haplotypes proposed for the populations analyzed based on our duplication hypothesis. The offspring of crosses between some combinations of parental genotypes was further analyzed in order to test this hypothesis. Crossing experiments In order to test the duplication hypothesis, we performed crosses between specimens with known molecular phenotypes (based on AS-PCR) and determined the frequency of the variants in the AaNaV 1011 site in their offspring. Initially, we evaluated the F1 of seven couples, each composed of a homozygous wild-type (1011Ile/Ile) and a putative heterozygous or duplicated (1011Ile/Met) progenitor, belonging, respectively, to the Rockefeller and the EE lineages. The latter originated from a laboratory population selection for pyrethroid resistance using a sample from a natural population that did not harbor the mutation Val1016Ile [20]. The results are shown in Table 3, with expected values and the Fisher tests for the three different hypotheses in Fig. 3, assuming either a duplication or no duplication. If the 1011Ile/Met parent did not harbor the duplicated haplotype, the offspring would present the Ile/Ile and Ile/Met genotypes in equal frequencies (Hypothesis 1). Assuming the occurrence of a duplication, one would expect the offspring genotyped as either 100% Ile/Met or alternatively Ile/Ile and Ile/Met in equal frequencies, respectively, if the parent was homozygous (Hypothesis 2a) or heterozygous (Hypothesis 2b) for the duplicated haplotype (Fig. 3). Two out of seven crosses (#3 and #4) had the 1011Ile/Ile genotype in around half of their offspring, which was thus not informative. In these two cases, this could be explained if the progenitor harboring the 1011Met mutation was heterozygous for the duplication (1011Ile/Ile_Met) as well as if it was heterozygous for non-duplicated haplotypes. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 (1011Ile+A+1016Val, 1011Ile+A+1016Ile, 1011Ile+B+1016Val and 1011Met+A+1016Val) out of six possibilities, considering the type of sequence (‘A’ or ‘B’) and the sites 1011 (Ile or Met) and 1016 (Val or Ile) (Table 2). Moreover, the 1011Met+A+1016Val haplotype was only present in specimens which also harbored the 1011Ile+ B+1016Val haplotype, therefore, classified as ‘heterozygous’. Accordingly, typing of various natural populations had revealed the absence of ‘homozygous’ for the 1011Met mutation (Table 1). Curiously, some specimens presented three haplotypes, which were in all cases: 1011Met+A+1016Val, 1011Ile+ A+1016Ile and 1011Ile+B+1016Val (Table 2). It is important to mention that females had their abdomen removed prior to DNA extraction in order to avoid eventual amplification of DNA from spermatozoids stored in the spermatechae, and there was no evidence of contamination in PCR negative controls. The last column of Table 2 presents the expected ‘genotypes’ through sequence typing (A or B) and the 1011 and 1016 sites. Sequencing confirmed the results for all insects genotyped by AS-PCR (data not shown). The presence of three alleles in one specimen suggests the gene duplication, at least in the genomic region analyzed. However, search in the Ae. aegypti genome project database (http://aaegypti. vectorbase.org/) did not indicate any evidence that the original Liverpool strain has more than one copy of any part, let alone the whole voltage-gated sodium channel gene. Based on the available sequences, this strain would be classified as homozygous for the 1011Ile+B+1016Val allele, just like the Rockefeller strain used here. Hence, the putative duplication Martins et al. | 154 | Martins et al. Evolution, Medicine, and Public Health Table 2. Sequencing of the AaNaV IIS6 genomic region of specimens from Ae. aegypti Brazilian natural populations Locality Sample Haplotype (1011+intron+1016) Ile + A + Val Uberaba Ap Goiaˆnia Maceio´ Fortaleza Ile + A + Ile X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X Ile + B + Val X X X X X X X X X X X X X X X X X X X X X Met + B + Val Ile + B + Ile Ile/Ile+AB+Val/Val Ile/Met+AB+Val/Val Ile/Met+AB+Val/Ile Ile/Ile+AA+Val/Ile Ile/Ile+AA+Val/Ile Ile/Ile+AA+Ile/Ile Ile/Ile+AA+Ile/Ile Ile/Ile+BB+Val/Val Ile/Ile+AB+Val/Ile Ile/Met+AB+Val/Ile Ile/Ile+AA+Ile/Ile Ile/Ile+AA+Ile/Ile Ile/Ile+AA+Ile/Ile Ile/Ile+AB+Val/Ile Ile/Ile+AA+Val/Val Ile/Ile+AB+Val/Val Ile/Ile+AB+Val/Val Ile/Ile+AB+Val/Val Ile/Ile+AB+Val/Val Ile/Ile+AB+Val/Val Ile/Ile+AB+Val/Val Ile/Met+AB+Val/Val Ile/Ile+AA+Val/Ile Ile/Met+AB+Val/Ile Ile/Met+AB+Val/Val Ile/Met+AB+Val/Val Ile/Met+AB+Val/Val Ile/Met+AB+Val/Ile Ile/Met+AB+Val/Val Ile/Met+AB+Val/Val Ile/Met+AB+Val/Val Ile/Met+AB+Val/Val Ile/Met+AB+Val/Val Ile/Met+AB+Val/Val Ile/Ile+BB+Val/Val Ile/Ile+BB+Val/Val Ile/Met+AB+Val/Val Ile/Ile+AA+Val/Val Ile/Ile+BB+Val/Val Ile/Ile+AB+Val/Val Identification of each sample corresponds to the sampling locality: UBR, Uberaba; CUI, Cuiaba´; APG, Aparecida de Goiaˆnia; COM, Maceio´ and hrjg, Henrique Jorge (a district of Fortaleza). ‘Haplotypes’ indicate the combination among site 1011 (Ile or Met)+type of intron (A or B)+site 1016 (Val or Ile). The haplotype observed for each insect is marked by an ‘X’. In the header, the mutations are indicated in bold letters. The last column shows the phenotypic classification, confirmed by AS-PCR. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Cuiaba´ UBR-04 UBR-08 UBR-10 UBR-S25 UBR-S26 UBR-R1 UBR-R3 UBR-R10 UBR-R11 UBR-R13 UBR-R20 UBR-R22 UBR-R26 CUI-01 CUI-02 CUI-03 CUI-04 CUI-07 CUI-08 CUI-12 CUI-R16 CUI-S15 APG-01 APG-02 APG-04 APG-05 APG-06 APG-07 APG-08 APG-09 APG-10 APG-11 APG-12 COM-02 COM-07 COM-09 hrjg-21 hrjg-22 hrjg-23 hrjg-28 Met + A + Val Molecular phenotype (1011+intron+1016) Sodium channel gene duplication in Aedes aegypti Martins et al. | 155 Figure 2. Schematic representation of AaNaV haplotypes. Blue boxes indicate exons 20 and 21 with the intron between them, the latter used to classify the haplotypes as A (orange) or B (green). Sites 1011 and 1016 are represented by the variant wild-type (blue box) or mutant (red box). According to our hypothesis, there is a duplication in some populations, comprised of haplotypes 1011Ile+B+1016Val and 1011Met+A+1016Val. Dashed line suggests linkage of the haplotypes, but which one is upstream was not determined Table 3. Testing the gene duplication hypothesis: molecular phenotype frequencies for the AaNav 1011 site in F1 offspring from crossings between Ae. aegypti Ile/Ile X Ile/Met Crossings Hypothesesa F1 observed (n) With duplication Hypothesis 1 #1 #2 #3 #4 #5 #6 #7 (, Ile/Met x < Ile/Ile) (, Ile/Met x < Ile/Ile) (, Ile/Met x < Ile/Ile) (,Ile/Ile x < Ile/Met) (, Ile/Ile x < Ile/Met) (, Ile/Met x < Ile/Ile) (, Ile/Met x < Ile/Ile) Hypothesis 2a Hypothesis 2b Ile/Ile Ile/Met Ile/Ile Ile/Met P Ile/Ile Ile/Met P Ile/Ile Ile/Met P 0 0 8 9 0 0 0 20 20 12 9 30 30 22 10 10 10 9 15 15 11 10 10 10 9 15 15 11 *** *** NS NS *** *** *** 0 0 0 0 0 0 0 20 20 20 18 30 30 22 NS NS ** *** NS NS *** 10 10 10 9 15 15 11 10 10 10 9 15 15 11 *** *** NS NS *** NS NS Molecular phenotype frequencies were determined by AS-PCR for the AaNaV 1011 site (see ‘Materials and Methods’ section). aExpected numbers of F1 individuals of each molecular phenotype based on the three hypotheses of parental haplotype constitution (Fig. 3). Significance of the deviations of the tested hypotheses obtained through Fisher’s exact test: NS = non-significant, **P < 0.01, ***P < 0.001. However, as all the offspring from the other five crosses were 1011Ile/Met, the progenitor who harbored the mutation was necessarily homozygous for the duplication (Ile_Met/Ile_Met) (Fig. 3). In addition, the F2 offspring from crosses #1 (#1.1) and #2 (#2.1) revealed segregation in the approximated proportion of 3Ile/Met:1Ile/Ile (Table 4), corroborating the duplication hypothesis. Figure 3. Three hypotheses with the expected genotypes and molecular phenotypes in the AaNaV 1011 site for the parental and their respective expected frequency in the F1 offspring. Copy number assay We analyzed the AaNaV copy number variation through molecular assays using DNA from pools of larvae from the Rockefeller reference strain, homozygous for the wild-type alleles, and a strain (EE) selected in the laboratory for pyrethroid resistance [20] and harboring the putative duplication in The 1011Met mutation is shown in red. See text for further details the AaNaV, as suggested by the assays described above. In this sense, we assessed the relative amount of DNA molecules containing the genomic region spanning the AaNaV 1011 site normalized by Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Without duplication 156 | Martins et al. Evolution, Medicine, and Public Health a reference gene (RP49). Assuming that the Rockefeller strain has only two copies of AaNaV as expected for a diploid with a single copy gene, the EE lineage selected for resistance and ‘homozygous’ for the duplication, revealed to have in fact 10 copies (Table 5 and Supplementary Table S1). Accordingly, the F1 resulting from Rockefeller and EE had six copies. The results therefore indicate further duplication events and amplification in this locus. DISCUSSION Table 4. Testing the gene duplication hypothesis: molecular phenotype frequencies for the AaNav 1011 site in F2 offspring from crosses #1 and #2 (Table 3) Crossings (F1) F2 (n) Observed #1.1 (, Ile/Met x < Ile/Met) #2.1 (, Ile/Met x < Ile/Met) Expected Ile/Ile Ile/Met Ile/Ile Ile/Met P 5 7 25 23 8 8 22 22 NS NS Observed and expected numbers for each molecular phenotype in the F2 of crosses #1 and #2 (Table 3) assuming parents carry the following haplotypes Ile/Ile_Met Ile/Ile_Met, in agreement with the duplication hypothesis (Fig. 3). The expected frequencies are 0.25 Ile/Ile and 0.75 Ile/Met (0.50Ile/Ile_Met+0.25Ile_Met/Ile_Met). Deviations from the proposed hypotheses are non-significant (Fisher’s exact test; P > 0.05). Table 5. Copy number variation assay for AaNaV Assay 1 2 3 Rock EE Hib [CT] (SD) Cq [CT] (SD) Cq [CT] (SD) Cq 0.4 0 -0.7 (0.09) (0.11) (0.07) 0 0 0 2.7 2.4 3 (0.03) (0.04) (0.04) 2.3 2.4 2.4 2 1.7 -2.4 (0.07) (0.05) (0.06) 1.6 1.6 1.7 [CT] (SD) 0 Cn 2 (0.03) 2.3 10 (0.07) 1.6 6 Average and standard deviation CT (target reference) followed by the Cq (lineage test Rock) values from each lineage in each assay. Bottom: mean and standard deviation of CT from the three assays and the resulting number of copies (cn) of AaNaV relative to rp49. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 DDT and pyrethroids target the voltage-gated sodium channel (NaV) of insects, a key component of axon membranes exhibiting a fundamental physiological function in neural current propagation, with a complex but highly conserved structure among animals [24]. Vertebrate genomes present 6–10 NaVcoding genes, whereas invertebrate classes, such as Cnidaria and Annelida, have only 2–4 NaV genes [25]. In insects, there is only one NaV, also commonly referred to as ‘paralytic’ (para), due to its relationship with the phenotype of reversible paralysis under high temperatures in Drosophila melanogaster-mutant lineages [26, 27]. An important source of NaV protein variability in different tissues relies on alternative splicing and RNA editing [28]. However, to date no association between pyrethroid resistance and variation derived from post-transcriptional modifications in the Ae. aegypti NaV gene has been uncovered [18]. Another possible source of molecular diversity might be polymorphism generated by recent gene duplications. Putative additional NaV in insects (the orthologous channels DSC1 in D. melanogaster and BSC1 in Blattella germanica) were later grouped close to calcium channels, both functionally and evolutionarily [29, 30]. Recently, two NaV distantly related proteins were characterized in the Periplaneta americana cockroach, coded by the PaNaV and PaFPC para-like genes, a finding that Sodium channel gene duplication in Aedes aegypti metabolic resistance was demonstrated in Caribbean Ae. aegypti populations. Compared with the susceptible strain, two genes (CYP9J26 and the ABC transporter ABCB4) were amplified up to eight and seven copies, respectively [44]. Besides insecticide resistance, duplication of metabolic-resistance genes may also be selectively advantageous to the organism by increasing its general ability of detoxify xenobiotics. Moreover, new functions might be generated due to accumulation of substitutions in duplicated genes [45]. Such events would be more ‘free’ to occur, since the detoxifying enzyme system is redundant, reliant upon different enzymes with a similar function. Hence, the accumulation of potential loss of function alterations might not significantly compromise the metabolism [46]. By contrast, gene duplication events in molecules which are targets of neurotoxic insecticides are thought to be less likely, since they carry out very specific and essential activities, highly conserved throughout evolution. The increase in number might compromise the neurological functioning of the organism, an event described as dosage-balance hypothesis [47]. For instance, a Culex pipiens lineage with an acetilcolinesterase gene (ace-1) duplication presents 60% increase in enzyme activity. However, the acquired organophosphate resistance status is accompanied by an elevated cost of several life-history trait parameters [48]. Indeed, in a number of Cx. pipiens populations, the frequency of the ace-1R-mutant allele decays quickly in the absence of insecticide [49, 50], the same tendency observed for ace-1R in An. gambiae [51]. However, Cx. pipiens’ natural populations with a putative recent ace-1 gene duplication (<40 years) have also been described. In these cases, both copies, with and without the mutation selected for organophosphate resistance, lie in the same chromosome. These mosquitoes, with a ‘heterozygous’ molecular phenotype, are resistant to organophosphates but have a lower fitness loss [52], suggesting a mechanism which favors the occurrence of duplications in neurotoxic insecticide target-coding genes. Herein, we initially hypothesized a duplication in a region of the NaV gene of Ae. aegypti (AaNaV) as a polymorphic trait in natural populations of this important vector, which would include one-mutant haplotype for the 1011 site together with one wildtype for both sites, 1011Met+1016Val and 1011Ile+1016Val, respectively, supported by a fund 157 Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 suggested a possible early duplication event and subsequent loss of the NaV gene in some lineages [31]. The role of gene duplication and/or amplification in insecticide resistance has been described in at least 10 arthropod species, including mosquitoes [32]. The most classic case involves overexpression of Culex Esterase genes, leading to organophosphate resistance. This is the consequence of duplication of two genes (named esterase A and esterase B) or at least the esterase B [33– 35]. Amplification of esterase B1 in Californian Culex mosquitoes was the first event described in this context [36]. Variation in the number of copies among insects was also observed, being directly proportional to organophosphate resistance levels [37]. In agreement, laboratory insecticide selection pressure resulted in an increase in the gene copy numbers. However, it is likely that this process has a limit, since gene amplification is associated with a high fitness cost [38]. In fact, unequal crossingover in the duplicated locus [37] may cause a reduction in copy number over time in the absence of insecticide pressure. Gene duplication was also associated with another class of enzymes related to metabolic resistance, the multi-function oxidases (MFOs) or P450 [39]. Two genes of this class (CYP6P9 and CYP6P4) were overexpressed in pyrethroid-resistant lineages of the malaria vector, Anopheles funestus. This overexpression is associated within tandem gene duplications, mapped in a quantitative trait locus (QTL locus rp1) and responsible for 87% of the genetic variation for pyrethroid resistance in this lineage. Besides, single nucleotide polimorphisms (SNPs) observed in these genes were described as insecticide-resistance markers [39]. Another gene duplication event was associated with overexpression of a P450 gene (CYP9M10) in a pyrethroid-resistant strain of Culex quinquefasciatus [40]. Duplications in genes coding for enzymes involved in metabolic resistance are somewhat expected, since they are components of supergene families bearing many paralogous genes, generally organized in genome clusters [41]. These are rapidly evolving families and few orthologs are identified among insect species [42]. In the Ae. aegypti genome, at least 26, 49 and 160 genes of the main detoxifying enzymes were identified corresponding, respectively, to GST, Esterases and MFO. These numbers represent an increase of 36% compared with Anopheles gambiae [43]. Recently, the importance of gene amplification for pyrethroid Martins et al. | 158 | Martins et al. Evolution, Medicine, and Public Health when pools of 10 larvae were employed. The variation in the number of copies in natural populations remains to be investigated as an important clue for this evolutionary process. Amplification of the NaV gene was also recently demonstrated in a pyrethroid-resistant C. quinquefasciatus lineage. The classical kdr mutation (Leu1014Phe), strongly associated to pyrethroid resistance, was present in one type of sequence. The other type of sequence lacked the intron close to the 1014 site and was not related to resistance. This haplotype was suggested to be a pseudogene [55]. To the best of our knowledge, we present here the first evidence of a duplication event in the sodium channel gene of the dengue vector, Ae. aegypti. Although the available data point to a more important role of the mutations in the 1016 site for pyrethroid resistance, there is clear evidence that the 1011Met mutation, which is associated with the duplication/amplification event(s), is also associated with some resistance [12, 14]. Therefore, the gene duplication and amplification in the Ae. aegypti NaV gene might be a recent adaptive response to the intense use of insecticides, maintaining together wild-type and mutant alleles in the same organism conferring some resistance at the same time as reducing some of its deleterious effects on other aspects of fitness. It will be very interesting to investigate how much diversity in copy number variation there is in natural populations, besides its possible association with pyrethroid resistance and fitness cost. It is also intriguing whether the mosquito sodium channel gene is more prone to duplications than that of other pyrethroid-selected insects as well as what the potential evolutionary interpretation and implications of this process are. supplementary data Supplementary data is available at EMPH online. acknowledgments The authors thank Dr Alexandre Afranio Peixoto for his friendship and orientation throughout this study. This work is dedicated to his memory. They also thank Andre Torres and Heloisa Diniz for their assistance with the figures, the DNA sequencing facility of FIOCRUZ (Plataforma de Sequenciamento/PDTIS/Fiocruz) and to the Brazilian Dengue Control Program that allowed utilization of samples collected in the scope of the Brazilian A. aegypti Insecticide Resistance Monitoring Network (MoReNAa). Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 of evidence. AS-PCR genotyping confirmed that all individuals carrying the 1011Met mutation were (phenotypically) ‘heterozygous’. In addition, sequencing of the AaNaV IIS6 genomic region revealed some individuals with three haplotypes, suggesting the existence of a duplication with the proposed aforementioned composition. Similar results of mosquitoes harboring three alleles were recently reported for the An. gambiae acetilcolinesterase ace-1 gene and interpreted as evidence of a gene duplication event [53]. Saavedra-Rodriguez et al. [16] evaluated the role of AaNaV mutations in pyrethroid resistance by analyzing the susceptibility of the F3 offspring from the parental crossing ,1011Ile/Met+1016Ile/Ile (from Isla Mujeres, Mexico) <1011Ile/Ile+ 1016Val/Val (from New Orleans, lineage control of susceptibility). Interestingly, if the presence of a duplicated sodium channel had been considered, interpretation of some results would have been made easier since they would have better explained the different genotypes in the crosses. In addition, it is remarkable that the Ile1011Met substitution seems to appear in ‘homozygosis’ (1011Met/Met) in high frequency in other localities in Latin America [16, 54], indicating that this mutation is not recessive-lethal and that different types of duplicated haplotypes probably coexist in Ae. aegypti populations. This might also suggest that the gene duplication in the Ae. aegypti NaV gene we observed in Brazilian populations is a relatively recent event. Our initial hypothesis was that, at least for the Ae. aegypti populations studied herein, the 1011Met mutation occurs only in a duplicated haplotype containing a type ‘A’ sequence and the 1016Val wild-type allele, together and in linkage disequilibrium with a type ‘B’ sequence, containing the wild-type allele for both the 1011 and 1016 positions (Fig. 2). The high frequency of ‘heterozygous’ A/B, the lack of 1011Met/ Met specimens, 1011Ile/Met+1016Ile/Ile genotypes and the molecular phenotype of the offspring analyzed here support this hypothesis. However, the results obtained by the copy number variation assay show a ratio of five copies of the AaNaV gene in the EEselected lineage when compared with the Rockefeller strain, indicating that further duplication events might have taken place, possibly as a result of unequal crossing-over. Moreover, it is presumed that the number of copies is a polymorphic trait, given the large variation observed when using single mosquito DNA (data not shown), which was diminished Sodium channel gene duplication in Aedes aegypti funding Martins et al. | 159 13. Martins AJ, Lima JB, Peixoto AA et al. Frequency of Val1016Ile mutation in the voltage-gated sodium channel This work was supported by the Conselho Nacional de Desenvolvimento Cientı´fico e Tecnolo´ico (CNPq - Pronex gene of Aedes aegypti Brazilian populations. Trop Med Int Dengue), Fundac¸a˜o Carlos Chagas Filho de Amparo a` 14. Martins AJ, Lins RM, Linss JG et al. Voltage-gated sodium Pesquisa do Estado do Rio de Janeiro (FAPERJ - Cientistas channel polymorphism and metabolic resistance in pyr- do nosso estado), the Howard Hughes Medical Institute (HHMI) and the Instituto Nacional de Cieˆmcia e Tecnologia ethroid-resistant Aedes aegypti from Brazil. Am J Trop Med - Entomologia Molecular (INCT-EM). The funders had no role 15. Rajatileka S, Black WC 4th, Saavedra-Rodriguez K et al. in study design, data collection and analysis, decision to Development and application of a simple colorimetric publish, or preparation of the manuscript. Funding to pay assay reveals widespread distribution of sodium channel Health 2009;14:1351–5. Hyg 2009;81:108–15. the Open Access publication charges for this article was mutations in Thai populations of Aedes aegypti. Acta Trop provided by Fundac¸a˜o Oswaldo Cruz (FIOCRUZ). 2008;108:54–7. Conflict of interest: None declared. 16. Saavedra-Rodriguez K, Urdaneta-Marquez L, Rajatileka S et al. A mutation in the voltage-gated sodium channel gene associated with pyrethroid resistance in Latin American Aedes aegypti. Insect Mol Biol 2007;16:785–98. references 17. Garcia GP, Flores AE, Fernandez-Salas I et al. Recent rapid Epidemiol Serv Sau´de 2007;16:295–302. 2. Soderlund DM. Pyrethroids, knockdown resistance and sodium channels. Pest Manag Sci 2008;64:610–6. aegypti in Mexico. PLoS Negl Trop Dis 2009;3:e531. 18. Chang C, Shen WK, Wang TT et al. A novel amino acid substitution in a voltage-gated sodium channel is associated with knockdown resistance to permethrin in Aedes aegypti. Insect Biochem Mol Biol 2009;39:272–8. 3. da-Cunha MP, Lima JB, Brogdon WG et al. Monitoring of 19. Harris AF, Rajatileka S, Ranson H. Pyrethroid resistance in resistance to the pyrethroid cypermethrin in Brazilian Aedes Aedes aegypti from Grand Cayman. Am J Trop Med Hyg aegypti (Diptera: Culicidae) populations collected between 2001 and 2003. Mem Inst Oswaldo Cruz 2005;100:441–4. 2010;83:277–84. 20. Martins AJ, Ribeiro CD, Bellinato DF et al. Effect of insecti- 4. Montella IR, Martins AJ, Viana-Medeiros PF et al. cide resistance on development, longevity and reproduc- Insecticide resistance mechanisms of Brazilian Aedes tion of field or laboratory selected Aedes aegypti aegypti populations from 2001 to 2004. Am J Trop Med populations. PLoS One 2012;7:e31889. Hyg 2007;77:467–77. 5. Busvine JR. Mechanism of resistance to insecticide in 21. Lima JB, Da-Cunha MP, Da Silva RC et al. Resistance of houseflies. Nature 1951;168:193–5. 6. Harrison CM. Inheritance of resistance of DDT in the municipalities in the State of Rio de Janeiro and Espirito housefly, Musca domestica L. Nature 1951;167:855–6. 7. Catterall WA. From ionic currents to molecular mechan- 22. Belinato TA, Martins AJ, Lima JB et al. Effect of the chitin isms: the structure and function of voltage-gated sodium bility and reproduction of Aedes aegypti. Mem Inst Oswaldo channels. Neuron 2000;26:13–25. 8. Du Y, Nomura Y, Luo N et al. Molecular determinants on 23. Filipecki AT, Machado CJ, Valle S et al. The Brazilian legal Aedes aegypti to organophosphates in several Santo, Brazil. Am J Trop Med Hyg 2003;68:329–33. synthesis inhibitor triflumuron on the development, viaCruz 2009;104:43–7. the insect sodium channel for the specific action of type II framework on the scientific use of animals. ILAR J 2011;52: pyrethroid insecticides. Toxicol Appl Pharmacol 2009;234: E8–15. 266–72. 9. O’Reilly AO, Khambay BP, Williamson MS et al. Modelling insecticide-binding sites in the voltage-gated sodium channel. Biochem J 2006;396:255–63. 10. Davies TE, O’Reilly AO, Field LM et al. Knockdown resist- 24. Martins AJ, Valle D. The pyrethroid knockdown resistance. In: Soloneski,S, Larramendy,MS (eds), Insecticides—Basic and Other Applications. Rijeka: InTech, 2012, 17–38. 25. Goldin AL. Evolution of voltage-gated Na+ channels. J Exp Biol 2002;205(Pt 5): 575–84. ance to DDT and pyrethroids: from target-site mutations 26. Suzuki DT, Grigliatti T, Williamson R. Temperature-sensi- to molecular modelling. Pest Manag Sci 2008;64:1126–30. tive mutations in Drosophila melanogaster. VII. A mutation 11. Davies TG, Field LM, Usherwood PN et al. DDT, pyreth- (para-ts) causing reversible adult paralysis. Proc Natl Acad rins, pyrethroids and insect sodium channels. IUBMB Life 2007;59:151–62. 12. Brengues C, Hawkes NJ, Chandre F et al. Pyrethroid and DDT cross-resistance in Aedes aegypti is correlated with novel mutations in the voltage-gated sodium channel gene. Med Veterinary Entomol 2003;17:87–94. Sci USA 1971;68:890–3. 27. Loughney K, Kreber R, Ganetzky B. Molecular analysis of the para locus, a sodium channel gene in Drosophila. Cell 1989;58:1143–54. 28. Davies TG, Field LM, Usherwood PN et al. A comparative study of voltage-gated sodium channels in the Insecta: Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 rise of a permethrin knock down resistance allele in Aedes 1. Braga IA, Valle D. Aedes aegypti: vigilaˆncia, monitoramento da resisteˆncia e alternativas de controle no Brasil. 160 | Martins et al. Evolution, Medicine, and Public Health implications for pyrethroid resistance in Anopheline and other Neopteran species. Insect Mol Biol 2007;16:361–75. environmental response in the honeybee. Insect Mol Biol 2006;15:615–36. 29. Zhou W, Chung I, Liu Z et al. A voltage-gated calcium- 43. Strode C, Wondji CS, David JP et al. Genomic analysis of selective channel encoded by a sodium channel-like gene. detoxification genes in the mosquito Aedes aegypti. Insect Neuron 2004;42:101–12. 30. Cui YJ, Yu LL, Xu HJ et al. Molecular characterization of DSC1 orthologs in invertebrate species. Insect Biochem Mol Biol 2012;42:353–9. 31. Moignot B, Lemaire C, Quinchard S et al. The discovery of a novel sodium channel in the cockroach Periplaneta americana: evidence for an early duplication of the paralike gene. Insect Biochem Mol Biol 2009;39:814–23. 32. Bass C, Field LM. Gene ampliEcation and insecticide resistance. Pest Manag Sci 2011;67:886–9. 33. Raymond M, Chevillon C, Guillemaud T et al. An overview of the evolution of overproduced esterases in the mosquito Culex pipiens. Philos Trans R Soc Lond B Biol Sci 1998;353:1707–11. esterase A and B genes as a single unit in Culex pipiens mosquitoes. Heredity (Edinb) 1996;77(Pt 5): 555–61. ing the molecular basis of pyrethroid resistance in the dengue vector, Aedes aegypti. PLoS Negl Trop Dis 2012;6: e1692. 45. Kimura M, King JL. Fixation of a deleterious allele at one of two ‘‘duplicate’’ loci by mutation pressure and random drift. Proc Natl Acad Sci USA 1979;76:2858–61. 46. Conant GC, Wolfe KH. Turning a hobby into a job: how duplicated genes find new functions. Nat Rev Genet 2008; 9:938–50. 47. Papp B, Pal C, Hurst LD. Dosage sensitivity and the evolution of gene families in yeast. Nature 2003;424: 194–7. 48. Bourguet D, Raymond M, Fournier D et al. Existence of two acetylcholinesterases in the mosquito Culex pipiens 35. Montella IR, Schama R, Valle D. The classification of esterases: an important gene family involved in insecticide (Diptera:Culicidae). J Neurochem 1996;67:2115–23. 49. Berticat C, Boquien G, Raymond M et al. Insecticide resist- resistance—a review. Mem Inst Oswaldo Cruz 2012;107: ance genes induce a mating competition cost in Culex 36. Mouches C, Pasteur N, Berge JB et al. Amplification of an pipiens mosquitoes. Genet Res 2002;79:41–7. 50. Berticat C, Duron O, Heyse D et al. Insecticide resistance esterase gene is responsible for insecticide resistance in a genes confer a predation cost on mosquitoes, Culex 437–49. California Culex mosquito. Science 1986;233:778–80. pipiens. Genet Res 2004;83:189–96. 37. Guillemaud T, Lenormand T, Bourguet D et al. Evolution of 51. Alout H, Djogbenou L, Berticat C et al. Comparison of resistance in Culex pipiens: Allele replacement and changing environment. Evolution 1998;52:443–53. Anopheles gambiae and Culex pipiens Acetycholinesterase 1 biochemical properties. Comp Biochem Physiol B 38. Raymond M, Poulin E, Boiroux V et al. Stability of insecti- Biochem Mol Biol 2008;150:271–7. cide resistance due to amplification of esterase genes in 52. Labbe P, Berthomieu A, Berticat C et al. Independent Culex pipiens. Heredity 1993;70:301–7. 39. Wondji CS, Irving H, Morgan J et al. Two duplicated P450 duplications of the acetylcholinesterase gene conferring genes are associated with pyrethroid resistance in insecticide resistance in the mosquito Culex pipiens. Mol Biol Evol 2007;24:1056–67. Anopheles funestus, a major malaria vector. Genome Res 53. Djogbenou L, Chandre F, Berthomieu A et al. Evidence of 2009;19:452–9. 40. Itokawa K, Komagata O, Kasai S et al. Genomic structures introgression of the ace-1R mutation and of the ace-1 duplication in West African Anopheles gambiae s. s. PLoS One of Cyp9m10 in pyrethroid resistant and susceptible strains of Culex quinquefasciatus. Insect Biochem Mol Biol 2010;40: 631–40. 41. Ranson H, Claudianos C, Ortelli F et al. Evolution of supergene families associated with insecticide resistance. Science 2002;298:179–81. 42. Claudianos C, Ranson H, Johnson RM et al. A deficit of detoxification enzymes: pesticide sensitivity and 2008;3:e2172. 54. Lima EP, Paiva MH, de Araujo AP et al. Insecticide resistance in Aedes aegypti populations from Ceara, Brazil. Parasit Vectors 2011;4:5. 55. Xu Q, Tian L, Zhang L et al. Sodium channel genes and their differential genotypes at the L-to-F kdr locus in the mosquito Culex quinquefasciatus. Biochem Biophys Res Commun 2011;407:645–9. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 34. Rooker S, Guillemaud T, Berge J et al. Coamplification of Biochem Mol Biol 2008;38:113–23. 44. Bariami V, Jones CM, Poupardin R et al. Gene amplification, ABC transporters and cytochrome P450s: unravel- 187 Evolution, Medicine, and Public Health [2013] pp. 187–196 doi:10.1093/emph/eot016 orig inal research article Patterns of physical and psychological development in future teenage mothers Daniel Nettle1, Thomas E. Dickins*2, David A. Coall3,4 and Paul de Mornay Davies2 Centre for Behaviour and Evolution, Institute of Neuroscience, Newcastle University, Newcastle, UK; 2Department of Clinical Neurosciences, University of Western Australia, Crawley, Australia; and 4School of Medical Sciences, Edith Cowan University, Joondalup, Australia *Corresponding author. Department of Psychology, Middlesex University, London, UK. Tel:+44(0)2084114588; E-mail: t.dickins@mdx.ac.uk Received 3 June 2013; revised version accepted 13 August 2013 ABSTRACT Background and objectives: Teenage childbearing may have childhood origins and can be viewed as the outcome of a coherent reproductive strategy associated with early environmental conditions. Life-history theory would predict that where futures are uncertain fitness can be maximized through diverting effort from somatic development into reproduction. Even before the childbearing years, future teenage mothers differ from their peers both physically and psychologically, indicating early calibration to key ecological factors. Cohort data have not been deliberately collected to test life-history hypotheses within Western populations. Nonetheless, existing data sets can be used to pursue relevant patterns using socioeconomic variables as indices of relevant ecologies. Methodology: We examined the physical and psychological development of 599 young women from the National Child Development Study who became mothers before age 20, compared to 599 socioeconomically matched controls. Results: Future young mothers were lighter than controls at birth and shorter at age 7. They had earlier menarche and accelerated breast development, earlier cessation of growth and shorter adult stature. Future young mothers had poorer emotional and behavioural adjustment than controls at age 7 and especially 11, and by age 16, idealized younger ages for marriage and parenthood than did the controls. Conclusions and implications: The developmental patterns we observed are consistent with the idea that early childbearing is a component of an accelerated reproductive strategy that is induced by earlylife conditions. We discuss the implications for the kinds of interventions likely to affect the rate of teenage childbearing. K E Y W O R D S : life history theory; development; early reproduction; reproductive strategy ß The Author(s) 2013. Published by Oxford University Press on behalf of the Foundation for Evolution, Medicine, and Public Health. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 1 Psychology, Middlesex University, London, UK; 3Community, Culture, and Mental Health Unit, School of Psychiatry and 188 | Nettle et al. Evolution, Medicine, and Public Health INTRODUCTION Whether or not an organism is high or low on plasticity, their phenotype is regarded as the outcome of selection operating within the parameters of key trade-offs. ‘Trade-offs represent the costs paid in the currency of fitness when a beneficial change in one trait is linked to a detrimental change in another’ [21]. One key trade-off is that between current and future reproduction. Physiologically this amounts to a decision about when to stop investing in somatic capital (growth and maintenance) and divert energy into reproduction [17, 22]. Some species have a total commitment to this decision, including Pacific salmon, whose bodies deteriorate during spawning as they divert all of their somatic capital into reproduction. They die immediately after this event. Other species, including our own, have a mixed allocation across lifespan, and in our case we have a lengthy pre- and post-reproduction life [23]. Within species variation in timing of first reproduction should be sensitive to local ecology. A resource rich ecology will enable a relatively lengthy investment in somatic capital and a consequent delay in reproduction. Where the ecology is stressed, and resource acquisition uncertain, the somatic investment should stop sooner, and reproduction will commence earlier [24]. The trade-off between quality and quantity of offspring will also provide selection pressure. Ecological stress can lead to increased reproduction, effectively as a bet-hedging strategy. Better resources allow for investment in more robust, higher quality offspring [25]. Human populations in the developed world are not uniform in their ecological niche, and do not have equal access to resources. This leads to distinct life-history differences in terms of morbidity and mortality across socioeconomic gradients [26]. There are also differences in reproductive strategy, such that low socioeconomic status neighbourhoods carry a higher risk of teenage pregnancy and motherhood [3, 13, 27–29]. Life-history theory leads us to expect key individual differences in behaviour and physical growth between those who engage in early reproduction compared with those who are relatively delayed. Thus, teenage motherhood can be seen as an extreme end of a niche-specific early fertility strategy. The average age of first birth in poorer neighbourhoods will be lower than that in wealthier boroughs, but not all reproduction will begin during teenage years in deprived areas [30]. For those who do reproduce during their teenage Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Most women in Western populations delay the onset of childbearing. However, there is a small minority who become mothers before the age of 20. This ‘teenage childbearing’ phenomenon continues to attract public health interest and policy interventions [1–3], although the basis for considering it a major problem is debatable [4–6]. Policy makers often regard teenage childbearing as a mistake, stemming from lack of skills and knowledge surrounding contraception and sexual relationships [2, 7]. However, the contention that contraceptive behaviour or knowledge is a major causal factor is not well supported by evidence [1, 8, 9]. Moreover, programmes of intervention that provide contraceptive education to adolescents have been found to have no effect on the rate of teenage childbearing [10–12]. Policy makers have viewed this phenomenon as the outcome of ‘poor’ reasoning, and it is assumed that better reasoning will lead to delayed reproduction [13]. An alternative perspective holds that early childbearing is part of a coherent reproductive strategy for some women. Indeed, women’s ideal age for parenthood, surveyed at age 16 in the National Child Development Study (NCDS) (see below), is generally a good predictor of their subsequent actual age at first pregnancy [14]. Such desires could be seen as indicative of peer pressure imposing a social norm within such populations, but stable pro-natal attitudes of this sort also require an explanation, and could easily be symptomatic of a reproductive strategy [13]. In addition, teenage mothers reach menarche relatively early [15], suggesting more rapid maturation. Reproductive strategies differ between and within species. Life-history theory captures these differences [16]. A key assumption is that organisms will act to maximize their average lifetime inclusive fitness, and that selection will have led to the evolution of proximate mechanisms that enable physiological and behavioural calibration to local ecological contingencies [17]. The degree of calibration will vary across species from fixed to more plastic strategies. Those that inhabit relatively stable ecological niches are more likely to have low levels of plasticity compared with generalists or those from stochastic ecologies [18–20]. Within a species, where different ecologies are populated, we should expect to see different phenotypic responses to maximize inclusive fitness. Physical and psychological development in teenage mothers examining emotional and behavioural adjustment earlier in childhood in future teenage mothers. The strategic view of teenage childbearing also suggests that future teenage mothers should have a motivational orientation towards early childbearing, and this should be significantly before first conception. Consistent with this view, Maestripieri et al. [53] found that adolescent women from father-absent households, who are prone to show accelerated reproductive strategies, show a greater preference for images of infants than their peers. In this article, we use longitudinal data from the NCDS to compare the developmental profiles of a group of young women who became teenage mothers with those of a control group who did not. We examine physical variables (weight and height, weight and height gain, pubertal development, timing of menarche), and psychological variables (psychological adjustment in childhood, reproductive intentions at adolescence). As outlined earlier, we predict that the future young mothers will be characterized by poorer growth very early in life, rapid weight gain in middle childhood, early menarche and pubertal maturation and the early cessation of growth. Psychologically, we would expect to see negative emotional symptoms and behavioural adjustment problems in childhood, and a motivational orientation to early parenthood that is detectable by adolescence. We also investigate exposure to contraceptive education at age 16, to test for effects of lack of knowledge. Several of the developmental differences we predict have been found in previous research (e.g. early menarche [14], reduced adult stature [54], unhappiness in childhood [55] and idealization of early parenthood [28] are all associated with teenage childbearing). However, not all studies control rigorously for socio-economic position. This is important, as teenage childbearing is concentrated in the poorest social strata [56], and thus future teenage mothers will differ from the rest of the population in many ways that are related to poverty, but not directly related to their reproductive schedules. In this study, we compare future young mothers only to a socioeconomically matched control group to mitigate this problem, and to identify precursors that are specific to teenage childbearing. Moreover, no previous study has examined all the physical and psychological antecedents in a single investigation. The NCDS has exceptionally rich longitudinal data, including a wide variety of different measures, allowing this order of analysis. We can therefore 189 Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 years we must look to additional differences between mothers, and idiosyncratic ecological issues, beyond a general socioeconomic categorization. Belsky et al. [31] proposed that adverse early-life conditions—specifically, low parental investment and family stress—induce accelerated reproductive strategies as an adaptive response. Many studies have observed associations consistent with this hypothesis, such as those between low birthweight and early menarche [32–34], poor parent–child relationships and early menarche [35–38], or between stressful family environment and age at first sexual activity or conception [39, 40]. It is hard to separate out genetic and environmental explanations for these associations, given that there are established heritable effects on pubertal maturation [41], and there could be genetic correlations between these factors and parenting behaviours [42, 43]. However, evidence from genetically informative study designs [36], and experimental animal models [44, 45], suggests that the relationship between early-life inputs and subsequent reproductive strategies may be partly causal. Gene Environment interactions, whereby people with some genotypes are more responsive than others to the effect of rearing conditions, are also plausible [46]. If teenage childbearing is the outcome of a coherent reproductive strategy, and if that strategy is induced by early environmental conditions, then we can predict that future teenage mothers will differ from their peers in many ways beyond their knowledge about contraception. Moreover, these differences should be evident well before the childbearing years. Physically, we should expect relatively poor growth very early in life, since growth immediately before and after birth is highly sensitive to maternal investment [47, 48]. This should however be coupled with earlier puberty, and because of the relationship between pubertal maturation and stature increase [49], also with earlier cessation of stature growth. Early puberty requires rapid weight gain in middle childhood [50, 51], and thus we might additionally predict this pattern in future young mothers. At the psychological level, Belsky et al. [31] suggested that adverse rearing conditions should be reflected in increased levels of emotional and behavioural problems in childhood, and that these mediate the acceleration of reproductive strategy. Associations have been reported between teenage childbearing and conduct problems in adolescence [52], but there is a paucity of quantitative research Nettle et al. | 190 | Nettle et al. Evolution, Medicine, and Public Health compare the strength of association across different types of variables to investigate the relative strengths of say, depression in late childhood, early menarche and lack of contraceptive education, as individual predictors of teenage childbearing. METHODS No separate ethical approval was required for this research, as it was based on a secondary analysis of an existing, anonymous data set. Written consent for the storage of data was given by the parents of all cohort members (CMs), and, in adulthood, by the CMs themselves. Study population and design Measures Physical development Our physical development measures include birthweight (oz), weight (kg) and height (m) measured at the ages of 7, 11, 16 and 23. We also used these variables to calculate the gains in weight and height between 7 and 11, 11 and 16 and 16 and 23. Pubertal development was assessed at 11 and 16, with physicians assessing breast development (scales 1–5 at age 11, absent/intermediate/adult at age 16) and pubic hair (scales 1–5 at age 11, absent/sparse/ intermediate/adult at age 16). We treat the age 11 pubertal development variables as continuous, and for the age 16 variables, we contrast ‘adult’ (the modal response) with ‘non-adult’ (the other options combined). Age at onset of menses is reported twice in the NCDS data: by the girl being asked during physician examination at age 16, and by mother’s report in an interview at age 16. Once responses of ‘Not yet started’ and ‘Age unknown’ have been deleted from both variables, the two correlate at r = 0.72 (P < 0.001). Here, we use the mother report as it has more than 100 more complete records for our case group. Psychological development At ages 7 and 11, CMs’ teachers assessed their behaviour using items from the Bristol Social Adjustment Guides (BSAG) [58]. The teachers indicated whether a large number of classes of behaviour indicating poor adjustment were present (yes = 1/no = 0). These ratings give an overall maladjustment score (BSAG total; higher score indicates worse adjustment), and scores for Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 We used data from the NCDS, a longitudinal study of all children born in the UK between 3 March and 9 March 1958. Extensive medical and sociological data were gathered at the time of birth, at 7, 11, 16 and 23 years, using perinatal hospital data, physician examination and interviews with parents, teachers and the CMs themselves. The NCDS is ongoing. We employed a case control design for the following reasons. First, it is advantageous for studying dynamic populations in which follow-up is difficult. Second, it is effective for examining outcomes with a long latency period between exposure and manifestation—in this study this is up to 20 years. Third, it can be used to examine multiple risk factors for development of the focal variable. Given that longitudinal data have not been collected with our specific hypotheses in mind we recognize that total control is impossible to achieve. To this end we regard this study as an exploratory proof of concept. Our initial sample included all female CMs whose gestational age was known and was >259 days (term), and who were still in the study at age 23. From these 5152 women, 600 reported having a child before their 20th birthday (the ‘case’ group). Socioeconomic position in 1958 was primarily measured using the Registrar General’s social class framework [57], a five-point scale based on occupational ranking. To control for family socioeconomic position, we selected a set of controls such that the frequency distribution of the social class of the CM’s mother’s husband (variable n492), and the social class of CM’s mother’s father (variable n526), was the same in the case and control groups. This included selecting controls with missing values of these variables to correspond to cases with missing values. Selection of controls where there were more than needed who met the criteria was done by lowest NCDS serial number. One case could not be matched due to a unique combination of social class variables and was excluded from the study. Thus, the ‘case’ and ‘control’ groups (n = 599) are identical in terms of their distributions of household social class at the time of birth, and social class background of the CM’s mother, although they are unrepresentative of the NCDS women as a whole (see Table 1). The case and control groups do not differ in gestational age (cases: mean 283.31, SD 10.35; controls: mean 283.05, SD 9.70, t1196 = 0.46, n.s.). Nettle et al. | Physical and psychological development in teenage mothers 191 Table 1. Frequencies (percentages) of different social classes of mother’s husband, and mother’s father, in the case and control groups, and in women meeting the inclusion criteria from the NCDS cohort as a whole Class category Cases and controls 229 687 3010 601 409 4 114 1 97 (4.4) (13.3) (58.4) (11.7) (7.9) (0.1) (2.2) (0.01) (1.9) 3 35 346 105 70 0 25 0 15 (0.5) (5.8) (57.8) (17.5) (11.7) (0) (4.2) (0) (2.5) 115 673 2266 633 586 36 394 60 289 (2.2) (13.1) (44.0) (12.3) (11.4) (0.7) (7.7) (1.2) (7.6) 3 47 236 103 95 3 52 6 54 (0.5) (7.9) (39.4) (17.2) (15.9) (0.5) (8.7) (1.0) (9.0) 12 subscales (unforthcomingness, withdrawal, depression, anxiety about acceptance by adults, hostility towards adults, writing off adults and standards, anxiety about acceptance by children, hostility towards children, restlessness, inconsequential behaviour, miscellaneous symptoms and miscellaneous nervous symptoms). The subscale scores all had a strong mode at zero, and so we have treated them as dichotomous (zero score/non-zero score). The BSAG total scores did not have a mode at zero, but were skewed, and so we have square root transformed them for the purposes of t-tests. At age 16, CMs were asked in an interview to state the ideal age to get married, and the ideal age to start a family. Responses were coded using a series of categories (16 or 17, 18 or 19, 20 or 21, 22–25, 26–30, over 30). We have reconverted these categories into ages using category mid-points (30 for ‘Over 30’), but since the resulting distribution is non-normal, we use non-parametric statistics to test for differences in these variables. In the same interview, CMs were asked whether they had lessons about conception in the context of sex and relationships education at school, and whether they felt that they had been provided with enough information about conception. Analysis As our design controls for socioeconomic position, and the CMs do not differ in age, our statistical analyses are very simple. We compare variables between the case and control groups, reporting odds ratios (ORs) and their confidence intervals (CIs) for dichotomous variables, and t-tests or non-parametric Mann–Whitney U-tests for continuous ones. We report Cohen’s d [59] as a measure of effect size where appropriate. Note that we do not use paired statistics. Since around 150 cases have a father and a maternal grandfather from class III, for example, it would be arbitrary to match each case to one particular control for statistical purposes (and there would be many thousands of equally valid matchings). Instead, our design ensures that the overall socioeconomic profiles of the case and control groups do not differ, but the comparisons are between the group means or frequencies. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Mother’s husband I II III IV V Students Single, dead, away Retired Missing data Mother’s father I II III IV V Unemployed, sick Dead, away Retired Missing data Whole cohort 192 | Nettle et al. Evolution, Medicine, and Public Health RESULTS Growth and physical development Psychological development At age 7, the cases had higher total BSAG scores than the controls (t1095 = 5.77, P < 0.01, d = 0.35). At age 11, the difference had become more marked (t1034 = 7.25, P < 0.01, d = 0.45). Table 3 shows the OR for having a non-zero score on each of the BSAG subscales. At age 7, cases were significantly more likely to have a non-zero score than controls for unforthcomingness, depression, hostility towards adults, writing off adults and standards, inconsequential behaviour, and miscellaneous symptoms. At age 11, cases were significantly more likely to have a non-zero score than controls on all subscales except for withdrawal and anxiety about acceptance by adults. Effect sizes for the BSAG subscales were generally substantial, with a mean OR of 1.82 at age 11 (Table 3). The case group gave a significantly lower mean ideal age for marriage than the controls (Table 4; Mann–Whitney U-test: z = 7.77, P < 0.01). The case group also had significantly lower mean ideal ages DISCUSSION Our results indicate that the differences between British women who initiate childbearing early, and their peers who do not, are apparent well before adolescence. Future young mothers in the NCDS cohort were significantly lighter than their peers at birth, and by age 7, lagged behind their peers in terms of height. Between 7 and 16, future young mothers caught up somewhat in terms of height, and particularly in terms of weight, though the difference in weight gain between 7 and 16 was not statistically significant. We note the similarity here to the growth profile of those at risk for cardiovascular and metabolic problems later in life; low weight at birth and in early childhood, followed by relatively rapid weight gain in middle childhood [60]. Thus, accelerated reproductive schedules may have similar developmental origins. Our future young mothers also showed signs of accelerated pubertal maturation, with more adult breast development at 16, and an average age at menarche around 4 months younger than the controls. They also gained very little height after 16 compared to their peers, suggesting early termination of growth and an accelerated transition from adolescence to adulthood. The effect sizes for physical differences between future young mothers and controls were generally small [59], with the difference in timing of menarche providing the largest effect. The psychological variables reveal increased levels of emotional and behavioural disturbance at age 7 and, more strongly, at age 11. In contrast to the physical differences, the effect sizes for the psychological variables are substantial, with the odds of depression and hostility at age 11, for example, being over twice as high in the future young mothers as in the control group. Previous research has found that conduct disorder, but not affective problems such as depression, in adolescence, is predictive of teenage pregnancy [52]. However, using a psychological assessment in childhood, we found that both Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 The cases were on average significantly lighter than the controls at birth (Table 2), and tended to be lighter at age 7 (P = 0.06). All differences in weight and also in weight gain were non-significant after age 7. The cases were significantly shorter than the controls at 7 and 11, and then again at 23. The height gain 7–11 and 11–16 was no different for cases and controls (data not shown). However, the height gain between 16 and 23 was significantly less for the cases than controls (t788 = 4.49, P < 0.01, d = 0.32). The mean height gain 16–23 for the cases was 0.7 cm, compared to 1.5 cm for the controls. There was no difference in ratings of breast or pubic hair development at age 11 between cases and controls (t946 = 0.92, n.s.; t945 = 0.05, n.s.). However, at age 16, cases were more likely to be judged to have adult breasts than the controls (marginally significant: OR = 1.34, 95% CI 1.00–1.81, P = 0.05). The odds of being judged to have adult pubic hair were not significantly different between cases and controls (OR = 1.18, 95% CI 0.88–1.57). Menarche was significantly earlier in the cases than controls (t859 = 3.35, P < 0.01, d = 0.23; Table 2), with a mean difference of 0.29 years. for starting a family than the controls (Mann– Whitney U-test: z = 7.07, P < 0.01). Within the case group, 15.8% reported having had no sex education lessons about conception, compared to 12.8% of the controls (difference not significant: OR 1.28, 95% CI 0.87–1.89). Asked whether they needed more information about conception, 34.3% of the cases answered ‘yes’ or ‘maybe’. This compared to 30.7% of the matched controls (difference not significant: OR 1.12, 95% CI 0.95–1.49). Nettle et al. | Physical and psychological development in teenage mothers 193 Table 2. Comparison of the case and control groups for physical development variables NCDS variable Cases Controls Effect size Birthweight (oz) Weight, age 7 (kg) Weight, age 11 (kg) Weight, age 16 (kg) Weight, age 23 (kg) Height, age 7 (m) Height, age 11 (m) Height, age 16 (m) Height, age 23 (m) Breast development, age 11 Pubic hair, age 11 Breast development, age 16 Pubic hair, age 16 Age at menarche n574 dvwt07 dvwt11 dvwt16 dvwt23 dvht07 dvht11 dvht16 dvht23 n1531 n1532 From n2005 From n2006 From n2648 114.81 (6.93) 23.12 (3.46) 36.73 (7.69) 54.52 (8.83) 58.16 (10.03) 1.208 (0.057) 1.436 (0.071) 1.600 (0.061) 1.605 (0.065) 1.98 (0.93) 1.86 (0.93) Adult 258/non-adult 111 Adult 222/non-adult 133 12.57 (1.33) 116.81 (16.91) 23.55 (3.68) 37.54 (7.52) 54.19 (8.29) 58.37 (8.96) 1.220 (0.060) 1.447 (0.073) 1.607 (0.064) 1.621 (0.069) 2.04 (0.95) 1.86 (0.89) Adult 268/non-adult 155 Adult 244/non-adult 172 12.86 (1.25) 0.12* 0.12 0.11 0.04 0.02 0.21* 0.15* 0.11 0.25* 0.06 0 OR 1.34* OR 1.18 0.23* Given are descriptive statistics for each group (means and standard deviations or frequencies, as appropriate), and effect size of the case–control comparison (Cohen’s d or OR, as appropriate). *P < 0.05. Table 3. OR (95% CIs) for receiving a non-zero score on each of the BSAG subscales, for cases versus controls, at ages 7 and 11 Scale Age 7 Age 11 Unforthcomingness Withdrawal Depression Anxious accept. adults Host. adults Writing off adults Anxious children Host. children Restlessness Incons. behaviour Misc. symptoms Misc. nervous 1.50* (1.18–1.90) 1.00 (0.72–1.38) 1.64* (1.29–2.09) 1.11 (0.87–1.41) 1.95* (1.49–2.56) 1.79* (1.32–2.19) 1.11 (0.78–1.72) 1.22 (0.90–1.72) 1.30 (0.94–1.79) 1.68* (1.32–1.85) 1.45* (1.13–1.85) 1.12 (0.74–1.70) 1.30* (1.02–1.66) 1.34 (0.99–1.83) 2.28* (1.78–2.93) 1.29 (0.99–1.67) 2.00* (1.52–2.62) 1.54* (1.20–1.97) 1.59* (1.12–2.25) 2.62* (1.87–3.68) 2.43* (1.67–3.34) 1.75* (1.37–2.24) 1.69* (1.31–2.17) 1.97* (1.19–3.26) *P < 0.05. conduct problems and affective problems were more prevalent in future young mothers than in controls. In fact, increased emotional and behavioural disturbance in the future young mothers was consistent across all the subscales of the BSAG at age 11. Coupled with this was an idealization of earlier marriage and earlier childbearing by age 16. Thus, the psychological variables suggest a picture of poor adjustment and negative emotionality in mid- to late-childhood, associated with a tendency to reproduce young that is already in place by age 16. This evidence accords with recent qualitative studies, which have suggested that unhappiness in childhood is often a precursor to teenage motherhood, and that it is generally experienced as a positive life development [4, 5, 61]. The pattern of psychological development—unhappiness in childhood alongside a desire for parenthood—neatly mirrors the physical one of poorer childhood growth, but precocious development at and after puberty. Taken together, the physical and psychological trajectories are consistent with the Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Measure 194 | Nettle et al. Evolution, Medicine, and Public Health Table 4. Comparison of the case and control groups for psychological development variables Variable NCDS variable Cases Controls Effect size BSAG total score, age 7 BSAG total score, age 11 Ideal age for marriage Ideal age for family No lessons about conception Needs more info about conception n455 n1008 From n2809 From n2810 From n2825 From n2858 9.08 (8.29) 10.17 (9.53) 20.66 (2.54) 22.67 (2.75) Yes 63/no 335 Yes 129/no 247 6.62 (7.36) 6.43 (7.10) 21.81 (2.26) 23.96 (2.55) Yes 58/no 396 Yes 135/no 305 0.35* 0.45* 0.48* 0.49* OR 1.28 OR 1.12 Given are descriptive statistics for each group (means and standard deviations or frequencies, as appropriate), and effect size of the case–control comparison (Cohen’s d or OR, as appropriate). *P < 0.05. that mothers who gave birth at or before age 20 were more socioeconomically deprived, had reduced human and social capital and experienced significantly more mental health problems than mothers who delayed childbearing. The current research is valuable for two reasons. First, it allows us to clearly identify individual-level developmental precursors of early childbearing, above and beyond socioeconomic background. Our results suggest that young women who physically mature earlier in comparison to their peers, and especially those whose emotional and behavioural adjustment before puberty is poor, are at substantially increased likelihood of seeking early parenthood. Second, it has implications for the design of interventions. One of the few respects in which the future young mothers did not, on aggregate, differ significantly from the controls is in their exposure to sex education lessons about conception, or their satisfaction with those lessons (cf. [1]). Moreover, the finding that future young mothers had earlier ideal ages for parenthood undermines the view that teenage pregnancy is generally caused by mistakes stemming from poor contraceptive skills. Instead, teenage childbearing generally occurs in the context of early target ages for conception, and stands at the culmination of a long developmental trajectory that begins as early as in utero. It is quite plausible that interventions that improve birthweight or early growth, or reduce emotional distress in childhood, would disrupt this developmental trajectory, and have the eventual effect of reducing teenage pregnancy rates, while merely improving knowledge of contraception is unlikely to have much effect. This suggestion is borne out by the literature on the effectiveness of different kinds of intervention Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 idea of a facultative accelerated reproductive strategy being triggered by adverse early experience [31]. However, we note that with our current data, we can only document the different developmental trajectory of future young mothers; we cannot separate out the possible genetic and environmental influences causing it. There is good evidence for both genetic and environmental influences on, for example, age at menarche [36, 41], and Gene Environment interactions are also likely to be important. We should note by way of caution that the case– control comparisons reported here aggregate all the future young mothers together, and all the controls together. Thus, our analyses do not reflect the fact that there may be multiple pathways to teenage childbearing. Some cases of teenage childbearing may indeed reflect lack of contraceptive education; our results merely show that this is not generally the case in this cohort. Moreover, we have not discriminated the possibility that, for example, one subset of teenage conceptions is preceded by depression in childhood, while a different subset is preceded by early menarche, from the possibility that depression in childhood causes early menarche which leads to early parenthood. Our data are also relatively old, with the NCDS young mothers having their babies in the 1970s. Although the UK rate of teenage childbearing has declined since that time [28], there is no reason to believe that fundamental socioeconomic or psychosocial determinants have altered significantly in recent decades [62]. Indeed, one influential study of teenaged mothers in contemporary Britain noted that they continue to experience difficulties similar to those reported for earlier cohorts. Moffitt and E-Risk Study Team [63] reported Nettle et al. | Physical and psychological development in teenage mothers programme, which shows that interventions aimed at increasing childhood well-being do tend to have an impact [55], whereas sex education programmes aimed at adolescents do not [10–12]. 195 delivered by teachers on NHS registered conceptions and terminations: final results of cluster randomised trial. BMJ 2007;334:133–6. 12. Stephenson J, Strange V, Allen E et al. The long-term effects of a peer-led sex education programme (RIPPLE): a cluster randomised trial in schools in England. PLoS Med acknowledgements 2008;5:e224. 13. Dickins T, Johns S, Chipman A. Teenage pregnancy in the The NCDS is run by the Centre for Longitudinal Studies, United Kingdom: a behavioral ecological perspective. J Soc Institute of Education, London (www.cls.ioe.ac.uk), and data are made available to registered researchers via the UK Data Evol Cult Psychol 2012;6:344–59. 14. Nettle D, Coall DA, Dickins TE. Birthweight and paternal Archive (www.data-archive.ac.uk). We should like to thank involvement predict early reproduction in British women: two anonymous reviewers for useful comments made on evidence from the National Child Development Study. Am an earlier draft of this paper, and also the editorial team for their useful input. Conceived the study: D.N., T.E.D. J Hum Biol 2009;22:172–9. 15. Buston K, Williamson L, Hart G. Young women under and D.A.C.; obtained and screened data: D.N. and D.A.C.; 16 years with experience of sexual intercourse: who be- analysed data: D.N.; wrote and revised the paper: D.N., comes pregnant? J Epidemiol Community Health 2007;61: T.E.D., D.A.C. and P.D.M.D. 221–5. references and prospects. Naturwissenschaften 2000;87:476–86. 17. Kaplan H, Gangestad S. Life history theory and 1. Allen E, Bonell C, Strange V et al. Does the UK govern- evolutionary psychology. In: Buss DM (ed). The ment’s teenage pregnancy strategy deal with the correct Handbook of Evolutionary Psychology. Hoboken, N.J.: risk factors? Findings from a secondary analysis of data John Wiley and Sons, 2005, 68–95. from a randomised trial of sex education and their impli- 18. Sol D. Revisiting the cognitive buffer hypothesis for the cations for policy. J Epidemiol Community Health 2007;61: evolution of large brains. Biol Lett 2009;5:130–3. 19. Stearns SC. The evolutionary significance of phenotypic 20–7. 2. SEU. Teenage Pregnancy. London: Social Exclusion Unit/ HMSO, 1999. 3. Paranjothy S, Broughton H, Adappa R et al. Teenage pregnancy: who suffers? Arch Dis Child 2009;94:239–45. plasticity: phenotypic sources of variation among organisms can be described by developmental switches and reaction norms. BioScience 1989;436:1–10. 20. Leimar O. Environmental and genetic cues in the evolu- 4. Arai L. Teenage Pregnancy: The Making and Unmaking of a tion of phenotypic polymorphism. Evol Ecol 2007;23: Problem. Policy Press: Bristol, 2009. 5. Duncan S. What’s the problem with teenage parents? And 125–35. what’s the problem with policy? Crit Soc Policy 2007;27: 307–34. 21. Stearns SC. Trade-offs in life-history evolution. Funct Ecol 1989;3:259–68. 22. Chisholm JS, Ellison PT, Evans J et al. Death, hope, and sex. maternal age adversely affect child development? Curr Anthropol 1993;34:1–24. 23. Kaplan H, Hill KIM, Lancaster J et al. A theory of human life Evidence from cousin comparisons in the United States. history evolution: diet, intelligence, and longevity. Evol 6. Geronimus AT, Korenman S, Hillemeier MM. Does young Popul Dev Rev 1994;20:585–609. Anthropol Issues News Rev 2000;9:156–85. 7. Wight D, Abraham C. From psycho-social theory to sus- 24. Lawson DW, Mace R. Parental investment and the opti- tainable classroom practice: developing a research-based mization of human family size. Philos Trans R Soc Lond B teacher-delivered sex education programme. Health Educ Biol Sci 2011;366:333–43. 25. Borgerhoff Mulder M. Optimizing offspring: the quantity– Res 2000;15:25–38. 8. Harvey N, Gaudoin M. Teenagers requesting pregnancy termination are no less responsible about contraceptive use at the time of conception than older women. BJOG 2007;114:226–9. 9. Seamark C. Design or accident? The natural history of teenage pregnancy. J R Soc Med 2001;94:282–5. 10. DiCenso A, Guyatt G, Willan A et al. Interventions to reduce unintended pregnancies among adolescents: systematic review of randomised controlled trials. BMJ 2002; 324:1426–30. 11. Henderson M, Wight D, Raab GM et al. Impact of a theoretically based sex education programme (SHARE) quality tradeoff in agropastoral Kipsigis. Evol Hum Behav 2000;21:391–410. 26. Marmot M. Fair society, healthy lives. Public Health 2010; 126(Suppl.):S4–10. 27. Johns SE. Perceived environmental risk as a predictor of teenage motherhood in a British population. Health Place 2011;17:122–31. 28. Kiernan KE. Becoming a young parent: a longitudinal study of associated factors. Br J Sociol 1997;48:406–28. 29. Smith DM, Elander J. Effects of area and family deprivation on risk factors for teenage pregnancy among 13-15-yearold girls. Psychol Health Med 2006;11:399–410. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 16. Stearns SC. Life history evolution: successes, limitations, 196 | Nettle et al. Evolution, Medicine, and Public Health 30. Nettle D. Flexibility in reproductive timing in human 46. Belsky J, Pluess M. Beyond diathesis stress: differential females: integrating ultimate and proximate explan- susceptibility to environmental influences. Psychol Bull ations. Philos Trans R Soc Lond B Biol Sci 2011;366: 357–65. 2009;135:885–908. 47. Wells JCK. Maternal capital and the metabolic ghetto: an 31. Belsky J, Steinberg L, Draper P. Childhood experience, evolutionary perspective on the transgenerational basis of interpersonal development, and reproductive strategy: an evolutionary theory of socialization. Child Dev 1991; health inequalities. Am J Hum Biol 2010;22:1–17. 48. Bogin B. Patterns of Human Growth. Cambridge University 62:647–70. 32. Adair LS. Size at birth predicts age at menarche. Pediatrics 2001;107:e59. 33. Sloboda DM, Hart R, Doherty DA et al. Age at menarche: influences of prenatal and postnatal growth. J Clin Endocrinol Metab 2007;92:46–50. Press: Cambridge, 1988. 49. Nettle D. Women’s height, reproductive success and the evolution of sexual dimorphism in modern humans. Proc Biol Sci 2002;269:1919–23. 50. Blell M, Pollard TM, Pearce MS. Predictors of the age at first menarche in the Newcastle Thousand Families Study. 34. Opdahl S, Nilsen TIL, Romundstad PR et al. Association of size at birth with adolescent hormone levels, body size and J Biosoc Sci 2008;40:563–75. 51. Silva IdS, De Stavola BL, Mann V et al. Prenatal factors, age at menarche: relevance for breast cancer risk. Br J childhood growth trajectories and age at menarche. Int J Cancer 2008;99:201–6. Epidemiol 2002;31:405–12. 52. Maughan B, Lindelow M. Secular change in psychosocial relationships and individual differences in the timing risks: the case of teenage motherhood. Psychol Med 1997; of pubertal maturation in girls: a longitudinal test 27:1129–44. of an evolutionary model. J Pers Soc Psychol 1999;77: 53. Maestripieri D, Roney JR, DeBias N et al. Father absence, 387–401. 36. Tither JM, Ellis BJ. Impact of fathers on daughters’ age at menarche and interest in infants among adolescent girls. Dev Sci 2004;7:560–6. menarche: a genetically and environmentally controlled 54. Brennan L, McDonald J, Shlomowitz R. Teenage births and sibling study. Dev Psychol 2008;44. 37. Bogaert AF. Menarche and father absence in a national probability sample. J Biosoc Sci 2008;40:623–36. 38. Alvergne A, Faurie C, Raymond M. Developmental plasti- final adult height of mothers in India, 1998-1999. J Biosoc Sci 2005;37:185–91. 55. Harden A, Brunton G, Fletcher A et al. Teenage pregnancy and social disadvantage: systematic review integrating city of human reproductive development: effects of early controlled trials and qualitative studies. BMJ 2009;339:b4254. family environment in modern-day France. Physiol Behav 2008;95:625–32. 56. Imamura M, Tucker J, Hannaford P et al. Factors associated with teenage pregnancy in the European Union countries: a 39. Ellis BJ, Bates JE, Dodge KA et al. Does father absence place daughters at special risk for early sexual activity and teenage pregnancy? Child Dev 2003;74:801–21. 40. Chisholm J, Quinlivan J, Petersen R et al. Early stress predicts age at menarche and first birth, adult attachment, and expected lifespan. Hum Nat 2005;16:233–65. 41. Hartge P. Genetics of reproductive lifespan. Nat Rev Genet 2009;41:637–8. 42. Moffitt TE, Caspi A, Belsky J et al. Childhood experience and the onset of menarche: a test of a sociobiological model. Child Dev 1992;63:47–58. 43. Comings DE, Muhleman D, Johnson JP et al. Parent– systematic review. Eur J Public Health 2007;17:630–6. 57. Office for Population Censuses and Surveys. Classification of Occupations and Coding Index. Her Majesty’s Stationery Office: London, 1980. 58. Stott DH. The Social-adjustment of Children: Manual to the Bristol Social Adjustment Guides. University of London Press: London, 1965. 59. Cohen J. Statistical Power Analysis for the Behavioral Sciences. Lawrence Erlbaum Associates: Hillsbaum, NJ, 1988. 60. Barker DJP, Osmond C, Forse´n TJ et al. Trajectories of growth among children who have coronary events as adults. N Engl J Med 2005;353:1802–9. daughter transmission of the androgen receptor gene as 61. Coleman L, Cater S. ‘Planned’ teenage pregnancy: per- an explanation of the effect of father absence on age of spectives of young women from disadvantaged back- menarche. Child Dev 2002;73:1046–51. 44. Cameron NM, Fish EW, Meaney MJ. Maternal influences on the sexual behavior and reproductive success of the grounds in England. J Youth Stud 2006;9:593–614. 62. Hobcraft J. The timing and partnership context of female rat. Horm Behav 2008;54:178–84. 45. Cameron NM, Shahrokh D, Del Corpo A et al. Epigenetic programming of phenotypic variations in re- becoming a parent: cohort and gender commonalities and differences in childhood antecedents. Demogr Res 2008;19:1281–322. 63. Moffitt TE, E-Risk Study Team. Teen-aged mothers in productive strategies in the rat through maternal care. contemporary Britain. J Child Psychol Psychiatr 2002;43: J Neuroendocrinol 2008;20:795–801. 727–42. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 35. Ellis B, McFadyen-Ketchum S. Quality of early family 273 Evolution, Medicine, and Public Health [2013] pp. 273–288 doi:10.1093/emph/eot022 orig inal research a r t i c le Evidence for independent evolution of functional progesterone withdrawal in primates and guinea pigs 1 Yale Systems Biology Institute and Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA; 2Perinatology Research Branch, Program for Perinatal Research and Obstetrics, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH, Bethesda, MD, USA; 3 Department of Obstetrics and Gynecology, University of Michigan, Ann Arbor, MI, USA; 4Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, USA; 5Department of Obstetrics and Gynecology, Wayne State University, Detroit, MI, USA *Correspondence address. Yale Systems Biology Institute, Yale University West Campus, Advance Biosciences Center, Room 272B, PO BOX 27388, West Haven, CT 06516-7388, USA. Tel: þ1 203 737 3091; Fax: þ1 203 737 3109; E-mail: gunter.wagner@yale.edu Received 8 May 2013; revised version accepted 21 November 2013 ABSTRACT Background and objectives: Cervix remodeling (CRM) is a critical process in preparation for parturition. Early cervix shortening is a powerful clinical predictor of preterm birth, and thus understanding how CRM is regulated is important for the prevention of prematurity. Humans and other primates differ from most other mammals by the maintenance of high levels of systemic progesterone concentrations. Humans have been hypothesized to perform functional progesterone withdrawal (FPW). Guinea pigs are similar to humans in maintaining high-progesterone concentrations through parturition, thus making them a prime model for studying CRM. Here, we analyze the phylogenetic history of FPW and document gene expression in the guinea pig uterine cervix. Methodology: Data on progesterone withdrawal were collected from the literature, and character evolution was analyzed. Uterine cervix samples were collected from non-pregnant, mid-pregnant and late pregnant guinea pigs. RNA was extracted and sequenced. Relative transcript levels were estimated and compared among sample groups. Results: The phylogenetic analysis shows that FPW evolved independently in primates and guinea pigs. The transcriptome data confirms that guinea pigs down-regulate progesterone receptor toward parturition, in contrast to humans. Some of the similarities between human and guinea pig are: downregulation of estrogen receptor, up-regulation of VCAN and IGFBP4 as well as likely involvement of prostaglandins. ß The Author(s) 2013. Published by Oxford University Press on behalf of the Foundation for Evolution, Medicine, and Public Health. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 ¨nter P. Wagner*1,5 Mauris C. Nnamani1, Silvia Plaza1, Roberto Romero2,3,4 and Gu 274 | Nnamani et al. Evolution, Medicine, and Public Health Conclusions and implications: (i) FPW in guinea pigs evolved independently from that in primates. (ii) A small set of conserved gene regulatory changes has been detected. K E Y W O R D S : guinea pig cervix; gene expression; functional progesterone withdrawal; evolution of parturition INTRODUCTION In this communication we investigate gene expression in the cervix of guinea pigs to assess the similarity between human and guinea pig CRM. Our aim is to answer the question of whether FPW in primates and guinea pigs is homologous, meaning that it was already present in the most recent common ancestor of humans and guinea pig. This question is important because molecular mechanisms are more likely shared among homologous characters than among characters that evolved independently. Guinea pig belongs to a basal rodent lineage [14], and it is thus possible that FPW in primates and guinea pigs could be homologous. Here, we report that there are extensive differences between the human and guinea pig cervical gene expression dynamics, which makes it unlikely that the mechanisms of ‘FPW’ are homologous between the two species. This inference is also supported by a phylogenetic analysis of serum progesterone concentrations at parturition among Archontolgires, the clade uniting primates and rodents. Nevertheless, it is still possible that there are conserved molecular mechanisms shared between primates and other placental mammals, as for instance the down-regulation of estrogen receptor alpha (ESR1), as reported in our previous paper for humans [17] and here for guinea pigs. METHODOLOGY Tissue harvesting and RNA extraction Hartley guinea pigs (pregnant and non-pregnant) were obtained from a commercial supplier (Charles River, Wilmington, MA, USA). The animals were housed in individual plastic cages with hay, branches and other environment enhancing objects. Non-pregnant females were examined daily for vulva occlusion membrane. These animals were sacrificed the day after the vulva occlusion disappeared indicating entry into estrus. Pregnant females were obtained with uncertain pregnancy dates and were euthanized based on assessing overall body weight of the female with gestation range Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Cervical remodeling (CRM) is a necessary step in preparation for successful parturition [1]. Premature softening and shorting of the cervix is also a powerful predictor of preterm birth, and thus understanding the molecular mechanisms regulating CRM are critical for successful intervention to prevent prematurity [2]. A major obstacle in unraveling the mechanisms of human cervical ripening is that mammals are highly variable with respect to the mechanisms underlying parturition. Most notable is the fact that in humans and other primates the placenta produces large amounts of progesterone through pregnancy and labor. In contrast, most other placental mammals, including model organisms such as sheep, mouse, rabbit and rat, have systemic progesterone withdrawal, meaning that the serum concentration of progesterone declines toward term before the onset of labor. Declining progesterone concentrations are thought to remove the ‘progesterone block’ from the uterus and the cervix and thus allow progression toward parturition [3–5]. To explain parturition in humans, it has been suggested that there have to be changes downstream of the progesterone signal that undercut the progesterone block in its target tissues. Several mechanisms have been proposed for this so-called functional progesterone withdrawal (FPW) [6–9], but none of them have yet received decisive support. A notable exception among model organisms is the guinea pig that, like the human, maintains high concentrations of serum progesterone through parturition [10]. It has thus been suggested that guinea pig is a potential model organism to study the mechanisms of FPW as well as prematurity, which is very rare in model species with fast gestation like mice [11]. This is an attractive possibility, as guinea pigs belong to the same clade as other major model organisms, mice, rat and rabbit, the so-called Glires [12–15]. It is also a model organism with a long tradition of experimental research [16] and a sequenced genome (http://useast.ensembl.org/Cavia_porcellus/Info/ Index). Independent evolution of functional progesterone withdrawal Sequencing and data processing RNA library preparation and high-throughput sequencing were performed on an Illumina HiSeq 2000 sequencing system following the protocol recommended by Illumina for sequencing total RNA samples. Sequencing was done for each biological replicates at 1 75 bp strand specific by the Yale Center for Genome Analysis. Sequence reads were aligned to the Cavia porcellus reference genome (cavPor3.69) using the splice junction mapper for RNA-seq reads TopHat2 [21–23]. Sequencing depth for RNA-seq samples averaged 45 million reads per biological sample with >80% overall alignment rate. After alignment, read counts were determined with HTSeq-count v0.5.4p1 as described by the authors. All RNA-seq data are deposited in GEO under accession number GSE47986. Data analysis Relative RNA abundance was measured as transcripts per million (TPM) transcripts as recommended in [24, 25] rather than RPKM or FPKM. The reason is that units of RPKM are not consistent between samples and are thus problematic when comparing RNA abundance between samples of different transcript composition [25]. In comparing transcript abundances we distinguish two different kinds of events. At the one hand are induction and repression of gene expression (turning gene expression ON or OFF). On the other hand are modulations of gene expression. We refer to induction when the transcript abundance of a gene is below the operational threshold of three TPM [26, 27] in the earlier stage of the reproductive cycle but above the threshold in the later stage. The threshold of three TPM corresponds to 1 RPKM in terms of the traditional RNA abundance measure [26, 27]. The threshold is based on association between expression level and chromatin modification status as well as a statistical model of transcript abundance. In contrast, we call changes of transcript abundance ‘expression modulation’ if the estimated transcript abundance is above the operational threshold in both stages compared. When we refer to up or down regulation, we mean gene expression modulation. The motivation for making the (operational) distinction between gene expression modulation and induction/repression is that they represent two different biological events. Induction/repression is associated with qualitative changes in the nature of the chromatin modification and the kind of transcription factors and co-factors associated with the cis-regulatory elements [27]. On the other hand, gene expression modulation can have a variety of causes, from phosphorylation status of transcription factors to the expression level of upstream regulators. Statistics When correlations were calculated or differential expression was tested, we transformed TPM data by square root transformation rather than the more widely used log-transformation for two reasons. First, TPM as well as RPKM data usually contains zero values. In a log-transformation these data points will be transformed into minus infinity, with consequences for data handling and interpretation. pffiffiffi In contrast, 0 ¼ 0 and no further problems arise. The second reason to prefer square root transformation is that it is in fact variance stabilizing [26], while the log-transformation leads to an inflation of variance at low abundance values. Tests for differential expression were not used as a discovery tool but to test specific hypotheses. For instant, we tested differences between stages for those genes that have been found differentially expressed in humans. For this reason, we did not use multiple comparison 275 Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 provided by the suppliers. For ‘post-mortem’ assessment of the duration of pregnancy, we took measurements of both the fetal supra-occipital-premaxilla length and fetal weight at harvest (Supplementary Table S1). To estimate gestational age we used previously published data [18–20]. Estimated gestational stage for mid-pregnant and late pregnant (but not in labor) animals was 43.6 ± 0.5 (mean ± SD) days and 65.2 ± 1.2 days, respectively (average gestation period of 65.5 ± 6.5 days range [16]). Tissue harvesting focused on the vaginal portion of the cervix to minimize contamination with uterine myometrial tissue. Samples were collected from the three groups in triplicates for non-pregnant and mid-pregnant females and four replicates for late pregnant females. Cervical tissue was immediately stored in RNAlater Solution (Ambion, cat# AM7020) until further processing. Total RNA was extracted using the RNeasy extraction kit (Qiagen, cat# 75142) following manufactures instructions that included an in-column DNAse digestion. Quality of the RNA was then confirmed using Agilent 2100 Bioanalyzer (Santa Clara, CA, USA). Nnamani et al. | 276 | Nnamani et al. Evolution, Medicine, and Public Health corrections. We used one-way ANOVA and t-tests on square root transformed TPM data. BrdU labeling RESULTS Evolution of functional progesterone withdrawal The guinea pig has attracted interest because it shares with humans the feature of maintaining high concentrations of serum progesterone (P4) (>200 ng/ml) through parturition [10]. For both species it has been suggested that some mechanism Figure 1. Evolution of FPW. Phylogenetic reconstruction of serum progesterone concentration at parturition in Euarchontoglires, the clade uniting primates and rodents. Species and lineages with sustained high systemic progesterone concentrations are indicated in white, species and lineages with a drop of systemic progesterone concentrations are indicated in black. The root state was fixed as progesterone withdrawal based on outgroup comparison with hoofed animals and carnivores. Tree topology in Glires follows fig. 6 in [28]. (a) Scenario that assumes that the mouse lemur has progesterone withdrawal. Under this assumption the lack of systemic progesterone withdrawal is an ancestral state of primates and even the Euarchonta. (b) Scenario that assumes that the mouse lemur does not have progesterone withdrawal. Under this scenario the ancestral state for Euarchonta and primates is ambiguous. Note that in both scenarios guinea pig is the only lineage without progesterone withdrawal within the Glires and that guinea pig is deeply nested within this clade. This phylogenetic distribution suggests that FPW evolved independently in primates (Euarchonta) and guinea pig within the Glires. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Three guinea pigs 35–38 weeks pregnant for mid-pregnant and also three females 58–63 weeks pregnant for late pregnancy were obtained from Charles River laboratories. Two females from each group were injected intraperitoneal with 10 ml of Bromodeoxyuridine (BrdU) labeling reagent Life Technology (Cat # 0103), and one control sample with 10 ml PBS. After 24 h the animal was sacrificed, and cervix was recovered and fixed in 4% paraformaldehyde (24–48 h). The tissue was processed for paraffin embedding and serial section of 6 mm was produced. The slides were processed following the Life Technology protocol for BrdU (Cat # 93-3943). downstream of the progesterone signal is responsible for CRM, called FPW. It is thus interesting to ask whether FPW in the guinea pig is homologous to that in humans. In Fig. 1a and b, a survey of progesterone withdrawal within Euarchontoglires is presented based on the phylogenetic hypothesis in [28]. Lack of systemic progesterone withdrawal is found in humans, apes (chimpanzee [29], gorilla [30, 31]) and old world monkeys (rhesus monkey [32, 33], baboon [34]). The situation in new world moneys is complicated. The detailed study by Chambers and Hearn on the common marmoset (Callithrix jacchus) shows a steady decline in serum progesterone levels in the 4 days leading up to parturition [35]. Other authors did not report such a drop but also did not explicitly document the peripartum period [36]. In contrast, Corousos and collaborators report the peripheral progesterone levels of two pregnant squirrel monkeys (Saimiri sciureus) but do not show a decline of progesterone levels in the days leading up to parturition [37], even though the levels at the day of parturition are reported to be close to zero. Nevertheless, it is clear that the withdrawal, if any, would have to be very precipitous or absent. Based on comparison with the rate of decline in the marmoset we concluded that the squirrel monkeys do not have systemic progesterone withdrawal. We could not find data about tarsiers and some limited Independent evolution of functional progesterone withdrawal 277 were mapped to the guinea pig genome version cavPor3.69 and read counts transformed to TPM transcripts [25]. Below we focus on two comparisons of transcript abundance. One is the differences between non-pregnant and mid-pregnant stages and the other the comparison of mid-pregnancy and late pregnancy. Gene expression changes in the cervix from estrus to mid-pregnancy In the comparison from non-pregnant and mid-pregnant cervices, 195 genes cross the operational criterion from being repressed to being expressed. Figure 2a shows the expression levels of genes operationally non-expressed in estrus but expressed in mid-pregnancy. Two genes stand out, CLCA1 and PLA2G10 (see Supplementary Table S2). CLCA1 is a component of Ca2þ sensitive chlorine channels and is involved in mucus production in other organs [42]. PLA2G10 is a phospholipase catalyzing the ratelimiting step in prostaglandin synthesis [43]. The other highly expressed and induced genes were: two members of the sodium-dependent decarboxylase transporters (SLC13A2 and SLC36A2), IGFBP1, as well as a tumor necrosis factor, TNFSF11. There are 428 genes expressed during estrus but operationally OFF at mid-pregnancy. Figure 2b shows their expression level during estrus. There are 11 genes highly expressed during estrus but operationally turned OFF in mid-pregnancy with a discontinuity of lower expression level (big bold arrow in Fig. 2b). The most highly expressed gene turned off in pregnancy is interleukin 1a (IL1A). The other genes are TMPRSS11D, a trans-membrane serine protease involved in gland secretory activity, interferon kappa, IFNK and an antagonist of IL1A, IL1RN and others more (see list in Supplementary Table S3). The gene ontology categories most over-represented are related to inhibition of cell proliferation. Gene expression in guinea pig cervix Gene expression changes in the cervix from mid- to late-pregnancy Cervical tissue was harvested from guinea pigs in three reproductive stages: non-pregnant and in estrus (NP), in mid-pregnancy (MT) and late pregnancy/term (LT) (see ‘Methodology’ for details on the timing of sampling). The samples were transcribed and sequences with 1 75 bp to an average sequencing depth of 40–50 Mio reads. Reads In the transition to late term 195 genes are turned on (Fig. 2c). The strongest expressed gene, which was not expressed in mid-pregnancy, is called VISG8, V-set and immunoglobulin domain containing 8. Very little is known about its function (Supplementary Table S4). The second most expressed gene turned on toward term is SPC24, a Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 information about lemurs. Blanco and Meyer report progesterone levels in various stages of estrus and pregnancy of the brown mouse lemur (Microcebus rufus) but the gestational stages in this study are very unreliable [38]. In spite of that, the pre-partum cohort shows higher progesterone than the pre-estrus cohort, making it possible that the mouse lemur lack systemic progesterone withdrawal. Thus, we provide two reconstructions one with and one without systemic progesterone withdrawal for the mouse lemur (Fig. 1a and b). An outgroup clade of primates are the Tupaias (Scandentia). In Tupaia belaneri, there is a drop in serum P4 only after parturition [39] but this paper reports unpublished data from Elger and Hasan about a single animal. However, the peak progesterone level is reported to occur 2 days before parturition and we thus classify it as having no systemic progesterone withdrawal. Within the Glires, the clade including rodents and rabbits and squirrels, all lineages have progesterone withdrawal except guinea pigs [40, 41]. In Fig. 1a and b, we provide two parsimony-based ancestral character state reconstructions reflecting the ambiguity about the situation in the lemur. Under either scenario the lack of systemic progesterone withdrawal in the guinea pig is reconstructed as independently derived (Fig. 1a and b). The difference between the two scenarios, lemurs with or without progesterone withdrawal, only affects the reconstruction of the ancestral character state in the primate lineage (Euarchonta). Assuming that the lemur has no systemic progesterone withdrawal (Fig. 1a) leads to a reconstruction where this condition is ancestral for the Euarchonta, and the situation in the marmoset is inferred to be a reversal. Assuming that the lemur has systemic progesterone withdrawal (Fig. 1b) leads to a reconstruction where the ancestral state of the stem lineage of primates is ambiguous. From these data we conclude that FPW likely has evolved independently in primates and guinea pigs. Nnamani et al. | 278 | Nnamani et al. Evolution, Medicine, and Public Health Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Figure 2. Expression levels in TPM of genes that are turned on or off during pregnancy in the guinea pig cervix. (a) Expression level of genes in mid-pregnancy of genes that are not expressed in non-pregnant cervix and expressed in mid-pregnancy. Note that there are two genes much more highly expressed than the others, CLCA1 and PLA2G10. (b) Expression level of genes in non-pregnant cervix that are not expressed in mid-pregnant cervix. There is a distinct difference in expression level between the 11 highest expressed genes and the rest (large arrow). (c) Expression level of genes in late pregnancy of genes not expressed in mid-pregnancy. (d) Expression level of genes during mid-pregnancy that are not expressed in late pregnancy. Four genes stand out: MGAM is the maltase-glucoamylase or alpha-glucosidase, an enzyme involved in starch digestion; SCXB the basic helix-loop-helix transcription factor; COL9A2 is a type IX collagen mostly known from hyaline cartilage; and CSPG5 is the chondroitin sulfate proteoglycan 5 (neuroglycan C). (e) Cervical stroma in mid-pregnancy labeled with BrdU. Note small dense nuclei with very few BrdU labeled cells. (f) Cervical stroma in late pregnancy labeled with BrdU. The cell nuclei are less dense and there are abundant cells labeled with BrdU, consistent with the finding that proliferation related genes are preferentially expressed in late pregnancy (continued) Independent evolution of functional progesterone withdrawal kinetochore complex component, foreshadowing the result of the gene ontology analysis. The overwhelming majority of genes are related to enhancement of cell proliferation, suggesting a very active proliferative state of the guinea pig cervix at term. These conclusions are supported by the induction or strong up-regulation of genes known to promote or are associated with tumor growth. For instance, among the induced genes is FOXM1 and among the strongest up-regulated genes are ARG2, arginase type II and FGFBP1. The high expression Nnamani et al. | 279 of ARG2 [44–46], FGFBP1 [47, 48] and FOXM1 [49–53], are associated with several cancers. To determine whether the late term cervical stroma has proliferating cells we injected guinea pigs in mid- and late-term with BrdU and harvested the tissue 24 h later (Fig. 2e and f). While at mid-term the nuclei of stromal cells are small and dense, in late term the cell nuclei are larger and less dense. BrdU stained cells are abundant in latter (Fig. 2f) consistent with the transcriptomic data suggesting higher proliferative activity toward late term. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Figure 2. (Continued) (continued) 280 | Nnamani et al. Evolution, Medicine, and Public Health Figure 2. (Continued) Expression dynamics of candidate genes Steroid receptors During estrus the expression of estrogen receptor, ESR1, is high (>240 TPM) and is decreasing to <150 TPM in mid-pregnancy and further decreases to <100 TPM toward term. In contrast, the progesterone receptor (PR) mRNA, PGR, is low during estrus (27 TPM) and rises to a value of 56 TPM in midpregnancy but falls close to pre-pregnancy levels of 30 TPM toward the end of pregnancy (Fig. 3a). Extracellular matrix The dominant collagen mRNA in the guinea pig cervix is type 1 collagen, COL1A1 and COL1A2. Expression is high for both type I collagen genes, 5000 TPM and 2000 TPM, respectively, and does not change among stages of the reproductive cycle. COL3A1, coding for collagen III identified in human cervix, was not mapped in our transcriptome data. The dominant small proteoglycan is fibromodulin (FMOD) (Fig. 3b), which is known to associate with type I collagen and is also able to sequester TGF-beta in the extracellular matrix. FMOD shows a 2.2-fold increase from estrus to mid-pregnant stage and then a 0.55-fold decrease toward term, returning to about the same level as in estrus. A similar dynamics is observed for another small proteoglycan, testican 2, SPOCK2, although at lower levels of expression. It increases 2.1 from estrus to mid-pregnancy and toward term is almost completely suppressed, going from 122 TPM at MT to 9 TPM at LT (0.07-fold change). Repression of both FMOD and SPOCK2 transcription may play a role in extracellular matrix remodeling in preparation for parturition. In addition, the expression dynamics of FMOD suggests lower efficiency of TGF-beta signaling during pregnancy than before and toward term. We found substantial expression for two large proteoglycans, versican (VCAN) and perlecan (HSPG2, heparan sulfate proteoglycan 2) (Fig. 3c). VCAN decreases from estrus to mid-pregnant by 0.38-fold and then increases 1.6-fold again toward term. In contrast, HSPG2 is high in estrus but declines slowly toward term to 59% from its value before pregnancy. The dominant matrix metallopeptidase mRNA is MMP2 (Fig. 4a), which is known to break down collagen IV and gelatin. There is an apparent increase in expression of MMP2 between mid- and late-term, but the statistical support is weak (P ¼ 0.056). The next highest expression was found for MMP14, which is activating MMP2 and is likely membrane bound. MMP14 expression in estrus is 340 TPM and decreases to mid-pregnancy by 0.55-fold (P ¼ 0.016) and shows a slight increase of 1.18-fold toward late term (P ¼ 0.05). In contrast, MMP11 has a moderate expression in estrus (109 TPM), decreases toward mid-pregnancy by 90% (0.1-fold) and is barely above the operational threshold for expressed genes at term (3.75 TPM). This MMP11 is not effective in extracellular matrix breakdown, but is reported to cleave protease inhibitor Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 In the transition to term 449 genes meeting the operational criterion of expression in mid-pregnancy are turned OFF toward term. Four of them have higher expression than the rest of the distribution (Fig. 2d). These are: MGAM, SCXB, COL9A2 and CSPG5 (Supplementary Table S5). MGAM is the maltase-glucoamylase or alpha-glucosidase, an enzyme involved in starch digestion; SCXB the basic helix-loop-helix transcription factor; COL9A2 is a type IX collagen mostly known from hyaline cartilage and CSPG5 is the chondroitin sulfate proteoglycan 5 (neuroglycan C). Repression of these genes suggests a change in the extra-cellular matrix composition during transition to term. Independent evolution of functional progesterone withdrawal dynamics of large proteoglycan genes: VCAN and perlecan (HSPG2). Values at the right of each graph give the P-value of an ANOVA test including all three samples; the valued above the graphs give t-test based P-values for two-tailed pairwise comparisons A1AT, which is not included in our mapped transcripts. Among the ADAM metallopeptidases with thrombospondin motif, ADAMTS metallopeptidases, ADAMTS1 and -2 are most expressed. ADAMTS1 shows a slow but insignificant increase from non-pregnant to term, while ADAMTS2 is increasing from non- to mid-pregnant and decreases at late term (Fig. 4b). The most expressed inhibitor of matrix metallopeptidase is TIMP2, which is highly expressed during estrus and shows a slow but insignificant decline toward term (Fig. 4c). The only highly expressed protease inhibitor that shows a significant increase from mid- to late-term is TIMP1 (1.9, P ¼ 0.006). TIMP1 is associated with cell proliferation and may have anti-apoptotic effects consistent with other results suggesting proliferation in later term cervix (see above and Fig. 2e and f). In contrast, TIMP3 shows a 50% reduction in expression between mid- and late-term (0.51, P ¼ 0.026). Signaling Prostaglandin signaling, in particular PGE2 and PGF2a, is associated with many aspects of female reproductive physiology including parturition [54]. A rate-limiting step in prostaglandin synthesis is the liberation of arachidonic acid from phospholipids by phospholipase A and C. In the guinea pig cervix we find a substantial increase in PLA2G10 just before parturition (Fig. 5a). Interestingly, the expression of cyclo-oxygenase II, catalyzing another key step in prostaglandin synthesis, is not increased toward term. 281 Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Figure 3. Expression dynamics of steroid receptors and proteoglycans. (a) Expression of ESR1 and PR. ESR1 is high during estrus and is steadily declining toward parturition. PGR is up-regulated in mid-pregnancy but returns to pre-pregnancy levels toward term. (b) mRNA expression dynamics of small proteoglycan genes: FMOD, testican 2 (SPOCK2) and BGN. (c) mRNA Nnamani et al. | 282 | Nnamani et al. Evolution, Medicine, and Public Health of genes coding for ADAM metallopeptidases with thrombospondin motif, ADAMTS metallopeptidases. (c) mRNA expression of genes coding for inhibitors of matrix metallopeptidases, TIMPs. Values at the right of each graph give the P-value of an ANOVA test including all three samples; the valued above the graphs give t-test based P-values for two-tailed pairwise comparisons The expression of IGF signaling molecules came to our attention though the conserved up-regulation of IGFBP4 in both human and guinea pig cervices (see below). mRNA expression of IGF1 and IGF2 toward term follow different trends, with IGF1 being up-regulated (1.57, P ¼ 0.0074) and IGF2 downregulated (0.66, P ¼ 0.0038). In contrast, there is no substantial change in IGF receptor mRNA expression. There is high expression of most IGF binding proteins, with three having a dramatic regulatory change (Fig. 5b). IGFBP5 is the most expressed IGFBP overall but does not show a significant difference between pregnancy stages. IGFBP4 shows the most dramatic up-regulation (2.75, P ¼ 0.0022). In contrast, IGFBP3 is expressed at about the same level as IGFBP5 at estrus and then shows a gradual decline during pregnancy with very low expression at term (from 1500 TPM during estrus to 140 TPM at term). These results could point to an attenuation of IGF2 signaling in the term cervix of the guinea pig because two binding proteins are increasing, and IGF2 expression is decreasing (but IGF1 mRNA is increasing). DISCUSSION In this study we addressed two questions. First, we investigated whether FPW in primates and guinea pigs has evolved independently or whether these Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Figure 4. Expression dynamics of proteases. (a) mRNA expression of matrix metalloprotease genes. (b) mRNA expression Independent evolution of functional progesterone withdrawal Nnamani et al. | 283 Figure 5. Expression dynamics of genes related to prostaglandin and IGF signaling. (a) mRNA expression of genes coding for PLA2 gamma 4A and 10, PLA2G4A and PLAG210. Note the steep increase in the expression level of PLA2G10, which suggests (b) mRNA expression of genes coding for insulin like growth factor binding proteins, IGFBP3, IGFBP4 and IGFBP5. Values at the right of each graph give the P-value of an ANOVA test including all three samples; the valued above the graphs give t-test based P-values for two-tailed pairwise comparisons are homologous features. Based on a parsimony reconstruction of ancestral character states we inferred that FPW in guinea pigs evolved independently of that in primates (see Fig. 1). Second, we investigated the gene regulatory dynamics during pregnancy in the uterine cervix using RNAseq to assess how CRM is regulated in the presence of sustained progesterone levels in the guinea pig. Our data suggest two mechanisms involved in CRM, down-regulation of PR expression and, potentially, the intracervical activation of prostaglandin synthesis, that are discussed in more detail below. Evolution of functional progesterone withdrawal Here, we use the term FPW as meaning a situation in which the circulating progesterone levels do not substantially fall prior to parturition. At this point, we do not ascribe a specific mechanistic meaning to the term FPW. The reason is that it is still unclear how different the regulation of CRM is between species with systemic progesterone withdrawal and those without systemic progesterone withdrawal; with some authors clearly favoring the view that the mechanisms of CRM are conserved between mouse and human [55], while the question still remains whether sustained high levels of circulating progesterone have consequences for the regulation of CRM and parturition in general [56]. The question whether FPW in guinea pigs and primates is homologous, requires a detailed investigation because the guinea pig represents a basal lineage among the rodents and FPW is widespread in Euarchonta, the clade consisting of primates, tree shrews and flying lemurs (for references see above). It is thus possible that FPW could have been an ancestral state within the Archontoglires, the clade uniting primates and rodents and their close relatives. In Euarchonta, the most basal lineage with data on progesterone levels at parturition is the northern tree shrew (Tupaia belangeri). For this species, FPW has been documented, and thus it is possible that FPW can be quite old. Data about progesterone levels at parturition in non-model organisms are sparse, but sufficient data of the basal lineages of the Euarchontoglires are available to allow a unique parsimony reconstruction (Fig. 1). The conclusion that FPW evolved twice independently is based on the serum progesterone withdrawal in rabbits and Sciuridae (here represented by ground squirrel and woodchucks), which are two lineages more basal than the most recent common ancestor of guinea pigs and other rodents (Fig. 1). The outgroup of the Euarchontoglires is Lauresiatheria, which is a large group containing as diverse animals as bats, horses, cows, hedgehogs and carnivores and others. We did not perform a detailed analysis of data on circulating progesterone levels during pregnancy in this group, but we are not Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 an increased paracrine expression of prostaglandin, since PLA2 is catalyzes a rate-limiting step in prostaglandin biosynthesis. 284 | Nnamani et al. Evolution, Medicine, and Public Health aware of any member of this group to show FPW. It is widely acknowledged that sheep, cow, pig, horse, dog and cat have circulating serum progesterone withdrawal at parturition. We thus infer that the ancestral state for Euarchontoglires is withdrawal of serum progesterone. Based on these data we consider the conclusion that primates and guinea pigs evolved FPW independently as robust. Physiology of cervix remodeling in guinea pig Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 An intriguing finding about guinea pig CRM was reported by Rodrı´guez et al. [57]. The authors quantified the density of PR and ESR1 positive cells in the cervix and the uterus and reported a 50% drop in the number of PR positive cells in the sub-epithelial portion of the cervix, i.e. the cervical stroma, 1– 2 days before parturition. The authors also found a negative correlation between PR and collagen integrity, suggestive of a causal link between loss of PR expression and CRM. This finding is intriguing as it provides a mechanistic model for FPW, at least for the cervical stroma (no comparable drop in PR expression was found in the myometrium). Our data are consistent with that of Rodrı´guez et al. as we find also a 50% drop in PR mRNA expression between mid- and late-pregnancy. Interestingly, though, no comparable decrease in PR mRNA abundance was reported for human cervix [17, 58, 59]. Rodrı´guez et al. also reported only small if any increase in ESR1 expression toward term. In our data we find a modest 0.7, but significant (P < 0.01) decrease in ESR1 mRNA abundance, which is similar to the 0.84 decrease of ESR1 mRNA in human cervix [17]. Another factor implicated in guinea pig CRM is phospholipase A2 (PLA2). PLA2 catalyzes a ratelimiting step in prostaglandin synthesis by liberating arachidonic acid from phospholipids [43]. Prostaglandin E2 and F2-alpha have been shown to stimulate collagenase activity in confluent guinea pig cervical cell culture [60], and it is also long established that prostaglandins can induce cervical ripening [1]. PLA2 activity increases 20-fold between non-pregnant to later term cervix extractions [61]. The PLA2 activity measured by Rajabi and collaborators was mostly due to the 85 kD cytosolic PLA2 (cPLA2), but contributions of lower molecular weight secreted PLA2 was also detected, with a distinct protein peak around 20 kD [61]. Surprisingly, no evidence for different protein levels was found between non-pregnant and term cervices by Rajabi and Cybulsky but this was based on protein gel band intensities. In our data, the mRNA abundance for cPLA2 (PLA2G4A) was at moderate but significant 15 TPM at late term, and at the same level for cervices of non-pregnant guinea pigs. In mid-term, the level was lower at 7 TPM, which is 2 different from late-term (P ¼ 8.88 104). However, the most dramatic change we found was in a lowmolecular weight secreted PLA2 (PLA2G10, 18 kD), which is not expressed before pregnancy and increases to 200 and 400 TPM at mid- and late-term pregnancy, respectively. Hence, our data are consistent with the results of Rajabi et al. with respect to protein levels of cPLA2 in non-pregnant and late term cervices, but also shows a dramatic induction of a secreted PLA2, PLA2G10, which is not expressed before pregnancy. These data suggest an active role of prostaglandin signaling in normal CRM in guinea pigs. In guinea pigs, CRM is in part achieved through collagen degradation effected by collagenases [62, 63]. Rajabi et al. reported a 5-fold increase in procollagenase activity from Day 50 to parturition. In our data the highest RNA expression level was found for MMP2, with 1000 TPM in late pregnancy and a 1.6-fold increase from mid-term to late-term. Among the ADAMTS metalloproteases, the highest expression was found to be ADAMTS1, with a 1.28 increase relative to mid-term and a final 181 TPM. The highest fold increase was recorded for ADAMTS8 with a 6.7 increase consistent with the finding of Hassan et al., who reported a 2.9 increase in human cervix [17]. The final RNA abundance of ADAMTS8 (14 TPM), however, was more than one order of magnitude lower than that of ADAMTS1 (181 TPM) and even smaller than the 1000 TPM found for MMT2. This comparison illustrates the danger of sole reliance on fold change measures to identify biologically important factors. Overall expression of type I collagen, collagenases and inhibitors of matrix metallopeptidases do not show a clear picture that would explain extracellular matrix remodeling. In contrast, the proteoglycan expression suggests a role in CRM with increased expression of the large proteoglycan VCAN. Small proteoglycans, FMOD, testican, SPOCK2 and biglycan (BGN), in contrast, all show considerable decrease toward term. FMOD is interacting with type I and type II collagens. BGN plays a role in assembly of collagen fibrils, and testican is also involved in extracellular matrix organization. Independent evolution of functional progesterone withdrawal Comparison to human cervix 285 Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 There are no data available to directly compare the results reported in the guinea pig with those in humans. The only data set we are aware of from humans that has some relationship to the guinea pig data presented here is Affymetrix based data from ripe and unripe cervices at term [17]. Ideally, both human and the guinea pig tissues should be obtained from corresponding stages of gestation. However, there are methodological and ethical considerations to prevent such a comparison. Obtaining cervix samples from women with ongoing gestations in the mid-trimester is not feasible because of the risk of causing a spontaneous abortion. On the other hand, the diagnosis of cervical ripening in the guinea pig is difficult. Changes in collagen have been quantitated using a non-invasive approach, which is not widely available to investigators [64, 65]. Therefore, a large number of guinea pigs at term would need to be euthanized to obtain material that would include both ripe and unripe cervices. One way of relating the observations made in this study to data derived from humans is to assume that an unripe cervix at term is a cervix that did not make the transition from mid-pregnant state to a compliant, ripe cervix. From this perspective, an unripe cervix at term is probably closer to a midgestation pregnant cervix than to a ripe cervix. This assumption can be challenged; yet, despite the limitations, we believe that this comparison may offer insight into the comparative biology of cervical ripening, which may be difficult to gather otherwise at this time. Another methodological issue is the comparison of transcriptomic data derived from microarrays (Affymetrix) and RNAseq data. Studies in humans have been based on Affymetrix technology [17], while our data were obtained using RNAseq. Whether and how Affymetrix and RNAseq data can be compared relies upon whether the different scales on which Affymetrix and RNAseq measures reside affect our conclusion. A key question when exploring this comparison is whether, on average, the direction of changes in gene expression between the two data sets corresponded. To address this, we first only considered statistically significant changes from the human data, and asked whether the RNAseq data (for the same genes) varied in the same direction. We considered fold changes estimated by Affymetrix and RNAseq technology. Fold changes are scale less quantities and thus, the original scales of RNA abundance measures became irrelevant, assuming their approximate linearity. The comparisons between the human and guinea pig samples were determined by calculating the correlation among the fold change data. This essentially tested for congruence of the direction of change rather than the magnitude of fold change. The only scenario where the technological difference would matter is if a change in one direction detected by Affimetrix technology would consistently lead to a signal in the opposite direction in RNAseq data. A substantial degree of methodological pessimism would be required to accept this scenario. We find that there is the potential of shared conserved regulatory pathways—most notably, the down-regulation of ESR1 and Polybromo 1 (PBRM1) as well as the up-regulation of IGFBP4, and the increased expression of VCAN. The other changes reported in the comparison of human ripe and unripe cervical tissue are not found in our comparison of mid- and late-gestation samples from guinea pig. There are methodological limitations that could explain these differences, and thus, the interpretation needs to proceed cautiously. At the very least, these data do not support a strong similarity of gene regulatory dynamics in humans and guinea pigs (Supplementary Fig. S1). Hassan et al. reported 89 genes with significant differences between ripe and unripe cervices. Most of these are up-regulated (n ¼ 81). Of these 89 genes 76 where detected or expressed in our guinea pig samples. The correlation between the log 2-fold differences in the Hassan data and the log 2-fold expression data from guinea pig is low, r ¼ 0.054. We then determined the set of genes that have a fold change in the same direction as in human samples and are significantly different between MT and LT (concordant genes). The list of concordant genes is short, comprising only nine genes: ESR1, PBRM1, SIGLEC1, CALD1, TPM1, IGFBP4, PLN, COL4A2 and VCAN. In contrast, there are 27 genes that are discordant, found to differ in the opposite direction between unripe and ripe human cervices at term and between MT and LT guinea pig cervix. More comparative data are necessary to assess the existence of conserved core-regulatory mechanisms shared among placental mammals, or at least among the Archontoglires, the clade combining humans and mice. Any mechanistic interpretation of transcriptomic data, however, will require followup studies to exclude the possibility that the changes Nnamani et al. | 286 | Nnamani et al. Evolution, Medicine, and Public Health in RNA abundance are due to changes in tissue composition, e.g. through the recruitment of leucocytes. 5. Csapo AI, Pulkkinen MO, Wiest WG. Effects of luteectomy and progesterone replacement therapy in early pregnant patients. Am J Obstet Gynecol 1973;115: 759–65. 6. Casey ML, MacDonald PC. The endocrinology of human CONCLUSIONS AND IMPLICATIONS parturition. Ann N Y Acad Sci 1997;828:273–84. 7. Merlino AA, Welsh TN, Tan H et al. Nuclear progesterone receptors in the human pregnancy myometrium: evidence that parturition involves functional progesterone withdrawal mediated by increased expression of progesterone receptor-A. J Clin Endocrinol Metab 2007; 92:1927–33. 8. Mesiano S. Myometrial progesterone responsiveness and the control of human parturition. J Soc Gynecol Investig 2004;11:193–202. 9. Wagner GP, Tong Y, Emera D et al. An evolutionary test of the isoform switching hypothesis of functional progesterone withdrawal for parturition: humans have a weaker repressive effect of PR-A than mice. J Perinat Med 2012;40:345–51. 10. Heap RB, Deanesly R. Progesterone in systemic blood and placentae of intact and ovariectomized pregnant guineapigs. J Endocrinol 1966;34:417–23. 11. Mitchell BF, Taggart MJ. Are animal models relevant to key supplementary data aspects of human parturition? Am J Physiol Regul Integr Comp Physiol 2009;297:R525–45. Supplementary data are available at EMPH online. 12. Cao Y, Adachi J, Yano T et al. Phylogenetic place of guinea pigs: no support of the rodent-polyphyly hypothesis from maximum-likelihood analyses of multiple protein funding This research was supported, in part, by the Perinatology Research Branch, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Department of Health and Human Services (NICHD/NIH); and, in part, with Federal funds from NICHD, NIH under sequences. Mol Biol Evol 1994;11:593–604. 13. Murphy WJ, Eizirik E, O’Brien SJ et al. Resolution of the early placental mammalian radiation using Bayesian phylogenetics. Science 2001;294:2348–51. 14. Lin YH, McLenachan PA, Gore AR et al. Four new mitochondrial genomes and the increased stability of evolutionary trees of mammals from improved taxon sampling. Mol Biol Evol 2002;19:2060–70. Contract No. HSN275201300006C. 15. dos Reis M, Inoue J, Hasegawa M et al. Phylogenomic Conflict of interest: None declared. datasets provide both precision and accuracy in estimating the timescale of placental mammal phylogeny. Proc Biol Sci 2012;279:3491–500. references 16. Wagner JE, Manning PJ. The Biology of the Guinea Pig. New York: Academic Press, 1976. 1. Word RA, Li XH, Hnat M et al. Dynamics of cervical 17. Hassan SS, Romero R, Tarca AL et al. The transcriptome remodeling during pregnancy and parturition: mechan- of cervical ripening in human pregnancy before the isms and current concepts. Semin Reprod Med 2007;25: onset of labor at term: identification of novel molecular 69–79. functions involved in this process. J Matern Fetal Neonatal 2. Romero R, Espinoza J, Kusanovic JP et al. The preterm parturition syndrome. BJOG 2006;113(Suppl. 3), 17–42. 3. Csapo AI. The ‘See-Saw’ theory of parturition. In: Knight J, O’Connor M (eds). The Fetus and Birth. New York: Wiley Publishers, 2008, 159–95. 4. Csapo AI, Pulkkinen MO, Ruttner B et al. The significance Med 2009;22:1183–93. 18. Draper RL. The prenatal growth of the guinea-pig. Anat Rec 1920;18:369–92. 19. Ibsen HL. Prenatal growth in guinea-pigs with special refernce to environmental factors affecting weight at birth. J Exp Zool 1928;51:51–93. of the human corpus luteum in pregnancy maintenance. 20. Kihlstrom I. A simple method for the assessment of I. Preliminary studies. Am J Obstet Gynecol 1972;112: fetal age in guinea pig litters of various sizes, using a stat- 1061–7. istical approach. Z Versuchstierkd 1985;27:227–31. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 (1) Lack of systemic progesterone withdrawal at parturition evolved independently in the primate and the guinea pig lineages. (2) In guinea pig CRM at term is associated with changes in proteoglycan expression and less so with changes in the expression of collagen or collagenases. (3) We confirm previous evidence for a downregulation of PR mRNA expression in the guinea pig cervix at term, a feature that has not been described in humans. (4) Potentially conserved mechanisms of CRM include down-regulation of ESR1 and the nuclear receptor PBRM1 as well as the up-regulation of VCAN and IGFBP4. Independent evolution of functional progesterone withdrawal 21. Langmead B, Trapnell C, Pop M et al. Ultrafast and mem- 37. Chrousos GP, Renquist D, Brandon D et al. The ory-efficient alignment of short DNA sequences to the squirrel monkey: receptor-mediated end-organ resist- human genome. Genome Biol 2009;10:R25. 22. Langmead B, Salzberg SL. Fast gapped-read alignment ance to progesterone? J Clin Endocrinol Metab 1982;55: 287 364–8. 38. Blanco MB, Meyer JS. Assessing reproductive profiles in with Bowtie 2. Nat Methods 2012;9:357–9. 23. Trapnell C, Pachter L, Salzberg SL. TopHat: discovering female brown mouse lemurs (Microcebus rufus) from splice junctions with RNA-Seq. Bioinformatics 2009;25: Ranomafana National Park, southeast Madagascar, using fecal hormone analysis. Am J Primatol 2009;71:439–46. 1105–11. 24. Li B, Ruotti V, Stewart RM et al. RNA-Seq gene expression 39. Luckhardt M, Kaufmann P, Elger W. The structure of the estimation with read mapping uncertainty. Bioinformatics tupaia placenta. I. Histology and vascularisation. Anat Embryol (Berl) 1985;171:201–10. 2010;26:493–500. 25. Wagner GP, Kin K, Lynch VJ. Measurement of mRNA abun- 40. Concannon P, Baldwin B, Tennant B. Serum progesterone dance using RNA-seq data: RPKM measure is inconsistent profiles and corpora lutea of pregnant, postpartum, barren among samples. Theory Biosci 2012;131:281–5. 26. Wagner GP, Kin K, Lynch VJ. A model based criterion for and isolated females in a laboratory colony of woodchucks gene expression calls using RNA-squ data. Theory Biosci 41. Holekamp KE, Nunes S, Talamantes F. Patterns of (Marmota monax). Biol Reprod 1984;30:945–51. progesterone secretion in free-living California ground 2013;132:159–64. 27. Hebenstreit D, Fang M, Gu M et al. RNA sequencing re- squirrels (Spermophilus beecheyi). Biol Reprod 1988;39: 1051–9. 42. Hauber HP, Goldmann T, Vollmer E et al. LPS-induced zoan cells. Mol Syst Biol 2011;7:497. 28. Murphy WJ, Pringle TH, Crider TA et al. Using genomic data to unravel the root of the placental mammal phyl- mucin expression in human sinus mucosa can be attenuated by hCLCA inhibitors. J Endotoxin Res 2007;13: 29. Reyes FI, Winter JS, Faiman C et al. Serial serum levels 109–16. 43. Kramer RM. Structure, function and regulation of mam- of gonadotropins, prolactin and sex steroids in the non- malian phospholipase A2. In: Brown BL, Dobson RM pregnant and pregnant chimpanzee. Endocrinology 1975; (eds). Advances in Second Messenger and Phosphoprotein ogeny. Genome Res 2007;17:413–21. Research. New York: Raven Press, 1993, 81–9. 96:1447–55. 30. Smith R, Wickings EJ, Bowman ME et al. Corticotropinhormone in chimpanzee and gorilla pregnancies. J Clin Endocrinol Metab 1999;84:2820–5. 31. Bahr NI, Martin RD, Pryce CR. Peripartum sex steroid pro- 44. Bron L, Jandus C, Andrejevic-Blant S et al. Prognostic value of arginase-II expression and regulatory T-cell infiltration in head and neck squamous cell carcinoma. Int J Cancer 2013;132:E85–93. files and endocrine correlates of postpartum maternal 45. Sigstad E, Paus E, Bjøro T et al. The new molecular markers behavior in captive gorillas (Gorilla gorilla gorilla). Horm DDIT3, STT3A, ARG2 and FAM129A are not useful in Behav 2001;40:533–41. 32. Bosu WT, Johansson ED, Gemzell C. Peripheral plasma diagnosing thyroid follicular tumors. Mod Pathol 2012; 25:537–47. levels of oestrone, oestradiol-17beta and progesterone 46. Gannon PO, Godin-Ethier J, Hassler M et al. Androgen- during ovulatory menstrual cycles in the rhesus monkey regulated expression of arginase 1, arginase 2 and with special reference to the onset of menstruation. Acta interleukin-8 in human prostate cancer. PLoS One 2010; Endocrinol (Copenh) 1973;74:732–42. 5:e12107. 33. Walsh SW, Stanczyk FZ, Novy MJ. Daily hormonal changes 47. Zheng HQ, Zhou Z, Huang J et al. Kruppel-like factor 5 in the maternal, fetal, and amniotic fluid compartments promotes breast cell proliferation partially through before parturition in a primate species. J Clin Endocrinol upregulating the transcription of fibroblast growth factor Metab 1984;58:629–39. binding protein 1. Oncogene 2009;28:3702–13. 34. Albrecht ED, Townsley JD. Serum progesterone in the 48. Abuharbeid S, Czubayko F, Aigner A. The fibroblast growth pregnant baboon (Papio papio). Biol Reprod 1976;14: factor-binding protein FGF-BP. Int J Biochem Cell Biol 610–2. 35. Chambers PL, Hearn JP. Peripheral plasma levels of 2006;38:1463–8. 49. Llaurado´ M, Majem B, Castellvı´ J et al. Analysis of gene progesterone, oestradiol-17 beta, oestrone, testosterone, expression regulated by the ETV5 transcription factor androstenedione and chorionic gonadotrophin during in OV90 ovarian cancer cells identifies FOXM1 over- pregnancy in the marmoset monkey, Callithrix jacchus. expression in ovarian cancer. Mol Cancer Res 2012;10: J Reprod Fertil 1979;56:23–32. 914–24. 36. Steinetz BG, Randolph C, Mahoney CJ. Patterns of relaxin 50. Chu XY, Zhu ZM, Chen LB et al. FOXM1 expression and steroids in the reproductive cycle of the common correlates with tumor invasion and a poor prognosis marmoset (Callithrix jacchus): effects of prostaglandin of colorectal cancer. Acta Histochem 2012;114:755–62. F2 alpha on relaxin and progesterone secretion during 51. Xia JT, Wang H, Liang LJ et al. Overexpression of FOXM1 is pregnancy. Biol Reprod 1995;53:834–9. associated with poor prognosis and clinicopathologic Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 veals two major classes of gene expression levels in meta- releasing Nnamani et al. | 288 | Nnamani et al. Evolution, Medicine, and Public Health stage of pancreatic ductal adenocarcinoma. Pancreas 2012;41:629–35. 52. Xue YJ, Xiao RH, Long DZ et al. Overexpression of FoxM1 is associated with tumor progression in patients with clear cell renal cell carcinoma. J Transl Med 2012;10:200. 59. Stjernholm-Vladic Y, Wang H, Stygar D et al. Differential regulation of the progesterone receptor A and B in the human uterine cervix at parturition. Gynecol Endocrinol 2004;18:41–6. 60. Rajabi M, Solomon S, Poole AR. Hormonal regulation of 53. Dai B, Gong A, Jing Z et al. Forkhead box M1 is regulated by heat shock factor 1 and promotes glioma cells sur- interstitial collagenase in the uterine cervix of the pregnant guinea pig. Endocrinology 1991;128:863–71. vival under heat shock stress. J Biol Chem 2013;288: 61. Rajabi MR, Cybulsky AV. Phospholipase A2 activity is 1634–42. increased in guinea pig uterine cervix in late pregnancy 54. Snegovskikh V, Park JS, Norwitz ER. Endocrinology of and at parturition. Am J Physiol 1995;269(5 Pt 1), E940–7. parturition. Endocrinol Metab Clin North Am 2006;35: 62. Rajabi MR, Dodge GR, Solomon S et al. Immunochemical 173–91, viii. and immunohistochemical evidence of estrogen- mediated collagenolysis as a mechanism of cervical during pregnancy and parturition. Trends Endocrinol Metab 2010;21:353–61. dilatation in the guinea pig at parturition. Endocrinology 1991;128:371–8. 56. Mahendroo M. Cervical remodeling in term and preterm 63. Hegele-Hartung C, Chwalisz K, Beier HM et al. Ripening of birth: insights from an animal model. Reproduction 2012; the uterine cervix of the guinea-pig after treatment with the 143:429–38. 57. Rodrı´guez HA, Kass L, Varayoud J et al. Collagen progesterone antagonist onapristone (ZK 98.299): an remodelling in the guinea-pig uterine cervix at term is associated with a decrease in progesterone receptor 64. Chwalisz K, Shao-Qing S, Garfield RE et al. Cervical ripening in guinea-pigs after a local application of nitric expression. Mol Hum Reprod 2003;9:807–13. 58. Stjernholm Y, Sahlin L, Akerberg S et al. Cervical ripening oxide. Hum Reprod 1997;12:2093–101. 65. Fittkow CT, Shi SQ, Bytautiene E et al. Changes in light- in humans: potential roles of estrogen, progesterone, and induced fluorescence of cervical collagen in guinea insulin-like growth factor-I. Am J Obstet Gynecol 1996;174: pigs during gestation and after sodium nitroprusside 1065–71. treatment. J Perinat Med 2001;29:535–43. electron microscopic study. Hum Reprod 1989;4:369–77. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 55. Timmons B, Akins M, Mahendroo M. Cervical remodeling 75 Evolution, Medicine, and Public Health [2013] pp. 75–85 doi:10.1093/emph/eot005 orig inal research article The adolescent transition under energetic stress Body composition tradeoffs among adolescent women in The Gambia 1 Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA; 2MRC Keneba, MRC Unit, The Gambia; 3MRC International Nutrition Group, London School of Hygiene and Tropical Medicine, London, WC1E 7HT, UK; and 4MRC Human Nutrition Research, Elsie Widdowson Laboratory, Cambridge, CB1 9NL, UK *Correspondence address. Department of Human Evolutionary Biology, Harvard University, 11 Divinity Avenue, Cambridge, MA 02138, USA. Tel:+1-617-495-1679; Fax:+1-617-496-8041; E-mail mreiches@fas.harvard.edu Received 30 January 2013; revised version accepted 23 February 2013 ABSTRACT Background and objectives: Life history theory predicts a shift in energy allocation from growth to reproductive function as a consequence of puberty. During adolescence, linear growth tapers off and, in females, ovarian steroid production increases. In this model, acquisition of lean mass is associated with growth while investment in adiposity is associated with reproduction. This study examines the chronological and developmental predictors of energy allocation patterns among adolescent women under conditions of energy constraint. Methodology: Fifty post-menarcheal adolescent women between 14 and 20 years old were sampled for weight and body composition at the beginning and end of 1 month in an energy-adequate season and 1 month in the subsequent energy-constrained season in a rural province of The Gambia. Results: Chronologically and developmentally younger adolescent girls gain weight in the form of lean mass in both energy-adequate and energy-constrained seasons, whereas older adolescents lose lean mass under conditions of energetic stress (generalized estimating equation (GEE) Wald chi-square comparing youngest tertile with older two tertiles 9.750, P = 0.002; GEE Wald chi-square comparing fast- with slow-growing individuals for growth rate 19.806, P < 0.001). When energy is limited, younger adolescents lose and older adolescents maintain fat (GEE Wald chi-square for interaction of age and season 6.568, P = 0.010; GEE Wald chi-square comparing fast- with slow-growing individuals for interaction of growth rate and season 7.807, P = 0.005). Conclusions and implications: When energy is constrained, the physiology of younger adolescents invests in growth while that of older adolescent females privileges reproductively valuable adipose tissue. K E Y W O R D S : life history theory; puberty; body composition; reproductive ecology ß The Author(s) 2013. Published by Oxford University Press on behalf of the Foundation for Evolution, Medicine, and Public Health. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Meredith W. Reiches*1, Sophie E. Moore2,3, Andrew M. Prentice2,3, Ann Prentice2,4, Yankuba Sawo2 and Peter T. Ellison1 76 | Reiches et al. Evolution, Medicine, and Public Health BACKGROUND AND OBJECTIVES equates to 300 average kilocalories per day for the 9 months of gestation (calculated with the equation from Aiello and Key [13] for a 42.2 kg !Kung woman) and 640 kcal per day for 6 months of lactation [14]. At the same time, adiposity is not the only somatic reproductive asset in women: in some developing world populations, female height correlates positively with marriageability and with reproductive success [15–17], suggesting that somatic investments in linear growth—or in one of its correlates, such as pelvic growth—may yield reproductive dividends. It is important to keep in mind, however, that greater height may indicate that growth has already ceased and the individual is prepared to invest in reproduction. This study investigated the determinants of somatic allocation strategy in energetically constrained adolescent women, many of whom have not completed linear growth. We asked the question: What developmental and chronological markers predict the transition from preferential investment in growth in the form of lean mass to investment in reproduction in the form of fat mass? We predicted that, in an energetically constrained population of adolescent women in The Gambia, developmental age would predict somatic energy allocation strategy. The answer to this question will contribute to our understanding of what constitutes evolutionarily relevant cues to modulating the tempo of reproductive maturation in females. METHODOLOGY Study subjects and field site Participants were 67 adolescent females between 14 and 19 years at enrolment, born to mothers enrolled in a 1989–94 protein, energy and micronutrient supplementation trial conducted across the rural West Kiang Region of The Gambia and co-ordinated by the Medical Research Council (MRC) field station in Keneba, The Gambia. Half the mothers received pregnancy supplements from 20 weeks gestation until delivery whereas the other half received supplements from delivery for 20 weeks. Daily supplements were 4250 kJ and 22 g protein [18]. Subjects enrolled in this study were resident in West Kiang, a rural province of The Gambia where the highly seasonal environment consists of a hungry season (June to October) characterized by population weight loss and a harvest season in which weight is gained Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Puberty is the transition from non-reproductive juvenility to reproductively capable maturity. In the terms of life history theory [1], puberty represents a transition in energy allocation: during the juvenile period, energy available beyond the requirements of maintenance is used for growth, as demonstrated by accelerated growth rates in well-nourished populations relative to energy-constrained populations [2]. At puberty, this surplus energy begins to be invested in reproductive function [3]. For human females, reproductive function is reflected by ovarian steroid production. Ovarian estradiol promotes the conversion of energy into adipose tissue [4], which is mobilized during gestation and lactation [5]. In the adolescent female body, therefore, acquisition of lean mass, comprising bones, muscle, water and organs, equates, in life history terms, to investment in growth, while acquisition or preferential maintenance of adipose tissue can be understood as investment in reproduction. Although puberty itself begins with a specific endocrine event, the initiation of pulsatile gonadotropin releasing hormone (GnRH) secretion from the arcuate nucleus of the hypothalamus [6, 7], the transition from juvenility to maturity occurs over the course of several years (Supplementary Fig. S1). Linear growth continues during this time, with height velocity in females generally peaking a year prior to menarche [8]. The overlap of these two phenomena, the adolescent linear height spurt and the externally visible sign of maturing gonadal function, indicates that adolescent physiology must allocate energy simultaneously to growth and reproductive function. This functional overlap is in keeping with the constrained fecundity seen in the years immediately after menarche, a phenomenon often referred to as ‘adolescent sterility’ but more accurately termed ‘adolescent subfecundity’ [9, 10]. The role of energy availability in mediating the timing of pubertal maturation in traditional and industrialized populations has been documented: while a high ratio of adult to juvenile extrinsic mortality risk promotes early age at maturity even when energy is limited [11, 12], there is generally a negative relationship between energy availability and juvenile growth rates on the one hand and age at puberty on the other hand [2]. Less is known, however, about the determinants of somatic energy allocation during puberty. This is a particularly relevant question for females, for whom a single reproductive event Reiches et al. | Female adolescent transition in The Gambia [19]. All daughters enrolled in this study were born in the hungry season, June through October inclusive, the time of year when the pregnancy supplement had the greatest impact on birth weight [18]. Participants were post-menarcheal, not pregnant, had reported at least one period since parturition if lactating and were not using hormonal contraception. Study design Anthropometry Height, weight and triceps skinfold thickness were measured in triplicate by the same trained observers. Height was measured in barefoot participants to the nearest millimeter with a stadiometer (Leicester height measure, Seca 214), calibrated daily with a wooden rod of known length. Weight was measured in light clothing in triplicate to the nearest 0.1 kg on a battery-operated scale (Tanita Corporation, Japan), placed on a level surface and calibrated daily with a 10-kg weight. Skinfold measures were taken to the nearest 2 mm with Holtain calipers (Holtain). Body composition Body composition was measured with the Tanita BC-418MA segmental body composition analyzer at the beginning and end of each sampling season. Prins et al. [20] validated the Tanita inbuilt prediction equation estimates against total body water estimates of body composition in Gambian children and developed a population-specific equation for the estimation of percent fat-free mass. The correlation of this estimation equation with estimates from deuterium was R = 0.84 (95% CI 0.79–0.89) [20]. A modified version of this equation, which includes using triceps skinfold measures, was used to convert Tanita impedance readouts in Ohms into estimates of fat and lean mass [21]. Biological samples Please see Supplementary Materials for detailed collection and analysis methods for C-peptide of insulin and leptin. Statistical analysis Statistical analysis was performed in IBM SPSS Statistics Version 19.0. GEEs were used to assess the effect of chronological age, gynecological age and height velocity on within-season changes in weight and body composition (fat mass and lean mass). Marginal means reflect uneven sample sizes in groups. Potential covariates included in the models were age, height and weight (for analyses of fat and lean mass). Results were considered significant at P < 0.05. Ethics Ethics approval for the study in The Gambia was granted by the joint Gambia Government/MRC The Gambia Ethics Committee (proposal SCC 1169). Permission for Harvard personnel to conduct the study was granted at Harvard by the Committee for the Use of Human Subjects (application #F-17744-102). RESULTS Study subject characteristics Chronological, developmental and anthropometric characteristics of study participants are detailed in Table 1. Because no effect of maternal treatment group was found in analysis, treatment group was not included as a term (data not presented). Of the 67 women enrolled, data for this analysis were available for 50. The other 17 women could not be located for end-of-month data collection. Differences in sample size between the harvest and hungry season are due to participants becoming pregnant, transferring out of the study area or withdrawing from the study. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Data reported here were collected in March during the 2010 harvest season and during a 30-day period spanning July and August in the 2010 hungry season. Anthropometric measurements, weight and body composition were measured at the beginning of each data collection period at MRC Keneba, using standard procedures with regularly validated equipment (see below). Weight and body composition were measured again at the end of the data collection period in participants’ villages. First morning fasting urine samples and non-fasted morning blood spots were collected approximately weekly at participants’ homes and transported to MRC Keneba laboratory for processing. 77 78 | Reiches et al. Evolution, Medicine, and Public Health Predictor variables Age Age refers to chronological age at the beginning of the study season. In the present analysis, the youngest tertile was compared with the group comprising the older two tertiles. This is because analysis revealed clear biological differences between those closer to the beginning of puberty and thus in the midst of pubertal growth relative to those who were nearer completion of growth and maturation processes. Height velocity Height velocity in cm/year was estimated in all individuals who were present for anthropometric measurement in at least two sampling seasons. Analysis compared individuals whose growth rate met or exceeded 1.0 cm/year, ‘fast growers’, with those whose growth rates were <1.0 cm/year, ‘slow growers’. Height velocity is a better proxy of maturity than height in the post-menarcheal period when agerelated height differences are less important than final height differences. Relationship among predictor variables Chronological and developmental variables are correlated both biologically and statistically. Each is presented separately here because each tracks a slightly different biological process: while age marks the passage of time, with which the probability of maturational events increases, gynecological age signifies distance from a threshold reproductive event in an individual’s unique maturational history. Height velocity, meanwhile, eventually reaches zero in all individuals; a snapshot of height velocity therefore allows an estimate of how close to this predetermined endpoint a given adolescent may be. Table 1. Participant characteristics by season Age (years) Gynecological age (years) Height (cm) Start of season weight (kg) End of season weight (kg) Start of season % fat (Tanita derived) End of season % fat (Tanita derived) Log leptin (ng/ml) Log C-peptide of insulin ng/creatinine (mg) Harvest season (mean ± SE), n = 47 Hungry season (mean ± SE), n = 29 Season Wald-chi square and P-value 17.30 (0.21) 2.2 (0.2) 161.0 (0.8) 52.7 (1.1) 52.6 (1.1) 21.8% (0.6) 21.8% (0.6) 1.1 (0.0) 1.2 (0.0) 17.98 (0.28) 2.9 (0.3) 161.4 (1.2) 55.8 (1.6) 54.9 (1.5) 24.0% (0.7) 23.3% (0.8) 1.0 (0.0) 0.9 (0.1) 5700, P < 0.001*** 5710, P < 0.001*** 10.7, P = 0.001** 40.2, P < 0.001*** 6.29, P = 0.012* 67.9, P < 0.001*** 10.2, P = 0.001** 4.32, P = 0.038* 12.8, P < 0.001*** This table represents the subset of 50 individuals for whom beginning and end of season weight and body composition data are available. *P < 0.05. **P < 0.01. ***P < 0.001. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Gynecological age Gynecological age is years since menarche. Menarcheal age, as self-reported recall data, is prone to errors of memory [22], which are compounded, in this case, by differences between researchers’ and participants’ concepts of time. Nonetheless, three lines of evidence indicate that menarcheal ages reported here are biologically relevant. First, median age at menarche in the study population was 15.00 years (95% confidence interval 14.92–15.42), while a recent probit analysis of age at menarche in the same population found a similar median menarcheal age of 14.90 (95% confidence interval 14.52–15.28) [23]. Second, there was no significant variation in reported menarcheal age relative to date of birth: an ANOVA assessing age at menarche by year of birth was non-significant (F = 1.64, P = 0.16). Third, we would predict that developmentally younger individuals have greater height velocity. As expected, average height velocity across the study period was associated negatively with gynecological age (GEE estimated marginal means of height velocity for gynecological age tertiles 1.0 cm/year in the youngest group, 0.9 cm/year in the middle group and 0.7 cm/year in the oldest group; Wald chi-square for gynecological age by tertile 14.6, P = 0.001, n = 52). As with age, the youngest tertile was compared with the older two tertiles in GEE analyses. Reiches et al. | Female adolescent transition in The Gambia Age and gynecological age tertiles were assigned across the full sample, of which a subset is represented in within-season weight and body composition measurements. Therefore, there were different numbers of individuals in the age and gynecological age tertiles. Overlap among age and gynecological age tertiles and height velocity categories in the harvest and hungry seasons is illustrated in Supplementary Fig. S2. Predictors of within-season weight change On all measures of chronological and developmental age, younger and faster growing individuals gained more weight than older, slower growing individuals in both the harvest season and the hungry season (Fig. 1). (For this analysis and for those below, Wald chi-square statistics for factors and covariates are available in Table 2 and estimated marginal means of group differences are in Table 3.) Both the gynecological age and height velocity models confirmed the finding that weight gain was more positive in the harvest season (Table 2). Height at entry into the study was positively associated with weight gain when age was a predictor (Table 2). This finding does not necessarily contradict the association between youth and weight gain, because height and age were not correlated in the sample population (GEE ns). Predictors of within-season lean mass change In both hungry and harvest seasons, younger and more rapidly growing individuals gained more lean mass than older and less rapidly growing individuals (Fig. 2 and Tables 2 and 3). The youngest tertile in gynecological age gained more lean mass than older tertiles in both seasons (Table 2). When participants were divided by age, older individuals lost lean mass in the hungry season whereas the youngest tertile did not (Tables 2 and 3). Similarly, slower but not faster growing individuals lost lean mass in the hungry season (Tables 2 and 3). Predictors of within-season fat mass change Chronologically younger individuals lost significantly more fat in the hungry season than in the harvest season (Fig. 2 and Tables 2 and 3). Although the mean fat change among older individuals in the hungry season was likewise negative, it was not significantly different from this group’s within-season fat change in the harvest season (Table 2). This pattern was echoed in height velocity groups: while fat mass remained constant for both fast and slow growers in the harvest season, the slow growers maintained fat mass in the hungry season while fast growers lost it (Fig. 2 and Tables 2 and 3). Gynecological age alone did not predict within-season fat mass change. CONCLUSIONS AND IMPLICATIONS Data from the study sample indicate that, during the adolescent life history transition, the bodies of postmenarcheal adolescent women in The Gambia responded to energetic stress with somatic energy allocation strategies that appeared to differ by age and developmental stage. Those who were Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Differences in energy availability between harvest and hungry seasons The current data indicate that harvest season was characterized by greater energy availability than the hungry season. Three lines of evidence demonstrate this difference. First, within season weight change in the study population was positive in the harvest season and negative in the hungry season (weight change in kg estimated marginal mean in the harvest season 0.2 (SE 0.2), hungry season 0.3 (SE 0.1), Wald chi-square of season 7.84, P = 0.005, age covariate Wald chi-square 7.47, P = 0.006, = 0.210, n = 49). Second, the population as a whole maintained fat mass in the harvest season and lost fat mass in the hungry season (fat mass change in kg estimated marginal mean in the harvest season 0.0 (SE 0.1), hungry season 0.4 (SE 0.1), Wald chi-square of season 12.0, P = 0.001). Finally, leptin and C-peptide of insulin, endocrine markers of long- and short-term energy status, respectively, were significantly higher in the harvest season than in the hungry season (log leptin in ng/ml estimated marginal mean in the harvest season 1.1 (SE 0.0), hungry season 0.95 (SE 0.0), GEE Wald chi-square for season 39.6 P < 0.001, fat mass covariate Wald chi-square 104.5, P < 0.001, = 7.107 E 6, n = 53; log C-peptide ng/Cr mg estimated marginal mean in the harvest season 1.1 (SE 0.0), hungry season 0.93 (SE 0.0), Wald chisquare of season 11.9, P = 0.001, age covariate Wald chi-square 6.58, P = 0.01, = 0.072, n = 52). Taken together, these results indicate that energy was limited in the hungry season relative to the harvest season, and the impact of energy constraint on weight and on C-peptide of insulin was greater in older individuals. Height and weight were not significant in any of the models and thus were not included as covariates. 79 | Reiches et al. Evolution, Medicine, and Public Health Within Season Weight Change age: youngest tertile age: older two tertiles gyn age: youngest tertile gyn age: older two tertiles height vel: ≥ 1 cm/yr height vel: < 1 cm/yr 1 0.8 0.6 0.4 0.2 Mass kg 80 0 0 0.5 1 1.5 2 2.5 3 -0.2 -0.4 -0.6 -0.8 -1 Harvest Season Hungry Season Figure 1. Within season weight change in the harvest and hungry seasons as predicted by age (P = 0.005), gynecological age (P = 0.002) and height velocity (P = 0.005). Wald chi-square statistics and additional P-values are in Table 2. Estimated marginal means and standard errors are in Table 3 chronologically younger and gaining significant height preferentially acquired lean mass, both in a season of relative energy abundance and in a season of relative energy constraint, mobilizing adipose tissue during the hungry season. In contrast, adolescent women who had neared or reached the end of linear growth and those who were chronologically older lost lean mass during the hungry season. These results are consistent with earlier findings that pregnant women who are still growing allocate a higher proportion of energy to maternal relative to fetal tissue than do comparably aged non-growing pregnant women [24] (though see also [25]). The current findings are also consistent with life history theory, suggesting that shifts in somatic priorities of energy allocation occur progressively during puberty. Given that all participants were postmenarcheal and height velocities were low across the sample, it is possible that differences in intrasomatic allocation strategy would be even more apparent in a sample including younger adolescents. When height velocity alone was considered, it appeared that slower growing adolescent women mobilized lean mass and preserved fat mass, suggesting that ovarian function in this subset of individuals may have been more robust than in faster growing women, with higher estradiol levels promoting maintenance of adipose depots important to gestation and lactation. Additional research will be needed to establish where on the body adipose tissue is preferentially maintained in developmentally and chronologically older adolescents under conditions of energy constraint. The authors hypothesize that gluteofemoral adipose depots will be favored, as these reserves have been shown to support gestation and lactation [26]. The findings reported here differ from and contribute to previous research in significant ways. Although cross-sectional patterns in female body composition with age and parity have been reported [27], short-term longitudinal shifts in intra-somatic allocation among fat and lean mass during puberty have not. Prior analyses of body composition in nonpregnant, non-lactating Gambian women found preferential mobilization of fat tissue during the hungry season in study participants 20–35 years of age [28], indicating that the result reported here may be specific to the pubertal period. Second, we did not detect energy sparing in activity during the hungry season relative to the harvest season, in contrast to data from a similar population in Senegal [29]. One finding that requires further exploration is the relative importance of height velocity compared with gynecological age in predicting within-season somatic energy allocation strategies in adolescent Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 -1.2 Reiches et al. | Female adolescent transition in The Gambia 81 Table 2. GEE Wald chi-square and P-values for analyses of age, gynecological age and height velocity as predictors of within-season change in weight and body composition Wald chi-square and P-values Lean Fat 7.89, P = 0.005** 3.53, P = 0.060 0.150, P = 0.699 Height 3.95, P = 0.047*, = 0.032 9.75, P = 0.002** 0.18, P = 0.672 4.255, P = 0.039* Height 4.71, P = 0.030*, = 0.036 0.56, P = 0.453 19, P < 0.001*** 6.568, P = 0.010* NS Gynecological age Gynecological age Season Interaction Covariate and b 9.45, P = 0.002** 7.11, P = 0.008** 0.01, P = 0.905 NS 4.53, P = 0.033* 0.10, P = 0.750 0.28, P = 0.599 Weight 4.36, P = 0.037*, = 0.025 NS NS NS NS Height velocity Height velocity Season Interaction Covariate and b 7.79, P = 0.005** 8.68, P = 0.003** 0.03, P = 0.854 NS 19.8, P < 0.001*** 3.16, P = 0.076 3.63, P = 0.057 NS 0.67, P = 0.412 5.76, P = 0.016* 7.81, P = 0.005** Age 4.26, P = 0.039*, = 0.089 Age Age Season Interaction Covariate and b Age and gynecological age comparisons are between the youngest tertile and older two tertiles, while height velocity comparisons are between individuals growing 1 cm/year and those growing <1 cm/year. *P < 0.05. **P < 0.01. ***P < 0.001. Table 3. Estimated marginal means and standard error for within-season weight change Weight, kg (SE) Harvest Age Youngest tertile Older two tertiles Gynecological age Youngest tertile Older two tertiles Height velocity 1 cm/year <1 cm/year Hungry Lean mass, kg (SE) Harvest Hungry 0.5 (0.2) 01 (0.2) 0.2 (0.2) 0.5 (0.2) 0.4 (0.2) 0.2 (0.2) 0.9 (0.2) 0.2 (0.2) 0.7 (0.2) 0.0 (0.2) 0.2 (0.2) 0.5 (0.2) 0.5 (0.2) 0.1 (0.2) 0.5 (0.2) 0.0 (0.2) 0.6 (0.2) 0.1 (0.3) 0.1 (0.1) 0.8 (0.3) 0.5 (0.2) 0.0 (0.2) 0.6 (0.2) 0.7 (0.2) Weight changes occur over a month. Harvest season n = 47. Hungry season n = 29. Fat mass, kg (SE) Harvest 0.1 (0.1) 0.1 (0.1) NS NS 0.0 (0.1) 0.1 (0.1) Hungry 0.8 (0.2) 0.4 (0.1) NS NS 0.6 (0.1) 0.0 (0.1) Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Weight 82 | Reiches et al. Evolution, Medicine, and Public Health (age P = 0.002, season NS, interaction P = 0.039). Wald chi-square statistics in Table 2; estimated marginal means in Table 3. (b) Comparison of lean mass change in the harvest season and the hungry season in individuals growing 1 cm/year relative to those growing <1 cm/year (height velocity P < 0.001, season NS, interaction NS). Wald chi-square statistics in Table 2; estimated marginal means in Table 3. (c) Comparison of fat mass change in the harvest season and the hungry season in the youngest and older two tertiles (age NS, season P < 0.001, interaction P = 0.010). Wald chi-square statistics in Table 2; estimated marginal means in Table 3. (d) Comparison of fat mass change in the harvest season and the hungry season in individuals growing 1 cm/ year relative to those growing <1 cm/year (height velocity NS, season P = 0.016, interaction P = 0.005). Wald chi-square statistics and additional P-values in Table 2; estimated marginal means in Table 3 women. Specifically, why was height velocity important while distance from menarche appeared to be less salient as a determinant of fat mass change? We will consider two complementary explanations for the importance of height in this study, since prioritizing investment in linear growth suggests that height itself is biologically significant. (Recall that height and age were not correlated in the sample population, suggesting that younger, faster growing individuals were not merely approaching a height threshold but rather were investing in height on its own behalf, or on behalf of a physiologically relevant trait that correlates with height.) The lesser explanatory power of gynecological age may be understood by keeping in mind two characteristics of pubertal maturation: first, menarche represents not a point of biological transition, as from sterility to fecundability, but rather a threshold at which the functioning of the hypothalamic–pituitary–ovarian axis becomes visible. Follicular estradiol production has reached a level at which the products of endometrial lining proliferation can no longer be resorbed and must be shed [30]. This proliferation, however, is no guarantee that ovulation has occurred [9]. The relationship of menarche to somatic energy allocation, therefore, is not completely clear: while there must be pubertal levels of circulating estradiol for menarche to occur, estradiol contributes both to reproductive function and to linear growth [31], making its relative importance in these two processes at the adolescent threshold point indeterminate. Differences in the relationship Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Figure 2. (a) Comparison of lean mass change in the harvest season and the hungry season in the youngest and older two tertiles Reiches et al. | Female adolescent transition in The Gambia reproduction, but it must prioritize which type of somatic store to preserve and which to mobilize when energy is limiting. The ability to negotiate these tradeoffs adaptively during the pubertal transition is necessary to acquiring the somatic capital that underwrites reproduction while taking advantage of reproductive opportunities at energetically favorable moments. In considering these results it is important to note that they focus on individuals in the 15–20-year age range who are not pregnant or in lactational amenorrhea. It is possible that there were physiological differences in somatic energy allocation strategy between this group and their age- and size-matched peers who were pregnant or in lactational amenorrhea and thus were not eligible for inclusion in this study. Additional limitations include unequal sample sizes in hungry and harvest seasons and a relatively small overall sample size. In summary, changes in weight and body composition over the course of an energetically constrained hungry season and a less energetically constrained harvest season indicated that chronologically and developmentally younger adolescent women preferentially allocate somatic resources to growth, while their older and more developed peers preserve reproductively valuable adipose tissue when energy is limiting. These results support an understanding of adolescence as a period of life history transition from juvenile growth to mature reproductive investment. The importance of height velocity as a predictor of somatic allocation strategy underscores its status as proxy for the degree to which the growth period is complete and the reproductive period begun. supplementary data Supplementary data is available at EMPH online. acknowledgments The authors wish to thank the women of West Kiang whose participation made the study possible. Special thanks go also to the field, Calcium, Vitamin D and Bone Health (CDBH), and data management teams, and to laboratory technicians at MRC Keneba, particularly Musa Colley. In the USA, the assistance of Dr Susan Lipson and Megan Verlage in the laboratory was invaluable. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 of height velocity, final height and menarche in resource-constrained and resource-abundant populations affirm the variability of the menarche-growth relationship [11]. Second, menarche typically occurs about a year following peak height velocity [8]. Given that there is inter-individual variation in the magnitude and duration of the pubertal spurt in linear growth (Supplementary Fig. S3), gynecological age or distance from menarche may correspond to different points in the individual’s linear growth trajectory [32]. That growth trajectory itself may be more theoretically and practically robust as a predictor of somatic energy allocation. Second, height velocity may be a signal of structural maturation that is easy to measure outwardly and that serves as a visible indicator of other dimensions of skeletal maturation, such as remodeling of the interior dimensions of the bony pelvis in females, which, even more than increases in bi-iliac breadth or hip circumference, is the axis of anatomical change most important to successful parturition in humans [33]. A related and not mutually exclusive possibility is that height is one marker of reproductive value in resource-scarce ecologies, perhaps constituting a metric of the quality of the individual’s developmental environment and thus the somatic resources that she will be able to invest in reproduction [3]. Height associates positively with marriageability [15] and reproductive success [16] in many non-Western populations, including The Gambia [17], though the significant outcome measure in the Gambian population is not number of births but survival of offspring, indicating that women of different heights may be able to allocate different amounts of energy to fetal or infant growth or immune function. It is worthwhile to note that there was strong evidence of growth and maturation in height, weight and adiposity in the study population across seasons (Table 1), even as within-season changes in weight and fat mass tended to be negative in the hungry season. The shorter term patterns detected through within-season analysis revealed responsiveness to energetic stress that was not discernible from data collected at less frequent intervals. Subtle seasonal changes in weight and body composition of the kind documented here likely reflect the type of facultative shifts in somatic energy allocation that conferred a selective advantage over the course of human evolution: the body must not only invest available resources in the most beneficial life history category, growth or 83 84 | Reiches et al. Evolution, Medicine, and Public Health nourished lactating women. Am J Clin Nutr 1991;54: funding The research was supported by a Doctoral Dissertation Improvement Grant (BCS-0925768) and by a Senior Research Grant (BCS-0921237) from the National Science Foundation of the USA. This work was supported by the UK Medical Research Council under program numbers MC-A760-5QX00, U105960371 and U123261351. 788–98. 15. Smits J, Monden CWS. Taller indian women are more successful at the marriage market. Am J Hum Biol 2012;24: 473–8. 16. Stulp G, Verhulst S, Pollet TV et al. The effect of female height on reproductive success is negative in western populations, but more variable in non-western populations. Am J Hum Biol 2012;24:486–94. 17. Sear R, Allal N, Mace R et al. Height and reproductive Conflict of interest: None declared. success among Gambian women. Am J Hum Biol 2004; 16:223. 18. Ceesay SM, Prentice AM, Cole TJ et al. Effects on birth weight and perinatal mortality of maternal dietary supple- references ments in rural Gambia: 5 year randomised controlled trial. 1. Gadgil M, Bossert WH. Life historical consequences of natural selection. Am Nat 1970;104:1–24. 2. Eveleth PB, Tanner JM. Worldwide Variation in Human 1990. energy balance in child-bearing Gambian women. Am J Clin Nutr 1981;34:2790–9. 20. Prins M, Hawkesworth S, Wright A et al. Use of 3. Charnov EL, Berrigan D. Why do primates have such long bioelectrical impedance analysis to assess body compos- lifespans and so few babies?. Evol Anthropol 1993;1:191–4. ition in rural Gambian children. Eur J Clin Nutr 2008;62: 4. Pallottini V, Bulzomi P, Galluzzo P et al. Estrogen regula- 1065–74. tion of adipose tissue functions: involvement of estrogen 21. Hawkesworth S, Prentice AM, Fulford AJ et al. Dietary sup- receptor isoforms. Infect Disord Drug Targets 2008;8: plementation of rural Gambian women during pregnancy 52–60. does not affect body composition in offspring at 11–17 5. Demmelmair H, Baumheuer M, Koletzko B et al. Metabolism of U13C-labeled linoleic acid in lactating women. J Lipid Res 1998;39:1389–96. years of age. J Nutr 2008;138:2468–73. 22. Koo MM, Rohan TE. Accuracy of short-term recall of age at menarche. Ann Hum Biol 1997;24:61–4. 6. Knobil E. The neuroendocrine control of the menstrual cycle. Recent Prog Horm Res 1980;36:53–88. 23. Prentice S, Fulford AJ, Jarjou LM et al. Evidence for a downward secular trend in age of menarche in a rural Gambian 7. Plant TM, Barker-Gibb ML. Neurobiological mechanisms of puberty in higher primates. Hum Reprod Update 2004; 10:67–77. population. Ann Hum Biol 2010;37:717–21. 24. Scholl TO, Hediger ML, Schall JI et al. Maternal growth during pregnancy and the competition for nutrients. 8. Tanner JM. Growth at Adolescence; with a General and Am J Clin Nutr 1994;60:183–8. 25. Jones RL, Cederberg HM, Wheeler SJ et al. Relationship Environmental Factors upon Growth and Maturation from between maternal growth, infant birthweight and nutrient Birth to Maturity. Springfield, III, C. C. Thomas, 1962. partitioning in teenage pregnancies. Int J Obstet Gynaecol Consideration of the Effects of Hereditary 9. Gray SH, Ebe LK, Feldman HA et al. Salivary progesterone levels before menarche: a prospective study of adolescent girls. J Clin Endocrinol Metab 2010;95:3507–11. 10. Apter D, Raisanen I, Ylostalo P et al. Follicular growth in relation to serum hormonal patterns in adolescent compared with adult menstrual cycles. Fertil Steril 1987; 47:82–8. 11. McIntyre MH, Kacerosky PM. Age and size at maturity in women: a norm of reaction?. Am J Hum Biol 2011;23: 305–12. 12. Kramer KL, Greaves RD, Ellison PT. Early reproductive ma- 2010;117:200–11. 26. Rebuffe-Scrive M, Enk L, Crona N et al. Fat cell metabolism in different regions in women. Effect of menstrual cycle, pregnancy, and lactation. J Clin Invest 1985;75: 1973–6. 27. Wells JC, Griffin L, Treleaven P. Independent changes in female body shape with parity and age: a life-history approach to female adiposity. Am J Hum Biol 2010;22: 456–62. 28. Lawrence M, Coward WA, Lawrence F et al. Fat gain during pregnancy in rural African women: the effect of turity among Pume Foragers: implications of a pooled en- season and dietary status. Am J Clin Nutr 1987;45: ergy model to fast life histories. Am J Hum Biol 2009;21: 1442–50. 430–7. 13. Aiello LC, Key C. Energetic consequences of being a Homo erectus female. Am J Hum Biol 2002;14:551–65. 14. Goldberg GR, Prentice AM, Coward WA et al. Longitudinal assessment of the components of energy balance in well- 29. Benefice E, Garnier D, Ndiaye G. High levels of habitual physical activity in west African adolescent girls and relationship to maturation, growth, and nutritional status: results from a 3-year prospective study. Am J Hum Biol 2001; 13:808–20. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 Growth. Cambridge University Press: Cambridge, UK, Br Med J 1997;315:786–90. 19. Prentice AM, Whitehead RG, Roberts SB et al. Long-term Reiches et al. | Female adolescent transition in The Gambia 30. Salamonsen LA. Current concepts of the mechanisms of 32. Tanner JM. Foetus into Man: Physical Growth from menstruation: a normal process of tissue destruction. Conception to Maturity. Harvard University Press: Trends Endocrinol Metab 1998;9:305–9. 31. Juul A. The effects of oestrogens on linear bone growth. Hum Reprod Update 2001;7:303–13. 85 Cambridge, MA, 1990. 33. Moerman ML. Growth of the birth canal in adolescent girls. Am J obstet Gynecol 1982;143:528–32. Downloaded from http://emph.oxfordjournals.org/ by guest on April 2, 2015 37 Evolution, Medicine, and Public Health [2013] pp. 37–45 doi:10.1093/emph/eot002 Socioeconomic status determines sex-dependent survival of human offspring David van Bodegom*1,2, Maarten P. Rozing1, Linda May3, Hans J. Meij1,4, Fleur Thome´se5, Bas J. Zwaan6,7 and Rudi G. J. Westendorp1,2 1 Department of Gerontology and Geriatrics, Leiden University Medical Center, PO Box 9600, 2300 RC Leiden; 2Leyden Academy on Vitality and Ageing, Poortgebouw LUMC, Rijnsburgerweg 10, 2333 AA Leiden; 3Department of Parasitology, Leiden University Medical Center, PO Box 9600, 2300 RC Leiden; 4Amphia Hospital, Postbus 90157, 4800 RL Breda; 5 Department of Sociology, VU University Amsterdam, De Boelelaan 1081, 1081HV Amsterdam; 6Laboratory of Genetics, Wageningen University, PO Box 309, 6700 AH Wageningen and 7Institute of Biology, Leiden University, PO Box 9505, 2300 RA Leiden, The Netherlands. *Corresponding author. Leyden Academy on Vitality and Ageing, Poortgebouw LUMC, Rijnsburgerweg 10, 2333 AA Leiden, The Netherlands. Tel:+31-71-5240960; Fax:+31-71-5248159; E-mail: bodegom@leydenacademy.nl. Received 31 August 2012; revised version accepted 26 January 2013. ABSTRACT Background and objectives: In polygynous societies, rich men have many offspring through the marriage of multiple wives. Evolutionary, rich households would therefore benefit more from sons, and according to the Trivers–Willard hypothesis, parents invest more in offspring of the sex that has the best reproductive prospects. We determined the sex differences in number of offspring, sex ratio of offspring, offspring survival and offspring weight in rich and poor households in a polygynous population. Methodology: We studied a population of 28 994 individuals in Northern Ghana during an 8-year prospective follow-up. We determined the fertility rate for both men and women, sex ratio of 3511 newborn offspring and offspring survival in 16 632 offspring up to reproductive age (18 years). Also, we collected 9842 weight measurements of 1470 offspring up to the age of 3 years from growth charts of local clinics. Results: In rich households, men have a lifetime number of 6.0 offspring, while for women this was 3.1. In line with evolutionary predictions, the male:female sex ratio was higher in rich households (0.52; poor households 0.49), sons had lower mortality in rich households (hazard ratio male versus female 1.06, P = 0.64; poor households: hazard ratio male versus female 1.46, P = 0.01) and sons also had higher weights in rich households (P = 0.008). Conclusions and implications: In rich households, men have higher reproductive prospects in this polygynous society and, in line with Trivers–Willard, we registered more sons in rich households, sons had lower mortality and higher weights, maximizing the reproductive output in this society. K E Y W O R D S : sex differences; Trivers–Willard; reproduction; offspring survival; offspring weight; Africa ß The Author(s) 2013. Published by Oxford University Press on behalf of the Foundation for Evolution, Medicine, and Public Health. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. orig inal research article 38 | van Bodegom et al. Evolution, Medicine, and Public Health BACKGROUND AND OBJECTIVES In polygynous societies, richer men can afford to marry multiple wives and consequently increase their reproductive success. In terms of Darwinian fitness, rich households would therefore benefit more from sons with their higher reproductive prospects. According to the Trivers–Willard hypothesis, parents invest more in offspring of the sex that has the best reproductive prospects [1]. Although Trivers–Willard effects have been found in many animals, they are highly debated in humans. In a recent review of 422 studies in mammals, which investigated sex ratios at birth, excluding humans, a Trivers–Willard effect was consistently found in several species, while in other species, including nonhuman primates, more contradictory findings are found [2]. An important consideration here is that many human studies were performed in monogamous populations [3–5]. Here, large effects are not expected since in a monogamous society, there will mostly not be large differences in reproductive output of sons and daughters. In polygynous societies, however, a subset of more successful sons can have large reproductive output through the marriage of multiple wives. Previous studies that have examined sex ratios and the Trivers–Willard effect in polygynous human populations found no sex-specific survival differences dependent on status among the Bari of South America, nor among the Gabbra and Kipsigis of Kenya [6–8]. We studied reproductive output of men and women in poor and rich households in a large population of 28 994 individuals in a rural African society in the Upper East Region of Ghana with a high degree of polygyny. Second, we investigated the differences in offspring sex ratio, sex differences in offspring survival and offspring weight in poor and rich households. METHODOLOGY Study area This study was conducted in the Garu-Tempane district in the Upper East region of Ghana. General fertility and mortality patterns have been described elsewhere [9]. The characteristics of the study population are presented in Table 1. The people are patriarchal, patrilineal and patrilocal and live in extended families, of which 48% are polygynous. During 8 years of follow-up from 2002 to 2010, we assessed reproduction and survival among 28 994 participants. The area is currently undergoing an epidemiological transition [10]. Drinking water was assessed on household level, water from boreholes was considered safe drinking water and water drawn from either open wells or from rivers was considered unsafe drinking water [11]. Socioeconomic status In 2007, we designed a DHS-type questionnaire to assess the socioeconomic status (SES) of the households of the study participants using a free listing technique whereby we asked people from different villages of the research area, both male and female, in focus group discussions to list the household items of most value [12]. These self-listed property questionnaires are reported to be highly correlated to longer property questionnaires [13]. The resulting list of valuable items was comparable to part of the core welfare indications questionnaire from the World Bank and to the extended DHS asset list, adapted to our region. The list included different items, including mainly domestic livestock and different valuable household items comprising motorbikes, bicycles and iron roofing. The average wealth of the household possessions in market value of 2007 was 1063 US dollar with a SD of 1021 US dollar. The distribution was skewed to the right. From these assets, a DHS wealth index was calculated. This was done as explained in paragraph 2.2 of the DHS wealth index comparative report [14]. Using SPSS factor analysis, the indicator variables were first standardized by calculating z-scores. Second, the factor coefficient scores or factor loadings are calculated. The DHS wealth index is the sum of the indicator values multiplied by the loadings. This index is itself a standardized score with a mean of 0 and a SD of 1. We defined poor and rich as the poorest 50% of households and the richest 50% of households divided by the median of the DHS wealth index. Fertility From the registered newborn offspring and the observed person-years of fertile men and women during our 8-year follow up, we calculated the agespecific fertility rates. Next, we multiplied the agespecific fertility rates with the fraction of surviving van Bodegom et al. | SES determines sex-dependent survival of human offspring Table 1. Characteristics of the study population Participants (n) Male (%) Female (%) 28 994 46 54 Tribe Bimoba (%) Kusasi (%) Other (%) 66 26 8 Households (n) Polygynous households (%) Mean value of household possessions in US$ (mean (SD)) Safe drinking water (%) 1703 48 1063 (1021) 80 Number offspring Numbers of offspring registered 2002–2010 (n) SES available (n) 3645 3511 Offspring survival Offspring 18 years (n) Follow-up (calenderyears) Person years (n) Mean follow-up (years) Deaths during follow-up (n) 16 632 2002–2010 91 256 5.5 471 Weights of offspring Offspring 3 years with growth chart (n) Weight measurements (n) Average number of measurements per child (n) men and women of these ages to calculate the number of offspring of each age group per year. The lifetime number of offspring was calculated as the sum of these numbers of offspring per year for all age groups multiplied by 5 since all age groups are 5-year age groups. 1470 9842 7 those cases only the person-years observed below age 18 years were included in the analysis. In total, we followed 16 632 individuals for 91 256 personyears which makes an average of 5.5 years follow-up below the age of 18 years per individual observed. During our follow-up, we observed 471 deaths below the age of 18 years. Survival The survival analysis used a multivariable lefttruncated Cox regression analysis adjusted for sex, tribe and drinking source. We found no evidence that the assumption of proportionality of hazards was violated. The left-truncated plots represent age-specific survival probabilities calculated from the 8-year follow-up rather than a prospective lifetime follow-up. For the survival analysis up to reproductive age, we included all offspring up to 18 years. This survival analysis was performed on all person-years observed 18 years during our 8-year follow-up. Some individuals were followed 8 years below the age of 18 years; some individuals were followed both below age 18 years and above and in Weights The weights of the offspring were obtained from growth charts of local health clinics in 2008. The clinics use hanging scales to measure the weight and use growth charts from the Ghana Health Service, adapted from the World Health Organization. For the separate sexes of each age, we standardized the weights on age and sex by calculating SDS or z-scores by subtracting the mean from the observed weight and dividing by the standard deviation. On average, we had seven measurements per child during their first 3 years of life. To take these repeated measures into account and not treat them 39 40 | van Bodegom et al. Evolution, Medicine, and Public Health as independent measures, we used a linear mixed model. In the model, we adjusted/corrected for tribe of the offspring. The offspring from different tribes in the area have very different biometrics. Some tribes have cows, and the offspring of these tribes drink milk. Therefore, these offspring have less stunted growth and also do not suffer from (protein) malnutrition. We also adjusted the model for drinking source and the month and year of measurement, as weights fluctuated dependent on the season and year (Supplementary Fig. S1). The point estimates presented in this article are estimates derived from this model and therefore do not always add up to zero for each age. Ethics Ethical approval was given by the Ethical Review Committee of the Ghana Health Service, the Medical Ethical Committee of the Leiden University Medical Centre in Leiden, The Netherlands, and by the local chiefs and elders of the research area. All analyses were performed with Stata 11.0 (StataCorp LP, TX, USA). RESULTS We visited the research area annually from 2002 to 2010. Each year we registered the deaths, migration and newborn offspring. Figure 1 shows the cumulative survival, age-specific fertility rates and number of offspring for men and women of different age groups from poor and rich households. Figure 2a compares these numbers of offspring born to fathers and mothers of different ages in poor and rich households. The people in the research area are polygynous and the man must pay a bride price of four cows to arrange a marriage. Consequently, richer men are able to increase their number of wives and hence offspring. Taking the age-specific fertility rate and survival to these ages into account, in poor households the total number of lifetime offspring, represented by the area under the curve in the figure, was 3.4 offspring for men and 2.7 offspring for women. In rich households, the total number of lifetime offspring was 6.0 offspring for men, whereas it was 3.1 offspring for women. Studies have shown a strong heritability of SES in pre-transitional societies [15]. This seems applicable to this population also, since income is generated largely through agriculture and sons inherit the cattle and land of their fathers. If offspring inherits the SES from their parents and rich men have better reproductive prospects, one could hypothesize that rich households would benefit more from sons, which would create an opportunity for selection on sex-specific survival dependent on SES. We compared the sex ratio of offspring, offspring survival and offspring weights in poor and rich households. Figure 2b shows the sex ratio of the registered offspring in the research area. Of all 3685 offspring, we had socioeconomic information on 3511 offspring. In poor households, we registered 544 male offspring and 565 female offspring (male:female sex ratio 0.49). In rich households, we registered 1240 male offspring and 1162 female offspring (male:female sex ratio 0.52). Since we did not register the offspring at birth, but during the annual field visit, these sex ratios are secondary sex ratios at an average age of 6 months. Second, we studied survival of 16 632 offspring up to reproductive age (18 years) (Fig. 2c). In poor households, sons had much higher mortality risk compared with daughters (hazard ratio (HR) 1.46 [95% CI 1.08–1.96]; P = 0.01). In rich households, however, mortality risk of sons was similar to that of daughters (HR 1.06 [95% CI 0.84–1.33]; P = 0.64, P for interaction = 0.09). To further investigate the observed sex differences, we also looked at the survival differences in different strata of SES. Figure 3 shows the survival differences for male and female offspring stratified in different strata of SES. The accompanying HRs are reported in Table 2. These analyses show that the sex-specific survival differences dependent on SES are largely due to a reduced survival of male offspring in the poorest households. Third, we analyzed the weights of offspring using repeated measurements from growth charts of the local health clinics. In an analysis of 9842 age and sex standardized measurements among 1470 offspring up to the age of 3 years, daughters had higher weights in poor households, whereas sons had higher weights in rich households (Fig. 2d). These differences in sex-specific weight gain were significantly different (P for interaction = 0.008). CONCLUSIONS AND IMPLICATIONS We observed sex-specific effects of SES on the sex ratio of offspring, offspring survival and offspring weight. Several points should be discussed when interpreting these results. van Bodegom et al. | SES determines sex-dependent survival of human offspring Poor men 0.4 0.1 0.2 Offspring / (person)year 0.6 0.0 0.6 0.4 0.1 0.2 Age group (years) Age group (years) Cumulative survival (right y-axis, proportion) Age specific reproductive rate (offspring/personyear) Cumulative survival (right y-axis, proportion) Offspring per agegroup/year (offspring/year) Offspringper agegroup/year (offspring/year) Age specific reproductive rate (offspring/personyear) Poor women Rich women 0.2 0.0 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 9 105 0 0.00 Age group (years) 0.8 0.15 0.6 0.10 0.4 0.05 0.2 0.00 0.0 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 9 105 0 0.4 0.05 1.0 0.20 Cumulative survival 0.6 0.10 Cumulative survival 0.8 0.15 Offspring / (person)year 1.0 0.20 Offspring / (person)year 0.0 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 9 105 0 0.0 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 9 105 0 0.0 0.8 0.2 Cumulative survival 0.8 0.2 1.0 0.3 Cumulative survival Offspring / (person)year Rich men 1.0 0.3 Age group (years) Cumulative survival (right y-axis, proportion) Age specific reproductive rate (offspring/personyear) Cumulative survival (right y-axis, proportion) Age specific reproductive rate (offspring/personyear) Offspring per agegroup/year (offspring/year) Offspring per agegroup/year (offspring/year) Figure 1. Cumulative survival, age-specific fertility rate and offspring per year for poor and rich men and women of different age groups. First, concerning the high ages of continued reproduction in this area. Since there is no official registration of births in this area, the ages are estimated ages by three independent observers, both local and Dutch fieldworkers. We used all information available to come to a best estimate, most notably the relation to other family members with known ages, but some ages could be estimated too low and some too high. Although we did our best to come to an objective estimate, old age carries a certain status in this area, and it is possible that more ages are overestimations than underestimations. This could explain the unusual high age of retained fertility for some women and it is also possible that the high reproductive output of some old men could be a little less extreme. Although misclassification of ages does not change the interaction of wealth and sex as we describe in this article, it is important to recognize this when interpreting the fertility data. Second, the sex ratios are sex ratios during registration at our annual field visit. Therefore, they are secondary sex ratios at an average age of 6 months and they do not necessarily reflect sex ratios at birth. Therefore, they could be the result of early mortality differences instead. We have observed mortality differences up to 18 years and it is expected that these differences also exist in the first 6 months of life. Another important point to discuss in this polygynous society is that men that fail to marry migrate to the south of Ghana to work in poor conditions in large cities or large-scale agricultural plantations. We have no estimate of their reproductive output but it is possible that this is low. Since the men that migrate are preferential poor males, the fertility figures for poor males in the research area are most probably overestimations of the reproductive output of all men born in poor households in the area. The contrast in reproduction between poor and rich is therefore probably even stronger than presented here. It is even possible that the lifetime number of offspring of poor women is greater than the lifetime number of offspring of poor men if this would be taken into account. However, it is not possible to 41 42 | van Bodegom et al. Evolution, Medicine, and Public Health Figure 2. Offspring per year (a), sex of offspring (b), offspring survival (c) and offspring weight (d) in poor and rich households. Error bars indicate standard errors. SDS = Standard Deviation Score. 0 5 Cumulative survival 10 Age (years) 0.85 0.90 0.95 1.00 0 15 Male 20 Female 5 Cumulative survival 0 0.85 0.90 0.95 1.00 10 Age (years) 0.85 0.90 0.95 1.00 0 15 Male 5 20 Female 5 0.85 0.90 0.95 0 10 Age (years) 0.85 0.90 0.95 1.00 10 Age (years) Cumulative survival 0 15 15 5 20 Male 20 Female Male 0.85 0.90 0.95 1.00 10 Age (years) 0.85 0.90 0.95 15 0 0 15 1.00 10 Age (years) Female 5 20 5 Male Female 5 20 Female Male 10 Age (years) 0.85 0.90 0.95 1.00 0 0 10 Age (years) 0.85 0.90 0.95 1.00 15 15 20 Male 20 0.85 0.90 0.95 0.85 0.90 0.95 1.00 10 Age (years) Female 5 Male 1.00 10 Age (years) Female 5 0 0 15 15 5 20 10 Age (years) 0.85 0.90 0.95 1.00 Female Male 10 Age (years) 5 20 Female Male 0 15 15 Male 20 Female 5 20 Female Male 0.85 0.90 0.95 1.00 0 10 Age (years) 15 5 10 Age (years) 20 Female Male 15 20 Female Male Figure 3. Offspring survival 18 years dependent on socioeconomic status (SES) in different strata of wealth (DHS wealth index). (a) Split by median (poor versus rich), (b) tertiles of SES, (c) quartiles of SES and (d) quintiles of SES. 0.85 0.90 0.95 1.00 (d) (c) (b) Cumulative survival 1.00 Cumulative survival Cumulative survival Cumulative survival (a) Cumulative survival Cumulative survival Cumulative survival Cumulative survival Cumulative survival Cumulative survival Richer Cumulative survival Poorer SES determines sex-dependent survival of human offspring van Bodegom et al. | 43 44 | van Bodegom et al. Evolution, Medicine, and Public Health Table 2. Hazard ratios for mortality 18 years (male versus female) HR 95% CI P Poorest 50% Richest 50% 1.46 1.05 (1.08–1.96) (0.84–1.33) 0.01 0.64 First tertile Second tertile Third tertile 1.51 0.96 1.15 (1.11–2.05) (0.70–1.33) (0.83–1.59) 0.008 0.81 0.4 First quartile Second quartile Third quartile Fourth quartile 1.43 0.97 1.24 1.11 (1.03–2.00) (0.67–1.41) (0.86–1.81) (0.75–1.65) 0.03 0.89 0.25 0.59 First quintile Second quintile Third quintile Fourth quintile Fifth quintile 1.58 1.35 0.68 1.44 1.09 (1.09–2.29) (0.88–2.07) (0.45–1.03) (0.92–2.25) (0.72–1.76) 0.02 0.17 0.07 0.11 0.68 Bold values indicate significance at p < 0.05 calculate this without knowing the exact fertility characteristics of the men that migrate. This does not change our conclusions, however, and in fact, it is possible that the Trivers–Willard effect could even be stronger than presented here. Concerning the mechanism behind the observed sex differences dependent on SES, two possible explanations exist. First, they could be a reflection of higher intrinsic vulnerability of sons to poor conditions. Looking at the mortality patterns in poor and rich households, Fig. 3 shows that the differences are largely determined by a higher mortality of sons in poor conditions. It is known that men have higher mortality risks throughout life in almost all countries and in this harsh environment, this could be the principle mechanism behind the observed survival differences dependent on SES [16]. Second, our observations are also in line with differences in parental investment as hypothesized by Trivers and Willard. The observed sex differences in weight could reflect differences in parental nursing habits; sex differences in breastfeeding have previously been observed in Poland and the Caribbean [17, 18]. These differences in parental behavior do not have to be based on conscious decisions. Previous studies among the Mukogodo of Kenya also showed that in a male-centered society, parental behavior can, maybe not even always consciously, be female oriented in a society where all Mukogodo are poor in relation to the Masaai [19]. We do not have observations on parental behavior in our study. Although this would be interesting, from an evolutionary perspective not the mechanism but the number of surviving male and female offspring is most relevant. A last thing to consider is a potential effect that birth order could have on the observed patterns. It could be expected that the first-born son would be preferred; because he would inherit the wealth and therefore have high reproductive prospects while later born sons would be less favored. Unfortunately, we do not have reliable data on this, but we are planning to collect this in the future. On the other side, although the oldest son inherits the house, his brothers together with their wive(s) will often live with him in his household. Also, it is important to realize that in this society, possessions are not owned individually but are shared to a high degree among the (male) kin of the household. Whether the sex differences that we have observed in our study reflect the higher vulnerability of sons to poor conditions, or reflect a sex-specific parental investment as proposed by Trivers and Willard, the net result is the same; sons are better off in richer households which maximizes the reproductive prospects of households in this polygynous society. In fact, the two explanations are not mutually exclusive. The Trivers–Willard hypothesis refers to an ultimate van Bodegom et al. | SES determines sex-dependent survival of human offspring explanation in terms of evolutionary optimization. Differential vulnerability to poor conditions is a proximate explanation referring to a potential mechanism, even if unspecified. 6. Zaldivar ME, Lizarralde R and Beckerman S. Unbiased sex ratios among the Bari: an evolutionary interpretation. Hum Ecol 1991;19:469–98. 7. Mace R. Biased parental investment and reproductive success in Gabbra pastoralists. Behav Ecol Sociobiol 1996;38: 75–81. 8. Borgerhoff Mulder M. Brothers and sisters. How sibling supplementary data interactions affect optimal parental allocations. Hum Nat Supplementary data are available at EMPH online. 1998;9:119–61, doi: 10.1007/s12110-998-1001-6. 9. Meij JJ, van Bodegom D, Ziem JB et al. Quality–quantity trade-off of human offspring under adverse environmental funding conditions. J Evol Biol 2009;22:1014–23, doi: 10.1111/ This research was supported by the Netherlands Foundation for the advancements of Tropical Research (WOTRO); the Netherlands Organization for Scientific Research (NWO); the EU-funded Network of Excellence LifeSpan, an unrestricted grant of the Board of the Leiden University Medical Center and the Association Dioraphte. None of these organizations had any role in the design, analysis, interpretation or report of the study. j.1420-9101.2009.01713.x. 10. Meij JJ, de Craen AJ, Agana J et al. Low-cost interventions accelerate epidemiological transition in Upper East Ghana. Trans R Soc Trop Med Hyg 2009;103:173–8, doi: 10.1016/j.trstmh.2008.09.015. 11. Kuningas M, May L, Tamm R et al. Selection for genetic variation inducing pro-inflammatory responses under adverse environmental conditions in a Ghanaian population. PLoS One 2009;4:e7795, doi: 10.1371/journal.pone. 0007795. 12. van Bodegom D, May L, Kuningas M et al. Socio-economic Conflict of interest: none declared. status by rapid appraisal is highly correlated with mortality Funding to pay the Open Access publication charges for this article was provided by the Leyden Academy on Vitality & Ageing. risks in rural Africa. Trans R Soc Trop Med Hyg 2009;103: 795–800, doi: 10.1016/j.trstmh.2008.12.003. 13. Morris SS, Carletto C, Hoddinott J et al. Validity of rapid estimates of household wealth and income for health surveys in rural Africa. J Epidemiol Commun Health 2000;54: 381–7. references 14. Rutstein SO and Johnson K. The DHS Wealth Index. DHS 1. Trivers RL and Willard DE. Natural selection of parental ability to vary the sex ratio of offspring. Science 1973;179: Comparative reports No. 6. Calverton MD: ORC Macro, 2004. 90–2. 2. Cameron EZ. Facultative adjustment of mammalian sex 15. Borgerhoff Mulder M, Bowles S, Hertz T et al. ratios in support of the Trivers–Willard hypothesis: evi- of inequality in small-scale societies. Science 2009;326: dence for a mechanism. Proc Roy Soc Lond B 2004;271: Intergenerational wealth transmission and the dynamics 682–8, doi: 10.1126/science.1178336. 16. Kalben B. Why men die younger: causes of mortality dif- 1723–8, doi: 10.1098/rspb.2004.2773. 3. Cameron EZ and Dalerum FA. Trivers–Willard effect in contemporary humans: male-biased sex ratios among bil- ferences by sex. North Am Actuarial J 2000;4:83–111. 17. Koziel S and Ulijaszek SJ. Waiting for Trivers and Willard: 10.1371/ do the rich really favor sons?. Am J Phys Anthropol 2001; 4. Chacon-Puignau GC and Jaffe K. Sex ratio at birth devi- 18. Quinlan RJ, Quinlan MB and Flinn MV. Local resource lionaires. PLoS One 2009;4:e4195, doi: journal.pone.0004195. 115:71–9, doi: 10.1002/ajpa.1058. ations in modern Venezuela: the Trivers–Willard effect. enhancement Soc Biol 1996;43:257–70. Caribbean community. Curr Anthropol 2008;46:471–80. and sex-biased breastfeeding in a 5. Gaulin SJ and Robbins CJ. Trivers–Willard effect in con- 19. Cronk L. Intention versus behaviour in parental sex pref- temporary North American society. Am J Phys Anthropol erences among the Mukogodo of Kenya. Biosoc Sci 1991; 1991;85:61–9. 23:229–40, doi: 10.1017/S0021932000019246. 45
© Copyright 2025