The Plant Genome Accepted paper, posted 03/16/2015. doi:10.3835/plantgenome2014.09.0059

Genome-Wide Association Study of Agronomic Traits in Common Bean

Kelvin Kamfwa, Karen A. Cichy, and James D. Kelly*

K. Kamfwa and J.D. Kelly, Dep. of Plant, Soil and Microbial Sciences, Michigan State Univ., 1066 Bogue St., East Lansing, MI 48824; K.A. Cichy, USDA-ARS, Sugarbeet and Bean Research Unit, Michigan State Univ., 1066 Bogue St., East Lansing, MI 48824. Received 30 Sept. 2014. *Corresponding author (kellyj@msu.edu).

Abbreviations: ADP, Andean diversity panel; BLAST, Basic local alignment search tool for nucleotide; Bp, base pair; DTF, days to flowering; DTM, days to maturity; GWAS, Genomewide association studies; HI, harvest index; SW, 100 seed weight; Kbp, Kilo base pair; LD, linkage disequilibrium; MAF, minor allele frequency; MLM, mixed linear model; PCA, principal component analysis; PHI, pod harvest index; PN, pod number; Pv, Phaseolus vulgaris; PW, pod weight; QTL, quantitative trait loci; RIL, recombinant inbred line; SN, seed number per plant; SNP, single nucleotide polymorphism.

Abstract

A genome-wide association study (GWAS) using a global Andean diversity panel (ADP) of 237 genotypes of common bean (Phaseolus vulgaris) was conducted to gain insight into the genetic architecture of phenology, biomass, yield components and seed yield traits. The panel was evaluated for two years in field trials in Michigan and genotyped with 5398 single nucleotide polymorphism (SNP) markers. After correcting for population structure and cryptic relatedness, significant SNP markers associated with several agronomic traits were identified. Positional candidate genes, including Phvul.001G221100 on Phaseolus vulgaris (Pv) chromosome 01, associated with days to flowering and maturity were identified.
Significant SNPs for seed yield were identified on Pv03 and Pv09 and co-localized with quantitative trait loci (QTL) for yield from previous studies conducted in several environments and contrasting genetic backgrounds. The majority of germplasm carrying the alleles with positive effects on seed yield was of African origin and largely underutilized in U.S. breeding programs. The study provided insights into the genetic architecture of agronomic traits in Andean beans.

Introduction

By 2050, the projected 9.6 billion people will require 70% more food than current demand, and most of this increase will come from developing countries, mainly in Africa (Alexandratos and Bruinsma, 2012). Climate change will also likely exacerbate food security challenges, especially in tropical and subtropical regions of Africa (Sassi, 2013). To meet this increased global food demand, the productivity of most food crops must increase, especially in Africa where yields are far below their potential (Beebe, 2012; Mueller et al., 2012).

Common bean (Phaseolus vulgaris) is a key commodity for improving food security as it is an inexpensive and major source of protein and nutrients in many African and Latin American countries. It is widely grown and fits well in the low-input agricultural systems practiced in these two regions, where most resource-limited farmers cannot afford inputs such as fertilizers and irrigation (Beebe et al., 2012; Broughton et al., 2003).

Improving seed yield is a major objective of most bean breeding programs (Beaver and Osorno, 2009; Kelly et al., 1998; Vandemark et al., 2014). Steady yield gains have been made in Mesoamerican bean types over the last decades, resulting from both genetic improvement and improved crop management, whereas yield gains in large-seeded Andean beans have been modest (Singh et al., 2007; Vandemark et al., 2014).
Seed yield is a quantitative trait in common bean and is conditioned primarily by three yield components: number of pods per plant, number of seeds per pod and seed weight (Adams, 1967). All three yield components are quantitative in nature and are based on the interaction of physiological and morphological features of the plant (Wallace et al., 1993a). The number of pods per plant and seeds per pod exhibit low heritability, whereas seed size exhibits moderate heritability (Coyne, 1968). Understanding the genetic architecture of yield and its interaction with individual yield components forms a basis for the genetic improvement of seed yield in Andean beans. Identifying genomic regions contributing to yield and its components is essential for marker-assisted selection that could accelerate gains in breeding for yield in Andean beans.

Numerous mapping studies in common bean have reported QTL for yield and yield components on several chromosomes. Koinange et al. (1996) reported QTL for pods per plant on Pv01 and Pv08 in a population of 65 F8 RILs from an inter-gene-pool cross of Midas × G12873 (wild Mesoamerican accession). Tar’an et al. (2002) reported QTL for seed yield on Pv05, Pv09 and Pv11, and for pod number per plant on Pv02, in 145 F4:5 RILs from the OAC Seaforth × OAC 95-4 navy bean cross. Beattie et al. (2003) reported QTL for seed yield on Pv03 and Pv05 in a population of 110 F5:7 RILs from the cross WO3391 × OAC Speedvale. They also reported QTL for pod number per plant on Pv02, Pv03 and Pv05 (Beattie et al., 2003). Blair et al. (2006) reported QTL for seed yield on Pv02, Pv03, Pv04 and Pv09 in an inbred backcross population of 157 BC2F3:5 lines from a cross between ICA Cerinza (cultivated recurrent parent) and G24404 (wild donor parent). In the same population, QTL for pods per plant were identified on Pv07, Pv09 and Pv11 (Blair et al., 2006).
Wright and Kelly (2011) reported QTL for yield on Pv03, Pv05, Pv10 and Pv11 in a population of 96 F4:5 RILs from a black bean cross between Jaguar and 115M. Checa and Blair (2012) identified QTL for seed yield on Pv03, Pv04 and Pv10 in F5:8 RILs from an inter-gene-pool cross of G2333 and G19839. Recently, Mukeshimana et al. (2014) reported QTL for seed yield on Pv03 and Pv09 in a population of 125 F5:7 RILs from an inter-gene-pool cross of SEA5 × CAL96. The limited number of markers and small population sizes used in these studies resulted in QTL with low resolution. As a result, inferences on positional candidate genes associated with the identified QTL were difficult to make.

Advances in common bean genomics such as the sequenced genome (Schmutz et al., 2014) have resulted in the development of high-throughput and efficient genotyping platforms, including the BARCBean6K_3 BeadChip with nearly 6000 SNP markers (Hyten et al., 2010). The availability of the SNP BeadChip has created an opportunity to conduct genome-wide association studies (GWAS) to dissect the genetic architecture of yield and yield components. This analysis allows for the identification of QTL with enhanced resolution because linkage disequilibrium (LD) blocks are smaller in an association panel than in bi-parental mapping populations (Nordborg and Weigel, 2008). Enhanced resolution is critical for making inferences on positional candidate genes. The smaller LD blocks result from historical recombination among the genotypes of a genetically diverse panel, whereas in bi-parental mapping populations the LD blocks are longer because only a few generations of recombination have occurred (Myles et al., 2009; Zhu et al., 2008). At each locus there are potentially several alleles being studied in GWAS (Yu and Buckler, 2006), whereas in bi-parental mapping only the two parental alleles that are segregating will be captured. From an applied perspective, GWAS is a more efficient way to investigate simultaneously the genomic potential and genetic variability in a large collection of germplasm for potential use in breeding programs (Zhao et al., 2011).

Two gene pools, the Andean and the Middle American, have been described in common bean (Gepts, 1998; Koenig and Gepts, 1989). Greater genetic variability exists in the Middle American than in the Andean gene pool (Bitocchi et al., 2013). As a result, more progress in the genetic improvement of several traits, including yield, has been documented in the Middle American gene pool than in the Andean gene pool (Beebe, 2012; Beebe et al., 2001; Kornegay et al., 1992; White et al., 1992; Vandemark et al., 2014). However, moving favorable genes for several agronomic traits from the Mesoamerican into the Andean gene pool has been challenging, especially due to incompatibility and linkage drag (Gepts and Bliss, 1985; Singh and Gutiérrez, 1984). Large-seeded Andean beans are the most popular beans in Africa (Beebe, 2012; Wortmann, 1998), but their yields are lower than those of Middle American beans.

In this study, a global diversity panel of 237 Andean genotypes from several geographic regions where common bean is grown, including Africa, North America, Central America and South America, was studied. A genome-wide association study was conducted to enhance our understanding of the genetic architecture of agronomic traits, including phenological traits, yield components and seed yield, in common bean using the diversity present in the Andean Diversity Panel (ADP).

Materials and Methods

Plant Material

The ADP, comprising 237 genotypes mainly from Africa, North America, Central America and South America, with a few from Europe and Asia, was assembled (Cichy et al., 2015).
The panel contains varieties from public and private breeding programs, elite lines and landraces. These materials were collected from dry bean repositories in the U.S. and from the CIAT collection, and some were collected during country visits to African countries. The panel represents the major Andean seed types and varieties important in Africa and North America.

Field Phenotyping

The ADP was field planted at the Montcalm Research Farm near Entrican, MI, USA in the 2012 and 2013 growing seasons. The farm is located in central Michigan, where Andean beans are commercially produced. The soil type is a combination of Eutric Glossoboralfs (coarse-loamy, mixed) and Alfic Fragiorthods (coarse-loamy, mixed, frigid), and rainfall was supplemented with overhead irrigation as needed. No fertilizer was applied to the plots, and recommended practices were followed for weed and insect control. Soil samples collected from the trial site before planting showed that in the 2012 season the nitrate level in the soil was on average 36 ppm, whereas in 2013 it was 2.4 ppm. Before planting, seed was inoculated with commercial Rhizobium ‘Nodulator’ (Becker Underwood, Ames, IA) with an undisclosed strain at the rate suggested on the package. However, common bean has been grown on this site for many years and there is also adequate native Rhizobium. In both seasons, the panel was planted in a randomized complete block design with two replications. Each genotype was planted in two-row plots, with rows 4.75 m long and an inter-row spacing of 0.50 m. Phenological traits for days to flowering (DTF) and days to maturity (DTM) were collected on all entries in both years. In 2012, three plants were sampled per plot at maturity, and in 2013 six plants were sampled per plot at maturity. The aboveground biomass (BM) of these plants was recorded and all pods were removed, counted, weighed and threshed.
Total seed weight and 100-seed weight (SW) were measured on threshed seed. Biomass (BM), pod number (PN), pod weight (PW), seed number (SN) and seed yield per plant were an average of three (2012 season) or six (2013 season) plants. Pod harvest index (PHI) was calculated by dividing seed weight by the weight of pods that possessed seed (Beebe et al., 2008). Harvest index (HI) was computed as the ratio of seed weight to total biomass. In both years, seed yield per hectare was calculated from yield measured for each plot, and seed weight was adjusted to 16% moisture content.

Genotyping

DNA samples were genotyped using an Illumina BARCBean6K_3 BeadChip with 5398 SNPs (Hyten et al., 2010) as described by Cichy et al. (2015).

Phenotypic Data analyses

Statistical analyses for field data were conducted using mixed models in SAS 9.3 (SAS Institute, 2011). The assumption of normally distributed data required for analysis of variance (ANOVA) and the SNP-trait association test was checked for all traits measured. This was done on the combined residuals of all treatments for each trait using the normality tests in PROC UNIVARIATE. Because the normality tests showed non-normal data for all traits measured in this study, data for all traits were transformed. All the trait means are reported in their original values. An ANOVA using PROC MIXED was conducted on all the traits based on the following statistical model:

$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \gamma_{k(j)} + \varepsilon_{ijk}$$

where $Y_{ijk}$ is the response variable (such as yield) for genotype i in environment (year) j and replication k; $\mu$ is the overall mean; $\alpha_i$ is the fixed effect of genotype i; $\beta_j$ is the random effect of year j; $(\alpha\beta)_{ij}$ is the random effect of the interaction between genotype i and year j; $\gamma_{k(j)}$ is the random effect of replication k within year j; and $\varepsilon_{ijk}$ is the random error term, which is assumed to be normally distributed with mean 0 and variance $\delta^2_e$.
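The two derived indices defined under Field Phenotyping (HI and PHI) are simple ratios, and a minimal Python sketch may make the definitions concrete. The function names, the per-plant values, and the choice to average per-plant ratios within a plot are illustrative assumptions, not details taken from the study.

```python
from statistics import mean


def harvest_index(seed_weight_g, biomass_g):
    """HI: seed weight divided by total aboveground biomass."""
    if biomass_g <= 0:
        raise ValueError("biomass must be positive")
    return seed_weight_g / biomass_g


def pod_harvest_index(seed_weight_g, pod_weight_g):
    """PHI: seed weight divided by the weight of seed-bearing pods."""
    if pod_weight_g <= 0:
        raise ValueError("pod weight must be positive")
    return seed_weight_g / pod_weight_g


# Hypothetical per-plant measurements (grams) for one plot; values are made up.
plants = [
    {"biomass": 60.0, "pod_weight": 40.0, "seed_weight": 30.0},
    {"biomass": 50.0, "pod_weight": 32.0, "seed_weight": 24.0},
    {"biomass": 70.0, "pod_weight": 44.0, "seed_weight": 33.0},
]

# Assumption: plot-level values are the mean of the per-plant ratios.
plot_hi = mean(harvest_index(p["seed_weight"], p["biomass"]) for p in plants)
plot_phi = mean(pod_harvest_index(p["seed_weight"], p["pod_weight"]) for p in plants)

print(f"plot HI = {plot_hi:.3f}, plot PHI = {plot_phi:.3f}")
```

Whether ratios are averaged per plant or computed from plot totals can change the result slightly, so the averaging step above should be read as one possible convention.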
Pearson correlation analysis using PROC CORR was conducted on the average values for the 2012 and 2013 growing seasons.

Population Structure analysis and Marker-Trait Association Tests

To assess the population genetic structure in the panel, the software program STRUCTURE (Pritchard et al., 2000) and Principal Component Analysis (PCA) implemented in the software program EIGENSTRAT (Price et al., 2006) were used. A subset of 89 SNPs not in LD and distributed across the 11 chromosomes was employed for analysis with STRUCTURE. The length of the burn-in period was set to 50000, and the number of Markov Chain Monte Carlo (MCMC) repetitions after burn-in was also set to 50000. The presence of admixture in the population was assumed. The K range was set to 1-10 and the number of replicate runs for each K to five. The ideal number of sub-populations was determined using the Delta K (∆K) method (Evanno et al., 2005) implemented in the software STRUCTURE HARVESTER (Earl and von Holdt, 2012). After filtering for low quality and monomorphic SNPs, 5326 SNPs were retained. These were filtered further for minor allele frequency (MAF>0.02) (Stanton-Geddes et al., 2013), and a final total of 4850 SNPs was used in PCA and association analyses. To correct for cryptic relatedness in the panel, the kinship matrix (K) was included in our association analyses. The kinship matrix was calculated using the Scaled Identity by Descent method in TASSEL 5.0 (Bradbury et al., 2007). To determine the SNP-trait associations, a Mixed Linear Model (MLM) (Yu et al., 2005; Zhang et al., 2010) was implemented in the software program TASSEL.
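The minor allele frequency filter described above can be sketched in a few lines of Python. This is a minimal illustration, assuming biallelic SNPs scored as one allele call per inbred line with `None` for missing data; the SNP names and genotype calls are hypothetical, and the actual filtering in the study was done with its own software pipeline.

```python
from collections import Counter

MAF_THRESHOLD = 0.02  # SNPs are kept only if MAF > 0.02, as stated in the text


def minor_allele_frequency(calls):
    """Frequency of the less common allele among non-missing calls.

    Assumes one allele call per inbred (homozygous) line and a biallelic SNP.
    Returns 0.0 for monomorphic or fully missing SNPs.
    """
    observed = [c for c in calls if c is not None]
    counts = Counter(observed)
    if len(counts) < 2:
        return 0.0
    # Second entry of most_common() is the minor allele for a biallelic SNP.
    minor_count = counts.most_common()[1][1]
    return minor_count / len(observed)


# Hypothetical calls for a panel of 237 lines (made-up data).
genotypes = {
    "snp_a": ["A"] * 232 + ["G"] * 5,  # MAF = 5/237, just above the cutoff
    "snp_b": ["A"] * 233 + ["G"] * 4,  # MAF = 4/237, below the cutoff
    "snp_c": ["T"] * 237,              # monomorphic
}

kept = [name for name, calls in genotypes.items()
        if minor_allele_frequency(calls) > MAF_THRESHOLD]
print(kept)
```

With 237 lines, a cutoff of 0.02 corresponds to requiring the minor allele in at least five lines, which is why `snp_a` passes and `snp_b` does not in this made-up example.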
The following MLM equation was used:

$$Y = X + P + K + \varepsilon$$

where Y is the phenotype of a genotype; X is the fixed effect of the SNP; P is the fixed effect of population structure (from the PCA matrix); K is the random effect of relative kinship, i.e., cryptic relatedness among genotypes (from the kinship matrix); and $\varepsilon$ is the error term, which is assumed to be normally distributed with mean 0 and variance $\delta^2_e$. A Bonferroni-corrected threshold of P=1.0 × 10⁻⁵ (α = 0.05 divided by 4850 SNPs, approximately 1.03 × 10⁻⁵), which is the most conservative approach, was used to determine significance for SNPs. This threshold was used for all traits except DTF and DTM, for which it was relaxed to P=1.0 × 10⁻⁴ to retain SNPs associated with candidate genes.

To gain insights into the positional candidate genes associated with significant SNPs, JBrowse on Phytozome v10 (Goodstein et al., 2012) was used to browse the common bean genome version 1.0 (Schmutz et al., 2014). Positional candidate genes were identified by conducting LD analysis in TASSEL 5.0 for the genomic region surrounding significant SNPs. A gene was considered a positional candidate if the gene contained a significant SNP or the gene contained a SNP that was in LD with a significant SNP. The functional annotation on Phytozome v10 (Goodstein et al., 2012) for the gene was then checked to make inferences about the plausible role of the gene in the control of a trait. For genes with inadequate functional annotation data, genomic sequence data from Phytozome v10 were used in a search against the NCBI and TAIR (Rhee et al., 2003) databases using BLASTN (Zhang et al., 2000).

Results

Phenotypic Traits

Highly significant (P<0.0001) differences existed among the 237 genotypes for all the traits measured in both 2012 and 2013. The means and ranges for the traits measured are presented in Table 1. The means for BM, PW and SN were higher in 2012 than 2013.
As expected, there were several significant correlations among the traits measured (Table 2). Seed weight was negatively correlated with PN and SN (Table 2). Yield per plant was negatively correlated with DTF and DTM and was positively correlated with all other traits. Approximately 26 of the 237 genotypes in the ADP flowered later than 50 days after planting and were considered photoperiod sensitive. Of these, 23 were from Africa, two from South America and one from North America. The negative correlation between DTF and seed yield could be attributed to the presence of these photoperiod-sensitive and late-maturing genotypes in the panel, whose seed filling duration was reduced because of the short growing season in Michigan. Falling temperatures towards the end of the season could have reduced the photo-assimilates accumulated before the end of seed filling. However, these genotypes did reach harvest maturity, and samples were collected and plots harvested for data analysis.

Population Structure

The STRUCTURE (Pritchard et al., 2000) analysis and Evanno test (∆K) indicated a two-subpopulation structure within the 237 ADP genotypes. These two sub-populations are consistent with the Andean and Middle American gene pools. Among the 237 genotypes, 228 were from the Andean gene pool and the remaining nine genotypes were from the Middle American gene pool. Interestingly, sixteen Andean lines displayed between 10 and 40% of their genome as introgressions from the Middle American gene pool. Analysis of population structure with PCA revealed that the first, second and third principal components (PC) accounted for 36.3%, 12.1% and 5.0% of the genotypic variability in the ADP, respectively. A plot of PC1 against PC2 clearly showed three clusters of genotypes (Figure 1). One of these clusters comprised seven Middle American genotypes in the STRUCTURE analysis.
The results of PCA and STRUCTURE are comparable, though the bigger subpopulation of Andean genotypes in the STRUCTURE analysis was split into two clusters in PCA. The smaller of these two clusters comprised 19 Andean genotypes, of which 14 were landraces from East Africa, four were varieties from North America and two were from the Caribbean. The other, bigger Andean cluster comprised genotypes from many geographic regions. The preliminary GWAS analyses showed comparable results when STRUCTURE or PCA results were used as a covariate to account for population structure in the panel. The first three PCs, which together explained 53.4% of the genotypic variability in the ADP, were used as covariates to correct for population structure.

Trait-SNP Associations

Phenological traits

Significant (P<1.0 × 10⁻⁴) SNPs were identified for DTF on Pv01 and Pv08 in 2012 (Figure 2). The most significant (P=6.9 × 10⁻⁶) SNP for DTF in 2012, which explained 9% of the variability in DTF, was located on Pv08 (Table 3). One of the SNPs identified in 2012, ss715646578 on Pv01, narrowly missed the significance threshold (P=5.6 × 10⁻⁴) in 2013. One significant (P=7.4 × 10⁻⁵) SNP was identified on Pv01 in 2013 for DTM. This SNP also explained about 9% of the variation in DTM and was the same SNP associated with DTF (Table 3). No significant associations for DTM were identified in 2012.

Plant Biomass at Maturity

Significant (P<1.0 × 10⁻⁵) SNPs for BM were identified in the 2012 season. SNPs were detected on Pv02 and Pv08 (Figure 3), with the most significant (P=5.2 × 10⁻⁷) SNP on Pv08 explaining 12% of the variation in BM (Table 3). No significant associations for BM were identified in 2013.

Pod Number

Significant (P<1.0 × 10⁻⁵) SNPs for PN were identified in 2013 on Pv05 and Pv07 (Figure 3). The most significant (P=2.2 × 10⁻⁶) SNP on Pv05 explained about 10% of the variation in PN (Table 3).
No significant associations for PN were identified in 2012.

Harvest Index and Pod Harvest Index

Significant SNPs for HI were identified in 2012. The most significant (P=2.9 × 10⁻⁶) SNP was located on Pv03 and explained 12% of the variability for HI in the ADP in 2012. No significant associations were identified in 2013. A significant association for PHI was identified on Pv04 in 2013 (Figure 3). The most significant SNP (P=4.5 × 10⁻⁶), which accounted for 10% of the variability for PHI, was located on Pv04. No significant associations were detected for PHI in 2012.

Pod Weight

Significant SNPs for PW were identified on Pv08 in 2012. The most significant SNP (P=4.3 × 10⁻⁸) accounted for about 14% of the variability in PW (Table 3). In the 2013 season, significant associations for PW were also identified on Pv08. The most significant (P=8.8 × 10⁻⁶) SNP explained about 9% of the variability in PW in 2013 (Table 3).

Seed Number

Significant SNPs for SN were identified in 2013 on Pv03 and Pv05 (Figure 4). The most significant SNP (P=6.7 × 10⁻⁷) was located on Pv03 and accounted for about 13% of the phenotypic variation in SN (Table 3). No significant SNPs for SN were identified in 2012.

Seed Yield

Significant (P<1.0 × 10⁻⁵) SNPs for seed yield were identified on both a per hectare and a per plant basis in 2012. Several significant associations were identified for yield on a per plant basis on Pv08 in 2012. The most significant SNP (P=1.0 × 10⁻⁷) explained about 13% of the variation in seed yield per plant in the panel (Table 3). SNPs significantly associated with seed yield per hectare were identified on Pv03 and Pv09 (Figure 4) in 2012. The most significant (P=4.5 × 10⁻⁷) SNP was located on Pv03 and accounted for 14% of the variability in seed yield per hectare (Table 3). No significant associations were identified for yield on a per plant or per hectare basis in the 2013 season.
The larger positive effect on seed yield for significant SNP ss715646178, with alleles G and T on Pv09, came from the minor allele G (MAF=0.09). The average yield for genotypic class GG at ss715646178 was 1690 kg ha⁻¹, while for TT it was 1561 kg ha⁻¹. For SNP ss715649410 on Pv03, with alleles A and G, the larger positive effect on seed yield was from the minor G allele (MAF=0.12). The average seed yields of genotypic classes GG and AA were 1672 kg ha⁻¹ and 1559 kg ha⁻¹, respectively, for SNP ss715649410. Among the 237 genotypes in the ADP, only 28 and 21 genotypes carried the minor allele for ss715649410 and ss715646178, respectively (Table 4). The geographic distributions of genotypes that carried these alleles with larger positive effect are presented in Table 4. Twenty-one genotypes carried alleles with larger effect at both ss715646178 and ss715649410. The average yield for these 21 genotypes was 1824 kg ha⁻¹. The group of 216 genotypes that did not carry the larger effect allele at both ss715646178 and ss715649410 averaged about 1627 kg ha⁻¹. Clearly, there is a beneficial yield effect of having both alleles with larger effect in a single genotype. Of these 21 genotypes carrying the larger effect allele at both ss715646178 and ss715649410, 12 were from Africa, eight from North America and one from South America. Not all 12 genotypes from Africa were photoperiod sensitive in Michigan. These materials could serve as sources of germplasm in breeding for yield in North American bean breeding programs.

Discussion

Previous QTL studies using bi-parental populations have provided limited insights into the genetic architecture of a number of important agronomic traits of common bean. In this study we used a genome-wide association study approach to expand the genetic information on traits controlling phenology, biomass, yield components and seed yield in order to support breeding efforts directed at improving common beans from the Andean gene pool.

Flowering is an important agronomic trait that is strongly influenced by the environment and is key in the adaptation of common bean genotypes to different geographic locations (Wallace et al., 1993a). In this study, we identified SNPs significantly associated with DTF on Pv01 and Pv08. The QTL on Pv08 was reported previously (Koinange et al., 1996; Pérez-Vega et al., 2010) and the QTL on Pv01 has been widely reported (Blair et al., 2006; Koinange et al., 1996; Mukeshimana et al., 2014; Pérez-Vega et al., 2010). Since previous studies have consistently reported QTL for flowering on Pv01, it is likely to be stable across several environments and genetic backgrounds. Potential positional candidate genes for flowering in the region around significant SNP ss715646578 on Pv01 were investigated. Four genes in LD with ss715646578 were detected. Among these genes, Phvul.001G221100 (Figure 2) was approximately 4.5 Kbp downstream of ss715646578 and in LD with it. The functional annotation on Phytozome indicated that Phvul.001G221100 is a two-component sensor histidine kinase. A BLASTN search of the Phvul.001G221100 genomic sequence against the TAIR database resulted in the best hit to the Arabidopsis thaliana gene phyA, which codes for phytochrome A. Phytochrome A is a photoreceptor pigment reported to control photoperiod sensitivity in Arabidopsis (Reed et al., 1994). A BLASTN search of the Phvul.001G221100 genomic sequence against NCBI data resulted in a best hit to the gene GmPhyA3 in Glycine max. GmPhyA3 has been cloned and characterized as contributing to the complex flowering response and maturity systems in soybean (Watanabe et al., 2009). Apparently, this gene is conserved in P. vulgaris, G. max and A.
thaliana and appears to retain similar functions in photoperiod sensitivity, flowering and maturity in these three species. Based on GWAS results and comparative genomics, Phvul.001G221100 is a strong candidate as the gene on Pv01 controlling photoperiod sensitivity and flowering in common bean. In P. vulgaris the locus for photoperiod sensitivity (Ppd; Wallace et al., 1993b) was previously mapped to Pv01 (Koinange et al., 1996; Gu et al., 1998). Due to differences in the marker technologies used and the large confidence intervals for the QTL reported in previous studies, it is difficult to ascertain whether the previously identified Ppd QTL co-localizes with candidate gene Phvul.001G221100. However, the ss715646578 SNP on Pv01 is polymorphic between the original photoperiod-sensitive Redkote (A allele) and photoperiod-neutral Redkloud (G allele) cultivars in which the Ppd gene was first identified by Wallace et al. (1993b). Photoperiod-sensitive genotypes (Ppd) flower later in extended daylight environments, and the phenomenon is more common in the Andean gene pool (Kornegay et al., 1993). A significant number of genotypes (26 of 237) in the ADP were photoperiod sensitive in Michigan due to the expression of the Ppd gene under the long-day conditions during the growing season.

Days to maturity is critical for adaptation to geographic areas with shorter growing seasons and to tropical regions with short rainy seasons. We identified significant SNPs for maturity on Pv01. Previous studies have also reported a QTL for maturity on Pv01 (Koinange et al., 1996; Mukeshimana et al., 2014; Pérez-Vega et al., 2010). In this study the significant SNP ss715646578 on Pv01 for DTF in 2012 was the same significant SNP for maturity in 2013 (Figure 3). Co-localization of DTF and DTM QTL in common bean has been reported previously (Koinange et al., 1996).
This may suggest that SNP ss715646578 is associated with a gene that has a pleiotropic effect on flowering and maturity. Alternatively, this SNP may be in LD with two different genes controlling these two traits.

To gain insights into how selection for flowering and maturity in different geographic regions has affected the allele frequencies of the ss715646578 SNP that is in LD with Phvul.001G221100, we investigated allele frequencies of all significant SNPs. The MAF for SNP ss48340819, which is significantly associated with flowering and maturity, was the highest (MAF=0.36) among all significant SNPs for all traits measured (Table 3). The higher MAF for flowering and maturity than for other traits measured in this study, including seed yield, is due to differences in selection mode. Most materials from Africa flowered and matured later than materials from North America. This could be a reflection of the emphasis placed on breeding for earliness in North America due to the shorter growing season when compared to the longer growing season in Africa (Beebe, 2012). The higher MAF could have resulted in spatial variation in flowering fitness optimum and the frequency of alleles carried on the SNP ss48340819. Because of the significant representation of both late and early flowering genotypes carrying contrasting alleles at SNP ss48340819, the MAF is expected to be larger. During selection for maturity, breeders rarely select phenotypes with extremes in maturity, in contrast to selecting for yield, where phenotypes with extremely high yield potential are sought. Extreme phenotypes are always few and are caused by rare alleles. As a result, the frequency of minor alleles at loci for yield would be lower than at DTF and DTM loci.
Though the QTL for flowering on Pv01 has been widely reported, this is the first report in which a QTL for flowering was resolved to a much smaller genomic region that could facilitate the identification of candidate gene(s). A candidate gene for flowering and maturity was identified through GWAS and comparative genomics enabled by the newly released genome for common bean. We have demonstrated how useful the sequenced P. vulgaris genome will be in advancing the knowledge of the candidate genes underlying important QTL.

Means for BM, PN and seed yield per plant were higher in 2012 than 2013. This could be attributed to higher soil nitrogen available at the 2012 site (Nitrate=36 ppm) than the 2013 site (Nitrate=2.4 ppm) at the time of planting. This higher soil nitrogen could have benefited the plants in 2012, especially in early growth stages when there was little nitrogen fixation by the plant.

Significant correlations were observed for most of the traits measured with seed yield among the 237 genotypes in the ADP. This was expected, as most of these traits are inter-related and are determinants of seed yield. All the traits measured in this study can essentially be categorized into three groups: aerial biomass (BM, PW, and PN), phenology (DTF and DTM) and seed yield (seeds per plant, yield per hectare); HI and PHI are computed from these factors. Seed weight was negatively correlated with PN and SN. This could indicate compensation among yield components, which has been previously reported (Adams, 1967). Significant correlations between phenological traits, yield components, aerial biomass at flowering and seed yield have been reported previously (Scully et al., 1991). Both DTF and DTM were negatively correlated with yield (Table 2).
This could be attributed to the photoperiod sensitivity of a significant number of genotypes in the ADP, which was expressed under the long day length in Michigan. Photoperiod sensitive genotypes flowered and matured later. Therefore, they had an extended vegetative growth stage and accumulated more biomass than the photoperiod insensitive genotypes. In addition, many of these genotypes were inefficient in partitioning assimilates to the seeds, resulting in lower yields. It is probable that if the panel were evaluated in a tropical environment in East Africa, where most of the photoperiod sensitive materials are adapted and grown, the correlations of yield with DTF and DTM would be positive.

In 2012, highly significant SNPs on Pv08 were identified that were associated with BM, PW and yield per plant. Plant biomass was significantly correlated with PW and yield per plant in the correlation analyses (Table 2). The association of these SNPs with more than one trait could be due to pleiotropy or to linked genes that reside in the same LD block and are tagged by the same SNPs. Since pods were part of BM in our measurements, pleiotropy between BM and PW cannot be considered. However, pleiotropy is plausible between yield and the two aerial biomass components (BM and PW). Whereas linkage can be demonstrated by using a population that captures more recombination events in the genomic region where the significant SNPs for more than two traits reside, pleiotropy is difficult to prove. From a plant breeding perspective, whether pleiotropy or linkage underlies the association of the same SNPs with BM, PW and yield per plant has little practical consequence, because these SNPs have positive effects on all three traits.
The significant associations for BM and yield on Pv08 reinforce prior research indicating that selecting for three major physiological components of yield, i.e., BM, HI and DTF (in adapted genotypes), should result in an increase in seed yield in common bean (Wallace et al., 1993a).

Significant SNPs for HI were identified on Pv03 in 2012 (Figure 4). The two most significant SNPs on Pv03, ss715639243 for HI and ss715648538 for seed yield per hectare (Table 3), were in strong LD (r²=1; D’=1). This may suggest that these SNPs were in LD with a pleiotropic gene for HI and seed yield (Wallace et al., 1993a). Alternatively, ss715639243 and ss715648538 could have been in LD with linked genes for HI and seed yield.

Pod number is a major yield component with a significant contribution to seed yield per plant (Adams, 1967). In this study, significant SNPs were identified for PN on Pv05 and Pv07 in the 2013 season. QTL for PN have been reported previously on Pv05 (Beattie et al., 2003) and Pv07 (Blair et al., 2006). Two significant SNPs on Pv05 in 2013, ss715649615 for PN and ss715650235 for SN, were in LD (r²=0.2; D’=1). This may suggest that these SNPs could be in LD with a pleiotropic gene or with linked genes for these two traits.

Significant SNPs for SN were identified on Pv03 and Pv05 (Table 3). Significant SNPs for both SN in 2013 and seed yield in 2012 were identified on Pv03. LD analysis for the entire Pv03 indicated that the two most significant SNPs, ss715639901 for SN and ss715648538 for seed yield (Table 3), were in strong LD (r²=1; D’=1). Number of seeds per plant and seed yield are closely inter-related and, as noted earlier, could be collapsed into a single category of yield.
This could explain the significant associations on the same chromosome and the strong LD of significant SNPs for these two traits.

Several significant SNPs were identified on Pv03 and Pv09 for seed yield per hectare and on Pv08 for yield per plant in the 2012 season. There are several reports of QTL for seed yield, and some of these are consistent with our results. Seed yield QTL were identified on Pv03 (Blair et al., 2006; Checa and Blair, 2012; Mukeshimana et al., 2014; Wright and Kelly, 2011) and on Pv09 (Blair et al., 2006; Mukeshimana et al., 2014; Tar'an et al., 2002). The QTL SY3.3SC for seed yield identified by Mukeshimana et al. (2014) had a marker interval of ss715640477-ss715649325 that contained three SNPs. LD analysis between these SNPs and the significant (P=7.8 × 10⁻⁶) SNP ss715649410 for seed yield in the current study indicated that two of the three SNPs were in LD (r²>0.6; D’>0.9) with ss715649410. One of these three SNPs in the SY3.3SC interval was in strong LD (r²=0.9; D’=1) with the most significant (P=4.5 × 10⁻⁷) SNP ss715648538 for seed yield in the current study. Another QTL for seed yield identified by Mukeshimana et al. (2014) on Pv03 was in the marker interval ss715646941-ss715648035, which contained 19 SNPs. Eight of these 19 SNPs were in LD (r²>0.5; D’>0.8) with the significant SNP ss715649410 in the current study. These results suggest that the gene(s) underlying the QTL for seed yield identified by Mukeshimana et al. (2014) are the same ones in LD with the significant SNP ss715649410. Five different studies with very diverse populations, including the current study, have consistently reported seed yield QTL on Pv03, and four studies have reported seed yield QTL on Pv09. If these QTL are stable and expressed in diverse genetic backgrounds, they could be used as potential candidates for marker-assisted breeding for seed yield.
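The pairwise LD statistics reported in the preceding paragraphs (r² and D’) can be illustrated with a short calculation. The sketch below is an assumption-laden example rather than the software used for the LD analysis in this study: the function name `ld_statistics` and the input format (one allele per haplotype, which is a reasonable stand-in for fully inbred lines) are hypothetical, and the toy alleles are invented.

```python
def ld_statistics(alleles_a, alleles_b):
    """Return (D, D_prime, r2) for two biallelic loci.

    Each list holds one allele per haplotype, in the same haplotype order.
    Both loci are assumed to be biallelic; this is not checked here.
    """
    if len(alleles_a) != len(alleles_b) or not alleles_a:
        raise ValueError("need two non-empty lists of equal length")

    n = len(alleles_a)
    ref_a, ref_b = alleles_a[0], alleles_b[0]  # arbitrary reference alleles
    p_a = sum(a == ref_a for a in alleles_a) / n
    p_b = sum(b == ref_b for b in alleles_b) / n
    p_ab = sum(
        a == ref_a and b == ref_b for a, b in zip(alleles_a, alleles_b)
    ) / n

    variance_product = p_a * (1 - p_a) * p_b * (1 - p_b)
    if variance_product == 0:
        raise ValueError("both loci must be polymorphic")

    # D is the deviation of the observed haplotype frequency from the
    # frequency expected if the two loci were independent.
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    return d, abs(d) / d_max, d * d / variance_product


# Hypothetical data for 100 inbred lines: the rare allele at locus A (T, 3
# lines) always occurs together with the less rare allele at locus B (G, 13 lines).
locus_a = ["T"] * 3 + ["C"] * 97
locus_b = ["G"] * 13 + ["A"] * 87
d, d_prime, r2 = ld_statistics(locus_a, locus_b)
print(round(d_prime, 2), round(r2, 2))
```

With these invented data D’ is approximately 1 while r² is only about 0.21, because one of the four possible haplotypes is absent but the two minor alleles differ in frequency. The toy frequencies (0.03 and 0.13) mirror the minor allele frequencies listed in Table 3 for ss715649615 and ss715650235, but the haplotype arrangement itself is an assumption; the example only shows that a result such as r²=0.2 with D’=1 is arithmetically coherent.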
The minor alleles with a larger positive effect on seed yield at two significant SNPs, ss715649410 (P=7.6 × 10⁻⁶) on Pv03 and ss715646178 (P=1.9 × 10⁻⁶) on Pv09, were geographically widespread (Table 4). This may indicate the potential of this ADP as a source of germplasm from different countries with favorable rare alleles that could be used to breed for increased seed yield. Genotypes from other countries carrying alleles with a positive effect on seed yield could also be used to introduce new genetic variability into breeding programs. This could play a significant role in increasing gains in breeding for yield in Andean beans, where gains have been only modest compared with other market classes because of a lack of depth in genetic variability (Vandemark et al., 2014).

Since yield is a cumulative and complex trait (Kelly et al., 1998), many genes contribute to it, each with small but cumulative effects that are strongly influenced by environmental factors including weather and management. The fact that we identified only a few SNPs associated with yield does not mean that these were the only genetic determinants of yield in the respective years; rather, it indicates that we may have missed several loci with smaller contributions to yield. Because of the limited size of the ADP, the current study had sufficient power only to identify polymorphic loci with large effects on seed yield. Based on simulations, over 1,000 genotypes would be needed in GWAS to detect a greater number of genes with smaller effects, including those with effects as low as 5% (Yan et al., 2011).

Most of the traits measured in this study had few significant SNPs. In addition, most SNPs were significant in one year only. There are two plausible reasons for this.
First, the stringent significance level imposed by the conservative Bonferroni correction excluded several SNPs that would be significant under a less stringent threshold. Second, most of the agronomic traits measured in this study tend to be significantly affected by the environment, resulting in a significant genotype by environment interaction that could have confounded the identification of the same significant SNPs in both years. Given the genetic complexity of seed yield and its strong interaction with the environment, further evaluation of the ADP in several environments would help to validate the QTL identified in the current study and to assess their stability across environments.

The proportion of the phenotypic variation explained by our significant SNPs is lower than previously reported values. It is plausible that in some previously reported QTL, the R² values for yield and yield components were inflated because of small population sizes and limited marker density (Bernardo, 2008). The R² values reported in this study, which ranged from 9% to 14%, are consistent with the genetic complexity of traits such as yield that are controlled by several genes with small but cumulative effects.

Conclusions

This study has demonstrated the effectiveness of GWAS in identifying QTL for important agronomic traits of common bean with enhanced resolution. A substantial number of the QTL for agronomic traits identified in this study are consistent with QTL identified in previous studies that used diverse populations for bi-parental linkage mapping with low marker resolution. Furthermore, we identified novel QTL for several agronomic traits, which resulted in the identification of candidate genes for days to flowering and maturity. Given the size of the panel, this study had insufficient power to identify QTL with smaller effects for the traits measured.
We identified QTL, some of which could potentially be used as candidates for marker-assisted selection to accelerate gains in breeding for seed yield. Future studies using populations segregating at the significant SNP loci may be necessary to validate the QTL identified for yield and determine their usefulness in breeding. Our study provides more insights into the genetic architecture of important agronomic traits contributing to yield of common bean.

Acknowledgements

Research was supported by the Borlaug LEAP program, USDA-ARS, and was also made possible through support provided by the Feed the Future Innovation Lab for Collaborative Research on Grain Legumes by the Bureau for Economic Growth, Agriculture, and Trade, U.S. Agency for International Development, under the terms of Cooperative Agreement No. EDH-A00-07-00005-00. This work was also supported in part by funding from the Norman Borlaug Commemorative Research Initiative (U.S. Agency for International Development). The opinions expressed in this publication are those of the authors and do not necessarily reflect the views of the U.S. Agency for International Development or the U.S. Government. We also thank Dr. Zixang Wen for his helpful comments on some aspects of data analyses.

References

Adams, M.W. 1967. Basis of yield component compensation in crop plants with special reference to the field bean, Phaseolus vulgaris. Crop Sci. 7: 505-510.

Alexandratos, N., and J. Bruinsma. 2012. World agriculture towards 2030/2050: the 2012 revision. ESA Working paper No. 12-03. Rome, FAO.

Beattie, A.D., J. Larsen, T.E. Michaels, and K.P. Pauls. 2003. Mapping quantitative trait loci for a common bean (Phaseolus vulgaris L.) ideotype. Genome 46: 411-422.

Beaver, J.S., and J.M. Osorno. 2009. Achievements and limitations of contemporary common bean breeding using conventional and molecular approaches. Euphytica 168: 145-175.

Beebe, S. 2012. Common bean breeding in the tropics. Plant Breed. Rev. 36: 357-426.

Beebe, S., J. Rengifo, E. Gaitan, M.C. Duque, and J. Tohme. 2001. Diversity and origin of Andean landraces of common bean. Crop Sci. 41: 854-862.

Beebe, S., I.M. Rao, C. Cajiao, and M. Grajales. 2008. Selection for drought resistance in common bean also improves yield in phosphorus limited and favorable environments. Crop Sci. 48: 582-592.

Beebe, S., I.M. Rao, C. Mukankusi, and R. Buruchara. 2012. Improving resource use efficiency and reducing risk of common bean production in Africa, Latin America and the Caribbean. Eco-efficiency: From vision to reality. CIAT, Cali, Colombia.

Bernardo, R. 2008. Molecular markers and selection for complex traits in plants: learning from the last 20 years. Crop Sci. 48: 1649-1664.

Bitocchi, E., E. Bellucci, A. Giardini, D. Rau, M. Rodriguez, E. Biagetti, R. Santilocchi, P.S. Zeuli, T. Gioia, G. Logozzo, G. Attene, L. Nanni, and R. Papa. 2013. Molecular analysis of the parallel domestication of the common bean (Phaseolus vulgaris) in Mesoamerica and the Andes. New Phytologist 197: 300-313.

Blair, M.W., G. Iriarte, and S. Beebe. 2006. QTL analysis of yield traits in an advanced backcross population derived from a cultivated Andean × wild common bean (Phaseolus vulgaris L.) cross. Theor. Appl. Genet. 112: 1149-1163.

Bradbury, P.J., Z. Zhang, D.E. Kroon, T.M. Casstevens, Y. Ramdoss, and E.S. Buckler. 2007. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics 23: 2633-2635.

Broughton, W.J., G. Hernandez, M.W. Blair, S. Beebe, P. Gepts, and J. Vanderleyden. 2003. Beans (Phaseolus spp.) - model food legumes. Plant and Soil 252: 55-128.

Checa, O.E., and W.M. Blair. 2012. Inheritance of yield-related traits in climbing beans (Phaseolus vulgaris L.). Crop Sci. 52: 1998-2013.

Cichy, K.A., T. Porch, J.S. Beaver, P.B. Cregan, D. Fourie, R. Glahn, M. Grusak, K. Kamfwa, D. Katuuramu, P. McClean, E. Mndolwa, S. Nchimbi-Msolla, M.A. Pastor-Corrales, and P.N. Miklas. 2015. A Phaseolus vulgaris diversity panel for Andean bean improvement. Crop Sci. 55: (on line).

Coyne, D.P. 1968. Correlation, heritability and selection of yield components in field beans, Phaseolus vulgaris L. Proc. Amer. Soc. Hort. Sci. 93: 388-396.

Earl, D., and B. vonHoldt. 2012. STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conservation Genet. Resour. 4: 359-361.

Evanno, G., S. Regnaut, and J. Goudet. 2005. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol. Ecology 14: 2611-2620.

Gepts, P. 1998. Origin and evolution of common bean: past events and recent trends. HortScience 33: 1124-1130.

Gepts, P., and F.A. Bliss. 1985. F1 hybrid weakness in the common bean. Differential geographic origin suggest two gene pools in cultivated bean germplasm. J. Hered. 76: 447-450.

Goodstein, D.M., S. Shu, R. Howson, R. Neupane, R.D. Hayes, J. Fazo, T. Mitros, W. Dirks, U. Hellsten, N. Putnam, and D.S. Rokhsar. 2012. Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res. 40: 1178-1186.

Gu, W.K., J.Q. Zhu, D.H. Wallace, S.P. Singh, and N.F. Weeden. 1998. Analysis of genes controlling photoperiod sensitivity in common bean using DNA markers. Euphytica 102: 125-132.

Hyten, D.L., Q. Song, E.W. Fickus, C.V. Quigley, J.S. Lim, I.Y. Choi, E.Y. Hwang, M.A. Pastor-Corrales, and P.B. Cregan. 2010. High throughput SNP discovery and assay development in common bean. BMC Genomics 11: 475.

Kelly, J.D., J.M. Kolkman, and K. Schneider. 1998. Breeding for yield in dry bean (Phaseolus vulgaris L.). Euphytica 102: 343-356.

Koenig, R., and P. Gepts. 1989. Allozyme diversity in wild Phaseolus vulgaris: further evidence for two major centers of genetic diversity. Theor. Appl. Genet. 78: 809-817.

Koinange, E.M., S.P. Singh, and P. Gepts. 1996. Genetic control of the domestication syndrome in common bean. Crop Sci. 36: 1037-1045.

Kornegay, J., J.W. White, and O.O. de la Cruz. 1992. Growth habit and gene pool effects on inheritance of yield in common bean. Euphytica 62: 171-180.

Kornegay, J., J.W. White, J.R. Dominguez, G. Tejado, and C. Cajiao. 1993. Inheritance of photoperiod response in Andean and Mesoamerican common bean. Crop Sci. 33: 977-984.

Mueller, N.D., J.S. Gerber, M. Johnston, D.K. Ray, N. Ramankutty, and J.A. Foley. 2012. Closing yield gaps through nutrient and water management. Nature 490: 254-257.

Mukeshimana, G., L. Butare, P.B. Cregan, M.W. Blair, and J.D. Kelly. 2014. Quantitative trait loci associated with drought tolerance in common bean. Crop Sci. 54: 923-938.

Myles, S., J. Peiffer, P. Brown, E. Ersoz, Z. Zhang, D. Costich, and E.S. Buckler. 2009. Association mapping: critical considerations shift from genotyping to experimental design. Plant Cell 21: 2194-2202.

Nordborg, M., and D. Weigel. 2008. Next-generation genetics in plants. Nature 456: 720-723.

Pérez-Vega, E., A. Pañeda, C. Rodríguez-Suárez, A. Campa, R. Giraldez, and J.J. Ferreira. 2010. Mapping of QTLs for morpho-agronomic and seed quality traits in a RIL population of common bean (Phaseolus vulgaris L.). Theor. Appl. Genet. 120: 1367-1380.

Price, A., N. Patterson, R. Plenge, M. Weinblatt, N. Shadick, and D. Reich. 2006. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38: 904-909.

Pritchard, J., M. Stephens, and P. Donnelly. 2000. Inference of population structure using multilocus genotype data. Genetics 155: 945-959.

Reed, J.W., A. Nagatani, T.D. Elich, M. Fagan, and J. Chory. 1994. Phytochrome A and phytochrome B have overlapping but distinct functions in Arabidopsis development. Plant Physiol. 104: 1139-1149.

Rhee, S.Y., W. Beavis, T.Z. Berardini, G. Chen, D. Dixon, A. Doyle, et al. 2003. The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucleic Acids Res. 31: 224-228.

SAS Institute. 2011. SAS version 9.3. SAS Institute Inc., Cary, NC.

Sassi, M. 2013. Impact of climate change and international prices uncertainty on the Sudanese sorghum market: A stochastic approach. Intern. Adv. Econ. Res. 19: 19-32.

Schmutz, J., P. McClean, S. Mamidi, G.A. Wu, S.B. Cannon, J. Grimwood, J. Jenkins, S. Shu, Q. Song, C. Chavarro, M. Torres-Torres, V. Geffroy, S.M. Moghaddam, D. Gao, B. Abernathy, K. Barry, M. Blair, M.A. Brick, M. Chovatia, P. Gepts, D.M. Goodstein, M. Gonzales, U. Hellsten, D.L. Hyten, G. Jia, J.D. Kelly, D. Kudrna, R. Lee, M.M.S. Richard, P.N. Miklas, J.M. Osorno, J. Rodrigues, V. Thareau, C.A. Urrea, M. Wan, Y. Yu, M. Zhang, R.A. Wing, P.B. Cregan, D.S. Rokhsar, and S.A. Jackson. 2014. A reference genome for common bean and genome-wide analysis of dual domestications. Nat. Genet. 46: 707-713.

Scully, B., D.H. Wallace, and D. Viands. 1991. Heritability and correlation of biomass, growth rates, harvest index, and phenology to the yield of common beans. J. Amer. Soc. Hort. Sci. 116: 127-130.

Singh, S.P., and J.A. Gutiérrez. 1984. Geographical distribution of the DL1 and DL2 genes causing hybrid dwarfism in Phaseolus vulgaris L., their association with seed size, and their significance to breeding. Euphytica 33: 337-345.

Singh, S.P., H. Terán, M. Lema, D.M. Webster, C.A. Strausbaugh, P.N. Miklas, H.F. Schwartz, and M.A. Brick. 2007. Seventy-five years of breeding dry bean of the Western USA. Crop Sci. 47: 981-989.

Stanton-Geddes, J., T. Paape, B. Epstein, R. Briskine, J. Yoder, J. Mudge, A.K. Bharti, A.D. Farmer, P. Zhou, R. Denny, G.D. May, S. Erlandson, M. Yakub, M. Sugawara, M.J. Sadowsky, N.D. Young, and P. Tiffin. 2013. Candidate genes and genetic architecture of symbiotic and agronomic traits revealed by whole-genome, sequence-based association genetics in Medicago truncatula. PLoS ONE 8(5): e65688.

Tar'an, B., T.E. Michaels, and K.P. Pauls. 2002. Genetic mapping of agronomic traits in common bean. Crop Sci. 42: 544-556.

Vandemark, G.J., M.A. Brick, J.M. Osorno, J.D. Kelly, and C.A. Urrea. 2014. Edible Grain Legumes. In: S. Smith, B. Diers, J. Specht, and B. Carver, editors, Yield Gains in Major U.S. Field Crops. American Society of Agronomy, Inc., Crop Science Society of America, Inc., and Soil Science Society of America, Inc. p. 87-124.

Wallace, D.H., J.P. Baudoin, J.S. Beaver, D.P. Coyne, D.E. Halseth, P.N. Masaya, H.M. Munger, J.R. Myers, M. Silbernagel, K.S. Yourstone, and R.W. Zobel. 1993a. Improving efficiency of breeding for higher crop yield. Theor. Appl. Genet. 86: 27-40.

Wallace, D.H., K.S. Yourstone, P.N. Masaya, and R.W. Zobel. 1993b. Photoperiod gene control over partitioning between reproductive and vegetative growth. Theor. Appl. Genet. 86: 616.

Watanabe, S., R. Hideshima, Z. Xia, Y. Tsubokura, S. Sato, Y. Nakamoto, N. Yamanaka, R. Takahashi, M. Ishimoto, T. Anai, S. Tabata, and K. Harada. 2009. Map-based cloning of the gene associated with the soybean maturity locus E3. Genetics 182: 1251-1262.

White, W.J., J. Kornegay, J. Castillo, C. Molano, C. Cajiao, and G. Tejada. 1992. Effect of growth habit on yield of large-seeded bush cultivars of common bean. Field Crops Res. 29: 151-161.

Wortmann, C.S. 1998. Atlas of common bean (Phaseolus vulgaris L.) production in Africa. CIAT, Cali, Colombia.

Wright, E.M., and J.D. Kelly. 2011. Mapping QTL for seed yield and canning quality following processing of black bean (Phaseolus vulgaris L.). Euphytica 179: 471-484.

Yan, J., M. Warburton, and J. Crouch. 2011. Association mapping for enhancing maize (Zea mays L.) genetic improvement. Crop Sci. 51: 433-449.

Yu, J., and E.S. Buckler. 2006. Genetic association mapping and genome organization of maize. Curr. Opin. Biotechnol. 17: 155-160.

Yu, J., G. Pressoir, W.H. Briggs, I.V. Bi, M. Yamasaki, J.F. Doebley, M.D. McMullen, B.S. Gaut, D.M. Nielsen, J.B. Holland, S. Kresovich, and E.S. Buckler. 2005. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat. Genet. 38: 203-208.

Zhang, Z., S. Schwartz, L. Wagner, and W. Miller. 2000. A greedy algorithm for aligning DNA sequences. J. Comput. Biol. 7: 203-214.

Zhang, Z., E. Ersoz, C.-Q. Lai, R.J. Todhunter, H.K. Tiwari, M.A. Gore, P.J. Bradbury, J. Yu, D.K. Arnett, J.M. Ordovas, and E.S. Buckler. 2010. Mixed linear model approach adapted for genome-wide association studies. Nat. Genet. 42: 355-360.

Zhao, K., C.-W. Tung, G.C. Eizenga, M.H. Wright, M.L. Ali, A.H. Price, G.J. Norton, M.R. Islam, A. Reynolds, J. Mezey, A.M. McClung, C.D. Bustamante, and S.R. McCouch. 2011. Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nat. Commun. 2: 467.

Zhu, C., M. Gore, E.S. Buckler, and J. Yu. 2008. Status and prospects of association mapping in plants. The Plant Genome 1: 5-20.
Captions for the Figures

Figure 1. Principal Component Analysis (PCA) plot of PC1 against PC2 illustrating the population structure in the ADP. The cluster of blue triangles represents the seven Middle American bean genotypes, while the red symbols represent the 237 Andean genotypes in two separate clusters.

Figure 2. Manhattan plots showing the same candidate SNP for both flowering in 2012 and maturity in 2013. The model of candidate gene Phvul.001G221100 associated with the significant SNP on Pv01 is shown below.

Figure 3. Manhattan plots showing candidate SNPs and their P-values from GWAS using MLM for Pod Harvest Index (PHI_13) on Pv03 in 2013, pod number (PN_13) on Pv05 and Pv07 in 2013, biomass (BM_12) on Pv02 and Pv08 in 2012 and pod weight (PW_12) on Pv08 in 2012 and number of pods per plant for 2013 season. Red line is the significance threshold of P=1.03 × 10⁻⁵ after Bonferroni correction of α = 0.05.

Figure 4. Manhattan plots showing candidate SNPs and their P-values from GWAS using MLM for seed yield (kg ha⁻¹) on Pv03 and Pv09, and HI on Pv03 in 2012. Red line is the significance threshold of P=1.03 × 10⁻⁵ after Bonferroni correction of α = 0.05.

Table 1. Means and ranges for ten agronomic traits measured on 237 common bean genotypes in the Andean Diversity Panel (ADP) grown at the Montcalm Research Farm, Entrican, MI in 2012 and 2013.

ADP (n=237 genotypes)
Trait                       Year   Mean†        Min.‡   Max.‡
Days to Flowering           2012   43.4±0.3     28.0    69.0
Days to Flowering           2013   44.7±0.3     34.0    60.0
Days to Maturity            2012   91.1±0.4     75.0    115.0
Days to Maturity            2013   89.2±0.3     73.0    113.0
Biomass per Plant (g)       2012   32.8±0.6     10.8    96.7
Biomass per Plant (g)       2013   25.5±0.3     12.3    48.1
Hundred Seed Weight (g)     2012   44.2±0.4     17.4    68.8
Hundred Seed Weight (g)     2013   45.2±0.5     16.1    70.3
Pod Number per Plant        2012   11.0±0.2     3.3     28.0
Pod Number per Plant        2013   9.2±0.1      4.0     20.7
Harvest Index               2012   0.45±0       0.18    0.65
Harvest Index               2013   0.50±0       0.26    0.76
Pod Harvest Index           2012   0.70±0       0.23    0.84
Pod Harvest Index           2013   0.73±0       0.40    0.83
Pod Weight per Plant (g)    2012   21.1±0.5     5.0     59.8
Pod Weight per Plant (g)    2013   17.8±0.2     4.9     64.3
Seeds per Plant             2012   32.8±0.2     9.5     92.0
Seeds per Plant             2013   29.4±0.4     11.5    69.2
Seed Yield per Plant (g)    2012   14.8±0.3     3.7     38.5
Seed Yield per Plant (g)    2013   12.9±0.2     3.4     25.8
Seed Yield (kg ha⁻¹)        2012   1599±26.0    485     3689
Seed Yield (kg ha⁻¹)        2013   1647±31.5    136     3845
† Mean ± Standard Error of the Mean; ‡ Max and Min represent the maximum and minimum range for a trait

Table 2. Pearson correlation coefficients among ten agronomic traits measured on 237 common bean genotypes in the Andean Diversity Panel (ADP) grown at the Montcalm Research Farm, Entrican, MI in 2012 and 2013.

                    Pod          Pod          Seed         Seed         Seed Yield/  Pod Harvest  Harvest      Days to      Days to      Seed
Traits              Weight       Number       Number       Weight       Plant        Index        Index        Flowering    Maturity     Yield
Biomass             0.87***      0.68***      0.62***      0.24***      0.87***      0.12**       -0.26***     0.17**       0.19***      0.25***
Pod Weight          -            0.72***      0.61***      0.39***      0.96***      0.07ns       0.61***      -0.27***     -0.22***     0.37***
Pod Number          -            -            0.81***      -0.17**      0.62***      0.15**       0.39***      -0.1*        -0.13**      0.17**
Seed Number         -            -            -            -0.38***     0.65***      0.29***      0.36***      0.14**       0.04ns       0.07ns
Seed Weight         -            -            -            -            0.34***      -0.13**      0.32***      -0.44***     -0.27***     0.36***
Seed Yield/Plant    -            -            -            -            -            0.31***      0.68***      -0.21***     -0.12*       0.36***
Pod Harvest Index   -            -            -            -            -            -            0.41***      0.15**       0.18**       0.06ns
Harvest Index       -            -            -            -            -            -            -            -0.37***     -0.39***     0.46***
Days to Flowering   -            -            -            -            -            -            -            -            0.70***      -0.33***
Days to Maturity    -            -            -            -            -            -            -            -            -            -0.37***
* Significant at α=0.05; ** Significant at α=0.01; *** Significant at α=0.001

Table 3. Chromosome, position, p-values, proportion of phenotypic variation explained (R²) and minor allele frequency of two most significant SNPs for ten agronomic traits measured on 237 common bean genotypes in the Andean Diversity Panel (ADP) grown at the Montcalm Research Farm, Entrican, MI in 2012 and 2013.

Trait               Year   SNP†          Chromosome   SNP Position   P-value‡   R²§    Minor Allele Frequency
Days to Flowering   2012   ss715646088   Pv08         57734680       6.9E-06    0.09   0.15
Days to Flowering   2012   ss715646578   Pv01         48340819       1.1E-05    0.10   0.37
Days to Maturity    2013   ss715646578   Pv01         48340819       7.4E-05    0.09   0.37
Biomass             2012   ss715639408   Pv08         5150618        5.2E-07    0.12   0.13
Biomass             2012   ss715647433   Pv02         38769141       2.1E-06    0.10   0.10
Harvest Index       2012   ss715639243   Pv03         45577363       2.9E-06    0.12   0.13
Harvest Index       2012   ss715641141   Pv03         46054672       2.9E-06    0.12   0.13
Pod Harvest Index   2013   ss715648677   Pv04         297638         4.5E-06    0.10   0.29
Number of Pods      2013   ss715649615   Pv05         27957387       2.2E-06    0.10   0.03
Number of Pods      2013   ss715647649   Pv07         40059490       3.8E-06    0.11   0.03
Pod Weight          2012   ss715639408   Pv08         5150618        4.3E-08    0.14   0.13
Pod Weight          2012   ss715649359   Pv08         4743573        1.9E-07    0.14   0.13
Pod Weight          2013   ss715647392   Pv08         59337110       8.8E-06    0.09   0.13
Seed Number         2013   ss715639901   Pv03         25241093       6.7E-07    0.13   0.09
Seed Number         2013   ss715650235   Pv05         27277193       4.5E-06    0.10   0.13
Yield per Plant     2012   ss715639408   Pv08         5150618        1.0E-07    0.13   0.13
Yield per Plant     2012   ss715649359   Pv08         4743573        2.8E-07    0.14   0.13
Yield per Plant     2013   ss715647002   Pv09         20618286       8.0E-06    0.09   0.12
Seed Yield          2012   ss715648538   Pv03         38268568       4.5E-07    0.14   0.09
Seed Yield          2012   ss715646178   Pv09         10005643       1.9E-06    0.11   0.09
† SNP=Single Nucleotide Polymorphic code; ‡ P=significance level and E=exponential; § R² is phenotypic variation explained by the SNP

Table 4. Geographic distributions of the alleles of two significant SNPs with larger positive effect on seed yield measured on 237 bean genotypes in the Andean Diversity Panel (ADP) grown at the Montcalm Research Farm, Entrican, MI in 2012 and 2013.

               Number of Genotypes (Allele and SNP)
Country        G (ss715649410)†   G (ss715646178)†
Angola         2                  1
Canada         1                  1
Georgia        1                  0
Kenya          2                  2
Malawi         1                  1
Puerto Rico    5                  2
Tanzania       10                 8
Uganda         1                  1
USA            5                  5
† G was the minor allele with a frequency of 0.12 and 0.09 for ss715649410 (on Pv03) and ss715646178 (on Pv09), respectively.