Genomic predictions in Angus cattle: Comparisons of sample size, response... and clustering methods for cross-validation

Genomic predictions in Angus cattle: Comparisons of sample size, response variables,
and clustering methods for cross-validation
P. Boddhireddy, M. J. Kelly, S. Northcutt, K. C. Prayaga, J. Rumph and S. DeNise
J ANIM SCI 2014, 92:485-497.
doi: 10.2527/jas.2013-6757 originally published online January 15, 2014
The online version of this article, along with updated information and services, is located on
the World Wide Web at:
http://www.journalofanimalscience.org/content/92/2/485
www.asas.org
Downloaded from www.journalofanimalscience.org by guest on October 6, 2014
Genomic predictions in Angus cattle: Comparisons of sample size, response
variables, and clustering methods for cross-validation1
P. Boddhireddy,*2 M. J. Kelly,† S. Northcutt,‡ K. C. Prayaga,§ J. Rumph,* and S. DeNise*
*Zoetis Inc., Kalamazoo, MI 49007; †Queensland Alliance for Agriculture and Food Innovation, University of Queensland,
Brisbane St. Lucia, QLD, 4072, Australia; ‡American Angus Association, 3201 Frederick Ave, Saint Joseph, MO 64506;
and §Zoetis Inc., 45 Poplar Road, Parkville, Victoria, 3052, Australia
ABSTRACT: Advances in genomics, molecular
biology, and statistical genetics have created a
paradigm shift in the way livestock producers pursue
genetic improvement in their herds. The nexus of these
technologies has resulted in combining genotypic
and phenotypic information to compute genomically
enhanced measures of genetic merit of individual
animals. However, large numbers of genotyped and
phenotyped animals are required to produce robust
estimates of the effects of SNP that are summed together
to generate direct genomic breeding values (DGV).
Data on 11,756 Angus animals genotyped with the
Illumina BovineSNP50 Beadchip were used to develop
genomic predictions for 17 traits reported by the
American Angus Association through Angus Genetics
Inc. in their National Cattle Evaluation program.
Marker effects were computed using a 5-fold crossvalidation approach and a Bayesian model averaging
algorithm. The accuracies were examined with EBV
and deregressed EBV (DEBV) response variables
and with K-means and identical by state (IBS)-based
cross-validation methodologies. The cross-validation
accuracies obtained using EBV response variables
were consistently greater than those obtained using
DEBV (average correlations were 0.64 vs. 0.57). The
accuracies obtained using K-means cross-validation
were consistently smaller than accuracies obtained
with the IBS-based cross-validation approach (average
correlations were 0.58 vs. 0.64 with EBV used as a
response variable). Comparing the results from the
current study with the results from a similar study
consisting of only 2,253 records indicated that larger
training population size resulted in higher accuracies
in validation animals and explained on average 18%
(69% improvement) additional genetic variance across
all traits.
Key words: accuracy, Angus cattle, clustering methods, cross-validation, identical by state, response variables
© 2014 American Society of Animal Science. All rights reserved. J. Anim. Sci. 2014.92:485–497
doi:10.2527/jas2013-6757
Introduction
Genomic selection methods add value to
traditional breeding programs by improving the
accuracy of breeding value estimates. The improved
accuracy enables faster genetic improvement by
decreasing generation intervals and providing more
accurate selection decisions. Genomic selection
involves using a training population with genotypes
1Authors thank American Angus Association and its subsidiary
Angus Genetics, Incorporated (AGI), for providing EPDs, pedigree,
and other data for this analysis.
2Corresponding author: Prashanth.Boddhireddy@Zoetis.com
Received May 28, 2013.
Accepted November 27, 2013.
and phenotypes to simultaneously estimate effects of
thousands of SNP across the genome (Meuwissen et
al., 2001). These SNP effects can be used to predict
genetic merit of any genotyped animal referred to as
direct genomic values (DGV), which can be combined
with EBV to compute genomically enhanced EBV
(VanRaden et al., 2009).
The accuracy of the estimated DGV is important
to the application of genomic selection in animal
breeding. Several factors influence DGV accuracy
including the number of records in the training
population, the relationship between the discovery
population and the target validation population
(Habier et al., 2007, 2010; Clark et al., 2012), the type
of response variable (e.g., raw phenotypes, EBV, or
485
Downloaded from www.journalofanimalscience.org by guest on October 6, 2014
486
Boddhireddy et al.
deregressed EBV [DEBV; Garrick et al., 2009]) used
for estimating SNP effects, and the methodologies
used for clustering training data for cross-validation
(Saatchi et al., 2011). Others include the effective
population size (Goddard and Hayes, 2009; Daetwyler
et al., 2010), the extent oflinkage disequilibrium (LD)
and density of genotypes (De Roos et al., 2008; Hayes
et al., 2009a; Kim and Kirkpatrick, 2009; Wolc et
al., 2011), the number of QTL contributing towards
a trait of interest (Hayes et al., 2010), the heritability
of the trait (Goddard and Hayes, 2009), the accuracy
associated with the measurement of phenotypes, and
statistical methodologies used for estimating DGV and
corresponding accuracies (Moser et al., 2009; Su et al.,
2010).
In this study we evaluate the effect of EBV and
DEBV response variables, the utility of K-means and
identical by state (IBS)-based clustering methods for
cross-validation, and the effect of training population
size on the prediction accuracy.
Materials and methods
Data
The study used genotypes on 11,756 registered
Angus animals with expected progeny differences
(EPD) records. The cryopreserved semen, hair follicles,
or blood samples of these animals were procured
from various commercial AI organizations, research
organizations, and commercial breeders. The DNA was
extracted from these samples and genotyped using the
Illumina BovineSNP50 chip (Illumina Inc., San Diego,
CA).
Seventeen traits were used in this analysis including
birth weight(BIR_WT), calving ease direct (CED),
calving ease maternal (CEM), carcass weight (CW),
docility, fat thickness (FAT), DMI, heifer pregnancy
(HP), marbling score (MARB), mature height (MHT),
mature weight, maternal milking ability (MILK),
rib eye area (REA), scrotal circumference, weaning
weight (WW), yearling weight, and yearling height.
By combining 3 generations of pedigree information
on sires and dams, a total of 23,448 EPD and 52,164
pedigree records were available for this study.
The EPD and their associated Beef Improvement
Federation (BIF) accuracies were obtained from the
American Angus Association (AAA) national cattle
evaluation in March 2012. The number of records
with both phenotypes and genotypes available varied
across traits ranging from 573 records for HP to 7,976
records for CW (Table 1). Heritability estimates for the
traits ranged from 0.12 for CEM to 0.64 for MHT as
estimated using the entire AAA phenotypic database
(Angus National Cattle Evaluation; http://www.angus.
Table 1. Heritability estimates and summary statistics for the traits used in this study. Cross-validation (CRV) and
external validation (EXV) are data splits representing cross-validation and validation datasets, respectively
Trait
h2
Animals
EPD1 Mean
EPD SD
Mean BIF2
accuracy
Birth wt, kg
Calving ease direct, %
Calving ease maternal, %
Carcass wt, kg
Docility, %
Fat thickness, cm
DMI, kg/d
Heifer pregnancy, %
Marbling score
Mature height, cm
Maternal milking ability, kg
Mature wt, kg
Rib eye area, cm2
Scrotal circumference, cm
Weaning wt, kg
Yearling height, cm
Yearling wt, kg
0.42
0.20
0.12
0.28
0.37
0.42
0.31
0.13
0.37
0.64
0.14
0.37
0.37
0.47
0.20
0.50
0.20
7,804
6,905
6,905
7,976
2,012
7,974
2,570
573
7,975
676
7,804
676
7,976
4,610
7,804
3,103
7,804
0.81
6.24
7.64
9.25
11.31
0.03
0.08
8.26
0.51
0.86
11.08
33.4
1.94
0.43
23.38
0.81
42.27
0.73
3.81
2.97
4.00
10.27
0.05
0.08
2.1
0.28
1.17
2.74
29.6
1.35
0.54
4.38
0.81
7.29
0.39
0.31
0.18
0.17
0.36
0.19
0.29
0.26
0.22
0.47
0.21
0.45
0.26
0.39
0.31
0.44
0.27
1EPD
2BIF
= expected progeny differences.
= Beef Improvement Federation.
Downloaded from www.journalofanimalscience.org by guest on October 6, 2014
Mean BIF
accuracy in
CRV
Mean BIF
accuracy in
EXV
0.43
0.34
0.22
0.21
0.42
0.23
0.33
0.32
0.25
0.57
0.26
0.53
0.30
0.44
0.34
0.50
0.31
0.31
0.24
0.10
0.10
0.24
0.13
0.23
0.13
0.16
0.27
0.13
0.29
0.19
0.31
0.23
0.34
0.18
Genomic predictions in Angus cattle
org/Nce/Heritabilities.aspx). The mean BIF accuracies
are presented in Table 1.
Quality Control for Genotypes. Animals with
identical genotypes (i.e., genomic twins) as well as
animals with call rate less than 80% were removed
from the analysis. Markers that had a call rate less than
70% were also removed from the analysis. Genotypes
derived from both the version 1 and version 2 of the
BovineSNP50 were used so only markers present on
both chips were considered in this study. After editing,
the total number of markers remaining for the analysis
was 48,048.
Response Variables. Estimated breeding values
and DEBV were used as response variables for
genomic analyses. For this purpose, initially EPD were
multiplied by 2 to derive EBV and the BIF accuracies
were converted into standard accuracies using the
formula proposed in BIF guidelines (Beef Improvement
Federation, 2010) as
r = éêë1- (1- BIF ) 2 ùúû
1/ 2
.
Deregressed EBV were computed using EBV
and their respective accuracies as per the approach
proposed by Garrick et al. (2009). This approach is
designed to eliminate the parental information and
shrinkage inherent in EBV. The base-adjustment value
required for calculation of DEBV was computed as the
mean value of the EBV (for that trait). Furthermore,
to ensure the quality of DEBV used in the study, only
animals with DEBV accuracy greater than a 0.05
487
threshold were included in the analysis, similar to the
study by Ostersen et al. (2011). The number of animals
removed from the genomic analyses with DEBV as
the response variable varied depending on the trait and
associated accuracies (Fig. 1). As a consequence of the
deregression methodology and the threshold imposed,
DEBV records of CEM and MILK traits had only onethird the number of records compared to EBV response
variables. Hence, these traits were not analyzed with
DEBV response variables.
Statistical Methodology
The estimation and evaluation of genomic
selection results was undertaken in 2 steps denoted as
cross-validation (CRV) and external validation (EXV).
The available data for each trait were divided into 2
groups based on EBV accuracy: the top two-thirds high
accuracy animals as the CRV dataset and the bottom
one-third low accuracy animals as the EXV dataset.
Cross-Validation. The training dataset was
classified into 5 groups either using K-means or IBSbased clustering methods. During CRV, a prediction
equation was developed using the marker effects
estimated in 4 of the 5 groups and tested in the fifth
group that was not used for marker effect estimation.
This fifth group provided the CRV results from that
calibration. This process was repeated 5 times leaving
a different group out of the estimation of marker effects
each time. The prediction equations applied to the
Figure 1. Number of observations in training cross-validation (CRV) group and validation group (external validation [EXV]) for response variables EBV
and deregressed EBV (DEBV) for several traits studied.
Downloaded from www.journalofanimalscience.org by guest on October 6, 2014
488
Boddhireddy et al.
EXV dataset were the average marker effects across 5
calibrations.
K-Means and Identical by State-Based Clusterings
for Cross-Validation. Both K-means and IBS-based
clustering methods use a distance matrix that is a
measure of the relationship among the animals in the
dataset. The distance matrix was computed from the
pedigree and genotypes for K-means and IBS-based
methods, respectively.
In the K-means methodology, the distance matrix
was the A matrix (additive relationship matrix computed
from the pedigree) with slight modifications to each
of the elements in the matrix. If the elements of the A
matrix were denoted as aij, the elements of the distance
matrix for K-means methodology was computed as dij =
1 – [aij/(aiiajj)1/2] as described by Saatchi et al. (2011),
in which dij is a measure of pedigree distance between
individual i and individual j, aij is the additive genetic
relationship between individual i and individual j, and
aii and ajj are diagonal elements of the A matrix. The A
matrix was computed using PyPedal (Cole, 2007) and
both the relationship matrix and K-means methodology
(Hartigan and Wong, 1979) were implemented in R (R
Development Core Team, 2011).
In the IBS-based method, the distance matrix
consisted of IBS distances computed from marker
genotypes. An R package called GenABEL (Aulchenko
et al., 2007) was used for the computation of the IBS
distance matrix as
fij = å k éêë ( xi ,k - pk )( x j ,k - pk )ùúû / [ pk ´ (1- pk ) ] ,
in which fij is the relationship between animals i and
j, xi,k is the genotype at marker k for animal i, xj,k is
the genotype at marker k for animal j, and pk is the
allele frequency for marker k. The IBS distance matrix
was then subjected to principal component analysis to
reduce the dimensionality of the relationship matrix
from n × n to n × 1. Ranks derived from the first principal
component (PC1), which explains the greatest amount
of variation, were used for partitioning animals into 1
of 5; the first one-fifth records were assigned to group
1, the next one-fifth records were assigned to group 2,
and so on.
The maximum relationship coefficients (amax)
were computed to evaluate the relationship among
animals within and across clusters for different
clustering methods. For an animal k belonging to
cluster i, amax(i,j)k was the maximum of the relationship
value (in the numerator relationship matrix) across
all the animals in cluster j. The average maximum
relationship of all animals in cluster i to the animals in
cluster j was amax(i,j). The estimate of amax_within for a
cluster i was then amax(i,i) and amax_across for a given
cluster i was computed as the average of (amax(i,j)),
in which j ≠ i. In addition, the difference between
amax_within and amax_across, denoted as amax_diff, was
also computed. Although IBS-based and K-means
clustering methods were of primary interest in the
current study, for comparative purposes and to enable
better understanding of the effect of clustering methods
on CRV results, comparisons were also made against
random clustering and IBS-based methodology with
unequal cluster sizes (IBS_UnEqual) for BIR_WT.
In the IBS_UnEqual clustering method, the clustering
was solely driven by PC1 values ignoring cluster sizes.
The difference between maximum and minimum PC1
values across all clusters within IBS_UnEqual method
was similar whereas the same was not necessarily
true in the case of IBS-based method with equal sized
clusters.
Computation of Marker Effects and Direct
Genomic Value. The allele substitution effect for each
marker was computed using Gensel (Fernando and
Garrick, 2008) in which a Bayesian method called
BayesC was used for estimation of marker effects.
BayesC fits a statistical model assuming a known
fraction of markers as having zero effects, which in the
current analyses was set to be 0.95. This is referred
to as πi denoting the proportion of markers excluded
in the model in each Gibbs sampling run. The total
number of Markov Chain Monte Carlo iterations used
for estimating posterior means of marker effects and
variances was 45,000, of which the first 5,000 were
discarded for burn-in. For each trait, both EBV and
DEBV were used as response variables. In the analyses
with EBV as response variable, accuracies were used
as weighting factors. In the analyses with DEBV as the
response variable, respective weights were calculated
according to Garrick et al. (2009). The weight for the
ith animal was wi = (1 – h2)/{[c + (1 – r2i)/r2i]h2}, in
which h2 was the heritability of the trait, c was the part
of the genetic variance not explained by markers, and
r2i was the reliability of the DEBV of the ith animal.
Three different c values, 0.10, 0.50, and 0.70, were
used for computing weights.
The basic statistical model used for estimating
these marker effects for each trait was
,
m
in which yi is the response yi = ì + å xij b j + evariable, m
j =1
is the number of markers,
xij denotes
genotype codes 0, 1, and 2 (for the 0, 1, and 2 copies of
“A” allele, respectively), bj is the substitution effect of
marker j, and e is residual error. The DGV of the
targeted animals in a validation dataset were calculated
Downloaded from www.journalofanimalscience.org by guest on October 6, 2014
489
Genomic predictions in Angus cattle
Figure 2. Direct genomic value (DGV) accuracies compared for EBV and deregressed EBV (DEBV) response variables in the external validation dataset.
Direct genomic values, DGV_EBV and DGV_DEBV were computed with EBV and DEBV response variables, respectively. For EBV response variables, accuracy was measured as correlation between DGV and EBV. For DEBV response variables, accuracy was computed as correlation between DGV and DEBV
divided by square root of heritability.
by summing the products of the estimated marker
substitution effects ( bˆ j ) and their genotype codes xij as
DGVi =
m
å x bˆ
j =1
ij
j
.
Five sets of DGV were computed, 1 corresponding
to each of the CRV group. A pooled correlation across
the 5 CRV groups was computed as described in
Snedecor and Cochran (1967). Direct genomic values
were also computed for EXV animals. The marker
effects for computing DGV were average of marker
Table 2. Number of observations and summary statistics for several traits used in the year 2010 study1. Crossvalidation (CRV) and external validation (EXV) are data splits representing cross-validation and validation datasets,
respectively
Trait
h2
Animals
EPD2 mean
Birth wt, kg
0.31
1,783
0.98
Calving ease direct, %
0.10
1,722
4.67
Calving ease maternal, %
0.14
1,722
5.67
Carcass wt, kg
0.29
1,812
4.63
DMI, kg/d
0.48
2,253
0.12
Fat thickness, cm
0.42
1,844
0.03
Marbling score
0.37
1,844
0.27
Maternal milking ability, kg
0.12
1,783
9.80
0.37
1,844
0.90
Rib eye area, cm2
Weaning wt, kg
0.25
1,783
21.41
1This dataset refers to a subset data of the current study that was analyzed in year 2010.
2EPD = expected progeny differences.
3BIF = Beef Improvement Federation.
EPD SD
Mean BIF3
accuracy
Mean BIF
accuracy in
CRV
Mean BIF
accuracy in
EXV
0.37
1.95
1.42
0.73
0.14
0.03
0.1
1.15
0.58
1.96
0.39
0.30
0.16
0.14
0.28
0.20
0.20
0.30
0.23
0.30
0.42
0.32
0.18
0.18
–
0.23
0.24
0.32
0.27
0.33
0.32
0.22
0.09
0.08
–
0.13
0.14
0.21
0.17
0.23
Downloaded from www.journalofanimalscience.org by guest on October 6, 2014
490
Boddhireddy et al.
Table 3. Accuracies of direct genomic values (DGV), regression coefficients, and percent genetic variation explained
by DGV with EBV and deregressed EBV (DEBV) response variables
Trait
r1
EBV
b1
Training (cross-validation)
DEBV
%GV1
r
b
External validation
%GV
r
EBV
b
%GV
r
DEBV
b
%GV
Birth wt, kg
0.55
0.90
30
0.54
1.01
29
0.57
0.86
32
0.44
1.02
19
Calving ease direct, %
0.59
0.91
34
0.70
1.20
49
0.59
0.99
35
0.54
1.31
30
Calving ease maternal, %
0.65
0.92
42
–
–
–
0.62
0.92
39
–
–
–
Carcass wt, kg
0.74
0.91
55
0.43
0.95
18
0.62
0.88
39
0.54
1.75
30
Docility, %
0.64
0.80
41
0.71
0.78
50
0.69
0.91
48
0.58
1.22
34
Fat thickness, cm
0.74
1.19
54
0.48
1.12
23
0.61
1.19
37
0.46
1.44
21
DMI, kg/d
0.44
1.01
19
0.34
0.95
12
0.39
0.99
15
0.22
0.86
5
Heifer pregnancy, %
0.57
0.95
31
0.54
2.22
29
0.71
0.87
51
0.12
1.25
2
Marbling score
0.76
1.00
58
0.51
1.05
26
0.78
1.01
61
0.48
1.24
23
Mature height, cm
0.56
1.04
31
0.52
0.97
27
0.66
1.11
43
0.34
1.12
11
Maternal milking ability, kg
0.72
0.91
52
–
–
–
0.72
0.87
52
–
–
–
Mature wt, kg
0.47
0.73
22
0.28
0.21
8
0.55
0.90
30
0.24
0.35
6
Rib eye area, cm2
0.71
0.97
50
0.62
0.95
38
0.65
0.97
42
0.50
1.17
25
Scrotal circumference, cm
0.62
0.91
38
0.60
1.03
36
0.63
0.94
40
0.41
1.11
17
Weaning wt, kg
0.71
0.92
50
0.77
1.16
59
0.68
0.88
46
0.62
1.21
39
Yearling height, cm
0.62
0.94
37
0.57
1.05
32
0.61
0.85
37
0.41
1.00
16
Yearling wt, kg
0.74
0.93
55
0.91
1.15
83
0.68
0.83
47
0.71
1.20
51
AVG
0.64
0.94
41
0.57
1.05
35
0.63
0.94
41
0.44
1.15
22
1r = accuracy of DGV estimated as correlation between DEBV or EBV and DGV; b = regression of DEBV or EBV on DGV; %GV = percent genetic variance
explained by DGV.
effects across the 5 CRV groups for both K-means and
IBS-based strategies.
Parameters to Evaluate Direct Genomic Values
A perfect evaluation of genomic predictions
requires the estimation of the correlation between true
breeding values and DGV. As true breeding values
were not available, an approximation to the evaluation
was achieved by computing the correlation between
predicted values (DGV) and EBV in both the CRV and
EXV datasets in the current study. In the case of DEBV,
the correlation was divided by the square root of the
heritability since the DEBV were on the phenotypic
scale.
In addition, the regression of EBV or DEBV on
DGV was calculated to evaluate prediction bias. The
DGV were unbiased, downward biased, and upward
biased when regression coefficients were closest to,
greater than, and less than 1, respectively. Percent
genetic variation (%GV) explained was computed by
squaring the correlation between DGV and EBV and
then multiplying by 100.
Subset of the Data. One of the objectives of this
study was to evaluate the effect of size of the training
dataset on the accuracy of genomic predictions. A
subset of the above mentioned dataset with 10 traits
was analyzed in 2010, referred to as the year 2010
study in this publication. A total of 2,253 records
with both genotypes and EPD were available for the
year 2010 study. The EPD were obtained from AAA
national cattle evaluation in November 2010. The EPD
were initially multiplied by 2 to derive EBV and BIF
accuracies were converted into standard accuracies
as described earlier. The animals were genotyped on
version 1 of the Illumina BovineSNP50 chip. The 10
traits common with the current study were BIR_WT,
CED, CEM, CW, FAT, DMI, MARB, MILK, REA, and
WW. Heritability estimates were similar for most of
the traits between the studies. The number of records
available ranged from 1,722 records for CED to 2,253
records for DMI. Summary statistics for this data subset
are presented in the Table 2. Similar to the current
study, the year 2010 dataset was divided into training
and validation groups consisting of two-thirds higher
accuracy bulls and one-third lower accuracy animals,
respectively. The genomic analyses methodology was
the same as the current study. However, only the EBV
response variable with IBS-based clustering method
was used in the CRV.
Results
Estimated Breeding Value vs. Deregressed EBV
Response Variables
The correlations, regression coefficients, and %GV
for training CRV and EXV are presented in Table
Downloaded from www.journalofanimalscience.org by guest on October 6, 2014
491
Genomic predictions in Angus cattle
Table 4. Accuracies of direct genomic values (DGV), regression coefficients of deregressed EBV (DEBV) or EBVon
DGV and percent genetic variation explained by DGV with EBV and DEBV as response variables comparing
identical by state (IBS)-based and K-means cross-validation methods
Trait
r1
IBS based (EBV)
b1
%GV1
r
K-means (EBV)
b
%GV
r
IBS based (DEBV)
b
%GV
r
K-means (DEBV)
b
%GV
Birth wt, kg
0.55
0.9
30
0.49
0.87
24
0.54
1.01
29
0.47
0.99
22
Calving ease direct, %
0.59
0.91
34
0.51
0.86
26
0.70
1.20
49
0.61
1.21
38
Calving ease maternal, %
0.65
0.92
42
0.59
0.84
34
–
–
–
–
–
–
Carcass wt, kg
0.74
0.91
55
0.68
0.91
46
0.43
0.95
18
0.38
0.95
15
Docility, %
0.64
0.80
41
0.63
0.79
39
0.71
0.78
50
0.70
0.79
50
Fat thickness, cm
0.74
1.19
54
0.70
1.21
49
0.48
1.12
23
0.42
1.00
18
DMI, kg/d
0.44
1.01
19
0.41
0.92
17
0.34
0.95
12
0.33
0.95
11
Heifer pregnancy, %
0.57
0.95
31
0.54
0.97
29
0.54
2.22
29
0.58
4.64
34
Marbling score
0.76
1.00
58
0.77
0.95
60
0.51
1.05
26
0.55
1.12
30
Mature height, cm
0.56
1.04
31
0.43
0.97
18
0.52
0.97
27
0.36
0.83
13
Maternal milking ability, kg
0.72
0.91
52
0.66
0.87
44
–
–
–
–
–
–
Mature wt, kg
0.47
0.73
22
0.33
0.62
11
0.28
0.21
8
0.14
0.12
2
Rib eye area, cm2
0.71
0.97
50
0.69
0.94
48
0.62
0.95
38
0.58
0.89
34
Scrotal circumference, cm
0.62
0.91
38
0.55
0.90
30
0.60
1.03
36
0.52
1.01
27
Weaning wt, kg
0.71
0.92
50
0.62
0.87
39
0.77
1.16
59
0.68
1.22
47
Yearling height, cm
0.62
0.94
37
0.58
0.94
33
0.57
1.05
32
0.55
1.07
30
Yearling wt, kg
0.74
0.93
55
0.66
0.87
44
0.91
1.15
83
0.81
1.16
65
AVG
0.64
0.94
42
0.58
0.9
35
0.57
1.05
35
0.51
1.2
29
1r = accuracy of DGV estimated as correlation between DEBV or EBV and DGV; b = regression of DEBV or EBV on DGV; %GV = percent genetic variance
explained by DGV.
Figure 3. The mean of the maximum relationship coefficients (amax) values between different clusters for identical by state (IBS)-based, K-means, random,
and IBS-based methodology with unequal cluster sizes (IBS_UnEqual) clustering methods.
Downloaded from www.journalofanimalscience.org by guest on October 6, 2014
492
Boddhireddy et al.
Table 5. Comparison of relationship among animals within (amax_within) and across (amax_across) clusters in various
methods
amax_within
amax_across
Method
Cluster
Animals
K-means
Cluster 1
284
0.56
0.24
K-means
Cluster 2
2,150
0.47
0.31
K-means
Cluster 3
1,243
0.50
0.31
K-means
Cluster 4
1,039
0.37
0.21
K-means
Cluster 5
434
0.55
0.23
Average
–
–
0.49
0.26
Cluster 1
1,030
0.55
0.28
IBS4 based
IBS based
Cluster 2
1,030
0.43
0.32
IBS based
Cluster 3
1,030
0.38
0.32
IBS based
Cluster 4
1,030
0.37
0.32
IBS based
Cluster 5
1,030
0.45
0.30
Average
–
–
0.44
0.31
Random
Cluster 1
1,093
0.38
0.39
Random
Cluster 2
999
0.37
0.37
Random
Cluster 3
1,030
0.37
0.37
Random
Cluster 4
1,032
0.37
0.38
Random
Cluster 5
996
0.40
0.39
Average
–
–
0.38
0.38
Cluster 1
71
0.43
0.16
IBS_UnEqual5
IBS_UnEqual
Cluster 2
2,423
0.45
0.35
IBS_UnEqual
Cluster 3
1,731
0.44
0.35
IBS_UnEqual
Cluster 4
638
0.49
0.29
IBS_UnEqual
Cluster 5
287
0.57
0.24
Average
–
–
0.48
0.28
1a
max_diff = the difference between amax_within and amax_across.
2Training cross-validation correlations (r) between EBV and direct genomic values for birth weight.
3CI = confidence interval of correlations.
4IBS = identical by state.
5IBS_UnEqual = IBS-based methodology with unequal cluster sizes.
3. The results indicate that the CRV and EXV mean
correlations were higher and consistent across traits
when using EBV response variable than those obtained
using DEBV. The mean CRV correlations across all
traits were 0.64 and 0.57 using EBV and DEBV traits,
respectively (Table 3). With EXV data sets, the mean
correlations were 0.63 and 0.44 for EBV and DEBV
traits, respectively (Table 3). These results demonstrate
that DEBV prediction accuracies were higher when
EBV accuracies were higher. The accuracies obtained
with c values of 0.1, 0.5, and 0.7 (used in computation
weights of DEBV) resulted in similar accuracies
across all traits indicating that the accuracies were not
sensitive to choice of c used for DEBV weights. The
DEBV results presented in the tables were based on c
value of 0.50.
The predictive ability of DGV was further explored
by contrasting the response variables used in developing
prediction equations (Fig. 2). Estimated breeding values
were predicted with higher accuracies using prediction
equations developed with EBV than those developed
with DEBV response variables. In contrast, DEBV
were predicted with similar accuracies with prediction
amax_diff1
0.32
0.17
0.18
0.16
0.32
0.23
0.27
0.11
0.06
0.05
0.15
0.13
–0.01
0.00
0.00
–0.01
0.01
0.00
0.27
0.10
0.09
0.19
0.33
0.20
r2
0.43
0.49
0.52
0.46
0.52
0.48
0.50
0.55
0.55
0.59
0.58
0.55
0.58
0.62
0.60
0.56
0.60
0.59
0.53
0.53
0.52
0.54
0.47
0.52
CI3 of r
0.33 to 0.52
0.46 to 0.52
0.48 to 0.56
0.41 to 0.51
0.45 to 0.59
–
0.45 to 0.54
0.51 to 0.59
0.51 to 0.59
0.55 to 0.63
0.54 to 0.62
–
0.54 to 0.62
0.58 to 0.66
0.56 to 0.64
0.52 to 0.60
0.56 to 0.64
–
0.34 to 0.68
0.50 to 0.56
0.48 to 0.55
0.48 to 0.59
0.37 to 0.56
–
equations developed with EBV response variables as
those developed with DEBV response variables at least
in some traits.
Identical by State-Based vs. K-Means Clustering
The training CRV mean correlations obtained with
IBS-based clustering were higher than those obtained
using K-means clustering methodology. The average
correlation across all traits was 0.64 and 0.58 with
IBS based and K-means, respectively (Table 4), with
EBV as the response variable. The average correlation
across all traits was 0.57 and 0.51 with IBS based and
with K-means, respectively, when DEBV was used
as the response variable. The correlation between
IBS-based and K-means correlations across all the
traits was approximately 0.96. In general, correlations
were higher in IBS-based clustering methods with
the exception of marbling for which accuracies were
slightly higher for K-means clustering method.
Downloaded from www.journalofanimalscience.org by guest on October 6, 2014
493
Genomic predictions in Angus cattle
Table 6. Comparison of accuracies of direct genomic values (DGV) and percent genetic variation explained by DGV
across the current study and year 2010 study in cross-validation (CRV) and external validation (EXV) datasets
Trait
Birth wt, kg
Calving ease direct, %
Calving ease maternal, %
Carcass wt, kg
Fat thickness, cm
DMI, kg/d
Marbling score
Maternal milking ability, kg
Rib eye area, cm2
Animals
1,783
1,722
1,722
1,812
1,844
2,253
1,844
1,783
Year 2010 study results
%GV1 –
r1 – CRV
CRV
r – EXV
0.55
0.57
0.53
0.63
0.70
0.39
0.77
0.68
30
32
40
40
49
17
59
46
0.52
0.41
0.67
0.50
0.61
0.28
0.49
0.43
%GV –
EXV
Animals
27
17
45
25
37
11
24
18
7,804
6,905
6,905
7,976
7,974
2,570
7,975
7,804
Current results
%GV –
r – CRV
CRV
r – EXV
0.55
0.59
0.65
0.74
0.74
0.44
0.76
0.72
%GV –
EXV
30
35
43
55
55
19
58
52
0.57
0.60
0.62
0.73
0.74
0.39
0.76
0.73
32
36
39
53
55
15
58
53
1,844
0.65
42
0.48
23
7,976
0.71
50
Weaning wt, kg
1,783
0.64
41
0.53
28
7,804
0.71
50
AVG
1,839
0.61
40
0.49
26
7,169
0.66
45
1r = accuracy of DGV estimated as correlation between EBV and DGV; %GV = percent genetic variance explained by DGV.
0.70
0.70
0.65
49
49
44
Relationship of Animals Within and Across Clusters
with Different Clustering Methods
for random clustering method is uniform across all
clustering methods.
The number of records, amax_within, amax_across,
and amax_diff values, indicating the relationship among
animals within and across clusters created through
K-means, IBS-based, random, and IBS_UnEqual
clustering methods are presented in Table 5. As
expected, these results demonstrate that K-means
method and IBS_UnEqual methods disproportionately
allocate animals to different clusters. One of the 5
clusters had more than 40% of the animals while another
had less than 6% of the animals. The amax_diff values
for K-means, IBS based, random, and IBS_UnEqual
are 0.23, 0.12, 0.00, and 0.20, respectively, indicating
that the K-means method provides tighter clustering
followed by IBS_UnEqual, IBS-based, and random
method, respectively. The average correlation between
EBV and DGV for BIR_WT trait was 0.48, 0.55,
0.59, and 0.52 for K-means, IBS-based, random, and
IBS_UnEqual methods, respectively. The confidence
interval range around the correlations was smaller
(0.08–0.09) for IBS-based and random methods while
the range was larger for K-means (0.06–0.19) and IBS_
UnEqual (0.06–0.34).
The distribution of amax values for different
clustering methods is shown in Fig. 3. The results
demonstrate that in IBS-based and IBS_UnEqual
clustering methods, the animals in cluster i are more
closely related to the animals in cluster i – 1 and i
+ 1 than to those in clusters i – 2 and i + 2. Similar
distinctions could not be made for K-means clusters
since the clustering was not driven by values on a
linear scale as in IBS method. The distribution of amax
Current Study vs. Year 2010 Study
The results of this study were compared to the
results obtained from a previous study with the same
methodology but with fewer records (see Table 6). The
comparison is based on results obtained using only
EBV response variables. The average CRV correlations
are comparable in both studies albeit with moderate
increases in the current study. However, EXV results
demonstrate a significant improvement in accuracy
(0.49 vs. 0.65; Table 6). Significant improvement was
observed for CED, CW, MARB, MILK, REA, WW, and
YW. The average amount of genetic variance explained
increased from 26 to 44%, an average improvement of
69%.
Discussion
Genomic selection has made significant
contributions in improving the accuracy of selection
decisions, especially in dairy breeds. Similar efforts
in beef cattle have been constrained by the lack of
large pedigreed and recorded resource populations. In
this study, we assembled a relatively large resource
population for Angus beef cattle to generate robust
genomic predictions across a wide range of economic
traits. Furthermore, the effects of methodological
variations on accuracy were also presented with a view
to compare the response variables, the CRV groupings,
and the size of the training dataset.
In the current study, a 2-step validation was
implemented. In general, the accuracies obtained
Downloaded from www.journalofanimalscience.org by guest on October 6, 2014
494
Boddhireddy et al.
from CRV and EXV datasets were comparable and
consistent within the EBV response variable while the
accuracies obtained with DEBV response variables
were considerably higher in CRV datasets than those
observed in EXV datasets. One reason for such
lower DEBV prediction accuracies in EXV dataset
is that DEBV (obtained with low accuracy EBV) are
expected to have smaller genomic contribution due to
Mendelian sampling and have more noise added during
deregression process, in which parental contribution is
removed. It should be noted that in the current study,
no effort was made to minimize the additive genetic
relationships between CRV and EXV datasets and the
split was driven strictly by EBV accuracies of various
traits. Therefore, it is reasonable to expect that CRV
and EXV datasets are related, with additive genetic
relationships and LD determining the accuracy of DGV
in the validation dataset as expected in a real world
scenario. In a similar study in Angus cattle, Saatchi et al.
(2011) reported DGV accuracies with DEBV response
variables. The accuracies ranged from 0.22 to 0.69
with an average of 0.44. For the same traits, response
variable, and clustering method, the correlation in our
study ranged from 0.14 to 0.81 with an average of 0.50.
The differences in the results could be attributed to a
greater number of records used in this study.
The results of both CRV and EXV indicated that
the accuracies obtained with EBV response variables
were higher than those achieved with DEBV. In a
dairy study, Aguilar et al. (2010) reported consistently
slightly lower accuracies when using daughter
deviations (DYD) as a response variable than EBV. In
another dairy study, Gredler et al. (2010) found similar
accuracies with EBV and DEBV for protein yield with
an average accuracy of 94%. However, the prediction
accuracies obtained with EBV response variables were
markedly better than those obtained with deregressed
EBV for the interval between first and last insemination
in heifers and cows that had average EBV accuracies of
54 and 63%, respectively. Guo et al. (2010) compared
the accuracies with EBV and DYD as response
variables and found that using EBV resulted in slightly
higher accuracies (approximately 1.9%) across several
simulated scenarios.
On the contrary, Ostersen et al. (2011) found
that DEBV as response variable resulted in higher
reliabilities of DGV than those obtained using EBV
in pigs. The improvement was about 39 and 18% for
daily gain and feed conversion ratio. Thus, the results
in the literature vary even though most results suggest
that using EBV response variables resulted in higher
accuracies than those obtained with DEBV or DYD.
In some cases the improvement was marginal but
was significant in other cases. In particular, the EBV
response variables performed better for traits with
lower EBV accuracies than for those traits with higher
EBV accuracies relative to DEBV. This has important
implications for development of genomic-enhanced
evaluations in beef breeds where accuracy may be
constrained by the number of available records.
While DEBV (Garrick et al., 2009) have reported
to be more appropriate as response variables than EBV,
the proposed advantage in predictive power is not
supported in the current study. There can be several
reasons for this result. The positive effect of using
DEBV as response variable will depend on the degree
of double counting and the amount and heterogeneity
of information for genotyped animals (Ostersen et al.,
2011). It can be postulated that in the current study,
compared to pigs or dairy cattle, the degree of genetic
relationships are lower among the individuals in the
training population and therefore the advantage of
DEBV in addressing double counting may have been
reduced. Furthermore, the perceived disadvantage
associated with variable EBV accuracies when used
as response variables was addressed in the current
study by including accuracies as weighting factors in
the genomic analyses. In support of the current results,
Guo et al. (2010) also reported that with sires having
unequal number of progeny in a simulated dairy study,
the reliabilities using the EBV approach were slightly
higher than those using weighted DYD approach. In
the same study, the simulated results showed that for
traits of moderate to high heritability and relatively
more number of progeny and hence high accuracies,
EBV and DEBV generate DGV of similar accuracy and
hence it can be stated that the relative advantage of 1
response variable over the other depends on the nature
of the dataset.
Cross-validation is often used in predictive
modeling studies to judge the generalizability of the
statistical estimates to an unseen independent validation
data set. The data in this study have been clustered for
CRV using various strategies. The rationale for using
clustering methods that use relationship among animals
for CRV is that if the SNP underlying the prediction
equations are causal mutations or in close proximity
to causal mutations, then the estimates should hold
up in a genetic background slightly further from the
training dataset. However, some of the estimates that
are based on weaker LD signals may not validate
well in unseen data sets that might otherwise work in
a data set that is similar to the training data set. By
increasing tightness of clustering (i.e., increased amax_
diff), one may lose weaker LD signals that contribute
to prediction accuracy in the validation set of animals.
It is therefore necessary to maintain a balance between
the tightness of the clusters in CRV methodology and
Downloaded from www.journalofanimalscience.org by guest on October 6, 2014
Genomic predictions in Angus cattle
the possibility of losing LD signals that may improve
prediction accuracy.
In this study we compared the accuracies of
DGV across 17 traits analyzed with IBS-based and
K-means clustering methods. Both these methods,
as implemented in this study, partition the training
animals based on the relationship among animals, 1
using information from computed genotypes and
the other from pedigree. K-means tended to cluster
animals into groups that were more homogenous
(average amax = 0.49) compared to IBS (average amax
= 0.44). Each group was more genetically distant to
other groups within K-means compared to IBS (0.26
and 0.31, respectively). This was reflected in the
lower CRV correlations in K-means compared to IBS
based. This supports the results of Clark et al. (2012)
and Habier et al. (2010) who showed that the genetic
relationships between the training and the validation
datasets impact realized genomic selection accuracy.
While the groups formed by K-means clustering
had slightly less genetic diversity within groups (amax
within = 0.49) compared to IBS or random clustering,
the groups formed by this clustering were quite
unbalanced. Unequal cluster sizes lead to difference in
sampling variance between the CRV groups. Groups
with smaller sizes will have accuracy estimates with
larger sampling variance. This is reflected in the
confidence intervals around the correlations where
the confidence interval range for IBS based was stable
(0.08–0.09) while the range was much larger for
K-means (0.06–0.19) and IBS_UnEqual (0.06–0.34).
Another way of decreasing the relationship
between calibration clusters and validation cluster
while maintaining equal number of animals in
calibration clusters is by excluding animals in clusters
i – 1 and i + 1 from calibration. This is possible in IBSbased methods because of the linear scale on which
the cluster group allocations are made. Therefore, it
can be argued that IBS-based method offers better a
balance between uniformity of cluster sizes and the
difference in relatedness within and across clusters.
Increasing the number of animals in training has
been shown to increase the accuracy of genomic
selection in both theoretical (Daetwyler et al., 2008;
Hayes et al., 2009b) and empirical studies (VanRaden et
al., 2009). The results of the current study demonstrate
that these prediction equations explained an additional
18% genetic variance across 10 traits in the validation
dataset, where the size of the training population was
increased from approximately 1,300 to 5,250 records.
Most studies comparing the effect of size of the
training population have been in dairy cattle with
limited comparable studies in beef cattle. In one of the
first such empirical studies on genomic predictions,
495
VanRaden et al. (2009) found that the gain from genomic
prediction was linear for net merit when training size
was increased from 1,151 bulls to 3,576 bulls in U.S.
Holsteins. In the current study, while such an overall
increase in accuracy across all traits was observed
owing to increase in the training population, trait to
trait variation was also evident. This also emphasized
the fact that such an increase could not continue
linearly with an increase in the training population and
that different traits tend to slow down at differing rates
depending on the additive genetic variation of the trait
and the relationships among the training population.
On the contrary, Moser et al. (2009) did not find any
consistent improvement with increase in training size
in an Australian dairy cattle study when the size was
varied from 1,239 to 1,880, which could be attributed
to fewer number of markers (about 7,000) and the
addition of relatively fewer animals.
In a German Holstein study, Habier et al. (2010)
reported an improved DGV accuracy due to LD alone
with increasing training size in milk yield, fat yield,
protein yield, and somatic cell score. For the same traits,
a North American Holstein study (Habier et al., 2011)
further confirmed that the accuracy of DGV improved
markedly with training data size increase from 1,000
to 4,000 bulls albeit with slight improvement or a
reduction from 4,000 to 6,500 bulls. Furthermore, there
were trait and statistical method specific differences in
the extent of increase in accuracy. In summary, there
is overwhelming evidence in the literature to suggest
that an increased training population size has a positive
effect on DGV accuracies in cattle and in other species
such as poultry (Wolc et al., 2011). This is consistent
with the knowledge that with additional records, there
will be more observations per SNP allele and hence
greater accuracy in estimating SNP effects. This was
summarized in a review describing the differences
in accuracies of DGV achieved in various countries
due to differing number of bulls used in the reference
population (Hayes et al., 2009b). Increasing training
population size increases the accuracy of genomic
predictions by influencing several factors, for example,
by increasing the relationship between the animals in
the training and validation animals, especially in North
American pure-bred Angus cattle where the effective
population size is approximately 650 (Saatchi et al.,
2011). Increasing the training population size also
allows for the relaxation in cutoff thresholds such as
decreasing minor allele frequency thereby retaining
rare SNP in the genotype data. Furthermore, there
could be a potential increase in the number of high
EBV accuracy animals in the training with an indirect
effect on the increased accuracy of predictions with an
associated decrease in the sampling variance.
Downloaded from www.journalofanimalscience.org by guest on October 6, 2014
496
Boddhireddy et al.
Conclusions
This study presented a comprehensive analysis of
accuracies of genomic predictions using both CRV and
EXV datasets in U.S. Angus cattle. Estimated breeding
value response variables resulted in higher accuracies
than those obtained with deregressed response
variables. The marker effects estimated from EBV were
as successful in predicting DEBV as marker effects
computed from DEBV while the opposite was not true.
Training CRV accuracies with K-means methodology
resulted in accuracies that were consistently lower than
those obtained with the IBS-based clustering method.
Increasing the discovery population size increased the
prediction accuracies and explained an average of 18%
of additional genetic variance in the EXV data.
Literature Cited
Aguilar, I., I. Misztal, D. L. Johnson, A. Legarra, S. Tsuruta, and
T. J. Lawlor. 2010. Hot topic: A unified approach to utilize
phenotypic, full pedigree, and genomic information for genetic
evaluation of Holstein final score. J. Dairy Sci. 93:743–752.
Angus National Cattle Evaluation. 2012. Angus trait heritabilities
and
genetic
correlations.
http://www.angus.org/Nce/
Heritabilities.aspx. (Accessed 6 January 2014.)
Aulchenko, Y. S., S. Ripke, A. Isaacs, and C. M. van Duijn. 2007.
GenABEL: An R library for genome-wide association analysis.
Bioinformatics 23(10):1294–1296.
Beef Improvement Federation. 2010. http://www.beefimprovement.
org/content/uploads/2013/07/Master-Edition-of-BIFGuidelines-Updated-12-17-2010.pdf. (Accessed 6 January
2014.)
Clark, S. A., J. M. Hickey, H. D. Daetwyler, and J. H. van der
Werf. 2012. The importance of information on relatives for the
prediction of genomic breeding values and the implications
for the makeup of reference data sets in livestock breeding
schemes. Genet. Sel. Evol. 44:4.
Cole, J. W. 2007. A computer program for pedigree analysis.
Comput. Electron. Agric. 57:107–113.
Daetwyler, H. D., R. Pong-Wong, B. Villanueva, and J. A. Woolliams.
2010. The impact of genetic architecture on genome-wide
evaluation methods. Genetics. 185(3): 1021–1031.
Daetwyler, H. D., B. Villanueva, and J. A. Woolliams. 2008.
Accuracy of predicting the genetic risk of disease using a
genome-wide approach. PLoS ONE 3:E3395.
De Roos, A. P., B. J. Hayes, R. J. Spelman, and M. E. Goddard. 2008.
Linkage disequilibrium and persistence of phase in HolsteinFriesian, Jersey and Angus cattle. Genetics 179(3):1503–1512.
Fernando, R. L., and J. D. Garrick. 2008. User manual for a
portfolio of genomic selection related analyses, version 2.0.
Animal Breeding and Genetics, Iowa State University, Ames,
IA.
Garrick, D. J., J. F. Taylor, and R. L. Fernando. 2009. Deregressing
estimated breeding values and weighting information for
genomic regression analyses. Genet. Sel. Evol. 41:55.
Goddard, M. E., and B. J. Hayes. 2009. Mapping genes for
complex traits in domestic animals and their use in breeding
programmes. Nat. Rev. Genet. 10(6):381–391.
Gredler, B., H. Schwarzenbacher, C. Egger-Danner, C. Furest,
R. Emmerlin, and J. Sölkner. 2010. Accuracy of genomic
selection in dual purpose Fleckvieh cattle using three types
of methods and phenotypes. In: Proc. 9th World Congress on
Genetics Applied to Livestock, August 1–6, 2010, Leipzig,
Germany.
Guo, G., M. S. Lund, Y. Zhang, and G. Su. 2010. Comparison
between genomic predictions using daughter yield deviation
and conventional estimated breeding value as response
variables. J. Anim. Breed. Genet. 127(6):423–432.
Habier, D., R. L. Fernando, and J. C. Dekkers. 2007. The impact of
genetic relationship information on genome-assisted breeding
values. Genetics 177(4):2389–2397.
Habier, D., R. L. Fernando, K. Kizilkaya, and D. J. Garrick. 2011.
Extension of the Bayesian alphabet for genomic selection.
BMC Bioinf. 12:186.
Habier, D., J. Tetens, F. R. Seefried, P. Lichtner, and G. Thaller.
2010. The impact of genetic relationship information on
genomic breeding values in German Holstein cattle. Genet.
Sel. Evol. 42:5.
Hartigan, J. A., and M. A. Wong. 1979. A k-means clustering
algorithm. Appl. Stat. 28:100–108.
Hayes, B. J., P. J. Bowman, A. J. Chamberlain, and M. E. Goddard.
2009a. Invited review: Genomic selection in dairy cattle:
Progress and challenges. J. Dairy Sci. 92:433–443.
Hayes, B. J., J. Pryce, A. J. Chamberlain, P. J. Bowman, and M.
E. Goddard. 2010. Genetic architecture of complex traits
and accuracy of genomic prediction: Coat colour, milk-fat
percentage, and type in Holstein cattle as contrasting model
traits. PLoS Genet. 6(9):E1001139.
Hayes, B. J., P. M. Visscher, and M. E. Goddard. 2009b. Increased
accuracy of artificial selection by using the realized
relationship matrix. Genet. Res. 91:47–60.
Kim, E. S., and B. W. Kirkpatrick. 2009. Linkage disequilibrium
in the North American Holstein population. Anim. Genet.
40(3):279–288.
Meuwissen, T. H., B. J. Hayes, and M. E. Goddard. 2001. Prediction
of total genetic value using genome-wide dense marker maps.
Genetics 157:1819–1829.
Moser, G., B. Tier, R. E. Crump, M. S. Khatkar, and H. W.
Raadsma. 2009. A comparison of five methods to predict
genomic breeding values of dairy bulls from genome-wide
SNP markers. Genet. Sel. Evol. 41:56.
Ostersen, T., O. F. Christensen, M. Henryon, B. Nielsen, G. Su, and
P. Madsen. 2011. Deregressed EBV as the response variable
yield more reliable genomic predictions than traditional EBV
in pure-bred pigs. Genet. Sel. Evol. 43:38.
R Development Core Team. 2011. R: A language and environment
for statistical computing. R Foundation for Statistical
Computing, Vienna, Austria. http://www.R-project.org/.
(Accessed 6 January 2014.)
Saatchi, M., M. C. McClure, S. D. McKay, M. M. Rolf, J. Kim,
J. E. Decker, T. M. Taxis, R. H. Chapple, H. R. Ramey, S.
L. Northcutt, S. Bauck, B. Woodward, J. C. Dekkers, R. L.
Fernando, R. D. Schnabel, D. J. Garrick, and J. F. Taylor. 2011.
Accuracies of genomic breeding values in American Angus
beef cattle using K-means clustering for cross-validation.
Genet. Sel. Evol. 43:40.
Snedecor, G. W., and W. G. Cochran. 1967. Statistical methods.
Iowa State Univ. Press, Ames, IA.
Downloaded from www.journalofanimalscience.org by guest on October 6, 2014
Genomic predictions in Angus cattle
Su, G., B. Guldbrandtsen, V. R. Gregersen, and M. S. Lund. 2010.
Preliminary investigation on reliability of genomic estimated
breeding values in the Danish Holstein population. J. Dairy
Sci. 93(3):1175–1183.
VanRaden, P. M., C. P. Van Tassell, G. R. Wiggans, T. S. Sonstegard,
R. D. Schnabel, J. F. Taylor, and F. S. Schenkel. 2009. Invited
review: Reliability of genomic predictions for North American
Holstein bulls. J. Dairy Sci. 92:16–24.
497
Wolc, A., J. Arango, P. Settar, J. E. Fulton, N. P. O’Sullivan, R.
Preisinger, D. Habier, R. Fernando, D. J. Garrick, and J. C.
Dekkers. 2011. Persistence of accuracy of genomic estimated
breeding values over generations in layer chickens. Genet. Sel.
Evol. 43:23.
Downloaded from www.journalofanimalscience.org by guest on October 6, 2014
References
This article cites 25 articles, 5 of which you can access for free at:
http://www.journalofanimalscience.org/content/92/2/485#BIBL
Downloaded from www.journalofanimalscience.org by guest on October 6, 2014