Multivariate empirical Bayes to predict the plant breeding values

Ceron-Rojas, J. Jesus; Sahagún-Castellanos, Jaime

Agrociencia

Online version ISSN 2521-9766; print version ISSN 1405-3195

Agrociencia vol. 50 no. 5 Texcoco Jul./Aug. 2016

Crop Science

J. Jesus Ceron-Rojas¹*

Jaime Sahagún-Castellanos¹

¹Instituto de Horticultura, Departamento de Fitotecnia, Universidad Autónoma Chapingo. 56230. Chapingo, México. (jesusceronrojas@live.com.mx).

Abstract:

The plant breeding value is heritable and determines phenotypic characteristics such as plant height and grain yield, and it can be predicted by means of univariate or multivariate Bayesian models based on the phenotypic or genomic information of the plants. These models provide better control of the uncertainty associated with the prediction, but at a high computational cost, so less demanding alternative models are required. Empirical Bayes is a prediction method in which the expectation of the posterior distribution is the estimator of the breeding value. This estimator is a variant of the standard Bayesian estimator and is efficient; it is robust to misspecification of the a priori distribution of the parameters, and the parameter covariances can be estimated through restricted maximum likelihood. A multivariate linear model was proposed to predict the breeding value within the empirical Bayes context. This model incorporates the genetic correlations between traits, pedigree information and genomic information, and contains the multivariate genomic linear model and the multivariate standard linear model as particular cases. The genomic model uses only genomic information, whereas the standard model uses only pedigree information in the prediction. To compare numerically the efficiency of the three models, we used the correlations between the predicted and observed values obtained with data from two maize (Zea mays) F₂ populations and one double haploid wheat (Triticum aestivum L.) population, each with three traits and a particular set of molecular markers and genotypes. In the three populations, the numerical results indicated that the proposed model provides more precise predictions than the other two. We concluded that these results were due to the fact that the proposed model used the genetic correlations between traits, as well as the phenotypic and genomic information, in the prediction.

Key words: Joint posterior distribution; molecular markers; multivariate linear model; Triticum aestivum; restricted likelihood; Zea mays

Introduction

The prediction of plant and animal breeding values is generally done through the mixed linear model (Robinson, 1991) or with some Bayesian approximation (Blasco, 2001; Sorensen and Gianola, 2002) based on the phenotypic and pedigree records of the candidates for selection. However, Meuwissen et al. (2001) showed that genomic selection (GS) increases the accuracy (correlation between observed and predicted values) of the prediction of the breeding values of the candidates for selection, and reduces the intervals between selection cycles by up to two thirds when the number of genotypes and molecular markers (MM) used in the prediction is sufficiently large. In GS, the predicted breeding values, or genomic estimated breeding values (GEBV), are obtained by multiplying the MM effects estimated in the training population by the coded MM values obtained after the first selection cycle. The GEBV are the tool of GS and allow selection for quantitative traits in the absence of phenotypic information (Gianola, 2013; Beyene et al., 2015).

One of the most important problems in GS is to obtain GEBV that are sufficiently precise for the efficiency of GS to be as high as possible. This problem has led to several prediction methodologies derived from the following assumptions: 1) the MM effects have a multivariate normal distribution with mean equal to zero and constant variance, and 2) the MM effects have an a priori distribution that can be uniform, gamma, etc. Assumption 1) led to the genomic best linear unbiased predictor (GBLUP) (VanRaden, 2008) and to the best Bayes linear unbiased predictor (Bayes-BLUP) (Verbyla et al., 2009, 2010). Assumption 2) led to Bayesian methodologies such as Bayes A, B, C, D, etc. (de los Campos et al., 2013; Gianola, 2013), which differ only in the specific assumption they make about the a priori distribution of the variance of the marker effects.

In GS, the Bayesian methods were developed within the context of a single phenotypic variable with the objective of improving the accuracy of GBLUP; however, it has not been irrefutably demonstrated that the accuracy of GBLUP is significantly lower than that of the Bayesian methods (Massman et al., 2013). The Bayesian methods provide better control of the uncertainty associated with the prediction of the breeding value (Blasco, 2001) but require numerical methods, such as Gibbs sampling (Casella and George, 1992), to estimate the MM effects and any other parameter associated with the breeding value. Verbyla et al. (2009) point out that Bayes B requires up to 2440 h of computing for the Gibbs sampling to converge, while Bayes A and Bayes-BLUP require at least 6 h of computing for the convergence of that algorithm. According to Verbyla et al. (2009), despite the great differences in computing time among these methods, when the number of genotypes and MM is large, the accuracy reached by all of them is virtually equal (0.6 on average).

The prediction procedures based on univariate models do not take into account the genetic correlations between traits, although in practice the evaluation of cultivars requires several traits simultaneously. For example, breeders working on yield and grain quality record phenotypic data that include yield components (e.g., grain weight or biomass), grain quality (e.g., flavor, shape, color, nutrient content), and resistance to biotic and abiotic stress (Jia and Jannink, 2012). The multivariate prediction of the breeding value has the advantage of incorporating the genetic correlations between traits. This information should increase the efficiency of the breeding value prediction; for this reason Calus and Veerkamp (2011) suggested a procedure similar to Bayes A, and Hayashi and Iwata (2013) adapted Bayes D to the multivariate case. However, computationally less demanding alternatives that do not affect the precision of the prediction are required; empirical Bayes is an alternative prediction method with desirable statistical properties. In it, under the assumption that the variances of the parameters are known, the expectation of the posterior distribution of the breeding value is an empirical Bayes estimator of that value (Tempelman and Rosa, 2004). This is a variant of the standard Bayesian estimator and is quite efficient; in addition, it is robust to misspecification of the a priori distribution of the parameters (Lehmann and Casella, 1998).

In GS programs the first selection cycle includes only phenotypic information, although the training population (where the first group of parents is selected) has both phenotypic and MM data. When selection is made only with phenotypic information, the information from the MM is not used. If the phenotypic and the MM information are combined in the prediction, the precision will increase even during the first selection cycle because the model uses more information. A similar problem occurs when only some of the candidates for selection have MM data and the rest do not, as in hybrid plant breeding (Massman et al., 2013) or in animal selection (Legarra et al., 2009).

The objective of this study was to propose and evaluate, within the empirical Bayes context, a multivariate linear model that uses pedigree and genomic information jointly to predict the breeding value of the candidates for selection. In this model the expectation of the joint posterior distribution of the breeding values is the empirical Bayes estimator. The basic assumptions of the model are: 1) the genetic variances and covariances are known; 2) the genomic effect and the additive genetic effect not explained by the MM have a joint multivariate normal distribution with mean equal to zero and common variance; 3) the breeding value of the candidates for selection is the sum of the genomic effect and the additive genetic effect not explained by the MM. In addition, it is shown that the genomic multivariate linear model (which uses only genomic information in the prediction) and the standard multivariate linear model (which uses only phenotypic and pedigree information in the prediction) are particular cases of the proposed model.

Materials and Methods

Maize populations 1 and 2

In each of the two F₂ maize (Zea mays) populations, three traits were recorded: grain yield (RG, Mg ha⁻¹), ear height (AM, cm) and plant height (AP, cm). Maize Population 1 had 199 MM and 247 genotypes, whereas maize Population 2 had 259 MM and 248 genotypes. The estimated genetic correlations between RG and AM, RG and AP, and AM and AP in maize Population 1 were 0.53, 0.52 and 0.98, respectively, whereas in maize Population 2 those correlations were 0.58, 0.76 and 0.71.

Population 3 (wheat population)

The wheat (Triticum aestivum L.) double haploid population included 1279 MM and 599 genotypes. In it, grain yield (RG, Mg ha⁻¹) was recorded in three environments (RG1, RG2, RG3). To predict the breeding value of the candidates for selection, RG1, RG2 and RG3 were each treated as a separate trait because the genotypes were evaluated in different environments. The estimated genetic correlations between RG1 and RG2, RG1 and RG3, and RG2 and RG3 were -0.03, -0.21 and 0.73, respectively.

The proposed univariate linear model

Let $\gamma_q = X u_q$ be a $g \times 1$ vector ($g$ = number of genotypes in the population) of genomic breeding values associated with trait $q$ ($q = 1, 2, \ldots, t$; $t$ = number of traits) of the candidates for selection. Assume that $\gamma_q$ has a multivariate normal (NMV) distribution with mean $0$ and variance $G\sigma^2_{\gamma_q}$, i.e., $\gamma_q \sim \mathrm{NMV}(0, G\sigma^2_{\gamma_q})$, where $\sigma^2_{\gamma_q}$ is the additive genomic variance of $\gamma_q$ and $G = XX'/k$ is the $g \times g$ additive genomic relationship matrix between genotypes; $X$ is a $g \times m$ matrix ($m$ = number of MM in the population) of coded MM values ($2-2p$, $1-2p$ and $-2p$ for the genotypes AA, Aa and aa, respectively) associated with the additive effects of the quantitative trait loci (QTL); $p$ is the frequency of allele A and $1-p$ is the frequency of allele a at MM $j$ ($j = 1, 2, \ldots, m$); $u_q$ is an $m \times 1$ vector of additive effects of the QTL associated with the $m$ MM that affect trait $q$; $k = \sum_{j=1}^{m} 2p_j(1-p_j)$ (Habier et al., 2007) in an F₂ population and $k = \sum_{j=1}^{m} 4p_j(1-p_j)$ in a double haploid population. In addition, let $a_q \sim \mathrm{NMV}(0, A\sigma^2_{a_q})$ be a $g \times 1$ vector of additive genetic merits unexplained by the MM associated with trait $q$, where $A$ is the numerical relationship matrix and $\sigma^2_{a_q}$ is the additive genetic variance of $a_q$. The combined linear model for the observations of trait $q$, $y_q^*$, can be written as $y_q^* = 1\mu_q + Za_q + Z\gamma_q + e_q$, or equivalently as:

$$y_q = Za_q + Z\gamma_q + e_q \tag{1}$$

where $y_q = y_q^* - 1\mu_q \sim \mathrm{NMV}(0, V_q)$ is a $g \times 1$ vector of observations of trait $q$ centered on the trait mean $\mu_q$; $1$ is a $g \times 1$ vector of ones; $V_q = A\sigma^2_{a_q} + G\sigma^2_{\gamma_q} + 2\,\mathrm{Cov}(a_q, \gamma_q') + I_g\sigma^2_{e_q}$, with $\mathrm{Cov}(a_q, \gamma_q') = G\sigma^2_{\gamma_q}$ (i.e., the covariance between $a_q$ and $\gamma_q$ is equal to the variance of $\gamma_q$); $\gamma_q$, $G$ and $\sigma^2_{\gamma_q}$ were defined above; $Z$ is an incidence matrix (generally a $g \times g$ identity matrix) and $e_q \sim \mathrm{NMV}(0, I_g\sigma^2_{e_q})$ is a $g \times 1$ vector of residuals; $I_g$ is a $g \times g$ identity matrix and $\sigma^2_{e_q}$ is the residual variance. The model of Equation 1 will be called the univariate combined linear model.
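As an illustration of the definitions above, the following minimal sketch builds the coded marker matrix $X$ and the genomic relationship matrix $G = XX'/k$ from a table of allele counts. The function name, the toy genotypes and the choice of estimating each allele frequency $p_j$ from the sample itself are assumptions made for the example, not part of the original analysis.

```python
import numpy as np

def genomic_relationship(allele_counts, factor=2.0):
    """Build G = XX'/k from a g x m table of counts of allele A (0, 1 or 2).

    factor = 2 gives k for an F2 population and factor = 4 gives k for a
    double haploid population, following the two definitions of k in the text.
    Assumption: allele frequencies are estimated from the sample itself.
    """
    counts = np.asarray(allele_counts, dtype=float)
    p = counts.mean(axis=0) / 2.0          # frequency of allele A at each marker
    X = counts - 2.0 * p                   # codes 2-2p (AA), 1-2p (Aa), -2p (aa)
    k = factor * np.sum(p * (1.0 - p))
    G = X @ X.T / k
    return G, X, k

# Hypothetical toy data: 4 genotypes scored at 3 markers.
toy_counts = np.array([[2, 1, 0],
                       [1, 1, 2],
                       [0, 2, 1],
                       [2, 0, 1]])
G_toy, X_toy, k_toy = genomic_relationship(toy_counts)
```

Because the frequencies are estimated from the same sample, the columns of $X$ sum to zero and this $G$ has rank at most $g - 1$; any step that needs $G^{-1}$ would therefore require a nonsingular version of $G$.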

Posterior joint distribution of $a_q$ and $\gamma_q$

The posterior joint distribution of $a_q$ and $\gamma_q$ can be written as:

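$$P(a_q, \gamma_q \mid y_q) \propto P(y_q \mid a_q, \gamma_q)\, P(a_q \mid \gamma_q)\, P(\gamma_q) \tag{2}$$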

where the symbol "∝" indicates that $P(a_q, \gamma_q \mid y_q)$ can be written as the product of the likelihood function of $y_q$, $P(y_q \mid a_q, \gamma_q) \propto \exp\{-\tfrac{1}{2}(y_q - Za_q - Z\gamma_q)' R^{-1} (y_q - Za_q - Z\gamma_q)\}$, the conditional a priori distribution of $a_q$ given $\gamma_q$, $P(a_q \mid \gamma_q) \propto \exp\{-\tfrac{1}{2}(a_q - \gamma_q)' T^{-1} (a_q - \gamma_q)\}$, and the a priori distribution of $\gamma_q$, $P(\gamma_q) \propto \exp\{-\tfrac{1}{2}\gamma_q' \Phi^{-1} \gamma_q\}$, with $R = I_g\sigma^2_{e_q}$, $T = A\sigma^2_{a_q} - \Phi$ and $\Phi = G\sigma^2_{\gamma_q}$. According to the properties of the NMV distribution (Sorensen and Gianola, 2002), $\gamma_q$ and $T$ are the expectation and the variance of $a_q \mid \gamma_q$, respectively. Thus, Equation 2 is equal to:

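$$P(a_q, \gamma_q \mid y_q) \propto \exp\left\{-\tfrac{1}{2}(\theta_q - Dd)' D^{-1} (\theta_q - Dd)\right\} \tag{3}$$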

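The right side of Equation 3 is the kernel of a normal distribution with mean $Dd$ and variance $D$, where $\theta_q' = [a_q' \;\; \gamma_q']$, $D = \begin{bmatrix} D_{11}^{-1} & D_{12}^{-1} \\ D_{21}^{-1} & D_{22}^{-1} \end{bmatrix}^{-1}$ (the blocks $D_{ij}^{-1}$ are the blocks of $D^{-1}$), $D_{11}^{-1} = R^{-1} + T^{-1}$, $D_{12}^{-1} = D_{21}^{-1} = R^{-1} - T^{-1}$, $D_{22}^{-1} = R^{-1} + T^{-1} + \Phi^{-1}$, $d = 1_2 \otimes R^{-1} y_q$, $1_2' = [1 \;\; 1]$ and "⊗" denotes the Kronecker product between matrices (Langville and Stewart, 2004).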
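The step from the product of the three kernels to a single normal kernel can be checked by completing the square. The derivation below is a sketch under the assumption $Z = I_g$ (the usual case noted in the text); $Q(\theta_q)$ is a symbol introduced here for minus twice the log of the posterior kernel.

```latex
\begin{align*}
Q(\theta_q) &= (y_q - a_q - \gamma_q)' R^{-1} (y_q - a_q - \gamma_q)
             + (a_q - \gamma_q)' T^{-1} (a_q - \gamma_q)
             + \gamma_q' \Phi^{-1} \gamma_q \\
 &= a_q' (R^{-1} + T^{-1}) a_q
    + 2\, a_q' (R^{-1} - T^{-1}) \gamma_q
    + \gamma_q' (R^{-1} + T^{-1} + \Phi^{-1}) \gamma_q \\
 &\quad - 2\, (a_q + \gamma_q)' R^{-1} y_q + y_q' R^{-1} y_q \\
 &= \theta_q' D^{-1} \theta_q - 2\, \theta_q' d + y_q' R^{-1} y_q \\
 &= (\theta_q - Dd)' D^{-1} (\theta_q - Dd) + \text{terms free of } \theta_q .
\end{align*}
```

The coefficients of $a_q' a_q$, $a_q' \gamma_q$ and $\gamma_q' \gamma_q$ in the second line are the three blocks of $D^{-1}$ listed above, and the linear term gives $d = 1_2 \otimes R^{-1} y_q$.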

Estimator of $\theta_q$

From Equation 3, the empirical Bayes estimator of $\theta_q' = [a_q' \;\; \gamma_q']$ is:

$$\hat{\theta}_q = Dd \tag{4}$$

The variance components $\sigma^2_{a_q}$, $\sigma^2_{\gamma_q}$ and $\sigma^2_{e_q}$ can be estimated from the marginal distribution of $y_q$ using restricted maximum likelihood (Lynch and Walsh, 1998; Vattikuti et al., 2012).
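A minimal numerical sketch of the estimator $Dd$ for one trait follows, assuming $Z = I_g$ and known variance components. The matrices `A` and `G`, the data `y` and the variance values are hypothetical and were chosen only so that $T = A\sigma^2_{a_q} - \Phi$ and $\Phi$ are positive definite; the restricted maximum likelihood step is not shown.

```python
import numpy as np

def empirical_bayes_univariate(y, A, G, var_a, var_gamma, var_e):
    """Posterior mean D d of theta_q = (a_q, gamma_q) for one centered trait."""
    g = y.shape[0]
    R_inv = np.eye(g) / var_e
    Phi = var_gamma * G
    T_inv = np.linalg.inv(var_a * A - Phi)      # T = A*var_a - Phi
    Phi_inv = np.linalg.inv(Phi)
    # Blocks of D^{-1} as listed in the text.
    D_inv = np.block([
        [R_inv + T_inv, R_inv - T_inv],
        [R_inv - T_inv, R_inv + T_inv + Phi_inv],
    ])
    d = np.concatenate([R_inv @ y, R_inv @ y])  # 1_2 (Kronecker) R^{-1} y
    theta_hat = np.linalg.solve(D_inv, d)       # D d without forming D
    return theta_hat[:g], theta_hat[g:]

# Hypothetical inputs (not from the maize or wheat data sets).
A = np.eye(3)
G = np.array([[1.0, 0.2, 0.1],
              [0.2, 0.9, 0.3],
              [0.1, 0.3, 1.1]])
y = np.array([0.5, -1.0, 0.5])
var_a, var_gamma, var_e = 2.0, 1.0, 1.0
a_hat, gamma_hat = empirical_bayes_univariate(y, A, G, var_a, var_gamma, var_e)
```

Solving the linear system with `np.linalg.solve` avoids inverting the $2g \times 2g$ matrix explicitly, which is a design choice of this sketch rather than something prescribed by the text.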

The multivariate linear model

When two or more traits are used in the prediction of the breeding value, the combined linear model from Equation 1 can be written as:

$$y = Za + Z\gamma + e \tag{5}$$

where, now, $y' = [y_1' \; y_2' \cdots y_t'] \sim \mathrm{NMV}(0, V)$, $a' = [a_1' \; a_2' \cdots a_t'] \sim \mathrm{NMV}(0, S)$, $\gamma' = [\gamma_1' \; \gamma_2' \cdots \gamma_t'] \sim \mathrm{NMV}(0, \Omega)$ and $e' = [e_1' \; e_2' \cdots e_t'] \sim \mathrm{NMV}(0, \Psi)$ are vectors made up of $t$ subvectors of size $g \times 1$ of observations ($y$), of additive genetic effects unexplained by the MM ($a$), of additive genomic effects ($\gamma$) and of errors ($e$), respectively; $V = S + 3\Omega + \Psi$, where $S = C \otimes A$, $\Omega = \Gamma \otimes G$ and $\Psi = E \otimes I_g$; $C = \{\sigma_{a_{qi}}\}$ ($q, i = 1, 2, \ldots, t$; $t$ = number of traits) is the matrix of variances and covariances of the additive genetic effects unexplained by the MM ($a$), $\Gamma = \{\sigma_{\gamma_{qi}}\}$ is the matrix of variances and covariances of the additive genomic effects ($\gamma$), and $E = \{\sigma_{e_{qi}}\}$ is the matrix of variances and covariances of the residuals; $Z$ is an identity (or incidence) matrix of order $gt \times gt$; $A$, $G$ and $I_g$ were defined for Equation 1. The matrices $C$ and $\Gamma$ can be formed from the estimates of the variance components $\sigma^2_{a_q}$, $\sigma^2_{\gamma_q}$ and $\sigma^2_{e_q}$ and of the respective covariances (Vattikuti et al., 2012).

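Estimation of $a$ and $\gamma$

Let $\theta' = [a' \;\; \gamma']$ be a vector formed by $a' = [a_1' \; a_2' \cdots a_t']$ and $\gamma' = [\gamma_1' \; \gamma_2' \cdots \gamma_t']$ (Equation 5); the posterior distribution of $\theta$ has the same form as that of $\theta_q' = [a_q' \;\; \gamma_q']$ (Equation 3); thus, the empirical Bayes estimator of $\theta$ has the same form as Equation 4, i.e.,

$$\hat{\theta}_{BE} = Dd \tag{6}$$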

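where, now, the blocks that make up the matrix $D^{-1}$ are: $D_{11}^{-1} = \Psi^{-1} + (S - \Omega)^{-1}$, $D_{12}^{-1} = D_{21}^{-1} = \Psi^{-1} - (S - \Omega)^{-1}$ and $D_{22}^{-1} = \Psi^{-1} + (S - \Omega)^{-1} + \Omega^{-1}$; $d = 1_2 \otimes \Psi^{-1} y$, $\Psi^{-1} = E^{-1} \otimes I_g$ and $1_2' = [1 \;\; 1]$.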

Prediction of the breeding value in the first cycle of selection

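In the first selection cycle the predictor of the breeding value of the candidates for selection ($\hat{\bar{\theta}}$) can be written as:

$$\hat{\bar{\theta}} = \hat{a} + \hat{\gamma} \tag{7}$$

where $\hat{a}$ and $\hat{\gamma}$ are sub-vectors of $\hat{\theta}_{BE} = Dd$ (Equation 6).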
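The multivariate estimator and the first-cycle predictor can be sketched with Kronecker products as follows. This is an illustration under stated assumptions: $Z$ is the identity, the covariance matrices `C`, `Gamma` and `E` are treated as known, observations are stacked trait by trait, and all numerical values are hypothetical (chosen so that $S - \Omega$ and $\Omega$ are positive definite). The predictor is taken as the sum of the two sub-vectors, following assumption 3 of the model.

```python
import numpy as np

def empirical_bayes_multivariate(y, A, G, C, Gamma, E):
    """Return (a_hat, gamma_hat, breeding value) for t traits stacked in y."""
    g = A.shape[0]
    S = np.kron(C, A)                            # S = C (x) A
    Omega = np.kron(Gamma, G)                    # Omega = Gamma (x) G
    Psi_inv = np.kron(np.linalg.inv(E), np.eye(g))
    T_inv = np.linalg.inv(S - Omega)
    Omega_inv = np.linalg.inv(Omega)
    D_inv = np.block([
        [Psi_inv + T_inv, Psi_inv - T_inv],
        [Psi_inv - T_inv, Psi_inv + T_inv + Omega_inv],
    ])
    d = np.concatenate([Psi_inv @ y, Psi_inv @ y])
    theta_hat = np.linalg.solve(D_inv, d)
    n = y.shape[0]
    a_hat, gamma_hat = theta_hat[:n], theta_hat[n:]
    return a_hat, gamma_hat, a_hat + gamma_hat   # sum of the two sub-vectors

# Hypothetical inputs: t = 2 traits, g = 3 genotypes.
A = np.eye(3)
G = np.array([[1.0, 0.2, 0.1],
              [0.2, 0.9, 0.3],
              [0.1, 0.3, 1.1]])
C = np.array([[2.0, 0.5], [0.5, 2.0]])
Gamma = np.array([[1.0, 0.2], [0.2, 1.0]])
E = np.array([[1.0, 0.1], [0.1, 1.0]])
y = np.array([0.5, -1.0, 0.5, 1.0, 0.0, -1.0])   # trait 1 then trait 2
a_hat, gamma_hat, bv_hat = empirical_bayes_multivariate(y, A, G, C, Gamma, E)
```

With real data the matrices would come from the restricted maximum likelihood estimates mentioned earlier; that estimation step is outside this sketch.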

Prediction of the breeding value after the first selection cycle

To obtain the predicted values for the candidates for selection from the second selection cycle onward, it is necessary to estimate the values of the vector $u' = [u_1' \; u_2' \cdots u_t']$ in the training population from the equation $\gamma = X_t u$, where $X_t = I_t \otimes X$, $I_t$ is the $t \times t$ identity matrix and $X$ is the matrix of coded MM values in the training population. An estimator of $u$ in the training population is:

(8)

where $\hat{\gamma}$ is the sub-vector of Equation 6. From Equation 8, the empirical Bayes predictor of the breeding value after the first selection cycle is:

$$\hat{\gamma}_l = W_l \hat{u} \tag{9}$$

where $W_l = I_t \otimes X_l$ ($l = 2, 3, \ldots, N$; $N$ = number of selection cycles), $I_t$ was defined previously, and $X_l$ is the matrix of coded MM values obtained in selection cycle $l$. Thus, from the second selection cycle onward, the only thing that changes in Equation 9 is the set of coded values in matrix $X_l$.
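The two steps just described (estimate the marker effects in the training population, then multiply them by the coded markers of a later cycle) can be sketched as below. The estimator of $u$ used here is an assumption of the example: it is the minimum-norm least-squares solution of $\gamma = X_t u$ obtained with the Moore-Penrose pseudo-inverse, which may differ from the estimator used by the authors. Function names and data are hypothetical.

```python
import numpy as np

def estimate_marker_effects(X_train, gamma_hat, t):
    """Solve gamma = X_t u for u, with X_t = I_t (x) X.

    Assumption: minimum-norm least-squares solution via the pseudo-inverse.
    """
    X_t = np.kron(np.eye(t), X_train)
    return np.linalg.pinv(X_t) @ gamma_hat

def predict_next_cycle(X_cycle, u_hat, t):
    """Predicted genomic values W_l u_hat, with W_l = I_t (x) X_l."""
    W_l = np.kron(np.eye(t), X_cycle)
    return W_l @ u_hat

# Hypothetical coded marker values: t = 2 traits, g = 4 genotypes, m = 3 markers.
rng = np.random.default_rng(0)
t, g, m = 2, 4, 3
X_train = rng.normal(size=(g, m))
u_true = rng.normal(size=t * m)
gamma_hat = np.kron(np.eye(t), X_train) @ u_true   # stands in for the Equation 6 sub-vector

u_hat = estimate_marker_effects(X_train, gamma_hat, t)
pred_same = predict_next_cycle(X_train, u_hat, t)  # reproduces gamma_hat
X_new = rng.normal(size=(5, m))                    # markers of a later cycle
pred_new = predict_next_cycle(X_new, u_hat, t)
```

Only the marker matrix changes between cycles; `u_hat` is estimated once in the training population and reused.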

Criterion to compare the efficiency of the prediction models

Since accuracy is the correlation between predicted and observed values, its maximum value is 1. Let $\rho_c$ and $\rho_g$ denote the accuracy of the combined and the genomic linear model, respectively; then:

$$p = 100\left(\frac{\rho_c}{\rho_g} - 1\right) \tag{10}$$

is the efficiency (Bulmer, 1980) of the combined linear model with regard to the genomic linear model. Thus, if $p = 0$, the efficiency of both models is equal ($\rho_c = \rho_g$); if $\rho_c > \rho_g$, $p > 0$ (the efficiency of the combined model is greater than that of the genomic model); and if $\rho_c < \rho_g$, $p < 0$ (the efficiency of the combined model is lower than that of the genomic model). Thus, Equation 10 allows determining the most efficient linear model to predict the breeding value.
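The efficiency criterion can be written as a small helper. The sketch assumes the form $p = 100(\rho_c/\rho_g - 1)$, which is the form used in the worked values later in the text; the function name and the example accuracies are hypothetical.

```python
def relative_efficiency(rho_c, rho_g):
    """Efficiency (in percent) of a model with accuracy rho_c relative to
    a reference model with accuracy rho_g: p = 100 * (rho_c / rho_g - 1)."""
    if rho_g == 0:
        raise ValueError("the reference accuracy must be nonzero")
    return 100.0 * (rho_c / rho_g - 1.0)

# Hypothetical accuracies, not taken from Table 1.
p_example = relative_efficiency(0.8, 0.5)   # positive: first model is more efficient
p_equal = relative_efficiency(0.6, 0.6)     # zero: both models are equally efficient
p_lower = relative_efficiency(0.4, 0.5)     # negative: first model is less efficient
```

Because $p$ is a ratio, it grows quickly when the reference accuracy is small, which is worth keeping in mind when averaging $p$ over traits.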

Results and Discussion

The genomic model is nested in the combined model

One of the most important results in the theory of GS is that the expectation of the genomic relationship matrix $G$ is equal to the numerical relationship matrix $A$, i.e., $E(G) = A$ (Habier et al., 2007). This means that $G$ is a particular realization of $A$ and that, as the number of MM and genotypes increases in the training population, the value of $G$ will tend to concentrate around $A$, so that it can be assumed that, at the limit, $G = A$. The same is true of the additive genomic variance and covariance matrix $\Gamma$ in relation to the additive genetic variance and covariance matrix $C$. That is, when the number of MM and genotypes increases, the matrix $\Gamma$ approaches $C$ and, at the limit, $\Gamma = C$. When $G = A$ and $\Gamma = C$, $S = \Omega$ and the blocks that make up the matrix $D^{-1}$, $D_{11}^{-1} = \Psi^{-1} + (S - \Omega)^{-1}$, $D_{12}^{-1} = D_{21}^{-1} = \Psi^{-1} - (S - \Omega)^{-1}$ and $D_{22}^{-1} = \Psi^{-1} + (S - \Omega)^{-1} + \Omega^{-1}$, are reduced to $D_{11}^{-1} = \Psi^{-1}$, $D_{12}^{-1} = D_{21}^{-1} = \Psi^{-1}$ and $D_{22}^{-1} = \Psi^{-1} + \Omega^{-1}$, and the matrix $D$ will be equal to $\begin{bmatrix} \Psi^{-1} & \Psi^{-1} \\ \Psi^{-1} & \Psi^{-1} + \Omega^{-1} \end{bmatrix}^{-1}$.

This indicates that all the breeding value information is concentrated in the additive genomic effects $\gamma$ and that the values of vector $a$ are null. In such a case, the empirical Bayes estimator $\hat{\theta}_{BE} = Dd$ (Equation 6) becomes the predictor of the additive genomic merit ($\hat{\gamma}$) and can be denoted as:

(11)

This result indicates that the genomic linear model is a particular case of the combined linear model.

The model with only phenotypic information is nested in the combined model

When the information of the MM is not used, matrix $\Omega$ is null and, in such a case, $\hat{\theta}_{BE}$ becomes the predictor of the additive genetic effects ($\hat{a}$) and can be written as:

(12)

This shows that the linear model with only phenotypic information is a particular case of the combined linear model. The predictor $\hat{a}$ will be called the standard predictor.

Accuracy of the three prediction models

The predicted breeding values of the candidates for selection associated with each of the three traits of the two maize populations (Populations 1 and 2) and of the wheat population (Population 3) were denoted as $\hat{\bar{\theta}}_1$, $\hat{\bar{\theta}}_2$ and $\hat{\bar{\theta}}_3$ for the combined model (Equation 7); $\hat{\gamma}_1$, $\hat{\gamma}_2$ and $\hat{\gamma}_3$ for the genomic model (Equation 11); and $\hat{a}_1$, $\hat{a}_2$ and $\hat{a}_3$ for the standard model (Equation 12). With the predicted and the observed values, the accuracy (correlation between the predicted and observed values) was calculated for each of the three traits under the three models; these values are shown in Table 1.
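The accuracy calculation described above amounts to one Pearson correlation per trait. The sketch below assumes that predicted and observed values are stacked trait by trait in vectors of length $gt$; the function name and the toy values are hypothetical and are not the values behind Table 1.

```python
import numpy as np

def accuracy_by_trait(predicted, observed, t):
    """Pearson correlation between predicted and observed values, per trait.

    Both inputs are vectors of length g*t stacked trait by trait.
    """
    pred = np.asarray(predicted, dtype=float).reshape(t, -1)
    obs = np.asarray(observed, dtype=float).reshape(t, -1)
    return [float(np.corrcoef(pred[q], obs[q])[0, 1]) for q in range(t)]

# Hypothetical toy values: t = 2 traits, g = 4 genotypes.
observed = np.array([1.0, 2.0, 3.0, 4.0, 2.0, 1.0, 4.0, 3.0])
predicted = np.concatenate([2.0 * observed[:4] + 1.0,   # perfectly aligned with trait 1
                            -observed[4:]])             # perfectly reversed for trait 2
acc = accuracy_by_trait(predicted, observed, t=2)
```

The first toy trait gives the maximum accuracy of 1 and the second gives the minimum of -1, which brackets the range the criterion can take.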

Table 1: Correlations obtained between the predicted values of the standard, genomic and combined models, and the values of the observations of three traits in two maize populations and one wheat population.

†Grain yield (Mg ha⁻¹); ¶Ear height (cm); §Plant height (cm).

Numerical evaluation of the three prediction models

The efficiency of the combined model with regard to the standard model and to the genomic model, and the efficiency of the standard model with regard to the genomic model, were evaluated through Equation 10 with the correlation values presented in Table 1.

Maize population 1

Efficiency of the combined model compared with the standard model

The value of $p$ (Equation 10) associated with the correlations between grain yield (RG) and its predicted values ($\hat{\bar{\theta}}_1$ and $\hat{a}_1$) was calculated as $p = 100\left(\frac{0.883}{0.551} - 1\right) = 60.2$, where 0.883 was the correlation between RG and $\hat{\bar{\theta}}_1$, and 0.551 was the correlation between RG and $\hat{a}_1$. Given that $p = 60.2$, the efficiency of the combined model was 60.2 % higher than that of the standard model.

The value of $p$ obtained from the correlation between ear height (AM) and $\hat{\bar{\theta}}_2$ (0.767) and the correlation between AM and $\hat{a}_2$ (0.719) was $p = 100\left(\frac{0.767}{0.719} - 1\right) = 6.7$. Given that $p = 6.7$, the combined model was 6.7 % more efficient than the standard model. Finally, the value of $p$ for plant height (AP) and its predicted values was $p = 100\left(\frac{0.830}{0.229} - 1\right) = 262.4$, where 0.830 was the estimated correlation between AP and $\hat{\bar{\theta}}_3$, and 0.229 was the estimated correlation between AP and $\hat{a}_3$. In this last case, the combined model was 262.4 % more efficient than the standard model.

The average of the three values of $p$ obtained with the correlations between the predicted and observed values of the three traits was 109.8 %. This means that the combined model was more adequate for predicting the breeding value, because its efficiency was on average about 1.1 times higher than that of the standard model.
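The averaging step can be written out explicitly from the three rounded values of $p$ reported above. The symbols $\bar{p}$, $\rho_{c,q}$ and $\rho_{s,q}$ (accuracies of the combined and standard models for trait $q$) are introduced here only for the illustration.

```latex
\begin{align*}
\bar{p} &= \frac{60.2 + 6.7 + 262.4}{3} \approx 109.8, \\
\frac{1}{3}\sum_{q=1}^{3} \frac{\rho_{c,q}}{\rho_{s,q}}
        &= 1 + \frac{\bar{p}}{100} \approx 2.1 .
\end{align*}
```

In other words, an average efficiency of 109.8 % corresponds to an average accuracy ratio of about 2.1, and most of that average comes from plant height, whose standard-model accuracy (0.229) was low.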

Efficiency of the combined model compared with the genomic model

The average efficiency of the combined model was 366.9 % higher than that of the genomic model. This is because the estimated correlations between AM and $\hat{\gamma}_2$ (0.130) (Column 7, Table 1) and between AP and $\hat{\gamma}_3$ (0.123) (Column 8, Table 1) were the lowest. In this case the efficiency of the combined model was about 3.7 times higher than that of the genomic model, so the combined model is more adequate for predicting the breeding value in this data set.

Efficiency of the standard model compared with the genomic model

Again, due to the low estimated correlations between AM and $\hat{\gamma}_2$ (0.130) and between AP and $\hat{\gamma}_3$ (0.123), the average efficiency of the standard model was 174.7 % higher than that of the genomic model. That is, on this scale the efficiency of the standard model was almost twice as much higher than that of the genomic model. This is because in maize Population 1 the number of markers was only 199.

In short, the efficiency of the combined model was almost 4 times higher than that of the genomic model and about 1.1 times higher than that of the standard model. It is evident, then, that the combined model is more adequate for predicting the breeding value than the other two models in this data set.

Maize population 2

Efficiency of the combined model compared with the standard model and with the genomic model

A procedure similar to the one used for maize Population 1 shows that the average efficiency of the combined model was 9.4 % higher than that of the standard model and 38.2 % higher than that of the genomic model. Although the number of markers increased relatively little in maize Population 2 (259 MM versus 199 in maize Population 1), the efficiency of the combined model with regard to the genomic model was only 38.2 % higher, which indicates that the increase in the number of markers increased the efficiency of the genomic model. However, the combined model was more efficient than the other two models, so it is also advisable to use it to predict the breeding value in this data set.

Efficiency of the standard model compared with the genomic model

The average efficiency of the standard model was only 29.4 % higher than that of the genomic model. This result indicates that the increment in the number of MM increased the efficiency of the genomic model.

Population 3

Efficiency of the combined model compared with the standard model and with the genomic model

The average efficiency of the combined model was only 0.2 % and 16.15 % higher than that of the standard model and the genomic model, respectively. Given the number of MM (1279) and of genotypes (599) in Population 3, the efficiency of the combined model with regard to the standard and genomic models was lower than in maize Populations 1 and 2. In this case, both the combined model and the standard model could be adequate for predicting the breeding value.

The results from Population 3 are explained by the fact that the accuracy of the standard model (Equation 12) is very high (Table 1), because the grain yield data come from an autogamous species. Therefore, although the number of MM is large, the markers contribute very little to the accuracy of the combined model.

Efficiency of the standard model compared with the genomic model

Although the number of MM was relatively high, the average efficiency of the standard model was 15.9 % higher than that of the genomic model. As indicated above, this is because the accuracy of the standard model for this population is very high (Table 1). However, the correlations obtained in the genomic model between the predicted and the observed values were higher in Population 3 than in maize Populations 1 and 2 (Table 1), which suggests that the precision of the genomic model increased with the number of MM.

According to the results for the three populations, the combined model was in general more efficient than the other two models, although, as the number of markers and genotypes increased, the efficiency of the combined model with regard to the genomic model decreased. The efficiency of the combined model observed in the three populations can be attributed to the fact that the model used two sources of information in the prediction: phenotypic and genomic. Thus, if the combined model is used in the first selection cycle, the precision of the selection in that cycle will increase.

Advantage of the genomic model with regard to the standard model

In GS the usual way of predicting plant and animal breeding values in breeding programs is to replace the numerical relationship matrix ($A$) with the genomic relationship matrix ($G$) in the prediction equations. Therefore, the prediction equations of the genomic model (Equation 11) and the standard model (Equation 12) are formally equivalent. When the number of MM and genotypes is large, both models tend to provide increasingly similar predictions (Table 1, Population 3). However, the advantage of the genomic model over the standard model lies in the possibility of reducing the intervals between selection cycles by up to two thirds. Thus, the genomic model is more efficient than the standard model when efficiency is measured per year and not per selection cycle. According to Beyene et al. (2015), genomic selection requires 1.5 years to complete a selection cycle, while phenotypic selection requires 4 years per selection cycle.
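The per-year comparison can be made concrete with the two cycle lengths quoted from Beyene et al. (2015). The last line is a sketch that assumes the response per year is proportional to accuracy divided by cycle length, with selection intensity and genetic standard deviation equal in both schemes; $\rho_g$ and $\rho_s$ denote the per-cycle accuracies of the genomic and standard models.

```latex
\begin{align*}
\frac{4\ \text{years}}{1.5\ \text{years}} &\approx 2.67\ \text{genomic cycles per phenotypic cycle}, \\
1 - \frac{1.5}{4} &= 0.625\ \text{(relative reduction in cycle length)}, \\
\frac{\rho_g / 1.5}{\rho_s / 4} &= \frac{8}{3}\,\frac{\rho_g}{\rho_s} > 1
  \iff \rho_g > 0.375\,\rho_s .
\end{align*}
```

Under that assumption, the genomic model gives more progress per year whenever its per-cycle accuracy exceeds 37.5 % of the accuracy of the standard model.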

Importance of the combined model

There are several Bayesian (Gianola, 2013) and non-Bayesian (VanRaden, 2008) methods to predict the breeding value in the univariate context under the assumption that the number of genotypes and MM in the base population is sufficiently large. In practice, however, not all the candidates for selection (plants or animals) have molecular marker data. A model such as the one proposed could be easily adapted to this case, thus increasing the precision of the prediction.

Empirical Bayes compared with GBLUP

Because the MM effects have a multivariate normal distribution, empirical Bayes and GBLUP should give very similar results (Robinson, 1991) when the same prediction model is used. This is because the assumptions of GBLUP and empirical Bayes are basically the same and because, when the variances of the parameters are known, GBLUP is considered a particular case of the Bayesian methods (Blasco, 2001).

Finally, how should the breeding value be predicted: through the proposed empirical Bayes method, through GBLUP, or with one of the existing Bayesian approximations? The standard Bayesian models provide better control of the uncertainty associated with the prediction of the breeding value (de los Campos et al., 2013; Gianola, 2013), but this is attained at a high computational cost (Verbyla et al., 2009). GBLUP, in turn, requires knowledge of the variances of the parameters; when these variances are unknown, the statistical properties of GBLUP are also unknown (Gianola, 2013). According to Blasco (2001), the choice of one prediction model over another should be based on whether the chosen model offers a solution that the others do not, on how easy it is to solve the problem, and on the trust placed in its results. This last point has the greatest importance: if researchers feel comfortable with a specific method, it means that they know its limitations and advantages and know what to expect from the model when using it in a specific statistical analysis.

Conclusions

The model proposed, with joint information on pedigree and genome within the empirical Bayes context, provided more precise predictions than the other two models because the predictions incorporate not only the phenotypic and genomic information, but also the genetic correlations between traits.

Literature Cited

Blasco, A. 2001. The Bayesian controversy in animal breeding. J. Anim. Sci. 79: 2023-2046.

Beyene, Y., K. Semagn, S. Mugo, A. Tarekegne, R. Babu, B. Meise, P. Sehabiague, D. Makumbi, C. Magorokosho, S. Oikeh, J. Gakunga, M. Vargas, M. Olsen, B. M. Prasanna, M. Banziger, and J. Crossa. 2015. Genetic gains in grain yield through genomic selection in eight bi-parental maize populations under drought stress. Crop Sci. 55: 154-163.

Bulmer, M. G. 1980. The Mathematical Theory of Quantitative Genetics. Lectures in Biomathematics. University of Oxford: Clarendon Press. 254 p.

Calus, M. P. L., and R. F. Veerkamp. 2011. Accuracy of multi-trait genomic selection using different methods. Genet. Selection Evol. 43: 26. http://www.gsejournal.org/content/43/1/26. (Accessed: February 2015).

Casella, G., and E. I. George. 1992. Explaining the Gibbs sampler. The Am. Stat. 46: 167-174.

de los Campos, G., J. M. Hickey, R. Pong-Wong, H. D. Daetwyler, and M. P. L. Calus. 2013. Whole-genome regression and prediction methods applied to plant and animal breeding. Genetics 193: 327-345.

Gianola, D. 2013. Priors in whole-genome regression: the Bayesian alphabet returns. Genetics 194: 573-596.

Hayashi, T., and H. Iwata. 2013. A Bayesian method and its variational approximation for prediction of genomic breeding values in multiple traits. BMC Bioinf. 14: 34.

Habier, D., R. L. Fernando, and J. C. M. Dekkers. 2007. The impact of genetic relationship information on genome-assisted breeding values. Genetics 177: 2389-2397.

Jia, Y., and J. L. Jannink. 2012. Multiple-trait genomic selection methods increase genetic value prediction accuracy. Genetics 192: 1513-1522.

Langville, A. N., and W. J. Stewart. 2004. The Kronecker product and stochastic automata networks. J. Comp. Appl. Math. 167: 429-44.

Legarra, A., I. Aguilar, and I. Misztal. 2009. A relationship matrix including full pedigree and genomic information. J. Dairy Sci. 92: 4656-4663.

Lehmann, E. L., and G. Casella. 1998. Theory of Point Estimation. 2nd Ed. Springer-Verlag New York. 589 p.

Lynch, M., and B. Walsh. 1998. Genetics and Analysis of Quantitative Traits. Sinauer Associates, Inc. Publisher Sunderland, Massachusetts, USA. 980 p.

Massman, J. M., A. Gordillo, R. E. Lorenzana, and R. Bernardo. 2013. Genomewide predictions from maize single-cross data. Theor. Appl. Genet. 126: 13-22.

Meuwissen, T. H. E., B. J. Hayes, and M. E. Goddard. 2001. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819-1829.

Robinson, G. K. 1991. That BLUP is a good thing: The estimation of random effects. Stat. Sci. 6: 15-51.

Sorensen, D., and D. Gianola. 2002. Likelihood, Bayesian, and MCMC Methods in Quantitative Genetics. Springer, New York. 740 p.

Tempelman, R. J., and G. J. M. Rosa. 2004. Empirical Bayes approach to mixed model inference in quantitative genetics. In: Saxton, A. M. (ed). Genetics Analysis of Complex Traits Using SAS. Cary N.C., SAS Institute Inc. pp: 149-176.

VanRaden, P. M. 2008. Efficient methods to compute genomic predictions. J. Dairy Sci. 91: 4414-4423.

Vattikuti, S., J. Guo, and C. C. Chow. 2012. Heritability and genetic correlations explained by common SNPs for metabolic syndrome traits. PLoS Genet 8 (3): e1002637. DOI: 10.1371/journal.pgen.1002637.

Verbyla, K. L., B. J. Hayes, P. J. Bowman, and M. E. Goddard. 2009. Accuracy of genomic selection using stochastic search variable selection in Australian Holstein Friesian dairy cattle. Genet. Res. Camb. 91: 307-311.

Verbyla, K. L., P. J. Bowman, B. J. Hayes, and M. E. Goddard. 2010. Sensitivity of genomic selection to using different prior distributions. BMC Proceedings 4 (Suppl. 1): S5.

Received: February 2015; Accepted: February 2016

* Author for correspondence: jesusceronrojas@live.com.mx

This is an open-access article published under a Creative Commons license.
