Comparing the Intercept Mixture Model with the Slack-Variable Mixture Model

Cruz-Salgado, Javier


Ingeniería, investigación y tecnología

Online version ISSN 2594-0732; print version ISSN 1405-7743

Ing. invest. y tecnol. vol.17 no.3 Ciudad de México Jul./Sep. 2016

Articles


Javier Cruz-Salgado¹

¹ Universidad Politécnica del Bicentenario, México, Investigación y Desarrollo Tecnológico. E-mail: jcruzs@upbicentenario.edu.mx

Abstract:

Mixture experiments are experiments performed using ingredients whose proportions are restricted. These restrictions may result in an extremely small range of feasible mixtures, causing difficulties in model fitting that arise from ill-conditioning. The choice of model form is a very important factor in the numerical stability of the information matrix. In this paper, the intercept model is compared with the slack-variable model for mixture experiments. We analyze whether it matters, in terms of numerical stability, which component is replaced by the constant term in the intercept model. We also show through numerical examples that the Correlation Criterion presented in Kang et al. (2015) does not work for the intercept model. Next, as suggested in the literature, we use linear transformations to alleviate the numerical instability: we try four transformation methods and choose the best one for the intercept model and for the slack-variable model. Finally, we compare the intercept model with the slack-variable model based mainly on prediction accuracy and numerical stability.

Keywords: mixture experiments; intercept model; slack-variable model; variable transformation; condition number; variance inflation factor

Summary:

Mixture experiment designs are designs carried out using ingredients whose proportions are subject to restrictions. These restrictions can result in an extremely small range of feasible mixtures, causing difficulties in model fitting due to collinearity problems. The choice of model form is an important factor in the numerical stability of the information matrix. In this article, the intercept model is compared with the slack-variable model in mixture experiments. We analyzed whether it matters, in terms of numerical stability, which component is replaced by the constant term in the intercept model. Likewise, numerical examples show that the Correlation Criterion presented by Kang et al. (2015) does not work for the intercept model. Then, as suggested in the literature, numerical transformations were used to mitigate the numerical instability. Additionally, four transformation methods were tested and the best one was selected for both the intercept model and the slack-variable model. Finally, the intercept model was compared with the slack-variable model based mainly on prediction accuracy and numerical stability.

Keywords: mixture experiment designs; intercept model; slack-variable model; variable transformation; condition number; variance inflation factor

Introduction

A mixture experiment is one in which the response depends only on the relative proportions of the ingredients, or components, present in the mixture; these proportions are the design variables. In such experiments, a product is developed by mixing different components. Mixture experiments frequently appear in the chemical, pharmaceutical, food and plastics industries. In certain types of mixture experiments, the total amount of the mixture, or process variables, are also involved as design variables (Piepel and Cornell, 1985; Goldfarb et al., 2004). In this paper, we focus on mixture experiments in which the component proportions are the only input variables.

If x_i denotes the proportion of the ith of q components, then x_i ≥ 0 for i = 1, 2, ..., q, and

$$\sum_{i=1}^{q} x_i = 1 \qquad (1)$$

Commonly the design region (1) is subject to additional constraints of the form

$$0 \le L_i \le x_i \le U_i \le 1 \qquad (2)$$

on one or several components. These additional restrictions may result in an extremely small range of feasible mixtures. Not only is the experimental region constrained, but the model fitted to the mixture data must also satisfy the constraint. This can cause difficulties in model fitting arising from ill-conditioning; that is, the columns of the corresponding model matrix can be almost linearly dependent (Prescott et al., 2002). Consequences of ill-conditioning include least squares estimators of the parameters that have large standard errors and are highly correlated, and estimates that depend strongly on the precise location of the design points.

Data from a mixture experiment are usually modelled using Scheffé's polynomial models (Scheffé, 1958). The quadratic Scheffé model has the general form:

$$\eta = \sum_{i=1}^{q} \beta_i x_i + \sum_{i=1}^{q-1}\sum_{j=i+1}^{q} \beta_{ij} x_i x_j$$

where η is the expected response and β_i and β_ij are unknown parameters to be estimated.

Because of the mixture constraint in Eq. (1), the quadratic Scheffé model involves only linear and cross-product terms, but it can be re-parameterized to include squared terms (Philip and Norman, 2009). In fact, there are a number of different ways of writing a polynomial model of any specified order, obtained by re-parameterization using the mixture constraint (Prescott et al., 2002).

Alternative polynomial model forms include the intercept models, which are obtained by replacing one mixture component, for example x_q, by a constant term. A related model is the slack-variable model, in which one variable (designated the slack variable) is entirely eliminated by substitution; that is, it is expressed in terms of the remaining q − 1 components using Eq. (1) and then substituted into Scheffé's model (Piepel and Cornell, 1985). The difference between the two is that the slack-variable model has squared terms, while the intercept model has only linear and interaction terms (Cornell, 2000; Philip and Norman, 2009; André, 2005). Such re-parameterized models are equivalent in the sense that they lead to the same predicted values and the same basic analysis of variance (Philip and Norman, 2009); the two models are re-parameterizations of one another and both lead to the same fitted response contours and residuals. With x_q as the replaced (or slack) component, the equations may be expressed as follows, using the same symbols for parameters that are common to the different models:

Intercept model

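$$\eta = \beta_q + \sum_{i=1}^{q-1} \gamma_i x_i + \sum_{i=1}^{q-1}\sum_{j=i+1}^{q} \beta_{ij} x_i x_j \qquad (3)$$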

Slack-variable model

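$$\eta = \beta_q + \sum_{i=1}^{q-1} \alpha_i x_i + \sum_{i=1}^{q-1} \alpha_{ii} x_i^2 + \sum_{i=1}^{q-2}\sum_{j=i+1}^{q-1} \alpha_{ij} x_i x_j \qquad (4)$$

where γ_i = β_i − β_q, α_i = β_i − β_q + β_iq, α_ii = −β_iq and α_ij = β_ij − (β_iq + β_jq).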
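As a quick numerical check of the parameter relations above, the following Python sketch evaluates Scheffé's quadratic model, the intercept model (3) and the slack-variable model (4) at random points of the simplex, with x_q as the replaced (or slack) component. The coefficient values are hypothetical and chosen only for illustration, and the function names are ours, not the paper's. If the relations for γ_i, α_i, α_ii and α_ij hold, the three predictions coincide at every point that satisfies Eq. (1).

```python
import numpy as np

# Hypothetical Scheffé coefficients for q = 3 components (illustration only).
q = 3
beta = {1: 2.0, 2: 5.0, 3: 3.0}                    # linear coefficients beta_i
beta2 = {(1, 2): 4.0, (1, 3): -1.0, (2, 3): 6.0}   # cross-product coefficients beta_ij, i < j


def b2(i, j):
    """beta_ij with the two indices given in either order."""
    return beta2[(min(i, j), max(i, j))]


def cross_terms(x):
    """Sum of beta_ij x_i x_j over all pairs i < j <= q."""
    return sum(b2(i, j) * x[i - 1] * x[j - 1]
               for i in range(1, q + 1) for j in range(i + 1, q + 1))


def scheffe(x):
    return sum(beta[i] * x[i - 1] for i in range(1, q + 1)) + cross_terms(x)


def intercept_model(x):
    # Eq. (3): constant beta_q, gamma_i = beta_i - beta_q; cross-products still include x_q.
    gamma = {i: beta[i] - beta[q] for i in range(1, q)}
    return beta[q] + sum(gamma[i] * x[i - 1] for i in range(1, q)) + cross_terms(x)


def slack_model(x):
    # Eq. (4): x_q is eliminated, so only x_1, ..., x_{q-1} appear.
    val = beta[q]
    for i in range(1, q):
        alpha_i = beta[i] - beta[q] + b2(i, q)
        alpha_ii = -b2(i, q)
        val += alpha_i * x[i - 1] + alpha_ii * x[i - 1] ** 2
        for j in range(i + 1, q):
            alpha_ij = b2(i, j) - (b2(i, q) + b2(j, q))
            val += alpha_ij * x[i - 1] * x[j - 1]
    return val


rng = np.random.default_rng(0)
points = rng.dirichlet(np.ones(q), size=5)   # random blends satisfying Eq. (1)
for x in points:
    print(scheffe(x), intercept_model(x), slack_model(x))
```

The agreement holds only on the simplex: away from ∑x_i = 1 the three expressions differ, which is what it means for them to be re-parameterizations of one another under the mixture constraint.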

The pros and cons of using intercept models or slack-variable models instead of Scheffé's model have generated much discussion among researchers and practitioners (André, 2005). Cornell (2000) discussed this issue; one of the questions he raised was “does it matter which component is designated the slack variable?” He attempted to answer this question by discussing three numerical examples. André (2005) revisited the same issue from a different perspective, placing emphasis on model equivalence through the column spaces of the matrices associated with the fitted models. It was shown that the reduced models (submodels) of Scheffé's complete model and of its corresponding slack-variable models provide different types of information: for some reduced models of a given size Scheffé's model may provide the best fit, while for others some slack-variable models may be preferred. Prescott et al. (2002) propose an alternative pseudocomponent-type transformation that leads to model coefficients representing predictions at a wider selection of points within the design space, which gives the coefficients a much more meaningful interpretation. Prescott and Draper (2002) compare the Scheffé model, the Kronecker model and the intercept model. Recently, Kang et al. (2015) proposed a new criterion, named the “Correlation criterion”, for choosing the best slack-variable model among those obtained by using different components as the slack variable; this criterion is based only on the mixture design.

In this paper, we analyze whether it matters, in terms of numerical stability, which component is replaced by the constant term in the intercept model. We also show through numerical examples that the Correlation Criterion presented in Kang et al. (2015) does not work for the intercept model. Next, as suggested in the literature, we use linear transformations to alleviate the numerical instability: we try four transformation methods and choose the best one for the intercept model and for the slack-variable model. Finally, we compare the intercept model with the slack-variable model based mainly on prediction accuracy and numerical stability.

Method

Diagnostic measures

Below we briefly explain some diagnostic measures that help detect collinearity (Cornell, 2003; Montgomery and Voth, 1994; Prescott et al., 2002).

Multiple correlation coefficient

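We define x_j as the jth column of the model matrix X and X_j as the matrix that results when column x_j is deleted from X. Then R_j² is the squared multiple correlation coefficient obtained by regressing x_j on X_j. When the first column of X is a constant column, R_j² is usually calculated, for j = 2, ..., p, as

$$R_j^2 = \frac{\mathbf{x}_j' X_j (X_j' X_j)^{-1} X_j' \mathbf{x}_j - (\mathbf{1}'\mathbf{x}_j)^2/n}{\mathbf{x}_j'\mathbf{x}_j - (\mathbf{1}'\mathbf{x}_j)^2/n} \qquad (5)$$

where 1 is a column of unit elements and n is the number of rows of X.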

When the column of constants of the X matrix is not available, the unadjusted multiple correlation coefficient can be obtained by

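$$R_j^2 = \frac{\mathbf{x}_j' X_j (X_j' X_j)^{-1} X_j' \mathbf{x}_j}{\mathbf{x}_j'\mathbf{x}_j} \qquad (6)$$

for j = 1, ..., p.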

Variance inflation factor

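The variance inflation factor (VIF) associated with the estimated regression coefficient β̂_j is given by

$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2} \qquad (7)$$

Small values of VIF indicate good conditioning (VIF_j = 1 when R_j² = 0); large values indicate that x_j is nearly a linear combination of the other columns.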

To evaluate the overall collinearity level of a model, the mean variance inflation factor (MVIF) is used:

$$\mathrm{MVIF} = \frac{1}{p}\sum_{j} \mathrm{VIF}_j \qquad (8)$$

where the sum runs over the p effects in the model, excluding the intercept.
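Eqs. (5) to (8) can be computed directly from the model matrix. The sketch below is a minimal NumPy version, assuming the intercept (when present) is stored in column 0 of X; the function names are hypothetical. The quantity x_j'X_j(X_j'X_j)^{-1}X_j'x_j is obtained from a least squares fit of x_j on X_j rather than by forming the inverse explicitly, which is numerically safer when X is ill-conditioned.

```python
import numpy as np


def r_squared(X, j, centered):
    """R_j^2 from Eq. (5) (centered=True) or Eq. (6) (centered=False)."""
    xj = X[:, j]
    Xj = np.delete(X, j, axis=1)
    # Least squares fit of x_j on the remaining columns; Xj @ coef is the projection of x_j.
    coef, *_ = np.linalg.lstsq(Xj, xj, rcond=None)
    explained = xj @ (Xj @ coef)                  # x_j' X_j (X_j'X_j)^{-1} X_j' x_j
    total = xj @ xj                               # x_j' x_j
    if centered:
        correction = xj.sum() ** 2 / X.shape[0]   # (1'x_j)^2 / n
        explained -= correction
        total -= correction
    return explained / total


def vif_summary(X, has_intercept=True):
    """Return (VIFs, maxVIF, MVIF) following Eqs. (7) and (8)."""
    cols = range(1, X.shape[1]) if has_intercept else range(X.shape[1])
    vifs = np.array([1.0 / (1.0 - r_squared(X, j, has_intercept)) for j in cols])
    return vifs, vifs.max(), vifs.mean()


# Usage: an orthogonal two-factor design with an intercept gives VIF = 1 for each effect.
ones = np.ones(4)
a = np.array([1.0, -1.0, 1.0, -1.0])
b = np.array([1.0, 1.0, -1.0, -1.0])
X_orth = np.column_stack([ones, a, b])
vifs, max_vif, mvif = vif_summary(X_orth)
```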

Condition number

Let λ_max = λ_1 ≥ λ_2 ≥ ... ≥ λ_p = λ_min be the p eigenvalues of X'X, which are the p solutions of the determinant equation

$$\left| X'X - \lambda I \right| = 0$$

whose left-hand side is a polynomial of degree p in λ.

There are many definitions of the condition number (CN) of a matrix. The definition generally used in applied statistics is the square root of the ratio of the maximum to the minimum eigenvalue of X'X:

$$\mathrm{CN} = \sqrt{\frac{\lambda_{\max}}{\lambda_{\min}}} \qquad (9)$$

Small values of λ_min and large values of λ_max indicate the presence of collinearity. Low values of CN indicate stability, or good conditioning, of the least squares estimates.
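Equation (9) can be evaluated from the eigenvalues of X'X or, equivalently, as the ratio of the largest to the smallest singular value of X, since the eigenvalues of X'X are the squared singular values of X. The sketch below (the function name is hypothetical) follows the eigenvalue route of the text and checks it against `numpy.linalg.cond`, which returns the 2-norm condition number σ_max/σ_min.

```python
import numpy as np


def condition_number(X):
    """CN = sqrt(lambda_max / lambda_min) of X'X, as in Eq. (9)."""
    eig = np.linalg.eigvalsh(X.T @ X)   # eigenvalues of the symmetric matrix X'X, ascending
    return np.sqrt(eig[-1] / eig[0])


X = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [0.0, 0.0]])
cn = condition_number(X)   # X'X = diag(1, 4), so CN = sqrt(4 / 1) = 2
```

For badly conditioned matrices, forming X'X squares the condition number before the eigen-decomposition, so the singular-value route is the more accurate of the two in floating point.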

Remedial measures

Linear transformations are suggested in the literature to alleviate numerical instability. Below we briefly explain three linear transformations.

L-Pseudocomponents

When ingredient proportions x_i are restricted by lower bounds L_i while retaining an upper bound of 1, Kurotori (1966) recommended using L-pseudocomponents of the form

$$z_i = \frac{x_i - L_i}{1 - L} \qquad (10)$$

where L = ∑ L_i. For the restricted mixture space to exist within the simplex, it is necessary that L < 1. Solving Eq. (10) for the original proportions gives, in the pseudocomponent space,

$$x_i = L_i + (1 - L)\, z_i \qquad (11)$$

for i = 1, ..., q (Prescott et al., 2002).
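A minimal sketch of Eqs. (10) and (11), writing the L-pseudocomponent as z_i (the symbol and function names are ours) and using hypothetical lower bounds. Because ∑(x_i − L_i) = 1 − L, the pseudocomponents again sum to 1, so the restricted region is mapped onto a full simplex.

```python
import numpy as np


def to_l_pseudo(x, lower):
    """Eq. (10): z_i = (x_i - L_i) / (1 - L)."""
    x = np.asarray(x, dtype=float)
    lower = np.asarray(lower, dtype=float)
    L = lower.sum()
    if L >= 1.0:
        raise ValueError("the lower bounds must satisfy L = sum(L_i) < 1")
    return (x - lower) / (1.0 - L)


def from_l_pseudo(z, lower):
    """Eq. (11): x_i = L_i + (1 - L) z_i."""
    z = np.asarray(z, dtype=float)
    lower = np.asarray(lower, dtype=float)
    return lower + (1.0 - lower.sum()) * z


lower = [0.1, 0.2, 0.3]      # hypothetical lower bounds, L = 0.6
x = [0.2, 0.3, 0.5]          # a feasible blend
z = to_l_pseudo(x, lower)    # (0.1, 0.1, 0.2) / 0.4 = (0.25, 0.25, 0.5)
```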

U-Pseudocomponents

When the range of each ingredient proportion x_i is restricted by an upper bound U_i only, Crosier (1984) recommended the use of U-pseudocomponents, defined as

$$v_i = \frac{U_i - x_i}{U - 1} \qquad (12)$$

where U = ∑U_i. For the U-simplex to be a region fully within the original simplex, it is necessary that U − 1 ≤ U_min (Crosier, 1984). If this requirement holds, then the pseudocomponent transformation

(13)

gives only positive multipliers of v _i .

When U − 1 > U_min, the U-simplex extends outside the original mixture simplex, and better conditioning may or may not be achieved with U-pseudocomponents, depending on the particular restrictions and the design points.
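A minimal sketch of Eq. (12) and of Crosier's containment check U − 1 ≤ U_min, writing the U-pseudocomponents as v_i and using hypothetical upper bounds; the function name is ours. As with the L-pseudocomponents, ∑(U_i − x_i) = U − 1, so the v_i sum to 1.

```python
import numpy as np


def u_pseudo(x, upper):
    """Eq. (12): v_i = (U_i - x_i) / (U - 1), plus Crosier's containment check."""
    x = np.asarray(x, dtype=float)
    upper = np.asarray(upper, dtype=float)
    excess = upper.sum() - 1.0                 # U - 1
    if excess <= 0.0:
        raise ValueError("U-pseudocomponents need U = sum(U_i) > 1")
    inside = bool(excess <= upper.min())       # U - 1 <= U_min: U-simplex inside the original simplex
    return (upper - x) / excess, inside


v, inside = u_pseudo([0.3, 0.3, 0.4], upper=[0.4, 0.4, 0.5])        # U - 1 = 0.3 <= 0.4
_, inside_wide = u_pseudo([0.3, 0.3, 0.4], upper=[0.5, 0.6, 0.7])   # U - 1 = 0.8 > 0.5
```

In the second call the U-simplex extends outside the original simplex, which is the case where, as noted above, better conditioning is not guaranteed.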

Modified L-pseudocomponents

For the modified L-pseudocomponents, we first calculate the average of each component over the N observations, x̄_i = ∑_{u=1}^{N} x_{ui}/N, where ∑_{i=1}^{q} x̄_i = 1. Next, we calculate the differences x̄_i − L_i, i = 1, 2, ..., q. Suppose the kth component has the minimum difference, d_k = x̄_k − L_k ≤ d_i = x̄_i − L_i for i ≠ k. Then, instead of transforming all the components to L-pseudocomponents as in (10), we use the averages x̄_i (i ≠ k) of the other q − 1 components to define the modified L-pseudocomponents

(14)

where

is a scale constant (Philip and Norman, 2009).
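Equation (14) itself is not reproduced here, but the preliminary steps described above (component averages, their distances to the lower bounds, and the choice of k) can be sketched as follows. The design and lower bounds are hypothetical, the function name is ours, and k is returned as a 0-based column index.

```python
import numpy as np


def modified_l_reference(X, lower):
    """Return the averages xbar_i, the differences d_i = xbar_i - L_i and the index k of the minimum."""
    X = np.asarray(X, dtype=float)
    xbar = X.mean(axis=0)                        # xbar_i = sum_u x_ui / N
    d = xbar - np.asarray(lower, dtype=float)    # d_i = xbar_i - L_i
    k = int(np.argmin(d))                        # component with the smallest difference
    return xbar, d, k


X = [[0.2, 0.3, 0.5],
     [0.4, 0.3, 0.3],
     [0.3, 0.4, 0.3]]
xbar, d, k = modified_l_reference(X, lower=[0.1, 0.3, 0.2])
```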

Correlation criterion

Kang et al. (2015) proposed a new criterion, named the “Correlation criterion”, for choosing the best slack-variable model among those that use different components as the slack variable. Below we present the Correlation criterion.

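Denote by F the n × q design matrix for a mixture experiment with q components and n experimental settings. Define r_ij as the correlation between the ith and jth columns of F:

$$r_{ij} = \frac{\left(F(,i) - \bar{F}_i \mathbf{1}\right)'\left(F(,j) - \bar{F}_j \mathbf{1}\right)}{\sqrt{\left\lVert F(,i) - \bar{F}_i \mathbf{1}\right\rVert^2 \left\lVert F(,j) - \bar{F}_j \mathbf{1}\right\rVert^2}} \qquad (15)$$

where F(,i) is the ith column of F, F̄_i is the sample mean of F(,i), and 1 is a vector of 1's. To evaluate whether the ith column is collinear with any of the other columns, we can use its average squared correlation with the other q − 1 columns. Denote it by r_i²; it can be calculated by

$$r_i^2 = \frac{1}{q-1}\sum_{j \ne i} r_{ij}^2 = \frac{1}{q-1}\sum_{j \ne i} \frac{\left(F(,i)' H F(,j)\right)^2}{\left(F(,i)' H F(,i)\right)\left(F(,j)' H F(,j)\right)} \qquad (16)$$

where H = I − (1/n)11' and I is the n × n identity matrix. The component with the largest r_i² is then chosen as the slack variable (Kang et al., 2015).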
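Eqs. (15) and (16) translate into a few lines of NumPy. The sketch below (the function name is ours) centers the columns with H, forms the correlation matrix, averages each row's squared off-diagonal correlations and returns the index of the largest r_i². As an illustration it uses the three-component simplex-centroid design, the design type of Example 2 below but without the replicated centroid; by symmetry all r_i² are equal there, so the criterion cannot discriminate between the components.

```python
import numpy as np


def correlation_criterion(F):
    """Return (r_ij matrix, r_i^2 vector, index of the largest r_i^2) for an n x q design F."""
    F = np.asarray(F, dtype=float)
    n, q = F.shape
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix H = I - 11'/n
    G = F.T @ H @ F                              # G[i, j] = F(,i)' H F(,j)
    s = np.sqrt(np.diag(G))
    R = G / np.outer(s, s)                       # Eq. (15)
    r2 = ((R ** 2).sum(axis=1) - 1.0) / (q - 1)  # Eq. (16); subtract r_ii^2 = 1
    return R, r2, int(np.argmax(r2))


third = 1.0 / 3.0
F = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
              [0.5, 0.5, 0], [0.5, 0, 0.5], [0, 0.5, 0.5],
              [third, third, third]])
R, r2, best = correlation_criterion(F)
```

With ties, `np.argmax` simply returns the first index, so a tie like this one should be reported rather than resolved silently.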

Results and discussion

Four examples to evaluate the correlation criterion in the intercept model

In this section, we compute the correlation criterion in four examples chosen from the literature.

Example 1

We consider the example used by Cornell and Gorman (2009) involving three components and seven design points in the reduced region constrained by the inequalities

First, we compute the correlation criterion. The calculations are presented below.

According to the correlation criterion, x₂ should be replaced by a constant term. Tables 1 and 2 present the VIFs associated with the estimated regression coefficients, the MVIF and the CN for the three intercept models and the three slack-variable models, respectively. We use IM_xi to denote the intercept model in which x_i is replaced by a constant term, and SV_xi to denote the slack-variable model that uses x_i as the slack variable. Table 1 shows that replacing the component with the largest correlation criterion value by a constant term does not yield the most stable intercept model. On the other hand, Table 2 shows that choosing the component with the largest correlation criterion value as the slack variable does yield the most stable slack-variable model.

Table 1 VIFs, MVIF and CN for IM_xi, Example 1, original scale.

Table 2 VIFs, MVIF and CN for SV_xi, Example 1, original scale.

Example 2

Cornell (2000) considered a three-component mixture experiment involving the tint strength of house paint blends. A simplex-centroid design was used, with the centroid replicated three times. In this example, computing the correlation criterion gives the same value of r_i² for the three components (see Tables 3 and 4). Thus, in this case the correlation criterion does not help decide which component should be replaced by the constant term or selected as the slack variable.

Table 3 VIFs, MVIF and CN for IM_xi, Example 2, original scale.

Table 4 VIFs, MVIF and CN for SV_xi, Example 2, original scale.

Example 3

Cornell and Gorman (2003) present a numerical example with three components: ethanol (x₁), water (x₂) and propylene glycol (x₃). The experiment consists of a seven-point design. The constraints on the component proportions were 0.15 ≤ x₁ ≤ 0.50, 0.20 ≤ x₂ ≤ 0.70 and 0.15 ≤ x₃ ≤ 0.65. As Table 5 shows, once again the correlation criterion does not identify the most stable intercept model.

Table 5 VIFs, MVIF and CN for IM_xi, Example 3, original scale.

Table 6 VIFs, MVIF and CN for SV_xi, Example 3, original scale.

Example 4

Cornell (2002) presents an example named The Formulation of a Tropical Beverage, in which a tropical beverage was formulated by combining juices of watermelon (x₁), orange (x₂), pineapple (x₃) and grapefruit (x₄). In this example, the component with the largest r_i² is x₁; however, as Tables 7 and 8 show, the most stable intercept model is obtained by replacing component x₂, and the most stable slack-variable model is obtained by using any of the three components x₂, x₃ or x₄ as the slack variable.

Table 7 VIFs, MVIF and CN for IM_xi, Example 4, original scale.

Table 8 VIFs, MVIF and CN for SV_xi, Example 4, original scale.

To summarize this section, the intercept model can be used to alleviate the collinearity problem. As the examples in this section show, the choice of which component is replaced by the constant term is crucial for numerical stability. However, this choice cannot be made according to the correlation criterion. We recommend that practitioners directly construct each possible intercept model matrix and choose the best one according to the maximum VIF (maxVIF), MVIF and CN criteria.
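The recommendation above can be automated. The sketch below builds, for each choice of replaced component, the intercept model matrix of Eq. (3) (constant, the remaining q − 1 linear terms and all q(q − 1)/2 cross-products) and reports maxVIF, MVIF and CN for each. The helper names are hypothetical. VIF_j is computed as SST_j/SSE_j, which equals 1/(1 − R_j²) under Eq. (5) because every intercept model matrix contains a constant column, and CN is computed as σ_max/σ_min of the model matrix, which equals Eq. (9). The simplex-centroid design is used only as a demonstration; because it is symmetric in the three components, all three intercept models come out equally conditioned.

```python
import itertools
import numpy as np


def intercept_model_matrix(X, m):
    """Model matrix of Eq. (3) with component m (0-based) replaced by the constant term."""
    n, q = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(q) if i != m]
    cols += [X[:, i] * X[:, j] for i, j in itertools.combinations(range(q), 2)]
    return np.column_stack(cols)


def centered_vifs(M):
    """VIF_j = SST_j / SSE_j for every non-constant column (column 0 is the constant)."""
    n = M.shape[0]
    vifs = []
    for j in range(1, M.shape[1]):
        xj = M[:, j]
        Mj = np.delete(M, j, axis=1)
        coef, *_ = np.linalg.lstsq(Mj, xj, rcond=None)
        sse = np.sum((xj - Mj @ coef) ** 2)
        sst = xj @ xj - xj.sum() ** 2 / n
        vifs.append(sst / sse)
    return np.array(vifs)


def compare_intercept_models(X):
    """maxVIF, MVIF and CN for every intercept model IM_x1, ..., IM_xq."""
    X = np.asarray(X, dtype=float)
    rows = []
    for m in range(X.shape[1]):
        M = intercept_model_matrix(X, m)
        v = centered_vifs(M)
        s = np.linalg.svd(M, compute_uv=False)   # singular values, descending
        rows.append({"model": f"IM_x{m + 1}", "maxVIF": v.max(),
                     "MVIF": v.mean(), "CN": s[0] / s[-1]})
    return rows


third = 1.0 / 3.0
design = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
                   [0.5, 0.5, 0], [0.5, 0, 0.5], [0, 0.5, 0.5],
                   [third, third, third]])
rows = compare_intercept_models(design)
for row in rows:
    print(row)
```

How the three criteria are weighed against each other when they disagree is left to the practitioner; the sketch only tabulates them.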

Linear transformations

To analyze mixture experiments, it is suggested to apply a linear transformation to the component proportions to reduce the ill-conditioning problem. However, different transformation methods work best for different mixture models. For the slack-variable model, Kang et al. (2015) recommend scaling the design proportions into [-1, 1], which is the typical scale used in the classical design and analysis of experiments. For other mixture models, the L-, U- and modified L-pseudocomponent transformations are recommended in the literature. Thus, we try all four transformation methods and choose the best one for the intercept model and for the slack-variable model.
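A sketch of the [-1, 1] scaling, assuming each component is mapped linearly from its observed minimum and maximum in the design; using the constraint bounds L_i and U_i instead is an equally plausible reading, so the function (hypothetically named `scale_to_unit_range`) accepts either. The design values are hypothetical.

```python
import numpy as np


def scale_to_unit_range(X, low=None, high=None):
    """Map each column linearly so that `low` goes to -1 and `high` goes to +1."""
    X = np.asarray(X, dtype=float)
    low = X.min(axis=0) if low is None else np.asarray(low, dtype=float)
    high = X.max(axis=0) if high is None else np.asarray(high, dtype=float)
    return 2.0 * (X - low) / (high - low) - 1.0


X = np.array([[0.2, 0.3, 0.5],
              [0.4, 0.3, 0.3],
              [0.3, 0.4, 0.3],
              [0.25, 0.35, 0.4]])
S = scale_to_unit_range(X)                                      # observed min/max per column
S_bounds = scale_to_unit_range(X, low=[0.1, 0.3, 0.2], high=[0.5, 0.5, 0.6])  # hypothetical bounds
```

Note that the scaled columns no longer sum to 1; the transformation is applied to the component proportions before the model terms are formed.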

Example 5

In this example, we used the data of Example 1 (Cornell and Gorman, 2009) and applied the four transformations in order to choose the best one according to maxVIF, MVIF and CN. Based on Table 9, scaling the component proportions into the [-1, 1] range works best for both models in this example. In this case, the slack-variable model is the most parsimonious model.

Table 9 Comparison of the complete IM and SV models using the Example 1 data, with different transformations.

Example 6

For Example 6, we used the data of Example 4 (Cornell, 2002) and applied the four transformations in order to choose the best one according to maxVIF, MVIF and CN. Based on Table 10, the modified L-pseudocomponent transformation works best for the intercept model in this example, while scaling the component proportions into the [-1, 1] range works best for the slack-variable model. In this case, the intercept model is the most parsimonious model.

Table 10 Comparison of the complete IM and SV models using the Example 4 data, with different transformations.

Numerical stability and prediction accuracy comparison

The choice of model form can affect the numerical stability of the information matrix. In this section, we compare the intercept model with the slack-variable model based mainly on prediction accuracy and numerical stability.

Example 7

In this section, we use the example presented in St. John (1984). The experiment involves an additive x₁ and three lubricant blends x₂, x₃ and x₄. The component proportions need to satisfy

First, we apply the different transformation methods to the two models and choose the best one according to maxVIF, MVIF and CN (Table 11). According to Table 11, scaling the component proportions into the [-1, 1] range works best for both models. In this case, the slack-variable model is the most parsimonious model.

Table 11 Comparison of the complete IM and SV models using the Example 7 data, with different transformations.

Table 12 presents the analysis of variance (ANOVA) for the slack-variable model using the Example 7 data, with the component proportions scaled into the [-1, 1] range. As Table 12 shows, the interaction x₂x₄ and the quadratic term x24 have no statistically significant effect on the response according to their P-values at the 99% confidence level. Thus, both terms can be removed from the model.

Table 12 ANOVA for the slack-variable model in the [-1, 1] range. Significance codes: 0.01 “*”.

On the other hand, Table 13 presents the ANOVA for the intercept model using the Example 7 data, with the component proportions scaled into the [-1, 1] range. As Table 13 shows, the interaction terms x₂x₄ and x₃x₄ have no statistically significant effect on the response according to their P-values at the 99% confidence level. Thus, again both terms can be removed from the model.

Table 13 ANOVA for the intercept model in the [-1, 1] range. Significance codes: 0.01 “*”.

Table 14 compares the reduced slack-variable model with the reduced intercept model.

Table 14 Comparison of the reduced intercept model and slack-variable model in the [-1, 1] range.

R² and R²_adj are the coefficient of determination and its adjusted version, and σ̂ is based on the MSE from the ANOVA. LOOCV and 13-fold CV are the leave-one-out and 13-fold cross-validation prediction errors. Both reduced models have 8 terms, including the intercept, and the same fit quality. Table 14 shows that the slack-variable model has the better prediction accuracy and is the more parsimonious model in terms of numerical stability.
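For a least squares fit, the leave-one-out error does not require n refits: the ith deleted residual equals e_i/(1 − h_ii), where h_ii is the ith diagonal element of the hat matrix. The sketch below computes LOOCV this way and k-fold CV by explicit refitting, assuming the reported prediction error is the mean squared prediction error (the text does not say whether a mean or a root mean was used). The function names and the toy data are hypothetical. With k equal to the number of runs, k-fold CV reduces to LOOCV, which the usage lines can be used to confirm.

```python
import numpy as np


def loocv_mse(M, y):
    """Leave-one-out mean squared prediction error via the hat-matrix shortcut."""
    H = M @ np.linalg.pinv(M)          # hat matrix M (M'M)^{-1} M' for full column rank M
    e = y - H @ y                      # ordinary residuals
    h = np.diag(H)
    return np.mean((e / (1.0 - h)) ** 2)


def kfold_mse(M, y, k, seed=0):
    """k-fold mean squared prediction error by refitting on each training split."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    sq_errors = []
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        coef, *_ = np.linalg.lstsq(M[train], y[train], rcond=None)
        sq_errors.extend((y[test] - M[test] @ coef) ** 2)
    return np.mean(sq_errors)


# Toy quadratic regression (hypothetical data, not from the paper).
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 10)
M = np.column_stack([np.ones_like(t), t, t ** 2])
y = 1.0 + 2.0 * t - t ** 2 + 0.1 * rng.standard_normal(t.size)
print(loocv_mse(M, y), kfold_mse(M, y, k=5))
```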

Conclusion

In this paper, we analyzed whether it matters, in terms of numerical stability, which component is replaced by the constant term in the intercept model. Through the numerical examples in the section Four examples to evaluate the correlation criterion in the intercept model, we showed that the intercept model can be used to reduce the ill-conditioning problem. In addition, evidence was given that the choice of which component is replaced by the constant term is crucial for numerical stability. Moreover, we computed the correlation criterion and determined that it does not work for the intercept model; that is, the choice of which component should be replaced by a constant term cannot be made according to the correlation criterion. We recommend that practitioners directly construct each possible intercept model matrix and choose the best one according to the maxVIF, MVIF and CN criteria.

In the section Linear transformations, we tried four transformation methods and chose the best one for the intercept model and for the slack-variable model. Scaling the component proportions into the [-1, 1] range works best for the slack-variable model, while the modified L-pseudocomponent transformation works best for the intercept model in some cases.

Finally, in the section Numerical stability and prediction accuracy comparison, we compared the intercept model with the slack-variable model based mainly on prediction accuracy and numerical stability. The slack-variable model has the better prediction accuracy and is the more parsimonious model in terms of numerical stability.

Acknowledgements

This research was supported by CONACYT, UPB and CIATEC.

References

André. Slack-variable models versus Scheffé's mixture models. Journal of Applied Statistics, volume 32 (issue 9), 2005: 887-908.

Cornell. Fitting a slack-variable model to mixture data: some questions raised. Journal of Quality Technology, volume 32 (issue 2), 2000: 133-147.

Cornell J.A. Multiple constraints on the component proportions, in: Experiments with mixtures: designs, models and the analysis of mixture data, New York, John Wiley and Sons, 2002, pp. 141-144.

Cornell J.A. and Gorman J.W. Two new mixture models: living with collinearity but removing its influence. Journal of Quality Technology, volume 35 (issue 1), 2003: 78-88.

Crosier R.B. Mixture experiments: geometry and pseudocomponents. Technometrics, volume 26, 1984: 209-216.

Goldfarb C.M., Anderson-Cook C.M., Borror, Montgomery D.C. Fraction of design space plots for assessing mixture and mixture-process designs. Journal of Quality Technology, volume 36, 2004: 169-179.

St. John R.C. Experiments with mixtures, ill-conditioning, and ridge regression. Journal of Quality Technology, volume 16, 1984: 81-96.

Kang L., Cruz-Salgado J., Brenneman W. Comparing the slack-variable mixture model with other alternatives. Technometrics, accepted, 2015.

Kurotori I.S. Experiments with mixtures of components having lower bounds. Industrial Quality Control, volume 22, 1966: 592-596.

Montgomery D.C. and Voth S.E. Multicollinearity and leverage in mixture experiments. Journal of Quality Technology, volume 27, 1994: 96-108.

Philip and Norman D.R. Modeling in restricted mixture experiment spaces for three mixture components. Quality Technology & Quantitative Management, volume 6 (issue 3), 2009: 207-217.

Piepel and Cornell. Models for mixture experiments when the response depends on the total amount. Technometrics, volume 27, 1985: 219-227.

Prescott P., Dean A.M., Draper N.R., Lewis S.M. Mixture experiments: ill-conditioning and quadratic model specification. Technometrics, volume 44, 2002: 260-268.

Scheffé. Experiments with mixtures. Journal of the Royal Statistical Society, Series B, volume 20, 1958: 344-360.


Received: January 2016; Accepted: April 2016

* Javier Cruz-Salgado. BS, industrial engineering, Universidad Tecnologica de Leon, March 2007. MSc, manufacturing and industrial engineering, CIATEC/CONACYT, August 2012. PhD, manufacturing and industrial engineering, CIATEC/CONACYT, November 2015. Intern at the Materials Research Department, CIATEC, 2011-2012. Research stay at the Department of Applied Mathematics, Illinois Institute of Technology, 2013. Head of the Research and Technological Development Department at Universidad Politecnica del Bicentenario.

This is an open-access article distributed under the terms of the Creative Commons Attribution License