The term transformation is generally used when some function of the measured response variable y is used in place of y in a model. For example, a square root transformation may be used in a multiple linear regression model to fit a model of the form
$\sqrt{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k + e$
It may appear that transformation is a special case of curvilinear regression. However, the ideas discussed in this section apply not only to regression models but to other models as well, such as those used in the factorial experimental designs discussed in Part IV. Thus, the discussion of transformations deserves a section of its own.
There are three cases where the need for a transformation should be investigated. First, a transformation should be used if it is known from physical considerations of the system that a function of the response, rather than the response itself, is a better variable to use in the model. For example, if an analyst has measured the interarrival times y for requests and it is known that the number of requests per unit time (1/y) has a linear relationship to a certain predictor, then the transformation 1/y should be used. We will see several such examples during experimental design and analysis. Second, a transformation should be investigated if the range of the data covers several orders of magnitude and the sample size is small. In other words, if $y_{\max}/y_{\min}$ is large, a transformation of the response that reduces the range of variability should be investigated. Third, transformations are used if the homogeneous variance (homoscedasticity) assumption of the residuals is violated, as discussed next.
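The second case can be checked with a few lines of code. The following is a minimal sketch, not a prescribed procedure: the function name `range_ratio` and the sample values are hypothetical, and the text gives no fixed threshold for what counts as "large".

```python
import math


def range_ratio(ys):
    """Return ymax / ymin for strictly positive responses."""
    lo, hi = min(ys), max(ys)
    if lo <= 0:
        # The ratio is only meaningful for positive data.
        raise ValueError("responses must be strictly positive")
    return hi / lo


# Hypothetical response measurements (illustrative values only).
responses = [0.002, 0.05, 1.3, 40.0]

ratio = range_ratio(responses)
# Number of orders of magnitude spanned by the data.
orders = math.log10(ratio)
print(f"ymax/ymin = {ratio:.0f}, about {orders:.1f} orders of magnitude")
```

A ratio spanning several orders of magnitude with only a handful of observations, as in this made-up sample, is the situation in which a range-reducing transformation is worth investigating.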
If a scatter plot of the residuals versus the predicted response shows that the spread in the residuals is not homogeneous, this indicates that the residuals are still functions of the predictor variables. The assumed linear model does not completely describe the relationship, and a transformation of the response may help solve the problem. To find the transformation, compute the standard deviation of the residuals at each value of $\hat{y}$ (assuming that there is more than one residual at each value) and plot the standard deviation as a function of the mean $\bar{y}$.
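The points for such a plot can be computed as in the sketch below. It assumes replicated responses grouped by predictor setting; within a group the prediction is constant, so the standard deviation of the residuals equals the standard deviation of the responses. The function names and data are hypothetical, and the log-log slope is an added diagnostic, not something the text prescribes.

```python
import math
from statistics import mean, stdev


def spread_by_group(groups):
    """Return (mean, sample std) pairs, one per group with >= 2 observations."""
    return [(mean(ys), stdev(ys)) for ys in groups if len(ys) >= 2]


def loglog_slope(pairs):
    """Least-squares slope of log(s) versus log(mean).

    Requires positive means and positive standard deviations.
    """
    xs = [math.log(m) for m, _ in pairs]
    ys = [math.log(s) for _, s in pairs]
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical replicated measurements, one list per predictor setting.
groups = [
    [9.0, 10.0, 11.0],
    [90.0, 100.0, 110.0],
    [900.0, 1000.0, 1100.0],
]

pairs = spread_by_group(groups)   # points for the s-versus-mean plot
slope = loglog_slope(pairs)       # how fast the spread grows with the mean
print(pairs, slope)
```

In this made-up data the spread grows in proportion to the mean, so the log-log slope is close to 1; a slope near 0 would indicate roughly constant spread.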
If there are several replicated measurements of the response variable y for each given set of predictor variable values, then this plot can be prepared even before fitting a model. For each set of replicated observations, the standard deviation s and the mean $\bar{y}$ are computed and plotted on a scatter diagram. Suppose the relationship between the standard deviation s and the mean $\bar{y}$ is
$s = g(\bar{y})$
Then a transformation of the form
$w = h(y)$
may help solve the problem, where
$h(y) = \int \frac{1}{g(y)}\,dy$
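As a worked illustration, take the variance-stabilizing rule $h(y) = \int dy / g(y)$ and apply it to three assumed forms of g. The constant c and the exponent b are illustrative symbols, not taken from the text. Constant factors and constants of integration are dropped in the final column, since they do not affect whether the variance is stabilized.

```latex
\begin{align*}
g(y) &= c\,y
  &\Rightarrow\quad h(y) &= \int \frac{dy}{c\,y} = \frac{1}{c}\ln y
  &&\text{(use } w = \ln y\text{)} \\
g(y) &= c\,\sqrt{y}
  &\Rightarrow\quad h(y) &= \int \frac{dy}{c\,\sqrt{y}} = \frac{2}{c}\sqrt{y}
  &&\text{(use } w = \sqrt{y}\text{)} \\
g(y) &= c\,y^{b},\ b \neq 1
  &\Rightarrow\quad h(y) &= \int \frac{dy}{c\,y^{b}} = \frac{y^{1-b}}{c\,(1-b)}
  &&\text{(use } w = y^{1-b}\text{)}
\end{align*}
```

So a standard deviation proportional to the mean points to a logarithmic transformation, a variance proportional to the mean points to a square root, and a general power-law spread points to a power transformation with exponent $a = 1 - b$.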
A few transformations that have been found useful in practice are as follows:
FIGURE 15.2 Standard deviation versus mean response graphs can be used to determine the transformation required.
These transformations and a few others are listed in Table 15.9. In each case, y may also be shifted, and y + c (with some suitable c) may be used in place of y. This shifting is useful if there are negative or zero values and if the transformation function is not defined for these values.
If the value of the exponent a in a power transformation is not known, the following family of transformations, called the Box-Cox family, can be used:
$w = \begin{cases} \dfrac{y^{a} - 1}{a\,g^{a-1}}, & a \neq 0 \\ g \ln y, & a = 0 \end{cases}$
where g is the geometric mean of the responses:
$g = (y_1 y_2 \cdots y_n)^{1/n}$
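The family can be sketched in code as follows, assuming the commonly used normalized form $w = (y^a - 1)/(a\,g^{a-1})$ for $a \neq 0$ and $w = g \ln y$ for $a = 0$, with g the geometric mean. The function names and sample data are hypothetical.

```python
import math


def geometric_mean(ys):
    """Geometric mean of strictly positive values, computed via logarithms."""
    return math.exp(sum(math.log(y) for y in ys) / len(ys))


def box_cox(ys, a):
    """Normalized Box-Cox transformation of strictly positive responses."""
    g = geometric_mean(ys)
    if a == 0:
        # Limiting case of the power form as a -> 0.
        return [g * math.log(y) for y in ys]
    return [(y ** a - 1.0) / (a * g ** (a - 1.0)) for y in ys]


# Hypothetical responses spanning two orders of magnitude.
ys = [1.0, 10.0, 100.0]

w_log = box_cox(ys, 0)      # logarithmic member of the family
w_id = box_cox(ys, 1)       # a = 1 reduces to y - 1 (no change of shape)
w_near = box_cox(ys, 1e-6)  # should be close to the a = 0 case
print(w_log, w_id, w_near)
```

Dividing by $g^{a-1}$ keeps the transformed values on a comparable scale for different exponents, which is what makes it reasonable to compare fits obtained with different values of a.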
TABLE 15.9 Transformations to Stabilize the Variance