12.4 COMMON MISUSES OF MEANS

The following is a list of some of the mistakes that are often committed by novices:

  Using Mean of Significantly Different Values: When the mean is the correct index of central tendency for a variable, it does not automatically imply that a mean of any set of that variable will be useful. Usefulness depends upon the number of values and the variance, not only on the type of the variable. For example, it is not very useful to say that the mean CPU time per query is 505 milliseconds when the two measurements come out to be 10 and 1000 milliseconds. An analysis based on 505 milliseconds would lead nowhere close to the two possibilities. In this particular example, the mean is the correct index but is useless.
  Using Mean without Regard to the Skewness of Distribution: Another example of misuse of means is shown in Table 12.1, where response times for two different systems have been tabulated. Both have mean response times of 10. In the first case, it is useful to know the mean because the variance is low and 10 is the typical value. In the second case, the typical value is 5; hence, using 10 for the mean does not give any useful result. The variability is too large in this case.
  Multiplying Means To Get the Mean of a Product: The mean of a product of two random variables is equal to the product of means if the two random variables are independent. If x and y are correlated,

E(xy) ≠ E(x)E(y)

TABLE 12.1 System Response Times for 5 Days

           System A   System B

              10          5
               9          5
              11          5
              10          4
              10         31

Sum           50         50
Mean          10         10
Typical       10          5

Example 12.1 On a timesharing system, the total number of users and the number of subprocesses for each user are monitored. The average number of users is 23. The average number of subprocesses per user is 2. What is the average number of subprocesses? Is it 46? No! The number of subprocesses a user spawns depends upon how much load there is on the system. On an overloaded system (large number of users), users try to keep the number of subprocesses low, and on an underloaded system, users try to keep the number of subprocesses high. The two variables are correlated, and therefore, the mean of the product cannot be obtained by multiplying the means. The total number of subprocesses on the system should be continuously monitored and then averaged.
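
The point of Example 12.1 can be illustrated with a small numerical sketch. The four samples below are hypothetical (they do not come from the text) and were chosen only so that the two averages match the example: 23 users and 2 subprocesses per user, with fewer subprocesses per user as the load rises.

```python
from statistics import fmean

# Hypothetical monitoring samples (not from the text), chosen so that the
# averages match Example 12.1: 23 users and 2 subprocesses per user.
users = [10, 20, 30, 32]
subprocesses_per_user = [3.5, 2.5, 1.0, 1.0]  # drops as the load rises

# Product of the two means: E(x)E(y)
product_of_means = fmean(users) * fmean(subprocesses_per_user)

# Total subprocesses in each sample, then averaged: E(xy)
totals = [u * s for u, s in zip(users, subprocesses_per_user)]
mean_of_product = fmean(totals)

print(product_of_means)  # 46.0
print(mean_of_product)   # 36.75
```

With these assumed samples the product of the means overstates the average number of subprocesses, because the two variables are negatively correlated. Averaging the monitored totals directly avoids the problem.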

  Taking a Mean of a Ratio with Different Bases: This has already been discussed in Chapter 11 on ratio games and is discussed further in Section 12.7.
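
The first two misuses in the list above can be checked numerically. The following is a minimal sketch using the two CPU time measurements and the data of Table 12.1. It assumes that the median is an acceptable stand-in for the "typical" value, and it uses the sample standard deviation as a rough indicator of variability; neither choice is prescribed by the text.

```python
from statistics import mean, median, stdev

cpu_times_ms = [10, 1000]        # the two measurements of CPU time per query
system_a = [10, 9, 11, 10, 10]   # Table 12.1, System A
system_b = [5, 5, 5, 4, 31]      # Table 12.1, System B

# The mean of 10 and 1000 is 505, which is close to neither measurement.
print("CPU time mean:", mean(cpu_times_ms))

# Both systems have mean 10, but only System A's mean is a typical value.
# Median is used here as an assumed proxy for "typical".
for name, times in (("A", system_a), ("B", system_b)):
    print(name, "mean:", mean(times), "median:", median(times),
          "std dev:", round(stdev(times), 2))
```

For System A the mean and median agree and the spread is small. For System B the median is 5 while the single large observation pulls the mean up to 10.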

12.5 GEOMETRIC MEAN

The geometric mean of n values x1, x2, . . ., xn is obtained by multiplying the values together and taking the nth root of the product:

$$\text{geometric mean} = \left( \prod_{i=1}^{n} x_i \right)^{1/n}$$

The mean discussed in other sections is what should be termed the arithmetic mean. The arithmetic mean is used if the sum of the observations is a quantity that is of interest. Similarly, the geometric mean is used if the product of the observations is a quantity of interest.

Example 12.2 The performance improvements in the latest version of the seven layers of a new networking protocol were measured separately for each layer. The observations are listed in Table 12.2. What is the average improvement per layer? The improvements in the seven layers work in a “multiplicative” manner; that is, doubling the performance of both layer 1 and layer 2 yields a fourfold improvement in overall performance.
TABLE 12.2 Improvement in Each Layer of Network Protocol

Protocol Layer   Performance Improvement (%)

      7                    18
      6                    13
      5                    11
      4                     8
      3                    10
      2                    28
      1                     5

$$\text{Average improvement per layer} = \left[ (1.18)(1.13)(1.11)(1.08)(1.10)(1.28)(1.05) \right]^{1/7} - 1 = 0.13$$
Thus, the average improvement per layer is 13%.
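
The calculation in Example 12.2 can be reproduced with a few lines of code. This is only a sketch of the arithmetic, using the percentages of Table 12.2; the variable names are illustrative.

```python
import math

# Percentage improvements from Table 12.2, layers 7 down to 1
improvements_pct = [18, 13, 11, 8, 10, 28, 5]

# Convert each percentage to a multiplicative factor, e.g. 18% -> 1.18
factors = [1 + p / 100 for p in improvements_pct]

# Geometric mean: nth root of the product of the n factors
geometric_mean = math.prod(factors) ** (1 / len(factors))
avg_improvement = geometric_mean - 1

print(round(avg_improvement, 2))  # 0.13
```

Raising the geometric mean factor to the seventh power recovers the overall improvement across all layers, which is why it is the appropriate average here.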

Other examples of metrics that work in a multiplicative manner are as follows:

  Cache hit ratios over several levels of caches
  Cache miss ratios
  Percentage performance improvement between successive versions
  Average error rate per hop on a multihop path in a network

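The geometric mean can be considered as a function gm( ), which maps a set of responses {x1, x2, . . ., xn} to a single number. It has the following multiplicativity property:

$$gm\left( \frac{x_1}{y_1}, \frac{x_2}{y_2}, \ldots, \frac{x_n}{y_n} \right) = \frac{gm(x_1, x_2, \ldots, x_n)}{gm(y_1, y_2, \ldots, y_n)}$$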

That is, the geometric mean of a ratio is the ratio of the geometric means of the numerator and denominator. Thus, the choice of the base does not change the conclusion. It is because of this property that sometimes the geometric mean is recommended for ratios. However, if the geometric mean of the numerator or denominator does not have any physical meaning, the geometric mean of their ratio is meaningless as well. Means of ratios are discussed further in Section 12.7.
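
The multiplicativity property can be verified numerically. The sketch below defines gm( ) as a small helper and applies it to hypothetical paired measurements x and y, which are illustrative only and not taken from the text.

```python
import math

def gm(values):
    """Geometric mean: nth root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))

# Hypothetical paired measurements (illustrative only)
x = [2.0, 8.0, 4.0]
y = [1.0, 2.0, 4.0]

# Geometric mean of the ratios versus ratio of the geometric means
ratios = [xi / yi for xi, yi in zip(x, y)]
lhs = gm(ratios)
rhs = gm(x) / gm(y)

print(math.isclose(lhs, rhs))  # True
```

The comparison uses math.isclose rather than exact equality because the nth root is computed in floating point.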



Copyright © John Wiley & Sons, Inc.