Chapter 8 Statistical Intervals for a Single Sample 8-1 INTRODUCTION z An interval estimate for a population parameter is called a confidence interval, the confidence interval is constructed so that we have high confidence that it does contain the unknown population parameter. z A tolerance interval is another important type of interval estimate. z For a normal distribution, we know that 95% of the distribution is in the interval µ − 1.96σ, µ + 1.96σ z (8-1) However, this is not a useful tolerance interval because the parameters µ and σ are unknown. z Point estimates such as x and s can be used in Equation 8-1 for µ and σ . The result is an interval of the form x − ks , x + ks (8-2) where k is an appropriate constant (that is larger than 1.96 to account for the estimation error). z A prediction interval provides bounds on one (or more) future observations from the population. z Keep the purpose of the three types of interval estimates clear: z A confidence interval bounds population or distribution parameters (such as the mean viscosity). z A tolerance interval bounds a selected proportion of a distribution. A prediction interval bounds future observations from the z population or distribution. 8-2 CONFIDENCE INTERVAL ON THE MEAN OF A NORMAL DISTRIBUTION, VARIANCE KNOWN 8-2.1 Development of the Confidence Interval and its Basic Properties z Suppose that X1, X2, , Xn is a random sample from a normal distribution with unknown mean µ and known variance σ 2 . z Sample mean X is normally distributed with mean µ and variance σ2 / n . X −µ σ/ n z Standardize X : Z = z A confidence interval (CI) estimate for µ is an interval of the form (8-3) l ≤ µ ≤ u , where the endpoints l and u are computed from the sample data. z Because different samples will produce different values of l and u, these end-points are values of random variables L and U. z Determine values of L and U such that the following probability statement is true: P{L ≤ µ ≤ U } = 1 − α z Once we have selected the sample, (8-4) so that X 1 = x1 , X 2 = x2 , … , X n = x n , and computed l and u, the resulting confidence interval for µ is l ≤µ≤u z (8-5) The end-points or bounds l and u are called the lower- and upper-confidence limits. z 1 − α is called the confidence coefficient. X −µ ⎧ ⎫ ≤ zα / 2 ⎬ = 1 − α P ⎨− z α / 2 ≤ σ n ⎩ ⎭ σ σ ⎫ ⎧ ≤ X − µ ≤ zα / 2 ⋅ P ⎨− z α / 2 ⋅ ⎬ = 1− α n n ⎩ ⎭ σ σ ⎫ ⎧ ≤ −µ ≤ − X + z α / 2 ⋅ P ⎨− X − z α / 2 ⋅ ⎬ = 1− α n n⎭ ⎩ σ σ ⎫ ⎧ ≤ µ ≤ X − zα / 2 ⋅ P⎨ X + zα / 2 ⋅ ⎬ = 1− α n n⎭ ⎩ 1− α α/2 − zα /2 α/2 + zα /2 Definition If is the sample mean of a random sample of size n from a normal population with known variance σ 2 , a 100(1 − α )% CI on µ is given by x − zα / 2 σ / n ≤ µ ≤ x + zα / 2σ / n (8-7) where z α / 2 σ is the upper 100α / 2 percentage point of the standard normal distribution. EXAMPLE 8-1 z Ten measurements of impact energy (J) on specimens of A238 steel cut at 60ºC are as follows: 64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, and 64.3. z Assume that impact energy is normally distributed with σ = 1J. z Find a 95% CI for µ . z The required quantities are z α / 2 = z 0.025 = 1.96 , n = 10 , σ = 1 , and x = 64.46 . 95% 2.5% − z 0 .025 2.5% z 0 .025 P( z < z 0.025 ) = (1 − 95%) / 2 + 95% = 97.5% ⇒ z 0.025 = 1.96 σ σ ≤ µ ≤ x + zα / 2 n n 1 1 64.46 − 1.96 ≤ µ ≤ 64.46 + 1.96 10 10 63.84 ≤ µ ≤ 65.08 x − zα / 2 z Find a 99% CI for µ ! z Confidence Level and Precision of Estimation z In Example 8-1 z At α = 0.01, we find z α / 2 = z 0.01 / 2 = z 0.005 = 2.58 , while for α = 0.05 , z 0.025 = 1.96 . z The length of the 95% confidence interval 2(1.96σ / n ) = 3.92σ / n z The length of the 99% CI 2(2.58σ / n ) = 5.16σ / n z Thus, the 99% CI is longer than the 95% CI. z Generally, for a fixed sample size n and standard deviation σ , the higher the confidence level, the longer the resulting CI. z The length of a confidence interval is a measure of the precision of estimation. CI ∝ 1 percision of estimation 8-2.2 Choice of Sample Size z The precision of the confidence interval in Equation 8-7 is 2 zσ / 2σ / n . z The error E = x − µ is less than or equal to z α / 2 σ / n with confidence 100(1 − α) . See Fig. 8-2. Figure 8-2 Error in estimating µ with x . z The appropriate sample size is found by choosing n such that zα / 2σ / n = E . Definition If is used as an estimate of µ , we can be 100(1 − α) % confident that the error x − µ will not exceed a specified amount E when the sample size is ⎛ z σ⎞ n = ⎜ α/2 ⎟ ⎝ E ⎠ 2 , CI = 2 E (8-8) z Notice that 2E is the length of the resulting confidence interval. EXAMPLE 8-2 z In Example 8-1, suppose that we wanted to determine how many specimens must be tested to ensure that the 95% CI on µ for A238 steel cut at 60°C has a length of at most 1.0J. z Since the bound on error in estimation E is one-half of the length of the CI, to determine n we use Equation 8-8 with E = 0.5 , σ = 1, and z α / 2 = 0.025 . ⎛ z σ⎞ ⎡ (1.96)1⎤ n = ⎜ α/2 ⎟ = ⎢ = 15.37 ≅ 16 ⎣ 0.5 ⎥⎦ ⎝ E ⎠ 2 2 8-2.3 One-Sided Confidence Bounds Definition A 100(1 − α) % upper-confidence bound for µ is µ ≤ u = x + zα σ / n (8-9) and a 100(1 − α) % lower-confidence bound for µ is x − zα σ / n = l ≤ µ (8-10) 8-2.5 A Large-Sample Confidence Interval for µ z It turns out that when n is large, replacing σ by the sample standard deviation S has little effect on the distribution of Z. Definition When n is large, the quantity X −µ S/ n has an approximate standard normal distribution. Consequently, x − zα / 2 s s ≤ µ ≤ x + zα / 2 n n (8-13) is a large sample confidence interval for µ , with confidence level of approximately 100(1 − α) %. z Generally n should be at least 40 to use this result reliably. EXAMPLE 8-3 z A sample of fish was selected from 53 Florida lakes z Mercury concentration in the muscle tissue was measured (ppm). z The mercury concentration values are z The summary statistics from Minitab are displayed below: Descriptive Statistics: Concentration Variable N Mean Median TrMean StDev SE Mean Concentration 53 0.5250 0.4900 0.5094 0.3486 0.0479 Variable Maximum Q1 Q3 1.3300 0.7900 Minimum Concentration 0.0400 z 0.2300 Figure 8-3(a) and (b) presents the histogram and normal probability plot of the mercury concentration data. z Find an approximate 95% CI on µ . z n = 53, x = 0.5250, s = 0.3486, and z 0.025 = 1.96. z Because n>40, we can use Eq 8-13. s s ≤ µ ≤ x + z 0.025 n n 0.3486 0.3486 ≤ µ ≤ 0.5250 + 1.96 0.5250 − 1.96 53 53 0.4311 ≤ µ ≤ 0.6189 x − z 0.025 z Find an approximate 99% CI on µ . Figure 8-3 Mercury concentration in largemouth bass (a) Histogram. (b) Normal probability plot. z A General Large Sample Confidence Interval. z The large-sample confidence interval for µ in Equation 8-13 is a special case. z Suppose that θ is a parameter of a probability distribution. z ˆ be an estimator of θ . Let Θ z A large-sample approximate CI for θ is given by θˆ − z α / 2 σ Θˆ ≤ θ ≤ θˆ + z α / 2 σ Θˆ Exercise 8-2 : 8-1, 8-3, 8-5, 8-7, 8-9, 8-11, 8-13, 8-15. (8-14) 8-3 CONFIDENCE INTERVAL ON THE MEAN OF A NORMAL DISTRIBUTION, VARIANCE UNKNOWN z We wish to construct a two-sided CI on µ . z If the variance σ 2 is known, we know that Z = ( X − µ) /(σ / n ) has a standard normal distribution. z When σ 2 is unknown, a logical procedure is to replace σ with the sample standard deviation S. z The random variable Z now becomes T = ( X − µ) /( S / n ) . z A logical question is what effect does replacing σ by S have on the distribution of the random variable T? z If n is large, the answer to this question is “very little.” z However, n is usually small in most engineering problems, and in this situation a different distribution must be employed to construct the CI. 8-3.1 The t Distribution Definition Let X 1 , X 2 , ... , X n be a random sample from a normal distribution with unknown mean µ and unknown variance σ 2 . The random variable T= X −µ S/ n has a t distribution with n - 1 degrees of freedom. (8-15) z The t probability density function f ( x) = Γ[(k + 1) / 2] 1 ⋅ 2 -∞ < x < ∞ ( k +1) / 2 πk Γ(k / 2) [( x / k ) + 1] (8-16) where k is the number of degrees of freedom. The mean and variance of the t distribution are zero and k/(k - 2) (for k > 2), respectively. z Appendix Table IV provides percentage points of the t distribution. z Let t α ,k be the value of the random variable T with k degrees of freedom above which we find an area (or probability) α . Figure 8-5 Percentage points of the t distribution. z Ex: The t-value with 10 degrees of freedom having an area of 0.05 to the right is t 0.05,10 = 1.812 . 查表 P (T10 > t 0.05,10 ) = P (T10 > 1.812) = 0.05 99% 0.05% 0.05% t 0.05,10 8-3.3 The t Confidence Interval on µ Definition If and s are the mean and standard deviation of a random sample from a normal distribution with unknown variance σ 2 , a 100(1 − α) percent confidence interval on µ is given by x − t α / 2,n−1 s / n ≤ µ ≤ x + t α / 2 ,n−1 s / n (8-18) where tα / 2,n−1 is the upper 100α / 2 percentage point of the t distribution with n – 1 degrees of freedom. (n < 40) EXAMPLE 8-4 (n < 40) z The load at specimen failure is as follows (in megapascals): 19.8 10.1 14.9 7.5 15.4 15.4 15.4 18.5 7.9 12.7 11.9 11.4 11.4 14.1 17.6 16.7 15.8 19.5 8.8 z 13.6 11.9 11.4 The sample mean is x = 13.71 , and the sample standard deviation is s = 3.55. z We want to find a 95% CI on µ . z Since n = 22, we have n - 1 = 21 degrees of freedom for t, so t 0.025, 21 = 2.080 . x − t α / 2 ,n−1 s / n ≤ µ ≤ x + t α / 2 ,n−1 s / n 13.71 − 2.080(3.55) / 22 ≤ µ ≤ 13.71 + 2.080(3.55) / 22 13.71 − 1.57 ≤ µ ≤ 13.71 + 1.57 12.14 ≤ µ ≤ 15.28 Exercise 8-3 : 8-21, 8-23, 8-25, 8-27, 8-29, 8-31. 8-4 CONFIDENCE INTERVAL ON THE VARIANCE AND STANDARD DEVIATION OF A NORMAL POPULATION Definition Let X 1 , X 2 , ... , X n be a random sample from a normal distribution with mean µ and variance σ 2 , and S 2 let be the sample variance. Then the random variable (n − 1) S 2 X = σ2 2 (8-19) has a chi-square ( X 2 ) distribution with n - 1 degrees of freedom. z The probability density function of a χ 2 random variable is f ( x) = 1 x ( k / 2 )−1e − x / 2 k/2 2 Γ ( k / 2) x>0 where k is the number of degrees of freedom. z Several chi-square distributions are shown in Fig. 8-8. (8-20) Figure 8-8 Probability density functions of several χ 2 distributions. z Define χ α2 ,k as the percentage point or value of the chi-square random variable with k degrees of freedom such that the probability that X 2 exceeds this value is α . P( X > χ ) = 2 2 α ,k ∞ ∫ f (u )du = α χα2 ,k This probability is shown as the shaded area in Fig. 8-9(a). Figure 8-9 Percentage point of the χ 2 distribution. (a) The percentage point χ α2 ,k . (b) The upper percentage point χ 02.05,10 = 18.31 and the lower percentage point χ 02.95,10 = 3.94 . z z z The construction of the 100(1 − α) % CI for σ 2 . (n − 1) S 2 σ2 Chi-square with n − 1 degrees of freedom, we may write Because X 2 = P(χ12−α / 2 ,n−1 ≤ X 2 ≤ χ α2 / 2 ,n−1 ) = 1 − α (n − 1) S 2 ⎛ 2 ⎞ P⎜ χ1−α / 2 ,n−1 ≤ ≤ χ α2 / 2 ,n−1 ⎟ = 1 − α 2 σ ⎝ ⎠ ⎛ (n − 1) S 2 (n − 1) S 2 ⎞ ⎟ = 1− α P⎜⎜ 2 ≤ σ2 ≤ 2 ⎟ χ χ 1−α / 2 ,n −1 ⎠ ⎝ α / 2 ,n−1 1− α α/2 α/2 χ12−α / 2,n−1 χ α2 / 2,n−1 Definition If s 2 is the sample variance from a random sample of n observations from a normal distribution with unknown variance σ 2 , then a 100(1 − α )% confidence interval on σ 2 is (n − 1) S 2 (n − 1) S 2 2 ≤σ ≤ 2 χ α2 / 2,n−1 χ1−α / 2,n−1 where χ α2 / 2,n −1 and χ12−α / 2,n−1 (8-21) are the upper and lower 100α / 2 percentage points of the chi-square distribution with n − 1 degrees of freedom, respectively. A confidence interval for σ has lower and upper limits that are the square roots of the corresponding limits in Equation 8-21. The 100(1 − α) % lower and upper confidence bounds on σ 2 are (n − 1) s 2 ≤ σ2 2 χ α , n −1 and (n − 1) s 2 σ ≤ 2 χ1− α , n −1 2 (8-22) respectively. EXAMPLE 8-5 z An automatic filling machine is used to fill bottles with liquid detergent. z A random sample of 20 bottles results in a sample variance of fill volume of s 2 = 0.0153 (fluid ounces) 2 . z We will assume that the fill volume is approximately normally distributed. z A 95% upper-confidence interval is found from Equation 8-22. (n − 1) s 2 σ ≤ 2 χ 0.95,19 2 σ2 ≤ z (19)0.0153 = 0.0287 (fluid ounce) 2 10.117 σ ≤ 0.17 Therefore, at the 95% level of confidence, the data indicate that the process standard deviation could be as large as 0.17 fluid ounce. Exercise 8-4 : 8-33, 8-35, 8-37, 8-39, 8-41 8-5 A LARGE-SAMPLE CONFIDENCE INTERVAL FOR A POPULATION PROPORTION z Construct confidence intervals on a population proportion. z A random sample of size n has been taken from a large (possibly infinite) population. z X (≤ n) observations in this sample belong to a class of interest. z Pˆ = X / n is a point estimator of the proportion of the population p that belongs to this class. z n and p are the parameters of a binomial distribution. z The sampling distribution of Pˆ is approximately normal with mean p and variance p (1 − p ) / n . X Pˆ = n X n interest class interest class =p population ⎡X ⎤ 1 E[ Pˆ ] = E ⎢ ⎥ = ⋅ np = p ⎣n⎦ n 1 1 1 1 V [ Pˆ ] = 2 V [ X ] = 2 ⋅ npq = pq = p ⋅ (1 − p ) n n n n Definition If n is large, the distribution of Z= X − np = np (1 − p ) is approximately standard normal. Pˆ − p p (1 − p ) n z To construct the confidence interval on p P(− z α / 2 ≤ Z ≤ z α / 2 ) ≅ 1 − α ⎛ ⎜ P⎜ − z α / 2 ≤ ⎜ ⎜ ⎝ ⎞ ⎟ ˆ P− p ≤ zα / 2 ⎟ ≅ 1 − α ⎟ p (1 − p ) ⎟ n ⎠ ⎛ P⎜⎜ Pˆ − z α / 2 ⎝ p (1 − p ) ≤ p ≤ Pˆ + z α / 2 n p (1 − p ) ⎞ ⎟⎟ ≅ 1 − α n ⎠ standard error of the point estimator Pˆ z Equation 8-23 contain the unknown parameter p. z To replace Pˆ by in the standard error. Definition If pˆ is the proportion of observations in a random sample of size n that belongs to a class of interest, an approximate 100(1 − α) % confidence interval on the proportion p of the population that belongs to this class is pˆ − z α / 2 pˆ (1 − pˆ ) ≤ p ≤ pˆ + z α / 2 n pˆ (1 − pˆ ) n (8-25) where z α / 2 is the upper α / 2 percentage point of the standard normal distribution. EXAMPLE 8-6 z In a random sample of 85 automobile engine crankshaft bearings. z 10 have a surface finish that is rougher than the specifications allow. z z pˆ = x / n = 10 / 85 = 0.12. A 95% two-sided confidence interval for p is computed from Equation 8-25 as pˆ (1 − pˆ ) ≤ p ≤ pˆ + z 0.025 n pˆ − z 0.025 pˆ (1 − pˆ ) n 0.12(0.88) 0.12(0.88) ≤ p ≤ 0.12 + 1.96 85 85 0.05 ≤ p ≤ 0.19 0.12 − 1.96 Choice of Sample Size z Pˆ is the point estimator of p. z The error in estimating p by Pˆ as E = p − Pˆ . z Set E = z α / 2 p (1 − p) / n and solve for n. E = (zα / 2 ) 2 2 p (1 − p ) ⎛z ⎞ ⋅ ⇒ n = ⎜ α / 2 ⎟ ⋅ p (1 − p ) n ⎝ E ⎠ 2 2 ⎛z ⎞ n = ⎜ α / 2 ⎟ p (1 − p ) ⎝ E ⎠ z (8-26) Choosing n uses the fact that the sample size from Equation 8-26 will always be a maximum for p = 0.5. upper bound on n 2 ⎛z ⎞ n = ⎜ α / 2 ⎟ (0.25) ⎝ E ⎠ Set p = 0.5 (8-27) EXAMPLE 8-7 z See Ex 8-6. z How large a sample is required if we want to be 95% confident that the error in using pˆ to estimate p is less than 0.05? z Using pˆ = 0.12 as an initial estimate of p. 2 2 ⎛z ⎞ ⎛ 1.96 ⎞ n = ⎜ 0.025 ⎟ pˆ (1 − pˆ ) = ⎜ ⎟ 0.12(0.88) ≅ 163 ⎝ 0.05 ⎠ ⎝ E ⎠ z At least 95% confident that our estimate pˆ of the true proportion p was within 0.05 regardless of the value of p, we would use Equation 8-27 to find the sample size 2 2 ⎛z ⎞ ⎛ 1.96 ⎞ n = ⎜ 0.025 ⎟ (0.25) = ⎜ ⎟ (0.25) ≅ 385 ⎝ 0.05 ⎠ ⎝ E ⎠ One-Sided Confidence Bounds The approximate 100(1 − α) % lower and upper confidence bounds are pˆ − z α pˆ (1 − pˆ ) ≤p n respectively. Exercise 8-5 : 8-43, 8-45, 8-47. and p ≤ pˆ + z α pˆ (1 − pˆ ) n (8-28) 8-6 A PREDICTION INTERVAL FOR A FUTURE OBSERVATION z In this section we show how to obtain a 100(1 − α) % prediction interval on a future value of a normal random variable. z Suppose that X 1 , X 2 , ..., X n is a random sample from a normal population. z We wish to predict the value X n+1 , a single future observation. z A point prediction of X n+1 is X , the sample mean. z The prediction error is X n+1 − X . z The expected value of the prediction error is E ( X n+1 − X ) = µ − µ = 0 z The variance of the prediction error V ( X n+1 − X ) = σ 2 + z The prediction error X n+1 − X is normally distributed. Z= z σ2 1 = σ 2 (1 + ) n n X n+1 − X 1 σ 1+ n Replacing σ with S results in T= X n+1 − X 1 S 1+ n which has a t distribution with n − 1 degrees of freedom. Definition A 100(1 − α) % prediction interval on a single future observation from a normal distribution is given by x − t α / 2,n−1 s 1 + 1 1 ≤ X n+1 ≤ x + t α / 2 ,n−1 s 1 + n n (8-29) EXAMPLE 8-8 z See Example 8-4. z The load at failure for n = 22 specimens was observed. z x = 13.71 and s = 3.55 . z 95% confidence interval on µ was 12.14 ≤ µ ≤ 15.28 . z We plan to test a twenty-third specimen. z A 95% prediction interval on the load at failure for this specimen is x − t α / 2,n−1 s 1 + 1 1 ≤ X n+1 ≤ x + t α / 2,n−1 s 1 + n n 13.71 − (2.080)3.55 1 + 1 1 ≤ X 23 ≤ 13.71 + (2.080)3.55 1 + 22 22 6.16 ≤ X 23 ≤ 21.26 z Notice that the prediction interval is considerably longer than the CI. Exercise 8-6 : 8-49, 8-51, 8-55, 8-57, 8-59. 8-7 TOLERANCE INTERVALS FOR A NORMAL DISTRIBUTION z The interval from µ − z α / 2 to µ − z α / 2 σ is called a tolerance interval. Definition A tolerance interval for capturing at least γ % of the values in a normal distribution with confidence level 100(1 − α) % is x − ks, x + ks where k is a tolerance interval factor found in Appendix Table XI. Values are given for γ = 90%, 95%, and 95% and for 95% and 99% confidence. EXAMPLE 8-9 z See Example 8-4. z The load at failure for n = 22 specimens was observed, and we found that x = 31.71 and s = 3.55 . z Find a tolerance interval for the load at failure that includes 90% of the values in the population with 95% confidence. z The tolerance factor k for n = 22 , γ = 0.90 , and 95% confidence is k = 2.264 . z The desired tolerance interval is ( x − ks, x + ks)or[31.71 − (2.264)3.55,31.71 + (2.264)3.55] = (23.67,39.75) z 95% confident that at least 90% of the values of load at failure for this particular alloy lie between 23.67 and 39.75 megapascals. Exercise 8-7 : 8-61, 8-63, 8-65, 8-67, 8-69, 8-71, 8-77, 8-79, 8-81, 8-83, 8-85.
© Copyright 2025