Chapter 7: Inference for Distributions

A visual comparison of normal and paranormal distributions
(The lower caption says 'Paranormal Distribution'; it is unclear why the graphical artifact is occurring.)
http://stats.stackexchange.com/questions/423/what-is-your-favorite-data-analysis-cartoon

7.1: Inference for the Mean of a Population - Goals
• Be able to distinguish the standard deviation from the standard error of the sample mean.
• Be able to construct a level C confidence interval (without knowing σ) and interpret the results.
• Perform a one-sample t significance test and summarize the results.
• Be able to determine when the t procedure is valid.

Conditions for Inference (Chapter 6)
1. The variable we measure has a Normal distribution with mean μ and standard deviation σ.
2. We don’t know μ, but we do know σ.
3. We have an SRS from the population of interest.

Shape of t-distribution
http://upload.wikimedia.org/wikipedia/commons/thumb/4/41/Student_t_pdf.svg/1000px-Student_t_pdf.svg.png

t-Table (Table D)

Table A vs. Table D
Table     Distribution           Tabulated probability   df
Table A   Standard normal (z)    P(Z ≤ z)                not required
Table D   t-distribution         P(T > t)                required

Example: t critical values
What is the t critical value for the following:
a) Central area = 0.95, df = 10
b) Central area = 0.95, df = 60
c) Central area = 0.95, df = 100
d) Central area = 0.95, z curve
e) Upper area = 0.99, df = 10
f) Lower area = 0.99, df = 10

Summary: CI
Confidence Interval: $\bar{x} \pm t^*(df)\,\frac{s}{\sqrt{n}}$
Upper Confidence Bound: $\mu < \bar{x} + t^*(df)\,\frac{s}{\sqrt{n}}$
Lower Confidence Bound: $\mu > \bar{x} - t^*(df)\,\frac{s}{\sqrt{n}}$
Sample Size: $n = \left(\frac{t^* s}{m}\right)^2$

Example: Sample size
You are in charge of quality control in your food company. You randomly sample four packs of cherry tomatoes. The average weight of your four packs is 222 g with a sample standard deviation of 5 g.
a) What sample size is required to obtain a margin of error of 2 g at a 95% confidence level?
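The sample-size formula above is circular, because t* depends on df = n - 1. One way to handle this, sketched below, is to search for the smallest n whose margin of error t*(n - 1) · s/√n is at most m. This is only an illustration under stated assumptions: it assumes t* is taken with the df of the planned sample (not the four-pack pilot sample), and the helper names (t_pdf, t_cdf, t_critical, required_sample_size) are hypothetical, not from the slides. The t distribution is evaluated by numerical integration so that only the Python standard library is needed.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    steps = 2 * max(50, math.ceil(a * 20))  # even number of subintervals
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(central_area, df):
    """t* with P(-t* <= T <= t*) = central_area, found by bisection."""
    target = 0.5 + central_area / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # grow the bracket until it contains t*
        lo, hi = hi, 2 * hi
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def required_sample_size(s, margin, confidence=0.95):
    """Smallest n with t*(n - 1) * s / sqrt(n) <= margin."""
    n = 2
    while t_critical(confidence, n - 1) * s / math.sqrt(n) > margin:
        n += 1
    return n

# Cherry tomato example: s = 5 g, desired margin of error m = 2 g, 95% level.
n_needed = required_sample_size(s=5.0, margin=2.0)
print(n_needed)
```

The same t_critical helper can be used to check table lookups such as part a) of the critical-value example (central area 0.95, df = 10). A different convention for t*, such as using z* or the pilot sample's df, would give a different n.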
Single mean test: Summary
Null hypothesis: H0: μ = μ0
Test statistic: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$

Alternative Hypothesis                   P-Value
One-sided: upper-tailed   Ha: μ > μ0     P(T ≥ t)
One-sided: lower-tailed   Ha: μ < μ0     P(T ≤ t)
Two-sided                 Ha: μ ≠ μ0     2P(T ≥ |t|)

Robustness of the t-procedure
• A statistical value or procedure is robust if the required calculations are insensitive to violations of the conditions.
• The t-procedure is robust against departures from Normality.
– n < 15: the population distribution should be close to Normal.
– 15 ≤ n < 40: mild skewness is acceptable.
– n ≥ 40: the procedure is usually valid.

Inferences for Non-Normal Distributions
• If you know what the distribution is, use the appropriate model.
• If the data are skewed, you can transform the variable.
• Use a nonparametric procedure.

7.2: Comparing Two Means - Goals
• Be able to construct a level C confidence interval for the difference between two means and interpret the results.
• Perform a two-sample t significance test and summarize the results.
• Be able to construct a level C confidence interval for a matched pair and interpret the results.
• Perform a matched pair t significance test and summarize the results.
• Be able to determine when the t procedure is valid.

Conditions for Inference: 2-sample
1. Each group is considered to be a sample from a distinct population.
• We have an SRS from the population of interest for each variable.
2. The responses in each group are independent of those in the other group.
3. The variable(s) we measure have a Normal distribution with mean μ and standard deviation σ.
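As a hedged illustration of the single mean test summarized earlier, the sketch below computes the t statistic and the three P-values from summary statistics. The summary numbers (x̄ = 222 g, s = 5 g, n = 4) are taken from the cherry tomato example; the null value μ0 = 227 g is a hypothetical choice made only for this illustration, and the function names (t_pdf, t_cdf, one_sample_t) are not from the slides. The t CDF is approximated by Simpson's rule so that only the standard library is needed.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    steps = 2 * max(50, math.ceil(a * 20))  # even number of subintervals
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def one_sample_t(xbar, s, n, mu0):
    """t statistic, df, and P-values for each alternative in the summary."""
    t = (xbar - mu0) / (s / math.sqrt(n))
    df = n - 1
    return {
        "t": t,
        "df": df,
        "upper": 1 - t_cdf(t, df),                 # Ha: mu > mu0, P(T >= t)
        "lower": t_cdf(t, df),                     # Ha: mu < mu0, P(T <= t)
        "two_sided": 2 * (1 - t_cdf(abs(t), df)),  # Ha: mu != mu0
    }

# mu0 = 227 is a hypothetical null value, not given in the slides.
result = one_sample_t(xbar=222.0, s=5.0, n=4, mu0=227.0)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Only the P-value matching the stated alternative hypothesis should be reported; the dictionary returns all three simply to mirror the summary table.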
Df for 2-sample t test
$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$

Two-sample Test (independent): Summary
Null hypothesis: H0: μ1 – μ2 = Δ
Test statistic: $t = \frac{(\bar{x}_1 - \bar{x}_2) - \Delta}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

Alternative Hypothesis           P-Value
Upper-tailed   Ha: μ1 – μ2 > Δ   P(T ≥ t)
Lower-tailed   Ha: μ1 – μ2 < Δ   P(T ≤ t)
Two-sided      Ha: μ1 – μ2 ≠ Δ   2P(T ≥ |t|)

Note: If we are testing whether the two population means are equal, then Δ = 0.

Example: two-sample Independent t
A group of 15 college seniors is selected to participate in a manual dexterity skill test against a group of 20 industrial workers. Skills are assessed by scores obtained on a test taken by both groups. The data are shown in the following table:
a) Perform a significance test to determine if the skills are the same for college students and industrial workers at a significance level of 0.05.
b) Calculate and interpret the 95% confidence interval.

Group      n    x̅       s
Students   15   35.12   4.31
Workers    20   37.32   3.83

Example: two-sample Independent t (cont)
The data do not provide support (P = 0.128) for the claim that there is a difference between the population mean test scores for students and workers.

Two-sample Test (independent): CI Summary
$\text{point estimate} \pm m = \text{point estimate} \pm t^*(df)\,SE = (\bar{x}_1 - \bar{x}_2) \pm t^*(df)\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

Example: two-sample Independent t
A group of 15 college seniors is selected to participate in a manual dexterity skill test against a group of 20 industrial workers. Skills are assessed by scores obtained on a test taken by both groups. The data are shown in the following table:
a) Perform a significance test to determine if the skills are the same for college students and industrial workers at a significance level of 0.05.
b) Calculate and interpret the 95% confidence interval.

Group      n    x̅       s
Students   15   35.12   4.31
Workers    20   37.32   3.83

Example: two-sample Independent t (CI) (cont)
We are 95% confident that the difference between the population mean test scores of students and workers is between -5.08 and 0.68.
P-value = 0.128, CI: (-5.08, 0.68)

Matched Pairs Procedures
• To compare the responses to the two treatments in a matched-pairs design, find the difference between the responses within each pair. Then apply the one-sample t procedures to these differences.

Conditions for Inference: 2-sample matched pair
1. Each pair is considered to be a sample from a population of pairs.
• We have an SRS from the population of pairs.
2. Each pair is independent of the other pairs.
3. The difference within each pair that we measure has a Normal distribution with mean μD and standard deviation σD.

Two-sample Matched Pair
$SE = \frac{s_d}{\sqrt{n}}$
$\bar{d} \pm t^*(df)\,\frac{s_d}{\sqrt{n}}$

Two-sample matched pair Test: Summary
Null hypothesis: H0: μD = Δ
Test statistic: $t = \frac{\bar{d} - \Delta}{s_d/\sqrt{n}}$

Alternative Hypothesis                  P-Value
One-sided: upper-tailed   Ha: μD > Δ    P(T ≥ t)
One-sided: lower-tailed   Ha: μD < Δ    P(T ≤ t)
Two-sided                 Ha: μD ≠ Δ    2P(T ≥ |t|)

Note: If we are testing whether the two population means are equal, then Δ = 0.

Example: Paired t test Procedure
In an effort to determine whether sensitivity training for nurses would improve the quality of nursing provided at an area hospital, the following study was conducted. Eight different nurses were selected and their nursing skills were given a score from 1 to 10. After this initial screening, a training program was administered, and then the same nurses were rated again. On the next slide is a table of their pre- and post-training scores.
a) Conduct a test to determine whether the training could on average improve the quality of nursing provided in the population at a 0.01 significance level.
b) Calculate and interpret the 99% upper confidence bound of the population mean difference in nursing scores.

Individual   Pre-Training   Post-Training   Pre - Post
1            2.56           4.54            -1.98
2            3.22           5.33            -2.11
3            3.45           4.32            -0.87
4            5.55           7.45            -1.90
5            5.63           7.00            -1.37
6            7.89           9.80            -1.91
7            7.66           7.33             0.33
8            6.20           6.80            -0.60
mean         5.27           6.57            -1.30
stdev        2.018          1.803            0.861

Example: Paired t test Procedure (cont)
The data provide strong support (P = 0.002) for the claim that the population average score improved after training.

Example: Paired t test Procedure
In an effort to determine whether sensitivity training for nurses would improve the quality of nursing provided at an area hospital, the following study was conducted. Eight different nurses were selected and their nursing skills were given a score from 1 to 10. After this initial screening, a training program was administered, and then the same nurses were rated again. The table of their pre- and post-training scores is given above.
a) Conduct a test to determine whether the training could on average improve the quality of nursing provided in the population at a 0.01 significance level.
b) Calculate and interpret the 99% upper confidence bound of the population mean difference in nursing scores.

Example: Paired t test Procedure (cont)
We are 99% confident that the population mean difference between pre-training and post-training scores (Pre - Post) is less than -0.39.
P = 0.00185, μD < -0.39

Independent vs. Paired
1. If there is great heterogeneity between experimental units and a large correlation within experimental units, then a paired experiment is preferable.
2. If the experimental units are relatively homogeneous and the correlation within pairs is not large, then unpaired experiments should be used.

Robustness of the 2-sample t-procedure
• The t-procedure is very robust against departures from Normality.
Let n = n1 + n2
– n < 15: the population distributions should be close to Normal.
– 15 ≤ n < 40: mild skewness is acceptable.
– n ≥ 40: the procedure is usually valid.
• Best when n1 ≈ n2.
• Best when the distributions are similar.

In Class (or HW): 2-sample Independent or Paired
For the following questions, state which method is better, independent or paired, and why. The following explanations are wrong: 1) there is no information for one of the methods, 2) the data is matched in the exercise, 3) the number of data points is different (or the same).

Example 1: Does dress affect competence and intelligence ratings?
Researchers performed a study to examine whether or not women are perceived as less competent and less intelligent when they dress in a sexy manner versus a business-like manner. Competence was rated from 1 (not at all) to 7 (extremely), and a 1 to 5 scale was used for intelligence. Under each condition, 17 subjects provided data.

Example 2: Perceived quality of high- and low-performing restaurants.
A study classified 394 quick-service restaurants (QSR) into high-performing and low-performing groups based on their total sales. Each restaurant was rated on a collection of perceived measures of quality by a large number of diners using a 1 to 7 scale. In this study we view the diners as a measuring instrument, and our major interest is in comparing the 170 high-sales restaurants with the 224 low-sales restaurants.

Example 3: Air in poultry-processing plants.
The air in poultry-processing plants often contains fungus spores. If the ventilation is inadequate, this can affect the health of the workers. The problem is most serious during the summer. To measure the presence of spores, air samples are pumped to an agar plate and “colony-forming units (CFUs)” are counted after an incubation period. Here are data from two locations in a plant that processes 37,000 turkeys per day, taken on four days in the summer.

Example 4: The manufacture of dyed clothing fabrics.
Different fabrics respond differently when dyed. This matters to clothing manufacturers, who want the color of the fabric to be just right. Fabrics made of cotton and of ramie are dyed with the same “procion blue” dye applied in the same way. A colorimeter is used to measure the lightness of the color on a scale in which black is 0 and white is 100.

Example 5: Durable press and breaking strength.
“Durable press” cotton fabrics are treated to improve their recovery from wrinkles after washing. Unfortunately, the treatment also reduces the strength of the fabric. A study compared the breaking strength of fabric treated by two commercial durable press processes. Five specimens of the same fabric were assigned at random to each process.

Example 6: Brain training.
The assessment of computerized brain-training programs is a rapidly growing area of research. Researchers are now focusing on who this training benefits most, what brain functions are most susceptible to improvement, and which products are most effective. A recent study looked at 487 community-dwelling adults aged 65 and older, each randomly assigned to one of two training groups. In one group, the participants used a computerized program 1 hour per day. In the other, DVD-based educational programs were shown and quizzes were administered after each video. The training period lasted 8 weeks. The response was the improvement in a composite score obtained from an auditory memory/attention survey given before and after the 8 weeks.

Example 7: Occupation and diet.
Do various occupational groups differ in their diets? A British study of this question compared 98 drivers and 83 conductors of London double-decker buses. The conductors’ jobs require more physical activity.
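The two worked examples earlier in Section 7.2 (students vs. workers, and the nurses' pre- and post-training scores) can be checked numerically from the numbers on the slides. The sketch below is one possible check, not the method used to produce the slides: it assumes the unpooled two-sample t with the df formula from the "Df for 2-sample t test" slide, takes the paired alternative to be Ha: μD < 0 for Pre - Post, and uses hypothetical helper names (t_pdf, t_cdf, t_quantile) with the t distribution evaluated by numerical integration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution (df may be non-integer)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    steps = 2 * max(50, math.ceil(a * 20))  # even number of subintervals
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, df):
    """t value with P(T <= t) = p, for p > 0.5, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # grow the bracket until it contains the answer
        lo, hi = hi, 2 * hi
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Independent samples: summary statistics from the students/workers table.
n1, x1, s1 = 15, 35.12, 4.31  # students
n2, x2, s2 = 20, 37.32, 3.83  # workers
v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
se = math.sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
t_ind = (x1 - x2) / se                    # Delta = 0
p_ind = 2 * (1 - t_cdf(abs(t_ind), df))   # two-sided P-value
m = t_quantile(0.975, df) * se            # 95% margin of error
ci_ind = (x1 - x2 - m, x1 - x2 + m)

# Matched pairs: the Pre - Post differences from the nurses table.
diffs = [-1.98, -2.11, -0.87, -1.90, -1.37, -1.91, 0.33, -0.60]
n = len(diffs)
d_bar = sum(diffs) / n
s_d = math.sqrt(sum((d - d_bar) ** 2 for d in diffs) / (n - 1))
se_d = s_d / math.sqrt(n)
t_pair = d_bar / se_d                     # Delta = 0
p_pair = t_cdf(t_pair, n - 1)             # lower-tailed P-value
upper_bound = d_bar + t_quantile(0.99, n - 1) * se_d  # 99% upper bound

print(f"independent: t = {t_ind:.3f}, df = {df:.1f}, P = {p_ind:.3f}, "
      f"CI = ({ci_ind[0]:.2f}, {ci_ind[1]:.2f})")
print(f"paired: t = {t_pair:.3f}, P = {p_pair:.5f}, "
      f"upper bound = {upper_bound:.2f}")
```

Up to rounding, the printed values should agree with the slide results (P = 0.128 with interval (-5.08, 0.68), and P = 0.00185 with bound -0.39). Small differences can arise because the slides may have used rounded summary statistics.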
© Copyright 2025