BE640 Exam II 2015 - University of Massachusetts Amherst

PubHlth 640 Exam 2 – Spring 2015
Name __________________________________________________
PubHlth 640 Intermediate Biostatistics
Spring 2015
Examination 2
Units 3, 4 and 5 – Discrete Distributions, Categorical Data Analysis &
Logistic Regression
Due: Wednesday April 22, 2015
Before you begin:
This is a “take-home” exam. You are welcome to use any reference materials you wish. You are
welcome to use the computer as you wish, too. However, you MUST work this exam by yourself and you
may not consult with anyone.
Instructions and Checklist:
__ 1. Start each problem on a new page.
__ 2. Write your name on every page.
__ 3. Make a photocopy of your exam for safekeeping prior to submission.
__ 4. Complete the signature page.
__ 5. Please DO NOT submit a copy of the exam questions.
How to submit your exam (sorry – faxed exams are NOT permitted):
(1) ONLINE Students
Please be sure your name is somewhere on your submission. Next, save it as a
SINGLE FILE pdf using the naming convention lastname_exam2.pdf. Email it to me
at: cbigelow@schoolph.umass.edu
(2) Worcester Section
The UMass calendar says that Wednesday April 22 is a “Monday class schedule”. Tentatively,
please bring your exam (stapled, please) to class on Wednesday April 22, 2015. If you are
unable to come to class, I will accept a pdf (see instructions for online students).
We need to choose a night to meet during the week April 20-24, 2015.
(3) Amherst Section
The UMass calendar says that Wednesday April 22 is a “Monday class schedule”. Tentatively,
please bring your exam (stapled, please) to class on Wednesday April 22, 2015. If you
are not coming to class, please put your exam in my mailbox, located in the mail room
on the 4th floor of Arnold House.
Tentatively, we will have an optional “lab” session on Wednesday April 22, 2015.
(4) ALL
I will also accept exams sent by U.S. Post. Please mail with postmark no later than
April 22, 2015 to:
Carol Bigelow
School of Public Health/402 Arnold House
University of Massachusetts/Amherst
715 North Pleasant Street
Amherst, MA 01003-9304
Tel. 413-545-1319.
\...\2015\docu\exams & solutions\BE640 Exam 2 2015.docx
Signature
This is to confirm that in completing this exam, I worked independently and did not
consult with anyone.
Name: ___________________________________________________________
Date: ___________________________
Thank you!
1. (10 points total)
It is believed that 25% of children exposed to a particular infectious agent become ill with the disease. In
100 playgroups of 4 children each, the following frequencies of disease were observed:
___________________________________________________________________
Number of Cases     Frequency     Expected Frequency
___________________________________________________________________
0                   38            31.6
1                   35            42.2
2                   15            21.1
3                   7             4.7
4                   5             0.4
___________________________________________________________________
Set up the computations and verify the expected frequency numbers shown in the third column of this
table.
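As a hedged illustration of how such a check could be set up, the sketch below assumes that the number of cases in a playgroup follows a Binomial(4, 0.25) distribution and multiplies each probability by the 100 playgroups. The variable and function names are illustrative only and do not come from the course materials.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Pr[X = k] for X distributed Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_children = 4    # children per playgroup (from the problem statement)
p_ill = 0.25      # believed probability that an exposed child becomes ill
n_groups = 100    # number of playgroups observed

# Expected frequency = number of playgroups x Binomial probability of k cases.
expected = [n_groups * binomial_pmf(k, n_children, p_ill)
            for k in range(n_children + 1)]

for k, e in enumerate(expected):
    print(f"{k} cases: expected frequency {e:.1f}")
```

The printed values can be compared line by line with the third column of the table.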
2. (10 points)
Suppose it is known that a certain genetic mutation occurs in an insect population, on average, in 20 out
of 10,000 insects. Next suppose that 8,000 insects are sampled and a count is obtained of the number of
insects that have the genetic mutation. Let X be the appropriately defined Binomial distribution for this
setting and let Y be the appropriately defined Poisson distribution for this setting. Thus, the count of
insects with the genetic mutation is either X distributed Binomial or it is Y distributed Poisson.
2a. (6 points)
Using the appropriately defined Binomial and Poisson distributions, complete the following table
of probabilities:
X distributed Binomial          Y distributed Poisson
Pr [ X = 0 ] = _____            Pr [ Y = 0 ] = _____
Pr [ X > 1 ] = _____            Pr [ Y > 1 ] = _____
Pr [ X < 5 ] = _____            Pr [ Y < 5 ] = _____
2b. (2 points)
What are the values of the mean and standard deviation of the random variable X?
2c. (2 points)
What are the values of the mean and standard deviation of the random variable Y?
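One possible way to organize these calculations is sketched below. It assumes X is Binomial with n = 8,000 and p = 20/10,000, and that Y is Poisson with mean equal to n times p; the helper names are hypothetical and only the Python standard library is used.

```python
from math import comb, exp, factorial, sqrt

n = 8000              # insects sampled
p = 20 / 10_000       # mutation rate from the problem statement
mu = n * p            # Poisson mean chosen to match the Binomial mean

def binom_pmf(k: int) -> float:
    """Pr[X = k] for X distributed Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def pois_pmf(k: int) -> float:
    """Pr[Y = k] for Y distributed Poisson(mu)."""
    return exp(-mu) * mu**k / factorial(k)

def cdf(pmf, k: int) -> float:
    """Pr[variable <= k], summing the pmf from 0 to k."""
    return sum(pmf(j) for j in range(k + 1))

for name, pmf in (("X", binom_pmf), ("Y", pois_pmf)):
    print(f"Pr[{name} = 0] = {pmf(0):.3e}")
    print(f"Pr[{name} > 1] = {1 - cdf(pmf, 1):.6f}")
    print(f"Pr[{name} < 5] = {cdf(pmf, 4):.3e}")

mean_x, sd_x = n * p, sqrt(n * p * (1 - p))   # Binomial mean and SD
mean_y, sd_y = mu, sqrt(mu)                   # Poisson mean and SD
print(mean_x, sd_x, mean_y, sd_y)
```

Because n is large and p is small, the two columns should be close but not identical.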
3. (10 points total)
Twelve ants and eighteen flies were placed in a container with insecticide and observed. After sixteen
insects had died, there were nine ants alive and five flies alive. Apply Fisher’s exact test to the
null hypothesis that ants and flies are equally susceptible to the insecticide. In reporting your answer,
please state:
3a. (2 points)
The null and alternative hypotheses. (Be sure to define your terms).
3b. (5 points)
The achieved level of significance (p-value).
3c. (3 points)
An interpretation of your findings in terms that a layperson can understand.
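A minimal sketch of a Fisher's exact test computation follows. It assumes the 2x2 table implied by the problem (ants: 3 dead, 9 alive; flies: 13 dead, 5 alive) and uses one common two-sided convention, namely summing the hypergeometric probabilities of all tables that are no more likely than the observed one. The function name is illustrative.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Hypergeometric numerators for every table sharing the observed margins.
    weights = {x: comb(row1, x) * comb(row2, col1 - x)
               for x in range(lo, hi + 1)}
    observed = weights[a]
    # Add up the tables that are no more probable than the observed one.
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / comb(n, col1)

# Rows: ants, flies. Columns: dead, alive.
p_value = fisher_exact_two_sided(3, 9, 13, 5)
print(f"two-sided p-value = {p_value:.4f}")
```

Other two-sided conventions (for example, doubling the smaller tail) can give a slightly different p-value.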
4. (10 points total)
A study was made of 100 terminal cancer patients who were given either vitamin C or placebo as part of
their therapy. Patients differed in their age (AGE), gender (SEX), and location of tumor (SITE). Of
interest is the outcome 0/1 remission (REMISS) at 30 days. A subset of the data, which includes 40
patients, is presented here.
Vitamin C Group                      Placebo Group
SITE       SEX   AGE   REMISS        SITE       SEX   AGE   REMISS
Stomach    F     61    Yes           Stomach    F     58    No
Stomach    M     69    No            Stomach    F     71    No
Stomach    F     62    Yes           Stomach    M     63    Yes
Stomach    F     66    No            Stomach    M     45    Yes
Stomach    M     63    Yes           Stomach    M     57    No
Bronchus   M     74    No            Bronchus   M     74    Yes
Bronchus   M     74    Yes           Bronchus   F     50    Yes
Bronchus   M     66    No            Bronchus   F     66    No
Bronchus   M     52    No            Bronchus   M     50    Yes
Bronchus   F     48    No            Bronchus   M     87    No
Colon      F     76    Yes           Colon      F     35    Yes
Colon      F     58    Yes           Colon      M     50    No
Colon      M     49    Yes           Colon      M     89    No
Colon      M     69    Yes           Colon      F     67    Yes
Colon      F     70    No            Colon      F     55    No
Rectum     F     56    No            Rectum     M     82    No
Rectum     F     75    Yes           Rectum     M     51    Yes
Rectum     F     57    Yes           Rectum     F     73    No
Rectum     M     56    Yes           Rectum     M     85    No
Rectum     M     68    No            Rectum     F     64    Yes
Carry out the appropriate statistical test to assess whether, overall, the data suggest that supplemental
treatment with vitamin C is effective with respect to the outcome of remission. Hint – This exercise also
asks you to use the information provided to construct the 2x2 table that you then analyze.
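The sketch below shows one way the 2x2 table could be tallied and tested. It rests on two assumptions that are not stated in the exam: the remission outcomes are read from the flattened data listing in the order Stomach, Bronchus, Colon, Rectum (five patients per site), and an uncorrected Pearson chi-square test is taken as a candidate for "the appropriate test". Function and variable names are illustrative.

```python
from math import erfc, sqrt

# Remission outcomes per group, as read from the data listing (assumed reading),
# five patients per site in the order Stomach, Bronchus, Colon, Rectum.
vitamin_c = ["Yes", "No", "Yes", "No", "Yes",   "No", "Yes", "No", "No", "No",
             "Yes", "Yes", "Yes", "Yes", "No",  "No", "Yes", "Yes", "Yes", "No"]
placebo = ["No", "No", "Yes", "Yes", "No",      "Yes", "Yes", "No", "Yes", "No",
           "Yes", "No", "No", "Yes", "No",      "No", "Yes", "No", "No", "Yes"]

# 2x2 table: rows = treatment group, columns = remission yes / no.
a, b = vitamin_c.count("Yes"), vitamin_c.count("No")
c, d = placebo.count("Yes"), placebo.count("No")

def pearson_chi_square_2x2(a: int, b: int, c: int, d: int):
    """Uncorrected Pearson chi-square (1 df) and its p-value."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(chi2 / 2))   # upper tail of chi-square with 1 df
    return chi2, p_value

chi2, p_value = pearson_chi_square_2x2(a, b, c, d)
print(f"table: [[{a}, {b}], [{c}, {d}]]  chi-square = {chi2:.3f}  p = {p_value:.3f}")
```

With cell counts of this size, Fisher's exact test would be a reasonable alternative to the chi-square approximation.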
5. (20 points)
Dear class – The data for this question are fictitious.
Consider the following case-control investigation of the relationship between asbestos and lung cancer.
An important covariate is smoking.
You are given the 2x2 table distribution of asbestos exposure (yes/no) and lung cancer (yes/no), overall
and separately for strata defined by smoking (smokers and non-smokers).
Overall
                         Asbestos Exposure
                         Yes       No
Lung Cancer    Yes       80        15
               No        38        152

Stratum = 1 (Smokers)
                         Asbestos Exposure
                         Yes       No
Lung Cancer    Yes       75        5
               No        20        80

Stratum = 2 (Non-Smokers)
                         Asbestos Exposure
                         Yes       No
Lung Cancer    Yes       5         10
               No        18        72
5a. (4 points, subtotal)
What are the values of
(i) (1 point) the “overall” odds ratio?
(ii) (1 point) the Mantel-Haenszel estimate of the “overall” odds ratio?
(iii) (1 point) the stratum specific odds ratio for stratum =1 (Smokers)
(iv) (1 point) the stratum-specific odds ratio for stratum=2 (Non-smokers)
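The four quantities in 5a can be set up along the following lines. This is a sketch only: it uses the cross-product odds ratio for each table and the usual Mantel-Haenszel weighted estimator, with each table written as (a, b, c, d) = (cases exposed, cases unexposed, controls exposed, controls unexposed). The names are illustrative.

```python
# (a, b, c, d) = (cases exposed, cases unexposed,
#                 controls exposed, controls unexposed)
overall = (80, 15, 38, 152)
strata = {
    "smokers": (75, 5, 20, 80),
    "non-smokers": (5, 10, 18, 72),
}

def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product odds ratio ad / bc for a single 2x2 table."""
    return (a * d) / (b * c)

def mantel_haenszel_or(tables) -> float:
    """Mantel-Haenszel summary OR: sum(ad/n) / sum(bc/n) over strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

crude = odds_ratio(*overall)
mh = mantel_haenszel_or(strata.values())
print(f"crude OR = {crude:.2f}, Mantel-Haenszel OR = {mh:.2f}")
for name, table in strata.items():
    print(f"{name}: OR = {odds_ratio(*table):.2f}")
```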
5b. (4 points)
Perform the appropriate statistical test of the null hypothesis of homogeneity of association.
5c. (4 points)
Using your answer to question #5b, in your opinion, is there statistically significant evidence that
the relationship between asbestos exposure and lung cancer differs (is modified) by smoking
status?
5d. (4 points)
Perform the Mantel-Haenszel test of the null hypothesis of no association.
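For orientation, a sketch of the Mantel-Haenszel (Cochran-Mantel-Haenszel) chi-square statistic is given below. It is written without a continuity correction, which is an assumption; statistical packages may apply one by default. Each stratum is (a, b, c, d) with a = exposed cases, and the function name is illustrative.

```python
from math import erfc, sqrt

# (a, b, c, d) per smoking stratum: a = cases exposed, b = cases unexposed,
# c = controls exposed, d = controls unexposed.
strata = [(75, 5, 20, 80), (5, 10, 18, 72)]

def cmh_chi_square(tables):
    """Mantel-Haenszel chi-square (1 df, no continuity correction)."""
    diff = 0.0   # sum over strata of observed a minus expected a
    var = 0.0    # sum over strata of the hypergeometric variance of a
    for a, b, c, d in tables:
        n = a + b + c + d
        r1, r2, c1, c2 = a + b, c + d, a + c, b + d
        diff += a - r1 * c1 / n
        var += r1 * r2 * c1 * c2 / (n**2 * (n - 1))
    chi2 = diff**2 / var
    p_value = erfc(sqrt(chi2 / 2))   # upper tail of chi-square with 1 df
    return chi2, p_value

chi2, p_value = cmh_chi_square(strata)
print(f"MH chi-square = {chi2:.2f}, p-value = {p_value:.3g}")
```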
5e. (4 points)
In 2-3 sentences at most, what do you conclude?
6. (10 points)
Suppose we want to learn about the relationship between education and prevalence of smoking in a
particular community. Consider a study (it’s hypothetical) of a simple random sample of 585 adults, all
of whom have completed at least a high school education and all of whom are of the same socio-economic
status. The explanatory variable is education with 5 levels. The outcome variable is current smoking
with 2 levels.
                                        Current Smoker =
Education completed =                   Yes       No        Total
High School                             12        38        50
Associate Degree                        18        67        85
More than Associate, Some College       27        95        122
Undergraduate Degree                    32        239       271
More than Undergraduate                 5         52        57
Total                                   94        491       585
Is there any statistically significant evidence of a downward trend in smoking prevalence associated with
higher level of education completed? Carry out the appropriate statistical test to address this question. In
reporting your answer, please state
6a. (2 points)
The null and alternative hypotheses. (Be sure to define your terms).
6b. (2 points)
The test statistic and its calculated value.
6c. (3 points)
The achieved level of significance.
6d. (3 points)
An interpretation of your findings in terms that a layperson can understand.
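One candidate for this question is the Cochran-Armitage test for trend, sketched below. The equally spaced scores 1 through 5 for the education levels are an assumption, as is the choice of variance formula (some texts use N - 1 in place of N); names are illustrative.

```python
from math import erfc, sqrt

smokers = [12, 18, 27, 32, 5]      # current smokers by education level
totals = [50, 85, 122, 271, 57]    # sample size by education level
scores = [1, 2, 3, 4, 5]           # assumed equally spaced scores

def cochran_armitage(cases, totals, scores):
    """Cochran-Armitage trend z statistic and two-sided normal p-value."""
    n_all = sum(totals)
    p_bar = sum(cases) / n_all
    # Score-weighted departure of observed from expected smokers.
    t = sum(x * (a - n * p_bar) for x, a, n in zip(scores, cases, totals))
    sx = sum(n * x for x, n in zip(scores, totals))
    sxx = sum(n * x * x for x, n in zip(scores, totals))
    var = p_bar * (1 - p_bar) * (sxx - sx**2 / n_all)
    z = t / sqrt(var)
    return z, erfc(abs(z) / sqrt(2))

z, p_two_sided = cochran_armitage(smokers, totals, scores)
print(f"z = {z:.3f}, chi-square = {z * z:.2f}, two-sided p = {p_two_sided:.4f}")
```

A negative z corresponds to smoking prevalence falling as the education score rises; for a one-sided alternative of a downward trend, the two-sided p-value would be halved.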
7. (10 points total)
In a logistic regression analysis of the likelihood (π) of mortality that considered several variables, a one
predictor model was fit to malnutrition (MALNUT) coded 1=malnutrition, 0=NO malnutrition. The
following was obtained:
logit[π̂] = -1.8563 + 1.210[malnut]
The 2x2 table associated with these data is the following:
                              Mortality
MALNUT                        1 = Dead    0 = Alive    Total
1 = Malnourished              11          21           32
0 = NOT malnourished          10          64           74
Total                         21          85           106
7a. (4 points)
Verify that the regression coefficient (beta) for MALNUT in the logistic regression model is the natural
logarithm of the odds ratio for MALNUT in the 2x2 table. Show all work.
7b. (3 points)
Using the logistic regression model, what is the formula for the predicted probability of death for a person
who is malnourished? What is its calculated numeric value?
7c. (3 points)
Using the 2x2 table, what is the formula for the empirical estimate of the probability of death for a person
who is malnourished? Hint – the empirical estimate is simply the observed proportion. What is its calculated
numeric value?
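The relationships asked about in 7a through 7c can be checked numerically along the following lines. This is a sketch that assumes the 2x2 table cell counts 11 and 21 (malnourished: dead, alive) and 10 and 64 (not malnourished: dead, alive); variable names are illustrative.

```python
from math import exp, log

# Cell counts from the 2x2 table (dead, alive) by malnutrition status.
dead_mal, alive_mal = 11, 21
dead_not, alive_not = 10, 64

# Fitted coefficients reported in the question.
b0, b1 = -1.8563, 1.210

# 7a: the log odds ratio from the table should match b1 up to rounding.
odds_ratio = (dead_mal * alive_not) / (alive_mal * dead_not)
log_or = log(odds_ratio)

def predicted_probability(malnut: int) -> float:
    """Inverse logit of b0 + b1 * malnut."""
    eta = b0 + b1 * malnut
    return 1 / (1 + exp(-eta))

# 7b and 7c: model-based versus empirical probability of death.
p_model = predicted_probability(1)
p_empirical = dead_mal / (dead_mal + alive_mal)
print(f"ln(OR) = {log_or:.4f}, model p = {p_model:.4f}, observed p = {p_empirical:.4f}")
```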
8. (10 points total)
A logistic regression model analysis was performed to investigate the relationship of sex, age, and
income with event of clinical depression (1=yes). The following results were obtained.
                          β̂          SE(β̂)      p-value
Sex (1=Female)            0.925       0.393       0.02
Age (per year)            -0.024      0.009       0.01
Income (per $1,000)       -0.040      0.014       0.01
Constant (intercept)      -0.477      0.867       0.19
Using this model, what is the estimated relative odds (odds ratio, OR) of clinical depression for a
female aged 60 with income $50,000 compared to a reference person who is male aged 45 with
income $75,000?
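In a logistic model, the log odds ratio comparing two covariate profiles is the sum of each coefficient times the difference in that covariate; the intercept cancels. The sketch below applies this to the two profiles in the question, assuming income is entered in units of $1,000 as the table indicates. The dictionary names are illustrative.

```python
from math import exp

# Estimated coefficients from the table (the intercept cancels in an odds ratio).
beta = {"sex": 0.925, "age": -0.024, "income": -0.040}

# Covariate profiles; income is in units of $1,000.
person = {"sex": 1, "age": 60, "income": 50}
reference = {"sex": 0, "age": 45, "income": 75}

# log(OR) = sum over predictors of beta * (difference in covariate values)
log_or = sum(beta[k] * (person[k] - reference[k]) for k in beta)
odds_ratio = exp(log_or)
print(f"log(OR) = {log_or:.3f}, OR = {odds_ratio:.2f}")
```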
9. (10 points total)
A survey of senior high-school students queried the use of each of cigarettes, alcohol, and marijuana. In
one analysis it was of interest to explore the association of cigarette use and/or alcohol use as predictors of
marijuana use. Thus, in this analysis, marijuana use (yes or no) was treated as the response variable Y. Y
was coded 1=yes and 0=no. The two other variables (cigarette use and alcohol use) were treated as
predictor variables. Each of these was also coded as 1=yes and 0=no. The following table shows the
output for a logistic regression model containing the two predictors ALCOHOL and CIGARETTES.
Predictor       Coefficient, β̂     SE Coeff, se(β̂)     Wald Z      p-value
Intercept       -5.30904            0.475190             -11.17      < .0001
ALCOHOL         2.98601             0.464671             6.43        < .0001
CIGARETTES      2.84789             0.163839             17.38       < .0001
9a. (2 points)
For the model summarized in the table, state the prediction equation for the estimated probability
(π̂) of marijuana use.
9b. (2 points)
What is the estimated probability of marijuana use (π̂ | alcohol=0, cigarettes=0) for a senior high-school
student who does not drink and who does not smoke cigarettes?
9c. (2 points)
What is the estimated probability of marijuana use (π̂ | alcohol=1, cigarettes=1) for a senior high-school
student who drinks and who also smokes cigarettes?
9d. (2 points)
Using the model fit summarized in the table above, complete the following table.
Estimated Probability of Marijuana Use (π̂), by Alcohol Use and Cigarette Use, Based on Model
                     Cigarette Use
Alcohol Use          Yes          No
Yes                  ______       ______
No                   ______       ______
9e. (2 points)
In 1-3 sentences, what conclusions do you draw from these analyses?
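A short sketch of how the fitted model turns into predicted probabilities follows. It assumes the standard inverse-logit form with the three coefficients from the table and loops over the four alcohol and cigarette combinations; the function name is illustrative.

```python
from math import exp

# Coefficients from the logistic regression output in the question.
b0, b_alcohol, b_cigarettes = -5.30904, 2.98601, 2.84789

def prob_marijuana(alcohol: int, cigarettes: int) -> float:
    """Estimated probability of marijuana use via the inverse logit."""
    eta = b0 + b_alcohol * alcohol + b_cigarettes * cigarettes
    return 1 / (1 + exp(-eta))

# One line per cell of the 2x2 table of predicted probabilities.
for alcohol in (1, 0):
    for cigarettes in (1, 0):
        p = prob_marijuana(alcohol, cigarettes)
        print(f"alcohol={alcohol}, cigarettes={cigarettes}: {p:.4f}")
```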