MA40189 - Solution Sheet Five
Simon Shaw, s.shaw@bath.ac.uk
http://people.bath.ac.uk/masss/ma40189.html
2014/15 Semester II
1. Let X1 , . . . , Xn be exchangeable so that the Xi are conditionally independent
given a parameter θ. For each of the following distributions for Xi | θ find
the Jeffreys prior and the corresponding posterior distribution for θ.
(a) Xi | θ ∼ Bern(θ).
The likelihood is
\[
f(x \mid \theta) = \prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i} = \theta^{n\bar{x}}(1-\theta)^{n-n\bar{x}}.
\]
As θ is univariate
\[
\frac{\partial^2}{\partial\theta^2}\log f(x \mid \theta) = \frac{\partial^2}{\partial\theta^2}\left\{n\bar{x}\log\theta + (n-n\bar{x})\log(1-\theta)\right\} = -\frac{n\bar{x}}{\theta^2} - \frac{(n-n\bar{x})}{(1-\theta)^2}.
\]
The Fisher information is thus
\[
I(\theta) = -E\left(-\frac{n\bar{X}}{\theta^2} - \frac{(n-n\bar{X})}{(1-\theta)^2} \,\Big|\, \theta\right)
= \frac{nE(\bar{X} \mid \theta)}{\theta^2} + \frac{n - nE(\bar{X} \mid \theta)}{(1-\theta)^2}
= \frac{n}{\theta} + \frac{n}{1-\theta} = \frac{n}{\theta(1-\theta)}.
\]
The Jeffreys prior is then
\[
f(\theta) \propto \sqrt{\frac{n}{\theta(1-\theta)}} \propto \theta^{-\frac{1}{2}}(1-\theta)^{-\frac{1}{2}}.
\]
This is the kernel of a Beta(1/2, 1/2) distribution. This makes it straightforward to obtain the posterior using the conjugacy of the Beta distribution relative to the Bernoulli likelihood. The posterior is Beta(1/2 + nx̄, 1/2 + n − nx̄). (If you are not sure why, see Solution Sheet Four Question 1 (a) with α = β = 1/2.)
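A minimal numerical check of this conjugate update (not part of the original solutions; the simulated data, seed and grid below are illustrative assumptions): the log of the unnormalised posterior, Jeffreys prior times Bernoulli likelihood, should differ from the log Beta(1/2 + nx̄, 1/2 + n − nx̄) density only by an additive constant.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
theta_true, n = 0.3, 50
x = rng.binomial(1, theta_true, size=n)   # Bernoulli data
s = x.sum()                               # n * xbar

theta = np.linspace(0.01, 0.99, 999)
# log{ theta^{-1/2}(1-theta)^{-1/2} } plus the log-likelihood
log_post = (-0.5 * np.log(theta) - 0.5 * np.log(1 - theta)
            + s * np.log(theta) + (n - s) * np.log(1 - theta))
diff = log_post - stats.beta.logpdf(theta, 0.5 + s, 0.5 + n - s)
print(np.ptp(diff))   # ~0: the two differ only by a normalising constant
```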
(b) Xi | θ ∼ P o(θ).
The likelihood is
\[
f(x \mid \theta) = \prod_{i=1}^{n} \frac{\theta^{x_i}e^{-\theta}}{x_i!} = \frac{\theta^{n\bar{x}}e^{-n\theta}}{\prod_{i=1}^{n} x_i!}.
\]
As θ is univariate
\[
\frac{\partial^2}{\partial\theta^2}\log f(x \mid \theta) = \frac{\partial^2}{\partial\theta^2}\left\{n\bar{x}\log\theta - n\theta - \sum_{i=1}^{n}\log x_i!\right\} = -\frac{n\bar{x}}{\theta^2}.
\]
The Fisher information is thus
\[
I(\theta) = -E\left(-\frac{n\bar{X}}{\theta^2} \,\Big|\, \theta\right) = \frac{nE(\bar{X} \mid \theta)}{\theta^2} = \frac{n}{\theta}.
\]
The Jeffreys prior is then
\[
f(\theta) \propto \sqrt{\frac{n}{\theta}} \propto \theta^{-\frac{1}{2}}.
\]
This is often expressed as the improper Gamma(1/2, 0) distribution. This makes it straightforward to obtain the posterior using the conjugacy of the Gamma distribution relative to the Poisson likelihood. The posterior is Gamma(1/2 + nx̄, n). (If you are not sure why, see Solution Sheet Four Question 4 (a) with α = 1/2, β = 0.)
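An illustrative check of this update (simulated data, not part of the original sheet): the improper Gamma(1/2, 0) prior combined with Poisson counts should match the Gamma(1/2 + nx̄, n) density up to a normalising constant.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
theta_true, n = 4.0, 40
x = rng.poisson(theta_true, size=n)
s = x.sum()                                     # n * xbar

theta = np.linspace(0.5, 10, 500)
# prior kernel theta^{-1/2} plus the Poisson log-likelihood kernel
log_post = -0.5 * np.log(theta) + s * np.log(theta) - n * theta
diff = log_post - stats.gamma.logpdf(theta, a=0.5 + s, scale=1.0 / n)
print(np.ptp(diff))                             # ~0 (constant shift only)
```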
(c) Xi | θ ∼ Maxwell(θ), the Maxwell distribution with parameter θ so that
\[
f(x_i \mid \theta) = \left(\frac{2}{\pi}\right)^{\frac{1}{2}}\theta^{\frac{3}{2}}x_i^{2}\exp\left\{-\frac{\theta x_i^{2}}{2}\right\}, \quad x_i > 0
\]
and E(Xi | θ) = 2√(2/(πθ)), Var(Xi | θ) = (3π − 8)/(πθ).
The likelihood is
\[
f(x \mid \theta) = \prod_{i=1}^{n}\left(\frac{2}{\pi}\right)^{\frac{1}{2}}\theta^{\frac{3}{2}}x_i^{2}\exp\left\{-\frac{\theta x_i^{2}}{2}\right\}
= \left(\frac{2}{\pi}\right)^{\frac{n}{2}}\theta^{\frac{3n}{2}}\left(\prod_{i=1}^{n}x_i^{2}\right)\exp\left\{-\frac{\theta}{2}\sum_{i=1}^{n}x_i^{2}\right\}.
\]
As θ is univariate
\[
\frac{\partial^2}{\partial\theta^2}\log f(x \mid \theta) = \frac{\partial^2}{\partial\theta^2}\left\{\frac{n}{2}\log\frac{2}{\pi} + \frac{3n}{2}\log\theta + \sum_{i=1}^{n}\log x_i^{2} - \frac{\theta}{2}\sum_{i=1}^{n}x_i^{2}\right\} = -\frac{3n}{2\theta^2}.
\]
The Fisher information is
\[
I(\theta) = -E\left(-\frac{3n}{2\theta^2} \,\Big|\, \theta\right) = \frac{3n}{2\theta^2}.
\]
The Jeffreys prior is then
\[
f(\theta) \propto \sqrt{\frac{3n}{2\theta^2}} \propto \theta^{-1}.
\]
This is often expressed as the improper Gamma(0, 0) distribution. This makes it straightforward to obtain the posterior using the conjugacy of the Gamma distribution relative to the Maxwell likelihood. The posterior is Gamma(3n/2, (1/2)∑_{i=1}^{n} x_i²). (If you are not sure why, see Solution Sheet Four Question 1 (c) with α = β = 0.)
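An illustrative check of this case too (simulated data; the parameterisation below assumes scipy's maxwell distribution with scale = θ^{-1/2}, which matches the density above): with the Jeffreys prior f(θ) ∝ 1/θ the posterior should be Gamma(3n/2, ∑x_i²/2).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
theta_true, n = 2.0, 30
x = stats.maxwell.rvs(scale=theta_true ** -0.5, size=n, random_state=rng)
s2 = np.sum(x ** 2)

theta = np.linspace(0.5, 6, 500)
# log{ theta^{-1} } plus log{ theta^{3n/2} exp(-theta * s2 / 2) }
log_post = (1.5 * n - 1) * np.log(theta) - 0.5 * theta * s2
diff = log_post - stats.gamma.logpdf(theta, a=1.5 * n, scale=2.0 / s2)
print(np.ptp(diff))   # ~0: the stated Gamma(3n/2, s2/2) posterior has the same kernel
```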
2. Let X1 , . . . , Xn be exchangeable so that the Xi are conditionally independent
given a parameter λ. Suppose that Xi | λ ∼ Exp(λ) where λ represents the
rate so that E(Xi | λ) = λ−1 .
(a) Show that Xi | λ ∼ Exp(λ) is a member of the 1-parameter exponential
family. Hence, write down a sufficient statistic t(X) for X = (X1 , . . . , Xn )
for learning about λ.
We write
\[
f(x_i \mid \lambda) = \lambda e^{-\lambda x_i} = \exp\{-\lambda x_i + \log\lambda\}
\]
which is of the form exp{φ1 (λ)u1 (xi ) + g(λ) + h(xi )} where φ1 (λ) = −λ, u1 (xi ) =
xi , g(λ) = log λ and h(xi ) = 0. From the Proposition given in Lecture 9 (see p26
of the on-line notes) a sufficient statistic is
\[
t(X) = \sum_{i=1}^{n}u_1(X_i) = \sum_{i=1}^{n}X_i.
\]
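A small illustration with made-up numbers (not from the sheet): the Exp(λ) likelihood depends on the data only through t(x) = ∑x_i, so two samples of the same size with the same total give exactly the same likelihood function for λ.

```python
import numpy as np

x1 = np.array([0.5, 1.0, 2.5, 1.0])   # sum = 5.0
x2 = np.array([2.0, 1.5, 0.5, 1.0])   # sum = 5.0 as well

def loglik(lam, x):
    # log f(x | lambda) = n log(lambda) - lambda * sum(x_i)
    return len(x) * np.log(lam) - lam * np.sum(x)

lam = np.linspace(0.1, 5.0, 50)
print(np.allclose(loglik(lam, x1), loglik(lam, x2)))   # True: same sufficient statistic
```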
(b) Find the Jeffreys prior and comment upon whether or not it is improper. Find the posterior distribution for this prior.
The likelihood is
\[
f(x \mid \lambda) = \prod_{i=1}^{n}\lambda e^{-\lambda x_i} = \lambda^{n}e^{-\lambda n\bar{x}}. \qquad (1)
\]
As λ is univariate
\[
\frac{\partial^2}{\partial\lambda^2}\log f(x \mid \lambda) = \frac{\partial^2}{\partial\lambda^2}\left(n\log\lambda - \lambda n\bar{x}\right) = -\frac{n}{\lambda^2}.
\]
The Fisher information is
\[
I(\lambda) = -E\left(-\frac{n}{\lambda^2} \,\Big|\, \lambda\right) = \frac{n}{\lambda^2}.
\]
The Jeffreys prior is thus
\[
f(\lambda) \propto \sqrt{\frac{n}{\lambda^2}} \propto \lambda^{-1}. \qquad (2)
\]
This is the improper Gamma(0, 0) distribution. The posterior can be obtained using the conjugacy of the Gamma distribution relative to the Exponential likelihood. It is Gamma(n, nx̄). (If you are not sure why, see Solution Sheet Three Question 1 (a) with α = β = 0.)
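A hedged numerical sketch (simulated data, chosen for illustration): under the Jeffreys prior the posterior is Gamma(n, nx̄), so its mean n/(nx̄) = 1/x̄ coincides with the maximum likelihood estimate of λ.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
lam_true, n = 0.7, 200
x = rng.exponential(scale=1.0 / lam_true, size=n)
xbar = x.mean()

posterior = stats.gamma(a=n, scale=1.0 / (n * xbar))   # Gamma(n, rate n*xbar)
print(posterior.mean(), 1.0 / xbar)                     # identical
print(posterior.interval(0.95))                         # equal-tailed 95% credible interval
```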
(c) Consider the transformation φ = log λ.
i. By expressing L(λ) = f (x | λ) as L(φ) find the Jeffreys prior for φ.
Inverting φ = log λ we have λ = e^φ. Substituting this into (1) we have
\[
L(\phi) = e^{n\phi}\exp\left\{-n\bar{x}e^{\phi}\right\}.
\]
We have
\[
\frac{\partial^2}{\partial\phi^2}\log L(\phi) = \frac{\partial^2}{\partial\phi^2}\left(n\phi - n\bar{x}e^{\phi}\right) = -n\bar{x}e^{\phi}.
\]
The Fisher information is
\[
I(\phi) = -E\left(-n\bar{X}e^{\phi} \,\Big|\, \phi\right) = nE(X_i \mid \phi)e^{\phi}.
\]
Now E(Xi | φ) = E(Xi | λ) = λ^{−1} = e^{−φ} so that I(φ) = n. The Jeffreys prior is then
\[
f_{\phi}(\phi) \propto \sqrt{n} \propto 1 \qquad (3)
\]
which is the improper uniform distribution on (−∞, ∞).
ii. By transforming the distribution of the Jeffreys prior for λ, f (λ),
find the distribution of φ.
From (2) we have that f_λ(λ) ∝ λ^{−1}. Using the familiar change of variables formula,
\[
f_{\phi}(\phi) = f_{\lambda}(e^{\phi})\left|\frac{\partial\lambda}{\partial\phi}\right|
\]
with ∂λ/∂φ = e^φ as λ = e^φ, we have that
\[
f_{\phi}(\phi) \propto e^{-\phi}\,|e^{\phi}| = 1
\]
which agrees with (3). This is an illustration of the invariance to reparameterisation of the Jeffreys prior.
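A tiny numerical illustration of this change of variables (the grid of φ values is arbitrary): pushing f_λ(λ) ∝ 1/λ through φ = log λ gives a flat density in φ, matching (3).

```python
import numpy as np

phi = np.linspace(-3.0, 3.0, 7)
lam = np.exp(phi)
f_lambda = 1.0 / lam        # Jeffreys prior kernel for lambda, from (2)
jacobian = np.exp(phi)      # |d lambda / d phi| = e^phi
print(f_lambda * jacobian)  # all ones: constant in phi
```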
3. The Jeffreys prior for Normal distributions. In Lecture 10 we showed that
for an exchangeable collection X = (X1 , . . . , Xn ) with Xi | θ ∼ N (θ, σ 2 ) where
σ 2 is known the Jeffreys prior for θ is f (θ) ∝ 1.
(a) Consider, instead, that Xi | θ ∼ N (µ, θ) where µ is known. Find the
Jeffreys prior for θ.
The likelihood is
\[
f(x \mid \theta) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\theta}}\exp\left\{-\frac{1}{2\theta}(x_i-\mu)^2\right\}
= (2\pi)^{-\frac{n}{2}}\theta^{-\frac{n}{2}}\exp\left\{-\frac{1}{2\theta}\sum_{i=1}^{n}(x_i-\mu)^2\right\}.
\]
As θ is univariate
\[
\frac{\partial^2}{\partial\theta^2}\log f(x \mid \theta) = \frac{\partial^2}{\partial\theta^2}\left\{-\frac{n}{2}\log 2\pi - \frac{n}{2}\log\theta - \frac{1}{2\theta}\sum_{i=1}^{n}(x_i-\mu)^2\right\}
= \frac{n}{2\theta^2} - \frac{1}{\theta^3}\sum_{i=1}^{n}(x_i-\mu)^2.
\]
The Fisher information is
\[
I(\theta) = -E\left\{\frac{n}{2\theta^2} - \frac{1}{\theta^3}\sum_{i=1}^{n}(X_i-\mu)^2 \,\Big|\, \theta\right\}
= -\frac{n}{2\theta^2} + \frac{1}{\theta^3}\sum_{i=1}^{n}E\left\{(X_i-\mu)^2 \mid \theta\right\}
= -\frac{n}{2\theta^2} + \frac{n}{\theta^2} = \frac{n}{2\theta^2}.
\]
The Jeffreys prior is then
\[
f(\theta) \propto \sqrt{\frac{n}{2\theta^2}} \propto \theta^{-1}.
\]
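A Monte Carlo sketch (settings chosen purely for illustration, not from the sheet) checking that the Fisher information for N(µ, θ) data with known µ is n/(2θ²): the average of minus the second derivative of the log-likelihood over simulated datasets should match that value.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, theta, n, reps = 1.0, 2.5, 20, 200_000

x = rng.normal(mu, np.sqrt(theta), size=(reps, n))
# minus the second derivative of log f(x | theta) with respect to theta, per dataset
neg_hess = -n / (2 * theta ** 2) + np.sum((x - mu) ** 2, axis=1) / theta ** 3
print(neg_hess.mean(), n / (2 * theta ** 2))   # the two numbers should agree closely
```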
(b) Now suppose that Xi | θ ∼ N (µ, σ 2 ) where θ = (µ, σ 2 ). Find the Jeffreys
prior for θ.
The likelihood is
\[
f(x \mid \theta) = (2\pi)^{-\frac{n}{2}}(\sigma^2)^{-\frac{n}{2}}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\right\}
\]
so that
\[
\log f(x \mid \theta) = -\frac{n}{2}\log 2\pi - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2.
\]
As θ = (µ, σ²) then the Fisher information matrix is
\[
I(\theta) = -\begin{pmatrix}
E\left\{\frac{\partial^2}{\partial\mu^2}\log f(x \mid \theta) \,\Big|\, \theta\right\} & E\left\{\frac{\partial^2}{\partial\mu\,\partial\sigma^2}\log f(x \mid \theta) \,\Big|\, \theta\right\} \\
E\left\{\frac{\partial^2}{\partial\mu\,\partial\sigma^2}\log f(x \mid \theta) \,\Big|\, \theta\right\} & E\left\{\frac{\partial^2}{\partial(\sigma^2)^2}\log f(x \mid \theta) \,\Big|\, \theta\right\}
\end{pmatrix}.
\]
Now,
\[
\frac{\partial}{\partial\mu}\log f(x \mid \theta) = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu); \qquad
\frac{\partial}{\partial(\sigma^2)}\log f(x \mid \theta) = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i-\mu)^2;
\]
so that
\[
\frac{\partial^2}{\partial\mu^2}\log f(x \mid \theta) = -\frac{n}{\sigma^2}; \qquad
\frac{\partial^2}{\partial\mu\,\partial(\sigma^2)}\log f(x \mid \theta) = -\frac{1}{\sigma^4}\sum_{i=1}^{n}(x_i-\mu); \qquad
\frac{\partial^2}{\partial(\sigma^2)^2}\log f(x \mid \theta) = \frac{n}{2\sigma^4} - \frac{1}{\sigma^6}\sum_{i=1}^{n}(x_i-\mu)^2.
\]
Noting that E(Xi − µ | θ) = 0 and E{(Xi − µ)² | θ} = σ², the Fisher information matrix is
\[
I(\theta) = -\begin{pmatrix} -\frac{n}{\sigma^2} & 0 \\ 0 & \frac{n}{2\sigma^4} - \frac{n}{\sigma^4} \end{pmatrix}
= \begin{pmatrix} \frac{n}{\sigma^2} & 0 \\ 0 & \frac{n}{2\sigma^4} \end{pmatrix}.
\]
The Jeffreys prior is
\[
f(\theta) \propto \sqrt{|I(\theta)|} = \sqrt{\frac{n^2}{2\sigma^6}} \propto \sigma^{-3}.
\]
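A hedged symbolic check of this matrix calculation (sympy is an assumption here, not something the sheet uses): differentiate the single-observation log-likelihood, take expectations using E(X) = µ and E(X²) = σ² + µ², scale up to n observations and read off the determinant.

```python
import sympy as sp

x, mu, s2, n = sp.symbols('x mu sigma2 n', positive=True)

# log-likelihood of a single observation from N(mu, sigma2)
loglik = -sp.Rational(1, 2) * sp.log(2 * sp.pi * s2) - (x - mu) ** 2 / (2 * s2)

def expect(expr):
    # expectations under N(mu, sigma2): E(x) = mu, E(x^2) = sigma2 + mu^2
    expr = sp.expand(expr)
    return expr.subs(x ** 2, s2 + mu ** 2).subs(x, mu)

# matrix of second derivatives with respect to (mu, sigma2)
H = sp.Matrix([
    [sp.diff(loglik, mu, 2), sp.diff(loglik, mu, s2)],
    [sp.diff(loglik, mu, s2), sp.diff(loglik, s2, 2)],
])

I1 = -H.applyfunc(expect)                 # Fisher information for one observation
I_n = n * I1                              # i.i.d. sample of size n
print(sp.simplify(I_n))                   # diag(n/sigma2, n/(2*sigma2**2))
print(sp.simplify(sp.sqrt(I_n.det())))    # proportional to sigma2**(-3/2), i.e. sigma^{-3}
```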
(c) Comment upon your answers for these three Normal cases.
Suppose θ = (µ, σ²) and Xi | θ ∼ N(µ, σ²). If µ is unknown and σ² known then the Jeffreys prior is f(µ) ∝ 1. If µ is known and σ² is unknown then the Jeffreys prior is f(σ²) ∝ σ^{−2}. When both µ and σ² are unknown then the Jeffreys prior is f(µ, σ²) ∝ σ^{−3}. Jeffreys himself found this inconsistent, arguing that f(µ, σ²) ∝ σ^{−2}, the product of the priors f(µ) and f(σ²). Jeffreys' argument was that ignorance about µ and σ² should be represented by independent ignorance priors for µ and σ² separately. However, it is not clear under what circumstances this prior judgement of independence should be imposed.
4. Consider, given θ, a sequence of independent Bernoulli trials with parameter θ. We wish to make inferences about θ and consider two possible
methods. In the first, we carry out n trials and let X denote the total
number of successes in these trials. Thus, X | θ ∼ Bin(n, θ) with
\[
f_X(x \mid \theta) = \binom{n}{x}\theta^{x}(1-\theta)^{n-x}, \quad x = 0, 1, \ldots, n.
\]
In the second method, we count the total number Y of trials up to and
including the rth success so that Y | θ ∼ N bin(r, θ), the negative binomial
distribution, with
\[
f_Y(y \mid \theta) = \binom{y-1}{r-1}\theta^{r}(1-\theta)^{y-r}, \quad y = r, r+1, \ldots.
\]
(a) Obtain the Jeffreys prior distribution for each of the two methods. You may find it useful to note that E(Y | θ) = r/θ.
For X | θ ∼ Bin(n, θ) we have
\[
\log f_X(x \mid \theta) = \log\binom{n}{x} + x\log\theta + (n-x)\log(1-\theta)
\]
so that
\[
\frac{\partial^2}{\partial\theta^2}\log f_X(x \mid \theta) = -\frac{x}{\theta^2} - \frac{(n-x)}{(1-\theta)^2}.
\]
The Fisher information is
\[
I(\theta) = -E\left\{-\frac{X}{\theta^2} - \frac{(n-X)}{(1-\theta)^2} \,\Big|\, \theta\right\}
= \frac{n}{\theta} + \frac{n}{1-\theta} = \frac{n}{\theta(1-\theta)}.
\]
The Jeffreys prior is thus
\[
f(\theta) \propto \sqrt{\frac{n}{\theta(1-\theta)}} \propto \theta^{-\frac{1}{2}}(1-\theta)^{-\frac{1}{2}}
\]
which is a kernel of the Beta(1/2, 1/2) distribution.
For Y | θ ∼ Nbin(r, θ) we have
\[
\log f_Y(y \mid \theta) = \log\binom{y-1}{r-1} + r\log\theta + (y-r)\log(1-\theta)
\]
so that
\[
\frac{\partial^2}{\partial\theta^2}\log f_Y(y \mid \theta) = -\frac{r}{\theta^2} - \frac{(y-r)}{(1-\theta)^2}.
\]
The Fisher information is
\[
I(\theta) = -E\left\{-\frac{r}{\theta^2} - \frac{(Y-r)}{(1-\theta)^2} \,\Big|\, \theta\right\}
= \frac{r}{\theta^2} + \frac{r\left(\frac{1}{\theta}-1\right)}{(1-\theta)^2}
= \frac{r}{\theta^2} + \frac{r}{\theta(1-\theta)} = \frac{r}{\theta^2(1-\theta)}.
\]
The Jeffreys prior is thus
\[
f(\theta) \propto \sqrt{\frac{r}{\theta^2(1-\theta)}} \propto \theta^{-1}(1-\theta)^{-\frac{1}{2}}
\]
which can be viewed as the improper Beta(0, 1/2) distribution.
(b) Suppose we observe x = r and y = n. For each method, calculate the
posterior distribution for θ with the Jeffreys prior. Comment upon
your answers.
We summarise the results in a table.

  Prior            Likelihood                                Posterior
  Beta(1/2, 1/2)   X | θ ∼ Bin(n, θ);  θ^x (1 − θ)^{n−x}     Beta(1/2 + x, 1/2 + n − x)
  Beta(0, 1/2)     Y | θ ∼ Nbin(r, θ); θ^r (1 − θ)^{y−r}     Beta(r, 1/2 + y − r)
Notice that if x = r and y = n then the two approaches have identical likelihoods: in both cases we observed x successes in n trials but θ | x ∼ Beta(1/2 + x, 1/2 + n − x) and θ | y ∼ Beta(x, 1/2 + n − x). Although the likelihoods are the same, Jeffreys' approach yields different posterior distributions which seems to contradict the notion of a noninformative prior. This occurs because Jeffreys' prior violates the likelihood principle. In short this principle states that the likelihood contains all the information about the data x so that two likelihoods contain the same information if they are proportional. In this case, the two likelihoods are identical but yield different posterior distributions. Classical statistics violates the likelihood principle but Bayesian statistics (using proper prior distributions) does not.
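A short numerical comparison (the values x = r = 3 and y = n = 12 are made up for illustration): the same observed data give different posteriors under the two Jeffreys priors, which is the likelihood-principle violation noted above.

```python
from scipy import stats

r, n = 3, 12
x, y = r, n                                      # x = r successes in y = n trials

binom_post = stats.beta(0.5 + x, 0.5 + n - x)    # Beta(1/2 + x, 1/2 + n - x)
nbin_post = stats.beta(x, 0.5 + n - x)           # Beta(r, 1/2 + y - r)

print(binom_post.mean(), nbin_post.mean())       # different posterior means
print(binom_post.interval(0.95))
print(nbin_post.interval(0.95))
```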