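MA40189 - Solution Sheet Five
Simon Shaw, s.shaw@bath.ac.uk
http://people.bath.ac.uk/masss/ma40189.html
2014/15 Semester II

1. Let $X_1, \ldots, X_n$ be exchangeable so that the $X_i$ are conditionally independent given a parameter $\theta$. For each of the following distributions for $X_i \mid \theta$ find the Jeffreys prior and the corresponding posterior distribution for $\theta$.

(a) $X_i \mid \theta \sim \text{Bern}(\theta)$.

The likelihood is
\[
f(x \mid \theta) = \prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i} = \theta^{n\bar{x}}(1-\theta)^{n-n\bar{x}}.
\]
As $\theta$ is univariate,
\begin{align*}
\frac{\partial^2}{\partial\theta^2} \log f(x \mid \theta) &= \frac{\partial^2}{\partial\theta^2} \left\{ n\bar{x}\log\theta + (n - n\bar{x})\log(1-\theta) \right\} \\
&= -\frac{n\bar{x}}{\theta^2} - \frac{(n - n\bar{x})}{(1-\theta)^2}.
\end{align*}
The Fisher information is thus
\begin{align*}
I(\theta) &= -E\left\{ -\frac{n\bar{X}}{\theta^2} - \frac{(n - n\bar{X})}{(1-\theta)^2} \;\Big|\; \theta \right\} \\
&= \frac{nE(\bar{X} \mid \theta)}{\theta^2} + \frac{\{n - nE(\bar{X} \mid \theta)\}}{(1-\theta)^2} \\
&= \frac{n}{\theta} + \frac{n}{1-\theta} = \frac{n}{\theta(1-\theta)}.
\end{align*}
The Jeffreys prior is then
\[
f(\theta) \propto \sqrt{\frac{n}{\theta(1-\theta)}} \propto \theta^{-\frac{1}{2}}(1-\theta)^{-\frac{1}{2}}.
\]
This is the kernel of a $\text{Beta}(\frac{1}{2}, \frac{1}{2})$ distribution. This makes it straightforward to obtain the posterior using the conjugacy of the Beta distribution relative to the Bernoulli likelihood. The posterior is $\text{Beta}(\frac{1}{2} + n\bar{x}, \frac{1}{2} + n - n\bar{x})$. (If you are not sure why, see Solution Sheet Four Question 1 (a) with $\alpha = \beta = \frac{1}{2}$.)

(b) $X_i \mid \theta \sim \text{Po}(\theta)$.

The likelihood is
\[
f(x \mid \theta) = \prod_{i=1}^{n} \frac{\theta^{x_i} e^{-\theta}}{x_i!} = \frac{\theta^{n\bar{x}} e^{-n\theta}}{\prod_{i=1}^{n} x_i!}.
\]
As $\theta$ is univariate,
\begin{align*}
\frac{\partial^2}{\partial\theta^2} \log f(x \mid \theta) &= \frac{\partial^2}{\partial\theta^2} \left( n\bar{x}\log\theta - n\theta - \sum_{i=1}^{n} \log x_i! \right) \\
&= -\frac{n\bar{x}}{\theta^2}.
\end{align*}
The Fisher information is thus
\[
I(\theta) = -E\left( -\frac{n\bar{X}}{\theta^2} \;\Big|\; \theta \right) = \frac{nE(\bar{X} \mid \theta)}{\theta^2} = \frac{n}{\theta}.
\]
The Jeffreys prior is then
\[
f(\theta) \propto \sqrt{\frac{n}{\theta}} \propto \theta^{-\frac{1}{2}}.
\]
This is often expressed as the improper $\text{Gamma}(\frac{1}{2}, 0)$ distribution. This makes it straightforward to obtain the posterior using the conjugacy of the Gamma distribution relative to the Poisson likelihood. The posterior is $\text{Gamma}(\frac{1}{2} + n\bar{x}, n)$. (If you are not sure why, see Solution Sheet Four Question 4 (a) with $\alpha = \frac{1}{2}$, $\beta = 0$.)

(c) $X_i \mid \theta \sim \text{Maxwell}(\theta)$, the Maxwell distribution with parameter $\theta$ so that
\[
f(x_i \mid \theta) = \left(\frac{2}{\pi}\right)^{\frac{1}{2}} \theta^{\frac{3}{2}} x_i^2 \exp\left\{ -\frac{\theta x_i^2}{2} \right\}, \quad x_i > 0
\]
and $E(X_i \mid \theta) = 2\sqrt{\frac{2}{\pi\theta}}$, $Var(X_i \mid \theta) = \frac{3\pi - 8}{\pi\theta}$.
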
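The likelihood is
\begin{align*}
f(x \mid \theta) &= \prod_{i=1}^{n} \left(\frac{2}{\pi}\right)^{\frac{1}{2}} \theta^{\frac{3}{2}} x_i^2 \exp\left\{ -\frac{\theta x_i^2}{2} \right\} \\
&= \left(\frac{2}{\pi}\right)^{\frac{n}{2}} \theta^{\frac{3n}{2}} \left( \prod_{i=1}^{n} x_i^2 \right) \exp\left\{ -\frac{\theta}{2} \sum_{i=1}^{n} x_i^2 \right\}.
\end{align*}
As $\theta$ is univariate,
\begin{align*}
\frac{\partial^2}{\partial\theta^2} \log f(x \mid \theta) &= \frac{\partial^2}{\partial\theta^2} \left( \frac{n}{2}\log\frac{2}{\pi} + \frac{3n}{2}\log\theta + \sum_{i=1}^{n} \log x_i^2 - \frac{\theta}{2}\sum_{i=1}^{n} x_i^2 \right) \\
&= -\frac{3n}{2\theta^2}.
\end{align*}
The Fisher information is
\[
I(\theta) = -E\left( -\frac{3n}{2\theta^2} \;\Big|\; \theta \right) = \frac{3n}{2\theta^2}.
\]
The Jeffreys prior is then
\[
f(\theta) \propto \sqrt{\frac{3n}{2\theta^2}} \propto \theta^{-1}.
\]
This is often expressed as the improper $\text{Gamma}(0, 0)$ distribution. This makes it straightforward to obtain the posterior using the conjugacy of the Gamma distribution relative to the Maxwell likelihood. The posterior is $\text{Gamma}(\frac{3n}{2}, \frac{1}{2}\sum_{i=1}^{n} x_i^2)$. (If you are not sure why, see Solution Sheet Four Question 1 (c) with $\alpha = \beta = 0$.)

2. Let $X_1, \ldots, X_n$ be exchangeable so that the $X_i$ are conditionally independent given a parameter $\lambda$. Suppose that $X_i \mid \lambda \sim \text{Exp}(\lambda)$ where $\lambda$ represents the rate so that $E(X_i \mid \lambda) = \lambda^{-1}$.

(a) Show that $X_i \mid \lambda \sim \text{Exp}(\lambda)$ is a member of the 1-parameter exponential family. Hence, write down a sufficient statistic $t(X)$ for $X = (X_1, \ldots, X_n)$ for learning about $\lambda$.

We write
\[
f(x_i \mid \lambda) = \lambda e^{-\lambda x_i} = \exp\{ -\lambda x_i + \log\lambda \}
\]
which is of the form $\exp\{\phi_1(\lambda)u_1(x_i) + g(\lambda) + h(x_i)\}$ where $\phi_1(\lambda) = -\lambda$, $u_1(x_i) = x_i$, $g(\lambda) = \log\lambda$ and $h(x_i) = 0$. From the Proposition given in Lecture 9 (see p26 of the on-line notes) a sufficient statistic is
\[
t(X) = \sum_{i=1}^{n} u_1(X_i) = \sum_{i=1}^{n} X_i.
\]

(b) Find the Jeffreys prior and comment upon whether or not it is improper. Find the posterior distribution for this prior.

The likelihood is
\[
f(x \mid \lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda n\bar{x}}. \tag{1}
\]
As $\lambda$ is univariate,
\begin{align*}
\frac{\partial^2}{\partial\lambda^2} \log f(x \mid \lambda) &= \frac{\partial^2}{\partial\lambda^2} (n\log\lambda - \lambda n\bar{x}) \\
&= -\frac{n}{\lambda^2}.
\end{align*}
The Fisher information is
\[
I(\lambda) = -E\left( -\frac{n}{\lambda^2} \;\Big|\; \lambda \right) = \frac{n}{\lambda^2}.
\]
The Jeffreys prior is thus
\[
f(\lambda) \propto \sqrt{\frac{n}{\lambda^2}} \propto \lambda^{-1}. \tag{2}
\]
This is the improper $\text{Gamma}(0, 0)$ distribution: it is improper because $\int_0^\infty \lambda^{-1} \, d\lambda$ diverges. The posterior can be obtained using the conjugacy of the Gamma distribution relative to the Exponential likelihood. It is $\text{Gamma}(n, n\bar{x})$.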
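(If you are not sure why, see Solution Sheet Three Question 1 (a) with $\alpha = \beta = 0$.)

(c) Consider the transformation $\phi = \log\lambda$.

i. By expressing $L(\lambda) = f(x \mid \lambda)$ as $L(\phi)$ find the Jeffreys prior for $\phi$.

Inverting $\phi = \log\lambda$ we have $\lambda = e^{\phi}$. Substituting this into (1) we have
\[
L(\phi) = e^{n\phi} \exp\left\{ -n\bar{x}e^{\phi} \right\}.
\]
We have
\begin{align*}
\frac{\partial^2}{\partial\phi^2} \log L(\phi) &= \frac{\partial^2}{\partial\phi^2} \left( n\phi - n\bar{x}e^{\phi} \right) \\
&= -n\bar{x}e^{\phi}.
\end{align*}
The Fisher information is
\[
I(\phi) = -E\left( -n\bar{X}e^{\phi} \mid \phi \right) = nE(X_i \mid \phi)e^{\phi}.
\]
Now $E(X_i \mid \phi) = E(X_i \mid \lambda) = \lambda^{-1} = e^{-\phi}$ so that $I(\phi) = n$. The Jeffreys prior is then
\[
f_{\phi}(\phi) \propto \sqrt{n} \propto 1 \tag{3}
\]
which is the improper uniform distribution on $(-\infty, \infty)$.

ii. By transforming the distribution of the Jeffreys prior for $\lambda$, $f(\lambda)$, find the distribution of $\phi$.

From (2) we have that $f_{\lambda}(\lambda) \propto \lambda^{-1}$. Using the familiar change of variables formula,
\[
f_{\phi}(\phi) = \left| \frac{\partial\lambda}{\partial\phi} \right| f_{\lambda}(e^{\phi})
\]
with $\frac{\partial\lambda}{\partial\phi} = e^{\phi}$ as $\lambda = e^{\phi}$, we have that
\[
f_{\phi}(\phi) \propto |e^{\phi}| e^{-\phi} = 1
\]
which agrees with (3). This is an illustration of the invariance to reparameterisation of the Jeffreys prior.

3. The Jeffreys prior for Normal distributions. In Lecture 10 we showed that for an exchangeable collection $X = (X_1, \ldots, X_n)$ with $X_i \mid \theta \sim N(\theta, \sigma^2)$ where $\sigma^2$ is known the Jeffreys prior for $\theta$ is $f(\theta) \propto 1$.

(a) Consider, instead, that $X_i \mid \theta \sim N(\mu, \theta)$ where $\mu$ is known. Find the Jeffreys prior for $\theta$.

The likelihood is
\begin{align*}
f(x \mid \theta) &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\theta}} \exp\left\{ -\frac{1}{2\theta}(x_i - \mu)^2 \right\} \\
&= (2\pi)^{-\frac{n}{2}} \theta^{-\frac{n}{2}} \exp\left\{ -\frac{1}{2\theta}\sum_{i=1}^{n}(x_i - \mu)^2 \right\}.
\end{align*}
As $\theta$ is univariate,
\begin{align*}
\frac{\partial^2}{\partial\theta^2} \log f(x \mid \theta) &= \frac{\partial^2}{\partial\theta^2} \left\{ -\frac{n}{2}\log 2\pi - \frac{n}{2}\log\theta - \frac{1}{2\theta}\sum_{i=1}^{n}(x_i - \mu)^2 \right\} \\
&= \frac{n}{2\theta^2} - \frac{1}{\theta^3}\sum_{i=1}^{n}(x_i - \mu)^2.
\end{align*}
The Fisher information is
\begin{align*}
I(\theta) &= -E\left\{ \frac{n}{2\theta^2} - \frac{1}{\theta^3}\sum_{i=1}^{n}(X_i - \mu)^2 \;\Big|\; \theta \right\} \\
&= -\frac{n}{2\theta^2} + \frac{1}{\theta^3}\sum_{i=1}^{n} E\left\{ (X_i - \mu)^2 \mid \theta \right\} \\
&= -\frac{n}{2\theta^2} + \frac{n}{\theta^2} = \frac{n}{2\theta^2}.
\end{align*}
The Jeffreys prior is then
\[
f(\theta) \propto \sqrt{\frac{n}{2\theta^2}} \propto \theta^{-1}.
\]

(b) Now suppose that $X_i \mid \theta \sim N(\mu, \sigma^2)$ where $\theta = (\mu, \sigma^2)$. Find the Jeffreys prior for $\theta$.

The likelihood is
\[
f(x \mid \theta) = (2\pi)^{-\frac{n}{2}} (\sigma^2)^{-\frac{n}{2}} \exp\left\{ -\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2 \right\}
\]
so that
\[
\log f(x \mid \theta) = -\frac{n}{2}\log 2\pi - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2.
\]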
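As $\theta = (\mu, \sigma^2)$, the Fisher information matrix is
\[
I(\theta) = -\begin{pmatrix}
E\left\{ \frac{\partial^2}{\partial\mu^2} \log f(x \mid \theta) \;\Big|\; \theta \right\} & E\left\{ \frac{\partial^2}{\partial\mu\,\partial(\sigma^2)} \log f(x \mid \theta) \;\Big|\; \theta \right\} \\
E\left\{ \frac{\partial^2}{\partial\mu\,\partial(\sigma^2)} \log f(x \mid \theta) \;\Big|\; \theta \right\} & E\left\{ \frac{\partial^2}{\partial(\sigma^2)^2} \log f(x \mid \theta) \;\Big|\; \theta \right\}
\end{pmatrix}.
\]
Now,
\begin{align*}
\frac{\partial}{\partial\mu} \log f(x \mid \theta) &= \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu); \\
\frac{\partial}{\partial(\sigma^2)} \log f(x \mid \theta) &= -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu)^2;
\end{align*}
so that
\begin{align*}
\frac{\partial^2}{\partial\mu^2} \log f(x \mid \theta) &= -\frac{n}{\sigma^2}; \\
\frac{\partial^2}{\partial\mu\,\partial(\sigma^2)} \log f(x \mid \theta) &= -\frac{1}{\sigma^4}\sum_{i=1}^{n}(x_i - \mu); \\
\frac{\partial^2}{\partial(\sigma^2)^2} \log f(x \mid \theta) &= \frac{n}{2\sigma^4} - \frac{1}{\sigma^6}\sum_{i=1}^{n}(x_i - \mu)^2.
\end{align*}
Noting that $E(X_i - \mu \mid \theta) = 0$ and $E\{(X_i - \mu)^2 \mid \theta\} = \sigma^2$, the Fisher information matrix is
\[
I(\theta) = -\begin{pmatrix} -\frac{n}{\sigma^2} & 0 \\ 0 & \frac{n}{2\sigma^4} - \frac{n}{\sigma^4} \end{pmatrix} = \begin{pmatrix} \frac{n}{\sigma^2} & 0 \\ 0 & \frac{n}{2\sigma^4} \end{pmatrix}.
\]
The Jeffreys prior is
\[
f(\theta) \propto \sqrt{|I(\theta)|} = \sqrt{\frac{n^2}{2\sigma^6}} \propto \sigma^{-3}.
\]

(c) Comment upon your answers for these three Normal cases.

Suppose $\theta = (\mu, \sigma^2)$ and $X_i \mid \theta \sim N(\mu, \sigma^2)$. If $\mu$ is unknown and $\sigma^2$ known then the Jeffreys prior is $f(\mu) \propto 1$. If $\mu$ is known and $\sigma^2$ is unknown then the Jeffreys prior is $f(\sigma^2) \propto \sigma^{-2}$. When both $\mu$ and $\sigma^2$ are unknown then the Jeffreys prior is $f(\mu, \sigma^2) \propto \sigma^{-3}$. Jeffreys himself found this inconsistent, arguing that $f(\mu, \sigma^2) \propto \sigma^{-2}$, the product of the priors $f(\mu)$ and $f(\sigma^2)$. Jeffreys' argument was that ignorance about $\mu$ and $\sigma^2$ should be represented by independent ignorance priors for $\mu$ and $\sigma^2$ separately. However, it is not clear under what circumstances this prior judgement of independence should be imposed.

4. Consider, given $\theta$, a sequence of independent Bernoulli trials with parameter $\theta$. We wish to make inferences about $\theta$ and consider two possible methods. In the first, we carry out $n$ trials and let $X$ denote the total number of successes in these trials. Thus, $X \mid \theta \sim \text{Bin}(n, \theta)$ with
\[
f_X(x \mid \theta) = \binom{n}{x} \theta^x (1-\theta)^{n-x}, \quad x = 0, 1, \ldots, n.
\]
In the second method, we count the total number $Y$ of trials up to and including the $r$th success so that $Y \mid \theta \sim \text{Nbin}(r, \theta)$, the negative binomial distribution, with
\[
f_Y(y \mid \theta) = \binom{y-1}{r-1} \theta^r (1-\theta)^{y-r}, \quad y = r, r+1, \ldots.
\]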
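(a) Obtain the Jeffreys prior distribution for each of the two methods. You may find it useful to note that $E(Y \mid \theta) = \frac{r}{\theta}$.

For $X \mid \theta \sim \text{Bin}(n, \theta)$ we have
\[
\log f_X(x \mid \theta) = \log\binom{n}{x} + x\log\theta + (n - x)\log(1-\theta)
\]
so that
\[
\frac{\partial^2}{\partial\theta^2} \log f_X(x \mid \theta) = -\frac{x}{\theta^2} - \frac{(n - x)}{(1-\theta)^2}.
\]
The Fisher information is
\begin{align*}
I(\theta) &= -E\left\{ -\frac{X}{\theta^2} - \frac{(n - X)}{(1-\theta)^2} \;\Big|\; \theta \right\} \\
&= \frac{n}{\theta} + \frac{n}{1-\theta} = \frac{n}{\theta(1-\theta)}.
\end{align*}
The Jeffreys prior is thus
\[
f(\theta) \propto \sqrt{\frac{n}{\theta(1-\theta)}} \propto \theta^{-\frac{1}{2}}(1-\theta)^{-\frac{1}{2}}
\]
which is a kernel of the $\text{Beta}(\frac{1}{2}, \frac{1}{2})$ distribution.

For $Y \mid \theta \sim \text{Nbin}(r, \theta)$ we have
\[
\log f_Y(y \mid \theta) = \log\binom{y-1}{r-1} + r\log\theta + (y - r)\log(1-\theta)
\]
so that
\[
\frac{\partial^2}{\partial\theta^2} \log f_Y(y \mid \theta) = -\frac{r}{\theta^2} - \frac{(y - r)}{(1-\theta)^2}.
\]
The Fisher information is
\begin{align*}
I(\theta) &= -E\left\{ -\frac{r}{\theta^2} - \frac{(Y - r)}{(1-\theta)^2} \;\Big|\; \theta \right\} \\
&= \frac{r}{\theta^2} + \frac{r(\frac{1}{\theta} - 1)}{(1-\theta)^2} \\
&= \frac{r}{\theta^2} + \frac{r}{\theta(1-\theta)} = \frac{r}{\theta^2(1-\theta)}.
\end{align*}
The Jeffreys prior is thus
\[
f(\theta) \propto \sqrt{\frac{r}{\theta^2(1-\theta)}} \propto \theta^{-1}(1-\theta)^{-\frac{1}{2}}
\]
which can be viewed as the improper $\text{Beta}(0, \frac{1}{2})$ distribution.

(b) Suppose we observe $x = r$ and $y = n$. For each method, calculate the posterior distribution for $\theta$ with the Jeffreys prior. Comment upon your answers.

We summarise the results in a table.

\begin{tabular}{lll}
Prior & Likelihood & Posterior \\
\hline
$\text{Beta}(\frac{1}{2}, \frac{1}{2})$ & $X \mid \theta \sim \text{Bin}(n, \theta)$; $\theta^x(1-\theta)^{n-x}$ & $\text{Beta}(\frac{1}{2} + x, \frac{1}{2} + n - x)$ \\
$\text{Beta}(0, \frac{1}{2})$ & $Y \mid \theta \sim \text{Nbin}(r, \theta)$; $\theta^r(1-\theta)^{y-r}$ & $\text{Beta}(r, \frac{1}{2} + y - r)$
\end{tabular}

Notice that if $x = r$ and $y = n$ then the two approaches have likelihoods that are identical as functions of $\theta$, up to a constant that does not depend on $\theta$: in both cases we observed $x$ successes in $n$ trials. However, $\theta \mid x \sim \text{Beta}(\frac{1}{2} + x, \frac{1}{2} + n - x)$ whereas $\theta \mid y \sim \text{Beta}(x, \frac{1}{2} + n - x)$. Although the likelihoods carry the same information, Jeffreys' approach yields different posterior distributions, which seems to contradict the notion of a noninformative prior. This occurs because Jeffreys' prior violates the likelihood principle. In short, this principle states that the likelihood contains all the information in the data $x$ about $\theta$, so that two likelihoods contain the same information if they are proportional. In this case, the two likelihoods are proportional but yield different posterior distributions.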
Classical statistics violates the likelihood principle but Bayesian statistics (using proper prior distributions) does not.
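As a numerical illustration of Question 4 (b), the following Python sketch applies the conjugate Beta update under each sampling scheme and compares the posterior means. The data values (3 successes in 12 trials) and the helper names `beta_posterior` and `beta_mean` are assumptions chosen purely for illustration; they do not come from the sheet.

```python
from fractions import Fraction


def beta_posterior(prior_a, prior_b, successes, failures):
    """Conjugate update of a Beta(prior_a, prior_b) prior with a
    likelihood kernel theta^successes * (1 - theta)^failures."""
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean a / (a + b) of a proper Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical data: x = r = 3 successes observed in y = n = 12 trials.
x = r = 3
n = y = 12
half = Fraction(1, 2)

# Binomial sampling: the Jeffreys prior is Beta(1/2, 1/2).
bin_post = beta_posterior(half, half, x, n - x)

# Negative binomial sampling: the Jeffreys prior is the improper Beta(0, 1/2).
nbin_post = beta_posterior(Fraction(0), half, r, y - r)

print("Binomial posterior parameters:", bin_post)
print("Binomial posterior mean:", beta_mean(*bin_post))
print("Negative binomial posterior parameters:", nbin_post)
print("Negative binomial posterior mean:", beta_mean(*nbin_post))
```

With these illustrative values the posteriors are Beta(7/2, 19/2) and Beta(3, 19/2), with means 7/26 and 6/25 respectively, so the same observed successes and trials lead to different inferences depending on the stopping rule. Exact fractions are used so that the comparison is not blurred by floating-point rounding.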