Regression Analysis of Data from a Cluster Sample

Dan Pfeffermann; Gad Nathan
Journal of the American Statistical Association, Vol. 76, No. 375. (Sep., 1981), pp. 681-689.
Stable URL:
http://links.jstor.org/sici?sici=0162-1459%28198109%2976%3A375%3C681%3ARAODFA%3E2.0.CO%3B2-Z
Journal of the American Statistical Association is currently published by American Statistical Association.
Regression Analysis of Data From a Cluster Sample DAN PFEFFERMANN and GAD NATHAN*
For the case of different regression relationships in different subgroups of a finite population, only part of which are sampled, the extended least squares estimator of any weighted average of the distinct coefficients is derived, under assumptions relating only to the first two moments of the distribution of the coefficients. Under these assumptions, the estimator is shown to be the best linear $\xi$-unbiased estimator, while under further distributional assumptions, it is also the Bayesian estimator for a quadratic loss function. For the case of unknown variances, a method for estimating them from the sample is proposed. The empirical estimator thus obtained is shown to perform well by a simulation comparison with the optimal estimator and with other proposed empirical estimators.

KEY WORDS: Regression; Complex samples; Cluster sampling; Extended least squares; Bayesian estimator.

* Dan Pfeffermann is Lecturer and Gad Nathan is Associate Professor, at the Department of Statistics, Hebrew University of Jerusalem. They are also both associated with the Israel Central Bureau of Statistics. A previous version of this paper was presented as an invited paper at the third meeting of the International Association of Survey Statisticians, New Delhi, 1977, and it is partially based on the thesis of the first author, written under the supervision of the second author. The authors are grateful to anonymous referees for their helpful comments, and to Israel Einot for programming the simulation computations.

© Journal of the American Statistical Association, September 1981, Volume 76, Number 375, Theory and Methods Section

1. INTRODUCTION

The problem of regression analysis for data from a cluster sample, with different regression relationships in different clusters, can be formulated as follows. For each unit, $u$, of a finite population, $U$, containing $M$ identifiable units, there are observations on two variables, $Y$ and $X$. $U$ is decomposed into $N$ exclusive and exhaustive groups (clusters), $U_1, \ldots, U_N$, with $U_i = \{u_{i1}, \ldots, u_{iM_i}\}$ and $\sum_{i=1}^{N} M_i = M$.

The random variable, $Y_{ij}$, and the observation, $x_{ij}$, on $X$, associated with the unit $u_{ij}$, are assumed to have the linear relationship

$$Y_{ij} = \beta_i x_{ij} + \epsilon_{ij} \tag{1.1}$$

for some vector $\beta' = (\beta_1, \ldots, \beta_N)$, where the random deviations, $\epsilon_{ij}$, satisfy

$$E(\epsilon_{ij} \mid x_{ij}) = 0, \quad \text{for all } i \text{ and } j, \tag{1.2}$$

and

$$E(\epsilon_{ij}\epsilon_{kl} \mid x_{ij}, x_{kl}) = \begin{cases} \sigma_i^2; & i = k,\ j = l \\ 0; & \text{otherwise.} \end{cases} \tag{1.3}$$

The purpose is to estimate linear combinations, $\beta_w = \sum_{i=1}^{N} w_i \beta_i$, of the regression coefficients with known weights, $w_i$, on the basis of a sample, $S$, which includes units from only part of the groups, $U_i$.

We denote the sample by $S = \{(i, j): i \in S^*, j \in S_i\}$, where $S^*$ is a sample of $n$ groups and $S_i$ is a sample of $m_i$ units in the $i$th group. We do not specify the sampling design except that it is a probability measure on the set of all possible samples.

Regression models which assume different vectors of coefficients in different groups are widely treated in the literature. Of special interest is the work of Zellner (1962) on the estimation of seemingly unrelated regressions. In the case of nonzero covariances between the random deviations, $\epsilon_{ij}$, relating to different groups, and sufficiently large samples, $S_i$, from each group, $U_i$, $i = 1, \ldots, N$, the estimators proposed for the individual coefficients, based on the whole sample, $S$, are better than those obtained by estimating each coefficient $\beta_i$ from the corresponding sample $S_i$. Zellner assumes, however, a nonempty sample, $S_i$, from each group, $U_i$, and thus does not deal with the case of groups not represented in the sample.

The model defined by (1.1) through (1.3) was investigated by Konijn (1962), who defined the parameter of interest as the weighted average, $\sum_{i=1}^{N} M_i \beta_i / \sum_{i=1}^{N} M_i$, and, for two-stage cluster sampling, proposed the Horvitz-Thompson type estimator $(\sum_{i \in S^*} M_i \hat\beta_i / \pi_i) / \sum_{i=1}^{N} M_i$, where $\hat\beta_i$ is the ordinary least squares estimator of $\beta_i$ from the sample, $S_i$, and $\pi_i$ is the inclusion probability of $U_i$, $i = 1, \ldots, N$. Similarly, Porter (1973) treats the case in which the various groups, $U_i$, are different units, with $t$ available observations for each unit, and the parameter of interest is defined as the simple mean, $\sum_{i=1}^{N} \beta_i / N$.

The problem under consideration is closely related to the common problem of estimating totals, $T = \sum z_i$, of finite populations, with the complication caused by the fact that the coefficients of sampled groups also have to be estimated. Two basic approaches for estimating totals of finite populations are the design-based approach and the model-based approach, discussed fully, for example, by Smith (1976) and by Särndal (1978). Basically, the design-based approach assigns randomness only to the sample design and assumes no prior relationship between the values $z_i$. The model-based approach assumes that the values, $z_i$, are realized outcomes of random variables, $Z_i$,
which are related by a mathematical model, while the
sample design has no role in the inference process.
Konijn (1962) and Porter (1973) obviously follow the
design-based approach. Since they assume no prior relationship between the coefficients, the estimators they
propose have no optimal properties. In this they are similar to the estimators of totals for finite populations, under
the design-based approach.
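
The Horvitz-Thompson type estimator attributed above to Konijn (1962) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names, the toy group sizes and observations, and the choice of equal inclusion probabilities (as under simple random sampling of groups) are assumptions made for the example, and the slope formula assumes a regression through the origin.

```python
def ols_slope(xs, ys):
    """Least squares slope through the origin: sum(x * y) / sum(x ** 2)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def ht_weighted_slope(group_sizes, slopes, incl_prob):
    """Horvitz-Thompson type estimate of sum(M_i * beta_i) / sum(M_i).

    group_sizes: M_i for all N groups.
    slopes: {i: OLS slope of sampled group i}.
    incl_prob: {i: inclusion probability pi_i of sampled group i}.
    """
    total = sum(group_sizes[i] * b / incl_prob[i] for i, b in slopes.items())
    return total / sum(group_sizes)


# Toy example (all numbers hypothetical): N = 3 groups, groups 0 and 2 sampled.
group_sizes = [10, 20, 30]
samples = {
    0: ([1.0, 2.0, 3.0], [2.1, 3.9, 6.2]),
    2: ([1.0, 2.0], [4.0, 8.0]),
}
slopes = {i: ols_slope(xs, ys) for i, (xs, ys) in samples.items()}
incl_prob = {i: 2 / 3 for i in samples}  # n / N for equal-probability sampling
print(ht_weighted_slope(group_sizes, slopes, incl_prob))
```

Only the sampled groups enter the numerator, each inflated by the reciprocal of its inclusion probability; no relationship between the coefficients of different groups is used.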
The fact that observations are available only for part of the groups naturally leads to the formulation of a model which defines the relationships between the coefficients, $\beta_i$. Scott and Smith (1969), Lindley and Smith (1972), and Sedransk (1977) provide typical examples of this alternative approach, in which the coefficients are assumed to be generated from a normal distribution. By assuming, in addition, a normal distribution for the random deviations, $\epsilon_{ij}$, the Bayesian estimators of the coefficients and of the population total are derived. Only Scott and Smith (1969), however, deal with the special situation where only part of the groups are represented in the sample, and their results are restricted to a model which is analogous to the random effects model in analysis of variance.

For most of our results we can relax the normal assumptions and assume only that the unknown coefficients are uncorrelated random variables with the same expectation and variance, that is,

$$\beta_i = \beta + v_i, \quad i = 1, \ldots, N, \tag{1.4}$$

$$E(v_i) = 0, \tag{1.5}$$

$$E(v_i v_k) = \begin{cases} \delta^2; & i = k \\ 0; & \text{otherwise.} \end{cases} \tag{1.6}$$

These assumptions may be considered as relating to a random process which generates the values of the coefficients in the various groups or, alternatively, as relating to their prior subjective distribution. In particular, these assumptions hold if the $\beta_i$'s are exchangeable and uncorrelated, which implies that the group labels ($i = 1, \ldots, N$) provide no information on the values of the corresponding coefficients (Ericson 1969). It should be noted that the common mean, $\beta$, is fixed but unknown.

Considering the coefficients as random variables necessitates further characterization of the sample design and we assume that it is noninformative, in the sense that the selection of the sample is independent of the realizations of the $\beta_i$'s, that is,

$$P(S \mid \beta) = P(S). \tag{1.7}$$

Although we basically adopt the model-based approach, our assumptions about the model are very general and are suitable for many practical problems. Since, as pointed out by Ericson (1969), the ideas that underlie the assumption of an exchangeable prior distribution are very similar to those underlying the use of simple random sampling in the classical approach, our assumptions can be considered, in a certain sense, as a compromise between the two approaches. As shown in the following sections, however, the assumed model, even in its general form, is sufficient for the derivation of best linear unbiased estimators of $\beta_w$.

Note that the restriction to the case of simple regression is for simplicity only and that the generalization to the case of multiple regression is straightforward. Similarly, more general variance-covariance matrices can easily be considered (Pfeffermann 1978). Finally, although we consider the coefficients, $\beta_i$, as random variables, our main concern is in estimating their weighted average, $\beta_w$, rather than in estimating their common expectation, $\beta$. This differs somewhat from the random coefficient regression (RCR) approach proposed in econometrics for analyzing cross-sectional data; although that approach also considers the estimation of the realizations, $\beta_i$, it focuses attention on the estimation of $\beta$ (Zellner 1966; Swamy 1970, 1971; and Swamy and Mehta 1975). Note that by this approach the distribution of the coefficients $\beta_i$ represents only physical probability. In many applications the specific realizations of the random coefficients in an actual finite population, rather than their expectation over all possible realizations, are of primary interest, especially when the variance $\delta^2$ is relatively high.

An important special case, which requires the estimation of linear combinations of the coefficients, is that in which we wish to predict the value $Y(u)$ of a unit, $u$, selected at random from the population, given that its $X$ value, $X(u)$, is a given value, $x$; defining the predictor as the conditional expectation $m(x) = E(Y(u) \mid X(u) = x)$, we have

$$m(x) = x \sum_{i=1}^{N} P_i(x)\, \beta_i, \tag{1.8}$$

where $P_i(x) = P(u \in U_i \mid X(u) = x)$. The estimation of $m(x)$ can be considered as transforming microrelationships into macrorelationships. For example, the assumptions (1.1) through (1.3) may be postulated as relating to those between the price of an apartment and its size, where the groups, $U_i$, represent different localities. The estimation of (1.8) may be required as representing an average price, over all localities, of an apartment of given size, $x$, the weights $P_i(x)$ being the known proportions of apartments of that size in the different localities. Applying this procedure, for different sizes, may then serve for constructing an apartment price index, adjusted for changes in size. This procedure is indeed currently being considered by the Israel Central Bureau of Statistics. The estimation of a single overall finite population regression coefficient (Kish and Frankel 1974) would certainly not be satisfactory for this situation.
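
The apartment example can be made concrete with a short Python sketch. It assumes that the predictor takes the form $m(x) = x \sum_i P_i(x)\beta_i$, which is what $E(Y(u) \mid X(u) = x)$ reduces to under a regression through the origin with group-specific slopes; the function names, the three toy localities, and the slope values are hypothetical.

```python
from collections import Counter


def size_weights(sizes_by_group, x):
    """P_i(x): share of group i among all population units with X equal to x."""
    counts = [Counter(sizes)[x] for sizes in sizes_by_group]
    total = sum(counts)
    if total == 0:
        raise ValueError("no unit in the population has X = x")
    return [c / total for c in counts]


def predictor(x, weights, betas):
    """m(x) = x * sum_i P_i(x) * beta_i (assumed form of the predictor)."""
    return x * sum(w * b for w, b in zip(weights, betas))


# Hypothetical floor areas in three localities and hypothetical price slopes.
sizes_by_group = [[50, 60, 60, 80], [60, 80, 80], [50, 60]]
betas = [2000.0, 2100.0, 1900.0]

weights = size_weights(sizes_by_group, 60)
print(weights, predictor(60, weights, betas))
```

In practice the slopes are unknown, so the weights computed this way would play the role of the known $w_i$ in the linear combination to be estimated.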
In the following section we derive the extended least squares estimator, $\hat\beta_w(a)$, of $\beta_w$ and investigate its relationship to estimators proposed under the design-based approach. In Section 3 we prove that this estimator is the best linear $\xi$-unbiased estimator, where $\xi$ denotes the joint distribution of the vector $\beta$ and the variables $Y_{ij}$, $(i, j) \in S$. For the normal case (with a vague prior distribution for $\beta$), $\hat\beta_w(a)$ is shown to be the posterior expectation of $\beta_w$, and thus to be the Bayesian estimator of $\beta_w$ under, for example, a quadratic loss function. In Section 4 the problem of unknown variances is considered and a method for their estimation from the sample is proposed. By replacing the unknown variances by their sample estimates in $\hat\beta_w(a)$, the empirical estimator, $\hat\beta_w(e)$, of $\beta_w$ is obtained. The final section presents some simulation comparisons of the performance of $\hat\beta_w(e)$ with that of the optimal estimator, $\hat\beta_w(a)$, and other empirical estimators obtained by using different estimators of the unknown variances.

2. ESTIMATING $\beta_w$ BY EXTENDED LEAST SQUARES

Throughout this and the next section the variances, $\sigma_i^2$ and $\delta^2$, are assumed to be known. The case of unknown variances will be treated in Section 4. We also assume that the $X$ values are fixed for given units. Generalization to the case in which the $x_{ij}$'s are observations on random variables, which are independent of the random deviations, $\epsilon_{kl}$, for all pairs $(k, l)$, and of the coefficients, $\beta_k$, for every $k$, is immediate.

We follow Duncan and Horn (1972) and Haitovsky (1973) and write (1.1) for a given sample of units, together with (1.4), as a single linear model,

$$Y^0 = X^0 \beta^0 + \epsilon^0, \tag{2.1}$$

with the following notations: $Y^{0\prime} = (Y_1', \ldots, Y_n', 0_N')$, where $Y_i$ denotes the $Y$ values observed for $s_i$ (the sample from the $i$th selected cluster) and $0_k$ denotes in general a zeros vector of order $k$; $\beta^{0\prime} = (\beta', \beta)$; $X^0$ is a partitioned design matrix [display not recoverable from the source], where $x_i$ denotes the $X$ values for $s_i$ and $0_{m,N-n}$ is a zeros matrix of order $m \times (N - n)$, $m = \sum_{i=1}^{n} m_i$; $I_k$ and $1_k$ denote an identity matrix and a units vector of order $k$; $\epsilon' = (\epsilon_1', \ldots, \epsilon_n')$, where $\epsilon_i$ denotes the vector of random errors corresponding to the sampled units in $s_i$; and $v' = (v_1, \ldots, v_N)$, where $v_i$ is defined by (1.4).

We assume $E(\epsilon_{ij} v_i) = 0$, for all $i$ and $j$. Then from (1.2), (1.3), (1.5), and (1.6) we obtain

$$E(\epsilon^0) = 0_{m+N}, \tag{2.2}$$

$$E(\epsilon^0 \epsilon^{0\prime}) = V = \operatorname{diag}(\sigma_1^2 I_{m_1}, \ldots, \sigma_n^2 I_{m_n}, \delta^2 I_N). \tag{2.3}$$

Consider the following estimator of $\beta^0$:

$$\hat\beta^0(a) = (X^{0\prime} V^{-1} X^0)^{-1} X^{0\prime} V^{-1} Y^0. \tag{2.4}$$

Following Haitovsky (1973), we term $\hat\beta^0(a)$ the extended least squares (ELS) estimator (or predictor) of $\beta^0$. Straightforward algebraic manipulations imply the following expressions for the ELS estimators of the individual coefficients, $\beta_i$, and of their expectation, $\beta$:

$$\hat\beta_i(a) = \lambda_i \hat\beta_i + (1 - \lambda_i)\, \hat\beta(a), \quad i = 1, \ldots, N, \tag{2.5}$$

$$\hat\beta(a) = \sum_{i=1}^{N} \lambda_i \hat\beta_i \Big/ \sum_{i=1}^{N} \lambda_i, \tag{2.6}$$

where

$$\lambda_i = \begin{cases} \delta^2 \Big/ \Big(\delta^2 + \sigma_i^2 \big/ \sum_{j \in S_i} x_{ij}^2\Big); & i \in S^* \\ 0; & \text{otherwise,} \end{cases} \tag{2.7}$$

and

$$\hat\beta_i = \begin{cases} \sum_{j \in S_i} x_{ij} y_{ij} \Big/ \sum_{j \in S_i} x_{ij}^2; & i \in S^* \\ 0; & \text{otherwise.} \end{cases} \tag{2.8}$$

$\hat\beta_i$ is the classical least squares estimator of $\beta_i$ for a sampled group and is defined as zero otherwise. Equation (2.5) defines, for $i \in S^*$, $\hat\beta_i(a)$ as the weighted average of the least squares estimator, $\hat\beta_i$, and of the estimator of the common expectation, $\hat\beta(a)$, with weights $\lambda_i$ and $1 - \lambda_i$, respectively. The weight, $\lambda_i$, is a decreasing function of the ratio between $\sigma_i^2 / \sum_{j \in S_i} x_{ij}^2$ (the conditional variance of $\hat\beta_i$ around $\beta_i$) and $\delta^2$ (the variance of $\beta_i$ around $\beta$). The smaller the ratio, the more weight given to the least squares estimator, compared to that given to the estimator of the expectation. Coefficients of nonsampled groups are estimated by the estimator of the expectation, $\hat\beta(a)$, which is a weighted average of the least squares estimators, the weights being determined by the reciprocals of the variances:

$$E(\hat\beta_i - \beta)^2 = \delta^2 + \sigma_i^2 \Big/ \sum_{j \in S_i} x_{ij}^2.$$

This straightforward result stems from the assumption that coefficients of different groups are uncorrelated and is the same as that obtained from applying the random coefficient regression (RCR) approach of Swamy (1970). In fact, $\hat\beta(a)$ is the classical least squares estimator of $\beta$ obtained from the model holding for the least squares estimators $\hat\beta_i$, $i \in S^*$ (see also Sec. 4).

If the vector $(w_1, \ldots, w_N, 0)$ is denoted by $w^{0\prime}$, the ELS estimator of $\beta_w$ is defined as

$$\hat\beta_w(a) = w^{0\prime} \hat\beta^0(a) = \sum_{i=1}^{N} w_i \hat\beta_i(a). \tag{2.9}$$

It is of special interest to follow the limiting behavior of this estimator for very large and for very small values of the variances. We assume for simplicity that $\sum_{i=1}^{N} w_i = 1$.

When $\delta^2$ becomes small and the ratios $c_i = \sigma_i^2 / \sum_{j \in S_i} x_{ij}^2$ ($i = 1, \ldots, n$) are bounded from below,

$$\lim_{\delta^2 \to 0} \hat\beta_w(a) = \sum_{i \in S^*} \sum_{j \in S_i} x_{ij} y_{ij} / \sigma_i^2 \Big/ \sum_{i \in S^*} \sum_{j \in S_i} x_{ij}^2 / \sigma_i^2, \tag{2.10}$$

which is the estimator of a common regression coefficient
computed from all the observations. Since $\delta^2 \to 0$ implies that the coefficients tend to be equal, this is an obvious result.

When the ratios, $c_i$, decrease to zero and $\delta^2$ remains bounded (or when $\delta^2$ becomes large and the ratios $c_i$ are bounded),

$$\lim \hat\beta_w(a) = \sum_{i \in S^*} w_i \hat\beta_i + \sum_{i \notin S^*} w_i \hat\beta^* = \hat\beta_w(c), \tag{2.11}$$

where $\hat\beta^* = \sum_{i=1}^{n} \hat\beta_i / n$.

In particular, this implies that coefficients of sampled groups are estimated by their least squares estimators, while coefficients of nonsampled groups are estimated by the simple mean of the estimators of the coefficients of sampled groups. For this limiting case the procedure is analogous to that suggested by traditional sampling theory for estimating totals of finite populations, based on simple random sampling without replacement, with the only distinction that the coefficients of sampled groups are replaced by their least squares estimators. Furthermore, when estimating $\bar\beta = \sum_{i=1}^{N} \beta_i / N$, the estimator obtained when $\delta^2 \to \infty$ is $\hat\beta^*$ and the same estimator is also obtained when all the conditional variances, $\sigma_i^2 / \sum_{j \in S_i} x_{ij}^2$, are equal. $\hat\beta^*$ was proposed by Porter (1973) for estimating $\bar\beta$ in the case of simple random sampling. It is easy to verify that in this case $\hat\beta^*$ is an unbiased estimator of $\bar\beta$, for any given vector $\beta$, where the expectation is over the distribution of the least squares estimators for given sampled groups and over the random selection of groups. The correspondence between the results under the assumption of an exchangeable prior and under simple random sampling has been noted in a different context by Ericson (1969).

3. PROPERTIES OF THE ELS ESTIMATORS

Some optimal properties of the ELS estimators will be proved in this section and we show first that, for the normal case, $\hat\beta^0(a)$ is the posterior expectation of $\beta^0$. For this purpose, we assume

$$Y_{ij} \mid \beta_i \sim N(\beta_i x_{ij}, \sigma_i^2), \quad \text{independently for all } (i, j), \tag{3.1}$$

$$\beta_i \mid \beta \sim N(\beta, \delta^2), \quad \text{independently for } i = 1, \ldots, N, \tag{3.2}$$

$$P(\beta) = \text{constant}. \tag{3.3}$$

The derivation of the posterior distribution of $\beta^0$ follows that of Scott and Smith (1969) for the posterior distribution of linear functions of the elements of a finite population, assuming a similar model for the case of group means (i.e., $x_{ij} = 1$ for all $i$ and $j$).

From (1.7) it follows that

[Equation (3.4) is not recoverable from the source.]

where $Y' = (Y_1', \ldots, Y_n')$.

Thus, the likelihood of $\beta^0$, given the sample, $S$, and the vector of observations, $Y$, is

$$l[\beta^0 \mid (S, Y)] \propto \exp\Big\{ -\frac{1}{2} \sum_{i=1}^{n} \sum_{j \in S_i} (y_{ij} - \beta_i x_{ij})^2 / \sigma_i^2 \Big\}. \tag{3.5}$$

Equations (3.2) and (3.3) imply that the joint prior distribution of $\beta^0$ is

$$P(\beta^0) \propto \exp\Big\{ -\frac{1}{2} \sum_{i=1}^{N} (\beta_i - \beta)^2 / \delta^2 \Big\}. \tag{3.6}$$

The posterior distribution of $\beta^0$ is proportional to the product of the likelihood function and the prior distribution, which, after some algebraic manipulations, can be written as

$$P[\beta^0 \mid (S, Y)] \propto \exp\Big\{ -\frac{1}{2} [\beta^0 - \hat\beta^0(a)]' X^{0\prime} V^{-1} X^0 [\beta^0 - \hat\beta^0(a)] \Big\}. \tag{3.7}$$

Thus the posterior distribution of $\beta^0$ is multinormal with expectation $\hat\beta^0(a)$ and variance-covariance matrix $(X^{0\prime} V^{-1} X^0)^{-1}$.

The fact that $\beta^0$ has a multinormal posterior distribution implies that for any given vector of weights $w' = (w_1, \ldots, w_N)$, $\beta_w$ has a normal posterior distribution, so that $\hat\beta_w(a)$ is the Bayesian estimator of $\beta_w$ for every loss function that attains its minimum at the posterior expectation. Considering again the limiting behavior of $\hat\beta_w(a)$, it turns out that (2.11) is the limit of the Bayesian estimator as $\delta^2 \to \infty$. On the other hand, $\delta^2 \to \infty$ implies that the prior distribution of the coefficients, $\beta_i$, tends to the locally uniform distribution, $P(\beta_i) = \text{constant}$.

The distributional assumptions (3.1) through (3.3) have been used only in order to derive the Bayesian estimator of $\beta_w$, but the properties of the ELS estimator, $\hat\beta_w(a)$, can be investigated under more general priors. For the rest of this article, we only assume the moment structure defined by (1.2), (1.3), (1.5), and (1.6). We denote by $E_\xi(\cdot)$ the expectation over the joint distribution of the coefficients $\beta$ and the variables $Y_{ij}$, $(i, j) \in S$.

Lemma. $\hat\beta_w(a)$ is $\xi$ unbiased in the sense that $E_\xi[\hat\beta_w(a) - \beta_w] = 0$. The proof follows immediately from (2.5) and (1.5).

The estimator $\hat\beta_w(a)$ is a linear function of the observed $Y$ values. In the following theorem we prove that it is indeed the best out of all linear $\xi$-unbiased estimators.

Theorem. Let $\tilde\beta_w = L'Y + b$ be any linear $\xi$-unbiased estimator of $\beta_w$. Then

[Equation (3.8) is not recoverable from the source.]

and

[Equation (3.9) is not recoverable from the source.]

Proof. Denote $L^{0\prime} = (L', 0_N')$ so that $\tilde\beta_w = L^{0\prime} Y^0 +$
$b$, and define $C' = L^{0\prime} - w^{0\prime}(X^{0\prime} V^{-1} X^0)^{-1} X^{0\prime} V^{-1} = (C_1', C_2')$. Denote also for simplicity $(X^{0\prime} V^{-1} X^0)^{-1}$ by $V^*$. Then

$$E_\xi(\tilde\beta_w - \beta_w) = E_\xi\{[w^{0\prime} V^* X^{0\prime} V^{-1} + C'](X^0 \beta^0 + \epsilon^0) + b - w^{0\prime} \beta^0\} \tag{3.10}$$

and it follows from (2.2) that

$$E_\xi(\tilde\beta_w - \beta_w) = \beta\, C' X^0 1_{N+1} + b = \beta\, C_1' X 1_N + b.$$

Thus for $\tilde\beta_w$ to be $\xi$ unbiased for any $\beta$ requires

$$C_1' X 1_N = b = 0. \tag{3.11}$$

By use of (3.10) and the result $b = 0$,

$$E_\xi(\tilde\beta_w - \beta_w)^2 = w^{0\prime} V^* w^0 + C' V C + [\text{remaining term not recoverable from the source}]. \tag{3.12}$$

The last term in the right-hand side of (3.12) is easily shown to be zero by means of (2.3) and (3.11), and (3.12) can be written as

[Equation (3.13) is not recoverable from the source.]

which completes the proof.

Scott and Smith (1969) prove by means of Lagrange multipliers a similar theorem for the special case in which $x_{ij} = 1$ for all $i$ and $j$. When the weighted sum defining $\beta_w$ relates only to coefficients of sampled groups, that is, $w_i = 0$ for $i \notin S^*$, the optimality of $\hat\beta_w(a)$ could be inferred from a more general theorem proved by Harville (1976). It should be noted, however, that the theorem cannot be considered as a special case of the theorem proved by Duncan and Horn (1972) (assuming $E\beta$ is known) or of the theorem proved in Rao (1973, p. 234), since there the requirement for unbiasedness is equivalent, in our notation, to the condition $C_1' X = 0$, whereas we only require $C_1' X 1_N = 0$ and thus enlarge the subspace of possible estimators.

The optimal properties of $\hat\beta_w(a)$ relate to the $\xi$ distribution, given the sample of units. The results, however, can be immediately extended to the overall expectation, $E_0(\cdot)$, over all possible samples. Thus for any linear $\xi$-unbiased estimator, $\tilde\beta_w$,

[Equation (3.14) is not recoverable from the source.]

where the summations are over the set of all possible samples and $P(S)$ is the probability of selecting the sample, $S$, as implied by the sampling design.

4. THE CASE OF UNKNOWN VARIANCES

In deriving the optimal estimator $\hat\beta_w(a)$ of $\beta_w$, we assumed that the variances, $\delta^2$ and $\sigma_i^2$ ($i \in S^*$), are known. In practice, this condition is rarely met and one can proceed in one of two ways.

I Choose a suitable prior distribution to represent the knowledge available about the variances and follow the Bayesian approach.
II Estimate the unknown variances from the sample.

Computational difficulties limit considerably the application of the first alternative; see, for instance, Box and Tiao (1968) and Lindley and Smith (1972).

The second alternative, as adopted by Swamy (1970) in dealing with RCR models and by Rao (1976) for a random effects model, is to replace the unknown variances in the formula of the estimators by their estimates from the sample. Following our previous attitude not to postulate any distributional assumptions beyond those concerning the first and second moments, we adopt this second approach, but suggest a somewhat different procedure for estimating $\delta^2$ and thereby $\lambda_i$ of sampled groups. The procedure has some desirable theoretical properties and is shown to perform well in practice by the simulation study described in the next section.

The proposed estimators for $\sigma_i^2$ ($i \in S^*$) are

$$\hat\sigma_i^2 = \sum_{j \in S_i} (y_{ij} - \hat\beta_i x_{ij})^2 / (m_i - 1), \quad (i = 1, \ldots, n). \tag{4.1}$$

This is an unbiased estimator of $\sigma_i^2$ and for the normal case it is the best unbiased estimator of $\sigma_i^2$ (Theil 1971, p. 390). In this case $\hat\sigma_i^2$ is also a consistent estimator of $\sigma_i^2$. When the normal assumption does not hold, but the random deviations $\epsilon_{ij}$ are independent for all $i$ and $j$ and identically distributed for every given $i$, the existence of the positive limits

$$\lim_{m_i \to \infty} \sum_{j \in S_i} x_{ij}^2 / m_i = a_i \neq 0; \quad (i = 1, \ldots, n) \tag{4.2}$$

is known to be sufficient for ensuring the consistency of the estimators $\hat\sigma_i^2$.

Denote by $\hat\lambda_i$, $i = 1, \ldots, n$, the expression obtained by replacing $\sigma_i^2$ by $\hat\sigma_i^2$ in formula (2.7) of $\lambda_i$.

We propose to estimate $\delta^2$ by the largest solution of the equation

$$\frac{1}{n - 1} \sum_{i=1}^{n} \hat\lambda_i \Big( \hat\beta_i - \frac{\sum_{k=1}^{n} \hat\lambda_k \hat\beta_k}{\sum_{k=1}^{n} \hat\lambda_k} \Big)^2 = \delta^2. \tag{4.3}$$

This equation is a modified form of equations (4.3.29) through (4.3.31) of Swamy (1971, p. 112).

The left-hand side of (4.3), with $\lambda_i$ instead of $\hat\lambda_i$, is just the weighted error sum of squares (and thus the usual unbiased least squares estimator of $\delta^2$) under the linear
heteroscedastic model

$$\hat\beta_i = \beta + e_i, \quad (i = 1, \ldots, n), \tag{4.4}$$

$$E(e_i) = 0, \tag{4.5}$$

$$E(e_i e_j) = \begin{cases} \delta^2 / \lambda_i; & i = j \\ 0; & \text{otherwise.} \end{cases} \tag{4.6}$$

The proposed procedure has thus much in common with the moments method of estimation. When assumptions (3.1) and (3.2) hold, the least squares estimators are normally distributed and the left-hand side of (4.3), with $\lambda_i$ instead of $\hat\lambda_i$, is the best unbiased estimator of $\delta^2$, a direct consequence of Theil (1971, p. 390). This property, together with the consistency of the estimators $\hat\sigma_i^2$, suggests that in the normal case the left-hand side of (4.3) is in fact close to the true variance, provided that the sample sizes $\{m_i\}$ are sufficiently large. As will be detailed, this closeness is attained under more general conditions, when the number of sampled groups is also sufficiently large.

Equation (4.3) has one trivial solution, $\hat\delta_1^2 = 0$. By dividing both sides of (4.3) by $\delta^2$, the following equation is obtained:

$$\frac{1}{n - 1} \sum_{i=1}^{n} \frac{1}{\delta^2 + \hat c_i} \Big( \hat\beta_i - \frac{\sum_{k=1}^{n} \hat\beta_k / (\delta^2 + \hat c_k)}{\sum_{k=1}^{n} 1 / (\delta^2 + \hat c_k)} \Big)^2 = 1, \qquad \hat c_i = \hat\sigma_i^2 \Big/ \sum_{j \in S_i} x_{ij}^2. \tag{4.7}$$

Two main questions concerning the solution of this equation are the existence and uniqueness of a positive solution and the proximity of that solution to the true variance. In order to answer these questions, consider the left-hand side of (4.7) as a function, $h(\delta^2)$, of the unknown variance $\delta^2$.

Function $h(\delta^2)$ is a decreasing monotone continuous function of $\delta^2$ for $\delta^2 > -\min_i [\hat\sigma_i^2 / \sum_{j \in S_i} x_{ij}^2]$, and so there cannot be more than one positive solution. As $\lim_{\delta^2 \to \infty} h(\delta^2) = 0$, the existence of a positive solution is conditioned on the inequality

$$h(0) > 1. \tag{4.8}$$

Next, we consider the limiting properties of the components of this inequality and of the estimator obtained, if it holds. Let (4.2) hold and the random deviations, $\epsilon_{ij}$, be independent, for all $i$ and $j$, and identically distributed for every $i$. It follows immediately from Anderson (1971, pp. 23-25) that for given values $\beta_1, \ldots, \beta_n$ of the coefficients, $\sqrt{\sum_{j \in S_i} x_{ij}^2}\, (\hat\beta_i - \beta_i) / \hat\sigma_i$ converges in distribution to a standard normal, as $m_i \to \infty$. Furthermore, if $\beta_1 = \cdots = \beta_n = \beta$, and $m_i \to \infty$ so that $m_i / m = \text{constant}$ for $i = 1, \ldots, n$, then it follows from Rao (1973, p. 389) that $(n - 1) h(0)$ converges in distribution to a chi-square with $n - 1$ degrees of freedom, so that the limiting probability of (4.8) holding, when $\beta_1 = \cdots = \beta_n$, is one-half.

Consider, however, the case where the number of sampled groups, $n$, also increases. Define $\hat\delta^2$ to be the positive solution of (4.7) when it exists, and zero otherwise. Under the assumptions that (4.2) holds, that the deviations, $\epsilon_{ij}$, are identically and independently distributed, and that $E(\hat\beta_i - \beta)^4$ exists and is a bounded function of $m_i$ ($i = 1, \ldots, n$), $\hat\delta^2$ is a consistent estimator of $\delta^2$, in the sense that

$$\operatorname*{plim}_{n \to \infty;\ m_i \to \infty\ (i = 1, \ldots, n)} \hat\delta^2 = \delta^2. \tag{4.9}$$

This result follows from the fact that for the true variance,

$$\operatorname*{plim}_{n \to \infty;\ m_i \to \infty\ (i = 1, \ldots, n)} h(\delta^2) = 1 \tag{4.10}$$

and, since $h(\delta^2)$ is a continuous monotone function of $\delta^2$, $h^{-1}$ is a continuous function of $z$ in the neighborhood of $z = 1$, so that (4.9) follows. The condition (4.2) is required only to ensure the consistency of the estimators of the variances, $\sigma_i^2$, and it is redundant in the normal case.

We denote by $\hat\beta_i(e)$ the empirical estimator of $\beta_i$, obtained if we replace $\delta^2$ and $\sigma_i^2$ by their sample estimators, $\hat\delta^2$ and $\hat\sigma_i^2$, in (2.5) through (2.7). The empirical estimator of $\beta_w$ is then defined to be

$$\hat\beta_w(e) = \sum_{i=1}^{N} w_i \hat\beta_i(e), \tag{4.11}$$

and it follows from Slutsky's theorem (Wilks 1962, p. 102) that under the conditions that ensure that $\hat\delta^2 \to \delta^2$ and $\hat\sigma_i^2 \to \sigma_i^2$,

$$\operatorname*{plim}_{n \to \infty;\ m_i \to \infty\ (i = 1, \ldots, n)} \hat\beta_w(e) = \hat\beta_w(a). \tag{4.12}$$

Deriving the explicit solutions of equation (4.3) for the general case does not seem possible, but the solutions are easily found by an iterative procedure. It is interesting, however, to solve the equation for two special cases.

Case 1. $x_{ij} = 1$ for all $i$ and $j$, $\sigma_i^2 = \sigma^2$ and $m_i = m_0$ for every $i$. $\sigma^2$ will be estimated in this case by

[The estimator $\hat\sigma^2$ is not recoverable from the source.]

and the nonzero solution of (4.3) is

$$\hat\delta^2 = \sum_{i=1}^{n} (\bar y_i - \bar y)^2 / (n - 1) - \hat\sigma^2 / m_0,$$

where $\bar y_i$ are the $n$ sample means and $\bar y = (1/n) \sum_{i=1}^{n} \bar y_i$
is the overall mean. Denote by $\hat\beta_w^A(e)$ the empirical estimator of $\beta_w$ obtained in this case. Scott and Smith (1969), in dealing with this model, obtained the estimator $\hat\beta_w^A(e)$ as an approximation to the posterior expectation of $\beta_w$, assuming in addition to assumptions (3.1) through (3.3) the noninformative prior for the unknown variances

[The prior density is not recoverable from the source.]

Case 2. $n = 2$. In this case, the nonzero solution of (4.3) is

$$\hat\delta_*^2 = \frac{1}{2} (\hat\beta_1 - \hat\beta_2)^2 - \frac{1}{2} \sum_{i=1}^{2} \hat\sigma_i^2 \Big/ \sum_{j \in S_i} x_{ij}^2.$$

Estimating $\delta^2$ by $\hat\delta_s^2 = \max(0, \hat\delta_*^2)$ and substituting $\hat\delta_s^2$ and $\hat\sigma_i^2$ for the corresponding true variances in $\hat\beta(a)$, the optimal estimator of $\beta = E\beta_i$ defined by (2.6), is a special case of the general procedure for estimating the expectation of the random coefficients, as proposed by Swamy and Mehta (1975) when applying the RCR approach. Note that $\hat\delta_*^2$ is a $\xi$-unbiased estimator of $\delta^2$.

5. SIMULATION COMPARISONS

The estimation procedure of the unknown variances discussed in the previous section must be evaluated with respect to the efficiency of the empirical estimators $\hat\beta_w(e)$. Since analytical study does not seem feasible, except for very special cases (Rao 1976), we use simulation to compare the performance of the proposed procedure both to that of other procedures proposed in the literature and to that of the optimal estimator (based on the true variances).

The different estimators of $\beta_w$ considered in this study result from applying different procedures for estimating $\delta^2$ from the sample and substituting the sample estimators for the unknown variances in the optimal estimator, $\hat\beta_w(a)$, defined by (2.5) through (2.9).

The performances of the following five estimators of $\beta_w$ are compared.

I The optimal estimator, $\hat\beta_w(a)$, defined by (2.5) through (2.9) on the basis of the true values of the variances.

II The empirical estimator, $\hat\beta_w(e)$, proposed in the previous section, defined by (4.11).

III The empirical estimator obtained from (2.5) through (2.9), if in (2.7) $\sigma_i^2$ is replaced by $\hat\sigma_i^2$ and $\delta^2$ is replaced by

$$\hat\delta_s^2 = \max\Big[0,\ (n - 1)^{-1} \sum_{i=1}^{n} (\hat\beta_i - \hat\beta^*)^2 - n^{-1} \sum_{i=1}^{n} \hat\sigma_i^2 \Big/ \sum_{j \in S_i} x_{ij}^2 \Big]. \tag{5.1}$$

We denote this empirical estimator by $\hat\beta_w(s)$. It is a modification of an estimator proposed by Swamy (1970), in which $\hat\delta_*^2$ is used to estimate $\delta^2$, rather than $\hat\delta_s^2$. The modification ensures a nonnegative estimate of $\delta^2$ and was also suggested in Swamy and Mehta (1975).

IV The empirical estimator obtained from (2.5) through (2.9), if in (2.7) $\sigma_i^2$ is replaced by $\hat\sigma_i^2$ and $\delta^2$ is replaced by

$$\hat\delta_n^2 = (n - 1)^{-1} \sum_{i=1}^{n} (\hat\beta_i - \hat\beta^*)^2. \tag{5.2}$$

The estimator $\hat\delta_n^2$ of $\delta^2$ is obtained from $\hat\delta_s^2$ by neglecting the factor $n^{-1} \sum_{i=1}^{n} (\hat\sigma_i^2 / \sum_{j \in S_i} x_{ij}^2)$ and thus ensuring a positive estimator. Estimating $\delta^2$ in this way is analogous to estimating the between-groups component of variance in ANOVA models by $s_1^2 = \sum_{i=1}^{n} (\bar y_i - \bar y)^2 / (n - 1)$, which is sometimes recommended in the literature (Kish 1965, p. 168). The estimator $\hat\delta_n^2$ overestimates $\delta^2$. Note also that this estimator solves the equation (4.3) when $\hat\lambda_i = 1$ ($i = 1, \ldots, n$) and thus $\hat\delta_n^2 \geq \hat\delta^2$. We denote this empirical estimator by $\hat\beta_w(n)$.

V The classical estimator, $\hat\beta_w(c)$, defined by (2.11). This estimator has the advantage that its quadratic loss can be computed exactly and compared with that of the optimal estimator $\hat\beta_w(a)$ through (3.13). Note also that by (2.11), $\hat\beta_w(c)$ is the limit of $\hat\beta_w(a)$ when $\lambda_i \to 1$ ($i \in S^*$).

The simulation study was based on actual data on the floor areas, $x_{ij}$, of $M = 6{,}037$ apartments in Israel, divided into $N = 31$ groups, according to geographical location. The values of the various parameters were chosen so as to simulate a theoretical linear relation of the general form (1.1), between an apartment price, $Y_{ij}$, and its area, $x_{ij}$. The coefficients, $\beta_i$ ($i = 1, \ldots, 31 = N$), were generated from a normal distribution with mean $\beta = 2{,}000$ and three alternative values for the standard deviation: $\delta = 20, 50, 80$. The sample of apartments was selected each time by a two-stage cluster sampling procedure with simple random sampling without replacement in both stages. The number of selected groups considered was $n = 10, 20, 30$. For each $n$, three sets of second-stage sample sizes were considered: $m_i = 15$ ($i \in S^*$); $m_i = 45$ ($i \in S^*$); and differential preassigned sample sizes, varying between 5 and 50 with mean $\bar m = 30$ and standard deviation $\sigma_m = 14$.

The prices, $Y_{ij}$, of the selected apartments were generated from a normal distribution with mean $\beta_i x_{ij}$ and preassigned standard deviations, $\sigma_i$, differing from group to group in the range $1{,}000 \leq \sigma_i \leq 16{,}000$.

Eight independent samples were selected for each combination of the number of sampled groups, $n$, and of the second-stage sample sizes, $m_i$. For each sample of units, $S$, 100 sets of values of $\beta_i$ ($i \in S^*$), and corresponding values of $Y_{ij}$ [$(i, j) \in S$], were generated for each value of $\delta$. The number of simulations chosen ensured that for all sets of weights $w$ considered, the sample mean squared error (SMSE) of the optimal estimator $\hat\beta_w(a)$ did not differ from the theoretical mean squared error by more than 5 percent. The sets of weights $w$ were determined according to (1.8) for 10 selected values of $x$. One additional set
considered was the set of equal weights, namely, w i =
1/31 (i = 1, . . . , 31).
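The simulation design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program used in the study: the actual floor-area data are not reproduced here, so hypothetical uniformly distributed areas and group sizes stand in for them, and the function names `draw_two_stage_sample` and `generate_prices` are invented for this sketch.

```python
import numpy as np

def draw_two_stage_sample(areas, n, m, rng):
    """Two-stage cluster sample, simple random sampling without
    replacement at both stages. Returns {group index: sampled areas}."""
    groups = rng.choice(len(areas), size=n, replace=False)
    return {int(i): rng.choice(areas[i], size=m, replace=False) for i in groups}

def generate_prices(sample_x, n_groups, delta, sigma, rng, beta_mean=2000.0):
    """Draw random coefficients beta_i ~ N(beta_mean, delta^2) for all
    groups, then prices Y_ij ~ N(beta_i * x_ij, sigma_i^2) for sampled units."""
    beta = rng.normal(beta_mean, delta, size=n_groups)
    y = {i: rng.normal(beta[i] * x, sigma[i]) for i, x in sample_x.items()}
    return beta, y

rng = np.random.default_rng(0)
N = 31
# Hypothetical stand-in for the real floor areas (assumption, not source data).
areas = [rng.uniform(40.0, 120.0, size=200) for _ in range(N)]
# Preassigned residual standard deviations within the stated range.
sigma = rng.uniform(1000.0, 16000.0, size=N)

sample_x = draw_two_stage_sample(areas, n=10, m=15, rng=rng)
beta, y = generate_prices(sample_x, N, delta=50.0, sigma=sigma, rng=rng)
```

Keeping the sample draw separate from the price generation mirrors the design in the text, where each fixed sample S is reused for many independently generated sets of coefficients and prices.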
For each set of weights and for every estimator of $\beta_w$,
the overall SMSE's were computed by averaging over all
800 simulations. In addition, the average of the sample
mean squared errors (ASMSE) of the estimators of the
separate coefficients $\beta_i$ was computed for each type of
estimator by averaging SMSE over all 31 groups. This
is the sample estimate of the loss, defined as
by Rao (1976), where $\hat{\beta}_i(\cdot)$ stands for any estimator of $\beta_i$.
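As a hedged illustration of the two averaging steps just described, the sketch below computes the SMSE of each coefficient estimator over simulations and then the ASMSE over groups. The array layout (simulations in rows, groups in columns), the function names, and the toy numbers are assumptions made for this example only.

```python
import numpy as np

def smse(estimates, truths):
    """Sample mean squared error, averaging over simulations (axis 0)."""
    estimates = np.asarray(estimates, dtype=float)
    truths = np.asarray(truths, dtype=float)
    return np.mean((estimates - truths) ** 2, axis=0)

def asmse(beta_hat, beta_true):
    """Average of the per-group SMSE values; inputs have shape (n_sims, N)."""
    return float(np.mean(smse(beta_hat, beta_true)))

# Toy data: 2 simulations, 2 groups (hypothetical values).
beta_true = np.array([[2000.0, 2010.0], [1990.0, 2005.0]])
beta_hat = beta_true + np.array([[1.0, -2.0], [3.0, 0.0]])

per_group = smse(beta_hat, beta_true)   # per-group SMSE: 5.0 and 2.0
overall = asmse(beta_hat, beta_true)    # 3.5
root_asmse = overall ** 0.5             # the quantity tabulated in Table 1
```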
Table 1 presents the square roots of the ASMSE of the
five estimators considered, for each value of $\delta$, according
to the number of selected groups, n, and the second-stage
sample sizes, $m_i$. Only values of the square roots of the
ASMSE are presented, since their behavior is completely
consistent with that of the SMSE of the respective estimators of $\beta_w$ for the various sets of w considered.
The main conclusions from the table are as follows.
I For each of the estimators examined, the error
decreases when the second-stage sample sizes increase.
Except for the case of the classical estimator, $\hat{\beta}_w(c)$, when
$\delta$ is small, the error also decreases with n and the effect
of increasing n is in general stronger than that of increasing the second-stage sample sizes, $\{m_i\}$. This can be seen,
for example, by comparing the error for n = 10, $m_i = 15$
$(i \in S^*)$ with the corresponding errors obtained for
n = 10, $m_i = 45$ $(i \in S^*)$ and the errors obtained for
n = 30, $m_i = 15$ $(i \in S^*)$.
II In order to evaluate the efficiency of the proposed
empirical estimator $\hat{\beta}_w(e)$, the relative increase in the
ASMSE caused by using this estimator instead of the
optimal estimator (based on the true values of the variances) can be computed. It turns out that, except for very
low values of $\lambda_i$, caused either by small values of $\delta^2$ or
by large values of the conditional variances,
$\sigma_i^2 / \sum_{j \in S_i} x_{ij}^2$, the relative increase is very small and it becomes
smaller as the values of $\lambda_i$ become larger. Thus, for $\delta$
= 20 and differential second-stage sample sizes, the relative increase is less than 5.5 percent, while for $\delta$ = 50
it is less than 3.1 percent, even for $m_i = 15$ $(i \in S^*)$. It
is interesting to notice that the relative increase in the
ASMSE becomes larger with n. The reason for this phenomenon is that for both the empirical estimators and
the optimal estimator, coefficients of nonsampled groups
are estimated by the estimator of the expectation, which
is obtained as a weighted average of the least squares
estimators $\hat{\beta}_i$ $(i \in S^*)$. Although the weights differ from
one estimator to the other, the estimators themselves are
very similar. As a result, for small values of n, when most
coefficients are estimated by the expectation estimator,
the differences in ASMSE between the empirical estimators and the corresponding optimal estimators are very
slight. When, on the other hand, n is large and most
groups are sampled, the differences in ASMSE between
the two estimators become sharper, although they are
still, in general, very small.
III It is instructive to compare the performance of all
the empirical estimators considered in this study with that
of the classical estimator. The empirical estimators turn
out to be much more efficient, especially for large values
of n and small values of $\lambda_i$ $(i \in S^*)$.
IV Comparison of the performance of the three empirical estimators themselves indicates that $\hat{\beta}_w(e)$ is, in
general, more efficient than both $\hat{\beta}_w(s)$ and $\hat{\beta}_w(n)$ for small
values of $\delta$, while $\hat{\beta}_w(n)$ is in most cases slightly better
than the other two when $\delta$ is large. It should be noted
also that the differences between $\hat{\beta}_w(n)$ and both $\hat{\beta}_w(e)$
and $\hat{\beta}_w(s)$ increase with n in a similar way to those between $\hat{\beta}_w(e)$ and $\hat{\beta}_w(a)$, for the same reason discussed in
(II). $\hat{\beta}_w(e)$ and $\hat{\beta}_w(s)$ seem to be equally good, but a careful
examination indicates that $\hat{\beta}_w(e)$ performs somewhat better than $\hat{\beta}_w(s)$, as the differences between the values of
$\lambda_i$ of the different groups increase. Note that when all the
values of $\lambda_i$ are equal for $i \in S^*$, $\hat{\delta}^2$ and $\hat{\delta}^{*2}$ are the same
and so the two empirical estimators, $\hat{\beta}_w(e)$ and $\hat{\beta}_w(s)$,
coincide.

Table 1. Square Root of the Average Sample Mean Squared Error (ASMSE) of the Optimal Estimator and
of the Empirical Estimators
[Columns: n = 10, 20, 30, each with $m_i$ = 15, varying, 45; rows: $\delta$ and estimator.]
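Because Table 1 reports square roots of the ASMSE, the relative increase in ASMSE discussed in conclusion II has to be computed from squared table entries. A minimal sketch follows; the function name and the two input values are hypothetical and are not taken from Table 1.

```python
def relative_increase(root_asmse_empirical, root_asmse_optimal):
    """Relative increase in ASMSE from using an empirical estimator
    instead of the optimal one, given the tabulated square roots."""
    ratio = root_asmse_empirical / root_asmse_optimal
    return ratio ** 2 - 1.0

# Hypothetical square-root entries: a 2 percent gap in the roots
# corresponds to roughly a 4 percent increase in ASMSE.
increase = relative_increase(102.0, 100.0)
```

Squaring matters when reading the table: a gap between square roots understates the gap between the ASMSE values themselves by roughly a factor of two when the gap is small.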
Other empirical estimators could be considered, such
as estimating the unknown variances by MINQUE estimators, but the small differences observed between the
optimal estimators $\hat{\beta}_w(a)$ and the empirical estimators
$\hat{\beta}_w(e)$ seem to validate the procedure we adopted for estimating $\beta_w$. Further research is required to compare this
approach with other approaches, such as the Bayesian
approach, using a well-defined joint prior distribution for
the coefficients and the unknown variances. In particular,
the robustness of the various approaches to deviations
from the underlying model should be examined.
[Received March 1979. Revised October 1980.]
REFERENCES
ANDERSON, T.W. (1971), The Statistical Analysis of Time Series,
New York: John Wiley.
BOX, G.E.P., and TIAO, G.C. (1968), "Bayesian Estimation of Means
for the Random Effect Model," Journal of the American Statistical
Association, 63, 174-181.
DUNCAN, D.B., and HORN, S.D. (1972), "Linear Dynamic Recursive
Estimation From the Viewpoint of Regression Analysis," Journal of
the American Statistical Association, 67, 815-822.
ERICSON, W.A. (1969), "Subjective Bayesian Models in Sampling
Finite Populations," Journal of the Royal Statistical Society, Ser. B,
31, 195-233.
HAITOVSKY, Y. (1973), "Maximum Joint Probability Estimates of
the Linear Hierarchical Model," unpublished paper, Hebrew
University.
HARVILLE, D. (1976), "Extension of the Gauss Markov Theorem to
Include the Estimation of Random Effects," Annals of Statistics, 4,
384-396.
KISH, L. (1965), Survey Sampling, New York: John Wiley.
KISH, L., and FRANKEL, M.R. (1974), "Inference From Complex
Samples," Journal of the Royal Statistical Society, Ser. B, 36, 1-37.
KONIJN, H. (1962), "Regression Analysis in Sample Surveys," Journal of the American Statistical Association, 57, 590-605.
LINDLEY, D.V., and SMITH, A.F.M. (1972), "Bayes Estimates for
the Linear Model," Journal of the Royal Statistical Society, Ser. B,
34, 1-18.
PFEFFERMANN, D. (1978), "Regression Analysis for Complex Samples From Finite Populations," unpublished Ph.D. thesis, Hebrew
University.
PORTER, R.M. (1973), "On the Use of Survey Sample Weights in the
Linear Model," Annals of Economic and Social Measurement, 2,
141-158.
RAO, C.R. (1973), Linear Statistical Inference and Its Applications
(2nd ed.), New York: John Wiley.
RAO, C.R. (1976), "Characterization of Prior Distributions and Solution to
a Compound Decision Problem," Annals of Statistics, 4, 823-835.
SARNDAL, C.E. (1978), "Design Based and Model Based Inference
in Survey Sampling," Scandinavian Journal of Statistics, 5, 27-52.
SCOTT, A., and SMITH, T.M.F. (1969), "Estimation in Multistage
Surveys," Journal of the American Statistical Association, 64,
830-840.
SEDRANSK, J. (1977), "Sampling Problems in the Estimation of the
Money Supply," Journal of the American Statistical Association, 72,
516-522.
SMITH, T.M.F. (1976), "The Foundations of Survey Sampling. A Review," Journal of the Royal Statistical Society, Ser. A, 139, 183-195.
SWAMY, P.A.V.B. (1970), "Efficient Inference in a Random Coefficient Regression Model," Econometrica, 38, 311-323.
SWAMY, P.A.V.B. (1971), Statistical Inference in Random Coefficient Regression
Models, New York: Springer-Verlag.
SWAMY, P.A.V.B., and MEHTA, J.S. (1975), "Bayesian and Non-Bayesian Analysis of Switching Regressions and of Random Coefficient Regression Models," Journal of the American Statistical Association, 70, 593-602.
THEIL, H. (1971), Principles of Econometrics, New York: John Wiley.
WILKS, S.S. (1962), Mathematical Statistics, New York: John Wiley.
ZELLNER, A. (1962), "An Efficient Method for Estimating Seemingly
Unrelated Regressions and Tests for Aggregation Bias," Journal of
the American Statistical Association, 57, 348-368.
ZELLNER, A. (1966), "On the Aggregation Problem. A New Approach to a
Troublesome Problem," Report # 6628, University of Chicago, Center for Mathematical Studies in Business and Economics.