Regression Analysis of Data From a Cluster Sample

DAN PFEFFERMANN and GAD NATHAN*

Journal of the American Statistical Association, Vol. 76, No. 375 (Sep., 1981), pp. 681-689, Theory and Methods Section. Published by the American Statistical Association. Stable URL: http://links.jstor.org/sici?sici=0162-1459%28198109%2976%3A375%3C681%3ARAODFA%3E2.0.CO%3B2-Z

For the case of different regression relationships in different subgroups of a finite population, only part of which are sampled, the extended least squares estimator of any weighted average of the distinct coefficients is derived, under assumptions relating only to the first two moments of the distribution of the coefficients. Under these assumptions, the estimator is shown to be the best linear $\xi$-unbiased estimator, while under further distributional assumptions, it is also the Bayesian estimator for a quadratic loss function. For the case of unknown variances a method for estimating them from the sample is proposed. The empirical estimator thus obtained is shown to perform well by a simulation comparison with the optimal estimator and with other proposed empirical estimators.

KEY WORDS: Regression; Complex samples; Cluster sampling; Extended least squares; Bayesian estimator.

* Dan Pfeffermann is Lecturer and Gad Nathan is Associate Professor, at the Department of Statistics, Hebrew University of Jerusalem. They are also both associated with the Israel Central Bureau of Statistics. A previous version of this paper was presented as an invited paper at the third meeting of the International Association of Survey Statisticians, New Delhi, 1977, and it is partially based on the thesis of the first author, written under the supervision of the second author. The authors are grateful to anonymous referees for their helpful comments, and to Israel Einot for programming the simulation computations.

1. INTRODUCTION

The problem of regression analysis for data from a cluster sample, with different regression relationships in different clusters, can be formulated as follows. For each unit, $u$, of a finite population, $U$, containing $M$ identifiable units, there are observations on two variables, $Y$ and $X$. $U$ is decomposed into $N$ exclusive and exhaustive groups (clusters), $U_1, \ldots, U_N$, with $U_i = \{u_{i1}, \ldots, u_{iM_i}\}$ and $\sum_{i=1}^{N} M_i = M$.

The random variable, $Y_{ij}$, and the observation, $x_{ij}$, on $X$, associated with the unit $u_{ij}$, are assumed to have the linear relationship

$$Y_{ij} = \beta_i x_{ij} + \varepsilon_{ij} \qquad (1.1)$$

for some vector $\beta' = (\beta_1, \ldots, \beta_N)$, where the random deviations, $\varepsilon_{ij}$, satisfy

$$E(\varepsilon_{ij}) = 0, \quad \text{for all } i \text{ and } j, \qquad (1.2)$$

and

$$E(\varepsilon_{ij}\varepsilon_{kl} \mid x_{ij}, x_{kl}) = \begin{cases} \sigma_i^2; & i = k,\ j = l \\ 0; & \text{otherwise.} \end{cases} \qquad (1.3)$$

The purpose is to estimate linear combinations, $\beta_w = \sum_{i=1}^{N} w_i\beta_i$, of the regression coefficients with known weights, $w_i$, on the basis of a sample, $S$, which includes units from only part of the groups, $U_i$. We denote the sample by $S = \{(i, j): i \in S^*, j \in S_i\}$, where $S^*$ is a sample of $n$ groups and $S_i$ is a sample of $m_i$ units in the $i$th group. We do not specify the sampling design except that it is a probability measure on the set of all possible samples.

Regression models which assume different vectors of coefficients in different groups are widely treated in the literature. Of special interest is the work of Zellner (1962) on the estimation of seemingly unrelated regressions. In the case of nonzero covariances between the random deviations, $\varepsilon_{ij}$, relating to different groups, and sufficiently large samples, $S_i$, from each group, $U_i$, $i = 1, \ldots, N$, the estimators proposed for the individual coefficients, based on the whole sample, $S$, are better than those obtained by estimating each coefficient $\beta_i$ from the corresponding sample $S_i$. Zellner assumes, however, a nonempty sample, $S_i$, from each group, $U_i$, and thus does not deal with the case of groups not represented in the sample.

The model defined by (1.1) through (1.3) was investigated by Konijn (1962), who defined the parameter of interest as the weighted average, $\sum_{i=1}^{N} M_i\beta_i / \sum_{i=1}^{N} M_i$, and, for two-stage cluster sampling, proposed the Horvitz-Thompson type estimator $\left(\sum_{i \in S^*} M_i\hat\beta_i/\pi_i\right) / \sum_{i=1}^{N} M_i$, where $\hat\beta_i$ is the ordinary least squares estimator of $\beta_i$ from the sample, $S_i$, and $\pi_i$ is the inclusion probability of $U_i$, $i = 1, \ldots, N$. Similarly, Porter (1973) treats the case in which the various groups, $U_i$, are different units, with $t$ available observations for each unit, and the parameter of interest is defined as the simple mean, $\sum_{i=1}^{N} \beta_i/N$.

The problem under consideration is closely related to the common problem of estimating totals, $T = \sum_i z_i$, of finite populations, with the complication caused by the fact that the coefficients of sampled groups also have to be estimated. Two basic approaches for estimating totals of finite populations are the design-based approach and the model-based approach, discussed fully, for example, by Smith (1976) and by Särndal (1978). Basically, the design-based approach assigns randomness only to the sample design and assumes no prior relationship between the values $z_i$. The model-based approach assumes that the values, $z_i$, are realized outcomes of random variables, $Z_i$, which are related by a mathematical model, while the sample design has no role in the inference process.

Konijn (1962) and Porter (1973) obviously follow the design-based approach. Since they assume no prior relationship between the coefficients, the estimators they propose have no optimal properties. In this they are similar to the estimators of totals for finite populations, under the design-based approach.

The fact that observations are available only for part of the groups naturally leads to the formulation of a model which defines the relationships between the coefficients, $\beta_i$. Scott and Smith (1969), Lindley and Smith (1972), and Sedransk (1977) provide typical examples of this alternative approach, in which the coefficients are assumed to be generated from a normal distribution.
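As a concrete illustration of the design-based proposal described above, the following minimal sketch computes the least squares slope of a line through the origin for each sampled group and combines the slopes in the Horvitz-Thompson type form attributed to Konijn (1962). It assumes that (1.1) is a no-intercept simple regression; the function names, the toy data, the group sizes, and the inclusion probabilities are all hypothetical.

```python
from typing import Dict, List, Tuple

# group label -> (x values, y values) of the sampled units in that group
Sample = Dict[int, Tuple[List[float], List[float]]]


def ls_slope(xs: List[float], ys: List[float]) -> float:
    """Least squares slope for a line through the origin: sum(x*y) / sum(x^2)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def konijn_estimate(sample: Sample, sizes: Dict[int, int],
                    incl_prob: Dict[int, float]) -> float:
    """Horvitz-Thompson type estimate of sum(M_i * beta_i) / sum(M_i).

    Only sampled groups contribute to the numerator; each slope is inflated
    by the reciprocal of its group's inclusion probability.
    """
    weighted = sum(sizes[i] * ls_slope(xs, ys) / incl_prob[i]
                   for i, (xs, ys) in sample.items())
    return weighted / sum(sizes.values())


# Hypothetical toy population: N = 3 groups, of which groups 0 and 2 are sampled.
sizes = {0: 10, 1: 20, 2: 30}
incl_prob = {0: 0.5, 1: 0.5, 2: 0.5}
sample = {0: ([1.0, 2.0], [2.0, 4.0]), 2: ([1.0, 3.0], [3.0, 9.0])}
estimate = konijn_estimate(sample, sizes, incl_prob)
```

The sketch uses no relationship between the coefficients of different groups, which is the feature of the design-based estimators that the text contrasts with the model-based approach.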
By assuming, in addition, a normal distribution for the random deviations, $\varepsilon_{ij}$, the Bayesian estimators of the coefficients and of the population total are derived. Only Scott and Smith (1969), however, deal with the special situation where only part of the groups are represented in the sample, and their results are restricted to a model which is analogous to the random effects model in analysis of variance.

For most of our results we can relax the normal assumptions and assume only that the unknown coefficients are uncorrelated random variables with the same expectation and variance, that is,

$$\beta_i = \beta + v_i, \qquad (1.4)$$

$$E(v_i) = 0, \qquad (1.5)$$

$$E(v_i v_k) = \begin{cases} \delta^2; & i = k \\ 0; & \text{otherwise.} \end{cases} \qquad (1.6)$$

These assumptions may be considered as relating to a random process which generates the values of the coefficients in the various groups or, alternatively, as relating to their prior subjective distribution. In particular, these assumptions hold if the $\beta_i$'s are exchangeable and uncorrelated, which implies that the group labels ($i = 1, \ldots, N$) provide no information on the values of the corresponding coefficients (Ericson 1969). It should be noted that the common mean, $\beta$, is fixed but unknown.

Considering the coefficients as random variables necessitates further characterization of the sample design, and we assume that it is noninformative, in the sense that the selection of the sample is independent of the realizations of the $\beta_i$'s, that is,

$$P(S \mid \beta) = P(S). \qquad (1.7)$$

Although we basically adopt the model-based approach, our assumptions about the model are very general and are suitable for many practical problems. Since, as pointed out by Ericson (1969), the ideas that underlie the assumption of an exchangeable prior distribution are very similar to those underlying the use of simple random sampling in the classical approach, our assumptions can be considered, in a certain sense, as a compromise between the two approaches. As shown in the following sections, however, the assumed model, even in its general form, is sufficient for the derivation of best linear unbiased estimators of $\beta_w$.

Note that the restriction to the case of simple regression is for simplicity only and that the generalization to the case of multiple regression is straightforward. Similarly, more general variance-covariance matrices can easily be considered (Pfeffermann 1978).

Finally, although we consider the coefficients, $\beta_i$, as random variables, our main concern is in estimating their weighted average, $\beta_w$, rather than in estimating their common expectation, $\beta$. This differs somewhat from the random coefficient regression (RCR) approach proposed in econometrics for analyzing cross-sectional data; although that approach also considers the estimation of the realizations, $\beta_i$, it focuses attention on the estimation of $\beta$ (Zellner 1966; Swamy 1970, 1971; and Swamy and Mehta 1975). Note that by this approach the distribution of the coefficients $\beta_i$ represents only physical probability. In many applications the specific realizations of the random coefficients in an actual finite population, rather than their expectation over all possible realizations, are of primary interest, especially when the variance $\delta^2$ is relatively high.

An important special case, which requires the estimation of linear combinations of the coefficients, is that in which we wish to predict the value $Y(u)$ of a unit, $u$, selected at random from the population, given that its $X$ value, $X(u)$, is a given value, $x$; defining the predictor as the conditional expectation,

$$m(x) = E(Y(u) \mid X(u) = x) = x\sum_{i=1}^{N} \beta_i P_i(x), \qquad (1.8)$$

where $P_i(x) = P(u \in U_i \mid X(u) = x)$. The estimation of $m(x)$ can be considered as transforming microrelationships into macrorelationships. For example, the assumptions (1.1) through (1.3) may be postulated as relating to those between the price of an apartment and its size, where the groups, $U_i$, represent different localities.
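The predictor in this special case is a weighted combination of the group coefficients. The sketch below assumes that (1.8) has the form $m(x) = x\sum_i P_i(x)\beta_i$ (which follows from (1.1) when the group of a unit with $X(u) = x$ is random with probabilities $P_i(x)$) and that $P_i(x)$ is known from counts of units with that $x$ value in each group. The counts, coefficients, and function names are hypothetical.

```python
from typing import List


def group_probs_given_x(counts_at_x: List[int]) -> List[float]:
    """P_i(x): share of the population units with X = x that belong to group i."""
    total = sum(counts_at_x)
    return [c / total for c in counts_at_x]


def predictor(x: float, betas: List[float], probs: List[float]) -> float:
    """m(x) = x * sum_i P_i(x) * beta_i, a linear combination of the beta_i."""
    return x * sum(p * b for p, b in zip(probs, betas))


# Hypothetical example: three localities, apartments of 80 square meters.
counts = [10, 30, 60]                # apartments of that size in each locality
betas = [1900.0, 2000.0, 2100.0]     # price per square meter in each locality
probs = group_probs_given_x(counts)  # weights w_i = P_i(x)
m_80 = predictor(80.0, betas, probs)
```

In practice the $\beta_i$ are unknown and are replaced by estimates, which is why the paper targets a general weighted sum $\beta_w$ with known weights $w_i$.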
The estimation of (1.8) may be required as representing an average price, over all localities, of an apartment of given size, $x$, the weights $P_i(x)$ being the known proportions of apartments of that size in the different localities. Applying this procedure for different sizes may then serve for constructing an apartment price index, adjusted for changes in size. This procedure is indeed currently being considered by the Israel Central Bureau of Statistics. The estimation of a single overall finite population regression coefficient (Kish and Frankel 1974) would certainly not be satisfactory for this situation.

In the following section we derive the extended least squares estimator, $\hat\beta_w(a)$, of $\beta_w$ and investigate its relationship to estimators proposed under the design-based approach. In Section 3 we prove that this estimator is the best linear $\xi$-unbiased estimator, where $\xi$ denotes the joint distribution of the vector $\beta$ and the variables $Y_{ij}$, $(i, j) \in S$. For the normal case (with a vague prior distribution for $\beta$), $\hat\beta_w(a)$ is shown to be the posterior expectation of $\beta_w$, and thus to be the Bayesian estimator of $\beta_w$ under, for example, a quadratic loss function. In Section 4 the problem of unknown variances is considered and a method for their estimation from the sample is proposed. By replacing the unknown variances by their sample estimates in $\hat\beta_w(a)$, the empirical estimator, $\hat\beta_w(e)$, of $\beta_w$ is obtained. The final section presents some simulation comparisons of the performance of $\hat\beta_w(e)$ with that of the optimal estimator, $\hat\beta_w(a)$, and other empirical estimators obtained by using different estimators of the unknown variances.

2. ESTIMATING $\beta_w$ BY EXTENDED LEAST SQUARES

Throughout this and the next section the variances, $\sigma_i^2$ and $\delta^2$, are assumed to be known. The case of unknown variances will be treated in Section 4. We also assume that the $X$ values are fixed for given units. Generalization to the case in which the $x_{ij}$'s are observations on random variables, which are independent of the random deviations, $\varepsilon_{kl}$, for all pairs $(k, l)$, and of the coefficients, $\beta_k$, for every $k$, is immediate.

We follow Duncan and Horn (1972) and Haitovsky (1973) and write (1.1) for a given sample of units, together with (1.4), as a single linear model,

$$Y^0 = X^0\beta^0 + \varepsilon^0, \qquad (2.1)$$

with the following notations: $Y^{0\prime} = (Y_1', \ldots, Y_n', 0_N')$, where $Y_i$ denotes the $Y$ values observed for $s_i$ (the sample from the $i$th selected cluster) and $0_k$ denotes in general a zeros vector of order $k$;

[the display defining $X^0$, $\beta^0$, and $\varepsilon^0$ is not recoverable from the source]

where $x_i$ denotes the $X$ values for $s_i$ and $0_{m,N-n}$ is a zeros matrix of order $m \times (N - n)$, $m = \sum_{i=1}^{n} m_i$; $I_k$ and $1_k$ denote an identity matrix and a units vector of order $k$; $\varepsilon' = (\varepsilon_1', \ldots, \varepsilon_n')$, where $\varepsilon_i$ denotes the vector of random errors corresponding to the sampled units in $s_i$; and $v' = (v_1, \ldots, v_N)$, where $v_i$ is defined by (1.4). We assume $E(\varepsilon_{ij}v_i) = 0$, for all $i$ and $j$. Then from (1.2), (1.3), (1.5), and (1.6) we obtain

$$E(\varepsilon^0) = 0_{m+N}, \qquad (2.2)$$

$$E(\varepsilon^0\varepsilon^{0\prime}) = \mathrm{diag}(\sigma_1^2 I_{m_1}, \ldots, \sigma_n^2 I_{m_n}, \delta^2 I_N) = V. \qquad (2.3)$$

Consider the following estimator of $\beta^0$:

$$\hat\beta^0(a) = (X^{0\prime}V^{-1}X^0)^{-1}X^{0\prime}V^{-1}Y^0. \qquad (2.4)$$

Following Haitovsky (1973), we term $\hat\beta^0(a)$ the extended least squares (ELS) estimator (or predictor) of $\beta^0$. Straightforward algebraic manipulations imply the following expressions for the ELS estimators of the individual coefficients, $\beta_i$, and of their expectation, $\beta$:

$$\hat\beta_i(a) = \lambda_i\hat\beta_i + (1 - \lambda_i)\hat\beta(a), \quad i = 1, \ldots, N, \qquad (2.5)$$

$$\hat\beta(a) = \sum_{i=1}^{N} \lambda_i\hat\beta_i \Big/ \sum_{i=1}^{N} \lambda_i, \qquad (2.6)$$

where

$$\lambda_i = \begin{cases} \delta^2 \Big/ \left(\delta^2 + \sigma_i^2 \big/ \sum_{j \in S_i} x_{ij}^2\right); & i \in S^* \\ 0; & \text{otherwise,} \end{cases} \qquad (2.7)$$

and

$$\hat\beta_i = \begin{cases} \sum_{j \in S_i} x_{ij}y_{ij} \Big/ \sum_{j \in S_i} x_{ij}^2; & i \in S^* \\ 0; & \text{otherwise.} \end{cases} \qquad (2.8)$$

$\hat\beta_i$ is the classical least squares estimator of $\beta_i$ for a sampled group and is defined as zero otherwise. Equation (2.5) defines, for $i \in S^*$, $\hat\beta_i(a)$ as the weighted average of the least squares estimator, $\hat\beta_i$, and of the estimator of the common expectation, $\hat\beta(a)$, with weights $\lambda_i$ and $1 - \lambda_i$, respectively. The weight, $\lambda_i$, is a decreasing function of the ratio between $\sigma_i^2/\sum_{j \in S_i} x_{ij}^2$ (the conditional variance of $\hat\beta_i$ around $\beta_i$) and $\delta^2$ (the variance of $\beta_i$ around $\beta$). The smaller the ratio, the more weight given to the least squares estimator, compared to that given to the estimator of the expectation.

Coefficients of nonsampled groups are estimated by the estimator of the expectation, $\hat\beta(a)$, which is a weighted average of the least squares estimators, the weights being determined by the reciprocals of the variances:

$$E(\hat\beta_i - \beta)^2 = \delta^2 + \sigma_i^2 \Big/ \sum_{j \in S_i} x_{ij}^2.$$

This straightforward result stems from the assumption that coefficients of different groups are uncorrelated and is the same as that obtained from applying the random coefficient regression (RCR) approach of Swamy (1970). In fact, $\hat\beta(a)$ is the classical least squares estimator of $\beta$ obtained from the model holding for the least squares estimators $\hat\beta_i$, $i \in S^*$ (see also Sec. 4).

If the vector $(w_1, \ldots, w_N, 0)$ is denoted by $W^{0\prime}$, the ELS estimator of $\beta_w$ is defined as

$$\hat\beta_w(a) = W^{0\prime}\hat\beta^0(a) = \sum_{i=1}^{N} w_i\hat\beta_i(a). \qquad (2.9)$$

It is of special interest to follow the limiting behavior of this estimator for very large and for very small values of the variances. We assume for simplicity that $\sum_{i=1}^{N} w_i = 1$.

When $\delta^2$ becomes small and the ratios $c_i = \sigma_i^2/\sum_{j \in S_i} x_{ij}^2$ ($i = 1, \ldots, n$) are bounded from below,

$$\lim_{\delta^2 \to 0} \hat\beta_w(a) = \frac{\sum_{i \in S^*}\sum_{j \in S_i} x_{ij}y_{ij}/\sigma_i^2}{\sum_{i \in S^*}\sum_{j \in S_i} x_{ij}^2/\sigma_i^2}, \qquad (2.10)$$

which is the estimator of a common regression coefficient computed from all the observations. Since $\delta^2 \to 0$ implies that the coefficients tend to be equal, this is an obvious result.

When the ratios, $c_i$, decrease to zero and $\delta^2$ remains bounded (or when $\delta^2$ becomes large and the ratios $c_i$ are bounded),

$$\lim \hat\beta_w(a) = \sum_{i \in S^*} w_i\hat\beta_i + \sum_{i \notin S^*} w_i\hat\beta^* = \hat\beta_w(c), \qquad (2.11)$$

where $\hat\beta^* = \sum_{i=1}^{n} \hat\beta_i/n$. In particular, this implies that coefficients of sampled groups are estimated by their least squares estimators, while coefficients of nonsampled groups are estimated by the simple mean of the estimators of the coefficients of sampled groups. For this limiting case the procedure is analogous to that suggested by traditional sampling theory for estimating totals of finite populations, based on simple random sampling without replacement, with the only distinction that the coefficients of sampled groups are replaced by their least squares estimators.

Furthermore, when estimating $\bar\beta = \sum_{i=1}^{N} \beta_i/N$, the estimator obtained when $\delta^2 \to \infty$ is $\hat\beta^*$, and the same estimator is also obtained when all the conditional variances, $\sigma_i^2/\sum_{j \in S_i} x_{ij}^2$, are equal. $\hat\beta^*$ was proposed by Porter (1973) for estimating $\bar\beta$ in the case of simple random sampling. It is easy to verify that in this case $\hat\beta^*$ is an unbiased estimator of $\bar\beta$, for any given vector $\beta$, where the expectation is over the distribution of the least squares estimators for given sampled groups and over the random selection of groups. The correspondence between the results under the assumption of an exchangeable prior and under simple random sampling has been noted in a different context by Ericson (1969).

3. PROPERTIES OF THE ELS ESTIMATORS

Some optimal properties of the ELS estimators will be proved in this section, and we show first that, for the normal case, $\hat\beta^0(a)$ is the posterior expectation of $\beta^0$. For this purpose, we assume

[displays (3.1) and (3.2), the normality assumptions, are not recoverable from the source]

and

$$P(\beta) = \text{constant}. \qquad (3.3)$$

The derivation of the posterior distribution of $\beta^0$ follows that of Scott and Smith (1969) for the posterior distribution of linear functions of the elements of a finite population, assuming a similar model for the case of group means (i.e., $x_{ij} = 1$ for all $i$ and $j$).

From (1.7) it follows that

[display (3.4) is not recoverable from the source]

where $Y' = (Y_1', \ldots, Y_n')$. Thus, the likelihood of $\beta^0$, given the sample, $S$, and the vector of observations, $Y$, is

$$L[\beta^0 \mid (S, Y)] \propto \exp\left\{-\frac{1}{2}\sum_{i=1}^{n}\sum_{j \in S_i}(y_{ij} - \beta_i x_{ij})^2/\sigma_i^2\right\}. \qquad (3.5)$$

Equations (3.2) and (3.3) imply that the joint prior distribution of $\beta^0$ is

[display (3.6) is not recoverable from the source]

The posterior distribution of $\beta^0$ is proportional to the product of the likelihood function and the prior distribution, which, after some algebraic manipulations, can be written as

[display (3.7), the expression for $P[\beta^0 \mid (S, Y)]$, is not recoverable from the source]

Thus the posterior distribution of $\beta^0$ is multinormal with expectation $\hat\beta^0(a)$ and variance-covariance matrix $(X^{0\prime}V^{-1}X^0)^{-1}$.

The fact that $\beta^0$ has a multinormal posterior distribution implies that for any given vector of weights $w' = (w_1, \ldots, w_N)$, $\beta_w$ has a normal posterior distribution, so that $\hat\beta_w(a)$ is the Bayesian estimator of $\beta_w$ for every loss function that attains its minimum at the posterior expectation.

Considering again the limiting behavior of $\hat\beta_w(a)$, it turns out that (2.11) is the limit of the Bayesian estimator as $\delta^2 \to \infty$. On the other hand, $\delta^2 \to \infty$ implies that the prior distribution of the coefficients, $\beta_i$, tends to the locally uniform distribution, $P(\beta_i) = \text{constant}$.

The distributional assumptions (3.1) through (3.3) have been used only in order to derive the Bayesian estimator of $\beta_w$, but the properties of the ELS estimator, $\hat\beta_w(a)$, can be investigated under more general priors. For the rest of this article, we only assume the moment structure defined by (1.2), (1.3), (1.5), and (1.6). We denote by $E_\xi(\cdot)$ the expectation over the joint distribution of the coefficients $\beta$ and the variables $Y_{ij}$, $(i, j) \in S$.

Lemma. $\hat\beta_w(a)$ is $\xi$ unbiased in the sense that $E_\xi[\hat\beta_w(a) - \beta_w] = 0$.

The proof follows immediately from (2.5) and (1.5).

The estimator $\hat\beta_w(a)$ is a linear function of the observed $Y$ values. In the following theorem we prove that it is indeed the best out of all linear $\xi$-unbiased estimators.

Theorem. Let $\tilde\beta_w = L'Y + b$ be any linear $\xi$-unbiased estimator of $\beta_w$. Then

[the two displays stating the theorem are not recoverable from the source]

Proof. Denote $L^{0\prime} = (L', 0_N')$ so that $\tilde\beta_w = L^{0\prime}Y^0 + b$, and define $C' = L^{0\prime} - W^{0\prime}(X^{0\prime}V^{-1}X^0)^{-1}X^{0\prime}V^{-1} = (C_1', C_2')$. Denote also for simplicity $(X^{0\prime}V^{-1}X^0)^{-1}$ by $V^*$. Then

$$E_\xi(\tilde\beta_w - \beta_w) = E_\xi\{[W^{0\prime}V^*X^{0\prime}V^{-1} + C'](X^0\beta^0 + \varepsilon^0) + b - W^{0\prime}\beta^0\} \qquad (3.10)$$

and it follows from (2.2) that

$$E_\xi(\tilde\beta_w - \beta_w) = \beta C'X^0 1_{N+1} + b = \beta C_1'X1_N + b.$$

Thus for $\tilde\beta_w$ to be $\xi$ unbiased for any $\beta$ requires

$$C_1'X1_N = b = 0. \qquad (3.11)$$

By use of (3.10) and the result $b = 0$,

$$E_\xi(\tilde\beta_w - \beta_w)^2 = W^{0\prime}V^*W^0 + C'VC + \cdots \qquad (3.12)$$

[the last term on the right-hand side of (3.12) is not recoverable from the source]

The last term in the right-hand side of (3.12) is easily shown to be zero by means of (2.3) and (3.11), and (3.12) can be written as

[display (3.13) is not recoverable from the source]

which completes the proof.

Scott and Smith (1969) prove by means of Lagrange multipliers a similar theorem for the special case in which $x_{ij} = 1$ for all $i$ and $j$. When the weighted sum defining $\beta_w$ relates only to coefficients of sampled groups, that is, $w_i = 0$ for $i \notin S^*$, the optimality of $\hat\beta_w(a)$ could be inferred from a more general theorem proved by Harville (1976). It should be noted, however, that the theorem cannot be considered as a special case of the theorem proved by Duncan and Horn (1972) (assuming $E\beta$ is known) or of the theorem proved in Rao (1973, p. 234), since there the requirement for unbiasedness is equivalent, in our notation, to the condition $C_1'X = 0$, whereas we only require $C_1'X1_N = 0$ and thus enlarge the subspace of possible estimators.

The optimal properties of $\hat\beta_w(a)$ relate to the $\xi$ distribution, given the sample of units. The results, however, can be immediately extended to the overall expectation, $E_0(\cdot)$, over all possible samples. Thus for any linear $\xi$-unbiased estimator, $\tilde\beta_w$,

[display not recoverable from the source]

where the summations are over the set of all possible samples and $P(S)$ is the probability of selecting the sample, $S$, as implied by the sampling design.

4. THE CASE OF UNKNOWN VARIANCES

In deriving the optimal estimator $\hat\beta_w(a)$ of $\beta_w$, we assumed that the variances, $\delta^2$ and $\sigma_i^2$ ($i \in S^*$), are known. In practice, this condition is rarely met and one can proceed in one of two ways.

I Choose a suitable prior distribution to represent the knowledge available about the variances and follow the Bayesian approach.

II Estimate the unknown variances from the sample.

Computational difficulties limit considerably the application of the first alternative; see, for instance, Box and Tiao (1968) and Lindley and Smith (1972).

The second alternative, as adopted by Swamy (1970) in dealing with RCR models and by Rao (1976) for a random effects model, is to replace the unknown variances in the formula of the estimators by their estimates from the sample. Following our previous attitude not to postulate any distributional assumptions beyond those concerning the first and second moments, we adopt this second approach, but suggest a somewhat different procedure for estimating $\delta^2$ and thereby $\lambda_i$ of sampled groups. The procedure has some desirable theoretical properties and is shown to perform well in practice by the simulation study described in the next section.

The proposed estimators for $\sigma_i^2$ ($i \in S^*$) are

$$\hat\sigma_i^2 = \sum_{j \in S_i}(y_{ij} - \hat\beta_i x_{ij})^2/(m_i - 1), \quad (i = 1, \ldots, n). \qquad (4.1)$$

This is an unbiased estimator of $\sigma_i^2$, and for the normal case it is the best unbiased estimator of $\sigma_i^2$ (Theil 1971, p. 390). In this case $\hat\sigma_i^2$ is also a consistent estimator of $\sigma_i^2$. When the normal assumption does not hold, but the random deviations $\varepsilon_{ij}$ are independent for all $i$ and $j$ and identically distributed for every given $i$, the existence of the positive limits

$$\lim_{m_i \to \infty} \sum_{j \in S_i} x_{ij}^2/m_i = a_i \neq 0; \quad (i = 1, \ldots, n) \qquad (4.2)$$

is known to be sufficient for ensuring the consistency of the estimators $\hat\sigma_i^2$.

Denote by $\hat\lambda_i$, $i = 1, \ldots, n$, the expression obtained by replacing $\sigma_i^2$ by $\hat\sigma_i^2$ in formula (2.7) of $\lambda_i$. We propose to estimate $\delta^2$ by the largest solution of the equation

$$(n - 1)^{-1}\sum_{i=1}^{n} \hat\lambda_i\left(\hat\beta_i - \sum_{k=1}^{n}\hat\lambda_k\hat\beta_k \Big/ \sum_{k=1}^{n}\hat\lambda_k\right)^2 = \delta^2. \qquad (4.3)$$

This equation is a modified form of equations (4.3.29) through (4.3.31) of Swamy (1971, p. 112). The left-hand side of (4.3), with $\lambda_i$ instead of $\hat\lambda_i$, is just the weighted error sum of squares (and thus the usual unbiased least squares estimator of $\delta^2$) under the linear heteroscedastic model

$$\hat\beta_i = \beta + e_i, \quad (i = 1, \ldots, n), \qquad (4.4)$$

$$E(e_i) = 0, \qquad (4.5)$$

$$E(e_ie_j) = \begin{cases} \delta^2/\lambda_i; & i = j \\ 0; & \text{otherwise.} \end{cases} \qquad (4.6)$$

The proposed procedure has thus much in common with the moments method of estimation. When assumptions (3.1) and (3.2) hold, the least squares estimators are normally distributed and the left-hand side of (4.3), with $\lambda_i$ instead of $\hat\lambda_i$, is the best unbiased estimator of $\delta^2$, a direct consequence from Theil (1971, p. 390). This property, together with the consistency of the estimators $\hat\sigma_i^2$, suggests that in the normal case the left-hand side of (4.3) is in fact close to the true variance, provided that the sample sizes $\{m_i\}$ are sufficiently large. As will be detailed, this closeness is attained under more general conditions, when the number of sampled groups is also sufficiently large.

Equation (4.3) has one trivial solution, $\hat\delta_1^2 = 0$. By dividing both sides of (4.3) by $\delta^2$, the following equation is obtained:

$$(n - 1)^{-1}\sum_{i=1}^{n} \frac{\left(\hat\beta_i - \sum_{k=1}^{n}\hat\lambda_k\hat\beta_k \big/ \sum_{k=1}^{n}\hat\lambda_k\right)^2}{\delta^2 + \hat\sigma_i^2 \big/ \sum_{j \in S_i} x_{ij}^2} = 1. \qquad (4.7)$$

Two main questions concerning the solution of this equation are the existence and uniqueness of a positive solution and the proximity of that solution to the true variance. In order to answer these questions, consider the left-hand side of (4.7) as a function, $h(\delta^2)$, of the unknown variance $\delta^2$.

Function $h(\delta^2)$ is a decreasing monotone continuous function of $\delta^2$ for $\delta^2 > -\min_i[\hat\sigma_i^2/\sum_{j \in S_i} x_{ij}^2]$, and so there cannot be more than one positive solution. As $\lim_{\delta^2 \to \infty} h(\delta^2) = 0$, the existence of a positive solution is conditioned on the inequality

$$h(0) > 1. \qquad (4.8)$$

Next, we consider the limiting properties of the components of this inequality and of the estimator obtained, if it holds. Let (4.2) hold and the random deviations, $\varepsilon_{ij}$, be independent, for all $i$ and $j$, and identically distributed for every $i$. It follows immediately from Anderson (1971, pp. 23-25) that for given values $\beta_1, \ldots, \beta_n$ of the coefficients, $\left(\sum_{j \in S_i} x_{ij}^2\right)^{1/2}(\hat\beta_i - \beta_i)/\hat\sigma_i$ converges in distribution to a standard normal, as $m_i \to \infty$. Furthermore, if $\beta_1 = \cdots = \beta_n = \beta$, and $m_i \to \infty$ so that $m_i/m = \text{constant}$ for $i = 1, \ldots, n$, then it follows from Rao (1973, p. 389) that $(n - 1)h(0)$ converges in distribution to a chi-square with $n - 1$ degrees of freedom, so that the limiting probability of (4.8) holding, when $\beta_1 = \cdots = \beta_n$, is one-half.

Consider, however, the case where the number of sampled groups, $n$, also increases. Define $\hat\delta^2$ to be the positive solution of (4.7) when it exists, and zero otherwise. Under the assumptions that (4.2) holds, that the deviations, $\varepsilon_{ij}$, are identically and independently distributed, and that $E(\hat\beta_i - \beta)^4$ exists and is a bounded function of $m_i$ ($i = 1, \ldots, n$), $\hat\delta^2$ is a consistent estimator of $\delta^2$, in the sense that

$$\operatorname*{plim}_{n \to \infty;\ m_i \to \infty\ (i = 1, \ldots, n)} \hat\delta^2 = \delta^2. \qquad (4.9)$$

This result follows from the fact that for the true variance,

$$\operatorname*{plim}_{n \to \infty;\ m_i \to \infty\ (i = 1, \ldots, n)} h(\delta^2) = 1 \qquad (4.10)$$

and, since $h(\delta^2)$ is a continuous monotone function of $\delta^2$, $h^{-1}(z)$ is a continuous function of $z$ in the neighborhood of $z = 1$, so that (4.9) follows.

The condition (4.2) is required only to ensure the consistency of the estimators of the variances, $\sigma_i^2$, and it is redundant in the normal case.

We denote by $\hat\beta_i(e)$ the empirical estimator of $\beta_i$, obtained if we replace $\delta^2$ and $\sigma_i^2$ by their sample estimators, $\hat\delta^2$ and $\hat\sigma_i^2$, in (2.5) through (2.7). The empirical estimator of $\beta_w$ is then defined to be

$$\hat\beta_w(e) = \sum_{i=1}^{N} w_i\hat\beta_i(e), \qquad (4.11)$$

and it follows from Slutsky's theorem (Wilks 1962, p. 102) that under the conditions that ensure that $\hat\delta^2 \to \delta^2$ and $\hat\sigma_i^2 \to \sigma_i^2$,

$$\operatorname*{plim}_{n \to \infty;\ m_i \to \infty\ (i = 1, \ldots, n)} \hat\beta_w(e) = \hat\beta_w(a). \qquad (4.12)$$

Deriving the explicit solutions of equation (4.3) for the general case does not seem possible, but the solutions are easily found by an iterative procedure. It is interesting, however, to solve the equation for two special cases.

Case 1. $x_{ij} = 1$ for all $i$ and $j$, $\sigma_i^2 = \sigma^2$ and $m_i = m_0$ for every $i$. $\sigma^2$ will be estimated in this case by

[display not recoverable from the source]

and the nonzero solution of (4.3) is

[display not recoverable from the source]

where $\bar y_i$ are the $n$ sample means and $\bar y = (1/n)\sum_{i=1}^{n} \bar y_i$ is the overall mean.
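The estimation steps of Sections 2 and 4 can be sketched in code. This is an illustration under stated assumptions rather than the authors' program: it takes (2.7) to be $\lambda_i = \delta^2/(\delta^2 + c_i)$ with $c_i = \sigma_i^2/\sum_j x_{ij}^2$, takes (4.7) to be $h(\delta^2) = (n-1)^{-1}\sum_i (\hat\beta_i - \hat\beta)^2/(\delta^2 + c_i) = 1$ with $\hat\beta$ the precision-weighted mean, and uses bisection as the iterative procedure (the source does not say which iteration was used). All function names and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple

Group = Tuple[List[float], List[float]]  # (x values, y values) of one sampled group


def ls_and_cond_var(group: Group, sigma2: float) -> Tuple[float, float]:
    """Least squares slope (2.8) and its conditional variance c_i = sigma_i^2 / sum(x^2)."""
    xs, ys = group
    sxx = sum(x * x for x in xs)
    return sum(x * y for x, y in zip(xs, ys)) / sxx, sigma2 / sxx


def pooled_mean(b: List[float], c: List[float], delta2: float) -> float:
    """Estimator (2.6), using weights 1/(delta2 + c_i), which are proportional to lambda_i."""
    prec = [1.0 / (delta2 + ci) for ci in c]
    return sum(p * bi for p, bi in zip(prec, b)) / sum(prec)


def els_estimate(w: List[float], sample: Dict[int, Group],
                 sigma2: Dict[int, float], delta2: float) -> float:
    """ELS estimate (2.9) of sum_i w_i * beta_i; sample keys index into w."""
    ids = sorted(sample)
    pairs = [ls_and_cond_var(sample[i], sigma2[i]) for i in ids]
    b, c = [p[0] for p in pairs], [p[1] for p in pairs]
    mean = pooled_mean(b, c, delta2)
    coef = [mean] * len(w)  # nonsampled groups get the estimator of the expectation
    for i, bi, ci in zip(ids, b, c):
        lam = delta2 / (delta2 + ci)             # weight (2.7)
        coef[i] = lam * bi + (1.0 - lam) * mean  # shrinkage (2.5)
    return sum(wi * bi for wi, bi in zip(w, coef))


def h(delta2: float, b: List[float], c: List[float]) -> float:
    """Assumed left-hand side of (4.7); decreasing in delta2 for delta2 >= 0."""
    mean = pooled_mean(b, c, delta2)
    return sum((bi - mean) ** 2 / (delta2 + ci) for bi, ci in zip(b, c)) / (len(b) - 1)


def solve_delta2(b: List[float], c: List[float]) -> float:
    """Positive root of h(delta2) = 1 by bisection, or 0 when h(0) <= 1 (condition (4.8) fails)."""
    if h(0.0, b, c) <= 1.0:
        return 0.0
    lo, hi = 0.0, 1.0
    while h(hi, b, c) > 1.0:  # expand the bracket until the root is enclosed
        hi *= 2.0
    for _ in range(200):      # fixed number of halvings avoids tolerance issues
        mid = 0.5 * (lo + hi)
        if h(mid, b, c) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical toy data: N = 3 groups, groups 0 and 2 sampled, equal weights.
sample = {0: ([1.0, 2.0], [2.1, 3.9]), 2: ([1.0, 3.0], [2.8, 9.3])}
sigma2 = {0: 1.0, 2: 4.0}
w = [1.0 / 3.0] * 3
estimate = els_estimate(w, sample, sigma2, delta2=0.5)
```

Setting `delta2=0.0` reproduces the pooled estimator in (2.10), and a very large `delta2` approaches the classical form (2.11). Under this reading of (4.7), the case $n = 2$ has the closed-form root $[(\hat\beta_1 - \hat\beta_2)^2 - c_1 - c_2]/2$, which gives a convenient check on the bisection.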
Denote by $\hat\beta_w(e)$ the empirical estimator of $\beta_w$ obtained in this case. Scott and Smith (1969), in dealing with this model, obtained the estimator $\hat\beta_w(e)$ as an approximation to the posterior expectation of $\beta_w$, assuming, in addition to assumptions (3.1) through (3.3), the noninformative prior for the unknown variances

[equation not recovered]

Case 2. $n = 2$. In this case, the nonzero solution of (4.3) is

[equation not recovered]

Estimating $\delta^2$ by $\hat\delta^2 = \max(0, \ldots)$ and substituting $\hat\delta^2$ and $\hat\sigma_i^2$ for the corresponding true variances in $\hat\beta(a)$, the optimal estimator of $\beta = E\beta_i$ defined by (2.6), is a special case of the general procedure for estimating the expectation of the random coefficients, as proposed by Swamy and Mehta (1975) when applying the RCR approach. Note that [expression not recovered] is a $\xi$-unbiased estimator of $\delta^2$.

5. SIMULATION COMPARISONS

The estimation procedure of the unknown variances discussed in the previous section must be evaluated with respect to the efficiency of the empirical estimators $\hat\beta_w(e)$. Since analytical study does not seem feasible, except for very special cases (Rao 1976), we use simulation to compare the performance of the proposed procedure both to that of other procedures proposed in the literature and to that of the optimal estimator (based on the true variances).

The different estimators of $\beta_w$ considered in this study result from applying different procedures for estimating $\delta^2$ from the sample and substituting the sample estimators for the unknown variances in the optimal estimator, $\hat\beta_w(a)$, defined by (2.5) through (2.9).

The performances of the following five estimators of $\beta_w$ are compared.

I The optimal estimator, $\hat\beta_w(a)$, defined by (2.5) through (2.9) on the basis of the true values of the variances.

II The empirical estimator, $\hat\beta_w(e)$, proposed in the previous section, defined by (4.11).

III The empirical estimator obtained from (2.5) through (2.9), if in (2.7) $\sigma_i^2$ is replaced by $\hat\sigma_i^2$ and $\delta^2$ is replaced by

$$\hat\delta^{*2} = \max\Bigl\{0,\ (n-1)^{-1}\sum_{i=1}^{n}(\hat\beta_i - \hat\beta^*)^2 - n^{-1}\sum_{i=1}^{n}\hat\sigma_i^2 \Big/ \sum_{j\in S_i} x_{ij}^2\Bigr\}. \qquad (5.1)$$

We denote this empirical estimator by $\hat\beta_w(s)$. It is a modification of an estimator proposed by Swamy (1970), in which $\hat\delta^{*2}$ is used to estimate $\delta^2$, rather than $\hat\delta_s^2$. The modification ensures a nonnegative estimate of $\delta^2$ and was also suggested in Swamy and Mehta (1975).

IV The empirical estimator obtained from (2.5) through (2.9), if in (2.7) $\sigma_i^2$ is replaced by $\hat\sigma_i^2$ and $\delta^2$ is replaced by

[equation not recovered]

The estimator $\hat\delta_n^2$ of $\delta^2$ is obtained from $\hat\delta_s^2$ by neglecting the factor $n^{-1}\sum_{i=1}^{n}(\hat\sigma_i^2/\sum_{j\in S_i} x_{ij}^2)$ and thus ensuring a positive estimator. Estimating $\delta^2$ in this way is analogous to estimating the between-groups component of variance in ANOVA models by $s_1^2 = \sum_{i=1}^{n}(\bar y_i - \bar y)^2/(n-1)$, which is sometimes recommended in the literature (Kish 1965, p. 168). The estimator $\hat\delta_n^2$ also overestimates $\delta^2$. Note that this estimator solves the equation (4.3) when $\lambda_i = 1$ $(i = 1, \ldots, n)$ and thus $\hat\delta_n^2 \ge \hat\delta^2$. We denote this empirical estimator by $\hat\beta_w(n)$.

V The classical estimator, $\hat\beta_w(c)$, defined by (2.11). This estimator has the advantage that its quadratic loss can be computed exactly and compared with that of the optimal estimator $\hat\beta_w(a)$ through (3.13). Note also that by (2.11), $\hat\beta_w(c)$ is the limit of $\hat\beta_w(a)$ when $\lambda_i \to 1$ $(i \in S^*)$.

The simulation study was based on actual data on the floor areas, $x_{ij}$, of $M = 6{,}037$ apartments in Israel, divided into $N = 31$ groups, according to geographical location. The values of the various parameters were chosen so as to simulate a theoretical linear relation of the general form (1.1), between an apartment price, $Y_{ij}$, and its area, $x_{ij}$. The coefficients, $\beta_i$ $(i = 1, \ldots, 31 = N)$, were generated from a normal distribution with mean $\beta = 2{,}000$ and three alternative values for the standard deviation: $\delta = 20, 50, 80$. The sample of apartments was selected each time by a two-stage cluster sampling procedure with simple random sampling without replacement in both stages. The number of selected groups considered was $n = 10, 20, 30$. For each $n$, three sets of second-stage sample sizes were considered: $m_i = 15$ $(i \in S^*)$; $m_i = 45$ $(i \in S^*)$; and differential preassigned sample sizes, varying between 5 and 50 with mean 30 and standard deviation 14.

The prices, $Y_{ij}$, of the selected apartments were generated from a normal distribution with mean $\beta_i x_{ij}$ and preassigned standard deviations, $\sigma_i$, differing from group to group in the range $1{,}000 \le \sigma_i \le 16{,}000$.

Eight independent samples were selected for each combination of the number of sampled groups, $n$, and of the second-stage sample sizes, $m_i$. For each sample of units, $S$, 100 sets of values of $\beta_i$ $(i \in S^*)$, and corresponding values of $Y_{ij}$ $[(i, j) \in S]$, were generated for each value of $\delta$. The number of simulations chosen ensured that for all sets of weights $w$ considered, the sample mean squared error (SMSE) of the optimal estimator $\hat\beta_w(a)$ did not differ from the theoretical mean squared error by more than 5 percent. The sets of weights $w$ were determined according to (1.8) for 10 selected values of $x$. One additional set considered was the set of equal weights, namely, $w_i = 1/31$ $(i = 1, \ldots, 31)$.

For each set of weights and for every estimator of $\beta_w$, the overall SMSE's were computed by averaging over all 800 simulations. In addition, the average of the sample mean squared errors (ASMSE) of the estimators of the separate coefficients $\beta_i$ was computed for each type of estimator by averaging SMSE over all 31 groups. This is the sample estimate of the loss, defined as

[equation not recovered]

by Rao (1976), where $\hat\beta_i(\cdot)$ stands for any estimator of $\beta_i$.

Table 1 presents the square roots of the ASMSE of the five estimators considered, for each value of $\delta$, according to the number of selected groups, $n$, and the second-stage sample sizes, $m_i$.
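The simulation design described in this section (random coefficients with mean 2,000 and standard deviation $\delta$, two-stage simple random sampling without replacement, and normally distributed prices with mean $\beta_i x_{ij}$) can be sketched roughly as follows. This is only an illustration under stated assumptions: the original floor-area data are not available here, so the population `x_by_group` is synthetic, the evenly spaced `sigma` values are an assumed choice within the stated range, and the function names `draw_two_stage_sample` and `ls_slope` are hypothetical.

```python
import numpy as np


def draw_two_stage_sample(x_by_group, sigma, n, m, beta_mean=2000.0,
                          delta=50.0, seed=None):
    """Draw one simulated two-stage cluster sample.

    x_by_group : list of 1-D arrays of floor areas, one array per group
    sigma      : per-group residual standard deviations
    n          : number of groups selected at the first stage
    m          : int, or sequence of n ints, second-stage sample sizes
    """
    rng = np.random.default_rng(seed)
    N = len(x_by_group)
    # Random coefficients beta_i ~ Normal(beta_mean, delta^2), one per group.
    beta = rng.normal(beta_mean, delta, size=N)
    # Stage 1: simple random sample of n groups without replacement.
    groups = np.sort(rng.choice(N, size=n, replace=False))
    sizes = np.broadcast_to(m, (n,))
    sample = {}
    for i, m_i in zip(groups, sizes):
        # Stage 2: simple random sample of m_i units without replacement.
        x = rng.choice(x_by_group[i], size=int(m_i), replace=False)
        # Prices: Y_ij ~ Normal(beta_i * x_ij, sigma_i^2).
        y = rng.normal(beta[i] * x, sigma[i])
        sample[int(i)] = (x, y)
    return beta, sample


def ls_slope(x, y):
    """Least squares slope for a regression through the origin."""
    return float(np.sum(x * y) / np.sum(x * x))


# Hypothetical stand-in population: 31 groups of 200 synthetic floor areas.
pop_rng = np.random.default_rng(0)
x_by_group = [pop_rng.uniform(30.0, 150.0, size=200) for _ in range(31)]
# Assumed: standard deviations spread evenly over the stated range.
sigma = np.linspace(1000.0, 16000.0, 31)

beta, sample = draw_two_stage_sample(x_by_group, sigma, n=10, m=15, seed=1)
b = {i: ls_slope(x, y) for i, (x, y) in sample.items()}
```

Passing a sequence of ten sizes for `m` instead of a single integer gives the differential second-stage sample sizes; repeating the draw with different seeds and averaging squared errors would give sample mean squared errors in the sense used above.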
Only values of the square roots of the ASMSE are presented, since their behavior is completely consistent with that of the SMSE of the respective estimators of $\beta_w$ for the various sets of $w$ considered. The main conclusions from the table are as follows.

I For each of the estimators examined, the error decreases when the second-stage sample sizes increase. Except for the case of the classical estimator, $\hat\beta_w(c)$, when $\delta$ is small, the error also decreases with $n$, and the effect of increasing $n$ is in general stronger than that of increasing the second-stage sample sizes, $\{m_i\}$. This can be seen, for example, by comparing the error for $n = 10$, $m_i = 15$ $(i \in S^*)$ with the corresponding errors obtained for $n = 10$, $m_i = 45$ $(i \in S^*)$ and the errors obtained for $n = 30$, $m_i = 15$ $(i \in S^*)$.

II In order to evaluate the efficiency of the proposed empirical estimator $\hat\beta_w(e)$, the relative increase in the ASMSE caused by using this estimator instead of the optimal estimator (based on the true values of the variances) can be computed. It turns out that, except for very low values of $\lambda_i$, caused either by small values of $\delta^2$ or by large values of the conditional variances, $\sigma_i^2/\sum_{j\in S_i} x_{ij}^2$, the relative increase is very small and it becomes smaller as the values of $\lambda_i$ become larger. Thus, for $\delta = 20$ and differential second-stage sample sizes, the relative increase is less than 5.5 percent, while for $\delta = 50$ it is less than 3.1 percent, even for $m_i = 15$ $(i \in S^*)$. It is interesting to notice that the relative increase in the ASMSE becomes larger with $n$. The reason for this phenomenon is that for both the empirical estimators and the optimal estimator, coefficients of nonsampled groups are estimated by the estimator of the expectation, which is obtained as a weighted average of the least squares estimators $\hat\beta_i$ $(i \in S^*)$. Although the weights differ from one estimator to the other, the estimators themselves are very similar.
As a result, for small values of $n$, when most coefficients are estimated by the expectation estimator, the differences in ASMSE between the empirical estimators and the corresponding optimal estimators are very slight. When, on the other hand, $n$ is large and most groups are sampled, the differences in ASMSE between the two estimators become sharper, although they are still, in general, very small.

III It is instructive to compare the performance of all the empirical estimators considered in this study with that of the classical estimator. The empirical estimators turn out to be much more efficient, especially for large values of $n$ and small values of $\lambda_i$ $(i \in S^*)$.

IV Comparison of the performance of the three empirical estimators themselves indicates that $\hat\beta_w(e)$ is, in general, more efficient than both $\hat\beta_w(s)$ and $\hat\beta_w(n)$ for small values of $\delta$, while $\hat\beta_w(n)$ is in most cases slightly better than the other two when $\delta$ is large. It should be noted also that the differences between $\hat\beta_w(n)$ and both $\hat\beta_w(e)$ and $\hat\beta_w(s)$ increase with $n$ in a similar way to those between $\hat\beta_w(e)$ and $\hat\beta_w(a)$, for the same reason discussed in (II). $\hat\beta_w(e)$ and $\hat\beta_w(s)$ seem to be equally good, but a careful examination indicates that $\hat\beta_w(e)$ performs somewhat better than $\hat\beta_w(s)$ as the differences between the values of $\lambda_i$ of the different groups increase. Note that when all the values of $\lambda_i$ are equal for $i \in S^*$, $\hat\delta^2$ and $\hat\delta^{*2}$ are the same and so the two empirical estimators, $\hat\beta_w(e)$ and $\hat\beta_w(s)$, coincide.

Table 1. Square Root of the Average Sample Mean Squared Error (ASMSE) of the Optimal Estimator and of the Empirical Estimators
[Columns: $n$ = 10, 20, 30, each with $m_i$ = 15, varying, 45; rows: $\delta$ by estimator. Table entries not recovered.]
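To make the comparison of the empirical estimators more concrete, the two simple variance estimates discussed in this section (the truncated Swamy-type estimate and the version that neglects the within-group correction) can be sketched as below, together with a shrinkage step. Several things here are assumptions rather than the authors' definitions: equations (2.5) through (2.9) are not reproduced in this section, so the shrinkage form $\lambda_i \hat\beta_i + (1-\lambda_i)\hat\beta^*$ with $\lambda_i = \delta^2/(\delta^2 + v_i)$ is the standard random-coefficient form, assumed here; the slopes are centred at their simple mean; and the function names are hypothetical.

```python
import numpy as np


def variance_estimates(b, v):
    """Return (delta_star2, delta_n2) from slopes b and their variances v.

    b : least squares slopes of the n sampled groups
    v : estimated conditional variances sigma_i^2 / sum_j x_ij^2
    """
    b = np.asarray(b, dtype=float)
    v = np.asarray(v, dtype=float)
    n = b.size
    # Between-group spread of the slopes (assumed centred at their simple mean).
    between = np.sum((b - b.mean()) ** 2) / (n - 1)
    delta_n2 = between                          # correction term neglected
    delta_star2 = max(0.0, between - v.mean())  # corrected, truncated at zero
    return delta_star2, delta_n2


def shrinkage_estimates(b, v, delta2):
    """Assumed shrinkage form: lambda_i * b_i + (1 - lambda_i) * beta_star.

    Requires v > 0 whenever delta2 == 0.
    """
    b = np.asarray(b, dtype=float)
    v = np.asarray(v, dtype=float)
    lam = delta2 / (delta2 + v)
    prec = 1.0 / (delta2 + v)  # precision weights for the common mean
    beta_star = float(np.sum(prec * b) / np.sum(prec))
    return lam * b + (1.0 - lam) * beta_star, beta_star, lam


def combine(w, sampled, beta_i, beta_star):
    """Estimate sum_i w_i * beta_i; nonsampled groups receive beta_star."""
    est = np.full(len(w), beta_star, dtype=float)
    est[np.asarray(sampled)] = beta_i
    return float(np.dot(w, est))


# Tiny worked example: three sampled groups out of five, equal weights.
b = [1.0, 2.0, 3.0]
v = [0.5, 0.5, 0.5]
d_star2, d_n2 = variance_estimates(b, v)
beta_i, beta_star, lam = shrinkage_estimates(b, v, d_star2)
beta_w = combine(np.full(5, 0.2), [0, 2, 4], beta_i, beta_star)
```

In this toy example the uncorrected estimate is larger than the truncated one, and setting every $\lambda_i$ to 1 (no shrinkage) would leave the least squares slopes unchanged, which mirrors the limiting relation to the classical estimator noted in the text.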
Other empirical estimators could be considered, such as estimating the unknown variances by MINQUE estimators, but the small differences observed between the optimal estimators $\hat\beta_w(a)$ and the empirical estimators $\hat\beta_w(e)$ seem to validate the procedure we adopted for estimating $\beta_w$. Further research is required to compare this approach with other approaches, such as the Bayesian approach, using a well-defined joint prior distribution for the coefficients and the unknown variances. In particular, the robustness of the various approaches to deviations from the underlying model should be examined.

[Received March 1979. Revised October 1980.]

REFERENCES

ANDERSON, T.W. (1971), The Statistical Analysis of Time Series, New York: John Wiley.
BOX, G.E.P., and TIAO, G.C. (1968), "Bayesian Estimation of Means for the Random Effect Model," Journal of the American Statistical Association, 63, 174-181.
DUNCAN, D.B., and HORN, S.D. (1972), "Linear Dynamic Recursive Estimation From the Viewpoint of Regression Analysis," Journal of the American Statistical Association, 67, 815-822.
ERICSON, W.A. (1969), "Subjective Bayesian Models in Sampling Finite Populations," Journal of the Royal Statistical Society, Ser. B, 31, 195-233.
HAITOVSKY, Y. (1973), "Maximum Joint Probability Estimates of the Linear Hierarchical Model," unpublished paper, Hebrew University.
HARVILLE, D. (1976), "Extension of the Gauss Markov Theorem to Include the Estimation of Random Effects," Annals of Statistics, 4, 384-396.
KISH, L. (1965), Survey Sampling, New York: John Wiley.
KISH, L., and FRANKEL, M.R. (1974), "Inference From Complex Samples," Journal of the Royal Statistical Society, Ser. B, 36, 1-37.
KONIJN, H. (1962), "Regression Analysis in Sample Surveys," Journal of the American Statistical Association, 57, 590-605.
LINDLEY, D.V., and SMITH, A.F.M. (1972), "Bayes Estimates for the Linear Model," Journal of the Royal Statistical Society, Ser. B, 34, 1-18.
PFEFFERMANN, D. (1978), "Regression Analysis for Complex Samples From Finite Populations," unpublished Ph.D. thesis, Hebrew University.
PORTER, R.M. (1973), "On the Use of Survey Sample Weights in the Linear Model," Annals of Economic and Social Measurement, 2, 141-158.
RAO, C.R. (1973), Linear Statistical Inference and Its Applications (2nd ed.), New York: John Wiley.
RAO, C.R. (1976), "Characterization of Prior Distributions and Solution to a Compound Decision Problem," Annals of Statistics, 4, 823-835.
SARNDAL, C.E. (1978), "Design Based and Model Based Inference in Survey Sampling," Scandinavian Journal of Statistics, 5, 27-52.
SCOTT, A., and SMITH, T.M.F. (1969), "Estimation in Multistage Surveys," Journal of the American Statistical Association, 64, 830-840.
SEDRANSK, J. (1977), "Sampling Problems in the Estimation of the Money Supply," Journal of the American Statistical Association, 72, 516-522.
SMITH, T.M.F. (1976), "The Foundations of Survey Sampling. A Review," Journal of the Royal Statistical Society, Ser. A, 139, 183-195.
SWAMY, P.A.V.B. (1970), "Efficient Inference in a Random Coefficient Regression Model," Econometrica, 38, 311-323.
SWAMY, P.A.V.B. (1971), Statistical Inference in Random Coefficient Regression Models, New York: Springer-Verlag.
SWAMY, P.A.V.B., and MEHTA, J.S. (1975), "Bayesian and Non-Bayesian Analysis of Switching Regressions and of Random Coefficient Regression Models," Journal of the American Statistical Association, 70, 593-602.
THEIL, H. (1971), Principles of Econometrics, New York: John Wiley.
WILKS, S.S. (1962), Mathematical Statistics, New York: John Wiley.
ZELLNER, A. (1962), "An Efficient Method for Estimating Seemingly Unrelated Regressions and Tests for Aggregation Bias," Journal of the American Statistical Association, 57, 348-368.
ZELLNER, A. (1966), "On the Aggregation Problem. A New Approach to a Troublesome Problem," Report #6628, University of Chicago, Center for Mathematical Studies in Business and Economics.