Chapter 10

Generalized Estimating Equations: Epileptic Seizures and Chemotherapy

10.1 Description of data

In a clinical trial reported by Thall and Vail (1990), 59 patients with epilepsy were randomized to groups receiving either the anti-epileptic drug progabide or a placebo in addition to standard chemotherapy. The number of seizures was counted over four two-week periods. In addition, a baseline seizure rate was recorded for each patient, based on the eight-week prerandomization seizure count. The age of each patient was also recorded. The main question of interest is whether the treatment progabide reduces the frequency of epileptic seizures compared with placebo. The data are shown in Table 10.1. (These data also appear in Hand et al., 1994.)

[Table 10.1: Data in epil.dta (variables subj, id, y1, y2, y3, y4, treat, baseline, age)]

10.2 Generalized estimating equations

In this chapter we consider an approach to the analysis of longitudinal data that is very different from the random effects modeling described in the previous chapter. Instead of attempting to model the dependence between responses on the same individuals as arising from between-subject heterogeneity represented by random intercepts and possibly random slopes, we will concentrate on estimating the marginal mean structure, treating the dependence as a nuisance.

10.2.1 Normally distributed responses

If we suppose that a normally distributed response is observed on each individual at $T$ time points, then the basic regression model for longitudinal data becomes (cf. equation (3.3))

$$ y_i = X_i \beta + \epsilon_i, \qquad (10.1) $$

where $y_i' = (y_{i1}, y_{i2}, \ldots, y_{iT})$, $\epsilon_i' = (\epsilon_{i1}, \epsilon_{i2}, \ldots, \epsilon_{iT})$, $X_i$ is a $T \times (p+1)$ design matrix, and $\beta' = (\beta_0, \ldots, \beta_p)$ is a vector of regression parameters.
The residual terms are assumed to have a multivariate normal distribution with a covariance matrix of some particular form that is a function of (hopefully) a small number of parameters. Maximum likelihood estimation can be used to estimate both the parameters in (10.1) and the parameters structuring the covariance matrix (details are given in Jennrich and Schluchter, 1986). The latter are often not of primary interest (they are often referred to as nuisance parameters), but using a covariance matrix that fails to match that of the repeated measurements can lead to inefficient estimates and invalid standard errors for the parameters that are of concern, namely the $\beta$ in (10.1).

If each non-replicated element of the covariance matrix is treated as a separate parameter, giving an unstructured covariance matrix, and if there are no missing data, then this approach is essentially equivalent to multivariate analysis of variance for longitudinal data (see Everitt, 2001). However, it is often more efficient to impose some meaningful structure on the covariance matrix. The simplest (and most unrealistic) structure is independence, with all off-diagonal elements (the covariances) equal to zero and typically all diagonal elements (the variances) equal to each other. Another commonly used simple structure, known as compound symmetry (for example, see Winer, 1971), requires that all covariances are equal and all variances are equal. This is just the correlation structure of the linear random intercept model described in the previous chapter, except that the random intercept model also requires that the correlation be positive.

Other correlation structures include autoregressive structures, where the correlations decrease with the distance between time points. Whatever the assumed correlation structure, all models may be estimated by maximum likelihood.
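To make the independence, compound symmetry, and first-order autoregressive structures concrete, the sketch below writes out the covariance matrix each one implies for $T = 4$ time points. The symbols $\sigma^2$ (a common variance) and $\rho$ (a correlation parameter) are notation assumed here for illustration only.

```latex
% Illustrative notation (assumed, not taken from the text):
% \sigma^2 = common variance, \rho = correlation parameter, T = 4.
% Requires amsmath for pmatrix and \text.
\[
\Sigma_{\text{indep}} = \sigma^2
\begin{pmatrix}
1 & 0 & 0 & 0\\
0 & 1 & 0 & 0\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 1
\end{pmatrix},
\quad
\Sigma_{\text{CS}} = \sigma^2
\begin{pmatrix}
1 & \rho & \rho & \rho\\
\rho & 1 & \rho & \rho\\
\rho & \rho & 1 & \rho\\
\rho & \rho & \rho & 1
\end{pmatrix},
\quad
\Sigma_{\text{AR(1)}} = \sigma^2
\begin{pmatrix}
1 & \rho & \rho^2 & \rho^3\\
\rho & 1 & \rho & \rho^2\\
\rho^2 & \rho & 1 & \rho\\
\rho^3 & \rho^2 & \rho & 1
\end{pmatrix}
\]
```

Each of these structures has at most two parameters, whereas an unstructured covariance matrix for $T = 4$ has $T(T+1)/2 = 10$ distinct elements.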
10.2.2 Non-normal responses

Unfortunately, it is generally not straightforward to specify a multivariate model for non-normal responses. One solution, discussed in the previous chapter, is to induce residual dependence among the responses using random effects. An alternative approach is to give up the idea of a model altogether by using generalized estimating equations (GEE) as introduced by Liang and Zeger (1986). Generalized estimating equations are essentially a multivariate extension of the quasi-likelihood approach discussed in Chapter 7 (see also Wedderburn, 1974). In GEE the parameters are estimated using "estimating equations" resembling the score equations for maximum likelihood estimation of the linear model described in the previous section. These estimating equations only require specification of a link and variance function and a correlation structure for the observed responses conditional on the covariates. As in the quasi-likelihood approach, the parameters can be estimated even if the specification does not correspond to any statistical model.

The regression coefficients represent marginal effects, i.e., they determine the population averaged relationships. Liang and Zeger (1986) show that the estimates of these coefficients are valid even when the correlation structure is incorrectly specified. Correct inferences can be obtained using robust estimates of the standard errors based on the sandwich estimator for clustered data (e.g., Binder, 1983; Williams, 2000). The parameters of the correlation matrix, referred to as the working correlation matrix, are treated as nuisance parameters. However, Lindsey and Lambert (1998) and Crouchley and Davies (1999) point out that estimates are no longer consistent if "endogenous" covariates such as baseline responses are included in the model. Fortunately, inclusion of the baseline response as a covariate does yield consistent estimates of treatment effects in clinical trial data such as the epilepsy data considered here (see Crouchley and Davies, 1999), as long as the model does not contain a baseline by treatment interaction.

There are some important differences between GEE and random effects modeling. First, while random effects modeling is based on a statistical model and typically maximum likelihood estimation, GEE is just an estimation method that is not based on a statistical model. Second, there is an important difference in the interpretation of the regression coefficients. In random effects models, the regression coefficients represent conditional or subject-specific effects for given values of the random effects. For GEE, on the other hand, the regression coefficients represent marginal or population averaged effects. As we saw in the thought disorder data in the previous chapter, conditional and marginal relationships can be very different. Either may be of interest; for instance, patients are likely to want to know the subject-specific effect of treatments, whereas health economists may be interested in population averaged effects. Whereas random effects models allow the marginal relationship to be derived, GEE does not allow derivation of the conditional relationship. Note that conditional and marginal relationships are the same if an identity link is used and, in the case of random intercept models (no random coefficients), if a log link is specified (see Diggle et al., 2002). Third, GEE is often preferred because, in contrast to the random effects approach, the parameter estimates are consistent even if the correlation structure is misspecified (although this is only true if the mean structure is correctly specified). Fourth, while maximum likelihood estimation of a correctly specified model is consistent if data are missing at random (MAR), this is not the case for GEE, which requires that responses are missing completely at random (MCAR) or that missingness depends only on the covariates included in the model. See Hardin and Hilbe (2002) for a thorough introduction to GEE.

10.3 Analysis using Stata

The generalized estimating equations approach, as described in Liang and Zeger (1986), is implemented in Stata's xtgee command. The main components which have to be specified are:

- the assumed distribution of the response variable (given the covariates), specified in the family() option; this determines the variance function,
- the link between the response variable and its linear predictor, specified in the link() option, and
- the structure of the working correlation matrix, specified in the correlation() option.

In general, it is not necessary to specify the link() option since, as for the glm command, the default link is the canonical link for the specified family.

Since the xtgee command will often be used with the family(gauss) option, together with the identity link function, we will illustrate this option on the post-natal depression data used in the previous two chapters before moving on to deal with the epilepsy data in Table 10.1.

10.3.1 Post-natal depression data

The data are obtained using

    infile subj group dep0 dep1 dep2 dep3 dep4 dep5 dep6 ///
        using depress.dat, clear
    reshape long dep, i(subj) j(visit)
    mvdecode _all, mv(-9)

To begin, we fit a model that regresses dep on group, visit, their interaction, and visit squared as in the previous chapter, but under the unrealistic assumption of independence. The necessary commands, with xtgee written out in its fullest form, are

    generate gr_vis = group*visit
    generate vis2 = visit^2
    xtgee dep group visit gr_vis vis2, i(subj) t(visit) ///
        corr(indep) link(iden) family(gauss)

(see Display 10.1).
Here, the fitted model is simply a multiple regression model for 356 observations which are assumed to be independent of one another; the estimated scale parameter is just the residual mean square, and the deviance is equal to the residual sum of squares. The estimated regression coefficients and their associated standard errors indicate that the group by visit interaction is significant at the 5% level. However, treating the observations as independent is unrealistic and will almost certainly lead to poor estimates of the standard errors. Standard errors for between-subject factors (here group) are likely to be underestimated because we are treating observations from the same subject as independent, thus increasing the apparent sample size; standard errors for within-subject factors (here visit, gr_vis, and vis2) are likely to be overestimated since we are not controlling for residual between-subject variability.

[Display 10.1: xtgee output for the independence model. Group variable subj; identity link; Gaussian family; independent correlation; 356 observations in 61 groups (2 to 7 observations per group, average 5.8); scale parameter 26.89935; Pearson chi2(356) = 9576.17; Wald chi2(4) = 269.51, Prob > chi2 = 0.0000.]

We therefore now abandon the assumption of independence and estimate a correlation matrix having compound symmetry (i.e., constraining the correlations between the observations at any pair of time points to be equal). Such a correlation structure is specified using corr(exchangeable), or the abbreviated form corr(exc).
The model can be fitted as follows:

    xtgee dep group visit gr_vis vis2, i(subj) t(visit) ///
        corr(exc) link(iden) fam(gauss)

Instead of specifying the subject and time identifiers using the options i() and t(), we can also declare the data as being of the form xt (for cross-sectional time series) as follows:

    iis subj
    tis visit

and omit the i() and t() options from now on. (In current versions of Stata, the same declaration can be made with xtset subj visit.) Since both the link and the family correspond to the default options, the same analysis may be carried out using the shorter command

    xtgee dep group visit gr_vis vis2, corr(exc)

(see Display 10.2). After estimation, estat wcorrelation reports the estimated working correlation matrix

    estat wcorrelation, format(%6.4g)

which is shown in Display 10.3. Here the format() option was used to reduce the number of decimal places and therefore avoid rows of the matrix wrapping over two lines.

Note that the standard error for group has increased whereas those for visit, gr_vis, and vis2 have decreased, as expected. The estimated within-subject correlation matrix is compound symmetric. This structure is frequently not acceptable since correlations between pairs of observations widely separated in time will often be lower than for observations closer together. This pattern was apparent from the scatterplot matrix given in Chapter 8.

To allow for such a pattern of correlations among the repeated observations, we can move to an autoregressive structure. For example, in a first-order autoregressive specification the correlation between time points $r$ and $s$ is assumed to be $\rho^{|r-s|}$. The necessary instruction for fitting the previously considered model, but with this first-order autoregressive structure for the correlations, is

    xtgee dep group visit gr_vis vis2, corr(ar1)

[Display 10.4: xtgee output for the AR(1) model. Group and time variables subj and visit; identity link; Gaussian family; AR(1) correlation; 356 observations in 61 groups (2 to 7 observations per group, average 5.8); Wald chi2(4) = 213.85, Prob > chi2 = 0.0000.]

The estimates of the regression coefficients and their standard errors in Display 10.4 have changed, but not substantially. The estimated within-subject correlation matrix may again be obtained using

    estat wcorrelation, format(%6.4g)

(see Display 10.5), which has the expected pattern in which correlations decrease substantially as the separation between the observations increases.

[Display 10.5: estimated within-subject correlation matrix R for the AR(1) model]

Other correlation structures are available for xtgee, including the option correlation(unstructured), in which no constraints are placed on the correlations. (This is essentially equivalent to multivariate analysis of variance for longitudinal data, except that the variance is assumed to be constant over time.) It might appear that using this option would be the most sensible one to choose for all data sets. This is not, however, the case since it necessitates the estimation of many nuisance parameters. This can, in some circumstances, cause problems in the estimation of those parameters of most interest, particularly when the sample size is small and the number of time points is large.

We now analyze the epilepsy data using a similar model as for the depression data, but using the Poisson distribution and log link. The data are available in a Stata file epil.dta and can be read using

    use epil, clear

We will treat the baseline measure as one of the responses:

    generate y0 = baseline

Some useful summary statistics can be obtained using

    summarize y0 y1 y2 y3 y4 if treat==0
    summarize y0 y1 y2 y3 y4 if treat==1

(see Displays 10.6 and 10.7). We see that the number of observations is constant over time; there appears to be no dropout.
The means and standard deviations of y0 are larger than for the other responses because seizures were counted over an 8-week period at baseline and over 2-week periods at the subsequent visits. The largest value of y1 in the progabide group seems out of step with the other maximum values and may indicate an outlier. Some graphics of the data may be useful for investigating this possibility further, but first it is convenient to reshape the data from its present "wide" form to the "long" form.

[Displays 10.6 and 10.7: summary statistics for y0 to y4 in the placebo and progabide groups]

We now reshape the data as follows:

    reshape long y, i(subj) j(visit)
    sort subj treat visit
    list in 1/12, clean

[Display 10.8: first 12 observations of the reshaped data (variables subj, visit, id, y, treat, baseline, age)]

Perhaps the most useful graphical display for investigating the data is a set of graphs of individual response profiles. Since we are planning to fit a Poisson model with the log link to the data, we take the log transformation before plotting the response profiles. (We need to add a positive number, say 1, because some seizure counts are zero.)

    generate ly = log(y+1)

However, the baseline measure represents seizure counts over an 8-week period, compared with 2-week periods for each of the other time points. We therefore divide the baseline count by 4:

    replace ly = log(y/4+1) if visit==0

and then plot the log-counts:

    twoway connect ly visit if treat==0, by(subj, ///
        style(compact)) ytitle("Log count")
    twoway connect ly visit if treat==1, by(subj, ///
        style(compact)) ytitle("Log count")

The resulting graphs are shown in Figures 10.1 and 10.2. There is no obvious improvement in the progabide group.
Subject 49 had more epileptic fits overall than any other subject and might perhaps be considered an outlier (see Exercise 10.2).

[Figure 10.2: Response profiles in the treated group.]

As discussed in Chapter 7, the most plausible distribution for count data is often the Poisson distribution. The Poisson distribution is specified in xtgee models using the option family(poisson). The log link is implied (since it is the canonical link). The baseline counts were obtained over an 8-week period whereas all subsequent counts are over 2-week periods. To model the seizure rate in counts per week, we must therefore use the log observation period $\log(p_{ij})$ as an offset (a covariate with regression coefficient set to 1). The model for the mean count $\mu_{ij}$ becomes

$$ \log(\mu_{ij}) = \beta' x_{ij} + \log(p_{ij}), $$

so that the rate is modeled as

$$ \mu_{ij} / p_{ij} = \exp(\beta' x_{ij}). $$

We can compute the required offset using

    generate lnobs = cond(visit==0,ln(8),ln(2))

Following Diggle et al. (2002), we will allow the log rate to change by a treatment group-specific constant after the baseline assessment. The necessary covariates, an indicator for the post-baseline visits and an interaction between that indicator and treatment group, are created using

    generate post = visit>0
    generate tr_post = treat*post

We will also control for the age of the patients. The summary tables for the seizure data in Displays 10.6 and 10.7 provide strong empirical evidence that there is overdispersion (the variances are greater than the means), and this can be incorporated using the scale(x2) option to allow the dispersion parameter $\phi$ to be estimated (see also Chapter 7).

    iis subj
    xtgee y age treat post tr_post, corr(exc) family(pois) ///
        offset(lnobs) scale(x2)

[Display 10.9: xtgee output for the overdispersed Poisson model. Group variable subj; log link; Poisson family; exchangeable correlation; 295 observations in 59 groups (5 observations per group); scale parameter 18.48008; Wald chi2(4) = 5.43, Prob > chi2 = 0.2458; lnobs entered as offset; standard errors scaled using square root of Pearson X2-based dispersion.]

The output assuming an exchangeable correlation structure is given in Display 10.9, and the estimated correlation matrix is obtained using estat wcorrelation (see Display 10.10).

[Display 10.10: estimated within-subject correlation matrix R]

In Display 10.9, the parameter $\phi$ is estimated as 18.5, indicating severe overdispersion in these data. We briefly illustrate how important it was to allow for overdispersion by omitting the scale(x2) option:

    xtgee y age treat post tr_post, corr(exc) family(pois) ///
        offset(lnobs)

[Display 10.11: xtgee output without scale(x2). Log link; Poisson family; exchangeable correlation; 295 observations in 59 groups; scale parameter 1; Wald chi2(4) = 100.38, Prob > chi2 = 0.0000.]

The results given in Display 10.11 show that the standard errors are now much smaller than before. Even if overdispersion had not been suspected, this error could have been detected by using the vce(robust) option (see Chapter 7):

    xtgee y age treat post tr_post, corr(exc) family(pois) ///
        offset(lnobs) vce(robust)

The results of the robust regression in Display 10.12 are remarkably similar to those of the overdispersed Poisson model, suggesting that the latter is a reasonable "model" for the data.

[Display 10.12: xtgee output with semi-robust standard errors adjusted for clustering on subj. Log link; Poisson family; exchangeable correlation; 295 observations in 59 groups; Prob > chi2 = 0.1442.]

The estimated coefficient of tr_post represents the estimated difference in the change in log seizure rate from baseline to post-randomization between the placebo and progabide groups. In the placebo group there is an increase in the log seizure rate of 0.1108, and in the progabide group there is an increase of only 0.007 (= 0.1108 - 0.1037). However, the difference is not significant (p = 0.68). The exponential of the interaction coefficient gives an estimated incidence rate ratio, here the ratio of the relative increase in seizure rate for the treated patients compared with the control patients. The exponentiated coefficient and the corresponding confidence interval can be obtained directly using the eform option in xtgee:

    xtgee y age treat post tr_post, corr(exc) ///
        family(pois) offset(lnobs) scale(x2) eform

[Display 10.13: xtgee output with exponentiated coefficients (incidence rate ratios). Log link; Poisson family; exchangeable correlation; 295 observations in 59 groups; scale parameter 18.48008; Wald chi2(4) = 5.43, Prob > chi2 = 0.2458.]

The results in Display 10.13 indicate that the relative increase in seizure rate is 10% lower in the treated group compared with the control group, with a 95% confidence interval from 41% lower to 37% higher.

However, before interpreting these estimates, we should perform some diagnostics. Standardized Pearson residuals can be useful for identifying potential outliers (see equation (7.9)). These can be found by first using the predict command to obtain predicted counts, subtracting these from the observed counts, and dividing by the estimated standard deviation $\sqrt{\hat\phi\,\hat\mu_{ij}}$, where $\hat\phi$ is the estimated dispersion parameter:

    quietly xtgee y treat baseline age visit, corr(exc) ///
        family(pois) scale(x2)
    predict pred, mu
    generate stpres = (y-pred)/sqrt(e(chi2_dis)*pred)

Boxplots of these residuals at each visit are obtained using

    sort visit
    graph box stpres, medtype(line) over(visit, ///
        relabel(1 "visit 1" 2 "visit 2" 3 "visit 3" ///
        4 "visit 4"))

The resulting graph is shown in Figure 10.3. Pearson residuals greater than 4 are certainly a cause for concern, so we can check which subjects they belong to using

    list subj id if stpres>4

Subject 49 appears to be an outlier due to extremely large counts, as we saw in Figure 10.2. Subject 25 also has an unusually large count at visit 3. It would be a good idea to repeat the analysis without subject 49 to see how much the results are affected by this unusual subject (see Exercise 10.2). This can be viewed as a sensitivity analysis.

[Figure 10.3: Standardized Pearson residuals.]

10.4 Exercises

10.1 Treatment of post-natal depression
1. For the depression data, compare the results of GEE with a compound symmetric structure with ordinary linear regression where standard errors are corrected for the within-subject correlation using:
   a. the options vce(robust) cluster(subj), to obtain the sandwich estimator for clustered data (see help for regress), and
   b. bootstrapping, by sampling subjects with replacement. This may be achieved using the bootstrap prefix, together with the option cluster(subj).

10.2 Epileptic seizures and chemotherapy
1. Explore other possible correlation structures for the seizure data in the context of a Poisson model. Examine the robust standard errors in each case.
2. Repeat the above analyses, but excluding subject 49 (as this appears to be an outlier).
   Compare the results.

10.3 Thought disorder and schizophrenia
1. For the thought disorder data discussed in the previous chapter, estimate the effect of early, month, and their interaction on the logit of thought disorder using GEE with an exchangeable correlation structure. Use robust standard errors.
2. Interpret the estimates.
3. Plot the predicted probability over time for early onset women (using graph twoway function, see Section R.3.2), and compare the curve with the curves in Figure 9.10.

10.4 Driver education
In a randomized experiment to investigate if driver education reduces the number of collisions and traffic violations of teenagers (Stock et al., 1983), eligible high school students were randomized to three groups: safe performance curriculum (SPC), pre-driver license curriculum (PDL), and control. Whereas the SPC was a 70-hour state-of-the-art program, the PDL was a 30-hour course containing only the minimum training required to pass the driving test. The control group received no training through the school system and was taught by the parents and/or private training schools only. During three years of follow-up, the occurrence of collisions and moving violations was obtained using records from the state Department of Motor Vehicles. (The data are from Davis, 2002.)

The variables in drivers.dta are:
- program: group (string variable with values SPC, PDL, and Control)
- gender: gender (string variable with values Male and Female)
- col1 to col3: indicator for at least one collision or moving violation during years 1 to 3
- num: number of times the response-covariate pattern occurred

1. Prepare the data for analysis using GEE. (Hint: make sure to expand the data first using expand num, then reshape to long.)
2. Investigate the effect of time, program, gender, and the program by gender interaction on the odds of at least one collision or moving violation using generalized estimating equations with a logit link and unstructured correlations. Use robust standard errors throughout this exercise.
3. Perform a Wald test for the interaction terms and remove them if the test is not significant at the 5% level.
4. By inspecting the estimated correlation matrix, choose the correlation structure that appears to be most appropriate and estimate the model with that correlation structure.
5. Interpret the odds ratio estimates for the final model.