Data-Driven Percentile Modified Two-sample Wilcoxon Test

O. Thas^a* and J.C.W. Rayner^b

a Dept. of Applied Mathematics, Biometrics and Process Control, Ghent University, Belgium
b School of Mathematical and Physical Sciences, University of Newcastle, Callaghan, NSW 2308, Australia

* Corresponding author: Olivier Thas, Department of Applied Mathematics, Biometrics and Process Control, Ghent University, Belgium (e-mail: olivier.thas@UGent.be)

Summary

The two-sample Wilcoxon rank sum statistic can be derived as the first component of the Pearson chi-squared statistic in a particularly constructed contingency table. For this test a "percentile modification" has been proposed, which is equivalent to splitting the contingency table into two independent subtables and computing the Wilcoxon statistic on one of the subtables. Although this procedure does not use all data in the sample, it often results in a power increase. The splitting position is determined by an arbitrarily chosen trimming proportion p. We first illustrate that an inappropriate choice of p may result in a substantial power loss. To circumvent this problem, we propose a new test statistic based on a data-dependent choice for p, say p̂. We show that its asymptotic null distribution is the supremum of a time-transformed Brownian motion. Upon rejection of the null hypothesis, informative conclusions may be obtained by conditioning on p̂. A simulation study shows that our solution results in a power advantage for some particular alternatives. Also, instead of using only one subtable, we suggest computing the Wilcoxon statistic on both subtables and taking their sum as a new test statistic; we view this as a recomposition of statistics rather than a decomposition.

keywords: contingency table, Pearson chi-squared, Brownian Motion

1 INTRODUCTION

The Wilcoxon rank sum test (Wilcoxon, 1945) is probably the most common nonparametric test for the two-sample location problem

$$H_0 : F_1(x) = F_2(x) \text{ for all } x \qquad (1)$$

versus

$$H_1 : F_1(x) = F_2(x - \Delta) \text{ for all } x, \qquad (2)$$

where ∆ ≠ 0 measures the location shift. This formulation of the two-sample location problem is generally referred to as the location-shift model. Under the restriction that F1 and F2 have the same "shape", the location-shift model implies that the hypotheses are equivalent to H0: ∆ = 0 versus H1: ∆ ≠ 0, where ∆ is now the difference in means (medians). Although most textbooks present the location-shift model to introduce the Wilcoxon rank sum test, the test actually tests the null hypothesis H0: P[X < Y] = 1/2, where X and Y have distribution functions F1 and F2, respectively, without imposing any further restrictions on their shape. Note that F1 = F2 implies P[X < Y] = 1/2, but the reverse is not necessarily true. Some authors have argued (see Hollander & Wolfe, 1999, Chapter 4, for an overview) that a conclusion such as P[X < Y] > 1/2 is more informative than a conclusion about differences in means. For example, suppose X and Y are responses under placebo and active treatment, respectively. Then, rejecting H0 in favour of P[X < Y] > 0.5 implies that giving the active treatment to a patient is more likely to result in a higher response than giving the patient placebo. Some people prefer conclusions in terms of stochastic orderings of F1 and F2, i.e. they want to test F1 = F2 against F1(x) < F2(x) (F1(x) > F2(x)) for all x. This is referred to as "X is stochastically larger (smaller) than Y".
Evidently, this further implies P[X < Y] < 1/2 (P[X < Y] > 1/2), but the converse does not necessarily hold. As mentioned before, the most common nonparametric test for H0 is the Wilcoxon rank sum test (Wilcoxon, 1945). Rayner and Best (2001) have shown how this statistic is obtained as the first component of Pearson's χ² statistic computed on a particularly constructed contingency table. This is summarized in Section 2.

Gastwirth (1965) proposed a percentile modified two-sample Wilcoxon test. Basically, his modification consists of pooling the data of the two samples and performing the Wilcoxon test on only a fraction of the data, where the fraction is determined by percentages r and p of the most extreme small and large data points in the pooled sample, respectively. In this paper we set r = 0, i.e. the fraction refers only to a portion of the extreme large observations. The trimming proportion p must be determined a priori. Although part of the data is discarded in this way, Gastwirth has shown that on many occasions a power gain is obtained. However, if p is chosen badly, there is a risk of losing power. In Section 3 we show that this modification is essentially the same as splitting the contingency table and using only one part for further calculations. We also illustrate the meaning of p and argue that an appropriate value for p may be estimated from the data, say p̂, so that the percentile modified Wilcoxon statistic based on p̂ is maximized. This p̂ also has an interesting interpretation, and the maximized Wilcoxon statistic may be used as a test statistic. In Section 4 we give the asymptotic null distribution. The results of a small simulation study are presented in Section 5.

2 The Contingency Table Approach

Let X1, ..., Xn1 and Y1, ..., Yn2 denote the sample observations from F1 and F2, respectively, and let n = n1 + n2 denote the total sample size. The pooled sample observations are denoted by Zi, and the corresponding order statistics by Z(i) (i = 1, ..., n). A binary variable S is defined as an indicator for the original sample, i.e. Si = 1 if Zi is an observation from the first sample, and Si = 2 if Zi is originally from the second sample. Similarly, S(i) refers to Z(i). The two-sample Wilcoxon rank sum statistic is then given by

$$W_n = \sum_{j=1}^{n} j\, I(Z_{(j)}), \qquad (3)$$

where

$$I(Z_{(j)}) = \begin{cases} 1 & \text{if } S_{(j)} = 1 \\ 0 & \text{if } S_{(j)} = 2. \end{cases} \qquad (4)$$
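As a small illustration (ours, not part of the original exposition), the statistic (3) can be computed directly from the indicator representation above. The following Python sketch uses our own function names and assumes continuous data without ties.

```python
import numpy as np

def wilcoxon_rank_sum(x, y):
    """W_n of equation (3): the sum over j of j * I(Z_(j)), i.e. the sum of
    the pooled-sample ranks belonging to the first sample."""
    z = np.concatenate([x, y])                       # pooled sample Z_i
    s = np.concatenate([np.ones(len(x), dtype=int),  # sample labels S_i
                        2 * np.ones(len(y), dtype=int)])
    s_ordered = s[np.argsort(z)]                     # labels S_(j) of the order statistics
    ranks = np.arange(1, len(z) + 1)                 # j = 1, ..., n
    return int(np.sum(ranks[s_ordered == 1]))        # I(Z_(j)) = 1 iff S_(j) = 1, cf. (4)

rng = np.random.default_rng(1)
print(wilcoxon_rank_sum(rng.normal(size=8), rng.normal(loc=1.0, size=10)))
```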
A 2 × n contingency table, say {Nij}, is constructed as follows (this exposition follows Rayner and Best (2001)). Let N1j = I(Z(j)) and N2j = 1 − I(Z(j)) (j = 1, ..., n). Note that by construction the n column totals N.j = N1j + N2j (j = 1, ..., n) are fixed and all equal to one. The two row totals $N_{i.} = \sum_{j=1}^{n} N_{ij} = n_i$ are the numbers of observations in the first (i = 1) and the second (i = 2) sample. Under the two-sample null hypothesis, the two column distributions {N1j}j and {N2j}j are equal to the marginal distribution, which is the uniform. Therefore, Pearson's X² test seems an appropriate choice of test statistic. In this contingency table Pearson's X² statistic for independence (or homogeneity of the column distributions) is given by

$$X_n^2 = \sum_{i=1}^{2}\sum_{j=1}^{n} \frac{(N_{ij} - N_{i.}/n)^2}{N_{i.}/n}.$$

From Rayner and Best (2001), Pearson's X² statistic satisfies

$$X_n^2 = \sum_{j=1}^{n-1} U_j^2 = \sum_{j=1}^{n-1} V_j^t V_j,$$

where $V_j^t = (V_{1j}, V_{2j})$ and

$$V_{ij} = N_{i.}^{-1/2} \sum_{k=1}^{n} N_{ik}\, g_j(k).$$

Here, the {gj} are orthonormal polynomials on the marginal column distribution, which is, by construction, the discrete uniform distribution {n⁻¹, n⁻¹, ..., n⁻¹}. We take g0(k) = 1 for all k = 1, ..., n, so that for s = 1, ..., n − 1, gs(·) is a polynomial of degree s. In particular, the degree one polynomial is

$$g_1(k) = \sqrt{\frac{12}{n^2-1}}\left(k - \frac{n+1}{2}\right).$$

As the first component we find

$$U_1^2 = V_1^t V_1 = \frac{\left(W_n - \frac{N_{1.}(n+1)}{2}\right)^2}{\frac{N_{1.} N_{2.}(n+1)}{12}}, \qquad (5)$$

which is exactly the squared standardized Wilcoxon rank sum statistic; it has asymptotically a χ²₁ distribution under H0. Moreover, the asymptotic null distribution of U1² also follows directly from the general theory given in Rayner and Best (2001). The second component of Xn² is

$$U_2^2 = V_2^t V_2 = \frac{n\left(\sum_{k=1}^{n} N_{1k}\left(k - \frac{n+1}{2}\right)^2 - \frac{n^2-1}{12} N_{1.}\right)^2}{\frac{1}{180} N_{1.} N_{2.} (n^2-1)(n^2-4)}, \qquad (6)$$

which is exactly the square of the standardized Mood statistic for testing equality of dispersion. The third component U3² = V3^t V3 of Xn² is given by

$$U_3^2 = V_3^t V_3 = \frac{n\left(\sum_{k=1}^{n} N_{1k}\left(k - \frac{n+1}{2}\right)^3 - \frac{3n^2-7}{20}\sum_{k=1}^{n} N_{1k}\left(k - \frac{n+1}{2}\right)\right)^2}{\frac{(n^2-1)(n^2-4)(n^2-9) N_{1.} N_{2.}}{2800}}. \qquad (7)$$

Each of these components has an asymptotic χ²₁ null distribution. The second and third components may thus be interpreted as two-sample statistics that are sensitive to differences in dispersion and skewness, respectively; see Eubank, LaRiccia, and Rosenstein (1987) for closely related results on test statistics derived as components of Pearson's phi-squared distance measure.

3 The Percentile Modification

3.1 Percentile Modified Statistics

Gastwirth (1965) proposed a percentile modification to increase the power of two-sample tests. He argues that differences between distributions or samples are often masked when all observations are considered; in particular, differences sometimes become more apparent in the tails of the distribution. Therefore, his suggestion is to remove a fraction of the sample observations and to apply the Wilcoxon test to the remaining observations. The fraction of observations to be discarded is denoted by p, and np denotes the largest integer smaller than or equal to pn. Given the original order statistics Z(i), we define the lower observations as Z(1), ..., Z(np), and the upper observations as Z(np+1), ..., Z(n). The construction of the contingency table {Nij} (i = 1, 2; j = 1, ..., n) immediately implies that the lower observations completely determine the left part of the table, {Nij} (i = 1, 2; j = 1, ..., np), and, similarly, the upper observations determine the right part of the table, {Nij} (i = 1, 2; j = np + 1, ..., n). The corresponding row totals are denoted by N^low_{i.} = N^low_{i.}(p) and N^up_{i.} = N^up_{i.}(p) (i = 1, 2), respectively. For notational comfort, the dependence on p will be omitted unless it is needed.

We continue with the contingency table corresponding to the upper observations and give these upper observations ranks 1, ..., n − np. In this table the theory of Section 2 applies. Hence, the first component of Pearson's X² statistic is the square of the standardized conditional Wilcoxon rank sum statistic, based on the upper subsample; "conditional" refers to the conditioning on the row totals of the selected subtable. In particular, the conditional Wilcoxon statistic is given by

$$W^{up} = W^{up}(p) = \sum_{j=n_p+1}^{n} (j - n_p)\, I(Z_{(j)}), \qquad (8)$$

and the first component is the square of

$$T^{up} = T^{up}(p) = \frac{W^{up} - \frac{1}{2} N^{up}_{1.}(n_q+1)}{\sqrt{\frac{1}{12} N^{up}_{1.} N^{up}_{2.} (n_q+1)}}, \qquad (9)$$

where nq = n − np is the number of upper observations in the sample. Note that W^up(0) is exactly the Wilcoxon statistic. To apply the general distribution theory of Pearson components (Rayner & Best, 2001), it must be ensured that nq → ∞ as n → ∞. Therefore, we will assume that p < 1, which is an obvious restriction from a data-analysis point of view. Under this condition it follows immediately that T^up has asymptotically a standard normal null distribution.
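The statistic (9) is easily computed by re-ranking the upper observations. The sketch below is our own illustration: the names are hypothetical, the data are assumed continuous (no ties), and both samples are assumed to be represented among the upper observations so that the variance in the denominator is positive.

```python
import numpy as np

def t_up(x, y, p):
    """Percentile modified statistic T^up(p) of equation (9)."""
    z = np.concatenate([x, y])
    s = np.concatenate([np.ones(len(x), dtype=int),
                        2 * np.ones(len(y), dtype=int)])
    n = len(z)
    n_p = int(np.floor(p * n))            # number of discarded lower observations
    s_upper = s[np.argsort(z)][n_p:]      # labels of the upper observations
    n_q = n - n_p                         # number of upper observations
    ranks = np.arange(1, n_q + 1)         # upper observations re-ranked 1, ..., n_q
    w_up = np.sum(ranks[s_upper == 1])    # conditional Wilcoxon statistic (8)
    n1_up = np.sum(s_upper == 1)          # row totals of the upper subtable
    n2_up = n_q - n1_up
    mean = 0.5 * n1_up * (n_q + 1)
    var = n1_up * n2_up * (n_q + 1) / 12.0
    return (w_up - mean) / np.sqrt(var)

# t_up(x, y, 0.0) reproduces the ordinary standardized Wilcoxon statistic.
```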
It may happen that one believes that the difference between F1 and F2 occurs in the lower part. Then the lower contingency table may be used to construct a statistic similar to T^up. Since the construction is essentially the same, we give no further details. Let T^low = T^low(p) denote this statistic. Note that T^low also has an asymptotic standard normal null distribution, provided that p > 0.

Since T^low and T^up are calculated on mutually distinct subsamples, it may be informative to combine both into one test statistic,

$$T^{tot}(p) = T^{up}(p)^2 + T^{low}(p)^2. \qquad (10)$$

Equation (10) looks like a decomposition of the T^tot(p) statistic, though we prefer to view it as a recomposition of the statistics T^up(p) and T^low(p).

3.2 Profiles

The distribution theory of the percentile modified Wilcoxon test only holds when p is fixed, i.e. when p is chosen prior to observing the data. Of course, it may often happen that p is chosen inappropriately, resulting in a low power test. In this section we propose to compute a profile of the percentile modified Wilcoxon test. A profile is simply T^up(p) considered as a function of the trimming proportion p. Since each p corresponds to exactly one threshold value Z(np), the profile can also be defined as T^up(p(z)), with p(z) the smallest p for which the threshold Z(np) is at least z. In a similar way T^low(p(z)) and T^tot(p(z)) are the profiles of the T^low and T^tot statistics. We argue that profile plots are informative exploratory tools, but we also use them here to illustrate that some choices of the trimming proportion p may result in poor tests.

As an example, we consider two samples of 50 observations each. Figure 1 shows the EDFs of the two samples and the three profile plots for T^up, T^low and T^tot. The EDFs clearly show that the shapes of the two distributions are quite different. The profile plots show for each z (or corresponding p) the value of the percentile modified test statistic. For each fixed p, the level α = 0.05 critical value of T^up(p) and T^low(p) is 1.96 (two-sided; asymptotic approximation). From the profile plot of T^up it is seen that significance would only be obtained for 1 < z < 4, and, similarly, for T^low significance only occurs for 2.5 < z < 4. For each p, T^tot(p) is asymptotically χ²₂ distributed; thus significance would result from T^tot(p) > 5.99, which happens for 2 < z < 4. These plots clearly illustrate that the final results of the percentile modified tests depend heavily on the choice of p. The profile plots also suggest another approach: define a test statistic as the maximum absolute value of the profile over p ∈ (0, 1). This is further discussed in the next section.
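In practice a profile can be computed by sweeping the trimming position over all observed thresholds. The sketch below is our own (it reuses the logic of the t_up sketch above); it returns the grid of trimming proportions together with the profile values, from which p̂ is the maximizer of |T^up(p)|.

```python
import numpy as np

def profile_t_up(x, y):
    """Profile of T^up: the statistic evaluated at every trimming position
    n_p = 0, 1, ..., n-2 (at least two upper observations are kept)."""
    z = np.concatenate([x, y])
    s = np.concatenate([np.ones(len(x), dtype=int),
                        2 * np.ones(len(y), dtype=int)])
    n = len(z)
    s_sorted = s[np.argsort(z)]
    prof = np.full(n - 1, np.nan)
    for n_p in range(n - 1):
        lab = s_sorted[n_p:]
        n_q = n - n_p
        n1 = np.sum(lab == 1)
        n2 = n_q - n1
        if n1 == 0 or n2 == 0:            # degenerate subtable: leave as NaN
            continue
        w_up = np.sum(np.arange(1, n_q + 1)[lab == 1])
        prof[n_p] = (w_up - 0.5 * n1 * (n_q + 1)) / np.sqrt(n1 * n2 * (n_q + 1) / 12.0)
    return np.arange(n - 1) / n, prof     # (grid of p values, profile values)

# p_grid, prof = profile_t_up(x, y)
# p_hat = p_grid[np.nanargmax(np.abs(prof))]   # data-dependent trimming proportion
```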
4 The Supremum Statistic

4.1 The Test Statistics and Asymptotic Null Distributions

Define

$$S_n^{up} = \sup_p |T_n^{up}(p)|, \qquad (11)$$

and let p̂ denote the value of p at which the supremum is attained. In a similar way S_n^{low} = sup_p |T_n^{low}(p)| and S_n^{tot} = sup_p T_n^{tot}(p) are defined. Of course, the asymptotic null distribution of these new test statistics will no longer be standard normal; it is given in the theorems below. In Figure 1 the p̂ corresponding to the maxima are indicated by vertical lines. It is interesting to note that the positions of these lines more or less coincide with the crossing of the two EDFs, and that they approximately divide the z-range into parts with conditional stochastic orderings.

[Figure 1: Example data set. Upper left panel: EDFs of the two samples; upper right panel: profile of T^up; lower left panel: profile of T^low; lower right panel: profile of T^tot.]

The asymptotic null distribution of S_n^{up} is given in the next theorem. The proof is given in Appendix A.

Theorem 1 Let 0 < θ < 1, Ψ(p) = (1 − p)³, and let S(t) denote a Brownian motion on [Ψ(θ), 1]. Then, as n → ∞,

(a) $T^{up} \xrightarrow{w} \dfrac{S \circ \Psi}{\Psi^{1/2}}$;

(b) $S_n^{up} = \sup_{0 \le p \le \theta} |T^{up}(p)| \xrightarrow{d} \sup_{\Psi(\theta) \le t \le 1} \dfrac{|S(t)|}{t^{1/2}}$.

Note that the restriction p ≤ θ < 1 expresses the condition that there should be observations in the upper sample. A very similar theorem follows immediately for T^low and S_n^{low}. Once the weak limits of T^up and T^low are known, the following theorem follows almost immediately (an outline of the proof is given in Appendix B).

Theorem 2 Let 0 < θ < 1/2, Ψ^{up}(p) = (1 − p)³, Ψ^{low}(p) = p³, and let S(t) denote a Brownian motion on [Ψ^{low}(θ), Ψ^{up}(θ)]. Then, as n → ∞,

(a) $T^{tot} \xrightarrow{w} \left(\dfrac{S \circ \Psi^{up}}{(\Psi^{up})^{1/2}}\right)^2 + \left(\dfrac{S \circ \Psi^{low}}{(\Psi^{low})^{1/2}}\right)^2$;

(b) $S_n^{tot} = \sup_{\theta \le p \le 1-\theta} T^{tot}(p) \xrightarrow{d} \sup_{\theta \le p \le 1-\theta} \left[\left(\dfrac{S(\Psi^{up}(p))}{\Psi^{up}(p)^{1/2}}\right)^2 + \left(\dfrac{S(\Psi^{low}(p))}{\Psi^{low}(p)^{1/2}}\right)^2\right]$.

Simulations of S_n^{up} under H0 for several sample sizes n1 and n2 have indicated that the convergence to the limiting null distribution is quite slow. Therefore, we propose to work with the Monte Carlo approximation to the exact null distribution. For sample sizes n1 = n2 = 10, 25 and 50, and levels α = 0.01, 0.05 and 0.10, the simulated and asymptotic critical values are given in Table 1. All estimates are based on 100,000 simulation runs.

Table 1: Simulated critical values for the S_n^{up} (S_n^{low}) and S_n^{tot} statistics

             S_n^{up} (S_n^{low})                S_n^{tot}
n1 = n2   α = 0.01  α = 0.05  α = 0.10   α = 0.01  α = 0.05  α = 0.10
10          2.8284    2.4056    2.2478     9.9709    8.0404    7.1333
25          3.1720    2.7045    2.5311    12.8290    9.8143    8.7879
50          3.3526    2.8999    2.6734    14.2166   11.1685    9.7863
∞           3.7488    3.2341    3.0111         —         —         —
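Since under H0 the sample labels of the pooled order statistics are exchangeable, such a Monte Carlo approximation can be obtained by permuting labels only; no data need to be generated. A rough sketch (our own; it scans all trimming positions, i.e. it effectively takes θ close to 1, and it ignores ties):

```python
import numpy as np

def s_up_stat(labels):
    """sup over trimming positions of |T^up(p)| for a given label vector."""
    n = len(labels)
    best = 0.0
    for n_p in range(n - 1):
        lab = labels[n_p:]
        n_q = n - n_p
        n1 = np.sum(lab == 1)
        n2 = n_q - n1
        if n1 == 0 or n2 == 0:            # degenerate subtable: skip
            continue
        w = np.sum(np.arange(1, n_q + 1)[lab == 1])
        t = abs(w - 0.5 * n1 * (n_q + 1)) / np.sqrt(n1 * n2 * (n_q + 1) / 12.0)
        best = max(best, t)
    return best

def mc_critical_value(n1, n2, alpha=0.05, runs=10_000, seed=0):
    """Monte Carlo critical value of S_n^up under H0 via label permutations."""
    rng = np.random.default_rng(seed)
    labels = np.concatenate([np.ones(n1, dtype=int), 2 * np.ones(n2, dtype=int)])
    stats = np.array([s_up_stat(rng.permutation(labels)) for _ in range(runs)])
    return np.quantile(stats, 1 - alpha)

# mc_critical_value(10, 10)  # should be comparable to the n1 = n2 = 10 row of Table 1
```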
In the example presented in Figure 1, both samples had 50 observations. The critical value of S_n^{low} and S_n^{up} at α = 0.05 is 2.9 (Table 1), and for S_n^{tot} we find 11.2. Using these critical values as thresholds in the profile plots of Figure 1, we find that all three sup-tests result in a rejection of the null hypothesis H0: P[X < Y] = 0.5.

4.2 Interpretation of Alternatives

As mentioned in the introduction, the Wilcoxon rank sum test tests the null hypothesis H0: P[X < Y] = 1/2. It is also interesting to note that

$$\hat{\pi} = \frac{1}{n_1 n_2}\left(W_n - \frac{n_1(n_1+1)}{2}\right)$$

is an unbiased estimator of P[X < Y]. This is a direct consequence of the relationship between the Wilcoxon and Mann-Whitney statistics. Thus, upon rejection of H0, π̂ may be used directly to formulate a conclusion.

In many practical situations P[X < Y] is close to 0.5, but a closer look at the data suggests that this is caused by computing P[X < Y] as an integral over the whole support of the distributions F1 and F2. For example, the two empirical distribution functions (EDF) in Figure 2 show samples from two normal distributions, both with mean 2, but the data from the first group have standard deviation 1, whereas the standard deviation in the second population is 0.2.

[Figure 2: EDFs of two samples from normal distributions with means equal to 2, and standard deviations equal to 1 (group 1) and 0.2 (group 2).]

The traditional two-sample Wilcoxon rank sum test gives a p-value of 0.46, so that it is concluded that P[X < Y] = 0.5. However, Figure 2 indicates that small values of x are more likely to come from group 1; when x < 2, the latter statement is true with probability close to one. Also, when x > 2, it is with very high probability that the response of group 2 is smaller than the response of group 1. This is exactly what the percentile modified Wilcoxon test of Gastwirth (1965) could have detected if p were chosen such that a fraction p of the pooled sample data is smaller than 2 (p approximately 0.5). With such a p, the null hypothesis of the percentile modified Wilcoxon test is H0: P[X < Y | X, Y > 2] = 0.5, and for the example data set this test rejects H0 with a p-value of virtually zero, resulting in the conclusion P[X < Y | X, Y > 2] ≪ 0.5. With the same trimming proportion p, a percentile modified Wilcoxon test could have been computed on the lower instead of the upper observations. Here too the null hypothesis is strongly rejected, resulting in the conclusion P[X < Y | X, Y < 2] ≫ 0.5.

5 Simulation Study

We limit the simulation study to some specific alternatives which may be of particular interest, for instance two distributions that show conditional stochastic orderings. As a benchmark comparison we also include some simulations under the location-shift model. Although many tests for the two-sample problem have been described in the literature, we only include the Wilcoxon rank sum test and the two-sample t-test; the latter is considered only because it is often (mis)used in practice. All tests are performed at the 5% level of significance and all powers are estimated from 10,000 simulation runs. Sample sizes of n1 = n2 = 20 and n1 = n2 = 50 are considered.
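Such power estimates follow from a generic simulation loop. A minimal sketch (ours; for brevity it only includes the Wilcoxon and t-tests, via scipy, and uses fewer runs than the 10,000 of the study):

```python
import numpy as np
from scipy.stats import mannwhitneyu, ttest_ind

def estimate_power(gen1, gen2, pvalue, n1=20, n2=20, alpha=0.05, runs=2_000, seed=0):
    """Fraction of simulated data sets for which the test rejects at level alpha."""
    rng = np.random.default_rng(seed)
    rejections = sum(pvalue(gen1(rng, n1), gen2(rng, n2)) < alpha for _ in range(runs))
    return rejections / runs

wilcoxon_p = lambda x, y: mannwhitneyu(x, y, alternative="two-sided").pvalue
ttest_p = lambda x, y: ttest_ind(x, y).pvalue

# location-shift alternative: N(0,1) versus N(1,1)
print(estimate_power(lambda rng, n: rng.normal(0.0, 1.0, n),
                     lambda rng, n: rng.normal(1.0, 1.0, n), wilcoxon_p))
```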
As a first alternative, two normal populations in a location-shift model are considered. The first is standard normal, and the second also has variance 1 and a mean varying from 0 to 1.5 in steps of 0.25. Figure 3 shows the EDFs and profile plots of a random sample of n1 = n2 = 20 observations.

[Figure 3: Example data set (two normal distributions with variance 1, and means 0 and 1, respectively). Upper left panel: EDFs of the two samples; upper right panel: profile of T^up; lower left panel: profile of T^low; lower right panel: profile of T^tot.]

As a second alternative, two normal distributions are again considered. The first is standard normal, and the second has mean 1 and a standard deviation varying from 1 to 3 in steps of 0.5. Figure 4 shows the EDFs and profile plots of a random sample of n1 = n2 = 20 observations.

[Figure 4: Example data set (two normal distributions with means 0 and 1, and standard deviations 1 and 3, respectively). Panels as in Figure 3.]

Finally, two exponential distributions are considered. The first has rate 1, and the second has a rate varying between 1 and 3 in steps of 0.5. Since the rate parameter influences both the mean and the shape in general, the samples from these distributions are corrected for the mean, i.e. the theoretical mean (the inverse of the rate) is subtracted from each sample, as sketched below. An example is presented in Figure 5.

[Figure 5: Example data set (two exponential distributions with rates 1 and 3, both corrected for the mean). Panels as in Figure 3.]
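The mean correction of the exponential samples is the only non-obvious data-generating step; a brief sketch (ours) is:

```python
import numpy as np

def mean_corrected_exponential(rng, n, rate):
    """Exponential sample with the theoretical mean 1/rate subtracted, so that
    the two populations differ in shape/scale but not in mean."""
    return rng.exponential(scale=1.0 / rate, size=n) - 1.0 / rate

rng = np.random.default_rng(0)
x = mean_corrected_exponential(rng, 50, rate=1.0)   # first population
y = mean_corrected_exponential(rng, 50, rate=3.0)   # second population
print(x.mean(), y.mean())                           # both close to zero by construction
```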
The results of the simulation study are presented in Figures 6, 7 and 8. For the location-shift alternative, Figure 6 shows that the best powers are obtained with the t-test, very closely followed by the Wilcoxon test. This is as expected, since the simulations were performed under the optimality conditions of the t-test. The powers of the three new tests are smaller, but no power breakdown is observed, and their powers are quite similar. Under the second alternative (Figure 7), the powers of the t-test and the Wilcoxon test decrease as the standard deviation of the second normal distribution increases. The opposite behaviour is observed for the S_n^{up} and S_n^{tot} tests. The power of the S_n^{low} test is generally low. Finally, Figure 8 shows very large powers for the S_n^{low} and S_n^{tot} tests, moderate powers for the S_n^{up} test, low powers for the Wilcoxon test, and virtually no power for the t-test.

[Figure 6: Estimated powers for the location-shift alternatives for sample sizes n = 20 (left panel) and n = 50 (right panel).]

[Figure 7: Estimated powers for the normal scale alternatives for sample sizes n = 20 (left panel) and n = 50 (right panel).]

[Figure 8: Estimated powers for the exponential alternatives for sample sizes n = 20 (left panel) and n = 50 (right panel).]

In general, for the alternatives considered, the S_n^{tot} test seems to be a reasonable compromise between the S_n^{low} and S_n^{up} tests.

APPENDIX A: PROOF OF THEOREM 1

First, the weak convergence of T^up(p) is shown. Part (b) of the theorem is a direct consequence of part (a) and the continuous mapping theorem. To prove the weak convergence of T^up(p), we first show the weak convergence of the statistic

$$S_n(p) = \sqrt{48}\,\sigma_n(p)\,T^{up}(p) = \frac{\sqrt{48 n}}{n^2}\left(W^{up}(p) - \frac{1}{2} N^{up}_{1.}(p)\,(n(1-p)+1)\right),$$

where $\sigma_n(p) = \sqrt{\frac{1}{12 n^3}\, N^{up}_{1.}(p)\, N^{up}_{2.}(p)\,(n(1-p)+1)}$. The proof consists of two parts: we need to show (1) finite dimensional convergence to a multivariate normal distribution, and (2) tightness of {S_n}. Once weak convergence of S_n is established, we use the Skorohod construction and a strong approximation result for the weighted version T^up of S_n. First, the notation is simplified: let N_n(p) = N^up_{1.}(p) and W_n(p) = W^up(p).

(1) To prove multivariate normality, it is sufficient to apply the multivariate central limit theorem. It has been shown before that for each p, E{S_n(p)} = 0. Furthermore, as n → ∞, it is easy to show that Var{S_n(p)} = 48σ_n²(p) → (1 − p)³ (we have used $\lim_{n\to\infty} N_n(p)/n = \frac{1}{2}(1-p)$, which is the convergence of a non-random sequence, since all inference is conditional on N_n(p)). We only need to calculate the covariance Cov{S_n(p), S_n(q)}. Straightforward, though lengthy, calculations give

$$\lim_{n\to\infty} \mathrm{Cov}\{S_n(p), S_n(q)\} = (1-p)^3 \wedge (1-q)^3, \qquad (12)$$

where ∧ denotes the minimum operator. Equation (12) is exactly the covariance function of a Brownian motion in transformed time (1 − p)³.
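As a numerical aside (ours, not part of the proof), the limiting covariance (12) can be checked by simulation under H0, exploiting the exchangeability of the labels. In this sketch equal sample sizes are assumed and n_q + 1 is used in place of n(1 − p) + 1; the function names are hypothetical.

```python
import numpy as np

def s_n(labels_sorted, p):
    """S_n(p) = sqrt(48 n)/n^2 * (W^up(p) - 0.5 * N1^up * (n_q + 1))."""
    n = len(labels_sorted)
    n_p = int(np.floor(p * n))
    lab = labels_sorted[n_p:]
    n_q = n - n_p
    n1 = np.sum(lab == 1)
    w = np.sum(np.arange(1, n_q + 1)[lab == 1])
    return np.sqrt(48.0 * n) / n**2 * (w - 0.5 * n1 * (n_q + 1))

def check_cov(p, q, n1=100, n2=100, runs=20_000, seed=0):
    """Monte Carlo estimate of Cov{S_n(p), S_n(q)} under H0."""
    rng = np.random.default_rng(seed)
    labels = np.concatenate([np.ones(n1, dtype=int), 2 * np.ones(n2, dtype=int)])
    a = np.empty(runs)
    b = np.empty(runs)
    for r in range(runs):
        perm = rng.permutation(labels)       # exchangeable labels under H0
        a[r] = s_n(perm, p)
        b[r] = s_n(perm, q)
    return np.cov(a, b)[0, 1]

# check_cov(0.2, 0.5) should be close to min((1 - 0.2)**3, (1 - 0.5)**3) = 0.125
```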
(2) Tightness is proven if we succeed in showing that for all n ≥ 1 and p1 ≤ p ≤ p2,

$$E\left\{(S_n(p_1) - S_n(p))^2 (S_n(p) - S_n(p_2))^2\right\} \le (\mu(p) - \mu(p_1))(\mu(p_2) - \mu(p)),$$

where μ is a continuous measure on [0, 1]. This is essentially Theorem 6 in Chapter 2 of Shorack and Wellner (1986), with a = 1 and b = 2. By the independence of S_n(p_1) − S_n(p) and S_n(p) − S_n(p_2), we have

$$E\left\{(S_n(p_1) - S_n(p))^2 (S_n(p) - S_n(p_2))^2\right\} = E\left\{(S_n(p_1) - S_n(p))^2\right\}\, E\left\{(S_n(p) - S_n(p_2))^2\right\}.$$

Further, we have

$$E\left\{(S_n(p_1) - S_n(p))^2\right\} = \mathrm{Var}\{S_n(p_1)\} - 2\,\mathrm{Cov}\{S_n(p_1), S_n(p)\} + \mathrm{Var}\{S_n(p)\} \le \frac{1}{4}(1-p_1)^3 - 2\cdot\frac{1}{4}(1-p)^3 + \frac{1}{4}(1-p)^3 \le (p-1)^3 - (p_1-1)^3 = \mu(p) - \mu(p_1),$$

where μ(p) = (p − 1)³ defines a continuous measure on [0, 1]. The first inequality in the above calculation follows from the observation that the variance of S_n(p) is a decreasing function of n. Similarly,

$$E\left\{(S_n(p) - S_n(p_2))^2\right\} \le (p_2-1)^3 - (p-1)^3 = \mu(p_2) - \mu(p).$$

Hence,

$$E\left\{(S_n(p_1) - S_n(p))^2 (S_n(p) - S_n(p_2))^2\right\} \le (\mu(p) - \mu(p_1))(\mu(p_2) - \mu(p)). \qquad (13)$$

This completes the proof of the weak convergence of S_n to a Brownian motion in transformed time (1 − p)³. For the Skorohod construction the following strong approximation holds:

$$\sup_{0 \le p \le \theta} \left|\frac{S_n(p) - S(\Psi(p))}{\Psi^{1/2}(p)}\right| \xrightarrow{a.s.} 0 \quad \text{as } n \to \infty, \qquad (14)$$

provided that $\int_0^\theta \frac{1}{\Psi(t)}\,|d\Psi(t)| = -\ln(\Psi(\theta)) < \infty$, which holds because θ < 1 is assumed. This strong approximation result may be reformulated as

$$\sup_{0 \le p \le \theta} \left|T^{up}(p) - \frac{S((1-p)^3)}{(1-p)^{3/2}}\right| \xrightarrow{a.s.} 0 \quad \text{as } n \to \infty, \qquad (15)$$

from which the weak convergence of T^up immediately follows.

APPENDIX B: PROOF OF THEOREM 2

From Theorem 1 we know that $T^{up} \xrightarrow{w} \frac{S \circ \Psi^{up}}{(\Psi^{up})^{1/2}}$ and, similarly, $T^{low} \xrightarrow{w} \frac{S \circ \Psi^{low}}{(\Psi^{low})^{1/2}}$, as n → ∞. Both part (a) and part (b) follow immediately, because T^tot and S_n^{tot} are continuous functions of T^low and T^up (continuous mapping theorem).

References

Eubank, R., LaRiccia, V., & Rosenstein, R. (1987). Test statistics derived as components of Pearson's phi-squared distance measure. Journal of the American Statistical Association, 82, 816-825.

Gastwirth, J. (1965). Percentile modifications of two-sample rank tests. Journal of the American Statistical Association, 60, 1127-1141.

Hollander, M., & Wolfe, D. (1999). Nonparametric Statistical Methods. New York: Wiley.

Rayner, J., & Best, D. (2001). A Contingency Table Approach to Nonparametric Testing. New York: Chapman and Hall.

Shorack, G., & Wellner, J. (1986). Empirical Processes with Applications to Statistics. New York: Wiley.

Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1, 80-83.