How to Measure Segregation

Angelo Mele∗
Department of Economics, University of Illinois, Urbana-Champaign
amele2@uiuc.edu
http://netfiles.uiuc.edu/amele2/www
July 30, 2007

Abstract
We introduce a new theoretical framework for the measurement of residential segregation with two goals: 1) a segregation index should not depend on arbitrary partitions of the city into neighborhoods, but only on agents' locations; 2) a segregation index should allow the measurement of segregation of a continuous variable (income) and of multiple attributes (race and income) together. We assume that individuals' locations follow a spatial Poisson point process over the metropolitan area and, conditional on location, we associate with each agent a random variable (mark) representing his socioeconomic characteristic(s). We construct a new spatial dissimilarity index and compare it to existing neighborhood-based indices of dissimilarity. We provide nonparametric methods for estimating the spatial dissimilarity when individual location data are available, and an alternative approximate method when only summary data at the block level are available. We apply our approach to the analysis of racial segregation using both artificial and census data, showing that the ranking of cities differs under our index and under the traditional dissimilarity index. This last result potentially challenges the findings of the literature on the effects of racial segregation on individual socioeconomic outcomes, since those studies measure segregation with the neighborhood-based approach.

Keywords: racial segregation, income segregation, point processes, Poisson processes, spatial statistics, nonparametric estimation

∗ Previous versions of this work circulated under the title: "A Unified Stochastic Framework for Measures of Socioeconomic Residential Segregation".
The first idea for this paper came out of a twenty-minute discussion with Roger Koenker: he suggested exploring the literature on point processes. That search turned out to be extremely helpful, at least as much as the subsequent twenty-minute conversations with him. I am also grateful to Patrick Bayer, Alberto Bisin, Rosa Ferrer, Antonio Galvao, Shadmehr Mehdi, Antonio Mele, Luca Opromolla, Franco Peracchi, Giorgio Topa, Jungmo Yoon and participants at the Washington University Graduate Students Conference 2006, the UIUC Econometrics Lunch Seminar, the ESPE 2007 Conference and the La Pietra-Mondragone Workshop 2007 for useful comments and suggestions. All remaining errors are mine.

1 Introduction

In this work we introduce a new theoretical framework for the measurement of spatial segregation with two main goals: first, a segregation index should not depend on arbitrary partitions of the city into neighborhoods, but only on the individuals' locations over the urban area; second, a segregation index should be able to measure the segregation of a continuous variable (income) or the residential separation on multiple attributes (race, income, education) together. We make two main contributions to the literature. First, we develop a flexible framework for the measurement of segregation, introducing a stochastic process that drives the locations of individuals of different racial groups and defining segregation in terms of individual locations over a metropolitan area. Second, we present estimation methods for point pattern data, together with nonparametric methods for the case in which only count data at the block level are available. We start by observing that we do not expect people to be located uniformly over the entire metropolitan area, for a number of reasons: geographic barriers, different building structures, laws, dedicated areas, etc. Otherwise we would observe some New Yorkers living in Central Park.
In fact, the intensity (the mean number of people per unit area) varies over the urban area, with some neighborhoods more densely populated than others. An index of segregation is a function of the individuals' locations summarizing the difference between the spatial pattern of each (racial) group and that of the population as a whole.1 The stochastic framework introduced in this work provides a way to define and measure the difference among spatial patterns that is both theoretically sound and easily implementable in empirical analysis. Spatial separation by race (or by other socioeconomic variables) over the urban area is a distinctive historical characteristic of US cities. In this work we address the question: how should we measure the extent of segregation in a metropolitan area? Suppose we want to measure the segregation of blacks from non-blacks: the traditional approach, which we will refer to as the neighborhood-based approach, partitions the city into n neighborhoods and then computes, for each neighborhood i, the share of blacks Bi/Pi, where Pi is the number of individuals and Bi the number of blacks in neighborhood i. If there is no segregation, the fraction of blacks in each neighborhood Bi/Pi equals the fraction of blacks in the whole city B/P. An index of segregation is then a synthetic measure of the difference between the actual distribution of races across neighborhoods, i.e. the distribution (B1/P1, ..., Bn/Pn), and the distribution arising when there is no segregation, (B/P, ..., B/P), normalized to lie between zero and one so that it is comparable across cities. Different notions of this difference yield different indices.2 However, all the indices built according to the neighborhood-based approach share some common problems.
First, all these indices are based on some partition of the metropolitan area into neighborhoods (as argued by Echenique and Fryer (2006)), usually census tracts or blocks, which makes the measurement directly dependent on the specific partition adopted. Second, if we compute the index of segregation using different levels of aggregation of the data (tracts, block groups or blocks) we get different numbers and, even worse, different rankings of the cities in terms of segregation, a problem known in spatial analysis as the Modifiable Areal Unit Problem (MAUP). Third, most of the indices do not take into account the spatial location of the individuals over the urban area, thus completely ignoring the inherently spatial nature of the phenomenon. Fourth, the indices are designed to measure segregation of a categorical variable (race, occupation): whenever we are interested in the segregation of a continuous variable (income, education, wealth) we have to split the continuum into a set of categories (income groups, education groups, etc.) in order to use the same indices. We think that an ideal segregation index should take into account the continuity of the variables under consideration or, better still, should be able to measure segregation on several dimensions at once (for example race and income together). The approach developed in this work starts from the same argument as Echenique and Fryer (2006) and Reardon and O'Sullivan (2004), taking the individuals and their spatial locations as the primitives of the segregation measure and hence avoiding the problem of arbitrary partitions.

1 The idea applies analogously to income segregation.
2 See Massey and Denton (1988) for an extensive review of the traditional indices. Reardon and Firebaugh (2002) explicitly provide a discussion of the neighborhood-based approach.
The main innovation is the introduction of a stochastic framework, with the assumption that the individuals' locations are the realization of a stochastic process generating points on the plane. We build on the theory of point processes, a branch of stochastic geometry often used in disciplines such as biology, epidemiology, astronomy, ecology and geology to model spatial data. A spatial point process is a stochastic process whose realizations are countable sets of points X in a space S ⊆ R^2. The fundamental parameter of the process is the Intensity Function λ(ξ), i.e. the expected number of points of the process per unit area in an infinitesimal region around the point ξ in S. The Intensity Measure Λ(A) is obtained by integrating λ(ξ) over A ⊆ S, and it corresponds to the expected number of points in A. An Inhomogeneous Poisson Point Process is a spatial point process defined as follows: 1) for any region A in S, the number N(A) of points of the process in A follows a Poisson distribution with parameter equal to the Intensity Measure Λ(A); 2) given N(A) = n, the points are independently and identically distributed over A according to the density f(ξ) = λ(ξ)/Λ(A). A Marked Poisson Process is a process X = {{ξ, m(ξ)} | ξ ∈ X0} in which a random mark m ∈ M is attached to each point of the Poisson process X0. In our application the mark represents the racial group of an individual living at location ξ. We develop a general framework to measure segregation, but we present an application to racial segregation only; income segregation is analyzed in another paper in progress, since its estimation methodology is more complicated. The model specifically assumes that:
1. The individuals' locations X0 are the realization of an Inhomogeneous Poisson Point Process over the space S with intensity function λ0(ξ).
2. Conditional on the realization of X0, the marks m are mutually independent.
3. For any racial group m, the conditional mark distribution ρ(ξ, m, X0 \ {ξ})
depends only on the location ξ (it does not depend on the locations of the other points X0 \ {ξ}).
The first two conditions amount to assuming that the locations of individuals belonging to the different racial groups follow an Inhomogeneous Marked Poisson Process, i.e. a process with different types and spatially varying intensities. The third condition ensures that the process is Poisson on the enlarged space S × M, and it can be shown (see the technical appendix) that this process is equivalent to a multivariate Inhomogeneous Poisson Process, i.e. a process composed of independent Poisson Processes, one for each racial group, where the intensity of racial group m is λ(ξ, m) = λ0(ξ) ρ(ξ, m) and ρ(ξ, m) is the conditional probability that an individual located at ξ belongs to racial group m. The marked point process X is completely unsegregated if and only if the conditional probability ρ(ξ, m) does not vary over space, i.e. ρ(ξ, m) = ρ(m) for all ξ and m. The marked point process X is completely segregated if and only if the mark distribution is degenerate at each point, i.e. for all ξ ∈ X0 there exists an m∗ such that ρ(ξ, m∗) = 1 and ρ(ξ, m) = 0 for any m ≠ m∗. We present an example of an index constructed according to this approach: we measure the level of segregation at point ξ as the absolute deviation from the completely unsegregated process, |ρ(ξ, m) − ρ(m)|. The total segregation in the metropolitan area is then defined as the sum of this quantity over all racial groups and points. The index of spatial dissimilarity is obtained by normalizing this sum by its value under complete segregation. By construction, such an index does not suffer from the problems mentioned above. We estimate the conditional mark distribution by the ratio of the estimated group-specific and overall intensity functions, ρ̂(ξ, m) = λ̂(ξ, m)/λ̂0(ξ).
The intensity estimates are obtained using a multiplicative quartic kernel, a standard method in the literature (see Diggle 2003). The kernel bandwidth is chosen by a mean squared error (MSE) minimization criterion, as suggested by Diggle (1985) and Berman and Diggle (1989). Since MSE minimization prescribes a different bandwidth for each city, comparisons across cities are problematic, because differences in the estimated index may also reflect differences in bandwidths; we therefore also present estimates that use the same bandwidth for all cities. Following the literature we use a finite grid method, but we also provide an alternative in which the kernel estimator is evaluated only at the observed locations, thus avoiding the approximations involved in using a finite grid. We then develop an approximate nonparametric method for the case in which only count data at the block level are available: the approximation is good as long as the intensity is smooth and the block area is small. We apply the methodology to artificial point pattern data and to census block-level count data. Using Census 2000 data we measure segregation levels in all the metropolitan areas of the US, and for conciseness we present results for 9 of them. The rankings implied by the spatial dissimilarity and by the traditional indices differ, suggesting that this methodology does not merely refine the traditional measurement but captures features of the segregation phenomenon that standard indices are unable to detect. The correlations between our index and the traditional ones range between .65 and .8, further evidence that the two kinds of measures are not interchangeable. The advantages of this approach are striking. First, the index does not depend on arbitrary partitions: by definition it is a function of the individuals' coordinates.
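The estimator and the index described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the estimation code used in this paper: it evaluates the kernel estimates only at the observed locations, uses a product (multiplicative) quartic kernel with a user-supplied bandwidth h, ignores edge corrections, and takes the normalizing value "under complete segregation" to be the one obtained when every point's own group has conditional probability one. The function names are hypothetical.

```python
import numpy as np

def quartic(u):
    """Univariate quartic (biweight) kernel, supported on |u| <= 1."""
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def conditional_marks(xy, marks, h, n_groups):
    """rho_hat(x_i, m) = lambda_hat(x_i, m) / lambda_hat_0(x_i) at each observed point."""
    dx = (xy[:, None, 0] - xy[None, :, 0]) / h
    dy = (xy[:, None, 1] - xy[None, :, 1]) / h
    w = quartic(dx) * quartic(dy)        # product kernel; the 1/h^2 factor cancels in the ratio
    onehot = np.eye(n_groups)[marks]     # (n, M) indicators of each point's group
    return (w @ onehot) / w.sum(axis=1, keepdims=True)

def spatial_dissimilarity(xy, marks, h, n_groups):
    """Sum of |rho_hat(x_i, m) - rho(m)| over points and groups, normalized (assumed rule)."""
    rho_hat = conditional_marks(xy, marks, h, n_groups)
    rho = np.bincount(marks, minlength=n_groups) / len(marks)   # city-wide shares rho(m)
    num = np.abs(rho_hat - rho).sum()
    # Assumed benchmark: complete segregation puts rho(x_i, m_i) = 1 at every point.
    den = np.abs(np.eye(n_groups)[marks] - rho).sum()
    return num / den

rng = np.random.default_rng(0)
# Hypothetical city 1: two spatially separated single-group clusters (index is 1 up to rounding).
xy_seg = np.vstack([rng.uniform(0, 1, (200, 2)), rng.uniform(10, 11, (200, 2))])
marks_seg = np.repeat([0, 1], 200)
print(spatial_dissimilarity(xy_seg, marks_seg, h=0.5, n_groups=2))
# Hypothetical city 2: labels drawn independently of location (index is much smaller).
xy_mix = rng.uniform(0, 1, (1000, 2))
marks_mix = rng.integers(0, 2, 1000)
print(spatial_dissimilarity(xy_mix, marks_mix, h=0.2, n_groups=2))
```

With labels independent of location the index does not reach exactly zero, because the kernel estimates ρ̂ fluctuate around the city-wide shares; the residual value shrinks as the bandwidth, and hence the number of neighbors averaged over, grows.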
We also avoid the problem of comparability across census years: even if the definition of census tracts changes from one census to another, the spatial dissimilarity index is unaffected. The comparability extends to other countries as well, since the index is based on geographic coordinates (longitude and latitude), while neighborhood-based indices, which depend on the specific neighborhood definition, do not have this property. Second, we can measure the segregation of a continuous variable such as income: the mark space M can be any metric space. In the case of racial segregation M = {1, 2, 3, 4, 5} is discrete, while for income segregation we would have M = [0, ∞). In principle we can measure multilevel segregation by defining M = {1, 2, 3, 4, 5} × [0, ∞), or subgroup segregation by redefining the conditional probabilities over subgroups, given the independence property implied by our assumptions. Third, since the index is based on a stochastic process, we can build statistical tests, for example to test whether New York is more segregated than Chicago. Fourth, we can estimate segregation at different scales by changing the bandwidth of the kernel estimator. Theoretically, as the bandwidth grows the estimated process looks more and more like a homogeneous process, and the segregation index converges to zero. Our simulations with census data confirm this result, which also provides a theoretical justification for the empirical findings of Reardon et al (2006) and Feitosa et al (2006). Fifth, we have not considered here the determinants of segregation, but the framework is flexible enough to be used for this kind of study. Nothing prevents us from assuming a parametric spatial model for the intensity function, $\lambda(\xi, m) = \alpha_m + \sum_{k=1}^{K} \beta_k Z_k(\xi, m)$, where the Zk(ξ, m)'s are geocoded variables affecting the location of individuals. Once we estimate the parameters of such a model, we can run policy experiments to determine which desegregation policies are more effective, ceteris paribus.
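As an illustration of how such a parametric model could be estimated, the sketch below fits a one-group version, λ(ξ) = α + βZ(ξ), by maximizing the standard inhomogeneous Poisson log-likelihood, Σi log λ(xi) − ∫S λ(ξ)dξ. Everything specific here is a hypothetical choice made for the example: the covariate Z(ξ) = ξ1 on the unit square (which gives the integral in closed form; in general it would have to be approximated numerically), the parameter values, the thinning simulator (which assumes β ≥ 0), and the grid search used in place of a numerical optimizer.

```python
import numpy as np

def simulate_linear_ipp(alpha, beta, rng):
    """Simulate an IPP on [0,1]^2 with intensity alpha + beta * xi_1 by thinning (assumes beta >= 0)."""
    lam_max = alpha + beta                                   # maximum of the intensity on the square
    cand = rng.uniform(size=(rng.poisson(lam_max), 2))       # HPP with intensity lam_max
    keep = rng.uniform(size=len(cand)) * lam_max < alpha + beta * cand[:, 0]
    return cand[keep]                                        # retained with prob. lambda(xi) / lam_max

def loglik(alpha, beta, pts):
    """Poisson log-likelihood: sum_i log lambda(x_i) minus the integral of lambda over [0,1]^2."""
    lam = alpha + beta * pts[:, 0]
    if np.any(lam <= 0):
        return -np.inf
    return np.log(lam).sum() - (alpha + beta / 2.0)          # integral of alpha + beta*xi_1 on [0,1]^2

rng = np.random.default_rng(0)
pts = simulate_linear_ipp(2000.0, 4000.0, rng)               # hypothetical "true" parameters
alphas = np.linspace(500.0, 4000.0, 71)                      # grid step 50
betas = np.linspace(0.0, 8000.0, 81)                         # grid step 100
ll = np.array([[loglik(a, b, pts) for b in betas] for a in alphas])
i, j = np.unravel_index(np.argmax(ll), ll.shape)
alpha_hat, beta_hat = alphas[i], betas[j]
print(len(pts), alpha_hat, beta_hat)
```

With several thousand simulated events the grid maximizer lands close to the simulated values; a real application would replace the grid search with a numerical optimizer and allow one intercept αm per group.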
Finally, in this work we present what can be called a "descriptive" theory of segregation indices. The index is a random variable, depending on the realization of the point process: ideally we would like to know how the segregation index changes as a function of the process parameters. In a paper in progress we provide some theoretical results for indices under the same assumptions used in this work. We are able to compute the expected value and the variance, so that we can estimate whether New York is on average structurally more segregated than Chicago, where "structurally" means conditional on the intensity function. This work is related to several strands of literature. First of all, it builds on the enormous body of research on indices of segregation, which has been highly influenced by Massey and Denton (1988): they performed a principal component analysis of segregation indices, showing that the dissimilarity and isolation indices explain most of the variability of segregation across US cities. Their results encouraged researchers to use dissimilarity as the only measure of racial segregation until recently, when some research pointed out the flaws of neighborhood-based indices. Among others, Echenique and Fryer (2006) question the arbitrary partition of the city into neighborhoods and develop a segregation index based on individuals' social networks, building on three basic axioms: monotonicity, linearity and homogeneity.3 Essentially, their Spectral Segregation Index measures segregation based on the social interactions of same-race neighbors, where neighbors are the agents living within 1 km of the individual. Reardon and O'Sullivan (2004) extend the theory of neighborhood-based segregation indices to spatial measures, adapting the properties required of neighborhood-based indices to a framework based on individuals' locations over a city map.
In their study overall segregation is a function of the agents' "local environment", defined by a proximity function that may take different functional forms. Their approach is very close to the one proposed here, but in our setting the notion of local environment is infinitesimal. Reardon et al (2006) show empirically that the segregation level measured by their indices decreases as the radius of the local environment increases. They adopt an estimation strategy similar to the one we use in this work, but they do not specify the underlying stochastic process for the spatial pattern: they define the "empirical" local environment by the bandwidth of the Gaussian kernel estimator for the intensity. Using similar techniques, Feitosa et al (2006) compute several segregation indices with spatial features. Although they do not specify a stochastic process for the individuals' locations, their work provides local and global measures of segregation, and a basic test for the detection of segregation that builds on Anselin (1995). The work is also indirectly related to the research on the effects of racial and socioeconomic segregation on individual outcomes. Among others, Cutler and Glaeser (1997) analyze the effect of racial segregation in MSAs on socioeconomic outcomes, in particular high school and college graduation, job idleness, earnings and single motherhood. Their estimates show that, once endogeneity is corrected for, segregation worsens those outcomes. Card and Rothstein (2006) find that school and residential segregation explain most of the black-white test score gap. Ananat (2007) confirms the negative effect of segregation on outcomes, while providing a better correction for the endogeneity of segregation.4 All of these works measure segregation using the neighborhood-based approach.
Echenique and Fryer (2006) replicate the specifications of Cutler and Glaeser (1997) using their Spectral Segregation Index to measure segregation, showing that the qualitative results are unchanged, even if the magnitudes of the estimated effects are slightly different. It is not clear that these results would still hold under our approach, and future research is needed in this direction. The third strand of literature related to this paper is the rapidly growing research on the theory of point processes and its applications.5 Statistical models of point patterns are used in spatial epidemiology (Diggle, Zheng and Durr (2005), Kelsall and Diggle (1998)), neuroscience (Diggle, Eglen and Troy (2006)), astrophysics, ecology, geology (Zhuang, Ogata and Vere-Jones (2006)) and image recognition. Especially related to the present work is Diggle, Zheng and Durr (2005), which studies the clustering of bovine tuberculosis in Cornwall. They assume that the cases of different types of tuberculosis follow a multivariate inhomogeneous Poisson process and then compute risk surfaces and the conditional probability of a specific type of disease at a specific location. Their definition of segregation is similar to the one proposed here, but the conditional probabilities are computed taking into account the controls.6 They use a kernel regression estimator for the conditional probabilities and provide a Monte Carlo test for the detection of segregation, in which the null hypothesis of no segregation is rejected. The paper is organized as follows. In Section 2 we briefly discuss the underlying motivation of this work. In Section 3 we briefly introduce the theory of point processes, and in Section 4 we develop the idea of measuring segregation via conditional probabilities. In Section 5 we present the data, briefly review the available estimation methods for point pattern data and provide the approximate estimation method for count data. Section 6 shows the results and Section 7 provides conclusions and a discussion of future directions of research. The appendices contain a more technical introduction to the theory of point processes (Appendix A), a description of parametric estimation methods (Appendix B) and some alternative estimates (Appendix C).

3 Monotonicity: if the individuals in city A have a larger share of connections/interactions with same-race individuals than in city B, then the level of segregation in A is higher than in B. Linearity: an individual's segregation increases linearly with the level of segregation of the agents she is connected to. Homogeneity: if all individuals in a city network have half of their social interactions with same-race agents, the index of segregation is one-half (this is just a normalization).
4 In a related analysis, La Ferrara and Mele (2006) show that segregation has a positive effect on average public school per-pupil spending at both the district and the metropolitan level; nonetheless, segregation is also associated with increased inequality of expenditure among districts.

2 Motivation

Consider the problem of measuring the residential segregation of blacks in a city. The traditional approach, which we will refer to as the neighborhood-based approach, partitions the city into n neighborhoods and then computes, for each neighborhood i, the share of blacks Bi/Pi, where Pi is the number of individuals and Bi the number of blacks in neighborhood i. If there is no segregation, the fraction of blacks in each neighborhood Bi/Pi equals the fraction of blacks in the whole city B/P. An index of segregation is then a synthetic measure of the difference between the actual distribution of races over neighborhoods, i.e. (B1/P1, ..., Bn/Pn), and the distribution arising when there is no segregation, (B/P, ..., B/P), normalized to lie between zero and one so that it is comparable across cities.
Different notions of this difference yield different indices.7 For example, the most popular measure of residential segregation is the dissimilarity index, which measures the difference between the two distributions by the absolute deviation

$$ D = \frac{1}{2 \frac{B}{P} \left( 1 - \frac{B}{P} \right)} \sum_{i=1}^{n} \frac{P_i}{P} \left| \frac{B_i}{P_i} - \frac{B}{P} \right| \tag{2.1} $$

This index is interpreted as the fraction of blacks that would have to move to another neighborhood in order to achieve a completely integrated city. This corresponds to an intuitive notion of segregation, i.e. an uneven distribution of the racial groups over the city's neighborhoods. However, the dissimilarity index and all the available neighborhood-based indices of segregation share some undesirable properties.

[Insert Figure 1 Here]

First of all, the index depends on the specific partition of the urban area, as argued by Echenique and Fryer (2006). The Census Bureau usually provides data at different levels of aggregation: census tracts, block groups and blocks. Census tracts usually have between 2,500 and 8,000 persons and, when first delineated, are designed to be homogeneous with respect to population characteristics, economic status, and living conditions.8 Therefore the definition of tracts itself biases the index towards higher segregation.

5 See Diggle (2003), Moller and Waagepetersen (2004), Stoyan, Kendall and Mecke (1987) and Stoyan and Stoyan (1994) for excellent introductions to the theory and some applications.
6 In their model there are four types of tuberculosis and there is also a control group, i.e. locations with an animal not infected by the disease. We do not need to model a control group in our application.
7 See Massey and Denton (1988) for an extensive review of the traditional indices. Reardon and Firebaugh (2002) explicitly provide a discussion of the neighborhood-based approach.
Furthermore, different partitions can lead to different values of the index: consider the example in Figure 1. The figure shows the locations of blacks (black circles) and whites (white circles) in four stylized cities. The geographic distribution of the racial groups is the same in all four cities, but different neighborhood partitions are shown. If we adopt the neighborhood-based approach, city A exhibits maximum segregation (D = 1), city B is perfectly integrated (D = 0), city C is perfectly segregated (D = 1), and city D has an intermediate level of segregation (D = .2291). This is clearly an undesirable property. The second problem is that the index does not take into account the spatial location of agents. Consider again Figure 1, for example city C. A neighborhood-based index treats all the people living in the same neighborhood as experiencing the same level of spatial separation. Of course this is not the case: the black agent living at coordinates (4,5) experiences much more segregation than the black agent living at (3,6), since the first is surrounded only by same-race neighbors while the second has more heterogeneous neighbors. This cannot be taken into account unless we consider individual-based indices. The third problem is known to geographers as the Modifiable Areal Unit Problem (MAUP): if we compute the index of segregation using different levels of aggregation of the data (tracts, block groups or blocks) we get different numbers and, even worse, different rankings of the cities in terms of segregation. Let us compare cities A and B in Figure 1: city A is obtained by subdividing each neighborhood of city B into four equal subunits. City A shows maximum segregation, while city B shows complete integration.
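Equation (2.1) and the partition dependence illustrated by cities A and B can be reproduced with a small Python sketch. The city below is hypothetical and is not the configuration of Figure 1: a 4 × 4 grid of subunits with 4 residents each, where subunits are alternately all black and all white. Computing D on the 16 subunits and on the 4 neighborhoods obtained by merging 2 × 2 blocks of subunits gives the two extremes. The helper names are illustrative.

```python
import numpy as np

def dissimilarity(blacks, totals):
    """Dissimilarity index D of eq. (2.1) from neighborhood counts B_i and P_i."""
    blacks = np.asarray(blacks, dtype=float)
    totals = np.asarray(totals, dtype=float)
    p = blacks.sum() / totals.sum()            # city-wide share B/P
    p_i = blacks / totals                      # neighborhood shares B_i/P_i
    weights = totals / totals.sum()            # P_i/P
    return np.sum(weights * np.abs(p_i - p)) / (2 * p * (1 - p))

def aggregate(grid, k):
    """Sum a square 2-D array of counts over non-overlapping k x k blocks."""
    n = grid.shape[0] // k
    return grid.reshape(n, k, n, k).sum(axis=(1, 3))

# Hypothetical city: 4 x 4 subunits, 4 residents each, alternately all black / all white.
checker = (np.indices((4, 4)).sum(axis=0) % 2).astype(float)
blacks_fine = 4 * checker
totals_fine = np.full((4, 4), 4.0)

D_fine = dissimilarity(blacks_fine, totals_fine)                          # 16 subunits
D_coarse = dissimilarity(aggregate(blacks_fine, 2), aggregate(totals_fine, 2))  # 4 neighborhoods
print(D_fine, D_coarse)  # 1.0 0.0
```

The residents and their locations are identical in both computations; only the partition changes, which is exactly the MAUP described above.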
There is evidence of the MAUP when using census data, and the effect is amplified when the level of segregation is very high, because smaller subunits (block groups) are more homogeneous than bigger ones (census tracts); hence the index computed on block groups is higher than the one computed on census tracts.

8 http://www.census.gov/geo/www/cen_tract.html

Table 1: Rankings depend on subunits used

Rank  Census Tracts                   D       Block Groups        D
1     Detroit, MI                     0.8728  Laredo, TX          0.8915
2     Gary-Hammond, IN                0.8692  Gary-Hammond, IN    0.8893
3     Cleveland, OH                   0.8482  Detroit, MI         0.8837
4     Chicago, IL                     0.8359  Cleveland, OH       0.8623
5     Milwaukee, WI                   0.8204  Wausau, WI          0.8572
6     Flint, MI                       0.8092  Bismarck, ND        0.8550
7     Saginaw-Bay City-Midland, MI    0.8072  Chicago, IL         0.8529
8     Buffalo, NY                     0.8070  Eau Claire, WI      0.8438
9     Newark, NJ                      0.7798  Buffalo, NY         0.8383
10    Glens Falls, NY                 0.7780  Milwaukee, WI       0.8359

In Table 1, using data from the 1990 Census, we show that the ranking of cities in terms of segregation of African Americans (measured by the dissimilarity index) differs depending on whether we use census tracts or block groups. For instance, when using block groups Laredo, TX is the most segregated MSA, while when using census tracts it is only the 126th most segregated, with a dissimilarity of .51. For the least segregated MSAs the pattern is similar, but less pronounced than for highly segregated cities. The framework that we propose here addresses all these problems together, since it uses individual-based measures instead of neighborhood-based indices.

3 The Stochastic Environment

This section presents an introduction to the theory of spatial point processes, providing the background needed to understand our theoretical framework. A more detailed and technical exposition is contained in the appendix. Readers familiar with these stochastic processes can skip this section.
3.1 Notation, Basic Properties and Definitions

A spatial point process is a stochastic process that maps countable sets into planar regions.9 More generally, a spatial point process X is a random countable subset of a space S ⊆ R^2. We will denote the process as a random set X = {xi} or, depending on the context, as a random variable N(A), the number of points in the set A ⊆ S. We will denote the realizations of X by x and the realizations of N by n. We will denote a generic point in S by ξ or η and a generic point of the process by xi. We will write |A| for the area of the set A and dξ for an infinitesimal region containing ξ. We will refer to the points of the process as events. We will consider only finite point processes, i.e. stochastic processes that place a finite number of points in any bounded planar region. We will denote the set of all such realizations x by N_lf. The point processes considered here are simple (or orderly), i.e. for any i ≠ j we have xi ≠ xj: there are no coincident events (points).

[Insert Figure 2 here]

In Figure 2 we show realizations of different point processes that we use to explain the basic definitions and properties. The HPP is a Homogeneous Poisson Process, which is considered in the literature the benchmark of complete spatial randomness; the IPP is an Inhomogeneous Poisson Process; the MultitypePP is a Multitype Poisson Process (a superposition of independent univariate HPPs of different types, where the type is visually identified by color); and the MPP is a Marked Poisson Process, where the mark of each point is given by the radius of its circle.

9 We cover only the details relevant to the present work. Any book on point processes listed in the references can be used as a valid and much more detailed introduction.
The first important concept is stationarity: if the point field is observed from different regions of the plane, the configurations of points look similar, with differences arising only from randomness. More formally, a point process is stationary if all probability statements about the process in any bounded region A of the plane are invariant under arbitrary translations. This property is very important in defining the randomness of the process, as we will see below. Analogously, a point process is isotropic if the invariance holds under arbitrary rotations. Stationarity and isotropy together give what is called motion-invariance. In the measurement of segregation we will consider non-stationary and non-isotropic processes. The processes HPP and MultitypePP in Figure 2 are stationary and isotropic (see the Homogeneous Poisson Process definition below), while the IPP is neither stationary nor isotropic. One should be aware that motion-invariance or stationarity does not imply regular patterns, since the process is stochastic and there is noise in the realizations, as can be seen in the figure.

3.2 The intensity function

Consider a process X defined over S ⊆ R^2. The intensity function is defined as a (locally integrable)10 function λ : S → [0, ∞),

$$ \lambda(\xi) = \lim_{|d\xi| \to 0} \left\{ \frac{E N(d\xi)}{|d\xi|} \right\} \tag{3.1} $$

and it is the analogue of the expectation of a random variable. We can interpret it as the expected number of points of the process per unit area in the infinitesimal region dξ around the point ξ. The intensity measure of a point process X is defined for any A ⊆ S as

$$ E N(A) = \Lambda(A) = \int_A \lambda(\xi) \, d\xi \tag{3.2} $$

Since we are considering only finite simple point processes, we have Λ(A) < ∞ for all bounded A ⊆ S and Λ({ξ}) = 0 for any ξ ∈ S.
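For a concrete case, the intensity measure (3.2) can be approximated numerically. The sketch below is a minimal illustration assuming a midpoint rule on rectangles (the function names are hypothetical), applied to the intensity of the IPP example in Figure 2, λ(ξ) = 200(ξ1)^2 + 200(ξ2)^2 on the unit square. The exact values are Λ([0,1]^2) = 400/3 ≈ 133.3, Λ([0,0.5]^2) = 25/3 ≈ 8.3 and Λ([0.5,1]^2) = 175/3 ≈ 58.3: two regions of equal area have very different expected numbers of events.

```python
import numpy as np

def ipp_intensity(x1, x2):
    """Intensity of the IPP example in Figure 2: 200 * xi_1^2 + 200 * xi_2^2."""
    return 200.0 * x1**2 + 200.0 * x2**2

def intensity_measure(lam, a, b, c, d, n=1000):
    """Midpoint-rule approximation of Lambda(A), the integral of lam over A = [a, b] x [c, d]."""
    hx, hy = (b - a) / n, (d - c) / n
    xs = a + hx * (np.arange(n) + 0.5)      # cell midpoints along xi_1
    ys = c + hy * (np.arange(n) + 0.5)      # cell midpoints along xi_2
    X1, X2 = np.meshgrid(xs, ys)
    return lam(X1, X2).sum() * hx * hy      # sum of lambda(midpoint) * cell area

for box in [(0, 1, 0, 1), (0, 0.5, 0, 0.5), (0.5, 1, 0.5, 1)]:
    print(box, round(intensity_measure(ipp_intensity, *box), 3))
```

For a homogeneous intensity the same routine returns λ|A|, in line with the Poisson results that follow.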
We can reinterpret the intensity function as follows. The quantity $\lambda(\xi)\,d\xi$ can be thought of as the probability that there is an event in the infinitesimal region $d\xi$, i.e. $\lambda(\xi)\,d\xi \approx P[N(d\xi) = 1]$. This holds because, for a simple process and an infinitesimal region, $N(d\xi)$ is 0 or 1 up to terms of smaller order, so $E N(d\xi) \approx P[N(d\xi) = 1]$.

3.3 Poisson Processes

Poisson point processes are by far the most important in applications, and they are the models that define the notion of complete spatial randomness.

DEFINITION 3.1 (Poisson Point Process) A point process $X$ on $S$ is a Poisson Point Process with intensity $\lambda(\xi)$ if the following two conditions are satisfied:
1. for any $A \subseteq S$, $N(A) \sim \mathrm{Poisson}(\Lambda(A))$;
2. conditional on $N(A) = n$, the events are independently and identically distributed over $A$ with density $f(\xi) = \lambda(\xi)/\Lambda(A)$.

[10] A function is locally integrable if $\int_A \lambda(\xi)\, d\xi < \infty$ for all bounded $A \subseteq S$.

We denote a generic Poisson Process by $X \sim \mathrm{Poi}(S, \lambda(\xi))$. The processes HPP and IPP in Figure 2 are examples of Poisson Point Processes. Condition (1) governs the number of events in the region $A$. Condition (2) states that, conditional on the draw from the Poisson distribution, the events are i.i.d. with density $f$, which is the ratio of the intensity function to the intensity measure. Condition (1) also implies that $E N(A) = \Lambda(A)$ for any bounded $A \subseteq S$.

A Poisson Point Process is homogeneous or stationary (HPP) if the intensity function is constant over space, i.e. $\lambda(\xi) = \lambda$ for all $\xi \in S$, in which case $f(\xi) = |A|^{-1}$ for any $A \subseteq S$. It follows that $E N(A) = \lambda |A|$ for a Homogeneous Poisson Process. The literature regards the HPP as the ideal of complete spatial randomness. Roughly speaking, complete spatial randomness means that the intensity of the process does not vary over the region under consideration and that there are no interactions among events.
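Definition 3.1 can be read directly as a simulation recipe: first draw $N(S)$ from a Poisson distribution, then place the events i.i.d. with density $f(\xi) = \lambda(\xi)/\Lambda(S)$. The sketch below follows that recipe under a few assumptions of our own. It takes $S$ to be the unit square and draws from $f$ by rejection sampling against an upper bound on $\lambda$. The example intensities are those of the HPP panel ($\lambda = 100$) and the IPP panel ($\lambda(\xi) = 200\xi_1^2 + 200\xi_2^2$) of Figure 2. The helper names are hypothetical and are not taken from Splancs or SpatStat.

```python
import random


def sample_poisson(mean, rng):
    """Draw N ~ Poisson(mean) as the number of arrivals of a unit-rate
    Poisson process in the time interval [0, mean]."""
    count = 0
    t = rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_poisson_process(intensity, lam_max, total_mass, rng):
    """Poisson process on the unit square following Definition 3.1.

    Condition (1): N(S) ~ Poisson(Lambda(S)), with Lambda(S) = total_mass.
    Condition (2): given N(S) = n, the n events are i.i.d. with density
    intensity / Lambda(S), drawn here by rejection sampling against an
    upper bound lam_max >= intensity on S.
    """
    n = sample_poisson(total_mass, rng)
    points = []
    while len(points) < n:
        x, y = rng.random(), rng.random()
        if rng.random() * lam_max <= intensity(x, y):  # accept w.p. lambda / lam_max
            points.append((x, y))
    return points


if __name__ == "__main__":
    rng = random.Random(2007)
    # HPP with lambda = 100: constant intensity, Lambda(S) = 100.
    hpp = simulate_poisson_process(lambda x, y: 100.0, 100.0, 100.0, rng)
    # IPP with lambda(xi) = 200 xi1^2 + 200 xi2^2: maximum 400, Lambda(S) = 400/3.
    ipp = simulate_poisson_process(
        lambda x, y: 200.0 * x * x + 200.0 * y * y, 400.0, 400.0 / 3.0, rng)
    print(len(hpp), len(ipp))
```

For the IPP, the accepted events concentrate towards the corner $(1, 1)$, where the intensity is largest. The expected first coordinate of an event is $5/8$.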
By condition (1) and the fact that $\lambda(\xi) = \lambda$, an HPP is stationary and isotropic. This is because $N(A) \sim \mathrm{Poisson}(\lambda|A|)$, so the expected number of events does not vary over the plane. By condition (2) and $f(\xi) = |A|^{-1}$, there is no clustering or inhibition: the presence of an event at $\xi$ makes the occurrence of an event at a nearby $\eta$ neither more nor less likely. The process HPP in Figure 2 is a Homogeneous Poisson Process with intensity $\lambda = 100$ over the unit square.

A Poisson Process is inhomogeneous or nonstationary (IPP) if the intensity function is not constant. The IPP is the simplest class of nonstationary point processes used in applications. The IPP realization in Figure 2 has intensity $\lambda(\xi) = 200\,\xi_1^2 + 200\,\xi_2^2$ over the unit square, where $\xi = (\xi_1, \xi_2)$.

3.4 Marked Point Processes

Consider a point process $X_0$ defined over the space $S \subseteq \mathbb{R}^2$. If we attach a random mark $m(\xi) \in M$ to each point $\xi \in X_0$, then the process $X = \{(\xi, m(\xi)) \mid \xi \in X_0\}$ is called a Marked Point Process with events in $S$ and marks in $M$. The marks attached to the points of the process are themselves random variables. The easiest way to think about this process is as a point process to which we randomly attach labels, so that a realization is a collection of locations (points) carrying different labels. The mark space $M$ may be either a finite set, $M = \{1, 2, \ldots, M\}$, in which case $X$ is called a multitype process, or a general subset $M \subseteq \mathbb{R}^q$, $q \geq 1$. Both bottom realizations in Figure 2 come from Marked Point Processes. The first is a multitype process with marks in $M = \{\text{red}, \text{black}, \text{green}\}$, and the second is a marked point process with mark space $M = [0, \infty)$. In the next section we build a segregation measure based on a specific marked point process, which we now describe.

DEFINITION 3.2 (Marked Poisson Process) The point process $X = \{(\xi, m(\xi)) \mid \xi \in X_0\}$ is a Marked Poisson Process if:
1. $X_0 \sim \mathrm{Poi}(S, \lambda_0(\xi))$;
2. conditional on $X_0$, the marks $\{m(\xi) \mid \xi \in X_0\}$ are mutually independent.

Let us denote the conditional mark distribution by $\rho(\xi, m, X_0 \setminus \xi)$. In principle $\rho(\cdot)$ can depend on the specific location $\xi$ and also on the locations of the other points of the process, $X_0 \setminus \xi$. It cannot depend on the other points' marks, by condition (2) of the definition above. Suppose $\rho(\xi, m, X_0 \setminus \xi) = \rho(\xi, m)$ for any $\xi \in X_0$ and any $m \in M$, i.e. the conditional mark distribution does not depend on the locations of the other events $X_0 \setminus \xi$. Then the MPP is a Poisson Point Process over the enlarged space $S \times M$, with intensity $\lambda(\xi, m) = \lambda_0(\xi)\rho(\xi, m)$ (see Proposition A.1 in the appendix).

When the mark space $M$ is finite (for example, racial groups) we have another useful result. A multitype process with $\rho(\xi, m, X_0 \setminus \xi) = \rho(\xi, m)$ is equivalent to a multivariate Poisson Process (see Proposition A.2 in the appendix), i.e. a superposition of independent univariate Poisson Processes. Therefore, given a multitype process with $M = \{1, 2, \ldots, M\}$ and $\rho(\xi, m, X_0 \setminus \xi) = \rho(\xi, m)$ for any $\xi \in X_0$ and $m \in M$, we can reformulate it as a multivariate Poisson Process $(X_1, X_2, \ldots, X_M)$. Its components $X_m \sim \mathrm{Poi}(S, \lambda(\xi, m))$ are mutually independent, with $\lambda(\xi, m) = \lambda_0(\xi)\rho(\xi, m)$, $m = 1, \ldots, M$. This last result will be exploited in the estimation.

4 Segregation Measurement

4.1 General Framework

In this section we develop a statistical framework for measuring segregation based on the theory of point processes. Similar statistical models are employed in spatial epidemiology for disease mapping and for the detection of disease clusters. Among others, Kelsall and Diggle (1998) develop a multivariate Poisson process model to estimate the spatial variation in the risk of disease for a population at risk.
A similar model is used in Diggle, Zheng and Durr (2005) to detect clustering of different types of bovine tuberculosis in a region. Consider the set of all possible finite realizations of the marked point process, which we call $N_{lm}$. We want to measure the spatial segregation of a set of points $X = \{x_i, m(x_i)\}_{i=1}^n$. Each point is characterized by its position $x_i$ in the city area $S$ and by a mark $m(x_i)$ defined over a space $M$. Examples of marks are racial groups, income groups, income levels, education levels, or a combination of them. The mark space can be any metric space, so we are not constrained to measuring segregation over a univariate mark.

In our view an index of segregation should be a function of the locations of all individuals and of their types (racial group, income level, education level). We therefore define a segregation index as a function of the realization of a marked point process with range in $[0, 1]$. Formally, it is a function $\Phi : N_{lm} \to [0, 1]$ that is increasing in the difference between the actual spatial distribution and the distribution under complete integration. The index is zero if the process is unsegregated and one if it is completely segregated. Note that in this stochastic setting the segregation index is a random variable: each realization $x$ of the marked point process yields a corresponding realization $\phi$ of the index. In this work we analyze the segregation of the realized spatial pattern. Mele (2007, in progress) provides theoretical results and shows how to compute the moments of any index.
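Propositions A.1 and A.2, quoted in Section 3.4, state the following: if the events of a Poisson process are marked independently with a probability $\rho(\xi, m)$ that depends only on location, the result is a set of independent Poisson processes $X_m$ with intensity $\lambda_0(\xi)\rho(\xi, m)$. The simulation sketch below checks this numerically under illustrative assumptions of our own: $S$ is the unit square, $\lambda_0 = 100$ and $\rho(\xi, b) = \xi_1$. With these choices each group should contain on average $\int_S \lambda_0 \rho = 50$ events, the two group counts should be uncorrelated, and black events should have mean first coordinate $2/3$. Because every run is a new realization $x$, any index computed on it is a random variable, as noted above. The function names are hypothetical.

```python
import random


def sample_poisson(mean, rng):
    """N ~ Poisson(mean): unit-rate exponential arrivals falling in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_marked_hpp(lam0, rho_black, rng):
    """Marked Poisson process on the unit square. X0 is an HPP with
    intensity lam0, and each event independently receives mark 'b' with
    probability rho_black(x, y), else 'w'. The mark probability depends only
    on the event's own location (illustrative assumption)."""
    events = []
    for _ in range(sample_poisson(lam0, rng)):
        x, y = rng.random(), rng.random()
        mark = "b" if rng.random() < rho_black(x, y) else "w"
        events.append((x, y, mark))
    return events


def group_counts(events):
    """Return (number of 'b' events, number of 'w' events)."""
    n_b = sum(1 for _, _, m in events if m == "b")
    return n_b, len(events) - n_b


def correlation(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


if __name__ == "__main__":
    rng = random.Random(42)
    # Illustrative mark probability rho(xi, b) = xi1 (more blacks to the east).
    sims = [simulate_marked_hpp(100.0, lambda x, y: x, rng) for _ in range(2000)]
    blacks = [group_counts(e)[0] for e in sims]
    whites = [group_counts(e)[1] for e in sims]
    print(sum(blacks) / len(blacks), sum(whites) / len(whites))
    print(correlation(blacks, whites))  # near zero: the two processes are independent
```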
We assume that the locations of the individuals, $X_0$, are the realization of an Inhomogeneous Poisson Point Process over the space $S \subseteq \mathbb{R}^2$ with intensity function $\lambda_0(\xi)$.

ASSUMPTION 4.1 The individuals' locations $X_0$ follow an Inhomogeneous Poisson Process with intensity $\lambda_0(\xi)$ over $S$:
$$X_0 \sim \mathrm{Poi}(S, \lambda_0(\xi)) \tag{A1}$$

The next two assumptions are crucial for the model.

ASSUMPTION 4.2 Conditional on $X_0$, the marks are mutually independent, i.e. for $\xi_i \in X_0$, $i = 1, \ldots, n$,
$$P(m(\xi_1) = m_1, \ldots, m(\xi_n) = m_n \mid X_0) = \prod_{i=1}^n P(m(\xi_i) = m_i \mid X_0) \tag{A2}$$

We are thus assuming that $X = \{(\xi, m(\xi)) \mid \xi \in X_0\}$ is a Marked Poisson Process over the region $S$ with marks in the space $M$. Let us define $\rho(\xi, m, X_0 \setminus \xi) \equiv P(m(\xi) = m \mid X_0)$, the probability that the point $\xi$ has mark $m$ conditional on the realized locations $X_0$. We assume that this conditional probability depends on the location $\xi$ but not on the locations of the other points of the process, $X_0 \setminus \xi$.

ASSUMPTION 4.3 For all $\xi \in X_0$ and all $m \in M$,
$$\rho(\xi, m, X_0 \setminus \xi) = \rho(\xi, m) \tag{A3}$$

Assumptions (A1)-(A3) imply that the process $X$ is Poisson over the enlarged space $S \times M$, with intensity $\lambda(\xi, m) = \lambda_0(\xi)\rho(\xi, m)$ (Proposition A.1 in the appendix). When the mark space is discrete, we can reformulate the model as a multivariate inhomogeneous Poisson process $X = \bigcup_{m=1}^M X_m$ with intensities $\lambda(\xi, m) = \lambda_0(\xi)\rho(\xi, m)$, $m = 1, 2, \ldots, M$. Here $X_m$ and $X_{m'}$ are stochastically independent for $m \neq m'$ (Proposition A.2 in the appendix).

In this setup the process exhibits segregation if the spatial pattern of a specific type differs from that of the population as a whole. In terms of the model, there is no segregation if the conditional probability of each type does not depend on the location, i.e.
if the intensity of each type is proportional to the intensity of the whole population over the entire metropolitan area $S$.[11]

DEFINITION 4.1 The marked point process $X$ is completely unsegregated if and only if $\rho(\xi, m) = \rho(m)$ for all $\xi \in X_0$, $m \in M$.

In the definition, $\rho(m)$ is the marginal mark distribution, which can easily be estimated from the data. We observe maximum segregation if the realization exhibits a degenerate conditional mark distribution at each point. Formally, the definition differs slightly depending on whether the mark space is continuous or discrete. In the discrete case the definition is the following.[12]

DEFINITION 4.2 The marked point process $X$ is completely segregated if and only if for all $\xi \in X_0$ there exists $m^* \in M$ such that $\rho(\xi, m^*) = 1$ and $\rho(\xi, m) = 0$ for any $m \neq m^*$.

[11] In the literature this is called random labelling. See also Diggle, Zheng and Durr (2005), who use a similar definition.

[12] When $M$ is continuous, as in the case of income segregation, we modify the definition as follows. The marked point process $X$ is completely segregated if and only if for all $\xi \in X_0$ there exists $m^* = m^*(\xi) \in M$ such that $\rho(\xi, m) = \delta(m - m^*)$, where $\delta(u)$ is the Dirac delta function.

4.2 An Index of Racial Segregation

We measure the level of segregation at location $\xi$ as the absolute deviation from a completely unsegregated process. For each $\xi \in X_0$, the measure of the difference between the two distributions is the quantity $\sum_{m \in M} |\rho(\xi, m) - \rho(m)|$. Summing over all points of the realization of $X_0$ gives a measure of the total deviation from complete integration. To obtain an index varying between zero and one, we normalize the realized sum by its theoretical value under complete segregation. Our measure of segregation is defined as $\sum_{\xi \in X_0} \sum_{m \in M} |\rho(\xi, m) - \rho(m)|$. This is the equivalent of the dissimilarity index in our setting, so we will call it the spatial dissimilarity index.
The normalization is obtained by dividing $\sum_{\xi \in X_0} \sum_{m \in M} |\rho(\xi, m) - \rho(m)|$ by its value under perfect segregation,[13] i.e. $2n \sum_{m \in M} \rho(m)(1 - \rho(m))$. The Spatial Dissimilarity Index is thus defined as the ratio

$$\phi_{dism} = \frac{\sum_{\xi \in X_0} \sum_{m \in M} |\rho(\xi, m) - \rho(m)|}{2n \sum_{m \in M} \rho(m)(1 - \rho(m))} \tag{4.1}$$

The index is very similar to the traditional dissimilarity index, but it explicitly takes into account the positions of the individuals on the city map. The main difference is that in the traditional approach the conditional probability $\rho(\xi, m)$ is the same for all individuals belonging to the same census tract, while here we impose no such restriction.

The most popular works on racial segregation are devoted to the dichotomous case, in which we measure the segregation of one group with respect to the rest of the population, e.g. the segregation of blacks with respect to non-blacks. In its dichotomous version the spatial dissimilarity simplifies to ($b$ = blacks)[14]

$$\phi_{dism} = \frac{\sum_{\xi \in X_0} |\rho(\xi, b) - \rho(b)|}{2n\rho(b)(1 - \rho(b))} \tag{4.2}$$

The spatial dissimilarity is just one example of an index that can be built in this framework. Any index based on the measure $\sum_{\xi \in X_0} h(\xi)$ can be used as an index of segregation under the appropriate normalization,[15] where $h$ is a nonnegative function (a distance function, for example) summarizing the difference between the actual distribution and the distribution under no segregation. In the spatial dissimilarity we used $h(\xi) = \sum_{m \in M} |\rho(\xi, m) - \rho(m)|$.

[13] Consider the quantity $\sum_{\xi \in X_0} \sum_{m \in M} |\rho(\xi, m) - \rho(m)|$. Under complete segregation, for all $\xi \in X_0$ there exists $m^* \in M$ such that $\rho(\xi, m^*) = 1$ and $\rho(\xi, m) = 0$ for any $m \neq m^*$, and a fraction $\rho(m)$ of the $n$ events has mark $m$. Hence
$$\begin{aligned}
\sum_{\xi \in X_0} \sum_{m \in M} |\rho(\xi, m) - \rho(m)| &= \sum_{\xi \in X_0} \left( |\rho(\xi, 1) - \rho(1)| + \ldots + |\rho(\xi, M) - \rho(M)| \right) \\
&= \rho(1)\,n\,|1 - \rho(1)| + (1 - \rho(1))\,n\,|0 - \rho(1)| + \ldots \\
&\quad + \rho(M)\,n\,|1 - \rho(M)| + (1 - \rho(M))\,n\,|0 - \rho(M)| \\
&= 2\rho(1)\,n\,(1 - \rho(1)) + \ldots + 2\rho(M)\,n\,(1 - \rho(M)) \\
&= 2n \sum_{m \in M} \rho(m)(1 - \rho(m)).
\end{aligned}$$

[14] With two groups, $\sum_{m \in M} |\rho(\xi, m) - \rho(m)| = 2|\rho(\xi, b) - \rho(b)|$ and $\sum_{m \in M} \rho(m)(1 - \rho(m)) = 2\rho(b)(1 - \rho(b))$, so
$$\phi_{dism} = \frac{\sum_{\xi \in X_0} \sum_{m \in M} |\rho(\xi, m) - \rho(m)|}{2n \sum_{m \in M} \rho(m)(1 - \rho(m))} = \frac{2\sum_{\xi \in X_0} |\rho(\xi, b) - \rho(b)|}{4n\rho(b)(1 - \rho(b))} = \frac{\sum_{\xi \in X_0} |\rho(\xi, b) - \rho(b)|}{2n\rho(b)(1 - \rho(b))}$$

4.3 Discussion

The main assumption we imposed is that the pattern of locations is a realization of an Inhomogeneous Multitype Poisson process over the metropolitan area. Although this assumption may seem inappropriate for the phenomenon under consideration (racial or income segregation), it can be partly justified.

First, segregation of blacks over the metropolitan area clearly implies some interaction and/or interdependence among events of different types. While this is interesting per se, it is not clear how this interdependence should be modeled a priori. Doing so would probably require a parametric model with covariates driving the intensity function. Such a model can be implemented only with very detailed individual-level census data containing not only location coordinates but also detailed socioeconomic information. The data actually available are at the block level, where the only information is the location of the block centroids, their racial composition and some summary indicators of housing values. Given these data, it is not obvious that such a model can be constructed.

Second, in this work we are mainly interested in measuring the extent of socioeconomic segregation, not its determinants. Modelling the interdependence of events would probably give more insight into the causes of segregation, but it is unlikely to add information about its measurement.
Therefore the independence assumption can be considered a useful benchmark (to be improved in the future, especially for policy purposes), since we are measuring the extent of segregation without exploring its determinants. Future research will explore other models and relax this assumption. Of particular interest are Marked Pairwise Interaction Models, which can account for possible "repulsion" or "attraction" among different locations and among different marks. See Diggle (2003) or Moller and Waagepetersen (2004) for an extensive introduction to these models.

This framework is very flexible and can be adapted to many different settings and measurement purposes. For example, suppose we want to measure the segregation of blacks with respect to the minorities (defined as blacks, Asians and Hispanics). This is a dichotomous index in which we compare blacks with a fraction of the whole population. We redefine the conditional probabilities: let $p(\xi, b)$ be the conditional probability that the individual at location $\xi$ is black, given that a member of the minority group $\{a, b, h\}$ is located at $\xi$. We have $p(\xi, b) = \rho(\xi, b) / \sum_{m \in \{a,b,h\}} \rho(\xi, m)$, while under perfect integration $p(b) = \rho(b) / \sum_{m \in \{a,b,h\}} \rho(m)$. We can define the index accordingly:
$$\phi_{dism} = \frac{\sum_{\xi \in x_0} |p(\xi, b) - p(b)|}{2n\,p(b)(1 - p(b))}$$
To measure the multigroup segregation of the minorities, we define $p(\xi, m) = \rho(\xi, m) / \sum_{k \in \{a,b,h\}} \rho(\xi, k)$ and $p(m) = \rho(m) / \sum_{k \in \{a,b,h\}} \rho(k)$, and the index becomes
$$\phi_{dism} = \frac{\sum_{\xi \in x_0} \sum_{m \in \{a,b,h\}} |p(\xi, m) - p(m)|}{2n \sum_{m \in \{a,b,h\}} p(m)(1 - p(m))}$$

Following the same methodology with a continuous mark space, we can define a measure of income segregation. We can also build measures of income segregation by racial group or by income group, for example income segregation among those with income above or below a certain level, or within an income interval. The same approach leads to measures of multilevel segregation based on a higher-dimensional mark space. Note that with multidimensional marks the independence assumption holds for the vector $m$ and not for its components. For example, if we consider the segregation of race ($r$) and income ($y$), we assume independence across pairs $m = (r, y)$ but not between $r$ and $y$. This implies that we can account for any correlation among marks and use the conditional joint distribution of race and income, $\rho(r, y)$, to describe the unsegregated case. We leave these developments to future research.

[15] See Mele (2007, in progress) for general results on indices additive in the events.

Finally, in this work we present what can be called a "descriptive" theory of segregation indices. As noted above, the index is a random variable that depends on the realization of the point process. Ideally we would like to know how the segregation index changes as a function of the process parameters. In Mele (2007, in progress) we provide some theoretical results for indices under the same assumptions used in this work. The Poisson assumption simplifies the computation of the moments (in particular the expectation and variance) of the index, and we prove that, given an intensity function, we can compute the moments of any index. The only drawback is that in practice these moments are computationally very difficult to obtain, since they involve the evaluation of an infinite sum of integrals. However, suppose we restrict attention to indices of the form $\sum_{\xi \in X} h(\xi)$, i.e. indices defined as the sum over all events of an individual-level index (or function). For these indices we can show that the moments reduce to a simple integral that can be computed with standard numerical integration methods.
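For a Poisson process, the reduction of an additive index to a single integral follows from Campbell's theorem, a standard result. It gives $E\left[\sum_{\xi \in X} h(\xi)\right] = \int_S h(\xi)\lambda(\xi)\,d\xi$ and, for the Poisson case, $\mathrm{Var}\left[\sum_{\xi \in X} h(\xi)\right] = \int_S h(\xi)^2\lambda(\xi)\,d\xi$. We offer this as our reading of the claim above; the general results are in Mele (2007, in progress). The Monte Carlo sketch below compares both integrals with simulated moments. It assumes, for illustration only, an HPP with $\lambda = 100$ on the unit square and $h(\xi) = \xi_1$, for which the integrals equal 50 and 100/3. The helper names are hypothetical.

```python
import random


def sample_poisson(mean, rng):
    """N ~ Poisson(mean): unit-rate exponential arrivals falling in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def integrate_unit_square(f, cells=200):
    """Midpoint-rule integral of f(x, y) over [0, 1]^2."""
    step = 1.0 / cells
    return sum(f((i + 0.5) * step, (j + 0.5) * step)
               for i in range(cells) for j in range(cells)) * step * step


def simulate_additive_index(h, lam, reps, rng):
    """Values of sum_{xi in X} h(xi) over `reps` realizations of an HPP
    with intensity lam on the unit square."""
    values = []
    for _ in range(reps):
        n = sample_poisson(lam, rng)
        values.append(sum(h(rng.random(), rng.random()) for _ in range(n)))
    return values


def h_example(x, y):
    """Illustrative individual-level function h(xi) = xi1."""
    return x


if __name__ == "__main__":
    lam = 100.0
    mean_campbell = integrate_unit_square(lambda x, y: h_example(x, y) * lam)
    var_campbell = integrate_unit_square(lambda x, y: h_example(x, y) ** 2 * lam)
    sims = simulate_additive_index(h_example, lam, 4000, random.Random(0))
    m = sum(sims) / len(sims)
    v = sum((s - m) ** 2 for s in sims) / (len(sims) - 1)
    print(mean_campbell, m)  # both should be close to 50
    print(var_campbell, v)   # both should be close to 100/3
```

The integrals are cheap to evaluate, while the simulated moments need thousands of realizations. That gap is the practical advantage of restricting attention to additive indices.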
Such moment formulas would allow us to assess whether (say) New York is, on average, structurally more segregated than Chicago, where "structurally" means conditional on the intensity function.

5 Empirical Methodology

All data analysis was performed in R[16] using some available packages for the analysis of spatial point patterns, together with custom functions written by the author in R and C.[17] We first test the performance of this approach on artificial datasets, to give a flavor of the potential improvement over traditional neighborhood-based measures. We then use data on Metropolitan Statistical Areas (MSAs) and Primary Metropolitan Statistical Areas (PMSAs) from the 1990 and 2000 Census.

[16] http://www.r-project.org/

[17] In particular we used the packages Splancs and SpatStat. The first, developed by Diggle and Rowlingson (1993), is especially devoted to nonparametric methods and is quite flexible for handling and manipulating data. Its disadvantage is that the polygons in which the data are created or simulated must be convex. The second, developed by Baddeley and Turner (2005), is more oriented towards parametric techniques but also allows for nonconvex polygons, which is useful with real datasets (Manhattan, for example, is not a convex polygon). We also used the package spatialkernel developed by Diggle, Zheng and Durr (2005), but we modified some of its C routines to compute our indices and to speed up our empirical strategy.

5.1 Data

The artificial data were created to show how the individual-based measure differs from neighborhood-based ones, and to convince the reader that it is immune to the problems mentioned in Section 2.

[Insert Figure 3 here]

Figure 3 plots the six artificial cities: A(symptotia), B(ayesia), C(lassica), D(eMoivria), E(mpirica) and F(isheria). Each city contains 800 individuals distributed over the square $[0, 4] \times [0, 4]$. Of these, 25% are black (black circles) and 75% white (red circles). The grid represents the partition into neighborhoods, so each city contains 16 neighborhoods of equal area, each the size of the unit square $[0, 1] \times [0, 1]$. The following numbering of the neighborhoods is useful to explain how we constructed the cities:

1   5   9   13
2   6   10  14
3   7   11  15
4   8   12  16

Cities A, B and C were constructed as follows. We simulated an HPP with 50 points on a unit square for blacks and a different one for whites. We then cloned these unit squares and used them as the neighborhoods of the cities, assigning 4 of them to blacks and 12 to whites. For city A neighborhoods 13 to 16 are black, and for city B neighborhoods 2, 6, 7 and 16 are black. For city C we created a central ghetto by assigning neighborhoods 6, 7, 10 and 11 to the black population.

City D was constructed differently. We simulated an HPP with 600 points over the square $[0, 4] \times [0, 4]$ and assigned those points to whites. We then simulated an HPP with 200 points in a circle of radius one and translated the coordinates so that the center of the circle coincided with the center of the city, assigning the points in the circle to blacks. This creates a pattern similar to city C, but some whites live inside the ghetto and the ghetto does not cover the entire area of 4 neighborhoods. City E was constructed by simulating an HPP with 600 points over the square $[0, 4] \times [0, 4]$ for whites and two HPPs with 100 points each over circles of radius 1 for the black population. The centers of the two circles are the centroids of neighborhoods 6 and 11. This creates an irregular black neighborhood in the city, again allowing whites to live inside the ghetto.
Finally, city F is the result of simulating an HPP with 600 points over the square $[0, 4] \times [0, 4]$ for whites and an HPP with 200 points over the same square for blacks. According to our framework, this is the perfectly integrated case.

To show the effects of arbitrary partitions and of the MAUP on traditional measures of segregation, we constructed different partitions of the cities into neighborhoods of equal area. In particular, we show how the measured level of segregation of cities D and E changes as we progressively increase the number of identical neighborhoods through the sequence 4, 16, 64.

We also use data from the 1990 and 2000 US Census of Population and Housing. The ideal dataset would consist of individual- or household-level data on location, racial group and socioeconomic characteristics. Unfortunately such data are not available, so we initially followed Echenique and Fryer (2006) and used data aggregated at the block level. These come from the Summary Tape File 1B (STF1B) of the 1990 Census and the Summary File 1 (SF1) of the 2000 Census, and contain the location of the block centroid, the racial composition and some indicators of socioeconomic status (mean housing value, mean rent). Following the procedure of Echenique and Fryer (2006), we define a block to be black if more than half of its population is black, and nonblack otherwise. The same rule is applied to the other racial groups. When computing the multigroup indices, we assign a block to a specific race if that race is the majority in the block.

[Insert Figures 4 and 5 here]

Even if this is not the ideal dataset, it is a reasonable approximation of household-level data, because blocks are very homogeneous in racial and socioeconomic composition (see, for example, the discussion in Echenique and Fryer (2006)).
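The sketch below illustrates how a neighborhood-based measure reacts to the partition. It builds a city in the spirit of city D, with 600 whites uniform on $[0,4]^2$ and 200 blacks uniform in the unit-radius disc at the city center. It then computes the traditional dissimilarity index $D = \frac{1}{2}\sum_j |b_j/B - w_j/W|$ on grids of 4, 16 and 64 equal squares. We assume that this standard Duncan-Duncan formula is the neighborhood-based index meant in the text. The helper names are hypothetical. With this geometry, the 4-neighborhood grid should split the ghetto evenly across quadrants, so $D$ should be close to zero there and jump once the grid resolves the ghetto.

```python
import math
import random


def simulate_city_d(rng, n_white=600, n_black=200, side=4.0):
    """City in the spirit of city D: whites uniform on [0, side]^2, blacks
    uniform in the disc of radius 1 centred at the middle of the city."""
    whites = [(side * rng.random(), side * rng.random()) for _ in range(n_white)]
    blacks = []
    for _ in range(n_black):
        r = math.sqrt(rng.random())  # sqrt gives a uniform density on the disc
        t = 2.0 * math.pi * rng.random()
        blacks.append((side / 2 + r * math.cos(t), side / 2 + r * math.sin(t)))
    return blacks, whites


def grid_counts(points, k, side):
    """Counts of points in each cell of a k x k partition of [0, side]^2."""
    counts = [0] * (k * k)
    for x, y in points:
        i = min(int(x / side * k), k - 1)
        j = min(int(y / side * k), k - 1)
        counts[i * k + j] += 1
    return counts


def grid_dissimilarity(blacks, whites, k, side=4.0):
    """Traditional dissimilarity D = 0.5 * sum_j |b_j / B - w_j / W| over
    the k x k neighborhoods (k = 2, 4, 8 gives 4, 16, 64 neighborhoods)."""
    b = grid_counts(blacks, k, side)
    w = grid_counts(whites, k, side)
    B, W = len(blacks), len(whites)
    return 0.5 * sum(abs(bj / B - wj / W) for bj, wj in zip(b, w))


if __name__ == "__main__":
    blacks, whites = simulate_city_d(random.Random(5))
    for k in (2, 4, 8):
        print(k * k, round(grid_dissimilarity(blacks, whites, k), 3))
```

The locations are identical across the three partitions, and only the arbitrary grid changes.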
As an example, Figure 4 shows the geographic pattern of black segregation in New York. Red marks the centroids of nonblack blocks and black marks the African American blocks. The pattern of geographic separation is evident: the black population is concentrated in Harlem, the Bronx and Bedford-Stuyvesant. Figure 5 shows the same patterns for all racial groups, with whites in red, African Americans in black, Asians in green and other races (roughly equivalent to the Hispanic population) in cyan. The pattern of racial segregation is very clear in the multigroup case too.

However, we noticed that this procedure can bias the index towards more segregation. It treats a block that is 51% black and one that is 99% black identically, which is exactly what we would like to avoid. For instance, the metropolitan area of Laredo, TX, had 193,117 inhabitants in 2000, of whom 713 were black. Using the Echenique and Fryer procedure we end up with 2 black blocks and 3,091 white blocks. It turns out that each of these two black blocks has a single inhabitant, who is black. In Laredo we are thus using just 2 of the 713 blacks living in the city, which of course biases the index towards one. To take this into account we propose two alternative estimation methods, one for point pattern data and one for block-level data. The second method is particularly relevant when only block-level summary census data and the block centroids are available. In that case the Poisson assumption allows the researcher to recover the intensity function at every location, not only at the centroids.

5.2 Estimation Strategy with Point Pattern Data

To estimate the spatial dissimilarity index we use $\hat\rho(m) = n_m/n$ as an estimate of the marginal mark distribution.
The spatial dissimilarity index can then be estimated by

$$\hat\phi_{dism} = \frac{\sum_{m=1}^M \sum_{i=1}^n \left| \hat\rho(x_i, m) - \frac{n_m}{n} \right|}{2n \sum_{m=1}^M \frac{n_m}{n}\left(1 - \frac{n_m}{n}\right)} \tag{5.1}$$

The estimate $\hat\rho(\xi, m)$ is obtained by nonparametric methods. As explained above, the multitype point process can be reformulated as a multivariate Poisson process with independent univariate components, so the univariate processes can be estimated separately. This observation leads to a convenient and intuitive estimate of $\rho(\xi, m)$:

$$\hat\rho(\xi, m) = \frac{\hat\lambda(\xi, m)}{\hat\lambda_0(\xi)} \tag{5.2}$$

Here $\hat\lambda(\xi, m)$ is the estimated intensity function of the univariate process $X_m$, corresponding to the spatial pattern of group $m$. Diggle (1985) and Berman and Diggle (1989) suggested the estimator $\hat\lambda(\xi) = N(\xi, h)/(\pi h^2)$, where $N(\xi, h)$ is the number of events within distance $h$ of $\xi$. It simply counts the events in the disc of radius $h$ centered at $\xi$ and scales the count by the area of the disc.[18] A more general formulation is the kernel estimator (see Diggle (2003), p. 148, or Moller and Waagepetersen (2004))

$$\hat\lambda(\xi) = \sum_{i=1}^n \frac{k_h(\xi - x_i)}{\int_S k_h(\eta - x_i)\, d\eta} \tag{5.3}$$

where $k_h(u) = h^{-2} k(u/h)$ and the denominator corrects for edge effects. In our computations we use a multiplicative quartic kernel to speed up the estimation procedure.[19]

[18] This can be interpreted as a kernel estimator with kernel
$$k(u) = \begin{cases} 1/\pi & \text{if } 0 \le \|u\| \le 1 \\ 0 & \text{otherwise} \end{cases}$$
so that $k_h(\xi - x_i) = 1/(\pi h^2)$ whenever $\|\xi - x_i\| \le h$.

[19] For example, a Gaussian kernel takes longer to compute and does not give much improvement in the estimation.

A few words should be said about the choice of the bandwidth $h$. The use of a kernel estimator calls for a criterion to choose the optimal bandwidth. Researchers usually rely on MSE minimization, cross-validation criteria or more complex methods. In this application it is not clear whether the bandwidth should be chosen by one of these criteria. On the one hand, the optimal $h$ arguably should be different for each city.
If a city is more spatially dense than another, the bandwidth should take this into account. Moreover, the bandwidth can be interpreted as defining the relevant neighborhood of the individual (the local environment, in the words of Reardon and O'Sullivan (2004)). Different cities could therefore in principle have different relevant neighborhoods, and thus different values of $h$. This suggests choosing a different $h$ for each city. In almost all estimations we choose $h$ to minimize the Mean Squared Error. We follow the computations of Diggle (1985) and Berman and Diggle (1989), who derive the formula for $MSE(h)$ in the case of a stationary and isotropic Cox Process:[20],[21]

$$MSE(h) = \lambda_2(0) + \Lambda(A)\,\frac{1 - 2K(h)}{\pi h^2} + \left(\pi h^2\right)^{-2} \int\!\!\int \lambda_2(\|\xi - \eta\|)\, d\eta\, d\xi \tag{5.4}$$

Here $\lambda_2(\|\xi - \eta\|)$ is the second-order intensity function, defined as

$$\lambda_2(\xi, \eta) = \lim_{|d\xi|, |d\eta| \to 0} \left\{ \frac{E[N(d\eta)N(d\xi)]}{|d\eta|\,|d\xi|} \right\} \tag{5.5}$$

which measures the spatial association of the process. Note that $E[N(d\eta)N(d\xi)] \approx P[N(d\eta) = N(d\xi) = 1]$ for $\xi$ and $\eta$ close. Under stationarity and isotropy, $\lambda_2(\xi, \eta) = \lambda_2(\|\xi - \eta\|)$, i.e. it is a function of the Euclidean distance between the two points. The quantity $K(h)$ is

$$K(h) = \lambda^{-1} E[N_o(h)] = 2\pi\lambda^{-2} \int_0^h \lambda_2(r)\, r\, dr \tag{5.6}$$

where $E[N_o(h)]$ is the expected number of further events within distance $h$ of an arbitrary event. We estimate $K(h)$ with the celebrated Ripley's estimator. Let $w(\xi, u)$ be the proportion of the circumference of the circle with center $\xi$ and radius $u$ that lies in $S$, and let $w_{ij} = w(x_i, u_{ij})$, where $u_{ij} = \|x_i - x_j\|$. Then

$$\hat K(h) = \frac{|S|}{n(n-1)} \sum_{i=1}^n \sum_{j \neq i} w_{ij}^{-1} I_h(u_{ij}) \tag{5.7}$$

where $I_h(u_{ij}) = I(u_{ij} \le h)$ is an indicator function. This gives edge-corrected estimates of $K(h)$. For the remaining terms of (5.4), $\lambda_2(0)$ does not depend on $h$, while for the integral we use the weighted integral suggested by Berman and Diggle (1989).
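The following is a minimal sketch of Ripley's estimator (5.7). It assumes a rectangular window $S = [0, a] \times [0, b]$, which is convex, unlike Manhattan. The weight $w(\xi, u)$ is computed exactly as one minus the angular share of the circle cut off by the four edges. The code is a plain $O(n^2)$ loop that mirrors the formula; it is not the Splancs or SpatStat implementation, and the function names are hypothetical. Under complete spatial randomness $\hat K(h)$ should be close to $\pi h^2$.

```python
import math
import random

TWO_PI = 2.0 * math.pi


def circle_fraction_inside(cx, cy, r, width, height):
    """w(xi, u): share of the circumference of the circle with centre (cx, cy)
    and radius r that lies inside [0, width] x [0, height]."""
    if r <= 0.0:
        return 1.0
    pieces = []  # angular intervals (radians) lying outside the rectangle
    # (distance from centre to edge, direction of the edge's outward normal)
    for dist, normal in ((cx, math.pi), (width - cx, 0.0),
                         (cy, 1.5 * math.pi), (height - cy, 0.5 * math.pi)):
        if r > dist:  # the circle crosses this edge
            half = math.acos(dist / r)
            lo, length = (normal - half) % TWO_PI, 2.0 * half
            if lo + length <= TWO_PI:
                pieces.append((lo, lo + length))
            else:  # interval wraps around angle 0: split it in two
                pieces.append((lo, TWO_PI))
                pieces.append((0.0, lo + length - TWO_PI))
    # Union of the outside intervals (near corners they can overlap).
    pieces.sort()
    outside, cur_lo, cur_hi = 0.0, None, None
    for lo, hi in pieces:
        if cur_hi is None or lo > cur_hi:
            if cur_hi is not None:
                outside += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        outside += cur_hi - cur_lo
    return 1.0 - outside / TWO_PI


def ripley_k(points, h, width, height):
    """Edge-corrected Ripley estimator (5.7) of K(h) on the rectangle."""
    n = len(points)
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            u = math.hypot(xi - xj, yi - yj)
            if u <= h:  # indicator I_h(u_ij), weighted by 1 / w_ij
                total += 1.0 / circle_fraction_inside(xi, yi, u, width, height)
    return width * height * total / (n * (n - 1))


if __name__ == "__main__":
    rng = random.Random(1989)
    pts = [(rng.random(), rng.random()) for _ in range(200)]
    for h in (0.05, 0.1, 0.2):
        print(h, round(ripley_k(pts, h, 1.0, 1.0), 4), round(math.pi * h * h, 4))
```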
Plugging these estimates into (5.4) yields an estimated $\widehat{MSE}(h)$.

On the other hand, the estimated index is a function of the bandwidth. It is well known that the intensity estimator is much more sensitive to the choice of bandwidth than to the choice of kernel function, so we concentrated our efforts on the former. As a result, the difference in measured segregation between two cities can depend on the specific bandwidth selected, which suggests using the same fixed bandwidth for all cities. We present results using fixed bandwidths of 0.5 and 1 km respectively.

[20] A Cox Process is a point process such that: 1) $\{\Lambda(\xi) : \xi \in \mathbb{R}^2\}$ is a non-negative-valued stochastic process; 2) conditional on the realization $\{\Lambda(\xi) = \lambda(\xi) : \xi \in \mathbb{R}^2\}$, the point process is an IPP with intensity $\lambda(\xi)$. An IPP can thus be seen as a particular Cox process in which the distribution of $\Lambda(\xi)$ is degenerate at $\lambda(\xi)$.

[21] This is a simple but rough method of computing the optimal bandwidth. The literature on point processes usually relies on ad hoc criteria. Diggle, Zheng and Durr (2005) use cross-validated likelihood methods.

Other methods exist to estimate $\rho(\xi, m)$. Among others, Kelsall and Diggle (1998) and Diggle, Zheng and Durr (2005) use a kernel regression estimator for $\rho(\xi, m)$ and choose the optimal bandwidth by cross-validated likelihood. We do not experiment with these techniques here and leave alternative estimation approaches to future work. We conducted several experiments to find the fastest estimation procedure. For the kernel estimation we estimated the intensities on a finite $2000 \times 2000$ grid, and we experimented with finer grids to test the robustness of the results.
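The sketch below strings (5.1)-(5.3) together under the following assumptions:
- a rectangular window;
- a multiplicative quartic kernel $k(u) = \frac{15}{16}(1-u_1^2)^2 \cdot \frac{15}{16}(1-u_2^2)^2$ on $[-1,1]^2$, which is our reading of "multiplicative quartic";
- the edge correction $\int_S k_h(\eta - x_i)\,d\eta$, computed in closed form from the kernel's distribution function;
- the same $h$ for every group, with evaluation at the observed points only, as discussed in the text.

The function names are hypothetical, and the $O(n^2)$ loops favor clarity over speed.

```python
import random


def quartic(t):
    """One-dimensional quartic (biweight) kernel (15/16)(1 - t^2)^2 on [-1, 1]."""
    return 15.0 / 16.0 * (1.0 - t * t) ** 2 if abs(t) <= 1.0 else 0.0


def quartic_cdf(t):
    """Integral of quartic() from -1 to t."""
    t = max(-1.0, min(1.0, t))
    return 0.5 + 15.0 / 16.0 * (t - 2.0 * t ** 3 / 3.0 + t ** 5 / 5.0)


def k_h(dx, dy, h):
    """Multiplicative quartic kernel k_h(u) = h^-2 k(u / h)."""
    return quartic(dx / h) * quartic(dy / h) / (h * h)


def edge_mass(x, y, h, width, height):
    """Edge correction int_S k_h(eta - x_i) d eta for S = [0, width] x [0, height]."""
    mx = quartic_cdf((width - x) / h) - quartic_cdf(-x / h)
    my = quartic_cdf((height - y) / h) - quartic_cdf(-y / h)
    return mx * my


def intensity_at(events, targets, h, width, height):
    """Kernel intensity estimate (5.3) of `events`, evaluated at `targets`."""
    masses = [edge_mass(x, y, h, width, height) for x, y in events]
    return [sum(k_h(tx - x, ty - y, h) / m for (x, y), m in zip(events, masses))
            for tx, ty in targets]


def rho_hat(points, labels, h, width, height):
    """rho_hat(x_i, m) = lambda_hat(x_i, m) / lambda_hat_0(x_i), eq. (5.2),
    with the same bandwidth h for every group, at the observed points only."""
    groups = sorted(set(labels))
    lam = {g: intensity_at([p for p, l in zip(points, labels) if l == g],
                           points, h, width, height) for g in groups}
    rows = []
    for i in range(len(points)):
        lam0 = sum(lam[g][i] for g in groups)  # lambda_hat_0 = sum over groups
        rows.append({g: lam[g][i] / lam0 for g in groups})
    return rows


def spatial_dissimilarity(rows, labels):
    """Estimator (5.1) of phi_dism with rho_hat(m) = n_m / n."""
    n = len(labels)
    share = {g: labels.count(g) / n for g in set(labels)}
    num = sum(abs(row[g] - p) for row in rows for g, p in share.items())
    den = 2.0 * n * sum(p * (1.0 - p) for p in share.values())
    return num / den


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(4.0 * rng.random(), 4.0 * rng.random()) for _ in range(400)]
    labels = ["b" if x > 3.0 else "w" for x, _ in pts]  # blacks in the east strip
    print(round(spatial_dissimilarity(rho_hat(pts, labels, 0.5, 4.0, 4.0), labels), 3))
```

Computing $\hat\lambda_0$ as the sum of the group intensities is equivalent to smoothing all points with the same $h$. That is why the estimated conditional probabilities always sum to one.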
As a practical matter, when estimating the conditional probabilities we use the same bandwidth for $\hat\lambda(\xi, m)$ and $\hat\lambda_0(\xi)$. This avoids unpleasant results such as probabilities greater than one or conditional probabilities not summing to one. However, we realized that the grid can sometimes be a source of problems. Using the same grid for (say) New York and Champaign leads to a larger approximation error for New York, which is much more densely populated than Champaign. Moreover, we do not need to evaluate the intensity at every point of $S$, only at the observed locations, because computing the index (5.1) requires the conditional probabilities at the observed points only. We consider this method more precise, because the estimated indices do not rely on approximations on a finite grid.

5.3 Estimation Strategy with Count Data

When data are aggregated by area, i.e. counts per block as in the Census Summary Files, we can use an approximate kernel regression method. The metropolitan area $S$ is partitioned into $K$ disjoint subunits, $S = \bigcup_{k=1}^K S_k$ with $S_k \cap S_l = \emptyset$ for $k \neq l$. By the independent scattering property, the counts $N(S_k)$ over disjoint regions are independent. Moreover, by the definition of the intensity function and the intensity measure,
$$E N(S_k) = \int_{S_k} \lambda(\xi)\, d\xi$$
for any $k$.
This implies that we can write the number of points as

$$N(S_k) = \int_{S_k} \lambda(\xi)\, d\xi + u_k,$$

where $u_k$ is a mean-zero error, uncorrelated across blocks because of the independence across disjoint regions. If λ is continuous and $S_k$ is connected, the mean value theorem for integrals gives a point $\bar\xi \in S_k$ such that $\int_{S_k} \lambda(\xi)\, d\xi = \lambda(\bar\xi)\,|S_k|$, and we can write

$$N(S_k) = \lambda(\bar\xi)\,|S_k| + u_k. \tag{5.8}$$

If we assume that λ(ξ) is a very smooth function and that the block area $|S_k|$ is small, we can approximate (5.8) for $\xi \in S_k$ with

$$N(S_k) \approx \lambda(\xi)\,|S_k| + u_k.$$

This allows us to use a kernel regression approach to estimate the expected number of points in $S_k$, $E[N(S_k) \mid \xi] \approx \lambda(\xi)\,|S_k|$, and thus the function $\lambda(\xi)\,|S_k|$ can be estimated as

$$\widehat{\lambda(\xi)\,|S_k|} = \sum_{i=1}^{n} \frac{K_h(\xi - x_i)}{\sum_{j=1}^{n} K_h(\xi - x_j)}\, n_i,$$

where the $x_i$'s are the centroids of the census blocks and $n_i$ is the number of people in block i. Using this procedure we can estimate $\hat\lambda_0(\xi)\,|S_k|$ and $\hat\lambda(\xi, m)\,|S_k|$, and taking the ratio we get an estimate of ρ(ξ, m):

$$\hat\rho(\xi, m) = \frac{\hat\lambda(\xi, m)}{\hat\lambda_0(\xi)} = \frac{\sum_{i=1}^{n} K_h(\xi - x_i)\, n_{mi}}{\sum_{i=1}^{n} K_h(\xi - x_i)\, n_{0i}} \tag{5.9}$$

where $n_{0i}$ is the number of people living in block i and $n_{mi}$ is the number of people of race m living in block i; we use the estimated conditional probabilities evaluated at the block centroids to compute the index. In Appendix C we present indices estimated by smoothing the proportion of each racial group with the same approach. A practical alternative would be to assume that all the individuals in a block are concentrated at its centroid, which is equivalent to assuming that the intensity at the centroid equals the total number of people in the block. This procedure is very appealing in practice but contradicts the point process assumptions (placing many individuals at the same point violates orderliness), so we prefer the kernel regression approach just shown. In Appendix B we present alternative parametric estimation methods for point patterns and count data.
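The ratio in (5.9) can be sketched in a few lines. This is a minimal illustration that assumes a Gaussian kernel (the kernel choice is our assumption) and uses hypothetical block centroids and counts. Both smooths share the normalizing sum Σ_j K_h(ξ − x_j), so it cancels, and since n_mi ≤ n_0i in every block the estimate always lies in [0, 1].

```python
import numpy as np

def rho_hat_from_counts(xi, centroids, n_m, n_0, h):
    """Estimate rho(xi, m) as in (5.9) from block-level counts.

    xi        : (q, 2) evaluation points (e.g. the block centroids themselves)
    centroids : (n, 2) block centroids x_i
    n_m, n_0  : (n,) counts of group m and of the total population in each block
    h         : bandwidth, in the same units as the coordinates
    """
    d2 = ((xi[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * h ** 2))  # Gaussian K_h up to a constant that cancels in the ratio
    return (w @ np.asarray(n_m, dtype=float)) / (w @ np.asarray(n_0, dtype=float))

# Three hypothetical blocks along a line, 1 km apart, 100 residents each.
centroids = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
n_0 = np.array([100, 100, 100])
n_m = np.array([90, 50, 10])

print(rho_hat_from_counts(centroids, centroids, n_m, n_0, h=0.01))    # the block shares 0.9, 0.5, 0.1
print(rho_hat_from_counts(centroids, centroids, n_m, n_0, h=1000.0))  # all close to 150 / 300 = 0.5
```

With a very small bandwidth each centroid sees only its own block and the estimate reproduces the block shares; with a very large one it tends to the citywide share.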
For count data in particular, we can recover the intensity function by maximum likelihood, as long as we assume a parametric model for the intensity and know the polygonal shape and coordinates of each block.22

6 Results

6.1 Artificial Data

In Figure 6 we plot the estimated MSE as a function of the bandwidth h. For most of the artificial cities the search for the optimal bandwidth is not hard. In general, as expected, the optimal bandwidth for the overall spatial pattern is larger than the one for blacks: given the segregation pattern, blacks are much closer to each other and a narrower kernel is needed. [Insert Figure 6 here] The selected bandwidths are summarized in the following table.23

22 The procedure is computationally very demanding, since it requires computing the intensity measure (an integral) for each block and iterating the numerical maximization routine to find the parameters of the intensity.

23 We had a minor problem with the bandwidth selection for City E. As shown in the table, the optimal h0 for City E would be 0.005, which gives kernel estimates that are just small circular regions of almost null radius. Given the final goal of obtaining a concrete estimate of the conditional probabilities ρ(ξ, b), we experimented with various arbitrary values of h0, and in the end we considered it appropriate to use the bandwidth for whites.

Table 2: Optimal bandwidths

          Total    Blacks     Whites
City A    2.83     0.418      2.43
City B    2.605    0.264      0.37
City C    0.37     0.264      0.62
City D    2.445    0.194      2.96
City E    2.395    0.005²⁴    2.85
City F    2.73     2.78       2.75

To give an idea of what the intensity estimates look like, we show the kernel estimates for City C in Figures 7 to 9. The visualization is suggestive of why we should be concerned about neighborhood-based indices of segregation, since the intensity seems to vary a lot over the city's area, at least for some cities.
In Figure 7 we present the estimated intensity for the population as a whole, in Figure 8 the one for blacks and in Figure 9 the one for nonblacks. In Figure 10 we show the estimated conditional probability of blacks, which is smoothed out at the border of the central black neighborhood.25 [Insert Figure 7, 8, 9 and 10 here] The comparison of the spatial dissimilarity with the standard dissimilarity index shows some interesting patterns, as shown in Table 3.

Table 3: Spatial Dism vs. Traditional

          Spatial Dism   Dism
City A    0.9225333      1
City B    0.900698       1
City C    0.9061751      1
City D    0.803017       0.7816667
City E    0.8993939      0.8816667
City F    0.03108531     0.1216667

For the "extreme" cities A, B and C the spatial dissimilarity is smaller than its standard counterpart: this is the result of smoothing out the conditional probability ρ(ξ, b) over the region, as a consequence of computing the index from the individual locations. For cities D and E, those with a non-square central ghetto, the two indices broadly agree and are very close, although this would change if we changed the definition of the neighborhoods (see below). For the other "extreme" city, the perfectly integrated F, the spatial dissimilarity measures less segregation than the standard measure. The Modifiable Areal Unit Problem (MAUP) does not affect the spatial dissimilarity by definition, while it heavily alters the standard measure: if we compute the dissimilarity index using different levels of aggregation of the data, we get different numbers. The problem is amplified when there is a very high level of segregation, because smaller subunits are more homogeneous than bigger ones; hence the index will be higher with smaller neighborhoods than with bigger ones. The results of our simulations are shown in Table 4.

24 The bandwidth actually used in the estimation is the one for whites, i.e. h0 = 2.85. This avoids erratic behaviour of the estimated intensities and conditional probabilities.

25 When looking at these estimates, we should keep in mind that we used a different bandwidth for each type (total population, blacks and whites), so the visible differences in intensity over space should not be interpreted as relevant for the computation of the conditional probabilities, where we use the same bandwidth for blacks and for the total population.

Table 4: Dissimilarity and MAUP

Neighborhoods   City D (φdism = 0.803017)   City E (φdism = 0.8993939)
4               0.07666667                  0.495
16              0.7816667                   0.8816667
64              0.7558333                   1.0

We computed the dissimilarity index for several partitions of the cities, into 4, 16 and 64 neighborhoods respectively. For City E we see a clear increase of the index as we increase the number of neighborhoods. Surprisingly, for City D the index is not monotonically increasing in the number of neighborhoods: it increases from 4 to 16 neighborhoods, while it decreases from 16 to 64. Table 4 thus suggests another potential problem of the neighborhood-based approach: the relationship between the scale of the partition and the index is not necessarily monotonic. This does not happen in our framework: we will show that the spatial dissimilarity is a monotonically decreasing function of the bandwidth.

6.2 Metropolitan Areas Census Data

We have computed the index of spatial dissimilarity for all racial groups and all US metropolitan areas in 2000, using the different estimation methods. The computed indices are available at the website http://netfiles.uiuc.edu/amele2/www/pps/. For ease of exposition we analyze only black and multigroup segregation, showing results for a sample of 9 MSAs: Detroit, New York, Chicago, Los Angeles, San Francisco, Philadelphia, Boston, Cleveland and Champaign-Urbana.
This is enough to show some of the properties of our measure and to compare it with the traditional approach.

6.2.1 Black Segregation

In Figure 11 we report the estimated MSE for the black population in the New York PMSA, as an example of the estimation procedure. The minimizer corresponds to an optimal bandwidth of 348 meters. [Insert Figure 11 here] The corresponding estimated conditional probability is shown in Figure 12: the three main black areas in the Bronx, Harlem and Bedford-Stuyvesant shown in Figure 4 above correspond to the whiter areas in Figure 12, where the conditional probability is close to or equal to one.26 [Insert Figure 12 here]

26 The reader should be aware that Figure 12 is drawn on a 1000 × 1000 grid, coarser than the grid we actually used in estimation. The main pattern is nonetheless clear even at the lower resolution.

In Table 5 we present the principal result: we compare the spatial dissimilarity with the traditional indices computed using blocks and tracts as subunits. The indices reported in column 1 of Table 5 are obtained using the approximate kernel regression method. Both the levels of segregation and the ranking of the cities differ from those implied by the traditional approach.
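The traditional indices in columns 2 and 3 of Table 5 use the standard two-group dissimilarity D = ½ Σ_k |a_k/A − b_k/B|, where a_k and b_k are the counts of the two groups in subunit k and A and B their totals. The sketch below uses hypothetical counts for four blocks and two hypothetical ways of grouping them into tracts, to show how the same residents yield different values of D depending on the aggregation (the modifiable areal unit problem discussed in Section 6.1).

```python
import numpy as np

def dissimilarity(group_a, group_b):
    """Two-group dissimilarity D = 0.5 * sum_k |a_k / A - b_k / B| over subunits k."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    return 0.5 * np.abs(a / a.sum() - b / b.sum()).sum()

def aggregate(counts, unit_of_block):
    """Sum block counts into larger units; unit_of_block[k] is the (0-based) unit of block k."""
    return np.bincount(unit_of_block, weights=np.asarray(counts, dtype=float))

blacks = [90, 10, 80, 20]   # hypothetical block counts
whites = [10, 90, 20, 80]

d_blocks = dissimilarity(blacks, whites)   # about 0.7
zoning_1 = [0, 0, 1, 1]                    # blocks {1, 2} and {3, 4} form two tracts
zoning_2 = [0, 1, 0, 1]                    # blocks {1, 3} and {2, 4} form two tracts
d_tracts_1 = dissimilarity(aggregate(blacks, zoning_1), aggregate(whites, zoning_1))  # about 0.0
d_tracts_2 = dissimilarity(aggregate(blacks, zoning_2), aggregate(whites, zoning_2))  # about 0.7
print(d_blocks, d_tracts_1, d_tracts_2)
```

The first zoning mixes each black-majority block with a white-majority one and erases all measured segregation; the second keeps like blocks together and preserves it, although the residents are the same in both cases.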
Table 5: Spatial Dissimilarity vs Traditional (2000), Indices and Rankings

                 Indices                          Rankings
                 Spatial Dism   Blocks   Tracts   Spatial Dism   Blocks   Tracts
Detroit          0.8701         0.8655   0.8405   1st            1st      1st
New York         0.6903         0.7013   0.6714   5th            6th      5th
Chicago          0.7632         0.8215   0.7789   2nd            2nd      2nd
Los Angeles      0.6148         0.6266   0.5765   6th            7th      7th
San Francisco    0.5217         0.6149   0.5528   9th            8th      8th
Philadelphia     0.7276         0.7565   0.6897   4th            4th      4th
Boston           0.6009         0.7084   0.6364   7th            5th      6th
Cleveland        0.7532         0.8096   0.7713   3rd            3rd      3rd
Champaign        0.5937         0.6055   0.4468   8th            9th      9th

One could object that this is just a consequence of smoothing the neighborhood-based index; if that were the case, we would expect the estimated spatial dissimilarity to lie between the values in columns 2 and 3. This is not the case for Detroit, Chicago, San Francisco, Boston and Cleveland, so we conclude that our index is not only a smoothed version of the neighborhood-based indices but detects aspects of segregation that traditional indices cannot. We consider this the most reliable nonparametric method when using count data.27 In Table 6 we present results using kernel regression with different bandwidths. One of the advantages of our approach is the possibility of computing segregation indices at different scales.28 The scale is a proxy for the local environment of individuals, and by varying the bandwidth we vary the scale of the measurement. Moreover, as noted in the methodological section, MSE minimization (or any other bandwidth selection method) prescribes a different bandwidth for each city according to the specific morphology of the metropolitan area, so the measured segregation may depend directly on the differences between the bandwidths of different cities. Using the same bandwidth for all cities may therefore give more comparable estimates.

27 For parametric methods we refer the reader to Appendix B.
Notice that there is a price to pay when using parametric methods with count data: we need to specify a parametric model for the intensity function, and we need to know the coordinates of the block boundaries in order to integrate the intensity function over each block.

28 I am extremely grateful to Patrick Bayer for this suggestion.

Table 6: Spatial Dissimilarity, different bandwidths (h in km)

                 Indices                        Rankings
                 Optimal   h = 0.5   h = 1      Optimal   h = 0.5   h = 1
Detroit          0.8701    0.8536    0.8380     1st       1st       1st
New York         0.6903    0.6878    0.6679     5th       5th       5th
Chicago          0.7632    0.7552    0.7400     2nd       3rd       3rd
Los Angeles      0.6148    0.6027    0.5795     6th       6th       6th
San Francisco    0.5217    0.5275    0.5031     9th       9th       9th
Philadelphia     0.7276    0.7079    0.6738     4th       4th       4th
Boston           0.6009    0.5999    0.5780     7th       7th       7th
Cleveland        0.7532    0.7588    0.7418     3rd       2nd       2nd
Champaign        0.5937    0.5862    0.5459     8th       8th       8th

The first column of Table 6 reproduces the kernel regression estimates of Table 5. Columns 2 and 3 report the estimates with a bandwidth of 0.5 km and 1 km respectively for every city. For these 9 cities the ranking obtained with a 0.5 km bandwidth is identical to the one obtained with a 1 km bandwidth, while both differ from the one obtained by MSE minimization (Chicago and Cleveland swap places).29 We computed the indices for the 9 MSAs varying the bandwidth from 0.1 to 3 km: theoretically we expect the index to converge to zero as the bandwidth increases, since the estimated process converges to a homogeneous process with zero segregation. [Insert Figure 13 here] Reardon et al. (2006) and Feitosa et al. (2006) found the same result, but since they do not assume any stochastic process they cannot give a theoretical justification of the negative relationship. Notice that the rankings are quite stable as a function of the bandwidth.

6.2.2 Multigroup Segregation

For the multigroup version of the dissimilarity we computed similar tables. Table 7 has the same structure as Table 5.
In order to avoid conditional probabilities that do not sum to one, we have to use the same bandwidth for every racial group, and it is then quite arbitrary to decide which one to use. We think the optimal bandwidth for the entire population is the safest choice, so we used it in all computations.

29 The ranks in columns 5 and 6 are the same only for the 9 cities in our tables; they differ for other cities.

Table 7: Multigroup Spatial Dissimilarity vs Traditional (2000)

                 Indices                        Rankings
                 Spat Dism   Blocks   Tracts    Spat Dism   Blocks   Tracts
Detroit          0.8286      0.8530   0.8190    1st         1st      1st
New York         0.6054      0.6647   0.4783    5th         6th      6th
Chicago          0.6563      0.7705   0.7213    4th         4th      3rd
Los Angeles      0.4834      0.5381   0.4780    8th         8th      7th
San Francisco    0.4770      0.5276   0.4706    9th         9th      8th
Philadelphia     0.6966      0.7931   0.7127    3rd         3rd      4th
Boston           0.5336      0.6713   0.5479    7th         5th      5th
Cleveland        0.7208      0.8345   0.7980    2nd         2nd      2nd
Champaign        0.5466      0.5780   0.4376    6th         7th      9th

As in the dichotomous case, the spatial dissimilarity in column 1 implies different levels of segregation than the traditional one, and the ranking is slightly different too. In Table 8 we repeat the exercise of Table 6, using a fixed bandwidth for all cities.
The differences in the indices are not striking, and the rankings are quite similar.30

Table 8: Multigroup Segregation, different bandwidths (2000, h in km)

                 Indices                        Rankings
                 Optimal   h = 0.5   h = 1      Optimal   h = 0.5   h = 1
Detroit          0.8286    0.7945    0.7734     1st       1st       1st
New York         0.6054    0.6020    0.5803     5th       5th       5th
Chicago          0.6563    0.6412    0.6216     4th       4th       4th
Los Angeles      0.4834    0.4715    0.4488     8th       9th       9th
San Francisco    0.4770    0.4815    0.4618     9th       8th       8th
Philadelphia     0.6966    0.6712    0.6338     3rd       3rd       3rd
Boston           0.5336    0.5322    0.5053     7th       7th       6th
Cleveland        0.7208    0.7287    0.7071     2nd       2nd       2nd
Champaign        0.5466    0.5389    0.5028     6th       6th       7th

The first conclusion we may draw from these tables is that the bandwidth choice is crucial for the correct measurement of segregation levels, and more research is needed to improve the quality of the estimates. Nonetheless, the results are suggestive of the differences between our measures and the neighborhood-based approach. In Table 9 we report the correlations between our indices and the neighborhood-based ones: the standard dissimilarity, the isolation index, the information index and the Gini index (see Massey and Denton (1988) or Reardon and Firebaugh (2004) for a detailed description). For blacks we also show the correlation with the Spectral Segregation Index of Echenique and Fryer (2006), which is the only index based on individual locations available in the literature.

30 If we consider all the cities there are some differences.
Table 9: Correlations with traditional indices

Panel A: Blacks

                 SDism (opt)   SDism (.5km)   SDism (1km)   SSI      Dism     Isol     Info
SDism (.5km)     0.9773
SDism (1km)      0.9462        0.9702
SSI              0.7044        0.7213         0.7905
Dissimilarity    0.6675        0.6640         0.7522        0.5740
Isolation        0.7371        0.7468         0.8227        0.9000   0.7810
Information      0.7290        0.7313         0.8234        0.7926   0.9210   0.9545
Gini             0.6749        0.6706         0.7577        0.5905   0.9897   0.7797   0.9180

Panel B: Multigroup

SDism (.5km) 0.9854 SDism (1km) 0.9728 Dissimilarity 0.7484 Isolation 0.7241 Information 0.7470 Gini 0.7430 0.9544 0.8442 0.9402 0.9825 0.7485 0.7258 0.7429 0.7455 0.8244 0.7990 0.8176 0.8157 0.8821 0.9530 0.9860

For the dichotomous version of the index (blacks) in Panel A, the correlation with the standard dissimilarity is between 0.6640 and 0.7522, indicating that we are not just replicating the dissimilarity: our index captures something the neighborhood-based dissimilarity cannot, namely the individual locations and the exposure to other races (through the spatially varying conditional probabilities). Similarly, the correlation with the Gini index is between 0.6706 and 0.7577. Notice that the Gini index and the dissimilarity are almost perfectly correlated (0.9897). The correlations with the Information and Isolation indices are slightly higher but still far from one. The correlation with the Spectral Segregation Index (SSI) varies from 0.7044 to 0.7905, showing that we are not just replicating the measure of Echenique and Fryer (2006). Also notice the high correlation between the SSI and the Isolation index (0.9000): the SSI is based on interactions among points, so it is not surprising that it is highly correlated with the isolation index, which measures exposure to neighbors of one's own group. The correlations for the multigroup indices in Panel B are slightly higher, but the pattern is similar. In Appendix C we present the results obtained using the kernel regression method for smoothing the proportions in each block.
The results do not perfectly overlap with the ones presented here, since with proportions the approximation is more demanding in terms of smoothness of the intensity. All these results show that the choice of the estimation method is important in this context: the researcher should choose the estimation strategy based on data availability (point pattern or count data), but also on a priori information about the smoothness of the intensity. We leave the development of alternative methods to future research; the interested reader can get a flavor of what can be done with parametric methods in Appendix B.

7 Conclusion and Discussion

In this work we have presented a new approach to measuring residential segregation, with an application to racial segregation. We assume that the locations of individuals of different racial groups follow an Inhomogeneous Marked Poisson Process over the metropolitan area, and we compute the conditional probability that an individual at a specific location belongs to racial group m. If there is no segregation, this conditional probability should not vary over space. We build a segregation index analogous to the dissimilarity and show that it is immune to the problems arising with neighborhood-based measures: it does not depend on arbitrary partitions of the city into neighborhoods, it is a function of the individuals' locations, and it is immune to the modifiable areal unit problem. Furthermore, the index computed according to our approach ranks cities differently from traditional measures, showing that this methodology is not merely a refinement of the existing indices. This framework is very flexible, and future research will be devoted to exploring its potential. The main assumption we imposed is that the pattern of locations is a realization of an Inhomogeneous Multitype Poisson process over the metropolitan area: this amounts to assuming that the univariate Poisson processes are independent, i.e.
for example, the spatial location of blacks is independent of the spatial location of whites in the urban area. While this assumption may seem inappropriate when measuring racial segregation, it provides a useful benchmark. Future research will explore models where the interaction between different events is explicitly modeled. Of particular interest are Marked Pairwise Interaction Models, which can take into account the possible "repulsion" or "attraction" between different events/points and marks.31 Another interesting application is the measurement of income segregation, where the mark space is continuous. The definition of complete segregation is slightly different: the marked point process X is completely segregated if and only if for all $\xi \in x_0$ there exists $m^* = m^*(\xi) \in M$ such that $\rho(\xi, m) = \delta(m - m^*)$, where δ(u) is the Dirac delta function. In this case the spatial dissimilarity is

$$\phi_{dism} = \frac{1}{2n} \sum_{\xi \in x_0} \int_M |\rho(\xi, m) - \rho(m)|\, dm$$

where we have replaced the sum over racial groups by an integral over $M = [0, \infty)$ and use an analogous normalization. Estimation is also more complicated in this case, so we refer the reader to a companion paper in progress. We have shown that the framework can easily be extended to the measurement of subgroup segregation or multilevel segregation. It should be noted that with multidimensional marks the independence assumption holds for the vector m and not for its components: for example, if we consider the segregation of race (r) and income (y) together, we assume independence across pairs m = (r, y) but not between r and y. This means that we can account for any correlation between the submarks r and y, and use the joint distribution of race and income ρ(r, y) to describe the non-segregated case. We leave this development to future efforts. In this work we have presented a "descriptive" theory of segregation, where the index is a function of the specific realization.
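For intuition about the continuous-mark formula, the sketch below evaluates φ_dism numerically when the conditional mark densities ρ(ξ, m) are known at each location. Everything here is an assumption made for illustration: the densities are hypothetical normals, M is truncated to [0, 20], the marginal ρ(m) is taken as the average of the location densities, and the integral uses the trapezoidal rule. Estimating ρ(ξ, m) from data, the hard part, is left to the companion paper.

```python
import math
import numpy as np

def spatial_dissimilarity_continuous(rho_cond, m_grid):
    """phi_dism = 1/(2n) * sum over locations of the integral over M of |rho(xi, m) - rho(m)| dm.

    rho_cond : (n, G) array, conditional mark density rho(xi, m) at each of n locations,
               evaluated on the grid m_grid of length G.
    The marginal rho(m) is taken as the average of the rows (an assumption of this sketch).
    """
    n = rho_cond.shape[0]
    rho_marginal = rho_cond.mean(axis=0)
    absdiff = np.abs(rho_cond - rho_marginal)
    # trapezoidal rule along the mark axis
    integrals = np.sum((absdiff[:, 1:] + absdiff[:, :-1]) * np.diff(m_grid) / 2.0, axis=1)
    return integrals.sum() / (2.0 * n)

def normal_pdf(x, mu, sigma=1.0):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical marks (e.g. income in $10k) on a truncated M = [0, 20].
m = np.linspace(0.0, 20.0, 20001)

# Two locations whose mark densities are N(5, 1) and N(7, 1).
two_places = np.vstack([normal_pdf(m, 5.0), normal_pdf(m, 7.0)])
phi = spatial_dissimilarity_continuous(two_places, m)
# Closed form for this case: (2 * Phi(1) - 1) / 2 = erf(1 / sqrt(2)) / 2, about 0.3413.
print(phi, math.erf(1.0 / math.sqrt(2.0)) / 2.0)

# Three locations with the same mark density: no segregation, so phi is zero.
same = np.vstack([normal_pdf(m, 5.0)] * 3)
print(spatial_dissimilarity_continuous(same, m))
```

The two-location case has a closed form because each location's density differs from the 50/50 mixture by half the total variation between two unit-variance normals two units apart, which makes it a convenient check on the numerical integration.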
The index, however, is itself a random variable, being a function of the point process realization. We can therefore build a test that assesses probabilistically whether a city is more segregated than another one. The tests provided in Kelsall and Diggle (1998) and in Diggle, Zheng and Durr (2005) are for detection only: the null hypothesis is no segregation, and rejecting it means that there is segregation, without reference to its level. The development of tests is strongly constrained by computational speed, so experimenting with faster and more precise estimation methods is necessary. We plan to experiment with kernel regression methods (Kelsall and Diggle (1998) and Diggle, Zheng and Durr (2005)), total variation regularization methods used in density estimation (see for example Koenker and Mizera (2004))32 and other smoothing techniques.

31 See Diggle (2003) or Moller and Waagepetersen (2004) for an extensive introduction to these models.

The parametric methods analyzed in Appendix B are appealing when we have count data at the block level or for other small areas, as long as we have the boundaries of these polygons. The drawbacks are that we have to specify a parametric model for the intensity and that the numerical optimization routine can be very slow. Finally, in this work we present what can be called a "descriptive" theory of segregation indices. As noted above, the index is a random variable depending on the realization of the point process: ideally we would like to know how the segregation index changes as a function of the process parameters. In Mele (2007, in progress) we provide some theoretical results for indices under the same assumptions used in this work. The Poisson assumption simplifies the problem of computing the moments (in particular the expectation and variance) of the index. We prove that, given an intensity function, we are able to compute the moments of any index.
The only drawback is that in practice these moments are computationally very difficult, involving the evaluation of an infinite sum of infinite integrals. However, if we restrict attention to indices of the form $\sum_{\xi \in X} h(\xi)$, i.e. if we define the index as the sum over all events of a particular individual-level index (or function), we can show that the moments reduce to a simple integral that can be computed with traditional numerical integration methods. This allows us to estimate whether (say) New York is on average structurally more segregated than Chicago, where "structurally" means conditional on the intensity function.

32 These methods are likely to produce better results with income segregation, where the continuity of the marks creates problems in kernel estimation in the form of the Dirac catastrophe.

References

[1] Ananat, Elizabeth Oltmans (2007), "The Wrong Side(s) of the Tracks: Estimating the Causal Effect of Racial Segregation on City Outcomes", mimeo, Duke University and NBER
[2] Anselin, Luc (1995), "Local Indicators of Spatial Association - LISA", Geographical Analysis 27(2):93-115
[3] Baddeley, Adrian and Turner, Rolf (2005), "spatstat: An R Package for Analyzing Spatial Point Patterns", Journal of Statistical Software 12(6):1-42
[4] Berman, Mark and Diggle, Peter (1989), "Estimating Weighted Integrals of the Second-Order Intensity of a Spatial Point Process", Journal of the Royal Statistical Society, Series B 51(1):81-92
[5] Card, David and Rothstein, Jesse (2007), "Racial Segregation and the Black-White Test Score Gap", Journal of Public Economics, forthcoming
[6] Cutler, D. M. and Glaeser, E. L. (1997), "Are Ghettos Good or Bad?", Quarterly Journal of Economics 112:827-872
[7] Cutler, D. M., Glaeser, E. L. and Vigdor, Jacob L. (1999), "The Rise and Decline of the American Ghetto", Journal of Political Economy 107(3):455-506
[8] Daley, D. J. and Vere-Jones, D. (2003), "An Introduction to the Theory of Point Processes", Springer, 2nd Edition
[9] Diggle, Peter (1983), "Statistical Analysis of Spatial Point Patterns", Academic Press, London, First Edition
[10] Diggle, Peter (1985), "A Kernel Method for Smoothing Point Process Data", Applied Statistics 34(2):138-147
[11] Diggle, Peter (2003), "Statistical Analysis of Spatial Point Patterns", Academic Press, London, Second Edition
[12] Diggle, P. J., Eglen, S. J. and Troy, J. B. (2006), "Modelling the Bivariate Spatial Distribution of Amacrine Cells", in A. Baddeley et al. (Eds.), Case Studies in Spatial Point Process Modelling, Springer Lecture Notes in Statistics 185:215-233
[13] Diggle, Peter, Zheng, Pingping and Durr, Peter (2005), "Nonparametric Estimation of Spatial Segregation in a Multivariate Point Process: Bovine Tuberculosis in Cornwall, UK", Applied Statistics 54(3):645-658
[14] Echenique, Federico and Fryer, Roland (2007), "A Measure of Segregation Based on Social Interactions", Quarterly Journal of Economics 122(2):441-485
[15] Feitosa, Flavia, Camara, Gilberto, Monteiro, Antonio M. V., Koschitzki, Thomas and Silva, Marcelino P. S. (2007), "Global and Local Spatial Indices of Urban Segregation", International Journal of Geographical Information Science 21(3):299-323
[16] Glaeser, E. L. and Vigdor, Jacob L. (2000), "Racial Segregation in 2000 Census: Promising News", Center of Urban and Metropolitan Policy, The Brookings Institution Survey Series, April
[17] Kelsall, Julia E. and Diggle, Peter J. (1998), "Spatial Variation in Risk of Diseases: A Nonparametric Binary Regression Approach", Applied Statistics 47(4):559-573
[18] Koenker, Roger and Mizera, Ivan (2004), "Penalized Triograms: Total Variation Regularization for Bivariate Smoothing", Journal of the Royal Statistical Society: Series B (Statistical Methodology) 66(1):145-163
[19] La Ferrara, Eliana and Mele, Angelo (2006), "Racial Segregation and Public School Expenditure", CEPR Discussion Paper 5750
[20] Massey, Douglas S. and Denton, Nancy A. (1988), "The Dimensions of Residential Segregation", Social Forces 67(2):281-315
[21] Mele, Angelo (2007), "Poisson Indices of Segregation", in progress, UIUC
[22] Moller, Jesper and Waagepetersen, Rasmus Plenge (2004), "Statistical Inference and Simulation for Spatial Point Processes", Monographs on Statistics and Applied Probability 100, Chapman and Hall
[23] Reardon, Sean F. and Firebaugh, Glenn (2002), "Multigroup Segregation Indices", Sociological Methodology 32:33-68
[24] Reardon, Sean F. and O'Sullivan, David (2004), "Measures of Spatial Segregation", Sociological Methodology 34:121-162
[25] Reardon, Sean F., O'Sullivan, David, Lee, Barrett A., Firebaugh, Glenn and Farrell, Chad (2006), "The Segregation Profile: Investigating How Metropolitan Racial Segregation Varies by Spatial Scale", WP 06-01, Stanford University
[26] Rowlingson, B. S. and Diggle, P. J. (1993), "Splancs: Spatial Point Patterns Analysis Code in S-Plus", Computers and Geosciences 19:627-655
[27] Stoyan, D., Kendall, W. S. and Mecke, J. (1987), "Stochastic Geometry and Its Applications", John Wiley and Sons
[28] Stoyan, D. and Stoyan, H. (1994), "Fractals, Random Shapes and Point Fields: Methods of Geometrical Statistics", Wiley Series in Probability and Mathematical Statistics, John Wiley and Sons
[29] Zhuang, J., Ogata, Y. and Vere-Jones, D. (2005), "Diagnostic Analysis of Space-Time Branching Processes for Earthquakes", Chapter 15 (pp. 275-290) in Baddeley, A., Gregori, P., Mateu, J., Stoica, R. and Stoyan, D. (Eds.), Case Studies in Spatial Point Process Models, Springer-Verlag, New York

A Preliminaries of Point Process Theory

This appendix draws on various sources and covers only what we consider a prerequisite for understanding the methodology used to measure residential segregation; we refer the interested reader to the books listed in the references for a more exhaustive treatment of the theory.
A.1 Basic properties and definitions

A spatial point process is a stochastic process whose realizations are countable sets of points in a planar region. More generally, a point process X is a random countable subset of a space S ⊆ R². We denote the random set by X = {x_i} or, depending on the context, by the random variable N(A), the number of points in the set A ⊆ S. We denote realizations of X by x and realizations of N by n. We denote a generic point in S by ξ or η and a generic point of the process by x_i. We write |A| for the area of region A and dξ for the infinitesimal region containing ξ. We consider only locally finite point fields. Formally:

DEFINITION A.1 Consider any realization x ⊆ S of the process and denote the cardinality of the set by n(x). We say that x is locally finite if n(x ∩ A) < ∞ for any bounded A ⊆ S.

Consider the set of all such realizations,

$$N_{lf} = \{x \subset S : n(x \cap A) < \infty \text{ for any bounded } A \subseteq S\},$$

whose elements are locally finite point configurations. In the following we consider only processes X with realizations in $N_{lf}$. The first important concept is stationarity: a point process is stationary if, when observed through different windows on the plane, its point configurations look alike, with differences arising only from randomness that follows the same laws. More formally, a point process is stationary if all probability statements about the process in any bounded set A of the plane are invariant under arbitrary translations. This property is very important in defining the randomness of the process, as we will see below.

DEFINITION A.2 (Stationarity) A point process X is stationary if for any p ∈ R² the translated process X_p = X + p = {x_i + p : x_i ∈ X} and X have the same distribution, i.e. P(X ∈ F) = P(X_p ∈ F) for every event F. This implies that all statistics are invariant under translation; e.g. E N(A) = E N_p(A), so the expected number of points per unit area is constant over space.
A point process is isotropic if the invariance holds under arbitrary rotations.

DEFINITION A.3 (Isotropy) A point process X is isotropic if for any rotation r about the origin the processes X and rX = {r x_i : x_i ∈ X} have the same distribution, i.e. P(X ∈ F) = P(rX ∈ F).

A process that is stationary and isotropic is called motion-invariant. For convenience we also assume that the process is simple (or orderly), i.e. that multiple coincident events cannot occur. Formally:

DEFINITION A.4 (Orderliness) A point process X is orderly (simple) if x_i ≠ x_j for all i ≠ j.

A.2 First and Second Order Properties

Consider a process X defined over S ⊆ R². The intensity function is a locally integrable function λ : S → [0, ∞), defined as the limit of the expected number of points per infinitesimal area,

$$\lambda(\xi) = \lim_{|d\xi| \to 0} \left\{ \frac{E[N(d\xi)]}{|d\xi|} \right\} \tag{A.1}$$

A function is locally integrable if $\int_A \lambda(\xi)\, d\xi < \infty$ for all bounded A ⊆ S. If we assume stationarity, then λ(ξ) = λ for all ξ. The second-order intensity function is defined as

$$\lambda_2(\xi, \eta) = \lim_{|d\xi|, |d\eta| \to 0} \left\{ \frac{E[N(d\xi)\, N(d\eta)]}{|d\xi|\, |d\eta|} \right\} \tag{A.2}$$

and it is a measure of the spatial association of the process. If we assume stationarity and isotropy, then λ₂(ξ, η) = λ₂(‖ξ − η‖), a function of the Euclidean distance between the two points. It is convenient to define another quantity, the intensity measure of a point process X, defined for A ⊆ S as

$$\Lambda(A) = E\, N(A) = \int_A \lambda(\xi)\, d\xi \tag{A.3}$$

It is usually assumed that Λ is locally finite, i.e. Λ(A) < ∞ for all bounded A ⊆ S, and diffuse, i.e. Λ({ξ}) = 0 for all ξ ∈ S (equivalently, there is no ξ ∈ S such that Λ({ξ}) > 0). For the Poisson processes considered below, the fact that Λ is diffuse implies that P[N(dξ) > 1] = o(|dξ|), i.e. there are no coincident points, so the process is simple (orderly).
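For a Poisson process, N(dξ) is Poisson with mean Λ(dξ), which equals λ|dξ| when the intensity is constant on the small cell dξ, so the orderliness condition can be checked in closed form. The sketch below assumes a hypothetical intensity λ = 3 and shrinking cell areas: P[N(dξ) = 1]/|dξ| approaches λ, while P[N(dξ) > 1]/|dξ| vanishes, roughly like λ²|dξ|/2.

```python
import math

def cell_probabilities(lam, area):
    """For a Poisson count N(d xi) with mean lam * area, return P[N = 1] and P[N > 1]."""
    mu = lam * area
    p_one = mu * math.exp(-mu)
    p_more = -math.expm1(-mu) - p_one   # 1 - P[N = 0] - P[N = 1], computed stably
    return p_one, p_more

lam = 3.0  # hypothetical intensity at xi, points per unit area
rows = []
for area in [1e-1, 1e-2, 1e-3, 1e-4]:
    p_one, p_more = cell_probabilities(lam, area)
    rows.append((area, p_one / area, p_more / area))
    print(f"|d xi| = {area:.0e}:  P[N=1]/|d xi| = {p_one / area:.5f},  P[N>1]/|d xi| = {p_more / area:.2e}")
```

Using `math.expm1` avoids the cancellation in 1 − exp(−μ) for tiny μ, which would otherwise swamp the o(|dξ|) term being illustrated.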
The intensity function also has an infinitesimal interpretation: since P[N(dξ) > 1] = o(|dξ|), E[N(dξ)] converges to P[N(dξ) = 1] as |dξ| → 0 (footnote 33). It follows that the quantity λ(ξ)dξ can be interpreted as the probability of an event in the infinitesimal region dξ, i.e. λ(ξ)dξ ≈ P[N(dξ) = 1]. Analogously, E[N(dη)N(dξ)] ≈ P[N(dη) = N(dξ) = 1] for infinitesimal dξ and dη, and we can interpret the quantity λ2(ξ, η)dξdη as the probability of observing one event in each of the infinitesimal regions dξ and dη.

Footnote 33. With a back-of-the-envelope computation,

$$\begin{aligned}
E[N(d\xi)] &= P[N(d\xi)=1]\,E[N(d\xi) \mid N(d\xi)=1] + P[N(d\xi)>1]\,E[N(d\xi) \mid N(d\xi)>1]\\
&= P[N(d\xi)=1] + P[N(d\xi)>1]\,E[N(d\xi) \mid N(d\xi)>1],
\end{aligned}$$

and as |dξ| → 0 the second term becomes negligible because P[N(dξ) > 1] = o(|dξ|), which gives the claim.

A.3 Poisson Processes

Poisson point processes are by far the most important in applications, and they are the models that define the notion of complete spatial randomness. Before giving the general definition of a Poisson process we need a related process. Consider any density function f defined on A ⊆ S and let n ∈ N.

DEFINITION A.5 (Binomial Point Process) A point process X is a Binomial Point Process of n points in A with density f if it consists of n i.i.d. points with density f. We denote such a process as X ∼ Bin(A, n, f).

Since f is a density function, i.e. ∫_A f(ξ) dξ = 1, it follows necessarily that |A| > 0. The simplest Binomial point process has finite A, i.e. |A| < ∞, and points drawn from the uniform distribution over A, so that f(ξ) = |A|^{-1}.

DEFINITION A.6 (Poisson Point Process) A point process X on S is a Poisson Point Process with intensity λ(ξ) if the following two conditions are satisfied:
1. for any bounded A ⊆ S with Λ(A) < ∞,
$$P[N(A) = n] = \frac{[\Lambda(A)]^n \exp[-\Lambda(A)]}{n!}, \qquad n = 0, 1, 2, \dots$$
2. for any n ∈ N and any bounded A ⊆ S with 0 < Λ(A) < ∞, conditional on N(A) = n,
$$X \cap A \sim \mathrm{Bin}(A, n, f), \qquad f(\xi) = \frac{\lambda(\xi)}{\Lambda(A)} = \frac{\lambda(\xi)}{\int_A \lambda(\xi)\, d\xi}.$$
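Definition A.6 is constructive: draw the total count from condition (1), then place the points independently according to condition (2). The sketch below is a minimal illustration under stated assumptions: A is a rectangle, Λ(A) is approximated by the midpoint rule, and draws from f = λ/Λ(A) are obtained by rejection sampling, which requires a known bound `lam_max` with λ(ξ) ≤ `lam_max` on A. The function `simulate_ipp` and the intensity `bump` are hypothetical and not taken from the paper.

```python
import numpy as np


def simulate_ipp(lam, lam_max, x_range, y_range, rng, n_grid=200):
    """Simulate a Poisson point process on the rectangle A following
    Definition A.6. Assumes lam(x, y) <= lam_max everywhere on A."""
    (x0, x1), (y0, y1) = x_range, y_range
    # Lambda(A) by the midpoint rule
    dx, dy = (x1 - x0) / n_grid, (y1 - y0) / n_grid
    xs = x0 + dx * (np.arange(n_grid) + 0.5)
    ys = y0 + dy * (np.arange(n_grid) + 0.5)
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    big_lambda = float(np.sum(lam(gx, gy)) * dx * dy)
    # Condition (1): N(A) ~ Poisson(Lambda(A))
    n = int(rng.poisson(big_lambda))
    # Condition (2): given N(A) = n, n i.i.d. points with density lam / Lambda(A).
    # Rejection sampling: a uniform candidate is accepted with probability lam / lam_max.
    points = np.empty((0, 2))
    while len(points) < n:
        m = 10 * (n - len(points)) + 10
        cand = np.column_stack([rng.uniform(x0, x1, m), rng.uniform(y0, y1, m)])
        keep = rng.uniform(0.0, lam_max, m) < lam(cand[:, 0], cand[:, 1])
        points = np.vstack([points, cand[keep]])
    return points[:n], big_lambda


def bump(x, y):
    # Hypothetical intensity peaked at the centre of the unit square (max 200)
    return 200.0 * np.exp(-((x - 0.5) ** 2 + (y - 0.5) ** 2) / 0.05)


rng = np.random.default_rng(0)
pts, big_lambda = simulate_ipp(bump, 200.0, (0.0, 1.0), (0.0, 1.0), rng)
print(len(pts), big_lambda)
```

Replacing `bump` by a constant gives the homogeneous case, where rejection accepts every candidate and the points are uniform on A. Averaged over many replications, the simulated counts should be close to Λ(A), as condition (1) requires.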
We will write X ∼ Poi(S, λ(ξ)). By condition (1) of the definition, for any bounded A ⊆ S we have EN(A) = Λ(A). In many works condition (2) is replaced by the independent scattering condition:
3. for disjoint sets A1, A2, ..., Ak ⊆ A the random variables N(A1), N(A2), ..., N(Ak) are stochastically independent, i.e.
$$P[N(A_1) = n_1, \dots, N(A_k) = n_k] = \frac{[\Lambda(A_1)]^{n_1} \exp[-\Lambda(A_1)]}{n_1!} \times \cdots \times \frac{[\Lambda(A_k)]^{n_k} \exp[-\Lambda(A_k)]}{n_k!},$$
where n = n1 + n2 + ... + nk is the total count in A1 ∪ ... ∪ Ak.

It is straightforward to show that condition (3) is implied by conditions (1) and (2) of the above definition (footnote 34). A Poisson Point Process is called homogeneous (or stationary) if λ(ξ) = λ for all ξ ∈ S, so that f(ξ) = |A|^{-1} for any bounded A ⊆ S. It follows that for a Homogeneous Poisson Process (HPP) EN(A) = λ|A|.

Footnote 34. Consider the case of two disjoint sets, A = A1 ∪ A2; the extension to k sets follows by induction. Conditional on N(A) = n1 + n2 = n, condition (2) states that the n points are i.i.d. with density f(ξ) = λ(ξ)/Λ(A), so each of them falls in A1 with probability

$$\int_{A_1} f(\xi)\, d\xi = \frac{\Lambda(A_1)}{\Lambda(A)}$$

and in A2 with probability Λ(A2)/Λ(A). Hence

$$P[N(A_1) = n_1, N(A_2) = n_2 \mid N(A) = n] = \binom{n}{n_1} \left[\frac{\Lambda(A_1)}{\Lambda(A)}\right]^{n_1} \left[\frac{\Lambda(A_2)}{\Lambda(A)}\right]^{n_2} = \frac{n!}{n_1!\,(n - n_1)!} \, \frac{[\Lambda(A_1)]^{n_1} [\Lambda(A_2)]^{n - n_1}}{[\Lambda(A)]^{n}},$$

and condition (1) implies that the unconditional probability is

$$\begin{aligned}
P[N(A_1) = n_1, N(A_2) = n_2] &= \frac{n!}{n_1!\,(n - n_1)!} \, \frac{[\Lambda(A_1)]^{n_1} [\Lambda(A_2)]^{n - n_1}}{[\Lambda(A)]^{n}} \cdot \frac{[\Lambda(A)]^{n} \exp[-\Lambda(A)]}{n!}\\
&= \frac{[\Lambda(A_1)]^{n_1} \exp[-\Lambda(A_1)]}{n_1!} \cdot \frac{[\Lambda(A_2)]^{n - n_1} \exp[-\Lambda(A_2)]}{(n - n_1)!},
\end{aligned}$$

where the last step uses Λ(A) = Λ(A1) + Λ(A2).

DEFINITION A.7 (Homogeneous Poisson Process) A point process X on S is a Homogeneous Poisson Point Process with intensity λ if the following two conditions are satisfied:
1. for any bounded A ⊆ S,
$$P[N(A) = n] = \frac{(\lambda |A|)^n \exp(-\lambda |A|)}{n!}, \qquad n = 0, 1, 2, \dots$$
2. for any n ∈ N and any bounded A ⊆ S, conditional on N(A) = n,
$$X \cap A \sim \mathrm{Bin}\left(A, n, |A|^{-1}\right).$$

The HPP is regarded in the literature as the benchmark of complete spatial randomness.
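The identity derived in footnote 34 can be checked numerically: splitting a Poisson(Λ(A)) count binomially between A1 and A2 gives the same joint probabilities as two independent Poisson counts. The sketch below is a minimal illustration using only the Python standard library; the function names and the values Λ(A1) = 2.5, Λ(A2) = 4.0 are hypothetical choices for the check.

```python
from math import comb, exp, factorial


def poisson_pmf(k, mean):
    """P[N = k] for N ~ Poisson(mean)."""
    return mean ** k * exp(-mean) / factorial(k)


def joint_via_conditioning(n1, n2, big_lambda1, big_lambda2):
    """P[N(A1)=n1, N(A2)=n2] as in footnote 34: binomial split of N(A)=n
    (condition 2) times the Poisson law of N(A) (condition 1)."""
    n = n1 + n2
    big_lambda = big_lambda1 + big_lambda2
    conditional = (comb(n, n1)
                   * (big_lambda1 / big_lambda) ** n1
                   * (big_lambda2 / big_lambda) ** n2)
    return conditional * poisson_pmf(n, big_lambda)


def joint_via_independence(n1, n2, big_lambda1, big_lambda2):
    """The same probability under condition (3): a product of Poisson pmfs."""
    return poisson_pmf(n1, big_lambda1) * poisson_pmf(n2, big_lambda2)


for n1, n2 in [(0, 0), (3, 5), (10, 2)]:
    print(n1, n2,
          joint_via_conditioning(n1, n2, 2.5, 4.0),
          joint_via_independence(n1, n2, 2.5, 4.0))
```

The two columns agree up to floating-point rounding for every (n1, n2), which is exactly the factorization obtained in the last line of footnote 34.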
Complete spatial randomness means that we do not expect the intensity of the process to vary over the region we are considering, and that there are no interactions among different events. Indeed, by condition (1) and the fact that λ(ξ) = λ, an HPP is stationary and isotropic, because N(A) ∼ Poisson(λ|A|) depends on A only through its area, so the expected number of events does not vary over the planar region; by condition (2) and f(ξ) = |A|^{-1}, there is no clustering or inhibition (the presence of a point at ξ makes an event at a nearby point η neither more nor less likely). A Poisson Point Process is inhomogeneous (IPP) if its intensity function is not constant over A; it is then nonstationary and, in general, anisotropic. The IPP is the simplest class of nonstationary point processes used in applications.

A.4 Marked Point Processes

Consider a point process X0 defined over the space S ⊆ R². If a random mark m(ξ) ∈ M is attached to each point ξ ∈ X0, then the process X = {(ξ, m(ξ)) : ξ ∈ X0} is called a marked point process with events in S and marks in M. Notice that M may be either a finite set, M = {1, 2, ..., M}, in which case X is a multitype process, or a general subset M ⊆ R^q, q ≥ 1 (it can also be a family of compact subsets, M = {F : F ⊆ R^q}, which is called a Boolean model). For categorical variables, such as racial groups, we use a finite set.

DEFINITION A.8 (Marked Poisson Process) The process X = {(ξ, m(ξ)) : ξ ∈ X0} is a Marked Poisson Process if
1. X0 is a Poisson Point Process over S with intensity function λ0(ξ), with ∫_A λ0(ξ) dξ < ∞ for all bounded A ⊆ S;
2. conditional on X0, the marks {m(ξ) : ξ ∈ X0} are mutually independent.

The intensity λ(ξ, m) of the Marked Poisson Process satisfies ∫_M λ(ξ, m) dm = λ0(ξ). We have the following proposition (for a proof see Proposition 3.9 in Moller and Waagepetersen (2004), p.
26).

PROPOSITION A.1 If X = {(ξ, m(ξ)) : ξ ∈ X0} is a Marked Poisson Point Process with M ⊆ R^q, q ≥ 1, and if
1. conditional on X0, the marks have distribution ρ(ξ, m, X0∖ξ) = ρ(ξ, m), i.e. the mark at ξ does not depend on the other points of X0, and
2. the intensity of the process can be written as λ(ξ, m) = λ0(ξ)ρ(ξ, m),
then X ∼ Poi(S × M, λ(ξ, m)).

The proposition is very useful for measuring segregation in our framework, because it implies that the Marked Poisson Process is a Poisson process on the enlarged space S × M, so we can use the standard estimation methods developed for Poisson processes. Another useful corollary of the proposition is the following.

PROPOSITION A.2 Consider a Multitype Point Process with M = {1, 2, ..., M} and the associated multivariate point process (X1, X2, ..., XM). The following two properties are equivalent:
1. P(m(ξ) = m | X0 = x0) = ρ(ξ, m) does not depend on X0∖ξ;
2. (X1, X2, ..., XM) is a multivariate Poisson process with mutually independent components X_m ∼ Poi(S, λ(ξ, m)), where λ(ξ, m) = λ0(ξ)ρ(ξ, m), m = 1, ..., M.

When the conditional mark distribution does not depend on location, i.e. ρ(ξ, m) = ρ(m) for all ξ, we have random labelling.

B Alternative Estimation Methods

B.1 Parametric Estimation with Point Pattern Data

For the Inhomogeneous Poisson Process the likelihood is easily written by exploiting the definition:

$$P(X) = P(X \mid N(S) = n)\, P(N(S) = n) = \left( \prod_{i=1}^{n} \frac{\lambda(x_i)}{\Lambda(S)} \right) \frac{\exp[-\Lambda(S)]\, [\Lambda(S)]^n}{n!}$$

Taking logs and rearranging, we get

$$\begin{aligned}
\log P(X) &= \sum_{i=1}^{n} \log \lambda(x_i) - n \log \Lambda(S) - \Lambda(S) + n \log \Lambda(S) - \log(n!)\\
&= \sum_{i=1}^{n} \log \lambda(x_i) - \Lambda(S) - \log(n!)\\
&= \sum_{i=1}^{n} \log \lambda(x_i) - \int_S \lambda(\xi)\, d\xi - \log(n!)
\end{aligned}$$

So the parameters θ of a parametric intensity function λθ can be estimated by maximum likelihood:

$$\hat{\theta} = \arg\max_{\theta} \left\{ \sum_{i=1}^{n} \log \lambda_{\theta}(x_i) - \int_S \lambda_{\theta}(\xi)\, d\xi \right\}$$

B.2 Recovering Intensity from Count Data

When the available data are not point patterns but counts aggregated by area, we can use the independent scattering property of the Poisson process to recover the intensity.
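Returning to the point-pattern likelihood of Section B.1, the maximization can be sketched for one specific parametric family. This is a minimal illustration under assumptions that are ours, not the paper's: the hypothetical log-linear intensity λθ(x, y) = exp(θ0 + θ1·x + θ2·y) on a rectangular S, the integral ∫_S λθ(ξ)dξ approximated by the midpoint rule, and Newton's method as the optimizer (the log-likelihood is concave in θ for this family). The pattern in the usage lines is simulated by thinning a homogeneous process with hypothetical true parameters.

```python
import numpy as np


def fit_loglinear_intensity(points, x_range=(0.0, 1.0), y_range=(0.0, 1.0),
                            n_grid=100, n_iter=50):
    """Maximise sum_i log lam_theta(x_i) - integral_S lam_theta for the
    hypothetical intensity lam_theta(x, y) = exp(t0 + t1*x + t2*y)."""
    (x0, x1), (y0, y1) = x_range, y_range
    dx, dy = (x1 - x0) / n_grid, (y1 - y0) / n_grid
    xs = x0 + dx * (np.arange(n_grid) + 0.5)
    ys = y0 + dy * (np.arange(n_grid) + 0.5)
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    # Covariate vectors (1, x, y) at the quadrature nodes and at the data points
    zc = np.column_stack([np.ones(gx.size), gx.ravel(), gy.ravel()])
    zp = np.column_stack([np.ones(len(points)), points[:, 0], points[:, 1]])
    area = (x1 - x0) * (y1 - y0)
    # Start from the homogeneous fit: lambda = n / |S|
    theta = np.array([np.log(max(len(points), 1) / area), 0.0, 0.0])
    for _ in range(n_iter):
        w = np.exp(zc @ theta) * dx * dy       # lam_theta(node) * |cell|
        grad = zp.sum(axis=0) - zc.T @ w       # score of the log-likelihood
        hess = -(zc.T * w) @ zc                # negative definite Hessian
        step = np.linalg.solve(hess, grad)
        theta = theta - step                   # Newton update for a maximum
        if np.max(np.abs(step)) < 1e-10:
            break
    return theta


# Usage: simulate a pattern on [0, 1]^2 by thinning (hypothetical parameters)
rng = np.random.default_rng(1)
true_theta = np.array([7.0, 1.0, -0.5])
lam_max = np.exp(true_theta[0] + true_theta[1])   # maximum of lam_theta, at (1, 0)
cand = rng.uniform(0.0, 1.0, size=(rng.poisson(lam_max), 2))
lam_cand = np.exp(true_theta[0] + cand @ true_theta[1:])
points = cand[rng.uniform(0.0, lam_max, len(cand)) < lam_cand]
theta_hat = fit_loglinear_intensity(points)
print(theta_hat)
```

At the maximum the first component of the score is zero, so the fitted Λ(S) equals the observed number of points n; this mirrors the cancellation of the n log Λ(S) terms in the derivation above. Other parametric families only require changing the covariate vectors.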
Suppose the metropolitan area S is partitioned into K disjoint subunits, S = ⋃_{k=1}^{K} S_k with S_k ∩ S_l = ∅ for k ≠ l, which we will call blocks (although they may be arbitrarily small areas). By the independent scattering property, the counts over disjoint regions are independent. Therefore

$$\begin{aligned}
P(N(S_1) = n_1, \dots, N(S_K) = n_K) &= P(N(S_1) = n_1) \cdots P(N(S_K) = n_K)\\
&= \frac{\exp[-\Lambda(S_1)]\, [\Lambda(S_1)]^{n_1}}{n_1!} \cdots \frac{\exp[-\Lambda(S_K)]\, [\Lambda(S_K)]^{n_K}}{n_K!}\\
&= \left[ \prod_{k=1}^{K} \frac{1}{n_k!} \right] \exp[-\Lambda(S)] \prod_{k=1}^{K} [\Lambda(S_k)]^{n_k}
\end{aligned}$$

So the log-likelihood can be written as follows (we drop ∏_k 1/n_k!, since it is constant):

$$\begin{aligned}
\log P(X) &= -\Lambda(S) + \sum_{k=1}^{K} n_k \log[\Lambda(S_k)]\\
&= -\int_S \lambda(\xi)\, d\xi + \sum_{k=1}^{K} n_k \log\left[ \int_{S_k} \lambda(\xi)\, d\xi \right]\\
&= -\sum_{k=1}^{K} \int_{S_k} \lambda(\xi)\, d\xi + \sum_{k=1}^{K} n_k \log\left[ \int_{S_k} \lambda(\xi)\, d\xi \right]
\end{aligned}$$

and, assuming a parametrization θ of the intensity function, we can estimate

$$\hat{\theta} = \arg\max_{\theta} \left\{ \sum_{k=1}^{K} n_k \log\left[ \int_{S_k} \lambda_{\theta}(\xi)\, d\xi \right] - \sum_{k=1}^{K} \int_{S_k} \lambda_{\theta}(\xi)\, d\xi \right\}$$

The main price to pay is the need to specify a functional form for the intensity function in order to compute the integrals. Furthermore, the integrals cannot be computed without the coordinates of the block boundaries. The Census releases boundary files down to the block group level, but not at the block level; an alternative is to use the TIGER files.

C Kernel Regression using Proportions

In this appendix we show the results obtained using the kernel regression of the proportion of each racial group in the block. The results are slightly different, confirming that this is an approximate estimate.
Table C1: Spatial Dissimilarity, different bandwidths (2000)

City | Index (optimal h) | Index (h = 0.5 km) | Index (h = 1 km) | Rank (optimal h) | Rank (h = 0.5 km) | Rank (h = 1 km)
Detroit | 0.8703224 | 0.8501121 | 0.8352947 | 1st | 1st | 1st
New York | 0.6877347 | 0.6848539 | 0.6675551 | 5th | 5th | 5th
Chicago | 0.7606747 | 0.7498699 | 0.7345411 | 2nd | 3rd | 3rd
Los Angeles | 0.629817 | 0.6175982 | 0.5955263 | 7th | 6th | 6th
San Francisco | 0.5319183 | 0.5397131 | 0.5041305 | 10th | 10th | 9th
Philadelphia | 0.7272086 | 0.7076749 | 0.6758686 | 4th | 4th | 4th
Boston | 0.6034576 | 0.6022681 | 0.5777609 | 8th | 7th | 7th
Cleveland | 0.7496532 | 0.7559522 | 0.7377916 | 3rd | 2nd | 2nd
Champaign | 0.5962014 | 0.5898408 | 0.5583697 | 9th | 8th | 8th
Laredo | 0.676968 | 0.5063632 | 0.3431035 | 6th | 9th | 10th

Table C2: Multigroup Segregation, different bandwidths (2000)

City | Index (optimal h) | Index (h = 0.5 km) | Index (h = 1 km) | Rank (optimal h) | Rank (h = 0.5 km) | Rank (h = 1 km)
Detroit | 0.8290554 | 0.7929796 | 0.7712272 | 1st | 1st | 1st
New York | 0.6047147 | 0.6011003 | 0.5804878 | 5th | 5th | 5th
Chicago | 0.6575917 | 0.6392469 | 0.6171649 | 4th | 4th | 4th
Los Angeles | 0.4906992 | 0.4776546 | 0.4549143 | 8th | 9th | 9th
San Francisco | 0.4741394 | 0.4796534 | 0.4552766 | 9th | 8th | 8th
Philadelphia | 0.6979656 | 0.6727337 | 0.6367538 | 3rd | 3rd | 3rd
Boston | 0.5337201 | 0.5320079 | 0.4981971 | 7th | 7th | 7th
Cleveland | 0.7185795 | 0.7274199 | 0.7031806 | 2nd | 2nd | 2nd
Champaign | 0.5476806 | 0.5401149 | 0.5064247 | 6th | 6th | 6th
Laredo | 0.3056275 | 0.2343179 | 0.1868961 | 10th | 10th | 10th

Table C3: Correlations with traditional indices

Panel A: Blacks
 | SDism (opt) | SDism (.5km) | SDism (1km) | SSI | Dism | Isol | Info
SDism (.5km) | 0.9778
SDism (1km) | 0.9523 | 0.9750
SSI | 0.6302 | 0.6470 | 0.7347
Dissimilarity | 0.6163 | 0.6144 | 0.7100 | 0.5740
Isolation | 0.6677 | 0.6772 | 0.7687 | 0.9000 | 0.7810
Information | 0.6630 | 0.6660 | 0.7699 | 0.7926 | 0.9210 | 0.9545
Gini | 0.6220 | 0.6199 | 0.7138 | 0.5905 | 0.9897 | 0.7797 | 0.9180

Panel B: Multigroup
 | SDism (opt) | SDism (.5km) | SDism (1km) | Dism | Isol | Info
SDism (.5km) | 0.9846
SDism (1km) | 0.9756 | 0.9856
Dissimilarity | 0.7190 | 0.7197 | 0.7920
Isolation | 0.6870 | 0.6888 | 0.7633 | 0.8821
Information | 0.7199 | 0.7164 | 0.7914 | 0.9530 | 0.9544
Gini | 0.7192 | 0.7228 | 0.7900 | 0.9860 | 0.8442 | 0.9402

[Figure 1: Different partitions matter. Panels: City A, City B, City C, City D (axes x, y).]
[Figure 2: Examples of realizations of point processes. Panels: HPP, IPP, Multitype PP, MPP.]
[Figure 3: Artificial data. Panels: City A to City F.]
[Figure 4: Geographic distribution of blacks in New York PMSA, 2000 (axes: Eastings, Northings).]
[Figure 5: Geographic distribution of racial groups in New York PMSA, 2000 (axes: Eastings, Northings).]
[Figure 6: Estimated MSE and optimal bandwidths for cities A and C.]
[Figure 7: Estimated λ0(ξ) for city C.]
[Figure 8: Estimated λ(ξ, b) for city C.]
[Figure 9: Estimated λ(ξ, nb) for city C.]
[Figure 10: Estimated conditional probability ρ(ξ, b) for city C.]
[Figure 11: Estimated conditional probability for blacks in New York PMSA.]
[Figure 12: Estimated conditional probability of black in New York PMSA, 2000.]
[Figure 13: Spatial Dissimilarity and Scale.]