Effect of Object Identification Algorithms on Feature-Based Verification Scores
Michael Weniger, Petra Friederichs
Meteorological Institute, University of Bonn
EGU Vienna, 18 April 2015

Motivation

Feature-based methods for the evaluation of spatial fields employ object identification algorithms (OIA). Verification scores are defined based on the results of these OIA.
⇒ How does the choice of OIA and its parameters influence the resulting scores?
⇒ What are the implications for spatial fields with non-negligible observational uncertainties?

Introduction

Goal: Evaluation of Probabilistic Spatial Fields

[Figure: example of two 2D spatial fields]

Definitions:
- Spatial field: a set of data with spatial dimension greater than one.
- Probabilistic spatial field: the value at each point is not a real number but a random variable. These random variables are usually correlated in space and time, e.g. for non-negligible observational uncertainties or for the output of an ensemble model.

First Step: Evaluation of Deterministic Spatial Fields

Why do traditional methods not work?
- Double penalty of misplaced events: a forecast with a misplaced event is scored worse by point-to-point measures than a forecast with either a complete miss or a false alarm, since it is penalized as both at once (a numeric sketch follows below).
- Domination of small-scale errors and noise, so that the interesting information is lost.
⇒ We need different techniques.
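To make the double-penalty effect concrete, here is a minimal numeric sketch on a hypothetical 1D field (all positions and values are invented): under RMSE, a correctly shaped but displaced event scores worse than forecasting no event at all.

```python
import numpy as np

obs = np.zeros(20)
obs[5:8] = 1.0                 # observed event at grid points 5-7

fcst_displaced = np.zeros(20)
fcst_displaced[10:13] = 1.0    # same event, misplaced by 5 grid points

fcst_miss = np.zeros(20)       # complete miss: no event forecast at all

def rmse(f, o):
    return np.sqrt(np.mean((f - o) ** 2))

print(rmse(fcst_displaced, obs))  # ~0.55: penalized as miss AND false alarm
print(rmse(fcst_miss, obs))       # ~0.39: penalized as miss only
```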
During the last decade many new spatial verification methods have been developed. They can be classified into four categories [1, 2]:
- fuzzy verification / neighborhood methods
- scale separation techniques
- field deformation
- feature-based methods ← we will focus on these methods

Feature-Based Method: Overview

Spatial field → object identification → objects → verification method → scores.
- Object identification: external object identification algorithms (OIA) are employed to define objects in the original spatial data. There are many different OIA, each with one or more parameters (e.g. threshold level or smoothing) that have to be specified by the user.
- Verification method: verification scores are calculated from the binary object masks given by the OIA and the original spatial fields.

Feature-Based Method: Central Question

How important is the choice of OIA and parameters for the resulting verification scores?

Why is this important?
- The OIA and its parameters usually have to be chosen by the user.
- It is not uncommon to find multiple valid choices.
- If a score is very sensitive to these choices, one may get very different results that have equal justification.
⇒ In that case the explanatory power of the verification method is very weak.

Why is it particularly important for probabilistic fields?
- The effect of uncertainties is closely connected to the sensitivity towards certain OIA parameters.
- A good example is the threshold value, which is used in most algorithms to identify objects as cohesive areas where the field exceeds the value of this parameter.
- Changing the value of the threshold parameter in the OIA is therefore closely related to observational uncertainties, which change the value of the field itself.
⇒ The sensitivity of a score to varying OIA and parameter values indicates its sensitivity towards uncertainties.

SAL: Introduction [3]

SAL is a feature-based method that was developed to
- measure the quality of a forecast using three distinct scores with direct physical interpretations, allowing conclusions on potential sources of model error;
- compare the statistical characteristics of observation and forecast fields without matching individual objects;
- yield scores close to a subjective visual assessment of forecast accuracy for precipitation data.

SAL stands for its three score components:
- (S)tructure: describes the shape of objects.
- (A)mplitude: describes a global intensity error.
- (L)ocation: consists of two parts; L1 describes a global displacement error, and L2 describes the spread of objects.

For this study we are interested in the object-dependent scores S ∈ [−2, 2] and L2 ∈ [0, 1].
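As an illustration of how the object-dependent components could be computed, the following Python sketch implements S and L2 from the definitions in Wernli et al. (2008) [3]. It assumes objects are already given as a labeled integer array produced by an OIA; the function names and data layout are our own, not the authors' code, and the domain diameter d (largest distance between two boundary points of the domain) must be supplied by the caller.

```python
import numpy as np

def weighted_centre(field, mask):
    """Centre of mass of `field` restricted to the boolean `mask`."""
    idx = np.argwhere(mask)                  # grid coordinates of the mask
    w = field[mask]                          # weights in matching order
    return (idx * w[:, None]).sum(axis=0) / w.sum()

def s_l2_components(field, labels):
    """Scaled volume V and weighted object spread r for one field.
    Assumes at least one object and positive values inside objects."""
    x_tot = weighted_centre(field, labels > 0)   # overall centre of mass
    masses, volumes, dists = [], [], []
    for n in range(1, labels.max() + 1):
        mask = labels == n
        r_n = field[mask].sum()                  # integrated object mass R_n
        volumes.append(r_n / field[mask].max())  # scaled volume V_n
        masses.append(r_n)
        x_n = weighted_centre(field, mask)       # object centre of mass
        dists.append(np.linalg.norm(x_n - x_tot))
    m = np.asarray(masses)
    V = (m * np.asarray(volumes)).sum() / m.sum()
    r = (m * np.asarray(dists)).sum() / m.sum()
    return V, r

def sal_s_l2(fcst, obs, fcst_labels, obs_labels, domain_diameter):
    V_f, r_f = s_l2_components(fcst, fcst_labels)
    V_o, r_o = s_l2_components(obs, obs_labels)
    S = (V_f - V_o) / (0.5 * (V_f + V_o))        # S in [-2, 2]
    L2 = 2.0 * abs(r_f - r_o) / domain_diameter  # L2 in [0, 1]
    return S, L2
```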
SAL: OIA Setting

We study the effect of different parameters for three different OIA:
- threshfac: defines objects as coherent areas of threshold exceedances. It depends on the value of the threshold level, which is set by the parameter fac.
- threshsizer: relies on the results of threshfac and removes small objects. The parameter NContig defines the minimal object size.
- convthresh: applies a smoothing operator to the field before using threshfac. The radius of the smoothing disc is given by the parameter smoothpar.

SAL: Data

SAL scores are calculated for various parameter settings using 400 cases of spectral radiance (6.2 IR) fields over Germany:
- model output: COSMO-DE forward operator
- observations: SEVIRI satellite

[Figure: map of the area shared by observational data and model output.] For the calculation of SAL, the model output is interpolated onto the coarser observational grid.

SAL: Overview

Spatial field → OIA → objects → SAL → scores. Each of the three algorithms has one parameter:
- convthresh: smoothpar (smoothness)
- threshsizer: NContig (minimal object size)
- threshfac: fac (threshold)
The verification method is SAL with the object-dependent components (S)tructure, S ∈ [−2, 2], and (L)ocation part L2 ∈ [0, 1]; (A)mplitude completes SAL. A sketch of the three OIA follows below.
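The following Python functions mimic the described behavior of the three OIA. The study itself uses the R implementations named above; the functions below are illustrative analogues built on scipy.ndimage, and the exact scaling of the threshold via fac is an assumption made for this sketch.

```python
import numpy as np
from scipy import ndimage

def identify_threshfac(field, fac):
    """Objects = connected areas exceeding a threshold derived from fac.
    Here the threshold is fac * field maximum; the exact scaling used by
    the R function is an assumption in this sketch."""
    mask = field > fac * field.max()
    return ndimage.label(mask)                 # (labels, number of objects)

def identify_threshsizer(field, fac, n_contig):
    """threshfac objects with all objects smaller than n_contig grid
    points removed."""
    labels, _ = identify_threshfac(field, fac)
    sizes = np.bincount(labels.ravel())        # object sizes (index = label)
    small = np.isin(labels, np.where(sizes < n_contig)[0])
    labels[small & (labels > 0)] = 0           # drop small objects
    return ndimage.label(labels > 0)           # relabel consecutively

def identify_convthresh(field, fac, smoothpar):
    """Smooth with a disc of radius smoothpar grid points, then apply
    threshfac to the smoothed field."""
    y, x = np.ogrid[-smoothpar:smoothpar + 1, -smoothpar:smoothpar + 1]
    disc = (x**2 + y**2 <= smoothpar**2).astype(float)
    smoothed = ndimage.convolve(field, disc / disc.sum(), mode="constant")
    return identify_threshfac(smoothed, fac)
```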
Parameter Sensitivity: Statistical Procedure

1. Compare changes in parameter values to the response in SAL scores.
   - We evaluate the mean and maximal response over the set of 400 spatial fields for each algorithm and for both the L2 and S scores (see the sketch after this list).
   - This yields results on parameter sensitivity on an absolute scale, i.e. "How strongly does the choice of OIA and parameters influence my score?"
2. Study differences in the distributions of the resulting sets of SAL scores.
   - To assess various characteristics of the score distributions, we employ five different hypothesis tests to detect significant differences due to changes in parameter values.
   - This yields results on parameter sensitivity on a relative scale, i.e. "How important is the choice of OIA and parameters for the interpretation of my SAL results?"
3. Take a closer look at the underlying processes and case studies.
   - We approach this with a brief theoretical showcase example and then look at some of the worst-case scenarios.
   - This gives us the information we need when thinking about new spatial methods in a probabilistic environment, i.e. "What can go wrong with the present approach, and how can we avoid it?"
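Step 1 can be sketched as follows. The `scores` array and its layout (parameter settings by 400 fields) are hypothetical stand-ins for however the per-field SAL scores are organized.

```python
import numpy as np

def absolute_response(scores):
    """Mean and maximal |score difference| for every pair of parameter
    settings. `scores` has shape (n_settings, n_fields), e.g. L2 values
    for each setting and each of the 400 fields."""
    n = scores.shape[0]
    mean_resp = np.zeros((n, n))
    max_resp = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            diff = np.abs(scores[i] - scores[j])  # per-field response
            mean_resp[i, j] = diff.mean()         # typical sensitivity
            max_resp[i, j] = diff.max()           # worst-case scenario
    return mean_resp, max_resp
```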
Absolute Parameter Sensitivity: Minimal Object Size (L2)

Minimal object size is measured in number of grid points. Recall that L2 ∈ [0, 1]. We observe only very small mean responses and still controllable worst-case scenarios. Note the linear decay in response strength for small changes in parameter values.

Absolute Parameter Sensitivity: Minimal Object Size (S)

Recall that S ∈ [−2, 2]. The results for the S score confirm the points above. Due to the linear decay of response strength for small changes in parameter values and the small absolute response strength, we call the minimal object size a stable parameter.

Absolute Parameter Sensitivity: Threshold Ratio (L2)

The threshold ratio takes values in (0, 1]. We observe very strong mean responses and completely uncontrollable worst-case scenarios. A response equal to one means that we effectively cannot distinguish between the best score (L2 = 0) and the worst score (L2 = 1).

Absolute Parameter Sensitivity: Threshold Ratio (S)

This case is completely analogous to L2. Since there is no decay of response strength for small changes in parameter values and the absolute response is very strong, the threshold ratio is an unstable parameter.

Absolute Parameter Sensitivity: Smoothing Radius (L2)

Smoothing radii are measured in number of grid points. We observe weak mean but strong maximal responses. This indicates an underlying process with high impact that needs very specific conditions to occur. It therefore occurs less often for small changes in parameter values, leading to a linear decay in the mean response.

Absolute Parameter Sensitivity: Smoothing Radius (S)

This case is again completely analogous to L2. Due to weak mean but strong maximal responses, the smoothing radius is a metastable parameter. Here it is particularly important to understand the causes of the infrequent occurrence of strong score responses.

Absolute Parameter Sensitivity: Summary

The results for L2 and S are consistent:
- the minimal object size is stable,
- the smoothing radius is metastable,
- the threshold ratio is unstable.
The behavior of the threshold parameter is particularly important for the sensitivity towards observational uncertainties. How important are these results for the (statistical) interpretation of SAL?

Relative Parameter Sensitivity: Procedure

We compare the distributions of S and L2 scores for each possible parameter pairing. To detect differences in distributions, we apply five hypothesis tests (a sketch follows below):
- Kolmogorov-Smirnov
- Student's t
- Wilcoxon-Mann-Whitney
- Median
- Quantile
The null hypothesis H0 is always defined as: "Both parameter values yield identical distributions." When H0 is rejected at a significance level of 5%, we know that the two distributions differ significantly in a statistical characteristic, which is given by the specific hypothesis test.
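A sketch of the test battery using scipy.stats. The first four tests are standard scipy calls; the quantile test is a naive bootstrap stand-in, since the exact quantile test used in the study is not specified here.

```python
import numpy as np
from scipy import stats

def compare_distributions(x, y, alpha=0.05, q=0.9):
    """Return, per test, whether H0 'identical distributions' is rejected
    at level alpha for two samples of scores x and y."""
    reject = {
        "Kolmogorov-Smirnov":    stats.ks_2samp(x, y).pvalue < alpha,
        "Student-t":             stats.ttest_ind(x, y).pvalue < alpha,
        "Wilcoxon-Mann-Whitney": stats.mannwhitneyu(x, y).pvalue < alpha,
        "Median":                stats.median_test(x, y)[1] < alpha,
    }
    # Naive bootstrap test on the q-quantile: reject if the bootstrap
    # interval for the quantile difference excludes zero.
    rng = np.random.default_rng(0)
    boot = [np.quantile(rng.choice(x, x.size), q)
            - np.quantile(rng.choice(y, y.size), q) for _ in range(2000)]
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    reject["Quantile"] = not (lo <= 0.0 <= hi)
    return reject
```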
Relative Parameter Sensitivity: Results

Due to the number of possible combinations of parameter values, we have to evaluate a large number of statistical test results. They can be summarized as follows.

Contrary to the absolute parameter sensitivity, L2 and S exhibit different behaviors:
- For L2, distributional differences are closely connected to changes in the mean value.
- For S, these differences occur mostly in the spread of the distributions.
The reason lies in the definition of the scores: L2 ∈ [0, 1] is a positively defined one-sided score, whereas S ∈ [−2, 2] is a two-sided score. For S, responses to changing parameters can cancel each other out in the mean value but lead to a larger spread.

Varying smoothing radii and minimal object sizes (stable and metastable):
- Only large changes in parameter values lead to significant distributional differences.
- For L2 the differences are only visible in the mean value.
- For S the majority of significant differences are only visible in the spread, i.e. with the quantile test.

Varying threshold levels (unstable):
- Most parameter pairings exhibit distributional differences.
- In the majority of cases all five hypothesis tests detect these differences.

Underlying Processes

In the following section we aim to understand the processes that lead to unstable or very sensitive parameters.
1. We consider two theoretical showcase scenarios, which give us an idea of what to look for in the data.
2. We examine whether the theoretical considerations are consistent with the statistics of the data.
3. We look at some case studies to observe the processes in concrete sets of data.

Theoretical Considerations

Process (a): there is a large and flat object with an intensity value just above the threshold level. Slightly raising the threshold level causes the object to vanish. This process can drastically change the S (structure) and L2 (scattering) scores, and the changes in S and L2 can occur independently of each other.
⇒ The correlation between |ΔS| and ΔL2 is expected to be small.

Process (b): there is a large object with a small interconnecting bridge. Slightly raising the threshold level or increasing the smoothing radius causes the object to decompose. This process can also drastically change the S and L2 scores, but here the changes are coupled: the decomposition yields smaller structures with larger spread.
⇒ The correlation between |ΔS| and ΔL2 is expected to be very high.

A synthetic illustration of both processes follows below.
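Both processes can be reproduced on toy fields (all values invented). Connected-component labeling shows the object vanishing in process (a) and decomposing in process (b) under a small threshold change.

```python
import numpy as np
from scipy import ndimage

# Process (a): a large, flat object with intensity just above the threshold.
flat = np.zeros((20, 20))
flat[2:12, 2:12] = 1.01                 # flat plateau barely above 1.0
print(ndimage.label(flat > 1.00)[1])    # 1 object
print(ndimage.label(flat > 1.02)[1])    # 0 objects: the object vanishes

# Process (b): two blobs joined by a weak interconnecting bridge.
bridge = np.zeros((20, 20))
bridge[5:15, 2:8] = 3.0                 # blob 1
bridge[5:15, 12:18] = 3.0               # blob 2
bridge[9:11, 8:12] = 1.5                # thin bridge with weaker intensity
print(ndimage.label(bridge > 1.0)[1])   # 1 object (connected via the bridge)
print(ndimage.label(bridge > 2.0)[1])   # 2 objects: decomposition
```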
Correlation Between |ΔS| and ΔL2

Varying minimal object sizes (threshsizer):
- Neither process can occur by omitting only small objects.
- Omitting small objects reduces the spread and increases the structure score simultaneously.
⇒ The correlation between the S and L2 score changes is expected to be high.

Varying smoothing radii (convthresh):
- Only process (b) can occur, if an increase in smoothing causes an interconnecting bridge to vanish.
⇒ The correlation between the S and L2 score changes is expected to be very high.

Varying threshold levels (threshfac):
- Both processes can occur.
- We expect cases with low correlation (process (a)) and cases with high correlation (process (b)).
⇒ The correlation should behave more irregularly and should be lower overall.

These theoretical considerations are consistent with the statistics of our data.

Case Studies

Both processes should lead to significant changes in the S and/or L2 scores. We examined the spatial fields that exhibit the largest score differences for different parameter settings, i.e. the worst-case scenarios. For small changes in parameter values, the vast majority of score differences are caused by one of the described processes.
⇒ We have identified the processes that lead to unstable parameter behavior.

Case study (a), large and flat object: ΔS = −2.5, ΔL2 = 0.1.
Case study (b), small interconnecting bridge: ΔS = 0.6, ΔL2 = 0.4.

Summary: Parameter Sensitivity of SAL (I)

The vanishing of large flat objects and the decomposition of large objects are present in all studied sets of data:
- total cloud cover
- spectral radiance (8 different channels)
- precipitation (2 km and 6 km resolution COSMO reanalysis)
These processes can cause high parameter sensitivity of OIA. The maximal score response was similar across all sets of data.
⇒ The frequency of these "bad" cases is the deciding factor for the stability of a parameter.

Summary: Parameter Sensitivity of SAL (II)

Varying threshold levels are very problematic:
- The threshold level is a parameter that has to be chosen in each of the three studied OIA.
- Varying threshold levels are closely related to observational uncertainties.
⇒ All studied OIA are potentially very sensitive to uncertainties.

OIA that rely on threshold levels are not viable in a probabilistic environment with non-negligible observational uncertainties. The key to a solution is to circumvent the non-continuous thresholding operator. Promising approaches are
- probabilistic level sets [4],
- image warping with splines [5],
- wavelet decomposition [6].

Literature

[1] Eric Gilleland, David Ahijevych, Barbara G. Brown, Barbara Casati, and Elizabeth E. Ebert. Intercomparison of spatial forecast verification methods. Weather and Forecasting, 24(5):1416-1430, 2009.
[2] E. Ebert, L. Wilson, A. Weigel, M. Mittermaier, P. Nurmi, P. Gill, M. Göber, S. Joslyn, B. Brown, T. Fowler, et al. Progress and challenges in forecast verification. Meteorological Applications, 20(2):130-139, 2013.
[3] Heini Wernli, Marcus Paulat, Martin Hagen, and Christoph Frei. SAL - a novel quality measure for the verification of quantitative precipitation forecasts. Monthly Weather Review, 136(11), 2008.
[4] Kai Pöthkow, Britta Weber, and Hans-Christian Hege. Probabilistic marching cubes. Computer Graphics Forum, 30:931-940, 2011.
[5] Eric Gilleland, Johan Lindström, and Finn Lindgren. Analyzing the image warp forecast verification method on precipitation fields from the ICP. Weather and Forecasting, 25(4):1249-1262, 2010.
[6] B. Casati, G. Ross, and D. B. Stephenson. A new intensity-scale approach for the verification of spatial precipitation forecasts. Meteorological Applications, 11(2):141-154, 2004.