int. j. prod. res., 2003, vol. 41, no. 7, 1587–1603

Improved SPC chart pattern recognition using statistical features

A. HASSAN†*, M. SHARIFF NABI BAKSH†, A. M. SHAHAROUN† and H. JAMALUDDIN†

Revision received August 2002.
† Faculty of Mechanical Engineering, Universiti Teknologi Malaysia, 81310 UTM Skudai, Johor Bahru, Malaysia.
* To whom correspondence should be addressed. e-mail: adnan@fkm.utm.my

Increasingly rapid changes and highly precise manufacturing environments require timely monitoring and intervention when deemed necessary. Traditional Statistical Process Control (SPC) charting, a popular monitoring and diagnosis tool, is being improved to be more sensitive to small changes and to include more intelligence for handling dynamic process information. Artificial neural network-based SPC chart pattern recognition schemes have been introduced by several researchers, but these schemes need further improvement in terms of generalization and recognition performance. One possible approach is to improve the data representation using features extracted from the raw data. Most of the previous work in intelligent SPC used raw data as the input vector representation. The literature reports limited work dealing with features, and it lacks extensive comparative studies assessing the relative performance of the two approaches. The objective of this study was to evaluate the relative performance of a feature-based SPC recognizer compared with a raw data-based recognizer. Extensive simulations were conducted using synthetic data sets. The study focused on the recognition of six commonly researched SPC patterns plotted on the Shewhart X-bar chart. The ANN-based SPC pattern recognizer trained using the six selected statistical features gave significantly better performance and generalization than the raw data-based recognizer. The findings from this study can be used as guidelines for developing better SPC recognition systems.

1. Introduction

The increase in demand for faster delivery, small order quantities and highly precise products has led manufacturing systems to move towards becoming more flexible, integrated and intelligent. This requires that, in monitoring critical processes, process information is analysed rapidly, in a timely fashion and continuously for decision-making. Advances in manufacturing and measurement technology have enabled real-time, rapid and integrated gauging and measurement of process and product quality. Unfortunately, traditional Statistical Process Control (SPC) monitoring and diagnosis approaches are insufficient to cope with these new developments. Generally, enhancement is needed for this tool to be more sensitive to small changes, to be adaptable to a dynamic process environment, to enable rapid analysis and to become more informative and intelligent. Figure 1 shows the relationship between advances in manufacturing technology and the need for improvement in process monitoring and diagnosis. SPC charts are widely used for monitoring manufacturing process and product variability and are useful for ‘listening to the voice of the process’ (Oakland 1996).
Figure 1. Interaction between advances in manufacturing technology, quality monitoring and diagnosis (blocks: manufacturing technology; measurement technology; product and process data; monitoring and diagnosis).

Properly implemented SPC charting techniques can identify when a particular process is operating in a statistically in-control (stable) state or a statistically out-of-control (unstable) state. Further, analysis of the observations plotted on the SPC charts provides process information which can be useful for diagnostic purposes. Unstable processes may produce time series patterns such as cyclic, linear trend-up, linear trend-down, sudden shift-up, sudden shift-down, mixture, stratification and systematic patterns when plotted on a Shewhart X-bar chart. Identification of these patterns, coupled with engineering knowledge of the process, leads to a more focused diagnosis and significantly reduces the troubleshooting effort. Traditionally, SPC chart patterns have been analysed and interpreted manually. Towards the end of the 1980s, several researchers such as Swift (1987) and Cheng (1989) proposed the use of expert systems for SPC chart analysis and interpretation, as manual methods were no longer sufficient for the situation described above. Developments in computing technology then motivated researchers to explore the use of artificial neural networks (ANN) for SPC chart pattern recognition (Hwarng and Hubele 1991, 1993, Pham and Oztemel 1993). The use of neural network technology has overcome some of the drawbacks of the traditional expert system approaches. Neural networks offer useful properties and capabilities such as non-linearity, input-output mapping, adaptability and fault tolerance, among others (Haykin 1999). Since then, several other researchers have proposed various ANN-based SPC chart pattern recognizers.

There are many factors that influence the performance of ANN-based pattern recognizers. Among these are the design of the network itself (at the micro- and macro-levels), the selection of training algorithms and training strategies, and the representation of the input data for training and testing. Most of the existing SPC pattern recognition schemes in the literature use normalized raw data as the input vector to the recognizer. Such a representation normally produces large ANN structures and is not very effective or efficient for complicated recognition problems. A smaller ANN can be trained faster and is generally more effective and efficient in recognition. This limitation can be overcome by using features to represent the data, as demonstrated in pattern recognition applications for handwriting (Zeki and Zakaria 2000), printed characters (Amin 2000) and grain grading (Utku 2000), among others. The common motivation for using features extracted from raw data is dimensionality reduction (Pandya and Macy 1996), which significantly reduces the size of the input vector. It was hypothesized that a feature-based SPC pattern recognizer, with its smaller network size, would perform and generalize better than a raw data-based recognizer.
Generalization here means the ability of a recognizer to recognize correctly a pattern it has not been trained on.

Very limited work has been reported on the use of features extracted from SPC chart signals as the input vectors to ANN-based SPC pattern recognizers. Pham and Wani (1997) introduced feature-based control chart pattern recognition. Nine geometric features were proposed: slope, number of mean crossings, number of least-squares line crossings, cyclic membership, average slope of the line segments, slope difference and three different measures of area. The scheme was aimed at improving the performance of the pattern recognizer by presenting a smaller input vector (features). Tontini (1996, 1998) developed an online learning pattern recognizer for SPC chart pattern classification based on a Radial Basis Function Fuzzy-Artmap neural network. His input vector consisted of combinations of 60 individual raw observations, the mean and standard deviation of 15 statistical windows, 10 lags of autocorrelation, the results of the computational Cusum chart and the chi-square statistic. It would appear that combining all of these simultaneously would result in a large input vector. No comparison between raw data and a feature set as the input vector representation was reported; the focus of his study was on developing a recognizer for online incremental learning. Pham and Wani (1997), Wani and Pham (1999) and Tontini (1996, 1998) did not report on the relative merits of raw data and feature sets as the input vector representation. Our extensive review of the major international journals found only that Anagun (1998) conducted a rather limited comparative study of the effectiveness of direct representation (raw data) against feature-based representation. He used a set of frequency counts as the features. The robustness of his feature set appears rather limited since it loses the information on the order of the data. A more extensive investigation is needed to identify the relative merits of the two approaches for input vector representation. Thus, the purpose of the current study is to fill this gap by investigating the classification performance obtained when using a set of statistical features compared with the raw data as the input representation.

The paper is organized as follows. Section 2 presents the patterns used and their generation, while section 3 discusses the statistical features investigated. Section 4 discusses the design of the pattern recognizers, followed by the experimental procedures in section 5. Section 6 provides the results and discussion on the comparison between the two types of recognizers. Section 7 presents some conclusions.

2. Sample patterns

Fully developed patterns were investigated. Ideally, sample patterns should be developed from a real process. Since a large number of samples was required for the recognizers' training and such samples were not economically available, simulated data were used. This is a common approach adopted by other researchers, as mentioned above. This study adopted Swift's (1987) methodology to simulate individual process data, since the methodology has been widely accepted by other researchers. In this study, each sample pattern consisted of 20 subgroup averages of time sequence data with a sample size of five. The parameters used for simulating the six commonly researched control chart patterns are given in table 1. The values of these parameters were varied randomly in a uniform manner between the limits shown.

Table 1. Parameters for simulating SPC chart patterns (in terms of σ).
  Linear trend-up: gradient 0.015 to 0.025
  Linear trend-down: gradient −0.025 to −0.015
  Sudden shift-up: shift magnitude 0.7 to 2.5
  Sudden shift-down: shift magnitude −2.5 to −0.7
  Cyclic: amplitude 0.5 to 2.5; period = 10
  Stable process: mean = 0, SD = 1
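A minimal Python sketch of this simulation approach is given below. It is an illustration rather than the authors' MATLAB implementation: the functional forms (a linear trend, a step at the mid-window point, a sinusoid with period 10, and Gaussian background noise of roughly one third of σ, as noted immediately below) are common formulations consistent with the description in this section and with table 1, and all function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pattern(kind, n=20, sigma=1.0, noise=1/3):
    """Generate one window of n subgroup averages for a given pattern class.

    Illustrative formulations only: trend = gradient * t, shift = step at the
    mid-window point, cyclic = sine with period 10, each superimposed on
    N(0, noise*sigma) background variation. Parameter ranges follow table 1
    (in units of sigma).
    """
    t = np.arange(n)
    base = rng.normal(0.0, noise * sigma, n)              # background variation
    if kind == "random":
        return rng.normal(0.0, sigma, n)                  # stable process, N(0, 1)
    if kind == "trend-up":
        return base + rng.uniform(0.015, 0.025) * sigma * t
    if kind == "trend-down":
        return base - rng.uniform(0.015, 0.025) * sigma * t
    if kind == "shift-up":
        return base + rng.uniform(0.7, 2.5) * sigma * (t >= n // 2)
    if kind == "shift-down":
        return base - rng.uniform(0.7, 2.5) * sigma * (t >= n // 2)
    if kind == "cyclic":
        return base + rng.uniform(0.5, 2.5) * sigma * np.sin(2 * np.pi * t / 10)
    raise ValueError(kind)

window = sample_pattern("shift-up")   # one fully developed 20-point pattern
```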
Random noise of 1/3σ was added to all unstable patterns. These parameters were chosen to keep the patterns within the control limits since, for preventive purposes, the status of a process should be identified while it is still operating within these limits. It may be too late for preventive action if the ‘alarm’ is only generated after hitting the control limits. The minimum parameter values were chosen such that the patterns were sufficiently differentiable even after being contaminated by random variation. It was assumed that only one fundamental period existed for cyclic patterns, and that a sudden shift only appeared in the middle of an observation window. A total of 2160 and 3600 sample patterns were used in the training and recall phases, respectively.

3. Statistical features

The choice of statistical features to be extracted from the raw data and presented as the input vector to the recognizer is very important. Battiti (1994) noted that the mixture content of a feature set representing the original signal has an impact on the learning and generalization performance of recognizers. The presence of too many input features can burden the training process and lead to inefficient recognizers. Features low in information content, or redundant ones, should be eliminated whenever possible. Redundant here refers to features with marginal contribution given that other features are present.

A two-level, resolution IV fractional factorial experimental design, 2^(10−5) (Montgomery 2001b), was used for screening and selecting a minimal set of representative statistical features from a list of 10 candidate features. A detailed discussion can be found in Hassan (2002). Table 2 summarizes the selected and omitted features. The mathematical expressions for these statistical features are provided in the appendix.

Table 2. Selected and omitted features (Hassan et al. 2002).
  Selected features: mean, standard deviation (SD), skewness, mean-square value, autocorrelation, Cusum
  Omitted features: median, range, kurtosis, slope

Most of the above features are self-evident as they refer to commonly used summary statistics. The mean-square value is the ‘average power’ of the signal (Brook and Wynne 1988). The Cusum statistic incorporates all the information in a sequence of sample values by accumulating the sums of deviations of the sample values from a target value (Montgomery 2001a); the last values of the algorithmic Cusum were used. The slope was estimated using the least-squares method (Neter et al. 1996). The average of the autocorrelations at lags 1 and 2 was used for the autocorrelation feature. The six features recommended above were then used to represent the input data for training and testing the feature-based recognizers.
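The six selected features can be computed directly from a 20-point observation window. The following Python sketch is a minimal illustration under the definitions given in this section and in the appendix; the choice of CUSUM target μ0 = 0 and reference value K = 0.5, and the way the two one-sided CUSUMs are combined into a single feature value, are assumptions rather than details stated in the paper, and the names are illustrative only.

```python
import numpy as np

def extract_features(x, mu0=0.0, k=0.5):
    """Compute the six selected statistical features for one window x.

    Mean, standard deviation, skewness (A4), mean-square value (A1), the
    average autocorrelation at lags 1 and 2 (A9) and the final values of the
    one-sided tabular CUSUMs (A2)-(A3) with target mu0 and reference value k.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    mean = x.mean()
    sd = x.std(ddof=1)
    skew = np.mean((x - mean) ** 3) / sd ** 3              # eqn (A4) for raw data
    msv = np.mean(x ** 2)                                   # 'average power', eqn (A1)

    def acf(lag):                                           # eqn (A9)
        return np.sum(x[:n - lag] * x[lag:]) / (n - lag)
    autocorr = 0.5 * (acf(1) + acf(2))                      # average of lags 1 and 2

    c_plus = c_minus = 0.0                                  # eqns (A2)-(A3)
    for xi in x:
        c_plus = max(0.0, xi - (mu0 + k) + c_plus)
        c_minus = max(0.0, (mu0 - k) - xi + c_minus)
    # Combining the upper and lower CUSUMs into one value is an assumption here.
    return np.array([mean, sd, skew, msv, autocorr, c_plus - c_minus])
```

In the training procedure described in section 5, feature values of this kind are then normalized to the range [−1, 1] before being presented to the network.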
4. Pattern recognizer design

The recognizer was developed based on the multilayer perceptron (MLP) architecture, as it has been applied successfully to solve difficult and diverse problems in modelling, prediction and pattern classification (Haykin 1999). Its basic structure comprises an input layer, one or more hidden layers and an output layer. Figure 2 shows an MLP neural network structure comprising these layers and their respective weight connections, w^1_ji and w^2_kj. Details of design procedures for such MLPs are widely available, for example in Patterson (1996) and Haykin (1999). Before this recognizer can be put into application, it needs to be trained and tested. In the supervised training approach, sets of training data comprising input and target vectors are presented to the MLP. The learning process takes place through adjustment of the weight connections between the input and hidden layers (w^1_ji) and between the hidden and output layers (w^2_kj). These weight connections are adjusted according to the specified performance and learning functions.

Figure 2. MLP neural network structure (input layer i, hidden layer j, output layer k; weights w^1_ji and w^2_kj; outputs O_1, O_2, ..., O_M).

The number of nodes in the input layer was set according to the actual number of statistical features used. In this study, the size of the input vector was six, corresponding to the selected feature set given in table 2. When the raw data were used, the number of input nodes was equal to the size of the observation window, i.e. 20. The number of output nodes was set to the number of pattern classes, i.e. six. The number of nodes in the hidden layer was chosen by trial and error. Thus, the ANN structures were 20–6–6 and 6–6–6 for the recognizers using raw data and statistical features as inputs, respectively.

Since this study used the supervised training approach, each pattern presentation was tagged with its respective label. The labels, shown in table 3, are the target values for the recognizers' output nodes. The maximum value in each row (0.9) identifies the node expected to produce the highest output if a pattern is to be considered correctly classified. The output values are denoted O_1, O_2, ..., O_M in figure 2.

Table 3. Targeted recognizer outputs (columns: targeted outputs at nodes 1 to 6).
  Pattern class 1, Random:            0.9  0.1  0.1  0.1  0.1  0.1
  Pattern class 2, Linear trend-up:   0.1  0.9  0.1  0.1  0.1  0.1
  Pattern class 3, Linear trend-down: 0.1  0.1  0.9  0.1  0.1  0.1
  Pattern class 4, Sudden shift-up:   0.1  0.1  0.1  0.9  0.1  0.1
  Pattern class 5, Sudden shift-down: 0.1  0.1  0.1  0.1  0.9  0.1
  Pattern class 6, Cyclic:            0.1  0.1  0.1  0.1  0.1  0.9

Preliminary investigations were conducted to choose a suitable training algorithm. Three back-propagation training algorithms, namely gradient descent with momentum and adaptive learning rate (traingdx), BFGS quasi-Newton (trainbfg) and Levenberg-Marquardt (trainlm) (Demuth and Beale 1998), were evaluated. The traingdx algorithm was adopted since it provided reasonably good performance and more consistent results; it was also more memory-efficient than trainlm. Trainlm gave the fastest convergence with the fewest epochs, but it required too much memory. Trainbfg converged much faster than traingdx, but its results were relatively less consistent. The network performance was measured using the mean squared error (MSE). The activation functions used were the hyperbolic tangent (tansig) for the hidden layer and the sigmoid (logsig) for the output layer. The hyperbolic tangent function is given by f(x) = (e^x − e^(−x))/(e^x + e^(−x)), with output ranging from −1 to +1. The sigmoid function is given by f(x) = 1/(1 + e^(−x)), with output varying monotonically from 0 to 1. The variable x is the net input of the processing element (Hush and Horne 1993, Patterson 1996).
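The recognizers themselves were built with the MATLAB ANN toolbox (section 5). Purely as a structural illustration of the 6–6–6 feature-based network described above, the following Python sketch shows the forward pass with a tanh hidden layer and a logistic output layer; the weights here are random placeholders rather than trained values, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# 6-6-6 feature-based recognizer: 6 inputs, 6 hidden nodes, 6 output nodes.
W1 = rng.normal(0, 0.5, (6, 6))   # input-to-hidden weights (w1_ji)
b1 = np.zeros(6)
W2 = rng.normal(0, 0.5, (6, 6))   # hidden-to-output weights (w2_kj)
b2 = np.zeros(6)

def recognize(features):
    """Forward pass: tanh hidden layer, logistic (sigmoid) output layer."""
    h = np.tanh(W1 @ features + b1)             # hidden-layer activations
    o = 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))    # output node values O1..O6
    return int(np.argmax(o)) + 1                # winning pattern class (1..6, as in table 3)

# Example: classify one normalized six-feature input vector.
print(recognize(rng.uniform(-1, 1, 6)))
```

The raw data-based recognizer differs only in its input layer, which has 20 nodes instead of six.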
5. Experimental procedure

Two types of ANN recognizers were developed: one used raw data as the input vector, while the other used statistical features extracted from the data. Before the relative merits of raw data and features as input vector representations could be evaluated, the recognizers had to be properly trained and tested. This section discusses the procedures for the training and recall (recognition) phases of the recognizers. The recognition task was limited to the six previously mentioned common SPC chart patterns. All procedures were coded in MATLAB using its ANN toolbox.

5.1. Training phase

Figure 3 shows the training procedure for the recognizers and table 4 provides the details of the training specifications. The overall procedure began with the generation and presentation of process data to the observation window. All patterns were fully developed when they appeared within the recognition window. When raw data were used as the input vector, the pre-processing stage involved only a basic transformation into standardized Normal, N(0, 1), values. The statistical features approach, on the other hand, involved extraction of the statistical values from the raw data; these features were then normalized such that their values fell within [−1, 1]. The rest of the procedure was the same for both approaches. Before the sample data were presented to the ANN for the learning process, they were divided into training (60%), validation (20%) and preliminary testing (20%) sets (Demuth and Beale 1998). These sample sets were then randomized to avoid possible bias in the order in which the sample patterns were presented to the ANN.

Figure 3. Training procedure for the recognizers using raw data and statistical features.

The training procedure was conducted iteratively, covering ANN learning, validation of the in-training ANN and preliminary testing. During learning, the training data set (2160 patterns) was used for updating the network weights and biases. The ANN was then subjected to in-training validation using the validation data set (720 patterns) for early stopping, to avoid overfitting. The error on the validation set will typically begin to rise when the network begins to overfit the data, and the training process was stopped when the validation error had increased for a specified number of iterations. In this study, the maximum number of validation failures was set to five iterations. Demuth and Beale (1998) provide further discussion on the use of early stopping for improving the generalization of the network. The ANN was then subjected to preliminary performance tests using the testing data set (720 patterns). The testing set errors were not used for updating the network weights and biases. However, the results of the preliminary tests, in terms of the percentage of correct classification and the test set errors, were used as the acceptance criteria for the trained recognizers.

Table 4. Training specifications.
  Number of training samples: 2160 patterns
  Training stopping criteria: maximum number of epochs = 300; error goal = 0.01; maximum number of validation failures = 5
  Acceptance criteria for trained recognizer: MSE ≤ 0.01; percentage of correct classification ≥ 95%, or the best recognizer after retraining
  Maximum number of retrainings allowed: two
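The stopping rule in table 4 (error goal, epoch limit and at most five validation failures) can be summarized in a few lines of code. The Python sketch below is illustrative only: it fits a toy linear model by gradient descent rather than the actual MLP trained with traingdx, and it assumes that the weights giving the lowest validation error are the ones retained, which is a common implementation choice rather than a detail stated in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-in for the recognizer: a linear model on synthetic data,
# split 2160/720/720 (training/validation/testing) as in the paper.
X = rng.normal(size=(3600, 6))
y = X @ rng.normal(size=6) + 0.1 * rng.normal(size=3600)
X_tr, y_tr = X[:2160], y[:2160]            # training set (60%)
X_va, y_va = X[2160:2880], y[2160:2880]    # validation set (20%), used for early stopping

w = np.zeros(6)

def mse(A, b):
    return float(np.mean((A @ w - b) ** 2))

best_w, best_val, failures = w.copy(), np.inf, 0
for epoch in range(300):                                       # max. no. of epochs = 300
    w -= 0.05 * 2 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)     # one learning update
    if mse(X_tr, y_tr) <= 0.01:                                # error goal = 0.01
        break
    val = mse(X_va, y_va)
    if val < best_val:                                         # validation error improved
        best_val, best_w, failures = val, w.copy(), 0
    else:                                                      # a validation failure
        failures += 1
        if failures >= 5:                                      # max. validation failures = 5
            break
w = best_w   # retain the weights with the lowest validation error (assumed behaviour)
```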
In other words, the decision on whether to accept the trained recognizers or to allow retraining was made on the basis of these preliminary performance tests. Training was stopped whenever one of the following stopping criteria was satisfied: the performance error goal was achieved, the maximum allowable number of training epochs was reached, or the maximum number of validation failures was exceeded (validation test). Once training stopped, the trained recognizer was evaluated for acceptance by comparing its preliminary performance results with the acceptance criteria given in table 4. The recognizer was retrained using a completely new data set if its performance remained poor. This procedure was intended to minimize the effect of poor training sets. Each type of recognizer (statistical feature input and raw data input) was replicated by exposing it to 10 different training cycles, giving 10 different trained recognizers of each type. These recognizers are labelled 1.1–1.10 in table 5 and 2.1–2.10 in table 6 for the raw data and statistical feature inputs, respectively. All 10 recognizers of each type have the same architecture and differ only in the training data sets used. Recognizer 1.j and Recognizer 2.j, for j = 1, 2, ..., 10, were trained using the same training data set. The training and recall performance results provided in tables 5 and 6 are discussed in section 6.

5.2. Recall or recognition phase

Once accepted, each trained recognizer was tested (recall phase) using 10 different, totally unseen data sets. The testing procedure for the recall phase is shown in figure 4. The results of the recall phase are presented and discussed below.

Figure 4. Testing procedure for the trained recognizers (recall phase).

6. Results and discussion

This section presents the results and compares the performance of the feature-based recognizers, trained and tested using the six recommended statistical features given in table 2, with that of the recognizers trained and tested using raw data. Tables 5 and 6 show the training and recall performance of the 10 raw data-based recognizers and the 10 feature-based recognizers, respectively. It was noted during training that the feature-based recognizers were more easily trained: none of the feature-based recognizers required retraining, whereas all of the raw data-based recognizers required such retraining before they could be accepted.

Table 5. Training and recall performance for raw data-based recognizers (input: raw data).
  Recognizer | Training: % correct | Training error (MSE) | Epochs | Recall: mean % correct | Recall: SD
  1.1  | 90.14 | 0.0216 | 177 | 89.961 | 0.683
  1.2  | 94.31 | 0.0113 | 294 | 94.103 | 0.534
  1.3  | 92.64 | 0.0124 | 278 | 93.220 | 0.488
  1.4  | 92.78 | 0.0235 | 181 | 92.205 | 0.506
  1.5  | 92.64 | 0.0161 | 278 | 93.156 | 0.536
  1.6  | 95.42 | 0.0131 | 198 | 92.880 | 0.441
  1.7  | 92.50 | 0.0146 | 300 | 92.236 | 0.547
  1.8  | 92.64 | 0.0124 | 278 | 93.220 | 0.488
  1.9  | 91.81 | 0.0207 | 279 | 91.321 | 0.467
  1.10 | 94.44 | 0.0121 | 217 | 92.851 | 0.511
  Overall mean | 92.93 | 0.0158 | 248 | 92.515 |
  SD | 1.4863 | 0.0045 | 48.84 | 1.169 |
Table 6. Training and recall performance for feature-based recognizers (input: statistical features).
  Recognizer | Training: % correct | Training error (MSE) | Epochs | Recall: mean % correct | Recall: SD
  2.1  | 96.96 | 0.0099 | 213 | 96.84 | 0.314
  2.2  | 98.06 | 0.0099 | 241 | 97.11 | 0.399
  2.3  | 96.81 | 0.0098 | 219 | 96.30 | 0.408
  2.4  | 97.78 | 0.0099 | 220 | 96.78 | 0.222
  2.5  | 97.78 | 0.0100 | 204 | 96.36 | 0.387
  2.6  | 96.81 | 0.0100 | 265 | 97.18 | 0.303
  2.7  | 95.97 | 0.0100 | 225 | 96.98 | 0.351
  2.8  | 96.39 | 0.0100 | 224 | 97.01 | 0.377
  2.9  | 98.19 | 0.0100 | 221 | 96.80 | 0.289
  2.10 | 97.50 | 0.0099 | 220 | 96.58 | 0.226
  Overall mean | 97.22 | 0.00994 | 225.2 | 96.79 |
  SD | 0.7467 | 0.00007 | 16.81 | 0.30 |

The overall mean percentages of correct recognition of the raw data-based and feature-based recognizers were 92.5 and 96.8%, respectively. The percentages ranged from 89.96 to 94.10% and from 96.30 to 97.18% for the raw data-based and feature-based recognizers, respectively. The results of the statistical significance tests are summarized in table 7. Paired t-tests (α = 0.01) were conducted, as described in Walpole et al. (1998), for the 10 pairs of raw data-based and feature-based recognizers, comparing their performance in terms of the percentage of correct classification and the training error (MSE).

Table 7. Statistical significance tests for difference in performance (paired t-tests, α = 0.01).
  Recall performance: H0: μ_recall(Feature − Raw) = 0; H1: μ_recall(Feature − Raw) > 0; t statistic T = 11.148; t critical (t_α) = 2.821; decision: reject H0.
  MSE: H0: μ_MSE(Raw − Feature) = 0; H1: μ_MSE(Raw − Feature) > 0; t statistic T = 4.100; t critical (t_α) = 2.821; decision: reject H0.

The results in table 7 suggest that the difference in recognition accuracy between the two types of recognizer was significant. This confirms that features as the input data representation give better recognition performance than raw data. This finding is consistent with those reported by Pham and Wani (1997) and Wani and Pham (1999), although they used different sets of features. Further, the comparison shows that the difference between the training errors (MSE) of the two types of recognizers was also significant, indicating that more training effort would be required if the raw data-based recognizer were to achieve the required error margin.

6.1. Confusion matrix

The confusion matrix is a table summarizing the tendency of the recognizer to classify a recognized pattern into the correct class or into any of the other five possible (wrong) classes. The confusion matrices given in tables 8 and 9 provide the overall mean percentages of confusion among pattern classes for the 10 raw data-based and the 10 feature-based recognizers, respectively. In other words, they are the mean scores from 100 such matrices (10 recognizers × 10 testing sets). Tables 8 and 9 show that there is confusion in the classification process for both types of recognizers.
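A confusion matrix of this kind is straightforward to tally from recognizer decisions. The short Python sketch below is a generic illustration rather than the authors' code: rows are true classes, columns are identified classes, and entries are converted to row percentages as in tables 8 and 9; the dummy labels are for demonstration only.

```python
import numpy as np

CLASSES = ["Random", "Trend-up", "Trend-down", "Shift-up", "Shift-down", "Cyclic"]

def confusion_matrix(true_labels, predicted_labels, n_classes=6):
    """Tally a confusion matrix (rows: true class, columns: identified class)
    and convert the counts to row percentages."""
    counts = np.zeros((n_classes, n_classes))
    for t, p in zip(true_labels, predicted_labels):
        counts[t, p] += 1
    return 100.0 * counts / counts.sum(axis=1, keepdims=True)

# Example with dummy labels (0 = Random, ..., 5 = Cyclic), roughly 90% correct.
rng = np.random.default_rng(3)
true = rng.integers(0, 6, 600)
pred = np.where(rng.random(600) < 0.9, true, rng.integers(0, 6, 600))
print(np.round(confusion_matrix(true, pred), 1))
```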
For the raw data-based recognizers, the random pattern tended to be most confused with the cyclic pattern, the linear trend-up pattern with the sudden shift-up pattern, and the linear trend-down pattern with the sudden shift-down pattern. The feature-based recognizers demonstrated a similar confusion tendency, except for the random patterns. These pairings could be the result of the confused pairs sharing many similar characteristics. Random patterns were the hardest for the raw data-based recognizers to classify (85.7%): they misclassified about 6% of random patterns as cyclic patterns and about 5% of cyclic patterns as random ones. For the feature-based recognizers, however, shift patterns were the hardest to classify (about 94%); these patterns tended to be confused with random and linear trend patterns in about 2–3% of cases.

Table 8. Mean percentage confusion using raw data as the input vector (rows: true pattern class; columns: pattern class identified by the raw data-based recognizer).
  True class  | Random | Trend-up | Trend-down | Shift-up | Shift-down | Cyclic
  Random      |  85.73 |   1.24   |    1.35    |   2.99   |    2.71    |  5.98
  Trend-up    |   0.01 |  98.31   |    0.00    |   1.68   |    0.00    |  0.10
  Trend-down  |   0.01 |   0.00   |   97.44    |   0.00   |    2.55    |  0.00
  Shift-up    |   3.35 |   7.64   |    0.00    |  88.62   |    0.00    |  0.38
  Shift-down  |   2.81 |   0.00   |    6.81    |   0.00   |   90.28    |  0.09
  Cyclic      |   5.27 |   0.01   |    0.00    |   0.01   |    0.00    | 94.71

Table 9. Mean percentage confusion using statistical features as the input vector (rows: true pattern class; columns: pattern class identified by the feature-based recognizer).
  True class  | Random | Trend-up | Trend-down | Shift-up | Shift-down | Cyclic
  Random      |  95.24 |   0.00   |    0.01    |   2.30   |    1.89    |  0.56
  Trend-up    |   0.00 |  99.23   |    0.00    |   0.74   |    0.00    |  0.03
  Trend-down  |   0.00 |   0.00   |   99.36    |   0.00   |    0.63    |  0.01
  Shift-up    |   2.33 |   2.90   |    0.00    |  94.50   |    0.00    |  0.27
  Shift-down  |   2.62 |   0.00   |    2.62    |   0.00   |   94.32    |  0.45
  Cyclic      |   0.91 |   0.00   |    0.00    |   0.44   |    0.54    | 98.10

Generally, the results for the classification of random patterns in table 8 (85.7%) and table 9 (95.2%) suggest that the type I error performance of both types of recognizers is not very good. This is possibly due to the unpredictable structure of random data streams, which makes them relatively more difficult to recognize than unstable patterns.

Table 10. Recognition performance (percentage of correct recognition) of raw data-based and feature-based recognizers over 10 testing data sets (F = feature-based, R = raw data-based).
  Data set | Random F/R    | Trend-up F/R  | Trend-down F/R | Shift-up F/R  | Shift-down F/R | Cyclic F/R
  1        | 96.58 / 87.25 | 98.83 / 98.73 | 99.30 / 97.85  | 99.30 / 88.44 | 93.72 / 90.27  | 97.62 / 93.85
  2        | 95.28 / 86.58 | 99.42 / 98.65 | 99.52 / 97.75  | 99.52 / 87.92 | 94.62 / 90.30  | 97.93 / 95.53
  3        | 95.75 / 86.43 | 99.30 / 98.47 | 99.27 / 97.93  | 99.27 / 90.67 | 93.80 / 90.02  | 98.57 / 94.62
  4        | 94.62 / 84.93 | 99.25 / 98.19 | 99.60 / 97.05  | 99.60 / 99.63 | 95.15 / 90.50  | 97.62 / 93.98
  5        | 95.65 / 86.02 | 99.03 / 98.82 | 99.28 / 98.05  | 99.28 / 89.52 | 93.92 / 89.28  | 98.48 / 95.28
  6        | 95.03 / 86.77 | 99.49 / 98.05 | 99.05 / 97.33  | 99.05 / 88.58 | 95.08 / 91.25  | 98.55 / 94.77
  7        | 94.05 / 83.97 | 99.23 / 97.98 | 99.44 / 97.03  | 99.44 / 87.60 | 93.38 / 90.12  | 97.17 / 93.30
  8        | 95.80 / 85.00 | 99.27 / 97.92 | 99.45 / 97.45  | 99.45 / 87.90 | 94.10 / 89.25  | 98.23 / 94.48
  9        | 94.88 / 86.07 | 99.02 / 98.08 | 99.48 / 96.62  | 99.48 / 88.95 | 95.07 / 91.98  | 98.95 / 95.68
  10       | 94.75 / 84.32 | 99.45 / 98.18 | 99.23 / 97.33  | 99.23 / 88.03 | 94.40 / 89.85  | 97.92 / 95.58
On the other hand, unstable data streams tend to be correlated among successive data points; the structures of their patterns are therefore more predictable, and this may have contributed towards the easier recognition of unstable patterns. The confusion among patterns could also partly be attributed to some vague patterns (due to low amplitude, gradient, etc.) and to interference from the baseline noise. The recognizers were designed to select the pattern corresponding to the output node with the maximum value. One possible approach to overcoming this confusion is to consider the quality of each output node value: a weak output should be classified into a reject class even though it is the maximum value.

The average recognition performances of the respective 10 raw data-based and 10 feature-based recognizers in correctly recognizing the different types of patterns are given in table 10. The recognition performance for each type of pattern was compared statistically using paired t-tests (α = 0.01), and the results are summarized in table 11. These results clearly suggest that the feature-based recognizers are statistically less confused than the raw data-based recognizers for all types of patterns (all H0 were rejected).

Table 11. Statistical significance tests for the recognition performance of raw data-based and feature-based recognizers (paired t-tests, α = 0.01; t critical (t_α) = 2.82; all decisions: reject H0).
  Random: H0: μ_random(Feature − Raw) = 0; H1: μ_random(Feature − Raw) > 0; T = 37.91
  Trend-up: H0: μ_trend-up(Raw − Feature) = 0; H1: μ_trend-up(Raw − Feature) > 0; T = 6.32
  Trend-down: H0: μ_trend-down(Raw − Feature) = 0; H1: μ_trend-down(Raw − Feature) > 0; T = 11.26
  Sudden shift-up: H0: μ_shift-up(Raw − Feature) = 0; H1: μ_shift-up(Raw − Feature) > 0; T = 34.86
  Sudden shift-down: H0: μ_shift-down(Raw − Feature) = 0; H1: μ_shift-down(Raw − Feature) > 0; T = 19.94
  Cyclic: H0: μ_cyclic(Raw − Feature) = 0; H1: μ_cyclic(Raw − Feature) > 0; T = 18.11

7. Conclusions

The objective of this study was to evaluate the relative performance of feature-based SPC recognizers compared with raw data-based recognizers. An MLP neural network was used as a generic recognizer to classify six different types of SPC chart patterns, and a set of six statistical features was used in training and testing the feature-based recognizers. In this study, the feature-based recognizers achieved a statistically significant improvement in recognition performance. Further, the use of the statistical feature set required less training effort and resulted in better recall performance. These results confirm the expectation that a feature-based input vector representation leads to better recognizer performance; it is important to note that this holds only when a proper set of representative features is used. Thus, summary statistics can be used as a reliable and better alternative for representing SPC chart pattern data. The feature-based scheme used in this study is capable of coping with a high degree of pattern variability within the control limits. The findings can be used as guidelines for developing better SPC pattern recognizers. Currently, we are extending this work to recognize transitional SPC chart patterns for real-time monitoring and recognition. In this effort, mechanisms to improve type I errors are also being addressed.
Other pattern types, such as stratification, mixture and systematic patterns, are to be included in future studies, as are other features. This work can also be extended to investigate online learning and the effect of costs on the decisions.

Acknowledgements

The authors thank Professor D. T. Pham for giving A. H. the opportunity to conduct part of this study in the Intelligent Systems Laboratory, Cardiff University of Wales, UK. They also thank the anonymous referees for their comments and suggestions.

Appendix: Mathematical expressions for the statistical features

The mathematical expressions for the statistical features mean, median, standard deviation and range are widely available in most texts on SPC or statistics. The expressions for the remaining statistical features are as follows.

A.1. Mean-square value

\bar{x^2} = \frac{x_0^2 + x_1^2 + \cdots + x_N^2}{N+1} = \frac{1}{N+1} \sum_{i=0}^{N} x_i^2    (A1)

A.2. Cusum

The tabular cusum accumulates the deviations from the target \mu_0 that are above the target in the statistic C^+ and the deviations below the target in the statistic C^-. The one-sided upper and lower cusums C^+ and C^- are computed as follows (Montgomery 2001a):

C_i^+ = \max[0, x_i - (\mu_0 + K) + C_{i-1}^+]    (A2)

C_i^- = \max[0, (\mu_0 - K) - x_i + C_{i-1}^-]    (A3)

where the starting values are C_0^+ = C_0^- = 0. In this study, the last cusum statistic of each data stream was taken as the representative feature. The reference value K was set to half the shift magnitude to provide sensitivity for detecting a shift of 1\sigma.

A.3. Skewness

Skewness (a_3) provides information concerning the shape of the distribution; it indicates any lack of symmetry in the data distribution. The skewness of a frequency distribution is given by (Besterfield 1994):

a_3 = \frac{\sum_{i=1}^{h} f_i (X_i - \bar{X})^3 / n}{s^3}    (A4)

where n is the number of observed values, s is the sample standard deviation, f_i is the frequency in a cell or the frequency of an observed value, \bar{X} is the average of the observed values and X_i is the observed value.

A.4. Kurtosis

Kurtosis (a_4) is the peakedness of the data. Like skewness, it provides information concerning the shape of the distribution, and it is used as a measure of the height of the peak of a distribution (Besterfield 1994):

a_4 = \frac{\sum_{i=1}^{h} f_i (X_i - \bar{X})^4 / n}{s^4}    (A5)

A.5. Slope

Let \bar{Y} denote the mean of Y_i and \bar{X} the mean of X_i for the (X_i, Y_i) pairs of sample observations. The best-fitting straight line is given by Y_i = b_0 + b_1 X_i, where b_1 is the slope. The least-squares line and its slope are obtained as follows (Kleinbaum et al. 1988, Neter et al. 1996):

\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X    (A6)

\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}    (A7)

\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}    (A8)

The intercept is given by \hat{\beta}_0 and \hat{\beta}_1 is the slope of the fitted line. The value of \hat{\beta}_1 was used as the feature in this study.

A.6. Average autocorrelation

Autocorrelation exists when one part of a signal depends on another part of the same signal. It measures the dependence of the data at one instant in time on the data at another instant in time (Brook and Wynne 1988). A signal shows the wildest fluctuation with time when there is no correlation between any two neighbouring values. For a random signal, its value in the distant future can be expected to be negligibly dependent on its present value.
The autocorrelation function is defined as follows (Brook and Wynne 1988):

R_{xx}[k] \approx \frac{1}{N+1-k} \left( x_0 x_k + x_1 x_{1+k} + \cdots + x_{N-k} x_N \right), \quad 0 \le k \le N    (A9)

where N is the number of observations and k is the lag. In this study, the average of the autocorrelations at lags 1 and 2 was used as the feature. These lags were chosen based on preliminary simulation runs.

References

Amin, A., 2000, Recognition of printed Arabic text based on global features and decision tree learning techniques. Pattern Recognition, 33, 1309–1323.
Anagun, A. S., 1998, A neural network applied to pattern recognition in statistical process control. Computers & Industrial Engineering, 35, 185–188.
Battiti, R., 1994, Using mutual information for selecting features in supervised neural net learning. IEEE Transactions on Neural Networks, 5, 537–550.
Brook, D. and Wynne, R. J., 1988, Signal Processing: Principles and Applications (London: Edward Arnold).
Cheng, C.-S., 1989, Group technology and expert systems concepts applied to statistical process control in small-batch manufacturing. Unpublished PhD dissertation, Arizona State University.
Demuth, H. and Beale, M., 1998, Neural Network Toolbox User's Guide (Natick, MA: Math Works).
Hassan, A., 2002, On-line recognition of developing control chart patterns. PhD thesis, Universiti Teknologi Malaysia.
Haykin, S., 1999, Neural Networks: A Comprehensive Foundation, 2nd edn (Englewood Cliffs, NJ: Prentice-Hall).
Hush, D. R. and Horne, B. G., 1993, Progress in supervised neural networks: what's new since Lippmann? IEEE Signal Processing Magazine, January, 8–39.
Hwarng, H. B. and Hubele, N. F., 1991, X-bar chart pattern recognition using neural nets. ASQC Quality Congress Transactions, 884–889.
Hwarng, H. B. and Hubele, N. F., 1993, Back-propagation pattern recognisers for X-bar control charts: methodology and performance. Computers and Industrial Engineering, 24, 219–235.
Jones, B., 1991, Design of experiments. In T. Pyzdek and R. W. Berger (eds), Quality Engineering Handbook (New York: Marcel Dekker), pp. 329–387.
Montgomery, D. C., 2001a, Introduction to Statistical Quality Control, 4th edn (New York: Wiley).
Montgomery, D. C., 2001b, Design and Analysis of Experiments, 5th edn (New York: Wiley).
Neter, J., Kutner, M. H., Nachtsheim, C. J. and Wasserman, W., 1996, Applied Linear Statistical Models, 4th edn (Chicago: Irwin).
Oakland, J. S., 1996, Statistical Process Control (Oxford: Butterworth-Heinemann).
Pandya, A. S. and Macy, R. B., 1996, Pattern Recognition with Neural Networks in C++ (Boca Raton, FL: CRC Press).
Patterson, D. W., 1996, Artificial Neural Networks: Theory and Applications (Singapore: Prentice-Hall).
Pham, D. T. and Oztemel, E., 1993, Control chart pattern recognition using combinations of multilayer perceptrons and learning vector quantisation neural networks. Proceedings of the Institution of Mechanical Engineers, 207, 113–118.
Pham, D. T. and Wani, M. A., 1997, Feature-based control chart pattern recognition. International Journal of Production Research, 35, 1875–1890.
Ross, P. J., 1996, Taguchi Techniques for Quality Engineering (New York: McGraw-Hill).
Swift, J. A., 1987, Development of a knowledge based expert system for control chart pattern recognition and analysis. Unpublished PhD dissertation, Graduate College, Oklahoma State University.
Tontini, G., 1996, Pattern identification in statistical process control using fuzzy neural networks.
In Proceedings of the 5th IEEE International Conference on Fuzzy Systems, 3, pp. 2065–2070.
Tontini, G., 1998, Robust learning and identification of patterns in statistical process control charts using a hybrid RBF fuzzy ARTMAP neural network. In Proceedings of the 1998 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), 3, pp. 1694–1699.
Utku, H., 2000, Application of the feature selection method to discriminate digitised wheat varieties. Journal of Food Engineering, 46, 211–216.
Walpole, R. E., Myers, R. H. and Myers, S. L., 1998, Probability and Statistics for Engineers and Scientists, 6th edn (New York: Macmillan).
Wani, M. A. and Pham, D. T., 1999, Efficient control chart pattern recognition through synergistic and distributed artificial neural networks. Proceedings of the Institution of Mechanical Engineers, Part B, 213, 157–169.
Zeki, A. A. and Zakaria, M. S., 2000, New primitive to reduce the effect of noise for handwritten features extraction. In IEEE 2000 TENCON Proceedings: Intelligent Systems and Technologies for the New Millennium, pp. 24–27.