Cost Estimation of Software Intensive Projects: A Survey of Current Practices*

Jairus Hihn
Hamid Habib-agahi
Jet Propulsion Laboratory / California Institute of Technology
4800 Oak Grove Drive
Pasadena, California 91109

* The research described in this paper was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration.

Abstract

This paper describes a survey conducted of the staff of the Jet Propulsion Laboratory (JPL) who estimate software costs for software intensive projects in JPL's technical divisions. Respondents to the survey described what techniques they use in estimating software costs and, in an experiment, each respondent estimated the size and cost of a specific piece of software described in a design document provided by the authors. It was found that the majority of the technical staff estimating software costs use informal analogy and high level partitioning of requirements, and that no formal procedure exists for incorporating risk and uncertainty. The technical staff is significantly better at estimating effort than size; however, in both cases the variances are so large that there is a 30 percent probability that any one estimate can be more than 50 percent off.

1. Introduction

As with all other aerospace organizations, JPL is in the process of having to confront the ever-increasing importance of software. As of 1987, over 50% of the technical staff was working on software intensive tasks [1]. As is well known, significant cost and schedule overruns are not uncommon on large scale software developments [2,3]. This is partly due to the contractual environment, where in the past there have been incentives to underbid in order to land the project. It is also due to the inherent uncertainty of the cost estimation process [4]. In an attempt to control cost overruns, Congress is increasing the emphasis on the use of fixed price contracts. In order to adjust to this changing environment, JPL is seeking to improve the quality of its estimation and bidding practices.

To achieve these objectives, research has been conducted on metrics collection and project analysis as well as the use and development of cost models. In addition, since bottom-up or grass roots estimates are the main methodology used for major JPL projects, a survey was conducted during the summer and fall of 1989 of the technical staff that actually performs the lowest level estimates. The survey had two basic objectives: to identify the current cost methods and practices within the JPL technical divisions, and to obtain a clearer picture of the accuracy of the effort and size estimates.

The most accurate approach to achieving these objectives would be to observe individuals when they make cost estimates and to collect data on estimates and actual project size and effort. Of course, the problem with this approach is that it would take years to collect sufficient data for analysis, and it would be very expensive. The alternative is to conduct a survey in which participants describe their cost estimation process and participate in an experiment estimating the size and effort for a piece of software that has already been completed. In order to obtain some usable information in a reasonable time frame, the latter approach was selected.

This report contains the following sections: Section 1, Introduction; Section 2, Sample Design and Background Information; Section 3, Summary of Participant Background Data; Section 4, Current Practices; Section 5, Size and Effort Experiment; and Section 6, Summary and Conclusions.
2. Sample Design and Background Information

The following brief overview of JPL is meant to provide the proper context for understanding the results of the survey and experiment. JPL is a NASA field center run by the California Institute of Technology under a government contract. As a national laboratory, it performs research and development activities, primarily the development of robotic spacecraft for interplanetary studies. In addition, a portion of JPL's budget is supplied by non-NASA organizations such as the Department of Defense.

JPL is divided into eight "offices": two offices for upper management and administration, four program offices, the office of technical divisions, and an office for assurance activities. JPL is managed through a matrix structure, with the four program offices controlling the funds and the eight technical divisions responsible for the implementation of tasks. The divisions are defined by technical specialties, such as systems engineering or information science. Sections within each division are even more specialized and are further broken down into groups. Due to the differences in specialization, the viewpoints and practices between the divisions, and even between sections, can be markedly different.

At JPL, the methods used to estimate costs differ depending upon the required reliability. A "Class A" estimate is the most rigorous, and a formal review is required. For Class A estimates, model-based and other independent estimates are supposed to be obtained. However, most estimates are produced by a bottom-up exercise which may involve as many as 25 persons for the software activities portion of each task. These are the types of individuals who were included in the survey.

Obviously, not all personnel working on software intensive tasks estimate software costs. The Earth and Space Sciences Division is composed of sections doing hard science. The Institutional Computing and Mission Operations Division performs operations and testing. While software is developed in these divisions, their developments are relatively small. The result was that these divisions were dropped from the survey, as no one could be identified who would admit to estimating software costs.

2.1 Sample Definition and Methodology

Over 185 persons were contacted for participation in the survey. Of the 185 contacted, over 100 were identified who estimate effort, size and/or cost for software tasks. Of these, 83 persons were willing to complete the questionnaire on current software cost estimation practices. A percentage breakdown comparing the distribution across divisions based on the 1987 survey results (planned) with the sample breakdown of both the survey of current practices and the experiment is given in Table 1.

Table 1: Sample Breakdown by Division (percentages)

Division   Planned   Attempted   Completed
31         0.23      0.23        0.35
33         0.17      0.15        0.17
34         0.15      0.08        0.07
35         0.05      0.04        0.04
36         0.32      0.43        0.28
38         0.08      0.07        0.09

The Chi-square test statistics are 0.0755 and 0.1135 respectively, which for 5 degrees of freedom implies less than a 1% probability that the distributions are different. The interpretation of the test results is that the sample breakdowns for both the survey and the experiment are not statistically different from the planned distribution over the divisions.
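The reported statistics can be reproduced from the proportions in Table 1. A minimal sketch in Python (the function and variable names are ours; note that the statistic matches the reported values only when it is computed directly on the proportions rather than on head counts):

```python
# Planned (1987 survey), attempted, and completed proportions by division (Table 1)
planned   = [0.23, 0.17, 0.15, 0.05, 0.32, 0.08]
attempted = [0.23, 0.15, 0.08, 0.04, 0.43, 0.07]
completed = [0.35, 0.17, 0.07, 0.04, 0.28, 0.09]

def chi_square_statistic(observed, expected):
    # Sum of (observed - expected)^2 / expected over the six divisions
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(chi_square_statistic(attempted, planned))  # ~0.076 (reported as .0755)
print(chi_square_statistic(completed, planned))  # ~0.114 (reported as .1135)
```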
In addition to completing the questionnaire, a total of 48 persons completed some portion of the software size and effort estimation experiment. Only those who actually estimate software costs, either for reporting or for planning their own activities, were included in the survey. The sample is stratified over the technical divisions. The number of persons to include in the survey from each division was determined by the results of a 1987 survey of software intensive activities at JPL [5]. The intention was to make the sample contain the same distribution over the divisions as the actual JPL labor force performing software intensive tasks.

Potential participants were identified by group supervisors and section managers. In most cases, organization charts were obtained and the process started with an interview of the section manager. In this manner, a list of possible respondents was identified and the breakdown across the divisions was kept consistent with the 1987 survey.

Potential participants were interviewed personally. When time permitted, the experiment portion of the survey was completed while the interviewer waited. More often, the experiment portion was explained and the results mailed back later. On average, 30 to 60 minutes were required to complete the size and effort estimates, and the majority were returned within a week.

The participants were asked to specify whether they estimate size, effort, and/or dollars, and then, in their own words, to describe their normal process of estimating software costs. For purposes of analysis, these descriptions were then translated into binary and categorical variables which describe how a task is decomposed, how effort and size estimates are generated, and what cost drivers are used. These variables will be discussed in greater detail in Section 4. For each participant, information was also collected on experience in software cost forecasting, software development, and software management. Experience by programming language was also collected.

For the experiment, the questionnaire asked for size and effort estimates for the code described in a design document which was provided as part of the experiment. (Copies of the document are available from the authors.) A more detailed discussion of the experiment is provided in Section 5. Size was defined as executable source lines of code (SLOC), excluding comments and blanks. Effort was defined as the number of work days spent by the software programming team during the detailed design, code and unit test phases. Respondents could estimate effort in days, weeks or months. For the analysis, all estimates measured in months were converted to days, assuming five days per week and 21 days per month. Respondents were asked for three estimates, the lowest, most likely, and highest (triplet), for both size and cost. Respondents were also asked if they were familiar with the type of software used in the experiment (a database) and what, if anything, they did differently from their normal method of cost estimation.

3. Summary of Participant Background Data

Tables 2a and 2b summarize the experience of the survey participants, which should be representative of the actual JPL population that participates in estimating software costs within the technical divisions.
Table 2a summarizes the participants' years of experience as cost estimators, software developers and software managers. Experience is also broken down by type of language. Since many projects use more than one language, the total development and managerial experience will not be equal to the sum of the respective language experience categories. The breakdown by years of experience provides a picture of how experience is distributed across the population. Table 2b presents a summary of how often the respondents perform estimates.

The average person making cost estimates has approximately 15 years of experience working on software tasks and nine years making cost estimates. Seven of the 15 total years have been spent with managerial responsibilities (cognizant engineer or higher). Furthermore, two-thirds of the population has from 11 to 30 years of development experience. This reveals a substantial amount of experience that is being utilized for cost forecasting. The majority of experience is with FORTRAN and assembly. Although the design used for the experiment was to be implemented in C, there is relatively little C experience in the population. Four percent of those costing software have no software development experience.

In order to measure how frequently estimates are made, each participant was asked to specify the dates of their last three estimates. These were translated to the number of months from the date of each interview. Of those interviewed, 76 percent had made at least one estimate in the six months preceding the interview; 16 percent had made as many as three estimates in the preceding six months. The average time between estimates was eight months. A simple summary of the data is that the average person making a software cost estimate has substantial experience and makes estimates one or two times per year.

Table 2a: Respondent Summary Information

                                          Percentage of respondents reporting
Type of         Mean     Std. Dev.   0       1-5     6-10    11-30
Experience      (Years)  (Years)     years   years   years   years
Development     14.9     7.6         4       8       22      66
  Fortran       7.0      7.8         28      28      21      23
  C             1.1      3.0         73      19      5       3
  Ada           0.2      1.2         96      3       1       0
  Assembly      6.2      4.4         52      16      19      13
  Other         3.0      4.5         58      28      9       5
Managerial      7.2      5.8         17      27      28      28
  Fortran       3.6      4.4         54      22      18      6
  C             1.5      2.7         61      31      8       0
  Ada           0.2      0.7         96      4       0       0
  Assembly      1.8      3.7         82      8       6       4
  Other         1.5      3.3         78      16      5       1
Estimation      9.4      6.6         0       36      30      24

Table 2b: Respondent Summary Information

                                              Percentage of respondents reporting
Estimate             Mean      Std. Dev.   1-6      7-12     13-24    >25
                     (months)  (months)    months   months   months   months
Most Recent          5         7           76       17       6        1
Second Most Recent   14        14          40       23       22       15
Third Most Recent    21        16          16       25       27       31

4. Current Practices

There are several major steps required to make an estimate, and each of these steps can be completed in different ways. To identify how each step is being completed at JPL, the descriptions of current practices have been organized according to the following questions: What do people estimate? To what extent are estimates driven by external constraints? How is software partitioned for cost estimation? How are estimates produced? What cost drivers are most frequently used for estimation?

Each person's description of their cost process is relatively unique. For purposes of analysis, it was necessary to simplify these descriptions by forcing each one into a small number of general categories. There is a fundamental tradeoff between precision in the definition of the categories and the number of categories: the more precise the definitions, the greater the number of categories. In this study, given the small sample size and the large variation within the data, a small number of categories was used.

4.1 What do people estimate?

Everyone interviewed reported estimating effort, while only 49 percent reported estimating size. However, while approximately half of the respondents provide size estimates, only 22 percent of those surveyed reported actually using size as part of their cost estimation process; the others estimated size only to meet a reporting requirement. A total of 69 percent reported estimating dollars.

All Department of Defense-sponsored projects must provide size estimates. In the future, as part of the JPL software process standard (known internally as D-4000), all software intensive projects will provide size estimates. Of course, this does not mean they must be used for cost estimation or tracking.

4.2 To what extent are estimates driven by external constraints?

The cost estimating process can be seriously impacted by conditions of severe budget or schedule constraints. The result is that the estimator's job becomes less one of estimating costs and more one of analyzing system and functional tradeoffs. In many cases, if there is strong motivation to accept the work, the job may be accepted under the assumption that any inconsistencies between requirements, cost and schedule will be resolved while the task is under development. The result is that there is little value in developing competent estimators. Thirty percent of the respondents reported being budget constrained, while 20 percent reported being schedule constrained. Twenty-four percent of the population reported having to estimate projects under both significant schedule and cost constraints.

4.3 How is software partitioned for cost estimation?

Partitioning techniques have been grouped into three main categories: function based, product based, or algorithmic. The functional view is captured by two categories: high level functional breakdown (HLF) and low level functional breakdown (LLF). A partitioning technique is categorized as HLF if the breakdown is at the program set level or higher (CSCI for military projects) and LLF if the breakdown is to one level below a program set or lower (CSC for military projects). The product view is captured by the work breakdown structure (WBS) category. (This assumes that the WBS and the document and code deliverables are consistent, which is not always the case.) If a respondent reported primarily using a WBS, but also completed a low level functional breakdown, this was counted as LLF. Finally, the algorithmic category captures the computational process view of a software system.

Table 3: Summary of Requirements Translation Techniques

Translation of Requirements Method    Percentage of Respondents
High Level Functional Breakdown       53
Low Level Functional Breakdown        28
WBS                                   10
Algorithmic                           9
Total                                 100

The data provide two main insights. One is that the vast majority of estimates are based on a functional view of the project, which is not surprising. Secondly, assuming that WBS and algorithmic partitioning are usually completed at a high level, only 28 percent of those surveyed used a low level breakdown of the requirements. The extensive use of high level breakdowns most likely reflects the existence of the vague and volatile requirements which are an inherent part of R&D projects.

4.4 How are estimates produced?

When cost estimates are generated, several techniques are often used, either in combination or as alternative estimates. For this reason, a primary and a secondary approach have been identified for each respondent. In many cases all of the methods described below were used by the estimator. The data presented reflects the authors' interpretation of which techniques were the dominant ones, based on each participant's description of their estimation process. Four categories are identified: informal analogy, formal analogy, rules of thumb and models.

Rules of thumb, formal analogy and informal analogy are closely related. A respondent was defined as using informal analogy if expert judgment was used or if detailed comparisons to specific projects were made but documented data was not used to support the estimate; the responses appeared to be divided equally between these two extremes. Formal analogy means that documented data were used. Rules of thumb are used in many different ways; for example, to estimate overhead, support tasks, or software size. By their very nature, rules of thumb must be seeded with some other information – information usually generated by analogy. This can be based upon data or experience. Whether a rule of thumb is a formalization of expert opinion or is derived from actual project data, the use of well defined rules of thumb clearly documents the estimator's assumptions.

There were two models used by those surveyed: COCOMO, developed at TRW, and Softcost, developed at JPL in the early eighties. Both are lines-of-code driven.

Table 4: Summary of Estimation Techniques

Estimation Technique   Primary (%)   Secondary (%)
Analogy, Informal      83            34
Analogy, Formal        4             0
Rules of Thumb         6             55
Models                 7             11
Total                  100           100

The most prominent results are that 87 percent of those surveyed use analogy for estimating purposes, but only 4 percent of those surveyed use formal analogy. Only one participant actually kept extensive records of previous projects he had worked on. Another insight is that rules of thumb play an important role as a secondary method, where they are used 55 percent of the time.

4.5 What cost drivers are most frequently used for estimation?

Cost drivers were most frequently used as a multiplier of a base estimate. The value of the multiplier was normally based on a well defined rule of thumb. However, in some cases the value was subjective and changed from estimate to estimate. In other cases, depending upon how specialized the group was, the quality of the personnel was incorporated in a subjective manner simultaneously with the base estimate. These latter methods tend to produce estimates which are not reproducible.

Other cost drivers mentioned, in order of their frequency of occurrence, were anything new (e.g., language, application, operating system), personnel, and complexity. These were incorporated into the process primarily as ratio factors determined by expert judgment. One-third of those interviewed described using some form of measure of volume: 22 percent used lines of code, 10 percent used a form of "function points", two persons counted requirements and one estimated real memory usage. Approximately 20 percent tried to incorporate factors related to standards, such as the amount of documentation or the number of reviews. There were also factors which people knew were important but did not know how to incorporate into their estimates. There was concern expressed about estimating time for testing and reviews, and about how to incorporate risk and uncertainty. All participants mentioned incorporating functionality in some manner.
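The multiplier scheme described above is the same effort-adjustment pattern used by lines-of-code models such as COCOMO. A minimal sketch of the pattern (the driver names and multiplier values below are hypothetical illustrations, not survey data):

```python
# Hypothetical cost-driver multipliers applied to a base estimate (work days),
# mirroring the ratio-factor practice described above.
base_estimate = 120.0  # days, e.g. from analogy or a rule of thumb

drivers = {
    "new language": 1.20,      # anything new raises the estimate
    "experienced team": 0.85,  # strong personnel lower it
    "high complexity": 1.30,
}

estimate = base_estimate
for name, multiplier in drivers.items():
    estimate *= multiplier

print(f"adjusted estimate: {estimate:.0f} days")  # 120 * 1.2 * 0.85 * 1.3 ~= 159
```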
4.6 Analysis

The main insight that stands out from the current practices portion of the survey is that the technical staff are primarily using a functional partitioning of requirements. Requirements are generally partitioned at a relatively high level. Size and cost are estimated based on informal analogy. However, there is virtually no attempt to deal with risk in a systematic manner.

In general, there are few relationships or patterns that are identifiable with respect to how people estimate software size and cost. Based on Chi-square tests for contingency table breakdowns of experience (estimation, development and managerial) and division versus translation of requirements and estimation method, no breakdown was significant at the 10 percent level. A few marginal patterns that were consistent with intuition are that the Systems Division uses a WBS more than statistically expected, and the Electronics Division uses an algorithmic approach more than expected. The expected value is derived from the total distribution over the categories, assuming that the two variables are independent: E(Xij) = N*Pi*Pj.

The less experience (especially software development experience) that software cost estimators have, the greater the likelihood that they use models or rules of thumb as their primary method. Those with no software development experience used an algorithmic partitioning of requirements more than expected. This most likely reveals that their work on software tasks is based on domain or scientific knowledge, and therefore they naturally have a more process-oriented, descriptive view of what the computer system will have to do.

5. Size and Effort Experiment

There are several ways to look at the process of software cost estimation. Direct estimates can be made of the effort required to complete a software project. This is usually done by analogy to other similar projects that have been completed. The problem with this approach is that it depends upon the similarity of the projects being referenced and, in the case of informal analogy, the concern is the accuracy of the estimator's memory.

The other major alternative is to indirectly estimate effort via other project characteristics which are highly correlated to cost. The most frequently used correlate is some measure of the size of the software system. The most frequently used measure of size is the number of source lines of code, which is used to derive the number of work months by multiplying by some assumed productivity. The problem with this approach is that actual productivity figures vary over a wide range due to differences between software projects, such as differences in requirements for ground versus flight software, complexity, programmer experience and capability, coding style, and language differences [8]. In one study, it was shown that the actual size of a piece of software depends upon the main objective to be achieved (for example, efficiency versus modularity) and can vary by a factor of 5 to 1 [6]. In general, the final size of a piece of code is more often driven by exceptional conditions than by the functional requirements [7].
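To make the indirect method concrete, a minimal sketch of the size-times-productivity conversion (the 10 SLOC/day fully burdened rule of thumb and the 18 SLOC/day unburdened JPL average are the figures quoted in Section 5.3 [11]; the 5000-SLOC module is hypothetical):

```python
def effort_days(sloc: float, productivity_sloc_per_day: float = 10.0) -> float:
    """Indirect effort estimate: size divided by an assumed productivity."""
    return sloc / productivity_sloc_per_day

# A hypothetical 5000-SLOC module at the fully burdened rule of thumb
print(effort_days(5000))        # 500 work days
# The same module at the unburdened JPL average of 18 SLOC/day [11]
print(effort_days(5000, 18.0))  # ~278 work days
```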
Previous studies have reported substantial errors and changes in size estimates during the life of a project. Wheaton [8] reports that size estimates grow an average of 105 percent from the time of contract award to the critical design review. Kitchenham and Taylor [9] found that, on average, original estimates were less than half of the actual code size. A recent analysis of several size estimation models yielded the result of an average overestimate of 130 percent [10]. The most accurate model estimates were based on information from detailed designs, and they were still off by 28 percent.

Given the extent to which informal analogy is used by technical personnel, it is important to obtain some idea of the overall ability of experienced estimators to determine the size and effort of a piece of software. The following contains a description of how the size and effort estimation experiment was defined, a description of the probabilistic viewpoint used to interpret the data, the numerical results, and an analysis of these results.

5.1 Defining the Experiment

A major concern when setting up the experiment was that different types of software may be easier or harder to estimate. It was decided to restrict the experiment to applications code, since the majority of programming tasks at JPL are applications. A good cross section for studying estimation accuracy should at least include user interface, database, and mathematical applications code. User interface code is very dependent upon the computer system, language, and software tools that are being used; therefore, if these are not specified, there will be a very large variation in the size estimates. On the other hand, scientific or mathematical code is primarily dependent on the equations and algorithms being coded and not on the software or computer; therefore, this type of software can most likely be estimated with greater accuracy.

A cost estimation experiment should provide to the estimator information similar to that available when size and effort estimates are made for an actual project. The following criteria were used: (1) the existence of a proper requirements document which mapped cleanly into the modules to be estimated (in this experiment a more detailed software design document was used in lieu of a requirements document); (2) the size of the code could not be too small (under 500 source lines of code (SLOC), so as to be trivial), nor too large (over 10,000 SLOC, so as to require an excessive amount of time); (3) reliable information on the actual project's size and effort had to be available for evaluation of the estimates; (4) the code to be estimated had to be straightforward, so that most respondents could comprehend the functionality being described; and (5) the code had to be from a small project, to minimize the number of participants who would have to be excluded due to preexisting knowledge.

A problem which arose was that the mapping from requirements to the implementation of a code module can be very complicated. Code segments from two software tasks were considered and, in both cases, it would have required a significant amount of time from several members of the development staff to unravel the relationships between the different documents, the code, and the development effort. Therefore, a code module was selected based on an unusually detailed architectural design document and the ability to identify the actual work effort and size.

It is unclear what the impact on the estimators is of having access to the greater information contained in a design document as compared to a requirements document. On the one hand, it could improve the quality of the estimate; on the other hand, the estimator may have difficulty integrating all of the extra detail into their estimate because they are not used to such a level of detail. In either case, it is one more factor which makes the environment of the experiment different from the usual conditions faced by the estimator.

For this survey, an example of a database module was selected from a representative JPL project ("SAR"). The actual code described in the experiment's design document was a portion of a production controller (an interactive program which permits interaction with the database to produce products to fill user orders). The code is primarily written in C, but portions of the module were written in INGRES, a database language.
The code is of medium size, is well documented, and the documentation could easily be isolated for a piece of the functionality. It was felt that the functionality of this project was similar to that of many other projects at JPL, but since it was a relatively small effort, the results would not be known by very many of the potential respondents. For purposes of analysis, it was important that preliminary effort and size estimates were available, as well as consistent final effort and size figures. All of these factors made this piece of software well suited for use in the experiment.

5.2 Theory

For any requirements document there is a set of designs that will meet the requirements. For each design there is some distribution of SLOC that will satisfy the design specifications. Sources of SLOC variation include the software designers' experience, the programmers' experience, the language used, and any coding styles which were used or imposed on the programmers. This is also true of the effort required to complete the task – once again, various factors influence the actual amount of effort: documentation standards, organizational issues, requirements volatility, and schedule pressures. Clearly, various factors can influence what size and effort become actualized. Therefore, the actual size and effort which are observed are a single random observation taken from the distribution of all possible SLOC and work months. The actual distribution is not known, because all that can be observed is the one observation. However, it is expected that the guesses of experienced estimators should be related to the actual distribution. All participants were asked to give a low, most likely, and high estimate for both lines of code and effort. It seems reasonable to assume that the estimation process is an attempt to guess the actual distribution.

Of prime concern was that all of the information provided on effort and size (low, most likely and high) be used in comparing the estimates to the actual. One of the simplest distributions which can be derived from these parameters is the triangular probability distribution function (another possibility is the Beta distribution).

[Figure 1: Triangular size distribution, marking the low, most likely, average, and high values.]

The mean of a triangular distribution is computed by SLOC = Σ Si*P(Si). The mean can be calculated numerically by step-wise incrementing Si from the low to the high value, calculating the probability P(Si), and summing.
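A minimal sketch of this numerical procedure (the triplet below is the population-average size triplet from Table 5; for a triangular distribution the mean also has the closed form (low + most likely + high) / 3):

```python
def triangular_mean(low: float, mode: float, high: float, steps: int = 10_000) -> float:
    """Approximate the mean by stepping S_i from low to high and summing
    S_i * P(S_i) * dS, as described in the text."""
    width = high - low
    dx = width / steps
    total = 0.0
    for i in range(steps):
        x = low + (i + 0.5) * dx  # midpoint of the i-th step
        if x <= mode:
            density = 2 * (x - low) / (width * (mode - low))
        else:
            density = 2 * (high - x) / (width * (high - mode))
        total += x * density * dx
    return total

# Population-average size triplet (SLOC) from Table 5
print(triangular_mean(1902, 2673, 3970))  # ~2848, matching (1902 + 2673 + 3970) / 3
```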
There are two main approaches for evaluating whether sub-populations differed in how they estimated size or effort: to compare the distributions with a Chi-square test, or to compute and compare the means with Student's t-test. It was decided to keep the analysis simple and only compare the means of the different populations.

5.3 Summary of Results

Unfortunately, due to the manner in which the questionnaire was designed and administered, the participants experienced some confusion over what should be included in the cost estimates. When the participants' estimates were well documented, they were converted to implementation costs. Interestingly, those with application experience were in general much more detailed in their descriptions of what their cost estimates included, making it possible to adjust the estimates to fit a consistent definition. Those without application experience often had little written description of how they produced the effort estimates for the experiment. For estimates with unclear definitions, the productivity (size/effort) was computed. If the productivity was greater than 20, it was assumed that the estimate was of implementation effort, unless documented as being some other effort estimate. The selection of a productivity of 20 was somewhat arbitrary. It is based upon the frequently used rule of thumb of 10 SLOC/day for fully burdened ground based software and a 100 percent burden rate. If biased, the results should be biased upward because, based on our project data, JPL unburdened productivity averages 18 SLOC/day [11].

A total of 48 persons completed some portion of the experiment. Two observations were rejected as outliers, since the size and effort estimates were approximately five times larger than the respective averages. Forty-two persons estimated size, and 24 estimated what is frequently called the implementation effort, or the effort of the programming team during the detailed design, code and unit test phases of the life cycle. The remainder estimated different portions of total development costs and were incomparable.

The actual size of the code used in the experiment is 4959 SLOC, and the actual implementation effort was 160 days. The original estimate by the JPL SAR project staff was 2925 SLOC; design and code effort was estimated at 143 days. These are very close to the averages of the SLOC and effort estimates by the participants.

Table 5: Summary Table of Size and Effort Estimates

                                       Percentage of responses within
Description     Mean    Std. Dev.   10%     20%
Size (SLOC)
  Average       2866    2194        10%     16%
  Low           1902    1640
  Most Likely   2673    2138
  High          3970    3121
Effort (days)
  Average       144     84          17%     33%
  Low           103     65
  Most Likely   137     87
  High          183     114

There were a total of six missing values for the low and high size and effort estimates. These were imputed based on regressions which used other sample data, such as total effort and size, to predict the variable with the missing values. The R2 for these regressions were all above .9.
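As an illustration of the regression-based imputation described above, a minimal sketch (the raw survey data are not published, so the arrays below are invented, and the helper is hypothetical):

```python
import numpy as np

# Hypothetical: predict a missing "high" size estimate from the "most likely"
# estimates of respondents who supplied both (the paper's R^2 were above .9).
most_likely = np.array([1500.0, 2200.0, 2800.0, 3500.0, 5000.0])
high        = np.array([2100.0, 3000.0, 3900.0, 5200.0, 7400.0])

# Least-squares fit: high = a + b * most_likely
A = np.column_stack([np.ones_like(most_likely), most_likely])
(a, b), *_ = np.linalg.lstsq(A, high, rcond=None)

print(a + b * 2600.0)  # imputed "high" for a respondent missing that value
```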
As can be seen in Table 5, on average the effort estimates are more accurate than the size estimates: one-third of the population estimated effort within 20 percent of the true value, while the average size estimate differs from the actual by 42 percent, with only one-sixth estimating within 20 percent of the actual. In addition, the actual size of the code fell within the low-to-high estimate range only one-third of the time, while the actual effort fell within the respondents' low-to-high estimate range 50 percent of the time. This is consistent with the result that only 20 percent use size as a cost driver while everyone estimates effort. Earlier studies also found problems with chronic size underestimation, even for those with size estimation experience [12]. Clearly, the technical staff estimates effort significantly better than size. However, based on the experiment, it is very likely (67 percent) that any one effort estimate can be more than 20 percent off, even with a detailed design document.

Table 6: Summary of Effort Estimates

                                          Percentage of responses within
Description          Mean     Std. Dev.   10%     20%
                     (days)   (days)
Total population     144      84          17%     33%
Application Exp.
  Yes                169      77          23%     46%
  No                 115      86          9%      18%
Estimation Exp.
  <6 years           141      71          28%     57%
  >=6 years, by time to last estimate
    <=6 months       180      76          18%     36%
    >6 months        90       107         0%      0%

In general, the variances are large. The size variance is especially large, at 77 percent of the mean, while the effort variance is 58 percent of the mean. Given the large variation, one might ask why more projects do not experience cost overruns or underruns. One of the most important reasons is that cost contracts tend to be self-fulfilling prophecies. If there is slack in the funding, then the task can be made more elaborate, or better equipment and software can be purchased. If the budget is tight, then documentation, testing and requirements can be sacrificed. Another factor is that the example code module is just one small piece of a larger program. If many small pieces are added together to get total project costs and the errors are symmetrically distributed with zero mean, then they might cancel out.

Previous studies have reported that estimates tend to be skewed downward, revealing an optimistic bias [12,10,8,7]. Our data supports this conclusion. The population averages for low, most likely and high software size are 1902, 2673 and 3970 SLOC. While less so than the size estimates, the effort data is still skewed downward, with a low of 103 days, a most likely of 137 days, and a high of 183 days. Including all effort estimates yields 146, 191 and 262 days respectively.

5.4 Analysis

As long as the errors introduced into the sample by the data collection process are symmetrically distributed around zero, the means of the sample distribution will be unbiased estimates of the population means. Variation can arise from several sources: (1) the variance for size estimates could be naturally large; (2) ignorance of the application type (only 35 percent of the respondents reported any domain experience); (3) carelessness on the part of the participants; and (4) questionnaire or sample design problems. With respect to the effort estimates, there are clearly some questionnaire design problems. It is expected that database code, because it is somewhat system dependent, will have a larger variance than math based code. Also, since the actual task estimate and the sample average for the SLOC are very close, it does appear that there is an underlying consistency in the estimation process. Given the poor accuracy of size estimates reported in the literature, it is strongly suspected that the variances are inherently large.

In order to test the impact of different types of experience, two-way and three-way splits were defined for all of the measures of experience as deemed appropriate. A t-test for comparing two means with different variances was used to determine whether the average size and effort estimates for the sub-populations represent distinctly different distributions.
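A minimal sketch of such a comparison, using SciPy's unequal-variance (Welch) t-test with the application-experience summary statistics from Table 6 (the subgroup sizes are not reported in the paper, so the nobs values below are hypothetical):

```python
from scipy.stats import ttest_ind_from_stats

# Effort estimates (days) for respondents with and without application
# experience, from Table 6; subgroup sizes are hypothetical (not reported).
result = ttest_ind_from_stats(mean1=169, std1=77, nobs1=12,
                              mean2=115, std2=86, nobs2=12,
                              equal_var=False)  # Welch's unequal-variance t-test
print(result)  # significant at the 10 percent level if the p-value < 0.10
```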
The tables display only results which revealed significant differences at the 10 percent level or better. Given the large variances, a 10 percent rejection test is not unreasonable, since the data is more likely to generate a false acceptance than to falsely reject the hypothesis.

Table 7: Summary of Size Estimates

                                          Percentage of responses within
Description          Mean     Std. Dev.   10%     20%
                     (SLOC)   (SLOC)
Total Population     2866     2194        10%     16%
Application Experience
  Yes                3397     2054        17%     27%
  No                 1988     2378        4%      12%
Assembly Experience
  Yes                2337     2174        4%      12%
  No                 3836     2484        16%     21%
Fortran Experience
  Yes                3134     2400        9%      17%
  No                 1695     1290        13%     13%
Estimation Experience
  <6 years           2670     2093        12%     24%
  >=6 years, by time to last estimate
    <=6 months       3730     2561        13%     18%
    >6 months        2257     1236        0%      0%

What the results show is that experience is a very important factor in the accuracy of size estimation. Experience seems to be of three main types: application experience, years of estimation experience, and immediacy of experience (time since last estimate). To study the interaction of estimation experience and frequency of estimation, the sample was split into subgroups by both estimation experience and time to last estimate. The results are displayed in Tables 6 and 7.

Those with application experience did significantly better at estimating both size and effort. Furthermore, all of the extreme values, high and low, and the two outliers which were excluded come from the subgroup with no application experience.

This breakdown shows very clearly the importance of the combination of overall experience with the performance of regular cost estimation exercises. The subgroup defined by six or more years of estimation experience and at least one estimate completed in the past six months performed significantly better than the other groups. Their estimates appear to be comparable with size estimates made by models [10].

The story for the size estimates is a little different from that for the effort estimates. Assembly and FORTRAN experience had a significant impact, such that the presence of assembly experience is correlated with less accurate size estimates, while FORTRAN experience is correlated with more accurate size estimates. The data does clearly show that those with experience programming in assembly are significantly different from those who have no assembly experience. It was expected that those with C experience would do better at size estimation than those without C experience. Respondents without C experience were numerically worse at estimating size, but the mean comparison test cannot distinguish between the two populations.

Normally, one thinks of cost estimation as a process of determining development costs based on a set of requirements. However, when there are external constraints it is much more likely that a potential set of requirements is identified based upon a given budget and/or schedule. To analyze the impact of external constraints on the cost estimation process, a binary variable, External Forces, was defined as having a value of 1 if the estimator is normally budget or schedule driven and 0 otherwise. The impact of external forces had a significance of 10.4 percent, which was close to but above the acceptance value of 10 percent. In addition, there is less than a 7 percent probability that the actual size of 4959 SLOC could have arisen from the same probability distribution as the distribution of size estimates of those who are chronically subjected to external forces. This is most likely due to the fact that people who are budget and schedule driven do not have the need to estimate the number of lines of code. Their concern is meeting the budgets and/or schedules which have already been set.
When a person is forced to plan under severe schedule and budget constraints, planning consists mostly of a prioritized list and a lot of optimistic promises.

Since the population estimating effort is different from the population estimating size, it is interesting to look at the subgroup which estimated both. The basic patterns are the same as for the larger groups; however, the average estimates of size (3814) are much closer to the observed size (4959). The variances are approximately the same; as a ratio of the means, however, they have fallen from 75 percent to 55 percent. The improved accuracy most likely reflects the fact that those who are included in the implementation effort estimation subgroup understood the experiment better.

6. Summary and Conclusions

The results of the survey represent only a partial picture of the JPL software cost estimation environment, as the survey did not cover the senior managerial staff. For the portion of the JPL staff who is represented in the survey, it was found that there is extensive experience (15 years in software and nine years of cost estimation experience), that most estimators use a functional breakdown and informal analogy, and that they are better at estimating effort than size. However, in both cases the probability of an estimate being more than 20 percent off is very high (50 percent for those with experience).

The results of the experiment appear to be consistent with the results reported in previous studies [7,9,10]. Given the very large variances and small sample sizes in this survey, it is hard to distinguish between groups. But even under these conditions, one main result stands out: the most accurate estimates of executable source lines of code were produced by those with more than six years of cost estimation experience who had completed at least one software cost estimate in the past six months. The average size estimate for this group differed from the actual by 25 percent, as compared to the overall sample, which differed by 42 percent.

With respect to the estimation experiment, it is very clear that experience is very important, especially application and estimation experience. Those with application experience did significantly better at estimating both size and effort; the means were closer to the observed size and the variances smaller.

Another important issue the survey raises is that the technical staff appears to be relatively more accurate at estimating effort than at estimating size. Only half of those surveyed reported doing size estimates as part of their job, and many of these stated that they only did size estimates because they had to for reporting purposes. However, everyone reported making effort forecasts; if effort is misestimated, there usually will be some penalty at delivery time, whereas if size estimates are incorrect, it really does not make much difference.

In general, there appears to be a lack of emphasis on accurate and detailed cost estimates. This is partly due to past experiences where quality mattered more than cost, and where more money would usually be forthcoming when it was absolutely necessary.
This has resulted in a decreased emphasis on careful cost analysis, so that many respondents commented that they never had enough time or money to do a proper cost estimate. Therefore, it is to be expected that the survey reveals little use of any formal forecasting methodology and that the majority of estimates are produced by informal analogy. Unfortunately, the results indicate that most software estimates currently made are neither traceable nor reproducible. In addition, while forecasting software costs is a highly uncertain business, there is really no formal procedure in place to incorporate uncertainty in the estimation process.

These results have definite implications for the use of software cost models, all of which depend heavily on size as an input. Not only were the size estimates low, but they varied widely. This means that providing a point estimate to a model probably yields virtually meaningless cost estimates. One solution is to not use sizing at all for cost estimation, but to make effort estimates based on functional requirements. Several other options are available: use models which incorporate uncertainty in the inputs and estimates, provide training to make persons better at estimating size, and establish a database which will produce size estimates based on functional descriptions. Another consideration is to use an alternative measure of software size. Several studies have found that Halstead's measure of the number of unique operands is highly correlated to final size and cost [7,13]. In an earlier study funded by JPL, Lewis and Dodson [7] report that estimates of operands were low by 24-34 percent, which is an improvement in accuracy compared to estimates of lines of code.

This last point is probably the most important because, for NASA programs, virtually every project has a significant portion of elements which are new. This means that guessing about something that has a significant number of unknown items is a standard part of the estimating environment, and no database will ever be totally sufficient. What we have to learn is how to forecast the future in an intelligent manner and to incorporate risk in a sophisticated manner. One of the ways JPL deals with this internally is to have the implementing sections make a contractual commitment to the program managers, which is the same game Congress is now trying to play with NASA and DOD contractors. The problem is that while the cost may be controlled, this does not prevent the reductions in functionality and performance which result from poor cost estimates.

So what is JPL doing in response to this situation? Several years ago the development of a software development standard was started, and it is now being transferred to all software development tasks. Parts of this standard have been included in the IEEE software standard. Clearly, standardizing the process will improve predictability. A variety of tasks have been and are being funded for the development of cost estimation models, metrics definition and collection, and database development. Project studies and postmortems are providing insights into managerial practices, how tasks evolve, and the identification of the key cost metrics in the JPL environment. Workshops are being conducted to share techniques between managers and estimators. Finally, work is beginning on the definition of a more formalized cost estimation process.
At this point in time it appears that the major focus will be on documentation of assumptions, techniques for incorporating risk and uncertainty, the use of multiple estimates, and the use of historical data, whether it be to support the use of models, analogy or other approaches to cost estimation.

Acknowledgements

This survey and experiment was funded by JPL's Systems Software and Operation Resource Center (SSORCE). The authors gratefully acknowledge the assistance of Steve Wake, a summer co-op from Virginia Tech, with data collection, and of Randy Cassingham of JPL's Systems Analysis section for editing and rewrite assistance.

7. References

[1] Lame, D., M. Bush and Y. DeSoto, Software Intensive Systems Technical Workforce, JPL SSORCE presentation, February 2, 1987.
[2] "The Software Trap – Automate or Else," Business Week, May 9, 1988, pp. 142-148.
[3] Schlender, B. R., "How to Break the Software Logjam," Fortune, September 25, 1989, pp. 100-112.
[4] Myers, W., "Allow Plenty of Time for Large-Scale Software," IEEE Software, July 1989, pp. 92-99.
[5] Brooks, F., "Essence and Accidents of Software Engineering," Computer, April 1987, pp. 10-19.
[6] Weinberg, G. and E. Schulman, "Goals and Performance in Computer Programming," Human Factors, Vol. 16, No. 1, 1974, pp. 70-77.
[7] Lewis, D. and E. Dodson, "An Exploratory Study of the Determinants of Software Size," CR-2-1542, General Research Corporation, Santa Barbara, Ca., October 30, 1987.
[8] Wheaton, M., "Software Sizing Task Final Report," The Aerospace Corporation, September 30, 1983.
[9] Kitchenham, B. and N. Taylor, "Software Project Development Cost Estimation," The Journal of Systems and Software, Vol. 5, No. 4, November 1985, pp. 270-280.
[10] "A Descriptive Evaluation of Software Sizing Models," Data and Analysis Center for Software, RADC, September 1987.
[11] "A Productivity Analysis of JPL Software," SSORCE/Engineering Economic Analysis Group Technical Report No. 1, JPL, July 1989.
[12] Boehm, B., Software Engineering Economics, Prentice-Hall Inc., 1981.
[13] Albrecht, A. and J. Gaffney, "Software Function, Source LOC and Development Effort Prediction: A Software Science Validation," IEEE Transactions on Software Engineering, Vol. SE-9, No. 6, November 1983, pp. 639-648.