Cost Estimation of Software Intensive Projects: A Survey of Current Practices*

Jairus Hihn
Hamid Habib-agahi
Jet Propulsion Laboratory / California Institute of Technology
4800 Oak Grove Drive
Pasadena, California 91109

* The research described in this paper was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration.

Abstract

This paper describes a survey conducted of the staff of the Jet Propulsion Laboratory (JPL) who estimate software costs for software intensive projects in JPL's technical divisions. Respondents to the survey described what techniques they use in estimating software costs and, in an experiment, each respondent estimated the size and cost of a specific piece of software described in a design document provided by the authors. It was found that the majority of the technical staff estimating software costs use informal analogy and high level partitioning of requirements, and that no formal procedure exists for incorporating risk and uncertainty. The technical staff is significantly better at estimating effort than size; however, in both cases the variances are so large that there is a 30 percent probability that any one estimate can be more than 50 percent off.

1. Introduction

As with all other aerospace organizations, JPL is in the process of having to confront the ever-increasing importance of software. As of 1987, over 50% of the technical staff was working on software intensive tasks [1]. As is well known, significant cost and schedule overruns are not uncommon on large scale software developments [2,3]. This is partly due to the contractual environment, where in the past there have been incentives to underbid in order to land the project. It is also due to the inherent uncertainty of the cost estimation process [4]. In an attempt to control cost overruns, Congress is increasing the emphasis on the use of fixed price contracts. In order to adjust to this changing environment, JPL is seeking to improve the quality of its estimation and bidding practices.

To achieve these objectives, research has been conducted on metrics collection and project analysis as well as the use and development of cost models. In addition, since bottom-up or grass roots estimates are the main methodology used for major JPL projects, a survey was conducted during the summer and fall of 1989 of the technical staff that actually performs the lowest level estimates. The survey had two basic objectives: to identify the current cost methods and practices within the JPL technical divisions, and to obtain a clearer picture of the accuracy of the effort and size estimates.

The most accurate approach to achieving these objectives would be to observe individuals when they make cost estimates and to collect data on estimates and actual project size and effort. Of course, the problem with this approach is that it would take years to collect sufficient data for analysis, and it would be very expensive. The alternative is to conduct a survey in which participants describe their cost estimation process and participate in an experiment estimating the size and effort for a piece of software that has already been completed. In order to obtain some usable information in a reasonable time frame, the latter approach was selected.

This report contains the following sections: Section 1, Introduction; Section 2, Sample Design and Background Information; Section 3, Summary of Participant Background Data; Section 4, Current Practices; Section 5, Size and Effort Experiment; and Section 6, Summary and Conclusions.
2. Sample Design and Background Information

The following brief overview of JPL is meant to provide the proper context for understanding the results of the survey and experiment. JPL is a NASA field center run by the California Institute of Technology under a government contract. As a national laboratory, it performs research and development activities, primarily the development of robotic spacecraft for interplanetary studies. In addition, a portion of JPL's budget is supplied by non-NASA organizations such as the Department of Defense.

JPL is divided into eight "offices": two offices for upper management and administration, four program offices, the office of technical divisions, and an office for assurance activities. JPL is managed through a matrix structure, with the four program offices controlling the funds and the eight technical divisions responsible for the implementation of tasks. The divisions are defined by technical specialties, such as systems engineering or information science. Sections within each division are even more specialized and are further broken down into groups. Due to the differences in specialization, the viewpoints and practices between the divisions, and even between sections, can be markedly different.

At JPL, the methods used to estimate costs differ depending upon the required reliability. A "Class A" estimate is the most rigorous, and a formal review is required. For Class A estimates, model-based and other independent estimates are supposed to be obtained. However, most estimates are produced by a bottom-up exercise which may involve as many as 25 persons for the software activities portion of each task. These are the types of individuals who were included in the survey.

Obviously, not all personnel working on software intensive tasks estimate software costs. The Earth and Space Sciences Division is composed of sections doing hard science. The Institutional Computing and Mission Operations Division performs operations and testing. While software is developed in these divisions, their developments are relatively small. The result was that these divisions were dropped from the survey, as no one could be identified who would admit to estimating software costs.

2.1 Sample Definition and Methodology

Over 185 persons were contacted for participation in the survey. Of the 185 contacted, over 100 were identified who estimate effort, size and/or cost for software tasks. Of these, 83 persons were willing to complete the questionnaire on current software cost estimation practices. A percentage breakdown comparing the distribution across divisions based on the 1987 survey results (planned) with the sample breakdown of both the survey of current practices and the experiment is given in Table 1.

Table 1: Sample Breakdown by Division (percentages)

Division   Planned   Attempted   Completed
31         0.23      0.23        0.35
33         0.17      0.15        0.17
34         0.15      0.08        0.07
35         0.05      0.04        0.04
36         0.32      0.43        0.28
38         0.08      0.07        0.09

The Chi-square test statistics are 0.0755 and 0.1135 respectively, which for 5 degrees of freedom implies less than a 1% probability that the distributions are different. The interpretation of the test results is that the sample breakdowns for both the survey and the experiment are not statistically different from the planned distribution over the divisions.
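The reported statistics can be reproduced from the proportions in Table 1. A minimal sketch in Python (the function and variable names are ours; note that the statistic matches the reported values only when it is computed directly on the proportions rather than on head counts):

```python
# Planned (1987 survey), attempted, and completed proportions by division (Table 1)
planned   = [0.23, 0.17, 0.15, 0.05, 0.32, 0.08]
attempted = [0.23, 0.15, 0.08, 0.04, 0.43, 0.07]
completed = [0.35, 0.17, 0.07, 0.04, 0.28, 0.09]

def chi_square_statistic(observed, expected):
    # Sum of (observed - expected)^2 / expected over the six divisions
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(chi_square_statistic(attempted, planned))  # ~0.076 (reported as .0755)
print(chi_square_statistic(completed, planned))  # ~0.114 (reported as .1135)
```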
In addition to completing the questionnaire, a total of 48 persons completed some portion of the software size and effort estimation experiment. Only those who actually estimate software costs, either for reporting or for planning their own activities, were included in the survey. The sample is stratified over the technical divisions. The number of persons to include in the survey from each division was determined by the results of a 1987 survey of software intensive activities at JPL [5]. The intention was to make the sample contain the same distribution over the divisions as the actual JPL labor force performing software intensive tasks.

Potential participants were identified by group supervisors and section managers. In most cases, organization charts were obtained and the process started with an interview of the section manager. In this manner, a list of possible respondents was identified and the breakdown across the divisions was kept consistent with the 1987 survey.

Potential participants were interviewed personally. When time permitted, the experiment portion of the survey was completed while the interviewer waited. More often, the experiment portion was explained and the results mailed back later. On average, 30 to 60 minutes were required to complete the size and effort estimates, and the majority were returned within a week.

The participants were asked to specify whether they estimate size, effort, and/or dollars, and then, in their own words, to describe their normal process of estimating software costs. For purposes of analysis, these descriptions were then translated into binary and categorical variables which describe how a task is decomposed, how effort and size estimates are generated, and what cost drivers are used. These variables will be discussed in greater detail in Section 4. For each participant, information was also collected on experience in software cost forecasting, software development, and software management. Experience by programming language was also collected.

For the experiment, the questionnaire asked for size and effort estimates for the code described in a design document which was provided as part of the experiment. (Copies of the document are available from the authors.) A more detailed discussion of the experiment is provided in Section 5. Size was defined as executable source lines of code (SLOC), excluding comments and blanks. Effort was defined as the number of work days spent by the software programming team during the detailed design, code and unit test phases. Respondents could estimate effort in days, weeks or months. For the analysis, all estimates measured in months were converted to days, assuming five days per week and 21 days per month. Respondents were asked for three estimates, the lowest, most likely, and highest (triplet), for both size and cost. Respondents were also asked if they were familiar with the type of software used in the experiment (a database) and what, if anything, they did differently from their normal method of cost estimation.

3. Summary of Participant Background Data

Tables 2a and 2b summarize the experience of the survey participants, which should be representative of the actual JPL population that participates in estimating software costs within the technical divisions.
Table 2a summarizes the participants' years of experience as cost estimators, software developers and software managers. Experience is also broken down by type of language. Since many projects use more than one language, the total development and managerial experience will not be equal to the sum of the respective language experience categories. The breakdown by years of experience provides a picture of how experience is distributed across the population. Table 2b presents a summary of how often the respondents perform estimates.

The average person making cost estimates has approximately 15 years of experience working on software tasks and nine years making cost estimates. Seven of the 15 total years have been spent with managerial responsibilities (cognizant engineer or higher). Furthermore, two-thirds of the population has from 11 to 30 years of development experience. This reveals a substantial amount of experience that is being utilized for cost forecasting. The majority of experience is with FORTRAN and assembly. Although the design used for the experiment was to be implemented in C, there is relatively little C experience in the population. Four percent of those costing software have no software development experience.

In order to measure how frequently estimates are made, each participant was asked to specify the dates of their last three estimates. These were translated to the number of months from the date of each interview. Of those interviewed, 76 percent had made at least one estimate in the six months preceding the interview; 16 percent had made as many as three estimates in the preceding six months. The average time between estimates was eight months. A simple summary of the data is that the average person making a software cost estimate has substantial experience and makes estimates one or two times per year.

Table 2a: Respondent Summary Information

                                          Percentage of respondents reporting
Type of         Mean     Std. Dev.   0       1-5     6-10    11-30
Experience      (Years)  (Years)     years   years   years   years
Development     14.9     7.6         4       8       22      66
  Fortran       7.0      7.8         28      28      21      23
  C             1.1      3.0         73      19      5       3
  Ada           0.2      1.2         96      3       1       0
  Assembly      6.2      4.4         52      16      19      13
  Other         3.0      4.5         58      28      9       5
Managerial      7.2      5.8         17      27      28      28
  Fortran       3.6      4.4         54      22      18      6
  C             1.5      2.7         61      31      8       0
  Ada           0.2      0.7         96      4       0       0
  Assembly      1.8      3.7         82      8       6       4
  Other         1.5      3.3         78      16      5       1
Estimation      9.4      6.6         0       36      30      24

Table 2b: Respondent Summary Information

                                              Percentage of respondents reporting
Estimate             Mean      Std. Dev.   1-6      7-12     13-24    >25
                     (months)  (months)    months   months   months   months
Most Recent          5         7           76       17       6        1
Second Most Recent   14        14          40       23       22       15
Third Most Recent    21        16          16       25       27       31

4. Current Practices

There are several major steps required to make an estimate, and each of these steps can be completed in different ways. To identify how each step is being completed at JPL, the descriptions of current practices have been organized according to the following questions: What do people estimate? To what extent are estimates driven by external constraints? How is software partitioned for cost estimation? How are estimates produced? What cost drivers are most frequently used for estimation?

Each person's description of their cost process is relatively unique. For purposes of analysis, it was necessary to simplify these descriptions by forcing each one into a small number of general categories. There is a fundamental tradeoff between precision in the definition of the categories and the number of categories: the more precise the definitions, the greater the number of categories. In this study, given the small sample size and the large variation within the data, a small number of categories was used.

4.1 What do people estimate?

Everyone interviewed reported estimating effort, while only 49 percent reported estimating size. However, while approximately half of the respondents provide size estimates, only 22 percent of those surveyed reported actually using size as part of their cost estimation process; the others estimated size only to meet a reporting requirement. A total of 69 percent reported estimating dollars.

All Department of Defense-sponsored projects must provide size estimates. In the future, as part of the JPL software process standard (known internally as D-4000), all software intensive projects will provide size estimates. Of course, this does not mean they must be used for cost estimation or tracking.

4.2 To what extent are estimates driven by external constraints?

The cost estimating process can be seriously impacted by conditions of severe budget or schedule constraints. The result is that the estimator's job becomes less one of estimating costs and more one of analyzing system and functional tradeoffs. In many cases, if there is strong motivation to accept the work, the job may be accepted under the assumption that any inconsistencies between requirements, cost and schedule will be resolved while the task is under development. The result is that there is little value in developing competent estimators. Thirty percent of the respondents reported being budget constrained, while 20 percent reported being schedule constrained. Twenty-four percent of the population reported having to estimate projects under both significant schedule and cost constraints.

4.3 How is software partitioned for cost estimation?

Partitioning techniques have been grouped into three main categories: function based, product based, or algorithmic. The functional view is captured by two categories: high level functional breakdown (HLF) and low level functional breakdown (LLF). A partitioning technique is categorized as HLF if the breakdown is at the program set level or higher (CSCI for military projects) and LLF if the breakdown is to one level below a program set or lower (CSC for military projects). The product view is captured by the work breakdown structure (WBS) category. (This assumes that the WBS and the document and code deliverables are consistent, which is not always the case.) If a respondent reported primarily using a WBS, but also completed a low level functional breakdown, this was counted as LLF. Finally, the algorithmic category captures the computational process view of a software system.

Table 3: Summary of Requirements Translation Techniques

Translation of Requirements Method    Percentage of Respondents
High Level Functional Breakdown       53
Low Level Functional Breakdown        28
WBS                                   10
Algorithmic                           9
Total                                 100

The data provide two main insights. One is that the vast majority of estimates are based on a functional view of the project, which is not surprising. Secondly, assuming that WBS and algorithmic partitioning are usually completed at a high level, only 28 percent of those surveyed used a low level breakdown of the requirements. The extensive use of high level breakdowns most likely reflects the existence of the vague and volatile requirements which are an inherent part of R&D projects.

4.4 How are estimates produced?

When cost estimates are generated, several techniques are often used, either in combination or as alternative estimates. For this reason, a primary and a secondary approach have been identified for each respondent. In many cases all of the methods described below were used by the estimator. The data presented reflects the authors' interpretation of which techniques were the dominant ones, based on each participant's description of their estimation process. Four categories are identified: informal analogy, formal analogy, rules of thumb and models.

Rules of thumb, formal analogy and informal analogy are closely related. A respondent was defined as using informal analogy if expert judgment was used or if detailed comparisons to specific projects were made but documented data was not used to support the estimate; the responses appeared to be divided equally between these two extremes. Formal analogy means that documented data were used. Rules of thumb are used in many different ways; for example, to estimate overhead, support tasks, or software size. By their very nature, rules of thumb must be seeded with some other information – information usually generated by analogy. This can be based upon data or experience. Whether a rule of thumb is a formalization of expert opinion or is derived from actual project data, the use of well defined rules of thumb clearly documents the estimator's assumptions.

There were two models used by those surveyed: COCOMO, developed at TRW, and Softcost, developed at JPL in the early eighties. Both are lines-of-code driven.

Table 4: Summary of Estimation Techniques

Estimation Technique   Primary (%)   Secondary (%)
Analogy, Informal      83            34
Analogy, Formal        4             0
Rules of Thumb         6             55
Models                 7             11
Total                  100           100

The most prominent results are that 87 percent of those surveyed use analogy for estimating purposes, but only 4 percent of those surveyed use formal analogy. Only one participant actually kept extensive records of previous projects he had worked on. Another insight is that rules of thumb play an important role as a secondary method, where they are used 55 percent of the time.

4.5 What cost drivers are most frequently used for estimation?

Cost drivers were most frequently used as a multiplier of a base estimate. The value of the multiplier was normally based on a well defined rule of thumb. However, in some cases the value was subjective and changed from estimate to estimate. In other cases, depending upon how specialized the group was, the quality of the personnel was incorporated in a subjective manner simultaneously with the base estimate. These latter methods tend to produce estimates which are not reproducible.

Other cost drivers mentioned, in order of their frequency of occurrence, were anything new (e.g., language, application, operating system), personnel, and complexity. These were incorporated into the process primarily as ratio factors determined by expert judgment. One-third of those interviewed described using some form of measure of volume: 22 percent used lines of code, 10 percent used a form of "function points", two persons counted requirements and one estimated real memory usage. Approximately 20 percent tried to incorporate factors related to standards, such as the amount of documentation or the number of reviews. There were also factors which people knew were important but did not know how to incorporate into their estimates. There was concern expressed about estimating time for testing and reviews, and about how to incorporate risk and uncertainty. All participants mentioned incorporating functionality in some manner.
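The multiplier scheme described above is the same effort-adjustment pattern used by lines-of-code models such as COCOMO. A minimal sketch of the pattern (the driver names and multiplier values below are hypothetical illustrations, not survey data):

```python
# Hypothetical cost-driver multipliers applied to a base estimate (work days),
# mirroring the ratio-factor practice described above.
base_estimate = 120.0  # days, e.g. from analogy or a rule of thumb

drivers = {
    "new language": 1.20,      # anything new raises the estimate
    "experienced team": 0.85,  # strong personnel lower it
    "high complexity": 1.30,
}

estimate = base_estimate
for name, multiplier in drivers.items():
    estimate *= multiplier

print(f"adjusted estimate: {estimate:.0f} days")  # 120 * 1.2 * 0.85 * 1.3 ~= 159
```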
4.6 Analysis

The main insight that stands out from the current practices portion of the survey is that the technical staff are primarily using a functional partitioning of requirements. Requirements are generally partitioned at a relatively high level. Size and cost are estimated based on informal analogy. However, there is virtually no attempt to deal with risk in a systematic manner.

In general, there are few relationships or patterns that are identifiable with respect to how people estimate software size and cost. Based on Chi-square tests for contingency table breakdowns of experience (estimation, development and managerial) and division versus translation of requirements and estimation method, no breakdown was significant at the 10 percent level. A few marginal patterns that were consistent with intuition are that the Systems Division uses a WBS more than statistically expected, and the Electronics Division uses an algorithmic approach more than expected. The expected value is derived from the total distribution over the categories, assuming that the two variables are independent: E(Xij) = N*Pi*Pj.

The less experience (especially software development experience) that software cost estimators have, the greater the likelihood that they use models or rules of thumb as their primary method. Those with no software development experience used an algorithmic partitioning of requirements more than expected. This most likely reveals that their work on software tasks is based on domain or scientific knowledge, and therefore they naturally have a more process-oriented, descriptive view of what the computer system will have to do.

5. Size and Effort Experiment

There are several ways to look at the process of software cost estimation. Direct estimates can be made of the effort required to complete a software project. This is usually done by analogy to other similar projects that have been completed. The problem with this approach is that it depends upon the similarity of the projects being referenced and, in the case of informal analogy, the concern is the accuracy of the estimator's memory.

The other major alternative is to indirectly estimate effort via other project characteristics which are highly correlated to cost. The most frequently used correlate is some measure of the size of the software system. The most frequently used measure of size is the number of source lines of code, which is used to derive the number of work months by multiplying by some assumed productivity. The problem with this approach is that actual productivity figures vary over a wide range due to differences between software projects, such as differences in requirements for ground versus flight software, complexity, programmer experience and capability, coding style, and language differences [8]. In one study, it was shown that the actual size of a piece of software depends upon the main objective to be achieved (for example, efficiency versus modularity) and can vary by a factor of 5 to 1 [6]. In general, the final size of a piece of code is more often driven by exceptional conditions than by the functional requirements [7].
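To make the indirect method concrete, a minimal sketch of the size-times-productivity conversion (the 10 SLOC/day fully burdened rule of thumb and the 18 SLOC/day unburdened JPL average are the figures quoted in Section 5.3 [11]; the 5000-SLOC module is hypothetical):

```python
def effort_days(sloc: float, productivity_sloc_per_day: float = 10.0) -> float:
    """Indirect effort estimate: size divided by an assumed productivity."""
    return sloc / productivity_sloc_per_day

# A hypothetical 5000-SLOC module at the fully burdened rule of thumb
print(effort_days(5000))        # 500 work days
# The same module at the unburdened JPL average of 18 SLOC/day [11]
print(effort_days(5000, 18.0))  # ~278 work days
```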
Previous studies have reported substantial errors and changes in size estimates during the life of a project. Wheaton [8] reports that size estimates grow an average of 105 percent from the time of contract award to the critical design review. Kitchenham and Taylor [9] found that, on average, original estimates were less than half of the actual code size. A recent analysis of several size estimation models yielded the result of an average overestimate of 130 percent [10]. The most accurate model estimates were based on information from detailed designs, and they were still off by 28 percent.

Given the extent to which informal analogy is used by technical personnel, it is important to obtain some idea of the overall ability of experienced estimators to determine the size and effort of a piece of software. The following contains a description of how the size and effort estimation experiment was defined, a description of the probabilistic viewpoint used to interpret the data, the numerical results, and an analysis of these results.

5.1 Defining the Experiment

A major concern when setting up the experiment was that different types of software may be easier or harder to estimate. It was decided to restrict the experiment to applications code, since the majority of programming tasks at JPL are applications. A good cross section for studying estimation accuracy should at least include user interface, database, and mathematical applications code. User interface code is very dependent upon the computer system, language, and software tools that are being used; therefore, if these are not specified, there will be a very large variation in the size estimates. On the other hand, scientific or mathematical code is primarily dependent on the equations and algorithms being coded and not on the software or computer; therefore, this type of software can most likely be estimated with greater accuracy.

A cost estimation experiment should provide to the estimator information similar to that available when size and effort estimates are made for an actual project. The following criteria were used: (1) the existence of a proper requirements document which mapped cleanly into the modules to be estimated (in this experiment a more detailed software design document was used in lieu of a requirements document); (2) the size of the code could not be too small (under 500 source lines of code (SLOC), so as to be trivial), nor too large (over 10,000 SLOC, so as to require an excessive amount of time); (3) reliable information on the actual project's size and effort had to be available for evaluation of the estimates; (4) the code to be estimated had to be straightforward, so that most respondents could comprehend the functionality being described; and (5) the code had to be from a small project, to minimize the number of participants who would have to be excluded due to preexisting knowledge.

A problem which arose was that the mapping from requirements to the implementation of a code module can be very complicated. Code segments from two software tasks were considered and, in both cases, it would have required a significant amount of time from several members of the development staff to unravel the relationships between the different documents, the code, and the development effort. Therefore, a code module was selected based on an unusually detailed architectural design document and the ability to identify the actual work effort and size.

It is unclear what the impact on the estimators is of having access to the greater information contained in a design document as compared to a requirements document. On the one hand, it could improve the quality of the estimate; on the other hand, the estimator may have difficulty integrating all of the extra detail into their estimate because they are not used to such a level of detail. In either case, it is one more factor which makes the environment of the experiment different from the usual conditions faced by the estimator.

For this survey, an example of a database module was selected from a representative JPL project ("SAR"). The actual code described in the experiment's design document was a portion of a production controller (an interactive program which permits interaction with the database to produce products to fill user orders). The code is primarily written in C, but portions of the module were written in INGRES, a database language.
The code is of medium size, is well documented, and the documentation could easily be isolated for a piece of the functionality. It was felt that the functionality of this project was similar to that of many other projects at JPL, but since it was a relatively small effort, the results would not be known by very many of the potential respondents. For purposes of analysis, it was important that preliminary effort and size estimates were available, as well as consistent final effort and size figures. All of these factors made this piece of software well suited for use in the experiment.

5.2 Theory

For any requirements document there is a set of designs that will meet the requirements. For each design there is some distribution of SLOC that will satisfy the design specifications. Sources of SLOC variation include the software designers' experience, the programmers' experience, the language used, and any coding styles which were used or imposed on the programmers. This is also true of the effort required to complete the task – once again, various factors influence the actual amount of effort: documentation standards, organizational issues, requirements volatility, and schedule pressures. Clearly, various factors can influence what size and effort become actualized. Therefore, the actual size and effort which are observed are a single random observation taken from the distribution of all possible SLOC and work months. The actual distribution is not known, because all that can be observed is the one observation. However, it is expected that the guesses of experienced estimators should be related to the actual distribution. All participants were asked to give a low, most likely, and high estimate for both lines of code and effort. It seems reasonable to assume that the estimation process is an attempt to guess the actual distribution.

Of prime concern was that all of the information provided on effort and size (low, most likely and high) be used in comparing the estimates to the actual. One of the simplest distributions which can be derived from these parameters is the triangular probability distribution function (another possibility is the Beta distribution).

[Figure 1: Triangular size distribution, marking the low, most likely, average, and high values.]

The mean of a triangular distribution is computed by SLOC = Σ Si*P(Si). The mean can be calculated numerically by step-wise incrementing Si from the low to the high value, calculating the probability P(Si), and summing.
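A minimal sketch of this numerical procedure (the triplet below is the population-average size triplet from Table 5; for a triangular distribution the mean also has the closed form (low + most likely + high) / 3):

```python
def triangular_mean(low: float, mode: float, high: float, steps: int = 10_000) -> float:
    """Approximate the mean by stepping S_i from low to high and summing
    S_i * P(S_i) * dS, as described in the text."""
    width = high - low
    dx = width / steps
    total = 0.0
    for i in range(steps):
        x = low + (i + 0.5) * dx  # midpoint of the i-th step
        if x <= mode:
            density = 2 * (x - low) / (width * (mode - low))
        else:
            density = 2 * (high - x) / (width * (high - mode))
        total += x * density * dx
    return total

# Population-average size triplet (SLOC) from Table 5
print(triangular_mean(1902, 2673, 3970))  # ~2848, matching (1902 + 2673 + 3970) / 3
```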
There are two main approaches for evaluating whether sub-populations differed in how they estimated size or effort: to compare the distributions with a Chi-square test, or to compute and compare the means with Student's t-test. It was decided to keep the analysis simple and only compare the means of the different populations.

5.3 Summary of Results

Unfortunately, due to the manner in which the questionnaire was designed and administered, the participants experienced some confusion over what should be included in the cost estimates. When the participants' estimates were well documented, they were converted to implementation costs. Interestingly, those with application experience were in general much more detailed in their descriptions of what their cost estimates included, making it possible to adjust the estimates to fit a consistent definition. Those without application experience often had little written description of how they produced the effort estimates for the experiment. For estimates with unclear definitions, the productivity (size/effort) was computed. If the productivity was greater than 20, it was assumed that the estimate was of implementation effort, unless documented as being some other effort estimate. The selection of a productivity of 20 was somewhat arbitrary. It is based upon the frequently used rule of thumb of 10 SLOC/day for fully burdened ground based software and a 100 percent burden rate. If biased, the results should be biased upward because, based on our project data, JPL unburdened productivity averages 18 SLOC/day [11].

A total of 48 persons completed some portion of the experiment. Two observations were rejected as outliers, since the size and effort estimates were approximately five times larger than the respective averages. Forty-two persons estimated size, and 24 estimated what is frequently called the implementation effort, or the effort of the programming team during the detailed design, code and unit test phases of the life cycle. The remainder estimated different portions of total development costs and were incomparable.

The actual size of the code used in the experiment is 4959 SLOC, and the actual implementation effort was 160 days. The original estimate by the JPL SAR project staff was 2925 SLOC; design and code effort was estimated at 143 days. These are very close to the averages of the SLOC and effort estimates by the participants.

Table 5: Summary Table of Size and Effort Estimates

                                       Percentage of responses within
Description     Mean    Std. Dev.   10%     20%
Size (SLOC)
  Average       2866    2194        10%     16%
  Low           1902    1640
  Most Likely   2673    2138
  High          3970    3121
Effort (days)
  Average       144     84          17%     33%
  Low           103     65
  Most Likely   137     87
  High          183     114

There were a total of six missing values for the low and high size and effort estimates. These were imputed based on regressions which used other sample data, such as total effort and size, to predict the variable with the missing values. The R2 for these regressions were all above .9.
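As an illustration of the regression-based imputation described above, a minimal sketch (the raw survey data are not published, so the arrays below are invented, and the helper is hypothetical):

```python
import numpy as np

# Hypothetical: predict a missing "high" size estimate from the "most likely"
# estimates of respondents who supplied both (the paper's R^2 were above .9).
most_likely = np.array([1500.0, 2200.0, 2800.0, 3500.0, 5000.0])
high        = np.array([2100.0, 3000.0, 3900.0, 5200.0, 7400.0])

# Least-squares fit: high = a + b * most_likely
A = np.column_stack([np.ones_like(most_likely), most_likely])
(a, b), *_ = np.linalg.lstsq(A, high, rcond=None)

print(a + b * 2600.0)  # imputed "high" for a respondent missing that value
```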
As can be seen in Table 5, on average the effort estimates are more accurate than the size estimates: one-third of the population estimated effort within 20 percent of the true value, while the average size estimate differs from the actual by 42 percent, with only one-sixth estimating within 20 percent of the actual. In addition, the actual size of the code fell within the low-to-high estimate range only one-third of the time, while the actual effort fell within the respondents' low-to-high estimate range 50 percent of the time. This is consistent with the result that only 20 percent use size as a cost driver while everyone estimates effort. Earlier studies also found problems with chronic size underestimation, even for those with size estimation experience [12]. Clearly, the technical staff estimates effort significantly better than size. However, based on the experiment, it is very likely (67 percent) that any one effort estimate can be more than 20 percent off, even with a detailed design document.

Table 6: Summary of Effort Estimates

                                          Percentage of responses within
Description          Mean     Std. Dev.   10%     20%
                     (days)   (days)
Total population     144      84          17%     33%
Application Exp.
  Yes                169      77          23%     46%
  No                 115      86          9%      18%
Estimation Exp.
  <6 years           141      71          28%     57%
  >=6 years, by time to last estimate
    <=6 months       180      76          18%     36%
    >6 months        90       107         0%      0%

In general, the variances are large. The size variance is especially large, at 77 percent of the mean, while the effort variance is 58 percent of the mean. Given the large variation, one might ask why more projects do not experience cost overruns or underruns. One of the most important reasons is that cost contracts tend to be self-fulfilling prophecies. If there is slack in the funding, then the task can be made more elaborate, or better equipment and software can be purchased. If the budget is tight, then documentation, testing and requirements can be sacrificed. Another factor is that the example code module is just one small piece of a larger program. If many small pieces are added together to get total project costs and the errors are symmetrically distributed with zero mean, then they might cancel out.

Previous studies have reported that estimates tend to be skewed downward, revealing an optimistic bias [12,10,8,7]. Our data supports this conclusion. The population averages for low, most likely and high software size are 1902, 2673 and 3970 SLOC. While less so than the size estimates, the effort data is still skewed downward, with a low of 103 days, a most likely of 137 days, and a high of 183 days. Including all effort estimates yields 146, 191 and 262 days respectively.

5.4 Analysis

As long as the errors introduced into the sample by the data collection process are symmetrically distributed around zero, the means of the sample distribution will be unbiased estimates of the population means. Variation can arise from several sources: (1) the variance for size estimates could be naturally large; (2) ignorance of the application type (only 35 percent of the respondents reported any domain experience); (3) carelessness on the part of the participants; and (4) questionnaire or sample design problems. With respect to the effort estimates, there are clearly some questionnaire design problems. It is expected that database code, because it is somewhat system dependent, will have a larger variance than math based code. Also, since the actual task estimate and the sample average for the SLOC are very close, it does appear that there is an underlying consistency in the estimation process. Given the poor accuracy of size estimates reported in the literature, it is strongly suspected that the variances are inherently large.

In order to test the impact of different types of experience, two-way and three-way splits were defined for all of the measures of experience as deemed appropriate. A t-test for comparing two means with different variances was used to determine whether the average size and effort estimates for the sub-populations represent distinctly different distributions.
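A minimal sketch of such a comparison, using SciPy's unequal-variance (Welch) t-test with the application-experience summary statistics from Table 6 (the subgroup sizes are not reported in the paper, so the nobs values below are hypothetical):

```python
from scipy.stats import ttest_ind_from_stats

# Effort estimates (days) for respondents with and without application
# experience, from Table 6; subgroup sizes are hypothetical (not reported).
result = ttest_ind_from_stats(mean1=169, std1=77, nobs1=12,
                              mean2=115, std2=86, nobs2=12,
                              equal_var=False)  # Welch's unequal-variance t-test
print(result)  # significant at the 10 percent level if the p-value < 0.10
```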
The tables display only results which revealed significant differences at the 10 percent level or better. Given the large variances, a 10 percent rejection test is not unreasonable, since the data is more likely to generate a false acceptance than to falsely reject the hypothesis.

Table 7: Summary of Size Estimates

                                          Percentage of responses within
Description          Mean     Std. Dev.   10%     20%
                     (SLOC)   (SLOC)
Total Population     2866     2194        10%     16%
Application Experience
  Yes                3397     2054        17%     27%
  No                 1988     2378        4%      12%
Assembly Experience
  Yes                2337     2174        4%      12%
  No                 3836     2484        16%     21%
Fortran Experience
  Yes                3134     2400        9%      17%
  No                 1695     1290        13%     13%
Estimation Experience
  <6 years           2670     2093        12%     24%
  >=6 years, by time to last estimate
    <=6 months       3730     2561        13%     18%
    >6 months        2257     1236        0%      0%

What the results show is that experience is a very important factor in the accuracy of size estimation. Experience seems to be of three main types: application experience, years of estimation experience, and immediacy of experience (time since last estimate). To study the interaction of estimation experience and frequency of estimation, the sample was split into subgroups by both estimation experience and time to last estimate. The results are displayed in Tables 6 and 7.

Those with application experience did significantly better at estimating both size and effort. Furthermore, all of the extreme values, high and low, and the two outliers which were excluded come from the subgroup with no application experience.

This breakdown shows very clearly the importance of the combination of overall experience with the performance of regular cost estimation exercises. The subgroup defined by six or more years of estimation experience and at least one estimate completed in the past six months performed significantly better than the other groups. Their estimates appear to be comparable with size estimates made by models [10].

The story for the size estimates is a little different from that for the effort estimates. Assembly and FORTRAN experience had a significant impact, such that the presence of assembly experience is correlated with less accurate size estimates, while FORTRAN experience is correlated with more accurate size estimates. The data does clearly show that those with experience programming in assembly are significantly different from those who have no assembly experience. It was expected that those with C experience would do better at size estimation than those without C experience. Respondents without C experience were numerically worse at estimating size, but the mean comparison test cannot distinguish between the two populations.

Normally, one thinks of cost estimation as a process of determining development costs based on a set of requirements. However, when there are external constraints it is much more likely that a potential set of requirements is identified based upon a given budget and/or schedule. To analyze the impact of external constraints on the cost estimation process, a binary variable, External Forces, was defined as having a value of 1 if the estimator is normally budget or schedule driven and 0 otherwise. The impact of external forces had a significance of 10.4 percent, which was close to but above the acceptance value of 10 percent. In addition, there is less than a 7 percent probability that the actual size of 4959 SLOC could have arisen from the same probability distribution as the distribution of size estimates of those who are chronically subjected to external forces. This is most likely due to the fact that people who are budget and schedule driven do not have the need to estimate the number of lines of code. Their concern is meeting the budgets and/or schedules which have already been set.
When a person is forced to plan under severe schedule and budget constraints, planning consists mostly of a prioritized list and a lot of optimistic promises.

Since the population estimating effort is different from the population estimating size, it is interesting to look at the subgroup which estimated both. The basic patterns are the same as for the larger groups; however, the average estimates of size (3814) are much closer to the observed size (4959). The variances are approximately the same; as a ratio of the means, however, they have fallen from 75 percent to 55 percent. The improved accuracy most likely reflects the fact that those who are included in the implementation effort estimation subgroup understood the experiment better.

6. Summary and Conclusions

The results of the survey represent only a partial picture of the JPL software cost estimation environment, as the survey did not cover the senior managerial staff. For the portion of the JPL staff who is represented in the survey, it was found that there is extensive experience (15 years in software and nine years of cost estimation experience), that most estimators use a functional breakdown and informal analogy, and that they are better at estimating effort than size. However, in both cases the probability of an estimate being more than 20 percent off is very high (50 percent for those with experience).

The results of the experiment appear to be consistent with the results reported in previous studies [7,9,10]. Given the very large variances and small sample sizes in this survey, it is hard to distinguish between groups. But even under these conditions, one main result stands out: the most accurate estimates of executable source lines of code were produced by those with more than six years of cost estimation experience who had completed at least one software cost estimate in the past six months. The average size estimate for this group differed from the actual by 25 percent, as compared to the overall sample, which differed by 42 percent.

With respect to the estimation experiment, it is very clear that experience is very important, especially application and estimation experience. Those with application experience did significantly better at estimating both size and effort; the means were closer to the observed size and the variances smaller.

Another important issue the survey raises is that the technical staff appears to be relatively more accurate at estimating effort than at estimating size. Only half of those surveyed reported doing size estimates as part of their job, and many of these stated that they only did size estimates because they had to for reporting purposes. However, everyone reported making effort forecasts; if effort is misestimated, there usually will be some penalty at delivery time, whereas if size estimates are incorrect, it really does not make much difference.

In general, there appears to be a lack of emphasis on accurate and detailed cost estimates. This is partly due to past experiences where quality mattered more than cost, and where more money would usually be forthcoming when it was absolutely necessary.
This has resulted in a decreased emphasis on careful cost analysis, so that many respondents commented that they never had enough time or money to do a proper cost estimate. Therefore, it is to be expected that the survey reveals little use of any formal forecasting methodology and that the majority of estimates are produced by informal analogy. Unfortunately, the results indicate that most software estimates currently made are neither traceable nor reproducible. In addition, while forecasting software costs is a highly uncertain business, there is really no formal procedure in place to incorporate uncertainty in the estimation process.

These results have definite implications for the use of software cost models, all of which depend heavily on size as an input. Not only were the size estimates low, but they varied widely. This means that providing a point estimate to a model probably yields virtually meaningless cost estimates. One solution is to not use sizing at all for cost estimation, but to make effort estimates based on functional requirements. Several other options are available: use models which incorporate uncertainty in the inputs and estimates, provide training to make persons better at estimating size, and establish a database which will produce size estimates based on functional descriptions. Another consideration is to use an alternative measure of software size. Several studies have found that Halstead's measure of the number of unique operands is highly correlated to final size and cost [7,13]. In an earlier study funded by JPL, Lewis and Dodson [7] report that estimates of operands were low by 24-34 percent, which is an improvement in accuracy compared to estimates of lines of code.

This last point is probably the most important because, for NASA programs, virtually every project has a significant portion of elements which are new. This means that guessing about something that has a significant number of unknown items is a standard part of the estimating environment, and no database will ever be totally sufficient. What we have to learn is how to forecast the future in an intelligent manner and to incorporate risk in a sophisticated manner. One of the ways JPL deals with this internally is to have the implementing sections make a contractual commitment to the program managers, which is the same game Congress is now trying to play with NASA and DOD contractors. The problem is that while the cost may be controlled, this does not prevent the reductions in functionality and performance which result from poor cost estimates.

So what is JPL doing in response to this situation? Several years ago the development of a software development standard was started, and it is now being transferred to all software development tasks. Parts of this standard have been included in the IEEE software standard. Clearly, standardizing the process will improve predictability. A variety of tasks have been and are being funded for the development of cost estimation models, metrics definition and collection, and database development. Project studies and postmortems are providing insights into managerial practices, how tasks evolve, and the identification of the key cost metrics in the JPL environment. Workshops are being conducted to share techniques between managers and estimators. Finally, work is beginning on the definition of a more formalized cost estimation process.
At this point in time it appears that the major focus will be on documentation of assumptions, techniques for incorporating risk and uncertainty, the use of multiple estimates, and the use of historical data, whether it be to support the use of models, analogy or other approaches to cost estimation.

Acknowledgements

This survey and experiment was funded by JPL's Systems Software and Operation Resource Center (SSORCE). The authors gratefully acknowledge the assistance of Steve Wake, a summer co-op from Virginia Tech, with data collection, and of Randy Cassingham of JPL's Systems Analysis section for editing and rewrite assistance.

7. References

[1] Lame, D., M. Bush and Y. DeSoto, Software Intensive Systems Technical Workforce, JPL SSORCE presentation, February 2, 1987.
[2] "The Software Trap – Automate or Else," Business Week, May 9, 1988, pp. 142-148.
[3] Schlender, B. R., "How to Break the Software Logjam," Fortune, September 25, 1989, pp. 100-112.
[4] Myers, W., "Allow Plenty of Time for Large-Scale Software," IEEE Software, July 1989, pp. 92-99.
[5] Brooks, F., "Essence and Accidents of Software Engineering," Computer, April 1987, pp. 10-19.
[6] Weinberg, G. and E. Schulman, "Goals and Performance in Computer Programming," Human Factors, Vol. 16, No. 1, 1974, pp. 70-77.
[7] Lewis, D. and E. Dodson, "An Exploratory Study of the Determinants of Software Size," CR-2-1542, General Research Corporation, Santa Barbara, Ca., October 30, 1987.
[8] Wheaton, M., "Software Sizing Task Final Report," The Aerospace Corporation, September 30, 1983.
[9] Kitchenham, B. and N. Taylor, "Software Project Development Cost Estimation," The Journal of Systems and Software, Vol. 5, No. 4, November 1985, pp. 270-280.
[10] "A Descriptive Evaluation of Software Sizing Models," Data and Analysis Center for Software, RADC, September 1987.
[11] "A Productivity Analysis of JPL Software," SSORCE/Engineering Economic Analysis Group Technical Report No. 1, JPL, July 1989.
[12] Boehm, B., Software Engineering Economics, Prentice-Hall Inc., 1981.
[13] Albrecht, A. and J. Gaffney, "Software Function, Source LOC and Development Effort Prediction: A Software Science Validation," IEEE Transactions on Software Engineering, Vol. SE-9, No. 6, November 1983, pp. 639-648.