How to revise the GUM? Walter Bich

Accred Qual Assur (2008) 13:271–275
DOI 10.1007/s00769-008-0357-y
DISCUSSION FORUM
Received: 22 November 2007 / Accepted: 5 January 2008 / Published online: 26 January 2008
© Springer-Verlag 2008
W. Bich
Istituto Nazionale di Ricerca Metrologica, 10135 Torino, Italy
e-mail: w.bich@inrim.it
Papers published in this section do not necessarily reflect the opinion of the Editors, the Editorial Board and the Publisher.
Abstract The announcement of a revision of the Guide to the expression of uncertainty in measurement has revived the debate on measurement uncertainty. In this paper the author, chairman of Working Group 1 of the Joint Committee for Guides in Metrology, replies to the theses put forward in two recent papers by Semion Rabinovich. His opinions are personal and are not necessarily shared by the JCGM/WG1. They are intended as a further contribution to the present discussion.
Keywords Metrology · Measurement · Measurement uncertainty
Introduction
The Guide to the expression of uncertainty in measurement, GUM [1], was published in 1993 by the International Organization for Standardization (ISO) in the name of seven international organizations, namely ISO itself, the Bureau International des Poids et Mesures (BIPM), the International Electrotechnical Commission (IEC), the International Federation of Clinical Chemistry and Laboratory Medicine (IFCC), the International Union of Pure and Applied Chemistry (IUPAC), the International Union of Pure and Applied Physics (IUPAP) and the International Organization of Legal Metrology (OIML). It was reprinted with minor corrections in 1995. In 1997 the same organizations established the Joint Committee for Guides in Metrology (JCGM), which the International Laboratory Accreditation Cooperation (ILAC) joined in 1998. The JCGM has two working groups. Working Group 1, “Expression of uncertainty in measurement”, has the task “to promote the use of the GUM and to prepare supplements for its broad application”. Working Group 2, on the “International vocabulary of basic and general terms in metrology”, has the task “to revise and promote the use of the VIM”. The first meeting of JCGM/WG1 took place in March 2000. It was then decided that, despite some identified limitations and drawbacks, the GUM would not be revised in the short term; supplements addressing these limitations and drawbacks would be produced instead [2]. The main reason for this decision was that the GUM was by then becoming the authoritative document in the field of measurement uncertainty. Since its elaboration had been a fragile compromise between several different views, it was not deemed timely to reopen a debate that had never really been settled. Recently, the JCGM decided that the GUM will be revised in the future, and the first contributions have started to appear. Specifically, in two recent papers [3, 4] Rabinovich explains his views on how the GUM could be improved. In Ref. [3], the main criticism of the present GUM is that it is intended only for repeated measurements and says nothing about single measurements, which constitute the great majority. In the second paper [4] various other issues are addressed, the most important of which is probably the old debate about the true value of a quantity. In this paper I will try to explain the principles underpinning the GUM, and discuss the lines along which I think it should be revised and updated.
Generalities
In 1980 the working group established to address the problem of a unified treatment of uncertainty in measurement clearly stated, in its Recommendation INC-1, that uncertainties should be expressed as variances (or, better, as their positive square roots) for both random and systematic components. This recommendation was reaffirmed in CIPM Recommendations CI-1981 (GUM, A.2) and CI-1986 (GUM, A.3). The decision to adopt variances went against an alternative view, according to which error limits were to be preferred. There are good reasons for preferring variances to error limits. Variances are well-defined quantities in the theory of random variables, and their properties, especially as regards their “propagation”, have been known for a long time. In contrast, error limits, or maximum errors, are difficult to define, and since their properties depend on their definition, they are not good candidates for a unified treatment of uncertainties. The GUM was developed according to the CIPM recommendations, and the best one can do now, almost 15 years after its publication, is to develop its framework further and to resolve its inconsistencies within that framework. This is precisely the mission of the JCGM/WG1.
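The following minimal Python sketch combines two independent components in two ways, to make the contrast between variances and error limits concrete. It is an illustration added here, not taken from the GUM. Variances of independent components add, so standard uncertainties combine in quadrature. The linear sum of maximum errors is shown only as one of several possible conventions for error limits. The function names and the values 0.3 and 0.4 are hypothetical.

```python
import math
import random

def combined_standard_uncertainty(u_components):
    """Independent components: variances add, so u_c = sqrt(sum of u_i**2)."""
    return math.sqrt(sum(u * u for u in u_components))

def linear_error_limit(limits):
    """One possible convention for error limits (assumed): add maximum errors."""
    return sum(abs(a) for a in limits)

def simulated_std_of_sum(u_components, draws=100_000, seed=1):
    """Monte Carlo check of variance addition: each component is drawn as an
    independent zero-mean Gaussian with standard deviation u_i."""
    rng = random.Random(seed)
    sums = [sum(rng.gauss(0.0, u) for u in u_components) for _ in range(draws)]
    mean = math.fsum(sums) / draws
    return math.sqrt(math.fsum((s - mean) ** 2 for s in sums) / draws)

u = [0.3, 0.4]                             # hypothetical component uncertainties
print(combined_standard_uncertainty(u))    # quadrature sum
print(linear_error_limit(u))               # linear sum
print(simulated_std_of_sum(u))             # agrees with the quadrature sum
```

With components 0.3 and 0.4 the results are:
- The quadrature sum is 0.5.
- The linear sum is 0.7.
- The simulated spread of the sum matches 0.5 to within Monte Carlo noise.

The variance rule follows from probability theory alone. The linear sum depends on how the limits were defined in the first place.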
The GUM framework
In this section I will review the main steps of the GUM uncertainty framework, commenting on their merits and limitations.
Formulation stage
The framework of the GUM is now well known. It is based on the assumption that the measurand Y (or output quantity) is not observed directly, but is obtained from a number, say N, of other quantities $X_i$ (the input quantities) to which it is related through a known functional relationship, the model $Y = f(X_1, X_2, \ldots, X_N)$. Even the simplest, seemingly direct measurements, such as those mentioned by Rabinovich, fall into this category. For example, the indication of a bathroom scale, expressed in scale divisions, is not the measurand Y (the mass of the person in kilograms) but simply one of the input quantities, say $X_1$. The measurand is obtained from the indication $X_1$, perhaps repeated two or three times, and a series of corrections $X_2, X_3, \ldots, X_N$. These correct for the zero and the span of the scale, perhaps its linearity, or the deviation of the local acceleration due to gravity from that at the place where the scale was manufactured and adjusted. In many practical cases the corrections are negligible as far as the estimate is concerned. The model must nevertheless be used to evaluate the uncertainty of that estimate, because the uncertainty associated with a negligible correction is not, in general, negligible itself.
The phase of deciding on a model relating the measurand to the input quantities is part of what is sometimes called the “formulation stage”. The GUM approach to this stage is quite general, the only prescription being that a model must be available. The case treated in Ref. [3] concerns a specific model, claimed to be suitable for single measurements, which is compliant with the general framework of the GUM. Some guidance on the formulation stage is given in the GUM, mainly through examples (GUM, Annex H). In addition, a model is given for the cosine error (GUM, F.2.4.4). It represents a class of experimental situations in which the input quantities have highly asymmetric distributions. In the chemical field, examples of such situations are titration and the measurement of impurity concentrations. Encoding an experimental procedure in a suitable model can be a difficult task. Recognizing this, and with the aim of enriching the treatment of modelling, the JCGM/WG1 will prepare a supplement devoted specifically to this topic [2]. The modelling of single, or “direct”, measurements might serve as a useful example.
Input evaluation stage
The subsequent step might be called the “input evaluation stage”. The experimenter now has to assign estimates to the input quantities, as well as the uncertainties associated with these estimates.
Assignment of estimates is rather intuitive. Some estimates may come from indications (repeated or not) of instruments; in this case one usually takes the average of the indications. Others may be constant values taken from the literature or from prior experience, for example the value of a reference standard taken from a calibration certificate, or a coefficient of thermal expansion taken from a textbook. In the simplest cases one has an indication read from an instrument and a number of corrections expected to be equal to zero or to unity (for additive and multiplicative models, respectively).
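The following Python sketch shows how the bathroom-scale example might be written as a model. It assumes a hypothetical form $y = (x_{ind} + \delta_{zero})\,k_{span}\,k_{grav}$:
- an additive zero correction, estimated as 0;
- two multiplicative corrections, for span and for gravity, each estimated as 1.

All readings and input uncertainties are invented for illustration. The uncertainty is computed with the first-order law of propagation for independent inputs. The sketch shows that the corrections leave the estimate unchanged yet dominate its uncertainty.

```python
import math
import statistics

# Hypothetical bathroom-scale model (illustrative only, not from the GUM):
#   y = (x_ind + d_zero) * k_span * k_grav
# d_zero: additive zero correction, estimate 0
# k_span, k_grav: multiplicative span and gravity corrections, estimate 1

def scale_model(x_ind, d_zero, k_span, k_grav):
    return (x_ind + d_zero) * k_span * k_grav

def first_order_u(x_ind, d_zero, k_span, k_grav, u):
    """First-order propagation for independent inputs; the sensitivity
    coefficients are the partial derivatives of scale_model."""
    c = {
        "x_ind": k_span * k_grav,
        "d_zero": k_span * k_grav,
        "k_span": (x_ind + d_zero) * k_grav,
        "k_grav": (x_ind + d_zero) * k_span,
    }
    return math.sqrt(sum((c[name] * u[name]) ** 2 for name in c))

readings = [72.4, 72.6, 72.5]                      # invented readings, kg
x_ind = statistics.fmean(readings)                 # average of the indications
u_x = statistics.stdev(readings) / math.sqrt(len(readings))

est = {"x_ind": x_ind, "d_zero": 0.0, "k_span": 1.0, "k_grav": 1.0}
u = {"x_ind": u_x, "d_zero": 0.1, "k_span": 0.005, "k_grav": 0.0005}  # invented

y = scale_model(**est)
u_y = first_order_u(**est, u=u)
print(f"y = {y:.2f} kg, u(y) = {u_y:.3f} kg, readings alone {u_x:.3f} kg")
```

Here the estimate equals the mean reading, 72.5 kg. Its standard uncertainty u(y) is about 0.38 kg, several times the 0.058 kg contributed by the repeated readings alone.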
The assignment of the uncertainties associated with the input estimates is less obvious. Clause 4 of the GUM is entirely devoted to this task. Annex F gives practical guidance on several cases, including that of a single estimate, of which a single indication from an instrument is a particular instance. Therefore, it is unfair to claim that the GUM “does not mention single measurements” [3].
The procedures given in the GUM are well defined. The well-known classification of uncertainty evaluations into Types A and B was introduced to distinguish clearly between two procedures, based on statistics and on probability theory, respectively. The distinction is not merely formal and deserves some discussion.
Type A evaluations
Suppose a quantity is repeatedly sampled during the experiment, so that a set of indications is available and its average is used to estimate the quantity value. The experimental standard deviation of the mean is then typically (but not invariably) assigned as the uncertainty associated with the estimate of that quantity (GUM, 4.2.3). This is a Type A evaluation. In the GUM, this value is considered an estimate of the “true” standard deviation, just as the sample average is viewed as an estimate of the “true” quantity value. A number of degrees of freedom is attached to the estimate of the standard deviation as a measure of its reliability (or uncertainty). Therefore, uncertainties obtained by Type A evaluations are themselves uncertain (see GUM, E.4).
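Below is a minimal Python sketch of a Type A evaluation as just described, with invented readings. The relative uncertainty of the resulting standard uncertainty is taken as approximately $[2(n-1)]^{-1/2}$. This is the approximation assumed here from GUM, Annex E. The function name is hypothetical.

```python
import math
import statistics

def type_a(indications):
    """Type A evaluation: estimate, standard uncertainty of the mean,
    degrees of freedom, and the approximate relative uncertainty of that
    standard uncertainty, [2(n - 1)]**-0.5 (assumed GUM, E.4 approximation)."""
    n = len(indications)
    if n < 2:
        raise ValueError("a Type A evaluation needs at least two indications")
    estimate = statistics.fmean(indications)
    s = statistics.stdev(indications)          # experimental standard deviation
    u = s / math.sqrt(n)                       # experimental std dev of the mean
    nu = n - 1                                 # degrees of freedom
    rel_u_of_u = 1.0 / math.sqrt(2.0 * nu)
    return estimate, u, nu, rel_u_of_u

readings = [10.1, 9.9, 10.0, 10.2, 9.8]        # invented indications
print(type_a(readings))
```

For these five readings the standard uncertainty is about 0.071, with 4 degrees of freedom. That standard uncertainty is itself uncertain by roughly 35 %.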
Type B evaluations
If a set of indications is not available for a quantity, a subjective probability density function (PDF) is assigned, embodying the available knowledge of that quantity. This is a Type B evaluation. The approach is based on the interpretation of probability as degree of belief. The mean (expectation) of the PDF is the estimate assigned to the quantity, and its variance is calculated in the appropriate way depending on the available information (GUM, 4.3). Therefore, the mean and variance of the PDF in Type B evaluations are the true distribution parameters and should be viewed as exact, i.e., with no uncertainty. This concept is not fully implemented in the GUM, which attaches degrees of freedom (although typically very many) to them. The assignment of subjective degrees of freedom in Type B evaluations is not convincing (GUM, G.4.2). It looks like an ad hoc procedure to align Type B with Type A evaluations with a view to determining the expanded uncertainty (see below). In any case, this internal inconsistency represents, not only in my opinion, the main drawback of the GUM [5, 6]. However, as far as the uncertainties associated with input estimates are concerned, the inconsistency is only conceptual and does no harm.
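The Python sketch below, with hypothetical names, illustrates both sides of the argument:
- `type_b_rectangular`: a rectangular PDF between two bounds fixes the expectation and the standard deviation $a/\sqrt{3}$ exactly (GUM, 4.3.7), with no sampling involved.
- `subjective_dof`: reproduces the subjective degrees of freedom of GUM, G.4.2, $\nu \approx \tfrac{1}{2}[\Delta u / u]^{-2}$. This is the step criticised above.

```python
import math

def type_b_rectangular(lower, upper):
    """Rectangular PDF on [lower, upper]: the expectation and the standard
    deviation a / sqrt(3) are fixed by the PDF itself (GUM, 4.3.7)."""
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    a = (upper - lower) / 2.0                  # half-width
    return (lower + upper) / 2.0, a / math.sqrt(3.0)

def subjective_dof(relative_reliability):
    """Subjective degrees of freedom, GUM G.4.2: nu ~ 0.5 * (du/u) ** -2."""
    return 0.5 * relative_reliability ** -2

estimate, u_b = type_b_rectangular(9.9, 10.1)  # hypothetical certificate bounds
print(estimate, u_b)
print(subjective_dof(0.25))                    # u judged reliable to about 25 %
```

Judging u to be reliable to about 25 % gives ν = 8 under the G.4.2 rule.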
Propagation stage
The subsequent stage is the “propagation stage”. The problem here is to obtain an estimate of the measurand and its associated uncertainty, given the model, the input estimates and the uncertainties associated with them. Here, too, the GUM does not suggest anything new, but simply adopts a well-known property of random variables [7]. A random variable that is a function of other random variables has a variance (and an expectation) that can be obtained from those of the independent variables on which it depends. The formula is comparatively simple, and elementary in the case of the expectation. This result is based on a first-order, or linear, approximation. Therefore, the approximation improves as the model becomes closer to linear over a range comparable with the input standard deviations. If the nonlinearity is appreciable, higher-order terms can be added, subject to some conditions, to improve the approximation. Therefore, as far as standard uncertainties alone are concerned, the framework of the GUM is satisfactory, at least from the practical viewpoint, in many experimental situations.
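To illustrate how the first-order approximation depends on nonlinearity, consider a hypothetical one-input model $Y = X^2$ with X Gaussian, for which the exact variance $4\mu^2\sigma^2 + 2\sigma^4$ is known. The Python sketch compares two approximations with the exact value:
- the first-order law of propagation;
- the result after adding the higher-order term from the note to GUM, 5.1.2. For this model (Gaussian input, vanishing third derivative) that term reduces to $2\sigma^4$ and happens to recover the exact value.

```python
import math

# Hypothetical one-input model y = f(x) = x**2, with X Gaussian (mu, sigma).

def u_first_order(mu, sigma):
    """Law of propagation: u(y) = |f'(mu)| * u(x), with f'(x) = 2x."""
    return abs(2.0 * mu) * sigma

def u_second_order(mu, sigma):
    """Adds the Gaussian-input higher-order term of the note to GUM 5.1.2:
    0.5 * f''(mu)**2 * u(x)**4 = 2 * sigma**4 (f''' = 0 for x**2)."""
    return math.sqrt((2.0 * mu * sigma) ** 2 + 2.0 * sigma ** 4)

def u_exact(mu, sigma):
    """Exact standard deviation of X**2: Var = 4 mu^2 sigma^2 + 2 sigma^4."""
    return math.sqrt(4.0 * mu ** 2 * sigma ** 2 + 2.0 * sigma ** 4)

sigma = 0.5
for mu in (10.0, 1.0):                 # nearly linear vs clearly nonlinear
    print(mu, u_first_order(mu, sigma), u_second_order(mu, sigma),
          u_exact(mu, sigma))
```

With μ = 10 and σ = 0.5 the first-order value is within about 0.1 % of the exact one. With μ = 1 and the same σ it is about 6 % low.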
However, a word of caution is necessary here, especially in connection with Type A evaluations. The formula is valid for the parameters of a PDF, that is, for the “true” variances (and expectations). In the present GUM framework, however, both the input quantity values and the variances are considered estimates of the corresponding parameters. The estimates must therefore be close to the corresponding parameters for the formula to be (approximately) valid.
Expanded uncertainty
The concept of expanded uncertainty was introduced to meet the need for greater confidence in the possible values of the measurand than that given by the standard uncertainty. The GUM solution is to multiply the standard uncertainty u(y) by a numerical factor k: “In general, k will be in the range 2 to 3” (GUM, 6.3.1). However, simple multiplication by k adds no information to that given by the standard uncertainty (GUM, 6.2.3) unless a measure of the confidence is at hand, that is, unless the coverage probability is known. An expanded uncertainty at a prescribed coverage probability, $U_p$, is a measure of uncertainty that really adds value with respect to the standard uncertainty. It is what is almost invariably required, for example in metrology for the declaration of Calibration and Measurement Capabilities [8]. In that case, as in many others, what is required is an interval within which the value of the measurand lies with a known (typically high, say 0.95) degree of belief, or probability. In the framework of the present GUM this task is very difficult to fulfil, even approximately, for two reasons. First, the shape of the PDF for the output quantity Y is not known, especially in the tails, which are the part that matters for a coverage interval. Second, the standard uncertainty is itself uncertain, which makes not only the shape but also the width of the PDF uncertain.
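The second reason can be checked by simulation. The Python sketch below repeats an experiment with n = 3 indications many times and counts how often the interval mean ± k·s/√n contains the true value. It uses hypothetical names and assumes a Gaussian population.
- With k = 2 the coverage is close to the exact Student's t value for two degrees of freedom, $2/\sqrt{6} \approx 0.82$, not 0.95.
- Reaching 0.95 needs a factor of about 4.30, the tabulated 97.5 % point of t with two degrees of freedom.

```python
import math
import random

def coverage_of_k_interval(k, n=3, trials=100_000, seed=1):
    """Fraction of simulated experiments in which mean +/- k * s / sqrt(n)
    contains the true value 0, for n Gaussian indications (sd 1, assumed)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(x) / n
        s = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))
        if abs(mean) <= k * s / math.sqrt(n):
            hits += 1
    return hits / trials

print(coverage_of_k_interval(2.0), 2.0 / math.sqrt(6.0))   # simulated vs exact
print(coverage_of_k_interval(4.303))                       # close to 0.95
```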
Drawbacks and remedies
Drawbacks
From the above discussion, the main drawback of the present GUM emerges as an internal inconsistency. On the one hand, PDFs are interpreted as pictures of the available knowledge. On the other, degrees of freedom are attached to their parameters, which are viewed as estimates affected by an uncertainty. In other words, the view of probability as degree of belief is only partially implemented. This has important consequences. If the input uncertainties are uncertain, so is the output uncertainty, and thus the width of the output PDF.
A way out is suggested in Annex G of the GUM, on the assumption that in most situations the output PDF is a scaled-and-shifted Student’s t distribution. Guidance is given on how to determine the degrees of freedom of such a PDF, the so-called effective degrees of freedom. They are obtained from the degrees of freedom of the input PDFs and their variances by means of the Welch–Satterthwaite formula (GUM, G.4). In this case, a coverage interval can be constructed comparatively easily. The objection to this scheme is that the conditions for the output PDF to be a scaled-and-shifted Student’s t distribution are quite strong and unlikely to be met in many practical cases. For example, there should be several input PDFs, independent and all of roughly the same size, or, alternatively, a Gaussian should dominate. This requirement is not met for a simple measurement in which a dominant uniform PDF is superposed on a small Gaussian. The uniform might describe the value of a reference standard taken from a calibration certificate, and the Gaussian the comparison noise. This specific case can be treated analytically in a straightforward way, but in many other cases the treatment is difficult, or impossible.
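The analytic treatment of the uniform-plus-Gaussian case can be sketched as follows in Python. The sketch assumes hypothetical values:
- a uniform PDF of half-width a = 1;
- a Gaussian of standard deviation 0.05, with 4 degrees of freedom.

The exact coverage probability of a symmetric interval follows from the convolution of the two PDFs, written through the normal CDF and its antiderivative. The Welch–Satterthwaite formula of GUM, G.4, is included for comparison. Because $\nu_{eff}$ turns out to be very large, the t-based recipe here uses k ≈ 1.96.

```python
import math

def _pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def _int_cdf(z):
    """Antiderivative of the standard normal CDF: z * Phi(z) + phi(z)."""
    return z * _cdf(z) + _pdf(z)

def prob_within(t, a, sigma):
    """P(|Y| <= t) for Y = U + E, U uniform on [-a, a], E ~ N(0, sigma**2),
    obtained by integrating the convolution density in closed form."""
    g = _int_cdf
    return (sigma / (2.0 * a)) * (
        g((a + t) / sigma) - g((a - t) / sigma)
        - g((t - a) / sigma) + g((-a - t) / sigma)
    )

def exact_half_width(a, sigma, p=0.95):
    """Bisection for the symmetric half-width with coverage probability p."""
    lo, hi = 0.0, a + 10.0 * sigma
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if prob_within(mid, a, sigma) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def welch_satterthwaite(u_components, dofs):
    """Effective degrees of freedom (GUM, G.4); math.inf marks exact inputs."""
    u_c4 = sum(u * u for u in u_components) ** 2
    return u_c4 / sum(u ** 4 / nu for u, nu in zip(u_components, dofs))

a, sigma = 1.0, 0.05          # hypothetical uniform half-width and Gaussian sd
u_uniform = a / math.sqrt(3.0)
u_c = math.sqrt(u_uniform ** 2 + sigma ** 2)
nu_eff = welch_satterthwaite([u_uniform, sigma], [math.inf, 4])
U_t = 1.96 * u_c              # t factor is close to the normal value here
U_exact = exact_half_width(a, sigma)
print(f"nu_eff = {nu_eff:.0f}, U(t-based) = {U_t:.3f}, U(exact) = {U_exact:.3f}")
```

The t-based half-width comes out near 1.14, whereas the exact 95 % half-width is about 0.955. In this case the scaled-and-shifted t assumption overstates the interval by roughly 20 %.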
A second limitation of the present GUM is that it does not fully cover the case of an arbitrary number of measurands determined from a common set of input quantities. This case is frequent, for example when measuring the complex-valued quantities typical of electrical metrology.
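For several measurands, the first-order law takes a matrix form, $U_y = J U_x J^{\top}$, where J is the matrix of sensitivity coefficients. This form yields the covariances between the output quantities, which a treatment of one measurand at a time cannot provide. The Python sketch below assumes, for illustration only, a complex quantity given by its real and imaginary parts. Their standard uncertainties, 0.1 and 0.2, are uncorrelated. Modulus and phase are derived from these inputs as two measurands. Names and values are hypothetical.

```python
import math

def propagate_covariance(jac, cov_x):
    """First-order propagation for several measurands: U_y = J U_x J^T."""
    m, n = len(jac), len(cov_x)
    ju = [[sum(jac[i][k] * cov_x[k][j] for k in range(n)) for j in range(n)]
          for i in range(m)]                                  # J U_x
    return [[sum(ju[i][k] * jac[j][k] for k in range(n)) for j in range(m)]
            for i in range(m)]                                # (J U_x) J^T

def polar_jacobian(re, im):
    """Sensitivities of (modulus, phase in rad) to (real, imaginary) parts."""
    r2 = re * re + im * im
    r = math.sqrt(r2)
    return [[re / r, im / r],
            [-im / r2, re / r2]]

re, im = 3.0, 4.0                               # hypothetical estimates
cov_x = [[0.1 ** 2, 0.0], [0.0, 0.2 ** 2]]      # uncorrelated inputs (invented)
cov_y = propagate_covariance(polar_jacobian(re, im), cov_x)
modulus, phase = math.hypot(re, im), math.atan2(im, re)
corr = cov_y[0][1] / math.sqrt(cov_y[0][0] * cov_y[1][1])
print(modulus, phase, cov_y, corr)
```

For z = 3 + 4j the modulus and phase come out with a correlation coefficient of about 0.58, even though the real and imaginary parts were taken as uncorrelated.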
Remedies
A full implementation of the concept of probability as degree of belief would greatly help. In this view, PDFs are assigned to the input quantities on the basis of the available knowledge and are viewed as a way of encoding that knowledge in a rigorous mathematical language. This view, intuitive for Type B evaluations, can easily be adopted for Type A evaluations as well. In the former case knowledge comes from non-statistical sources, and in the latter it comes from a set of indications. Both can nevertheless be encoded in the same way, using the appropriate tools. In the former case this is the principle of maximum entropy [9]; in the latter, Bayes’ theorem [10], which yields a scaled-and-shifted Student’s t distribution. As a consequence, the input uncertainties would not themselves be uncertain; the only uncertainty would be the one associated with the measurand. This modification would remove one of the difficulties in the propagation of uncertainties. The law of propagation would be applied to expectations and variances, rather than to their estimates, and would therefore be valid within the limits of a first-order approximation. The construction of a coverage interval would also be simplified, being reduced to a mathematically clear problem: determining the PDF of a random variable given the PDFs of the variables on which it depends. This problem has a well-known formal solution [11], although computational difficulties prevent its application except in simple cases. Therefore, a numerical method is preferred, yielding a numerical approximation of the PDF for the measurand, from which the required coverage interval can easily be constructed. This topic is treated in Supplement 1, now in print [12], the first supplement to the GUM issued since its publication.
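The following Python sketch illustrates the approach outlined above for two inputs:
- X1, a Type A input encoded by Bayes’ theorem as a scaled-and-shifted t distribution. Its location is the mean and its scale $s/\sqrt{n}$, with $n-1$ degrees of freedom and standard deviation $\sqrt{(n-1)/(n-3)}\,s/\sqrt{n}$. This is the form used in Supplement 1.
- X2, a Type B input known only through its bounds, encoded by the maximum-entropy rectangular PDF.

Both are propagated by Monte Carlo through a hypothetical additive model Y = X1 + X2. A 95 % probabilistically symmetric coverage interval is then read off the sorted samples. The readings, bounds, seed and function names are all invented.

```python
import math
import random
import statistics

def bayes_type_a(indications):
    """Scaled-and-shifted t for n indications (non-informative prior):
    location = mean, scale = s / sqrt(n), nu = n - 1, and standard
    deviation sqrt((n - 1) / (n - 3)) * s / sqrt(n), finite for n > 3."""
    n = len(indications)
    if n < 4:
        raise ValueError("need at least four indications for a finite variance")
    loc = statistics.fmean(indications)
    scale = statistics.stdev(indications) / math.sqrt(n)
    nu = n - 1
    return loc, scale, nu, scale * math.sqrt(nu / (nu - 2))

def draw_t(rng, loc, scale, nu):
    """Student's t as Z / sqrt(V / nu), V chi-square(nu) = Gamma(nu / 2, 2)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(nu / 2.0, 2.0)
    return loc + scale * z / math.sqrt(v / nu)

def monte_carlo(indications, half_width, draws=100_000, p=0.95, seed=2008):
    """Propagate X1 (t PDF from the indications) and X2 (rectangular on
    [-half_width, half_width], the maximum-entropy PDF given only bounds)
    through the hypothetical model Y = X1 + X2."""
    rng = random.Random(seed)
    loc, scale, nu, _ = bayes_type_a(indications)
    ys = sorted(draw_t(rng, loc, scale, nu) + rng.uniform(-half_width, half_width)
                for _ in range(draws))
    y = math.fsum(ys) / draws
    u_y = math.sqrt(math.fsum((yi - y) ** 2 for yi in ys) / (draws - 1))
    low = ys[int(draws * (1 - p) / 2)]
    high = ys[int(draws * (1 + p) / 2) - 1]
    return y, u_y, (low, high)

readings = [10.1, 9.9, 10.0, 10.2, 9.8]          # invented indications
print(bayes_type_a(readings))
print(monte_carlo(readings, half_width=0.15))
```

With these five readings the t-based standard uncertainty of X1 is exactly 0.1. The simulated u(y) is close to $\sqrt{0.1^2 + 0.15^2/3} \approx 0.13$.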
The case of an arbitrary number of measurands will be the subject of a second supplement, now at an advanced stage of drafting.
It is worth noting that the supplements are to be considered as complements to the GUM and must be used in conjunction with it. Whether to incorporate them into the revision of the main document is still a matter of debate.
Further issues
There are other reasons to revise the GUM. The most important is to make it compliant with VIM 3 [13], which introduced important changes in terminology. More generally, the document should be reviewed carefully to eliminate some minor ambiguities. This brings us to the questions raised by Rabinovich in his other paper [4] concerning terminology. They concern especially the terms “true value” and “error”, which are carefully avoided in the present GUM. As regards the former term, the GUM framework implies a measurand that, during measurement, is considered “essentially unique” (GUM, 1.2). There would, however, be no difficulty in modifying the framework to encompass more general measurands, for example those having an intrinsic uncertainty, be it definitional or otherwise. The latter term, “error”, is at least in part connected with the former, so that if one is revived there is no reason to demonize the other. In any case, VIM 3 has new (!) definitions with which a revised GUM should be fully compliant. Personally, I would have appreciated in VIM 3 a term such as “estimate”, which has a precise meaning in probability. It would be an experimenter’s first choice for indicating a value of the measurand obtained from a measurement. In any case, the decision in VIM 3 to distinguish between the true value and a value of the measurand is welcome and will help clarify the language of the GUM. That language, in general, has to strike a balance between VIM definitions and the terms of probability theory, which underpins the whole GUM approach.
The existence in the GUM of two uncertainties, albeit with different qualifiers, “standard” and “expanded”, might also be a source of ambiguity. In the future GUM, the role of expanded uncertainty, viewed as a coverage interval for symmetric PDFs, is likely to be de-emphasized, in line with the approach of Supplement 1. The classification of the methods of evaluation of input uncertainties into Types A and B is likely to meet a similar fate.
From the notational viewpoint, a weak point of the present GUM is the use of the same symbol for both a quantity and the corresponding random variable; this should be corrected in a future revision.
Conclusion
In this paper I have tried to convey some ideas on how the GUM should be revised. They represent a personal viewpoint, although they are in a sense the outcome of long discussions with my colleagues in the JCGM/WG1, whom I thank.
References
1. BIPM, IEC, IFCC, ISO, IUPAC, IUPAP, and OIML (1995)
Guide to the expression of uncertainty in measurement, 2nd edn.
ISBN 92-67-10188-9
2. Bich W, Cox MG, Harris PM (2006) Metrologia 43:S161–S166
3. Rabinovich SG (2007) Accred Qual Assur 12:419–424
4. Rabinovich SG (2007) Accred Qual Assur 12:603–608
5. Kacker RN (2006) Metrologia 43:1–11
6. Kacker RN, Jones AT (2003) Metrologia 40:235–248
7. Lee PM (1992) Bayesian statistics: an introduction. Edward
Arnold, London, 294 p
8. Comité International des Poids et Mesures (CIPM) (1999) Mutual
recognition of national measurement standards and of calibration
and measurement certificates issued by national metrology
institutes, BIPM, Paris. http://www.bipm.org/utils/en/pdf/mra_2003.pdf
9. Weise K, Wöger W (1992) Meas Sci Technol 3:1–11
10. Sivia DS (2004) Data analysis—a Bayesian tutorial. Oxford
University Press, Oxford, 189 p
11. Bickel PJ, Doksum KA (1977) Mathematical statistics. Prentice-Hall, Englewood Cliffs, 492 p
12. BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP, and OIML
(2008) Evaluation of measurement data. Supplement 1 to the
Guide to the expression of uncertainty in measurement. Propagation of distributions using a Monte Carlo method, ISO/IEC
Guide 98-3/Supplement 1. ISO, Geneva
13. BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP, and OIML
(2007) International vocabulary of metrology—basic and general
concepts and associated terms, VIM, 3rd edn. International
Organization for Standardization, Geneva