A Theoretical Foundation of Ambiguity Measurement

A Theoretical Foundation of Ambiguity Measurement
Yehuda Izhakian∗†
April 17, 2015
Abstract
Ordering alternatives by their degree of ambiguity is a crucial element in decision-making processes. The current paper introduces an empirically applicable, stake-independent ambiguity measure that allows for such ordering. This measure relies upon the idea that, in the presence of
ambiguity, probabilities are themselves uncertain, and related preferences are applied to these
probabilities such that aversion to ambiguity is defined as aversion to mean-preserving spreads in
probabilities. Thereby, the degree of ambiguity can be measured by the volatility of probabilities.
The applicability of this measure is demonstrated by incorporating ambiguity into an asset pricing
model.
Keywords: Ambiguity Measure, Ambiguity Aversion, Knightian Uncertainty, Uncertain Probabilities, Ambiguity Premium.
JEL Classification Numbers: D81, D83, G11, G12.
∗
Department of Economics and Finance, Zicklin School of Business, Baruch College; yud@stern.nyu.edu
I thank Menachem Abudy, Yakov Amihud, Doron Abramov, David Backus, Adam Brandenburger, Menachem Brenner, Xavier Gabaix, William Greene, Eitan Goldman, Yaniv Grinstein, Sergiu Hart, Edi Karni, Ruth Kaufman, Peter
Klibanoff, Ilan Kremer, Evgeny Lyandres, Fabio Maccheroni, Massimo Marinacci, Sujoy Mukerji, Yacov Oded, Efe Ok,
Jacob Sagi, David Schmeidler, Uzi Segal, Marciano Siniscalchi, Laura Veldkamp, Paul Wachtel, Jan Werner, Jaime
Zender, Stanley Zin and especially Itzhak Gilboa, Mark Machina and Thomas Sargent for valuable discussions and suggestions. I would also like to thank the seminar and conference audiences at Bar Ilan University, Baruch College, Indiana
University, Johns Hopkins University, Michigan State University, New York University, Norwegian School of Business,
Tel Aviv University, The Interdisciplinary Center (IDC) Herzliya, The Hebrew University of Jerusalem, University of
Colorado, University of Houston, University of Michigan, Arne Ryde Workshop in Financial Economics 2013, Decision:
Theory, Experiments and Applications (D-TEA) 2013, Netspar International Pension Workshop 2013, North American
Meetings of the Econometric Society Northwestern University 2012, Risk Uncertainty and Decision (RUD) 2013, University of Chicago Workshop on Ambiguity and Robustness in Macroeconomics and Finance 2013, and Foundations of
Utility and Risk (FUR) 2014.
†
1
1
Introduction
How should uncertain alternatives be ranked by the criterion of ambiguity? Consider the following
example: a large urn contains 30 balls which are either black or yellow, in an unknown proportion,
and a second smaller urn contains only 10 balls which are also either black or yellow, in an unknown
proportion. Which of the following two bets is more ambiguous? “A ball drawn from the large urn
is yellow” or, “A ball drawn from the small urn is Yellow.” Say you were offered $10 if a ball drawn
from the large urn is yellow and $10 if a ball drawn from the small urn is yellow. Which of these two
bets would you choose? Answering this type of questions is part of almost any real-life decision. They
imply that decision-making involves the ordering of alternatives by their degree of ambiguity.
This paper introduces an empirically applicable ambiguity measure, underpinned by a new theoretical concept, that allows for such ordering of alternatives. This new concept proposes that, in
the presence of ambiguity, probabilities are themselves uncertain, and preferences concerning ambiguity are applied directly to these probabilities such that aversion to ambiguity is defined as aversion
to mean-preserving spreads in probabilities—analogous to the Rothschild-Stiglitz (1970) aversion to
mean-preserving spreads in outcomes. Thereby, the degree of ambiguity can be measured by the
volatility of probabilities, just as the degree of risk can be measured by the volatility of outcomes.
The resulting measure is objective, stake independent, simple and intuitive. It measures the degree of
ambiguity independently of individuals’ preferences and can be computed from the data in empirical
studies. These are key qualities for the introduction of ambiguity into economic and financial models.
The decision making framework underpinning the current paper is expected utility with uncertain
probabilities (henceforth EUUP), introduced by Izhakian (2014). This framework assumes two tiers
of uncertainty, one with respect to consequences (outcomes) and the other with respect to the probabilities of these consequences. A decision maker (DM) in this environment applies two differentiated
phases of the decision process, each refers to one of these tiers. In the first phase—the probability
formation phase—she forms a representation of her perceived probabilities for all the events which are
relevant to her decision. Then, in the second phase—the valuation phase, she assesses the value of
each alternative using her perceived probabilities and chooses accordingly. Ambiguity—the uncertainty
about probabilities—plays a role in the probability formation phase, while risk—the uncertainty about
consequences—plays a role in the valuation phase.1 This structure introduces a complete distinction
of risk from ambiguity with regard to both beliefs and tastes. The degree of ambiguity and attitudes
toward it are then measured with respect to one tier, while risk and risk attitudes are measured with
1
Risk is defined as a condition in which the event to be realized is a-priori unknown, but the odds of all possible
events are perfectly known. Ambiguity (Knightian uncertainty) refers to conditions in which not only is the event to be
realized a-priori unknown, but the odds of events are also either not uniquely assigned or are unknown.
2
respect to the other tier.
The main idea of EUUP is that, in the probability formation phase, perceived probabilities are
formed in a Bayesian approach by the “certainty equivalent probabilities” of uncertain probabilities.2
That is to say, an uncertain probability is modeled explicitly in a state space that is subject to a
prior probability, and the perceived probability is the unique certain probability value that the DM
is willing to accept in exchange for the uncertain probability of a given event. Perceived probabilities
are subjectively formed based upon the DM’s preferences concerning ambiguity. These preferences
are applied to probabilities such that aversion to ambiguity is defined as aversion to mean-preserving
spreads in probabilities. Thereby, the Rothschild and Stiglitz (1970) approach can be used over
probabilities to define an ordering by ambiguity.
Based upon probability ordering, this paper shows that the degree of ambiguity can be measured
by four times the expected volatility of probabilities, across the relevant events. Formally, the measure
of ambiguity is given by
∫
f2 [f ] = 4
X
E [φf (x)] Var [φf (x)] dx,
where f is an act (a bounded measurable function from states into consequences); X is a convex
subset of the real numbers (consequences); φf (·) is an uncertain probability density function; and
the expectation E [·] and the variance Var [·] are taken with respect to second-order probabilities
(probabilities over a set of probability distributions).3 The measure f2 (mho2 ) can be viewed as
an objective measure of ambiguity, as it measures ambiguous beliefs (information) in isolation from
individuals’ tastes for ambiguity. The main advantage of this measure is that it can be computed from
the data and can be employed in empirical tests.4 Stake independence is another major advantage
of f2 ; unlike risk measures, it does not depend upon the magnitude of consequences. This is an
important property of f2 , for example, for assessing the ambiguity associated with a particular stock
market, regardless of the investment amount and the associated risk.
Several approaches to estimating ambiguity have been proposed in the literature. Dow and Werlang (1992) measure uncertainty as the sum of the probability of an event and the probability of its
complementary event. Ui (2011) measures ambiguity by the difference between the minimal possible
mean and the true mean. Bewley (2011) and Boyle et al. (2011) measure ambiguity by a critical
confidence interval. Maccheroni et al. (2013) measure ambiguity by the variance of an unknown
mean. These studies assume that the variance of outcomes is known and suggest a stake-dependent
2
In this paper the terms perceived probabilities and subjective probabilities are used∑interchangeably.
The measure f2 is also applicable in finite state space. In this case, f2 [f ] = 4 i E [φf (xi )] Var [φf (xi )] , where
φf (·) is a probability mass function.
4
See, for example, Brenner and Izhakian (2011, 2012).
3
3
measure, based only on the variation of the mean. Nevertheless, ambiguous variance has been found
to be an important element in decision making processes; as stressed, for example, in Epstein and
Ji (2013).5 The ambiguity measure f2 , proposed in this paper, is stake-independent and encompasses
both ambiguous variance and ambiguous mean, as well as the ambiguity of all higher moments of
the probability distribution (i.e., skewness, kurtosis, etc.), through the uncertainty of probabilities.
Relative entropy, measured by the deviation of a probability distribution from a reference probability
distribution (reference model), can also be interpreted as a measure of ambiguity; see, for example,
Hansen et al. (1999), Hansen and Sargent (2001) and Maccheroni et al. (2006). However, while the
use of relative entropy is restricted to cases of a single prior relative to a known true probability
distribution, f2 can be employed in cases of multiple priors, when either a single true probability
distribution does not exist or it is unknown.
Measuring the degree of ambiguity allows alternatives to be ranked by the criterion of ambiguity.
The ambiguity measure is a critical instrument for introducing ambiguity into models that attempt
to explain observable phenomena such as financial anomalies. It provides a way to address important
questions that arise regarding the nature of ambiguity, in general, and the nature of the aggregate
ambiguity of portfolios, in particular. Accounting for ambiguity might shed light on some phenomena
that previously could not be fully explained. Notable examples include the fact that individuals tend
to hold very small portfolios, 3-4 stocks (Goetzmann and Kumar, 2008), the equity premium puzzle
(Mehra and Prescott, 1985), the risk-free rate puzzle (Weil, 1989), the phenomenon of the observed
equity volatility being too high to be justified by changes in the fundamental (Shiller, 1981), and the
home bias puzzle (Coval and Moskowitz, 1999).
To demonstrate the applicability of the proposed measure of ambiguity, this paper generalizes asset
pricing theory (Arrow-Pratt) to incorporate ambiguity. Relaxing the assumption that probabilities
are known, it shows that the price of an asset is determined not only by its degree of risk and
the DM’s attitude toward risk, but also by its degree of ambiguity and the DM’s attitude toward
ambiguity. The paper constructs an uncertainty premium and proves that it can be separated into a
risk premium and an ambiguity premium. It provides a well-defined ambiguity premium, attributed to
ambiguity and preferences concerning ambiguity and completely distinguished from the risk premium.
Previous models have been mainly focused on the theoretical aspects of the implication of ambiguity
for the equity premium (e.g., Chen and Epstein (2002), Izhakian and Benninga (2011), Ui (2011), and
Maccheroni et al. (2013)). Unlike these models, the ambiguity premium in the current paper can be
computed from the data and tested empirically. For example, Brenner and Izhakian (2011) show that
5
In empirical asset pricing and macroeconomic contexts, stochastic time varying volatility also plays an important
role; see, for example, Bollerslev et al. (1988), Fernandez-Villaverde et al. (2010) and Bollerslev et al. (2011).
4
ambiguity, measured by f2 , has a significant impact on the market portfolio return.6
The rest of the paper is organized as follows. Section 2 presents the decision-making framework.
Section 3 simplifies the framework as a preparation for the extraction of an ambiguity measure. Using
this simplified representation, Section 4 defines ordering of events by ambiguity and Section 5 uses this
ordering to suggest a measure of ambiguity. Section 6 analyzes the special properties of the proposed
measure, and Section 7 discusses it relative to alternative measures of ambiguity. To demonstrate
an application of this measure for asset pricing, Section 8 models the ambiguity premium. Section 9
concludes. All proofs are provided in the Appendix.
2
The decision making framework
The decision making framework employed in this paper is expected utility with uncertain probabilities
(EUUP), proposed by Izhakian (2014). EUUP assumes two different tiers of uncertainty, one with respect to consequences (outcomes) and the other with respect to the probabilities of these consequences.
Each tier is modeled by a separate state space. A decision maker (DM) in this framework applies two
differentiated phases of the decision process, each refers to one of these tiers. Preferences for ambiguity, which are applied to uncertain probabilities in the first phase of a decision process, rely upon
the Savage (1954) axiomatic foundation. They underpin perceived probabilities, which are structured
from uncertain probabilities in a Bayesian approach. Given these perceived (nonadditive) probabilities, preferences for risk are applied to consequences in the second phase of a decision process.7
These preferences, which rely upon the foundations of Schmeidler’s (1989) Choquet expected utility
and Tversky and Kahneman’s (1992) cumulative prospect theory, are formulated by Wakker’s (2010)
axiomatization.
Formally, let S be a (finite or infinite) nonempty state space, called the primary space, endowed
with a σ-algebra, E, of subsets of S. Generic elements of this σ-algebra are called events and are
denoted by E. Define X ⊆ R to be a convex set of consequences that contains the interval [0, 1]. Let
a primary act f : S → X be a bounded E-measurable function from states into consequences, and
denote the set of all these (Savage) acts by F and the set of all simple measurable acts by F0 . A simple
primary act can be represented as a sequence of pairs, f = (E1 : x1 , . . . , En : xn ) , where (E1 , . . . , En ) is
a generic partition of the state space S; xj is the consequence if event Ej occurs; and the consequences
Ä
ä
x1 , . . . , xn are listed in a non-decreasing order. A primary indicator act δE = E C : 0, E : 1 assigns
6
As far as I’m aware, prior studies do not conduct direct empirical tests of models of decision making under ambiguity
other than through parametric fitting and calibrations. Uppal and Wang (2003), Epstein and Schneider (2008), and Ju
and Miao (2012), for example, calibrate their model to the data. Several papers attribute different explanatory variables
to ambiguity. For example, Anderson et al. (2009) attribute the disagreement of professional forecasters to ambiguity.
7
Schmeidler (1989), in his pioneering study, introduces the idea that, in the presence of ambiguity, the probabilities
that reflect the DM’s willingness to bet may not be additive, i.e., the sum of the probabilities can be either smaller or
greater than 1.
5
the outcome 1 to event E ∈ E and the outcome 0 to its complementary event E C ∈ E. The domain
of first-order preference relation, %1 , is the set of primary acts F0 , and the relations -1 , ≺1 , ≻1 and
∼1 are defined as usual. A consequence x ∈ X is considered to be unfavorable if x ≤ k and favorable
if k < x, where k is a reference point. An event E ∈ E is considered to be unfavorable under act f if
f (E) ≤ k and favorable if k < f (E).
Probabilities of events E occurring in the primary space are determined in a (finite or infinite)
nonempty secondary space, defined by a set P of all possible additive probability measures over the
primary space S. A first-order probability measure P ∈ P is then viewed as a state of nature in
this secondary space, and the state space P is assumed to be endowed with the maximal σ-algebra,
Π = 2P , of subsets of P. A secondary act, fÊ : P → X , is a bounded function from the secondary
Á A secondary act
space P into the set of consequences X . The set of all secondary acts is denoted F.
fÊ that describes the resulting expected outcome of a primary act f contingent upon a prior P ∈ P
(on S), is denoted fˆ; that is, fˆ : P → X satisfies fˆ (P) =
∫
S
Á
f dP. The set of all secondary acts fˆ ∈ F
“ and the subset of all secondary acts in F
“ that are associated with primary indicator
is denoted F,
ˆ A secondary act δˆE : P → [0, 1] in ∆,
ˆ associated with a primary indicator act
acts is denoted ∆.
Ä
ä
δE = E C : 0, E : 1 , is given by
δˆE (P) = P (E)
for every P ∈ P. A secondary act δˆE can, therefore, be viewed as a function that assigns each event
E ∈ E with its possible probabilities. In this view, δˆE can be interpreted as an uncertain variable
describing the probability P (E) of event E. A second-order non-atomic finitely-additive probability
measure χ on Π assigns each subset A ∈ Π of first-order probability measures in P with a probability
χ (A).8 This second-order belief is implicated in the DM’s second-order preference relation %2 over
Á In the view of δˆE : P → [0, 1] as describing the (uncertain) probability
the set of all secondary acts F.
ˆ defines a preference over probabilities.9
P (E) of event E, the preference relation %2 over ∆
Suppose that X and S satisfy the required richness of EUUP, the preference relation %1 on the set
of primary acts F0 satisfies Wakker’s (2010, Theorem 12.3.5) axioms, the preference relation %2 on the
Á satisfies Savage’s (1954) axioms, and that they jointly satisfy Izhakian’s (2014)
set of secondary acts F
axiom. Then, by Izhakian (2014, Theorem 1), there exists a function V : F0 → R such that
f %1 g
⇐⇒
V (f ) ≥ V (g) ,
To maintain non-atomic when P is finite, one can define the probability measure χ on the product σ-algebra Π ⊗ 2J ,
where J is a non-singleton convex set of probability distributions over some auxiliary state space S1 .
9
ˆ associated with the primary indicator
To understand this interpretation, consider the two secondary acts δˆE , δˆF ∈ ∆,
acts δE , δF ∈ F (whose outcomes are the same), and assume a DM who prefers δˆE to δˆF . This means that she prefers
to get the good outcome with the (uncertain) probability P (E) than with the (uncertain) probability P (F ).
8
6
for every f, g ∈ F0 , where
∫
V (f ) =
k
∫−∞
∞
ï
Γ−1
Γ−1
Å∫
P
Å∫
P
k
Ä
ã
ä
ò
Γ δˆ{s∈S | U(f (s))≥z} (P) dχ − 1 dz +
Ä
ä
(1)
ã
Γ δˆ{s∈S | U(f (s))≥z} (P) dχ dz;
U : X → R is strictly increasing continuous bounded functions, normalized such that U (k) = 0; and
Γ : [0, 1] → R is a non-constant bounded function.10 Furthermore, χ is uniquely determined, U is
unique up to a unit, and Γ is unique up to a positive linear transformation.
The Bayesian approach asserts that everything that is not known should be modeled explicitly in a
state space and be subject to a prior probability. The model of Equation (1) applies this approach to
uncertain probabilities. As a result, the function V takes the form of a two-sided Choquet integration
to unfavorable outcomes and to favorable outcomes (relative to the reference point). This functional
representation of the DM’s aggregate preferences makes a complete distinction between beliefs and
tastes and between risk and ambiguity. First-order beliefs are formed by the uncertain probability
measure P; second-order beliefs are formed by the probability measure χ; risk preferences are formed
by the utility function U;11 and ambiguity preferences are formed by the function Γ. The function Γ,
referred to as a outlook function, forms the DM’s attitude toward ambiguity. As with risk attitudes,
there are three types of attitudes toward ambiguity: aversion to ambiguity (formed by a concave Γ),
loving of ambiguity (formed by a convex Γ) and indifference to ambiguity (formed by a linear Γ).
To simplify the functional representation V in Equation (1), secondary acts (δˆE ) can be replaced
by their resulting probabilities to obtain
∫
V (f ) =
k
∫−∞
∞
k
ï
Γ−1
Γ−1
Å∫
ã
P
Å∫
P
ò
Γ (P ({s ∈ S | U (f (s)) ≥ z})) dχ − 1 dz +
(2)
ã
Γ (P ({s ∈ S | U (f (s)) ≥ z})) dχ dz.
This functional representation considers acts taking infinitely many values in an infinite state space.
It is important to note that all the results in this paper can be applied to a discrete representation in
a finite state space with acts taking finitely many values.
3
Preliminaries
To elicit a measure of the degree of ambiguity, the functional representation V has to be further simplified. The key for the additional simplification is the DM’s perceived probabilities. In EUUP, these
10
EUUP stems from the multiple priors paradigm (Gilboa and Schmeidler, 1989) and results in a two sided variation of
CEU (Gilboa, 1987 and Schmeidler, 1989). It combines the concept of nonadditive probabilities with the idea of referencedependent beliefs, which is applied to differentiate between the probability of unfavorable and favorable events. Since the
focus of this paper is ambiguity measurement, as opposed to preferences for ambiguity, it is assumed for simplicity that
the DM has the same preference for ambiguity, formed by Γ, concerning unfavorable and favorable events.
11
As usual, a concave U implies risk aversion, and a convex U implies risk loving.
7
are derived from the nature of the uncertainty about probabilities (ambiguity) and the DM’s preferences concerning this uncertainty. In particular, the concept of EUUP is that perceived probabilities
are formed by the certainty-equivalent probabilities of uncertain probabilities in a Bayesian approach.
Formally, the perceived probability Q(E) of event E ∈ E is defined by12
Q(E) = Γ
−1
Å∫
ã
P
Γ (P (E)) dχ .
(3)
This probability is a function of first-order (uncertain) probabilities, formed by a set P of possible
probability measures over E; a second-order probability measure χ (second-order belief) over P; and
the DM’s preferences concerning ambiguity, applied to probabilities and captured by Γ. Equation (3)
proposes that while making decisions, the DM, who views uncertain probabilities as a set of priors,
aggregates these probabilities in a nonlinear way to form her perceived probabilities.13
To simplify the exposition of perceived probabilities in Equation (3), the perceived probability
Q(E) of an event E ∈ E can be approximated by taking a second-order Taylor approximation with
respect to its first-order probabilities P (E) around its expected probability E [P (E)].14 To this end,
∫
E [P (E)] =
P
P (E) dχ
is defined to be the expected probability of event E; and
Var [P (E)] =
∫ Ä
P
P (E) − E [P (E)]
ä2
dχ
to be the variance of the probability of event E.
Theorem 1. Assume a strictly-increasing, continuous and twice-differentiable Γ satisfying
1
2
(
Γ′′ (E[P(F )])
Γ′ (E[P(F )]) Var [P (F )]
−
Γ′′ (E[P(E∪F )])
Γ′ (E[P(E∪F )]) Var [P (E
)
∪ F )] ≤ E [P (E)] for any events E, F ∈ E. Then,
for a relatively small P (E), the perceived probability of event E is
Q(E) ≈ E [P (E)] +
1 Γ′′ (E [P (E)])
Var [P (E)] .
2 Γ′ (E [P (E)])
Notice that the approximated perceived probabilities satisfy Q(∅) = 0, Q(S) = 1, and set monotonicity with respect to set-inclusion, i.e., Q(E) ≤ Q(F ) if E ⊂ F (by Lemma 2).15 The condition on
Γ bounds the level of ambiguity aversion (the concavity of Γ) and the level of ambiguity loving (the
convexity of Γ) to assure that the approximated perceived probabilities are nonnegative and that the
12
This functional representation is obtained by the value function V of an indicator act, formed in Equation (1), and
by replacing the secondary act with its resulting probabilities; see Izhakian (2014).
13
As a consequence of probabilistic sensitivity, i.e., the nonlinear ways in which individuals may interpret probabilities,
perceived probabilities are nonadditive. That is, the sum of the probabilities can be either smaller or greater than 1.
Ambiguity aversion results in a subadditive probability measure, while ambiguity loving results in a superadditive
measure.
14
The same method was applied by Arrow (1965) and Pratt (1964) to consequences within the expected utility
framework, whereas in this case it is applied to probabilities.
15
In fact, Q is a capacity—a subjective nonadditive probability; see, for example, Schmeidler (1989).
8
probability of an event is not smaller than the probability of any of its sub-events. Henceforth it is
assumed that Γ satisfies this condition. The perceived probabilities, proposed in Theorem 1, provide
a natural way to simplify the functional representation of preferences over acts to a more applicable
form.
Proposition 1. Suppose that the axioms of Izhakian (2014, Theorem 1) and the conditions of Theorem 1 are satisfied. The value of an act f ∈ F0 , formed in Equation (2), can then be written
∫
V (f ) ≈ −
∫
k
−∞
∞
∫
+
−∞
E [P ({s ∈ S | U (f (s)) ≤ z})] dz +
k
∞
E [P ({s ∈ S | U (f (s)) ≥ z})] dz
1 Γ′′ (E [P ({s ∈ S | U (f (s)) ≥ z})])
Var [P ({s ∈ S | U (f (s)) ≥ z})] dz.
2 Γ′ (E [P ({s ∈ S | U (f (s)) ≥ z})])
To simplify the notation, the following conventions are used. Pf (x) stands for the cumulative probability P ({s ∈ S | f (s) ≤ x}), and φf (x) stands for the probability density φ ({s ∈ S | f (s) = x}).16
It is assumed that a density function φf (x) exists and well defined for every x ∈ X . When it is clear
from the context, the subscript f , indicating an act, is omitted. With this notation in place, the next
theorem presents a dual representation of the value function V.
Theorem 2. Suppose that the axioms of Izhakian (2014, Theorem 1) and the conditions of Theorem 1
are satisfied. The dual representation W of the value function V can then be approximated by
∫
Ç
å
Γ′′ (1 − E [Pf (x)])
U (x) E [φf (x)] − ′
W (f ) ≈
E [φf (x)] Var [φf (x)] dx +
Γ (1 − E [Pf (x)])
−∞
Ç
å
∫ ∞
Γ′′ (1 − E [Pf (x)])
U (x) E [φf (x)] + ′
E [φf (x)] Var [φf (x)] dx.
Γ (1 − E [Pf (x)])
k
k
(4)
That is,
f %1 g
⇐⇒
W (f ) ≥ W (g) .
The functional representation provided by this theory eases the use of EUUP theory in general
and the extraction of an ambiguity measure in particular. The following corollary demonstrates it.
Corollary 1. Assume a DM typified by a constant relative ambiguity aversion (CAAA), i.e., Γ (P (E)) =
(P(E))1−η 17
.
1−η
The value function then takes the form
∫
W (f ) ≈
k
∫−∞
∞
Ä
ä
Ä
ä
U (x) E [φf (x)] − ηE [φf (x)] Var [φf (x)] dx +
U (x) E [φf (x)] + ηE [φf (x)] Var [φf (x)] dx.
k
In a discrete representation, when the state space S is finite, φf (x) stands for the probability mass function.
CAAA means that, given a return value, while shifting linearly the range of its possible probabilities, the attitude
toward ambiguity remains unchanged. See Izhakian (2014) for a detailed discussion about the nature of different attitudes
toward ambiguity.
16
17
9
Notice that if the DM is ambiguity neutral or if there is no ambiguity, i.e., the variance of probabilities is zero, the functional representation of the value of an act collapses to the classical expected
utility representation
∫
W (f ) =
∞
−∞
U (x) E [φf (x)] dx.
That is, no disutility occurs.
4
Ordering ambiguous events
A preliminary step in ordering acts by their degree of ambiguity is to define such an order over events.
This ordering is determined by the second order preference relation %2 . In the view of a secondary
act δˆE : P → [0, 1] as describing the (uncertain) probability P (E) of event E, the preference relation
ˆ of δˆ may well be referred to as a preference relation over probabilities. To see this,
%2 over the set ∆
ˆ associated with the primary indicator acts δE , δF ∈ F
consider the two secondary acts δˆE , δˆF ∈ ∆,
(whose outcomes are the same), and assume a DM who prefers δˆE to δˆF . This means that she
prefers to get the good outcome with the (uncertain) probability P (E) than with the (uncertain)
probability P (F ). That is, she prefers P (E) to P (F ). With this notion, an ordering of events by
their degree of ambiguity, induced by the DM’s preferences, can be defined as follows.
Definition 1. Let the uncertain probabilities of events E, F ∈ E have the same expectation, i.e.,
E [P (E)] = E [P (F )]. Event F is more ambiguous than event E if and only if
δˆE %2 δˆF
by any ambiguity-averse DM.
This definition provides a subjective ordering of events that arises from the DM’s preferences concerning ambiguity. Notice that preferences concerning ambiguity, %2 , apply only to the probabilities
of events and not to their consequences. Notice also that, by Izhakian (2014, Theorem 1), in EUUP18
δˆE %2 δˆF
⇐⇒ Γ−1
Å∫
ã
P
Γ (P (E)) dχ ≥ Γ−1
Å∫
ã
P
Γ (P (F )) dχ ,
implying that an objective ordering by the degree of ambiguity can be defined by mean-preserving
spreads in probabilities. Rothschild and Stiglitz (1970) apply the idea of mean-preserving spreads to
outcomes in order to define a ranking by risk, whereas here, this idea is applied to probabilities in
order to define a ranking by ambiguity.
18
This obtained immediately by applying the value function in Equation (1) to indicator acts.
10
Definition 2. Event F ∈ E is more ambiguous than event E ∈ E if there exists a random variable ϵ
such that
P (F ) − E [P (F )] =d P (E) − E [P (E)] + ϵ,
where =d means equal in distribution and E [ϵ | P (E)] = E [ϵ] = 0.19 That is, P (F ) is a meanpreserving spread of P (E). If ϵ is not identically zero, then F is strictly more ambiguous than E.
This definition does not assume that events share an identical expected probability or similar probability distributions. In this definition, one event being more ambiguous than another is a condition on
the deviations of its possible probabilities from the respective expectations of probabilities. It implies
that, given a random variable ϵ with E [ϵ] = 0, an event with the uncertain probability P (E) + ϵ is
more ambiguous than event E with the uncertain probability P (E). In turn, this implies that any
event with a non-constant probability is strictly more ambiguous than an event with its expected
probability. The next proposition ties the subjective ordering of Definition 1 by the DM’s preferences
to the objective ordering of Definition 2, guided by the notion that every ambiguity-averse DM prefers
a less ambiguous event to a more ambiguous one, assuming both have the same consequence and the
same expected probability.
Proposition 2. Suppose E and F are events in E with identical expected probabilities. Then,
P (F ) − E [P (F )] =d P (E) − E [P (E)] + ϵ
⇐⇒
δˆE %2 δˆF
by every ambiguity-averse DM, where E [ϵ | P (E)] = E [ϵ] = 0. That is, definitions 1 and 2 of the more
ambiguous event coincide.
The next step is to define the conditions under which spreads in probabilities can be measured
by the variance of probabilities such that the higher the variance of probabilities, Var [P (E)], the
higher the degree of ambiguity. These conditions apply to the nature of both, the DM’s preference for
ambiguity and to her beliefs. The former refers to cases where the DM’s attitude toward ambiguity
is quadratic or of the CAAA type. The latter refers to cases where the probabilities of events are
uniformly or elliptically distributed, i.e., the second-order probability distribution (probabilities of
probability distributions) is uniform or elliptical.20 The probability P (E) of event E is said to be
The condition E [ϵ | P (E)] = E [ϵ] means that ϵ is mean-independent of the uncertain probability P (E) of event E.
Note that, equality in distribution is a much weaker condition than equality and that mean-independence is less strong
than independence; independence implies mean-independence, but the converse is not true.
20
For applications of elliptically distributed returns to asset pricing theory see, for example, Owen and Rabinovitch (1983).
19
11
elliptically distributed if its probability characteristic function is of the form
Å
ϕP(E) (t) = eitE[P(E)] Ψ
where i =
ã
1 2
t Var [P (E)] ,
2
√
−1 and Ψ is a characteristic generator.21
Theorem 3. Suppose E and F are events in E with identical expected probabilities. Then,
F is more ambiguous than E
⇐⇒
Var [P (F )] ≥ Var [P (E)] ,
when one or more of the following conditions hold:
(i) The probabilities of events E and F are uniformly distributed;
(ii) The probabilities of events E and F are truncated elliptically distributed with an identical characteristic generator;22
(iii) The DM’s attitude toward ambiguity is of the CAAA type;
(iv) The DM’s attitude toward ambiguity is quadratic.
This theorem proposes that, given an event, the greater the spread of its possible probabilities, the
greater its ambiguity. It suggests that, when probabilities are uniformly or elliptically distributed, the
ordering of events by the variance of their probabilities coincides with the ordering of Definitions 1
and 2. Henceforth, it is assumed that the probabilities of all events are either uniformly or elliptically
distributed. If needed, this assumption can be replaced by assuming a CAAA or a quadratic outlook
function, Γ. At this point, the order of events by their degree of ambiguity, measured by Var [P (·)],
is well-defined. Using this order, one can define stochastic dominance with respect to ambiguity by
applying probabilities to outcomes.23
Definition 3. Let f, g ∈ F0 be two primary acts under which the expected probabilities of each consequence x ∈ X are identical. That is, E [φf (x)] = E [φg (x)], for any given x ∈ X . Act f first-order
stochastically (“cumulatively”) dominates act g with respect to ambiguity if and only if
∫
x
−∞
∫
E [φf (z)] Var [φf (z)] dz ≤
x
−∞
E [φg (z)] Var [φg (z)] dz
for any x ∈ X .
The notion of first-order stochastic dominance with respect to ambiguity allows for the definition
of the relation between the objective ordering of acts by stochastic dominance and the subjective
21
Particular forms of the elliptical distribution include: normal distribution, student-t distribution, logistic distribution,
exponential power distribution and laplace distribution.
22
Notice that, since probability values are bounded between 0 and 1, truncated elliptical distributions are considered.
23
Sarin and Wakker (1992) also extend the notion of stochastic dominance to uncertainty with respect to probabilities.
They define “that an act f stochastically (“cumulatively”) dominates an act g if the DM regards each cumulative
consequence set at least as likely under f as under g”.
12
ordering by the DM’s preferences concerning ambiguity. The following theorem settles this relation.
Theorem 4. Suppose f, g ∈ F0 are two primary acts under which the expected probabilities of each
consequence x ∈ X are identical. Act f first-order stochastically dominates act g if and only if
any ambiguity-averse DM weakly prefers f to g, i.e., W (f ) ≥ W (g), under every increasing utility
function U and every twice-differentiable increasing concave outlook function Γ.
The idea of stochastic dominance with respect to ambiguity can be developed further to define
second-order stochastic dominance.
Definition 4. Let f, g ∈ F0 be two primary acts under which the expected probabilities of each consequence x ∈ X are identical. Act f second-order stochastically (“cumulatively”) dominates act g with
respect to ambiguity if and only if
∫
x
∫
z
−∞ −∞
∫
E [φf (t)] Var [φf (t)] dtdz ≤
x
∫
z
−∞ −∞
E [φg (t)] Var [φg (t)] dtdz
for any x ∈ X .
Notice that, similarly to stochastic dominance with respect to risk, first-order stochastic dominance
with respect to ambiguity implies a second-order stochastic dominance. As with first-order stochastic dominance, it can be shown that there is a tight relationship between second-order stochastic
dominance with respect to ambiguity and preferences concerning ambiguity.
5
Ambiguity measurement
The well defined ordering of events by their degree of ambiguity paves the way for defining an ordering
of acts by their degree of ambiguity—a necessary step toward extracting a measure of ambiguity (over
acts). To inspect the impact of ambiguity, this ordering is preformed over acts with identical properties
except for their degree of ambiguity. That is, they have the same set of possible consequences with
the same expected probability (implying the same risk) such that the only difference between them
is the dispersion of probabilities around their expectation. The ordering of acts by their degree of
ambiguity, as revealed from the DM’s subjective choices, can then be defined as follows.
Definition 5. Let f, g ∈ F0 be two primary acts whose expected probabilities of any given consequence
x ∈ X are identical. Act g is more ambiguous than act f if and only if
f
%1 g,
by any ambiguity-averse DM.
13
To validate a measure of ambiguity, it has to be shown that ordering acts by the proposed measure
coincides with the ordering provided by a DM. The next theorem proposes a new measure of ambiguity.
It asserts that the degree of ambiguity associated with an act can be measured by the expected
volatility of its related probabilities.
Theorem 5. Assume an ambiguity-averse DM whose preferences satisfy the conditions of Theorem 2
and whose reference point is k = −∞. Then
f %1 g
⇐⇒
f2 [f ] ≤ f2 [g] ,
where
∫
f2 [f ] = 4
X
E [φf (x)] Var [φf (x)] dx,
f, g ∈ F0 are primary acts under which the expected probabilities of each consequence x ∈ X are
identical, and where the probabilities of each consequence x ∈ X are uniformly or elliptically distributed
(with the same characteristic generator).
This theorem ties the measure of ambiguity, denoted f2 (mho2 ), to preferences concerning ambiguity. The idea that ambiguity—the uncertainty about probabilities—takes the form of probability
perturbations and that aversion to ambiguity takes the form of aversion to mean-preserving spreads
in probabilities underpins the construction of f2 . Thereby, just as the degree of risk can be measured
by the volatility of outcomes, so too can the degree of ambiguity be measured by the volatility of
probabilities. Theorem 5 proves that if two acts are identical except in their degree of ambiguity, then
any ambiguity-averse DM prefers the act with the lower f2 over the act with the higher f2 . That
is, she prefers the act whose associated probabilities are on average less volatile (less spread) over
the act whose associated probabilities are on average more volatile (more spread).24 Therefore, the
measure f2 aggregates the variances of probabilities, which measures the dispersions of probabilities
of each outcome, while assigning the variance of the probability of each outcome a weight relative to
its expected probability.
Theorem 5 assumes that the DM’s reference point is k = −∞, which means that all outcomes are
considered favorable such that all outcomes are assigned with a positive utility. This can be viewed
as if the utility function is normalized such the minimal utility is 0. Note that when the utility is
always positive the Choquet expected utility of Schmeidler (1989) is obtained. This assumption can be
replaced by the assumption that the outcomes of acts are symmetrically distributed. Such a symmetry
in a framework with uncertain probabilities is defined as follows.
24
Jewitt and Mukerji (2011), for example, study the ranking of ambiguous acts as revealed by preferences, based upon
the smooth model of Klibanoff et al. (2005).
14
Definition 6. The outcomes of an act f ∈ F0 are said to be symmetrically distributed around a point
of symmetry k if
E [φf (k − x)] = E [φf (k + x)]
Var [φf (k − x)] = Var [φf (k + x)]
and
for any x ∈ X .
With this definition of symmetry in place, Theorem 5 can be restated. It is important to note
that for measuring ambiguity by the following theorem, the more restrictive assumption of normally
distributed outcomes, which allows measuring risk by variance, can be relaxed to only symmetry.
Theorem 6. Assume an ambiguity-averse DM whose preferences satisfy the conditions of Theorem 2.
Then
f %1 g
⇐⇒
f2 [f ] ≤ f2 [g] ,
where f, g ∈ F0 are primary acts whose outcomes are symmetrically distributed around a reference
point k, with identical expected probabilities of each consequence x ∈ X , and where the probabilities
of each consequence x ∈ X are uniformly or elliptically distributed (with the same characteristic
generator).
The measure of ambiguity f2 carries the unites of squared probabilities. A normalized (to the
units of probability) measure can then be simply defined by
∫
f [f ] = 2
X
E [φf (x)] Var [φf (x)] dx.
The measure f2 in Theorems 5 and 6 considers acts taking infinitely many values in an infinite state
space. This measure, however, can be applied to acts taking finitely many values in a finite state
space. In this case it takes the form
f2 [f ] = 4
∑
E [φf (xi )] Var [φf (xi )] .
i
This measure can also be applied to any nonempty subset Y ⊂ X ⊆ R of consequences
∫
f [f, Y] = 4
2
Y
E [φf (y)] Var [φf (y)] dy,
or to any given event E ∈ E
∫
f [f, E] = 4
2
f −1 (x)∈E
E [φf (x)] Var [φf (x)] dx.
Similarly to risk, stochastic dominance (with respect to ambiguity) is closely related to ambiguity
measurement. The next proposition ties the notion of stochastic dominance with respect to ambiguity
and the measure of ambiguity f2 .
15
Proposition 3. Suppose that the conditions of Theorem 5 or of Theorem 6 are satisfied. Let f, g ∈ F0
be two primary acts under which the expected probabilities of each consequence x ∈ X are identical
and are uniformly or elliptically distributed (with the same characteristic generator). Then, g is more
ambiguous than f , i.e., f2 [g] ≥ f2 [f ], if and only if g is first-order stochastically dominated by f
with respect to ambiguity.
This proposition implies that if for any consequence the cumulative probability uncertainty associated with act f is preferred (by an ambiguity-verse DM) to the cumulative probability uncertainty
associated with act g, then g is more ambiguous than f . It shows that the ordering of acts by firstorder stochastic dominance with respect to ambiguity coincides with their ordering by the degree of
ambiguity, measured by f2 . The next section further investigates the special properties of f2 .
Properties of f2
6
It is worth opening this section with an example that demonstrates some properties of the measure f2 .
Consider, first, a large urn with 30 colored balls which are either black or yellow, in an unknown
proportion. Assume that drawing a black ball (B) entitles the DM to a sum of $0, and a yellow
ball (Y ) entitles her to a sum of $1. The probability of B (an unfavorable event) can be one of the
values
0 1
30
30 , 30 , . . . , 30 ,
where the DM is assumed to act as if each is equally likely. Thus, the degree of
ambiguity (in units of probability) is f = 0.596. Now, consider a smaller urn with only 10 colored
balls which are either black or yellow, in an unknown proportion. The ambiguity associated with a
bet on the color of a ball drawn from this urn is higher than in the large urn; it is f = 0.632. If
there is only one ball in the urn, of unknown color, then f = 1, and in the other extreme case, if
there is an infinite number of balls in the urn, then f =
√1 .
3
Table 2 is a stylized description of these
variations. The perceived probabilities and the value of each alternative are computed, respectively,
by Equations (3) and (2), assuming a DM whose preferences concerning ambiguity are represented
by Γ (P (E)) =
»
P (E), her preferences concerning risk are represented by U (c) = 1 − e−c , where c
stands for consumption, and her reference point is k = 0.25
It can be observed that a larger number of possible probability values, which in this case are
uniformly spread over the interval [0, 1], implies a lower degree of ambiguity. To see the intuition for
this, notice that, since the consequence of the favorable (unfavorable) event is identical for all bets,
when the DM makes her choice over urns, she actually bets on the composition of the urn rather
than on the consequence. Suppose she chooses to bet on the 10-ball urn and that her probability
(proportion of balls) assessment is wrong. The minimal size of her error (in terms of probability) is
25
The utility function U is normalized such that U (k) = 0 , when k = 0.
16
#Balls
Total
Y
B
P
Q
V
f
0.500
0.316
0.000
0.445
0.281
0.577
0.435
0.275
0.596
0.417
0.263
0.632
0.250
0.158
1.000
30
15
15
15
30
∞
0...∞
0...∞
0...1
30
0, . . . , 30
0, . . . , 30
10
0, . . . , 10
0, . . . , 10
1
0, 1
0, 1
0
30 ,
0
10 ,
1
30 , . . . ,
1
10 , . . . ,
0, 1
30
30
10
10
Table 1: Degrees of ambiguity
1
10 .
If, however, she chooses to bet on the 30-ball urn and her probability assessment is wrong, the
minimal size of her error is only
1
30 .
Accordingly, the degree of ambiguity of the 30-ball urn is lower
than the degree of ambiguity of the 10-ball urn. The next observation defines the property that arises
from this intuition formally.
Observation 1. Suppose a finite set of probability measures P such that the probability density φ (x)
of each x ∈ X is uniformly distributed over [ax , bx ]. Then, the higher the cardinality n of P, the lower
the degree of ambiguity f2 [f ] of any f ∈ F0 .
The next observation identifies the range of possible values that the ambiguity measure f2 can
obtain.
Observation 2. The values of the ambiguity measure f2 are always between 0 and 1.
The minimal possible degree of ambiguity, f2 = 0, is attained when all probabilities are perfectly
known. The maximal possible degree of ambiguity, f2 = 1, is attained when there are two possible
outcomes and the probability of each is either 0 or 1 with equal odds. In this most extreme case, the
weighted cumulative variance of probabilities attains its maximal possible value, 14 ; see Observation 2.
Variances of probabilities are therefore normalized by 4 to provide an ambiguity measure ranging
between 0 and 1.
It is important to note that f2 is an objective measure of ambiguity. It does not depend upon
the reference point, k, which determines the sets of unfavorable and favorable events. It also does not
depend upon the DM’s subjective preferences. However, the most important property of f2 is stake
independency. Given an event, its degree of ambiguity, measured by f2 , is invariant to the consequence
of this event. That is, given an event, changing its associated (by an act) consequence does not affect
the degree of ambiguity of this event. Consider, for example, an event with an unknown probability of
winning $100. Changing the magnitude of gain to $1000 does not affect either its perceived probability
17
or its degree of ambiguity. This property of stake independency is of primary importance, as it allows
for the measurement of ambiguity independently of risk.
An interesting property of the measure f2 is that ambiguity may be “canceled out”.26 This
can happen when composing a “portfolio” of acts. To see this, consider the two binary acts f =
Ä
ä
Ä
ä
E : 1, E C : 2 and g = E C : 1, E : 2 , where E C stands for the complementary event of E. Even if
separately each act has a strictly positive degree of ambiguity, a portfolio consisting of only these two
acts has a zero degree of ambiguity (and in this extreme case, also a zero degree of risk). The reason
is that E ∪ E C = S and the probability P (S) of S is always exactly one, which in turn implies that
the degree of ambiguity of the entire state space (measured by 4Var [P (S)]), as well as the degree
of ambiguity of an empty subset of the state space, is zero. In a less extreme scenario, consider two
Ä
ä
Ä
ä
acts f = E C : 0, E : x and g = F C : 0, F : x where E and F are mutually exclusive events. In
this case, the ambiguity associated with the outcome x may be lower under the portfolio {f, g} than
under each act f or act g separately. This may happen when the possible probabilities of events E
and F are negatively correlated. To see this, the ambiguity associated with x under {f, g} can be
written f2 [{f, g} , x] = f2 [{f, g} , E ∪ F ] = 4Var [Pf (E)] + 4Var [Pg (F )] + 8Cov [Pf (E) , Pg (F )] =
f2 [f, E] + f2 [g, F ] + 8Cov [Pf (E) , Pg (F )]. An important conclusion arises from this example is that
unpacking an event E ∪ F into disjoint components E and F with different outcomes increases its
cumulative ambiguity when Cov [P (E) , P (F )] < 0.27 Note that the ambiguity associated with a union
of an event and its complementary event is always zero, since the probability of an event is perfectly
negatively correlated with the probability of its complementary event; see Lemma 1.
The Ellsberg’s three-color experiment can also be viewed as demonstrating the effect that unpacking events has on the degree of ambiguity. In this experiment the DM is presented with an urn. She
is told that the urn contains 90 colored balls, 30 of them red and the others either black or yellow
in an unknown proportion. A ball will be drawn from the urn at random and the prize for a correct
bet is $100. The experiment consists of two parts. In the first part, the DM has to choose between
two bets: the next drawn ball is red (R), or the next drawn ball is black (B), formed respectively by
acts f and g. Then, in the second part, the DM has to choose between betting that the next drawn
ball is red or yellow (RY ) or, alternatively, that the next drawn ball is black or yellow (BY ), formed
respectively by acts f ∗ and g ∗ . The DM in this example does not have any information indicating
which of the possible urn compositions (probabilities) is more likely, and thus she acts as if she assigns
an equal weight to each possibility. The following table formalizes this experiment in terms of acts
26
This notion coincides with Epstein and Zhang’s (2001) and Siniscalchi’s (2009) notion of complementarity.
Support theory, of Tversky and Koehler (1994) and Rottenstreich and Tversky (1997), documents that the judged
probability of an event generally increases when its description is unpacked into disjoint components and decreases by
unpacking its alternative description.
27
18
and summarizes the degree of ambiguity associated with each event and each act. The ambiguity of
the events with the high payoff that are relevant to each act are underlined.
Event f
Prize ($)
Act f
Act
R
Y
B
R
Y
B
RY
BY
f
100
0
0
0.000
0.233
0.233
0.329
0.000
0.000
g
0
0
100
0.000
0.233
0.233
0.329
0.000
0.584
f
∗
100
100
0
0.000
0.233
0.233
0.329
0.000
0.584
g
∗
0
100
100
0.000
0.233
0.233
0.329
0.000
0.000
Table 2: Ellsberg’s three-color experiment
Behavioral experiments have demonstrated that individuals usually prefer R over B and BY over
RY ; formally, f %1 g and g ∗ %1 f ∗ .28 It can be observed from Table 2 that, aligned with Theorem 6,
f [f ] < f [g] and f [g ∗ ] < f [f ∗ ]. This means that DMs usually prefer the less ambiguous bet, implying
an ambiguity-aversion behavior. Table 2 demonstrates that under act g event BY is unpacked into
events B and Y such that the ambiguity associated with act g is higher than that associated with act
f . Under act f ∗ event BY can also be viewed as unpacked such that the ambiguity associated with
f ∗ is higher than that associated with act g ∗ .
The ambiguity measure f2 , extracted in Theorems 5 and 6, measures the degree of ambiguity
at the highest possible accuracy. It measures the volatility of probabilities in the resolution of each
possible outcome separately. Sometimes such a resolution is not required and a simpler measure can
be defined at the expense of accuracy. For example, at the expense of a loss of some information, the
measure of ambiguity can be applied over only the two fundamental events: unfavorable and favorable
events. That is,
f2 [f ] = 4Var [Pf (U F )] = 4Var [Pf (F V )] ,
where Pf (U F ) is the probability of the unfavorable event U F = {s ∈ S | f (s) ≤ k} under act f , and
Pf (F V ) is the probability of the favorable event F V = {s ∈ S | f (s) > k} under act f .
7
Alternative measures of ambiguity
Since the seminal works of Knight (1921) and Ellsberg (1961) several attempts have been made to
define a measure of ambiguity. Hansen et al. (1999), Hansen and Sargent (2001) and Maccheroni et
al. (2006), for example, interpret relative entropy as a measure of ambiguity (or of model uncertainty).
28
In expected utility theory, the DM’s assessments of the likelihoods of R, B and Y can be described by some probability
measure P. The DM is assumed to prefer a greater chance of winning $100 to a smaller chance of winning $100, such
that the choices above imply that P (R) > P (B) and P (B ∪ Y ) > P (R ∪ Y ). However, since R, B and Y are mutually
exclusive events, no such conventional probability measure exists; hence, it is considered a paradox.
19
Relative entropy, also called the Kullback-Leibler distance, is measured by the deviation of a probability distribution from a reference distribution (reference model). Formally, the relative entropy of
probability distribution P with respect to distribution Q is defined by
∫
DKL (P | Q) =
∞
−∞
p (x) ln
p (x)
dx,
q (x)
where p and q are respectively the probability densities of P and Q. While the use of relative entropy
is restricted to cases of a single prior relative to a known true probability distribution, f2 can be
employed in cases of multiple priors where either a single true probability distribution does not exist
or it is not known.
Sometimes the literature takes the variance of variance or the variance of mean as measures of
ambiguity (see, for example, Maccheroni et al. (2013)). The measure f2 is broader than either
of these measures in that it accounts for both, as well as for the variance of all higher moments
of the probability distribution (i.e., skewness, kurtosis, etc.) through the variance of probabilities.
Furthermore, f2 solves some major issues that arise from the exclusive use of either the variance of
variance or the variance of mean as measures of ambiguity.
To illustrate a major drawback of using the variance of mean as a measure of ambiguity, consider
the following two bets: a bet A with the outcomes x = (−1, 0, 1) and, respectively, probabilities
P = (0.5, 0, 0.5), and a bet B with the same outcomes, but with two equally likely possible probability
distributions P1 = (0.4, 0.2, 0.4) and P2 = (0.3, 0.4, 0.3). The expected outcome of bet A is EA [x] = 0.
The expected outcome of bet B is either EB [x | P1 ] = 0 or EB [x | P2 ] = 0, respectively contingent
upon the probability distributions P1 and P2 . Measuring ambiguity by the variance of mean indicates
that both A and B have a zero degree of ambiguity, i.e., both are unambiguous. However, by definition
(and as f2 indicates), B, which has a positive degree of ambiguity, is more ambiguous than A, which
clearly is unambiguous.
The use of the variance of variance as a measure of ambiguity also bears a major drawback.
To illustrate this, one can take the following example: a bet A with the outcomes x = (−1, 0, 1)
and, respectively, probabilities P = (0.48, 0.04, 0.48), and a bet B with the same outcomes but with
two equally likely possible probability distributions P1 = (0.6, 0, 0.4) and P2 = (0.4, 0, 0.6). The
variance of the outcomes of bet A is VarA [x] = 0.96. The variance of the outcomes of bet B is either
VarB [x | P1 ] = 0.96 or VarB [x | P2 ] = 0.96, respectively contingent upon the probability distributions
P1 and P2 . Measuring ambiguity by the variance of variance indicates that both A and B have a zero
degree of ambiguity. While, by definition (and as f2 indicates), B is more ambiguous than A.
Both variance of variance and variance of mean are functions of outcomes, which makes them
stake dependent. As such, neither of these two measures allow for the measurement of the degree
20
of ambiguity in isolation from the degree of risk. Consider, for example, an event with an unknown
probability of winning $100. One would expect that changing the magnitude of gain to $1000 affects
neither the perceived probability of that event nor its degree of ambiguity. This requirement, however,
is not satisfied by the variance of variance or by the variance of mean. Both will indicate that the
bet with the $1000 prize is more ambiguous than the bet with the $100 prize, even though both are
bets on the same event and the change of its associated outcome from $100 to $1000 does not provide
any new information about its likelihood. On the other hand, as a stake-independent measure of
ambiguity, f2 will indicate that both bets have the same degree of ambiguity. The reason is that,
while variance of variance and variance of mean are functions of outcomes, and therefore subject to
risk, f2 is solely a function of probabilities. This means that f2 is not affected by the magnitude
or the sign of consequences. Increasing or decreasing the consequences of an act does not change its
degree of ambiguity, but it does change its degree of risk. Stake independency is a major advantage
of f2 . This property is of primary importance because it allows for the measurement of the degree
of ambiguity independently of risk, as well as for the detection of the implications of ambiguity in
isolation from risk, in empirical and behavioral studies.
The point to emphasize is that a decision-making process considers not only the degree of ambiguity
but also the degree of risk. Hence, when making choices, these two factors jointly play a role. For
example, a consolidated uncertainty measure that aggregates risk and ambiguity can be defined by
√
Υ (x) =
Var [x]
,
1 − f2 [x]
where the variance of outcomes Var [x] is taken using expected probabilities (see, Izhakian (2012)).
Namely, the variance of outcomes is defined by
∫
Var [x] =
Ä
E [φ (x)] x − E [x]
ä2
dx,
and the expected outcome is defined by the double expectation (with respect to probabilities and to
outcomes)
∫
E [x] =
8
E [φ (x)] xdx.
Application for asset pricing
To demonstrate the qualities of f2 , this section presents an application of the theory to asset pricing.
The prices that financial decision makers (investors) are willing to pay for assets could be affected
by the fact that they do not know the precise probabilities of future returns. They might require a
premium for bearing ambiguity in addition to the premium they require for bearing risk.
The risk premium can be viewed as the premium that a DM is willing to pay for exchanging a risky
21
bet for its expected outcome. The ambiguity premium can be viewed as the premium she is willing to
pay for exchanging an ambiguous bet for a risky but unambiguous bet that has an identical expected
outcome.29 The uncertainty premium can be viewed as the total premium that a DM is willing to pay
for exchanging an ambiguous bet for its expected outcome, i.e., the accumulation of the risk premium
and the ambiguity premium. In this view, the uncertainty premium, denoted K, can be defined by
∫
Ç
å
Γ′′ (1 − E [P (x)])
U (E [x] − K) ≈
U (x) E [φ (x)] − ′
E [φ (x)] Var [φ (x)] dx +
Γ (1 − E [P (x)])
−∞
Ç
å
∫ ∞
Γ′′ (1 − E [P (x)])
U (x) E [φ (x)] + ′
E [φ (x)] Var [φ (x)] dx,
Γ (1 − E [P (x)])
k
k
(5)
where x is the outcome of some act f and φ (x) is its probability (under act f ); and C = E [x] − K
is the certainty equivalent satisfying C ∼1 f . That is, C is the constant sure outcome for which
the DM is willing to exchange a risky and ambiguous (uncertain) outcome of act f . The next theorem approximates the uncertainty premium and separates it into a risk premium and an ambiguity
premium.30
Theorem 7. Assume a DM whose preferences are characterized by a twice-differentiable utility function U and a twice-differentiable outlook function Γ. For relatively small outcomes with relatively small
probabilities the uncertainty premium is
ñ
ô
ó
1 U′′ (E [x])
Γ′′ (1 − E [P (x)]) î
K ≈ −
Var [x] − E
E |x − E [x]| f2 [x] ,
′
′
2 U (E [x])
Γ (1 − E [P (x)])
(6)
where the former is the risk premium and the latter is the ambiguity premium.31
Concerning financial decisions, consequences can be described by rates of return, denoted r. Assume a DM who decides to save one unit of wealth and invest it in a uncertain (risky and ambiguous)
portfolio. The uncertainty premium in this case takes the following form.32
Corollary 2. Suppose that the conditions of Theorem 7 hold. The uncertainty premium, in terms of
rate of return, takes the form
ñ
ô
ó
1 U′′ (1 + E [r])
Γ′′ (1 − E [P (r)]) î
K ≈ −
Var
[r]
−
E
E |r − E [r]| f2 [r] .
′
′
2 U (1 + E [r])
Γ (1 − E [P (r)])
(7)
This model (and Theorem 7) provides two distinctions. First, it distinguishes between risk and
ambiguity premiums such that these two premiums are orthogonal. Second, within each premium it
29
The ambiguity premium can also be viewed as the price that a DM is willing to pay for the information about the
true probabilities of events.
30
The proof of this theorem applies the same methodology to probabilities as used by Arrow (1965) and Pratt (1964)
for consequences.
î ′′
ó ∫
[
] ∫
′′
(1−E[P(x)])
(1−E[P(x)])
31
Formally, E ΓΓ′ (1−E[P(x)])
= X E [φ (x)] ΓΓ′ (1−E[P(x)])
dx and E |x − E [x]| = X E [φ (x)] |x − E [x]| dx.
32
This representation is obtained by applying the same development of Theorem 7, where x = 1 + r.
22
distinguishes between the sources of premiums—preferences and beliefs. The risk premium,
R≈−
1 U′′ (1 + E [r])
Var [r] ,
2 U′ (1 + E [r])
is the Arrow-Pratt risk premium. Independently, a higher risk, measured by Var [r], or a higher
′′
(·)
aversion to risk, measured by the coefficient of absolute risk aversion − UU′ (·)
, result in a greater risk
premium.
The ambiguity premium,
ñ
ô
ó
Γ′′ (1 − E [P (r)]) î
A ≈ −E
E |r − E [r]| f2 [r] ,
′
Γ (1 − E [P (r)])
possesses attributes resembling those of the risk premium, but with respect to probabilities rather
than to consequences. A complete separation between ambiguity, measured by f2 , and tastes for
′′
(·)
ambiguity, measured by the coefficient of absolute ambiguity aversion − ΓΓ′ (·)
, is achieved. Ambiguity
′′
′′
(·)
(·)
aversion (− ΓΓ′ (·)
> 0) implies a positive ambiguity premium. Ambiguity loving (− ΓΓ′ (·)
< 0) implies
′′
(·)
a negative premium. Ambiguity neutrality (− ΓΓ′ (·)
= 0) implies a zero premium, obtained also when
probabilities are perfectly known (i.e., when f2 = 0). Higher degree of ambiguity or a higher aversion
to ambiguity result in a greater ambiguity premium. The ambiguity premium is also a function of
the expected absolute deviation of outcomes from expectation. This component scales the ambiguity
premium to the units of outcomes. For example, one may consider the case of measuring the ambiguity
premium in terms of dollars versus in terms of percentage rate of return.
The next corollary shows the different premiums in the case of a DM typified by constant relative
risk aversion (CRRA) and CAAA.
Corollary 3. Suppose
 that the conditions of Theorem 7 hold, and assume a DM who is characterized


by CRRA, U (c) =
c1−γ −k1−γ
,
1−γ
γ ̸= 1

 ln (c) − ln (k) , γ = 1
, and CAAA, Γ (P (E)) = − e
−ηP(E)
η
.33 The uncertainty
premium is then
î
ó
1
K ≈ γ Var [r] + ηE |r − E [r]| f2 [r] .
2
Several studies have documented ambiguity-averse behavior concerning gains (favorable events)
and ambiguity-loving behavior concerning losses (unfavorable events); see, for example, Maffioletti
and Michele (2005), Abdellaoui et al. (2005), and Du and Budescu (2005). The ambiguity premium,
constructed in Corollary 2, can be refined to support different ambiguity preferences concerning un33
A more standard formulation of CRRA, U (c) =
normalized to U (k) = 0.
c1−γ
1−γ
for γ ̸= 1 and otherwise for γ = 1 U (c) = ln (c), is not always
23
favorable and favorable events. Allowing this flexibility, the ambiguity premium takes the form
ñ∫
A ≈ −
k
−∞
E [φ (r)]
Γ′′U F (1 − E [P (r)])
dr +
Γ′U F (1 − E [P (r)])
∫
ô
∞
E [φ (r)]
k
î
ó
Γ′′F V (1 − E [P (r)])
dr E |r − E [r]| f2 [r] ,
′
ΓF V (1 − E [P (r)])
where ΓU F (·) captures ambiguity preferences concerning unfavorable events and ΓF V (·) captures
ambiguity preferences concerning favorable events.
The implications of ambiguity for the equity premium have been studied mainly by focusing on
theoretical aspects. Chen and Epstein (2002), Izhakian and Benninga (2011), Ui (2011), and Maccheroni et al. (2013) add an ambiguity premium to the conventional risk premium.34 In these models
the ambiguity premium is also a function of risk attitude; whereas in the model of Equation (7), the
ambiguity premium is independent of risk attitude.
The pricing model of Equation (7) has been tested empirically by Brenner and Izhakian (2011).
This study of the risk–ambiguity–return relationship employs the measure of ambiguity, f2 , as an
explanatory factor of the aggregate return on the stock market. To do so, it assumes that each
subset of stock returns is generated by the choices of a single representative DM conditional upon a
different prior P within her subjective set of priors P.35 The probability distribution of returns in
each subset is then estimated to reveal the set of priors. Assuming some structure on second-order
beliefs, Brenner and Izhakian (2011) compute f2 from the data and investigate its effect on stock
market returns. They find that ambiguity has a significant impact on expected returns. Their study
provides a possible explanation for the equity premium puzzle, demonstrating that f2 can be useful in
empirical studies of the implications of ambiguity.
9
Conclusion
Almost any real-life decision entails ambiguity. Naturally, one of the first steps of a decision-making
process is to rank alternative choices by their degree of ambiguity. The key to addressing this need
is a simple well-defined measure of ambiguity. The search for such a measure that can quantify the
degree of ambiguity associated with different alternatives can be viewed as having started with the
seminal study of Knight (1921). The measure of ambiguity introduced in this paper aims to address
this need. Ambiguity in this paper takes the form of probability perturbation (uncertain probabilities)
and aversion to ambiguity the form of aversion to mean-preserving spreads in these probabilities. In
this view, just as the degree of risk can be measured by the volatility of outcomes, so too can the
degree of ambiguity be measured by the volatility of probabilities. This concept provides a natural
objective stake-independent ambiguity measure, denoted f2 , which is simply four times the expected
34
Segal and Spivak (1990) also analyze the ambiguity premium, which they call a premium of order 2.
A representative DM can be defined as an artificial DM whose tastes and beliefs are such that if all investors in the
economy had tastes and beliefs identical to hers the equilibrium in the economy remains unchanged; see, for example,
Constantinides (1982).
35
24
volatility of probabilities across the relevant events.
The measure of ambiguity f2 has two main qualities. First, it is simple, applicable and can be
used for the empirical measurement of the degree of ambiguity. Second, it is an objective stakeindependent measure. That is, it is independent of risk and independent of individuals’ preferences.
These qualities are of primary importance for introducing ambiguity into theoretical, behavioral and,
especially, empirical studies. The importance of ambiguity—the uncertainty about probabilities—
for understanding economic and financial decision processes has being recognized in the literature
for the past half century. Relevant studies have acknowledged that attempts to portrait a realistic
picture of observable phenomena and anomalies should consider also the dimension of uncertainty with
respect to probabilities. Accounting for ambiguity might shed light on many economic and financial
phenomena that previously could not be fully explained. The measure of ambiguity introduced in
this paper can be employed for this mission. For example, it can be employed for investigating the
nature of the risk-ambiguity relationship and its implication for optimal decision making. Hopefully,
this measure will pave the way not only for the introduction of ambiguity into empirical studies, but
also for the expansion of theoretical and behavioural studies regarding the nature of ambiguity and
related preferences.
25
References
Abdellaoui, M., F. Vossmann, and M. Weber (2005) “Choice-Based Elicitation and Decomposition of Decision
Weights for Gains and Losses Under Uncertainty.,” Management Science, Vol. 51, No. 9, pp. 1384–1399.
Anderson, E. W., E. Ghysels, and J. L. Juergens (2009) “The Impact of Risk and Uncertainty on Expected
Returns,” Journal of Financial Economics, Vol. 94, No. 2, pp. 233–263.
Arrow, K. J. (1965) Aspects of the Theory of Risk Bearing, Helsinki: Yrjo Jahnssonin Saatio.
Bewley, T. F. (2011) “Knightian Decision Theory and Econometric Inferences,” Journal of Economic Theory,
Vol. 146, No. 3, pp. 1134–1147.
Bollerslev, T., R. F. Engle, and J. M. Wooldridge (1988) “A Capital Asset Pricing Model with Time-Varying
Covariances,” Journal of Political Economy, Vol. 96, No. 1, pp. 116–131.
Bollerslev, T., N. Sizova, and G. Tauchen (2011) “Volatility in Equilibrium: Asymmetries and Dynamic Dependencies,” Review of Finance, Vol. 16, No. 1, pp. 31–80.
Boyle, P. P., L. Garlappi, R. Uppal, and T. Wang (2011) “Keynes Meets Markowitz: The Tradeoff Between
Familiarity and Diversification,” Management Science, Vol. 58, pp. 1–20.
Brenner, M. and Y. Izhakian (2011) “Asset Prices and Ambiguity: Empirical Evidence,” Stern School of
Business, Finance Working Paper Series, FIN-11-010.
(2012) “Pricing Systematic Ambiguity in Capital Markets,” Stern School of Business, Finance Working
Paper Series, FIN-12-008.
Chen, Z. and L. Epstein (2002) “Ambiguity, Risk, and Asset Returns in Continuous Time,” Econometrica, Vol.
70, No. 4, pp. 1403–1443.
Constantinides, G. M. (1982) “Intertemporal Asset Pricing with Heterogeneous Consumers and without Demand
Aggregation,” The Journal of Business, Vol. 55, No. 2, pp. 253–67.
Coval, J. D. and T. J. Moskowitz (1999) “Home Bias at Home: Local Equity Preference in Domestic Portfolios,”
The Journal of Finance, Vol. 54, No. 6, pp. 2045–2073.
Dow, J. and S. R. d. C. Werlang (1992) “Uncertainty Aversion, Risk Aversion, and the Optimal Choice of
Portfolio,” Econometrica, Vol. 60, No. 1, pp. 197–204.
Du, N. and D. V. Budescu (2005) “The Effects of Imprecise Probabilities and Outcomes in Evaluating Investment
Options,” Management Science, Vol. 51, No. 12, pp. 1791–1803.
Ellsberg, D. (1961) “Risk, Ambiguity, and the Savage Axioms,” Quarterly Journal of Economics, Vol. 75, No.
4, pp. 643–669.
Epstein, L. G. and S. Ji (2013) “Ambiguous Volatility and Asset Pricing in Continuous Time,” Review of
Financial Studies, Vol. 26, No. 7, pp. 1740–1786.
Epstein, L. G. and M. Schneider (2008) “Ambiguity, Information Quality, and Asset Pricing,” The Journal of
Finance, Vol. 63, No. 1, pp. 197–228.
Epstein, L. G. and J. Zhang (2001) “Subjective Probabilities on Subjectively Unambiguous Events,” Econometrica, Vol. 69, No. 2, pp. 265–306.
Fern´andez-Villaverde, J., P. Guerr´on-Quintana, J. F. Rubio-Ram´ırez, and M. Uribe (2010) “Risk Matters: The
Real Effects of Volatility Shocks,” American Economic Review, Vol. 101, pp. 2530–2561.
Gilboa, I. (1987) “Expected Utility with Purely Subjective Non-Additive Probabilities,” Journal of Mathematical
Economics, Vol. 16, No. 1, pp. 65–88.
Gilboa, I. and D. Schmeidler (1989) “Maxmin Expected Utility with Non-Unique Prior,” Journal of Mathematical Economics, Vol. 18, No. 2, pp. 141–153.
Goetzmann, W. N. and A. Kumar (2008) “Equity Portfolio Diversification,” Review of Finance, Vol. 12, No. 3,
pp. 433–463.
Goldberger, A. (1991) A Course in Econometrics: Harvard University Press, 1st edition.
26
Hansen, L. P. and T. J. Sargent (2001) “Robust Control and Model Uncertainty,” American Economic Review,
Vol. 91, No. 2, pp. 60–66.
Hansen, L. P., T. J. Sargent, and T. D. Tallarini (1999) “Robust Permanent Income and Pricing,” The Review
of Economic Studies, Vol. 66, No. 4, pp. 873–907.
Izhakian, Y. (2012) “Capital Asset Pricing under Ambiguity,” Stern School of Business, Economics Working
Paper Series, ECN-12-02.
(2014) “Expected Utility with Uncertain Probabilies Theory,” SSRN eLibrary, 2017944.
Izhakian, Y. and S. Benninga (2011) “The Uncertainty Premium in an Ambiguous Economy,” The Quarterly
Journal of Finance, Vol. 1, pp. 323–354.
Jewitt, I. and S. Mukerji (2011) “Ordering Ambiguous Acts,” University of Oxford, Department of Economics,
Economics Series Working Papers.
Ju, N. and J. Miao (2012) “Ambiguity, Learning, and Asset Returns,” Econometrica, Vol. 80, pp. 559–591.
Klibanoff, P., M. Marinacci, and S. Mukerji (2005) “A Smooth Model of Decision Making under Ambiguity,”
Econometrica, Vol. 73, No. 6, pp. 1849–1892.
Knight, F. M. (1921) Risk, Uncertainty and Profit, Boston: Houghton Mifflin.
Maccheroni, F., M. Marinacci, and D. Ruffino (2013) “Alpha as Ambiguity: Robust Mean-Variance Portfolio
Analysis,” Econometrica, Vol. 81, pp. 1075–1113.
Maccheroni, F., M. Marinacci, and A. Rustichini (2006) “Ambiguity Aversion, Robustness, and the Variational
Representation of Preferences,” Econometrica, Vol. 74, No. 6, pp. 1447–1498.
Maffioletti, A. and M. Santoni (2005) “Do Trade Union Leaders Violate Subjective Expected Utility? Some
Insights From Experimental Data,” Theory and Decision, Vol. 59, No. 3, pp. 207–253.
Mehra, R. and E. C. Prescott (1985) “The Equity Premium: A Puzzle,” Journal of Monetary Economics, Vol.
15, No. 2, pp. 145–161.
Owen, J. and R. Rabinovitch (1983) “On the Class of Elliptical Distributions and Their Applications to the
Theory of Portfolio Choice,” The Journal of Finance, Vol. 38, No. 3, pp. 745–52.
Pratt, J. W. (1964) “Risk Aversion in the Small and in the Large,” Econometrica, Vol. 32, No. 1/2, pp. 122–136.
Rothschild, M. and J. E. Stiglitz (1970) “Increasing Risk: I. A Definition,” Journal of Economic Theory, Vol.
2, No. 3, pp. 225–243.
Rottenstreich, Y. and A. Tversky (1997) “Unpacking, Repacking, and Anchoring: Advances in Support Theory,”
Psychological Review, Vol. 104, No. 2, pp. 406–415.
Sarin, R. K. and P. P. Wakker (1992) “A Simple Axiomatization of Nonadditive Expected Utility,” Econometrica,
Vol. 60, No. 6, pp. 1255–1272.
Savage, L. J. (1954) The Foundations of Statistics, New York, USA: Wiley.
Schmeidler, D. (1989) “Subjective Probability and Expected Utility without Additivity,” Econometrica, Vol.
57, No. 3, pp. 571–587.
Segal, U. and A. Spivak (1990) “First Order Versus Second Order Risk Aversion,” Journal of Economic Theory,
Vol. 51, No. 1, pp. 111–125.
Shiller, R. J. (1981) “Do Stock Prices Move Too Much to be Justified by Subsequent Changes in Dividends?”
American Economic Review, Vol. 71, No. 3, pp. 421–436.
Siniscalchi, M. (2009) “Vector Expected Utility and Attitudes Toward Variation,” Econometrica, Vol. 77, No.
3, pp. 801–855.
Tversky, A. and D. Kahneman (1992) “Advances in Prospect Theory: Cumulative Representation of Uncertainty,” Journal of Risk and Uncertainty, Vol. 5, No. 4, pp. 297–323.
Tversky, A. and D. J. Koehler (1994) “Support Theory: A Nonextensional Representation of Subjective Probability,” Psychological Review, Vol. 101, pp. 547–567.
27
Ui, T. (2011) “The Ambiguity Premium vs. the Risk Premium under Limited Market Participation,” Review
of Finance, Vol. 15, No. 2, pp. 245–275.
Uppal, R. and T. Wang (2003) “Model Misspecification and Under Diversification,” The Journal of Finance,
Vol. 58, No. 1, pp. 2465–2486.
Wakker, P. and A. Tversky (1993) “An Axiomatization of Cumulative Prospect Theory,” Journal of Risk and
Uncertainty, Vol. 7, No. 2, pp. 147–175.
Wakker, P. (2010) Prospect Theory: For Risk and Ambiguity: Cambridge University Press.
Weil, P. (1989) “The Equity Premium Puzzle and The Risk-Free Rate Puzzle,” Journal of Monetary Economics,
Vol. 24, No. 3, pp. 401–421.
28
Appendix
Lemma 1. The covariance between the probability of event E and the probability of its complementary
event E C satisfies
î
Ä
Cov P (E) , P E C
î
Ä
where Cov P (E) , P E C
äó
=
∫ Ä
P
äó
î Ä
= −Var [P (E)] = −Var P E C
P (E) − E [P (E)]
äÄ Ä
ä
äó
î Ä
P EC − E P EC
,
äó ä
dχ is the covariance of
the probabilities of events E and E C .
Lemma 2. Assume a twice-differentiable outlook function Γ, satisfying
1
2
Ç
å
Γ′′ (E [P (F )])
Γ′′ (E [P (E ∪ F )])
Var
[P
(F
)]
−
Var [P (E ∪ F )]
Γ′ (E [P (F )])
Γ′ (E [P (E ∪ F )])
≤ E [P (E)]
for any events E, F ⊆ S. Then
Q(F ) ≤ Q(E ∪ F ).
Lemma 3.
d
dφ (x)
∫
x
−∞
φ (z) dz =
φ (x)
φ′ (x)
ˆ whose resulting probabilities are uniformly disLemma 4. Assume two secondary acts δˆE , δˆF ∈ ∆,
tributed or elliptically distributed with an identical characteristic generator, and have an identical
expectation, i.e., E [P (E)] = E [P (F )]. Let Std [P (E)] and Std [P (F )] be, respectively, the standard
deviations of their resulting probabilities. Then
P (F ) − E [P (F )] =d λ (P (E) − E [P (E)]) ,
where λ =
Std[P(F )]
Std[P(E)] .
Lemma 5. Let Z and ϵ be two random variables. If ϵ is mean-independent of z, then
E [Zϵ] = E [Z] E [ϵ] .
Lemma 6. Let Y and X be two random variables. If Y is mean-independent of X, then Y is also
mean-independent of Z = h(X), where h : R → R.
Lemma 7. The following mean-independencies hold:
(i) (φ (x) − E [φ (x)])2 is mean-independent of φ (x), implying that
î
ó
î
ó
E φ (x) (φ (x) − E [φ (x)])2 = E [φ (x)] E (φ (x) − E [φ (x)])2 ;
(ii) Var [φ (x)] is mean-independent of x, implying that E [xVar [φ (x)]] = E [x] E [Var [φ (x)]];
(iii) Var [φ (x)] is mean-independent of P (x), implying that
29
E [P (x) Var [φ (x)]] = E [P (x)] E [Var [φ (x)]];
(iv) |x − E [x]| is mean-independent of E [P (x)], implying that
î
ó
î
ó î
ó
E E [P (x)] |x − E [x]| = E E [P (x)] E |x − E [x]| .
Lemma 8. If Y is mean-independent of X, and φY , φX , φY,X exist, then
∫
∫
k
∫
k
−∞ −∞
φY,X (y, x) yxdydx =
∫
k
φY (y) ydy
−∞
k
−∞
φX (x) xdx
and
∫
∞∫ ∞
∫
∫
∞
φY,X (y, x) yxdydx =
k
∞
φY (y) ydy
k
k
φX (x) xdx,
k
for any k ∈ R.
Ä
Ä
ä
Since P (E) is additive, P E C = 1 − P (E). Thus, the covariance between
Proof of Lemma 1.
ä
P (E) and P E C can be written
î
Ä
Cov P (E) , P E
C
∫
äó
=
∫P
=
P
Ä Ä
ä
î Ä
(P (E) − E [P (E)]) P E C − E P E C
äóä
dχ
(P (E) − E [P (E)]) (E [P (E)] − P (E)) dχ,
and therefore
î
Ä
Cov P (E) , P E C
äó
= −Var [P (E)] .
The second equality is obtained by
∫
Var [P (E)] =
P
(P (E) − E [P (E)])2 dχ =
Proof of Lemma 2.
∫ Ä Ä
ä
î Ä
P EC − E P EC
P
äóä2
î Ä
dχ = Var P E C
äó
.
By Theorem 1,
1 Γ′′ (E [P (E ∪ F )])
Var [P (E ∪ F )] −
2 Γ′ (E [P (E ∪ F )])
1 Γ′′ (E [P (F )])
E [P (F )] −
Var [P (F )]
2 Γ′ (E [P (F )])
1 Γ′′ (E [P (E ∪ F )])
1 Γ′′ (E [P (F )])
= E [P (E)] +
Var
[P
(E
∪
F
)]
−
Var [P (F )] ,
2 Γ′ (E [P (E ∪ F )])
2 Γ′ (E [P (F )])
Q(E ∪ F ) − Q(F ) ≈ E [P (E)] + E [P (F )] +
which is nonnegative by the Lemma’s hypothesis.
Proof of Lemma 3.
∫
Let u = φ (z), then changing the integration variable provides
∫
x
−∞
φ(x)
φ (z) dz =
φ(−∞)
u
du =
′
φ (z)
∫
φ(x)
u
φ(−∞)
φ′ (φ−1 (u))
Differentiating with respect to φ(x) gives
d
dφ(x)
∫
φ(x)
φ(−∞)
du.
φ(x)
= ′
du
=
.
′
−1
′
−1
φ (φ (u))
φ (φ (u)) u=φ(x) φ (x)
u
u
30
Proof of Lemma 4.
Let y = P (E) − E [P (E)] and z = P (F ) − E [P (F )], and assume that event
F is more ambiguous than event E. To show that z =d λy it has to be proved that λy and z have an
identical probability characteristic function.
Consider, first, the case of uniformly distributed y and z. Since E [y] = 0 and E [z] = 0, then
y ∈ [−ay , ay ] and z ∈ [−az , az ], where ay and az are nonnegative. The characteristic function of z and
λy are, respectively,
ϕz (t) =
eitaz − e−itaz
2itaz
and
ϕλy (t) =
eitλay − e−itλay
.
2itλay
Since E [z] = E [y] = 0 and y and z are uniformly distributed, one can write their standard deviations
Ã
Std [z]
λ=
Std [y]
=
(2az )2 /12
(2ay )2 /12
to show that az = λay . This implies that ϕz (t) = ϕλy (t), and therefore z =d λy.
Consider now the case of elliptically distributed z and λy. That is,
z ∼ el (E [z] , Var [z] , Ψ)
Ä
ä
λy ∼ el λE [y] , λ2 Var [y] , Ψ .
and
Therefore, the characteristic function of z and λy are respectively
Å
ϕz (t) = eitE[z] Ψ
ã
1 2
t Var [z]
2
Å
ϕλy (t) = eitλE[y] Ψ
and
ã
1 2 2
t λ Var [y] .
2
Since ϕz and ϕλy have an identical characteristic generator Ψ, E [z] = λE [y] = 0 and Std [z] = λStd [y],
then ϕz = ϕλy , which implies z =d λy.
Proof of Lemma 5.
The expectation E [Zϵ] of Zϵ over the joint distribution of Z and ϵ can be
taken first over the distribution of ϵ conditional upon Z, and then over the marginal distribution of
Z. That is,
E [Zϵ] = E [E [Zϵ|Z]] .
Then, Z can be passed out of the inner expectation, implying that
E [Zϵ] = E [ZE [ϵ|Z]] .
[ ]
By mean-independence E ϵZ = E [ϵ]. Therefore,
E [Zϵ] = E [Z] E [ϵ] .
Proof of Lemma 6.
See, Goldberger (1991) page 61, M1.
31
Proof of Lemma 7.
(i) One can write
φ (x) − E [φ (x)] =d E [φ (x)] − E [φ (x)] + ϵ.
Since E [ϵ|φ (x)] = E [ϵ] = 0, clearly ϵ and φ (x) are mean-independent. Let Var [φ (x)] = σ 2 , then
[
]
[ ]
ó
î
by construction E ϵ2 |φ (x) = σ 2 = E ϵ2 . That is, ϵ2 is mean independent of φ (x), implying that
î
ó
E φ (x) (φ (x) − E [φ (x)])2 = E [φ (x)] E (φ (x) − E [φ (x)])2 .
(ii) Writing the conditional expectation of Var [φ (x)] explicitly, provides
ó
î î
ó
E [Var [φ (x)] |x] = E E (φ (x) − E [φ (x)])2 |x .
By the law of iterated expectation36
î î
ó
ó
E E (φ (x) − E [φ (x)])2 |x
î î
óó
= E E (φ (x) − E [φ (x)])2 |x .
By (i), (φ (x) − E [φ (x)])2 is mean-independent of φ (x). By Lemma 6, (φ (x) − E [φ (x)])2 is also
mean-independent of φ−1 (φ (x)) = x. Therefore,
î î
E E (φ (x) − E [φ (x)])2 |x
óó
î î
= E E (φ (x) − E [φ (x)])2
óó
,
implying that E [xVar [φ (x)]] = E [x] E [Var [φ (x)]];
(iii) By (ii), Var [φ (x)] is mean-independent of x. By Lemma 6, Var [φ (x)] is also mean-independent
of the function P (x) of x. Therefore, E [P (x) Var [φ (x)]] = E [P (x)] E [Var [φ (x)]].
(iv) Write
x − E [x] =d E [x] − E [x] + ϵ.
Since E [ϵ|x] = E [ϵ] = 0, clearly ϵ is mean-independent x. By Lemma 6, ϵ is also mean-independent
E [P (x)]. Therefore, E [ϵ|E [P (x)]] = E [ϵ]. Mean-independent implies uncorrelatedness; see, for example, Goldberger (1991), page 63, M2. Therefore, E [ϵE [P (x)]] = E [ϵ] E [E [P (x)]]. Since E [P (x)] ≥ 0
for every x, E [ϵE [P (x)]] = E [|ϵ| E [P (x)]] = E [|ϵ|] E [E [P (x)]]. Substituting for ϵ = x − E [x]
completes the proof.
Proof of Lemma 8.
Define Z = h(X) such that z = x if x ≤ k and otherwise z = 0. Since
Y is mean-independent of X, by Lemma 6, it is also mean-independent of Z. Therefore, E [Y Z] =
E [Y ] E [Z] . Writing the expectation explicitly, provides
∫
∞
∫
−∞ −∞
36
∫
∞
φY,Z (y, z) yzdydz =
∫
k
−∞
φY (y) ydy
See, for example, Goldberger (1991) page 47, T8.
32
∫
k
−∞
∫
∞
φY (y) ydy
φZ (z) zdz +
k
k
−∞
φZ (z) zdz,
which implies
∫
∫
k
∫
k
−∞ −∞
φY,X (y, x) yxdydx =
∫
k
−∞
φY (y) ydy
k
−∞
φX (x) xdx.
The second part is proved similarly.
Proof of Theorem 1.
The perceived probability, Q(E), of event E ∈ E can be written
Q(E) = Γ
−1
(Γ (E [P (E)] − Λ)) = Γ
−1
Å∫
ã
P
Γ (P (E)) dχ ,
(8)
for some Λ ∈ R. Taking the first-order Taylor approximation of Γ (E [P (E)] − Λ) around E [P (E)]
yields
Γ (E [P (E)] − Λ) ≈ Γ (E [P (E)]) − ΛΓ′ (E [P (E)]) .
(9)
The second-order Taylor approximation of Γ (P (E)) in Equation (8) around E [P (E)] is
Γ (P (E)) ≈ Γ (E [P (E)]) + Γ′ (E [P (E)]) (P (E) − E [P (E)])
1
+ Γ′′ (E [P (E)]) (P (E) − E [P (E)])2 .
2
(10)
Since Γ (E [P (E)]), Γ′ (E [P (E)]) and Γ′′ (E [P (E)]) are constants, the expectation of Equation (10) is
∫
1
Γ (P (E)) dχ ≈ Γ (E [P (E)]) + Γ′′ (E [P (E)]) Var [P (E)] .
2
P
(11)
Equating (9) to (11) and organizing terms yields
Λ ≈ −
1 Γ′′ (E [P (E)])
Var [P (E)] .
2 Γ′ (E [P (E)])
Substituting Λ into Equation (8), together with Lemma 2 (that assures nonnegativity), proves the
theorem.
Proof of Theorem 2.
By Wakker and Tversky (1993, Equation 6.1), the dual representation of
Equation (2) can be written
∫
W (f ) = −
∫−∞
∞
+
ï
k
U (x) d Γ
−1
Å∫
Å∫ P
ï
U (x) d Γ−1
P
k
ã
ò
Γ (1 − Pf (x)) dχ − 1
(12)
ãò
Γ (1 − Pf (x)) dχ
.
The subscript f can be omitted to write
ñ
Γ′ (1 − P (x)) φ (x)
d −1
Γ (E [Γ (1 − P (x))]) = E − ′ −1
dx
Γ (Γ (E [Γ (1 − P (x))]))
and to denote
D (φ (x)) = −
Γ′ (1 − P (x)) φ (x)
.
Γ′ (Γ−1 (E [Γ (1 − P (x))]))
33
ô
By Lemma 3, differentiating D with respect to φ (x) provides
Γ′′ (1 − P (x)) φ2 (x) − Γ′ (1 − P (x))
−
Γ′ (Γ−1 (E [Γ (1 − P (x))]))
(
)
Γ′ (1 − P (x)) φ (x) Γ′′ Γ−1 (E [Γ (1 − P (x))]) E [Γ′ (1 − P (x)) φ (x)]
.
(Γ′ (Γ−1 (E [Γ (1 − P (x))])))3
d
D (φ (x)) =
dφ (x)
∫
Notice that, since φ (z) is additive,
x
−∞
E [φ (z)] dz = E [P (x)]. Taking the first-order Taylor approx-
imation of D with respect to φ (x) around E [φ (x)] provides
ñ
ô
d
E [D (φ (x))] ≈ D (E [φ (x)]) + E
D (E [φ (x)]) (φ (x) − E [φ (x)])
dφ (x)
ó
Γ′′ (1 − E [P (x)]) î
= −E [φ (x)] + ′
E φ (x) (φ (x) − E [φ (x)])2 .
Γ (1 − E [P (x)])
î
ó
By Lemma 7, (φ (x) − E [φ (x)])2 is mean-independent of φ (x), which implies E φ (x) (φ (x) − E [φ (x)])2 =
ó
î
E [φ (x)] E (φ (x) − E [φ (x)])2 . Therefore,
E [D (φ (x))] ≈ −E [φ (x)] +
Γ′′ (1 − E [P (x)])
E [φ (x)] Var [φ (x)] .
Γ′ (1 − E [P (x)])
Substituting for E [D (φ (x))] in Equation (12), while accounting for the sign switch of U (x) E [φ (x)]
when moving from negative to positive utility across k (see Wakker and Tversky (1993)), provides
∫
å
Ç
Γ′′ (1 − E [P (x)])
E [φ (x)] Var [φ (x)] dx +
W (f ) ≈
U (x) E [φ (x)] − ′
Γ (1 − E [P (x)])
−∞
Ç
å
∫ ∞
Γ′′ (1 − E [P (x)])
U (x) E [φ (x)] + ′
E [φ (x)] Var [φ (x)] dx.
Γ (1 − E [P (x)])
k
Proof of Theorem 3.
k
This proof considers ambiguity aversion; the proof for ambiguity loving is
similar.
(i+ii) Let y = P (E)−E [P (E)] and z = P (F )−E [P (F )], and assume that event F is more ambiguous
than event E. Then, by Definition 2, z =d y + ϵ, where ϵ is mean-independent of y; therefore
Var [P (F )] = Var [P (E)] + Var [ϵ] .
For the opposite direction, assume that Var [P (F )] ≥ Var [P (E)] and define λ =
Std[P(F )]
Std[P(E)]
≥ 1. By the
distributions of P (E) and P (F ), the random variables y and z are either uniformly distributed or
elliptically distributed with an identical characteristic generator and E [z] = E [y] = 0. The random
variable λy has the same characteristic function as z with E [λy] = E [z] = 0 and Var [z] = λ2 Var [y].
Therefore, by Lemma 4, z =d λy. Next, write
x + y = α (x + λy) + (1 − α) x,
where α =
1
λ
and x is a random variable satisfying E [x | y] = E [x] = 0. Then, since Γ is concave, by
34
the Jensen inequality
Γ (x + y) ≥ αΓ (x + λy) + (1 − α) Γ (x) .
Taking expectations of both sides yields
E [E [Γ (x + y) | x]] ≥ αE [E [Γ (x + λy) | x]] + (1 − α) E [Γ (x)] .
(13)
Since E [λy] = 0, a concave Γ implies
E [Γ (E [x + λy | x])] = E [Γ (x)] ≥ E [E [Γ (x + λy) | x]] ,
which jointly with Equation (13) implies
E [E [Γ (x + y) | x]] ≥ E [E [Γ (x + λy) | x]] .
Let x = 0, then
E [Γ (y)] ≥ E [Γ (λy)] = E [Γ (z)] ,
which, by Izhakian (2014, Proposition 5), implies δˆE %2 δˆF .
(iii) Let Γ (P (E)) = − e
−ηP(E)
η
. Taking a second-order Taylor approximation around E [P (E)] yields
Γ (P (E)) ≈ −1 + (P (E) − E [P (E)]) −
1
(P (E) − E [P (E)])2 ,
2
and taking expectation yields
1
E [Γ (P (E))] ≈ −1 − ηVar [P (E)] .
2
This implies that, concerning an ambiguity-averse DM,
E [Γ (P (E))] ≥ E [Γ (P (F ))] ⇐⇒ Var [P (E)] ≤ Var [P (F )] .
and, therefore, by Izhakian (2014, Proposition 5),
δˆE %2 δˆF
⇐⇒ Var [P (E)] ≤ Var [P (F )] .
(iv) Let Γ (P (E)) = − (P (E) − α)2 , where P (E) ≤ α for some α ∈ R. Taking expectation provides
Ä
ä
E [Γ (P (E))] = − Var [P (E)] + (E [P (E)] − α)2 .
Since E [P (E)] = E [P (F )] = 0, then
E [Γ (P (E))] ≥ E [Γ (P (F ))] ⇐⇒ Var [P (E)] ≤ Var [P (F )] .
and, therefore, by Izhakian (2014, Proposition 5),
δˆE %2 δˆF
⇐⇒ Var [P (E)] ≤ Var [P (F )] .
35
Proposition 2 then completes the proof.
Proof of Theorem 4.
(⇐=) Suppose that V (f ) ≥ V (g) but f does not stochastically dominate g with respect to ambiguity.
Then, there exists x∗ ∈ X such that
∫
∫
x∗
−∞
E [φf (z)] Var [φf (z)] dz >
x∗
−∞
E [φg (z)] Var [Pg (z)] dz.
Define U (x) such that U (x) = −1 if x < x∗ ≤ k and otherwise U (x) = 0. Assume CAAA, i.e.,
Γ (P (E)) =
(P(E))1−η
.
1−η
Then, since E [φf (x)] = E [φg (x)] for every x ∈ X , by Equation (4)
∫
V (f ) − V (g) ≈ −η
x∗
−∞
î
ó
E [φf (z)] Var [φf (z)] − Var [φg (z)] dz
Clearly V (f ) − V (g) < 0, which is a contradiction.
(=⇒) Since E [φf (x)] = E [φg (x)] for every x ∈ X , by Equation (4)
∫
î
ó
Γ′′ (1 − E [Pf (x)])
E
[φ
(x)]
Var
[φ
(x)]
−
Var
[φ
(x)]
dx
g
f
f
Γ′ (1 − E [Pf (x)])
−∞
∫ ∞
î
ó
Γ′′ (1 − E [Pf (x)])
+
U (x) ′
E [φf (x)] Var [φf (x)] − Var [φg (x)] dx
Γ (1 − E [Pf (x)])
k
V (f ) − V (g) ≈ −
k
U (x)
By Lemma 7, Var [φ (x)] is mean-independent of x as well as of P (x). Therefore, by Lemma 8,
∫
∫
k
î
ó
Γ′′ (1 − E [Pf (x)])
E [φf (x)] U (x) ′
E [φf (x)] Var [φf (x)] − Var [φg (x)] dx
V (f ) − V (g) ≈ −
dx
Γ (1 − E [Pf (x)])
−∞
−∞
∫ ∞
∫ ∞
î
ó
Γ′′ (1 − E [Pf (x)])
+
E [φf (x)] U (x) ′
dx
E [φf (x)] Var [φf (x)] − Var [φg (x)] dx
Γ (1 − E [Pf (x)])
k
k
k
Since U (x) ≥ 0 for x ≥ k, U (x) ≤ 0 for x ≤ k, and
Γ′′ (·)
Γ′ (·)
≤ 0, if act f first-order stochastically
dominates act g, then V (f ) − V (g) ≥ 0.
Proof of Theorem 5.
Since E [φf (x)] = E [φg (x)] for any x and k = −∞, then by Theorem 2
∫
W (f ) − W (g) ≈
Ä
ä
Γ′′ (1 − E [Pf (x)])
E
[φ
(x)]
Var
[φ
(x)]
−
Var
[φ
(x)]
dx.
g
f
f
Γ′ (1 − E [Pf (x)])
∞
−∞
U (x)
By Lemma 7, Var [φf (x)] is mean-independent of x, as well as of Pf (x). Therefore,
∫
W (f ) − W (g) ≈
∫
Since
∞
−∞
E [φf (x)] U (x)
∞
−∞
E [φf (x)] U (x)
W (f ) ≥ W (g)
Γ′′
Γ′′ (1 − E [Pf (x)])
dx
Γ′ (1 − E [Pf (x)])
(1−E[Pf (x)])
Γ′ (1−E[Pf (x)])
∫
⇐⇒
∞
−∞
∫
∞
−∞
Ä
dx ≤ 0, then
∫
E [φf (x)] Var [φf (x)] dx ≤
∞
−∞
and, by Theorem 2,
∫
f% g
1
⇐⇒
∞
−∞
36
ä
E [φf (x)] Var [φf (x)] − Var [φg (x)] dx.
f2 [f ] ≤ f2 [g] .
E [φf (x)] Var [φg (x)] dx
Proof of Theorem 6.
Since E [φf (x)] = E [φg (x)] for any x, then by Theorem 2
∫
Ä
ä
Γ′′ (1 − E [Pf (x)])
E
[φ
(x)]
Var
[φ
(x)]
−
Var
[φ
(x)]
dx
g
f
f
Γ′ (1 − E [Pf (x)])
−∞
∫ ∞
Ä
ä
Γ′′ (1 − E [Pf (x)])
U (x) ′
+
E [φf (x)] Var [φf (x)] − Var [φg (x)] dx.
Γ (1 − E [Pf (x)])
k
k
W (f ) − W (g) ≈ −
U (x)
By Lemma 7, Var [φf (x)] is mean-independent of x, as well as of Pf (x). Therefore, by Lemma 8,
∫
∫
k
Ä
ä
Γ′′ (1 − E [Pf (x)])
dx
E [φf (x)] Var [φf (x)] − Var [φg (x)] dx
W (f ) − W (g) ≈ −
E [φf (x)] U (x) ′
Γ (1 − E [Pf (x)])
−∞
−∞
∫ ∞
∫ ∞
Ä
ä
Γ′′ (1 − E [Pf (x)])
+
E [φf (x)] U (x) ′
dx
E [φf (x)] Var [φf (x)] − Var [φg (x)] dx.
Γ (1 − E [Pf (x)])
k
k
∫
Since −
k
k
E [φf (x)] U (x)
Γ′′ (1−E[Pf (x)])
Γ′
(1−E[Pf (x)])
by the symmetry of outcomes around k,
−∞
∫
dx ≤ 0 and
∫
W (f ) − W (g) ≥ 0
⇐⇒
k
∫−∞
∞
k
∞
E [φf (x)] U (x)
k
Γ′′ (1−E[Pf (x)])
Γ′ (1−E[Pf (x)])
Ä
ä
Ä
ä
dx ≤ 0 then,
E [φf (x)] Var [φf (x)] − Var [φg (x)] dx +
E [φf (x)] Var [φf (x)] − Var [φg (x)] dx ≤ 0,
which implies
∫
W (f ) ≥ W (g)
⇐⇒
∞
−∞
∫
E [φf (x)] Var [φf (x)] dx ≤
∞
−∞
E [φf (x)] Var [φg (x)] dx
and, by Theorem 2,
∫
f% g
1
⇐⇒
∞
−∞
f2 [f ] ≤ f2 [g] .
Proof of Theorem 7. The first-order Taylor approximation of the LHS of Equation (5) with respect
to K, around 0, is
∫
LHS = U (E [x] − K) =
∞
−∞
∫
E [φ (x)] U (E [x] − K) dx ≈
∞
(
)
E [φ (x)] U (E [x]) − KU′ (E [x]) dx.
−∞
Writing the RHS of Equation (5) as
∫
RHS =
|
∞
−∞
{z
}
I
Ç∫
|
E [φ (x)] U (x) dx +
∞
k
′′
(1−E[P(x)])
U (x) ΓΓ′ (1−E[P(x)])
E [φ (x)] Var [φ (x)] dx −
{z
∫
k
−∞
å
′′
(1−E[P(x)])
U (x) ΓΓ′ (1−E[P(x)])
E [φ (x)] Var [φ (x)] dx
}
II
the second-order Taylor approximation of I with respect to x, around E [x], is then
∫
∞
Ç
å
1
E [φ (x)] U (E [x]) + U (E [x]) (x − E [x]) + U′′ (E [x]) (x − E [x])2 dx
I ≈
2
−∞
1
= U (E [x]) + U′′ (E [x]) Var [x] .
2
′
37
,
Taking the first-order Taylor approximation of II with respect to x, around E [x], provides37
∫
II ≈ −
+
k
−∞
∫ ∞
(
k
) Γ′′ (1 − E [P (x)])
(
U (E [x]) + U′ (E [x]) (x − E [x])
E [φ (x)] Var [φ (x)] dx
Γ′ (1 − E [P (x)])
) Γ′′ (1 − E [P (x)])
U (E [x]) + U′ (E [x]) (x − E [x]) ′
E [φ (x)] Var [φ (x)] dx.
Γ (1 − E [P (x)])
Since E [x] is relatively close to the reference point k and U (k) = 0, then U (E [x]) ≈ 0. Therefore,
∫
′
II = U (E [x])
∞
−∞
|x − E [x]|
Γ′′ (1 − E [P (x)])
E [φ (x)] Var [φ (x)] dx.
Γ′ (1 − E [P (x)])
Since, by Lemma 7, Var [φ (x)] is mean-independent of x, as well as of P (x),
II = U′ (E [x])
∫
∫
∞
−∞
E [φ (x)] Var [φ (x)] dx
∞
−∞
E [φ (x)] |x − E [x]|
Γ′′ (1 − E [P (x)])
dx.
Γ′ (1 − E [P (x)])
By Lemma 7 again, |x − E [x]| is also mean-independent of P (x). Therefore,
II = U′ (E [x])
∫
∫
∞
−∞
E [φ (x)] Var [φ (x)] dx
∞
−∞
∫
E [φ (x)] |x − E [x]| dx
∞
−∞
E [φ (x)]
Γ′′ (1 − E [P (x)])
dx
Γ′ (1 − E [P (x)])
Combining the LHS, the RHS, I and II, the uncertainty premium is
ñ
ô
ó
Γ′′ (1 − E [P (x)]) î
1 U′′ (E [x])
Var
[x]
−
E
E
|x
−
E
[x]|
f2 [x] .
K ≈ −
2 U′ (E [x])
Γ′ (1 − E [P (x)])
Proof of Proposition 1.
Immediately obtained by substituting the perceived probabilities approx-
imated by Theorem 1 into the value function in Equation (2), while accounting for U (x) ≤ 0 when
x ≤ k and substituting E [P ({s ∈ S | U (f (s)) ≤ z})] + E [P ({s ∈ S | U (f (s)) ≥ z})] for 1.
Proof of Proposition 2.
Let y = P (E) − E [P (E)] and z = P (F ) − E [P (F )], and assume that F
is more ambiguous than E. By Definition 2, z =d y + ϵ. By Izhakian (2014, Proposition 5), the DM’s
preference %2 is characterized by the outlook function Γ : [0, 1] → R, implying that
E [Γ (z)] = E [E [Γ (y + ϵ) | y]] .
Ignoring the expectation on the RHS for the moment, ambiguity aversion, formed by a concave Γ,
implies
E [Γ (z)] = E [Γ (y + ϵ)] ≤ Γ (E [y + ϵ]) = Γ (y) .
Taking expectation implies E [Γ (z)] ≤ E [Γ (y)]. Hence, by Izhakian (2014, Proposition 5), δˆF -2 δˆE .
For the opposite direction, let δˆF -2 δˆE . Then, by Izhakian (2014, Proposition 5),
E [Γ (z)] ≤ E [Γ (y)] .
It needs to be shown that there exists an ϵ that satisfies Definition 2. The proof considers two probability distributions P ∈ P; it can then be extended to any number of probability distributions. Let y
37
Note that this component holds an order of magnitude of the variance of probabilities. Thus, it is smaller by one
order of magnitude than probabilities.
38
and z take two possible values, (y1 , y2 ) and (z1 , z2 ), with probabilities (α, 1 − α) and (β, 1 − β), respectively. Without loss of generality, assume that z1 ≥ y1 ≥ y2 ≥ z2 . The random variable ϵ can then be
Ä
constructed as ϵ1 = (z1 − y1 , z2 − y1 ) with probabilities
probabilities
Ä
y2 −z2 z1 −y2
z1 −z2 , z1 −z2
ä
y1 −z2 z1 −y1
z1 −z2 , z1 −z2
ä
and ϵ2 = (z1 − y2 , z2 − y2 ) with
. It can be verified that the probabilities of ϵ1 and ϵ2 are all positive, and
that E [ϵ1 | y1 ] = 0 and E [ϵ2 | y2 ] = 0. Therefore, ϵ is mean-independent of y and E [z] = E [y + ϵ] = 0.
The probability that y + ϵ = z1 is
α
y2 − z 2
y1 − z2
+ (1 − α)
.
z1 − z2
z1 − z2
Since E [y] = E [z], then
α =
z2 − y2 + β (z1 − z2 )
.
y1 − y2
Together, this implies that the probability that y + ϵ = z1 is equal to β, and that the probability that
y + ϵ = z2 is equal to 1 − β. That is, z =d y + ϵ.
Proof of Proposition 3. By Theorem 4, W (f ) ≥ W (g) ⇐⇒ f stochastically dominates g. Then,
by Theorem 5, W (f ) ≥ W (g) ⇐⇒ f2 [f ] ≤ f2 [g]. The same holds by Theorem 6.
Proof of Corollary 1.
CAAA implies Γ′ (P (E)) = e−ηP(E) and Γ′′ (P (E)) = −ηe−ηP(E) . Substi-
tuting into Equation (4) proves the corollary.
Proof of Corollary 2.
Obtained by substituting 1 + r for x into Equation (6) of Theorem 7 and
rearranging terms.
CRRA implies U′ (x) = x−γ and U′′ (x) = −γx−γ−1 . CAAA implies
Proof of Corollary 3.
Γ′ (P (E)) = e−ηP(E) and Γ′′ (P (E)) = −ηe−ηP(E) . Substituting into Equation (7) proves the corollary.
Consider an outcome x ∈ X . Its expected probability can be written
Proof of Observation 1.
E [φ (x)] = ax +
n
∑
(bx − ax )
i=1
1
i
= ax + (bx − ax ) ,
n
2
and its variance can be written
Var [φ (x)] =
n Å
∑
i=1
i
ax + (bx − ax ) − E [φ (x)]
n
ã2
=
n Å
∑
i=1
i
(bx − ax )
n
ã2
−
1
(bx − ax )2 .
4
Differentiating Var [φ (x)] with respect to n provides
Ç
n
∑
i2
d
Var [φ (x)] = −2
(bx − ax )2 3
dn
n
i=1
å
,
which proves the claim.
Proof of Observation 2.
Given an outcome x ∈ X , the maximal variance of its probability is
39
attained when the possible probabilities are only either 0 or 1. In this case, the expected probability
of x is E [φ (x)] = χ. Therefore, the variance of the probability of x is
Var [φ (x)] = χ (1 − χ)2 + (1 − χ) (0 − χ)2 = χ − χ2 ,
which attains its maximal value when χ = 12 . In this case, Var [φ (x)] = 14 , and therefore f2 = 1. Notice
that the expected probability χ satisfies χ = n1 , where n is the number of different possible outcomes.
Therefore, the maximal value of f2 is attained when there are only two possible outcomes.
40