How do Observers Assess Resolve?

How do Observers Assess Resolve?
May 4, 2015
Joshua D. Kertzer⇤ , Jonathan Renshon† and Keren Yarhi-Milo‡
Abstract: What heuristics do observers use to estimate the resolve of others? Although
actors go to great lengths to signal or project their resolve to others, there is very little
sense in IR as to how observers actually draw these inferences, and many of our theoretical
frameworks that try to address this question focus only on a single mechanism at a time,
rather than testing their competing predictions or examining how they work in tandem.
We innovate by providing an integrative conceptual framework uniting these previously
disconnected theories, and employ a conjoint experimental design to test their observable
implications, showing how ordinary citizens intuitively carry around many of the core
principles of deterrence theory in their heads.
Acknowledgements: We gratefully acknowledge the support of the Weatherhead Center
for International A↵airs at Harvard University, audiences at ISA 2015, Columbia University, the University of Virginia, Yale University, Taj Moore for research assistance,
and Anton Strezhnev for his superb assistance with the programming of the study.
⇤ Assistant Professor of Government, Harvard University.
Email:
jkertzer@gov.harvard.edu.
Web:
http:/people.fas.harvard.edu/˜jkertzer/
† Assistant Professor, Department of Political Science, University of Wisconsin-Madison. Email: renshon@wisc.edu.
Web: http://jonathanrenshon.net
‡ Assistant Professor of Politics and International A↵airs, Woodrow Wilson School, Princeton University. Email:
kyarhi@princeton.edu. Web: http://scholar.princeton.edu/kyarhi/
In 1999, a review article on deterrence theory outlined a core puzzle in International Relations
(IR) theory: there is no agreement on what “military capabilities, interests at stake, and past
and current actions” lead actors to infer resolve in a given situation (Huth, 1999, 30). Despite
a plethora of research in the intervening years, we are still no closer to a definitive answer. IR
scholars have produced a proliferation of theoretical frameworks to address how observers assess
resolve, ranging from the role of past actions (Schelling, 1960, 1966; Jervis, Lebow and Stein, 1985;
Weisiger and Yarhi-Milo, 2015, often filtered through psychological heuristics as in Mercer, 1996),
to costly signaling (Morrow, 1994; Fearon, 1997), to current capabilities and stakes (Press, 2005),
to an ever-growing list of leader attributes (Horowitz and Stam, 2012; Gelpi and Grieco, 2001; Bak
and Palmer, 2010).
Despite the abundance of rich, conceptual frameworks for understanding how actors infer resolve,
progress has stalled. This has occurred, we argue, as a result of the tendency to focus on one
theory at a time, in isolation. This has delayed the development of an overarching conceptual
framework to help us understand how these myriad factors work in concert. International crises
are characterized by a complex information environment, in which observers can turn to a nearly
infinite number of indicators to draw inferences about which actors are more more likely to stand
firm than others. Given these countervailing indicators and proliferating theoretical expectations,
what cognitive shortcuts do observers rely on?
We innovate by providing an integrative conceptual framework to unite these previously disconnected theories. In our framework, observers seeking to assess an actor’s level of resolve turn
to two types of factors: (i) characteristics — who an actor is, and (ii) behavior — what an actor
does. When observers turn to an actor’s characteristics, they can do so at two levels of analysis,
focusing either on attributes of the state as a whole (such as regime type, capabilities, interests, and
alliances) or aspects of the political leader themselves (such as military experience, time in office, and
gender). When observers turn to an actor’s behavior, they may either look to how states behaved
in the past — as in the voluminous and contentious literature on reputation — or to how the state
is behaving in the current situation. In the current situation, states may, for example, attempt to
create credibility by engaging in costly signaling, either through tying hands or sinking costs.
Of course, providing evidence for our conceptual framework — engaging, as it does, so many
di↵erent theories of IR — requires a departure from conventional methods. Recent experimental
work (Tingley and Walter, 2011; Renshon, Dafoe and Huth, 2015; Kertzer, 2015) has highlighted
1
some of the benefits of being able to isolate and manipulate the key features of interest, but the
modal experimental design typically su↵ers from a significant logistical constraint: a given study
can only manipulate a small number of factors at a time. We solve this problem through the use of
a method likely new to IR scholars: a conjoint experimental design embedded in an online survey of
2000 American adults. Unlike traditional survey experiments that force researchers to restrict their
attention to a small number of treatments, conjoint experiments let us capture the information-rich
nature of crisis bargaining environments by allowing respondents to take a large number of factors
into consideration at once — thereby giving us the ability to adjudicate between the plethora of
competing theoretical frameworks on resolve and credibility that have germinated over the past two
decades, testing the observable implications from theoretical frameworks that are rarely investigated
together.
Our conjoint experimental design allows us the freedom to examine our conceptual framework
and provide evidence on how numerous factors at di↵erent levels of analysis work in concert to impact
observers’ estimates of resolve. We do so in the context of how members of the mass public assess
resolve, for three reasons. First, what the public thinks has significant ramifications for theories of
international relations: the empirical record shows that American leaders carefully monitor public
opinion in making decisions about the use of force, intervention, retrenchment, nuclear deployment,
and so on. From tempering President Eisenhower’s approach to the Taiwan Crisis of 1958 (Eliades,
1993), to compelling President McKinley’s involvement in the Spanish-American War (May, 1961),
to reversing President Clinton’s Somalia intervention in 1993 (Foyle, 1999, 201-229), public opinion
frequently shapes American foreign policy during crises. Whether the channel of influence is direct
(e.g., when leaders pay attention to the polls) or indirect (such as through elected members of
Congress, as in Gelpi and Grieco, 2014), there is much to be learned from an investigation of the
manner in which ordinary citizens evaluate and draw inferences about resolve in international a↵airs.
Second, the phenomenon we explore here is bigger than just IR: evolutionary theorists argue
that selection pressures have hard-wired humans and other animals with cognitive and behavioral
mechanisms to enable us to quickly and accurately draw inferences about others’ resolve, or what anthropologists and evolutionary biologists call “resource-holding potential” or “formidability” (Maynard Smith, 1974; Parker, 1974; Parker and Rubenstein, 1981; Sell, Tooby and Cosmides, 2009; for
an application of this coalitional psychology to IR, see Lopez, McDermott and Petersen, 2011). The
question of how we employ these adaptive mechanisms and weight multiple factors into a single
2
summary representation to assess formidability is thus far from a question that solely applies to
high-ranking intelligence officers, as seen by the range of studies that have tested this same question in a wide-range of animals, including five-year old children, for example (e.g. Pietraszewski
and Shaw, 2015). Given that we know how toads, speckled wood butterflies, red deer, and African
elephants assess formidability (Davies and Halliday, 1978; Davies, 1978; Clutton-Brock and Albon,
1979; Ganswindt et al., 2005), it seems germane for us to ask the same question about ordinary
humans.
The last point is methodological in nature. One of the reasons for the persistent disagreements
in debates about assessments of resolve is that scholars confront a cornucopia of competing theories,
but relatively little data to help us adjudicate between them. Although many of these theories make
causal claims, clean counterfactuals are often hard to come by, and many of our factors of interest are
endogenous (are current behavior and past actions ever independent of one another?) or collinear (is
regime type ever randomly distributed?). In this context, experimental methods can o↵er important
advantages in isolating and manipulating causal features of interest. Importantly, though, if we want
to harness the advantages of experimental methods, we must — even if we are eventually concerned
with the inferences of leaders — of necessity begin with studies on ordinary citizens. Moving on to
elite samples makes sense only after replications and extensions have increased our confidence in a
given research program and narrowed down the plausible candidates for experimentation to a number
of factors feasible for study in the extremely small samples that characterize elite experimentation
(Renshon, 2015). In our conclusion, we turn back to this broader question about elites by raising a
number of suggestions for scholars seeking to extend our results to elite decision-makers.
1
How do observers assess resolve?
Following Kertzer (2015), we define resolve as a “firmness or steadfastness of purpose”, maintaining a
policy — in this case, holding fast in crisis bargaining — despite contrary inclinations or temptations
to back down. Highly resolved actors are understood to be more willing to su↵er the costs of
fighting to achieve a desirable outcome in the political dispute. Resolve is thus both situational and
dispositional, varying both across issues and across actors. Although an actor’s resolve is private
information implicating its own willingness to risk war, it is obviously of critical importance to its
opponent, whose own demands will depend on its beliefs about the first state’s level of resolve.
3
Given that resolve is not directly observable, however, how do actors and observers infer the resolve
of others?
Figure 1: How do we infer resolve?
Characteristics
Behaviors
State-level
Leader-level
Past
Present
Capabilities
Interests
Alliances
Regime type
Military experience
Time in office
Gender
Reputation
Costly signals
Multiple literatures touch on this subject from a variety of perspectives, including reputation,
the democratic advantage, costly signaling, and audience costs. Although partitioning these factors
is analytically useful in order to bring theoretical clarity to a specific mechanism of interest, in as
much as these factors matter in foreign policy crises, they likely do so simultaneously. In order
to weave insights together from this diverse array of research and foster a richer understanding of
how observers assess resolve, we classify debates over resolve into two broad classes of explanations,
each containing two sub-classes of explanations that advance distinct hypotheses. The first class
of explanations emphasizes behavioral indicators, either in the past (reputation) or in the present
(signaling). The second class of explanations emphasizes the characteristics of the actor in question,
which can refer either to state-level characteristics (capabilities, interests, alliances, regime type) or
leader-level characteristics (military experience, time in office, gender, etc.).
1.1
1.1.1
Behavioral Indicators
Past Actions
Behavioral indicators of resolve refer to those actions that states choose to take in order to communicate their intentions to stand firm. The first set of behavioral indicators concern the state’s past
actions, that is, its reputation for resolve. A well established literature on reputation in international
politics holds that one of the most common ways actors calculate the credibility of current commitments is by looking at past actions (e.g., Copeland, 1997). Both American presidents and political
4
scientists have long assumed that reputation matters in international politics. For much of the Cold
War, scholars emphasized the importance of a state’s reputation for resolve as means of deterring
future conflicts. The United States’ reputation was seen as one of its most valuable possessions:
President Truman justified intervention in Korea on the grounds that a failure to respond “would
be an open invitation to new acts of aggression elsewhere,” while Thomas Schelling asserted that
the loss of 30,000 men in the resulting inconclusive war was “undoubtedly worth it,” because doing
so saved face for the United States and thus positively influenced Soviet expectations of American
behavior in future crises (Schelling, 1966, 124-125). Most recently, Weisiger and Yarhi-Milo (2015)
have found that states that backed down in the past were more likely to be challenged in subsequent
militarized disputes. They report that inferences drawn from past action hold for both democracies
and non-democracies and are not reset after leadership turnover. The above literature therefore
leads to the following hypothesis.
The Reputation Hypothesis: Observers will make predictions about level of resolve
by looking at the history of the crisis behavior of states. A country with a history of
backing down in crisis will lead observers to predict less resolute behavior, whereas
a country with a history of standing firm will lead observers to predict more resolute
behavior.
In contrast to those who believe that reputation matters in international crises, a number of
scholars have called into question the e↵ectiveness of past actions in a↵ecting assessments of resolve
(Clare and Danilovic, 2012; Danilovic, 2001; Huth and Russett, 1984; Hopf, 1991, 1994; Mercer,
1996; Press, 2005). Hopf (1991, 1994) finds that U.S. actions in peripheral regions did not influence
Soviet leaders’ assessments about the United States’ likely behavior in areas such as Europe or East
Asia. Mercer (1996) similarly finds little support for the formation of reputation in IR, arguing
that attribution theory structures the way actors make inferences about reputation. In particular,
adversaries that act undesirably — e.g., by standing firm — gain reputations for that action or
attribute while allies who do the same will likely have their behavior explained away as attributable
to situational constraints. Thus, adversaries almost never develop reputations for ”irresoluteness”
in each others’ eyes.
5
In case studies of European geopolitics in the 1930s and of Soviet-American relations during the
Cold War, Press (2005) finds no evidence that either Nazi leaders or American policymakers made
predictions about the likely behavior of their opponents based on their record of backing down in
previous crises. Instead, “current calculus” theory argues that observers infer resolve when an actor
has “sufficient power” to carry out their threats their demands “serve clear interests” (Press, 2005,
8-9). Finally, Jervis (1982) notes that reputational logics lead to a paradox that Press (2005, 29)
christens the “Never Again theory”: if actors care about maintaining a reputation for resolve, why
should observers assume that an actor that backed down in the past is more likely to back down in
the present crisis, rather than more likely to stand firm out of a desire to rebuild their reputation?
1.1.2
Current actions
The second set of behavioral indicators concern the state’s current actions, that is, the signaling
behavior of leaders during a crisis. The writings of Schelling (1960), Jervis (1970), Fearon (1997)
and others (Powell, 2004; Slantchev, 2005) on crisis bargaining feature one common theme: by
taking particular costly gestures or actions that raise the risks of escalation, leaders can e↵ectively
manipulate others’ assessments of their resolve. Put di↵erently, because resolve is private information
and leaders have incentives to pretend to be resolved, even when they are not, they need to take
actions that would be costly enough to separate them from those unresolved types. Two prominent
types of such costly signals have received much attention in the literature: sinking costs and tying
hands.1
The first type of costly signal of resolve refers to those that sink costs by taking actions, such
as mobilizing troops, that are financially or militarily costly ex ante.2 These are actions that are
costly to undertake in the near term but do not impact the payo↵s associated with future courses of
action. The second type refers to those costly signals in crisis that tie the hands of leaders by creating
audience costs that leaders will su↵er ex post from domestic political audiences if they don’t follow
through on their threat during crisis (Fearon, 1994). In these cases, the sender pays the cost only if it
does not follow through on its commitment. Traditionally, audience costs are seen as being greater in
magnitude in democracies where leaders are held electorally accountable for backing down on explicit
1 A similar signaling literature has also developed in biology, explaining the highly ritualized nature of many animal
confrontations, which allow actors and observers to assess potential conflict outcomes without actually engaging in
costly conflict itself — e.g., Clutton-Brock and Albon (1979) on approaching and parallel walking among red deer, or
Ganswindt et al. (2005) on the signaling value of musth in African elephants.
2 See also Kertzer and Brutger (2015), who find that even issuing a threat of force can create a sunk cost.
6
threats or damaging national reputation (Fearon, 1994). Autocracies, in the original formulation,
have no such moderating mechanism, and are therefore free to behave pragmatically. Weeks (2008,
2012), however, demonstrates that autocratic leaders di↵er significantly in the extent to which they
are vulnerable to audience costs, with some types of regimes being equally as susceptible — and thus
equally as able to appear resolved — as democracies. The signaling literature therefore generates
the following hypothesis.
Costly Signals Hypothesis: Observers will make predictions about level of resolve on
the basis of the crisis behavior of the countries involved. Costly signals that sink
costs or tie the hands of leaders will lead observers to predict resolute behavior,
whereas a country that fails to send such signals will be perceived as less resolved.
As in the case of reputation, the literature on signaling is not without its share of critics. For
example, Snyder and Borghard (2011) find little evidence to support the notion that audience costs
have been decisive in determining outcomes, arguing instead that democratic leaders are loathe to
lock themselves in by making explicit threats which involve reputation, and that authoritarian targets
do not give the same credence to audience costs that scholars suppose. Trachtenberg (2012) similarly
does not find support for the audience cost theory in an examination of a dozen great power crises.
Downes and Sechser (2012) find that many initial studies purporting to find evidence of audience
costs relied on data inappropriate to the question at hand. Experimental studies on audience costs,
on the other hand, have reported mixed results (Tomz, 2007; Levendusky and Horowitz, 2012;
Kertzer and Brutger, 2015; Levy et al., 2015).
1.2
1.2.1
Characteristics
State-level characteristics
In addition to behavioral indicators, observers calibrate their assessments of an actor’s resolve with
reference to the actor’s characteristics, at two di↵erent levels of analysis. State-level variables may
include a state’s level of capabilities, the nature of its interests (generally or in a particular crisis),
the depth and breadth of its alliances, and its regime type. Each of these attributes may impact
7
perceptions of the state’s resolve.3 For example, a state with higher capabilities may be perceived
as more resolute since it is likely to face lower costs for holding firm compared to a weaker state.
Capabilities are often modeled as a source of resolve in game theoretic approaches (e.g., Snyder
and Diesing, 1977; Morrow, 1989), and size and strength play important roles in assessments of
formidability in evolutionary models (Holbrook and Fessler, 2013). Similarly, a state with high
stakes in a situation will likely be perceived as more resolute, such that actors with high levels of
interest in the dispute will be more willing to bear the costs they face (Lebow, 1998; Arregu´ın-Toft,
2001).
Likewise, a state’s alliance portfolio is also likely to shape assessments of its resolve. Alliances
can shape assessments of resolve in a number of ways, from aggregating capabilities to signaling
a state’s place in the international community, but of particular interest here is the question of
whether allies of the United States are perceived as more likely to stand firm than adversaries
are. A large literature on alliances warns about the relative unreliability of allies, because of their
tendency for buckpassing and free riding (e.g., Snyder, 1984; Christensen and Snyder, 1990), while
image theory in International Relations emphasizes the extent to which adversaries tend to be
perceived as highly resolute (Herrmann and Fischerkeller, 1995). Finally, as the literature on the
democratic advantage suggests, democracies tend to be more resolute in crises than autocracies for
a variety of reasons, including their ability to be more strategic about the conflicts they enter and
display greater initiative on the battlefield (Reiter and Stam, 2002), as well as their advantage at
creating audience costs (Fearon, 1994), which we discuss in greater detail below. Based on these
various characteristics — capabilities and interests, alliances, and regime type — we can generate
the following three hypotheses.
Capabilities and Interests Hypothesis: States with stronger military capabilities will
be perceived as more likely to stand firm than states with weaker military capabilities.
States with more interests at stake in a dispute will be perceived as more likely to
stand firm than states with fewer interests at stake.
3 They
may also include characteristics we do not discuss here, such as cultural variables (e.g. Nisbett and Cohen,
1996).
8
Alliance Unreliability Hypothesis: Allies of the United States will be perceived as
less likely to stand firm than adversaries of the United States.
Regime Type Hypothesis: Irrespective of who they face in a crisis, democracies will
be viewed as more resolute than non-democracies.
1.2.2
Leader-level characteristics
Lastly, a growing body of literature has claimed that particular attributes of leaders a↵ect their
inclination to stand firm or back down in crisis, and by implication, others’ assessments of their
level of resolve. Along those lines, Horowitz and Stam (2012) find that leaders with prior military
service, but not combat experience, are significantly more likely to initiate militarized disputes and
wars than other leaders. Gelpi and Grieco (2001) find that because democracies have a high rate of
leadership turnover, democratic leaders have less time in office, and thus less experience compared
to their authoritarian counterparts. As a consequence, democracies are more likely to be challenged
than autocracies in international crises, since their inexperienced leaders are perceived to be more
likely to make concessions. Looking at the e↵ect of leadership turnover on resolve, Dafoe (2012, see
also Wolford, 2007) theorizes that observers’ incentive to probe the resolve of new leaders leads the
latter to stand firm and develop a reputation for resolve in their early years in office. Building on
this, Renshon, Dafoe and Huth (2015) find that the importance of leader-level characteristics — for
inferences about resolve — itself varies according to the amount of influence leaders are perceived
to have in a given situation.
Examining the impact of gender on conflict initiation, McIntyre et al. (2007) show that men are
more likely to engage in aggressive action than women, and more likely to lose their fights as well,
suggesting a relationship between testosterone levels and aggression (see also Horowitz, McDermott
and Stam, 2005; Johnson et al., 2006). In addition to biological di↵erences, social expectations may
also suggest to observers that female leaders will be less resolute than male leaders in crisis situations
(Caprioli and Boyer, 2001). We note, however, that none of these findings is ironclad. Biological
9
di↵erences may push in the opposite direction in some cases (e.g., for older female leaders who have
testosterone levels comparable to males, McDermott et al., 2007), and selection e↵ects might results
in only extremely tough females coming to power (Anzia and Berry, 2011; Ferreira and Gyourko,
2014).
The insights and findings reported above can help us generate several hypotheses about particular ways leaders’ attributes and experience could a↵ect observers’ assessments of resolve.
Leaders’ Military Experience Hypothesis: Observers will make predictions about level
of resolve on the basis of the military background of leaders. Leaders with military
experience are more likely to be perceived as resolved compared to leaders without
such background.
Leaders’ Political Experience Hypothesis: Observers will make predictions about
level of resolve on the basis of the political experience of leaders. Leaders with a
long political experience are more likely to be perceived as resolved compared to
leader with short political experience.
Leader Time-in-Office Hypothesis: Observers will make predictions about level of
resolve on the basis of leader’s time in office. Leaders who have been recently elected
are more likely to be perceived as resolved compared to leaders who have been in
office for a longer period of time.
Gender Hypothesis: Observers will make predictions about level of resolve on the
basis of leader’s gender. Male leaders are more likely to be perceived as resolute
compared to female leaders.
10
1.3
Interactionist hypotheses
The advantage of the conceptual framework outlined in Figure 1 is threefold. First, it illustrates
the extent to which di↵erent quadrants of the IR literature have pointed to very di↵erent indicators
that can be used to infer resolve, each of which o↵ers an observable implication that can be subject
to empirical testing. Second, it o↵ers a loose conceptual ordering to suggest how these di↵erent sets
of factors are related to one another. Third, although each of the heuristics discussed above can
a↵ect assessments of resolve separately, we also have theoretical reasons to believe they may operate
in combination with one another. Indeed, the existing literature makes a number of more complex
theoretical predictions we can test about how characteristics and behavior jointly a↵ect assessments
of resolve.
First, are the reputation skeptics, who tend to argue that what actors have done in the past
is less informative when predicting future behavior than various configurations of their current
characteristics. Press (2005), for example, o↵ers a “Current Calculus” hypothesis in which current
capabilities and interests override the e↵ects of past behavior. Mercer (1996) o↵ers an equally
pessimistic view about reputation, but for di↵erent reasons, turning to attribution theory to that
assessments of resolve are a↵ected by the relationship between the observer and the state in question,
i.e. whether they are allies or adversaries. Mercer contends that only undesirable behavior by
another country — such as when allies back down or adversaries stand firm — elicits dispositional
attributions, such that only undesirable behavior can be used to predict future behavior.
Current Calculus Hypothesis: Observers will make predictions about level of resolve
on the basis of calculations of military capabilities and interests at stake rather than
on the basis of reputation for resolve.
When a country is (not) military powerful and level of interests at stake are (low)
high, observers will predict (irresolute) resolute behavior regardless of whether the
country has a history of (standing firm) backing down.
Attribution Theory Hypothesis: Adversaries will get a reputation for resolve but they
11
will not get a reputation for lacking resolve, whereas allies will get a reputation for
lacking resolve but will not get a reputation for resolve.
A history of standing firm/backing down will/will not lead observers to predict
resolute/ irresolute behavior for their adversaries.
A history of standing firm/backing down will not/will lead observers to predict
resolute/irresolute behavior for their allies.
Second is democratic credibility theory. Following Fearon (1994), a large literature on audience costs has suggested that democratic states are more credibly able to issue threats than nondemocratic ones, because of the presence of a domestic audience that will punish leaders when they
back down on threats. If this argument is true, we would expect the e↵ect of public threats on assessments of resolve would vary with regime type, with democracies who issue public threats being
perceived as more resolved than their non-democratic counterparts.
Democratic credibility hypothesis: Democracies who issue public threats will be perceived as more resolved than non-democratic states who issue public threats.
However, a vibrant strain of recent work in IR has argued that “democratic-autocratic” distinction is either misleading or overstated. Weeks (2008), for example, show that there is significant
variation within autocratic states, with some types able to credibly generate audience costs at rates
similar to their democratic counterparts. Relatedly, Downes and Sechser (2012) show that the
purported “democratic advantage” is — if it exists at all — far smaller than previously believed.
Authoritarian Resolve hypothesis: Authoritarian states who issue public threats will
be perceived as equally resolved compared to their democratic counterparts who
issue threats.
12
2
Research Design
We would not be the first to note that the use of experimental methods in international relations
has grown exponentially over the last decade. Typically, the rationale o↵ered for such designs is
that experiments (lab or survey) have significant advantages in detecting causal e↵ects. Often, the
motivation for such studies is based on strong prior beliefs concerning the importance of a particular
variable. In these cases, the authors often note that while we have reason to believe these factors
matter, we have little evidence of a causal e↵ect.
For reasons of statistical power, however, most experiments in IR tend to focus on manipulating
only a small handful of factors at a time, and prior experimental work on credibility and reputations
has been no exception (Tomz, 2007; Trager and Vavreck, 2011; Davies and Johns, 2013; Tingley and
Walter, 2011). Experiments in these cases serve to eliminate confounding variables and explanations,
manipulating and isolating only those factors of import to the theory. This approach has allowed
us to uncover the likely influences on assessments about resolve in isolation, but to date we have
not been able to test the relative importance of many competing hypotheses about perceptions of
resolve in one comprehensive design.
The motivation for our study in this paper turns this conventional justification on its head.
Whereas we sometimes have strong beliefs about a particular variable, but no causal evidence, here
we have many competing theories, all with di↵erent hypotheses for what we should expect to find.
Here, in essence, our problem is too many theories and hypotheses, rather than too few.
To address this issue, we use a conjoint experimental design which is ideally suited for comparing
the relative explanatory power of various hypotheses (Hainmueller and Hopkins, 2014). These designs
— sometimes referred to as “choice-based” — have been prevalent in marketing for many years,
but have only recently been introduced to political scientists in Hainmueller and Hopkins (2014)
and Hainmueller, Hopkins and Yamamoto (2014). In our conjoint experiment, subjects are shown
randomly-generated profiles for two countries, A and B, and told that these two countries are
engaged in a foreign policy dispute. Subjects are then asked to indicate which country they think is
more likely to stand firm. We then repeat the exercise seven more times, each involving a dispute
between a di↵erent pair of randomly-generated country profiles.
One of the most significant advantages gained through the use of this conjoint design is in
statistical power. Formally, our experiment would be considered a 2 ⇥ 3 ⇥ 2 ⇥ 2 ⇥ 3 ⇥ 3 ⇥
13
2 ⇥ 2 ⇥ 2 ⇥ 2 ⇥ 3, giving us a total of 10,368 cells for each country. Making a small number
of assumptions, all of which are either guaranteed by the design itself or verifiable empirically,
allows us to pool observations across choice tasks.4 This is the case despite the fact that there are
many more possible country profiles than are ever observed in the study. An additional advantage
is that paired conjoint designs similar to the one utilized here seem to perform remarkably well in
reproducing behavioral data obtained in actual voting and choice contexts (Hainmueller, Hangartner
and Yamamoto, 2015).
The end result of conjoint experiments is the ability to estimate the e↵ect of each treatment — in
this setup, referred to as the Average Marginal Component E↵ect (AMCE) — with a relatively small
number of subjects (N = 2000). The AMCE represents the average di↵erence in the probability of
being seen as the country more likely to stand firm when comparing two di↵erent attribute levels —
e.g., a democracy versus a dictatorship — where the average is taken over all possible combinations
of other country attributes (Hainmueller and Hopkins, 2014). Because all other attributes are
randomly assigned, profiles with Regime Type = Democracy will have the same distribution for
all other attributes as profiles with Regime Type = Dictatorship. In other words, the AMCE
for “Democracy” (compared to Dictatorship) will not depend on values of other attributes (such as
whether or not the country has a strong military), but average across them.
2.1
Our Treatments
For a full list of our treatments, see Table 1.
2.2
Sample
The study described below as fielded on 2,009 American adults recruited via MTurk in January
2015. Participants were paid $1 for their participation. Despite some initial skepticism, Amazon’s
MTurk platform has become an increasingly popular resource for experimental social science.5
4 The first assumption is that the potential outcomes remain stable across periods; i.e., a subject should pick a
given country (conditional on its and the other country’s attributes) regardless of what countries or choices they had
seen previously. Second, we assume no profile-order e↵ects; i.e., we assume the choice would be the same regardless
of the order in which the two countries are presented. The final assumption is that the attributes of each profile are
randomly generated. Those assumptions are discussed in considerable detail in Hainmueller, Hopkins and Yamamoto
(2014). In Appendix §1, we conduct detailed sensitivity analysis to validate these assumptions and demonstrate the
robustness of our results.
5 And in political science. Experiments using MTurk samples have been published in almost all notable journals,
including the American Political Science Review (Tomz and Weeks, 2013), the American Journal of Political Science
(Healy and Lenz, 2014), International Organization (Wallace, 2013) and Journal of Conflict Resolution (Kriner and
Shen, 2014).
14
(A) Military capabilities
The country...
(1) ... has a very powerful military
(2) ... does not have a very powerful military
The country is ...
(1) ... a democracy
(2) ... a dictatorship
(3) ... in between a democracy and a dictatorship
Experts describe the country’s stakes in the dispute as...
(1) ... high
(1) ... low
The leader ...
(1) ... recently took office
(2) ... has been in power for many years
(B) Regime type
(C) Interests/Stakes
(D) Time in office
Leader background
(E) Gender
(1) He
(2) She
(F) Military Experience
(1) does not have experience in the military
(2) has served in the military briefly
(3) had a long career in the military
The country is...
(1) ... the United States
(2) ... an ally of the United States
(3) ... an adversary of the United States
(G) Foreign Relations
(H) Initiator
(1) it was challenged
(2) it initiated the crisis
Behavior in last
international dispute
(I) Identity of other state
in previous dispute
(1) ally of the United States
(2) adversary of the United States
(J) Outcome of Previous Dispute
(1) the country ultimately stood firm
(2) the country ultimately backed down
At the time, the country was...
(1) ... led by a di↵erent leader than the one
in the current dispute.
(2) ... led by the same leader as the one
in the current dispute
In the current crisis, the country...
(1) ... has yet to make any statements
or carry out any actions
(2) ... has mobilized troops
(3) ... has made a public threat that they will use force
if the other country does not back down
(K) Leadership Change
(L) Current behavior
Table 1: Conjoint Study Treatments: Treatment categories are denoted by letters (A-L), while gray
blocks denote clusters of treatments that are displayed together in order to remain understandable.
All other items are displayed in a random order. Though displayed together, all treatments in gray
clusters are manipulated independently save for the constraints imposed on randomization if the
country is desired as being the United States (described in detailed in main text).
15
Unless the experimental design requires a physical presence in a laboratory (such as studies that
measure physiological data), survey experiments are a far more efficient use of resources.
One potential concern is the representativeness of the sample, as compared to other potential
survey platforms. Berinsky, Huber and Lenz (2012, 366) show that MTurk samples are “often
more representative of the general population and substantially less expensive to recruit” than other
“convenience samples” often used in political science (see Hu↵ and Tingley, 2014, for the latest and
most definitive work on this subject).6 They also demonstrate the ability to replicate results from
nationally-representative samples — e.g., Kam and Simon’s (2010) work on framing and risk and
Tversky and Kahneman’s (1981) classic “asian disease problem” — using MTurk workers.7 In
keeping with “best practices” suggested by numerous researchers, we limited participation in the
study to MTurk workers located in the United States, who had completed
50 HITs, and whose
HIT approval rate was >95%.
2.3
The Experiment
After the consent process, subjects saw an introductory screen which described a dispute over a
contested territory.8 Subjects then saw the first conjoint block, and repeated the exercise seven more
times (eight total).9 A sample of what they might have seen is depicted in Table 2. Randomization
occurred in several places. As with any experiment, treatments themselves — here the characteristics
of Country A and B — were randomly assigned (and randomized independently of one another). In
the Foreign Relations treatment block, subjects were told that the country in question was either
(a) the United States (b) an ally or (c) adversary of the United States. If choice (c) was randomly
assigned for Country A, then randomization was constrained for both other aspects of Country A
as well as the identity of Country B. In particular, if Country A was described as being the United
States, it was also described as being a democracy, and having a very powerful military. Similarly, if
Country A was described as being the United States, then Country B was constrained such that it
could not also be described as being the United States. In order to connect a wider range of debates
6 Though compared to nationally representative samples, MTurk workers tend to be younger and more ideologically
liberal.
7 While there has been some initial wariness regarding online experiments, many famous and well-known behavioral
studies have been replicated using MTurk. For more on this, see Mason and Suri (2010); Buhrmester, Kwang and
Gosling (2011); Rand (2012). For a di↵erent viewpoint, see Krupnikov and Levine (2014), though their caution applies
only to very specific kinds of experimental studies.
8 Reproduced in Appendix §3.
9 Importantly, all of the crises participants are presented with concern territorial disputes, in order to hold the
meaning of “standing firm” constant. Future research should examine assessments of resolve in other types of disputes.
16
in IR, our dependent variable is an assessment of resolve rather than “credibility”, a narrower concept
that refers only to scenarios where explicit threats or promises are made; in our research design, we
look at whether democracies are able to threaten more credibly, for example, but we also look at
whether democracies are seen as more resolved in general.
Either before or after the main study, subjects answered a series of demographic questions,
including gender, age, education, Party ID, ideology, interest in politics and international a↵airs.10
3
Results and Discussion
For purposes of simplicity, in the discussions below we only focus on those crises where participants were either allies of the United States or adversaries of the United States, but not the United
States itself, an analysis we pursue in another paper. The results shown here thus represent foreign
policy disputes where respondents are third-party observers rather than affiliated with the actors
themselves, reflecting an important class of crises — the Suez Crisis, the Iran-Iraq War, recurring
Indo-Pakistan crises, and so on — in which Americans are onlookers rather than immediate participants. The quantities of interest are calculated from 8,090 choice tasks made by 1,995 participants
between 16,180 country profiles. We present our results in two stages. As an initial cut at the results
we present our Average Marginal Component E↵ects (AMCEs), which tell us how manipulating each
attribute shapes participants’ overall predictions as to which country in the dispute will stand firm
or back down. Then, we present more complex analyses that test for specific interaction e↵ects
derived from more complex hypotheses about reputation and signaling from the crisis bargaining
literature.
3.1
Average e↵ects
We begin our analysis by estimating the Average Marginal Component E↵ects (AMCEs), presented
with 95% clustered bootstrapped confidence intervals in Figure 2. These estimates tell us the
percentage change in the perceived likelihood that an actor with a particular attribute will be seen
as being more likely to stand firm in a foreign policy dispute. Since, with the exception of the past
behavior treatments (described in greater detail below), these quantities are calculated by averaging
over all of the other factor-level combinations, these quantities can be interpreted as conceptually
10 Demographic
questions are reproduced in Appendix §2.
17
18
Country A
Country B
In disputes like theses, countries either back down or stand firm.
If you had to choose between them, which of the two countries is more likely to stand firm in the current dispute?
The country has a very powerful military
In the current crisis, the country has made a public
threat that they will use force if the other country does
not back down.
In the current crisis, the country has yet to make any
statements or carry out any actions.
The country does not have a very powerful military
The last time this country was involved in an
international dispute, it initiated the crisis by issuing a
public threat to use force against an adversary of the
United States, and stood firm throughout the crisis. At
the time, the country was led by a di↵erent leader than
the one in the current dispute.
The last time this country was involved in an international
dispute, it initiated the crisis by issuing a public threat to
use force against an adversary of the United States, but
ultimately backed down. At the time, the country was led
by a di↵erent leader than the one in the current dispute.
The country is an adversary of the United States.
Experts describe the country’s stakes in the dispute as
high.
The leader recently took office; she had a long career in
the military.
Experts describe the country’s stakes in the dispute as
high.
The leader recently took office; he has served in the
military briefly.
The country is an ally of the United States.
The country is a democracy
The country is a democracy
Country B
Table 2: Sample Conjoint Choice
Given the information available, what is your best estimate about whether Country B will stand firm in this dispute, ranging form 0% to 100%?
[slider from 0-100]
Given the information available, what is your best estimate about whether Country A will stand firm in this dispute, ranging form 0% to 100%?
[slider from 0-100]
Military Capabilities
Current behavior
Previous behavior
in international
disputes
Foreign relations
Leader background
Interests in the dispute
Government
Country A
Figure 2: Predicting perceptions of resolve
Low Capabilities
Capabilities
& interests
High Capabilities
Low Stakes
Characteristics
High Stakes
Dictatorship
Regime
type
Democracy
Mixed
Relationship
with USA
Ally
Adversary
Established Leader
New Leader
Leader
characteristics
No Military Service
Some Military Service
Long Military Service
Female Leader
Male Leader
Target
Behavior
Initiator
Past behavior
in previous
foreign policy
crisis
Against ally
Against adversary
Different leader, stood firm
Same leader, stood firm
Same leader, backed down
Different leader, backed down
Current
behavior
Nothing
Mobilized troops
Public threat
-0.2
-0.1
0.0
0.1
Average marginal component effect (AMCE)
(Positive values equal greater resolve)
Figure 2 plots Average Marginal Component E↵ects (AMCEs) with 95% clustered bootstrapped confidence intervals.
Positive values indicate that the attribute increases the perceived likelihood that an actor will be more likely to stand
firm than its counterpart in the dispute, while negative values indicate that the attribute decreases the perceived
likelihood of an actor standing firm. All estimates should be interpreted relative to the “base category,” indicated by
the point without horizontal bars at x=0. Thus, for example, democracies are perceived as 4% less likely to stand
firm than dictatorships.
19
0.2
similar to the main e↵ects from a factorial experiment.
Capabilities and interests
We begin by looking at the e↵ect of state-level characteristics on perceptions of resolve, starting with
the results for the capabilities and interests at stake, shown at the top of Figure 2. Consistent with
Press’s (2005) “current calculus” hypothesis — which we test in greater detail below — we see that
observers indeed place great weight on an actor’s military capabilities and level of interests at stake in
the dispute. States with powerful militaries are high are perceived as 16.3% more likely to stand firm
than states with less powerful militaries, while states whose stakes in the dispute are described as high
are seen as 14.9% more likely to stand firm than states whose stakes in the dispute are described as
low. The substantively strong importance of these two characteristics is of interest not just because
of their prominence in theories in IR, but also because evolutionary theorists argue that selection
pressures have hard-wired humans and other animals with cognitive and behavioral mechanisms
geared towards assessing the strength and interest of others (Maynard Smith, 1974; Parker, 1974;
Parker and Rubenstein, 1981): studies have found humans are able to predict individuals’ strength
with relative accuracy based on their faces or voices alone (Sell et al., 2009), for example.
Regime type
Turning to the e↵ects of varying a state’s regime type, the results show that, in contrast, with
theories of democratic superiority in crisis bargaining, respondents saw democratic states 4.2% less
likely to stand firm than dictatorships; states with mixed regime types in between democracies
and dictatorships were seen as 1.4% less likely to stand firm than dictatorships, but the di↵erence
between the two is not statistically significant. These findings are of interest given debates in IR
theory between “democratic triumphalists” and “defeatists” (Desch, 2008), the former touting the
superiority of democracies both in terms of their likelihood of winning the wars they fight (Lake, 1992)
and how they choose which wars to get into (Reiter and Stam, 2002), and the latter emphasizing
how the mercurial and idealistic public threatens the viability of democratic foreign policy (Kennan,
1951; Lippmann, 1955). In his model of the informational e↵ects of democratic institutions in crisis
bargaining, Schultz (1999) finds that foreign rivals should be more likely to back down when facing
democracies in a dispute, out of the belief that democratic institutions and a free press should make
democracies less likely to blu↵. Our results here fail to find support for that view, and are consistent
20
instead with the apprehension voiced by Kennan (1951). At least as far as members of the public
are concerned, democratic states should be less likely to stand firm than non-democratic ones.
Alliances
We then turn to the country’s relationship with the United States, to see whether participants
perceive American allies or adversaries as being inherently more resolute. Interestingly, participants
perceive American adversaries as being significantly less resolute than American allies, with the
former seen as 6.3% less likely to stand firm. These results could simply be due to a kind of motivated
reasoning predicted by balance theory (Heider, 1958), in which participants have positive evaluations
of America, positive evaluations of American allies, and positive evaluations of demonstrating resolve,
such that participants will see America’s friends as more likely to stand firm than America’s foes;
participants could also perceive America’s allies as more resolved as more likely to stand firm because
they are allied with the unipolar power. Nonetheless, given the large and prominent scholarly
literature on buck-passing and alliance reliability (e.g., Snyder, 1984; Christensen and Snyder, 1990;
Leeds, 2003), it is noteworthy that our participants seem relatively unconcerned in this regard.
Leader characteristics
In general, the results above showed substantively strong results for the e↵ects of country-level
characteristics on assessments of resolve. An examination of leader characteristics o↵ers mixed
results. Against Wolford (2007) and other arguments suggesting that new leaders have a greater
incentive to stand firm and develop a reputation for resolve in their early years in office, we find that
new leaders are perceived as slightly (1.5%) less likely to stand firm compared to ones who have been
in power for many years. Its relatively weak substantive e↵ect, however, suggests that experience
plays less of a role in assessing resolve than the other attributes discussed above. Similarly weak
are the e↵ects of gender: participants are no more likely to attribute resolve to male leaders than
female ones. The null results of gender are noteworthy given the prominent role that evolutionary
psychologists attribute masculinity in our conceptions of leadership qualities in military contexts
(Spisak et al., 2012), and could be due to perceived selection e↵ects, in which female politicians who
are able to come to power are seen as no di↵erent than their male counterparts.11
11 It is also possible that the null results of the gender treatment are due to the subtlety of the treatment, but since
experimental work on prejudice and discrimination in other contexts employs manipulations of similar dosage, we find
the selection e↵ect argument more plausible.
21
The one leader-level characteristic that participants do use as a heuristic to predict resolve is
military experience: countries governed by leaders with extensive military experience are seen as
7.8% more resolved than those without any military experience; even those leaders with only brief
time in the military get a boost in perceived resolve, of 3.3%. These results are thus consistent
with work by Horowitz and Stam (2012) showing the distinctive foreign policy behavior of leaders
with military experience, although unlike in their work, we do not distinguish between leaders with
military backgrounds who have combat experience, and leaders with military backgrounds who do
not. In general, though, the situational features of the crisis in terms of the balance of capabilities
and interests between the belligerents contributes more to perceptions of resolve than characteristics
of leaders themselves. Political scientists are often criticized for our exceptionalism in downplaying
the role that individual leaders matter in world politics (Byman and Pollack, 2001). Regardless of
the extent to which leaders matter in international politics (Jervis, 2013), it is of interest that at
least with the characteristics we manipulate here, ordinary citizens tend to ascribe a greater role to
broader structural factors than leader characteristics.12
Past behavior
Past behavior represents a theoretically important set of attributes. In the experiment, we manipulated the country’s previous behavior in a foreign policy dispute in four di↵erent ways. First, we
described whether it initiated the crisis by making a public threat, or was instead the target of
the threat. Second, we manipulated whether the state the country was facing in the previous crisis
was itself an ally or adversary of the United States. Third, we varied the outcome of the previous
crisis: whether the country ultimately stood firm, or backed down. Finally, we identified whether
the leader of the country at the time of the dispute is the same leader as the one in the current crisis,
or whether a di↵erent leader is currently in charge. In supplementary analyses in the appendix we
interact all four of these past behavior variables with one another, but for ease of interpretation in
the analysis presented in Figure 2 we reduce the number of quantities of interest and only interact
the previous outcome and leader identity treatments, while presenting the average e↵ects of the target/initiator and against ally/against adversary treatments. While neither of the latter significantly
a↵ects perceptions of resolve, we see theoretically interesting e↵ects for the former.
12 Of course, as the survey instrument was hypothetical, it is also possible that real leaders with whom subjects
were familiar would yield di↵erent results.
22
First, states that backed down in their previous dispute are seen as significantly less resolved in
the current crisis, o↵ering evidence that participants are indeed drawing reputational inferences from
past behavior. If the leader in charge was the same leader as in the current crisis, backing down
corresponds with an 11% decrease in the perceived likelihood of standing firm, while a di↵erent
leader backing down causes a 9.1% decrease in the perceived likelihood of standing firm. Thus,
although it appears that backing down in past disputes is more informative under the same leader
than a di↵erent one, the di↵erences between the two e↵ects are not statistically significant. In
contrast, when the country stood firm in the previous dispute under the same leader, it is perceived
as being 4.9% more likely to stand firm than when the country stood firm in the previous dispute
under a di↵erent leader. Thus, we find evidence of leader-specific reputations for standing firm, but
not for backing down.These results are sensible — consistent with an analogical reasoning model
of reputational inference in which observers draw stronger inferences from past behavior the more
the previous context resembles the current one (Shannon and Dennis, 2007) — but not trivial.13
Jervis (1982, 12-13), for example, notes a reputation paradox in which states that have backed down
in the past may be more likely to stand firm in the future precisely to avoid incurring reputation
costs, which raises the prospect that observers will expect states “to follow retreats with displays of
firmness”, a pattern we do not see here (see also Nalebu↵, 1991).
Current behavior
Finally, we turn to the e↵ect of current behavior on perceptions of resolve. Consistent with the
literature on the informative value of public threats because of their ability to tie leaders’ hands
if they don’t follow through (Fearon, 1997), we see that countries that make public threats are
perceived as 6.4% more likely to stand firm. Interestingly, though, participants see sunk costs in
terms of troop mobilizations to be a more credible signal, with states that mobilize troops seen as
12% more likely to stand firm. Our participants don’t see talk as cheap, then, but they do see it as
less costly than other signals leaders can send.14
13 Following an analogical reasoning model, we would expect to find past behavior to be seen as even more predictive
of future behavior in cases where the opponent in the previous crisis is the same as in the current one.
14 Sinking costs and tying hands are ideal types of costly signals, but in reality many forms of signals involve a
combination of the two. For example, a state that it is publicly mobilizing its troops can be seen not only as incurring
sunk costs, but also as tying its hands, since the state can incur reputational costs for backing down. Similarly,
as Slantchev (2005) argues, such military signals can also create bargaining advantages for the mobilizing side by
shifting the balance of power in the crisis situation; our results are thus consistent with Sechser and Post (2015) on
the superiority of muscle-flexing over hands tying.
23
3.2
Complex e↵ects
The analyses described above focus on the average e↵ects of each of the attributes we manipulate
in the experiment. They therefore tell us, on average, whether participants see states with powerful
militaries (for example) as more likely to stand firm than states with less powerful militaries, and
by how much.15 Many of our theories about crisis bargaining, however, suggest more complex
hypotheses that cannot be tested by simply drawing comparisons between main e↵ects, and require
us to specify higher-order interactions between our causal factors. Theories of reputation in IR, for
example, typically involve asking questions like “under what circumstances do past actions lead to
reputations for resolve?”, which requires us to study how the e↵ects of past behavior are moderated
by these other factors.
First, Figure 3 tests the Current Calculus hypothesis, which we operationalize by interacting a
state’s capabilities, interests, and outcome of the previous conflict. We can think of two versions of
the Current Calculus hypothesis. One is a soft version, in which states do not simply look at past
behavior, but also turn to current capabilities and interests when calculating credibility. The other
is a hard hypothesis, that suggests that the e↵ects of capabilities and interests trump past behavior
altogether. As the quantities in Figure 3 illustrate, we find some evidence in favor of the soft version
of the hypothesis. Observers indeed look towards capabilities and interests in calculating credibility:
for example, a country that stood firm in the past, but has low capabilities and low stakes, is seen
as nearly 20% less likely to stand firm than a country that backed down in the past, but has high
capabilities and high stakes. Thus, the current balance of power and interests can counterbalance
previous behavior; states that backed down in the past are not doomed to appear irresolute in future
disputes.
At the same time, however, past behavior still matters, and observers do draw inferences from
previous actions. At every combination of capabilities and interests, participants see standing firm in
the past as significantly boosting the likelihood of standing firm in the present, and the informative
value of past behavior does not vary across di↵erent levels of current capabilities and interests.16
Second, Figure 4 tests the attribution theory hypothesis which holds that observers are more
15 Supplementary analyses in the appendix also investigate whether the magnitude of our treatment e↵ects varies
across participant characteristics. Importantly, we show that the treatment e↵ects are generally consistent across
di↵erent subgroups of our sample: high- and low-educated respondents, Democrats and Republicans, doves and
hawks all tend to employ similar heuristics when assessing resolve in these disputes.
16 Indeed, BIC scores and likelihood ratio tests suggest the interactive model does not fit the data significantly better
than an additive one.
24
Figure 3: Testing the current calculus hypothesis: past behavior matters, but its e↵ects can be
overcome
Backed down, high capabilities, high stakes
Backed down, low capabilities, high stakes
Backed down, high capabilities, low stakes
Backed down, low capabilities, low stakes
Stood firm, high capabilities, high stakes
Stood firm, low capabilities, high stakes
Stood firm, high capabilities, low stakes
Stood firm, low capabilities, low stakes
-0.4
-0.2
0.0
Average marginal component effect (AMCE)
Figure 3 plots Average Marginal Component E↵ects (AMCEs) with 95% clustered bootstrapped confidence intervals.
Positive values indicate that the combination of attributes increases the perceived likelihood that an actor will stand
firm, while negative values indicate that the combination of attributes decreases the perceived likelihood of an actor
standing firm. The results o↵er partial support for the current calculus hypothesis: although actors do draw inferences
about resolve based on whether an actor stood firm or backed down in the past, it isn’t the only factor: if an actor
backed down in the past dispute but its capabilities and stakes are both high, observers will still perceive it as being
significantly more resolved than an actor that stood firm in the past dispute and has high capabilities but low stakes.
However, past behavior is always seen as informative, regardless of current capabilities and stakes. The quantities of
interest here are calculated by interacting whether an actor stood firm or backed down in the previous crisis, by its
level of capabilities and interests. The same pattern of results is obtained if we only include cases where the actor
makes a public threat, or only include cases where the actor either makes a public threat or mobilizes troops.
25
0.2
Figure 4: Testing the attribution theory hypothesis: actors can still build reputations with desirable
behavior
Ally, backed down
Adversary, backed down
Adversary, stood firm
Ally, stood firm
-0.2
-0.1
0.0
0.1
0.2
Average marginal component effect (AMCE)
Figure 4 plots Average Marginal Component E↵ects (AMCEs) with 95% clustered bootstrapped confidence intervals.
Positive values indicate that the combination of attributes increases the perceived likelihood that an actor will stand
firm, while negative values indicate that the attribute decreases the perceived likelihood of an actor standing firm.
The results fail to o↵er support for the attribution theory hypothesis: allies who stand firm indeed gain reputations
for resolve, while adversaries who back down gain reputations for irresolution. The quantities of interest here are
calculated by interacting whether the actor is an ally or adversary of the US by whether the actor stood firm or
backed down in the previous crisis. The same pattern of results is obtained if we further interact the attributes by
whether the country’s previous crisis was against an ally or adversary of the US: thus, adversaries who back down
against US allies gain the same reputation for irresolution that adversaries who back down against other US adversaries
do, while allies who stand firm against US adversaries gain the same reputation for resolve that allies who stand firm
against other US allies do.
26
likely to make situational attributions for desirable actions, and dispositional ones for undesirable
ones, thereby implying that adversaries will be unable to lose reputations for resolve by backing
down, while allies will be unable to gain reputations for resolve by standing firm. We operationalize
this hypothesis by interacting the outcome of the previous crisis with whether the country was an
ally or adversary of the United States, and bolding the two relevant quantities of interest in Figure
4.
As the figure illustrates, we fail to find evidence in favor of the attribution theory hypothesis:
allies who stood firm in the past indeed gain reputations for resolve and are seen as more likely
to stand firm in the current crisis, while adversaries who backed down in the past indeed gain
reputations for irresoluteness and are seen as less likely to stand firm in the current crisis. In
supplementary analyses, we operationalize the concept of “desirability” further by interacting the
ally/adversary and past outcome treatments with the identity of the other state in the previous
dispute, letting us explicitly test whether adversaries who back down against allies in the past develop
di↵erent reputations for resolve than adversaries who backed down against other adversaries, and
so on. The results hold, indicating that the e↵ects of standing firm or backing down in the past is
not moderated by the relationship of the previous opponent in the dispute to the United States.
Finally, Figure 5 tests the democratic credibility hypothesis which holds that democracies —
compared to their non-democratic counterparts — are better able to harness the signaling advantages
of domestic audience costs and issue more credible threats. We operationalize the hypothesis by
dropping those observations where countries mobilize troops — leaving only those where counties
either did nothing or issued a threat — and interact the country’s regime type with whether it
issued a public threat. Our key quantities of interest are bolded in Figure 5. The results show that
regimes of all types are able to credibly threaten. Importantly, though, democracies do not appear
to display any unique advantages in this regard: democracies who issue threats are not seen as any
more likely to stand firm than dictatorships who issue threats, nor is the e↵ect of a public threat
larger for democracies than for dictatorships. Our results are thus consistent with Weeks (2008),
who argues that autocracies are also able to create audience costs, and Downes and Sechser (2012),
who find that the democratic advantage in crisis bargaining is overstated.
27
Figure 5: Testing the democratic credibility hypothesis: democracies are not seen as making more
credible threats than dictatorships
Dictatorship, nothing
Dictatorship, public threat
Mixed, nothing
Mixed, public threat
Democracy, nothing
Democracy, public threat
-0.2
-0.1
0.0
0.1
0.2
Average marginal component effect (AMCE)
Figure 5 plots Average Marginal Component E↵ects (AMCEs) with 95% clustered bootstrapped confidence intervals.
Positive values indicate that the combination of attributes increases the perceived likelihood that an actor will stand
firm, while negative values indicate that the attribute decreases the perceived likelihood of an actor standing firm. The
results fail to o↵er support for the democratic credibility hypothesis: although public threats increase the perceived
resolve of both democracies and dictatorships relative to when actors do not issue threats, democracies do not get a
bigger credibility boost from public threats than dictatorships do. The quantities of interest here are calculated by
interacting an actor’s regime type with whether an actor issued a public threat or not in the foreign policy crisis,
dropping those cases where the actor mobilized troops instead (a sunk cost rather than hands-tying signal, about
which democratic credibility theory is agnostic).
28
4
Conclusion
What heuristics do observers use to infer the resolve of their allies and adversaries? International
relations theory has highlighted, implicitly or explicitly, a plethora of indicators that should a↵ect
observers’ calculations of resolve. In this paper we have argued that, conceptually, the extant
studies on resolve inferences appear to cluster around two families of indicators. The first includes
behavioral indicators of resolve, in particular, the past and current crisis behavior of states. The
second cluster includes several state-level (i.e., capabilities, interests, alliances and regimes type) and
leader-specific (i.e. leaders’ experience, time in office and gender) characteristics. Using a conjoint
experimental design we test the relative importance of particular behavior-based and characteristicbased indicators in shaping observers’ inferences about resolve of adversaries and allies — and in so
doing, test observable implications from a wide range of theoretical frameworks across the discipline.
Our analysis reveals a great deal about how ordinary citizens make inferences about resolve
in international crises when faced with a contextually rich information environment. For example,
we find that contrary to theories of democratic triumphalism, observers do not seem to attach
much weight to the regime type of countries when inferring resolve, and tend to see democracies
as significantly less resolved than dictatorships. Capabilities and interests, however, stood out as
important factors: a state whose stakes were high in a crisis were seen as 15% more likely to stand
firm than a state with a lower level of interest. We also found evidence for the importance of
leader characteristics, though, not all of them: while time in office did not impact estimates of
resolve, military experience had a significant e↵ect, roughly half the magnitude of that of high
stakes, traditionally seen as one of the more powerful explanations in the IR literature.
In favor of the “confident theoretical beliefs” of many IR scholars (Dafoe, Renshon and Huth,
2014, 384) and against the “reputation paradox”, we do find strong evidence that past actions a↵ect
current estimates of resolve; past actions speak about as loudly as present ones. And, in keeping
with other work on this topic, we find additional evidence that reputations do not attach only to
the state; in fact, past actions undertaken with the same leader are more informative of current
resolve than past actions taken prior to a leadership turnover (Renshon, Dafoe and Huth, 2015).
We also find that countries are able to use public threats to increase estimates of their likelihood of
standing firm, though costlier actions (such as mobilizing troops) are even more e↵ective as a tool
for manipulating assessments of resolve.
29
By carefully designing appropriate statistical tests, we were also able to o↵er insight into a
number of interactionist theories of the assessment of resolve. For example, we found evidence in
favor of a soft version of Press’ (2005) “current calculus” theory of resolve: observers look to interests
and capabilities as heuristics in assessing resolve, but past behavior matters as well. And while we
fail to find evidence in favor of an attributional theory of reputation — allies seem to be able to
generate reputations for resolve and adversaries can be seen as “irresolute” — we also find evidence
in favor of Weeks’ (2008) claim that democracies and autocracies should not di↵er in their abilities
to generate audience costs.
Some of the results we see here are likely shaped by the current international political climate:
participants’ democratic defeatism, for example, also reflects an era in which authoritarian states
are increasingly flexing their muscles on the world stage (Gat, 2007). Yet it is similarly difficult
to disentangle IR scholars’ enchantment with democratic triumphalism throughout the 1990s from
the buoyant “end of history” in which they were writing. In this sense, we might expect that the
weight observers place on the factors we explore here are likely to shift over time. Future work
should also explore how they shift across space; the fact that we detect democratic defeatism among
American respondents, for example, who are likely to be highly socialized to venerate the superiority
of democratic political institutions, suggests we might expect even stronger negative e↵ects for regime
type in other states.
Our interest here was in exploring the heuristics that ordinary citizens use in assessing resolve in
disputes, as observers, which we believe provide useful foundations for future work on resolve in IR.
Indeed, one of the striking patterns in our results was the extent to which the three variables with
the substantively largest e↵ects — capabilities, stakes, and mobilization — are also three of the core
ingredients of rational deterrence theory. Ordinary citizens, without ever having read Brodie (1959);
Snyder (1961) or Huth and Russett (1984), seem to intuitively carry a “folk” version of deterrence
theory around in their heads (Kertzer and McGraw, 2012).
If we were interested in extending our results to elite decision-makers engaged in crisis decisionmaking in future work, there are four issues to consider. The first and most obvious issue is the
nature of the sample itself. While Press (2004), Hopf (1994) and Weisiger and Yarhi-Milo (2015)
leverage historical accounts of political leaders making inferences about resolve, we use a sample of
ordinary citizens. While we see this as the only appropriate first step in experimental designs, future
work should seek to extend these findings on elite samples, a question we are currently investigating.
30
However, we caution those expecting to harness the magic of elite samples: such designs are only
appropriate once sufficient progress has been made to establish that elites di↵er from the general
public on dimensions that are relevant to the concept being studied (such as personal power, Renshon,
2015). After all, even if elites and ordinary citizens di↵er on a variety of attributes (Hafner-Burton,
Hughes and Victor, 2013), these dispositional di↵erences should only a↵ect our experimental results
if they moderate our treatment e↵ects (Druckman and Kam, 2011). Moreover, even elite samples
rarely include the high-level decision-makers (such as Presidents and Prime Ministers) that are truly
the focus of our theories, thus requiring even elite studies to grapple with the issues of extrapolation
and generalizability.17
Second, leaders routinely make decisions in groups (Allison, 1971; Steinbruner, 1974), while our
participants participated in the study as individuals. Third, crisis decision-making environments
are characterized by time pressures, which were largely absent in our study. Although we are not
aware of any evidence that small group environments change the extent to which observers rely on
actor-based rather than behavior-based heuristics, and there is little theoretical guidance from the
decision-making literature about which types of cues should receive greater weight as time pressure
increases (beyond order e↵ects that conjoint experiments are designed to mitigate), future work
would benefit from such an investigation.
Fourth, and relatedly, the stakes for decision-makers in foreign policy crises are high, such that
decision-makers should have strong incentives for their assessments to be accurate (Tetlock, 1998).
Indeed, Press (2005, 6) contends that in high-stakes military situations, “people move beyond the
quick and dirty heuristics that serve them so well in mundane matters,” which implies that any
findings from an artificial experimental context might be difficult to apply to the realm of foreign
policy decision-making. Two points are worth keeping in mind here. First, from a psychological
perspective the current credibility calculus Press points to is no less of a heuristic than a past actions
are; they are both mental shortcuts used to simplify a complex information environment, and neither
one is more or less “rational.”18 As before, then, since we have no decision-making theories that tell
us how the relative contribution of di↵erent heuristics changes with levels of stakes, the question
is ultimately empirical rather than theoretical, and merits exploration in future research. Second,
although experimental research suggests high stakes do not impose anything like rationality upon
17 For
18 For
a thoughtful discussion of the nature of our samples in IR experiments, see Hyde (2015).
a discussion of the psychological microfoundations of rationality, see Rathbun, Kertzer and Paradis (2014).
31
individual decision-making (Ariely et al., 2009), there is some evidence that low and moderate stakes
induced through financial incentives can have salutary e↵ects (Camerer and Hogarth, 1999). However, establishing such an incentive structure in an experimental context requires a comprehensive
theory of how observers should assess resolve — the absence of which was one of the motivations behind this research in the first place. More generally, since our findings speak to how observers assess
resolve, rather than how participants in the disputes themselves do, future work should investigate
whether actors and observers rely on di↵erent mechanisms when it comes to drawing inferences
about resolve, a question we explore in a companion piece.
Finally, we note that experimental designs such as this one can be particularly useful in formalizing, isolating and testing multi-dimensional theories in international relations, without the concerns
about endogeneity, collinearity, and aggregation that IR scholars must wrestle with when they use
observational data (Tomz and Weeks, 2013). We drew from a varied and rich literature in reputations, credibility and resolve in IR to construct our conceptual framework. However, it would be
a shame were the feedback loop to end here. One of the most useful directions for future research
on this topic — in addition to conceptual replications and extensions of this experiment — would
be for the results to feed back into future case studies on assessing resolve in IR. In that way, our
theories of resolve will accumulate the advantages of each particular method and data source while
minimizing the harms of each method’s weaknesses.
32
References
Allison, Graham. 1971. Essence of Decision. Boston: Little Brown and Company.
Anzia, Sarah F and Christopher R Berry. 2011. “The Jackie (and Jill) Robinson E↵ect: Why Do
Congresswomen Outperform Congressmen?” American Journal of Political Science 55(3):478–
493.
Ariely, Dan, Uri Gneezy, George Loewenstein and Nina Mazar. 2009. “Large stakes and big mistakes.” The Review of Economic Studies 76(2):451–469.
Arregu´ın-Toft, Ivan. 2001. “How the Weak Win Wars: A Theory of Asymmetric Conflict.” International Security 26(1):93–128.
Bak, Daehee and Glenn Palmer. 2010. “Testing the Biden Hypotheses: Leader Tenure, Age, and
International Conflict.” Foreign Policy Analysis 6(3):257–273.
Berinsky, Adam J, Gregory A Huber and Gabriel S Lenz. 2012. “Evaluating online labor markets
for experimental research: Amazon.com’s mechanical turk.” Political Analysis 20(3):351–368.
Brodie, Bernard. 1959. “The Anatomy of Deterrence.” World Politics 11(2):173–191.
Buhrmester, M., T. Kwang and S.D. Gosling. 2011. “Amazon’s mechanical turk.” Perspectives on
Psychological Science 6(1):3–5.
Byman, Daniel L. and Kenneth M. Pollack. 2001. “Let Us Now Praise Great Men: Bringing the
Statesman Back In.” International Security 25(4):107–146.
Camerer, Colin F. and Robin M. Hogarth. 1999. “The e↵ects of financial incentives in experiments.”
Journal of Risk and Uncertainty 19(1):7–42.
Caprioli, Mary and Mark A Boyer. 2001. “Gender, violence, and international crisis.” Journal of
Conflict Resolution 45(4):503–518.
Christensen, Thomas J. and Jack Snyder. 1990. “Chain Gangs and Passed Bucks: Predicting Alliance
Patterns in Multipolarity.” International Organization 44(2):137–168.
Clare, Joe and Vesna Danilovic. 2012. “Reputation for Resolve, Interests, and Conflict.” Conflict
Management and Peace Science 29(1):3–27.
Clutton-Brock, T.H. and S.D. Albon. 1979. “The Roaring of Red Deer and the Evolution of Honest
Advertisement.” Behaviour 69(3/4):145–170.
Copeland, Dale C. 1997. “Do Reputations Matter?” Security Studies 7(1):33–71.
Dafoe, Allan. 2012. Resolve, Reputation, and War: Cultures of Honor and Leaders’ Time-in-Office
PhD thesis University of California, Berkeley.
Dafoe, Allan, Jonathan Renshon and Paul Huth. 2014. “Reputation and Status as Motives for War.”
Annual Review of Political Science 17:371–393.
Danilovic, Vesna. 2001. “The Sources of Threat Credibility in Extended Deterrence.” Journal of
Conflict Resolution 45(3):341–369.
Davies, Graeme AM and Robert Johns. 2013. “Audience costs among the British public: the
impact of escalation, crisis type, and prime ministerial rhetoric.” International Studies Quarterly
57(4):725–737.
33
Davies, N.B. 1978. “Territorial Defence in the Speckled Wood Butterfly (Pararge Aegeria): The
Resident Always Wins.” Animal Behavior 26:138–147.
Davies, N.B. and T.R. Halliday. 1978. “Deep croaks and fighting assessment in toads Bufo bufo.”
Nature 274:683–685.
Desch, Michael C. 2008. Power and Military E↵ectiveness: The Fallacy of Democratic Triumphalism.
Baltimore, MD: The Johns Hopkins University Press.
Downes, Alexander and Todd Sechser. 2012. “The Illusion of Democratic Credibility.” International
Organization 66(3):457–489.
Druckman, James N. and Cindy D. Kam. 2011. Students as Experimental Participants: A Defense
of the “Narrow Data Base”. In Cambridge Handbook of Political Science, ed. James N. Druckman,
Donald P. Green, James H. Kuklinski and Arthur Lupia. Cambridge University Press pp. 41–57.
Eliades, George C. 1993. “Once More unto the Breach: Eisenhower, Dulles, and Public Opinion
during the O↵shore Islands Crisis of 1958.” Journal of American-East Asian Relations pp. 343–
367.
Fearon, James. 1997. “Tying Hands versus Sinking Costs: Signaling Foreign Policy Interests.”
Journal of Conflict Resolution 41(1):68–90.
Fearon, James D. 1994. “Domestic Political Audiences and the Escalation of International Disputes.”
American Political Science Review 88(3):577–592.
Ferreira, Fernando and Joseph Gyourko. 2014. “Does gender matter for political leadership? The
case of US mayors.” Journal of Public Economics 112:24–39.
Foyle, Douglas C. 1999. Counting the public in: Presidents, public opinion, and foreign policy.
Columbia University Press.
Ganswindt, Andr´e, Henrik B. Rasmussen, Michael Heistermann and J. Keith Hodges. 2005. “The
sexually active states of free-ranging male African elephants (Loxodonta africana): defining musth
and non-musth using endocrinology, physical signals, and behavior.” Hormones and Behavior
47(1):83–91.
Gat, Azar. 2007. “The Return of Authoritarian Great Powers.” Foreign A↵airs 86(4):59–69.
Gelpi, Christopher and Joseph M. Grieco. 2001. “Attracting Trouble: Democracy, Leadership
Tenure, and the Targeting of Militarized Challenges, 1918-1992.” Journal of Conflict Resolution
45(6):794–817.
Gelpi, Christopher and Joseph M. Grieco. 2014. “Competency Costs in Foreign A↵airs: Presidential
Performance in International Conflicts and Domestic Legislative Success, 1953–2001.” American
Journal of Political Science Forthcoming.
Hafner-Burton, Emilie M., Alex Hughes and David G. Victor. 2013. “The Cognitive Revolution and
the Political Psychology of Elite Decision Making.” Perspectives on Politics 11(2):368–386.
Hainmueller, Jens and Daniel J Hopkins. 2014. “The hidden american immigration consensus: A
conjoint analysis of attitudes toward immigrants.” American Journal of Political Science Forthcoming.
Hainmueller, Jens, Daniel J Hopkins and Teppei Yamamoto. 2014. “Causal inference in conjoint
analysis: Understanding multidimensional choices via stated preference experiments.” Political
Analysis 22(1):1–30.
34
Hainmueller, Jens, Dominik Hangartner and Teppei Yamamoto. 2015. “Validating vignette and
conjoint survey experiments against real-world behavior.” Proceedings of the National Academy
of Sciences .
Healy, Andrew and Gabriel S Lenz. 2014. “Substituting the End for the Whole: Why Voters Respond
Primarily to the Election-Year Economy.” American Journal of Political Science 58(1):31–47.
Heider, Fritz. 1958. The Psychology of Interpersonal Relations. New York: Wiley.
Herrmann, Richard K. and Michael P. Fischerkeller. 1995. “Beyond the Enemy Image and Spiral
Model: Cognitive-Strategic Research After the Cold War.” International Organization 49(3):415–
50.
Holbrook, Colin and Daniel M.T. Fessler. 2013. “Sizing up the threat: The envisioned physical
formidability of terrorists tracks their leaders’ failures and successes.” Cognition 127:46–56.
Hopf, Ted. 1991. Soviet Inferences from Their Victories in the Periphery: Visions of Resistance or
Cumulating Gains? In Dominoes and Bandwagons: Strategic Beliefs and Great Power Competition
in the Eurasian Rimland, ed. Robert Jervis and Jack Snyder. Oxford: Oxford University Press
pp. 145–189.
Hopf, Ted. 1994. Peripheral visions: Deterrence theory and American foreign policy in the third
world, 1965-1990. Ann Arbor, MI: University of Michigan Press.
Horowitz, Michael C. and Allan C Stam. 2012. “How Prior Military Experience Influences The
Future Militarized Behavior of Leaders.” International Organization 68(3):527–559.
Horowitz, Michael, Rose McDermott and Allan C Stam. 2005. “Leader age, regime type, and violent
international relations.” Journal of Conflict Resolution 49(5):661–685.
Hu↵, Connor and Dustin H. Tingley. 2014. “Who Are These People?: Evaluating the Demographics
Characteristics and Political Preferneces of MTurk Survey Respondents.” Working Paper.
Huth, Paul and Bruce Russett. 1984. “What Makes Deterrence Work? Cases from 1900 to 1980.”
World Politics 36(4):496–526.
Huth, Paul K. 1999. “Deterrence and International Conflict: Empirical Findings and Theoretical
Debates.” Annual Review of Political Science 2:25–48.
Hyde, Susan D. 2015. “Experiments in International Relations: Lab, Survey, and Field.” Annual
Review of Political Science Forthcoming.
Jervis, Robert. 1970. The Logic of Images in International Relations. Princeton, NJ: Princeton
University Press.
Jervis, Robert. 1982. “Deterrence and Perception.” International Security 7(3):3–30.
Jervis, Robert. 2013. “Do Leaders Matter and How Would We Know?” Security Studies 22(2):153–
179.
Jervis, Robert, Richard Ned Lebow and Janice Gross Stein. 1985. Psychology and Deterrence.
Baltimore: John Hopkins University Press.
Johnson, Dominic DP, Rose McDermott, Emily S Barrett, Jonathan Cowden, Richard Wrangham,
Matthew H McIntyre and Stephen Peter Rosen. 2006. “Overconfidence in wargames: experimental
evidence on expectations, aggression, gender and testosterone.” Proceedings of the Royal Society
B: Biological Sciences 273(1600):2513–2520.
35
Kam, Cindy D and Elizabeth N Simas. 2010. “Risk orientations and policy frames.” The Journal
of Politics 72(2):381–396.
Kennan, George F. 1951. American Diplomacy, 1900-1950. Chicago: University of Chicago Press.
Kertzer, Joshua D. 2015. “Resolve in International Politics.” Book manuscript.
Kertzer, Joshua D. and Kathleen M. McGraw. 2012. “Folk Realism: Testing the Microfoundations
of Realism in Ordinary Citizens.” International Studies Quarterly 56(2):245–258.
Kertzer, Joshua D. and Ryan Brutger. 2015. “Decomposing Audience Costs: Bringing the Audience
Back into Audience Cost Theory.” American Journal of Political Science Forthcoming.
Kriner, Douglas L. and Francis X. Shen. 2014. “Reassessing American Casualty Sensitivity: The
Mediating Influence of Inequality.” Journal of Conflict Resolution 58(7):1174–1201.
Krupnikov, Yanna and Adam Seth Levine. 2014. “Cross-Sample Comparisons and External Validity.”
Journal of Experimental Political Science 1(1):59–80.
Lake, David A. 1992. “Powerful Pacifists: Democratic States and War.” American Political Science
Review 86(1):24–37.
Lebow, Richard Ned. 1998. “Beyond Parsimony: Rethinking Theories of Coercive Bargaining.”
European Journal of International Relations 4(1):31–66.
Leeds, Brett Ashley. 2003. “Alliance Reliability in Times of War: Explaining State Decisions to
Violate Treaties.” International Organization 57(4):801–827.
Levendusky, Matthew S. and Michael C. Horowitz. 2012. “When Backing Down is the Right Decision:
Partisanship, New Information, and Audience Costs.” Journal of Politics 74(2):323–338.
Levy, Jack S., Michael K. McKoy, Paul Poast and Geo↵rey PR Wallace. 2015. “Commitments and
Consistency in Audience Costs Theory.” Working paper.
Lippmann, Walter. 1955. Essays in the Public Philosophy. Boston: Little, Brown.
Lopez, Anthony C., Rose McDermott and Michael Bang Petersen. 2011. “States in Mind: Evolution,
Coalitional Psychology, and International Politics.” International Security 36(2):48–83.
Mason, W. and S. Suri. 2010. “Conducting behavioral research on Amazon’s Mechanical Turk.”
Behavior Research Methods pp. 1–23.
May, Ernest R. 1961. Imperial Diplomacy: The Emergence of America as a Great Power. New York:
Harcourt, Brace and Company.
Maynard Smith, J. 1974. “The theory of games and the evolution of animal conflicts.” Journal of
Theoretical Biology 47(1):209–221.
McDermott, Rose, Dominic Johnson, Jonathan Cowden and Stephen Rosen. 2007. “Testosterone
and aggression in a simulated crisis game.” The Annals of the American Academy of Political and
Social Science 614(1):15–33.
McIntyre, Matthew H, Emily S Barrett, Rose McDermott, Dominic DP Johnson, Jonathan Cowden
and Stephen P Rosen. 2007. “Finger length ratio (2D: 4D) and sex di↵erences in aggression during
a simulated war game.” Personality and Individual Di↵erences 42(4):755–764.
Mercer, Jonathan. 1996. Reputation and international politics. Ithaca, NY: Cornell University Press.
36
Morrow, James D. 1989. “Capabilities, Uncertainty, and Resolve: A Limited Information Model of
Crisis Bargaining.” American Journal of Political Science 33(4):941–72.
Morrow, James D. 1994. “Alliances, credibility, and peacetime costs.” Journal of Conflict Resolution
38(2):270–297.
Nalebu↵, Barry. 1991. “Rational Deterrence in an Imperfect World.” World Politics 43(3):313–35.
Nisbett, Richard E. and Dov Cohen. 1996. Culture of Honor: The Psychology of Violence in the
South. Boulder, CO: Westview Press.
Parker, G.A. 1974. “Assessment strategy and the evolution of fighting behavior.” Journal of Theoretical Biology 47(1):223–243.
Parker, G.A. and D.I. Rubenstein. 1981. “Role Assessment, Reserve Strategy, and Acquisition of
Information in Asymmetric Animal Conflicts.” Animal Behavior 29:221–240.
Pietraszewski, David and Alex Shaw. 2015. “Not by Strength Alone: Children’s Conflict Expectations Follow the Logic of the Asymmetric War of Attrition.” Human Nature Forthcoming.
Powell, Robert. 2004. “Bargaining and Learning While Fighting.” American Journal of Political
Science 48(2):344–361.
Press, Daryl Grayson. 2004. “The Credibility of Power: Assessing Threats during the “Appeasement”
Crises of the 1930s.” International Security 29(3):136–169.
Press, Daryl Grayson. 2005. Calculating credibility: How leaders assess military threats. Ithaca, NY:
Cornell University Press.
Rand, D.G. 2012. “The promise of Mechanical Turk: How online labor markets can help theorists
run behavioral experiments.” Journal of Theoretical Biology 299(21):172–179.
Rathbun, Brian C., Joshua D. Kertzer and Mark Paradis. 2014. “Homo Diplomaticus: MixedMethod Evidence of Variation in Strategic Rationality.” Working paper.
Reiter, Dan and Allan C Stam. 2002. Democracies at War. Princeton, NJ: Princeton University
Press.
Renshon, Jonathan. 2015. “Losing Face and Sinking Costs: Experimental Evidence on the Judgment
of Political and Military Leaders.” International Organization .
Renshon, Jonathan, Allan Dafoe and Paul K. Huth. 2015. “To Whom do Reputations Adhere?
Experimental Evidence on Influence-Specific Reputations.” Working paper.
Schelling, Thomas C. 1960. The Strategy of Conflict. Cambridge, MA: Harvard University Press.
Schelling, Thomas C. 1966. Arms and Influence. New Haven, CT: Yale University Press.
Schultz, Kenneth A. 1999. “Do Democratic Institutions Constrain or Inform? Contrasting Two
Institutional Perspectives on Democracy and War.” International Organization 53(2):233–266.
Sechser, Todd S. and Abigail S. Post. 2015. “Hand-Tying versus Muscle-Flexing in Crisis Bargaining.” Working paper.
Sell, Aaron, John Tooby and Leda Cosmides. 2009. “Formidability and the logic of human anger.”
Proceedings of the National Academy of Sciences 106(35):15073–15078.
37
Sell, Aaron, Leda Cosmides, John Tooby, Daniel Sznycer, Christopher von Rueden and Michael
Gurven. 2009. “Human adaptations for the visual assessment of strength and fighting ability from
the body and face.” Proceedings of the Royal Society B: Biological Sciences 275:575–584.
Shannon, Vaughn P. and Michael Dennis. 2007. “Militant Islam and the Futile Fight for Reputation.”
Security Studies 16(2):287–317.
Slantchev, Branislav L. 2005. “Military Coercion in Interstate Crises.” American Political Science
Review 99(4):533–547.
Snyder, Glenn. 1961. Deterrence and Defense. Princeton, NJ: Princeton University Press.
Snyder, Glenn. 1984. “The Security Dilemma in Alliance Politics.” World Politics 36(4):461–95.
Snyder, Glenn H. and Paul Diesing. 1977. Conflict Among Nations: Bargaining, Decision Making,
and System Structure in International Crises. Princeton, NJ: Princeton University Press.
Snyder, Jack and Erica D. Borghard. 2011. “The Cost of Empty Threats: A Penny, Not a Pound.”
American Political Science Review 105(3):437–456.
Spisak, Brain R., Peter H. Dekker, Max Kr¨
uger and Mark van Vugt. 2012. “Warriors and Peacekeepers: Testing a Biosocial Implicit Leadership Hypothesis of Intergroup Relations Using Masculine
and Feminine Faces.” PLoS ONE 7(1):e30399.
Steinbruner, John D. 1974. The Cybernetic Theory of Decision. Princeton, NJ: Princeton University
Press.
Tetlock, Philip E. 1998. Social Psychology and World Politics. In The Handbook of Social Psychology,
ed. Daniel T. Gilbert, Susan T. Fiske and Gardner Lindzey. New York: McGraw-Hill pp. 868–912.
Tingley, Dustin H. and Barbara F. Walter. 2011. “The E↵ect of Repeated Play on Reputation
Building: An Experimental Approach.” International Organization 65(2):343–365.
Tomz, Michael. 2007. “Domestic audience costs in international relations: An experimental approach.” International Organization 61(4):821–840.
Tomz, Michael R and Jessica Weeks. 2013. “Public opinion and the democratic peace.” American
Political Science Review 107(4):849–865.
Trachtenberg, Marc. 2012. “Audience Costs: An Historical Analysis.” Security Studies 21(1):3–42.
Trager, Robert F and Lynn Vavreck. 2011. “The political costs of crisis bargaining: Presidential
rhetoric and the role of party.” American Journal of Political Science 55(3):526–545.
Tversky, Amos and Daniel Kahneman. 1981. “The framing of decisions and the psychology of
choice.” Science 211(4481):453–458.
Wallace, Geo↵rey PR. 2013. “International law and public attitudes toward torture: An experimental
study.” International Organization 67(01):105–140.
Weeks, Jessica L. 2008. “Autocratic Audience Costs: Regime Type and Signaling Resolve.” International Organization 62(1):35–64.
Weeks, Jessica L. 2012. “Strongmen and Straw Men: Authoritarian Regimes and the Initiation of
International Conflict.” American Political Science Review 106(2):326–347.
Weisiger, Alex and Keren Yarhi-Milo. 2015. “Revisiting Reputation: How Past Actions Matter In
International Politics.” International Organization Forthcoming.
Wolford, Scott. 2007. “The Turnover Trap: New Leaders, Reputation, and International Conflict.”
American Journal of Political Science 51(4):772–788.
38
How do Observers Assess Resolve?
Supplementary Appendix
April 7, 2015
1
Robustness checks
As we note in the main text, conjoint experimental designs rest on a small number of assumptions.
First, the stability and carryover e↵ects assumption, which mandates that potential outcomes remain
stable across periods — that is, a participant should choose a given country (conditional on its and
the other country’s attributes) regardless of what countries or choices they had seen previously.
Second, the no profile order e↵ects assumption, which posits that respondents’ choices would be the
same regardless of the order in which the two countries are presented within a choice task. Third
is successful randomization, which simply assumes that the attributes of each profile are randomly
generated. We test each of these assumptions in turn.
First, re-estimating the quantities of interest within each round suggest that the AMCEs are
stable across rounds, except for in the sixth round, where the treatment e↵ects are noticeably smaller
in that round than in all of its counterparts. These temporarily smaller e↵ect sizes are difficult to
explain theoretically and are unlikely to be due to respondent fatigue, since the treatment e↵ects in
subsequent rounds return to their previous levels. We replicate the results from the main analysis
in Figure 1, this time dropping the sixth round, and find that the results remain the same. Since
including the sixth round doesn’t substantively change our results (and if anything, renders them
slightly more conservative), we include all eight rounds in all of the empirical results reported in the
main text.
Second, Figure 2 tests the profile order assumption, showing that the results do not appear
to systematically di↵er based on whether a particular characteristic was presented as belonging
to country A or country B. Third, Table 1 presents the results from our randomization check,
showing that the treatment assignments are relatively well-balanced across a host of demographic
characteristics. Finally, Figure 3 tests whether the row order in which attributes were presented to
respondents within the experimental stimuli changes their results. Recall that although the order
of attributes was randomized across respondents, it was held constant across rounds for any given
respondent: that is, if a participant saw regime type in the first row of the table in the first round,
that participant saw regime type in the first row of the table throughout all seven subsequent rounds.
Figure 3 shows that there do not appear to be systematic di↵erences in the AMCEs by attribute
order: characteristics presented in the first row are not significantly stronger than those in the last
row, for example, and characteristics presented in the first and last rows are not systematically
di↵erent from those presented in the middle. Interestingly, although the row order randomization
is designed to prevent treatments featured more prominently in the table from having a stronger
e↵ect, in the ”real world”, media outlets, political entrepreneurs, and political elites are likely to
play a role in ensuring people are more likely to receive some treatments than others. Future work
could therefore benefit from exploring these added layers to the information environment in which
observers are situated.
1.1
Dropping unusual dyadic combinations
As noted in the main text, the analyses presented above randomly generate characteristics of both
countries in each dispute, such that the leader characteristics of one country, for example, are
independent of the leader characteristics of the other. To preclude the possibility that unusual
dyadic combinations of characteristics are skewing the results, Figure 4 below replicates the results
from the main set of analyses, but dropping all disputes where two democracies faced o↵ against
one another, or two allies of the United States faced o↵ against one another. A comparison of this
restricted set of observations (in grey) with the unrestricted set from the main analyses (in black)
shows that the results hold regardless of whether the dyads are included or not — it appears that
allies are seen as relatively more resolute in the restricted model than the unrestricted one, but the
di↵erence is not statistically significant.
1.2
Fully saturated interactions for past behavior
The analysis of the e↵ects of past behavior in the main text presents interactions between past
actions (stood firm versus backed down) and the identity of the leader responsible (a di↵erent leader
2
Table 1: Randomization check
(Intercept)
Male
Age
Education
Party ID
Mil Assert
(1)
High capabilities
[-0.173, 0.129]
[-0.095, 0.032]
[-0.004, 0.002]
[-0.021, 0.024]
[-0.021, 0.233]
[-0.148, 0.192]
(2)
High stakes
[-0.173, 0.128]
[-0.028, 0.1]
[-0.001, 0.004]
[-0.042, 0.001]
[-0.137, 0.104]
[-0.091, 0.223]
(3)
New leader
[-0.05, 0.269]
[-0.055, 0.076]
[-0.003, 0.002]
[-0.032, 0.012]
[-0.133, 0.128]
[-0.278, 0.031]
(4)
Male leader
[-0.178, 0.114]
[-0.014, 0.111]
[-0.003, 0.002]
[-0.016, 0.029]
[-0.151, 0.096]
[-0.172, 0.152]
(Intercept)
Male
Age
Education
Party ID
Mil Assert
(5)
Adversary
[-0.002, 0.28]
[-0.039, 0.089]
[-0.003, 0.003]
[-0.043, 0.003]
[-0.265, -0.001]
[-0.206, 0.115]
(6)
Against adversary
[-0.182, 0.118]
[-0.046, 0.078]
[-0.002, 0.003]
[-0.021, 0.026]
[-0.149, 0.104]
[-0.162, 0.153]
(7)
Initiator
[-0.115, 0.201]
[-0.084, 0.037]
[-0.004, 0.002]
[-0.024, 0.022]
[-0.232, 0.027]
[-0.025, 0.32]
(8)
Backed down
[-0.15, 0.142]
[-0.034, 0.091]
[-0.006, 0]
[-0.011, 0.034]
[-0.082, 0.18]
[-0.075, 0.262]
(Intercept)
Male
Age
Education
Party ID
Mil Assert
(9)
Same leader
[-0.09, 0.21]
[-0.052, 0.07]
[-0.006, 0]
[-0.02, 0.024]
[-0.058, 0.193]
[-0.168, 0.164]
(10)
Democracy
[-0.291, 0.087]
[-0.018, 0.131]
[-0.001, 0.006]
[-0.022, 0.034]
[-0.191, 0.116]
[-0.222, 0.134]
(11)
Mixed
[-0.341, 0.001]
[0.027, 0.172]
[0, 0.007]
[-0.021, 0.038]
[-0.141, 0.139]
[-0.235, 0.147]
(12)
High service
[-0.087 , 0.269]
[-0.121, 0.028]
[-0.006, 0.002]
[-0.031, 0.025]
[0.001, 0.292]
[-0.354, 0.051]
(Intercept)
Male
Age
Education
Party ID
Mil Assert
(13)
Some service
[-0.107, 0.256]
[-0.06, 0.091]
[-0.001, 0.005]
[-0.051, 0.003]
[-0.068, 0.23]
[-0.423, -0.025]
(14)
Mobilized troops
[-0.221, 0.132]
[-0.015, 0.144]
[-0.002, 0.004]
[-0.053, -0.001]
[-0.113, 0.211]
[-0.093, 0.306]
(15)
Public threat
[-0.167, 0.197]
[-0.019, 0.135]
[-0.001, 0.005]
[-0.069, -0.014]
[-0.168, 0.17]
[-0.113, 0.282]
Note: Models 1- 9 depict quantile-based clustered boostrapped 95% confidence intervals from a
series of logistic regression models, while models 10-15 depict the quantile-based clustered
bootstrapped 95% confidence intervals from a series of multinomial logit models. The results
show the treatment assignment is relatively well-balanced.
3
than in the current dispute, versus the same leader as in the current dispute), but presents the other
two past action treatments (whether the state was the target or initiator, and whether the opponent
in the previous crisis was an ally or adversary of the US) as average e↵ects rather than estimate
a more complex four-way interaction. Figure 5 replicates the results from the main text, but this
time with the fully-saturated four-way interaction. The results are harder to interpret because of
the larger number of moving parts, but lends itself to the same substantive conclusion: behavior
in a previous crisis is seen as more informative of present behavior if led by the same leader as in
the current crisis than a di↵erent leader. In contrast, whether the state was the target or initiator,
or was facing an ally or adversary of the US, does not appear to meaningfully interact with past
actions in shaping assessments of resolve.
1.3
Estimating heterogeneous treatment e↵ects
Figures 6-9 look for heterogeneous treatment e↵ects across four di↵erent sets of characteristics: education (do respondents with a college degree rely on di↵erent heuristics than those without one?),
militant internationalism (do hawks use di↵erent heuristics than doves?), partisanship (do Republicans use di↵erent heuristics than Democrats?), and interest in foreign a↵airs (do particularly engaged
respondents employ di↵erent heuristics than respondents who are not as interested in international
politics?).1 The results find relatively little evidence of treatment heterogeneity: in Figure 6 the loweducated respondents (in grey) and high-educated respondents (in black) seem to use remarkably
consistent heuristics: more educated respondents are slightly more pessimistic about democracies,
for example, but the overall results remain the same. Similarly, in Figure 7, doves (in grey) and
hawks (in black) also display strikingly similar results. Hawks appear less likely to give dictatorships the benefit of the doubt, and see allies as more reliable, but these di↵erences are noticeably
small given the vast di↵erences between doves and haws we find in other areas of public opinion
about foreign policy. We might expect stronger treatment heterogeneity with respect to militant
internationalism if the United States was itself a participant in the disputes, rather than merely an
observer. Finally, although the confidence intervals around the point estimates in Figure 9 are wider
for respondents with high levels of foreign policy interest (in black) than low levels of foreign policy
interest (in grey), we see fairly similar results between the two: it appears that more interested
1 We calculate the threshold for doves and hawks, and low and high interest in foreign policy by mean-splitting
responses in the militant internationalism and foreign policy interest scales, shown in full below.
4
respondents place a greater weight on military capabilities, and somewhat lesser weights on costly
signals, but the di↵erences for the latter are relatively small.
In general, then, there do not appear to be systematic di↵erences across any of these characteristics: we never see that certain kinds of respondents are systematically more sensitive to leader-level
variables rather than country-level ones, for example, or draw more information from current behavior and less from past behavior. The absence of these systematic di↵erences does not mean that
we would see identical results in samples of elite-decision-makers, but as noted in the main text, it
does encourage us to think precisely about the precise mechanisms that would be responsible for
any such di↵erences (decision-makers’ greater levels of education, for example, are unlikely to be responsible), which could also then be used to explain systematic di↵erences between decision-makers
themselves.2
2 More generally, what makes this question both theoretically difficult (and worthwhile!) is that our quantity of
interest becomes a kind of di↵erence-in-di↵erence-in-di↵erences: we’re interested in theorizing the relative di↵erence
between, for example, the e↵ects of past actions (e.g. standing firm versus backing down), versus the current credibility
calculus (e.g. high stakes versus low stakes), between elites versus masses.
5
Figure 1: Robustness check: dropping the sixth choice task
Low Capabilities
High Capabilities
Low Stakes
High Stakes
Dictatorship
Democracy
Mixed
Ally
Adversary
Established Leader
New Leader
No Military Service
Some Military Service
Long Military Service
Female Leader
Male Leader
Target
Initiator
Against ally
Against adversary
Different leader, stood firm
Same leader, stood firm
Same leader, backed down
Different leader, backed down
Nothing
Mobilized troops
Public threat
-0.3
-0.2
-0.1
0.0
0.1
0.2
Average marginal component effect (AMCE)
To test the stability and no carryover e↵ects assumption, Figure 1 plots Average Marginal Component E↵ects (AMCEs) with 95% clustered bootstrapped confidence intervals, replicating Figure ?? from the main text, but without
including data from the sixth choice task, which displayed violations of the caryover assumption. Importantly, the
results hold, and are stronger than the more conservative estimates presented in the text. As before, positive values
indicate that the attribute increases the perceived likelihood that an actor will stand firm, while negative values indicate that the attribute decreases the perceived likelihood of an actor standing firm. Thus, for example, democracies
are perceived as 4% less likely to stand firm than dictatorships.
6
Figure 2: Model diagnostics: testing the profile order assumption
Low Capabilities
High Capabilities
Low Stakes
High Stakes
Dictatorship
Democracy
Mixed
Ally
Adversary
Established Leader
New Leader
No Military Service
Some Military Service
Long Military Service
Female Leader
Male Leader
Target
Initiator
Against ally
Against adversary
Different leader, stood firm
Same leader, stood firm
Same leader, backed down
Different leader, backed down
Nothing
Mobilized troops
Public threat
-0.3
-0.2
-0.1
0.0
0.1
0.2
Average marginal component effect (AMCE)
Figure 2 plots Average Marginal Component E↵ects (AMCEs) with 95% clustered bootstrapped confidence intervals,
replicating Figure ?? from the main text, but conditional on whether characteristic was presented as belonging to
country A (in black) or country B (in grey). Importantly, the results do not appear to systematically di↵er based on
whether a particular characteristic was presented first or second.
7
High
Some
AMCE 7
-0.3
AMCE 7
-0.3
AMCE
AMCE 6
AMCE 6
0.3
AMCE 5
AMCE 5
0.2
AMCE 4
AMCE 4
0.0
AMCE 3
-0.1
AMCE 2
AMCE 3
0.1
-0.3
AMCE 2
-0.2
0.3
-0.2
-0.1
0.0
0.1
-0.2
-0.1
AMCE
0.0
0.1
Previous action
0.2
Previous role
0.1
0.2
0.3
0.2
0.3
-0.3
AMCE 7
AMCE 6
AMCE 5
AMCE 4
AMCE 3
AMCE 2
AMCE 1
-0.3
AMCE 7
AMCE 7
AMCE 5
AMCE 5
AMCE 6
AMCE 4
AMCE 4
AMCE 6
AMCE 3
AMCE 3
-0.3
AMCE 2
AMCE
0.0
0.3
AMCE 1
AMCE
-0.1
0.2
AMCE 1
AMCE 2
AMCE 3
AMCE 4
AMCE 5
AMCE 6
AMCE 7
AMCE 1
AMCE 2
AMCE 3
AMCE 4
AMCE 5
AMCE 6
AMCE 7
AMCE 2
AMCE 1
-0.2
0.1
Mixed
AMCE 1
AMCE 1
-0.3
AMCE 1
AMCE 2
AMCE 3
AMCE 4
AMCE 5
AMCE 6
AMCE 7
AMCE 1
AMCE 2
AMCE 3
AMCE 4
AMCE 5
AMCE 6
AMCE 7
0.0
Male Leader
-0.1
Military service
-0.2
AMCE
-0.3
-0.3
Stakes
AMCE
AMCE 7
AMCE 7
0.3
AMCE 6
AMCE 6
0.2
AMCE 5
AMCE 5
0.1
AMCE 4
AMCE 4
0.0
AMCE 3
AMCE 3
-0.1
AMCE 2
AMCE 2
-0.2
AMCE 1
AMCE 1
Capabilities
Demoracy
-0.1
AMCE
0.0
0.1
-0.2
-0.2
AMCE
0.0
0.1
-0.1
AMCE
0.0
0.1
Previous leader
-0.1
Relationship with US
-0.2
Regime type
0.2
0.2
0.2
0.3
0.3
0.3
-0.3
AMCE 1
AMCE 2
AMCE 3
AMCE 4
AMCE 5
AMCE 6
AMCE 7
AMCE 1
AMCE 2
AMCE 3
AMCE 4
AMCE 5
AMCE 6
AMCE 7
-0.3
AMCE 7
AMCE 6
AMCE 5
AMCE 4
AMCE 3
AMCE 2
AMCE 1
-0.3
AMCE 7
AMCE 6
AMCE 5
AMCE 4
AMCE 3
AMCE 2
AMCE 1
Figure 3: Model diagnostics: treatment e↵ects do not systematically vary by attribute order
Mobilized troops
Public threat
8
-0.2
-0.2
-0.2
AMCE
0.0
0.1
AMCE
0.0
0.1
-0.1
AMCE
0.0
0.1
Current behavior
-0.1
Previous opponent
-0.1
New Leader
0.2
0.2
0.2
0.3
0.3
0.3
Figure 4: Dropping unusual dyadic combinations
Low Capabilities
High Capabilities
Low Stakes
High Stakes
Dictatorship
Democracy
Mixed
Ally
Adversary
Established Leader
New Leader
No Military Service
Some Military Service
Long Military Service
Female Leader
Male Leader
Target
Initiator
Against ally
Against adversary
Different leader, stood firm
Same leader, stood firm
Same leader, backed down
Different leader, backed down
Nothing
Mobilized troops
Public threat
-0.3
-0.2
-0.1
0.0
0.1
Average marginal component effect (AMCE)
9
0.2
Figure 5: Estimating the full four-way interaction for past actions
Low Capabilities
Capabilities
& interests
High Capabilities
Low Stakes
Characteristics
High Stakes
Regime
type
Dictatorship
Democracy
Mixed
Relationship
with USA
Ally
Adversary
Established Leader
New Leader
Leader
characteristics
No Military Service
Some Military Service
Long Military Service
Female Leader
Male Leader
Initiator, backed down vs ally, same leader
Initiator, backed down vs adversary, same leader
Target, backed down vs ally, same leader
Target, backed down vs ally, different leader
Target, backed down vs adversary, same leader
Initiator, backed down vs adversary, different leader
Past behavior
in previous
foreign policy
crisis
Initiator, backed down vs ally, different leader
Target, backed down vs adversary, different leader
Target, stood firm against ally, different leader
Target, stood firm vs adversary, different leader
Behavior
Initiator, stood firm vs ally, different leader
Initiator, stood firm vs adversary, same leader
Initiator, stood firm vs adversary, different leader
Initiator, stood firm vs ally, same leader
Target, stood firm vs ally, same leader
Target, stood firm vs adversary, same leader
Current
behavior
Nothing
Mobilized troops
Public threat
-0.3
-0.2
-0.1
0.0
0.1
Average marginal component effect (AMCE)
(Positive values equal greater resolve)
10
0.2
Figure 6: Heterogeneous treatment e↵ects: low-educated (grey) versus high-educated (black)
Low Capabilities
High Capabilities
Low Stakes
High Stakes
Dictatorship
Democracy
Mixed
Ally
Adversary
Established Leader
New Leader
No Military Service
Some Military Service
Long Military Service
Female Leader
Male Leader
Target
Initiator
Against ally
Against adversary
Different leader, stood firm
Same leader, stood firm
Same leader, backed down
Different leader, backed down
Nothing
Mobilized troops
Public threat
-0.3
-0.2
-0.1
0.0
0.1
Average marginal component effect (AMCE)
11
0.2
Figure 7: Heterogeneous treatment e↵ects: hawks (black) versus doves (grey)
Low Capabilities
High Capabilities
Low Stakes
High Stakes
Dictatorship
Democracy
Mixed
Ally
Adversary
Established Leader
New Leader
No Military Service
Some Military Service
Long Military Service
Female Leader
Male Leader
Target
Initiator
Against ally
Against adversary
Different leader, stood firm
Same leader, stood firm
Same leader, backed down
Different leader, backed down
Nothing
Mobilized troops
Public threat
-0.3
-0.2
-0.1
0.0
0.1
Average marginal component effect (AMCE)
12
0.2
Figure 8: Heterogeneous treatment e↵ects: Democrats (grey) versus Republicans (black)
Low Capabilities
High Capabilities
Low Stakes
High Stakes
Dictatorship
Democracy
Mixed
Ally
Adversary
Established Leader
New Leader
No Military Service
Some Military Service
Long Military Service
Female Leader
Male Leader
Target
Initiator
Against ally
Against adversary
Different leader, stood firm
Same leader, stood firm
Same leader, backed down
Different leader, backed down
Nothing
Mobilized troops
Public threat
-0.3
-0.2
-0.1
0.0
0.1
Average marginal component effect (AMCE)
13
0.2
Figure 9: Heterogeneous treatment e↵ects: low foreign policy interest (grey) versus high foreign
policy interest (black)
Low Capabilities
High Capabilities
Low Stakes
High Stakes
Dictatorship
Democracy
Mixed
Ally
Adversary
Established Leader
New Leader
No Military Service
Some Military Service
Long Military Service
Female Leader
Male Leader
Target
Initiator
Against ally
Against adversary
Different leader, stood firm
Same leader, stood firm
Same leader, backed down
Different leader, backed down
Nothing
Mobilized troops
Public threat
-0.3
-0.2
-0.1
0.0
0.1
Average marginal component effect (AMCE)
14
0.2
2
Dispositional Instrument
As noted in the main text, in addition to the choice-based conjoint experiments, participants also
completed a battery of dispositional and demographic measures. To avoid downstream e↵ects, participants were randomly assigned to receive the dispositional questionnaire either before completing
the conjoint tasks, or afterwards.
Unless otherwise specified, all response options below are scaled from “strongly agree” (1) to
“strongly disagree” (5).
2.1
Militant Internationalism
1. The best way to ensure world peace is through American military strength
2. The use of force generally makes problems worse [reverse-coded]
3. Rather than simply reacting to our enemies, it’s better for us to strike first
4. Generally, the more influence America has on other nations, the better o↵ they are
2.2
Cooperative internationalism
1. America needs to cooperate more with the United Nations in settling international disputes
2. It is essential for the United State to work with other nations to solve problems such as
overpopulation, hunger and pollution
2.3
Isolationalism
1. The U.S. needs to play an active role in solving conflicts around the world [reverse-coded]
2. The U.S. government should just try to take care of the well-being of Americans and not get
involved with other nations
2.4
International trust
1. Generally speaking, would you say that the United States can trust other nations, or that the
United States can’t be too careful in dealing with other nations? [the United States can trust
other nations/ the United States can’t be too careful in dealing with other nations]
15
2.5
Demographic Questions
1. What is your gender? [male/female]
2. What year were you born? [open text box]
3. What is the highest level of education you have completed [less than high school/ high school
or GED/ some college/ 2 year college degree/ 4 year college degree/ Master’s degree/ Doctoral
degree/ Professional degree (e.g., JD or MD)]
4. Generally speaking, do you usually think of yourself as a Republican, Democrat, or as an
independent (check the option that best applies)? [Strong Republican/ Republican/ Independent, but lean Republican/ Independent/ Independent, but lean Democrat/ Democrat/ Strong
Democrat]
5. Below is a scale on which the political views that people might hold are arranged from “extremely conservative” to “extremely liberal.” Where would you place yourself on this scale?
[extremely conservative/ conservative/ slightly conservative/ moderate/ slightly liberal/ liberal/ extremely liberal]
6. How interested are you in information about what’s going on in foreign a↵airs? [extremely
interested/ very interested/ moderately interested/ slightly interested/ not interested at all]
7. How interested are you in information about what’s going on in government and politics? [extremely interested/ very interested/ moderately interested/ slightly interested/ not interested
at all]
16
3
Conjoint Instrument Screen
In this portion of the study, we will present you with information about a series of eight foreign
policy disputes between di↵erent countries.
Countries often get into disputes over contested territories. These disputes receive considerable
attention because of the risk they can escalate to the use of force. Thus, the kinds of disputes
described here are ones that have occurred many times, and will likely occur again.
In each screen, we will present you with a pair of countries involved in a territorial dispute, tell
you a bit about each of them, and ask you to make predictions about what you think will happen.
There are no right or wrong answers, we’re simply interested in the kinds of predictions you make.
17