Bayesian cognitive science, under-considered alternatives, and the value of specialization

October 16, 2014

Matteo Colombo, Tilburg Center for Logic and Philosophy of Science (TiLPS), Tilburg University, The Netherlands
Rogier De Langhe, Complex Systems Institute, Ghent University, Belgium

Abstract

A widely held assumption in cognitive science is that the Bayesian framework should be chosen for discovering and assessing explanations of cognitive phenomena whose production involves uncertainty. However, it is controversial whether the Bayesian framework enjoys special epistemic virtues over available under-considered alternatives for representing uncertainty. A better justification for adopting the Bayesian framework in cognitive science is that it currently comprises a richer body of tools that can be opportunistically exploited so as to foster specialization. As the case of Bayesian cognitive science illustrates, the value of specialization trades off against the value of innovation, yet specialization is often the best way to achieve scientific progress.

Introduction

Bayesian decision theory is an increasingly prominent theoretical framework in the cognitive and brain sciences.1 Driven by mathematical advances in statistics and computer science, as well as by engineering successes in fields such as machine learning and artificial intelligence, Bayesian models have been proposed for many phenomena of perception, motor control, learning, decision-making and reasoning (Chater, Tenenbaum, and Yuille (2006), Doya, Ishii, Pouget, and Rao (2007), D. C. Knill and Richards (1996), Körding (2007), Maloney (2002), Rao, Olshausen, and Lewicki (2002), Tenenbaum, Kemp, Griffiths, and Goodman (2011)).

1 The label “Bayesian” here is a placeholder for a set of interrelated principles, methods, tools and problem-solving procedures whose hard core is the Bayesian rule of conditionalization, which prescribes how the probability of a hypothesis should be updated in light of new evidence.

One of the most common arguments for favoring the Bayesian approach in cognitive science is based on the fact that uncertainty is an ineliminable feature of cognitive systems’ interactions with the world. A cognitive system’s interaction with the world would require the system to infer the values of unknown parameters from the input data it receives. Input data are sparse, ambiguous and corrupted by noise, which may in turn result in behavioural variability. Cognitive systems therefore constantly face problems of inference under uncertainty. Unless these problems are effectively solved, cognitive systems would not be able to interact adaptively with the world by producing reliable actions and accurate perceptions, and by learning about the surrounding environment. If uncertainty is an ineliminable feature of cognitive systems’ interactions with the world, so the argument continues, then the explanatory framework cognitive scientists use to understand adaptive behaviour should be able to account for how cognitive systems can effectively deal with uncertainty. The framework should allow scientists to account for how cognitive systems make sound inferences under uncertainty. Because the Bayesian framework is the best one for understanding how systems can effectively deal with uncertainty and make sound inferences, this framework should be chosen for understanding at least some aspects of phenomena whose production requires cognitive systems to solve problems of inference under uncertainty. The argument just canvassed can be called the argument from uncertainty for Bayesian cognitive science. The first aim of this paper is to reconstruct it clearly (in Section 1).
This reconstruction will help clarify the conceptual foundations of the Bayesian approach. This is a pressing task given recent controversy about the nature, aims and scope of Bayesian modelling in cognitive science (e.g., Bowers and Davis (2012), Chater et al. (2011), Griffiths, Chater, Norris, and Pouget (2012), Jones and Love (2011); for recent philosophical discussions of Bayesian modelling in cognitive science see Eberhardt and Danks (2011), Colombo and Seriès (2012), Colombo and Hartmann (under review)). The second aim is to assess the extent to which the argument from uncertainty justifies many cognitive scientists’ decision to work within the Bayesian framework. In assessing this argument, some currently underexplored alternatives to the Bayesian approach should be taken into account as frameworks for representing and dealing with uncertainty. Once such alternatives are taken into account, the argument from uncertainty loses much of its bite, since it is controversial whether the Bayesian framework enjoys special epistemic virtues (Section 2). However, even if it is not taken for granted that the Bayesian approach enjoys special epistemic virtues in comparison to alternatives, there is reason to favour it in cognitive science. Or so we shall argue, in the light of the results of a simple agent-based model of the distribution of cognitive labour in science (De Langhe 2014, section 3). Compared to alternatives, the Bayesian approach currently affords cognitive scientists a richer body of knowledge and tools. These have been developed in the fields of machine learning, artificial intelligence and statistics for tackling various classes of problems of inference under uncertainty.
If cognitive systems tackle similar problems, then it can be rational for cognitive scientists to exploit well-trodden tools and knowledge available from neighbouring fields, rather than to explore novel or underdeveloped alternatives, in order to understand how cognitive systems handle uncertainty when they produce certain cognitive phenomena. Progress in cognitive science, as in other scientific fields, is often not guided by the pursuit of theories of the mind and brain that enjoy special epistemic virtues, or that we have reason to believe are probably or approximately true. Rather, progress is often guided by the pursuit of theories that can opportunistically exploit available tools and knowledge from neighbouring fields. This is as it should be, as shown by the results of the agent-based model we consider. Sometimes it is epistemically more valuable for agents to respond to the trade-off between exploration and exploitation by exploiting currently available knowledge and tools instead of exploring novel paths of research. This general argument, we believe, provides a more telling justification than the argument from uncertainty for currently adopting the Bayesian framework in cognitive science.

1 From Uncertainty to Bayesian Brains

The argument from uncertainty seeks to establish that the Bayesian framework should be privileged for explaining many cognitive phenomena whose production requires a cognitive system to handle uncertainty. The argument includes two steps. The first step aims to substantiate the claim that adaptive cognitive systems must effectively deal with uncertainty. The second step aims to establish that the Bayesian framework is the best one for explaining how a system can effectively deal with uncertainty and make sound inferences.

1.1 Uncertainty. Underdetermination and noise

The first step in the argument from uncertainty is as follows:

P1 Cognitive systems interact adaptively with the world.
P2 If a cognitive system interacts adaptively with the world, then it must effectively deal with uncertainty and control behavioural variability.
C Cognitive systems must effectively deal with uncertainty and control behavioural variability.

The important premise is P2. In one formulation or another, it motivates much of the current research in cognitive science carried out within the Bayesian framework. Here is an overview of relevant claims in the literature. D. Knill and Pouget (2004, 712) begin by claiming that “humans and other animals operate in a world of sensory uncertainty”. Ma, Beck, Latham, and Pouget (2006, 1432) motivate their study by saying that “virtually all computations performed by the nervous system are subject to uncertainty”. Pouget, Beck, Ma, and Latham (2013, 1170) echo them by writing that “uncertainty is an intrinsic part of neural computation, whether for sensory processing, motor control or cognitive reasoning.” Orban and Wolpert (2011, 1) explain that “uncertainty is ubiquitous in our sensorimotor interactions, arising from factors such as sensory and motor noise and ambiguity about the environment”. Tenenbaum et al. (2011, 1279) point out that “we build rich causal models, make strong generalizations, and construct powerful abstractions, whereas the input data are sparse, noisy, and ambiguous, in every way far too limited”. Finally, Vilares and Körding (2011, 22) hold that “uncertainty is relevant in most situations in which humans need to make decisions and will thus affect the problems to be solved by the brain”. In the field, the term “uncertainty” is generally used broadly. It refers to the fact that a cognitive system facing some problem lacks some relevant piece of information. This lack of information may be due to noise, that is, to random disturbances corrupting the sensory signals and processes of the system. It may also be due to the underdetermination of percepts, as well as of other cognitive states, by input data.
Whether uncertainty is caused by noise or by the problem of underdetermination, it bears emphasis that uncertainty goes hand in hand with behavioural variability. For example, if you reach for an object in the dark, your visual and motor systems will lack relevant information about the location of the object. Your uncertainty about its location will be reflected in a lack of accuracy in any one reaching trial. If you try to reach for that object over and over again, considerable variability in behaviour should be expected across trials. Even when a stimulus is held as constant as possible over a number of trials, our perceptions of the stimulus will also vary from trial to trial. For a system to have accurate perceptions and to display reliable motor behaviour, it must find some way to tame such variability. The uncertainty of a cognitive system may depend on the problem of underdetermination that it must constantly solve. Cognitive agents like humans can access the world only through their senses. The senses can be viewed as sources of information about the state obtaining in the world at any given time. If we frame this situation using concepts from statistics, we may refer to the states obtaining in the world as “environmental parameters”, “hidden states” or “models”. The sensory information received by cognitive agents can then be called “sensory data” or “evidence”. The values of the environmental parameters that the system must infer are underdetermined by the sensory data available to the system. This means that, at any given time, for any sensory input to our cognitive system, there are multiple states in the world that can fit the sensory input. Because the same sensory input can be fit by many different environmental states, processing the sensory input alone is not sufficient to determine which state in the world caused it. Hence, sensory inputs underdetermine their environmental causes.
For instance, the sensory input generated by a convex object under normal lighting conditions underdetermines its external cause. There are at least two possibilities. Either the object in the world that caused the input is convex and the light comes from overhead, or the object is concave and the light comes from below. In order to perceive, and to have accurate perceptions, our cognitive system must find some strategy to solve this underdetermination problem. The uncertainty of a cognitive system may also be due to noise, whose source can be internal or external to the system. In general, noise amounts to data received but unwanted by a system. A noisy signal contains more data than the original signal by itself. Noise therefore modifies the signal and extends the cognitive system’s freedom of choice in decoding it. This freedom is undesirable to the extent that the adaptive behaviour the system can produce requires a sufficient degree of fidelity between original and decoded signals. In biological agents, “noise permeates every level of the nervous system, from the perception of sensory signals to the generation of motor responses” (Faisal, Selen, & Wolpert 2008, 292). Specifically, three sources of noise are characteristic of biological agents. The first source of noise lies in the thermodynamic or quantal transduction of the energy carried by sensory signals into electrical signals. “For example, all forms of chemical sensing (including smell and gustation) are affected by thermodynamic noise because molecules arrive at the receptor at random rates owing to diffusion and because receptor proteins are limited in their ability to accurately count the number of signalling molecules” (D. C. Knill & Richards 1996, 4). The second source of noise lies in certain biophysical features of ion channels, synaptic transmission and network interactions, and in the random processes governing neural activations.
These biophysical features introduce noise at the level of cellular signalling. A third source of noise characteristic of biological agents lies in the transduction of signals carried by motor neurons into mechanical forces in muscle fibers. This transduction introduces noise in the signals underlying motor control. It can make motor behaviour highly variable, even in the same types of circumstances and when the same goal is pursued. In order to accurately perform motor commands, and to display reliable actions, biological agents must find some strategy to handle the noise introduced at different levels of neural processing. Suppose cognitive systems implement algorithms that can solve the problem of underdetermination and mitigate the detrimental effects of noise. Then they can effectively deal with sensory and motor uncertainty so as to generate accurate perceptions and reliable action. The Bayesian framework provides cognitive scientists with a suite of algorithms and methods to represent and deal with uncertainty that can solve problems of underdetermination and mitigate detrimental effects of noise. Therefore this framework is justifiably chosen to explain at least some central aspects of cognition.

1.2 Bayes and Uncertainty. A Natural Marriage?

The second step in the argument from uncertainty seeks to establish that the Bayesian framework is the best one for explaining how a system can effectively solve problems of inference under uncertainty. This second step has the following form:

Given feature $F$, which is necessarily involved in the production of explananda $P_1, \ldots, P_n$, and given candidate explanatory frameworks $X_1, \ldots, X_m$ for explaining $P_1, \ldots, P_n$, infer the explanatory superiority with respect to $P_1, \ldots, P_n$ of that $X_i$ which is best for treating $F$.

An argument with this form would have us infer the explanatory superiority of one framework $X_i$ among several others with respect to explananda $P_1, \ldots, P_n$.
The basis for drawing this conclusion is that $X_i$ is the best of the available competing frameworks for characterizing and treating some feature $F$ that is necessarily involved in the production of $P_1, \ldots, P_n$. Here, $F$ is the uncertainty that a cognitive system must handle when it produces adaptive behaviour and cognitive phenomena $P_1, \ldots, P_n$. $F$ is necessarily involved in the production of explananda $P_1, \ldots, P_n$ because, as a matter of fact, $P_1, \ldots, P_n$ cannot be produced unless the system effectively deals with uncertainty. The explanatory framework $X_i$ would be the best for treating uncertainty in the sense that it would afford the best way to characterize, represent and deal with uncertainty. By adopting framework $X_i$, cognitive scientists can most fruitfully, most simply, most adequately or most generally explain how cognitive systems make successful inferences under uncertainty. Such inferences allow the system to solve the problem of underdetermination and handle the detrimental effects of internal noise. Suppose that, compared to alternative frameworks, $X_i$ is the best with respect to $F$, in the sense that it possesses more epistemic virtues (or epistemic virtues to a sufficiently higher degree) that bear on explanations of $F$-involving phenomena. Then it should be concluded that $X_i$ is superior to alternatives for explaining those phenomena.

Indeed, Bayesian decision theory has been characterized in cognitive science as the most “effective,” “congenial,” “natural” or “rational” framework to represent and deal with uncertainty (cf. Chater et al. (2006, 287), Doya et al. (2007, xi), D. Knill and Pouget (2004, 712), Maloney (2002, 145), Mamassian, Landy, and Maloney (2002, 13), Orban and Wolpert (2011, 1), Rescorla (in press, Section 2), Fiser, Berkes, Orban, and Lengyel (2010, 120)). To understand in what sense Bayesian decision theory is the most “congenial” framework for representing uncertainty, its basic tenets should be brought into focus.
Within Bayesian decision theory, uncertainty is represented by probability distributions. The Bayesian rule of conditionalization specifies how a probability distribution should be updated in the light of new information. Within this framework, a cognitive system is seen as entertaining “beliefs” drawn from a hypothesis space H. “Beliefs” are about what in the world could have caused the current input e to the system. Each “belief” is associated with a prior probability Prob(h), which represents the weight borne by the belief that h on the processes carried out by the system. At any given time, the system’s “beliefs” satisfy the axioms of the probability calculus. Probabilities are also assigned to (e, h) pairs, in the form of a generative model. A generative model specifies a joint probability distribution over inputs and over hypotheses about the states in the world generating those inputs. Generative models specify likelihoods, viz. Prob(e|h), which represent how probable it is that the system would receive the current input e given a hypothesized state h in the world. Given a generative model, current input e and the prior knowledge associated with Prob(h), the system computes the posterior conditional probability Prob(h|e) = Prob(e|h) Prob(h) / Prob(e). It thereby reallocates probabilities across the hypothesis space in accord with the Bayesian rule of conditionalization. Conditionalization governs how the system ought to update “beliefs” upon receiving new information. It does not, however, specify how the beliefs entertained by the system should be used to produce a decision, an action or some other phenomenon. Using the posterior to produce a decision, an action or some other phenomenon requires the definition of a loss function. A loss function specifies the relative cost of making a certain decision when a certain hypothesized state of the world obtains. To determine the best possible decision available at a given time, the system needs to compute the expected loss of each available decision, that is, its loss averaged over hypotheses weighted by their posterior probabilities. It then selects the decision with the lowest expected loss.
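The two steps just described can be sketched in Python. The first step is conditionalization over a discrete hypothesis space. The second is choosing the decision with the lowest expected loss. The sketch reuses the convex/concave ambiguity from Section 1.1. The hypothesis labels, the prior (favouring light from overhead), the equal likelihoods and the 0-1 loss are all illustrative assumptions, not values from the literature. Because the two hypotheses fit the input equally well, the posterior simply mirrors the prior. This shows how prior knowledge, rather than the input alone, resolves the underdetermination.

```python
# Toy sketch: Bayesian conditionalization followed by a minimum-expected-loss decision.
# All hypotheses, probabilities and losses below are hypothetical.

def conditionalize(prior, likelihood):
    """Return Prob(h|e) for each h, from Prob(h) and Prob(e|h)."""
    joint = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(joint.values())  # Prob(e), the normalizing constant
    return {h: p / evidence for h, p in joint.items()}

def expected_loss(decision, posterior, loss):
    """Loss of a decision averaged over hypotheses, weighted by the posterior."""
    return sum(posterior[h] * loss[(decision, h)] for h in posterior)

def best_decision(decisions, posterior, loss):
    """Pick the decision with the lowest expected loss."""
    return min(decisions, key=lambda d: expected_loss(d, posterior, loss))

# Assumed prior: light usually comes from overhead.
prior = {"convex, lit from above": 0.8, "concave, lit from below": 0.2}
# Both hypotheses predict the observed shading equally well (underdetermination).
likelihood = {"convex, lit from above": 0.5, "concave, lit from below": 0.5}
posterior = conditionalize(prior, likelihood)

# 0-1 loss: a correct report costs nothing, an incorrect one costs 1.
decisions = ["report convex", "report concave"]
loss = {
    ("report convex", "convex, lit from above"): 0,
    ("report convex", "concave, lit from below"): 1,
    ("report concave", "convex, lit from above"): 1,
    ("report concave", "concave, lit from below"): 0,
}

print(posterior)
print(best_decision(decisions, posterior, loss))
```

Under a 0-1 loss, minimizing expected loss amounts to picking the most probable hypothesis. A loss that penalized one kind of error more heavily could shift the decision even with the same posterior.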
With this outline of the basics of Bayesian decision theory in place, let us now re-examine the problem of underdetermination by considering the case of visual perception. The input data to the visual system consist of the image that arrives at the retina. The “beliefs” (or hypotheses) entertained by the perceptual system are about states in the world that could have given rise to that image. Based solely on input data, the system cannot determine which state gave rise to the retinal image. Any patch of retinal stimulation could correspond to an object of any size and almost any shape. However, if the system deploys knowledge about which sizes and shapes are more likely a priori, it can determine which state or object would be most likely to produce the retinal input data. The system can solve the problem of underdetermination by applying the rule of conditionalization, which combines prior knowledge with the likelihood of the input data given each state in the world. What about noise? Prior information embodied in neurons’ receptive fields can also be used and processed in a Bayesian fashion to handle the effects of noise. A receptive field is the portion of sensory space that can elicit neural responses when stimulated. The basic strategy is simple: “If the structure of the signal and/or noise is known it can be used to distinguish signal from noise” (Faisal et al. 2008, 298). And distinguishing signal from noise is essential to producing reliable cognitive and behavioural phenomena. In other words, neurons’ prior knowledge about the expected statistical structure or noise of a signal for any given source of information allows the cognitive system to compensate for noise. It also allows the system to give more weight to more reliable (less noisy) signals in its processes.
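Giving more weight to more reliable signals can be illustrated with a standard textbook case. Suppose two independent cues about the same quantity, say an object's position as seen and as felt, each corrupted by Gaussian noise, are combined under a flat prior. The Gaussian noise and flat prior are simplifying assumptions made for this sketch, not claims about any particular neural mechanism. Under these assumptions, the posterior mean is a weighted average of the cues. Each weight is proportional to that cue's reliability, i.e. its inverse variance. The combined estimate is less variable than either cue alone. The cue names and numbers are hypothetical.

```python
# Reliability-weighted combination of two independent Gaussian cues.
# Assumes Gaussian noise and a flat prior; all numbers are hypothetical.

def combine_cues(mean_a, var_a, mean_b, var_b):
    """Return (posterior mean, posterior variance) for two Gaussian cues."""
    reliability_a = 1.0 / var_a  # reliability = inverse variance
    reliability_b = 1.0 / var_b
    total = reliability_a + reliability_b
    weight_a = reliability_a / total  # the less noisy cue gets the larger weight
    weight_b = reliability_b / total
    mean = weight_a * mean_a + weight_b * mean_b
    variance = 1.0 / total  # never larger than either input variance
    return mean, variance

# A precise visual cue and a noisier haptic cue about position (in cm).
visual_mean, visual_var = 10.0, 1.0
haptic_mean, haptic_var = 14.0, 4.0
mean, var = combine_cues(visual_mean, visual_var, haptic_mean, haptic_var)
print(f"combined estimate: {mean:.2f} cm, variance: {var:.2f}")
```

Here the visual cue is four times as reliable as the haptic one. It therefore receives weight 0.8, and the combined estimate lies much closer to it.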
So, the human cognitive system can rely on prior knowledge about the expected structure of its inputs and combine it with incoming data by applying Bayesian conditionalization. It can thereby deal with noise and produce reliable perceptions and motor behaviour. There is no doubt that the Bayesian framework can be adopted to explain how cognitive systems deal with uncertainty in producing certain phenomena. It remains unclear, however, why this framework should be privileged over alternative frameworks. In current cognitive science, it is often assumed that the Bayesian framework should be privileged because it obviously enjoys more epistemic virtues (or epistemic virtues to a larger degree) than alternatives. It would provide the most unified, precise, fruitful and “rational” explanatory framework for many sets of empirical data about cognitive phenomena. Bayesian decision theory would be unifying, as it offers a common, encompassing and flexible mathematical framework for studying a range of diverse phenomena (cf. D. Knill and Pouget (2004), Griffiths, Chater, Kemp, Perfors, and Tenenbaum (2010); see also Colombo and Hartmann (under review)). It would be precise insofar as it provides scientists with a rigorous and quantitative formalism that can be used for “precisely relating what one set of information tells us about another” (Chater et al. 2006, 287). The Bayesian framework would be fruitful insofar as it helps cognitive scientists to discover which algorithms can be tractably implemented by the mechanisms producing those phenomena (Griffiths et al. 2012, 417). The exploration of such algorithms, in turn, may have underappreciated consequences for our understanding of the nature of perception and action and of the relation between them (cf. Clark (2013)).
Finally, Bayesian decision theory would be the most rational framework, as it sets a normative standard for how rational agents should combine and weigh different beliefs. The standard also covers how agents should update their beliefs upon receiving novel information and how they should make decisions under uncertainty (cf. Doya et al. (2007), Griffiths et al. (2010); see also Bovens and Hartmann (2003)). The normative force of such a standard relies on the idea that an agent’s degrees of belief should at least obey the probability calculus. This idea is in turn typically justified by appealing to (diachronic and synchronic) Dutch book arguments or to Cox’s (1946) theorem. Dutch book arguments purport to establish that it is epistemically irrational for an agent to have degrees of belief that violate the rules of the probability calculus (cf. Pouget et al. (2013), Vineberg (2014)). Cox’s theorem would show that any rational measure of belief is isomorphic to a probability measure. On the strength of these alleged virtues, many cognitive scientists would then be justified in embracing Bayesian decision theory as the “most effective,” “congenial” or “natural” framework for explaining how cognitive systems represent and traffic with uncertainty.

2 Trafficking with Uncertainty. A zoo of approaches

There are two troubles for cognitive scientists who justify their choice to work within the Bayesian framework by merely appealing to the argument from uncertainty. First trouble: there are currently several overlooked alternative frameworks with which one can represent uncertainty and solve problems of inference under uncertainty. Second trouble: it is not obvious that Bayesian decision theory offers the best framework for representing and dealing with uncertainty.
Suppose some explanatory framework alternative to the Bayesian one is currently available but overlooked by cognitive scientists. Then it cannot simply be claimed that cognitive systems, and the uncertainty-involving phenomena they produce, are best explained within the Bayesian framework. The mere existence of underappreciated alternatives weakens the argument from uncertainty for Bayesian cognitive science. Furthermore, it is not obvious that Bayesian decision theory provides cognitive scientists with the best framework for finding explanations of phenomena that involve uncertainty; and no cognitive scientist has put forward an argument that this is the case. The Bayesian approach in cognitive science might have gained unjustified plausibility by shielding itself from relevant, but under-considered, alternative frameworks. Here we point to four alternatives to Bayesian decision theory for representing uncertainty, namely: Dempster-Shafer theory, possibility theory, ranking theory and quantum probability theory.2 Before sketching the basic tenets of these four theories, we emphasize that we do not intend to offer a thorough comparative review (see e.g. Huber (2014), Halpern (2003) for more extensive reviews). For our purposes, it suffices to make some remarks aimed at driving home a weaker point: it is not obvious that Bayesian decision theory offers cognitive scientists the most unified, precise, fruitful and rational approach to uncertainty-involving phenomena.

2.1 The Dempster-Shafer framework

The Dempster-Shafer approach to uncertainty can be considered a generalization of the Bayesian approach. In this approach, probabilities are assigned to sets instead of to single events, and Dempster’s rule is used for aggregating information instead of Bayesian conditionalization (Shafer (1992)).
Three functions are used to represent uncertainty in this framework: the basic probability assignment function, the belief function and the plausibility function. Given some set of states, the basic probability assignment (bpa) function maps every element of the power set of that set, i.e. every subset, to a number in the interval [0, 1]. The bpa of the empty set is 0, and the bpas of all the subsets sum to 1. The belief function and the plausibility function define, respectively, a lower and an upper bound on intervals representing beliefs about (sets of) states. The lower bound, Belief, for a set of interest is defined as the sum of the bpas of all the subsets of the set of interest, including the set itself. The upper bound, Plausibility, is the sum of the bpas of all the sets that intersect the set of interest. The core rule for aggregating several bpas associated with information from multiple, independent sources is Dempster’s rule. This rule corresponds to a normalized conjunctive operation. According to this operation, information should be combined by favouring the agreement between the sources and ignoring all the conflicting evidence (see Dempster (1968) for details).

2 While each one of these frameworks provides explicit representations of uncertainty, there are implicit, nonprobabilistic approaches to uncertainty too (see e.g. Simoncelli (2009), Drugowitsch and Pouget (2012)). If the set of alternatives to the Bayesian framework is expanded to include them, the case against the argument from uncertainty for Bayesian cognitive science should become even more persuasive.

2.2 The possibility framework

Within the possibility approach to uncertainty, possibility measures are based on ideas from fuzzy logic and put into focus the notion of imprecision due to vagueness.
In this framework, possibility distributions Poss represent “the knowledge of an agent (about the actual state of affairs) distinguishing what is plausible from what is less plausible, what is the normal course of things from what is not, what is surprising from what is expected” (Dubois and Prade (2007)). One way in which possibility theory differs from the Bayesian approach is its use of a pair of dual set-functions (i.e., possibility and necessity measures) instead of only one, which makes it easier to capture partial ignorance. Formally, one prominent difference between the Bayesian approach and the possibility approach is that the former is characterized by an additivity property, while the latter is characterized by a “maxitivity” property. According to this property, if U and V are disjoint sets, then Poss(U ∪ V) = max(Poss(U), Poss(V)). When Poss(U) > 0 and the set V is non-empty, information can be aggregated as follows:

$$\mathrm{Poss}(V \mid U) = \begin{cases} 1 & \text{if } \mathrm{Poss}(U \cap V) = \mathrm{Poss}(U), \\ \mathrm{Poss}(U \cap V) & \text{otherwise.} \end{cases}$$

The difference between this rule and Bayesian conditionalization is that renormalization via division is replaced here by a shift to 1 of the “possibility” values of the most possible elements in U (see Dubois and Prade (2007) for details).

2.3 The ranking framework

In ranking theory, a ranking function κ represents degrees of disbelief (or surprise) on an integer scale (Spohn 2009). A proposition A is disbelieved just in case its rank is positive, κ(A) > 0. Accordingly, tautological propositions should not be disbelieved. Propositions not disbelieved at all are ranked 0. Propositions ranked 0 are “unsurprising”, but this does not mean that they are necessarily believed. A proposition A is believed just in case its negation is disbelieved, κ(¬A) > 0. Disbelieved propositions are ranked with greater and greater degrees, up to ∞. Thus, higher ranks correspond to higher degrees of surprise.
Accordingly, contradictory propositions should be disbelieved to the highest degree. Conditional ranks are defined as differences of unconditional ranks: κ(A|B) = κ(A ∩ B) − κ(B). Using conditional ranks, the main rules for updating and aggregating ranks can be formulated as counterparts of Bayesian conditionalization. The relation between ranking theory and Bayesian theory is complex and subtle. Perhaps the major difference concerns the notion of belief each captures. Ranking theory focuses on the everyday, categorical notion of belief, which can be true or false. The Bayesian framework instead captures the quantitative notion of degree of belief. Thus ranking theory uses numbers, or ranks, to address several traditional philosophical puzzles centered around the everyday notion of belief. On these grounds, it has been claimed that the ranking-theoretic approach has some advantages over probabilistic approaches. Ranking theory would allow us to do almost everything that we can do with probabilistic measures, and also to tackle traditional problems in epistemology (cf. Huber (2014, section 3.3), Spohn (2009, sections 3-4)).

2.4 The quantum probability framework

Quantum probability theory is a geometric approach to probability. Different outcomes are represented as subspaces of varying dimensionality in a multidimensional Hilbert space, a vector space used to represent all possible outcomes for questions we could ask about a system. Unit vectors correspond to possible states of the system and embody current knowledge about the system under consideration. Probabilities of outcomes are determined by projecting the state vector onto different subspaces and computing the squared length of the projection. The determination of probabilities is context- and order-dependent, as individual states can be superposition states and composite systems can be entangled.
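The order dependence just mentioned can be illustrated with a toy example. For simplicity, the example uses a two-dimensional real vector space, whereas quantum probability theory generally uses complex Hilbert spaces; this is an assumption of the sketch. Each yes/no question is represented by a one-dimensional subspace, i.e. a ray at some angle, and the state is a unit vector. Following the projection rule described above, the probability of answering “yes” to A and then “yes” to B is computed in two steps. The state is projected first onto A's subspace and then onto B's. The squared length of the result is the probability. The angles are arbitrary, hypothetical choices. Whenever the two rays are neither identical nor orthogonal, the two orders give different probabilities.

```python
import math

# Toy quantum-probability sketch in a 2-D real vector space.
# The state and the question angles are hypothetical choices.

def projector(angle):
    """2x2 projector onto the ray at `angle` (radians) in the plane."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c * c, c * s], [c * s, s * s]]

def mat_vec(matrix, vector):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [matrix[0][0] * vector[0] + matrix[0][1] * vector[1],
            matrix[1][0] * vector[0] + matrix[1][1] * vector[1]]

def squared_length(vector):
    return vector[0] ** 2 + vector[1] ** 2

def prob_sequence(state, first, second):
    """Probability of 'yes' to `first` and then 'yes' to `second`."""
    return squared_length(mat_vec(second, mat_vec(first, state)))

state = [1.0, 0.0]                  # unit vector along the x-axis
A = projector(math.radians(30))     # question A: ray at 30 degrees
B = projector(math.radians(75))     # question B: ray at 75 degrees

p_a_then_b = prob_sequence(state, A, B)
p_b_then_a = prob_sequence(state, B, A)
print(f"P(A then B) = {p_a_then_b:.3f}")
print(f"P(B then A) = {p_b_then_a:.3f}")
```

Here P(A then B) equals cos²(30°)·cos²(45°), while P(B then A) equals cos²(75°)·cos²(45°). The two values differ because the projectors for A and B do not commute. In a classical probability space, by contrast, the conjunction of two events has a single probability regardless of order.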
Thus, while in the Bayesian framework Prob(A & B) = Prob(B & A), in quantum probability theory commutativity in conjunction does not always hold (see Rédei and Summers (2007) for an introduction to the theory). The motivation for adopting the quantum probability framework in cognitive science lies in its core properties, such as incompatibility, superposition and entanglement. These properties would allow cognitive scientists to accurately account for many cognitive processes and experimental results that are not obviously captured within the Bayesian framework (Pothos & Busemeyer 2013). For example, incompatibility in quantum probability theory entails that it is impossible to assign truth-values concurrently to two incompatible hypotheses. Psychologically, the two hypotheses can be processed only serially, as processing of one hypothesis interferes with the other. Given two such hypotheses A and B, if A is true at a certain time, then B can be neither true nor false at that time. Conjunctions between incompatible hypotheses are then defined in a sequential way as “A and then B” (see Busemeyer and Bruza (2012) for details).

2.5 Which one is the best? Some cursory considerations

Each one of the approaches just sketched has both epistemic virtues and vices relative to the Bayesian one. Dempster-Shafer theory can be considered a generalization of the Bayesian approach. In it, uncertainty deriving from ignorance is naturally represented by vacuous belief functions, and evidence is combined by Dempster’s rule of combination without requiring strong independence assumptions. Partial and total ignorance can be represented without the need to specify a prior. Your initial degrees of belief should be vacuous, viz. zero everywhere except for tautological propositions. At any later time, your degrees of belief should be the result of combining the vacuous belief function with your total evidence. Belief-states and evidence are represented by the same type of mathematical object, viz. belief functions.
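The Dempster-Shafer notions involved here can be sketched in Python: basic probability assignments, Belief, Plausibility, the vacuous belief function and Dempster's rule. The sketch assumes a hypothetical three-element set of states with made-up masses, and represents a bpa as a dictionary from subsets (frozensets) to masses. Dempster's rule is implemented in its standard form. The masses of each pair of focal sets are multiplied and assigned to their intersection. Mass that falls on the empty set, i.e. the conflict, is discarded, and the remainder is renormalized. Combining the vacuous bpa, which places all mass on the whole set of states, with a body of evidence returns that evidence unchanged. This is the sense in which no prior needs to be specified. The power-set helper also shows where the computational cost comes from: an n-element set has 2^n subsets that may carry mass.

```python
from itertools import chain, combinations

# Minimal Dempster-Shafer sketch; the states and masses are hypothetical.

def powerset(frame):
    """All subsets of the frame as frozensets (2 ** len(frame) of them)."""
    items = list(frame)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def belief(bpa, target):
    """Lower bound: total mass of non-empty focal sets contained in `target`."""
    return sum(m for s, m in bpa.items() if s and s <= target)

def plausibility(bpa, target):
    """Upper bound: total mass of focal sets that intersect `target`."""
    return sum(m for s, m in bpa.items() if s & target)

def dempster_combine(bpa1, bpa2):
    """Dempster's rule: conjunctive combination, conflict discarded, renormalized."""
    combined, conflict = {}, 0.0
    for s1, m1 in bpa1.items():
        for s2, m2 in bpa2.items():
            inter = s1 & s2
            if inter:
                combined[inter] = combined.get(inter, 0.0) + m1 * m2
            else:
                conflict += m1 * m2  # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    return {s: m / (1.0 - conflict) for s, m in combined.items()}

frame = frozenset({"a", "b", "c"})
vacuous = {frame: 1.0}  # total ignorance: all mass on the whole frame
evidence = {frozenset({"a"}): 0.6, frozenset({"a", "b"}): 0.3, frame: 0.1}

updated = dempster_combine(vacuous, evidence)
target = frozenset({"a", "b"})
print(len(powerset(frame)))  # 8 subsets for a 3-element frame
print(belief(updated, target), plausibility(updated, target))
```

For the set {a, b}, Belief is 0.9 and Plausibility is 1. The gap between these bounds expresses the residual ignorance carried by the mass left on the whole frame.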
The Dempster-Shafer approach might then be considered a more unifying framework than the Bayesian one. However, inference within this framework is significantly less computationally efficient than Bayesian inference. This inefficiency stems from the fact that within the Dempster-Shafer framework evidence is represented by a belief function induced by a probability measure on the power-set of the possible outcomes of a question, rather than by a probability measure on the set of possible outcomes itself. Hence, the amount of computation required to combine evidence by Dempster’s rule increases exponentially with the cardinality of the set of possible outcomes. Possibility theory can be seen as a simpler approach to inference under uncertainty, where uncertainty corresponds to imprecise or ambiguous information that is devoid of randomness. The possibility framework has a computational advantage over probability, as “maxitivity” makes possibility measures compositional: $\mathrm{Poss}(A \cup B)$ is determined by $\mathrm{Poss}(A)$ and $\mathrm{Poss}(B)$, being the maximum of the two. By contrast, all that can be said about $\mathrm{Prob}(A \cup B)$ is that it is at least $\max(\mathrm{Prob}(A), \mathrm{Prob}(B))$ and at most $\min(1, \mathrm{Prob}(A) + \mathrm{Prob}(B))$. However, since fuzzy approaches to uncertainty such as possibility theory are not isomorphic to probability theory, it can be suggested that Cox’s theorem rules out possibility theory as a rational means of quantifying uncertainty (Lindley (1982); but see Colyvan (2008)). Ranking theory provides a link between quantitative (focused on degrees of belief) and qualitative (focused on the categorical notion of yes-or-no belief) approaches to the representation of uncertainty and belief dynamics. It can be used ingeniously to address general questions of philosophical interest.
But it does not obviously bear on actual scientific practice, since it is far from clear how one should use ranking functions with noisy sample data to make sound inferences about any concrete system or process. Finally, quantum probability theory rests on normatively dubious grounds, as it is based on a set of axioms that allows an agent to be Dutch-booked. While quantum probability theory “is perhaps a framework for bounded rationality . . . and not as rational as in principle possible” (Pothos & Busemeyer 2013, 2), its distinctive properties, including superposition, entanglement, incompatibility, and interference, are claimed to accommodate empirical results related to order/context effects that are not easily captured within a Bayesian framework. If this is so, then for some phenomena the Bayesian framework is less empirically adequate than the quantum probabilistic one. While these observations suggest that it is problematic to hold that the Bayesian approach is the best for dealing with uncertainty, virtually all work in Bayesian cognitive science has proceeded by neglecting current alternative frameworks, taking for granted the superiority of the Bayesian one, or assuming that the Bayesian approach is the only game in town. If the argument from uncertainty is used to justify the Bayesian approach in cognitive science, then available alternatives should not be ignored. Unless the relative epistemic virtues of the Bayesian framework are actually probed against the virtues and disadvantages of these possible competitors on actual case-studies, the argument from uncertainty alone does not justify many cognitive scientists’ choice to work within the Bayesian framework. For it would need to be established that the Bayesian framework is actually the “most effective,” “most congenial” or “most rational” framework for understanding the many phenomena produced by cognitive systems that must handle uncertainty.
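The order effects mentioned above can be illustrated with a toy model in which hypotheses are rays in a two-dimensional real space and “A and then B” is evaluated by projecting the belief state first onto A and then onto B (cf. Busemeyer and Bruza (2012)). The Python sketch below is illustrative only: the angles and the initial state are hypothetical values chosen to make the asymmetry visible, not parameters fitted to any study.

```python
import math


def unit(angle):
    """Unit vector in the plane at the given angle (radians)."""
    return (math.cos(angle), math.sin(angle))


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def project(onto, v):
    """Orthogonal projection of v onto the ray spanned by the unit vector `onto`."""
    c = dot(onto, v)
    return (c * onto[0], c * onto[1])


def prob_sequence(state, first, second):
    """Prob('first and then second'): squared length after two successive projections."""
    v = project(second, project(first, state))
    return dot(v, v)


# Hypothetical belief state and hypotheses, represented as rays at different angles.
psi = unit(0.0)
a = unit(math.radians(30))
b = unit(math.radians(60))

p_ab = prob_sequence(psi, a, b)  # A and then B
p_ba = prob_sequence(psi, b, a)  # B and then A
print(f"P(A then B) = {p_ab:.4f}, P(B then A) = {p_ba:.4f}")
```

With these angles, P(A then B) = cos²30°·cos²30° = 0.5625 while P(B then A) = cos²60°·cos²30° = 0.1875, whereas in the Bayesian framework Prob(A&B) = Prob(B&A) regardless of order. If the two rays are orthogonal or identical, the projectors commute and the order no longer matters.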
We believe there is a better argument to justify many cognitive scientists’ decision to go Bayesian. This argument can be called the argument from specialization, and it can be established through the results of a simple agent-based model of the distribution of cognitive labour in science. The next and final section aims to make this argument clear.
3 An argument from specialization for Bayesian cognitive science
Currently, there is little doubt that the most common approach to representing and dealing with uncertainty is the Bayesian one (Halpern 2003, 4). The tools that a Bayesian cognitive scientist can currently use to address problems of uncertain inference are more sophisticated than those of the alternatives, being routinely used in neighbouring fields like machine learning, artificial intelligence, and statistics. In comparison to Dempster-Shafer theory, possibility theory, ranking theory and quantum probability theory, the Bayesian approach is more widespread in each of a wide variety of fields, ranging from statistics to machine learning and AI (Poirier 2006). And the popularity of Bayesian modelling has been growing in cognitive science too, as evidenced by an increase in the number of articles, conference papers and workshops dedicated to Bayesian modelling of cognition and its foundations (Kwisthout, Wareham, & van Rooij 2011, note 1). Given this popularity, and given that it is not obvious that the Bayesian framework is the best one for representing uncertainty, it is plausible to explain many cognitive scientists’ choice to carry out their research within the Bayesian framework in terms of some non-epistemic, sociological factors. These sociological factors may have led more and more scientists to approach research questions within the Bayesian framework, while neglecting some of the alternative frameworks available for dealing with uncertainty.
As more and more cognitive scientists have addressed research questions within the Bayesian framework, a division of cognitive labour has been fostered in the field. Sophisticated tools have been developed and exploited to approach problems at a higher level of specialization. Under certain conditions, this extensive exploitation is the rational thing for scientists to do, since specialization can promote scientific progress. If this is correct, then the value of specialization within the social structure of current cognitive science offers more solid grounds for the choice to currently work within the Bayesian framework. What follows will substantiate this idea by using a simple but general agent-based model of the distribution of cognitive labour. We believe that this model picks out only those features that are essential to any scientific framework. If this is correct, then its results will be robust and applicable to the case of cognitive science. With this model in hand, it will be shown under what circumstances exploiting a well-developed existing framework, instead of exploring under-considered alternatives, is the rational thing for scientists to do.
3.1 Trading-off specialization and innovation. An agent-based model
In introducing our model, the first thing to note is that a framework is not a statement that can be true or false, but a standard to generate and assess such statements. For example, Bayesian decision theory affords cognitive scientists a “unifying mathematical language for framing cognition as the solution to inductive problems” (Tenenbaum et al. 2011, 1285). This way of framing cognition allows scientists to generate and assess predictions and explanations of several cognitive phenomena and behaviours (cf. Colombo and Seriès (2012)). What is distinctive about standards is that they enable coordination. As such, the value of standards depends not solely on their value when used by individuals in isolation, viz.
their intrinsic value, but also on their value when used by multiple individuals simultaneously. As a framework in cognitive science, the value of Bayesian decision theory does not lie only in its intrinsic virtues that allow scientists to represent and handle uncertainty “naturally” or “congenially”. Its value also depends on its power to facilitate scientific coordination. One way to illustrate this point is to consider that agents typically have strong preferences for one standard over another even in the absence of differences in intrinsic epistemic value. For example, right-hand and left-hand driving are alternative standards for traffic. Although there is no difference in the intrinsic value of the two options, society is not indifferent to the side people drive their cars on. Rather, in the absence of differences in intrinsic value, coordination is clearly the remaining criterion for evaluating the desirability of the two alternatives. The same goes for the adoption of scientific frameworks such as the Bayesian one: their value depends both on their intrinsic epistemic value and on their success at facilitating coordination. In science, successful coordination allows scientists to divide labour and specialize (Kuhn (1970), Kitcher (1990), Wray (2011, chapter 7), De Langhe (2010)). Successful coordination on a joint standard means scientists can spend less time developing the framework itself and more time actually using it to gain knowledge and solve problems. This increased productivity in the short term comes at a cost in the long term. Less time spent on critical evaluation of the current framework, and on the formulation and exploration of novel frameworks with potentially superior intrinsic epistemic value, entails a reduced ability to adapt to newly gathered knowledge and a higher probability of lock-in to a suboptimal standard (cf. Arthur (1989)).
In sum, scientists adopting a framework face a trade-off between the conflicting demands of specialization and innovation. For example, cognitive scientists adopting the Bayesian framework instead of the quantum probabilistic approach would face a trade-off between specialization and innovation, between exploiting well-trodden tools and knowledge and exploring less developed ones. The source of this dilemma is that exploring new frameworks and exploiting a given framework are mutually exclusive activities, because they take place at different levels: within and between frameworks. Specialization corresponds to taking the framework for granted and focusing on using it to achieve results. The amount of specialization that a framework allows is a measure for comparison between frameworks, and is a function of the number of its adopters (with whom labour can be divided and productivity increased) in comparison to other frameworks. Innovation is a measure for comparison of contributions within a framework. The innovativeness of a contribution to a framework depends on the number of other contributions already made to that framework. A contribution exploiting a popular framework might not be very innovative from the point of view of making progress towards improving or replacing that framework, but it does contribute to its exploitation, and as such allows other scientists to specialize in other aspects of that framework’s exploitation. Conversely, a contribution to a novel framework might not foster immediate specialization but may pave the way for a future, more efficient standard for specialization. As such, we conceive of the utility of a scientific contribution as the product of these two fundamentally different but essential factors. To put this point formally, consider a community of $N$ scientists, indexed $i = 1, \dots, N$. At each time step, each scientist $i$ makes a contribution $c_i$, from the set of contributions $C = \{c_1, \dots, c_N\}$, to one framework from the set of frameworks $S = \{s_1, \dots, s_M\}$.
Exploitation consists in making a contribution to an existing framework. The more scientists exploit the same framework, the higher the benefit of specialization becomes, because scientists can specialize in narrower sub-problems and specialized tools can be developed, which will foster scientific progress.3 As a consequence, a local proxy for the benefits of exploitation is the number of adopters of a framework.
3 The insight that division of labour increases productivity by fostering specialization is as old as Adam Smith (2003) and marked the birth of modern economics. For an application of this model to the literature on the division of cognitive labour, see De Langhe (2014).
More precisely, the “adoption” $A_s(t)$ of a framework $s$ at time $t$ is the number of scientists that contribute to it at that time:
$$A_s(t) = \sum_{i=1}^{N} a_{i,s}(t), \qquad (1)$$
where $a_{i,s}(t) = 1$ if scientist $i$ contributes to $s$ at time $t$, and $a_{i,s}(t) = 0$ otherwise. Exploration consists in allocating scientific labour to a new framework. The less articulated a framework, the higher the innovative value of contributing to it. If it is assumed that each scientist makes one contribution at each time, then a local proxy for the benefits of exploration is the inverse of the number of contributions made to a framework up to that time. More precisely, the “production” $P_s(t)$ of a framework $s$, i.e. the total number of contributions made to it, is the sum of its adopters through time $t$:
$$P_s(t) = \int_0^t A_s(t')\,dt'. \qquad (2)$$
The utility $U$ of a framework $s$ is jointly determined by exploration and exploitation:
$$U_s(t) = \frac{A_s(t)^{\alpha}}{P_s(t)}. \qquad (3)$$
The parameter α denotes the output elasticity of coordination, which is a function of the state of the tools and epistemic technologies underlying a scientific community (e.g., textbooks, focused workshops, conferences, standard methodologies, well-understood formalisms, specialized tools for data analysis), taken as exogenous for the purpose of this paper. $U_s$ is backward-looking because it evaluates the utility of the last contribution to a framework.
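Equations (1)-(3) can be computed directly from a record of which framework each scientist contributed to. The Python sketch below is a minimal illustration under two assumptions not stated in the text: time is discrete, so the integral in (2) becomes a cumulative sum over steps 0 to t, and the contribution history, the framework labels and the value of α are hypothetical.

```python
from collections import Counter


def adoption(step_choices):
    """Eq. (1): A_s(t), the number of scientists contributing to s at step t."""
    return Counter(step_choices)


def production(history, t):
    """Eq. (2), discretised (an assumption): P_s(t) = sum of A_s over steps 0..t inclusive."""
    total = Counter()
    for step_choices in history[: t + 1]:
        total.update(step_choices)
    return total


def backward_utility(history, t, alpha):
    """Eq. (3): U_s(t) = A_s(t)**alpha / P_s(t), for every s with P_s(t) > 0."""
    a = adoption(history[t])
    p = production(history, t)
    return {s: a[s] ** alpha / p[s] for s in p}


# Hypothetical history: 4 scientists over 3 steps; each inner list holds the
# framework each scientist contributed to at that step.
history = [
    ["bayes", "bayes", "bayes", "qp"],
    ["bayes", "bayes", "bayes", "bayes"],
    ["bayes", "bayes", "qp", "qp"],
]
print(backward_utility(history, t=2, alpha=2.5))
```

In this hypothetical history both frameworks have two adopters at the last step, but the less developed one (“qp”) has a third of the production, so its backward-looking utility is three times higher: this is the innovation factor $1/P_s(t)$ at work.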
However, individual agents do not take into account the utility of the last contribution to a framework, but the utility that their own contribution would have if they made it to that framework.4 Hence the utility of a contribution to a framework is:
$$U'_s(t) = \frac{\left(A_s(t) + 1\right)^{\alpha}}{P_s(t) + 1}. \qquad (4)$$
4 If scientists always chose to develop the framework that is best articulated, then no alternative frameworks would ever be developed, because any new framework would always be less articulated than the existing ones. Scientists can only be expected to develop new frameworks if their focus is not backward-looking but forward-looking.
The relation between adoption and production, as specified here dynamically, is an attempt to capture the essence of the trade-off between exploitation and exploration. More exploitation means less exploration; similarly, the number of adopters of a framework increases the specialization benefits of exploiting it but decreases the novelty of exploring it. The result is a constant tension for each agent between exploitation and exploration, where more exploration causes exploitation to become more attractive and vice versa. On the one hand, convergence of scientists on a shared framework is a good thing because of specialization effects. This is modelled by letting the utility of a contribution to a framework increase as more others adopt that framework. On the other hand, it is also important that new frameworks can be developed. This is modelled by letting the utility of a contribution to a framework decrease as more contributions to that framework have been made. In sum, the utility of a contribution to a framework varies with adoption and varies inversely with production. Although our model allows different frameworks to be assigned different intrinsic values, in line with our assessment of the argument from uncertainty above, we shall assume that the intrinsic value of each framework is the same.
This is to isolate the effects of the essential dynamics sketched above and to provide a clean answer to the questions that concern us here: Under what conditions is exploitation of an existing framework like the Bayesian one the rational decision to make? And, ceteris paribus, which ratio of exploration to exploitation is superior? The dynamics of the model are as follows. At each time step, each agent makes a contribution to a framework, with the probability of contributing to a specific framework proportional to the utility (4) of the next contribution to that framework. This not only assigns a probability of adoption to each existing framework but also introduces a non-zero utility to the creation of a new framework. A new framework has no adoption (A = 0) and no production (P = 0), resulting in a fixed utility of 1 regardless of α. The probability that a scientist creates a new framework is therefore $1 / (1 + \sum_s U'_s(t))$, which decreases as the sum of the utilities of all frameworks already being adopted increases. The lower the utility of the existing frameworks, the higher the probability that a new one is created, thus self-regulating the system into a balance between the exploitation of existing frameworks and the exploration of new ones. The number of frameworks in the model is not fixed in advance but is a function of the organization of labour in the model itself. Now that utilities have been assigned, it is possible to evaluate alternative strategies for trading off exploration and exploitation based on the total utility they produce for the community as a whole. For example, it is possible to assess how the cognitive science community should trade off the value of exploiting a well-articulated, widespread framework against that of exploring some novel, under-considered alternative. Fig. 1 shows, for α = 2.5, that the total utility in a system with an adaptive ratio is greater than the total utility created by any fixed proportion of explorers from 0 to 100%.
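The adaptive dynamics just described, and the fixed-ratio strategies they are compared against, can be sketched as a small simulation. This is a hypothetical Python reconstruction, not the authors’ code: the text does not fix the update schedule, the initial condition, how a fixed share of explorers behaves, or how “total utility” is aggregated. Each of these is therefore an assumption, flagged in the comments: synchronous updates, a single seed framework at the start, explorers who always found a new framework, and community utility taken as the sum of (3) over the frameworks active at each step.

```python
import random


def forward_utility(a, p, alpha):
    """Eq. (4): utility of one more contribution, (A_s + 1)**alpha / (P_s + 1)."""
    return (a + 1) ** alpha / (p + 1)


def simulate(n_agents, n_steps, alpha, explorer_share=None, seed=0):
    """Return (per-step community utility, number of frameworks ever created).

    explorer_share=None: adaptive rule. Every agent samples among the existing
    frameworks and a brand-new one, with probabilities proportional to eq. (4).
    explorer_share in [0, 1]: fixed rule (an assumption). That share of agents
    always founds a new framework; the rest sample among existing frameworks only.
    """
    rng = random.Random(seed)
    prod = {0: 0}  # P_s, cumulative contributions; framework 0 is an assumed seed
    adopt = {}     # A_s at the previous step (synchronous updating is assumed)
    totals = []
    for _ in range(n_steps):
        names = list(prod)
        weights = [forward_utility(adopt.get(s, 0), prod[s], alpha) for s in names]
        if explorer_share is None:
            n_explore = 0
            names.append(None)  # None stands for "found a new framework"
            weights.append(forward_utility(0, 0, alpha))  # A = P = 0 gives utility 1
        else:
            n_explore = round(explorer_share * n_agents)
        picks = rng.choices(names, weights=weights, k=n_agents - n_explore)
        picks += [None] * n_explore
        step = {}
        for s in picks:
            if s is None:
                s = len(prod)  # ids are consecutive integers, so this id is unused
                prod[s] = 0
            step[s] = step.get(s, 0) + 1
        for s, a in step.items():
            prod[s] += a
        adopt = step
        # Assumed community utility: sum of eq. (3), A_s**alpha / P_s, over active frameworks.
        totals.append(sum(a ** alpha / prod[s] for s, a in step.items()))
    return totals, len(prod)
```

Under these assumptions, `simulate(1000, 100, 2.5)` and `simulate(1000, 100, 2.5, explorer_share=0.3)` can be compared by summing their per-step totals; whether such a comparison reproduces Fig. 1 depends on how closely the assumptions match the original model, so no particular result should be read off this sketch.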
A system without explorers (0%) does well initially but is unsustainable in the long term. In our case, if no cognitive scientist explores novel frameworks for representing uncertainty as alternatives to Bayesian decision theory, then the acquisition of valuable knowledge in cognitive science will be hindered in the long run. A system consisting only of explorers (100%) can keep on growing due to continuous innovation, but with as many frameworks as there are scientists it cannot reap the benefits of specialization. In our case, if the cognitive science community were highly fragmented, with individual scientists or labs working within different frameworks, then there would be no opportunity at all to advance the field through specialization. Combinations of both do better, with the optimal fixed strategy at around 30% exploration. Yet the most important result is that an adaptive strategy, whereby the ratio of exploration to exploitation varies with the utility of the available alternatives, is superior. This is because it not only combines the benefits of specialization deriving from exploitation with the benefits of innovation deriving from exploration, but can also shift adaptively between them as circumstances require. In relation to our case, if the majority of cognitive scientists worked within the Bayesian framework, while a sizable minority explored under-considered alternatives, then advancements in scientific knowledge of at least some cognitive phenomena and behaviours would be most likely to result.
Figure 1: Comparison of total utility with an adaptive vs. fixed ratio of explorers for 1,000 agents over 100 steps with α = 2.5.
Figure 2: Total utility as a function of α after 1,000 steps for 1,000 agents.
The superiority of the adaptive strategy is, however, not universal. Robustness analysis in Fig. 2 shows that the adaptive strategy is only superior if there are sufficient benefits to dividing labour, viz. if α is sufficiently large.
If α is too small, communities derive greater benefit from full exploration than from an adaptive strategy. This may explain why branches of cognitive science that have not (yet) been able to develop the epistemic technology and tools that make a division of labour possible or worthwhile (e.g., textbooks, workshops, conferences, standard methodologies, formalisms, specialized tools) have a substantially different disciplinary structure from those that have. For all systems where α is sufficiently high, this model shows that well-timed intermittent phases of exploitation and exploration are preferable to fixed strategies. In conclusion, two morals relevant to the case of Bayesian cognitive science can be drawn from our model. First, the (successive) monopoly of a single framework is preferable to pluralism in situations where the intrinsic value of different frameworks is comparable or unknown. Second, a fixed exploration ratio of around 30% is superior to other fixed ratios, but inferior to a dynamic ratio by which exploration of new frameworks increases as contributions to the monopolistic framework accumulate and that framework is gradually depleted. So, currently, the monopoly of the Bayesian approach to uncertainty in cognitive science may well be justified by the value of specialization it promotes. However, in the longer run, progress will require a mixed strategy of continuing to exploit the Bayesian framework while devoting more time and attention to exploring new or underappreciated frameworks.
4 Conclusion
In several branches of current cognitive science, it is widely assumed that the Bayesian framework should be chosen for finding and assessing explanations of cognitive phenomena whose production involves uncertainty.
However, as we have argued in this paper, this assumption is far from unproblematic, since it is not obvious that the Bayesian framework enjoys special epistemic virtues over available but under-considered alternatives for representing uncertainty. A better justification for adopting the Bayesian framework in cognitive science is that it currently comprises a richer body of tools and epistemic technologies that can be opportunistically exploited so as to foster specialization. While the value of specialization trades off with the value of innovation, specialization is often the best way to achieve scientific progress. Thus, the present paper has made two contributions to the existing literature in philosophy and cognitive science. First, it has critically reconstructed the argument from uncertainty for Bayesian cognitive science, arguing that it does not provide cognitive scientists with a strong reason to favour the Bayesian approach over alternatives. Second, relying on a simple model of the division of cognitive labour in science, the paper has put forward a novel argument, based on the value of specialization, in support of the Bayesian approach in current cognitive science.
References
Arthur, B. (1989). Competing technologies, increasing returns, and lock-in by historical events. Economic Journal, 99 (394), 116-131.
Bovens, L., & Hartmann, S. (2003). Bayesian epistemology. Oxford: Clarendon Press.
Bowers, J., & Davis, C. (2012). Bayesian just-so stories in psychology and neuroscience. Psychological Bulletin, 138, 389-414.
Busemeyer, J., & Bruza, P. (2012). Quantum models of cognition and decision. Cambridge: Cambridge University Press.
Chater, N., Goodman, N. D., Griffiths, T. L., Kemp, C., Oaksford, M., & Tenenbaum, J. B. (2011). The imaginary fundamentalists: The unshocking truth about Bayesian cognitive science. Behavioral and Brain Sciences, 34, 194-196.
Chater, N., Tenenbaum, J. B., & Yuille, A. (2006). Probabilistic models of cognition: Conceptual foundations. Trends in Cognitive Sciences, 10 (7), 287-291.
Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36 (3), 181-204.
Colombo, M., & Hartmann, S. (under review). Bayesian cognitive science, unification, and explanation. The British Journal for the Philosophy of Science.
Colombo, M., & Seriès, P. (2012). Bayes in the brain: On Bayesian modelling in neuroscience. The British Journal for the Philosophy of Science, 63 (3), 697-723.
Colyvan, M. (2008). Is probability the only coherent approach to uncertainty? Risk Analysis, 28 (3), 645-652.
Cox, R. (1946). Probability, frequency, and reasonable expectation. American Journal of Physics, 14, 1-13.
De Langhe, R. (2010). The division of labour in science: The tradeoff between specialisation and diversity. Journal of Economic Methodology, 17 (1), 37-51.
De Langhe, R. (2014). A unified model of the division of cognitive labor. Philosophy of Science, 81 (3), 444-459.
Dempster, A. (1968). A generalization of Bayesian inference. Journal of the Royal Statistical Society, Series B (Methodological), 30, 205-247.
Doya, K., Ishii, S., Pouget, A., & Rao, R. (2007). Bayesian brain: Probabilistic approaches to neural coding. Cambridge, MA: MIT Press.
Drugowitsch, J., & Pouget, A. (2012). Probabilistic vs. non-probabilistic approaches to the neurobiology of perceptual decision-making. Current Opinion in Neurobiology, 22 (6), 963-969.
Dubois, D., & Prade, H. (2007). Possibility theory. Scholarpedia, 2 (10), 2074.
Eberhardt, F., & Danks, D. (2011). Confirmation in the cognitive sciences: The problematic case of Bayesian models. Minds and Machines, 21 (3), 389-410.
Faisal, A. A., Selen, L. P. J., & Wolpert, D. M. (2008). Noise in the nervous system. Nature Reviews Neuroscience, 9, 292-303.
Fiser, J., Berkes, P., Orban, G., & Lengyel, M. (2010). Statistically optimal perception and learning: From behavior to neural representations. Trends in Cognitive Sciences, 14 (3), 119-130.
Griffiths, T. L., Chater, N., Kemp, C., Perfors, A., & Tenenbaum, J. B. (2010). Probabilistic models of cognition: Exploring representations and inductive biases. Trends in Cognitive Sciences, 14 (8), 357-364.
Griffiths, T. L., Chater, N., Norris, D., & Pouget, A. (2012). How the Bayesians got their beliefs (and what those beliefs actually are). Psychological Bulletin, 138, 415-422.
Halpern, J. (2003). Reasoning about uncertainty. Cambridge, MA: MIT Press.
Huber, F. (2014). Formal representations of belief. The Stanford Encyclopedia of Philosophy (Spring 2014 Edition).
Jones, M., & Love, B. C. (2011). Bayesian fundamentalism or enlightenment? On the explanatory status and theoretical contributions of Bayesian models of cognition. Behavioral and Brain Sciences, 34, 169-188.
Kitcher, P. (1990). The division of cognitive labor. Journal of Philosophy, 87 (1), 5-22.
Knill, D., & Pouget, A. (2004). The Bayesian brain: The role of uncertainty in neural coding and computation. Trends in Neurosciences, 27, 712-719.
Knill, D. C., & Richards, W. E. (1996). Perception as Bayesian inference. New York: Cambridge University Press.
Körding, K. (2007). Decision theory: What “should” the nervous system do? Science, 318, 606-610.
Kuhn, T. (1970). The structure of scientific revolutions (2nd ed.). Chicago: University of Chicago Press.
Kwisthout, J., Wareham, T., & van Rooij, I. (2011). Bayesian intractability is not an ailment that approximation can cure. Cognitive Science, 35 (5), 779-784.
Lindley, D. V. (1982). Scoring rules and the inevitability of probability. International Statistical Review, 50, 1-26.
Ma, W., Beck, J., Latham, P., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nature Neuroscience, 9, 1432-1438.
Maloney, L. T. (2002). Statistical decision theory and biological vision. In D. Heyer & R. Mausfeld (Eds.), Perception and the physical world: Psychological and philosophical issues in perception.
Mamassian, P., Landy, M. S., & Maloney, L. T. (2002). Bayesian modeling of visual perception. In R. Rao, M. Lewicki, & B. Olshausen (Eds.), Probabilistic models of the brain: Perception and neural function.
Orban, G., & Wolpert, D. (2011). Representations of uncertainty in sensorimotor control. Current Opinion in Neurobiology, 21, 1-7.
Poirier, D. J. (2006). The growth of Bayesian methods in statistics and economics since 1970. Bayesian Analysis, 1, 969-980.
Pothos, E. M., & Busemeyer, J. R. (2013). Can quantum probability provide a new direction for cognitive modeling? Behavioral and Brain Sciences, 36, 255-327.
Pouget, A., Beck, J., Ma, W., & Latham, P. (2013). Probabilistic brains: Knowns and unknowns. Nature Neuroscience, 16, 1170-1178.
Rao, R., Olshausen, B., & Lewicki, M. (2002). Probabilistic models of the brain: Perception and neural function. Cambridge, MA: MIT Press.
Rédei, M., & Summers, S. J. (2007). Quantum probability theory. Studies in History and Philosophy of Modern Physics, 38, 390-417.
Rescorla, M. (in press). Bayesian perceptual psychology. In M. Matthen (Ed.), The Oxford handbook of the philosophy of perception.
Shafer, G. (1992). The Dempster-Shafer theory. In S. C. Shapiro (Ed.), Encyclopedia of artificial intelligence (2nd ed.).
Simoncelli, E. P. (2009). Optimal estimation in sensory systems. In M. S. Gazzaniga (Ed.), The cognitive neurosciences IV.
Smith, A. (2003). Wealth of nations. New York: Bantam Classics. (Original work published 1776)
Spohn, W. (2009). A survey of ranking theory. In F. Huber & C. Schmidt-Petri (Eds.), Degrees of belief.
Tenenbaum, J., Kemp, C., Griffiths, T., & Goodman, N. (2011). How to grow a mind: Statistics, structure, and abstraction. Science, 331, 1279-1285.
Vilares, I., & Körding, K. (2011). Bayesian models: The structure of the world, uncertainty, behavior, and the brain. Annals of the New York Academy of Sciences, 1224, 22-39.
Vineberg, S. (2014). Dutch book arguments. The Stanford Encyclopedia of Philosophy.
Wray, K. B. (2011). Kuhn’s evolutionary social epistemology. Cambridge: Cambridge University Press.
