
Judgment and Decision Making

  • Priscila G. Brust-Renck, Graduate School of Psychology, Universidade do Vale do Rio dos Sinos, Brazil
  • Rebecca B. Weldon, Department of Social and Behavioral Sciences, SUNY Polytechnic Institute
  • Valerie F. Reyna, Departments of Human Development and Psychology, Cornell University
  • https://doi.org/10.1093/acrefore/9780190236557.013.536
  • Published online: 26 April 2021

Everyday life comprises a series of decisions, from choosing what to wear to deciding what major to declare in college and whom to share a life with. Modern economic theories were first brought into psychology in the 1950s and 1960s by Ward Edwards and Herbert Simon. Simon suggested that individuals do not always choose the best alternative among the options because they are bounded by cognitive limitations (e.g., memory). People who choose the good-enough option “satisfice” rather than optimize, because they are bounded by their limited time, knowledge, and computational capacity. Daniel Kahneman and Amos Tversky were among those who took the next step by demonstrating that individuals are not only limited but also inconsistent in their preferences, and hence irrational. Describing a series of biases and fallacies, they elaborated intuitive strategies (i.e., heuristics) that people tend to use when faced with difficult questions (e.g., “What proportion of long-distance relationships break up within a year?”), answering instead on the basis of simpler, similar questions (e.g., “Do instances of swift breakups of long-distance relationships come readily to mind?”).

More recently, the emotion-versus-reason debate has been incorporated into the field as an approach in which judgments can be governed by two fundamentally different processes, such as intuition (or affect) and reasoning (or deliberation). A series of dual-process approaches by Seymour Epstein, George Loewenstein, Elke Weber, Paul Slovic, and Ellen Peters, among others, attempts to explain how decisions based on emotional and/or impulsive judgments (i.e., system 1) should be distinguished from those based on a slow process that is governed by rules of reasoning (i.e., system 2). Valerie Reyna and Charles Brainerd and other scholars take a different approach to dual processes and propose a theory—fuzzy-trace theory—that incorporates many of the prior theoretical elements but also introduces the novel concept of gist mental representations of information (i.e., essential meaning) shaped by culture and experience. Adding to processes of emotion or reward sensitivity and reasoning or deliberation, fuzzy-trace theory characterizes gist as insightful intuition (as opposed to crude system 1 intuition) and contrasts it with verbatim or precise processing that does not consist of meaningful interpretation. Some of these new perspectives explain classic paradoxes and predict new effects that allow us to better understand human judgment and decision making. More recent contributions to the field include research in neuroscience, in particular from neuroeconomics.

  • decision making
  • bounded rationality
  • prospect theory
  • judgment biases
  • judgment fallacies
  • dual-process theory
  • fuzzy-trace theory

Overview: Judgment and Decision Making in Psychology Research

Judging and deciding what to do can involve seemingly simple tasks in some circumstances, such as continuing to read this article or choosing what to eat, but it can also involve larger life choices, such as whom to marry or what subject to study in college. Research on judgment and decision making within the field of psychology has been devoted to unraveling the way humans make their decisions on a day-to-day basis. Overall, judgment per se can be characterized as the thought, opinion, or evaluation of a stimulus, and decision is the behavior of choosing among alternative options. In the traditional view, the decision-making process is complex given that one must analyze alternative options, estimate the consequences of choosing each option, and deal with conditions of uncertainty (von Neumann & Morgenstern, 1944). Research in judgment and decision making has also become increasingly interdisciplinary.

Historically, behaviorism was the primary school of thought in psychology until the 1950s or so, but critics of behaviorism recognized that stimulus–response accounts are not sufficient for explaining human behavior (Greenwood, 1999 ). For example, two stimuli can elicit the same response, and one stimulus can lead to two different responses. Furthermore, it is too simplistic to draw conclusions about human behavior without considering the underlying mental processes. In the early years, the judgment and decision-making field was primarily based on theory and data from economics and psychology (notably Edwards, 1954 ), but judgment and decision making also integrates law, political science, social policy, management science, marketing, engineering, and medicine, among others (Arkes & Hammond, 1986 ; Hammond, 1996 ; Slovic et al., 1977 ).

Research on judgment concerns such topics as perceptions of consequences and predictions about future outcomes, and research on decision making concerns understanding preferences (for reviews, see Fischhoff & Broomell, 2020; Mellers et al., 1998; Weber & Johnson, 2009). Psychological processes (e.g., Kahneman & Tversky, 1979) have been studied to explain phenomena of judgment and choice that date back to the original predictions of economic models (e.g., von Neumann & Morgenstern, 1944). To understand the advances psychology has made in predicting judgment and decision processes, a brief overview of the relevant economic theories is necessary. In particular, the normative approach from economic theory, which was based on axioms of coherence in preferences, showed that following these axioms would ultimately deliver decisions that maximized an individual’s expected utility. Expected utility is the weighted average of the extent to which an outcome is preferred relative to its alternatives. For example, these axioms include transitivity of preferences: if option A is preferred to option B, and option B is preferred to option C, then option A should be preferred to option C. One goal of this important work was to establish normative rules defining rational choices in terms of each individual’s preference structure. Without identifying the best option per se, coherence, or consistency in decision making, is deemed to be normative (Baron, 2012).

Despite these normative criteria for evaluating behavior, judgments about risk and probability do not always obey rules of consistency and coherence (e.g., Tversky & Kahneman, 1983). The apparent failure of people to reason coherently raises larger concerns about the ability of humans to function well in real-life situations (e.g., Allais, 1953; Tversky & Kahneman, 1974; but see Simon, 1955, discussed in the section “Early Milestones in Psychology”). In contrast to models that assumed rationality, a new set of descriptive models was developed, based on cognitive psychology research, to account for how individuals actually make decisions. The distinction among normative, descriptive, and prescriptive models is needed to clarify research goals: Normative models apply to how people should decide; descriptive models refer to how people actually make decisions; and prescriptive approaches help people make better decisions (Bell et al., 1988).

This article provides an overview of the historical path of research in the field of psychology. Because the early milestones were a direct reaction to economics research, the first step is an overview of the key relevant models, such as expected utility theory (a theory of rational choice), which assumed that normative and descriptive models were the same. This is followed by a review of the early research on violations of the normative standards of the economic models, including Simon’s satisficing hypothesis (accepting an available option as satisfactory rather than maximizing), ambiguity aversion (a preference for known risks rather than unknown risks), and other paradoxes (i.e., Allais, 1953; Ellsberg, 1961), which suggested that descriptive models violated normative assumptions. These ideas and associated empirical phenomena challenged normative models and provided the foundation for significant departures from rational models. These challenges set the stage for insights and methods from psychology that could explain why human behavior did not follow the tenets of rational choice theories.

A turning point for psychology was when a substantial amount of research demonstrated that deviations from the rational rules of judgments and decision making were systematic. Daniel Kahneman and Amos Tversky’s ( 1979 ) prospect theory was central to this new era of research based on descriptive models, generating phenomena that deviated from normative predictions. This article reviews current models of psychological processes involved in making judgments and choices, noting those that account for the roles of affect, rationality, intuition, and other psychological processes (e.g., Kahneman, 2003 ; Peters & Slovic, 2007 ; Reyna & Brainerd, 2011 ). The conclusion includes recent contributions to the field from neuroscience, in particular, from neuroeconomics.

Prelude: Classical Economics

The advent of research in judgment and decision making in psychology was directly related to how these topics were studied in the field of economics (see Becker & McClintock, 1967, for a review). Economic theory proposed to identify the best possible solution to a problem given the decision maker’s values and preferences (for reviews, see Baron, 2012; Fischhoff, 2010). Such preferences are a result of the probability of winning multiplied by the value of that outcome—expected value—a concept that dates back to the mathematical work of Blaise Pascal in the 17th century (for a review, see Edwards, 2001). A decision problem may involve a set of alternative possible outcomes (e.g., winning a $100 prize in a lottery), uncertainty of information in terms of the probability of occurrence (e.g., the chances of winning the prize), ambiguity (in which the decision maker lacks knowledge or information about the probabilities or outcomes), or outcomes that occur sooner versus later in time (e.g., Luce & Shipley, 1962; O’Donoghue & Rabin, 1999). Note that probabilities can be known, as in decisions under risk, or unknown, as in decisions under ambiguity. To clarify, “ambiguity is epistemic uncertainty about probability created by missing information that is relevant and could be known” (Camerer & Weber, 1992, p. 330).

A key contribution to the field was Daniel Bernoulli’s (1954) concept of what later came to be called diminishing marginal utility (i.e., that small changes to extremely large values have little impact on choice, but identical changes to small amounts are more likely to make a difference). Interestingly, mathematicians and physicists showed very little interest in Bernoulli’s 1738 work. It is so fundamental to economic theory, however, that economists translated it from Latin and published it in 1954 in a top journal, over 200 years after it was written. There continues to be substantial interest in this work long after Bernoulli’s death (Stearns, 2000). Bernoulli presented one of the first accounts of why people prefer a sure gain over a gamble with the same expected value (i.e., risk aversion). The deviation from expected value was explained by assuming that utility, which is a subjective function of value, is not linear with objective value but rather a concave function of it.

The ideas about maximizing utility and rational choice that eventually were developed in the 20th century stemmed from intellectual debates of the 19th century, when philosophers debated policies to benefit the greater good (what type of policy would benefit the greatest number of people?) while simultaneously trying to predict economic outcomes (how does an economy filled with self-interested individuals thrive?; for a review, see Levin & Milgrom, 2004). Rational choice theory has been used to explain choices about saving and spending, crime, marriage, and childbearing, with an emphasis on individuals doing what is best for themselves and choosing the action that has the greatest perceived utility (in a cost–benefit analysis of options). Rational choice theory has been useful in that it has helped generate clear and falsifiable hypotheses, in turn advancing the field of judgment and decision making. Rational choice theory made assumptions of human rationality and maximization of utility.

The concept of rationality within this framework is expressed as internal coherence of a set of preferences (see Mellers et al., 1998 , for a review). In this view, real-world deviations from consistency of revealed preferences were considered irregular or trivial and eliminated from the rational choice model (Samuelson, 1938 ; Suzumura, 1976 ). According to these types of models, individuals are assumed to be rational, that is, they choose coherently, with the chosen option reflecting utilities or personal preferences. Thus, if a person shows a preference for one particular object (or activity) when compared to another one, the utility of that object is higher than that of the rejected object, and the preference relation follows principles of coherence, such as transitivity. Transitivity refers to the coherence of preferences, such that, for example, if a person prefers bananas over apples and apples over oranges, that person would consistently choose bananas over oranges (Levin & Milgrom, 2004 ). The true nature of preferences is revealed by choices themselves; in classical rational choice theory, there is no underlying preference beyond what can be inferred from people’s choices.

Von Neumann and Morgenstern (1944) showed that when people’s choices obeyed these rules of coherence, they would maximize their expected utility or overall satisfaction. The theorem proving maximization of expected utility was a major achievement, the details of which are beyond the scope of this article. Expected utility is related to expected value; the latter is the result of multiplying each possible outcome by its probability of occurrence. For example, a gamble with a 50% chance of a $100 gain would be preferred over a sure gain of $40 because $100 × 0.5 = $50, which is greater than $40 × 1.0 = $40. However, expected utility theory assumes that the satisfaction derived from outcomes is not linearly related to their objective magnitude.
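The distinction can be illustrated with a minimal sketch in Python (the square-root utility is an arbitrary illustrative choice, not a claim about any particular theory): the gamble has the higher expected value, but a concave utility function makes the sure $40 the option with the higher expected utility, which is Bernoulli’s explanation of risk aversion.

```python
import math

def expected_value(prospect):
    """Probability-weighted average of the monetary outcomes."""
    return sum(p * x for p, x in prospect)

def expected_utility(prospect, u):
    """Probability-weighted average of the utilities of the outcomes."""
    return sum(p * u(x) for p, x in prospect)

gamble = [(0.5, 100.0), (0.5, 0.0)]   # 50% chance of $100, otherwise nothing
sure_thing = [(1.0, 40.0)]            # $40 for sure

u = math.sqrt  # illustrative concave utility (diminishing marginal utility)

print(expected_value(gamble), expected_value(sure_thing))            # 50.0 vs. 40.0
print(expected_utility(gamble, u), expected_utility(sure_thing, u))  # ~5.0 vs. ~6.32
```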

Theories that followed expected utility theory introduced the idea that probabilities are not perceived objectively (e.g., Luce & Raiffa, 1957 ; Markowitz, 1952 ; Savage, 1954 ). Such theories as expected utility theory and subsequently subjective expected utility theory (e.g., Keeney & Raiffa, 1976 ; Savage, 1954 ; Schoemaker, 1982 ; Stigler, 1950 ; von Neumann & Morgenstern, 1944 ) became well established in economics research, and the assumption of individual rationality was applied to markets and policies (e.g., Frank, 2015 ). According to these theories of rationality, people should choose consistently among their options, and they maximize their expected utility by choosing the option with the overall greatest value.

Expected utility theories continue to influence modern economic approaches, including those using econometric techniques. These techniques are used to predict human behavior from large economic data sets applied to consumer behavior, health policies, and the social and political sciences, among others (see Pope & Sydnor, 2015, for a review). Even though these models are still considered mainstream and are the current view in many areas of economics (e.g., Frank, 2015), they do not account for key phenomena of behavior, as discussed in “Early Milestones in Psychology: Departures from Economic Theories.” To deal with the growing list of behavioral violations of rational choice, and the need to accept behavioral assumptions and insights from psychology, the field of behavioral economics emerged in the late 1980s. Richard Thaler, one of the founding fathers of the field and first director of the Center for Behavioral Economics and Decision Research at Cornell University in 1989, combined work from psychologists and empirical economists to attempt to account for biases and examine alternative frameworks, for which Thaler was awarded the 2017 Nobel Prize in Economic Sciences (e.g., Kahneman, 2012; Pope & Sydnor, 2015; Rabin, 1998, 2002; Rangel et al., 2008). One of the main goals of behavioral economics was to acknowledge and incorporate psychology into descriptive assumptions in order to improve economic analysis. The research that served as inspiration for this change in mindset is the focus of the next section, “Early Milestones in Psychology: Departures from Economic Theories.”

Early Milestones in Psychology: Departures from Economic Theories

From the 1950s to the 1970s, judgment and decision-making research in psychology reacted to the standards of economic, normative models and identified systematic departures from those standards (i.e., biases and fallacies). Information theory, developed in the context of communication engineering during World War II, influenced the Cognitive Revolution, which drew on information theories and computer technology (for a retrospective, see Miller, 2003); afterward, psychologists began to study the rational mind in addition to stimulus–response relations and observable behavior. In 1954, Ward Edwards brought this topic of research to psychologists by publishing an article on the principles of microeconomic theory that directly apply to psychology, such as risky choice, subjective probability, and game theory. This paper was followed by a review of the empirical and theoretical evidence from economics from 1954 to 1960 (Edwards, 1961).

Psychologists started investigating the relationship between the normative and descriptive aspects of judgment and decision making. They discovered that people’s behavior and preferences violated normative theories, exhibiting biases and fallacies when compared against those normative standards. Psychologists focused on understanding these biases and fallacies, whereas economists downplayed them (e.g., intransitive ordering of risky choices; Tversky, 1969). The study of the discrepancies between normative and descriptive models is still a recurring theme underlying contemporary judgment and decision-making research (for a review, see Keren & Wu, 2015).

One important problem that influenced two notable researchers in judgment and decision making, Daniel Kahneman and Amos Tversky, is illustrative of systematic violations of consistency and thus challenges expected utility theory: the Allais paradox. In 1953, Maurice Allais proposed a comparison between two lotteries, one pairing a sure option with a gamble (francs are converted to dollars in the following example):

A. Receive $1 million for sure.

B. 10% chance of receiving $5 million, an 89% chance of receiving $1 million, or a 1% chance of receiving nothing.

Then he also proposed a comparison between two additional gambles:

C. 11% chance of receiving $1 million, or 89% chance of receiving nothing.

D. 10% chance of receiving $5 million, or 90% chance of receiving nothing.

The normative prediction is that if people choose A in the first lottery (representing risk aversion), then they should also choose C in the second lottery, in which there is a greater chance of winning, and thus they would show consistent preferences toward risk (Allais, 1953). Alternatively, the same people who choose option B should also choose D, showing consistent risk-seeking preferences. However, this is not the case: people tend to be risk averse (choose A) in the first lottery and risk seeking (choose D) in the second; that is, they make choices that are not consistent. Such inconsistency violates rationality.
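A short sketch (in Python; the utility functions are arbitrary placeholders) makes the inconsistency concrete: for any utility function u, the expected-utility difference between options A and B is algebraically identical to the difference between C and D, so a coherent expected-utility maximizer who prefers A must also prefer C.

```python
def eu(prospect, u):
    """Expected utility of a list of (probability, outcome) pairs."""
    return sum(p * u(x) for p, x in prospect)

A = [(1.00, 1_000_000)]
B = [(0.10, 5_000_000), (0.89, 1_000_000), (0.01, 0)]
C = [(0.11, 1_000_000), (0.89, 0)]
D = [(0.10, 5_000_000), (0.90, 0)]

# For any u: EU(A) - EU(B) = 0.11*u(1M) - 0.10*u(5M) - 0.01*u(0) = EU(C) - EU(D),
# so the A-and-D (or B-and-C) pattern cannot maximize expected utility.
for u in (lambda x: x, lambda x: x ** 0.5, lambda x: (1 + x) ** 0.1):
    print(round(eu(A, u) - eu(B, u), 2), round(eu(C, u) - eu(D, u), 2))
    # the two differences agree (up to floating-point rounding) for every u
```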

Inconsistent preferences like those illustrated in the Allais paradox could be explained by limitations of human cognition. Herbert Simon (1955) applied the concept of “bounded rationality” to accommodate such limitations. In particular, Simon’s (1955, 1957) satisficing hypothesis was based on the need to deal with unrealistic expectations of maximization. According to Simon, individuals have cognitive limitations that should be taken into account when making judgments and decisions. Some of these limitations are related to memory capacity, attention span, and limitations of time, all of which constitute the framework Simon referred to as “bounded rationality” (Simon, 1955). Simon proposed that people tend to find solutions that are good enough instead of optimizing (i.e., finding the best possible solution), because it is not reasonable for people to exhaustively compute their expected utility (e.g., they choose the first or second car that meets a satisfactory criterion instead of researching all available cars on the market). This scenario is easily observed when there are multiple attributes, which makes the computational process more difficult, and there are greater benefits to minimizing the time and cognitive resources needed to produce a satisfactory result. In other words, boundedly rational decision makers satisfice instead of optimizing their choices (Simon, 1956, 1990).
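The contrast can be expressed as a minimal sketch in Python (the options, scores, and aspiration level are hypothetical): the optimizer must score every option, whereas the satisficer stops at the first option that clears an aspiration level.

```python
def optimize(options, score):
    """Exhaustively evaluate every option and return the best one."""
    return max(options, key=score)

def satisfice(options, score, aspiration):
    """Return the first option that is good enough (meets the aspiration level)."""
    for option in options:
        if score(option) >= aspiration:
            return option
    return None  # no option met the aspiration level

# Hypothetical cars scored by overall attractiveness (higher is better).
cars = [("sedan A", 6.1), ("hatchback B", 7.4), ("wagon C", 8.9), ("coupe D", 7.0)]
score = lambda car: car[1]

print(optimize(cars, score))        # ('wagon C', 8.9) -- requires scoring all cars
print(satisfice(cars, score, 7.0))  # ('hatchback B', 7.4) -- stops at the second car
```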

In 1961, another paradox was introduced by Daniel Ellsberg, who worked for the RAND Corporation on military topics (and who later leaked the Pentagon Papers). Unlike Allais, who tested decisions under risk, Ellsberg challenged the assumptions of decisions under ambiguity, in which the exact probabilities of the outcomes cannot be precisely determined. This is a classic ambiguity problem: There is an urn with 90 balls (30 red balls and 60 balls that are black or yellow, in an unknown proportion).

In round 1, a prize of $100 is offered for a correct guess of which color would be drawn at random from the urn: (a) red or (b) black.

In round 2, a prize of $100 is offered for a correct guess of which color would be drawn at random from the same urn, but with different options: (c) red or yellow or (d) black or yellow.

The most common pattern of response is to prefer to bet on red (option a) in the first round and to bet on black or yellow (option d) in the second round. This finding is contrary to normative predictions: people who bet on red in the first round because its probability is known (a one-third chance of winning) should, by the same principle, bet on red or yellow (option c) in the second round. Instead, they appear to ignore the fact that the probability of drawing a yellow ball is added equally to both options in the second round, so that the remaining probabilities match the first round of bets (i.e., red versus black). According to Ellsberg (1961), people prefer to avoid the ambiguity of unknown probabilities and choose the options for which they know the probability of each outcome.
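The arithmetic behind the paradox can be checked with a brief sketch (Python): whatever the unknown split of the 60 black-or-yellow balls, the advantage of betting on red over black in round 1 is exactly the advantage of betting on red-or-yellow over black-or-yellow in round 2, so the modal a-then-d pattern cannot reflect any single consistent belief about the urn.

```python
# 90 balls: 30 red; the remaining 60 are black or yellow in an unknown proportion.
for yellow in range(0, 61):          # every possible composition of the urn
    black = 60 - yellow
    p_a = 30 / 90                    # round 1: bet on red
    p_b = black / 90                 # round 1: bet on black
    p_c = (30 + yellow) / 90         # round 2: bet on red or yellow
    p_d = (black + yellow) / 90      # round 2: bet on black or yellow (always 60/90)
    # The yellow balls contribute equally to options c and d, so the two
    # comparisons always agree:
    assert abs((p_a - p_b) - (p_c - p_d)) < 1e-12
```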

A few years later, Edwards et al. (1963) wrote an important paper introducing Bayesian reasoning about probability assessment to psychological researchers. Edwards believed that humans behaved as if they had Bayes’s rules ingrained in their minds. Edwards’s work inspired Tversky and Kahneman to generate new hypotheses and explore new topics with experimentation that ultimately led to the questioning of normative standards. Thus, although Edwards thought people’s choices approximated those predicted by classical economic theory, this conclusion was rejected by the work of Tversky and Kahneman, for which Kahneman later was awarded the 2002 Nobel Prize in Economic Sciences (Tversky was deceased by the time the prize was awarded). They are recognized as among the founders of behavioral economics (Lewis, 2016; Smith, 2001).

Not all decisions are simple choices between lotteries. For example, when deciding what car to buy, there are several factors besides cost that should be considered, such as insurance and average miles traveled per gallon of gas. To deal with this situation, multi-attribute expected utility theory was developed alongside Tversky and Kahneman’s work. According to this theory, utility could be determined for each attribute and ordered by preference, such that the downside of one attribute (e.g., cost) could be compensated (traded off) by the benefits of another attribute (e.g., average miles per gallon). The theory combined models of measurement and scaling with economic assessment of utility through the assignment of a weight to each attribute (e.g., Fishburn, 1967; see also Becker & McClintock, 1967). Nevertheless, most day-to-day situations are complex and require a rather sophisticated computation of overall utility, which is likely to be beyond the average person’s numerical and computational ability (for a review, see Reyna et al., 2009).
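A minimal sketch (Python; the attributes, weights, and scores are hypothetical) of the weighted-additive computation the theory assumes, in which a weak attribute can be compensated by a strong one:

```python
# Hypothetical cars scored 0-10 on each attribute (higher is better),
# with weights expressing the decision maker's priorities (weights sum to 1).
weights = {"cost": 0.5, "mpg": 0.3, "insurance": 0.2}

cars = {
    "car X": {"cost": 4, "mpg": 9, "insurance": 8},  # pricey but efficient
    "car Y": {"cost": 8, "mpg": 5, "insurance": 6},  # cheap but thirsty
}

def multiattribute_utility(scores, weights):
    """Weighted sum of attribute scores (the compensatory rule)."""
    return sum(weights[a] * scores[a] for a in weights)

for name, scores in cars.items():
    print(name, multiattribute_utility(scores, weights))
# car X: 0.5*4 + 0.3*9 + 0.2*8 = 6.3;  car Y: 0.5*8 + 0.3*5 + 0.2*6 = 6.7
```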

Turning Point: Heuristics, Biases, and Framing Effects

From the 1970s to the 1990s, psychological research continued to pursue evidence against normative models, following several governmental incentives to promote the use of evidence-based outcomes in developing best practices. Daniel Kahneman and Amos Tversky took center stage among descriptive theorists and discovered a host of deviations from normative models, called “biases” and “fallacies” (for reviews, see Gilovich et al., 2002; Lewis, 2016). They also identified intuitive strategies—heuristics, or mental shortcuts—that allow people to make judgments and decisions quickly but that often lead to the aforementioned systematic biases and fallacies (Tversky & Kahneman, 1974). They also reported research on framing effects, which are well-established biases related to decisions that involve risk (Kahneman & Tversky, 1979).

Heuristics and Biases

In the early 1970s, Amos Tversky proposed the elimination-by-aspects model, which describes a psychological strategy for making choices given some specified features, such as cost (Tversky, 1972). The process consists of sequentially identifying options that do not meet predefined criteria (i.e., desirable features) and eliminating them until only one alternative remains available for choice. For example, among five cars available for purchase, perhaps only three meet the criterion of good fuel efficiency (high average miles per gallon of gas), and thus the other two cars are eliminated. Next, one of the three remaining models has a very high insurance premium, which is undesirable and leads to its elimination from the option set. Finally, the less expensive of the two remaining cars is the final choice. Note that this strategy does not maximize across the multiple attributes because options are eliminated, even though the magnitudes of good attributes might offset the magnitudes of bad attributes. Elimination-by-aspects is a plausible psychological strategy and an elegant model; it was another nail in the coffin of rational choice theories that assumed utility maximization.
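A short sketch (Python; the cars, thresholds, and ordering of aspects are hypothetical) of the strategy just described: aspects are considered in order, options that fail an aspect are discarded and never reconsidered, and nothing is traded off.

```python
def elimination_by_aspects(options, aspects):
    """Filter options aspect by aspect until a single alternative remains.

    `aspects` is an ordered list of (name, predicate) pairs; options failing
    a predicate are eliminated and never reconsidered.
    """
    remaining = list(options)
    for _, passes in aspects:
        remaining = [o for o in remaining if passes(o)]
        if len(remaining) == 1:
            return remaining[0]
    return remaining[0] if remaining else None

cars = [
    {"name": "car 1", "mpg": 40, "insurance": 900,  "price": 22000},
    {"name": "car 2", "mpg": 42, "insurance": 2400, "price": 21000},
    {"name": "car 3", "mpg": 38, "insurance": 1000, "price": 24000},
    {"name": "car 4", "mpg": 25, "insurance": 800,  "price": 18000},
    {"name": "car 5", "mpg": 23, "insurance": 950,  "price": 17000},
]

aspects = [
    ("fuel efficient",    lambda c: c["mpg"] >= 35),         # eliminates cars 4 and 5
    ("affordable policy", lambda c: c["insurance"] <= 1500),  # eliminates car 2
    ("low price",         lambda c: c["price"] <= 22000),     # leaves car 1
]

print(elimination_by_aspects(cars, aspects)["name"])  # car 1
```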

Research on heuristics and biases in judgment under uncertainty is a direct reaction to Simon’s (1955) idea of bounded rationality. According to Tversky and Kahneman (Kahneman, 2003; Kahneman & Tversky, 1972; Tversky & Kahneman, 1971, 1974), people’s judgments violate principles of coherence. Three basic heuristics—representativeness, availability, and anchoring and adjustment—were introduced as evidence of how people tend to process information in a highly economical and effective way, even though they are subject to biases (Kahneman & Tversky, 1972; Tversky & Kahneman, 1974).

The first heuristic, representativeness, is when people judge probability by similarity. Specifically, when identifying whether an object is part of a category, they judge how similar the object is to the typical member of that category (Baron, 2012; Kahneman & Tversky, 1972, 1973). When estimating the likelihood or frequency of the conjunction of events A and B relative to event A alone, the representativeness heuristic leads to what is called the conjunction fallacy (Tversky & Kahneman, 1983). For example, consider the classic Linda problem:

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in antinuclear demonstrations.

In Tversky and Kahneman’s (1983) study, participants were given two options and were asked which is more likely; 85% of participants ranked the option “Linda is a bank teller and is an active feminist” (events A and B) above the option “Linda is a bank teller” (event A). This result demonstrates how human mental operations do not always correspond to the laws of probability (Tversky & Kahneman, 1983). The probability of two events occurring together (in “conjunction”) is always less than or equal to the probability of either one occurring alone: P(A ∩ B) ≤ P(A) and P(A ∩ B) ≤ P(B). The observed ranking is a conjunction fallacy because the probability that Linda is a bank teller, P(A), or that she is an active feminist, P(B), should each be judged as at least as probable as the probability that she is both, P(A ∩ B). The description of Linda, however, was more representative of a stereotype (Linda was deeply concerned with issues of discrimination and social justice), and therefore people judged the conjunction, which included the representative category of feminists, to be more probable than the unrepresentative class of bank tellers alone.

Examples of inconsistent joint probability judgments are also observed as disjunction fallacies (Bar-Hillel, 1973; Tversky & Shafir, 1992). A disjunction fallacy occurs when the disjunction of two events, A or B, is judged as being less probable than at least one of the components individually. However, the disjunction of two events is at least as likely as either of the events occurring individually: P(A ∪ B) ≥ P(A) and P(A ∪ B) ≥ P(B). For example, the chance that Linda is either a bank teller or a feminist (or both) should be greater than the chance that she is just one or the other. On average, however, people tend to choose the single event that better fits their stereotype instead of the disjunctive option (e.g., Bar-Hillel & Neter, 1993; Tversky & Koehler, 1994).
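A small sketch (Python, with made-up illustrative probabilities for the Linda example) of the two rules being violated: the conjunction can never exceed either component, and the disjunction can never fall below either component.

```python
# Made-up illustrative probabilities for the Linda example:
p_teller   = 0.05                             # P(A): Linda is a bank teller
p_feminist = 0.60                             # P(B): Linda is an active feminist
p_both     = 0.04                             # P(A and B)
p_either   = p_teller + p_feminist - p_both   # P(A or B), by inclusion-exclusion

assert p_both <= min(p_teller, p_feminist)    # conjunction rule
assert p_either >= max(p_teller, p_feminist)  # disjunction rule

# The conjunction fallacy judges P(A and B) > P(A); the disjunction fallacy
# judges P(A or B) < P(A) or P(A or B) < P(B).
```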

A series of other biases has also been documented, such as insensitivity to prior probabilities; insensitivity to predictability (i.e., making a prediction based on the representativeness of a scenario description rather than on the reliability of the evidence); the illusion of validity (i.e., showing great confidence in a prediction based on a good fit between the description and the available options, even when aware of the factors that limit the accuracy of the prediction); and belief in the law of small numbers (i.e., the belief that long runs and streaks cannot be random, even in small samples of behavior) (e.g., Gilovich et al., 1985; Kahneman & Tversky, 1972, 1973; Tversky & Kahneman, 1971, 1974).

The second heuristic, availability, refers to judging the frequency of a class or the probability of an event by the ease with which instances or similar occurrences are remembered or come to mind (Kahneman & Tversky, 1972; Tversky & Kahneman, 1974). For example, one may assess the risk of a hurricane based on memory for recent events or may estimate the chance of a car accident as a result of driving under the influence of alcohol by recalling such events among one’s acquaintances. In this case, the availability of information (easy-to-retrieve memories) can create biases because judgments based on recollections of specific events often are affected by factors other than frequency and probability. Some of these biases are a result of the retrievability of instances due to familiarity (e.g., how many times one has driven under the influence) or the salience of an event (e.g., the impact of being in a hurricane zone during the storm surge), or of the effectiveness of a search set, which is influenced by cues such as the first letter of a word or the retrieval context in which that information appears. They can also result from how well one can imagine events, such as contingencies (e.g., the risk involved in not heeding a hurricane evacuation warning is evaluated by imagining contingencies such as flooding), or even from illusory correlation, which is the overestimation of the likelihood that two events will co-occur (e.g., believing that small cities generally have nicer people than larger cities without any factual basis in objective probabilities; Chapman & Chapman, 1967; Tversky & Kahneman, 1973, 1974).

Another heuristic is anchoring and adjustment, in which people estimate a value by starting from an initial value and then adjusting; however, the adjustment is usually insufficient (e.g., Slovic & Lichtenstein, 1971; Tversky & Kahneman, 1974). Insufficient adjustment can be illustrated by the attempt to quickly estimate the product of two computations: (A) 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1 and (B) 1 × 2 × 3 × 4 × 5 × 6 × 7 × 8. In both cases, the initial values (i.e., 8 and 1) serve as anchors, and quick estimation leads to insufficient adjustment: the median estimates were 2,250 and 512, respectively, even though the correct answer is identical, 40,320.

Other heuristics and biases have since been identified (the following examples represent influential effects and are not presented in chronological order). One such example is confirmation bias, in which people seek out and give more weight to evidence that is consistent with their hypotheses while failing to test disconfirming hypotheses or ignoring contrary evidence (e.g., a person who favors candidate A in an upcoming election seeks out and remembers favorable press coverage of candidate A while not seeking out unfavorable news that would undermine the initial impression of the candidate; Wason, 1960). Klayman and Ha (1987) pointed out that seeking only to confirm hypotheses can be a defensible strategy under specific conditions. Hindsight bias captures the idea that people tend to believe that an event was more predictable than it actually was before it occurred (e.g., “I always knew my team would win”; Fischhoff, 1975; Fischhoff & Beyth, 1975; Klein et al., 2017; Roese & Vohs, 2012). There is also the overconfidence effect, in which people tend to believe that their own abilities, knowledge, and/or judgments are greater than they actually are (Brenner et al., 1996; Dunning et al., 1990). This list is not exhaustive but is meant to provide examples of influential judgment heuristics that shaped and continue to shape the field of judgment and decision making (see also Gilovich et al., 2002).

Amos Tversky and Daniel Kahneman argued that heuristics were adaptive but also produced biases and fallacies. Gerd Gigerenzer and his colleagues challenged the claim that biases and fallacies were errors and in that sense argued that heuristics are adaptive (e.g., Gigerenzer, 1991, 1996; Gigerenzer & Gaissmaier, 2011; Gigerenzer et al., 1999). These researchers suggest that heuristics must have been favored by evolution (although the fact that a behavior occurs does not make it a product of natural selection; that is a fallacy). In addition, evolutionary arguments are post hoc and thus difficult to test scientifically (but see Cosmides & Tooby, 1996). Gigerenzer and Hoffrage (1995) claimed that heuristics do not necessarily lead to biases if people are asked questions in terms of frequencies (instead of probabilities), which they asserted to be more “natural.” However, evidence disentangling multiple causes of performance has shown that frequency formats do not improve performance (Barbey & Sloman, 2007; Cuite et al., 2008; Evans et al., 2000; Koehler & Macchi, 2004; Reyna, 2004; Wolfe & Reyna, 2010). Other “fast and frugal” heuristics (i.e., heuristics that do not take much processing time or many cognitive resources), such as the recognition heuristic and the gaze heuristic, have also been studied (Gigerenzer & Goldstein, 1996; Goldstein & Gigerenzer, 2002). Researchers point to the need to specify the environmental circumstances that bound the accurate use of heuristics (e.g., Dougherty et al., 2008; Hogarth & Karelaia, 2006; Kahneman & Tversky, 1996; Newell & Shanks, 2004).

Framing Effects

Unlike most decisions made based on heuristics, which rely on judgments under uncertainty, decisions under risk involve the knowledge of the probabilities (i.e., gamble) associated with available outcomes. When facing a choice between a sure win (e.g., $50 for sure) and a gamble (e.g., 50% chance to win $100), people are often risk averse and prefer the sure gain to the gamble when the expected value is the same (even if they prefer a gamble when the expected value is higher). When faced with losses, however, they show a preference toward the risky gamble (e.g., 50% chance to lose $100) over the certain loss (e.g., to lose $50 for sure), that is, they are more risk seeking (Tversky & Kahneman, 1986 , 1991 ; see also Steiger & Kühberger, 2018 ). (Note that risk taking patterns change with very small probabilities; e.g., Kahneman & Tversky, 1979 .)

Kahneman and Tversky (1979) showed that this gain–loss change in response pattern occurs even when the expected value is the same in all four options (i.e., $50 for gains and losses), a framing effect. The framing effect is the display of conflicting risk preferences despite quantitatively equivalent options. Consider the classic example of the dread-disease problem (Tversky & Kahneman, 1981):

Imagine that the United States is preparing for the outbreak of an unusual disease that is expected to kill 600 people. Two alternative programs to combat the disease have been proposed. Assume that the exact scientific estimates of the consequences of the programs are as follows:

If program A is adopted, 200 people will be saved.

If program B is adopted, there is a one-third probability that 600 people will be saved, and a two-thirds probability that no people will be saved.

Alternatively,

If program C is adopted, 400 people will die.

If program D is adopted, there is a one-third probability that nobody will die, and a two-thirds probability that 600 people will die.

In this example, people choose between two different programs to combat the disease, depending on the condition to which they were assigned. The expected value is the same across all four options (i.e., 200 would live and 400 would die), but preferences change across the gain and loss problems (i.e., in the gain frame, the majority choose program A, the risk-averse option, whereas in the loss frame, the majority choose program D, the risk-seeking option). Framing effects have been widely investigated, and these preferences replicate across multiple contexts and cultures (e.g., Edwards et al., 2001; Gallagher & Updegraff, 2011; Kühberger, 1995, 1998; Kühberger & Tanner, 2010; Levin et al., 1998; McGettigan et al., 1999; van Schie & van der Pligt, 1995). Yet some researchers suggest that individuals are more likely to produce the traditional framing effect in situations that are simply described to them as hypothetical scenarios rather than in situations learned from experience (e.g., Barron & Erev, 2003; Estes, 1976; Hadar & Fox, 2009; Hertwig & Erev, 2009).
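A quick check (Python) that all four programs have the same expected outcome, which is what makes the reversal of preferences a framing effect rather than a genuine difference in value:

```python
# Each program as (probability, number of people saved out of 600) pairs.
programs = {
    "A": [(1.0, 200)],            # 200 saved for sure
    "B": [(1/3, 600), (2/3, 0)],  # one-third chance that all 600 are saved
    "C": [(1.0, 600 - 400)],      # 400 die for sure, i.e., 200 saved
    "D": [(1/3, 600), (2/3, 0)],  # one-third chance that nobody dies
}

for name, prospect in programs.items():
    expected_saved = sum(p * saved for p, saved in prospect)
    print(name, round(expected_saved))  # 200 expected lives saved in every case
```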

This preference reversal (i.e., risk aversion for gains and risk seeking for losses) was predicted by a highly influential descriptive theory, prospect theory (Kahneman & Tversky, 1979 ) and later, cumulative prospect theory (Tversky & Kahneman, 1992 ). Prospect theory is an attempt to explain the process by which people make choices between different gambles (or prospects) associated with different probabilities, using a psychological value function for outcomes and a psychological weighting function for probabilities. The value function differentiates gains and losses based on deviations from the reference point and is assumed to be concave for gains and convex (and steeper) for losses. In the nonlinear weighting function for probabilities, small probabilities tend to be overweighted relative to their objective magnitudes, and large probabilities tend to be underweighted. Prospect theory also influenced some subdisciplines of economics from this time: behavioral game theory (Camerer, 1990 ), behavioral decision theory (Einhorn & Hogarth, 1981 ), and behavioral finance theory (Thaler, 1980 , 1993 ).
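A compact sketch (Python) of the two functions prospect theory posits, using parameter estimates commonly cited from Tversky and Kahneman’s (1992) cumulative prospect theory (α = β = 0.88, λ = 2.25, γ = 0.61); the numbers are illustrative rather than definitive.

```python
ALPHA, BETA, LAMBDA, GAMMA = 0.88, 0.88, 2.25, 0.61  # commonly cited estimates

def value(x):
    """S-shaped value function: concave for gains, convex and steeper for losses."""
    return x ** ALPHA if x >= 0 else -LAMBDA * ((-x) ** BETA)

def weight(p, gamma=GAMMA):
    """Inverse-S probability weighting: small p overweighted, large p underweighted."""
    return p ** gamma / ((p ** gamma + (1 - p) ** gamma) ** (1 / gamma))

print(value(100), value(-100))     # losses loom larger than gains (~57.5 vs. ~-129.5)
print(weight(0.01), weight(0.99))  # ~0.055 > 0.01 and ~0.91 < 0.99
```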

Modern Era: Rationality and Intuition

After the 1990s, several approaches were used to distinguish two processes responsible for cognitive function in judgment and decision making: one process based on rationality, which is largely about making consistent choices, and another that is a result of intuition or affect, often leading to biases (see Kahneman, 2003, 2011; Stanovich & West, 2000, for a detailed overview). Dual-process models incorporate rationality in addition to intuition (and sometimes affect or emotion) as two sides of the same coin. In a simplified way, dual-process approaches (which were prevalent in several subdisciplines of psychology) recognize the influence of both rational thought and irrational intuition on judgment and decision making. For example, revisiting the Linda problem, a person would make the wrong judgment again because cognition would most likely rely on heuristic and intuitive processes (system 1), even though a rational, deliberative process (system 2) would most likely yield the correct answer.

One of the psychologists to discuss a conflict between these processes was Seymour Epstein, who built directly on Freudian dualism as well as Cartesian dualism (between the immaterial mind/soul and the material body). For Epstein (1994) and his cognitive-experiential self-theory, the two modes of information processing are distinct. That is, the intuition–rationality distinction was based on Freud’s psychodynamic distinction between primary and secondary processes (i.e., the pleasure and control systems, respectively). Even though Epstein was not a decision researcher, his contribution to the field was instrumental to the systematic understanding of individual differences in these processes (see also Reyna & Brainerd, 2008; Stanovich & West, 2008).

Several other researchers have attempted to describe these processes and, despite differences, the common features are that the intuitive or emotional process (system 1) is associative, experiential, fast, and impulsive, whereas the rational process (system 2) is more analytical, deliberative, rule-based, slow, and cognitively effortful, and is held responsible for well-thought-out judgments and putatively advanced choices (Epstein, 1994; Epstein et al., 1996; Kahneman, 2003, 2011; Evans & Stanovich, 2013; Sloman, 1996; Stanovich, 1999; Stanovich & West, 2000).

According to these theories, people can rely on one process more than the other when making decisions. Susceptibility to framing effects, for example, should be a result of high intuitive thinking and low rationality, because they occur when options of the same objective value are evaluated differently (e.g., Kahneman, 2003 ; Porcelli & Delgado, 2009 ). However, there is relevant empirical evidence contesting standard dual-process theory (e.g., Reyna & Brainerd, 2008 ; Shiloh et al., 2002 ), suggesting that even a seemingly all-inclusive rational–irrational dualism needs updating.

One version of the dual-process approach assumes that intuition (or affect) is the default, even though rationality can override intuition (Epstein et al., 1996; see also Kahneman, 2003, 2011). Kahneman and Frederick (2002, 2005) tested the hypothesis that the overriding function of rationality is part of a monitoring feature that allows expressions of intuition but intervenes when necessary (see also Kahneman, 2003, 2011, for a review). Frederick (2005) introduced the Cognitive Reflection Test to assess individual differences in these processes. People answer questions for which the immediate, impulsive guess is incorrect, and thus they have to inhibit the erroneous thought and check for the correct response. For example, people are asked to indicate the cost of a ball in the following scenario: “A bat and a ball cost $1.10 in total. The bat costs $1 more than the ball.” Most people (more than 50%) tend to answer 10 cents because $1.10 splits naturally into $1 and 10 cents, and that is the first response that occurs to them. On reflection, however, the correct answer is 5 cents: if the ball costs x, then x + (x + $1.00) = $1.10, so 2x = $0.10 and x = $0.05.

A slightly different approach to this dualism is the affect heuristic proposed by Slovic and colleagues (e.g., Finucane et al., 2000 ). They point to an important role for feelings (or affective responses that occur fast), not cognition, as a basis for judgment and decision processes (Slovic et al., 2002 , 2005 ). According to this perspective, how people feel about a topic (i.e., their subjective feeling of risk) is what allows them to construct preferences (e.g., between wind energy and nuclear power plants). Both negative and positive affect are argued to play a role in the overall evaluation of alternatives (Loewenstein et al., 2001 ; Weber & Johnson, 2009 ).

Other researchers have qualified their view of dual-systems approaches by replacing the term “systems” with “types,” to avoid oversimplifying the processes underlying decision making in a two-system view (Evans, 2008, 2009, 2010, 2011; Stanovich, 2009, 2010; Stanovich et al., 2011). In this view, type 1 processes are intuitive, fast, and automatic. The defining attribute is that type 1 processes are not limited by cognitive capacity, in contrast to type 2 processes, which are slow because of working memory limitations. Type 2 processes are also associated with executive functions (for counterevidence, see Reyna & Brainerd, 1995). Finally, type 3 is a reflective mind responsible for monitoring and inhibiting conflicting responses between types 1 and 2, or even overriding type 1 responses as needed (Evans, 2011). More generally, according to Barrouillet (2011, p. 83), “the developmental predictions that can be drawn from this [dual-process] theory are contradicted by facts,” which bears on the validity of theories about adults (certainly regarding which process is less vs. more advanced). Keren and Schul (2009) also argue that most standard dual-process accounts had ill-defined theoretical structures for the two systems and were not formulated as testable hypotheses.

In another descriptive theory that went beyond prior theories to make new predictions for judgment and decision making, Valerie Reyna and Charles Brainerd (1995, 2011) proposed a distinction based on how information is mentally represented, that is, gist or verbatim representations and associated processes, as well as social values, reward sensitivity (sensation seeking), and inhibition (Reyna et al., 2015). The theory’s account of mental representation distinguishes how people represent information along a verbatim-to-gist continuum (i.e., from the most precise and literal to the simplest meaningful distinction between options). Verbatim representations support rote analytical processes (e.g., 20% risk = 2 × 10% risk). Gist representations support intuitive processing that is imprecise (i.e., fuzzy) but also insightful, advanced, and meaningful in its interpretation of information (e.g., some as opposed to no risk, or, if needed, low as opposed to high risk). This gist process is considered a more advanced form of processing because it incorporates factors that affect the understanding of information, such as background knowledge, life experience, culture, education, and emotional import (e.g., whether a patient should feel worried or relieved about a 20% risk). Gist and verbatim processing occur in parallel as means of representing information that is relevant to the decision process, in contrast to standard default-interventionist approaches to dual processes (e.g., Brust-Renck et al., 2016; Reyna, 2012; Reyna & Adam, 2003; Reyna & Farley, 2006).

Most adults have a fuzzy-processing preference, relying on gist-based processes to make decisions, that is, on the bottom-line, qualitative interpretation of the meaning of information (e.g., the difference between some versus none, or more versus less, such as the categorical difference between saving 200 people and saving no one) rather than on a rote, meaningless approach (Broniatowski & Reyna, 2018). Thus, in the dread-disease problem, choices would be a result of the simplest qualitative distinctions. Information is encoded from the two options based on gist distinctions, such as “saving some people” (i.e., 200) versus “possibly saving some people or saving none” (i.e., one-third of 600 or two-thirds of 0; Kühberger & Tanner, 2010; Reyna & Brainerd, 1991, 2011). According to this theory, a preference to rely on gist representations of the options helps people apply their values to that gist (values such as saving lives is good). This explains the choice of the sure option in the gain frame, because of adults’ preference for “saving some lives” compared to “saving none.” In the loss frame, people are given the choice between the safe option, “If program C is adopted, 400 people will die,” and the risky option, “If program D is adopted, there is a one-third probability that nobody will die, and a two-thirds probability that 600 people will die.” Given these alternatives, people tend to opt for the risky option because they derive the gist of the options for program C versus program D, and they prefer “none dying” (i.e., one-third of 0) to “some dying” (i.e., 400). Hence, these simple gist distinctions produce risk aversion for gains and risk seeking for losses in the dread-disease problem and many similar risky decisions (Reyna, 2012; Reyna et al., 2014).
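The following is only an illustrative sketch (Python), not the authors’ formal model (see Broniatowski & Reyna, 2018, for a formalization): reducing each option to its categorical possibilities and applying the values “saving some is better than saving none” and “none dying is better than some dying” reproduces the standard framing pattern.

```python
# Categorical gists of the four options, with the better category coded 1 and
# the worse category coded 0 (gain frame: 1 = "some saved", 0 = "none saved";
# loss frame: 1 = "none die", 0 = "some die"). This coding is a simplification.
gists = {
    "A (sure, gain)":   {1},     # some saved
    "B (gamble, gain)": {1, 0},  # some saved or none saved
    "C (sure, loss)":   {0},     # some die
    "D (gamble, loss)": {0, 1},  # some die or none die
}

def gist_prefers(x, y):
    """x is preferred when its categorical possibilities weakly dominate y's."""
    return min(x) >= min(y) and max(x) >= max(y) and x != y

print(gist_prefers(gists["A (sure, gain)"], gists["B (gamble, gain)"]))  # True: sure gain
print(gist_prefers(gists["D (gamble, loss)"], gists["C (sure, loss)"]))  # True: risky loss
```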

This research also rules out alternative explanations for gain–loss framing effects, such as prospect theory (Kahneman & Tversky, 1979; Tversky & Kahneman, 1992). According to Kühberger and Tanner (2010), one of several critical tests of prospect theory and fuzzy-trace theory is to present the problem without the “zero complement” of the risky option (i.e., two-thirds of 0 surviving in the gain frame, and one-third of 0 dying in the loss frame); in that case, the proportion of people who preferred the risky option in the gain frame (52%) and in the loss frame (48%) was approximately the same. This result disconfirms prospect theory because, according to that theory, removing the zero complement should have no effect on framing differences, yet the framing effect disappeared. The authors observed the classical effect when the “zero complement” was present, namely, that people preferred the risky option 30% of the time in the gain frame and 61% of the time in the loss frame, as predicted by both theoretical perspectives (see Broniatowski & Reyna, 2018; Reyna et al., 2014).

Other approaches, such as information leakage, explain attribute framing effects but not risky-choice framing effects (Sher & McKenzie, 2006). Attribute framing is when a single dimension is expressed positively (e.g., 80% correct on a test) as opposed to negatively (e.g., 20% wrong on a test). Speakers’ choice of positive wording conveys additional information about valence, such that the test is perceived more positively when expressed as 80% correct than as 20% wrong. (For an elegant discussion of the differences among attribute framing, risky-choice framing, and goal framing, see Keren, 2012.) Consistent with the assumption of the information leakage account that people respond similarly when information is perceived to be equivalent, van Buiten and Keren (2009) found that there were no reversals in risk preference when all participants (speakers and listeners) were provided with both frames and told that both sets of options were mathematically equivalent. Therefore, separate but related theories are needed to account for both attribute framing and risky-choice framing effects (but see Gamliel & Kreiner, 2020).

Fuzzy-trace theory also predicts individual differences across adults and developmental differences across the lifespan (Reyna & Brainerd, 2011 ). For example, individuals with certain kinds of autism are higher in verbatim processing and lower in gist processing. Therefore, fuzzy-trace theory makes the surprising prediction that they will be technically more rational because they are less likely to demonstrate gist-based biases such as framing effects and the conjunction fallacy; these predictions were supported. The theory also predicts that framing effects and other biases become greater from childhood to adulthood, as information processing becomes more gist-based (also observed; Reyna & Farley, 2006 ; see also Paulsen et al., 2012 ). These studies remove the burden of symbolic and formal mathematical processing by using piles of prizes (e.g., stickers or toys) as outcomes and shaded areas of spinners to convey probability (Reyna & Ellis, 1994 ). Research on fuzzy-trace theory has further shown that prospect theory and utility theories cannot explain framing and other classic effects, and that novel phenomena of memory, judgment, and decision making can be explained with a small set of testable assumptions (Corbin et al., 2015 ). These ideas have been applied with the goal of improving decision making in law, medicine, and public health (Blalock & Reyna, 2016 ; Reyna et al., 2016 ).

Conclusion: What the Future Holds

Historically, the study of judgment and decision making in the field of psychology has centered on questions related to evaluation of options, preferences, and choice, focusing on deviations from economic, normative behavior and proposing descriptive models of behavior that account for these deviations. Current psychological models increasingly emphasize process-level explanations and behavioral predictions rather than mere demonstrations of biases and fallacies. Recently, neuroeconomics has emerged as an interdisciplinary field at the intersection of psychology, economics, and the growing field of neuroscience (Loewenstein et al., 2008 ). Neuroeconomics builds on data and theory from behavioral economics and decision research to further understanding of the brain.

Neuroscience findings, in turn, can further our understanding of current models of judgment and decision making (Reyna et al., 2012 ). Neuroscientists use tools such as functional magnetic resonance imaging (fMRI) and electroencephalography (EEG) to lend insight into human judgment and decision making that could not easily be investigated using solely behavioral paradigms—for example, findings from neuroscience studies suggest that decision making involves so-called “default mode” (neural) networks (DMN; internally oriented processing as opposed to engagement with external tasks) along with task-engaged controlled processes (Li et al., 2017 ; Loewenstein et al., 2008 ). Recent findings from an extensive meta-analysis on the DMN and the subjective value network suggest that there is overlap in the functional connectivity of these neural networks, specifically in the central ventromedial prefrontal cortex (cVMPFC) and the dorsal posterior cingulate cortex (Acikalin et al., 2017 ). These findings are consistent with the current understanding of the VMPFC as an area involved in subjective value assessment, and it has been shown that subjective value is positively associated with VMPFC activation (Levy & Glimcher, 2012 ).

Neuroscience tools can be used to look at neural activation during different decision strategies and to observe activity in the brain after winning versus losing a gamble (Venkatraman et al., 2009 ; Xue et al., 2011 ). Neuroscience can also be used to understand the neural circuitry of systematic inconsistencies and errors that have been established in the judgment and decision-making literature. For example, a substantial amount of work has been devoted to examining the neural underpinnings of framing effects (e.g., De Martino et al., 2006 ; Li et al., 2017 ; Reyna et al., 2018 ; Roiser et al., 2009 ; Weller et al., 2007 ; Zheng et al., 2010 ). Several studies have shown that the amygdala is activated when people are making framing-consistent choices (i.e., choosing the sure gain or risky loss; De Martino et al., 2006 ; Li et al., 2017 ; Roiser et al., 2009 ). Findings from a recent meta-analysis of neuroimaging studies of framing suggest that activation during framing-consistent choices resembles activation that closely corresponds to the DMN, whereas the pattern of activation during framing-inconsistent choices (i.e., choosing the risky gain or sure loss) most closely corresponds to areas activated during task engagement (Li et al., 2017 ). Note that these results do not simply suggest that frame-consistent choices require limited effort or engagement, and for that reason, they are associated with the neural profiles of the DMN. Lack of effort would merely predict random or indifferent responses. Instead, critical tests indicate that systematic framing biases are attributable to gist representations (e.g., Reyna et al., 2014 ), which might be reflected in coactivation between DMN and PFC, and the latter can also reflect inhibition of noticed biases (see Broniatowski & Reyna, 2018 ; McCormick et al., 2019 ; Spreng & Turner, 2019 ).

Developmental neuroscience has also used behavioral paradigms to examine neural activity during judgments and decisions involving risk in adolescence, a period of development characterized by heightened risky decision making in real life (Casey et al., 2016; Chein et al., 2011; Reyna, 2018; Steinberg, 2008). For example, using a simulated driving task, Chein and colleagues (2011) found that adolescents take more risks and show greater activation in reward-related areas such as the ventral striatum (VS) and orbitofrontal cortex when driving with a peer present than when driving alone. These findings suggest that peers may elicit a response in the reward centers of the adolescent brain similar to the response to food, sex, or drugs. Casey et al. (2016) describe a hierarchy of changes that occur in the brain to explain the neural substrates of adolescent risky decision making. The authors describe a transition from subcortico-subcortical to cortico-cortical connectivity across development. In childhood, subcortical systems drive behavior, whereas adolescence is characterized by a strengthening of connections between subcortical systems and frontal cortical areas. Finally, in young adulthood, cortico-cortical networks are more developed, with increased lateral PFC modulation of the medial PFC, resulting in more top-down control and goal-oriented behavior.

Neuroscience shows great promise for furthering our understanding of human judgment and choice. However, the interpretation of neuroscientific findings rests crucially on the behavioral tasks that are used; brain activation by itself is not meaningful. Together, carefully designed laboratory tasks and neuroscientific methods have extensive ecological implications: Judgment and decision making affect who is elected to office, which policies are supported, risky choices (e.g., drinking and driving), and unhealthy behaviors (e.g., smoking cigarettes). Understanding more about the underlying processes by testing theoretical predictions is fundamental to designing effective behavioral interventions and ultimately improving judgment and decision making.

Further Reading

  • Ariely, D. (2009). Predictably irrational, revised and expanded edition . Harper Collins.
  • Baron, J. (2007). Thinking and deciding (4th ed.). Cambridge University Press.
  • Belsky, G. , & Gilovich, T. (2010). Why smart people make big money mistakes and how to correct them: Lessons from the life-changing science of behavioral economics . Simon & Schuster.
  • Fischhoff, B. , Brewer, N. T. , & Downs, J. S. (2012). Communicating risks and benefits: An evidence-based user’s guide . Government Printing Office.
  • Frank, R. H. (2018). The economic naturalist: In search of explanations for everyday enigmas . Basic Books.
  • Hammond, J. S. , Keeney, R. L. , & Raiffa, H. (2015). Smart choices: A practical guide to making better decisions . Harvard Business Review Press.
  • Hanoch, Y. , Barnes, A. J. , & Rice, T. (2017). Behavioral economics and healthy behaviors: Key concepts and current research . Taylor & Francis.
  • Hastie, R. , & Dawes, R. M. (2009). Rational choice in an uncertain world: The psychology of judgment and decision making . SAGE.
  • Kahneman, D. (2011). Thinking, fast and slow . Macmillan.
  • Reyna, V. F. , & Zayas, V. E. (2014). The neuroscience of risky decision making . American Psychological Association.
  • Russo, J. E. , & Schoemaker, P. J. (2002). Winning decisions: Getting it right the first time . Currency.
  • Thaler, R. H. , & Sunstein, C. R. (2009). Nudge: Improving decisions about health, wealth, and happiness . Penguin.
  • Wilhelms, E. A. , & Reyna, V. F. (Eds.). (2014). Neuroeconomics, judgment, and decision making . Psychology Press.

References

  • Acikalin, M. Y. , Gorgolewski, K. J. , & Poldrack, R. A. (2017). A coordinate-based meta-analysis of overlaps in regional specialization and functional connectivity across subjective value and default mode networks. Frontiers in Neuroscience , 11 , 1.
  • Allais, M. (1953). Le comportement de l’homme rationnel devant le risque: Critique des postulats et axiomes de l’école Américaine. Econometrica , 21 , 503–546.
  • Arkes, H. R. , & Hammond, K. R. (Eds.). (1986). Judgment and decision making: An interdisciplinary reader . Cambridge University Press.
  • Bar-Hillel, M. (1973). On the subjective probability of compound events . Organizational Behavior and Human Performance , 9 (3), 396–406.
  • Bar-Hillel, M. , & Neter, E. (1993). How alike it is versus how likely it is: A disjunction fallacy in probability judgments . Journal of Personality and Social Psychology , 65 (6), 1119–1131.
  • Barbey, A. K. , & Sloman, S. A. (2007). Base-rate respect: From ecological rationality to dual process . Behavioral and Brain Sciences , 30 (2), 241–297.
  • Baron, J. (2012). The point of normative models in judgment and decision making . Frontiers in Psychology , 3 , Article 577.
  • Barron, G. , & Erev, I. (2003). Small feedback-based decisions and their limited correspondence to description-based decisions . Journal of Behavioral Decision Making , 16 (3), 215–233.
  • Barrouillet, P. (2011). Dual-process theories and cognitive development: Advances and challenges . Developmental Review , 31 , 79–85.
  • Becker, G. M. , & McClintock, C. G. (1967). Value: Behavioral decision theory . Annual Review of Psychology , 18 (1), 239–286.
  • Bell, D. E. , Raiffa, H. , & Tversky, A. (1988). Decision making: Descriptive, normative, and prescriptive interactions . Cambridge University Press.
  • Bernoulli, D. (1954). Exposition of a new theory on the measurement of risk . Econometrica , 22 (1), 23–36. (Originally published in 1738 as Specimen theoriae novae de mensura sortis: Commentarii Academiae Scientiarum Imperialis Petropolitanae, Tomus V, 175–192)
  • Blalock, S. J. , & Reyna, V. F. (2016). Using fuzzy-trace theory to understand and improve health judgments, decisions, and behaviors: A literature review . Health Psychology , 35 (8), 781–792.
  • Brenner, L. A. , Koehler, D. J. , Liberman, V. , & Tversky, A. (1996). Overconfidence in probability and frequency judgments: A critical examination . Organizational Behavior and Human Decision Processes , 65 (3), 212–219.
  • Broniatowski, D. A. , & Reyna, V. F. (2018). A formal model of fuzzy-trace theory: Variations on framing effects and the Allais Paradox . Decision , 5 (4), 205–252.
  • Brust-Renck, P. G. , Reyna, V. F. , Wilhelms, E. A. , & Lazar, A. N. (2016). A fuzzy-trace theory of judgment and decision making in healthcare: Explanation, prediction, and application . In M. A. Diefenbach , S. M. Miller , & D. J. Bowen (Eds.), Handbook of health and decision science (pp. 71–86). Springer.
  • Camerer, C. F. (1990). Behavioral game theory. In R. M. Hogarth (Ed.), Insights in decision making: A tribute to Hillel J. Einhorn (pp. 311–336). University of Chicago Press.
  • Camerer, C. , & Weber, M. (1992). Recent developments in modeling preferences: Uncertainty and ambiguity. Journal of Risk and Uncertainty , 5 (4), 325–370.
  • Casey, B. J. , Galván, A. , & Somerville, L. H. (2016). Beyond simple models of adolescence to an integrated circuit-based account: A commentary . Developmental Cognitive Neuroscience , 17 , 128–130.
  • Chapman, L. J. , & Chapman, J. P. (1967). Genesis of popular but erroneous psychodiagnostic observations . Journal of Abnormal Psychology , 72 (3), 193–204.
  • Chein, J. , Albert, D. , O’Brien, L. , Uckert, K. , & Steinberg, L. (2011). Peers increase adolescent risk taking by enhancing activity in the brain’s reward circuitry. Developmental Science , 14 (2), F1–F10.
  • Corbin, J. C. , Reyna, V. F. , Weldon, R. B. , & Brainerd, C. J. (2015). How reasoning, judgment, and decision making are colored by gist-based intuition: A fuzzy-trace theory approach . Journal of Applied Research in Memory and Cognition , 4 (4), 344–355.
  • Cosmides, L. J. , & Tooby, J. (1996). Are humans good intuitive statisticians after all? Rethinking some conclusions from the literature on judgment under uncertainty . Cognition , 58 (1), 1–73.
  • Cuite, C. L. , Weinstein, N. D. , Emmons, K. , & Colditz, G. (2008). A test of numeric formats for communicating risk probabilities . Medical Decision Making , 28 (3), 377–384.
  • De Martino, B. , Kumaran, D. , Seymour, B. , & Dolan, R. J. (2006). Frames, biases, and rational decision-making in the human brain . Science , 313 (5787), 684–687.
  • Dougherty, M. R. , Franco-Watkins, A. M. , & Thomas, R. (2008). Psychological plausibility of the theory of probabilistic mental models and fast and frugal heuristics . Psychological Review , 115 (1), 199–213.
  • Dunning, D. , Griffin, D. W. , Milojkovic, J. D. , & Ross, L. (1990). The overconfidence effect in social prediction . Journal of Personality and Social Psychology , 58 (4), 568–581.
  • Edwards, A. W. F. (2001). Blaise Pascal . In C. C. Heyde , E. Seneta , P. Crépel , S. E. Fienberg , & J. Gani (Eds.), Statisticians of the centuries (pp. 17–22). Springer Science and Business Media.
  • Edwards, W. (1954). The theory of decision making . Psychological Bulletin , 51 (4), 380–417.
  • Edwards, W. (1961). Behavioral decision theory . Annual Review of Psychology , 12 , 473–498.
  • Edwards, A. , Elwyn, G. , Covey, J. , Matthews, E. , & Pill, R. (2001). Presenting risk information: A review of the effects of “framing” and other manipulations on patient outcomes . Journal of Health Communication , 6 (1), 61–82.
  • Edwards, W. , Lindman, H. , & Savage, L. J. (1963). Bayesian statistical inference for psychological research . Psychological Review , 70 (3), 193–242.
  • Einhorn, H. J. , & Hogarth, R. M. (1981). Behavioral decision theory: Processes of judgment and choice . Annual Review of Psychology , 32 , 53–88.
  • Ellsberg, D. (1961). Risk, ambiguity, and the savage axioms. Quarterly Journal of Economics , 75 (4), 670–689.
  • Epstein, S. (1994). Integration of the cognitive and the psychodynamic unconscious . American Psychologist , 49 (8), 709–724.
  • Epstein, S. , Pacini, R. , Denes-Raj, V. , & Heier, H. (1996). Individual differences in intuitive–experiential and analytical–rational thinking styles . Journal of Personality and Social Psychology , 71 (2), 390–405.
  • Estes, W. K. (1976). The cognitive side of probability learning. Psychological Review , 83 (1), 37–64.
  • Evans, J. St. B. T. (2008). Dual processing accounts of reasoning, judgment and social cognition . Annual Review of Psychology , 59 , 255–278.
  • Evans, J. St. B. T. (2009). How many dual-process theories do we need? One, two, or many? In J. St. B. T. Evans & K. Frankish (Eds.), In two minds: Dual processes and beyond (pp. 33–54). Oxford University Press.
  • Evans, J. St. B. T. (2010). Intuition and reasoning: A dual-process perspective . Psychological Inquiry , 21 (4), 313–326.
  • Evans, J. St. B. T. (2011). Dual-process theories of reasoning: Contemporary issues and developmental applications . Developmental Review , 31 (2–3), 86–102.
  • Evans, J. St. B. T. , Handley, S. J. , Perham, N. , Over, D. E. , & Thompson, V. A. (2000). Frequency versus probability formats in statistical word problems . Cognition , 77 (3), 197–213.
  • Evans, J. S. B. , & Stanovich, K. E. (2013). Dual-process theories of higher cognition: Advancing the debate . Perspectives on Psychological Science , 8 (3), 223–241.
  • Finucane, M. L. , Alhakami, A. , Slovic, P. , & Johnson, S. M. (2000). The affect heuristic in judgments of risks and benefits. Journal of Behavioral Decision Making , 13 (1), 1–17.
  • Fischhoff, B. (1975). Hindsight ≠ foresight: The effect of outcome knowledge on judgment under uncertainty. Journal of Experimental Psychology: Human Perception and Performance , 1 (3), 288–299.
  • Fischhoff, B. (2010). Judgment and decision making . WIREs Cognitive Science , 1 (5), 724–735.
  • Fischhoff, B. , & Beyth, R. (1975). “I knew it would happen”: Remembered probabilities of once–future things . Organizational Behavior and Human Performance , 13 (1), 1–16.
  • Fischhoff, B. , & Broomell, S. B. (2020). Judgment and decision making . Annual Review of Psychology , 71 , 331–355.
  • Fishburn, P. C. (1967). Methods for estimating additive utilities. Management Science , 13 (7), 435–453.
  • Frank, R. H. (2015). Microeconomics and behavior . McGraw-Hill Education.
  • Frederick, S. (2005). Cognitive reflection and decision making . Journal of Economic Perspectives , 19 (4), 25–42.
  • Gallagher, K. M. , & Updegraff, J. A. (2011). Health message framing effects on attitudes, intentions, and behavior: A meta-analytic review. Annals of Behavioral Medicine , 43 (1), 101–116.
  • Gamliel, E. , & Kreiner, H. (2020). Applying fuzzy-trace theory to attribute-framing bias: Gist and verbatim representations of quantitative information . Journal of Experimental Psychology: Learning, Memory, and Cognition , 46 (3), 497–506.
  • Gigerenzer, G. (1991). How to make cognitive illusions disappear: Beyond heuristics and biases. European Review of Social Psychology , 2 (1), 83–115.
  • Gigerenzer, G. (1996). On narrow norms and vague heuristics: A reply to Kahneman and Tversky. Psychological Review , 103 (3), 592–596.
  • Gigerenzer, G. , & Gaissmaier, W. (2011). Heuristic decision making. Annual Review of Psychology , 62 (1), 451–482.
  • Gigerenzer, G. , & Goldstein, D. G. (1996). Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review , 103 (4), 650–669.
  • Gigerenzer, G. , & Hoffrage, U. (1995). How to improve Bayesian reasoning without instruction: Frequency formats. Psychological Review , 102 (4), 684–704.
  • Gigerenzer, G. , Todd, P. M. , & The ABC Research Group . (1999). Simple heuristics that make us smart . Oxford University Press.
  • Gilovich, T. , Griffin, D. W. , & Kahneman, D. (2002). Heuristics and biases: The psychology of intuitive judgment . Cambridge University Press.
  • Gilovich, T. , Vallone, R. , & Tversky, A. (1985). The hot hand in basketball: On the misperception of random sequences. Cognitive Psychology , 17 (3), 295–314.
  • Goldstein, D. G. , & Gigerenzer, G. (2002). Models of ecological rationality: The recognition heuristic. Psychological Review , 109 (1), 75–90.
  • Greenwood, J. D. (1999). Understanding the “cognitive revolution” in psychology . Journal of the History of the Behavioral Sciences , 35 (1), 1–22.
  • Hadar, L. , & Fox, C. R. (2009). Information asymmetry in decision from description versus decision from experience. Judgment and Decision Making , 4 (4), 317–325.
  • Hammond, K. R. (1996). Human judgment and social policy: Irreducible uncertainty, inevitable error, unavoidable injustice . Oxford University Press.
  • Hertwig, R. , & Erev, I. (2009). The description–experience gap in risky choice . Trends in Cognitive Sciences , 13 (12), 517–523.
  • Hogarth, R. M. , & Karelaia, N. (2006). Regions of rationality: Maps for bounded agents . Decision Analysis , 3 (3), 124–144.
  • Kahneman, D. (2003). A perspective on judgment and choice: Mapping bounded rationality . American Psychologist , 58 (9), 697–720.
  • Kahneman, D. (2011). Thinking, fast and slow . Farrar, Straus and Giroux.
  • Kahneman, D. (2012). Two systems in the mind . Bulletin of the American Academy of Arts and Sciences , 65 (2), 55–59.
  • Kahneman, D. , & Frederick, S. (2002). Representativeness revisited: Attribute substitution in intuitive judgment. In T. Gilovich , D. Griffin , & D. Kahneman (Eds.), Heuristics and biases: The psychology of intuitive judgment (pp. 49–81). Cambridge University Press.
  • Kahneman, D. , & Frederick, S. (2005). A model of heuristic judgment. In K. J. Holyoak & R. G. Morrison (Eds.), The Cambridge handbook of thinking and reasoning (pp. 267–293). Cambridge University Press.
  • Kahneman, D. , & Tversky, A. (1972). Subjective probability: A judgment of representativeness . Cognitive Psychology , 3 (3), 430–454.
  • Kahneman, D. , & Tversky, A. (1973). On the psychology of prediction. Psychological Review , 80 (4), 237–251.
  • Kahneman, D. , & Tversky, A. (1979). Prospect theory: An analysis of decision under risk . Econometrica , 47 (2), 263–292.
  • Kahneman, D. , & Tversky, A. (1996). On the reality of cognitive illusions. Psychological Review , 103 (3), 582–591.
  • Keeney, R. L. , & Raiffa, H. (1976). Decisions with multiple objectives: Preferences and value trade-offs . Cambridge University Press.
  • Keren, G. B. (2012). Framing and communication: The role of frames in theory and in practice . Netspar Panel Paper; No. 32. NETSPAR .
  • Keren, G. , & Schul, Y. (2009). Two is not always better than one: A critical evaluation of two-system theories . Perspectives on Psychological Science , 4 (6), 533–550.
  • Keren, G. , & Wu, G. (2015). The Wiley Blackwell handbook of judgment and decision making . John Wiley & Sons.
  • Klayman, J. , & Ha, Y.-W. (1987). Confirmation, disconfirmation, and information in hypothesis testing . Psychological Review , 94 (2), 211–228.
  • Klein, O. , Hegarty, P. , & Fischhoff, B. (2017). Hindsight forty years on: An interview with Baruch Fischhoff. Memory Studies , 10 (3), 249–260.
  • Koehler, J. , & Macchi, L. (2004). Thinking about low-probability events: An exemplar cuing theory . Psychological Science , 15 (8), 540–546.
  • Kühberger, A. (1995). The framing of decisions: A new look at old problems. Organizational Behavior and Human Decision Processes , 62 (2), 230–240.
  • Kühberger, A. (1998). The influence of framing on risky decisions: A meta-analysis. Organizational Behavior and Human Decision Processes , 75 (1), 23–55.
  • Kühberger, A. , & Tanner, C. (2010). Risky choice framing: Task versions and a comparison of prospect theory and fuzzy-trace theory. Journal of Behavioral Decision Making , 23 (3), 314–329.
  • Levin, J. , & Milgrom, P. (2004). Introduction to choice theory . Stanford University Press.
  • Levin, I. P. , Schneider, S. L. , & Gaeth, G. J. (1998). All frames are not created equal: A typology and critical analysis of framing effects . Organizational Behavior and Human Decision Processes , 76 (2), 149–188.
  • Levy, D. J. , & Glimcher, P. W. (2012). The root of all value: A neural common currency for choice. Current Opinion in Neurobiology , 22 (6), 1027–1038.
  • Lewis, M. (2016). The undoing project: A friendship that changed the world . Penguin.
  • Li, R. , Smith, D. V. , Clithero, J. A. , Venkatraman, V. , Carter, R. M. , & Huettel, S. A. (2017). Reason’s enemy is not emotion: Engagement of cognitive control networks explains biases in gain/loss framing. Journal of Neuroscience , 37 (13), 3588–3598.
  • Loewenstein, G. F. , Weber, E. U. , Hsee, C. K. , & Welch, N. (2001). Risk as feelings. Psychological Bulletin , 127 (2), 267–286.
  • Loewenstein, G. , Rick, S. , & Cohen, J. D. (2008). Neuroeconomics. Annual Review of Psychology , 59 , 647–672.
  • Luce, R. D. , & Raiffa, H. (1957). Games and decisions: Introduction and critical survey . John Wiley.
  • Luce, R. D. , & Shipley, E. F. (1962). Preference probability between gambles as a step function of event probability . Journal of Experimental Psychology , 63 (1), 42–49.
  • Markowitz, H. (1952). The utility of wealth. Journal of Political Economy , 60 (2), 151–158.
  • McCormick, M. , Reyna, V. F. , Ball, K. , Katz, J. , & Deshpande, G. (2019). Neural underpinnings of financial decision bias in older adults: Putative theoretical models and a way to reconcile them . Frontiers in Neuroscience , 13 , 184.
  • McGettigan, P. , Sly, K. , O’Connell, D. , Hill, S. , & Henry, D. (1999). The effects of information framing on the practices of physicians. Journal of General Internal Medicine , 14 (10), 633–642.
  • Mellers, B. A. , Schwartz, A. , & Cooke, A. D. J. (1998). Judgment and decision making. Annual Review of Psychology , 49 , 447–477.
  • Miller, G. A. (2003). The cognitive revolution: A historical perspective. Trends in Cognitive Sciences , 7 (3), 141–144.
  • Newell, B. R. , & Shanks, D. R. (2004). On the role of recognition in decision making. Journal of Experimental Psychology: Learning Memory, and Cognition , 30 (4), 923–935.
  • O’Donoghue, T. , & Rabin, M. (1999). Doing it now or later. American Economic Review , 89 (1), 103–124.
  • Paulsen, D. , Carter, R. M. , Platt, M. , Huettel, S. A. , & Brannon, E. M. (2012). Neurocognitive development of risk aversion from early childhood to adulthood . Frontiers in Human Neuroscience , 5 , Article 178.
  • Peters, E. , & Slovic, P. (2007). Affective asynchrony and the measurement of the affective attitude component. Cognition & Emotion , 21 (2), 300–329.
  • Pope, D. G. , & Sydnor, J. R. (2015). Behavioral economics: Economics as a psychological discipline. In G. Keren & G. Wu (Eds.), The Wiley Blackwell handbook of judgment and decision making (pp. 800–827). John Wiley & Sons.
  • Porcelli, A. J. , & Delgado, M. R. (2009). Acute stress modulates risk taking in financial decision making. Psychological Science , 20 (3), 278–283.
  • Rabin, M. (1998). Psychology and economics. Journal of Economic Literature , 36 (1), 11–46.
  • Rabin, M. (2002). Alfred Marshall Lecture: A perspective on psychology and economics. European Economic Review , 46 , 657–685.
  • Rangel, A. , Camerer, C. , & Montague, P. R. (2008). Neuroeconomics: The neurobiology of value-based decision-making . Nature Reviews Neuroscience , 9 (7), 545–556.
  • Reyna, V. F. (2004). How people make decisions that involve risk: A dual process approach . Current Directions in Psychological Science , 13 (2), 60–66.
  • Reyna, V. F. (2012). A new intuitionism: Meaning, memory, and development in fuzzy-trace theory . Judgment and Decision Making , 7 (3), 332–359.
  • Reyna, V. F. (2018). Neurobiological models of risky decision-making and adolescent substance use . Current Addiction Reports , 5 , 128–133.
  • Reyna, V. F. , & Adam, M. B. (2003). Fuzzy-trace theory, risk communication, and product labeling in sexually transmitted diseases . Risk Analysis , 23 (2), 325–342.
  • Reyna, V. F. , & Brainerd, C. J. (1991). Fuzzy-trace theory and framing effects in choice: Gist extraction, truncation, and conversion . Journal of Behavioral Decision Making , 4 (4), 249–262.
  • Reyna, V. F. , & Brainerd, C. J. (1995). Fuzzy-trace theory: An interim synthesis . Learning and Individual Differences , 7 (1), 1–75.
  • Reyna, V. F. , & Brainerd, C. J. (2008). Numeracy, ratio bias, and denominator neglect in judgments of risk and probability . Learning and Individual Differences , 18 (1), 89–107.
  • Reyna, V. F. , & Brainerd, C. J. (2011). Dual processes in decision making and developmental neuroscience: A fuzzy-trace model . Developmental Review , 31 (2–3), 180–206.
  • Reyna, V. F. , Chapman, S. B. , Dougherty, M. , & Confrey, J. (2012). The adolescent brain: Learning, reasoning and decision making . American Psychological Association.
  • Reyna, V. F. , Chick, C. F. , Corbin, J. C. , & Hsia, A. N. (2014). Developmental reversals in risky decision-making: Intelligence agents show larger decision biases than college students . Psychological Science , 25 (1), 76–84.
  • Reyna, V. F. , Corbin, J. C. , Weldon, R. B. , & Brainerd, C. J. (2016). How fuzzy-trace theory predicts true and false memories for words, sentences, and narratives . Journal of Applied Research in Memory and Cognition , 5 (1), 1–9.
  • Reyna, V. F. , & Ellis, S. C. (1994). Fuzzy-trace theory and framing effects in children’s risky decision making . Psychological Science , 5 , 275–279.
  • Reyna, V. F. , & Farley, F. (2006). Risk and rationality in adolescent decision-making: Implications for theory, practice, and public policy . Psychological Science in the Public Interest , 7 (1), 1–44.
  • Reyna, V. F. , Helm, R. K. , Weldon, R. B. , Shah, P. D. , Turpin, A. G. , & Govindgari, S. (2018). Brain activation covaries with reported criminal behaviors when making risky choices: A fuzzy-trace theory approach. Journal of Experimental Psychology: General , 147 (7), 1094–1109.
  • Reyna, V. F. , Nelson, W. , Han, P. , & Dieckmann, N. F. (2009). How numeracy influences risk comprehension and medical decision making . Psychological Bulletin , 135 (6), 943–973.
  • Reyna, V. F. , Wilhelms, E. A. , McCormick, M. J. , & Weldon, R. B. (2015). Development of risky decision making: Fuzzy-trace theory and neurobiological perspectives . Child Development Perspectives , 9 (2), 122–127.
  • Roese, N. J. , & Vohs, K. D. (2012). Hindsight bias . Perspectives on Psychological Science , 7 (5), 411–426.
  • Roiser, J. P. , De Martino, B. , Tan, G. C. , Kumaran, D. , Seymour, B. , Wood, N. W. , & Dolan, R. J. (2009). A genetically mediated bias in decision making driven by failure of amygdala control . Journal of Neuroscience , 29 (18), 5985–5991.
  • Samuelson, P. A. (1938). A note on the pure theory of consumer’s behaviour . Economica , 5 (17), 61–71.
  • Savage, L. J. (1954). The foundations of statistics . John Wiley.
  • Schoemaker, P. J. H. (1982). The expected utility model: Its variants, purposes, evidence and limitations . Journal of Economic Literature , 20 (2), 529–563.
  • Sher, S. , & McKenzie, C. R. (2006). Information leakage from logically equivalent frames . Cognition , 101 (3), 467–494.
  • Shiloh, S. , Salton, E. , & Sharabi, D. (2002). Individual differences in rational and intuitive thinking styles as predictors of heuristic responses and framing effects . Personality and Individual Differences , 32 (3), 415–429.
  • Simon, H. (1955). A behavioral model of rational choice. Quarterly Journal of Economics , 69 (1), 99–118.
  • Simon, H. (1956). Rational choice and the structure of the environment. Psychological Review , 63 (2), 129–138.
  • Simon, H. A. (1957). Models of man: Social and rational . John Wiley.
  • Simon, H. A. (1990). Invariants of human behavior . Annual Review of Psychology , 41 , 1–19.
  • Sloman, S. A. (1996). The empirical case for two systems of reasoning. Psychological Bulletin , 119 (1), 3–22.
  • Slovic, P. , & Lichtenstein, S. (1971). Comparison of Bayesian and regression approaches to the study of information processing in judgment. Organizational Behavior and Human Performance , 6 (6), 649–744.
  • Slovic, P. , Finucane, M. L. , Peters, E. , & MacGregor, D. G. (2002). The affect heuristic. In T. Gilovich , D. Griffin , & D. Kahneman (Eds.), Heuristics and biases: The psychology of intuitive judgment (pp. 397–420). Cambridge University Press.
  • Slovic, P. , Fischhoff, B. , & Lichtenstein, S. (1977). Behavioral decision theory. Annual Review of Psychology , 28 , 1–39.
  • Slovic, P. , Peters, E. , Finucane, M. L. , & MacGregor, D. G. (2005). Affect, risk, and decision making . Health Psychology , 24 (4S), S35–S40.
  • Smith, V. L. (2001). From old issues to new directions in experimental psychology and economics. Behavioral and Brain Sciences , 24 (3), 428−429.
  • Spreng, R. N. , & Turner, G. R. (2019). The shifting architecture of cognition and brain function in older adulthood . Perspectives on Psychological Science , 14 (4), 523–542.
  • Stanovich, K. E. (1999). Who is rational? Studies of individual differences in reasoning . Erlbaum.
  • Stanovich, K. E. (2009). Distinguishing the reflective, algorithmic, and autonomous minds: Is it time for a tri-process theory? In J. S. B. T. Evans & K. Frankish (Eds.), In two minds: Dual processes and beyond (pp. 55–88). Oxford University Press.
  • Stanovich, K. E. (2010). Rationality and the reflective mind . Oxford University Press.
  • Stanovich, K. E. , & West, R. F. (2000). Individual differences in reasoning: Implications for the rationality debate? Behavioral and Brain Sciences , 23 (5), 645–665.
  • Stanovich, K. E. , & West, R. F. (2008). On the relative independence of thinking biases and cognitive ability . Journal of Personality and Social Psychology , 94 (4), 672–695.
  • Stanovich, K. E. , West, R. F. , & Toplak, M. E. (2011). The complexity of developmental predictions from dual process models . Developmental Review , 31 , 103–118.
  • Stearns, S. C. (2000). Daniel Bernoulli (1738): Evolution and economics under risk. Journal of Biosciences , 25 (3), 221–228.
  • Steiger, A. , & Kühberger, A. (2018). A meta-analytic re-appraisal of the framing effect . Zeitschrift für Psychologie , 226 (1), 45–55.
  • Steinberg, L. (2008). A social neuroscience perspective on adolescent risk taking . Developmental Review , 28 (1), 78–106.
  • Stigler, G. J. (1950). The development of utility theory: I. Journal of Political Economy , 58 (4), 307–327.
  • Suzumura, K. (1976). Rational choice and revealed preference . The Review of Economic Studies , 43 (1), 149–158.
  • Thaler, R. H. (1980). Toward a positive theory of consumer choice. Journal of Economic Behavior , 1 (1), 39–60.
  • Thaler, R. H. (1993). Advances in behavioral finance . Russell Sage Foundation.
  • Tversky, A. (1969). Intransitivities of preferences. Psychological Review , 76 (1), 31–48.
  • Tversky, A. (1972). Elimination by aspects: A theory of choice. Psychological Review , 79 (4), 281–299.
  • Tversky, A. , & Kahneman, D. (1971). Belief in the law of small numbers. Psychological Bulletin , 76 (2), 105–110.
  • Tversky, A. , & Kahneman, D. (1973). Availability: A heuristic for judging frequency and probability. Cognitive Psychology , 5 (2), 207–232.
  • Tversky, A. , & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science , 185 (4157), 1124–1131.
  • Tversky, A. , & Kahneman, D. (1981). The framing of decisions and the psychology of choice. Science , 211 (4481), 453–458.
  • Tversky, A. , & Kahneman, D. (1983). Extensional vs. intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review , 90 (4), 293–315.
  • Tversky, A. , & Kahneman, D. (1986). Rational choice and the framing of decisions. Journal of Business , 59 (4), S251–S278.
  • Tversky, A. , & Kahneman, D. (1991). Loss aversion in riskless choice: A reference-dependent model. Quarterly Journal of Economics , 106 (4), 1039–1061.
  • Tversky, A. , & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty , 5 , 297–323.
  • Tversky, A. , & Koehler, D. J. (1994). Support theory: A nonextensional representation of subjective probability. Psychological Review , 101 (4), 547–567.
  • Tversky, A. , & Shafir, E. (1992). The disjunction effect in choice under uncertainty. Psychological Science , 3 (5), 305–309.
  • van Buiten, M. , & Keren, G. (2009). Speaker–listener incompatibility: Joint and separate processing in risky choice framing . Organizational Behavior and Human Decision Processes , 108 (1), 106–115.
  • van Schie, E. C. M. , & van der Pligt, J. (1995). Influencing risk-preference in decision-making: The effects of framing and salience. Organizational Behavior and Human Decision Processes , 63 (3), 264–275.
  • Venkatraman, V. , Payne, J. W. , Bettman, J. R. , Luce, M. F. , & Huettel, S. A. (2009). Separate neural mechanisms underlie choices and strategic preferences in risky decision making. Neuron , 62 (4), 593–602.
  • von Neumann, J. , & Morgenstern, O. (1944). Theory of games and economic behavior . Princeton University Press.
  • Wason, P. C. (1960). On the failure to eliminate hypotheses in a conceptual task. Quarterly Journal of Experimental Psychology , 12 (3), 129–140.
  • Weber, E. U. , & Johnson, E. J. (2009). Mindful judgment and decision making. Annual Review of Psychology , 60 (1), 53–85.
  • Weller, J. A. , Levin, I. P. , Shiv, B. , & Bechara, A. (2007). Neural correlates of adaptive decision making in risky gains and losses. Psychological Science , 18 (11), 958–964.
  • Wolfe, C. R. , & Reyna, V. F. (2010). Semantic coherence and fallacies in estimating joint probabilities . Journal of Behavioral Decision Making , 23 (2), 203–223.
  • Xue, G. , Lu, Z. , Levin, I. P. , & Bechara, A. (2011). An fMRI study of risk-taking following wins and losses: Implications for the gambler’s fallacy. Human Brain Mapping , 32 (2), 271–281.
  • Zheng, H. , Wang, X. T. , & Zhu, L. (2010). Framing effects: Behavioral dynamics and neural basis. Neuropsychologia , 48 (11), 3198–3204.

Related Articles

  • Group Processes
  • Group Decision-Making
  • Development of Judgment, Decision Making, and Rationality


Frontiers in Psychology Research Topic

Judgment and Decision Making Under Uncertainty: Descriptive, Normative, and Prescriptive Perspectives


About this Research Topic

Judging and deciding are endemic features of everyday life, representing prime categories of higher-order cognition that often follow thinking and reasoning and precede planning and action. Although some judgement and decisions may be made under conditions of certainty, by far, most involve some form of ...

Keywords : Judgment, decision-making, uncertainty, probability, higher-order cognition


Judgments


  • 04 Mar 2024
  • Research & Ideas

Want to Make Diversity Stick? Break the Cycle of Sameness

Whether on judicial benches or in corporate boardrooms, white men are more likely to step into roles that other white men vacate, says research by Edward Chang. But when people from historically marginalized groups land those positions, workforce diversification tends to last. Chang offers three pieces of advice for leaders striving for diversity.


  • 23 May 2023

Face Value: Do Certain Physical Features Help People Get Ahead?

Society seems to reward people with particular facial features. Research by Shunyuan Zhang and colleagues uses machine learning to analyze traits that people associate with charisma. The findings highlight opportunities to enhance one's image—and challenge bias.


  • 29 Sep 2022

Inclusive Leadership Advice: Get Comfortable With the Uncomfortable

People tend to seek sameness, but they can teach themselves to relish the differences of the human experience. Francesca Gino offers these three principles from improv to anyone who's trying to lead more inclusively.


  • 24 Jul 2019
  • Lessons from the Classroom

Can These Business Students Motivate Londoners to Do the Right Thing?

In the Harvard Business School course Behavioral Insights, students work in the UK with psychology experts to understand what motivates consumers and workers. What they learn can help businesses of all types, says Michael Luca.


  • 05 Feb 2019
  • Working Paper Summaries

Stereotypes and Belief Updating

Increasing evidence demonstrates that stereotyped beliefs drive key economic decisions. This paper shows the significant role of self-stereotyping in predicting beliefs about one’s own ability. Stereotypes do not just affect beliefs about ability when information is scarce. In fact, stereotypes color the way information is incorporated into beliefs, perpetuating initial biases.


  • 30 Jul 2018

Why Ethical People Become Unethical Negotiators

You may think you are an ethical person, but self-interest can cloud your judgment when you sit down at the bargaining table, says Max Bazerman.

  • 07 Aug 2013
  • What Do You Think?

Is There Still a Role for Judgment in Decision-Making?

Summing Up: Human judgment should be a part of all decisions, but play a dominant role in significantly fewer of them, according to many of Jim Heskett's readers. Is good old-fashioned intuition out of date? What do YOU think?

  • 27 Feb 2013

Sidetracked: Why Can’t We Stick to the Plan?

In her new book, Sidetracked, behavioral scientist and professor Francesca Gino explores the unexpected forces that often keep people from following through with their plans, both professional and personal.

  • 05 Feb 2009

Why Can’t We Figure Out How to Select Leaders?

Managers discuss their own experience in organizations in response to February's column. All good leaders teach as well as learn, says Jim Heskett. Is it possible with any degree of confidence to select people for certain leadership jobs?

  • 28 Aug 2008

How Can Decision Making Be Improved?

While scholars can describe how people make decisions, and can envision how much better decision-making could be, they still have little understanding of how to help people overcome blind spots and behave optimally. Chugh, Milkman, and Bazerman organize the scattered knowledge that judgment and decision-making scholars have amassed over several decades about how to reduce biased decision-making. Their analysis of the existing literature on improvement strategies is designed to highlight the most promising avenues for future research. Key concepts include:
  • People put great trust in their intuition. The past 50 years of decision-making research challenges that trust.
  • A key task for psychologists is to identify how and in what decision-making situations people should try to move from intuitive, emotional thinking to more deliberative, logical thinking.
  • The more that researchers understand the potentially harmful effects of some biased decision-making, the more important it is to have empirically tested strategies for reaching better decisions.

  • 03 Jan 2008

Does Judgment Trump Experience?

It's a question as relevant for business as for the U.S. presidential campaign, says HBS professor Jim Heskett. If "judgment capability" is a function of experience, what kind of experience is important? Does plenty of experience really improve judgment?

Rethinking the field of automatic prediction of court decisions

  • Original Research
  • Open access
  • Published: 25 January 2022
  • Volume 31, pages 195–212 (2023)


  • Masha Medvedeva (ORCID: orcid.org/0000-0002-2972-8447)
  • Martijn Wieling
  • Michel Vols


In this paper, we discuss previous research in automatic prediction of court decisions. We define the difference between outcome identification, outcome-based judgement categorisation and outcome forecasting, and review how various studies fall into these categories. We discuss how important it is to understand the legal data that one works with in order to determine which task can be performed. Finally, we reflect on the needs of the legal discipline regarding the analysis of court judgements.


1 Introduction

Automatic analysis of legal documents is a useful, if not necessary, task in contemporary legal practice and research. Of course, data analysis should be conducted in a methodologically sound, transparent and thorough way. These requirements are especially important with regard to legal data. The high stakes that legal professionals such as lawyers, judges and other legal decision-makers deal with, and the cost of error in this field, make it very important that automatic processing and analysis are done well. This means that it is essential to understand how the automated systems used in the analysis work, exactly which legal data are analysed, and for what purpose.

The need for established practices and methodology is becoming more urgent with the growing availability of data. In striving for transparency, many national and international courts in Europe adhere to the directive to promote accessibility and re-use of public sector information and publish their documents online (Marković and Gostojić 2018 ). This is also the case for many other courts around the world. Digital access to a large amount of published case law provides a unique opportunity to process this data automatically on a large scale using natural language processing (NLP) techniques.

In this paper we review previous work on applying NLP techniques to court decisions, and discuss the methodological issues as well as good practices. While automatic legal analysis is an enormous field which has been around for some time, in this paper we focus solely on the recent development of using machine learning techniques for classifying court decisions. This sub-field has expanded drastically in the past 6 years with papers that attempt to predict decisions of various courts around the world. We subsequently discuss whether it is fair to say that they indeed succeed. Our main finding is that many of the papers under review claiming to predict decisions of the courts using machine learning actually perform one of three different tasks.

In the following section, we define the scope of the review we conducted. Next, in Sect. 3 we discuss (our terminology of) different types of tasks within the field of automatic analysis of court decisions and how previous research falls within those categories. We examine the purpose of such research for each task, as well as good practices and potential pitfalls. We then discuss our survey in Sect. 4. In Sect. 5 we summarise and conclude our work.

2 Scope of the review

We limit our review to papers that use machine learning techniques and claim to be predicting court decisions. The publication dates range from 2015 to (June) 2021. We specifically chose these years, as this is when machine learning in this field became popular. If a paper included in our review attempts multiple tasks, we only consider the experiment(s) that focus on predicting judicial decisions. While our survey is meant to provide an exhaustive overview, we may have inadvertently missed some research in the field.

As mentioned, research in the field is growing, but not all courts share (all of) their case law online. Furthermore, the majority of available case law is extremely varied in its outcomes, which may make it harder to set up an outcome prediction task. For this reason, research often focuses on a relatively restricted set of courts. In this paper, we surveyed publications that use machine learning approaches and focus on case law of the US Supreme Court (Sharma et al. 2015 ; Katz et al. 2017 ; Kaufman et al. 2019 ), the French court of Cassation (Şulea et al. 2017b ; Sulea et al. 2017a ), the European Court of Human Rights (Aletras et al. 2016 ; Liu and Chen 2017 ; Chalkidis et al. 2019 ; Kaur and Bozic 2019 ; O’Sullivan and Beel 2019 ; Visentin et al. 2019 ; Chalkidis et al. 2020 ; Condevaux 2020 ; Medvedeva et al. 2020a , b ; Quemy and Wrembel 2020 ; Medvedeva et al. 2021 ), Brazilian courts (Bertalan and Ruiz 2020 ; Lage-Freitas et al. 2019 ), Indian courts (Bhilare et al. 2019 ; Shaikh et al. 2020 ; Malik et al. 2021 ), UK courts (Strickson and De La Iglesia 2020 ), German courts (Waltl et al. 2017 ), the Quebec Rental Tribunal (Canada) (Salaün et al. 2020 ), the Philippine Supreme Court (Virtucio et al. 2018 ), the Thai Supreme Court (Kowsrihawat et al. 2018 ) and the Turkish Constitutional Court (Sert et al. 2021 ). Many of these papers achieve a relatively high performance on their specific task using various machine learning techniques.

The distinction between the different tasks in this paper is conditional on the data used, not on the algorithms. Consequently, we discuss the following papers from the perspective of which data was used, how it was processed, and the general performance of the systems using particular data for a particular task. We do not go into detail about the algorithms used for achieving that performance. For the specifics of the different systems, we refer the interested reader to the papers at hand. For a more detailed explanation of machine learning classification for legal texts in general, see Medvedeva et al. ( 2020a ) and Dyevre ( 2020 ).

3 Terminology and types of judgement classification

In papers that use machine learning for classifying court decisions, different terms and types of tasks are often used interchangeably. For the field to move forward, we therefore argue for a stricter use of terminology. Consequently, in this paper, we use ‘judgement’ to mean the text of a published judgement. While the word ‘outcome’ is a very general term, for the purposes of distinguishing between different tasks in the legal context, we define outcome as a specific closed class of labels for verdicts (i.e. with a pre-defined, limited number of verdicts). For example, in the context of case law concerning the European Convention on Human Rights (ECHR), the outcome will be a violation or a non-violation of a specific human right. Other examples of outcomes are eviction or non-eviction in a housing law context (Vols 2019 ) or the US Supreme Court affirming or reversing a decision of a lower court. We use ‘verdict’ and ‘decision’ as synonyms of ‘outcome’.

In this paper we distinguish between three types of tasks: outcome identification, outcome-based judgement categorisation, and outcome forecasting. In simple terms, outcome identification is the task of identifying the verdict in the full text of a published judgement, judgement categorisation is the task of categorising documents based on their outcome, and outcome forecasting is the task of predicting future decisions of a particular court. At present, these task distinctions are not clearly made in the literature, even by ourselves (Medvedeva et al. 2020a ). This is potentially problematic, as the different tasks have specific uses, which we discuss below.

The most likely reason for the ambiguity in terminology is the cross-disciplinary nature of the field, combining law with NLP. When using machine learning in the field of NLP, all three tasks are so-called classification tasks. The most commonly used approach in machine learning, and the one all of the reviewed papers have used, is supervised learning. This means that the system is trained on some input data (e.g., facts extracted from a criminal case) that is connected to labels (outcomes), for instance whether the case was won by the defendant or the prosecution. During the training phase, the model is presented with input data together with their labels in order to infer patterns characterising the relationship between the two. To evaluate the system after training, it is provided with similar data (not used during the training phase), such as other criminal cases, and it then predicts the label for each document. Since the label in each task is the outcome, describing the purpose of these systems within NLP as ‘predicting court decisions’ is appropriate. However, that meaning does not translate in the same way outside of the NLP domain. Specifically, the word predict in the legal domain suggests that one can forecast a decision (of the judge) that has not been made yet, whereas in NLP predict merely refers to the methodology and terminology of machine learning. The majority of papers on predicting court decisions published today, however, do not attempt to predict decisions in cases that have not yet been judged. Furthermore, the majority of the work in this interdisciplinary field suggests a benefit for legal professionals, but does not explicitly specify what the models that were introduced can be used for.
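
To make the supervised set-up described above concrete, the sketch below trains a text classifier on judgement texts paired with outcome labels and then ‘predicts’, in the machine-learning sense only, the labels of held-out judgements. It is a minimal sketch assuming scikit-learn and pandas; the file judgements.csv and its text and outcome columns are hypothetical placeholders, not a dataset from any of the surveyed studies.

```python
# Minimal sketch of supervised outcome classification (hypothetical data).
# 'judgements.csv' with columns 'text' and 'outcome' (e.g. 'violation' vs.
# 'no_violation') is a placeholder, not a dataset from any surveyed study.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score, f1_score

df = pd.read_csv("judgements.csv")  # hypothetical corpus of judgement texts + labels

# NB: a random split over already-published judgements makes this a
# categorisation/identification set-up; genuine forecasting would instead
# train on cases decided before a cut-off date and evaluate on later cases.
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["outcome"], test_size=0.2, random_state=42)

# Turn the raw text into word/n-gram features and fit a linear SVM,
# the kind of statistical classifier used in much of the surveyed work.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=2)
clf = LinearSVC()
clf.fit(vectorizer.fit_transform(X_train), y_train)

# 'Predict' here is purely the machine-learning sense: assign labels to held-out cases.
pred = clf.predict(vectorizer.transform(X_test))
print("accuracy:", accuracy_score(y_test, pred))
print("macro F1:", f1_score(y_test, pred, average="macro"))
```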

To circumvent the use of the ambiguous word predict, we therefore suggest using terminology that better reflects the different tasks, and thereby also differentiates between objectives. In order to distinguish between outcome identification, outcome-based judgement categorisation and outcome forecasting, it is important to carefully assess the data used in the experiments conducted.

When discussing the different papers, we also refer to their performance scores. The conventional way of reporting the performance of a classification system is by using accuracy or the F1-score. Accuracy is the proportion of labels (in our case, outcomes) that are classified (i.e. identified, categorised, or forecasted) correctly. The F1-score is the harmonic mean of precision and recall, where precision is the proportion of judgements assigned a specific outcome for which that outcome is correct, and recall is the proportion of cases with a specific outcome that are classified (i.e. identified, categorised, or forecasted) correctly by the system.
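
As a worked illustration of these measures, the snippet below computes accuracy, precision, recall and F1 for the ‘violation’ class from a handful of hypothetical predictions; the labels are invented purely to show the arithmetic.

```python
# Worked example of accuracy, precision, recall and F1 for the 'violation'
# class; the six labels below are invented purely to illustrate the arithmetic.
true = ["violation", "violation", "no_violation", "violation", "no_violation", "no_violation"]
pred = ["violation", "no_violation", "no_violation", "violation", "violation", "no_violation"]

tp = sum(t == p == "violation" for t, p in zip(true, pred))                     # 2 correctly flagged violations
fp = sum(t == "no_violation" and p == "violation" for t, p in zip(true, pred))  # 1 false alarm
fn = sum(t == "violation" and p == "no_violation" for t, p in zip(true, pred))  # 1 missed violation

accuracy = sum(t == p for t, p in zip(true, pred)) / len(true)  # 4/6 ≈ 0.67
precision = tp / (tp + fp)                                      # 2/3 ≈ 0.67
recall = tp / (tp + fn)                                         # 2/3 ≈ 0.67
f1 = 2 * precision * recall / (precision + recall)              # harmonic mean ≈ 0.67
print(accuracy, precision, recall, f1)
```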

In the following subsections we will make the definitions of the three tasks more explicit, and then give examples from published research for each task. We also highlight the distinct uses of the different tasks for legal professionals.

3.1 Outcome identification

Outcome identification is defined as the task of identifying the verdict within the full text of the judgement, where that text includes the verdict itself or references to it. In principle, a machine learning system is often not necessary for such a task, as keyword search (or the use of simple regular expressions) might suffice.
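
To illustrate what such a rule-based approach to outcome identification might look like, the sketch below searches the text of a judgement for an explicit holding. The patterns and the example sentence are hypothetical and only loosely modelled on ECtHR-style phrasing; a real system would need patterns tailored to the formulations of the specific court, as discussed below.

```python
# Toy outcome identification with regular expressions. The patterns are
# hypothetical and only loosely modelled on ECtHR-style phrasing.
import re

def identify_outcome(judgement_text: str) -> str:
    text = judgement_text.lower()
    if re.search(r"there has been no violation of article\s+\d+", text):
        return "no_violation"
    if re.search(r"there has been a violation of article\s+\d+", text):
        return "violation"
    return "unknown"  # phrasing not recognised; fall back to manual reading

print(identify_outcome("Holds that there has been a violation of Article 3 of the Convention."))
# -> violation
```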

Outcome identification falls under the field of information extraction and, when not confused with predicting court decisions, is often also referred to as outcome extraction (e.g., Petrova et al. 2020 ). Given the growing body of published case law across the world, the automation of this task may be very useful, since many courts publish case law without any structured information (i.e. metadata) other than the judgements themselves, and one may require a database in which judgements are connected to their verdicts in order to conduct research. At present, and to our knowledge, most such work is done manually, as a human can perform this task with 100% accuracy (by simply reading the case and finding the verdict in it).

Automation of outcome identification allows one to save time when collecting this information. While the task is not necessarily always trivial for a machine and depends on how the verdict is formulated (see, for instance, Vacek and Schilder ( 2017 ), Petrova et al. ( 2020 ) and Tagny-Ngompé et al. ( 2020 )), there is nonetheless an expectation that these automated systems should achieve (almost) perfect performance to justify the automation. However, the approach to outcome identification is highly dependent on the structure of judgements in a particular legal domain or jurisdiction and the language of the case law. As a result, a system that automatically identifies a verdict in a particular set of judgements cannot be applied easily to case law of courts in other legal domains or other jurisdictions.

3.1.1 Research in outcome identification

A total of eight papers that aimed to predict court decisions (see Table 1 ) were in fact performing the outcome identification task. These papers use the text of the final judgements published by the court, which contains the verdict itself or references to it.

One of the earliest papers that tried to predict court decisions using the text of the judgement is Aletras et al. ( 2016 ). The authors used a popular machine learning algorithm, a Support Vector Machine (SVM), to predict decisions of the European Court of Human Rights (ECtHR). Their model aimed to predict the court’s decision by extracting the available textual information from relevant sections of the ECtHR judgements, and it reached an average accuracy of 79% for three separate articles of the ECHR. While the authors did exclude the verdict itself (or the complete section containing the verdict), they still used the remaining text of the judgements, which often included specific references to the final verdict (e.g., ‘Therefore there is a violation of Article 3’). Although their work was positioned as predicting the outcome of court cases, the task they conducted was therefore restricted to outcome identification.

Other studies focusing on the ECtHR included Liu and Chen ( 2017 ), Visentin et al. ( 2019 ), and Quemy and Wrembel ( 2020 ). Since Liu and Chen ( 2017 ) and Visentin et al. ( 2019 ) used the same dataset as Aletras et al. ( 2016 ), they also conducted the task of outcome identification. Liu and Chen ( 2017 ) used similar statistical methods to Aletras et al. ( 2016 ) and achieved an 88% accuracy using an SVM, whereas Visentin et al. ( 2019 ) achieved an accuracy of 79% using an SVM ensemble. Quemy and Wrembel ( 2020 ) collected a larger dataset for the same court and performed a binary classification task (violation of any article of the ECHR vs. no violation) using neural models; however, they did not appear to exclude any part of the judgement, thereby also restricting their task to outcome identification (with a concomitantly high accuracy of 96% using a range of statistical methods). These studies show that automatic outcome identification is to a large extent possible for the ECtHR. However, from a legal perspective this task is not very useful, as the verdict has already been categorised on the ECtHR website.

The studies based on the ECtHR illustrate two broad categories of papers that aim at predicting court judgements but instead perform outcome identification. The first category consists of studies that were only partially successful in removing the information about (references to) the verdict. Besides the aforementioned studies of Aletras et al. ( 2016 ), Liu and Chen ( 2017 ) and Visentin et al. ( 2019 ), the studies of Şulea et al. ( 2017a , b ) suffer from the same problem. They focus on the French Court of Cassation and reach an accuracy of up to 96%. While they masked the words containing the verdict, various words found to be important for their model’s predictions appeared to be closely related to the outcome description. Consequently, they were not completely successful in filtering out the information about the outcome.

The second category consists of studies that do not filter any information out of the judgement at all (or do not mention filtering out this type of information), such as Quemy and Wrembel ( 2020 ). Virtucio et al. ( 2018 ) are explicit about not filtering out the actual court decision of the Philippine Supreme Court (due to a lack of consistent sectioning in the judgement descriptions) when predicting its judgements. Nevertheless, their accuracy was rather low at only 59%. In addition, there are a number of papers that do not specify any pre-processing steps to remove information that may contain the verdict. Examples are Lage-Freitas et al. ( 2019 ), who deal with appeal cases of Brazilian courts (with an F1-score of 79%), and Bertalan and Ruiz ( 2020 ), who worked on second-degree murder and corruption cases tried in the São Paulo Justice Court (with an F1-score of up to 98%).

3.2 Outcome-based judgement categorisation

Outcome-based judgement categorisation is defined as categorising court judgements based on their outcome using textual or any other information published with the final judgement, but excluding (references to) the verdict in the judgement. Since the outcomes of such cases are published and no longer need to be ‘predicted’, this task is mainly useful for identifying predictors (facts, arguments, judges, etc.) of court decisions within the text of judgements. To prevent the system from simply identifying the outcome within the text of the judgement, and in order for it to learn new information, any references to the verdict need to be removed.

While an algorithm may perform very well on the categorisation task, the obtained categories are not useful by themselves. As the documents used by the system are only available once the judgements have been made public, outcome categorisation does not contribute any new information (one can simply extract the verdict from the published judgement). This view is also supported by Bex and Prakken (2021), who insist that the ability to categorise decisions without explaining why the categorisation was made does not provide any useful information and may even be misleading. The performance of a machine learning model for judgement categorisation, however, may provide useful information about how informative the characteristic features are. To enable feature extraction, it is important that the system is not a ‘black box’ (such as many of the more recent neural classification models). Therefore, rather than ‘predicting court decisions’, the main objective of the outcome-based judgement categorisation task should be to identify predictors underlying the categorisations.
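
As a rough illustration of what ‘not a black box’ buys in practice, the sketch below fits a linear classifier over n-gram features and reads the highest-weighted n-grams per class off the learned coefficients as candidate predictors. It is a generic sketch under assumed inputs (function name and data are hypothetical), not the feature-importance procedure of any particular study.

```python
# Hypothetical sketch: with a linear model over n-gram features, candidate
# predictors can be read directly from the learned coefficients.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def top_predictors(texts, labels, k=20):
    """Return the k n-grams pushing hardest towards each outcome (binary labels)."""
    vectoriser = TfidfVectorizer(ngram_range=(1, 3), min_df=5)
    X = vectoriser.fit_transform(texts)
    clf = LinearSVC().fit(X, labels)
    features = np.asarray(vectoriser.get_feature_names_out())
    weights = clf.coef_[0]                     # one weight per n-gram in the binary case
    order = np.argsort(weights)
    return {
        "towards_positive_class": features[order[-k:]][::-1].tolist(),
        "towards_negative_class": features[order[:k]].tolist(),
    }
```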

As we only discuss publications that categorise judgements on the basis of the outcome of the case, we will refer to outcome-based judgement categorisation simply as judgement categorisation.

3.2.1 Research in outcome-based judgement categorisation

Most of the papers in the field categorise judgements. The papers surveyed that involve judgement categorisation can be found in Table 2. For all fifteen papers, we indicate the paper itself, the court, whether or not the authors provide a method of analysing feature importance (FI) and consequently identify specific predictors of the outcome within the text, and the maximum performance.

Within these studies, two broad categories can be distinguished depending on the type of data they use. On the one hand, most studies use the raw text, explicitly selecting parts of the judgement which do not include (references to) the verdict. On the other hand, there are (fewer) studies which manually annotate data and use that as a basis for the categorisation.

Kowsrihawat et al. (2018) used the raw text to categorise (with an accuracy of 67%) the documents of the Thai Supreme Court on the basis of the facts of the case and the text related to the legal provisions in cases such as murder, assault, theft, fraud and defamation, using a range of statistical and neural methods. Medvedeva et al. (2018, 2020a) categorised (with an accuracy of at most 75%) decisions of the ECtHR using only the facts of the case (i.e. a separate section in each ECtHR judgement). Notably, Medvedeva et al. (2020a) identified the top predictors (i.e. sequences of one or more words) for each category, which was possible due to the (support vector machine) approach they used. Strickson and De La Iglesia (2020) worked on categorising judgements of the UK Supreme Court, compared several systems trained on the raw text of the judgement (without the verdict), and reported an accuracy of 69%, while also presenting the top predictors for each class. Sert et al. (2021) categorised cases of the Turkish Constitutional Court related to public morality and freedom of expression using a traditional neural multi-layer perceptron approach, with an average accuracy of 90%. Similarly to Medvedeva et al. (2020a), Chalkidis et al. (2019) also investigated the ECtHR using the facts of the case, and proposed several neural methods to improve categorisation performance (up to 82%). They additionally proposed an approach (a hierarchical attention network) to identify which words and facts were most important for the classification decisions of their systems. In their subsequent study, Chalkidis et al. (2020) used a more sophisticated neural categorisation algorithm specifically tailored to legal data (LEGAL-BERT). Unfortunately, while their approach did show improved performance (with an F1-score of 83%), it was not possible to determine the best predictors of the outcome due to the system's complexity. Medvedeva et al. (2021) reproduced the algorithms of Chalkidis et al. (2019) and Chalkidis et al. (2020) in order to compare their performance on categorisation and forecasting tasks (see below) for a smaller subset of ECtHR cases, and achieved an F1-score of up to 92% for categorising judgements from 2019. The scores, however, varied over the years; for example, categorisation of cases from 2020 did not surpass 62%. Several other categorisation studies (with accuracies ranging between 69 and 88%) focused on the facts of ECtHR cases, but likewise did not investigate the best predictors (Kaur and Bozic 2019; O'Sullivan and Beel 2019; Condevaux 2020). Malik et al. (2021) used neural methods to develop a system that categorised Indian Supreme Court decisions, achieving 77% accuracy. As their main focus was to develop an explainable system, they used an approach which allowed them to investigate the importance of their features, somewhat similar to the approach of Chalkidis et al. (2020).

Manually annotated data was used by Kaufman et al. (2019), who focused on data from the US Supreme Court (SCOTUS) Database (Spaeth et al. 2014) and achieved an accuracy of 75% using statistical methods (i.e. AdaBoosted decision trees). However, they did not investigate the most informative predictors. Shaikh et al. (2020) also used manually annotated data to categorise the decisions in murder cases of the Delhi District Court with an accuracy of up to 92% using classification and regression trees. These authors manually annotated 18 features, including whether the injured person is dead or alive, the type of evidence, the number of witnesses, et cetera. Importantly, they analysed the impact of each type of feature on each type of outcome.
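
For readers less familiar with this style of analysis, the sketch below shows the general shape of categorisation from manually annotated features, with the tree's feature importances indicating which annotations drive the categorisation. The column names and data frame are hypothetical and are not Shaikh et al.'s actual variables.

```python
# Hypothetical sketch: categorisation from manually annotated case features,
# with feature importances indicating which annotations matter most.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

def fit_annotated_tree(cases: pd.DataFrame, outcome_column: str = "outcome"):
    """cases: one row per case with annotated columns, e.g. 'victim_deceased',
    'n_witnesses', 'evidence_type_code' (all hypothetical), plus the outcome."""
    X = cases.drop(columns=[outcome_column])
    y = cases[outcome_column]
    tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
    importances = pd.Series(tree.feature_importances_, index=X.columns)
    return tree, importances.sort_values(ascending=False)
```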

Finally, Salaün et al. (2020) essentially combined the two types of predictors: they not only extracted a number of characteristics from cases of the Rental Tribunal of Quebec (including the court location, judge, types of parties, et cetera), but also used the raw text of the facts (as well as the complete text excluding the verdict), achieving a performance of at most 85% with a French BERT model, FlauBERT.

Notably, the performance of Sert et al. (2021) was very high. Despite the high success rate of their system, however, the authors warn against using it for decision-making. Nevertheless, they do suggest that their system could potentially be used to prioritise cases that are more likely to result in a violation. This suggestion mirrors the proposition of Aletras et al. (2016) that their system could be used to prioritise cases with human rights violations. In both cases, however, the experiments were conducted using data extracted from the final judgements of the court, and the performance of these systems on data compiled before the verdict was reached (i.e. the information actually needed to prioritise cases) is unknown. Making these types of recommendations is therefore potentially problematic.

Many categorisation papers shown in Table 2 claim to be useful for legal aid. However, as we argued before, categorisation as such is not a useful task, given that the verdict can simply be read in the judgement text. To be useful, categorisation performance must be supplemented with the most characteristic features (i.e. predictors). Unfortunately, only a minority of studies provides this information, and even when they do, the resulting features, especially when using the raw text (i.e. characteristic words or phrases), may not be particularly meaningful.

In an attempt to be maximally explainable, Collenette et al. (2020) suggest using an Abstract Dialectical Framework instead of machine learning. They apply this framework to deducing the verdict from the text of judgements of the ECtHR regarding Article 6 of the ECHR (the right to a fair trial). The system requires the user to answer a range of questions, and on the basis of the provided answers, the model determines whether or not there was a violation of the right to a fair trial. The questions for the system were derived by legal experts, and legal expertise is also required to answer them (Collenette et al. 2020). While their system seemed to perform flawlessly when tested on ten cases, we face the same issue as with the machine learning systems: the main input data is based on the final decision that has already been made by the judge. For instance, one of the questions that the model requires to be answered is whether the trial was independent and impartial, which is precisely a question that has to be decided by the judge. While this type of tool may potentially one day be used for judicial support, for example as a checklist for a judge when making a specific decision, it is unable to actually forecast decisions in advance, or to point to external factors that are not identified by legal experts.
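
Reduced to a toy sketch, the general idea looks roughly as follows; the questions and the deduction rule are hypothetical simplifications for illustration and do not reproduce the Abstract Dialectical Framework of Collenette et al. (2020).

```python
# Toy illustration only: expert-authored yes/no questions whose answers
# deterministically deduce an Article 6 outcome. Questions and rule are hypothetical.
QUESTIONS = (
    "Was the tribunal independent and impartial?",
    "Was the case heard within a reasonable time?",
    "Did the applicant have adequate access to the evidence?",
)

def deduce_violation(answers):
    """answers: mapping from question text to True/False, supplied by a legal expert."""
    # Hypothetical rule: any negative answer implies a violation of Article 6.
    return not all(answers[question] for question in QUESTIONS)
```

As the paragraph above notes, answering such questions presupposes determinations that only the final judgment provides, which is why a tool of this shape cannot forecast outcomes in advance.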

3.3 Outcome forecasting

Outcome forecasting is defined as determining the verdict of a court on the basis of textual information about a court case which was available before the verdict was made (public). This textual information can, for instance, be submissions by the parties, or information (including judgements) provided by lower courts when predicting the decisions of a higher court, such as the US Supreme Court. Forecasting thereby comes with the essential assumption that the input for the system was not influenced in any way by the final outcome that it forecasts. In contrast to outcome-based judgement categorisation, for this task it is useful to evaluate how well the algorithm is able to predict the outcome of cases. For example, individuals may use such algorithms to evaluate how likely it is that they will win a court case. As with judgement categorisation, determining the factors underlying a well-performing model is useful as well. While identification and categorisation tasks only allow one to extract information from and analyse court decisions that have already been made, forecasting allows one to predict future decisions that have not been made yet. Note that whether or not a model was trained on older cases than it was evaluated on (e.g., the ‘predicting the future’ experiment conducted by Medvedeva et al. 2020a) does not affect its classification as a judgement categorisation as opposed to a judgement forecasting task; only the type of data determines which task it is. Since Medvedeva et al. (2020a) use data extracted from the court judgements, their task is still an outcome-based judgement categorisation task.
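
The distinction can be made concrete with a small sketch: what matters is that the model's inputs existed before the verdict (e.g. parties' submissions or lower-court documents), not merely that training cases chronologically precede test cases. The field names below are hypothetical.

```python
# Hypothetical sketch of the data requirement for forecasting: inputs must
# pre-date the verdict; a chronological split alone does not make a
# categorisation task a forecasting task.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Case:
    pre_judgment_text: str   # e.g. applicant's submissions or lower-court documents
    decision_year: int       # year in which the court eventually decided
    outcome: int             # known only because the judgment has since been delivered

def chronological_split(cases: List[Case], cutoff_year: int) -> Tuple[List[Case], List[Case]]:
    """Train on cases decided before the cutoff, evaluate on later ones,
    using only pre-judgment texts as model input."""
    train = [c for c in cases if c.decision_year < cutoff_year]
    test = [c for c in cases if c.decision_year >= cutoff_year]
    return train, test
```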

3.3.1 Research in outcome forecasting

Table 3 lists the papers that focus on forecasting court verdicts. While many publications focus on ‘predicting court decisions’, only five papers satisfy our criteria for outcome forecasting. We can observe that the performance of these studies is lower than for the categorisation and identification tasks. This is not surprising, as forecasting can be expected to be a harder task. Given the small number of papers, we discuss each of them in some detail.

The advantage of working with the US Supreme Court is that it attracts much attention: data from its cases are systematically and manually annotated by legal experts with many variables shortly after each case is tried. To forecast decisions of SCOTUS, Sharma et al. (2015) and Katz et al. (2017) both use variables available to the public once a case has moved to the Supreme Court, but before the decision was made. Sharma et al. (2015) use neural methods, whereas Katz et al. (2017) use the more traditional technique of random forests. Both approaches forecast 70% of the outcomes correctly, a small improvement over the 68% baseline accuracy obtained by assuming that the petitioner always wins (suggested by Kaufman et al. 2019). Moreover, Sharma et al. (2015) present the importance of the various variables in their model, thereby potentially enabling a more thorough legal analysis of the data. The variables used in both studies contained information about the courts and the proceedings, but hardly any variables pertaining to the facts of the case.
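
A hedged sketch of this kind of comparison is shown below: a random forest over case-level metadata is evaluated against a majority-class baseline corresponding to ‘the petitioner always wins’. The feature matrix and labels are hypothetical and are not the cited studies' actual variables.

```python
# Hypothetical sketch: random forest on case metadata vs. a majority-class
# ('petitioner always wins') baseline.
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def compare_to_baseline(X_metadata, y_petitioner_wins):
    """X_metadata: numeric/encoded case-level variables; y: 1 if the petitioner won."""
    baseline = DummyClassifier(strategy="most_frequent")
    forest = RandomForestClassifier(n_estimators=500, random_state=0)
    return {
        "baseline_accuracy": cross_val_score(baseline, X_metadata, y_petitioner_wins, cv=5).mean(),
        "forest_accuracy": cross_val_score(forest, X_metadata, y_petitioner_wins, cv=5).mean(),
    }
```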

Waltl et al. (2017) attempted to forecast decisions of the German appeal court in tax law matters (the Federal Fiscal Court). The authors used the documents and metadata of the case (e.g., year of dispute, court, chamber, duration of the case, et cetera) from the court of first instance. They extracted keywords from the facts and the (lower) court's reasoning to forecast decisions. They tried a range of methods, but selected the best-performing naive Bayes classifier as their final model. Their relatively low F1-score of 0.57, however, indicates that this was a rather difficult task.

Medvedeva et al. (2020b) used raw text and the facts within documents that were published by the ECtHR (sometimes years) before the final judgement. These documents are known as ‘communicated cases’. Specifically, they used the facts as presented by the applicant and then communicated by the Court to the State as a potential violator of human rights. Communicated cases reflect the side of the potential victim, and are only communicated when no similar cases have been processed by the Court before. Consequently, these documents include a very diverse set of facts, and cover a wide range of issues (although all within the scope of the European Convention on Human Rights). Medvedeva et al. (2020b) reported an accuracy of 75% using SVMs on their dataset (the model is re-trained and run again every month). This system is integrated in an online platform that also highlights the sentences or facts within the text of these (communicated) cases that are most important for the model's decision (see footnote 5). Medvedeva et al. (2021) used a slightly different dataset of the same documents (i.e. only cases with the judgement in English were included, but the dataset was expanded by adding cases that resulted in inadmissibility based on merit) and retrained the model per year (as opposed to per month in Medvedeva et al. 2020b). The authors compared how the state-of-the-art algorithms for this court, BERT (Chalkidis et al. 2019), LEGAL-BERT (Chalkidis et al. 2020), and SVMs (Medvedeva et al. 2020a, b), perform on data available before the final judgement and on the final judgement itself. The results showed that forecasting is indeed a much harder task, as the models achieved a maximum F1-score of 66%, as opposed to 92% for categorisation of the same cases.
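
The rolling retraining protocol described above (per month or per year) can be sketched as follows; the data structures and helper callables are hypothetical and stand in for whatever corpus and model the reader has at hand.

```python
# Hypothetical sketch of rolling retraining: refit on all cases available
# before each period and forecast the cases decided in that period.
def rolling_evaluation(cases_by_period, make_model, score):
    """cases_by_period: iterable of (period, (X_train, y_train, X_test, y_test)) tuples;
    make_model: factory returning a fresh, unfitted model; score: scoring callable."""
    results = {}
    for period, (X_train, y_train, X_test, y_test) in cases_by_period:
        model = make_model()                 # a fresh model for every period
        model.fit(X_train, y_train)
        results[period] = score(y_test, model.predict(X_test))
    return results
```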

4 Discussion

It is clear that ‘predicting court decisions’ is not an unambiguous task. There is therefore a clear need to carefully identify the objective of the experiments before conducting them. We believe such an objective has to be rooted in the specific needs of the legal community, to prevent the development of systems that their authors believe to be useful but that have no meaningful application in the legal field at all. The purpose of our paper is to provide some terminology which may be helpful for this.

While researchers may believe they are ‘predicting court decisions’, this only infrequently involves actually being able to predict the outcome of future judgements. In fact, predicting court decisions sometimes (likely inadvertently, due to sub-optimal filtering or insufficient knowledge about the exact dataset) ended up being nothing other than identifying the outcome from the judgement text. While sophisticated approaches were often put forward in those cases, a simple keyword search might already have resulted in higher performance on this identification task. Most often, however, predicting court decisions turned out to be equivalent to categorising the judgements according to the verdicts. This is not so surprising given the available legal datasets, which more often contain complete judgements than documents produced before the verdict was known.

In sum, to identify the exact task, and the concomitant goals which are useful from a legal perspective, it is essential that researchers are well aware of the type of data they are analysing. Unfortunately, this is frequently not the case. For example, several researchers (Chalkidis et al. 2019; Quemy and Wrembel 2020; Condevaux 2020) have recently started to develop (multilabel classification) systems which are able to predict which articles were invoked in an ECtHR case. However, this task is not relevant from a legal perspective, as the articles which are potentially violated have to be specified when petitioning the ECtHR.

Therefore, when creating a new application, for instance using data from another court, one should first clearly determine the goal of such a system, and then check whether the data required for the established task are available. Specifically, one needs full judgements for the outcome identification task. For a judgement categorisation task, full judgements from which the outcomes can be removed are necessary. If the system needs to perform a forecasting task, it requires data available before the judgement is made.

For all of the above tasks, explainability (i.e. being able to determine the importance of various features in determining the model's outcome) helps to better analyse the performance and gain insight into the workings of the system. However, explainability is essential for judgement categorisation, as this task relies on the ability to investigate which features are related to the outcome.

As we mentioned before, the identification task does not always require the use of machine learning techniques. This task can often be solved with a keyword search, which does not require any annotated data. Machine learning becomes necessary when the judgement text is not very structured, and when more complex descriptions of the outcome need to be extracted. For both the judgement categorisation task and the forecasting task, statistics may be useful to assess the relation between predetermined factors and the outcome, whereas for the categorisation task machine learning techniques allow for discovering new patterns and factors within the judgements that may not have been considered previously. Similarly, machine learning can be used to forecast future court decisions by training the system on decisions that the court has made in the past. To illustrate these three tasks, their goals and requirements, a flow-chart is shown in Fig. 1.
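
To illustrate how little machinery the identification task can need when the verdict is phrased consistently, here is a minimal keyword-search sketch; the regular expressions are hypothetical and would have to be adapted to the phrasing conventions of the court in question.

```python
# Minimal, hypothetical sketch of keyword-based outcome identification.
import re

# Check the negated phrasing first, since it contains the positive phrasing.
NO_VIOLATION = re.compile(r"\bno violation of article\b", re.IGNORECASE)
VIOLATION = re.compile(r"\bviolation of article\b", re.IGNORECASE)

def identify_outcome(judgment_text: str) -> str:
    if NO_VIOLATION.search(judgment_text):
        return "no violation"
    if VIOLATION.search(judgment_text):
        return "violation"
    return "unknown"
```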

Figure 1: Flowchart illustrating the goals and requirements for the three court decision prediction tasks.

Finally, we would like to emphasise that while the approaches discussed in this paper can suitably be used in legal analysis, for example to try to understand past court decisions, none of the systems capable of solving any of the discussed tasks are appropriate for making court decisions. Judicial decision-making requires (among other things) knowledge about the law, knowledge about our ever-changing world, and the weighing of arguments. This is very different from the (sometimes very sophisticated) pattern-matching capabilities of the systems discussed in this paper.

5 Conclusion

In this paper, we have proposed several definitions for analysing court decisions using computational techniques. Specifically, we discussed the differences between forecasting decisions, categorising judgements according to the verdict, and identifying the outcome based on the text of the judgement. We also highlighted the specific potential goals associated with each of these tasks and illustrated that each task is strongly dependent on the type of data used.

The availability of enormous amounts of legal (textual) data, in combination with the legal discipline being relatively methodologically conservative (Vols 2021), has enabled researchers from various other fields to attempt to analyse these data. However, to conduct meaningful tasks, we argue for more interdisciplinary collaborations, involving not only technically skilled researchers but also legal scholars, to ensure that meaningful legal questions are answered and that this new and interesting field is propelled forward.

Notes

1. https://digital-strategy.ec.europa.eu/en/policies/legislation-open-data, accessed on 11/10/2021.

2. See, for instance, the case law of the Constitutional Court of South Africa available at: https://collections.concourt.org.za.

3. For a description of earlier approaches to the automatic prediction of court decisions, with and without machine learning, we refer to Ashley and Brüninghaus (2009).

4. In principle, there are three additional tasks, namely charge identification, charge-based judgement categorisation and charge forecasting. These tasks involve determining the specific sentence or charge, for example the number of years of imprisonment imposed in criminal court proceedings. These tasks have most often been investigated for various courts in China (Luo et al. 2017; Ye et al. 2018; Jiang et al. 2018; Liu and Chen 2018; Zhong et al. 2018a, b; Li et al. 2019; Chen et al. 2019; Long et al. 2019; Chao et al. 2019; Fan et al. 2020; Cheng et al. 2020; Tan et al. 2020; Huang et al. 2020). The distinction we make between identification, categorisation and forecasting in this paper (and the pitfalls and suggestions regarding this distinction), however, holds for these cases as well.

5. https://jurisays.com.

References

Aletras N, Tsarapatsanis D, Preoţiuc-Pietro D, Lampos V (2016) Predicting judicial decisions of the European Court of Human Rights: a natural language processing perspective. PeerJ Comput Sci 2:e93


Ashley KD, Brüninghaus S (2009) Automatically classifying case texts and predicting outcomes. Artif Intell Law 17(2):125–165

Bertalan VGF, Ruiz EES (2020) Predicting judicial outcomes in the Brazilian legal system using textual features. In: DHandNLP@ PROPOR, pp 22–32

Bex F, Prakken H (2021) On the relevance of algorithmic decision predictors for judicial decision making. In: Proceedings of the 19th international conference on artificial intelligence and law (ICAIL 2021). ACM Press

Bhilare P, Parab N, Soni N, Thakur B (2019) Predicting outcome of judicial cases and analysis using machine learning. Int Res J Eng Technol (IRJET) 6:326–330


Chalkidis I, Androutsopoulos I, Aletras N (2019) Neural legal judgment prediction in English. In: Proceedings of the 57th annual meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Florence, Italy, pp 4317–4323. https://doi.org/10.18653/v1/P19-1424 . https://www.aclweb.org/anthology/P19-1424

Chalkidis I, Fergadiotis M, Malakasiotis P, Aletras N, Androutsopoulos I (2020) LEGAL-BERT: “preparing the muppets for court”. In: Proceedings of the 2020 conference on empirical methods in natural language processing: findings, pp 2898–2904

Chao W, Jiang X, Luo Z, Hu Y, Ma W (2019) Interpretable charge prediction for criminal cases with dynamic rationale attention. J Artif Intell Res 66:743–764

Chen H, Cai D, Dai W, Dai Z, Ding Y (2019) Charge-based prison term prediction with deep gating network. arXiv preprint arXiv:1908.11521

Cheng X, Bi S, Qi G, Wang Y (2020) Knowledge-aware method for confusing charge prediction. In: CCF international conference on natural language processing and Chinese computing. Springer, pp 667–679

Collenette J, Atkinson K, Bench-Capon TJ (2020) An explainable approach to deducing outcomes in European Court of Human Rights cases using ADFs. In: COMMA, pp 21–32

Condevaux C (2020) Neural legal outcome prediction with partial least squares compression. Stats 3(3):396–411

Dyevre A (2020) Text-mining for lawyers: how machine learning techniques can advance our understanding of legal discourse. Available at SSRN 3734430

Fan Y, Zhang L, Wang P (2020) Leveraging label semantics and correlations for judgment prediction. In: China conference on information retrieval. Springer, pp 70–82

Huang YX, Dai WZ, Yang J, Cai LW, Cheng S, Huang R, Li YF, Zhou ZH (2020) Semi-supervised abductive learning and its application to theft judicial sentencing. In: 2020 IEEE international conference on data mining (ICDM). IEEE, pp 1070–1075

Jiang X, Ye H, Luo Z, Chao W, Ma W (2018) Interpretable rationale augmented charge prediction system. In: Proceedings of the 27th international conference on computational linguistics: system demonstrations, pp 146–151

Katz DM, Bommarito MJ II, Blackman J (2017) A general approach for predicting the behavior of the Supreme Court of the United States. PloS One 12(4):e0174698

Kaufman AR, Kraft P, Sen M (2019) Improving Supreme Court forecasting using boosted decision trees. Polit Anal 27(3):381–387

Kaur A, Bozic B (2019) Convolutional neural network-based automatic prediction of judgments of the European Court of Human Rights. In: AICS, pp 458–469

Kowsrihawat K, Vateekul P, Boonkwan P (2018) Predicting judicial decisions of criminal cases from Thai Supreme Court using bi-directional GRU with attention mechanism. In: 2018 5th Asian conference on defense technology (ACDT). IEEE, pp 50–55

Lage-Freitas A, Allende-Cid H, Santana O, de Oliveira-Lage L (2019) Predicting brazilian court decisions. arXiv preprint arXiv:1905.10348

Li Y, He T, Yan G, Zhang S, Wang H (2019) Using case facts to predict penalty with deep learning. In: International conference of pioneering computer scientists. Springer, Engineers and Educators, pp 610–617

Liu YH, Chen YL (2018) A two-phase sentiment analysis approach for judgement prediction. J Inf Sci 44(5):594–607

Liu Z, Chen H (2017) A predictive performance comparison of machine learning models for judicial cases. In: 2017 IEEE symposium series on computational intelligence (SSCI). IEEE, pp 1–6

Long S, Tu C, Liu Z, Sun M (2019) Automatic judgment prediction via legal reading comprehension. In: China national conference on Chinese computational linguistics. Springer, pp 558–572

Luo B, Feng Y, Xu J, Zhang X, Zhao D (2017) Learning to predict charges for criminal cases with legal basis. In: Proceedings of the 2017 conference on empirical methods in natural language processing. Association for Computational Linguistics, Copenhagen, Denmark, pp 2727–2736. https://doi.org/10.18653/v1/D17-1289 . https://www.aclweb.org/anthology/D17-1289

Malik V, Sanjay R, Nigam SK, Ghosh K, Guha SK, Bhattacharya A, Modi A (2021) ILDC for CJPE: Indian legal documents corpus for court judgment prediction and explanation. arXiv preprint arXiv:2105.13562

Marković M, Gostojić S (2018) Open judicial data: a comparative analysis. Soc Sci Comput Rev 38:295–314

Medvedeva M, Vols M, Wieling M (2018) Judicial decisions of the European Court of Human Rights: looking into the crystal ball. In: Proceedings of the conference on empirical legal studies

Medvedeva M, Vols M, Wieling M (2020a) Using machine learning to predict decisions of the European Court of Human Rights. Artif Intell Law 28:237–266

Medvedeva M, Xu X, Wieling M, Vols M (2020b) Juri says: prediction system for the European Court of Human Rights. In: Legal knowledge and information systems: JURIX 2020: the thirty-third annual conference, Brno, Czech Republic, December 9-11, 2020. IOS Press, vol 334, p 277

Medvedeva M, Üstun A, Xu X, Vols M, Wieling M (2021) Automatic judgement forecasting for pending applications of the European Court of Human Rights. In: Proceedings of the fifth workshop on automated semantic analysis of information in legal text (ASAIL 2021)

O’Sullivan C, Beel J (2019) Predicting the outcome of judicial decisions made by the European Court of Human Rights. In: AICS 2019—27th AIAI Irish conference on artificial intelligence and cognitive science

Petrova A, Armour J, Lukasiewicz T (2020) Extracting outcomes from appellate decisions in US State Courts. In: Legal knowledge and information systems: JURIX 2020: the thirty-third annual conference, Brno, Czech Republic, December 9-11, 2020. IOS Press, vol 334, p 133

Quemy A, Wrembel R (2020) On integrating and classifying legal text documents. In: International conference on database and expert systems applications. Springer, pp 385–399

Salaün O, Langlais P, Lou A, Westermann H, Benyekhlef K (2020) Analysis and multilabel classification of Quebec court decisions in the domain of housing law. In: International conference on applications of natural language to information systems. Springer, pp 135–143

Sert MF, Yıldırım E, Haşlak İ (2021) Using artificial intelligence to predict decisions of the Turkish Constitutional Court. Soc Sci Comput Rev

Shaikh RA, Sahu TP, Anand V (2020) Predicting outcomes of legal cases based on legal factors using classifiers. Procedia Comput Sci 167:2393–2402

Sharma RD, Mittal S, Tripathi S, Acharya S (2015) Using modern neural networks to predict the decisions of Supreme Court of the United States with state-of-the-art accuracy. In: International conference on neural information processing. Springer, pp 475–483

Spaeth H, Epstein L, Ruger T, Whittington K, Segal J, Martin AD (2014) Supreme Court database code book

Strickson B, De La Iglesia B (2020) Legal judgement prediction for UK courts. In: Proceedings of the 2020 the 3rd international conference on information science and system, pp 204–209

Şulea OM, Zampieri M, Malmasi S, Vela M, Dinu LP, van Genabith J (2017a) Exploring the use of text classification in the legal domain. In: Proceedings of the 2nd workshop on automated semantic analysis of information in legal texts (ASAIL 2017)

Şulea OM, Zampieri M, Vela M, van Genabith J (2017b) Predicting the law area and decisions of French Supreme Court cases. In: Proceedings of the international conference recent advances in natural language processing, RANLP 2017. INCOMA Ltd., Varna, Bulgaria, pp 716–722

Tagny-Ngompé G, Mussard S, Zambrano G, Harispe S, Montmain J (2020) Identification of judicial outcomes in judgments: a generalized Gini-PLS approach. Stats 3(4):427–443

Tan H, Zhang B, Zhang H, Li R (2020) The sentencing-element-aware model for explainable term-of-penalty prediction. In: CCF international conference on natural language processing and Chinese computing. Springer, pp 16–27

Vacek T, Schilder F (2017) A sequence approach to case outcome detection. In: Proceedings of the 16th edition of the international conference on artificial intelligence and law, pp 209–215

Virtucio MBL, Aborot JA, Abonita JKC, Avinante RS, Copino RJB, Neverida MP, Osiana VO, Peramo EC, Syjuco JG, Tan GBA (2018) Predicting decisions of the Philippine Supreme Court using natural language processing and machine learning. In: 2018 IEEE 42nd annual computer software and applications conference (COMPSAC). IEEE, vol 2, pp 130–135

Visentin A, Nardotto A, O’Sullivan B (2019) Predicting judicial decisions: a statistically rigorous approach and a new ensemble classifier. In: 2019 IEEE 31st international conference on tools with artificial intelligence (ICTAI). IEEE, pp 1820–1824

Vols M (2019) European law and evictions: property, proportionality and vulnerable people. Eur Rev Priv Law 27(4):719–752

Vols M (2021) Legal research. Eleven Publishing, The Hague

Waltl B, Bonczek G, Scepankova E, Landthaler J, Matthes F (2017) Predicting the outcome of appeal decisions in Germany’s tax law. In: International conference on electronic participation. Springer, pp 89–99

Ye H, Jiang X, Luo Z, Chao W (2018) Interpretable charge predictions for criminal cases: learning to generate court views from fact descriptions. In: Proceedings of the 2018 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long papers). Association for Computational Linguistics, New Orleans, Louisiana, pp 1854–1864. https://doi.org/10.18653/v1/N18-1168 . https://www.aclweb.org/anthology/N18-1168

Zhong H, Guo Z, Tu C, Xiao C, Liu Z, Sun M (2018a) Legal judgment prediction via topological learning. In: Proceedings of the 2018 conference on empirical methods in natural language processing, pp 3540–3549

Zhong H, Xiao C, Guo Z, Tu C, Liu Z, Sun M, Feng Y, Han X, Hu Z, Wang H et al (2018b) Overview of cail2018: legal judgment prediction competition. arXiv preprint arXiv:1810.05851


Author information

Authors and Affiliations

Center for Language and Cognition Groningen, University of Groningen, Groningen, The Netherlands

Masha Medvedeva & Martijn Wieling

Department of Legal Methods, University of Groningen, Groningen, The Netherlands

Masha Medvedeva & Michel Vols


Corresponding author

Correspondence to Masha Medvedeva .


About this article

Medvedeva, M., Wieling, M. & Vols, M. Rethinking the field of automatic prediction of court decisions. Artif Intell Law 31 , 195–212 (2023). https://doi.org/10.1007/s10506-021-09306-3


Accepted: 07 November 2021

Published: 25 January 2022

Issue Date: March 2023


Keywords: Judicial decisions · Machine learning · Natural language processing


Research Paper: Judgment, Judging, and Judgmental


Research Paper By Lindsey Auman (Life Coach, UNITED STATES)

What’s the difference? What’s good? What’s harmful?

judg·ment (noun): the ability to make considered decisions or come to sensible conclusions

judg·ing (verb): the action of forming an opinion or conclusion about something

judg·men·tal (adjective): having or displaying an excessively critical point of view

Look up! Now! Look in front of you!! That car has slammed on its brakes and is starting to swerve on ice across several lanes of traffic, and you are in danger! Quick! Make a decision. What do you judge is the right thing to do? Will you speed up to race in front of their sliding car? Will you swerve to the opposite side of the road? Will you slam on your brakes? What about the fact that your car needs its brakes checked? Is now the time to trust them? Or, what about the fact that your loved one is in the passenger seat – will you swerve your car so that your side is exposed to the dangerously out-of-control vehicle coming towards you, or will you swerve and put your loved one at risk? What will you do? What is your judgment call?

There isn’t a right answer here. Before the judgement is made, you don’t know the results of your decision. And in this particularly stressed and fast moment, your mind is likely to only be able to process a few of the available options before the judgement call must be made – else it is too late and a lack of judgment was actually your decision. Such is the challenge and beautiful truth of judgment – you don’t know if you’re right. You can’t disprove the other options, because you can only take one option. Applied to another person/day/scenario, the variables are different, even if subtly so, and thus the outcome of trying a different course of action – weighing a different judgment – cannot be deemed to be better or worse. It simply is what it is. It stands alone – it is your judgment for yourself and this set of circumstances in this moment of time. It’s one and done, and then it’s gone.

That’s not to say that the lessons that you’ve learned from exercising actions based upon your jugement in your past will not be excellent tools for the later judgments you must make. You had a failed marriage and in looking back, you can now see red flags that you were completely blind to before you were engaged – those are new pieces of information you will use in future judgments. They aren’t lost lessons. Each judgment leads to a lesson which will impact your ability to make later judgments.

Most parents will say that they hope their lessons learned will help their children make better decisions for themselves. These parents will tirelessly share their lessons with their children – hoping to impact their child's judgment. The child may choose to apply some of that parental wisdom, or perhaps to throw it all out the angsty window of self-discovery. That is them exercising their judgment – weighing the information available to them and making their decision, for themselves.



The Role of Expert Judgment in Statistical Inference and Evidence-Based Decision-Making

Naomi C. Brownstein

a Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, FL;

b Department of Oncologic Sciences, University of South Florida, Tampa, FL;

c Department of Behavioral Sciences and Social Medicine, Florida State University, Tallahassee, FL;

Thomas A. Louis

d Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD;

Anthony O’Hagan

e School of Mathematics and Statistics, The University of Sheffield, Sheffield, UK;

Jane Pendergast

f Department of Biostatistics and Bioinformatics, Duke University, Durham, NC

Abstract

This article resulted from our participation in the session on the “role of expert opinion and judgment in statistical inference” at the October 2017 ASA Symposium on Statistical Inference. We present a strong, unified statement on roles of expert judgment in statistics with processes for obtaining input, whether from a Bayesian or frequentist perspective. Topics include the role of subjectivity in the cycle of scientific inference and decisions, followed by a clinical trial and a greenhouse gas emissions case study that illustrate the role of judgments and the importance of basing them on objective information and a comprehensive uncertainty assessment. We close with a call for increased proactivity and involvement of statisticians in study conceptualization, design, conduct, analysis, and communication.

1. Introduction

As participants in the October 2017 Symposium on Statistical Inference (SSI), organized and sponsored by the American Statistical Association (ASA), we were challenged to host a session and write a paper inspired by the question, “Do expert opinion and judgment have a role in statistical inference and evidence-based decision-making?” While we work from different perspectives and in different statistical paradigms (both frequentist and Bayesian), there was a resounding “yes!” among us, with ample common ground in our thinking related to this infrequently discussed and often under-appreciated component of statistical and scientific practice.

Expert judgment is a feature of both frequentist and Bayesian inference. Judgments are subjective, and subjectivity is a topic that has generated heated debate in the statistical community (Gelman and Hennig 2017). The subjectivity in a Bayesian prior distribution has been criticized by frequentists. Implicit claims are that frequentist methods are objective and that they “let the data speak for themselves.” However, frequentist methods also require some components of subjectivity in the choice of a model and of estimators, test statistics, etc. These choices are subjective in the sense that every expert builds knowledge and judgment into their own personal framework of understanding, which is not likely to be identical to that of anyone else. Expert judgment is clearly needed for valid statistical and scientific analyses. Yet the questions of how, when, how often, and from whom judgment is helpful, rather than leading to biased, misleading, and nonreproducible results, are less clear.

The ASA commissioned this special issue of The American Statistician to stimulate “a major rethinking of statistical inference, aiming to initiate a process that ultimately moves statistical-science—and science itself—into a new age.” We consider the roles of expert judgment in scientific practice. This article is a distillation of our common ground, along with our advice, warnings, and suggestions for incorporating expert judgment into science while maintaining the integrity and scientific rigor of the research. Note that although we were asked in the Symposium to consider expert opinion and judgment, we avoid the word “opinion” because it risks being equated to uninformed rhetoric. We emphasize “judgment” that should be informed, carefully considered, and transparent. While we might still disagree on specific details or the relative importance of various points, we present a strong, unified statement related to this key component of statistical practice. Additional literature on expert judgment in statistics may be found in a companion paper (Brownstein 2018).

Our article is organized into the following sections, in which we share our thoughts on when and how expert judgment has a legitimate and necessary role in scientific inquiry. Section 2 presents the role of judgment in the scientific method and, more generally, in four stages of scientific inquiry, inference, and decision-making. In Section 3, the four stages are examined in more detail, focusing on their needs for expert judgment and the qualifications implied by “expert.” Two case studies are presented in Section 4, with emphasis on the principled and scientific application of expert judgments. Finally, Section 5 summarizes our key conclusions.

2. The Cycles of Inference and Decision in Science

The practice of science is often described in terms of the scientific method, defined in Oxford University Press ( 2018d ) as involving, “systematic observation, measurement, and experiment, and the formulation, testing, and modification of hypotheses.” More generally, one might describe the scientific method as a collection of methodologies and processes of gathering observable, measurable evidence using experimentation and careful observation, and of analyzing the evidence to draw conclusions, inform decisions and raise new questions. The purpose may be to inform about pathways or mechanisms by which results are obtained or to aid in prediction or estimation of quantities of interest. A thorough review of the history of the scientific method is found in Anderson and Hepburn ( 2016 ). While we believe that not all scientific inquiry, and certainly not all decision-making, falls within that prototypical guide, it is important to understand how the scientific method fits into our framework. Namely, science is data-driven in the ascertainment of objective information to draw scientific conclusions, yet with expert judgment feeding into the processes.

We base our four-stage graphic ( Figure 1 ) on the depiction of Garland ( 2016 ), adding the Inform stage to accommodate decision-making and broadening the definitions of the other stages. It illustrates four stages in the perpetual process of scientific inquiry and evidence-based decision-making: Question, Study, Interpret, and Inform. These stages are formulated in highly general terms, because the practice of scientific inquiry, and hence the remit of statistical analysis, is also very wide.

Figure 1. The cycles of inference and decision.

We first describe activities that comprise each of these stages. Our framework allows us to underscore the varied roles of expert judgment throughout the perpetual cycles of the scientific method.

  • Question : Scientific inquiry can be characterized as beginning with one or more questions. A Question might be a formal scientific hypothesis arising either out of observation of real-world phenomena or from the current status of scientific knowledge and debate. On the other hand, a Question could be posed outside the scientific community, such as a request for evidence-based input to inform an impending policy decision. It might also simply express a wish to estimate more accurately certain quantities or parameters in current scientific theories.
  • Study : To address the Question scientifically, it is necessary to gather evidence in a rigorous manner. In the Study stage, we include all aspects of study design, including design of observational studies, experiments, questionnaires, systematic literature reviews, and meta-analyses. The Study may also involve sequential design or the design of a number of distinct, possibly concurrent studies. We also include in this stage the conduct of the study, resulting in some form of data.
  • Interpret : In the Interpret stage, data resulting from the Study are employed to address the Question. Typically, this may involve descriptive statistics and statistical inference, such as parameter estimation and hypothesis testing. In a Bayesian analysis, the primary statistical result might simply take the form of a posterior distribution. However, the “Interpret” stage should also embed the findings of the analysis into the wider body of science to which it refers, thereby updating that body of knowledge. In doing so, the wider implications of the findings will emerge.
  • Inform : The Interpret stage will often suggest new Questions, and a new cycle of scientific investigation will thereby be initiated. First, however, the Interpret stage usually will be followed by the Inform stage. For a formal scientific study, findings should be formally written and communicated in peer-reviewed outlets, such as conferences, journals, and books. In fact, peer-review may lead to revisions in the Interpret stage before formal publication of the findings. Subsequent examination and evaluation of published studies by the scientific community may in turn lead to new interpretations of existing studies and a new Question, leading to a new Study. Where the Question is a request for input to a decision, the Inform stage is when the results of the Study are communicated to facilitate the decision-making process. New Questions may arise based on the output produced in the Inform stage. Alternatively, the original Question may need to be revisited in one or more future Studies, especially when the evidence is not yet adequate to merit a robust conclusion.

3. Science and Subjectivity

In all stages of the scientific method defined in Section 2, expert judgment and knowledge are required. We first present the relevant definitions. Knowledge is defined (Oxford University Press 2018b) as “facts, information, and skills acquired by a person through experience or education; the theoretical or practical understanding of a subject.” Judgment is defined (Oxford University Press 2018a) as “the ability to make considered decisions or come to sensible conclusions.” By contrast, opinion (Oxford University Press 2018c) is “a view or judgment formed about something, not necessarily based on fact or knowledge.”

We consider that judgment implies using information and knowledge to form an assessment or evaluation. While it is possible that opinions can be similarly well-informed, by definition, opinion does not necessarily imply that external information was incorporated into the evaluation. It may simply be a view, judgment, or appraisal that fits within a belief system or is comfortable for some other reason. While “judgment” is normally thought of as being based on observable truth, “opinion” can be simply based on preference. The subjectivity in this article refers to judgment of experts based on knowledge, skill, and experience. We examine the roles of expert judgment in each of the four stages presented in Figure 1 , paying particular attention to integration of statistical and content expertise. We use content expertise to refer to the discipline expertise in which the Question arises. For more detail on expert scientific judgment, please see Brownstein ( 2018 ).

3.1. The Question Stage

When developing the Question, we rely heavily on the expert judgment of the content experts. The Question may arise from identification of a barrier or problem in need of an answer, a quest to understand the “why” behind some phenomenon, event, or process, or simply to better quantify some parameter, effect or disturbance. For example, one might ask “Why are some people able to fend off the negative impacts of an HIV infection while others cannot?” Knowledge of the literature and what other experiments or studies others have done to address this Question or related Questions is critical. Where the Question arises from a request for scientific input to inform a decision, the content experts serve key roles in formulating specific questions, such as “What can we say about the toxicity of this pollutant for fish in European rivers?”, or “Which areas in this catchment will be flooded if the catchment experiences a once-in-100-years weekly rainfall?”

While the content experts have primary responsibility to develop the Question, statisticians can elicit clarity on the framing of the Question by asking pertinent questions from their perspective. Inquiry from the statistician serves not only to establish and confirm the statistician’s understanding of the Question and its scientific context, but also to translate the question to a statistical framework, which may guide analytic decisions in the Study Stage. Indeed, strong listening and communication skills are critical for both the content and statistical experts! Barry Nussbaum, 2017 President of the ASA, shared his mantra, “It’s not what you said. It’s not what they heard. It’s what they say they heard” (Nussbaum 2017 ). Communication can be improved by echoing back your understanding of how you interpreted what you heard, and asking others to echo their understanding of the points you have made. Producing a written summary of the Question, evidence-based justification for the importance of the Question, and evidence needed to answer the Question allows the research team members a chance to check on how well they are communicating and where points of disagreement may still exist.

If the Question is seeking information on potential pathways or mechanisms, evidence for or against competing theories is presented, and decisions must be made on the rationale for how the Question will be pursued. The content experts play the main role here, but statistical expertise can be helpful when framing the Question to bring up potential statistical issues with the proposed approach.

Part of defining the Question is determining what evidence, measures and parameters would be useful and adequate to arrive at an answer. Properties of those measurements, including validity, reliability, cost, and distributional properties are considered. This is an area in which both content and statistical experts can contribute. For example, content experts may be focused on what data would be needed and whether primary data collection would be needed or existing secondary data would suffice. If the study needs to collect primary data, there will be a need for discussion of exactly what information will be desired and how to elicit that information. In turn, the statistician will seek to better understand properties of the desired measurements. Moreover, the statistician may highlight unmeasured influences, the impact of missing data, and whether sources of bias, confounding, or variability could be reduced or eliminated by appropriate study design or data collection methods.

In addition, a Study may involve more than one Question. In this case, discussions are needed regarding which Questions are considered primary or secondary, the interrelatedness or independence of the Questions, and whether any of the Questions can be addressed jointly. Statistical and content experts should collaborate in making these decisions.

Thinking through the issues that can (and will!) arise when defining the Question and the information needed to address it requires effective collaboration and team effort. Effective collaboration requires mutual respect for team members and personal authenticity; an understanding of the strengths and skill sets of each member; an understanding of and buy-in for a common goal; a willingness not only to hear and understand what another has said, but to embrace ideas different from your own; a willingness to ask for the evidence and assumptions behind a statement of fact or belief; reliability and consistency in thought and behavior; a willingness to offer alternative ideas or approaches; and a willingness to compromise, when appropriate. All of this relies on fluency, both written and oral, in the language used, including technical and discipline-specific terms. Cognizance of acronyms and language shortcuts is the first step to reducing them to enhance communication. We recommend that researchers create a written summary of decisions made in such discussions, and make sure such documents are accessible electronically to all study members for review and editing.

3.2. The Study Stage

Once the Question and pertinent measures are defined, the approach to gathering information (data) must be developed. Can the Question be answered in one study, or will it take a series of planned studies? What studies have been done before? Could any component of prior studies be replicated with modification or improvement in this study? Did previous studies report unanticipated problems that could be avoided in this study?

Much of the work in planning a study involves a collaborative effort among all members of the research team. Those researchers who will be “on the ground” collecting information will have expertise on what is feasible and what is not. The statistician can offer their expertise and judgment on many aspects of study design and implementation, such as the strengths and weaknesses of different study designs, questionnaire development, psychometric properties of data collection instruments, issues surrounding sources and control of error, replication, operational randomization, and blinding. The content experts will share their knowledge and judgments on the target scope of inference, such as whether to estimate the 100-year flood plain for a large geographic area or just one river, or whether to study all people with a disease or only those in a local area who meet defined inclusion criteria. Content experts bring to the discussion additional information that can be considered in the study design, perhaps stemming from theorized or understood pathways, mechanisms, or concurrent influences by which observed outcomes can differ.

If working within the Bayesian framework, statisticians help elicit information from the content experts to feed into the prior distribution. Those approaching the problem from a frequentist perspective will also look to prior studies and expert judgment when developing a study design. No matter what analysis approach is used, the ultimate goal remains the same: to collect enough high-quality information to effectively address the Question. Here, the word “quality” can encompass many aspects, including reduced variability of measures and removal or control of sources of bias, along with proper data collection and maintenance systems that protect the integrity of the data.

During the Study stage, prior to data collection, there should be enough information to create a study protocol, a data monitoring plan, and a statistical analysis plan (SAP) upon which everyone agrees (Finfer and Bellomo 2009; Ellenberg, Fleming, and DeMets 2003; Ott 1991). Such SAPs are becoming more common and are recognized by funders as an important part of the research process. While not every study will require these formal documents, the idea is that there is a common understanding of how the study will be conducted; how the data will be collected, monitored, and maintained; and the analytic approach used to address the Question. With the goal of transparency of the body of work to be accomplished, good documentation and data provenance are important components of scientific inquiry.

Unfortunately, forms of scientific malpractice are commonplace in the scientific literature (Kerr 1998; Head et al. 2015). These include HARKing (hypothesizing after the results are known), in which post hoc hypotheses are presented as if they had been specified in advance, and p-hacking, in which analyses are repeated with slightly altered questions, or many associations are tested without prior hypotheses, in an attempt to obtain statistically significant results. A well-thought-out study design and SAP can help safeguard against urges to reanalyze the data later after obtaining disappointing results. However, it should be noted that the data monitoring process itself involves judgment, as exemplified in Section 4.1.3 and discussed further elsewhere (Pocock 2006).
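To make the multiplicity problem concrete, the following minimal Python sketch (our illustration; the sample sizes, number of predictors, and number of simulated studies are arbitrary and are not taken from the cited studies) estimates how often a purely null study can report at least one nominally significant association when many hypotheses are tested without prespecification.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# A "fishing expedition": one outcome is tested against 20 unrelated
# predictors even though no true associations exist.
n_subjects, n_predictors, n_studies = 100, 20, 1000

studies_with_a_hit = 0
for _ in range(n_studies):
    y = rng.normal(size=n_subjects)
    X = rng.normal(size=(n_subjects, n_predictors))
    pvals = [stats.pearsonr(X[:, j], y)[1] for j in range(n_predictors)]
    studies_with_a_hit += min(pvals) < 0.05

# With 20 independent null tests, about 1 - 0.95**20 (roughly 64%) of such
# studies can claim at least one "significant" finding.
print(studies_with_a_hit / n_studies)
```

A prespecified SAP, or preregistration as described below, removes the analytic freedom that makes this inflation possible.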

A related practice of publishing a Registered Report, in which methods and proposed analyses are preregistered and peer-reviewed prior to conducting the research, is gaining momentum. The intent is to eliminate questionable research practices, such as selective reporting of findings and publication bias, by provisional acceptance for publication before data collection begins. Currently, the Center for Open Science lists 121 journals that use the Registered Reports publishing format as an option in regular submissions or as part of special issues (Center for Open Science 2018 ).

Often, before a study can begin, funding must be obtained. Unless a study has been commissioned, researchers typically need to write a formal proposal to seek funding. A grant proposal provides an opportunity to present not only the justification for the study but also detail on the study process, the analysis plan, and how the desired information will address the Question. The grant review process brings in a new, external set of experts with judgments of their own on the merits of the proposal. Feedback on a proposal can identify strengths and weaknesses of the proposed work. Reviewers’ expert judgments on whether a particular study should be funded are intended to weed out studies without strong support and justification for both the importance of the Question and the development of the Study.

As the study progresses, important discussions and decisions are likely to be made at research team meetings. It is important that statisticians, like other key members of the team, are present at the table to listen, question, offer insight and expertise, and retain knowledge of decisions made. Written summaries of research meetings, and documentation of any decisions that were not anticipated in the Question stage or earlier in the Study stage or that are necessary to successfully implement the study protocol, all promote transparency and reproducibility. Additional documentation for the Study stage that should be available to all team members may include computer code management systems, clearly documented analytic code, well-considered file structures and naming conventions, and a common file space in which all members can view such documentation. Subsequently, the team will synthesize information from the Study stage for use in the Interpret and Inform stages.

3.3. The Interpret Stage

The methodology used to analyze the data and the specifications of the assumed model feed into the interpretation of the data. Some analytic methods produce estimated probabilities that relate directly to the Question; others may provide information that helps address the Question, but perhaps more indirectly. The chosen methodology may produce parameter estimates that need to be understood in context of the model, and the alignment of the model to the observed data and prior beliefs needs to be considered.

The expertise of the statistician is needed both to understand the nuances of proper interpretation of the analytic results in the context of the executed study, the assumptions made, and the modeling used, and to guard against overinterpretation. For example, a statistician may help protect the team from common misconceptions and malpractice, such as tendencies to extend inferences to populations outside those under study, to interpret association as causation, or to fail to consider the impact of unmeasured confounders, mediators, or moderators in the interpretation of results. The statistical expert can protect against improper use or interpretation of a p-value, discuss the difference between clinical and statistical significance, and highlight the potential impact of missingness mechanisms and of violations of statistical and causal assumptions on the results. If working in a Bayesian framework, the statistician can also discuss the impact of potential expertise bias and of over- or underprecision in the specification of the prior.

Similarly, it is not appropriate for statisticians to focus solely on analytic methods and numeric results, or for content experts to delay involving statisticians until the Interpret stage. By not being involved in other study aspects and discussions, the statistician is poorly positioned to make modeling choices, recognize possible biases, and interpret findings in the context of the study as conducted, which could differ from the study as planned.

As the analytic results are interpreted in the framework of the Question and the Study protocol, both the content and statistical experts help the study team blend the (properly interpreted) new findings into their existing knowledge and understanding. This process will likely include team discussions of the clinical meaning and impact these findings might have in the population under study, and how the results could be explained or interpreted within the current framework of understanding. The discussion will include the strength of the evidence (based on posterior distributions, point estimates of key parameters, confidence or credibility intervals, and so on) and its consistency with prior studies, as well as potential weaknesses or caveats of the study posed by both the statistical and content experts.

3.4. The Inform Stage

Once the analytic results have been interpreted within the framework of the study design and measures used to address the Question, it is time to assess what was learned and share that information more broadly. Of course, those who developed the Question and often those who funded the study will need and expect a complete summary of the work done, describing how the results have informed the Question. Indeed, there may be interest in the work outside of academia, such as patients curious about new therapies for their conditions, policy-makers seeking to understand actions that may yield societal benefit, and others simply wondering about current topics and trends in various fields of science.

Scientific reports or publications, where the process, methods, results, and conclusions of a study can be shared broadly, are important tangible outcomes of the Inform stage. In each Inform stage, the team members build on what they knew previously and share what they learned from the findings, whether or not the results obtained were anticipated or exciting. To guard against publication bias, null results, in particular, must be communicated, despite disincentives for doing so (Franco, Malhotra, and Simonovits 2014 ; Easterbrook et al. 1991 ).

Strong communication and documentation skills in the Inform stage are paramount to ensure that all components of a study are documented as the study progresses and are presented clearly, completely, and with as much transparency as possible. While the statistician is frequently tasked with managing the sections describing the quantitative methods and results, they should also collaborate throughout the report to provide input on the interpretation and implications of the findings (see Section 3.3). Additional documentation from the Study stage, such as clearly written and readable computer code, can also be included as supplementary material.

The process of creating a scientific manuscript and undergoing the peer-review process for publication is another place in which statistical and content expert judgment enters into both the Interpret and Inform stages. Comments from others on drafts of the manuscript can lead to revised interpretation, in light of new information or perspective from that feedback, before submission of the manuscript for publication. Once submitted, additional expertise is gathered from the peer reviewers, such as calls for clearer evidence of claims or interpretations made, challenges to stated justifications of assumptions or interpretations, calls for greater transparency or detail in the information presented, or citations to related work that may be blended into the discussion section. Reviewer comments can greatly improve the quality and clarity of information presented in the final publication.

Recently, some journal editors have been re-examining their review processes, basing acceptance or rejection on the strength of evidence rather than on p-values. Locascio (2017b) argues for a results-blind review system in which the reviewer makes a preliminary decision on the strength of the manuscript without knowing the results. Final decisions would not allow rejection on the basis of p-values. Commentary on the feasibility of this approach may be found elsewhere (Marks 2017; Grice 2017; Hyman 2017; Locascio 2017a). We encourage journal reviewers to examine the appropriateness of the methods for the study under consideration, rather than accepting justification of methods simply based on their publication and use elsewhere.

When the Question has arisen to facilitate decision-making, the primary purpose of the Inform stage is to convey the scientific evidence to the decision-maker after it has been assembled and analyzed in the Study and Interpret stages. Here, too, communication skills are particularly important. Governments are increasingly, but not exclusively, reliant on evidence for policy- and decision-making (Oliver and de Vocht 2017; HM Treasury 2015; Oliver, Lorenc, and Innvaer 2014; LaVange 2014), and there is much current interest in the challenges of communicating uncertainty to decision-makers (Cairney and Oliver 2017; National Academies of Sciences, Engineering, and Medicine et al. 2017; National Research Council et al. 2012a, 2012b). Despite these challenges, it is essential to measure uncertainty and important to try to communicate it as effectively as possible. Understanding the perspectives of decision-makers (i.e., their priorities and goals, the options under consideration, and the risks and benefits), the processes they must follow, and their time constraints is helpful in such communications.

As depicted in our graphic of the scientific method (Figure 1), learning and discovery are cyclical. Once we address one Question, new Questions often arise. Researchers may be interested in the work of other authors in similar fields, perhaps using information from published studies to inform their next study. Sometimes the results obtained are not definitive, or not adequate for robust decision-making, and ways to redirect the next investigation of the same or a revised version of the Question are planned. Other times, a replication Study is conducted based on the same Question simply to see whether the results remain qualitatively similar, despite the inevitable lack of perfect duplicability in all aspects of the study environment (Lindsay and Ehrenberg 1993).

3.5. But Is It Science?

When expert judgments are used in any stage of a scientific inquiry, the outcomes contain subjective elements. The inescapable conclusion is that science itself has a subjective component, aspects of which can be communicated probabilistically, and should be interpreted according to the theory of subjective probability (Anscombe and Aumann 1963 ). That is, while there may be objective information on which probabilities are based, there also will always be between-individual (scientist) variation.

There can be heated opposition to the notion of subjective science, with objectivity promoted as fundamental to the scientific method and subjectivity considered anathema to a true scientist. According to this viewpoint, subjectivity is unscientific, biased, possibly prejudiced. But science cannot be totally objective. Scientists propose new or amended theories, choose experimental designs or statistical analysis techniques, interpret data, and so on. Although these judgments are subjective, expert judgments result from a synthesis of related prior knowledge and experiences based on observable evidence and careful reasoning. Indeed, making informed subjective judgments is one of the key features that distinguish a top scientist from a lesser one. Statistics similarly involves subjective judgments, as others have recently argued (Gelman and Hennig 2017).

In the design portion of the Study stage, by definition, information is not yet available from the study being planned. Instead, study design decisions are preposterior, or preanalysis, and must be based on (prior) external data and judgment. In this sense, regardless of the subsequent analysis, one could consider the design phase as automatically Bayesian. Designers employ varying degrees of formalism in developing the study design and statistical models. A formal Bayesian approach can be used either to develop a frequentist design (Bayes for frequentist) by, for example, finding a sample size or other design components that ensure that frequentist operating characteristics, such as the prior-averaged probability of achieving statistical significance, reach a prespecified level (see, e.g., Shih 1995 for an implementation), or to ensure that Bayesian properties are in an acceptable region (Bayes for Bayes). We recommend practitioners consider additional use of Bayesian approaches, even if only to provide a vehicle for documenting the roles of judgments and as a platform for sensitivity analyses.
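As one hedged illustration of the “Bayes for frequentist” idea, the sketch below (our own construction, not an implementation of Shih 1995) uses Monte Carlo simulation to approximate the prior-predictive probability that a simple two-arm comparison of a continuous endpoint reaches frequentist significance, often called assurance, and scans candidate sample sizes; the design prior and all numerical settings are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def assurance(n_per_arm, prior_mean=0.4, prior_sd=0.15, sigma=1.0,
              alpha=0.05, n_sims=20000):
    """Prior-predictive probability that a two-arm z-test on a continuous
    endpoint (known standard deviation sigma) is two-sided significant at
    level alpha, averaging frequentist power over a design prior for the
    treatment effect. All defaults are illustrative assumptions."""
    delta = rng.normal(prior_mean, prior_sd, n_sims)   # plausible true effects
    se = sigma * np.sqrt(2.0 / n_per_arm)              # SE of the effect estimate
    delta_hat = rng.normal(delta, se)                  # simulated trial estimates
    return np.mean(np.abs(delta_hat / se) > stats.norm.ppf(1 - alpha / 2))

# Scan sample sizes until the assurance reaches a target such as 0.80.
for n in (30, 50, 80, 120):
    print(n, round(assurance(n), 3))
```

The same machinery documents exactly which judgments (here, the design prior) drive the chosen sample size, which supports the sensitivity analyses recommended above.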

All researchers, irrespective of their philosophy or practice, use expert judgment in developing models and interpreting results. We must accept that there is subjectivity in every stage of scientific inquiry, but objectivity is nevertheless the fundamental goal. Therefore, we should base judgments on evidence and careful reasoning, and seek wherever possible to eliminate potential sources of bias.

Science also demands transparency. Ideally, it should be standard practice in all scientific investigations to document all subjective judgments that are made, together with the evidence and reasoning for those judgments. Although we recognize that this may be impractical in large studies with many investigators, we believe that it might be facilitated with suitable collaborative working tools. We suggest, therefore, that journals might consider requiring that authors provide such documentation routinely, either in an appendix to a submitted paper or in online supplementary material.

4. Case Studies

We present three case studies, each of which exhibits complexities that require collaborative input from content experts and statisticians. They highlight the roles that expert judgments play and the steps that were taken to ensure that judgments were as objective and scientific as possible.

4.1. Expert Judgment in Randomized Clinical Trials

4.1.1. Background on Bayesian Clinical Trials

Bayesian approaches to clinical trial design, conduct, and analysis have been shown to offer substantial improvements over traditional approaches in a variety of contexts; see Abrams, Spiegelhalter, and Myles (2004) and Berry et al. (2010) for a range of examples. The Bayesian approach formalizes the use of prestudy data and expert judgments in design, conduct, and analysis. Importantly, the formalism provides an effective language for discussing interim and final results, for example, by supporting statements such as, “in the light of accruing data, the probability that treatment A is better than treatment B by at least 5 percentage points is 0.95 …” It also provides a natural way to compute and communicate predictive assessments such as futility, that is, in the light of current information, the probability that the trial will ultimately not be definitive. The Block HF and TOXO studies outlined below illustrate many of these characteristics.

4.1.2. The Block HF Study

The Block HF study (Curtis et al. 2013) provides an example of the benefits of embedding evaluations in the Bayesian formalism. It was an adaptive randomized trial using Bayesian criteria, with the specification of all features dependent on collaboration among clinical and statistical experts. The abstract states,

We enrolled patients who had indications for pacing with atrioventricular block; New York Heart Association (NYHA) class I, II, or III heart failure; and a left ventricular ejection fraction of 50% or less. Patients received a cardiac-resynchronization pacemaker or implantable cardioverter defibrillator (ICD) (the latter if the patient had an indication for defibrillation therapy) and were randomly assigned to standard right ventricular pacing or biventricular pacing. The primary outcome was the time to death from any cause, an urgent care visit for heart failure that required intravenous therapy, or a 15% or more increase in the left ventricular end-systolic volume index.

Two pacing regimens, biventricular (BiV) and right ventricular (RiV), were compared using a Cox proportional hazards model. Analyses were stratified by the two cardiac devices, with information on the two stratum-specific hazard ratios (HRs) combined for monitoring. The Bayesian monitoring rules addressed patient safety, stopping for futility, and stopping if the treatment comparison was sufficiently convincing (the probability of a clinically meaningful difference was sufficiently high), based on combining evidence over the two stratum-specific HRs. From the statistical analysis section,

An adaptive Bayesian study design allowing up to 1200 patients to undergo randomization was used, featuring sample size re-estimation and two interim analyses with prespecified trial-stopping rules …. An intention-to-treat analysis served as the primary analysis for all outcomes.

The trial resulted in a “win” for biventricular pacing compared to right ventricular pacing: HR 0.74, 95% credible interval (0.60, 0.90). Table 1 presents the monitoring rules, which are based on the following quantities (see Curtis et al. 2013 for details):

  • θ = a weighted average of the stratum-specific log HRs, each comparing BiV versus RiV pacing.
  • PP0 = P{θ < log(0.90) | data} = pr{HR < 0.90 | data}, the posterior probability that the study objective has been met.
  • PRR = P{θ > log(0.90) | data} = pr{HR > 0.90 | data}, the posterior probability that the study objective has not been met.
  • P{θ > 0 | data} = pr{HR > 1.00 | data}, the posterior probability of a safety concern.

For example, if the probability of meeting the study objective is sufficiently high (PP0 > 0.99), stop the study; if it is moderately high (0.90 < PP0 < 0.99), continue the study with the current sample size target; if it is too low (PP0 < 0.90) but there is a reasonable likelihood of success (PRR < 0.90), increase the sample size. The inclusion of such probability-based rules was an important benefit of using the Bayesian formalism. It supported complex decision rules that communicated effectively with the clinical experts on the monitoring board.
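As a hedged sketch of how such probability-based rules can be coded and communicated, the fragment below applies the thresholds quoted above to posterior summaries; the posterior samples are hypothetical, and the final futility branch is our assumption, since the text does not state the action when none of the quoted conditions holds.

```python
import numpy as np

def interim_decision(pp0, prr):
    """Apply the Block HF-style interim rules quoted in the text.
    pp0 = Pr(HR < 0.90 | data); prr = Pr(HR > 0.90 | data)."""
    if pp0 > 0.99:
        return "stop: study objective met"
    if pp0 > 0.90:
        return "continue with the current sample size target"
    if prr < 0.90:
        return "increase the sample size"
    return "consider stopping for futility (assumed action, not from the text)"

# Hypothetical posterior samples of theta, the weighted average of the
# stratum-specific log HRs, e.g., from an MCMC fit.
theta = np.random.default_rng(5).normal(np.log(0.80), 0.08, 20000)
pp0 = np.mean(theta < np.log(0.90))   # posterior Pr(HR < 0.90)
prr = 1.0 - pp0                       # posterior Pr(HR > 0.90), per the definitions above
print(round(pp0, 3), interim_decision(pp0, prr))
```

Expressing the rules this way keeps the clinically meaningful thresholds, rather than test statistics, at the center of the monitoring discussion.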

4.1.3. The “TOXO” Trial

When accruing data are consistent with a prior distribution, a trial can be stopped earlier than with traditional, likelihood-based monitoring. However, there are also situations wherein the prior and the data diverge and stopping is delayed, as is the case in the following post hoc rerunning of the Community Programs for Clinical Research on AIDS randomized trial of prevention of toxoplasmic encephalitis (“TOXO”) conducted in the 1990s. The study compared pharmacological prevention at a subtherapeutic dose with placebo, with all treatment groups carefully monitored for indications of disease. The premise of such prevention studies is that a low dose of a pharmaceutical that is typically used to treat overt disease may also prevent or delay onset. However, even a low dose of a pharmaceutical may induce adverse effects, such as toxicities or resistance to treatment, and consequently, watchful waiting (the placebo “intervention”) may be better than the potentially preventive treatment.

Jacobson et al. ( 1994 ) detail the several decisions and expert judgments needed to design and implement the trial, including choice of drugs, inclusion and exclusion criteria, clinical endpoints, and monitoring and analysis plans. The abstract of the article reporting results (Jacobson et al. 1994 ) states in part,

Pyrimethamine, 25 mg thrice weekly, was evaluated as primary prophylaxis for toxoplasmic encephalitis (TE) in a double-blind, randomized clinical trial in patients with human immunodeficiency virus (HIV) disease, absolute CD4 lymphocyte count of < 200/microL [CD 4 lymphocytes fight disease; a low level indicates immunodeficiency] (or prior AIDS-defining opportunistic infection), and the presence of serum IgG to Toxoplasma gondii.” …“There was a significantly higher death rate among patients receiving pyrimethamine [compared to control] (relative risk [RR], 2.5; 95% confidence interval [CI], 1.3–4.8; P =.006), even after adjusting for factors predictive of survival. The TE event rate was low in both treatment groups (not significant). Only 1 of 218 patients taking [the control intervention] but 7 of 117 taking aerosolized pentamidine for prophylaxis against Pneumocystis carinii pneumonia developed TE (adjusted RR …, 0.16; 95% CI, 0.01–1.79; P =.14). Thus, for HIV-infected patients receiving trimethoprim-sulfamethoxazole, additional prophylaxis for TE appears unnecessary.

The Data and Safety Monitoring Board monitored the trial at prespecified, calendar-determined dates using the O’Brien and Fleming ( 1979 ) boundaries. Early in the trial, these boundaries require substantial evidence (e.g., a small p -value) to stop and make a decision; as the trial approaches the predetermined maximum number of follow-up visits, the criterion is close to that for a fixed sample size trial.

The full, calendar time indexed database was available for the after-the-fact example of how monitoring might have proceeded using a Bayesian approach. This illustrative analysis evaluated the “TOXO or death” endpoint using the Cox ( 1972 ) proportional hazards model with adjustment for baseline CD 4 count. The illustrative trial was stopped when the posterior probability of benefit or the posterior probability of harm became sufficiently high. Importantly, prior elicitation occurred while the trial was being conducted, before any outcome information was available to the elicitees.

4.1.4. Model for the “TOXO” Trial

The HR (the relative “TOXO or death” event rate between the two treatment groups) was modeled using a log-linear model with covariates treatment group (z1j = 1 if participant j received pyrimethamine; z1j = 0 if placebo) and CD4 cell count at study entry (z2j). The CD4 covariate adjusted for possible differences in immune status at baseline between the two groups. Specifically, the log hazard ratio for participant j is

β1 z1j + β2 z2j,

with β1 < 0 indicating a benefit for pyrimethamine. A flat prior was used for the CD4 effect (β2), and a variety of priors were developed for the pyrimethamine effect (β1). The choice of the Cox (1972) model and the use of a noninformative prior for β2 were judgments of the statisticians. Though the Cox model has become the default choice when modeling the time to an event, it is based on important assumptions, namely that the hazard functions for the two interventions are proportional and that censoring is noninformative. As such, the model should only be adopted after careful consideration by experts, as in this example. The choice of a noninformative prior distribution for β2 reflects the clinicians’ and statisticians’ judgments that no information was available in advance of the trial on the association of CD4 with TOXO incidence.

4.1.5. Elicitation in the “TOXO” Trial

As described in Carlin et al. (1993) and Chaloner et al. (1993), prior distributions for β1 were elicited from five content experts: three HIV/AIDS clinicians, one person with AIDS conducting clinical research, and one AIDS epidemiologist. Elicitation occurred while the trial was in progress, when only pretrial information was available to the elicitees. Two additional priors were included in the monitoring: an equally weighted mixture of the five elicited priors and a noninformative flat prior (equivalent to using the normalized partial likelihood as the posterior distribution).

Elicitation targeted potentially observable, clinically meaningful features, and the responses were then transformed to parameters of the Cox model. That is, rather than directly elicit a prior for the hazard ratio, each elicitee was asked to report their best estimate of the probability of TOXO or death within two years under placebo (P0) and then to draw a picture of the distribution of the two-year probability under pyrimethamine, conditional on the estimate under placebo ([Ppyri | P0]). For each elicitee, these conditional distributions were converted to a prior distribution for the log(hazard ratio) using the proportional-hazards relation 1 − Ppyri = (1 − P0)^HR, that is, the transformation β1 = log{log(1 − Ppyri)/log(1 − P0)}.
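Applied draw by draw, the transformation turns an elicited distribution on the probability scale into a prior on the log hazard ratio scale. The sketch below is illustrative only: the placebo estimate and the Beta summary standing in for an elicitee’s sketched density are hypothetical choices of ours, not the TOXO elicitation results.

```python
import numpy as np

rng = np.random.default_rng(7)

def beta1_prior_samples(p0, p_pyri_draws):
    """Map elicited two-year event probabilities to the Cox log hazard ratio.
    Under proportional hazards, 1 - Ppyri = (1 - P0)**HR, so
    beta1 = log HR = log( log(1 - Ppyri) / log(1 - P0) )."""
    return np.log(np.log(1.0 - p_pyri_draws) / np.log(1.0 - p0))

# Hypothetical elicitee: best-guess placebo probability P0 = 0.30, and a
# Beta(4, 16) density (mean 0.20) standing in for the sketched conditional
# distribution of Ppyri given that P0.
p0 = 0.30
p_pyri = rng.beta(4, 16, size=10_000)
beta1 = beta1_prior_samples(p0, p_pyri)

print("implied prior mean log HR:", round(beta1.mean(), 3))
print("implied prior Pr(HR < 1):", round((beta1 < 0).mean(), 3))
```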

Figure 2 displays the elicitation results. At trial initiation, little was known about the baseline incidence of TOXO, so the content experts based their distributions on general expertise and on analogy with other endpoints and contexts. The range of the five reported two-year incidence probabilities under placebo was wide, but all elicitees believed that TOXO had a non-negligible baseline incidence. All elicitees were optimistic regarding pyrimethamine’s benefit, with experts C and E the most optimistic, placing all of their probability distribution for incidence under pyrimethamine well below their estimate of incidence under placebo.

Figure 2. Elicited priors for the five elicitees. The red vertical line at P0 is the “best guess” two-year incidence of TOXO or death under placebo. The smoothed densities are for two-year TOXO or death incidence under pyrimethamine, conditional on the placebo rate.

While a degree of optimism may be needed to motivate conducting a trial, ethics require that there be sufficient equipoise (uncertainty) regarding which treatment is superior before a trial is initiated. The elicited priors in this example probably express too strong a belief in the efficacy of pyrimethamine to be used in monitoring an actual trial, but using them in this illustrative monitoring exercise, in comparison with likelihood-based monitoring, effectively highlights the issues we discuss in Section 4.1.7.

4.1.6. Monitoring Results for the “TOXO” Trial

The actual trial was monitored at calendar dates (15 Jan 1991, 31 Jul 1991, 31 Dec 1991, 30 Mar 1992). At the December 31, 1991 meeting, the monitoring board recommended stopping the trial for futility, because the pyrimethamine group had not shown significantly fewer events, and the low overall event rate made a statistically significant difference in efficacy unlikely to emerge. Additionally, an increase in the number of deaths in the pyrimethamine group relative to the placebo indicated a safety issue.

For the illustrative, after-the-fact monitoring example, Figure 3 displays posterior probabilities of benefit (HR ≤ 0.75; equivalently, β1 ≤ log(0.75) in the Cox model) and harm (HR > 1.0; β1 > 0) for an equally weighted mixture of the five elicited prior distributions (denoted by “B”) and for a flat prior that generates partial-likelihood-based/traditional monitoring (denoted by “L”). For example, β1 = log(0.75) indicates that the hazard rate (event rate) in the pyrimethamine group is 75% of that in the control group; β1 = 0 = log(1.0) indicates equal rates.

Figure 3. Posterior probability of benefit (hazard ratio ≤ 0.75) and harm (hazard ratio > 1.0) for the mixture prior (blue lines) and for monitoring based on the partial likelihood (black lines), which is equivalent to using a flat (improper) prior. Monitoring was at calendar dates (15 Jan 1991, 31 Jul 1991, 31 Dec 1991, 30 Mar 1992), with the X-axis indicating the cumulative number of toxoplasmosis or death events from the combined arms.

As displayed in Figure 2, the elicitees believed that TOXO had a non-negligible incidence and that pyrimethamine would have a substantial prophylactic effect. Thus, each prior, and hence their mixture, is to varying degrees far from the accruing information in the trial. Consequently, monitoring based on the partial likelihood, which can be considered “flat prior Bayes,” gives an earlier warning of harm than monitoring based on the mixture prior. The mixture required considerably more information to overcome the a priori optimism of the elicitees. Also, the posterior probability of harm based on monitoring with any single prior in Figure 2 would lag behind that based on the partial likelihood, with use of prior A, B, or D giving an earlier warning than use of prior C or E.
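The mechanics behind Figure 3 can be mimicked with a normal approximation: treat the partial-likelihood estimate of the log HR as a normal likelihood and combine it with a mixture-of-normals prior. In the sketch below every numerical value (the prior components, the interim estimate, and its standard error) is a hypothetical stand-in rather than a TOXO quantity.

```python
import numpy as np
from scipy import stats

def mixture_posterior_probs(b_hat, se, means, sds, weights,
                            cut_benefit=np.log(0.75), cut_harm=0.0):
    """Posterior Pr(benefit) and Pr(harm) for the log HR when a
    mixture-of-normals prior is combined with a normal approximation to the
    partial likelihood centered at b_hat with standard error se."""
    means, sds, weights = map(np.asarray, (means, sds, weights))
    # Posterior mixture weights: prior weight times the marginal density of b_hat.
    w_post = weights * stats.norm.pdf(b_hat, means, np.sqrt(sds**2 + se**2))
    w_post /= w_post.sum()
    # Conjugate normal update within each mixture component.
    post_var = 1.0 / (1.0 / sds**2 + 1.0 / se**2)
    post_mean = post_var * (means / sds**2 + b_hat / se**2)
    post_sd = np.sqrt(post_var)
    p_benefit = np.sum(w_post * stats.norm.cdf(cut_benefit, post_mean, post_sd))
    p_harm = np.sum(w_post * stats.norm.sf(cut_harm, post_mean, post_sd))
    return p_benefit, p_harm

# Hypothetical interim estimate suggesting harm (HR around 1.6), analyzed with
# an optimistic three-component prior and, for comparison, a flat prior.
b_hat, se = np.log(1.6), 0.35
print("mixture prior:", mixture_posterior_probs(
    b_hat, se, means=np.log([0.4, 0.5, 0.7]), sds=[0.3, 0.3, 0.4],
    weights=[1/3, 1/3, 1/3]))
print("flat prior:   ", (stats.norm.cdf(np.log(0.75), b_hat, se),
                         stats.norm.sf(0.0, b_hat, se)))
```

With these invented numbers the flat-prior probability of harm already exceeds 0.9, while the optimistic mixture yields a noticeably smaller value, mirroring the lag seen in Figure 3.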

4.1.7. Discussion of the “TOXO” Trial

Traditional, non-Bayesian monitoring depends on several judgments. For example, the trial design, including the maximum sample size, depends on assumptions about baseline event rates, the HR, the principal efficacy and safety endpoints, the statistical model, and so on. In the Bayesian approach, a subset of these features is given prior distributions, in the TOXO example by independent priors for the treatment effect (the log HR) and for the effect of baseline CD4. These priors, along with other design features, determine the monitoring frequency, the shape of the monitoring boundaries (flat, increasing, or decreasing), and their associated values. Boundaries are necessary for all monitored trials; determining them can be done in either a Bayesian or a frequentist framework.

The after-the-fact analysis highlights ethical issues generated by the real-time use of expert knowledge. It shows that if the elicited priors had been used in the actual clinical trial monitoring, then, due to their optimism, trial stopping and other decisions would very likely have differed from those produced by likelihood-based (flat prior) monitoring. However, in early-phase clinical applications, and in nonclinical applications, the use of such optimistic priors can be appropriate and effective.

In the example, stopping would have been delayed, highlighting the question of whether it would be ethical to continue beyond a traditional stopping point because of prior beliefs that pyrimethamine would have a strong prophylactic effect. Of course, if pyrimethamine had performed well, stopping would have been earlier than under traditional methods, which also raises an ethical issue for some. And, if the priors had been pessimistic but the data moderately optimistic, the Bayesian analysis would have tempered enthusiasm (delayed stopping) until convincing positive data had accrued.

The TOXO example shows one possible effect of using prior distributions in clinical evaluations designed to be definitive (Phase III). The example is based on the mixture that equally weighted the five prior distributions. The posterior distribution is also a mixture, but the posterior weights give more influence to the priors that are most compatible with the data, so there is some degree of automatic adjustment. Other options include monitoring separately with each prior and then, at each monitoring point, determining stopping based on a “majority rule” or by requiring unanimity. Each choice will produce its own operating characteristics, so extensive simulations are needed to understand its properties.
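As a small illustration of what such simulations look like, the sketch below estimates how often a single-prior “stop for benefit” rule would trigger at one interim look when the true hazard ratio is 1; the normal approximation, the interim standard error, and the prior settings are all our assumptions, not TOXO quantities.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)

def stop_for_benefit_probability(true_log_hr, prior_mean, prior_sd,
                                 se_interim=0.25, cut=np.log(0.75),
                                 threshold=0.90, n_sims=20_000):
    """Estimate, by simulation, the probability that the rule
    'stop when Pr(log HR <= cut | data) > threshold' triggers at a single
    interim look, given a hypothesized true log HR and a conjugate normal
    approximation for the prior and likelihood."""
    b_hat = rng.normal(true_log_hr, se_interim, n_sims)          # interim estimates
    post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / se_interim**2)   # conjugate update
    post_mean = post_var * (prior_mean / prior_sd**2 + b_hat / se_interim**2)
    p_benefit = stats.norm.cdf(cut, post_mean, np.sqrt(post_var))
    return np.mean(p_benefit > threshold)

# Under a null truth (HR = 1), an optimistic prior stops "for benefit" far
# more often than a nearly flat prior does.
for label, m, s in [("optimistic prior", np.log(0.5), 0.3),
                    ("diffuse prior   ", 0.0, 10.0)]:
    print(label, round(stop_for_benefit_probability(0.0, m, s), 3))
```

Mapping out such frequencies across plausible truths, priors, and decision rules is exactly the simulation exercise referred to above.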

To summarize, in clinical trial monitoring prior information can have two main effects:

  • If prior information conflicts with the available data at the time of monitoring, it may suggest continuing the trial, at least until the next review. This situation can arise when the experts are more optimistic than the emerging data, as in the TOXO trial, or when they are more pessimistic. In either case, continuing the trial long enough to obtain sufficient evidence to convince the content experts may have the added benefit that the results would translate to practice relatively rapidly.
  • If prior information supports the available data at the time of monitoring, it may suggest terminating early, on the grounds that sufficient evidence now exists to make a decision.

An increasing number of trials are utilizing prior information (Abrams, Spiegelhalter, and Myles 2004 ; Berry et al. 2010 ), and the advent of such trials emphasizes the importance of prior judgments being made as carefully and as objectively as possible. Judgments should be sought from a sufficient number of content experts to properly represent the range of views. Determining the number and type of elicitees is, itself, an exercise in study design. Of course, prior distributions should be elicited carefully and rigorously, as described, for instance, in O’Hagan ( 2018 ).

4.2. Expert Judgment in Environmental Modeling

4.2.1. Background: UK Carbon Flux

This case study describes the estimation of a parameter in a complex environmental modeling problem. As a party to the Kyoto protocol, the United Kingdom is committed to specific target reductions in net emissions of greenhouse gases. Monitoring progress toward these targets is challenging, involving accounting for numerous sources of emissions. One potential mitigating factor is the ability of vegetation to remove carbon dioxide from the atmosphere, thereby acting as a “carbon sink.” However, accounting for this is also extremely complex. During the day, through the process of photosynthesis, vegetation absorbs carbon dioxide and releases oxygen, using the carbon to build plant material. Photosynthesis requires sunlight, chlorophyll in green leaves, and water and minerals gathered by the plant’s roots, so the efficiency of carbon removal depends on factors such as weather, leaf surface area, and soil conditions. Conversely, carbon is released at night; carbon extracted from the atmosphere and converted to biomass will eventually be released as the plant ages and dies; and carbon in leaf litter is released by microbial action in the soil. The Sheffield Dynamic Global Vegetation Model (SDGVM) was built with mathematical descriptions of all these processes to predict net carbon sequestration due to vegetation (Woodward and Lomas 2004). For a given site, the model takes many inputs describing the type of vegetation cover and soil at the site, together with weather data, to estimate Net Biosphere Production (NBP), that is, the net decrease in atmospheric CO2 from that site over a given time period.

This case study used SDGVM to estimate the total NBP for England and Wales in the year 2000. It is important to recognize that there is inevitably uncertainty about all the model inputs. Uncertainty about inputs induces uncertainty about model outputs, and the study sought to quantify the output uncertainty in the form of a probability distribution for the total NBP. Details are reported in Kennedy et al. ( 2008 ) and Harris, O’Hagan, and Quegan ( 2010 ).

Before considering the statistical aspects of this case study, we first note that considerable content expertise had already gone into the development of SDGVM. Based on the available scientific knowledge, expert judgments were made in choosing the structure of the model and the equations that describe each of its biological processes. Care went into these choices to ensure that they were reasonable and scientifically defensible. However, in such modeling, it is usually unrealistic to include every process to the greatest level of detail and complexity that is believed to be applicable. First, the more complex and detailed the model, the longer it will take to compute; for practical reasons, it may be necessary to compromise on complexity. Second, more complex models may be less robust and reliable in their predictions, because at the highest level of detail, there is often less scientific consensus about the equations and the parameters within them. In addition, models with a large number of parameters may suffer from overfitting (Hawkins 2004 ). Judgments about the optimal degree of complexity to achieve a computable, accurate, and reliable representation of the phenomenon being modeled often demand a particularly high level of expertise.

4.2.2. Input Distributions for the SDGVM

The model was run over 707 grid cells covering England and Wales. For each grid cell, given the land cover in that cell, and given input parameters describing the soil composition and properties of the vegetation types, the model was first “spun-up” for 600 years. That is, the model was initialized with a default set of inputs describing the state of the vegetation in terms of ages, heights, leaf density, etc., and the state of the soil in terms of moisture content, age and quantity of organic matter, etc. Then the model was run forward for 600 years using historic climate data at that site from 1400 to 2000, to stabilize the vegetation and soil at states representative of how that site would have been in 2000. The model was then run forward for one more year using weather data from 2000, and the NBP for the year was computed for each grid cell. The NBP for England and Wales in 2000 is the sum of the NBP values across all the grid cells.

Care was taken to quantify the uncertainty in the various inputs, as described previously (O’Hagan 2012). The main points are summarized below.

  • Soil composition : A publicly available soil map for England and Wales (Bradley et al. 2005 ) was used to provide estimates of the soil composition in each grid cell. Because the map was created at a higher resolution than the grid cells used for SDGVM, figures were averaged over each grid cell to provide an estimate. The variance of the same figures over a grid cell, divided by the number of map points in the cell, was used to quantify uncertainty around the estimates for a grid cell. However, the variance was increased by a factor to represent (a) additional uncertainty in the map data and (b) spatial correlation within the cell. The decision to use an increased estimate of uncertainty in the model was based on an expert judgment on the part of the statisticians, in consultation with content experts.
  • Land cover: A map of land cover was also publicly available (Haines-Young et al. 2000), derived from satellite observation. The original analysis reported in Kennedy et al. (2008) did not quantify uncertainty in land cover. However, unlike for the soil map, the content experts felt that the uncertainties in land cover were large and that there could be biases in the process by which land cover is inferred from the satellite readings. In a subsequent analysis (Cripps, O’Hagan, and Quaife 2013; Harris, O’Hagan, and Quegan 2010), a statistical model was built to quantify uncertainty in land cover maps derived from remote sensing, and the analysis of England and Wales NBP in 2000 was then extended to account for the additional uncertainty. The method makes use of the “confusion matrix,” which for the Haines-Young et al. (2000) map was given by Fuller et al. (2002). The confusion matrix is a contingency table based on a large survey of actual (ground-truth) land cover, and it shows, for each ground-truth vegetation type, the frequency with which the Haines-Young et al. (2000) map classified it in each vegetation type. The statistical analysis required probabilistic inversion of the confusion matrix to derive, conditional on the satellite land cover, the probabilities of the various ground-truth cover types (a minimal sketch of this inversion follows this list). The careful expert judgments of statisticians and content experts are delineated in Cripps, O’Hagan, and Quaife (2013).
  • Vegetation properties : SDGVM classifies land cover into plant functional types (PFT). For England and Wales, we used four PFTs—evergreen needleleaf trees, deciduous broadleaf trees, crops, and grassland. Each PFT is associated with a set of quantities, including maximum age, stem growth rate and leaf lifespan. However, most of these inputs were missing. Some properties had been estimated experimentally, but only for very few individual species within a given PFT. We therefore used expert elicitation to construct a probability distribution for each parameter. Elicitation is an area where it is particularly important to take care to avoid biases that commonly arise in subjective judgments of probabilities (Kynn 2008 ). Another article arising from the Symposium addresses issues related to elicitation in detail (O’Hagan 2018 ).
  • Weather : Weather data, such as temperature, precipitation, and cloud cover, were available for each grid cell for every day in 2000. There are no doubt errors in these data, due not only to errors in the underlying daily measurements, but also to the fact that those measurements have been interpolated to produce the data at the level of grid cells. Nevertheless, it was felt that uncertainty in these inputs was relatively small and could not be quantified without adding extra assumptions. The decision not to trade (potentially unreasonable) assumptions for (potentially increased) precision for weather data was a judgment made jointly by statisticians and content experts.
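The probabilistic inversion mentioned in the land cover item is, at its core, an application of Bayes’ rule. The sketch below uses entirely hypothetical confusion-matrix counts and cover fractions to show that calculation; the published analysis (Cripps, O’Hagan, and Quaife 2013) involves a considerably richer statistical model.

```python
import numpy as np

# Hypothetical confusion-matrix counts from a ground-truth survey.
# Rows: true vegetation class; columns: class assigned by the satellite map.
classes = ["needleleaf", "broadleaf", "crop", "grass"]
confusion = np.array([
    [80,  5,  2, 13],
    [ 6, 70,  4, 20],
    [ 1,  3, 85, 11],
    [ 9, 15, 10, 66],
], dtype=float)

# P(mapped class | true class): row-normalize the counts.
p_map_given_true = confusion / confusion.sum(axis=1, keepdims=True)

# Assumed prior over true classes, e.g., regional cover fractions.
prior_true = np.array([0.10, 0.20, 0.40, 0.30])

# Bayes' rule: P(true | mapped) is proportional to P(mapped | true) * P(true).
joint = p_map_given_true * prior_true[:, None]
p_true_given_map = joint / joint.sum(axis=0, keepdims=True)

for j, mapped in enumerate(classes):
    print(f"mapped as {mapped}:", np.round(p_true_given_map[:, j], 3))
```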

4.2.3. Propagating the Input Uncertainty in the SDGVM

To propagate input uncertainty through a mechanistic model such as SDGVM, the simplest and most direct approach is Monte Carlo: a large sample of random draws is made from the probability distributions of the inputs, and the model is run for each sampled input set. The resulting set of outputs is then a random sample from the output uncertainty distribution. However, like many models built to describe complex physical processes, SDGVM carries a substantial computational load, and it would not have been feasible to propagate parameter uncertainty through the model in this way at even one of the grid cells. The problem of quantifying uncertainty in complex computer models has acquired the name uncertainty quantification (UQ), and various tools are available to enable uncertainty propagation. The analysis used Gaussian process (GP) emulation (O’Hagan 2006; Oakley 2016), which is probably the most popular UQ technique. Even so, it would not have been feasible to build emulators at every one of the 707 grid cells; 33 cells were chosen by content experts to represent, in their expert judgment, the range of conditions across England and Wales, and GP techniques were adapted to infer the magnitudes of output uncertainty at the other 674 cells. The computational techniques and accompanying statistical theory used to manage the analysis are set out in Kennedy et al. (2008) and Gosling and O’Hagan (2006).
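For readers unfamiliar with Monte Carlo propagation, the sketch below pushes sampled inputs through a cheap stand-in function; it is not SDGVM, and every distribution and coefficient is invented for illustration. In the actual study the model was far too expensive for this direct approach, which is why Gaussian process emulators, fitted to a limited number of carefully chosen runs, were used in its place.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_nbp_model(stem_growth, leaf_lifespan, soil_carbon):
    """Cheap stand-in for one grid-cell model run (not the SDGVM equations):
    a nonlinear response that peaks near 'typical' parameter values."""
    return (8.0
            - 40.0 * (stem_growth - 0.5) ** 2
            - 15.0 * (leaf_lifespan - 1.0) ** 2
            + 2.0 * np.log(soil_carbon))

# Assumed input uncertainty distributions (purely illustrative).
n = 5000
stem_growth = rng.normal(0.5, 0.1, n)
leaf_lifespan = rng.lognormal(0.0, 0.2, n)
soil_carbon = rng.gamma(4.0, 0.5, n)

# Monte Carlo propagation: run the "model" once per sampled input set and
# summarize the resulting output distribution.
nbp = toy_nbp_model(stem_growth, leaf_lifespan, soil_carbon)
print("mean:", round(nbp.mean(), 2), " sd:", round(nbp.std(), 2))
```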

4.2.4. Results of the UK Carbon Flux Study

The content experts were certainly interested in knowing how much uncertainty in the total NBP would be induced by uncertainty in the inputs. Their instinctive approach to estimating the total NBP was to run the model just once with all of the inputs set to their expected values; we call this the plug-in estimate. Yet the NBP output from SDGVM is a highly nonlinear function of its inputs, so the expected value of NBP, when we allow for uncertainty in the inputs, is not equal to the plug-in estimate. Incorporating the input uncertainty, the statisticians computed the expected NBP for England and Wales in 2000 as 7.46 Mt C (megatonnes of carbon). By contrast, the plug-in estimate was 9.06 Mt C. Not only is this a substantial difference, but the standard deviation was estimated as only 0.55 Mt C. Therefore, the total NBP was probably in the interval (6.36, 8.56) Mt C and very likely to be less than the plug-in estimate.

The result was very surprising to the content experts. The explanation seems to lie in their estimates of the vegetation properties: in effect, the experts had estimated values that were approximately optimal for the plants to grow and absorb CO2, so any deviation from these values led to lower NBP. For the total NBP to be even close to the plug-in estimate would have required all the parameter values to be close to their estimates, a joint event that had very low probability.
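A second-order Taylor expansion gives a heuristic for the direction of the discrepancy (our gloss on the explanation above, not the computation used in the study). Writing the output as f(U) for an uncertain input U with mean μ,

$$E[f(U)] \approx f(\mu) + \tfrac{1}{2} f''(\mu)\,\operatorname{var}(U),$$

so when the response is locally concave at the experts' best estimates, as it is when those estimates sit near a growth optimum, the expected output falls below the plug-in value f(μ), and the shortfall grows with the input variance.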

In this case study, the combination of expert judgments from both content experts and statisticians, applied with as much care, rigor, transparency, and objectivity as possible, led to a scientific result that certainly highlighted the role of expert judgment and the statistical quantification of uncertainty, and also prompted new questions regarding the accuracy of current methods for carbon budgeting, with important implications for the science of global climate change. For example, see Cripps, O’Hagan, and Quaife ( 2013 ). The cycle of scientific investigation was thereby renewed.

5. Conclusions

Expert scientific judgment involves carefully considered conclusions and decisions based on deep knowledge and skills. The use of expert judgment is essential in and permeates all phases of scientific and policy studies. Expert knowledge is information; to ignore it or fail to obtain it incurs considerable opportunity costs. Judgments should be as objective as possible and based on data when available. Anything less is unscientific. Yet, deciding what data are relevant always involves degrees of judgment.

Written documentation that logs decisions is critical for informing stakeholders of how and in which study components (design, conduct, analysis, reporting, and decision-making) the principal judgments had impact. By “impact,” we mean that such judgments led to decisions that directly affected (or had potential to alter or influence) any aspect of how the study was conducted and/or results interpreted.

Elicitation of expert judgments to produce probability distributions that represent uncertainty about model parameters can be conducted informally, but such judgments are easily affected by unconscious cognitive biases, such as overoptimism or failure to recall all relevant evidence. When such distributions form an important part of a scientific activity, the expert judgments should be elicited scientifically and as objectively as possible, minimizing relevant sources of bias. Doing so requires a carefully designed process, an elicitation protocol, as fully discussed by O’Hagan ( 2018 ).

It should be unsurprising that statisticians have essential roles as scientists, ideally serving as leaders or co-leaders in all study aspects. Indeed, virtually all aspects of a study have statistical content, though almost no aspects are solely statistical. Consequently, we advise statisticians to promote more proactively the need for statistical principles to permeate all stages of research studies. Furthermore, we advise research authorities, such as journal editors and funding agencies, to recommend or even require thorough collaboration with one or more experts in statistics throughout the duration of all projects.

The 2017 Symposium will not produce a single position document on statistical practice like “The ASA Statement on p-Values” that resulted from the 2015 ASA Board meeting (Wasserstein and Lazar 2016). However, we echo the call in Gibson (2017) for statisticians to better advocate for the importance of their involvement throughout the scientific process. Finally, we applaud stakeholders, such as the National Institutes of Health (Collins and Tabak 2014) and the American Association for the Advancement of Science (McNutt 2014) in the USA, and the National Institute for Health and Care Excellence (NICE 2012) and HM Treasury (2015) in the UK, for leading calls for increased statistical rigor and understanding of uncertainty. We encourage additional funding agencies, journals, hiring and promotion committees, and others to join in the call for higher scientific standards, statistical and otherwise. Science in the twenty-first century and beyond deserves nothing less.

Supplementary Material

Funding statement.

The authors gratefully acknowledge support, in part, from the following: NCB: N/A; TAL: NIH-NIAID, U19-AI089680; PMA2020 from the Bill & Melinda Gates Foundation; AO’H: N/A; and JP: NIA grant P30AG028716. Authorship order is alphabetical.

  • Abrams, K. R., Spiegelhalter, D., and Myles, J. P. (2004), Bayesian Approaches to Clinical Trials and Health Care, New York: Wiley.
  • Anderson, H., and Hepburn, B. (2016), “Scientific Method,” available at https://plato.stanford.edu/archives/sum2016/entries/scientific-method.
  • Anscombe, F. J., and Aumann, R. J. (1963), “A Definition of Subjective Probability,” The Annals of Mathematical Statistics, 34, 199–205. DOI: 10.1214/aoms/1177704255.
  • Berry, S. M., Carlin, B. P., Lee, J., and Müller, P. (2010), Bayesian Adaptive Methods for Clinical Trials, Boca Raton, FL: Chapman & Hall/CRC Press.
  • Bradley, R., Bell, J., Gauld, J., Lilly, A., Jordan, C., Higgins, A., and Milne, R. (2005), “UK Soil Database for Modelling Soil Carbon Fluxes and Land Use for the National Carbon Dioxide Inventory,” Report to Defra Project SP0511, Defra, London.
  • Brownstein, N. C. (2018), “Perspective from the Literature on the Role of Expert Judgment in Scientific and Statistical Research and Practice,” arXiv no. 1809.04721.
  • Cairney, P., and Oliver, K. (2017), “Evidence-Based Policymaking Is Not Like Evidence-Based Medicine, So How Far Should You Go to Bridge the Divide Between Evidence and Policy?,” Health Research Policy and Systems, 15, 35.
  • Carlin, B. P., Chaloner, K., Church, T., Louis, T. A., and Matts, J. P. (1993), “Bayesian Approaches for Monitoring Clinical Trials, With an Application to Toxoplasmic Encephalitis Prophylaxis,” The Statistician, 42, 355–367. DOI: 10.2307/2348470.
  • Center for Open Science (2018), “Registered Reports,” available at https://cos.io/rr/.
  • Chaloner, K., Church, T., Louis, T. A., and Matts, J. P. (1993), “Graphical Elicitation of a Prior Distribution for a Clinical Trial,” The Statistician, 42, 341–353. DOI: 10.2307/2348469.
  • Collins, F. S., and Tabak, L. A. (2014), “NIH Plans to Enhance Reproducibility,” Nature, 505, 612–613. DOI: 10.1038/505612a.
  • Cox, D. R. (1972), “Regression Models and Life Tables” (with discussion), Journal of the Royal Statistical Society, Series B, 34, 187–220. DOI: 10.1111/j.2517-6161.1972.tb00899.x.
  • Cripps, E., O’Hagan, A., and Quaife, T. (2013), “Quantifying Uncertainty in Remotely Sensed Land Cover Maps,” Stochastic Environmental Research and Risk Assessment, 27, 1239–1251. DOI: 10.1007/s00477-012-0660-3.
  • Curtis, A., Worley, S., Adamson, P., Chung, E., Niazi, I., Sherfesee, L., Shinn, T., and Sutton, M. (2013), “Biventricular Pacing for Atrioventricular Block and Systolic Dysfunction,” New England Journal of Medicine, 368, 1585–1593. DOI: 10.1056/NEJMoa1210356.
  • Easterbrook, P. J., Gopalan, R., Berlin, J., and Matthews, D. R. (1991), “Publication Bias in Clinical Research,” The Lancet, 337, 867–872.
  • Ellenberg, S. S., Fleming, T. R., and DeMets, D. L. (2003), Data Monitoring Committees in Clinical Trials: A Practical Perspective, Chichester: Wiley.
  • Finfer, S., and Bellomo, R. (2009), “Why Publish Statistical Analysis Plans,” Critical Care and Resuscitation, 11, 5–6.
  • Franco, A., Malhotra, N., and Simonovits, G. (2014), “Publication Bias in the Social Sciences: Unlocking the File Drawer,” Science, 345, 1502–1505. DOI: 10.1126/science.1255484.
  • Fuller, R., Smith, G., Sanderson, J., Hill, R., Thomson, A., Cox, R., Brown, N., Clarke, R., Rothery, P., and Gerard, F. (2002), “Countryside Survey 2000 Module 7. Land Cover Map 2000,” Final Report, CSLCM/Final.
  • Garland, T., Jr. (2016), “Scientific Method as an Ongoing Process,” available at https://en.wikipedia.org/w/index.php?title=Scientific_method&oldid=822947033.
  • Gelman, A., and Hennig, C. (2017), “Beyond Subjective and Objective in Statistics,” Journal of the Royal Statistical Society, Series A, 180, 967–1033. DOI: 10.1111/rssa.12276.
  • Gibson, E. W. (2017), “Leadership in Statistics: Increasing Our Value and Visibility,” The American Statistician, DOI: 10.1080/00031305.2017.1336484.
  • Gosling, J. P., and O’Hagan, A. (2006), “Understanding the Uncertainty in the Biospheric Carbon Flux for England and Wales,” Technical Report 567/06, University of Sheffield, available at http://tonyohagan.co.uk/academic/pdf/UUCF.pdf.
  • Grice, J. W. (2017), “Comment on Locascio’s Results Blind Manuscript Evaluation Proposal,” Basic and Applied Social Psychology, 39, 254–255. DOI: 10.1080/01973533.2017.1352505.
  • Haines-Young, R., Barr, C., Black, H., Briggs, D., Bunce, R., Clarke, R., Cooper, A., Dawson, F., Firbank, L., Fuller, R., Furse, M. T., Gillespie, M. K., Hill, R., Hornung, M., Howard, D. C., McCann, T., Morecroft, M. D., Petit, S., Sier, A. R. J., Smart, S. M., Smith, G. M., Stott, A. P., Stuart, R. C., and Watkins, J. W. (2000), “Accounting for Nature: Assessing Habitats in the UK Countryside,” Natural Environment Research Council and Centre for Ecology and Hydrology, Department of the Environment, Transport and the Regions, available at https://trove.nla.gov.au/work/17604300.
  • Harris, K., O’Hagan, A., and Quegan, S. (2010), “The Impact of Satellite-Derived Land Cover Uncertainty on Carbon Cycle Calculations,” available at tonyohagan.co.uk/academic/pdf/CTCDPaper_v6.pdf.
  • Hawkins, D. M. (2004), “The Problem of Overfitting,” Journal of Chemical Information and Computer Sciences, 44, 1–12. DOI: 10.1021/ci0342472.
  • Head, M. L., Holman, L., Lanfear, R., Kahn, A. T., and Jennions, M. D. (2015), “The Extent and Consequences of P-Hacking in Science,” PLoS Biology, 13, e1002106. DOI: 10.1371/journal.pbio.1002106.
  • HM Treasury (2015), “The Aqua Book: Guidance on Producing Quality Analysis for Government,” HM Government, London, UK, available at https://www.gov.uk/government/publications/theaqua-book-guidance-on-producing-quality-analysis-for-government (accessed July 10, 2017).
  • Hyman, M. R. (2017), “Can ‘Results Blind Manuscript Evaluation’ Assuage ‘Publication Bias’?,” Basic and Applied Social Psychology, 39, 247–251. DOI: 10.1080/01973533.2017.1350581.
  • Jacobson, M., Besch, C., Child, C., Hafner, R., Matts, J., Muth, K., Wentworth, D., Neaton, J., Abrams, D., Rimland, D., and Perez, G. (1994), “Primary Prophylaxis With Pyrimethamine for Toxoplasmic Encephalitis in Patients With Advanced Human Immunodeficiency Virus Disease: Results of a Randomized Trial,” The Journal of Infectious Diseases, 169, 384–394. DOI: 10.1093/infdis/169.2.384.
  • Kennedy, M., Anderson, C., O’Hagan, A., Lomas, M., Woodward, I., Gosling, J. P., and Heinemeyer, A. (2008), “Quantifying Uncertainty in the Biospheric Carbon Flux for England and Wales,” Journal of the Royal Statistical Society, Series A, 171, 109–135.
  • Kerr, N. L. (1998), “HARKing: Hypothesizing After the Results Are Known,” Personality and Social Psychology Review, 2, 196–217, PMID: 15647155. DOI: 10.1207/s15327957pspr0203_4.
  • Kynn, M. (2008), “The ‘Heuristics and Biases’ Bias in Expert Elicitation,” Journal of the Royal Statistical Society, Series A, 171, 239–264.
  • LaVange, L. M. (2014), “The Role of Statistics in Regulatory Decision Making,” Therapeutic Innovation & Regulatory Science, 48, 10–19. DOI: 10.1177/2168479013514418.
  • Lindsay, R. M., and Ehrenberg, A. S. (1993), “The Design of Replicated Studies,” The American Statistician, 47, 217–228. DOI: 10.2307/2684982.
  • Locascio, J. J. (2017a), “Rejoinder to Responses on ‘Results-Blind Publishing’,” Basic and Applied Social Psychology, 39, 258–261. DOI: 10.1080/01973533.2017.1356305.
  • Locascio, J. J. (2017b), “Results Blind Science Publishing,” Basic and Applied Social Psychology, 39, 239–246. DOI: 10.1080/01973533.2017.1336093.
  • Marks, M. (2017), “Commentary on Locascio,” Basic and Applied Social Psychology, 39, 252–253. DOI: 10.1080/01973533.2017.1350580.
  • McNutt, M. (2014), “Journals Unite for Reproducibility,” Science, 346, 679.
  • National Academies of Sciences, Engineering, and Medicine, Division of Behavioral and Social Sciences and Education, Committee on National Statistics, Panel on Improving Federal Statistics for Policy and Social Science Research Using Multiple Data Sources and State-of-the-Art Estimation Methods, Harris-Kojetin, B. A., and Groves, R. M., eds. (2017), Federal Statistics, Multiple Data Sources, and Privacy Protection: Next Steps, Washington, DC: National Academies Press.
  • National Research Council, Division on Engineering and Physical Sciences, Board on Mathematical Sciences and Their Applications, and Committee on Mathematical Foundations of Verification, Validation, and Uncertainty Quantification (2012a), Assessing the Reliability of Complex Models: Mathematical and Statistical Foundations of Verification, Validation, and Uncertainty Quantification, Washington, DC: National Academies Press.
  • National Research Council, Division of Behavioral and Social Sciences and Education, Division on Engineering and Physical Sciences, Committee on National Statistics, Computer Science and Telecommunications Board, and Panel on Communicating National Science Foundation Science and Engineering Information to Data Users (2012b), Communicating Science and Engineering Data in the Information Age, Washington, DC: National Academies Press.
  • NICE (2012), “The Guidelines Manual: Process and Methods,” available at https://www.nice.org.uk/process/pmg6/.
  • Nussbaum, B. (2017), “Introductory Remarks: JSM 2017 Presidential Invited Address,” available at https://ww2.amstat.org/meetings/jsm/2017/webcasts/index.cfm.
  • Oakley, J. (2016), “Introduction to Uncertainty Quantification and Gaussian Processes,” available at http://gpss.cc/gpuqss16/slides/oakley.pdf.
  • O’Brien P. C., and Fleming T. R (1979), “A Multiple Testing Procedure for Clinical Trials,” Biometrics , 35 , 549–556. [ PubMed ] [ Google Scholar ]
  • O’Hagan A. (2006), “Bayesian Analysis of Computer Code Outputs: A Tutorial,” Reliability Engineering & System Safety , 91 , 1290–1300. DOI: 10.1016/j.ress.2005.11.025. [ CrossRef ] [ Google Scholar ]
  • O’Hagan A. (2012), “Probabilistic Uncertainty Specification: Overview, Elaboration Techniques and Their Application to a Mechanistic Model of Carbon Flux,” Environmental Modelling & Software , 36 , 35–48. DOI: 10.1016/j.envsoft.2011.03.003. [ CrossRef ] [ Google Scholar ]
  • O’Hagan A. (2018), “Expert Knowledge Elicitation: Subjective, But Scientific,” [submitted]. [ Google Scholar ]
  • Oliver K. A., and de Vocht F (2017), “Defining ‘Evidence’ In Public Health: A Survey of Policymakers’ Uses and Preferences ,” European Journal of Public Health , 27 , 112–117. DOI: 10.1093/eurpub/ckv082. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Oliver K., Lorenc T., and Innvaer S (2014), “New Directions in Evidence-Based Policy Research: A Critical Analysis of the Literature,” Health Research Policy and Systems , 12 , 34, DOI: 10.1186/1478-4505-12-34. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Ott M. G. (1991), “Importance of the Study Protocol in Epidemiologic Research,” Journal of Occupational Medicine , 33 , 1236–1239. [ PubMed ] [ Google Scholar ]
  • Oxford University Press (2018a), “Judgment,” available at https://en.oxforddictionaries.com/definition/judgment .
  • Oxford University Press (2018b), “Knowledge,” available at https://en.oxforddictionaries.com/definition/knowledge .
  • Oxford University Press (2018c), “Opinion,” available at https://en.oxforddictionaries.com/definition/opinion .
  • Oxford University Press (2018d), “Scientific Method,” available at https://en.oxforddictionaries.com/definition/scientific_method .
  • Pocock S. J. (2006), “Current Controversies in Data Monitoring for Clinical Trials,” Clinical Trials , 3 , 513–521. DOI: 10.1177/1740774506073467. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Shih J. (1995), “Sample Size Calculation for Complex Clinical Trials With Survival Endpoints,” Controlled Clinical Trials , 16 , 395–407. DOI: 10.1016/S0197-2456(95)00132-8. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Wasserstein R. L., and Lazar N. A (2016), “The ASA’s Statement on p -Values: Context, Process, and Purpose,” The American Statistician , 70 , 129–133. DOI: 10.1080/00031305.2016.1154108. [ CrossRef ] [ Google Scholar ]
  • Woodward F. I., and Lomas M. R (2004), “Vegetation Dynamics—Simulating Responses to Climatic Change,” Biological Reviews , 79 , 643–670. DOI: 10.1017/S1464793103006419. [ PubMed ] [ CrossRef ] [ Google Scholar ]
