Why Maximize Expected Value?
Summary. Standard Bayesian decision theory tells us to maximize the expected value of our actions.[1] For instance, suppose we see a number of kittens stuck in trees, and we decide that saving some number n of kittens is n times as good as saving one kitten. Then, if we are faced with the choice of either saving a single kitten with certainty or having a 50-50 shot at saving three kittens (where, if we fail, we save no kittens), then we ought to try to save the three kittens, because doing so has expected value 1.5 (= 3*0.5 + 0*0.5), rather than the expected value of 1 (= 1*1) associated with saving the single kitten. But why expected value? Why not instead maximize some other function of probabilities and values? I present two intuitive arguments in this piece. First, in certain situations, maximizing the expected number of organisms helped is equivalent to maximizing the probability that any given organism is helped. Second, even in cases where that isn't true, the law of large numbers will often guarantee a better outcome over the long run, especially in cases involving apparent physical randomness.
A Fictional Example
An unknown disease has broken out among the 20,000 inhabitants of a small island. The disease is highly contagious: it spreads to everyone on the island before anyone detects it. Fortunately, because the island is isolated, there is no danger that the disease will spread to other parts of the world. Unfortunately for the islanders themselves, the disease is also 100% fatal, and each person now has only three days to live.
The world medical community has no drugs to treat the disease, or even to stave off its fatal side effects. Nonetheless, medical teams are dispatched to the island in order to provide palliative care. The medical teams have a limited budget of $10,000 with which to buy analgesics that, if successful, will alleviate the painfulness of death by the disease. You, the director of the medical team, are deciding which of two possible medicines to buy: SureRelieve, which costs $2.04 per treatment and always works, or CheapRelieve, which costs $1 per treatment but works only 50% of the time.
Since you believe that pointless suffering prior to death is equally bad regardless of which of the islanders experiences it, you are of the opinion that successfully treating n people is n times as good as successfully treating one person. You reason as follows: "If we buy SureRelieve, we are guaranteed to prevent the suffering of 10,000/2.04 ≈ 4,900 people. If we choose CheapRelieve, we'll be able to buy 10,000 treatments, but it's unclear how many people we'll help. Since each treatment has a 50% chance of success, the expected value of the number of people helped is 10,000*(1*0.5 + 0*0.5) = 5,000. This is higher than 4,900, so we should buy CheapRelieve."
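The director's comparison can be sketched in a few lines of Python. The per-treatment prices used here ($2.04 and $1.00) are the ones implied by the director's arithmetic, and the function name `expected_helped` is purely illustrative. Prices are kept in integer cents to avoid floating-point rounding.

```python
# Illustrative sketch; prices and success rates are taken from the director's arithmetic.
BUDGET_CENTS = 1_000_000          # $10,000
SURE_PRICE_CENTS = 204            # $2.04 per SureRelieve treatment (implied by 10,000/2.04)
CHEAP_PRICE_CENTS = 100           # $1.00 per CheapRelieve treatment (10,000 treatments for $10,000)


def expected_helped(budget_cents, price_cents, p_success):
    """Expected number of people helped if the whole budget buys one medicine."""
    treatments = budget_cents // price_cents   # whole treatments only
    return treatments * p_success


sure = expected_helped(BUDGET_CENTS, SURE_PRICE_CENTS, 1.0)
cheap = expected_helped(BUDGET_CENTS, CHEAP_PRICE_CENTS, 0.5)
print(sure, cheap)  # 4901.0 5000.0
```

The exact quotient for SureRelieve is 4,901 whole treatments; the text rounds this to 4,900, which does not change the comparison.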
But what if lots more medicines fail than expected? What if, say, only 4,800 of them work? Then we will have "gambled away" treatments that could have helped 100 people. Isn't it better to stick with the safe bet?
Point 1: Take a Vote
Suppose we don't decide ahead of time which of the islanders will get the treatments we buy. Then if we have t treatments, the probability is t/20,000 that any individual will get a treatment. We then take a poll of the islanders to ask if they would prefer having the medical team buy all SureRelieve, all CheapRelieve, or some combination of both.
If the islanders vote for the option that maximizes their probability of being successfully treated, then they will all vote to buy all CheapRelieve. This follows from a simple
Theorem. Suppose there are N organisms who will experience some amount of brutal pain unless they receive help. Let T be a random variable for the number of organisms--randomly chosen from the N organisms--that will successfully avoid the painful experience by receiving help. (T is always less than or equal to N.) Then the probability that any organism avoids the pain is E(T) / N, where E(T) denotes the expected value of T. In particular, the probability of avoiding the pain always increases as E(T) increases, regardless of the variance of T.
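The theorem follows in one line from the law of total probability. In the sketch below, p is an assumed symbol (not used in the text) for the probability that a given organism avoids the pain. Given T = t, the t helped organisms are chosen at random from the N, so any particular organism is among them with probability t/N.

```latex
\[
p \;=\; \sum_{t=0}^{N} \Pr(T = t)\,\frac{t}{N}
  \;=\; \frac{1}{N}\sum_{t=0}^{N} t\,\Pr(T = t)
  \;=\; \frac{E(T)}{N}
\]
```

For the islanders, N = 20,000, so all CheapRelieve gives each person a probability of 5,000/20,000 = 0.25 of being successfully treated, against roughly 4,900/20,000 = 0.245 for all SureRelieve.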
We can also apply this thought to the kitten example from before. Suppose you're one of the kittens, and you're deciding whether you want your potential rescuer to save one of the three or take a 50-50 shot at saving all three. In the former case, the probability is 1/3 that you'll be saved. In the latter case, the probability is 1 that you'll be saved if the rescuer is successful and 0 if not. Since each of these is equally likely, your overall probability of being saved is (1/2)*1 + (1/2)*0 = 1/2, which is bigger than 1/3.
I should note that in practice people in situations like that of the islanders may not actually choose the option that maximizes their probability of being helped, perhaps on account of ambiguity aversion, as illustrated in the Ellsberg paradox. Not knowing how many total successful treatments are available may be more ambiguous than knowing the actual number of treatments and merely being uncertain about who will receive them.
Point 2: The Law of Large Numbers
The above point works well in situations where the potential benefits being distributed are equal, so that people care only about their probability of receiving the benefit. But what about situations where potential benefits are unequal--e.g., preventing someone from getting a cold versus preventing someone from getting malaria? Clearly it's not desirable for people merely to choose the option that maximizes their probability of getting some treatment, because, e.g., a probability 1/2 of avoiding the common cold is clearly not better than a probability 1/3 of avoiding malaria. We need to impose some utility function on different outcomes that specifies how much better malaria prevention is than cold prevention.
If we randomly distributed cold-prevention and malaria-prevention among a group of people who maximized their expected individual utility, then it's not hard to show that they would prefer the treatment method that maximized the expected utility of the whole group. But this begs the question, because we need to understand why people would want to maximize their expected individual utility.
The reason that is usually put forward is that, when decisions are made repeatedly regarding some random event, maximizing expected value makes it probable that, over long periods of time, you'll maximize the actual average value. This follows from the law of large numbers, which says that if we do enough uncorrelated random trials (e.g., rolling a die enough times), we can become as certain as we like that the actual average value we observe in our trials (e.g., the average of the dice rolls that we make) will be as close as we like to the expected value (which, in this case, is 3.5 = 1*(1/6) + 2*(1/6) + ... + 6*(1/6)).[2]
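A small simulation can make the dice example concrete. This is only a sketch: the function name `average_roll`, the sample sizes, and the fixed seed are arbitrary choices made for illustration.

```python
import random


def average_roll(num_rolls, rng):
    """Average of num_rolls independent rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(num_rolls)) / num_rolls


rng = random.Random(0)  # fixed seed so that runs are repeatable
for n in (10, 1_000, 100_000):
    # As n grows, the printed average should settle near the expected value 3.5.
    print(n, average_roll(n, rng))
```

With only 10 rolls the average can easily land far from 3.5; with 100,000 rolls, large deviations become very unlikely.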
In the island disease example, the number of people treated by CheapRelieve is a sum of 10,000 random outcomes. This is a "large number," which means the probability that the actual number of people treated deviates significantly from 5,000 is small. In fact, the chance is only 2.3% that CheapRelieve will successfully treat fewer people than SureRelieve.[3]
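The 2.3% figure can be checked numerically. The sketch below uses only Python's standard library to compute the normal approximation (z = -2) and, as a cross-check, the exact binomial tail P(X <= 4,899) for X ~ Binomial(10,000, 0.5). The helper names are illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binomial_cdf_half(n, k_max):
    """P(X <= k_max) for X ~ Binomial(n, 1/2), using exact integer arithmetic."""
    term = 1    # C(n, 0)
    total = 0
    for k in range(k_max + 1):
        total += term
        term = term * (n - k) // (k + 1)   # C(n, k+1) from C(n, k); division is exact
    return total / 2**n


n, p = 10_000, 0.5
mu = n * p                             # 5,000
sigma = math.sqrt(n * p * (1 - p))     # 50
z = (4_900 - mu) / sigma               # -2
print(normal_cdf(z))                   # about 0.023
print(binomial_cdf_half(n, 4_899))     # exact tail; close to the approximation
```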
What about Mixed Strategies?
For instance, why not spend $5,000 on SureRelieve and $5,000 on CheapRelieve? With this strategy, you can buy 2,450 SureRelieve treatments and 5,000 CheapRelieve treatments. The expected number of people helped is 2,450 + 0.5*5,000 = 4,950. Here, we've bought a little bit of "insurance" against extremely low numbers of people helped, but at the cost of the chance to actually help more people. And the insurance rarely pays off: the chance is only about 21% that our mixed strategy will help more people than the riskier strategy.[4]
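The 21% figure can be reproduced with the normal approximation. The sketch below treats the two CheapRelieve purchases (10,000 treatments in the all-CheapRelieve strategy, 5,000 in the mixed one) as independent binomials, which is a modeling assumption; variable names are illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


P_SUCCESS = 0.5
ALL_CHEAP = 10_000     # treatments bought under the all-CheapRelieve strategy
MIXED_CHEAP = 5_000    # CheapRelieve treatments under the 50/50 split
MIXED_SURE = 2_450     # SureRelieve treatments under the 50/50 split

# Mean and standard deviation of (all-cheap successes) - (mixed cheap successes).
mu = (ALL_CHEAP - MIXED_CHEAP) * P_SUCCESS                                   # 2,500
sigma = math.sqrt((ALL_CHEAP + MIXED_CHEAP) * P_SUCCESS * (1 - P_SUCCESS))   # about 61.2

# The mixed strategy wins when that difference falls below the 2,450 sure successes.
z = (MIXED_SURE - mu) / sigma
p_mixed_better = normal_cdf(z)
print(round(z, 3), round(p_mixed_better, 3))  # roughly -0.816 and 0.207
```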
If we had spent less than 50% of our budget on SureRelieve, this gap in expected values would have narrowed, but our insurance would have declined along with it. I see no reason to prefer a mixed strategy: if buying some CheapRelieve will help more than buying no CheapRelieve, then buying all CheapRelieve will be even better. If the improvement of buying all CheapRelieve over mostly CheapRelieve is hard to see with only 10,000 people getting treatments, then consider 10 trillion or 10 googol. In those cases, it's practically guaranteed that you'll help more people by buying all CheapRelieve. (And as the Appendix points out, for certain types of randomness, the number of replications may not be merely 10 trillion or 10 googol but infinitely many.)
Implications
Now consider the following. You are again the medical-project director, and you discover that you've gotten an extra donation of $51 with which to buy more medicines. If you buy the SureRelieve, you'll be guaranteed to help 51/2.04 = 25 people. If you buy CheapRelieve, the expected number of people you'll help is 25.5. But now, there's roughly a 39% chance that CheapRelieve will help fewer people, perhaps several fewer. Do you decide that, unlike before, this case is too risky, so it's best to play it safe?
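With only 51 treatments, the normal approximation is coarse, so an exact binomial computation is a useful check. The sketch below (function names are illustrative) gives roughly 39% for the exact probability that CheapRelieve helps fewer than 25 people, versus roughly 44% from a normal approximation without continuity correction.

```python
import math


def prob_fewer_than(n, threshold):
    """Exact P(X < threshold) for X ~ Binomial(n, 1/2)."""
    return sum(math.comb(n, k) for k in range(threshold)) / 2**n


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


n = 51                                   # $51 buys 51 CheapRelieve treatments
exact = prob_fewer_than(n, 25)           # P(fewer than 25 successes)
sigma = math.sqrt(n * 0.5 * 0.5)
approx = normal_cdf((25 - n * 0.5) / sigma)   # no continuity correction
print(round(exact, 3), round(approx, 3))      # roughly 0.39 and 0.444
```

Either way, the chance of doing worse than the safe option is far higher than the 2.3% seen with the full $10,000 budget, which is what makes the small case feel riskier.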
Hopefully not. The extra $51 is not isolated; it's part of the overall budget. If you had started out with a budget of $10,051, the no-mixed-strategies argument above says that you should have used all of it to buy CheapRelieve, because that would have almost guaranteed a better outcome, possibly much better.
It's tempting to be risk-averse with our charitable actions. For instance, suppose we decide to invest $1,000 in the capital markets while we wait to donate it to a humanitarian group. We might say to ourselves, "This money is for an important purpose. I would feel so bad if I invested it in a fund that tanked and lost most of its value. No, I'm going to stick with safe investments that will guarantee that this money gets to those who need it. I'll invest it in government bonds, rather than some high-risk stock or derivative security." We might proceed to invest the money, earn 4% interest over the next year, and donate the $1,040 to our favorite charity, feeling good about ourselves the whole time.
But how is this example different from the medical-program director who gets the extra $51 donation? Some of the work that our charity does will happen with or without our donation; we'll just be expanding the amount of work that the charity can do. From this broader perspective, it won't be catastrophic if our $1,000 disappears, because other money will still be there. But if we achieve really high returns from our risky investment,[5] we will have done a lot more good. (See The Case for Risky Investments.)
Infinite Outcomes
As William Feller notes on p. 251 of An Introduction to Probability Theory and Its Applications, the weak law of large numbers fails for random variables with infinite expectation, so the long-run-average argument falls through. Similarly, the von Neumann-Morgenstern expected-utility theorem, which is also sometimes invoked, relies on a continuity axiom that fails to hold when we allow infinitely large utility values (without also allowing infinitesimal probabilities). See the relevant section of A Defense of Pascal's Wager for some approaches to infinite decision theory.
What about Isolated Actions?
The long-run-average idea applies to cases in which our donations or actions will be one part of a larger ensemble of actions. But what if that isn't the case? What if we encounter a one-time all-or-nothing situation in which we can't rest assured that the law of large numbers will make things work out okay overall?
Scenario. You are the only sentient organism in the universe, but you learn that, at 5 p.m. tomorrow, 2 million people will come into existence for an hour, be brutally tortured, and then vanish again. No other sentient organisms will exist afterwards.
You discover a certain box that has two buttons, one red and one blue. The Red Button, if pressed, has a one-in-a-million chance of preventing all two million of the people from being tortured; instead, they'll come into existence for an hour and read the newspaper before vanishing. If the Blue Button is pressed, it will, with certainty, allow exactly one of the two million people to avoid torment and instead read the newspaper. You can only press one button because once one of these two buttons is pressed, the box vanishes forever.
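The numbers in this scenario can be tallied as follows. This is only an illustrative sketch: the variable names are made up, and the per-person figures assume that the one person spared by the Blue Button is chosen uniformly at random, which the scenario does not specify.

```python
from fractions import Fraction

PEOPLE = 2_000_000

# Red Button: one-in-a-million chance of sparing everyone, otherwise no one.
red_expected = Fraction(1, 1_000_000) * PEOPLE + Fraction(999_999, 1_000_000) * 0
# Blue Button: exactly one person is spared, with certainty.
blue_expected = Fraction(1) * 1

# Per-person chance of being spared (assumes a uniformly random beneficiary
# under the Blue Button; this is an assumption for illustration).
red_per_person = red_expected / PEOPLE
blue_per_person = blue_expected / PEOPLE

print(red_expected, blue_expected)        # 2 1
print(red_per_person, blue_per_person)    # 1/1000000 1/2000000
```

On these numbers the Red Button has twice the expected value of the Blue Button and, correspondingly, gives each person twice the chance of being spared.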
Here, the argument about long-run averages seems not to apply because there are no repetitions of the event. The "take a vote" argument would apply, if we could poll in advance the 2 million people that would be coming into existence. However, it's possible to devise more complicated thought experiments in which this point, too, would break down. At this point, I would be willing simply to accept the expected-value criterion as an axiomatic intuition: The potential good accomplished by the Red Button is just so great that a chance for it shouldn't be forgone. However, below I survey two additional arguments.
Argument 1: Quantum MWI
The many-worlds interpretation (MWI) of quantum mechanics enjoys relatively large support among certain groups of physicists and presents what I consider a more coherent view than the Copenhagen interpretation. According to MWI, apparently random quantum events do not select a particular measurement outcome; rather, all of the possibilities are realized in different, parallel worlds. For instance, if we put a cat into a box hooked up to a Geiger-counter-triggered poisonous-gas machine, it's not that there's a 50% chance that the cat will be killed; rather, there are two different world branches, and in one, the cat actually is killed. Thus, an expected value (using a probability distribution that matches the fractions of the various worlds realized) does not just reflect what might happen: It actually counts what does happen. So if the efficacy of the Red Button in the previous example is determined by a quantum outcome, the fact that this is a one-shot action doesn't matter: In a small fraction of worlds, you actually do prevent all 2 million people from being tortured!
Two qualifications are in order. First, the naïve picture about counting numbers of worlds is not quite right -- see, e.g., Understanding Deutsch's Probability in a Deterministic Multiverse by Hilary Greaves (2004), sec. 5.3. What really counts are measures given by the Born rule. But this raises the question of what exactly measure is and how to justify Born probabilities rather than some other measure (like one based on having an odd number of socks -- see sec. 3.2). Indeed, Greaves (2004) concludes that using Born probabilities in decision theory may simply need to be taken as something of a primitive (p. 34), which brings us right back to square one (Why expected value?) except perhaps to the extent that other MWI-based intuitions can be adduced.
Second, even if we agree that we should use Born-rule probabilities, this only applies to physical uncertainties, such as whether an electron will be measured spin-up or spin-down, or whether neurons in my brain will fire in a way that causes me to drive off the side of the road. Ideally, we want to maximize expected values calculated according to the true Born-rule measures over various worlds. But our probability distributions are not perfect: Much of our uncertainty about the future is not due to quantum splitting but merely our own ignorance, which might not be anything close to the true distribution of measure over outcomes. Moreover, we may assign meta-level probabilities that don't refer to specific outcomes at all (e.g., What's the probability that MWI is false? How likely is this or that law of physics to be true?). The MWI justification for maximizing expected value only holds to the extent that our subjective probability distributions match true quantum measures.
Argument 2: Evidential / Subjunctive Decision Theory
Philosophers debate the relative merits of causal decision theory -- evaluating actions in terms of the causal impact that they will have to make things better -- against evidential decision theory -- which says you should do the action that, if chosen, would give you the best expectations for how things will turn out. In his Good and Real, Gary Drescher proposes an intermediate version of these theories based on subjunctive means-ends relations: what would be the case if an action were taken, where, in making such counterfactual inferences, we give preference to more general and explanatory links (p. 212). If we accept either evidential decision theory or Drescher's subjunctive form, then there may be a further argument for maximizing expected value even in one-time situations.
It runs along the lines of Drescher's defense of altruism based on the Prisoner's Dilemma in Ch. 7 of his book. If you are one of the prisoners, you would like to find it the case that your partner-in-crime wants to cooperate instead of defecting. Since you and he are humans running similar cognitive decision-making algorithms in symmetric situations, the fact that you choose to cooperate is an (acausal) link to his cooperation. Applied more generally, to the extent that your choices to help others reflect upon the outcomes of the decision-making algorithms that others are using, it's good for you to behave altruistically.
Applied to the Red-Button example from before, we can say that even if this is the only time you'll ever have the opportunity to press a button and thereby potentially prevent torture, you would like it to be the case that others, in similar situations, behave the way you did, because aggregated over all such situations, that will prevent more total people from being tortured.
I'm not totally comfortable with the above point, because I'm not sure how far Drescher's idea should be taken. In the Prisoner's Dilemma case, where your opponent is assumed to be symmetric to yourself in all respects, cooperation on evidential / subjunctive grounds is clearly the right answer. But other people in the world are not symmetric copies of me and may be running different decision algorithms. Drescher addresses this objection, but I'm not sure whether I buy his answer.
However, suppose we live in what Nick Bostrom has called a Big World, i.e., a multiverse of at least Level I in Max Tegmark's classification scheme. This is, as Tegmark notes, rather uncontroversial among physicists, since it requires only cosmic inflation (p. 1). Then it's not strictly true that other people in the world are not symmetric copies of me: In fact there is an exact copy of me roughly 10^(10^29) meters away (Parallel Universes, p. 4). And there are infinitely many such particle-for-particle replicas throughout the multiverse, giving me confidence that if I press the Red Button, it will in fact be pressed infinitely many times (Infinite Ethics, p. 39).
How much does this buy us? If we viewed the universe as operating stochastically to determine the outcome of pressing the Red Button (say, due to quantum collapse), then it would clinch the expected-value-maximizing argument, for then we would have infinitely many repetitions of the random process, and the law of large numbers would practically guarantee that more total tortures were prevented (assuming such a notion makes sense despite the infinite number of tortures prevented in either case).[6] On the other hand, if the Red Button's outcome was decided in a (non-MWI) deterministic way, the infinite repetitions might not matter, since if the initial particle configurations of the region of the universe containing each copy were the same, the result of the button push in each case would also be the same: Either the 2 million are saved in every instance, or in none.[7] Finally, if we took an MWI view, the case for maximizing expected value would hold, but no more strongly than by Argument 1 alone.
The Big-World argument in the preceding paragraph, like the MWI argument, relies on physical randomness (or apparent randomness) regarding the result of pressing the Red Button; it doesn't hold when the result of the button push is fixed and we're merely uncertain what it is. Drescher's "act as you would like to see others act" does still apply to merely subjective uncertainty, assuming that over the long run, more total good does result when large numbers of people choose actions maximizing expected values according to their subjective (non-physically uncertain) probability distributions. How far this argument for expected-value maximization can be taken depends on how much we accept Drescher's thesis in general.
[1] In mathematical language, this means that we consider a sample space of possible worlds (e.g., one possible world might include a kitten being saved from a tree, while another possible world might involve the same kitten not being saved). We then decide upon an objective function that maps from our sample space to the real numbers (or perhaps the hyperreal numbers or something similar). We then consider some set of possible actions (assumed finite for simplicity) we might take. For each action, we assign a subjective probability distribution to our sample space which recognizes the various possible results of taking that action (e.g., if our action is to call the firefighter, this probability distribution would say how likely it is that the kitten will be saved). So, for each action, our objective function becomes a random variable. Standard decision theory says the following: If, for each action, the objective function has finite expectation, then choose an action whose expectation is maximal.
If we are utilitarians, then our objective function maps from possible worlds to cardinal utility assignments.
[2] This is technically the weak law of large numbers, which holds in more cases than does the strong law.
[3] This number is easily computed by the normal approximation to the binomial distribution. With CheapRelieve, mu = 0.5*10,000 = 5,000, sigma = (10,000*0.5*(1-0.5))^(1/2) = 50, z = (4,900 - 5,000)/50 = -2. The chance is 2.3% that a standard normal random variable will be less than -2.
[4] Consider the difference of two random variables: one binomial(10,000, 0.5) and the other binomial(5,000, 0.5). The probability that the mixed strategy does better is the probability that the difference of these two is less than 2,450. Approximate both as independent normally distributed variables. The difference of the two has variance equal to the sum of the individual variances: 10,000*0.5*(1-0.5) + 5,000*0.5*(1-0.5) = 3,750, which implies sigma = 61.2. mu = 5,000 - 2,500 = 2,500. Our probability is the probability that a standard normal random variable will be below (2,450 - 2,500)/61.2 = -0.816, which is about 21%.
[5] That riskier assets yield higher expected returns is well established. The Capital Asset Pricing Model is one theoretical justification, but the proposition holds under far weaker assumptions: efficient capital markets and risk-averse investors are sufficient.
[6] This is an interesting result, because it means that people who reject MWI but still believe in a Level-I multiverse (which is a large proportion of physicists) should, if they also accept evidential / subjunctive decision theory, feel the same moral urgency as is felt by people David Pearce has called "responsible Everettistas": People who, say, realize that when they drive their cars, they actually kill pedestrians in some fraction of worlds, instead of merely having a chance of doing so. The non-many-worlders face the same situation due to the law of large numbers: Some fraction of their infinite copies will almost surely randomly run off the road and hit people. Of course, I don't think there is any additional urgency in post-Everett decision theory, because utilitarians were already concerned with maximizing expected value, and it's important not to give excessive mental weight to tiny-measure worlds.
[7] Of course, there could be exact replicas of myself whose surrounding environment was different. All of my copies would still press the Red Button, but if their surroundings differed slightly, the result of the button push might indeed vary from one to the next.