Why Maximize Expected Value?

Standard Bayesian decision theory tells us to maximize the expected value of our actions.[1] For instance, suppose we see a number of kittens stuck in trees, and we decide that saving some number n of kittens is n times as good as saving one kitten. Then, if we are faced with the choice of either saving a single kitten with certainty or having a 50-50 shot at saving three kittens (where, if we fail, we save no kittens), we ought to try to save the three kittens, because doing so has expected value 1.5 (= 3*0.5 + 0*0.5), rather than the expected value of 1 (= 1*1) associated with saving the single kitten. But why expected value? Why not instead maximize some other function of probabilities and values? I present two intuitive arguments in this piece.

A Fictional Example

An unknown disease has broken out among the 20,000 inhabitants of a small island. The disease is highly contagious: it spreads to everyone on the island before anyone detects it. Fortunately, because the island is isolated, there is no danger that the disease will spread to other parts of the world. Unfortunately for the islanders themselves, the disease is also 100% fatal, and each person now has only three days to live.

The world medical community has no drugs to treat the disease, or even to stave off its fatal side effects. Nonetheless, medical teams are dispatched to the island in order to provide palliative care. The medical teams have a limited budget of $10,000 with which to buy analgesics that, if successful, will ease the pain of dying from the disease. You, the director of the medical team, are deciding which of two possible medicines to buy: SureRelieve, which costs $2.04 per treatment and always works, or CheapRelieve, which costs $1.00 per treatment but has only a 50% chance of working each time it is used.

Since you believe that pointless suffering prior to death is equally bad regardless of which of the islanders experiences it, you are of the opinion that successfully treating n people is n times as good as successfully treating one person. You reason as follows: "If we buy SureRelieve, we are guaranteed to prevent the suffering of 10,000/2.04, or about 4,900, people. If we choose CheapRelieve, we'll be able to buy 10,000 treatments, but it's unclear how many people we'll help. Since each treatment has a 50% chance of success, the expected value of the number of people helped is 10,000*0.5 + 0*0.5 = 5,000. This is higher than 4,900, so we should buy CheapRelieve."
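The director's arithmetic can be sketched in a few lines of Python. The per-treatment prices here are assumptions read off the passage's own numbers ($2.04 from the division 10,000/2.04, and $1.00 from buying 10,000 treatments with $10,000), and working in whole cents is just a convenience to avoid floating-point rounding.

```python
# Prices in cents; both are inferred from the arithmetic in the passage.
BUDGET_CENTS = 10_000 * 100
SURE_PRICE_CENTS = 204       # SureRelieve: always works
CHEAP_PRICE_CENTS = 100      # CheapRelieve: works half the time
CHEAP_SUCCESS_PROB = 0.5

# Whole treatments the budget can buy.
sure_treatments = BUDGET_CENTS // SURE_PRICE_CENTS
cheap_treatments = BUDGET_CENTS // CHEAP_PRICE_CENTS

# Every SureRelieve treatment works, so the number helped is certain.
sure_helped = sure_treatments
# For CheapRelieve, only the expected number helped is known in advance.
cheap_expected_helped = cheap_treatments * CHEAP_SUCCESS_PROB

print(f"SureRelieve:  {sure_helped} people helped (certain)")
print(f"CheapRelieve: {cheap_expected_helped:.0f} people helped (expected)")
```

Integer division gives 4,901 SureRelieve treatments, which the passage rounds to 4,900; the comparison with 5,000 is unaffected.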

But what if lots more medicines fail than expected? What if, say, only 4,800 of them work? Then we will have "gambled away" treatments that could have helped 100 people. Isn't it better to stick with the safe bet?

Point 1: Take a Vote

Suppose we don't decide ahead of time which of the islanders will get the treatments we buy. Then if we have t treatments, the probability is t/20,000 that any individual will get a treatment. We then take a poll of the islanders to ask if they would prefer having the medical team buy all SureRelieve, all CheapRelieve, or some combination of both.

If the islanders vote for the option that maximizes their probability of being successfully treated, then they will all vote to buy all CheapRelieve. This follows from a simple

Theorem. Suppose there are N organisms who will experience some amount of brutal pain unless they receive help. Let T be a random variable for the number of organisms--randomly chosen from the N organisms--that will successfully avoid the painful experience by receiving help. (T is always less than or equal to N.) Then the probability that any organism avoids the pain is E(T) / N, where E(T) denotes the expected value of T. In particular, the probability of avoiding the pain always increases as E(T) increases, regardless of the variance of T.
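A short derivation of the theorem, sketched under the assumption that, whenever T = t organisms are helped, those t are chosen uniformly at random from the N, so that any given organism is among them with probability t/N:

```latex
\begin{align*}
P(\text{a given organism avoids the pain})
  &= \sum_{t=0}^{N} P(T = t)\, P(\text{it is among those helped} \mid T = t) \\
  &= \sum_{t=0}^{N} P(T = t)\, \frac{t}{N} \\
  &= \frac{1}{N} \sum_{t=0}^{N} t\, P(T = t)
   = \frac{E(T)}{N}.
\end{align*}
```

Only the mean of T appears in the final expression, which is why the variance of T plays no role.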

We can also apply this thought to the kitten example from before. Suppose you're one of the kittens, and you're deciding whether you want your potential rescuer to save one of the three or take a 50-50 shot at saving all three. In the former case, the probability is 1/3 that you'll be saved. In the latter case, the probability is 1 that you'll be saved if the rescuer is successful and 0 if not. Since each of these is equally likely, your overall probability of being saved is (1/2)*1 + (1/2)*0 = 1/2, which is bigger than 1/3.

I should note that in practice people in situations like that of the islanders may not actually choose the option that maximizes their probability of being helped, perhaps on account of ambiguity aversion, as illustrated in the Ellsberg paradox. Not knowing how many total successful treatments are available may be more ambiguous than knowing the actual number of treatments and merely being uncertain about who will receive them.

Point 2: The Law of Large Numbers

The above point works well in situations where the potential benefits being distributed are equal, so that people care only about their probability of receiving the benefit. But what about situations where potential benefits are unequal--e.g., preventing someone from getting a cold versus preventing someone from getting malaria? Clearly it's not desirable for people merely to choose the option that maximizes their probability of getting some treatment, because, e.g., a probability 1/2 of avoiding the common cold is clearly not better than a probability 1/3 of avoiding malaria. We need to impose some utility function on different outcomes that specifies how much better malaria prevention is than cold prevention.

If we randomly distributed cold-prevention and malaria-prevention among a group of people who maximized their expected individual utility, then it's not hard to show that they would prefer the treatment method that maximized the expected utility of the whole group. But this begs the question, because we need to understand why people would want to maximize their expected individual utility.

The reason that is usually put forward is that, when decisions are made repeatedly regarding some random event, maximizing expected value makes it probable that, over long periods of time, you'll maximize the actual average value. This follows from the law of large numbers, which says that if we do enough uncorrelated random trials (e.g., rolling a die enough times), we can become as certain as we like that the actual average value we observe in our trials (e.g., the average of the dice rolls that we make) will be as close as we like to the expected value (which, in this case, is 3.5 = 1*(1/6) + 2*(1/6) + ... + 6*(1/6)).[2]
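The dice illustration is easy to try out. The following is a minimal simulation sketch, not a proof; the function name, the seed, and the sample sizes are arbitrary choices made for this illustration.

```python
import random

def average_die_roll(num_rolls, rng):
    """Average of num_rolls independent rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(num_rolls)) / num_rolls

# Expected value of one roll: 1*(1/6) + 2*(1/6) + ... + 6*(1/6)
EXPECTED = sum(range(1, 7)) / 6

rng = random.Random(0)  # fixed seed so that repeated runs agree
for n in (10, 1_000, 100_000):
    avg = average_die_roll(n, rng)
    print(f"{n:>7} rolls: average {avg:.3f}, off by {abs(avg - EXPECTED):.3f}")
```

With more rolls, the observed average typically lands closer to 3.5, although any single run can still be unlucky.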

In the island disease example, the number of people treated by CheapRelieve is a sum of 10,000 random outcomes. This is a "large number," which means the probability that the actual number of people treated deviates significantly from 5,000 is small. In fact, the chance is only 2.3% that CheapRelieve will successfully treat fewer people than SureRelieve.[3]

What about Mixed Strategies?

For instance, why not spend $5,000 on SureRelieve and $5,000 on CheapRelieve? With this strategy, you can buy 2,450 SureRelieve treatments and 5,000 CheapRelieve treatments. The expected number of people helped is 2,450 + 0.5*5,000 = 4,950. Here, we've bought a little bit of "insurance" against extremely low numbers of people helped, but at the cost of the chance to actually help more people. Even here, the chance is only 21% that our mixed strategy will help more people than the riskier strategy.[4]

If we had spent less than 50% of our budget on SureRelieve, this gap in expected values would have narrowed, but our insurance would have declined along with it. I see no reason to prefer a mixed strategy: if buying some CheapRelieve will help more than buying no CheapRelieve, then buying all CheapRelieve will be even better. If the improvement of buying all CheapRelieve over mostly CheapRelieve is hard to see with only 10,000 people getting treatments, then consider 10 trillion or 10 googol. In those cases, it's practically guaranteed that you'll help more people by buying all CheapRelieve.

Implications

Now consider the following. You are again the medical-project director, and you discover that you've gotten an extra donation of $51 with which to buy more medicines. If you buy the SureRelieve, you'll be guaranteed to help 51/2.04 = 25 people. If you buy CheapRelieve, the expected number of people you'll help is 25.5. But now, there's roughly a 39% chance that CheapRelieve will help fewer people, perhaps several fewer. Do you decide that, unlike before, this case is too risky, so it's best to play it safe?
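The chance that 51 CheapRelieve treatments help fewer than 25 people can be checked directly. The sketch below computes the exact binomial tail and, for comparison, a normal approximation without a continuity correction; with so few trials the two differ noticeably (about 39% versus about 44%). The function name is a label chosen for this illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

N_TREATMENTS = 51   # CheapRelieve treatments bought with the extra $51
SURE_HELPED = 25    # people SureRelieve would help for certain

def prob_fewer_than(n, k):
    """Exact P(X < k) for X ~ Binomial(n, 1/2)."""
    return sum(comb(n, i) for i in range(k)) / 2 ** n

exact = prob_fewer_than(N_TREATMENTS, SURE_HELPED)

# Normal approximation with no continuity correction.
mu = N_TREATMENTS * 0.5
sigma = sqrt(N_TREATMENTS * 0.5 * 0.5)
approx = NormalDist().cdf((SURE_HELPED - mu) / sigma)

print(f"exact binomial:       {exact:.3f}")
print(f"normal approximation: {approx:.3f}")
```

Either way, the risk of doing worse than SureRelieve is far larger than in the 10,000-treatment case, which is what makes the question pointed.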

Hopefully not. The extra $51 is not isolated; it's part of the overall budget. If you had started out with a budget of $10,051, the no-mixed-strategies argument above says that you should have used all of it to buy CheapRelieve, because that would have almost guaranteed a better outcome, possibly much better.

It's tempting to be risk-averse with our charitable actions. For instance, suppose we decide to invest $1,000 in the capital markets while we wait to donate it to a humanitarian group. We might say to ourselves, "This money is for an important purpose. I would feel so bad if I invested it in a fund that tanked and lost most of its value. No, I'm going to stick with safe investments that will guarantee that this money gets to those who need it. I'll invest it in government bonds, rather than some high-risk stock or derivative security." We might proceed to invest the money, earn 4% interest over the next year, and donate the $1,040 to our favorite charity, feeling good about ourselves the whole time.

But how is this example different from the medical-program director who gets the extra $51 donation? Some of the work that our charity does will happen with or without our donation; we'll just be expanding the amount of work that the charity can do. From this broader perspective, it won't be catastrophic if our $1,000 disappears, because other money will still be there. But if we achieve really high returns from our risky investment[5], we will have done a lot more good.

Isolated Actions

The long-run-average idea applies to cases in which our donations or actions will be one part of a larger ensemble of actions. But what if that isn't the case? What if we encounter a one-time all-or-nothing situation in which we can't rest assured that the law of large numbers will make things work out okay overall?

Scenario. You are the only sentient organism in the universe, but you learn that, at 5 p.m. tomorrow, 2 million people will come into existence for an hour, be brutally tortured, and then vanish again. No other sentient organisms will exist afterwards.

You discover a certain box that has two buttons, one red and one blue. The Red Button, if pressed, has a one-in-a-million chance of preventing all two million of the people from being tortured; instead, they'll come into existence for an hour and read the newspaper before vanishing. If the Blue Button is pressed, it will, with certainty, allow exactly one of the two million people to avoid torment and instead read the newspaper. You can only press one button because once one of these two buttons is pressed, the box vanishes forever.
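For reference, the numbers behind the two buttons can be written out explicitly. This sketch adds one assumption that the scenario does not state: the single person spared by the Blue Button is equally likely to be any of the two million.

```python
from fractions import Fraction

POPULATION = 2_000_000
RED_SUCCESS_PROB = Fraction(1, 1_000_000)

# Expected number of people spared by each button.
red_expected = RED_SUCCESS_PROB * POPULATION   # all are spared, but rarely
blue_expected = Fraction(1)                    # exactly one, with certainty

# Chance that one particular person is spared.
red_individual = RED_SUCCESS_PROB              # spared whenever Red succeeds
blue_individual = Fraction(1, POPULATION)      # assumes a uniformly random pick

print(f"Red:  expected spared = {red_expected}, per-person chance = {red_individual}")
print(f"Blue: expected spared = {blue_expected}, per-person chance = {blue_individual}")
```

On these assumptions the Red Button doubles both the expected number spared and each person's chance of being spared, even though it almost always spares no one.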

Here, the argument about long-run averages doesn't apply because there are no repetitions of the event. The "take a vote" argument would apply, if we could poll in advance the 2 million people that would be coming into existence. However, it's possible to devise more complicated thought experiments in which this argument, too, would break down. At that point, accepting the expected-value criterion would simply be a matter of intuition. My intuition tells me that the potential good accomplished by the Red Button is so great that a chance for it shouldn't be forgone. However, I have no further intuitive appeals to readers who disagree. Fortunately, many situations in the real world are not one-time all-or-nothing scenarios.

Infinite Outcomes

As William Feller notes on p. 251 of An Introduction to Probability Theory and Its Applications, the weak law of large numbers fails for random variables with infinite expectation, so the long-run-average argument falls through. Similarly, the von Neumann-Morgenstern expected-utility theorem, which is also sometimes invoked, relies on a continuity axiom that fails to hold when we allow infinitely large utility values (without also allowing infinitesimal probabilities). See this section of "A Defense of Pascal's Wager" for some approaches to infinite decision theory.

Further Reading

This piece overlaps substantially with "The Case for Risky Investments."


[1] In mathematical language, this means that we consider a sample space of possible worlds (e.g., one possible world might include a kitten being saved from a tree, while another possible world might involve the same kitten not being saved). We then decide upon an objective function that maps from our sample space to the real numbers (or perhaps the hyperreal numbers or something similar). We then consider some set of possible actions (assumed finite for simplicity) we might take. For each action, we assign a subjective probability distribution to our sample space which recognizes the various possible results of taking that action (e.g., if our action is to call the firefighter, this probability distribution would say how likely it is that the kitten will be saved). So, for each action, our objective function becomes a random variable. Standard decision theory says the following: If, for each action, the objective function has finite expectation, then choose an action whose expectation is maximal.

If we are utilitarians, then our objective function maps from possible worlds to cardinal utility assignments.

[2] This is technically the weak law of large numbers, which holds in more cases than does the strong law.

[3] This number is easily computed by the normal approximation to the binomial distribution. With CheapRelieve, mu = 0.5*10,000 = 5,000, sigma = (10,000*0.5*(1-0.5))^(1/2) = 50, z = (4,900 - 5,000)/50 = -2. The chance is 2.3% that a standard normal random variable will be less than -2.
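The calculation in this note can be reproduced as follows. The sketch also computes the exact binomial probability as a cross-check; the helper name is a label chosen for this illustration, and 4,900 is taken as the SureRelieve benchmark, as in the note.

```python
from math import sqrt
from statistics import NormalDist

N_TREATMENTS = 10_000   # CheapRelieve treatments bought
P_SUCCESS = 0.5
SURE_HELPED = 4_900     # SureRelieve benchmark used in the note

# Normal approximation to the binomial, as described in the note.
mu = N_TREATMENTS * P_SUCCESS
sigma = sqrt(N_TREATMENTS * P_SUCCESS * (1 - P_SUCCESS))
z = (SURE_HELPED - mu) / sigma
normal_estimate = NormalDist().cdf(z)

def exact_prob_fewer_than(n, k):
    """Exact P(X < k) for X ~ Binomial(n, 1/2), using integer arithmetic."""
    term, total = 1, 0          # term holds C(n, i) at the start of step i
    for i in range(k):
        total += term
        term = term * (n - i) // (i + 1)   # C(n, i) -> C(n, i + 1), exactly
    return total / 2 ** n

exact = exact_prob_fewer_than(N_TREATMENTS, SURE_HELPED)

print(f"mu = {mu:.0f}, sigma = {sigma:.0f}, z = {z:.1f}")
print(f"normal approximation: {normal_estimate:.4f}")
print(f"exact binomial:       {exact:.4f}")
```

The two figures agree to within about a tenth of a percentage point, so the approximation is adequate at this scale.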

[4] Consider the difference of two random variables: one binomial(10,000, 0.5) and the other binomial(5,000, 0.5). The probability that the mixed strategy does better is the probability that the difference of these two is less than 2,450. Approximate both as independent normally distributed variables. The difference of the two has mean mu = 5,000 - 2,500 = 2,500 and variance equal to the sum of the individual variances: 10,000*0.5*(1-0.5) + 5,000*0.5*(1-0.5) = 3,750, which implies sigma = 61.2. Then z = (2,450 - 2,500)/61.2 = -0.816, and the probability that a standard normal random variable will be below -0.816 is about 21%.
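The same normal approximation can be written out in code. This is a sketch of the calculation in this note, with variable names chosen for this illustration.

```python
from math import sqrt
from statistics import NormalDist

P = 0.5
RISKY_N = 10_000        # CheapRelieve treatments in the all-CheapRelieve plan
MIXED_CHEAP_N = 5_000   # CheapRelieve treatments in the mixed plan
MIXED_SURE = 2_450      # people the mixed plan helps for certain

# Difference D = (successes in risky plan) - (CheapRelieve successes in mixed plan).
mu_diff = RISKY_N * P - MIXED_CHEAP_N * P
var_diff = RISKY_N * P * (1 - P) + MIXED_CHEAP_N * P * (1 - P)
sigma_diff = sqrt(var_diff)

# The mixed plan helps more people exactly when D < MIXED_SURE.
z = (MIXED_SURE - mu_diff) / sigma_diff
prob_mixed_better = NormalDist().cdf(z)

print(f"mu = {mu_diff:.0f}, sigma = {sigma_diff:.1f}, z = {z:.3f}")
print(f"P(mixed plan helps more) is about {prob_mixed_better:.3f}")
```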

[5] That riskier assets yield higher expected returns is well established. The Capital Asset Pricing Model is one theoretical justification, but the proposition requires far weaker assumptions than that model does. Sufficient conditions are efficient capital markets and risk-averse investors.