Thoughts on Friendly AI

Summary. There's a nontrivial chance that an artificial general intelligence (AGI) will determine the future course of Earth and, potentially, our entire region of the universe. Given the stakes, it's plausible that one of the most cost-effective uses of resources is to ensure that AGI is developed in a way that is both "friendly" to human interests and also does not cause large amounts of suffering to non-human animals (and other sentient beings). The Singularity Institute for Artificial Intelligence (SIAI) seems to be on the right track in this regard; at the very least, it is asking many good questions. I remain concerned about the potential impacts of friendly AI regarding wild animals, lab universes, and religious concerns.

Note: This article is tentative. I still have much to learn about this topic, and I strongly encourage readers to email me with comments: <webmaster ["at"] utilitarian-essays.com>.

Introduction

There is a lot of discussion about "friendly artificial intelligence" (FAI) and related topics available elsewhere online, so I won't go into details here. For starters, this is a very brief introduction.

First I'll note that it's not clear whether strong AI is possible, despite the assumptions of most Singularitarians that it must be. One reason is that humans have a free will that might be due to an immaterial spirit, and such a spirit might not be replicable in a computer. (It's true that a strong AI wouldn't necessarily need to have free will. However, if it turns out that consciousness is only possible through an immaterial spirit, then conscious strong AI would be much harder, if not impossible, to create. Of course, non-conscious AI with sufficient abilities could still vastly alter the future of the universe.) Even if strong AI is possible, humans may never actually implement it, either because they never figure out how or because they go extinct before doing so.

Still, I think the probability that humans will create an AGI is not trivially small; I wouldn't put the figure below 0.01, and personally I would consider 0.15 or so to be a more reasonable Bayesian best-guess estimate. Thus, if the stakes are sufficiently high, work related to friendly AI may have enormous expected value.

And indeed the stakes are astoundingly high. A friendly AI would almost trivially find ways to cure cancer, AIDS, malaria, and countless other problems. Moreover, it would be able to read the entire contents of the Internet and absorb more information than any human or collection of humans possibly could--allowing it to make more accurate factual assessments than is currently possible about all sorts of outcomes, including those that are infinitely important. Of course, all of this assumes that the AGI would be built to be friendly toward human goals, which is an extremely hard thing to do.

Coherent Extrapolated Volition

One model for friendly AI is Eliezer Yudkowsky's "coherent extrapolated volition" (CEV). A more complete description can be found in this article, but "In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted." (As Yudkowsky notes, it's not clear if the volitions of humanity can even be made coherent: "If humanity's volition is just too chaotic to extrapolate, the attempt to manifest our coherent extrapolated volition must fail visibly and safely.") There are other ideas about how FAI should be designed to make it friendly, but CEV is among the most popular, so I'll focus on it here.[1]

Concern 1: Whose volitions to extrapolate?

CEV would be designed as a dynamic process in which the FAI would extrapolate humanity's volitions slowly at first and then build upon those volitions in order to rewrite its code and improve the extrapolation process in subsequent iterations. So, for instance, if in the first round, humans decided that chimpanzee volitions should be counted (to the extent this is possible), then chimpanzees would be included in the second round.

However, the starting point--i.e., who will be extrapolated in the first round--is arbitrary, because we can't rely on the CEV process to decide that for us. The current plan is to extrapolate only humans and allow them to decide whether to include non-human animals in subsequent rounds. But why stop there? Why not only extrapolate humans born in January and allow them to decide whether to include humans born in other months?

We might hope that all roads will lead to Rome and that all initial choices of the set of volitions to extrapolate will lead to the same result, but this is far from obvious. Thus, the choice of whether and to what extent to include non-human animal volitions in CEV is an important open question--one with which animal-welfare organizations might consider getting involved.

It may be the case that animals don't have an abstract enough sense of their volitions for CEV to work with them. If this is true, the same could be said of human infants. It's not obvious to me that human infants deserve more direct influence over CEV than, say, pigs. If one makes the argument that human infants have the potential to develop into adults with a better sense of their true volitions, then replace "human infants" with "human adults with significant intellectual disabilities."

Concern 2: Wild animals and lab universes

It's plausible that the lives of most wild animals involve more suffering than happiness; this is especially likely if insects are sentient. On the other hand, most humans value nature highly and would prefer for wildlife to exist. I'm afraid that the CEV of humanity wouldn't give enough consideration to the suffering of wild animals and, even worse, might create vastly more through terraforming, directed panspermia, or sentient computer simulations of nature.

My hope is that this concern would be addressed by the "if we knew more" part of CEV. If humans were more cognizant of wild-animal suffering and were able to more deeply imagine how horrible it is for, say, a frog to be swallowed alive by a snake, then perhaps they would be more reluctant to value "pristine natural environments." And if their opinions were still unmoved, then maybe the impulse to preserve nature would be so strong that it would indeed have some merit.

A similar concern relates to lab universes. If anyone were going to create infinitely many new universes in a laboratory, it would probably be an AGI. I'm concerned that humans would find the creation of new universes so exciting, cool, or unusual that they would ignore the fact that they would create an infinite amount of suffering in the process--and probably far more suffering than happiness.[2]

Again, one hopes that a well-designed CEV would help to address this. As mentioned before, CEV would allow humans to better comprehend the seriousness of suffering, free from wishful thinking and other cognitive biases that affect people's judgements on such matters (see section 5 of this piece). It would also ensure that dissenting voices would be heard and given weight, and would help to prevent individual hacker physicists from creating universes on their own in spite of strong humanitarian opposition.

Further, there might be other advanced civilizations in our future light cone. A random extraterrestrial civilization is likely to be less humane than humanity's CEV, and it might be planning to create infinitely many new universes or do other terrible things (like torture vast numbers of simulated sentients). A friendly AI might be able to prevent these things from being carried out.

Of course, these scenarios assume that the friendly AI would be built correctly and humanely, but this is an argument in favor of SIAI's work, rather than against it. Better to have a friendly AI determine the future of our part of the universe than a careless (or even malevolent) AI built by less circumspect programmers.

Concern 3: Religion

Several religions (e.g., many versions of Christianity) claim to be grounded in hard evidence. A number of Christian apologists (e.g., Josh McDowell and Lee Strobel) claim that the scientific, historical, philosophical, and spiritual evidence points toward Christianity. As far as I can tell, Richard Swinburne and William Lane Craig openly endorse Bayesian reasoning (as, I'm sure, do many other Christians). In Romans 1:20, Paul notes that "[God's] invisible attributes, His eternal power and divine nature, have been clearly seen, being understood through what has been made, so that [those who don't believe] are without excuse." It would seem, then, that evidence-based apologists should welcome the chance for an FAI to thoroughly evaluate the facts regarding Christianity (and other religions). For if Christianity is not true, then it's probably good for people to become atheists; as Paul himself argues, if Christ has not been raised from the dead, then Christian faith is in vain (1 Corinthians 15:17). But if Christianity is true, the FAI should succeed in recognizing this, and its conclusion will be powerful evidence encouraging many current atheists and agnostics to become Christian.[3]

There are a few concerns that believers in religion (including myself) might have with this proposal.

1. For one thing, nearly all supporters of friendly AI are atheists, and some avowedly hope that humanity will eventually cast off its religious superstitions. To his credit, Eliezer writes, "The programmers [of friendly AI] are under obligation not to be jerks, to craft a reasonable and satisfactory initial dynamic to the best of their abilities [...]." Presumably this would include not programming in their personal antipathy toward religion. But even if the programmers tried to avoid that, it's plausible that anti-religious biases--other than those grounded in pure Bayesian rationality--could still find their way into the friendly AI unintentionally (e.g., through implicit materialist assumptions). This seems like an argument for rational religious people to get involved in the theory and development of friendliness--to make the above scenario less likely to happen. Again to its credit, SIAI has already acknowledged this general concern, pointing out that "Even if the programmers have an unconscious preconception, we have a very conscious prejudice against unconscious preconceptions, and that is something we can deliberately give an AI that is far better at self-awareness than we are."

The most obvious instance where an AI's conclusions could be biased is in the choice of its prior probabilities. Indeed, any desired conclusion could trivially be reached by programming the AI to assign it prior probability 1. Of course, good Bayesians will avoid using extreme priors, but more moderate priors that differ may still lead to differing conclusions. (Theoretically, in the limit as the amount of evidence goes to infinity, all non-extreme Bayesian probability distributions should converge, though I'm not sure how this applies to non-denumerable sample spaces, where almost all sample points must be assigned probability zero to start with.)
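The convergence claim in the parenthetical can be illustrated in the simplest textbook setting. This is a minimal sketch under stated assumptions, not a model of how an FAI would reason: it assumes a single unknown frequency, a Beta prior over it, and Bernoulli observations. The two priors (a skeptical Beta(1, 20) and an enthusiastic Beta(20, 1)), the 70% observed rate, and the function names are arbitrary illustrative choices. It says nothing about the harder non-denumerable cases raised above.

```python
# Minimal sketch: two Bayesians with very different (but non-extreme) priors
# see the same data. Beta priors and Bernoulli data are illustrative assumptions.


def posterior_mean(a: float, b: float, successes: int, trials: int) -> float:
    """Posterior mean of a Beta(a, b) prior after `successes` out of `trials`."""
    return (a + successes) / (a + b + trials)


def gap_after(trials: int, rate: float = 0.7) -> float:
    """Difference between the two posterior means after `trials` observations."""
    successes = round(rate * trials)
    skeptic = posterior_mean(1.0, 20.0, successes, trials)     # prior mean 1/21
    enthusiast = posterior_mean(20.0, 1.0, successes, trials)  # prior mean 20/21
    return enthusiast - skeptic


if __name__ == "__main__":
    for n in (0, 10, 100, 1000, 10000):
        print(f"{n:>6} observations: gap = {gap_after(n):.4f}")
```

In this setup the gap works out to exactly 19 / (21 + n), so it shrinks toward zero as the number of observations n grows, whatever the data turn out to be: about 0.9048 with no data, 0.1570 after 100 observations, and 0.0019 after 10,000.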

One answer is that an AI could be used like a calculator into which one could simply input desired priors and see the posteriors that result from conditioning on all available evidence. This would be enormously helpful in determining how sensitive various beliefs are to choices of prior. If a conclusion were assigned high posterior probability for almost any reasonable choice of prior, a number of present-day factual uncertainties might be easily resolved.
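The "calculator" idea can be sketched for the simplest case of a yes/no hypothesis H. The assumptions here are illustrative only: all the evidence is summarized in a single Bayes factor (how many times likelier the evidence is if H is true than if it is false), the factor of 1000 is purely hypothetical, and the function names are made up. Posterior odds equal prior odds multiplied by the Bayes factor, so one can sweep over candidate priors and see whether the conclusion survives.

```python
# Minimal sketch of a prior-sensitivity "calculator" for one yes/no hypothesis H.
# Assumption: the evidence is summarized by a single Bayes factor
# P(E | H) / P(E | not-H); the value used below is hypothetical.


def posterior(prior: float, bayes_factor: float) -> float:
    """Return P(H | E) given P(H) and the Bayes factor."""
    if not 0.0 < prior < 1.0:
        # Priors of exactly 0 or 1 would fix the answer regardless of evidence.
        raise ValueError("use a non-extreme prior strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    post_odds = prior_odds * bayes_factor
    return post_odds / (1.0 + post_odds)


def sensitivity(priors, bayes_factor):
    """Map each candidate prior to the posterior it yields under the same evidence."""
    return {p: posterior(p, bayes_factor) for p in priors}


if __name__ == "__main__":
    # Hypothetical evidence: 1000 times likelier if H is true than if it is false.
    for p, q in sensitivity([0.001, 0.01, 0.1, 0.5, 0.9], 1000.0).items():
        print(f"prior {p:<6} -> posterior {q:.3f}")
```

With this hypothetical Bayes factor, every prior from 0.01 upward yields a posterior above 0.9, while a prior of 0.001 leaves H at roughly even odds (exactly 1000/1999). That is the kind of robustness check the paragraph describes.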

In addition, it may be possible to find a set of priors that many people would agree are correct. Solomonoff induction provides a way to assign priors to hypotheses on the basis of their Kolmogorov complexity, which allows one to make concrete the intuition behind Occam's razor that simpler hypotheses ought to have higher prior probability. (Solomonoff induction is theoretically uncomputable but can be approximated.) A legitimate question remains as to whether algorithmic information theory is the only sound basis for a "universal" prior probability distribution (what's so fundamental about bits and universal Turing machines?), but it seems like a great place to start.
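As a rough illustration of the Occam's-razor intuition, the sketch below gives each hypothesis a prior weight of 2^-L, where L is the length in bits of its description. This captures only the flavor of Solomonoff's universal prior, not the thing itself: the real construction sums over all programs for a universal Turing machine and is uncomputable, whereas here the description lengths are simply supplied by hand. The hypothesis names and bit counts are hypothetical, and the prior is normalized over the listed hypotheses only.

```python
# Minimal sketch of a length-based prior in the spirit of Solomonoff induction.
# Assumption: description lengths (in bits) are given by hand, not computed.


def length_prior(description_bits: dict[str, int]) -> dict[str, float]:
    """Weight each hypothesis by 2**-L (L = description length in bits), then
    normalize so the weights over the listed hypotheses sum to 1."""
    weights = {h: 2.0 ** -bits for h, bits in description_bits.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical hypotheses with made-up description lengths.
    prior = length_prior({"simple": 10, "medium": 12, "baroque": 20})
    for name, p in prior.items():
        print(f"{name:<8} {p:.6f}")
```

Each extra bit of description halves the prior weight, so "simple" ends up exactly four times as probable as "medium" (1024/1281 versus 256/1281). On the question of what is special about bits and Turing machines: by the invariance theorem, switching to a different universal machine changes Kolmogorov complexity by at most an additive constant, though that constant can still matter for short hypotheses.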

2. A second objection is that God would not reveal himself to a machine in the way he reveals himself to his creatures, so that the friendly AI would not have the spiritual evidence--arguably the most important kind--that humans have access to. This objection fails, however, because the FAI would observe that large numbers of people report spiritual experiences, and these should influence its beliefs just as strongly as would a direct spiritual revelation. As an analogy, imagine that you have an "incommunicable insight" that 1+1=3. Should you give this belief more weight because you experienced it directly than if someone else you know had experienced it?

Still, one might object, God would interfere to obstruct the FAI from seeing the light of Christianity in the same way that ordinary people can--perhaps by causing it to lie or by messing with its code. Paul notes in 1 Corinthians 1:18-24:

For the word of the cross is foolishness to those who are perishing, but to us who are being saved it is the power of God. For it is written,

I WILL DESTROY THE WISDOM OF THE WISE, AND THE CLEVERNESS OF THE CLEVER I WILL SET ASIDE.

Where is the wise man? Where is the scribe? Where is the debater of this age? Has not God made foolish the wisdom of the world? For since in the wisdom of God the world through its wisdom did not come to know God, God was well-pleased through the foolishness of the message preached to save those who believe. For indeed Jews ask for signs and Greeks search for wisdom; but we preach Christ crucified, to Jews a stumbling block and to Gentiles foolishness, but to those who are the called, both Jews and Greeks, Christ the power of God and the wisdom of God.

Perhaps the FAI would recognize this problem ahead of time, but it's doubtful whether it would be able to do anything about it.

3. Finally, religious people might argue that further factual evidence would be of little help in changing people's hearts. For instance--they might claim--the ancient Israelites clearly saw God's power when he parted the Red Sea to rescue them from the Egyptians, yet they rebelled in the desert only a short while later. If God really thought that revealing himself more explicitly would make more people love him, he would have done so already; he wouldn't need an FAI to do that (unless he chose to act through the FAI itself?).

While I think this point has some merit, I find it hard to accept. I personally know several people who, if they really thought Christianity was true, would embrace it fervently. Moreover, if factual evidence isn't needed to convince skeptics, then what's the point of Christian apologetics?

I'll close with some religion-related concerns that I personally find compelling.

1. Pascal's wager: Suppose the FAI determined that, indeed, Christianity was far more probable than all other religions, especially religions that punish Christians and reward atheists. Then from a Pascalian perspective, everyone ought to become Christian, and it would rather clearly be infinitely bad not to do so. But suppose the FAI further reported that the actual probability of Christianity was 10^-30 (in contrast to, say, 10^-100 for Hinduism and similarly for the others). For most people, probabilities this low are insignificant enough to ignore, so many people might continue to reject Christianity. (Even Yudkowsky and other rationalists seem troubled by the idea that Pascalian-type wagers could strongly influence an FAI's decisions.) Some former Christians might even be tempted to become atheists, in view of the low odds of Christianity being correct.

Once again, one might hope that the "knew more" part of CEV would help people to fully comprehend how terrible hell might be. CEV might also overcome the common tendency to neglect low-probability, high-potential-impact events.[4]
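The arithmetic behind the Pascal's-wager worry can be made concrete with IEEE floating-point infinity standing in for an infinite payoff. The probabilities 10^-30 and 10^-100 are the illustrative figures from the text; the cutoff of 10^-6 in the second function is a hypothetical stand-in for the everyday habit of ignoring tiny probabilities, not anyone's proposed rule, and both function names are made up for illustration.

```python
import math


def expected_value(probability: float, payoff: float) -> float:
    """Plain expected value: probability times payoff."""
    return probability * payoff


def thresholded_value(probability: float, payoff: float,
                      cutoff: float = 1e-6) -> float:
    """Hypothetical heuristic: treat any probability below `cutoff` as zero.
    Returning 0.0 explicitly matters, because 0.0 * math.inf is NaN."""
    if probability < cutoff:
        return 0.0
    return probability * payoff


if __name__ == "__main__":
    # Probabilities taken from the illustrative figures in the text.
    for label, p in (("Christianity", 1e-30), ("Hinduism", 1e-100)):
        print(label, expected_value(p, math.inf), thresholded_value(p, math.inf))
```

Under the plain calculation, any nonzero probability of an infinite payoff gives an infinite expected value, so a probability of 10^-30 is no reason to relax; under the thresholding habit, the same option is worth nothing. Both religions also come out equally infinite, so preferring the more probable one amounts to breaking the tie by probability.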

2. Hell and Numbers of Births: Suppose the FAI determined that Christianity, Islam, and three other religions with eternal hells were all equally probable, and that all other religions combined were far less likely. Then, given that one of these five religions was true, even the most faithful adherents of a particular religion would have only a 1/5 chance of avoiding hell. If parents gave birth to new children, those children would have a high probability of spending eternity in torment, conditional on an afterlife existing. If uploaded humans had damnable souls, then the vast potential numbers of them (~10^38 at any given moment, by one estimate) would translate into potentially vast numbers of people enduring eternal suffering. This is not to mention the infinite number of souls that might be created in lab universes.

Presumably an FAI would recognize the above concern and take it into consideration. However, again I'm afraid the volition of humanity wouldn't give the problem enough weight. People naturally have a desire to produce children, and few people (even religious people) feel reluctant to have a child out of concern that it will go to hell. Again, it might be argued that part of extrapolated volition would be allowing people to recognize that their instincts (may have) resulted from evolutionary selection, but I'm not convinced this would make much difference.

Conclusion

On balance, FAI appears to be an idea worth supporting. The concerns raised above are important, but in many cases, they are arguments for making sure that FAI is done right, rather than arguments against creating FAI at all (though this isn't uniformly the case). I look forward to hearing feedback from readers.

Backlinks

"Response to 'Thoughts on Friendly AI'" at Accelerating Future.


[1] Yudkowsky discourages speculation about what a friendly AI implementing CEV might look like: "Arguing about Friendliness is easy, fun, and distracting. Without a technical solution to FAI, it doesn't matter what the would-be designer of a superintelligence wants; those intentions will be irrelevant to the outcome. Arguing over Friendliness content is planning the Victory Party After The Revolution - not just before winning the battle for Friendly AI, but before there is any prospect of the human species putting up a fight before we go down. The goal is not to put up a good fight, but to win, which is much harder. But right now the question is whether the human species can field a non-pathetic force in defense of six billion lives and futures." However, I think discussion of possible outcomes is appropriate. If people believe that, on balance, the impact of a friendly AI is likely to be quite harmful, why should they support it? "Better dead than red," they might say. So making sure such concerns are considered and addressed is important.

[2] The amount of happiness would be infinite, too. I'm assuming a comparison of the relative proportions of happiness and suffering, though the uniqueness of such a comparison may present a problem.

[3] Yudkowsky made a related comment in his paper on CEV:

I ask myself what advice I would give to [Al-Qaeda] terrorists, if they were programming a superintelligence and honestly wanted not to screw it up, and then that is the advice I follow myself.

The terrorists, I think, would advise me not to trust the self of this passing moment, but try to extrapolate an Eliezer who knew more, thought faster, were more the person I wished I were, had grown up farther together with humanity. Such an Eliezer might be able to leap out of his fundamental errors. And the terrorists, still fearing that I bore too deeply the stamp of my mistakes, would advise me to include all the world in my extrapolation, being unable to advise me to include only Islam.

But perhaps the terrorists are still worried; after all, only a quarter of the world is Islamic. So they would advise me to extrapolate out to medium-distance, even against the force of muddled short-distance opposition, far enough to reach (they think) the coherence of all seeing the light of Islam.

[4] Evolutionary psychology would seem to offer a plausible explanation of why people tend to ignore small probabilities: There's a limit to how successfully an organism can pass on its genes, so that low probabilities of large non-gene-passing payoffs would still translate into fairly low expected numbers of surviving offspring.