Thoughts on Friendly AI
Summary. I think there's a small but non-negligible chance that an artificial general intelligence (AGI) will determine the future course of earth and, potentially, our entire region of the universe. Given the stakes, it may be highly cost-effective to work on ensuring that AGI is developed in a way that is both "friendly" to humans and also does not cause large amounts of suffering to non-human animals (and other sentient beings). The Singularity Institute for Artificial Intelligence (SIAI) seems to be roughly on the right track in this regard; at the very least, it is asking many good questions. I remain concerned about the potential impacts of friendly AI regarding wild animals, lab universes, and religion.
Introduction
There is lots of discussion about friendly artificial intelligence (FAI) and related topics available elsewhere online, so I won't go into details here. For starters, this is a very brief Singularity introduction, and this presentation fills in some details on the Friendliness problem.
First I'll note that it's not clear that strong AI is even possible, despite the assumptions of most Singularitarians that it must be. (Of course, an AI needn't be conscious in order to vastly impact the universe.) In addition, even if strong AI is possible, humans may never actually implement it, either because they never figure out how or because they go extinct before doing so successfully.[0]
Still, I think the probability that humans will create an AGI is not trivially small; I wouldn't put the figure below 0.005 or so. Thus, if the stakes are sufficiently high, work related to friendly AI may have enormous expected value.
And indeed the stakes are astoundingly high. A friendly AI could almost trivially find ways to cure cancer, AIDS, malaria, and countless other problems. Moreover, it would be able to read the entire contents of the Internet and absorb more information than any human or collection of humans possibly could -- allowing it to make more accurate factual assessments than is currently possible about all sorts of possibilities, including those that are infinitely important. Of course, all of this assumes that the AGI would be built to be friendly toward sentient organisms, which is an extremely hard thing to do.
Coherent Extrapolated Volition
One model for friendly AI is Eliezer Yudkowsky's coherent extrapolated volition (CEV). A more complete description can be found in this article, but "In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted." (As Eliezer notes, it's not clear if the volitions of humanity can even be made coherent: "If humanity's volition is just too chaotic to extrapolate, the attempt to manifest our coherent extrapolated volition must fail visibly and safely.") There are other ideas about how FAI should be designed to make it friendly, but CEV is among the most popular, so I'll focus on it here.[1]
Concern 1: Whose volitions to extrapolate?
CEV would be designed as a dynamic process in which the FAI would extrapolate humanity's volitions slowly at first and then build upon those volitions in order to rewrite its code and improve the extrapolation process in subsequent iterations. So, for instance, if in the first round, humans decided that chimpanzee volitions should be counted (to the extent this is possible), then chimpanzees would be included in the second round.
However, the starting point -- i.e., who will be extrapolated in the first round -- is arbitrary, and we can't rely on the CEV process to decide that for us. The current plan is to extrapolate only humans and allow them to decide whether to include non-human animals in subsequent rounds. But why stop there? Why not only extrapolate humans born in January and allow them to decide whether to include humans born in other months? We might hope that all roads will lead to Rome and that all initial choices of the set of volitions to extrapolate will lead to the same result, but this is far from obvious.
One reply is that we don't understand animal brains well enough to assess their extrapolated volitions, if indeed they have any at all. This may be a fair point, though if accepted, it ought to imply that, e.g., severely intellectually disabled adult humans also will not have their volitions extrapolated. Michael Anissimov acknowledges that this may indeed be a good idea; in fact, he even suggests that it might be safer to use only one human being as the initial input.
Excluding animals may be a necessary hack that has to be done to make things work, and if so, that's fine. Similarly, if we're fishing with a 3-cm net, we have to accept the fact that we'll only catch fish bigger than 3 cm. But if our goal is to determine the average length of fish in the lake, we shouldn't regard the average length of the fish we catch as a good estimator -- we should use our guesses about what fraction of fish in the lake might be smaller than 3 cm to correct the bias as best we can.
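The net analogy can be made concrete with a small simulation. This is only a sketch under made-up assumptions: the lake's length distribution, the guessed fraction of small fish, and the guessed mean length of those small fish are all hypothetical numbers chosen for illustration, and `corrected_mean` is a name introduced here, not an established method.

```python
import random

NET_CM = 3.0  # mesh size: fish at or below this length slip through


def corrected_mean(caught_lengths, frac_small_guess, mean_small_guess):
    """Blend the observed mean of caught fish with a guessed mean for the
    fish that escaped, weighted by the guessed fraction that escaped."""
    mean_caught = sum(caught_lengths) / len(caught_lengths)
    return (1 - frac_small_guess) * mean_caught + frac_small_guess * mean_small_guess


random.seed(0)
# Hypothetical lake: lengths spread uniformly between 0.5 cm and 10 cm.
lake = [random.uniform(0.5, 10.0) for _ in range(100_000)]
caught = [length for length in lake if length > NET_CM]

true_mean = sum(lake) / len(lake)
naive_mean = sum(caught) / len(caught)  # biased upward by the net
# Rough guesses about what the net missed (assumptions, not measurements).
adjusted_mean = corrected_mean(caught, frac_small_guess=0.25, mean_small_guess=1.75)

print(f"true {true_mean:.2f}  naive {naive_mean:.2f}  adjusted {adjusted_mean:.2f}")
```

Even rough guesses about the unobserved part move the estimate closer to the truth than ignoring it does, which is the point of the analogy: if animals must be left out of the first round, the result should still be adjusted for what was left out.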
Concern 2: Wild animals and lab universes
It's plausible that the lives of most wild animals involve more suffering than happiness; this is especially likely if insects are sentient. On the other hand, most humans value nature highly and would prefer for wildlife to exist. I'm afraid that the CEV of humanity wouldn't give enough consideration to the suffering of wild animals and, even worse, might create vastly more through terraforming, directed panspermia, or sentient computer simulations of nature.
My hope is that this concern would be addressed by the "if we knew more" part of CEV. If humans were more cognizant of wild-animal suffering and were able to more deeply imagine how horrible it is for, say, a frog to be swallowed alive by a snake, then perhaps they would be more reluctant to value "pristine natural environments."
A similar concern relates to lab universes. If anyone were going to create infinitely many new universes in a laboratory, it would probably be an AGI. I'm concerned that humans would find the creation of new universes so exciting, cool, or unusual that they would ignore the fact that they would create an infinite amount of suffering in the process -- and probably far more suffering than happiness.[2]
Again, one hopes that a well designed CEV would help to address this. As mentioned before, CEV would allow humans to better comprehend the seriousness of suffering, free from wishful thinking and other cognitive biases that affect people's judgements on such matters (see section 5 of this piece). It would also ensure that dissenting voices would be heard and given weight, and would help to prevent individual hacker physicists from creating universes on their own in spite of strong humanitarian opposition.
Further, there might be other advanced civilizations in our future light cone. A random extraterrestrial civilization is likely to be less humane than humanity's CEV, and it might be planning to create infinitely many new universes or do other terrible things (like inflict needless suffering on vast numbers of sentients, the way evolution does). A human-designed friendly AI might prevent such things from being carried out.
I suspect that human-type empathy is actually rather rare in the universe; I see little reason to think that advanced intelligence necessarily or even probably leads to concern for the suffering of others. Thus, humans may be, in the words of Princess Leia, the "only hope" for the vast numbers of small wild animals throughout the cosmos that have evolved the ability to suffer but not the intelligence to overcome their Darwinian misery. And it would require a friendly superintelligence to dispatch cosmic rescue missions for preventing such suffering throughout our future light cone.
I have doubts about how plausible this last scenario is; I'm skeptical of its seeming sanguine naivety, especially in view of the mediocritarian fact that we find ourselves in a world full of animal suffering, rather than a post-human paradise. These problems are usually harder than we expect, as is illustrated by the numerous unanticipated consequences of seemingly simple engineering fixes that humans have attempted throughout their history.
Of course, these points are arguments in favor of, not against, SIAI's work. If circumspect and altruistic programmers have only a slim shot at success, how much less likely is it that careless programmers will design a benevolent AI? This is to say nothing of what might result from maleficent or recklessly indifferent engineers. On the other hand, whether a successful friendly AI actually concentrates its efforts on the types of things I care about -- preventing large amounts of suffering in the multiverse -- may depend highly on the extent to which other humans care about pain experienced by organisms that don't belong to their species (including animals and potentially sentient computer simulations). For this reason, I may personally prefer to support efforts to change people's philosophical outlook on these matters -- since, to me, a friendly AI that neglects wild-animal suffering is just as bad as a friendly AI that tiles the solar system with smiley faces.
Concern 3: Religion
Several religions (e.g., many versions of Christianity) claim to be grounded in hard evidence. A number of Christian apologists (e.g., Josh McDowell and Lee Strobel) claim that the scientific, historical, philosophical, and spiritual evidence points toward Christianity. As far as I can tell, Richard Swinburne and William Lane Craig openly endorse Bayesian reasoning (as, I'm sure, do many other Christians). In Romans 1:20, Paul notes that "[God's] invisible attributes, His eternal power and divine nature, have been clearly seen, being understood through what has been made, so that [those who don't believe] are without excuse."
It would seem, then, that evidence-based apologists should welcome the chance for an FAI to thoroughly evaluate the facts regarding Christianity (and other religions). For if Christianity is not true, then it's probably good for people to become atheists; again quoting Paul, "If Jesus did not Rise from the Dead, then Christian faith is Vain" (1 Corinthians 15:17). But if Christianity is true, the FAI should succeed in recognizing this, and its conclusion will be powerful evidence encouraging many current atheists and agnostics to become Christian.[3]
There are a few concerns that believers in religion (including myself) might have with this proposal.
1. For one thing, nearly all supporters of friendly AI are atheists, and some avowedly hope that humanity will eventually cast off its religious superstitions. To his credit, Eliezer writes, "The programmers [of friendly AI] are under obligation not to be jerks, to craft a reasonable and satisfactory initial dynamic to the best of their abilities [...]." Presumably this would include not programming in their personal antipathy toward religion. But even if the programmers tried to avoid that, it's plausible that anti-religious biases -- other than those grounded in pure Bayesian rationality -- could still find their way into the friendly AI unintentionally (e.g., through implicit materialist assumptions). This seems like an argument for rational religious people to get involved in the theory and development of friendliness -- to make the above scenario less likely to happen. Again to its credit, SIAI has already acknowledged this general concern, pointing out that "Even if the programmers have an unconscious preconception, we have a very conscious prejudice against unconscious preconceptions, and that is something we can deliberately give an AI that is far better at self-awareness than we are."
The most obvious instance where an AI's conclusions could be biased is in the choice of its prior probabilities. Indeed, any desired conclusion could trivially be reached by programming the AI to assign it prior probability 1. Of course, good Bayesians will avoid using extreme priors, but more moderate priors that differ may still lead to differing conclusions. (Theoretically, in the limit as the amount of evidence goes to infinity, all non-extreme Bayesian probability distributions should converge, though I'm not sure how this applies to non-denumerable sample spaces, where almost all sample points must be assigned probability zero to start with.)
One answer is that an AI could be used like a calculator into which one could simply input desired priors and see the posteriors that result from conditioning on all available evidence. This would be enormously helpful in determining how sensitive various beliefs are to choices of prior. If a conclusion were assigned high posterior probability for almost any reasonable choice of prior, a number of present-day factual uncertainties might be easily resolved.
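A toy version of such a "calculator" is easy to sketch. The example below assumes a single binary hypothesis and a single body of evidence summarized by two likelihoods; the function names and all of the numbers are illustrative assumptions, not anything proposed by SIAI.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Bayes' rule for one hypothesis H and one body of evidence E:
    P(H|E) = P(E|H) P(H) / (P(E|H) P(H) + P(E|not H) P(not H))."""
    numerator = prior * likelihood_if_true
    return numerator / (numerator + (1 - prior) * likelihood_if_false)


def sensitivity(priors, likelihood_if_true, likelihood_if_false):
    """Map each candidate prior to the posterior it would produce."""
    return {p: posterior(p, likelihood_if_true, likelihood_if_false) for p in priors}


candidate_priors = [0.01, 0.1, 0.5, 0.9]

# Strong evidence: a million times likelier if H is true than if it is false.
strong = sensitivity(candidate_priors, 0.99, 0.99e-6)
# Weak evidence: only slightly likelier under H.
weak = sensitivity(candidate_priors, 0.6, 0.4)

for p in candidate_priors:
    print(f"prior {p:<5} strong evidence -> {strong[p]:.4f}   weak evidence -> {weak[p]:.4f}")

# An extreme prior of 1 cannot be moved by any evidence.
print(posterior(1.0, 1e-9, 1.0))
```

With strong evidence, every moderate prior ends up near certainty, so the conclusion is insensitive to the choice of prior. With weak evidence, the posteriors stay spread out, so the disagreement is really about the priors. The last line illustrates the earlier point that a prior of 1 fixes the conclusion in advance.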
In addition, it may be possible to find a set of priors that many people would agree are correct. Solomonoff induction provides a way to assign priors to hypotheses on the basis of their Kolmogorov complexity, which allows one to make concrete the intuition behind Occam's razor that simpler hypotheses ought to have higher prior probability. (Solomonoff induction is theoretically uncomputable but can be approximated.) A legitimate question remains as to whether algorithmic information theory is the only sound basis for a "universal" prior probability distribution (what's so fundamental about bits and universal Turing machines?), but it seems like a great place to start.
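The flavor of a complexity-based prior can be shown with a crude, computable stand-in. The sketch below is not Solomonoff induction: it swaps in the length of a zlib-compressed encoding as a rough proxy for Kolmogorov complexity, and the two "hypotheses" are just byte strings invented for illustration.

```python
import random
import zlib


def approx_complexity_bits(data: bytes) -> int:
    """Crude proxy for description length: size of the zlib-compressed data, in bits."""
    return 8 * len(zlib.compress(data, 9))


def complexity_priors(hypotheses):
    """Assign each hypothesis a prior proportional to 2^(-approximate bits),
    normalized so the priors sum to 1."""
    bits = {name: approx_complexity_bits(data) for name, data in hypotheses.items()}
    smallest = min(bits.values())
    # Subtract the smallest length first so the powers of two stay in float range.
    weights = {name: 2.0 ** -(b - smallest) for name, b in bits.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


rng = random.Random(0)
hypotheses = {
    "regular": b"01" * 50,                                      # highly patterned
    "irregular": bytes(rng.randrange(256) for _ in range(100)),  # no obvious pattern
}
priors = complexity_priors(hypotheses)
print(priors)
```

The patterned string compresses well and so receives nearly all of the prior mass, which mirrors the Occam's-razor intuition. The result depends on the choice of compressor, much as a true Solomonoff prior depends on the choice of universal Turing machine.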
2. A second objection is that God would not reveal himself to a machine in the way he reveals himself to his created creatures, so that the friendly AI would not have the spiritual evidence that humans have access to. This objection fails, however, because the FAI would observe that large numbers of people report spiritual experiences, and these should influence its beliefs just as strongly as would a direct spiritual revelation. As an analogy, imagine that you have an "incommunicable insight" that 1+1=3. Should you give this belief more weight because you experienced it directly than if someone else you know had experienced it (assuming you trust that person's testimony to be completely accurate)?
Still, one might object, God would interfere to obstruct the FAI from seeing the light of Christianity in the same way that ordinary people can -- perhaps by causing it to lie or by messing with its code. Paul notes in 1 Corinthians 1:18-24:
For the word of the cross is foolishness to those who are perishing, but to us who are being saved it is the power of God. For it is written,
I WILL DESTROY THE WISDOM OF THE WISE, AND THE CLEVERNESS OF THE CLEVER I WILL SET ASIDE. Where is the wise man? Where is the scribe? Where is the debater of this age? Has not God made foolish the wisdom of the world? For since in the wisdom of God the world through its wisdom did not come to know God, God was well-pleased through the foolishness of the message preached to save those who believe. For indeed Jews ask for signs and Greeks search for wisdom; but we preach Christ crucified, to Jews a stumbling block and to Gentiles foolishness, but to those who are the called, both Jews and Greeks, Christ the power of God and the wisdom of God.
Perhaps the FAI would recognize this problem ahead of time, but it's doubtful whether it would be able to do anything about it.
3. Religious people might argue that further factual evidence would be of little help in changing people's hearts. For instance, as the Journey Bible observes in its commentary on Exodus 14:31, the ancient Israelites clearly saw God's power when he parted the Red Sea to rescue them from the Egyptians, yet they rebelled in the desert only a short while later. If God really thought that revealing himself more explicitly would make more people love him, he would have done so already; he wouldn't need an FAI to do that (unless he chose to act through the FAI itself?).
This point has some merit; I think it's true that many people really don't want it to be the case that a Christian God exists. On the other hand, I know a number of sincere rationalists who, if they really thought Christianity was true, would probably become believers in it. Moreover, if factual evidence isn't needed to convince skeptics, then what's the point of Christian apologetics?
4. Moses warned the Israelites, "Do not test the Lord your God [...]" (Deuteronomy 6:16), and this theme is repeated elsewhere in the Bible. It's an interesting question whether the scenario proposed above -- build a friendly AI and see what conclusions it reaches about religion -- would count as testing God.
To the extent that the AI would just be a really powerful tool for information aggregation and inference, it would do nothing fundamentally different from what our own brains do: namely, applying Bayes' rule to update its priors (though, of course, the AI's calculations would be more exact). However, if the AI also tried to run experiments (say, by coordinating prayer studies), I suppose it might be transgressing the command, depending on how one interprets it. But then the same could be said of Elijah's challenge to Baal.
The AI would be smart enough to realize that cheap tests (e.g., challenging God to reveal himself within a week -- see Proof #3 here) have essentially no evidentiary value with respect to the hypothesis of a Judeo-Christian God, in view of the Deuteronomy passage above. (The probability of no revelation within a week is essentially 1 whether or not God exists.) Of course, there are other conceivable religions according to which such tests would have evidentiary value, and the AI might find it worthwhile to try such experiments to evaluate them. It's unclear to me if these tests would violate the Biblical proscription, which is against testing "your God" (i.e., the Judeo-Christian God), not other members of the set of possible Gods (?).
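The claim that a cheap test has essentially no evidentiary value follows directly from Bayes' rule: if an observation is about equally probable under both hypotheses, the posterior barely moves. The numbers below are placeholders chosen only to illustrate this, including the prior of 0.3 and the hypothetical religion that predicts a response 90% of the time.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """P(H | observation) by Bayes' rule for a single binary hypothesis."""
    numerator = prior * likelihood_if_true
    return numerator / (numerator + (1 - prior) * likelihood_if_false)


prior = 0.3  # arbitrary illustrative prior

# Observation: no revelation within a week.
# Under a God who is not to be tested, this is expected almost surely;
# under no God, it is certain. The likelihood ratio is therefore about 1.
after_cheap_test = posterior(prior, 0.999999, 1.0)

# Contrast: a hypothetical religion predicting a response 90% of the time,
# so "no revelation" has probability 0.1 if that religion is true.
after_informative_test = posterior(prior, 0.1, 1.0)

print(f"cheap test:       {prior} -> {after_cheap_test:.7f}")
print(f"informative test: {prior} -> {after_informative_test:.4f}")
```

In the first case the probability is left almost exactly where it started. In the second, the same null result counts heavily against the hypothesis, which is why such experiments could be worthwhile for religions that do predict a response.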
Finally, below are two further religion-related concerns.
1. Pascal's wager: Suppose the FAI determined that, indeed, Christianity was far more probable than all other religions, especially religions that punish Christians and reward atheists. Then from a Pascalian perspective, everyone ought to become Christian, and it would rather clearly be infinitely bad (in expectation) not to do so. But suppose the FAI further reported that its best-guess, fully updated posterior probability of Christianity was 10^-30 (in contrast to, say, 10^-100 for Hinduism and similarly for the others). For most people, probabilities this low are insignificant enough to ignore, so many people might continue to reject Christianity. (Even Eliezer and other rationalists seem troubled by the idea that Pascalian-type wagers could strongly influence an FAI's decisions.) Some former Christians might even be tempted to become atheists, in view of the low odds of Christianity being correct.
Once again, one might hope that the "knew more" part of CEV would help people to fully comprehend how terrible hell might be. CEV might also overcome the common tendency to neglect low-probability, high-potential-impact events.
2. Hell and Numbers of Births: Suppose the FAI determined that each of Christianity, Islam, and three other religions with eternal hells were equally probable, and that all other religions combined were far less likely. Then, given that one of these five religions was true, even the most faithful adherents of a particular religion would have only a 1/5 chance of avoiding hell. If parents gave birth to new children, then if it were the case that an afterlife existed, those children would have a high probability of spending eternity in torment. If uploaded humans had damnable souls, then the vast potential numbers of them (~10^38 at any given moment, by one estimate) would translate into potentially vast numbers of people enduring eternal suffering. This is not to mention the infinite number of souls that might be created in lab universes.
Presumably an FAI would recognize the above concern and take it into consideration. However, again I'm afraid the volition of humanity wouldn't give the problem enough weight. People naturally have a desire to produce children, and few people (even religious people) feel reluctant to have a child out of concern that it will go to hell.
Infinity, Etc.
Eliezer acknowledges that he's personally an "infinite set atheist." I'm not sure exactly what this stance entails (whether it excludes potential infinities or merely actual infinities), but I'm not quite willing to take such a metaphysical leap of faith -- even though I would love for infinity not to exist (as it would imply that certain scenarios involving infinite suffering are impossible). But what is desirable may not be true; infinite amounts of suffering may actually exist. In fact, a Pascal's-wager-style argument seems to imply that we should only consider possible realities in which infinity is real, since the finite scenarios are negligible by comparison.
This point isn't airtight. We could take it as axiomatic that things can be only finitely valuable. Or maybe we just have to assume certain restrictions to make the math work, and we have no other choice. Infinite consequentialism is indeed a puzzle, plagued by all sorts of paradoxes that need to be worked out. For instance, how do we solve the problem of "My infinity is bigger than yours"? That is, if we allow, say, different cardinalities of infinity, then any chance of a change in utility the size of the continuum swamps all countably large changes; but the same can be said of changes in utility the size of the power set of the continuum, and so on.
This is not an argument against SIAI; indeed, there are few other organizations working so actively on these sorts of questions! A good FAI would have access to all of this literature and would be allowed to make a reasonable decision about whether (and what sorts of) infinities might exist. And Eliezer himself acknowledges, "I am not a physicist so my fond hope [of infinite-set atheism] may be ruled out for some reason of which I am not aware."
Conclusion
On balance, FAI appears to be an idea worth supporting. The concerns raised above are important, but in many cases, they are arguments for making sure that FAI is done right, rather than arguments against creating FAI at all.
Backlinks
"Response to 'Thoughts on Friendly AI'" at Accelerating Future.
[0] The doomsday argument (DA) is one reason we might expect humans to go extinct relatively soon. One response to the DA is that we're potentially in a simulation, and simulations of pre-Singularity humans are relatively common. This may weaken the DA, but it doesn't do as much to improve the probability of friendly AI because of the problem of evil: If humans do succeed in creating a friendly AI, it would probably not (we hope) run simulations of the past that contain massive amounts of suffering. Rather, if we are in a simulation, it's most likely indifferent to our welfare, treating emotions (as evolution does) as merely a mechanism for achieving certain behaviors. Of course, the fact that we're potentially simulations by a non-friendly intelligence needn't rule out simulations by friendly intelligences, but it requires that the latter not be overwhelmingly frequent in the multiverse. Also, if we are in a non-friendly simulation, we might be disallowed from, as Carl Shulman said, "creating a positive singularity" that requires vast amounts of future computational power.
[1] Eliezer discourages speculation about what a friendly AI implementing CEV might look like: "Arguing about Friendliness is easy, fun, and distracting. Without a technical solution to FAI, it doesn't matter what the would-be designer of a superintelligence wants; those intentions will be irrelevant to the outcome. Arguing over Friendliness content is planning the Victory Party After The Revolution - not just before winning the battle for Friendly AI, but before there is any prospect of the human species putting up a fight before we go down. The goal is not to put up a good fight, but to win, which is much harder. But right now the question is whether the human species can field a non-pathetic force in defense of six billion lives and futures." However, I think discussion of possible outcomes is appropriate. If people believe that, on balance, the impact of a friendly AI is likely to be quite harmful, why should they support it? "Better dead than red," they might say. So making sure such concerns are considered and addressed is important.
[2] The amount of happiness would be infinite, too. I'm assuming a comparison of the relative proportions of happiness and suffering, though the uniqueness of such a comparison may present a problem.
[3] Eliezer made a related comment in his paper on CEV:
I ask myself what advice I would give to [Al-Qaeda] terrorists, if they were programming a superintelligence and honestly wanted not to screw it up, and then that is the advice I follow myself.
The terrorists, I think, would advise me not to trust the self of this passing moment, but try to extrapolate an Eliezer who knew more, thought faster, were more the person I wished I were, had grown up farther together with humanity. Such an Eliezer might be able to leap out of his fundamental errors. And the terrorists, still fearing that I bore too deeply the stamp of my mistakes, would advise me to include all the world in my extrapolation, being unable to advise me to include only Islam.
But perhaps the terrorists are still worried; after all, only a quarter of the world is Islamic. So they would advise me to extrapolate out to medium-distance, even against the force of muddled short-distance opposition, far enough to reach (they think) the coherence of all seeing the light of Islam.