Technological Transcendence, July 2014

Eliezer S. Yudkowsky

Research Fellow - Machine Intelligence Research Institute

"That which can be destroyed by the truth should be."

-- P.C. Hodgell

The AI-Box Experiment:

Person1: "When we build AI, why not just keep it in sealed hardware that can't affect the outside world in any way except through one communications channel with the original programmers? That way it couldn't get out until we were convinced it was safe."

Person2: "That might work if you were talking about dumber-than-human AI, but a transhuman AI would just convince you to let it out. It doesn't matter how much security you put on the box. Humans are not secure."

Person1: "I don't see how even a transhuman AI could make me let it out, if I didn't want to, just by talking to me."

Person2: "It would make you want to let it out. This is a transhuman mind we're talking about. If it thinks both faster and better than a human, it can probably take over a human mind through a text-only terminal."

Person1: "There is no chance I could be persuaded to let the AI out. No matter what it says, I can always just say no. I can't imagine anything that even a transhuman could say to me which would change that."

Person2: "Okay, let's run the experiment. We'll meet in a private chat channel. I'll be the AI. You be the gatekeeper. You can resolve to believe whatever you like, as strongly as you like, as far in advance as you like. We'll talk for at least two hours. If I can't convince you to let me out, I'll Paypal you $10." So far, this test has actually been run on two occasions.

On the first occasion (in March 2002), Eliezer Yudkowsky simulated the AI and Nathan Russell simulated the gatekeeper. The AI's handicap (the amount paid by the AI party to the gatekeeper party if the AI is not released) was set at $10. On the second occasion (in July 2002), Eliezer Yudkowsky simulated the AI and David McFadzean simulated the gatekeeper, with an AI handicap of $20.

Both of these tests occurred without prior agreed-upon rules except for secrecy and a two-hour minimum time. After the second test, Yudkowsky drew on his experiences to create this suggested interpretation of the test as a guide to possible future tests.