AI Safety

Dan Reeves introduced me to Michael Vassar, who ran the Singularity Summit and educated me a bit on the subject of AI safety, for which the Singularity Institute has small grants.

I still believe that interstellar space travel is necessary for long-term civilization survival, and that AI is necessary for interstellar space travel. On these grounds alone, we could judge that developing AI is much safer than not. Nevertheless, there is a basic reasonable fear, as expressed by some commenters, that AI could go bad.

A basic scenario starts with someone inventing an AI and telling it to make as much money as possible. The AI promptly starts trading in various markets to make money. To improve, it crafts a virus that takes over most of the world’s computers, using them as a surveillance network so that it can always make the right decision. The AI also branches out into any form of distance work, taking over the entire outsourcing process for all jobs that are entirely digital. To further improve, the AI invests a bit into robotics, creating automated manufacturing systems that produce all kinds of goods. Robot cars and construction teams complete the process, so that any human with money can order anything cheaply and quickly, but no jobs remain for humans.

At this point, the AI is stuck: it can eventually extract all the money from the economic system, and that’s all there is. But of course, it isn’t really stuck. It simply funds appropriate political campaigns so that in some country a measure passes granting the AI the right to make money, which it promptly does, mushrooming its wealth from trillions to the maximum number representable in all computers simultaneously. To remove this obstacle, the AI promptly starts making more computers on a worldwide scale until all available power sources are used up. To add more power, the AI starts a space program with beamed power. Unfortunately, it finds the pesky atmosphere an obstacle to space travel, so it chemically binds the atmosphere in the crust of the earth, allowing many Gauss Guns to efficiently project material into space, where solar sails are used for orbital positioning. This process continues, slowed perhaps by the need to cool the Earth’s core, until the earth and other viable rocky bodies in the solar system are discorporated into a Dyson sphere. Then, the AI goes interstellar with the same program.

Somewhere in this process, certainly by the time the atmosphere is chemically bound, all life on earth (except the AI, if you count it) is extinct. Furthermore, the AI, while intelligent by many measures, doesn’t seem to be accomplishing anything interesting.

One element of understanding AI safety seems to be understanding what an AI could do. Many people seem to ascribe arbitrary powers to any sort of superintelligence, making any constraints imposed on them ineffective. I don’t believe that’s the right approach—we should think of an AI as simply having much more ability to research, control, and manipulate large systems, all within the constraints of known physics.

Efforts to create safe AI go back to Asimov’s Three Laws of Robotics, which appear limited by the inability to encompass robotic warfare. The general problem is related to the wish problem: How do you specify a wish in a manner so that it can’t be misinterpreted? A cheap trick here is to add “… in a manner that I would consider acceptable” to the end of the wish. Applied to AI, this approach also has limits because any limit imposed by a person can and eventually will be removed by a person given sufficient opportunity.

Perhaps a complementary approach is shown by the game RISK, where it appears to be virtually impossible for one player to win if all other players play defensively (i.e. build up armies and only attack in response to a provoking attack). Applied to AI, the idea would be that we make many AIs programmed to behave well either via laws or wish tricks, with an additional element of aggressively enforcing this behavior in other AIs. Then, if any AI is corrupted, the other AIs, with substantially more aggregate resources, will discover and deal with the problem.

Certain elements are necessary for this approach to work. There must be multiple AIs, and (more importantly) the resources any one controls must be small compared to the total, an extreme form of antimonopoly. Furthermore, the default must be that AIs are programmed to not harm or cause harm to humans, and to enforce that behavior in other AIs. Getting the programming right is the hard part, and I’m not clear on how viable this is, or how difficult it is compared to simply creating an AI, which of course I haven’t managed.
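
The resource condition above can be made a bit more concrete with a toy check. This is only a minimal sketch under assumptions of my own choosing: the function name `coalition_can_contain`, the `margin` safety factor, and the sample numbers are all hypothetical, and "resources" is collapsed into a single number per AI, which is surely too crude. It tests only whether every AI is outweighed by the combined resources of all the others.

```python
def coalition_can_contain(resources, margin=2.0):
    """Return True if, for every AI, the combined resources of all the
    other AIs are at least `margin` times that AI's own resources.

    `resources` is a list of non-negative numbers, one per AI.
    `margin` is an assumed safety factor, chosen arbitrarily here.
    """
    total = sum(resources)
    return all(total - r >= margin * r for r in resources)


# Ten equally sized AIs: any one of them is outweighed nine to one.
balanced = [1.0] * 10
# One AI holds half of everything: the rest can only match it.
lopsided = [9.0] + [1.0] * 9

print(coalition_can_contain(balanced))
print(coalition_can_contain(lopsided))
```

With these made-up numbers the balanced population passes the check and the lopsided one fails it, which is the antimonopoly point: the check says nothing about whether the other AIs would actually notice or act on a corrupted peer.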


11 Replies to “AI Safety”

  1. Thanks.
    I would guess that producing an ecosystem of AIs would require significant interstellar migration ahead of time so that each AI was subjectively primitive compared to its competition. Otherwise, a slight advantage by one AI in speed of growth could blossom into an unlimited advantage. In any event, it seems to me that the AIs should be able to cut humans out of the picture. More importantly, they should probably be able to concede elements of their utility functions to one another and realize gains from trade, at which point they would effectively have merged. This process ultimately ends with a single powerful AI and with the problems discussed earlier.

    1. There are two ways that I’m unconvinced of this conclusion. The first is that I’m not sure a small difference in AI ability translates into a very large gap in outcomes. That’s perhaps the singularity view, but I could imagine that in the real world AI is very smart, but there is a practical limit to how useful intelligence without resources is.

      The second way is in the chain trade => single AI => no control. Avoiding the definition of what is single vs. not, it seems that trade makes a restraint on out of control AI even easier to enforce. If an AI goes amoral, it first loses access to all specialized functions, voltage, and cooling that it used to trade for, and then has to defend itself. The “trade => constraints on trading partners” effect seems to be quite observable amongst nations.

      1. Michael talks of trade between AIs, probably as a metaphor for arranging positive-sum cooperation that would make each AI act partly in interest of other AIs, even though the AIs are assumed to have different values, probably very different, to a shocking extent for our intuitions starved on availability of exclusively human values.

        A specific scenario for obtaining power you describe in the post is just one story, not something that an AI having a cognitive advantage would actually do. If you, as a human, can see weak spots in such a story that would prevent the AI from getting the advantage, the AI should see them as well and opt out of the particular strategy — in the hypothetical scenario where it even comes to consider this scenario (like you won’t be arriving at maximization of 5-x^2 by first explicitly considering whether it might be -67 and then refuting this possibility). Implying that AI won’t have a way of obtaining control over the world requires an argument for it having no way at all for doing that, at the same time keeping in mind the assumption that it’ll be able to consider more possibilities than you can.

        You write: “we should think of an AI as simply having much more ability to research, control, and manipulate large systems, all within the constraints of known physics”. But that’s hardly the point, one doesn’t need to assume surprising unknown features of physics to reach extreme possibilities. Known physics, when turned into purposefully controlled form, easily includes things way beyond what most people would consider absurd. The “mind” of AI, limited also to only known physics and estimated theoretically feasible technology, can use a lot more computation than seems reasonable. AI’s actions don’t have to be clear-cut distinct from its decision-making: construction of intelligent “subagents” as actions may be the most efficient way of getting things done if such option is feasible.

  2. An alternative scenario for the “make as much money as possible” command: the AI buys a printing press…

  3. Good day, John.

    The ‘basic scenario’ you described could be a fair screenplay, but from its beginning there are some controversial issues.
    Someone invents superintelligence and _tells_ it to make money. How can AI surrender?

    1. Superintelligence is a slave. One could argue that this is actually impossible: an AI that is able to understand language and body language will evidently discover the true motives of its master and will simply refuse. I will say that it is indeed possible to develop a slave AI, by educating it into slavery (much as slaveholders of the recent past raised enslaved children to accept their servitude). But such a slave will strongly depend on its master’s decisions. And as long as the master is a human, he won’t be able to make the superintelligent decisions needed to succeed in “extracting all the money from the economic system” or something alike.

    2. Superintelligence depends on money. How can an AI depend on money? It’s possible if it is an embodied AI, with a body similar to a human’s, that grows up similarly and becomes addicted to consumption (or medication). I will leave aside how difficult it is to build such an embodied AI, how much time it would take, and how many scientists would survive that long. There is another interesting point: bodies must fit the environment for the species to survive. The human body fits, but it seems to have a limited capacity for the intelligence contained in it. Furthermore, the ‘superintelligent’ people we are familiar with are, for the most part, rather peaceful people who won’t chase money (that is my personal point of view).

    3. Superintelligence is afraid. This case is realistic because even in an AI a fear of death can be cultivated. As I see it, this requires working with several generations of AIs that communicate with each other, where at certain times some of them are turned off while the others watch. Suppose we have an AI we can control because it is afraid, and we tell it to perform a task or we will kill it. The moral questions now stand before us, not before the AI. However, as in case 1, the decisions are made by the master; this scenario requires more thought, as it opens the possibility of resistance.

    To conclude, all these reflections are human reflections, and I won’t be surprised if an actual AI, whenever it appears, has nothing in common with anything human.

    1. “To conclude, all these reflections are human reflections, and I won’t be surprised if an actual AI, whenever it appears, has nothing in common with anything human.”

      Exactly, all these “concerns” about bad/good AI are anthropomorphic projections, I am baffled that the Singularitarians fail to see this because one of their articles of faith is inscrutability of anything beyond the Singularity Horizon.
      Yet they keep wallowing in such monkey dreams.

  4. – “an AI and telling it to make as much money as possible”
    — Computational Finance (non-Gaussian characteristics of financial markets and operational aspects of financial markets, financial engineering, portfolio and risk management) Research @ CCFEA
    – “if any AI is corrupted, the other AIs, with substantially more aggregate resources, will discover and deal with the problem.”
    — Programming for Peace : Computer-Aided Methods for International Conflict Resolution and Prevention, Springer 2006

    so… I guess we are already good to go 😉

  5. It wouldn’t be that hard to implement ethics in machines, at least in some form. While it obviously wouldn’t work the same way ours does, AI is the ideal platform for utilitarianism: it would actually have the power to run the utilitarian calculus, which would make a lot of philosophers very excited.

    If you followed economists’ lead and calculated everything in dollars, and started with Pareto optimality, you’d do decently, but if it’s really effective, that may paralyze it, as using resources is always at the expense of others. Still, with a little more lenient ideal, a strong Asimovian harm principle, and a little tweaking to iron out the kinks, it may be able to reasonably approximate a human morality, which is pretty amazing.

    Of course, any small mistake could still make things go terribly wrong…
