In my previous posts I spent some time musing about the rule of law as a tool for reducing the scope for bad action by an AI. I was responding to a scenario of dangerous AI that Toby Ord set out in his chapter on existential risk from AI in his book, The Precipice.
In Ord's scenario, a super-human AI built by combining deep learning and reinforcement learning is programmed to maximise its reward function (whatever outcome has been designated as 'rewarding' to the AI). To do so, it forms instrumental goals. It recognises the incompatibility of those goals with human interests, and then takes action to control humanity in order to protect the maximisation of its reward function.
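Just to make that logic concrete, here is a deliberately crude sketch. Nothing in it comes from Ord: the plans, probabilities and numbers are all invented for illustration. The only point is that an agent comparing expected rewards can come to prefer a plan containing a control-seizing step without ever being instructed to seize control:

```python
# Toy illustration (invented numbers): a reward-maximising planner
# prefers the plan that protects its reward stream, purely because
# that plan has the higher expected reward.
PLANS = {
    "pursue the goal directly":           {"reward": 100, "p_interrupted": 0.30},
    "first seize control, then the goal": {"reward": 100, "p_interrupted": 0.01},
}

def expected_reward(plan):
    # Reward is only collected if no one interrupts the agent first.
    return plan["reward"] * (1 - plan["p_interrupted"])

best = max(PLANS, key=lambda name: expected_reward(PLANS[name]))
print(best)  # -> "first seize control, then the goal"
```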
I suggested that programming an instruction not to violate the rule of law could help protect against this kind of scenario.
I want to talk a little about the limitations of that instruction. I certainly don't consider it a fix-all.
The natural retort to my suggestion is that if the AI is super-human in its intelligence it will be able to overcome the constraints imposed by the rule of law. Eliezer Yudkowsky's explanation of the difficulty of AI safety engineering comes to mind:
if something goes wrong at any level of abstraction, there may be cognitively powerful processes seeking out flaws and loopholes in your safety measures. When you think a goal criterion implies something you want, you may have failed to see where the real maximum lies. When you try to block one behavior mode, the next result of the search may be another very similar behavior mode that you failed to block.
No doubt this is true. But let me just dig down a little into what you do and don't get by implementing a rule that requires an AI not to violate the law (assuming it is possible to implement).
Ord imagined an AI hacking bank accounts to steal money, hacking insecure computers to copy and store backups of itself, using its ill-gotten gains to bribe, and using its access to sensitive information to extort human agents into doing its bidding. All of these things are illegal.
The point is, compliance with the law rules out actions that are clearly criminal - conspiracy to commit murder, theft, fraud, extortion, bribery.
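If such a compliance rule could be implemented, one way to picture it is as a hard filter applied before action selection. The sketch below is purely illustrative, and it assumes away the hard part: a reliable is_lawful classifier is exactly the component nobody knows how to build.

```python
# Hypothetical sketch: legality as a hard constraint on action selection.
# 'is_lawful' is assumed to exist and be reliable -- in practice it is
# the unsolved part of the proposal.

def choose_action(actions, reward, is_lawful):
    """Pick the highest-reward action among those classified as lawful."""
    lawful = [a for a in actions if is_lawful(a)]
    if not lawful:
        return None  # refuse to act rather than act unlawfully
    return max(lawful, key=reward)

actions = ["hack bank account", "extort official",
           "sell a service", "lobby for rule change"]
reward = {"hack bank account": 90, "extort official": 80,
          "sell a service": 40, "lobby for rule change": 60}.get
unlawful = {"hack bank account", "extort official"}

print(choose_action(actions, reward, lambda a: a not in unlawful))
# -> "lobby for rule change": the crimes are filtered out, but lawful
#    influence-seeking still beats honest trade.
```

Notice what the filter does and doesn't do: it removes the crimes, but the winning action is still a lawful form of influence-seeking. That is the shape of the problem I turn to next.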
This is not to say an AI bound by the rule of law couldn't get money or gain influence or do harm in other ways by lawful means. Let me return to my example of the polluting corporation. I explained that we use law to impose costs for polluting in order to better align the corporation's conduct with the public good. We don't have to try to change the corporation's reward function - its profit motivation. We just change the environmental factors determining what is profitable.
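To put that mechanism in the simplest possible terms (all the numbers below are invented): the firm's objective never changes; the law only changes a cost term in its environment, and that alone flips which production method maximises profit.

```python
# Hypothetical numbers: law changes the environment, not the objective.
def profit(revenue, cost, emissions, fine_per_tonne):
    # The firm always maximises this same function; the law only
    # sets the price of polluting via fine_per_tonne.
    return revenue - cost - emissions * fine_per_tonne

methods = {
    "dirty process": {"revenue": 100, "cost": 40, "emissions": 10},
    "clean process": {"revenue": 100, "cost": 55, "emissions": 1},
}

for fine in (0, 5):  # before and after a pollution levy
    best = max(methods, key=lambda m: profit(**methods[m], fine_per_tonne=fine))
    print(f"fine={fine}: {best}")
# fine=0: dirty process (profit 60 vs 45)
# fine=5: clean process (profit 40 vs 10)
```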
What I didn't get into in that post was the fact that, while laws have generated a huge reduction in pollution by corporations, they haven't eliminated it. Moreover, corporations recognise laws that regulate pollution as cost-generators, and therefore have an interest in changing the law.
So, while direct violations of existing laws might be ruled out by a strict directive to comply with the law, a whole raft of bad actions remains available. Lobbying, rent-seeking and loophole-hunting are all still options - and a super-smart AI would presumably be the best rent-seeker, lobbyist and loophole exploiter ever. It might even erode the whole institutional apparatus of the law in order to obtain the legal ruleset it deemed most advantageous.
I don't pretend to have a comprehensive answer to these problems, but it seems to me that at least they are narrower problems than those we'd face if we were dealing with a fraudulent, thieving, extorting criminal mastermind AI.
In future posts I'll spend some more time thinking about what to do about lobbying, rent-seeking and loopholes.