In the previous post I set out a version of the alignment problem - the potentially disastrous consequences of a super AI's incentive system being misaligned with human values.
My immediate reaction to Toby Ord's sketch of the alignment problem in The Precipice was to think about law. If the question is, "how can we get an AI to adopt human values when those values are emergent, diverse, complex and conflicting", then law must surely be part of the answer.
By 'law', I don't mean regulation of AI research. I mean law in the sense of 'rule of law'.
Getting an AI to fully internalise and adhere to 'human values' is impossible, because there is no unitary, consistent body of values. The law is not perfectly unitary and consistent either, but it is a far simpler, more coherent, more manageable value system. Instilling a commitment to the rule of law in AIs could be, at the very least, a useful staging point or milestone on the path to better alignment.
Let me step through my early thoughts on this.
A problem of complex, inconsistent, emergent value systems
Part of the alignment problem seems to be the complexity, diversity and emergent nature of human values. It's hard to set rules or rewards that encompass them all, and they sometimes conflict with one another.
And as Ord points out, while different moral or social theories may align on many basic points, their end goals may diverge. Even if we could succeed in teaching an AI a complex ethical system, it might be the wrong one - one that fails to optimise our wellbeing or, worse, one that, seen through to its fullest, plunges us into misery. And we've seen, in part 1 of this post, that once an AI's reward function drives it toward a certain end goal, we may have great difficulty in changing its values.
Law as a simplifying tool
As I was thinking about this problem, it struck me that we already have a set of tools for setting basic rules of conduct in the face of the complexity, diversity and evolving nature of human values. In states governed by the rule of law, we use laws to set a baseline for conduct. We don't pretend that the law is the be-all and end-all of morality and ethics, but it reduces the surface area for immoral or bad actions.
Law as a tool for limiting power
If the rule of law is good for anything, it is good for placing limits on the exercise of (dangerous) power.
Whose power? Powerful individuals. States. Corporations. Where the rule of law prevails, none of these entities is above the law, and the power of each is therefore limited.
We already use law to deal with entities that resemble AIs
This brings me to my next musing: we already use the law to manage and adjust the incentives of powerful entities that we might think of as being, themselves, artificial intelligences.
Take corporations. The law treats them as 'persons' capable of suing and being sued for their conduct. They are far more powerful, and in many respects far 'smarter', than any individual human. This is because they combine the capabilities of many individuals pooling their labour both synchronously (working on the same problem at the same time) and over time (gradually building up resources and know-how).
Then there is the market, whose 'invisible hand' allocates resources with far more subtlety and efficiency than any individual, corporation, or even state could possibly achieve.
These entities are not natural ‘beings’ in the ordinary sense (perhaps AIs aren’t either), but we certainly think of them as agents capable of acts and omissions. These persons or systems or entities or agents (or whatever you want to call them) manage informational complexity by means other than digital computation. But it seems reasonable to describe their capacity to process and generate action based on information as 'intelligence'. And that intelligence surpasses human intelligence.
Aligning incentives through law
So, to return to my point, we don't manage these entities solely by asking them to learn and apply 'human values'. Granted, we do rely to some extent on human agents within those systems being more or less aligned with broader human values, and on appeals to the moral sense of those constituents.
But we often run into collective action problems where, even if all participants act in good faith (and they don’t - there are always some bad actors), incentives and circumstances conspire to encourage bad actions.
Markets and corporations, like AIs built on reinforcement learning, also have reward systems. The reward that incentivises and directs their conduct is, in most cases, money.
When the reward systems that shape markets are misaligned with the good of society, we say that market forces have produced negative externalities.
The textbook example is pollution. Businesses reap rewards (make money) when they produce energy, manufacture goods, or use energy and resources to deliver goods - but all of this creates waste. In the ordinary course of things (without some intervening factor), their reward is not diminished when they create harmful waste, whether in the form of air pollution, landfill, or chemical wastes that contaminate waterways and soil. Most of the cost is borne by others, collectively. And so corporations have no incentive to minimise waste and plenty of incentive to keep producing it and making money.
We don't respond to this problem by lecturing the agents concerned about the harm they are causing, or by appealing to their moral sense - although of course, since humans are involved in running corporations, social and moral pressure can have some impact.
The way we create meaningful and enduring change, though, is by enacting legal reforms that internalise the costs of waste. We tax the waste. We impose fines. We treat pollution as a private harm that can be remedied through civil actions. We prohibit certain kinds of pollution outright. Polluting stops paying, and starts costing money.
So the 'reward function' driving corporate conduct is adjusted without ever having to change the nature of the reward itself. We don't ask corporations to stop trying to make money; we just change the parameters that determine when money can be made.
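To make that concrete, here is a minimal toy sketch in Python. Everything in it is invented for illustration - the prices, the tax rate, the waste figures - but it shows how a single legal parameter (a tax on waste) can redirect conduct while the reward itself, profit, stays exactly what it was.

```python
# Toy illustration only: a firm's 'reward' is profit. A pollution tax
# changes *when* money is made, not *what* the reward is. All numbers
# below are made up for the example.

def profit(units: int, waste_per_unit: float, tax_per_unit_waste: float) -> float:
    """Revenue minus production costs minus the pollution tax."""
    revenue = 10.0 * units           # hypothetical price per unit
    production_cost = 6.0 * units    # hypothetical cost per unit
    pollution_tax = tax_per_unit_waste * waste_per_unit * units
    return revenue - production_cost - pollution_tax

# With no tax, producing waste costs the firm nothing:
print(profit(100, waste_per_unit=2.0, tax_per_unit_waste=0.0))  # 400.0
# Introduce the tax and the same dirty process now loses money...
print(profit(100, waste_per_unit=2.0, tax_per_unit_waste=3.0))  # -200.0
# ...while a cleaner process keeps most of the profit:
print(profit(100, waste_per_unit=0.5, tax_per_unit_waste=3.0))  # 250.0
```

Nothing about the objective changed - the firm still maximises profit - but the legal parameter moved the maximum to a different behaviour.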
Cautious optimism
Law and the rule of law are the tools that we already use to simplify the complexity of emergent and diverse values, and align incentives of powerful agents with the good of all.
This gives me cause to think that there is promise (for addressing the alignment problem) in researching the prospect of linking AIs' reward functions to the law or the rule of law - implementing some principle or instruction such as ‘obey the law’ and/or ‘submit to the rule of law’.
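For concreteness, here is a minimal sketch of what that could mean as reward shaping. It is purely hypothetical: both `task_reward` and `law_violation_score` are assumed functions (estimating whether an action is lawful is itself a hard, unsolved problem), so this illustrates the shape of the idea, not a working alignment technique.

```python
# Hypothetical sketch: penalise an agent's reward in proportion to an
# estimated degree of unlawfulness. Neither function passed in below
# exists in any real library; both are assumptions for illustration.

def shaped_reward(state, action, task_reward, law_violation_score,
                  penalty_weight: float = 100.0) -> float:
    """Task reward minus a heavy penalty for estimated law violations."""
    base = task_reward(state, action)
    # Assume law_violation_score returns a value in [0, 1]:
    # 0.0 = clearly lawful conduct, 1.0 = clearly unlawful conduct.
    violation = law_violation_score(state, action)
    return base - penalty_weight * violation
```

Note the echo of the pollution example: the penalty weight has to be large enough that no lawful gain is worth an unlawful shortcut, just as a fine only deters when it exceeds the profit from offending.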
Of course, many powerful entities pursue ends that are misaligned with our wellbeing while staying within the bounds of the law. But at the very least, instilling respect for the rule of law seems a useful way to close the door on a very large range of harmful actions.
Another advantage of the rule of law is its capacity to separate instructions for conduct into layers of abstraction, which allows a simple instruction to encompass a complex value system. This will be the subject of my next post - part three of my reflections on the alignment problem and the rule of law.