 Human Compatible


 Top articles  Latest articles New article in topic

Ciro Santilli 40 Updated 2025-07-16

The key takeaway is that giving an AGI an explicit, fixed value function is a good way to destroy the world due to poor AI alignment. We are more likely to avoid destruction by creating an AI whose goal is to "do what humans want it to do", but in such a way that it does not know beforehand what humans want, and has to learn it from them. This approach appears to be known as reward modeling.
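
To make the "it has to learn from them" part concrete, here is a minimal sketch, not taken from the book, of an agent that starts out uncertain about what a human wants and updates its belief by watching the human's choices. The two hypotheses, the drink choices, and the `rationality` parameter are all made up for illustration.

```python
# Hypothetical candidate preferences the human might hold (assumed for illustration).
HYPOTHESES = ("prefers_coffee", "prefers_tea")


def likelihood(choice, hypothesis, rationality=0.9):
    """Probability that a human holding `hypothesis` makes `choice`.

    Assumes the human picks their preferred drink with probability
    `rationality` and the other drink otherwise.
    """
    preferred = "coffee" if hypothesis == "prefers_coffee" else "tea"
    return rationality if choice == preferred else 1.0 - rationality


def update(belief, choice):
    """Bayes' rule: the posterior is proportional to prior times likelihood."""
    unnormalized = {h: p * likelihood(choice, h) for h, p in belief.items()}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# The agent starts with no idea which preference is the true one.
belief = {h: 1.0 / len(HYPOTHESES) for h in HYPOTHESES}

# It then observes the human and learns from each choice.
for observed_choice in ["tea", "tea", "coffee", "tea"]:
    belief = update(belief, observed_choice)

print(belief)
```

The agent never becomes fully certain: a single "coffee" observation does not overturn the belief, but it keeps the probability of `prefers_coffee` above zero, so the agent still has a reason to defer to the human.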

Some other cool ideas:

- a big thing that is missing for AGI in the 2010s is some kind of more hierarchical representation of the continuous input data of the world, e.g.:
  - intelligence is hierarchical
  - we can group continuous things into higher-level objects, e.g. all these pixels I'm seeing in front of me are a computer, so I treat all of them as a single object in my mind
- game theory can be seen as the part of artificial intelligence that deals with scenarios where multiple intelligent agents are involved
- probability plays a crucial role in our everyday lives, even though we rarely think about it explicitly. The book gives a very good example of the cost/risk tradeoffs of planning a trip to the airport to catch a plane, e.g.:
  - should you leave 2 days in advance to be sure you'll get there?
  - should you pay for an armed escort to make sure you are not attacked on the way?
- economics, and notably the study of utility, is intrinsically linked to AI alignment
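
The airport tradeoff above can be sketched as a small expected-cost calculation. This is an illustration under invented assumptions, not a model from the book: the delay on the way is assumed to be exponentially distributed, and the cost figures (`missed_flight_cost`, `waiting_cost_per_min`, `mean_delay`) are made-up numbers.

```python
import math


def expected_cost(minutes_early, missed_flight_cost=500.0,
                  waiting_cost_per_min=0.5, mean_delay=30.0):
    """Expected cost of leaving with `minutes_early` minutes of slack.

    Assumes the travel delay is exponentially distributed with mean
    `mean_delay`, so P(miss) = P(delay > slack) = exp(-slack / mean_delay).
    All slack is counted as wasted waiting time, which is a simplification.
    """
    p_miss = math.exp(-minutes_early / mean_delay)
    return p_miss * missed_flight_cost + waiting_cost_per_min * minutes_early


# Consider every slack from 0 minutes up to 2 days, in 10 minute steps.
candidates = range(0, 2 * 24 * 60 + 1, 10)
best = min(candidates, key=expected_cost)

print(f"best slack: {best} min, expected cost: {expected_cost(best):.2f}")
print(f"no slack: {expected_cost(0):.2f}")
print(f"2 days of slack: {expected_cost(2 * 24 * 60):.2f}")
```

With these numbers the cheapest plan is a bit under two hours of slack. Leaving 2 days early drives the risk of missing the plane to practically zero, but the cost of waiting dwarfs the risk it removes, which is why nobody does it.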
