Source: cirosantilli/human-compatible

= Human Compatible
{c}
{tag=AI alignment}
{tag=Good book}
{tag=Implications of AGI}
{title2=2019}
{title2=Stuart Russell}
{wiki=Human_compatible}

= Human Compatible by Stuart J. Russell (2019)
{synonym}

The key takeaway is that setting an explicit <value function> for an <AGI> entity is a good way to destroy the world due to poor <AI alignment>. We are more likely to avoid destroying it by creating an AI whose goal is to "do what humans want it to do", but in such a way that it does not know beforehand what it is that humans want, and has to learn that from them. This approach appears to be known as <reward modeling>.
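
As a minimal illustration of the idea (not from the book, and with made-up hypotheses, options and numbers), the sketch below has an agent that starts out uncertain about the human's reward function and narrows its belief down by observing human choices, rather than being handed a fixed objective:

``
# Minimal sketch of learning what the human wants from their choices,
# rather than being given a fixed objective. All hypotheses, options
# and numbers are invented for illustration.

# Candidate hypotheses about the human's reward for each outcome.
hypotheses = {
    "likes_tea":    {"tea": 1.0, "coffee": 0.0},
    "likes_coffee": {"tea": 0.0, "coffee": 1.0},
}

# Uniform prior: the agent does not know beforehand which hypothesis is true.
belief = {name: 1.0 / len(hypotheses) for name in hypotheses}

def observe_human_choice(chosen, rejected, noise=0.1):
    """Bayesian update assuming a mostly rational human picks the higher-reward option."""
    global belief
    for name, reward in hypotheses.items():
        likelihood = 1.0 - noise if reward[chosen] >= reward[rejected] else noise
        belief[name] *= likelihood
    total = sum(belief.values())
    belief = {name: p / total for name, p in belief.items()}

# The human keeps picking tea over coffee, so probability mass shifts
# toward the "likes_tea" hypothesis.
for _ in range(3):
    observe_human_choice(chosen="tea", rejected="coffee")
print(belief)
``

In <reward modeling> proper the reward function is a learned model trained on human feedback rather than a handful of fixed hypotheses, but the basic loop is the same: act, observe the human, and refine the estimate of what they actually want.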

Some other cool ideas:
* a big thing that is missing for <AGI> in the 2010s is some kind of more hierarchical representation of the continuous input data of the world, e.g.:
  * <intelligence is hierarchical>
  * we can group continuous things into higher-level objects, e.g. all these pixels I'm seeing in front of me are a computer, so I treat all of them as a single object in my mind
* <game theory> can be seen as the part of <artificial intelligence> that deals with scenarios where multiple intelligent agents are involved
* <probability> plays a crucial role in our everyday lives, even though we rarely think about it explicitly. He gives a very good example of the cost/risk tradeoffs of planning a trip to the airport to catch a plane (see the toy expected-cost calculation after this list), e.g.:
  * should you leave 2 days in advance to be sure you'll get there?
  * should you pay an armed escort to make sure you are not attacked on the way?
* <economy>, and notably the study of <utility>, is intrinsically linked to <AI alignment>
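
To make the airport tradeoff concrete, here is a toy expected-cost calculation in the same spirit; all probabilities and dollar values are invented purely for illustration:

``
# Toy expected-cost comparison for the "when to leave for the airport" decision.
# The probabilities and costs below are made up purely for illustration.
cost_of_missed_flight = 1000  # dollars

departure_options = {
    # option: (probability of missing the flight, cost of the time spent waiting)
    "leave 90 minutes early": (0.05,  20),
    "leave 4 hours early":    (0.001, 80),
    "leave 2 days early":     (0.0,   2000),
}

for option, (p_miss, waiting_cost) in departure_options.items():
    expected_cost = p_miss * cost_of_missed_flight + waiting_cost
    print(f"{option}: expected cost = ${expected_cost:.2f}")
``

Leaving absurdly early drives the probability of missing the flight to zero, but the expected total cost explodes, which is why we all implicitly settle for something in the middle.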