An impossible AI-complete dream!
It is impossible to understand speech, and take meaningful actions from it, if you don't understand what is being talked about.
And without doubt, "understanding what is being talked about" comes down to understanding (efficiently representing) the geometry of the 3D world with a time component.
That understanding cannot come from hearing sounds alone.
