An impossible AI-complete dream!
It is impossible to understand speech, and to act meaningfully on it, without understanding what is being talked about.
And without doubt, "understanding what is being talked about" comes down to understanding (efficiently representing) the geometry of the 3D world with a time component.
That understanding cannot come from hearing sounds alone.

Articles by others on the same topic (1)

Natural Language Processing (NLP) is a subfield of artificial intelligence (AI) and computer science focused on the interaction between computers and human (natural) languages. The goal of NLP is to enable machines to understand, interpret, and respond to human language in a way that is both meaningful and useful. NLP incorporates techniques from various disciplines, including linguistics, computer science, and machine learning.