I tried to use every single free offline text-to-speech engine that would run on Ubuntu 24.04 without too much hassle to see if any of them sounded natural. pico2wave was the overall winner so far, but it is not perfect.
I've been noticing a gap between the "AI" SOTA and what is actually packaged well enough to be usable by a general audience.
Also played a bit more with OpenAI Whisper: askubuntu.com/questions/24059/automatically-generate-subtitles-close-caption-from-a-video-using-speech-to-text/1522895#1522895 Mind blowing performance and perfect packaging as well, kudos.