Project Gutenberg remove line breaks by Ciro Santilli
Their txt formats are so crap: every paragraph is hard-wrapped, with a line break at the end of each line!
E.g. for:
wget -O pap.txt https://www.gutenberg.org/ebooks/1342.txt.utf-8
a good command to remove the line breaks is:
perl -0777 -pe 's/(?<!\r\n)\r\n(?!\r\n)( +)?/ /g' pap.txt
The ( +)? is for the endlessly many quoted letters they have, which use four leading spaces per line as a quote marker.