{wiki=Canterbury_corpus}

The Canterbury corpus is a collection of files used as a benchmark for lossless data compression algorithms. It was published in 1997 by Ross Arnold and Tim Bell at the University of Canterbury, New Zealand, as a replacement for the older Calgary corpus. The files were chosen to be representative of the data that compressors encounter in practice, so the collection mixes English text (fiction, a play, poetry, and technical writing) with other file types such as HTML, C and Lisp source code, a spreadsheet, a fax image, an executable, and a manual page. Compression methods are typically compared by running each of them over the same corpus files and reporting the resulting compressed sizes or compression ratios.



ID: canterbury-corpus