Source: cirosantilli/cosmopedia
= Cosmopedia
* https://github.com/huggingface/cosmopedia
* https://huggingface.co/datasets/HuggingFaceTB/cosmopedia
> Cosmopedia is a dataset of synthetic textbooks, blogposts, stories, posts and WikiHow articles generated by Mixtral-8x7B-Instruct-v0.1. The dataset contains over 30 million files and 25 billion tokens, making it the largest open synthetic dataset to date.
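
To get a feel for the size figures in the quote, here is a back-of-the-envelope sketch of the average document length. It assumes the rounded numbers from the quote (30 million files, 25 billion tokens), so the result is only a rough order of magnitude, not a measured statistic of the dataset. The function name is made up for illustration.

```python
# Rounded figures taken from the quoted dataset description.
# Both are approximate: the quote says "over 30 million files".
NUM_FILES = 30_000_000
NUM_TOKENS = 25_000_000_000


def average_tokens_per_file(num_tokens: int, num_files: int) -> float:
    """Return the mean number of tokens per file (hypothetical helper)."""
    return num_tokens / num_files


avg = average_tokens_per_file(NUM_TOKENS, NUM_FILES)
print(f"roughly {avg:.0f} tokens per file")
```

With these rounded inputs the average comes out somewhere above 800 tokens per file, i.e. each generated sample is on the scale of a short article rather than a full textbook.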