Source: cirosantilli/cosmopedia

= Cosmopedia

* https://github.com/huggingface/cosmopedia
* https://huggingface.co/datasets/HuggingFaceTB/cosmopedia

> Cosmopedia is a dataset of synthetic textbooks, blogposts, stories, posts and WikiHow articles generated by Mixtral-8x7B-Instruct-v0.1. The dataset contains over 30 million files and 25 billion tokens, making it the largest open synthetic dataset to date.