Hey, I scraped the open.mp documentation and processed it with AI, creating a dataset of around 3,600 lines. As a trial, I fine-tuned the gpt-oss-120b model on it, and the results turned out quite good for a model of this size.
Next, I’m planning to expand the dataset by processing well-structured libraries, game modes, filterscripts, and snippets in the same way. Clean and well-organized projects in particular should significantly improve the overall dataset quality.
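For anyone who wants to help prepare sources, here is a rough sketch of how a project folder could be turned into raw records before the AI processing step. It is not the pipeline used for the dataset: the `.pwn`/`.inc` extension filter, the `min_lines` cutoff, the record fields (`source`, `kind`, `code`), and the function names are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path

# Assumption: Pawn scripts end in .pwn and includes end in .inc.
PAWN_EXTENSIONS = {".pwn", ".inc"}


def collect_pawn_records(root, min_lines=5):
    """Walk `root` and return one record per unique Pawn source file.

    The record layout is hypothetical, not the schema of the published dataset.
    """
    root = Path(root)
    records = []
    seen_hashes = set()
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in PAWN_EXTENSIONS:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        # Skip tiny stubs that carry little signal (threshold is arbitrary).
        if len(text.splitlines()) < min_lines:
            continue
        # Skip byte-identical copies of a file already collected.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue
        seen_hashes.add(digest)
        records.append({
            "source": path.relative_to(root).as_posix(),
            "kind": "include" if path.suffix.lower() == ".inc" else "script",
            "code": text,
        })
    return records


def write_jsonl(records, out_path):
    """Write one JSON object per line."""
    with open(out_path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Calling `write_jsonl(collect_pawn_records("path/to/gamemode"), "raw.jsonl")` would give one line per file. Dropping exact duplicates up front matters because many game modes bundle copies of the same includes.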
If you know any game modes or filterscripts with solid, reliable code architecture, feel free to share them here. I’m aiming to make the dataset more robust and diverse.
Dataset: https://huggingface.co/datasets/yeatdev/openmp-dataset