Transform Wikipedia articles into high-quality AI training datasets with automatic subtitle detection, bulk processing, and dual format support.
WikiAI Converter is a powerful, free, and open-source tool that converts Wikipedia articles into structured JSONL (JSON Lines) format for AI model training. Perfect for creating instruction-following datasets, chat training data, and fine-tuning language models.
β¨ Features
π§ Core Functionality
π― Multiple Input Methods: Single URL, bulk URL processing, or HTML file upload
π Dual Format Support: Instruction format (Alpaca/Vicuna) and Chat format (ChatGPT/Claude)
π·οΈ Automatic Subtitle Detection: Intelligently combines main title with subsections
π§Ή Smart Text Cleaning: Removes citations, edit links, and normalizes whitespace
π¦ Bulk Processing: Process multiple Wikipedia URLs simultaneously
ποΈ ZIP Archive Support: Download multiple files in a convenient ZIP package
π Advanced Features
π Metadata Enrichment:
Source URL tracking
Language detection
Category extraction
Publication dates
Author information
Page type classification
Extraction timestamps
βοΈ Configurable Processing:
Include/exclude list items
Citation removal options
Edit link filtering
Custom title override
π Modern UI: Dark/light theme with responsive design
π Live Preview: See converted content before processing
π Version History
v3.0 (Current)
β¨ Multi-URL bulk processing
π Metadata enrichment options
π Dark/light theme support
π¦ ZIP archive downloads
π¨ Modern responsive UI
v2.0
π― Dual format support (Instruction/Chat)
π§Ή Enhanced text cleaning
π Live preview functionality
v1.0
π Initial release
π Basic Wikipedia to JSONL conversion
π§ File upload support