AI System
BitcoinWiki Content Migration
AI-powered pipeline that migrated and rewrote hundreds of articles from WordPress to MediaWiki.
3 processing modes
~$0.03 per article
Overview
An automated pipeline for migrating content from WordPress to MediaWiki. The system scrapes articles, processes them with AI (rewriting, translating, formatting), handles citations, and publishes to MediaWiki, all driven from a CLI with progress tracking. Three processing modes, selected by article length, handle articles of any size.
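A minimal sketch of how a length-based router could choose between the three modes. It assumes the word-count thresholds listed under Key Features (under 500, 500-3000, 3000+) and a plain whitespace word count. `Mode`, `select_mode` and the boundary handling at exactly 500 and 3000 words are hypothetical, not the project's actual code.

```python
from enum import Enum


class Mode(Enum):
    REWRITE = "rewrite"  # short: rewrite and expand
    CONVERT = "convert"  # medium: convert format, keep length
    CHUNK = "chunk"      # long: process in overlapping chunks


# Thresholds from the Key Features list. Treating exactly 500 words as
# CONVERT and exactly 3000 as CHUNK is an assumption about the boundaries.
REWRITE_BELOW = 500
CHUNK_FROM = 3000


def count_words(text: str) -> int:
    """Approximate word count: whitespace-separated tokens."""
    return len(text.split())


def select_mode(text: str) -> Mode:
    """Route an article to a processing mode by its length."""
    words = count_words(text)
    if words < REWRITE_BELOW:
        return Mode.REWRITE
    if words < CHUNK_FROM:
        return Mode.CONVERT
    return Mode.CHUNK


if __name__ == "__main__":
    print(select_mode("word " * 120).value)   # rewrite
    print(select_mode("word " * 1500).value)  # convert
    print(select_mode("word " * 5000).value)  # chunk
```

Routing on a cheap local word count means the mode is decided before any paid model call is made.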
Problems Solved
- Manual migration of hundreds of articles was impractical
- Articles needed format conversion (HTML to wikitext)
- Short articles needed expansion to meet quality standards
- Long articles exceeded LLM context windows
- Citations needed proper MediaWiki formatting
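The context-window problem is commonly handled by splitting long text into overlapping chunks, which matches the "Chunking (3000+ words with context overlap)" mode listed under Key Features. A minimal sketch, assuming word-based chunking (a production pipeline would more likely count tokens) and illustrative sizes of 1,500 words with 200 words of overlap; `chunk_words` is a hypothetical helper.

```python
def chunk_words(text: str, chunk_size: int = 1500, overlap: int = 200) -> list[str]:
    """Split text into word-based chunks that overlap by `overlap` words.

    Each chunk after the first starts with the last `overlap` words of the
    previous chunk, giving the model some context from what came before.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    text = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_words(text, chunk_size=4, overlap=1):
        print(chunk)
    # w0 w1 w2 w3
    # w3 w4 w5 w6
    # w6 w7 w8 w9
```

The overlap trades a little duplicated input for continuity at chunk boundaries; when the rewritten chunks are stitched back together, the repeated region would have to be trimmed or reconciled.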
Architecture
Four-service pipeline: Scraper (WordPress REST API), Rewriter (LLMs via OpenRouter, with 3-tier mode selection), Text Prep (normalization, link checking, citation processing), and Publisher (MediaWiki/WordPress API). A SQLite database tracks each article's state as it moves through the pipeline.
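A minimal sketch of SQLite-backed state tracking, assuming one status column that advances through one stage per service. The table layout, the stage names (`scraped`, `rewritten`, `prepped`, `published`) and the helper functions are all hypothetical; only the use of SQLite comes from the description above.

```python
import sqlite3

# Hypothetical stage names, one per service in the pipeline.
STAGES = ("scraped", "rewritten", "prepped", "published")


def init_db(conn: sqlite3.Connection) -> None:
    """Create the article-tracking table if it does not exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'scraped')"
    )
    conn.commit()


def add_article(conn: sqlite3.Connection, title: str) -> int:
    """Register a freshly scraped article and return its row id."""
    cur = conn.execute("INSERT INTO articles (title) VALUES (?)", (title,))
    conn.commit()
    return cur.lastrowid


def advance(conn: sqlite3.Connection, article_id: int) -> str:
    """Move an article to the next stage and return its new status."""
    row = conn.execute(
        "SELECT status FROM articles WHERE id = ?", (article_id,)
    ).fetchone()
    if row is None:
        raise KeyError(article_id)
    idx = STAGES.index(row[0])
    if idx == len(STAGES) - 1:
        raise ValueError(f"article {article_id} is already published")
    new_status = STAGES[idx + 1]
    conn.execute(
        "UPDATE articles SET status = ? WHERE id = ?", (new_status, article_id)
    )
    conn.commit()
    return new_status


def at_stage(conn: sqlite3.Connection, status: str) -> list[int]:
    """IDs currently waiting at a stage, so an interrupted batch can resume."""
    rows = conn.execute(
        "SELECT id FROM articles WHERE status = ? ORDER BY id", (status,)
    ).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    a = add_article(conn, "Proof of work")
    b = add_article(conn, "Lightning Network")
    print(advance(conn, a))           # rewritten
    print(at_stage(conn, "scraped"))  # [2]
```

Persisting the stage after every step means a batch that crashes halfway can pick up from `at_stage(...)` instead of re-scraping or re-paying for finished work.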
Key Features
- Three-tier processing: Rewrite (<500 words, expands to 1100+), Convert (500-3000 words), Chunking (3000+ words with context overlap)
- Automatic language detection and English translation
- Two-stage citation pipeline: LLM generates inline markers, text prep converts to proper ref tags
- AI content detection and humanization
- Broken link detection and cleanup
- Batch processing with progress bars
- Dry-run mode for validation before publishing
- Full cost tracking per article
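The second stage of the citation pipeline can be sketched as a regex pass that turns model-emitted markers into MediaWiki `<ref>` tags. This assumes a hypothetical marker format `[CITE:n]` and a numbered source list collected elsewhere; the project's real marker syntax is not stated. The output uses standard Cite extension syntax: a full named `<ref>` on first use, the self-closing `<ref name="..." />` form on reuse, and a `<references />` tag to render the list.

```python
import re

# Hypothetical marker format the LLM is prompted to emit, e.g. "[CITE:2]".
MARKER = re.compile(r"\[CITE:(\d+)\]")


def markers_to_refs(text: str, sources: dict[int, str]) -> str:
    """Replace inline citation markers with MediaWiki <ref> tags.

    First use of a source gets a full named ref; later uses reuse it with
    the self-closing form. Markers pointing at unknown sources are dropped.
    """
    seen: set[int] = set()

    def to_ref(match: re.Match) -> str:
        n = int(match.group(1))
        if n not in sources:
            return ""  # drop citations the model invented
        name = f"src{n}"
        if n in seen:
            return f'<ref name="{name}" />'
        seen.add(n)
        return f'<ref name="{name}">{sources[n]}</ref>'

    body = MARKER.sub(to_ref, text)
    if seen and "<references" not in body:
        body += "\n\n== References ==\n<references />"
    return body


if __name__ == "__main__":
    src = {1: "https://example.org/a", 2: "https://example.org/b"}
    print(markers_to_refs("One.[CITE:1] Two.[CITE:2] Three.[CITE:1][CITE:9]", src))
```

Keeping the model's job to placing short markers, and doing the tag syntax deterministically, avoids relying on the LLM to produce well-formed wikitext.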
Results
- Small articles (under 500 words): ~$0.03-0.08, processed in 30-45 seconds
- Medium articles (500-3000 words): ~$0.02-0.05, processed in 45-60 seconds
- Large articles (3000+ words): ~$0.50-1.00, processed in 10-15 minutes
- 50+ Python files, ~16,700 lines of code
- Complete pipeline from WordPress scrape to MediaWiki publish
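Per-article cost figures like these can be produced by summing token usage over every model call an article makes. A minimal sketch, assuming per-million-token pricing and that prompt and completion token counts are available for each call; `CostTracker` is hypothetical, and the prices in the example are illustrative, not any particular model's rates.

```python
from dataclasses import dataclass, field


@dataclass
class CostTracker:
    """Accumulates LLM spend for one article across several model calls."""

    input_usd_per_m: float   # price per 1M prompt tokens (illustrative)
    output_usd_per_m: float  # price per 1M completion tokens (illustrative)
    calls: list[tuple[int, int]] = field(default_factory=list)

    def cost_of(self, prompt_tokens: int, completion_tokens: int) -> float:
        """USD cost of a single call."""
        return (
            prompt_tokens * self.input_usd_per_m
            + completion_tokens * self.output_usd_per_m
        ) / 1_000_000

    def record(self, prompt_tokens: int, completion_tokens: int) -> float:
        """Log one call's token usage and return that call's cost."""
        self.calls.append((prompt_tokens, completion_tokens))
        return self.cost_of(prompt_tokens, completion_tokens)

    @property
    def total_usd(self) -> float:
        """Sum of all recorded calls for this article."""
        return sum(self.cost_of(p, c) for p, c in self.calls)


if __name__ == "__main__":
    tracker = CostTracker(input_usd_per_m=3.0, output_usd_per_m=15.0)
    tracker.record(4000, 1500)  # e.g. a rewrite call
    tracker.record(2000, 500)   # e.g. a follow-up formatting call
    print(f"${tracker.total_usd:.4f}")  # $0.0480
```

Recording each call separately, rather than estimating once per article, keeps chunked articles (which make one call per chunk) accounted for correctly.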
Stack