Pipeline Stages
Lines of Python
Cost Per Video
Human Review Time
What It Does
A self-feeding AI video factory for a personal finance YouTube channel. The topic engine runs weekly: it researches trending queries, scores them against YouTube demand and competition data, and auto-queues the best ones. For each queued topic, Claude generates a script with integrated visual specifications, local TTS creates the voiceover, Manim and Pillow render animated visuals, and FFmpeg assembles everything. A human then spends about 10 minutes verifying claims via Telegram before the YouTube upload. Total cost is about $0.10 per video.
Pipeline Architecture
9-stage state machine with resumable checkpoints:
- Topic Engine - Research trends, score demand vs competition, auto-queue
- Script Generation - Claude AI writes scene-by-scene script with visual specs + data calculations
- Voiceover - Kokoro ONNX (local, free) or Edge TTS per scene
- Visual Generation - Manim animations (charts, tables) + Pillow text cards
- B-Roll - Pexels API fills gaps between AI-generated visuals
- Background Music - Selection + volume ducking under voiceover
- Assembly - FFmpeg concatenation with audio-video sync
- Review - Telegram bot sends preview + claims for human verification
- Upload - YouTube API with metadata, thumbnail, scheduling
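The background music and assembly stages can be illustrated with a small command builder. This is a minimal sketch, not the project's actual code: the function name `build_ducking_command`, the file names, and every numeric setting are assumptions. It uses FFmpeg's `sidechaincompress` filter so the voiceover drives compression of the music track, which is one common way to get ducking.

```python
from typing import List


def build_ducking_command(voice_path: str, music_path: str, out_path: str,
                          music_gain: float = 0.25) -> List[str]:
    """Build an ffmpeg argv that ducks music under a voiceover.

    All thresholds and gains are illustrative guesses, not tuned values.
    """
    filter_graph = (
        # Split the voiceover: one copy is heard, one drives the compressor.
        "[0:a]asplit=2[voice][sidechain];"
        # Lower the music bed before compression.
        f"[1:a]volume={music_gain}[music];"
        # Compress the music whenever the voiceover is above the threshold.
        "[music][sidechain]sidechaincompress="
        "threshold=0.05:ratio=8:attack=20:release=300[ducked];"
        # Mix voice and ducked music; stop when the voiceover ends.
        # Note: amix scales its inputs by default, so levels may need adjusting.
        "[voice][ducked]amix=inputs=2:duration=first[mix]"
    )
    return [
        "ffmpeg", "-y",
        "-i", voice_path,
        "-i", music_path,
        "-filter_complex", filter_graph,
        "-map", "[mix]",
        out_path,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_ducking_command("scene_01.wav", "bed.mp3", "scene_01_mix.wav")))
```

Returning the argument list rather than a shell string keeps paths with spaces safe when the list is passed to `subprocess.run`.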
Key Technical Highlights
Topics are scored on several dimensions: YouTube search demand, competition analysis, and evergreen potential. Channel context injection avoids duplicate topics.
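One way to picture this kind of scoring is a weighted sum over normalized signals. The sketch below is an assumption throughout: the `TopicSignals` fields, the weights, and the title-based duplicate filter are all hypothetical, and the filter is a simplified stand-in for channel context injection, whose real mechanism the source does not describe.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TopicSignals:
    title: str
    demand: float       # 0..1, normalized search demand (assumed scale)
    competition: float  # 0..1, higher means a more crowded topic
    evergreen: float    # 0..1, how long the topic stays relevant


# Hypothetical weights; the real pipeline's weighting is not given in the source.
WEIGHTS = {"demand": 0.5, "competition": 0.3, "evergreen": 0.2}


def score_topic(topic: TopicSignals) -> float:
    """Weighted sum that rewards demand and evergreen value, penalizes competition."""
    return (
        WEIGHTS["demand"] * topic.demand
        + WEIGHTS["competition"] * (1.0 - topic.competition)
        + WEIGHTS["evergreen"] * topic.evergreen
    )


def normalize_title(title: str) -> str:
    """Lowercase and collapse whitespace so near-identical titles compare equal."""
    return " ".join(title.lower().split())


def rank_topics(candidates: Iterable[TopicSignals],
                covered_titles: Iterable[str],
                limit: int = 3) -> List[TopicSignals]:
    """Drop topics the channel already covered, then return the best-scoring ones."""
    covered = {normalize_title(t) for t in covered_titles}
    fresh = [c for c in candidates if normalize_title(c.title) not in covered]
    return sorted(fresh, key=score_topic, reverse=True)[:limit]
```

With these weights, a moderately searched topic with little competition can outrank a heavily searched but crowded one.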
Pipeline progress is tracked as 13 fine-grained states in SQLite, so a run can resume from any failed stage without re-running earlier work. Deterministic slug generation enables caching.
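The resume behavior can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's schema: the four stage names stand in for the 13 states (which the source does not list), and the `videos` table, `slugify`, and `run_pipeline` are hypothetical names.

```python
import hashlib
import re
import sqlite3
from typing import Callable, Dict

# Hypothetical subset of stages; the real pipeline tracks 13 states.
STAGES = ["script", "voiceover", "visuals", "assembly"]


def slugify(topic: str) -> str:
    """Deterministic slug: the same topic always maps to the same cache key."""
    base = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    digest = hashlib.sha1(topic.encode("utf-8")).hexdigest()[:8]
    return f"{base}-{digest}"


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS videos ("
        "slug TEXT PRIMARY KEY, done INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def run_pipeline(conn: sqlite3.Connection, topic: str,
                 handlers: Dict[str, Callable[[str], None]]) -> str:
    """Run the remaining stages for a topic, checkpointing after each one."""
    slug = slugify(topic)
    conn.execute("INSERT OR IGNORE INTO videos (slug) VALUES (?)", (slug,))
    conn.commit()
    (done,) = conn.execute(
        "SELECT done FROM videos WHERE slug = ?", (slug,)
    ).fetchone()
    for index in range(done, len(STAGES)):
        # If a handler raises, the checkpoint still points at this stage,
        # so the next call retries it without redoing earlier stages.
        handlers[STAGES[index]](slug)
        conn.execute(
            "UPDATE videos SET done = ? WHERE slug = ?", (index + 1, slug)
        )
        conn.commit()
    return slug
```

Calling `run_pipeline` again after a failure picks up at the stage that raised, because the `done` counter is only advanced after a stage returns successfully.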
Manim renders animated charts and tables; Pillow renders text cards. Placeholders are substituted dynamically, with auto-detection of Indian lakh-scale amounts. Visual durations are synced to the audio.
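A rough sketch of placeholder substitution with lakh-scale detection follows. The `{name}` placeholder syntax, the `Rs` prefix, the two-decimal rounding, and the crore tier are all assumptions added for illustration; the source only mentions lakh-scale auto-detection.

```python
import re
from typing import Mapping

LAKH = 100_000        # 1 lakh = 1,00,000
CRORE = 10_000_000    # 1 crore = 100 lakh (extension beyond the source)


def format_inr(amount: float) -> str:
    """Pick a scale from the magnitude: plain rupees, lakh, or crore."""
    if amount >= CRORE:
        return f"Rs {amount / CRORE:.2f} crore"
    if amount >= LAKH:
        return f"Rs {amount / LAKH:.2f} lakh"
    return f"Rs {amount:,.0f}"


# Hypothetical placeholder syntax: {name}
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def fill_placeholders(template: str, values: Mapping[str, float]) -> str:
    """Replace each known {name} with its formatted amount; leave unknown ones as-is."""
    def replace(match: "re.Match[str]") -> str:
        key = match.group(1)
        if key not in values:
            return match.group(0)
        return format_inr(values[key])

    return PLACEHOLDER.sub(replace, template)


if __name__ == "__main__":
    text = "Invest {sip} monthly to reach {corpus}"
    print(fill_placeholders(text, {"sip": 5000, "corpus": 1234567}))
```

Leaving unknown placeholders untouched makes missing values easy to spot during human review instead of silently rendering an empty string.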
TTS runs locally (Kokoro ONNX, ~80MB model, no GPU), so only the Claude API has an actual cost. B-roll is fetched selectively. Total: ~$0.08-0.15 per video.
Tech Stack
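- Python - Core implementation
- Claude - Script generation
- Kokoro ONNX - Local text-to-speech
- Manim + Pillow - Visual rendering
- FFmpeg - Video assembly
- SQLite - Pipeline state
- Telegram - Async review
- YouTube API - Upload + metadata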