Video Podcast Maker

Name: Video Podcast Maker
Author: Agents365-ai

By Agents365-ai · Agents365-ai/video-podcast-maker

Turn a topic into a narrated 4K explainer video with script, voiceover, and music.

Installation

Make sure Claude Code is installed and available in your terminal.
Skills load from ~/.claude/skills/ when Claude Code starts up, so Claude Code must be installed first. If you don't have it yet, install it once with the command below, then run claude in any terminal to verify.
One-time setup
```
npm i -g @anthropic-ai/claude-code
```
Already have it? Skip ahead.

Paste into Claude Code or into your terminal.

Install

git clone ht••••••••••••••••••••••••••••••••••••••••••••••••••••• •••••••••••••••••••••••••••••••••••••• •• ••••• •• ••••••••••••••••••••••••••••••••••••••••••••••••• •• •• •• ••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••• ••••••••••••••••••••••••••••••••••••••••••••••••••

This copies the whole skill folder into ~/.claude/skills/video-podcast-maker-agents365-ai/ — the SKILL.md plus any scripts, reference docs, or templates the skill ships with. Safe default: works for every skill.

Faster alternative (instruction-only skills)

Skips the clone and grabs only the SKILL.md file. Don't use this if the skill ships Python scripts, reference markdowns, or asset templates — they won't be downloaded and the skill will fail when it tries to load them.

Quick install (SKILL.md only)

mkdir -p ~/.•••••••••••••••••••••••••••••••••••••••••••••• •• •••• ••••• ••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••• •• ••••••••••••••••••••••••••••••••••••••••••••••••••••••••••

Restart Claude Code.
Quit and reopen Claude Code (or any other agent that loads from ~/.claude/skills/). New skills are picked up on startup.
Just ask Claude.
Skills auto-activate when your request matches the skill's description, so no slash command is needed. Trigger phrases live in the skill's own frontmatter; you can read them in the “What this skill does” section below.


When Claude uses it

Use when the user gives a topic and wants an automated topic-driven narrated explainer, podcast, or knowledge-summary video (Bilibili / YouTube / Xiaohongshu / Douyin / WeChat Channels), or asks to learn visual design patterns from a reference video/image. Trigger when the user mentions creating a knowledge video, narrated explainer, video podcast, or talking-head topic video from a topic — even if they don't say "video podcast" explicitly. Also trigger when the user wants to regenerate, re-render, rebuild, update, or iterate on a narrated video this skill already produced — e.g. they edited the script/prompt, changed the visuals, or swapped the background music and want the final video remade (reuse the existing videos/{name}/ directory, never start a new project). Do NOT trigger for generic video editing, trimming, format conversion, color grading, or non-narrative video tasks. Produces 4K video via research → script → TTS → Remotion → MP4 + BGM.

What this skill does

REQUIRED: Load Remotion Best Practices First

This skill depends on remotion-best-practices. You MUST invoke it before proceeding:
Invoke the skill/tool named: remotion-best-practices

Video Podcast Maker

Automated pipeline for 4K Bilibili horizontal knowledge videos from a topic. Coding agent + TTS backend + Remotion + FFmpeg.

Contents

Bootstrap — update check + prerequisites (run before Step 1)
Execution Modes — Auto vs Interactive, default decisions
Regenerating an Existing Video — reuse videos/{name}/ to iterate on a finished video
Workflow — the 15 steps + phase-file pointers + mandatory stops
Hard Rules — non-negotiable production constraints + output specs
Per-Video Layout — directory structure, --public-dir, naming
Additional Resources — when to load each references/ file
User Preferences
Troubleshooting

Bootstrap

Resolve SKILL_DIR to the directory containing this SKILL.md. If your agent exposes a built-in skill directory variable (e.g. ${CLAUDE_SKILL_DIR}), map it to SKILL_DIR.

SKILL_DIR="${SKILL_DIR:-${CLAUDE_SKILL_DIR}}"

# 1. Update check (notify-only, throttled to 24h)
"${SKILL_DIR}/scripts/check_update.sh"

# 2. Prerequisites (CLIs + backend env vars)
python3 "${SKILL_DIR}/scripts/check_prereqs.py"

check_update.sh output:

UPDATE_AVAILABLE vX.Y.Z -> vA.B.C — tell the user the version delta and ask before running git -C "${SKILL_DIR}" pull --ff-only. Notify-only by design — never pull without consent (the skill directory belongs to the user).
UP_TO_DATE / SKIPPED_RECENT_CHECK / MANUAL_INSTALL — continue silently.
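
The branching on the check_update.sh output could be sketched as below. This is a minimal illustration, assuming the script prints a single status line in the formats listed above; `classify_update_status` and the returned dictionary shape are hypothetical names, not part of the skill.

```python
import re

# Statuses the skill says to continue past silently.
SILENT_STATUSES = {"UP_TO_DATE", "SKIPPED_RECENT_CHECK", "MANUAL_INSTALL"}

# Assumed line shape: "UPDATE_AVAILABLE vX.Y.Z -> vA.B.C"
UPDATE_RE = re.compile(r"^UPDATE_AVAILABLE\s+(\S+)\s+->\s+(\S+)$")


def classify_update_status(line: str) -> dict:
    """Map one status line to an action: 'ask_user', 'continue', or 'unknown'."""
    line = line.strip()
    match = UPDATE_RE.match(line)
    if match:
        # Never pull automatically; only surface the version delta and ask.
        return {
            "action": "ask_user",
            "current": match.group(1),
            "latest": match.group(2),
        }
    if line in SILENT_STATUSES:
        return {"action": "continue"}
    # Anything else is unexpected; this sketch leaves the decision to the caller.
    return {"action": "unknown", "raw": line}
```

Only the `ask_user` branch should ever lead to a `git pull`, and only after the user agrees.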

Prereqs failures — see README.md for setup. The check is backend-aware (resolves TTS_BACKEND env → user_prefs.json global.tts.backend → edge default), so only env vars required by the active backend are validated.
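
The backend resolution order described above (environment variable, then preferences file, then the edge default) might look roughly like this. It is a sketch, not the code in check_prereqs.py: the function names are hypothetical, and the nesting of `global.tts.backend` inside user_prefs.json is assumed from the dotted path in the text.

```python
import json
from pathlib import Path


def load_prefs(path="user_prefs.json"):
    """Read the preferences file; return {} if it is missing or unreadable."""
    file = Path(path)
    if not file.is_file():
        return {}
    try:
        data = json.loads(file.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return {}
    return data if isinstance(data, dict) else {}


def resolve_tts_backend(env, prefs):
    """Return the active backend: TTS_BACKEND, then prefs, then 'edge'."""
    backend = env.get("TTS_BACKEND")
    if backend:
        return backend
    # Walk global -> tts -> backend, tolerating missing or malformed levels.
    node = prefs
    for key in ("global", "tts", "backend"):
        node = node.get(key) if isinstance(node, dict) else None
    if isinstance(node, str) and node:
        return node
    return "edge"
```

A caller would pass `os.environ` and `load_prefs()`; keeping the resolver free of I/O makes the precedence easy to check in isolation.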

Design Learning shortcut: If the user provides a reference video/image or asks to save/list/delete style profiles, see references/design-learning.md instead of running the workflow below.

Execution Modes

Detect at workflow start:

"Make a video about..." / no special instructions → Auto Mode (default)
"I want to control each step" / "interactive" → Interactive Mode

Auto Mode defaults

Full pipeline with sensible defaults. Mandatory stop at Step 9 (Studio review); Step 10 (4K render) only fires when the user says "render 4K" / "render final".

| Step | Decision | Auto Default |
| --- | --- | --- |
| 3 | Title position | top-center |
| 5 | Media assets | Skip (text-only animations) |
| 7 | Thumbnail method | Remotion-generated (16:9 + 4:3) |
| 9 | Outro animation | Pre-made MP4 (white/black by theme) |
| 12 | Subtitle method | Remotion-native (skip legacy FFmpeg burn) |
| 14 | Cleanup | Auto-clean temp files |

Override any default in the initial request:

"make a video about AI, burn subtitles" → auto + subtitles on
"use dark theme, AI thumbnails" → auto + dark + imagen
"need screenshots" → auto + media collection enabled

Interactive Mode

Prompts at each decision point.

Regenerating an Existing Video

If videos/{name}/ already exists and the user is iterating on a finished or in-progress video — "regenerate", "re-render", "rebuild", "I edited the script/prompt", "update the video", "change the BGM" — reuse that directory. Do NOT start a new project or a new videos/{newname}/; that is the Single Project rule applied to iteration, and starting fresh is the most common mistake here.

Pick the smallest re-run for what actually changed. Every command targets the same videos/{name}/, and every Remotion command keeps --public-dir videos/{name}/:

| Changed | Re-run | Reuses (don't redo) |
| --- | --- | --- |
| Narration script (`podcast.txt`) | Step 8 (`generate_tts.py --output-dir videos/{name}`) → Step 10 render → Step 11 BGM | topic research + section design |
| Visuals only (components, layout, colors, props) | Step 10 render | `podcast_audio.wav` / `timing.json` (audio unchanged) |
| Background music only | Step 11 mix | `output.mp4` (no re-render) |
| Subtitles only | Step 12 | `output.mp4` / `video_with_bgm.mp4` |

A script change shifts every downstream timestamp, so always regenerate timing.json through TTS — never hand-edit it (see Audio-Master Clock). After any re-run, re-verify:

python3 ${SKILL_DIR}/scripts/verify_output.py videos/{name}/

Cleanup only removes TTS temp files, never output.mp4 / video_with_bgm.mp4 — so BGM/subtitle re-runs avoid a full ~8-min re-render.
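
The "smallest re-run" rule can be pictured as a lookup from what changed to which steps to repeat. The sketch below only transcribes the step numbers from the table in this section; the change labels (`script`, `visuals`, `bgm`, `subtitles`) and the function name are hypothetical.

```python
# Steps to re-run per kind of change, transcribed from the regeneration table.
# The keys are labels invented for this sketch.
RERUN_PLAN = {
    "script": [8, 10, 11],   # TTS, then render, then BGM mix
    "visuals": [10],         # render only; audio and timing.json are reused
    "bgm": [11],             # mix only; output.mp4 is reused
    "subtitles": [12],       # finalize only
}


def plan_rerun(changes):
    """Return the ordered, de-duplicated step numbers for a set of changes."""
    steps = set()
    for change in changes:
        if change not in RERUN_PLAN:
            raise ValueError(f"unknown change type: {change!r}")
        steps.update(RERUN_PLAN[change])
    return sorted(steps)
```

For example, `plan_rerun(["script", "bgm"])` yields `[8, 10, 11]`; whatever the plan, verify_output.py still runs afterwards.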

Workflow

Iterating on a finished video? If videos/{name}/ already exists and the user wants to regenerate after a change, do NOT start at Step 1 — see Regenerating an Existing Video for the minimal re-run.

At Step 1 start, create one task per step in your agent's tracker (Claude Code TaskCreate / Codex todo list / equivalent). Mark in_progress on start, completed on finish. Files in videos/{name}/ are the durable record — if interrupted, inspect the directory to determine where to resume.
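
Because the files in videos/{name}/ are the durable record, a resume point can be inferred from which outputs already exist. The following is a rough sketch that uses the output filenames from the workflow table in this section; it skips steps with no file output or with optional outputs, and the function names are hypothetical.

```python
from pathlib import Path

# (step, required files) pairs taken from the workflow table. Steps without a
# file output, or with optional outputs, are left out of this sketch.
STEP_OUTPUTS = [
    (1, ["topic_definition.md"]),
    (2, ["topic_research.md"]),
    (4, ["podcast.txt"]),
    (8, ["podcast_audio.wav", "timing.json"]),
    (10, ["output.mp4"]),
    (11, ["video_with_bgm.mp4"]),
    (12, ["final_video.mp4"]),
]


def first_incomplete_step(present):
    """Return the first listed step whose outputs are missing, else None."""
    for step, files in STEP_OUTPUTS:
        if not all(name in present for name in files):
            return step
    return None


def resume_point(video_dir):
    """Inspect a per-video directory and suggest where to resume."""
    root = Path(video_dir)
    present = {p.name for p in root.iterdir() if p.is_file()} if root.is_dir() else set()
    return first_incomplete_step(present)
```

This is only a heuristic: a file that exists may still be stale, so the agent's task tracker and the user's instructions take precedence.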

| # | Step | Output | Phase file |
| --- | --- | --- | --- |
| 1 | Define topic direction | `topic_definition.md` | workflow-script.md |
| 2 | Research topic | `topic_research.md` | workflow-script.md |
| 3 | Design 5-7 sections | (in-memory) | workflow-script.md |
| 4 | Write narration script | `podcast.txt` | workflow-script.md |
| 4.5 | Pronunciation pre-flight (zh-CN) | `phonemes.json` | workflow-script.md |
| 5 | Collect media (Auto: skip) | `media_manifest.json` | workflow-production.md |
| 6 | Generate publish info (Part 1) | `publish_info.md` | workflow-production.md |
| 7 | Generate thumbnails (16:9 + 4:3) | `thumbnail_*.png` | workflow-production.md |
| 8 | Generate TTS audio | `podcast_audio.wav`, `timing.json` | workflow-production.md |
| **9** | **Remotion composition + Studio preview** | (none) | workflow-production.md |
| 10 | Render 4K video (only on user request) | `output.mp4` | workflow-production.md |
| 11 | Mix background music | `video_with_bgm.mp4` | workflow-production.md |
| 12 | Finalize (optional legacy subtitle burn) | `final_video.mp4` | workflow-publish.md |
| 13 | Complete publish info (Part 2) | chapter timestamps | workflow-publish.md |
| **14** | **Verify output (`scripts/verify_output.py`)** | (none) | workflow-publish.md |
| 15 | Generate vertical shorts (optional) | `shorts/` | workflow-publish.md |

Mandatory stops (bold rows above):

Step 9 — Studio review. MUST launch npx remotion studio and wait for user feedback before rendering. NEVER render 4K until the user explicitly confirms ("render 4K" / "render final").
Step 14 — verify_output.py. MUST pass before declaring the video done. Exit 0 = green; exit 2 = warnings still publishable. Auto-fixes common omissions (creates final_video.mp4 if missing). For machine-readable output add --format json (auto when piped).
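
The exit-code contract for Step 14 can be handled with a small wrapper. This is a sketch under stated assumptions: only exit codes 0 and 2 are defined in the text, so treating every other code as a failure is this example's choice, and `interpret_verify_exit` / `run_verify` are hypothetical helper names.

```python
import subprocess
import sys


def interpret_verify_exit(code):
    """Translate a verify_output.py exit code into a coarse status."""
    if code == 0:
        return "green"
    if code == 2:
        return "warnings"  # still publishable once the warnings are reviewed
    return "failed"        # assumption: any other code blocks publishing


def run_verify(skill_dir, video_dir):
    """Run the verifier with JSON output and return (status, stdout)."""
    result = subprocess.run(
        [sys.executable, f"{skill_dir}/scripts/verify_output.py",
         video_dir, "--format", "json"],
        capture_output=True,
        text=True,
    )
    return interpret_verify_exit(result.returncode), result.stdout
```

The video should be reported as done only when the status is `green`, or `warnings` after the user has seen them.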

Pre-render audit (recommended) — before Step 9:

python3 ${SKILL_DIR}/scripts/audit_beat_sync.py <Video.tsx> <timing.json>

Flags beats that drift > 1.5s from narration. Especially important for kinetic-typography videos.
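
The 1.5 s drift threshold amounts to a simple comparison per beat. The sketch below is not how audit_beat_sync.py parses Video.tsx; it assumes the beat times have already been extracted into hypothetical `(label, visual_time_s, narration_time_s)` tuples and shows only the thresholding.

```python
DRIFT_LIMIT_S = 1.5  # threshold quoted in the text above


def flag_drifting_beats(beats, limit=DRIFT_LIMIT_S):
    """Return (label, drift_seconds) for beats that drift past the limit.

    `beats` is an iterable of (label, visual_time_s, narration_time_s).
    Positive drift means the visual lands after the narration.
    """
    flagged = []
    for label, visual_s, narration_s in beats:
        drift = visual_s - narration_s
        if abs(drift) > limit:
            flagged.append((label, round(drift, 3)))
    return flagged
```

A beat that is exactly 1.5 s off is not flagged here, matching the "> 1.5s" wording.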

Validation Checkpoints

| After Step | Check |
| --- | --- |
| 8 (TTS) | `podcast_audio.wav` plays · `timing.json` covers all sections · SRT is UTF-8 |
| 10 (Render) | `output.mp4` is 3840×2160 · audio-video sync · no black frames |
| 14 (Verify) | `verify_output.py` exits 0 (or 2 with reviewed warnings) |

Hard Rules

| Rule | Requirement |
| --- | --- |
| Single Project | All videos under `videos/{name}/` in user's Remotion project. NEVER create a new project per video. |
| 4K Output | 3840×2160 (or 2160×3840 vertical), use `scale(2)` wrapper over 1920×1080 design space |
| Audio Sync | Audio (`podcast_audio.wav` + `podcast_audio.srt`) is the master clock. `timing.json` MUST be generated from the real TTS output, never hand-estimated. Before rendering, final video duration must match audio within ±0.5s. See Audio-Master Clock. |
| Thumbnail | MUST generate both 16:9 (1920×1080) AND 4:3 (1200×900); see design-guide.md |
| Studio Before Render | MUST launch `remotion studio` for review. NEVER render 4K until user explicitly confirms. |
| `--public-dir` | Every Remotion command uses `--public-dir videos/{name}/` |

Visual minimums (text sizes, content width, safe zones, animation safety) live in references/design-guide.md. MUST load before Step 9.

Audio-Master Clock & Sync

Golden rules

Audio is the master clock. Every slide start, subtitle, progress-bar chapter, and animation beat is derived from podcast_audio.wav and podcast_audio.srt.

Generate timing from TTS, not from text estimates. The canonical pipeline is:

podcast.txt (final)
  → generate_tts.py
  → podcast_audio.wav + podcast_audio.srt + timing.json
  → Remotion composition
  → render

Never hand-write timing.json before audio exists. If you already have curated slides, run align_timing_from_srt.py to anchor them to the real SRT, or add a "section" field to each slide and then run it.

Compensate TransitionSeries overlap. TransitionSeries renders sum(section.duration_frames) - (N-1) * transitionFrames frames. To keep the rendered length equal to timing.total_frames, scale every section proportionally; do not stuff all overlap frames into the first section. The corrected pattern is in templates/Video.tsx.
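
The overlap arithmetic can be worked through in a few lines. This is an illustrative sketch in Python, not the `compensatedSections` code in templates/Video.tsx: it assumes every transition has the same length and that the target length equals the sum of the original section durations unless `total_frames` is given. Leftover frames from integer rounding go to the sections with the largest fractional parts, so the total stays exact.

```python
def compensate_sections(durations, transition_frames, total_frames=None):
    """Scale section lengths so the rendered length equals total_frames.

    Rendered length = sum(durations) - (N - 1) * transition_frames, so the
    scaled sections must sum to total_frames + (N - 1) * transition_frames.
    """
    n = len(durations)
    base = sum(durations)
    if n == 0 or base == 0:
        return list(durations)
    total = base if total_frames is None else total_frames
    target = total + (n - 1) * transition_frames

    # Integer arithmetic avoids float rounding surprises.
    scaled = [d * target // base for d in durations]
    remainders = [d * target % base for d in durations]

    # Distribute the few leftover frames by largest remainder.
    leftover = target - sum(scaled)
    order = sorted(range(n), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        scaled[i] += 1
    return scaled
```

With sections of 300, 600, and 900 frames and 15-frame transitions, this returns `[305, 610, 915]`: the sum is 1830, and subtracting the 30 overlapped frames gives back the original 1800.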

Mandatory sync checkpoints

| When | Check | Command / Action |
| --- | --- | --- |
| After Step 8 | `timing.json.total_duration` matches `podcast_audio.wav` within ±0.5s | `ffprobe -show_entries format=duration podcast_audio.wav` |
| Before Step 10 | `Video.tsx` scales all sections for transition overlap | Inspect the `compensatedSections` calculation |
| After Step 10/12 | `final_video.mp4` duration matches `podcast_audio.wav` within ±0.5s | `ffprobe -show_entries format=duration final_video.mp4` |
| Step 14 | `verify_output.py` exits 0 and reports green on audio/timing | `python3 ${SKILL_DIR}/scripts/verify_output.py videos/<name>/` |

If any checkpoint fails, stop. Do not publish.
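
The ±0.5 s checks can be scripted around ffprobe. The sketch below assumes ffprobe is on the PATH; the helper names are hypothetical, and the extra `-v error` and `-of default=noprint_wrappers=1:nokey=1` options are added here only so the output is a bare number.

```python
import subprocess


def probe_duration(path):
    """Return a media file's container duration in seconds via ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "error",
         "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1",
         path],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return float(out.strip())


def within_tolerance(a_seconds, b_seconds, tolerance=0.5):
    """True when two durations differ by no more than the tolerance."""
    return abs(a_seconds - b_seconds) <= tolerance
```

A checkpoint then reduces to `within_tolerance(probe_duration("final_video.mp4"), probe_duration("podcast_audio.wav"))`, and a `False` result means stop.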

Output Specs

| Parameter | Horizontal (16:9) | Vertical (9:16) |
| --- | --- | --- |
| Resolution | 3840×2160 (4K) | 2160×3840 (4K) |
| Frame rate | 30 fps | 30 fps |
| Encoding | H.264, 16Mbps | H.264, 16Mbps |
| Audio | AAC, 192kbps | AAC, 192kbps |
| Duration | 1-15 min | 60-90s (highlight) |

Per-Video Layout

project-root/                           # Remotion project root
├── src/remotion/                       # Remotion source (Root.tsx, compositions, index.ts)
├── videos/{video-name}/                # Per-video assets (the agent's working dir)
│   ├── topic_definition.md             # Step 1
│   ├── topic_research.md               # Step 2
│   ├── podcast.txt                     # Step 4: narration script
│   ├── phonemes.json                   # Step 4.5: zh-CN pronunciation overrides
│   ├── podcast_audio.wav               # Step 8: TTS audio
│   ├── podcast_audio.srt               # Step 8: subtitles
│   ├── timing.json                     # Step 8: timeline (drives animations)
│   ├── thumbnail_*.png                 # Step 7
│   ├── output.mp4                      # Step 10: 4K render (no BGM)
│   ├── video_with_bgm.mp4              # Step 11
│   ├── final_video.mp4                 # Step 12: final output
│   └── bgm.mp3                         # Background music
└── remotion.config.ts

`--public-dir` per video

Remotion commands MUST use --public-dir videos/{name}/ — each video's assets stay in its own directory, no copy to public/. Enables parallel renders.

npx remotion studio src/remotion/index.ts --public-dir videos/{name}/
npx remotion render src/remotion/index.ts CompositionId videos/{name}/output.mp4 --public-dir videos/{name}/ --video-bitrate 16M
npx remotion still src/remotion/index.ts Thumbnail16x9 videos/{name}/thumbnail.png --public-dir videos/{name}/
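
One way to make sure no Remotion command forgets `--public-dir` is to build every command through a single helper. This is a sketch with hypothetical function names; it uses only the subcommands, entry point, and flags that appear in the commands above.

```python
def remotion_cmd(subcommand, video_name, *args, entry="src/remotion/index.ts"):
    """Build an `npx remotion` argv that always carries --public-dir."""
    public_dir = f"videos/{video_name}/"
    return ["npx", "remotion", subcommand, entry, *args,
            "--public-dir", public_dir]


def render_cmd(video_name, composition_id):
    """Build the 4K render command for one video at the 16M bitrate."""
    output = f"videos/{video_name}/output.mp4"
    return remotion_cmd("render", video_name, composition_id, output) + [
        "--video-bitrate", "16M",
    ]
```

The resulting lists can be passed straight to `subprocess.run`, which also avoids shell-quoting problems in video names.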

Naming

Video name {video-name}: lowercase English, hyphen-separated (e.g. reference-manager-comparison)
Section name {section}: lowercase English, underscore-separated, matches [SECTION:xxx]
Thumbnail naming (16:9 AND 4:3 both required):

| Type | 16:9 | 4:3 |
| --- | --- | --- |
| Remotion | `thumbnail_remotion_16x9.png` | `thumbnail_remotion_4x3.png` |
| AI | `thumbnail_ai_16x9.png` | `thumbnail_ai_4x3.png` |
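
The naming rules above lend themselves to a quick check before any files are written. The sketch below is an assumption-laden illustration: the regular expressions also allow digits, which the text does not state explicitly, and the function names are hypothetical.

```python
import re

# Lowercase words joined by hyphens (video) or underscores (section).
# Allowing digits is an assumption of this sketch.
VIDEO_NAME_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
SECTION_NAME_RE = re.compile(r"^[a-z0-9]+(?:_[a-z0-9]+)*$")
SECTION_MARKER_RE = re.compile(r"\[SECTION:([^\]]+)\]")


def thumbnail_names(kind):
    """Return the required 16:9 and 4:3 filenames for 'remotion' or 'ai'."""
    if kind not in ("remotion", "ai"):
        raise ValueError(f"unknown thumbnail type: {kind!r}")
    return [f"thumbnail_{kind}_16x9.png", f"thumbnail_{kind}_4x3.png"]


def invalid_sections(script_text):
    """List [SECTION:xxx] names in a script that break the naming rule."""
    names = SECTION_MARKER_RE.findall(script_text)
    return [name for name in names if not SECTION_NAME_RE.match(name)]
```

Running `invalid_sections` over podcast.txt before Step 8 catches markers that would not line up with section names later in the pipeline.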

Additional Resources

Load on demand — do NOT load all at once:

| File | Load when |
| --- | --- |
| references/workflow-script.md | Steps 1-4 (topic → script) |
| references/workflow-production.md | Steps 5-11 (media → TTS → Remotion → render → BGM) |
| references/workflow-publish.md | Steps 12-15 (subtitles, publish, cleanup, shorts) |
| references/design-guide.md | MUST load before Step 9: visual minimums, typography, animation safety |
| references/design-learning.md | User provides a reference video/image, or manages style profiles |
| references/azure-tts-pitfalls.md | Choosing Azure voice/style, debugging hoarse/glitchy audio |
| references/troubleshooting.md | On error, or user asks about preferences/BGM |
| templates/presets/kinetic-typography/ | Bold type-driven preset (opinion / argument / declaration videos) |
| examples/ | Reference for composition structure and `timing.json` format |

Script suite dispatcher

All scripts under ${SKILL_DIR}/scripts/ are reachable through one hierarchical entry point:

python3 ${SKILL_DIR}/scripts/cli.py --help                  # list resources
python3 ${SKILL_DIR}/scripts/cli.py <resource> --help       # list actions
python3 ${SKILL_DIR}/scripts/cli.py <resource> <action> --help    # forwards to underlying script
python3 ${SKILL_DIR}/scripts/cli.py schema [<method>]       # JSON parameter schema

User Preferences

The skill learns preferences automatically and applies them. Full commands and learning details are in references/troubleshooting.md.

Storage: user_prefs.json (auto-created from user_prefs.template.json, schema in prefs_schema.json).
Priority: Root.tsx defaults < global < topic_patterns[type] < current instructions.
User commands: "show preferences" · "reset preferences" · "save as X default".
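
The priority chain above can be read as a layered merge in which later layers win. The sketch below assumes a recursive (deep) merge and illustrative keys such as `theme`; the actual schema lives in prefs_schema.json, and the function names are hypothetical.

```python
def deep_merge(base, override):
    """Return a copy of base with override applied; nested dicts merge."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_prefs(root_defaults, prefs, topic_type=None, instructions=None):
    """Apply layers in priority order, lowest first:
    Root.tsx defaults < global < topic_patterns[type] < current instructions.
    """
    layers = [root_defaults, prefs.get("global", {})]
    if topic_type is not None:
        layers.append(prefs.get("topic_patterns", {}).get(topic_type, {}))
    layers.append(instructions or {})

    result = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result
```

Because current instructions are merged last, a one-off request such as "use dark theme" overrides a saved default without changing user_prefs.json.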

Troubleshooting

See references/troubleshooting.md for errors, BGM options, preference learning, and design-learning issues.
