AugmentClaude

Video Assembler

Mix narration audio over video, adjust levels, and add subtitles.

Installation

  1. Make sure Claude Code is installed and available in your terminal.

    Skills load from ~/.claude/skills/ when Claude Code starts up — so you need it on your machine first. If you don't have it yet, install it once with the command below, then run claude in any terminal to verify.

    One-time setup
    npm i -g @anthropic-ai/claude-code

    Already have it? Skip ahead.

  2. Paste into Claude Code or into your terminal.

    This copies the whole skill folder into ~/.claude/skills/video-assemble-worldwonderer/ — the SKILL.md plus any scripts, reference docs, or templates the skill ships with. Safe default: works for every skill.

    Faster alternative (instruction-only skills)

    Skips the clone and grabs only the SKILL.md file. Don't use this if the skill ships Python scripts, reference markdowns, or asset templates — they won't be downloaded and the skill will fail when it tries to load them.

    Quick install (SKILL.md only)
  3. Restart Claude Code.

    Quit and reopen Claude Code (or any other agent that loads from ~/.claude/skills/). New skills are picked up on startup.

  4. Just ask Claude.

    Skills auto-activate when your request matches the skill's description; no slash command is needed. Trigger phrases live in the skill's own frontmatter; you can read them in the “When Claude uses it” section below.

Prefer to read the source first? Open on GitHub.

When Claude uses it

Assemble a final recap video: mux narration audio over the source video, duck the original audio under the narration, render subtitles (SRT/ASS, optionally burned in), and loudness-normalize. Use as the last stage of the video-recap bundle. Consumes the source video + tts_meta.json (+ narration placement); produces recap_<name>.mp4 + subtitles.srt/.ass. Trigger phrases: 视频合成 (video assembly), 混音 (audio mixing), 字幕 (subtitles), 压字幕 (burn in subtitles), assemble video, mux, ducking, subtitles, 成片 (finished video).

What this skill does

  1. Mixes the narration audio segments onto the source video at their placed times.
  2. Ducks the original audio under narration (fixed / sidechain / zone modes).
  3. Renders subtitles from the narration placement → subtitles.srt (+ subtitles.ass, burned in with --burn-subtitles).
  4. Optional final loudness normalization to a target LUFS.
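Step 3 above can be illustrated with a minimal sketch of SRT rendering. The cue tuples (start seconds, end seconds, text) are an assumed intermediate form rather than the actual tts_meta.json schema, and srt_timestamp and render_srt are hypothetical helpers, not functions taken from assemble.py.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def render_srt(cues):
    """Render (start_s, end_s, text) cues as SRT text, numbered from 1.

    The cue shape is an assumption made for this sketch.
    """
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # SRT separates cues with a blank line.
    return "\n".join(blocks)
```

Each cue becomes a numbered block with a "start --> end" line, which is the layout SRT players expect.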

Input contract

  • <video> — the source video (the original, or edited_source.mp4 in cut mode).
  • work_dir/tts_meta.json, shaped as {segments: [...]} and written by video-voiceover (each segment carries audio_path, timing, pause_after_ms, and overlaps_speech/placement used for ducking + subtitles).
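Before running the assembler it can help to sanity-check the input contract. The sketch below is one possible pre-flight check, not part of the skill: load_segments is a hypothetical helper, and it verifies only what the contract states outright (a top-level segments list whose entries carry audio_path). The shapes of timing and placement are not documented here, so they are left unchecked.

```python
import json
from pathlib import Path


def load_segments(work_dir):
    """Load work_dir/tts_meta.json and return its list of segments."""
    meta_path = Path(work_dir) / "tts_meta.json"
    meta = json.loads(meta_path.read_text(encoding="utf-8"))
    segments = meta.get("segments") if isinstance(meta, dict) else None
    if not isinstance(segments, list):
        raise ValueError(f"{meta_path}: expected a top-level 'segments' list")
    for i, segment in enumerate(segments):
        # Only audio_path is checked; other fields are named in the contract
        # but their exact structure is an unknown in this sketch.
        if not isinstance(segment, dict) or "audio_path" not in segment:
            raise ValueError(f"{meta_path}: segment {i} has no 'audio_path'")
    return segments
```

Failing early with a clear message is cheaper than discovering a malformed segment partway through a render.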

Run

python3 scripts/assemble.py <video> --work-dir <work_dir> \
  [--recap-stem <name>] [--output-dir <dir>] [--burn-subtitles] \
  [--source-video <orig.mp4>] [--export-jianying [--jianying-out <dir>]]
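If the assembler is driven from another script, the invocation above might be wrapped as follows. This is a sketch under assumptions: build_assemble_command and run_assemble are hypothetical names, only flags shown in the synopsis are used, and the relative path scripts/assemble.py assumes the working directory is the skill folder.

```python
import subprocess


def build_assemble_command(video, work_dir, recap_stem=None, output_dir=None,
                           burn_subtitles=False):
    """Build the argv list for scripts/assemble.py from documented flags."""
    cmd = ["python3", "scripts/assemble.py", str(video),
           "--work-dir", str(work_dir)]
    if recap_stem:
        cmd += ["--recap-stem", recap_stem]
    if output_dir:
        cmd += ["--output-dir", str(output_dir)]
    if burn_subtitles:
        cmd.append("--burn-subtitles")
    return cmd


def run_assemble(video, work_dir, **options):
    """Run the assembler and raise CalledProcessError on a non-zero exit."""
    return subprocess.run(build_assemble_command(video, work_dir, **options),
                          check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.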

Output contract

  • recap_<stem>.mp4 — the final recap video (written to --output-dir or work_dir's parent). It is the stable output alias, overwritten in place on every run so iterating on the narration refreshes the same file.
  • work_dir/output.mp4 — the in-place render.
  • subtitles.srt — narration subtitles; subtitles.ass when --burn-subtitles is used.
  • timeline.json — backend-neutral multi-track model (video / original-audio / narration / BGM / subtitle tracks with ducking automation). Always written.
  • assembly_manifest.json — a slim render record: the input/source paths, the cut-mode source fingerprint (proving a stale ambient SOURCE_VIDEO did not leak into a full-mode export), the render settings, and the final output path.
  • 剪映 draft folder (recap_<stem>/draft_content.json + draft_info.json + draft_meta_info.json) — only with --export-jianying.
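The first two entries of the output contract can be expressed as a small path helper, which is handy when a later step needs to locate the result. expected_outputs is a hypothetical function written for illustration. It takes the stem explicitly because the default used when --recap-stem is omitted is not stated here, and it covers only the two files whose locations the contract spells out.

```python
from pathlib import Path


def expected_outputs(work_dir, stem, output_dir=None):
    """Return where the recap alias and the in-place render should appear."""
    work_dir = Path(work_dir)
    # The recap goes to --output-dir when given, else to work_dir's parent.
    recap_dir = Path(output_dir) if output_dir else work_dir.parent
    return {
        "recap": recap_dir / f"recap_{stem}.mp4",
        "render": work_dir / "output.mp4",
    }
```

With a relative work_dir such as runs/ep1/work, the recap path resolves to runs/ep1/recap_ep1.mp4 unless an output directory is supplied.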

Notes

  • Audio is mixed as tracks (like a cut-software timeline): the original audio, an optional BGM bed, and the narration.
  • Optional 剪映/JianYing export: --export-jianying (or EXPORT_JIANYING=1) turns timeline.json into an editable 剪映 draft — original clips, separate audio tracks, and volume keyframes for the ducking. Fully decoupled and lazy-imported: the ffmpeg render never depends on it, and 剪映 need not be installed. In cut mode pass --source-video <orig> so the draft references the real clips. Point --jianying-out at 剪映's drafts root to open it in-app. If a draft folder with the same name already has files, export writes a numbered sibling instead of overwriting it. Media is bundled into the draft folder by default (--jianying-no-bundle-media to reference in place) — this is required on macOS, where 剪映 is sandboxed and cannot read external paths. Note: the draft references the un-burned original, so the source's hardcoded subtitles are visible there (mask them in 剪映 if needed).
  • Subtitle look: SUBTITLE_FONT_SIZE, SUBTITLE_MARGIN_V, SUBTITLE_MAX_CHARS, etc.
  • Ducking / loudness: the original swells to IDLE_ORIG_VOLUME in the gaps and ducks to SPEECH_DUCKING_VOLUME under narration (DUCK_FADE_SECONDS smooths the transition); also DUCKING_MODE, ZONE_DUCKING_VOLUME, FINAL_LOUDNORM, TARGET_LUFS.
  • BGM (optional): set BGM_PATH to any audio file; it loops to length and ducks under narration (BGM_VOLUME / BGM_DUCKING_VOLUME).
  • Burning subtitles requires an ffmpeg with subtitles/libass support.
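The ducking behaviour described in the notes can be pictured as a gain envelope on the original track. The sketch below is an illustration under assumptions, not the skill's implementation: original_volume is a hypothetical function, its defaults are made-up values standing in for IDLE_ORIG_VOLUME, SPEECH_DUCKING_VOLUME, and DUCK_FADE_SECONDS, the ramp is assumed to be linear and placed just outside each narration interval, and idle is assumed to be at least as loud as duck.

```python
def original_volume(t, narration, idle=1.0, duck=0.25, fade=0.3):
    """Gain of the original track at time t (seconds).

    narration is a list of (start, end) intervals in seconds. The gain is
    `duck` inside an interval, `idle` at least `fade` seconds away from every
    interval, and ramps linearly in between. Assumes idle >= duck.
    """
    level = idle
    for start, end in narration:
        if start <= t <= end:
            return duck
        # Distance from t to the nearest edge of this interval.
        distance = start - t if t < start else t - end
        if fade > 0 and distance < fade:
            ramp = duck + (idle - duck) * (distance / fade)
            # Overlapping ramps from neighbouring intervals: keep the quieter.
            level = min(level, ramp)
    return level
```

Sampling this function over the timeline gives the kind of volume automation that timeline.json and the 剪映 keyframes are described as carrying, though the actual curve shape used by the skill may differ.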

What this skill does NOT do

  • Does NOT generate narration or synthesize TTS.
  • Does NOT re-transcribe or alter timing decisions — it consumes placement from tts_meta.json.
  • Does NOT burn subtitles by default: burning is opt-in (--burn-subtitles), and the video is not re-encoded unless you ask for it.
