Burning Subtitles

Name: Burning Subtitles
Author: jianshuo

By jianshuo· jianshuo/claude-skills· 0

Burn subtitles into video or add them as toggleable tracks in one encoding pass.

Installation

1
Make sure Claude is on your device and in your terminal.
Skills load from ~/.claude/skills/ when Claude Code starts up — so you need it on your machine first. If you don't have it yet, install it once with the command below, then run claude in any terminal to verify.
One-time setup
```
npm i -g @anthropic-ai/claude-code
```
Already have it? Skip ahead.

Paste into Claude Code or into your terminal.

Install

git clone ht••••••••••••••••••••••••••••••••••••••••••• •••••••••••••••••••••••••••• •• ••••• •• ••••••••••••••••••••••••••••••••••••••••••••••• •• •• •• •••••••••••••••••••••••••••••••••••••••••••••••••••• ••••••••••••••••••••••••••••••••••••••••••••••••

This copies the whole skill folder into ~/.claude/skills/wjs-burning-subtitles-jianshuo/ — the SKILL.md plus any scripts, reference docs, or templates the skill ships with. Safe default: works for every skill.

Faster alternative (instruction-only skills)

Skips the clone and grabs only the SKILL.md file. Don't use this if the skill ships Python scripts, reference markdowns, or asset templates — they won't be downloaded and the skill will fail when it tries to load them.

Quick install (SKILL.md only)

mkdir -p ~/.•••••••••••••••••••••••••••••••••••••••••••• •• •••• ••••• •••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••• •• ••••••••••••••••••••••••••••••••••••••••••••••••••••••••

Restart Claude Code.
Quit and reopen Claude Code (or any other agent that loads from ~/.claude/skills/). New skills are picked up on startup.
Just ask Claude.
Skills auto-activate when your request matches the skill's description — no slash command needed. Trigger phrases live in the skill's own frontmatter; you can read them in the “What this skill does” section above.

Prefer to read the source first? Open on GitHub.

When Claude uses it

Use when the user has a video + an SRT and wants the subtitles either burned into the pixels (libass, always-visible) or soft-muxed as a togglable track. Also handles the final composite step for the localization pipeline — burn subs, mix a dub track, and keep the original audio as a low-volume bed, all in ONE ffmpeg encode (no cascade). Verifies libass availability and auto-downloads a static evermeet ffmpeg build when Homebrew's stripped binary lacks it. Triggers — "烧字幕", "硬字幕", "burn subtitles", "burn-in subs", "embed subtitle", "soft mux SRT", "把字幕烧进视频", "做最终合成".

What this skill does

wjs-burning-subtitles

Video + SRT → video with subtitles. Also the final-encode stage for the localization pipeline: takes a video, an optional dub track from /wjs-dubbing-video, and an optional SRT to burn, and produces the upload-ready MP4 in one ffmpeg pass. No cascade of decodes/re-encodes.

When to use

User has an SRT and wants it always-visible on the video (burn-in for 微信视频号 / 抖音 / WeChat — players that won't honor embedded subtitle tracks).
User wants a togglable subtitle track (soft-mux) for QuickTime / VLC / IINA / mobile players that support mov_text.
Final composite after /wjs-dubbing-video: burn target-language subs + mix dub over original-as-bed in one encode.

When NOT to use

No SRT yet → run /wjs-transcribing-audio then /wjs-translating-subtitles first.
HTML/CSS captions (kinetic, per-word highlights, custom fonts) on a clip composed in HyperFrames → use /wjs-overlaying-video instead. Don't mix libass burn-in with HyperFrames captions on the same output.
The "subtitles" are actually motion graphics (animated callouts, lower-thirds with logos, kinetic typography) → that's /wjs-overlaying-video, not this skill.

The 3 modes of `render.py`

scripts/render.py auto-detects mode from flags:

Subtitles only — --video + --srt → re-encodes video with burned subs, original audio passes through.
Dub only — --video + --dub → keeps original video stream; replaces or mixes the audio track.
Full localized cut — --video + --srt + --dub → burns subs AND mixes dub. By default keeps original audio at low volume as a "bed" under the dub (set --bed-volume 0 or --no-original-audio to drop it).

Burn-in requires an ffmpeg built with libass. The script auto-downloads a static libass-enabled build from evermeet.cx into /tmp/ff_bin/ on first use if needed.

Soft-mux (togglable subtitle track)

Player apps can show/hide. Works with any ffmpeg build — does not need libass:

ffmpeg -i input.mp4 -i input.zh-CN.srt \
  -map 0:v -map 0:a -map 1:0 \
  -c:v copy -c:a copy -c:s mov_text \
  -metadata:s:s:0 language=zho -metadata:s:s:0 title="中文" \
  output.mp4

This is fast (stream-copy) and reversible. Use it when:

Target platform supports embedded subs (YouTube auto-detects; VLC/QuickTime honors).
User wants viewers to be able to toggle off.
You don't want to re-encode the video.

render.py --video IN.mp4 --srt SUB.srt --soft-mux runs this path.

Hardcoded burn-in (always visible, libass)

Required for WeChat/抖音/朋友圈 etc. where the player will not honor embedded subtitle tracks.

Verify libass is available BEFORE promising burn-in

ffmpeg -filters 2>&1 | grep -E "subtitles|^.. ass "

If neither subtitles nor ass shows up, the build lacks libass. Homebrew's default ffmpeg formula is often stripped (no --enable-libass, no --enable-libfreetype, no drawtext). Don't waste time fighting the comma-escaping inside force_style — it will fail with No such filter: 'subtitles' no matter how the shell quotes it.

Fastest fix on macOS — drop in a static build, no system changes

curl -fsSL -o /tmp/ff.zip https://evermeet.cx/ffmpeg/getrelease/zip
unzip -o /tmp/ff.zip -d /tmp/ff_bin >/dev/null
FF=/tmp/ff_bin/ffmpeg
$FF -version | grep -oE -- "--enable-(libass|libfreetype)"

Then use $FF instead of ffmpeg for the render. The brew binary is fine for everything else (probe, audio extraction, soft-mux). render.py does this auto-fallback if its default ffmpeg lacks libass.

Burn-in render with style overrides

🛑 Checkpoint — confirm before full-render. Burn-in re-encodes the entire video (minutes of CPU on a 5-min clip). Before kicking it off:

Render only the first 30s with -t 30 for a fast preview.
Extract a frame from the longest-line cue (see Fontsize calibration below) and Read it.
Show the user the preview frame + the cue text, ask: "字号/字体/边距 OK 吗？OK 才跑全片。" Wait for explicit confirmation.

Skip the checkpoint only if the user has already approved a full render of this exact video at this exact font config in the same conversation.

$FF -i input.mp4 \
  -vf "subtitles=input.zh-CN.srt:force_style='Fontname=PingFang SC\,Fontsize=12\,PrimaryColour=&H00FFFFFF\,OutlineColour=&H00000000\,BorderStyle=1\,Outline=2\,Shadow=1\,MarginL=20\,MarginR=20\,MarginV=40'" \
  -c:v libx264 -crf 18 -preset medium -pix_fmt yuv420p \
  -c:a copy output.mp4

Inside force_style, escape every comma as \, (the filter graph parser eats the bare comma as a chain separator). All other special chars are fine.

Fontsize calibration — critical

libass scales its internal PlayRes up to the actual video resolution. The number you pass is not pixels in the output. As a starting calibration on a 544×960 vertical phone video, Fontsize=22 rendered each Chinese character at ~55px wide and overflowed the frame, while Fontsize=12 rendered at ~30–35px wide and fit cleanly with 15-char lines.

Rule of thumb: start at Fontsize=12, render, then always extract a frame and look:

$FF -ss 30 -i output.mp4 -frames:v 1 /tmp/frame.png -y
# then Read /tmp/frame.png to verify the longest-line cue fits

Pick a timestamp that lands on the cue with the most characters per line — short lines won't expose overflow. Add MarginL=20 MarginR=20 as a safety inset; never trust default left/right margins.

Style cheatsheet

Keys that matter (libass force_style):

Fontname=PingFang SC — macOS default CJK; alternates: Songti SC, Heiti SC, STHeiti, Hiragino Sans GB.
Fontsize=12 — start small, scale up only after frame check.
PrimaryColour=&H00FFFFFF — white text (BBGGRR + alpha).
OutlineColour=&H00000000 — black outline.
BorderStyle=1 — outline only (clean over varied backgrounds). Use BorderStyle=3 for an opaque box behind text when the background is busy.
Outline=2 — 2px outline thickness.
Shadow=1 — subtle drop shadow.
MarginL=20 MarginR=20 — keep text inside the frame.
MarginV=40 — vertical distance from the bottom edge.

SRT line-length discipline for burn-in

Even with correct Fontsize, lines that are too long will wrap or overflow. Keep each on-screen line ≤ ~15 Chinese characters (~42 Latin chars). Use explicit \n line breaks inside the SRT block — do not rely on auto-wrapping. Two short lines beat one long one every time. (This is upstream discipline — /wjs-translating-subtitles should already cap cues at these limits.)

Audio mixing — keep the original as a low-volume bed

A pure dub-only track sounds dubbed (because it is). Mixing the original audio at low volume under the dub gives the "professional translation" feel — you still hear the speaker's breath, emphasis, and laughter, just under the new voice.

$FF -i original.mp4 -i dub.mp4 \
  -filter_complex "[0:a]volume=0.18[orig];\
                   [1:a]volume=1.0[dub];\
                   [orig][dub]amix=inputs=2:duration=longest:normalize=0[a]" \
  -map 0:v -map "[a]" \
  -c:v copy -c:a aac -b:a 192k mixed.mp4

Reasonable starting volumes:

Original bed at 0.15–0.25 (≈ −16 to −12 dB)
Dub at 1.0
Use normalize=0 so amix doesn't auto-attenuate when both are active.

To drop the original entirely: --no-original-audio (equivalent to --bed-volume 0).

Combining dub + burn-in + bed (the full job)

One ffmpeg call does all three — burn the target subtitle onto the video stream and mix the two audio tracks:

$FF -i original.mp4 -i dub.mp4 \
  -filter_complex "[0:v]subtitles=input.zh-CN.srt:force_style='Fontname=PingFang SC\,Fontsize=12\,PrimaryColour=&H00FFFFFF\,OutlineColour=&H00000000\,BorderStyle=1\,Outline=2\,Shadow=1\,MarginL=20\,MarginR=20\,MarginV=40'[v];\
                   [0:a]volume=0.18[orig];[1:a]volume=1.0[dub];\
                   [orig][dub]amix=inputs=2:duration=longest:normalize=0[a]" \
  -map "[v]" -map "[a]" \
  -c:v libx264 -crf 18 -preset medium -pix_fmt yuv420p \
  -c:a aac -b:a 192k final.mp4

This is the "ship to social media" final cut. render.py --video original.mp4 --dub dub.mp4 --srt input.zh-CN.srt runs this exact pipeline.

Running `render.py`

# Subtitles only (burn):
python3 ~/.claude/skills/wjs-burning-subtitles/scripts/render.py \
    --video IN.mp4 --srt SUB.srt --out OUT.mp4

# Dub only (replace audio, no subs):
python3 ~/.claude/skills/wjs-burning-subtitles/scripts/render.py \
    --video IN.mp4 --dub IN_zh_dub.mp4 --out OUT.mp4

# Full localized cut (burn + dub + original bed):
python3 ~/.claude/skills/wjs-burning-subtitles/scripts/render.py \
    --video IN.mp4 --srt IN.zh-CN.srt --dub IN_zh_dub.mp4 --out OUT.mp4

# Soft-mux (no re-encode):
python3 ~/.claude/skills/wjs-burning-subtitles/scripts/render.py \
    --video IN.mp4 --srt SUB.srt --soft-mux --out OUT.mp4

See render.py --help for the full style/audio flag list (--font, --fontsize, --color, --outline-color, --margin-v, --bed-volume, --no-original-audio).

Output

Burn mode: <source>_burned.mp4 (re-encoded, libass-rendered subs)
Soft-mux mode: <source>_softsub.mp4 (stream-copy, mov_text track)
Full cut: <source>_final.mp4 (re-encoded video with burned subs + mixed audio)

Anti-patterns

❌ Promising burn-in without verifying libass. Check ffmpeg -filters | grep subtitles first; auto-fall back to evermeet static build if missing.
❌ Committing a burn render without a frame check. Always extract a frame at the longest-line cue and Read it before kicking off the full render.
❌ Bare commas inside force_style. The filter graph parser eats them. Escape every internal comma as \,.
❌ Mixing libass burn-in with HyperFrames captions. Pick ONE caption system per output video. If you're using HTML/CSS captions in /wjs-overlaying-video, don't burn here too.
❌ Using period milliseconds in the SRT. Whisper local writes .mmm; libass tolerates it but other downstream tools choke. Normalize to ,mmm.
❌ Defaulting to BorderStyle=3 (opaque box). Use BorderStyle=1 (outline only) unless the background is genuinely busy — the box looks heavy and dated.

Upstream

/wjs-transcribing-audio + /wjs-translating-subtitles — produce the SRT input.
/wjs-dubbing-video — produces the *_<lang>_dub.mp4 input for full-localized-cut mode. The dub-only file is technically a finished video; this skill is what mixes the original underneath and burns the subs to make it shippable.

Common pitfalls

Fontsize that worked on one video looks tiny / huge on another. libass scales by PlayRes ratio, not pixels. Recalibrate per video resolution; don't trust a hardcoded value.
Margin defaults clip text on vertical phone videos. Always set MarginL=20 MarginR=20 and MarginV=40 (or higher) explicitly.
mov_text track shows up in QuickTime but not in some Android players. If the target audience is mobile-Chinese, soft-mux is unreliable; burn instead.
Background-bus busy / contrast issues. Increase Outline=2 → Outline=3, or switch to BorderStyle=3 for a translucent box (BackColour=&H80000000 for 50% black).

Related skills

Skill Builder & Optimizer

anthropics

Create, edit, and optimize Claude skills with performance testing and benchmarking.

Official

Org Change Management

alirezarezvani

Guide teams through organizational changes using the ADKAR model and communication strategies.

MIT

Audio/Video Transcription

daymade

Transcribe audio and video files to text with fast local or remote processing.

Claude Export Conversation Fixer