Audio/Video Transcription
Transcribes audio and video files to text using Qwen3-ASR. Supports two modes — local MLX inference on macOS Apple Silicon (no API key, 15-27x realtime) and remote API via vLLM/OpenAI-compatible endpoints. Auto-detects platform and recommends the best path. Triggers when the user wants to transcribe recordings, convert audio/video to text, do speech-to-text, or mentions ASR, Qwen ASR, 转录, 语音转文字, 录音转文字. Also triggers for meeting recordings, lectures, interviews, podcasts, screen recordings, or any audio/video file the user wants converted to text.
Installation
- Make sure Claude Code is on your device and in your terminal.
Skills load from ~/.claude/skills/ when Claude Code starts up, so you need Claude Code on your machine first. If you don't have it yet, install it once with the command below, then run claude in any terminal to verify.
One-time setup:
npm i -g @anthropic-ai/claude-code
Already have it? Skip ahead.
- Paste the install command below into Claude Code or into your terminal.
Install:
git clone https://github.com/daymade/claude-code-skills.git /tmp/daymade__claude-code-skills && mkdir -p ~/.claude/skills/asr-transcribe-to-text-daymade && cp -r /tmp/daymade__claude-code-skills/asr-transcribe-to-text/. ~/.claude/skills/asr-transcribe-to-text-daymade/
This copies the whole skill folder into ~/.claude/skills/asr-transcribe-to-text-daymade/ — the SKILL.md plus any scripts, reference docs, or templates the skill ships with. Safe default: works for every skill.
Faster alternative (instruction-only skills)
Skips the clone and grabs only the SKILL.md file. Don't use this for this skill: it ships Python scripts and reference markdowns that won't be downloaded, and the skill will fail when it tries to load them.
Quick install (SKILL.md only):
mkdir -p ~/.claude/skills/asr-transcribe-to-text-daymade && curl -fsSL https://raw.githubusercontent.com/daymade/claude-code-skills/main/asr-transcribe-to-text/SKILL.md -o ~/.claude/skills/asr-transcribe-to-text-daymade/SKILL.md
- Restart Claude Code.
Quit and reopen Claude Code (or any other agent that loads from ~/.claude/skills/). New skills are picked up on startup.
- Just ask Claude.
Skills auto-activate when your request matches the skill's description — no slash command needed. Trigger phrases live in the skill's own frontmatter; you can read them in the "What this skill does" section below.
What this skill does
ASR Transcribe to Text
Transcribe audio/video files to text using Qwen3-ASR. Two inference paths:
| Mode | When | Speed | Cost |
|---|---|---|---|
| Local MLX | macOS Apple Silicon | 15-27x realtime | Free |
| Remote API | Any platform, or when local unavailable | Depends on GPU | API/self-hosted |
Configuration persists in ${CLAUDE_PLUGIN_DATA}/config.json.
Step 0: Detect Platform and Load Config
cat "${CLAUDE_PLUGIN_DATA}/config.json" 2>/dev/null
If config exists, read values and proceed to Step 1.
If config does not exist, auto-detect platform first:
python3 -c "
import sys, platform
is_mac_arm = sys.platform == 'darwin' and platform.machine() in ('arm64', 'aarch64')
print(f'Platform: {sys.platform} {platform.machine()}')
print(f'Apple Silicon: {is_mac_arm}')
if is_mac_arm:
    print('RECOMMEND: local-mlx')
else:
    print('RECOMMEND: remote-api')
"
Then use AskUserQuestion with platform-aware defaults:
For macOS Apple Silicon (recommended: local):
ASR setup — your Mac has Apple Silicon, so local transcription is recommended.
Q1: Transcription mode?
A) Local MLX — runs on your Mac's GPU, no API key needed, 15-27x realtime (Recommended)
B) Remote API — send audio to a server (vLLM, Tailscale workstation, etc.)
Q2: Does your network have an HTTP proxy that might intercept traffic?
A) Yes — bypass proxy for ASR traffic (Recommended if using Shadowrocket/Clash)
B) No — direct connection
For other platforms (recommended: remote):
ASR setup — local MLX requires macOS Apple Silicon. Using remote API mode.
Q1: ASR Endpoint URL?
A) http://workstation-4090-wsl:8002/v1/audio/transcriptions (Qwen3-ASR vLLM via Tailscale)
B) http://localhost:8002/v1/audio/transcriptions (Local server)
C) Custom URL
Q2: Proxy bypass needed?
A) Yes (Recommended for Shadowrocket/Clash/corporate proxy)
B) No
Save config:
mkdir -p "${CLAUDE_PLUGIN_DATA}"
python3 -c "
import json
config = {
'mode': 'MODE', # 'local-mlx' or 'remote-api'
'model': 'MODEL_ID', # local: 'mlx-community/Qwen3-ASR-1.7B-8bit', remote: 'Qwen/Qwen3-ASR-1.7B'
'max_tokens': 200000, # local only, critical for long audio
'endpoint': 'URL', # remote only
'noproxy': True,
'max_timeout': 900 # remote only
}
with open('${CLAUDE_PLUGIN_DATA}/config.json', 'w') as f:
    json.dump(config, f, indent=2)
print('Config saved.')
"
Step 1: Extract Audio (if input is video)
For video files (mp4, mov, mkv, avi, webm), extract as 16kHz mono WAV:
ffmpeg -i INPUT_VIDEO -vn -acodec pcm_s16le -ar 16000 -ac 1 OUTPUT.wav -y
Audio files (wav, mp3, m4a, flac, ogg) can be used directly. Get duration:
ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 INPUT_FILE
Cleanup: After transcription succeeds, delete extracted WAV files to save disk space.
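For convenience, the extraction step above can be wrapped in a small Python helper. This is a sketch that only builds the ffmpeg argv (the file name is a placeholder, and ffmpeg must be on PATH when the command is actually run):

```python
from pathlib import Path

def extraction_cmd(video: str, out_dir: str = ".") -> list[str]:
    # Mirrors the ffmpeg invocation above: 16 kHz mono 16-bit PCM WAV.
    wav = str(Path(out_dir) / (Path(video).stem + ".wav"))
    return ["ffmpeg", "-i", video, "-vn", "-acodec", "pcm_s16le",
            "-ar", "16000", "-ac", "1", wav, "-y"]

cmd = extraction_cmd("meeting.mp4")
print(cmd)
```

Run it with subprocess.run(cmd, check=True), and delete the extracted WAV after transcription as noted above.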
Step 2: Transcribe
Path A: Local MLX (macOS Apple Silicon)
Use the bundled script — it handles model loading, chunking, and the critical max_tokens parameter:
uv run ${CLAUDE_PLUGIN_ROOT}/scripts/transcribe_local_mlx.py \
INPUT_AUDIO [INPUT_AUDIO2 ...] \
--output-dir OUTPUT_DIR
The script loads the model once and transcribes all files sequentially (no GPU contention). For details on performance, model compatibility, and the max_tokens truncation issue, see references/local_mlx_guide.md.
Critical: The upstream mlx-audio default max_tokens=8192 silently truncates audio longer than ~40 minutes. The bundled script defaults to 200000. If calling model.generate() directly, always pass max_tokens=200000.
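A rough back-of-envelope check of that truncation point (the ~200 output tokens per minute figure is an assumption for illustration, not from the upstream docs):

```python
def min_tokens_needed(minutes: float, tokens_per_min: float = 200.0) -> int:
    # Rough output-token budget for a recording of the given length.
    return int(minutes * tokens_per_min)

print(min_tokens_needed(40))   # near the old 8192 default
print(min_tokens_needed(60))   # exceeds 8192: a 1-hour file would be cut off
```

Under this estimate the 8192 default runs out around the 40-minute mark, while the bundled 200000 default leaves ample headroom even for multi-hour recordings.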
Path B: Remote API
Health check first (skip if already verified this session):
python3 -c "
import json, subprocess, sys
with open('${CLAUDE_PLUGIN_DATA}/config.json') as f:
    cfg = json.load(f)
base = cfg['endpoint'].rsplit('/audio/', 1)[0]
noproxy = ['--noproxy', '*'] if cfg.get('noproxy', True) else []
result = subprocess.run(
['curl', '-s', '--max-time', '10'] + noproxy + [f'{base}/models'],
capture_output=True, text=True
)
if result.returncode != 0 or not result.stdout.strip():
    print(f'HEALTH CHECK FAILED: {base}/models', file=sys.stderr)
    sys.exit(1)
print(f'Service healthy: {base}')
"
Read config and send via curl:
python3 -c "
import json, subprocess, sys, os, tempfile
with open('${CLAUDE_PLUGIN_DATA}/config.json') as f:
    cfg = json.load(f)
noproxy = ['--noproxy', '*'] if cfg.get('noproxy', True) else []
timeout = str(cfg.get('max_timeout', 900))
audio_file = 'AUDIO_FILE_PATH'
output_json = tempfile.mktemp(suffix='.json', prefix='asr_')
result = subprocess.run(
['curl', '-s', '--max-time', timeout] + noproxy + [
cfg['endpoint'],
'-F', f'file=@{audio_file}',
'-F', f'model={cfg[\"model\"]}',
'-o', output_json
], capture_output=True, text=True
)
with open(output_json) as f:
    data = json.load(f)
if 'text' not in data:
    print(f'ERROR: {json.dumps(data)[:300]}', file=sys.stderr)
    sys.exit(1)
text = data['text']
print(f'Transcribed: {len(text)} chars', file=sys.stderr)
print(text)
os.unlink(output_json)
" > OUTPUT.txt
If the remote health check fails, diagnose in order:
- Network: ping -c 1 HOST or tailscale status | grep HOST
- Service: tailscale ssh USER@HOST "curl -s localhost:PORT/v1/models"
- Proxy: retry with --noproxy '*' toggled
Step 3: Verify Output
After transcription, check for truncation — the most common failure mode:
- Confirm output is not empty
- Check character count is plausible (~400 chars/min for Chinese, ~200 words/min for English)
- Check the ending — does it trail off mid-sentence? If so, max_tokens was exhausted
- Show the user the first and last ~200 characters as a preview
If truncated or wrong, use AskUserQuestion:
Transcription may be truncated:
- Expected: ~[N] chars for [M] minutes of audio
- Got: [actual] chars ([pct]% of expected)
- Last line: "[last 100 chars...]"
Options:
A) Retry with higher max_tokens (current: [N], try: [N*2])
B) Switch mode — try [local/remote] instead
C) Save as-is — the output looks complete to me
D) Abort
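The plausibility check above can be sketched as a small helper. The rates come from the checklist (~400 chars/min for Chinese, ~200 words/min for English); the 50% tolerance is an arbitrary choice for illustration:

```python
def plausible_length(transcript: str, minutes: float, lang: str = "zh",
                     tolerance: float = 0.5) -> bool:
    # Compare actual output size against a rough per-minute rate:
    # character count for Chinese, word count for English.
    if lang == "zh":
        actual, expected = len(transcript), minutes * 400
    else:
        actual, expected = len(transcript.split()), minutes * 200
    return actual >= tolerance * expected

print(plausible_length("x" * 4000, 10))   # full-length 10-minute transcript
print(plausible_length("x" * 500, 10))    # suspiciously short: likely truncated
```

A failing check is a signal to offer the retry options above, not proof of truncation; speech density varies a lot between recordings.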
Step 4: Fallback — Overlap-Merge (Remote API Only)
If single remote request fails (timeout, OOM), fall back to chunked transcription:
python3 ${CLAUDE_PLUGIN_ROOT}/scripts/overlap_merge_transcribe.py \
--config "${CLAUDE_PLUGIN_DATA}/config.json" \
INPUT_AUDIO OUTPUT.txt
Splits into 18-minute chunks with 2-minute overlap, merges using punctuation-stripped fuzzy matching. See references/overlap_merge_strategy.md for algorithm details.
For local MLX mode, overlap-merge is unnecessary — the bundled script handles chunking internally with max_tokens=200000.
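For intuition, here is a much-simplified sketch of that merge. It matches at word level on lowercased, punctuation-stripped tokens; the bundled script's actual algorithm is described in references/overlap_merge_strategy.md and is not reproduced here:

```python
import string

def merge_overlapping(a: str, b: str) -> str:
    # Find the longest suffix of chunk a that matches a prefix of chunk b,
    # comparing tokens case- and punctuation-insensitively, then join.
    aw, bw = a.split(), b.split()
    norm = lambda w: w.strip(string.punctuation).lower()
    best = 0
    for k in range(1, min(len(aw), len(bw)) + 1):
        if [norm(w) for w in aw[-k:]] == [norm(w) for w in bw[:k]]:
            best = k
    return " ".join(aw + bw[best:])

merged = merge_overlapping("the quick brown fox jumps",
                           "fox jumps over the lazy dog")
print(merged)  # the quick brown fox jumps over the lazy dog
```

Normalizing before comparison matters because the two chunks may transcribe the shared overlap with different punctuation or casing, which an exact match would miss.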
Step 5: Recommend Transcript Correction
ASR output always contains recognition errors — homophones, garbled technical terms, broken sentences. After successful transcription, proactively suggest running the transcript-fixer skill on the output:
Transcription complete: [N] chars saved to [output_path].
ASR output typically contains recognition errors (homophones, garbled terms, broken sentences).
Would you like me to run /transcript-fixer to clean up the text?
Options:
A) Yes — run transcript-fixer on the output now (Recommended)
B) No — the raw transcription is good enough for my needs
C) Later — I'll run it myself when ready
If the user chooses A, invoke the transcript-fixer skill with the output file path. The two skills form a natural pipeline: transcribe → correct → review.
Reconfigure
rm "${CLAUDE_PLUGIN_DATA}/config.json"
Then re-run Step 0.
Bundled Resources
Scripts:
- transcribe_local_mlx.py — Local MLX transcription (macOS ARM64, PEP 723 deps)
- overlap_merge_transcribe.py — Chunked transcription with overlap merge (remote API fallback)
References:
- local_mlx_guide.md — Performance benchmarks, max_tokens truncation, model compatibility
- overlap_merge_strategy.md — Why naive chunking fails, fuzzy merge algorithm