AugmentClaude

Audio Jingle

Generate music, voiceovers, and sound effects as audio files.

Installation

  1. Make sure Claude Code is installed on your device and available in your terminal.

    Skills load from ~/.claude/skills/ when Claude Code starts up — so you need it on your machine first. If you don't have it yet, install it once with the command below, then run claude in any terminal to verify.

    One-time setup
    npm i -g @anthropic-ai/claude-code

    Already have it? Skip ahead.

  2. Paste into Claude Code or into your terminal.

    This copies the whole skill folder into ~/.claude/skills/audio-jingle-nexu-io/ — the SKILL.md plus any scripts, reference docs, or templates the skill ships with. Safe default: works for every skill.

    Faster alternative (instruction-only skills)

    Skips the clone and grabs only the SKILL.md file. Don't use this if the skill ships Python scripts, reference markdowns, or asset templates — they won't be downloaded and the skill will fail when it tries to load them.

    Quick install (SKILL.md only)
  3. Restart Claude Code.

    Quit and reopen Claude Code (or any other agent that loads from ~/.claude/skills/). New skills are picked up on startup.

  4. Just ask Claude.

    Skills auto-activate when your request matches the skill's description, so no slash command is needed. Trigger phrases live in the skill's own frontmatter; you can read them in the “What this skill does” section below.
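The install steps above can be sanity-checked from a terminal. The following is a minimal sketch, assuming the skill folder path quoted in step 2. The helper names `have_cmd` and `skill_installed` are illustrative and are not part of Claude Code.

```shell
# Illustrative helpers (hypothetical names, not part of Claude Code).

# Succeeds if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Succeeds if the given skill folder contains a SKILL.md file.
skill_installed() {
  [ -f "$1/SKILL.md" ]
}

# Folder name taken from step 2 of the installation notes.
skill_dir="$HOME/.claude/skills/audio-jingle-nexu-io"

if have_cmd claude; then
  echo "claude: found on PATH"
else
  echo "claude: not found (see step 1)"
fi

if skill_installed "$skill_dir"; then
  echo "skill: SKILL.md present in $skill_dir"
else
  echo "skill: SKILL.md missing from $skill_dir (see step 2)"
fi
```

This only confirms that the files are where Claude Code looks for them; a restart (step 3) is still needed before the skill is picked up.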

Prefer to read the source first? Open on GitHub.

When Claude uses it

Audio generation skill — jingles, beds, voiceover, and sound effects. Routes music requests to Suno V5 / Udio / Lyria, speech to MiniMax TTS / FishAudio / ElevenLabs V3, and SFX to ElevenLabs SFX or AudioCraft. Output is one MP3/WAV file saved to the project folder.

What this skill does

Audio Jingle Skill

Three sub-modes. The active project's audioKind decides which one runs:

audioKind   Models we route to                          Plan focus
music       Suno V5 (default), Udio, Lyria 2            genre + tempo + instrumentation
speech      MiniMax TTS (default), Fish, ElevenLabs V3  script + voice + pacing
sfx         ElevenLabs SFX (default), AudioCraft        texture + impact + duration
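
The routing defaults in the table can be expressed as a small lookup. This is a sketch only: `default_model_for` is a hypothetical helper, and it prints the display names from the table because the exact identifiers accepted by --model are not given in this document.

```shell
# Maps an audioKind to the default model named in the routing table.
# Prints display names only; the real --model identifiers are not
# documented here, so do not pass these strings to the dispatcher as-is.
default_model_for() {
  case "$1" in
    music)  echo "Suno V5" ;;
    speech) echo "MiniMax TTS" ;;
    sfx)    echo "ElevenLabs SFX" ;;
    *)      echo "unknown audioKind: $1" >&2; return 1 ;;
  esac
}

default_model_for speech
```

In practice the skill reads audioModel from the project metadata, so a lookup like this would only matter as a fallback.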

Resource map

audio-jingle/
├── SKILL.md
└── example.html

Workflow

Step 0 — Read the project metadata

Read audioKind, audioModel, audioDuration (seconds), and (for speech) voice from the project metadata. Branch by audioKind and use the values verbatim; do not show a clarifying form unless a value is marked (unknown — ask).

Important: voice is provider-specific. For minimax-tts, --voice must be a valid MiniMax voice_id (for example male-qn-qingse), not a natural-language description. If you only have a prose voice brief ("warm female narrator", "neutral Mandarin"), keep that in your plan but omit --voice so the daemon's default voice id applies, or ask the user to choose a specific id.
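
The voice rule above can be sketched as a guard that decides whether --voice is passed at all. The check used here (non-empty, no spaces) is a heuristic assumption and not a MiniMax rule; a stricter version would compare against the provider's published voice list. Both function names are hypothetical, and the rules for other providers are not described in this document, so their values are passed through unchanged.

```shell
# Heuristic only (an assumption, not a MiniMax rule): a value counts as a
# voice id if it is non-empty and contains no spaces.
looks_like_voice_id() {
  case "$1" in
    ''|*' '*) return 1 ;;
    *)        return 0 ;;
  esac
}

# Prints the value to pass as --voice, or nothing if the flag should be omitted.
# Usage: voice_to_pass <audioModel> <voice>
voice_to_pass() {
  model=$1
  voice=${2:-}
  if [ "$model" = "minimax-tts" ] && ! looks_like_voice_id "$voice"; then
    # Prose brief: omit --voice so the daemon's default voice id applies.
    return 0
  fi
  printf '%s' "$voice"
}

voice_to_pass minimax-tts "male-qn-qingse"; echo
voice_to_pass minimax-tts "warm female narrator"; echo
```

The first call prints the id unchanged; the second prints an empty line, which signals that the flag should be left out and the prose brief kept in the plan instead.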

Step 1 — Plan

Music

  • Genre + reference artists (1-2)
  • Tempo (BPM) + key
  • Instrumentation (3-5 instruments max)
  • Vocals: yes / no / hummed / choir
  • Mood arc (intro → chorus → outro)

Speech

  • Script (final, not draft — TTS runs verbatim)
  • Voice target + pacing. For MiniMax, this means a real voice_id, not prose in --voice.
  • Pronunciation hints for proper nouns / acronyms

SFX

  • Texture (impact / whoosh / ambience / foley)
  • Duration + envelope (sharp attack vs. gentle swell)
  • Layering note (single hit vs. stacked)

State the plan in 2-3 sentences before dispatching.

Step 2 — Compose the prompt

Use the format the upstream model prefers. Bind audioDuration to the API parameter directly; never put "make it 30 seconds" in prose.

Step 3 — Dispatch via the media contract

Use the unified dispatcher — do not call provider APIs by hand:

"$OD_NODE_BIN" "$OD_BIN" media generate \
  --project "$OD_PROJECT_ID" \
  --surface audio \
  --audio-kind "<music|speech|sfx>" \
  --model "<audioModel from metadata>" \
  --duration <audioDuration seconds> \
  [--voice "<provider voice id (speech only)>"] \
  --output "<short-slug>-<duration>s.mp3" \
  --prompt "<assembled prompt from Step 2 — for speech, the literal script>"

The command prints one line of JSON: {"file": {"name": "...", ...}}. The bytes land in the project; the FileViewer renders the audio transport controls automatically.
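
The dispatch step can be wrapped so that the optional --voice flag is added only when a voice id is supplied. The sketch below uses only the flags shown in the command above; the function names, the slug rule, and the sed-based extraction are assumptions for illustration, and a real JSON parser would be safer than sed.

```shell
# Builds "<short-slug>-<duration>s.mp3" from a free-text title.
# The slug rule (lowercase, runs of non-alphanumerics become "-") is an assumption.
output_name() {
  slug=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' \
    | sed 's/[^a-z0-9][^a-z0-9]*/-/g; s/^-//; s/-$//')
  printf '%s-%ss.mp3\n' "$slug" "$2"
}

# Runs the dispatcher; adds --voice only when a voice id is supplied.
# Usage: dispatch_audio <kind> <model> <duration> <output> <prompt> [voice]
dispatch_audio() {
  kind=$1 model=$2 duration=$3 output=$4 prompt=$5 voice=${6:-}
  set -- media generate \
    --project "$OD_PROJECT_ID" \
    --surface audio \
    --audio-kind "$kind" \
    --model "$model" \
    --duration "$duration"
  if [ -n "$voice" ]; then
    set -- "$@" --voice "$voice"
  fi
  set -- "$@" --output "$output" --prompt "$prompt"
  "$OD_NODE_BIN" "$OD_BIN" "$@"
}

# Crude extraction of file.name from the one-line JSON reply.
# Assumes a single "name" key with no escaped quotes.
file_name_from_reply() {
  sed -n 's/.*"name"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example (requires OD_NODE_BIN, OD_BIN, and OD_PROJECT_ID to be set):
# reply=$(dispatch_audio speech minimax-tts 10 \
#   "$(output_name 'Store Greeting' 10)" "Welcome in. We open at nine.")
# printf '%s\n' "$reply" | file_name_from_reply
```

Building the argument list with `set --` keeps each value as a single argument, so a script containing spaces or quotes reaches the dispatcher intact.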

Step 4 — Hand off

Reply with: plan summary, the filename returned by the dispatcher, and one sentence on what to try if the user wants a variation (e.g. "swap tempo from 92 to 108 BPM" rather than "make it different").

Hard rules

  • TTS runs your script literally. Proofread it before dispatching; even one stray comma changes the cadence.
  • MiniMax TTS rejects free-form voice prose in --voice. Use a real MiniMax voice_id (for example male-qn-qingse) or omit the flag and let the daemon's default voice apply.
  • Music: under 30s = single section; 30–90s = intro + body; 90s+ = full arc. Don't try to fit a 3-act song into 15 seconds.
  • SFX: prefer one well-described layer over a paragraph of "make it cool" — generators reward specific texture words.
  • Save the file every turn. The audio viewer shows transport controls the moment the file lands.
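
The music length rule in the list above maps directly to a threshold check. In this sketch, `music_structure` is a hypothetical helper, and exactly 90 seconds is treated as a full arc; that boundary choice is an assumption, since the rule names 90 in both the 30–90s and 90s+ ranges.

```shell
# Maps a duration in seconds to the song structure from the hard rules.
# Exactly 90 seconds is treated as "full arc" (an assumption).
music_structure() {
  if [ "$1" -lt 30 ]; then
    echo "single section"
  elif [ "$1" -lt 90 ]; then
    echo "intro + body"
  else
    echo "full arc"
  fi
}

music_structure 15    # single section
music_structure 45    # intro + body
music_structure 120   # full arc
```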

Related skills