CLI Testing
Test a running assistant end-to-end from the terminal without a web UI.
Installation
- Make sure Claude is on your device and in your terminal.
Skills load from ~/.claude/skills/ when Claude Code starts up, so you need it on your machine first. If you don't have it yet, install it once with the command below, then run claude in any terminal to verify.
One-time setup: npm i -g @anthropic-ai/claude-code
Already have it? Skip ahead.
- Paste into Claude Code or into your terminal.
This copies the whole skill folder into ~/.claude/skills/cli-testing-vellum-ai/: the SKILL.md plus any scripts, reference docs, or templates the skill ships with. Safe default: works for every skill.
Faster alternative (instruction-only skills): quick install (SKILL.md only).
Skips the clone and grabs only the SKILL.md file. Don't use this if the skill ships Python scripts, reference markdowns, or asset templates: they won't be downloaded and the skill will fail when it tries to load them.
- Restart Claude Code.
Quit and reopen Claude Code (or any other agent that loads from ~/.claude/skills/). New skills are picked up on startup.
- Just ask Claude.
Skills auto-activate when your request matches the skill's description, so no slash command is needed. Trigger phrases live in the skill's own frontmatter; you can read them in the “What this skill does” section below.
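To confirm the install landed before restarting, a quick check along these lines may help. This is a minimal sketch: `skill_installed` is a hypothetical helper name, and the only path it relies on is the skill folder named in the install step.

```shell
# Hypothetical helper: succeeds when a skill folder contains a SKILL.md file.
skill_installed() {
  [ -f "$1/SKILL.md" ]
}

# Usage (path taken from the install step):
#   skill_installed "$HOME/.claude/skills/cli-testing-vellum-ai" && echo "skill present"
```

The check only proves the file exists; it says nothing about whether bundled scripts or templates were copied too.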
Prefer to read the source first? Open on GitHub.
When Claude uses it
Manually test a running Vellum assistant end-to-end purely from the CLI — no desktop app or web UI. Hatch an instance, send messages, watch the reply, and tear it down. Use when verifying assistant behavior, reproducing a bug, or smoke-testing a change without the macOS/web clients.
What this skill does
CLI Testing — Exercise the Assistant End-to-End
Drive a real assistant from the terminal only. The vellum CLI (cli/, package
@vellumai/cli) manages instance lifecycle; vellum message / vellum events
exercise a running instance. See cli/AGENTS.md and
the root README.md § CLI for command reference.
0. Prerequisites
export PATH="$HOME/.bun/bin:$PATH" # bun + the linked `vellum` binary
vellum ps # sanity check the CLI resolves
If vellum is missing, run ./setup.sh from the repo root once (installs deps,
links the vellum command). Docker must be running for the default flow below.
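The prerequisites can be checked in one pass before hatching. The sketch below is illustrative only: `missing_tools` is a hypothetical helper, and finding `docker` on PATH does not prove the daemon is running.

```shell
# Print each named tool that is not on PATH; print nothing when all resolve.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage:
#   export PATH="$HOME/.bun/bin:$PATH"
#   missing="$(missing_tools vellum docker)"
#   [ -z "$missing" ] || echo "missing: $missing (run ./setup.sh or install Docker)"
```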
1. Provide an LLM provider key (from the environment)
Local-mode and Docker-mode instances need one LLM provider key. The CLI reads it straight from the host environment — just export it before hatching/setup:
export ANTHROPIC_API_KEY=sk-ant-... # or OPENAI_API_KEY / GEMINI_API_KEY /
# FIREWORKS_API_KEY / OPENROUTER_API_KEY /
# MINIMAX_API_KEY
In Devin sessions ANTHROPIC_API_KEY is typically already present in the
environment — check with echo "${ANTHROPIC_API_KEY:0:7}" before asking for one.
The CLI maps providers to env vars in
cli/src/shared/provider-env-vars.ts.
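A scripted run can confirm that a provider key is present without ever printing it. The following is a sketch under assumptions: `first_provider_key` is a hypothetical helper, the variable names are the ones listed above, and the order in which it checks them is arbitrary and may not match the CLI's own precedence.

```shell
# Print the NAME of the first provider variable that is set and non-empty.
# The key value itself is never printed. Returns 1 when none is set.
first_provider_key() {
  for name in ANTHROPIC_API_KEY OPENAI_API_KEY GEMINI_API_KEY \
              FIREWORKS_API_KEY OPENROUTER_API_KEY MINIMAX_API_KEY; do
    # eval is safe here because $name only takes the fixed values above.
    if eval "[ -n \"\${$name:-}\" ]"; then
      printf '%s\n' "$name"
      return 0
    fi
  done
  return 1
}

# Usage:
#   first_provider_key || echo "no provider key exported; set one before hatching"
```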
2. Hatch — default to a Docker hatch built from source
Always default to --remote docker. It runs the assistant, gateway, and
credential-executor in isolated containers that mirror production and keep the
test off your host process table. Reserve --remote local (§5) for the rare
case where Docker is unavailable.
Build from source — that's the point of testing. A bare
vellum hatch --remote docker pulls the published platform images even when
the CLI itself runs from your checkout, so it would test released code, not your
changes. Source-build is opt-in via a flag
(resolveDockerHatchMode in cli/src/lib/docker.ts):
- --source <path>: build images once from the source tree at <path>, no watcher. Default for testing: picks up your current changes and is robust for a scripted one-shot run.
- --watch: build from source and start a file-watcher that rebuilds the affected image on change (watches each service's src/, package.json, and Dockerfile). Use while iterating. The watcher is a long-lived foreground process, so prefer --source for unattended/scripted runs.
vellum hatch --remote docker --source . --name clitest # build from cwd
# → "Mode: build-from-source" then "Images (local build): vellum-assistant:local-clitest …"
If --source/--watch is passed but no full source tree is found (e.g. the CLI is running from a packaged app bundle), the CLI falls back to pulling the published images and says so; watch for that line if you expect a build. Building all three images takes ~1–2 min the first time.
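In an unattended run, the fallback is easy to miss, so it can help to check the captured hatch output for the mode line. This sketch assumes the output was saved to a file and that the `Mode: build-from-source` line quoted above is printed unchanged when output is redirected; `built_from_source` is a hypothetical helper.

```shell
# Succeeds when a saved hatch log contains the build-from-source mode line.
built_from_source() {
  grep -q 'Mode: build-from-source' "$1"
}

# Usage (assumes the hatch output was captured, for example with tee):
#   vellum hatch --remote docker --source . --name clitest 2>&1 | tee /tmp/vel_hatch.log
#   built_from_source /tmp/vel_hatch.log || echo "no source build detected; check for the fallback line"
```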
Hatch attached — do not pass -d. An attached hatch leases the guardian
token and configures the provider credential from your environment inline,
then returns once the containers are healthy — no follow-up vellum setup
needed. Detached mode (-d) defers the guardian-token lease, so a later
vellum setup cannot authenticate against the gateway and fails with an
invalid_signature 401. Confirm readiness with vellum ps (🟢 healthy)
before messaging.
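The readiness check can be scripted as a short polling loop. This is a sketch, not the CLI's documented contract: `wait_healthy` is a hypothetical helper, and it assumes `vellum ps` prints one line per instance that contains the instance name and the word "healthy".

```shell
# Poll `vellum ps` until the line for the named instance reports "healthy".
# Assumption: one output line per instance, containing its name and the
# word "healthy"; adjust the match if the real layout differs.
wait_healthy() {
  name=$1
  tries=${2:-30}    # number of polls before giving up
  delay=${3:-2}     # seconds between polls
  while [ "$tries" -gt 0 ]; do
    # -w keeps "unhealthy" from counting as a match.
    if vellum ps 2>/dev/null | grep -F -- "$name" | grep -qw healthy; then
      return 0
    fi
    tries=$((tries - 1))
    sleep "$delay"
  done
  return 1
}

# Usage:
#   wait_healthy clitest || { echo "clitest never became healthy"; exit 1; }
```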
3. Verify functionality
vellum message is async (returns a message id, not the reply — --json only
adds {accepted, messageId}). vellum events streams the reply but is
long-running, so background it, send, wait, then read.
Assert on a token the assistant must generate, never one you put in the
prompt. vellum events echoes your prompt as **You:** <text>
(cli/src/commands/events.ts), so
grepping for a word that appears in the prompt passes even when the assistant
never replied. Ask a question whose answer is absent from the prompt:
( vellum events > /tmp/vel_events.log 2>&1 & ) # stream in background
sleep 2
vellum message "What is 6 multiplied by 7? Reply with only the number."
sleep 25 # let the assistant respond
pkill -f "vellum events"
grep -w 42 /tmp/vel_events.log # "42" is NOT in the prompt,
# so a match proves a real reply
The assistant's streamed reply is written as plain text (no **You:** prefix),
so a match on a generated answer confirms the round-trip worked. If you must use
a fixed sentinel string, strip the echoed prompt first
(grep -v '^\*\*You:\*\*' /tmp/vel_events.log | grep <sentinel>).
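The two ideas above, ignoring the echoed prompt and waiting for the reply, can be folded into small helpers so the fixed `sleep 25` becomes an upper bound rather than a constant cost. This is a sketch with hypothetical function names; it assumes each echoed prompt occupies a single line starting with `**You:**`, as the grep pattern above does.

```shell
# Succeeds when the token appears as a whole word outside echoed "**You:**" lines.
reply_contains() {
  log=$1
  token=$2
  grep -v '^\*\*You:\*\*' "$log" 2>/dev/null | grep -qwF -- "$token"
}

# Poll the log once per second until the token shows up or attempts run out.
wait_for_reply() {
  log=$1
  token=$2
  tries=${3:-25}
  while [ "$tries" -gt 0 ]; do
    if reply_contains "$log" "$token"; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# Usage, after backgrounding `vellum events` and sending the message:
#   wait_for_reply /tmp/vel_events.log 42 && echo "round-trip OK"
```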
Common verification commands
| Command | Purpose |
|---|---|
| `vellum ps` | List instances + health (🟢 healthy), id, runtime URL, cloud |
| `vellum message "<text>"` | Send a message (async; prints message id) |
| `vellum events` | Stream live events/replies (long-running; background it) |
| `vellum logs -n 100` | Last 100 log lines; add -f to follow, -s assistant/-s gateway to filter |
| `vellum client` | Interactive terminal chat session (manual exploration) |
| `vellum message --json "<text>"` | Send-ack as JSON ({accepted, messageId}); the reply still arrives via `vellum events`, not here |
4. Tear down
vellum retire clitest --yes # stops containers and removes the instance
retire is destructive (removes per-instance Docker volumes); always clean up
test instances when done.
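One way to make sure cleanup happens even when a check fails is to wrap the check so the retire step always runs. The wrapper below is a sketch with a hypothetical name, `run_with_teardown`; it uses only the `vellum retire <name> --yes` form shown above.

```shell
# Run a command, then retire the named instance whether or not it succeeded.
# Returns the command's exit status so a failed check still fails the script.
run_with_teardown() {
  teardown_name=$1
  shift
  teardown_status=0
  "$@" || teardown_status=$?
  vellum retire "$teardown_name" --yes
  return "$teardown_status"
}

# Usage (the check can be any command or function):
#   run_with_teardown clitest grep -w 42 /tmp/vel_events.log
```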
5. Fallback: local mode (no Docker)
Only when Docker is unavailable. Runs the daemon + gateway as plain host processes; configures the provider key automatically from the env at hatch time:
vellum hatch --name clitest # defaults to --remote local
# verify via the `vellum events` + generated-answer pattern in §3, then:
vellum retire clitest --yes
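A script that has to run on machines with and without Docker could pick the mode at run time. This is a sketch: `pick_remote` is a hypothetical helper, and it treats a failing `docker info` as "Docker unavailable".

```shell
# Print "docker" when the Docker daemon answers, otherwise "local".
pick_remote() {
  if docker info >/dev/null 2>&1; then
    echo docker
  else
    echo local
  fi
}

# Usage (add --source . in the Docker case so the build uses your checkout):
#   vellum hatch --remote "$(pick_remote)" --name clitest
```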