Perfup

By raullenchai · raullenchai/Rapid-MLX

Autonomously research, benchmark, and implement performance optimizations with production-ready pull requests.

Installation

Make sure Claude Code is installed and available in your terminal.
Skills load from ~/.claude/skills/ when Claude Code starts up — so you need it on your machine first. If you don't have it yet, install it once with the command below, then run claude in any terminal to verify.
One-time setup
```
npm i -g @anthropic-ai/claude-code
```
Already have it? Skip ahead.

Paste into Claude Code or into your terminal.

Install

git clone ht•••••••••••••••••••••••••••••••••••••••••• ••••••••••••••••••••••••••• •• ••••• •• ••••••••••••••••••••••••••••••••••• •• •• •• ••••••••••••••••••••••••••••••••••••••••••••••••••• ••••••••••••••••••••••••••••••••••••

This copies the whole skill folder into ~/.claude/skills/perfup-raullenchai/ — the SKILL.md plus any scripts, reference docs, or templates the skill ships with. Safe default: works for every skill.

Faster alternative (instruction-only skills)

Skips the clone and grabs only the SKILL.md file. Don't use this if the skill ships Python scripts, reference Markdown files, or asset templates: they won't be downloaded, and the skill will fail when it tries to load them.

Quick install (SKILL.md only)

mkdir -p ~/.•••••••••••••••••••••••••••••••• •• •••• ••••• ••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••• •• ••••••••••••••••••••••••••••••••••••••••••••

Restart Claude Code.
Quit and reopen Claude Code (or any other agent that loads from ~/.claude/skills/). New skills are picked up on startup.
Just ask Claude.
Skills auto-activate when your request matches the skill's description, so no slash command is needed. Trigger phrases live in the skill's own frontmatter; you can read them in the “What this skill does” section below.


When Claude uses it

Autonomous performance optimization: research, PoC, benchmark, implement, review, PR

What this skill does

/perfup — Autonomous Performance Optimization

Inspired by karpathy/autoresearch: you are an autonomous performance researcher for vllm-mlx. You propose optimizations, benchmark them, keep what works, discard what doesn't, and ship a production PR.

Key Files

Results log: reports/perfup-results.tsv — append-only experiment log (commit, metric, status, description)
Optimization queue: memory/knowledge/perf_optimization_queue.md — ranked list of candidates
Memory index: memory/MEMORY.md — what's been done, what's known
Benchmark script: scripts/benchmark_engines.py
Model for benchmarking: Check memory for current model path. If unavailable, ask user.

The 6 Phases

Phase 1: Research

Read existing state, then discover new opportunities.

Read memory/knowledge/perf_optimization_queue.md and memory/MEMORY.md
If $ARGUMENTS is provided (e.g. /perfup decode), focus on that area. Otherwise broad search.
Scan codebase for optimization opportunities:
- Use Task(subagent_type=Explore) on critical paths
- Search for TODO/FIXME/PERF/HACK comments
- Check ml-explore/mlx-lm recent releases (gh release list --repo ml-explore/mlx-lm --limit 5)
WebSearch for latest MLX inference optimizations if needed
Produce candidate list, each with: problem, solution, estimated impact, effort, coverage, risk
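
The TODO/FIXME/PERF/HACK search in the scan step can be approximated in a few lines of Python. This is a minimal sketch and not part of the skill: the function name `find_markers`, the regex, and the restriction to `*.py` files are assumptions chosen for illustration.

```python
import re
from pathlib import Path

# Marker words come from Phase 1; the regex itself is an illustrative choice.
# It only matches upper-case markers that appear after a '#' comment sign.
MARKER_RE = re.compile(r"#.*\b(TODO|FIXME|PERF|HACK)\b")


def find_markers(root, pattern="*.py"):
    """Return (path, line_number, marker, text) for each marker comment under root."""
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        try:
            lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        for lineno, line in enumerate(lines, start=1):
            match = MARKER_RE.search(line)
            if match:
                hits.append((str(path), lineno, match.group(1), line.strip()))
    return hits
```

Each hit is only a lead for the candidate list; it still needs a problem statement, a proposed solution, and the estimates listed above.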

Phase 2: Prioritize

Score and rank. Persist to memory.

Score each candidate (1-5 per axis):
- Impact: Performance gain magnitude (5 = >2x)
- Ease: Implementation effort (5 = <1 day)
- Coverage: Models that benefit (5 = all)
- Safety: Regression risk (5 = zero)
Sort by composite = Impact × Ease × Coverage × Safety
Update memory/knowledge/perf_optimization_queue.md:
- Completed items → "Completed" section (date + results)
- Failed/rejected → "Rejected" section (reason)
- Active queue → "Queue" section with [P0]-[P3] tags
Present top 3 to user. Wait for confirmation before proceeding.
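
The scoring and ranking steps above can be sketched as follows. The `Candidate` class, the `rank` helper, and the example candidate names are hypothetical; only the four axes, the 1-5 range, and the product formula come from the text.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One optimization candidate scored 1-5 on each Phase 2 axis."""
    name: str
    impact: int
    ease: int
    coverage: int
    safety: int

    def composite(self) -> int:
        # Composite = Impact x Ease x Coverage x Safety, as defined in Phase 2.
        axes = (self.impact, self.ease, self.coverage, self.safety)
        for value in axes:
            if not 1 <= value <= 5:
                raise ValueError(f"{self.name}: axis score {value} is outside 1-5")
        return self.impact * self.ease * self.coverage * self.safety


def rank(candidates):
    """Return candidates sorted by composite score, highest first."""
    return sorted(candidates, key=lambda c: c.composite(), reverse=True)


# Hypothetical candidates, for illustration only.
queue = rank([
    Candidate("cache warmup", impact=5, ease=2, coverage=5, safety=4),
    Candidate("fewer sync points", impact=3, ease=5, coverage=5, safety=5),
    Candidate("custom kernel", impact=4, ease=4, coverage=3, safety=3),
])
top_three = queue[:3]  # the slice presented to the user for confirmation
```

Because the score is a product, a single low axis pulls a candidate down sharply: a high-impact change with a low safety score can rank below a modest but safe one.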

Phase 3: PoC Experiment Loop

This is the core loop. Inspired by autoresearch: try, measure, keep or discard. Repeat.

SETUP:
  git checkout -b perfup/<optimization-name>
  Record baseline metrics (run benchmark on current code)
  Initialize reports/perfup-results.tsv if it does not exist

LOOP:
  1. Implement minimal PoC change in code
  2. git commit -m "perfup: <brief description>"
  3. Run benchmark: python3.12 scripts/benchmark_engines.py (or custom)
     Redirect output: > reports/perfup-run.log 2>&1
  4. Extract metrics from log (TTFT, decode tok/s, etc.)
  5. Record to reports/perfup-results.tsv:
     commit<TAB>decode_tps<TAB>ttft_ms<TAB>status<TAB>description
  6. DECISION:
     - If metric improved: KEEP. Log "keep" status. This is the new baseline.
     - If metric same or worse: DISCARD. Log "discard". git reset --hard to previous keep.
     - If crashed: Log "crash". Try to fix (1-2 attempts). If unfixable, discard and move on.
  7. If improvement confirmed and significant (>5%): break loop → Phase 4
  8. If no candidate works after trying top 3: inform user and stop.

Rules for the loop:

Each PoC should be MINIMAL — smallest change that tests the hypothesis
Benchmark must run on a REAL model (not mocks)
If benchmark takes too long or model not loaded, ask user
Do NOT ask "should I continue?" between iterations — just keep going
DO stop and ask if you need user action (download model, start server, etc.)
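
The keep/discard decision in step 6 and the 5% threshold in step 7 can be written down explicitly. This is a sketch under stated assumptions: the function names are hypothetical, and measuring the gain relative to the baseline being compared against is one reading of "significant (>5%)", since the text does not say which reference point to use.

```python
def relative_gain(new, baseline, higher_is_better=True):
    """Fractional improvement of new over baseline (positive means better)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    change = (new - baseline) / baseline
    # For metrics such as TTFT, a lower value is the improvement.
    return change if higher_is_better else -change


def decide(new, baseline, crashed=False, higher_is_better=True):
    """Return the status string logged in step 6: keep, discard, or crash."""
    if crashed:
        return "crash"
    gain = relative_gain(new, baseline, higher_is_better)
    return "keep" if gain > 0 else "discard"  # same or worse is a discard


def is_significant(new, baseline, threshold=0.05, higher_is_better=True):
    """True when the gain exceeds the step 7 threshold (5% by default)."""
    return relative_gain(new, baseline, higher_is_better) > threshold
```

For example, a decode rate of 72.1 tok/s against a baseline of 68.4 tok/s is a gain of about 5.4%, so `decide` returns "keep" and `is_significant` returns True. For TTFT, pass `higher_is_better=False`.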

Phase 4: Full Implementation

PoC validated. Now build it properly.

Clean up or rewrite the PoC code for production quality
Enter plan mode — design clean architecture, tests, docs
Implement:
- Clean code, proper error handling, logging
- Unit tests matching existing patterns in tests/
- No hacks, no dead code
Run full test suite: python3.12 -m pytest tests/ -v
Run benchmark again — confirm improvement matches PoC

Phase 5: Review Loop

Independent review via Codex.

Invoke: /review-loop <description of optimization>
Address all findings (P0 = blocker, P1 = should fix, P2 = nice to have)
After review passes, run final benchmarks on all relevant models
Update README/docs with new benchmark numbers if applicable

Phase 6: PR & Ship

Ensure all changes are on perfup/<name> or feat/<name> branch
Push to raullenchai remote (NEVER origin, NEVER main directly)
Create PR:
```
gh pr create --repo raullenchai/vllm-mlx --base main
```
PR body must include:
- Summary: What was optimized and why
- Benchmark results: Before/after table from perfup-results.tsv
- Test plan: How to verify
Update memory:
- Move optimization to "Completed" in perf_optimization_queue.md with PR#, date, confirmed speedup
- Remove from todo if applicable
Present PR URL to user
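
The before/after table for the PR body can be derived from the results log. The sketch below assumes the log has exactly one "baseline" row and that the last "keep" row is the shipped state; the function name `before_after_table` and the Markdown layout are illustrative choices, not something the skill prescribes.

```python
import csv
import io


def before_after_table(tsv_text):
    """Build a Markdown before/after table from the text of perfup-results.tsv."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    baseline = next((r for r in rows if r["status"] == "baseline"), None)
    keeps = [r for r in rows if r["status"] == "keep"]
    if baseline is None or not keeps:
        raise ValueError("need a baseline row and at least one keep row")
    final = keeps[-1]  # assumption: the last kept experiment is what ships

    def row(label, key):
        before, after = float(baseline[key]), float(final[key])
        pct = (after - before) / before * 100
        return f"| {label} | {before:g} | {after:g} | {pct:+.1f}% |"

    return "\n".join([
        "| Metric | Before | After | Change |",
        "| --- | --- | --- | --- |",
        row("decode tok/s", "decode_tps"),
        row("TTFT (ms)", "ttft_ms"),
    ])
```

Discarded and crashed rows are ignored here; they stay in the log as the record of what was tried.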

Results TSV Format

commit	decode_tps	ttft_ms	status	description
a1b2c3d	68.4	245	baseline	current main branch
b2c3d4e	72.1	240	keep	reduce redundant mx.eval in decode loop
c3d4e5f	67.9	248	discard	speculative prefill chunking
d4e5f6a	0.0	0	crash	fused MoE kernel (import error)
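
Appending to the log in this format takes only the standard library. This is a minimal sketch: the helper name `append_result` is hypothetical, and writing the header when the file is missing or empty is an assumption about how the "initialize if it does not exist" step would be handled.

```python
import csv
from pathlib import Path

FIELDS = ["commit", "decode_tps", "ttft_ms", "status", "description"]
STATUSES = {"baseline", "keep", "discard", "crash"}  # values used in the sample log


def append_result(path, commit, decode_tps, ttft_ms, status, description):
    """Append one experiment row, writing the header first if the file is new."""
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status}")
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not path.exists() or path.stat().st_size == 0
    # Mode "a" keeps the log append-only: existing rows are never rewritten.
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        if is_new:
            writer.writerow(FIELDS)
        writer.writerow([commit, decode_tps, ttft_ms, status, description])
```

A call such as `append_result("reports/perfup-results.tsv", "a1b2c3d", 68.4, 245, "baseline", "current main branch")` produces the header plus the first data row shown above.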

Focus Areas

If $ARGUMENTS is provided:

ttft — Time to first token (prefill optimization)
decode — Decode throughput (tok/s)
tools — Tool calling accuracy/reliability
accuracy — Model output quality
memory — Memory usage / longer contexts
prefill — Prefill speed
cache — Cache hit rate / prompt reuse
No argument → broad research across all areas

Important Rules

Benchmark proves everything. No optimization ships without measured improvement.
Memory is truth. perf_optimization_queue.md is the canonical record of what's tried/works/failed.
Git discipline. Feature branch → PR on raullenchai/vllm-mlx. Never push to main.
Keep it simple. A small improvement with clean code beats a large improvement with ugly code. Removing code for equal performance is a win.
Ask only when blocked. Don't ask "should I continue?" — just keep iterating. Ask only for user actions (model download, server restart, etc.).
