AugmentClaude

AI-Toolkit LoRA Trainer

Train custom LoRAs for WAN video models and Z-Image with local or cloud setup.

Installation

  1. Make sure Claude is on your device and in your terminal.

    Skills load from ~/.claude/skills/ when Claude Code starts up — so you need it on your machine first. If you don't have it yet, install it once with the command below, then run claude in any terminal to verify.

    One-time setup
    npm i -g @anthropic-ai/claude-code

    Already have it? Skip ahead.

  2. Paste into Claude Code or into your terminal.

    This copies the whole skill folder into ~/.claude/skills/ai-toolkit-trainer-artokun/ — the SKILL.md plus any scripts, reference docs, or templates the skill ships with. Safe default: works for every skill.

    Faster alternative (instruction-only skills)

    Skips the clone and grabs only the SKILL.md file. Don't use this if the skill ships Python scripts, reference markdowns, or asset templates — they won't be downloaded and the skill will fail when it tries to load them.

    Quick install (SKILL.md only)
    Sign up to copy
  3. Restart Claude Code.

    Quit and reopen Claude Code (or any other agent that loads from ~/.claude/skills/). New skills are picked up on startup.

  4. Just ask Claude.

    Skills auto-activate when your request matches the skill's description — no slash command needed. Trigger phrases live in the skill's own frontmatter; you can read them in the “What this skill does” section above.

Prefer to read the source first? Open on GitHub.

When Claude uses it

Train custom LoRAs with ostris AI-Toolkit — covers WAN 2.2/2.1 (people, styles, video motion) and Z-Image (Turbo & Base, low-VRAM image LoRAs). Use when the user wants to train a WAN or Z-Image LoRA; covers local + RunPod setup, dataset prep, key params, and using the result in a ComfyUI workflow.

What this skill does

AI-Toolkit LoRA Trainer (WAN 2.2 & Z-Image)

Overview

AI-Toolkit by ostris is "the ultimate training toolkit for finetuning diffusion models" (MIT license) — a standalone trainer with its own web UI, NOT a ComfyUI custom node. It runs a Node.js UI front end over a Python (run.py) training backend, and trains LoRAs for many model families — here we cover WAN 2.2 / 2.1 video models and Z-Image (Turbo & Base).

  • Repo: https://github.com/ostris/ai-toolkit (cloned by the installers).
  • Backend: python run.py config/<job>.yml. UI: a Node.js app under ui/ that schedules/monitors jobs (you don't have to keep the UI open while a job runs).
  • Output: a standard .safetensors LoRA you drop into ComfyUI models/loras/ and load with LoraLoaderModelOnly.

Best for:

  • WAN LoRAs — a person/character, an art style, or a specific camera/video motion (image or video clip datasets). For using WAN see wan-t2v-video / wan-flf-video.
  • Z-Image LoRAs — fast, very low-VRAM image LoRAs (faces, characters, outfits, styles) on the 6B Z-Image base/turbo. For using Z-Image see z-image-base / z-image-turbo (and the z-image-xy-plot pack to compare trained LoRAs).

For low-VRAM anime image LoRAs on a different stack (kohya sd-scripts), see the sibling anima-lora-trainer.

Two LoRA kinds (WAN): a WAN image LoRA trains on still images (cheaper, ~24GB-class, good for identity/style); a WAN video LoRA trains on short clips (heavier — best on cloud — good for motion). Z-Image is image-only.

Install

The installer comes in two generations — both clone ostris/ai-toolkit, set up Torch for your GPU, and launch the web UI. Put it in a folder whose full path has NO spaces (e.g. C:\AI-Toolkit).

  • V1 — AI-TOOLKIT_AUTO_INSTALL.bat: expects Git, Python 3.10.x, and Node 18+ already in PATH.
  • V2 — AI-TOOLKIT_AUTO_INSTALL-V2.bat (recommended): uses an embedded Python 3.10.11, auto-installs Git + Node, builds a clean PATH without your system Python, and adds aggressive pip/curl retries — far fewer prerequisites and the more robust choice. (Used for the Z-Image Turbo LoRA training release.)

Both are CUDA-aware and select the Torch wheel by GPU generation:

ChoiceGPUCUDATorch indexTorch packages
1RTX 50-series (Blackwell)12.8https://download.pytorch.org/whl/cu128torch==2.7.0 torchvision==0.22.0
2RTX 40 / 30 / 20 and older12.6https://download.pytorch.org/whl/cu126torch==2.7.0 torchvision==0.22.0

Each then: clones ostris/ai-toolkit; downloads two launcher scripts (LAUNCHER-TOOLKIT.bat, SECURE_LAUNCHER-TOOLKIT.bat, from https://huggingface.co/Aitrepreneur/FLX/resolve/main/); makes the venv; installs Torch from the chosen index; pip install -r requirements.txt; then cd ui && npm run build_and_start.

RunPod / Linux — AI-TOOLKIT_AUTO_INSTALL-RUNPOD.sh (and -V2.sh)

Installs into the persistent volume /workspace/ai-toolkit; idempotent (re-run just relaunches the UI). Use RunPod's PyTorch 2.8.0 template, 100GB disk. It installs apt deps, clones the repo, makes a venv, installs Torch (torchaudio included), installs nvm + Node 22, then builds/starts the UI.

ChoiceGPUStreamTorch spec
1RTX 5000-series (Blackwell)cu128torch==2.7.0+cu128 torchvision==0.22.0+cu128 torchaudio==2.7.0+cu128
2Ada / Hopper / Ampere, oldercu126torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0

Ports & auth: UI on 8675, Jupyter on 8888. Set AI_TOOLKIT_AUTH (UI password) before launch. Reach it at https://${RUNPOD_POD_ID}-8675.proxy.runpod.net. GPU recs: RTX 4090/5090 for image (WAN t2i/t2v, Z-Image) LoRAs; RTX 6000 Pro (Blackwell) for heavy WAN video / high-res / high-rank jobs.

Launching the web UI

  • Windows: run LAUNCHER-TOOLKIT.bat (local) or SECURE_LAUNCHER-TOOLKIT.bat (password-protected) from the ai-toolkit folder.
  • RunPod: rerun the .sh — it detects the install and starts the UI instantly on :8675.

In the UI: create a Job, point it at a dataset folder, pick the model (WAN variant or Z-Image), set params, start. Jobs run in the Python backend, so you can close the browser. (Or bypass the UI: copy a config/examples/*.yml, edit, python run.py config/<job>.yml.)

Dataset preparation

AI-Toolkit pairs each sample with a same-basename .txt caption and auto-resizes/buckets aspect ratios (no pre-cropping).

Image LoRA (WAN identity/style, or Z-Image)

my_dataset/
  001.png  001.txt
  002.jpg  002.txt
  • Captions: natural-language; include a unique trigger word for a person/character.
  • ~15–40 varied images for a person; more for a broad style.

Video LoRA (WAN motion only)

Short clips + a .txt per clip; caption the motion/camera move. Per-clip frames via the job's num_frames (e.g. 81). Markedly heavier — prefer cloud GPUs.

Key training params

WAN 2.2

WAN 2.2 14B is a Mixture-of-Experts with a high-noise expert (structure/motion) and a low-noise expert (detail). AI-Toolkit trains both via Multi-stage.

ParamDefaultNotes
Linear rank / dim1616 simple; 16–32 complex/cinematic
Learning rate5e-5 (identity)7e-5–1e-4 style; high LR → plasticky skin
Steps1500–2500stop before overbaking
Resolution512 (or 768)bucketed; 768 costs more VRAM
num_frames (video)81per-clip frame count
Multi-stageHigh + Low = ONtrains both experts
Switch Every10raise to 20–50 if offload swapping is slow
Optimizer / QuantAdamW8bit / 4-bit ARA or float8fits 14B on consumer cards

Z-Image (Turbo & Base)

Z-Image is a ~6B single-stream model — no hi/lo multi-stage (leave Multi-stage OFF; you train one model). It's the lightest target here: the headline of the Z-Image releases is training on very low VRAM.

ParamStarting pointNotes
Linear rank / dim16–3232 for detailed characters/styles
Learning rate1e-4lower (5e-5) for tighter identity
Steps1500–3000dataset-dependent
Resolution768 (or 1024)Z-Image's native range
Multi-stageOFFsingle-stream model, not WAN's MoE
Optimizer / QuantAdamW8bit / float8enables sub-12GB training

Train on Base, deploy anywhere. Z-Image Base is the finetuning-friendly model; a LoRA trained on Base generally applies to the Turbo workflow too. Use the z-image-xy-plot pack to grid-compare your trained LoRAs.

Param tables are aggregated starting points (community/training-guide sources), not read from the repo's config/examples/*.yml — open the actual WAN / Z-Image example config in your clone and tune. See "Unverified".

VRAM / GPU guidance

  • Z-Image image LoRA: the lightest — trainable on modest consumer GPUs with quantization (the releases tout very-low-VRAM training); a 4090 is comfortable, smaller cards work with float8 + 512–768 res.
  • WAN image LoRA (t2i/t2v): 24GB+ locally with quantization; below that, use RunPod.
  • WAN video LoRA / high res / high rank: heavier — cloud (RTX 5090, or RTX 6000 Pro Blackwell / H100).
  • Memory savers: quantization, batch size 1, 512 res, and (WAN) raise Switch Every.

Using the trained LoRA in ComfyUI

  1. Copy <your_lora>.safetensors into ComfyUI models/loras/.
  2. Load with LoraLoaderModelOnly:
    • WAN 2.2 is dual hi/lo — apply the LoRA to both the HighNoise and LowNoise model branches (like lightning/concept LoRAs in wan-t2v-video). Typical strength 0.5–1.0.
    • Z-Image is a single model — one LoraLoaderModelOnly on the Z-Image model path (see the z-image-base / z-image-turbo packs). Strength 0.7–1.0.
    { "class_type": "LoraLoaderModelOnly",
      "inputs": { "model": ["<base_model>", 0],
                  "lora_name": "<your_lora>.safetensors",
                  "strength_model": 1.0 } }
    
  3. Prompt using the trigger word / caption style you trained with (for WAN motion LoRAs, describe the same camera/motion).

Troubleshooting

  • No module named 'torchaudio' when starting a job (AI-Toolkit). The venv's Torch stack is mismatched. Fix: activate the AI-Toolkit venv (venv\Scripts\activate), then pip uninstall torch torchaudio torchvision -y and pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 (or your CUDA's index). Only affects the AI-Toolkit install, not ComfyUI.
  • self and mat2 must have the same dtype (ComfyUI-WanVideoWrapper, WAN usage). Re-clone ComfyUI-WanVideoWrapper in custom_nodes/ and reinstall its requirements.txt, then restart ComfyUI.
  • 5000-series (Blackwell) onnxruntime "QuickGelu" / CUDA error. pip install onnxruntime==1.20.1 in the affected venv.
  • Pascal/Maxwell GPUs (GTX 9xx/10xx). Recent Torch (cu128/cu130) dropped them — reinstall the cu126 Torch build into the venv.
  • Path with spaces (Windows). Keep the install path space-free or the build/launch fails.
  • OOM during training. Quantization (4-bit ARA / float8), 512 res, batch 1, (WAN) raise Switch Every, or a bigger RunPod GPU.
  • RunPod UI won't load / asks for a password. Confirm AI_TOOLKIT_AUTH is set and you're on the 8675 proxy URL.

Unverified / verify before relying

  • The param tables (both WAN and Z-Image) are synthesized starting points, not read from the repo's config/examples/*.yml. Open the actual example config in your clone and adjust.
  • Z-Image VRAM floor for training is described qualitatively ("very low VRAM") in the release notes — confirm against your card; quantization + 512–768 res is the lever.
  • Ports: Windows UI port is whatever the launcher binds (the installer doesn't print it — check the launcher window). RunPod 8675/8888 are per the template.
  • The launcher .bat files are downloaded from a third-party HuggingFace repo (Aitrepreneur/FLX); review before running on a security-sensitive machine.
  • Model weights are fetched at job time by AI-Toolkit/HF, not by the installer — confirm the model selector lists your target WAN variant or Z-Image model before a long run.

Related skills