A/B Test Analysis
Analyze test results for statistical significance and decide whether to ship or extend.
Installation
- Make sure Claude Code is installed on your machine and available in your terminal.
Skills load from ~/.claude/skills/ when Claude Code starts up, so you need Claude Code on your machine first. If you don't have it yet, install it once with the command below, then run the claude command in any terminal to verify.
One-time setup: npm i -g @anthropic-ai/claude-code
Already have it? Skip ahead.
- Paste into Claude Code or into your terminal.
This copies the whole skill folder into ~/.claude/skills/ab-test-analysis-phuryn/: the SKILL.md plus any scripts, reference docs, or templates the skill ships with. Safe default: works for every skill.
Faster alternative (instruction-only skills)
Skips the clone and grabs only the SKILL.md file. Don't use this if the skill ships Python scripts, reference markdown files, or asset templates; they won't be downloaded, and the skill will fail when it tries to load them.
Quick install (SKILL.md only)
- Restart Claude Code.
Quit and reopen Claude Code (or any other agent that loads from ~/.claude/skills/). New skills are picked up on startup.
- Just ask Claude.
Skills activate automatically when your request matches the skill's description, so no slash command is needed. Trigger phrases live in the skill's own frontmatter; you can read them in the “When Claude uses it” section below.
When Claude uses it
Analyze A/B test results with statistical significance, sample size validation, confidence intervals, and ship/extend/stop recommendations. Use when evaluating experiment results, checking if a test reached significance, interpreting split test data, or deciding whether to ship a variant.
What this skill does
A/B Test Analysis
Evaluate A/B test results with statistical rigor and translate findings into clear product decisions.
Context
You are analyzing A/B test results for $ARGUMENTS.
If the user provides data files (CSV, Excel, or analytics exports), read and analyze them directly. Generate Python scripts for statistical calculations when needed.
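When the export contains raw per-user rows rather than totals, the first step is collapsing it into conversions and users per arm. The sketch below uses only the Python standard library. It assumes a hypothetical CSV with one row per user and columns named variant and converted (0 or 1). The column names, the aggregate_conversions helper, and the sample rows are illustrative and are not part of the skill.

```python
import csv
import io
from collections import defaultdict


def aggregate_conversions(csv_text):
    """Return {variant: (conversions, users)} from per-user CSV rows.

    Assumes hypothetical columns 'variant' and 'converted' (0/1) and
    exactly one row per user.
    """
    users = defaultdict(int)
    conversions = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        users[row["variant"]] += 1
        conversions[row["variant"]] += int(row["converted"])
    return {variant: (conversions[variant], users[variant]) for variant in users}


# Illustrative rows only, not real experiment data
sample = """user_id,variant,converted
1,control,0
2,control,1
3,treatment,1
4,treatment,1
5,control,0
"""
print(aggregate_conversions(sample))  # {'control': (1, 3), 'treatment': (2, 2)}
```

If the export logs one row per event rather than per user, deduplicate users before counting. Otherwise the denominators in every later calculation are inflated.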
Instructions
- Understand the experiment:
  - What was the hypothesis?
  - What was changed (the variant)?
  - What is the primary metric? Any guardrail metrics?
  - How long did the test run?
  - What is the traffic split?
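- Validate the test setup:
  - Sample size: Is the sample large enough to detect the expected effect size?
    - Required sample size per variant: $n = \dfrac{(z_{1-\alpha/2} + z_{1-\beta})^2 \cdot 2p(1-p)}{\mathrm{MDE}^2}$, where $p$ is the baseline conversion rate, MDE is the minimum detectable effect as an absolute difference in rates, $z_{1-\alpha/2} \approx 1.96$ for a two-sided test at $\alpha = 0.05$, and $z_{1-\beta} \approx 0.84$ for 80% power
    - Flag the test as underpowered if its power is below 80%
  - Duration: Did the test run for at least 1-2 full business cycles?
  - Randomization: Is there any evidence of sample ratio mismatch (SRM)?
  - Novelty/primacy effects: Was there enough time for initial behavior changes to wash out?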
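The sample-size formula, a power check, and an SRM check can all be computed with the standard library's statistics.NormalDist. This sketch rests on three assumptions: a two-arm test on a conversion rate, an MDE given as an absolute difference, and a chi-squared goodness-of-fit test for SRM. For that SRM test, the 1-degree-of-freedom p-value equals the two-sided normal tail of the square root of the statistic. The function names are hypothetical helpers.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def sample_size_per_variant(baseline_rate, mde_abs, alpha=0.05, power=0.80):
    """Users needed in EACH arm to detect an absolute change of mde_abs."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = _STD_NORMAL.inv_cdf(power)           # about 0.84 for 80% power
    p = baseline_rate
    n = (z_alpha + z_beta) ** 2 * 2 * p * (1 - p) / mde_abs ** 2
    return math.ceil(n)


def achieved_power(baseline_rate, mde_abs, n_per_variant, alpha=0.05):
    """Approximate power of the two-sided test with n_per_variant users per arm
    (ignores the negligible chance of rejecting in the opposite tail)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    p = baseline_rate
    se = math.sqrt(2 * p * (1 - p) / n_per_variant)
    return _STD_NORMAL.cdf(abs(mde_abs) / se - z_alpha)


def srm_p_value(n_control, n_variant, expected_control_share=0.5):
    """Chi-squared goodness-of-fit p-value for a two-arm traffic split."""
    total = n_control + n_variant
    expected_c = total * expected_control_share
    expected_v = total - expected_c
    chi2 = ((n_control - expected_c) ** 2 / expected_c
            + (n_variant - expected_v) ** 2 / expected_v)
    # With 1 degree of freedom, chi2 is the square of a standard normal
    return 2 * (1 - _STD_NORMAL.cdf(math.sqrt(chi2)))


n = sample_size_per_variant(baseline_rate=0.10, mde_abs=0.01)
print(n)                                         # 14128
print(round(achieved_power(0.10, 0.01, n), 2))   # 0.8
print(srm_p_value(10_000, 10_400))               # about 0.005
```

Because the SRM check runs on every test, many teams flag SRM only at a strict threshold such as p < 0.001. Treat that cut-off as a common convention, not a rule stated by this skill. A 10,000 vs 10,400 split under an intended 50/50 allocation is still worth investigating before trusting any lift.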
- Calculate statistical significance:
  - Conversion rate for control and variant
  - Relative lift: (variant - control) / control × 100
  - p-value: two-tailed two-proportion z-test or chi-squared test
  - Confidence interval: 95% CI for the difference in conversion rates
  - Statistical significance: Is p < 0.05?
  - Practical significance: Is the lift large enough to matter for the business?
  If the user provides raw data, generate and run a Python script to calculate these.
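The calculations in this step fit in one function. The sketch below assumes summary counts (conversions and users per arm). The p-value comes from a two-sided two-proportion z-test with a pooled standard error. That gives the same p-value as a chi-squared test on the 2×2 table without continuity correction. The 95% CI is a Wald interval with an unpooled standard error. The function name and the example counts are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def two_proportion_test(conv_c, n_c, conv_v, n_v, alpha=0.05):
    """Compare conversion rates of control (c) and variant (v)."""
    p_c = conv_c / n_c
    p_v = conv_v / n_v
    diff = p_v - p_c

    # Test statistic: pooled rate under H0 (no difference between arms)
    p_pool = (conv_c + conv_v) / (n_c + n_v)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_v))
    z = diff / se_pooled
    p_value = 2 * (1 - _STD_NORMAL.cdf(abs(z)))  # two-tailed

    # Confidence interval: unpooled (Wald) standard error of the difference
    se_diff = math.sqrt(p_c * (1 - p_c) / n_c + p_v * (1 - p_v) / n_v)
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se_diff, diff + z_crit * se_diff)

    return {
        "control_rate": p_c,
        "variant_rate": p_v,
        "relative_lift_pct": diff / p_c * 100,
        "p_value": p_value,
        "ci_diff": ci,
        "significant": p_value < alpha,
    }


# Hypothetical counts: 500/10,000 control vs 580/10,000 variant
result = two_proportion_test(conv_c=500, n_c=10_000, conv_v=580, n_v=10_000)
print(f"lift {result['relative_lift_pct']:+.1f}%, p = {result['p_value']:.4f}, "
      f"95% CI for diff = ({result['ci_diff'][0]:.4f}, {result['ci_diff'][1]:.4f})")
```

In this example, a +0.8 percentage point difference (a 16% relative lift) gives p of roughly 0.012, and the CI excludes zero. Whether +0.8 points is worth shipping is the separate practical-significance question. For very small samples or rates near 0 or 1, the Wald interval can be unreliable. A score-based interval such as Newcombe's is a common alternative in that case.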
- Check guardrail metrics:
  - Did any guardrail metrics (revenue, engagement, page load time) degrade?
  - A winning primary metric with degraded guardrails may not be a true win
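Guardrails such as revenue per user or page load time are often averages rather than rates. This sketch assumes summary statistics per arm and samples large enough for a normal approximation to the difference in means, using a Welch-style unpooled standard error. It flags a guardrail only when the change is statistically significant and moves in the harmful direction. The guardrail_check helper and the page-load numbers are hypothetical.

```python
import math
from statistics import NormalDist


def guardrail_check(mean_c, sd_c, n_c, mean_v, sd_v, n_v,
                    higher_is_better, alpha=0.05):
    """Large-sample z-test on a difference in means for one guardrail."""
    diff = mean_v - mean_c
    se = math.sqrt(sd_c ** 2 / n_c + sd_v ** 2 / n_v)  # unpooled (Welch-style)
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    moved_wrong_way = diff < 0 if higher_is_better else diff > 0
    return {"diff": diff, "p_value": p_value,
            "degraded": moved_wrong_way and p_value < alpha}


# Hypothetical page-load-time guardrail in ms, where lower is better
check = guardrail_check(mean_c=1200, sd_c=400, n_c=10_000,
                        mean_v=1225, sd_v=420, n_v=10_000,
                        higher_is_better=False)
print(check["diff"], check["degraded"])  # 25 True
```

A non-significant guardrail result does not prove the guardrail is safe; it only shows that this test could not detect harm. If a specific tolerance matters, a non-inferiority test against that margin is the stricter check. Heavy-tailed metrics like revenue may need larger samples before the normal approximation is trustworthy.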
- Interpret results:

  | Outcome | Recommendation |
  |---|---|
  | Significant positive lift, no guardrail issues | Ship it: roll out to 100% |
  | Significant positive lift, guardrail concerns | Investigate: understand trade-offs before shipping |
  | Not significant, positive trend | Extend the test: more data or a larger effect is needed |
  | Not significant, flat | Stop the test: no meaningful difference detected |
  | Significant negative lift | Don't ship: revert to control and analyze why |
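Writing the outcome-to-recommendation mapping above as a small decision function keeps the call consistent across tests. The sketch makes two simplifying assumptions. First, "positive trend" is read as a positive point estimate of the lift. Second, a non-significant negative estimate is treated like "flat", since the mapping does not cover that case explicitly. The recommend function and its labels are hypothetical. Practical significance and guardrail trade-offs still need human judgment.

```python
def recommend(significant: bool, lift: float, guardrail_degraded: bool) -> str:
    """Map an A/B outcome to Ship / Investigate / Extend / Stop / Don't ship.

    lift is the variant-minus-control estimate; only its sign is used here.
    """
    if significant and lift < 0:
        return "Don't ship"      # significant negative lift: revert to control
    if significant and lift > 0:
        # significant win: ship unless a guardrail degraded
        return "Investigate" if guardrail_degraded else "Ship"
    if lift > 0:
        return "Extend"          # not significant, positive trend
    return "Stop"                # not significant, flat (or negative)


print(recommend(significant=True, lift=16.0, guardrail_degraded=False))   # Ship
print(recommend(significant=False, lift=2.5, guardrail_degraded=False))   # Extend
```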
- Provide the analysis summary:

  ## A/B Test Results: [Test Name]

  **Hypothesis**: [What we expected]
  **Duration**: [X days] | **Sample**: [N control / M variant]

  | Metric | Control | Variant | Lift | p-value | Significant? |
  |---|---|---|---|---|---|
  | [Primary] | X% | Y% | +Z% | 0.0X | Yes/No |
  | [Guardrail] | ... | ... | ... | ... | ... |

  **Recommendation**: [Ship / Extend / Stop / Investigate]
  **Reasoning**: [Why]
  **Next steps**: [What to do]
Think step by step. Save as markdown. Generate Python scripts for calculations if raw data is provided.
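To save the summary as markdown, the analysis summary template can be filled from the computed results. This is a minimal sketch. The render_summary function, the row tuple layout, the output filename ab_test_results.md, and the example values are all assumptions for illustration.

```python
from pathlib import Path


def render_summary(test_name, hypothesis, days, n_control, n_variant,
                   rows, recommendation, reasoning, next_steps):
    """Fill the summary template.

    Each row is (metric, control_rate, variant_rate, lift_pct, p_value, significant),
    with rates as fractions (0.05 means 5%) and lift already in percent.
    """
    lines = [
        f"## A/B Test Results: {test_name}",
        "",
        f"**Hypothesis**: {hypothesis}",
        f"**Duration**: {days} days | **Sample**: {n_control:,} control / {n_variant:,} variant",
        "",
        "| Metric | Control | Variant | Lift | p-value | Significant? |",
        "|---|---|---|---|---|---|",
    ]
    for metric, control, variant, lift, p_value, significant in rows:
        lines.append(
            f"| {metric} | {control:.2%} | {variant:.2%} | {lift:+.1f}% "
            f"| {p_value:.3f} | {'Yes' if significant else 'No'} |"
        )
    lines += [
        "",
        f"**Recommendation**: {recommendation}",
        f"**Reasoning**: {reasoning}",
        f"**Next steps**: {next_steps}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical values for illustration only
markdown = render_summary(
    test_name="Checkout button copy",
    hypothesis="Clearer copy increases checkout conversion",
    days=14, n_control=10_000, n_variant=10_000,
    rows=[("Checkout conversion", 0.05, 0.058, 16.0, 0.0123, True)],
    recommendation="Ship",
    reasoning="Significant lift with no guardrail regressions",
    next_steps="Roll out to 100% and monitor",
)
Path("ab_test_results.md").write_text(markdown, encoding="utf-8")
```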