Bio Alignment Amplicon Clipping

Remove PCR primer sequences from amplicon sequencing reads using samtools.

Installation

  1. Make sure Claude is on your device and in your terminal.

    Skills load from ~/.claude/skills/ when Claude Code starts up — so you need it on your machine first. If you don't have it yet, install it once with the command below, then run claude in any terminal to verify.

    One-time setup
    npm i -g @anthropic-ai/claude-code

    Already have it? Skip ahead.

  2. Paste into Claude Code or into your terminal.

    This copies the whole skill folder into ~/.claude/skills/bio-alignment-amplicon-clipping-gptomics/ — the SKILL.md plus any scripts, reference docs, or templates the skill ships with. Safe default: works for every skill.

    Faster alternative (instruction-only skills)

    Skips the clone and grabs only the SKILL.md file. Don't use this if the skill ships Python scripts, reference markdowns, or asset templates — they won't be downloaded and the skill will fail when it tries to load them.

  3. Restart Claude Code.

    Quit and reopen Claude Code (or any other agent that loads from ~/.claude/skills/). New skills are picked up on startup.

  4. Just ask Claude.

    Skills auto-activate when your request matches the skill's description, so no slash command is needed. Trigger phrases live in the skill's own frontmatter; you can read them in the “What this skill does” section below.

Prefer to read the source first? Open on GitHub.

When Claude uses it

Trim PCR primers from aligned reads in amplicon-panel BAMs using samtools ampliconclip. Use when processing SARS-CoV-2 ARTIC, hereditary cancer panels, ctDNA hot-spot panels, or any amplicon assay where primer-derived bases would falsely confirm reference at primer footprints.

What this skill does

Version Compatibility

Reference examples tested with: samtools 1.19+, pysam 0.22+

Before using code patterns, verify installed versions match. If versions differ:

  • Python: pip show <package> then help(module.function) to check signatures
  • CLI: <tool> --version then <tool> --help to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
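
The CLI check above can be scripted. The following is a minimal sketch, assuming the first line of `samtools --version` has the form `samtools 1.19.2`; the function names and the `MIN_SAMTOOLS` constant are illustrative and not part of the skill.

```python
import re
import shutil
import subprocess

# Minimum version the reference examples were tested with (from this document).
MIN_SAMTOOLS = (1, 19)


def parse_samtools_version(version_output):
    """Extract (major, minor) from the text printed by `samtools --version`."""
    match = re.search(r"samtools\s+(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("could not find a samtools version string")
    return int(match.group(1)), int(match.group(2))


def samtools_is_recent_enough(version_output, minimum=MIN_SAMTOOLS):
    """Return True when the reported version is at least `minimum`."""
    return parse_samtools_version(version_output) >= minimum


def installed_samtools_version_output():
    """Run `samtools --version`; return None when samtools is not on PATH."""
    if shutil.which("samtools") is None:
        return None
    result = subprocess.run(
        ["samtools", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Versions are compared as integer tuples so that 1.9 sorts below 1.19, which a plain string comparison would get wrong.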

Alignment Amplicon Clipping

"Trim primer-derived bases from amplicon BAM" -> Soft- or hard-clip the 5' primer footprint after alignment using a primer BED, then repair mate fields with samtools fixmate and MD/NM tags with samtools calmd.

  • CLI: samtools ampliconclip -b primers.bed input.bam -o clipped.bam (since samtools 1.11)
  • Alternative: iVar trim, BAMClipper, fgbio ClipBam

Why Primer Trimming After Alignment

Amplicon panels (SARS-CoV-2 ARTIC, hereditary cancer panels, ctDNA hot-spot panels, fusion panels, 16S rRNA) use designed PCR primers for enrichment. Primer-derived bases at read 5' ends do NOT reflect biological sequence -- they reflect the primer template. Without trimming:

  • False reference confirmation at primer footprint positions.
  • Variant allele frequency suppressed at variants under primers (the primer sequence cannot capture the variant base).
  • Strand bias artifacts (each primer is strand-specific, so its bases appear only on reads of one orientation).

Standard amplicon BAMs should NEVER be processed by samtools markdup -- by design every read at a primer location is a "duplicate" by coordinate. See duplicate-handling for the assay-aware decision.

Tool Selection

| Tool | When | Notes |
| --- | --- | --- |
| samtools ampliconclip | Default for amplicon panels (since 1.11) | Soft- or hard-clip from BED; modifies CIGAR; invalidates MD/NM |
| iVar trim | SARS-CoV-2 ARTIC pipelines | Coordinates by primer name/position; applies hard or soft clip |
| BAMClipper | Capture / hybrid panels with primer overlap | 5'-end clipping with overlap handling |
| fgbio ClipBam | When read-pair coordination matters | Soft/hard-clip with mate-aware end adjustment |
| cutadapt (pre-alignment) | Legacy / when alignment is downstream | Trims at FASTQ stage; less precise for amplicon |

Soft-Clip vs Hard-Clip

Goal: Decide whether trimmed bases are kept in the BAM (reversible) or discarded (irreversible).

Approach: Soft-clip is the safe default; hard-clip only when archiving and disk is constrained.

| Mode | Flag | What it does | Reversible? |
| --- | --- | --- | --- |
| Soft-clip | (default) / --soft-clip | Bases kept in SEQ; CIGAR uses S; bases not aligned | Yes (CIGAR can be re-extended) |
| Hard-clip | --hard-clip | Bases removed from SEQ; CIGAR uses H | No (bases lost) |

Soft-clip is the recommended default. Hard-clip is irreversible -- once applied, the trimmed bases cannot be recovered for re-analysis with different primer coordinates.
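
To make the difference concrete, here is a small sketch that computes how many bases the SEQ field holds for a given CIGAR string. It relies on the SAM rule that M, I, S, = and X consume query bases while H does not. The 150 bp read with a 25 bp primer is a made-up example, and the helper names are illustrative.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations whose bases are stored in SEQ. Soft clips (S) are; hard clips (H) are not.
SEQ_CONSUMING = set("MIS=X")


def stored_seq_length(cigar):
    """Number of bases the SEQ field holds for a given CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in SEQ_CONSUMING)


def clipped_bases(cigar):
    """Return (soft_clipped, hard_clipped) base counts for a CIGAR string."""
    ops = CIGAR_OP.findall(cigar)
    soft = sum(int(n) for n, op in ops if op == "S")
    hard = sum(int(n) for n, op in ops if op == "H")
    return soft, hard


# Hypothetical 150 bp read with a 25 bp primer at its 5' end.
print(stored_seq_length("25S125M"))  # 150: primer bases are still in SEQ
print(stored_seq_length("25H125M"))  # 125: primer bases are gone
```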

Basic ampliconclip Workflow

Goal: Trim primers from a coordinate-sorted, indexed amplicon BAM and produce a downstream-ready BAM.

Approach: Run ampliconclip with primer BED, optionally --both-ends and --strand, then re-fixmate (CIGAR changed) and re-calmd (MD/NM tags invalidated by clip).

# 1. Soft-clip primers (default; reversible)
samtools ampliconclip --both-ends --strand --soft-clip \
    -b primers.bed input.bam -o clipped.bam

# 2. Re-pair tags (CIGARs changed -- mate info needs refresh)
samtools sort -n clipped.bam | \
    samtools fixmate -m - - | \
    samtools sort -o sorted.bam -

# 3. Repair MD/NM tags (invalidated by clip; required by mpileup BAQ and IGV)
samtools calmd -b sorted.bam reference.fa > clipped_final.bam
samtools index clipped_final.bam
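
When the workflow has to run from Python rather than a shell, the same steps can be expressed as argument lists. This is a sketch only: it swaps the shell pipes for intermediate files, and the function names and intermediate file names are assumptions made for illustration.

```python
import subprocess


def build_steps(in_bam, primer_bed, reference_fa, out_bam, hard_clip=False):
    """Return (argv, stdout_path) pairs for the clip, fixmate and calmd workflow."""
    stem = out_bam[:-4] if out_bam.endswith(".bam") else out_bam
    clipped, name_sorted, fixed, coord_sorted = (
        f"{stem}.{tag}.bam" for tag in ("clipped", "namesorted", "fixmate", "sorted")
    )
    mode = "--hard-clip" if hard_clip else "--soft-clip"
    return [
        # 1. Clip primers listed in the BED.
        (["samtools", "ampliconclip", "--both-ends", "--strand", mode,
          "-b", primer_bed, "-o", clipped, in_bam], None),
        # 2. fixmate needs name-sorted input; re-sort by coordinate afterwards.
        (["samtools", "sort", "-n", "-o", name_sorted, clipped], None),
        (["samtools", "fixmate", "-m", name_sorted, fixed], None),
        (["samtools", "sort", "-o", coord_sorted, fixed], None),
        # 3. calmd writes the repaired BAM to stdout, so capture it in out_bam.
        (["samtools", "calmd", "-b", coord_sorted, reference_fa], out_bam),
        (["samtools", "index", out_bam], None),
    ]


def run_steps(steps):
    """Execute each step in order, stopping at the first failure."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "wb") as handle:
                subprocess.run(argv, check=True, stdout=handle)
```

Separating `build_steps` from `run_steps` lets the command list be inspected or logged before anything is executed.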

Strand-Aware Clipping

--strand clips primer bases only on the strand the primer is designed for. Without --strand, both strands are clipped at the primer site, removing valid biological sequence on the off-strand.

Both-End Clipping

--both-ends clips at both the 5' and 3' ends of a read wherever the end overlaps a primer region. This is necessary for amplicon designs where a read can span the entire amplicon and so reach the opposite primer at its 3' end.

Primer BED Format

# tab-separated BED6, 0-based half-open like all BED; column 5 (score) is a placeholder here
chr1   100   125   primer_1_F    0   +
chr1   500   525   primer_1_R    0   -
chr1   600   625   primer_2_F    0   +
chr1   1000  1025  primer_2_R    0   -

Tools that consume the BED read columns 1-3 (region); column 6 (strand) is required for --strand, so a five-column file with the strand in column 5 is not sufficient. ARTIC primer schemes ship pre-built BEDs (e.g., primer.bed from artic-network/primer-schemes).
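
A quick structural check of the primer BED before clipping can catch a missing strand column early. The sketch below assumes a BED6 layout (chrom, start, end, name, score, strand); `parse_primer_bed` is a hypothetical helper, not something samtools provides.

```python
def parse_primer_bed(text):
    """Parse BED6 primer lines into dicts; raise ValueError on malformed rows."""
    primers = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        # Skip blank lines, comments and browser/track header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t") if "\t" in line else line.split()
        if len(fields) < 6:
            raise ValueError(
                f"line {line_number}: expected 6 columns, found {len(fields)}"
            )
        chrom, start, end, name, _score, strand = fields[:6]
        start, end = int(start), int(end)
        if start < 0 or end <= start:
            raise ValueError(f"line {line_number}: invalid interval {start}-{end}")
        if strand not in ("+", "-"):
            raise ValueError(f"line {line_number}: strand must be + or -, got {strand!r}")
        primers.append(
            {"chrom": chrom, "start": start, "end": end, "name": name, "strand": strand}
        )
    return primers
```

Because BED is 0-based and half-open, `end - start` gives the primer length directly.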

SARS-CoV-2 ARTIC Comparison

| Tool | Approach | When |
| --- | --- | --- |
| samtools ampliconclip | Soft-clip from BED, post-alignment | nf-core/viralrecon, modern ARTIC workflows |
| iVar trim | Soft- or hard-clip with primer-position parsing | Original ARTIC field bioinformatics |

Modern viral consensus pipelines tend to use ampliconclip then samtools consensus --config illumina --ambig for IUPAC heterozygote handling. See reference-operations for consensus generation.

After Clipping: Required Re-Processing

Clipping invalidates several tags and CIGAR-derived fields:

| Field | Impact | Repair |
| --- | --- | --- |
| CIGAR | New S or H operations added | Automatic from ampliconclip |
| MD:Z | Mismatch positions now wrong | samtools calmd -b in.bam ref.fa |
| NM:i | Edit distance stale until recomputed | samtools calmd |
| TLEN | Template length changes when both mates clipped | samtools fixmate -m |
| ms, MC:Z | Mate score (lowercase per SAMtags) / mate CIGAR | samtools fixmate -m |

A clipped BAM that bypasses fixmate + calmd causes silent failures in bcftools mpileup BAQ (which depends on MD), IGV mismatch coloring, and any tool using NM for filtering.
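
One way to spot a record whose MD tag was left stale by clipping is to compare the reference span implied by MD with the span implied by CIGAR. The sketch below works on SAM text lines (for example the output of `samtools view`) and uses the SAM convention that MD describes the bases covered by M, =, X and D operations. The function names are illustrative, and a matching span does not prove the tag is correct, only that it is not obviously out of date.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# MD tokens: a match run, a deletion (^ACGT), or a single mismatched reference base.
MD_TOKEN = re.compile(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])")


def cigar_reference_span(cigar):
    """Reference bases that MD must describe: M, =, X and D operations."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "M=XD")


def md_reference_span(md):
    """Reference bases described by an MD:Z value."""
    span = 0
    for matches, deletion, mismatch in MD_TOKEN.findall(md):
        if matches:
            span += int(matches)
        elif deletion:
            span += len(deletion)
        else:
            span += 1
    return span


def md_needs_repair(sam_line):
    """True when a mapped record lacks MD or its MD span disagrees with CIGAR."""
    fields = sam_line.rstrip("\n").split("\t")
    if int(fields[1]) & 0x4 or fields[5] == "*":
        return False  # unmapped records carry no MD to check
    md = next((f[5:] for f in fields[11:] if f.startswith("MD:Z:")), None)
    if md is None:
        return True
    return md_reference_span(md) != cigar_reference_span(fields[5])
```

A read soft-clipped to `25S125M` that still carries `MD:Z:150` is flagged, because the tag describes 150 reference bases while the CIGAR now aligns only 125.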

Why Not Markdup

Amplicon reads at primer locations are by design coordinate-degenerate -- every read mapped to the same amplicon shares the same start/end coordinates because they all come from the same primer pair. samtools markdup would mark essentially every read as a duplicate, and downstream tools that skip duplicate-flagged reads would then see almost none of the data. For amplicon panels:

  • WITHOUT UMIs: skip dedup entirely; rely on coverage uniformity from amplicon design.
  • WITH UMIs (deep panels): use fgbio GroupReadsByUmi -> CallMolecularConsensusReads instead of markdup. See duplicate-handling.
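
The scale of the problem is easy to see with a toy calculation. The sketch below is a deliberate simplification of coordinate-based duplicate marking, not a reimplementation of samtools markdup: it keeps one fragment per unique (chrom, start, end) and counts the rest as duplicates. The fragment counts and coordinates are invented for illustration.

```python
from collections import Counter


def coordinate_duplicate_fraction(fragments):
    """Fraction of fragments a coordinate-only deduplicator would flag."""
    if not fragments:
        return 0.0
    counts = Counter(fragments)
    # One fragment per unique coordinate set survives; the rest are flagged.
    flagged = sum(n - 1 for n in counts.values())
    return flagged / len(fragments)


# Hypothetical panel: 1000 fragments drawn from only two amplicons.
fragments = [("chr1", 100, 525)] * 600 + [("chr1", 600, 1025)] * 400
print(coordinate_duplicate_fraction(fragments))  # 0.998
```

With two amplicons and 1000 fragments, all but two fragments are flagged, which matches the "~100% duplicates" symptom listed under Common Errors.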

Common Errors

| Error | Cause | Solution |
| --- | --- | --- |
| MD tag mismatch after clipping | calmd not run | Run samtools calmd -b clipped.bam ref.fa |
| Variant calls with strand bias at every amplicon end | Forgot --strand | Re-run with strand-aware clipping |
| Markdup output shows ~100% duplicates | Amplicon BAM was processed with markdup | Restart from raw alignment; use ampliconclip; skip markdup |
| Unexpected reference confirmation at primer-overlapping variants | ampliconclip not run | Run before variant calling |

Quick Reference

  • Soft-clip primers: samtools ampliconclip --both-ends --strand -b primers.bed in.bam -o clipped.bam
  • Hard-clip (irreversible): samtools ampliconclip --both-ends --strand --hard-clip -b primers.bed in.bam -o clipped.bam
  • Repair MD/NM after clip: samtools calmd -b clipped.bam ref.fa > final.bam
  • Repair mate info: samtools sort -n clipped.bam | samtools fixmate -m - - | samtools sort -o out.bam -

Related Skills

  • duplicate-handling - Why amplicon BAMs should not be markdup'd; UMI-aware alternatives
  • alignment-filtering - Post-clip filtering for amplicon variant calling
  • alignment-sorting - Re-sort after fixmate
  • pileup-generation - mpileup flags for amplicon (-aa -A -d 600000 -B)
  • reference-operations - Consensus generation from amplicon BAMs (samtools consensus)
  • read-qc/quality-reports - Pre-alignment adapter/quality trimming
