AugmentClaude

Snowpipe BCDR on Azure

Design and operate Snowpipe disaster recovery patterns on Azure ADLS Gen2.

Installation

  1. Make sure Claude is on your device and in your terminal.

    Skills load from ~/.claude/skills/ when Claude Code starts up — so you need it on your machine first. If you don't have it yet, install it once with the command below, then run claude in any terminal to verify.

    One-time setup
    npm i -g @anthropic-ai/claude-code

    Already have it? Skip ahead.

  2. Paste into Claude Code or into your terminal.

    This copies the whole skill folder into ~/.claude/skills/snowflake-snowpipe-bcdr/ — the SKILL.md plus any scripts, reference docs, or templates the skill ships with. Safe default: works for every skill.

    Faster alternative (instruction-only skills)

    Skips the clone and grabs only the SKILL.md file. Don't use this if the skill ships Python scripts, reference markdowns, or asset templates — they won't be downloaded and the skill will fail when it tries to load them.

    Quick install (SKILL.md only)
    Sign up to copy
  3. Restart Claude Code.

    Quit and reopen Claude Code (or any other agent that loads from ~/.claude/skills/). New skills are picked up on startup.

  4. Just ask Claude.

    Skills auto-activate when your request matches the skill's description — no slash command needed. Trigger phrases live in the skill's own frontmatter; you can read them in the “What this skill does” section above.

Prefer to read the source first? Open on GitHub.

When Claude uses it

Use when designing or operating Snowpipe disaster recovery on Azure ADLS Gen2 — choosing between dual-pipe, RA-GRS, active-active, or Failover Group patterns, and running failover/failback/catchup. Triggers: snowpipe BCDR, snowpipe disaster recovery, snowpipe failover, dual pipe, active-active pipe, RA-GRS snowpipe, GZRS snowpipe, snowpipe catchup, pipe failback, snowpipe high availability, failover group snowpipe, snowpipe RPO RTO.

What this skill does

Snowpipe BCDR on Azure

Overview

This skill helps developers and data engineers design, implement, and operate Snowpipe business continuity / disaster recovery on Azure ADLS Gen2. Six patterns are supported — pick one based on your RPO, RTO, edition, and cost tolerance.

#PatternRPORTOCostComplexity
1Manual Catchup (single pipe)Min–hrs30–90 minLowestLow
2Active-Active + DedupZero~1 minHighestHigh
3RA-GRS Read PatternNear-zero5–15 minMediumMedium
4Active-Passive (SF + GRS)Near-zero5–15 minMediumMedium
5Snowflake-Only Failover GroupNear-zero5–15 minLow–MedLow
6Dual Storage, Dual PipesNear-zero5–10 minMediumMedium

Options 2–6 require Business Critical Edition for Failover Groups. On lower editions, only Option 1 applies.

Decision

Business Critical?
├── No → Option 1
└── Yes
    ├── Zero data loss?           → Option 2
    ├── Azure is the failure?     → Option 4 or 6
    ├── Snowflake is the failure? → Option 5
    ├── Have RA-GRS?              → Option 3
    └── Budget constrained?       → Option 1 or 5

This skill is Azure ADLS Gen2 only. AWS S3 / GCS variants are out of scope.

Core Rules

  1. Single writer per file set. Only one pipe active per file set, except Option 2 (which dedups).
  2. Storage integration must allow both URLs:
    STORAGE_ALLOWED_LOCATIONS = (
      'azure://<primary>.blob.core.windows.net/<container>/',
      'azure://<secondary>.blob.core.windows.net/<container>/'
    );
    
  3. COPY_HISTORY is replicated in Failover Groups but only retains 14 days. For longer history, maintain a FILE_LOAD_HISTORY backup table.
  4. ALTER PIPE ... REFRESH is hard-capped at 7 days. For older files, use DIRECTORY(@stage) + COPY INTO.
  5. Inbound notification integrations (DIRECTION = INBOUND) do NOT replicate. After failover-group promotion you must recreate them with the same name in the DR account, re-establish Service Principal trust, and create the DR-region Event Grid subscription.

Workflow

Step 1 — Confirm prerequisites

  • Edition (Business Critical for Options 2–6)
  • Azure storage type (LRS / GRS / RA-GRS / GZRS / RA-GZRS)
  • Event Grid + Storage Queue exists in target regions

Step 2 — Implement chosen option

Each option follows the same shape: storage integration → stage → pipe (with AUTO_INGEST=TRUE and notification integration) → monitoring views → catchup procedure.

Option 2 dedup hash columns:

MD5(METADATA$FILENAME)                          AS FILE_HASH,
MD5(CONCAT_WS('|', $1, $2, $3, $4, $5))         AS RECORD_HASH,
'PRIMARY'                                       AS SOURCE_REGION

Dedup with a Dynamic Table: QUALIFY ROW_NUMBER() OVER (PARTITION BY FILE_HASH, RECORD_HASH ORDER BY LOAD_TIMESTAMP) = 1.

Option 6 dual pipe pattern:

CREATE PIPE PIPE_PRIMARY   AUTO_INGEST=TRUE INTEGRATION='NOTIF_INT_A' AS
  COPY INTO TARGET_TABLE (..., 'PRIMARY' AS _source_region)   FROM @STAGE_PRIMARY;
CREATE PIPE PIPE_SECONDARY AUTO_INGEST=TRUE INTEGRATION='NOTIF_INT_B' AS
  COPY INTO TARGET_TABLE (..., 'SECONDARY' AS _source_region) FROM @STAGE_SECONDARY;
ALTER PIPE PIPE_SECONDARY SET PIPE_EXECUTION_PAUSED = TRUE;

⚠️ STOPPING POINT: Show planned CREATE STORAGE INTEGRATION, CREATE PIPE, and ALTER FAILOVER GROUP statements to the user and wait for approval before executing.

Step 3 — Validate

SELECT PARSE_JSON(SYSTEM$PIPE_STATUS('pipe_name')):executionState::STRING  AS state,
       PARSE_JSON(SYSTEM$PIPE_STATUS('pipe_name')):pendingFileCount::NUMBER AS pending;

Monitor pending files (>1000 for >15 min), error rate from SNOWFLAKE.ACCOUNT_USAGE.COPY_HISTORY (>5%/hr), and load latency (>30 min).

Step 4 — Failover / failback / catchup

ALTER PIPE pipe_primary   SET PIPE_EXECUTION_PAUSED = TRUE;
-- record checkpoint from COPY_HISTORY
ALTER PIPE pipe_secondary SET PIPE_EXECUTION_PAUSED = FALSE;
ALTER PIPE pipe_secondary REFRESH MODIFIED_AFTER = '<checkpoint-15min>';

⚠️ STOPPING POINT: Confirm the failover decision and the checkpoint timestamp with the user before pausing the primary pipe.

Catchup choices: ALTER PIPE … REFRESH (≤7 days), COPY INTO … FROM @stage (any age), or a cursor proc that diffs DIRECTORY(@stage) against COPY_HISTORY for an audit trail.

Common Mistakes

MistakeSymptomFix
Both pipes runningDuplicate loadsPause standby; use a failover proc
Secondary URL missing from integrationACCESS_DENIED on failoverRecreate integration with both URLs
Relying on COPY_HISTORY past 14 daysCatchup finds nothingMaintain FILE_LOAD_HISTORY table
No DR Event Grid subscriptionpendingFileCount = 0 post-failoverCreate matching Event Grid + queue in DR region
PIPE REFRESH for old filesFiles >7 days skipped silentlySwitch to DIRECTORY(@stage) + COPY INTO
Pipe recreated without AUTO_INGEST=TRUEPipe stops auto-loadingInclude AUTO_INGEST=TRUE and INTEGRATION=
Inbound notification integration not recreated post-failoverPipe healthy but pendingFileCount=0Recreate DIRECTION=INBOUND integration in DR account, re-grant Service Principal

Stopping Points

  • Step 2 — wait for approval before running CREATE/ALTER for storage integrations, pipes, and failover groups.
  • Step 4 — confirm the failover decision and checkpoint timestamp before pausing the active pipe.

References

Related skills