AugmentClaude

Sidemantic Modeler

Build and manage semantic layers that map databases to business metrics and dimensions.

Installation

  1. Make sure Claude is on your device and in your terminal.

    Skills load from ~/.claude/skills/ when Claude Code starts up — so you need it on your machine first. If you don't have it yet, install it once with the command below, then run claude in any terminal to verify.

    One-time setup
    npm i -g @anthropic-ai/claude-code

    Already have it? Skip ahead.

  2. Paste into Claude Code or into your terminal.

    This copies the whole skill folder into ~/.claude/skills/sidemantic-modeler-sidequery/ — the SKILL.md plus any scripts, reference docs, or templates the skill ships with. Safe default: works for every skill.

    Faster alternative (instruction-only skills)

    Skips the clone and grabs only the SKILL.md file. Don't use this if the skill ships Python scripts, reference markdowns, or asset templates — they won't be downloaded and the skill will fail when it tries to load them.

    Quick install (SKILL.md only)
    Sign up to copy
  3. Restart Claude Code.

    Quit and reopen Claude Code (or any other agent that loads from ~/.claude/skills/). New skills are picked up on startup.

  4. Just ask Claude.

    Skills auto-activate when your request matches the skill's description — no slash command needed. Trigger phrases live in the skill's own frontmatter; you can read them in the “What this skill does” section above.

Prefer to read the source first? Open on GitHub.

When Claude uses it

Build, validate, and manage semantic models using Sidemantic. Use when asked to create a semantic layer, define metrics/dimensions, model a database schema, generate models from SQL queries, import from Cube/dbt/LookML, or set up analytics definitions. Prioritizes CLI-first workflows, with YAML and optional Python API usage for advanced automation.

What this skill does

Sidemantic Modeler

Build semantic layers that map physical database tables to business-friendly dimensions and metrics. Sidemantic generates SQL from these definitions, handling joins, aggregations, granularity, and dialect differences automatically.

Quick Start

2-Minute First Success (CLI-first onboarding path)

uv add sidemantic duckdb

mkdir -p models

cat > models/orders.yml <<'YAML'
models:
  - name: orders
    table: orders
    primary_key: order_id
    dimensions:
      - name: status
        type: categorical
    metrics:
      - name: revenue
        agg: sum
        sql: order_amount
      - name: order_count
        agg: count
YAML

uv run sidemantic validate models/ --verbose
uv run sidemantic info models/
uv run sidemantic query models/ -c duckdb:///data.duckdb \
  "SELECT revenue, status FROM orders ORDER BY revenue DESC LIMIT 5"

Assumes an orders table already exists in data.duckdb with status and order_amount columns.

YAML (preferred for file-based models)

models:
  - name: orders
    table: orders
    primary_key: order_id
    dimensions:
      - name: status
        type: categorical
      - name: order_date
        type: time
        sql: created_at
        granularity: day
    metrics:
      - name: revenue
        agg: sum
        sql: order_amount
      - name: order_count
        agg: count

Load and query:

from sidemantic import SemanticLayer

layer = SemanticLayer.from_yaml("models.yml", connection="duckdb:///data.duckdb")
result = layer.sql("SELECT revenue, status FROM orders")

Python API (advanced, optional)

from sidemantic import Model, Dimension, Metric, SemanticLayer

layer = SemanticLayer(connection="duckdb:///data.duckdb")
Model(
    name="orders",
    table="orders",
    primary_key="order_id",
    dimensions=[
        Dimension(name="status", type="categorical"),
        Dimension(name="order_date", type="time", sql="created_at", granularity="day"),
    ],
    metrics=[
        Metric(name="revenue", agg="sum", sql="order_amount"),
        Metric(name="order_count", agg="count"),
    ],
)
result = layer.sql("SELECT revenue, status FROM orders")

Generate Models from SQL Queries

The fastest path when existing queries are available. The Migrator reverse-engineers semantic models by analyzing SQL: it extracts tables, columns, aggregations, joins, time dimensions, derived metrics, and window functions automatically.

CLI (bootstrap from a folder of .sql files)

# Generate model YAML + rewritten queries from raw SQL
sidemantic migrator --queries queries/ --generate-models output/

# Check coverage: how well do existing models handle these queries?
sidemantic migrator models/ --queries queries/ --verbose

Python API (advanced/automation only)

from sidemantic import SemanticLayer
from sidemantic.core.migrator import Migrator

# Connect to your database (optional but improves inference via information_schema)
layer = SemanticLayer(connection="duckdb:///data.duckdb", auto_register=False)
migrator = Migrator(layer, connection=layer.conn)

# Feed it SQL queries (strings, not files)
queries = [
    "SELECT status, SUM(amount) AS revenue, COUNT(*) AS orders FROM orders GROUP BY status",
    "SELECT DATE_TRUNC('month', created_at), SUM(amount) FROM orders GROUP BY 1",
    "SELECT c.region, SUM(o.amount) / COUNT(DISTINCT c.id) AS rev_per_customer "
    "FROM orders o JOIN customers c ON o.customer_id = c.id GROUP BY 1",
]

report = migrator.analyze_queries(queries)
models = migrator.generate_models(report)          # YAML-ready model dicts
graph_metrics = migrator.generate_graph_metrics(report, models)  # cross-model metrics
rewritten = migrator.generate_rewritten_queries(report)          # semantic SQL

# Write to disk
migrator.write_model_files(models, "output/models/")
migrator.write_rewritten_queries(rewritten, "output/rewritten_queries/")

# Print coverage report
migrator.print_report(report, verbose=True)

What the Migrator auto-detects

Pattern in SQLWhat it generates
SUM(amount) / COUNT(*) / AVG(price)Metric with matching agg
COUNT(DISTINCT user_id)Metric with agg: count_distinct
SUM(amount) AS revenueMetric named revenue (preserves aliases)
GROUP BY statusDimension type: categorical
DATE_TRUNC('month', created_at)Dimension type: time, granularity extracted from SQL (here: month)
JOIN customers ON o.customer_id = c.idRelationship many_to_one, foreign_key: customer_id
SUM(a) / NULLIF(COUNT(b), 0)Derived metric with formula
SUM(x) OVER (ORDER BY date ROWS ...)Cumulative metric with window
SUM(x) OVER (PARTITION BY DATE_TRUNC(...))Cumulative metric with grain_to_date
Cross-model expressionsGraph-level derived metrics

Workflow: queries first, then refine

  1. Collect existing SQL queries (dashboards, reports, ad-hoc analyses)
  2. Run migrator.analyze_queries(queries) to generate a first pass
  3. Review generated models: rename metrics, add descriptions, fix types
  4. Run coverage check to verify queries can be rewritten through the semantic layer
  5. Iterate until coverage is high

For the full Migrator API (all methods, outputs, edge cases), load references/generation.md.

Core Workflow

Follow these steps when building a semantic model from a database schema.

Step 1: Analyze the database schema

Inspect tables, columns, data types, and foreign key relationships. Identify which tables hold transactional/event data (fact tables) and which hold descriptive attributes (dimension tables).

Step 2: Create Model definitions

For each table, create a Model with:

  • name: a short, snake_case identifier
  • table: schema-qualified table name (e.g., public.orders)
  • primary_key: the table's primary key column (default: id)

Use sql instead of table for derived/virtual tables built from a SQL expression.

Step 3: Define Dimensions

Add dimensions for columns used in GROUP BY or WHERE clauses. Choose the correct type:

TypeWhen to useExample
categoricalStrings, enums, IDs for groupingstatus, region
timeDates/timestamps (enables granularity)created_at, order_date
booleanComputed true/false from SQL expressionsql: "amount > 100"
numericNumbers used for grouping, not aggregationquantity_bucket

Time dimensions require granularity (one of: second, minute, hour, day, week, month, quarter, year). Queries use double-underscore syntax: orders.order_date__month.

Use sql when the dimension maps to a different column name or a computed expression. If omitted, defaults to a column matching name.

Set parent on dimensions to create drill-down hierarchies (e.g., country > state > city).

Step 4: Define Metrics

Add metrics for columns that should be aggregated.

Simple aggregations (model-level):

aggSQL generatedNotes
sumSUM(col)Revenue, quantities
countCOUNT(*)Row counts (no sql needed)
count_distinctCOUNT(DISTINCT col)Unique values
avgAVG(col)Averages
min / maxMIN(col) / MAX(col)Extremes
medianMEDIAN(col)Median

Model-level simple metrics currently validate against: sum, count, count_distinct, avg, min, max, median.

Use filters on a metric to create filtered aggregations (e.g., filters: ["status = 'completed'"]). These become CASE WHEN expressions, not WHERE clauses.

Complex metrics (usually graph-level, in top-level metrics: section):

typePurposeRequired fields
ratioDivision of two measuresnumerator, denominator
derivedArbitrary SQL formulasql (references other metrics)
cumulativeRolling/running totalssql, optional window or grain_to_date
time_comparisonPeriod-over-periodbase_metric, comparison_type (yoy/mom/wow/dod/qoq)
conversionFunnel analysisentity, base_event, conversion_event

Graph-level metrics sit in the top-level metrics: section (outside models:). They reference model-level measures using model.metric syntax.

Step 5: Define Relationships

Connect models with relationships so Sidemantic can auto-generate JOINs.

TypeDirectionExample
many_to_oneThis model has FK to otherorders -> customers
one_to_oneUnique FKuser -> user_profile
one_to_manyOther model has FK to thiscustomer -> orders
many_to_manyThrough junction tablestudents <-> courses

Declare relationships on the model that owns the foreign key. For many_to_one, foreign_key defaults to {related_model}_id.

For many_to_many, specify through (junction model), through_foreign_key, and related_foreign_key.

Step 6: Validate and inspect

# Validate definitions (checks for errors and warnings)
sidemantic validate models/ --verbose

# Quick summary of what's defined
sidemantic info models/

Step 7: Test with queries

# Validate and inspect without writing code
uv run sidemantic validate models/ --verbose
uv run sidemantic info models/

# Execute semantic SQL through CLI
uv run sidemantic query models/ -c duckdb:///data.duckdb \
  "SELECT revenue, status FROM orders WHERE status = 'completed'"

Python API (optional):

# Structured query API
result = layer.query(
    metrics=["orders.revenue"],
    dimensions=["orders.status", "orders.order_date__month"],
    filters=["orders.status = 'completed'"],
    order_by=["orders.revenue DESC"],
    limit=10,
)

# SQL interface (auto-rewrites through semantic layer)
result = layer.sql("SELECT revenue, status FROM orders WHERE status = 'completed'")

# Compile to SQL without executing
sql = layer.compile(metrics=["orders.revenue"], dimensions=["customers.region"])

Segments

Reusable named WHERE filters applied at query time. Unlike metric filters, segments affect all metrics in the query.

models:
  - name: orders
    table: orders
    segments:
      - name: completed_orders
        sql: "status = 'completed'"
      - name: us_only
        sql: "{model}.region = 'US'"

Segments are model-scoped and used as model.segment references at query time:

uv run sidemantic query models/ -c duckdb:///data.duckdb \
  "SELECT revenue, status FROM orders WHERE completed_orders"

Python API (optional):

result = layer.query(
    metrics=["orders.revenue"],
    dimensions=["orders.status"],
    segments=["orders.completed_orders"],
)

Loading from Other Formats

CLI-first:

uv run sidemantic info path/to/models/
uv run sidemantic validate path/to/models/ --verbose

Python API (optional):

from sidemantic import SemanticLayer, load_from_directory

layer = SemanticLayer(connection="duckdb:///data.duckdb")
load_from_directory(layer, "path/to/models/")

Auto-detects: Cube (.yml with cubes:), dbt MetricFlow (.yml with semantic_models:), LookML (.lkml), Malloy (.malloy), Rill, Hex, Snowflake Cortex, and more.

For detailed field mappings from each format, load references/migration.md.

Auto-Registration

When SemanticLayer() is created with auto_register=True (the default), it sets itself as the "current layer." Any Model() or Metric() constructed while a layer is active auto-registers with it. This is why the Quick Start examples don't call layer.add_model().

If you create Models before creating a SemanticLayer, they won't be registered. Either create the layer first, or use layer.add_model(model) explicitly.

Jinja2 Parameters

SQL expressions in models support Jinja2 templating:

models:
  - name: orders
    sql: "SELECT * FROM orders WHERE region = '{{ region }}'"

Pass values at query time:

result = layer.query(metrics=["orders.revenue"], parameters={"region": "US"})

CLI Reference

All commands are run as sidemantic <command>. Use --config path/to/sidemantic.yaml to load a config file with connection and model path settings.

CommandPurpose
validate [DIR] --verboseValidate definitions, show errors and warnings
info [DIR]Summary of models, dimensions, metrics, relationships
query [DIR] -c CONNECTION SQLExecute SQL through the semantic layer (--format table/json/csv, --limit N)
migrator [DIR] --queries PATHCoverage analysis: check how well models handle SQL queries
migrator --queries PATH --generate-models OUTBootstrap: generate model YAML from SQL queries
preagg recommend [DIR]Recommend pre-aggregation tables from query patterns
preagg apply [DIR]Apply pre-aggregation recommendations
serve [DIR] -c CONNECTIONStart PostgreSQL wire-protocol server
mcp-serve [DIR] -c CONNECTIONStart MCP server for AI tool integration
workbench [DIR] -c CONNECTIONInteractive TUI with SQL editor and charting
lspStart LSP server for Sidemantic SQL files

Connection Strings

duckdb:///:memory:                             # In-memory DuckDB
duckdb:///path/to/db.duckdb                    # File-based DuckDB
duckdb://md:database_name                      # MotherDuck
postgres://user:pass@host:port/dbname          # PostgreSQL
bigquery://project_id/dataset_id               # BigQuery
snowflake://user:pass@account/database/schema  # Snowflake
clickhouse://user:pass@host:port/database      # ClickHouse
databricks://token@server-hostname/http-path   # Databricks
spark://host:port/database                     # Spark SQL
adbc://driver/uri                              # ADBC

Reference Files

Load these when you need deeper detail:

  • references/yaml-schema.md: Field-level YAML schema with every field, type, default, and constraint
  • references/patterns.md: Complete YAML templates for e-commerce, SaaS, marketing, IoT, and star schema patterns
  • references/validation.md: All validation rules, error messages, and fixes
  • references/migration.md: Field-by-field mappings from Cube, dbt, LookML, and other formats
  • references/generation.md: Migrator API, schema introspection, auto-model generation, pre-aggregation recommendations

Common Mistakes

  1. Missing granularity on time dimensions. Every type: time dimension needs granularity: day (or similar).
  2. Simple metric without agg. Metrics that are not complex types need an agg field (sum, count, avg, etc.) or a full SQL expression like sql: "SUM(amount)".
  3. Unqualified fields in multi-model queries. Single-model SQL can use unqualified names (SELECT revenue FROM orders), but cross-model queries should use explicit model.field.
  4. No relationship path between models. Cross-model queries require a chain of relationships connecting all involved models.
  5. Using type: string or type: number for dimensions. The valid types are categorical, time, boolean, numeric.
  6. Confusing model-level vs graph-level metrics. Model-level metrics use agg. Graph-level metrics (ratio, derived, etc.) go in the top-level metrics: section.
  7. Missing required fields on complex metrics. ratio needs numerator + denominator. derived needs sql. time_comparison needs base_metric. conversion needs entity, base_event, conversion_event.
  8. Plural relationship names create wrong FK defaults. Relationship named customers defaults FK to customers_id, not customer_id. Always set foreign_key explicitly.
  9. SQL expressions with YAML special characters. Quote SQL containing :, #, {, or >.
  10. Duplicate model or metric names. Names must be unique across the entire semantic layer.
  11. Creating Models before SemanticLayer. With auto-registration (default), Models must be created after the SemanticLayer, or they won't register.

Related skills