A Data Observability Tool That Works From Inside Your AI Assistant

Cold outreach converts at 0% when prospects must hand warehouse credentials to a stranger's SaaS. We shipped a Claude Code plugin that runs against a real demo warehouse with no signup, API key, or credentials. This post covers how we built it on existing infrastructure and what we chose not to build.

Our last cold-email campaign reached 298 data engineers. It produced 139 link clicks. It produced 0 signups. Rewriting the call-to-action does not fix this. Data engineers will not hand warehouse credentials to a SaaS they have not tried, and no amount of subject-line tuning changes that. The only move that works is to let them run the product before the credential ask.

47% click-through with 0% signup is not a funnel problem in the mechanical sense. The signup page loads. The trial-code validator returns is_valid=true. The Clerk sign-up flow completes when we walk it ourselves. The block is trust, not engineering. The prospect clicks, sees a form asking for an email address and a password for a SaaS they have never heard of, and closes the tab. Every dollar we spent converting clicks into form loads was spent against a ceiling shaped by the fact that the prospect had not yet seen the product do anything. This post is the engineering story of how we removed that ceiling.

$ claude plugin install armor@anomalyarmor
$ claude
> is my data healthy?

Three commands. One real answer. No signup. No API key.

The three commands above are the entire onboarding surface for a cold prospect. The plugin installs, a demo API key mints itself in the background, and the skill returns a structured answer against the BalloonBazaar demo warehouse we already use for our in-app guided onboarding. Every other data observability tool in our category asks you to open their dashboard first. We work from inside your AI assistant.

This post walks through what a cold prospect actually sees, how the integration works on five pieces of infrastructure, four of which already existed in our codebase, and what we deliberately chose not to build. If you want to skip the engineering detail and just try it, the last section repeats the three commands and links to the source.

What cold prospects actually do

A data engineer runs the three commands above. The skill kicks off. No prompts for an account. No browser redirect. The terminal fills with a structured response that looks like this:

[demo-mode] no ARMOR_API_KEY found
[demo-mode] minted a 1-hour demo key against anomalyarmor-public-demo
[demo-mode] this key is read-only and expires 2026-04-22T16:42:00Z

armor:status
  target: anomalyarmor-public-demo (BalloonBazaar dataset)
  overall_status: degraded (2 of 14 assets need attention)

components:
  freshness:        ok       14/14 tables updating on schedule
  schema:           drift    1 column added, 1 type change in last 24h
  volume:           ok       all tables within ±15% of 7-day baseline
  anomalies:        warning  1 distribution shift on orders.price
  lineage:          ok       all 3 source-to-dashboard paths resolved

items needing attention:
  1. orders.currency_code  added 2026-04-21, NOT NULL, no default
     downstream impact:   3 dbt models, 1 dashboard
     suggested action:    add explicit cast in stg_orders.sql

  2. orders.price          distribution shift detected 2026-04-21 03:14 UTC
     p95 was 49.99 USD, now 34.50 USD (-31%)
     root cause:          likely upstream pricing change
     suggested action:    confirm with finance before nightly rollup runs

next steps (pick one):
  > armor:investigate orders.currency_code
  > armor:ask "what changed in the orders table yesterday?"
  > armor:alerts
  > get a trial code and run this against your own warehouse:
    https://app.anomalyarmor.ai/signup?intent=skill-status&q=is+my+data+healthy

The response is real. BalloonBazaar is the demo warehouse we preload for every new account, and the seeded anomalies are reproducible on a schedule so the demo stays honest. The two findings in the output above are the kind that would cause real data downtime in a production warehouse: an undefaulted NOT NULL column addition that breaks every downstream INSERT, and a 31% price distribution shift that skews every revenue-based aggregate until someone notices. A prospect running armor:status on BalloonBazaar sees the same shape of alert they would get on their own Snowflake instance, against a dataset where we control the failure modes and can reproduce them on demand.

The schema drift case in the output mirrors the pattern we cover in Schema Drift: The Silent Pipeline Killer: the column appears, the ingestion job succeeds, the downstream joins return nulls or fail hard, and nobody notices for hours. The distribution shift case is the baseline-detection pattern from our anomaly detection guide, implemented as a Welford-rolling-variance monitor under the hood. A prospect who clicks through armor:investigate orders.currency_code after the status call gets a deeper response that walks lineage, pulls the git blame on the upstream schema, and suggests a specific dbt patch. None of this requires signup.
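
For readers who have not met Welford's algorithm, here is a minimal sketch of the online mean and variance update it refers to. This is the plain cumulative form, not a windowed one, and it is not our monitor: the class name, the z-score threshold of 3.0, and the sample prices are assumptions made for the example.

```python
import math


class WelfordMonitor:
    """Online mean/variance via Welford's algorithm (hypothetical sketch)."""

    def __init__(self, z_threshold: float = 3.0) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.z_threshold = z_threshold  # assumed value, not from the post

    def update(self, x: float) -> None:
        # Single-pass update: no need to keep the history of values.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; reported as 0.0 below two observations.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    def is_anomalous(self, x: float) -> bool:
        std = math.sqrt(self.variance)
        if std == 0.0:
            return False
        return abs(x - self.mean) / std > self.z_threshold


# Illustrative baseline values only.
monitor = WelfordMonitor()
for price in [49.5, 50.2, 49.9, 50.4, 49.8, 50.1]:
    monitor.update(price)
```

With a baseline like this, a value far from the running mean is flagged while a value near it is not. A production monitor would likely also need a window or decay so that old data stops dominating the baseline.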

The signup link at the end preserves the original question as a q= parameter so when the prospect completes signup, the wizard lands them on the in-app agent with their question pre-filled. Nothing resets. Nothing prompts for credentials twice.

Three things are true at once in this flow. The prospect saw real output against a real dataset. The prospect was never asked for an email address. The prospect can convert by copy-pasting the URL at the bottom of the output. None of this required us to build a marketing site demo mode, a sandboxed product branch, or a separate auth system.

How it actually works

The integration runs on five pieces of infrastructure, four of which already existed. Only one new endpoint shipped. The rest is glue.

Short-lived API keys with scopes

Our backend had been issuing API keys for a year. Keys are prefixed (aa_ for standard, aa_admin_ for admin-issued) and each key carries a scope field checked on every request. The three existing scopes are read-only, read-write, and admin. We added a fourth prefix, aa_demo_, with hard-coded read-only scope and a non-null expires_at column. The migration that added the expires_at column is a single line against auth_api_keys plus a partial index. A read-only key can query /api/v1/sdk/health/summary and /api/v1/sdk/assets and everything else that is already marked as a read. It cannot create a monitor, acknowledge an alert, or change a single row anywhere in the database.
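
The prefix and scope rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not our enforcement code: the ApiKey dataclass, the effective_scope and is_allowed helpers, and the idea that scopes form a strict ranking are all hypothetical. Only the prefix and scope names come from the text.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Assumed ordering from least to most privileged; scope names are from the post.
SCOPE_RANK = {"read-only": 0, "read-write": 1, "admin": 2}


@dataclass
class ApiKey:
    token: str
    scope: str
    expires_at: Optional[datetime] = None  # non-null for demo keys


def effective_scope(key: ApiKey) -> str:
    # Demo keys are forced to read-only regardless of the stored scope value.
    if key.token.startswith("aa_demo_"):
        return "read-only"
    return key.scope


def is_allowed(key: ApiKey, required_scope: str, now: datetime) -> bool:
    # An expired key is rejected before any scope comparison happens.
    if key.expires_at is not None and key.expires_at <= now:
        return False
    return SCOPE_RANK[effective_scope(key)] >= SCOPE_RANK[required_scope]
```

Deriving the demo scope from the prefix means a mis-written scope value on a demo row cannot widen access in this sketch.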

The cost of this design choice is that the skill cannot offer write operations in demo mode. The prospect cannot create a monitor against their own table and see it fire. They can only run read-side queries against the preloaded demo dataset. We considered a second tier of demo write access scoped to a disposable company ID, and we rejected it. The incremental prospect signal from "I set up a monitor in demo" over "I ran a query in demo" did not justify the complexity of teaching the scope enforcer a new predicate.

One new unauthenticated mint endpoint with per-IP rate limiting

POST /api/v1/demo/session is the only new HTTP surface we shipped. It accepts no body. It returns a single JSON object: {"api_key": "aa_demo_xxx", "expires_at": "...", "demo_company": "anomalyarmor-public-demo"}. It is skip-auth because a cold prospect has no credentials to present. The rate limit is 2 requests per minute per source IP with a burst of 3, enforced in Redis via a Lua-scripted atomic counter so horizontal scaling does not leak the limit. If Redis is unreachable, the endpoint falls back to an in-memory counter, which is wrong for a multi-worker deployment but fails closed in the sense that it is stricter, not more permissive. The endpoint sits in the PUBLIC_PATHS allowlist of our API key auth middleware and the EXEMPT_PATHS allowlist of our subscription enforcement middleware.
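
To make the "2 per minute with a burst of 3" numbers concrete, here is a small in-memory token bucket. It shows the shape of a single-process counter only. It is not the Redis Lua script, and the class name, the injectable clock, and the choice of a token bucket over some other counting scheme are assumptions for the example.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-IP token bucket held in process memory (hypothetical sketch)."""

    def __init__(
        self,
        rate_per_minute: float = 2.0,
        burst: int = 3,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.refill_per_second = rate_per_minute / 60.0
        self.burst = burst
        self.clock = clock
        # ip -> (tokens remaining, timestamp of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, ip: str) -> bool:
        now = self.clock()
        tokens, last = self._buckets.get(ip, (float(self.burst), now))
        # Refill for the elapsed time, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.refill_per_second)
        if tokens >= 1.0:
            self._buckets[ip] = (tokens - 1.0, now)
            return True
        self._buckets[ip] = (tokens, now)
        return False
```

Each process holds its own bucket here, which is why a shared store is needed once more than one worker serves the endpoint.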

The mint endpoint writes one row to auth_api_keys with expires_at = now() + interval '1 hour'. It does not mint a Clerk session, a JWT, or anything else. It is the cheapest possible credential we could issue that reuses everything downstream.
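
A sketch of what minting such a credential involves, assuming a random URL-safe token and a one-hour TTL. The function name mint_demo_session and the use of secrets.token_urlsafe are illustrative choices, and the database insert is left out. The response fields match the JSON shape described above.

```python
import secrets
from datetime import datetime, timedelta

DEMO_PREFIX = "aa_demo_"
DEMO_COMPANY = "anomalyarmor-public-demo"
DEMO_TTL = timedelta(hours=1)


def mint_demo_session(now: datetime) -> dict:
    """Build the response body for a freshly minted demo key (sketch only)."""
    # 32 random bytes, URL-safe encoded; the real token format is an assumption.
    token = DEMO_PREFIX + secrets.token_urlsafe(32)
    expires_at = now + DEMO_TTL
    # A real handler would insert (token, scope, expires_at) into auth_api_keys here.
    return {
        "api_key": token,
        "expires_at": expires_at.isoformat(),
        "demo_company": DEMO_COMPANY,
    }
```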

Scope enforcement was already wired on 39+ write endpoints

Before we could ship demo mode, we had to be confident that a read-only key truly could not cause write damage. The shape of the concern was simple: if even one write endpoint forgot to declare its scope guard, a demo key could hit it. We audited backend/app/shared/api/v1/public/ and found 39 endpoints across 11 files that should have required read-write or stricter. Twenty-five of them were missing the Depends(require_api_key_scope("read-write")) dependency. The fix was mechanical. The regression test is a single pytest file that walks every APIRoute under public/, asserts each non-GET endpoint declares a scope guard with _required_scope in {"read-write", "admin"}, and fails loudly if any endpoint slips past. Drift detection is built in: new write endpoints fail the test until they declare a scope.

We considered relying on code review to catch missing scope guards and rejected it. Code review catches 80% of these at best. A test that walks the route table catches 100% and wakes up when a new endpoint forgets.

The audit itself was 90 minutes of grep and a spreadsheet. We listed every endpoint under public/, noted each one's HTTP method, noted whether it had require_api_key_scope anywhere in its dependency chain, and flagged the 25 that did not. One endpoint, intelligence_ask, is a POST-shaped read (body-in-POST semantic query), and we deliberately allowlisted it rather than demote it to a GET. The test file encodes the allowlist so future reviewers can see exactly which endpoints break the rule and why. The pattern of "one explicit allowlist entry per exception, with a comment" is how we keep drift from ever looking like silent breakage. Our setup-in-minutes philosophy applies as much to our own codebase as it does to the product: if a new endpoint takes more than five minutes to wire correctly, the scaffolding is wrong.
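
The route-walking check and its allowlist can be sketched without a web framework. The Route dataclass below is a stand-in for a real route object, and every route name except intelligence_ask is made up for the example. The real test inspects the framework's route table, not a hand-built list.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

WRITE_SCOPES = {"read-write", "admin"}

# One explicit entry per exception, with the reason next to it.
ALLOWLIST = {
    "intelligence_ask",  # POST-shaped read: semantic query carried in the body
}


@dataclass(frozen=True)
class Route:
    name: str
    methods: FrozenSet[str]
    required_scope: Optional[str] = None  # None means no scope guard declared


def unguarded_write_routes(routes: List[Route]) -> List[str]:
    """Return names of non-GET routes that lack a write-level scope guard."""
    offenders = []
    for route in routes:
        if route.methods <= {"GET"}:
            continue  # pure reads are out of scope for this check
        if route.name in ALLOWLIST:
            continue
        if route.required_scope not in WRITE_SCOPES:
            offenders.append(route.name)
    return offenders
```

A test built on this would assert that the returned list is empty, so a new write endpoint without a guard shows up by name in the failure message.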

PreToolUse hook on every skill

The skills themselves are thin. Each one is a SKILL.md file, a handful of Python helpers under sdk/python/, and a PreToolUse hook registered with Claude Code that runs before the skill does anything else. The hook logic is the same across all five shipped skills: if ARMOR_API_KEY is set in the environment, use it. Else, if ~/.armor/.demo-session.json exists and its expires_at is in the future, use the cached demo key. Else, POST to /api/v1/demo/session to mint a fresh one, write it to cache, export it to the environment, and print a two-line banner. Every skill gets this behavior without a single per-skill branch because the hook runs on every invocation.
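
The three-step fallback reads roughly like this in code. It is a sketch rather than the shipped hook: the function name resolve_api_key, the injected mint callable, and the banner wording are assumptions, and exporting the key to the environment is left to the caller.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Callable, Mapping


def resolve_api_key(
    env: Mapping[str, str],
    cache_path: Path,
    mint: Callable[[], dict],
    now: datetime,
) -> str:
    # 1. An explicit user key always wins.
    if env.get("ARMOR_API_KEY"):
        return env["ARMOR_API_KEY"]

    # 2. Reuse a cached demo key while it is still valid.
    if cache_path.exists():
        try:
            cached = json.loads(cache_path.read_text())
            if datetime.fromisoformat(cached["expires_at"]) > now:
                return cached["api_key"]
        except (ValueError, KeyError, TypeError):
            pass  # unreadable cache: fall through and mint a fresh key

    # 3. Mint a fresh key, cache it, and print the banner.
    session = mint()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(session))
    print("[demo-mode] no ARMOR_API_KEY found")
    print(f"[demo-mode] minted a demo key against {session['demo_company']}")
    return session["api_key"]
```

Passing the environment, cache path, and mint function in as arguments keeps the resolution order testable without touching the network or the real home directory.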

This was the right place to centralize the demo fallback. We considered putting the fallback inside each skill's body. That would have meant five copies of the same try/except block, five places to update when the mint endpoint changes, and five places where a bug could hide. A PreToolUse hook runs in one file with one code path. The skills downstream do not know demo mode exists.

Signup handoff that preserves intent

The last piece is the signup URL at the bottom of every demo response. When the skill cannot answer a prospect's question in read-only mode, or when the prospect wants to run against their own warehouse, the terminal output includes a URL of the form:

https://app.anomalyarmor.ai/signup?intent=skill-<name>&q=<url-encoded prompt>

We shipped five intents: intent=skill-status, intent=skill-ask, intent=skill-alerts, intent=skill-investigate, and intent=skill-recommend. The q= parameter is whatever the prospect originally typed. On the signup side, the existing wizard-intent routing recognizes these intent values and, after Clerk auth completes, redirects to /agent?q=<original prompt> so the prospect lands on the in-app agent with their question pre-filled. The wizard intent routing was already in place for our homepage prompt selector work, so the handoff required zero new frontend state.
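
Building that URL is a one-liner with the standard library. The sketch below assumes a helper named signup_url, which is a hypothetical name, and uses urlencode's default plus-encoding for spaces, which matches the example link in the status output earlier in this post.

```python
from urllib.parse import urlencode

SIGNUP_BASE = "https://app.anomalyarmor.ai/signup"
SKILL_NAMES = ("status", "ask", "alerts", "investigate", "recommend")


def signup_url(skill: str, prompt: str) -> str:
    """Return the handoff URL for a skill and the prospect's original prompt."""
    if skill not in SKILL_NAMES:
        raise ValueError(f"unknown skill: {skill}")
    query = urlencode({"intent": f"skill-{skill}", "q": prompt})
    return f"{SIGNUP_BASE}?{query}"


print(signup_url("status", "is my data healthy"))
# https://app.anomalyarmor.ai/signup?intent=skill-status&q=is+my+data+healthy
```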

We seeded five trial codes (SKILL-STATUS, SKILL-ASK, SKILL-ALERTS, SKILL-INVESTIGATE, SKILL-RECOMMEND) against the professional plan with a 14-day trial duration and a source=skill-<name> attribution. Any conversion that flows through the handoff URL is tagged so we can measure skill-sourced signups separately from cold email, blog, and direct traffic. We did not invent a new trial code system. We used the one that has been minting codes for the Zeus cold-email pipeline since February.
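
The intent-to-trial-code mapping is small enough to show in full. This Python version is only an illustration of the lookup: the function name trial_for_intent and the returned dictionary shape are assumptions, while the codes, plan, trial length, and source tag come from the paragraph above.

```python
from typing import Optional

TRIAL_CODES = {
    "skill-status": "SKILL-STATUS",
    "skill-ask": "SKILL-ASK",
    "skill-alerts": "SKILL-ALERTS",
    "skill-investigate": "SKILL-INVESTIGATE",
    "skill-recommend": "SKILL-RECOMMEND",
}


def trial_for_intent(intent: str) -> Optional[dict]:
    """Look up the seeded trial for a signup intent, or None if unrecognized."""
    code = TRIAL_CODES.get(intent)
    if code is None:
        return None
    # The attribution source has the same skill-<name> form as the intent.
    return {"code": code, "plan": "professional", "trial_days": 14, "source": intent}
```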

What we didn't build

The integration shipped in under 5 days across three pull requests because we said no to six different tempting expansions.

No new auth middleware. We did not add a demo-session middleware, a signed-URL verifier, or a cookie-based demo context. The demo key is a regular API key with an earlier expires_at and a different prefix. It flows through APIKeyAuthMiddleware like any other key. The middleware got one allowlist entry for the mint endpoint and nothing else.

No custom JWT issuer. Everyone involved suggested at least once that we should mint a JWT for the demo session so we could carry more context. We did not. JWTs require a key rotation story, a signing service in the request path, and a validation library on the Python side, and they buy us nothing over a random 32-byte token stored in Postgres with an expires_at. The demo key is the smallest possible credential.

No per-skill branching. Every skill works the same way whether it has a user key, a cached demo key, or no key at all. The PreToolUse hook resolves the key before the skill body runs. If we had distributed the fallback logic into each skill, the integration would have been 200 lines longer and noticeably slower to iterate on.

No new trial-code routing. The intent handoff reuses the wizard-intent system that already existed for the homepage prompt selector. We added five new intent values to a tuple, mapped each one to a trial code via a new skillIntentToTrialCode() helper, and seeded the codes. No new routes. No new redirect middleware. No new post-signup hook.

No /mcp-demo route. The first design doc proposed a separate /mcp-demo path on our MCP server that would serve demo-sized responses. We deleted that proposal the day after writing it. A single demo-mode company on the existing MCP surface is simpler, and the anonymous mint endpoint turns out to be the only new URL we need. Reusing the MCP server means we inherit every improvement the MCP server ships without a parallel code path.

No LLM-generated skill descriptions. Claude Code routes user prompts to skills by semantic match against each skill's description field. The temptation is to write descriptions with an LLM because LLMs are good at synonym expansion. We wrote every description by hand. The reason is that a description like "Use when the user asks about data health, pipelines, tables, freshness, schema, anomalies, lineage, alerts, or anything adjacent" reads like it covers everything and in practice routes badly because it overlaps with every other skill in the index. A description like "Use when the user asks a top-level question about whether their data is healthy right now" is narrower, routes cleanly, and misroutes less. The five shipped skills each have a description that names the specific user intent they answer and names at least one thing they do not answer. Handwritten discipline beats generated coverage here.

The design credo across the whole integration is reuse over invent. Every time we considered a new abstraction, we first looked for an existing one that could bend one degree further. Most of the time it could. When it could not, we shipped the new abstraction at the smallest scope possible (one endpoint, one migration, one column, one test).

One honest failure

The integration went live on 2026-04-22 at 09:40 UTC. By 10:02 UTC, every call to POST /api/v1/demo/session was returning a 409. The failure mode was unambiguous in the logs: duplicate key value violates unique constraint "auth_api_keys_pkey". The auth_api_keys_id_seq sequence had drifted below the max existing id during a prior migration replay on the staging-to-prod cutover, so every new insert collided with an existing row. The first production traffic exposed it.

The fix was a single SQL statement: SELECT setval('auth_api_keys_id_seq', (SELECT MAX(id) FROM auth_api_keys));. Applied at 10:04 UTC. Service resumed. No prospects hit the bad window because the feature had not been announced yet and the smoke-test curl that caught it was the only traffic.

We added a CloudWatch alarm for 5xx and 409 rates on the mint endpoint with a threshold of 2% over a 5-minute window. The alarm would have paged us during the 22-minute window above rather than catching it on the next manual check. The alarm is cheap. The sequence drift was the expensive lesson.
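
The alarm condition itself is simple arithmetic. The function below only illustrates the 2% rule over one window of status codes. It is not CloudWatch configuration, the name should_alarm is hypothetical, and treating the threshold as "strictly greater than 2%" is an assumption.

```python
from typing import Iterable


def should_alarm(status_codes: Iterable[int], threshold: float = 0.02) -> bool:
    """Return True if the share of 409 and 5xx responses exceeds the threshold.

    The caller is assumed to pass only the responses from the last 5 minutes.
    """
    codes = list(status_codes)
    if not codes:
        return False  # no traffic, nothing to alarm on
    bad = sum(1 for code in codes if code == 409 or 500 <= code <= 599)
    return bad / len(codes) > threshold
```

Note the empty-window case: with no traffic at all, a rate-based alarm stays quiet, so it does not replace an active probe of the endpoint.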

We also added a post-deploy smoke test that mints a demo key and reads one asset against it, which runs after every backend rollout in staging and production. If the mint endpoint errors, the smoke test fails, and the rollout rolls back. The full path from "sequence drift" to "this cannot happen silently again" was four hours of engineering: one SQL fix, one CloudWatch alarm, one smoke test, one line in the runbook.
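
A smoke test of this kind can be sketched with the HTTP calls injected, so the logic is visible without a network dependency. The function name, the callable signatures, and the assumption that the assets endpoint returns a non-empty list are all hypothetical. The two paths are the ones named earlier in this post.

```python
from typing import Any, Callable, Dict, List

PostJson = Callable[[str], Dict[str, Any]]
GetJson = Callable[[str, str], List[Any]]


def smoke_test_demo_path(base_url: str, post_json: PostJson, get_json: GetJson) -> None:
    """Raise AssertionError if the mint-then-read path is broken."""
    # Step 1: exercise the real write path by minting a demo key.
    session = post_json(f"{base_url}/api/v1/demo/session")
    api_key = session.get("api_key", "")
    if not api_key.startswith("aa_demo_"):
        raise AssertionError("mint endpoint did not return a demo key")

    # Step 2: prove the key works by reading at least one asset with it.
    assets = get_json(f"{base_url}/api/v1/sdk/assets", api_key)
    if not assets:
        raise AssertionError("demo key could not read any asset")
```

In a rollout pipeline, post_json and get_json would wrap a real HTTP client, and any raised error would be the signal to roll back.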

The deeper lesson from the incident is about the gap between "our tests pass in CI" and "our production database behaves the same way as our test database." The sequence drift was invisible in CI because CI uses a fresh database for every run. Staging caught everything except sequence state, because staging's sequences had been reset as part of the data refresh we run weekly. Production was the first environment where historical inserts and the current sequence value had been coexisting long enough for drift to matter. The class of bug is called "environment-specific state drift" and the only reliable defense is production smoke tests that exercise the real write path. The CloudWatch alarm is a safety net for the next instance of this class, not the solution.

Three asks for feedback

The integration is small enough that we can still course-correct cheaply. Three specific questions we want to answer before we commit to the current shape:

Should intent handoff deep-link into each skill's in-app analog? Right now, all five intent values land on /agent?q=<prompt>, which is the in-app agent. An argument exists for landing on a skill-specific surface instead. intent=skill-alerts could route to /alerts?q=<prompt> instead of the generic agent. The trade-off is that the agent can answer any question the skill can answer, and it is one surface to maintain. Deep-linking would be five surfaces to maintain with five navigation quirks. We think the current homogenization is correct, but it is the kind of decision that is easier to change in the first month than the sixth.

Is 1 hour the right demo key TTL? We picked 1 hour because it is short enough that a leaked key does not cause material damage and long enough that a prospect running several skills in a single session does not hit a re-mint during normal use. We could imagine going to 24 hours for lower friction, or going to 15 minutes for a stricter security posture. The observed usage pattern after a week of traffic will settle this, but if anyone has a reason to prefer a different default a priori, we would like to hear it.

Which of our 9 non-featured skills should get demo-mode handoff? We shipped demo mode on the five most-invoked skills (armor:status, armor:ask, armor:alerts, armor:investigate, armor:recommend) because those cover the majority of cold-prospect intent. We have nine more skills (connect-warehouse, tag-assets, export-lineage, schedule-scan, review-pr, and others) that are only useful in an authenticated context with a real warehouse. A few of those could plausibly work in demo mode with minor modification. Which ones would you want to run before signing up?

Any of these three decisions can flip in a single pull request. If you use the plugin and have an opinion, tell us.

Claude Code data observability plugin FAQ

What is the AnomalyArmor Claude Code plugin?

It is an MCP-based skill pack you install into Claude Code that lets you ask data observability questions ("is my data healthy", "what changed in the orders table") and get real answers, running against a live demo warehouse with no signup, API key, or credentials.

Do I need an AnomalyArmor account to try it?

No. The whole point is that the first interaction requires no account. The plugin talks to an unauthenticated demo-mint endpoint that issues a short-lived, scoped key automatically. You connect your own warehouse only if you decide to after seeing it work.

Is it safe to install? What can it access?

The skills are read-only against a demo dataset until you explicitly authenticate. Keys are short-lived and scoped, the mint endpoint is per-IP rate limited, and a PreToolUse hook gates every skill call. The SKILL.md files are deliberately under 40 lines each so you can read the entire behavior before installing.

Does this work in Cursor or other MCP clients, or only Claude Code?

The skill pack is published for Claude Code via the plugin manifest. The underlying server is a standard MCP server, so other MCP-capable assistants can point at it, but the packaged install path documented here is Claude Code.

Which warehouses does the real (authenticated) product support?

Snowflake, Databricks, BigQuery, Redshift, and Postgres, with Snowflake and Databricks treated equally as first-class sources.

Is the plugin open source?

Yes. The plugin manifest is at github.com/anomalyarmor/agents and the Python SDK the skills depend on is at github.com/anomalyarmor/core/tree/main/sdk/python. Both are MIT-licensed.

What does it cost after the demo?

AnomalyArmor is $5 per table per month on your own warehouse. The demo mode itself is free to run because it uses a shared demo dataset, not your data.

How to try it

$ claude plugin install armor@anomalyarmor
$ claude
> is my data healthy?

The plugin manifest is at github.com/anomalyarmor/agents. The Python SDK the skills depend on is at github.com/anomalyarmor/core/tree/main/sdk/python. Both are MIT-licensed. The SKILL.md files in the agents repo are deliberately short (under 40 lines each) so you can read the entire thing in a minute and decide if you want to install.

If you install and the skill returns something broken, or surprising, or boring, or wrong, we want to hear that too. Reply on this post, open an issue on anomalyarmor/agents, or email us. The 0-out-of-298 cold-email number has a ceiling, and we are trying to find out what the new ceiling looks like when the first interaction with our product is a real answer instead of a signup form.
