Rule reference

Every warning and error AIOptimize emits — with severity, what surface it fires on, why it matters, and the shape of the fix. KB v0.3.0-preview · last revised 2026-05-07.

Suppress on a single run with `aioptimize scan . --ignore D003,D013`. Suppress permanently per project via `.aioptimize.toml`.

At a glance

ID	Severity	Surface	Catches
D001	warn	SDK call · Python + TS	Anthropic system prompt without `cache_control`
D002	warn	SDK call · wrapper-aware	Deprecated Anthropic / OpenAI model id
D003	info	SDK call · wrapper-aware	Missing `max_tokens` / `maxTokens`
D004	info	SDK call · Python + TS	Interactive surface missing `stream=True`
D005	info	SDK call · Python + TS	Batch-eligible workload not using Batch API
D006	warn	SDK call · Python + TS	OpenAI prompt-hinted JSON instead of structured output
D007	error	File scan	Hardcoded API key in source
D008	warn	SDK call · Python + TS	System instruction placed in `user` role
D009	warn	Skill / subagent / workflow YAML	Frontmatter pins a deprecated model
D010	info	Skill / subagent	Empty or stub `description:`
D011	warn	Subagent	Tools list grants more than the body uses
D012	warn	`settings.json` hooks	Hook curls a provider endpoint with no `--max-time` / `--connect-timeout`
D013	info	`.mcp.json`	Stdio MCP entry without a `timeout`
D015	warn	Skill / subagent / command body	Prose recommends a deprecated model id
D016	warn	TS / JS workflow code	Vercel Workflow `step.run` LLM call without fallback

There is no D014. That ID was reserved for LangChain class-method detection (ChatAnthropic(...).invoke(...)), which requires variable-assignment tracking and is not yet implemented.

Details

Each card describes when the rule fires, why it matters, and the shape of the fix. For examples in real PRs, see the demo projects under examples/.

D001 warn SDK call · Python + TS

Anthropic system prompt not cached

Fires on Anthropic messages.create calls with a string system= argument longer than ~1024 tokens and no cache_control={'type': 'ephemeral'} on a content block.

Why: ephemeral prompt caching returns 25–90% cost reduction on cache hits and ~85% lower first-token latency. Reused system prompts are the canonical trigger.

system=[{"type":"text","text":SYSTEM_PROMPT,"cache_control":{"type":"ephemeral"}}]

D002 warn SDK call · wrapper-aware

Deprecated model version

Fires when the model= argument matches an entry in the KB's deprecated_models list. Wrapper-aware: also catches claude-agent-sdk.query(model=…) and Vercel AI SDK anthropic("…") / openai("…") factory calls.

Why: deprecated models continue to load but silently degrade — slower routing, higher cost, eventual hard failure. The KB ships the recommended replacement per id.

D003 info SDK call · wrapper-aware

Missing max_tokens / maxTokens

Fires when neither max_tokens= (Python / Anthropic SDK) nor maxTokens: (Vercel AI SDK camelCase) is present.

Why: unbounded output combined with streaming or agent loops can produce runs that cost four figures. Setting max_tokens to the smallest value the feature needs puts a hard ceiling on per-call output cost.
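A minimal sketch of how a check like this could be approximated for Python sources with the standard `ast` module. It is not AIOptimize's implementation: the function name `calls_missing_limit` and the choice to match any method named `create` are assumptions made for illustration.

```python
import ast

# Keyword names the rule text treats as an output limit.
LIMIT_KEYWORDS = {"max_tokens", "maxTokens"}


def calls_missing_limit(source: str, method: str = "create") -> list[int]:
    """Return line numbers of `.create(...)` calls that pass no token limit."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == method
        ):
            # kw.arg is None for **kwargs, so forwarded limits are not seen.
            passed = {kw.arg for kw in node.keywords}
            if not passed & LIMIT_KEYWORDS:
                missing.append(node.lineno)
    return sorted(missing)
```

A call that forwards `**kwargs` is reported by this sketch even when the limit is set upstream, which is one reason a real detector needs more context than a single call site.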

D004 info SDK call · Python + TS

Interactive surface missing stream=True

Fires on calls inside functions whose names suggest an interactive handler (chat, respond, stream, cli) and that have no stream=True.

Why: streaming cuts perceived first-token latency by 3–5x. Any surface a human is waiting on benefits.
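The name heuristic can be sketched as a word-level match against the four hints listed above. This is an illustration only: `looks_interactive` is a hypothetical helper, and matching whole words rather than substrings is an assumption (it keeps `cli` from matching `client`).

```python
import re

# Hints quoted in the rule text.
INTERACTIVE_HINTS = {"chat", "respond", "stream", "cli"}

# Splits snake_case and camelCase names into words, keeping acronyms whole.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def looks_interactive(function_name: str) -> bool:
    """True if any word in the function name is one of the interactive hints."""
    words = {w.lower() for w in _WORD.findall(function_name)}
    return bool(words & INTERACTIVE_HINTS)
```

Under this assumption `handle_chat`, `handleChat`, and `run_cli` qualify, while `build_client` and `chatbot_reply` do not. A substring match would flip the last two.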

D005 info SDK call · Python + TS

Batch-eligible workload not using Batch API

Fires on calls inside for / while loops, or inside functions named backfill / bulk / batch.

Why: the Anthropic and OpenAI Batch APIs offer a 50% discount on standard token pricing in exchange for up to 24 hours of turnaround. Calls inside loops are the canonical batch-eligible pattern.
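A rough sketch of the scope tracking this rule implies, using Python's `ast` module. It is an approximation under stated assumptions, not the shipped detector: `batch_candidates` is a hypothetical name, any method called `create` counts as an SDK call, and function names are matched by substring.

```python
import ast

# Function-name hints quoted in the rule text.
BATCH_NAME_HINTS = ("backfill", "bulk", "batch")


def batch_candidates(source: str, method: str = "create") -> list[int]:
    """Line numbers of `.create(...)` calls in a loop or a batch-named function."""
    hits = set()

    def visit(node, in_scope):
        if isinstance(node, (ast.For, ast.AsyncFor, ast.While)):
            in_scope = True
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # A nested def starts a fresh scope: only its own name counts.
            in_scope = any(h in node.name.lower() for h in BATCH_NAME_HINTS)
        elif (
            in_scope
            and isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == method
        ):
            hits.add(node.lineno)
        for child in ast.iter_child_nodes(node):
            visit(child, in_scope)

    visit(ast.parse(source), False)
    return sorted(hits)
```

Because the rule is severity `info`, false positives such as a loop that runs only a handful of times are expected; the finding is a prompt to check whether the workload can wait for batch turnaround.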

D006 warn SDK call · Python + TS

Prompt-hinted JSON instead of structured output

Fires on OpenAI chat.completions.create calls that instruct the model to "return JSON" in prose without setting response_format=.

Why: structured outputs guarantee schema validity, eliminate retry loops on malformed JSON, and drop tokens spent on format instructions.
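The shape of the fix, sketched as the value passed to `response_format=`. This assumes the `json_schema` form of OpenAI Structured Outputs; the `ticket` schema and its fields are hypothetical and stand in for whatever the prompt previously described in prose.

```python
# Hypothetical schema for illustration; the field names are not from AIOptimize.
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "priority": {"type": "string", "enum": ["low", "medium", "high"]},
    },
    # Strict mode expects every property to be required and no extras allowed.
    "required": ["title", "priority"],
    "additionalProperties": False,
}

# Pass this as response_format= on chat.completions.create, replacing the
# "return JSON" instruction in the prompt text.
RESPONSE_FORMAT = {
    "type": "json_schema",
    "json_schema": {"name": "ticket", "strict": True, "schema": TICKET_SCHEMA},
}
```

With the schema carried in the request, the prompt no longer needs to spend tokens describing the output format.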

D007 error File scan

Hardcoded API key in source

Fires on literals matching sk-ant-[A-Za-z0-9_\-]{20,} or sk-[A-Za-z0-9]{20,} in any scanned file.

Why: hardcoded keys leak through git history, logs, container images, and client bundles. Rotating a leaked key is expensive; an env lookup prevents the leak.

api_key=os.environ["ANTHROPIC_API_KEY"]
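To try the two patterns above against your own files before a scan, a small sketch like the following may help. The helper name `find_key_literals` and the redacted output format are assumptions; only the regular expressions come from the rule text.

```python
import re

# The two patterns quoted in the rule text.
KEY_PATTERNS = [
    re.compile(r"sk-ant-[A-Za-z0-9_\-]{20,}"),
    re.compile(r"sk-[A-Za-z0-9]{20,}"),
]


def find_key_literals(text: str) -> list[tuple[int, str]]:
    """Return (line_number, redacted_prefix) for each line with a key-shaped literal."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in KEY_PATTERNS:
            match = pattern.search(line)
            if match:
                # Keep only a short prefix so the report never echoes the key.
                hits.append((lineno, match.group()[:7] + "..."))
                break  # one hit per line is enough to flag it
    return hits
```

Redacting the match keeps the scanner's own output from becoming a second place the key leaks.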

D008 warn SDK call · Python + TS

System instruction placed in user role

Fires when the first user message starts with system-shaped phrasing ("You are…", "Act as…") and there is no system= argument.

Why: OpenAI prioritizes system messages differently than user messages. Putting persona / behavioral rules in user content dilutes instruction-following and blocks prompt-caching reuse.
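A minimal sketch of the condition, written against OpenAI-style message lists where the system prompt is a message with `role: system`. The helper `misplaced_system_prompt` is hypothetical, and the two prefixes are just the examples quoted above, not the full phrase list the rule uses.

```python
# Example phrasings quoted in the rule text; the real list may be longer.
SYSTEM_SHAPED_PREFIXES = ("you are", "act as")


def misplaced_system_prompt(messages: list[dict]) -> bool:
    """True if no system message exists and the first user message reads like one."""
    if any(m.get("role") == "system" for m in messages):
        return False
    for m in messages:
        if m.get("role") == "user":
            content = m.get("content")
            if not isinstance(content, str):
                return False  # multi-part content is out of scope for this sketch
            return content.lstrip().lower().startswith(SYSTEM_SHAPED_PREFIXES)
    return False
```

The fix is to move the persona text into its own system message (or the `system=` argument on Anthropic calls) and leave only the task in the user turn.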

Agentic-config detectors

Fire on Claude-Code-style configuration files: skills, subagents, slash commands, hook definitions, MCP server registries, and YAML workflows.

D009 warn Skill / subagent / workflow YAML

Frontmatter pins a deprecated model

Fires when model: in a skill / subagent / YAML workflow step matches the KB's deprecated-models list.

Why: frontmatter pins are easy to set and forget. A pinned model that quietly enters deprecation continues to load, but the harness silently degrades.

D010 info Skill / subagent

Empty or stub description

Fires when a skill or subagent has no description: or one that is empty / TODO / TBD / FIXME.

Why: the harness ranks skills against user intent using the description. Empty descriptions mean the skill is dead weight — never invoked, but still loaded into context.

D011 warn Subagent

Tool over-permission

Fires when the subagent's tools: list contains a tool that never appears as a literal token in its body. Implicit-tool exemptions: Read, Write, Edit, Glob, Grep are not flagged because they are commonly invoked from natural language.

Why: over-permissive subagents can fire destructive tools (Bash, Write, WebFetch) by accident, and bloat context with unused tool schemas.
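The literal-token comparison can be sketched in a few lines. This is an illustration, not the shipped check: `unused_tools` is a hypothetical helper, and treating the match as case-sensitive and whole-word is an assumption.

```python
import re

# Tools the rule text exempts because prose often invokes them implicitly.
IMPLICIT_TOOLS = {"Read", "Write", "Edit", "Glob", "Grep"}


def unused_tools(tools: list[str], body: str) -> list[str]:
    """Return granted tools that never appear as a whole-word token in body."""
    tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", body))
    return [t for t in tools if t not in IMPLICIT_TOOLS and t not in tokens]
```

For a subagent granted `Read`, `Bash`, and `WebFetch` whose body only says to run tests with Bash, this sketch reports `WebFetch`; the fix is to drop it from `tools:` or mention it where the body actually needs it.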

D012 warn settings.json hooks

Hook unbounded LLM call

Fires when a hook's command shells out to api.anthropic.com or api.openai.com via curl and passes neither --max-time nor --connect-timeout.

Why: hooks fire on every harness lifecycle event. An unbounded LLM request inside a frequently-firing hook is a runaway-cost vector.
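A sketch of the condition as a predicate over a hook's command string. The helper `hook_is_unbounded` is hypothetical and only looks for the two long-form flags named above; short forms and timeouts set by other means are not considered.

```python
import os
import shlex

PROVIDER_HOSTS = ("api.anthropic.com", "api.openai.com")
TIMEOUT_FLAGS = ("--max-time", "--connect-timeout")


def hook_is_unbounded(command: str) -> bool:
    """True if command runs curl against a provider host with no timeout flag."""
    args = shlex.split(command)
    runs_curl = any(os.path.basename(arg) == "curl" for arg in args)
    hits_provider = any(host in arg for arg in args for host in PROVIDER_HOSTS)
    has_timeout = any(arg in TIMEOUT_FLAGS for arg in args)
    return runs_curl and hits_provider and not has_timeout
```

The fix is usually a single flag, for example adding `--max-time 10` to the curl invocation, with the value chosen to fit how often the hook fires.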

D013 info .mcp.json

Stdio MCP server has no timeout

Fires when a server entry in .mcp.json with type: stdio (or no explicit type) has no timeout field.

Why: stdio servers run as child processes. Without a timeout, a stalled child blocks the harness's tool-call path indefinitely, making the agent appear hung.
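A sketch of the same condition over a parsed `.mcp.json`. It assumes server entries live under a top-level `mcpServers` object, and `stdio_servers_without_timeout` is a hypothetical helper name.

```python
def stdio_servers_without_timeout(mcp_config: dict) -> list[str]:
    """Names of stdio (or untyped) server entries that set no timeout."""
    # Assumption: entries are keyed by server name under "mcpServers".
    servers = mcp_config.get("mcpServers", {})
    return [
        name
        for name, entry in servers.items()
        # An entry with no explicit type is treated as stdio, per the rule text.
        if entry.get("type", "stdio") == "stdio" and "timeout" not in entry
    ]
```

Entries with a non-stdio `type` are skipped entirely, so a remote server without a `timeout` is outside this rule's scope.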

D015 warn Skill / subagent / command body

Body recommends a deprecated model in prose

Fires when the prose body of a skill / subagent / slash command mentions a deprecated model id from the KB list.

Why: prose recommendations propagate into derived prompts, and the frontmatter check (D009) never sees them. A line like "use claude-3-opus-20240229 for this" overrides the harness default without touching frontmatter.
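A minimal sketch of scanning a body for deprecated ids. The list here holds only the id quoted above as a stand-in for the KB, and `deprecated_mentions` is a hypothetical helper; the boundary handling is an assumption about what counts as a mention.

```python
import re

# Stand-in for the KB list: only the id quoted in the rule text.
DEPRECATED_IDS = ["claude-3-opus-20240229"]


def deprecated_mentions(body: str, deprecated_ids=DEPRECATED_IDS) -> list[tuple[int, str]]:
    """Return (line_number, model_id) for each deprecated id mentioned in body."""
    alternatives = "|".join(re.escape(model_id) for model_id in deprecated_ids)
    # Lookarounds stop a match inside a longer id, but allow trailing punctuation.
    pattern = re.compile(r"(?<![\w-])(" + alternatives + r")(?![\w-])")
    return [
        (lineno, match.group(1))
        for lineno, line in enumerate(body.splitlines(), start=1)
        for match in pattern.finditer(line)
    ]
```

The lookarounds let a sentence-final mention such as "use claude-3-opus-20240229." match while an id that merely starts with the same characters does not.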

D016 warn TS / JS workflow code

Vercel Workflow step has no fallback

Fires when a step.run(...) block in a file that imports @vercel/workflow contains a generateText / streamText / generateObject / streamObject call and the same block has none of experimental_fallback, fallbackModel, fallback:, retries:, or retry:.

Why: workflow steps are durable boundaries. A single-model failure inside a step fails the whole workflow with no recovery path.

Severity gate behavior

Use the lowest threshold your CI tolerates; a lower threshold fails the run on more findings. Most teams run warn on the required-checks branch and error everywhere else.

`--fail-on-severity`	Exit-code-1 triggers
`info` (default if unset)	any finding
`warn`	any `warn` or `error` finding
`error`	only `error` findings
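The table above amounts to a threshold comparison, sketched here as a pure function. The names `SEVERITY_ORDER` and `exit_code` are illustrative and not part of the AIOptimize CLI.

```python
# Severities in ascending order, as used throughout this reference.
SEVERITY_ORDER = {"info": 0, "warn": 1, "error": 2}


def exit_code(finding_severities: list[str], fail_on: str = "info") -> int:
    """Return 1 if any finding is at or above the fail_on threshold, else 0."""
    threshold = SEVERITY_ORDER[fail_on]
    return int(any(SEVERITY_ORDER[s] >= threshold for s in finding_severities))
```

With the default threshold a single `info` finding is enough to fail the run, so pipelines that should only block on secrets or similar would set the threshold to `error`.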