# AIOptimize KB digest — v0.1.0

_KB freshness: 2026-04-21_  
_Change window: 0.0.9 → v0.1.0_


**19 change(s) this cycle.**


## New deprecated models

- **`anthropic:claude-2.1`** — deprecated 2024-07-21 — → use `claude-sonnet-4-6`
    - Legacy model; no longer supported.
- **`anthropic:claude-3-haiku-20240307`** — deprecated 2025-10-01 — → use `claude-haiku-4-5-20251001`
    - Haiku 4.5 is 2x faster at similar quality.
- **`anthropic:claude-3-sonnet-20240229`** — deprecated 2025-04-01 — → use `claude-sonnet-4-6`
    - Sonnet 4.6 is the current mid-tier default.
- **`openai:gpt-3.5-turbo`** — deprecated 2025-01-01 — → use `gpt-4o-mini`
    - 4o-mini is cheaper and better.
- **`openai:gpt-4`** — deprecated 2025-06-01 — → use `gpt-4o`
    - 4o outperforms gpt-4 on most tasks, half the price.

## Updated deprecations

- **`anthropic:claude-2.0`** — deprecated 2024-07-21 — → use `claude-sonnet-4-6`
    - Legacy model; no longer supported.
    - reason: `Legacy model.` → `Legacy model; no longer supported.`
- **`anthropic:claude-3-opus-20240229`** — deprecated 2025-07-01 — → use `claude-opus-4-7`
    - Opus 4.x family outperforms claude-3-opus on every public benchmark, with lower latency and price.
    - replacement: `claude-opus-4-5` → `claude-opus-4-7`
    - reason: `Opus 4.x family outperforms claude-3-opus.` → `Opus 4.x family outperforms claude-3-opus on every public benchmark, with lower latency and price.`

## Deprecations lifted / reclassified

- `anthropic:claude-1-experimental` no longer in the deprecated list.

## New techniques

- **`D004`** (anthropic)
    - Rule: Anthropic calls inside interactive handlers (chat, respond, stream, cli) should set stream=True
    - Why: Streaming cuts perceived first-token latency by 3–5x. Any surface a human is waiting on benefits.
    - Docs: https://docs.anthropic.com/en/api/messages-streaming
    - Verified: 2026-04-21
- **`D005`** (anthropic)
    - Rule: Calls inside loops or bulk-named functions (backfill, batch, bulk) should use the Messages Batches API
    - Why: Batch API returns a 50% discount on standard token pricing in exchange for up to 24h of latency. Loop call sites are the canonical batch-eligible pattern.
    - Docs: https://docs.anthropic.com/en/docs/build-with-claude/batch-processing
    - Verified: 2026-04-21
- **`anthropic-batch-api`** (anthropic)
    - Rule: Non-real-time workloads (nightly jobs, backfills) should use batch API for 50% discount
    - Why: Batch API trades latency (up to 24h) for 50% off standard token pricing.
    - Docs: https://docs.anthropic.com/en/docs/build-with-claude/batch-processing
    - Verified: 2026-04-14
- **`anthropic-max-tokens`** (anthropic)
    - Rule: max_tokens should be explicitly set to bound output cost
    - Why: Unbounded max_tokens combined with streaming or agentic loops can produce thousand-dollar runs. Setting it to the smallest value the feature needs is a hard cost ceiling.
    - Docs: https://docs.anthropic.com/en/api/messages
    - Verified: 2026-04-14
- **`anthropic-streaming-interactive`** (anthropic)
    - Rule: Interactive surfaces (chat UI, agent CLI) should enable stream=True
    - Why: First-token latency improves perceived responsiveness 3–5x.
    - Docs: https://docs.anthropic.com/en/api/messages-streaming
    - Verified: 2026-04-14
- **`D006`** (openai)
    - Rule: When prompts instruct the model to return JSON, prefer response_format over prompt wording
    - Why: Structured outputs guarantee schema validity, eliminate retry loops on malformed JSON, and drop tokens otherwise spent on format instructions.
    - Docs: https://platform.openai.com/docs/guides/structured-outputs
    - Verified: 2026-04-21
- **`D008`** (openai)
    - Rule: System-style instructions ("You are…", "Act as…") belong in a system message, not the first user message
    - Why: OpenAI prioritizes system messages differently than user messages. Putting persona/behavioral rules in user content dilutes instruction-following and blocks prompt-caching reuse.
    - Docs: https://platform.openai.com/docs/guides/text-generation#messages-and-roles
    - Verified: 2026-04-21
- **`openai-structured-output`** (openai)
    - Rule: When parsing JSON from responses, use response_format={'type': 'json_schema', ...} instead of prompt-hinted JSON
    - Why: Structured output guarantees schema validity and reduces token count by eliminating verbose prompt instructions.
    - Docs: https://platform.openai.com/docs/guides/structured-outputs
    - Verified: 2026-04-14
- **`D007`** (unknown)
    - Rule: Provider API keys (sk-ant-..., sk-...) must not appear as literals in source
    - Why: Hardcoded keys leak through git history, logs, container images, and client bundles. Rotating a leaked key is expensive; a single env lookup prevents the leak.
    - Docs: https://docs.anthropic.com/en/api/getting-started
    - Verified: 2026-04-21

## Updated techniques

- **`anthropic-prompt-caching`** (anthropic)
    - Rule: System prompts >1024 tokens reused across calls should include cache_control
    - Why: Ephemeral prompt cache on Anthropic returns 90% cost reduction on cache hits and ~85% latency reduction.
    - Docs: https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching
    - Verified: 2026-04-14
    - rationale: `Ephemeral prompt cache returns 90% cost reduction on hits.` → `Ephemeral prompt cache on Anthropic returns 90% cost reduction on cache hits and ~85% latency reduction.`
    - verified_on: `2026-04-07` → `2026-04-14`

## Retired techniques

- `legacy-guideline` (openai) — no longer recommended.

---

_Delivered weekly to your Slack or inbox. Powered by the AIOptimize KB._
