Anthropic's fastest frontier model rewrites existing AU seo_content into a 14-section Billoff article. Precise instruction-following, clean structured prose, near-frontier intelligence.
V4 rewrites the existing seo_content from the AU pipeline JSON using Anthropic's Claude Haiku 4.5 (claude-haiku-4-5-20251001). No web search — same logic as step_04 in the Postclic pipeline, adapted for Billoff, English, Australia, with 4 article plan choices (A–D). It distinguishes itself from V2/V3 through exceptional instruction-following and natural prose quality.
seo_content → Rewrite prompt (Plan A/B/C/D) → Billoff-branded English article via Claude.

| Attribute | Value |
|---|---|
| Model | claude-haiku-4-5-20251001 (Anthropic) |
| API endpoint | api.anthropic.com/v1/messages |
| API version header | anthropic-version: 2023-06-01 |
| Web search | None |
| Input source | Existing seo_content from AU JSON (up to 4,000 chars) |
| Article plans | Auto / Plan A / Plan B / Plan C / Plan D |
| Streaming | Yes — Anthropic SSE (multi-event format) |
| Max output tokens | 16 000 (increased to prevent truncation) |
| Context window | 200 000 tokens |
| Avg article length | 1 800–2 200 words |
| Avg quality score | 8–9/10 |
| Cost per article | ≈ €0.011 |
| Avg generation time | 20–35 seconds |
| Cost × 1 000 articles | ≈ €11 |
| Cost × 50 000 articles | ≈ €550 (≈ €276 with prompt caching) |
```
┌─────────────────────────────────────────────────────────┐
│ SERVICE DATA (name, category, website, keywords, etc.)  │
└────────────────────────┬────────────────────────────────┘
                         │
          ┌──────────────▼──────────────────┐
          │ REWRITE INPUT BLOCK             │
          │ • Existing seo_content (~4K ch) │
          │ • Service metadata              │
          │ • Cancellation methods          │
          │ • Keywords                      │
          └──────────────┬──────────────────┘
                         │
          ┌──────────────▼──────────────────┐
          │ REWRITE PHASE                   │
          │ Rewrite Prompt (Plan A/B/C/D)   │
          │ claude-haiku-4-5-20251001       │
          │ max_tokens=16000                │
          │ + system message (brand)        │
          │ + messages [{role:user, ...}]   │
          │ stream=true                     │
          └──────────────┬──────────────────┘
                         │
                         ▼
     FINAL ARTICLE (HTML, ~1 900 words, 14 H2, 2+ tables)
```
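The REWRITE INPUT BLOCK above can be sketched as a small assembly function. This is a hypothetical illustration — the field names (`name`, `keywords`, `cancellation_methods`, …) are assumed, not the pipeline's actual schema; only the ~4,000-character cap on the existing seo_content comes from the spec.

```python
# Hypothetical sketch of the REWRITE INPUT BLOCK assembly.
# Dict keys are illustrative assumptions, not the real pipeline schema.
SEO_CONTENT_LIMIT = 4000  # matches the ~4K-char cap on existing seo_content


def build_rewrite_input(service: dict) -> str:
    """Assemble the rewrite input block from service data."""
    return "\n".join([
        f"SERVICE: {service['name']} ({service['category']})",
        f"WEBSITE: {service['website']}",
        f"KEYWORDS: {', '.join(service['keywords'])}",
        f"CANCELLATION METHODS: {', '.join(service['cancellation_methods'])}",
        "EXISTING CONTENT:",
        service["seo_content"][:SEO_CONTENT_LIMIT],  # truncate to input budget
    ])
```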
- `Billoff/functions/api/claude.js` — Cloudflare Pages Function proxy (injects ANTHROPIC_API_KEY server-side)
- `Billoff/web/assets/openai.js` → `streamClaude()` + `generateV4()` — browser-side generator
- `Billoff/scripts/04_generate_v4.py` — Python batch generator (to create)

The Anthropic API key is never exposed to the browser. All calls go through `/api/claude`:
```javascript
// functions/api/claude.js — key logic
export async function onRequestPost(context) {
  const apiKey = context.env.ANTHROPIC_API_KEY; // Cloudflare secret
  const body = await context.request.text();    // forward the raw JSON body
  const upstream = await fetch('https://api.anthropic.com/v1/messages', {
    method: 'POST',
    headers: {
      'x-api-key': apiKey,
      'anthropic-version': '2023-06-01',
      'content-type': 'application/json',
    },
    body,
  });
  return new Response(upstream.body, {
    headers: { 'Content-Type': upstream.headers.get('Content-Type') },
  });
}
```
```javascript
// openai.js — streamClaude()
await fetch('/api/claude', {
  method: 'POST',
  body: JSON.stringify({
    model: 'claude-haiku-4-5-20251001',
    max_tokens: 16000, // truncation-safe limit (see attribute table above)
    stream: true,
    system: systemMsg,
    messages: [{ role: 'user', content: prompt }],
  }),
});
```
Claude SSE uses a multi-event protocol. Unlike OpenAI (single `data:` lines), Anthropic sends pairs of `event:` + `data:` lines:
```
// Stream events (in order)
event: message_start
data: {"type":"message_start","message":{"usage":{"input_tokens":2847,"output_tokens":1}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"<h2>1"}}

// ... more content_block_delta events ...

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":1823}}

event: message_stop
data: {"type":"message_stop"}
```
The parser in streamClaude() accumulates input_tokens from message_start and output_tokens from message_delta, normalising to {prompt_tokens, completion_tokens} for cost calculation.
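That accumulation logic can be sketched in Python (the real parser lives in `streamClaude()` in JavaScript; this is a simplified stand-in that processes a fully buffered stream rather than incremental chunks):

```python
import json


def parse_claude_sse(raw: str) -> dict:
    """Minimal sketch of Anthropic event:/data: SSE parsing.

    Accumulates text from content_block_delta events, takes input_tokens
    from message_start and output_tokens from message_delta, and
    normalises usage to OpenAI-style {prompt_tokens, completion_tokens}.
    """
    text, prompt_tokens, completion_tokens = [], 0, 0
    for line in raw.splitlines():
        if not line.startswith("data: "):
            continue  # skip 'event:' lines and blanks; the data line has a type
        payload = json.loads(line[len("data: "):])
        kind = payload["type"]
        if kind == "message_start":
            prompt_tokens = payload["message"]["usage"]["input_tokens"]
        elif kind == "content_block_delta":
            text.append(payload["delta"]["text"])
        elif kind == "message_delta":
            completion_tokens = payload["usage"]["output_tokens"]
    return {
        "text": "".join(text),
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
    }
```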
V4 uses the Rewrite Prompt Template (shared with V2/V3) — an English/AU/Billoff adaptation of the Postclic PROMPT_SEO_CONTENT.txt, plus a dedicated system message:
```
You are a senior SEO content writer for Billoff (billoff.com).
CRITICAL: Always use the brand name 'Billoff'. NEVER write 'Postclic'.
Target length: 1,600–2,200 words of pure HTML.
Start directly with <h2>. No markdown.
```
The `system` field is processed separately from the user message, giving it higher priority. Sections 2 and 4–14 are identical to V1/V2/V3; Section 3 uses "Cancellation Considerations" (no competitor table — no live data).
Pricing source: docs.anthropic.com/en/docs/about-claude/pricing (verified Feb 2026)
| Component | Tokens | Rate | Cost (USD) | Cost (EUR) |
|---|---|---|---|---|
| Input (prompt + system) | ~2 900 | $1.00 / 1M | $0.00290 | €0.00267 |
| Output (article) | ~1 800 | $5.00 / 1M | $0.00900 | €0.00828 |
| TOTAL / article | ~4 700 | — | ~$0.0119 | ~€0.0109 |
| × 20 articles | — | — | $0.238 | €0.219 |
| × 1 000 articles | — | — | $11.90 | €10.95 |
| × 50 000 articles | — | — | $595 | €548 |
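The arithmetic behind the table is a straight per-token multiplication, using the Claude Haiku 4.5 rates cited above ($1.00/1M input, $5.00/1M output):

```python
# Per-article cost arithmetic from the pricing table above.
INPUT_RATE_USD = 1.00 / 1_000_000   # $1.00 per 1M input tokens
OUTPUT_RATE_USD = 5.00 / 1_000_000  # $5.00 per 1M output tokens


def article_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one article at the stated Haiku 4.5 rates."""
    return input_tokens * INPUT_RATE_USD + output_tokens * OUTPUT_RATE_USD
```

At the typical ~2,900 input / ~1,800 output tokens this gives ~$0.0119 per article, i.e. $595 for 50,000 articles.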
Prompt caching (available on Claude): if you repeatedly send the same system message + service template, Anthropic charges $0.10/1M for cached reads (vs $1.00/1M). At scale with consistent prompts, this can reduce input costs by up to 90%.
| Scenario | Without cache | With cache (90% reads) |
|---|---|---|
| 50 000 articles | $595 / €548 | ~$300 / €276 |
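To opt into caching, the shared system prompt is sent as a system block carrying a `cache_control` marker. A minimal sketch of the request payload, assuming the system text shown earlier (the exact wording here is abbreviated):

```python
# Sketch: marking the shared system prompt for Anthropic prompt caching.
# The system text is abbreviated; cache_control {"type": "ephemeral"} is the
# documented way to flag a prefix as cacheable.
def cached_request_payload(prompt: str) -> dict:
    return {
        "model": "claude-haiku-4-5-20251001",
        "max_tokens": 16000,
        "system": [{
            "type": "text",
            "text": "You are a senior SEO content writer for Billoff (billoff.com).",
            "cache_control": {"type": "ephemeral"},  # reads billed at $0.10/1M
        }],
        "messages": [{"role": "user", "content": prompt}],
    }
```

Only the constant prefix (system message + service template) benefits; the per-service user content is still billed at the normal input rate.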
```shell
# From Billoff/ directory
python scripts/04_generate_v4.py

# Compare all 4 methods on 1 service
python scripts/test_compare_3methods.py
```
```python
import anthropic

client = anthropic.Anthropic(api_key="sk-ant-...")
message = client.messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=16000,
    system="You are a senior SEO writer for Billoff...",
    messages=[{"role": "user", "content": prompt}],
)
html = message.content[0].text
usage = message.usage  # input_tokens, output_tokens
```
```shell
ANTHROPIC_API_KEY=sk-ant-api03-...
# Cloudflare: set as ANTHROPIC_API_KEY secret in Pages project settings
```
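Since each article is an independent API call, the batch generator can fan out across workers. A minimal sketch — `04_generate_v4.py` is still to be created, so the worker count and the `generate_one` callable are assumptions, not its actual interface:

```python
# Hypothetical fan-out for the batch generator; generate_one(service) would
# wrap the messages.create() call shown above.
from concurrent.futures import ThreadPoolExecutor


def generate_batch(services, generate_one, workers: int = 20):
    """Run generate_one(service) across services in parallel, preserving order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(generate_one, services))
```

Threads suffice here because the work is I/O-bound (waiting on the API), and `pool.map` keeps results in input order for writing back to the pipeline JSON.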
| ✅ Pros | ❌ Cons |
|---|---|
| Best instruction-following among V2/V3/V4 | 2–3× more expensive than V3 |
| Natural, human-like prose quality | No real-time data (pricing, competitors) |
| Consistent HTML structure (14 H2 guaranteed) | Slower than V3 Gemini Flash |
| 200K token context — handles very long inputs | Anthropic-specific SSE format (proxy required) |
| Word count targets respected within ±10% | Training cutoff limits knowledge recency |
| Prompt caching can cut costs by 90% at scale | Max 64K output tokens (generous, but fixed) |
| Fully parallelisable (100+ workers) | Brand may need explicit system enforcement |
| Highest Anthropic safety/reliability score | — |
| Priority | Best method |
|---|---|
| Maximum quality + fresh data | V1 (web research) |
| Best prose quality, no search | V4 Claude ← here |
| Best cost/quality balance | V2 or V4 |
| Maximum throughput / lowest cost | V3 Gemini Flash |
| Method | Model | Web search | Cost/article | Speed | Quality |
|---|---|---|---|---|---|
| V1 | gpt-4.1 + gpt-5-mini | ✅ 4 passes | ~€0.10 | ~120s | 9–10/10 |
| V2 | gpt-5-mini | ❌ | ~€0.011 | ~90s | 9/10 |
| V3 | gemini-2.5-flash | ❌ | ~€0.003 | ~20s | 7–8/10 |
| V4 | claude-haiku-4-5 | ❌ | ~€0.011 | ~25s | 8–9/10 |
After all 4 methods: Gemini 3.1 Pro Preview analyses all articles and produces a comparative table + recommendation (≈ €0.04/analysis).