METHOD 1

πŸ” Consumer Research (V1)

4-pass live web search via the GPT-4.1 Responses API, plus one structured-data extraction pass (Pass 5), then 2,000+ word article generation via GPT-5 Mini with pre-verified data injected.


Overview

V1 is the most comprehensive and expensive method. It performs 4 real-time web searches before generating the article, ensuring the content reflects current pricing, real competitors, and actual customer reviews.

| Attribute | Value |
| --- | --- |
| Research model | gpt-4.1 (via Responses API with web_search_preview) |
| Writing model | gpt-5-mini-2025-08-07 |
| Web searches | 4 passes (cancellation/reviews, pricing, competitors, company info) |
| Pass 5 — data extraction | Non-streaming JSON call: extracts 6 structured fields from raw research before generation |
| Avg article length | 1,900–2,400 words |
| Avg quality score | 9–10/10 |
| Cost per article | €0.09–0.12 (mainly from 4× web search at $0.025 each) |
| Avg generation time | 90–150 seconds |
| Cost × 1,000 articles | ≈ €100–120 |

Architecture

Flow

┌─────────────────────────────────────────────────────────┐
│  SERVICE DATA (from sample_services.json)               │
│  + locale fields: country, language, currency, etc.     │
└─────────────────────────┬───────────────────────────────┘
                          │
                 get_locale(svc)  ← resolves country/currency/consumer_law
                          │
          ┌───────────────▼──────────────────┐
          │         RESEARCH PHASE           │
          │                                  │
          │  Pass 1: Cancellation + Reviews  │  gpt-4.1
          │  Pass 2: {currency} Pricing      │  + web_search_preview
          │  Pass 3: 3–5 Competitors         │  + user_location (geo)
          │  Pass 4: Company Structured Info │  Responses API
          └───────────────┬──────────────────┘
                          │
                          │  raw research text (all 4 passes)
                          │
          ┌───────────────▼──────────────────┐
          │  PASS 5 — STRUCTURED EXTRACTION  │  gpt-4.1 (non-streaming)
          │  Extract from raw text:          │
          │  notice_period_days (int)        │
          │  cancellation_channels (list)    │
          │  refund_eligibility_days (int)   │
          │  statutory_cooling_off_days (int)│
          │  country_specific_law (str)      │
          │  effective_date (str)            │
          └───────────────┬──────────────────┘
                          │  research._structured (pre-verified JSON)
                          │
          ┌───────────────▼──────────────────┐
          │         GENERATION PHASE         │
          │                                  │
          │  Consumer Prompt Template        │  gpt-5-mini
          │  + Company info block            │  max_completion_tokens=8000
          │  + Research JSON (5,500 chars)   │  streaming=true
          │  + Pre-extracted schema (Pass 5) │
          │  + 14-section structure          │
          └───────────────┬──────────────────┘
                          │
                    FINAL ARTICLE
      (<h1> creative title + 14 H2, ~2,000+ words, 3+ tables)

Files involved

Research Prompts (4 passes)

Pass 1 β€” Cancellation, Reviews, Consumer Rights

Queries are jurisdiction-specific: they embed the country ISO code, current year, and market-specific review sites (ProductReview.com.au for AU, Avis Vérifiés for FR, Trusted Shops for DE, etc.).

# Template β€” variables resolved at runtime via get_locale()
# search_context_size is tier-based: "high" services → one tier below, "low" services → "low"
{cancel_word} "{name}" {country_name} ({country_code}) {year}.
Find: ALL cancellation methods (web, app, email, phone, post),
exact notice period in DAYS (integer), early termination fee amount in {currency},
cooling-off rights under {consumer_law}, refund window in days, regulatory body name.
Also find: overall rating and key complaint themes from TrustPilot,
local review sites (e.g. ProductReview.com.au / Avis Vérifiés / Trusted Shops)
or app stores.
Return JSON: {"cancellation":{"methods":[...],"steps":"...","notice_period":"...",
  "notice_period_days":null,"early_termination_fee":null,"cooling_off_days":null,
  "refund_window_days":null},"refund":{"policy":"...","window":"..."},
  "reviews":{"rating":"X/5","review_count":"~X","positive":[...],"negative":[...]}}
# geo-targeted: user_location {type:"approximate",city,region,country:country_code,timezone}
# search_context_size: "high" tier → "medium" | "medium/low" tier → "low"

Pass 5 — Structured Data Extraction (before generation)

After all 4 web searches, a dedicated non-streaming API call extracts structured fields from the raw research text. This pre-verified data is then injected into the generation prompt so the writing model never has to guess numeric facts.

# System prompt for extraction (gpt-4.1, non-streaming, ~$0.003 — see cost table):
Extract a JSON object from the research text with keys:
  notice_period_days        (int | null)
  cancellation_channels     (list of strings | null)
  refund_eligibility_days   (int | null)
  statutory_cooling_off_days (int | null)
  country_specific_law      (str | null)
  effective_date            (str | null)
Return null for unknown fields. Do NOT generate — only extract.

# Result injected into generation prompt as:
══ PRE-EXTRACTED STRUCTURED DATA (verified from research) ══
notice_period_days: 30 | refund_eligibility_days: 14 | ...
Embed these values as-is — do not contradict them.
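The Pass 5 mechanics can be sketched as two small helpers: one builds the non-streaming JSON-mode request, the other renders the extracted fields into the injection block shown above. Function names and the exact request shape are illustrative, not lifted from the script:

```python
EXTRACTION_SYSTEM = (
    "Extract a JSON object from the research text with keys: "
    "notice_period_days, cancellation_channels, refund_eligibility_days, "
    "statutory_cooling_off_days, country_specific_law, effective_date. "
    "Return null for unknown fields. Do NOT generate, only extract."
)

def build_extraction_request(raw_research: str) -> dict:
    # Non-streaming JSON-mode call; model name from the table above.
    return {
        "model": "gpt-4.1",
        "stream": False,
        "response_format": {"type": "json_object"},
        "messages": [
            {"role": "system", "content": EXTRACTION_SYSTEM},
            {"role": "user", "content": raw_research},
        ],
    }

def format_structured_block(fields: dict) -> str:
    # Renders the pre-verified data exactly as the generation prompt expects it.
    pairs = " | ".join(f"{k}: {v}" for k, v in fields.items() if v is not None)
    return (
        "══ PRE-EXTRACTED STRUCTURED DATA (verified from research) ══\n"
        f"{pairs}\n"
        "Embed these values as-is — do not contradict them."
    )
```

Null fields are dropped before injection, so the writing model only ever sees values that Pass 5 actually verified.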

Pass 2 β€” Current Pricing (locale currency)

Current {currency} pricing plans for "{name}" in {country_name} as of {year}.
List ALL plans (basic, standard, premium, business) with exact monthly AND annual prices in {currency}.
Include any free trial or free tier. Flag usage-based services with typical monthly total.
Return JSON: {"pricing":{"items":[{"plan":"...","price_monthly":"{symbol}XX/mo",
  "price_annual":"{symbol}XX/yr","savings_annual":"save {symbol}XX","features":"..."}],
  "free_trial":"yes/no, X days","typical_monthly":"{symbol}XX"}}
# search_context_size: "medium" — needs to visit actual pricing pages

Pass 3 — Competitors (3–5 guaranteed)

Find 3 to 5 real competitors to "{name}" in {country_name} ({category} category).
For each: name, official {tld} website, REAL {currency} price/month.
PRICING RULE: subscription → cheapest paid plan | utility → typical monthly bill
  | unknown → estimate "from ~{symbol}XX/mo" — NEVER "Varies by plan".
Return JSON: {"competitors":[{"name":"...","website":"https://...","price_monthly":"{symbol}XX",
  "price_annual":"...","free_trial":"yes/no","cancel_difficulty":"Low|Medium|High",
  "key_difference":"...","pros":[...],"cons":[...],"best_for":"..."}]}

Pass 4 — Structured Company Info

Structured company information for "{name}" operating in {country_name}: full legal name,
parent company, HQ, CEO, founders, founded year, employees, support email,
active users/month, App Store rating, Google Play rating, annual revenue, stock ticker.
Return JSON: {"company_info":{"full_legal_name":"...","ceo":"...","headquarters":"...", ...}}

🌍 Multi-country & Locale Support

V1 is fully locale-aware. Every web search is geo-targeted, and every generated article uses the correct language, currency, and consumer law for the target country. No code change is needed — just add locale fields to each service row in your JSON.

Locale fields in your service JSON

| Field | Example (AU) | Example (FR) | Notes |
| --- | --- | --- | --- |
| country | Australia | France | Full country name injected into the article |
| country_code | AU | FR | ISO 3166-1 alpha-2 — used for geo-targeting |
| language | English | French | Article language instruction |
| currency | AUD | EUR | Currency code for pricing queries |
| currency_symbol | A$ | € | Symbol used inline in search queries |
| country_tld | .com.au | .fr | Guides competitor website lookup |
| cancel_word | Cancel | Résiliation | Native-language cancellation term for search queries |
| consumer_law | Australian Consumer Law (ACL/ACCC) | French Consumer Code (DGCCRF) | Referenced in rights section of article |
| city / region / timezone | Sydney / New South Wales / Australia/Sydney | Paris / Île-de-France / Europe/Paris | Used for user_location geo-targeting in web searches |

Fields are optional: if missing, the script falls back to your editor config (set once in the Prompt Editor). Priority chain: svc.country → editor config → script defaults.
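A minimal sketch of that fallback chain, assuming per-field resolution (the defaults shown here are placeholders; the real get_locale in the script resolves more fields):

```python
from typing import Optional

# Placeholder defaults; the shipped script defines its own.
SCRIPT_DEFAULTS = {
    "country": "United States", "country_code": "US",
    "currency": "USD", "currency_symbol": "$", "language": "English",
}

def get_locale(svc: dict, editor_config: Optional[dict] = None) -> dict:
    """Resolve each locale field: service row, then editor config, then script default."""
    editor_config = editor_config or {}
    return {
        field: svc.get(field) or editor_config.get(field) or default
        for field, default in SCRIPT_DEFAULTS.items()
    }
```

Note that `or` also treats empty strings as missing, so a blank field in the JSON falls through to the editor config rather than producing an empty article value.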

Search query variables

All search query templates (customisable in the Editor) support these placeholders — resolved per service at runtime:

{name}          → service name
{category}      → service category
{country_name}  → e.g. "France"
{currency}      → e.g. "EUR"
{symbol}        → e.g. "€"
{cancel_word}   → e.g. "Résiliation"
{tld}           → e.g. ".fr"
{language}      → e.g. "French"
{consumer_law}  → e.g. "French Consumer Code (DGCCRF)"
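Resolution is plain str.format over these names. A sketch (resolve_query is a hypothetical helper, not a function from the script):

```python
def resolve_query(template: str, svc: dict, locale: dict) -> str:
    # Maps the documented placeholders onto the service row + resolved locale.
    return template.format(
        name=svc["name"],
        category=svc.get("category", ""),
        country_name=locale["country"],
        currency=locale["currency"],
        symbol=locale["currency_symbol"],
        cancel_word=locale.get("cancel_word", "Cancel"),
        tld=locale.get("country_tld", ".com"),
        language=locale["language"],
        consumer_law=locale.get("consumer_law", ""),
    )
```

Unused placeholders in a given template are simply ignored by format, so one resolver serves all four pass templates.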

Geo-targeted web search

Each of the 4 web search passes includes a user_location object passed to the web_search_preview tool, steering OpenAI's search toward local sources:

// Automatically built from locale fields:
user_location: {
  type:     "approximate",
  city:     "Paris",
  region:   "Île-de-France",
  country:  "FR",      // ISO country code
  timezone: "Europe/Paris",
}
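Assembled into a Responses API call, a single research pass looks roughly like this. The payload is built as a plain dict (pass it to client.responses.create(**req) with the official openai SDK); the tool field names follow OpenAI's documented web-search options:

```python
def build_search_request(query: str, locale: dict, context_size: str = "medium") -> dict:
    # One research pass: web_search_preview tool with geo-targeting.
    return {
        "model": "gpt-4.1",
        "input": query,
        "tools": [{
            "type": "web_search_preview",
            "search_context_size": context_size,  # "low" | "medium" | "high"
            "user_location": {
                "type": "approximate",
                "city": locale["city"],
                "region": locale["region"],
                "country": locale["country_code"],  # ISO 3166-1 alpha-2
                "timezone": locale["timezone"],
            },
        }],
    }
```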

50-country pipeline example

# sample_services.json β€” mix countries freely in one file
{
  "services": [
    { "name": "Netflix", "category": "tv-streaming",
      "country": "France", "country_code": "FR",
      "language": "French", "currency": "EUR", "currency_symbol": "€",
      "country_tld": ".fr", "cancel_word": "Résiliation",
      "consumer_law": "French Consumer Code (DGCCRF)",
      "city": "Paris", "region": "Île-de-France", "timezone": "Europe/Paris" },
    { "name": "Netflix", "category": "tv-streaming",
      "country": "Australia", "country_code": "AU",
      "language": "English", "currency": "AUD", "currency_symbol": "A$",
      "country_tld": ".com.au", "cancel_word": "Cancel",
      "consumer_law": "Australian Consumer Law (ACL/ACCC)",
      "city": "Sydney", "region": "New South Wales", "timezone": "Australia/Sydney" }
  ]
}

Each service gets an article in the correct language, currency, and legal context — fully automated, no prompt editing required per country.

Consumer Prompt Template (14 sections)

The full prompt injected into GPT-5 Mini after research. Key characteristics:

Prompt structure

══ VOICE & STYLE ══
Direct address ("you"), contractions, varied rhythm, genuine opinions.
Banned: "Furthermore", "Moreover", "In conclusion", "Navigating", "Delve into"…

══ TITLE ══
8 possible angles: Problem-first | Speed | Trap alert | Savings | Rights-first | …
FORBIDDEN pattern: "How to Cancel [name]: [subtitle]"

══ SERVICE DATA ══
Name, Category, Website, Main Keyword, Notes, Cancellation Address, Currency

══ PRE-EXTRACTED STRUCTURED DATA (Pass 5) ══
notice_period_days, cancellation_channels, refund_eligibility_days, …

══ STRUCTURED COMPANY DATA ══
(From Pass 4 research: CEO, HQ, Founded, Employees, Ratings, Revenue…)

══ LIVE RESEARCH DATA ══
(From Passes 1–3: cancellation steps, refund policy, reviews, pricing, competitors)

══ STRUCTURE — 14 SECTIONS ══
(Detailed instructions for each H2 with required H3 sub-sections)

══ HTML RULES ══
(Tables, links, brand, language, target length, tone, persona voice)

══ HTML RULES — enforced ══
Sentence case on ALL headings (first word + proper nouns only — no Title Case).
Title bank: 8 angles in sentence case — model picks the best fit.

══ QUALITY CHECK (self-verify before output) ══
Creative H1? 14 H2? ≥2 H3 per H2? Company fact table? Tables in 2/3/12?
FAQ? 1,600+ words? No banned phrases? Writing persona applied?
Sentence case on every heading (H1/H2/H3)?

Writing Persona

A writing persona is injected into the system message for every article. It shapes tone, vocabulary, logical connectors, and structural approach. The active persona is set in the Python script via PERSONA_ID.

| PERSONA_ID | Tone | Best for |
| --- | --- | --- |
| cancellation_specialist (default) | Friendly expert, practical, step-by-step | How-to guides, trap-avoidance content |
| consumer_rights_expert | Reassuring, empowering, rights-focused | Legal/consumer protection angle |
| contract_lawyer | Precise, methodical, authoritative | High-value / complex contract services |
| financial_advisor | Data-driven, comparative, savings-focused | Cost analysis, subscription audits |

Personas are baked into the downloaded Python script. Change PERSONA_ID = "..." at the top of the script to switch voice for a batch. The persona block is appended to the system message — it does not override brand, HTML, or locale rules.
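How the persona lands in the system message, roughly (the persona texts here are placeholder excerpts, not the script's actual blocks):

```python
PERSONA_ID = "cancellation_specialist"  # edit this line to switch voice for a batch

PERSONAS = {
    # Placeholder excerpts; the downloaded script ships full persona blocks.
    "cancellation_specialist": "Friendly expert voice. Practical, step-by-step.",
    "consumer_rights_expert": "Reassuring and empowering. Lead with the reader's rights.",
}

def build_system_message(base_rules: str, persona_id: str = PERSONA_ID) -> str:
    # Persona is appended after brand/HTML/locale rules, so those keep priority.
    return f"{base_rules}\n\n══ WRITING PERSONA ══\n{PERSONAS[persona_id]}"
```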

Cost Breakdown

| Component | Model | Cost (USD) | Cost (EUR) |
| --- | --- | --- | --- |
| Web search × 4 | gpt-4.1 Responses API | $0.100 | €0.092 |
| Pass 5 — data extraction (~500 in + 200 out tokens) | gpt-4.1 | $0.003 | €0.003 |
| Writing (~3,000 prompt + 5,500 completion tokens) | gpt-5-mini | $0.011 | €0.010 |
| TOTAL per article | — | ~$0.114 | ~€0.105 |
| × 20 articles | — | $2.28 | €2.10 |
| × 1,000 articles | — | $114 | €105 |

Note: ~88% of the cost is the 4 web searches. Pass 5 and generation are comparatively very cheap.
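The split is easy to verify from the table's own numbers:

```python
# Per-article USD costs from the breakdown above.
web_search = 4 * 0.025   # 4 search passes at $0.025 each
extraction = 0.003       # Pass 5
writing    = 0.011       # gpt-5-mini generation

total = web_search + extraction + writing   # ~0.114
search_share = web_search / total           # ~0.88
print(f"total ${total:.3f}, search share {search_share:.0%}")
```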

Python Script Reference

Running the batch generator

# From Billoff/ directory
python scripts/01_generate_v1.py

# Test 2 random services with full debug logs
python scripts/test_v1_dry_run.py

# Research-only (no generation)
python scripts/test_v1_dry_run.py --research-only

# Test specific service
python scripts/test_v1_dry_run.py --service "Spotify"

Key functions

# research_phase(service_name, category, country, currency, symbol, verbose)
# → Returns (research_dict, web_cost_usd)
# research_dict keys: cancellation, refund, reviews, pricing, competitors, company_info, _verified_sources

# generate_phase(service_dict, research_dict)
# → Returns (html_content, tokens_dict)
# tokens_dict: {prompt: N, completion: N}
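A hypothetical batch loop wiring the two documented functions together. The callables are injected as parameters so the sketch stays standalone; in the script they are module-level functions:

```python
def run_batch(services, research_phase, generate_phase):
    """Research then generate for each service; returns (articles, total web cost in USD)."""
    articles, total_web_cost = [], 0.0
    for svc in services:
        research, web_cost = research_phase(
            svc["name"], svc["category"], svc.get("country"),
            svc.get("currency"), svc.get("currency_symbol"), verbose=False,
        )
        html, tokens = generate_phase(svc, research)
        articles.append({"service": svc["name"], "html": html, "tokens": tokens})
        total_web_cost += web_cost
    return articles, total_web_cost
```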

Environment variables

OPENAI_API_KEY=sk-proj-...
# Or set in config/scraper_config.py as OPENAI_API_KEY

Quality Results (Ocado test, Feb 2025)

| Metric | Result | Target |
| --- | --- | --- |
| Word count | 2,119 | 1,600+ |
| Tables | 4 | 3+ |
| H2 sections | 14 | 14 |
| H3 sub-sections | 39 | 28+ |
| FAQ section | ✅ Present | Required |
| Company fact box | ✅ Present | Required |
| Billoff mentions | 2 | 2+ |
| Quality score | 10/10 | 9+ |

Pros & Cons

| ✅ Pros | ❌ Cons |
| --- | --- |
| Real-time pricing data (in local currency) | 10× more expensive than V2 |
| Actual competitors with verified websites | 2–3× slower than V2/V3 |
| Real customer review themes | Requires working internet access |
| Structured company data (CEO, HQ, etc.) | Web search can fail for niche services |
| Highest quality score (10/10 in tests) | Sequential by nature (rate-limit aware) |
| Best for important pages (top 50 services) | Not cost-effective at massive scale |

🔬 AI Comparative Analysis (after all 4 methods)

Once V1, V2, V3 and V4 have all generated their articles, the Lab automatically triggers a 2-phase comparative analysis:

| Phase | Model | What it does | Speed |
| --- | --- | --- | --- |
| Phase 1 — Parallel evals | claude-haiku-4-5-20251001 × 4 | One call per article (V1–V4, up to 7,000 chars each). Returns a structured JSON assessment: scores /10 per dimension (structure, E-E-A-T, SEO, tone, brand, economics), strengths, weaknesses, improvement ideas. | ~10–15 s (parallel) |
| Phase 2 — Synthesis | claude-sonnet-4-6 (Extended Thinking) | Receives the 4 compact JSON assessments (not the full articles — 3–4× smaller input). Generates the full HTML report: scorecard, E-E-A-T deep-dive, tone analysis, category winners, production recommendation, improvement plan. | ~20–30 s streaming |

Why 2 phases?

Phase 1 compresses each article into a compact JSON assessment, so the expensive synthesis model in Phase 2 reads a 3–4× smaller input. That cuts both cost and latency without losing per-article detail, and the four evaluations can run in parallel.

Analysis cost breakdown

| Component | Model | Approx. cost |
| --- | --- | --- |
| Phase 1 — 4× eval (parallel, one per method) | claude-haiku-4-5 × 4 | ≈$0.003 total |
| Phase 2 — synthesis + thinking | claude-sonnet-4-6 | ≈$0.04–0.06 |
| Total per analysis run | — | ≈$0.043–0.063 ≈ €0.04–0.06 |

Result persistence

The analysis HTML is automatically saved alongside the generation results in Cloudflare KV (BILLOFF_TESTS namespace) — shared across all browsers, devices, and private sessions. When you re-open a saved test, the analysis is restored instantly (no API call needed). For older tests without a saved analysis, a "Run Claude Analysis now" button appears.

Sections produced

  1. 📊 Full Evaluation Scorecard — 30+ criteria across 6 groups, all 5 methods, with winner per row
  2. 🏅 E-E-A-T Deep Dive — Experience, Expertise, Authoritativeness, Trustworthiness per method
  3. 🎙 Tone & Style Analysis — prose, empathy, readability per method
  4. 🏆 Category Winners — best for quality, scale, cost/quality, E-E-A-T, etc.
  5. 🚀 Production Recommendation — which method for top-100, mid-tier, long-tail, hybrid
  6. 🛠 Actionable Improvement Plan — 3–5 improvements per method with exact prompt wording / API param changes