METHOD 1

πŸ” Consumer Research (V1)

4-pass live web search via GPT-4.1 Responses API, then 2,000+ word article generation via GPT-5 Mini.

Overview

V1 is the most comprehensive and the most expensive method. It runs 4 real-time web searches before generating the article, so the content can draw on current pricing, real competitors, and actual customer reviews.

Attribute | Value
Research model | gpt-4.1 (via Responses API with web_search_preview)
Writing model | gpt-5-mini-2025-08-07
Web searches | 4 passes (general, pricing, competitors, company info)
Avg article length | 1,900–2,200 words
Avg quality score | 9–10/10
Cost per article | €0.09–0.12 (mainly from 4× web search at $0.025 each)
Avg generation time | 90–150 seconds
Cost × 1,000 articles | ≈ €100–120

Architecture

Flow

┌─────────────────────────────────────────────────────────┐
│  SERVICE DATA (from sample_services.json / EN-AU JSON)  │
└────────────────────────┬────────────────────────────────┘
                         │
         ┌───────────────▼──────────────────┐
         │         RESEARCH PHASE           │
         │                                  │
         │  Pass 1: Cancellation + Reviews  │  gpt-4.1
         │  Pass 2: Current AUD Pricing     │  + web_search_preview
         │  Pass 3: 3–5 Competitors         │  Responses API
         │  Pass 4: Company Structured Info │
         └───────────────┬──────────────────┘
                         │
                         │  research JSON (cancellation, reviews,
                         │  pricing, competitors, company_info)
                         │
         ┌───────────────▼──────────────────┐
         │         GENERATION PHASE         │
         │                                  │
         │  Consumer Prompt Template        │  gpt-5-mini
         │  + Company info block            │  max_completion_tokens=8000
         │  + Research JSON (5,500 chars)   │  streaming=true
         │  + 14-section structure          │
         └───────────────┬──────────────────┘
                         │
                   FINAL ARTICLE
                   (HTML, ~2,000 words, 14 H2, 3+ tables)
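
The generation-phase inputs in the flow above could be assembled roughly as follows. This is a minimal sketch, not the project's actual code: `build_generation_prompt`, the block labels, and the decision to hard-truncate the serialized research are assumptions. Only the three input blocks and the 5,500-character budget come from the diagram.

```python
import json

RESEARCH_CHAR_LIMIT = 5_500  # budget shown in the diagram


def build_generation_prompt(template: str, service: dict, research: dict) -> str:
    """Join the template, service data, company info and research JSON into one prompt."""
    company_info = research.get("company_info", {})
    # Assumption: company_info gets its own block, and private keys such as
    # _verified_sources are not forwarded to the writing model.
    live = {
        key: value
        for key, value in research.items()
        if key != "company_info" and not key.startswith("_")
    }
    # Assumption: a plain character cut; the real script may trim more carefully.
    research_json = json.dumps(live, ensure_ascii=False)[:RESEARCH_CHAR_LIMIT]
    blocks = [
        template,
        "SERVICE DATA\n" + json.dumps(service, ensure_ascii=False),
        "COMPANY INFO\n" + json.dumps(company_info, ensure_ascii=False),
        "RESEARCH JSON\n" + research_json,
    ]
    return "\n\n".join(blocks)
```

A hard cut can leave the research JSON syntactically incomplete, which is acceptable only because the writing model reads it as context rather than parsing it.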

Files involved

Research Prompts (4 passes)

Pass 1: Cancellation, Reviews, Consumer Rights

Cancellation policy, refund policy, user reviews and consumer rights for "{name}" in Australia.
Find: how to cancel (web, app, email), refund terms, TrustPilot or ProductReview.com.au rating and themes.
Return JSON: {"cancellation":{"items":[...]},"refund":{"policy":"..."},"reviews":{"rating":"X/5","positive":[...],"negative":[...]}}

Pass 2: Current AUD Pricing

Exact AUD pricing for "{name}" in Australia (all plans, monthly and annual).
Return JSON: {"pricing":{"items":[{"plan":"...","price_monthly":"$XX","price_annual":"$XX","features":"..."}]}}

Pass 3: Competitors (3–5 guaranteed)

Find 3 to 5 real competitors to "{name}" in {category} available in Australia.
For each: name, official website, AUD price/month, key difference, pros, cons.
Return JSON: {"competitors":[{"name":"...","website":"https://...","price_monthly":"$XX","key_difference":"...","pros":[...],"cons":[...]}]}
# Retry once if fewer than 3 competitors found
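
The retry rule for Pass 3 might look like the sketch below. It is illustrative only: `find_competitors` and the injected `search` callable are hypothetical names, and keeping the larger of the two result sets is an assumption about how the retry is resolved.

```python
from typing import Callable

MIN_COMPETITORS = 3
MAX_COMPETITORS = 5


def find_competitors(search: Callable[[], list[dict]], max_attempts: int = 2) -> list[dict]:
    """Call search() up to max_attempts times (one retry by default)."""
    best: list[dict] = []
    for _ in range(max_attempts):
        found = search()
        # Keep whichever attempt returned more competitors.
        if len(found) > len(best):
            best = found
        if len(best) >= MIN_COMPETITORS:
            break
    return best[:MAX_COMPETITORS]
```

If both attempts return fewer than 3 competitors, the sketch returns what it has, so the caller still needs to handle a short list.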

Pass 4: Structured Company Info

Structured company information: full legal name, parent company, HQ, CEO, founders, founded year,
number of employees, support email, active users/month, App Store rating, Google Play rating,
annual revenue, stock ticker.
Return JSON: {"company_info":{"full_legal_name":"...","ceo":"...","headquarters":"...","founded_year":"...", ...}}
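
Each pass asks the model to return JSON, but a reply may still wrap the object in prose. One way to extract and merge the four replies into a single research dict is sketched here; `extract_json` and `merge_passes` are hypothetical helpers, not functions confirmed to exist in the scripts.

```python
import json


def extract_json(text: str) -> dict:
    """Return the first JSON object that parses inside text, or {} if none does."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    return {}


def merge_passes(raw_replies: list[str]) -> dict:
    """Merge the top-level keys of every pass reply into one research dict."""
    research: dict = {}
    for reply in raw_replies:
        research.update(extract_json(reply))
    return research
```

Because the four prompts return disjoint top-level keys (cancellation, refund, reviews, pricing, competitors, company_info), a plain `dict.update` is enough in this sketch.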

Consumer Prompt Template (14 sections)

This is the full prompt sent to GPT-5 Mini once research is complete. Key characteristics:

Prompt structure

══ SERVICE DATA ══
Name, Category, Website, Main Keyword, Notes, Cancellation Address, Currency

══ STRUCTURED COMPANY DATA ══
(From Pass 4 research: CEO, HQ, Founded, Employees, Ratings, Revenue…)

══ LIVE RESEARCH DATA ══
(From Passes 1–3: cancellation steps, refund policy, reviews, pricing, competitors)

══ STRUCTURE: 14 SECTIONS ══
(Detailed instructions for each H2 with required H3 sub-sections)

══ HTML RULES ══
(Tables, links, brand, language, target length, tone)

══ QUALITY CHECK (self-verify before output) ══
14 H2? ≥2 H3 per H2? Company fact table? Tables in 2/3/12? FAQ? 1600+ words?
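
The self-check in the prompt is carried out by the model, but the countable parts of it can also be verified after generation. The sketch below uses only the standard library; `check_article` is a hypothetical helper, and it covers just the structural checks (H2 count, H3 per H2, table count, word count), not the FAQ or company fact table.

```python
from html.parser import HTMLParser


class _ArticleCounter(HTMLParser):
    """Counts headings, tables and words in an HTML article."""

    def __init__(self) -> None:
        super().__init__()
        self.tables = 0
        self.words = 0
        self.h3_per_h2: list[int] = []  # one entry per H2, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.h3_per_h2.append(0)
        elif tag == "h3" and self.h3_per_h2:
            self.h3_per_h2[-1] += 1
        elif tag == "table":
            self.tables += 1

    def handle_data(self, data):
        self.words += len(data.split())


def check_article(html: str) -> dict:
    """Apply the countable targets: 14 H2, >=2 H3 per H2, 3+ tables, 1600+ words."""
    counter = _ArticleCounter()
    counter.feed(html)
    counter.close()
    return {
        "h2_ok": len(counter.h3_per_h2) == 14,
        "h3_ok": bool(counter.h3_per_h2) and all(n >= 2 for n in counter.h3_per_h2),
        "tables_ok": counter.tables >= 3,
        "words_ok": counter.words >= 1600,
    }
```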

Cost Breakdown

Component | Model | Cost (USD) | Cost (EUR)
Web search × 4 | gpt-4.1 Responses API | $0.100 | €0.092
Writing (~3,000 prompt + 5,500 completion tokens) | gpt-5-mini | $0.011 | €0.010
TOTAL per article | - | ~$0.111 | ~€0.102
× 20 articles | - | $2.22 | €2.04
× 1,000 articles | - | $111 | €102

Note: about 90% of the cost ($0.100 of ~$0.111) comes from the 4 web searches. The generation itself is very cheap.
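
The arithmetic behind the breakdown can be reproduced in a few lines. The per-search and writing costs are taken from the table; the USD to EUR rate of 0.92 is an assumption inferred from the table's own figures ($0.100 to €0.092), not a quoted exchange rate.

```python
WEB_SEARCH_USD = 0.025   # per search, from the overview table
SEARCH_PASSES = 4
WRITING_USD = 0.011      # gpt-5-mini writing cost, from the breakdown
USD_TO_EUR = 0.92        # assumption: implied by $0.100 -> EUR 0.092


def cost_per_article_usd() -> float:
    """Search cost plus writing cost for one article."""
    return SEARCH_PASSES * WEB_SEARCH_USD + WRITING_USD


def batch_cost(articles: int) -> tuple[float, float]:
    """Return (USD, EUR) for a batch, rounded to cents."""
    usd = articles * cost_per_article_usd()
    return round(usd, 2), round(usd * USD_TO_EUR, 2)


def search_share() -> float:
    """Fraction of the per-article cost that comes from web search."""
    return SEARCH_PASSES * WEB_SEARCH_USD / cost_per_article_usd()
```

With these inputs, `batch_cost(20)` gives the $2.22 and €2.04 shown in the table, and `search_share()` comes out at roughly 0.90.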

Python Script Reference

Running the batch generator

# From Billoff/ directory
python scripts/01_generate_v1.py

# Test 2 random services with full debug logs
python scripts/test_v1_dry_run.py

# Research-only (no generation)
python scripts/test_v1_dry_run.py --research-only

# Test specific service
python scripts/test_v1_dry_run.py --service "Spotify"

Key functions

# research_phase(service_name, category, country, currency, symbol, verbose)
# → Returns (research_dict, web_cost_usd)
# research_dict keys: cancellation, refund, reviews, pricing, competitors, company_info, _verified_sources

# generate_phase(service_dict, research_dict)
# → Returns (html_content, tokens_dict)
# tokens_dict: {prompt: N, completion: N}
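
Chaining the two functions for one service could look like the sketch below. The signatures and return shapes follow the comments above; `run_v1`, the `name` and `category` keys on the service dict, and the hard-coded Australian locale arguments are assumptions made for illustration. The phases are passed in as callables so the sketch does not depend on the real module layout.

```python
from typing import Callable

ResearchFn = Callable[..., tuple[dict, float]]
GenerateFn = Callable[[dict, dict], tuple[str, dict]]


def run_v1(service: dict, research_phase: ResearchFn, generate_phase: GenerateFn) -> dict:
    """Run research, then generation, and collect the outputs for one service."""
    research, web_cost_usd = research_phase(
        service["name"],      # assumed key
        service["category"],  # assumed key
        "Australia",          # country (assumed value)
        "AUD",                # currency (assumed value)
        "$",                  # symbol (assumed value)
        False,                # verbose
    )
    html, tokens = generate_phase(service, research)
    return {
        "html": html,
        "web_cost_usd": web_cost_usd,
        "tokens": tokens,
        "sources": research.get("_verified_sources", []),
    }
```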

Environment variables

OPENAI_API_KEY=sk-proj-...
# Or set in config/scraper_config.py as OPENAI_API_KEY

Quality Results (Ocado test, Feb 2025)

Metric | Result | Target
Word count | 2,119 | 1,600+
Tables | 4 | 3+
H2 sections | 14 | 14
H3 sub-sections | 39 | 28+
FAQ section | ✅ Present | Required
Company fact box | ✅ Present | Required
Billoff mentions | 2 | 2+
Quality score | 10/10 | 9+

Pros & Cons

✅ Pros | ❌ Cons
Real-time pricing data (current AUD) | 10× more expensive than V2
Actual competitors with verified websites | 2–3× slower than V2/V3
Real customer review themes | Requires working internet access
Structured company data (CEO, HQ, etc.) | Web search can fail for niche services
Highest quality score (10/10 in tests) | Sequential by nature (rate limit aware)
Best for important pages (top 50 services) | Not cost-effective at massive scale