METHOD 1

πŸ” Consumer Research (V1)

4-pass live web search via the GPT-4.1 Responses API, plus one structured-data extraction pass (Pass 5), then 2,000+ word article generation via GPT-5 Mini with pre-verified data injected.


Overview

V1 is the most comprehensive and expensive method. It performs 4 real-time web searches before generating the article, ensuring the content reflects current pricing, real competitors, and actual customer reviews.

| Attribute | Value |
| --- | --- |
| Research model | gpt-4.1 (via Responses API with web_search_preview) |
| Writing model | gpt-5-mini-2025-08-07 |
| Web searches | 4 passes (cancellation/reviews, pricing, competitors, company info) |
| Pass 5 — data extraction | Non-streaming JSON call: extracts 6 structured fields from raw research before generation |
| Avg article length | 1,900–2,400 words |
| Avg quality score | 9–10/10 |
| Cost per article | €0.09–0.12 (mainly from 4× web search at $0.025 each) |
| Avg generation time | 90–150 seconds |
| Cost × 1,000 articles | ≈ €100–120 |

Architecture

Flow

┌─────────────────────────────────────────────────────────┐
│  SERVICE DATA (from sample_services.json)               │
│  + locale fields: country, language, currency, etc.     │
└─────────────────────────┬───────────────────────────────┘
                          │
                 get_locale(svc)  ← resolves country/currency/consumer_law
                          │
          ┌───────────────▼──────────────────┐
          │         RESEARCH PHASE           │
          │                                  │
          │  Pass 1: Cancellation + Reviews  │  gpt-4.1
          │  Pass 2: {currency} Pricing      │  + web_search_preview
          │  Pass 3: 3–5 Competitors         │  + user_location (geo)
          │  Pass 4: Company Structured Info │  Responses API
          └───────────────┬──────────────────┘
                          │
                          │  raw research text (all 4 passes)
                          │
          ┌───────────────▼──────────────────┐
          │  PASS 5 — STRUCTURED EXTRACTION  │  gpt-4.1 (non-streaming)
          │  Extract from raw text:          │
          │  notice_period_days (int)        │
          │  cancellation_channels (list)    │
          │  refund_eligibility_days (int)   │
          │  statutory_cooling_off_days (int)│
          │  country_specific_law (str)      │
          │  effective_date (str)            │
          └───────────────┬──────────────────┘
                          │  research._structured (pre-verified JSON)
                          │
          ┌───────────────▼──────────────────┐
          │         GENERATION PHASE         │
          │                                  │
          │  Consumer Prompt Template        │  gpt-5-mini
          │  + Company info block            │  max_completion_tokens=8000
          │  + Research JSON (5,500 chars)   │  streaming=true
          │  + Pre-extracted schema (Pass 5) │
          │  + 14-section structure          │
          └───────────────┬──────────────────┘
                          │
                    FINAL ARTICLE
      (<h1> creative title + 14 H2, ~2,000+ words, 3+ tables)

Files involved

Research Prompts (4 passes)

Pass 1 β€” Cancellation, Reviews, Consumer Rights

Queries are jurisdiction-specific: they embed the country ISO code, current year, and market-specific review sites (ProductReview.com.au for AU, Avis Vérifiés for FR, Trusted Shops for DE, etc.).

# Template β€” variables resolved at runtime via get_locale()
# search_context_size is tier-based: "high" services → one tier below, "low" services → "low"
{cancel_word} "{name}" {country_name} ({country_code}) {year}.
Find: ALL cancellation methods (web, app, email, phone, post),
exact notice period in DAYS (integer), early termination fee amount in {currency},
cooling-off rights under {consumer_law}, refund window in days, regulatory body name.
Also find: overall rating and key complaint themes from TrustPilot,
local review sites (e.g. ProductReview.com.au / Avis Vérifiés / Trusted Shops)
or app stores.
Return JSON: {"cancellation":{"methods":[...],"steps":"...","notice_period":"...",
  "notice_period_days":null,"early_termination_fee":null,"cooling_off_days":null,
  "refund_window_days":null},"refund":{"policy":"...","window":"..."},
  "reviews":{"rating":"X/5","review_count":"~X","positive":[...],"negative":[...]}}
# geo-targeted: user_location {type:"approximate",city,region,country:country_code,timezone}
# search_context_size: "high" tier → "medium" | "medium/low" tier → "low"

Pass 5 — Structured Data Extraction (before generation)

After all 4 web searches, a dedicated non-streaming API call extracts structured fields from the raw research text. This pre-verified data is then injected into the generation prompt so the writing model never has to guess numeric facts.

# System prompt for extraction (gpt-4.1, non-streaming, ~$0.003 — see cost table):
Extract a JSON object from the research text with keys:
  notice_period_days        (int | null)
  cancellation_channels     (list of strings | null)
  refund_eligibility_days   (int | null)
  statutory_cooling_off_days (int | null)
  country_specific_law      (str | null)
  effective_date            (str | null)
Return null for unknown fields. Do NOT generate — only extract.

# Result injected into generation prompt as:
══ PRE-EXTRACTED STRUCTURED DATA (verified from research) ══
notice_period_days: 30 | refund_eligibility_days: 14 | ...
Embed these values as-is — do not contradict them.
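The Pass 5 mechanics can be sketched as two small helpers: one builds the non-streaming JSON-mode request, the other renders the extracted fields into the injection block shown above. Function names and the exact request shape are illustrative, not lifted from the script:

```python
EXTRACTION_SYSTEM = (
    "Extract a JSON object from the research text with keys: "
    "notice_period_days, cancellation_channels, refund_eligibility_days, "
    "statutory_cooling_off_days, country_specific_law, effective_date. "
    "Return null for unknown fields. Do NOT generate, only extract."
)

def build_extraction_request(raw_research: str) -> dict:
    # Non-streaming JSON-mode call; model name from the table above.
    return {
        "model": "gpt-4.1",
        "stream": False,
        "response_format": {"type": "json_object"},
        "messages": [
            {"role": "system", "content": EXTRACTION_SYSTEM},
            {"role": "user", "content": raw_research},
        ],
    }

def format_structured_block(fields: dict) -> str:
    # Renders the pre-verified data exactly as the generation prompt expects it.
    pairs = " | ".join(f"{k}: {v}" for k, v in fields.items() if v is not None)
    return (
        "══ PRE-EXTRACTED STRUCTURED DATA (verified from research) ══\n"
        f"{pairs}\n"
        "Embed these values as-is — do not contradict them."
    )
```

Null fields are dropped before injection, so the writing model only ever sees values that Pass 5 actually verified.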

Pass 2 β€” Current Pricing (locale currency)

Current {currency} pricing plans for "{name}" in {country_name} as of {year}.
List ALL plans (basic, standard, premium, business) with exact monthly AND annual prices in {currency}.
Include any free trial or free tier. Flag usage-based services with typical monthly total.
Return JSON: {"pricing":{"items":[{"plan":"...","price_monthly":"{symbol}XX/mo",
  "price_annual":"{symbol}XX/yr","savings_annual":"save {symbol}XX","features":"..."}],
  "free_trial":"yes/no, X days","typical_monthly":"{symbol}XX"}}
# search_context_size: "medium" — needs to visit actual pricing pages

Pass 3 — Competitors (3–5 guaranteed)

Find 3 to 5 real competitors to "{name}" in {country_name} ({category} category).
For each: name, official {tld} website, REAL {currency} price/month.
PRICING RULE: subscription → cheapest paid plan | utility → typical monthly bill
  | unknown → estimate "from ~{symbol}XX/mo" — NEVER "Varies by plan".
Return JSON: {"competitors":[{"name":"...","website":"https://...","price_monthly":"{symbol}XX",
  "price_annual":"...","free_trial":"yes/no","cancel_difficulty":"Low|Medium|High",
  "key_difference":"...","pros":[...],"cons":[...],"best_for":"..."}]}

Pass 4 — Structured Company Info

Structured company information for "{name}" operating in {country_name}: full legal name,
parent company, HQ, CEO, founders, founded year, employees, support email,
active users/month, App Store rating, Google Play rating, annual revenue, stock ticker.
Return JSON: {"company_info":{"full_legal_name":"...","ceo":"...","headquarters":"...", ...}}

🌍 Multi-country & Locale Support

V1 is fully locale-aware. Every web search is geo-targeted, and every generated article uses the correct language, currency, and consumer law for the target country. No code change is needed — just add locale fields to each service row in your JSON.

Locale fields in your service JSON

| Field | Example (AU) | Example (FR) | Notes |
| --- | --- | --- | --- |
| country | Australia | France | Full country name injected into the article |
| country_code | AU | FR | ISO 3166-1 alpha-2 — used for geo-targeting |
| language | English | French | Article language instruction |
| currency | AUD | EUR | Currency code for pricing queries |
| currency_symbol | A$ | € | Symbol used inline in search queries |
| country_tld | .com.au | .fr | Guides competitor website lookup |
| cancel_word | Cancel | Résiliation | Native-language cancellation term for search queries |
| consumer_law | Australian Consumer Law (ACL/ACCC) | French Consumer Code (DGCCRF) | Referenced in rights section of article |
| city / region / timezone | Sydney / New South Wales / Australia/Sydney | Paris / Île-de-France / Europe/Paris | Used for user_location geo-targeting in web searches |

Fields are optional: if missing, the script falls back to your editor config (set once in the Prompt Editor). Priority chain: svc.country → editor config → script defaults.
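A minimal sketch of that fallback chain, assuming per-field resolution (the defaults shown here are placeholders; the real get_locale in the script resolves more fields):

```python
from typing import Optional

# Placeholder defaults; the shipped script defines its own.
SCRIPT_DEFAULTS = {
    "country": "United States", "country_code": "US",
    "currency": "USD", "currency_symbol": "$", "language": "English",
}

def get_locale(svc: dict, editor_config: Optional[dict] = None) -> dict:
    """Resolve each locale field: service row, then editor config, then script default."""
    editor_config = editor_config or {}
    return {
        field: svc.get(field) or editor_config.get(field) or default
        for field, default in SCRIPT_DEFAULTS.items()
    }
```

Note that `or` also treats empty strings as missing, so a blank field in the JSON falls through to the editor config rather than producing an empty article value.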

Search query variables

All search query templates (customisable in the Editor) support these placeholders — resolved per service at runtime:

{name}          → service name
{category}      → service category
{country_name}  → e.g. "France"
{currency}      → e.g. "EUR"
{symbol}        → e.g. "€"
{cancel_word}   → e.g. "Résiliation"
{tld}           → e.g. ".fr"
{language}      → e.g. "French"
{consumer_law}  → e.g. "French Consumer Code (DGCCRF)"
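Resolution is plain str.format over these names. A sketch (resolve_query is a hypothetical helper, not a function from the script):

```python
def resolve_query(template: str, svc: dict, locale: dict) -> str:
    # Maps the documented placeholders onto the service row + resolved locale.
    return template.format(
        name=svc["name"],
        category=svc.get("category", ""),
        country_name=locale["country"],
        currency=locale["currency"],
        symbol=locale["currency_symbol"],
        cancel_word=locale.get("cancel_word", "Cancel"),
        tld=locale.get("country_tld", ".com"),
        language=locale["language"],
        consumer_law=locale.get("consumer_law", ""),
    )
```

Unused placeholders in a given template are simply ignored by format, so one resolver serves all four pass templates.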

Geo-targeted web search

Each of the 4 web search passes includes a user_location object passed to the web_search_preview tool, steering OpenAI's search toward local sources:

// Automatically built from locale fields:
user_location: {
  type:     "approximate",
  city:     "Paris",
  region:   "Île-de-France",
  country:  "FR",      // ISO country code
  timezone: "Europe/Paris",
}
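Assembled into a Responses API call, a single research pass looks roughly like this. The payload is built as a plain dict (pass it to client.responses.create(**req) with the official openai SDK); the tool field names follow OpenAI's documented web-search options:

```python
def build_search_request(query: str, locale: dict, context_size: str = "medium") -> dict:
    # One research pass: web_search_preview tool with geo-targeting.
    return {
        "model": "gpt-4.1",
        "input": query,
        "tools": [{
            "type": "web_search_preview",
            "search_context_size": context_size,  # "low" | "medium" | "high"
            "user_location": {
                "type": "approximate",
                "city": locale["city"],
                "region": locale["region"],
                "country": locale["country_code"],  # ISO 3166-1 alpha-2
                "timezone": locale["timezone"],
            },
        }],
    }
```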

50-country pipeline example

# sample_services.json β€” mix countries freely in one file
{
  "services": [
    { "name": "Netflix", "category": "tv-streaming",
      "country": "France", "country_code": "FR",
      "language": "French", "currency": "EUR", "currency_symbol": "€",
      "country_tld": ".fr", "cancel_word": "Résiliation",
      "consumer_law": "French Consumer Code (DGCCRF)",
      "city": "Paris", "region": "Île-de-France", "timezone": "Europe/Paris" },
    { "name": "Netflix", "category": "tv-streaming",
      "country": "Australia", "country_code": "AU",
      "language": "English", "currency": "AUD", "currency_symbol": "A$",
      "country_tld": ".com.au", "cancel_word": "Cancel",
      "consumer_law": "Australian Consumer Law (ACL/ACCC)",
      "city": "Sydney", "region": "New South Wales", "timezone": "Australia/Sydney" }
  ]
}

Each service gets an article in the correct language, currency, and legal context — fully automated, no prompt editing required per country.

Consumer Prompt Template (14 sections)

The full prompt injected into GPT-5 Mini after research. Key characteristics:

Prompt structure

══ VOICE & STYLE ══
Direct address ("you"), contractions, varied rhythm, genuine opinions.
Banned: "Furthermore", "Moreover", "In conclusion", "Navigating", "Delve into"…

══ TITLE ══
8 possible angles: Problem-first | Speed | Trap alert | Savings | Rights-first | …
FORBIDDEN pattern: "How to Cancel [name]: [subtitle]"

══ SERVICE DATA ══
Name, Category, Website, Main Keyword, Notes, Cancellation Address, Currency

══ PRE-EXTRACTED STRUCTURED DATA (Pass 5) ══
notice_period_days, cancellation_channels, refund_eligibility_days, …

══ STRUCTURED COMPANY DATA ══
(From Pass 4 research: CEO, HQ, Founded, Employees, Ratings, Revenue…)

══ LIVE RESEARCH DATA ══
(From Passes 1–3: cancellation steps, refund policy, reviews, pricing, competitors)

══ STRUCTURE — 14 SECTIONS ══
(Detailed instructions for each H2 with required H3 sub-sections)

══ HTML RULES ══
(Tables, links, brand, language, target length, tone, persona voice)

══ HTML RULES — enforced ══
Sentence case on ALL headings (first word + proper nouns only — no Title Case).
Title bank: 8 angles in sentence case — model picks the best fit.

══ QUALITY CHECK (self-verify before output) ══
Creative H1? 14 H2? ≥2 H3 per H2? Company fact table? Tables in 2/3/12?
FAQ? 1,600+ words? No banned phrases? Writing persona applied?
Sentence case on every heading (H1/H2/H3)?

Writing Persona

A writing persona is injected into the system message for every article. It shapes tone, vocabulary, logical connectors, and structural approach. The active persona is set in the Python script via PERSONA_ID.

| PERSONA_ID | Tone | Best for |
| --- | --- | --- |
| cancellation_specialist (default) | Friendly expert, practical, step-by-step | How-to guides, trap-avoidance content |
| consumer_rights_expert | Reassuring, empowering, rights-focused | Legal/consumer protection angle |
| contract_lawyer | Precise, methodical, authoritative | High-value / complex contract services |
| financial_advisor | Data-driven, comparative, savings-focused | Cost analysis, subscription audits |

Personas are baked into the downloaded Python script. Change PERSONA_ID = "..." at the top of the script to switch voice for a batch. The persona block is appended to the system message — it does not override brand, HTML, or locale rules.
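How the persona lands in the system message, roughly (the persona texts here are placeholder excerpts, not the script's actual blocks):

```python
PERSONA_ID = "cancellation_specialist"  # edit this line to switch voice for a batch

PERSONAS = {
    # Placeholder excerpts; the downloaded script ships full persona blocks.
    "cancellation_specialist": "Friendly expert voice. Practical, step-by-step.",
    "consumer_rights_expert": "Reassuring and empowering. Lead with the reader's rights.",
}

def build_system_message(base_rules: str, persona_id: str = PERSONA_ID) -> str:
    # Persona is appended after brand/HTML/locale rules, so those keep priority.
    return f"{base_rules}\n\n══ WRITING PERSONA ══\n{PERSONAS[persona_id]}"
```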

Cost Breakdown

| Component | Model | Cost (USD) | Cost (EUR) |
| --- | --- | --- | --- |
| Web search × 4 | gpt-4.1 Responses API | $0.100 | €0.092 |
| Pass 5 — data extraction (~500 in + 200 out tokens) | gpt-4.1 | $0.003 | €0.003 |
| Writing (~3,000 prompt + 5,500 completion tokens) | gpt-5-mini | $0.011 | €0.010 |
| TOTAL per article | — | ~$0.114 | ~€0.105 |
| × 20 articles | — | $2.28 | €2.10 |
| × 1,000 articles | — | $114 | €105 |

Note: ~88% of the cost is the 4 web searches. Pass 5 and generation are comparatively very cheap.
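The split is easy to verify from the table's own numbers:

```python
# Per-article USD costs from the breakdown above.
web_search = 4 * 0.025   # 4 search passes at $0.025 each
extraction = 0.003       # Pass 5
writing    = 0.011       # gpt-5-mini generation

total = web_search + extraction + writing   # ~0.114
search_share = web_search / total           # ~0.88
print(f"total ${total:.3f}, search share {search_share:.0%}")
```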

Python Script Reference

Running the batch generator

# From Billoff/ directory
python scripts/01_generate_v1.py

# Test 2 random services with full debug logs
python scripts/test_v1_dry_run.py

# Research-only (no generation)
python scripts/test_v1_dry_run.py --research-only

# Test specific service
python scripts/test_v1_dry_run.py --service "Spotify"

Key functions

# research_phase(service_name, category, country, currency, symbol, verbose)
# → Returns (research_dict, web_cost_usd)
# research_dict keys: cancellation, refund, reviews, pricing, competitors, company_info, _verified_sources

# generate_phase(service_dict, research_dict)
# → Returns (html_content, tokens_dict)
# tokens_dict: {prompt: N, completion: N}
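A hypothetical batch loop wiring the two documented functions together. The callables are injected as parameters so the sketch stays standalone; in the script they are module-level functions:

```python
def run_batch(services, research_phase, generate_phase):
    """Research then generate for each service; returns (articles, total web cost in USD)."""
    articles, total_web_cost = [], 0.0
    for svc in services:
        research, web_cost = research_phase(
            svc["name"], svc["category"], svc.get("country"),
            svc.get("currency"), svc.get("currency_symbol"), verbose=False,
        )
        html, tokens = generate_phase(svc, research)
        articles.append({"service": svc["name"], "html": html, "tokens": tokens})
        total_web_cost += web_cost
    return articles, total_web_cost
```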

Environment variables

OPENAI_API_KEY=sk-proj-...
# Or set in config/scraper_config.py as OPENAI_API_KEY

Quality Results (Ocado test, Feb 2025)

| Metric | Result | Target |
| --- | --- | --- |
| Word count | 2,119 | 1,600+ |
| Tables | 4 | 3+ |
| H2 sections | 14 | 14 |
| H3 sub-sections | 39 | 28+ |
| FAQ section | ✅ Present | Required |
| Company fact box | ✅ Present | Required |
| Billoff mentions | 2 | 2+ |
| Quality score | 10/10 | 9+ |

Pros & Cons

| ✅ Pros | ❌ Cons |
| --- | --- |
| Real-time pricing data (in local currency) | 10× more expensive than V2 |
| Actual competitors with verified websites | 2–3× slower than V2/V3 |
| Real customer review themes | Requires working internet access |
| Structured company data (CEO, HQ, etc.) | Web search can fail for niche services |
| Highest quality score (10/10 in tests) | Sequential by nature (rate-limit aware) |
| Best for important pages (top 50 services) | Not cost-effective at massive scale |

🔬 AI Comparative Analysis (after all 4 methods)

Once V1, V2, V3 and V4 have all generated their articles, the Lab automatically triggers a 2-phase comparative analysis:

| Phase | Model | What it does | Speed |
| --- | --- | --- | --- |
| Phase 1 — Parallel evals | claude-haiku-4-5-20251001 × 4 | One call per article (V1–V4, up to 7,000 chars each). Returns a structured JSON assessment: scores /10 per dimension (structure, E-E-A-T, SEO, tone, brand, economics), strengths, weaknesses, improvement ideas. | ~10–15 s (parallel) |
| Phase 2 — Synthesis | claude-sonnet-4-6 (Extended Thinking) | Receives the 4 compact JSON assessments (not the full articles — 3–4× smaller input). Generates the full HTML report: scorecard, E-E-A-T deep-dive, tone analysis, category winners, production recommendation, improvement plan. | ~20–30 s streaming |

Why 2 phases?

Phase 1 compresses each article into a compact JSON assessment, so the expensive synthesis model in Phase 2 reads a 3–4× smaller input. That cuts both cost and latency without losing per-article detail, and the four evaluations can run in parallel.

Analysis cost breakdown

| Component | Model | Approx. cost |
| --- | --- | --- |
| Phase 1 — 4× eval (parallel, one per method) | claude-haiku-4-5 × 4 | ≈$0.003 total |
| Phase 2 — synthesis + thinking | claude-sonnet-4-6 | ≈$0.04–0.06 |
| Total per analysis run | — | ≈$0.043–0.063 ≈ €0.04–0.06 |

Result persistence

The analysis HTML is automatically saved alongside the generation results in Cloudflare KV (BILLOFF_TESTS namespace) — shared across all browsers, devices, and private sessions. When you re-open a saved test, the analysis is restored instantly (no API call needed). For older tests without a saved analysis, a "Run Claude Analysis now" button appears.

Sections produced

  1. 📊 Full Evaluation Scorecard — 30+ criteria across 6 groups, all 5 methods, with winner per row
  2. 🏅 E-E-A-T Deep Dive — Experience, Expertise, Authoritativeness, Trustworthiness per method
  3. 🎙 Tone & Style Analysis — prose, empathy, readability per method
  4. 🏆 Category Winners — best for quality, scale, cost/quality, E-E-A-T, etc.
  5. 🚀 Production Recommendation — which method for top-100, mid-tier, long-tail, hybrid
  6. 🛠 Actionable Improvement Plan — 3–5 improvements per method with exact prompt wording / API param changes