Developer Integration Guide

πŸš€ From Postclic DB to Published Page

Everything your developer needs to wire up the Billoff pipeline: choosing a generation method, validating data, verifying costs, and keeping the site in sync automatically.

Scripts β€” download all scripts you need

Pipeline at a Glance

Each country runs through this sequence. The JSON export is the entry point β€” every subsequent script enriches it in-place.

πŸ“¦ Postclic data source β€” JSON export, direct DB query, API call… your choice JSON input
↓
πŸ›‘οΈ Guardrail β€” validate & auto-fix missing fields billoff_guardrail.py
↓
πŸ“ Article generation β€” choose V1, V2, V3, or V4 billoff_v1/v2/v3/v4.py
↓
🏷️ Meta SEO β€” title Β· H1 Β· description Β· slug Β· FAQ billoff_v5_metadata.py
↓
βœ… Enriched JSON ready to publish import to site
Data source: however you extract the Postclic services (file export, direct DB query, API, custom script), the Billoff pipeline just expects JSON as input.
File strategy: Every script reads and writes back to the same JSON file. Keep a backup copy before running anything at scale.
Format: Scripts accept both {"services": [...]} (pipeline format) and a plain [...] array.
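Both accepted shapes can be normalized with a small loader. This is an illustrative sketch (`load_services` is not part of the shipped scripts); it returns the service list plus the original root object so country metadata survives a round-trip write:

```python
import json

def load_services(path):
    """Return (services, root) for both accepted JSON shapes.

    root is the original top-level object, so metadata such as
    "country" survives when you write the file back.
    """
    with open(path, encoding="utf-8") as f:
        root = json.load(f)
    services = root["services"] if isinstance(root, dict) else root
    return services, root
```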

⚠️ Before You Run Anything

Read this carefully before running at scale.

Step 1 β€” Choose Your Article Generation Method

Each method produces a standalone Python script. Go to the corresponding docs page, set your editor config if needed, then click Python Script ✦ live config.

| Method | Model | Input | Quality | Cost / article | Speed |
|---|---|---|---|---|---|
| V1 β€” Research | GPT-4.1 β†’ GPT-5 Mini | Web search (5 passes) | ⭐⭐⭐⭐⭐ | $0.05–0.10 | 60–120 s |
| V2 β€” Rewrite | GPT-5 Mini | Existing seo_content | ⭐⭐⭐⭐ | $0.001–0.003 | 10–20 s |
| V3 β€” Gemini | Gemini 2.5 Flash | Existing seo_content | ⭐⭐⭐⭐ | $0.001–0.004 | 15–30 s |
| V4 β€” Claude | Claude Haiku 4.5 | Existing seo_content | ⭐⭐⭐⭐⭐ | $0.003–0.005 | 20–35 s |
| Meta SEO | GPT-4o-mini | Enriched seo_content | β€” | $0.0004 | 1–3 s |
Recommendation: Use V2 or V4 for the majority of your catalogue (existing content, low cost), and reserve V1 for high-priority or high-traffic pages where fresh web research is worth the extra cost.

API keys required per method

# V1 and V2 β€” OpenAI
OPENAI_API_KEY = "sk-..."

# V3 β€” Google Gemini
GEMINI_API_KEY = "AIzaSy..."

# V4 β€” Anthropic Claude
ANTHROPIC_API_KEY = "sk-ant-..."

# Meta SEO β€” OpenAI (gpt-4o-mini)
OPENAI_API_KEY = "sk-..."

Step 2 β€” Run the Pre-Flight Guardrail

Download the guardrail script and run it before any generation. It validates your JSON, reports missing fields, and optionally auto-populates locale data (language, currency, cancel_word, consumer_law, slug_prefix) from built-in country defaults.

Usage

# Audit only (no file modified)
python3 billoff_guardrail.py services.json

# Auto-populate locale fields + overwrite
python3 billoff_guardrail.py services.json --fix

# Safe: write to a separate file
python3 billoff_guardrail.py services.json --fix --out services_clean.json

What the script checks

| Category | Fields | Behaviour if missing |
|---|---|---|
| πŸ”΄ Critical | name | Exit code 1 β€” pipeline will crash. Must be fixed manually. |
| 🟑 Locale | language Β· currency Β· currency_symbol Β· cancel_word Β· consumer_law | Exit code 2 β€” auto-fixable with --fix from COUNTRY_DEFAULTS. Article will be in the wrong language/currency if missing. |
| πŸ”΅ Quality | main_keyword Β· keywords Β· seo_content Β· website Β· cancellation_address | Exit code 2 β€” won't block execution, but reduces article depth and accuracy. |

Country auto-detection

The script detects the country from:

  1. The "country": "FR" (or any ISO code) field in the JSON root
  2. The filename prefix: FR_services.json, DE_services.json, etc.

If detection fails, add "country": "XX" (your ISO code) to the root of your JSON, or rename the file to start with the two-letter country code.
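The two-step detection order can be sketched like this (illustrative only; the shipped guardrail's implementation may differ in details):

```python
import json
import re
from pathlib import Path

def detect_country(path):
    """Detect the ISO country code: JSON root field first, then a
    two-letter filename prefix such as FR_services.json."""
    path = Path(path)
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # 1. "country" field at the JSON root (pipeline format only)
    if isinstance(data, dict) and data.get("country"):
        return data["country"].upper()
    # 2. Filename prefix: two letters followed by an underscore
    m = re.match(r"^([A-Za-z]{2})_", path.name)
    return m.group(1).upper() if m else None
```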

Expected output (sample)

═════════════════════════════════════════════════════════════════
  Billoff Pre-Pipeline Guardrail
═════════════════════════════════════════════════════════════════
  File:     services.json
  Services: 4821
  Country:  FR  ← detected from JSON root "country" field
  Defaults: French Β· EUR € Β· RΓ©siliation

  Field                        Missing    Coverage          Note
  ──────────────────────────── ────────   ──────────   ──────────
  βœ… name                            0        100%
  βœ… language                        0        100%
  🟑 locale  currency               42         99%  auto-fixable
  🟑 locale  currency_symbol        42         99%  auto-fixable
  βœ… cancel_word                     0        100%
  βœ… consumer_law                    0        100%
  πŸ”΅ quality main_keyword          183         96%  reduces quality
  πŸ”΅ quality seo_content            12        100%  reduces quality

  βœ…  No critical issues β€” pipeline can run.
  🟑  42 locale fields missing β€” run --fix to auto-populate from FR defaults.
  πŸ”΅  195 quality fields missing β€” article depth may be reduced.

Step 3 β€” Run the Pipeline

Test mode first (always)

# Run on 5 random services β€” detailed output, no file modified
python3 billoff_v2_generate.py services.json --test
python3 billoff_v5_metadata.py services.json --test

Production run (full batch)

# 1. Article generation (pick your method)
python3 billoff_v2_generate.py services.json

# 2. SEO metadata (always run after article generation)
python3 billoff_v5_metadata.py services.json

# services.json now has: seo_content + seo_title + h1 + seo_description + slug + faq
Output fields after full pipeline:
seo_content (HTML article) Β· seo_title Β· h1 Β· seo_description Β· slug Β· faq[]
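A quick way to spot rows the pipeline left incomplete, before importing to the site (illustrative helper, not part of the shipped scripts):

```python
# Output fields every fully processed service row should carry
REQUIRED = ["seo_content", "seo_title", "h1", "seo_description", "slug", "faq"]

def missing_fields(service):
    """List the pipeline output fields still missing or empty on a row."""
    return [k for k in REQUIRED if not service.get(k)]
```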

Step 4 β€” Verify Costs on the First 100 Articles

Especially for V1 (web search), always run a 100-article batch first and compare actual vs estimated costs before committing to the full country catalogue.

V1 can deviate significantly from estimates if your services have unusually long names or complex search results. Always validate before running 50 000 articles.

How to run the first 100

Create a test file with the first 100 services from your JSON:

# One-liner to extract the first 100 services into a test file
python3 -c "
import json
with open('services.json') as f:
    d = json.load(f)
# supports both formats
svcs = d['services'] if isinstance(d, dict) else d
test = svcs[:100]
out  = {'country': d.get('country'), 'services': test} if isinstance(d, dict) else test
with open('services_100.json', 'w') as f:
    json.dump(out, f, indent=2)
print(f'Wrote {len(test)} services to services_100.json')
"

# Run the pipeline on these 100
python3 billoff_v1_generate.py services_100.json
python3 billoff_v5_metadata.py services_100.json

# Sum up actual costs from the output file
python3 -c "
import json
with open('services_100.json') as f:
    d = json.load(f)
svcs = d['services'] if isinstance(d, dict) else d
total_usd = sum(s.get('cost_usd', 0) for s in svcs)
total_eur = total_usd * 0.92
avg = total_usd / max(len(svcs),1)
done = sum(1 for s in svcs if s.get('seo_content','').strip())
print(f'Generated:  {done}/{len(svcs)} articles')
print(f'Total cost: \${total_usd:.4f} USD  /  €{total_eur:.4f}')
print(f'Avg / article: \${avg:.5f} USD  (projected 50k: \${avg*50000:.0f} USD)')
"

Reference cost table (per 100 articles)

| Method | Estimated cost / 100 | Projected / 50 000 | Key cost driver |
|---|---|---|---|
| V1 | $5 – $10 | $2 500 – $5 000 | 5Γ— web search + 5-pass GPT-4.1 |
| V2 | $0.10 – $0.30 | $50 – $150 | GPT-5 Mini rewrite |
| V3 | $0.10 – $0.40 | $50 – $200 | Gemini 2.5 Flash (thinking) |
| V4 | $0.30 – $0.50 | $150 – $250 | Claude Haiku 4.5 |
| Meta SEO | $0.04 | $20 | GPT-4o-mini structured JSON |
If actual costs are more than 30% higher than estimated on the first 100, check: (1) average seo_content length β€” V1 truncates at 4 500 chars but all input is charged; (2) Pass 5 structured extraction may be retrying due to JSON parse errors β€” add logging to chatJSON() in the V1 script.

Step 5 β€” Pre-launch validation: smoke test + checkpoint strategy

Before committing to a 50 000-article run, two things will save your weekend: a multi-country smoke test (1 random service per country β€” catches locale bugs, bad API keys, and config errors in under 5 min) and checkpoint saves every 500 services so a crash at article 12 000 doesn't cost you 6 hours of API spend.

45-country smoke test β€” 1 random service per country

Extract one service per country and run the full pipeline on this tiny slice before the production batch:

# 1. Build smoke-test file β€” 1 random service per country
python3 -c "
import json, random
with open('services.json') as f:
    d = json.load(f)
svcs = d['services'] if isinstance(d, dict) else d

by_country = {}
for s in svcs:
    cc = s.get('country_code') or s.get('country', 'XX')
    by_country.setdefault(cc, []).append(s)

smoke = [random.choice(v) for v in by_country.values()]
out   = {'country': 'MULTI', 'services': smoke}
with open('smoke_test.json', 'w') as f:
    json.dump(out, f, indent=2)
print(f'Wrote {len(smoke)} services across {len(by_country)} countries β†’ smoke_test.json')
"

# 2. Run your chosen method on the smoke file (no --test flag: we want ALL countries)
python3 billoff_v2_generate.py smoke_test.json
python3 billoff_v5_metadata.py smoke_test.json

# 3. Spot-check output β€” for each country verify:
#    - language is correct (no English article for a German service)
#    - currency symbol is right (€ not A$ for France)
#    - consumer law name is injected (not blank or AU fallback)
#    - headings are in sentence case
#    - article length is β‰₯ 1 500 words
python3 -c "
import json
with open('smoke_test.json') as f: d = json.load(f)
svcs = d['services'] if isinstance(d, dict) else d
for s in svcs:
    name = s.get('name', '?')
    wc   = len(s.get('seo_content', '').split())
    ok   = 'βœ…' if wc >= 1500 else '❌'
    print(f'{ok}  {name:40s}  {wc:,} words  \${s.get(\"cost_usd\", 0):.4f}')
"
If any country fails the smoke test β€” fix the locale config for that country (run guardrail --fix again), don't proceed to the full batch. A bad locale config silently generates wrong-language articles at scale.

Checkpoint saves β€” never lose more than 500 articles to a crash

Add this pattern to the main() of any generated script. The script saves a .checkpoint.json every 500 completed services and deletes it automatically on successful completion:

# Add near the top of main():
CHECKPOINT_EVERY = 500        # tune: smaller = safer, slightly slower I/O
_checkpoint = OUT_FILE.with_suffix(".checkpoint.json")

# Inside the ThreadPoolExecutor loop β€” replace the existing loop body:
results = []
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as ex:
    futures = {ex.submit(process_service, svc): svc for svc in services}
    for i, future in enumerate(as_completed(futures), 1):
        try:
            results.append(future.result())
        except Exception as e:
            print(f"  ❌ Unhandled: {e}")

        if CHECKPOINT_EVERY > 0 and i % CHECKPOINT_EVERY == 0:
            _checkpoint.parent.mkdir(parents=True, exist_ok=True)
            with open(_checkpoint, "w", encoding="utf-8") as f:
                json.dump(results, f, ensure_ascii=False, indent=2)
            print(f"  πŸ’Ύ Checkpoint {i}/{len(services)} β€” {len(results)} articles saved")

# Remove checkpoint file after a clean run:
if _checkpoint.exists():
    _checkpoint.unlink()
    print("  πŸ—‘  Checkpoint removed β€” run complete")

Auto-resume β€” restart exactly where you left off

Add this block at the start of main(), before the ThreadPoolExecutor. It reads the checkpoint (or the final output file if it already exists), skips already-processed services, and continues from where the previous run stopped:

# --- AUTO-RESUME BLOCK (add before the ThreadPoolExecutor loop) ---
results     = []
already_done = set()

if OUT_FILE.exists():
    with open(OUT_FILE, encoding="utf-8") as f:
        results = json.load(f)
    already_done = {r.get("service_name", "") for r in results}
    print(f"  ↩  Output file exists: {len(already_done)} articles already done β€” skipping")
elif _checkpoint.exists():
    with open(_checkpoint, encoding="utf-8") as f:
        results = json.load(f)
    already_done = {r.get("service_name", "") for r in results}
    print(f"  ↩  Checkpoint found: {len(already_done)} done β€” resuming from checkpoint")

services = [s for s in services if s.get("name", "") not in already_done]

if not services:
    print("  βœ… All services already processed β€” nothing to do")
    sys.exit(0)

print(f"  β†’ {len(services)} remaining ({len(already_done)} already done)")
# --- END RESUME BLOCK ---
Usage: Just re-run the same command β€” no extra flag needed. The script auto-detects the checkpoint or output file and skips completed services.

The generated scripts already include --checkpoint N (default 500) and --resume flags that activate this pattern out of the box.

Recommended pre-launch sequence for a 45-country batch

# Day 0 β€” validation
python3 billoff_guardrail.py services.json --fix              # fix locale fields for all countries
python3 billoff_v2_generate.py smoke_test.json               # smoke test: 45 articles (1/country)
python3 billoff_v5_metadata.py smoke_test.json               # verify metadata for all locales
                                                              # β†’ review output manually before going further

# Day 1 β€” cost gate
python3 billoff_v2_generate.py services_100.json             # 100-article cost check (see Β§4)
                                                              # β†’ confirm cost Γ— 500 is within budget

# Day 2 β€” production run (with checkpoint + resume ready)
python3 billoff_v2_generate.py services.json --workers 50    # will auto-resume if interrupted
python3 billoff_v5_metadata.py services.json --workers 100   # run after articles

Step 6 β€” Tuning worker count for maximum throughput

All Billoff batch scripts process services in parallel using an async worker pool. The number of concurrent API requests β€” MAX_WORKERS β€” is the single most impactful performance knob. Too low: your 50 000-service run takes days. Too high: you hit rate limits and waste retries.

Default values per method

| Method | Default MAX_WORKERS | API | Rate limit context |
|---|---|---|---|
| V1 (web search + GPT-4.1) | 5 | OpenAI + web search | Web search is the binding constraint. OpenAI default 5 β€” raise to 10 max. |
| V2 (GPT-4.1 research + GPT-5 Mini write) | 10 | OpenAI | 2 API calls/service. Tier 1: keep ≀ 10. Tier 2+: safe up to 30. |
| V3 (Gemini 2.5 Flash) | 3 | Google AI | Free tier: 10 RPM β†’ keep ≀ 3. Pay tier (1 000 RPM): safe up to 50. |
| V4 (Claude Haiku 4.5) | 20 | Anthropic | Tier 4: 4 000 RPM / 800 K out-tokens/min β†’ ceiling ~33 (output-token bound). Safe: 20. Max: 30. |
| V5 Meta SEO (GPT-4o-mini) | 50 | OpenAI | Metadata only (~900 tokens/call). Tier 1: ≀ 30. Tier 3+: up to 150. |
Why is the V4 ceiling ~33 rather than 4 000 (the RPM limit)?
At Anthropic Tier 4 you have 4 000 RPM and 800 K output-tokens/min for Claude Haiku. A typical Billoff article is ~2 000 output tokens. That means the max sustainable throughput is 800 000 Γ· 2 000 = 400 articles/min. With ~5 s average latency per Haiku call, each worker delivers 60 Γ· 5 = 12 articles/min, so 400 Γ· 12 β‰ˆ 33 workers. The RPM limit (4 000) is not the constraint β€” the output-token budget is.
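The arithmetic above as a reusable helper (a sketch; the example numbers are the Tier 4 Haiku figures quoted in the text, not live limits):

```python
def worker_ceiling(out_tokens_per_min, tokens_per_article, latency_s):
    """Max sustainable workers when the output-token budget, not RPM, binds."""
    articles_per_min_budget = out_tokens_per_min / tokens_per_article  # 800 000 / 2 000 = 400
    articles_per_min_worker = 60 / latency_s                           # 60 / 5 = 12
    return int(articles_per_min_budget / articles_per_min_worker)

# Tier 4 Haiku figures from the text: worker_ceiling(800_000, 2_000, 5) β‰ˆ 33
```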
Find your OpenAI tier: platform.openai.com β†’ Settings β†’ Limits. Tier 1 starts at $5 spend. Tier 2 at $50. Tier 3 at $100. Each tier roughly 10Γ— the RPM/TPM ceiling.
Find your Anthropic tier: console.anthropic.com β†’ Settings β†’ Limits.

How to find your ideal worker count β€” step-by-step benchmark

Run this benchmark on a 50-service test file before your full batch:

# 1. Extract a 50-service benchmark slice
python3 -c "
import json
with open('services.json') as f: d = json.load(f)
svcs = d['services'] if isinstance(d, dict) else d
bench = svcs[:50]
out = {'country': d.get('country'), 'services': bench} if isinstance(d, dict) else bench
with open('bench50.json', 'w') as f: json.dump(out, f, indent=2)
print(f'Wrote {len(bench)} services to bench50.json')
"

# 2. Run at increasing worker levels β€” note duration and any 429 errors
MAX_WORKERS=5   python3 billoff_v2_generate.py bench50.json  # baseline
MAX_WORKERS=20  python3 billoff_v2_generate.py bench50.json
MAX_WORKERS=50  python3 billoff_v2_generate.py bench50.json
MAX_WORKERS=100 python3 billoff_v2_generate.py bench50.json

# 3. Pick the highest level with 0 rate-limit retries β€” that's your MAX_WORKERS
#    The script logs "Rate limit β€” sleeping Xs" on every retry. Count them.

To pass MAX_WORKERS via env var, add this near the top of each script:

import os
MAX_WORKERS = int(os.getenv("MAX_WORKERS", "20"))   # default 20, override via env

Progressive fallback pattern β€” automatic backoff on rate limit

For large unattended runs (overnight batches), build in a self-regulating worker count that halves on rate limit and recovers slowly. Add this wrapper around your semaphore loop:

import asyncio, time

async def process_with_adaptive_workers(services, generate_fn, initial_workers=20):
    """
    Process services with adaptive concurrency.
    Halves the worker count on rate limit; adds one worker back
    per 60 s of clean running, up to the initial level.
    """
    workers     = initial_workers
    min_workers = 2
    max_workers = initial_workers
    results     = []
    last_bump   = time.time()
    pending     = list(services)

    while pending:
        batch   = pending[:workers]
        pending = pending[workers:]
        sem     = asyncio.Semaphore(workers)

        async def _run(svc):
            async with sem:
                return await generate_fn(svc)

        tasks = [asyncio.create_task(_run(s)) for s in batch]
        rate_limited  = False
        batch_results = []

        for coro in asyncio.as_completed(tasks):
            try:
                batch_results.append(await coro)
            except asyncio.CancelledError:
                continue
            except Exception as e:
                if "429" in str(e) or "rate_limit" in str(e).lower():
                    rate_limited = True
                    for t in tasks:
                        t.cancel()               # abort the rest of the batch
                    pending = batch + pending    # requeue the whole batch
                    break
                raise

        if rate_limited:
            workers = max(min_workers, workers // 2)
            wait    = 60
            print(f"  ⚑ Rate limit hit β€” dropping to {workers} workers, sleeping {wait}s")
            await asyncio.sleep(wait)
        else:
            results.extend(batch_results)        # commit only clean batches
            if workers < max_workers and time.time() - last_bump > 60:
                workers  += 1                    # recover +1 per 60 s of clean running
                last_bump = time.time()

    return results
Usage: Call await process_with_adaptive_workers(services, generate_one) where generate_one(svc) is your per-service coroutine. The function halves concurrency on every 429 and slowly ramps back. For an overnight batch of 50 000 services this is the recommended pattern β€” it self-tunes without manual intervention.

Rate limit reference table

| Provider | Free / Tier 1 | Tier 2 | Tier 3+ | Key limit type |
|---|---|---|---|---|
| OpenAI (GPT-4.1, 5 Mini, 4o-mini) | 500 RPM | 5 000 RPM | 10 000 RPM | RPM + TPM β€” both enforced |
| Anthropic (Haiku 4.5) | 50 RPM | 1 000 RPM | 2 000 RPM | RPM + input tokens/min |
| Google AI (Gemini 2.5 Flash) | 10 RPM | 1 000 RPM | 2 000 RPM | RPM β€” thinking tokens count extra |
| Bing Web Search (V1) | 3 TPS / 1 000/mo | 10 TPS | custom | Transactions per second |
V1 is special: V1 runs 5 web search passes per article. At MAX_WORKERS=10 that's 50 concurrent search requests. Bing S1 tier allows 3 TPS β€” so keep V1 workers ≀ 3 on S1. Use Brave Search (unlimited tier) if you need higher parallelism on V1.

Recommended settings for a 50 000-service run

| Method | OpenAI Tier 1 | OpenAI Tier 2+ | Expected duration (50k) |
|---|---|---|---|
| V1 | 3 | 5 | ~7–14 days (search-bound) |
| V2 | 20 | 80 | ~8 h (Tier 2) / ~40 h (Tier 1) |
| V3 | 5 (free) / 40 (pay) | β€” | ~14 h (pay tier) |
| V4 | 10 | 30 (Tier 2) | ~20 h (Tier 2) |
| V5 Meta SEO | 30 | 100 | ~3 h (Tier 2) |

Estimates assume ~1.5 s average latency per API call. Actual duration depends on model load and token count.
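The same assumptions give a back-of-the-envelope duration formula (a sketch; real runs vary with model load, so plug in the latency you actually measured in your benchmark):

```python
def batch_duration_h(n_services, calls_per_service, avg_latency_s, workers):
    """Rough wall-clock hours for a batch, assuming workers stay saturated."""
    total_call_seconds = n_services * calls_per_service * avg_latency_s
    return total_call_seconds / workers / 3600
```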

Add to your go-live checklist: Before the full run, do a 50-service benchmark with your real API tier, note the clean worker ceiling, set MAX_WORKERS accordingly, and enable the adaptive fallback for overnight batches.

Step 7 β€” Dynamic sync: keep Postclic in sync

Once the initial bulk generation is done, set up a cron job to process only new or updated services β€” so every addition to Postclic automatically gets an article and metadata.

Architecture

Every X days / on new service addition:
──────────────────────────────────────────────────────────────────
Postclic DB β†’ JSON export     (your existing export mechanism)
     ↓
Find NEW services              (diff by slug or name vs last run)
     ↓
Run guardrail --fix            (populate locale fields)
     ↓
Run article generator          (V1/V2/V3/V4 on new services only)
     ↓
Run Meta SEO                   (metadata on new services only)
     ↓
Merge into master JSON         (append new results, keep existing)
     ↓
Import / publish               (your CMS import step)

Incremental sync script template

#!/usr/bin/env python3
"""billoff_sync.py β€” Incremental pipeline runner.
Finds services in NEW_EXPORT that are not yet in MASTER_JSON
(matched by name), processes them through the pipeline,
and merges results back into MASTER_JSON.
"""
import json, os, subprocess, sys
from pathlib import Path

MASTER_JSON  = "data/services.json"          # your accumulating master file
NEW_EXPORT   = "data/export_latest.json"      # latest Postclic data (any source)
NEW_ONLY     = "data/new_services.json"       # temp file for new services only
METHOD       = "v2"                          # v1 / v2 / v3 / v4
ARTICLE_SCRIPT = f"billoff_{METHOD}_generate.py"
META_SCRIPT    = "billoff_v5_metadata.py"

def _load_services(path):
    with open(path) as f: d = json.load(f)
    return (d["services"] if isinstance(d, dict) else d), d

def _save_services(path, services, original_data):
    if isinstance(original_data, list):
        out = services
    else:
        original_data["services"] = services
        out = original_data
    with open(path, "w") as f:
        json.dump(out, f, ensure_ascii=False, indent=2)

# Load master (existing processed services)
master_svcs, master_data = _load_services(MASTER_JSON)
existing_names = {s["name"] for s in master_svcs}

# Load latest export
new_svcs, new_data = _load_services(NEW_EXPORT)
new_only = [s for s in new_svcs if s["name"] not in existing_names]

if not new_only:
    print("  βœ…  No new services found β€” nothing to process.")
    sys.exit(0)

print(f"  πŸ†•  {len(new_only)} new services found β€” running pipeline...")

# Write new-only subset to temp file
_save_services(NEW_ONLY, new_only, {"country": master_data.get("country","AU"), "services": new_only})

# Guardrail β€” fix locale fields
subprocess.run(["python3", "billoff_guardrail.py", NEW_ONLY, "--fix"], check=True)

# Article generation
subprocess.run(["python3", ARTICLE_SCRIPT, NEW_ONLY], check=True)

# Meta SEO
subprocess.run(["python3", META_SCRIPT, NEW_ONLY], check=True)

# Merge processed new services back into master
processed_svcs, _ = _load_services(NEW_ONLY)
master_svcs.extend(processed_svcs)
_save_services(MASTER_JSON, master_svcs, master_data)

print(f"  βœ…  Merged {len(processed_svcs)} new articles into {MASTER_JSON}")
os.remove(NEW_ONLY)

Cron job setup

# Run every Monday at 3 AM β€” add to crontab with: crontab -e
0 3 * * 1  cd /var/www/billoff && python3 billoff_sync.py >> logs/sync.log 2>&1

# Or with an absolute interpreter path (e.g. a virtualenv's python3):
0 3 * * 1  cd /var/www/billoff && /usr/local/bin/python3 billoff_sync.py >> logs/sync.log 2>&1
How "new" is detected: The sync script matches by name field. If a service can be renamed on Postclic, use a stable id field instead β€” just replace the existing_names set-comparison with an existing_ids comparison.
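A hypothetical id-based variant of that comparison (assumes your export carries a stable id field; rows without it are treated as new, mirroring the name-based behaviour):

```python
def diff_new_services(master, export, key="id"):
    """Return export rows whose stable key is not yet present in master.

    The "id" key is a hypothetical stable identifier β€” substitute
    whatever field Postclic guarantees never changes.
    """
    seen = {s[key] for s in master if key in s}
    return [s for s in export if s.get(key) not in seen]
```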

Getting the Postclic data β€” your choice

The sync script just expects an AU_EXPORT_LATEST.json file as input. How you generate it is up to you β€” several approaches are valid.

Whatever the method β€” the Billoff pipeline doesn't care where the data comes from. It reads JSON, nothing more. The expected structure:

# Standard pipeline format
{ "country": "AU", "services": [ {"name": "...", "seo_content": "...", ...} ] }

# Or a plain array β€” both are accepted
[ {"name": "...", "seo_content": "...", ...} ]

Country configuration reference β€” all 37 Postclic markets

Every service row in your JSON must carry the locale fields below for the pipeline to generate correctly-localised articles (right language, currency, consumer law, cancel word). The guardrail --fix will auto-populate missing fields from the country defaults in this table β€” but only if country_code is present on each service row.

Always verify with AI for new countries. Consumer laws and regulatory body names change. Before launching a new market, ask GPT-4o or Claude to validate the consumer_law field β€” see the AI locale generator script below.

Required locale fields per service row

| Field | Example (FR) | Used by | Notes |
|---|---|---|---|
| country_code | FR | All methods + guardrail | ISO 3166-1 alpha-2 β€” required |
| country | France | V1–V4 prompts | Full country name in article language |
| language | French | V1–V4 prompts | Language name in English |
| currency | EUR | V1–V4 prompts | ISO 4217 currency code |
| currency_symbol | € | V1–V4 prompts | Symbol used in pricing tables |
| country_tld | .fr | V1 search queries | Country TLD for competitor search |
| cancel_word | RΓ©silier | V1–V4 headings | Native-language cancel verb |
| consumer_law | Loi Hamon / Code de la Consommation | V1–V4 legal section | Official law name β€” cited in articles |
| city / region / timezone | Paris / Île-de-France / Europe/Paris | V1 geo-targeting | Optional β€” improves web search relevance |
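Putting the required fields together, a minimal well-formed FR service row looks like this (locale values come from the reference above; the service name and keyword are illustrative):

```python
service_row = {
    "name": "Canal+",                  # critical β€” pipeline crashes without it (illustrative name)
    "country_code": "FR",
    "country": "France",
    "language": "French",
    "currency": "EUR",
    "currency_symbol": "€",
    "country_tld": ".fr",
    "cancel_word": "RΓ©silier",
    "consumer_law": "Loi Hamon / Code de la Consommation",
    # optional quality fields
    "main_keyword": "rΓ©silier canal+",  # illustrative
    "seo_content": "",                  # filled by the article generator
}
```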

Complete reference β€” all 37 markets

These values are baked into the guardrail --fix auto-population and the REVIEW_SITES map used in V1 web searches.

| Code | Country | Language | Currency | Symbol | cancel_word | consumer_law (short) | TLD | review_sites |
|---|---|---|---|---|---|---|---|---|
| AE | UAE | Arabic | AED | AED | Ψ₯Ω„ΨΊΨ§Ψ‘ | Federal Law 15/2020 | .ae | Google Reviews |
| AR | Argentina | Spanish | ARS | $ | Cancelar | Ley Defensa Consumidor 24.240 | .ar | Trustpilot.ar, Mercado Libre |
| AT | Austria | German | EUR | € | KΓΌndigen | Konsumentenschutzgesetz (KSchG) | .at | Trustpilot.at |
| BE | Belgium | French | EUR | € | Annuler | Code de droit Γ©conomique (CDE) | .be | Trustpilot.be |
| BG | Bulgaria | Bulgarian | BGN | Π»Π² | АнулиранС | Π—Π°ΠΊΠΎΠ½ Π·Π° Π·Π°Ρ‰ΠΈΡ‚Π° Π½Π° ΠΏΠΎΡ‚Ρ€Π΅Π±ΠΈΡ‚Π΅Π»ΠΈΡ‚Π΅ | .bg | Trustpilot.bg |
| BR | Brazil | Portuguese | BRL | R$ | Cancelar | CΓ³digo de Defesa do Consumidor (CDC) | .br | Reclame Aqui |
| CA | Canada | English | CAD | CA$ | Cancel | Consumer Protection Acts (provincial) | .ca | Trustpilot.ca, BBB |
| CH | Switzerland | German | CHF | CHF | KΓΌndigen | Konsumentenschutzgesetz | .ch | Trustpilot.ch |
| CL | Chile | Spanish | CLP | $ | Cancelar | Ley 19.496 (LPDC) | .cl | Reclamos.cl |
| CO | Colombia | Spanish | COP | $ | Cancelar | Estatuto del Consumidor (Ley 1480) | .co | SIC Colombia |
| CZ | Czech Republic | Czech | CZK | Kč | ZruΕ‘it | ZΓ‘kon o ochranΔ› spotΕ™ebitele | .cz | Heureka.cz |
| DE | Germany | German | EUR | € | KΓΌndigen | BGB Β§Β§ 312 / Widerrufsrecht | .de | Trusted Shops, Trustpilot.de |
| DK | Denmark | Danish | DKK | kr | Opsige | ForbrugerkΓΈbsloven | .dk | Trustpilot.dk |
| ES | Spain | Spanish | EUR | € | Cancelar | LGDCU / Ley General Defensa Consumidores | .es | Trustpilot.es |
| FI | Finland | Finnish | EUR | € | Peruuttaa | Kuluttajansuojalaki | .fi | Trustpilot.fi |
| FR | France | French | EUR | € | RΓ©silier | Loi Hamon / Code de la Consommation | .fr | Avis VΓ©rifiΓ©s, Trustpilot.fr |
| GB | United Kingdom | English | GBP | Β£ | Cancel | Consumer Rights Act 2015 | .co.uk | Trustpilot.co.uk, Reviews.io |
| GR | Greece | Greek | EUR | € | Ακύρωση | ΞΟŒΞΌΞΏΟ‚ 2251/1994 (Consumer Protection) | .gr | Trustpilot.gr, Skroutz.gr |
| HU | Hungary | Hungarian | HUF | Ft | LemondΓ‘s | FogyasztΓ³vΓ©delmi tΓΆrvΓ©ny (1997. Γ©vi CLV) | .hu | ÁrukeresΕ‘.hu |
| ID | Indonesia | Indonesian | IDR | Rp | Batalkan | UU Perlindungan Konsumen No. 8/1999 | .id | Google Reviews |
| IE | Ireland | English | EUR | € | Cancel | Consumer Rights Act 2022 | .ie | Trustpilot.ie |
| IN | India | English | INR | β‚Ή | Cancel | Consumer Protection Act 2019 (CCPA) | .in | MouthShut.com, Trustpilot.in |
| IT | Italy | Italian | EUR | € | Disdetta | Codice del Consumo (D.Lgs. 206/2005) | .it | Trustpilot.it, eKomi |
| JP | Japan | Japanese | JPY | Β₯ | γ‚­γƒ£γƒ³γ‚»γƒ« | ζΆˆθ²»θ€…ε₯‘約法 (Consumer Contract Act) | .jp | Kakaku.com |
| NG | Nigeria | English | NGN | ₦ | Cancel | FCCPC Consumer Protection Act 2019 | .ng | Google Reviews |
| NL | Netherlands | Dutch | EUR | € | Opzeggen | Burgerlijk Wetboek (Consumentenwet) | .nl | Trustpilot.nl, Kiyoh |
| NO | Norway | Norwegian | NOK | kr | Avslutte | ForbrukerkjΓΈpsloven | .no | Trustpilot.no |
| NZ | New Zealand | English | NZD | NZ$ | Cancel | Consumer Guarantees Act 1993 | .co.nz | Consumer.org.nz |
| PE | Peru | Spanish | PEN | S/ | Cancelar | CΓ³digo de ProtecciΓ³n del Consumidor (Ley 29571) | .pe | Indecopi.gob.pe |
| PL | Poland | Polish | PLN | zΕ‚ | Anuluj | Ustawa o prawach konsumenta | .pl | Opineo.pl, Ceneo.pl |
| PT | Portugal | Portuguese | EUR | € | Cancelar | Lei de Defesa do Consumidor | .pt | Trustpilot.pt |
| RO | Romania | Romanian | RON | lei | Anulare | Legea nr. 449/2003 | .ro | Trustpilot.ro |
| SE | Sweden | Swedish | SEK | kr | Avsluta | KonsumentkΓΆplagen | .se | Trustpilot.se, Prisjakt |
| SG | Singapore | English | SGD | S$ | Cancel | Consumer Protection (Fair Trading) Act (CPFTA) | .sg | Trustpilot.sg |
| TR | Turkey | Turkish | TRY | β‚Ί | Δ°ptal | TΓΌketici Kanunu No. 6502 | .tr | Şikayetvar.com |
| US | United States | English | USD | $ | Cancel | FTC regulations / State consumer protection laws | .com | BBB, Trustpilot.com, Yelp |
| ZA | South Africa | English | ZAR | R | Cancel | Consumer Protection Act 68/2008 | .co.za | Hellopeter.com |

AI-powered locale field generator β€” fill any missing country

When a service row is missing locale fields (or you're adding a country not in the table above), run this script. It uses GPT-4o-mini with structured output to generate all locale fields accurately, then writes them back into your JSON.

#!/usr/bin/env python3
"""billoff_locale_fill.py β€” AI-powered locale field generator.
Fills missing locale fields (language, currency, consumer_law, cancel_word, etc.)
for any country, using GPT-4o-mini structured output.
Run BEFORE the guardrail and before any generation script.
"""
import json, os, sys
from openai import OpenAI
from pathlib import Path

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY") or ""
DATA_FILE      = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("data/services.json")
client         = OpenAI(api_key=OPENAI_API_KEY)

# ── Hard-coded defaults for all 37 Postclic markets ──────────────
# These are always preferred over AI calls for known countries.
LOCALE_DEFAULTS = {
  "AE": {"country":"UAE","language":"Arabic","currency":"AED","currency_symbol":"AED","country_tld":".ae","cancel_word":"Ψ₯Ω„ΨΊΨ§Ψ‘","consumer_law":"UAE Consumer Protection Law (Federal Law 15/2020)"},
  "AR": {"country":"Argentina","language":"Spanish","currency":"ARS","currency_symbol":"$","country_tld":".ar","cancel_word":"Cancelar","consumer_law":"Ley de Defensa del Consumidor (Ley 24.240)"},
  "AT": {"country":"Austria","language":"German","currency":"EUR","currency_symbol":"€","country_tld":".at","cancel_word":"Kündigen","consumer_law":"Konsumentenschutzgesetz (KSchG)"},
  "BE": {"country":"Belgium","language":"French","currency":"EUR","currency_symbol":"€","country_tld":".be","cancel_word":"Annuler","consumer_law":"Code de droit économique (CDE)"},
  "BG": {"country":"Bulgaria","language":"Bulgarian","currency":"BGN","currency_symbol":"лв","country_tld":".bg","cancel_word":"Анулиране","consumer_law":"Закон за защита на потребителите"},
  "BR": {"country":"Brazil","language":"Portuguese","currency":"BRL","currency_symbol":"R$","country_tld":".br","cancel_word":"Cancelar","consumer_law":"Código de Defesa do Consumidor (CDC)"},
  "CA": {"country":"Canada","language":"English","currency":"CAD","currency_symbol":"CA$","country_tld":".ca","cancel_word":"Cancel","consumer_law":"Consumer Protection Acts (provincial)"},
  "CH": {"country":"Switzerland","language":"German","currency":"CHF","currency_symbol":"CHF","country_tld":".ch","cancel_word":"Kündigen","consumer_law":"Konsumentenschutzgesetz"},
  "CL": {"country":"Chile","language":"Spanish","currency":"CLP","currency_symbol":"$","country_tld":".cl","cancel_word":"Cancelar","consumer_law":"Ley 19.496 (LPDC)"},
  "CO": {"country":"Colombia","language":"Spanish","currency":"COP","currency_symbol":"$","country_tld":".co","cancel_word":"Cancelar","consumer_law":"Estatuto del Consumidor (Ley 1480)"},
  "CZ": {"country":"Czech Republic","language":"Czech","currency":"CZK","currency_symbol":"Kč","country_tld":".cz","cancel_word":"Zrušit","consumer_law":"Zákon o ochraně spotřebitele"},
  "DE": {"country":"Germany","language":"German","currency":"EUR","currency_symbol":"€","country_tld":".de","cancel_word":"Kündigen","consumer_law":"BGB §§ 312 ff. / Widerrufsrecht"},
  "DK": {"country":"Denmark","language":"Danish","currency":"DKK","currency_symbol":"kr","country_tld":".dk","cancel_word":"Opsige","consumer_law":"Forbrugerkøbsloven"},
  "ES": {"country":"Spain","language":"Spanish","currency":"EUR","currency_symbol":"€","country_tld":".es","cancel_word":"Cancelar","consumer_law":"Ley General para la Defensa de los Consumidores (LGDCU)"},
  "FI": {"country":"Finland","language":"Finnish","currency":"EUR","currency_symbol":"€","country_tld":".fi","cancel_word":"Peruuttaa","consumer_law":"Kuluttajansuojalaki"},
  "FR": {"country":"France","language":"French","currency":"EUR","currency_symbol":"€","country_tld":".fr","cancel_word":"Résilier","consumer_law":"Loi Hamon / Code de la Consommation"},
  "GB": {"country":"United Kingdom","language":"English","currency":"GBP","currency_symbol":"£","country_tld":".co.uk","cancel_word":"Cancel","consumer_law":"Consumer Rights Act 2015"},
  "GR": {"country":"Greece","language":"Greek","currency":"EUR","currency_symbol":"€","country_tld":".gr","cancel_word":"Ακύρωση","consumer_law":"Νόμος 2251/1994 (Consumer Protection)"},
  "HU": {"country":"Hungary","language":"Hungarian","currency":"HUF","currency_symbol":"Ft","country_tld":".hu","cancel_word":"Lemondás","consumer_law":"Fogyasztóvédelmi törvény (1997. évi CLV)"},
  "ID": {"country":"Indonesia","language":"Indonesian","currency":"IDR","currency_symbol":"Rp","country_tld":".id","cancel_word":"Batalkan","consumer_law":"UU Perlindungan Konsumen No. 8/1999"},
  "IE": {"country":"Ireland","language":"English","currency":"EUR","currency_symbol":"€","country_tld":".ie","cancel_word":"Cancel","consumer_law":"Consumer Rights Act 2022"},
  "IN": {"country":"India","language":"English","currency":"INR","currency_symbol":"₹","country_tld":".in","cancel_word":"Cancel","consumer_law":"Consumer Protection Act 2019 (CCPA)"},
  "IT": {"country":"Italy","language":"Italian","currency":"EUR","currency_symbol":"€","country_tld":".it","cancel_word":"Disdetta","consumer_law":"Codice del Consumo (D.Lgs. 206/2005)"},
  "JP": {"country":"Japan","language":"Japanese","currency":"JPY","currency_symbol":"¥","country_tld":".jp","cancel_word":"キャンセル","consumer_law":"消費者契約法 (Consumer Contract Act)"},
  "NG": {"country":"Nigeria","language":"English","currency":"NGN","currency_symbol":"₦","country_tld":".ng","cancel_word":"Cancel","consumer_law":"FCCPC Consumer Protection Act 2019"},
  "NL": {"country":"Netherlands","language":"Dutch","currency":"EUR","currency_symbol":"€","country_tld":".nl","cancel_word":"Opzeggen","consumer_law":"Burgerlijk Wetboek (Consumentenwet)"},
  "NO": {"country":"Norway","language":"Norwegian","currency":"NOK","currency_symbol":"kr","country_tld":".no","cancel_word":"Avslutte","consumer_law":"Forbrukerkjøpsloven"},
  "NZ": {"country":"New Zealand","language":"English","currency":"NZD","currency_symbol":"NZ$","country_tld":".co.nz","cancel_word":"Cancel","consumer_law":"Consumer Guarantees Act 1993"},
  "PE": {"country":"Peru","language":"Spanish","currency":"PEN","currency_symbol":"S/","country_tld":".pe","cancel_word":"Cancelar","consumer_law":"Código de Protección del Consumidor (Ley 29571)"},
  "PL": {"country":"Poland","language":"Polish","currency":"PLN","currency_symbol":"zł","country_tld":".pl","cancel_word":"Anuluj","consumer_law":"Ustawa o prawach konsumenta"},
  "PT": {"country":"Portugal","language":"Portuguese","currency":"EUR","currency_symbol":"€","country_tld":".pt","cancel_word":"Cancelar","consumer_law":"Lei de Defesa do Consumidor"},
  "RO": {"country":"Romania","language":"Romanian","currency":"RON","currency_symbol":"lei","country_tld":".ro","cancel_word":"Anulare","consumer_law":"Legea nr. 449/2003"},
  "SE": {"country":"Sweden","language":"Swedish","currency":"SEK","currency_symbol":"kr","country_tld":".se","cancel_word":"Avsluta","consumer_law":"Konsumentköplagen"},
  "SG": {"country":"Singapore","language":"English","currency":"SGD","currency_symbol":"S$","country_tld":".sg","cancel_word":"Cancel","consumer_law":"Consumer Protection (Fair Trading) Act (CPFTA)"},
  "TR": {"country":"Turkey","language":"Turkish","currency":"TRY","currency_symbol":"₺","country_tld":".tr","cancel_word":"İptal","consumer_law":"Tüketici Kanunu No. 6502"},
  "US": {"country":"United States","language":"English","currency":"USD","currency_symbol":"$","country_tld":".com","cancel_word":"Cancel","consumer_law":"FTC regulations / State consumer protection laws"},
  "ZA": {"country":"South Africa","language":"English","currency":"ZAR","currency_symbol":"R","country_tld":".co.za","cancel_word":"Cancel","consumer_law":"Consumer Protection Act 68/2008"},
}

def fill_with_ai(country_code: str, country_name: str) -> dict:
    """Use GPT-4o-mini to generate locale fields for unknown countries."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content":
            f"Return JSON with locale fields for country_code={country_code}, country={country_name}. "
            "Fields: country (full English name), language (main official language in English), "
            "currency (ISO 4217 code), currency_symbol (display symbol), country_tld (ccTLD), "
            "cancel_word (verb meaning 'to cancel/terminate a subscription' in the official language), "
            "consumer_law (official name of the main consumer protection law + citation). "
            "Be precise with legal citations. Return only valid JSON."
        }]
    )
    return json.loads(resp.choices[0].message.content)  # json is imported at module level

def fill_service(svc: dict) -> dict:
    cc = (svc.get("country_code") or "").upper()
    if not cc:
        return svc
    defaults = LOCALE_DEFAULTS.get(cc)
    if not defaults:
        print(f"  🤖 AI generating locale for unknown country: {cc}")
        defaults = fill_with_ai(cc, svc.get("country", cc))
    for k, v in defaults.items():
        if not svc.get(k):  # never overwrite existing values
            svc[k] = v
    return svc

# ── Main ────────────────────────────────────────
with open(DATA_FILE, encoding="utf-8") as f:
    data = json.load(f)
svcs = data["services"] if isinstance(data, dict) else data

LOCALE_KEYS = ["country", "language", "currency", "currency_symbol",
               "country_tld", "cancel_word", "consumer_law"]
filled = 0
for svc in svcs:
    before = {k: svc.get(k) for k in LOCALE_KEYS}
    fill_service(svc)
    if before != {k: svc.get(k) for k in LOCALE_KEYS}:
        filled += 1

if isinstance(data, dict):
    data["services"] = svcs
with open(DATA_FILE, "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)
print(f"✅ Locale fill complete: {filled}/{len(svcs)} services updated → {DATA_FILE}")
Usage:
python3 billoff_locale_fill.py services.json
# For any country not in the 37-country reference table, the script automatically
# calls GPT-4o-mini to generate the locale fields; it never leaves fields blank.
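The fill rule ("never overwrite existing values") can be sketched in isolation. `fill_missing` and `DEFAULTS_FR` below are illustrative stand-ins, not part of the shipped script:

```python
# Illustrative sketch of the fill rule: keep existing values,
# fill only empty or missing locale fields from the defaults table.
DEFAULTS_FR = {  # hypothetical subset of a LOCALE_DEFAULTS entry
    "country": "France", "language": "French",
    "currency": "EUR", "cancel_word": "Résilier",
}

def fill_missing(svc: dict, defaults: dict) -> dict:
    for k, v in defaults.items():
        if not svc.get(k):  # None, missing, or empty string -> fill
            svc[k] = v
    return svc

svc = {"country_code": "FR", "language": "Canadian French", "currency": ""}
fill_missing(svc, DEFAULTS_FR)
print(svc["language"])  # Canadian French  (pre-existing value kept)
print(svc["currency"])  # EUR              (empty string treated as missing)
```

Note that an empty string counts as missing, which is why re-running the script on a partially filled file is safe.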

Then run billoff_guardrail.py --fix as a second validation pass.
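As a rough illustration of what that second pass checks (the real billoff_guardrail.py has its own field list and fix logic; the `REQUIRED` list here is an assumption):

```python
# Hypothetical sketch of a guardrail-style validation pass.
# REQUIRED is an assumed field list, not the guardrail's actual rules.
REQUIRED = ["country_code", "language", "currency", "cancel_word", "consumer_law"]

def find_gaps(services: list[dict]) -> list[tuple[int, list[str]]]:
    """Return (index, missing_fields) for every incomplete service."""
    gaps = []
    for i, svc in enumerate(services):
        missing = [k for k in REQUIRED if not svc.get(k)]
        if missing:
            gaps.append((i, missing))
    return gaps

services = [
    {"country_code": "FR", "language": "French", "currency": "EUR",
     "cancel_word": "Résilier", "consumer_law": "Loi Hamon"},
    {"country_code": "XX", "language": "English"},  # deliberately incomplete
]
print(find_gaps(services))  # [(1, ['currency', 'cancel_word', 'consumer_law'])]
```

An empty result is the equivalent of the guardrail's clean exit; any non-empty result is what --fix would try to patch.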

Adding a new country (not yet in the 37-country list)

  1. Add country_code to service rows: this is the only mandatory field. All other locale fields can be auto-generated.
  2. Run billoff_locale_fill.py: fills language, currency, cancel_word, consumer_law via the hardcoded table (known countries) or GPT-4o-mini (unknown countries).
  3. Run billoff_guardrail.py --fix: validates and patches any remaining gaps.
  4. Verify the consumer_law field manually: AI-generated legal citations can be outdated. Cross-check against the official government source before launching at scale.
  5. Add to REVIEW_SITES in download-helpers.js and openai.js: find the 2–3 most trusted review platforms for that country and add an entry. This improves Pass 1 web search quality for consumer ratings.
  6. Run the smoke test: 1 random service from the new country; verify language, currency symbol, legal references, and heading capitalisation in the output article.
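Step 6's per-country sampling can be sketched like this (an illustrative helper, not one of the shipped scripts):

```python
import random

def smoke_sample(services: list[dict], seed: int = 0) -> dict:
    """Pick one random service per country_code for manual review."""
    by_cc: dict[str, list[dict]] = {}
    for svc in services:
        by_cc.setdefault(svc.get("country_code", "??"), []).append(svc)
    rng = random.Random(seed)  # fixed seed -> reproducible spot-check
    return {cc: rng.choice(group) for cc, group in sorted(by_cc.items())}

services = [
    {"country_code": "FR", "name": "Netflix FR"},
    {"country_code": "FR", "name": "Canal+"},
    {"country_code": "DE", "name": "Netflix DE"},
]
sample = smoke_sample(services)
print(sorted(sample))  # ['DE', 'FR']  -> one service per country
```

The fixed seed keeps the sample reproducible, so a reviewer flagging a bad article can be pointed at the exact same service later.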

Go-live checklist

☐ JSON export received for each country – one file per market, named XX_SERVICES.json
☐ Guardrail run with --fix – exit code 0 required before V1; exit 0 or 2 acceptable for V2/V3/V4
☐ API keys set in each script – never commit keys to git; use env vars in production
☐ Test run on 5 services per method – use the --test flag and review the output manually
☐ Cost verified on first 100 articles – especially critical for V1; see §4
☐ 45-country smoke test passed – 1 random service per country; verify language, currency, consumer law, sentence case; see §5
☐ Worker benchmark done (50 services) – MAX_WORKERS set to a clean ceiling (see §6); adaptive fallback enabled for overnight runs
☐ Backup of original JSON kept – cp SOURCE.json SOURCE.backup.json
☐ First batch reviewed manually – spot-check 10+ articles across different categories and countries
☐ Cron / sync script tested in dry-run – run billoff_sync.py with a small test export first
☐ Import to CMS / DB tested end-to-end – verify all fields (slug, faq, seo_title) map correctly to your schema
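For the last check, a minimal pre-import gate might look like this (`CMS_FIELDS` is an assumed minimal set; match it to your actual CMS schema):

```python
# Hypothetical pre-import gate: refuse to push services whose
# SEO fields never got generated. CMS_FIELDS is an assumption here.
CMS_FIELDS = ["slug", "seo_title", "faq"]

def import_ready(svc: dict) -> bool:
    return all(svc.get(k) for k in CMS_FIELDS)

svc_ok = {"slug": "netflix-fr", "seo_title": "Cancel Netflix",
          "faq": [{"q": "...", "a": "..."}]}
svc_bad = {"slug": "canal-plus", "seo_title": "", "faq": []}
print(import_ready(svc_ok), import_ready(svc_bad))  # True False
```

Running this over the enriched JSON before the CMS import catches services that slipped through article generation with empty SEO fields.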