Developer Integration Guide

πŸš€ From Postclic DB to Published Page

Everything your developer needs to wire up the Billoff pipeline: choosing a generation method, validating data, verifying costs, and keeping the site in sync automatically.

Scripts β€” download all scripts you need

Pipeline at a Glance

Each country runs through this sequence. The JSON export is the entry point β€” every subsequent script enriches it in-place.

πŸ“¦ Postclic data source β€” JSON export, direct DB query, API call… your choice JSON input
↓
πŸ›‘οΈ Guardrail β€” validate & auto-fix missing fields billoff_guardrail.py
↓
πŸ“ Article generation β€” choose V1, V2, V3, or V4 billoff_v1/v2/v3/v4.py
↓
🏷️ Meta SEO β€” title Β· H1 Β· description Β· slug Β· FAQ billoff_v5_metadata.py
↓
βœ… Enriched JSON ready to publish import to site
Data source: however you extract the Postclic services (file export, direct DB query, API, custom script), the Billoff pipeline just expects JSON as input.
File strategy: Every script reads and writes back to the same JSON file. Keep a backup copy before running anything at scale.
Format: Scripts accept both {"services": [...]} (pipeline format) and a plain [...] array.
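Both accepted shapes can be normalized with a small loader. This is an illustrative sketch (`load_services` is not part of the shipped scripts); it returns the service list plus the original root object so country metadata survives a round-trip write:

```python
import json

def load_services(path):
    """Return (services, root) for both accepted JSON shapes.

    root is the original top-level object, so metadata such as
    "country" survives when you write the file back.
    """
    with open(path, encoding="utf-8") as f:
        root = json.load(f)
    services = root["services"] if isinstance(root, dict) else root
    return services, root
```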

⚠️ Before You Run Anything

Read this carefully before running at scale.

Step 1 β€” Choose Your Article Generation Method

Each method produces a standalone Python script. Go to the corresponding docs page, set your editor config if needed, then click Python Script ✦ live config.

| Method | Model | Input | Quality | Cost / article | Speed |
|---|---|---|---|---|---|
| V1 β€” Research | GPT-4.1 β†’ GPT-5 Mini | Web search (5 passes) | ⭐⭐⭐⭐⭐ | $0.05–0.10 | 60–120 s |
| V2 β€” Rewrite | GPT-5 Mini | Existing seo_content | ⭐⭐⭐⭐ | $0.001–0.003 | 10–20 s |
| V3 β€” Gemini | Gemini 2.5 Flash | Existing seo_content | ⭐⭐⭐⭐ | $0.001–0.004 | 15–30 s |
| V4 β€” Claude | Claude Haiku 4.5 | Existing seo_content | ⭐⭐⭐⭐⭐ | $0.003–0.005 | 20–35 s |
| Meta SEO | GPT-4o-mini | Enriched seo_content | β€” | $0.0004 | 1–3 s |
Recommendation: Use V2 or V4 for the majority of your catalogue (existing content, low cost), and reserve V1 for high-priority or high-traffic pages where fresh web research is worth the extra cost.

API keys required per method

# V1 and V2 β€” OpenAI
OPENAI_API_KEY = "sk-..."

# V3 β€” Google Gemini
GEMINI_API_KEY = "AIzaSy..."

# V4 β€” Anthropic Claude
ANTHROPIC_API_KEY = "sk-ant-..."

# Meta SEO β€” OpenAI (gpt-4o-mini)
OPENAI_API_KEY = "sk-..."

Step 2 β€” Run the Pre-Flight Guardrail

Download the guardrail script and run it before any generation. It validates your JSON, reports missing fields, and optionally auto-populates locale data (language, currency, cancel_word, consumer_law, slug_prefix) from built-in country defaults.

Usage

# Audit only (no file modified)
python3 billoff_guardrail.py services.json

# Auto-populate locale fields + overwrite
python3 billoff_guardrail.py services.json --fix

# Safe: write to a separate file
python3 billoff_guardrail.py services.json --fix --out services_clean.json

What the script checks

| Category | Fields | Behaviour if missing |
|---|---|---|
| πŸ”΄ Critical | name | Exit code 1 β€” pipeline will crash. Must be fixed manually. |
| 🟑 Locale | language Β· currency Β· currency_symbol Β· cancel_word Β· consumer_law | Exit code 2 β€” auto-fixable with --fix from COUNTRY_DEFAULTS. Article will be in the wrong language/currency if missing. |
| πŸ”΅ Quality | main_keyword Β· keywords Β· seo_content Β· website Β· cancellation_address | Exit code 2 β€” won't block execution, but reduces article depth and accuracy. |

Country auto-detection

The script detects the country from:

  1. The "country": "FR" (or any ISO code) field in the JSON root
  2. The filename prefix: FR_services.json, DE_services.json, etc.

If detection fails, add "country": "XX" (your ISO code) to the root of your JSON, or rename the file to start with the two-letter country code.
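The two-step detection order can be sketched like this (illustrative only; the shipped guardrail's implementation may differ in details):

```python
import json
import re
from pathlib import Path

def detect_country(path):
    """Detect the ISO country code: JSON root field first, then a
    two-letter filename prefix such as FR_services.json."""
    path = Path(path)
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # 1. "country" field at the JSON root (pipeline format only)
    if isinstance(data, dict) and data.get("country"):
        return data["country"].upper()
    # 2. Filename prefix: two letters followed by an underscore
    m = re.match(r"^([A-Za-z]{2})_", path.name)
    return m.group(1).upper() if m else None
```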

Expected output (sample)

═════════════════════════════════════════════════════════════════
  Billoff Pre-Pipeline Guardrail
═════════════════════════════════════════════════════════════════
  File:     services.json
  Services: 4821
  Country:  FR  ← detected from JSON root "country" field
  Defaults: French Β· EUR € Β· RΓ©siliation

  Field                        Missing    Coverage          Note
  ──────────────────────────── ────────   ──────────   ──────────
  βœ… name                            0        100%
  βœ… language                        0        100%
  🟑 locale  currency               42         99%  auto-fixable
  🟑 locale  currency_symbol        42         99%  auto-fixable
  βœ… cancel_word                     0        100%
  βœ… consumer_law                    0        100%
  πŸ”΅ quality main_keyword          183         96%  reduces quality
  πŸ”΅ quality seo_content            12        100%  reduces quality

  βœ…  No critical issues β€” pipeline can run.
  🟑  42 locale fields missing β€” run --fix to auto-populate from FR defaults.
  πŸ”΅  195 quality fields missing β€” article depth may be reduced.

Step 3 β€” Run the Pipeline

Test mode first (always)

# Run on 5 random services β€” detailed output, no file modified
python3 billoff_v2_generate.py services.json --test
python3 billoff_v5_metadata.py services.json --test

Production run (full batch)

# 1. Article generation (pick your method)
python3 billoff_v2_generate.py services.json

# 2. SEO metadata (always run after article generation)
python3 billoff_v5_metadata.py services.json

# services.json now has: seo_content + seo_title + h1 + seo_description + slug + faq
Output fields after full pipeline:
seo_content (HTML article) Β· seo_title Β· h1 Β· seo_description Β· slug Β· faq[]
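A quick way to spot rows the pipeline left incomplete, before importing to the site (illustrative helper, not part of the shipped scripts):

```python
# Output fields every fully processed service row should carry
REQUIRED = ["seo_content", "seo_title", "h1", "seo_description", "slug", "faq"]

def missing_fields(service):
    """List the pipeline output fields still missing or empty on a row."""
    return [k for k in REQUIRED if not service.get(k)]
```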

Step 4 β€” Verify Costs on the First 100 Articles

Especially for V1 (web search), always run a 100-article batch first and compare actual vs estimated costs before committing to the full country catalogue.

V1 can deviate significantly from estimates if your services have unusually long names or complex search results. Always validate before running 50 000 articles.

How to run the first 100

Create a test file with the first 100 services from your JSON:

# One-liner to extract the first 100 services into a test file
python3 -c "
import json
with open('services.json') as f:
    d = json.load(f)
# supports both formats
svcs = d['services'] if isinstance(d, dict) else d
test = svcs[:100]
out  = {'country': d.get('country'), 'services': test} if isinstance(d, dict) else test
with open('services_100.json', 'w') as f:
    json.dump(out, f, indent=2)
print(f'Wrote {len(test)} services to services_100.json')
"

# Run the pipeline on these 100
python3 billoff_v1_generate.py services_100.json
python3 billoff_v5_metadata.py services_100.json

# Sum up actual costs from the output file
python3 -c "
import json
with open('services_100.json') as f:
    d = json.load(f)
svcs = d['services'] if isinstance(d, dict) else d
total_usd = sum(s.get('cost_usd', 0) for s in svcs)
total_eur = total_usd * 0.92
avg = total_usd / max(len(svcs),1)
done = sum(1 for s in svcs if s.get('seo_content','').strip())
print(f'Generated:  {done}/{len(svcs)} articles')
print(f'Total cost: \${total_usd:.4f} USD  /  €{total_eur:.4f}')
print(f'Avg / article: \${avg:.5f} USD  (projected 50k: \${avg*50000:.0f} USD)')
"

Reference cost table (per 100 articles)

| Method | Estimated cost / 100 | Projected / 50 000 | Key cost driver |
|---|---|---|---|
| V1 | $5 – $10 | $2 500 – $5 000 | 5Γ— web search + 5-pass GPT-4.1 |
| V2 | $0.10 – $0.30 | $50 – $150 | GPT-5 Mini rewrite |
| V3 | $0.10 – $0.40 | $50 – $200 | Gemini 2.5 Flash (thinking) |
| V4 | $0.30 – $0.50 | $150 – $250 | Claude Haiku 4.5 |
| Meta SEO | $0.04 | $20 | GPT-4o-mini structured JSON |
If actual costs are more than 30% higher than estimated on the first 100, check: (1) average seo_content length β€” V1 truncates at 4 500 chars but all input is charged; (2) Pass 5 structured extraction may be retrying due to JSON parse errors β€” add logging to chatJSON() in the V1 script.

Step 5 β€” Pre-launch validation: smoke test + checkpoint strategy

Before committing to a 50 000-article run, two things will save your weekend: a multi-country smoke test (1 random service per country β€” catches locale bugs, bad API keys, and config errors in under 5 min) and checkpoint saves every 500 services so a crash at article 12 000 doesn't cost you 6 hours of API spend.

45-country smoke test β€” 1 random service per country

Extract one service per country and run the full pipeline on this tiny slice before the production batch:

# 1. Build smoke-test file β€” 1 random service per country
python3 -c "
import json, random
with open('services.json') as f:
    d = json.load(f)
svcs = d['services'] if isinstance(d, dict) else d

by_country = {}
for s in svcs:
    cc = s.get('country_code') or s.get('country', 'XX')
    by_country.setdefault(cc, []).append(s)

smoke = [random.choice(v) for v in by_country.values()]
out   = {'country': 'MULTI', 'services': smoke}
with open('smoke_test.json', 'w') as f:
    json.dump(out, f, indent=2)
print(f'Wrote {len(smoke)} services across {len(by_country)} countries β†’ smoke_test.json')
"

# 2. Run your chosen method on the smoke file (no --test flag: we want ALL countries)
python3 billoff_v2_generate.py smoke_test.json
python3 billoff_v5_metadata.py smoke_test.json

# 3. Spot-check output β€” for each country verify:
#    - language is correct (no English article for a German service)
#    - currency symbol is right (€ not A$ for France)
#    - consumer law name is injected (not blank or AU fallback)
#    - headings are in sentence case
#    - article length is β‰₯ 1 500 words
python3 -c "
import json
with open('smoke_test.json') as f: d = json.load(f)
svcs = d['services'] if isinstance(d, dict) else d
for s in svcs:
    name = s.get('name', '?')
    wc   = len(s.get('seo_content', '').split())
    ok   = 'βœ…' if wc >= 1500 else '❌'
    print(f'{ok}  {name:40s}  {wc:,} words  \${s.get(\"cost_usd\", 0):.4f}')
"
If any country fails the smoke test β€” fix the locale config for that country (run guardrail --fix again), don't proceed to the full batch. A bad locale config silently generates wrong-language articles at scale.

Checkpoint saves β€” never lose more than 500 articles to a crash

Add this pattern to the main() of any generated script. The script saves a .checkpoint.json every 500 completed services and deletes it automatically on successful completion:

# Add near the top of main():
CHECKPOINT_EVERY = 500        # tune: smaller = safer, slightly slower I/O
_checkpoint = OUT_FILE.with_suffix(".checkpoint.json")

# Inside the ThreadPoolExecutor loop β€” replace the existing loop body:
results = []
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as ex:
    futures = {ex.submit(process_service, svc): svc for svc in services}
    for i, future in enumerate(as_completed(futures), 1):
        try:
            results.append(future.result())
        except Exception as e:
            print(f"  ❌ Unhandled: {e}")

        if CHECKPOINT_EVERY > 0 and i % CHECKPOINT_EVERY == 0:
            _checkpoint.parent.mkdir(parents=True, exist_ok=True)
            with open(_checkpoint, "w", encoding="utf-8") as f:
                json.dump(results, f, ensure_ascii=False, indent=2)
            print(f"  πŸ’Ύ Checkpoint {i}/{len(services)} β€” {len(results)} articles saved")

# Remove checkpoint file after a clean run:
if _checkpoint.exists():
    _checkpoint.unlink()
    print("  πŸ—‘  Checkpoint removed β€” run complete")

Auto-resume β€” restart exactly where you left off

Add this block at the start of main(), before the ThreadPoolExecutor. It reads the checkpoint (or the final output file if it already exists), skips already-processed services, and continues from where the previous run stopped:

# --- AUTO-RESUME BLOCK (add before the ThreadPoolExecutor loop) ---
results     = []
already_done = set()

if OUT_FILE.exists():
    with open(OUT_FILE, encoding="utf-8") as f:
        results = json.load(f)
    already_done = {r.get("service_name", "") for r in results}
    print(f"  ↩  Output file exists: {len(already_done)} articles already done β€” skipping")
elif _checkpoint.exists():
    with open(_checkpoint, encoding="utf-8") as f:
        results = json.load(f)
    already_done = {r.get("service_name", "") for r in results}
    print(f"  ↩  Checkpoint found: {len(already_done)} done β€” resuming from checkpoint")

services = [s for s in services if s.get("name", "") not in already_done]

if not services:
    print("  βœ… All services already processed β€” nothing to do")
    sys.exit(0)

print(f"  β†’ {len(services)} remaining ({len(already_done)} already done)")
# --- END RESUME BLOCK ---
Usage: Just re-run the same command β€” no extra flag needed. The script auto-detects the checkpoint or output file and skips completed services.

The generated scripts already include --checkpoint N (default 500) and --resume flags that activate this pattern out of the box.

Recommended pre-launch sequence for a 45-country batch

# Day 0 β€” validation
python3 billoff_guardrail.py services.json --fix              # fix locale fields for all countries
python3 billoff_v2_generate.py smoke_test.json               # smoke test: 45 articles (1/country)
python3 billoff_v5_metadata.py smoke_test.json               # verify metadata for all locales
                                                              # β†’ review output manually before going further

# Day 1 β€” cost gate
python3 billoff_v2_generate.py services_100.json             # 100-article cost check (see Β§4)
                                                              # β†’ confirm cost Γ— 500 is within budget

# Day 2 β€” production run (with checkpoint + resume ready)
python3 billoff_v2_generate.py services.json --workers 50    # will auto-resume if interrupted
python3 billoff_v5_metadata.py services.json --workers 100   # run after articles

Step 6 β€” Tuning worker count for maximum throughput

All Billoff batch scripts process services in parallel using an async worker pool. The number of concurrent API requests β€” MAX_WORKERS β€” is the single most impactful performance knob. Too low: your 50 000-service run takes days. Too high: you hit rate limits and waste retries.

Default values per method

| Method | Default MAX_WORKERS | API | Rate limit context |
|---|---|---|---|
| V1 (web search + GPT-4.1) | 5 | OpenAI + web search | Web search is the binding constraint. OpenAI default 5 β€” raise to 10 max. |
| V2 (GPT-4.1 research + GPT-5 Mini write) | 10 | OpenAI | 2 API calls/service. Tier 1: keep ≀ 10. Tier 2+: safe up to 30. |
| V3 (Gemini 2.5 Flash) | 3 | Google AI | Free tier: 10 RPM β†’ keep ≀ 3. Pay tier (1 000 RPM): safe up to 50. |
| V4 (Claude Haiku 4.5) | 20 | Anthropic | Tier 4: 4 000 RPM / 800 K out-tokens/min β†’ ceiling ~33 (output-token bound). Safe: 20. Max: 30. |
| V5 Meta SEO (GPT-4o-mini) | 50 | OpenAI | Metadata only (~900 tokens/call). Tier 1: ≀ 30. Tier 3+: up to 150. |
Why is the V4 ceiling ~33 rather than 4 000 (the RPM limit)?
At Anthropic Tier 4 you have 4 000 RPM and 800 K output-tokens/min for Claude Haiku. A typical Billoff article is ~2 000 output tokens. That means the max sustainable throughput is 800 000 Γ· 2 000 = 400 articles/min. With ~5 s average latency per Haiku call, each worker delivers 60 Γ· 5 = 12 articles/min, so 400 Γ· 12 β‰ˆ 33 workers. The RPM limit (4 000) is not the constraint β€” the output-token budget is.
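The arithmetic above as a reusable helper (a sketch; the example numbers are the Tier 4 Haiku figures quoted in the text, not live limits):

```python
def worker_ceiling(out_tokens_per_min, tokens_per_article, latency_s):
    """Max sustainable workers when the output-token budget, not RPM, binds."""
    articles_per_min_budget = out_tokens_per_min / tokens_per_article  # 800 000 / 2 000 = 400
    articles_per_min_worker = 60 / latency_s                           # 60 / 5 = 12
    return int(articles_per_min_budget / articles_per_min_worker)

# Tier 4 Haiku figures from the text: worker_ceiling(800_000, 2_000, 5) β‰ˆ 33
```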
Find your OpenAI tier: platform.openai.com β†’ Settings β†’ Limits. Tier 1 starts at $5 spend. Tier 2 at $50. Tier 3 at $100. Each tier roughly 10Γ— the RPM/TPM ceiling.
Find your Anthropic tier: console.anthropic.com β†’ Settings β†’ Limits.

How to find your ideal worker count β€” step-by-step benchmark

Run this benchmark on a 50-service test file before your full batch:

# 1. Extract a 50-service benchmark slice
python3 -c "
import json
with open('services.json') as f: d = json.load(f)
svcs = d['services'] if isinstance(d, dict) else d
bench = svcs[:50]
out = {'country': d.get('country'), 'services': bench} if isinstance(d, dict) else bench
with open('bench50.json', 'w') as f: json.dump(out, f, indent=2)
print(f'Wrote {len(bench)} services to bench50.json')
"

# 2. Run at increasing worker levels β€” note duration and any 429 errors
MAX_WORKERS=5   python3 billoff_v2_generate.py bench50.json  # baseline
MAX_WORKERS=20  python3 billoff_v2_generate.py bench50.json
MAX_WORKERS=50  python3 billoff_v2_generate.py bench50.json
MAX_WORKERS=100 python3 billoff_v2_generate.py bench50.json

# 3. Pick the highest level with 0 rate-limit retries β€” that's your MAX_WORKERS
#    The script logs "Rate limit β€” sleeping Xs" on every retry. Count them.

To pass MAX_WORKERS via env var, add this near the top of each script:

import os
MAX_WORKERS = int(os.getenv("MAX_WORKERS", "20"))   # default 20, override via env

Progressive fallback pattern β€” automatic backoff on rate limit

For large unattended runs (overnight batches), build in a self-regulating worker count that halves on rate limit and recovers slowly. Add this wrapper around your semaphore loop:

import asyncio, time

async def process_with_adaptive_workers(services, generate_fn, initial_workers=20):
    """
    Process services with adaptive concurrency.
    Halves the worker count on rate limit; adds one worker back
    per 60 s of clean running, up to the initial level.
    """
    workers     = initial_workers
    min_workers = 2
    max_workers = initial_workers
    results     = []
    last_bump   = time.time()
    pending     = list(services)

    while pending:
        batch   = pending[:workers]
        pending = pending[workers:]
        sem     = asyncio.Semaphore(workers)

        async def _run(svc):
            async with sem:
                return await generate_fn(svc)

        tasks = [asyncio.create_task(_run(s)) for s in batch]
        rate_limited  = False
        batch_results = []

        for coro in asyncio.as_completed(tasks):
            try:
                batch_results.append(await coro)
            except asyncio.CancelledError:
                continue
            except Exception as e:
                if "429" in str(e) or "rate_limit" in str(e).lower():
                    rate_limited = True
                    for t in tasks:
                        t.cancel()               # abort the rest of the batch
                    pending = batch + pending    # requeue the whole batch
                    break
                raise

        if rate_limited:
            workers = max(min_workers, workers // 2)
            wait    = 60
            print(f"  ⚑ Rate limit hit β€” dropping to {workers} workers, sleeping {wait}s")
            await asyncio.sleep(wait)
        else:
            results.extend(batch_results)        # commit only clean batches
            if workers < max_workers and time.time() - last_bump > 60:
                workers  += 1                    # recover +1 per 60 s of clean running
                last_bump = time.time()

    return results
Usage: Call await process_with_adaptive_workers(services, generate_one) where generate_one(svc) is your per-service coroutine. The function halves concurrency on every 429 and slowly ramps back. For an overnight batch of 50 000 services this is the recommended pattern β€” it self-tunes without manual intervention.

Rate limit reference table

| Provider | Free / Tier 1 | Tier 2 | Tier 3+ | Key limit type |
|---|---|---|---|---|
| OpenAI (GPT-4.1, 5 Mini, 4o-mini) | 500 RPM | 5 000 RPM | 10 000 RPM | RPM + TPM β€” both enforced |
| Anthropic (Haiku 4.5) | 50 RPM | 1 000 RPM | 2 000 RPM | RPM + input tokens/min |
| Google AI (Gemini 2.5 Flash) | 10 RPM | 1 000 RPM | 2 000 RPM | RPM β€” thinking tokens count extra |
| Bing Web Search (V1) | 3 TPS / 1 000/mo | 10 TPS | custom | Transactions per second |
V1 is special: V1 runs 5 web search passes per article. At MAX_WORKERS=10 that's 50 concurrent search requests. Bing S1 tier allows 3 TPS β€” so keep V1 workers ≀ 3 on S1. Use Brave Search (unlimited tier) if you need higher parallelism on V1.

Recommended settings for a 50 000-service run

| Method | OpenAI Tier 1 | OpenAI Tier 2+ | Expected duration (50k) |
|---|---|---|---|
| V1 | 3 | 5 | ~7–14 days (search-bound) |
| V2 | 20 | 80 | ~8 h (Tier 2) / ~40 h (Tier 1) |
| V3 | 5 (free) / 40 (pay) | β€” | ~14 h (pay tier) |
| V4 | 10 | 30 (Tier 2) | ~20 h (Tier 2) |
| V5 Meta SEO | 30 | 100 | ~3 h (Tier 2) |

Estimates assume ~1.5 s average latency per API call. Actual duration depends on model load and token count.
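The same assumptions give a back-of-the-envelope duration formula (a sketch; real runs vary with model load, so plug in the latency you actually measured in your benchmark):

```python
def batch_duration_h(n_services, calls_per_service, avg_latency_s, workers):
    """Rough wall-clock hours for a batch, assuming workers stay saturated."""
    total_call_seconds = n_services * calls_per_service * avg_latency_s
    return total_call_seconds / workers / 3600
```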

Add to your go-live checklist: Before the full run, do a 50-service benchmark with your real API tier, note the clean worker ceiling, set MAX_WORKERS accordingly, and enable the adaptive fallback for overnight batches.

Step 7 β€” Dynamic sync: keep Postclic in sync

Once the initial bulk generation is done, set up a cron job to process only new or updated services β€” so every addition to Postclic automatically gets an article and metadata.

Architecture

Every X days / on new service addition:
──────────────────────────────────────────────────────────────────
Postclic DB β†’ JSON export     (your existing export mechanism)
     ↓
Find NEW services              (diff by slug or name vs last run)
     ↓
Run guardrail --fix            (populate locale fields)
     ↓
Run article generator          (V1/V2/V3/V4 on new services only)
     ↓
Run Meta SEO                   (metadata on new services only)
     ↓
Merge into master JSON         (append new results, keep existing)
     ↓
Import / publish               (your CMS import step)

Incremental sync script template

#!/usr/bin/env python3
"""billoff_sync.py β€” Incremental pipeline runner.
Finds services in NEW_EXPORT that are not yet in MASTER_JSON
(matched by name), processes them through the pipeline,
and merges results back into MASTER_JSON.
"""
import json, os, subprocess, sys
from pathlib import Path

MASTER_JSON  = "data/services.json"          # your accumulating master file
NEW_EXPORT   = "data/export_latest.json"      # latest Postclic data (any source)
NEW_ONLY     = "data/new_services.json"       # temp file for new services only
METHOD       = "v2"                          # v1 / v2 / v3 / v4
ARTICLE_SCRIPT = f"billoff_{METHOD}_generate.py"
META_SCRIPT    = "billoff_v5_metadata.py"

def _load_services(path):
    with open(path) as f: d = json.load(f)
    return (d["services"] if isinstance(d, dict) else d), d

def _save_services(path, services, original_data):
    if isinstance(original_data, list):
        out = services
    else:
        original_data["services"] = services
        out = original_data
    with open(path, "w") as f:
        json.dump(out, f, ensure_ascii=False, indent=2)

# Load master (existing processed services)
master_svcs, master_data = _load_services(MASTER_JSON)
existing_names = {s["name"] for s in master_svcs}

# Load latest export
new_svcs, new_data = _load_services(NEW_EXPORT)
new_only = [s for s in new_svcs if s["name"] not in existing_names]

if not new_only:
    print("  βœ…  No new services found β€” nothing to process.")
    sys.exit(0)

print(f"  πŸ†•  {len(new_only)} new services found β€” running pipeline...")

# Write new-only subset to temp file
_save_services(NEW_ONLY, new_only, {"country": master_data.get("country","AU"), "services": new_only})

# Guardrail β€” fix locale fields
subprocess.run(["python3", "billoff_guardrail.py", NEW_ONLY, "--fix"], check=True)

# Article generation
subprocess.run(["python3", ARTICLE_SCRIPT, NEW_ONLY], check=True)

# Meta SEO
subprocess.run(["python3", META_SCRIPT, NEW_ONLY], check=True)

# Merge processed new services back into master
processed_svcs, _ = _load_services(NEW_ONLY)
master_svcs.extend(processed_svcs)
_save_services(MASTER_JSON, master_svcs, master_data)

print(f"  βœ…  Merged {len(processed_svcs)} new articles into {MASTER_JSON}")
os.remove(NEW_ONLY)

Cron job setup

# Run every Monday at 3 AM β€” add to crontab with: crontab -e
0 3 * * 1  cd /var/www/billoff && python3 billoff_sync.py >> logs/sync.log 2>&1

# Or with an absolute interpreter path (e.g. a virtualenv's python3):
0 3 * * 1  cd /var/www/billoff && /usr/local/bin/python3 billoff_sync.py >> logs/sync.log 2>&1
How "new" is detected: The sync script matches by name field. If a service can be renamed on Postclic, use a stable id field instead β€” just replace the existing_names set-comparison with an existing_ids comparison.
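A hypothetical id-based variant of that comparison (assumes your export carries a stable id field; rows without it are treated as new, mirroring the name-based behaviour):

```python
def diff_new_services(master, export, key="id"):
    """Return export rows whose stable key is not yet present in master.

    The "id" key is a hypothetical stable identifier β€” substitute
    whatever field Postclic guarantees never changes.
    """
    seen = {s[key] for s in master if key in s}
    return [s for s in export if s.get(key) not in seen]
```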

Getting the Postclic data β€” your choice

The sync script just expects an AU_EXPORT_LATEST.json file as input. How you generate it is up to you β€” several approaches are valid.

Whatever the method β€” the Billoff pipeline doesn't care where the data comes from. It reads JSON, nothing more. The expected structure:

# Standard pipeline format
{ "country": "AU", "services": [ {"name": "...", "seo_content": "...", ...} ] }

# Or a plain array β€” both are accepted
[ {"name": "...", "seo_content": "...", ...} ]

Country configuration reference β€” all 37 Postclic markets

Every service row in your JSON must carry the locale fields below for the pipeline to generate correctly-localised articles (right language, currency, consumer law, cancel word). The guardrail --fix will auto-populate missing fields from the country defaults in this table β€” but only if country_code is present on each service row.

Always verify with AI for new countries. Consumer laws and regulatory body names change. Before launching a new market, ask GPT-4o or Claude to validate the consumer_law field β€” see the AI locale generator script below.

Required locale fields per service row

| Field | Example (FR) | Used by | Notes |
|---|---|---|---|
| country_code | FR | All methods + guardrail | ISO 3166-1 alpha-2 β€” required |
| country | France | V1–V4 prompts | Full country name in article language |
| language | French | V1–V4 prompts | Language name in English |
| currency | EUR | V1–V4 prompts | ISO 4217 currency code |
| currency_symbol | € | V1–V4 prompts | Symbol used in pricing tables |
| country_tld | .fr | V1 search queries | Country TLD for competitor search |
| cancel_word | RΓ©silier | V1–V4 headings | Native-language cancel verb |
| consumer_law | Loi Hamon / Code de la Consommation | V1–V4 legal section | Official law name β€” cited in articles |
| city / region / timezone | Paris / Île-de-France / Europe/Paris | V1 geo-targeting | Optional β€” improves web search relevance |
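Putting the required fields together, a minimal well-formed FR service row looks like this (locale values come from the reference above; the service name and keyword are illustrative):

```python
service_row = {
    "name": "Canal+",                  # critical β€” pipeline crashes without it (illustrative name)
    "country_code": "FR",
    "country": "France",
    "language": "French",
    "currency": "EUR",
    "currency_symbol": "€",
    "country_tld": ".fr",
    "cancel_word": "RΓ©silier",
    "consumer_law": "Loi Hamon / Code de la Consommation",
    # optional quality fields
    "main_keyword": "rΓ©silier canal+",  # illustrative
    "seo_content": "",                  # filled by the article generator
}
```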

Complete reference β€” all 37 markets

These values are baked into the guardrail --fix auto-population and the REVIEW_SITES map used in V1 web searches.

| Code | Country | Language | Currency | Symbol | cancel_word | consumer_law (short) | TLD | review_sites |
|---|---|---|---|---|---|---|---|---|
| AE | UAE | Arabic | AED | AED | Ψ₯Ω„ΨΊΨ§Ψ‘ | Federal Law 15/2020 | .ae | Google Reviews |
| AR | Argentina | Spanish | ARS | $ | Cancelar | Ley Defensa Consumidor 24.240 | .ar | Trustpilot.ar, Mercado Libre |
| AT | Austria | German | EUR | € | KΓΌndigen | Konsumentenschutzgesetz (KSchG) | .at | Trustpilot.at |
| BE | Belgium | French | EUR | € | Annuler | Code de droit Γ©conomique (CDE) | .be | Trustpilot.be |
| BG | Bulgaria | Bulgarian | BGN | Π»Π² | АнулиранС | Π—Π°ΠΊΠΎΠ½ Π·Π° Π·Π°Ρ‰ΠΈΡ‚Π° Π½Π° ΠΏΠΎΡ‚Ρ€Π΅Π±ΠΈΡ‚Π΅Π»ΠΈΡ‚Π΅ | .bg | Trustpilot.bg |
| BR | Brazil | Portuguese | BRL | R$ | Cancelar | CΓ³digo de Defesa do Consumidor (CDC) | .br | Reclame Aqui |
| CA | Canada | English | CAD | CA$ | Cancel | Consumer Protection Acts (provincial) | .ca | Trustpilot.ca, BBB |
| CH | Switzerland | German | CHF | CHF | KΓΌndigen | Konsumentenschutzgesetz | .ch | Trustpilot.ch |
| CL | Chile | Spanish | CLP | $ | Cancelar | Ley 19.496 (LPDC) | .cl | Reclamos.cl |
| CO | Colombia | Spanish | COP | $ | Cancelar | Estatuto del Consumidor (Ley 1480) | .co | SIC Colombia |
| CZ | Czech Republic | Czech | CZK | Kč | ZruΕ‘it | ZΓ‘kon o ochranΔ› spotΕ™ebitele | .cz | Heureka.cz |
| DE | Germany | German | EUR | € | KΓΌndigen | BGB Β§Β§ 312 / Widerrufsrecht | .de | Trusted Shops, Trustpilot.de |
| DK | Denmark | Danish | DKK | kr | Opsige | ForbrugerkΓΈbsloven | .dk | Trustpilot.dk |
| ES | Spain | Spanish | EUR | € | Cancelar | LGDCU / Ley General Defensa Consumidores | .es | Trustpilot.es |
| FI | Finland | Finnish | EUR | € | Peruuttaa | Kuluttajansuojalaki | .fi | Trustpilot.fi |
| FR | France | French | EUR | € | RΓ©silier | Loi Hamon / Code de la Consommation | .fr | Avis VΓ©rifiΓ©s, Trustpilot.fr |
| GB | United Kingdom | English | GBP | Β£ | Cancel | Consumer Rights Act 2015 | .co.uk | Trustpilot.co.uk, Reviews.io |
| GR | Greece | Greek | EUR | € | Ακύρωση | ΞΟŒΞΌΞΏΟ‚ 2251/1994 (Consumer Protection) | .gr | Trustpilot.gr, Skroutz.gr |
| HU | Hungary | Hungarian | HUF | Ft | LemondΓ‘s | FogyasztΓ³vΓ©delmi tΓΆrvΓ©ny (1997. Γ©vi CLV) | .hu | ÁrukeresΕ‘.hu |
| ID | Indonesia | Indonesian | IDR | Rp | Batalkan | UU Perlindungan Konsumen No. 8/1999 | .id | Google Reviews |
| IE | Ireland | English | EUR | € | Cancel | Consumer Rights Act 2022 | .ie | Trustpilot.ie |
| IN | India | English | INR | β‚Ή | Cancel | Consumer Protection Act 2019 (CCPA) | .in | MouthShut.com, Trustpilot.in |
| IT | Italy | Italian | EUR | € | Disdetta | Codice del Consumo (D.Lgs. 206/2005) | .it | Trustpilot.it, eKomi |
| JP | Japan | Japanese | JPY | Β₯ | γ‚­γƒ£γƒ³γ‚»γƒ« | ζΆˆθ²»θ€…ε₯‘約法 (Consumer Contract Act) | .jp | Kakaku.com |
| NG | Nigeria | English | NGN | ₦ | Cancel | FCCPC Consumer Protection Act 2019 | .ng | Google Reviews |
| NL | Netherlands | Dutch | EUR | € | Opzeggen | Burgerlijk Wetboek (Consumentenwet) | .nl | Trustpilot.nl, Kiyoh |
| NO | Norway | Norwegian | NOK | kr | Avslutte | ForbrukerkjΓΈpsloven | .no | Trustpilot.no |
| NZ | New Zealand | English | NZD | NZ$ | Cancel | Consumer Guarantees Act 1993 | .co.nz | Consumer.org.nz |
| PE | Peru | Spanish | PEN | S/ | Cancelar | CΓ³digo de ProtecciΓ³n del Consumidor (Ley 29571) | .pe | Indecopi.gob.pe |
| PL | Poland | Polish | PLN | zΕ‚ | Anuluj | Ustawa o prawach konsumenta | .pl | Opineo.pl, Ceneo.pl |
| PT | Portugal | Portuguese | EUR | € | Cancelar | Lei de Defesa do Consumidor | .pt | Trustpilot.pt |
| RO | Romania | Romanian | RON | lei | Anulare | Legea nr. 449/2003 | .ro | Trustpilot.ro |
| SE | Sweden | Swedish | SEK | kr | Avsluta | KonsumentkΓΆplagen | .se | Trustpilot.se, Prisjakt |
| SG | Singapore | English | SGD | S$ | Cancel | Consumer Protection (Fair Trading) Act (CPFTA) | .sg | Trustpilot.sg |
| TR | Turkey | Turkish | TRY | β‚Ί | Δ°ptal | TΓΌketici Kanunu No. 6502 | .tr | Şikayetvar.com |
| US | United States | English | USD | $ | Cancel | FTC regulations / State consumer protection laws | .com | BBB, Trustpilot.com, Yelp |
| ZA | South Africa | English | ZAR | R | Cancel | Consumer Protection Act 68/2008 | .co.za | Hellopeter.com |

AI-powered locale field generator β€” fill any missing country

When a service row is missing locale fields (or you're adding a country not in the table above), run this script. It uses GPT-4o-mini with structured output to generate all locale fields accurately, then writes them back into your JSON.

#!/usr/bin/env python3
"""billoff_locale_fill.py β€” AI-powered locale field generator.
Fills missing locale fields (language, currency, consumer_law, cancel_word, etc.)
for any country, using GPT-4o-mini structured output.
Run BEFORE the guardrail and before any generation script.
"""
import json, os, sys
from openai import OpenAI
from pathlib import Path

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY") or ""
DATA_FILE      = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("data/services.json")
client         = OpenAI(api_key=OPENAI_API_KEY)

# ── Hard-coded defaults for all 37 Postclic markets ──────────────
# These are always preferred over AI calls for known countries.
LOCALE_DEFAULTS = {
  "AE": {"country":"UAE","language":"Arabic","currency":"AED","currency_symbol":"AED","country_tld":".ae","cancel_word":"Ψ₯Ω„ΨΊΨ§Ψ‘","consumer_law":"UAE Consumer Protection Law (Federal Law 15/2020)"},
  "AR": {"country":"Argentina","language":"Spanish","currency":"ARS","currency_symbol":"$","country_tld":".ar","cancel_word":"Cancelar","consumer_law":"Ley de Defensa del Consumidor (Ley 24.240)"},
  "AT": {"country":"Austria","language":"German","currency":"EUR","currency_symbol":"€","country_tld":".at","cancel_word":"Kündigen","consumer_law":"Konsumentenschutzgesetz (KSchG)"},
  "BE": {"country":"Belgium","language":"French","currency":"EUR","currency_symbol":"€","country_tld":".be","cancel_word":"Annuler","consumer_law":"Code de droit économique (CDE)"},
  "BG": {"country":"Bulgaria","language":"Bulgarian","currency":"BGN","currency_symbol":"лв","country_tld":".bg","cancel_word":"Анулиране","consumer_law":"Закон за защита на потребителите"},
  "BR": {"country":"Brazil","language":"Portuguese","currency":"BRL","currency_symbol":"R$","country_tld":".br","cancel_word":"Cancelar","consumer_law":"Código de Defesa do Consumidor (CDC)"},
  "CA": {"country":"Canada","language":"English","currency":"CAD","currency_symbol":"CA$","country_tld":".ca","cancel_word":"Cancel","consumer_law":"Consumer Protection Acts (provincial)"},
  "CH": {"country":"Switzerland","language":"German","currency":"CHF","currency_symbol":"CHF","country_tld":".ch","cancel_word":"Kündigen","consumer_law":"Konsumentenschutzgesetz"},
  "CL": {"country":"Chile","language":"Spanish","currency":"CLP","currency_symbol":"$","country_tld":".cl","cancel_word":"Cancelar","consumer_law":"Ley 19.496 (LPDC)"},
  "CO": {"country":"Colombia","language":"Spanish","currency":"COP","currency_symbol":"$","country_tld":".co","cancel_word":"Cancelar","consumer_law":"Estatuto del Consumidor (Ley 1480)"},
  "CZ": {"country":"Czech Republic","language":"Czech","currency":"CZK","currency_symbol":"Kč","country_tld":".cz","cancel_word":"Zrušit","consumer_law":"Zákon o ochraně spotřebitele"},
  "DE": {"country":"Germany","language":"German","currency":"EUR","currency_symbol":"€","country_tld":".de","cancel_word":"Kündigen","consumer_law":"BGB §§ 312 ff. / Widerrufsrecht"},
  "DK": {"country":"Denmark","language":"Danish","currency":"DKK","currency_symbol":"kr","country_tld":".dk","cancel_word":"Opsige","consumer_law":"Forbrugerkøbsloven"},
  "ES": {"country":"Spain","language":"Spanish","currency":"EUR","currency_symbol":"€","country_tld":".es","cancel_word":"Cancelar","consumer_law":"Ley General para la Defensa de los Consumidores (LGDCU)"},
  "FI": {"country":"Finland","language":"Finnish","currency":"EUR","currency_symbol":"€","country_tld":".fi","cancel_word":"Peruuttaa","consumer_law":"Kuluttajansuojalaki"},
  "FR": {"country":"France","language":"French","currency":"EUR","currency_symbol":"€","country_tld":".fr","cancel_word":"Résilier","consumer_law":"Loi Hamon / Code de la Consommation"},
  "GB": {"country":"United Kingdom","language":"English","currency":"GBP","currency_symbol":"£","country_tld":".co.uk","cancel_word":"Cancel","consumer_law":"Consumer Rights Act 2015"},
  "GR": {"country":"Greece","language":"Greek","currency":"EUR","currency_symbol":"€","country_tld":".gr","cancel_word":"Ακύρωση","consumer_law":"Νόμος 2251/1994 (Consumer Protection)"},
  "HU": {"country":"Hungary","language":"Hungarian","currency":"HUF","currency_symbol":"Ft","country_tld":".hu","cancel_word":"Lemondás","consumer_law":"Fogyasztóvédelmi törvény (1997. évi CLV)"},
  "ID": {"country":"Indonesia","language":"Indonesian","currency":"IDR","currency_symbol":"Rp","country_tld":".id","cancel_word":"Batalkan","consumer_law":"UU Perlindungan Konsumen No. 8/1999"},
  "IE": {"country":"Ireland","language":"English","currency":"EUR","currency_symbol":"€","country_tld":".ie","cancel_word":"Cancel","consumer_law":"Consumer Rights Act 2022"},
  "IN": {"country":"India","language":"English","currency":"INR","currency_symbol":"₹","country_tld":".in","cancel_word":"Cancel","consumer_law":"Consumer Protection Act 2019 (CCPA)"},
  "IT": {"country":"Italy","language":"Italian","currency":"EUR","currency_symbol":"€","country_tld":".it","cancel_word":"Disdetta","consumer_law":"Codice del Consumo (D.Lgs. 206/2005)"},
  "JP": {"country":"Japan","language":"Japanese","currency":"JPY","currency_symbol":"¥","country_tld":".jp","cancel_word":"キャンセル","consumer_law":"消費者契約法 (Consumer Contract Act)"},
  "NG": {"country":"Nigeria","language":"English","currency":"NGN","currency_symbol":"₦","country_tld":".ng","cancel_word":"Cancel","consumer_law":"FCCPC Consumer Protection Act 2019"},
  "NL": {"country":"Netherlands","language":"Dutch","currency":"EUR","currency_symbol":"€","country_tld":".nl","cancel_word":"Opzeggen","consumer_law":"Burgerlijk Wetboek (Consumentenwet)"},
  "NO": {"country":"Norway","language":"Norwegian","currency":"NOK","currency_symbol":"kr","country_tld":".no","cancel_word":"Avslutte","consumer_law":"Forbrukerkjøpsloven"},
  "NZ": {"country":"New Zealand","language":"English","currency":"NZD","currency_symbol":"NZ$","country_tld":".co.nz","cancel_word":"Cancel","consumer_law":"Consumer Guarantees Act 1993"},
  "PE": {"country":"Peru","language":"Spanish","currency":"PEN","currency_symbol":"S/","country_tld":".pe","cancel_word":"Cancelar","consumer_law":"Código de Protección del Consumidor (Ley 29571)"},
  "PL": {"country":"Poland","language":"Polish","currency":"PLN","currency_symbol":"zł","country_tld":".pl","cancel_word":"Anuluj","consumer_law":"Ustawa o prawach konsumenta"},
  "PT": {"country":"Portugal","language":"Portuguese","currency":"EUR","currency_symbol":"€","country_tld":".pt","cancel_word":"Cancelar","consumer_law":"Lei de Defesa do Consumidor"},
  "RO": {"country":"Romania","language":"Romanian","currency":"RON","currency_symbol":"lei","country_tld":".ro","cancel_word":"Anulare","consumer_law":"Legea nr. 449/2003"},
  "SE": {"country":"Sweden","language":"Swedish","currency":"SEK","currency_symbol":"kr","country_tld":".se","cancel_word":"Avsluta","consumer_law":"Konsumentköplagen"},
  "SG": {"country":"Singapore","language":"English","currency":"SGD","currency_symbol":"S$","country_tld":".sg","cancel_word":"Cancel","consumer_law":"Consumer Protection (Fair Trading) Act (CPFTA)"},
  "TR": {"country":"Turkey","language":"Turkish","currency":"TRY","currency_symbol":"₺","country_tld":".tr","cancel_word":"İptal","consumer_law":"Tüketici Kanunu No. 6502"},
  "US": {"country":"United States","language":"English","currency":"USD","currency_symbol":"$","country_tld":".com","cancel_word":"Cancel","consumer_law":"FTC regulations / State consumer protection laws"},
  "ZA": {"country":"South Africa","language":"English","currency":"ZAR","currency_symbol":"R","country_tld":".co.za","cancel_word":"Cancel","consumer_law":"Consumer Protection Act 68/2008"},
}

def fill_with_ai(country_code: str, country_name: str) -> dict:
    """Use GPT-4o-mini to generate locale fields for unknown countries."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content":
            f"Return JSON with locale fields for country_code={country_code}, country={country_name}. "
            "Fields: country (full English name), language (main official language in English), "
            "currency (ISO 4217 code), currency_symbol (display symbol), country_tld (ccTLD), "
            "cancel_word (verb meaning 'to cancel/terminate a subscription' in the official language), "
            "consumer_law (official name of the main consumer protection law + citation). "
            "Be precise with legal citations. Return only valid JSON."
        }]
    )
    return json.loads(resp.choices[0].message.content)  # json is imported at module level

def fill_service(svc: dict) -> dict:
    cc = (svc.get("country_code") or "").upper()
    if not cc:
        return svc
    defaults = LOCALE_DEFAULTS.get(cc)
    if not defaults:
        print(f"  🤖 AI generating locale for unknown country: {cc}")
        defaults = fill_with_ai(cc, svc.get("country", cc))
    for k, v in defaults.items():
        if not svc.get(k):  # never overwrite existing values
            svc[k] = v
    return svc

# ── Main ────────────────────────────────────────
with open(DATA_FILE, encoding="utf-8") as f:
    data = json.load(f)
svcs = data["services"] if isinstance(data, dict) else data

LOCALE_KEYS = ["country", "language", "currency", "currency_symbol",
               "country_tld", "cancel_word", "consumer_law"]
filled = 0
for svc in svcs:
    before = {k: svc.get(k) for k in LOCALE_KEYS}
    fill_service(svc)
    if before != {k: svc.get(k) for k in LOCALE_KEYS}:
        filled += 1

if isinstance(data, dict):
    data["services"] = svcs
with open(DATA_FILE, "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)
print(f"✅ Locale fill complete: {filled}/{len(svcs)} services updated → {DATA_FILE}")
Usage:
python3 billoff_locale_fill.py services.json
# For any country not in the 37-country reference table, the script automatically
# calls GPT-4o-mini to generate the locale fields; it never leaves fields blank.
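The fill rule ("never overwrite existing values") can be sketched in isolation. `fill_missing` and `DEFAULTS_FR` below are illustrative stand-ins, not part of the shipped script:

```python
# Illustrative sketch of the fill rule: keep existing values,
# fill only empty or missing locale fields from the defaults table.
DEFAULTS_FR = {  # hypothetical subset of a LOCALE_DEFAULTS entry
    "country": "France", "language": "French",
    "currency": "EUR", "cancel_word": "Résilier",
}

def fill_missing(svc: dict, defaults: dict) -> dict:
    for k, v in defaults.items():
        if not svc.get(k):  # None, missing, or empty string -> fill
            svc[k] = v
    return svc

svc = {"country_code": "FR", "language": "Canadian French", "currency": ""}
fill_missing(svc, DEFAULTS_FR)
print(svc["language"])  # Canadian French  (pre-existing value kept)
print(svc["currency"])  # EUR              (empty string treated as missing)
```

Note that an empty string counts as missing, which is why re-running the script on a partially filled file is safe.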

Then run billoff_guardrail.py --fix as a second validation pass.
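As a rough illustration of what that second pass checks (the real billoff_guardrail.py has its own field list and fix logic; the `REQUIRED` list here is an assumption):

```python
# Hypothetical sketch of a guardrail-style validation pass.
# REQUIRED is an assumed field list, not the guardrail's actual rules.
REQUIRED = ["country_code", "language", "currency", "cancel_word", "consumer_law"]

def find_gaps(services: list[dict]) -> list[tuple[int, list[str]]]:
    """Return (index, missing_fields) for every incomplete service."""
    gaps = []
    for i, svc in enumerate(services):
        missing = [k for k in REQUIRED if not svc.get(k)]
        if missing:
            gaps.append((i, missing))
    return gaps

services = [
    {"country_code": "FR", "language": "French", "currency": "EUR",
     "cancel_word": "Résilier", "consumer_law": "Loi Hamon"},
    {"country_code": "XX", "language": "English"},  # deliberately incomplete
]
print(find_gaps(services))  # [(1, ['currency', 'cancel_word', 'consumer_law'])]
```

An empty result is the equivalent of the guardrail's clean exit; any non-empty result is what --fix would try to patch.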

Adding a new country (not yet in the 37-country list)

  1. Add country_code to service rows: this is the only mandatory field. All other locale fields can be auto-generated.
  2. Run billoff_locale_fill.py: fills language, currency, cancel_word, consumer_law via the hardcoded table (known countries) or GPT-4o-mini (unknown countries).
  3. Run billoff_guardrail.py --fix: validates and patches any remaining gaps.
  4. Verify the consumer_law field manually: AI-generated legal citations can be outdated. Cross-check against the official government source before launching at scale.
  5. Add to REVIEW_SITES in download-helpers.js and openai.js: find the 2–3 most trusted review platforms for that country and add an entry. This improves Pass 1 web search quality for consumer ratings.
  6. Run the smoke test: 1 random service from the new country; verify language, currency symbol, legal references, and heading capitalisation in the output article.
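Step 6's per-country sampling can be sketched like this (an illustrative helper, not one of the shipped scripts):

```python
import random

def smoke_sample(services: list[dict], seed: int = 0) -> dict:
    """Pick one random service per country_code for manual review."""
    by_cc: dict[str, list[dict]] = {}
    for svc in services:
        by_cc.setdefault(svc.get("country_code", "??"), []).append(svc)
    rng = random.Random(seed)  # fixed seed -> reproducible spot-check
    return {cc: rng.choice(group) for cc, group in sorted(by_cc.items())}

services = [
    {"country_code": "FR", "name": "Netflix FR"},
    {"country_code": "FR", "name": "Canal+"},
    {"country_code": "DE", "name": "Netflix DE"},
]
sample = smoke_sample(services)
print(sorted(sample))  # ['DE', 'FR']  -> one service per country
```

The fixed seed keeps the sample reproducible, so a reviewer flagging a bad article can be pointed at the exact same service later.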

Go-live checklist

☐ JSON export received for each country – one file per market, named XX_SERVICES.json
☐ Guardrail run with --fix – exit code 0 required before V1; exit 0 or 2 acceptable for V2/V3/V4
☐ API keys set in each script – never commit keys to git; use env vars in production
☐ Test run on 5 services per method – use the --test flag and review the output manually
☐ Cost verified on first 100 articles – especially critical for V1; see §4
☐ 45-country smoke test passed – 1 random service per country; verify language, currency, consumer law, sentence case; see §5
☐ Worker benchmark done (50 services) – MAX_WORKERS set to a clean ceiling (see §6); adaptive fallback enabled for overnight runs
☐ Backup of original JSON kept – cp SOURCE.json SOURCE.backup.json
☐ First batch reviewed manually – spot-check 10+ articles across different categories and countries
☐ Cron / sync script tested in dry-run – run billoff_sync.py with a small test export first
☐ Import to CMS / DB tested end-to-end – verify all fields (slug, faq, seo_title) map correctly to your schema
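For the last check, a minimal pre-import gate might look like this (`CMS_FIELDS` is an assumed minimal set; match it to your actual CMS schema):

```python
# Hypothetical pre-import gate: refuse to push services whose
# SEO fields never got generated. CMS_FIELDS is an assumption here.
CMS_FIELDS = ["slug", "seo_title", "faq"]

def import_ready(svc: dict) -> bool:
    return all(svc.get(k) for k in CMS_FIELDS)

svc_ok = {"slug": "netflix-fr", "seo_title": "Cancel Netflix",
          "faq": [{"q": "...", "a": "..."}]}
svc_bad = {"slug": "canal-plus", "seo_title": "", "faq": []}
print(import_ready(svc_ok), import_ready(svc_bad))  # True False
```

Running this over the enriched JSON before the CMS import catches services that slipped through article generation with empty SEO fields.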