Everything your developer needs to wire up the Billoff pipeline: choosing a generation method, validating data, verifying costs, and keeping the site in sync automatically.
Each country runs through this sequence. The JSON export is the entry point; every subsequent script enriches it in place.
Both input shapes are accepted: `{"services": [...]}` (pipeline format) and a plain `[...]` array.
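Both shapes can be handled with one small loader. This is a sketch; `load_services` is an illustrative name, not a function shipped with the pipeline:

```python
import json

def load_services(path):
    """Load a Billoff export, accepting either the pipeline format
    ({"services": [...]}) or a plain [...] array."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    services = data["services"] if isinstance(data, dict) else data
    return services, data  # keep the original container for re-saving
```

Returning the original container lets you write the file back in the same shape it arrived in.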
Run every script with the --test flag before committing to 50 000 services, and inspect the output manually. Each script contains an OPENAI_API_KEY / GEMINI_API_KEY / ANTHROPIC_API_KEY placeholder that you must fill in locally. Each method produces a standalone Python script: go to the corresponding docs page, set your editor config if needed, then click Python Script … live config.
| Method | Model | Input | Quality | Cost / article | Speed |
|---|---|---|---|---|---|
| V1 – Research | GPT-4.1 → GPT-5 Mini | Web search (5 passes) | ★★★★★ | $0.05–0.10 | 60–120 s |
| V2 – Rewrite | GPT-5 Mini | Existing seo_content | ★★★★ | $0.001–0.003 | 10–20 s |
| V3 – Gemini | Gemini 2.5 Flash | Existing seo_content | ★★★★ | $0.001–0.004 | 15–30 s |
| V4 – Claude | Claude Haiku 4.5 | Existing seo_content | ★★★★★ | $0.003–0.005 | 20–35 s |
| Meta SEO | GPT-4o-mini | Enriched seo_content | – | $0.0004 | 1–3 s |
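The per-article costs above make budget projection a one-liner. A sketch, using the midpoints of the table's cost ranges as assumed averages (`project_cost` and the constants are illustrative, not part of the pipeline):

```python
# Midpoint cost/article from the comparison table (USD) – assumptions, not quotes
COST_PER_ARTICLE = {"V1": 0.075, "V2": 0.002, "V3": 0.0025, "V4": 0.004, "meta": 0.0004}

def project_cost(n_services, method):
    """Projected USD cost for n_services articles with the given method,
    including the Meta SEO pass that always runs afterwards."""
    article = COST_PER_ARTICLE[method] * n_services
    meta = COST_PER_ARTICLE["meta"] * n_services
    return round(article + meta, 2)
```

For example, `project_cost(50_000, "V2")` projects the full-catalogue V2 run plus metadata; always validate against a real 100-article batch (see the cost gate below) before trusting the projection.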
```python
# V1 and V2 – OpenAI
OPENAI_API_KEY = "sk-..."

# V3 – Google Gemini
GEMINI_API_KEY = "AIzaSy..."

# V4 – Anthropic Claude
ANTHROPIC_API_KEY = "sk-ant-..."

# Meta SEO – OpenAI (gpt-4o-mini)
OPENAI_API_KEY = "sk-..."
```
Download the guardrail script and run it before any generation. It validates your JSON, reports missing fields, and optionally auto-populates locale data (language, currency, cancel_word, consumer_law, slug_prefix) from built-in country defaults.
```shell
# Audit only (no file modified)
python3 billoff_guardrail.py services.json

# Auto-populate locale fields + overwrite
python3 billoff_guardrail.py services.json --fix

# Safe: write to a separate file
python3 billoff_guardrail.py services.json --fix --out services_clean.json
```
| Category | Fields | Behaviour if missing |
|---|---|---|
| 🔴 Critical | name | Exit code 1: pipeline will crash. Must be fixed manually. |
| 🟡 Locale | language · currency · currency_symbol · cancel_word · consumer_law | Exit code 2: auto-fixable with --fix from COUNTRY_DEFAULTS. Article will be in the wrong language/currency if missing. |
| 🔵 Quality | main_keyword · keywords · seo_content · website · cancellation_address | Exit code 2: won't block execution, but reduces article depth and accuracy. |
The script detects the country from:

- a `"country": "FR"` (or any ISO code) field in the JSON root
- the filename prefix: FR_services.json, DE_services.json, etc.

If detection fails, add `"country": "XX"` (your ISO code) to the root of your JSON, or rename the file to start with the two-letter country code.
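The two detection rules can be mirrored in a few lines. A sketch of the documented order (root field first, then filename prefix); `detect_country` is an illustrative name, not the guardrail's internal function:

```python
import json
import re
from pathlib import Path

def detect_country(path):
    """Detect the country code: 1) "country" field in the JSON root,
    2) two-letter filename prefix like FR_services.json. Returns None
    when neither rule matches."""
    p = Path(path)
    try:
        with open(p, encoding="utf-8") as f:
            data = json.load(f)
        if isinstance(data, dict) and isinstance(data.get("country"), str):
            return data["country"].upper()
    except (OSError, json.JSONDecodeError):
        pass
    m = re.match(r"^([A-Za-z]{2})_", p.name)
    return m.group(1).upper() if m else None
```

A plain-array export has no root field, so for those files only the filename rule can fire.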
```
═════════════════════════════════════════════════════════════════
 Billoff Pre-Pipeline Guardrail
═════════════════════════════════════════════════════════════════
 File:     services.json
 Services: 4821
 Country:  FR  (detected from JSON root "country" field)
 Defaults: French · EUR € · Résiliation

 Field                        Missing   Coverage   Note
 ───────────────────────────  ───────   ────────   ──────────────
 ✓ name                             0      100%
 ✓ language                         0      100%
 🟡 locale currency                42       99%    auto-fixable
 🟡 locale currency_symbol         42       99%    auto-fixable
 ✓ cancel_word                      0      100%
 ✓ consumer_law                     0      100%
 🔵 quality main_keyword          183       96%    reduces quality
 🔵 quality seo_content            12      100%    reduces quality

 ✅ No critical issues: pipeline can run.
 🟡 42 locale fields missing: run --fix to auto-populate from FR defaults.
 🔵 195 quality fields missing: article depth may be reduced.
```
```shell
# Run on 5 random services – detailed output, no file modified
python3 billoff_v2_generate.py services.json --test
python3 billoff_v5_metadata.py services.json --test
```
```shell
# 1. Article generation (pick your method)
python3 billoff_v2_generate.py services.json

# 2. SEO metadata (always run after article generation)
python3 billoff_v5_metadata.py services.json

# services.json now has: seo_content + seo_title + h1 + seo_description + slug + faq
```
seo_content (HTML article) · seo_title · h1 · seo_description · slug · faq[]
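After both scripts have run, you can verify field completeness per service before publishing. A sketch (`missing_output_fields` is an illustrative helper, not part of the pipeline):

```python
# The six fields the two-step pipeline is expected to produce
REQUIRED_OUTPUT = ["seo_content", "seo_title", "h1", "seo_description", "slug", "faq"]

def missing_output_fields(service):
    """Return the list of expected output fields that are absent or empty."""
    return [k for k in REQUIRED_OUTPUT if not service.get(k)]
```

Run it over the whole file and any non-empty result points at a service that needs regenerating.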
Especially for V1 (web search), always run a 100-article batch first and compare actual vs estimated costs before committing to the full country catalogue.
Create a test file with the first 100 services from your JSON:
```shell
# One-liner to extract the first 100 services into a test file
python3 -c "
import json
with open('services.json') as f:
    d = json.load(f)
# supports both formats
svcs = d['services'] if isinstance(d, dict) else d
test = svcs[:100]
out = {'country': d.get('country'), 'services': test} if isinstance(d, dict) else test
with open('services_100.json', 'w') as f:
    json.dump(out, f, indent=2)
print(f'Wrote {len(test)} services to services_100.json')
"

# Run the pipeline on these 100
python3 billoff_v1_generate.py services_100.json
python3 billoff_v5_metadata.py services_100.json

# Sum up actual costs from the output file
python3 -c "
import json
with open('services_100.json') as f:
    d = json.load(f)
svcs = d['services'] if isinstance(d, dict) else d
total_usd = sum(s.get('cost_usd', 0) for s in svcs)
total_eur = total_usd * 0.92
avg = total_usd / max(len(svcs), 1)
done = sum(1 for s in svcs if s.get('seo_content', '').strip())
print(f'Generated: {done}/{len(svcs)} articles')
print(f'Total cost: \${total_usd:.4f} USD / €{total_eur:.4f}')
print(f'Avg / article: \${avg:.5f} USD (projected 50k: \${avg*50000:.0f} USD)')
"
```
| Method | Estimated cost / 100 | Projected / 50 000 | Key cost driver |
|---|---|---|---|
| V1 | $5 – $10 | $2 500 – $5 000 | 5× web search + 5-pass GPT-4.1 |
| V2 | $0.10 – $0.30 | $50 – $150 | GPT-5 Mini rewrite |
| V3 | $0.10 – $0.40 | $50 – $200 | Gemini 2.5 Flash (thinking) |
| V4 | $0.30 – $0.50 | $150 – $250 | Claude Haiku 4.5 |
| Meta SEO | $0.04 | $20 | GPT-4o-mini structured JSON |
If actual costs run above the estimate, check two things: (1) seo_content length: V1 truncates at 4 500 chars, but all input tokens are charged; (2) Pass 5 structured extraction may be retrying due to JSON parse errors: add logging to chatJSON() in the V1 script.
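The first cause is easy to quantify up front. A sketch that counts how much input would be billed beyond the 4 500-char truncation point (`truncation_report` is an illustrative helper):

```python
def truncation_report(services, limit=4500):
    """Count services whose seo_content exceeds the truncation limit,
    and the total characters beyond it (billed as input, never used)."""
    over = [s for s in services if len(s.get("seo_content", "")) > limit]
    wasted = sum(len(s["seo_content"]) - limit for s in over)
    return len(over), wasted
```

If `wasted` is large, pre-trimming seo_content before generation cuts input cost with no effect on output.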
Before committing to a 50 000-article run, two things will save your weekend: a multi-country smoke test (1 random service per country catches locale bugs, bad API keys, and config errors in under 5 min) and checkpoint saves every 500 services, so a crash at article 12 000 doesn't cost you 6 hours of API spend.
Extract one service per country and run the full pipeline on this tiny slice before the production batch:
```shell
# 1. Build smoke-test file – 1 random service per country
python3 -c "
import json, random
with open('services.json') as f:
    d = json.load(f)
svcs = d['services'] if isinstance(d, dict) else d
by_country = {}
for s in svcs:
    cc = s.get('country_code') or s.get('country', 'XX')
    by_country.setdefault(cc, []).append(s)
smoke = [random.choice(v) for v in by_country.values()]
out = {'country': 'MULTI', 'services': smoke}
with open('smoke_test.json', 'w') as f:
    json.dump(out, f, indent=2)
print(f'Wrote {len(smoke)} services across {len(by_country)} countries to smoke_test.json')
"

# 2. Run your chosen method on the smoke file (no --test flag: we want ALL countries)
python3 billoff_v2_generate.py smoke_test.json
python3 billoff_v5_metadata.py smoke_test.json

# 3. Spot-check output – for each country verify:
#    - language is correct (no English article for a German service)
#    - currency symbol is right (€ not A$ for France)
#    - consumer law name is injected (not blank or AU fallback)
#    - headings are in sentence case
#    - article length is ≥ 1 500 words
python3 -c "
import json
with open('results_v2.json') as f:
    arts = json.load(f)
for a in arts:
    svc = a.get('service_name', '?')
    wc = len(a.get('content', '').split())
    ok = '✅' if wc >= 1500 else '❌'
    print(f'{ok} {svc:40s} {wc:,} words \${a.get(\"cost_usd\", 0):.4f}')
"
```
If any of these checks fail, fix the source data (and re-run the guardrail with --fix), and don't proceed to the full batch. A bad locale config silently generates wrong-language articles at scale.
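Part of that spot-check can be automated. A hedged sketch of the checklist above (`locale_spot_check` is an illustrative helper): it can only catch obvious misses (symbol absent, law never cited, article too short), not prove the language is right.

```python
def locale_spot_check(article_html, service):
    """Flag obvious locale mismatches in a generated article."""
    problems = []
    sym = service.get("currency_symbol", "")
    if sym and sym not in article_html:
        problems.append(f"currency symbol {sym!r} absent")
    law = service.get("consumer_law", "")
    # Law names like "Loi Hamon / Code de la Consommation": check the first part
    if law and law.split("/")[0].strip() not in article_html:
        problems.append("consumer law not cited")
    if len(article_html.split()) < 1500:
        problems.append("under 1 500 words")
    return problems
```

An empty list means the article passed the automated subset; the language and heading-case checks still need a human eye.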
Add this pattern to the main() of any generated script. The script saves a .checkpoint.json every 500 completed services and deletes it automatically on successful completion:
```python
# Add near the top of main():
CHECKPOINT_EVERY = 500  # tune: smaller = safer, slightly slower I/O
_checkpoint = OUT_FILE.with_suffix(".checkpoint.json")

# Inside the ThreadPoolExecutor loop – replace the existing loop body:
results = []
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as ex:
    futures = {ex.submit(process_service, svc): svc for svc in services}
    for i, future in enumerate(as_completed(futures), 1):
        try:
            results.append(future.result())
        except Exception as e:
            print(f"  ❌ Unhandled: {e}")
        if CHECKPOINT_EVERY > 0 and i % CHECKPOINT_EVERY == 0:
            _checkpoint.parent.mkdir(parents=True, exist_ok=True)
            with open(_checkpoint, "w", encoding="utf-8") as f:
                json.dump(results, f, ensure_ascii=False, indent=2)
            print(f"  💾 Checkpoint {i}/{len(services)} – {len(results)} articles saved")

# Remove checkpoint file after a clean run:
if _checkpoint.exists():
    _checkpoint.unlink()
    print("  🧹 Checkpoint removed – run complete")
```
Add this block at the start of main(), before the ThreadPoolExecutor. It reads the checkpoint (or the final output file if it already exists), skips already-processed services, and continues from where the previous run stopped:
```python
# --- AUTO-RESUME BLOCK (add before the ThreadPoolExecutor loop) ---
results = []
already_done = set()
if OUT_FILE.exists():
    with open(OUT_FILE, encoding="utf-8") as f:
        results = json.load(f)
    already_done = {r.get("service_name", "") for r in results}
    print(f"  ⏩ Output file exists: {len(already_done)} articles already done – skipping")
elif _checkpoint.exists():
    with open(_checkpoint, encoding="utf-8") as f:
        results = json.load(f)
    already_done = {r.get("service_name", "") for r in results}
    print(f"  ⏩ Checkpoint found: {len(already_done)} done – resuming from checkpoint")

services = [s for s in services if s.get("name", "") not in already_done]
if not services:
    print("  ✅ All services already processed – nothing to do")
    sys.exit(0)
print(f"  ▶ {len(services)} remaining ({len(already_done)} already done)")
# --- END RESUME BLOCK ---
```
The generated scripts can also expose --checkpoint N (default 500) and --resume flags that activate this pattern out of the box.
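If your script version lacks those flags, wiring them up takes a few argparse lines. A sketch (the `parse_args` helper and flag semantics follow the description above):

```python
import argparse

def parse_args(argv=None):
    p = argparse.ArgumentParser(description="Billoff batch runner flags (sketch)")
    p.add_argument("input", help="services JSON file")
    p.add_argument("--checkpoint", type=int, default=500,
                   help="save a .checkpoint.json every N services (0 disables)")
    p.add_argument("--resume", action="store_true",
                   help="skip services already present in the output or checkpoint")
    return p.parse_args(argv)
```

Then feed `args.checkpoint` into `CHECKPOINT_EVERY` and gate the resume block on `args.resume`.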
```shell
# Day 0 – validation
python3 billoff_guardrail.py services.json --fix      # fix locale fields for all countries
python3 billoff_v2_generate.py smoke_test.json        # smoke test: 45 articles (1/country)
python3 billoff_v5_metadata.py smoke_test.json        # verify metadata for all locales
# → review output manually before going further

# Day 1 – cost gate
python3 billoff_v2_generate.py services_100.json      # 100-article cost check (see §4)
# → confirm cost × 500 is within budget

# Day 2 – production run (with checkpoint + resume ready)
python3 billoff_v2_generate.py services.json --workers 50    # will auto-resume if interrupted
python3 billoff_v5_metadata.py services.json --workers 100   # run after articles
```
All Billoff batch scripts process services in parallel using an async worker pool. The number of concurrent API requests (MAX_WORKERS) is the single most impactful performance knob. Too low: your 50 000-service run takes days. Too high: you hit rate limits and waste retries.
| Method | Default MAX_WORKERS | API | Rate limit context |
|---|---|---|---|
| V1 (web search + GPT-4.1) | ${maxWorkers} | OpenAI + web search | Web search is the binding constraint. OpenAI default 5; raise to 10 max. |
| V2 (GPT-4.1 research + GPT-5 Mini write) | 10 | OpenAI | 2 API calls/service. Tier 1: keep ≤ 10. Tier 2+: safe up to 30. |
| V3 (Gemini 2.5 Flash) | 3 | Google AI | Free tier: 10 RPM, keep ≤ 3. Pay tier (1 000 RPM): safe up to 50. |
| V4 (Claude Haiku 4.5) | 20 | Anthropic | Tier 4: 4 000 RPM / 800 K out-tokens/min; ceiling ~33 (output-token bound). Safe: 20. Max: 30. |
| V5 Meta SEO (GPT-4o-mini) | 50 | OpenAI | Metadata only (~900 tokens/call). Tier 1: ≤ 30. Tier 3+: up to 150. |
Worked example for V4: 800 000 out-tokens/min ÷ 2 000 tokens/article = 400 articles/min. With ~5 s average latency per Haiku call, each worker delivers 60 ÷ 5 = 12 articles/min, so 400 ÷ 12 ≈ 33 workers. The RPM limit (4 000) is not the constraint; the output-token budget is.
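The arithmetic above generalises to a tiny sizing helper. A sketch under the same assumptions (flat average latency, fixed tokens per article; `workers_from_token_budget` is an illustrative name):

```python
def workers_from_token_budget(out_tokens_per_min, tokens_per_article, avg_latency_s):
    """Worker ceiling implied by an output-token budget: throughput the
    budget allows, divided by the throughput one worker can deliver."""
    articles_per_min = out_tokens_per_min / tokens_per_article
    per_worker = 60 / avg_latency_s  # articles/min one worker delivers
    return int(articles_per_min / per_worker)
```

Plugging in the V4 numbers (800 000 out-tokens/min, 2 000 tokens/article, 5 s latency) reproduces the ~33-worker ceiling.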
Run this benchmark on a 50-service test file before your full batch:
```shell
# 1. Extract a 50-service benchmark slice
python3 -c "
import json
with open('services.json') as f:
    d = json.load(f)
svcs = d['services'] if isinstance(d, dict) else d
bench = svcs[:50]
out = {'country': d.get('country'), 'services': bench} if isinstance(d, dict) else bench
with open('bench50.json', 'w') as f:
    json.dump(out, f, indent=2)
print(f'Wrote {len(bench)} services to bench50.json')
"

# 2. Run at increasing worker levels – note duration and any 429 errors
MAX_WORKERS=5   python3 billoff_v2_generate.py bench50.json   # baseline
MAX_WORKERS=20  python3 billoff_v2_generate.py bench50.json
MAX_WORKERS=50  python3 billoff_v2_generate.py bench50.json
MAX_WORKERS=100 python3 billoff_v2_generate.py bench50.json

# 3. Pick the highest level with 0 rate-limit retries – that's your MAX_WORKERS
```

The script logs "Rate limit – sleeping Xs" on every retry. Count them.
To pass MAX_WORKERS via env var, add this near the top of each script:
```python
import os
MAX_WORKERS = int(os.getenv("MAX_WORKERS", "20"))  # default 20, override via env
```
For large unattended runs (overnight batches), build in a self-regulating worker count that halves on rate limit and recovers slowly. Add this wrapper around your semaphore loop:
```python
import asyncio
import time

async def process_with_adaptive_workers(services, generate_fn, initial_workers=20):
    """
    Process services with adaptive concurrency.
    Halves worker count on rate limit. Increases by 1 every 60 s if clean.
    """
    workers = initial_workers
    min_workers = 2
    max_workers = initial_workers
    results = []
    last_bump = time.time()
    pending = list(services)
    while pending:
        batch = pending[:workers]
        pending = pending[workers:]
        tasks = [asyncio.create_task(generate_fn(svc)) for svc in batch]
        outcomes = await asyncio.gather(*tasks, return_exceptions=True)
        rate_limited = False
        retry = []
        for svc, outcome in zip(batch, outcomes):
            if isinstance(outcome, Exception):
                if "429" in str(outcome) or "rate_limit" in str(outcome).lower():
                    rate_limited = True
                    retry.append(svc)  # requeue only the failed service
                else:
                    raise outcome
            else:
                results.append(outcome)
        pending = retry + pending
        if rate_limited:
            workers = max(min_workers, workers // 2)
            wait = 60
            print(f"  ⚡ Rate limit hit – dropping to {workers} workers, sleeping {wait}s")
            await asyncio.sleep(wait)
        elif time.time() - last_bump > 60 and workers < max_workers:
            workers += 1  # recover +1 every 60 s of clean running
            last_bump = time.time()
    return results
```
Call it as `await process_with_adaptive_workers(services, generate_one)`, where generate_one(svc) is your per-service coroutine. The function halves concurrency on every 429 and slowly ramps back up. For an overnight batch of 50 000 services this is the recommended pattern: it self-tunes without manual intervention.
| Provider | Free / Tier 1 | Tier 2 | Tier 3+ | Key limit type |
|---|---|---|---|---|
| OpenAI (GPT-4.1, 5 Mini, 4o-mini) | 500 RPM | 5 000 RPM | 10 000 RPM | RPM + TPM (both enforced) |
| Anthropic (Haiku 4.5) | 50 RPM | 1 000 RPM | 2 000 RPM | RPM + input tokens/min |
| Google AI (Gemini 2.5 Flash) | 10 RPM | 1 000 RPM | 2 000 RPM | RPM (thinking tokens count extra) |
| Bing Web Search (V1) | 3 TPS / 1 000/mo | 10 TPS | custom | Transactions per second |
V1 fires five search requests per service, so at MAX_WORKERS=10 that's 50 concurrent search requests. Bing S1 tier allows 3 TPS, so keep V1 workers ≤ 3 on S1. Use Brave Search (unlimited tier) if you need higher parallelism on V1.
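A TPS ceiling is easy to enforce client-side regardless of worker count. A sketch of a thread-safe spacing limiter (`TpsLimiter` is an illustrative helper, not part of the pipeline): every search call first calls `wait()`, so calls are spaced at least 1/tps seconds apart no matter how many workers share it.

```python
import threading
import time

class TpsLimiter:
    """Allow at most `tps` calls per second by spacing calls 1/tps apart."""
    def __init__(self, tps):
        self.interval = 1.0 / tps
        self._lock = threading.Lock()
        self._next = 0.0  # earliest monotonic time the next call may start

    def wait(self):
        with self._lock:
            now = time.monotonic()
            if now < self._next:
                time.sleep(self._next - now)
                now = self._next
            self._next = now + self.interval
```

Instantiate one shared `TpsLimiter(3)` for Bing S1 and call `limiter.wait()` before each search request.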
| Method | OpenAI Tier 1 | OpenAI Tier 2+ | Expected duration (50k) |
|---|---|---|---|
| V1 | 3 | 5 | ~7–14 days (search-bound) |
| V2 | 20 | 80 | ~8 h (Tier 2) / ~40 h (Tier 1) |
| V3 | 5 (free) / 40 (pay) | – | ~14 h (pay tier) |
| V4 | 10 | 30 (Tier 2) | ~20 h (Tier 2) |
| V5 Meta SEO | 30 | 100 | ~3 h (Tier 2) |
Estimates assume ~1.5 s average latency per API call. Actual duration depends on model load and token count.
Check your current tier in each provider's console, set MAX_WORKERS accordingly, and enable the adaptive fallback for overnight batches.
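The duration column above follows from the same back-of-envelope model as the worker sizing. A sketch (the function and its flat-latency assumption are illustrative; real runs vary with model load and token counts, as noted above):

```python
def estimated_hours(n_services, workers, latency_s, calls_per_service=1):
    """Total call-seconds spread evenly over the worker pool, in hours."""
    total_s = n_services * calls_per_service * latency_s
    return total_s / workers / 3600
```

For example, V2 makes 2 API calls per service, so `estimated_hours(50_000, workers, latency, calls_per_service=2)` gives its projection for your measured latency.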
Once the initial bulk generation is done, set up a cron job to process only new or updated services β so every addition to Postclic automatically gets an article and metadata.
Every X days / on new service addition:

```
──────────────────────────────────────────────────────────────────
Postclic DB → JSON export   (your existing export mechanism)
         ↓
Find NEW services           (diff by slug or name vs last run)
         ↓
Run guardrail --fix         (populate locale fields)
         ↓
Run article generator       (V1/V2/V3/V4 on new services only)
         ↓
Run Meta SEO                (metadata on new services only)
         ↓
Merge into master JSON      (append new results, keep existing)
         ↓
Import / publish            (your CMS import step)
```
```python
#!/usr/bin/env python3
"""billoff_sync.py – Incremental pipeline runner.

Finds services in NEW_EXPORT that are not yet in MASTER_JSON (matched
by name), processes them through the pipeline, and merges results back
into MASTER_JSON.
"""
import json, os, subprocess, sys
from pathlib import Path

MASTER_JSON = "data/services.json"      # your accumulating master file
NEW_EXPORT = "data/export_latest.json"  # latest Postclic data (any source)
NEW_ONLY = "data/new_services.json"     # temp file for new services only
METHOD = "v2"                           # v1 / v2 / v3 / v4
ARTICLE_SCRIPT = f"billoff_{METHOD}_generate.py"
META_SCRIPT = "billoff_v5_metadata.py"

def _load_services(path):
    with open(path) as f:
        d = json.load(f)
    return (d["services"] if isinstance(d, dict) else d), d

def _save_services(path, services, original_data):
    if isinstance(original_data, list):
        out = services
    else:
        original_data["services"] = services
        out = original_data
    with open(path, "w") as f:
        json.dump(out, f, ensure_ascii=False, indent=2)

# Load master (existing processed services)
master_svcs, master_data = _load_services(MASTER_JSON)
existing_names = {s["name"] for s in master_svcs}

# Load latest export
new_svcs, new_data = _load_services(NEW_EXPORT)
new_only = [s for s in new_svcs if s["name"] not in existing_names]

if not new_only:
    print("  ✅ No new services found – nothing to process.")
    sys.exit(0)
print(f"  🔎 {len(new_only)} new services found – running pipeline...")

# Write new-only subset to temp file
_save_services(NEW_ONLY, new_only,
               {"country": master_data.get("country", "AU"), "services": new_only})

# Guardrail – fix locale fields
subprocess.run(["python3", "billoff_guardrail.py", NEW_ONLY, "--fix"], check=True)

# Article generation
subprocess.run(["python3", ARTICLE_SCRIPT, NEW_ONLY], check=True)

# Meta SEO
subprocess.run(["python3", META_SCRIPT, NEW_ONLY], check=True)

# Merge processed new services back into master
processed_svcs, _ = _load_services(NEW_ONLY)
master_svcs.extend(processed_svcs)
_save_services(MASTER_JSON, master_svcs, master_data)
print(f"  ✅ Merged {len(processed_svcs)} new articles into {MASTER_JSON}")
os.remove(NEW_ONLY)
```
```shell
# Run every Monday at 3 AM – add to crontab with: crontab -e
0 3 * * 1  cd /var/www/billoff && python3 billoff_sync.py >> logs/sync.log 2>&1

# Or with a virtual environment:
0 3 * * 1  cd /var/www/billoff && /usr/local/bin/python3 billoff_sync.py >> logs/sync.log 2>&1
```
Matching is done on the name field. If a service can be renamed on Postclic, use a stable id field instead: just replace the existing_names set comparison with an existing_ids comparison.
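That swap can be captured in one small function. A sketch (`find_new` is an illustrative helper; it keys on a stable field and falls back to name when the key is absent):

```python
def find_new(master, export, key="id"):
    """Return export services not yet present in master, matched on a
    stable key (falling back to name when the key is missing)."""
    seen = {s.get(key) or s.get("name") for s in master}
    return [s for s in export if (s.get(key) or s.get("name")) not in seen]
```

With id-based matching, a renamed service is no longer treated as new, so it is not regenerated and duplicated.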
The sync script just expects an AU_EXPORT_LATEST.json file as input. How you generate it is up to you; several approaches are valid:

- SQL query: SELECT * FROM services WHERE country='AU' AND updated_at > :last_run
- REST endpoint: a simple curl before the sync is enough
- DB driver: psycopg2, pymysql, SQLAlchemy… and you serialize it yourself

Whichever method you pick, the Billoff pipeline doesn't care about the origin. It reads JSON, that's all. The expected structure:

```
# Standard pipeline format
{
  "country": "AU",
  "services": [
    {"name": "...", "seo_content": "...", ...}
  ]
}

# Or a plain array – both are accepted
[
  {"name": "...", "seo_content": "...", ...}
]
```
Every service row in your JSON must carry the locale fields below for the pipeline to generate correctly localised articles (right language, currency, consumer law, cancel word). The guardrail --fix will auto-populate missing fields from the country defaults in this table, but only if country_code is present on each service row.
For countries outside the defaults table you still need an accurate consumer_law field; see the AI locale generator script below.
| Field | Example (FR) | Used by | Notes |
|---|---|---|---|
| country_code | FR | All methods + guardrail | ISO 3166-1 alpha-2, required |
| country | France | V1–V4 prompts | Full country name in article language |
| language | French | V1–V4 prompts | Language name in English |
| currency | EUR | V1–V4 prompts | ISO 4217 currency code |
| currency_symbol | € | V1–V4 prompts | Symbol used in pricing tables |
| country_tld | .fr | V1 search queries | Country TLD for competitor search |
| cancel_word | Résilier | V1–V4 headings | Native-language cancel verb |
| consumer_law | Loi Hamon / Code de la Consommation | V1–V4 legal section | Official law name, cited in articles |
| city / region / timezone | Paris / Île-de-France / Europe/Paris | V1 geo-targeting | Optional; improves web search relevance |
These values are baked into the guardrail --fix auto-population and the REVIEW_SITES map used in V1 web searches.
| Code | Country | Language | Currency | Symbol | cancel_word | consumer_law (short) | TLD | review_sites |
|---|---|---|---|---|---|---|---|---|
| AE | UAE | Arabic | AED | AED | إلغاء | Federal Law 15/2020 | .ae | Google Reviews |
| AR | Argentina | Spanish | ARS | $ | Cancelar | Ley Defensa Consumidor 24.240 | .ar | Trustpilot.ar, Mercado Libre |
| AT | Austria | German | EUR | € | Kündigen | Konsumentenschutzgesetz (KSchG) | .at | Trustpilot.at |
| BE | Belgium | French | EUR | € | Annuler | Code de droit économique (CDE) | .be | Trustpilot.be |
| BG | Bulgaria | Bulgarian | BGN | лв | Анулиране | Закон за защита на потребителите | .bg | Trustpilot.bg |
| BR | Brazil | Portuguese | BRL | R$ | Cancelar | Código de Defesa do Consumidor (CDC) | .br | Reclame Aqui |
| CA | Canada | English | CAD | CA$ | Cancel | Consumer Protection Acts (provincial) | .ca | Trustpilot.ca, BBB |
| CH | Switzerland | German | CHF | CHF | Kündigen | Konsumentenschutzgesetz | .ch | Trustpilot.ch |
| CL | Chile | Spanish | CLP | $ | Cancelar | Ley 19.496 (LPDC) | .cl | Reclamos.cl |
| CO | Colombia | Spanish | COP | $ | Cancelar | Estatuto del Consumidor (Ley 1480) | .co | SIC Colombia |
| CZ | Czech Republic | Czech | CZK | Kč | Zrušit | Zákon o ochraně spotřebitele | .cz | Heureka.cz |
| DE | Germany | German | EUR | € | Kündigen | BGB §§ 312 / Widerrufsrecht | .de | Trusted Shops, Trustpilot.de |
| DK | Denmark | Danish | DKK | kr | Opsige | Forbrugerkøbsloven | .dk | Trustpilot.dk |
| ES | Spain | Spanish | EUR | € | Cancelar | LGDCU / Ley General Defensa Consumidores | .es | Trustpilot.es |
| FI | Finland | Finnish | EUR | € | Peruuttaa | Kuluttajansuojalaki | .fi | Trustpilot.fi |
| FR | France | French | EUR | € | Résilier | Loi Hamon / Code de la Consommation | .fr | Avis Vérifiés, Trustpilot.fr |
| GB | United Kingdom | English | GBP | £ | Cancel | Consumer Rights Act 2015 | .co.uk | Trustpilot.co.uk, Reviews.io |
| GR | Greece | Greek | EUR | € | Ακύρωση | Νόμος 2251/1994 (Consumer Protection) | .gr | Trustpilot.gr, Skroutz.gr |
| HU | Hungary | Hungarian | HUF | Ft | Lemondás | Fogyasztóvédelmi törvény (1997. évi CLV) | .hu | Árukereső.hu |
| ID | Indonesia | Indonesian | IDR | Rp | Batalkan | UU Perlindungan Konsumen No. 8/1999 | .id | Google Reviews |
| IE | Ireland | English | EUR | € | Cancel | Consumer Rights Act 2022 | .ie | Trustpilot.ie |
| IN | India | English | INR | ₹ | Cancel | Consumer Protection Act 2019 (CCPA) | .in | MouthShut.com, Trustpilot.in |
| IT | Italy | Italian | EUR | € | Disdetta | Codice del Consumo (D.Lgs. 206/2005) | .it | Trustpilot.it, eKomi |
| JP | Japan | Japanese | JPY | ¥ | キャンセル | 消費者契約法 (Consumer Contract Act) | .jp | Kakaku.com |
| NG | Nigeria | English | NGN | ₦ | Cancel | FCCPC Consumer Protection Act 2019 | .ng | Google Reviews |
| NL | Netherlands | Dutch | EUR | € | Opzeggen | Burgerlijk Wetboek (Consumentenwet) | .nl | Trustpilot.nl, Kiyoh |
| NO | Norway | Norwegian | NOK | kr | Avslutte | Forbrukerkjøpsloven | .no | Trustpilot.no |
| NZ | New Zealand | English | NZD | NZ$ | Cancel | Consumer Guarantees Act 1993 | .co.nz | Consumer.org.nz |
| PE | Peru | Spanish | PEN | S/ | Cancelar | Código de Protección del Consumidor (Ley 29571) | .pe | Indecopi.gob.pe |
| PL | Poland | Polish | PLN | zł | Anuluj | Ustawa o prawach konsumenta | .pl | Opineo.pl, Ceneo.pl |
| PT | Portugal | Portuguese | EUR | € | Cancelar | Lei de Defesa do Consumidor | .pt | Trustpilot.pt |
| RO | Romania | Romanian | RON | lei | Anulare | Legea nr. 449/2003 | .ro | Trustpilot.ro |
| SE | Sweden | Swedish | SEK | kr | Avsluta | Konsumentköplagen | .se | Trustpilot.se, Prisjakt |
| SG | Singapore | English | SGD | S$ | Cancel | Consumer Protection (Fair Trading) Act (CPFTA) | .sg | Trustpilot.sg |
| TR | Turkey | Turkish | TRY | ₺ | İptal | Tüketici Kanunu No. 6502 | .tr | Şikayetvar.com |
| US | United States | English | USD | $ | Cancel | FTC regulations / State consumer protection laws | .com | BBB, Trustpilot.com, Yelp |
| ZA | South Africa | English | ZAR | R | Cancel | Consumer Protection Act 68/2008 | .co.za | Hellopeter.com |
When a service row is missing locale fields (or you're adding a country not in the table above), run this script. It uses GPT-4o-mini with structured output to generate all locale fields accurately, then writes them back into your JSON.
```python
#!/usr/bin/env python3
"""billoff_locale_fill.py – AI-powered locale field generator.

Fills missing locale fields (language, currency, consumer_law, cancel_word,
etc.) for any country, using GPT-4o-mini structured output.
Run BEFORE the guardrail and before any generation script.
"""
import json, os, sys
from pathlib import Path

from openai import OpenAI

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY") or ""
DATA_FILE = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("data/services.json")
client = OpenAI(api_key=OPENAI_API_KEY)

# ── Hard-coded defaults for all 37 Postclic markets ──────────────
# These are always preferred over AI calls for known countries.
LOCALE_DEFAULTS = {
    "AE": {"country": "UAE", "language": "Arabic", "currency": "AED", "currency_symbol": "AED", "country_tld": ".ae", "cancel_word": "إلغاء", "consumer_law": "UAE Consumer Protection Law (Federal Law 15/2020)"},
    "AR": {"country": "Argentina", "language": "Spanish", "currency": "ARS", "currency_symbol": "$", "country_tld": ".ar", "cancel_word": "Cancelar", "consumer_law": "Ley de Defensa del Consumidor (Ley 24.240)"},
    "AT": {"country": "Austria", "language": "German", "currency": "EUR", "currency_symbol": "€", "country_tld": ".at", "cancel_word": "Kündigen", "consumer_law": "Konsumentenschutzgesetz (KSchG)"},
    "BE": {"country": "Belgium", "language": "French", "currency": "EUR", "currency_symbol": "€", "country_tld": ".be", "cancel_word": "Annuler", "consumer_law": "Code de droit économique (CDE)"},
    "BG": {"country": "Bulgaria", "language": "Bulgarian", "currency": "BGN", "currency_symbol": "лв", "country_tld": ".bg", "cancel_word": "Анулиране", "consumer_law": "Закон за защита на потребителите"},
    "BR": {"country": "Brazil", "language": "Portuguese", "currency": "BRL", "currency_symbol": "R$", "country_tld": ".br", "cancel_word": "Cancelar", "consumer_law": "Código de Defesa do Consumidor (CDC)"},
    "CA": {"country": "Canada", "language": "English", "currency": "CAD", "currency_symbol": "CA$", "country_tld": ".ca", "cancel_word": "Cancel", "consumer_law": "Consumer Protection Acts (provincial)"},
    "CH": {"country": "Switzerland", "language": "German", "currency": "CHF", "currency_symbol": "CHF", "country_tld": ".ch", "cancel_word": "Kündigen", "consumer_law": "Konsumentenschutzgesetz"},
    "CL": {"country": "Chile", "language": "Spanish", "currency": "CLP", "currency_symbol": "$", "country_tld": ".cl", "cancel_word": "Cancelar", "consumer_law": "Ley 19.496 (LPDC)"},
    "CO": {"country": "Colombia", "language": "Spanish", "currency": "COP", "currency_symbol": "$", "country_tld": ".co", "cancel_word": "Cancelar", "consumer_law": "Estatuto del Consumidor (Ley 1480)"},
    "CZ": {"country": "Czech Republic", "language": "Czech", "currency": "CZK", "currency_symbol": "Kč", "country_tld": ".cz", "cancel_word": "Zrušit", "consumer_law": "Zákon o ochraně spotřebitele"},
    "DE": {"country": "Germany", "language": "German", "currency": "EUR", "currency_symbol": "€", "country_tld": ".de", "cancel_word": "Kündigen", "consumer_law": "BGB §§ 312 ff. / Widerrufsrecht"},
    "DK": {"country": "Denmark", "language": "Danish", "currency": "DKK", "currency_symbol": "kr", "country_tld": ".dk", "cancel_word": "Opsige", "consumer_law": "Forbrugerkøbsloven"},
    "ES": {"country": "Spain", "language": "Spanish", "currency": "EUR", "currency_symbol": "€", "country_tld": ".es", "cancel_word": "Cancelar", "consumer_law": "Ley General para la Defensa de los Consumidores (LGDCU)"},
    "FI": {"country": "Finland", "language": "Finnish", "currency": "EUR", "currency_symbol": "€", "country_tld": ".fi", "cancel_word": "Peruuttaa", "consumer_law": "Kuluttajansuojalaki"},
    "FR": {"country": "France", "language": "French", "currency": "EUR", "currency_symbol": "€", "country_tld": ".fr", "cancel_word": "Résilier", "consumer_law": "Loi Hamon / Code de la Consommation"},
    "GB": {"country": "United Kingdom", "language": "English", "currency": "GBP", "currency_symbol": "£", "country_tld": ".co.uk", "cancel_word": "Cancel", "consumer_law": "Consumer Rights Act 2015"},
    "GR": {"country": "Greece", "language": "Greek", "currency": "EUR", "currency_symbol": "€", "country_tld": ".gr", "cancel_word": "Ακύρωση", "consumer_law": "Νόμος 2251/1994 (Consumer Protection)"},
    "HU": {"country": "Hungary", "language": "Hungarian", "currency": "HUF", "currency_symbol": "Ft", "country_tld": ".hu", "cancel_word": "Lemondás", "consumer_law": "Fogyasztóvédelmi törvény (1997. évi CLV)"},
    "ID": {"country": "Indonesia", "language": "Indonesian", "currency": "IDR", "currency_symbol": "Rp", "country_tld": ".id", "cancel_word": "Batalkan", "consumer_law": "UU Perlindungan Konsumen No. 8/1999"},
    "IE": {"country": "Ireland", "language": "English", "currency": "EUR", "currency_symbol": "€", "country_tld": ".ie", "cancel_word": "Cancel", "consumer_law": "Consumer Rights Act 2022"},
    "IN": {"country": "India", "language": "English", "currency": "INR", "currency_symbol": "₹", "country_tld": ".in", "cancel_word": "Cancel", "consumer_law": "Consumer Protection Act 2019 (CCPA)"},
    "IT": {"country": "Italy", "language": "Italian", "currency": "EUR", "currency_symbol": "€", "country_tld": ".it", "cancel_word": "Disdetta", "consumer_law": "Codice del Consumo (D.Lgs. 206/2005)"},
    "JP": {"country": "Japan", "language": "Japanese", "currency": "JPY", "currency_symbol": "¥", "country_tld": ".jp", "cancel_word": "キャンセル", "consumer_law": "消費者契約法 (Consumer Contract Act)"},
    "NG": {"country": "Nigeria", "language": "English", "currency": "NGN", "currency_symbol": "₦", "country_tld": ".ng", "cancel_word": "Cancel", "consumer_law": "FCCPC Consumer Protection Act 2019"},
    "NL": {"country": "Netherlands", "language": "Dutch", "currency": "EUR", "currency_symbol": "€", "country_tld": ".nl", "cancel_word": "Opzeggen", "consumer_law": "Burgerlijk Wetboek (Consumentenwet)"},
    "NO": {"country": "Norway", "language": "Norwegian", "currency": "NOK", "currency_symbol": "kr", "country_tld": ".no", "cancel_word": "Avslutte", "consumer_law": "Forbrukerkjøpsloven"},
    "NZ": {"country": "New Zealand", "language": "English", "currency": "NZD", "currency_symbol": "NZ$", "country_tld": ".co.nz", "cancel_word": "Cancel", "consumer_law": "Consumer Guarantees Act 1993"},
    "PE": {"country": "Peru", "language": "Spanish", "currency": "PEN", "currency_symbol": "S/", "country_tld": ".pe", "cancel_word": "Cancelar", "consumer_law": "Código de Protección del Consumidor (Ley 29571)"},
    "PL": {"country": "Poland", "language": "Polish", "currency": "PLN", "currency_symbol": "zł", "country_tld": ".pl", "cancel_word": "Anuluj", "consumer_law": "Ustawa o prawach konsumenta"},
    "PT": {"country": "Portugal", "language": "Portuguese", "currency": "EUR", "currency_symbol": "€", "country_tld": ".pt", "cancel_word": "Cancelar", "consumer_law": "Lei de Defesa do Consumidor"},
    "RO": {"country": "Romania", "language": "Romanian", "currency": "RON", "currency_symbol": "lei", "country_tld": ".ro", "cancel_word": "Anulare", "consumer_law": "Legea nr. 449/2003"},
    "SE": {"country": "Sweden", "language": "Swedish", "currency": "SEK", "currency_symbol": "kr", "country_tld": ".se", "cancel_word": "Avsluta", "consumer_law": "Konsumentköplagen"},
    "SG": {"country": "Singapore", "language": "English", "currency": "SGD", "currency_symbol": "S$", "country_tld": ".sg", "cancel_word": "Cancel", "consumer_law": "Consumer Protection (Fair Trading) Act (CPFTA)"},
    "TR": {"country": "Turkey", "language": "Turkish", "currency": "TRY", "currency_symbol": "₺", "country_tld": ".tr", "cancel_word": "İptal", "consumer_law": "Tüketici Kanunu No. 6502"},
    "US": {"country": "United States", "language": "English", "currency": "USD", "currency_symbol": "$", "country_tld": ".com", "cancel_word": "Cancel", "consumer_law": "FTC regulations / State consumer protection laws"},
    "ZA": {"country": "South Africa", "language": "English", "currency": "ZAR", "currency_symbol": "R", "country_tld": ".co.za", "cancel_word": "Cancel", "consumer_law": "Consumer Protection Act 68/2008"},
}

def fill_with_ai(country_code: str, country_name: str) -> dict:
    """Use GPT-4o-mini to generate locale fields for unknown countries."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": (
                f"Return JSON with locale fields for country_code={country_code}, country={country_name}. "
                "Fields: country (full English name), language (main official language in English), "
                "currency (ISO 4217 code), currency_symbol (display symbol), country_tld (ccTLD), "
                "cancel_word (verb meaning 'to cancel/terminate a subscription' in the official language), "
                "consumer_law (official name of the main consumer protection law + citation). "
                "Be precise with legal citations. Return only valid JSON."
            ),
        }],
    )
    return json.loads(resp.choices[0].message.content)

def fill_service(svc: dict) -> dict:
    cc = (svc.get("country_code") or "").upper()
    if not cc:
        return svc
    defaults = LOCALE_DEFAULTS.get(cc)
    if not defaults:
        print(f"  🤖 AI generating locale for unknown country: {cc}")
        defaults = fill_with_ai(cc, svc.get("country", cc))
    for k, v in defaults.items():
        if not svc.get(k):  # never overwrite existing values
            svc[k] = v
    return svc

# ── Main ────────────────────────────────────────────────────────
with open(DATA_FILE, encoding="utf-8") as f:
    data = json.load(f)
svcs = data["services"] if isinstance(data, dict) else data

filled = 0
for svc in svcs:
    before = {k: svc.get(k) for k in ["language", "currency", "cancel_word", "consumer_law"]}
    fill_service(svc)
    after = {k: svc.get(k) for k in before}
    if before != after:
        filled += 1

if isinstance(data, dict):
    data["services"] = svcs
with open(DATA_FILE, "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)
print(f"✅ Locale fill complete: {filled}/{len(svcs)} services updated in {DATA_FILE}")
```
python3 billoff_locale_fill.py services.json
# For any country not in the 37-country reference table, the script automatically
# calls GPT-4o-mini to generate the locale fields — it never leaves fields blank.
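The fill rule used by `fill_service` is worth internalising before a batch run: defaults only populate empty or missing fields and never overwrite hand-entered values. A minimal standalone sketch of that rule, with a made-up sample service for illustration:

```python
# Standalone illustration of the fill rule: defaults only populate
# empty/missing fields, existing values always win.
LOCALE_DEFAULTS = {
    "SE": {"language": "Swedish", "currency": "SEK",
           "cancel_word": "Avsluta", "consumer_law": "Konsumentköplagen"},
}

def merge_defaults(svc: dict) -> dict:
    defaults = LOCALE_DEFAULTS.get((svc.get("country_code") or "").upper(), {})
    for k, v in defaults.items():
        if not svc.get(k):  # empty string / None / missing -> fill
            svc[k] = v
    return svc

svc = {"country_code": "se", "name": "Spotify", "currency": "SEK", "language": ""}
merge_defaults(svc)
print(svc["language"])     # "Swedish"  (empty -> filled)
print(svc["cancel_word"])  # "Avsluta"  (missing -> filled)
print(svc["currency"])     # "SEK"      (existing value kept)
```

Note that the lookup also normalises the country code to upper case, so lower-case `country_code` values in the export are handled the same way.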
Then re-run billoff_guardrail.py with --fix as a second validation pass.
1. Add country_code to service rows — this is the only mandatory field. All other locale fields can be auto-generated.
2. Run billoff_locale_fill.py — fills language, currency, cancel_word, consumer_law via the hardcoded table (known countries) or GPT-4o-mini (unknown countries).
3. Run billoff_guardrail.py --fix — validates and patches any remaining gaps.
4. Extend REVIEW_SITES in download-helpers.js and openai.js — find the 2–3 most trusted review platforms for that country and add an entry. This improves Pass 1 web search quality for consumer ratings.

| | Check | Notes |
|---|---|---|
| ✅ | JSON export received for each country | One file per market, named XX_SERVICES.json |
| ✅ | Guardrail run with --fix | Exit code 0 required before V1; exit 0 or 2 acceptable for V2/V3/V4 |
| ✅ | API keys set in each script | Never commit keys to git — use env vars in production |
| ✅ | Test run on 5 services per method | --test flag, review output manually |
| ✅ | Cost verified on first 100 articles | Especially critical for V1 — see §4 |
| ✅ | 45-country smoke test passed | 1 random service/country — verify language, currency, consumer law, sentence case — see §5 |
| ✅ | Worker benchmark done (50 services) | MAX_WORKERS set to clean ceiling — see §6. Adaptive fallback enabled for overnight runs. |
| ✅ | Backup of original JSON kept | cp SOURCE.json SOURCE.backup.json |
| ✅ | First batch reviewed manually | Spot-check 10+ articles across different categories and countries |
| ✅ | Cron / sync script tested in dry-run | Run billoff_sync.py with a small test export first |
| ✅ | Import to CMS / DB tested end-to-end | Verify all fields (slug, faq, seo_title) map correctly to your schema |
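The per-country smoke test from the checklist (one random service per country, locale fields verified) is easy to script rather than eyeball. A minimal sketch, assuming the pipeline JSON formats described above; `smoke_sample` and the REQUIRED field list are illustrative, not part of the shipped scripts:

```python
import random

# Locale fields the smoke test checks for; adjust to match your schema.
REQUIRED = ["language", "currency", "cancel_word", "consumer_law"]

def smoke_sample(services: list) -> dict:
    """Pick one random service per country_code and report missing locale fields."""
    by_country = {}
    for svc in services:
        by_country.setdefault(svc.get("country_code", "??"), []).append(svc)
    report = {}
    for cc, group in by_country.items():
        svc = random.choice(group)
        missing = [f for f in REQUIRED if not svc.get(f)]
        report[cc] = {"name": svc.get("name"), "missing": missing}
    return report

# Usage sketch (hypothetical file name, pipeline format per the docs):
# import json
# with open("SE_SERVICES.json", encoding="utf-8") as f:
#     data = json.load(f)
# services = data["services"] if isinstance(data, dict) else data
# for cc, row in smoke_sample(services).items():
#     status = "OK" if not row["missing"] else f"MISSING {row['missing']}"
#     print(f"{cc}: {row['name']} -> {status}")
```

Any country that reports missing fields should go back through billoff_locale_fill.py before the batch run.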