Everything your developer needs to wire up the Billoff pipeline: choosing a generation method, validating data, verifying costs, and keeping the site in sync automatically.
Each country runs through this sequence. The JSON export is the entry point; every subsequent script enriches it in place.
Both input formats are accepted: `{"services": [...]}` (pipeline format) and a plain `[...]` array.
- Always do a dry run with the `--test` flag before committing to 50 000 services, and inspect the output manually.
- Each script contains an `OPENAI_API_KEY` / `GEMINI_API_KEY` / `ANTHROPIC_API_KEY` placeholder that you must fill in locally.
- Each method produces a standalone Python script. Go to the corresponding docs page, set your editor config if needed, then click Python Script … live config.
| Method | Model | Input | Quality | Cost / article | Speed |
|---|---|---|---|---|---|
| V1 – Research | GPT-4.1 + GPT-5 Mini | Web search (5 passes) | ★★★★★ | $0.05–0.10 | 60–120 s |
| V2 – Rewrite | GPT-5 Mini | Existing seo_content | ★★★★ | $0.001–0.003 | 10–20 s |
| V3 – Gemini | Gemini 2.5 Flash | Existing seo_content | ★★★★ | $0.001–0.004 | 15–30 s |
| V4 – Claude | Claude Haiku 4.5 | Existing seo_content | ★★★★★ | $0.003–0.005 | 20–35 s |
| Meta SEO | GPT-4o-mini | Enriched seo_content | – | $0.0004 | 1–3 s |
```python
# V1 and V2 – OpenAI
OPENAI_API_KEY = "sk-..."

# V3 – Google Gemini
GEMINI_API_KEY = "AIzaSy..."

# V4 – Anthropic Claude
ANTHROPIC_API_KEY = "sk-ant-..."

# Meta SEO – OpenAI (gpt-4o-mini)
OPENAI_API_KEY = "sk-..."
```
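Hard-coded placeholders are fine for local testing, but for production it is safer to read the keys from environment variables. A minimal sketch, assuming the scripts are edited accordingly (the fallback placeholders below are illustrative, not part of the shipped scripts):

```python
import os

# Prefer environment variables; the literal fallbacks are local-dev
# placeholders only and should never be committed.
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "sk-...")
GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY", "AIzaSy...")
ANTHROPIC_API_KEY = os.environ.get("ANTHROPIC_API_KEY", "sk-ant-...")

# Cheap sanity check before launching a 50 000-article run
if OPENAI_API_KEY.endswith("..."):
    print("Warning: OPENAI_API_KEY looks like a placeholder")
```

Set the variables in your shell profile or crontab environment (`export OPENAI_API_KEY=sk-...`) so the scripts pick them up without any key ever landing in git.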
Download the guardrail script and run it before any generation. It validates your JSON, reports missing fields, and optionally auto-populates locale data (language, currency, cancel_word, consumer_law, slug_prefix) from built-in country defaults.
```bash
# Audit only (no file modified)
python3 billoff_guardrail.py services.json

# Auto-populate locale fields + overwrite
python3 billoff_guardrail.py services.json --fix

# Safe: write to a separate file
python3 billoff_guardrail.py services.json --fix --out services_clean.json
```
| Category | Fields | Behaviour if missing |
|---|---|---|
| 🔴 Critical | name | Exit code 1 – pipeline will crash. Must be fixed manually. |
| 🟡 Locale | language · currency · currency_symbol · cancel_word · consumer_law | Exit code 2 – auto-fixable with `--fix` from COUNTRY_DEFAULTS. Article will be in the wrong language/currency if missing. |
| 🔵 Quality | main_keyword · keywords · seo_content · website · cancellation_address | Exit code 2 – won't block execution, but reduces article depth and accuracy. |
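The yellow-row auto-fix can be sketched in a few lines. This is an illustration of the technique, not the guardrail's actual code; the `COUNTRY_DEFAULTS` entry shown here is a made-up example (the real script ships its own table):

```python
# Illustrative defaults – the real guardrail ships its own COUNTRY_DEFAULTS.
COUNTRY_DEFAULTS = {
    "FR": {
        "language": "French",
        "currency": "EUR",
        "currency_symbol": "€",
        "cancel_word": "Résiliation",
        "consumer_law": "Code de la consommation",
    },
}

LOCALE_FIELDS = ("language", "currency", "currency_symbol",
                 "cancel_word", "consumer_law")

def fix_locale_fields(services, country):
    """Fill missing or empty locale fields from country defaults.

    Returns the number of fields populated.
    """
    defaults = COUNTRY_DEFAULTS[country]
    fixed = 0
    for svc in services:
        for field in LOCALE_FIELDS:
            if not svc.get(field):
                svc[field] = defaults[field]
                fixed += 1
    return fixed

services = [{"name": "Netflix", "currency": "EUR"}, {"name": "Canal+"}]
print(fix_locale_fields(services, "FR"))  # → 9
```

Existing non-empty values are never overwritten, which is why `--fix` is safe to run repeatedly on the same file.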
The script detects the country from:
- a `"country": "FR"` (or any ISO code) field in the JSON root
- the filename prefix: `FR_services.json`, `DE_services.json`, etc.

If detection fails, add `"country": "XX"` (your ISO code) to the root of your JSON, or rename the file to start with the two-letter country code.
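The two detection rules above can be sketched as follows (the helper name `detect_country` is hypothetical; the guardrail implements its own version):

```python
import json
import re
from pathlib import Path

def detect_country(path):
    """Return an ISO country code from the JSON root, else the filename prefix."""
    data = json.loads(Path(path).read_text())
    # Rule 1: "country" field in the JSON root (pipeline format only)
    if isinstance(data, dict) and data.get("country"):
        return data["country"].upper()
    # Rule 2: two-letter prefix, e.g. FR_services.json / DE_services.json
    m = re.match(r"([A-Za-z]{2})_", Path(path).name)
    return m.group(1).upper() if m else None
```

If both rules fail the function returns `None`, which corresponds to the "detection fails" case described above.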
```text
─────────────────────────────────────────────────────────────────
 Billoff Pre-Pipeline Guardrail
─────────────────────────────────────────────────────────────────
 File:     services.json
 Services: 4821
 Country:  FR – detected from JSON root "country" field
 Defaults: French · EUR € · Résiliation

 Field                         Missing   Coverage   Note
 ───────────────────────────   ───────   ────────   ─────────
 ✓ name                        0         100%
 ✓ language                    0         100%
 🟡 locale currency            42        99%        auto-fixable
 🟡 locale currency_symbol     42        99%        auto-fixable
 ✓ cancel_word                 0         100%
 ✓ consumer_law                0         100%
 🔵 quality main_keyword       183       96%        reduces quality
 🔵 quality seo_content        12        100%       reduces quality

 ✅ No critical issues – pipeline can run.
 🟡 42 locale fields missing – run --fix to auto-populate from FR defaults.
 🔵 195 quality fields missing – article depth may be reduced.
```
```bash
# Run on 5 random services – detailed output, no file modified
python3 billoff_v2_generate.py services.json --test
python3 billoff_v5_metadata.py services.json --test
```
```bash
# 1. Article generation (pick your method)
python3 billoff_v2_generate.py services.json

# 2. SEO metadata (always run after article generation)
python3 billoff_v5_metadata.py services.json

# services.json now has: seo_content + seo_title + h1 + seo_description + slug + faq
```
seo_content (HTML article) · seo_title · h1 · seo_description · slug · faq[]
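After both steps, every record should carry all six fields. A quick sanity check you could run over the output file (the checker itself is a sketch; the field names come from the list above):

```python
REQUIRED_FIELDS = ("seo_content", "seo_title", "h1",
                   "seo_description", "slug", "faq")

def incomplete_services(services):
    """Return names of services still missing any generated field."""
    return [s.get("name", "?") for s in services
            if any(not s.get(f) for f in REQUIRED_FIELDS)]

# Illustrative record shapes
services = [
    {"name": "Spotify", "seo_content": "<h2>...</h2>", "seo_title": "t",
     "h1": "h", "seo_description": "d", "slug": "cancel-spotify",
     "faq": [{"q": "...", "a": "..."}]},
    {"name": "Netflix", "seo_content": "<h2>...</h2>"},  # metadata step not run yet
]
print(incomplete_services(services))  # → ['Netflix']
```

Running this after the metadata pass and before the CMS import catches services where one of the two scripts silently skipped a record.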
Especially for V1 (web search), always run a 100-article batch first and compare actual vs estimated costs before committing to the full country catalogue.
Create a test file with the first 100 services from your JSON:
```bash
# One-liner to extract the first 100 services into a test file
python3 -c "
import json
with open('services.json') as f:
    d = json.load(f)
# supports both formats
svcs = d['services'] if isinstance(d, dict) else d
test = svcs[:100]
out = {'country': d.get('country'), 'services': test} if isinstance(d, dict) else test
with open('services_100.json', 'w') as f:
    json.dump(out, f, indent=2)
print(f'Wrote {len(test)} services to services_100.json')
"

# Run the pipeline on these 100
python3 billoff_v1_generate.py services_100.json
python3 billoff_v5_metadata.py services_100.json

# Sum up actual costs from the output file
python3 -c "
import json
with open('services_100.json') as f:
    d = json.load(f)
svcs = d['services'] if isinstance(d, dict) else d
total_usd = sum(s.get('cost_usd', 0) for s in svcs)
total_eur = total_usd * 0.92
avg = total_usd / max(len(svcs), 1)
done = sum(1 for s in svcs if s.get('seo_content', '').strip())
print(f'Generated: {done}/{len(svcs)} articles')
print(f'Total cost: \${total_usd:.4f} USD / €{total_eur:.4f}')
print(f'Avg / article: \${avg:.5f} USD (projected 50k: \${avg*50000:.0f} USD)')
"
```
| Method | Estimated cost / 100 | Projected / 50 000 | Key cost driver |
|---|---|---|---|
| V1 | $5 – $10 | $2 500 – $5 000 | 5× web search + 5-pass GPT-4.1 |
| V2 | $0.10 – $0.30 | $50 – $150 | GPT-5 Mini rewrite |
| V3 | $0.10 – $0.40 | $50 – $200 | Gemini 2.5 Flash (thinking) |
| V4 | $0.30 – $0.50 | $150 – $250 | Claude Haiku 4.5 |
| Meta SEO | $0.04 | $20 | GPT-4o-mini structured JSON |
If actual costs come in above the estimate, check two things: (1) `seo_content` length – V1 truncates at 4 500 chars, but all input tokens are charged; (2) Pass 5 structured extraction may be retrying due to JSON parse errors – add logging to `chatJSON()` in the V1 script.
Once the initial bulk generation is done, set up a cron job to process only new or updated services β so every addition to Postclic automatically gets an article and metadata.
Every X days / on new service addition:
```text
Postclic DB → JSON export    (your existing export mechanism)
        ↓
Find NEW services            (diff by slug or name vs last run)
        ↓
Run guardrail --fix          (populate locale fields)
        ↓
Run article generator        (V1/V2/V3/V4 on new services only)
        ↓
Run Meta SEO                 (metadata on new services only)
        ↓
Merge into master JSON       (append new results, keep existing)
        ↓
Import / publish             (your CMS import step)
```
```python
#!/usr/bin/env python3
"""billoff_sync.py – Incremental pipeline runner.

Finds services in NEW_EXPORT that are not yet in MASTER_JSON (matched
by name), processes them through the pipeline, and merges results back
into MASTER_JSON.
"""
import json
import os
import subprocess
import sys

MASTER_JSON = "data/services.json"       # your accumulating master file
NEW_EXPORT = "data/export_latest.json"   # latest Postclic data (any source)
NEW_ONLY = "data/new_services.json"      # temp file for new services only

METHOD = "v2"                            # v1 / v2 / v3 / v4
ARTICLE_SCRIPT = f"billoff_{METHOD}_generate.py"
META_SCRIPT = "billoff_v5_metadata.py"


def _load_services(path):
    with open(path) as f:
        d = json.load(f)
    return (d["services"] if isinstance(d, dict) else d), d


def _save_services(path, services, original_data):
    if isinstance(original_data, list):
        out = services
    else:
        original_data["services"] = services
        out = original_data
    with open(path, "w") as f:
        json.dump(out, f, ensure_ascii=False, indent=2)


# Load master (existing processed services)
master_svcs, master_data = _load_services(MASTER_JSON)
existing_names = {s["name"] for s in master_svcs}

# Load latest export
new_svcs, new_data = _load_services(NEW_EXPORT)
new_only = [s for s in new_svcs if s["name"] not in existing_names]

if not new_only:
    print("✅ No new services found – nothing to process.")
    sys.exit(0)

print(f"🔍 {len(new_only)} new services found – running pipeline...")

# Write new-only subset to temp file (guard: master may be a plain array)
country = master_data.get("country", "AU") if isinstance(master_data, dict) else "AU"
_save_services(NEW_ONLY, new_only, {"country": country, "services": new_only})

# Guardrail – fix locale fields
subprocess.run(["python3", "billoff_guardrail.py", NEW_ONLY, "--fix"], check=True)

# Article generation
subprocess.run(["python3", ARTICLE_SCRIPT, NEW_ONLY], check=True)

# Meta SEO
subprocess.run(["python3", META_SCRIPT, NEW_ONLY], check=True)

# Merge processed new services back into master
processed_svcs, _ = _load_services(NEW_ONLY)
master_svcs.extend(processed_svcs)
_save_services(MASTER_JSON, master_svcs, master_data)
print(f"✅ Merged {len(processed_svcs)} new articles into {MASTER_JSON}")
os.remove(NEW_ONLY)
```
```bash
# Run every Monday at 3 AM – add to crontab with: crontab -e
0 3 * * 1 cd /var/www/billoff && python3 billoff_sync.py >> logs/sync.log 2>&1

# Or pin an explicit interpreter (e.g. a virtual environment's python3):
0 3 * * 1 cd /var/www/billoff && /usr/local/bin/python3 billoff_sync.py >> logs/sync.log 2>&1
```
New services are matched on the `name` field. If a service can be renamed on Postclic, use a stable `id` field instead: just replace the `existing_names` set comparison with an `existing_ids` comparison.
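That swap can be sketched as follows, assuming the export carries a stable `id` (the field name is hypothetical; use whatever primary key Postclic exposes):

```python
def find_new_services(master_svcs, export_svcs, key="id"):
    """Diff by a stable key instead of name, so renamed services are not re-processed."""
    existing_ids = {s[key] for s in master_svcs}
    return [s for s in export_svcs if s[key] not in existing_ids]

master = [{"id": 1, "name": "Old Name"}]
export = [
    {"id": 1, "name": "New Name"},    # renamed, but same id → skipped
    {"id": 2, "name": "Brand New"},   # genuinely new → processed
]
print(find_new_services(master, export))  # → [{'id': 2, 'name': 'Brand New'}]
```

With name-based matching, the renamed service above would be treated as new and would burn API budget on a duplicate article.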
The sync script just expects an AU_EXPORT_LATEST.json file as input. How you generate it is your business; several approaches are valid:
- A direct SQL query: `SELECT * FROM services WHERE country='AU' AND updated_at > :last_run`
- An HTTP export endpoint: a `curl` call before the sync is enough
- A DB driver: `psycopg2`, `pymysql`, `SQLAlchemy`… and you serialise the rows yourself

Whichever method you pick, the Billoff pipeline doesn't care where the data comes from. It reads JSON, full stop. The expected structure:
```text
# Standard pipeline format
{
  "country": "AU",
  "services": [
    {"name": "...", "seo_content": "...", ...}
  ]
}

# Or a plain array – both are accepted
[
  {"name": "...", "seo_content": "...", ...}
]
```
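As one illustration of the "DB driver" route, here is a sketch using the stdlib `sqlite3` driver; your production database, table, and column names will almost certainly differ, so treat everything below as an assumption:

```python
import json
import sqlite3

def export_services(db_path, country, out_path):
    """Serialise one country's services into the pipeline's expected JSON shape."""
    con = sqlite3.connect(db_path)
    con.row_factory = sqlite3.Row  # rows become dict-convertible
    rows = con.execute(
        "SELECT name, website FROM services WHERE country = ?", (country,)
    ).fetchall()
    con.close()
    payload = {"country": country, "services": [dict(r) for r in rows]}
    with open(out_path, "w") as f:
        json.dump(payload, f, ensure_ascii=False, indent=2)
    return len(payload["services"])
```

The same shape works with `psycopg2` or `SQLAlchemy`; only the connection and query change, while the `{"country": ..., "services": [...]}` envelope stays identical.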
| | Check | Notes |
|---|---|---|
| ☐ | JSON export received for each country | One file per market, named XX_SERVICES.json |
| ☐ | Guardrail run with --fix | Exit code 0 required before V1; exit 0 or 2 acceptable for V2/V3/V4 |
| ☐ | API keys set in each script | Never commit keys to git – use env vars in production |
| ☐ | Test run on 5 services per method | --test flag, review output manually |
| ☐ | Cost verified on first 100 articles | Especially critical for V1 – see §4 above |
| ☐ | Backup of original JSON kept | cp SOURCE.json SOURCE.backup.json |
| ☐ | First batch reviewed manually | Spot-check 10+ articles across different categories and countries |
| ☐ | Cron / sync script tested in dry-run | Run billoff_sync.py with a small test export first |
| ☐ | Import to CMS / DB tested end-to-end | Verify all fields (slug, faq, seo_title) map correctly to your schema |