Everything your developer needs to wire up the Billoff pipeline: choosing a generation method, validating data, verifying costs, and keeping the site in sync automatically.
Each country runs through this sequence. The JSON export is the entry point; every subsequent script enriches it in place.
Both input formats are accepted: `{"services": [...]}` (pipeline format) and a plain `[...]` array.
- Always do a dry run with the `--test` flag before committing to 50 000 services, and inspect the output manually.
- Each script contains an `OPENAI_API_KEY` / `GEMINI_API_KEY` / `ANTHROPIC_API_KEY` placeholder that you must fill in locally.
- Each method produces a standalone Python script. Go to the corresponding docs page, set your editor config if needed, then click Python Script … live config.
| Method | Model | Input | Quality | Cost / article | Speed |
|---|---|---|---|---|---|
| V1 – Research | GPT-4.1 + GPT-5 Mini | Web search (5 passes) | ★★★★★ | $0.05–0.10 | 60–120 s |
| V2 – Rewrite | GPT-5 Mini | Existing seo_content | ★★★★ | $0.001–0.003 | 10–20 s |
| V3 – Gemini | Gemini 2.5 Flash | Existing seo_content | ★★★★ | $0.001–0.004 | 15–30 s |
| V4 – Claude | Claude Haiku 4.5 | Existing seo_content | ★★★★★ | $0.003–0.005 | 20–35 s |
| Meta SEO | GPT-4o-mini | Enriched seo_content | – | $0.0004 | 1–3 s |
```python
# V1 and V2 – OpenAI
OPENAI_API_KEY = "sk-..."

# V3 – Google Gemini
GEMINI_API_KEY = "AIzaSy..."

# V4 – Anthropic Claude
ANTHROPIC_API_KEY = "sk-ant-..."

# Meta SEO – OpenAI (gpt-4o-mini)
OPENAI_API_KEY = "sk-..."
```
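Hard-coded placeholders are fine for local testing, but for production it is safer to read the keys from environment variables. A minimal sketch, assuming the scripts are edited accordingly (the fallback placeholders below are illustrative, not part of the shipped scripts):

```python
import os

# Prefer environment variables; the literal fallbacks are local-dev
# placeholders only and should never be committed.
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "sk-...")
GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY", "AIzaSy...")
ANTHROPIC_API_KEY = os.environ.get("ANTHROPIC_API_KEY", "sk-ant-...")

# Cheap sanity check before launching a 50 000-article run
if OPENAI_API_KEY.endswith("..."):
    print("Warning: OPENAI_API_KEY looks like a placeholder")
```

Set the variables in your shell profile or crontab environment (`export OPENAI_API_KEY=sk-...`) so the scripts pick them up without any key ever landing in git.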
Download the guardrail script and run it before any generation. It validates your JSON, reports missing fields, and optionally auto-populates locale data (language, currency, cancel_word, consumer_law, slug_prefix) from built-in country defaults.
```bash
# Audit only (no file modified)
python3 billoff_guardrail.py services.json

# Auto-populate locale fields + overwrite
python3 billoff_guardrail.py services.json --fix

# Safe: write to a separate file
python3 billoff_guardrail.py services.json --fix --out services_clean.json
```
| Category | Fields | Behaviour if missing |
|---|---|---|
| 🔴 Critical | name | Exit code 1 – pipeline will crash. Must be fixed manually. |
| 🟡 Locale | language · currency · currency_symbol · cancel_word · consumer_law | Exit code 2 – auto-fixable with `--fix` from COUNTRY_DEFAULTS. Article will be in the wrong language/currency if missing. |
| 🔵 Quality | main_keyword · keywords · seo_content · website · cancellation_address | Exit code 2 – won't block execution, but reduces article depth and accuracy. |
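The yellow-row auto-fix can be sketched in a few lines. This is an illustration of the technique, not the guardrail's actual code; the `COUNTRY_DEFAULTS` entry shown here is a made-up example (the real script ships its own table):

```python
# Illustrative defaults – the real guardrail ships its own COUNTRY_DEFAULTS.
COUNTRY_DEFAULTS = {
    "FR": {
        "language": "French",
        "currency": "EUR",
        "currency_symbol": "€",
        "cancel_word": "Résiliation",
        "consumer_law": "Code de la consommation",
    },
}

LOCALE_FIELDS = ("language", "currency", "currency_symbol",
                 "cancel_word", "consumer_law")

def fix_locale_fields(services, country):
    """Fill missing or empty locale fields from country defaults.

    Returns the number of fields populated.
    """
    defaults = COUNTRY_DEFAULTS[country]
    fixed = 0
    for svc in services:
        for field in LOCALE_FIELDS:
            if not svc.get(field):
                svc[field] = defaults[field]
                fixed += 1
    return fixed

services = [{"name": "Netflix", "currency": "EUR"}, {"name": "Canal+"}]
print(fix_locale_fields(services, "FR"))  # → 9
```

Existing non-empty values are never overwritten, which is why `--fix` is safe to run repeatedly on the same file.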
The script detects the country from:
- a `"country": "FR"` (or any ISO code) field in the JSON root
- the filename prefix: `FR_services.json`, `DE_services.json`, etc.

If detection fails, add `"country": "XX"` (your ISO code) to the root of your JSON, or rename the file to start with the two-letter country code.
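The two detection rules above can be sketched as follows (the helper name `detect_country` is hypothetical; the guardrail implements its own version):

```python
import json
import re
from pathlib import Path

def detect_country(path):
    """Return an ISO country code from the JSON root, else the filename prefix."""
    data = json.loads(Path(path).read_text())
    # Rule 1: "country" field in the JSON root (pipeline format only)
    if isinstance(data, dict) and data.get("country"):
        return data["country"].upper()
    # Rule 2: two-letter prefix, e.g. FR_services.json / DE_services.json
    m = re.match(r"([A-Za-z]{2})_", Path(path).name)
    return m.group(1).upper() if m else None
```

If both rules fail the function returns `None`, which corresponds to the "detection fails" case described above.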
```text
─────────────────────────────────────────────────────────────────
 Billoff Pre-Pipeline Guardrail
─────────────────────────────────────────────────────────────────
 File:     services.json
 Services: 4821
 Country:  FR – detected from JSON root "country" field
 Defaults: French · EUR € · Résiliation

 Field                         Missing   Coverage   Note
 ───────────────────────────   ───────   ────────   ─────────
 ✓ name                        0         100%
 ✓ language                    0         100%
 🟡 locale currency            42        99%        auto-fixable
 🟡 locale currency_symbol     42        99%        auto-fixable
 ✓ cancel_word                 0         100%
 ✓ consumer_law                0         100%
 🔵 quality main_keyword       183       96%        reduces quality
 🔵 quality seo_content        12        100%       reduces quality

 ✅ No critical issues – pipeline can run.
 🟡 42 locale fields missing – run --fix to auto-populate from FR defaults.
 🔵 195 quality fields missing – article depth may be reduced.
```
```bash
# Run on 5 random services – detailed output, no file modified
python3 billoff_v2_generate.py services.json --test
python3 billoff_v5_metadata.py services.json --test
```
```bash
# 1. Article generation (pick your method)
python3 billoff_v2_generate.py services.json

# 2. SEO metadata (always run after article generation)
python3 billoff_v5_metadata.py services.json

# services.json now has: seo_content + seo_title + h1 + seo_description + slug + faq
```
seo_content (HTML article) · seo_title · h1 · seo_description · slug · faq[]
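After both steps, every record should carry all six fields. A quick sanity check you could run over the output file (the checker itself is a sketch; the field names come from the list above):

```python
REQUIRED_FIELDS = ("seo_content", "seo_title", "h1",
                   "seo_description", "slug", "faq")

def incomplete_services(services):
    """Return names of services still missing any generated field."""
    return [s.get("name", "?") for s in services
            if any(not s.get(f) for f in REQUIRED_FIELDS)]

# Illustrative record shapes
services = [
    {"name": "Spotify", "seo_content": "<h2>...</h2>", "seo_title": "t",
     "h1": "h", "seo_description": "d", "slug": "cancel-spotify",
     "faq": [{"q": "...", "a": "..."}]},
    {"name": "Netflix", "seo_content": "<h2>...</h2>"},  # metadata step not run yet
]
print(incomplete_services(services))  # → ['Netflix']
```

Running this after the metadata pass and before the CMS import catches services where one of the two scripts silently skipped a record.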
Especially for V1 (web search), always run a 100-article batch first and compare actual vs estimated costs before committing to the full country catalogue.
Create a test file with the first 100 services from your JSON:
```bash
# One-liner to extract the first 100 services into a test file
python3 -c "
import json
with open('services.json') as f:
    d = json.load(f)
# supports both formats
svcs = d['services'] if isinstance(d, dict) else d
test = svcs[:100]
out = {'country': d.get('country'), 'services': test} if isinstance(d, dict) else test
with open('services_100.json', 'w') as f:
    json.dump(out, f, indent=2)
print(f'Wrote {len(test)} services to services_100.json')
"

# Run the pipeline on these 100
python3 billoff_v1_generate.py services_100.json
python3 billoff_v5_metadata.py services_100.json

# Sum up actual costs from the output file
python3 -c "
import json
with open('services_100.json') as f:
    d = json.load(f)
svcs = d['services'] if isinstance(d, dict) else d
total_usd = sum(s.get('cost_usd', 0) for s in svcs)
total_eur = total_usd * 0.92
avg = total_usd / max(len(svcs), 1)
done = sum(1 for s in svcs if s.get('seo_content', '').strip())
print(f'Generated: {done}/{len(svcs)} articles')
print(f'Total cost: \${total_usd:.4f} USD / €{total_eur:.4f}')
print(f'Avg / article: \${avg:.5f} USD (projected 50k: \${avg*50000:.0f} USD)')
"
```
| Method | Estimated cost / 100 | Projected / 50 000 | Key cost driver |
|---|---|---|---|
| V1 | $5 – $10 | $2 500 – $5 000 | 5× web search + 5-pass GPT-4.1 |
| V2 | $0.10 – $0.30 | $50 – $150 | GPT-5 Mini rewrite |
| V3 | $0.10 – $0.40 | $50 – $200 | Gemini 2.5 Flash (thinking) |
| V4 | $0.30 – $0.50 | $150 – $250 | Claude Haiku 4.5 |
| Meta SEO | $0.04 | $20 | GPT-4o-mini structured JSON |
If actual costs come in above the estimate, check two things: (1) `seo_content` length – V1 truncates at 4 500 chars, but all input tokens are charged; (2) Pass 5 structured extraction may be retrying due to JSON parse errors – add logging to `chatJSON()` in the V1 script.
Once the initial bulk generation is done, set up a cron job to process only new or updated services β so every addition to Postclic automatically gets an article and metadata.
Every X days / on new service addition:
```text
Postclic DB → JSON export    (your existing export mechanism)
        ↓
Find NEW services            (diff by slug or name vs last run)
        ↓
Run guardrail --fix          (populate locale fields)
        ↓
Run article generator        (V1/V2/V3/V4 on new services only)
        ↓
Run Meta SEO                 (metadata on new services only)
        ↓
Merge into master JSON       (append new results, keep existing)
        ↓
Import / publish             (your CMS import step)
```
```python
#!/usr/bin/env python3
"""billoff_sync.py – Incremental pipeline runner.

Finds services in NEW_EXPORT that are not yet in MASTER_JSON (matched
by name), processes them through the pipeline, and merges results back
into MASTER_JSON.
"""
import json
import os
import subprocess
import sys

MASTER_JSON = "data/services.json"       # your accumulating master file
NEW_EXPORT = "data/export_latest.json"   # latest Postclic data (any source)
NEW_ONLY = "data/new_services.json"      # temp file for new services only

METHOD = "v2"                            # v1 / v2 / v3 / v4
ARTICLE_SCRIPT = f"billoff_{METHOD}_generate.py"
META_SCRIPT = "billoff_v5_metadata.py"


def _load_services(path):
    with open(path) as f:
        d = json.load(f)
    return (d["services"] if isinstance(d, dict) else d), d


def _save_services(path, services, original_data):
    if isinstance(original_data, list):
        out = services
    else:
        original_data["services"] = services
        out = original_data
    with open(path, "w") as f:
        json.dump(out, f, ensure_ascii=False, indent=2)


# Load master (existing processed services)
master_svcs, master_data = _load_services(MASTER_JSON)
existing_names = {s["name"] for s in master_svcs}

# Load latest export
new_svcs, new_data = _load_services(NEW_EXPORT)
new_only = [s for s in new_svcs if s["name"] not in existing_names]

if not new_only:
    print("✅ No new services found – nothing to process.")
    sys.exit(0)

print(f"🔍 {len(new_only)} new services found – running pipeline...")

# Write new-only subset to temp file (guard: master may be a plain array)
country = master_data.get("country", "AU") if isinstance(master_data, dict) else "AU"
_save_services(NEW_ONLY, new_only, {"country": country, "services": new_only})

# Guardrail – fix locale fields
subprocess.run(["python3", "billoff_guardrail.py", NEW_ONLY, "--fix"], check=True)

# Article generation
subprocess.run(["python3", ARTICLE_SCRIPT, NEW_ONLY], check=True)

# Meta SEO
subprocess.run(["python3", META_SCRIPT, NEW_ONLY], check=True)

# Merge processed new services back into master
processed_svcs, _ = _load_services(NEW_ONLY)
master_svcs.extend(processed_svcs)
_save_services(MASTER_JSON, master_svcs, master_data)
print(f"✅ Merged {len(processed_svcs)} new articles into {MASTER_JSON}")
os.remove(NEW_ONLY)
```
```bash
# Run every Monday at 3 AM – add to crontab with: crontab -e
0 3 * * 1 cd /var/www/billoff && python3 billoff_sync.py >> logs/sync.log 2>&1

# Or pin an explicit interpreter (e.g. a virtual environment's python3):
0 3 * * 1 cd /var/www/billoff && /usr/local/bin/python3 billoff_sync.py >> logs/sync.log 2>&1
```
New services are matched on the `name` field. If a service can be renamed on Postclic, use a stable `id` field instead: just replace the `existing_names` set comparison with an `existing_ids` comparison.
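That swap can be sketched as follows, assuming the export carries a stable `id` (the field name is hypothetical; use whatever primary key Postclic exposes):

```python
def find_new_services(master_svcs, export_svcs, key="id"):
    """Diff by a stable key instead of name, so renamed services are not re-processed."""
    existing_ids = {s[key] for s in master_svcs}
    return [s for s in export_svcs if s[key] not in existing_ids]

master = [{"id": 1, "name": "Old Name"}]
export = [
    {"id": 1, "name": "New Name"},    # renamed, but same id → skipped
    {"id": 2, "name": "Brand New"},   # genuinely new → processed
]
print(find_new_services(master, export))  # → [{'id': 2, 'name': 'Brand New'}]
```

With name-based matching, the renamed service above would be treated as new and would burn API budget on a duplicate article.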
The sync script just expects an AU_EXPORT_LATEST.json file as input. How you generate it is your business; several approaches are valid:
- A direct SQL query: `SELECT * FROM services WHERE country='AU' AND updated_at > :last_run`
- An HTTP export endpoint: a `curl` call before the sync is enough
- A DB driver: `psycopg2`, `pymysql`, `SQLAlchemy`… and you serialise the rows yourself

Whichever method you pick, the Billoff pipeline doesn't care where the data comes from. It reads JSON, full stop. The expected structure:
```text
# Standard pipeline format
{
  "country": "AU",
  "services": [
    {"name": "...", "seo_content": "...", ...}
  ]
}

# Or a plain array – both are accepted
[
  {"name": "...", "seo_content": "...", ...}
]
```
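As one illustration of the "DB driver" route, here is a sketch using the stdlib `sqlite3` driver; your production database, table, and column names will almost certainly differ, so treat everything below as an assumption:

```python
import json
import sqlite3

def export_services(db_path, country, out_path):
    """Serialise one country's services into the pipeline's expected JSON shape."""
    con = sqlite3.connect(db_path)
    con.row_factory = sqlite3.Row  # rows become dict-convertible
    rows = con.execute(
        "SELECT name, website FROM services WHERE country = ?", (country,)
    ).fetchall()
    con.close()
    payload = {"country": country, "services": [dict(r) for r in rows]}
    with open(out_path, "w") as f:
        json.dump(payload, f, ensure_ascii=False, indent=2)
    return len(payload["services"])
```

The same shape works with `psycopg2` or `SQLAlchemy`; only the connection and query change, while the `{"country": ..., "services": [...]}` envelope stays identical.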
| | Check | Notes |
|---|---|---|
| ☐ | JSON export received for each country | One file per market, named XX_SERVICES.json |
| ☐ | Guardrail run with --fix | Exit code 0 required before V1; exit 0 or 2 acceptable for V2/V3/V4 |
| ☐ | API keys set in each script | Never commit keys to git – use env vars in production |
| ☐ | Test run on 5 services per method | --test flag, review output manually |
| ☐ | Cost verified on first 100 articles | Especially critical for V1 – see §4 above |
| ☐ | Backup of original JSON kept | cp SOURCE.json SOURCE.backup.json |
| ☐ | First batch reviewed manually | Spot-check 10+ articles across different categories and countries |
| ☐ | Cron / sync script tested in dry-run | Run billoff_sync.py with a small test export first |
| ☐ | Import to CMS / DB tested end-to-end | Verify all fields (slug, faq, seo_title) map correctly to your schema |