Developer Integration Guide

🚀 From Postclic DB to Published Page

Everything your developer needs to wire up the Billoff pipeline: choosing a generation method, validating data, verifying costs, and keeping the site in sync automatically.

Scripts — download all the scripts you need

Pipeline at a Glance

Each country runs through this sequence. The JSON export is the entry point β€” every subsequent script enriches it in-place.

📦 Postclic data source — JSON export, direct DB query, API call… your choice JSON input
↓
🛡️ Guardrail — validate & auto-fix missing fields billoff_guardrail.py
↓
📝 Article generation — choose V1, V2, V3, or V4 billoff_v1/v2/v3/v4.py
↓
🏷️ Meta SEO — title · H1 · description · slug · FAQ billoff_v5_metadata.py
↓
✅ Enriched JSON ready to publish import to site
Data source: It doesn't matter how you extract the Postclic services (file export, direct DB query, API, custom script); the Billoff pipeline just expects JSON as input.
File strategy: Every script reads and writes back to the same JSON file. Keep a backup copy before running anything at scale.
Format: Scripts accept both {"services": [...]} (pipeline format) and a plain [...] array.
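A minimal loader that accepts both shapes might look like this (an illustrative helper, not one of the shipped scripts):

```python
import json

def load_services(path):
    """Return (services, root) for either accepted JSON shape."""
    with open(path) as f:
        data = json.load(f)
    # {"services": [...]} pipeline format, or a plain [...] array
    services = data["services"] if isinstance(data, dict) else data
    return services, data
```

Keeping the root object around lets you write the file back in the same shape you read it.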

⚠️ Before You Run Anything

Read this carefully before running at scale.

Step 1 β€” Choose Your Article Generation Method

Each method produces a standalone Python script. Go to the corresponding docs page, set your editor config if needed, then click Python Script ✦ live config.

| Method | Model | Input | Quality | Cost / article | Speed |
|---|---|---|---|---|---|
| V1 — Research | GPT-4.1 → GPT-5 Mini | Web search (5 passes) | ⭐⭐⭐⭐⭐ | $0.05–0.10 | 60–120 s |
| V2 — Rewrite | GPT-5 Mini | Existing seo_content | ⭐⭐⭐⭐ | $0.001–0.003 | 10–20 s |
| V3 — Gemini | Gemini 2.5 Flash | Existing seo_content | ⭐⭐⭐⭐ | $0.001–0.004 | 15–30 s |
| V4 — Claude | Claude Haiku 4.5 | Existing seo_content | ⭐⭐⭐⭐⭐ | $0.003–0.005 | 20–35 s |
| Meta SEO | GPT-4o-mini | Enriched seo_content | — | $0.0004 | 1–3 s |
Recommendation: Use V2 or V4 for the majority of your catalogue (existing content, low cost), and reserve V1 for high-priority or high-traffic pages where fresh web research is worth the extra cost.
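That routing decision can be sketched in a few lines. Note that monthly_traffic is a hypothetical priority signal, not a field the pipeline provides; substitute whatever traffic or priority data you actually track:

```python
def split_by_method(services, traffic_threshold=10_000):
    """Route high-priority services to V1 (fresh research), the rest to V2/V4."""
    v1_batch, cheap_batch = [], []
    for svc in services:
        # "monthly_traffic" is an assumed field -- adapt to your own data
        if svc.get("monthly_traffic", 0) >= traffic_threshold:
            v1_batch.append(svc)
        else:
            cheap_batch.append(svc)
    return v1_batch, cheap_batch
```

Write each batch to its own JSON file, then run the V1 script on the first and your cheaper method on the second.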

API keys required per method

# V1 and V2 — OpenAI
OPENAI_API_KEY = "sk-..."

# V3 — Google Gemini
GEMINI_API_KEY = "AIzaSy..."

# V4 — Anthropic Claude
ANTHROPIC_API_KEY = "sk-ant-..."

# Meta SEO — OpenAI (gpt-4o-mini)
OPENAI_API_KEY = "sk-..."

Step 2 β€” Run the Pre-Flight Guardrail

Download the guardrail script and run it before any generation. It validates your JSON, reports missing fields, and optionally auto-populates locale data (language, currency, cancel_word, consumer_law, slug_prefix) from built-in country defaults.

Usage

# Audit only (no file modified)
python3 billoff_guardrail.py services.json

# Auto-populate locale fields + overwrite
python3 billoff_guardrail.py services.json --fix

# Safe: write to a separate file
python3 billoff_guardrail.py services.json --fix --out services_clean.json

What the script checks

| Category | Fields | Behaviour if missing |
|---|---|---|
| 🔴 Critical | name | Exit code 1 — pipeline will crash. Must be fixed manually. |
| 🟡 Locale | language · currency · currency_symbol · cancel_word · consumer_law | Exit code 2 — auto-fixable with --fix from COUNTRY_DEFAULTS. Article will be in the wrong language/currency if missing. |
| 🔵 Quality | main_keyword · keywords · seo_content · website · cancellation_address | Exit code 2 — won't block execution, but reduces article depth and accuracy. |
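In an automated runner, these exit codes can drive the next step. A minimal sketch of the decision logic (wiring it to the returncode from subprocess.run on billoff_guardrail.py is left to your own runner):

```python
def guardrail_action(exit_code):
    """Map a billoff_guardrail.py exit code to the next pipeline step."""
    if exit_code == 0:
        return "proceed"           # all checks passed
    if exit_code == 2:
        return "rerun with --fix"  # locale/quality gaps, auto-fixable
    return "abort"                 # exit code 1: critical field missing
```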

Country auto-detection

The script detects the country from:

  1. The "country": "FR" (or any ISO code) field in the JSON root
  2. The filename prefix: FR_services.json, DE_services.json, etc.

If detection fails, add "country": "XX" (your ISO code) to the root of your JSON, or rename the file to start with the two-letter country code.
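If you prefer to add the field programmatically, a small helper along these lines handles both accepted JSON shapes (an illustrative sketch):

```python
import json

def set_country(path, iso_code):
    """Write a top-level "country" field so the guardrail can pick defaults."""
    with open(path) as f:
        data = json.load(f)
    if isinstance(data, list):
        # a plain array has no root object: wrap it in the pipeline format
        data = {"country": iso_code, "services": data}
    else:
        data["country"] = iso_code
    with open(path, "w") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)
```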

Expected output (sample)

═════════════════════════════════════════════════════════════════
  Billoff Pre-Pipeline Guardrail
═════════════════════════════════════════════════════════════════
  File:     services.json
  Services: 4821
  Country:  FR  ← detected from JSON root "country" field
  Defaults: French · EUR € · Résiliation

  Field                        Missing    Coverage          Note
  ──────────────────────────── ────────   ──────────   ──────────
  ✅ name                            0        100%
  ✅ language                        0        100%
  🟡 locale  currency               42         99%  auto-fixable
  🟡 locale  currency_symbol        42         99%  auto-fixable
  ✅ cancel_word                     0        100%
  ✅ consumer_law                    0        100%
  🔵 quality main_keyword          183         96%  reduces quality
  🔵 quality seo_content            12        100%  reduces quality

  ✅  No critical issues — pipeline can run.
  🟡  42 locale fields missing — run --fix to auto-populate from FR defaults.
  🔵  195 quality fields missing — article depth may be reduced.

Step 3 β€” Run the Pipeline

Test mode first (always)

# Run on 5 random services β€” detailed output, no file modified
python3 billoff_v2_generate.py services.json --test
python3 billoff_v5_metadata.py services.json --test

Production run (full batch)

# 1. Article generation (pick your method)
python3 billoff_v2_generate.py services.json

# 2. SEO metadata (always run after article generation)
python3 billoff_v5_metadata.py services.json

# services.json now has: seo_content + seo_title + h1 + seo_description + slug + faq
Output fields after full pipeline:
seo_content (HTML article) · seo_title · h1 · seo_description · slug · faq[]
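A quick post-pipeline sanity check can confirm every service carries all six output fields (an illustrative helper, not part of the shipped scripts):

```python
REQUIRED_OUTPUT_FIELDS = ("seo_content", "seo_title", "h1",
                          "seo_description", "slug", "faq")

def missing_output_fields(service):
    """Return the pipeline output fields a service is still missing."""
    return [f for f in REQUIRED_OUTPUT_FIELDS if not service.get(f)]
```

Run it over the enriched services list before importing to your CMS.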

Step 4 β€” Verify Costs on the First 100 Articles

Especially for V1 (web search), always run a 100-article batch first and compare actual vs estimated costs before committing to the full country catalogue.

V1 can deviate significantly from estimates if your services have unusually long names or complex search results. Always validate before running 50 000 articles.

How to run the first 100

Create a test file with the first 100 services from your JSON:

# One-liner to extract the first 100 services into a test file
python3 -c "
import json
with open('services.json') as f:
    d = json.load(f)
# supports both formats
svcs = d['services'] if isinstance(d, dict) else d
test = svcs[:100]
out  = {'country': d.get('country'), 'services': test} if isinstance(d, dict) else test
with open('services_100.json', 'w') as f:
    json.dump(out, f, indent=2)
print(f'Wrote {len(test)} services to services_100.json')
"

# Run the pipeline on these 100
python3 billoff_v1_generate.py services_100.json
python3 billoff_v5_metadata.py services_100.json

# Sum up actual costs from the output file
python3 -c "
import json
with open('services_100.json') as f:
    d = json.load(f)
svcs = d['services'] if isinstance(d, dict) else d
total_usd = sum(s.get('cost_usd', 0) for s in svcs)
total_eur = total_usd * 0.92
avg = total_usd / max(len(svcs),1)
done = sum(1 for s in svcs if s.get('seo_content','').strip())
print(f'Generated:  {done}/{len(svcs)} articles')
print(f'Total cost: \${total_usd:.4f} USD  /  €{total_eur:.4f}')
print(f'Avg / article: \${avg:.5f} USD  (projected 50k: \${avg*50000:.0f} USD)')
"

Reference cost table (per 100 articles)

| Method | Estimated cost / 100 | Projected / 50 000 | Key cost driver |
|---|---|---|---|
| V1 | $5 – $10 | $2 500 – $5 000 | 5× web search + 5-pass GPT-4.1 |
| V2 | $0.10 – $0.30 | $50 – $150 | GPT-5 Mini rewrite |
| V3 | $0.10 – $0.40 | $50 – $200 | Gemini 2.5 Flash (thinking) |
| V4 | $0.30 – $0.50 | $150 – $250 | Claude Haiku 4.5 |
| Meta SEO | $0.04 | $20 | GPT-4o-mini structured JSON |
If actual costs are more than 30% higher than estimated on the first 100, check: (1) average seo_content length — V1 truncates at 4 500 chars but all input is charged; (2) Pass 5 structured extraction may be retrying due to JSON parse errors — add logging to chatJSON() in the V1 script.
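That 30% threshold is easy to encode when comparing your measured first-100 total against the reference table (a trivial helper, shown for completeness):

```python
def over_budget(actual_usd, estimated_usd, tolerance=0.30):
    """True if the measured batch cost exceeds the estimate by more than tolerance."""
    return actual_usd > estimated_usd * (1 + tolerance)
```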

Step 5 β€” Dynamic Sync: Keep Postclic in Sync

Once the initial bulk generation is done, set up a cron job to process only new or updated services β€” so every addition to Postclic automatically gets an article and metadata.

Architecture

Every X days / on new service addition:
──────────────────────────────────────────────────────────────────
Postclic DB → JSON export     (your existing export mechanism)
     ↓
Find NEW services              (diff by slug or name vs last run)
     ↓
Run guardrail --fix            (populate locale fields)
     ↓
Run article generator          (V1/V2/V3/V4 on new services only)
     ↓
Run Meta SEO                   (metadata on new services only)
     ↓
Merge into master JSON         (append new results, keep existing)
     ↓
Import / publish               (your CMS import step)

Incremental sync script template

#!/usr/bin/env python3
"""billoff_sync.py β€” Incremental pipeline runner.
Finds services in NEW_EXPORT that are not yet in MASTER_JSON
(matched by name), processes them through the pipeline,
and merges results back into MASTER_JSON.
"""
import json, os, subprocess, sys
from pathlib import Path

MASTER_JSON  = "data/services.json"          # your accumulating master file
NEW_EXPORT   = "data/export_latest.json"      # latest Postclic data (any source)
NEW_ONLY     = "data/new_services.json"       # temp file for new services only
METHOD       = "v2"                          # v1 / v2 / v3 / v4
ARTICLE_SCRIPT = f"billoff_{METHOD}_generate.py"
META_SCRIPT    = "billoff_v5_metadata.py"

def _load_services(path):
    with open(path) as f: d = json.load(f)
    return (d["services"] if isinstance(d, dict) else d), d

def _save_services(path, services, original_data):
    if isinstance(original_data, list):
        out = services
    else:
        original_data["services"] = services
        out = original_data
    with open(path, "w") as f:
        json.dump(out, f, ensure_ascii=False, indent=2)

# Load master (existing processed services)
master_svcs, master_data = _load_services(MASTER_JSON)
existing_names = {s["name"] for s in master_svcs}

# Load latest export
new_svcs, new_data = _load_services(NEW_EXPORT)
new_only = [s for s in new_svcs if s["name"] not in existing_names]

if not new_only:
    print("  ✅  No new services found — nothing to process.")
    sys.exit(0)

print(f"  🆕  {len(new_only)} new services found — running pipeline...")

# Write new-only subset to temp file
_save_services(NEW_ONLY, new_only, {"country": master_data.get("country","AU"), "services": new_only})

# Guardrail β€” fix locale fields
subprocess.run(["python3", "billoff_guardrail.py", NEW_ONLY, "--fix"], check=True)

# Article generation
subprocess.run(["python3", ARTICLE_SCRIPT, NEW_ONLY], check=True)

# Meta SEO
subprocess.run(["python3", META_SCRIPT, NEW_ONLY], check=True)

# Merge processed new services back into master
processed_svcs, _ = _load_services(NEW_ONLY)
master_svcs.extend(processed_svcs)
_save_services(MASTER_JSON, master_svcs, master_data)

print(f"  ✅  Merged {len(processed_svcs)} new articles into {MASTER_JSON}")
os.remove(NEW_ONLY)

Cron job setup

# Run every Monday at 3 AM β€” add to crontab with: crontab -e
0 3 * * 1  cd /var/www/billoff && python3 billoff_sync.py >> logs/sync.log 2>&1

# Or with a virtual environment:
0 3 * * 1  cd /var/www/billoff && /usr/local/bin/python3 billoff_sync.py >> logs/sync.log 2>&1
How "new" is detected: the sync script matches on the name field. If a service can be renamed in Postclic, match on a stable id field instead: replace the existing_names set comparison with an existing_ids comparison.
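With a stable id, the diff from the sync template becomes (a sketch; assumes every service in both files carries an id):

```python
def find_new_by_id(master_services, export_services):
    """Return export services whose id is not yet present in the master file."""
    existing_ids = {s["id"] for s in master_services}
    return [s for s in export_services if s["id"] not in existing_ids]
```

A rename then updates the existing record instead of spawning a duplicate article.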

Getting the Postclic Data — Your Choice

The sync script just expects an AU_EXPORT_LATEST.json file as input. How you generate it is up to you; several approaches are valid.

Whatever the method, the Billoff pipeline does not care where the data comes from. It reads JSON, full stop. The expected structure:

# Standard pipeline format
{ "country": "AU", "services": [ {"name": "...", "seo_content": "...", ...} ] }

# Or a plain array — both are accepted
[ {"name": "...", "seo_content": "...", ...} ]

Go-Live Checklist

| Check | Notes |
|---|---|
| ☐ JSON export received for each country | One file per market, named XX_SERVICES.json |
| ☐ Guardrail run with --fix | Exit code 0 required before V1; exit 0 or 2 acceptable for V2/V3/V4 |
| ☐ API keys set in each script | Never commit keys to git — use env vars in production |
| ☐ Test run on 5 services per method | --test flag, review output manually |
| ☐ Cost verified on first 100 articles | Especially critical for V1 — see Step 4 above |
| ☐ Backup of original JSON kept | cp SOURCE.json SOURCE.backup.json |
| ☐ First batch reviewed manually | Spot-check 10+ articles across different categories and countries |
| ☐ Cron / sync script tested in dry-run | Run billoff_sync.py with a small test export first |
| ☐ Import to CMS / DB tested end-to-end | Verify all fields (slug, faq, seo_title) map correctly to your schema |