STEP 5 — POST-GENERATION

🏷️ SEO Metadata Generator

Run this after V1/V2/V3/V4 to complete each page with keyword-optimised metadata: seo_title, h1, seo_description, slug, and faq (5 Q&A pairs). Powered by GPT-4o-mini — ultra-fast, extremely low cost, runs at 200 parallel workers.

Developer Resources — all config baked in, ready to run

Overview

V5 is a post-processing step — it takes the enriched JSON produced by V1/V2/V3/V4 and generates the SEO shell that wraps each article: the page title, H1 tag, meta description, slug, and 5 contextual FAQ questions grounded in seo_content.

Pipeline position: V1/V2/V3/V4 article generation → V5 metadata generation → publish.
Why separate? Metadata requires different optimisation logic (keyword selection, char-length constraints, FAQ grounding) that is best handled in a dedicated, parallelised pass.
| Attribute | Value |
| --- | --- |
| Model | gpt-4o-mini (OpenAI) |
| API endpoint | api.openai.com/v1/chat/completions |
| Streaming | No — structured JSON output |
| Temperature | 0.3 (factual, consistent) |
| Max output tokens | 900 |
| Input required | name, main_keyword, keywords[], seo_content |
| Fields generated | seo_title · h1 · seo_description · slug · faq (5 Q&A) |
| Parallel workers | 200 (no web search — safe to max out) |
| Retry logic | Up to 2 retries if char-length constraints fail |
| Avg cost per service | ≈ $0.0004 USD (≈ €0.00036) |
| Cost × 1,000 services | ≈ $0.39 USD |
| Cost × 50,000 services | ≈ $19.50 USD (≈ €17.90) |
| Speed (200 workers) | ~1,000 services/min |
| Python dependency | stdlib only — urllib, json, re. Optional: beautifulsoup4 for richer FAQ context extraction |

Architecture

Pipeline flow

┌──────────────────────────────────────────────────────────────┐
│  V1/V2/V3/V4 OUTPUT — service JSON with seo_content          │
└──────────────────────────────┬───────────────────────────────┘
                               │
               ┌───────────────▼───────────────────────┐
               │  CONTEXT EXTRACTION (per service)     │
               │  • top-15 keywords sorted by volume   │
               │  • 4,500-char preview of seo_content  │
               │  • h2/h3 headings extracted           │
               │  • semantic flags (fee, refund, …)    │
               └───────────────┬───────────────────────┘
                               │
      ┌────────────────────────▼─────────────────────┐
      │  GPT-4o-mini (200 parallel workers)          │
      │  Single-call, non-streaming, JSON output     │
      │  Retry × 2 if char-length validation fails   │
      └────────────────┬─────────────────────────────┘
                       │
               ┌───────▼───────────────────────────┐
               │  OUTPUT (per service)             │
               │  seo_title        30–60 chars     │
               │  h1               unique angle    │
               │  seo_description  120–160 chars   │
               │  slug             cancel-[name]   │
               │  faq[]            5 Q&A pairs     │
               └───────┬───────────────────────────┘
                       │
          SAVED BACK INTO SAME JSON FILE

BeautifulSoup (optional)

If beautifulsoup4 is installed, the script also extracts structured table data and heading maps from seo_content HTML and feeds them to the model as structured "EXTRACTED FACTS". This significantly improves FAQ specificity — highly recommended for production.

pip install beautifulsoup4
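With bs4 available, the fact-extraction pass looks roughly like the sketch below. `extract_facts` is a hypothetical name for illustration — the shipped script's internals may differ — but the BeautifulSoup calls are the standard ones for pulling headings and table rows out of seo_content:

```python
from bs4 import BeautifulSoup

def extract_facts(seo_content: str) -> dict:
    """Sketch of the optional bs4 pass: heading map + table data."""
    soup = BeautifulSoup(seo_content, "html.parser")
    # Heading map fed to the model as context for FAQ grounding
    headings = [h.get_text(strip=True) for h in soup.find_all(["h2", "h3"])]
    # Each table becomes a list of rows, each row a list of cell strings
    tables = [
        [[cell.get_text(strip=True) for cell in row.find_all(["td", "th"])]
         for row in table.find_all("tr")]
        for table in soup.find_all("table")
    ]
    return {"headings": headings, "tables": tables}
```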

Fields Generated

| Field | Format / Constraint | Purpose |
| --- | --- | --- |
| seo_title | 30–60 chars · ends with "\| Billoff" | HTML `<title>` tag — keyword-optimised from the top-volume keyword |
| h1 | Unique angle — different from title | Page H1 — action-oriented, complementary to title |
| seo_description | 120–160 chars · includes rating + CTA | HTML `<meta name="description">` |
| slug | cancel-[name-ascii-lowercase] | Clean URL slug — always derived from service name, never hallucinated |
| faq | Array of 5 {question, answer} · plain text, no HTML | Structured FAQ data — injected into page as JSON-LD schema or visible FAQ section |
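Because the slug is always derived from the service name rather than generated, it can be computed deterministically without the model. A minimal sketch (hypothetical helper name; assumes stdlib unicodedata for ASCII folding):

```python
import re
import unicodedata

def make_slug(name: str, prefix: str = "cancel") -> str:
    """ASCII-fold the service name, lowercase it, prefix with the cancel word."""
    ascii_name = (
        unicodedata.normalize("NFKD", name)  # decompose accents (é -> e + ́)
        .encode("ascii", "ignore")           # drop everything non-ASCII
        .decode("ascii")
        .lower()
    )
    # Collapse any run of non-alphanumerics into a single hyphen
    ascii_name = re.sub(r"[^a-z0-9]+", "-", ascii_name).strip("-")
    return f"{prefix}-{ascii_name}"
```

Accented or symbol-heavy names still yield clean URLs: `make_slug("Canal+ Séries")` gives `"cancel-canal-series"`.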

JSON output example

{
  "name": "Netflix",
  "seo_content": "...",        // ← from V1/V2/V3/V4

  // ↓ added by V5
  "seo_title":       "Cancel Netflix Subscription: Complete Guide | Billoff",
  "h1":              "Cancel Netflix: Step-by-Step Without Losing Your Watch History",
  "seo_description": "Learn how to cancel Netflix in 2 minutes β€” no hold music, no tricks. Rated 4.8/5 by Billoff users. Keep your data or delete everything. Cancel now.",
  "slug":            "cancel-netflix",
  "faq": [
    {
      "question": "Can I get a refund after cancelling Netflix?",
      "answer":   "Netflix does not offer refunds for partial billing periods. Your access continues until the end of the current billing cycle, after which you won't be charged again."
    },
    // ... 4 more Q&A pairs
  ]
}
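Since faq is a fixed-shape array of plain-text pairs, turning it into the JSON-LD block mentioned in the fields table is mechanical. A sketch using the schema.org FAQPage vocabulary (helper name is illustrative):

```python
import json

def faq_to_jsonld(faq: list) -> str:
    """Render the generated faq array as a schema.org FAQPage JSON-LD string,
    ready to embed in a <script type="application/ld+json"> tag."""
    schema = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": item["question"],
                "acceptedAnswer": {"@type": "Answer", "text": item["answer"]},
            }
            for item in faq
        ],
    }
    return json.dumps(schema, ensure_ascii=False, indent=2)
```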

Prompt Template

Each service gets a single non-streaming call to gpt-4o-mini. The prompt includes:

  1. Top-15 keywords sorted by search volume
  2. 4,500-char preview of seo_content (head + cancellation section)
  3. Extracted facts (headings, tables, semantic flags) via BeautifulSoup
  4. Character-length constraints with validation checklist
  5. FAQ grounding rules — no invented facts, method-neutral, plain text only
━━ KEYWORDS (sorted by volume) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Main keyword:     cancel netflix subscription
Related keywords: how to cancel netflix, ...

━━ SERVICE CONTEXT (4,500 chars) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Headings: ["How to Cancel Netflix", "Refund Policy", ...]
Flags:    ["auto_renewal", "refunds", "pause_option"]
[seo_content preview...]

━━ REQUIREMENTS ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1️⃣  SEO TITLE     — EXACTLY 30–60 chars · ends "| Billoff"
2️⃣  H1            — different angle from title
3️⃣  SEO DESC      — EXACTLY 120–160 chars · rating 4.8/5 · CTA
4️⃣  SLUG          — cancel-[service-name]
5️⃣  FAQ (5)       — grounded in context · plain text · varied angles

━━ VALIDATION CHECKLIST ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ seo_title:       30–60 chars
✅ seo_description: 120–160 chars
✅ FAQ:             no invented contact details, plain text only
✅ Language:        English throughout
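Because the script is stdlib-only, the call itself needs nothing beyond urllib. A sketch of that request — `build_payload` and `generate_metadata` are illustrative names and error handling is omitted; the payload fields mirror the attribute table above, with `response_format` forcing the model to return valid JSON:

```python
import json
import os
import urllib.request

def build_payload(prompt: str) -> dict:
    """Chat-completions payload mirroring the config table above."""
    return {
        "model": "gpt-4o-mini",
        "temperature": 0.3,
        "max_tokens": 900,
        "response_format": {"type": "json_object"},  # structured JSON output
        "messages": [{"role": "user", "content": prompt}],
    }

def generate_metadata(prompt: str) -> dict:
    """Single non-streaming call; returns the parsed metadata dict."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return json.loads(body["choices"][0]["message"]["content"])
```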

Retry logic

After each API response, the script validates len(seo_title) and len(seo_description). If either fails the character-length constraint, the prompt is augmented with specific correction instructions and re-sent — up to 2 retries before accepting the last result.
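That validate-then-retry loop can be sketched as follows (function names are illustrative; `call` stands in for the API request):

```python
def length_violations(meta: dict) -> list:
    """Return correction strings for any char-length constraint that fails."""
    problems = []
    title = meta.get("seo_title", "")
    desc = meta.get("seo_description", "")
    if not 30 <= len(title) <= 60:
        problems.append(f"seo_title is {len(title)} chars; rewrite it to 30-60 chars.")
    if not 120 <= len(desc) <= 160:
        problems.append(f"seo_description is {len(desc)} chars; rewrite it to 120-160 chars.")
    return problems

def generate_with_retry(call, prompt: str, retries: int = 2) -> dict:
    """Call once, then re-send with correction instructions up to `retries`
    times; the last result is accepted even if it still fails."""
    meta = call(prompt)
    for _ in range(retries):
        problems = length_violations(meta)
        if not problems:
            break
        prompt += "\n\nFIX THESE ISSUES:\n" + "\n".join(problems)
        meta = call(prompt)
    return meta
```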

Usage

1. Download the script

Click Python Script ✦ live config above to download it. The script is fully self-contained — no dependencies except the standard library (and the optional beautifulsoup4).

2. Set your API key

# Option A β€” edit the script directly
OPENAI_API_KEY = "sk-..."

# Option B β€” environment variable (recommended for production)
export OPENAI_API_KEY="sk-..."

3. Run in test mode first

python3 billoff_v5_metadata.py data/services.json --test

Test mode runs on 5 random services and prints a detailed output table with field lengths, FAQ quality, and validation status — no file is saved.

4. Run in production

# Full run (200 workers by default)
python3 billoff_v5_metadata.py data/services.json

# Custom worker count (e.g. for rate-limit-sensitive environments)
python3 billoff_v5_metadata.py data/services.json --workers 50

The script saves back to the same JSON file in-place, enriching each service object with the 5 new fields.
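The fan-out itself is a straightforward thread pool, since each service is one independent HTTP call. A sketch of the pattern (`enrich` stands in for the per-service metadata call; names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def enrich_all(services: list, enrich, workers: int = 200) -> list:
    """Run `enrich` over every service in parallel and merge each result
    back into the same service dict, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves order, so zip pairs each result with its service
        for service, meta in zip(services, pool.map(enrich, services)):
            service.update(meta)
    return services
```

Threads (not processes) are the right fit here: the workers spend nearly all their time blocked on network I/O.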

5. Accepted JSON formats

The script accepts both the pipeline format and a plain array:

// Format A β€” pipeline output ({"services": [...]})
{
  "country": "FR",  // ISO code of your target country
  "services": [ { "name": "Netflix", "seo_content": "...", ... } ]
}

// Format B β€” plain array
[ { "name": "Netflix", "seo_content": "...", ... } ]
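A loader that accepts both shapes needs only one isinstance check. A sketch (illustrative helper name) — it returns the full document as well, so the script can save the file back in the same shape it arrived in:

```python
import json

def load_services(path: str):
    """Load either {"services": [...]} or a plain array.
    Returns (full_document, services_list); mutating the list items
    mutates the document, so it can be dumped back unchanged in shape."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    services = data["services"] if isinstance(data, dict) else data
    return data, services
```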

Multi-country

Edit the locale constants at the top of the script for each country run:

DEFAULT_LANGUAGE    = "French"
DEFAULT_CANCEL_WORD = "RΓ©siliation"
DEFAULT_CANCEL_VERB = "rΓ©silier"
DEFAULT_SLUG_PREFIX = "annuler"

You can also override per-service by adding language, cancel_word, or slug_prefix fields directly to each service object in the JSON.
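Resolving those overrides is a plain dict lookup with the constants as fallbacks — a sketch showing two of the fields:

```python
# Script-level locale constants (values from the multi-country example above)
DEFAULT_LANGUAGE = "French"
DEFAULT_SLUG_PREFIX = "annuler"

def locale_for(service: dict) -> dict:
    """Per-service fields win over the script-level constants."""
    return {
        "language": service.get("language", DEFAULT_LANGUAGE),
        "slug_prefix": service.get("slug_prefix", DEFAULT_SLUG_PREFIX),
    }
```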

Input Fields Required

| Field | Required | Notes |
| --- | --- | --- |
| name | ✅ Yes | Service name (e.g. "Netflix") |
| main_keyword | Recommended | Primary target keyword — used as default title base |
| keywords | Recommended | Array of {keyword, volume} — sorted by volume for title selection |
| seo_content | Recommended | HTML article from V1/V2/V3/V4 — powers FAQ context extraction |
| language | Optional | Per-service language override (defaults to DEFAULT_LANGUAGE) |
| cancel_word | Optional | Per-service cancel verb override |
| slug_prefix | Optional | Per-service slug prefix override |

Cost Model

GPT-4o-mini pricing: $0.15 / 1M input tokens · $0.60 / 1M output tokens.

| Scale | Est. tokens / service | Cost (USD) | Cost (EUR) |
| --- | --- | --- | --- |
| 1 service | ~1,000 in + 400 out | ≈ $0.00039 | ≈ €0.00036 |
| 1,000 services | — | ≈ $0.39 | ≈ €0.36 |
| 10,000 services | — | ≈ $3.90 | ≈ €3.59 |
| 50,000 services | — | ≈ $19.50 | ≈ €17.90 |
| Full pipeline (V1+V5) | — | ≈ $0.05–0.10 / article | — |

Estimates based on average prompt size. Retries add ≈5% overhead. Actual costs may vary.
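The per-service figure in the table follows directly from the posted token prices; a few lines of arithmetic reproduce it:

```python
# USD per token, from the posted gpt-4o-mini pricing
INPUT_PRICE = 0.15 / 1_000_000
OUTPUT_PRICE = 0.60 / 1_000_000

def cost_usd(input_tokens: int, output_tokens: int, services: int = 1) -> float:
    """Estimated USD cost for `services` calls of the given token sizes."""
    return services * (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE)

# ~1,000 in + 400 out per service -> ≈ $0.00039; × 50,000 -> ≈ $19.50
```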

Integration in the Full Pipeline

# Step 1 β€” Generate articles (choose one method)
python3 billoff_v1_generate.py data/services.json   # GPT-4.1 + web search
python3 billoff_v2_generate.py data/services.json   # GPT-5 Mini rewrite
python3 billoff_v3_generate.py data/services.json   # Gemini 2.5 Flash rewrite
python3 billoff_v4_generate.py data/services.json   # Claude Haiku 4.5 rewrite

# Step 2 β€” Generate SEO metadata (always the same script)
python3 billoff_v5_metadata.py data/services.json

# Each service in the JSON now has all fields needed to publish:
# seo_content (article HTML), seo_title, h1, seo_description, slug, faq

Files involved

No Cloudflare proxy needed — V5 runs directly from your machine or CI/CD pipeline against the OpenAI API.