STEP 5 — POST-GENERATION

🏷️ SEO Metadata Generator

Run this after V1/V2/V3/V4 to complete each page with keyword-optimised metadata: seo_title, h1, seo_description, slug, and faq (5 Q&A pairs). Powered by GPT-4o-mini — ultra-fast, extremely low cost, runs at 200 parallel workers.

Developer Resources — all config baked in, ready to run

Overview

V5 is a post-processing step — it takes the enriched JSON produced by V1/V2/V3/V4 and generates the SEO shell that wraps each article: the page title, H1 tag, meta description, slug, and 5 contextual FAQ questions grounded in seo_content.

Pipeline position: V1/V2/V3/V4 article generation → V5 metadata generation → publish.
Why separate? Metadata requires different optimisation logic (keyword selection, char-length constraints, FAQ grounding) that is best handled in a dedicated, parallelised pass.
| Attribute | Value |
| --- | --- |
| Model | gpt-4o-mini (OpenAI) |
| API endpoint | api.openai.com/v1/chat/completions |
| Streaming | No — structured JSON output |
| Temperature | 0.3 (factual, consistent) |
| Max output tokens | 900 |
| Input required | name, main_keyword, keywords[], seo_content |
| Fields generated | seo_title · h1 · seo_description · slug · faq (5 Q&A) |
| Parallel workers | 200 (no web search — safe to max out) |
| Retry logic | Up to 2 retries if char-length constraints fail |
| Avg cost per service | ≈ $0.0004 USD (≈ €0.00036) |
| Cost × 1,000 services | ≈ $0.39 USD |
| Cost × 50,000 services | ≈ $19.50 USD (≈ €17.90) |
| Speed (200 workers) | ~1,000 services/min |
| Python dependency | stdlib only — urllib, json, re. Optional: beautifulsoup4 for richer FAQ context extraction |

Architecture

Pipeline flow

┌──────────────────────────────────────────────────────────────┐
│  V1/V2/V3/V4 OUTPUT — service JSON with seo_content          │
└──────────────────────────────┬───────────────────────────────┘
                               │
               ┌───────────────▼───────────────────────┐
               │  CONTEXT EXTRACTION (per service)     │
               │  • top-15 keywords sorted by volume   │
               │  • 4,500-char preview of seo_content  │
               │  • h2/h3 headings extracted           │
               │  • semantic flags (fee, refund, …)    │
               └───────────────┬───────────────────────┘
                               │
      ┌────────────────────────▼─────────────────────┐
      │  GPT-4o-mini (200 parallel workers)          │
      │  Single-call, non-streaming, JSON output     │
      │  Retry × 2 if char-length validation fails   │
      └────────────────┬─────────────────────────────┘
                       │
               ┌───────▼───────────────────────────┐
               │  OUTPUT (per service)             │
               │  seo_title        30–60 chars     │
               │  h1               unique angle    │
               │  seo_description  120–160 chars   │
               │  slug             cancel-[name]   │
               │  faq[]            5 Q&A pairs     │
               └───────┬───────────────────────────┘
                       │
          SAVED BACK INTO SAME JSON FILE

BeautifulSoup (optional)

If beautifulsoup4 is installed, the script also extracts structured table data and heading maps from seo_content HTML and feeds them to the model as structured "EXTRACTED FACTS". This significantly improves FAQ specificity — highly recommended for production.

pip install beautifulsoup4
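With bs4 available, the fact-extraction pass looks roughly like the sketch below. `extract_facts` is a hypothetical name for illustration — the shipped script's internals may differ — but the BeautifulSoup calls are the standard ones for pulling headings and table rows out of seo_content:

```python
from bs4 import BeautifulSoup

def extract_facts(seo_content: str) -> dict:
    """Sketch of the optional bs4 pass: heading map + table data."""
    soup = BeautifulSoup(seo_content, "html.parser")
    # Heading map fed to the model as context for FAQ grounding
    headings = [h.get_text(strip=True) for h in soup.find_all(["h2", "h3"])]
    # Each table becomes a list of rows, each row a list of cell strings
    tables = [
        [[cell.get_text(strip=True) for cell in row.find_all(["td", "th"])]
         for row in table.find_all("tr")]
        for table in soup.find_all("table")
    ]
    return {"headings": headings, "tables": tables}
```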

Fields Generated

| Field | Format / Constraint | Purpose |
| --- | --- | --- |
| seo_title | 30–60 chars · ends with "\| Billoff" | HTML `<title>` tag — keyword-optimised from the top-volume keyword |
| h1 | Unique angle — different from title | Page H1 — action-oriented, complementary to title |
| seo_description | 120–160 chars · includes rating + CTA | HTML `<meta name="description">` |
| slug | cancel-[name-ascii-lowercase] | Clean URL slug — always derived from service name, never hallucinated |
| faq | Array of 5 {question, answer} · plain text, no HTML | Structured FAQ data — injected into page as JSON-LD schema or visible FAQ section |
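Because the slug is always derived from the service name rather than generated, it can be computed deterministically without the model. A minimal sketch (hypothetical helper name; assumes stdlib unicodedata for ASCII folding):

```python
import re
import unicodedata

def make_slug(name: str, prefix: str = "cancel") -> str:
    """ASCII-fold the service name, lowercase it, prefix with the cancel word."""
    ascii_name = (
        unicodedata.normalize("NFKD", name)  # decompose accents (é -> e + ́)
        .encode("ascii", "ignore")           # drop everything non-ASCII
        .decode("ascii")
        .lower()
    )
    # Collapse any run of non-alphanumerics into a single hyphen
    ascii_name = re.sub(r"[^a-z0-9]+", "-", ascii_name).strip("-")
    return f"{prefix}-{ascii_name}"
```

Accented or symbol-heavy names still yield clean URLs: `make_slug("Canal+ Séries")` gives `"cancel-canal-series"`.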

JSON output example

{
  "name": "Netflix",
  "seo_content": "...",        // ← from V1/V2/V3/V4

  // ↓ added by V5
  "seo_title":       "Cancel Netflix Subscription: Complete Guide | Billoff",
  "h1":              "Cancel Netflix: Step-by-Step Without Losing Your Watch History",
  "seo_description": "Learn how to cancel Netflix in 2 minutes β€” no hold music, no tricks. Rated 4.8/5 by Billoff users. Keep your data or delete everything. Cancel now.",
  "slug":            "cancel-netflix",
  "faq": [
    {
      "question": "Can I get a refund after cancelling Netflix?",
      "answer":   "Netflix does not offer refunds for partial billing periods. Your access continues until the end of the current billing cycle, after which you won't be charged again."
    },
    // ... 4 more Q&A pairs
  ]
}
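Since faq is a fixed-shape array of plain-text pairs, turning it into the JSON-LD block mentioned in the fields table is mechanical. A sketch using the schema.org FAQPage vocabulary (helper name is illustrative):

```python
import json

def faq_to_jsonld(faq: list) -> str:
    """Render the generated faq array as a schema.org FAQPage JSON-LD string,
    ready to embed in a <script type="application/ld+json"> tag."""
    schema = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": item["question"],
                "acceptedAnswer": {"@type": "Answer", "text": item["answer"]},
            }
            for item in faq
        ],
    }
    return json.dumps(schema, ensure_ascii=False, indent=2)
```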

Prompt Template

Each service gets a single non-streaming call to gpt-4o-mini. The prompt includes:

  1. Top-15 keywords sorted by search volume
  2. 4,500-char preview of seo_content (head + cancellation section)
  3. Extracted facts (headings, tables, semantic flags) via BeautifulSoup
  4. Character-length constraints with validation checklist
  5. FAQ grounding rules — no invented facts, method-neutral, plain text only
━━ KEYWORDS (sorted by volume) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Main keyword:     cancel netflix subscription
Related keywords: how to cancel netflix, ...

━━ SERVICE CONTEXT (4,500 chars) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Headings: ["How to Cancel Netflix", "Refund Policy", ...]
Flags:    ["auto_renewal", "refunds", "pause_option"]
[seo_content preview...]

━━ REQUIREMENTS ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1️⃣  SEO TITLE     — EXACTLY 30–60 chars · ends "| Billoff"
2️⃣  H1            — different angle from title
3️⃣  SEO DESC      — EXACTLY 120–160 chars · rating 4.8/5 · CTA
4️⃣  SLUG          — cancel-[service-name]
5️⃣  FAQ (5)       — grounded in context · plain text · varied angles

━━ VALIDATION CHECKLIST ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ seo_title:       30–60 chars
✅ seo_description: 120–160 chars
✅ FAQ:             no invented contact details, plain text only
✅ Language:        English throughout
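Because the script is stdlib-only, the call itself needs nothing beyond urllib. A sketch of that request — `build_payload` and `generate_metadata` are illustrative names and error handling is omitted; the payload fields mirror the attribute table above, with `response_format` forcing the model to return valid JSON:

```python
import json
import os
import urllib.request

def build_payload(prompt: str) -> dict:
    """Chat-completions payload mirroring the config table above."""
    return {
        "model": "gpt-4o-mini",
        "temperature": 0.3,
        "max_tokens": 900,
        "response_format": {"type": "json_object"},  # structured JSON output
        "messages": [{"role": "user", "content": prompt}],
    }

def generate_metadata(prompt: str) -> dict:
    """Single non-streaming call; returns the parsed metadata dict."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return json.loads(body["choices"][0]["message"]["content"])
```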

Retry logic

After each API response, the script validates len(seo_title) and len(seo_description). If either fails the character-length constraint, the prompt is augmented with specific correction instructions and re-sent — up to 2 retries before accepting the last result.
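That validate-then-retry loop can be sketched as follows (function names are illustrative; `call` stands in for the API request):

```python
def length_violations(meta: dict) -> list:
    """Return correction strings for any char-length constraint that fails."""
    problems = []
    title = meta.get("seo_title", "")
    desc = meta.get("seo_description", "")
    if not 30 <= len(title) <= 60:
        problems.append(f"seo_title is {len(title)} chars; rewrite it to 30-60 chars.")
    if not 120 <= len(desc) <= 160:
        problems.append(f"seo_description is {len(desc)} chars; rewrite it to 120-160 chars.")
    return problems

def generate_with_retry(call, prompt: str, retries: int = 2) -> dict:
    """Call once, then re-send with correction instructions up to `retries`
    times; the last result is accepted even if it still fails."""
    meta = call(prompt)
    for _ in range(retries):
        problems = length_violations(meta)
        if not problems:
            break
        prompt += "\n\nFIX THESE ISSUES:\n" + "\n".join(problems)
        meta = call(prompt)
    return meta
```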

Usage

1. Download the script

Click Python Script ✦ live config above to download it. The script is fully self-contained — no dependencies except the standard library (and the optional beautifulsoup4).

2. Set your API key

# Option A β€” edit the script directly
OPENAI_API_KEY = "sk-..."

# Option B β€” environment variable (recommended for production)
export OPENAI_API_KEY="sk-..."

3. Run in test mode first

python3 billoff_v5_metadata.py data/services.json --test

Test mode runs on 5 random services and prints a detailed output table with field lengths, FAQ quality, and validation status — no file is saved.

4. Run in production

# Full run (200 workers by default)
python3 billoff_v5_metadata.py data/services.json

# Custom worker count (e.g. for rate-limit-sensitive environments)
python3 billoff_v5_metadata.py data/services.json --workers 50

The script saves back to the same JSON file in-place, enriching each service object with the 5 new fields.
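The fan-out itself is a straightforward thread pool, since each service is one independent HTTP call. A sketch of the pattern (`enrich` stands in for the per-service metadata call; names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def enrich_all(services: list, enrich, workers: int = 200) -> list:
    """Run `enrich` over every service in parallel and merge each result
    back into the same service dict, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves order, so zip pairs each result with its service
        for service, meta in zip(services, pool.map(enrich, services)):
            service.update(meta)
    return services
```

Threads (not processes) are the right fit here: the workers spend nearly all their time blocked on network I/O.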

5. Accepted JSON formats

The script accepts both the pipeline format and a plain array:

// Format A β€” pipeline output ({"services": [...]})
{
  "country": "FR",  // ISO code of your target country
  "services": [ { "name": "Netflix", "seo_content": "...", ... } ]
}

// Format B β€” plain array
[ { "name": "Netflix", "seo_content": "...", ... } ]
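A loader that accepts both shapes needs only one isinstance check. A sketch (illustrative helper name) — it returns the full document as well, so the script can save the file back in the same shape it arrived in:

```python
import json

def load_services(path: str):
    """Load either {"services": [...]} or a plain array.
    Returns (full_document, services_list); mutating the list items
    mutates the document, so it can be dumped back unchanged in shape."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    services = data["services"] if isinstance(data, dict) else data
    return data, services
```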

Multi-country

Edit the locale constants at the top of the script for each country run:

DEFAULT_LANGUAGE    = "French"
DEFAULT_CANCEL_WORD = "RΓ©siliation"
DEFAULT_CANCEL_VERB = "rΓ©silier"
DEFAULT_SLUG_PREFIX = "annuler"

You can also override per-service by adding language, cancel_word, or slug_prefix fields directly to each service object in the JSON.
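Resolving those overrides is a plain dict lookup with the constants as fallbacks — a sketch showing two of the fields:

```python
# Script-level locale constants (values from the multi-country example above)
DEFAULT_LANGUAGE = "French"
DEFAULT_SLUG_PREFIX = "annuler"

def locale_for(service: dict) -> dict:
    """Per-service fields win over the script-level constants."""
    return {
        "language": service.get("language", DEFAULT_LANGUAGE),
        "slug_prefix": service.get("slug_prefix", DEFAULT_SLUG_PREFIX),
    }
```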

Input Fields Required

| Field | Required | Notes |
| --- | --- | --- |
| name | ✅ Yes | Service name (e.g. "Netflix") |
| main_keyword | Recommended | Primary target keyword — used as default title base |
| keywords | Recommended | Array of {keyword, volume} — sorted by volume for title selection |
| seo_content | Recommended | HTML article from V1/V2/V3/V4 — powers FAQ context extraction |
| language | Optional | Per-service language override (defaults to DEFAULT_LANGUAGE) |
| cancel_word | Optional | Per-service cancel verb override |
| slug_prefix | Optional | Per-service slug prefix override |

Cost Model

GPT-4o-mini pricing: $0.15 / 1M input tokens · $0.60 / 1M output tokens.

| Scale | Est. tokens / service | Cost (USD) | Cost (EUR) |
| --- | --- | --- | --- |
| 1 service | ~1,000 in + 400 out | ≈ $0.00039 | ≈ €0.00036 |
| 1,000 services | — | ≈ $0.39 | ≈ €0.36 |
| 10,000 services | — | ≈ $3.90 | ≈ €3.59 |
| 50,000 services | — | ≈ $19.50 | ≈ €17.90 |
| Full pipeline (V1+V5) | — | ≈ $0.05–0.10 / article | — |

Estimates based on average prompt size. Retries add ≈5% overhead. Actual costs may vary.
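The per-service figure in the table follows directly from the posted token prices; a few lines of arithmetic reproduce it:

```python
# USD per token, from the posted gpt-4o-mini pricing
INPUT_PRICE = 0.15 / 1_000_000
OUTPUT_PRICE = 0.60 / 1_000_000

def cost_usd(input_tokens: int, output_tokens: int, services: int = 1) -> float:
    """Estimated USD cost for `services` calls of the given token sizes."""
    return services * (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE)

# ~1,000 in + 400 out per service -> ≈ $0.00039; × 50,000 -> ≈ $19.50
```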

Integration in the Full Pipeline

# Step 1 β€” Generate articles (choose one method)
python3 billoff_v1_generate.py data/services.json   # GPT-4.1 + web search
python3 billoff_v2_generate.py data/services.json   # GPT-5 Mini rewrite
python3 billoff_v3_generate.py data/services.json   # Gemini 2.5 Flash rewrite
python3 billoff_v4_generate.py data/services.json   # Claude Haiku 4.5 rewrite

# Step 2 β€” Generate SEO metadata (always the same script)
python3 billoff_v5_metadata.py data/services.json

# Each service in the JSON now has all fields needed to publish:
# seo_content (article HTML), seo_title, h1, seo_description, slug, faq

Files involved

No Cloudflare proxy needed — V5 runs directly from your machine or CI/CD pipeline against the OpenAI API.