Python Response Generation — Deep Dive

Response Generation Architecture

Response generation sits at the end of the chatbot pipeline. It receives an action (what to say) and context (conversation history, slot values, API results) and produces the actual text the user sees. In production, this layer must be fast, reliable, and safe — because it is the only thing the user directly experiences.
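As a rough sketch of that contract, the input and output of the generation layer can be modelled as shown here. The names `GenerationRequest`, `Responder`, `EchoResponder`, and `handle` are illustrative assumptions; they are not part of any library or of the classes later in this article.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class GenerationRequest:
    """Everything the generation layer receives (hypothetical shape)."""
    action: str                                         # what to say
    context: dict = field(default_factory=dict)         # slot values, API results
    history: list[dict] = field(default_factory=list)   # prior conversation turns
    channel: str = "default"


class Responder(Protocol):
    """Any generation strategy: template, retrieval, or LLM."""
    def respond(self, request: GenerationRequest) -> str: ...


class EchoResponder:
    """Trivial stand-in used only to show the contract."""
    def respond(self, request: GenerationRequest) -> str:
        return f"[{request.channel}] {request.action}: {sorted(request.context)}"


def handle(responder: Responder, request: GenerationRequest) -> str:
    # The caller depends only on the Responder protocol, not on a concrete strategy.
    return responder.respond(request)
```

Keeping the caller dependent on a single `respond` method is one way to let templates, retrieval, and LLM generation be swapped without touching the rest of the pipeline.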

Template-Based Systems

Advanced Jinja2 Patterns

Production template systems go beyond variable substitution and rely on Jinja2 filters, conditionals, loops, and whitespace control:

from jinja2 import Environment, FileSystemLoader, select_autoescape

env = Environment(
    loader=FileSystemLoader("templates/"),
    autoescape=select_autoescape(),
    trim_blocks=True,
    lstrip_blocks=True,
)

# Contents of templates/booking_confirmed.j2, shown inline as a string
TEMPLATE = """
{% set greeting = ["Great news!", "Awesome!", "All set!"] | random %}
{{ greeting }}

{% if passengers == 1 %}
Your flight to {{ destination }} on {{ date }} is confirmed.
{% else %}
{{ passengers }} seats to {{ destination }} on {{ date }} — confirmed!
{% endif %}

Booking reference: {{ booking_id }}

{% if special_requests %}
We've noted your requests:
{% for req in special_requests %}
  • {{ req }}
{% endfor %}
{% endif %}
"""

Template Registry Pattern

Manage hundreds of templates with a registry:

import random
from dataclasses import dataclass
from jinja2 import Environment, BaseLoader

@dataclass
class ResponseTemplate:
    action: str
    template: str
    variations: list[str]              # alternative phrasings of the default template
    channel_overrides: dict[str, str]  # channel -> template

class TemplateRegistry:
    def __init__(self):
        self.templates: dict[str, ResponseTemplate] = {}
        # BaseLoader: templates are registered as strings, not loaded from disk
        self.env = Environment(loader=BaseLoader())

    def register(self, action: str, template: str,
                 variations: list[str] | None = None,
                 channel_overrides: dict[str, str] | None = None):
        self.templates[action] = ResponseTemplate(
            action=action,
            template=template,
            variations=variations or [],
            channel_overrides=channel_overrides or {},
        )

    def render(self, action: str, context: dict,
               channel: str = "default") -> str:
        entry = self.templates.get(action)
        if not entry:
            # Unknown action: return a generic reply instead of raising
            return "I'm not sure how to respond to that."

        if channel in entry.channel_overrides:
            # A channel-specific override takes precedence over variations
            tmpl_str = entry.channel_overrides[channel]
        elif entry.variations:
            # Pick uniformly among the default template and its variations
            tmpl_str = random.choice([entry.template] + entry.variations)
        else:
            tmpl_str = entry.template

        template = self.env.from_string(tmpl_str)
        return template.render(**context)
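
Random variation can make snapshot tests flaky and can change the phrasing mid-conversation. One option, sketched here as an assumption rather than as part of the registry above, is to derive the choice from a stable key such as a conversation ID. `pick_variation` and `conversation_id` are hypothetical names.

```python
import hashlib


def pick_variation(base: str, variations: list[str], conversation_id: str) -> str:
    """Deterministically pick one template string for a conversation."""
    candidates = [base] + variations
    # hashlib gives a digest that is stable across processes, unlike the
    # built-in hash() of a str, which is salted per interpreter run.
    digest = hashlib.sha256(conversation_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(candidates)
    return candidates[index]
```

The same conversation ID always maps to the same phrasing, while different conversations still spread across the available variations.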

Channel-Specific Formatting

Different platforms need different formatting:

class ChannelFormatter:
    @staticmethod
    def format_for_channel(text: str, channel: str) -> dict:
        if channel == "slack":
            return {"text": text, "mrkdwn": True}
        elif channel == "telegram":
            return {"text": text, "parse_mode": "HTML"}
        elif channel == "whatsapp":
            # WhatsApp uses its own markup: *single asterisks* for bold, not **double**
            text = text.replace("**", "*")  # bold
            text = text.replace("• ", "- ")  # bullets
            return {"text": text}
        return {"text": text}
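
With `parse_mode` set to `HTML`, Telegram expects any `<`, `>`, and `&` that are not part of a tag or entity to be escaped, so a slot value containing those characters can break the message. A minimal sketch using the standard library follows; `telegram_payload` is a hypothetical helper, not part of `ChannelFormatter`.

```python
import html


def telegram_payload(template: str, **values: object) -> dict:
    """Escape dynamic values, then substitute them into trusted HTML markup."""
    # Only the values are escaped; the template itself is assumed to be trusted.
    safe = {k: html.escape(str(v), quote=False) for k, v in values.items()}
    return {"text": template.format(**safe), "parse_mode": "HTML"}


payload = telegram_payload(
    "Booking <b>{booking_id}</b> for {name} is confirmed.",
    booking_id="AB<12>",
    name="Tom & Ann",
)
```

Escaping the values before substitution keeps the markup in the template intact while neutralising anything that came from users or upstream APIs.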

Retrieval-Based Generation

Candidate Scoring with Sentence Transformers

from sentence_transformers import SentenceTransformer, util
import torch

class RetrievalResponder:
    def __init__(self, responses: list[dict], model_name: str = "all-MiniLM-L6-v2"):
        self.encoder = SentenceTransformer(model_name)
        self.responses = responses
        self.response_texts = [r["text"] for r in responses]
        self.embeddings = self.encoder.encode(
            self.response_texts, convert_to_tensor=True
        )

    def get_response(self, context: str, top_k: int = 3) -> list[dict]:
        query_emb = self.encoder.encode(context, convert_to_tensor=True)
        scores = util.cos_sim(query_emb, self.embeddings)[0]
        # torch.topk returns both values and indices, so name the result accordingly
        top = torch.topk(scores, k=min(top_k, len(self.responses)))

        results = []
        for score, idx in zip(top.values, top.indices):
            results.append({
                "text": self.responses[idx.item()]["text"],
                "score": score.item(),
                "metadata": self.responses[idx.item()].get("metadata", {}),
            })
        return results
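
To make the scoring step concrete without downloading a model, here is a plain-Python sketch of what cosine-similarity ranking does. The three-dimensional vectors are made-up stand-ins for real sentence embeddings, and `cosine` and `top_k` are hypothetical helpers.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], candidates: dict[str, list[float]], k: int = 3):
    """Return (text, score) pairs sorted by descending similarity."""
    scored = [(text, cosine(query, vec)) for text, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy embeddings, invented for illustration only.
candidates = {
    "Your refund is on its way.": [0.9, 0.1, 0.0],
    "Happy to help with baggage.": [0.1, 0.9, 0.1],
    "Let me check that flight.": [0.0, 0.2, 0.9],
}
ranked = top_k([1.0, 0.0, 0.0], candidates, k=2)
```

Here the query vector points almost exactly along the first candidate, so that candidate ranks first with a score close to 1.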

Hybrid Retrieval + Template

Use retrieval for the conversational part and templates for structured data:

class HybridResponder:
    def __init__(self, retrieval: RetrievalResponder, registry: TemplateRegistry):
        self.retrieval = retrieval
        self.registry = registry

    def respond(self, action: str, context: dict, conversation_text: str) -> str:
        # Structured data via template
        factual_part = self.registry.render(action, context)

        # Conversational wrapper via retrieval
        candidates = self.retrieval.get_response(conversation_text, top_k=1)
        if candidates and candidates[0]["score"] > 0.7:
            conversational_part = candidates[0]["text"]
            return f"{conversational_part}\n\n{factual_part}"

        return factual_part

LLM-Based Generation

Grounded Generation Pattern

The key to reliable LLM responses is grounding — providing verified data and instructing the model to use only that data:

import openai

class GroundedGenerator:
    def __init__(self, model: str = "gpt-4o-mini"):
        self.model = model
        self.client = openai.OpenAI()

    def generate(self, action: str, structured_data: dict,
                 conversation_history: list[dict],
                 persona: str = "friendly customer service agent") -> str:

        system_prompt = f"""You are a {persona} for an airline.
Generate a natural response for the action: {action}

VERIFIED DATA (use ONLY these facts):
{self._format_data(structured_data)}

RULES:
- Include all verified data points in your response
- Do NOT invent any facts, numbers, or dates not in the verified data
- Keep the response concise (2-4 sentences)
- Match the conversation's tone
- Do not start with "I" """

        messages = [
            {"role": "system", "content": system_prompt},
            *conversation_history[-5:],  # Last 5 messages for context
            {"role": "user", "content": f"Generate response for: {action}"},
        ]

        response = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            temperature=0.7,
            max_tokens=200,
        )
        return response.choices[0].message.content

    def _format_data(self, data: dict) -> str:
        return "\n".join(f"- {k}: {v}" for k, v in data.items())
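
The prompt assembly can be exercised without calling the API. The sketch below pulls it into a pure function so the grounding block can be unit-tested; `build_messages` and `format_data` are a hypothetical refactoring, and the system prompt is an abbreviated version of the one above.

```python
def format_data(data: dict) -> str:
    """Render verified facts as a bulleted block for the system prompt."""
    return "\n".join(f"- {k}: {v}" for k, v in data.items())


def build_messages(action: str, data: dict, history: list[dict],
                   max_history: int = 5) -> list[dict]:
    """Assemble chat messages: system prompt, recent history, final instruction."""
    system_prompt = (
        f"Generate a natural response for the action: {action}\n\n"
        "VERIFIED DATA (use ONLY these facts):\n"
        f"{format_data(data)}"
    )
    return [
        {"role": "system", "content": system_prompt},
        *history[-max_history:],   # most recent messages only
        {"role": "user", "content": f"Generate response for: {action}"},
    ]


messages = build_messages(
    "booking_confirmed",
    {"booking_id": "XK42Q", "destination": "Lisbon"},
    history=[{"role": "user", "content": f"msg {i}"} for i in range(8)],
)
```

Separating prompt construction from the network call makes it cheap to assert that every verified fact actually reaches the model.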

Output Validation

Never send LLM output to users without validation:

import re

class ResponseValidator:
    def __init__(self, structured_data: dict):
        self.data = structured_data

    def validate(self, response: str) -> tuple[bool, list[str]]:
        issues = []

        # Check all critical data points are present
        for key in ["booking_id", "date", "destination"]:
            if key in self.data:
                value = str(self.data[key])
                if value.lower() not in response.lower():
                    issues.append(f"Missing critical data: {key}={value}")

        # Check for hallucinated numbers not in source data
        numbers_in_response = set(re.findall(r"\b\d+\b", response))
        numbers_in_data = set()
        for v in self.data.values():
            numbers_in_data.update(re.findall(r"\b\d+\b", str(v)))
        hallucinated = numbers_in_response - numbers_in_data - {"1", "2", "3"}
        if hallucinated:
            issues.append(f"Potentially hallucinated numbers: {hallucinated}")

        # Check for banned phrases
        banned = ["as an ai", "i cannot", "i don't have access"]
        for phrase in banned:
            if phrase in response.lower():
                issues.append(f"Banned phrase detected: '{phrase}'")

        return len(issues) == 0, issues
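
To see the number check in isolation, here is a small standalone version with sample inputs. `unsupported_numbers` is a hypothetical helper that mirrors the logic above; its `allowed` default copies the original's whitelist of small numbers, and the sample booking data is invented.

```python
import re

NUMBER = re.compile(r"\b\d+\b")


def unsupported_numbers(response: str, data: dict,
                        allowed: frozenset[str] = frozenset({"1", "2", "3"})) -> set[str]:
    """Return digit groups that appear in the response but not in the source data."""
    in_response = set(NUMBER.findall(response))
    in_data: set[str] = set()
    for value in data.values():
        in_data.update(NUMBER.findall(str(value)))
    return in_response - in_data - allowed


data = {"date": "2024-06-01", "seats": 2, "price": "149.00"}
ok = unsupported_numbers("2 seats on 2024-06-01 for 149.00 each.", data)
bad = unsupported_numbers("2 seats on 2024-06-01 for 199.00 each.", data)
```

The comparison works on digit groups, so a value the model reformats (for example a date written in a different style) may be flagged even though it is correct. Treat the result as a signal to fall back, not as proof of a hallucination.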

Fallback to Templates

When LLM generation fails validation, fall back to templates:

import logging

logger = logging.getLogger(__name__)

class SafeResponder:
    def __init__(self, llm: GroundedGenerator, templates: TemplateRegistry,
                 validator_class=ResponseValidator):
        self.llm = llm
        self.templates = templates
        self.validator_class = validator_class

    def respond(self, action: str, context: dict,
                history: list[dict], channel: str = "default") -> str:
        # Try LLM first
        try:
            llm_response = self.llm.generate(action, context, history)
            validator = self.validator_class(context)
            valid, issues = validator.validate(llm_response)
            if valid:
                return llm_response
            # Log issues for monitoring
            logger.warning(f"LLM response failed validation: {issues}")
        except Exception as e:
            logger.error(f"LLM generation failed: {e}")

        # Fallback to template
        return self.templates.render(action, context, channel)
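
The control flow is easier to test with stubs in place of the real LLM and registry. In this sketch, `respond_with_fallback`, `flaky_llm`, `require_id`, and `template` are all hypothetical stand-ins.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def respond_with_fallback(
    generate: Callable[[], str],
    validate: Callable[[str], list[str]],
    fallback: Callable[[], str],
) -> tuple[str, str]:
    """Return (text, method), where method is "llm" or "template"."""
    try:
        candidate = generate()
        issues = validate(candidate)
        if not issues:
            return candidate, "llm"
        logger.warning("LLM response failed validation: %s", issues)
    except Exception:
        # Any generation error (timeout, rate limit, ...) triggers the fallback.
        logger.exception("LLM generation failed")
    return fallback(), "template"


def flaky_llm() -> str:
    raise TimeoutError("upstream timed out")


def require_id(text: str) -> list[str]:
    return [] if "XK42Q" in text else ["missing booking_id"]


def template() -> str:
    return "Booking XK42Q is confirmed."
```

Returning the method alongside the text makes it straightforward to compute a fallback rate later.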

Response Enrichment

Rich Messages

Beyond plain text, chatbots send buttons, cards, carousels, and quick replies:

from dataclasses import dataclass

@dataclass
class Button:
    title: str
    payload: str

@dataclass
class Card:
    title: str
    subtitle: str
    image_url: str | None = None
    buttons: list[Button] | None = None

@dataclass
class BotResponse:
    text: str | None = None
    buttons: list[Button] | None = None
    cards: list[Card] | None = None
    quick_replies: list[str] | None = None

    def to_channel_format(self, channel: str) -> dict:
        if channel == "slack":
            return self._to_slack()
        elif channel == "telegram":
            return self._to_telegram()
        return {"text": self.text}

    def _to_slack(self) -> dict:
        blocks = []
        if self.text:
            blocks.append({"type": "section", "text": {"type": "mrkdwn", "text": self.text}})
        if self.buttons:
            blocks.append({
                "type": "actions",
                "elements": [
                    {"type": "button", "text": {"type": "plain_text", "text": b.title},
                     "value": b.payload}
                    for b in self.buttons
                ],
            })
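        return {"blocks": blocks}

    def _to_telegram(self) -> dict:
        # Assumed mapping: buttons become a single row of a Telegram inline keyboard
        payload: dict = {"text": self.text or ""}
        if self.buttons:
            payload["reply_markup"] = {
                "inline_keyboard": [[
                    {"text": b.title, "callback_data": b.payload}
                    for b in self.buttons
                ]]
            }
        return payload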

Monitoring Response Quality

Track these metrics in production:

  • Response length distribution: sudden changes indicate template or prompt issues
  • Fallback rate: how often the LLM fails validation and templates take over
  • User satisfaction signals: thumbs up/down, conversation completion rates
  • Generation latency: P50 and P99 for LLM responses vs. templates

import time
from dataclasses import dataclass

@dataclass
class ResponseMetrics:
    action: str
    generation_method: str  # "template", "retrieval", "llm"
    latency_ms: float
    was_validated: bool
    passed_validation: bool
    response_length: int
    channel: str

Performance Comparison

Method | Latency | Cost/Message | Quality | Safety
Template | <1ms | $0 | Predictable | Very High
Retrieval | 5-15ms | $0 | Natural | High
LLM (GPT-4o-mini) | 200-500ms | $0.0001-0.001 | Very Natural | Medium
LLM (GPT-4o) | 500-2000ms | $0.005-0.02 | Best | Medium

The production pattern: templates for transactional messages (confirmations, errors, data readouts), retrieval for FAQ and common scenarios, LLM for open-ended conversation and tone adaptation — with validation and template fallback as safety nets.
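
That routing rule could be written down as a small lookup. The action names and the `choose_method` function are invented for illustration; a real system would define its own categories.

```python
# Hypothetical action categories, for illustration only.
TRANSACTIONAL = {"booking_confirmed", "payment_failed", "itinerary_readout"}
FAQ = {"baggage_policy", "checkin_times"}


def choose_method(action: str) -> str:
    """Map an action to "template", "retrieval", or "llm"."""
    if action in TRANSACTIONAL:
        return "template"      # exact data, no tolerance for invention
    if action in FAQ:
        return "retrieval"     # pre-written answers
    return "llm"               # open-ended; output must still pass validation
```

Defaulting unknown actions to the LLM path is a design choice in this sketch; a more conservative system might default to a template instead.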

The one thing to remember: Production response generation layers templates (for safety and speed), retrieval (for natural pre-written answers), and LLM generation (for flexibility) — with output validation as the mandatory safety net before any generated text reaches the user.

