Python Response Generation — Deep Dive
Response Generation Architecture
Response generation sits at the end of the chatbot pipeline. It receives an action (what to say) and context (conversation history, slot values, API results) and produces the actual text the user sees. In production, this layer must be fast, reliable, and safe — because it is the only thing the user directly experiences.
Template-Based Systems
Advanced Jinja2 Patterns
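Production template systems rely on more of Jinja2 than plain variable substitution: filters, conditionals, loops, and whitespace control all appear in the example below.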
from jinja2 import Environment, FileSystemLoader, select_autoescape

env = Environment(
    loader=FileSystemLoader("templates/"),
    autoescape=select_autoescape(),
    trim_blocks=True,
    lstrip_blocks=True,
)
# templates/booking_confirmed.j2
TEMPLATE = """
{% set greeting = ["Great news!", "Awesome!", "All set!"] | random %}
{{ greeting }}
{% if passengers == 1 %}
Your flight to {{ destination }} on {{ date }} is confirmed.
{% else %}
{{ passengers }} seats to {{ destination }} on {{ date }} — confirmed!
{% endif %}
Booking reference: {{ booking_id }}
{% if special_requests %}
We've noted your requests:
{% for req in special_requests %}
• {{ req }}
{% endfor %}
{% endif %}
"""
Template Registry Pattern
Manage hundreds of templates with a registry:
import random
from dataclasses import dataclass

from jinja2 import Environment, BaseLoader


@dataclass
class ResponseTemplate:
    action: str
    template: str
    variations: list[str]
    channel_overrides: dict[str, str]  # channel -> template


class TemplateRegistry:
    def __init__(self):
        self.templates: dict[str, ResponseTemplate] = {}
        self.env = Environment(loader=BaseLoader())

    def register(self, action: str, template: str,
                 variations: list[str] | None = None,
                 channel_overrides: dict[str, str] | None = None):
        self.templates[action] = ResponseTemplate(
            action=action,
            template=template,
            variations=variations or [],
            channel_overrides=channel_overrides or {},
        )

    def render(self, action: str, context: dict,
               channel: str = "default") -> str:
        entry = self.templates.get(action)
        if not entry:
            return "I'm not sure how to respond to that."
        # Channel-specific override takes priority over variations
        if channel in entry.channel_overrides:
            tmpl_str = entry.channel_overrides[channel]
        elif entry.variations:
            # Pick uniformly among the base template and its variations
            tmpl_str = random.choice([entry.template] + entry.variations)
        else:
            tmpl_str = entry.template
        template = self.env.from_string(tmpl_str)
        return template.render(**context)
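With hundreds of templates, two failure modes may be worth guarding against: a template with a syntax error that only surfaces when it is first rendered, and a missing context variable that silently renders as an empty string. The sketch below is one possible way to handle both, using Jinja2's `Environment.parse` and `StrictUndefined`. The helper names `find_broken_templates` and `render_strict` are hypothetical and not part of the registry above.

```python
from jinja2 import Environment, StrictUndefined, TemplateSyntaxError, UndefinedError

# StrictUndefined makes a missing context variable raise an error
# instead of rendering as an empty string.
env = Environment(undefined=StrictUndefined)


def find_broken_templates(templates: dict[str, str]) -> dict[str, str]:
    """Return {action: error message} for every template that fails to parse."""
    broken = {}
    for action, source in templates.items():
        try:
            env.parse(source)  # parses only; nothing is rendered
        except TemplateSyntaxError as exc:
            broken[action] = f"line {exc.lineno}: {exc.message}"
    return broken


def render_strict(source: str, context: dict, fallback: str) -> str:
    """Render a template, returning `fallback` if a variable is missing."""
    try:
        return env.from_string(source).render(**context)
    except UndefinedError:
        return fallback
```

Running a check like `find_broken_templates` at startup or in CI moves syntax errors out of the request path. Whether a missing variable should fall back or fail loudly depends on the deployment.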
Channel-Specific Formatting
Different platforms need different formatting:
class ChannelFormatter:
    @staticmethod
    def format_for_channel(text: str, channel: str) -> dict:
        if channel == "slack":
            return {"text": text, "mrkdwn": True}
        elif channel == "telegram":
            return {"text": text, "parse_mode": "HTML"}
        elif channel == "whatsapp":
            # WhatsApp uses its own markup rather than standard Markdown:
            # bold is written with single asterisks, not double.
            text = text.replace("**", "*")  # bold
            text = text.replace("• ", "- ")  # bullets
            return {"text": text}
        return {"text": text}
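One detail the formatter above leaves open is escaping. When Telegram's `parse_mode` is `HTML`, any literal `<`, `>`, or `&` in the text has to be escaped, or the message may be rejected or rendered incorrectly. The following is a minimal sketch, assuming responses use `**bold**` as their only inline markup. The function names `to_telegram_html` and `to_whatsapp` are illustrative.

```python
import html
import re

BOLD = re.compile(r"\*\*(.+?)\*\*")  # paired double asterisks


def to_telegram_html(text: str) -> dict:
    """Escape the text first, then turn **bold** into <b>bold</b>."""
    escaped = html.escape(text, quote=False)
    return {"text": BOLD.sub(r"<b>\1</b>", escaped), "parse_mode": "HTML"}


def to_whatsapp(text: str) -> dict:
    """Turn paired **bold** markers into WhatsApp's single-asterisk bold."""
    return {"text": BOLD.sub(r"*\1*", text)}
```

Escaping before inserting tags matters: doing it in the other order would escape the `<b>` tags themselves. Matching paired markers with a regex also avoids rewriting a stray `**` that was never meant as bold.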
Retrieval-Based Generation
Candidate Scoring with Sentence Transformers
from sentence_transformers import SentenceTransformer, util
import torch


class RetrievalResponder:
    def __init__(self, responses: list[dict], model_name: str = "all-MiniLM-L6-v2"):
        self.encoder = SentenceTransformer(model_name)
        self.responses = responses
        self.response_texts = [r["text"] for r in responses]
        # Encode all candidate responses once, up front
        self.embeddings = self.encoder.encode(
            self.response_texts, convert_to_tensor=True
        )

    def get_response(self, context: str, top_k: int = 3) -> list[dict]:
        query_emb = self.encoder.encode(context, convert_to_tensor=True)
        scores = util.cos_sim(query_emb, self.embeddings)[0]
        top_indices = torch.topk(scores, k=min(top_k, len(self.responses)))
        results = []
        for score, idx in zip(top_indices.values, top_indices.indices):
            results.append({
                "text": self.responses[idx.item()]["text"],
                "score": score.item(),
                "metadata": self.responses[idx.item()].get("metadata", {}),
            })
        return results
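To make the scoring step concrete, here is a dependency-free sketch of what cosine similarity followed by a top-k selection computes. It uses hand-written two-dimensional vectors in place of real sentence embeddings. The vectors and the helper names `cosine` and `top_k` are illustrative assumptions, not part of sentence-transformers.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float],
          candidates: list[tuple[str, list[float]]],
          k: int = 3) -> list[tuple[float, str]]:
    """Score every candidate against the query and keep the k best."""
    scored = [(cosine(query, vec), text) for text, vec in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "embeddings": real ones would come from the sentence encoder.
candidates = [
    ("Happy to help with that!", [1.0, 0.0]),
    ("Sorry about the delay.", [0.0, 1.0]),
    ("Let me check that booking.", [1.0, 1.0]),
]
best = top_k([1.0, 0.1], candidates, k=2)
```

Here `best` holds the two candidates whose vectors point most nearly in the query's direction, highest score first.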
Hybrid Retrieval + Template
Use retrieval for the conversational part and templates for structured data:
class HybridResponder:
    def __init__(self, retrieval: RetrievalResponder, registry: TemplateRegistry):
        self.retrieval = retrieval
        self.registry = registry

    def respond(self, action: str, context: dict, conversation_text: str) -> str:
        # Structured data via template
        factual_part = self.registry.render(action, context)
        # Conversational wrapper via retrieval
        candidates = self.retrieval.get_response(conversation_text, top_k=1)
        if candidates and candidates[0]["score"] > 0.7:
            conversational_part = candidates[0]["text"]
            return f"{conversational_part}\n\n{factual_part}"
        # No sufficiently similar candidate: send the facts alone
        return factual_part
LLM-Based Generation
Grounded Generation Pattern
The key to reliable LLM responses is grounding — providing verified data and instructing the model to use only that data:
import openai


class GroundedGenerator:
    def __init__(self, model: str = "gpt-4o-mini"):
        self.model = model
        self.client = openai.OpenAI()

    def generate(self, action: str, structured_data: dict,
                 conversation_history: list[dict],
                 persona: str = "friendly customer service agent") -> str:
        system_prompt = f"""You are a {persona} for an airline.
Generate a natural response for the action: {action}
VERIFIED DATA (use ONLY these facts):
{self._format_data(structured_data)}
RULES:
- Include all verified data points in your response
- Do NOT invent any facts, numbers, or dates not in the verified data
- Keep the response concise (2-4 sentences)
- Match the conversation's tone
- Do not start with "I" """
        messages = [
            {"role": "system", "content": system_prompt},
            *conversation_history[-5:],  # Last 5 messages (not full turns) for context
            {"role": "user", "content": f"Generate response for: {action}"},
        ]
        response = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            temperature=0.7,
            max_tokens=200,
        )
        return response.choices[0].message.content

    def _format_data(self, data: dict) -> str:
        return "\n".join(f"- {k}: {v}" for k, v in data.items())
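Because the prompt is assembled before the API call, it can help to build the message list in a pure function and unit-test it without network access. The sketch below assumes a hypothetical `build_messages` helper that mirrors the structure of `GroundedGenerator.generate`. The helper names and the shortened rule list are illustrative.

```python
def format_data(data: dict) -> str:
    """Render verified facts as a bullet list for the system prompt."""
    return "\n".join(f"- {key}: {value}" for key, value in data.items())


def build_messages(action: str, structured_data: dict,
                   history: list[dict], max_messages: int = 5) -> list[dict]:
    """Assemble system prompt, recent history, and the generation request."""
    system_prompt = (
        f"Generate a natural response for the action: {action}\n"
        "VERIFIED DATA (use ONLY these facts):\n"
        f"{format_data(structured_data)}\n"
        "Do NOT invent any facts, numbers, or dates not in the verified data."
    )
    return [
        {"role": "system", "content": system_prompt},
        *history[-max_messages:],  # keep only the most recent messages
        {"role": "user", "content": f"Generate response for: {action}"},
    ]


# Seven alternating messages; only the last five should survive.
history = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"message {i}"}
    for i in range(7)
]
messages = build_messages("booking_confirmed", {"booking_id": "ABC123"}, history)
```

A test can then assert that every verified fact appears in the system message and that older history is dropped, independent of any model behaviour.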
Output Validation
Never send LLM output to users without validation:
import re


class ResponseValidator:
    def __init__(self, structured_data: dict):
        self.data = structured_data

    def validate(self, response: str) -> tuple[bool, list[str]]:
        issues = []
        # Check all critical data points are present
        for key in ["booking_id", "date", "destination"]:
            if key in self.data:
                value = str(self.data[key])
                if value.lower() not in response.lower():
                    issues.append(f"Missing critical data: {key}={value}")
        # Check for hallucinated numbers not in source data
        numbers_in_response = set(re.findall(r"\b\d+\b", response))
        numbers_in_data = set()
        for v in self.data.values():
            numbers_in_data.update(re.findall(r"\b\d+\b", str(v)))
        # Small counts such as "1" to "3" are allowed without a source
        hallucinated = numbers_in_response - numbers_in_data - {"1", "2", "3"}
        if hallucinated:
            issues.append(f"Potentially hallucinated numbers: {hallucinated}")
        # Check for banned phrases
        banned = ["as an ai", "i cannot", "i don't have access"]
        for phrase in banned:
            if phrase in response.lower():
                issues.append(f"Banned phrase detected: '{phrase}'")
        return len(issues) == 0, issues
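The number check is the least obvious part of the validator, so here it is isolated as a standalone function with a worked case. This is a sketch of the same set-difference idea. The function name `unsupported_numbers` and the sample booking data are made up for illustration.

```python
import re

NUMBER = re.compile(r"\b\d+\b")  # standalone digit runs only


def unsupported_numbers(response: str, data: dict,
                        allowed: frozenset[str] = frozenset({"1", "2", "3"})) -> set[str]:
    """Return numbers in the response that appear in no verified value."""
    in_response = set(NUMBER.findall(response))
    in_data = set()
    for value in data.values():
        in_data.update(NUMBER.findall(str(value)))
    return in_response - in_data - allowed


data = {"booking_id": "ABC123", "date": "2025-06-01", "gate": 14}
flagged = unsupported_numbers("Your flight on 2025-06-01 leaves from gate 41.", data)
clean = unsupported_numbers("Your flight on 2025-06-01 leaves from gate 14.", data)
```

The first response transposes the gate number, so `flagged` contains `"41"`, while `clean` is empty. Note that digits embedded in an identifier such as `ABC123` are not extracted by this pattern, because there is no word boundary between the letters and the digits. That is why the validator checks the full `booking_id` string separately.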
Fallback to Templates
When LLM generation fails validation, fall back to templates:
import logging

logger = logging.getLogger(__name__)


class SafeResponder:
    def __init__(self, llm: GroundedGenerator, templates: TemplateRegistry,
                 validator_class=ResponseValidator):
        self.llm = llm
        self.templates = templates
        self.validator_class = validator_class

    def respond(self, action: str, context: dict,
                history: list[dict], channel: str = "default") -> str:
        # Try LLM first
        try:
            llm_response = self.llm.generate(action, context, history)
            validator = self.validator_class(context)
            valid, issues = validator.validate(llm_response)
            if valid:
                return llm_response
            # Log issues for monitoring
            logger.warning("LLM response failed validation: %s", issues)
        except Exception as e:
            logger.error("LLM generation failed: %s", e)
        # Fallback to template
        return self.templates.render(action, context, channel)
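The control flow of the fallback can be exercised without an LLM by passing plain callables in place of the generator, validator, and template renderer. The sketch below assumes a hypothetical `respond_with_fallback` helper. It also returns which path produced the text, which is one way to feed a fallback-rate metric.

```python
from typing import Callable


def respond_with_fallback(generate: Callable[[], str],
                          validate: Callable[[str], bool],
                          fallback: Callable[[], str]) -> tuple[str, str]:
    """Return (text, method), where method is "llm" or "template"."""
    try:
        candidate = generate()
        if validate(candidate):
            return candidate, "llm"
    except Exception:
        # A real system would log the error here before falling back.
        pass
    return fallback(), "template"


def failing_generator() -> str:
    """Stand-in for an LLM call that times out."""
    raise TimeoutError("model did not answer in time")


def has_reference(text: str) -> bool:
    return "ABC123" in text


def template() -> str:
    return "Booking reference: ABC123"


ok = respond_with_fallback(lambda: "You're booked! Ref ABC123.", has_reference, template)
invalid = respond_with_fallback(lambda: "You're booked!", has_reference, template)
errored = respond_with_fallback(failing_generator, has_reference, template)
```

Both the invalid output and the raised exception end on the template path, so the user receives a usable message either way.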
Response Enrichment
Rich Messages
Beyond plain text, chatbots send buttons, cards, carousels, and quick replies:
from dataclasses import dataclass


@dataclass
class Button:
    title: str
    payload: str


@dataclass
class Card:
    title: str
    subtitle: str
    image_url: str | None = None
    buttons: list[Button] | None = None


@dataclass
class BotResponse:
    text: str | None = None
    buttons: list[Button] | None = None
    cards: list[Card] | None = None
    quick_replies: list[str] | None = None

    def to_channel_format(self, channel: str) -> dict:
        if channel == "slack":
            return self._to_slack()
        elif channel == "telegram":
            return self._to_telegram()
        return {"text": self.text}

    def _to_slack(self) -> dict:
        blocks = []
        if self.text:
            blocks.append({"type": "section", "text": {"type": "mrkdwn", "text": self.text}})
        if self.buttons:
            blocks.append({
                "type": "actions",
                "elements": [
                    {"type": "button", "text": {"type": "plain_text", "text": b.title},
                     "value": b.payload}
                    for b in self.buttons
                ],
            })
        return {"blocks": blocks}

    def _to_telegram(self) -> dict:
        # Minimal sketch (not in the original listing): text plus an
        # inline keyboard with one button per row.
        payload = {"text": self.text or ""}
        if self.buttons:
            payload["reply_markup"] = {
                "inline_keyboard": [
                    [{"text": b.title, "callback_data": b.payload}]
                    for b in self.buttons
                ]
            }
        return payload
Monitoring Response Quality
Track these metrics in production:
- Response length distribution — Sudden changes indicate template or prompt issues
- Fallback rate — How often the LLM fails validation and templates take over
- User satisfaction signals — Thumbs up/down, conversation completion rates
- Generation latency — P50 and P99 for LLM responses vs. templates
import time
from dataclasses import dataclass


@dataclass
class ResponseMetrics:
    action: str
    generation_method: str  # "template", "retrieval", "llm"
    latency_ms: float
    was_validated: bool
    passed_validation: bool
    response_length: int
    channel: str
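Two of the metrics listed above, latency percentiles and fallback rate, can be aggregated from collected records with the standard library alone. The sketch below is one possible aggregation. The function names and the choice of the `inclusive` quantile method are assumptions, and a production system would more likely export raw values to a metrics backend.

```python
import statistics


def latency_percentiles(latencies_ms: list[float]) -> dict[str, float]:
    """P50 and P99 from raw latencies (needs at least two samples)."""
    # n=100 yields 99 cut points; index 49 is P50 and index 98 is P99.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p99": cuts[98]}


def fallback_rate(records: list[tuple[bool, bool]]) -> float:
    """Share of validated responses that failed validation.

    Each record is (was_validated, passed_validation).
    """
    validated = [passed for was_validated, passed in records if was_validated]
    if not validated:
        return 0.0
    return validated.count(False) / len(validated)


# 101 synthetic samples: 1 ms, 2 ms, ..., 101 ms.
summary = latency_percentiles([float(ms) for ms in range(1, 102)])
rate = fallback_rate([(True, True), (True, False), (False, False), (True, False)])
```

Responses that were never validated, such as template output, are excluded from the denominator so that the rate reflects only the LLM path.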
Performance Comparison
| Method | Latency | Cost/Message | Quality | Safety |
|---|---|---|---|---|
| Template | <1ms | $0 | Predictable | Very High |
| Retrieval | 5-15ms | $0 | Natural | High |
| LLM (GPT-4o-mini) | 200-500ms | $0.0001-0.001 | Very Natural | Medium |
| LLM (GPT-4o) | 500-2000ms | $0.005-0.02 | Best | Medium |
The production pattern: templates for transactional messages (confirmations, errors, data readouts), retrieval for FAQ and common scenarios, LLM for open-ended conversation and tone adaptation — with validation and template fallback as safety nets.
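The routing described above can be sketched as a small lookup that decides which generator handles an action. The action names and the two sets below are hypothetical. A real system would derive them from its own dialog policy.

```python
# Hypothetical action names, grouped by how their responses are produced.
TRANSACTIONAL_ACTIONS = {"booking_confirmed", "payment_failed", "flight_status"}
FAQ_ACTIONS = {"baggage_policy", "check_in_times"}


def choose_method(action: str) -> str:
    """Pick a generation method for an action name."""
    if action in TRANSACTIONAL_ACTIONS:
        return "template"   # exact data readouts: fast and predictable
    if action in FAQ_ACTIONS:
        return "retrieval"  # pre-written answers to common questions
    return "llm"            # open-ended; still needs validation and fallback
```

Defaulting unknown actions to the LLM path is a design choice. Defaulting to a safe template instead would trade flexibility for predictability.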
The one thing to remember: Production response generation layers templates (for safety and speed), retrieval (for natural pre-written answers), and LLM generation (for flexibility) — with output validation as the mandatory safety net before any generated text reaches the user.
See Also
- Python Chatbot Architecture: Discover how Python chatbots are built from simple building blocks that listen, think, and reply, like a friendly robot pen-pal.
- Python Conversation Memory: Discover how chatbots remember what you said five minutes ago, and why some forget everything the moment you close the window.
- Python Dialog Management: See how chatbots remember where they are in a conversation, like a waiter who never forgets your order.
- Python Intent Classification: Find out how chatbots figure out what you actually want when you type a message, even if you say it in a weird way.
- Python Rasa Framework: Meet Rasa, the free toolkit that lets anyone build a chatbot that actually understands conversations, not just keywords.