Python Log Aggregation with ELK — Core Concepts

Ship structured Python logs to the ELK stack with Filebeat, then search and visualize them in Kibana.

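The ELK stack (Elasticsearch, Logstash, Kibana) is one of the most widely deployed open-source log aggregation platforms. For Python services, the pipeline is: structured JSON logs → log shipper → Elasticsearch → Kibana dashboards and searches. Logstash is optional in this setup: Filebeat can ship directly to Elasticsearch, which is the approach shown here.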

The pipeline

Python App → JSON logs (stdout or file)
           → Filebeat (collects & ships)
           → Elasticsearch (stores & indexes)
           → Kibana (search & visualize)

Why structured (JSON) logs?

Unstructured log lines like 2024-03-15 ERROR Payment failed for order 123 require parsing with regex to extract fields. Structured logs like {"timestamp": "2024-03-15", "level": "ERROR", "message": "Payment failed", "order_id": 123} are directly indexable by Elasticsearch.
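To make the difference concrete, the sketch below extracts the same fields from both example lines. The regex and the helper names are illustrative assumptions, not part of any library.

```python
import json
import re

# Hypothetical pattern for the unstructured example line. It is tied to one
# message wording and breaks as soon as that wording changes.
UNSTRUCTURED = re.compile(
    r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<message>.*?) for order (?P<order_id>\d+)$"
)


def parse_unstructured(line: str) -> dict:
    """Extract fields with a regex tailored to one message format."""
    match = UNSTRUCTURED.match(line)
    if match is None:
        raise ValueError(f"line does not match expected format: {line!r}")
    fields = match.groupdict()
    fields["order_id"] = int(fields["order_id"])  # regex groups are always strings
    return fields


def parse_structured(line: str) -> dict:
    """A JSON line needs no format-specific parsing."""
    return json.loads(line)


plain = "2024-03-15 ERROR Payment failed for order 123"
structured = (
    '{"timestamp": "2024-03-15", "level": "ERROR", '
    '"message": "Payment failed", "order_id": 123}'
)

print(parse_unstructured(plain))
print(parse_structured(structured))
```

Both calls return the same dictionary, but only the regex version needs a new pattern for every new message format.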

Producing structured logs in Python

With python-json-logger

import logging
from pythonjsonlogger import jsonlogger

handler = logging.StreamHandler()
formatter = jsonlogger.JsonFormatter(
    "%(asctime)s %(name)s %(levelname)s %(message)s",
    rename_fields={"asctime": "timestamp", "levelname": "level"}
)
handler.setFormatter(formatter)

logger = logging.getLogger("myapp")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("Order created", extra={"order_id": "ORD-123", "amount": 99.99})

Output:

{"timestamp": "2024-03-15 10:30:00,123", "name": "myapp", "level": "INFO", "message": "Order created", "order_id": "ORD-123", "amount": 99.99}
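If adding a dependency is not an option, roughly the same output can be produced with the standard library alone. The following is a minimal sketch, not a replacement for python-json-logger: the class name MinimalJsonFormatter and the logger name myapp.demo are hypothetical, and the formatter only handles the fields shown above plus anything passed through extra.

```python
import io
import json
import logging
from datetime import datetime, timezone

# Attributes present on every LogRecord; anything else arrived via `extra`.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None))) | {
    "message",
    "asctime",
}


class MinimalJsonFormatter(logging.Formatter):
    """Hypothetical formatter that renders each record as one JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            # ISO 8601 in UTC is unambiguous across hosts and time zones.
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "name": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Copy fields passed through `extra=` without overwriting the core ones.
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS and key not in payload:
                payload[key] = value
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        # default=str keeps non-serializable values from raising.
        return json.dumps(payload, default=str)


stream = io.StringIO()  # stand-in for stdout so the output can be inspected
handler = logging.StreamHandler(stream)
handler.setFormatter(MinimalJsonFormatter())

demo_logger = logging.getLogger("myapp.demo")
demo_logger.addHandler(handler)
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False

demo_logger.info("Order created", extra={"order_id": "ORD-123", "amount": 99.99})
print(stream.getvalue(), end="")
```

In a real service the handler would write to sys.stdout or a file rather than an in-memory buffer.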

With structlog

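import structlog

structlog.configure(
    processors=[
        structlog.processors.add_log_level,           # adds "level": "info"
        structlog.processors.TimeStamper(fmt="iso"),  # adds "timestamp" in ISO 8601
        structlog.processors.JSONRenderer(),          # renders the event dict as one JSON line
    ]
)

logger = structlog.get_logger()
# structlog stores the message under the "event" key, not "message"
logger.info("order_created", order_id="ORD-123", amount=99.99)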

With Loguru

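from loguru import logger

# serialize=True writes each record as a JSON object with "text" and "record" keys;
# bound fields such as order_id appear under record.extra, not at the top level
logger.add("app.log", serialize=True)
logger.bind(order_id="ORD-123").info("Order created")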

Shipping logs with Filebeat

Filebeat is the lightweight log shipper from Elastic. It watches log files and sends new lines to Elasticsearch:

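# filebeat.yml
filebeat.inputs:
  - type: log                      # deprecated in recent Filebeat releases in favor of filestream
    paths:
      - /var/log/myapp/*.log
    json.keys_under_root: true     # place decoded JSON fields at the top level of the event
    json.add_error_key: true       # add an error key when JSON decoding fails
    json.message_key: message      # which field is the message

output.elasticsearch:
  hosts: ["elasticsearch:9200"]
  index: "python-logs-%{+yyyy.MM.dd}"

# A custom index name also requires a matching template name and pattern.
# Where ILM is enabled by default, the index setting may be ignored unless
# setup.ilm.enabled is set to false.
setup.template.name: "python-logs"
setup.template.pattern: "python-logs-*"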

json.keys_under_root: true is critical — without it, all JSON fields are nested under a json key, making Kibana queries clunky.
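The effect of this option on the shape of the indexed document can be modeled in a few lines. This is a simplified imitation of the documented behavior of the json.* options, written for illustration; it is not Filebeat's implementation, and the function name shape_event is hypothetical.

```python
import json


def shape_event(line: str, keys_under_root: bool, add_error_key: bool = True) -> dict:
    """Simplified model of where decoded JSON fields land in the shipped event."""
    try:
        decoded = json.loads(line)
        if not isinstance(decoded, dict):
            raise ValueError("log line is valid JSON but not an object")
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        event = {"message": line}  # keep the raw line so nothing is lost
        if add_error_key:
            event["error"] = {"type": "json", "message": str(exc)}
        return event

    if keys_under_root:
        return dict(decoded)      # queried as level, order_id
    return {"json": decoded}      # queried as json.level, json.order_id


line = '{"level": "ERROR", "message": "Payment failed", "order_id": 123}'

flat = shape_event(line, keys_under_root=True)
nested = shape_event(line, keys_under_root=False)
broken = shape_event("Traceback (most recent call last):", keys_under_root=True)

print(flat["level"])            # top-level field
print(nested["json"]["level"])  # same value, one level down
print(broken["error"]["type"])  # decoding failure is flagged, not dropped
```

The nested form is why a query would need json.level instead of level when the option is left off.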

Container environments (Docker/Kubernetes)

In containerized deployments, apps log to stdout. The container runtime writes that stream to log files on the node, and Filebeat collects from those files:

filebeat.autodiscover:
  providers:
    - type: kubernetes
      hints.enabled: true
      templates:
        - condition:
            contains:
              kubernetes.labels.app: "python-api"
          config:
            - type: container
              paths:
                - /var/log/containers/*-${data.kubernetes.container.id}.log
              json.keys_under_root: true
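It helps to know what those container log files contain. With CRI runtimes such as containerd, each stdout line is stored with a prefix of timestamp, stream name, and a tag (F for a full line, P for a partial one), followed by the application's own output. The container input strips this wrapper before JSON decoding. The sketch below shows the idea, assuming CRI-format logs (Docker's json-file driver uses a different wrapper); the names CriLine and parse_cri_line are hypothetical.

```python
import json
from typing import NamedTuple


class CriLine(NamedTuple):
    timestamp: str
    stream: str     # "stdout" or "stderr"
    is_full: bool   # False when the runtime split a long line into parts
    content: str    # what the application actually wrote


def parse_cri_line(raw: str) -> CriLine:
    """Split one CRI-format log line into its wrapper fields and payload."""
    # maxsplit=3 keeps spaces inside the application's output intact.
    timestamp, stream, tag, content = raw.rstrip("\n").split(" ", 3)
    return CriLine(timestamp, stream, tag == "F", content)


raw = (
    "2024-03-15T10:30:00.123456789Z stdout F "
    '{"level": "INFO", "message": "Order created", "order_id": "ORD-123"}'
)

cri = parse_cri_line(raw)
app_event = json.loads(cri.content)  # the structured log the Python app emitted
print(cri.stream, app_event["order_id"])
```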

Searching in Kibana

KQL (Kibana Query Language)

What you want	KQL query
All errors	`level: "ERROR"`
Specific order	`order_id: "ORD-123"`
Errors in payment service	`level: "ERROR" and service: "payments"`
Requests slower than 1s (duration in ms)	`duration > 1000`
Full-text search	`"timeout connecting to database"`
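The same searches can be run outside Kibana through Elasticsearch's _search API. The sketch below builds a request body roughly equivalent to combining the error, service, and slow-request queries from the table. It assumes that level and service are mapped as keyword fields and that duration is a number in milliseconds; with default dynamic mappings the term filters would need level.keyword and service.keyword instead. The function name build_search_body is hypothetical.

```python
import json
from typing import Optional


def build_search_body(
    level: str,
    service: str,
    min_duration_ms: Optional[int] = None,
    size: int = 50,
) -> dict:
    """Build a Query DSL body that filters on exact field values."""
    filters = [
        {"term": {"level": level}},      # assumes a keyword mapping
        {"term": {"service": service}},  # assumes a keyword mapping
    ]
    if min_duration_ms is not None:
        filters.append({"range": {"duration": {"gt": min_duration_ms}}})
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],  # newest first
        "query": {"bool": {"filter": filters}},
    }


body = build_search_body("ERROR", "payments", min_duration_ms=1000)
print(json.dumps(body, indent=2))
```

The resulting body can be sent as a POST request to /python-logs-*/_search with any HTTP client.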

Discover view

The most common workflow: open Discover, select the time range, type a query, and scan through matching log lines. Click any field to filter by its value.

Dashboards

Build Kibana dashboards with:

Log volume over time — bar chart of events per minute
Error breakdown — pie chart by error type
Top endpoints — table sorted by request count
Latency trends — line chart from log duration fields
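Each of these panels corresponds to an Elasticsearch aggregation. As a rough sketch of what Kibana requests behind the scenes, the body below covers all four in one search. The field names error_type, endpoint, and duration are assumptions about what the application logs, and error_type and endpoint are assumed to be keyword fields; the function name is hypothetical.

```python
import json


def build_dashboard_aggs(since: str = "now-1h") -> dict:
    """Build one search body with an aggregation per dashboard panel."""
    per_minute = {"field": "@timestamp", "fixed_interval": "1m"}
    return {
        "size": 0,  # only aggregation results are needed, not the log lines
        "query": {"range": {"@timestamp": {"gte": since}}},
        "aggs": {
            # Log volume over time: events per minute
            "events_per_minute": {"date_histogram": per_minute},
            # Error breakdown: error events grouped by type
            "errors_by_type": {
                "filter": {"term": {"level": "ERROR"}},
                "aggs": {"types": {"terms": {"field": "error_type", "size": 10}}},
            },
            # Top endpoints: terms are ordered by document count by default
            "top_endpoints": {"terms": {"field": "endpoint", "size": 10}},
            # Latency trends: average duration per minute
            "latency_over_time": {
                "date_histogram": per_minute,
                "aggs": {"avg_duration": {"avg": {"field": "duration"}}},
            },
        },
    }


dashboard_body = build_dashboard_aggs()
print(json.dumps(dashboard_body, indent=2))
```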

Index patterns and lifecycle

With the date pattern in the index name configured above, Filebeat writes to a new index each day (for example python-logs-2024.03.15). Without a retention policy, these indices accumulate and consume significant disk space.

Index Lifecycle Management (ILM) automates retention by moving each index through phases as it ages. An example policy:

Phase	Index age	Action
Hot	0 to 7 days	Full search, fast storage
Warm	7 to 30 days	Reduced replicas, slower storage
Cold	30 to 90 days	Rarely searched, minimal resources
Delete	After 90 days	Automatically removed
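A policy like this is defined as a JSON document. The sketch below builds one, reading the table as age thresholds (warm after 7 days, cold after 30, delete after 90). The replica count, the priorities, and the function name are illustrative assumptions, and moving data to slower storage would additionally depend on how the cluster's nodes are tiered.

```python
import json


def build_ilm_policy(
    warm_after: str = "7d",
    cold_after: str = "30d",
    delete_after: str = "90d",
    warm_replicas: int = 0,  # assumption: "reduced replicas" means none
) -> dict:
    """Build an ILM policy body; min_age is measured from index creation."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "min_age": "0ms",
                    "actions": {"set_priority": {"priority": 100}},
                },
                "warm": {
                    "min_age": warm_after,
                    "actions": {
                        "allocate": {"number_of_replicas": warm_replicas},
                        "set_priority": {"priority": 50},
                    },
                },
                "cold": {
                    "min_age": cold_after,
                    "actions": {"set_priority": {"priority": 0}},
                },
                "delete": {
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


policy = build_ilm_policy()
print(json.dumps(policy, indent=2))
```

The body is sent with PUT to _ilm/policy/<policy-name>, and indices pick the policy up through the index.lifecycle.name setting in their index template.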

ELK alternatives

The ELK stack has competition:

Stack	Difference
EFK (Fluentd)	Replaces Logstash with Fluentd — lighter, Kubernetes-native
Grafana Loki	Doesn’t index log content — indexes labels only. Much cheaper storage.
Datadog Logs	Commercial SaaS, no infrastructure to manage
OpenSearch	Fork of Elasticsearch started by AWS, Apache 2.0 licensed

Loki is increasingly popular for Python teams already using Grafana for metrics — it avoids running a separate Elasticsearch cluster.

Common misconception

“ELK replaces application logging.” ELK is an aggregation and search layer — your Python app still needs to produce good, structured logs. Garbage in, garbage out. If your log messages are vague and unstructured, ELK will just make it easier to search through vague, unstructured data.
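As an illustration of the difference, the sketch below attaches service-level context to every record with a standard-library logging filter and contrasts a vague message with a specific one. The class names, the payments service, and the field values are hypothetical.

```python
import logging


class ServiceContextFilter(logging.Filter):
    """Attach fields that every record from this service should carry."""

    def __init__(self, service: str, environment: str) -> None:
        super().__init__()
        self.service = service
        self.environment = environment

    def filter(self, record: logging.LogRecord) -> bool:
        record.service = self.service
        record.environment = self.environment
        return True  # never drop the record, only enrich it


class ListHandler(logging.Handler):
    """Collect records in memory so the example can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


captured = ListHandler()
payments_logger = logging.getLogger("payments.demo")
payments_logger.addFilter(ServiceContextFilter("payments", "production"))
payments_logger.addHandler(captured)
payments_logger.setLevel(logging.INFO)
payments_logger.propagate = False

# Vague: searchable only by free text, nothing to filter or aggregate on.
payments_logger.error("Something went wrong")

# Specific: every extra key becomes a field that can be queried and charted.
payments_logger.error(
    "Payment failed",
    extra={"order_id": "ORD-123", "duration": 1840, "error_type": "gateway_timeout"},
)

for record in captured.records:
    print(record.service, record.getMessage(), getattr(record, "order_id", None))
```

Combined with a JSON formatter, the second call yields a document that can be found by service, order_id, or error_type, while the first can only be found by guessing its wording.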

One thing to remember: The value of ELK is centralized search. When an incident hits, you search one place instead of SSH-ing into ten servers. The investment is in structured logging on the Python side and proper index management on the Elasticsearch side.
