Python Log Aggregation with ELK — Core Concepts

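The ELK stack (Elasticsearch, Logstash, Kibana) is one of the most widely deployed open-source log aggregation platforms. For Python services, the pipeline is: structured JSON logs → log shipper → Elasticsearch → Kibana dashboards and searches. This article uses Filebeat as the shipper; Logstash is optional and sits between the shipper and Elasticsearch when logs need heavier transformation.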

The pipeline

Python App → JSON logs (stdout or file)
           → Filebeat (collects & ships)
           → Elasticsearch (stores & indexes)
           → Kibana (search & visualize)

Why structured (JSON) logs?

Unstructured log lines like 2024-03-15 ERROR Payment failed for order 123 require parsing with regex to extract fields. Structured logs like {"timestamp": "2024-03-15", "level": "ERROR", "message": "Payment failed", "order_id": 123} are directly indexable by Elasticsearch.
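
To make the difference concrete, here is a minimal sketch using only the standard library. The regular expression and the helper names (parse_plain, parse_structured) are illustrative assumptions, not part of any ELK component; the point is that the unstructured line needs a hand-written pattern and a type conversion, while the JSON line does not.

```python
import json
import re

# Hypothetical pattern for the unstructured example line; every new
# message format would need its own pattern.
PLAIN_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}) (?P<level>[A-Z]+) "
    r"(?P<message>.+?) for order (?P<order_id>\d+)$"
)


def parse_plain(line: str) -> dict:
    """Extract fields from an unstructured line with a regex."""
    match = PLAIN_PATTERN.match(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    fields = match.groupdict()
    fields["order_id"] = int(fields["order_id"])  # regex groups are always strings
    return fields


def parse_structured(line: str) -> dict:
    """A JSON log line needs no pattern, and value types are preserved."""
    return json.loads(line)


plain = "2024-03-15 ERROR Payment failed for order 123"
structured = (
    '{"timestamp": "2024-03-15", "level": "ERROR", '
    '"message": "Payment failed", "order_id": 123}'
)

# Both routes yield the same fields, but only one needed a custom regex.
print(parse_plain(plain) == parse_structured(structured))
```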

Producing structured logs in Python

With python-json-logger

import logging
from pythonjsonlogger import jsonlogger

handler = logging.StreamHandler()
formatter = jsonlogger.JsonFormatter(
    "%(asctime)s %(name)s %(levelname)s %(message)s",
    rename_fields={"asctime": "timestamp", "levelname": "level"}
)
handler.setFormatter(formatter)

logger = logging.getLogger("myapp")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("Order created", extra={"order_id": "ORD-123", "amount": 99.99})

Output (key order can vary by library version; the default asctime format includes milliseconds):

{"timestamp": "2024-03-15 10:30:00,123", "name": "myapp", "level": "INFO", "message": "Order created", "order_id": "ORD-123", "amount": 99.99}

With structlog

import structlog

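structlog.configure(
    processors=[
        structlog.processors.add_log_level,           # adds "level" to every event
        structlog.processors.TimeStamper(fmt="iso"),  # adds an ISO 8601 "timestamp"
        structlog.processors.JSONRenderer(),          # renders the event dict as JSON
    ]
)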

logger = structlog.get_logger()
logger.info("order_created", order_id="ORD-123", amount=99.99)

With Loguru

from loguru import logger

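# serialize=True writes each record as one JSON object per line.
# Bound fields such as order_id end up under record.extra in that object.
logger.add("app.log", serialize=True)
logger.bind(order_id="ORD-123").info("Order created")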

Shipping logs with Filebeat

Filebeat is the lightweight log shipper from Elastic. It watches log files and sends new lines to Elasticsearch:

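# filebeat.yml
filebeat.inputs:
  - type: log                      # deprecated in recent Filebeat releases in favor of filestream
    paths:
      - /var/log/myapp/*.log
    json.keys_under_root: true     # place decoded JSON fields at the top level
    json.add_error_key: true       # flag parsing errors
    json.message_key: message      # which field is the message

output.elasticsearch:
  hosts: ["elasticsearch:9200"]
  index: "python-logs-%{+yyyy.MM.dd}"

# A custom index name also requires matching template settings.
setup.template.name: "python-logs"
setup.template.pattern: "python-logs-*"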

json.keys_under_root: true is critical — without it, all JSON fields are nested under a json key, making Kibana queries clunky.
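
The effect of this setting can be pictured with a small Python model. This is a simplified sketch for illustration, not Filebeat's actual implementation; the to_event helper is hypothetical, and real events carry additional metadata fields.

```python
import json


def to_event(line: str, keys_under_root: bool) -> dict:
    """Approximate the shape of the document that ends up in Elasticsearch."""
    decoded = json.loads(line)
    if keys_under_root:
        return dict(decoded)      # fields sit at the top level
    return {"json": decoded}      # fields are nested under a "json" key


line = '{"level": "ERROR", "order_id": "ORD-123", "message": "Payment failed"}'

nested = to_event(line, keys_under_root=False)
flat = to_event(line, keys_under_root=True)

print(nested["json"]["level"])  # would be queried in Kibana as json.level
print(flat["level"])            # would be queried in Kibana as level
```

With the nested shape, every query and dashboard has to use the json. prefix, which is the clunkiness described above.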

Container environments (Docker/Kubernetes)

In containerized deployments, apps log to stdout. Filebeat collects from container log files:

filebeat.autodiscover:
  providers:
    - type: kubernetes
      hints.enabled: true
      templates:
        - condition:
            contains:
              kubernetes.labels.app: "python-api"
          config:
            - type: container
              paths:
                - /var/log/containers/*-${data.kubernetes.container.id}.log
              json.keys_under_root: true
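
On the application side, logging to stdout as JSON can be done with the standard library alone. The sketch below is one possible approach, not a required setup; the StdoutJsonFormatter class and build_logger helper are hypothetical names, and the "python-api" logger name simply mirrors the label used in the Filebeat condition.

```python
import json
import logging
import sys
from datetime import datetime, timezone

# Attributes present on every LogRecord; anything else came from `extra=`.
_STANDARD_ATTRS = set(logging.makeLogRecord({}).__dict__) | {"message", "asctime"}


class StdoutJsonFormatter(logging.Formatter):
    """Hypothetical formatter that emits one JSON object per log line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "name": record.name,
            "message": record.getMessage(),
        }
        # Copy fields passed via `extra=` into the top-level object.
        for key, value in record.__dict__.items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        # default=str keeps non-JSON types (e.g. Decimal) from raising.
        return json.dumps(payload, default=str)


def build_logger(name: str, stream=sys.stdout) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(StdoutJsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers.clear()       # avoid duplicate handlers on repeated calls
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False      # keep the root logger from printing a second copy
    return logger


logger = build_logger("python-api")
logger.info("Order created", extra={"order_id": "ORD-123", "amount": 99.99})
```

Because each record is a single line, the container runtime's log file stays line-delimited, which is the shape the json.* options in the Filebeat template expect.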

Searching in Kibana

KQL (Kibana Query Language)

What you want                               KQL query
All errors                                  level: "ERROR"
Specific order                              order_id: "ORD-123"
Errors in payment service                   level: "ERROR" and service: "payments"
Requests slower than 1s (duration in ms)    duration > 1000
Full-text search                            "timeout connecting to database"
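
KQL is typed into the Kibana search bar. When the same filters are needed from a script, they can be expressed as an Elasticsearch query DSL body. The sketch below only builds that body and does not contact a cluster. The build_log_query helper is hypothetical, the field names come from the examples above, and it assumes duration is stored in milliseconds and that full-text search should target the message field.

```python
import json
from typing import Optional


def build_log_query(
    level: Optional[str] = None,
    service: Optional[str] = None,
    min_duration_ms: Optional[int] = None,
    phrase: Optional[str] = None,
) -> dict:
    """Build a search body roughly equivalent to the KQL examples."""
    filters = []
    if level is not None:
        filters.append({"match": {"level": level}})
    if service is not None:
        filters.append({"match": {"service": service}})
    if min_duration_ms is not None:
        # Same meaning as the KQL range query `duration > 1000`.
        filters.append({"range": {"duration": {"gt": min_duration_ms}}})

    must = []
    if phrase is not None:
        # Assumption: the phrase is searched in the "message" field only.
        must.append({"match_phrase": {"message": phrase}})

    return {"query": {"bool": {"filter": filters, "must": must}}}


# Equivalent of: level: "ERROR" and service: "payments"
body = build_log_query(level="ERROR", service="payments")
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the body of a search request against the python-logs-* indices with whichever HTTP or Elasticsearch client the project already uses.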

Discover view

The most common workflow: open Discover, select the time range, type a query, and scan through matching log lines. Click any field to filter by its value.

Dashboards

Build Kibana dashboards with:

  • Log volume over time — bar chart of events per minute
  • Error breakdown — pie chart by error type
  • Top endpoints — table sorted by request count
  • Latency trends — line chart from log duration fields
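
Each of these panels corresponds to an Elasticsearch aggregation. The following sketch shows roughly what such a request body could look like; it is not what Kibana generates verbatim. The dashboard_aggregations helper is hypothetical, the error_type and endpoint field names are assumptions (the source does not name them), and the terms aggregations assume those fields are mapped as keyword.

```python
import json


def dashboard_aggregations(
    timestamp_field: str = "@timestamp",
    error_type_field: str = "error_type",  # hypothetical field name
    endpoint_field: str = "endpoint",      # hypothetical field name
    duration_field: str = "duration",
) -> dict:
    """Build one search body containing an aggregation per dashboard panel."""
    return {
        "size": 0,  # return aggregation buckets only, no individual hits
        "aggs": {
            # Log volume over time: one bucket per minute.
            "log_volume": {
                "date_histogram": {"field": timestamp_field, "fixed_interval": "1m"}
            },
            # Error breakdown: document count per error type.
            "error_breakdown": {"terms": {"field": error_type_field}},
            # Top endpoints: terms buckets are ordered by count by default.
            "top_endpoints": {"terms": {"field": endpoint_field, "size": 10}},
            # Latency trends: average duration within each one-minute bucket.
            "latency_trend": {
                "date_histogram": {"field": timestamp_field, "fixed_interval": "1m"},
                "aggs": {"avg_duration": {"avg": {"field": duration_field}}},
            },
        },
    }


print(json.dumps(dashboard_aggregations(), indent=2))
```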

Index patterns and lifecycle

With the date-based index name configured in Filebeat, a new index is created each day (for example, python-logs-2024.03.15). Without a retention policy, these indices accumulate and consume significant disk space.

Index Lifecycle Management (ILM) automates retention:

Phase     Index age        Action
Hot       0 to 7 days      Full search, fast storage
Warm      7 to 30 days     Reduced replicas, slower storage
Cold      30 to 90 days    Rarely searched, minimal resources
Delete    after 90 days    Automatically removed
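
As a rough sketch, the phases above could be written as an ILM policy body like the one below, expressed here as a Python dictionary. The thresholds assume the durations are cumulative index ages (warm at 7 days, cold at 30, delete at 90). The chosen actions are illustrative; moving data to slower storage depends on cluster-specific node attributes and is left out. The phase_for_age helper is hypothetical and only approximates how ILM would classify an index.

```python
import json

# Sketch of an ILM policy body. Ages are measured from index creation
# (or from rollover, if the hot phase uses a rollover action).
ILM_POLICY = {
    "policy": {
        "phases": {
            "hot": {"min_age": "0ms", "actions": {"set_priority": {"priority": 100}}},
            "warm": {"min_age": "7d", "actions": {"allocate": {"number_of_replicas": 0}}},
            "cold": {"min_age": "30d", "actions": {"set_priority": {"priority": 0}}},
            "delete": {"min_age": "90d", "actions": {"delete": {}}},
        }
    }
}


def phase_for_age(age_days: int) -> str:
    """Return the phase an index of the given age falls into under this policy."""
    current = "hot"
    for name, phase in ILM_POLICY["policy"]["phases"].items():
        min_age = phase["min_age"]
        # Only day-based thresholds are used here; "0ms" counts as zero days.
        threshold = int(min_age[:-1]) if min_age.endswith("d") else 0
        if age_days >= threshold:
            current = name
    return current


print(json.dumps(ILM_POLICY, indent=2))
print(phase_for_age(10))
```

The dictionary is the kind of body accepted by the ILM policy endpoint (PUT _ilm/policy/ followed by a policy name of your choosing); the policy then has to be referenced from the index template so new daily indices pick it up.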

ELK alternatives

The ELK stack has competition:

Stack            Difference
EFK (Fluentd)    Replaces Logstash with Fluentd: lighter, Kubernetes-native
Grafana Loki     Indexes labels only, not log content, so storage is much cheaper
Datadog Logs     Commercial SaaS, no infrastructure to manage
OpenSearch       AWS-led fork of Elasticsearch, fully open source (Apache 2.0)

Loki is increasingly popular for Python teams already using Grafana for metrics — it avoids running a separate Elasticsearch cluster.

Common misconception

“ELK replaces application logging.” ELK is an aggregation and search layer — your Python app still needs to produce good, structured logs. Garbage in, garbage out. If your log messages are vague and unstructured, ELK will just make it easier to search through vague, unstructured data.

One thing to remember: The value of ELK is centralized search. When an incident hits, you search one place instead of SSH-ing into ten servers. The investment is in structured logging on the Python side and proper index management on the Elasticsearch side.

