Community Detection with Python — Core Concepts

Understand modularity optimization, the Louvain and Leiden algorithms, and how to evaluate community structure in networks using Python.

What defines a community

A community (or cluster) in a network is a group of nodes that are more densely connected to each other than to the rest of the network. There’s no single universal definition — different algorithms use different criteria — but the intuition is consistent: dense internal connections, sparse external connections.

Modularity: the quality metric

Modularity (Q) measures how good a community partition is. It compares the number of edges within communities to the expected number if edges were placed randomly:

Q = 0 means the partition is no better than random.
Q > 0.3 typically indicates meaningful community structure.
Q can theoretically reach 1.0 (all edges within communities, none between).

Most community detection algorithms try to maximize modularity. However, modularity has a resolution limit — it can’t detect communities smaller than √(2m) where m is the total number of edges. Very large networks may have important small communities that modularity-based methods miss.

Key algorithms

Louvain algorithm

The most popular community detection method. It works in two phases that repeat:

Local moves — Each node is assigned to the neighbor’s community that gives the biggest modularity increase.
Aggregation — Communities become single nodes, and the process repeats on the compressed graph.

Louvain is fast — O(n log n) in practice — and scales to millions of nodes. Its main weakness: it can produce poorly connected communities (nodes within a community might not all be reachable from each other through the community).

Leiden algorithm

An improvement over Louvain that guarantees well-connected communities. It adds a refinement phase between the local moves and aggregation, ensuring every community is internally connected. Leiden produces higher-quality partitions with similar speed.

Label propagation

Each node starts with its own label. In each iteration, nodes adopt the most common label among their neighbors. The algorithm converges when no node wants to change its label. It’s extremely fast but non-deterministic — different runs may produce different results.

Girvan-Newman

Instead of building communities up, this method removes edges. It repeatedly removes the edge with the highest betweenness centrality (the edge that lies on the most shortest paths), splitting the network into smaller pieces. It’s elegant but slow — O(m²n) — making it impractical for large graphs.

Choosing an algorithm

Need	Algorithm	Why
Large graph, good quality	Leiden	Fast + well-connected communities
Quick baseline	Louvain	Fastest, widely available
Very large graph, speed matters	Label Propagation	Near-linear time
Hierarchical decomposition	Girvan-Newman	Provides dendrogram
Overlapping communities	DEMON, BigClam	Nodes can belong to multiple groups

Evaluation without ground truth

In most real-world cases, you don’t know the “correct” communities. Evaluation options:

Modularity — Higher is better, but compare across similar graph sizes.
Coverage — Fraction of edges that fall within communities. High coverage means communities capture most connections.
Conductance — For each community, the ratio of edges leaving the community to total edges touching community members. Low conductance means tight communities.
Stability — Run the algorithm multiple times. If results vary wildly, the community structure may be weak.

When you have ground truth

If you know the real groups (labeled data), compare with:

Normalized Mutual Information (NMI) — Ranges from 0 (no agreement) to 1 (perfect match). The standard metric for community detection benchmarks.
Adjusted Rand Index (ARI) — Measures pairwise agreement, corrected for chance.

Common misconception

“Every network has clear communities.” Some networks are fundamentally homogeneous — random graphs, for instance, have no meaningful community structure. Before running community detection, check if the network’s modularity is significantly higher than expected by chance. If not, the “communities” you find are noise.

One thing to remember: Community detection algorithms find groups by maximizing internal density relative to external connections — but always validate that the communities are meaningful, not artifacts of the algorithm.

pythongraph-theorydata-science