Community Detection with Python — Core Concepts
What defines a community
A community (or cluster) in a network is a group of nodes that are more densely connected to each other than to the rest of the network. There’s no single universal definition — different algorithms use different criteria — but the intuition is consistent: dense internal connections, sparse external connections.
Modularity: the quality metric
Modularity (Q) measures how good a community partition is. It compares the number of edges within communities to the expected number if edges were placed randomly:
- Q = 0 means the partition is no better than random.
- Q > 0.3 typically indicates meaningful community structure.
- Q can theoretically reach 1.0 (all edges within communities, none between).
Most community detection algorithms try to maximize modularity. However, modularity has a resolution limit — it can’t detect communities smaller than √(2m) where m is the total number of edges. Very large networks may have important small communities that modularity-based methods miss.
Key algorithms
Louvain algorithm
The most popular community detection method. It works in two phases that repeat:
- Local moves — Each node is assigned to the neighbor’s community that gives the biggest modularity increase.
- Aggregation — Communities become single nodes, and the process repeats on the compressed graph.
Louvain is fast — O(n log n) in practice — and scales to millions of nodes. Its main weakness: it can produce poorly connected communities (nodes within a community might not all be reachable from each other through the community).
Leiden algorithm
An improvement over Louvain that guarantees well-connected communities. It adds a refinement phase between the local moves and aggregation, ensuring every community is internally connected. Leiden produces higher-quality partitions with similar speed.
Label propagation
Each node starts with its own label. In each iteration, nodes adopt the most common label among their neighbors. The algorithm converges when no node wants to change its label. It’s extremely fast but non-deterministic — different runs may produce different results.
Girvan-Newman
Instead of building communities up, this method removes edges. It repeatedly removes the edge with the highest betweenness centrality (the edge that lies on the most shortest paths), splitting the network into smaller pieces. It’s elegant but slow — O(m²n) — making it impractical for large graphs.
Choosing an algorithm
| Need | Algorithm | Why |
|---|---|---|
| Large graph, good quality | Leiden | Fast + well-connected communities |
| Quick baseline | Louvain | Fastest, widely available |
| Very large graph, speed matters | Label Propagation | Near-linear time |
| Hierarchical decomposition | Girvan-Newman | Provides dendrogram |
| Overlapping communities | DEMON, BigClam | Nodes can belong to multiple groups |
Evaluation without ground truth
In most real-world cases, you don’t know the “correct” communities. Evaluation options:
- Modularity — Higher is better, but compare across similar graph sizes.
- Coverage — Fraction of edges that fall within communities. High coverage means communities capture most connections.
- Conductance — For each community, the ratio of edges leaving the community to total edges touching community members. Low conductance means tight communities.
- Stability — Run the algorithm multiple times. If results vary wildly, the community structure may be weak.
When you have ground truth
If you know the real groups (labeled data), compare with:
- Normalized Mutual Information (NMI) — Ranges from 0 (no agreement) to 1 (perfect match). The standard metric for community detection benchmarks.
- Adjusted Rand Index (ARI) — Measures pairwise agreement, corrected for chance.
Common misconception
“Every network has clear communities.” Some networks are fundamentally homogeneous — random graphs, for instance, have no meaningful community structure. Before running community detection, check if the network’s modularity is significantly higher than expected by chance. If not, the “communities” you find are noise.
One thing to remember: Community detection algorithms find groups by maximizing internal density relative to external connections — but always validate that the communities are meaningful, not artifacts of the algorithm.
See Also
- Python Graph Embeddings How Python turns tangled webs of connections into neat lists of numbers that computers can actually work with.
- Python Graph Neural Networks How Python teaches computers to learn from connections — not just data points — by combining neural networks with graph structures.
- Python Link Prediction How Python guesses which connections are missing from a network — predicting future friendships, recommendations, and undiscovered relationships.
- Python Networkx Graph Analysis How Python maps connections between things — friends, roads, websites — and finds hidden patterns in those connections.
- Python Arima Forecasting How ARIMA models use patterns in past numbers to predict the future, explained like a bedtime story.