NetworkX for Graph Analysis — Core Concepts

Understand how NetworkX models relationships as graphs, computes centrality metrics, and detects communities in connected data.

Why graph analysis matters

Many real-world datasets are fundamentally about relationships, not rows. A spreadsheet of transactions tells you amounts, but a graph of transactions reveals who pays whom, which accounts form clusters, and where money flows. Social networks, supply chains, citation networks, road systems, and biological pathways are all naturally graph-shaped.

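NetworkX is the de facto standard Python library for creating, manipulating, and studying graphs (it is a third-party package, not part of Python's standard library). It provides data structures for undirected and directed graphs, algorithms for shortest paths, centrality, clustering, and community detection, plus Matplotlib-based drawing and readers and writers for common graph file formats.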

Graph types

NetworkX offers four graph classes:

Graph — undirected, no duplicate edges. Friendships: if Alice knows Bob, Bob knows Alice.
DiGraph — directed. Twitter follows: Alice follows Bob, but Bob may not follow Alice.
MultiGraph — undirected, allows multiple edges between the same pair. Two cities connected by a highway and a railway.
MultiDiGraph — directed with multiple edges. Multiple flight routes between airports with different airlines.
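A minimal sketch of the four classes, reusing the examples above. The city, airport, and airline names are placeholders, not real data.

```python
import networkx as nx

# Graph: undirected, so re-adding an edge in the other direction changes nothing
friends = nx.Graph()
friends.add_edge("Alice", "Bob")
friends.add_edge("Bob", "Alice")      # same undirected edge, not a duplicate
print(friends.number_of_edges())      # 1

# DiGraph: direction matters
follows = nx.DiGraph()
follows.add_edge("Alice", "Bob")
print(follows.has_edge("Alice", "Bob"), follows.has_edge("Bob", "Alice"))  # True False

# MultiGraph: parallel edges between the same pair are kept, each with its own attributes
transport = nx.MultiGraph()
transport.add_edge("CityA", "CityB", mode="highway")
transport.add_edge("CityA", "CityB", mode="railway")
print(transport.number_of_edges("CityA", "CityB"))  # 2

# MultiDiGraph: parallel edges that also have a direction
flights = nx.MultiDiGraph()
flights.add_edge("JFK", "LHR", airline="AirlineX")  # hypothetical airline names
flights.add_edge("JFK", "LHR", airline="AirlineY")
print(flights.number_of_edges("JFK", "LHR"), flights.number_of_edges("LHR", "JFK"))  # 2 0
```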

Each graph stores nodes and edges. Nodes can be any hashable Python object (strings, numbers, tuples). Edges can carry attributes — weight, color, label, timestamp — stored as dictionaries.
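A short sketch of hashable nodes and dictionary-backed attributes. The node names and the attribute keys and values (role, weight, label, timestamp) are illustrative assumptions.

```python
import networkx as nx

G = nx.Graph()
G.add_node("Alice", role="analyst")   # string node with a node attribute
G.add_node(42)                         # integer node
G.add_node((0, 1))                     # tuple node, e.g. a grid coordinate
G.add_edge("Alice", 42, weight=3.5, label="paid", timestamp="2024-01-15")

# Node and edge attributes are plain dictionaries
print(G.nodes["Alice"])      # {'role': 'analyst'}
print(G.edges["Alice", 42])  # {'weight': 3.5, 'label': 'paid', 'timestamp': '2024-01-15'}

# Unhashable objects such as lists cannot be nodes
try:
    G.add_node([1, 2])
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True
```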

Core concepts

Degree and neighbors

A node’s degree is the number of edges connected to it. In a social network, degree equals the number of friends. In a directed graph, in-degree counts incoming edges (followers) and out-degree counts outgoing edges (following).
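For example, with a small hypothetical friendship graph and follow graph:

```python
import networkx as nx

# Undirected: degree is the number of friends
friends = nx.Graph([("Alice", "Bob"), ("Alice", "Carol"), ("Alice", "Dan"), ("Bob", "Carol")])
print(friends.degree("Alice"))             # 3
print(sorted(friends.neighbors("Alice")))  # ['Bob', 'Carol', 'Dan']

# Directed: in-degree counts followers, out-degree counts accounts followed
follows = nx.DiGraph([("Alice", "Bob"), ("Carol", "Bob"), ("Bob", "Alice")])
print(follows.in_degree("Bob"))             # 2 (Alice and Carol follow Bob)
print(follows.out_degree("Bob"))            # 1 (Bob follows Alice)
print(sorted(follows.predecessors("Bob")))  # ['Alice', 'Carol']
print(sorted(follows.successors("Bob")))    # ['Alice']
```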

Paths and shortest paths

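A path is a sequence of edges connecting two nodes. The shortest path has the fewest edges or, if edges have weights, the lowest total weight. NetworkX implements both Dijkstra's algorithm (for non-negative weights) and the Bellman-Ford algorithm (which also tolerates negative weights, as long as there is no negative cycle), and each runs with a single function call.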
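A small sketch on a toy weighted graph (node names and weights are made up) showing how the unweighted and weighted answers differ, and why Bellman-Ford is the choice when a directed graph has a negative edge weight:

```python
import networkx as nx

G = nx.Graph()
G.add_edge("A", "B", weight=1)
G.add_edge("B", "C", weight=1)
G.add_edge("A", "C", weight=5)
G.add_edge("C", "D", weight=1)

# Unweighted: fewest edges
print(nx.shortest_path(G, "A", "D"))                          # ['A', 'C', 'D']

# Weighted: lowest total weight, computed with Dijkstra
print(nx.dijkstra_path(G, "A", "D", weight="weight"))         # ['A', 'B', 'C', 'D']
print(nx.dijkstra_path_length(G, "A", "D", weight="weight"))  # 3

# Bellman-Ford handles a negative edge weight (directed, so no negative cycle forms)
D = nx.DiGraph()
D.add_weighted_edges_from([("s", "x", 4), ("s", "y", 2), ("y", "x", -1)])
print(nx.bellman_ford_path(D, "s", "x"))         # ['s', 'y', 'x']
print(nx.bellman_ford_path_length(D, "s", "x"))  # 1
```

The negative edge is placed in a DiGraph on purpose: in an undirected graph, a negative edge can be traversed back and forth, which amounts to a negative cycle.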

Centrality measures

Centrality answers “which nodes are most important?” Different definitions of importance give different answers:

Degree centrality — most connections. The popular person.
Betweenness centrality — lies on the most shortest paths. The bridge between groups.
Closeness centrality — smallest average distance to all other nodes. Can reach everyone quickly.
PageRank — importance based on who links to you and how important they are. What made Google famous.
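To see the definitions disagree, here is a sketch on a toy graph: two triangles joined through a bridge node D. The graph is an illustrative assumption, chosen so that different measures pick different winners.

```python
import networkx as nx

# Two triangles (A-B-C and E-F-G) joined through the bridge node D
G = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"),
              ("C", "D"), ("D", "E"),
              ("E", "F"), ("E", "G"), ("F", "G")])

measures = {
    "degree": nx.degree_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "closeness": nx.closeness_centrality(G),
    "pagerank": nx.pagerank(G, alpha=0.85),
}

top = {}
for name, scores in measures.items():
    best = max(scores.values())
    # Collect every node within floating-point tolerance of the best score (ties are possible)
    top[name] = sorted(n for n, s in scores.items() if abs(s - best) < 1e-9)
    print(f"{name:12s} top: {top[name]}")
# degree       top: ['C', 'E']
# betweenness  top: ['D']
# closeness    top: ['D']
# pagerank     top: ['C', 'E']
```

C and E have the most connections, while D, with only two edges, lies on every path between the triangles and has the smallest average distance to everyone else.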

Connected components

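A connected component is a maximal group of nodes in which every pair can reach each other through edges. No edge joins two different components, so no path leads from one to the other. Finding components reveals isolated clusters in data. For directed graphs, NetworkX distinguishes strongly connected components (every node reaches every other while following edge directions) from weakly connected components (direction ignored).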
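A quick sketch with made-up nodes, covering both the undirected case and the directed variants:

```python
import networkx as nx

G = nx.Graph([(1, 2), (2, 3), (4, 5)])
G.add_node(6)  # an isolated node forms a component of size one

print(nx.number_connected_components(G))                   # 3
print(sorted(sorted(c) for c in nx.connected_components(G)))  # [[1, 2, 3], [4, 5], [6]]
print(sorted(max(nx.connected_components(G), key=len)))    # largest component: [1, 2, 3]

# Directed: a and b reach each other, but nothing leads back from c
D = nx.DiGraph([("a", "b"), ("b", "a"), ("b", "c")])
print(sorted(sorted(c) for c in nx.strongly_connected_components(D)))  # [['a', 'b'], ['c']]
print(nx.number_weakly_connected_components(D))                         # 1
```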

Community detection

Communities are groups of nodes that are densely connected internally and sparsely connected externally. The Louvain algorithm is one of the most popular methods and is known to scale to graphs with millions of edges. It is built into NetworkX as nx.community.louvain_communities and is also available in the third-party python-louvain package (imported as community).
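A sketch using NetworkX's built-in Louvain implementation on a toy graph of two five-node cliques joined by a single edge. The graph and the seed value are illustrative assumptions; the seed only fixes the random node order so runs are reproducible.

```python
import networkx as nx

# Two 5-node cliques (nodes 0-4 and 5-9) joined by one bridge edge
G = nx.disjoint_union(nx.complete_graph(5), nx.complete_graph(5))
G.add_edge(0, 5)

# Louvain returns a list of sets, one set of nodes per community
communities = nx.community.louvain_communities(G, seed=42)
print(sorted(sorted(c) for c in communities))  # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]

# Modularity measures how much denser the communities are than a random baseline
print(round(nx.community.modularity(G, communities), 3))  # 0.452
```

Louvain greedily moves nodes between communities to increase modularity, then merges each community into a single node and repeats, which is why the two cliques are recovered here.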

A typical workflow

Build — Create a graph from an edge list, adjacency matrix, or database query.
Explore — Check node count, edge count, density, degree distribution.
Analyze — Compute centrality, find shortest paths, detect communities.
Visualize — Draw the graph with nx.draw or export to Gephi, Cytoscape, or D3.js.
Export — Save as GraphML, GML, adjacency list, or JSON for downstream use.
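The steps above can be strung together in a short script. This is a sketch with a hypothetical four-edge list; drawing is left as a comment because nx.draw needs Matplotlib installed, and the output file name is arbitrary.

```python
import os
import tempfile

import networkx as nx

# 1. Build: from a hypothetical weighted edge list
edges = [("alice", "bob", 3.0), ("bob", "carol", 1.0),
         ("carol", "alice", 2.0), ("dave", "erin", 5.0)]
G = nx.Graph()
G.add_weighted_edges_from(edges)

# 2. Explore: size, density, degree distribution
print(G.number_of_nodes(), G.number_of_edges())          # 5 4
print(round(nx.density(G), 2))                           # 0.4
print(sorted((d for _, d in G.degree()), reverse=True))  # [2, 2, 2, 1, 1]

# 3. Analyze: components, a weighted shortest path, a centrality score
print(nx.number_connected_components(G))                       # 2
print(nx.shortest_path(G, "alice", "carol", weight="weight"))  # ['alice', 'carol']
print(nx.degree_centrality(G)["alice"])                        # 0.5

# 4. Visualize: nx.draw(G, with_labels=True) would render via Matplotlib (not called here)

# 5. Export: write GraphML, then read it back to check the round trip
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "graph.graphml")
    nx.write_graphml(G, path)
    H = nx.read_graphml(path)
print(H.number_of_edges(), H["alice"]["bob"]["weight"])  # 4 3.0
```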

Common misconception

People often assume NetworkX is designed for massive-scale graphs. It is not. NetworkX stores everything in Python dictionaries, which makes it flexible and easy to use but not optimized for graphs with tens of millions of edges. At that scale, use graph-tool (a C++ core with Python bindings), igraph, NetworKit, or Apache Spark's GraphX. NetworkX excels as a prototyping and analysis tool for small-to-medium graphs (roughly up to a few hundred thousand nodes).

When to use NetworkX

Use case	NetworkX?	Alternative
Research prototype	Yes	—
Social network analysis (< 100k nodes)	Yes	—
Billion-edge web graph	No	graph-tool, Neo4j
Real-time shortest path in production	No	dedicated routing engine
Graph ML features	Start here	PyTorch Geometric for training

The one thing to remember: NetworkX makes graph analysis accessible in Python — build a graph in three lines, run sophisticated algorithms in one, and visualize the result immediately.
