Graph Embeddings with Python — Core Concepts

Understand how Node2Vec, DeepWalk, and TransE convert graph structures into vector representations for downstream machine learning tasks.

Why graphs need embeddings

Machine learning algorithms expect fixed-size numerical inputs — feature vectors. Graphs are irregular: nodes have different numbers of neighbors, paths vary in length, and the topology encodes information that tabular features can’t capture. Graph embeddings bridge this gap by mapping each node (or edge, or subgraph) to a fixed-dimensional vector that preserves structural properties.

Once embedded, you can use standard ML tools: logistic regression for node classification, cosine similarity for link prediction, k-means for community detection.

The random walk family

DeepWalk

DeepWalk (2014) was the first influential graph embedding method. The idea is borrowed from natural language processing:

Perform multiple random walks from each node — like wandering randomly through the graph.
Treat each walk as a “sentence” where nodes are “words.”
Train a Word2Vec model on these sentences.

Nodes that appear in similar contexts (same walks) end up with similar vectors. If two nodes share many neighbors, they’ll show up together in walks frequently, producing close embeddings.

Node2Vec

Node2Vec (2016) extends DeepWalk with two parameters that control the walk behavior:

p (return parameter) — How likely the walk is to backtrack. Low p means the walk tends to revisit recent nodes, emphasizing local structure.
q (in-out parameter) — How likely the walk is to explore outward. Low q encourages exploration, capturing global community structure.

By tuning p and q, you control whether embeddings emphasize local neighborhoods (homophily) or structural roles (nodes in similar positions across different parts of the graph).

Matrix factorization methods

An alternative family decomposes the graph’s adjacency matrix (or a derived matrix like the Laplacian) into lower-rank factors:

Laplacian Eigenmaps — Uses eigenvectors of the graph Laplacian to embed nodes. Connected nodes get close embeddings.
GraRep — Captures k-step relationships by factorizing powers of the adjacency matrix.

These methods have solid theoretical foundations but scale poorly to millions of nodes because of the matrix decomposition cost.

Knowledge graph embeddings

For knowledge graphs with typed relationships (triples like “Einstein → born_in → Ulm”), specialized methods exist:

TransE — Models relationships as translations in embedding space. If embedding(Einstein) + embedding(born_in) ≈ embedding(Ulm), the triple is considered valid.
RotatE — Models relationships as rotations, handling symmetric and antisymmetric relations better than TransE.
ComplEx — Uses complex-valued embeddings to model asymmetric relations.

These methods are critical for link prediction in knowledge graphs — predicting missing facts like “Who else was born in Ulm?”

How it works in practice

A typical workflow:

Load the graph — From an edge list, adjacency matrix, or database query.
Choose a method — Node2Vec for general graphs, TransE for knowledge graphs.
Set hyperparameters — Embedding dimension (typically 64-256), walk length, number of walks.
Train — Generate walks and train the model. Takes minutes for thousands of nodes, hours for millions.
Evaluate — Use the embeddings for a downstream task (classification, link prediction) and measure accuracy.

Evaluation approaches

Embeddings are only as good as their downstream performance. Common evaluation tasks:

Node classification — Train a classifier on embeddings and labels. Accuracy measures how well embeddings capture class-relevant structure.
Link prediction — Hide some edges, embed the remaining graph, and predict hidden edges. AUC-ROC is the standard metric.
Visualization — Reduce embeddings to 2D with t-SNE or UMAP. If clusters align with known communities, the embeddings capture meaningful structure.

Common misconception

“Higher embedding dimensions are always better.” Beyond a point (usually 128-256 dimensions), additional dimensions capture noise rather than signal. They also increase memory usage and training time. Start with 64 dimensions and increase only if downstream metrics improve.

One thing to remember: Graph embeddings convert topology into geometry — positions in vector space — letting you apply any standard ML algorithm to graph-structured data.

pythonmachine-learninggraph-theory