Neo4j Integration with Python — Core Concepts
Why Neo4j with Python
Relational databases store data in tables and use joins to connect them. When relationships get deep (“friends of friends who bought similar products”), those joins multiply and queries slow down. Neo4j stores relationships as direct pointers between nodes (index-free adjacency), so following a relationship costs roughly constant time per hop. Traversal cost therefore depends on how much of the graph a query touches, not on the total size of the database.
Python is the most common language for data analysis, machine learning, and backend APIs. Connecting these two means you can build recommendation engines, fraud detection systems, and knowledge graphs with a language you already know.
The official driver
The neo4j Python package is maintained by Neo4j Inc. Install it with pip install neo4j.
A basic connection uses three concepts:
- Driver: A thread-safe object that manages connection pooling. Create it once per application and reuse it everywhere.
- Session: A lightweight, short-lived context for running one or more transactions. Sessions are not thread-safe, so open one per unit of work.
- Transaction: An atomic unit of work. Either everything commits or nothing does.
The driver connects to Neo4j via the Bolt protocol, a binary format designed for low-latency graph operations. Connection URIs look like bolt://localhost:7687 for a direct connection to a single server, or neo4j://cluster-host:7687 when the driver should route queries across the members of a cluster.
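As a rough illustration of how the driver, session, and query fit together, the sketch below opens a driver, checks the connection, and runs one trivial query. The URI, the credentials, and the helper names open_driver, server_greeting, and main are placeholders chosen for this example, not values from any real deployment.

```python
def open_driver(uri="bolt://localhost:7687", user="neo4j", password="password"):
    """Create the one long-lived driver for the application.

    The URI and credentials are placeholder assumptions for a local server.
    """
    # Imported here so the other helpers stay importable without the package.
    from neo4j import GraphDatabase

    return GraphDatabase.driver(uri, auth=(user, password))


def server_greeting(driver):
    """Open a short-lived session and run a single trivial query."""
    with driver.session() as session:
        record = session.run("RETURN 'hello' AS greeting").single()
        return record["greeting"]


def main():
    driver = open_driver()
    try:
        driver.verify_connectivity()  # fails fast if the server is unreachable
        print(server_greeting(driver))
    finally:
        driver.close()  # releases every pooled connection
```

Calling main() requires a reachable Neo4j server. In a real application the driver would typically be created at startup and closed at shutdown instead of once per call.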
Cypher basics
Cypher is Neo4j’s query language. It uses ASCII art to describe patterns:
- (a:Person) is a node labeled Person
- -[:KNOWS]-> is a relationship of type KNOWS
- (a)-[:KNOWS]->(b) means Person a knows Person b
Common operations include CREATE for inserting data, MATCH for finding patterns, MERGE for “find or create,” and DELETE for removal.
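The four operations can be sketched as parameterized Cypher strings run through a session. The Person label and KNOWS type come from the patterns above; the property name, the sample values, and the demo helper are assumptions made for this example.

```python
# Parameterized Cypher statements; $name, $a and $b are query parameters.
CREATE_PERSON = "CREATE (p:Person {name: $name})"
MERGE_KNOWS = (
    "MERGE (a:Person {name: $a}) "
    "MERGE (b:Person {name: $b}) "
    "MERGE (a)-[:KNOWS]->(b)"
)
MATCH_FRIENDS = (
    "MATCH (a:Person {name: $name})-[:KNOWS]->(b:Person) "
    "RETURN b.name AS friend"
)
# DETACH DELETE also removes the node's relationships; a plain DELETE
# fails while the node still has any.
DELETE_PERSON = "MATCH (p:Person {name: $name}) DETACH DELETE p"


def demo(session):
    """Run each statement once and return the names Alice knows."""
    session.run(CREATE_PERSON, name="Carol").consume()
    session.run(MERGE_KNOWS, a="Alice", b="Bob").consume()
    friends = [r["friend"] for r in session.run(MATCH_FRIENDS, name="Alice")]
    session.run(DELETE_PERSON, name="Carol").consume()
    return friends
```

Passing values as parameters instead of formatting them into the query string avoids injection problems and lets the server reuse query plans.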
Transaction patterns
The driver offers two main ways to run transactions:
- Auto-commit transactions: single-query convenience through session.run(). Each query runs in its own transaction and is not retried automatically.
- Managed transactions: you pass a function to session.execute_read() or session.execute_write(), and the driver handles retries on transient errors (like leader switches in a cluster).
Managed transactions are the recommended approach for production code. Because the driver may run your function more than once when it hits a transient failure, such as a cluster failover, the function should be safe to re-execute.
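A minimal sketch of the managed style follows, assuming a driver object created elsewhere. The function names add_friendship, count_friends, and befriend_and_count, and the name property, are illustrative choices and not part of the driver API; execute_write and execute_read are the session methods named above.

```python
def add_friendship(tx, a, b):
    # MERGE keeps the write idempotent, which matters because the driver
    # may call this function again after a transient error.
    tx.run(
        "MERGE (x:Person {name: $a}) "
        "MERGE (y:Person {name: $b}) "
        "MERGE (x)-[:KNOWS]->(y)",
        a=a,
        b=b,
    )


def count_friends(tx, name):
    result = tx.run(
        "MATCH (:Person {name: $name})-[:KNOWS]->(f:Person) "
        "RETURN count(f) AS n",
        name=name,
    )
    return result.single()["n"]


def befriend_and_count(driver, a, b):
    """Write one relationship, then read back how many people `a` knows."""
    with driver.session() as session:
        # Extra positional arguments are forwarded to the transaction function.
        session.execute_write(add_friendship, a, b)
        return session.execute_read(count_friends, a)
```

The write and the read run as two separate transactions here. The driver commits each one when its function returns and rolls it back if the function raises.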
Result handling
Query results come back as Record objects. Each record contains values accessible by key or index. Results are streamed — they’re consumed lazily from the server. Once consumed, they cannot be replayed unless you explicitly collect them into a list.
A common mistake is trying to access results outside the transaction function. Because results stream from the server within a transaction, returning raw Result objects from a transaction function leads to errors. Instead, extract the data you need inside the function and return plain Python objects.
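One way to follow this advice is to turn each record into a plain dictionary before the transaction function returns. In the sketch below, the age property, the query, and the helper names adults and list_adults are assumptions made for illustration.

```python
def adults(tx, min_age):
    result = tx.run(
        "MATCH (p:Person) WHERE p.age >= $min_age "
        "RETURN p.name AS name, p.age AS age ORDER BY name",
        min_age=min_age,
    )
    # Iterating consumes the stream once, so build plain dicts while the
    # transaction is still open.
    return [
        {"name": record["name"], "age": record[1]}  # by key, then by index
        for record in result
    ]


def list_adults(driver, min_age=18):
    """Return plain Python data that stays valid after the session closes."""
    with driver.session() as session:
        return session.execute_read(adults, min_age)
```

The caller receives an ordinary list of dictionaries, so nothing it does later depends on the session or the transaction still being open.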
Object-Graph Mapping
For larger projects, raw Cypher strings become hard to manage. Libraries like neomodel provide Django-like model definitions. You define Python classes that map to node labels and relationship types, then query using method calls instead of string queries.
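neomodel handles property validation and cardinality constraints, and it ships a command-line script (neomodel_install_labels) that creates the indexes and constraints declared on your models. This is similar to what an ORM does for SQL databases, though it is not a full schema migration framework.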
Common misconception
“Graph databases replace relational databases.” They don’t. Neo4j excels at relationship-heavy queries but isn’t optimized for aggregation-heavy analytics or simple CRUD with no relationships. Most production systems use both: a relational database for transactional data and Neo4j for the relationship layer.
One thing to remember: Use managed transactions with execute_read and execute_write. They retry automatically on transient errors such as cluster failovers, which are a common source of production failures.
See Also
- Python Knowledge Graph Construction: How Python builds a web of facts about the world, connecting people, places, and ideas so computers can answer real questions.
- Python Property Graph Modeling: How Python designs rich maps of connected data where every dot and line can carry extra details.
- Python Rdf Sparql Queries: How Python reads and asks questions about the web's universal language for describing things and their connections.
- Python Arima Forecasting: How ARIMA models use patterns in past numbers to predict the future, explained like a bedtime story.
- Python Autocorrelation Analysis: How today's number is connected to yesterday's, and why that connection is the secret weapon of time series analysis.