Property Graph Modeling with Python — Deep Dive

Schema design methodology

Property graph modeling follows a query-driven approach. Unlike relational modeling (which starts with entities and normalizes), graph modeling starts with the questions you need to answer and works backward to the structure.

Step 1: Define traversal questions

Write your target queries as natural language:

  1. “Which products did a customer’s friends purchase in the last 30 days?”
  2. “What’s the shortest path between two employees through project collaborations?”
  3. “Which suppliers have the highest defect rate for components used in product X?”

Step 2: Whiteboard the traversal paths

For question 1, the path is:

(Customer)-[:FRIENDS_WITH]->(Friend)-[:PURCHASED]->(Product)

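This tells you what the model needs: Customer and Product nodes (a friend is just another Customer), a FRIENDS_WITH relationship between customers, and a PURCHASED relationship that carries a date property so the 30-day filter can be applied.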
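
As a rough sketch, question 1 could translate into the parameterized Cypher below. The property names customer_id, sku, name, and purchased_at follow the dataclasses in Step 3; the helper name friends_recent_purchases and the choice to match FRIENDS_WITH in either direction are assumptions made for illustration.

```python
FRIENDS_PURCHASES_QUERY = """
MATCH (c:Customer {customer_id: $customer_id})-[:FRIENDS_WITH]-(friend:Customer)
MATCH (friend)-[p:PURCHASED]->(product:Product)
WHERE p.purchased_at >= datetime() - duration({days: $days})
RETURN DISTINCT product.sku AS sku, product.name AS name
"""


def friends_recent_purchases(tx, customer_id: str, days: int = 30) -> list[dict]:
    """Return products bought by a customer's friends in the last `days` days.

    `tx` is anything with a neo4j-style run(query, **params) method.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    result = tx.run(FRIENDS_PURCHASES_QUERY, customer_id=customer_id, days=days)
    # Copy the two returned columns into plain dicts.
    return [{"sku": record["sku"], "name": record["name"]} for record in result]
```

If the query reads awkwardly at this stage, that is usually a sign the whiteboard path needs another pass before any data is loaded.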

Step 3: Add properties and constraints

# Model definition as Python dataclasses for validation
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class Customer:
    customer_id: str                    # unique identifier
    name: str
    email: str
    created_at: datetime
    labels: tuple = ("Customer",)

@dataclass
class Product:
    sku: str                            # unique identifier
    name: str
    price: float
    category: str
    labels: tuple = ("Product",)

@dataclass
class Purchase:
    """Relationship: Customer -[:PURCHASED]-> Product"""
    quantity: int
    total_price: float
    purchased_at: datetime
    channel: str                        # "web", "mobile", "in-store"
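
One possible way to use these dataclasses is to turn an instance into a parameterized MERGE statement. The sketch below assumes the labels tuple convention shown above; merge_statement is a hypothetical helper, and Product is repeated only so the snippet runs on its own.

```python
import re
from dataclasses import asdict, dataclass, is_dataclass

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


@dataclass
class Product:  # same shape as the Product dataclass above
    sku: str
    name: str
    price: float
    category: str
    labels: tuple = ("Product",)


def merge_statement(node, key: str) -> tuple[str, dict]:
    """Build a parameterized MERGE for a dataclass instance with a `labels` tuple."""
    if not is_dataclass(node) or isinstance(node, type):
        raise TypeError("expected a dataclass instance")
    props = asdict(node)
    labels = props.pop("labels")
    # Labels and property keys cannot be Cypher parameters, so they are
    # interpolated into the text; validate them before doing so.
    for name in (*labels, key):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    key_value = props.pop(key)
    query = f"MERGE (n:{':'.join(labels)} {{{key}: $key_value}}) SET n += $props"
    return query, {"key_value": key_value, "props": props}
```

Calling merge_statement(Product("X1", "Widget", 9.5, "tools"), key="sku") yields a query and a parameter dict that can be passed to a driver's run method.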

Advanced modeling patterns

Versioned nodes

Track full history of changes without losing data:

def create_versioned_update(tx, entity_id: str, new_props: dict):
    """Archive current version and create a new one."""
    tx.run("""
        MATCH (current:Entity {entity_id: $id, _is_current: true})
        SET current._is_current = false,
            current._valid_to = datetime()
        CREATE (new:Entity)
        SET new = $props,
            new.entity_id = $id,
            new._is_current = true,
            new._valid_from = datetime(),
            new._version = coalesce(current._version, 0) + 1
        CREATE (current)-[:SUPERSEDED_BY]->(new)
    """, id=entity_id, props=new_props)

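Reading the history back is the other half of this pattern. The sketch below looks up the version that was valid at a given instant, assuming every version (including the first) has _valid_from set and that _valid_to is absent on the current one. The function name get_version_as_of is hypothetical.

```python
from datetime import datetime, timezone

AS_OF_QUERY = """
MATCH (e:Entity {entity_id: $id})
WHERE e._valid_from <= $as_of
  AND (e._valid_to IS NULL OR e._valid_to > $as_of)
RETURN e
ORDER BY e._version DESC
LIMIT 1
"""


def get_version_as_of(tx, entity_id: str, as_of: datetime):
    """Return the properties of the version valid at `as_of`, or None."""
    # Cypher's datetime() is zoned, so compare against a timezone-aware value.
    if as_of.tzinfo is None:
        raise ValueError("as_of must be timezone-aware")
    record = tx.run(AS_OF_QUERY, id=entity_id, as_of=as_of).single()
    return dict(record["e"]) if record is not None else None
```

For example, get_version_as_of(tx, "e1", datetime(2024, 1, 1, tzinfo=timezone.utc)) would return the properties as they stood at the start of 2024.
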
Fan-out control with summary nodes

When a node accumulates millions of relationships (a celebrity with millions of followers), traversal becomes expensive. Add summary nodes:

(Celebrity)-[:HAS_FOLLOWER_BATCH]->(FollowerBatch {batch_id: 1, count: 10000})
(FollowerBatch)-[:CONTAINS]->(Follower1)
(FollowerBatch)-[:CONTAINS]->(Follower2)
...

This limits the fan-out at any single node and enables efficient pagination.
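
A minimal sketch of how follower IDs might be grouped into such batches before loading. The batch size of 10,000 mirrors the count in the example above; the function name and the row layout are assumptions.

```python
from typing import Iterable, Iterator


def follower_batches(follower_ids: Iterable[str], batch_size: int = 10_000) -> Iterator[dict]:
    """Group follower IDs into numbered batches, one dict per FollowerBatch node."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: list = []
    batch_id = 1
    for follower_id in follower_ids:
        batch.append(follower_id)
        if len(batch) == batch_size:
            yield {"batch_id": batch_id, "count": len(batch), "follower_ids": batch}
            batch, batch_id = [], batch_id + 1
    if batch:  # final, partially filled batch
        yield {"batch_id": batch_id, "count": len(batch), "follower_ids": batch}
```

Each yielded row could then be passed as a parameter to a statement that creates one FollowerBatch node and its CONTAINS relationships, which keeps every write bounded by the batch size.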

Multi-tenancy patterns

For SaaS applications serving multiple customers from one graph:

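def tenant_query(tx, tenant_id: str, query_fragment: str):
    """Run a path query restricted to a single tenant.

    query_fragment must bind a variable named `path`, for example
    "path = (:Customer)-[:PLACED]->(:Order)". It is interpolated into the
    query text, so it must come from trusted code, never from user input.
    """
    full_query = f"""
        MATCH (t:Tenant {{id: $tenant_id}})
        MATCH {query_fragment}
        WHERE ALL(n IN nodes(path) WHERE (n)-[:BELONGS_TO_TENANT]->(t) OR n = t)
        RETURN path
    """
    return tx.run(full_query, tenant_id=tenant_id)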

Alternatively, use a separate database per tenant. Neo4j has supported multiple databases since version 4.0, but creating additional user databases requires Enterprise Edition.
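
For the database-per-tenant approach, the tenant ID has to be mapped to a database name safely. The sketch below uses a deliberately conservative pattern (lowercase letters, digits, single dashes, 3 to 63 characters); the "tenant-" prefix and the function name are assumptions, and the pattern is stricter than Neo4j itself requires.

```python
import re

_SAFE_TENANT = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def tenant_database_name(tenant_id: str, prefix: str = "tenant-") -> str:
    """Map a tenant ID to a database name, rejecting anything unexpected."""
    normalized = tenant_id.strip().lower()
    if not _SAFE_TENANT.match(normalized):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    name = f"{prefix}{normalized}"
    if not 3 <= len(name) <= 63:
        raise ValueError(f"database name has invalid length: {name!r}")
    return name
```

With the official driver, the result would be passed as driver.session(database=tenant_database_name(tenant_id)), so every query in that session is confined to one tenant's database.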

Schema enforcement

Property graphs are traditionally schema-optional, but Neo4j lets you enforce parts of the schema with constraints and indexes (the statements below use Neo4j 5.x syntax):

def apply_schema(session):
    """Apply constraints and indexes for the data model."""
    constraints = [
        # Uniqueness
        "CREATE CONSTRAINT customer_id IF NOT EXISTS FOR (c:Customer) REQUIRE c.customer_id IS UNIQUE",
        "CREATE CONSTRAINT product_sku IF NOT EXISTS FOR (p:Product) REQUIRE p.sku IS UNIQUE",

        # Existence (Enterprise Edition only)
        "CREATE CONSTRAINT customer_email IF NOT EXISTS FOR (c:Customer) REQUIRE c.email IS NOT NULL",

        # Node key: composite uniqueness plus existence (Enterprise Edition only)
        "CREATE CONSTRAINT order_item_key IF NOT EXISTS FOR (oi:OrderItem) REQUIRE (oi.order_id, oi.sku) IS NODE KEY",
    ]

    indexes = [
        "CREATE INDEX customer_email_idx IF NOT EXISTS FOR (c:Customer) ON (c.email)",
        "CREATE INDEX product_category_idx IF NOT EXISTS FOR (p:Product) ON (p.category)",
        "CREATE TEXT INDEX product_name_text IF NOT EXISTS FOR (p:Product) ON (p.name)",
    ]

    for stmt in constraints + indexes:
        session.run(stmt)
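
Because IF NOT EXISTS silently skips statements, it can help to confirm afterwards that the expected constraints are really present. A minimal sketch, assuming the constraint names used in apply_schema and a session whose run method returns records indexable by column name; missing_constraints is a hypothetical helper.

```python
EXPECTED_CONSTRAINTS = ("customer_id", "product_sku", "customer_email", "order_item_key")


def missing_constraints(session, expected=EXPECTED_CONSTRAINTS) -> list:
    """Return the expected constraint names that the database does not report."""
    result = session.run("SHOW CONSTRAINTS YIELD name RETURN name")
    existing = {record["name"] for record in result}
    return sorted(set(expected) - existing)
```

On Community Edition the existence and node key constraints cannot be created, so this check would be expected to report customer_email and order_item_key as missing.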

Validation with neomodel

from neomodel import (
    StructuredNode, StringProperty, IntegerProperty,
    FloatProperty, DateTimeProperty, RelationshipTo,
    UniqueIdProperty, One, ZeroOrMore,
)

class Customer(StructuredNode):
    uid = UniqueIdProperty()
    name = StringProperty(required=True, max_length=200)
    email = StringProperty(required=True, unique_index=True)
    created_at = DateTimeProperty(default_now=True)

    orders = RelationshipTo("Order", "PLACED", cardinality=ZeroOrMore)
    friends = RelationshipTo("Customer", "FRIENDS_WITH", cardinality=ZeroOrMore)

class Order(StructuredNode):
    order_id = StringProperty(required=True, unique_index=True)
    total = FloatProperty(required=True)
    status = StringProperty(choices={"pending": "Pending", "shipped": "Shipped", "delivered": "Delivered"})
    placed_at = DateTimeProperty(default_now=True)

    items = RelationshipTo("Product", "INCLUDES", cardinality=ZeroOrMore)

Migration strategies

Adding a new relationship type

def migrate_add_category_hierarchy(session):
    """Migration: Add SUBCATEGORY_OF relationships between Category nodes."""
    session.run("""
        MATCH (sub:Category), (parent:Category)
        WHERE sub.parent_name = parent.name AND NOT (sub)-[:SUBCATEGORY_OF]->(parent)
        CREATE (sub)-[:SUBCATEGORY_OF]->(parent)
    """)
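    # Clean up the denormalized property, but only where the relationship now
    # exists, so unmatched parent names are kept for investigation
    session.run("""
        MATCH (c:Category)-[:SUBCATEGORY_OF]->(:Category)
        REMOVE c.parent_name
    """)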
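
Migrations like this one are easier to manage when each runs exactly once. The sketch below records applied migrations as nodes in the graph itself; the Migration label, its name and applied_at properties, and the run_pending_migrations helper are all assumptions for illustration, not part of any library.

```python
def run_pending_migrations(session, migrations) -> list:
    """Run migrations that are not yet recorded, in the order given.

    `migrations` is an ordered list of (name, function) pairs, where each
    function takes the session as its only argument.
    """
    applied = {
        record["name"]
        for record in session.run("MATCH (m:Migration) RETURN m.name AS name")
    }
    ran = []
    for name, migrate in migrations:
        if name in applied:
            continue
        migrate(session)
        # Record the migration only after it completed without raising.
        session.run(
            "MERGE (m:Migration {name: $name}) SET m.applied_at = datetime()",
            name=name,
        )
        ran.append(name)
    return ran
```

A call such as run_pending_migrations(session, [("001_category_hierarchy", migrate_add_category_hierarchy)]) would then be safe to repeat on every deployment.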

Splitting a node type

When a single node type becomes overloaded (a User that’s both a customer and an admin):

def migrate_split_user_roles(session):
    """Migration: Add secondary labels based on role property."""
    session.run("""
        MATCH (u:User) WHERE u.role = 'admin'
        SET u:Admin
    """)
    session.run("""
        MATCH (u:User) WHERE u.role = 'customer'
        SET u:Customer
    """)

Anti-patterns

The dense node anti-pattern

A single node connected to millions of others (the “god node”). Any query that expands through this node may have to iterate over a large share of its relationships, even when it only needs a few of them.

Fix: Introduce intermediate grouping nodes, or use relationship properties and indexes to filter without full scans.
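
Before restructuring anything, it helps to know which nodes are actually dense. The sketch below assumes Neo4j 5.x (it relies on COUNT subqueries and elementId) and scans every node, so it is better suited to an offline check than to a production hot path. The threshold default and the function name are arbitrary choices.

```python
DENSE_NODES_QUERY = """
MATCH (n)
WITH n, COUNT { (n)--() } AS degree
WHERE degree >= $threshold
RETURN labels(n) AS labels, elementId(n) AS id, degree
ORDER BY degree DESC
LIMIT $limit
"""


def find_dense_nodes(session, threshold: int = 100_000, limit: int = 20) -> list:
    """Return (labels, element id, degree) for the most connected nodes."""
    result = session.run(DENSE_NODES_QUERY, threshold=threshold, limit=limit)
    return [(record["labels"], record["id"], record["degree"]) for record in result]
```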

The property-bag anti-pattern

Storing everything as properties on a single node type instead of modeling distinct entities:

# Bad: One node with 50 properties
(:Record {customer_name, customer_email, product_name, product_sku, order_date, ...})

# Good: Separate entities with relationships
(:Customer)-[:PLACED]->(:Order)-[:INCLUDES]->(:Product)

The missing relationship direction anti-pattern

Property graph relationships are always directed. Modeling bidirectional concepts (friendship) with two relationships doubles storage and complicates queries.

Fix: Use a single direction and query with undirected pattern matching:

// Single relationship, matched in either direction (note: no arrowhead)
MATCH (a:Person)-[:FRIENDS_WITH]-(b:Person)
RETURN a, b
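
The same idea applies on the write side: an undirected MERGE matches an existing relationship in either direction and creates one only if neither exists. A minimal sketch, assuming Person nodes are identified by a hypothetical person_id property:

```python
ADD_FRIENDSHIP_QUERY = """
MATCH (a:Person {person_id: $a_id}), (b:Person {person_id: $b_id})
MERGE (a)-[:FRIENDS_WITH]-(b)
"""


def add_friendship(tx, a_id: str, b_id: str) -> None:
    """Ensure exactly one FRIENDS_WITH relationship exists between two people."""
    if a_id == b_id:
        raise ValueError("a person cannot befriend themselves")
    # MERGE without an arrowhead reuses a relationship stored in either direction.
    tx.run(ADD_FRIENDSHIP_QUERY, a_id=a_id, b_id=b_id)
```

Calling add_friendship(tx, "p1", "p2") and then add_friendship(tx, "p2", "p1") should leave a single relationship in the graph.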

Testing graph models

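import pytest
from neo4j import GraphDatabase

# Connection details are placeholders; point them at a disposable test database.
NEO4J_URI = "bolt://localhost:7687"
NEO4J_AUTH = ("neo4j", "password")

@pytest.fixture
def session():
    """Yield a session for one test and close the driver afterwards."""
    with GraphDatabase.driver(NEO4J_URI, auth=NEO4J_AUTH) as driver:
        with driver.session() as db_session:
            yield db_session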

class TestGraphModel:
    """Validate the graph model against business rules."""

    def test_every_order_has_customer(self, session):
        result = session.run("""
            MATCH (o:Order) WHERE NOT (o)<-[:PLACED]-(:Customer)
            RETURN count(o) AS orphans
        """).single()
        assert result["orphans"] == 0, "Found orders without customers"

    def test_no_self_relationships(self, session):
        result = session.run("""
            MATCH (n)-[r]->(n) RETURN count(r) AS self_loops
        """).single()
        assert result["self_loops"] == 0, "Found self-referencing relationships"

    def test_product_prices_positive(self, session):
        result = session.run("""
            MATCH (p:Product) WHERE p.price <= 0
            RETURN count(p) AS invalid
        """).single()
        assert result["invalid"] == 0, "Found products with non-positive prices"

Benchmarking model alternatives

When choosing between modeling approaches, benchmark with realistic data volumes:

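import time
from typing import Optional

def benchmark_query(session, query: str, params: Optional[dict] = None, iterations: int = 100):
    """Time repeated executions of a query and report latency statistics."""
    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        # Pass parameters as a dict so their names cannot clash with run()'s own keywords
        result = session.run(query, params or {})
        list(result)  # consume results so fetch time is included
        times.append(time.perf_counter() - start)

    times.sort()
    return {
        "mean_ms": sum(times) / len(times) * 1000,
        "p99_ms": times[int(len(times) * 0.99)] * 1000,
        "min_ms": times[0] * 1000,
    }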

Compare models at 10x and 100x your expected data volume to catch scaling issues early.
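
To compare two candidate models side by side, one option is to wrap each model's query in a zero-argument callable and rank them by median latency. This is a sketch under that assumption; time_callable and compare_models are hypothetical names, and the warm-up runs are there to reduce cold-cache noise.

```python
import time
from typing import Callable


def time_callable(fn: Callable[[], object], iterations: int = 100, warmup: int = 5) -> dict:
    """Time a zero-argument callable after a few untimed warm-up runs."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    times.sort()
    return {
        "mean_ms": sum(times) / len(times) * 1000,
        "median_ms": times[len(times) // 2] * 1000,
        "min_ms": times[0] * 1000,
    }


def compare_models(candidates: dict, **kwargs) -> list:
    """Return (name, stats) pairs sorted from fastest to slowest median."""
    results = {name: time_callable(fn, **kwargs) for name, fn in candidates.items()}
    return sorted(results.items(), key=lambda item: item[1]["median_ms"])
```

Each callable would typically run one model's version of the same business question and consume the full result, so the comparison reflects end-to-end latency.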

One thing to remember: Property graph modeling is query-driven design. Write your most important queries first, then shape the graph to make those queries natural single-traversal operations. The model should feel obvious when you see it — if it feels forced, redesign.
