Legal Knowledge Graphs with Python — Core Concepts

Why graphs fit law perfectly

Legal knowledge is inherently relational. A statute is interpreted by court opinions. Those opinions cite earlier opinions. Judges write opinions and serve on specific courts. Regulations implement statutes. Parties appear in cases across jurisdictions and time periods. This web of relationships is exactly what graph databases are designed to store and query.

Traditional databases store legal information in tables — one for cases, one for statutes, one for judges. Finding connections requires joining tables, which gets slow and complex as relationships multiply. A graph database stores relationships as first-class citizens, making traversal queries (“find all cases within two citation hops of Brown v. Board of Education”) fast and natural.
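To make the two-hop query concrete, here is a minimal pure-Python sketch over an in-memory edge list. The function name `within_hops`, the opinion names, and the edges are all hypothetical, not real citation data. The sketch also ignores citation direction so that both citing and cited opinions count as neighbors, which is one assumption about what "within two hops" means.

```python
from collections import deque


def within_hops(edges: list[tuple[str, str]], start: str, max_hops: int) -> dict[str, int]:
    """Return every opinion reachable from `start` in at most `max_hops`
    CITES edges, ignoring edge direction, mapped to its hop distance."""
    # Build an undirected adjacency list so citing and cited opinions both count.
    neighbors: dict[str, set[str]] = {}
    for citing, cited in edges:
        neighbors.setdefault(citing, set()).add(cited)
        neighbors.setdefault(cited, set()).add(citing)

    # Breadth-first search: nodes already max_hops away are not expanded further.
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue
        for nxt in neighbors.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    del distance[start]  # the starting opinion is not its own neighbor
    return distance


# Hypothetical (citing, cited) pairs.
edges = [
    ("Opinion A", "Seed"),
    ("Opinion B", "Opinion A"),
    ("Opinion C", "Opinion B"),
    ("Seed", "Opinion D"),
]
print(within_hops(edges, "Seed", 2))
```

A graph database performs this as a single variable-length traversal; the pure-Python version only makes the hop-by-hop logic visible. Here Opinion C is excluded because it sits three hops from the seed.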

A legal knowledge graph typically contains these entity types (nodes):

  • Statutes — laws passed by legislatures, with version history
  • Opinions — court decisions, with metadata like court, date, and judge
  • Judges — biographical information and court assignments
  • Courts — hierarchical structure (district → circuit → Supreme Court)
  • Parties — individuals and organizations involved in cases
  • Legal concepts — topics like “due process,” “fair use,” or “negligence”
  • Regulations — administrative rules implementing statutes

And these relationship types (edges):

  • CITES — one opinion references another
  • INTERPRETS — an opinion interprets a statute
  • OVERRULES / DISTINGUISHES — how opinions relate to precedent
  • DECIDED_BY — links opinions to judges
  • FILED_IN — links cases to courts
  • IMPLEMENTS — links regulations to authorizing statutes
  • ABOUT — links opinions and statutes to legal concepts
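One way to pin the schema above down in code is a whitelist of allowed (source, relationship, target) patterns that extraction output can be checked against before it is written to the graph. This is a hypothetical sketch: the label strings and the `is_valid_edge` helper are illustrative, and modeling FILED_IN as Opinion to Court is an assumption, since the node list has no separate Case type.

```python
from enum import Enum


class Node(Enum):
    STATUTE = "Statute"
    OPINION = "Opinion"
    JUDGE = "Judge"
    COURT = "Court"
    PARTY = "Party"
    CONCEPT = "LegalConcept"
    REGULATION = "Regulation"


# Allowed (source label, relationship, target label) patterns, read off the lists above.
# Assumption: FILED_IN goes Opinion -> Court because the schema has no Case node.
SCHEMA: set[tuple[Node, str, Node]] = {
    (Node.OPINION, "CITES", Node.OPINION),
    (Node.OPINION, "INTERPRETS", Node.STATUTE),
    (Node.OPINION, "OVERRULES", Node.OPINION),
    (Node.OPINION, "DISTINGUISHES", Node.OPINION),
    (Node.OPINION, "DECIDED_BY", Node.JUDGE),
    (Node.OPINION, "FILED_IN", Node.COURT),
    (Node.REGULATION, "IMPLEMENTS", Node.STATUTE),
    (Node.OPINION, "ABOUT", Node.CONCEPT),
    (Node.STATUTE, "ABOUT", Node.CONCEPT),
}


def is_valid_edge(source: Node, rel: str, target: Node) -> bool:
    """True if this relationship type is allowed between these two node labels."""
    return (source, rel, target) in SCHEMA


print(is_valid_edge(Node.OPINION, "INTERPRETS", Node.STATUTE))  # True
print(is_valid_edge(Node.STATUTE, "INTERPRETS", Node.OPINION))  # False
```

Rejecting reversed or nonsensical edges at write time is cheaper than cleaning them out of a populated graph later.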

How Python builds the graph

Entity extraction

Python NLP pipelines extract entities from legal texts. spaCy, combined with custom-trained legal models, identifies case names, statutory references, judge names, and legal concepts. eyecite finds and parses case citations, the raw material for citation relationships. LexNLP pulls out dates, monetary values, and jurisdictions.
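eyecite and LexNLP are the tools to reach for in practice. The sketch below deliberately avoids them and uses plain regular expressions, only to show what entity extraction produces. Both patterns are simplified assumptions: they cover U.S. Reports citations such as "347 U.S. 483" and U.S. Code references such as "42 U.S.C. § 1983" and nothing else. Real citation formats are far more varied, which is why dedicated parsers exist.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    kind: str   # "case_citation" or "statute_reference"
    text: str   # the matched span
    start: int  # character offset in the source text


# Simplified patterns (assumptions): "volume U.S. page" and "title U.S.C. § section".
CASE_CITE = re.compile(r"\b(\d{1,4})\s+U\.S\.\s+(\d{1,4})\b")
STATUTE_REF = re.compile(r"\b(\d{1,2})\s+U\.S\.C\.\s+§+\s*(\d+[a-z]?)")


def extract_entities(text: str) -> list[Entity]:
    """Return case citations and statutory references in document order."""
    found = [Entity("case_citation", m.group(0), m.start()) for m in CASE_CITE.finditer(text)]
    found += [Entity("statute_reference", m.group(0), m.start()) for m in STATUTE_REF.finditer(text)]
    return sorted(found, key=lambda e: e.start)


sample = "See Brown v. Board of Education, 347 U.S. 483 (1954); cf. 42 U.S.C. § 1983."
for entity in extract_entities(sample):
    print(entity.kind, "->", entity.text)
# case_citation -> 347 U.S. 483
# statute_reference -> 42 U.S.C. § 1983
```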

Relationship extraction

Once entities are identified, Python determines how they relate. Citation extraction creates CITES edges, and parsing opinion headers creates DECIDED_BY edges. Scanning opinion text for statutory references, such as “pursuant to Section 5 of the Act,” creates candidate INTERPRETS edges.
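These extraction steps can be sketched as one function that turns a single opinion into (source, relationship, target) triples. Everything here is hypothetical: the "Court:" / "Judge:" header layout, the `extract_relationships` name, and the simplified regexes, which stand in for a real citation parser such as eyecite. The opinion itself serves as the FILED_IN source because the sketch has no separate case node.

```python
import re

# Assumed header layout: "Court: ..." and "Judge: ..." lines, a blank line, then the body.
HEADER_FIELD = re.compile(r"^(Court|Judge):\s*(.+)$", re.MULTILINE)
# Simplified stand-ins for a real citation parser.
CASE_CITE = re.compile(r"\b\d{1,4}\s+U\.S\.\s+\d{1,4}\b")
STATUTE_REF = re.compile(r"\b\d{1,2}\s+U\.S\.C\.\s+§+\s*\d+[a-z]?")

Triple = tuple[str, str, str]


def extract_relationships(opinion_id: str, text: str) -> list[Triple]:
    """Turn one opinion's header and body into (source, relationship, target) triples."""
    header, _, body = text.partition("\n\n")
    triples: list[Triple] = []
    # Header fields: the judge yields DECIDED_BY, the court yields FILED_IN.
    for field, value in HEADER_FIELD.findall(header):
        rel = "DECIDED_BY" if field == "Judge" else "FILED_IN"
        triples.append((opinion_id, rel, value.strip()))
    # Case citations in the body become CITES edges.
    for cite in CASE_CITE.findall(body):
        triples.append((opinion_id, "CITES", cite))
    # Statutory references become candidate INTERPRETS edges; a mention
    # alone does not prove the court actually interpreted the statute.
    for ref in STATUTE_REF.findall(body):
        triples.append((opinion_id, "INTERPRETS", ref))
    return triples


opinion = """Court: Example Circuit
Judge: Judge Doe

The panel relied on 347 U.S. 483 and 384 U.S. 436, then construed 42 U.S.C. § 1983."""

for triple in extract_relationships("opinion-1", opinion):
    print(triple)
```

Emitting plain triples keeps extraction independent of storage: the same output can be loaded into Neo4j, an RDF store, or an in-memory graph.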

Graph storage

Python connects to graph stores through libraries such as the neo4j driver (for Neo4j) and rdflib (for RDF graphs and SPARQL endpoints), while networkx handles in-memory analysis. Neo4j is a popular choice for production legal knowledge graphs because of its query performance and its Cypher query language.
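For the Neo4j route, here is a minimal sketch of writing one extracted edge with a parameterized Cypher `MERGE`. The label names, the `id` property, and the helper names are assumptions; only `session.run(query, **params)` and `GraphDatabase.driver(...)` come from the official neo4j Python driver. In classic Cypher, `$`-parameters stand for values rather than labels or relationship types, so the sketch whitelists those before formatting them into the query string. A small recording stand-in replaces the real session so the code runs without a database.

```python
# Labels and relationship types from the schema described in this article.
ALLOWED_LABELS = {"Statute", "Opinion", "Judge", "Court", "Party", "LegalConcept", "Regulation"}
ALLOWED_RELS = {"CITES", "INTERPRETS", "OVERRULES", "DISTINGUISHES",
                "DECIDED_BY", "FILED_IN", "IMPLEMENTS", "ABOUT"}


def merge_edge_query(src_label: str, rel: str, dst_label: str) -> str:
    """Build an idempotent MERGE statement for one edge pattern.

    Labels and the relationship type are formatted into the query text, so
    they are whitelisted first; node ids travel separately as $-parameters.
    """
    for name, allowed in ((src_label, ALLOWED_LABELS), (rel, ALLOWED_RELS), (dst_label, ALLOWED_LABELS)):
        if name not in allowed:
            raise ValueError(f"not in schema: {name!r}")
    return (
        f"MERGE (a:{src_label} {{id: $src_id}}) "
        f"MERGE (b:{dst_label} {{id: $dst_id}}) "
        f"MERGE (a)-[:{rel}]->(b)"
    )


def write_edge(session, src_label: str, src_id: str, rel: str, dst_label: str, dst_id: str) -> None:
    """Send one edge through any object with a neo4j-style run(query, **params)."""
    session.run(merge_edge_query(src_label, rel, dst_label), src_id=src_id, dst_id=dst_id)


class RecordingSession:
    """Stand-in for a driver session so the sketch runs without a database."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, dict]] = []

    def run(self, query: str, **params) -> None:
        self.calls.append((query, params))


session = RecordingSession()
write_edge(session, "Opinion", "opinion-1", "INTERPRETS", "Statute", "42 U.S.C. § 1983")
print(session.calls[0][0])

# Against a real database (placeholder URI and credentials):
# from neo4j import GraphDatabase
# with GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password")) as driver:
#     with driver.session() as real_session:
#         write_edge(real_session, "Opinion", "opinion-1", "INTERPRETS", "Statute", "42 U.S.C. § 1983")
```

Using `MERGE` rather than `CREATE` means rerunning the extraction pipeline over the same opinions does not duplicate nodes or edges.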

The graph enables queries that would be impractical in traditional databases:

  • “What statutes has the 9th Circuit interpreted differently from the 5th Circuit?” — traverse INTERPRETS edges filtered by court
  • “Show the chain of precedent from Miranda v. Arizona to the most recent Supreme Court citation” — shortest path traversal
  • “Which judges most frequently cite Justice Scalia’s opinions?” — aggregation over citation patterns
  • “Find all cases about fair use that cite both Sony v. Universal and Campbell v. Acuff-Rose” — multi-hop pattern matching
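The last query in this list is a pattern match: an opinion must be ABOUT a concept and also CITE two specific opinions. A minimal sketch over in-memory triples is shown below, with a roughly equivalent Cypher query in a comment. The triples are hypothetical placeholders (Opinions X, Y, and Z do not describe real citation data), and the `name` property in the Cypher is an assumed schema detail.

```python
# Hypothetical (source, relationship, target) triples; not real citation data.
TRIPLES = [
    ("Opinion X", "ABOUT", "fair use"),
    ("Opinion X", "CITES", "Sony v. Universal"),
    ("Opinion X", "CITES", "Campbell v. Acuff-Rose"),
    ("Opinion Y", "ABOUT", "fair use"),
    ("Opinion Y", "CITES", "Sony v. Universal"),
    ("Opinion Z", "ABOUT", "negligence"),
    ("Opinion Z", "CITES", "Sony v. Universal"),
    ("Opinion Z", "CITES", "Campbell v. Acuff-Rose"),
]


def cases_about_citing_all(triples, concept, required):
    """Opinions ABOUT `concept` that CITE every opinion in `required`."""
    about = {s for s, r, o in triples if r == "ABOUT" and o == concept}
    cited_by: dict[str, set[str]] = {}
    for s, r, o in triples:
        if r == "CITES":
            cited_by.setdefault(s, set()).add(o)
    return sorted(s for s in about if set(required) <= cited_by.get(s, set()))


print(cases_about_citing_all(TRIPLES, "fair use", ["Sony v. Universal", "Campbell v. Acuff-Rose"]))
# ['Opinion X']

# Roughly equivalent Cypher, assuming nodes carry a `name` property:
# MATCH (o:Opinion)-[:ABOUT]->(:LegalConcept {name: "fair use"}),
#       (o)-[:CITES]->(:Opinion {name: "Sony v. Universal"}),
#       (o)-[:CITES]->(:Opinion {name: "Campbell v. Acuff-Rose"})
# RETURN o.name
```

Opinion Z cites both precedents but is excluded because it is not about fair use; Opinion Y is about fair use but cites only one of them.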

Common misconception

People think building a legal knowledge graph requires manually entering all relationships. In reality, Python automates most of the extraction. Court opinions follow structured conventions — citations are formatted predictably, opinion headers list judges and courts consistently, and statutory references use standard numbering. Automated extraction handles 85-90% of relationships, with human review for complex or ambiguous cases.
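The split between automated extraction and human review can be expressed as a small triage step. This sketch rests on stated assumptions: it presumes each extractor attaches a confidence score, the 0.9 threshold is arbitrary, and sending every OVERRULES or DISTINGUISHES edge to a reviewer is a design choice made here because those labels depend on reading the court's reasoning, not on citation formatting.

```python
from dataclasses import dataclass

# Assumption: these relationship types always need a human, whatever the score.
ALWAYS_REVIEW = {"OVERRULES", "DISTINGUISHES"}


@dataclass(frozen=True)
class ExtractedEdge:
    source: str
    rel: str
    target: str
    confidence: float  # 0.0 to 1.0, assumed to be reported by the extractor


def triage(edges, threshold=0.9):
    """Split extracted edges into an auto-accept list and a human-review queue."""
    accepted, review = [], []
    for edge in edges:
        if edge.rel in ALWAYS_REVIEW or edge.confidence < threshold:
            review.append(edge)
        else:
            accepted.append(edge)
    return accepted, review


edges = [
    ExtractedEdge("opinion-1", "CITES", "347 U.S. 483", 0.99),
    ExtractedEdge("opinion-1", "DISTINGUISHES", "opinion-7", 0.95),
    ExtractedEdge("opinion-1", "INTERPRETS", "42 U.S.C. § 1983", 0.60),
]
accepted, review = triage(edges)
print([e.rel for e in accepted])  # ['CITES']
print([e.rel for e in review])    # ['DISTINGUISHES', 'INTERPRETS']
```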

The one thing to remember: Legal knowledge graphs store the web of relationships between statutes, opinions, judges, and concepts as first-class data, enabling Python to answer complex legal research questions through graph traversal instead of keyword search.

