RDF and SPARQL Queries with Python — Core Concepts
What RDF actually is
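RDF (Resource Description Framework) is a W3C standard for representing information as a graph of triples: (subject, predicate, object). Things and the relationships between them are identified by URIs (IRIs in RDF 1.1), which makes the facts globally unambiguous. An object may also be a literal value, such as a string or a number.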
The triple <http://dbpedia.org/resource/Paris> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/France> states that Paris’s country is France. All three parts are full URIs, so there is no doubt about which Paris, or which sense of "country", is meant.
RDF supports three kinds of terms:
- URIs: identify things (people, cities, concepts).
- Literals: concrete values such as strings, numbers, and dates. A literal can carry a language tag ("Paris"@en) or a datatype ("1889"^^xsd:gYear).
- Blank nodes: anonymous nodes for intermediate structures, such as an address with a street and zip code that doesn’t need its own URI.
RDF serialization formats
RDF is a data model, not a file format. It can be serialized as:
- Turtle: human-readable and concise. The most common choice for hand-editing.
- N-Triples: one triple per line. Easy to parse, but verbose.
- RDF/XML: the original format. Verbose and hard to read.
- JSON-LD: JSON-based and friendly to web developers. Google recommends it for structured data in web pages.
Working with RDFLib
RDFLib is the most widely used Python library for RDF. It is a third-party package (pip install rdflib), not part of the standard library, and it handles parsing, querying, and serialization:
```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDFS, XSD

g = Graph()
EX = Namespace("http://example.org/")

# Add triples: (subject, predicate, object)
g.add((EX.Paris, EX.country, EX.France))
g.add((EX.Paris, RDFS.label, Literal("Paris", lang="en")))
g.add((EX.Paris, EX.population, Literal(2161000, datatype=XSD.integer)))

# Parse more triples from a Turtle file (data.ttl must exist locally)
g.parse("data.ttl", format="turtle")

# Serialize the whole graph back to Turtle (a str in rdflib 6+)
print(g.serialize(format="turtle"))
```
SPARQL query basics
SPARQL is to RDF what SQL is to relational databases. Four query types:
- SELECT — Returns tabular results (variables bound to values).
- CONSTRUCT — Returns a new RDF graph built from matched patterns.
- ASK — Returns true/false for whether a pattern exists.
- DESCRIBE — Returns RDF data about a resource (implementation-defined).
A basic SELECT query:
```sparql
PREFIX ex: <http://example.org/>
SELECT ?city ?population
WHERE {
  ?city ex:country ex:France .
  ?city ex:population ?population .
  FILTER(?population > 1000000)
}
ORDER BY DESC(?population)
```
Key SPARQL features:
- OPTIONAL — Left-join semantics. Returns results even if the optional pattern doesn’t match.
- FILTER — Constrains results with conditions (comparisons, regex, string functions).
- UNION — Matches either of two patterns.
- GROUP BY / HAVING — Aggregation, similar to SQL.
Querying remote endpoints
Public SPARQL endpoints expose billions of triples over HTTP. The third-party SPARQLWrapper library (pip install sparqlwrapper) sends queries to them and decodes the results:
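```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Wikidata asks clients for a descriptive User-Agent; this agent string is a placeholder
sparql = SPARQLWrapper("https://query.wikidata.org/sparql", agent="rdf-tutorial-example/0.1")

# The wd:, wdt:, wikibase: and bd: prefixes are predefined by the Wikidata endpoint
sparql.setQuery("""
SELECT ?person ?personLabel WHERE {
  ?person wdt:P31 wd:Q5 .       # instance of human
  ?person wdt:P19 wd:Q64 .      # born in Berlin
  ?person wdt:P106 wd:Q82594 .  # occupation: computer scientist
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

# Each binding maps a variable name to a dict with "type" and "value" keys
for r in results["results"]["bindings"]:
    print(r["personLabel"]["value"])
```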
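`convert()` returns the standard SPARQL 1.1 JSON results structure, which is why the loop indexes `["results"]["bindings"]`. The minimal, offline sketch below flattens that structure into plain dicts. `bindings_to_rows` is a hypothetical helper, and the payload is a made-up sample with placeholder URIs, not real Wikidata output. It assumes Python 3.9+ for the `list[dict]` annotation:

```python
import json

# Made-up response in the SPARQL 1.1 Query Results JSON format (placeholder values)
raw = """
{
  "head": {"vars": ["person", "personLabel"]},
  "results": {"bindings": [
    {"person": {"type": "uri", "value": "http://example.org/alice"},
     "personLabel": {"type": "literal", "xml:lang": "en", "value": "Alice"}},
    {"person": {"type": "uri", "value": "http://example.org/bob"}}
  ]}
}
"""


def bindings_to_rows(results: dict) -> list[dict]:
    """Hypothetical helper: turn each binding into {variable: value or None}.

    A variable left unbound in a solution (e.g. by OPTIONAL) is simply absent
    from that binding, so it maps to None here instead of raising KeyError.
    """
    variables = results["head"]["vars"]
    return [
        {var: (b[var]["value"] if var in b else None) for var in variables}
        for b in results["results"]["bindings"]
    ]


rows = bindings_to_rows(json.loads(raw))
for row in rows:
    print(row)
```

Checking for missing keys matters because the direct `r["personLabel"]["value"]` lookup raises `KeyError` whenever a variable is unbound.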
Major public endpoints
| Endpoint | URL | Content |
|---|---|---|
| Wikidata | query.wikidata.org | Billions of triples, general knowledge |
| DBpedia | dbpedia.org/sparql | Wikipedia structured data |
| UniProt | sparql.uniprot.org | Protein sequences and functions |
| Europeana | sparql.europeana.eu | European cultural heritage |
Common misconception
“RDF is only for academics and the semantic web.” In practice, Google uses JSON-LD (an RDF format) for rich search results, biotech companies use RDF for drug discovery knowledge bases, and financial institutions use it for regulatory reporting. The tooling has matured significantly since its academic origins.
One thing to remember: RDF gives facts globally unique identifiers, and SPARQL lets you query across any dataset that uses those identifiers — making it the closest thing we have to a universal database query language.
See Also
- Python Knowledge Graph Construction: How Python builds a web of facts about the world, connecting people, places, and ideas so computers can answer real questions.
- Python Neo4j Integration: How Python talks to a database that thinks in connections instead of rows and columns.
- Python Property Graph Modeling: How Python designs rich maps of connected data where every dot and line can carry extra details.