RDF and SPARQL Queries with Python — Core Concepts

Understand RDF triple models, SPARQL query patterns, and how Python's RDFLib and SPARQLWrapper connect to semantic web data sources.

What RDF actually is

RDF (Resource Description Framework) is a W3C standard for representing information as a graph of triples: (subject, predicate, object). Predicates are always URIs, and subjects and objects usually are too (an object can also be a plain value such as a string or number), which makes facts globally unambiguous.

The triple <http://dbpedia.org/resource/Paris> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/France> states that Paris’s country is France. There is no ambiguity, because every element of this triple has a unique web address.

RDF graphs are built from three kinds of terms:

URIs — Identify things (people, cities, concepts).
Literals — Concrete values like strings, numbers, dates. Can have a language tag ("Paris"@en) or a datatype ("1889"^^xsd:gYear).
Blank nodes — Anonymous nodes for intermediate structures (like an address with street and zip code that doesn’t need its own URI).

RDF serialization formats

RDF is a data model, not a file format. It can be serialized as:

Turtle — Human-readable, concise. Most common for hand-editing.
N-Triples — One triple per line. Easy to parse, verbose.
RDF/XML — The original format. Verbose and hard to read.
JSON-LD — JSON-based, web-developer friendly. Used by Google’s structured data.

Working with RDFLib

RDFLib is the most widely used Python library for RDF. It is a third-party package (pip install rdflib), not part of Python’s standard library. It handles parsing, querying, and serialization:

from rdflib import Graph, Namespace, Literal, URIRef
from rdflib.namespace import RDF, RDFS, XSD

g = Graph()
EX = Namespace("http://example.org/")

# Add triples
g.add((EX.Paris, EX.country, EX.France))
g.add((EX.Paris, RDFS.label, Literal("Paris", lang="en")))
g.add((EX.Paris, EX.population, Literal(2161000, datatype=XSD.integer)))

# Parse from file (assumes data.ttl exists in the working directory)
g.parse("data.ttl", format="turtle")

# Serialize
print(g.serialize(format="turtle"))

SPARQL query basics

SPARQL is to RDF what SQL is to relational databases. It has four query forms:

SELECT — Returns tabular results (variables bound to values).
CONSTRUCT — Returns a new RDF graph built from matched patterns.
ASK — Returns true/false for whether a pattern exists.
DESCRIBE — Returns RDF data about a resource (implementation-defined).

A basic SELECT query:

PREFIX ex: <http://example.org/>
SELECT ?city ?population
WHERE {
    ?city ex:country ex:France .
    ?city ex:population ?population .
    FILTER(?population > 1000000)
}
ORDER BY DESC(?population)

Key SPARQL features:

OPTIONAL — Left-join semantics. Returns results even if the optional pattern doesn’t match.
FILTER — Constrains results with conditions (comparisons, regex, string functions).
UNION — Matches either of two patterns.
GROUP BY / HAVING — Aggregation, similar to SQL.

Querying remote endpoints

Public SPARQL endpoints expose billions of triples. SPARQLWrapper connects to them:

from SPARQLWrapper import SPARQLWrapper, JSON

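# Wikimedia asks clients to send a descriptive User-Agent; this value is a placeholder
sparql = SPARQLWrapper(
    "https://query.wikidata.org/sparql",
    agent="example-sparql-client/0.1 (you@example.org)",
)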
sparql.setQuery("""
    SELECT ?person ?personLabel WHERE {
        ?person wdt:P31 wd:Q5 .        # instance of human
        ?person wdt:P19 wd:Q64 .        # born in Berlin
        ?person wdt:P106 wd:Q82594 .    # occupation: computer scientist
        SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for r in results["results"]["bindings"]:
    print(r["personLabel"]["value"])
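
The nested results["results"]["bindings"] structure follows the standard SPARQL JSON results format, so it can be flattened with plain Python. The helper below, flatten_bindings, is a hypothetical name, and the sample dictionary is hand-written to match that format rather than real endpoint output, so the sketch runs without network access.

```python
def flatten_bindings(results):
    """Turn SPARQL JSON results into a list of {variable: value} dicts.

    Variables left unbound in a row (for example by OPTIONAL) map to None.
    """
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        rows.append({
            var: binding[var]["value"] if var in binding else None
            for var in variables
        })
    return rows


# Hand-written sample in the SPARQL JSON results shape (placeholder values)
sample = {
    "head": {"vars": ["person", "personLabel"]},
    "results": {
        "bindings": [
            {
                "person": {"type": "uri", "value": "http://example.org/person/1"},
                "personLabel": {"type": "literal", "xml:lang": "en", "value": "Example Person"},
            },
            {
                "person": {"type": "uri", "value": "http://example.org/person/2"},
            },
        ]
    },
}

flat = flatten_bindings(sample)
print(flat[0]["personLabel"])  # Example Person
print(flat[1]["personLabel"])  # None
```

Each binding also carries a "type" field (and sometimes "datatype" or "xml:lang"), which is worth keeping if the distinction between URIs and literals matters downstream.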

Major public endpoints

Endpoint	URL	Content
Wikidata	query.wikidata.org/sparql	Billions of triples, general knowledge
DBpedia	dbpedia.org/sparql	Wikipedia structured data
UniProt	sparql.uniprot.org	Protein sequences and functions
Europeana	sparql.europeana.eu	European cultural heritage

Common misconception

“RDF is only for academics and the semantic web.” In practice, Google uses JSON-LD (an RDF format) for rich search results, biotech companies use RDF for drug discovery knowledge bases, and financial institutions use it for regulatory reporting. The tooling has matured significantly since its academic origins.

One thing to remember: RDF gives things globally unique identifiers, and SPARQL lets you query across any dataset that uses those identifiers, which makes it the closest thing we have to a universal database query language.
