Named Entity Recognition in Python — Core Concepts
Named Entity Recognition (NER) identifies and classifies named entities in text into predefined categories. It is one of the foundational tasks in information extraction and powers applications from knowledge graph construction to automated compliance checking.
Standard Entity Types
Most NER systems recognize a common set of entity types:
- PERSON — individual names (Marie Curie, Satya Nadella).
- ORG — organizations (NASA, Goldman Sachs, UEFA).
- GPE — geopolitical entities (France, New York, European Union).
- DATE — absolute or relative dates (January 5th, last Tuesday).
- MONEY — monetary values ($4.2 billion, €50).
- LOC — non-political locations (Mount Everest, the Pacific Ocean).
Domain-specific NER adds custom types: drug names in healthcare, gene symbols in biology, product SKUs in retail.
How NER Works
Rule-Based Approaches
Pattern rules match entities using dictionaries (gazetteers) and regular expressions. Given a list of 500 drug names, a rule-based system will find every listed name and rarely flag anything else, although names that double as ordinary words can still cause false positives. The tradeoff: it misses anything not on the list, and maintaining those lists is manual work.
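Here is a minimal sketch of the dictionary-plus-regex idea in plain Python. The drug list, the `DRUG` label, and the dollar-amount pattern are hypothetical stand-ins, not taken from any library. The example sentence also shows the tradeoff: the unnamed “new antibiotic” is never found because it is not on the list.

```python
import re

# Hypothetical gazetteer standing in for a real list of drug names.
DRUG_NAMES = {"ibuprofen", "acetaminophen", "amoxicillin", "insulin", "insulin glargine"}

# Illustrative pattern for dollar amounts such as "$50" or "$4.2 billion".
MONEY_RE = re.compile(r"\$\d+(?:\.\d+)?(?:\s(?:million|billion))?")


def find_drugs(text):
    """Return sorted (start, end, "DRUG") character spans for gazetteer hits."""
    spans = []
    # Longest names first, so "insulin glargine" is kept instead of just "insulin".
    for name in sorted(DRUG_NAMES, key=len, reverse=True):
        # \b word boundaries stop "insulin" from matching inside "insulins".
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            # Accept the hit only if it does not overlap an already accepted span.
            if all(m.end() <= s or m.start() >= e for s, e, _ in spans):
                spans.append((m.start(), m.end(), "DRUG"))
    return sorted(spans)


def find_money(text):
    """Return (start, end, "MONEY") character spans for regex hits."""
    return [(m.start(), m.end(), "MONEY") for m in MONEY_RE.finditer(text)]


text = "Patients on Ibuprofen paid $50; a new antibiotic cost $4.2 billion to develop."
for start, end, label in sorted(find_drugs(text) + find_money(text)):
    print(text[start:end], label)
```

Sorting names longest-first is one simple way to prefer “insulin glargine” over “insulin”. Inside a spaCy pipeline, the `EntityRuler` component offers the same dictionary-and-pattern approach.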
Statistical Models
Machine learning models learn to recognize entities from annotated training data. They consider features like:
- The word itself and its neighbors.
- Capitalization and word shape (Xxxx, dd/dd/dddd).
- Part-of-speech tags.
- Position in the sentence.
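To make those features concrete, here is a sketch of how a classic feature-based tagger (a CRF, for instance) might describe a single token. The feature names and the hand-written POS tags are illustrative assumptions; a real pipeline would take POS tags from a tagger. The shape function is a simplified version of the idea: unlike spaCy's `shape_` attribute, it does not truncate long runs of the same character.

```python
import re


def word_shape(word):
    """Coarse word shape: uppercase -> X, lowercase -> x, digit -> d, others kept."""
    shape = re.sub(r"[A-Z]", "X", word)
    shape = re.sub(r"[a-z]", "x", shape)
    return re.sub(r"[0-9]", "d", shape)


def token_features(tokens, pos_tags, i):
    """Feature dict for tokens[i]: the word, its neighbors, shape, POS, position."""
    word = tokens[i]
    return {
        "word.lower": word.lower(),
        "word.shape": word_shape(word),
        "word.is_title": word.istitle(),
        "pos": pos_tags[i],
        # Boundary markers stand in for missing neighbors at sentence edges.
        "prev.lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": tokens[i + 1].lower() if i + 1 < len(tokens) else "<EOS>",
        "position": i,
    }


tokens = ["Marie", "Curie", "was", "born", "on", "07/11/1867"]
pos_tags = ["PROPN", "PROPN", "AUX", "VERB", "ADP", "NUM"]  # hand-written; normally from a tagger
for i in range(len(tokens)):
    print(token_features(tokens, pos_tags, i))
```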
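spaCy’s default NER model uses a transition-based neural network. It reads tokens left to right and, at each token, predicts an action that begins an entity, continues it, ends it, marks a single-token entity, or leaves the token outside any entity (spaCy’s BILUO scheme: Begin, In, Last, Unit, Out).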
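The sketch below is a toy illustration of greedy, left-to-right decoding over BILUO actions (Begin, In, Last, Unit, Out), the action set that spaCy's NER transition system is built on. The `toy_score` function is a hypothetical capitalization heuristic standing in for the neural network, and entity types are omitted for brevity. The point is the loop: each step picks the best-scoring action that keeps the tag sequence valid.

```python
def allowed_actions(in_entity, is_last):
    """Actions that keep the BILUO sequence well-formed at the current token."""
    if in_entity:
        # An open entity can only continue (I) or close (L); at the end it must close.
        return ["L"] if is_last else ["I", "L"]
    # Otherwise: open a multi-token entity (B), tag a one-token entity (U), or skip (O).
    # B is not allowed on the last token because the entity could never be closed.
    return ["U", "O"] if is_last else ["B", "U", "O"]


def toy_score(tokens, i, action):
    """Hypothetical stand-in for a neural scorer: capitalization heuristics only."""
    cap = tokens[i][:1].isupper()
    next_cap = i + 1 < len(tokens) and tokens[i + 1][:1].isupper()
    good = {
        "B": cap and next_cap,
        "I": cap and next_cap,
        "L": cap and not next_cap,
        "U": cap and not next_cap,
        "O": not cap,
    }
    return 1.0 if good[action] else 0.0


def greedy_decode(tokens, score):
    """Read tokens left to right, taking the best-scoring valid action at each step."""
    tags = []
    in_entity = False
    for i in range(len(tokens)):
        options = allowed_actions(in_entity, is_last=(i == len(tokens) - 1))
        action = max(options, key=lambda a: score(tokens, i, a))
        tags.append(action)
        # B and I leave an entity open; L, U and O leave none open.
        in_entity = action in ("B", "I")
    return tags


print(greedy_decode(["Barack", "Obama", "visited", "New", "York"], toy_score))
```

The same heuristic splits “Bank of America” into three pieces (`['U', 'O', 'U']`), which is why real systems learn the scores from annotated data instead of hand-coding them.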
Transformer Models
Models like BERT treat NER as a token classification task. Each token gets a label (B-PER for the beginning of a person name, I-PER for continuation, O for outside). Transformer-based NER typically achieves the highest accuracy, especially on ambiguous cases where context matters, such as “Jordan” as a person versus a country.
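Transformers split words into subword pieces, so word-level labels must be spread over those pieces before training. A common convention, sketched here, is to label the first subword of each word and give the remaining pieces and special tokens the value -100, which PyTorch's cross-entropy loss ignores by default. The `word_ids` list mimics what a Hugging Face fast tokenizer's `word_ids()` method returns, but the subword split of “Reykjavik” and the label set are invented for illustration.

```python
# PyTorch's CrossEntropyLoss skips targets equal to -100 by default (ignore_index).
IGNORE_INDEX = -100

LABELS = ["O", "B-PER", "I-PER", "B-GPE", "I-GPE"]
LABEL2ID = {label: i for i, label in enumerate(LABELS)}


def align_labels(word_ids, word_labels, label_all_subwords=False):
    """Expand word-level BIO labels to subword tokens.

    word_ids maps each subword position to its source word index (None for
    special tokens), like the output of a fast tokenizer's word_ids().
    """
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None:
            aligned.append(IGNORE_INDEX)  # [CLS], [SEP], padding
        elif wid != previous:
            aligned.append(LABEL2ID[word_labels[wid]])  # first subword keeps the word label
        elif label_all_subwords:
            label = word_labels[wid]
            # A continuation piece of a B- word is inside the entity, so use I-.
            if label.startswith("B-"):
                label = "I-" + label[2:]
            aligned.append(LABEL2ID[label])
        else:
            aligned.append(IGNORE_INDEX)  # later subwords are left out of the loss
        previous = wid
    return aligned


# Hypothetical split: [CLS] Barack Obama visited Rey ##k ##javik [SEP]
word_ids = [None, 0, 1, 2, 3, 3, 3, None]
word_labels = ["B-PER", "I-PER", "O", "B-GPE"]
print(align_labels(word_ids, word_labels))
print(align_labels(word_ids, word_labels, label_all_subwords=True))
```

Labeling only the first subword keeps one prediction per word, which maps cleanly back to word-level entities at inference time; the `label_all_subwords` option shows the alternative of training on every piece.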
Python Libraries for NER
| Library | Approach | Speed | Accuracy | GPU Needed |
|---|---|---|---|---|
| spaCy (sm/lg) | Statistical | Fast | Good | No |
| spaCy (trf) | Transformer | Slow | Very good | Recommended |
| Hugging Face Transformers | Transformer | Slow | Best | Recommended |
| NLTK | Rule + statistical | Medium | Fair | No |
| Stanza | Neural | Medium | Very good | Optional |
| Flair | Stacked embeddings | Slow | Very good | Recommended |
For most projects, spaCy is the right starting point. Its models are fast enough for production and accurate enough for general-purpose entity types.
BIO Tagging Scheme
NER models use a tagging scheme to handle multi-word entities. The most common is BIO:
- B-TYPE — beginning of an entity of TYPE.
- I-TYPE — inside (continuation) of the entity.
- O — not part of any entity.
Example: “Barack Obama visited New York”
| Token | Tag |
|---|---|
| Barack | B-PER |
| Obama | I-PER |
| visited | O |
| New | B-GPE |
| York | I-GPE |
This encoding lets models handle entities of any length and distinguish adjacent entities of the same type.
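Here is a minimal sketch of BIO encoding and decoding with plain lists, assuming entities are given as token-index spans `(start, end, type)` with an exclusive end. The last call shows why the B- prefix matters: two adjacent PER entities decode as two spans instead of one merged span. Treating a stray I- tag as the start of a new entity is one lenient choice, not the only one.

```python
def spans_to_bio(n_tokens, spans):
    """Encode (start, end, type) token spans (end exclusive) as BIO tags."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


def bio_to_spans(tags):
    """Decode BIO tags back into (start, end, type) spans."""
    spans = []
    start = label = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes a final entity
        # The open entity ends at O, at any B-, or at an I- of a different type.
        if tag == "O" or tag.startswith("B-") or (label is not None and tag[2:] != label):
            if label is not None:
                spans.append((start, i, label))
                start = label = None
        # A B- tag, or a stray I- with no open entity, starts a new entity.
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            start, label = i, tag[2:]
    return spans


tokens = ["Barack", "Obama", "visited", "New", "York"]
tags = spans_to_bio(len(tokens), [(0, 2, "PER"), (3, 5, "GPE")])
print(tags)  # ['B-PER', 'I-PER', 'O', 'B-GPE', 'I-GPE']
print(bio_to_spans(tags))  # [(0, 2, 'PER'), (3, 5, 'GPE')]

# Two adjacent PER entities stay separate because each starts with B-.
print(bio_to_spans(["O", "B-PER", "B-PER", "O"]))  # [(1, 2, 'PER'), (2, 3, 'PER')]
```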
Evaluation Metrics
NER is evaluated at the entity level, not the token level:
- Exact match — the predicted entity must have the correct type AND the correct span boundaries. “Barack” alone when the gold label is “Barack Obama” counts as wrong: it is a false positive for the prediction and a false negative for the missed gold entity.
- Partial match — gives credit for overlapping spans. Useful during development but not standard for benchmarks.
Metrics are reported per entity type (precision, recall, F1 for PERSON, ORG, etc.) and as a micro-average across all types.
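The sketch below computes exact-match scores per entity type and as a micro-average, using the “Barack” example above. Spans are assumed to be `(start, end, type)` token tuples, and `partial_matches` is one simple overlap-based variant among several in use. For reported benchmark numbers, an established implementation such as the CoNLL `conlleval` script or the `seqeval` package is the safer choice.

```python
from collections import Counter


def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def evaluate(gold, pred):
    """Exact-match entity scores; spans are (start, end, type) tuples."""
    gold, pred = set(gold), set(pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for span in pred:
        # Correct only if start, end AND type all agree with a gold span.
        if span in gold:
            tp[span[2]] += 1
        else:
            fp[span[2]] += 1
    for span in gold - pred:
        fn[span[2]] += 1  # gold entities the model missed or mis-bounded
    types = sorted(set(tp) | set(fp) | set(fn))
    per_type = {t: prf(tp[t], fp[t], fn[t]) for t in types}
    # Micro-average: pool the counts across all types before computing the scores.
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return per_type, micro


def partial_matches(gold, pred):
    """One lenient variant: predictions overlapping a gold span of the same type."""
    return sum(
        any(p[2] == g[2] and p[0] < g[1] and g[0] < p[1] for g in gold) for p in pred
    )


gold = [(0, 2, "PER"), (3, 5, "GPE")]  # "Barack Obama", "New York"
pred = [(0, 1, "PER"), (3, 5, "GPE")]  # "Barack" alone, "New York"
per_type, micro = evaluate(gold, pred)
print(per_type)  # {'GPE': (1.0, 1.0, 1.0), 'PER': (0.0, 0.0, 0.0)}
print(micro)  # (0.5, 0.5, 0.5)
print(partial_matches(gold, pred))  # 2
```

Under exact match the truncated “Barack” earns nothing, while the overlap count credits it; this is why partial scores run higher and are not used for standard benchmarks.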
State-of-the-art models score 90-93 F1 on English news benchmarks (CoNLL-2003). On domain-specific text without fine-tuning, expect 70-80 F1.
Common Misunderstanding
People assume NER works equally well across all text types. It does not. Models trained on news articles struggle with social media (abbreviated names, slang), legal documents (unusual entity structures), and scientific papers (specialized nomenclature). Fine-tuning on even a few hundred annotated examples from your domain typically improves F1 by 10-15 points.
The one thing to remember: NER detects and classifies named entities in text — choose spaCy for speed and general use, fine-tune a transformer when accuracy on your specific domain matters most.