Python Unicode Categories — ELI5
Imagine a giant library with every character from every language in the world: Chinese characters, Arabic letters, math symbols, emojis, roughly 150,000 of them and counting.
How do you organize all that? You give each character a label.
Every character gets a category
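The Unicode standard, the big rulebook for all the world’s characters, puts every character into a category. Each category has a two-letter code such as Lu (“uppercase letter”), and the first letter tells you which big group it belongs to: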
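- Letters (“A”, “あ”, “ب”, “Ω”): things you read
- Marks (the accent in “é” when it is stored as a separate piece): add-ons that attach to another character
- Numbers (“5”, “٣”, “Ⅳ”): things you count with
- Punctuation (“.”, “!”, “「”): things that organize sentences
- Symbols (“$”, “♪”, “→”): things with special meaning
- Separators (ordinary spaces, plus a few rarely used line and paragraph separators): invisible dividers
- Other (control codes such as the newline and tab, and invisible formatting characters): behind-the-scenes characters you never see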
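You can read these labels yourself with the category() function from Python’s standard-library unicodedata module. It returns the two-letter code: the first letter is the big group (L, N, P, S, Z or C for the groups listed here) and the second narrows it down, for example Lu for an uppercase letter or Nd for a decimal digit. The sketch below runs the example characters from the list through it, adding a newline and a zero-width space (U+200B) as stand-ins for the Other group. Note that an ordinary newline comes out as Cc, a control code, rather than as a separator.

```python
import unicodedata

# Example characters for each group, taken from the list above.
# The newline and the zero-width space (U+200B) stand in for "Other".
samples = {
    "Letters": ["A", "あ", "ب", "Ω"],
    "Numbers": ["5", "٣", "Ⅳ"],
    "Punctuation": [".", "!", "「"],
    "Symbols": ["$", "♪", "→"],
    "Separators": [" "],
    "Other": ["\n", "\u200b"],
}

for group, chars in samples.items():
    for ch in chars:
        code = unicodedata.category(ch)  # two-letter General_Category code
        print(f"{group:<12} U+{ord(ch):04X}  {code}")
```

The letters come out as Lu or Lo, the Roman numeral Ⅳ is Nl (a letter-like number) rather than Nd, and the arrow → is Sm, a math symbol.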
Why does Python care?
When you ask Python “Is this character a letter?” (for example with str.isalpha()), it looks the character up in Unicode’s database and checks its category. That is how Python can tell whether something is a letter, a digit, or a space, even if it comes from a language Python’s creator never heard of.
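Here is a small sketch of those checks using built-in str methods. According to the Python documentation, str.isalpha() is True exactly when the category is one of Lu, Ll, Lt, Lm or Lo. The other two are close cousins rather than pure category checks: str.isdigit() uses Unicode’s numeric data, so the Arabic-Indic ٣ counts but the Roman numeral Ⅳ (category Nl) does not, and str.isspace() accepts category Zs plus a few control characters such as the newline. The describe function is a hypothetical helper made up for this example.

```python
import unicodedata


def describe(ch: str) -> tuple:
    """Hypothetical helper: the character's category plus three str checks."""
    return (unicodedata.category(ch), ch.isalpha(), ch.isdigit(), ch.isspace())


# Letters, digits and spaces from several writing systems.
for ch in ["Ω", "ب", "$", "٣", "Ⅳ", "\u3000", "\n"]:
    name = unicodedata.name(ch, "<no name>")  # control codes have no name
    print(f"U+{ord(ch):04X} {name}: {describe(ch)}")

# Decimal digits from other scripts are understood by int() too.
print(int("٣"))  # 3
```

If you want Ⅳ to count as a number, str.isnumeric() is the broader check and returns True for it.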
A practical example
Say you’re filtering user input and want to keep only letters and spaces. Python checks each character’s category: “Letter? Keep it. Symbol? Remove it. Space? Keep it.” The same rule works for English, Japanese, Arabic, and most other writing systems.
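The keep-or-remove rule can be sketched with unicodedata.category(). This is a minimal illustration, not a library function: keep_letters_and_spaces is a hypothetical name, and it treats “letter” as any category starting with L and “space” as category Zs.

```python
import unicodedata


def keep_letters_and_spaces(text: str) -> str:
    """Hypothetical filter: keep letters (category L*) and spaces (category Zs)."""
    kept = []
    for ch in text:
        category = unicodedata.category(ch)
        if category.startswith("L") or category == "Zs":
            kept.append(ch)  # letter or space: keep it
        # anything else (digits, punctuation, symbols, control codes) is dropped
    return "".join(kept)


print(keep_letters_and_spaces("Hello, world!"))  # Hello world
print(keep_letters_and_spaces("こんにちは、世界！"))  # こんにちは世界
```

One caveat: some scripts, such as Hindi, write vowel signs as separate combining marks (category M), and an “é” can be stored as “e” plus a separate accent. This filter would drop those marks, so a real-world version would probably keep category M as well.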
One Thing to Remember
Unicode categories are like filing labels for every character on Earth — Python reads these labels to tell letters apart from numbers, symbols, and invisible formatting characters.
See Also
- Python Fuzzy Matching Fuzzywuzzy: Find out how Python's FuzzyWuzzy library matches messy, misspelled text, like a friend who understands you even when you mumble.
- Python Regex Lookahead Lookbehind: Learn how Python regex can peek ahead or behind without grabbing text, like checking what's next in line without stepping forward.
- Python Regex Named Groups: Learn how Python regex named groups let you label the pieces you capture, like putting name tags on your search results.
- Python Regex Patterns: Discover how Python regex patterns work like a secret code for finding hidden text treasures in any document.
- Python Regular Expressions: Learn how Python can find tricky text patterns fast, like spotting every phone number hidden in a messy page.