DNS — Deep Dive

Wire Format and the Actual Packets

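DNS uses port 53. Queries normally go over UDP (fast, no handshake) and fall back to TCP when the response does not fit in the UDP payload limit: 512 bytes for plain DNS, or the larger size the client advertises via EDNS0. The server signals this by setting the TC (truncated) flag in its UDP response. Zone transfers always use TCP. Since 2016, RFC 7766 has encouraged implementations to keep TCP connections open for multiple queries rather than opening a new socket each time.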
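On TCP, each DNS message is preceded by a two-byte big-endian length field (RFC 1035, section 4.2.2), and a UDP response with the TC flag set is the cue to retry over TCP. A minimal Python sketch of those two pieces follows; the helper names are hypothetical and no network I/O is performed.

```python
import struct

TC_FLAG = 0x0200  # "truncated" bit within the 16-bit header flags word


def is_truncated(response: bytes) -> bool:
    """Return True if a DNS response has the TC bit set (retry over TCP)."""
    if len(response) < 12:
        raise ValueError("shorter than a DNS header")
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & TC_FLAG)


def frame_for_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with its 2-byte length, as required on TCP."""
    if len(message) > 0xFFFF:
        raise ValueError("DNS message too large for TCP framing")
    return struct.pack("!H", len(message)) + message


def unframe_from_tcp(stream: bytes) -> list[bytes]:
    """Split a byte stream holding several length-prefixed messages."""
    messages, offset = [], 0
    while offset + 2 <= len(stream):
        (length,) = struct.unpack("!H", stream[offset:offset + 2])
        body = stream[offset + 2:offset + 2 + length]
        if len(body) < length:
            break  # incomplete message; a real client would read more bytes
        messages.append(body)
        offset += 2 + length
    return messages
```

The length prefix is what makes it possible to pipeline several queries on one kept-open connection, which is the pattern RFC 7766 encourages.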

A DNS message has five sections:

Header        (12 bytes, always present)
Question      (variable — the name you're looking up)
Answer        (zero or more resource records)
Authority     (nameserver records)
Additional    (extra records the server thought you'd need)

The header encodes the query ID (16-bit random number used to match responses to requests), flags (QR, Opcode, AA, TC, RD, RA, Z, RCODE), and counts for each section.
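As a rough illustration of that layout, here is a Python sketch that packs and unpacks the 12-byte header. The function names are made up for this example; the bit positions follow RFC 1035, section 4.1.1.

```python
import secrets
import struct

# Flag bit positions within the 16-bit flags word (RFC 1035, section 4.1.1)
QR = 0x8000  # 0 = query, 1 = response
AA = 0x0400  # authoritative answer
TC = 0x0200  # truncated
RD = 0x0100  # recursion desired
RA = 0x0080  # recursion available


def build_header(qdcount: int = 1, recursion_desired: bool = True) -> bytes:
    """Pack a 12-byte query header with a random 16-bit ID."""
    query_id = secrets.randbits(16)
    flags = RD if recursion_desired else 0  # QR=0, Opcode=0: standard query
    # ID, flags, then the four section counts (QD, AN, NS, AR)
    return struct.pack("!HHHHHH", query_id, flags, qdcount, 0, 0, 0)


def parse_header(data: bytes) -> dict:
    """Unpack the header into named fields."""
    query_id, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", data[:12])
    return {
        "id": query_id,
        "qr": bool(flags & QR),
        "opcode": (flags >> 11) & 0xF,
        "aa": bool(flags & AA),
        "tc": bool(flags & TC),
        "rd": bool(flags & RD),
        "ra": bool(flags & RA),
        "rcode": flags & 0xF,
        "counts": (qd, an, ns, ar),
    }
```

The ID comes from `secrets` rather than `random` because a predictable query ID makes it easier for an off-path attacker to forge a matching response.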

Names in DNS use label encoding — not a simple string. api.stripe.com is encoded as:

\x03 a p i \x06 s t r i p e \x03 c o m \x00

Each label is prefixed with a length byte (at most 63), and the name is terminated with a zero byte. Names can also use compression pointers (two bytes with the top two bits set to 11, the remaining 14 bits giving an earlier offset in the packet) to avoid repeating common suffixes. Parsing this incorrectly is a classic DNS implementation bug: pointers that loop or point outside the packet have caused crashes and memory-corruption flaws in many implementations, including the NAME:WRECK family of vulnerabilities in embedded TCP/IP stacks disclosed in 2021.
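A small Python sketch of both directions: encoding a name as labels, and decoding a name that may contain compression pointers. It is an illustration under simplifying assumptions (ASCII names only, hypothetical function names), with the bounds and loop checks that the bug class above tends to miss.

```python
def encode_name(name: str) -> bytes:
    """Encode a dotted name as length-prefixed labels (no compression)."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        if not label:
            continue  # skip empty labels, so "." encodes as just the root byte
        raw = label.encode("ascii")
        if len(raw) > 63:
            raise ValueError(f"label too long: {label!r}")
        out.append(len(raw))
        out += raw
    out.append(0)  # root label terminates the name
    return bytes(out)


def decode_name(packet: bytes, offset: int) -> tuple[str, int]:
    """Decode a possibly compressed name starting at `offset`.

    Returns the dotted name and the offset just past the name as it
    appears at the starting position (a pointer occupies 2 bytes there).
    """
    labels = []
    next_offset = None  # where parsing resumes after the first pointer
    seen = set()        # offsets already visited, to reject pointer loops
    while True:
        if offset >= len(packet):
            raise ValueError("name runs past end of packet")
        if offset in seen:
            raise ValueError("compression pointer loop")
        seen.add(offset)
        length = packet[offset]
        if length & 0xC0 == 0xC0:  # top two bits 11: compression pointer
            if offset + 1 >= len(packet):
                raise ValueError("truncated pointer")
            if next_offset is None:
                next_offset = offset + 2
            offset = ((length & 0x3F) << 8) | packet[offset + 1]
        elif length & 0xC0:        # bit patterns 01 and 10 are reserved
            raise ValueError("unsupported label type")
        elif length == 0:          # root label: name is complete
            end = next_offset if next_offset is not None else offset + 1
            return ".".join(labels), end
        else:
            start = offset + 1
            if start + length > len(packet):
                raise ValueError("label runs past end of packet")
            labels.append(packet[start:start + length].decode("ascii"))
            offset = start + length
```

For example, appending `b"\x03api\xc0\x00"` to `encode_name("stripe.com")` yields a buffer in which the second name reuses the first through a pointer to offset 0.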

How Resolvers Actually Work

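When an application on your machine resolves google.com, here’s what actually happens, without simplification.

The application calls getaddrinfo() in libc, which consults /etc/nsswitch.conf to determine resolution order (files dns means check /etc/hosts first, then DNS). It reads /etc/resolv.conf to find the configured nameserver (often 127.0.0.53 for systemd-resolved on modern Linux, or your router’s IP). Note that dig skips the nsswitch step: it reads /etc/resolv.conf itself and always speaks DNS directly, which is why dig and your application can disagree about a name that is listed in /etc/hosts.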

The stub resolver in your OS sends a recursive query (RD=1 — “recursion desired”) to the configured recursive resolver. The stub resolver does almost nothing itself — it’s the recursive resolver that does the heavy lifting.

The recursive resolver (your ISP’s servers, or Cloudflare’s 1.1.1.1) checks its cache. On a miss, it starts from the root hints: a list of the 13 root server names (a through m.root-servers.net) with their IPv4 and IPv6 addresses, shipped with every resolver implementation. These rarely change; b.root-servers.net, for example, was renumbered in 2017 and again in 2023.

The resolution algorithm:

  1. Query a root server: “Who handles .com?”
  2. Root returns a referral (NS records + glue A records for those nameservers)
  3. Query a .com TLD server: “Who handles google.com?”
  4. TLD returns another referral
  5. Query the authoritative nameserver: “What’s the A record for google.com?”
  6. Authoritative server returns the answer
  7. Resolver caches everything with appropriate TTLs, returns to stub

The referrals in steps 2 and 4 arrive in the Authority section of the response (with glue addresses in the Additional section), not the Answer section. The AA (Authoritative Answer) bit in the header flags tells you when you’ve finally hit the authoritative server.
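The referral-following loop can be sketched in Python against a toy, in-memory stand-in for the server hierarchy. Everything here is hypothetical (the server labels, the `SERVERS` table, and the documentation address 192.0.2.1); a real resolver sends packets, handles glue, tries multiple nameservers, and caches each step.

```python
from typing import Optional

# Hypothetical in-memory "servers": each holds either referrals (zone ->
# servers to ask next) or authoritative answers (name -> address).
SERVERS = {
    "root": {"referral": {"com.": ["tld-com"]}},
    "tld-com": {"referral": {"google.com.": ["ns-google"]}},
    "ns-google": {"answer": {"google.com.": "192.0.2.1"}},
}


def in_zone(qname: str, zone: str) -> bool:
    """True if qname equals the zone or is a subdomain of it."""
    return qname == zone or qname.endswith("." + zone)


def query(server: str, qname: str) -> dict:
    """Stand-in for sending one non-recursive query to one server."""
    data = SERVERS[server]
    if qname in data.get("answer", {}):
        return {"aa": True, "answer": data["answer"][qname]}
    zones = [z for z in data.get("referral", {}) if in_zone(qname, z)]
    if zones:
        best = max(zones, key=len)  # most specific delegation wins
        return {"aa": False, "authority": data["referral"][best]}
    return {"aa": True, "answer": None}  # authoritative "no such name"


def resolve(qname: str, max_referrals: int = 10) -> tuple[Optional[str], list[str]]:
    """Follow referrals from the root until a reply has AA set."""
    server, path = "root", []
    for _ in range(max_referrals):
        path.append(server)
        reply = query(server, qname)
        if reply["aa"]:
            return reply["answer"], path
        server = reply["authority"][0]  # a real resolver would try several
    raise RuntimeError("too many referrals")
```

Calling `resolve("google.com.")` walks root, then the .com server, then the authoritative server, mirroring steps 1 through 6 above. The referral cap guards against delegation loops.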

Negative Caching

RFC 2308 (1998) specified that “NXDOMAIN” (name doesn’t exist) responses should also be cached. Before that, a query for a nonexistent domain would hammer the authoritative server on every retry.

The negative TTL is the smaller of the SOA record’s own TTL and its MINIMUM field (the seventh and last field of the SOA record). So when your application retries a lookup for a hostname that didn’t exist a moment ago, it’s not actually querying the authoritative server every time; it’s getting a cached NXDOMAIN. That is why a newly added DNS entry can appear not to work right after you’ve configured it, if anything queried the name before it existed.
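A minimal Python sketch of that rule, assuming a hypothetical `NegativeCache` class with an injectable clock so the expiry behaviour is easy to see. The TTL calculation follows RFC 2308; the rest is illustrative only.

```python
import time


class NegativeCache:
    """Caches NXDOMAIN results for min(SOA TTL, SOA MINIMUM) seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._expires = {}  # lowercased name -> absolute expiry time

    def store_nxdomain(self, name: str, soa_ttl: int, soa_minimum: int) -> int:
        """Record that `name` does not exist; return the TTL applied."""
        ttl = min(soa_ttl, soa_minimum)
        self._expires[name.lower()] = self._clock() + ttl
        return ttl

    def is_known_missing(self, name: str) -> bool:
        """True while a cached NXDOMAIN for `name` is still fresh."""
        key = name.lower()
        expiry = self._expires.get(key)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._expires[key]  # expired: the next lookup goes upstream
            return False
        return True
```

With a SOA TTL of 3600 and a MINIMUM of 300, a name queried before it was created keeps returning NXDOMAIN from this cache for up to five minutes, regardless of what the authoritative server now says.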

DNSSEC: The Chain of Trust

DNSSEC adds a handful of new record types:

  • RRSIG — cryptographic signature over a set of resource records
  • DNSKEY — the public key used to verify signatures
  • DS — a hash of a child zone’s DNSKEY, stored in the parent zone
  • NSEC/NSEC3 — authenticated denial of existence (proves a name doesn’t exist)

The trust chain works like a certificate hierarchy. Validating resolvers are configured with the root zone’s key-signing key (KSK), managed by ICANN, as their trust anchor. The root zone contains DS records for each signed TLD. Each TLD contains DS records for signed delegations. Each zone operator signs its own records.

A validating resolver verifies the chain from root to leaf. If any signature fails, the resolver returns SERVFAIL rather than a potentially spoofed answer.
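To make the parent-child link concrete, here is a Python sketch of how a DS digest and key tag are derived from a DNSKEY: the digest is a hash over the owner name in canonical wire form followed by the DNSKEY RDATA (RFC 4034, section 5.1.4), and the key tag is the checksum from RFC 4034, Appendix B. The function names are hypothetical, and any key bytes you feed it in testing are placeholders rather than a real key.

```python
import hashlib
import struct


def encode_owner(name: str) -> bytes:
    """Canonical wire form of an owner name: lowercase, length-prefixed labels."""
    out = bytearray()
    for label in name.lower().rstrip(".").split("."):
        if label:
            out.append(len(label))
            out += label.encode("ascii")
    out.append(0)
    return bytes(out)


def dnskey_rdata(flags: int, protocol: int, algorithm: int, public_key: bytes) -> bytes:
    """DNSKEY RDATA: flags (2 bytes), protocol, algorithm, then the key."""
    return struct.pack("!HBB", flags, protocol, algorithm) + public_key


def key_tag(rdata: bytes) -> int:
    """Key tag checksum (RFC 4034, Appendix B; not valid for algorithm 1)."""
    acc = 0
    for i, byte in enumerate(rdata):
        acc += byte if i & 1 else byte << 8
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF


def ds_digest_sha256(owner: str, rdata: bytes) -> str:
    """DS digest type 2: SHA-256 over owner name followed by DNSKEY RDATA."""
    return hashlib.sha256(encode_owner(owner) + rdata).hexdigest().upper()
```

A validator recomputes this digest from the child’s DNSKEY and compares it with the DS record served by the parent; a mismatch breaks the chain and produces the SERVFAIL described above.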

The weak point: the DS record in the parent zone must be updated whenever you create or rotate the key it refers to. Rotating keys is called key rollover, and it’s operationally painful. Get it wrong and your entire domain stops resolving for validating resolvers. ICANN performed the first-ever root KSK rollover on 11 October 2018, a year after the originally scheduled date; it was postponed because telemetry showed too many resolvers weren’t ready.

# Check whether a domain is DNSSEC-signed (look for RRSIG records in the answer)
dig +dnssec example.com A

# Validate the answer and log each step of the chain of trust
delv +vtrace example.com

DNS over HTTPS vs DNS over TLS

Both protocols encrypt DNS traffic. The difference is operational:

DoT (RFC 7858) runs on TCP port 853. It’s DNS over TLS: same wire format, just encrypted. Easy to identify on the network (the dedicated port gives it away, even though the contents are encrypted) and easy to block (just firewall port 853).

DoH (RFC 8484) runs on port 443, the same as HTTPS. It carries DNS messages in HTTP requests (HTTP/2 is the recommended minimum) with the application/dns-message content type. On the wire it is hard to tell apart from regular web traffic. This is why some enterprise firewalls have trouble blocking it, and why ISPs hate it: it breaks their ability to log and redirect queries.
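For the GET form of DoH, the wire-format message is base64url-encoded with padding stripped and passed in a `dns` query parameter. The Python sketch below builds such a URL without sending it; the helper names are hypothetical, and the Cloudflare endpoint is used only as an example of a public DoH resolver.

```python
import base64
import struct


def build_query(name: str, qtype: int = 1) -> bytes:
    """Minimal DNS query for `name` (qtype 1 = A) with ID 0 and RD set."""
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # class 1 = IN


def doh_get_url(endpoint: str, message: bytes) -> str:
    """Encode a DNS message for DoH GET: base64url with padding removed."""
    encoded = base64.urlsafe_b64encode(message).rstrip(b"=").decode("ascii")
    return f"{endpoint}?dns={encoded}"


# Built but not fetched; a client would GET this URL with the header
# "Accept: application/dns-message".
url = doh_get_url("https://cloudflare-dns.com/dns-query", build_query("example.com"))
```

The query ID is fixed at 0 because RFC 8484 recommends it: identical questions then produce identical URLs, which lets HTTP caches do their job.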

There’s also ODoH (Oblivious DoH, RFC 9230), a relay-based design in which a proxy sees the client’s IP address but not the query, and the resolver sees the query but not the client’s IP address. Cloudflare, Apple, and Fastly announced it in December 2020. The latency overhead of the extra hop is on the order of 10-15ms.

Split-Horizon DNS

A common production pattern: serve different DNS answers based on where the query comes from.

Internal clients querying api.yourcompany.com should reach 10.0.1.50 (internal address, no transit costs, lower latency). External clients should reach 203.0.113.10 (public load balancer).

In BIND this is view blocks. In AWS Route 53, it’s “private hosted zones” associated with your VPC. In Kubernetes, CoreDNS handles internal service discovery separately from the upstream resolver.
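The view-selection logic is simple enough to sketch in Python. This is a toy model, not how BIND or Route 53 are implemented: the `VIEWS` table and `answer_for` function are hypothetical, and the 10.0.0.0/8 internal range is an assumption chosen to match the addresses above.

```python
import ipaddress

# Hypothetical view table, checked top to bottom; the first match wins.
VIEWS = [
    ("internal", [ipaddress.ip_network("10.0.0.0/8")],
     {"api.yourcompany.com": "10.0.1.50"}),
    ("external", [ipaddress.ip_network("0.0.0.0/0")],
     {"api.yourcompany.com": "203.0.113.10"}),
]


def answer_for(client_ip: str, qname: str):
    """Return (view name, address) for the first view matching the client."""
    addr = ipaddress.ip_address(client_ip)
    for view_name, networks, records in VIEWS:
        if any(addr in net for net in networks):
            return view_name, records.get(qname.lower().rstrip("."))
    return None, None  # no view matched (e.g. an IPv6 client in this table)
```

Here `answer_for("10.2.3.4", "api.yourcompany.com")` lands in the internal view, while any other IPv4 source gets the public address. The answer depends entirely on where the query comes from, which matters for anything that checks the name from a different network.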

The footgun: your monitoring system sits outside your VPC, but you meant it to test internal endpoints. It resolves the name through the external view and gets the public IP. The internal endpoint goes down, the public one keeps answering, and monitoring says everything is fine. You find out via a customer at 2am.

The SERVFAIL Problem

SERVFAIL is the catch-all DNS error that means “the resolver tried but something went wrong.” Causes:

  • DNSSEC validation failure (a frequent cause on validating resolvers)
  • Authoritative servers unreachable or not responding
  • Malformed response from authoritative server
  • EDNS mismatch (the authoritative server doesn’t support EDNS0 extensions)

The EDNS one is sneaky. Resolvers send EDNS0 extensions in queries to signal support for large UDP responses and DNSSEC. Some old authoritative servers return FORMERR (format error), and the worst ones drop the query entirely, instead of ignoring extensions they don’t understand. Resolvers historically retried without EDNS, but since DNS Flag Day in 2019 the major resolver implementations no longer work around servers that time out on EDNS queries. This silently breaks resolution for some domains on some resolvers.

# Simulate a non-EDNS query to diagnose
dig +noedns google.com
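For reference, the EDNS0 signal is a single OPT pseudo-record appended to the Additional section of the query. The Python sketch below builds one following the field layout in RFC 6891; the function names are hypothetical, and the 1232-byte default payload size is an assumption (a commonly recommended value), not something the protocol mandates.

```python
import struct

OPT_TYPE = 41
DO_BIT = 0x8000  # "DNSSEC OK" flag, top bit of the 16-bit EDNS flags


def build_opt_record(udp_payload_size: int = 1232, dnssec_ok: bool = True) -> bytes:
    """Build an empty EDNS0 OPT pseudo-record (no options attached)."""
    ext_rcode, version = 0, 0
    flags = DO_BIT if dnssec_ok else 0
    return (
        b"\x00"                                           # owner name: root
        + struct.pack("!HH", OPT_TYPE, udp_payload_size)  # TYPE, CLASS = size
        + struct.pack("!BBH", ext_rcode, version, flags)  # TTL field, reused
        + struct.pack("!H", 0)                            # RDLENGTH = 0
    )


def add_edns(query: bytes, **opt_kwargs) -> bytes:
    """Append an OPT record to a query and increment ARCOUNT in its header."""
    (arcount,) = struct.unpack("!H", query[10:12])
    return (
        query[:10]
        + struct.pack("!H", arcount + 1)
        + query[12:]
        + build_opt_record(**opt_kwargs)
    )
```

Those 11 extra bytes are all it takes to trip a server that mishandles EDNS, which is why comparing a query with and without them is a quick way to isolate the problem.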

Anycast and Global DNS Performance

The 13 “root servers” are actually 13 server identities, each reachable at anycast addresses. Verisign, for example, runs a.root-servers.net at dozens of sites worldwide, all advertising 198.41.0.4. BGP routing sends your query to the topologically nearest site.

Cloudflare’s 1.1.1.1 uses the same trick — it’s one IP address but thousands of servers in Cloudflare’s global network. When you query 1.1.1.1, you’re talking to a server in the nearest Cloudflare point of presence.

This is how DNS achieves response times of a few milliseconds from most of the world while operating at a scale on the order of trillions of queries per day.

Practical Debugging Toolkit

# Basic lookup
dig api.github.com

# Trace the full resolution path
dig +trace api.github.com

# Check specific record types
dig MX gmail.com
dig TXT github.com  # often has SPF records

# Query a specific resolver
dig @8.8.8.8 cloudflare.com

# Reverse lookup
dig -x 1.1.1.1

# Show the answer with its RRSIG signatures
# (drop +short and look for the "ad" flag to confirm the resolver validated it)
dig +dnssec +short cloudflare.com

# Find authoritative nameservers
dig NS example.com

# What's the SOA (useful for negative caching TTL)
dig SOA example.com

One underused trick: dig +short strips all the metadata and gives you just the answer. dig +norecurse @a.iana-servers.net example.com lets you query a nameserver non-recursively to see exactly what it knows — useful for debugging propagation issues where you want to verify the authoritative server has the right record before waiting for resolver caches to expire.

One Thing to Remember

DNS is deceptively simple at the surface and genuinely complex under load. The failure modes — cache poisoning, DNSSEC misconfigurations, negative caching gotchas, split-horizon bugs — almost never show up in tutorials. Understanding the protocol at the wire level is what separates engineers who “fixed it by waiting” from engineers who know exactly why it was broken and when it will propagate.
