Physical Address
Metro Manila, Philippines
In the first article, we explained how authoritative DNS can influence traffic direction.
The main idea was simple:
DNS steering returns different DNS answers based on available signals and policy.
Those signals may include country, ASN, resolver IP, health status, failover rules, or EDNS Client Subnet.
But two important design questions sit behind every DNS steering system:
Where are routing decisions prepared?
Where are DNS answers returned?
This is where the terms control plane and data plane become useful.
In DNS steering, the control plane is where policies are created, validated, stored, and prepared.
The data plane is where DNS queries are answered.
A good DNS steering system separates these two responsibilities.
That separation helps keep DNS answers fast, predictable, and safe.
Authoritative DNS servers must answer quickly.
When a recursive resolver asks for a DNS record, the authoritative DNS server should not perform slow work before replying. It should not depend on a heavy database query for every request. It should not rebuild routing logic for every lookup. It should not wait for multiple external systems before sending an answer.
DNS is often one of the first steps before a user reaches a website, API, app, or streaming service.
If DNS is slow, the user experience is already affected before the application starts loading.
This is why DNS steering needs a clear split:
Control plane:
Prepare the decision.
Data plane:
Return the answer.
The control plane can take more time because it works before the query arrives.
The data plane must be fast because it works while the query is waiting.
The control plane is the management and preparation side of DNS steering.
It answers questions like:
What domains are managed?
What routing policies exist?
Which endpoints belong to each pool?
Which countries, ASNs, or networks match each policy?
Which endpoints are healthy?
Which fallback path should be used?
Which version of the policy is active?
The control plane does not usually answer live DNS queries directly.
Its job is to prepare the data needed by the authoritative DNS answering layer.
A control plane may include:
The control plane is where humans and automation define intent.
For example:
For users in the Philippines, prefer Manila.
For users in Singapore, prefer Singapore.
If Manila is unhealthy, use Singapore.
If no country match exists, use the global endpoint.
That is intent.
The control plane turns that intent into prepared routing data that the data plane can use quickly.
The data plane is the real-time answering side of DNS steering.
It receives DNS queries and returns DNS responses.
Its work should be narrow and fast.
A data plane may perform steps like:
Receive DNS query.
Identify the requested domain.
Read requester signal.
Find matching prepared policy.
Check prepared health state.
Choose the safest valid answer.
Return DNS response with TTL.
The data plane should avoid slow decisions.
It should avoid complex calculations per query.
It should use prepared state that is already loaded, checked, and ready.
In simple terms:
The control plane thinks ahead.
The data plane answers now.
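The steps above can be sketched in Python. This is only a minimal illustration under stated assumptions: `SNAPSHOT`, `answer_query`, and the dictionary layout are hypothetical names, the requester country is assumed to be already derived from the requester signal, and an unhealthy pool simply falls through to the default pool.

```python
# Hypothetical prepared state, loaded into memory before any query arrives.
SNAPSHOT = {
    "video.example.com": {
        "ttl": 60,
        "rules": {"PH": "manila", "SG": "singapore"},  # country -> pool name
        "default_pool": "global",
        "pools": {
            "manila": {"healthy": True, "answers": ["203.0.113.10", "203.0.113.11"]},
            "singapore": {"healthy": True, "answers": ["198.51.100.20", "198.51.100.21"]},
            "global": {"healthy": True, "answers": ["192.0.2.30"]},
        },
    }
}


def answer_query(name, rtype, country):
    """Return A records for one query using only in-memory lookups."""
    zone = SNAPSHOT.get(name)
    if zone is None or rtype != "A":
        return []  # a real server would send NXDOMAIN or NODATA here
    # Find the matching prepared policy, or the default when no rule matches.
    pool_name = zone["rules"].get(country, zone["default_pool"])
    pool = zone["pools"][pool_name]
    if not pool["healthy"]:
        # Prepared health state says no, so use the default pool instead.
        pool = zone["pools"][zone["default_pool"]]
    return [f"{name}. {zone['ttl']} IN A {ip}" for ip in pool["answers"]]
```

Calling `answer_query("video.example.com", "A", "PH")` returns the two Manila records with a TTL of 60. Nothing in the function touches a database or a remote service.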
A simple DNS steering system may look like this:
Operator or automation
|
v
Control plane
|
| prepares validated routing state
v
Runtime snapshot
|
| loaded by authoritative DNS layer
v
Data plane
|
| answers DNS queries
v
Recursive resolver
|
v
User device
The control plane prepares.
The data plane serves.
This structure keeps live DNS answering separate from slower management work.
The control plane should handle tasks that need validation, storage, review, or processing.
Operators need a place to define routing policies.
Example:
Domain: video.example.com
Rule 1:
If country is PH, answer Manila pool.
Rule 2:
If country is SG, answer Singapore pool.
Rule 3:
If no rule matches, answer global pool.
The control plane stores this policy.
It should also validate the policy before publishing it.
A policy may contain errors.
Examples:
A rule points to a deleted endpoint.
A country code is invalid.
A pool has no healthy endpoint.
A fallback path points back to itself.
A domain has no default answer.
These errors should be caught before the policy reaches the data plane.
The data plane should not discover basic policy errors during live DNS queries.
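A pre-publish check for the errors listed above might look like the following sketch. The policy layout and the names `validate_policy`, `known_endpoints`, `healthy_endpoints`, and `valid_countries` are assumptions made for illustration, not part of any specific product.

```python
def validate_policy(policy, known_endpoints, healthy_endpoints, valid_countries):
    """Return a list of problems; an empty list means the policy may be published."""
    errors = []
    pools = policy.get("pools", {})
    for pool_name, endpoints in pools.items():
        for ip in endpoints:
            if ip not in known_endpoints:
                errors.append(f"pool {pool_name} references deleted endpoint {ip}")
        if not any(ip in healthy_endpoints for ip in endpoints):
            errors.append(f"pool {pool_name} has no healthy endpoint")
    for country, pool_name in policy.get("rules", {}).items():
        if country not in valid_countries:
            errors.append(f"invalid country code {country}")
        if pool_name not in pools:
            errors.append(f"rule for {country} points to missing pool {pool_name}")
    for pool_name, fallback in policy.get("fallbacks", {}).items():
        if fallback == pool_name:
            errors.append(f"fallback for {pool_name} points back to itself")
    if policy.get("default_pool") not in pools:
        errors.append("domain has no default answer")
    return errors
```

This sketch only catches a fallback that points directly at itself. Longer loops, such as A to B and back to A, would need a walk over the whole fallback chain.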
DNS steering often groups service endpoints into pools.
Example:
Manila pool:
203.0.113.10
203.0.113.11
Singapore pool:
198.51.100.20
198.51.100.21
Global pool:
192.0.2.30
The control plane manages these pools.
It defines which endpoints exist, which pool they belong to, and which domains can use them.
Health checks can come from different sources.
They may check:
HTTP status
TCP port reachability
DNS reachability
Application health endpoint
Origin availability
Edge node availability
The control plane should collect and process this state.
It should decide what health information is safe to publish to the data plane.
This matters because health checks can be wrong.
A single failed check should not always remove an endpoint.
A stale health check should not always be trusted.
A good control plane can apply rules such as:
Mark endpoint unhealthy only after 3 failed checks.
Mark endpoint healthy only after 2 successful checks.
Ignore health data older than a defined age.
Keep last known safe state if health input disappears.
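The rules above can be expressed as a small state tracker. The following is a sketch, not a reference implementation: the class name `EndpointHealth`, the method `observe`, and the 30-second age limit are assumptions, while the thresholds of 3 failures and 2 successes come from the example rules.

```python
from dataclasses import dataclass


@dataclass
class EndpointHealth:
    """Tracks the published health of one endpoint with simple hysteresis."""
    healthy: bool = True
    fail_streak: int = 0
    ok_streak: int = 0
    fail_threshold: int = 3        # failed checks needed to mark unhealthy
    ok_threshold: int = 2          # successful checks needed to mark healthy
    max_age_seconds: float = 30.0  # assumed limit; older results are ignored

    def observe(self, check_ok, checked_at, now):
        """Feed one check result and return the state that is safe to publish."""
        if now - checked_at > self.max_age_seconds:
            return self.healthy  # stale input: keep the last known state
        if check_ok:
            self.ok_streak += 1
            self.fail_streak = 0
            if not self.healthy and self.ok_streak >= self.ok_threshold:
                self.healthy = True
        else:
            self.fail_streak += 1
            self.ok_streak = 0
            if self.healthy and self.fail_streak >= self.fail_threshold:
                self.healthy = False
        return self.healthy
```

Because the published state only flips after a streak, one lost probe does not remove an endpoint, and one lucky probe does not bring a failing endpoint back.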
DNS steering often uses country, region, or ASN data.
Raw network intelligence can be large.
The control plane should prepare only the data needed for active policies.
For example, if the active policy only uses country and ASN matching, the data plane may not need city-level data.
This keeps the data plane smaller and faster.
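As a rough sketch of that pruning step, the control plane could copy only the fields that active policies use. The row layout and the function name `prune_geo_table` are hypothetical.

```python
def prune_geo_table(raw_rows, needed_fields=("country", "asn")):
    """Keep only the fields that active policies match on, keyed by prefix."""
    return {
        row["prefix"]: {field: row[field] for field in needed_fields}
        for row in raw_rows
    }
```

City names, coordinates, and any other unused columns stay in the control plane and never reach the answering layer.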
A runtime snapshot is a prepared copy of the state needed for fast DNS decisions.
It may include:
Active domains
Active records
Policy rules
Endpoint pools
Health status
Fallback rules
GeoIP mappings
ASN mappings
TTL values
Version information
The snapshot should be complete enough for the data plane to answer queries without calling slow systems.
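One way to model such a snapshot is as a read-only object, so the data plane cannot change it by accident. The sketch below covers only part of the list above, and the names `RuntimeSnapshot` and `build_snapshot` and the field layout are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class RuntimeSnapshot:
    """Read-only bundle of the state the data plane needs to answer."""
    version: int
    ttl: int
    rules: MappingProxyType      # (domain, country) -> pool name
    pools: MappingProxyType      # pool name -> tuple of IP strings
    health: MappingProxyType     # pool name -> bool
    fallbacks: MappingProxyType  # pool name -> fallback pool name


def build_snapshot(version, ttl, rules, pools, health, fallbacks):
    """Copy mutable control plane data into read-only structures."""
    return RuntimeSnapshot(
        version=version,
        ttl=ttl,
        rules=MappingProxyType(dict(rules)),
        pools=MappingProxyType({name: tuple(ips) for name, ips in pools.items()}),
        health=MappingProxyType(dict(health)),
        fallbacks=MappingProxyType(dict(fallbacks)),
    )
```

The copies matter: later edits to the control plane's own dictionaries do not leak into a snapshot that has already been built.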
The control plane should publish changes safely.
A bad policy update should not break live DNS answers.
A safer workflow may look like this:
Create policy.
Validate policy.
Build runtime snapshot.
Test snapshot.
Publish snapshot.
Monitor result.
Rollback if needed.
Rollback is important.
If a new policy causes wrong answers, the system should be able to return to the previous known good version.
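The publish and rollback steps can be sketched with a small version history. `SnapshotPublisher` and its methods are hypothetical names, and the `check` argument stands in for whatever snapshot test a real system would run.

```python
class SnapshotPublisher:
    """Keeps published snapshots in order so the active one can be rolled back."""

    def __init__(self):
        self._history = []  # published snapshots, oldest first

    @property
    def active(self):
        """The snapshot the data plane should currently serve, or None."""
        return self._history[-1] if self._history else None

    def publish(self, snapshot, check):
        """Publish only if the test function accepts the snapshot."""
        if not check(snapshot):
            return False  # live answers keep using the current version
        self._history.append(snapshot)
        return True

    def rollback(self):
        """Return to the previous known good version, if there is one."""
        if len(self._history) < 2:
            return False
        self._history.pop()
        return True
```

A rejected snapshot never becomes active, and a rollback only changes which prepared version is served. Neither path requires the data plane to rebuild anything.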
The data plane should handle only the work needed to answer DNS queries.
The data plane receives a DNS query from a recursive resolver.
Example:
Query:
video.example.com A
The data plane must find the matching domain and record type.
The data plane may look at:
Source IP of the query, which is usually the recursive resolver's IP
EDNS Client Subnet, if present
Requested domain
Record type
Transport details
EDNS Client Subnet, defined in RFC 7871, lets a recursive resolver include a truncated client network prefix in the query it sends to the authoritative server. This can improve location-based decisions, but it can also affect caching and privacy.
Source:
https://www.rfc-editor.org/rfc/rfc7871
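A sketch of how the data plane might choose its requester signal is shown below. It assumes the EDNS Client Subnet option has already been parsed from the DNS message elsewhere; the function name `requester_network` and its parameters are hypothetical.

```python
import ipaddress


def requester_network(source_ip, ecs_address=None, ecs_prefix_len=None):
    """Pick the network used for classification: ECS if present, else source IP."""
    if ecs_address is not None and ecs_prefix_len is not None:
        # strict=False masks any host bits beyond the announced prefix length.
        return ipaddress.ip_network(f"{ecs_address}/{ecs_prefix_len}", strict=False)
    # No ECS: the only signal is the address the query came from.
    return ipaddress.ip_network(source_ip)
```

With ECS present, classification uses the client's network rather than the resolver's, which is why the two can lead to different answers for the same resolver.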
The data plane should match the query against prepared policy state.
Example:
Requested name: video.example.com
Requester country: PH
Requester ASN: ISP A
Record type: A
Matched policy:
PH users from ISP A use Manila pool.
This matching should be fast.
The data plane should not rebuild the policy tree during the query.
The data plane should use health state already prepared by the control plane.
Example:
Manila pool: healthy
Singapore pool: healthy
Global pool: healthy
If Manila is healthy, return Manila.
If Manila is unhealthy, use the configured fallback.
The data plane returns the selected answer.
Example:
video.example.com. 60 IN A 203.0.113.10
The response includes a TTL.
The TTL tells recursive resolvers how long they may cache the answer.
TTL and caching behavior are part of standard DNS operation, described in RFC 1034 and RFC 1035.
Sources:
https://www.rfc-editor.org/rfc/rfc1034
https://www.rfc-editor.org/rfc/rfc1035
The data plane should be simple because it handles live traffic.
Every extra task added to the data plane increases risk.
Bad examples:
Query database on every DNS request.
Call external health API during live lookup.
Download GeoIP data during live query.
Recalculate all policy rules per request.
Wait for a remote service before answering.
These actions can make DNS answers slow or unreliable.
Better design:
Prepare data before queries arrive.
Load only active runtime state.
Use local in-memory lookups.
Return answers quickly.
Fail safely when data is missing.
The data plane should not be the place where policy is created.
It should be the place where prepared policy is applied.
The control plane can perform more complex work because it is not answering live DNS queries directly.
It can:
Validate input.
Check policy conflicts.
Build snapshots.
Run simulations.
Compare policy versions.
Review health trends.
Generate reports.
Audit changes.
This work is important, but it does not need to happen inside the live DNS query path.
That is the benefit of separation.
The control plane can be careful.
The data plane can be fast.
Assume an operator wants to add a new endpoint in Hong Kong.
New endpoint:
Hong Kong endpoint:
203.0.113.50
The control plane may process the change like this:
1. Operator adds Hong Kong endpoint.
2. Control plane validates the IP address.
3. Control plane checks that the endpoint belongs to an active pool.
4. Health check system tests the endpoint.
5. Control plane marks the endpoint ready.
6. Runtime snapshot is rebuilt.
7. Snapshot is published to the data plane.
8. Data plane starts using the endpoint for matching queries.
The data plane does not need to know how the endpoint was created.
It only needs the final prepared result:
For matching requesters, Hong Kong endpoint is a valid answer.
Assume the Manila endpoint fails.
Before failure:
video.example.com
PH users
Answer: Manila endpoint
The health system detects failure.
The control plane processes the health result.
The prepared state changes:
Manila endpoint: unhealthy
Singapore endpoint: healthy
Fallback: Singapore
The data plane receives the updated snapshot.
After that, new DNS answers may return Singapore instead:
video.example.com. 60 IN A 198.51.100.20
This does not move every active user instantly.
Resolvers may keep serving the cached old answer until its TTL expires.
But new lookups can receive the safer answer.
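A rough estimate of how long the old answer can remain in use adds up the detection time, the publish delay, and the TTL. The function name and the numbers in the usage note are assumptions for illustration only, and the estimate ignores resolvers that do not honor the TTL.

```python
def worst_case_failover_seconds(check_interval, failed_checks_needed, publish_delay, ttl):
    """Estimate the longest time an old answer may still be handed out."""
    detection = check_interval * failed_checks_needed  # time to confirm failure
    return detection + publish_delay + ttl
```

For example, with a 10-second check interval, 3 failed checks required, a 5-second publish delay, and the 60-second TTL used above, `worst_case_failover_seconds(10, 3, 5, 60)` gives 95 seconds.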
A requester arrives, but the system cannot classify the country.
This can happen because of:
Unknown resolver IP
Missing GeoIP data
Private network address
Incomplete ASN data
No EDNS Client Subnet
Policy mismatch
The data plane should not stop.
It should apply a prepared fallback.
Example:
If country is unknown, use global pool.
If ASN is unknown, use country policy.
If both are unknown, use default answer.
This rule should come from the control plane.
The data plane should only apply it.
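The fallback order above could be applied with a most-specific-first lookup, as in this sketch. The rule keys of the form `(country, asn)` and the function name `select_pool` are assumptions; the rules themselves are expected to arrive prepared in the snapshot.

```python
def select_pool(rules, country=None, asn=None):
    """Try the most specific prepared rule first, then fall back step by step."""
    candidates = []
    if country and asn:
        candidates.append((country, asn))  # country and ASN both known
    if country:
        candidates.append((country, None))  # ASN unknown: country policy
    candidates.append((None, None))  # country unknown: default answer
    for key in candidates:
        if key in rules:
            return rules[key]
    raise KeyError("snapshot has no default rule")
```

The final `KeyError` should never fire in practice if the control plane rejects any policy that lacks a default answer before publishing it.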
When control plane and data plane responsibilities are mixed, several problems can appear.
If every query triggers database reads, external health checks, or large policy calculations, DNS response time can increase.
Slow DNS affects the first step of user connection.
If health state is checked directly at query time, answers may flip too often.
This can cause requesters to receive different answers in a short period, even when the real service state is unclear.
If policies are applied directly without versioned publishing, rollback becomes harder.
A bad rule may affect live DNS answers immediately.
If the data plane depends on many systems during live queries, those systems become part of the DNS answering path.
A database issue, API timeout, or storage delay can affect DNS responses.
If decisions are not versioned, operators may struggle to answer:
Which policy caused this DNS answer?
Which snapshot was active?
Was fallback used?
Was health data stale?
Was the requester classified correctly?
A clean separation makes these questions easier to answer.
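One way to make those questions answerable is to record a small, structured entry per decision or per sample of decisions. The following sketch uses hypothetical field names that mirror the questions above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DecisionRecord:
    """What was decided for one query, and from which prepared state."""
    qname: str
    snapshot_version: int
    matched_rule: str
    fallback_used: bool
    health_age_seconds: float
    requester_country: str


def to_log_line(record):
    """Serialize a decision as one JSON line for later auditing."""
    return json.dumps(asdict(record), sort_keys=True)
```

Because each record carries the snapshot version, an operator can trace an unexpected answer back to the exact policy that produced it.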
A practical DNS steering system should follow this rule:
Move slow, risky, and complex work into the control plane.
Keep the data plane fast, bounded, and predictable.
This rule helps protect DNS performance.
It also makes policy changes safer.
| Area | Control Plane | Data Plane |
|---|---|---|
| Main job | Prepare routing state | Answer DNS queries |
| Speed requirement | Can take more time | Must be fast |
| Handles policy creation | Yes | No |
| Handles validation | Yes | Minimal |
| Handles live DNS queries | No | Yes |
| Uses database | Yes, usually | Avoid per-query database use |
| Uses prepared snapshot | Creates it | Reads it |
| Handles rollback | Yes | Uses selected version |
| Best design goal | Correct and safe policy | Fast and safe answers |
DNS steering has two major parts.
The control plane prepares the decision.
The data plane returns the answer.
This separation matters because DNS answers must be fast, but routing policy can be complex.
A good control plane validates policy, processes health, prepares runtime state, and publishes safe versions.
A good data plane receives queries, reads prepared state, applies policy, and returns answers quickly.
The better these two parts are separated, the easier it becomes to build a DNS steering system that is fast, reliable, and easy to operate.
In the next article, we will discuss why DNS steering is not load balancing, and why that distinction matters when designing traffic direction systems.
RFC 1034, Domain Names, Concepts and Facilities:
https://www.rfc-editor.org/rfc/rfc1034
RFC 1035, Domain Names, Implementation and Specification:
https://www.rfc-editor.org/rfc/rfc1035
RFC 7871, Client Subnet in DNS Queries:
https://www.rfc-editor.org/rfc/rfc7871