The Control Plane And Data Plane Of DNS Steering

In the first article, we explained how authoritative DNS can influence traffic direction.

The main idea was simple:

DNS steering returns different DNS answers based on available signals and policy.

Those signals may include country, ASN, resolver IP, health status, failover rules, or EDNS Client Subnet.

But there is an important design question behind every DNS steering system:

Where are routing decisions prepared?
Where are DNS answers returned?

This is where the terms control plane and data plane become useful.

In DNS steering, the control plane is where policies are created, validated, stored, and prepared.

The data plane is where DNS queries are answered.

A good DNS steering system separates these two responsibilities.

That separation helps keep DNS answers fast, predictable, and safe.

Why This Distinction Matters

Authoritative DNS servers must answer quickly.

When a recursive resolver asks for a DNS record, the authoritative DNS server should not perform slow work before replying. It should not depend on a heavy database query for every request. It should not rebuild routing logic for every lookup. It should not wait for multiple external systems before sending an answer.

DNS is often one of the first steps before a user reaches a website, API, app, or streaming service.

If DNS is slow, the user experience is already affected before the application starts loading.

This is why DNS steering needs a clear split:

Control plane:
Prepare the decision.

Data plane:
Return the answer.

The control plane can take more time because it works before the query arrives.

The data plane must be fast because it works while the query is waiting.

What Is The Control Plane?

The control plane is the management and preparation side of DNS steering.

It answers questions like:

What domains are managed?
What routing policies exist?
Which endpoints belong to each pool?
Which countries, ASNs, or networks match each policy?
Which endpoints are healthy?
Which fallback path should be used?
Which version of the policy is active?

The control plane does not usually answer live DNS queries directly.

Its job is to prepare the data needed by the authoritative DNS answering layer.

A control plane may include:

Policy management
Zone and record management
Endpoint and pool management
Health check processing
GeoIP and ASN data handling
Validation rules
Runtime snapshot generation
Publishing and rollback logic
Audit logs
Operator access control

The control plane is where humans and automation define intent.

For example:

For users in the Philippines, prefer Manila.
For users in Singapore, prefer Singapore.
If Manila is unhealthy, use Singapore.
If no country match exists, use the global endpoint.

That is intent.

The control plane turns that intent into prepared routing data that the data plane can use quickly.

What Is The Data Plane?

The data plane is the real-time answering side of DNS steering.

It receives DNS queries and returns DNS responses.

Its work should be narrow and fast.

A data plane may perform steps like:

Receive DNS query.
Identify the requested domain.
Read requester signal.
Find matching prepared policy.
Check prepared health state.
Choose the safest valid answer.
Return DNS response with TTL.

The data plane should avoid slow decisions.

It should avoid complex calculations per query.

It should use prepared state that is already loaded, checked, and ready.

In simple terms:

The control plane thinks ahead.
The data plane answers now.
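The steps above can be sketched in a few lines of Python.

This is a minimal sketch, not a real DNS server. The names SNAPSHOT, POOLS, and answer, the layout of the prepared state, and the idea that the requester country is already known are all assumptions made for illustration.

```python
# Hypothetical prepared state. In a real system the control plane would build it.
SNAPSHOT = {
    "video.example.com": {
        "ttl": 60,
        "rules": {"PH": "manila", "SG": "singapore"},  # country -> pool
        "default_pool": "global",
        "fallback": {"manila": "singapore", "singapore": "global"},
    }
}
POOLS = {
    "manila": {"healthy": True, "addresses": ["203.0.113.10", "203.0.113.11"]},
    "singapore": {"healthy": True, "addresses": ["198.51.100.20", "198.51.100.21"]},
    "global": {"healthy": True, "addresses": ["192.0.2.30"]},
}


def answer(name, country, snapshot=SNAPSHOT, pools=POOLS):
    """Return (addresses, ttl) for a query, or None if the name is not managed."""
    policy = snapshot.get(name)  # identify the requested domain
    if policy is None:
        return None
    # Find the matching prepared policy, or use the default pool.
    pool = policy["rules"].get(country, policy["default_pool"])
    seen = set()
    # Follow the prepared fallback chain until a healthy pool is found.
    # If every pool is unhealthy, the loop stops at the last pool it reached,
    # so the caller still gets an answer.
    while not pools[pool]["healthy"] and pool not in seen:
        seen.add(pool)
        pool = policy["fallback"].get(pool, policy["default_pool"])
    return pools[pool]["addresses"], policy["ttl"]
```

Every step is a dictionary lookup on data that is already in memory. Nothing in the function calls a database or a remote service.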

A Simple DNS Steering Architecture

A simple DNS steering system may look like this:

Operator or automation
        |
        v
Control plane
        |
        | prepares validated routing state
        v
Runtime snapshot
        |
        | loaded by authoritative DNS layer
        v
Data plane
        |
        | answers DNS queries
        v
Recursive resolver
        |
        v
User device

The control plane prepares.

The data plane serves.

This structure keeps live DNS answering separate from slower management work.

What The Control Plane Should Handle

The control plane should handle tasks that need validation, storage, review, or processing.

1. Policy Creation

Operators need a place to define routing policies.

Example:

Domain: video.example.com

Rule 1:
If country is PH, answer Manila pool.

Rule 2:
If country is SG, answer Singapore pool.

Rule 3:
If no rule matches, answer global pool.

The control plane stores this policy.

It should also validate the policy before publishing it.

2. Policy Validation

A policy may contain errors.

Examples:

A rule points to a deleted endpoint.
A country code is invalid.
A pool has no healthy endpoint.
A fallback path points back to itself.
A domain has no default answer.

These errors should be caught before the policy reaches the data plane.

The data plane should not discover basic policy errors during live DNS queries.
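A validation pass for the errors listed above could look like the sketch below.

The policy and pool layouts, the function name validate_policy, and the short VALID_COUNTRIES set are assumptions for illustration. The sketch checks pools rather than individual endpoints to stay short.

```python
VALID_COUNTRIES = {"PH", "SG", "HK"}  # assumption: stand-in for the full ISO 3166-1 list


def validate_policy(policy, pools):
    """Return a list of errors. An empty list means the policy may be published.

    policy: {"rules": {country: pool}, "fallback": {pool: pool}, "default_pool": pool}
    pools:  {pool: {address: is_healthy}}
    """
    errors = []
    for country, pool in policy.get("rules", {}).items():
        if country not in VALID_COUNTRIES:
            errors.append(f"invalid country code: {country}")
        if pool not in pools:
            errors.append(f"rule for {country} points to missing pool: {pool}")
    for source, target in policy.get("fallback", {}).items():
        if source == target:
            errors.append(f"fallback for {source} points back to itself")
        elif target not in pools:
            errors.append(f"fallback for {source} points to missing pool: {target}")
    # A missing default and a default that names a missing pool are both rejected.
    if policy.get("default_pool") not in pools:
        errors.append("domain has no usable default answer")
    for name, endpoints in pools.items():
        if not any(endpoints.values()):
            errors.append(f"pool has no healthy endpoint: {name}")
    return errors
```

The function returns every error it finds instead of stopping at the first one. That lets an operator fix a policy in one pass.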

3. Endpoint And Pool Management

DNS steering often groups service endpoints into pools.

Example:

Manila pool:
203.0.113.10
203.0.113.11

Singapore pool:
198.51.100.20
198.51.100.21

Global pool:
192.0.2.30

The control plane manages these pools.

It defines which endpoints exist, which pool they belong to, and which domains can use them.

4. Health State Processing

Health checks can come from different sources.

They may check:

HTTP status
TCP port reachability
DNS reachability
Application health endpoint
Origin availability
Edge node availability

The control plane should collect and process this state.

It should decide what health information is safe to publish to the data plane.

This matters because health checks can be wrong.

A single failed check should not always remove an endpoint.

A stale health check should not always be trusted.

A good control plane can apply rules such as:

Mark endpoint unhealthy only after 3 failed checks.
Mark endpoint healthy only after 2 successful checks.
Ignore health data older than a defined age.
Keep last known safe state if health input disappears.
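These rules can be expressed as a small state machine.

The sketch below is one possible shape, not a reference implementation. The class name EndpointHealth, the function record_check, and the 30 second maximum age are assumptions. The thresholds of 3 and 2 come from the rules above.

```python
from dataclasses import dataclass

FAIL_THRESHOLD = 3        # failed checks in a row before marking unhealthy
OK_THRESHOLD = 2          # successful checks in a row before marking healthy
MAX_AGE_SECONDS = 30.0    # assumption: the "defined age" for stale results


@dataclass
class EndpointHealth:
    healthy: bool = True
    fail_streak: int = 0
    ok_streak: int = 0


def record_check(state, success, checked_at, now):
    """Apply one check result and return the health value to publish."""
    if now - checked_at > MAX_AGE_SECONDS:
        # Stale result: ignore it and keep the last known state.
        return state.healthy
    if success:
        state.ok_streak += 1
        state.fail_streak = 0
        if not state.healthy and state.ok_streak >= OK_THRESHOLD:
            state.healthy = True
    else:
        state.fail_streak += 1
        state.ok_streak = 0
        if state.healthy and state.fail_streak >= FAIL_THRESHOLD:
            state.healthy = False
    return state.healthy
```

If health input disappears, record_check is simply never called. The state then stays at its last known value.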

5. GeoIP And ASN Data Preparation

DNS steering often uses country, region, or ASN data.

Raw network intelligence can be large.

The control plane should prepare only the data needed for active policies.

For example, if the active policy only uses country and ASN matching, the data plane may not need city-level data.

This keeps the data plane smaller and faster.
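As a rough illustration, the control plane can work out which fields the active rules use and copy only those into the prepared data.

The row layout, the rule layout, and the names RAW_GEO, fields_used, and prepare_geo are assumptions. The networks and ASNs are documentation values, not real assignments.

```python
# Hypothetical raw network intelligence with more detail than the policy needs.
RAW_GEO = [
    {"network": "2001:db8:1::/48", "country": "PH", "city": "Manila", "asn": 64500},
    {"network": "2001:db8:2::/48", "country": "SG", "city": "Singapore", "asn": 64501},
]


def fields_used(rules):
    """Collect the signal names that any active rule matches on."""
    used = set()
    for rule in rules:
        used.update(rule["match"].keys())
    return sorted(used)


def prepare_geo(raw_rows, fields):
    """Keep every network, but only the fields that active rules need."""
    return {row["network"]: {f: row[f] for f in fields} for row in raw_rows}
```

The sketch drops unused fields, not rows. Removing rows from a prefix table can change which entry an address matches, so that step needs more care.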

6. Runtime Snapshot Creation

A runtime snapshot is a prepared copy of the state needed for fast DNS decisions.

It may include:

Active domains
Active records
Policy rules
Endpoint pools
Health status
Fallback rules
GeoIP mappings
ASN mappings
TTL values
Version information

The snapshot should be complete enough for the data plane to answer queries without calling slow systems.
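One way to picture a snapshot is as a versioned, read-only copy of the prepared state.

The sketch below assumes the state can be serialized as JSON. The names Snapshot and build_snapshot and the use of a SHA-256 checksum are illustration choices, not requirements.

```python
import hashlib
import json
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class Snapshot:
    version: int
    checksum: str
    records: MappingProxyType  # read-only view: name -> prepared policy


def build_snapshot(version, records):
    """Copy the prepared state into a versioned snapshot."""
    # Sorted keys make the checksum depend on content, not on insertion order.
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    checksum = hashlib.sha256(payload).hexdigest()
    # Parsing the payload gives a deep copy, so later edits to `records`
    # cannot change a snapshot that was already built.
    # Note: only the top level is read-only. Nested dictionaries are not.
    frozen = MappingProxyType(json.loads(payload))
    return Snapshot(version=version, checksum=checksum, records=frozen)
```

The checksum gives the data plane a cheap way to confirm that it loaded the same content the control plane built.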

7. Publishing And Rollback

The control plane should publish changes safely.

A bad policy update should not break live DNS answers.

A safer workflow may look like this:

Create policy.
Validate policy.
Build runtime snapshot.
Test snapshot.
Publish snapshot.
Monitor result.
Rollback if needed.

Rollback is important.

If a new policy causes wrong answers, the system should be able to return to the previous known good version.
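The publish and rollback steps can be sketched with a simple version history.

This is a minimal sketch under stated assumptions. The class name Publisher and the validate callback are hypothetical, and the testing and monitoring steps are left out.

```python
class Publisher:
    """Keeps published snapshots in order so the previous one can be restored."""

    def __init__(self, validate):
        self._validate = validate  # callable that returns a list of errors
        self._history = []         # published snapshots, oldest first

    @property
    def active(self):
        return self._history[-1] if self._history else None

    def publish(self, snapshot):
        errors = self._validate(snapshot)
        if errors:
            # Refuse to publish. The active snapshot stays untouched.
            raise ValueError("; ".join(errors))
        self._history.append(snapshot)
        return snapshot

    def rollback(self):
        if len(self._history) < 2:
            raise RuntimeError("no previous snapshot to roll back to")
        self._history.pop()
        return self.active
```

A rejected snapshot never becomes active. A rollback returns the version that was active before the last publish.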

What The Data Plane Should Handle

The data plane should handle only the work needed to answer DNS queries.

1. Receive The Query

The data plane receives a DNS query from a recursive resolver.

Example:

Query:
video.example.com A

The data plane must find the matching domain and record type.

2. Read Requester Signal

The data plane may look at:

Source IP of the query, which is usually the recursive resolver's IP
EDNS Client Subnet, if present
Requested domain
Record type
Transport details

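EDNS Client Subnet (ECS), defined in RFC 7871, lets a recursive resolver include a truncated prefix of the client's address in the query. This gives the authoritative DNS server a partial view of the client network. It can improve location-based decisions, but it also splits resolver caches by client network and exposes some client information, so it affects both caching and privacy.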

Source:

https://www.rfc-editor.org/rfc/rfc7871
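To show what a partial client network means, the sketch below truncates a client address the way a resolver might before sending it.

RFC 7871 recommends that resolvers send at most 24 bits for IPv4 and 56 bits for IPv6. Actual resolvers may send shorter prefixes or none at all. The function name client_subnet is an assumption.

```python
import ipaddress

# Maximum prefix lengths recommended by RFC 7871 for resolvers.
IPV4_PREFIX = 24
IPV6_PREFIX = 56


def client_subnet(address):
    """Return the truncated network a resolver might send for this client."""
    ip = ipaddress.ip_address(address)
    prefix = IPV4_PREFIX if ip.version == 4 else IPV6_PREFIX
    # strict=False zeroes the host bits instead of raising an error.
    return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
```

Two clients in the same /24 produce the same value. The authoritative DNS server cannot tell them apart, and the resolver can share one cached answer between them.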

3. Match Prepared Policy

The data plane should match the query against prepared policy state.

Example:

Requested name: video.example.com
Requester country: PH
Requester ASN: ISP A
Record type: A

Matched policy:
PH users from ISP A use Manila pool.

This matching should be fast.

The data plane should not rebuild the policy tree during the query.
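One way to keep matching fast is to let the control plane flatten the policy into a lookup table.

The sketch below assumes a table keyed by name, country, and ASN, where None means "any". The names RULES and match_pool are hypothetical, and ASN 64500 is a documentation value standing in for ISP A.

```python
# Hypothetical prepared index: (name, country, asn) -> pool. None means "any".
RULES = {
    ("video.example.com", "PH", 64500): "manila",
    ("video.example.com", "PH", None): "manila",
    ("video.example.com", "SG", None): "singapore",
    ("video.example.com", None, None): "global",
}


def match_pool(name, country, asn, rules=RULES):
    """Try the most specific key first. Each step is one dictionary lookup."""
    for key in ((name, country, asn), (name, country, None), (name, None, None)):
        pool = rules.get(key)
        if pool is not None:
            return pool
    return None  # the name is not managed by this table
```

The query path performs at most three lookups. It never walks or rebuilds the original rule list.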

4. Check Prepared Health State

The data plane should use health state already prepared by the control plane.

Example:

Manila pool: healthy
Singapore pool: healthy
Global pool: healthy

If Manila is healthy, return Manila.

If Manila is unhealthy, use the configured fallback.

5. Return The DNS Answer

The data plane returns the selected answer.

Example:

video.example.com. 60 IN A 203.0.113.10

The response includes a TTL.

The TTL tells recursive resolvers how long they may cache the answer.

TTL and caching behavior are part of standard DNS operation, described in RFC 1034 and RFC 1035.

Sources:

https://www.rfc-editor.org/rfc/rfc1034
https://www.rfc-editor.org/rfc/rfc1035

Why The Data Plane Should Stay Small

The data plane should be simple because it handles live traffic.

Every extra task added to the data plane increases risk.

Bad examples:

Query database on every DNS request.
Call external health API during live lookup.
Download GeoIP data during live query.
Recalculate all policy rules per request.
Wait for a remote service before answering.

These actions can make DNS answers slow or unreliable.

Better design:

Prepare data before queries arrive.
Load only active runtime state.
Use local in-memory lookups.
Return answers quickly.
Fail safely when data is missing.

The data plane should not be the place where policy is created.

It should be the place where prepared policy is applied.
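Loading a new snapshot should not interrupt queries that are already being answered.

A common approach is to swap one reference. The sketch below assumes CPython, where rebinding an attribute is atomic, so a reader sees either the old snapshot or the new one. The class name SnapshotHolder is hypothetical.

```python
import threading


class SnapshotHolder:
    """Holds the active snapshot and lets a loader replace it in one step."""

    def __init__(self, initial):
        self._current = initial
        self._lock = threading.Lock()  # serializes writers only

    def get(self):
        # A query takes one reference and uses it until the answer is sent.
        return self._current

    def swap(self, new_snapshot):
        # Readers never take the lock, so loading does not block answering.
        with self._lock:
            old = self._current
            self._current = new_snapshot
        return old
```

A query that started before the swap finishes with the old snapshot. The next query reads the new one.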

Why The Control Plane Can Be More Complex

The control plane can perform more complex work because it is not answering live DNS queries directly.

It can:

Validate input.
Check policy conflicts.
Build snapshots.
Run simulations.
Compare policy versions.
Review health trends.
Generate reports.
Audit changes.

This work is important, but it does not need to happen inside the live DNS query path.

That is the benefit of separation.

The control plane can be careful.

The data plane can be fast.

Example: Policy Update Flow

Assume an operator wants to add a new endpoint in Hong Kong.

New endpoint:

Hong Kong endpoint:
203.0.113.50

The control plane may process the change like this:

1. Operator adds Hong Kong endpoint.
2. Control plane validates the IP address.
3. Control plane checks that the endpoint belongs to an active pool.
4. Health check system tests the endpoint.
5. Control plane marks the endpoint ready.
6. Runtime snapshot is rebuilt.
7. Snapshot is published to the data plane.
8. Data plane starts using the endpoint for matching queries.

The data plane does not need to know how the endpoint was created.

It only needs the final prepared result:

For matching requesters, Hong Kong endpoint is a valid answer.

Example: Health Failure Flow

Assume the Manila endpoint fails.

Before failure:

video.example.com
PH users
Answer: Manila endpoint

The health system detects failure.

The control plane processes the health result.

The prepared state changes:

Manila endpoint: unhealthy
Singapore endpoint: healthy
Fallback: Singapore

The data plane receives the updated snapshot.

After that, new DNS answers may return Singapore instead:

video.example.com. 60 IN A 198.51.100.20

This does not move every active user instantly.

Resolvers may still serve the old cached answer until its TTL expires.

But new lookups can receive the safer answer.
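A rough upper bound on the switch time can be estimated from a few numbers.

The sketch below assumes that resolvers honor the TTL. The 10 second check interval and 5 second publish delay are made-up values. The threshold of 3 failed checks and the TTL of 60 come from the examples in this article.

```python
def worst_case_switch_seconds(check_interval, fail_threshold, publish_delay, ttl):
    """Estimate the longest time a resolver may keep returning a failed endpoint."""
    detection = check_interval * fail_threshold  # time to confirm the failure
    # A resolver that cached the old answer just before the new snapshot
    # went live keeps it for one full TTL.
    return detection + publish_delay + ttl


# 10 s between checks, 3 failures, 5 s to publish, TTL of 60 s.
estimate = worst_case_switch_seconds(10, 3, 5, 60)
```

With these values the estimate is 95 seconds. A lower TTL shortens the last part, but it also increases query volume.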

Example: Missing Data Flow

A requester arrives, but the system cannot classify the country.

This can happen because of:

Unknown resolver IP
Missing GeoIP data
Private network address
Incomplete ASN data
No EDNS Client Subnet
Policy mismatch

The data plane should not stop.

It should apply a prepared fallback.

Example:

If country is unknown, use global pool.
If ASN is unknown, use country policy.
If both are unknown, use default answer.

This rule should come from the control plane.

The data plane should only apply it.
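The sketch below shows a classification step that fails safely.

The table GEO, the names classify_country and choose_pool, and the client networks are assumptions for illustration. The linear scan keeps the example short. A real data plane would likely use a prefix tree or a similar structure.

```python
import ipaddress

# Hypothetical prepared data. The client networks are documentation prefixes.
GEO = {
    ipaddress.ip_network("2001:db8:1::/48"): "PH",
    ipaddress.ip_network("2001:db8:2::/48"): "SG",
}
RULES = {"PH": "manila", "SG": "singapore"}
DEFAULT_POOL = "global"


def classify_country(source, geo=GEO):
    """Return a country code, or None when the source cannot be classified."""
    try:
        ip = ipaddress.ip_address(source)
    except ValueError:
        return None  # malformed input is treated as unknown
    for network, country in geo.items():
        if ip in network:
            return country
    return None  # no GeoIP entry, for example a private address


def choose_pool(source):
    """Apply the prepared fallback instead of failing the query."""
    country = classify_country(source)
    pool = RULES.get(country, DEFAULT_POOL)
    reason = "country_match" if country in RULES else "default"
    return pool, reason
```

An unknown requester still gets an answer. The reason value records that the default was used.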

What Can Go Wrong Without Separation

When control plane and data plane responsibilities are mixed, several problems can appear.

Slow DNS Answers

If every query triggers database reads, external health checks, or large policy calculations, DNS response time can increase.

Slow DNS delays the first step of a user connection.

Unstable Failover

If health state is checked directly during query time, answer behavior may change too often.

This can cause requesters to receive different answers in a short period, even when the real service state is unclear.

Harder Rollback

If policies are applied directly without versioned publishing, rollback becomes harder.

A bad rule may affect live DNS answers immediately.

Higher Failure Risk

If the data plane depends on many systems during live queries, those systems become part of the DNS answering path.

A database issue, API timeout, or storage delay can affect DNS responses.

Poor Observability

If decisions are not versioned, operators may struggle to answer:

Which policy caused this DNS answer?
Which snapshot was active?
Was fallback used?
Was health data stale?
Was the requester classified correctly?

A clean separation makes these questions easier to answer.
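One way to make these questions answerable is to record a small decision entry for each answer, or for a sample of answers.

The field names below are assumptions that mirror the questions above. The names DecisionRecord and to_log_line are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class DecisionRecord:
    name: str
    snapshot_version: int              # which snapshot was active
    matched_rule: str                  # which policy rule produced the answer
    pool: str
    fallback_used: bool
    health_age_seconds: float          # how old the health data was
    requester_country: Optional[str]   # None when classification failed


def to_log_line(record):
    """Serialize one decision as a single JSON line."""
    return json.dumps(asdict(record), sort_keys=True)
```

Because the snapshot version is part of every entry, an operator can tie a DNS answer back to the exact policy version that produced it.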

Important Design Rule

A practical DNS steering system should follow this rule:

Move slow, risky, and complex work into the control plane.
Keep the data plane fast, bounded, and predictable.

This rule helps protect DNS performance.

It also makes policy changes safer.

Control Plane Versus Data Plane Summary

Area	Control Plane	Data Plane
Main job	Prepare routing state	Answer DNS queries
Speed requirement	Can take more time	Must be fast
Handles policy creation	Yes	No
Handles validation	Yes	Minimal
Handles live DNS queries	No	Yes
Uses database	Yes, usually	Avoid per-query database use
Uses prepared snapshot	Creates it	Reads it
Handles rollback	Yes	Uses selected version
Best design goal	Correct and safe policy	Fast and safe answers

The Main Takeaway

DNS steering has two major parts.

The control plane prepares the decision.

The data plane returns the answer.

This separation matters because DNS answers must be fast, but routing policy can be complex.

A good control plane validates policy, processes health, prepares runtime state, and publishes safe versions.

A good data plane receives queries, reads prepared state, applies policy, and returns answers quickly.

The better these two parts are separated, the easier it becomes to build a DNS steering system that is fast, reliable, and easy to operate.

In the next article, we will discuss why DNS steering is not load balancing, and why that distinction matters when designing traffic direction systems.

Sources

RFC 1034, Domain Names, Concepts and Facilities:
https://www.rfc-editor.org/rfc/rfc1034

RFC 1035, Domain Names, Implementation and Specification:
https://www.rfc-editor.org/rfc/rfc1035

RFC 7871, Client Subnet in DNS Queries:
https://www.rfc-editor.org/rfc/rfc7871


lordfrancs3
