
Hands‑On DNS Ops: Deploy BIND and CoreDNS with Full Troubleshooting Guide

This comprehensive guide walks you through DNS fundamentals, compares BIND, CoreDNS, PowerDNS and Unbound, provides step‑by‑step deployment scripts for BIND 9.20 and CoreDNS 1.12, explains DNSSEC configuration, caching optimizations, security hardening, high‑availability designs, monitoring, backup and recovery procedures, and advanced troubleshooting techniques.

Raymond Ops

Overview

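The Domain Name System (DNS) underpins almost every Internet service. When DNS fails, clients cannot find even healthy services, so a single DNS outage can take everything down at once. Resolution works in two modes. In recursive resolution, a client (stub resolver) asks a resolver for the final answer. The resolver then resolves iteratively: it starts at the root servers and follows referrals to the top-level-domain servers and finally to the zone's authoritative servers, caching each answer for its TTL. Understanding this full lookup chain is the basis for a sound DNS architecture.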
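The iterative half of that process can be made concrete. A resolver that uses QNAME minimisation (RFC 9156) reveals one more label of the name at each step, starting from the root. The minimal bash sketch below, built around a hypothetical helper named delegation_chain, prints that sequence of names. Whether each step is an actual zone cut depends on the real delegations, which dig +trace shows for a live name.

```bash
#!/usr/bin/env bash
# delegation_chain is a hypothetical helper for illustration.
# Prints the names an iterative, QNAME-minimising resolver walks through:
# the root first, then one more label per step down to the full name.
delegation_chain() {
  local name="${1%.}"            # drop a trailing dot if present
  local -a labels
  IFS='.' read -r -a labels <<< "$name"
  echo "."
  local suffix="" i
  # Walk labels right to left: com. -> example.com. -> www.example.com.
  for (( i=${#labels[@]}-1; i>=0; i-- )); do
    suffix="${labels[i]}.${suffix}"
    echo "$suffix"
  done
}
```

For example, delegation_chain www.example.com prints ".", "com.", "example.com." and "www.example.com." on separate lines.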

Record Types

A : IPv4 address mapping (e.g., web IN A 192.168.1.10)

AAAA : IPv6 address mapping

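CNAME : Alias from one name to another (canonical) name

MX : Mail exchanger for the domain, with a preference value (lower is preferred)

NS : Delegation to name servers

TXT : Free-form text, used for SPF, DKIM and domain verification

SRV : Service location (priority, weight, port, target host)

PTR : Reverse lookup (IP address to name, under in-addr.arpa or ip6.arpa)

SOA : Start of authority: primary name server, admin mailbox, serial, and refresh/retry/expire/negative-caching timers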
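A PTR lookup queries a name built from the address with its octets reversed under in-addr.arpa, which is why the reverse zone for 192.168.1.0/24 further down is called 1.168.192.in-addr.arpa. A small sketch of that mapping, IPv4 only, using a hypothetical helper named ptr_name:

```bash
#!/usr/bin/env bash
# ptr_name is a hypothetical helper for illustration (IPv4 only).
# Builds the in-addr.arpa owner name that a PTR lookup (dig -x) queries:
# the four octets in reverse order under in-addr.arpa.
ptr_name() {
  local ip="$1" a b c d o
  IFS='.' read -r a b c d <<< "$ip"
  # Reject anything that is not four decimal octets in the range 0-255
  for o in "$a" "$b" "$c" "$d"; do
    [[ "$o" =~ ^[0-9]{1,3}$ ]] && (( 10#$o <= 255 )) || {
      echo "invalid IPv4 address: $ip" >&2
      return 1
    }
  done
  # 10#... strips leading zeros such as "010"
  echo "$((10#$d)).$((10#$c)).$((10#$b)).$((10#$a)).in-addr.arpa."
}
```

dig -x performs the same conversion internally, so ptr_name 192.168.1.100 yields the name that dig -x 192.168.1.100 actually asks for.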

Software Comparison

BIND 9.20 : Authoritative + recursive, full DNSSEC support, limited plugin ecosystem.

CoreDNS 1.12 : Cloud‑native DNS, plugin chain architecture, default Kubernetes DNS.

PowerDNS 4.9 : Authoritative DNS with database back‑ends.

Unbound 1.22 : Pure recursive resolver, low memory footprint.

Deployment – BIND 9.20

Installation

# Ubuntu / Debian
sudo apt update && sudo apt install -y bind9 bind9-utils bind9-dnsutils
# CentOS / RHEL
sudo dnf install -y bind bind-utils
# Verify version
named -v   # Expected output: BIND 9.20.x (distribution repositories may ship an older branch such as 9.18)

Global Options (named.conf.options)

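options {
    directory "/var/cache/bind";
    listen-on { 192.168.1.10; 127.0.0.1; };
    listen-on-v6 { none; };
    // Recursion only for private (RFC 1918) networks and localhost
    recursion yes;
    allow-recursion { 10.0.0.0/8; 172.16.0.0/12; 192.168.0.0/16; 127.0.0.1; };
    // Resolve through the upstream resolvers only; never iterate from the root
    forwarders { 223.5.5.5; 119.29.29.29; };
    forward only;
    dnssec-validation auto;
    // Response Rate Limiting: cap identical responses to a client netblock
    // at 10 per second, measured over a 5-second window
    rate-limit {
        responses-per-second 10;
        window 5;
    };
    // Cache up to 512 MB; keep positive answers at most 1 h, negative ones at most 5 min
    max-cache-size 512m;
    max-cache-ttl 3600;
    max-ncache-ttl 300;
    // Answer version.bind queries with a fixed string
    version "not disclosed";
    // No zone transfers by default; zones opt in with TSIG below
    allow-transfer { none; };
    allow-query { any; };
};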
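The allow-recursion list decides which clients get recursive service. As a rough model of how BIND matches a client address against such an address-match list (first match wins), here is a bash sketch. The helpers ip_to_int, ip_in_cidr and recursion_allowed are hypothetical, and the sketch ignores negated (!) entries and named ACLs.

```bash
#!/usr/bin/env bash
# Hypothetical helpers modelling an address-match list such as
# allow-recursion { 10.0.0.0/8; 172.16.0.0/12; 192.168.0.0/16; 127.0.0.1; };
# Simplification: no "!" negation, no named ACLs, IPv4 only.

# Convert dotted-quad IPv4 to a 32-bit integer
ip_to_int() {
  local a b c d
  IFS='.' read -r a b c d <<< "$1"
  echo $(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))
}

# Succeed if address $1 lies inside entry $2 (CIDR, or a bare address = /32)
ip_in_cidr() {
  local ip="$1" net="${2%/*}" bits="${2#*/}"
  [[ "$2" == */* ]] || bits=32
  local mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ( $(ip_to_int "$ip") & mask ) == ( $(ip_to_int "$net") & mask ) ))
}

# Succeed if the client matches any entry in the list (first match wins)
recursion_allowed() {
  local ip="$1" entry
  shift
  for entry in "$@"; do
    ip_in_cidr "$ip" "$entry" && return 0
  done
  return 1
}
```

For instance, recursion_allowed 172.20.1.5 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 127.0.0.1 succeeds because 172.16.0.0/12 spans 172.16.0.0 to 172.31.255.255, while 8.8.8.8 matches no entry and is refused recursion.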

Zone Definition (named.conf.local)

key "transfer-key" {
    algorithm hmac-sha256;
    secret "BASE64_ENCODED_SECRET_HERE";
};
zone "example.com" {
    type master;
    file "/var/lib/bind/db.example.com";
    allow-transfer { key "transfer-key"; };
    also-notify { 192.168.1.11; };
    notify yes;
    dnssec-policy default;
    inline-signing yes;
    key-directory "/var/lib/bind/keys/";
};
zone "1.168.192.in-addr.arpa" {
    type master;
    file "/var/lib/bind/db.192.168.1";
    allow-transfer { key "transfer-key"; };
    also-notify { 192.168.1.11; };
    notify yes;
};
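The transfer-key secret above is a placeholder. BIND ships tsig-keygen for generating it (tsig-keygen -a hmac-sha256 transfer-key prints a ready-made key statement). Where that tool is not at hand, this sketch builds an equivalent statement from 32 random bytes using only coreutils; make_tsig_key is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# make_tsig_key is a hypothetical helper; tsig-keygen is the tool BIND ships.
# Prints a named.conf key statement with a random 256-bit base64 secret.
make_tsig_key() {
  local name="${1:-transfer-key}" secret
  # 32 random bytes -> 44 base64 characters on one line
  secret=$(head -c 32 /dev/urandom | base64 | tr -d '\n')
  cat <<EOF
key "${name}" {
    algorithm hmac-sha256;
    secret "${secret}";
};
EOF
}
```

Store the output in a file readable only by root and the named group, include it on both servers rather than pasting the secret inline, and on the secondary reference it with server 192.168.1.10 { keys { transfer-key; }; }; so that its transfer requests are signed.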

Sample Zone File (db.example.com)

$TTL 3600
@   IN  SOA ns1.example.com. admin.example.com. (
        2026022601 ; serial
        3600       ; refresh
        900        ; retry
        604800     ; expire
        300 )      ; negative TTL
@   IN  NS  ns1.example.com.
@   IN  NS  ns2.example.com.
ns1 IN  A   192.168.1.10
ns2 IN  A   192.168.1.11
@   IN  A   192.168.1.100
www IN  CNAME @
mail IN  A   192.168.1.20
api  IN  A   192.168.1.101
_sip._tcp IN SRV 10 60 5060 sip.example.com.
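The serial 2026022601 follows the common YYYYMMDDNN convention (date plus a two-digit change counter). Secondaries only transfer a zone when its serial increases, so the serial has to be bumped on every edit. A sketch of that bump, using a hypothetical helper named next_serial:

```bash
#!/usr/bin/env bash
# next_serial is a hypothetical helper for the YYYYMMDDNN serial convention.
# Usage: next_serial CURRENT_SERIAL [YYYYMMDD]   (date defaults to today)
next_serial() {
  local current="$1" today="${2:-$(date +%Y%m%d)}"
  local base=$(( 10#$today * 100 ))
  if (( current < base )); then
    echo $(( base + 1 ))       # first change today: YYYYMMDD01
  else
    echo $(( current + 1 ))    # later change today, or serial already ahead
  fi
}
```

next_serial 2026022601 20260226 prints 2026022602, and on the following day the counter restarts at 01. After 99 changes in one day the value simply rolls into the next date, which still increases.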

Configuration Validation

# Syntax check
sudo named-checkconf
# Zone check
sudo named-checkzone example.com /var/lib/bind/db.example.com
sudo named-checkzone 1.168.192.in-addr.arpa /var/lib/bind/db.192.168.1
# Reload without downtime
sudo rndc reload
# Test resolution
dig @192.168.1.10 www.example.com
dig @192.168.1.10 -x 192.168.1.100
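The checks above only help if a failing check actually stops the reload. A minimal wrapper that runs named-checkconf and named-checkzone for each zone and calls rndc reload only when all of them pass; safe_reload and the ZONES list are hypothetical and mirror the two example zones.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: validate first, reload only if every check passes.
# ZONES mirrors the example zones ("zone-name zone-file"); edit for your setup.
# Run as root (or prefix the commands with sudo).
ZONES=(
  "example.com /var/lib/bind/db.example.com"
  "1.168.192.in-addr.arpa /var/lib/bind/db.192.168.1"
)

safe_reload() {
  local entry zone file
  if ! named-checkconf; then
    echo "named-checkconf failed; not reloading" >&2
    return 1
  fi
  for entry in "${ZONES[@]}"; do
    read -r zone file <<< "$entry"
    if ! named-checkzone "$zone" "$file"; then
      echo "zone check failed for $zone ($file); not reloading" >&2
      return 1
    fi
  done
  rndc reload
}
```

Sourcing the file and running safe_reload replaces the separate check and reload commands; a typo in any zone now aborts before named sees it.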

Deployment – CoreDNS 1.12

Binary Installation

# Download and extract
COREDNS_VERSION="1.12.0"
wget https://github.com/coredns/coredns/releases/download/v${COREDNS_VERSION}/coredns_${COREDNS_VERSION}_linux_amd64.tgz
tar -xzf coredns_${COREDNS_VERSION}_linux_amd64.tgz
sudo mv coredns /usr/local/bin/
sudo chmod +x /usr/local/bin/coredns
# Verify
coredns -version
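It is worth checking the downloaded archive before installing it. CoreDNS release pages have typically included a SHA-256 checksum file next to each tarball (assumption: named like the tarball plus .sha256, with the hex digest as the first field; confirm the exact asset on the release page). A small comparison helper, verify_sha256, which is a hypothetical name:

```bash
#!/usr/bin/env bash
# verify_sha256 is a hypothetical helper.
# Compares a file's SHA-256 digest with an expected hex string.
# Prints "OK: FILE" and returns 0 on match; reports both digests and returns 1 otherwise.
verify_sha256() {
  local file="$1" expected="${2,,}" actual   # ${2,,} lower-cases (bash 4+)
  actual=$(sha256sum "$file" | awk '{print $1}')
  if [[ "$actual" == "$expected" ]]; then
    echo "OK: $file"
    return 0
  fi
  echo "MISMATCH: $file expected=$expected actual=$actual" >&2
  return 1
}

# Usage (assumed checksum asset name):
#   wget https://github.com/coredns/coredns/releases/download/v${COREDNS_VERSION}/coredns_${COREDNS_VERSION}_linux_amd64.tgz.sha256
#   verify_sha256 coredns_${COREDNS_VERSION}_linux_amd64.tgz \
#     "$(awk '{print $1}' coredns_${COREDNS_VERSION}_linux_amd64.tgz.sha256)"
```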

Corefile Example

# /etc/coredns/Corefile
example.com {
    file /etc/coredns/zones/db.example.com
    log
    errors
    # Same address string as the "." block, so both blocks share one metrics listener
    prometheus :9153
    cache 300 {
        success 9984 300
        denial 9984 60
        prefetch 3 1m 10%
    }
}
. {
    forward . tls://223.5.5.5 tls://223.6.6.6 {
        tls_servername dns.alidns.com
        health_check 5s
        policy round_robin
    }
    cache 600
    prometheus :9153
    log
    errors
}
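The prefetch 3 1m 10% line in the example.com block refreshes popular entries before they expire. Per the CoreDNS cache documentation, an entry counts as popular once it has seen AMOUNT (3) queries with no gap of DURATION (1m) or more between them, and it is refreshed when its remaining TTL drops to about PERCENTAGE (10%) of the original. A simplified bash model of that decision, with a hypothetical helper named should_prefetch; the gap tracking is left to the caller, who passes the hit count since the last gap.

```bash
#!/usr/bin/env bash
# should_prefetch is a hypothetical, simplified model of CoreDNS cache prefetch.
# HITS = queries seen without a DURATION-long gap (tracked by the caller).
# The threshold is PERCENT of the original TTL rounded up, so it is never
# below 1 second for TTL >= 1.
# Usage: should_prefetch HITS REMAINING_TTL ORIGINAL_TTL [AMOUNT] [PERCENT]
should_prefetch() {
  local hits=$1 remaining=$2 original=$3 amount=${4:-3} percent=${5:-10}
  # Integer ceiling of percent * original / 100
  local threshold=$(( (percent * original + 99) / 100 ))
  if (( hits >= amount && remaining <= threshold )); then
    echo yes
  else
    echo no
  fi
}
```

With success 9984 300, an entry cached for 300 s becomes eligible for prefetch during its last 30 s, provided it has been queried at least three times without a one-minute pause.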

systemd Service

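# /etc/systemd/system/coredns.service
# Create the unprivileged service account first:
#   sudo useradd --system --no-create-home --shell /usr/sbin/nologin coredns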
[Unit]
Description=CoreDNS DNS Server
After=network.target

[Service]
Type=simple
User=coredns
Group=coredns
ExecStart=/usr/local/bin/coredns -conf /etc/coredns/Corefile
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
AmbientCapabilities=CAP_NET_BIND_SERVICE

[Install]
WantedBy=multi-user.target

Kubernetes Integration

# coredns ConfigMap in kube-system (replaces the cluster's default Corefile)
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
        }
        # Names outside cluster.local and example.com go upstream over DNS-over-TLS.
        # CoreDNS rejects two server blocks for the same zone and port, so the
        # upstream forwarder lives in this single ".:53" block.
        forward . tls://223.5.5.5 tls://223.6.6.6 {
            tls_servername dns.alidns.com
            health_check 10s
            max_concurrent 2000
        }
        prometheus :9153
        # One cache per server block; 30 s keeps cluster records fresh
        cache 30
        loop
        reload
        loadbalance
    }
    example.com:53 {
        errors
        forward . 192.168.1.10 192.168.1.11 {
            policy round_robin
            health_check 5s
        }
        cache 120
        prometheus :9153
    }

DNSSEC

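DNSSEC adds cryptographic signatures so that resolvers can detect forged or altered answers, such as those injected by cache poisoning. The chain of trust runs as follows: the zone-signing key (ZSK) signs each record set, producing RRSIG records; the key-signing key (KSK) signs the DNSKEY record set; and a DS record holding a digest of the KSK is published in the parent zone, whose own keys are vouched for by its parent, up to the root trust anchor. With dnssec-policy default, BIND uses a single combined signing key (CSK) for both roles. Until the DS record is published at the parent (usually through the registrar), validating resolvers treat the zone as unsigned.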

# Enable automatic inline signing in BIND
zone "example.com" {
    type master;
    file "/var/lib/bind/db.example.com";
    dnssec-policy default;
    inline-signing yes;
    key-directory "/var/lib/bind/keys/";
};

Check that the zone is served signed (the answers should include DNSKEY and RRSIG records):

dig @192.168.1.10 example.com DNSKEY +dnssec
dig @192.168.1.10 www.example.com A +dnssec
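In the DNSKEY answer, the flags field distinguishes key roles: 257 means the Secure Entry Point bit is set (a KSK, or a CSK when one key does both jobs) and 256 marks a ZSK. A small filter for dig +short output makes that explicit; classify_dnskeys is a hypothetical helper, and only a few common algorithm numbers are named.

```bash
#!/usr/bin/env bash
# classify_dnskeys is a hypothetical helper.
# Reads "dig +short DNSKEY" lines on stdin: FLAGS PROTOCOL ALGORITHM KEY...
# Flags 257 = SEP bit set (KSK, or CSK); 256 = ZSK.
classify_dnskeys() {
  local flags proto alg rest role name
  # "|| [[ -n $flags ]]" also handles a final line without a newline
  while read -r flags proto alg rest || [[ -n "$flags" ]]; do
    [[ -z "$flags" ]] && continue
    case "$flags" in
      257) role="KSK/CSK" ;;
      256) role="ZSK" ;;
      *)   role="other(flags=$flags)" ;;
    esac
    case "$alg" in
      8)  name="RSASHA256" ;;
      13) name="ECDSAP256SHA256" ;;
      14) name="ECDSAP384SHA384" ;;
      15) name="ED25519" ;;
      *)  name="alg$alg" ;;
    esac
    echo "$role $name"
  done
}
```

Usage: dig @192.168.1.10 example.com DNSKEY +short | classify_dnskeys. With dnssec-policy default, a single "KSK/CSK ECDSAP256SHA256" line is the typical result.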

Best Practices & Security Hardening

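Separate recursive and authoritative servers where possible: the authoritative service can then answer the Internet while recursion stays internal, and an attack on one role does not take down the other. (The single-server BIND example above combines both roles for brevity.)

Use ACLs to restrict recursion, e.g. acl trusted-nets { 10.0.0.0/8; 172.16.0.0/12; 192.168.0.0/16; }; followed by allow-recursion { trusted-nets; };.

Enable response rate limiting (rate-limit) to blunt DNS amplification and reflection attacks.

Hide the version string (version "not disclosed";) and refuse zone transfers that are not TSIG-authenticated.

Configure TSIG keys for authenticated zone transfers.

Prefer short TTLs (300‑3600 s) for most records; use longer TTLs only for static data.

Enable prefetch and stale answers (serve_stale in CoreDNS, stale-answer-enable in BIND): prefetch refreshes popular entries before they expire, and stale answers keep names resolving when upstream servers fail.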

High‑Availability Designs

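BIND primary/secondary (master/slave) pair behind a Keepalived VIP (roughly 3‑5 s failover).

CoreDNS multi‑replica Deployment behind a Kubernetes Service (failover as soon as readiness probes remove a failed replica from the Service endpoints).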

Anycast deployment with BGP for global load‑balancing across data centers.

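#!/bin/bash
# Keepalived health-check script: exit 0 only if the local server returns an
# A record for the probe name (health.internal. must exist in a local zone).
QUERY_DOMAIN="health.internal."
TIMEOUT=2
# dig exits 0 even for NXDOMAIN or SERVFAIL, so test for an actual IPv4 answer.
if dig @127.0.0.1 "${QUERY_DOMAIN}" A +short +time="${TIMEOUT}" +tries=1 2>/dev/null \
        | grep -Eq '^[0-9]+(\.[0-9]+){3}$'; then
    exit 0
else
    exit 1
fi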

Monitoring & Alerting

Key metrics: QPS, cache hit rate, query latency (P99), SERVFAIL ratio, zone‑transfer success, recursion timeout rate.

# BIND statistics channel (listens on 127.0.0.1:8053); add to named.conf
statistics-channels {
    inet 127.0.0.1 port 8053 allow { 127.0.0.1; };
};
# Prometheus exporter (bind_exporter --bind.stats-url="http://127.0.0.1:8053/")
# CoreDNS Prometheus endpoint (already in Corefile: prometheus :9153)

Example Prometheus alert rules (QPS > 5000, SERVFAIL > 1%, cache hit < 70%):

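groups:
- name: dns_alerts
  rules:
  - alert: DNSQueryRateHigh
    # Sum across zones and protocols so the threshold applies per CoreDNS instance
    expr: sum by (instance) (rate(coredns_dns_requests_total[5m])) > 5000
    for: 3m
    labels:
      severity: warning
    annotations:
      summary: "DNS QPS exceeds 5000"
  - alert: DNSServfailRateHigh
    # Aggregate both sides; dividing raw series would only match SERVFAIL against itself
    expr: |
      sum by (instance) (rate(coredns_dns_responses_total{rcode="SERVFAIL"}[5m]))
        /
      sum by (instance) (rate(coredns_dns_responses_total[5m])) > 0.01
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "DNS SERVFAIL ratio > 1%"
  - alert: DNSCacheHitRateLow
    # Use rates rather than lifetime totals, and aggregate away the hits "type" label
    expr: |
      sum by (instance) (rate(coredns_cache_hits_total[5m]))
        /
      (sum by (instance) (rate(coredns_cache_hits_total[5m]))
        + sum by (instance) (rate(coredns_cache_misses_total[5m]))) < 0.7
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "DNS cache hit rate below 70%"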
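To sanity-check the cache-hit alert against a live server, you can compute the same ratio by hand from two scrapes of the CoreDNS metrics endpoint (for example curl -s http://127.0.0.1:9153/metrics, taken a minute apart). The sketch below sums a counter across all of its label sets and turns two samples into a hit rate. The function names are hypothetical, and because label sets differ between CoreDNS versions it deliberately ignores labels.

```bash
#!/usr/bin/env bash
# sum_metric and cache_hit_rate_pct are hypothetical helpers.

# Sum every sample of one Prometheus counter from text-format metrics on stdin,
# ignoring labels (e.g. coredns_cache_hits_total has one series per type).
sum_metric() {
  awk -v n="$1" '
    index($0, n "{") == 1 || index($0, n " ") == 1 { sum += $NF }
    END { printf "%.0f\n", sum }'
}

# Hit rate (integer percent) between two scrapes. Counters are cumulative since
# process start, so the rate must come from deltas, not from raw totals.
# Usage: cache_hit_rate_pct HITS_1 MISSES_1 HITS_2 MISSES_2
cache_hit_rate_pct() {
  local dh=$(( $3 - $1 )) dm=$(( $4 - $2 ))
  if (( dh + dm == 0 )); then
    echo "n/a"
  else
    echo $(( 100 * dh / (dh + dm) ))
  fi
}
```

Feed each scrape through sum_metric coredns_cache_hits_total and sum_metric coredns_cache_misses_total, then pass the four numbers to cache_hit_rate_pct. A result below 70 corresponds to the DNSCacheHitRateLow threshold.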

Backup & Restore

#!/bin/bash
# Back up BIND configuration, zones and keys (Debian/Ubuntu layout used above).
# On RHEL-family hosts use CONF_DIR=/etc/named and DATA_DIR=/var/named, and copy /etc/named.conf as well.
set -euo pipefail
BACKUP_DIR="/data/backup/dns"
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_PATH="${BACKUP_DIR}/${DATE}"
RETAIN_DAYS=30
CONF_DIR="/etc/bind"        # named.conf*, TSIG key statements
DATA_DIR="/var/lib/bind"    # zone files, journals, DNSSEC keys (keys/)
mkdir -p "${BACKUP_PATH}"
# Config files, including TSIG keys
cp -a "${CONF_DIR}/" "${BACKUP_PATH}/conf/"
# Freeze dynamic zones so pending journal changes are written to the zone files;
# the trap thaws them again even if the copy below fails
rndc freeze
trap 'rndc thaw' EXIT
# Zone files, journals and DNSSEC private keys (losing a KSK means replacing the DS at the parent)
cp -a "${DATA_DIR}/" "${BACKUP_PATH}/data/"
rndc thaw
trap - EXIT
# Archive
tar -czf "${BACKUP_DIR}/dns-backup-${DATE}.tar.gz" -C "${BACKUP_DIR}" "${DATE}"
rm -rf "${BACKUP_PATH}"
# Cleanup old backups
find "${BACKUP_DIR}" -name "dns-backup-*.tar.gz" -mtime +"${RETAIN_DAYS}" -delete
echo "Backup completed: ${BACKUP_DIR}/dns-backup-${DATE}.tar.gz"

Recovery steps: stop BIND, extract the tarball, restore the configuration, zone files and keys, validate with named-checkconf and named-checkzone, restore ownership (bind:bind on Debian/Ubuntu, named:named on RHEL), and start the service.

Conclusion

The guide consolidates essential DNS operational knowledge: from basic record types to advanced DNSSEC signing, from single‑node BIND setups to cloud‑native CoreDNS in Kubernetes, from security hardening to observability, and from automated backup scripts to high‑availability patterns. Following these practices helps build a resilient, secure, and performant DNS infrastructure.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
