Hands‑On DNS Ops: Deploy BIND and CoreDNS with Full Troubleshooting Guide
This comprehensive guide walks you through DNS fundamentals, compares BIND, CoreDNS, PowerDNS and Unbound, provides step‑by‑step deployment scripts for BIND 9.20 and CoreDNS 1.12, explains DNSSEC configuration, caching optimizations, security hardening, high‑availability designs, monitoring, backup and recovery procedures, and advanced troubleshooting techniques.
Overview
Domain Name System (DNS) is the backbone of Internet connectivity; a single DNS outage can cripple all services. The article explains recursive and iterative resolution, the full lookup chain, and the importance of proper DNS architecture.
Record Types
A : IPv4 address mapping (e.g., web IN A 192.168.1.10)
AAAA : IPv6 address mapping
CNAME : Alias to another name
MX : Mail exchange
NS : Delegation to name servers
TXT : Text records (SPF/DKIM)
SRV : Service location (port, weight)
PTR : Reverse lookup
SOA : Zone start of authority
Software Comparison
BIND 9.20 : Authoritative + recursive, full DNSSEC support, limited plugin ecosystem.
CoreDNS 1.12 : Cloud‑native DNS, plugin chain architecture, default Kubernetes DNS.
PowerDNS 4.9 : Authoritative DNS with database back‑ends.
Unbound 1.22 : Pure recursive resolver, low memory footprint.
Deployment – BIND 9.20
Installation
# Ubuntu / Debian
sudo apt update && sudo apt install -y bind9 bind9-utils bind9-dnsutils
# CentOS / RHEL
sudo dnf install -y bind bind-utils
# Verify version
named -v # Expected output: BIND 9.20.x ...Global Options (named.conf.options)
options {
directory "/var/cache/bind";
listen-on { 192.168.1.10; 127.0.0.1; };
listen-on-v6 { none; };
recursion yes;
allow-recursion { 10.0.0.0/8; 172.16.0.0/12; 192.168.0.0/16; 127.0.0.1; };
forwarders { 223.5.5.5; 119.29.29.29; };
forward only;
dnssec-validation auto;
rate-limit {
responses-per-second 10;
window 5;
};
max-cache-size 512m;
max-cache-ttl 3600;
max-ncache-ttl 300;
version "not disclosed";
allow-transfer { none; };
allow-query { any; };
};Zone Definition (named.conf.local)
key "transfer-key" {
algorithm hmac-sha256;
secret "BASE64_ENCODED_SECRET_HERE";
};
zone "example.com" {
type master;
file "/var/lib/bind/db.example.com";
allow-transfer { key "transfer-key"; };
also-notify { 192.168.1.11; };
notify yes;
dnssec-policy default;
inline-signing yes;
key-directory "/var/lib/bind/keys/";
};
zone "1.168.192.in-addr.arpa" {
type master;
file "/var/lib/bind/db.192.168.1";
allow-transfer { key "transfer-key"; };
also-notify { 192.168.1.11; };
notify yes;
};Sample Zone File (db.example.com)
$TTL 3600
@ IN SOA ns1.example.com. admin.example.com. (
2026022601 ; serial
3600 ; refresh
900 ; retry
604800 ; expire
300 ; negative TTL )
@ IN NS ns1.example.com.
@ IN NS ns2.example.com.
ns1 IN A 192.168.1.10
ns2 IN A 192.168.1.11
@ IN A 192.168.1.100
www IN CNAME @
mail IN A 192.168.1.20
api IN A 192.168.1.101
_sip._tcp IN SRV 10 60 5060 sip.example.com.Configuration Validation
# Syntax check
sudo named-checkconf
# Zone check
sudo named-checkzone example.com /var/lib/bind/db.example.com
sudo named-checkzone 1.168.192.in-addr.arpa /var/lib/bind/db.192.168.1
# Reload without downtime
sudo rndc reload
# Test resolution
dig @192.168.1.10 www.example.com
dig @192.168.1.10 -x 192.168.1.100Deployment – CoreDNS 1.12
Binary Installation
# Download and extract
COREDNS_VERSION="1.12.0"
wget https://github.com/coredns/coredns/releases/download/v${COREDNS_VERSION}/coredns_${COREDNS_VERSION}_linux_amd64.tgz
tar -xzf coredns_${COREDNS_VERSION}_linux_amd64.tgz
sudo mv coredns /usr/local/bin/
sudo chmod +x /usr/local/bin/coredns
# Verify
coredns -versionCorefile Example
# /etc/coredns/Corefile
example.com {
file /etc/coredns/zones/db.example.com
log
errors
prometheus 0.0.0.0:9153
cache 300 {
success 9984 300
denial 9984 60
prefetch 3 1m 10%
}
}
. {
forward . tls://223.5.5.5 tls://223.6.6.6 {
tls_servername dns.alidns.com
health_check 5s
policy round_robin
}
cache 600
prometheus :9153
log
errors
}systemd Service
# /etc/systemd/system/coredns.service
[Unit]
Description=CoreDNS DNS Server
After=network.target
[Service]
Type=simple
User=coredns
Group=coredns
ExecStart=/usr/local/bin/coredns -conf /etc/coredns/Corefile
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
AmbientCapabilities=CAP_NET_BIND_SERVICE
[Install]
WantedBy=multi-user.targetKubernetes Integration
# coredns-custom-configmap.yaml (ConfigMap in kube-system)
apiVersion: v1
kind: ConfigMap
metadata:
name: coredns
namespace: kube-system
data:
Corefile: |
.:53 {
errors
health {
lameduck 5s
}
ready
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
ttl 30
}
prometheus :9153
cache 30
loop
reload
loadbalance
}
example.com:53 {
errors
forward . 192.168.1.10 192.168.1.11 {
policy round_robin
health_check 5s
}
cache 120
prometheus :9153
}
.:53 {
errors
forward . tls://223.5.5.5 tls://223.6.6.6 {
tls_servername dns.alidns.com
health_check 10s
max_concurrent 2000
}
cache 600 {
success 9984 600
denial 9984 60
serve_stale 1h
}
prometheus :9153
loop
reload
}DNSSEC
DNSSEC adds cryptographic signatures to protect against cache poisoning. The signing chain is ZSK → RRSIG → KSK → DS → parent zone.
# Enable automatic inline signing in BIND
zone "example.com" {
type master;
file "/var/lib/bind/db.example.com";
dnssec-policy default;
inline-signing yes;
key-directory "/var/lib/bind/keys/";
};Validate with:
dig @192.168.1.10 example.com DNSKEY +dnssec
dig @192.168.1.10 www.example.com A +dnssecBest Practices & Security Hardening
Separate recursive and authoritative servers; avoid mixed deployment.
Use ACLs to restrict recursion (e.g., allow-recursion { trusted-nets; };).
Enable rate‑limiting to mitigate DNS amplification attacks.
Hide version strings ( version "not disclosed";) and disable zone transfers without TSIG.
Configure TSIG keys for authenticated zone transfers.
Prefer short TTL (300‑3600 s) for most records; use longer TTL only for static data.
Enable prefetch and serve_stale to reduce latency on upstream failures.
High‑Availability Designs
BIND master‑slave with Keepalived VIP (3‑5 s failover).
CoreDNS multi‑replica Service in Kubernetes (instant failover).
Anycast deployment with BGP for global load‑balancing across data centers.
# Keepalived health‑check script (checks local DNS response)
#!/bin/bash
QUERY_DOMAIN="health.internal."
TIMEOUT=2
if dig @127.0.0.1 "$${QUERY_DOMAIN}" A +short +time=$${TIMEOUT} +tries=1 > /dev/null 2>&1; then
exit 0
else
exit 1
fiMonitoring & Alerting
Key metrics: QPS, cache hit rate, query latency (P99), SERVFAIL ratio, zone‑transfer success, recursion timeout rate.
# BIND statistics channel (listens on 127.0.0.1:8053)
statistics-channels {
inet 127.0.0.1 port 8053 allow { 127.0.0.1; };
}
# Prometheus exporter (bind_exporter --bind.stats-url="http://127.0.0.1:8053/")
# CoreDNS Prometheus endpoint (already in Corefile: prometheus :9153)Example Prometheus alert rules (QPS > 5000, SERVFAIL > 1%, cache hit < 70%):
groups:
- name: dns_alerts
rules:
- alert: DNSQueryRateHigh
expr: rate(coredns_dns_requests_total[5m]) > 5000
for: 3m
labels:
severity: warning
annotations:
summary: "DNS QPS exceeds 5000"
- alert: DNSServfailRateHigh
expr: |
rate(coredns_dns_responses_total{rcode="SERVFAIL"}[5m]) /
rate(coredns_dns_responses_total[5m]) > 0.01
for: 5m
labels:
severity: critical
annotations:
summary: "DNS SERVFAIL ratio > 1%"
- alert: DNSCacheHitRateLow
expr: |
coredns_cache_hits_total /
(coredns_cache_hits_total + coredns_cache_misses_total) < 0.7
for: 10m
labels:
severity: warning
annotations:
summary: "DNS cache hit rate below 70%"Backup & Restore
#!/bin/bash
set -euo pipefail
BACKUP_DIR="/data/backup/dns"
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_PATH="${BACKUP_DIR}/${DATE}"
RETAIN_DAYS=30
mkdir -p "${BACKUP_PATH}"
# Config files
cp -a /etc/named.conf "${BACKUP_PATH}/"
cp -a /etc/named/ "${BACKUP_PATH}/named-etc/"
# Zone files
cp -a /var/named/zones/ "${BACKUP_PATH}/zones/"
# TSIG keys
cp -a /etc/named/keys/ "${BACKUP_PATH}/keys/"
# Freeze and export dynamic zones
rndc freeze
cp -a /var/named/dynamic/ "${BACKUP_PATH}/dynamic/" 2>/dev/null || true
rndc thaw
# Archive
tar -czf "${BACKUP_DIR}/dns-backup-${DATE}.tar.gz" -C "${BACKUP_DIR}" "${DATE}"
rm -rf "${BACKUP_PATH}"
# Cleanup old backups
find "${BACKUP_DIR}" -name "dns-backup-*.tar.gz" -mtime +${RETAIN_DAYS} -delete
echo "Backup completed: ${BACKUP_DIR}/dns-backup-${DATE}.tar.gz"Recovery steps: stop BIND, extract the tarball, restore configuration and zone files, run named-checkconf and named-checkzone for validation, fix permissions, and start the service.
Conclusion
The guide consolidates essential DNS operational knowledge: from basic record types to advanced DNSSEC signing, from single‑node BIND setups to cloud‑native CoreDNS in Kubernetes, from security hardening to observability, and from automated backup scripts to high‑availability patterns. Following these practices helps build a resilient, secure, and performant DNS infrastructure.
References
BIND 9 Administrator Reference Manual (official)
CoreDNS official documentation
"DNS and BIND" (5th edition) – classic textbook
Kubernetes DNS specification
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Raymond Ops
Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
