Master DNS Operations: Deploy BIND & CoreDNS with Real‑World Troubleshooting
This guide walks you through DNS fundamentals, compares BIND, CoreDNS, PowerDNS and Unbound, provides step‑by‑step installation and configuration scripts for BIND 9 and CoreDNS on Linux and Kubernetes, explains caching, DNSSEC, security hardening, high‑availability designs, monitoring, backup and recovery, and shares best‑practice tips for production environments.
Overview
The Domain Name System (DNS) is a critical Internet service that maps human‑readable domain names to IP addresses. A single DNS outage can render an entire business unavailable, so reliable deployment, security hardening, and observability are essential.
Technical comparison
BIND 9.20.x – Full‑stack authoritative and recursive server, native DNSSEC support, extensive documentation. Ideal for traditional data‑center environments.
CoreDNS 1.12.x – Cloud‑native DNS written in Go, uses a Caddy‑style Corefile for configuration, rich plugin ecosystem, default DNS for Kubernetes clusters.
PowerDNS 4.9.x – Authoritative server with database back‑ends (MySQL, PostgreSQL, LDAP). Suited for large‑scale zone management via API.
Unbound 1.22.x – Lightweight recursive resolver with a small memory footprint, perfect for caching or privacy‑focused DNS.
Deployment steps
BIND 9.20.x deployment and configuration
Installation (Ubuntu/Debian):
# Ubuntu / Debian
sudo apt update && sudo apt install -y bind9 bind9-utils bind9-dnsutils
# Verify version
named -v # Expected output: BIND 9.20.x ...
Installation (CentOS/RHEL):
# CentOS / RHEL
sudo dnf install -y bind bind-utils
Global options (/etc/bind/named.conf.options) – key parameters for a production recursive resolver:
options {
    directory "/var/cache/bind";
    listen-on { 192.168.1.10; 127.0.0.1; };
    listen-on-v6 { none; };
    recursion yes;
    allow-recursion { 10.0.0.0/8; 172.16.0.0/12; 192.168.0.0/16; 127.0.0.1; };
    forwarders { 223.5.5.5; 119.29.29.29; };
    forward only;
    dnssec-validation auto;
    rate-limit { responses-per-second 10; window 5; slip 2; errors-per-second 5; nxdomains-per-second 5; log-only no; };
    version "not disclosed";
    hostname none;
    max-cache-size 512m;
    max-cache-ttl 3600;
    max-ncache-ttl 300;
    prefetch 2 9;
    allow-transfer { none; };
    allow-query { any; };
};
Zone definitions (/etc/bind/named.conf.local) for a forward zone and a reverse zone:
include "/etc/bind/transfer.key";
zone "example.com" {
    type master;
    file "/var/lib/bind/db.example.com";
    allow-transfer { key "transfer-key"; };
    also-notify { 192.168.1.11; };
    notify yes;
    dnssec-policy default;
    inline-signing yes;
    key-directory "/var/lib/bind/keys/";
};
zone "1.168.192.in-addr.arpa" {
    type master;
    file "/var/lib/bind/db.192.168.1";
    allow-transfer { key "transfer-key"; };
    also-notify { 192.168.1.11; };
};
Sample forward zone file (/var/lib/bind/db.example.com):
$TTL 3600
@           IN  SOA   ns1.example.com. admin.example.com. (
                2026022601 ; serial (YYYYMMDDNN)
                3600       ; refresh
                900        ; retry
                604800     ; expire
                300        ; negative cache TTL
            )
            IN  NS    ns1.example.com.
            IN  NS    ns2.example.com.
ns1         IN  A     192.168.1.10
ns2         IN  A     192.168.1.11
@           IN  A     192.168.1.100
www         IN  CNAME @
mail        IN  A     192.168.1.20
api         IN  A     192.168.1.101
db-master   IN  A     192.168.1.30
db-slave    IN  A     192.168.1.31
@           IN  MX    10 mail.example.com.
@           IN  TXT   "v=spf1 mx ip4:192.168.1.0/24 ~all"
_sip._tcp   IN  SRV   10 60 5060 sip.example.com.
Configuration validation – syntax and zone checks:
# Syntax check
sudo named-checkconf
# Zone check
sudo named-checkzone example.com /var/lib/bind/db.example.com
sudo named-checkzone 1.168.192.in-addr.arpa /var/lib/bind/db.192.168.1
Enable and start the service:
sudo systemctl enable --now named
sudo systemctl status named
CoreDNS 1.12.x deployment and configuration
Binary installation (Linux AMD64):
# Download CoreDNS 1.12.x
COREDNS_VERSION="1.12.0"
wget https://github.com/coredns/coredns/releases/download/v${COREDNS_VERSION}/coredns_${COREDNS_VERSION}_linux_amd64.tgz
tar -xzf coredns_${COREDNS_VERSION}_linux_amd64.tgz
sudo mv coredns /usr/local/bin/
sudo chmod +x /usr/local/bin/coredns
coredns -version
Corefile (/etc/coredns/Corefile) – internal zone, caching, forwarding to public resolvers, and Prometheus metrics:
# Internal zone
example.com {
    file /etc/coredns/zones/db.example.com
    log
    errors
    prometheus 0.0.0.0:9153
    cache 300 {
        success 9984 300
        denial 9984 60
    }
}
# Forward all other queries to upstream DNS over TLS
. {
    forward . tls://223.5.5.5 tls://223.6.6.6 {
        tls_servername dns.alidns.com
        health_check 5s
        policy round_robin
    }
    cache 600
    health 0.0.0.0:8080
    ready 0.0.0.0:8181
    # Use the same metrics address in every server block so they share one listener
    prometheus 0.0.0.0:9153
    log
    errors
}
systemd service (/etc/systemd/system/coredns.service):
[Unit]
Description=CoreDNS DNS Server
Documentation=https://coredns.io
After=network.target
[Service]
Type=simple
User=coredns
Group=coredns
ExecStart=/usr/local/bin/coredns -conf /etc/coredns/Corefile
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
AmbientCapabilities=CAP_NET_BIND_SERVICE
[Install]
WantedBy=multi-user.target
Kubernetes integration – ConfigMap that adds cluster DNS, internal corporate zone, and public DoT forwarding:
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    # Cluster DNS
    cluster.local:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
        }
        prometheus :9153
        cache 30
        loop
        reload
        loadbalance
    }
    # Corporate internal zone
    example.com:53 {
        errors
        forward . 192.168.1.10 192.168.1.11 {
            policy round_robin
            health_check 5s
        }
        cache 120
        prometheus :9153
    }
    # Public DNS with DoT
    .:53 {
        errors
        forward . tls://223.5.5.5 tls://223.6.6.6 {
            tls_servername dns.alidns.com
            health_check 10s
            max_concurrent 2000
        }
        cache 600 {
            success 9984 600
            denial 9984 60
            serve_stale 1h
        }
        prometheus :9153
        loop
        reload
    }
DNSSEC configuration
DNSSEC adds cryptographic signatures to DNS data, creating a trust chain: root KSK → .com DS → example.com KSK → ZSK → signed records. BIND 9.20.x can perform automatic inline signing.
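One quick way to see whether a resolver accepted this chain is the ad (Authenticated Data) flag in the header that dig prints. The helper below is a minimal sketch, assuming the query goes to a validating recursive resolver rather than to the zone's own authoritative server, which normally answers with aa instead of ad. The function name has_ad_flag is illustrative and not part of any tool.

```shell
#!/usr/bin/env bash
# Sketch: detect the "ad" (Authenticated Data) flag in dig output.
# has_ad_flag is a hypothetical helper name, not a BIND or dig feature.

# Reads dig output on stdin; returns 0 when the flags line contains "ad".
has_ad_flag() {
    # dig prints a header line such as:
    #   ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
    # Match " ad" only inside the flags list, i.e. before the first ";" after "flags:".
    grep -Eq '^;; flags:[^;]* ad[ ;]'
}
```

Typical use would be to pipe a lookup into it, for example `dig @RESOLVER example.com SOA +dnssec | has_ad_flag && echo validated`, where RESOLVER stands for whichever validating resolver you operate.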
# /etc/bind/named.conf.local – enable automatic signing
zone "example.com" {
    type master;
    file "/var/lib/bind/db.example.com";
    dnssec-policy default;
    inline-signing yes;
    key-directory "/var/lib/bind/keys/";
};
# Custom policy example (ECDSA‑P256 keys, NSEC3 without iterations)
dnssec-policy "corp-policy" {
    keys {
        ksk key-directory lifetime unlimited algorithm ecdsap256sha256;
        zsk key-directory lifetime 90d algorithm ecdsap256sha256;
    };
    nsec3param iterations 0 optout no salt-length 0;
};
Verification of signatures:
# Verify DNSKEY set
dig @192.168.1.10 example.com DNSKEY +dnssec
# Verify an A record with RRSIG
dig @192.168.1.10 www.example.com A +dnssec
# Check RRSIG on the SOA record
dig @192.168.1.10 example.com SOA +dnssec | grep RRSIG
Best practices and caveats
Separate recursive and authoritative servers in production to avoid cache pollution and to allow independent scaling.
Adopt a multi‑layer cache architecture: client → local cache (systemd‑resolved/dnsmasq) → zone cache (CoreDNS) → recursive resolver (BIND/Unbound) → authoritative server.
Restrict recursion to trusted networks using allow-recursion ACLs; enable rate-limit to mitigate DNS amplification attacks.
Maintain strictly increasing SOA serial numbers (format YYYYMMDDNN) to ensure successful zone transfers.
Use sensible TTLs (300–3600 s); avoid values below 60 s unless required for rapid failover.
Enable DNSSEC validation (default in BIND 9.20) and keep system time synchronized via NTP.
For high availability, consider BIND master/slave with a Keepalived VIP, CoreDNS multi‑replica Service in Kubernetes, or Anycast routing with BGP.
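The serial-number rule above is easy to get wrong when a zone is edited several times in one day. The following is a small sketch of one way to compute the next YYYYMMDDNN serial so that it is always strictly greater than the current one. The function name next_serial and its optional second argument are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: compute the next SOA serial in YYYYMMDDNN form.
# next_serial is a hypothetical helper, not a BIND utility.

# Usage: next_serial CURRENT_SERIAL [TODAY_AS_YYYYMMDD]
next_serial() {
    local current="$1"
    local today="${2:-$(date +%Y%m%d)}"
    local base="${today}01"   # first revision of the day

    if (( 10#$current < 10#$base )); then
        # The last edit was on an earlier day: start today's sequence.
        echo "$base"
    else
        # Already edited today (or the serial is ahead): just increment.
        # After revision 99 this rolls into the next date prefix, which
        # still keeps the serial strictly increasing.
        echo $(( 10#$current + 1 ))
    fi
}
```

For example, `next_serial 2026022601 20260226` prints 2026022602, while the same serial on the following day yields 2026022701.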
Fault diagnosis and monitoring
Log access
# Enable query logging in BIND
rndc querylog on
# Follow BIND logs
sudo journalctl -u named -f --no-pager
# CoreDNS logs (enable in Corefile with "log")
kubectl -n kube-system logs -l k8s-app=kube-dns -f --tail=100
Common issues
SERVFAIL – check upstream reachability, DNSSEC validation status, or zone file syntax.
REFUSED – client IP not covered by allow-recursion ACL.
Zone transfer failures – mismatched TSIG keys, missing allow-transfer permission, or firewall blocks on TCP 53.
DNSSEC failures – expired signatures, missing trust anchors, or unsynchronized clocks.
Cache staleness – flush with rndc flush or adjust max-ncache-ttl and serve_stale settings.
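The first two entries in the list above can be turned into a first-pass triage step. This sketch pulls the status field out of dig's header and prints the matching checks. Both function names are hypothetical, and the hint text simply restates the list rather than adding new diagnostics.

```shell
#!/usr/bin/env bash
# Sketch: map a dig response status to the first checks to run.
# dig_status and rcode_hint are illustrative names.

# Reads dig output on stdin and prints the status, e.g. SERVFAIL.
dig_status() {
    # Header line looks like:
    #   ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 4242
    sed -n 's/^;; ->>HEADER<<-.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Prints a short hint for the given status.
rcode_hint() {
    case "$1" in
        SERVFAIL) echo "check upstream reachability, DNSSEC validation status, zone file syntax" ;;
        REFUSED)  echo "check that the client IP is covered by the allow-recursion ACL" ;;
        *)        echo "no hint for status: $1" ;;
    esac
}
```

A possible invocation is `rcode_hint "$(dig @127.0.0.1 example.com A | dig_status)"`, which prints the hint for whatever status the local server returned.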
Performance monitoring
Expose BIND statistics via statistics-channels { inet 127.0.0.1 port 8053; }; and scrape with bind_exporter (Prometheus).
CoreDNS ships native Prometheus metrics on the prometheus plugin port (default :9153).
Key alerts: QPS > 80 % of capacity, cache hit rate < 70 %, SERVFAIL ratio > 1 %, zone‑transfer failures, recursive timeout > 2 %.
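Two of these thresholds are simple ratios, so they can be checked with integer arithmetic once the counters are available. The sketch below assumes you have already obtained four counters over the same time window (cache hits, cache misses, SERVFAIL responses, total responses), for instance from the Prometheus endpoints mentioned above. The function name and argument order are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: evaluate the cache hit rate (< 70 %) and SERVFAIL ratio (> 1 %) alerts.
# check_dns_alerts is a hypothetical helper; the counters are supplied by the caller.

# Usage: check_dns_alerts HITS MISSES SERVFAIL_RESPONSES TOTAL_RESPONSES
# Prints one line per triggered alert and returns 1 if any alert fired.
check_dns_alerts() {
    local hits="$1" misses="$2" servfail="$3" responses="$4"
    local lookups=$(( hits + misses ))
    local status=0

    # hits / lookups < 70 %, written without division to avoid rounding.
    if (( lookups > 0 && hits * 100 < 70 * lookups )); then
        echo "ALERT cache hit rate below 70%"
        status=1
    fi

    # servfail / responses > 1 %.
    if (( responses > 0 && servfail * 100 > responses )); then
        echo "ALERT SERVFAIL ratio above 1%"
        status=1
    fi

    return "$status"
}
```

With 600 hits, 400 misses, and 20 SERVFAILs out of 1000 responses, both alerts fire. With 900 hits, 100 misses, and 5 SERVFAILs, the function prints nothing and returns 0.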
Backup and recovery
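Backup script (BIND) – creates a timestamped archive of configuration, keys, and zone files, records current SOA serials, and retains 30 days of backups. The script uses RHEL-style paths (/etc/named.conf, /var/named); on Debian/Ubuntu, substitute the /etc/bind and /var/lib/bind locations used earlier in this guide: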
#!/bin/bash
set -euo pipefail
BACKUP_DIR="/data/backup/dns"
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_PATH="${BACKUP_DIR}/${DATE}"
RETAIN_DAYS=30
mkdir -p "${BACKUP_PATH}"
# Config and keys
cp -a /etc/named.conf "${BACKUP_PATH}/"
cp -a /etc/named/ "${BACKUP_PATH}/named-etc/"
cp -a /var/named/zones/ "${BACKUP_PATH}/zones/"
cp -a /etc/named/keys/ "${BACKUP_PATH}/keys/"
# Freeze dynamic zones, copy, then thaw
rndc freeze
cp -a /var/named/dynamic/ "${BACKUP_PATH}/dynamic/" 2>/dev/null || true
rndc thaw
# Record SOA serials for verification
for zone_file in "${BACKUP_PATH}"/zones/*.zone; do
    [ -e "${zone_file}" ] || continue # skip if no *.zone files were copied
    zone_name=$(basename "${zone_file}" .zone)
    dig @127.0.0.1 "${zone_name}" SOA +short >> "${BACKUP_PATH}/soa-serials.txt"
done
# Create compressed archive
tar -czf "${BACKUP_DIR}/dns-backup-${DATE}.tar.gz" -C "${BACKUP_DIR}" "${DATE}"
rm -rf "${BACKUP_PATH}"
find "${BACKUP_DIR}" -name "dns-backup-*.tar.gz" -mtime +${RETAIN_DAYS} -delete
echo "Backup completed: ${BACKUP_DIR}/dns-backup-${DATE}.tar.gz"
Recovery procedure – restore the latest backup, verify syntax, fix permissions, and restart the service:
# Stop the failed node
sudo systemctl stop named
# Extract the latest backup; the archive contains one top-level timestamp
# directory, so strip it to get named-etc/, zones/, and keys/ directly
BACKUP_FILE="/data/backup/dns/dns-backup-20250126_020000.tar.gz"
mkdir -p /tmp/dns-restore
tar -xzf "${BACKUP_FILE}" -C /tmp/dns-restore/ --strip-components=1
# Restore configuration and zones
sudo cp -a /tmp/dns-restore/named.conf /etc/named.conf
sudo cp -a /tmp/dns-restore/named-etc/* /etc/named/
sudo cp -a /tmp/dns-restore/zones/* /var/named/zones/
sudo cp -a /tmp/dns-restore/keys/* /etc/named/keys/
# Validate syntax
named-checkconf /etc/named.conf
for zone_file in /var/named/zones/*.zone; do
    zone_name=$(basename "${zone_file}" .zone)
    named-checkzone "${zone_name}" "${zone_file}"
done
# Fix permissions and start service
sudo chown -R named:named /var/named/zones/ /etc/named/keys/
sudo systemctl start named
# Verify
dig @127.0.0.1 example.com A +short
rndc status
Summary
The article demonstrates a production‑grade DNS architecture that combines BIND for authoritative services and recursive resolution with CoreDNS for cloud‑native caching and Kubernetes integration. It covers installation, configuration, DNSSEC signing, security hardening, performance tuning, monitoring, backup, and disaster recovery, providing a complete reference for reliable DNS operations.
MaGe Linux Operations
