How to Secure Containers: Complete Guide to Image Scanning and Zero‑Trust Runtime Protection
This guide walks through securing production container environments: image vulnerability scanning with Trivy, image signing with Cosign, Kubernetes hardening with SecurityContext and Pod Security Admission, runtime protection with Falco, network isolation via NetworkPolicy, and continuous monitoring with Prometheus. It includes scripts, CI/CD integration, and troubleshooting tips.
Scope and Prerequisites
Target environments: production containers, CI/CD pipelines, image supply‑chain and runtime protection. Requires RHEL 8+/Ubuntu 20.04+ with kernel ≥ 5.4, Docker 20.10+/containerd 1.6+/CRI‑O 1.24+, Kubernetes 1.25+ (Pod Security Admission), root or admin privileges, network access to NVD/GitHub Advisory, and the tools Trivy 0.48+, Cosign 2.0+, optional Falco 0.36+.
Tool Versions
Trivy ≥ 0.48 (DB auto‑update)
Grype ≥ 0.74 (Syft ≥ 0.100)
Cosign ≥ 2.0 (uses the Rekor transparency log by default)
Falco ≥ 0.36 (kernel module or eBPF)
Harbor ≥ 2.8 (integrated Trivy)
Implementation Overview
1. Install and configure Trivy
RHEL/CentOS:
# Install Trivy
sudo rpm -ivh https://github.com/aquasecurity/trivy/releases/download/v0.48.3/trivy_0.48.3_Linux-64bit.rpm
# Verify
trivy --version
# Update vulnerability database (offline mode optional)
trivy image --download-db-only
export TRIVY_CACHE_DIR=/opt/trivy-db  # if using a custom cache
Ubuntu/Debian:
wget https://github.com/aquasecurity/trivy/releases/download/v0.48.3/trivy_0.48.3_Linux-64bit.deb
sudo dpkg -i trivy_0.48.3_Linux-64bit.deb
trivy image --download-db-only --cache-dir /opt/trivy-db
Typical scan commands:
# Scan an image and show only critical/high
trivy image --severity CRITICAL,HIGH nginx:latest
# JSON output for CI
trivy image -f json -o nginx-scan.json nginx:latest
2. Dockerfile security baseline
Use a multi‑stage build with a distroless or minimal base image, and run as a non‑root user.
# Build stage
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -a -ldflags '-extldflags "-static"' -o myapp
# Runtime stage (distroless)
FROM gcr.io/distroless/static-debian12:nonroot
USER nonroot:nonroot
WORKDIR /app
COPY --from=builder --chown=nonroot:nonroot /app/myapp .
EXPOSE 8080
ENTRYPOINT ["/app/myapp"]
Verification:
# Build image
docker build -t myapp:secure .
# Scan for high/critical issues
trivy image --severity HIGH,CRITICAL myapp:secure
# Confirm non‑root user
docker inspect myapp:secure | jq '.[0].Config.User'
# Expected: "nonroot"
3. CI/CD integration (GitLab CI example)
stages:
  - build
  - scan
  - sign
  - deploy

variables:
  IMAGE_NAME: myapp
  IMAGE_TAG: $CI_COMMIT_SHORT_SHA
  REGISTRY: registry.example.com

build:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  script:
    - docker build -t $REGISTRY/$IMAGE_NAME:$IMAGE_TAG .
    - docker push $REGISTRY/$IMAGE_NAME:$IMAGE_TAG

security-scan:
  stage: scan
  image: aquasec/trivy:latest
  script:
    - trivy image --exit-code 1 --severity CRITICAL $REGISTRY/$IMAGE_NAME:$IMAGE_TAG
    - trivy image --severity HIGH,CRITICAL --format json --output scan-report.json $REGISTRY/$IMAGE_NAME:$IMAGE_TAG
  artifacts:
    reports:
      container_scanning: scan-report.json
    expire_in: 30 days
  allow_failure: false

sign-image:
  stage: sign
  image: gcr.io/projectsigstore/cosign:v2.2
  script:
    - cosign sign --key cosign.key $REGISTRY/$IMAGE_NAME:$IMAGE_TAG
  only:
    - main

Key CI settings: --exit-code 1 aborts the pipeline on critical findings.
Artifacts store the JSON report for later audit.
Signing runs only on the main branch.
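The JSON artifact kept by the scan job can be turned into a quick per-severity summary with jq. A minimal sketch (the function name is mine; the field names follow Trivy's JSON output, and the `[]?` iterators tolerate targets with no findings):

```shell
# Summarize a Trivy JSON report: count findings per severity.
# Usage: summarize_report <path-to-trivy-json>
summarize_report() {
  jq -r '[.Results[]?.Vulnerabilities[]?]
         | group_by(.Severity)
         | map("\(.[0].Severity): \(length)")
         | .[]' "$1"
}
```

Running `summarize_report scan-report.json` prints one `SEVERITY: count` line per severity present, which is handy for a pipeline log or a Slack message.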
4. Image signing with Cosign
# Generate a key pair (store private key in CI secrets)
cosign generate-key-pair
# Sign an image
cosign sign --key cosign.key registry.example.com/myapp:v1.0
# Verify the signature
cosign verify --key cosign.pub registry.example.com/myapp:v1.0
Kubernetes admission control (OPA Gatekeeper or Sigstore Policy Controller) can enforce that only signed images are admitted. Example ConstraintTemplate and ClusterImagePolicy are omitted for brevity.
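Because tags are mutable, it is safer to verify against the image digest (the troubleshooting section below recommends the same). A sketch, with my assumptions: the helper name is mine, and `crane` (from go-containerregistry) is just one tool that resolves a tag to its manifest digest.

```shell
# Verify a Cosign signature against the immutable digest, not the tag.
# Usage: verify_by_digest <image:tag> <public-key-file>
verify_by_digest() {
  local image="$1" key="$2" digest
  digest=$(crane digest "$image") || return 1
  # Strip the tag and append the digest: repo:tag -> repo@sha256:...
  cosign verify --key "$key" "${image%:*}@${digest}"
}
```

Example: `verify_by_digest registry.example.com/myapp:v1.0 cosign.pub`.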
5. Kubernetes SecurityContext
apiVersion: v1
kind: Pod
metadata:
  name: secure-app
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10000
    fsGroup: 10000
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: registry.example.com/myapp:v1.0
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
          add: ["NET_BIND_SERVICE"]

Verification commands (these need a shell and coreutils inside the image; a distroless image has neither):
# Check running user
kubectl exec secure-app -- id
# Attempt to write to the root filesystem (should fail)
kubectl exec secure-app -- touch /test.txt
# List capabilities
kubectl exec secure-app -- capsh --print
6. NetworkPolicy – default deny and whitelist
# Default deny all traffic in the namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
# Example egress whitelist for a pod labeled app=myapp
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-app-egress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: myapp
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: mysql
      ports:
        - protocol: TCP
          port: 3306
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
    - to:
        - ipBlock:
            cidr: 203.0.113.0/24
      ports:
        - protocol: TCP
          port: 443

7. Pod Security Admission (Restricted)
# Enable Restricted PSA at the namespace level
kubectl label namespace production \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/audit=restricted \
  pod-security.kubernetes.io/warn=restricted

The Restricted profile blocks privileged containers, host namespaces, host ports, running as root, privilege escalation, hostPath volumes, and any capability beyond NET_BIND_SERVICE, and it requires a seccomp profile. Note that it does not enforce a read-only root filesystem; set readOnlyRootFilesystem in the SecurityContext yourself.
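To confirm enforcement, try to admit a pod that violates the profile; the API server should reject it at admission. A sketch wrapped as a function (the pod name is mine, and `--dry-run=server` exercises admission without creating anything):

```shell
# Probe PSA enforcement: server-side dry-run of a privileged pod,
# which the Restricted profile must reject.
# Usage: psa_probe [namespace]   (defaults to "production")
psa_probe() {
  kubectl apply --dry-run=server -n "${1:-production}" -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: psa-probe
spec:
  containers:
    - name: probe
      image: busybox
      securityContext:
        privileged: true
EOF
}
# Expect a rejection whose message mentions PodSecurity "restricted"
```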
8. Runtime protection with Falco
Install Falco via Helm in eBPF mode:
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update
helm install falco falcosecurity/falco \
  --namespace falco --create-namespace \
  --set driver.kind=ebpf \
  --set falcosidekick.enabled=true \
  --set falcosidekick.webui.enabled=true

Custom rules (saved as /etc/falco/rules.d/custom.yaml) example:
- rule: UnauthorizedProcessInContainer
  desc: Detect shell or package manager execution in production containers
  condition: >
    spawned_process and container and
    (proc.name in (sh, bash, ash, zsh, apt, apt-get, yum, dnf)) and
    container.image.repository != "debug-tools"
  output: "Unauthorized process started (user=%user.name command=%proc.cmdline container=%container.name image=%container.image.repository)"
  priority: WARNING
  tags: [process, mitre_execution]

- rule: WriteToNonWhitelistedDirectory
  desc: Detect file writes outside /tmp or /app/cache
  condition: >
    open_write and container and
    not fd.directory in (/tmp, /app/cache) and
    not fd.name startswith /proc
  output: "File write to unexpected location (file=%fd.name command=%proc.cmdline container=%container.name)"
  priority: ERROR
  tags: [filesystem, mitre_persistence]

- rule: OutboundConnectionToSuspiciousIP
  desc: Detect connections to known malicious IPs or unusual ports
  condition: >
    outbound and container and
    (fd.sport in (22, 3389, 4444, 6667) or fd.sip in (198.51.100.0/24))
  output: "Suspicious outbound connection (ip=%fd.sip port=%fd.sport command=%proc.cmdline container=%container.name)"
  priority: CRITICAL
  tags: [network, mitre_command_and_control]

Trigger a rule by executing a shell in a pod and view logs:
kubectl exec -it myapp-pod -- /bin/sh
kubectl logs -n falco -l app.kubernetes.io/name=falco | grep "Unauthorized process"
9. Image registry hardening with Harbor
Enable automatic scanning and severity enforcement via the Harbor API:
curl -u admin:Harbor12345 -X PUT \
  "https://harbor.example.com/api/v2.0/projects/myproject" \
  -H "Content-Type: application/json" \
  -d '{
    "metadata": {
      "auto_scan": "true",
      "severity": "high",
      "prevent_vul": "true"
    }
  }'

Configure a webhook policy that calls a Cosign verifier to reject unsigned images, and optionally add a CVE allowlist.
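To confirm that auto-scan actually ran for a pushed artifact, the Harbor v2.0 API exposes a scan overview per artifact. A sketch (the helper name is mine; host and credentials match the example above; repository names containing "/" must be URL-encoded):

```shell
# Fetch the scan overview for an artifact, e.g. to read its severity
# summary after an auto-scan.
# Usage: harbor_scan_overview <project/repository> <tag-or-digest>
harbor_scan_overview() {
  local repo="$1" ref="$2"
  curl -s -u admin:Harbor12345 \
    "https://harbor.example.com/api/v2.0/projects/${repo%%/*}/repositories/${repo#*/}/artifacts/${ref}?with_scan_overview=true" \
    | jq '.scan_overview'
}
```

Example: `harbor_scan_overview myproject/myapp v1.0`.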
10. Periodic scanning and remediation
Example Bash script that scans all images running in a namespace, reports critical/high findings to Slack, and cleans old reports:
#!/bin/bash
set -euo pipefail

NAMESPACE="production"
REPORT_DIR="/var/log/trivy"
SLACK_WEBHOOK="https://hooks.slack.com/services/XXX"

mkdir -p "$REPORT_DIR"

IMAGES=$(kubectl get pods -n "$NAMESPACE" \
  -o jsonpath='{.items[*].spec.containers[*].image}' | tr ' ' '\n' | sort -u)

for IMAGE in $IMAGES; do
  SAFE=$(echo "$IMAGE" | tr '/:' '_')
  REPORT="$REPORT_DIR/${SAFE}_$(date +%Y%m%d).json"
  trivy image --severity CRITICAL,HIGH --format json --output "$REPORT" "$IMAGE"
  # The []? iterators tolerate report entries without a Vulnerabilities key
  CRIT=$(jq '[.Results[]?.Vulnerabilities[]? | select(.Severity=="CRITICAL")] | length' "$REPORT")
  HIGH=$(jq '[.Results[]?.Vulnerabilities[]? | select(.Severity=="HIGH")] | length' "$REPORT")
  if [[ $CRIT -gt 0 || $HIGH -gt 5 ]]; then
    MSG="⚠️ Image $IMAGE has $CRIT CRITICAL and $HIGH HIGH vulnerabilities."
    curl -X POST -H 'Content-type: application/json' --data "{\"text\":\"$MSG\"}" "$SLACK_WEBHOOK"
  fi
done

# Delete reports older than 30 days
find "$REPORT_DIR" -name "*.json" -mtime +30 -delete

Deploy the script as a Kubernetes CronJob that runs daily at 02:00, using a ServiceAccount with read access to pods.
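The CronJob can be created imperatively; a sketch under these assumptions: the script is baked into a scanner image at /scripts/scan.sh, and the image, ServiceAccount, and namespace names are mine.

```shell
# Create the daily 02:00 scan CronJob and attach a read-only ServiceAccount.
deploy_scan_cronjob() {
  kubectl create cronjob trivy-scan \
    --namespace production \
    --schedule="0 2 * * *" \
    --image=registry.example.com/trivy-scanner:latest \
    -- /bin/bash /scripts/scan.sh

  # kubectl create cronjob has no ServiceAccount flag, so patch it in
  kubectl patch cronjob trivy-scan -n production --type=merge -p \
    '{"spec":{"jobTemplate":{"spec":{"template":{"spec":{"serviceAccountName":"trivy-scanner"}}}}}}'
}
```

The ServiceAccount only needs `get`/`list` on pods in the namespace, granted via a Role and RoleBinding.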
11. Monitoring and alerting
Prometheus can scrape Falco exporter metrics via a ServiceMonitor:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: falco
  namespace: falco
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: falco
  endpoints:
    - port: metrics
      interval: 30s

Sample alert rules (container-security.yaml):
groups:
  - name: container-security
    rules:
      - alert: HighSeverityVulnerabilitiesDetected
        expr: sum(falco_events_total{priority="Critical"}) > 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High severity security events detected"
      - alert: UnauthorizedProcessExecution
        expr: rate(falco_events_total{rule="UnauthorizedProcessInContainer"}[5m]) > 0
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Shell execution detected in production container"
      - alert: ImagePullFromUntrustedRegistry
        expr: kube_pod_container_info{image!~"registry.example.com/.*"} == 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Pod using image from untrusted registry"
      - alert: PodRunningAsRoot
        expr: |
          kube_pod_container_status_running{} == 1 and
          on(namespace, pod, container) kube_pod_container_info{container_id!="", image!~".*debug.*"}
          unless on(namespace, pod, container) kube_pod_security_context_run_as_non_root == 1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Container running as root user"

12. Performance considerations
Typical Trivy scan times:
Alpine image ≈ 3‑5 s
Debian/Ubuntu ≈ 10‑20 s
Large > 1 GB ≈ 30‑60 s
Cache the vulnerability database on fast storage to reduce first‑run latency. Adjust Falco syscall_event_drops.threshold to limit CPU usage. Scale Harbor scan workers via max_job_workers according to CPU cores.
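The cache advice is easy to verify empirically; a quick sketch (the cache path is the one assumed earlier in this guide, and --clear-cache forces a cold run):

```shell
# Compare cold- vs warm-cache scan latency.
export TRIVY_CACHE_DIR=/opt/trivy-db

trivy image --clear-cache        # drop cached scan results for a cold run
time trivy image alpine:3.19     # cold: rebuilds the scan cache
time trivy image alpine:3.19     # warm: typically a few seconds
```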
13. Compliance checks
Example mapping of standards to Kubernetes configuration:
CIS Docker 4.1 – runAsNonRoot: true
CIS Docker 5.7 – readOnlyRootFilesystem: true
NIST 800‑190 – CI Trivy scan before release
PCI‑DSS 6.2 – Weekly remediation of high‑severity CVEs
SOC 2 – Image signing with Cosign and audit logs
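The CIS 4.1 mapping above can be spot-checked across a namespace; a sketch (namespace is an assumption, and for brevity this only inspects container-level securityContext, not the pod-level one):

```shell
# List containers that do not set runAsNonRoot in their own
# securityContext (CIS Docker 4.1 spot check).
kubectl get pods -n production -o json | jq -r '
  .items[]
  | .metadata.name as $pod
  | .spec.containers[]
  | select((.securityContext.runAsNonRoot // false) | not)
  | "\($pod)/\(.name)"'
```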
14. Common troubleshooting
Trivy timeout: network to NVD blocked – use --skip-db-update or an offline DB mirror.
Cosign verification fails: mismatched key or tag – verify the key pair and use the image digest instead of a mutable tag.
Pod rejected by PSA: violates Restricted – adjust the SecurityContext or temporarily label the namespace with baseline.
Falco high CPU: event sampling too aggressive – increase syscall_event_drops.threshold or switch to the eBPF driver.
NetworkPolicy blocks legitimate traffic: missing egress rule – add the required "to" entry.
Harbor scan queue backlog: insufficient workers – raise max_job_workers and add scan nodes.
15. Change and rollback playbook
Canary deployment with pre‑flight scan and signature:
# Scan the new image; rely on Trivy's exit code instead of grepping the
# output (grep -q "Total: 0" passes if any single scan target is clean,
# even when another target has findings)
if ! trivy image --exit-code 1 --severity CRITICAL,HIGH myapp:v2.0; then
  echo "Vulnerabilities found – abort"
  exit 1
fi
# Sign image
cosign sign --key cosign.key registry.example.com/myapp:v2.0
# Deploy canary (10% traffic)
kubectl set image deployment/myapp app=registry.example.com/myapp:v2.0
kubectl rollout pause deployment/myapp
kubectl wait --for=condition=ready pod -l app=myapp,version=v2.0 --timeout=300s
# Simple health check loop (5 min)
for i in {1..10}; do
  ERR=$(kubectl logs -l app=myapp,version=v2.0 --tail=100 | grep -c ERROR || true)
  if [[ $ERR -gt 5 ]]; then
    echo "High error rate – rolling back"
    kubectl rollout undo deployment/myapp
    exit 1
  fi
  sleep 30
done
# Full rollout
kubectl rollout resume deployment/myapp
kubectl rollout status deployment/myapp --timeout=600s

An emergency rollback script that tags the current image, rolls back to the previous revision, rescans the rolled‑back image, and sends a Slack alert follows the same pattern.
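A sketch of that emergency rollback, wrapped as a function. Assumptions beyond the source: the deployment name and webhook default to the placeholders used above, docker is used locally to keep a forensic tag, and image references are tag-style (not digest-pinned).

```shell
# Emergency rollback: tag the bad image for forensics, roll back,
# rescan whatever is running afterwards, and alert Slack.
# Usage: emergency_rollback [deployment] [slack-webhook-url]
emergency_rollback() {
  local deploy="${1:-myapp}"
  local webhook="${2:-https://hooks.slack.com/services/XXX}"
  local bad good

  # 1. Record the currently deployed image and keep a forensic tag
  bad=$(kubectl get deployment "$deploy" \
    -o jsonpath='{.spec.template.spec.containers[0].image}')
  docker pull "$bad" && docker tag "$bad" "${bad%%:*}:incident-$(date +%s)"

  # 2. Roll back to the previous revision and wait for it
  kubectl rollout undo deployment/"$deploy"
  kubectl rollout status deployment/"$deploy" --timeout=300s

  # 3. Rescan the image that is running after the rollback
  good=$(kubectl get deployment "$deploy" \
    -o jsonpath='{.spec.template.spec.containers[0].image}')
  trivy image --severity CRITICAL,HIGH "$good"

  # 4. Notify
  curl -X POST -H 'Content-type: application/json' \
    --data "{\"text\":\"Rolled back $deploy from $bad to $good\"}" "$webhook"
}
```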
16. Best practices summary
Multi‑stage builds with Distroless or minimal base to reduce attack surface.
CI blocks CRITICAL, alerts HIGH, ignores MEDIUM unless required.
Enforce image signatures in production; use immutable digests.
Apply Restricted PSA, read‑only root filesystem, non‑root user.
Namespace‑level default‑deny NetworkPolicy with explicit egress whitelist.
Run Trivy on every build, daily full scan in production, real‑time Falco monitoring for critical services.
Vulnerability response SLA: CRITICAL ≤ 24 h, HIGH ≤ 7 d, MEDIUM ≤ 30 d assessment.
Manage false positives with .trivyignore and ticket tracking.
Use dedicated ServiceAccount for image pulls; disable default SA.
Enable Kubernetes audit logs and Harbor access logs; retain ≥ 90 days.
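The .trivyignore practice from the list above looks like this in use; Trivy reads the file from the working directory automatically (the CVE ID and ticket reference here are placeholders, not real findings):

```shell
# Suppress an accepted finding; track every entry in a ticket with an
# expiry date so suppressions do not rot.
cat > .trivyignore <<'EOF'
# Accepted: no fix available, mitigated by NetworkPolicy (TICKET-123)
CVE-2023-0000
EOF
trivy image --severity HIGH,CRITICAL myapp:secure
```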
17. References
Trivy releases: https://github.com/aquasecurity/trivy/releases
Cosign documentation: https://github.com/sigstore/cosign
Falco Helm chart: https://falcosecurity.github.io/charts
Harbor API v2.0: https://github.com/goharbor/harbor
Kubernetes Pod Security Admission: https://kubernetes.io/docs/concepts/security/pod-security-admission/
Raymond Ops
Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.