
Answering the Top 9 Questions About Monitoring in Kubernetes

This article discusses essential Kubernetes monitoring topics, including cost tracking, tool selection, observability frameworks, responsibility allocation, baseline establishment, namespace best practices, the importance of monitoring, backup solutions, and a comparison of Datadog versus Splunk for metrics.

Cloud Native Technology Community

Jul 9, 2024


Kubernetes exposes far more signals than any team can watch, so the first task is deciding which ones matter most. A recent webinar covered what to monitor on a Kubernetes platform, which best practices to follow, and why monitoring is vital for cloud-native application development.

1. How to monitor cost? You need to know what each workload currently costs, what drives that cost, and how to reduce it. Monitoring reveals workloads that consistently hit their CPU or memory thresholds, so you can adjust resource allocations and analyze cost trends over time.
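As a rough illustration of the two ideas above (pricing a workload and spotting one that sits at its CPU threshold), here is a minimal Python sketch. The prices, the `Workload` fields, and the 90% / 80% cut-offs are all assumptions made for the example, not values from the webinar or from any cloud provider.

```python
from dataclasses import dataclass

# Hypothetical unit prices for illustration only; substitute your provider's real rates.
CPU_PRICE_PER_CORE_HOUR = 0.03
MEM_PRICE_PER_GIB_HOUR = 0.004


@dataclass
class Workload:
    name: str
    cpu_request_cores: float        # CPU requested by the workload
    mem_request_gib: float          # memory requested by the workload
    cpu_usage_samples: list[float]  # observed CPU usage in cores, one value per interval


def hourly_cost(w: Workload) -> float:
    """Estimate hourly cost from requested resources, since requests reserve capacity."""
    return (w.cpu_request_cores * CPU_PRICE_PER_CORE_HOUR
            + w.mem_request_gib * MEM_PRICE_PER_GIB_HOUR)


def consistently_at_threshold(w: Workload, threshold: float = 0.9,
                              min_fraction: float = 0.8) -> bool:
    """True if usage reaches threshold * request in at least min_fraction of samples."""
    if not w.cpu_usage_samples:
        return False
    limit = threshold * w.cpu_request_cores
    hits = sum(1 for s in w.cpu_usage_samples if s >= limit)
    return hits / len(w.cpu_usage_samples) >= min_fraction
```

A workload flagged by `consistently_at_threshold` is a candidate for a larger request, while one whose usage stays far below its request is a candidate for a smaller one. Either change alters the figure returned by `hourly_cost`, which is what makes cost trackable over time.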

2. How to choose monitoring tools? Prometheus, Grafana, Datadog, and Fairwinds Insights work well together. The first three provide real-time monitoring and alerting on application issues, while Fairwinds Insights surfaces misconfigurations and over-provisioning. Combined, they give comprehensive coverage.

3. What is the best observability framework in Kubernetes? There are several good options. Internally we use Datadog for its ease of use and powerful features. OpenTelemetry is an open standard that provides APIs, SDKs, and tools for generating, collecting, and exporting telemetry data. Prometheus and Grafana are also top choices. The right pick depends on usability, cost, and community support.

4. Who is responsible for application metrics and dashboard alerts? Ideally, a platform or SRE team monitors core node metrics, Kubernetes services, the control plane, and add-on components. Application teams monitor the logs their own applications generate, including job starts and scaling events. The two groups need to collaborate when resource constraints arise.

5. How to establish a baseline before adopting adaptive golden signal tracking? Building a baseline is an ongoing process: you keep refining what you monitor, how dashboards display it, and how alerts are configured. Start by tracking the four golden signals (latency, traffic, errors, and saturation) and observe what normal behavior looks like, especially during early application startup.
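One simple way to turn "observe normal behavior" into something an alert can use is to summarize healthy readings as a mean and standard deviation, then flag values that fall far outside that range. The sketch below assumes this approach. The latency readings, the function names, and the three-sigma rule are illustrative choices, not a method prescribed by the webinar.

```python
from statistics import mean, stdev


def build_baseline(samples: list[float]) -> tuple[float, float]:
    """Return (mean, standard deviation) of samples observed during normal operation."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to build a baseline")
    return mean(samples), stdev(samples)


def is_anomalous(value: float, baseline: tuple[float, float], k: float = 3.0) -> bool:
    """Flag a value more than k standard deviations away from the baseline mean."""
    mu, sigma = baseline
    return abs(value - mu) > k * sigma


# Hypothetical latency readings in milliseconds, collected while the service was healthy.
normal_latency_ms = [120.0, 118.0, 125.0, 122.0, 119.0, 121.0]
latency_baseline = build_baseline(normal_latency_ms)
```

The same two functions apply to traffic, error rate, and saturation. Because the baseline is meant to evolve, `build_baseline` would be re-run periodically on a recent window of healthy data rather than computed once.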

6. What are best practices for namespaces? Using namespaces at all is the first best practice. Avoid deploying every application into the default namespace, which makes permissions and resource management chaotic. Separate namespaces by team or application. Kubernetes namespaces themselves are flat, so any hierarchy comes from naming conventions or add-on tooling, and labels can provide further distinction.
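A quick audit for the default-namespace problem can be sketched in Python. The inventory format here (dictionaries with `name` and `namespace` keys) is hypothetical. In practice the records would come from the cluster API or a scan of your manifests. The sketch treats a missing namespace as `default`, on the assumption that the manifest is applied without a namespace override.

```python
from collections import defaultdict


def find_default_namespace_workloads(workloads: list[dict]) -> list[str]:
    """Return names of workloads that would land in the 'default' namespace."""
    return [w["name"] for w in workloads
            if w.get("namespace") in (None, "", "default")]


def group_by_namespace(workloads: list[dict]) -> dict[str, list[str]]:
    """Group workload names by namespace, treating a missing namespace as 'default'."""
    groups: dict[str, list[str]] = defaultdict(list)
    for w in workloads:
        groups[w.get("namespace") or "default"].append(w["name"])
    return dict(groups)
```

Running `find_default_namespace_workloads` against an inventory gives a to-do list of workloads to move into a team or application namespace.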

7. Why is monitoring critical in a Kubernetes platform? Monitoring is indispensable whether or not you run Kubernetes. Without it, you cannot understand the health of your environment or detect issues promptly. Monitoring helps you catch performance degradation and resource pressure before they turn into a bad user experience.

8. Is there a backup system to export the entire cluster or specific nodes? The open‑source solution Velero can back up and restore Kubernetes clusters. It is recommended to define all infrastructure as code so that a failed cluster can be quickly rebuilt by re‑applying the code.

9. Which is better for metric monitoring, Datadog or Splunk? There is no fixed recommendation; internally we use Datadog for its strong log management and Kubernetes metric integration. Splunk offers similar capabilities, so testing both on a small cluster is advised to determine the best fit.

Proper monitoring in Kubernetes helps you manage complexity, collect cluster events, logs, traces, and metrics, and set alerts so you can respond to issues quickly.




