Tagged articles
661 articles
Java High-Performance Architecture
Feb 10, 2020 · Backend Development

How to Monitor Spring Boot Apps with Prometheus and Grafana: Step‑by‑Step Guide

This tutorial walks through building a Spring Boot application, integrating Micrometer for metric collection, deploying Prometheus and Grafana via Docker, configuring dynamic service discovery, and creating custom request‑count metrics with AOP, providing a complete end‑to‑end monitoring solution.

Docker · Grafana · Micrometer
15 min read
360 Tech Engineering
Jan 7, 2020 · Operations

Introduction to Prometheus and Grafana for Monitoring and Alerting

This article provides a comprehensive overview of using Prometheus and Grafana for metric collection, storage, querying with PromQL, visualization, and alerting, including exporter integration, metric types, high‑availability setups, and practical examples for modern microservice architectures.

Grafana · Metrics · Prometheus
10 min read
Aikesheng Open Source Community
Dec 25, 2019 · Operations

Deploying Thanos for Unified Prometheus Monitoring and Long‑Term Storage

This guide explains the background, key features, architecture, and step‑by‑step deployment of Thanos—including Sidecar, Store, Query, Compact, Bucket, Rule, and Check components—to provide a unified, high‑availability Prometheus monitoring view with unlimited historical data storage using object storage.

Cloud Native · Deployment · Long‑term Storage
9 min read
Efficient Ops
Dec 24, 2019 · Operations

Scaling Real‑Time Monitoring for Billion‑Call Billing with Prometheus

Jiangsu Mobile’s IT operations team partnered with Newland to build a high‑availability, real‑time performance management platform using Prometheus, achieving billion‑level call‑record monitoring, low‑latency queries, data compression, and advanced forecasting, dramatically improving system health visibility and operational efficiency.

Prometheus · Time Series Database · performance management
10 min read
Huajiao Technology
Dec 17, 2019 · Backend Development

Diagnosing Java Memory Leaks: JVM GC Roots, Monitoring with Spring Boot Actuator, Prometheus, Grafana, and MAT

This article explains how Java memory leaks can occur despite automatic garbage collection, describes JVM reachability analysis, shows how to monitor and detect leaks using Spring Boot Actuator, Prometheus, and Grafana, and provides step‑by‑step instructions for heap dump analysis and code fixes.

Garbage Collection · Grafana · JVM
11 min read
Alibaba Cloud Native
Nov 30, 2019 · Cloud Native

How Alibaba Cloud Manages Over 10,000 Kubernetes Clusters at Double‑11 Scale

This article explains how Alibaba Cloud Container Service (ACK) designs a unit‑based, tiered management system, capacity planning model, global observability architecture, and pluggable components to reliably operate more than ten thousand diverse Kubernetes clusters during the massive Double‑11 shopping event.

ACK · Alibaba Cloud · Cluster Management
13 min read
MaGe Linux Operations
Nov 26, 2019 · Operations

Master Prometheus: From Basics to Advanced Configuration and Alerts

This article introduces Prometheus, an open‑source monitoring system, explains its core components such as server, exporters, and Alertmanager, provides step‑by‑step installation and configuration instructions, demonstrates alert rule setup, and shows integration with tools like Grafana, Telegraf, Spring Boot and Canal.

Alertmanager · DevOps · Grafana
10 min read
Alibaba Cloud Native
Nov 18, 2019 · Cloud Native

How Kubernetes Monitoring Evolved: From Heapster to Metrics‑Server and Prometheus

This article explains the fundamentals of monitoring and logging in large‑scale Kubernetes clusters, classifies monitoring types, traces the evolution from Heapster to the lightweight metrics‑server, outlines the three Kubernetes monitoring APIs, reviews Prometheus as the de‑facto standard, and describes Alibaba Cloud’s enhanced monitoring and logging solutions.

Kubernetes · Prometheus · logging
24 min read
Alibaba Cloud Native
Nov 14, 2019 · Cloud Native

What’s New in Cloud Native: Helm 3, Kubernetes 1.17, Istio Updates and More

This roundup highlights the latest cloud‑native announcements, including Helm 3’s stable release, the GitHub Octoverse language trends, upcoming KubeCon North America, CNCF’s Prometheus report, Kubernetes 1.17 code freeze, key upstream feature improvements, and a curated list of open‑source projects and reading recommendations.

Kubernetes · Prometheus · helm
9 min read
dbaplus Community
Oct 28, 2019 · Operations

Avoid Common Prometheus Pitfalls: Best Practices for Reliable Monitoring

This article shares practical Prometheus best‑practice tips, covering the accuracy‑reliability trade‑off, self‑monitoring setups, avoiding NFS storage, pruning high‑cardinality metrics, handling rate‑function traps, alert‑graph mismatches, group_interval effects, and the overarching goal of stable, cost‑effective observability.

Alerting · Operations · Prometheus
9 min read
Efficient Ops
Oct 22, 2019 · Operations

How Modern IT Monitoring Systems Keep Your Services Running Smoothly

This article explains the purpose, core functions, classification, layered architecture, and popular implementations of IT monitoring systems, covering log‑based, trace‑based, and metric‑based approaches as well as a comparison of Zabbix and Prometheus.

IT monitoring · Observability · Prometheus
17 min read
Programmer DD
Sep 20, 2019 · Operations

Master Prometheus: Key Features, Architecture, and Query Essentials

This article introduces Prometheus, an open‑source cloud‑native monitoring and alerting system, covering its main characteristics, core components, architecture diagram, typical use cases, query language syntax, built‑in functions, time‑series types, and practical tips for reliable operation.

Alerting · Operations · PromQL
9 min read
dbaplus Community
Sep 16, 2019 · Operations

How to Build Effective Monitoring for Microservices: Logs, Tracing, and Metrics Explained

This article explains the three main monitoring approaches—log collection, distributed tracing, and metric gathering—in microservice architectures, outlines the layered monitoring model, lists key system, application, and user metrics, and reviews popular open‑source time‑series monitoring tools such as Prometheus, OpenTSDB, and InfluxDB.

Metrics · Microservices · Observability
10 min read
DevOps Cloud Academy
Sep 5, 2019 · Operations

An Overview of the Prometheus Monitoring System

Prometheus, an open‑source monitoring and alerting toolkit originally developed by SoundCloud and now a CNCF project, offers multidimensional data models, flexible queries, pull‑based data collection, various metric types (counter, gauge, summary, histogram), local and remote storage, service discovery, and integrates with Grafana for visualization.

Cloud Native · Metrics · Observability
8 min read
Programmer DD
Aug 13, 2019 · Operations

Mastering Prometheus Histograms: How Cumulative Buckets Simplify Metrics

This article explains the fundamentals of Prometheus histogram metrics, illustrates why they are cumulative, shows how to drop unwanted buckets with relabeling, and demonstrates quantile calculations using the histogram_quantile function, providing practical examples and code snippets for effective monitoring.

Histogram · Metrics · Observability
7 min read
dbaplus Community
Jul 29, 2019 · Operations

How to Build a Cost‑Effective, Multi‑Layer Monitoring System for Distributed Applications

This article explains why comprehensive, multi‑layer monitoring is essential for distributed systems, outlines environment, program, and business metrics, recommends practical tools such as Zabbix, open‑falcon, Prometheus and Grafana, and provides a step‑by‑step evolution plan and alerting strategy.

Distributed Systems · Metrics · Observability
10 min read
dbaplus Community
Jul 23, 2019 · Cloud Native

How Xiaomi Scaled Kubernetes Monitoring with Prometheus and Open‑Falcon

This article details the challenges that Xiaomi's Ocean elastic scheduling platform faced in monitoring massive Kubernetes clusters, its transition from Open‑Falcon to a Prometheus‑based solution with remote storage, partitioned deployment strategies, performance testing, and future plans for automated scaling and data analytics.

Cloud Native · Kubernetes · Prometheus
16 min read
360 Zhihui Cloud Developer
Jul 18, 2019 · Operations

Why Bosun Beats Alertmanager and Kapacitor for Container Alerting

This article compares three container alerting frameworks—Alertmanager, Kapacitor, and Bosun—explains why Bosun was chosen for its flexible HTTP API rule deployment and low learning curve, and provides step‑by‑step configuration, rule definition, notification, and templating examples for integrating Bosun with Prometheus.

Alerting · Bosun · Configuration
9 min read
dbaplus Community
Jul 17, 2019 · Databases

Rethinking Prometheus TSDB: From V2 Bottlenecks to the Scalable V3 Design

This article examines the limitations of Prometheus's original V2 time‑series storage, proposes a block‑oriented V3 architecture that tackles series churn, write amplification, and indexing inefficiencies, and validates the new design with extensive benchmarks showing dramatic reductions in memory, CPU, and disk usage.

Kubernetes · Prometheus · TSDB
36 min read
ITPUB
Jun 21, 2019 · Cloud Native

Building a Scalable, High‑Availability Kubernetes Monitoring System with Prometheus

This article details the design and evolution of a highly available, persistent, and dynamically adjustable Kubernetes monitoring solution at Xiaomi, covering initial Falcon‑based approaches, the transition to Prometheus with remote storage via OpenTSDB, federation‑based partitioning, deployment strategies, performance testing, and future enhancements.

Cloud Native · Falcon · Kubernetes
17 min read
DevOps Cloud Academy
Jun 20, 2019 · Operations

Step-by-Step Installation and Configuration of Node Exporter, Alertmanager, Prometheus, and Grafana for Monitoring and Alerting

This guide walks through downloading, extracting, and setting up Node Exporter, Alertmanager, Prometheus, and Grafana on a Linux server, configuring their systemd services, customizing alert rules, and verifying the monitoring and alerting pipeline with screenshots of each verification step.

Alertmanager · Grafana · Operations
7 min read
DevOps Cloud Academy
Jun 9, 2019 · Operations

Prometheus Metric Definitions, Types, and Data Samples

This article explains Prometheus metric naming conventions, label usage, metric types such as Counter, Gauge, Summary, and Histogram, and describes the structure of data samples, providing examples and best‑practice guidelines for defining and classifying metrics in monitoring systems.

Metrics · Observability · Operations
5 min read
dbaplus Community
Apr 24, 2019 · Operations

Choosing and Tuning Open‑Source Monitoring Stacks for Large‑Scale Operations

This article reviews common open‑source monitoring tools, shares the evolution of China Unicom's big‑data platform monitoring, and provides practical guidance on selecting collectors, databases, and visualization components, with detailed configurations for Prometheus, Alertmanager, and Grafana, and techniques for automated recovery.

Alertmanager · Grafana · InfluxDB
19 min read
58 Tech
Apr 19, 2019 · Operations

Prometheus-Based Monitoring Solution for the 58 Cloud Search Platform

This article describes the challenges of scaling the 58 Cloud Search service, explains why Prometheus was selected as the monitoring stack, and details the architecture, data collection, storage, alerting, visualization, and future enhancements of the resulting cloud‑native monitoring system.

Alertmanager · Cloud Native · Grafana
12 min read
Efficient Ops
Apr 18, 2019 · Operations

Choosing the Right Monitoring Stack: From Nagios to Prometheus & Grafana

This article reviews common open‑source monitoring combinations, compares their strengths and weaknesses, and shares practical guidance on selecting collectors, storage back‑ends, and visualization tools such as Telegraf, InfluxDB, Prometheus, Grafana, and Alertmanager for large‑scale data platform operations.

Grafana · InfluxDB · Nagios
12 min read
Programmer DD
Jan 24, 2019 · Cloud Native

What’s New in Nacos 0.8.0? Key Features, Installation & First‑Run Guide

The article introduces Nacos 0.8.0, highlighting its three major production features—user login, Prometheus metrics, and namespace isolation—while providing step‑by‑step download links, startup commands for Linux and Windows, and instructions to access the default login console.

Cloud Native · Prometheus · installation guide
4 min read
360 Tech Engineering
Dec 18, 2018 · Cloud Native

Design and Implementation of 360 Container Platform Monitoring System

The article describes how 360 built a Kubernetes‑based container platform monitoring system using Prometheus, ELK, Grafana and custom components, detailing its architecture, monitoring dimensions, log collection, alerting, selection rationale, high‑availability design, and future evolution for scalable cloud‑native operations.

Kubernetes · Prometheus · container platform
12 min read
Liulishuo Tech Team
Dec 14, 2018 · Mobile Development

Engineering Practice: Building an Android Application Performance Management (APM) Dashboard

This article details the architectural design and engineering practices behind building a comprehensive Application Performance Management dashboard for Android applications, covering real-time monitoring, version comparison, development cycle tracking, automated data collection, and integrated test coverage analysis to ensure sustainable software quality and delivery efficiency.

APM · Android Development · Grafana
21 min read
Efficient Ops
Jun 11, 2018 · Operations

How to Build Low-Cost Automated Operations with Prometheus, Ansible, and Jenkins

This guide walks small teams through step‑by‑step implementation of low‑cost automated operations, covering basic monitoring with Prometheus, configuration versioning via Ansible, CI/CD pipelines using Jenkins, and scaling practices, enabling gradual evolution toward enterprise‑grade DevOps architectures.

Ansible · DevOps · Jenkins
12 min read
UCloud Tech
Nov 22, 2017 · Backend Development

Master Go Microservices: gRPC, TLS, Tracing & Prometheus Monitoring

This article shares practical Go microservice building experiences, covering gRPC-based communication, TLS security, request tracing, and comprehensive monitoring with Prometheus, including metric selection, alerting, and log management using Logrus and Graylog, to help reduce coupling and improve system observability.

Microservices · Prometheus · gRPC
10 min read
dbaplus Community
Nov 19, 2017 · Operations

Designing Scalable Monitoring with ELK and GPE: A Practical Guide

This article outlines a large‑scale monitoring solution for distributed microservice environments, comparing traditional ELK logging with a custom GPE stack (Grafana, Prometheus, Exporter, Consul), detailing architecture, components, workflows, and practical considerations for reliable observability.

ELK · Grafana · Prometheus
10 min read
360 Zhihui Cloud Developer
Aug 30, 2017 · Operations

Mastering Prometheus: From Metrics Basics to High‑Availability Monitoring

This article shares practical experiences of using Prometheus for monitoring complex services, covering metric types, PromQL query techniques, naming conventions, service discovery with file‑based configs, high‑availability sharding, alerting via Alertmanager, and visualisation with Grafana, providing actionable guidance for reliable observability.

Grafana · PromQL · Prometheus
15 min read
DevOps
Jul 12, 2017 · Cloud Native

Container Monitoring: Challenges, Metrics Collection, and Best Practices

This article examines the unique challenges of monitoring containers, outlines three categories of metrics to collect, compares host‑centric and layered monitoring architectures, provides detailed methods for gathering CPU, memory, I/O and network data via cgroup files and Docker commands, and shares practical insights, tooling recommendations, and a Q&A session for effective container observability.

Docker · Ops · Prometheus
18 min read
Efficient Ops
Jun 11, 2017 · Operations

How Bilibili Scaled Its Ops: From DIY Deployments to Prometheus Monitoring

From early manual deployments to a sophisticated, multi-layered monitoring stack—including ELK, Zabbix, Statsd, Grafana, and Prometheus—Bilibili’s ops team shares the evolution, challenges, and lessons learned in building scalable, automated infrastructure for massive internet traffic.

DevOps · ELK · Grafana
8 min read
dbaplus Community
Jun 5, 2017 · Cloud Native

How to Tackle Performance Optimization in Large‑Scale Kubernetes PaaS Platforms

This article examines the daunting performance‑optimization challenges of a complex PaaS architecture, breaks the system into control, data, and monitoring subsystems, defines concrete metrics, demonstrates testing with Prometheus and other tools, and shares practical automation techniques to accelerate iterative improvements.

Cloud Native · Kubernetes · PaaS
16 min read
dbaplus Community
Aug 19, 2016 · Operations

Unlocking System Reliability: The Value and Complete Architecture of Monitoring for Containers

This article explains why monitoring is essential for system reliability, outlines the key components of a comprehensive monitoring framework, compares data collection methods, and presents practical container monitoring solutions—from Docker stats to cAdvisor with InfluxDB and Grafana, as well as Kubernetes and Mesos integrations.

Grafana · Kubernetes · Prometheus
14 min read