
Design and Implementation of Multi‑Cluster HPA Metrics Collection, Analysis, and Reporting in Kubernetes

This article explains the background, benefits, and measurement criteria of the Kubernetes Horizontal Pod Autoscaler (HPA). It describes the metric tables and SQL queries used to collect scaling events and CPU usage, and presents a Python workflow that aggregates the data, stores daily reports, validates the results, and sends automated email summaries.

Qunar Tech Salon

Jun 22, 2022


Background

The Kubernetes Horizontal Pod Autoscaler (HPA) automatically adjusts the number of pod replicas of a workload based on CPU, memory, or custom metrics. This is especially important in multi-cluster deployments, where cluster weights can be adjusted manually or automatically to reflect real-time resource availability, so the load each cluster receives can change at any time.
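For reference, the Kubernetes documentation describes the core HPA calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The controller skips scaling when that ratio is within a tolerance of 1.0 (0.1 by default) and always keeps the result between minReplicas and maxReplicas. The sketch below mirrors that rule for a single metric only; it deliberately ignores stabilization windows, scaling policies, and pod readiness, which the real controller also applies.

```python
import math


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, min_replicas: int,
                     max_replicas: int, tolerance: float = 0.1) -> int:
    """Single-metric HPA rule: scale by the ratio of observed to target value."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # The HPA never goes below minReplicas or above maxReplicas
    return max(min_replicas, min(max_replicas, desired))


# 4 pods at 90% average CPU against a 60% target: ceil(4 * 1.5) = 6
print(desired_replicas(4, 90, 60, min_replicas=2, max_replicas=10))
```

Note the final clamp: even when the load would justify a single pod, the result cannot drop below min_replicas.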

Benefits of HPA

Improves resource utilization.

Reduces manual operational effort.

Provides clear visibility into workload resource demands for better operational decisions.

Measurement Indicators

Key metrics include scale-up and scale-down counts, upper- and lower-limit adjustments, HPA thresholds, minimum and maximum replica counts, and CPU usage statistics for peak and off-peak periods.
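The peak and off-peak CPU statistics can be pictured as percentiles computed separately over two time windows. The following is a minimal sketch under stated assumptions: samples arrive as hypothetical (hour, cpu_percent) pairs, PEAK_HOURS is an illustrative window (the source does not say which hours count as peak), and the percentile is the simple nearest-rank variant.

```python
import math
from typing import Dict, Iterable, List, Tuple

# Hypothetical peak window; the source does not define it
PEAK_HOURS = set(range(9, 23))


def percentile(values: List[float], p: float) -> float:
    """Nearest-rank percentile of a non-empty list, p in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def cpu_peak_stats(samples: Iterable[Tuple[int, float]]) -> Dict[str, Dict[str, float]]:
    """Split (hour, cpu_percent) samples into peak/off-peak and summarize each."""
    groups: Dict[str, List[float]] = {"peak": [], "off_peak": []}
    for hour, cpu in samples:
        groups["peak" if hour in PEAK_HOURS else "off_peak"].append(cpu)
    # Windows without samples are omitted from the result
    return {
        name: {"p50": percentile(vals, 50), "p95": percentile(vals, 95), "max": max(vals)}
        for name, vals in groups.items() if vals
    }


samples = [(3, 12.0), (4, 10.0), (14, 55.0), (15, 70.0), (20, 65.0)]
print(cpu_peak_stats(samples))
```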

Data Collection

A table, tb_hpa_xxxx, stores the HPA metrics for each application (appcode): the scale-up and scale-down counts uc and dc, the maxuc and mindc counters, and creation and update timestamps.

create table tb_hpa_xxxx(
  id SERIAL PRIMARY KEY,
  appcode varchar(256),
  uc int DEFAULT 0,
  dc int DEFAULT 0,
  maxuc int DEFAULT 0,
  mindc int DEFAULT 0,
  create_time timestamptz NOT NULL DEFAULT now(),
  update_time timestamptz NOT NULL DEFAULT now()
);
COMMENT ON TABLE tb_hpa_xxxx IS 'HPA metrics collection';
...
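The source does not show how the counters in tb_hpa_xxxx are populated. One way to derive them is to classify each observed replica change. This sketch assumes a hypothetical ScaleEvent record, and it assumes that maxuc counts scale-ups that reached max_replicas and mindc counts scale-downs that reached min_replicas; the source does not define those two columns.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ScaleEvent:
    # Hypothetical event shape: one replica change observed for an app
    appcode: str
    old_replicas: int
    new_replicas: int
    min_replicas: int
    max_replicas: int


def count_scaling(events: Iterable[ScaleEvent]) -> dict:
    """Aggregate per-app counters shaped like the uc/dc/maxuc/mindc columns.

    ASSUMPTION: maxuc = scale-ups that reached max_replicas,
    mindc = scale-downs that reached min_replicas.
    """
    stats: dict = {}
    for e in events:
        row = stats.setdefault(e.appcode, {"uc": 0, "dc": 0, "maxuc": 0, "mindc": 0})
        if e.new_replicas > e.old_replicas:
            row["uc"] += 1
            if e.new_replicas >= e.max_replicas:
                row["maxuc"] += 1
        elif e.new_replicas < e.old_replicas:
            row["dc"] += 1
            if e.new_replicas <= e.min_replicas:
                row["mindc"] += 1
    return stats


events = [
    ScaleEvent("app_a", 2, 4, 2, 10),
    ScaleEvent("app_a", 4, 10, 2, 10),
    ScaleEvent("app_a", 10, 2, 2, 10),
]
print(count_scaling(events))
```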

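SQL queries aggregate, per application code (appcode), the scaling events, the peak and off-peak CPU usage, and the pod counts. The query below joins each HPA-configured deployment with its scaling counts and its configured minimum and maximum replicas.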

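-- Scaling counts per environment and appcode, plus configured replica limits
select G.*, N.min_replicas, N.max_replicas
from (
  select A.deployment_base as env, A.appcode, A.annotations as hpa,
         -- apps with no scaling events in the window report 0 instead of NULL
         coalesce(M.uc,0) as uc, coalesce(M.dc,0) as dc,
         coalesce(M.maxuc,0) as maxuc, coalesce(M.mindc,0) as mindc
  from (... ) A  -- deployments and their HPA annotations (subquery elided)
  left join (... ) M on M.appcode = A.appcode and M.env_name = A.deployment_base  -- aggregated scaling counts (subquery elided)
) G
left join tb_k8s_appcode_hpa N on G.appcode = N.appcode and G.env = N.deployment_base;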
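The two LEFT JOINs keep every HPA-configured application in the result even when it had no scaling events, and coalesce turns the resulting NULL counts into 0. The runnable sketch below demonstrates that behavior with Python's built-in sqlite3 module in place of PostgreSQL. The tables hpa_apps and scale_counts are hypothetical stand-ins for the elided subqueries A and M, and all inserted values, including the annotation strings, are made up.

```python
import sqlite3

# In-memory SQLite stand-in; hpa_apps and scale_counts are hypothetical
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table hpa_apps (deployment_base text, appcode text, annotations text);
create table scale_counts (env_name text, appcode text, uc int, dc int, maxuc int, mindc int);
create table tb_k8s_appcode_hpa (deployment_base text, appcode text, min_replicas int, max_replicas int);
insert into hpa_apps values ('prod', 'app_a', 'cpu:60'), ('prod', 'app_b', 'cpu:70');
insert into scale_counts values ('prod', 'app_a', 5, 3, 1, 2);
insert into tb_k8s_appcode_hpa values ('prod', 'app_a', 2, 10), ('prod', 'app_b', 4, 8);
""")

rows = conn.execute("""
select G.*, N.min_replicas, N.max_replicas
from (
  select A.deployment_base as env, A.appcode, A.annotations as hpa,
         coalesce(M.uc,0) as uc, coalesce(M.dc,0) as dc,
         coalesce(M.maxuc,0) as maxuc, coalesce(M.mindc,0) as mindc
  from hpa_apps A
  left join scale_counts M on M.appcode = A.appcode and M.env_name = A.deployment_base
) G
left join tb_k8s_appcode_hpa N on G.appcode = N.appcode and G.env = N.deployment_base
order by G.appcode
""").fetchall()

for row in rows:
    print(row)
```

app_b has no row in scale_counts, yet it still appears with zeros and its configured replica limits. An inner join would have silently dropped it from the report.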

Python Implementation

The HpaReport class runs these SQL statements, computes peak and off-peak statistics, and stores the results in the reporting table tb_hpa_report_xxx. It also generates an HTML email containing a detailed table of scaling counts, CPU usage percentiles, and pod counts.

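from sqlalchemy import text

# Base, db (presumably a Flask-SQLAlchemy instance), HpaReportModel and the
# try_catch_db_exception / commit_on_success decorators are project helpers
# that are not shown here.

class HpaReport(Base):
    @try_catch_db_exception
    @commit_on_success
    def stats_hpa_updown(self, start_time, end_time):
        # Run the aggregation query with the time window as bound parameters
        rows = db.session.execute(text("""
            select G.*, N.min_replicas, N.max_replicas ...
        """), {"start_time": start_time, "end_time": end_time})
        # Convert the raw cursor rows into a list of dicts
        return self.rows_as_dicts(rows.cursor)

    def save_stats_result(self, day):
        hpa_stats_rows = self.stats_hpa_updown(...)       # scaling counts
        hcpu_stats_rows = self.stats_high_time_cpu(...)   # peak-period CPU usage
        lpods_stats_rows = self.stats_low_time_pods(...)  # off-peak pod counts
        # merge results and insert into tb_hpa_report_xxx; the merge step is
        # not shown, and report_rows maps each application to its merged columns
        for value in report_rows.values():
            model = HpaReportModel(record_time=day, **value)
            db.session.add(model)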
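The merge step that produces report_rows is not shown. One plausible way to write it, assuming every stats query returns dicts carrying env and appcode keys, is to key the rows by (env, appcode) and let the scaling-count rows decide which applications appear. merge_report_rows and the column names in the example are hypothetical, and the merged keys would have to match the HpaReportModel columns for HpaReportModel(record_time=day, **value) to succeed.

```python
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str]  # (env, appcode)


def merge_report_rows(hpa_rows: Iterable[dict],
                      *other_sources: Iterable[dict]) -> Dict[Key, dict]:
    """Merge per-app stat rows into one dict per (env, appcode).

    hpa_rows (the scaling counts) defines which apps appear in the report;
    columns from the other sources are added only for apps already present.
    """
    report_rows: Dict[Key, dict] = {}
    for row in hpa_rows:
        report_rows[(row["env"], row["appcode"])] = dict(row)
    for source in other_sources:
        for row in source:
            key = (row["env"], row["appcode"])
            if key in report_rows:
                report_rows[key].update(row)
    return report_rows


hpa = [{"env": "prod", "appcode": "app_a", "uc": 5, "dc": 3}]
cpu = [{"env": "prod", "appcode": "app_a", "high_cpu_p95": 72.5}]
pods = [{"env": "prod", "appcode": "app_a", "low_pods": 2},
        {"env": "prod", "appcode": "app_x", "low_pods": 9}]
report_rows = merge_report_rows(hpa, cpu, pods)
print(report_rows)
```

Letting one source drive membership keeps stray rows (app_x here, which has pod data but no HPA row) out of the report, which also keeps the record-count validation meaningful.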

Report Generation and Validation

The send_report_form method builds an HTML table summarizing each day's HPA scaling events, CPU usage percentiles, and pod counts, then emails it to stakeholders. To validate the report, the number of reported records is compared with the number of applications that actually have HPA enabled, and the scaling counts and CPU statistics are checked against the raw metrics.
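Neither send_report_form nor the validation code is shown. A minimal standard-library sketch of both ideas follows. build_report_email and validate_report are hypothetical names, rows are keyed by appcode only for brevity, the raw counts are assumed to be supplied by the caller, and the message is only built, not sent (sending would go through smtplib and your own mail server).

```python
from email.message import EmailMessage
from html import escape
from typing import Dict, Iterable, List, Sequence


def build_report_email(rows: Sequence[dict], columns: List[str], day: str,
                       sender: str, recipients: List[str]) -> EmailMessage:
    """Render report rows as an escaped HTML table inside an email message."""
    header = "".join(f"<th>{escape(c)}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(r.get(c, '')))}</td>" for c in columns) + "</tr>"
        for r in rows
    )
    msg = EmailMessage()
    msg["Subject"] = f"HPA daily report {day}"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    # Plain-text fallback first, HTML alternative second
    msg.set_content("This report requires an HTML-capable mail client.")
    msg.add_alternative(f"<table border='1'><tr>{header}</tr>{body}</table>", subtype="html")
    return msg


def validate_report(report_rows: Iterable[dict], hpa_apps: Iterable[str],
                    raw_counts: Dict[str, Dict[str, int]]) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    rows = {r["appcode"]: r for r in report_rows}
    reported, expected = set(rows), set(hpa_apps)
    # Record-count check: every HPA-enabled app reported, nothing extra
    problems = [f"missing from report: {a}" for a in sorted(expected - reported)]
    problems += [f"not HPA-enabled: {a}" for a in sorted(reported - expected)]
    # Value check: reported counters must equal the raw metrics
    for app, counts in raw_counts.items():
        for col, value in counts.items():
            if app in rows and rows[app].get(col) != value:
                problems.append(f"{app}.{col}: report={rows[app].get(col)} raw={value}")
    return problems


msg = build_report_email([{"appcode": "app_a", "uc": 5}], ["appcode", "uc"],
                         "2022-06-21", "ops@example.com", ["team@example.com"])
print(validate_report([{"appcode": "app_a", "uc": 5}], ["app_a", "app_b"], {"app_a": {"uc": 5}}))
```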

Conclusion

Accurate data collection, cleaning, and aggregation are essential for reliable HPA reporting. Configuration matters just as much: HPA may not reduce resource usage at all when, for example, the minimum replica count is set too high, because HPA never scales a workload below its minReplicas.
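The off-peak figures in the daily report can be used to spot applications whose minimum replica count keeps them from scaling down. The sketch below works under stated assumptions: the row keys low_pods, low_cpu, and target_cpu are hypothetical, and the "replicas needed" estimate uses the HPA proportional rule (pods × observed CPU ÷ target CPU, rounded up) while ignoring tolerance and stabilization.

```python
import math
from typing import Iterable, List, Tuple


def min_replica_candidates(rows: Iterable[dict]) -> List[Tuple[str, int, int]]:
    """Flag apps pinned at min_replicas off-peak although the load needs fewer pods.

    Each row is a hypothetical report row with appcode, min_replicas,
    low_pods (off-peak pod count), low_cpu (off-peak average CPU %) and
    target_cpu (HPA target CPU %). Returns (appcode, min_replicas, needed).
    """
    flagged = []
    for r in rows:
        # Replicas the off-peak load would need at the target utilization
        needed = max(1, math.ceil(r["low_pods"] * r["low_cpu"] / r["target_cpu"]))
        if r["low_pods"] <= r["min_replicas"] and needed < r["min_replicas"]:
            flagged.append((r["appcode"], r["min_replicas"], needed))
    return flagged


rows = [
    {"appcode": "app_a", "min_replicas": 10, "low_pods": 10, "low_cpu": 12.0, "target_cpu": 60.0},
    {"appcode": "app_b", "min_replicas": 2, "low_pods": 4, "low_cpu": 50.0, "target_cpu": 60.0},
]
print(min_replica_candidates(rows))
```

Here app_a sits at its minimum of 10 pods overnight while its load would fit in about 2, so lowering minReplicas is what would let HPA actually save resources.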


