Tagged articles
4046 articles
Page 1 of 41
Ops Community
Ops Community
May 17, 2026 · Cloud Native

Istio Service Mesh Basics: What Is the Sidecar Pattern and Why Microservices Need It?

The article explains how traditional microservice architectures embed network concerns such as time‑outs, retries, circuit breaking, traffic monitoring and mTLS in application code, why this leads to code coupling, upgrade difficulty and duplicated effort, and how Istio’s sidecar‑based service mesh cleanly separates those concerns while providing traffic management, observability and security features.

EnvoyIstioKubernetes
0 likes · 30 min read
Istio Service Mesh Basics: What Is the Sidecar Pattern and Why Microservices Need It?
MaGe Linux Operations
MaGe Linux Operations
May 16, 2026 · Cloud Native

Why Pods Are the Most Powerful Unit in Kubernetes – A Deep Dive

This article provides a comprehensive, step‑by‑step analysis of Kubernetes Pods, covering their design as a shared‑namespace container group, the role of the pause (infra) container, creation flow, lifecycle phases, resource requests and limits, QoS classes, scheduling mechanics, volume types, and detailed troubleshooting techniques with concrete command‑line examples.

KubernetesNamespacePod
0 likes · 30 min read
Why Pods Are the Most Powerful Unit in Kubernetes – A Deep Dive
MaGe Linux Operations
MaGe Linux Operations
May 14, 2026 · Operations

Ops Veteran's Secret: Master These 10 Tools to Cut Overtime by 80%

The article lists ten essential Linux operations tools—Shell scripting, Git, Ansible, Prometheus, Grafana, Docker, Kubernetes, Nginx, ELK Stack, and Zabbix—detailing their functions, typical scenarios, advantages, and concrete usage examples, helping engineers streamline daily tasks and reduce overtime.

AnsibleDockerELK Stack
0 likes · 9 min read
Ops Veteran's Secret: Master These 10 Tools to Cut Overtime by 80%
Ops Community
Ops Community
May 13, 2026 · Operations

Kubernetes Node Failures: One‑Stop Guide to Diagnose and Fix Common Issues

This comprehensive guide walks Kubernetes operators through a step‑by‑step process for diagnosing node health problems—such as NotReady, MemoryPressure, DiskPressure, PIDPressure, and NetworkUnavailable—by examining node conditions, reviewing events, checking system resources, inspecting component logs, applying targeted fixes, and verifying recovery, all illustrated with real‑world commands and examples.

CNIDiskPressureKubernetes
0 likes · 44 min read
Kubernetes Node Failures: One‑Stop Guide to Diagnose and Fix Common Issues
Coder Trainee
Coder Trainee
May 13, 2026 · Cloud Native

Spring Cloud Microservices Revised Edition – Intro and New Tech Stack

After finishing the Spring Boot source‑code series, the author launches a refreshed Spring Cloud microservices tutorial built on Spring Boot 3.x, Jakarta EE, GraalVM native images, full production‑grade demos, Kubernetes deployment, observability and performance testing, outlining a 12‑episode roadmap.

KubernetesMicroservicesNacos
0 likes · 7 min read
Spring Cloud Microservices Revised Edition – Intro and New Tech Stack
Weekly Large Model Application
Weekly Large Model Application
May 6, 2026 · Cloud Native

How OpenAI Scales Low-Latency Voice AI with WebRTC: Architecture Deep Dive

The article dissects OpenAI's engineering approach to delivering low‑latency voice AI at scale, explaining why WebRTC was chosen, how a Relay + Transceiver split solves Kubernetes integration challenges, the use of ICE ufrag for deterministic routing, and how global relay and implementation choices reduce perceived latency.

KubernetesLow latencyOpenAI
0 likes · 9 min read
How OpenAI Scales Low-Latency Voice AI with WebRTC: Architecture Deep Dive
MaGe Linux Operations
MaGe Linux Operations
May 3, 2026 · Cloud Native

How to Troubleshoot Kubernetes NotReady Nodes: A Complete Step‑by‑Step Guide

This article walks Kubernetes operators through a systematic investigation of NotReady node symptoms, explaining the kubelet status mechanism, detailing each diagnostic step—from verifying node conditions with kubectl to checking kubelet, container runtime, resources, network, and certificates—and providing concrete remediation and preventive measures.

KubernetesNotReadycontainerd
0 likes · 35 min read
How to Troubleshoot Kubernetes NotReady Nodes: A Complete Step‑by‑Step Guide
Coder Trainee
Coder Trainee
May 2, 2026 · Cloud Native

Spring Cloud Microservices Series #10: Key Takeaways and Best Practices

This article reviews the entire Spring Cloud microservices series, presents a full technology stack diagram, outlines production‑grade best practices for service decomposition, configuration, remote calls, rate limiting, databases, logging and monitoring, lists common pitfalls, offers performance‑tuning tips, discusses the pros and cons of microservices, and points to future directions such as service mesh, serverless and cloud‑native adoption.

Configuration ManagementKubernetesMicroservices
0 likes · 14 min read
Spring Cloud Microservices Series #10: Key Takeaways and Best Practices
Coder Trainee
Coder Trainee
May 1, 2026 · Cloud Native

Containerizing Spring Cloud Microservices with Docker and Kubernetes (Part 9)

This article explains why traditional deployment is problematic, then walks through building Docker images, composing services with Docker‑Compose, deploying to a Kubernetes cluster, setting up CI/CD pipelines, and addressing common pitfalls such as slow starts and service discovery failures.

DockerDocker ComposeKubernetes
0 likes · 12 min read
Containerizing Spring Cloud Microservices with Docker and Kubernetes (Part 9)
MaGe Linux Operations
MaGe Linux Operations
Apr 30, 2026 · Cloud Native

Kubernetes Service Connectivity Issues? A Step‑by‑Step Guide from Pods to Services to Ingress

This article provides a systematic, layer‑by‑layer troubleshooting guide for Kubernetes service connectivity problems, covering pod health, service and endpoint configuration, kube‑proxy rules, CNI plugins, Ingress controllers, DNS resolution, and NetworkPolicy, with concrete commands, examples, and preventive scripts.

IngressKubernetesPod
0 likes · 39 min read
Kubernetes Service Connectivity Issues? A Step‑by‑Step Guide from Pods to Services to Ingress
Data STUDIO
Data STUDIO
Apr 28, 2026 · Backend Development

FastAPI in Production: Auth, Rate Limiting, and Zero‑Downtime with One Codebase

This article walks through a complete production‑ready FastAPI setup, covering secure OIDC/JWKS authentication, Redis‑backed token‑bucket rate limiting, zero‑downtime rolling deployments on Docker/Kubernetes, and observability best practices such as request‑ID middleware and structured JSON logging.

AuthenticationDockerFastAPI
0 likes · 20 min read
FastAPI in Production: Auth, Rate Limiting, and Zero‑Downtime with One Codebase
dbaplus Community
dbaplus Community
Apr 27, 2026 · Cloud Native

When MTU Misconfiguration Turns Into a Two‑Day Network Mystery

A two‑day investigation of intermittent packet loss in a hybrid‑cloud Kubernetes environment revealed that an oversized VXLAN MTU caused fragmentation, prompting a step‑by‑step analysis of MTU fundamentals, diagnostic commands, Cilium configuration changes, and best‑practice recommendations for cloud‑native networks.

CiliumKubernetesMTU
0 likes · 30 min read
When MTU Misconfiguration Turns Into a Two‑Day Network Mystery
ITPUB
ITPUB
Apr 27, 2026 · Cloud Native

Why Skipping Backups Makes Kubernetes Operations Impossible

The article explains that running production Kubernetes clusters without regular backup and recovery plans exposes businesses to severe risks such as cluster failures, data loss, and prolonged downtime, and it details practical etcd physical and Velero logical backup strategies to mitigate these threats.

BackupCloud NativeKubernetes
0 likes · 9 min read
Why Skipping Backups Makes Kubernetes Operations Impossible
DevOps Coach
DevOps Coach
Apr 26, 2026 · Cloud Native

Accelerating Kubernetes Automation: Mastering GitOps Best Practices

This guide explains GitOps fundamentals—declarative, versioned, automated deployments—and shows how tools like Argo CD, Flux, Helm, Kustomize, Tekton, and Sealed Secrets can speed up Kubernetes delivery, improve reliability, enhance security, and foster better collaboration across DevOps teams.

Argo CDCloud NativeGitOps
0 likes · 16 min read
Accelerating Kubernetes Automation: Mastering GitOps Best Practices
AI Explorer
AI Explorer
Apr 26, 2026 · Artificial Intelligence

Take Control of AI: Choose Any Model and Keep Your Data Private

Thunderbolt, an open‑source AI client from Mozilla’s Thunderbird team, lets developers pick any OpenAI‑compatible model, run it on‑premises via Docker or Kubernetes, and keep all conversation data on their own servers, eliminating vendor lock‑in and enhancing privacy.

AI clientDockerKubernetes
0 likes · 6 min read
Take Control of AI: Choose Any Model and Keep Your Data Private
DevOps Coach
DevOps Coach
Apr 24, 2026 · Cloud Native

After Years Using Kubernetes, I Finally Grasped CRDs – Build One from Scratch

The article reveals why most Kubernetes engineers use Custom Resource Definitions without truly understanding them, explains how CRDs act as the language that extends the Kubernetes API, and provides a step‑by‑step walkthrough to create a production‑ready DatabaseCluster CRD, interact with it via kubectl and the Python client, and avoid common pitfalls.

API extensionCRDCustomResourceDefinition
0 likes · 17 min read
After Years Using Kubernetes, I Finally Grasped CRDs – Build One from Scratch
Cloud Native Technology Community
Cloud Native Technology Community
Apr 24, 2026 · Cloud Native

Kubernetes v1.36 “Haru”: Why Some Changes Aren’t Worth the Wait

Kubernetes v1.36 focuses on clearing technical debt rather than adding flashy features, retiring ingress‑nginx, tightening kubelet API auth, optimizing SELinux mounts, externalizing ServiceAccount token signing, expanding DRA for GPU scheduling, graduating MutatingAdmissionPolicy, and removing long‑standing legacy components, all accompanied by a concrete upgrade checklist.

DRAKubernetesMutatingAdmissionPolicy
0 likes · 15 min read
Kubernetes v1.36 “Haru”: Why Some Changes Aren’t Worth the Wait
Ray's Galactic Tech
Ray's Galactic Tech
Apr 23, 2026 · Backend Development

Stop Treating LLMs as 'All‑Purpose Tools': Practical Spring AI Multi‑Agent Architecture for Production

This article analyses why a single‑agent LLM approach quickly hits scalability, context, and governance limits, and presents a production‑ready Spring AI Multi‑Agent design—including layered architecture, agent metadata, skill engineering, routing strategies, orchestration, resilience, A2A service discovery, Kubernetes deployment, observability, security, and cost‑control—backed by concrete Java code examples.

A2AJavaKubernetes
0 likes · 38 min read
Stop Treating LLMs as 'All‑Purpose Tools': Practical Spring AI Multi‑Agent Architecture for Production
DevOps Coach
DevOps Coach
Apr 22, 2026 · Operations

2026 AI DevOps Outlook: 10 Must‑Watch MCP Servers Transforming SRE

The article surveys the rapidly growing Model Context Protocol (MCP) ecosystem in 2026, detailing ten AI‑enabled DevOps servers, their core capabilities, real‑world impact on SRE workflows, and a practical framework for selecting the most valuable servers for a given team.

AI DevOpsInfrastructure as CodeKubernetes
0 likes · 16 min read
2026 AI DevOps Outlook: 10 Must‑Watch MCP Servers Transforming SRE
Ray's Galactic Tech
Ray's Galactic Tech
Apr 22, 2026 · Cloud Native

Solving K8s Stateful App Storage Pain: Production-Ready Longhorn + MySQL StatefulSet

This article dissects the challenges of running MySQL as a stateful workload on Kubernetes, explains why storage, consistency, and fail‑over are the real pain points, and provides a production‑grade solution that combines Longhorn distributed block storage with a carefully engineered MySQL 8.0 StatefulSet, complete with YAML manifests, performance tuning, backup strategies, and disaster‑recovery playbooks.

KubernetesLonghornStatefulSet
0 likes · 50 min read
Solving K8s Stateful App Storage Pain: Production-Ready Longhorn + MySQL StatefulSet
Raymond Ops
Raymond Ops
Apr 22, 2026 · Operations

How Prometheus Recording Rules Can Reduce Alert Noise by 70%

This guide explains how to use Prometheus Recording Rules to pre‑compute, aggregate, and smooth metrics in large‑scale microservice environments, cutting daily alert noise by up to 70% through hierarchical alert design, practical examples, and best‑practice recommendations.

Alert Noise ReductionDevOpsKubernetes
0 likes · 22 min read
How Prometheus Recording Rules Can Reduce Alert Noise by 70%
Full-Stack DevOps & Kubernetes
Full-Stack DevOps & Kubernetes
Apr 22, 2026 · Operations

Avoid 90% of Kubernetes Ops Pitfalls: A Definitive Guide

This guide outlines the five most common Kubernetes operational pitfalls, offers step‑by‑step remediation practices, introduces three emerging trends such as AI‑assisted troubleshooting, serverless clusters, and Tekton CI/CD, and provides three ready‑to‑copy kubectl commands to streamline daily management.

DevOpsKubernetesOperations
0 likes · 9 min read
Avoid 90% of Kubernetes Ops Pitfalls: A Definitive Guide
Java Backend Full-Stack
Java Backend Full-Stack
Apr 20, 2026 · Backend Development

What Skills Should a 3‑Year Java Backend Developer Master?

The article outlines a comprehensive skill matrix for a three‑year Java backend engineer, covering core Java and JVM knowledge, mainstream frameworks, storage, messaging, containerization, architecture, engineering practices, soft skills, and emerging trends such as AI integration and reactive programming.

Distributed SystemsDockerJVM
0 likes · 9 min read
What Skills Should a 3‑Year Java Backend Developer Master?
Ray's Galactic Tech
Ray's Galactic Tech
Apr 19, 2026 · Cloud Native

Building a Production‑Ready Cloud‑Native Kubernetes Platform: From Zero to SRE Success

This article presents a step‑by‑step guide to designing and implementing a production‑grade Kubernetes platform with GitOps, observability, capacity governance, fault‑injection, and SRE practices, showing how to achieve unified delivery, reliability, and low‑cost operation for high‑concurrency business services.

Cloud NativeGitOpsInfrastructure
0 likes · 37 min read
Building a Production‑Ready Cloud‑Native Kubernetes Platform: From Zero to SRE Success
Raymond Ops
Raymond Ops
Apr 19, 2026 · Cloud Native

How to Double K8s Ingress Performance: Nginx vs Envoy Gateway Tuning Guide

This article walks through a real‑world performance bottleneck on a high‑traffic e‑commerce platform, explains step‑by‑step deep tuning of Nginx Ingress Controller, compares it with Envoy Gateway, and provides concrete configurations, benchmark results, monitoring rules, and best‑practice recommendations for Kubernetes Ingress optimization.

EnvoyIngressKubernetes
0 likes · 27 min read
How to Double K8s Ingress Performance: Nginx vs Envoy Gateway Tuning Guide
MaGe Linux Operations
MaGe Linux Operations
Apr 19, 2026 · Cloud Native

Unlock the Full Deployment‑to‑Service Workflow in Kubernetes

This comprehensive guide walks operators through the entire Kubernetes workflow from creating a Deployment to exposing a Service, explaining core resources, control loops, scheduling, networking, rolling updates, troubleshooting steps, best‑practice configurations, performance tuning, and security hardening.

Cloud NativeDeploymentKubernetes
0 likes · 29 min read
Unlock the Full Deployment‑to‑Service Workflow in Kubernetes
Cloud Native Technology Community
Cloud Native Technology Community
Apr 17, 2026 · Cloud Native

What’s New in Kube-OVN v1.16.0? Key Features and Improvements Explained

Kube-OVN v1.16.0 introduces major enhancements such as BGP/EVPN‑enabled VPC egress, a tiered SecurityGroup with expanded priority range, per‑NIC DHCP control, multi‑network NetworkPolicy annotations, full‑NIC hot migration for KubeVirt, static IP/MAC per interface, and numerous reliability, performance, and Helm chart upgrades.

BGPCNIEVPN
0 likes · 6 min read
What’s New in Kube-OVN v1.16.0? Key Features and Improvements Explained
Black & White Path
Black & White Path
Apr 17, 2026 · Information Security

Threat Alert: Cloud‑Native Cybercrime Group TeamPCP Targets Docker, Kubernetes, and Redis

TeamPCP, a newly identified cloud‑native threat group, has compromised at least 60,000 servers worldwide by exploiting exposed Docker APIs, Kubernetes clusters, Redis instances, and the React2Shell vulnerability, employing automated tools such as proxy.sh, kube.py, and react.py, with detailed MITRE ATT&CK mapping and concrete defense recommendations.

DockerKubernetesMITRE ATT&CK
0 likes · 16 min read
Threat Alert: Cloud‑Native Cybercrime Group TeamPCP Targets Docker, Kubernetes, and Redis
AI Tech Publishing
AI Tech Publishing
Apr 16, 2026 · Cloud Native

Deploying a Stateful AI Agent on a Stateless Web Architecture: Challenges, Solutions, and Code Walkthrough

This article analyzes the fundamental conflict between stateful AI agents and the inherently stateless, distributed nature of modern web services, explores time, state, and execution model mismatches, and presents a practical Agent‑as‑API solution using FastAPI, Redis, SSE, and Kubernetes to achieve scalable, fault‑tolerant deployments.

AI AgentFastAPIKubernetes
0 likes · 30 min read
Deploying a Stateful AI Agent on a Stateless Web Architecture: Challenges, Solutions, and Code Walkthrough
Ctrip Technology
Ctrip Technology
Apr 16, 2026 · Big Data

How Ray + DuckDB Cut 9B-Row Attribution Queries from 40s to 15s

When attribution analysis on over 900 million rows slowed to more than 40 seconds and threatened cluster stability, Ctrip's smart attribution team rebuilt the architecture with Ray and DuckDB, achieving sub‑15‑second query times, 160 % performance gain, and complete resource isolation.

Attribution AnalysisBig DataDuckDB
0 likes · 22 min read
How Ray + DuckDB Cut 9B-Row Attribution Queries from 40s to 15s
Java Web Project
Java Web Project
Apr 16, 2026 · Backend Development

How I Resolved a 13‑Hour OOM Nightmare in a Spring Boot Service

The article walks through a 13‑hour out‑of‑memory incident on a Spring Boot 2.7 service running in Kubernetes, detailing how to preserve the crash dump, interpret GC logs, use MAT and Arthas to pinpoint a static HashMap leak, and apply both temporary and permanent fixes while hardening the system for future safety.

ArthasJVMJava
0 likes · 18 min read
How I Resolved a 13‑Hour OOM Nightmare in a Spring Boot Service
Java Architect Essentials
Java Architect Essentials
Apr 15, 2026 · Backend Development

Spring 7.0.4: Hidden Deadlock Fix and 30‑50% Startup Boost for K8s Apps

The article analyzes a nondeterministic deadlock bug in Spring 7.0.0‑7.0.3 that surfaces in Kubernetes pods, explains how Spring 7.0.4 resolves it with a revised shutdown state machine, details additional performance‑related fixes and new features, and provides practical upgrade guidance based on JDK version and deployment scenario.

BugFixJavaKubernetes
0 likes · 14 min read
Spring 7.0.4: Hidden Deadlock Fix and 30‑50% Startup Boost for K8s Apps
Java Web Project
Java Web Project
Apr 15, 2026 · Backend Development

How We Cut Spring Boot Startup from 12 s to 3 s with GraalVM Native Image

This article walks through converting a Spring Boot order‑query microservice to a GraalVM Native Image, detailing environment setup, common build pitfalls with concrete code fixes, Docker multi‑stage packaging, K8s scaling comparison, performance benchmarks, CI/CD integration, and guidance on when Native Image is appropriate.

DockerKubernetesPerformance Optimization
0 likes · 12 min read
How We Cut Spring Boot Startup from 12 s to 3 s with GraalVM Native Image
dbaplus Community
dbaplus Community
Apr 14, 2026 · Information Security

How to Investigate and Respond to Kubernetes Cluster Intrusions

This guide walks through practical techniques for detecting, tracing, and remediating Kubernetes cluster compromises, covering pod‑level debugging, node inspection, audit‑log analysis, and common attacker behaviors such as privileged pod creation and hostPath mounting.

Cluster ForensicsKubernetesPod Debugging
0 likes · 7 min read
How to Investigate and Respond to Kubernetes Cluster Intrusions
Ray's Galactic Tech
Ray's Galactic Tech
Apr 14, 2026 · Backend Development

How Go Microservices Pay a Hidden Performance Tax—and How to Eliminate It

This article examines the often‑overlooked performance “tax” in Go microservices, detailing how misuse of goroutines, channels, interfaces, object allocation, and fan‑out patterns inflates CPU, memory, and tail‑latency costs, and provides concrete engineering strategies—such as request‑level concurrency limits, bulkheads, and efficient logging—to achieve production‑grade scalability.

GoKubernetesMicroservices
0 likes · 40 min read
How Go Microservices Pay a Hidden Performance Tax—and How to Eliminate It
Ray's Galactic Tech
Ray's Galactic Tech
Apr 11, 2026 · Operations

Mastering Production‑Grade Kubernetes: From kubectl Basics to Scalable Cluster Management

This comprehensive guide walks you through turning simple kubectl commands into a robust, production‑ready Kubernetes platform by covering core architecture, scheduling, resource governance, high‑availability design, observability, security, GitOps workflows, and real‑world case studies for large‑scale deployments.

KubernetesObservabilityOps
0 likes · 52 min read
Mastering Production‑Grade Kubernetes: From kubectl Basics to Scalable Cluster Management
Node.js Tech Stack
Node.js Tech Stack
Apr 11, 2026 · Cloud Native

Control Node.js Heap Size with ENV in Kubernetes – New --max-heap-size in 25.9.0

Node.js 25.9.0 adds support for the --max‑heap‑size flag in the NODE_OPTIONS whitelist, allowing containers on Kubernetes to set heap limits via environment variables, reducing OOM kills, while also introducing experimental stream/iter API, test‑module mock changes, new Web Crypto algorithms, and other enhancements.

Environment VariablesHeap MemoryKubernetes
0 likes · 8 min read
Control Node.js Heap Size with ENV in Kubernetes – New --max-heap-size in 25.9.0
Architect's Tech Stack
Architect's Tech Stack
Apr 10, 2026 · Cloud Native

Why Docker and Kubernetes Are Like Shipping Containers: A Beginner’s Guide

Using a shipping‑container analogy, this article explains how Docker packages applications into portable images and how Kubernetes orchestrates those containers across clusters, clarifying key concepts such as images, containers, Pods, Deployments, Services, and the role of nodes in modern cloud‑native environments.

ContainersDockerKubernetes
0 likes · 7 min read
Why Docker and Kubernetes Are Like Shipping Containers: A Beginner’s Guide
IT Architects Alliance
IT Architects Alliance
Apr 9, 2026 · Information Security

Why 68% of Kubernetes Clusters Expose Cloud Credentials and How to Fix the Top 3 Risks

A recent study reveals that over two‑thirds of Kubernetes clusters contain critical misconfigurations that let attackers escape containers, steal cloud credentials, and hijack entire cloud accounts within minutes, and the article outlines the three most dangerous flaws, real‑world attack paths, and concrete mitigation steps.

Credential LeakageDefense in DepthKubernetes
0 likes · 8 min read
Why 68% of Kubernetes Clusters Expose Cloud Credentials and How to Fix the Top 3 Risks
Ray's Galactic Tech
Ray's Galactic Tech
Apr 7, 2026 · Cloud Native

Mastering Kubernetes at Scale: Production‑Ready Guide for 30+ Clusters

This comprehensive guide explains how to transform Kubernetes from a single‑cluster setup into a production‑grade, multi‑cluster platform that can handle tens of thousands of pods and high‑concurrency workloads by applying architectural, operational, and governance best practices across eight layers of the stack.

GitOpsKubernetesMulti-Cluster
0 likes · 38 min read
Mastering Kubernetes at Scale: Production‑Ready Guide for 30+ Clusters
Linux Tech Enthusiast
Linux Tech Enthusiast
Apr 7, 2026 · Operations

Top 10 Essential Tools Every Ops Engineer Uses Daily

This article enumerates ten widely used operations tools—Shell scripts, Git, Ansible, Prometheus, Grafana, Docker, Kubernetes, Nginx, ELK Stack, and Zabbix—detailing each tool's function, suitable scenarios, advantages, and concrete usage examples for daily sysadmin tasks.

AnsibleDockerELK
0 likes · 8 min read
Top 10 Essential Tools Every Ops Engineer Uses Daily
Ray's Galactic Tech
Ray's Galactic Tech
Apr 6, 2026 · Backend Development

Build a Production-Ready High-Concurrency AI Customer Service with Spring Boot 3, Spring AI & DeepSeek

This article walks through the complete engineering practice of turning a simple Spring Boot demo into a production‑grade, high‑concurrency intelligent customer‑service system by integrating Spring AI, DeepSeek, RAG, Redis, Kafka, resilience patterns, monitoring, and Kubernetes deployment.

AIIntelligent Customer ServiceKubernetes
0 likes · 38 min read
Build a Production-Ready High-Concurrency AI Customer Service with Spring Boot 3, Spring AI & DeepSeek
Ops Community
Ops Community
Apr 5, 2026 · Operations

Choosing the Right Ingress Controller: Nginx, Traefik, or Envoy?

This guide provides a deep technical comparison of Nginx Ingress Controller, Traefik, and Envoy Proxy, covering architecture, configuration, performance, feature sets, deployment patterns, security hardening, monitoring, and troubleshooting to help operators select the best solution for their Kubernetes clusters.

EnvoyIngressKubernetes
0 likes · 28 min read
Choosing the Right Ingress Controller: Nginx, Traefik, or Envoy?
AI Explorer
AI Explorer
Apr 5, 2026 · Artificial Intelligence

Onyx Open-Source AI Platform: Full Model Support and One‑Stop Deployable Solution

Onyx is an open‑source AI platform that acts as an application layer for large language models, offering a unified interface for RAG, web search, code execution, multimodal interaction, and customizable agents, with model‑agnostic support, one‑click installation, and flexible deployment options for individuals and enterprises.

AI PlatformCustom AgentsDocker
0 likes · 6 min read
Onyx Open-Source AI Platform: Full Model Support and One‑Stop Deployable Solution
Ray's Galactic Tech
Ray's Galactic Tech
Apr 3, 2026 · Artificial Intelligence

Building a Production‑Ready High‑Concurrency Story Generation System with Spring AI Alibaba

This article explains how to design and implement a scalable multi‑agent architecture for AI‑driven story creation using Spring AI Alibaba, covering core design principles, engineering optimizations, orchestration, high‑concurrency handling, observability, and deployment best practices.

KubernetesMulti-Agent ArchitectureObservability
0 likes · 29 min read
Building a Production‑Ready High‑Concurrency Story Generation System with Spring AI Alibaba
Huawei Cloud Developer Alliance
Huawei Cloud Developer Alliance
Apr 2, 2026 · Cloud Native

How Kthena Enables Production‑Grade LLM Inference on Kubernetes

This article analyzes the cloud‑native challenges of deploying large‑model inference on Kubernetes and presents Kthena’s architecture—ModelServing, Router, Autoscaler, and ModelBooster—along with Volcano integration, vLLM‑Ascend setup, and a real‑world Qwen3‑235B deployment case, highlighting performance gains and future directions.

Cloud NativeKthenaKubernetes
0 likes · 13 min read
How Kthena Enables Production‑Grade LLM Inference on Kubernetes
Cloud Native Technology Community
Cloud Native Technology Community
Apr 2, 2026 · Information Security

Why Traditional Kubernetes Security Isn’t Enough for LLMs – 4 Critical Risks and How to Defend Them

Running large language models on Kubernetes looks stable, but the platform’s native security cannot address the new threat model introduced by LLMs, requiring operators to recognize prompt injection, data leakage, supply‑chain, and excessive agency risks and to implement a dedicated policy layer.

KubernetesLLMPolicy Layer
0 likes · 7 min read
Why Traditional Kubernetes Security Isn’t Enough for LLMs – 4 Critical Risks and How to Defend Them
java1234
java1234
Apr 2, 2026 · Cloud Native

How a Simple Analogy Clarified Docker and Kubernetes Core Concepts

An image is a static snapshot of an OS, runtime and code; a container runs that snapshot, while Dockerfile and docker‑compose define how to build and orchestrate images. Pods group containers for shared resources, and Kubernetes schedules, scales, heals, networks and stores them, enabling true “run anywhere” deployment.

Cloud NativeContainersDocker
0 likes · 6 min read
How a Simple Analogy Clarified Docker and Kubernetes Core Concepts
MaGe Linux Operations
MaGe Linux Operations
Mar 30, 2026 · Cloud Native

How to Scale Prometheus to Thousands of Nodes with Thanos: A Deep Dive

This article examines the storage, query performance, high‑availability, and high‑cardinality challenges of running Prometheus on a thousand‑node Kubernetes cluster and presents a complete, step‑by‑step Thanos‑based architecture, capacity‑planning models, configuration examples, and operational best practices for reliable horizontal scaling.

KubernetesObservabilityPrometheus
0 likes · 34 min read
How to Scale Prometheus to Thousands of Nodes with Thanos: A Deep Dive
IT Services Circle
IT Services Circle
Mar 30, 2026 · Cloud Native

Docker vs K8s: Solving Java Deployment Chaos with Containers

This article explains why traditional Java deployment struggles with environment inconsistencies, introduces Docker’s containerization workflow—including base images, Dockerfiles, images, registries, and tools like Compose and Swarm—and compares it with Kubernetes’ orchestration capabilities, showing how they together streamline Java application delivery.

DevOpsDockerJava
0 likes · 7 min read
Docker vs K8s: Solving Java Deployment Chaos with Containers
DevOps Coach
DevOps Coach
Mar 29, 2026 · Operations

Master Kubernetes YAML Without Memorizing a Single Line

This article breaks down why YAML feels daunting, reveals the exact DevOps workflow engineers use—including five essential commands and tools—to generate, validate, and edit Kubernetes manifests, and explains three proficiency levels and interview strategies for handling YAML without rote memorization.

DevOpsKubernetesOperations
0 likes · 11 min read
Master Kubernetes YAML Without Memorizing a Single Line
Advanced AI Application Practice
Advanced AI Application Practice
Mar 29, 2026 · Operations

Mastering OpenClaw Enterprise Deployment: From Setup to Operations (Practices 7‑14)

This guide walks through a real‑world 500‑person tech company’s OpenClaw rollout, detailing environment requirements, quick Windows/Linux installation, security hardening, multi‑system troubleshooting, Docker/K8s containerization, multi‑model routing, office‑tool integrations, automation scripts, RBAC, performance tuning, and high‑availability configuration, all achievable within 8‑10 hours.

AutomationDockerEnterprise Deployment
0 likes · 10 min read
Mastering OpenClaw Enterprise Deployment: From Setup to Operations (Practices 7‑14)
Ops Community
Ops Community
Mar 29, 2026 · Operations

Why DNS Lookups Fail and How to Fix Them: A Complete Troubleshooting Guide

This guide explains the DNS resolution process, categorises common failure types, provides step‑by‑step troubleshooting procedures, essential commands, configuration examples for systemd‑resolved, BIND9, Unbound and CoreDNS, and offers best‑practice recommendations for reliable DNS operation in Linux and Kubernetes environments.

DNSKubernetesLinux
0 likes · 50 min read
Why DNS Lookups Fail and How to Fix Them: A Complete Troubleshooting Guide
DevOps Coach
DevOps Coach
Mar 28, 2026 · Cloud Native

Why the Twelve-Factor App is Essential for Modern Cloud‑Native Development

The article explains how the Twelve‑Factor App methodology, created by Heroku’s Adam Wiggins, provides a set of core principles that prevent common production failures and form the foundation for modern tools like Docker, Kubernetes, and CI/CD pipelines, enabling reliable, scalable, and maintainable software.

Cloud NativeDevOpsDocker
0 likes · 22 min read
Why the Twelve-Factor App is Essential for Modern Cloud‑Native Development
DevOps Coach
DevOps Coach
Mar 27, 2026 · Operations

Can Four LLM‑Powered Agents Build a Real Kubernetes Cluster Without Human Help?

An experiment with four LLM‑driven autonomous agents—Architect, Builder, Security Sentinel, and QA Tester—attempted to provision a Proxmox‑based HA Kubernetes cluster using real hardware, revealing costly context drift, emergent coordination failures, and stark differences between Gemini and Claude in diagnosing infrastructure‑as‑code errors.

AI OpsAnsibleAutonomous SRE
0 likes · 14 min read
Can Four LLM‑Powered Agents Build a Real Kubernetes Cluster Without Human Help?
DevOps Coach
DevOps Coach
Mar 27, 2026 · Operations

Can AI Really Boost Your DevOps Productivity Ten‑fold? Updated 2026 Toolset Explained

This article analyzes how the 2025‑2026 shift to Model Context Protocol (MCP) transforms DevOps workflows, reviews four AI‑driven tools—including Cursor 2.0, MCP servers, AWS Q Developer CLI, and Spacelift’s Saturnhead AI—provides step‑by‑step configuration examples, and outlines what these tools can and cannot solve for modern infrastructure teams.

AIAWS Q DeveloperCursor
0 likes · 29 min read
Can AI Really Boost Your DevOps Productivity Ten‑fold? Updated 2026 Toolset Explained
Cognitive Technology Team
Cognitive Technology Team
Mar 27, 2026 · Operations

How to Build a Rock‑Solid High‑Availability Architecture: Redundancy, Defense, and Smooth Deployments

This article breaks down high‑availability architecture into redundancy, defensive degradation, and release mechanisms, offering concrete techniques, real‑world failure case studies, and step‑by‑step configurations to ensure continuous service even under heavy load or component failures.

Kubernetesci/cdcircuit breaker
0 likes · 16 min read
How to Build a Rock‑Solid High‑Availability Architecture: Redundancy, Defense, and Smooth Deployments
DevOps Coach
DevOps Coach
Mar 26, 2026 · Cloud Native

How kubara Enables Rapid, Production‑Ready Kubernetes Platforms in 30 Minutes

This article explains how the open‑source kubara framework provides a GitOps‑driven, hub‑and‑spoke Kubernetes platform that can be bootstrapped in about 30 minutes, detailing its architecture, default security, control‑plane components, data‑plane onboarding, and step‑by‑step commands for a production‑grade setup.

Argo CDCloud NativeGitOps
0 likes · 20 min read
How kubara Enables Rapid, Production‑Ready Kubernetes Platforms in 30 Minutes
Shi's AI Notebook
Shi's AI Notebook
Mar 25, 2026 · Information Security

LiteLLM Compromised in 46 Minutes: Inside the 47,000‑Download Supply‑Chain Attack

In March 2026, attackers hijacked the official PyPI maintainer account of LiteLLM, released two malicious versions that were downloaded 46,996 times in 46 minutes, exfiltrated credentials, launched a fork‑bomb, and demonstrated how unpinned dependencies and .pth files can turn a simple package install into a full‑scale supply‑chain breach.

KubernetesLiteLLMPyPI
0 likes · 12 min read
LiteLLM Compromised in 46 Minutes: Inside the 47,000‑Download Supply‑Chain Attack
AI Waka
AI Waka
Mar 25, 2026 · Cloud Native

How to Safely Deploy Production‑Ready AI Agents with KubeClaw on Kubernetes

This article explains why engineering discipline is essential for modern AI agents, introduces the KubeClaw platform and its Kubernetes‑native architecture, provides step‑by‑step installation and Helm deployment instructions, and outlines proven operational patterns for secure, observable, and reliable agent systems.

Agent ArchitectureKubernetesObservability
0 likes · 13 min read
How to Safely Deploy Production‑Ready AI Agents with KubeClaw on Kubernetes
DevOps Coach
DevOps Coach
Mar 24, 2026 · Operations

Avoid the Top 10 Kubernetes Monitoring Mistakes Every SRE Team Makes

This article examines the ten most common Kubernetes monitoring errors that SRE teams encounter, explains why each mistake harms reliability, and provides concrete, actionable solutions—including the Golden Signals framework, pod‑restart analysis, alert‑fatigue reduction, application‑level observability, etcd health checks, network metrics, control‑plane monitoring, log‑metric correlation, resource request tracking, and end‑to‑end observability—to help teams build robust, scalable monitoring systems.

Cloud NativeKubernetesObservability
0 likes · 11 min read
Avoid the Top 10 Kubernetes Monitoring Mistakes Every SRE Team Makes
Ray's Galactic Tech
Ray's Galactic Tech
Mar 24, 2026 · Cloud Native

Mastering Production-Grade Blue‑Green and Canary Deployments on Kubernetes

This comprehensive guide explains how to design, implement, and operate production‑grade blue‑green and canary releases on Kubernetes, covering traffic control, state handling, capacity planning, observability, automation scripts, code examples, and best‑practice checklists to ensure safe, scalable rollouts in high‑traffic environments.

Blue‑Green deploymentGitOpsKubernetes
0 likes · 32 min read
Mastering Production-Grade Blue‑Green and Canary Deployments on Kubernetes
DevOps Coach
DevOps Coach
Mar 23, 2026 · Cloud Native

How Distroless Images Cut Rust Service Startup from 8 s to 1.2 s

After building a fast Rust microservice, the team discovered Kubernetes pods took 8‑10 seconds to start due to Alpine‑based images; switching to minimal Distroless containers and static linking reduced the image size from 40 MB to 6.7 MB, cut cold‑start time to ~1.2 seconds, lowered memory usage, and improved security.

Container OptimizationDistrolessDocker
0 likes · 8 min read
How Distroless Images Cut Rust Service Startup from 8 s to 1.2 s
Woodpecker Software Testing
Woodpecker Software Testing
Mar 23, 2026 · Artificial Intelligence

Practical Guide to Optimizing AI Testing Tool Performance

This article analyzes why AI‑driven testing tools often become performance bottlenecks, identifies I/O and serialization as the main culprits, and presents concrete optimizations—including headless browser flags, mmap, gRPC streaming, model lightweighting, multi‑level caching, and Kubernetes‑based co‑scheduling—that together reduce latency by up to 90% and boost throughput severalfold.

AI testingKubernetesONNX
0 likes · 7 min read
Practical Guide to Optimizing AI Testing Tool Performance
Architecture Digest
Architecture Digest
Mar 20, 2026 · Backend Development

Why Spring Framework 7.0.4’s Hidden Bugs and Speed Boosts Matter to You

The article dissects Spring 7.0.4’s critical deadlock bug, explains several other subtle fixes, details three major performance optimizations that can cut startup time by up to 50 % and up to 20 % request latency, and provides practical upgrade guidance for Kubernetes‑deployed Java services.

JavaKubernetesSpring Framework
0 likes · 13 min read
Why Spring Framework 7.0.4’s Hidden Bugs and Speed Boosts Matter to You
Architect Chen
Architect Chen
Mar 19, 2026 · Cloud Native

How Does Kubernetes Really Work? A Deep Dive into K8s Architecture

This article provides a comprehensive, step‑by‑step explanation of Kubernetes (K8s) architecture and operation, covering the control plane components, node components, data flow, and the detailed workflow from a kubectl command to a running pod, illustrated with diagrams and ASCII schematics.

Cloud NativeDevOpsKubernetes
0 likes · 5 min read
How Does Kubernetes Really Work? A Deep Dive into K8s Architecture
Alibaba Cloud Infrastructure
Alibaba Cloud Infrastructure
Mar 18, 2026 · Cloud Native

Why Ingress NGINX Is Retiring and How to Choose Its Successor

The article analyzes the retirement of Ingress NGINX, explains the security flaws, architectural debt, and community constraints that led to its end‑of‑life, and compares migration paths—including staying with NGINX, moving to Gateway API, or adopting Alibaba Cloud ALB Ingress—so engineers can make an informed decision.

ALB IngressGateway APIKubernetes
0 likes · 18 min read
Why Ingress NGINX Is Retiring and How to Choose Its Successor
Woodpecker Software Testing
Woodpecker Software Testing
Mar 18, 2026 · Operations

How Test Experts Can Turn Prediction Analytics into Real‑World Impact

The article explains how test prediction analytics can replace intuition with data‑driven risk signals, detailing high‑ROI use cases, data governance practices, model selection (favoring XGBoost), and a three‑layer deployment architecture that integrates predictions into CI/CD workflows, backed by concrete results from finance and e‑commerce projects.

Data‑Driven TestingKubernetesXGBoost
0 likes · 8 min read
How Test Experts Can Turn Prediction Analytics into Real‑World Impact
Shuge Unlimited
Shuge Unlimited
Mar 17, 2026 · Operations

Exploring OpenClaw for K8s AIOps: Four Practical Scenarios from Concept to Deployment

This article analyzes how OpenClaw’s Skills, Subagent, and Cron capabilities can be leveraged to build Kubernetes AIOps solutions, presenting four detailed scenarios—fault diagnosis, resource optimization, security audit, and continuous health checks—while evaluating technical feasibility, security, reliability, cost, and a phased rollout plan.

Cloud NativeKubernetesOpenClaw
0 likes · 19 min read
Exploring OpenClaw for K8s AIOps: Four Practical Scenarios from Concept to Deployment
Raymond Ops
Raymond Ops
Mar 16, 2026 · Cloud Native

Master Kubernetes Pod Lifecycle and Restart Policies – From Creation to Graceful Termination

This guide walks through Kubernetes pod lifecycle phases, container states, restartPolicy options, health‑check probes, lifecycle hooks, init containers, common troubleshooting scenarios such as CrashLoopBackOff, Pending and Stuck Terminating, and provides best‑practice recommendations for configuration, graceful shutdown, resource limits and monitoring.

Health probesInit containersKubernetes
0 likes · 15 min read
Master Kubernetes Pod Lifecycle and Restart Policies – From Creation to Graceful Termination
MaGe Linux Operations
MaGe Linux Operations
Mar 16, 2026 · Operations

Kubernetes Pod Troubleshooting Guide: Diagnose CrashLoopBackOff, OOMKilled & More

A comprehensive, step‑by‑step guide for SREs and DevOps engineers to diagnose and resolve common Kubernetes pod issues—including CrashLoopBackOff, OOMKilled, ImagePullBackOff, Pending, Evicted, and Terminating—by leveraging pod lifecycle knowledge, kubectl commands, logs, events, node inspection, scripts, real‑world case studies, and monitoring best practices.

DevOpsKubernetesPod
0 likes · 55 min read
Kubernetes Pod Troubleshooting Guide: Diagnose CrashLoopBackOff, OOMKilled & More
Selected Java Interview Questions
Selected Java Interview Questions
Mar 15, 2026 · Cloud Native

What Exactly Are Docker Images, Containers, and Kubernetes Pods? A Simple Guide

An easy-to-understand walkthrough explains Docker images as static system snapshots, containers as runnable instances, Dockerfile and docker‑compose recipes, and how Kubernetes Pods orchestrate containers, highlighting why these tools enable “run anywhere” deployment and scalable management across clusters.

Cloud NativeContainersDevOps
0 likes · 6 min read
What Exactly Are Docker Images, Containers, and Kubernetes Pods? A Simple Guide
Old Zhang's AI Learning
Old Zhang's AI Learning
Mar 13, 2026 · Artificial Intelligence

OpenClaw v3.12: Revamped Dashboard, 20+ Security Fixes & Fast Mode

OpenClaw v3.12 introduces a completely rebuilt Dashboard, a unified Fast Mode switch, a provider‑plugin architecture for easy model integration, extensive security hardening across command execution, permissions and webhooks, plus new iOS/macOS UI upgrades and Kubernetes deployment guides.

AI agentsDashboardKubernetes
0 likes · 10 min read
OpenClaw v3.12: Revamped Dashboard, 20+ Security Fixes & Fast Mode
Alibaba Cloud Infrastructure
Alibaba Cloud Infrastructure
Mar 13, 2026 · Cloud Native

Boosting Autonomous Driving Data Pipelines with Koordinator’s ElasticQuota and GPU Sharing

This article details how a leading autonomous‑driving company tackled multi‑tenant resource contention, low GPU utilization, and distributed task dead‑locks on a heterogeneous Kubernetes cluster by adopting Koordinator’s ElasticQuota, Reservation, Gang and Device‑Share features, achieving higher allocation rates, better fairness, and significantly improved GPU efficiency.

ElasticQuotaGPU SharingKoordinator
0 likes · 20 min read
Boosting Autonomous Driving Data Pipelines with Koordinator’s ElasticQuota and GPU Sharing
Cloud Native Technology Community
Cloud Native Technology Community
Mar 13, 2026 · Cloud Native

How Kubernetes Evolved into a Unified AI Platform for Massive Data and Autonomous Agents

From its 2015 debut as a stateless microservice orchestrator, Kubernetes now powers large‑scale data pipelines, distributed training, high‑throughput inference, and autonomous agents, unifying these workloads on a single platform while addressing resource coordination, multi‑cluster scheduling, and GPU economics.

AICloud NativeGPU scheduling
0 likes · 10 min read
How Kubernetes Evolved into a Unified AI Platform for Massive Data and Autonomous Agents
MaGe Linux Operations
MaGe Linux Operations
Mar 12, 2026 · Backend Development

How to Deploy vLLM Inference Service on Kubernetes with Ingress and Service Load Balancing

This guide walks through deploying a production‑grade vLLM inference service on Kubernetes, covering GPU resource scheduling, Service and Ingress configuration, session affinity, health checks, performance tuning, scaling, monitoring, fault‑tolerance, and best‑practice recommendations for high‑availability AI workloads.

GPUIngressKubernetes
0 likes · 47 min read
How to Deploy vLLM Inference Service on Kubernetes with Ingress and Service Load Balancing
AI Explorer
AI Explorer
Mar 9, 2026 · Artificial Intelligence

OpenSandbox: Alibaba’s Open‑Source AI Sandbox Platform for Secure Agent Execution

OpenSandbox, Alibaba’s newly open‑sourced sandbox platform, offers a standardized, strongly isolated, and easily managed environment for AI agents, supporting multi‑language SDKs, Docker and Kubernetes runtimes, and enterprise‑grade security features, with a quick Python‑SDK demo to illustrate its use.

AI agentsAI sandboxDocker
0 likes · 7 min read
OpenSandbox: Alibaba’s Open‑Source AI Sandbox Platform for Secure Agent Execution
Tech Musings
Tech Musings
Mar 5, 2026 · Cloud Native

Why Default Java GC Settings Kill Performance on Kubernetes (And How to Fix It)

Through a controlled experiment with four Spring Boot service groups on Kubernetes, this article shows that relying on Java’s default GC and heap settings can drastically reduce throughput and increase tail latency, especially under higher load, and demonstrates how explicit GC algorithm and Xms/Xmx tuning restores performance.

JVMJavaKubernetes
0 likes · 13 min read
Why Default Java GC Settings Kill Performance on Kubernetes (And How to Fix It)
Raymond Ops
Raymond Ops
Mar 4, 2026 · Operations

Build an Enterprise‑Grade DevOps CI/CD Pipeline in 7 Days with Ready‑to‑Use Scripts

This guide walks you through constructing a full‑stack, enterprise‑level DevOps pipeline—from environment preparation and tool installation to Jenkins pipeline scripting, Kubernetes deployment, monitoring, security hardening, and cost optimization—providing complete scripts and step‑by‑step instructions to achieve automated, reliable releases within a week.

AutomationDevOpsDocker
0 likes · 27 min read
Build an Enterprise‑Grade DevOps CI/CD Pipeline in 7 Days with Ready‑to‑Use Scripts
Linux Ops Smart Journey
Linux Ops Smart Journey
Mar 4, 2026 · Cloud Native

Secure Envoy Gateway with Basic Auth and Kubernetes Secrets

This guide walks through enabling Basic Authentication in Envoy Gateway by creating an .htpasswd file, storing it as a Kubernetes Secret, applying a SecurityPolicy, and verifying access with curl, while highlighting important security considerations such as using HTTPS.

Basic AuthCloud NativeEnvoy Gateway
0 likes · 5 min read
Secure Envoy Gateway with Basic Auth and Kubernetes Secrets
DevOps Coach
DevOps Coach
Mar 3, 2026 · Cloud Native

Discover Argo Workflows 4.0: 24 New Features, Performance Gains & UI Upgrades

Argo Workflows 4.0 has been released, bringing 24 new features, 122 bug fixes, and contributions from 73 developers, including artifact‑driver plugins, full CRD validation, deprecated singular sync primitives, name‑filtering for archived workflows, real‑time parallelism updates, OIDC custom CA support, UI improvements, and enhanced CLI commands, all aimed at simplifying large‑scale pipeline orchestration across clusters.

Argo WorkflowsCloud NativeKubernetes
0 likes · 9 min read
Discover Argo Workflows 4.0: 24 New Features, Performance Gains & UI Upgrades