Tagged articles
89 articles
Big Data Tech Team
Feb 26, 2026 · Big Data

How to Design Practical Data Architecture Diagrams: A Step‑by‑Step Guide

This guide walks data engineers through the entire process of creating clear, production‑ready data architecture diagrams—from identifying the diagram type and defining layers, to selecting tools, drawing step‑by‑step components, applying visual standards, avoiding common pitfalls, and validating the final design for stakeholders.

Diagram · big-data · data-architecture
0 likes · 11 min read
DataFunTalk
Dec 26, 2025 · Cloud Native

How Haier Built a Cloud‑Native Multi‑Modal Data Lake for AI‑Ready Manufacturing

Haier’s digital transformation leverages a cloud‑native, open‑source‑based multi‑modal data lake that unifies structured and unstructured industrial data, uses metadata models and knowledge graphs for governance, and provides AI‑ready services that balance performance, cost, and real‑time requirements.

AI · Data Lake · Multimodal Data
0 likes · 12 min read
DataFunSummit
Dec 19, 2025 · Cloud Native

How HiSilicon Uses Cloud‑Native Architecture to Build a Multi‑Modal Data Lake

Amid the AI wave, HiSilicon’s digital transformation tackles fragmented industrial data by adopting a cloud‑native, open‑source stack centered on Paimon, creating a unified metadata model, knowledge graph, and elastic scheduling that balances performance and cost while powering AI‑ready services across nine business domains.

AI · Knowledge Graph · big-data
0 likes · 12 min read
macrozheng
May 12, 2025 · Backend Development

Designing a Billion‑User Real‑Time Leaderboard: Redis vs MySQL

This article explores how to build a scalable, high‑performance leaderboard for hundreds of millions of users by comparing traditional database ORDER BY approaches with Redis sorted sets, addressing challenges such as hot keys, memory pressure, persistence risks, and presenting a divide‑and‑conquer implementation strategy.

Scalability · big-data · high concurrency
0 likes · 11 min read
21CTO
Oct 5, 2024 · Big Data

How Microsoft’s Open‑Source Drasi Redefines Real‑Time Event Processing

Microsoft announced the open‑source Drasi system, a low‑code, graph‑query based platform that monitors logs, databases, and metrics to detect changes in real time, automatically triggering context‑aware actions without moving data to a central lake, aiming to simplify complex event‑driven architectures.

Drasi · Event Processing · Real-Time
0 likes · 4 min read
Alibaba Cloud Native
Sep 4, 2024 · Big Data

How to Speed Up High‑Cardinality GroupBy Queries by Up to 8× in SLS

This article explains why high‑cardinality GroupBy queries are slow, describes SLS's underlying aggregation pipeline, and shows how adjusting session parameters and enabling high‑cardinality optimizations can reduce query times from dozens of seconds to just a few seconds across three real‑world test scenarios.

SLS · SQL · big-data
0 likes · 11 min read
Python Programming Learning Circle
Aug 30, 2024 · Fundamentals

Key Findings from the 2022 JetBrains Python Developer Survey

The 2022 JetBrains Python Developer Survey, conducted with over 23,000 respondents from more than 200 countries, reveals that 93% now use Python 3 (with Python 3.10 most popular), 7% still use Python 2, and highlights trends in frameworks, databases, big‑data tools, IDEs, operating systems, documentation tools, and primary usage contexts.

IDE · big-data · databases
0 likes · 5 min read
DataFunTalk
Jul 1, 2024 · Big Data

JD Retail Metric Middle Platform: Architecture, Semantic Layer, Production, Governance and Practical Cases

This article presents JD Retail’s metric middle‑platform practice, describing the background problems of legacy metric systems, the four‑step solution framework, the overall architecture, semantic‑layer construction with the 4W1H method, configurable metric production, acceleration techniques, governance mechanisms, achieved results and future plans.

big-data · data-platform · governance
0 likes · 19 min read
DataFunSummit
Jun 9, 2024 · Cloud Native

Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: Insights from Uber Presto

This whitepaper examines the industry trend of moving data‑intensive analytics workloads to cloud‑native platforms, analyzes how cloud storage cost models affect performance optimization, and presents a case study of Uber's Presto deployment that reveals fragmented access patterns and new I/O cost considerations.

IO optimization · big-data · case-study
0 likes · 3 min read
DataFunSummit
Jun 5, 2024 · Cloud Native

Migrating Data‑Intensive Analytics to Cloud‑Native Environments: Cost‑Aware I/O Optimization Insights from Uber Presto

This whitepaper examines the industry trend of moving data‑intensive analytics workloads to cloud‑native platforms, revealing how cloud storage cost models affect performance optimization and presenting case‑study‑based I/O strategies derived from Uber's Presto production environment.

Case Study · Cost Model · I/O optimization
0 likes · 3 min read
Mike Chen's Internet Architecture
Jun 4, 2024 · Big Data

Why Kafka Can Achieve Million‑Message‑Per‑Second Throughput: Disk Sequential Write, Zero‑Copy, Page Cache, and Memory‑Mapped Files

The article explains how Kafka attains ultra‑high write throughput by leveraging disk sequential writes, zero‑copy data transfer, operating‑system page cache, and memory‑mapped files, detailing each technique’s impact on latency, CPU usage, and overall performance.

Sequential Write · Zero Copy · big-data
0 likes · 5 min read
Python Programming Learning Circle
May 9, 2024 · Fundamentals

Key Findings from the 2022 JetBrains Python Developer Survey

The 2022 JetBrains Python Developer Survey, conducted with over 23,000 respondents from more than 200 countries, reveals that 93% now use Python 3, highlights the dominance of Flask, Django and FastAPI in web development, shows growing adoption of big‑data tools, and details IDE, OS, and documentation preferences among Python developers.

IDE · big-data · databases
0 likes · 5 min read
JavaEdge
Apr 5, 2024 · Backend Development

Beyond Web Apps: 9 Exciting Java Projects to Explore

This article lists nine compelling Java‑based projects—from a 3D engine and deep‑learning library to time‑series databases, search engines, message queues, NLP tools, and an IoT platform—showing how Java can power diverse, interesting applications beyond ordinary web development.

IoT · backend-development · big-data
0 likes · 8 min read
Python Programming Learning Circle
Dec 27, 2023 · Fundamentals

2022 JetBrains Python Developer Survey: Key Findings on Language Versions, Frameworks, Tools, and Usage Trends

The 2022 JetBrains and Python Software Foundation survey of over 23,000 developers from 200 countries reveals that 93% now use Python 3, highlights the dominance of Flask, Django and FastAPI, shows growing adoption of big‑data tools and IDEs like PyCharm and VS Code, and details how Python is applied across web development, data analysis, and DevOps.

big-data · developer survey · tools
0 likes · 5 min read
Architect
Dec 10, 2023 · Backend Development

Design and Architecture of an Online Checkout System

This article explains the concepts, scenario challenges, functional features, third‑party integration, rule‑engine design, and big‑data handling strategies behind a scalable online checkout system, providing a comprehensive view of its backend architecture and implementation.

architecture · big-data · checkout
0 likes · 10 min read
DataFunTalk
Nov 5, 2023 · Cloud Native

Cloud‑Native Storage Acceleration: Experience and Practices with CloudFS on Volcano Engine

This article presents the cloud‑native storage acceleration demands, evaluates what constitutes a good acceleration solution, and details the design, implementation, and real‑world practice of CloudFS—including metadata acceleration, data‑plane caching, FUSE enhancements, AI training and multi‑cloud data‑lake use cases—while outlining future roadmap plans.

AI · CloudFS · Kubernetes
0 likes · 15 min read
Efficient Ops
Jul 4, 2023 · Big Data

How Cloud‑Native Architecture Transforms Big Data Operations at ByteDance

This article explains how ByteDance migrated its complex, component‑heavy big‑data platform to a cloud‑native architecture, detailing the challenges of traditional deployments, the benefits of micro‑service, container, immutable‑infrastructure and declarative‑API approaches, and the resulting low‑resource, highly‑scalable, portable operations framework.

big-data · cloud-native · disk-management
0 likes · 16 min read
Laravel Tech Community
May 28, 2023 · Big Data

Elasticsearch 8.8.0 Release Notes: Bug Fixes, Deprecations, and New Features

Elasticsearch 8.8.0, the latest release of the Lucene‑based distributed search engine, introduces numerous bug fixes across aggregations, allocation, application and authorization, deprecates certain allocation settings, and adds new capabilities such as templated search APIs, JWT authentication, DLM enhancements, health metrics, ingest node licensing checks, machine‑learning query extensions, ranking improvements, search enhancements, and TSDB support.

Elasticsearch · big-data · bug-fix
0 likes · 5 min read
MaGe Linux Operations
Apr 28, 2023 · Big Data

How to Sync 50 Million Rows Efficiently with Alibaba’s DataX

This guide explains why traditional mysqldump and file‑based methods fail for massive cross‑database sync, introduces Alibaba’s open‑source DataX middleware, details its framework and plugin architecture, walks through installation on Linux, shows how to configure MySQL source and target, and demonstrates both full and incremental data synchronization with practical JSON job examples.

DataX · ETL · Incremental Sync
0 likes · 14 min read
DataFunSummit
Mar 28, 2023 · Big Data

Core Technologies, Performance Metrics, Challenges, and Future Trends of Cloud‑Native Big Data – Expert Interview

In this expert interview, a chief big‑data architect from NetEase explains the core technology layers, key performance indicators, major challenges and mitigation strategies, the business value, and emerging trends of cloud‑native big data platforms, highlighting scheduling, storage, and mixed‑deployment considerations.

Scheduling · big-data · storage
0 likes · 15 min read
Top Architect
Mar 25, 2023 · Big Data

Comprehensive Overview of Data Middle Platform Architecture and Its Core Subsystems

The article provides an in‑depth technical overview of data middle‑platform architecture, explaining its six decoupled subsystems—storage, collection, processing, governance, security, and operation—while illustrating how enterprises can use this layered approach to centralize data, improve agility, and unlock data‑as‑a‑service across various industry scenarios.

big-data · data ops · data-architecture
0 likes · 18 min read
Big Data Technology & Architecture
Nov 22, 2022 · Big Data

Comprehensive Guide to Metadata Management, Data Quality, and Optimization in Big Data Systems

This article provides an in-depth overview of metadata concepts, their technical and business classifications, value in data management, applications such as data profiling and lineage, optimization techniques for compute and storage, lifecycle management, and comprehensive data quality assurance practices within large‑scale big data environments.

big-data · data-quality · data-warehouse
0 likes · 38 min read
21CTO
Nov 18, 2022 · Big Data

How to Supercharge Elasticsearch for Billion‑Row Queries: Proven Optimization Techniques

This article details a real‑world case study of optimizing Elasticsearch for massive daily data volumes, covering the underlying Lucene architecture, shard routing, index and search performance tweaks, practical configuration settings, and benchmark results that achieve sub‑second query responses on billions of records.

Search · big-data · indexing
0 likes · 13 min read
Data Thinking Notes
Nov 8, 2022 · Big Data

Effective Spark GC Tuning: Experiments, Results, and Best Practices

This article walks through a Spark job’s garbage‑collection tuning workflow, presents step‑by‑step experiments with different JVM options and collectors, compares performance under tight and normal memory conditions, and offers practical recommendations for choosing the optimal GC strategy in big‑data workloads.

Memory · Spark · Tuning
0 likes · 12 min read
phodal
Oct 16, 2022 · Industry Insights

Why Financial Python‑as‑a‑Service Is the Next Big Leap for FinTech Data Analysis

This article examines the Bank Python architecture—four core building blocks and a three‑layer platform (interaction, domain, data)—and explains how a self‑service Python environment can deliver fast, real‑time, low‑latency analytics for financial professionals while addressing risk, compliance, and hybrid‑cloud challenges.

AI · FinTech · big-data
0 likes · 9 min read
Top Architect
Oct 2, 2022 · Big Data

Optimizing Kafka at Meituan: Challenges and Solutions for Large‑Scale Cluster Management

This article details Meituan's Kafka deployment, describing the current massive scale and associated challenges, and presents a series of optimizations—including read/write latency reductions, application‑ and system‑level improvements, large‑scale cluster management strategies, full‑link monitoring, service lifecycle management, and future directions—to enhance performance, reliability, and scalability of the streaming platform.

Kafka · Meituan · big-data
0 likes · 23 min read
Architect
Sep 23, 2022 · Databases

Elasticsearch Index and Search Performance Optimization for Billion‑Scale Data

This article presents a comprehensive case study of optimizing Elasticsearch and its underlying Lucene structures to achieve sub‑second query responses on billions of records, covering architecture basics, index design, doc‑values tuning, bulk‑write strategies, and extensive performance testing.

big-data · indexing · lucene
0 likes · 12 min read
Architect
Sep 15, 2022 · Big Data

Meituan's Kafka Optimizations: Challenges, Latency Improvements, and Large‑Scale Cluster Management

This article describes how Meituan's massive Kafka deployment—over 15,000 machines and petabytes of daily traffic—faces scalability challenges such as slow nodes, load imbalance, and resource contention, and details the multi‑layer optimizations applied at the application, system, and cluster‑management levels to reduce read/write latency and improve reliability.

Kafka · Latency · big-data
0 likes · 22 min read
Alibaba Cloud Big Data AI Platform
Sep 5, 2022 · Big Data

Scaling Alibaba TCC to Millions of RPS with a High‑Availability Real‑Time Data Warehouse

This article details how Alibaba's TCC platform evolved its architecture over multiple phases—from a legacy database to a high‑availability real‑time data warehouse built on Flink and Hologres—highlighting the challenges, solutions, and cost‑saving measures that enabled millions of RPS, terabytes of storage, and sub‑second query latency.

Flink · Hologres · Real-Time
0 likes · 21 min read
Sanyou's Java Diary
Aug 22, 2022 · Big Data

Step-by-Step Guide to Building a Kafka 3.0 Cluster with KRaft

This tutorial walks through planning roles, preparing the environment, configuring KRaft, formatting storage, and launching a Kafka 3.0 cluster with scripts for both startup and graceful shutdown, providing all commands and explanations needed for a production-ready setup.

Cluster Setup · KRaft · big-data
0 likes · 10 min read
Python Programming Learning Circle
Aug 15, 2022 · Artificial Intelligence

Top Python Libraries for Data Science, Machine Learning, and Data Visualization

This article curates a comprehensive list of popular Python libraries for data handling, mathematics, machine learning, automated machine learning, data visualization, and model interpretation, providing brief descriptions and GitHub statistics such as stars, contributions, and contributor counts.

artificial intelligence · big-data · data-science
0 likes · 12 min read
Wukong Talks Architecture
Aug 9, 2022 · Big Data

Kafka Basics: 15 Key Questions and In‑Depth Answers

This comprehensive guide covers Kafka’s core concepts, architecture, the role of ZooKeeper, producer sending modes, partitioning strategies, replica types, message deletion, performance optimizations, consumer models, offset management, and best‑practice recommendations for scaling and ensuring ordered delivery in distributed streaming systems.

Partitioning · Streaming · ZooKeeper
0 likes · 31 min read
GuanYuan Data Tech Team
Aug 4, 2022 · Cloud Native

What Is a Cloud‑Native Data Platform? Architecture, Components, and Best Practices

This article explores the evolution and architecture of cloud‑native data platforms, covering their historical roots, modern components such as storage layers, ingestion, processing, metadata, and consumption, and offers practical guidance on selecting tools, designing pipelines, and implementing best‑practice strategies for scalable, flexible data infrastructure.

Data Architecture · big-data · cloud-native
0 likes · 41 min read
IT Architects Alliance
Jul 27, 2022 · Big Data

Understanding Kafka Architecture: Topics, Partitions, Replication, Consumers, Network Design, Zero‑Copy and Zookeeper

This article provides a comprehensive overview of Kafka's core concepts—including topics, partitions, replication, log segmentation, leader‑follower roles, consumer groups, network threading model, zero‑copy I/O, and Zookeeper coordination—explaining how each component works and why understanding the principles is essential for troubleshooting and performance tuning.

big-data · distributed-systems
0 likes · 9 min read
Efficient Ops
Jun 7, 2022 · Big Data

Visualizing Kafka: Core Concepts Explained with Diagrams

This article visually breaks down Kafka’s fundamental concepts—including topics, partitions, producers, consumers, consumer groups, and cluster architecture—so readers can grasp how messages flow, are stored, and achieve load balancing and ordering within a distributed streaming platform.

Distributed Systems · Kafka · Message Queue
0 likes · 6 min read
DataFunSummit
Jun 5, 2022 · Cloud Native

JD Retail Big Data Cloud‑Native Platform Practice

This article presents JD Retail’s cloud‑native platformization of big data, covering the definition and evolution of cloud‑native concepts, the selection of underlying technologies, JD’s architectural choices and workflow coordination, and a broader view of cloud‑native application platform development.

JD · Persistence · big-data
0 likes · 15 min read
Maoyan Technology Team
Apr 13, 2022 · Big Data

Inside Maoyan’s Near‑Real‑Time Transaction Data Center

The article details Maoyan’s transaction data center, explaining its background, the need for a unified real‑time order model, the benefits of reduced coupling and improved data accuracy, and describes the system’s architecture, components, data collection, processing, task scheduling, monitoring, and future plans.

Data center · Real-Time · big-data
0 likes · 29 min read
Big Data Technology & Architecture
Apr 11, 2022 · Big Data

Real-Time Data Warehouse Construction: Background, Objectives, Architecture, and Case Studies

This article explains the growing demand for real‑time data warehouses, outlines their objectives and layered architecture, and presents detailed case studies from Didi, Kuaishou, Tencent, Youzan and others, illustrating design choices, implementation challenges, and best practices for building scalable streaming data platforms.

ClickHouse · Flink · Kafka
0 likes · 48 min read
Java Backend Technology
Jan 11, 2022 · Databases

Why SQL Fails at Multi‑Group & Top‑N Queries and How SPL Fixes It

The article explains how conventional SQL struggles with executing multiple grouping and Top‑N aggregations on massive tables, leading to repeated full scans and poor performance, and demonstrates how the SPL compute engine can perform these operations in a single pass with parallelism, improving speed and scalability.

SPL · SQL · big-data
0 likes · 14 min read
Youzan Coder
Dec 22, 2021 · Big Data

3rd Youzan Big Data Technology Salon: Apache Kylin4, Data Governance, and AI Applications

The 3rd Youzan Big Data Technology Salon, held online for over 200 participants, showcased Apache Kylin 4’s performance boost, GeTui’s five‑step AI method, Kwai’s sustainable data‑governance system, and Youzan’s intelligent copy algorithms, highlighting data governance’s evolution into a core business priority and the shift toward intelligent discovery.

Apache Kylin · Data Intelligence · big-data
0 likes · 6 min read
Architecture Digest
Mar 11, 2021 · Cloud Native

Minsheng Bank Data Middle Platform: Cloud‑Native Architecture, Tools, and Practices

This article details Minsheng Bank's data middle platform built since 2018, explaining its cloud‑native architecture, the underlying microservice and container design, the operational pain points it addresses, and the suite of DevOps tools, management solutions, and component strategies that enable scalable, secure, and efficient financial data services.

Banking · DevOps · big-data
0 likes · 14 min read
DevOps
Feb 23, 2021 · Cloud Native

Minsheng Bank Data Middle Platform: Cloud‑Native Architecture and Tooling Practices

This article details Minsheng Bank’s data middle‑platform construction, its alignment with cloud‑native principles, the challenges it addresses, and the suite of micro‑service, DevOps and tooling innovations—including a one‑stop DevOps workbench, code generators, automated validation, and full‑link tracing—implemented to support diverse financial data services.

Banking · Data Platform · Microservices
0 likes · 14 min read
dbaplus Community
Dec 15, 2020 · Big Data

Building Real‑Time OLAP Reports with Flink SQL CDC and Elasticsearch

This article details a production‑grade pipeline that uses Apache Flink 1.11's SQL CDC to stream MySQL changes into Elasticsearch, enabling low‑latency OLAP reporting, and shares the architecture, DDL/DML scripts, operational settings, and dozens of pitfalls encountered along the way.

Checkpoint · YAML · big-data
0 likes · 19 min read
Architects Research Society
Dec 10, 2020 · Big Data

Spring Cloud Stream with Apache Kafka – Overview, Programming Model, and Advanced Features (Part 2)

This article explains how Spring Cloud Stream integrates with Apache Kafka, covering its programming model, configuration, code examples, topic provisioning, consumer groups, partitioning, monitoring, error handling, schema evolution, and Kafka Streams support for building robust streaming microservices.

Spring Cloud Stream · apache-kafka · big-data
0 likes · 16 min read
Alibaba Cloud Developer
Nov 10, 2020 · Cloud Native

How Alibaba’s Cloud‑Native Architecture Powered 580K Orders per Second on 2020 Double‑11

The 2020 Tmall Double‑11 event shattered records with a peak of 583,000 orders per second, showcasing Alibaba’s digital‑native business operating system that combines cloud‑native migration, AI, big‑data streaming, real‑time video, and intelligent logistics to sustain the world’s largest traffic surge.

AI · Real-Time · big-data
0 likes · 15 min read
Big Data Technology & Architecture
Sep 6, 2020 · Big Data

Kafka Upgrade Guide and Version Changes Overview

This article provides a comprehensive guide to upgrading Apache Kafka across multiple versions, detailing rolling upgrade procedures, configuration adjustments, protocol changes, new features, deprecations, and performance considerations for Kafka brokers, producers, consumers, and Kafka Streams applications.

big-data · kafka streams · performance
0 likes · 56 min read
Baidu Maps Tech Team
May 12, 2020 · Artificial Intelligence

How Trajectory Mining Revolutionizes Real-Time Map Updates

This article explores how large‑scale trajectory mining can overcome the timeliness limits of traditional street‑sweeping data collection, detailing the underlying principles, technical challenges such as vehicle‑type detection and map‑matching, and practical solutions ranging from rule‑based filters to advanced AI models.

AI · HMM · Trajectory
0 likes · 16 min read
Big Data Technology Architecture
Apr 28, 2020 · Big Data

Understanding Shuffle in Hadoop MapReduce and Spark

This article explains the concept and workflow of shuffle in Hadoop MapReduce and Spark, covering map‑side buffering, spill and merge, reduce‑side copy‑merge‑reduce, the reasons for sorting and file merging, and compares Hash‑Shuffle and Sort‑Shuffle implementations with performance considerations.

Hash Shuffle · Shuffle · Sort-Shuffle
0 likes · 16 min read
Efficient Ops
Jan 16, 2020 · Databases

Designing the Underworld’s Hell‑DBMS: How Myth Meets Massive Data

This whimsical yet technically detailed article explores how a mythic Hell‑DBMS could be architected, covering unique identifiers, massive concurrent writes, batch processing, NoSQL tree‑structured storage, disaster recovery, and a real‑world demo project that brings the underworld’s life‑and‑death ledger to life.

Scalability · big-data · database
0 likes · 12 min read
DataFunTalk
Sep 24, 2019 · Big Data

Collaborative Filtering: Fundamentals, Similarity Measures, and Distributed Implementation on Spark

This article introduces the basic concepts of collaborative filtering, explains user‑based and item‑based approaches, presents co‑occurrence, Euclidean, Pearson, and Cosine similarity formulas, and provides complete Scala implementations for these metrics and association‑rule mining on the Spark platform, along with practical scalability tips.

Scala · big-data · collaborative-filtering
0 likes · 17 min read
JavaEdge
Aug 25, 2019 · Big Data

Which Kafka Distribution Fits Your Needs? A Detailed Comparison

This article compares the main Kafka distributions—Apache Kafka, Confluent Kafka, and CDH/HDP Kafka—examining their origins, feature sets, ecosystem support, and trade‑offs to help you choose the most suitable version for your streaming workloads.

Streaming · big-data · confluent
0 likes · 10 min read
Alibaba Cloud Developer
Aug 2, 2019 · Backend Development

How Xianyu’s IFTTT Engine Boosts Real‑Time Two‑Way User Interaction

Xianyu’s IFTTT system tackles sparse, one‑way user relationships by introducing multi‑dimensional, real‑time interaction through a standardized Trigger‑Action‑Recipe model, leveraging Channel, Trigger, and Action layers, high‑performance Lindorm storage, and low‑latency SLS‑Blink pipelines to process billions of relationship events daily.

IFTTT · Lindorm · big-data
0 likes · 10 min read
dbaplus Community
Jul 8, 2019 · Big Data

How to Use ClickHouse Sampling and Materialized Views for Real‑Time Monitoring of Billion‑Scale Ad Traffic

This article explains how to handle high‑volume advertising monitoring by storing raw request logs in ClickHouse, enabling sampling and materialized views, and using TP999 metrics, aggregating tables, and Grafana queries to achieve fast, flexible, and low‑impact real‑time analytics on billions of events.

ClickHouse · Sampling · big-data
0 likes · 10 min read
dbaplus Community
Jun 27, 2019 · Artificial Intelligence

How AI Powers Intelligent Multi-Modal Financial Data Quality Monitoring

This article presents the design, implementation, and evaluation of X‑monitor, an AI‑driven, adaptive, multi‑modal financial data quality monitoring platform that combines rule‑based and self‑learning strategies to improve detection efficiency, accuracy, and flexibility for large‑scale securities‑firm data streams.

AI · big-data · data-quality
0 likes · 24 min read
Efficient Ops
Oct 13, 2018 · Big Data

Boost Your Kafka Integration with KafkaBridge: Multi-Language SDK Overview

KafkaBridge is a lightweight, multi-language SDK that simplifies Kafka read/write operations, offering unified interfaces, long‑connection reuse for PHP‑FPM, and reliable message delivery, with detailed compilation steps, usage examples, and performance benchmarks across C++, Python, PHP, and Go.

C · Golang · Kafka
0 likes · 7 min read
Alibaba Cloud Infrastructure
Dec 21, 2017 · Operations

Stability Monitoring Practices for Double 11 2017

The 2017 Double 11 stability monitoring project introduced a four‑layer monitoring architecture—including customer & sentiment, business, system water‑level, and infrastructure monitoring—along with data archiving and system‑level reliability measures to detect, respond to, and mitigate issues far faster than traditional manual processes.

Operations · big-data · incident response
0 likes · 14 min read
MaGe Linux Operations
Oct 21, 2017 · Big Data

What 1.38 Million Zhihu Followers Reveal: A Python Scraping & Visualization Journey

This article documents a Python‑based web‑scraping project that harvested over 1.38 million Zhihu followers, filtered high‑impact users, and visualized insights such as follower distribution, gender ratio, top influencers, geographic spread, education, industry, and certification details, highlighting challenges and lessons learned.

big-data · data-visualization · pandas
0 likes · 11 min read
dbaplus Community
Apr 27, 2017 · Big Data

Why Kafka’s __consumer_offsets Topic Can Fill Your Disk and How to Fix It

The article explains Kafka’s default consumer offset storage mechanism, why the __consumer_offsets system topic can consume massive disk space due to frequent synchronous commits and misconfigured cleanup, and outlines practical steps to reduce offset data and enable proper log compaction.

Consumer Offset · Offset Management · Operations
0 likes · 6 min read
dbaplus Community
Nov 14, 2016 · Operations

How to Build a Visualized Distributed Ops Platform for Cloud Environments

This article details the design and implementation of a visualized, automated operations platform that integrates inspection, job scheduling, configuration management with SaltStack, data lifecycle automation, and real‑time big‑data analytics to improve efficiency, reliability, and agility of cloud‑based IT services.

SaltStack · big-data · cloud
0 likes · 25 min read
Architecture Digest
Apr 6, 2016 · Backend Development

Evolution of Kuaidi Dache Architecture: Solving LBS Bottlenecks, Long‑Connection Stability, Distributed Refactoring, Open Platform, Real‑Time Monitoring, and Data‑Layer Transformation

This article details how Kuaidi Dache scaled from 2013 to 2015 by addressing LBS performance limits, redesigning long‑connection services, refactoring monolithic code into layered services with Dubbo and RocketMQ, building a secure open platform, implementing Storm‑based real‑time monitoring, and migrating data storage to sharded MySQL, Canal‑driven sync, and HBase for massive scalability.

Microservices · big-data · databases
0 likes · 12 min read