Tagged articles
3675 articles
Big Data Technology & Architecture
Aug 23, 2020 · Big Data

Apache Hudi Overview, Core Concepts, and Quick‑Start Guide

This article introduces Apache Hudi, explains its storage types, query views, and timeline feature, describes typical use cases such as near‑real‑time ingestion and incremental pipelines, and provides a step‑by‑step Scala/Spark quick‑start guide with code examples for compiling, inserting, updating, querying, and syncing data to Hive.

Apache Hudi, Big Data, Data Lake
18 min read
Java Architect Essentials
Aug 21, 2020 · Big Data

Design and Integration of Flume, Kafka, Storm, Drools, and Redis for Real‑Time ETL Log Analysis

This article presents a modular architecture for real‑time ETL log analysis that combines Flume for log collection, Kafka as a buffering layer, Storm for stream processing, Drools for rule‑based data transformation, and Redis for fast storage, detailing installation, configuration, and code integration steps.

Big Data, Drools, Flume
23 min read
Huawei Cloud Developer Alliance
Aug 21, 2020 · Big Data

How Big Data and IoT Are Transforming Vehicle Networks: Opportunities and Challenges

This article explains the concepts of the Internet of Things and big data, explores how massive sensor data fuels smart transportation and vehicle networking, outlines practical applications such as real‑time traffic control and autonomous driving, and analyzes the technical and managerial bottlenecks hindering future growth.

Big Data, IoT, Smart Transportation
13 min read
Liangxu Linux
Aug 19, 2020 · Operations

How to Quickly Analyze Beijing Residency Data with Shell Commands

This tutorial shows how to use standard Unix shell tools such as grep, cut, sort, uniq, awk, and join to extract insights—top companies, most common surnames, popular given names, age distribution, and hometown statistics—from a JSON dataset of over 6,000 Beijing residency applicants.

Big Data, JSON, Shell
13 min read
dbaplus Community
Aug 18, 2020 · Big Data

Designing a Scalable Financial Data Warehouse: Modeling, Layers, and Quality Control

This article outlines a comprehensive approach to building a financial data warehouse, covering background needs, modeling methodologies, a layered architecture (I, C, S, R), data quality monitoring, metadata management, and detailed naming and coding standards to ensure maintainable, high‑quality data pipelines.

Big Data, Data Quality, Metadata Management
14 min read
Suning Technology
Aug 18, 2020 · Backend Development

Boosting Mega‑Sale Stability: Suning’s Backend Data Components in Action

The article details how Suning’s transaction middle‑platform leverages custom TPS collection, advanced flow‑control, big‑data analytics, and AI‑driven forecasting to ensure system stability, capacity planning, and intelligent inventory distribution during the high‑traffic 818 promotional event.

AI, Backend, Big Data
17 min read
Beike Product & Technology
Aug 17, 2020 · Big Data

Bitmap-Based User Segmentation in a DMP Platform Using ClickHouse

This article describes how a data management platform (DMP) at Beike leverages ClickHouse bitmap structures and Spark pipelines to generate global numeric user IDs, design tag-specific bitmap rules for enum, continuous, and date attributes, handle boundary cases, and produce high‑performance bitmap SQL for real‑time user group estimation and complex segment logic.

Big Data, ClickHouse, DMP
17 min read
Big Data Technology & Architecture
Aug 16, 2020 · Big Data

Comprehensive Overview of HDFS: Architecture, Advantages, Limitations, Commands, and Advanced Features

This article provides a detailed introduction to HDFS, covering its application scenarios, core architecture, fault‑tolerance benefits, drawbacks such as high latency and small‑file inefficiency, essential shell and API commands, cluster management procedures, and newer Hadoop 2.0 features like HA, Federation, snapshots, ACLs, and heterogeneous storage.

Big Data, CLI, HA
10 min read
Big Data Technology & Architecture
Aug 15, 2020 · Big Data

Step-by-Step Guide to Building an ELK Stack with Kafka, Zookeeper, Logstash, and Filebeat for Log Collection

This tutorial provides a comprehensive, step-by-step procedure for setting up a log‑collection pipeline using Filebeat, Kafka, Zookeeper, Logstash, Elasticsearch, and Kibana across multiple servers, covering hardware preparation, system tuning, software installation, configuration files, and verification commands.

Big Data, ELK, Filebeat
11 min read
Big Data Technology & Architecture
Aug 15, 2020 · Big Data

Understanding Data Lakes: Concepts, Architecture, Vendor Solutions, and Practical Use Cases

This comprehensive article explains what a data lake is, outlines its core characteristics and reference architecture, compares major cloud providers' data‑lake offerings, presents typical advertising and gaming use cases, and proposes a practical, agile process for building and operating a data lake.

Big Data, Cloud Native, Data Architecture
50 min read
Suning Technology
Aug 14, 2020 · Big Data

Building Suning’s Supply‑Chain Data Platform with DDD and Big‑Data Design

This article recounts Suning’s step‑by‑step journey of designing and implementing a supply‑chain data middle platform, outlining its business rationale, DDD‑based domain modeling, layered system architecture, and practical deployment insights that illustrate how a tailored big‑data solution can enhance data services and governance.

Big Data, DDD, Data Governance
11 min read
Huolala Tech
Aug 13, 2020 · Operations

How Huolala’s “Smart Brain” Uses AI and Optimization to Revolutionize Logistics

At the 2020 Global Logistics Technology Conference in Haikou, Huolala CTO Zhang Hao detailed the company’s self‑developed “Smart Brain” system, which leverages AI, big‑data analytics, IoT and custom optimization algorithms to achieve real‑time, intelligent dispatch, dynamic pricing and safer, more efficient logistics operations.

AI, Big Data, IoT
6 min read
Aikesheng Open Source Community
Aug 13, 2020 · Databases

Introduction to ClickHouse: Features, Installation, Performance Testing, and Comparison

This article introduces ClickHouse, an open‑source column‑oriented OLAP database, detailing its key features, appropriate use cases, installation steps, performance benchmark queries, and how it compares with other columnar storage solutions while highlighting its adoption by major internet companies.

Big Data, ClickHouse, Columnar Database
10 min read
Tencent Cloud Middleware
Aug 12, 2020 · Big Data

How Serverless Functions Can Replace Traditional Kafka Data Pipelines for Lower Cost and Easier Scaling

This article explains how Tencent Cloud CKafka works, describes the challenges of traditional open‑source data‑flow solutions, and demonstrates a Serverless Function approach—complete with architecture diagrams and code examples—to achieve low‑cost, auto‑scaling Kafka‑to‑Elasticsearch pipelines.

Big Data, CKafka, Elasticsearch
12 min read
IT Architects Alliance
Aug 12, 2020 · Big Data

Introduction to Confluent KSQL for Real-Time Stream Processing

This article introduces Confluent KSQL, a SQL‑based real‑time stream processing engine for Kafka, covering its architecture, stream vs table concepts, query lifecycle, Docker‑based setup, DDL commands, example joins, windowed aggregations, connectors, and its advantages and limitations.

Big Data, Docker, KSQL
9 min read
Architects' Tech Alliance
Aug 11, 2020 · Big Data

Comprehensive Overview of Data Middle Platform Architecture, Components, and Practices

This article provides an extensive summary of data middle platform concepts, covering data aggregation, collection tools, offline and real‑time development, data governance, service layers, warehouse construction, and operational practices, illustrating how enterprises build and manage a unified data ecosystem.

Big Data, Data Governance, Data Middle Platform
27 min read
Ctrip Technology
Aug 6, 2020 · Big Data

Data Governance Practices and Model Design in Ctrip Vacation Data Warehouse

This article shares the practical experience and thinking behind Ctrip's vacation data governance project, covering team efficiency optimization, demand sorting, data domain definition, warehouse layering, unified dimension modeling, metric standardization, and the overall benefits of a centralized data governance framework.

Big Data, Ctrip, Data Governance
17 min read
Youku Technology
Aug 6, 2020 · Big Data

Alibaba Entertainment Data Platform: The Journey Ahead

The presentation outlines how Alibaba's entertainment data platform has evolved to meet the real‑time, low‑cost, and scalable analytics demands of campaigns such as Double 11 and 618, detailing its architecture, real‑time processing, pre‑computed data cubes, practical design choices, and lessons learned from implementation challenges.

Big Data, real-time analytics
1 min read
Big Data Technology & Architecture
Aug 5, 2020 · Big Data

An Introduction to Apache Kylin: Architecture, Core Concepts, Installation, and Enterprise Use Cases

This article provides a comprehensive overview of Apache Kylin, covering its background, core OLAP concepts, technical architecture, installation steps, cube-building methods, real‑world enterprise deployments, and resources for further learning, illustrating how it enables sub‑second query performance on massive datasets.

Apache Kylin, Big Data, Cube
20 min read
21CTO
Aug 1, 2020 · Big Data

Mastering User Profiling: A Comprehensive Big Data Blueprint

This article explains how enterprises can leverage massive raw and business data to build detailed user profiles, covering tag types, data architecture, development modules, project phases, key deliverables, and a real-world e‑commerce case study.

Big Data, ETL, Spark
22 min read
DataFunTalk
Aug 1, 2020 · Big Data

User Profiling Methodology and Engineering Solutions

This article explains the fundamentals of user profiling in the big data era, covering tag types, data architecture, development modules, a step‑by‑step implementation process, a practical e‑commerce case study, table design strategies, and both quantitative and qualitative profiling methods.

Big Data, ETL, machine learning
22 min read

How Pandemic Data Visualization Evolved: From John Snow’s Cholera Map to Modern COVID Dashboards

This article traces the history and development of pandemic data visualization—from 19th‑century cholera maps and early 2000s SARS charts to sophisticated COVID‑19 dashboards—while outlining five essential design principles that make such visualizations clear, engaging, and impactful.

Big Data, COVID-19, design principles
13 min read
Tencent Cloud Developer
Jul 30, 2020 · Big Data

Cost Governance Practices in Youzan's Data Middle Platform

Youzan's data middle platform faced cost growth outpacing business due to low utilization and storage inefficiencies; they applied utilization standards, containerization, COS storage migration, offline task optimization, and fine-grained cost-billing, achieving a 12% compute boost, 17% batch savings, 80% storage cost cut, and over 25% overall cost reduction.

Big Data, Cloud Computing, Containerization
24 min read
Big Data Technology & Architecture
Jul 30, 2020 · Big Data

Understanding Bucket Sampling Queries in Hive

This article explains Hive's bucket sampling syntax, demonstrates how to use the TABLESAMPLE clause with various bucket parameters, provides concrete SQL examples, and clarifies the underlying hash‑based mechanism that determines which rows are returned.

Big Data, Bucket Sampling, Tablesample
4 min read
Tencent Cloud Developer
Jul 29, 2020 · Big Data

Case Study: Optimizing Tencent Cloud Elasticsearch for High‑Volume Game Log Analytics

To handle a gaming company's million‑QPS log stream, the team built a hot‑cold Tencent Cloud Elasticsearch cluster with ILM‑driven tiering, scaled CPU/heap, reduced shard count via shrink and replica tweaks, tuned Logstash‑Kafka pipelines, and employed COS snapshots and searchable snapshots, achieving stable performance and lower cost.

Big Data, Elasticsearch, ILM
29 min read
MaGe Linux Operations
Jul 28, 2020 · Big Data

How Leading Chinese Companies Scale Elasticsearch for Billions of Orders

This article surveys how major Chinese tech firms such as JD.com, Ctrip, Didi, and 58.com deploy and evolve Elasticsearch clusters to handle massive order data, log analysis, real‑time monitoring, and security tasks, detailing architecture choices, shard strategies, multi‑cluster designs, and performance optimizations.

Big Data, Elasticsearch, Order Management
11 min read
Xianyu Technology
Jul 28, 2020 · Operations

ShenTan: Automated Fault Localization System for Online Services

ShenTan is an automated fault‑localization platform for online services that pinpoints server‑side issues in under five seconds with developer‑level accuracy. It aggregates real‑time metrics, applies a decision‑tree model enriched by expert knowledge and dynamic thresholds, and presents results through an integrated alert and visualization system; broader endpoint coverage and multi‑tenant support are planned.

Automation, Big Data, Fault Localization
12 min read
dbaplus Community
Jul 26, 2020 · Big Data

How Prometheus Powers Scalable Monitoring for Massive Big Data Clusters

Facing thousands of nodes in expanding big‑data clusters, the author evaluates legacy monitoring stacks, selects Prometheus + Alertmanager + Grafana, and details its architecture, custom exporters, real‑time alerts, self‑healing mechanisms, and visual dashboards that now support ten large clusters and dozens of services.

Alertmanager, Big Data, Grafana
11 min read
Big Data Technology & Architecture
Jul 23, 2020 · Big Data

Comprehensive Kafka FAQ: Uses, Architecture, Offsets, and Partition Management

This article provides an extensive overview of Apache Kafka, covering its use cases, key concepts such as ISR, AR, HW, LEO, and LW, message ordering, the roles of partitioners, serializers and interceptors, producer and consumer client architecture, offset handling, multithreaded consumption, and topic partition management.

Big Data, Kafka, Message Queue
16 min read
dbaplus Community
Jul 22, 2020 · Databases

How to Optimize Real‑Time Vector Tile Services for Millions of Features with PostgreSQL & PostGIS

This article explains how to efficiently browse and render millions of GIS features in real‑time vector tiles using PostgreSQL and PostGIS, covering background challenges, several thinning algorithms, their implementation steps, limitations, advantages, and a practical example with a 3‑million‑point dataset.

Big Data, Data Dilution, GIS
8 min read
Big Data Technology & Architecture
Jul 22, 2020 · Big Data

Kafka Architecture and Core Concepts: Producers, Brokers, and Consumers

This article explains Kafka's fundamental architecture, including the roles of producers, brokers, and consumers, key concepts such as topics, partitions, replicas, ISR, and controller, as well as detailed mechanisms of producer client structure, interceptors, serializers, partitioners, and consumer group rebalancing strategies.

Big Data, Distributed Systems, Kafka
22 min read
Alibaba Cloud Developer
Jul 22, 2020 · Big Data

Exploring the Apache Big Data Ecosystem: Hadoop, Spark, Flink, and More

This article surveys the rapidly evolving big data landscape by reviewing a wide range of Apache projects—including Hadoop, Spark, Flink, HBase, Kudu, Impala, Kafka, and others—detailing their core components, architectures, strengths, and typical use‑cases for building distributed data platforms.

Apache, Big Data, Distributed Systems
20 min read
Tencent Cloud Developer
Jul 21, 2020 · Big Data

Scaling Tencent Meeting Video Stream Quality Analysis with Tencent Cloud Elasticsearch

Facing explosive growth and massive video‑stream quality data, Tencent Meeting migrated its custom Lucene‑based analysis engine to Tencent Cloud Elasticsearch, which delivered over 1 million writes per second, automatic sharding, reduced latency from hours to seconds, and sustained 99.99% availability, proving a high‑performance, scalable solution for large‑scale video conferencing.

Big Data, Cloud Computing, Elasticsearch
16 min read
Big Data Technology & Architecture
Jul 19, 2020 · Big Data

An Overview of Hive, HBase Integration, Apache Phoenix, and Lealone in the Big Data Ecosystem

This article explains Hive's role as a Hadoop‑based data warehouse, its integration with HBase, the advantages and drawbacks of that combination, introduces Apache Phoenix as a high‑performance SQL layer on HBase, and describes the open‑source NewSQL database Lealone, providing practical usage scenarios and performance comparisons.

Big Data, HBase, Lealone
9 min read
Ctrip Technology
Jul 16, 2020 · Big Data

Design and Architecture of the User Profiling System at Ctrip Business Travel

This article describes the concept, tag taxonomy, data flow architecture, and Lambda‑based query service design of Ctrip Business Travel's user profiling system, highlighting how batch and real‑time processing with Spark, Flink, Hive, MongoDB and Redis enable precise marketing, risk control and personalized services.

Big Data, Ctrip, data pipeline
12 min read
Architect
Jul 15, 2020 · Big Data

Understanding Flink Task Slots, Resource Allocation, and Slot Sharing Mechanisms

This article explains how Flink uses task slots to partition TaskManager resources, the benefits of slot sharing, the interaction between Scheduler, SlotPool, and ResourceManager, and the internal classes such as LogicalSlot, PhysicalSlot, and SlotSharingManager that enable resource isolation and sharing in stream processing jobs.

Big Data, Flink, Resource Management
6 min read
Youzan Coder
Jul 15, 2020 · Big Data

Design and Implementation of Youzan ABTest System for Data‑Driven Growth

Youzan created an internal A/B testing platform—combining Java/Node SDKs, a real‑time data pipeline, and a metadata‑driven workflow—to enable data‑driven product iteration, granular traffic allocation, automated logging, statistical analysis, and scalable growth insights across its merchant services, while planning further automation and integration.

A/B testing, Big Data, Experiment Platform
19 min read
Huolala Tech
Jul 15, 2020 · Big Data

How to Build Smart, Scalable Data Tracking Solutions for Comprehensive Analytics

This article explores the fundamentals, common schemes, pain points, and a smart end‑to‑end solution for data tracking (埋点), offering practical guidelines, architectural diagrams, and a concrete example to help engineers implement comprehensive, controllable, and efficient event collection pipelines.

Analytics, Big Data, Data Tracking
9 min read
Big Data Technology & Architecture
Jul 12, 2020 · Big Data

Design and Implementation of Ozone Data Exploration Service (Recon Server)

This article explains the design of a data exploration service for large‑scale distributed storage systems, detailing metadata synchronization, index reconstruction, aggregation tables, node‑level statistics, a user console, and the transition from checkpoint‑based snapshots to delta updates using RocksDB WAL in Hadoop Ozone Recon Server.

Big Data, Delta Updates, Ozone
9 min read
Big Data Technology & Architecture
Jul 9, 2020 · Big Data

How ZooKeeper Supports HBase: Coordination, Fault Tolerance, Log Splitting, META Table Management, and Replication

This article explains how ZooKeeper functions as a distributed coordination service for HBase, detailing its role in master and RegionServer fault tolerance, log splitting, META table location tracking, and replication management, illustrating the underlying ZNode structures and failover mechanisms.

Big Data, Distributed Coordination, HBase
7 min read
Sohu Tech Products
Jul 8, 2020 · Big Data

Optimizing Workflow in Data Warehouse Construction: A Layered Task‑Instance Approach

The article analyzes data‑warehouse workflow scenarios, explains core concepts such as OLAP, multidimensional modeling and layer architecture, reviews existing workflow engines like Azkaban, Oozie and Airflow, and proposes a task‑and‑instance layered optimization that simplifies dependency configuration, improves collaboration, and supports complex scheduling in modern big‑data environments.

Big Data, ETL, Workflow
21 min read
dbaplus Community
Jul 7, 2020 · Big Data

How Flink + ClickHouse Power Real‑Time Analytics at Scale

This article explains how QuTouTiao builds a high‑performance real‑time analytics pipeline using Flink, Hive, and ClickHouse, covering business scenarios, hour‑level and second‑level Flink‑to‑Hive architectures, streaming file sink mechanics, multi‑user permissions, ClickHouse performance tricks, and future roadmap for unified stream‑batch storage.

Big Data, ClickHouse, Flink
18 min read
Programmer DD
Jul 7, 2020 · Big Data

How to Choose a Worthwhile Technology: Depth, Ecosystem, and Evolution

The article outlines a three‑dimensional framework—technical depth, ecosystem breadth, and evolution capability—to help engineers decide which big‑data or stream‑processing technology (such as Hadoop, Spark, or Flink) is worth investing time in, and provides practical tips like using Google Trends and GitHub awesome lists.

Big Data, Flink, Hadoop
12 min read
dbaplus Community
Jul 5, 2020 · Big Data

How a Chinese Bank Built a Real‑Time Log Management Platform with Apollo and Elasticsearch

Facing massive, multi‑system log volumes, China Minsheng Bank’s big‑data team designed a real‑time intelligent log platform by integrating Ctrip’s open‑source Apollo configuration center with Elasticsearch, enabling centralized, versioned, hot‑reloading configuration, role‑based parameter management, and high‑availability deployment across thousands of servers.

Apollo, Big Data, DevOps
30 min read
Big Data Technology & Architecture
Jul 5, 2020 · Big Data

Understanding Spark Memory Management: On‑heap, Off‑heap, and Unified Memory

This article provides a comprehensive overview of Spark's memory management, covering executor memory architecture, the differences between on‑heap and off‑heap memory, static versus unified memory managers, storage and execution memory handling, and practical guidelines for optimizing Spark applications.

Big Data, Executor, Memory Management
21 min read
Architect
Jul 4, 2020 · Big Data

Kuaishou Flink Real‑Time Architecture and Spring Festival Gala Assurance Practices

This article details Kuaishou's Flink‑based real‑time computing architecture, its massive cluster scale, and the comprehensive strategies—including overload protection, system stability, pressure testing, and resource guarantees—implemented to ensure reliable streaming for the 2020 Spring Festival Gala and its real‑time dashboard.

Big Data, Flink, Kuaishou
12 min read
Youzan Coder
Jul 3, 2020 · Big Data

Data Cost Quantification, Billing, and Optimization in a Data Platform

The data‑platform team introduced a self‑sustaining cost‑reduction framework that quantifies CPU, memory, and disk expenses using price‑per‑resource formulas, applies time‑weighted billing, generates multi‑level reports, and drives optimization through six actionable “swords” and incentive‑based operations, achieving roughly 17 % offline‑cluster savings within six months.

Big Data, Cost Optimization, Resource Quantification
15 min read
Youzan Coder
Jul 1, 2020 · Big Data

Mastering HiveCube: Efficient Multi‑Dimensional Aggregation with Grouping Sets

This article explains how HiveCube can replace traditional development for multi‑dimensional aggregation in a data‑warehouse, covering background, theory of cube, with‑cube/rollup/grouping‑sets syntax, grouping_id handling, practical implementation tips, performance tuning, and a comparison with conventional methods.

Big Data, Cube, Grouping Sets
19 min read
Tencent Advertising Technology
Jun 29, 2020 · Artificial Intelligence

2020 Tencent Advertising Rhinoceros Bird Special Research Program Call for Proposals

The Tencent Advertising Rhinoceros Bird Special Research Program, launched in June 2020, invites global academia to collaborate on advertising technology challenges in AI, big data, and related fields, outlining the application process, evaluation criteria, and accompanying Wiztalk lecture series.

Big Data, Tencent Advertising, Wiztalk Lectures
4 min read
Big Data and Microservices
Jun 28, 2020 · Big Data

Data Warehouse vs Data Lake vs Data Platform vs Data Middle Platform: Which Fits Your Business?

This article compares data warehouse, data lake, data platform, and data middle platform, explaining their definitions, architectures, strengths, limitations, and use‑case differences, and provides tables that highlight how each solution handles structured and unstructured data, governance, flexibility, and business value.

Big Data, Data Architecture, Data Lake
12 min read
Tencent Cloud Developer
Jun 24, 2020 · Industry Insights

How Industrial Internet Is Reshaping China's Light Manufacturing: Trends, Challenges, and Opportunities

The article analyzes the rapid shift from "Made in China" to "Intelligent Manufacturing" driven by industrial internet, 5G, AI and big data, highlighting policy evolution, case studies across light industry, liquor production and hazardous chemicals, and Tencent Cloud's strategic role in enabling digital transformation.

5G, AI, Big Data
33 min read
Big Data and Microservices
Jun 24, 2020 · Industry Insights

What Is a Data Middle Platform and How It Boosts Business Agility

The article explains what a data middle platform is, why it differs from a traditional big‑data platform, the efficiency, collaboration and talent challenges it addresses, its definition as a data‑driven innovation layer built on big data, cloud and AI, and outlines its logical architecture centered on data APIs.

Artificial Intelligence, Big Data, Cloud Computing
6 min read
dbaplus Community
Jun 20, 2020 · Big Data

What’s New in Apache Spark 3.0? Explore Dynamic Partition Pruning, AQE, and More

Apache Spark 3.0, released after a 21‑month development cycle, introduces dynamic partition pruning, adaptive query execution, accelerator‑aware scheduling, DataSource V2, enhanced pandas UDFs, new join hints, richer monitoring, ANSI‑SQL compatibility, SparkR vectorization, Kafka header support, and numerous platform upgrades, all backed by over 3,400 resolved issues.

Adaptive Query Execution, Apache Spark, Big Data
17 min read
dbaplus Community
Jun 18, 2020 · Databases

How a Hybrid Data Warehouse Transformed Banking Data Services

This article details the 2015 hybrid data‑warehouse design implemented at Guangdong Huaxing Bank, explaining its real‑time, historical, and archival layers, the data‑bus concept, and how mixing in‑memory, relational, and Hadoop technologies addressed modern banking data‑volume, latency, and unstructured‑data challenges.

Banking, Big Data, Hadoop
20 min read
DataFunTalk
Jun 18, 2020 · Big Data

Real-time Data Processing at QuTouTiao: Flink + ClickHouse Architecture and Practices

QuTouTiao leverages Flink and ClickHouse to build a high‑performance real‑time analytics platform that supports hourly Hive pipelines and sub‑second ClickHouse queries, achieving sub‑second response for 80% of requests through streaming ingestion, exactly‑once semantics, multi‑cluster coordination, and optimized ClickHouse storage and connector designs.

Big Data, ClickHouse, Flink
16 min read
JD Retail Technology
Jun 17, 2020 · Operations

How JD’s Data Platforms Scaled for the 618 Mega‑Sale: Operations, Stress‑Testing, and Dual‑Stream Architecture

The article details JD’s data product teams’ systematic preparation for the 618 shopping festival, covering pressure estimation, capacity expansion, stress testing, emergency downgrade strategies, dual‑data‑center isolation, high‑fidelity end‑to‑end testing, and continuous monitoring to ensure stable, real‑time data services during massive traffic spikes.

Big Data, Data Platform, JD.com
10 min read