Tagged articles

3675 articles

Aug 2, 2023 · Big Data

Loop Detection in Risk Control: Challenges, Distributed Graph Computing Optimizations, and ArcNeural Engine Case Studies

This article discusses the challenges of loop detection in financial risk control, presents distributed graph computing optimization techniques—including pruning, multi‑graph handling, and memory‑efficient algorithms—shows experimental results, and shares real‑world ArcNeural engine case studies and future directions.

ArcNeural · Big Data · Loop Detection

0 likes · 13 min read

HomeTech

Aug 2, 2023 · Artificial Intelligence

Push Precision Recommendation System: Overview, Iteration, and Design

This article presents a comprehensive overview of the push precision recommendation system, detailing its data processing pipeline, machine‑learning‑driven algorithms, modular architecture—including offline, near‑real‑time, and push layers—and subsequent system iterations, optimizations, visual monitoring platforms, and future development directions.

Architecture · Big Data · machine learning

0 likes · 11 min read

FunTester

Aug 1, 2023 · Big Data

Rethinking Big Data Testing: Defining Problem Domains and Key Test Areas

The article explores how to approach testing for big data platforms and applications by first defining problem domains, categorizing concrete user‑oriented questions, and then mapping them to focused test areas such as data extraction, real‑time updates, algorithm verification, and response timeliness.

Big Data · Quality assurance · application

0 likes · 7 min read

Alibaba Cloud Developer

Jul 31, 2023 · Big Data

From BI to Kappa: How Data Architecture Evolved in the Big Data Era

This article traces the evolution of data architecture from early BI systems through traditional big‑data stacks, streaming, Lambda and Kappa designs, and explains how a unified stream‑batch model simplifies development while keeping logic consistent across data‑analysis and pipeline applications.

BI systems · Big Data · Data Architecture

0 likes · 16 min read

DataFunTalk

Jul 29, 2023 · Databases

Xiaomi’s OLAP Practice with Apache Doris: System Selection, Architecture, and User Behavior Analytics

This article details Xiaomi Group’s adoption of Apache Doris for OLAP, covering the evolution of their system selection, the architecture of their data ecosystem, practical implementations for user behavior analysis, and future plans to enhance performance, stability, and scalability.

Apache Doris · Big Data · OLAP

0 likes · 25 min read

DataFunSummit

Jul 28, 2023 · Big Data

User Path Analysis and SessionAnalytics: Business Practices, Technical Architecture, and Open‑Source Framework

This article introduces user path analysis and the SessionAnalytics open‑source framework, covering business scenarios, data processing techniques, algorithmic mining methods, technical architecture, implementation details, comparisons with event‑based analysis, and a comprehensive Q&A for practical deployment.

Big Data · NLP · data engineering

0 likes · 19 min read

GuanYuan Data Tech Team

Jul 27, 2023 · Big Data

How Delta Lake Powers Scalable BI & AI: Real-World Practices and Optimizations

Guandata’s R&D leader outlines how their analytics platform leverages Delta Lake and Spark to deliver fast, ACID‑compliant BI and AI workloads, detailing architecture, key features like schema evolution and time travel, and practical performance tricks such as compaction, vacuuming, and multi‑engine integration.

AI · BI · Big Data

0 likes · 14 min read

Top Architect

Jul 27, 2023 · Big Data

Performance Comparison of Elasticsearch and ClickHouse for Log Search

This article compares Elasticsearch and ClickHouse as log‑search solutions, detailing their architectures, Docker‑compose deployments, data‑ingestion pipelines with Vector, query syntax differences, and benchmark results that show ClickHouse generally outperforms Elasticsearch in speed and aggregation efficiency.

Big Data · ClickHouse · Docker

0 likes · 13 min read

vivo Internet Technology

Jul 26, 2023 · Big Data

Understanding HBase Compaction: Principles, Process, Throttling Strategies, and Optimization Cases

Understanding HBase compaction involves knowing its minor and major merge types, trigger mechanisms, file‑selection policies such as RatioBased and Exploring, throttling controls based on file count, and practical tuning of key parameters to avoid latency spikes, as illustrated by real‑world production cases.

Big Data · HBase · compaction

0 likes · 36 min read

DataFunTalk

Jul 25, 2023 · Databases

Building an Integrated Metric Data Service Platform with Apache Doris: Architecture Evolution and Millisecond‑Level Query Performance

This article describes how Financial One Account, a technology service arm of Ping An, migrated from a Hadoop‑Presto‑Kylin stack to an Apache Doris‑based data platform, detailing the architectural evolution, OLAP engine selection, metric system design, performance optimizations, and future roadmap for real‑time analytics.

Apache Doris · Big Data · OLAP

0 likes · 15 min read

Architect's Guide

Jul 24, 2023 · Big Data

Using Bitmap and Bloom Filter to De‑duplicate 4 Billion IDs Within 1 GB Memory

The article explains how to store and de‑duplicate 4 billion unsigned integers using a bitmap, which reduces memory from 14.9 GB to under 500 MB. It introduces the concept and benefits of bitmaps, describes Bloom filters (their principles, advantages, limitations, and typical use cases), and provides Java and Redis implementation examples.

Big Data · Bitmap · Data Structures

0 likes · 10 min read
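The memory figures quoted in this summary can be checked with a little arithmetic. The sketch below is not taken from the article (which, per the summary, uses Java and Redis). It assumes each ID is a 4-byte unsigned integer and that the bitmap spends one bit per ID, and the names `Bitmap` and `dedupe` are hypothetical, chosen only for illustration.

```python
# Assumption: IDs are unsigned 32-bit integers, i.e. 4 bytes each.
N_IDS = 4_000_000_000
BYTES_PER_UINT32 = 4

raw_bytes = N_IDS * BYTES_PER_UINT32   # every ID stored as a plain integer
bitmap_bytes = N_IDS // 8              # one bit per ID instead

raw_gib = raw_bytes / 2**30            # about 14.9 (binary gigabytes)
bitmap_mib = bitmap_bytes / 2**20      # about 476.8 (binary megabytes)


class Bitmap:
    """Fixed-size bit set backed by a bytearray (hypothetical helper)."""

    def __init__(self, size):
        self.bits = bytearray((size + 7) // 8)

    def add(self, n):
        """Set bit n and report whether it was already set (a duplicate)."""
        byte, mask = n >> 3, 1 << (n & 7)
        seen = bool(self.bits[byte] & mask)
        self.bits[byte] |= mask
        return seen


def dedupe(ids, size):
    """Return ids with repeats dropped, keeping first occurrences in order."""
    bitmap = Bitmap(size)
    return [i for i in ids if not bitmap.add(i)]
```

For example, `dedupe([3, 1, 3, 7, 1], 8)` keeps only the first occurrence of each value. Note that covering the full 32-bit range would need 2^32 bits (512 MiB) rather than 4 billion bits, which is still well inside the 1 GB budget named in the title.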

21CTO

Jul 17, 2023 · Big Data

How WeChat Cut Query Latency from Seconds to 100 ms with Druid Optimizations

This case study explains how the WeChat multi‑dimensional monitoring platform identified performance bottlenecks in its Druid‑based data layer, analyzed user query patterns, and applied sub‑query splitting, Redis caching, and segment size reductions to achieve over 85% cache‑hit rates and bring average query latency down to around 100 ms.

Big Data · Druid · caching

0 likes · 13 min read

Big Data Technology & Architecture

Jul 17, 2023 · Big Data

Incremental Query of Hudi Tables Using Hive, Spark SQL, and Flink SQL

This guide explains how to perform incremental queries on Hudi tables by configuring Hive synchronization, using Spark SQL both programmatically and via pure SQL, and leveraging Flink SQL in batch and streaming modes, with detailed parameter settings and code examples.

Big Data · Flink SQL · Hudi

0 likes · 20 min read

Data Thinking Notes

Jul 16, 2023 · Big Data

How to Build a Reliable Real‑Time Data Warehouse: Timeliness, Quality, and Cost Strategies

This article outlines practical methods for ensuring timeliness, data quality, stability, cost efficiency, agility, and management in real‑time data warehouse pipelines using technologies like Flink and Kafka, while addressing consistency, completeness, and high‑availability concerns.

Big Data · Data Quality · Flink

0 likes · 10 min read

ITPUB

Jul 16, 2023 · Big Data

How WeChat Reduced Query Latency from 1000ms to 100ms in Its Multi‑Dimensional Monitoring Platform

This article explains how the WeChat multi‑dimensional monitoring platform, which processes billions of data points daily, identified performance bottlenecks in its Druid‑based data layer and applied sub‑query splitting, Redis caching, and sub‑dimension tables to achieve over 85% cache hit rate and bring average query time down to around 100 ms.

Big Data · Druid · Performance Optimization

0 likes · 13 min read

Top Architect

Jul 14, 2023 · Big Data

Lambda Architecture: Real-Time Big Data Processing and Practical Use Cases

This article introduces the Lambda Architecture for billion‑scale real‑time data analysis, explains its three layers—Batch, Speed, and Serving—covers its flexibility, fault tolerance, and scalability, and demonstrates concrete applications such as Twitter hashtag analysis and a smart‑parking recommendation system.

Batch Layer · Big Data · Lambda architecture

0 likes · 11 min read

Big Data Technology & Architecture

Jul 12, 2023 · Big Data

Design and Evolution of Volcano Engine DataLeap Data Catalog System

This article details the architecture, design decisions, and iterative improvements of the Data Catalog product within Volcano Engine's DataLeap suite, covering metadata management, ingestion pipelines, search optimization, lineage capabilities, storage layer enhancements, and future development directions.

Apache Atlas · Big Data · Connector

0 likes · 16 min read

DataFunSummit

Jul 11, 2023 · Big Data

Tencent's Autonomous Big Data Platform: Data‑Driven Governance and AI‑Powered Optimization

Tencent’s big data platform introduces a data‑plus‑algorithm driven autonomous solution that automates self‑diagnosis, self‑optimization, and self‑management for trillion‑scale analytics, addressing challenges of massive task governance, resource efficiency, and stability through observable data foundations, pluggable decision engines, and generalized AI decision intelligence.

AI decision · Autonomous Platform · Big Data

0 likes · 17 min read

Big Data Technology & Architecture

Jul 11, 2023 · Big Data

Best Practices for Setting Buckets and Partitions in Apache Doris

This article explains how improper bucket and partition settings in Apache Doris can degrade read/write performance, provides quantitative guidelines for choosing bucket and partition counts, and introduces the automatic bucket feature with practical syntax and usage tips.

Apache Doris · Big Data · Bucket

0 likes · 9 min read

DataFunTalk

Jul 11, 2023 · Big Data

Analysis of Lakehouse Storage Systems: Design, Metadata, Merge‑On‑Read, and Performance Optimizations for Delta Lake, Apache Hudi, and Apache Iceberg

This article examines the architecture and core design of lakehouse storage systems, compares the metadata handling and Merge‑On‑Read mechanisms of Delta Lake, Apache Hudi, and Apache Iceberg, and presents practical performance‑optimization techniques and real‑world case studies on Alibaba Cloud EMR.

Apache Hudi · Apache Iceberg · Big Data

0 likes · 18 min read

Java Architecture Diary

Jul 11, 2023 · Big Data

Redpanda vs Apache Kafka with KRaft: Why Redpanda Is Up to 10× Faster

This article presents a detailed benchmark comparing Redpanda 23.1 and Apache Kafka 3.4.0 (with and without KRaft) across multiple AWS instance types, showing how Redpanda consistently delivers higher throughput and dramatically lower end‑to‑end latency, often outperforming Kafka by 4‑20× even with extra hardware.

Apache Kafka · Big Data · KRaft

0 likes · 12 min read

Architect

Jul 10, 2023 · Big Data

Understanding Lambda Architecture for Real‑Time Billion‑Scale Data Analysis

This article explains the Lambda Architecture—a three‑layer big‑data processing model combining batch and speed layers to deliver accurate, low‑latency analytics, and illustrates its use with Twitter hashtag tracking and a smart‑parking recommendation system.

Batch Processing · Big Data · Lambda architecture

0 likes · 10 min read

DataFunSummit

Jul 9, 2023 · Big Data

Data Governance and Application for Behavior Analysis: Modeling Methods, Architecture, and Practical Cases

This article explains how a data‑ecosystem team governs and applies behavior‑analysis data by describing common analysis scenarios, data‑warehouse modeling methods and their pros and cons, the concepts and overall architecture of behavior‑centric analytics, key system components, and several concrete analysis examples such as retention, funnel and path analysis.

Big Data · Columnar Storage · User Segmentation

0 likes · 12 min read

DataFunTalk

Jul 9, 2023 · Operations

Building High‑Performance Observability Data Pipelines with Vector and Honghu

This article explains the concepts and importance of observability, introduces the Vector data‑pipeline tool and its architecture, demonstrates how to configure sources, transforms and sinks, and shows how to integrate Vector with the Honghu platform to build a complete, real‑time monitoring solution for modern distributed systems.

Big Data · Honghu · Observability

0 likes · 33 min read

DataFunSummit

Jul 8, 2023 · Big Data

Data Preparation Practices at Douyin Group for Diverse Application Scenarios

This article explains Douyin Group's large‑scale data applications, introduces the concept and architecture of data preparation, details its four subsystems and modular capabilities, and showcases how these are applied in BI, CDP, and custom scenarios within the Volcano Engine ecosystem.

BI · Big Data · CDP

0 likes · 16 min read

AntTech

Jul 6, 2023 · Industry Insights

Unlocking AI Value: Data Quality, Privacy, and Blockchain in the Smart Era

The article examines how high‑quality data, robust privacy protection, and blockchain‑enabled trust infrastructure are essential for unlocking the value of AI models, citing market forecasts, examples from smart‑car and fintech firms, and the growing Chinese big‑data market through 2026.

AI · Big Data · Blockchain

0 likes · 9 min read

Huolala Tech

Jul 6, 2023 · Big Data

How to Optimize DAG Task Scheduling to Cut 30 Minutes from Critical Path

This article explains how to analyze and automatically optimize complex DAG‑based data platform task chains, identify bottlenecks, adjust upstream task timings, and reduce critical‑path execution time by up to 30 minutes while preventing resource contention and peak overloads.

Big Data · DAG · Resource Optimization

0 likes · 15 min read

Alibaba Cloud Big Data AI Platform

Jul 6, 2023 · Big Data

Explore World Cup Analytics on EMR Serverless StarRocks – Free Trial Guide

This guide walks you through creating a fully managed EMR Serverless StarRocks instance, loading historical World Cup data, and running OLAP SQL queries to analyze championship counts and host‑nation performance, all using a free trial of compute and storage resources.

Big Data · OLAP · StarRocks

0 likes · 11 min read

Python Programming Learning Circle

Jul 6, 2023 · Big Data

Analyzing Google Ngram Data with Python and PyTubes

This article demonstrates how to download the Google Ngram 1‑gram dataset, load the roughly 1.4 billion rows with Python and the PyTubes library, use NumPy to compute yearly word‑frequency percentages, filter and plot the trends for the word “Python” and compare it with other programming languages.

Big Data · Google Ngram · PyTubes

0 likes · 8 min read

DataFunSummit

Jul 6, 2023 · Big Data

Design and Practice of Alibaba Cloud's Billion‑Scale Real‑Time Log Analysis

This article presents Alibaba Cloud's SLS billion‑scale real‑time log analysis architecture, covering business background, core challenges such as low‑latency queries, massive data scale, high concurrency, and multi‑tenant isolation, and detailing key design solutions like LSM‑based storage, index‑columnar storage, data locality, layered caching, and future directions.

Big Data · distributed storage · high concurrency

0 likes · 17 min read

Data Thinking Notes

Jul 5, 2023 · Big Data

Top 10 Big Data Trends Shaping China’s Data Industry in 2023

At the 2023 Big Data Industry Development Conference in Beijing, the China Communications Standards Association unveiled the top ten big‑data keywords, highlighting trends such as lake‑warehouse integration, data assetization, DataOps, intelligent analytics, data ethics, security, public data licensing, and cross‑border data flows.

Big Data · Data Ethics · Data Governance

0 likes · 16 min read

dbaplus Community

Jul 5, 2023 · Databases

Mid‑Year 2023 Database Industry Roundup: Major Releases and Trends

The 2023 first‑half newsletter compiles a comprehensive overview of the database sector, highlighting the surge of domestic vendors, key technological breakthroughs such as HTAP and serverless, and detailed version updates across RDBMS, NewSQL, graph, time‑series, big‑data, and cloud databases, offering valuable insights for practitioners and decision‑makers.

Big Data · Industry Report · NewSQL

0 likes · 48 min read

DataFunTalk

Jul 5, 2023 · Big Data

DataFun Summit 2023 Real‑Time Computing Forum – Speaker Line‑up and Session Details

The DataFun Summit 2023 Real‑Time Computing Forum showcases a series of expert talks on Apache Flink, stream‑batch integration, cloud‑native streaming databases, and large‑scale real‑time data warehousing, featuring speakers from Alibaba Cloud, Taobao, Didi, Ant Group and RisingWave.

Big Data · Cloud Native · Data Warehousing

0 likes · 8 min read

Big Data Technology & Architecture

Jul 4, 2023 · Big Data

Building a Real‑Time Streaming Data Warehouse with Paimon on Kubernetes for Supply‑Chain Logistics

This article presents a step‑by‑step guide on how the logistics provider Haicheng Bangda implemented a streaming data warehouse using Paimon, Flink CDC, and Kubernetes, covering business background, architecture choices, environment setup, SQL examples, troubleshooting tips, and future roadmap for their digital transformation.

Big Data · CDC · Flink

0 likes · 27 min read

Data Thinking Notes

Jul 2, 2023 · Big Data

Mastering Data Governance: A Comprehensive Framework for Enterprise Success

This article outlines a complete data governance framework, detailing the five managerial domains—control, process, governance, technology, and value—along with strategies for data strategy, organizational structure, policies, processes, standards, quality, security, and platform tools, and highlights AI’s pivotal role in enhancing governance efficiency.

Big Data · Data Governance · Data Quality

0 likes · 10 min read

DataFunSummit

Jul 2, 2023 · Big Data

Building a One‑Stop AB Testing Platform at NetEase Cloud Music: Architecture, Metric Infrastructure, Scientific Evaluation, and Efficiency

The article describes how NetEase Cloud Music designed and deployed a comprehensive AB testing platform, covering system infrastructure, metric modeling, scientific experiment validation (including SRM mitigation and statistical power), and operational efficiency improvements to support rapid product iteration across multiple devices.

AB testing · Big Data · Data Infrastructure

0 likes · 13 min read

DataFunTalk

Jul 2, 2023 · Big Data

Bilibili Data Service Middle Platform: Architecture, Practices, and Future Roadmap

This article presents Bilibili's data service middle platform, detailing its background, one‑stop data service architecture, core processes, model and API construction, query mechanisms, full‑link control, cost‑reduction, high‑availability strategies, achieved results, and future roadmap.

Architecture · Big Data · Data Governance

0 likes · 18 min read

21CTO

Jun 30, 2023 · Information Security

How WeChat’s Security Data Warehouse Powers Billions of Daily Feature Reads

This article explains the origins, evolution, and current architecture of WeChat’s security data warehouse, detailing its unified feature storage, data quality guarantees, multi‑IDC synchronization, and operational system that streamlines feature management, analysis, and deployment to support the platform’s massive security strategy.

Big Data · Feature Management · Operations

0 likes · 15 min read

iQIYI Technical Product Team

Jun 30, 2023 · Big Data

Advertising Data Lake Architecture and Real-time Optimizations

By replacing the costly Lambda architecture with a unified data‑lake built on Iceberg and Flink CDC, the advertising team achieved minute‑level latency, strong consistency, and lower storage expenses, cutting end‑to‑end processing times from hours to a few minutes across budgeting, warehousing, OLAP and ETL workloads.

Advertising · Big Data · Flink

0 likes · 13 min read

StarRocks

Jun 29, 2023 · Big Data

How StarRocks Boosted Mango TV’s Data Platform Performance by Over 10×

Mango TV replaced its fragmented EMR‑Hive‑Kudu‑Presto stack with a unified StarRocks lakehouse, simplifying architecture, cutting operational costs, and achieving more than a ten‑fold increase in query speed while supporting real‑time analytics, materialized views, bitmap indexing, and store‑compute separation.

Big Data · Bitmap Index · Materialized Views

0 likes · 14 min read

DataFunTalk

Jun 29, 2023 · Big Data

Practical Deployment of Delta Lake in BI and AI Products

This article summarizes a technical presentation on how Delta Lake is integrated into a BI+AI platform, covering the product background, data‑lake architecture, Delta Lake features such as ACID transactions, schema management, multi‑engine support, performance optimizations, and future development directions.

AI · BI · Big Data

0 likes · 12 min read

Big Data Technology & Architecture

Jun 27, 2023 · Big Data

Comprehensive Big Data Interview Experience and Questions Overview

The article presents a detailed three‑month interview journey that led to a position at a top new‑energy automotive firm, outlining the questions and topics covered in five interview rounds—including Hive, Spark, Flink, Kafka, data modeling, and data governance—to help candidates prepare for big‑data roles.

Big Data · Flink · Kafka

0 likes · 7 min read

Baidu Intelligent Cloud Tech Hub

Jun 27, 2023 · Cloud Native

How Hierarchical Namespace Boosts Cloud‑Native Data Lake Performance

This article examines the performance challenges of cloud‑native data lakes built on flat object storage and explains how a hierarchical‑namespace design improves directory operations, reduces request amplification, and delivers significant speedups for big‑data and AI workloads.

Big Data · Data Lake · cloud-native

0 likes · 21 min read

Alibaba Cloud Big Data AI Platform

Jun 27, 2023 · Big Data

How MaxCompute’s Lakehouse Architecture Enables Near‑Real‑Time Incremental Processing

This article details Alibaba Cloud MaxCompute’s lakehouse evolution, describing its unified storage‑metadata‑compute design, the Transactional Table 2.0 format, near‑real‑time incremental ingestion, clustering and compaction services, transaction handling, TimeTravel and incremental queries, and future roadmap for big‑data workloads.

Big Data · Incremental Processing · Lakehouse

0 likes · 23 min read

DataFunTalk

Jun 26, 2023 · Big Data

Iceberg Data Lake: Core Features, Xiaomi Use Cases, and Future Plans

This presentation details Iceberg's core capabilities—transactional writes, schema evolution, implicit partitioning, and row‑level updates—while showcasing Xiaomi's real‑world applications such as log ingestion redesign, near‑real‑time warehousing, offline optimizations, column‑level encryption, Hive migration strategies, and outlining upcoming enhancements like materialized views and cloud migration.

Big Data · Column Encryption · Data Lake

0 likes · 20 min read

dbaplus Community

Jun 25, 2023 · Big Data

WeChat’s 10× Query Speedup: From 1000ms to 100ms with Druid & Redis

WeChat’s multi‑dimensional monitoring platform faced severe query latency and I/O bottlenecks, so the team analyzed user behavior and Druid architecture, then introduced sub‑query splitting, Redis caching, and segment size reductions, achieving over 85% cache hit rate and reducing average query time to around 100 ms.

Big Data · Cache · Druid

0 likes · 12 min read

DataFunTalk

Jun 25, 2023 · Big Data

Multi‑Cloud Cache Evolution at Zhihu: From Multi‑HDFS to UnionStore to Alluxio

This technical presentation details Zhihu's journey in multi‑cloud caching, covering the motivations for a multi‑cloud architecture, the design and limitations of the self‑built UnionStore component, and the adoption of Alluxio to achieve significant performance, stability, and cost improvements across model serving and training workloads.

Alluxio · Big Data · caching

0 likes · 24 min read

DataFunTalk

Jun 25, 2023 · Databases

An Overview of Apache Doris: Minimal Architecture, Simplicity, Rich Features, and Open‑Source Design

Apache Doris is an open‑source MPP OLAP database that combines a minimalist architecture, ease of use, rich features such as partition‑bucket pruning, materialized views, and bitmap indexes, and provides high‑performance, scalable, and reliable data warehousing for big‑data analytics.

Apache Doris · Big Data · MPP

0 likes · 19 min read

Data Thinking Notes

Jun 24, 2023 · Fundamentals

Why a Robust Data Metric System Is the Lifeblood of Modern Businesses

This article explains the concepts, construction, and value of data metric systems and tag systems, describing how they help product managers turn raw data into actionable indicators, support decision‑making, guide operations, drive user growth, and ensure a unified statistical standard across the enterprise.

Big Data · Business Intelligence · Data Product Management

0 likes · 16 min read

DataFunTalk

Jun 24, 2023 · Big Data

Design and Architecture of MaxCompute Lakehouse Near‑Real‑Time Incremental Processing

This article explains the evolution of Alibaba Cloud's MaxCompute platform into a lakehouse architecture that supports near‑real‑time incremental processing, detailing its development history, core design of transactional tables, five‑module technical stack, data ingestion methods, optimization services, transaction management, query capabilities, ecosystem integration, practical applications, future roadmap, and common user questions.

Big Data · Data Lake · Incremental Processing

0 likes · 24 min read

DataFunSummit

Jun 22, 2023 · Big Data

Building a Data Middle Platform Indicator System for the Automotive Industry

This article explains how a comprehensive indicator system within a data middle platform can address the automotive industry's data challenges, outlines the evolution of data platforms, details a step‑by‑step methodology for indicator design, development, and management, and presents real‑world case studies.

Big Data · Data Middle Platform · Digital Marketing

0 likes · 12 min read

Big Data Technology & Architecture

Jun 21, 2023 · Big Data

Design and Optimization of Bilibili's Real-Time Data Quality Monitoring Platform

This article details the background, architecture, challenges, and iterative improvements of Bilibili's real-time data quality monitoring platform, covering offline and streaming DQC, resource-efficient Flink designs, InfluxDB proxy integration, CQ table handling, operational safeguards, and future engineering plans.

Big Data · Data Quality · Flink

0 likes · 22 min read

Code Ape Tech Column

Jun 21, 2023 · Big Data

From Java Streams to Spark: Basic Big Data Operations Explained

This article demonstrates how developers familiar with Java Stream APIs can quickly grasp fundamental Spark operations—including map, flatMap, groupBy, and reduce—by translating stream examples into Spark code, providing complete code snippets, explanations of transformations versus actions, and practical tips for handling exceptions in distributed processing.

Big Data · Java Stream · MAP

0 likes · 24 min read

MaGe Linux Operations

Jun 20, 2023 · Big Data

What Is Kafka? A Beginner’s Guide to Distributed Streaming and Messaging

Kafka is an open‑source, distributed streaming platform that uses a publish/subscribe message queue architecture to provide high‑throughput, fault‑tolerant real‑time data processing, featuring topics, partitions, replicas, consumer groups, and multiple APIs for producers, consumers, streams, connectors, and administration.

Big Data · Distributed Streaming · Kafka

0 likes · 20 min read

Wukong Talks Architecture

Jun 20, 2023 · Databases

Evolution of JD Baitiao’s Data Architecture: From MySQL to Apache ShardingSphere

This article chronicles JD Baitiao’s journey from early MySQL and NoSQL solutions through DBRep to the adoption of Apache ShardingSphere, highlighting the technical motivations, decoupling strategies, performance comparisons, and the broader Database Plus vision for scalable, stable financial‑grade data architectures.

Architecture · Big Data · JD Baitiao

0 likes · 14 min read

Architects' Tech Alliance

Jun 19, 2023 · Fundamentals

Understanding Complex Systems and Software Architecture: Definitions, Types, Principles, and Design Considerations

This article explains what complex systems and software architecture are, outlines various architectural categories, discusses essential functional and non‑functional requirements, and presents design principles and typical solutions such as domain‑driven design, microservices, cloud‑native, DevOps, and big‑data architectures for building stable, scalable, and maintainable systems.

Big Data · Complex Systems · Domain-Driven Design

0 likes · 13 min read

DaTaobao Tech

Jun 19, 2023 · Product Management

User Experience Analysis of Taobao Detail Page Using User Journey and VOC Data

The article, the second in a ten‑part Taobao APP UX series, explains how module‑level user‑journey metrics and Voice‑of‑Customer chat data are collected, labeled with a BIO‑CRF taxonomy, clustered via DBSCAN, and correlated to identify size and quality concerns on the men’s‑clothing detail page, prompting module redesigns, A/B tests, and resulting in higher conversion rates and reduced dwell time.

A/B testing · Big Data · User experience

0 likes · 11 min read

Data Thinking Notes

Jun 18, 2023 · Big Data

Data Lake vs Data Warehouse: Uncover the Real Differences

This article explores the evolving concept of data lakes, compares them with traditional data warehouses across storage, modeling, tooling, and user roles, and examines the emerging lake‑warehouse integration, highlighting why both remain essential in modern big‑data architectures.

Big Data · Data Architecture · Data Lake

0 likes · 12 min read

DataFunTalk

Jun 18, 2023 · Big Data

Evolution and Comparison of High‑Performance Cloud‑Native Lakehouse Storage Architecture: From HDFS to JuiceFS

This article examines the evolution of big‑data storage from on‑premise HDFS to cloud‑native object storage, compares their architectures and performance, outlines future lakehouse storage requirements, and demonstrates a practical implementation using the JuiceFS distributed file system.

Big Data · Cloud Native · HDFS

0 likes · 15 min read

DeWu Technology

Jun 16, 2023 · Big Data

Traffic Replay Platform for Data Platform Testing

The team built an online traffic‑replay platform that captures real user requests, replays them in a synchronized pre‑release environment, and automatically compares responses using AAdiff and field‑ignore rules. It achieved 86% interface coverage, 30% fewer regression bugs, and a 98% replay success rate, halved manual testing effort, and provides a zero‑intrusion, high‑concurrency solution for ongoing smoke, regression, stress, and cache validation.

Big Data · Data Platform · traffic replay

0 likes · 10 min read

JD Tech

Jun 16, 2023 · Big Data

Comprehensive Introduction to Apache Kafka: Architecture, Features, and Best Practices

This article provides a detailed overview of Apache Kafka, covering its distributed streaming architecture, storage mechanisms, replication, consumer groups, compression techniques, exactly‑once semantics, configuration tips, and performance optimizations for building reliable high‑throughput data pipelines.

Big Data · Distributed Streaming · Exactly-Once

0 likes · 19 min read

Data Thinking Notes

Jun 14, 2023 · Big Data

Why Data Warehouse Standards Matter and How to Implement Them Effectively

This article explains why data‑warehouse standards are essential for improving team efficiency and product quality while lowering maintenance costs, and provides a step‑by‑step guide covering standard creation, discussion, rollout, supervision, and continuous improvement, as well as detailed design, process, quality, and security specifications.

Big Data · Standards · data modeling

0 likes · 18 min read

DataFunTalk

Jun 14, 2023 · Big Data

Active Data Governance with Operator-Level Lineage: Practices and Exploration

This article presents Big Data company's active data governance practice using operator-level lineage, detailing the shortcomings of traditional lineage, the implementation of indicator chain governance, and the exploration of proactive model governance to achieve smarter, more precise data management.

Big Data · Data Governance · Operator-Level Lineage

0 likes · 14 min read

Alibaba Cloud Developer

Jun 14, 2023 · Big Data

How to Diagnose and Optimize Data Skew and Data Expansion in Big Data SQL

This article shares practical methods, based on real‑world team experience, to identify and resolve data skew and data expansion issues in big data SQL queries, offering systematic investigation steps and optimization techniques for Map, Reduce, and Join stages.

Big Data · Data Skew · ODPS

0 likes · 9 min read

Big Data Technology & Architecture

Jun 13, 2023 · Big Data

Iceberg Data Lake Implementation and Optimization at iQIYI

This article details iQIYI's adoption of Iceberg for its data lake, covering the OLAP architecture, reasons for a data lake, Iceberg's table format advantages over Hive, platform construction, streaming ingestion, query and performance optimizations, real‑world business deployments, and future plans.

Big Data · Data Lake · Flink

0 likes · 21 min read

DataFunSummit

Jun 12, 2023 · Big Data

From Data Integration to the Modern Data Stack: Concepts, Tools, and Practices

This article explains data integration fundamentals, compares data integration tools such as Stitch, Fivetran, and Airbyte, describes the concepts of data warehouses and data lakes, outlines ETL vs ELT processes, and explores building modern data stacks with Flink CDC and cloud services.

Big Data · Data Integration · ELT

0 likes · 17 min read

Big Data Technology & Architecture

Jun 11, 2023 · Big Data

Typical Interview Questions for Offline Data Warehouse Positions (Spark, Hadoop, etc.)

The article shares a fresh graduate's experience interviewing for offline data‑warehouse roles at companies like Ctrip, Meituan and Alibaba, outlines the common interview pattern, and lists detailed Spark, Hadoop, and data‑warehouse questions used by these firms.

Alibaba · Big Data · Ctrip

0 likes · 5 min read

Architects Research Society

Jun 10, 2023 · Big Data

Designing and Planning a Data Lake on Azure Data Lake Storage Gen2

This article provides a comprehensive guide to planning, structuring, securing, and managing a data lake on Azure Data Lake Storage Gen2, covering zone architecture, folder hierarchy, access control, file formats, scalability considerations, and best‑practice recommendations for big‑data workloads.

ADLS Gen2 · Azure · Big Data

0 likes · 21 min read

DataFunSummit

Jun 9, 2023 · Artificial Intelligence

Construction and Application of a Power Industry Knowledge Graph

This article describes how a power‑industry knowledge graph is built using AI, big‑data and cloud techniques, outlines its multi‑dimensional structure, and demonstrates various application scenarios such as personal achievement aggregation, professional learning, job training, generic knowledge services, and decision support for power production.

Big Data · knowledge graph · knowledge management

0 likes · 10 min read

Alibaba Cloud Native

Jun 9, 2023 · Cloud Native

Accelerate AI & Big Data on Kubernetes with Elastic File Client & Fluid

This article explains how the Elastic File Client (EFC) and Fluid together provide a cloud‑native, high‑performance storage solution for AI and big‑data workloads on Kubernetes, detailing architecture challenges, core features, performance benchmarks, and a step‑by‑step deployment guide.

AI · Big Data · Cloud Native

0 likes · 16 min read

Huolala Tech

Jun 8, 2023 · Big Data

How Huolala Built a Robust Big Data Security Framework: Lessons and Practices

This article details Huolala's practical experience in constructing a comprehensive big data security system, covering data lifecycle protection, classification standards, capability development, and governance, while balancing regulatory compliance and business growth.

Big Data · Data Governance · cloud infrastructure

0 likes · 10 min read

Data Thinking Notes

Jun 7, 2023 · Big Data

Digital Portraits of Data Governance: Measuring User Experience & Architecture

This article proposes a digital portrait framework for data governance, detailing metrics for user experience across external customers, internal users, management, and technical staff, as well as architecture quality indicators covering model, distribution, standards, and assets.

Big Data · Data Governance · Digital Metrics

0 likes · 11 min read

DevOps

Jun 7, 2023 · Big Data

Deploying Apache Spark on YARN vs Kubernetes: Architecture, Benefits, and Comparison

This article explains how Apache Spark can be deployed using the traditional Hadoop YARN resource manager and the newer Kubernetes approach, detailing configuration steps, submission methods, and a comprehensive comparison of isolation, scalability, learning curve, logging, performance, and cost considerations.

Big Data · Kubernetes · Spark

0 likes · 10 min read

Alibaba Cloud Big Data AI Platform

Jun 7, 2023 · Big Data

How Alibaba Cloud’s Flink Advisor Transforms Real‑Time Log Diagnosis

Alibaba Cloud's Flink Intelligent Diagnosis (Advisor) combines real‑time data‑warehouse, log‑clustering, and decision‑tree algorithms to automatically analyze error logs, diagnose job anomalies, and provide optimization suggestions, dramatically reducing manual support tickets and improving user experience across Flink managed services.

AI · Big Data · Flink

0 likes · 12 min read

FunTester

Jun 7, 2023 · Big Data

Optimizing Query Performance in WeChat's Multi‑Dimensional Monitoring Platform with Druid and Redis

The article details how WeChat's multi‑dimensional metric monitoring platform, which handles billions of data points per minute, reduced average query latency from over 1000 ms to around 140 ms and achieved over 85% cache hit rate by analyzing query behavior, redesigning the data layer architecture, splitting queries into sub‑queries, adding Redis caching, and introducing sub‑dimension tables.

Big Data · Cache · Druid

0 likes · 13 min read

dbaplus Community

Jun 6, 2023 · Big Data

Why Data Lakes Are Transforming Big Data: Concepts, Benefits, and Iceberg in Practice

This article explains the evolution of data lakes, compares public‑cloud and private‑cloud implementations, outlines key technical features, presents three real‑world scenarios, details the selection and inner workings of Apache Iceberg versus Hive, and showcases multiple production use cases at iQIYI.

Apache Iceberg · Batch Processing · Big Data

0 likes · 25 min read

DataFunSummit

Jun 5, 2023 · Big Data

Building an Intelligent Data Analysis Platform Based on a Unified Semantic Layer

This article presents a comprehensive overview of Xiaomi's intelligent data analysis platform built on a unified semantic layer, covering business scenarios, system architecture, core modules such as data assets and semantic modeling, and the platform's product capabilities like visual analytics, alerts, and embedded dashboards.

Big Data · Intelligent Analytics · row-level security

0 likes · 14 min read

DataFunSummit

Jun 4, 2023 · Fundamentals

The Role of Metadata in Data Governance and Its Applications

Metadata serves as a foundational element of data governance, enabling analysis, monitoring, discovery, and understanding of data assets, while applications such as data lineage, impact analysis, and data mapping help organizations assess quality, trace origins, and optimize processing workflows.

Big Data · Information Management · metadata

0 likes · 5 min read

Architects Research Society

Jun 3, 2023 · Big Data

Understanding Azure Synapse Analytics: Architecture, Features, and Workloads

Azure Synapse Analytics is a cloud‑native, unlimited analytics service that combines data warehousing, big‑data processing, and AI integration, offering unified SQL and Spark engines, extensive language support, workload management, and tight integration with Power BI, Azure Data Lake, and Azure Databricks for rapid, scalable data insights.

Azure · Big Data · Synapse

0 likes · 11 min read

DataFunSummit

Jun 2, 2023 · Artificial Intelligence

Knowledge Graph–Based Root Cause Analysis for Intelligent Manufacturing

This article explains how knowledge‑graph technology combined with artificial‑intelligence methods can enhance intelligent manufacturing by improving quality and reliability through advanced root‑cause analysis, detailing development trends, analytical techniques, challenges, practical frameworks, and real‑world case studies.

Big Data · Root Cause Analysis · intelligent manufacturing

0 likes · 17 min read

WeiLi Technology Team

Jun 2, 2023 · Big Data

Flink RocksDB State Backend: Practical Tuning Guide for Large Jobs

This article explains how to optimize Flink’s RocksDB state backend for large‑scale streaming jobs, covering state types, enabling latency tracking, incremental checkpoints, predefined options, and advanced memory and thread settings, with practical configuration examples and performance comparisons.

Big Data · Flink · Performance Tuning

0 likes · 16 min read

360 Tech Engineering

Jun 2, 2023 · Big Data

Overcoming Challenges in User Profiling: A Big Data‑Driven Framework for Precise Marketing

The article outlines how a unified, big‑data‑based user profiling platform addresses traditional data silos, high costs, and limited functionality by standardizing tags, integrating Spark and RHadoop processing, and enabling a closed‑loop marketing workflow that improves accuracy and operational efficiency.

Big Data · Data Integration · Marketing Automation

0 likes · 7 min read

DataFunTalk

Jun 2, 2023 · Big Data

Iceberg Data Lake Implementation and Optimization at iQIYI

This article details iQIYI's adoption of the Iceberg data lake, covering its OLAP architecture, reasons for a lake, Iceberg table format advantages over Hive, platform construction, extensive performance optimizations, and real‑world business use cases such as ad‑flow unification, log analysis, audit, and CDC pipelines.

Big Data · Data Lake · Flink

0 likes · 18 min read

DevOps Cloud Academy

Jun 1, 2023 · Big Data

DataOps 2.0: Integrated Data Development and Governance Practices at NetEase

The article recounts NetEase’s presentation at the inaugural DataOps conference, detailing the evolution from DataOps 1.0 pipeline to a 2.0 integrated data development‑governance model, the challenges faced, practical solutions, and strategic advice for data managers.

Big Data · Data Governance · Data Management

0 likes · 11 min read

WeChat Backend Team

Jun 1, 2023 · Big Data

How WeChat Boosted Flink Stability with TaskManager Recovery and Load Balancing

This article details WeChat’s Gemini‑2.0 real‑time streaming platform built on Flink, explaining two key stability enhancements: a TaskManager‑level partial failure recovery that avoids data loss during node crashes, and a load‑balancing scheduler that evenly distributes tasks across TaskManagers to improve resource utilization and reduce latency.

Big Data · Flink · Kubernetes

0 likes · 16 min read

DataFunTalk

May 30, 2023 · Big Data

Optimizing Chart Query Performance in YouShu BI: Data Query Principles, Intelligent Caching, Query Merging, and Diagnostics

This article explains the data query fundamentals of YouShu BI charts, introduces intelligent caching design, describes query merging and various optimization techniques—including partition filters, value acceleration, and SQL generation—and outlines performance diagnosis methods to improve BI chart responsiveness.

BI · Big Data · Chart Performance

0 likes · 16 min read

Architects Research Society

May 28, 2023 · Big Data

Understanding Azure Synapse Analytics: An Integrated Data Lake and Data Warehouse Platform

This article examines Microsoft Azure Synapse Analytics, explaining how its unified framework combines data lake and data warehouse capabilities through components such as Pipelines, Dedicated SQL pools, Spark pools, and Serverless SQL, and evaluates its advantages over separate tools like Snowflake and Databricks.

Azure Synapse · Big Data · Cloud Analytics

0 likes · 7 min read

Architects Research Society

May 28, 2023 · Big Data

Databricks vs Snowflake: Comparing Data Lake and Data Warehouse Cloud Solutions

This article compares the cloud‑based analytics platforms Databricks and Snowflake, examining how Databricks serves as a data‑lake processing tool with emerging warehouse features while Snowflake operates as a scalable data‑warehouse that incorporates lake‑style capabilities, and discusses their complementary use cases.

Big Data · Cloud Analytics · Databricks

0 likes · 7 min read

StarRocks

May 26, 2023 · Big Data

How SeaTunnel’s StarRocks Connector Enables High‑Performance Data Sync

This article explains SeaTunnel’s architecture and its StarRocks connector, detailing source and sink features such as field projection, predicate push‑down, parallel reading, state recovery, data type mapping, Stream Load writes, CDC support, configuration examples, and future roadmap for exactly‑once semantics.

Big Data · Connector · Data Integration

0 likes · 16 min read

vivo Internet Technology

May 24, 2023 · Big Data

Kafka Real-time Data Archiving to Hive: Flink SQL and DataStream Implementation Solutions

The article explains how to archive Kafka real‑time data to Hive using either Flink SQL, which quickly creates partitioned ORC tables but requires timezone handling, or Flink DataStream for more complex pipelines, and offers best‑practice guidance on data quality, system complexity, security, and performance.

Big Data · DataStream · Flink

0 likes · 15 min read

DataFunTalk

May 23, 2023 · Big Data

Building a Millisecond‑Response Lakehouse Platform with Apache Iceberg: Architecture, Query Acceleration, and Intelligent Optimization

This article details Bilibili's technical practice of constructing a millisecond‑response lake‑warehouse platform using Apache Iceberg, covering the background challenges, unified architecture, multi‑dimensional sorting and indexing for query acceleration, the Magnus service for intelligent optimization, and the current production deployment and performance metrics.

Big Data · Cube · Iceberg

0 likes · 14 min read

Qunar Tech Salon

May 23, 2023 · Operations

Interview with Sun Bin on Qunar’s Technology Operations, AI Initiatives, and Technical Branding

In this interview, Qunar VP Sun Bin reflects on his 13‑year journey, the technology operations center’s pandemic‑driven innovations, the company’s AI committee and big‑data strategies, and the philosophy of pure technical branding within the ITCP alliance.

Artificial Intelligence · Big Data · Technical Branding

0 likes · 11 min read

DataFunTalk

May 22, 2023 · Big Data

Alibaba Cloud Data Lake: Unified Metadata and Storage Management Practices

This article explains Alibaba Cloud's data lake architecture, unified metadata services, storage management optimizations, and format handling techniques, illustrating how lakehouse concepts, multi‑engine support, and lifecycle policies enable efficient, secure, and cost‑effective big data processing in the cloud.

Big Data · Cloud Services · Data Lake

0 likes · 22 min read

Data Thinking Notes

May 21, 2023 · Information Security

Why Government Data Sharing Stalls and How a “Three‑Rights” Model Can Unlock It

The article analyzes why government data sharing often fails—citing legal, technical, security, and organizational hurdles—then outlines one‑to‑one and centralized sharing models, highlights four critical success factors, and proposes a “three‑rights” framework supported by blockchain to create trustworthy, sustainable inter‑departmental data exchange.

Big Data · Blockchain · Data Governance

0 likes · 11 min read

IT Services Circle

May 21, 2023 · R&D Management

Interviewer’s Reflections: Evaluating Senior Candidates for Cloud and Big Data Positions

The article shares an interviewer's experience assessing senior candidates for cloud and big‑data roles, detailing candidate backgrounds, interview questions on algorithms, Java, Spring, and Kubernetes, the evaluation outcomes, and practical advice for both interviewers and senior engineers.

Big Data · Cloud Computing · Management

0 likes · 11 min read

Big Data Technology & Architecture

May 19, 2023 · Big Data

Comprehensive Big Data Interview Q&A and Personal Project Summary

This article shares a recent graduate's successful job offer story, emphasizes preparing a detailed personal project summary, and provides extensive big‑data interview questions covering Hadoop, Spark, Flink, Kafka, Hive, ClickHouse, and related technologies to help candidates excel in interviews.

Big Data · Flink · Hadoop

0 likes · 15 min read

Data Thinking Notes

May 17, 2023 · Big Data

Inside Wing Pay’s Scalable Big Data Platform: Architecture & Governance

This article details how Wing Pay built a comprehensive data development and governance platform, covering company background, business scenarios, goals, challenges, task development workflow, task types, SparkSQL editor features, double‑environment deployment, Airflow scheduling, DataX data bus, resource isolation, compute optimization, data quality monitoring, cloud‑native practices, future outlook, and a Q&A on data permissions and governance.

Airflow · Big Data · Cloud Native

0 likes · 17 min read

DataFunTalk

May 17, 2023 · Databases

Evolution of 360 Commercial Real-Time Data Warehouse and Apache Doris Deployment

This article details the three‑stage evolution of 360's real‑time data warehouse—from Storm + Druid + MySQL to Flink + Druid + TiDB and finally to Flink + Apache Doris—explaining architectural pain points, the reasons for choosing Doris, and how the new system delivers sub‑second query latency, strong consistency, and simplified operations across advertising scenarios.

Apache Doris · Big Data · Data Consistency

0 likes · 17 min read

Tongcheng Travel Technology Center

May 17, 2023 · Databases

StarRocks Production Practice at Tongcheng Travel: Architecture, Use Cases, and Technical Evaluation

This article details Tongcheng Travel’s production deployment of the StarRocks OLAP database, covering background, business scenarios, technical evaluation against ClickHouse and Greenplum, implementation with Flink SQL, real‑time analytics, offline reporting, CDP use cases, performance optimizations, and future cloud‑native plans.

Big Data · Flink · OLAP

0 likes · 12 min read

WeChat Backend Team

May 17, 2023 · Big Data

Boosting Real-Time Recommendations: Apache Pulsar Optimizations at WeChat

This article details how WeChat's Gemini‑2.0 big‑data platform leverages Apache Pulsar, outlining cloud‑native advantages, load‑balancing refinements, cache and SSD tuning, high‑availability safeguards, and cost‑saving strategies that together enable large‑scale, real‑time, deep‑learning recommendation workloads.

Apache Pulsar · Big Data · Cloud Native

0 likes · 17 min read
