Tagged articles
3675 articles
dbaplus Community
Jul 11, 2021 · Big Data

Scaling Real‑Time & Offline Analytics with Druid: Architecture, Optimizations, and Lessons

This article explains how Beike adopted the Druid OLAP engine to handle massive real‑time and offline query workloads, detailing its four‑component architecture, key technologies such as deep storage and metadata storage, practical optimizations for data ingestion, query caching, dynamic throttling, timeout control, and a roadmap for future enhancements.

Big Data · Druid · OLAP
0 likes · 19 min read
Tech Musings
Jul 8, 2021 · Big Data

Building a Simple Single-Node MapReduce System: From Theory to Code

This article walks through implementing a lightweight single‑machine MapReduce framework inspired by the original MapReduce paper, covering the abstract Map/Reduce model, task scheduling between master and workers, core Go code for map, reduce, worker, and coordinator, and a brief reflection on its limitations.

Big Data · Distributed Systems · Lab
0 likes · 10 min read
DataFunTalk
Jul 7, 2021 · Big Data

Solving Data Island Challenges and Enabling Advanced OLAP Analysis on Heterogeneous Big Data Platforms – Kyligence Solution Overview

This article explains the growing analytical demands in the big‑data era, the limitations of traditional OLAP, and how Kyligence’s distributed OLAP engine addresses data‑island issues, multi‑dimensional and many‑to‑many analysis, unified security, and performance optimization with MDX on Spark, delivering a seamless Excel‑like experience.

Analytics · Big Data · Data Integration
0 likes · 9 min read
dbaplus Community
Jul 4, 2021 · Big Data

How Didi Scales MySQL‑to‑Hive Sync with Real‑Time Binlog Capture

This article explains Didi's end‑to‑end architecture for ingesting MySQL data into Hive using real‑time Binlog collection, a customized Canal component, message queues, HDFS storage, Dquality monitoring, and strategies for handling data drift and sharding in large‑scale big‑data environments.

Big Data · Canal · MySQL
0 likes · 13 min read
TAL Education Technology
Jul 1, 2021 · Big Data

Optimization of A/B Test Metric Computation Using Spark and ClickHouse

This article details the design and multi‑stage optimization of an A/B testing metric system, describing its product architecture, Spark‑based computation engine, ClickHouse OLAP layer, cumulative calculation improvements, and batch processing techniques that reduced processing time from hours to a few minutes for hundreds of experiments and metrics.

A/B testing · Big Data · ClickHouse
0 likes · 8 min read
Architect
Jul 1, 2021 · Big Data

Data Governance Practices at Meituan Hotel Travel Platform

This article presents a comprehensive case study of Meituan's hotel‑travel data governance, covering the background, challenges, strategic goals, standardized processes, technical systems, cost and security optimizations, measurable outcomes, and future plans for automated governance.

Big Data · Cost Optimization · Data Governance
0 likes · 29 min read
Youzan Coder
Jun 30, 2021 · Big Data

Online Monitoring Practices for Offline and Real-Time Data at Youzan

Youzan Data Report Center monitors offline batch and real‑time data pipelines using accuracy and timeliness rules, cross‑table checks, upstream‑downstream comparisons, and scheduled alerts to detect anomalies early; since 2021 it has generated over 25 alerts, and plans a unified data‑quality dashboard.

Big Data · Data Quality · Flink
0 likes · 12 min read
JD Retail Technology
Jun 29, 2021 · Big Data

The Value of Data and Data Products: From Concept to Practice

This article explains how data has become a critical production resource, outlines the limitations of traditional data‑analysis workflows, defines data products and their components, describes their advantages and key characteristics, and shares practical case studies of data‑product implementations in a large e‑commerce environment.

Big Data · Data Product · Data Value
0 likes · 16 min read
DataFunTalk
Jun 26, 2021 · Big Data

Building a Scalable Big Data Service System at Didi: Practices and Lessons

Zhang Liang shares Didi's four-stage journey of constructing and governing large‑scale open‑source big‑data engine services—including engine selection, hardware sizing, PaaS platform building, proxy architecture, and governance—highlighting practical challenges, solutions, and ROI‑driven best practices for Kafka, Elasticsearch, Flink, and related technologies.

Big Data · Data Infrastructure · Elasticsearch
0 likes · 16 min read
Laravel Tech Community
Jun 25, 2021 · Big Data

Apache Kudu 1.15.0 – New Features and Improvements

Apache Kudu 1.15.0 adds experimental multi‑row transaction support (currently INSERT and INSERT_IGNORE), Raft‑based master configuration tools, table comment synchronization with Hive Metastore, per‑table size and row‑count limits configurable via flags or the kudu table set_limit tool, a customizable Kerberos principal flag, and TLS v1.3 with optional cipher‑suite selection, collectively enhancing low‑latency random access and analytical capabilities in the Hadoop ecosystem.

Apache Kudu · Big Data · Hadoop
0 likes · 3 min read
Yuewen Technology
Jun 25, 2021 · Big Data

Building Yuewen Group’s Overseas Big Data Platform: Architecture, Offline & Real‑Time Processing

This article details how Yuewen Group designed and implemented an overseas big data platform, covering overall system architecture, offline data‑warehouse construction with dimensional modeling, real‑time streaming using Oceanus and ClickHouse, and future plans for cost reduction and data quality assurance.

Big Data · Cloud Computing · Real-time Processing
0 likes · 12 min read
Architecture Digest
Jun 24, 2021 · Big Data

Kuaishou's Big Data Service Platform: Architecture, Key Technologies, and Future Outlook

This article describes how Kuaishou turned its data platform into a set of services, outlining the challenges data engineers faced, the platform's architecture and key technologies such as configuration‑driven development, multi‑mode APIs, data acceleration, and high‑availability mechanisms, and concluding with a summary of achievements and future directions.

Big Data · Data Acceleration · Data Platform
0 likes · 12 min read
DevOps
Jun 22, 2021 · Operations

Building Digital Champion Capabilities: Integrating Customer Solutions, Operations, Technology, and Talent Ecosystems

The article outlines how digital‑champion enterprises achieve superior performance by integrating four core ecosystems—customer solutions, operations, technology, and talent—through strategic planning, partnership, and advanced technologies such as AI, big data, and industrial IoT, while highlighting maturity stages and practical implementation steps.

Artificial Intelligence · Big Data · Digital Transformation
0 likes · 28 min read
DataFunTalk
Jun 21, 2021 · Big Data

Flink + Iceberg 0.11 Practices in Qunar Data Platform

This article shares Qunar's experience using Flink together with Apache Iceberg 0.11 to address real‑time data warehouse challenges, covering background pain points, Iceberg architecture, solutions for Kafka data loss and Hive latency, and optimization practices such as small‑file handling, sorting, and checkpoint management.

Big Data · Data Lake · Flink
0 likes · 13 min read
Architecture Digest
Jun 21, 2021 · Databases

Using HBase for HR Performance Data Preprocessing Platform: Architecture, Concepts, and Best Practices

This article introduces the HR performance data preprocessing platform’s requirements, explains why HBase was selected as the storage solution, details its core concepts, architecture, data write/read processes, best practices, limitations, and presents performance metrics demonstrating its suitability for large‑scale, high‑throughput workloads.

Big Data · Database Architecture · HBase
0 likes · 12 min read
DataFunTalk
Jun 20, 2021 · Databases

Xiaohongshu’s OLAP Architecture Evolution and DorisDB Adoption

This article details Xiaohongshu’s multi‑stage evolution of its OLAP infrastructure—from Redshift to Presto, ClickHouse, and finally DorisDB—describing the data pipeline, tool comparisons, advertising use‑case implementation, and the resulting performance and operational benefits.

Big Data · ClickHouse · DorisDB
0 likes · 12 min read
ITFLY8 Architecture Home
Jun 20, 2021 · Big Data

Why HBase Is the Ideal Choice for Large‑Scale HR Data Preprocessing

This article explains how HBase’s distributed column‑oriented architecture, high‑performance read/write capabilities, and flexible schema make it a cost‑effective solution for handling massive, unstructured HR performance data, covering its core concepts, cluster operation, best practices, and performance metrics.

Big Data · HBase · data preprocessing
0 likes · 11 min read
DevOps
Jun 16, 2021 · Operations

Understanding Digital Transformation: Definitions, Strategic Questions, Drivers, Frameworks, Roadmaps, Benefits and Pitfalls

The article provides a comprehensive overview of digital transformation, covering its definition, essential strategic questions, key drivers such as customer expectations, cloud and AI, priority areas in the value chain, practical frameworks, roadmap steps, expected benefits and common reasons for failure.

Artificial Intelligence · Big Data · Business strategy
0 likes · 20 min read
IT Architects Alliance
Jun 15, 2021 · Industry Insights

How Cloud Computing, Big Data, and AI Intertwine to Power Modern Services

This article explains the evolution of cloud computing from resource management to elastic virtualization, the emergence of IaaS, PaaS and SaaS service models, how big‑data processing relies on distributed cloud platforms, and why artificial intelligence now depends on massive data and cloud‑scale compute to deliver intelligent services.

Artificial Intelligence · Big Data · Cloud Computing
0 likes · 37 min read
Baidu Geek Talk
Jun 15, 2021 · Industry Insights

What Baidu Unveiled at QCon 2021: Key Takeaways from 7 Cutting‑Edge Sessions

This article compiles Baidu experts' presentations at QCon 2021, covering unified quality‑efficiency delivery for feed recommendation, software engineering capabilities, AIOps fault‑management practices, Apache Doris real‑time analytics, large‑scale Service Mesh deployment, massive service‑governance techniques, and deep‑learning platform innovations, with speaker details and audience benefits.

AI · Baidu · Big Data
0 likes · 12 min read
Big Data Technology & Architecture
Jun 10, 2021 · Big Data

User Profiling: Concepts, Tag Classification, Tag‑System Construction, Applications and Implementation Steps

This article provides a comprehensive overview of user profiling, covering its definition, the five‑dimensional framework (goal, method, organization, standards, validation), various tag classifications, tag‑system architecture, modeling techniques, practical applications such as precise marketing and product innovation, and a step‑by‑step guide for building a profiling system using big‑data and AI methods.

Big Data · Customer Segmentation · data tagging
0 likes · 24 min read
Architecture Digest
Jun 10, 2021 · Big Data

NetEase Game Streaming ETL Architecture and Practices Based on Flink

This article presents NetEase Game's streaming ETL solution built on Flink, covering business background, log characteristics, specialized and generic ETL services, architectural evolution, Python UDF integration, runtime optimizations, fault‑tolerance mechanisms, and future roadmap for unified real‑time and offline data warehouses.

Big Data · Flink · Log Processing
0 likes · 19 min read
58 Tech
Jun 9, 2021 · Big Data

Designing and Implementing a Unified Data Metric System for 58 Commercial Data Team

This article explains how 58's commercial data team built a comprehensive data metric system—from identifying common metric definition issues to establishing a domain‑driven hierarchy, distinguishing atomic and derived metrics, implementing a unified metric management platform, and providing APIs and examples for querying and visualizing metrics.

Big Data · Data Governance · java
0 likes · 17 min read
Xianyu Technology
Jun 8, 2021 · Big Data

Longgong Data Analysis Platform: Architecture and Solutions for Large‑Scale Structured Data

The Longgong Data Analysis Platform enables Idle Fish to capture, store, and analyze billions of structured product attributes in real time across more than 8,000 categories, using TableStore, MySQL, ODPS, and a distributed scheduler to achieve over 50% query speedup, 80% category coverage, and rapid support for search and recommendation teams.

Alibaba · Big Data · Data Platform
0 likes · 9 min read
Alibaba Cloud Developer
Jun 8, 2021 · Artificial Intelligence

Can Low‑Code Bridge the Gap Between Business and AI? Insights on Its Future

The article explores how low‑code platforms can complement traditional algorithm development, enhance collaboration between business users and engineers, and accelerate big‑data and AI initiatives by improving data cleaning, modular design, and feedback loops, while highlighting the trade‑offs of abstraction and flexibility.

AI · Algorithm Development · Big Data
0 likes · 9 min read
Big Data Technology & Architecture
Jun 6, 2021 · Big Data

Understanding Data Warehouses: Concepts, Architecture, Modeling, and Governance

This article provides a comprehensive overview of data warehouses, explaining their purpose, differences from databases, OLTP vs OLAP, traditional versus internet data warehouse models, layered architecture, modeling theories, metric dictionaries, date dimensions, naming conventions, data governance, and incremental synchronization techniques with practical SQL examples.

Big Data · Data Governance · ETL
0 likes · 24 min read
DataFunTalk
Jun 6, 2021 · Big Data

Understanding Apache Pulsar: Cloud‑Native Messaging, Storage‑Compute Separation, and Batch‑Stream Fusion with Flink

This article explains Apache Pulsar’s cloud‑native, storage‑compute separated architecture, its data model and scalability features, and how it integrates with Flink to provide a unified platform for both real‑time streaming and batch processing in big‑data applications.

Apache Pulsar · Batch-Stream Integration · Big Data
0 likes · 17 min read
DataFunTalk
Jun 5, 2021 · Big Data

Building and Evolving a Data Service Platform for NetEase Cloud Music

The article details how NetEase Cloud Music co‑built a unified data service platform with NetEase YouShu, describing its architecture, phased development from internal use to online high‑concurrency services, feature enhancements such as API marketplace, multi‑source support, parameter conversion, and future roadmap for broader data products.

API Platform · Backend · Big Data
0 likes · 16 min read
dbaplus Community
Jun 5, 2021 · Big Data

How Flink + Iceberg Transform Data Lakes for Real‑Time Streaming

This article explains the concept of data lakes, outlines a four‑layer open‑source architecture, presents several classic Flink‑Iceberg use cases, details why Iceberg was chosen, and describes the design of Flink’s streaming sink and upcoming community roadmap.

Apache Flink · Apache Iceberg · Big Data
0 likes · 14 min read
MaGe Linux Operations
Jun 3, 2021 · Big Data

Why Kafka Handles Billions of Messages: Architecture, Use Cases, and Fast Performance

This article introduces Kafka, the high‑throughput distributed messaging system originally developed at LinkedIn, explains its core concepts such as brokers, topics, partitions, offsets, producers, consumers, and consumer groups, outlines common use cases like asynchronous decoupling and data‑stream processing, and details its fast performance mechanisms, fault‑tolerance, installation, and configuration steps.

Big Data · Data Streaming · Installation
0 likes · 11 min read
dbaplus Community
Jun 2, 2021 · Databases

How to Build a Mature Data Warehouse: 7 Essential Steps and Best Practices

This article explains why data warehouses are critical for decision‑making, outlines the challenges of immature warehouses, and provides a step‑by‑step framework—including goal setting, technology selection, problem identification, domain modeling, layer design, modeling principles, and governance standards—to help teams build a robust, maintainable data warehouse.

Big Data · Data Architecture · Database design
0 likes · 22 min read
Big Data Technology Architecture
Jun 2, 2021 · Big Data

Practical Operations of NetEase Big Data Platform: Architecture, EasyOps, Monitoring, and Experience Sharing

The presentation details NetEase's big data platform operations, covering current usage, the internally built EasyOps control system, a generic service‑operation framework based on Ansible, Prometheus‑Grafana monitoring, configuration management, network and storage optimizations, and lessons learned from cloud migration.

Ansible · Big Data · EasyOps
0 likes · 9 min read
Tencent Advertising Technology
Jun 2, 2021 · Big Data

Tencent Advertising Real-Time Strategy Data Framework: Architecture, Performance, and High Availability

The article presents a detailed overview of Tencent Advertising's real‑time strategy data framework, explaining its role in the ad system, the challenges of massive log volumes, and the architectural, performance, and high‑availability solutions implemented to achieve fast, reliable, and scalable ad decision making.

Big Data · Distributed Systems · Real-Time Strategy
0 likes · 24 min read
dbaplus Community
Jun 1, 2021 · Big Data

How Didi Boosted SQL Performance by 40%: Migrating 10k Hive Jobs to Spark

Didi migrated over 10,000 Hive SQL tasks to Spark SQL, raising Spark's share of tasks to 85%, cutting execution time by 40%, and reducing CPU and memory usage by 21% and 49% respectively. The systematic migration process addressed syntax, UDF, performance, and functional differences between the two engines.

Big Data · Performance Optimization · SQL Migration
0 likes · 20 min read
Qunar Tech Salon
Jun 1, 2021 · Big Data

Integrating TensorFlow for Java with Spark‑Scala for Distributed Machine Learning Prediction

This article shares practical experience of building a high‑performance distributed prediction service by combining TensorFlow for Java with Spark‑Scala, covering framework selection, performance comparison, model training, loading, inference, deployment, and optimization techniques for large‑scale data processing.

Big Data · Performance Optimization · Scala
0 likes · 16 min read
Top Architect
May 31, 2021 · Databases

How to Achieve Fast Queries: MySQL Index Optimization, Large‑Table Strategies, Elasticsearch Basics, and HBase Overview

This article explains common causes of slow MySQL queries, how proper indexing and lock handling can improve performance, introduces Elasticsearch’s inverted‑index advantages and suitable use cases, and outlines HBase’s column‑family storage model and row‑key design for large‑scale data.

Big Data · Database Optimization · HBase
0 likes · 18 min read
IT Architects Alliance
May 30, 2021 · Big Data

NetEase Game Streaming ETL Architecture and Practices Based on Flink

This article presents NetEase Game's Flink‑based streaming ETL system, detailing business background, log classifications, specialized and generic ETL services, Python UDF integration, runtime optimizations, HDFS write tuning, SLA metrics, fault‑tolerance mechanisms, and future roadmap for unified data lakes and PyFlink support.

Big Data · Data Integration · ETL
0 likes · 19 min read
DataFunTalk
May 28, 2021 · Artificial Intelligence

JD's Open‑Source Federated Learning Solution 9N‑FL: Architecture, Features, Timeline, and Business Impact

This article introduces JD's open‑source federated learning platform 9N‑FL, explaining the data‑island problem, the fundamentals and classifications of federated learning, its four key features, the system’s layered architecture, development timeline, real‑world advertising use case results, and future enhancements.

9N-FL · Big Data · Federated Learning
0 likes · 15 min read
58 Tech
May 28, 2021 · Big Data

Practical Upgrade Experience of Hadoop 3.2.1 in 58.com Data Platform: HDFS, YARN, and MR3

This article details the end‑to‑end upgrade of a 5000‑node Hadoop 2.6.0 cluster to Hadoop 3.2.1 at 58.com, covering HDFS migration, RBF and EC adoption, YARN federation and rolling upgrades, MR3 integration, extensive compatibility testing, and operational lessons learned for large‑scale big‑data platforms.

Big Data · Cluster Upgrade · HDFS
0 likes · 19 min read
dbaplus Community
May 27, 2021 · Big Data

How Vipshop Scales Billion‑Row OLAP with ClickHouse, Presto, and Flink

This article details Vipshop's OLAP evolution, describing how Presto, Kylin, and ClickHouse are integrated, the deployment architecture with HAProxy and chproxy, containerization on Kubernetes, and the Flink‑ClickHouse pipeline that enables self‑service analysis of hundred‑billion‑row datasets while addressing performance challenges and future roadmap.

Big Data · ClickHouse · Flink
0 likes · 28 min read
Tencent Cloud Developer
May 27, 2021 · Big Data

An Introduction to Kafka: Architecture, Core Components, Service Governance, Performance Optimizations, and Installation Guide

Kafka is a high‑throughput distributed publish‑subscribe system built on brokers, topics, partitions, offsets, producers, and consumers, with Zookeeper handling metadata and leader election. This article covers its fast sequential disk writes, page‑cache and zero‑copy transfers, and ISR‑based replication, and gives step‑by‑step installation instructions for the JDK, Zookeeper, and Kafka.

Big Data · Distributed Messaging · Installation
0 likes · 11 min read
IT Architects Alliance
May 25, 2021 · Big Data

How Modern Data Middle Platforms Power Real‑Time and Offline Analytics

This article provides a comprehensive technical overview of data middle platforms, covering data aggregation, offline and real‑time development, smart operations, data asset management, governance, service layers, platform implementations, warehouse layering, and key differences between offline and real‑time data warehouses.

Big Data · Data Governance · Data Platform
0 likes · 26 min read
Alibaba Terminal Technology
May 25, 2021 · Frontend Development

Inside Alibaba’s Front‑End Visualization Showcase: Insights from CSIG’s Campus‑to‑Enterprise Event

The CSIG Visualization and Visual Analysis Committee’s visit to Alibaba’s Xixi Campus on May 21, 2021 brought together leading academics and industry experts to discuss graph data, big‑data research, spatio‑temporal data, low‑code design, and cutting‑edge visualization techniques, fostering deep industry‑academia collaboration.

Big Data · industry‑academia · low‑code
0 likes · 7 min read
Full-Stack Internet Architecture
May 25, 2021 · Backend Development

Comprehensive Interview Experience Summary and Preparation Guide for Major Tech Companies

This article compiles detailed interview experiences, question lists, and practical advice for candidates targeting backend, big‑data, and cloud positions at leading Chinese tech firms, offering timelines, personal background, preparation tips, and reflections to help job seekers navigate multi‑round technical interviews efficiently.

Big Data · System Design · career advice
0 likes · 28 min read
Architects Research Society
May 23, 2021 · Big Data

Data Architecture Trends: From Chaos to an Organized Era – Insights from Anthony J. Algmin

The article reviews Anthony J. Algmin’s reflections on past data‑architecture predictions, current hot topics such as cloud, AI/ML, data governance, and real‑time analytics, and forecasts future trends including metadata management, blockchain, and the evolving role of data architects within enterprises.

Artificial Intelligence · Big Data · Data Architecture
0 likes · 13 min read
DataFunTalk
May 22, 2021 · Databases

Combining HBase and Elasticsearch: Challenges and the Lindorm Searchindex Solution

The article examines the strengths and weaknesses of combining HBase and Elasticsearch for massive data storage and retrieval, outlines three integration patterns and their challenges, and presents Alibaba Cloud's Lindorm Searchindex as a SQL‑driven, low‑cost, strongly consistent solution that simplifies development and improves performance.

Big Data · Elasticsearch · HBase
0 likes · 11 min read
DeWu Technology
May 22, 2021 · Big Data

Unified Semantic Layer for Data Development: Addressing Pain Points and Optimizing Queries

A unified semantic layer for data development solves metric‑change ripple effects, developer burden, and large‑scale query performance problems by offering consistent metric definitions, multi‑view access, concise auto‑generated SQL, instant propagation of updates, and engine‑driven optimal query selection, thereby bridging business and engineering and cutting maintenance effort.

Big Data · OLAP · data engineering
0 likes · 5 min read
Top Architect
May 22, 2021 · Big Data

Kafka Basics: Topics, Partitions, Producers, Consumers, and Cluster Architecture

This article provides a comprehensive introduction to Kafka, covering its role as a message system, core concepts such as topics, partitions, producers, consumers, messages, the cluster architecture with replicas and controllers, performance optimizations, log segmentation, and network design, all illustrated with diagrams and code examples.

Big Data · Kafka · Message Queue
0 likes · 13 min read
Programmer DD
May 22, 2021 · Big Data

What Is a Data Lake? Origins, Architecture, and How It Powers Modern Big Data

This article explains the concept of a data lake—its origin in 2011, how it differs from traditional databases and data warehouses, its core characteristics such as raw data storage, on‑demand computing, and schema‑on‑read, as well as its advantages, challenges, architectural components, and future outlook within the big‑data ecosystem.

Big Data · Data Architecture · Data Governance
0 likes · 20 min read
IT Architects Alliance
May 22, 2021 · Big Data

Flink-Based Real‑Time Recommendation System: Architecture, Logic, and Docker Deployment Guide

This article presents a comprehensive walkthrough of a Flink‑powered recommendation system, detailing its v2.0 architecture, module functions, recommendation algorithms (hotness, product similarity, collaborative filtering), front‑end and back‑end UI, and step‑by‑step Docker deployment of MySQL, Redis, HBase, and Kafka services.

Big Data · Flink · HBase
0 likes · 11 min read
NetEase Game Operations Platform
May 22, 2021 · Big Data

Comprehensive Overview and Source Code Analysis of NetEase Spark Kyuubi

This article systematically introduces NetEase Kyuubi, an open‑source high‑performance JDBC and SQL execution engine built on Apache Spark, covering its background, core architecture, service discovery, session and operation management, startup processes, and key source‑code implementations with detailed code examples.

Apache Thrift · Big Data · Kyuubi
0 likes · 47 min read
Tencent Cloud Developer
May 21, 2021 · Big Data

Tencent Cloud Oceanus: Flink SQL Optimization and Extension Practices

Tencent Cloud Oceanus, a computing service powering internal apps like WeChat and external partners such as Bilibili, scales to over 30,000 cores, handling 5 PB daily and 500,000 jobs. To address Flink SQL’s syntax, function, and operational limits, it adds table‑valued functions, incremental and enhanced tumble windows, and a caching‑based retraction optimization that cuts downstream data volume by up to 30× and improves join performance by about 20 %.

Big Data · Flink SQL · Oceanus
0 likes · 19 min read
UCloud Tech
May 21, 2021 · Big Data

How US3 Hadoop Adapter Cuts Big Data Storage Costs and Boosts Performance

This article explains how UCloud's US3 object storage, combined with a custom Hadoop adapter, separates compute and storage, optimizes file system operations, and leverages caching and specialized APIs to dramatically reduce storage costs and improve read/write performance for large‑scale Hadoop workloads.

Big Data · Cache · Hadoop
0 likes · 13 min read
iQIYI Technical Product Team
May 21, 2021 · Big Data

Design and Implementation of iQIYI's User Feedback Analysis System

iQIYI built an in‑house user‑feedback analysis system that automatically ingests multi‑channel data, classifies and clusters issues, assesses feedback quality, localizes problems, and streamlines repair closure. Across business lines, it improves recall accuracy, alarm precision, and closure rates while shortening cycle time, enhancing the user experience.

AI · Big Data · classification
0 likes · 15 min read
Byte Quality Assurance Team
May 19, 2021 · Big Data

Streaming 102: The World Beyond Batch

This article extends the concepts introduced in Streaming 101 by deeply exploring data processing patterns for unbounded data, covering windowing, watermarks, triggers, accumulation modes, and their practical implications for building robust low‑latency streaming pipelines.

Big Data · Streaming · Triggers
0 likes · 14 min read
Big Data Technology & Architecture
May 19, 2021 · Big Data

Comprehensive Guide to Data Governance: Metadata, Data Quality, Standards, and Asset Management

This article provides an extensive overview of data governance in the big‑data era, covering common pitfalls, the role of metadata, data quality management, data standardization, and data asset management, and offers practical recommendations for organizations to implement effective governance practices.

Big Data · Data Asset Management · Data Governance
0 likes · 42 min read
Tencent Cloud Developer
May 19, 2021 · Industry Insights

How Cloud‑Native Principles Transform Big Data Infrastructure

The article analyzes how cloud‑native concepts such as DevOps, micro‑services, continuous delivery, and containerization can be applied to big‑data foundations, outlining four guiding principles—industrialized delivery, cost quantification, load‑adaptive scaling, and data‑centric design—and describing concrete Hadoop‑based architectures and Tencent Cloud solutions that lower cost while boosting performance.

Big Data · Cost Optimization · Data Infrastructure
0 likes · 22 min read
UCloud Tech
May 18, 2021 · Big Data

Step‑by‑Step Guide to Deploy UCloud’s Free USDP for Big Data

This article provides a comprehensive tutorial on installing UCloud's free USDP version for private big‑data deployments, covering environment preparation, minimum node specifications, resource download, configuration files, one‑click initialization scripts, server startup, web UI access, license acquisition, and optional manual setup procedures.

Big Data · Linux · UCloud
0 likes · 16 min read
Alibaba Cloud Native
May 17, 2021 · Big Data

How Vineyard Accelerates Cloud‑Native Big Data Workflows with Zero‑Copy Memory Sharing

Vineyard, an open‑source distributed memory data‑sharing engine, tackles the inefficiencies of traditional file‑system based big‑data pipelines by enabling zero‑copy, in‑memory object exchange, Kubernetes‑aware scheduling, and plug‑in operators, delivering up to 1.34× faster end‑to‑end execution.

Big Data · Cloud Native · Memory Sharing
0 likes · 10 min read
Beijing SF i-TECH City Technology Team
May 17, 2021 · Artificial Intelligence

AIOps Overview: Concepts, Applications, and Case Studies

This article provides a comprehensive overview of AIOps, covering its definition, evolution from manual to AI-driven operations, core capabilities, and real-world applications in capacity prediction, anomaly detection, and alarm merging, illustrated with case studies from a food‑retail giant and internal logistics.

Artificial Intelligence · Big Data · Capacity Prediction
0 likes · 13 min read
Architecture Digest
May 17, 2021 · Big Data

Technical Architecture Overview of Toutiao: Data Pipeline, User Modeling, Recommendation System, and Microservices

The article provides a comprehensive technical overview of Toutiao's rapid growth, detailing its massive user base, data collection and processing pipelines, user modeling, cold‑start strategies, recommendation engines, storage solutions, push notification mechanisms, and the underlying microservice and PaaS architecture.

Big Data · Hadoop · Kafka
0 likes · 8 min read
DataFunTalk
May 16, 2021 · Big Data

Efficient Data Update/Delete and Real‑time Processing in the Arctic Lakehouse System

This article explains the evolution from traditional data warehouses to modern lakehouse architectures, introduces the Arctic system’s dynamic hash tree for fast update/delete, describes file splitting with sequence/offset ordering, and compares copy‑on‑write versus merge‑on‑read techniques for achieving low‑latency analytics.

Arctic · Big Data · Copy-on-Write
0 likes · 12 min read
Big Data Technology & Architecture
May 15, 2021 · Big Data

One‑Stop Big Data Platform Construction: Practices from WeBank, Beike, and iQIYI

This article shares practical notes on building a one‑stop big data platform, outlining essential functions such as data extraction, cleaning, storage, analysis, governance, and security, and presents implementation case studies from WeBank, Beike, and iQIYI to illustrate real‑world architectures and solutions.

Big Data · Data Governance · Data Platform
0 likes · 8 min read
Architects Research Society
May 15, 2021 · Big Data

Data Warehouse vs Data Lake: Definitions, Differences, and Architectural Considerations

Data warehouses store structured data centrally for reporting and analysis, while data lakes retain raw data in various formats, offering flexible, low‑cost, schema‑on‑read processing; the article explains their definitions, key differences, common misconceptions, and why many organizations now combine both to enable self‑service big‑data analytics.

Analytics · Big Data · Data Architecture
0 likes · 21 min read
DataFunTalk
May 14, 2021 · Big Data

Real‑time Billion‑Scale Data Transmission and AI Pipeline Architecture at Bilibili

This article presents a technical deep‑dive into Bilibili’s evolution from offline to real‑time data processing, describing the challenges of timeliness, ETL, AI feature engineering, and the design of a Flink‑on‑YARN incremental pipeline that supports trillion‑scale message throughput and AI‑driven real‑time applications.

AI · Big Data · Flink
0 likes · 27 min read
HelloTech
May 14, 2021 · Big Data

User Behavior Analysis System: Architecture, ClickHouse Cluster Deployment, and Analytical Techniques

The article describes a real‑time user behavior analysis platform built on a ClickHouse cluster, detailing its architecture, Hive‑to‑ClickHouse data ingestion with user‑ID routing, table designs for behavior and group data, and five analytical methods—event, funnel, path, retention, and attribution—leveraging shard‑level parallelism and custom functions for high efficiency.

Analytics · Big Data · ClickHouse
0 likes · 20 min read
ITPUB
May 14, 2021 · Big Data

How AnalyticDB Powers Petabyte-Scale Consumer Analytics in Alibaba’s Data Bank

The article details how Alibaba’s Data Bank leverages AnalyticDB’s cold‑hot tiered storage, high‑throughput real‑time writes, and low‑latency OLAP capabilities to handle petabyte‑scale consumer data, support flexible AIPL analysis, crowd profiling, and rapid audience selection while cutting costs and ensuring elasticity during peak events.

AnalyticDB · Big Data · Cold-Hot Storage
0 likes · 14 min read
Volcano Engine Developer Services
May 13, 2021 · Databases

Inside ByteGraph: How ByteDance Built a Scalable Distributed Graph Database

The article offers a comprehensive technical deep‑dive into ByteDance’s home‑grown distributed graph database and graph‑processing engine, ByteGraph, covering its directed‑property graph model, Gremlin query support, multi‑layer architecture, storage strategies for massive data, and real‑world graph‑computing practices.

Big Data · ByteGraph · Graph Database
0 likes · 28 min read
JD Retail Technology
May 13, 2021 · Big Data

Evolution and Architecture of JD.com Self‑Operated Rebate Platform

The article details the development, challenges, and redesign of JD.com’s self‑operated rebate system, describing its early monolithic architecture, data‑intensive processing pipeline, migration to a modular, high‑availability platform built on Spark, Hive, and Elasticsearch, and the resulting performance and operational improvements.

Big Data · ETL · Spark
0 likes · 16 min read
DataFunTalk
May 12, 2021 · Big Data

Building a Unified Real‑Time and Offline OLAP Platform with DorisDB at Yuanfudao

The article describes how Yuanfudao's data middle platform built a high‑performance OLAP service using the MPP HOLAP engine DorisDB to unify real‑time and batch analytics, meet low‑latency and high‑concurrency requirements, and support diverse education‑industry use cases such as live‑stream monitoring, advertising, and order analytics.

Big Data · DorisDB · Education Technology
0 likes · 13 min read
Tencent Tech
May 12, 2021 · Big Data

How Tencent Powered China’s 7th Census with Big Data and Cloud Tech

The article explains how China’s seventh national census, covering 1.41 billion people, was conducted using fully electronic data collection, self‑service mini‑programs, massive cloud‑native infrastructure, and high‑performance databases to achieve real‑time processing and unprecedented scale.

Big Data · census · databases
0 likes · 8 min read
DataFunTalk
May 11, 2021 · Big Data

Design and Practice of Baixin Bank's Flink‑Based Real‑Time Computing Platform and Hudi‑Powered Real‑Time Data Lake

This article details Baixin Bank's construction of a Flink‑driven real‑time computing platform integrated with Hudi as a real‑time data lake, covering background, architecture, data collection, transformation, storage layers, technical challenges, future roadmap, and practical lessons for similar big‑data initiatives.

Big Data · Flink · Hudi
0 likes · 12 min read
Big Data Technology & Architecture
May 11, 2021 · Big Data

Data Quality: Dimensions, Rules, and Constraints

The article explains the importance of data quality in the big data era, defines key quality dimensions such as completeness, uniqueness, validity, consistency, accuracy, timeliness, and credibility, and details how each dimension can be measured and enforced through specific constraints and validation rules.

Big Data · Consistency · Data Governance
0 likes · 9 min read
Architects Research Society
May 9, 2021 · Big Data

Data Lakes vs. Data Warehouses: Key Differences and Choosing the Right Approach

This article explains the fundamental distinctions between data lakes and data warehouses, outlines five critical differences—including data retention, type support, user support, adaptability, and insight speed—and offers guidance on selecting the appropriate solution based on organizational needs and technology options.

Analytics · Big Data · Data Architecture
0 likes · 12 min read
Architecture Digest
May 7, 2021 · Big Data

Comprehensive Overview of Data Middle Platform Architecture and Practices

This article provides a detailed introduction to data middle platform concepts, covering data aggregation, ingestion tools, offline and real‑time development, data governance, service layers, monitoring, and deployment patterns, illustrating how enterprises build unified data ecosystems across various industries.

Big Data · Data Governance · Data Platform
0 likes · 25 min read
Qu Tech
May 6, 2021 · Big Data

How JuiceFS Cut HDFS Load by 26% and Boost Presto Query Speed 13%

This case study details how integrating JuiceFS with Presto reduced HDFS cluster load by about 26%, achieved over 90% cache hit rate for ad‑hoc queries, and lowered average query latency by roughly 13%, while simplifying operations and improving system stability.

Big Data · Cache · HDFS
0 likes · 9 min read
DataFunTalk
May 5, 2021 · Big Data

JD's OLAP Architecture: Design, Challenges, and Solutions

This article explains how JD constructs its OLAP platform from data ingestion to storage, querying, and management, describing the diverse data sources, real‑time and offline processing, scalability, consistency, fault tolerance, and future optimization plans, while addressing key technical challenges and solutions.

Big Data · Distributed Systems · JD.com
0 likes · 15 min read
DataFunTalk
May 4, 2021 · Big Data

Design and Implementation of a Real-Time Data Transmission Platform Based on Apache Flink at AutoHome

This article presents the background, requirements, architectural design, component interaction, and implementation details of AutoHome's real‑time data transmission platform built on Apache Flink, highlighting its high availability, exactly‑once semantics, scalability, DDL handling, and integration with existing streaming services.

Apache Flink · Big Data · Data Streaming
0 likes · 18 min read