Tagged articles
3675 articles
Beike Product & Technology
Nov 13, 2020 · Big Data

Beike One‑Stop Big Data Development Platform: Architecture, Evolution, and Future Outlook

The article summarizes Beike's one‑stop big data development platform, describing its data business background, the evolution from a simple Hadoop‑Kafka‑Hive stack to a metadata‑driven, asset‑oriented platform, and outlines current capabilities in data management, integration, scheduling, quality, openness, and future plans.

Big Data · Data Governance · Data Platform
0 likes · 11 min read
Tencent Cloud Developer
Nov 13, 2020 · Big Data

Apache Spark Core: Architecture, Components, and Execution Flow

Apache Spark Core is a high‑performance, fault‑tolerant engine that abstracts distributed computation through SparkContext, DAG and Task schedulers, supports in‑memory and disk storage, runs on various cluster managers (YARN, Kubernetes, etc.), and unifies batch, streaming, ML and graph processing via its rich ecosystem.

Apache Spark · Big Data · DAG scheduler
0 likes · 17 min read
DataFunSummit
Nov 12, 2020 · Big Data

OLAP Engine Selection and Challenges in Large-Scale Data at Youku

This article explores the challenges big data brings to traditional data technologies and reviews various OLAP solutions—including MPP, batch processing, pre‑computation, and Hadoop‑based engines—while detailing Youku’s specific business scenarios and how different OLAP engines are selected to meet performance, scalability, and real‑time analysis requirements.

Analytics · Big Data · MPP
0 likes · 14 min read
Xianyu Technology
Nov 11, 2020 · Industry Insights

How Alibaba’s Double‑11 Tech Stack Powers Record‑Breaking Live Commerce

At Double 11 2020, Alibaba showcased a suite of cutting‑edge technologies (the GRTN real‑time transmission network, edge‑AI voice interaction, massive digital infrastructure, AI‑driven smart sample rooms, and 3D virtual home‑decoration live streams) that together delivered sub‑second latency, a 30% cost reduction, and unprecedented merchant scalability.

3D virtual reality · Big Data · Digital Infrastructure
0 likes · 11 min read
Architect
Nov 11, 2020 · Big Data

Real-time Click Stream Data Warehouse with Flink and ClickHouse: Architecture, Layered Design, and Practical Tips

This article explains how to build a real‑time click‑stream data warehouse using Flink for stream processing and ClickHouse for near‑real‑time OLAP, covering click‑stream characteristics, dimensional modeling, layered warehouse design, async dimension joins, sink implementation, and data rebalancing strategies.

Big Data · Click Stream · ClickHouse
0 likes · 7 min read
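The async dimension join mentioned above can be illustrated with a minimal Python `asyncio` sketch. This is not the article's Flink code. `lookup_dimension` is a hypothetical stand-in for a remote dimension store, and the semaphore plays the role of the limit that Flink's Async I/O operator places on in-flight requests.

```python
import asyncio


async def lookup_dimension(user_id: int) -> dict:
    # Hypothetical stand-in for a remote dimension-table query.
    await asyncio.sleep(0.01)
    return {"city": f"city-{user_id % 3}"}


async def enrich(events, max_in_flight: int = 10):
    """Join click events with dimension rows, issuing lookups concurrently."""
    semaphore = asyncio.Semaphore(max_in_flight)  # caps outstanding lookups
    cache = {}  # simple cache; concurrent misses on one key may each issue a lookup

    async def enrich_one(event):
        uid = event["user_id"]
        if uid not in cache:
            async with semaphore:
                cache[uid] = await lookup_dimension(uid)
        return {**event, **cache[uid]}

    # gather returns results in input order, like an "ordered" async mode.
    return await asyncio.gather(*(enrich_one(e) for e in events))


clicks = [
    {"user_id": 1, "page": "home"},
    {"user_id": 2, "page": "item"},
    {"user_id": 1, "page": "cart"},
]
print(asyncio.run(enrich(clicks)))
```

Preserving input order is comparable to Flink's `AsyncDataStream.orderedWait`. The unordered variant (`unorderedWait`) gives up ordering in exchange for lower latency.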
DataFunTalk
Nov 11, 2020 · Big Data

Evolution and Practices of Cainiao's Real‑Time Data Warehouse for International Import Business

This article details the high‑complexity logistics scenario of Cainiao's international import business, explains the evolution from offline to real‑time data warehouses (versions 1.0 and 2.0), describes the layered architecture, enumerates technical challenges such as multi‑source joins, state explosion, out‑of‑order processing, and presents concrete solutions using Flink features, logical middle‑layers, union‑all joins, deduplication, timer services, and batch‑stream hybrid processing.

Big Data · Flink · State Management
0 likes · 21 min read
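Deduplication with timer-based state cleanup is one of the techniques listed above. It follows a common pattern in Flink's `KeyedProcessFunction`: keep a per-key marker in keyed state, then register a timer that clears the marker once a TTL has passed, so state does not grow without bound. The Python class below is a simplified single-process sketch of that pattern. It assumes event-time timers that fire as the watermark advances, and its TTL is a hypothetical value. It is not Cainiao's implementation.

```python
import heapq


class KeyedDeduplicator:
    """Emit the first event per key; forget the key once its TTL timer fires."""

    def __init__(self, ttl: int) -> None:
        self.ttl = ttl
        self.seen = {}    # key -> expiry time (stands in for keyed state)
        self.timers = []  # min-heap of (expiry, key) (stands in for event-time timers)

    def process(self, key: str, ts: int) -> bool:
        if key in self.seen:
            return False  # duplicate within the TTL: drop it
        expiry = ts + self.ttl
        self.seen[key] = expiry
        heapq.heappush(self.timers, (expiry, key))
        return True

    def advance_watermark(self, watermark: int) -> None:
        # Fire every timer at or before the watermark and clear its state.
        while self.timers and self.timers[0][0] <= watermark:
            expiry, key = heapq.heappop(self.timers)
            if self.seen.get(key) == expiry:
                del self.seen[key]


dedup = KeyedDeduplicator(ttl=60)
print(dedup.process("order-1", 0))   # True: first occurrence
print(dedup.process("order-1", 10))  # False: duplicate
dedup.advance_watermark(60)          # timer fires, state for order-1 is cleared
print(dedup.process("order-1", 70))  # True: treated as new after the TTL
```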
Big Data Technology & Architecture
Nov 8, 2020 · Big Data

Flume Tuning Guide for High‑Throughput Data Ingestion

This article explains how to identify and resolve performance bottlenecks in Apache Flume by configuring Taildir sources, optimizing channel capacities, tuning Kafka sinks, adjusting JVM options, and using simple monitoring scripts, enabling a single Flume‑NG agent to sustain over 50,000 RPS in production.

Big Data · Flume · Kafka
0 likes · 10 min read
iQIYI Technical Product Team
Nov 6, 2020 · Big Data

Recommended Technical Articles: iQiyi Effect Advertising Exploration and Druid Practice

This recommendation highlights two iTech Forum pieces—one detailing iQiyi’s effect advertising exploration and implementation, and another documenting the company’s Druid practice and technical evolution—providing readers with in‑depth case studies, performance insights, and practical guidance for similar large‑scale data and advertising systems.

Advertising · Big Data · Data Analytics
0 likes · 1 min read
StarRing Big Data Open Lab
Nov 5, 2020 · Cloud Native

How Transwarp Scheduler Tackles Mixed Workloads in Unified Cloud‑Native Infrastructure

This article reviews the challenges of scheduling heterogeneous workloads—micro‑services, big‑data, AI, and HPC—on a unified cloud‑native platform, compares existing schedulers like Mesos and YARN, examines Kubernetes ecosystem extensions such as Volcano and YuniKorn, and details the design and components of the Transwarp Scheduler built on Kubernetes Scheduling Framework v2.

AI · Big Data · Cloud Native
0 likes · 16 min read
dbaplus Community
Nov 3, 2020 · Big Data

How Ctrip Boosted Hotel Data Warehouse Performance 400% with ClickHouse

Ctrip’s hotel data team tackled a 3 TB daily data load by building a ClickHouse cluster on VMware, creating custom sync and execution tools, applying query optimizations, and handling merge and memory errors, ultimately achieving over 400% performance gains across multiple reporting themes.

Big Data · ClickHouse · ETL
0 likes · 7 min read
AntTech
Nov 2, 2020 · Frontend Development

Opportunities and Challenges of Enterprise Data Visualization Applications

The talk outlines why enterprise data visualization is essential for extracting value from massive, multi‑dimensional data, describes design and development challenges, presents AntV's comprehensive frontend visualization solutions, and predicts future trends such as intelligent, democratized, and decision‑integrated visual analytics.

AntV · Big Data · Data visualization
0 likes · 15 min read
Top Architect
Oct 31, 2020 · Big Data

Building a Zhihu User Data Crawler and Large‑Scale Analysis with SpringBoot, SeimiCrawler, RabbitMQ, ElasticSearch, and Kibana

This article describes how to build a Java‑based crawler to collect millions of Zhihu user profiles, handle anti‑crawling measures with rotating user‑agents and a proxy pool, deduplicate data using a Bloom filter, import the results into ElasticSearch, and analyze the dataset with Kibana and ECharts visualizations.

Big Data · Elasticsearch · Kibana
0 likes · 15 min read
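Bloom-filter deduplication, as described for the crawler above, can be sketched as follows. The article's crawler is Java-based, so this Python version is only an illustration. The bit-array size and hash count are hypothetical values, not the article's settings. A Bloom filter never misses an ID that was previously added, but it can occasionally report an unseen ID as seen, so a small fraction of new profiles may be skipped.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k bit positions per key over an m-bit array held in a Python int."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit i is set if some added key hashed to position i

    def _positions(self, key: str):
        # Double hashing: derive k positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        # False means "definitely new"; True means "probably seen" (false positives possible).
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def dedupe(user_ids, bloom: BloomFilter):
    """Yield only IDs the filter has not (probably) seen, e.g. before queueing a crawl."""
    for uid in user_ids:
        if not bloom.might_contain(uid):
            bloom.add(uid)
            yield uid


bloom = BloomFilter(num_bits=1 << 16, num_hashes=5)
print(list(dedupe(["u-1001", "u-1002", "u-1001", "u-1003"], bloom)))
```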
Tencent Cloud Middleware
Oct 30, 2020 · Cloud Computing

How KonaJDK Powers Tencent Cloud Java, Big Data, and Secure Computing

This article explains how Tencent's self‑developed KonaJDK underpins cloud Java services, enhances micro‑service monitoring, adds national cryptography support, optimizes large‑heap tools like jmap, and delivers performance gains for big‑data workloads, while contributing key features back to the OpenJDK community.

Big Data · Cloud Computing · JVM
0 likes · 11 min read
ITPUB
Oct 30, 2020 · Fundamentals

Why Java Remains the Dominant Programming Language Across Industries

The article outlines Java’s history, its widespread adoption by top companies, key features such as simplicity, portability and security, and its extensive use in big‑data frameworks, IoT, Android, finance, web development, scientific tools, and cloud services, arguing why it will stay popular.

Big Data · Enterprise · IoT
0 likes · 11 min read
21CTO
Oct 30, 2020 · Big Data

Which Log Collection System Wins? Scribe, Chukwa, Kafka, Flume & ELK Compared

This article reviews the background, requirements, and architectural designs of major open‑source log collection systems—including Facebook’s Scribe, Apache’s Chukwa, LinkedIn’s Kafka, Cloudera’s Flume—and evaluates mature monitoring tools such as ELK, highlighting their features, use cases, advantages, and drawbacks for large‑scale log processing.

Big Data · ELK · Flume
0 likes · 18 min read
Zhongtong Tech
Oct 30, 2020 · Big Data

How Apache Kylin Supercharged OLAP at ZTO Express: A Deep Dive

This article details ZTO Express's journey of adopting Apache Kylin for OLAP, comparing it with Presto, describing platform architecture, performance gains, integration with scheduling and monitoring systems, and the practical optimizations and future plans that enabled sub‑second query responses on massive daily data volumes.

Apache Kylin · Big Data · HBase
0 likes · 16 min read
Alibaba Cloud Developer
Oct 29, 2020 · Frontend Development

How Big Data and AI Are Redefining Front‑End Development

From the early days of static web pages to today's data‑driven, AI‑enhanced interfaces, this article explores how the big‑data boom and artificial‑intelligence advances since 2010 have transformed front‑end technologies, driving innovations in data visualization, web‑based software, and diverse user interactions.

AI · Big Data · Data visualization
0 likes · 11 min read
ITPUB
Oct 16, 2020 · Big Data

How NetEase Cloud Music Built a Real‑Time Data Warehouse with Flink & Calcite

This article details NetEase Cloud Music's evolution of a real‑time data warehouse built on Flink 1.9 and Calcite, covering platform scale, architectural design, metadata management, SDK simplifications, monitoring improvements, and concrete use cases such as AB‑testing, live reporting, and feature serving.

Big Data · Calcite · Flink
0 likes · 8 min read
Yuewen Technology
Oct 16, 2020 · Artificial Intelligence

How Intelligent Traffic Distribution Boosts New Book Exposure in Reading Apps

This article describes the design and implementation of an intelligent traffic distribution system for a reading platform, detailing its background, overall architecture, sub-modules such as the small‑traffic experiment platform, near‑line computation, retrieval strategies, pacing algorithms, and how it balances user personalization with content ecosystem growth.

AI · Big Data · Real-time Streaming
0 likes · 8 min read
Big Data Technology & Architecture
Oct 15, 2020 · Big Data

Meituan's OLAP Requirements and Apache Kylin Deployment: Architecture, Challenges, and Comparative Analysis

This article describes Meituan's massive OLAP workloads, the specific challenges of data scale, complex schemas, and precise counting, explains how Apache Kylin was integrated using wide tables and bitmap deduplication, compares its performance and features with Presto, Druid and other engines, and outlines future improvements.

Apache Kylin · Big Data · Meituan
0 likes · 19 min read
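Bitmap deduplication makes precise distinct counts composable. The sketch below assumes that string user keys are first mapped to dense integers by a global dictionary. It uses a plain Python `int` as the bitset, whereas production engines such as Kylin use compressed bitmaps (e.g., RoaringBitmap). Rolling per-day bitmaps up with bitwise OR yields an exact distinct count; summing the per-day counts would double-count repeat users.

```python
class GlobalDictionary:
    """Maps arbitrary user keys to dense non-negative integers (bitmap positions)."""

    def __init__(self) -> None:
        self.codes = {}

    def encode(self, key: str) -> int:
        # The first unseen key gets 0, the next 1, and so on.
        return self.codes.setdefault(key, len(self.codes))


def build_bitmap(user_keys, dictionary: GlobalDictionary) -> int:
    """Set one bit per distinct user; a Python int serves as an uncompressed bitset."""
    bitmap = 0
    for key in user_keys:
        bitmap |= 1 << dictionary.encode(key)
    return bitmap


def distinct_count(bitmap: int) -> int:
    return bin(bitmap).count("1")


# Usage: pre-aggregate one bitmap per day, then roll up exactly.
dictionary = GlobalDictionary()
monday = build_bitmap(["u1", "u2", "u3"], dictionary)
tuesday = build_bitmap(["u2", "u3", "u4"], dictionary)
week = monday | tuesday
print(distinct_count(monday) + distinct_count(tuesday))  # naive sum: 6
print(distinct_count(week))  # exact distinct users: 4
```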
Alibaba Cloud Developer
Oct 11, 2020 · Operations

How Alibaba’s SLS Powers a Unified Observability Platform for Massive Data

Alibaba Cloud’s Log Service (SLS) has evolved into a unified observability middle‑platform that handles tens of petabytes daily, offering integrated storage, processing, and AI‑driven analysis for logs, metrics, and traces, while addressing challenges of data ingestion, performance, and scalability across diverse Ops scenarios.

Big Data · Cloud Storage · Log Analytics
0 likes · 16 min read
ITPUB
Oct 10, 2020 · Big Data

How Didi Scaled Presto for Petabyte‑Scale Queries: Architecture & Optimizations

Didi’s three‑year journey with Presto transformed it into the company’s primary ad‑hoc and Hive‑SQL acceleration engine, serving over 6,000 users, processing 2‑3 PB of HDFS data daily, and achieving major gains in stability, performance, cost, and usability through extensive architectural tweaks, resource isolation, connector extensions, and monitoring enhancements.

Big Data · Cluster Management · Druid Connector
0 likes · 18 min read
JD Tech Talk
Oct 10, 2020 · Big Data

Discovering Real-Time Reachable Areas Using Trajectory Connections

This article presents a novel method for real-time reachable area analysis that leverages recent trajectory data, introduces a Skip Graph Index for efficient query processing, predicts optimal trajectory‑splicing parameters with machine learning, and demonstrates its effectiveness through extensive experiments on multiple real‑world datasets.

Big Data · k-value prediction · real-time reachable area
0 likes · 13 min read
Didi Tech
Oct 9, 2020 · Big Data

Presto at Didi: Architecture, Optimizations, and Operational Experience

At Didi, Presto has been the default ad‑hoc and Hive‑SQL engine for over three years, serving 6,000 users and processing 2‑3 PB daily and 30‑35 trillion rows. The article covers its mixed and dedicated clusters, the migration to PrestoSQL 340, extensive Hive compatibility work, label‑based isolation, a native Druid connector, usability and stability enhancements, and JVM‑level performance optimizations, along with plans for further resource‑saving upgrades.

Big Data · Cluster Management · Distributed SQL
0 likes · 17 min read
Alibaba Terminal Technology
Oct 9, 2020 · Frontend Development

How Big Data and AI Are Redefining Front‑End Development

From the early days of static web pages to today’s data‑driven, AI‑enhanced interfaces, this article explores how the rise of big data platforms like Alibaba Cloud’s Feitian has transformed front‑end development through advanced visualization, software‑Web convergence, and diverse new interactions.

Big Data · Cloud Computing · Data visualization
0 likes · 9 min read
DataFunTalk
Oct 7, 2020 · Big Data

Yanxuan Data Warehouse: Architecture, Standards, and Evaluation Framework

This article outlines the Yanxuan data warehouse’s layered architecture, the offline and real‑time development platforms, the comprehensive standards for metric definition, model design, and SQL development, and proposes a six‑dimensional evaluation system covering data norms, security, quality, stability, continuous improvement, and development efficiency.

Big Data · Data Governance · SQL Standards
0 likes · 12 min read
DataFunTalk
Sep 30, 2020 · Big Data

Real-time Data Warehouse Construction for Didi Ride-hailing's Carpool Service

This article details Didi's end‑to‑end real‑time data warehouse design for the carpool business, covering its objectives, architecture layers from ODS to application, naming conventions, StreamSQL development, operational tooling, challenges faced, and future batch‑stream integration plans.

Big Data · Didi · Flink
0 likes · 20 min read
IT Architects Alliance
Sep 29, 2020 · Big Data

How Qualitis Ensures High‑Availability Data Quality Monitoring on Big Data Platforms

Qualitis is a big‑data‑platform‑based data‑quality‑management service that defines, detects, and reports data‑set quality issues, featuring idempotent backend services, load‑balanced high‑availability, Zookeeper‑coordinated process synchronization, thread‑pool throttling, and clearly separated internal and external APIs.

Big Data · Data Quality · Qualitis
0 likes · 6 min read
Architects Research Society
Sep 29, 2020 · Big Data

Understanding DataOps: Principles, Benefits, and Implementation

DataOps is an Agile‑derived methodology that extends DevOps principles to data analytics, emphasizing automation, collaboration, and continuous delivery to speed up data processing and improve data quality and business insight. The article outlines its benefits, its relationship to Agile and DevOps, and practical steps for adoption.

Big Data · Continuous Analytics · DataOps
0 likes · 12 min read
Tencent Advertising Technology
Sep 29, 2020 · Artificial Intelligence

The Power of Data and AI: Highlights from the 2020 Tencent Advertising Algorithm Live Week

The 2020 Tencent Advertising Algorithm Live Week presented expert insights on federated learning, machine learning, big data, and deep‑learning applications in advertising, offering a comprehensive Q&A that explains how massive data fuels AI breakthroughs and reshapes business problem solving.

Big Data · machine learning
0 likes · 11 min read
High Availability Architecture
Sep 29, 2020 · Artificial Intelligence

Architecture Design Overview of Recommendation Systems

This article reviews the core algorithm modules of recommendation systems from an architectural perspective, discussing offline, near‑line, and online layers, the trade‑offs between personalization, timeliness, and resource consumption, system boundaries, external dependencies, and the practical design of each layer.

AI · Big Data · architecture
0 likes · 30 min read
DataFunTalk
Sep 25, 2020 · Big Data

Meituan Waimai Data Warehouse: Architecture Evolution, Governance, and Future Roadmap

The article details Meituan Waimai's offline data warehouse evolution from its initial V1.0 design through V2.0 improvements to the V3.0 modeling‑tool driven architecture, covering the four‑layer framework, Spark‑based ETL, data governance processes, resource optimization, security measures, and future development plans.

Big Data · Data Governance · ETL
0 likes · 22 min read
Big Data Technology & Architecture
Sep 24, 2020 · Big Data

HiveSQL Classic Optimization Cases: Partitioning, Subset Decomposition, and Percentile Approximation Improvements

This article presents three HiveSQL optimization case studies—restructuring a large‑scale query with partitioned tables, breaking a complex window‑function query into smaller subsets with joins, and refactoring excessive PERCENTILE_APPROX calls—demonstrating how each change reduces execution time from hours to minutes and improves overall performance.

Big Data · HiveSQL · Partitioning
0 likes · 9 min read
Java Architect Essentials
Sep 23, 2020 · Big Data

Evolution of JD.com Order Center Elasticsearch Cluster Architecture

The article details how JD.com's order center migrated its massive order query workload from MySQL to Elasticsearch, iteratively improving cluster isolation, node deployment, replica tuning, master‑slave redundancy, version upgrades, and data synchronization while addressing performance pitfalls such as deep pagination and FieldData usage.

Big Data · Cluster Architecture · Elasticsearch
0 likes · 12 min read
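Deep pagination is expensive because, with `from`/`size`, every shard must return its top `from + size` hits before the coordinating node can merge them. Elasticsearch's `search_after` parameter avoids this cost by resuming from the sort values of the previous page's last hit. The Python sketch below simulates both approaches on in-memory shards. It illustrates the general mechanism only and does not claim to reflect what JD.com deployed.

```python
import bisect
import heapq
import itertools


def shard_page(shard_docs, after, size):
    """Return up to `size` hits from one shard that sort strictly after the cursor.

    shard_docs is sorted by (sort_value, doc_id); the doc_id tiebreaker keeps the
    cursor unique so no hit is skipped or repeated between pages.
    """
    start = 0 if after is None else bisect.bisect_right(shard_docs, after)
    return shard_docs[start:start + size]


def search_after_page(shards, after, size):
    """Cursor pagination: each shard contributes at most `size` hits, however deep the page."""
    per_shard = [shard_page(docs, after, size) for docs in shards]
    return list(itertools.islice(heapq.merge(*per_shard), size))


def from_size_hits_fetched(num_shards, page_from, size):
    """With from/size, each shard ships its top (from + size) hits to the coordinator."""
    return num_shards * (page_from + size)


shards = [
    [(1, "a"), (4, "d"), (7, "g")],
    [(2, "b"), (5, "e"), (8, "h")],
    [(3, "c"), (6, "f"), (9, "i")],
]
page1 = search_after_page(shards, None, 4)
page2 = search_after_page(shards, page1[-1], 4)  # resume after the last hit of page 1
print([doc_id for _, doc_id in page2])  # ['e', 'f', 'g', 'h']
print(from_size_hits_fetched(5, 10_000, 10))  # 50050 hits merged for one 10-hit page
```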
JD Tech Talk
Sep 23, 2020 · Artificial Intelligence

Delivery Time Inference Based on Couriers' Trajectories

Leveraging large-scale courier trajectory data and spatiotemporal analytics, the DTInf framework infers parcel delivery times by detecting stay points, correcting delivery locations, and matching delivery events using a trained MLP model, achieving a mean absolute error of 401 seconds and outperforming baselines by over 30%.

Big Data · Logistics · courier trajectories
0 likes · 10 min read
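Stay-point detection, the first step listed above, is commonly done with a distance threshold and a duration threshold. A stay is a run of consecutive GPS fixes that all lie within a radius of the run's first fix and that spans at least a minimum time. The sketch below implements that classic rule in Python. The 50 m and 120 s thresholds are hypothetical, and DTInf's own detection and location-correction steps may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class Point:
    lat: float
    lon: float
    t: float  # timestamp in seconds


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance in meters."""
    r = 6_371_000.0
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def detect_stay_points(traj, dist_m=50.0, min_dur_s=120.0):
    """Return (mean_lat, mean_lon, arrive_t, leave_t) for each detected stay."""
    stays = []
    i, n = 0, len(traj)
    while i < n:
        j = i + 1
        # Extend the run while fixes stay within dist_m of the anchor traj[i].
        while j < n and haversine_m(traj[i], traj[j]) <= dist_m:
            j += 1
        if traj[j - 1].t - traj[i].t >= min_dur_s:
            seg = traj[i:j]
            stays.append((
                sum(p.lat for p in seg) / len(seg),
                sum(p.lon for p in seg) / len(seg),
                seg[0].t,
                seg[-1].t,
            ))
            i = j  # continue after the stay
        else:
            i += 1  # no stay anchored here; try the next fix
    return stays


# A courier waits about 3 minutes at one spot, then moves ~1.1 km per minute.
traj = [Point(0.0, 0.0, t) for t in (0, 60, 120, 180)]
traj += [Point(0.0, 0.01, 240), Point(0.0, 0.02, 300)]
print(detect_stay_points(traj))  # [(0.0, 0.0, 0, 180)]
```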
Tencent Cloud Developer
Sep 22, 2020 · Big Data

Evolution and Architecture of Beike's OLAP Platform: From Hive/MySQL to Multi‑Engine Flexibility

Beike’s OLAP platform has progressed from a basic Hive‑MySQL batch pipeline to a Kylin‑based single‑engine solution, and now to a flexible multi‑engine architecture that uses a query‑engine layer to route metrics across Kylin, Druid, ClickHouse and Doris, dramatically cutting cube‑build times, supporting real‑time ingestion, and paving the way for further engine consolidation and automated performance routing.

Apache Druid · Apache Kylin · Beike
0 likes · 17 min read
Big Data Technology & Architecture
Sep 18, 2020 · Big Data

Understanding the Elasticsearch Master Election Process

This article explains when Elasticsearch triggers a master election, describes each election stage—including active master and candidate selection, Bully algorithm comparison, and master node responsibilities—while providing code excerpts that illustrate the underlying implementation details.

Big Data · Cluster Management · Distributed Systems
0 likes · 8 min read
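The election stages summarized above can be sketched as a Bully-style comparison. The Python function below is a simplified model of the pre-7.0 Zen discovery behavior as it is commonly described. It prefers an already active master, requires at least `minimum_master_nodes` candidates, and otherwise picks the candidate with the newest cluster state, breaking ties by the smallest node ID. It is not Elasticsearch's code. Note that Elasticsearch 7.0 replaced Zen discovery with a new cluster coordination subsystem.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    node_id: str
    cluster_state_version: int


def elect_master(active_masters: List[str], candidates: List[Candidate],
                 minimum_master_nodes: int) -> Optional[str]:
    # 1. If other nodes already follow an active master, join it (tie-break: smallest ID).
    if active_masters:
        return min(active_masters)
    # 2. Without enough master-eligible candidates, refuse to elect (guards against split brain).
    if len(candidates) < minimum_master_nodes:
        return None
    # 3. Bully-style comparison: newest cluster state first, then smallest node ID.
    best = min(candidates, key=lambda c: (-c.cluster_state_version, c.node_id))
    return best.node_id


candidates = [Candidate("node-b", 7), Candidate("node-a", 5), Candidate("node-c", 7)]
print(elect_master([], candidates, minimum_master_nodes=2))  # node-b
print(elect_master(["node-c", "node-a"], candidates, 2))      # node-a
print(elect_master([], candidates[:1], 2))                    # None
```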
Big Data Technology & Architecture
Sep 18, 2020 · Big Data

Understanding Kafka Consumer Groups, Partition Assignment, and Offset Management

This article explains how Kafka consumer groups accelerate message consumption by distributing partitions across multiple consumers, details the three key characteristics of consumer groups, and provides in‑depth guidance on partition assignment strategies and offset management with practical Java code examples.

Big Data · Kafka · Offset Management
0 likes · 13 min read
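The two classic partition assignment strategies differ mainly in their arithmetic. The Python sketch below restates the logic of Kafka's `RangeAssignor`, which gives each consumer a contiguous range per topic with the first `partitions % consumers` members taking one extra partition. It also restates `RoundRobinAssignor`, which deals all partitions out in turn. The sketch assumes every consumer subscribes to every topic. The article's own examples use the Java client.

```python
def range_assign(consumers, topic_partitions):
    """Per topic, give each sorted consumer a contiguous block of partitions."""
    members = sorted(consumers)
    assignment = {m: [] for m in members}
    for topic, num_partitions in sorted(topic_partitions.items()):
        per_consumer, extra = divmod(num_partitions, len(members))
        for i, member in enumerate(members):
            # The first `extra` consumers each take one additional partition.
            start = per_consumer * i + min(i, extra)
            length = per_consumer + (1 if i < extra else 0)
            assignment[member].extend((topic, p) for p in range(start, start + length))
    return assignment


def round_robin_assign(consumers, topic_partitions):
    """Sort all (topic, partition) pairs and deal them out to consumers in turn."""
    members = sorted(consumers)
    all_partitions = sorted(
        (topic, p) for topic, n in topic_partitions.items() for p in range(n)
    )
    assignment = {m: [] for m in members}
    for idx, tp in enumerate(all_partitions):
        assignment[members[idx % len(members)]].append(tp)
    return assignment


topics = {"t0": 3, "t1": 3}
print(range_assign(["c0", "c1"], topics))
# {'c0': [('t0', 0), ('t0', 1), ('t1', 0), ('t1', 1)], 'c1': [('t0', 2), ('t1', 2)]}
print(round_robin_assign(["c0", "c1"], topics))
# {'c0': [('t0', 0), ('t0', 2), ('t1', 1)], 'c1': [('t0', 1), ('t1', 0), ('t1', 2)]}
```

With two three-partition topics, the range strategy gives `c0` four partitions and `c1` two, while round-robin gives each consumer three. This shows the per-topic skew that the range strategy can accumulate when a group consumes many topics.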
Youku Technology
Sep 18, 2020 · Big Data

Digitalization of Youku Long‑Video Content Supply Chain: Practices and Architecture

Youku’s digital content‑supply‑chain system transforms long‑video production by introducing a three‑stage framework—structured evaluation of talent and scripts, information‑driven production management, and a unified demand‑aligned content strategy—that curtails delays, mitigates risk, and saves over 100 million RMB while scaling to billions of data records daily.

Artificial Intelligence · Big Data · Content Supply Chain
0 likes · 11 min read
Full-Stack Internet Architecture
Sep 17, 2020 · Big Data

How Big Data Is Used for Price Discrimination and the New Regulations to Stop It

The article explains how big‑data algorithms enable online price discrimination—often called “kill‑familiar” pricing—illustrates real‑world e‑commerce examples, outlines the recently enacted Chinese online tourism regulation prohibiting such practices, and discusses broader data‑privacy and security concerns.

Big Data · consumer rights · data privacy
0 likes · 6 min read
Programmer DD
Sep 17, 2020 · Big Data

5 Open‑Source Quant Trading Tools Every Developer Should Explore

Discover five open‑source stock‑trading utilities—funds, ZVT, QUANTAXIS, StockAnalysisSystem, and match‑trade—each offering real‑time data, backtesting, multi‑asset support, and high‑performance matching to help programmers build powerful quantitative finance applications.

Big Data · Open-source · Python
0 likes · 5 min read
DataFunTalk
Sep 17, 2020 · Big Data

Design and Implementation of a Scalable User Tag Production Platform

The article explains how a flexible, high‑performance user‑tagging system is built on a batch‑stream integrated architecture using big‑data technologies such as Impala, HDFS, and Flink to support both offline and real‑time label generation for precise marketing, product improvement, and operational analytics.

Big Data · Flink · Impala
0 likes · 15 min read
Big Data Technology & Architecture
Sep 16, 2020 · Big Data

Understanding Flink CEP's NFAb Automaton for Complex Event Processing

This article explains how Flink's Complex Event Processing (CEP) library implements pattern matching using a nondeterministic finite automaton with matching caches (NFAb), covering its theoretical foundation, construction, state transition semantics, event selection strategies, shared versioned match buffers, and computation state details.

Big Data · CEP · Flink
0 likes · 9 min read
Big Data Technology & Architecture
Sep 16, 2020 · Databases

Optimizing a Complex MySQL Slow Query for Article Comments

This article analyzes a 60‑second MySQL query that retrieves article comments with multiple filters and explains why the optimizer chooses a small table as the driving table. It then presents a step‑by‑step optimization (avoiding semi‑joins, improving index usage, refining range conditions, and moving GROUP BY into a subquery) that reduces execution time to 1.3 seconds, roughly a 46‑fold speedup.

Big Data · MySQL · Query Performance
0 likes · 13 min read
Alibaba Cloud Developer
Sep 15, 2020 · Big Data

How JindoFS Accelerates Data Lakes: Deep Dive into Storage‑Compute Optimization

This article explains why data lake acceleration is essential, outlines the three key architectural decisions for big‑data architects, and details Alibaba Cloud's JindoFS solutions—including basic adaptation, cache acceleration, and deep‑customization modes—to boost performance and reliability of lake storage and compute.

Big Data · Cloud Storage · JindoFS
0 likes · 18 min read
Big Data Technology & Architecture
Sep 11, 2020 · Big Data

Evolution of JD.com Order Center Elasticsearch Cluster Architecture

This article details how JD.com's order center migrated its Elasticsearch cluster from a simple, default‑configured setup to a highly available, multi‑replica, dual‑cluster architecture with version upgrades, data synchronization strategies, and performance optimizations to support billions of documents and hundreds of millions of daily queries.

Big Data · Cluster Architecture · Elasticsearch
0 likes · 12 min read
Ctrip Technology
Sep 10, 2020 · Big Data

Design and Implementation of a Unified Log Framework for Ctrip Payment Center

The article describes the design, architecture, and operational details of a unified logging framework at Ctrip's payment center, covering log production via a Log4j2 extension, Kafka‑Camus collection, Hive/ORC storage, MapReduce parsing optimizations, and governance strategies for massive daily TB‑scale data.

Big Data · Camus · Data Governance
0 likes · 15 min read
DataFunTalk
Sep 10, 2020 · Databases

Graph‑Based Real‑Time Content Update Architecture at Youku: Challenges, Design, and Practice

This technical presentation explains how Youku tackles the massive, real‑time update problem of video‑content graphs by adopting a graph‑database architecture, sub‑graph partitioning, schema‑driven logical views, and Flink‑based pipelines to achieve second‑level updates for billions of entities and attributes.

Big Data · Flink · Graph Database
0 likes · 15 min read
ITPUB
Sep 9, 2020 · Databases

How to Speed Up Massive MySQL User‑Log Tables: Partitioning, Indexing, and Migration Strategies

This article examines performance problems with a 20‑million‑row MySQL user‑log table on Alibaba Cloud RDS, outlines three solution paths—optimizing the existing database, migrating to a MySQL‑compatible high‑performance service, and adopting a big‑data engine—and provides detailed guidance on schema design, indexing, partitioning, and practical SQL tweaks.

Big Data · Database Optimization · MySQL
0 likes · 17 min read
DataFunTalk
Sep 9, 2020 · Big Data

NetEase Big Data User Profiling: Architecture, Tagging System, and Real‑World Applications

This presentation details NetEase's massive multi‑domain data ecosystem, the design of its user‑profile center—including basic, behavior, preference, and predictive tags—ID‑mapping techniques, quality assurance processes, and several real‑time and offline use cases such as marketing, recommendation, growth operations, advertising, and fraud detection.

Big Data · ID-Mapping · Tag Management
0 likes · 13 min read
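ID-Mapping is often modeled as a connected-components problem. Identifiers observed together (for example, a device ID and a phone number in the same login event) are linked, and each resulting component is treated as one person. The union-find sketch below illustrates that idea. The article does not say NetEase uses union-find, and the identifiers shown are hypothetical.

```python
class UnionFind:
    def __init__(self) -> None:
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the walk directly at the root.
        while x != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def map_ids(observed_pairs):
    """Group identifiers that were ever observed together into one identity."""
    uf = UnionFind()
    for a, b in observed_pairs:
        uf.union(a, b)
    groups = {}
    for node in list(uf.parent):
        groups.setdefault(uf.find(node), set()).add(node)
    return sorted(sorted(g) for g in groups.values())


pairs = [
    ("device:1", "phone:138"),     # app login on device 1 with a phone number
    ("phone:138", "account:u9"),   # the same phone bound to an account
    ("device:2", "cookie:x"),      # web session seen on device 2
]
print(map_ids(pairs))
# [['account:u9', 'device:1', 'phone:138'], ['cookie:x', 'device:2']]
```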
Alibaba Cloud Developer
Sep 7, 2020 · Big Data

How Alibaba’s ADC Project Automates Real‑Time SQL Generation with Design Patterns and Priority Queues

This article explains how the Alibaba DChain Data Converger (ADC) automatically creates wide‑table SQL for real‑time cross‑database analytics by using a pipeline architecture, priority‑queue‑driven task scheduling, and specific design patterns to handle metadata, joins, and resource management.

Big Data · SQL generation · priority-queue
0 likes · 13 min read
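A priority-queue-driven task scheduler of the kind described above can be sketched with Python's `heapq`. The stage names below are hypothetical, not ADC's actual task types. An insertion counter breaks ties, so tasks with equal priority run in FIFO order and the heap never has to compare task objects directly.

```python
import heapq
import itertools


class TaskQueue:
    """Min-heap of (priority, sequence, task): lower priority numbers run first."""

    def __init__(self) -> None:
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: FIFO among equal priorities

    def push(self, priority: int, task: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._seq), task))

    def pop(self) -> str:
        _, _, task = heapq.heappop(self._heap)
        return task

    def __len__(self) -> int:
        return len(self._heap)


queue = TaskQueue()
queue.push(2, "generate_wide_table_sql")
queue.push(0, "load_table_metadata")
queue.push(1, "resolve_join_a")
queue.push(1, "resolve_join_b")
print([queue.pop() for _ in range(len(queue))])
# ['load_table_metadata', 'resolve_join_a', 'resolve_join_b', 'generate_wide_table_sql']
```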
DataFunTalk
Sep 7, 2020 · Big Data

Real‑time Data Warehouse Architecture and Best Practices in Alibaba Search Recommendation

This article presents Alibaba's search‑recommendation real‑time data warehouse, describing its business background, typical use cases, key requirements, the evolution from architecture 1.0 to 2.0 with Flink and Hologres, best‑practice patterns such as row/column storage, stream‑batch integration, high‑concurrency updates, and future directions like real‑time joins and persistent dimension storage.

Big Data · Flink · Hologres
13 min read
Architecture Digest
Sep 3, 2020 · Databases

Practical Elasticsearch Performance and Stability Tuning Guide

This article consolidates practical Elasticsearch tuning techniques—including configuration file adjustments, system‑level optimizations, and usage‑level settings—to improve cluster performance, stability, and resource efficiency for production environments.

Big Data · Cluster Configuration · Elasticsearch
15 min read
Big Data Technology & Architecture
Sep 2, 2020 · Big Data

An Overview of Apache Hudi: Architecture, Features, and Query Types

Apache Hudi is an open‑source data‑lake framework that leverages Spark to ingest, manage, and incrementally query large analytical datasets on HDFS‑compatible storage, offering features such as timeline management, copy‑on‑write and merge‑on‑read tables, and support for snapshot, incremental, and read‑optimized queries across engines like Hive, Spark SQL and Presto.
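The difference between the two table types can be shown with a toy model. Copy-on-write rewrites the affected data file on every update. Merge-on-read appends updates to a delta log and merges them with the base file at query time, so a read-optimized query (base files only) can lag behind a snapshot query. In the sketch below, the class names are hypothetical, and Python dicts stand in for columnar base files and row-based log files. Hudi's timeline, file groups, and compaction scheduling are not modeled.

```python
class CopyOnWriteTable:
    """Every upsert produces a full new version of the base file."""

    def __init__(self):
        self.base_file = {}
        self.rewrites = 0

    def upsert(self, key, value):
        new_file = dict(self.base_file)  # copy the old file, apply the change
        new_file[key] = value
        self.base_file = new_file
        self.rewrites += 1

    def snapshot(self):
        return dict(self.base_file)


class MergeOnReadTable:
    """Upserts go to a delta log; reads merge the log over the base file."""

    def __init__(self):
        self.base_file = {}
        self.delta_log = []

    def upsert(self, key, value):
        self.delta_log.append((key, value))  # cheap append, no file rewrite

    def read_optimized(self):
        return dict(self.base_file)  # base file only; may miss recent updates

    def snapshot(self):
        merged = dict(self.base_file)
        for key, value in self.delta_log:  # later log entries win
            merged[key] = value
        return merged

    def compact(self):
        self.base_file = self.snapshot()
        self.delta_log = []
```

The trade-off is visible in the toy: copy-on-write pays on every write (`rewrites` grows), while merge-on-read pays at read time until `compact()` folds the log into the base file.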

Apache Hudi · Big Data · Data Lake
12 min read
Big Data Technology & Architecture
Sep 1, 2020 · Big Data

Configuring Hadoop to Support LZO Compression

This guide explains how to enable LZO compression in Hadoop by installing Twitter's hadoop-lzo library, updating core-site.xml, synchronizing files across nodes, creating LZO indexes, and running a WordCount MapReduce job with LZO-compressed output.

Big Data · Hadoop · LZO
6 min read
DataFunTalk
Sep 1, 2020 · Big Data

NetEase Real-Time Computing Platform (Sloth): Architecture, Practices, and Future Outlook

This article introduces Sloth, NetEase's real-time computing platform, detailing its architecture, component layers, integrated IDE, operational tooling, unified metadata management, and challenges such as Kudu write amplification. It also proposes a tiered real-time data-warehouse model and outlines a vision for storage-compute separation and unified batch-stream APIs.

Big Data · Flink · Kafka
13 min read
Xianyu Technology
Sep 1, 2020 · Artificial Intelligence

Interest-Based Live Stream Recommendation System for Xianyu

Within three weeks, the team built an interest‑based live‑stream recommendation platform for Xianyu that combined operational insights, BI analysis, and offline algorithms to generate user‑anchor interest tags, sync them to an online graph, and dramatically boost top‑room UV and click‑through rates.

Big Data · Graph Database · interest tagging
8 min read
Laravel Tech Community
Aug 31, 2020 · Big Data

Evolution of JD Daojia Order System Elasticsearch Cluster Architecture

This article traces the step-by-step evolution of the JD Daojia order-center Elasticsearch cluster, from an initial loosely configured deployment to a real-time dual-cluster architecture. It covers replica tuning, master-slave adjustments, data-sync strategies, and lessons learned about pagination, fielddata, and doc values, and shows how each phase improved query throughput, stability, and scalability for billions of documents.

Big Data · Cluster Architecture · Elasticsearch
12 min read
Big Data Technology & Architecture
Aug 30, 2020 · Big Data

Kylin Cube Construction Principles and Optimization Techniques

This article explains the fundamentals of Kylin Cube construction—including dimensions, measures, Cuboid generation, layer-by-layer and in‑memory building algorithms, storage mechanisms, and various optimization strategies such as derived dimensions, aggregation groups, row‑key design, and concurrency granularity—providing a comprehensive guide for big‑data OLAP practitioners.
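In Kylin, each cuboid corresponds to one combination of dimensions, so a full cube over N dimensions contains 2^N cuboids. The layer-by-layer algorithm first builds the N-dimension base cuboid, then derives each layer with one fewer dimension from the layer above. The sketch below only enumerates cuboids, groups them into those layers, and picks one candidate parent per cuboid. The dimension names are hypothetical. Aggregation groups, derived dimensions, and row-key encoding are not modeled, and Kylin's own parent selection rule may differ from the one used here.

```python
from itertools import combinations


def cuboid_layers(dimensions):
    """Group all 2**N cuboids into layers, from the base cuboid (all N
    dimensions) down to the apex cuboid (no dimensions)."""
    n = len(dimensions)
    return [list(combinations(dimensions, k)) for k in range(n, -1, -1)]


def parent_of(cuboid, dimensions):
    """Return one candidate parent: the cuboid with exactly one extra dimension.

    This simply adds the first missing dimension (an illustrative rule).
    """
    for d in dimensions:
        if d not in cuboid:
            return tuple(x for x in dimensions if x in cuboid or x == d)
    return None  # the base cuboid has no parent; it is built from source data
```

With three dimensions such as `date`, `city`, and `category`, this yields 8 cuboids in 4 layers. That exponential growth is why the pruning strategies listed in the article (derived dimensions, aggregation groups) matter.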

Big Data · Cube · Kylin
14 min read
Big Data Technology & Architecture
Aug 27, 2020 · Big Data

HBase Architecture, Components, and Operations Overview

This article provides a comprehensive overview of Apache HBase’s architecture, detailing its core components such as RegionServer, HMaster, ZooKeeper, WAL, MemStore, and HFiles, and explains key processes including read/write paths, compaction, region splitting, load balancing, and recovery mechanisms.
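The HBase write path is usually described in three steps. An edit is first appended to the WAL and then applied to the in-memory MemStore. Once the MemStore grows past a threshold, it is flushed to an immutable, sorted HFile. Below is a toy, single-process sketch under those assumptions. The class name and threshold are hypothetical, and in-memory lists stand in for files. Real HBase adds column families, timestamps, block caches, compaction, and WAL archiving, none of which are modeled.

```python
class ToyRegion:
    """Toy model of one region: WAL -> MemStore -> flushed HFiles."""

    def __init__(self, flush_threshold=3):
        self.wal = []        # append-only log used for crash recovery (never truncated here)
        self.memstore = {}   # latest edits held in memory
        self.hfiles = []     # flushed, immutable, key-sorted snapshots
        self.flush_threshold = flush_threshold

    def put(self, row, value):
        self.wal.append((row, value))  # 1. log the edit first for durability
        self.memstore[row] = value     # 2. then apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}

    def get(self, row):
        # Simplified read path: MemStore first, then HFiles from newest to oldest.
        if row in self.memstore:
            return self.memstore[row]
        for hfile in reversed(self.hfiles):
            for key, value in hfile:
                if key == row:
                    return value
        return None
```

Because newer data shadows older data on reads, an update still sitting in the MemStore wins over a value already flushed to an HFile. The growing number of HFiles per read is also what compaction exists to reduce.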

Big Data · Database Architecture · Distributed Systems
17 min read
Tencent Cloud Developer
Aug 27, 2020 · Big Data

Elasticsearch Overview: Architecture, Lucene Foundations, Application Scenarios, and Optimizations

Elasticsearch, built on Apache Lucene, provides a distributed, near‑real‑time search platform that scales to billions of documents across thousands of nodes, supporting use cases such as log analytics, time‑series monitoring, and product search, while Tencent’s CES adds advanced availability, performance, and cost‑optimizing features.
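The Lucene foundation mentioned here is the inverted index, which maps each term to the set of documents containing it, so a query looks up terms instead of scanning documents. The sketch below illustrates that idea under simplifying assumptions: lowercase whitespace tokenization and AND-only queries. Lucene's analyzers, compressed postings lists, term dictionaries, and relevance scoring are not modeled.

```python
from collections import defaultdict


class InvertedIndex:
    """Term -> set of document ids, with AND-semantics search."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in text.lower().split():  # naive tokenizer (an assumption)
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing every query term."""
        terms = query.lower().split()
        if not terms:
            return []
        # .get() avoids inserting empty postings for unknown terms.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)
```

For log analytics, indexing messages such as "error in payment service" lets a query like `error service` intersect two small postings sets rather than scan every log line.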

Big Data · Elasticsearch · Scalability
17 min read
Efficient Ops
Aug 24, 2020 · Operations

How to Scale Elasticsearch for PB‑Level Game Logs: Real‑World Strategies & Lessons

This article walks through a mid‑size gaming company's journey of deploying, tuning, and scaling an Elasticsearch cluster for massive log volumes, covering hot‑cold node architecture, ILM policies, shard management, Logstash‑Kafka optimization, emergency expansions, and the promise of searchable snapshots to achieve petabyte‑scale storage with cost efficiency.

Big Data · Elasticsearch · ILM
28 min read
Didi Tech
Aug 24, 2020 · Big Data

Evolution and Architecture of DiDi Data Channel Service

DiDi’s Data Channel Service evolved from a fragmented component system into a unified, SLA‑driven platform with a UI‑based Sync Center and Flink‑powered StreamSQL engine, dramatically improving task creation speed, resource utilization, and reliability while automating issue diagnosis for company‑wide real‑time and offline data synchronization.

Big Data · ETL · Flink
12 min read
58 Tech
Aug 24, 2020 · Big Data

Design and Practice of an Online Real-Time Feature System for Intelligent Risk Control

This article presents the concepts, architecture, and practical techniques of an online real‑time feature system used in intelligent risk‑control, covering feature definition, time‑window types, calculation functions, distributed processing, low‑latency storage, and operational challenges in high‑concurrency environments.
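A typical time-window feature in risk control is a count over a sliding window, for example the number of login attempts by one account in the last 60 seconds. Below is a minimal single-key sketch that keeps event timestamps in a deque and evicts those that fall out of the window. The window length and event meaning are assumptions for illustration. A production system would keep such state per key in a distributed, low-latency store rather than in one process.

```python
from collections import deque


class SlidingWindowCounter:
    """Count events whose timestamp lies in (now - window, now]."""

    def __init__(self, window):
        self.window = window
        self.events = deque()

    def add(self, ts):
        self.events.append(ts)  # assumes timestamps arrive in non-decreasing order

    def count(self, now):
        # Evict events at or before the window's left edge.
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()
        return len(self.events)
```

Each event is appended once and evicted at most once, so the amortized cost per event is constant. That property matters for the high-concurrency, low-latency setting the article describes.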

Big Data · Real-time Processing · Streaming
16 min read