Tagged articles
Snowball Engineer Team
Sep 24, 2019 · Big Data

Snowball Data Middle Platform (AIBO): Architecture, Capabilities, and Future Outlook

The article introduces Snowball's AIBO data middle platform, detailing its storage‑compute separation architecture, core capabilities such as data integration, catalog, tagging, analysis tools, micro‑service data APIs, and outlines future enhancements for security, lineage, and continuous business‑driven iteration.

Big Data · Data Catalog · Data Integration
12 min read
Alibaba Cloud Developer
Sep 24, 2019 · Big Data

Inside Alibaba’s 10‑Year Search Engine: Architecture, Data Flow, and Indexing

Alibaba’s 10‑year‑old search engine combines data source aggregation, incremental and real‑time indexing, and online services through platforms like Tisplus, Bahamut, Maat, Ha3, Build Service and Drogo, illustrating a comprehensive architecture that powers 1688’s search capabilities across multiple engines and deployment pipelines.

Backend Architecture · Big Data · Distributed Systems
10 min read
Big Data Technology & Architecture
Sep 23, 2019 · Big Data

Applying Apache Kylin for Large‑Scale OLAP at Meituan: Architecture, Challenges, and Performance Evaluation

This article describes Meituan’s large‑scale OLAP requirements, how Apache Kylin was integrated to meet them, the architectural solutions, performance benchmarks against other engines, and future work, providing practical insights for building stable, precise, and high‑performance analytics platforms.

Apache Kylin · Big Data · Hadoop
20 min read
Big Data Technology & Architecture
Sep 22, 2019 · Databases

Alibaba Cloud BDS Service for Non‑Stop HBase Cluster Migration

This article explains how Alibaba Cloud's BDS migration service enables continuous, high‑performance migration of HBase clusters—including schema, full data, and incremental sync—across version upgrades, hardware changes, network migrations, and cross‑region scenarios, while ensuring stability and minimal impact on live workloads.

Alibaba Cloud · BDS · Big Data
10 min read
Big Data Technology & Architecture
Sep 21, 2019 · Big Data

Deploying Apache Flink on Kubernetes: A Step‑by‑Step Guide

This tutorial explains how to run Apache Flink jobs on Kubernetes by building Docker images, deploying the JobManager and TaskManager components with Kubernetes manifests, configuring high availability with ZooKeeper and HDFS, and using savepoints and scaling techniques to manage and extend Flink streaming applications.

Big Data · Docker · Flink
14 min read
Beike Product & Technology
Sep 20, 2019 · Big Data

Understanding DStream Construction and Execution in Spark Streaming

This article explains how Spark Streaming's DStream abstraction is built from InputDStream through successive transform operators, details the internal ForEachDStream implementation, describes the job generation and scheduling workflow, and outlines how Beike's real‑time platform leverages these mechanisms for large‑scale streaming tasks.

Big Data · Dstream · Real-time Processing
10 min read
Suning Technology
Sep 20, 2019 · Big Data

How Suning’s Big Data Engine Powers Smart Retail Transformation

Suning’s big‑data center, built on a 30‑year retail evolution and leveraging technologies like AI, cloud, and IoT, showcases how integrated data platforms and robust security can drive smart retail, improve services for 600 million users, and create a new competitive edge.

AI · Big Data · Cloud Computing
6 min read
Big Data Technology & Architecture
Sep 19, 2019 · Big Data

Performance Optimization Practices for Meituan's Hadoop YARN Fair Scheduler

This article presents a comprehensive analysis of Meituan's Hadoop YARN fair scheduler, detailing its architecture, resource abstractions, scheduling workflow, performance bottlenecks, fine‑grained metrics, and a series of optimization techniques—including sorting improvements, job‑skip reduction, parallel queue sorting, and robust rollout strategies—to achieve high‑throughput, low‑latency scheduling for large‑scale offline, streaming, and machine‑learning workloads.

Big Data · Fair Scheduler · Performance Optimization
24 min read
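To make the scheduler's queue ordering concrete, here is a minimal Python sketch loosely modeled on the fair-share ordering YARN's FairScheduler applies. Queues below their minimum share sort first, by how much of that share they use. The remaining queues sort by usage divided by weight. The `QueueState` class, the memory-only resource model, and the sample numbers are hypothetical simplifications. The real comparator also considers demand, multiple resource types, and further tie-breakers.

```python
from dataclasses import dataclass


@dataclass
class QueueState:
    """Hypothetical, memory-only view of a queue (units: GB)."""
    name: str
    usage: float      # resources currently allocated to the queue
    min_share: float  # guaranteed minimum share
    weight: float     # relative weight used for fair sharing


def fair_sort_key(q: QueueState) -> tuple:
    """Simplified ordering: 'needy' queues (below min share) come first."""
    if q.usage < q.min_share:
        # Needy queues: the smaller the fraction of min share in use, the earlier.
        return (0, q.usage / max(q.min_share, 1.0), q.name)
    # Satisfied queues: the lower the usage per unit of weight, the earlier.
    return (1, q.usage / q.weight, q.name)


def scheduling_order(queues: list) -> list:
    """Return queue names in the order resources would be offered."""
    return [q.name for q in sorted(queues, key=fair_sort_key)]


queues = [
    QueueState("etl", usage=10, min_share=20, weight=1),
    QueueState("ml", usage=50, min_share=10, weight=2),
    QueueState("adhoc", usage=30, min_share=10, weight=1),
    QueueState("streaming", usage=5, min_share=20, weight=1),
]
print(scheduling_order(queues))  # ['streaming', 'etl', 'ml', 'adhoc']
```

A sort like this runs on scheduling decisions, so its cost grows with the number of queues and applications. That is presumably why the summary lists sorting among the optimization targets.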
FunTester
Sep 19, 2019 · Operations

Emerging Technologies Shaping DevOps and Software Testing in the Next Decade

Over the next decade, rapid advances in IoT, AI, big data, and pervasive automation such as cognitive RPA will transform DevOps practices, driving more integrated, intelligent testing and continuous delivery pipelines, while organizations mature their digital transformation journeys to meet increasingly complex, data‑driven operational demands.

AI · Big Data · IoT
8 min read
Efficient Ops
Sep 18, 2019 · Databases

Why the DBA Role Is Becoming a Narrowed, High‑Risk Career Path

The article analyzes how the DBA job market is shrinking as traditional enterprises shift away from legacy systems, cloud adoption reshapes responsibilities, and DBAs face limited advancement unless they transition to architecture or data‑analytics roles, highlighting the growing risk and low reward of staying in pure DBA work.

Big Data · DBA · Database Administration
7 min read
Youzan Coder
Sep 18, 2019 · Big Data

Applying Newton's Law of Cooling to Transaction Scoring in DMP User Profiling

The article proposes using Newton’s law of cooling to score DMP user transactions, assigning higher weights to recent purchases that decay exponentially over time, deriving a cooling constant from boundary conditions, and normalizing the resulting heat‑based scores through log‑scaling and a sigmoid‑like mapping to a 0‑100 range.

Big Data · DMP · Newton cooling law
4 min read
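The scoring idea can be sketched in Python. Under Newton's law of cooling, each purchase keeps a heat of `T(t) = T0 * exp(-k * t)` after `t` days. Solving that equation at two boundary points fixes the cooling constant `k`. The following are illustrative assumptions, not the article's exact parameters:

- the boundary values (heat 100 on day 0, falling to 1 by day 90);
- the `scale` parameter;
- the particular log-then-logistic normalization.

```python
import math


def cooling_constant(initial_heat: float, final_heat: float, days: float) -> float:
    """Solve final_heat = initial_heat * exp(-k * days) for the cooling constant k."""
    return math.log(initial_heat / final_heat) / days


def transaction_heat(ages_in_days: list, k: float, initial_heat: float = 100.0) -> float:
    """Sum the remaining heat of every purchase; recent purchases contribute more."""
    return sum(initial_heat * math.exp(-k * age) for age in ages_in_days)


def normalized_score(heat: float, scale: float = 2.0) -> float:
    """Log-scale the raw heat, then map it into [0, 100) with a logistic curve
    shifted so that zero heat gives a score of exactly 0 (assumed mapping)."""
    x = math.log1p(heat) / scale
    return 100.0 * (2.0 / (1.0 + math.exp(-x)) - 1.0)


# Hypothetical boundary condition: heat 100 on the purchase day, 1 after 90 days.
k = cooling_constant(100.0, 1.0, 90.0)

recent_buyer = transaction_heat([1, 5, 12], k)
lapsed_buyer = transaction_heat([60, 75, 88], k)
print(round(normalized_score(recent_buyer), 1), round(normalized_score(lapsed_buyer), 1))
```

The log step keeps a handful of very frequent buyers from saturating the scale. The shifted logistic then bounds every score below 100.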
DataFunTalk
Sep 17, 2019 · Artificial Intelligence

Machine Learning for Personalized Education Paths – Case Study and Reflections

This lecture explores how machine learning can generate individualized learning pathways for students by building knowledge dependency graphs, defining optimization goals, and leveraging historical data to rank candidate routes, while reflecting on data, model, business, and demand challenges in AI-driven education.

AI · Big Data · knowledge graph
10 min read
Big Data Technology & Architecture
Sep 16, 2019 · Big Data

Comprehensive Flink Interview Guide: Architecture, APIs, Operators, and Advanced Topics

This guide provides a detailed overview of Apache Flink covering its core streaming engine, APIs (DataSet, DataStream, Table), architectural components, comparison with Spark Streaming, partitioning, parallelism, restart strategies, state backends, time semantics, watermarks, SQL processing, fault‑tolerance mechanisms, memory management, serialization, RPC framework, back‑pressure handling, operator chaining, and practical tips for interview preparation.

Apache Flink · Big Data · Dataflow
22 min read
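Of the topics the guide lists, watermarks are among the easiest to misread. Here is a minimal Python sketch of the bounded-out-of-orderness idea. The watermark trails the largest event time seen so far by a fixed delay. Any event whose timestamp is at or behind the current watermark counts as late. This toy emits a watermark after every event. Flink's built-in strategy follows the same principle but emits watermarks periodically and differs in small details, so treat this as an illustration rather than Flink's implementation.

```python
from typing import Optional


def track_watermarks(event_times_ms: list, max_delay_ms: int) -> tuple:
    """Return (watermark after each event, events that arrived late)."""
    watermarks = []
    late = []
    max_seen: Optional[int] = None
    current_wm = float("-inf")  # no watermark has been emitted yet
    for t in event_times_ms:
        if t <= current_wm:
            # The event is behind the watermark: windows covering it may have fired.
            late.append(t)
        max_seen = t if max_seen is None else max(max_seen, t)
        current_wm = max_seen - max_delay_ms
        watermarks.append(current_wm)
    return watermarks, late


wms, late = track_watermarks([1000, 3000, 2000, 7000, 4000], max_delay_ms=2000)
print(wms)   # [-1000, 1000, 1000, 5000, 5000]
print(late)  # [4000]
```

The out-of-order event at 2000 ms is tolerated because it lands within the 2000 ms delay. The event at 4000 ms arrives after the watermark has already moved to 5000 ms.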
iQIYI Technical Product Team
Sep 12, 2019 · Artificial Intelligence

AI Technology Practice and Application in Entertainment

The iQIYI Technology Salon’s AI Technology Practice and Application series explains how AI reshapes entertainment by automating video and audio production, optimizing short‑video flows, enabling intelligent search, and leveraging big‑data analytics for behavior analysis, intent recognition, and personalized recommendations, supported by iQIYI’s robust AI platform.

AI technology · Big Data · Entertainment Industry
7 min read
Big Data Technology & Architecture
Sep 11, 2019 · Big Data

Big Data Technology and Architecture: Case Studies of Taobao, Didi, and Meituan

This article reviews the evolution and key components of big data platforms at leading Chinese internet companies—Taobao, Didi, and Meituan—detailing their data sources, synchronization tools, storage layers, processing engines, and scheduling systems to provide practical guidance for building robust big data infrastructures.

Architecture · Big Data · Data Platform
9 min read
Tencent Cloud Developer
Sep 11, 2019 · Big Data

YARN Practice and Technical Evolution at Kuaishou

Jiaoxiao Fang’s talk details Kuaishou’s YARN deployment, covering its architecture, support for offline, real‑time and ML workloads, and recent enhancements such as event‑handling stability, refined preemption, high‑throughput parallel scheduling, shuffle‑caching for small I/O, plus plans for job protection and multi‑cluster resource utilization.

Big Data · Cluster Optimization · Distributed Systems
16 min read
DataFunTalk
Sep 10, 2019 · Big Data

Why We Should Ride the Big Data Carriage: Business Perspectives on Data Growth and Machine Learning

The article explains why businesses must embrace the rapid, non‑linear growth of data and machine‑learning technologies, illustrating how data volume and richer information can drive exponential business value, improve competitiveness, and create sustainable positive feedback loops across various industry scenarios.

AI · Big Data · Business strategy
13 min read
Tencent Cloud Developer
Sep 9, 2019 · Databases

Tencent Optimizes Elasticsearch High-Concurrency Write Performance, Cutting 10M Data Load Time by 20%

Tencent engineers improved Elasticsearch’s high‑concurrency write path, reducing the time to load ten million records from eighteen to fifteen minutes (about 17% less time, or roughly 20% higher write throughput). The work earned thanks from Elastic’s CEO and showcases the company’s broader open‑source contributions and its strategic cloud‑search partnership.

Big Data · Elasticsearch · Open-source
6 min read
Alibaba Cloud Developer
Sep 9, 2019 · Big Data

Unlocking the Power of Unstructured Data: From AI Breakthroughs to Business Value

This article explains how unstructured data (documents, images, audio, video and more) now accounts for over 80% of all data, outlines its characteristics and challenges, and compares it with structured data. It also showcases real-world AI applications such as ImageNet, intelligent customer service and smart security, and proposes a roadmap for building a unified unstructured‑data asset.

Big Data · Data Analytics · machine learning
15 min read
58 Tech
Sep 6, 2019 · Big Data

Architecture and Technical Implementation of the WMDA Data Analytics Platform

The article details WMDA's end‑to‑end data analytics architecture, covering codeless (auto‑tracking) data collection and real‑time and offline processing pipelines built on Spark Streaming, Druid, Hadoop, Kettle, and TaskServer. It explains how these components collaborate to deliver comprehensive user behavior analysis.

Big Data · Druid · ETL
11 min read
360 Tech Engineering
Sep 4, 2019 · Big Data

XSQL: A Low‑Barrier, Stable Multi‑Data‑Source Distributed Query Engine

XSQL is an open‑source, low‑threshold, highly stable distributed query engine that supports federated queries across heterogeneous data sources, offering push‑down optimization, metadata decentralization, multi‑engine integration, and seamless deployment on Spark/YARN for real‑time big‑data analytics.

Big Data · Distributed Query · SQL Federation
14 min read
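Push-down optimization, one of the features listed for XSQL, is easy to see with a toy model. A federated engine can pull every row from a source and filter locally. Alternatively, it can ship the predicate to the source so only matching rows cross the network. The functions and the in-memory "source" below are hypothetical stand-ins, not XSQL's API.

```python
from typing import Callable

Row = dict
Predicate = Callable[[Row], bool]


def scan_without_pushdown(source: list, predicate: Predicate) -> tuple:
    """Engine pulls all rows from the source, then filters them itself."""
    transferred = list(source)  # every row crosses the network
    return [r for r in transferred if predicate(r)], len(transferred)


def scan_with_pushdown(source: list, predicate: Predicate) -> tuple:
    """The source evaluates the predicate; only matching rows are transferred."""
    matched = [r for r in source if predicate(r)]
    return matched, len(matched)


def in_beijing(row: Row) -> bool:
    return row["city"] == "Beijing"


orders = [{"id": i, "city": "Beijing" if i % 4 == 0 else "Shanghai"} for i in range(1000)]

rows_a, moved_a = scan_without_pushdown(orders, in_beijing)
rows_b, moved_b = scan_with_pushdown(orders, in_beijing)
print(moved_a, moved_b)  # 1000 250
```

Both plans return the same rows, but the second moves a quarter of the data. In practice an engine can only push down predicates a source understands, so federated planners typically need per-source capability information.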
Alibaba Cloud Developer
Sep 4, 2019 · Big Data

How Structured Big Data Storage Powers Modern Data Systems

This article explores the core components of data systems, the evolution toward lightweight, intelligent big data architectures, the distinction between primary and secondary storage, challenges of data replication, and how Alibaba Cloud's Tablestore implements advanced features such as storage‑compute separation, CDC, and multi‑model indexing for scalable, cost‑effective structured big data storage.

Big Data · CDC · Cloud Services
24 min read
DataFunTalk
Sep 3, 2019 · Big Data

The Value of Big Data in Machine Learning: Detailed Illustration and Insights

This article explains how big data enhances machine learning by enabling finer-grained data characterization, improving confidence in statistical conclusions, and supporting smarter learning through multiple stages of model development, illustrated with concrete examples and a discussion of sample size dilemmas.

Big Data · data analysis · machine learning
10 min read
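The normal-approximation confidence interval for a proportion makes the link between data volume and confidence concrete. Its half-width is `z * sqrt(p * (1 - p) / n)`, so quadrupling the sample size halves the uncertainty. The sketch below is a textbook illustration, not a method taken from the article. It assumes a simple random sample and the usual z = 1.96 for 95% confidence.

```python
import math


def ci_half_width(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation (Wald) confidence interval
    for an observed proportion p estimated from n independent samples."""
    return z * math.sqrt(p * (1.0 - p) / n)


# Observed rate of 50% at increasing sample sizes: each 4x step halves the width.
for n in (100, 400, 1600, 6400):
    print(n, round(ci_half_width(0.5, n), 4))
```

The square-root relationship also explains the sample-size dilemma. Each additional digit of precision costs roughly a hundred times more data.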
360 Zhihui Cloud Developer
Sep 3, 2019 · Big Data

QuickSQL: 360’s Unified Multi-Source Query Engine Explained

This article outlines how 360’s data center built QuickSQL, a federated SQL engine that unifies queries across heterogeneous sources such as Hive, MySQL, and Elasticsearch, detailing the business challenges, architectural design, performance benchmarks, and future roadmap for multi‑source data analysis.

Big Data · Data Integration · SQL Engine
12 min read
Tongcheng Travel Technology Center
Sep 3, 2019 · Big Data

Practical Experiences and Lessons Learned in Building a Flink‑Based Real‑Time Computing Platform at Tongcheng‑Elong

This article details the design, implementation, and optimization of a Flink‑based real‑time computing platform at Tongcheng‑Elong, covering the evolution from Storm to Flink, support for FlinkSQL and FlinkStream, metric collection, logging, data lineage, savepoint management, and numerous stability fixes contributed back to the open‑source community.

Big Data · Data Lineage · Flink
16 min read
Tencent Cloud Developer
Aug 30, 2019 · Big Data

How Tencent Cloud Leverages Spark, ElasticSearch, and Flink for PB‑Scale Data Warehousing

The Tencent Cloud+ Community and Kuaishou hosted a big‑data technology salon where experts detailed the evolution, architecture, and practical deployments of Spark‑based cloud data warehouses, Elasticsearch, YARN, and Flink, highlighting trends, optimization techniques, and future directions for enterprise data analytics.

Big Data · Cloud Computing · Elasticsearch
22 min read
Beike Product & Technology
Aug 29, 2019 · Big Data

TiSpark Integration with TiDB/TiKV for Efficient Data Synchronization and OLAP in the Databus Project

This article introduces TiSpark—an extension of Spark that tightly integrates with TiDB/TiKV to enable high‑performance, scalable data synchronization and OLAP queries, details its architecture, key configuration, performance advantages over Spark SQL and Sqoop, and outlines its role in the Databus data‑integration platform.

Big Data · Data Integration · Performance Optimization
10 min read
360 Smart Cloud
Aug 29, 2019 · Artificial Intelligence

360 Selected to Build a National New‑Generation AI Open Innovation Platform for a Security Brain

At the 2019 World Artificial Intelligence Conference, the Ministry of Science and Technology announced ten national AI open‑innovation platforms, selecting 360 to lead the security‑brain platform, highlighting its role in AI‑driven cybersecurity, big‑data analytics, cloud and blockchain technologies.

360 · Big Data · Information Security
4 min read
58 Tech
Aug 29, 2019 · Information Security

Graph-Based Anomaly Detection Framework for Security Threats

The article presents a graph‑based anomaly detection architecture that tackles black‑market resource switching by constructing complex user‑traffic networks, mining graph similarities, and applying multi‑dimensional strategies to achieve high‑accuracy detection while meeting timeliness, performance, and interpretability requirements.

Big Data · Information Security · anomaly detection
8 min read
Xianyu Technology
Aug 28, 2019 · Big Data

Unified Search System Architecture and Automation for Multiple Business Scenarios

To avoid building separate search services for each Xianyu business, the team created a unified, generic search architecture based on Alibaba’s HA3 engine and a control layer that automates data dumping, indexing, query translation, and result ranking across five subsystems, enabling new services to be onboarded in minutes instead of weeks.

Big Data · automation · data pipeline
18 min read
dbaplus Community
Aug 27, 2019 · Big Data

How eBay Scales Real‑Time Monitoring with Flink: Metadata‑Driven Streaming

This article explains how eBay’s Sherlock.IO monitoring platform processes billions of logs, events, and metrics daily using Flink Streaming jobs, detailing a metadata‑driven architecture, shared job strategies, Heartbeat‑based monitoring, job isolation, back‑pressure handling, and real‑world use cases such as Event Alerting, Eventzon, and Netmon.

Big Data · Flink · Real-time Processing
18 min read
Big Data Technology & Architecture
Aug 26, 2019 · Big Data

Comprehensive Collection of Apache Flink Learning Resources

This article compiles a curated list of the most reliable and official Apache Flink learning materials—including beginner tutorials, source‑code walkthroughs, advanced topics, community articles, real‑world case studies, and downloadable resources—providing a one‑stop reference for developers and researchers interested in stream processing and big‑data analytics.

Apache Flink · Big Data · Resources
10 min read
Big Data Technology & Architecture
Aug 25, 2019 · Big Data

Tencent Oceanus: Evolution, Productization, and Optimizations of Real‑Time Stream Computing with Flink

This article recounts Tencent's journey from adopting Flink to building the Oceanus platform, detailing its architecture, product features, and a series of deep extensions—including UI redesign, JobManager failover, checkpoint handling, enhanced windows, LocalKeyBy, watermark idle detection, and log isolation—aimed at supporting trillion‑scale real‑time data processing.

Big Data · Flink · Oceanus
18 min read
Architects' Tech Alliance
Aug 24, 2019 · Big Data

Reimagining Big Data in a Post‑Hadoop World

The article analyzes the decline of Hadoop as the dominant big‑data platform, explains how cloud‑based services are replacing its complex on‑premises architecture, and outlines the lessons and future directions for enterprises navigating a post‑Hadoop landscape.

Big Data · Distributed Systems · Hadoop
12 min read
Youzan Coder
Aug 23, 2019 · Big Data

How to Build a Robust Event Logging Quality System with Real‑Time Validation

This article outlines common event‑logging quality problems, a systematic registration and real‑time validation framework built on Flink, configurable rule syntax, explainable results, continuous monitoring, targeted optimizations, and an evaluation model that together form a comprehensive quality‑center for big‑data platforms.

Big Data · Data Quality · Flink
11 min read
Qunar Tech Salon
Aug 22, 2019 · Big Data

Performance Optimization Practices for Meituan's Hadoop YARN Fair Scheduler

This article details Meituan's experience optimizing the Hadoop YARN fair scheduler, covering background challenges, architectural components, resource abstractions, scheduling flow, performance metrics, a series of code‑level optimizations, stability strategies for production rollout, and future directions for large‑scale cluster scheduling.

Big Data · Fair Scheduler · Load Simulation
23 min read
Big Data Technology Architecture
Aug 21, 2019 · Big Data

Key Big Data Terminology: Offline vs Real-time Computing, Real-time vs Ad Hoc Queries, OLTP vs OLAP, Row vs Column Storage

This article explains fundamental big‑data concepts by comparing offline (batch) and real‑time (stream) computing, distinguishing real‑time queries from ad‑hoc queries, clarifying OLTP versus OLAP workloads, and outlining the differences between row‑based and column‑based storage architectures.

Big Data · Column Storage · OLAP
5 min read
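The row-versus-column distinction can be illustrated with a small Python sketch. It stores the same three-row table both ways and counts how many values an aggregate over one column has to touch. Both the table and the simplification that a row store reads every field of each record it visits are assumptions for illustration. Real storage engines work in pages and blocks rather than individual values.

```python
rows = [
    {"order_id": 1, "city": "Beijing", "amount": 30.0},
    {"order_id": 2, "city": "Shanghai", "amount": 12.5},
    {"order_id": 3, "city": "Beijing", "amount": 7.5},
]

# Column-oriented layout: one contiguous list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def sum_row_store(rows: list, column: str) -> tuple:
    """Row store: each record is read whole, even if only one field is needed."""
    total, values_read = 0.0, 0
    for row in rows:
        values_read += len(row)
        total += row[column]
    return total, values_read


def sum_column_store(columns: dict, column: str) -> tuple:
    """Column store: only the requested column's values are read."""
    values = columns[column]
    return sum(values), len(values)


print(sum_row_store(rows, "amount"))        # (50.0, 9)
print(sum_column_store(columns, "amount"))  # (50.0, 3)
```

The trade-off reverses for OLTP-style point lookups and updates. Fetching or rewriting one whole record is cheaper when its fields sit together, which is why row storage remains the usual choice for transactional databases.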
DataFunTalk
Aug 20, 2019 · Artificial Intelligence

The Story of Machine Learning: Why Machines Can Learn and How Statistical Learning Makes It Possible

This article explains why machine learning relies on big‑data statistical learning, illustrating human learning through induction and deduction, presenting case studies that highlight the limits of anecdotal reasoning, and introducing the law of large numbers and probabilistic trust as foundations for reliable AI models.

Big Data · Learning Theory · machine learning
19 min read
Big Data Technology & Architecture
Aug 18, 2019 · Big Data

Flink Application Scenarios and Scale at Kuaishou

The article details how Kuaishou leverages Apache Flink for large‑scale stream processing, describing its application scenarios, cluster sizing, interval join optimization, RocksDB performance challenges, source throttling strategies, JobManager stability, frequent job failures, and platform‑wide improvements.

Big Data · Flink · Kuaishou
2 min read
Architects' Tech Alliance
Aug 18, 2019 · Big Data

Oracle Architecture and ASM Storage Configuration Overview

This article provides a comprehensive overview of Oracle database architecture, detailing memory, physical and logical structures, I/O characteristics of various files, differences between OLTP and OLAP workloads, and practical ASM configuration and storage optimization recommendations for high‑performance environments.

ASM · Big Data · Database Storage
12 min read
Didi Tech
Aug 17, 2019 · Industry Insights

How Didi’s Ride‑Sharing Data Transforms Automotive Finance Risk Management

This article analyzes how Didi’s unique ride‑hailing scenario big data is applied to automotive finance, detailing the business model, asset‑side and full‑process risk challenges, data‑driven solutions, and future prospects for intelligent credit risk control in both enterprise and retail lending.

Big Data · Credit Scoring · Didi
14 min read
Youku Technology
Aug 15, 2019 · Big Data

Youku's Migration from Hadoop to Alibaba Cloud MaxCompute: Benefits and Technical Insights

Youku’s 2017 migration from an on‑premises Hadoop cluster to Alibaba Cloud MaxCompute delivered a unified, elastic data pipeline that cut compute and storage costs by roughly half, handled billions of daily log records, boosted performance and scalability, and empowered analysts with self‑service tools and a rich ecosystem.

Big Data · Cost Optimization · Data Migration
12 min read
DataFunTalk
Aug 14, 2019 · Artificial Intelligence

Understanding Recommendation Systems: From Information Overload to Personalized AI Solutions

The article explores how the rapid growth of the internet has created information overload, discusses the challenges of recommendation systems such as sparsity and timeliness, outlines a four‑step personalized content pipeline, and highlights the interdisciplinary nature of building effective AI‑driven recommendation solutions.

AI · Big Data · data engineering
16 min read
Youzan Coder
Aug 14, 2019 · Big Data

Comprehensive Guide to Data Collection, Event Modeling, and Tracking in Big Data Platforms

The guide explains how comprehensive data collection in big‑data platforms relies on a standardized event model, codeless and code‑based event tracking, multi‑platform SDKs, a log‑middleware layer, and precise location tracking. It also covers a tracking‑point management platform that supports workflow, testing, quality monitoring, and scalable infrastructure for future enhancements.

Analytics · Big Data · Log Processing
19 min read
Architecture Digest
Aug 14, 2019 · Big Data

Kafka Overview: Architecture, Storage Mechanism, Replication, and Consumer/Producer Model

Kafka is a distributed, partitioned, replicated messaging system originally developed by LinkedIn, offering high throughput, low latency, fault tolerance, and scalability; this article explains its core concepts, file storage design, partition replication, leader election, consumer groups, delivery guarantees, and operational considerations for big‑data pipelines.

Big Data · Distributed Systems · Kafka
56 min read
Amap Tech
Aug 13, 2019 · Artificial Intelligence

2019 Alibaba Cloud Yunqi Conference – Gaode Technology Session (Sept 27)

At the 2019 Alibaba Cloud Yunqi Conference in Hangzhou, Gaode Technology presented a comprehensive technical forum with expert talks from Gaode and Alibaba specialists. The sessions covered visual intelligence, autonomous-driving perception, the evolution of its client and traffic-access architecture, fine-grained positioning, route-planning algorithms, and spatio-temporal data applications.

Big Data · Cloud Native · Location-Based Services
8 min read
Big Data Technology & Architecture
Aug 12, 2019 · Big Data

Spark SQL Parameter Tuning and Performance Optimization (Spark 2.3.2)

This article explains how to troubleshoot and tune Spark SQL configuration parameters—covering exception‑related settings such as spark.sql.hive.convertMetastoreParquet, file‑ignore options, and partition verification, as well as performance‑focused tweaks like broadcast join thresholds, adaptive execution, and parquet schema merging—while providing a comprehensive parameter reference table.

Big Data · Hive Migration · Parameter Tuning
23 min read
Big Data Technology & Architecture
Aug 11, 2019 · Big Data

Deep Dive into Flink’s Network Stack: Credit‑Based Flow Control and Thread Model Optimizations

This article examines Flink’s industrial‑scale network stack, detailing the credit‑based flow control introduced in version 1.5, the refactored task‑IO thread collaboration, and serialization optimizations that together improve throughput and latency for large‑scale stream processing workloads.

Big Data · Credit-based Flow Control · Flink
12 min read
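Credit-based flow control can be sketched as a small simulation with three rules:

- the receiver advertises one credit per free buffer;
- the sender ships data only while it holds credits;
- each buffer the receiver consumes returns a credit.

This Python toy uses hypothetical names and a single channel. Flink's actual protocol also transmits the sender's backlog size and distinguishes exclusive from floating buffers; both are omitted here.

```python
def simulate_credit_flow(num_buffers: int, receiver_buffers: int, consumed_per_tick: int) -> tuple:
    """Simulate one sender/receiver channel under credit-based flow control.

    Returns (ticks needed to deliver everything, peak buffers held by the receiver).
    """
    if receiver_buffers <= 0 or consumed_per_tick <= 0:
        raise ValueError("receiver must have buffers and must make progress")

    credits = receiver_buffers  # receiver initially announces all free buffers
    backlog = num_buffers       # buffers still waiting at the sender
    held = 0                    # buffers sitting in the receiver's memory
    delivered = ticks = peak = 0

    while delivered < num_buffers:
        ticks += 1
        # Sender: send only as many buffers as there are credits for.
        sent = min(credits, backlog)
        credits -= sent
        backlog -= sent
        held += sent
        peak = max(peak, held)
        # Receiver: process some buffers and return one credit per freed buffer.
        consumed = min(consumed_per_tick, held)
        held -= consumed
        delivered += consumed
        credits += consumed
    return ticks, peak


print(simulate_credit_flow(num_buffers=10, receiver_buffers=4, consumed_per_tick=1))  # (10, 4)
```

The peak never exceeds `receiver_buffers`. As a result, a slow consumer throttles its sender through withheld credits instead of piling data into a network connection that other channels may share.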
DevOps Cloud Academy
Aug 11, 2019 · Big Data

Overview of MFS Distributed File System Architecture Similar to GoogleFS

The article explains the MFS (MooseFS) distributed file system and its four components: Master, Metalogger, Chunkserver, and Client. It covers hardware recommendations, metadata handling, replication strategies, and FUSE‑based client mounting, providing a comprehensive guide to building a GoogleFS‑like storage cluster.

Big Data · Distributed File System · MFS
5 min read
Big Data Technology & Architecture
Aug 8, 2019 · Big Data

Comprehensive Guide to Apache Kylin: Architecture, Concepts, Cube Design and Optimization

This article provides an in‑depth overview of Apache Kylin’s pre‑computation architecture, data‑warehouse concepts, step‑by‑step cube creation from Hive tables, and advanced optimization techniques such as derived dimensions, aggregation groups, and HBase row‑key encoding to achieve sub‑second OLAP queries on massive datasets.

Apache Kylin · Big Data · Cube
20 min read
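The payoff of aggregation groups can be estimated with simple counting. A full cube over n dimensions has 2^n cuboids. Inside one aggregation group:

- a mandatory dimension appears in every cuboid (factor 1);
- a hierarchy of k dimensions allows only k + 1 prefixes;
- a joint group is either fully present or fully absent (factor 2).

The Python sketch below is a back-of-envelope estimate under those rules, using a hypothetical 10-dimension schema. It ignores details such as overlap between multiple aggregation groups.

```python
def full_cube_cuboids(num_dims: int) -> int:
    """Every subset of dimensions is a cuboid: 2 ** n in total."""
    return 2 ** num_dims


def agg_group_cuboids(normal: int, hierarchies: tuple = (), joints: tuple = ()) -> int:
    """Estimate cuboids for one aggregation group.

    normal:      free dimensions, each either in or out (factor 2 each)
    hierarchies: sizes of hierarchy chains; a chain of k dims allows k + 1 prefixes
    joints:      sizes of joint groups; each group is all-in or all-out (factor 2)
    Mandatory dimensions contribute a factor of 1, so they are not passed in.
    """
    count = 2 ** normal
    for k in hierarchies:
        count *= k + 1
    count *= 2 ** len(joints)
    return count


# Hypothetical schema with 10 dimensions:
#   2 mandatory (e.g. date, site), 1 hierarchy of 3 (province > city > district),
#   1 joint group of 2 dimensions, and 3 normal dimensions.
print(full_cube_cuboids(10))                                # 1024
print(agg_group_cuboids(3, hierarchies=(3,), joints=(2,)))  # 64
```

Derived dimensions shrink the count further because they are excluded from cuboid computation entirely and resolved from the dimension table at query time.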
360 Quality & Efficiency
Aug 8, 2019 · Big Data

An Introduction to Kafka: Architecture, Design Principles, and Common Issues

This article introduces Kafka, covering its definition, core concepts such as topics, partitions, offsets, producers and consumers, typical use cases, underlying design principles including message‑partition allocation and retention policies, processing mechanisms, and common troubleshooting questions for real‑world deployments.

Big Data · Distributed Messaging · Kafka
7 min read
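Key-based message-partition allocation boils down to one rule. A message with a key goes to hash(key) mod partition count, so all messages for one key stay in order on one partition. The Python sketch below is a hypothetical stand-in: it uses `zlib.crc32` as the hash and a round-robin counter for keyless messages. Kafka's Java client actually hashes keys with murmur2. Since Kafka 2.4 it also uses sticky partitioning rather than pure round-robin for keyless messages, so partition numbers here will not match a real cluster.

```python
import zlib
from itertools import count
from typing import Optional


class ToyPartitioner:
    """Hypothetical partitioner illustrating key-based allocation."""

    def __init__(self, num_partitions: int) -> None:
        if num_partitions <= 0:
            raise ValueError("num_partitions must be positive")
        self.num_partitions = num_partitions
        self._round_robin = count()  # used only for messages without a key

    def partition(self, key: Optional[bytes]) -> int:
        if key is None:
            # Keyless messages are spread across partitions in turn.
            return next(self._round_robin) % self.num_partitions
        # Keyed messages: a stable hash keeps one key on one partition,
        # which is what preserves per-key ordering.
        return zlib.crc32(key) % self.num_partitions


p = ToyPartitioner(num_partitions=6)
print([p.partition(b"user-42") for _ in range(3)])  # same partition three times
print([p.partition(None) for _ in range(4)])        # [0, 1, 2, 3]
```

Because the rule depends on the modulus, adding partitions to an existing topic can move keys to different partitions from that point on.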
vivo Internet Technology
Aug 7, 2019 · Big Data

Understanding Apache Kafka: Concepts, Architecture, Deployment, Monitoring and Offset Management

The article gives a thorough overview of Apache Kafka, explaining its core concepts, architecture, deployment steps, monitoring tools, and offset management, including broker and topic structures, producer/consumer APIs, replication, leader election, consumer groups, offset committing, and practical configuration and troubleshooting guidance.

Big Data · Kafka · Messaging
0 likes · 36 min read
dbaplus Community
Aug 6, 2019 · Databases

How ClickHouse Powers Real‑Time Hotel Data Analytics at Ctrip

This article details the challenges Ctrip's hotel data platform faces with billions of daily updates and nearly a million queries, evaluates various storage options, explains why ClickHouse was chosen, and describes the full‑load and incremental pipelines, monitoring, server clustering, and practical tips that enable sub‑second query performance at massive scale.

Big Data · Ctrip · Database Optimization
0 likes · 13 min read
Big Data Technology & Architecture
Aug 5, 2019 · Big Data

Apache Spark's Latest Technical Developments and the Outlook for Spark 3.0+

The article provides a comprehensive overview of recent Apache Spark advancements—including Delta Lake, Data Source V2, runtime optimizations, relational cache, cloud‑native challenges, AI integration via Project Hydrogen, and the anticipated features of Spark 3.0—highlighting how these innovations address modern data‑warehouse, cloud, and machine‑learning workloads.

Apache Spark · Big Data · Delta Lake
0 likes · 17 min read
Alibaba Cloud Developer
Aug 5, 2019 · Cloud Computing

How Tmall’s Smart Stores Are Redefining New Retail with Cloud and Data

Alibaba’s senior tech expert Mu Jian explains how Tmall’s smart stores embody the new retail paradigm by leveraging cloud computing, big data, and digital tools to transform offline retail, enhance consumer experiences, streamline operations, and create integrated online‑offline ecosystems through cloud stores, cloud POS, and innovative marketing solutions.

Big Data · Cloud Computing · Digital Transformation
0 likes · 25 min read
Big Data Technology & Architecture
Aug 3, 2019 · Big Data

Understanding SparkEnv Initialization: Components and Their Setup

This article walks through the SparkEnv initialization process in Apache Spark, detailing how the driver and executor environments are created, the key components such as SecurityManager, RpcEnv, SerializerManager, BroadcastManager, MapOutputTracker, ShuffleManager, MemoryManager, BlockManager, MetricsSystem, and OutputCommitCoordinator are instantiated, and how the final SparkEnv instance is assembled and stored.

Big Data · Scala · Spark
0 likes · 13 min read
Suning Technology
Aug 2, 2019 · Big Data

How Suning Uses Big Data to Revolutionize Retail Supply Chains

At the 15th China (Nanjing) International Software Expo, a Suning vice president shared how the company applies big‑data analytics, the C2M model, and flexible manufacturing to personalize retail experiences, bridge online‑offline gaps, and drive data‑driven product development and supply‑chain efficiency.

Big Data · C2M · Data-driven
0 likes · 9 min read
Meituan Technology Team
Aug 1, 2019 · Big Data

Performance Optimization Practices for Meituan's Hadoop YARN Fair Scheduler

Meituan improved its customized Hadoop YARN Fair Scheduler by pre‑computing resource usage, filtering out jobs with zero resource demand, and parallelizing queue sorting. These changes cut the time spent on sorting from 30 s to 5 s out of every minute, raised scheduling throughput to 50,000 containers per second, enabled live rollbacks, and prepared the scheduler for clusters of up to 10,000 nodes, with future scaling to hundreds of thousands.

Big Data · Fair Scheduler · Hadoop
0 likes · 24 min read
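Two of the scheduler optimizations summarized above can be sketched in a few lines: drop queues with no pending demand before sorting, and compute each queue's sort key once rather than re-deriving resource usage inside every comparison. This is a minimal sketch under stated assumptions, not YARN's or Meituan's code: the `Queue` fields, the `usage_ratio` ordering (usage divided by fair share), and the function names are hypothetical simplifications, and the parallel sorting step is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Queue:
    name: str
    usage: float       # resources currently allocated (hypothetical unit, e.g. vcores)
    fair_share: float  # the queue's fair share, in the same unit
    pending: float     # outstanding resource demand


def usage_ratio(q: Queue) -> float:
    # Lower ratio = more under-served; a zero fair share sorts last.
    return q.usage / q.fair_share if q.fair_share > 0 else float("inf")


def order_for_scheduling(queues: List[Queue]) -> List[Queue]:
    # Filter first: a queue with no pending demand cannot take a container,
    # so sorting it is wasted work.
    candidates = [q for q in queues if q.pending > 0]
    # sorted() evaluates the key once per element (decorate-sort-undecorate),
    # so usage is pre-computed instead of recomputed in every comparison.
    return sorted(candidates, key=lambda q: (usage_ratio(q), q.name))


if __name__ == "__main__":
    qs = [Queue("a", 8, 10, 2), Queue("b", 2, 10, 5), Queue("c", 1, 10, 0)]
    print([q.name for q in order_for_scheduling(qs)])  # ['b', 'a']
```

Queue `c` never reaches the sort because it has nothing pending; on a cluster with many idle queues, that filter alone shrinks the sorting work considerably.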
21CTO
Jul 31, 2019 · Artificial Intelligence

How JD Built a Scalable AI‑Powered Recommendation System

The article outlines JD’s evolution from rule‑based product suggestions in 2012 to a sophisticated, AI‑driven, multi‑screen personalized recommendation platform, detailing its product types, system architecture, data collection, offline and online computation, and the core recommendation engine that powers features like “Guess You Like.”

AI · Big Data · JD.com
0 likes · 14 min read
dbaplus Community
Jul 30, 2019 · Big Data

Spark vs Flink: Which Real‑Time Engine Should You Choose for Kafka‑Based Streaming?

With the surge in real‑time data from sensors and devices, choosing the right streaming engine is critical; this article compares Apache Spark and Apache Flink—examining their architectures, micro‑batch vs continuous processing, strengths, limitations, and use‑case suitability for Kafka‑driven pipelines.

Big Data · Flink · Kafka
0 likes · 14 min read
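The micro‑batch vs continuous distinction above can be modeled with a toy example: a micro‑batch engine buffers events and releases them once per interval, while a continuous engine handles each event on arrival. This is an illustrative model only, not either engine's API; the `Event` shape and both function names are hypothetical, and events are assumed to arrive in timestamp order.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Event = Tuple[float, str]  # (arrival time in seconds, payload); hypothetical shape


def micro_batch(events: Iterable[Event], interval: float,
                handle_batch: Callable[[List[str]], None]) -> None:
    """Micro-batch model: buffer events and release them once per interval."""
    batch: List[str] = []
    batch_end: Optional[float] = None
    for ts, payload in events:  # assumes events arrive in timestamp order
        if batch_end is None:
            batch_end = (ts // interval + 1) * interval
        while ts >= batch_end:  # close every interval that has elapsed
            if batch:
                handle_batch(batch)
            batch = []
            batch_end += interval
        batch.append(payload)
    if batch:  # flush the final, partially filled interval
        handle_batch(batch)


def continuous(events: Iterable[Event], handle_one: Callable[[str], None]) -> None:
    """Continuous model: each event is handled as soon as it arrives."""
    for _, payload in events:
        handle_one(payload)


if __name__ == "__main__":
    evts = [(0.1, "a"), (0.4, "b"), (1.2, "c"), (3.5, "d")]
    batches: List[List[str]] = []
    micro_batch(evts, 1.0, batches.append)
    print(batches)
```

With a 1 s interval, the event at 0.1 s waits until the 1 s boundary before it is processed, whereas `continuous` hands it over immediately. Micro‑batching trades that per‑event latency for per‑batch efficiency.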
Big Data Technology & Architecture
Jul 29, 2019 · Databases

Comprehensive Comparison of Apache Kylin and Apache Doris: Architecture, Data Models, Storage, Query, and Operations

This article provides an in‑depth technical comparison of Apache Kylin and Apache Doris, covering their system architectures, aggregation and detail data models, storage engines, data import processes, query execution, deduplication, metadata handling, performance, high availability, maintainability, usability, schema‑change capabilities, features, and community ecosystems.

Apache Doris · Apache Kylin · Big Data
0 likes · 21 min read
Architects' Tech Alliance
Jul 28, 2019 · Big Data

Alluxio: A Virtual Distributed File System for Unified Big Data Access and Cost‑Effective Storage

The article explains how Alluxio, a memory‑speed virtual distributed file system, acts as a virtual data lake to unify access to structured and unstructured big‑data across heterogeneous storage systems, offering on‑demand fast local access, intelligent caching, reduced storage costs, and enterprise‑grade security and fault tolerance.

Alluxio · Big Data · Data Lake
0 likes · 15 min read
dbaplus Community
Jul 24, 2019 · Big Data

Essential Open-Source Tools Every Big Data Engineer Should Know

This article compiles a comprehensive list of common open‑source tools for big data platforms—covering programming languages, data collection, ETL, storage, analysis, query, management, and monitoring—to help learners and practitioners quickly locate and understand the technologies they need.

Big Data · ETL · Hadoop
0 likes · 15 min read
Tencent Cloud Developer
Jul 24, 2019 · Big Data

Implementing Custom Data Sources in Spark: TGSpark Data Source V2 Practice

The article explains how Tencent’s TGSpark leverages Spark DataSource V2 to create a custom source for TGMars storage, detailing shard‑aware design, push‑down of columns and filters, columnar batch loading, partition‑location reporting, and experimental results that show reduced shuffles and improved local computation when executor placement matches storage nodes.

Big Data · Column Pushdown · Custom Data Source
0 likes · 10 min read
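The payoff of pushing columns and filters down to the source is that rows are filtered and columns pruned inside the storage layer, before anything crosses the network to Spark. The sketch below shows only that idea; it does not use Spark's DataSource V2 interfaces, and the `(column, op, value)` filter encoding, the `Row` shape, and `pushed_down_scan` are hypothetical.

```python
import operator
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, Any]
Filter = Tuple[str, str, Any]  # (column, op, value); hypothetical encoding

_OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def pushed_down_scan(rows: Iterable[Row], required: List[str],
                     filters: List[Filter]) -> Iterator[Row]:
    """Storage-side scan: apply predicates and prune columns before data leaves the source."""
    for row in rows:
        # Predicate pushdown: rows failing any filter are never returned.
        if all(_OPS[op](row[col], val) for col, op, val in filters):
            # Column pushdown: only the columns the query needs are materialized.
            yield {c: row[c] for c in required}


if __name__ == "__main__":
    data = [{"uid": 1, "level": 10, "name": "a"}, {"uid": 2, "level": 3, "name": "b"}]
    print(list(pushed_down_scan(data, ["uid"], [("level", ">", 5)])))
```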
Xianyu Technology
Jul 23, 2019 · Operations

Automated Service Fault Localization System Architecture

The automated service fault localization system ingests massive real‑time instrumentation data, builds call‑chain graphs, and instantly pinpoints the exact component causing timeouts or other errors, achieving developer‑level accuracy within seconds instead of minutes while remaining simple, fast, and fully automated.

Big Data · Fault Localization · Operations
0 likes · 8 min read
System Architect Go
Jul 19, 2019 · Big Data

Introduction to HBase: Architecture, Data Model, and Operations

This article provides a comprehensive overview of HBase, covering its distributed column‑oriented architecture, data model components, storage mechanisms, read/write processes, WAL lifecycle, MemStore flushing, region splitting and merging, and failure recovery within the Hadoop ecosystem.

Architecture · Big Data · HBase
0 likes · 20 min read
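The write path summarized above (log to the WAL, buffer in the MemStore, flush to immutable sorted files) can be illustrated with a toy region. This is a teaching model, not HBase's API: `ToyRegion` and its methods are hypothetical, the flush trigger counts rows rather than MemStore bytes, and clearing the whole WAL on flush simplifies how HBase actually retires WAL files.

```python
from typing import Dict, List, Optional, Tuple


class ToyRegion:
    """Toy model of an HBase-like write path: WAL -> MemStore -> flushed sorted files."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.wal: List[Tuple[str, str]] = []                 # append-only log for crash recovery
        self.memstore: Dict[str, str] = {}                   # in-memory buffer of recent writes
        self.store_files: List[List[Tuple[str, str]]] = []   # immutable, sorted, newest last
        self.flush_threshold = flush_threshold               # row count here; HBase uses bytes

    def put(self, row: str, value: str) -> None:
        self.wal.append((row, value))  # 1. log the edit first
        self.memstore[row] = value     # 2. then apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # 3. Persist the MemStore as a sorted immutable file; its WAL entries are no longer needed.
        self.store_files.append(sorted(self.memstore.items()))
        self.memstore.clear()
        self.wal.clear()

    def get(self, row: str) -> Optional[str]:
        if row in self.memstore:                 # newest data lives in memory
            return self.memstore[row]
        for store_file in reversed(self.store_files):  # then newest file first
            for key, value in store_file:        # linear scan for brevity
                if key == row:
                    return value
        return None
```

After `put("r1", "a")`, `put("r2", "b")` with a threshold of 2, both rows are flushed to one store file; a later `put("r1", "c")` stays in the MemStore and shadows the flushed value on reads.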
dbaplus Community
Jul 18, 2019 · Databases

How JD.com Scales HBase to 90 PB: Architecture, Optimizations, and Lessons

This article examines JD.com's massive HBase deployment, detailing its evolution from early adoption to a 90 PB, 7,000‑node cluster, the platform's architecture, multi‑active disaster recovery, multi‑tenant isolation, and the integration of Phoenix for SQL‑based access, offering practical insights for large‑scale distributed storage.

Big Data · Database Architecture · HBase
0 likes · 15 min read
Tencent Cloud Developer
Jul 18, 2019 · Big Data

Tencent iData Analysis Center: Why We Chose Spark as Our Computing Platform

Tencent’s iData analysis center selected Spark as its new computing platform because, unlike Elasticsearch, TiDB, and other MPP solutions, Spark offers iterative processing, shuffle support, robust SQL and DAG scheduling, and flexible SMP‑style data exchange, enabling efficient OLAP on billions of game‑user records.

Big Data · Data Platform · MPP
0 likes · 13 min read
Youku Technology
Jul 17, 2019 · Artificial Intelligence

How AI and Big Data Drive Casting Decisions in the TV Series “The Longest Day in Chang'an”

Youku’s AI‑powered Beidouxing system analyzed audience tags, attractiveness scores and performance data to select Lei Jiayin and Yi Yangqianxi for “The Longest Day in Chang’an”, guiding casting, episode frequency and other production choices while reducing subjective bias and expanding the talent pool.

AI · Big Data · Casting
0 likes · 13 min read
DataFunTalk
Jul 16, 2019 · Databases

TDengine Architecture and Storage Design for IoT Big Data

This article explains TDengine’s architecture, including its management, data, and client modules, virtual node design, write process, and detailed storage file structures, highlighting how its innovative design optimizes resource usage and performance for IoT and other big‑data applications.

Architecture · Big Data · IoT
0 likes · 12 min read
Amap Tech
Jul 16, 2019 · Industry Insights

How Amap’s Big Data Powers Smart City Traffic – Insights from CCF‑GAIR 2019

At the 2019 CCF‑GAIR summit, Amap’s Director of Future Transportation explained how the company’s massive location‑based data, real‑time traffic feeds, and AI‑driven analytics enable smart traffic management, emergency vehicle routing, and predictive highway safety, delivering measurable congestion reductions and faster journeys across Chinese cities.

AI · Big Data · Smart City
0 likes · 10 min read
DataFunTalk
Jul 15, 2019 · Big Data

Key Infrastructure Considerations for Autonomous Driving: Storage, Computing, and Services

The article reviews the essential infrastructure for autonomous driving, covering massive sensor data storage strategies, the role of metadata, offline and real‑time computing platforms, basic micro‑service components, and various business scenarios, highlighting why robust big‑data handling is critical.

Big Data · Real‑Time Computing · Autonomous Driving
0 likes · 14 min read
Alibaba Cloud Developer
Jul 12, 2019 · Big Data

Designing a Real‑Time Big Data Sentiment System on Alibaba Cloud: From Lambda to Lambda‑Plus

This article explains how massive online data can be captured, structured, and analyzed in real time using a Lambda‑style architecture, then introduces a simplified Lambda‑Plus design built on Alibaba Cloud's Tablestore and Blink to meet both batch and streaming requirements while reducing operational complexity.

Big Data · Cloud Computing · Lambda Architecture
0 likes · 18 min read
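The classic Lambda architecture the article starts from keeps two views: a batch view recomputed over the full history, and a real‑time view covering only events since the last batch run. A query merges both. The sketch below illustrates only that query‑time merge, using mention counts per keyword as a hypothetical stand‑in for sentiment metrics; it does not model the Tablestore/Blink Lambda‑Plus design, and all function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def batch_view(history: Iterable[str]) -> Counter:
    """Batch layer: periodically recompute counts over the full history."""
    return Counter(history)


def realtime_view(recent: Iterable[str]) -> Counter:
    """Speed layer: incrementally count events that arrived after the last batch run."""
    view: Counter = Counter()
    for item in recent:
        view[item] += 1
    return view


def query(batch: Counter, realtime: Counter, key: str) -> int:
    """Serving layer: merge both views at query time."""
    return batch[key] + realtime[key]


if __name__ == "__main__":
    b = batch_view(["brandA", "brandB", "brandA"])
    r = realtime_view(["brandA"])
    print(query(b, r, "brandA"), query(b, r, "brandB"))
```

The counting logic exists twice, once per layer, and both copies must stay consistent. That duplication is the kind of operational complexity a simplified, unified design aims to reduce.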
Didi Tech
Jul 5, 2019 · Artificial Intelligence

Didi's Open-Source Contributions and Technical Innovations in AI and Big Data

Didi’s platform handles over 700 billion daily ETA requests using AI‑driven real‑time calculations, while its 6,000‑plus engineers rely on open‑source big‑data, cloud and AI frameworks, contribute 23 projects that have earned more than 36,000 stars, and provide anonymized traffic data to academia for transportation and urban‑planning research.

AI · Big Data · Open Source
0 likes · 9 min read