Tagged articles
3675 articles
Architects' Tech Alliance
Jan 29, 2021 · Artificial Intelligence

Comprehensive Overview of Machine Learning: Types, Industry Chain, and Key Technologies

This article provides a detailed introduction to machine learning, covering its definition, learning modes such as supervised, unsupervised and reinforcement learning, shallow versus deep learning, the full industry chain from AI chips to cloud and big‑data services, and the major open‑source frameworks and platforms driving the field.

AI chips · Big Data · Unsupervised Learning
0 likes · 11 min read
Big Data Technology & Architecture
Jan 28, 2021 · Big Data

Understanding Data Lakes: Definitions, Benefits, Architectures, and Technology Choices

Data lakes, emerging since 2020, are centralized repositories that store structured and unstructured data at any scale, offering flexible analytics, but require robust management to avoid becoming data swamps; this article explains definitions, advantages, typical architectures, and compares cloud and open‑source solutions such as AWS Lake Formation, Alibaba Cloud, Delta, Iceberg, and Hudi.

Analytics · Big Data · Cloud Storage
0 likes · 13 min read
dbaplus Community
Jan 27, 2021 · Big Data

How We Upgraded a 1500-Node Flink Cluster to 1.10: Challenges and Solutions

This article describes how we migrated a 1500‑node Flink 1.4.2 cluster, which runs over 12,000 tasks and processes 30 trillion events per day, to Flink 1.10. It covers the new DDL/Catalog support, SQL enhancements, memory tuning, compatibility patches, extensive testing, and engine optimizations such as task‑load metrics and balanced sub‑task scheduling.

Big Data · Flink · Performance Optimization
0 likes · 13 min read
Alibaba Cloud Developer
Jan 25, 2021 · Big Data

Why 2020 Was the Breakthrough Year for Apache Flink’s Ecosystem

In 2020, Apache Flink surged to become the most active Apache project, releasing three major versions that advanced its unified stream‑batch engine, introduced cloud‑native K8s support, expanded AI capabilities with PyFlink, and fostered a thriving Chinese community, solidifying its role as the de‑facto standard for real‑time computing.

AI integration · Apache Flink · Big Data
0 likes · 21 min read
Didi Tech
Jan 22, 2021 · Big Data

Erasure Coding Practice in HDFS at Didi: Principles, Implementation, and Lessons Learned

Didi migrated HDFS to Hadoop 3.2 and implemented erasure coding—using XOR and Reed‑Solomon RS(6,3) striping—to replace three‑replica storage for cold data, building back‑ported clients, automated conversion tools, and cross‑datacenter backup pipelines, while addressing operational bugs and noting performance trade‑offs.

Big Data · Didi · HDFS
0 likes · 11 min read
DataFunTalk
Jan 22, 2021 · Big Data

Practical Experience of Apache Flink at ByteDance: Architecture, Optimizations, and Future Directions

This article presents ByteDance's real‑world use of Apache Flink, covering the platform's overall architecture, SQL extensions, custom connectors, UI‑driven SQL platform, performance optimizations such as window mini‑batch and custom windows, dimension‑table enhancements, checkpoint recovery improvements, stream‑batch integration, and upcoming roadmap items.

Apache Flink · Big Data · ByteDance
0 likes · 15 min read
Top Architect
Jan 18, 2021 · Big Data

Migrating Over 2 Billion MySQL Records to Google BigQuery Using Kafka

This article details a real‑world solution for migrating more than two billion MySQL records to Google BigQuery by streaming data through Kafka, employing partitioned tables, data filtering, and incremental migration to avoid downtime and reduce storage costs.

Big Data · BigQuery · Data Migration
0 likes · 7 min read
Efficient Ops
Jan 17, 2021 · Big Data

Understanding Kafka: Core Concepts, Architecture, and Performance Secrets

This article introduces Kafka’s fundamental role as a messaging system, explains topics, partitions, producers, consumers, replicas, consumer groups, and the controller, and explores its cluster architecture, performance optimizations like sequential writes and zero-copy, providing a comprehensive overview for building scalable data pipelines.

Big Data · Distributed Systems · Message Queue
0 likes · 11 min read
Programmer DD
Jan 16, 2021 · Artificial Intelligence

Can AI Really Predict Employee Work Status? Inside Baidu’s New Patent

The article examines Baidu’s newly filed patent for predicting employee work status, explaining its big‑data‑driven methodology, the company’s claim it’s a talent‑management tool, and the broader debate over workplace surveillance amid the ongoing 996 controversy.

AI prediction · Baidu patent · Big Data
0 likes · 4 min read
Big Data Technology & Architecture
Jan 15, 2021 · Big Data

Evolution and Architecture of Major Chinese Big Data Platforms: Taobao, Didi, Meituan, 360, Kuaishou, and JD

This article reviews the evolution, architecture, and key components of major Chinese big‑data platforms—including those of Taobao, Didi, Meituan, 360, Kuaishou, and JD—highlighting data ingestion, storage, processing engines, scheduling systems, and service‑oriented designs that underpin their large‑scale data operations.

Big Data · Data Platform · Hadoop
0 likes · 14 min read
DataFunTalk
Jan 15, 2021 · Big Data

Optimizing Apache Kylin for Meituan's Sales OLAP: From MapReduce to Spark and Resource Tuning

This article presents a detailed case study of how Meituan's in‑store dining sales team identified severe efficiency issues in their Apache Kylin‑based OLAP system, dissected the construction process, and applied a step‑by‑step optimization roadmap—including engine migration, dimension pruning, resource configuration, and Spark‑based layered building—to boost query performance and achieve near‑perfect SLA.

Apache Kylin · Big Data · Meituan
0 likes · 16 min read
Didi Tech
Jan 14, 2021 · Cloud Computing

Design and Implementation of Didi's Logi‑KafkaManager Multi‑tenant Kafka Cloud Platform

Didi’s Logi‑KafkaManager is a multi‑tenant Kafka cloud platform that consolidates dozens of clusters into a secure, isolated gateway‑driven service offering intuitive web‑based topic management, real‑time metrics visualization, automated diagnostics, quota governance and safe scaling, delivering high internal satisfaction and enterprise commercialization.

Big Data · Kafka · Monitoring
0 likes · 17 min read
Meituan Technology Team
Jan 14, 2021 · Big Data

Design and Implementation of an SSD‑Based Application‑Layer Cache Architecture for Kafka in Meituan Data Platform

Meituan built an SSD‑based application‑layer cache for Kafka that bypasses PageCache contention between real‑time and delayed jobs, classifies log segments across SSD and HDD, limits flush rates, and achieves up to 80% latency reduction while guaranteeing stable real‑time consumption.

Big Data · Kafka · LogSegment
0 likes · 19 min read
NetEase Smart Enterprise Tech+
Jan 14, 2021 · Big Data

How Yidun Achieves Real-Time, High-Performance Public-Opinion Data Cleaning with Groovy and JVM

Yidun’s public-opinion monitoring platform transforms massive raw web data into a unified format by separating dynamic Groovy-script-driven cleaning from static processing, achieving real-time source integration, high throughput, scalability, and high availability while addressing format diversity, team coordination, and performance-flexibility trade-offs.

Big Data · ETL · Groovy
0 likes · 5 min read
Architects Research Society
Jan 13, 2021 · Fundamentals

Master Data Management (MDM): Concepts, Business Value, Technical Challenges, and Architectural Considerations

The article explains master data management (MDM) as a framework for creating a single, reliable source of truth, outlines its growing business relevance, discusses key technical challenges such as data governance and scalability, and explores next‑generation architectures involving graph databases, big data, and machine learning.

Big Data · Data Governance · Graph Database
0 likes · 10 min read
vivo Internet Technology
Jan 13, 2021 · Big Data

Statistical Monitoring Using Normal Distribution and Boxplot: Theory, Implementation, and API Design

The article explains the origin of the normal distribution, the central limit theorem, and how boxplots identify anomalies, then describes a Java‑based API that partitions data into five median‑centered levels using same‑period and year‑over‑year ratios to automatically detect and classify abnormal trends in daily metrics.

Big Data · Boxplot · anomaly detection
0 likes · 11 min read
dbaplus Community
Jan 11, 2021 · Databases

Why eBay Switched Its Ad Analytics from Druid to ClickHouse – A Deep Dive

eBay’s ad data platform, originally built on a custom SQL engine and later migrated to Druid, was re‑engineered to use ClickHouse. The article covers the challenges involved, such as massive data volume, atomic offline replacements, schema design, compression, and operational simplification, and reports the resulting performance and scalability gains for advertisers.

Ad Analytics · Big Data · ClickHouse
0 likes · 18 min read
DataFunSummit
Jan 10, 2021 · Big Data

Business Model and Digital Transformation of Internet Consumer Finance: A Case Study of CMB’s Flash Loan

The article analyzes the business architecture, value proposition, channels, revenue model, core resources, and digital transformation of internet consumer finance using China Merchants Bank’s fast‑approval “Flash Loan” as a case study, highlighting the role of big data, AI, and cloud computing in modern retail lending.

Big Data · Business Model · Digital Transformation
0 likes · 13 min read
21CTO
Jan 7, 2021 · Big Data

How Kuaishou Built a Scalable Big Data Service Platform to Eliminate Redundant Development

This article explains Kuaishou's data service platform, detailing the background challenges of high development barriers and duplicated work, the platform's architecture and key technologies such as configuration‑driven development, multi‑mode APIs, data acceleration, and high‑availability mechanisms, and concludes with future directions.

Big Data · Data Acceleration · Data Platform
0 likes · 12 min read
360 Tech Engineering
Jan 7, 2021 · Big Data

Overview of the Qirin Big Data Platform Architecture and Core Modules

The article introduces the Qirin big data platform—a one‑stop solution covering resource management, metadata, data ingestion, task development, interactive querying, and self‑service analysis—detailing its modular architecture, typical processing workflow, and future development plans for enterprise‑wide data services.

Big Data · Data Platform · Resource Management
0 likes · 11 min read
vivo Internet Technology
Jan 6, 2021 · Big Data

How HyperLogLog Estimates Cardinality in Massive Data Sets

This article explains the cardinality‑counting problem behind DAU/MAU and unique visitor metrics, compares naïve solutions like Set, Bitmap and Bloom filter, introduces big‑data algorithms such as Linear Counting, LogLog and HyperLogLog, and shows how Redis implements HyperLogLog with dense and sparse storage optimizations.

Big Data · Cardinality · HyperLogLog
0 likes · 17 min read
DataFunTalk
Jan 6, 2021 · Big Data

Didi's Presto Engine: Architecture, Optimizations, and Operational Practices

This article presents Didi's three‑year experience with Presto, detailing its architecture, low‑latency design, large‑scale deployment, extensive Hive compatibility work, resource isolation, Druid connector integration, usability enhancements, stability engineering, performance tuning, and future directions for the ad‑hoc query engine.

Big Data · Distributed Systems · Druid Connector
0 likes · 17 min read
dbaplus Community
Jan 5, 2021 · Big Data

How Ctrip Built a Scalable Unified Log Framework for Payment Data

Facing massive, heterogeneous logs from numerous payment services, Ctrip’s data team designed a unified logging framework that extends log4j2, streams logs via Kafka to HDFS using a customized Camus pipeline, and partitions and stores the data in ORC for efficient Hive analysis, while addressing format, storage, and performance challenges.

Big Data · Camus · Hadoop
0 likes · 16 min read
Alibaba Cloud Developer
Jan 4, 2021 · Databases

Why Cloud‑Native Distributed Databases Are the Future of Enterprise Data

The article reviews the evolution of database systems driven by cloud computing, big‑data demands and distributed architectures, highlights Alibaba Cloud’s cloud‑native offerings such as PolarDB and AnalyticDB, and discusses trends, security, and best practices for modern enterprise data platforms.

Alibaba Cloud · Big Data · Database Security
0 likes · 14 min read
DataFunTalk
Jan 3, 2021 · Artificial Intelligence

iQIYI Machine Learning Platform: Development History, Features, and Practical Experience

This article details the evolution of iQIYI's machine learning platform—from its early Jarvis‑based deep‑learning system to three major versions that introduced visual workflow, distributed scheduling, auto‑tuning, large‑scale training support, model management, and online prediction—while sharing practical lessons and a real anti‑cheat use case.

Big Data · Model Management · hyperparameter tuning
0 likes · 13 min read
Tencent Cloud Developer
Dec 30, 2020 · Big Data

How Alluxio Boosts Tencent Cloud EMR: Cutting Bandwidth by 50% and Accelerating IO‑Intensive Workloads

This article analyzes the challenges of traditional monolithic big‑data architectures, explains how Tencent Cloud EMR integrates Alluxio for compute‑storage separation, presents detailed performance benchmarks showing 20‑50% bandwidth reduction and 5‑40% query speedup, and outlines the specific tuning measures applied.

Alluxio · Big Data · Cloud Computing
0 likes · 10 min read
JD Tech Talk
Dec 30, 2020 · Databases

Architecture and Application Practice of JD Urban Spatio-Temporal Data Engine (JUST)

The presentation details the design, implementation, and real‑world applications of the JD Urban Spatio‑Temporal Data Engine (JUST), a distributed, scalable database that handles massive, complex spatio‑temporal data with novel storage, indexing, and query techniques, demonstrating high performance and ease of use across smart‑city scenarios.

Big Data · GIS · Urban Computing
0 likes · 26 min read
Alibaba Cloud Developer
Dec 29, 2020 · Fundamentals

What Are the 10 Tech Trends Shaping the Post-Pandemic Era?

Alibaba DAMO Academy outlines ten pivotal technology trends for 2021, ranging from third‑generation semiconductors and quantum computing to AI‑driven drug discovery, cloud‑native IT, data‑intelligent agriculture, and smart city operation centers, highlighting how these innovations will drive post‑pandemic growth.

Artificial Intelligence · Big Data · Quantum Computing
0 likes · 9 min read
Alibaba Terminal Technology
Dec 28, 2020 · Big Data

Unlocking Massive-Scale User Behavior Analysis: From Funnels to Intelligent Links

This talk explores how to conduct user behavior analysis on massive data sets, compares existing analytics tools, and presents Alibaba Dataworks' end‑to‑end solution—including funnel and link visualizations, a big‑data processing architecture, and future intelligent link capabilities—to uncover and resolve user‑experience issues efficiently.

Alibaba Cloud · Big Data · Data visualization
0 likes · 16 min read
Big Data Technology & Architecture
Dec 28, 2020 · Big Data

Implementing Historical Slowly Changing Dimension (Chain) Tables with PL/pgSQL

This article explains the concept of historical chain (slowly changing dimension) tables in data warehousing, demonstrates how to create source and target tables, provides a PL/pgSQL stored procedure to handle inserts, updates, and deletions, and shows step‑by‑step testing with sample SQL scripts.

Big Data · PL/pgSQL · Slowly Changing Dimension
0 likes · 10 min read
dbaplus Community
Dec 27, 2020 · Big Data

How ClickHouse Powers a 700 B‑Row Real‑Time Data Platform at Ctrip

This article details how Ctrip's senior engineering manager leveraged ClickHouse to build a high‑availability, sub‑second response data platform handling nearly 700 billion rows, describing the motivations, architecture, data synchronization processes, performance gains, challenges, and practical recommendations for large‑scale analytics.

Big Data · ClickHouse · Data Architecture
0 likes · 28 min read
Architect
Dec 27, 2020 · Big Data

Optimizing Billion‑Scale Hive Queries: Partitioning, Indexing, Bucketing, Active‑User Segmentation, and Data Structure Refactoring

This article walks through the challenges of querying a 300‑billion‑row Hive table, analyzes why traditional partitioning, indexing, and bucketing fall short, and presents a practical solution that combines active‑user segmentation and a redesigned array‑based data model to cut query time from hours to minutes.

Big Data · Data Partitioning · data modeling
0 likes · 10 min read
Youzan Coder
Dec 25, 2020 · Big Data

Metadata Governance and Collection in a Data Asset Platform

The platform implements comprehensive metadata governance by extracting, standardizing, and ingesting basic, trend, resource, lineage, and task metadata from offline and real‑time systems via a Kafka‑based SDK, enabling unified storage, monitoring, alerts, and future automation to improve data asset visibility and quality.

Big Data · Data Governance · Monitoring
0 likes · 18 min read
Big Data Technology & Architecture
Dec 24, 2020 · Big Data

Common Techniques for Processing Massive Data Sets

This article summarizes a range of practical methods—including Bloom filters, hashing, bit‑maps, heaps, bucket partitioning, database indexes, inverted indexes, external sorting, trie trees, and MapReduce—that are commonly used to handle, deduplicate, and query extremely large data volumes in big‑data applications.

Big Data · Hashing · Heap
0 likes · 11 min read
Code Ape Tech Column
Dec 23, 2020 · Fundamentals

Technical Concepts Illustrated Through Relationship Analogies

The article humorously maps various relationship scenarios to core IT concepts such as backup strategies, high‑availability mechanisms, scaling methods, security measures, cloud services, and big‑data techniques, providing an engaging overview of fundamental system design principles.

Backup · Big Data · Cloud Computing
0 likes · 8 min read
dbaplus Community
Dec 22, 2020 · Big Data

How eBay Migrated 10 PB of HDFS Data Across Namespaces in Just 2 Hours

This article details how eBay's ADI Hadoop team tackled a massive 10 PB, 10‑million‑file migration by optimizing DistCp with Fastcopy, load‑balancing, ACL handling, and failure recovery, ultimately completing the transfer within a two‑hour window while preserving cluster stability and performance.

Big Data · Distcp · HDFS
0 likes · 16 min read
Architect
Dec 22, 2020 · Big Data

Dimensional Modeling in Data Warehousing: Concepts, Theory, and Practical Example

This article explains data warehouse fundamentals, reviews classic warehouse models such as ER, dimensional, Data Vault and Anchor, then dives deep into dimensional modeling concepts, star and snowflake schemas, and demonstrates a practical e‑commerce scenario with SQL examples and trade‑offs.

Big Data · ETL · Star Schema
0 likes · 11 min read
21CTO
Dec 21, 2020 · Big Data

5 Emerging Big Data Trends Shaping Business, Health, and Climate in 2021

This article outlines five key big‑data trends for 2021—including the rise of augmented analytics, the convergence of big data with blockchain, the growing importance of knowledge graphs, data‑driven health innovations, and climate‑focused analytics—highlighting their impact on organizations and future technological landscapes.

Big Data · Blockchain · Knowledge graph
0 likes · 8 min read
Didi Tech
Dec 21, 2020 · Big Data

HBase Availability and Latency Optimizations: Replication‑Based Multi‑Read and ZGC Adoption

To overcome HBase’s weak availability and GC‑induced latency spikes, the DiDi team introduced a replication‑based client multi‑read (hedged‑read) mechanism and migrated to the Z Garbage Collector, which together dramatically cut maximum and 99.9th‑percentile latencies while keeping services online during region disruptions.

Big Data · HBase · Low latency
0 likes · 12 min read
Youzan Coder
Dec 18, 2020 · Big Data

Design and Implementation of a Configurable Real-Time Rule Engine for Live‑Streaming Product Audits

The paper presents a configurable real‑time rule engine for live‑streaming product audits that decouples data aggregation from rule execution, uses QLExpress for dynamic conditions, supports Dubbo and HTTP sources, and enables safe gray‑release updates, cutting the rule‑change cycle from weeks to near‑real‑time.

Big Data · QLExpress · configuration
0 likes · 8 min read
Laiye Technology Team
Dec 18, 2020 · Big Data

Comprehensive Overview of Laiye Technology's Business Intelligence Ecosystem

This article provides a detailed, end‑to‑end description of Laiye Technology's BI ecosystem, covering its background, development stages, data acquisition, transmission, transformation, loading, modeling, storage layers, statistical analysis, real‑time metrics, visualization, and future challenges, illustrating how the company builds a scalable, cloud‑native data‑driven platform.

Analytics · BI · Big Data
0 likes · 22 min read
Alibaba Cloud Developer
Dec 17, 2020 · Big Data

Why GraphScope is Revolutionizing Large-Scale Graph Computing for AI and Big Data

GraphScope, an open‑source one‑stop platform from Alibaba DAMO Academy, unifies interactive queries, graph analytics, and graph learning on massive, rapidly evolving graphs, offering high‑performance distributed memory management, Gremlin optimization, and seamless Python integration to tackle real‑world AI and big‑data challenges.

Big Data · Distributed Systems · Open-source
0 likes · 21 min read
macrozheng
Dec 15, 2020 · Big Data

How Kafka Achieves Million‑TPS Through Sequential I/O, MMAP, and Zero‑Copy

Kafka can sustain millions of transactions per second by writing data sequentially to disk, leveraging memory‑mapped files, employing zero‑copy DMA transfers, and batching messages, each technique reducing I/O overhead and CPU involvement, which together enable its high‑throughput performance in big‑data pipelines.

Big Data · High Throughput · Kafka
0 likes · 11 min read
Youzan Coder
Dec 15, 2020 · Industry Insights

How Youzan Built a Full‑Scale Data Cost Billing System: From SDK to Multi‑Dimensional Analysis

This article details Youzan's end‑to‑end construction of a unified data‑center cost billing system, covering background goals, multi‑type cost support, SDK‑based information collection, cost quantification for offline, real‑time and platform tools, full‑business coverage, multi‑dimensional analysis models, operational rollout, and future plans.

Big Data · Data Platform · Industry Insights
0 likes · 19 min read
Programmer DD
Dec 10, 2020 · Artificial Intelligence

Discover Didi’s 40+ Open‑Source Projects in AI, Big Data & Cloud

DiDi’s open‑source portfolio, now exceeding 40 projects, spans AI runtimes, speech recognition, traffic analytics, middleware, big‑data loaders, monitoring tools, mobile frameworks, and frontend libraries, offering developers ready‑to‑use solutions for edge AI, intelligent transportation, data processing, and system reliability.

Artificial Intelligence · Big Data · Mobile Development
0 likes · 23 min read
Youzan Coder
Dec 9, 2020 · Big Data

Youzan Big Data Technology Salon: Practices in Data Cost Governance, Apache Iceberg, Flink, and Data-Driven Growth

The Youzan Big Data Technology Salon brought together Youzan, NetEase and Didi to share practical approaches for cutting data‑infrastructure costs, building an Apache Iceberg‑based data lake, scaling Flink real‑time workloads, and creating a data‑driven growth platform that leverages tracking, A/B testing and analytics.

Apache Iceberg · Big Data · Data Cost Governance
0 likes · 5 min read
DataFunTalk
Dec 8, 2020 · Artificial Intelligence

Financial Big Data Risk Control Models: Techniques, Applications, and COVID‑19 Challenges

This article presents a comprehensive overview of financial big‑data risk control models at Du Xiaoman, covering traditional scoring cards, AI‑driven time‑series and text processing, graph‑based networks, model interpretability, probability calibration, stability analysis, and the specific challenges introduced by the COVID‑19 pandemic.

Artificial Intelligence · Big Data · Credit Scoring
0 likes · 14 min read
Xianyu Technology
Dec 8, 2020 · Big Data

Supply-Demand Modeling and Category Optimization for the Idle Second-Hand Market

The article describes a supply‑demand modeling framework for the idle second‑hand market that extracts and structures product attributes, builds a decision‑tree‑based index from price, inventory, search‑hotspot and demand‑activation sub‑models, and uses the index to optimize category allocation, boost scarce supply, and drive overall growth.

Big Data · Market Analysis · Product Modeling
0 likes · 7 min read
Tencent Cloud Developer
Dec 7, 2020 · Big Data

Searchable Snapshots in Elasticsearch 7.10: Features, Usage, and Future Outlook

Elasticsearch 7.10 adds searchable snapshots, letting users query indices stored directly in remote repositories such as S3 or COS, which halves storage costs, decouples storage from compute, supports manual mounting and ILM cold‑phase policies, and promises future full storage‑compute separation without local caching.

Big Data · Data Tiering · Elasticsearch
0 likes · 12 min read
DataFunSummit
Dec 1, 2020 · Artificial Intelligence

Building an AI Ecosystem with Flink: AI Flow Architecture, Components, and Applications

This article explains how Flink enables end‑to‑end AI workflows through the AI Flow platform, covering the Lambda architecture background, AI task pipeline stages, the reasons for choosing Flink, AI Flow’s graph model, core services, integration with ML pipelines, and real‑world advertising recommendation use cases.

AI Flow · AI Pipeline · Big Data
0 likes · 12 min read
DataFunTalk
Nov 30, 2020 · Fundamentals

DataFunTalk Annual Conference – Full Program and Speaker Details

The DataFunTalk year‑end conference will be held online on December 19‑20, featuring over 90 speakers across multiple forums covering recommendation algorithms, knowledge graphs, AI, big data, security, and product development, with detailed session schedules, speaker bios, and registration information.

AI · Big Data · Knowledge graph
0 likes · 76 min read
JD Tech Talk
Nov 30, 2020 · Big Data

Scalable Time Series Similarity Search in Big Data: Partitioning, Dimensionality Reduction, and LSH Approaches

This article examines the challenges of performing time‑series similarity queries on massive datasets and presents three scalable solutions—partition‑based indexing, dimensionality‑reduction using MinHash, and a combined approach with Locality Sensitive Hashing—to reduce computation while preserving similarity accuracy.

Big Data · LSH · Minhash
0 likes · 10 min read
ITFLY8 Architecture Home
Nov 28, 2020 · Fundamentals

What 19 Core Topics Every Software Architect Must Master

This article outlines a comprehensive knowledge framework for software architects, covering nineteen essential areas such as responsibilities, foundational concepts, internet system challenges, distributed caching, messaging, load balancing, performance testing, operating systems, algorithms, networking, database design, JVM internals, flash-sale systems, microservices, domain‑driven design, security, high‑availability, big data, and blockchain.

Big Data, Software Architecture, System Design
0 likes · 6 min read
dbaplus Community
Nov 28, 2020 · Operations

How a Chinese City Bank Integrated DevOps, AI, and Big Data to Transform Operations

This case study details how a city bank integrated DevOps with ITIL, adopted AI‑driven monitoring, and used Spark‑based big‑data analytics to build a unified development‑testing‑operations platform, improve service availability, shorten deployment cycles, and achieve near‑99.99% system uptime across its core banking services.

AI, Big Data, DevOps
0 likes · 17 min read
Beike Product & Technology
Nov 27, 2020 · Artificial Intelligence

Mining User Housing Preference Schemes with Supply‑Filtered Tree‑Based Methods

The article proposes a supply‑filtered, tree‑based approach to discovering multi‑dimensional user housing preference schemes, contrasts it with fixed‑length preference mining methods, and details algorithmic modules such as split‑point search, similarity calculation, split suppression, and user clustering that improve interpretability and offline applicability.

AI, Big Data, housing recommendation
0 likes · 13 min read
Practical DevOps Architecture
Nov 27, 2020 · Big Data

Step-by-Step Guide to Install and Configure a Hadoop 2.8.2 Cluster

This tutorial provides a complete walkthrough for downloading Hadoop 2.8.2, setting up a three‑node master‑slave cluster, configuring core, HDFS, MapReduce and YARN settings, creating required directories, distributing the installation, starting the services, verifying the cluster status, and finally shutting it down.

Big Data, Cluster Setup, HDFS
0 likes · 5 min read
dbaplus Community
Nov 26, 2020 · Big Data

Silicon Valley's Data Middle Platform Secrets: EA, Twitter, Airbnb, Uber

This article examines how leading Silicon Valley companies such as EA, Twitter, Airbnb, and Uber design and operate data middle platforms—detailing their architectures, data collection pipelines, standardization efforts, real‑time and batch processing, and the business impact of shared data capabilities.

Big Data, Data Architecture, Data Platform
0 likes · 25 min read
DataFunTalk
Nov 26, 2020 · Big Data

Evolution of 58.com Commercial Data Warehouse: From 0‑1 to 3.0 Architecture and Technology

This article details the evolution of 58.com’s commercial data warehouse across three phases—1.0, 2.0, and 3.0—covering its scale, four‑layer architecture, migration from legacy Hadoop‑MapReduce pipelines to Flume/Kafka and Flink streaming, code optimizations, monitoring, and productization for real‑time business insights.

Big Data, ETL, Hadoop
0 likes · 9 min read
DataFunTalk
Nov 24, 2020 · Artificial Intelligence

Building Next‑Generation Data Intelligence Infrastructure with Knowledge Graphs: From New Infrastructure to Cognitive AI Platforms

This presentation explains how knowledge graphs serve as the foundation for new‑infrastructure initiatives, detailing the evolution of AI from perception to cognition, the role of big‑data centers, DIKW modeling, intelligent data governance, and the construction of a cognitive AI middle‑platform for industry applications.

AI infrastructure, Artificial Intelligence, Big Data
0 likes · 18 min read
Big Data Technology Architecture
Nov 24, 2020 · Big Data

Using DeltaLake for Industrial Data Platforms: Distributed Stream Processing, Batch‑Stream Fusion, and Transactional Support

This article shares practical experiences of building an industrial data middle‑platform with DeltaLake, covering heterogeneous distributed stream handling, batch‑stream unified analytics, and transactional/algorithm support to improve data timeliness, reliability, and operational efficiency in manufacturing environments.

Batch-Stream Fusion, Big Data, DeltaLake
0 likes · 11 min read
Alibaba Cloud Developer
Nov 23, 2020 · Big Data

How Alibaba’s CCO Built a Cloud‑Native Real‑Time Data Warehouse with Hologres

Alibaba’s Customer Experience (CCO) team transformed its real‑time data platform by evolving from a Lambda‑style database architecture to a cloud‑native real‑time data warehouse powered by Hologres and Flink, achieving higher throughput, lower latency, reduced costs, and self‑service analytics for massive Double‑11 traffic.

Alibaba, Big Data, Flink
0 likes · 15 min read
Alibaba Cloud Developer
Nov 19, 2020 · Databases

How AnalyticDB Powers Double 11: Cloud‑Native Data Warehouse Innovations

AnalyticDB, a cloud‑native MySQL‑compatible data warehouse, delivered extreme performance during Double 11 by handling billions of orders with ultra‑high write TPS, while introducing compute‑storage separation, hot‑cold tiering, resource groups, elastic scaling and intelligent optimization to meet demanding real‑time analytics workloads.

AnalyticDB, Big Data, Resource Groups
0 likes · 17 min read
Meituan Technology Team
Nov 19, 2020 · Big Data

Optimizing Apache Kylin for High‑Performance OLAP in Meituan's Sales System

Meituan’s sales system “Qingtian” boosted OLAP performance by migrating Apache Kylin’s build engine from MapReduce to Spark, consolidating Hive files, refining dictionary creation, applying a By‑layer algorithm, and bulk‑loading cuboid files to HBase, which cut resource consumption, halved build time, and ultimately brought SLA attainment to 100%.

Apache Kylin, Big Data, Meituan
0 likes · 15 min read
Tencent Tech
Nov 19, 2020 · Cloud Computing

How Tencent Built a Massive Cloud Storage System to Power QQ Album and Beyond

This article chronicles Tencent's journey from the early development of the TFS distributed storage platform to large‑scale data migrations, flexible bandwidth strategies, and the creation of the cloud‑native YottaStore, illustrating how a small architecture team solved massive storage challenges for billions of users.

Big Data, Cloud Storage, Data Migration
0 likes · 15 min read
DeWu Technology
Nov 19, 2020 · Operations

HBase Operations and Use Cases for High‑Concurrency E‑commerce

In this talk, Yun Jin explains how HBase’s petabyte‑scale, horizontally scalable architecture, built on Hadoop, HMaster, RegionServers, and ZooKeeper, enables e‑commerce platforms to handle extreme promotion‑day traffic by supporting high‑throughput reads and writes, time‑series monitoring, massive order storage, and robust HA. The talk also covers essential table operations, monitoring, and troubleshooting techniques.

Big Data, HBase, Monitoring
0 likes · 6 min read
Java High-Performance Architecture
Nov 18, 2020 · Big Data

Why Pulsar Might Outperform Kafka: Key Advantages and Drawbacks

This article examines Apache Pulsar, an open‑source messaging platform created by Yahoo, compares it with Kafka by outlining Kafka’s common pain points, highlights Pulsar’s multi‑tenant architecture, layered storage, built‑in functions, and security features, and discusses the trade‑offs of each solution.

Apache Pulsar, Big Data, Distributed Systems
0 likes · 6 min read
JD Tech Talk
Nov 17, 2020 · Databases

JUST Engine: Novel Spatio‑Temporal Indexes and Data Models for Large‑Scale Urban Data Management

The article introduces the JUST engine, a spatio‑temporal data platform that extends GeoMesa with three new indexes (Z2T, XZ2T, time_range), defines nine common and three specialized data models, provides default indexing strategies, and offers detailed SQL usage guidelines for efficient querying of massive urban datasets.

Big Data, GeoMesa, JUST engine
0 likes · 25 min read
Big Data Technology & Architecture
Nov 16, 2020 · Big Data

Understanding Data Skew in Big Data: Causes, Symptoms, and Solutions for Hadoop and Spark

This article explains what data skew is, why it occurs in large‑scale Hadoop and Spark jobs, how to recognize its symptoms such as stuck reducers or OOM executors, and presents practical strategies—including business‑level adjustments, code refactoring, and platform‑specific tuning—to mitigate the problem.

Big Data, Hadoop, Spark
0 likes · 13 min read
Alibaba Cloud Native
Nov 16, 2020 · Cloud Native

What’s New in Fluid 0.4? DataLoad, Small‑File Boost, HDFS Support & Multi‑Dataset Deployment

Fluid 0.4 introduces a DataLoad custom resource for declarative data pre‑warming, improves support for datasets made up of massive numbers of small files, adds HDFS‑compatible access for Spark and other big‑data frameworks, and enables mixed deployment of multiple datasets on a single node, all backed by significant performance gains.

AI, Alluxio, Big Data
0 likes · 8 min read
DataFunSummit
Nov 15, 2020 · Big Data

Evolution of 58.com Commercial Data Warehouse: From 0‑1 to 3.0 Using Hadoop, Flume, Kafka, Spark, and Flink

This article details the three‑stage evolution of 58.com’s commercial data warehouse, describing its massive scale, four‑layer architecture, technical challenges, migrations from MapReduce to Hive and Flink, real‑time streaming upgrades, and the resulting improvements in stability, accuracy, and timeliness.

Big Data, Data Architecture, Flink
0 likes · 10 min read