Tagged articles

3675 articles

Sep 24, 2019 · Big Data

Snowball Data Middle Platform (AIBO): Architecture, Capabilities, and Future Outlook

The article introduces Snowball's AIBO data middle platform, detailing its storage‑compute separation architecture and core capabilities such as data integration, catalog, tagging, analysis tools, and micro‑service data APIs, and outlining future enhancements for security, lineage, and continuous business‑driven iteration.

Big Data · Data Catalog · Data Integration

0 likes · 12 min read

Alibaba Cloud Developer

Sep 24, 2019 · Big Data

Inside Alibaba’s 10‑Year Search Engine: Architecture, Data Flow, and Indexing

Alibaba’s 10‑year‑old search engine combines data source aggregation, incremental and real‑time indexing, and online services through platforms like Tisplus, Bahamut, Maat, Ha3, Build Service and Drogo, illustrating a comprehensive architecture that powers 1688’s search capabilities across multiple engines and deployment pipelines.

Backend Architecture · Big Data · Distributed Systems

0 likes · 10 min read

Big Data Technology & Architecture

Sep 23, 2019 · Big Data

Applying Apache Kylin for Large‑Scale OLAP at Meituan: Architecture, Challenges, and Performance Evaluation

This article describes Meituan’s large‑scale OLAP requirements, how Apache Kylin was integrated to meet them, the architectural solutions, performance benchmarks against other engines, and future work, providing practical insights for building stable, precise, and high‑performance analytics platforms.

Apache Kylin · Big Data · Hadoop

0 likes · 20 min read

Big Data Technology & Architecture

Sep 22, 2019 · Databases

Alibaba Cloud BDS Service for Non‑Stop HBase Cluster Migration

This article explains how Alibaba Cloud's BDS migration service enables continuous, high‑performance migration of HBase clusters—including schema, full data, and incremental sync—across version upgrades, hardware changes, network migrations, and cross‑region scenarios, while ensuring stability and minimal impact on live workloads.

Alibaba Cloud · BDS · Big Data

0 likes · 10 min read

Big Data Technology & Architecture

Sep 21, 2019 · Big Data

Deploying Apache Flink on Kubernetes: A Step‑by‑Step Guide

This tutorial explains how to run Apache Flink jobs on Kubernetes by building Docker images, deploying JobManager and TaskManager components with Kubernetes manifests, configuring high availability with ZooKeeper and HDFS, and using savepoints and scaling techniques to manage and extend Flink streaming applications.

Big Data · Docker · Flink

0 likes · 14 min read

Beike Product & Technology

Sep 20, 2019 · Big Data

Understanding DStream Construction and Execution in Spark Streaming

This article explains how Spark Streaming's DStream abstraction is built from InputDStream through successive transform operators, details the internal ForEachDStream implementation, describes the job generation and scheduling workflow, and outlines how Beike's real‑time platform leverages these mechanisms for large‑scale streaming tasks.

Big Data · Dstream · Real-time Processing

0 likes · 10 min read

Suning Technology

Sep 20, 2019 · Big Data

How Suning’s Big Data Engine Powers Smart Retail Transformation

Suning’s big‑data center, built on a 30‑year retail evolution and leveraging technologies like AI, cloud, and IoT, showcases how integrated data platforms and robust security can drive smart retail, improve services for 600 million users, and create a new competitive edge.

AI · Big Data · Cloud Computing

0 likes · 6 min read

Big Data Technology & Architecture

Sep 19, 2019 · Big Data

Performance Optimization Practices for Meituan's Hadoop YARN Fair Scheduler

This article presents a comprehensive analysis of Meituan's Hadoop YARN fair scheduler, detailing its architecture, resource abstractions, scheduling workflow, performance bottlenecks, fine‑grained metrics, and a series of optimization techniques—including sorting improvements, job‑skip reduction, parallel queue sorting, and robust rollout strategies—to achieve high‑throughput, low‑latency scheduling for large‑scale offline, streaming, and machine‑learning workloads.

Big Data · Fair Scheduler · Performance Optimization

0 likes · 24 min read

Big Data Technology & Architecture

Sep 19, 2019 · Big Data

Building a Real‑Time ETL Pipeline with Apache Flink and Ensuring Exactly‑once Semantics

This article demonstrates how to develop a real‑time ETL job using Apache Flink, covering project setup, Kafka as a source, custom bucket assigners for HDFS, checkpointing, savepoints, and deployment on YARN to achieve exactly‑once processing guarantees.

Apache Flink · Big Data · Exactly-Once

0 likes · 11 min read

FunTester

Sep 19, 2019 · Operations

Emerging Technologies Shaping DevOps and Software Testing in the Next Decade

Over the next decade, rapid advances in IoT, AI, big data, and pervasive automation such as cognitive RPA will transform DevOps practices, driving more integrated, intelligent testing and continuous delivery pipelines, while organizations mature their digital transformation journeys to meet increasingly complex, data‑driven operational demands.

AI · Big Data · IoT

0 likes · 8 min read

Efficient Ops

Sep 18, 2019 · Databases

Why the DBA Role Is Becoming a Narrowed, High‑Risk Career Path

The article analyzes how the DBA job market is shrinking as traditional enterprises shift away from legacy systems, cloud adoption reshapes responsibilities, and DBAs face limited advancement unless they transition to architecture or data‑analytics roles, highlighting the growing risk and low reward of staying in pure DBA work.

Big Data · DBA · Database Administration

0 likes · 7 min read

Big Data Technology & Architecture

Sep 18, 2019 · Big Data

Understanding Flink Checkpoint Mechanism and Configuration

This article explains Flink's checkpoint mechanism, its execution flow, common configuration options, and the benefits and considerations of incremental checkpoints using the RocksDB state backend, providing practical code examples and YAML settings for reliable stream processing.

Big Data · Checkpoint · Flink

0 likes · 12 min read

Youzan Coder

Sep 18, 2019 · Big Data

Applying Newton's Law of Cooling to Transaction Scoring in DMP User Profiling

The article proposes using Newton’s law of cooling to score DMP user transactions, assigning higher weights to recent purchases that decay exponentially over time, deriving a cooling constant from boundary conditions, and normalizing the resulting heat‑based scores through log‑scaling and a sigmoid‑like mapping to a 0‑100 range.

Big Data · DMP · Newton cooling law

0 likes · 4 min read

DataFunTalk

Sep 17, 2019 · Artificial Intelligence

Machine Learning for Personalized Education Paths – Case Study and Reflections

This lecture explores how machine learning can generate individualized learning pathways for students by building knowledge dependency graphs, defining optimization goals, and leveraging historical data to rank candidate routes, while reflecting on data, model, business, and demand challenges in AI-driven education.

AI · Big Data · knowledge graph

0 likes · 10 min read

Big Data Technology & Architecture

Sep 16, 2019 · Big Data

Comprehensive Flink Interview Guide: Architecture, APIs, Operators, and Advanced Topics

This guide provides a detailed overview of Apache Flink covering its core streaming engine, APIs (DataSet, DataStream, Table), architectural components, comparison with Spark Streaming, partitioning, parallelism, restart strategies, state backends, time semantics, watermarks, SQL processing, fault‑tolerance mechanisms, memory management, serialization, RPC framework, back‑pressure handling, operator chaining, and practical tips for interview preparation.

Apache Flink · Big Data · Dataflow

0 likes · 22 min read

Big Data Technology & Architecture

Sep 15, 2019 · Big Data

Flink Interview Guide: Concepts, Basics, Advanced Topics, and Source Code

This article presents a comprehensive collection of Flink interview questions covering fundamental concepts, advanced topics, and source‑code details to help candidates prepare effectively for Flink‑related technical interviews.

Apache Flink · Big Data · Flink

0 likes · 6 min read

Big Data Technology & Architecture

Sep 14, 2019 · Big Data

Comparison of Open-Source OLAP Engines for Real-Time Data Warehousing

This article reviews the concepts, criteria, and characteristics of major open‑source OLAP engines—including Hive, HAWQ, Spark SQL, Presto, Kylin, Impala, Druid, Greenplum, and ClickHouse—providing guidance on selecting the most suitable solution for various big‑data analytics scenarios.

Big Data · OLAP · Open-Source

0 likes · 19 min read

Big Data Technology & Architecture

Sep 13, 2019 · Big Data

Differences and Relationship Between HBase and Hive in Big Data Architecture

The article explains that HBase and Hive occupy distinct roles in big‑data systems—HBase handles real‑time random queries on massive detail data, while Hive provides batch‑oriented SQL‑based processing on HDFS—and describes how they are typically combined in a data pipeline.

Batch Processing · Big Data · Data Architecture

0 likes · 5 min read

iQIYI Technical Product Team

Sep 12, 2019 · Artificial Intelligence

AI Technology Practice and Application in Entertainment

The iQiyi Technology Salon’s AI Technology Practice and Application series explains how AI reshapes entertainment by automating video and audio production, optimizing short‑video flows, enabling intelligent search, and leveraging big‑data analytics for behavior analysis, intent recognition, and personalized recommendations, supported by iQiyi’s robust AI platform.

AI technology · Big Data · Entertainment Industry

0 likes · 7 min read

Big Data Technology & Architecture

Sep 11, 2019 · Big Data

Big Data Technology and Architecture: Case Studies of Taobao, Didi, and Meituan

This article reviews the evolution and key components of big data platforms at leading Chinese internet companies—Taobao, Didi, and Meituan—detailing their data sources, synchronization tools, storage layers, processing engines, and scheduling systems to provide practical guidance for building robust big data infrastructures.

Architecture · Big Data · Data Platform

0 likes · 9 min read

Tencent Cloud Developer

Sep 11, 2019 · Big Data

YARN Practice and Technical Evolution at Kuaishou

Jiaoxiao Fang’s talk details Kuaishou’s YARN deployment, covering its architecture, support for offline, real‑time and ML workloads, and recent enhancements such as event‑handling stability, refined preemption, high‑throughput parallel scheduling, shuffle‑caching for small I/O, plus plans for job protection and multi‑cluster resource utilization.

Big Data · Cluster Optimization · Distributed Systems

0 likes · 16 min read

DataFunTalk

Sep 10, 2019 · Big Data

Why We Should Ride the Big Data Carriage: Business Perspectives on Data Growth and Machine Learning

The article explains why businesses must embrace the rapid, non‑linear growth of data and machine‑learning technologies, illustrating how data volume and richer information can drive exponential business value, improve competitiveness, and create sustainable positive feedback loops across various industry scenarios.

AI · Big Data · Business strategy

0 likes · 13 min read

Tencent Cloud Developer

Sep 9, 2019 · Databases

Tencent Optimizes Elasticsearch High-Concurrency Write Performance, Cutting 10M Data Load Time by 20%

Tencent engineers improved Elasticsearch’s high‑concurrency write path, reducing the time to load ten million records from eighteen to fifteen minutes (a 20% speed boost). The work earned thanks from Elastic’s CEO and showcases the company’s broader open‑source contributions and strategic cloud‑search partnership.

Big Data · Elasticsearch · Open-source

0 likes · 6 min read

Alibaba Cloud Developer

Sep 9, 2019 · Big Data

Unlocking the Power of Unstructured Data: From AI Breakthroughs to Business Value

This article explains how unstructured data (documents, images, audio, video, and more) now accounts for over 80% of all data, outlines its characteristics and challenges, compares it with structured data, and showcases real-world AI applications such as ImageNet, intelligent customer service, and smart security, while proposing a roadmap for building a unified unstructured‑data asset.

Big Data · Data Analytics · machine learning

0 likes · 15 min read

58 Tech

Sep 6, 2019 · Big Data

Architecture and Technical Implementation of the WMDA Data Analytics Platform

The article details WMDA's end‑to‑end data analytics architecture, covering zero‑event data collection, real‑time and offline processing pipelines built on Spark Streaming, Druid, Hadoop, Kettle, and TaskServer, and explains how these components collaborate to deliver comprehensive user behavior analysis.

Big Data · Druid · ETL

0 likes · 11 min read

Big Data Technology & Architecture

Sep 5, 2019 · Databases

Understanding HBase Connection Management and Best Practices

The article explains why HBase client connections should not be pooled, describes common misuse patterns, and details how the heavyweight, thread‑safe Connection object internally manages connections to HMaster, RegionServers, and ZooKeeper, recommending a single shared Connection per application.

Big Data · HBase · client

0 likes · 10 min read

Big Data Technology & Architecture

Sep 5, 2019 · Big Data

Applying Flink CEP for Complex Event Processing at Haolo Mobility

This article explains how Flink CEP, a complex event processing library for Apache Flink, is employed at Haolo Mobility to detect intricate patterns in endless data streams by modeling patterns as states and using pattern conditions for state transitions, illustrating its practical application in real‑world big‑data scenarios.

Big Data · CEP · Flink

0 likes · 2 min read

Big Data Technology & Architecture

Sep 4, 2019 · Big Data

Understanding Druid: Real‑time OLAP Architecture, Features, Ingestion, and Querying

This article provides a comprehensive overview of Apache Druid, covering its real‑time OLAP design, core features, six‑component architecture, segment storage model, data ingestion pipelines (including Tranquility and Kafka), native and SQL query interfaces, and practical tuning tips with code examples.

Apache · Big Data · Druid

0 likes · 17 min read

Big Data Technology & Architecture

Sep 4, 2019 · Artificial Intelligence

Understanding the Relationship Between AI, Big Data, and Cloud Computing

This article explores the historical development of artificial intelligence, its interplay with big data and cloud computing, examines realistic expectations for AI applications, and explains how massive data and scalable cloud resources together drive modern AI advancements.

AI · Big Data · Cloud Computing

0 likes · 13 min read

360 Tech Engineering

Sep 4, 2019 · Big Data

XSQL: A Low‑Barrier, Stable Multi‑Data‑Source Distributed Query Engine

XSQL is an open‑source, low‑threshold, highly stable distributed query engine that supports federated queries across heterogeneous data sources, offering push‑down optimization, metadata decentralization, multi‑engine integration, and seamless deployment on Spark/YARN for real‑time big‑data analytics.

Big Data · Distributed Query · SQL Federation

0 likes · 14 min read

Alibaba Cloud Developer

Sep 4, 2019 · Big Data

How Structured Big Data Storage Powers Modern Data Systems

This article explores the core components of data systems, the evolution toward lightweight, intelligent big data architectures, the distinction between primary and secondary storage, challenges of data replication, and how Alibaba Cloud's Tablestore implements advanced features such as storage‑compute separation, CDC, and multi‑model indexing for scalable, cost‑effective structured big data storage.

Big Data · CDC · Cloud Services

0 likes · 24 min read

DataFunTalk

Sep 3, 2019 · Big Data

The Value of Big Data in Machine Learning: Detailed Illustration and Insights

This article explains how big data enhances machine learning by enabling finer-grained data characterization, improving confidence in statistical conclusions, and supporting smarter learning through multiple stages of model development, illustrated with concrete examples and a discussion of sample size dilemmas.

Big Data · data analysis · machine learning

0 likes · 10 min read

360 Zhihui Cloud Developer

Sep 3, 2019 · Big Data

QuickSQL: 360’s Unified Multi-Source Query Engine Explained

This article outlines how 360’s data center built QuickSQL, a federated SQL engine that unifies queries across heterogeneous sources such as Hive, MySQL, and Elasticsearch, detailing the business challenges, architectural design, performance benchmarks, and future roadmap for multi‑source data analysis.

Big Data · Data Integration · SQL Engine

0 likes · 12 min read

Tongcheng Travel Technology Center

Sep 3, 2019 · Big Data

Practical Experiences and Lessons Learned in Building a Flink‑Based Real‑Time Computing Platform at Tongcheng‑Elong

This article details the design, implementation, and optimization of a Flink‑based real‑time computing platform at Tongcheng‑Elong, covering the evolution from Storm to Flink, support for FlinkSQL and FlinkStream, metric collection, logging, data lineage, savepoint management, and numerous stability fixes contributed back to the open‑source community.

Big Data · Data Lineage · Flink

0 likes · 16 min read

Tencent Cloud Developer

Aug 30, 2019 · Big Data

How Tencent Cloud Leverages Spark, ElasticSearch, and Flink for PB‑Scale Data Warehousing

The cloud+ community and Kuaishou hosted a big‑data technology salon where experts detailed the evolution, architecture, and practical deployments of Spark‑based cloud data warehouses, ElasticSearch, Yarn, and Flink, highlighting trends, optimization techniques, and future directions for enterprise data analytics.

Big Data · Cloud Computing · Elasticsearch

0 likes · 22 min read

Beike Product & Technology

Aug 29, 2019 · Big Data

TiSpark Integration with TiDB/TiKV for Efficient Data Synchronization and OLAP in the Databus Project

This article introduces TiSpark—an extension of Spark that tightly integrates with TiDB/TiKV to enable high‑performance, scalable data synchronization and OLAP queries, details its architecture, key configuration, performance advantages over Spark SQL and Sqoop, and outlines its role in the Databus data‑integration platform.

Big Data · Data Integration · Performance Optimization

0 likes · 10 min read

360 Smart Cloud

Aug 29, 2019 · Artificial Intelligence

360 Selected to Build a National New‑Generation AI Open Innovation Platform for a Security Brain

At the 2019 World Artificial Intelligence Conference, the Ministry of Science and Technology announced ten national AI open‑innovation platforms, selecting 360 to lead the security‑brain platform, highlighting its role in AI‑driven cybersecurity, big‑data analytics, cloud and blockchain technologies.

360 · Big Data · Information Security

0 likes · 4 min read

58 Tech

Aug 29, 2019 · Information Security

Graph-Based Anomaly Detection Framework for Security Threats

The article presents a graph‑based anomaly detection architecture that tackles black‑market resource switching by constructing complex user‑traffic networks, mining graph similarities, and applying multi‑dimensional strategies to achieve high‑accuracy detection while meeting timeliness, performance, and interpretability requirements.

Big Data · Information Security · anomaly detection

0 likes · 8 min read

Xianyu Technology

Aug 28, 2019 · Big Data

Unified Search System Architecture and Automation for Multiple Business Scenarios

To avoid building separate search services for each Xianyu business, the team created a unified, generic search architecture based on Alibaba’s HA3 engine and a control layer that automates data dumping, indexing, query translation, and result ranking across five subsystems, enabling new services to be onboarded in minutes instead of weeks.

Big Data · automation · data pipeline

0 likes · 18 min read

dbaplus Community

Aug 27, 2019 · Big Data

How eBay Scales Real‑Time Monitoring with Flink: Metadata‑Driven Streaming

This article explains how eBay’s Sherlock.IO monitoring platform processes billions of logs, events, and metrics daily using Flink Streaming jobs, detailing a metadata‑driven architecture, shared job strategies, Heartbeat‑based monitoring, job isolation, back‑pressure handling, and real‑world use cases such as Event Alerting, Eventzon, and Netmon.

Big Data · Flink · Real-time Processing

0 likes · 18 min read

Big Data Technology & Architecture

Aug 27, 2019 · Big Data

Building a Data Warehouse: Architecture, ETL, Layering, Modeling, and Governance

This article explains how to build a data warehouse from scratch, covering its definition, system and collaboration layers, ETL requirements, data layering design, modeling steps, common challenges, and governance practices such as temporary table management and coding standards.

Big Data · Data Governance · ETL

0 likes · 13 min read

Big Data Technology & Architecture

Aug 26, 2019 · Big Data

Comprehensive Collection of Apache Flink Learning Resources

This article compiles a curated list of the most reliable and official Apache Flink learning materials—including beginner tutorials, source‑code walkthroughs, advanced topics, community articles, real‑world case studies, and downloadable resources—providing a one‑stop reference for developers and researchers interested in stream processing and big‑data analytics.

Apache Flink · Big Data · Resources

0 likes · 10 min read

Big Data Technology & Architecture

Aug 25, 2019 · Big Data

Tencent Oceanus: Evolution, Productization, and Optimizations of Real‑Time Stream Computing with Flink

This article recounts Tencent's journey from adopting Flink to building the Oceanus platform, detailing its architecture, product features, and a series of deep extensions—including UI redesign, JobManager failover, checkpoint handling, enhanced windows, LocalKeyBy, watermark idle detection, and log isolation—aimed at supporting trillion‑scale real‑time data processing.

Big Data · Flink · Oceanus

0 likes · 18 min read

Architects' Tech Alliance

Aug 24, 2019 · Big Data

Reimagining Big Data in a Post‑Hadoop World

The article analyzes the decline of Hadoop as the dominant big‑data platform, explains how cloud‑based services are replacing its complex on‑premises architecture, and outlines the lessons and future directions for enterprises navigating a post‑Hadoop landscape.

Big Data · Distributed Systems · Hadoop

0 likes · 12 min read

Youzan Coder

Aug 23, 2019 · Big Data

How to Build a Robust Event Logging Quality System with Real‑Time Validation

This article outlines common event‑logging quality problems, a systematic registration and real‑time validation framework built on Flink, configurable rule syntax, explainable results, continuous monitoring, targeted optimizations, and an evaluation model that together form a comprehensive quality‑center for big‑data platforms.

Big Data · Data Quality · Flink

0 likes · 11 min read

Qunar Tech Salon

Aug 22, 2019 · Big Data

Performance Optimization Practices for Meituan's Hadoop YARN Fair Scheduler

This article details Meituan's experience optimizing the Hadoop YARN fair scheduler, covering background challenges, architectural components, resource abstractions, scheduling flow, performance metrics, a series of code‑level optimizations, stability strategies for production rollout, and future directions for large‑scale cluster scheduling.

Big Data · Fair Scheduler · Load Simulation

0 likes · 23 min read

Big Data Technology Architecture

Aug 21, 2019 · Big Data

Key Big Data Terminology: Offline vs Real-time Computing, Real-time vs Ad Hoc Queries, OLTP vs OLAP, Row vs Column Storage

This article explains fundamental big‑data concepts by comparing offline (batch) and real‑time (stream) computing, distinguishing real‑time queries from ad‑hoc queries, clarifying OLTP versus OLAP workloads, and outlining the differences between row‑based and column‑based storage architectures.

Big Data · Column Storage · OLAP

0 likes · 5 min read

Big Data Technology & Architecture

Aug 20, 2019 · Big Data

OPPO’s Real‑Time Data Warehouse Construction with Apache Flink

The article summarizes a 2019 Apache Flink Meetup in Shenzhen where OPPO’s big‑data platform lead explains how the company built a real‑time data warehouse using Flink SQL extensions and presents four key aspects of the evolution, application cases, and future directions.

Big Data · Flink · OPPO

0 likes · 3 min read

Architects' Tech Alliance

Aug 20, 2019 · Big Data

Current State and Future Trends of Hadoop in the Big Data Landscape

Despite recent market turbulence and negative headlines, Hadoop's revenue continues to grow, driven by cloud migration, evolving storage solutions, and increasing adoption of related projects like Spark and Kafka, positioning it as a leading data‑lake technology.

Apache Spark · Big Data · Data Lake

0 likes · 8 min read

21CTO

Aug 20, 2019 · Big Data

How Mogu’s Advertising Platform Built a Real‑Time Data Pipeline with Storm, Flink, and Kylin

This article explains how Mogu’s advertising system designs and evolves a real‑time data pipeline—covering merchant and operation needs, data collection, cleaning, processing with Storm, Flink, and Kylin, and service guarantees—to enable high‑quality, low‑latency analytics for advertisers and the platform.

Advertising · Big Data · Flink

0 likes · 12 min read

DataFunTalk

Aug 20, 2019 · Artificial Intelligence

The Story of Machine Learning: Why Machines Can Learn and How Statistical Learning Makes It Possible

This article explains why machine learning relies on big‑data statistical learning, illustrating human learning through induction and deduction, presenting case studies that highlight the limits of anecdotal reasoning, and introducing the law of large numbers and probabilistic trust as foundations for reliable AI models.

Big Data · Learning Theory · machine learning

0 likes · 19 min read

Big Data Technology & Architecture

Aug 18, 2019 · Big Data

Flink Application Scenarios and Scale at Kuaishou

The article details how Kuaishou leverages Apache Flink for large‑scale stream processing, describing its application scenarios, cluster sizing, interval join optimization, RocksDB performance challenges, source throttling strategies, JobManager stability, frequent job failures, and platform‑wide improvements.

Big Data · Flink · Kuaishou

0 likes · 2 min read

Architects' Tech Alliance

Aug 18, 2019 · Big Data

Oracle Architecture and ASM Storage Configuration Overview

This article provides a comprehensive overview of Oracle database architecture, detailing memory, physical and logical structures, I/O characteristics of various files, differences between OLTP and OLAP workloads, and practical ASM configuration and storage optimization recommendations for high‑performance environments.

ASM · Big Data · Database Storage

0 likes · 12 min read

Didi Tech

Aug 17, 2019 · Industry Insights

How Didi’s Ride‑Sharing Data Transforms Automotive Finance Risk Management

This article analyzes how big data from Didi’s ride‑hailing scenarios is applied to automotive finance, detailing the business model, asset‑side and full‑process risk challenges, data‑driven solutions, and future prospects for intelligent credit risk control in both enterprise and retail lending.

Big Data · Credit Scoring · Didi

0 likes · 14 min read

Youku Technology

Aug 15, 2019 · Big Data

Youku's Migration from Hadoop to Alibaba Cloud MaxCompute: Benefits and Technical Insights

Youku’s 2017 migration from an on‑premises Hadoop cluster to Alibaba Cloud MaxCompute delivered a unified, elastic data pipeline that cut compute and storage costs by roughly half, handled billions of daily log records, boosted performance and scalability, and empowered analysts with self‑service tools and a rich ecosystem.

Big Data · Cost Optimization · Data Migration

0 likes · 12 min read

DataFunTalk

Aug 14, 2019 · Artificial Intelligence

Understanding Recommendation Systems: From Information Overload to Personalized AI Solutions

The article explores how the rapid growth of the internet has created information overload, discusses the challenges of recommendation systems such as sparsity and timeliness, outlines a four‑step personalized content pipeline, and highlights the interdisciplinary nature of building effective AI‑driven recommendation solutions.

AI · Big Data · data engineering

0 likes · 16 min read

Youzan Coder

Aug 14, 2019 · Big Data

Comprehensive Guide to Data Collection, Event Modeling, and Tracking in Big Data Platforms

The guide explains how comprehensive data collection in big‑data platforms relies on a standardized event model, passive and code‑based tracking instrumentation, multi‑platform SDKs, a log‑middleware layer, precise location tracking, and a tracking management platform that supports workflow, testing, quality monitoring, and scalable infrastructure for future enhancements.

Analytics · Big Data · Log Processing

0 likes · 19 min read

Architecture Digest

Aug 14, 2019 · Big Data

Kafka Overview: Architecture, Storage Mechanism, Replication, and Consumer/Producer Model

Kafka is a distributed, partitioned, replicated messaging system originally developed by LinkedIn, offering high throughput, low latency, fault tolerance, and scalability; this article explains its core concepts, file storage design, partition replication, leader election, consumer groups, delivery guarantees, and operational considerations for big‑data pipelines.

Big Data · Distributed Systems · Kafka

0 likes · 56 min read

Amap Tech

Aug 13, 2019 · Artificial Intelligence

2019 Alibaba Cloud Yunqi Conference – Gaode Technology Session (Sept 27)

At the 2019 Alibaba Cloud Yunqi Conference in Hangzhou, Gaode Technology presented a comprehensive technical forum covering visual intelligence, autonomous-driving perception, the evolution of its client and traffic-access architecture, fine-grained positioning, route-planning algorithms, and spatio-temporal data applications, featuring expert talks from Gaode and Alibaba specialists.

Big Data · Cloud Native · Location-Based Services

0 likes · 8 min read

Big Data Technology & Architecture

Aug 12, 2019 · Big Data

Spark SQL Parameter Tuning and Performance Optimization (Spark 2.3.2)

This article explains how to troubleshoot and tune Spark SQL configuration parameters—covering exception‑related settings such as spark.sql.hive.convertMetastoreParquet, file‑ignore options, and partition verification, as well as performance‑focused tweaks like broadcast join thresholds, adaptive execution, and parquet schema merging—while providing a comprehensive parameter reference table.

Big Data · Hive Migration · Parameter Tuning

0 likes · 23 min read

Big Data Technology & Architecture

Aug 11, 2019 · Big Data

Deep Dive into Flink’s Network Stack: Credit‑Based Flow Control and Thread Model Optimizations

This article examines Flink’s industrial‑scale network stack, detailing the credit‑based flow control introduced in version 1.5, the refactored task‑IO thread collaboration, and serialization optimizations that together improve throughput and latency for large‑scale stream processing workloads.

Big Data · Credit-based Flow Control · Flink

0 likes · 12 min read

DevOps Cloud Academy

Aug 11, 2019 · Big Data

Overview of MFS Distributed File System Architecture Similar to GoogleFS

The article explains the MFS distributed file system, detailing its four components—Master, Metalogger, Chunkserver, and Client—along with hardware recommendations, metadata handling, replication strategies, and FUSE‑based client mounting, providing a comprehensive guide to building a GoogleFS‑like storage cluster.

Big Data · Distributed File System · MFS

0 likes · 5 min read

360 Tech Engineering

Aug 9, 2019 · Information Security

Zhou Hongyi Highlights the Growing Threat of Cyber Warfare and the Need for Advanced Security Intelligence

In a Sanya digital summit speech, Zhou Hongyi warned that cyber warfare has become a major national‑level threat, outlined four key shifts in enterprise security, and described 360's big‑data security brain and future plans to build a nation‑wide defensive ecosystem.

APT · Big Data · Cyber Warfare

0 likes · 5 min read

Big Data Technology & Architecture

Aug 8, 2019 · Big Data

Comprehensive Guide to Apache Kylin: Architecture, Concepts, Cube Design and Optimization

This article provides an in‑depth overview of Apache Kylin’s pre‑computation architecture, data‑warehouse concepts, step‑by‑step cube creation from Hive tables, and advanced optimization techniques such as derived dimensions, aggregation groups, and HBase row‑key encoding to achieve sub‑second OLAP queries on massive datasets.

Apache Kylin · Big Data · Cube

0 likes · 20 min read

360 Quality & Efficiency

Aug 8, 2019 · Big Data

An Introduction to Kafka: Architecture, Design Principles, and Common Issues

This article introduces Kafka, covering its definition, core concepts such as topics, partitions, offsets, producers and consumers, typical use cases, underlying design principles including message‑partition allocation and retention policies, processing mechanisms, and common troubleshooting questions for real‑world deployments.

Big Data · Distributed Messaging · Kafka

0 likes · 7 min read

vivo Internet Technology

Aug 7, 2019 · Big Data

Understanding Apache Kafka: Concepts, Architecture, Deployment, Monitoring and Offset Management

The article gives a thorough overview of Apache Kafka, explaining its core concepts, architecture, deployment steps, monitoring tools, and offset management, including broker and topic structures, producer/consumer APIs, replication, leader election, consumer groups, offset committing, and practical configuration and troubleshooting guidance.

Big Data · Kafka · Messaging

0 likes · 36 min read

Ctrip Technology

Aug 7, 2019 · Big Data

Improving Log Replay Efficiency with Flink and Elasticsearch at Ctrip Ticket Frontend

The article describes how Ctrip's ticket front‑end team replaced a slow, manual log‑pulling process with a Flink‑based real‑time pipeline that streams Kafka data, indexes it in Elasticsearch, and enables log retrieval within seconds for automated scenario replay, dramatically reducing CI cycle time.

Automation testing · Big Data · Elasticsearch

0 likes · 7 min read

dbaplus Community

Aug 6, 2019 · Databases

How ClickHouse Powers Real‑Time Hotel Data Analytics at Ctrip

This article details the challenges facing Ctrip's hotel data platform, which handles billions of daily updates and nearly a million queries, evaluates various storage options, explains why ClickHouse was chosen, and describes the full‑load and incremental pipelines, monitoring, server clustering, and practical tips that enable sub‑second query performance at massive scale.

Big Data · Ctrip · Database Optimization

0 likes · 13 min read

Big Data Technology & Architecture

Aug 5, 2019 · Big Data

Apache Spark Latest Technological Developments and Outlook for Spark 3.0+

The article provides a comprehensive overview of recent Apache Spark advancements—including Delta Lake, Data Source V2, runtime optimizations, relational cache, cloud‑native challenges, AI integration via Project Hydrogen, and the anticipated features of Spark 3.0—highlighting how these innovations address modern data‑warehouse, cloud, and machine‑learning workloads.

Apache Spark · Big Data · Delta Lake

0 likes · 17 min read

Alibaba Cloud Developer

Aug 5, 2019 · Cloud Computing

How Tmall’s Smart Stores Are Redefining New Retail with Cloud and Data

Alibaba’s senior tech expert Mu Jian explains how Tmall’s smart stores embody the new retail paradigm by leveraging cloud computing, big data, and digital tools to transform offline retail, enhance consumer experiences, streamline operations, and create integrated online‑offline ecosystems through cloud stores, cloud POS, and innovative marketing solutions.

Big Data · Cloud Computing · Digital Transformation

0 likes · 25 min read

Big Data Technology & Architecture

Aug 4, 2019 · Big Data

Apache Pulsar vs Apache Kafka: Architecture, Performance, and Advantages

This article compares Apache Kafka and Apache Pulsar, detailing Kafka's scalability challenges, Pulsar's architectural benefits, performance gains, multi‑tenant support, security features, and provides code examples and migration guidance for large‑scale streaming applications.

Apache Pulsar · Big Data · Distributed Systems

0 likes · 11 min read

Big Data Technology & Architecture

Aug 3, 2019 · Big Data

Understanding SparkEnv Initialization: Components and Their Setup

This article walks through the SparkEnv initialization process in Apache Spark, detailing how the driver and executor environments are created, how key components such as SecurityManager, RpcEnv, SerializerManager, BroadcastManager, MapOutputTracker, ShuffleManager, MemoryManager, BlockManager, MetricsSystem, and OutputCommitCoordinator are instantiated, and how the final SparkEnv instance is assembled and stored.

Big Data · Scala · Spark

0 likes · 13 min read

Suning Technology

Aug 2, 2019 · Big Data

How Suning Uses Big Data to Revolutionize Retail Supply Chains

At the 15th China (Nanjing) International Software Expo, Suning's VP shared how the company applies big‑data analytics, the C2M model, and flexible manufacturing to personalize retail experiences, bridge online‑offline gaps, and drive data‑driven product development and supply‑chain efficiency.

Big Data · C2M · Data-driven

0 likes · 9 min read

Meituan Technology Team

Aug 1, 2019 · Big Data

Performance Optimization Practices for Meituan's Hadoop YARN Fair Scheduler

Meituan improved its custom Hadoop YARN Fair Scheduler by pre‑computing resource usage, filtering zero‑demand jobs, and parallelizing queue sorting, which reduced sorting time from 30 s to 5 s per minute, boosted container‑per‑second throughput to 50 k, enabled live roll‑backs, and prepared the system for clusters up to 10 k nodes and future scaling to hundreds of thousands.

Big Data · Fair Scheduler · Hadoop

0 likes · 24 min read

21CTO

Jul 31, 2019 · Artificial Intelligence

How JD Built a Scalable AI‑Powered Recommendation System

The article outlines JD’s evolution from rule‑based product suggestions in 2012 to a sophisticated, AI‑driven, multi‑screen personalized recommendation platform, detailing its product types, system architecture, data collection, offline and online computation, and the core recommendation engine that powers features like “Guess You Like.”

AI · Big Data · JD.com

0 likes · 14 min read

360 Tech Engineering

Jul 31, 2019 · Backend Development

Design and Key Technologies of the 360 Search Engine for Billion‑Scale Web Retrieval

This article explains how 360 Search processes billions of web pages daily, detailing its backend architecture, offline indexing, online retrieval, index organization, and relevance models that enable efficient search over a hundred‑billion‑scale web corpus.

Big Data · Distributed Systems · HBase

0 likes · 21 min read

dbaplus Community

Jul 30, 2019 · Big Data

Spark vs Flink: Which Real‑Time Engine Should You Choose for Kafka Streams?

With the surge in real‑time data from sensors and devices, choosing the right streaming engine is critical; this article compares Apache Spark and Apache Flink—examining their architectures, micro‑batch vs continuous processing, strengths, limitations, and use‑case suitability for Kafka‑driven pipelines.

Big Data · Flink · Kafka

0 likes · 14 min read

Big Data Technology & Architecture

Jul 29, 2019 · Databases

Comprehensive Comparison of Apache Kylin and Apache Doris: Architecture, Data Models, Storage, Query, and Operations

This article provides an in‑depth technical comparison of Apache Kylin and Apache Doris, covering their system architectures, aggregation and detail data models, storage engines, data import processes, query execution, deduplication, metadata handling, performance, high availability, maintainability, usability, schema‑change capabilities, features, and community ecosystems.

Apache Doris · Apache Kylin · Big Data

0 likes · 21 min read

Architects' Tech Alliance

Jul 28, 2019 · Big Data

Alluxio: A Virtual Distributed File System for Unified Big Data Access and Cost‑Effective Storage

The article explains how Alluxio, a memory‑speed virtual distributed file system, acts as a virtual data lake to unify access to structured and unstructured big‑data across heterogeneous storage systems, offering on‑demand fast local access, intelligent caching, reduced storage costs, and enterprise‑grade security and fault tolerance.

Alluxio · Big Data · Data Lake

0 likes · 15 min read

Big Data Technology & Architecture

Jul 28, 2019 · Databases

An Overview of Apache Kudu: Architecture, Table Design, and Storage Details

This article provides a comprehensive introduction to Apache Kudu, covering its origins, cluster architecture with Raft consensus, table schema and partitioning design, and detailed storage mechanisms including MemRowSet, DiskRowSet, CFile, and compaction processes.

Big Data · Database Architecture · Distributed Systems

0 likes · 11 min read

dbaplus Community

Jul 24, 2019 · Big Data

Essential Open-Source Tools Every Big Data Engineer Should Know

This article compiles a comprehensive list of common open‑source tools for big data platforms—covering programming languages, data collection, ETL, storage, analysis, query, management, and monitoring—to help learners and practitioners quickly locate and understand the technologies they need.

Big Data · ETL · Hadoop

0 likes · 15 min read

Tencent Cloud Developer

Jul 24, 2019 · Big Data

Implementing Custom Data Sources in Spark: TGSpark Data Source V2 Practice

The article explains how Tencent’s TGSpark leverages Spark DataSource V2 to create a custom source for TGMars storage, detailing shard‑aware design, push‑down of columns and filters, columnar batch loading, partition‑location reporting, and experimental results that show reduced shuffles and improved local computation when executor placement matches storage nodes.

Big Data · Column Pushdown · Custom Data Source

0 likes · 10 min read

Big Data Technology & Architecture

Jul 23, 2019 · Big Data

Understanding Google Dataflow: Model, Windowing, Triggers, and Incremental Processing

This article explains the Google Dataflow model, covering its unified batch‑and‑stream architecture, windowing and triggering mechanisms, core primitives, time domains, and how these concepts form the foundation of modern big‑data stream processing systems.

Big Data · Dataflow · Google Cloud

0 likes · 13 min read

Xianyu Technology

Jul 23, 2019 · Operations

Automated Service Fault Localization System Architecture

The automated service fault localization system ingests massive real‑time instrumentation data, builds call‑chain graphs, and instantly pinpoints the exact component causing timeouts or other errors, achieving developer‑level accuracy within seconds instead of minutes while remaining simple, fast, and fully automated.

Big Data · Fault Localization · Operations

0 likes · 8 min read

Big Data Technology & Architecture

Jul 20, 2019 · Big Data

Registering UDF, UDTF, and UDAF Functions in Apache Flink – Common Pitfalls and Solutions

This article explains how to register scalar UDFs, table‑valued UDTFs, and aggregate UDAFs in Apache Flink, illustrates typical compilation and runtime pitfalls with concrete Scala code examples, and provides corrected implementations and best‑practice tips for reliable function registration.

Apache Flink · Big Data · Scala

0 likes · 13 min read

Big Data Technology Architecture

Jul 20, 2019 · Big Data

Resolving JDK Version Mismatch in Spark Streaming Jobs with Elasticsearch on YARN

This guide explains how a Spark Streaming job failed due to an incorrect JDK version, demonstrates how to identify the mismatch from ApplicationMaster logs, and provides the correct spark-submit configuration to set JAVA_HOME for both driver and executor so the job runs successfully.

Big Data · JDK

0 likes · 6 min read

System Architect Go

Jul 19, 2019 · Big Data

Introduction to HBase: Architecture, Data Model, and Operations

This article provides a comprehensive overview of HBase, covering its distributed column‑oriented architecture, data model components, storage mechanisms, read/write processes, WAL lifecycle, MemStore flushing, region splitting and merging, and failure recovery within the Hadoop ecosystem.

Architecture · Big Data · HBase

0 likes · 20 min read

dbaplus Community

Jul 18, 2019 · Databases

How JD.com Scales HBase to 90PB: Architecture, Optimizations, and Lessons

This article examines JD.com's massive HBase deployment, detailing its evolution from early adoption to a 90PB, 7,000‑node cluster, the platform's architecture, multi‑active disaster recovery, multi‑tenant isolation, and the integration of Phoenix for SQL‑based access, offering practical insights for large‑scale distributed storage.

Big Data · Database Architecture · HBase

0 likes · 15 min read

Tencent Cloud Developer

Jul 18, 2019 · Big Data

Tencent iData Analysis Center: Why We Chose Spark as Our Computing Platform

Tencent’s iData analysis center selected Spark as its new computing platform because, unlike ElasticSearch, TiDB, and other MPP solutions, Spark offers iterative processing, shuffle support, robust SQL and DAG scheduling, and flexible SMP‑style data exchange, enabling efficient OLAP on billions of game‑user records.

Big Data · Data Platform · MPP

0 likes · 13 min read

Big Data Technology & Architecture

Jul 17, 2019 · Big Data

How to Write Spark DataFrames to Hive Tables and Partitions

This article explains how to persist Spark DataFrames into Hive tables and specific partitions, covering the relevant write APIs, the need to select a database, and providing step‑by‑step Scala code examples for both Spark 1.6 and Spark 2.x versions, along with Hive table creation syntax.

Big Data · Scala · Spark

0 likes · 10 min read

Youku Technology

Jul 17, 2019 · Artificial Intelligence

How AI and Big Data Drive Casting Decisions in the TV Series “The Longest Day in Chang'an”

Youku’s AI‑powered Beidouxing system analyzed audience tags, attractiveness scores and performance data to select Lei Jiayin and Yi Yangqianxi for “The Longest Day in Chang’an”, guiding casting, episode frequency and other production choices while reducing subjective bias and expanding the talent pool.

AI · Big Data · Casting

0 likes · 13 min read

DataFunTalk

Jul 16, 2019 · Databases

TDengine Architecture and Storage Design for IoT Big Data

This article explains TDengine’s architecture, including its management, data, and client modules, virtual node design, write process, and detailed storage file structures, highlighting how its innovative design optimizes resource usage and performance for IoT and other big‑data applications.

Architecture · Big Data · IoT

0 likes · 12 min read

Amap Tech

Jul 16, 2019 · Industry Insights

How Amap’s Big Data Powers Smart City Traffic – Insights from CCF‑GAIR 2019

At the 2019 CCF‑GAIR summit, Amap’s Director of Future Transportation explained how the company’s massive location‑based data, real‑time traffic feeds, and AI‑driven analytics enable smart traffic management, emergency vehicle routing, and predictive highway safety, delivering measurable congestion reductions and faster journeys across Chinese cities.

AI · Big Data · Smart City

0 likes · 10 min read

DataFunTalk

Jul 15, 2019 · Big Data

Key Infrastructure Considerations for Autonomous Driving: Storage, Computing, and Services

The article reviews the essential infrastructure for autonomous driving, covering massive sensor data storage strategies, the role of metadata, offline and real‑time computing platforms, basic micro‑service components, and various business scenarios, highlighting why robust big‑data handling is critical.

Big Data · Real‑Time Computing · autonomous driving

0 likes · 14 min read

Big Data Technology & Architecture

Jul 14, 2019 · Big Data

An Overview of Apache Kudu: Architecture, Table Design, and Storage Details

This article provides a comprehensive introduction to Apache Kudu, covering its origins, cluster architecture with Raft consensus, schema‑based table and partition design, and the intricate storage engine that combines in‑memory and on‑disk structures to deliver fast OLTP and OLAP capabilities on fast data.

Big Data · Kudu · Raft consensus

0 likes · 12 min read

Alibaba Cloud Developer

Jul 12, 2019 · Big Data

Designing a Real‑Time Big Data Sentiment System on Alibaba Cloud: From Lambda to Lambda‑Plus

This article explains how massive online data can be captured, structured, and analyzed in real time using a Lambda‑style architecture, then introduces a simplified Lambda‑Plus design built on Alibaba Cloud's Tablestore and Blink to meet both batch and streaming requirements while reducing operational complexity.

Big Data · Cloud Computing · Lambda architecture

0 likes · 18 min read

Big Data Technology & Architecture

Jul 9, 2019 · Big Data

Understanding Flink State Management and Checkpointing for Exactly-Once Kafka Integration

This article explains how Apache Flink manages state, uses checkpointing for fault-tolerant recovery, and achieves exactly-once semantics when consuming Kafka streams by persisting offsets, describing the checkpoint mechanism, recovery process, and practical considerations for production deployments.

Big Data · Checkpoint · Flink

0 likes · 8 min read

Big Data Technology & Architecture

Jul 7, 2019 · Big Data

Deep Dive into Flink's RPC Framework Implemented with Akka

This article explains how Apache Flink builds its RPC communication layer on top of Akka by detailing the Actor model, actor system creation, message passing patterns, key RPC interfaces such as RpcGateway and RpcEndpoint, and the internal workflow of request handling and execution.

Akka · Big Data · Distributed Systems

0 likes · 20 min read

Big Data Technology & Architecture

Jul 6, 2019 · Big Data

Understanding Broadcast, Shuffle, and Sort‑Merge Joins in Spark SQL

This article explains the principles, use cases, and performance considerations of Spark SQL's three join implementations—Broadcast Hash Join, Shuffle Hash Join, and Sort‑Merge Join—illustrating how table size and distribution affect the choice of algorithm for efficient large‑scale data processing.

Big Data · Broadcast Join · Join Algorithms

0 likes · 11 min read

Didi Tech

Jul 5, 2019 · Artificial Intelligence

Didi's Open-Source Contributions and Technical Innovations in AI and Big Data

Didi’s platform handles over 700 billion daily ETA requests using AI‑driven real‑time calculations, while its 6,000‑plus engineers rely on open‑source big‑data, cloud and AI frameworks, contribute 23 projects that have earned more than 36,000 stars, and provide anonymized traffic data to academia for transportation and urban‑planning research.

AI · Big Data · Open-source

0 likes · 9 min read
