Tagged articles

3675 articles

Dec 20, 2019 · Big Data

How to Supercharge Elasticsearch for Billion‑Row Queries: Practical Optimization Guide

This article explains the architecture of Elasticsearch and Lucene, outlines common performance bottlenecks, and provides concrete indexing and search optimization techniques—including bulk writes, shard routing, doc values tuning, and pagination strategies—to achieve sub‑second query responses on billions of records.

Big Data · Elasticsearch · Performance Tuning

0 likes · 14 min read

Qunar Tech Salon

Dec 20, 2019 · Big Data

Understanding Flink Cluster Startup and Job Execution Process

This article explains the architecture of a Flink cluster, detailing the startup procedures for JobManager and TaskManager, the three deployment modes, and the end‑to‑end flow of a Flink job from client code through StreamGraph, JobGraph, ExecutionGraph to the physical execution on TaskManagers.

Big Data · Cluster Architecture · Flink

0 likes · 10 min read

Big Data Technology & Architecture

Dec 20, 2019 · Big Data

Understanding Hadoop YARN Schedulers: FIFO, Capacity, and Fair Scheduler

This article explains the role of YARN's Scheduler, compares the FIFO, Capacity, and Fair schedulers, details their configurations (including XML snippets for the Capacity and Fair schedulers, queue hierarchy, and preemption settings), and provides practical guidance for resource allocation in Hadoop clusters.

Big Data · Capacity Scheduler · Fair Scheduler

0 likes · 13 min read

Big Data Technology & Architecture

Dec 19, 2019 · Big Data

Apache Kafka 2.4.0 Release: New Features and Improvements

Apache Kafka 2.4.0 introduces a range of new capabilities—including consumer replica fetching, incremental cooperative rebalancing, MirrorMaker 2.0, a new Java authorization API, KTable non‑key joins, administrative replica reassignment, protected REST endpoints, and offset deletion—along with numerous performance and stability improvements.

Apache Kafka · Big Data · Distributed Systems

0 likes · 3 min read

vivo Internet Technology

Dec 18, 2019 · Big Data

Comprehensive Overview of Big Data Architecture, Lambda/Kappa Models, and End-to-End Data Platform Design

The article surveys modern big‑data architecture, contrasting Lambda and Kappa models, highlights common governance and integration pain points, and proposes an end‑to‑end platform featuring unified metadata, stream‑batch processing, one‑click ingestion, standardized modeling, intelligent query abstraction, and a comprehensive development IDE.

Big Data · Data Platform · ETL

0 likes · 13 min read

Big Data Technology & Architecture

Dec 17, 2019 · Big Data

Understanding Flink Sliding Windows and Performance Optimizations

This article explains Flink's sliding window mechanism, shows how the WindowAssigner and WindowOperator work with code examples, analyzes the performance impact of fine‑grained sliding windows, and proposes a practical workaround using tumbling windows combined with external storage such as Redis for efficient PV/UV aggregation.

Big Data · Flink · Performance Optimization

0 likes · 8 min read

Alibaba Cloud Developer

Dec 16, 2019 · Big Data

Why Apache Flink Became the Fastest‑Growing Open‑Source Big Data Engine in 2019

Apache Flink, the open‑source stream‑and‑batch processing engine, has surged to become one of the most active Apache projects, with rapid community growth in China, unified SQL capabilities, AI‑focused extensions, Kubernetes integration, and benchmark results that outperform Hive by up to seven times.

AI · Apache Flink · Big Data

0 likes · 14 min read

DataFunTalk

Dec 13, 2019 · Databases

Lindorm: High‑Performance Distributed NoSQL Database for Big Data

Lindorm, an Alibaba‑derived distributed NoSQL database built on HBase, delivers multi‑model hybrid storage, five‑fold throughput gains, sub‑millisecond latency, advanced indexing, cloud‑native elasticity, strong/adjustable consistency, and comprehensive security and multi‑tenant features for massive data workloads.

Big Data · NoSQL · Performance Optimization

0 likes · 25 min read

Architecture Digest

Dec 13, 2019 · Big Data

Understanding Data Middle Platform: Concepts, Architecture, and Real‑Time Implementation

The article explains the data middle platform concept, its distinction from traditional big‑data platforms, the architectural principles behind Alibaba's implementation, and how real‑time ingestion, processing, and service layers enable efficient, collaborative, and scalable data-driven applications.

Alibaba · Big Data · Data Middle Platform

0 likes · 13 min read

HomeTech

Dec 12, 2019 · Big Data

Architecture and Design of the Home Data Integration Governance Platform

The article describes the background, architecture, and design principles of a unified big‑data scheduling and data‑exchange platform, detailing its data ingestion “direct‑train”, centralized scheduling engine, and DataX‑based data‑exchange components along with monitoring, alerting, and security features.

Big Data · Data Integration · DataX

0 likes · 7 min read

Sohu Tech Products

Dec 11, 2019 · Mobile Development

Technical Q&A: Android Dex Encryption and User Profiling for Content Recall

The article announces the new “Expert Talk” column, shares technical answers on Android dex‑based app hardening and user profiling for content recall, and promotes a giveaway event with prize details and participation instructions for readers.

Android · Big Data · Mobile Development

0 likes · 4 min read

Product Technology Team

Dec 11, 2019 · Big Data

How a Data Middle Platform Transforms Business: Design, Architecture, and Modeling Insights

This article explains what a data middle platform is, why it matters, its core components—including storage, compute, IDE, workflow, API services, and data asset management—and details the layered architecture of ODS, DWD, DWT, DIM, and DWA, as well as dimensional modeling using Kimball’s methodology.

Architecture · Big Data · Data Platform

0 likes · 6 min read

Programmer DD

Dec 11, 2019 · Big Data

Big Data Architecture Secrets: Storage-Compute Separation & Spark in Action

This article explores how enterprises can tackle the explosive growth of data by adopting modern big‑data architectures, including storage‑compute separation, data‑driven workflows, risk‑control frameworks, and real‑world Spark optimizations, offering practical guidance for scalable, high‑performance analytics.

Big Data · Data Architecture · Data-driven

0 likes · 12 min read

dbaplus Community

Dec 10, 2019 · Backend Development

How to Optimize Elasticsearch for Billions of Records: Practical Tuning Guide

An in‑depth guide walks through Elasticsearch’s underlying Lucene architecture, explains shard routing and DocValues, then presents concrete index‑ and search‑performance tweaks—bulk writes, refresh intervals, memory allocation, SSD usage, field mapping, pagination strategies—and shows benchmark results that reduce query latency to seconds for billions of records.

Big Data · Elasticsearch · Index Optimization

0 likes · 13 min read

21CTO

Dec 9, 2019 · Big Data

China’s Big Data Crackdown: Legal Risks Every Developer Should Know

The article examines the sweeping regulatory crackdown on China’s big‑data and financial‑risk companies, detailing the dissolution of major crawler firms, new legal restrictions on data collection, and practical guidance on what data‑scraping activities are illegal and how to protect personal information.

Big Data · Legal Compliance · Web Crawling

0 likes · 11 min read

Big Data Technology & Architecture

Dec 9, 2019 · Big Data

Building a Real‑Time ETL Pipeline with Apache Flink: Kafka to HDFS with Exactly‑Once Guarantees

This article explains how to develop a real‑time ETL application using Apache Flink that reads events from Kafka, partitions them by event time into HDFS directories, and achieves exactly‑once processing through checkpointing, custom bucket assigners, and proper state backend configuration.

Apache Flink · Big Data · Exactly-Once

0 likes · 11 min read

Architecture Digest

Dec 8, 2019 · Big Data

Technical Feasibility of a Nationwide WeChat Group with 1.4 Billion Users

The article analyses whether it is technically possible to place all 1.4 billion Chinese users into a single WeChat group, examining population data, message volume, CPU and network requirements, hardware costs, physical space, and human visual limits to assess scalability and practicality.

Big Data · Network Bandwidth · Server Architecture

0 likes · 11 min read

ITPUB

Dec 5, 2019 · Big Data

How to Achieve Sub‑Second Queries on Billions of Records with Elasticsearch

This article explains how a data platform handling billions of daily records can be optimized for cross‑month queries and sub‑second response times by tuning Elasticsearch indexing, shard routing, Lucene structures, and hardware configurations.

Big Data · Performance Tuning · indexing

0 likes · 13 min read

Big Data Technology & Architecture

Dec 4, 2019 · Big Data

Comprehensive Flink Interview Guide: Core Concepts, Advanced Topics, and Source‑Code Insights

This article provides an in‑depth Flink interview guide covering the framework’s core concepts, advanced features such as fault‑tolerance, state management, and checkpointing, as well as detailed explanations of its architecture, APIs, partitioning strategies, and source‑code flow, complete with code examples.

Big Data · Distributed Systems · Flink

0 likes · 29 min read

AntTech

Dec 4, 2019 · Artificial Intelligence

Ant Financial’s Online Learning System Built on Ray: Architecture, Challenges, and Future Plans

The interview details how Ant Financial transitioned from offline to online machine learning by adopting the Ray distributed engine, describing their open architecture, fusion computing approach, technical advantages, encountered pitfalls, and plans to open‑source the system for broader AI and big‑data use.

AI · Ant Financial · Big Data

0 likes · 15 min read

Big Data Technology & Architecture

Dec 2, 2019 · Big Data

Implementing Custom Flink Sources and Sinks for RocketMQ and HBase Streaming

This article explains how to create custom Flink SourceFunction and SinkFunction implementations, demonstrates a RocketMQ source and an HBase sink with full code examples, and discusses checkpointing, event‑time handling, and deployment of the streaming job on a Flink‑on‑YARN cluster.

Big Data · Flink · HBase

0 likes · 16 min read

Yanxuan Tech Team

Dec 2, 2019 · Big Data

Why Modern Enterprises Need a Data Middle Platform: Lessons from NetEase Yanxuan

Drawing on NetEase Yanxuan’s experience, this article explains what a data middle platform is, why companies are building one for digital transformation and fine‑grained operations, and details its core components—including the data warehouse, data services, and BI platform—illustrated with real‑world diagrams.

BI · Big Data · Data Middle Platform

0 likes · 12 min read

Big Data Technology & Architecture

Dec 1, 2019 · Big Data

Dynamic Configuration Updates in Real-Time Streaming with Spark Broadcast Variables and Flink Broadcast State

This article explains how to dynamically update configuration data in real‑time Spark Streaming and Flink jobs using broadcast variables and broadcast state, providing Java code examples and discussing the limitations and practical considerations of each approach.

Big Data · Flink · Real-time Streaming

0 likes · 8 min read

Big Data Technology & Architecture

Dec 1, 2019 · Big Data

Understanding Flink LatencyMarker: End-to-End Delay Measurement and Implementation Details

This article explains the background, source‑code analysis, and practical implementation of Flink's LatencyMarker feature for measuring end‑to‑end job latency, including metric exposure, configuration options, and code snippets illustrating how latency markers are emitted and processed within the streaming pipeline.

Big Data · End-to-End Latency · Flink

0 likes · 6 min read

Big Data Technology & Architecture

Nov 29, 2019 · Big Data

Understanding Flink's Memory Management and Data Flow Architecture

This article explains how Flink manages memory through its MemorySegment abstraction, the implementations of HeapMemorySegment and HybridMemorySegment, the role of ByteBuffer, NetworkBufferPool and LocalBufferPool, and details the end‑to‑end data flow from RecordWriter to Netty transport, including key code snippets.

Big Data · Data Flow · Flink

0 likes · 16 min read

58 Tech

Nov 29, 2019 · Big Data

Application of Big Data and Algorithms in the Real‑Estate Internet

The talk presented at the Shanghai Computer Society Annual Meeting details how big data and algorithms are leveraged in the real‑estate internet sector to enhance user personalization, improve agent matching, and assess video quality, illustrating practical implementations and performance gains across data collection, modeling, and recommendation pipelines.

AI · Big Data · Real Estate

0 likes · 10 min read

Efficient Ops

Nov 28, 2019 · Operations

Master Modern IT Operations: Skill Maps, ELK Architectures & Big Data Monitoring

This article explores the evolving landscape of IT operations, detailing role specializations, comprehensive skill maps for system, web, big data, and container ops, and compares three ELK logging architectures while emphasizing a data‑driven approach to monitoring and incident response.

Big Data · ELK · IT Operations

0 likes · 11 min read

Mafengwo Technology

Nov 28, 2019 · Big Data

Why NiFi Beats Flink: Practical Data Flow for Recommendation Engines

This article explains why the team prefers Apache NiFi over Flink or Storm for data‑flow handling in information‑stream recommendation systems, outlines NiFi’s core components, features, cluster setup, custom processor development, and real‑world use cases such as HDFS, Elasticsearch, and RocketMQ integrations.

Big Data · NiFi · Processor Development

0 likes · 17 min read

YooTech Youzu Tech Team

Nov 28, 2019 · Big Data

How Data Ingestion Evolved at Youzu: From HTTP to Real‑Time DTS & ETL

This article traces the evolution of Youzu's data platform ingestion, comparing early HTTP/script methods with modern DTS and real‑time ETL solutions, evaluating middleware choices, detailing core system architectures, and outlining future improvements for reliable, scalable data access.

Big Data · DTS · ETL

0 likes · 6 min read

Big Data Technology & Architecture

Nov 28, 2019 · Big Data

Resolving Unsupported Oracle Data Types in Spark SQL via Custom JdbcDialects

This article explains how to overcome Spark SQL's inability to handle certain Oracle data types, such as TIMESTAMP WITH LOCAL TIME ZONE and FLOAT(126), by creating and registering a custom JdbcDialect that remaps unsupported types to compatible Spark types.

Big Data · Custom Dialect · ETL

0 likes · 8 min read

58 Tech

Nov 27, 2019 · Information Security

Evolution and Architecture of a Big Data‑Driven Security Portrait System at 58.com

The article details the design, multi‑stage evolution, and operational impact of a big‑data‑based security portrait platform built by 58.com, describing its data pipelines, real‑time risk tagging, strategy scheduling, configuration management, and overall architecture that enable large‑scale threat detection and mitigation.

Big Data · Risk management · security

0 likes · 15 min read

Big Data Technology & Architecture

Nov 26, 2019 · Big Data

Understanding Flink SQL Window Functions: Types, Implementation, and Emit Triggers

This article provides a comprehensive overview of Flink SQL window functions, detailing time‑based window types, their underlying implementation in the StreamExecGroupWindowAggregate operator, the processing flow of WindowOperator, timer handling, emit/trigger strategies, and practical code examples for Tumble, Hop, and Session windows.

Big Data · Emit · Flink

0 likes · 20 min read

Java High-Performance Architecture

Nov 26, 2019 · Fundamentals

How Bloom Filters Efficiently Detect Element Presence in Massive Datasets

This article explains the Bloom filter concept, typical use cases such as preventing database misses and cache penetration, and the underlying hash‑based implementation with examples, then shows how to deploy a Bloom filter using RedisBloom as a practical approach to handling huge data sets.

Big Data · RedisBloom · bloom-filter

0 likes · 6 min read

Architecture Digest

Nov 25, 2019 · Big Data

Introduction to Apache Kafka: Core Concepts, Architecture, and APIs

This article provides a comprehensive overview of Apache Kafka, covering its fundamental capabilities, typical use cases, core components, key APIs, and essential concepts such as topics, partitions, segments, brokers, producers, and consumers, illustrated with diagrams.

APIs · Big Data · Distributed Systems

0 likes · 8 min read

Big Data Technology & Architecture

Nov 24, 2019 · Big Data

Common Apache Kafka Exceptions and Their Causes

This article lists frequent Apache Kafka exceptions such as UnknownTopicOrPartitionException, LEADER_NOT_AVAILABLE, NotLeaderForPartitionException, TimeoutException, RecordTooLargeException, and others, explaining each error message, typical reasons, and practical troubleshooting steps for producers and consumers.

Big Data · Consumer · Error Handling

0 likes · 5 min read

Tianxing Digital Tech User Experience

Nov 22, 2019 · Product Management

Can Tesla’s Shadow‑Mode Revolutionize Product Design Evaluation?

This article examines the shortcomings of traditional usability testing, explains Tesla’s shadow‑mode data collection and high‑precision mapping, and proposes how the same AI‑driven, data‑rich approach can be adapted to create a self‑learning, automated product‑design evaluation and iteration cycle.

AI · Big Data · Product Design

0 likes · 14 min read

Architecture Digest

Nov 22, 2019 · Big Data

Elasticsearch Optimization Practices for Large‑Scale Data Platforms

This article presents a comprehensive guide to optimizing Elasticsearch for massive data volumes, covering Lucene fundamentals, index and shard design, practical performance‑tuning techniques, and real‑world testing results that enable cross‑month queries and sub‑second response times.

Big Data · Elasticsearch · Index Optimization

0 likes · 14 min read

Meituan Technology Team

Nov 21, 2019 · Big Data

Designing a Platformized Jupyter Service Integrated with Spark for Meituan

Meituan Homestay created a platform‑wide Jupyter service built on JupyterHub and Kubernetes that integrates Spark, scheduling, documentation and storage, providing seamless, reproducible notebooks with custom extensions, magics and container isolation to unify data analysis, model training and production workflows.

Big Data · Jupyter · Kubernetes

0 likes · 19 min read

DataFunTalk

Nov 21, 2019 · Big Data

Evolution of 58.com Real-Time Computing Platform and the One-Stop Streaming Data Processing System Wstream

The article details the technical evolution of 58.com’s real-time computing platform—from Storm and Spark Streaming to a Flink‑based one‑stop solution called Wstream—covering use cases, architecture, stability measures, migration from Storm, operational diagnostics, and future development plans.

Big Data · Flink · Real-time Streaming

0 likes · 11 min read

Xianyu Technology

Nov 21, 2019 · Big Data

Event-Driven Rule Engine for User Growth at Xianyu

To accelerate growth on Xianyu’s 20 million‑DAU platform, the team built an event‑driven rule engine with a SQL‑like DSL that translates user‑behavior streams into real‑time Flink/Blink queries, cutting rule development from four days to half a day and achieving sub‑5‑second processing latency.

Big Data · DSL · Event Stream

0 likes · 9 min read

JD Retail Technology

Nov 19, 2019 · Industry Insights

How JD.com Is Building an Open, Integrated Tech Ecosystem Across Retail, Logistics, and Cloud

JD.com's 2019 JDDiscovery conference revealed a comprehensive, cloud‑native technology landscape that spans AI, big data, IoT, and blockchain, detailing how the company has transformed its integrated retail, logistics, and finance systems into modular, open‑service solutions for external partners.

Artificial Intelligence · Big Data · Cloud Computing

0 likes · 9 min read

Big Data Technology & Architecture

Nov 18, 2019 · Big Data

Understanding JVM Garbage Collection and Flink Memory Management

This article explains the fundamentals of JVM garbage collection, its generational algorithms and associated performance issues, and then details Apache Flink's memory management architecture, including MemorySegment, off‑heap buffers, serialization mechanisms, and type information for efficient big‑data processing.

Big Data · Flink · Garbage Collection

0 likes · 7 min read

Big Data Technology & Architecture

Nov 16, 2019 · Big Data

Understanding SparkSQL Join Algorithms: Shuffle Hash Join, Broadcast Hash Join, and Sort Merge Join

This article explains SparkSQL's three join strategies—Shuffle Hash Join, Broadcast Hash Join, and Sort Merge Join—detailing their mechanisms, when to use each based on table size, and their relative performance costs in distributed big‑data environments.

Big Data · Broadcast Join · Hash Join

0 likes · 5 min read

iQIYI Technical Product Team

Nov 15, 2019 · Industry Insights

How iQIYI’s Big Data Middle Platform Fuels Scalable Entertainment Innovation

The article analyzes iQIYI’s big‑data middle‑platform strategy, detailing its origins, architecture, digital‑asset management, governance principles and how a unified, transparent, and compatible data platform enables user‑centric, scalable innovation across the entertainment ecosystem.

Analytics · Big Data · Data Governance

0 likes · 9 min read

Big Data Technology & Architecture

Nov 14, 2019 · Big Data

Comparison of Flink and Spark Structured Streaming: Joins, State Management, Fault Tolerance, and Backpressure

This article compares Flink and Spark Structured Streaming, detailing their differences in join capabilities, state management, fault‑tolerance mechanisms, exactly‑once semantics, back‑pressure handling, and table registration, while providing code examples and practical insights for real‑time big‑data processing.

Big Data · Flink · JOIN

0 likes · 13 min read

Tencent Cloud Developer

Nov 14, 2019 · Big Data

Tencent Announces Open‑Source High‑Performance Graph Computing Framework Plato

Tencent has open‑sourced its high‑performance graph computing framework Plato, which can process billion‑node graphs in minutes on as few as ten servers, outpacing Spark GraphX by up to two orders of magnitude, and supports offline computation, representation learning, and integration with Kubernetes/YARN for social, recommendation, and biomedical applications.

Big Data · Distributed Systems · Open-source

0 likes · 7 min read

Big Data Technology & Architecture

Nov 13, 2019 · Databases

ClickHouse Engines: Use Cases, Syntax, and Limitations

This article provides a comprehensive overview of ClickHouse, covering its typical application scenarios, inherent limitations, common SQL syntax, default values, data types, materialized and expression columns, and detailed explanations of its various storage engines such as TinyLog, Log, Memory, Merge, Distributed, Null, Buffer, Set, MergeTree, ReplacingMergeTree, SummingMergeTree, AggregatingMergeTree, and CollapsingMergeTree, accompanied by practical code examples.

Big Data · ClickHouse · Database Engines

0 likes · 25 min read

DataFunTalk

Nov 13, 2019 · Big Data

ByteDance’s Core Optimization Practices on Spark SQL

ByteDance’s data warehouse team shares comprehensive optimizations for Spark SQL, covering architecture overview, bucket join enhancements, materialized columns and views, and shuffle stability and performance improvements, illustrating practical techniques that boost query efficiency and job reliability in large‑scale big‑data environments.

Big Data · Materialized Columns · Shuffle Optimization

0 likes · 20 min read

DevOps

Nov 11, 2019 · Operations

Capital One DevOps Transformation: Data‑Driven Innovation, Cloud Migration, and AI‑Enabled Services

This case study details Capital One’s evolution from a regional credit‑card unit to a data‑centric financial giant, highlighting its vision, data‑driven product strategy, big‑data analytics, AI‑powered customer service, cloud migration to AWS, and the DevOpsSec practices that enabled rapid, secure, and scalable innovation across banking, automotive finance, and digital services.

Big Data · DevOps · Fintech

0 likes · 19 min read

Big Data Technology & Architecture

Nov 9, 2019 · Big Data

OneData: A Comprehensive Big Data Architecture and Governance Framework

This article presents the OneData methodology for building a robust big‑data platform, detailing background challenges, goals, unified input and output strategies, model design, naming conventions, data‑cleaning rules, and the resulting business benefits and future outlook.

Big Data · Data Governance · Onedata

0 likes · 13 min read

Suning Technology

Nov 9, 2019 · Operations

Suning’s 2019 Smart Retail White Paper Unveils Digital Store Trends

The 2019 Suning Smart Retail White Paper analyzes the digital transformation of Chinese retail stores, highlighting AI, big data, O2O integration, and operational efficiencies that give retailers a competitive edge in the evolving market.

AI · Big Data · O2O

0 likes · 5 min read

Big Data Technology & Architecture

Nov 9, 2019 · Big Data

Comparative Study of Apache Flink and Spark Streaming at Xiaomi: Architecture, Performance, and Serialization

This article examines Xiaomi's migration from Spark Streaming to Apache Flink, comparing scheduling strategies, mini‑batch versus true streaming, resource utilization, latency, and serialization mechanisms, and concludes with practical insights and custom optimization techniques for large‑scale data processing.

Big Data · Flink · Mini-Batch

0 likes · 17 min read

JD Retail Technology

Nov 7, 2019 · Industry Insights

How JD’s Advertising Architecture Scaled for 11.11: Lessons in Cost‑Cutting and Performance

The article details how JD’s advertising division tackled the massive traffic surge of the 11.11 shopping festival by expanding shard capacity, optimizing models and data pipelines, migrating workloads to the cloud, and implementing cost‑saving measures that together ensured stable, high‑performance ad delivery.

Advertising · Big Data · Performance Optimization

0 likes · 7 min read

DataFunTalk

Nov 7, 2019 · Big Data

Real-Time Computing Engine at Beike: Architecture, Practices, and Future Plans

This article details Beike's real‑time computing engine, covering its background, streaming platform built on Spark Streaming and Flink, data ingestion via Kafka, metadata handling, SQL‑based task development, monitoring, storage solutions, and future roadmap for resource management and AI‑enhanced monitoring.

Big Data · Flink · Kafka

0 likes · 14 min read

Xianyu Technology

Nov 7, 2019 · Big Data

Sequence Pattern Mining for User Behavior Analysis in Xianyu

By applying sequence pattern mining and unsupervised clustering to Xianyu’s massive event logs, the study abstracts high‑level user behaviors, discovers frequent subsequences, uncovers unknown fraudulent account patterns, expands known fraud cohorts with 99 % precision, and enables richer analyses such as PCA‑based cross‑group comparisons.

Big Data · clustering · data mining

0 likes · 8 min read

360 Zhihui Cloud Developer

Nov 5, 2019 · Operations

How 360 Scaled AIOps: From Data to Self‑Healing Operations

At the 360 Internet Technology Training Camp, experts detailed how AI-driven AIOps can transform large‑scale operations, covering data collection, model‑based anomaly detection, alert correlation, self‑healing workflows, and visual dashboards, and presented a practical end‑to‑end framework that other companies can adopt quickly.

Big Data · Operations · aiops

0 likes · 15 min read

Architecture Digest

Nov 5, 2019 · Big Data

Architecture Overview of Taobao, Meituan, and Didi Big Data Platforms

This article examines the big‑data architectures of three leading Chinese internet companies—Taobao, Meituan, and Didi—detailing their data sources, synchronization mechanisms, batch and streaming processing layers, and the common scheduling components that unify their Hadoop‑based ecosystems.

Big Data · Data Architecture · Didi

0 likes · 7 min read

Big Data Technology & Architecture

Nov 4, 2019 · Big Data

Understanding Spark Checkpoint: Purpose, Mechanism, and Best Practices

This article explains why Spark checkpoints are needed for large or complex RDD pipelines, how they work by persisting data to reliable storage such as HDFS, and outlines practical steps and best‑practice recommendations for using checkpoints effectively in production environments.

Big Data · Checkpoint · HDFS

0 likes · 6 min read

Efficient Ops

Nov 3, 2019 · Operations

How Beijing Mobile Achieved Tier‑3 DevOps Maturity: A Deep Dive into Continuous Delivery

This article details Beijing Mobile's successful Tier‑3 DevOps standard assessment, showcasing their micro‑service, container‑based performance management system, the role of standards and tooling in boosting efficiency, and insights from a Q&A with senior engineers on implementation challenges and future DevOps prospects.

AI · Big Data · Containerization

0 likes · 11 min read

Efficient Ops

Nov 3, 2019 · Operations

How Zhejiang Mobile Is Pioneering AIOps to Reach NoOps

Zhejiang Mobile’s IT department chronicles its journey from a 2015 cloud‑native initiative to a cutting‑edge AIOps transformation, detailing a six‑level NoOps roadmap, digital fault‑governance, middle‑platform consolidation, organizational agility, and measurable operational gains that position it as a telecom industry leader.

Artificial Intelligence · Big Data · Digital Transformation

0 likes · 7 min read

Big Data Technology & Architecture

Nov 3, 2019 · Big Data

Understanding Spark Shuffle and Smart Shuffle: Design, Implementation, and Performance Analysis

This article explains the evolution of Spark Shuffle from hash‑based to sort‑based, introduces the Smart Shuffle optimization, details their implementations and configurations, and presents performance comparisons using TPC‑DS benchmarks, highlighting significant speedups and reduced I/O overhead.

Big Data · Shuffle · Smart Shuffle

0 likes · 7 min read

Big Data Technology & Architecture

Nov 2, 2019 · Big Data

Evolution of Elasticsearch Cluster Architecture for JD Daojia Order Center

This article details how JD Daojia's order center migrated its Elasticsearch cluster through multiple architectural stages—from an initial loosely configured setup to a real‑time dual‑cluster solution—addressing scalability, high availability, data synchronization, and performance optimization for billions of documents and hundreds of millions of daily queries.

Big Data · Cluster Architecture · Elasticsearch

0 likes · 12 min read

Big Data Technology & Architecture

Oct 30, 2019 · Big Data

Building a Real‑Time Data Processing Pipeline with Apache Kafka, Spark Streaming, and Cassandra

This tutorial explains how to create a highly scalable, fault‑tolerant real‑time data processing platform by configuring a Kafka topic and a Cassandra keyspace, adding Spark and connector dependencies, developing a Java‑based Spark Streaming pipeline, enabling checkpoints, and deploying the application with spark‑submit.

Big Data · Kafka · Real-Time

0 likes · 8 min read

Alibaba Cloud Developer

Oct 30, 2019 · Big Data

How Real-Time Big Data Pipelines Detect E‑Commerce Ad Misplacements

This article explains how a large‑scale e‑commerce search advertising system uses real‑time big‑data pipelines, log synchronization, NoSQL storage, and proactive verification to automatically discover and correct ad placement errors across the entire data processing chain, protecting both advertisers and the platform.

Big Data · ad verification · data pipeline

0 likes · 13 min read

Big Data Technology & Architecture

Oct 28, 2019 · Big Data

Big Data Technology and Architecture: Leveraging Spark and HBase for Real‑Time and Offline Processing

This article outlines the challenges of various big‑data scenarios such as financial risk control, recommendation systems, and social feeds, explains why Spark is chosen over alternatives, describes a one‑stop data platform architecture with Spark‑HBase integration, and shares best‑practice tips and case studies.

Big Data · Data Architecture · HBase

0 likes · 7 min read

Big Data Technology & Architecture

Oct 27, 2019 · Databases

ClickHouse Architecture and Performance Optimization for Large-Scale OLAP

This article outlines ClickHouse’s columnar OLAP architecture, dual‑center design, storage and write stability strategies, performance testing results, and practical query and system optimizations for handling petabyte‑scale data with high throughput and low latency requirements.

Big Data · ClickHouse · Database Architecture

0 likes · 4 min read

DataFunTalk

Oct 25, 2019 · Big Data

Migrating Data from HBase to Kafka Using MapReduce

This article explains how to reverse the typical data flow by extracting massive Rowkeys from HBase with MapReduce, storing them on HDFS, and then using batch Get operations to retrieve the full records and write them into Kafka, while handling retries and monitoring progress.

Big Data · Data Migration · HBase

0 likes · 9 min read

Big Data Technology Architecture

Oct 24, 2019 · Big Data

Real-Time Search Engine Indexing with Flink: Architecture and Implementation

This article explains how to build a real-time search engine indexing pipeline using Flink, covering background, batch versus incremental indexing strategies, a hybrid architecture that merges both approaches, and a concrete cloud‑based implementation involving MySQL binlog, Logtail, SLS, and Elasticsearch.

Big Data · Elasticsearch · Flink

0 likes · 5 min read

dbaplus Community

Oct 22, 2019 · Big Data

How Weibo Built a Billion‑Log Real‑Time Data Platform with Flink

This article details how Weibo’s advertising team designed and implemented a real‑time data platform capable of processing over a hundred billion daily logs, covering technology selection, Flink advantages, architecture evolution, data processing pipelines, component libraries, fault‑tolerance strategies, and the construction of a multi‑layer real‑time data warehouse.

Big Data · Checkpoint · Data Architecture

0 likes · 25 min read

Big Data Technology & Architecture

Oct 22, 2019 · Big Data

Real-Time Data Verification: Building a Log Comparison Solution with Flink, Elasticsearch, and Hive

This article explains how to design and implement a real‑time data verification framework using Flink to generate wide tables, storing detailed records in Elasticsearch or HDFS with Hive for cross‑checking against offline data, ensuring trustworthy metrics for dashboards and stakeholders.

Big Data · Data verification · Elasticsearch

0 likes · 7 min read

58 Tech

Oct 21, 2019 · Big Data

Improving Information Exposure Measurement: Visible Ad Metrics and Data Processing Practices at 58 Platform

To address inaccuracies in traditional information exposure metrics, this article proposes adopting advertising visibility standards—defining visible exposure by pixel and time thresholds, implementing client-side logging, unique TID tracking, and ETL pipelines—to provide more reliable data for product strategy and user behavior analysis.

Big Data · Data Quality · ad visibility

0 likes · 8 min read

Big Data Technology & Architecture

Oct 20, 2019 · Big Data

Converting Spark RDD to DataSet/DataFrame: Two Methods and Handling Serialization Issues

This article explains two approaches—reflection‑based schema inference and programmatic schema definition—to transform a Spark RDD into a DataSet or DataFrame, demonstrates the required code, and discusses common Task‑not‑serializable errors with practical solutions.

Big Data · Dataset · RDD

0 likes · 8 min read

dbaplus Community

Oct 20, 2019 · Big Data

Mastering Kafka: Concepts, Installation, Optimization, and Security

This comprehensive guide covers Kafka's core concepts, design principles, installation steps, configuration tweaks, performance optimizations, permission management, common operational commands, cluster scaling, log retention settings, and monitoring scripts to help you build and maintain a robust Kafka ecosystem.

Big Data · Installation · Kafka

0 likes · 20 min read

Architects' Tech Alliance

Oct 17, 2019 · Big Data

Understanding Alibaba's Data Middle Platform: Concepts, Architecture, and Differences from Data Warehouses and Data Lakes

The article explains Alibaba's data middle platform—its definition, methodology, organizational structure, key tools, and how it differs from traditional data warehouses and data lakes—while highlighting its role in supporting scalable, business‑centric data services and digital transformation.

Alibaba · Big Data · Data Architecture

0 likes · 16 min read

Big Data Technology & Architecture

Oct 17, 2019 · Big Data

Delta Lake: Architecture, Features, and Hands‑On Tutorial

This article explains the origins and motivations of Delta Lake, details its ACID transaction support, schema enforcement, metadata handling, versioning, and unified batch‑and‑stream processing, and provides a step‑by‑step Maven and Spark code tutorial for creating, updating, and querying Delta tables.

ACID · Apache Spark · Big Data

0 likes · 10 min read

Meituan Technology Team

Oct 17, 2019 · Big Data

OneData Methodology: Building a Unified Data Warehouse Architecture and Governance Framework

By adapting Alibaba’s OneData methodology, the project establishes a unified data‑warehouse architecture, standards, and governance framework—including consolidated business intake, standardized design layers, naming conventions, and delivery metrics—that resolves data‑quality issues, enhances scalability and reusability, and delivers faster, reliable data support for evolving business needs.

Big Data · Data Architecture · Data Governance

0 likes · 15 min read

Efficient Ops

Oct 16, 2019 · Artificial Intelligence

How AIOps Is Revolutionizing IT Operations – Insights from Sina Expert Peng Dong

This interview explores the rise of AIOps, its business drivers, and practical implementation at Sina Weibo, while sharing Peng Dong’s career journey, technical challenges, and management philosophies that illustrate how AI‑driven automation is reshaping large‑scale IT operations.

Big Data · IT Operations · aiops

0 likes · 12 min read

Youku Technology

Oct 16, 2019 · Artificial Intelligence

Building an Entertainment Content Cognition Brain: AI and Big Data for the Full Content Lifecycle

The talk outlines how Alibaba’s Entertainment Brain leverages AI, big-data analytics, and psychological modeling to map content attributes and user emotions across the entire production-to-distribution lifecycle, enabling data-driven talent selection, script evaluation, real-time feedback, and predictive traffic forecasting for hit-making.

AI · Big Data · Content Analytics

0 likes · 11 min read

Efficient Ops

Oct 14, 2019 · Operations

How AIOps Transforms IT Operations: Real-World Architecture and Lessons

This article shares a practical case study of implementing AIOps in an online‑education company, covering the background pain points of massive monitoring data, the designed architecture with real‑time processing and machine‑learning pipelines, and the challenges and opportunities of intelligent operations.

Big Data · IT Operations · aiops

0 likes · 14 min read

JD Retail Technology

Oct 14, 2019 · Databases

Overview of JDNoSQL Platform and Its Real-Time Advertising Use Cases

The article introduces JDNoSQL, a distributed column‑oriented key‑value store built on HDFS, outlines its core features, describes various business scenarios including real‑time ad computation, details the system architecture with Kafka and Flink, and presents table designs for ad impression and click statistics.

Big Data · Flink · Kafka

0 likes · 13 min read

Big Data Technology & Architecture

Oct 14, 2019 · Big Data

Optimizing Spark PageRank: Cache, Checkpoint, Data Skew, and Resource Utilization

This article presents a comprehensive analysis of Spark PageRank performance, detailing the algorithm's basics, the original example code, and four key optimizations—caching with checkpointing, memory‑efficient data structures, handling data skew, and maximizing executor and driver resource usage—backed by experimental results and practical recommendations.

Big Data · Cache · Checkpoint

0 likes · 18 min read

Big Data Technology & Architecture

Oct 13, 2019 · Big Data

Installing and Configuring Alibaba Canal for MySQL Binlog Capture

This guide explains how to download, install, and configure Alibaba Canal—including extracting the package, setting up canal.properties, instance.properties, and instance.xml files, and tuning key parameters—to enable reliable MySQL binlog capture for big‑data pipelines.

Big Data · Binlog · Canal

0 likes · 13 min read

Big Data Technology & Architecture

Oct 13, 2019 · Big Data

Building a Simple Canal-to-Kafka Demo with Maven Dependencies and Java Code

This guide introduces the canal‑kafka integration package, outlines its constraints, and provides a step‑by‑step tutorial with Maven dependencies and Java source code for a SimpleCanalClient, a Kafka producer, and a server class, enabling a functional demo of Canal to Kafka data streaming.

Big Data · Canal · Data Integration

0 likes · 8 min read

58 Tech

Oct 10, 2019 · Big Data

Optimizing Real‑Time Feature Extraction at 58.com: Migrating from Spark Streaming to Flink

This article describes how 58.com’s commercial engineering team redesigned its real‑time feature‑mining pipeline—replacing a minute‑level Spark Streaming framework with Flink—to achieve sub‑second latency, higher throughput, stronger fault‑tolerance, and end‑to‑end exactly‑once semantics for user‑profile generation in the second‑hand‑car recommendation scenario.

Big Data · Exactly-Once · Flink

0 likes · 14 min read

Sohu Tech Products

Oct 9, 2019 · Databases

HBase Table Design Strategies: Data Model, Column Descriptors, RowKey, Region and Performance Optimization

This article explains HBase’s data model and provides comprehensive table‑design strategies—including column‑descriptor options, row‑key best practices, high‑vs‑wide table trade‑offs, region splitting and pre‑splitting techniques—to help achieve optimal performance and scalability in large‑scale NoSQL workloads.

Big Data · Column Family · HBase

0 likes · 16 min read

Big Data Technology & Architecture

Oct 9, 2019 · Big Data

Choosing and Using Flink State Backends: MemoryStateBackend, FsStateBackend, and RocksDBStateBackend

This article explains how Flink checkpoints persist state, compares the three built‑in state backends (MemoryStateBackend, FsStateBackend, RocksDBStateBackend), discusses their configurations, advantages, limitations, and provides guidance on selecting the appropriate backend for different big‑data streaming scenarios.

Big Data · Checkpoint · Flink

0 likes · 10 min read

Alibaba Cloud Infrastructure

Oct 9, 2019 · Cloud Computing

The Next Decade of Cloud Networking: Highlights from Alibaba Cloud Network Forum at Yunqi Conference 2019

The 2019 Yunqi Conference Cloud Network Forum gathered over two hundred network enthusiasts to review a decade of Alibaba data‑center networking evolution, explore emerging technologies such as AI, big data, and programmable chips, and outline the next ten years of high‑performance, data‑centric cloud networking.

Big Data · High‑Performance Networking · network architecture

0 likes · 9 min read

dbaplus Community

Oct 8, 2019 · Big Data

How to Master Large-Scale Cluster Management: 10 Real-World Troubleshooting Cases

This article shares a senior data‑platform engineer's hands‑on experience managing dozens of thousand‑node clusters, detailing nine common cluster problems and step‑by‑step solutions—including performance tuning, RPC fixes, HDFS cleanup, Hive metadata repair, Spark shuffle optimization, HBase region recovery, and Kafka bottleneck mitigation.

Big Data · Cluster Management · HBase

0 likes · 17 min read

Big Data Technology & Architecture

Oct 8, 2019 · Big Data

Handling Deprecated Flink API: Converting Legacy TypeInformation to DataTypes

Flink 1.9 deprecated the legacy Type API in favor of DataTypes, so users find that the TypeInformation‑based schema methods are missing. This article explains the root cause and provides a code solution that converts legacy types using TypeConversions and registers a TableSink.

Big Data · DataTypes · Flink

0 likes · 2 min read

Architects' Tech Alliance

Oct 7, 2019 · Industry Insights

How Google’s Vision Drove the PC Web, Big Data, and Cloud Revolutions

The article traces Google’s decade‑long impact on the evolution of the PC Web era, its pioneering technologies in search, email, infrastructure, big data, cloud computing, and mobile, explaining how its philosophy both propelled and missed commercial opportunities across each wave of internet innovation.

Big Data · Cloud Computing · Google

0 likes · 11 min read

Big Data Technology & Architecture

Oct 3, 2019 · Big Data

Data Development Interview Tips and Career Guidance

This article offers practical advice for data development job interviews, explaining why Java is essential, comparing Java and Python, outlining required backend framework knowledge, discussing the role of SQL and data warehousing, and addressing work‑life concerns such as overtime and company size choices.

Big Data · Python · career advice

0 likes · 4 min read

Programmer DD

Sep 29, 2019 · Big Data

Can 1.4 Billion People Share a Single WeChat Group? A Technical Deep‑Dive

This article explores whether it is technically feasible to place all 1.4 billion Chinese users into one WeChat group, analyzing population statistics, message volume, CPU processing limits, network bandwidth, storage requirements, and cost implications with supporting calculations and references.

Big Data · Distributed Systems · Network Bandwidth

0 likes · 12 min read

Architects Research Society

Sep 28, 2019 · Artificial Intelligence

Data Mining and Machine Learning: Concepts, Process, and Software Catalog

This article explains the fundamentals of data mining and machine learning, outlines the knowledge discovery process and typical analytical tasks, and provides an extensive alphabetically ordered list of software tools used for these technologies.

AI · Big Data · machine learning

0 likes · 7 min read

Xueersi Online School Tech Team

Sep 27, 2019 · Big Data

Design Principles and Architecture of Apache Kylin for Sub‑Second OLAP Queries

This article explains how Apache Kylin, an open‑source distributed analytics engine built on Hadoop/Spark, achieves sub‑second OLAP query performance through pre‑computed cubes, a layered cuboid generation algorithm, bitmap‑based distinct counting, dimension optimization techniques, and tight integration with HBase for storage and fast SQL querying.

Apache Kylin · Big Data · Cube

0 likes · 15 min read

JD Retail Technology

Sep 27, 2019 · Big Data

How to Become a Spark Committer: The Journey of JD’s Zheng Ruifeng

The article chronicles JD engineer Zheng Ruifeng’s path to becoming a Spark Committer, highlighting his early involvement, key contributions to Spark’s ML and GraphX components, the community’s scale, and his vision for future improvements in the big‑data platform.

Apache Spark · Big Data · Committer

0 likes · 6 min read

Meituan Technology Team

Sep 26, 2019 · Big Data

Big Data Technology: Commercial Applications and Practice – A Collaborative Course between Meituan and Tsinghua University

Meituan’s big‑data team and Tsinghua’s Electronic Engineering Department have launched a master‑level, credit‑bearing course that blends theory with 24 hours of hands‑on training, showcases Meituan’s real‑world data infrastructure and applications, and aims to create a recurring bridge between academia and industry while recruiting top talent.

Big Data · Commercial Application · Data Analytics

0 likes · 6 min read

Big Data Technology & Architecture

Sep 25, 2019 · Big Data

Designing and Using Global Secondary Indexes in Apache Phoenix

This article explains how Apache Phoenix implements global secondary indexes using separate HBase tables, demonstrates index creation and data synchronization with example SQL, and provides design guidelines to optimize query latency and avoid full‑table scans in big‑data environments.

Big Data · HBase · Phoenix

0 likes · 4 min read

Huawei Cloud Developer Alliance

Sep 25, 2019 · Cloud Computing

How Huawei’s OceanConnect IoT Platform Powers Smart Cities, Connected Cars, and More

This article provides a comprehensive overview of Huawei's OceanConnect IoT platform, detailing its market positioning, key application scenarios such as connected vehicles and smart cities, core product capabilities, deployment options, security measures, and underlying software architecture.

Big Data · Cloud Computing · Connected Car

0 likes · 12 min read

dbaplus Community

Sep 24, 2019 · Big Data

How Weibo Turns Big Data into Revenue: Insights from a 2019 DAMS Talk

The presentation explains how Weibo leverages big‑data technologies, user profiling, and social‑first advertising models to drive commercial growth, detailing data‑driven product development, real‑time and offline data warehouses, scientific experiments, and case studies that illustrate the impact on revenue and user engagement.

Advertising · Big Data · Growth Hacking

0 likes · 24 min read

Alibaba Cloud Developer

Sep 24, 2019 · Artificial Intelligence

How Semi‑Supervised Deep Learning Detects Road Closures in Real‑Time

Gaode’s engineering team presents a semi‑supervised deep‑learning framework that models road networks, extracts traffic, routing, deviation and heatmap features, and combines LSTM with ResNet to accurately identify dynamic road‑closure events, enabling both offline and real‑time detection with high confidence and business‑aligned validation.

Big Data · LSTM · ResNet · Semi-supervised Learning

0 likes · 12 min read
