Tagged articles
3675 articles
Big Data Technology & Architecture
Jul 2, 2019 · Big Data

Integrating Apache Flink with Apache Pulsar for Scalable Elastic Data Processing

This article explains how Apache Pulsar and Apache Flink can be combined to provide a unified, scalable, and fault‑tolerant data processing platform, covering Pulsar's architecture, its differences from other messaging systems, various integration patterns, and concrete code examples for stream and batch workloads.

Apache Flink · Apache Pulsar · Big Data
0 likes · 13 min read
21CTO
Jul 2, 2019 · Operations

How to Build Ultra‑Reliable Systems: Multi‑Level Caching, Isolation, and Monitoring Strategies

This article outlines practical techniques for achieving high system availability, covering multi‑level caching, dynamic group switching, database and service isolation across data centers, concurrency control, gray‑release deployment, comprehensive monitoring, graceful degradation, and data consistency models, with insights on leveraging big‑data pipelines for intelligent logistics.

Big Data · caching · canary release
0 likes · 10 min read
ITPUB
Jul 2, 2019 · Databases

How ClickHouse Powers Ctrip’s Hotel Data Platform for Billions of Daily Updates

This article explains how Ctrip’s hotel data intelligence platform handles over ten billion daily data updates and nearly a million queries by adopting ClickHouse, detailing the system's background, the reasons for choosing ClickHouse over other solutions, the data ingestion pipelines, monitoring strategies, operational practices, and performance outcomes.

Big Data · ClickHouse · Real-time analytics
0 likes · 13 min read
DataFunTalk
Jul 2, 2019 · Artificial Intelligence

From Zero to Autonomous Driving: Pony.ai’s Technical Journey

The article traces the evolution of autonomous driving from early concepts to modern implementations, highlighting Pony.ai’s technical innovations in sensor fusion, high‑definition mapping, simulation, data processing, software iteration, and the challenges of scaling vehicle fleets for commercial deployment.

AI · Big Data · Pony.ai
0 likes · 12 min read
58 Tech
Jul 2, 2019 · Artificial Intelligence

Magic Mirror: A Visual Data‑Intelligence Platform for Low‑Code Machine Learning

Magic Mirror is a big‑data‑based visual analytics platform that lowers the barrier to machine learning for non‑experts while accelerating expert workflows through a visual UI, modular algorithms, distributed feature generation, and automated binary‑classification modeling.

Automated Modeling · Big Data · Spark
0 likes · 9 min read
Alibaba Cloud Developer
Jul 1, 2019 · Big Data

Why Lambda, Kappa, and Lambda+ Are Shaping Modern Big Data Architecture

This article examines the technical challenges of large‑scale data processing, compares the classic Lambda and Kappa architectures, introduces the unified stream‑batch Lambda+ design built on Tablestore and Blink, and outlines suitable scenarios and practical solutions for modern big‑data systems.

Big Data · Cloud Computing · Kappa architecture
0 likes · 16 min read
Big Data Technology & Architecture
Jun 30, 2019 · Big Data

Curated Collection of Big Data, Flink, Hadoop and Real‑Time Computing Articles from the “Big Data Technology and Architecture” Series

This article presents a carefully organized catalogue of over a hundred technical posts covering Flink source‑code analysis, fundamental and advanced big‑data structures, Hadoop ecosystem components, real‑time streaming with Spark and Kafka, as well as system design guidelines and miscellaneous insights, each linked to its original publication for easy reference.

Big Data · Distributed Systems · Flink
0 likes · 6 min read
21CTO
Jun 28, 2019 · Fundamentals

Beijing’s Software Industry Surpasses Trillion-Yuan Mark: 2019 Report Highlights

The 2019 Beijing Software and Information Service Industry Development Report reveals that the sector’s scale exceeded one trillion yuan, with double‑digit growth in cloud computing, big data, AI and cybersecurity, while talent, investment, and regional collaboration propelled the city to a leading national position.

Beijing · Big Data · Information Security
0 likes · 9 min read
Big Data Technology & Architecture
Jun 24, 2019 · Big Data

Hive Optimization Techniques: Column/Partition Pruning, Predicate Pushdown, Join Strategies, and MapReduce Tuning

This article provides a comprehensive guide to improving Hive query performance by covering column and partition pruning, predicate pushdown, replacing ORDER BY with SORT BY, using GROUP BY instead of DISTINCT, fine‑tuning join operations, and optimizing MapReduce parameters such as mapper/reducer counts, file merging, compression, JVM reuse, parallel execution, strict mode, and storage formats.

Big Data · JOIN optimization · MapReduce
0 likes · 19 min read
Didi Tech
Jun 22, 2019 · Big Data

Analysis of Hadoop RPC Architecture and Implementation

The article examines Hadoop’s RPC framework—detailing its client‑server workflow, core classes (RPC, Client, Server), dynamic proxy handling, NIO‑based server threading, configurable concurrency controls such as FairCallQueue, and a practical HDFS mkdir command example, illustrating high‑performance distributed communication.

Big Data · Hadoop · RPC
0 likes · 17 min read
Big Data Technology & Architecture
Jun 20, 2019 · Big Data

Comprehensive Guide to Flink SQL: Background, New Features, Programming Model, Operators, Functions, and a Practical NBA Scoring Leader Example

This article provides an in‑depth overview of Flink SQL, covering its origins, the latest 1.7.0 and 1.8.0 enhancements, the underlying programming model, common operators and built‑in functions, and a complete end‑to‑end example that analyzes NBA scoring‑leader data using Flink SQL.

Apache Flink · Big Data · Flink SQL
0 likes · 27 min read
Suning Technology
Jun 20, 2019 · Fundamentals

How Suning’s Digital Transformation Is Shaping the Future of Retail

Suning’s senior tech leader explains how the company leveraged AI, big data, cloud computing and IoT to drive a digital‑first retail ecosystem, illustrating the broader shift toward intelligent, data‑driven retail operations in a rapidly changing market.

Artificial Intelligence · Big Data · Cloud Computing
0 likes · 4 min read
Big Data Technology & Architecture
Jun 19, 2019 · Big Data

Understanding Spark Structured Streaming StateStore: Architecture, Operations, and Fault Recovery

This article explains the design and implementation of Spark Structured Streaming's StateStore module, covering its distributed architecture, state sharding, versioning, batch read/write, migration, update/query APIs, maintenance compaction, and fault‑tolerance mechanisms that enable incremental continuous queries with exactly‑once guarantees.

Big Data · Spark · StateStore
0 likes · 8 min read
21CTO
Jun 17, 2019 · Big Data

Why Data Middle Platforms May Be the Biggest Opportunity of the Next 20 Years

The article explores the rapid rise of data middle platforms in China, tracing their historical roots, explaining their core purpose of unifying data across legacy and new systems, showcasing Shulan Technology’s real‑world implementations, and analyzing market dynamics and future opportunities for enterprises and startups alike.

Big Data · Data Middle Platform · data strategy
0 likes · 25 min read
AI Large-Model Wave and Transformation Guide
Jun 16, 2019 · Artificial Intelligence

Understanding AI's Four Core Elements: Data, Compute, Algorithms, and Scenarios

The article breaks down artificial intelligence into four essential components—massive data, powerful compute, effective algorithms, and real‑world scenarios—explaining each element with concrete analogies, hardware benchmarks, algorithm classifications, and a list of typical AI applications.

AI fundamentals · AI use cases · Big Data
0 likes · 5 min read
Xianyu Technology
Jun 14, 2019 · Big Data

Xianyu IFTTT: Scalable Real-Time User Relationship Platform

Xianyu IFTTT is a scalable real-time user-relationship platform that enriches metadata, enables bidirectional buyer-seller interactions, integrates quickly via SLS logs, uses a chain-of-responsibility for customizable lists, processes push actions with fatigue filtering, and stores TB-scale data in Lindorm, delivering billions of daily records and more than double the click-through rate of offline pushes.

Big Data · IFTTT · Real-Time
0 likes · 9 min read
Full-Stack Internet Architecture
Jun 14, 2019 · Big Data

How I Prepared for ByteDance (TouTiao) Interviews: Study Plan, Interview Experiences, and Practical Tips

An in‑depth personal account details how the author prepared for ByteDance’s (TouTiao) recruitment, outlining a month‑by‑month study schedule covering Java, big‑data technologies, algorithms, and system fundamentals, describing each interview round, sharing successful test strategies, and offering practical advice for landing offers at top tech firms.

Algorithms · Big Data · ByteDance
0 likes · 11 min read
Big Data Technology & Architecture
Jun 12, 2019 · Big Data

Comprehensive Guide to FlinkCEP: API Overview, Pattern Definitions, Quantifiers, Conditions, and Usage Examples

This article provides a detailed introduction to FlinkCEP, covering how to add the library, define simple and composite patterns, use quantifiers and conditions, handle skip strategies, time constraints, and select results, with complete Java and Scala code examples for complex event processing.

Big Data · CEP · Flink
0 likes · 27 min read
Dada Group Technology
Jun 11, 2019 · Big Data

Building and Evolving the Dada‑JD Daojia Big Data Platform: Architecture, Strategies, and Lessons Learned

This article presents a comprehensive case study of the Dada‑JD Daojia big data platform, detailing its evolution from a MySQL‑based warehouse to a multi‑layered One Data, One Platform, One Service, Many Apps architecture, the technical challenges faced, and the strategic approaches adopted to ensure coverage, accuracy, stability, and scalability.

Big Data · Data Governance · Data Platform
0 likes · 14 min read
Architecture Digest
Jun 11, 2019 · Databases

Database Optimization for Billion‑Scale Data: Partitioning, Sharding, and Vertical Splitting in MySQL

This article explains how a high‑traffic messaging platform with tens of millions of users and billions of daily records can be optimized using MySQL partitioning, sharding (both client‑side and proxy‑side), vertical database splitting, and practical migration scripts to maintain performance and availability.

Big Data · Database Optimization · MySQL
0 likes · 15 min read
360 Tech Engineering
Jun 10, 2019 · Information Security

Design and Practice of Big Data Platform Security: Insights from 360’s Data Center Technical Director

In this interview, 360’s Big Data Center Technical Director Xu Hao discusses the critical data security challenges faced by enterprises, outlines regulatory, system‑level, and managerial risks, and shares practical strategies for building robust security governance, platform architecture, permission controls, and cloud‑based data protection.

Big Data · cloud security · data security
0 likes · 13 min read
Big Data Technology & Architecture
Jun 9, 2019 · Big Data

Optimizing Spark Shuffle: Can Fetch, Efficient Fetch, and Reliable Fetch

This article analyzes three Spark shuffle bottlenecks—oversized partitions that exceed Netty's 2 GB limit, excessive retry latency caused by dead executors, and insufficient data‑corruption checks—and presents concrete configuration changes, new block identifiers, executor‑liveness checks, and CRC‑32 verification to improve fetchability, efficiency, and reliability at scale.

Big Data · Shuffle · Spark
0 likes · 18 min read
21CTO
Jun 7, 2019 · Big Data

How to Build a Real-Time Big Data Sentiment Analysis Platform Using Lambda & Kappa

This article explores the design of a large‑scale, real‑time sentiment analysis system, detailing the data ingestion, processing, and storage requirements, comparing Lambda and Kappa architectures, and presenting an Alibaba Cloud solution that combines Tablestore and Blink for unified batch‑and‑stream processing.

Big Data · Kappa architecture · Lambda architecture
0 likes · 18 min read
Full-Stack Internet Architecture
Jun 7, 2019 · Backend Development

Comprehensive Guide to Autumn Recruitment: Strategies, Case Studies, and Interview Topics for Java and Big Data Positions

This article provides a detailed roadmap for autumn campus recruitment, covering the significance of the hiring season, tailored preparation strategies for different skill levels, multiple case studies, extensive interview question collections across Java, JVM, big data, and system fundamentals, as well as practical tips for resume polishing and interview mindset.

Algorithms · Big Data · career advice
0 likes · 18 min read
Tencent Cloud Developer
Jun 6, 2019 · Big Data

2019 Big Data Industry Summit Highlights and Outcomes

Held in China on June 4‑5, 2019, the Big Data Industry Summit drew more than 4,000 attendees and 60,000 online viewers. It presented award winners, released multiple whitepapers and standards, and held six thematic forums and two roundtables that examined data platforms, asset management, security, law, and emerging technologies, outlining current opportunities and future challenges for big‑data growth.

Big Data · China · Data Asset Management
0 likes · 14 min read
Big Data Technology & Architecture
Jun 5, 2019 · Big Data

Real-Time Advertising Click Counting with Spark Structured Streaming and Redis Streams

This article presents a complete solution for real‑time advertising click counting using Spark Structured Streaming combined with Redis Streams, detailing the business scenario, data flow, input/output formats, and step‑by‑step implementation including data extraction, processing, storage, and query via Spark‑SQL.

Big Data · Redis Stream · Scala
0 likes · 11 min read
360 Zhihui Cloud Developer
Jun 4, 2019 · Big Data

Why Flink Outperforms Storm: Deep Dive into Stream Processing Performance

This article compares Apache Storm and Apache Flink for stream processing in terms of data transmission and reliability, presents the benchmark design, test environment, and results for synthetic and Kafka data, and offers practical recommendations such as operator chaining, object reuse, and checkpoint strategies to maximize Flink performance.

Big Data · Flink · Performance Testing
0 likes · 13 min read
360 Tech Engineering
Jun 3, 2019 · Big Data

Performance Comparison of Apache Storm and Apache Flink from Data Transmission and Reliability Perspectives

This article presents a detailed performance benchmark comparing Apache Storm and Apache Flink in stream processing, focusing on data transmission methods, reliability mechanisms, operator chaining, and both self‑generated and Kafka‑sourced workloads, and provides practical optimization recommendations based on the results.

Big Data · Data Transmission · Flink
0 likes · 10 min read
Big Data Technology & Architecture
Jun 2, 2019 · Big Data

Tencent's Oceanus Real-Time Stream Computing Platform and Flink Optimizations

The article presents Tencent's evolution of real‑time stream processing using Flink, the design of the Oceanus one‑stop visual platform, and a series of deep extensions and optimizations—including UI redesign, JobManager failover, checkpoint handling, enhanced windows, LocalKeyBy, idle detection, and log isolation—aimed at supporting petabyte‑scale data workloads.

Big Data · Flink · Oceanus
0 likes · 16 min read
Java Captain
Jun 2, 2019 · Big Data

Comprehensive Guide to Autumn Recruitment: Strategies, Learning Paths, and Interview Questions for Java and Big Data Positions

This article provides a detailed roadmap for candidates preparing for the autumn recruitment season, covering interview experience sharing, systematic learning routes, project preparation, essential Java and big‑data technologies, core algorithms, and practical interview question collections to help readers avoid common pitfalls and succeed in securing offers.

Algorithms · Autumn Recruitment · Big Data
0 likes · 18 min read
Big Data Technology & Architecture
Jun 1, 2019 · Big Data

Understanding Spark Executor Memory Management: On‑Heap, Off‑Heap, and Unified Memory

This article explains Spark's executor memory architecture, covering on‑heap and off‑heap memory planning, static and unified memory managers, storage and execution memory allocation, RDD persistence, eviction policies, and shuffle memory usage, providing practical guidance for performance tuning.

Big Data · Executor · Memory Management
0 likes · 23 min read
Big Data Technology & Architecture
May 30, 2019 · Big Data

Data Skew Optimization Techniques in Spark

This article explains the phenomenon, causes, detection methods, and a comprehensive set of solutions—including Hive preprocessing, key filtering, shuffle parallelism, two‑stage aggregation, map‑join, sampling, random prefixing, and combined strategies—to mitigate data skew in Spark jobs and improve performance.

Big Data · Data Skew · Shuffle
0 likes · 31 min read
Big Data Technology & Architecture
May 29, 2019 · Cloud Native

Real-Time Computing Solutions with Flink and HBase: Architecture, Market Analysis, and Use Cases

The article presents Alibaba Cloud's real-time computing solution based on Flink and HBase, covering market competition, open‑source ecosystem, containerized architecture on Kubernetes, and typical applications such as online education video analysis, city‑brain traffic management, and fraud detection.

Big Data · Cloud Native · Flink
0 likes · 12 min read
ITPUB
May 29, 2019 · Big Data

How to Build a Trillion-Scale Real-Time Data Platform: Lessons from DTCC 2019

In a DTCC 2019 keynote, Zhao Qun, director of big‑data platform at Percent Point, outlines the challenges of trillion‑scale real‑time analytics and presents a transparent, fine‑grained architecture built on Kafka, Spark Streaming, ClickHouse, HBase, Ceph and Elasticsearch, detailing design principles, component sizing, multi‑center deployment, performance testing and operational safeguards.

Architecture · Big Data · Kafka
0 likes · 17 min read
Big Data Technology & Architecture
May 28, 2019 · Big Data

Optimizing Flink Shuffle: New Flow‑Control Mechanism, Serialization Improvements, and Architecture Refactoring

The article explains how Flink's shuffle pipeline—from upstream data serialization to downstream consumption—is optimized through a credit‑based flow‑control mechanism, zero‑copy network buffers, broadcast serialization reduction, external shuffle service, and a plugin‑based shuffle manager, resulting in significant performance gains for both streaming and batch jobs.

Big Data · Flink · Flow Control
0 likes · 15 min read
MaGe Linux Operations
May 28, 2019 · Big Data

Recreating Google Ngram Trends with Python, PyTubes, and NumPy

This article demonstrates how to download the Google 1‑gram dataset, load and filter billions of rows with the PyTubes library, compute yearly word frequencies using NumPy, and reproduce the classic Python usage trend chart while discussing performance considerations and future improvements.

Big Data · Google Ngram · NumPy
0 likes · 9 min read
21CTO
May 24, 2019 · Operations

How Meituan’s R&D Team Cut Tens of Millions in Resource Costs: A Practical Guide

This article details Meituan's R&D team's systematic PDCA‑based approach to resource cost optimization, covering methodology definition, planning, execution, checking, and iterative improvement across infrastructure, big‑data, and shared services, ultimately saving tens of millions of yuan.

Big Data · Cost Optimization · Operations
0 likes · 22 min read
dbaplus Community
May 21, 2019 · Big Data

How to Supercharge Elasticsearch Queries on Billions of Records

This article explains why Elasticsearch can be slow on massive datasets, then details practical techniques—leveraging filesystem cache, pre‑heating hot data, separating hot and cold indices, designing lean document models, and avoiding deep pagination—to achieve sub‑second query performance on billions of records.

Big Data · Elasticsearch · data modeling
0 likes · 11 min read
Big Data Technology & Architecture
May 19, 2019 · Big Data

Implementing End-to-End Exactly-Once Semantics in Apache Flink with Apache Kafka Using Two-Phase Commit Sink

This article explains how Apache Flink’s TwoPhaseCommitSinkFunction, introduced in version 1.4, enables end-to-end exactly-once semantics when integrated with Apache Kafka, detailing the checkpoint mechanism and the two-phase commit protocol that ensures reliable data processing.

Apache Flink · Apache Kafka · Big Data
0 likes · 4 min read
Qunar Tech Salon
May 16, 2019 · Big Data

Optimizing HDFS Federation Data Migration with FastCopy and qFastCopy at Qunar

This article describes the challenges of scaling Qunar's Hadoop NameNode, introduces HDFS Federation and the FastCopy tool, presents performance tests comparing FastCopy with DistCp, and details the development and evaluation of an optimized qFastCopy solution that substantially shortens multi‑petabyte migrations that previously took hours.

Big Data · Data Migration · FastCopy
0 likes · 8 min read
dbaplus Community
May 13, 2019 · Big Data

Tackling HDFS Performance Bottlenecks: Real‑World Optimizations from VIP.com

This article examines the performance challenges encountered after upgrading a large‑scale HDFS cluster at VIP.com, explains the root causes of NameNode RPC latency, and presents concrete solutions—including delayed block reports, configurable block deletion, federation redesign, client monitoring, temp‑directory sharding, and small‑file handling—along with configuration snippets and real‑world results.

Big Data · Federation · HDFS
0 likes · 13 min read
DataFunTalk
May 13, 2019 · Artificial Intelligence

Financial Risk Management: Business Requirements and Technical Solutions

This article presents a comprehensive overview of financial risk management, detailing business challenges such as identity verification and fraud, and describing technical solutions including feature engineering, sample handling, model optimization, and online validation, emphasizing the integration of data-driven AI techniques throughout the process.

Big Data · Risk management · financial modeling
0 likes · 13 min read
Big Data Technology & Architecture
May 12, 2019 · Big Data

Understanding Spark Streaming Integration with Kafka: Receiver-based and Direct Approaches

This article explains Spark Streaming’s architecture, core concepts such as DStream, windowing, and the two Kafka integration methods—Receiver-based and Direct approaches—detailing their configurations, memory implications, checkpointing, and best‑practice recommendations for reliable, high‑throughput real‑time data processing.

Big Data · Direct Approach · Receiver Approach
0 likes · 18 min read
Architecture Digest
May 11, 2019 · Cloud Native

Ant Financial’s Fifteen‑Year Technology Architecture Evolution and the Future of FinTech

In a QCon 2019 talk, Ant Financial’s deputy CTO Hu Xi outlines the company’s fifteen‑year journey reshaping payments and micro‑loans through blockchain, AI, security, IoT and cloud computing, and details the emerging cloud‑native, high‑availability, data‑intelligent architecture that will underpin the next generation of financial technology.

Artificial Intelligence · Big Data · Blockchain
0 likes · 16 min read
DataFunTalk
May 10, 2019 · Artificial Intelligence

Pony.ai Infrastructure Overview: Vehicle Systems, Simulation Platform, and Data Architecture

The article presents a comprehensive overview of Pony.ai's autonomous driving infrastructure, covering the core infrastructure team’s responsibilities, vehicle onboard systems, simulation platform, data architecture, and supporting services, while discussing the technical challenges and engineering practices employed to achieve scalability, reliability, and high performance.

AI · Big Data · Infrastructure
0 likes · 14 min read
Alibaba Cloud Developer
May 10, 2019 · Cloud Native

How Ant Group Built a Cloud‑Native, Financial‑Grade Architecture Over 15 Years

Ant Group’s former CTO Hu Xi outlines the 15‑year evolution of its fintech architecture, highlighting the five BASIC technologies—blockchain, AI, security, IoT, and cloud computing—while detailing the shift to cloud‑native, distributed middleware, OceanBase, service mesh, risk‑auto‑recovery, and open‑intelligent data platforms.

Big Data · Blockchain · Distributed Systems
0 likes · 18 min read
AntTech
May 9, 2019 · Cloud Native

Ant Financial’s Fifteen‑Year Technology Architecture Evolution and the Future of FinTech

The article reviews Ant Financial’s fifteen‑year journey reshaping payments and micro‑loans through blockchain, AI, security, IoT and cloud computing, explains how distributed middleware, OceanBase, service‑mesh‑based cloud‑native infrastructure and open intelligent computing architectures enable high‑availability, scalable financial services, and introduces the BASIC College talent program.

Artificial Intelligence · Big Data · Blockchain
0 likes · 16 min read
Big Data Technology & Architecture
May 5, 2019 · Databases

Designing Effective RowKeys in HBase

This article explains why HBase rowkey design is critical for performance, outlines common interview expectations, and provides visual guidelines to help developers create efficient rowkeys for production workloads, including best‑practice tips on key length, salting, and ordering to avoid hotspotting.

Big Data · Database design · rowKey
0 likes · 1 min read
Didi Tech
May 1, 2019 · Artificial Intelligence

New Generation AI Empowering the Era of Smart Mobility – Insights from Didi’s Chief Scientist Tang Jian

Chief Scientist Tang Jian explains how Didi leverages next‑generation AI—big‑data, hybrid‑augmented, autonomous, and collective intelligence—to transform smart mobility through advanced dispatch, safety systems, in‑car perception, traffic‑signal optimization, and global collaborations, while confronting challenges of model scale, computing power, and safety assurance.

Artificial Intelligence · Big Data · Didi
0 likes · 11 min read
21CTO
Apr 29, 2019 · Big Data

How EasyScheduler Powers Scalable Big Data Workflow Management

EasyScheduler is an open‑source big‑data workflow scheduler that uses a decentralized architecture with Master and Worker nodes coordinated via ZooKeeper, supporting DAG‑based task definitions, various task types, fault tolerance, priority handling, distributed locks, and remote logging, all illustrated with detailed component diagrams.

Big Data · DAG · Distributed Systems
0 likes · 17 min read
Youzan Coder
Apr 29, 2019 · Big Data

Optimizing Flink Sliding Windows for Super Long Time Ranges

To overcome severe performance degradation of Flink sliding windows over very long time ranges, Youzan engineers applied time‑slicing based on the greatest common divisor of window length and slide step, reducing state writes and timers, which yielded 3‑8× speedups in production.

Big Data · Flink · Real-time Processing
0 likes · 18 min read
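
The time-slicing idea in the summary above can be sketched roughly as follows. This is a plain-Python illustration, not Youzan's Flink code: the class and method names are hypothetical, and a simple count stands in for the real aggregate. Each event is written to exactly one slice whose length is the greatest common divisor of the window length and slide step, and a window result is assembled from whole slices when it fires.

```python
from collections import defaultdict
from math import gcd


def slice_length(window_size: int, slide: int) -> int:
    """Length of one time slice: the GCD of window length and slide step."""
    return gcd(window_size, slide)


class SlicedSlidingCounter:
    """Counts events per sliding window while storing each event only once."""

    def __init__(self, window_size: int, slide: int) -> None:
        self.window_size = window_size
        self.slide = slide
        self.slice = slice_length(window_size, slide)
        self.slices = defaultdict(int)  # slice start -> partial count

    def add(self, timestamp: int) -> None:
        # One state write per event, however many windows overlap it.
        start = timestamp - timestamp % self.slice
        self.slices[start] += 1

    def window_count(self, window_end: int) -> int:
        # Window [end - size, end) is the union of size / slice whole slices.
        # Assumes window_end is aligned to a multiple of the slide step.
        first = window_end - self.window_size
        return sum(
            self.slices.get(s, 0)
            for s in range(first, window_end, self.slice)
        )
```

For a 1440-minute window sliding every 5 minutes, an element-per-window approach would touch 288 windows per event, whereas the sliced version writes a single 5-minute slice and merges 288 partial results only when a window is read.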
Big Data Technology & Architecture
Apr 24, 2019 · Big Data

Hive SQL Optimization Techniques and Best Practices

This article provides a comprehensive guide to Hive SQL performance tuning, covering optimization goals, common pitfalls, execution flow, table and job settings, map, shuffle, reduce, and query-level improvements such as join, bucket join, group‑by, and count‑distinct optimizations.

Big Data · Hadoop · hive
0 likes · 11 min read
Efficient Ops
Apr 23, 2019 · Information Security

How Situational Awareness Transforms Modern Cybersecurity Defense

The article explains how situational awareness—covering pre‑attack, during‑attack, and post‑attack stages—leverages big data, AI, threat intelligence, UEBA and visualization to turn security platforms into proactive “security brains,” while also critiquing current product implementations and market practices.

Big Data · Threat Intelligence · UEBA
0 likes · 14 min read
Didi Tech
Apr 23, 2019 · Big Data

Travel Time Index (TTI): Evaluation Methods, Calculation, and Validation Using Didi Trajectory Data

The Travel Time Index (TTI) quantifies urban congestion by comparing actual travel time to free‑flow conditions, and this study details domestic and international evaluation methods, free‑flow speed estimation, weight calculation, link extraction via PostGIS, system architecture, and validation using massive Didi trajectory data to support city traffic management.

Big Data · GIS · PostGIS
0 likes · 9 min read
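
As a rough illustration of the ratio described in the summary above, the sketch below computes a per-link index as free-flow speed divided by actual speed (equivalently, actual travel time over free-flow travel time) and combines links with a weighted average. The summary does not say which weights the study uses, so weighting by link length is an assumption here, as are all function names.

```python
from typing import Optional, Sequence, Tuple

# (length_km, actual_speed_kmh, free_flow_speed_kmh); layout is hypothetical
Link = Tuple[float, float, float]


def link_tti(actual_kmh: float, free_flow_kmh: float) -> float:
    """Actual travel time over free-flow travel time for one link."""
    return free_flow_kmh / actual_kmh


def network_tti(links: Sequence[Link],
                weights: Optional[Sequence[float]] = None) -> float:
    """Weighted average of per-link TTI; defaults to link length as weight."""
    if weights is None:
        weights = [length for length, _, _ in links]
    total = sum(weights)
    return sum(
        w * link_tti(actual, free)
        for w, (_, actual, free) in zip(weights, links)
    ) / total
```

A value of 1.0 means traffic moves at free-flow speed; 2.0 means trips take twice as long as they would on an empty road.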
Youku Technology
Apr 22, 2019 · Artificial Intelligence

Exploring the Construction of an Entertainment Brain: AI and Big Data Practices in the Fish Brain Platform

The talk introduces Alibaba’s Fish Brain platform, an AI‑powered decision‑support system for entertainment that combines a three‑layer data‑model, AI‑processed basic data, and application models, leveraging NLP, computer‑vision, custom embeddings, loss functions and predictive hybrid networks to analyze content, user behavior, and forecast performance.

AI · Big Data · Embedding
0 likes · 12 min read
Didi Tech
Apr 18, 2019 · Big Data

Big Data-Driven Smart Transportation Lecture by Didi and High Education Community

Didi’s vice‑president and chief scientist of smart transportation, Professor Henry Liu, delivered the “Big Data‑Driven Smart Transportation” lecture—part of the AI Industry Applications course—on China University MOOC, teaching students fundamental concepts, real‑world cases, and future prospects of big‑data and AI in traffic management.

Artificial Intelligence · Big Data · Didi
0 likes · 3 min read
Alibaba Cloud Developer
Apr 18, 2019 · Big Data

How MaxCompute Evolved: 10 Years of Big Data Innovation at Alibaba

This article reviews a decade of MaxCompute development, covering its origins, core technologies, performance gains, ecosystem integration, intelligent features, competitive positioning, and commercialization, while highlighting the platform's role as Alibaba's central big‑data compute engine.

AI integration · Big Data · MaxCompute
0 likes · 21 min read
Big Data Technology & Architecture
Apr 17, 2019 · Big Data

Step-by-Step Guide to Installing Hive 2.1.0 on a Hadoop 2.7.1 Cluster (Ubuntu 14.04)

This tutorial provides a comprehensive, step-by-step procedure for setting up Hive 2.1.0 on a Hadoop 2.7.1 cluster running Ubuntu 14.04, covering environment preparation, Hive installation, configuration of environment variables, MySQL metastore integration, client setup, service startup, and basic verification commands.

Big Data · Hadoop · Installation
0 likes · 8 min read
DataFunTalk
Apr 17, 2019 · Artificial Intelligence

Evolution of Ctrip Financial Risk Control Models: From Data Platform to AI‑Driven Scoring and Anti‑Fraud Systems

This report details Ctrip Financial's end‑to‑end risk control development, covering business overview, a three‑layer data platform, the progression of credit scoring and anti‑fraud models from rule‑based to advanced AI techniques, and the evaluation, monitoring, and social‑network‑based fraud detection strategies employed.

Big Data · Financial AI · anti-fraud
0 likes · 16 min read
dbaplus Community
Apr 16, 2019 · Big Data

Scaling Elasticsearch for Billions of Daily Events: Cluster Planning, Routing & Hot‑Warm Tips

This article explains how to handle a real‑time OLAP monitoring platform processing 10‑12 billion daily events and 400 billion yearly records by optimizing Elasticsearch 5.3.3 through cluster planning, storage strategies, index sharding, compression, hot‑warm architecture, routing, index templates, rollover, and cross‑cluster search, providing concrete configurations and code examples.

Big Data · Cluster Planning · Elasticsearch
0 likes · 23 min read
21CTO
Apr 15, 2019 · Big Data

Mastering High‑Concurrency Big Data: Sharding, Partitioning, and Index Strategies

This article explores practical techniques for handling massive, high‑concurrency data workloads, covering relational database limits, read/write separation, vertical and horizontal sharding, key selection, archival to NoSQL stores, and the use of heterogeneous index tables to maintain performance.

Big Data · Partitioning · database scaling
0 likes · 6 min read
Alibaba Cloud Developer
Apr 15, 2019 · Artificial Intelligence

Why Deep Learning Finally Succeeded and What Challenges Lie Ahead

This article reviews Jia Yangqing’s insights on why deep learning finally succeeded—highlighting the roles of big data and high‑performance computing—while examining its current limitations, emerging challenges, and future opportunities across AI engineering, AutoML, and hardware‑software co‑design.

AI Challenges · AI Engineering · AutoML
0 likes · 9 min read
JD Retail Technology
Apr 10, 2019 · Databases

HBase at JD.com: Architecture, Use Cases, and Evolution

This article explains how JD.com leverages the open‑source HBase database for massive, low‑latency data storage across various business lines, detailing its architecture, multi‑tenant isolation, disaster‑recovery mechanisms, and integration with Phoenix SQL for OLTP workloads.

Big Data · Database Architecture · HBase
0 likes · 13 min read
Java Captain
Apr 9, 2019 · Big Data

Kafka FAQs: Zookeeper Dependency, Retention Policies, Cleanup Rules, Performance Bottlenecks, and Cluster Best Practices

This article answers common Kafka questions, explaining why Kafka cannot operate without Zookeeper, describing its two retention strategies based on time and size, detailing how simultaneous time‑ and size‑based cleanup works, listing performance bottlenecks, and offering practical guidelines for sizing and configuring Kafka clusters.

Big Data · Cluster Design · Kafka
0 likes · 2 min read
Youzan Coder
Apr 7, 2019 · Industry Insights

How Youzan Scaled Order Search: Hot‑State Indexing and AKF Expansion

This article reviews the evolution of Youzan's order search architecture over two years, detailing challenges from data growth, the creation of a hot‑state index covering half of search traffic, time‑sharded indexes, and the AKF expansion cube that guides multi‑axis scalability.

Big Data · Elasticsearch · Scalability
0 likes · 10 min read
Big Data Technology & Architecture
Apr 3, 2019 · Big Data

Understanding RAID and Its Role in HDFS Architecture

This article explains the storage challenges of big data, introduces RAID technologies and their variants, and shows how the principles of RAID are applied in the Hadoop Distributed File System (HDFS) to achieve scalable, reliable, and high‑performance data storage and processing.

Big Data · HDFS · RAID
0 likes · 10 min read
Alibaba Cloud Developer
Apr 3, 2019 · Cloud Computing

What Alibaba Cloud’s New President Reveals About the Future of Cloud Computing

In a candid interview, Alibaba Cloud’s new president discusses how pricing is just a starting point, the shift from open‑source to self‑developed data platforms, the rapid growth of hybrid cloud, security priorities, the role of AI, the evolution of the middle‑platform concept, ecosystem integration, and the strategic focus on scaling, public‑cloud share, and partner collaboration to drive Alibaba Cloud’s future growth.

AI · Alibaba Cloud · Big Data
0 likes · 31 min read
Alibaba Cloud Developer
Apr 3, 2019 · Cloud Computing

What’s Next for Cloud Computing? Insights from Alibaba Cloud’s New President

In a detailed interview, Alibaba Cloud’s new president discusses the future of cloud computing, emphasizing the shift from price competition to core value, the importance of hybrid cloud, data processing platforms, open‑source challenges, AI integration, ecosystem strategy, and the evolving role of the cloud as a platform and integrated service.

Alibaba Cloud · Artificial Intelligence · Big Data
0 likes · 28 min read
Programmer DD
Apr 2, 2019 · Backend Development

From Freshman to Senior Engineer: A Developer’s Journey Through Java, Spring, and Big Data

This article chronicles a Chinese computer science graduate’s step‑by‑step evolution from learning basic C and Java in university to building campus apps, winning software contests, mastering Spring, Hadoop, Elasticsearch, and Neo4j, and ultimately landing offers from top tech firms, illustrating the challenges and perseverance required for a successful software engineering career.

Big Data · career · java
0 likes · 13 min read