Tagged articles
946 articles
Big Data Technology & Architecture
Feb 25, 2019 · Big Data

Understanding Flink DataSetAPI and DataStreamAPI

This article introduces Apache Flink's DataSetAPI and DataStreamAPI, explains their source, transformation, and sink concepts, highlights the key differences in transformation handling, and notes the series' goal of publishing over 500 big‑data tutorials for learners from beginner to expert.

Big Data · DataSetAPI · DataStreamAPI
0 likes · 2 min read
Qunar Tech Salon
Feb 20, 2019 · Big Data

Building Real-Time User Behavior Engineering with Apache Flink: Architecture, Features, and Implementation

This article introduces the design and implementation of a real‑time user behavior engineering platform at Qunar using Apache Flink, covering Flink's core characteristics, distributed runtime, DataStream programming model, fault‑tolerance, back‑pressure handling, event‑time processing, windowing, watermarks, and practical code examples for filtering, splitting, joining, and state management.

Checkpoint · DataStream · EventTime
0 likes · 18 min read
Big Data Technology & Architecture
Feb 15, 2019 · Big Data

Big Data Mastery Roadmap

This article outlines a comprehensive series of over 500 planned tutorials covering Java advanced features, distributed theory, Hadoop, Spark, Flink, and various big‑data storage and processing technologies, designed to guide engineers transitioning into big‑data development from fundamentals to expert level.

Distributed Systems · Flink · Hadoop
0 likes · 4 min read
NetEase Game Operations Platform
Jan 25, 2019 · Big Data

Understanding Exactly-Once Semantics in Apache Flink: Challenges and Implementation

This article analyzes the difficulties of achieving exactly-once delivery in Apache Flink, explains the distinction between state and end‑to‑end semantics, and details how idempotent and transactional sinks—illustrated with the Bucketing File Sink—realize exactly‑once guarantees through checkpoint‑based two‑phase commit.

Big Data · Exactly-Once · Flink
0 likes · 13 min read
Youzan Coder
Jan 16, 2019 · Big Data

How Youzan Scaled Real‑Time Analytics with Flink: Architecture, Pitfalls, and Lessons

This article walks through Youzan's real‑time platform architecture, explains why Flink was chosen over Spark Structured Streaming, details practical challenges such as container over‑provisioning and monitoring overhead, shares solutions for Spring integration and async caching, and outlines future directions for SQL‑based streaming and scheduler improvements.

Big Data · Flink · Real-time Streaming
0 likes · 19 min read
Architects Research Society
Dec 30, 2018 · Big Data

Overview of Major Apache Big Data Processing Frameworks

This article provides a concise overview of numerous Apache open‑source projects—including Ignite, MapReduce, Pig, JAQL, Spark, Storm, Flink, Apex, REEF, Twill, and Beam—that enable distributed in‑memory storage, real‑time and batch processing, and advanced analytics for large‑scale data workloads.

Apache · Big Data · Flink
0 likes · 22 min read
Alibaba Cloud Developer
Dec 24, 2018 · Artificial Intelligence

Explore Alibaba’s New Open‑Source AI Framework, Massive Cluster Dataset, and Blink Engine

This newsletter introduces Alibaba’s X‑DeepLearning deep‑learning framework, releases a large‑scale cluster dataset for research, announces the upcoming open‑sourcing of the Blink streaming engine, showcases the futuristic FlyZoo Hotel, and shares a welcome letter to new hires, highlighting cutting‑edge technologies across AI, big data, and cloud operations.

AI · Alibaba · Cluster Data
0 likes · 7 min read
Didi Tech
Dec 18, 2018 · Big Data

Evolution and Architecture of Didi's Real-Time Computing Platform

From early self‑built Storm and Spark Streaming clusters to a unified YARN‑based Spark platform and finally a low‑latency Flink system with extended CEP and StreamSQL capabilities, Didi’s real‑time computing platform evolved through three stages, delivering multi‑tenant isolation, rich SQL processing, and dramatically reduced development costs.

Big Data · CEP · Flink
0 likes · 9 min read
DataFunTalk
Dec 18, 2018 · Big Data

Flink-based Real-time Data Warehouse Practice at Yanxuan

This talk presents Yanxuan’s real‑time data warehouse built on Flink, covering background challenges, overall architecture and implementation, data quality measures, monitoring, and practical application scenarios, while highlighting design goals of flexibility, high development efficiency, and stringent data quality requirements.

Flink · Streaming · real-time data warehouse
0 likes · 14 min read
Xianyu Technology
Nov 6, 2018 · Big Data

Technical Evolution of Xianyu Real-Time Selection System for Double Eleven

To meet Double‑Eleven’s sub‑second, billion‑item feed demands, Alibaba’s Xianyu selection system evolved from a Solr‑based search pipeline through offline batch and PostgreSQL attempts to a Blink‑powered real‑time stream platform using Niagara’s low‑latency LSM storage, delivering high‑throughput, personalized product feeds.

Alibaba · Big Data · Flink
0 likes · 23 min read
dbaplus Community
Nov 1, 2018 · Big Data

How Vipshop Scales Real‑Time Data with Flink on Kubernetes

This article details Vipshop's real‑time platform architecture, the migration from Storm and Spark to Flink, Flink's deployment on Kubernetes, and the latest Unified Data Management system that unifies data access across Kafka, Redis, Tair and HDFS.

Big Data · Flink · Kubernetes
0 likes · 12 min read
ITPUB
Oct 23, 2018 · Big Data

How Meituan Built a Scalable Real‑Time Data Warehouse with Flink

This article explains how Meituan tackled growing real‑time data demands by redesigning its streaming platform, adopting a layered real‑time data warehouse architecture, selecting storage and compute technologies such as Cellar, Elasticsearch, Druid and Flink, and sharing practical tips on dimension expansion, joins, and aggregation to achieve higher throughput and lower latency.

Data Architecture · Flink · Meituan
0 likes · 15 min read
21CTO
Oct 19, 2018 · Big Data

How Meituan Scales Real‑Time Computing with Flink: Architecture, Challenges & Solutions

This article summarizes Meituan’s real‑time computing platform, detailing its layered architecture built on Kafka, Flink on YARN, state management, resource isolation, fault tolerance, monitoring, and the Petra metric aggregation system, while highlighting the challenges faced and the solutions implemented to achieve high‑throughput, low‑latency stream processing at massive scale.

Big Data · Flink · Real-time Streaming
0 likes · 18 min read
Meituan Technology Team
Oct 18, 2018 · Big Data

Building a Real-Time Data Warehouse with Flink at Meituan

Meituan replaced its Storm‑based pipeline with a four‑layer real‑time data warehouse powered by Flink, using hybrid storage (Cellar KV, Elasticsearch, Druid, MySQL) to deliver low‑latency, high‑throughput services, dramatically simplifying SQL‑driven development, unifying metrics, cutting compute costs, and paving the way for offline‑grade accuracy and reliability.

Flink · Meituan · Streaming
0 likes · 16 min read
Tencent Cloud Developer
Sep 6, 2018 · Big Data

Real-Time Stream Computing: Concepts, Challenges, and Tencent Cloud Solutions

As mobile and IoT data surge, real-time stream computing—especially Flink’s low-latency, high-throughput, exactly-once engine—addresses challenges of latency, accuracy, and usability, and Tencent Cloud’s managed Flink service provides elastic, secure, integrated pipelines for applications ranging from online status monitoring to fraud detection and smart transportation.

Apache Storm · Big Data · Flink
0 likes · 30 min read
Alibaba Cloud Developer
Aug 13, 2018 · Big Data

How Ele.me Evolved Its Real‑Time Engine: From Storm to Flink

This article examines Ele.me’s big‑data platform evolution, comparing Storm, Spark Streaming, Structured Streaming, and Flink, detailing their architectures, consistency semantics, performance trade‑offs, and why Flink became the preferred real‑time computation engine for the company.

Big Data · Flink · Spark
0 likes · 15 min read
Meitu Technology
Aug 2, 2018 · Big Data

Spark Streaming vs Flink – Architecture, Scheduling & Fault Tolerance

This article compares Spark Streaming and Flink across runtime models, component roles, programming APIs, task scheduling, time semantics, dynamic Kafka partition detection, fault‑tolerance mechanisms, exactly‑once guarantees, and back‑pressure handling, providing code examples and practical insights for real‑time data processing.

Dynamic Partition Detection · Exactly-Once · Flink
0 likes · 23 min read
JD Tech Talk
Aug 2, 2018 · Big Data

Real-Time Order Statistics with Apache Flink in a Data Aggregation Platform

This article explains how the data aggregation platform adopts Apache Flink for high‑throughput, low‑latency stream processing, covering the complete workflow from data source integration, transformation operations, windowing and time concepts, to a concrete order‑count example with custom aggregation logic.

Apache Flink · Event Time · Flink
0 likes · 10 min read
ITPUB
Jun 14, 2018 · Big Data

Why Suning.com Sticks with Hadoop: Insights into China’s Big Data Platform Choices

Amid reports of declining Hadoop usage, Suning.com’s 2018‑2020 big‑data platform case study reveals why the retailer still relies on Hadoop’s mature ecosystem, how it integrates HDFS, HBase, YARN, Hive, Spark, Flink and emerging tools, and what future resource‑management plans it envisions.

Data Platform · Flink · Hadoop
0 likes · 11 min read
Ctrip Technology
Jun 4, 2018 · Big Data

Real-Time Data Processing Frameworks and Kafka Practices at Ctrip Ticketing

This article examines Ctrip Ticketing's real-time data processing ecosystem, comparing batch and streaming frameworks such as Hadoop, Spark, Storm, Flink, and Spark Streaming, detailing Kafka deployment and configuration, and describing how these technologies are applied in production for log analysis, seat‑occupancy detection, and anti‑crawling.

Flink · Real-time Processing · Spark Streaming
0 likes · 12 min read
iQIYI Technical Product Team
Jan 31, 2018 · Big Data

Evolution of iQIYI Real-Time Big Data Collection System

iQIYI’s big‑data collection system has progressed from simple HTTP log uploads to a Flume‑Kafka pipeline and finally to a custom Venus‑Agent architecture with centralized configuration, persistent offsets, dual‑Kafka streams and Flink processing, now handling tens of millions of queries per second and over three hundred billion records daily to power its AI‑driven services.

Big Data · Flink · Flume
0 likes · 15 min read
Architecture Digest
Dec 16, 2017 · Big Data

Performance Comparison of Apache Flink and Apache Storm for Real‑Time Stream Processing

This report presents a systematic performance evaluation of Apache Flink and Apache Storm across multiple real‑time processing scenarios, measuring throughput, latency, message‑delivery semantics, and state‑backend effects, and provides recommendations for selecting the most suitable engine based on the observed results.

Big Data · Flink · Real-time analytics
0 likes · 21 min read
21CTO
Aug 14, 2017 · Big Data

Unveiling Flink’s Multi‑Layer Execution Graph: From StreamGraph to Physical Deployment

This article explains Flink’s architecture, detailing the roles of Client, JobManager and TaskManager, walks through a SocketTextStreamWordCount example, and clarifies the four‑layer graph model—StreamGraph, JobGraph, ExecutionGraph, and the physical execution graph—highlighting why each layer exists.

Big Data · Execution Graph · Flink
0 likes · 9 min read
Architecture Digest
Aug 13, 2017 · Big Data

Understanding Flink Architecture, Job Example, and Execution Graph Layers

This article explains Flink’s cluster architecture, the roles of Client, JobManager and TaskManager, demonstrates a SocketTextStreamWordCount example, and details the four-layer execution graph (StreamGraph, JobGraph, ExecutionGraph, Physical Execution Graph) to illustrate how Flink schedules and runs streaming jobs.

Execution Graph · Flink · JobManager
0 likes · 9 min read
Alibaba Cloud Developer
Aug 4, 2017 · Big Data

From PhD to Alibaba: How a Real‑Time Streaming Expert Built Blink on Flink

Alibaba algorithm engineer Shi Xiaogang shares his journey from a Peking University PhD researching real‑time iterative computation on data streams to developing Blink’s state management and recovery features in Flink, highlighting the challenges of transitioning from academia to industry and the impact of large‑scale real‑time systems.

Flink · Real-time Streaming · algorithm engineering
0 likes · 13 min read
Alibaba Cloud Developer
May 25, 2017 · Big Data

How Alibaba’s Blink Engine Redefines Real‑Time Big Data Processing

This article explains how Alibaba’s Blink, built on Apache Flink, transforms batch‑oriented big‑data platforms into a unified, high‑performance real‑time computing engine, detailing its architecture, state management, checkpointing, and successful deployment in e‑commerce, search, recommendation, and online machine‑learning scenarios.

Alibaba · Big Data · Flink
0 likes · 17 min read
Architect
Jun 10, 2016 · Big Data

Understanding Session Windows in Apache Flink

This article explains the concept, implementation, and underlying mechanics of session windows in Apache Flink, covering how Flink assigns and merges windows, the relevant APIs such as SessionWindows.withGap, and detailed source code analysis for both the window assigner and trigger handling.

Flink · MergingWindowAssigner · session window
0 likes · 13 min read
Architect
May 25, 2016 · Big Data

How Flink Manages Memory to Overcome JVM Limitations

The article explains how Flink tackles JVM memory challenges by using proactive memory management, a custom serialization framework, cache‑friendly binary operations, and off‑heap memory techniques to reduce GC pressure, avoid OOM, and improve performance in big‑data workloads.

Big Data · Flink · JVM
0 likes · 17 min read
21CTO
Nov 19, 2015 · Big Data

Beyond Hadoop: Modern Big Data Platforms and Technologies Explained

This article surveys the evolution of Hadoop and its ecosystem, explains core storage and processing concepts, and introduces contemporary big‑data technologies such as Spark, Flink, Kafka, Lambda architecture, NoSQL databases, and cloud‑native solutions, highlighting their roles and trade‑offs.

Big Data · Flink · Hadoop
0 likes · 17 min read
Efficient Ops
Oct 14, 2015 · Big Data

Spark vs Hadoop, Flink, HBase/Cassandra, Kafka & Tachyon: Expert Q&A

During a lively “Sit and Discuss” session, experts compared Spark and Hadoop, evaluated Flink against Spark, contrasted HBase with Cassandra, explained why Kafka (and sometimes Flink) is preferred for distributed messaging, and shared insights on Tachyon’s role in modern big‑data ecosystems.

Flink · HBase · Hadoop
0 likes · 10 min read