Tagged articles
946 articles
Architect
May 19, 2021 · Big Data

Flink-Based Real-Time Recommendation System Architecture and Deployment Guide

This article gives an overview of a Flink-powered real-time recommendation system, covering its v2.0 architecture, module functions, recommendation algorithms, front‑end and back‑end interfaces, Docker‑based deployment of MySQL, Redis, HBase, and Kafka, and step‑by‑step startup procedures.

Docker · Flink · HBase
0 likes · 9 min read
DataFunTalk
May 18, 2021 · Big Data

Evolution and Architecture of Beike Real-Time Computing Platform

Beike's real-time computing platform, led by Liu Liyun, has evolved from early Spark Streaming to a Flink-based system with SQL 1.0, 2.0, and upcoming 3.0, supporting a large-scale data warehouse, event-driven processing, extensive monitoring, and diverse business scenarios across the company's operations.

Event-driven · Flink · Real-time Streaming
0 likes · 14 min read
DataFunTalk
May 14, 2021 · Big Data

Real‑time Billion‑Scale Data Transmission and AI Pipeline Architecture at Bilibili

This article presents a technical deep‑dive into Bilibili’s evolution from offline to real‑time data processing, describing the challenges of timeliness, ETL, AI feature engineering, and the design of a Flink‑on‑YARN incremental pipeline that supports trillion‑scale message throughput and AI‑driven real‑time applications.

AI · Big Data · Flink
0 likes · 27 min read
DataFunTalk
May 11, 2021 · Big Data

Design and Practice of Baixin Bank's Flink‑Based Real‑Time Computing Platform and Hudi‑Powered Real‑Time Data Lake

This article details Baixin Bank's construction of a Flink‑driven real‑time computing platform integrated with Hudi as a real‑time data lake, covering background, architecture, data collection, transformation, storage layers, technical challenges, future roadmap, and practical lessons for similar big‑data initiatives.

Big Data · Flink · Hudi
0 likes · 12 min read
DataFunTalk
May 4, 2021 · Big Data

Design and Implementation of a Real-Time Data Transmission Platform Based on Apache Flink at AutoHome

This article presents the background, requirements, architectural design, component interaction, and implementation details of AutoHome's real‑time data transmission platform built on Apache Flink, highlighting its high availability, exactly‑once semantics, scalability, DDL handling, and integration with existing streaming services.

Apache Flink · Big Data · Data Streaming
0 likes · 18 min read
DataFunTalk
May 2, 2021 · Big Data

Continuous Optimization and Practice of Flink at Kuaishou

This article presents Kuaishou's comprehensive engineering practices for improving Flink's stability, task startup latency, and SQL performance, including high‑availability Kafka connectors, fault‑recovery mechanisms, I/O reductions, asynchronous job upgrades, aggregation optimizations, and future resource‑utilization plans.

Big Data · Flink · Kafka
0 likes · 10 min read
DataFunTalk
Apr 27, 2021 · Big Data

Implementing CDC‑to‑Hudi for Real‑Time Mutable Data in a Big Data System

This article describes how Linkflow migrated mutable customer data from MySQL to an Apache Hudi data lake using Debezium‑in‑Flink CDC, addressing challenges such as snapshot resumability, partial updates, row‑key merging, schema evolution, indexing, and concurrent writes to achieve minute‑level data freshness and improved offline processing performance.

Apache Hudi · Big Data · CDC
0 likes · 21 min read
DataFunTalk
Apr 23, 2021 · Big Data

Building and Evolving Zhihu’s Flink‑Based Data Integration Platform

This article details Zhihu’s transition from a Sqoop‑driven data integration system to a Flink‑centric platform, covering business scenarios, historical architecture, design goals, technology choices, performance optimizations, and future plans for unified streaming‑batch processing across diverse storage systems.

Batch Processing · Big Data · Data Integration
0 likes · 14 min read
dbaplus Community
Apr 20, 2021 · Big Data

10 Common Pitfalls When Migrating Spark Jobs to Flink (And How to Avoid Them)

This article shares ten practical pitfalls encountered when moving hourly Spark session jobs to Flink, covering parallelism load imbalance, state TTL, checkpointing strategies, logging, JMX debugging, state migration risks, reduce vs process choices, input data validation, event‑time handling, and external storage considerations, along with concrete configuration snippets and performance tips.

Flink · Spark migration · State Management
0 likes · 20 min read
IT Architects Alliance
Apr 20, 2021 · Big Data

Real-time Log Processing System Based on Flink and Drools

This article describes a real-time log processing platform that integrates Kafka, Flink, Drools rule engine, Redis, and Elasticsearch to unify heterogeneous log formats, extract business metrics, and provide configurable, dynamic data processing for large‑scale logging scenarios.

Drools · Elasticsearch · Flink
0 likes · 6 min read
dbaplus Community
Apr 17, 2021 · Big Data

How a Traditional Finance Firm Tackles Real‑Time Analytics with Flink

This article details a financial company's exploration of Apache Flink for real‑time processing, covering its unique business constraints, end‑to‑end data pipeline, single‑table and multi‑table use cases, implementation challenges, code snippets, data initialization, testing strategies, and lessons learned.

Financial · Flink · HBase
0 likes · 13 min read
Huolala Tech
Apr 16, 2021 · Cloud Native

How to Build a Scalable Kubernetes Logging Pipeline with EFK and Fluentd

This article explains how to collect, process, and visualize Flink job logs on Kubernetes using an EFK stack with Fluentd, covering logging architectures, deployment of Elasticsearch, Kibana, and Fluentd, and the backend logic for querying and displaying logs in a feature platform.

EFK · Elasticsearch · Flink
0 likes · 20 min read
Huolala Tech
Apr 8, 2021 · Big Data

Mastering PyFlink on Kubernetes: Practical Deployment Strategies and Lessons

This article explains how to deploy a PyFlink feature platform on Kubernetes, covering basic K8s concepts, Flink execution graphs, various deployment modes, preparation steps, detailed Standalone and Native deployment procedures, and practical tips for efficient big‑data processing.

Cloud Native · Deployment · Flink
0 likes · 16 min read
Big Data Technology & Architecture
Apr 6, 2021 · Big Data

Real-Time Computing and Data Warehouse Solutions with Apache Flink: Architecture, Technology Selection, and Implementation

This article explores the evolution of real-time computing in the big data domain, detailing Apache Flink's capabilities, architectural designs, and technology selections such as Kafka, Canal, HBase, and ClickHouse, and provides practical implementation guides and case studies from Alibaba, Tencent, and other enterprises.

Flink · Real‑Time Computing · data-warehouse
0 likes · 33 min read
Big Data Technology Architecture
Apr 5, 2021 · Big Data

Evolution of Real‑Time Data Warehouses: From 1.0 to 3.0 and the Road to Batch‑Stream Unified Architecture

The article reviews the current state of offline Hive‑based data warehouses, explains the emergence of real‑time data warehouses (1.0) built on Kafka and Flink, discusses their limitations, and outlines the progression toward batch‑stream unified architectures (2.0 and 3.0) leveraging data‑lake technologies such as Iceberg.

Batch-Stream Integration · Big Data · Flink
0 likes · 13 min read
Big Data Technology & Architecture
Apr 4, 2021 · Big Data

Flink Performance Tuning Guide: Memory Configuration, Parallelism, Checkpoint Optimization, and Common Issues

This guide details comprehensive Flink performance tuning techniques, covering memory configuration, GC settings, parallelism adjustments, process parameters, partitioning strategies, Netty network tuning, checkpoint optimization, and common issues such as data skew and resource bottlenecks.

Checkpoint · Flink · Memory Management
0 likes · 18 min read
DataFunTalk
Apr 3, 2021 · Big Data

Building a Real-Time Data Computing Platform for Tencent Games: Practices and Architecture

This article describes Tencent Games' end‑to‑end real‑time data platform, covering its construction background, the unified OneData development framework, the OneFun data‑service API layer, micro‑service and ServiceMesh management, and the operational benefits achieved through automation, standardization, and scalability.

Flink · Game Analytics · Microservices
0 likes · 14 min read
Big Data Technology & Architecture
Mar 29, 2021 · Big Data

Understanding Flink AggregateFunction, Session Windows, and Timer Mechanisms

This article explains how Flink's DataStream API uses AggregateFunction and session windows, details the MergingWindowAssigner and MergingWindowSet implementations, and demonstrates timer registration and processing with KeyedProcessFunction, providing full code examples and internal workflow analysis.

AggregateFunction · Flink · KeyedProcessFunction
0 likes · 25 min read
HelloTech
Mar 26, 2021 · Big Data

Data Quality and Interface Semantic Monitoring for Algorithm Testing Platform

The article describes how algorithm testing teams tackled data‑quality and interface‑semantic monitoring problems by building a unified business monitoring platform that checks table, storage and service consistency, validates response semantics, and, through dashboards, alerts and correction tools, quickly identified dozens of offline and online issues, guiding future reliability enhancements.

AI · Big Data · Data Quality
0 likes · 26 min read
iQIYI Technical Product Team
Mar 26, 2021 · Big Data

Evolution of iQIYI's Real-Time Big Data Ecosystem

iQIYI transformed its data infrastructure from a traditional offline T+1 model to a comprehensive real‑time ecosystem—leveraging Kafka, Flink, a three‑layer Stream Data Service Platform, the Talos drag‑and‑drop pipeline, and a Druid‑based analytics platform—to enable low‑latency monitoring, personalized recommendations, ad targeting, and continuous machine‑learning workflows while planning future stream‑batch integration and lake‑warehouse convergence.

Analytics · Big Data · Flink
0 likes · 13 min read
Big Data Technology & Architecture
Mar 23, 2021 · Big Data

Practical Implementations of Data Lakes: Huawei Production Scenario, Real-Time Financial Data Lake, and Soul's Delta Lake

This article presents a comprehensive overview of data lake implementations, detailing Huawei's production‑scene platform, a real‑time financial data lake architecture using Kafka, Flink and Iceberg, and Soul's Delta Lake practice with Spark, Hive, and custom ETL tools, highlighting design choices, processing flows, and operational considerations.

Data Lake · Delta Lake · Flink
0 likes · 8 min read
DataFunTalk
Mar 21, 2021 · Big Data

Single‑Point Recovery and Regional Checkpoint in Flink: Design, Implementation, and Optimizations

This article presents ByteDance's recent Flink enhancements, detailing a single‑point recovery mechanism for the network layer and a regional checkpoint strategy that together improve failover latency, reduce output loss, and enable scalable, high‑throughput stream processing for large‑scale real‑time recommendation workloads.

Big Data · Checkpoint · Flink
0 likes · 12 min read
Big Data Technology & Architecture
Mar 18, 2021 · Big Data

Flink Job Troubleshooting and Performance Optimization: Data Skew, Kafka Configuration, Resource Management, and Checkpoint Issues

This article details common Flink streaming problems such as data skew causing task back‑pressure, oversized Kafka messages, high‑throughput ack settings, slot removal errors, checkpoint timeouts, and resource constraints, and provides concrete configuration changes and architectural adjustments to resolve them.

Checkpoint · Data Skew · Flink
0 likes · 18 min read
DataFunTalk
Mar 15, 2021 · Big Data

Ten Gotchas When Migrating Spark Jobs to Flink

This article shares ten practical pitfalls encountered while moving hour‑level Spark session processing jobs to Apache Flink, covering parallelism skew, state TTL, checkpoint handling, logging, debugging, state migration, Reduce vs Process, input validation, event‑time handling, and the trade‑offs of storing data inside Flink.

Big Data · Flink · State Management
0 likes · 19 min read
DataFunTalk
Mar 1, 2021 · Artificial Intelligence

Online Learning and Real‑Time Model Updating in JD Retail Search Using Flink

The article describes JD's end‑to‑end online learning pipeline for retail search, covering the background, system architecture, real‑time feature collection, sample stitching, Flink‑based incremental training, parameter updates, and full‑link monitoring to achieve low‑latency, high‑accuracy model serving.

Flink · Model Serving · Online Learning
0 likes · 9 min read
DataFunTalk
Feb 22, 2021 · Big Data

Optimizing Flink Real-Time Task Resources: Memory and Message Processing Perspectives

This article explores practical methods for optimizing Flink real‑time task resources on Kubernetes, focusing on memory usage analysis via GC logs and message‑processing capacity assessment, proposing automated detection of over‑provisioned memory and CPU, and outlining a workflow for resource adjustment to reduce costs.

Big Data · Flink · GC Analysis
0 likes · 18 min read
dbaplus Community
Feb 18, 2021 · Big Data

How JD Search Scaled Real‑Time Analytics with Flink and Doris

This article details JD Search's journey from a Storm‑based pipeline to a Flink‑driven architecture backed by Apache Doris, covering business requirements, technical challenges, design trade‑offs, performance optimizations for massive traffic spikes, and future plans for their real‑time OLAP data warehouse.

Big Data · Flink · OLAP
0 likes · 12 min read
DataFunTalk
Feb 17, 2021 · Big Data

Apache Iceberg 0.11.0: New Partition Support, SortOrder, Flink Streaming Reader, and Ecosystem Integrations

The article details Apache Iceberg 0.11.0's core enhancements—including partition changes, SortOrder, extensive Flink and Spark integrations, CDC/Upsert support, hash‑based write distribution to reduce small files, and upcoming 0.12.0 roadmap—while providing practical SQL and API examples for data‑lake practitioners.

Apache Iceberg · Big Data · CDC
0 likes · 13 min read
Big Data Technology & Architecture
Feb 7, 2021 · Big Data

Building a Flink SQL Platform on Zeppelin: Installation, Configuration, and Advanced Use Cases

This guide walks through setting up Apache Zeppelin as a low‑cost, SQL‑centric development platform for Flink, covering environment preparation, installation, interpreter configuration, execution modes, verification, common pitfalls, dimension‑table joins, custom UDFs, Redis integration, and dual‑stream join techniques.

Flink · Streaming · UDF
0 likes · 24 min read
DataFunTalk
Feb 7, 2021 · Big Data

Optimizations and Extensions for Flink SQL in Tencent Real‑Time Computing Platform

This article, presented by Tencent senior engineer Du Li, details the current state of Flink SQL, compares Jar, Canvas, and SQL modes, introduces window‑function extensions, retract‑stream optimizations, and outlines future roadmap plans for cost‑based optimization and new features in the real‑time computing platform.

Big Data · Flink · Retract Stream
0 likes · 19 min read
Big Data Technology & Architecture
Feb 1, 2021 · Big Data

Flink 1.12 Enhancements: Full SQL Support, Hive Integration, and Streaming Write to Hive

The article reviews Flink 1.12's major enhancements, including comprehensive SQL capabilities, deep integration with Hive via catalog and streaming support, and a practical code example that demonstrates how to write streaming data into Hive tables while handling partition commits and small‑file merging.

Data Integration · Flink · Streaming
0 likes · 7 min read
TAL Education Technology
Jan 28, 2021 · Big Data

Batch-Stream Fusion in Education: TAL’s Real-Time Data Platform Practices

This article, presented by senior data platform engineer Mao Xiangyi of TAL Education, details the design and implementation of the company’s real‑time T‑Streaming platform, covering its three‑layer data architecture, batch‑stream integration techniques, making the ODS layer real‑time, the Flink SQL development workflow, hybrid‑cloud deployment, and a case study of K‑12 renewal reporting.

Batch-Stream Integration · Education Analytics · Flink
0 likes · 18 min read
DataFunTalk
Jan 28, 2021 · Big Data

Real-Time Financial Data Lake: Architecture, Practices, and Applications at Zhongyuan Bank

This talk by Ba Xueyu, a senior big data platform engineer at Zhongyuan Bank, outlines the background, architecture, and engineering practices of a real‑time financial data lake, highlighting its open, timely, and integrated design, streaming platform implementation, and use cases such as anti‑fraud and real‑time BI.

Flink · anti-fraud · financial analytics
0 likes · 15 min read
dbaplus Community
Jan 27, 2021 · Big Data

How We Upgraded a 1500-Node Flink Cluster to 1.10: Challenges and Solutions

Facing a massive 1500‑node Flink 1.4.2 cluster handling over 12,000 tasks and 30 trillion daily events, we migrated to Flink 1.10. This article details the new DDL/Catalog support, SQL enhancements, memory tuning, compatibility patches, extensive testing, and engine optimizations such as task‑load metrics and balanced sub‑task scheduling.

Big Data · Flink · Version Upgrade
0 likes · 13 min read
Youzan Coder
Jan 13, 2021 · Big Data

Flink Real-time Task Resource Optimization Practice at Youzan

At Youzan, Flink real‑time tasks running on Kubernetes are optimized by daily GC‑log memory analysis and Kafka‑throughput monitoring, which compute recommended heap sizes and parallelism adjustments to eliminate over‑provisioned CPU and memory, automate alerts, and pave the way for fully automated resource tuning.

Flink · GC tuning · Kubernetes
0 likes · 16 min read
Didi Tech
Jan 12, 2021 · Big Data

Upgrading DiDi Real‑time Computing Engine from Flink 1.4 to Flink 1.10: Challenges, Optimizations, and Lessons Learned

DiDi upgraded its massive real‑time computing engine from Flink 1.4.2 to Flink 1.10, implementing a transparent migration across 1500 machines, adding native DDL, binary rows, MiniBatch, improved scheduling and window functions, and establishing a rigorous testing pipeline that achieved 99.9% compatibility while preventing OOM issues.

Flink · PerformanceOptimization · RealTimeComputing
0 likes · 11 min read
Big Data Technology & Architecture
Jan 9, 2021 · Big Data

Comprehensive 2021 Flink Interview Questions and Answers

This article presents a detailed collection of 2021 Flink interview questions covering checkpoint mechanisms, watermarks, state backends, join types, fault tolerance, resource configuration, and recent Flink 1.10 features, providing concise explanations and code examples for each topic.

Checkpoint · Flink · State Backend
0 likes · 23 min read
NetEase Game Operations Platform
Jan 9, 2021 · Operations

Real-Time Log Intelligent Classification Practice

This article describes how NetEase built a real‑time log intelligent classification system using Flink and AI algorithms, detailing the challenges of massive log volumes, the Drain template‑extraction method, algorithm workflow, performance results, and a practical case study that demonstrates reduced alert storms and faster issue diagnosis.

AI · Drain algorithm · Flink
0 likes · 15 min read
Big Data Technology & Architecture
Dec 27, 2020 · Big Data

Understanding and Solving the Small File Problem in Big Data Systems

This article examines the pervasive small‑file issue in big‑data environments, explains its impact on storage and processing performance, and presents a comprehensive set of solutions—including file merging, Hadoop archives, SequenceFiles, HBase, CombineFileInputFormat, and Spark/Flink strategies—to mitigate metadata overhead and improve I/O efficiency.

Flink · Hadoop · NameNode
0 likes · 41 min read
Youzan Coder
Dec 21, 2020 · Big Data

Youzan Big Data Technology Salon – Cost Governance, Apache Iceberg, Flink, and Data‑Driven Growth

At Youzan’s Big Data Technology Salon, over 100 attendees heard leaders from Youzan, NetEase Yishu, and Didi discuss cost governance, Apache Iceberg data lakes, large‑scale Flink real‑time computing, and data‑driven growth strategies, highlighting practical implementations, savings of millions, and tools for merchant empowerment.

Apache Iceberg · Data Growth · Flink
0 likes · 5 min read
Big Data Technology & Architecture
Dec 20, 2020 · Big Data

Getting Started with Apache Zeppelin: Installation, Core Features, and Integration with JDBC, Spark, and Flink

This tutorial introduces Apache Zeppelin, explains REPL and Jupyter concepts, outlines its core features and project structure, and provides step‑by‑step instructions for installing Zeppelin, creating notebooks, and connecting to databases, Spark, and Flink with practical code examples.

Apache Zeppelin · Flink · Installation
0 likes · 11 min read
ITFLY8 Architecture Home
Dec 18, 2020 · Big Data

Unlocking the Data Middle Platform: From Ingestion to Real‑Time Analytics

This article provides a comprehensive overview of data middle platform concepts, covering data aggregation, collection tools, development modules, job scheduling, baseline control, heterogeneous storage, permission management, real‑time and offline processing, governance, services, and implementation details for building robust big‑data solutions.

Data Governance · Data Platform · ETL
0 likes · 25 min read
Big Data Technology & Architecture
Dec 16, 2020 · Big Data

Designing a Real‑Time Data Processing Platform with Flink: Architecture, Deployment, and Operations

This article explains how to build a real‑time data processing platform using Flink, covering the Lambda architecture, design approaches, SQL and custom‑Jar task definitions, UI drag‑and‑drop, cluster resource management on Yarn and Kubernetes, submission modes, scheduling, permission and metadata handling, logging, and monitoring with Prometheus and Grafana.

Cluster Management · Flink · Lambda architecture
0 likes · 19 min read
Youzan Coder
Dec 9, 2020 · Big Data

Youzan Big Data Technology Salon: Practices in Data Cost Governance, Apache Iceberg, Flink, and Data-Driven Growth

The Youzan Big Data Technology Salon brought together Youzan, NetEase and Didi to share practical approaches for cutting data‑infrastructure costs, building an Apache Iceberg‑based data lake, scaling Flink real‑time workloads, and creating a data‑driven growth platform that leverages tracking, A/B testing and analytics.

Apache Iceberg · Big Data · Data Cost Governance
0 likes · 5 min read
DataFunTalk
Dec 7, 2020 · Big Data

Jingdong's Flink Real‑Time Computing Platform: Containerization, Optimizations, and Future Roadmap

This article details Jingdong's evolution from Storm to Flink, the architecture of its Kubernetes‑based real‑time computing platform, extensive containerization practices, performance and stability optimizations, and the future plan to unify batch‑stream processing while expanding SQL support and intelligent operations.

Batch-Stream Integration · Flink · Kubernetes
0 likes · 16 min read
DataFunTalk
Dec 6, 2020 · Artificial Intelligence

Building an AI Ecosystem with Flink: Overview of AI Flow and Its Architecture

This article explains how Flink enables end‑to‑end machine‑learning workflows through AI Flow, covering the background of Lambda architecture, AI task stages, the advantages of Flink, AI Flow components, AI Graph concepts, integration with Python and TensorFlow, and a real‑world advertising recommendation use case.

AI Flow · Flink · Real-Time
0 likes · 14 min read
DataFunTalk
Dec 3, 2020 · Big Data

Streaming Data Lake Ingestion with Apache Flink and Apache Iceberg

This article explains how Apache Flink integrates with data lake architectures, especially using Apache Iceberg as a table format, to enable real‑time streaming ingestion, CDC processing, near‑real‑time lambda architectures, and future enhancements like automatic file merging and row‑level deletes.

Apache Iceberg · Data Lake · Flink
0 likes · 13 min read
DataFunSummit
Dec 1, 2020 · Artificial Intelligence

Building an AI Ecosystem with Flink: AI Flow Architecture, Components, and Applications

This article explains how Flink enables end‑to‑end AI workflows through the AI Flow platform, covering the Lambda architecture background, AI task pipeline stages, the reasons for choosing Flink, AI Flow’s graph model, core services, integration with ML pipelines, and real‑world advertising recommendation use cases.

AI Flow · AI Pipeline · Big Data
0 likes · 12 min read
Alibaba Cloud Developer
Nov 23, 2020 · Big Data

How Alibaba’s CCO Built a Cloud‑Native Real‑Time Data Warehouse with Hologres

Alibaba’s Customer Experience (CCO) team transformed its real‑time data platform by evolving from a Lambda‑style database architecture to a cloud‑native real‑time data warehouse powered by Hologres and Flink, achieving higher throughput, lower latency, reduced costs, and self‑service analytics for massive Double‑11 traffic.

Alibaba · Big Data · Flink
0 likes · 15 min read
DataFunTalk
Nov 17, 2020 · Artificial Intelligence

Alink: A Flink‑Based Machine Learning Platform – Overview, Features, and Quick‑Start Guide

This article introduces Alink, Alibaba's open‑source machine‑learning platform built on Flink, explains its core algorithms, performance comparison with Spark ML, version‑wise feature evolution, and provides practical quick‑start instructions for both Java (Maven) and Python (PyAlink) users, including data source handling, type conversion components, unified file‑system operations, and an overview of its FM algorithm implementation.

Alink · Batch Processing · Data Integration
0 likes · 13 min read
DataFunSummit
Nov 15, 2020 · Big Data

Evolution of 58.com Commercial Data Warehouse: From 0‑1 to 3.0 Using Hadoop, Flume, Kafka, Spark, and Flink

This article details the three‑stage evolution of 58.com’s commercial data warehouse, describing its massive scale, four‑layer architecture, technical challenges, migrations from MapReduce to Hive and Flink, real‑time streaming upgrades, and the resulting improvements in stability, accuracy, and timeliness.

Big Data · Data Architecture · Flink
0 likes · 10 min read
Big Data Technology & Architecture
Nov 13, 2020 · Big Data

Understanding Flink Operator Chaining Mechanism

This article explains the Flink operator chaining mechanism, detailing how logical plans are transformed into JobGraph and ExecutionGraph, the conditions for chaining, code implementations, and how the runtime constructs OperatorChain to improve execution efficiency.

Flink · JobGraph · java
0 likes · 12 min read
Architect
Nov 11, 2020 · Big Data

Real-time Click Stream Data Warehouse with Flink and ClickHouse: Architecture, Layered Design, and Practical Tips

This article explains how to build a real‑time click‑stream data warehouse using Flink for stream processing and ClickHouse for near‑real‑time OLAP, covering click‑stream characteristics, dimensional modeling, layered warehouse design, async dimension joins, sink implementation, and data rebalancing strategies.

Big Data · Click Stream · Flink
0 likes · 7 min read
DataFunTalk
Nov 11, 2020 · Big Data

Evolution and Practices of Cainiao's Real‑Time Data Warehouse for International Import Business

This article describes the highly complex logistics scenario of Cainiao's international import business, traces the evolution from an offline to a real‑time data warehouse (versions 1.0 and 2.0), and explains the layered architecture. It lists technical challenges such as multi‑source joins, state explosion, and out‑of‑order processing, and presents concrete solutions using Flink features, logical middle layers, union‑all joins, deduplication, timer services, and batch‑stream hybrid processing.

Big Data · Flink · State Management
0 likes · 21 min read
DataFunSummit
Nov 10, 2020 · Artificial Intelligence

Alink: An Open‑Source Machine Learning Platform on Flink – Features, Performance, and Quick‑Start Guide

This article introduces Alink, Alibaba's open‑source machine‑learning platform built on Flink, detailing its core algorithms, performance advantages over Spark ML, version evolution, Maven and PyAlink installation steps, data‑source integrations, FM algorithm support, and unified file‑system operations for both batch and streaming workloads.

Alink · Flink · PyAlink
0 likes · 11 min read
Tencent Cloud Developer
Nov 10, 2020 · Big Data

Design and Optimization of a Real-Time Video Recommendation Indexing System

The article describes a real‑time video recommendation indexing system that replaces 30‑minute batch builds with an Elasticsearch‑based service, integrates prior and posterior data pipelines, ensures consistency via locking and version checks, enables zero‑downtime upgrades, smooths write spikes, and boosts recall performance through multi‑level caching and ES tuning, delivering sub‑40 ms latency and significant business growth.

Elasticsearch · Flink · caching
0 likes · 13 min read
360 Tech Engineering
Nov 6, 2020 · Big Data

Guide to Flink SQL: Features, Scenarios, and Productization

Flink SQL, the high‑level SQL interface for Apache Flink, offers language‑independent, dependency‑free, easy‑to‑use stream processing with advanced features such as DDL, UDFs, time semantics, windowing, pattern matching, and built‑in connectors, supporting data synchronization, batch‑stream fusion, Hive integration, and various product enhancements.

Data Integration · Flink · Real-Time
0 likes · 11 min read
Amap Tech
Nov 6, 2020 · Operations

Full-Link Load Testing Platform TestPG: Architecture, Corpus Production, and Intelligent Features

Gaode’s TestPG platform solves full‑link load‑testing bottlenecks by unifying traffic capture with Iflow, converting logs into standardized corpora via a Flink pipeline, and applying corpus‑intelligence that extracts seasonal feature statistics and predicts distributions for precise, feature‑level throttling, enabling faster, more reliable testing and future autonomous optimization.

Flink · Load Testing · Traffic Capture
0 likes · 16 min read
DataFunTalk
Nov 1, 2020 · Big Data

Flink 1.11 Integration with Hive: New Features and Real‑time Data Warehouse

The article explains how Flink 1.11 deepens its integration with Hive, covering background, new connector features, simplified dependency management, enhanced Hive dialect, streaming writes and reads, temporal table joins, and how these capabilities enable a unified batch‑streaming data warehouse.

Batch‑Streaming Integration · Flink · Streaming
0 likes · 16 min read
Big Data Technology Architecture
Nov 1, 2020 · Big Data

Practical Application of Flink + Kafka in NetEase Cloud Music Real‑Time Computing Platform

This article presents NetEase Cloud Music's real‑time computing platform built on Flink and Kafka, covering background, architectural design, Kafka and Flink selection reasons, platformization, warehouse usage, encountered challenges, and the solutions implemented to improve reliability and performance.

Flink · Kafka · Real-time Streaming
0 likes · 11 min read
dbaplus Community
Oct 29, 2020 · Big Data

Inside Didi’s Real-Time Data Warehouse for Ride-Sharing: Architecture & Lessons

This article details Didi’s end‑to‑end construction of a real‑time data warehouse for the Ride‑Sharing (顺风车) business, covering motivations, layer‑by‑layer architecture, naming conventions, StreamSQL capabilities, operational tooling, achieved results, challenges, and future batch‑stream integration plans.

Didi · Flink · real-time data warehouse
0 likes · 21 min read
DataFunTalk
Oct 29, 2020 · Big Data

Building a Large-Scale Near Real-Time Data Analytics Platform at Lyft Using Apache Flink

Lyft transformed its legacy data pipeline by designing a cloud‑native, Flink‑based near real‑time analytics platform that ingests billions of events, writes Parquet files to S3, leverages Presto for interactive queries, and implements multi‑stage non‑blocking ETL, fault‑tolerant back‑fill, and extensive performance optimizations.

AWS · Data Lake · ETL
0 likes · 12 min read
Big Data Technology & Architecture
Oct 23, 2020 · Big Data

Overview of Real-Time Big Data Processing: Spark Structured Streaming, CarbonData, Flink, and Cloud Stream

This article provides a comprehensive overview of modern real‑time big‑data solutions, detailing Spark Structured Streaming capabilities, CarbonData’s storage architecture, Meituan’s Flink deployments, and Huawei Cloud Stream’s unified streaming service, highlighting their features, challenges, and future directions.

CarbonData · Flink · Real-time analytics
0 likes · 17 min read
ITPUB
Oct 16, 2020 · Big Data

How NetEase Cloud Music Built a Real‑Time Data Warehouse with Flink & Calcite

This article details how NetEase Cloud Music evolved its real‑time data warehouse, built on Flink 1.9 and Calcite, covering platform scale, architectural design, metadata management, SDK simplifications, monitoring improvements, and concrete use cases such as A/B testing, live reporting, and feature serving.

Big Data · Calcite · Flink
0 likes · 8 min read
dbaplus Community
Oct 13, 2020 · Big Data

How to Build a Real‑Time Data Warehouse with Flink: Principles, Architecture, and Best Practices

This article explains why real‑time data warehouses are needed, outlines their core principles, compares them with offline warehouses, describes typical use cases such as real‑time OLAP, dashboards, feature generation and monitoring, and provides a step‑by‑step guide to designing, implementing, and operating a Flink‑based streaming warehouse with Kafka, HBase, and metadata management.

Flink · Kafka · OLAP
0 likes · 29 min read