Tagged articles
27 articles
Alibaba Cloud Developer
Nov 7, 2025 · Big Data

Unlock Enterprise‑Grade Data Pipelines with DMS Airflow: Features, Integration & Code Samples

This article introduces DMS Airflow, an enterprise‑level data workflow orchestration platform built on Apache Airflow, covering its advanced DAG capabilities, deep DMS integration, scheduling, task dependency management, dynamic task generation, resource scaling, security features, and practical code examples for SQL, Spark, DTS, and Notebook tasks.

Airflow · Big Data · DMS
0 likes · 20 min read
Alibaba Cloud Native
May 18, 2025 · Cloud Native

Airflow vs Argo Workflows: Which Cloud‑Native Scheduler Wins for Data Engineering?

This comprehensive guide compares Apache Airflow and Argo Workflows—two leading cloud‑native distributed task schedulers—by examining their core features, architectures, DAG handling, performance, language support, big‑data and AI integrations, and provides practical selection advice for data engineers and DevOps teams.

Airflow · Argo Workflows · Workflow Orchestration
0 likes · 23 min read
Big Data Technology & Architecture
May 27, 2024 · Big Data

Athena Data Factory: A One‑Stop Data Development and Governance Platform – Architecture, Features, and Impact

The Athena Data Factory, built by Spark Thinking, is a comprehensive one‑stop data development and governance platform that integrates data integration, development, analysis, and services, offering offline, real‑time, and AI pipelines, modular architecture, extensive monitoring, and cost‑optimisation to empower thousands of users across the company.

Airflow · Big Data · Data Platform
0 likes · 26 min read
DataFunTalk
May 26, 2024 · Big Data

Athena Data Factory: A One‑Stop Data Development and Governance Platform for Sparkle Thinking

The article details how Sparkle Thinking built the Athena Data Factory—a comprehensive, self‑service data development and governance platform that integrates data integration, ETL, real‑time processing, monitoring, and analytics, describing its architecture, key technologies, implementation timeline, operational practices, performance gains, and future directions.

Airflow · ETL · Flink
0 likes · 26 min read
Python Programming Learning Circle
Dec 9, 2023 · Backend Development

Eight Ways to Implement Python Scheduled Tasks

This article presents a comprehensive guide to implementing periodic tasks in Python, covering eight approaches: a simple while loop with sleep, Timeloop, threading.Timer, sched, schedule, APScheduler, Celery, and Apache Airflow, each with code examples and practical notes.

APScheduler · Airflow · Scheduling
0 likes · 24 min read
Python Programming Learning Circle
Aug 14, 2023 · Backend Development

Common Python Scheduling Techniques and Tools

This article reviews multiple ways to implement periodic tasks in Python, covering simple loops with sleep, libraries such as Timeloop, threading.Timer, sched, schedule, the APScheduler framework, as well as distributed solutions like Celery and Apache Airflow, and provides code examples for each method.

APScheduler · Airflow · Python
0 likes · 23 min read
Data Thinking Notes
May 17, 2023 · Big Data

Inside Wing Pay’s Scalable Big Data Platform: Architecture & Governance

This article details how Wing Pay built a comprehensive data development and governance platform. It covers the company background, business scenarios, goals, and challenges; the task development workflow, task types, and SparkSQL editor features; dual‑environment deployment, Airflow scheduling, the DataX data bus, resource isolation, compute optimization, and data quality monitoring; and cloud‑native practices, the future outlook, and a Q&A on data permissions and governance.

Airflow · Big Data · Cloud Native
0 likes · 17 min read
Python Programming Learning Circle
Apr 14, 2023 · Backend Development

Implementing Periodic Tasks in Python: while‑loop, Timeloop, sched, schedule, APScheduler, Celery, and Airflow

This article reviews several Python approaches for creating scheduled or periodic jobs—including a simple while‑True loop with sleep, the Timeloop library, the built‑in sched module, the schedule package, APScheduler, Celery, and Apache Airflow—explaining their usage, advantages, limitations, and providing ready‑to‑run code samples.

APScheduler · Airflow · Backend
0 likes · 15 min read
DataFunTalk
Apr 11, 2023 · Big Data

WingPay's Big Data Platform Construction and Development Experience

This article presents a comprehensive case study of WingPay's big data platform, covering company background, data development and governance platform design, task development workflow, architectural choices, scheduling engine selection, data bus implementation, resource isolation, quality monitoring, cloud‑native practices, future challenges, and a Q&A session.

Airflow · Resource Isolation
0 likes · 15 min read
Big Data Technology & Architecture
Mar 14, 2022 · Big Data

Comparison of Common Big Data Scheduling Systems: Oozie, Azkaban, Airflow, XXL‑Job, and DolphinScheduler

This article provides a comparative overview of several popular big‑data workflow schedulers—including Oozie, Azkaban, Airflow, XXL‑Job, and DolphinScheduler—detailing their supported task types, visual workflow definition, monitoring capabilities, pause/resume features, high‑availability options, and other notable characteristics.

Airflow · DolphinScheduler · Oozie
0 likes · 9 min read
Big Data Technology Architecture
Nov 28, 2021 · Big Data

EMR Studio: Architecture and Features for Simplifying Big Data Development

EMR Studio is a one‑stop, open‑source‑compatible big data development platform that integrates Zeppelin, Jupyter, Airflow and a custom Cluster Manager to streamline job creation, scheduling, monitoring, and cluster switching, thereby addressing common usability challenges in Spark, Flink, Hive, and Presto workflows.

Airflow · Apache Spark · EMR Studio
0 likes · 9 min read
MaGe Linux Operations
Oct 2, 2021 · Operations

Master Python Scheduling: 10 Practical Ways to Run Periodic Tasks

This comprehensive guide explores ten Python techniques for implementing periodic tasks—from simple while‑loop sleeps and the Timeloop library to advanced frameworks like APScheduler, Celery, and Apache Airflow—providing code samples, advantages, limitations, and architectural insights for reliable scheduling.

APScheduler · Airflow · Async
0 likes · 27 min read
Python Programming Learning Circle
Sep 9, 2021 · Backend Development

Common Python Scheduling Techniques and Libraries

This article provides a comprehensive overview of various Python approaches for implementing periodic tasks, including simple loops with sleep, third‑party libraries such as Timeloop, schedule, APScheduler, as well as distributed solutions like Celery and Apache Airflow, complete with code examples and architectural explanations.

APScheduler · Airflow · Scheduling
0 likes · 24 min read
DataFunTalk
Jul 29, 2021 · Big Data

Real-Time Data Warehouse Construction at TAL Using DorisDB

This article details TAL's transition from offline to real-time data warehousing, describing business drivers, pain points, architectural evolution through Hive, Flink+Kudu, and DorisDB, and outlining the system design, data flow, scheduling, monitoring, and the resulting business and cost benefits.

Airflow · Big Data · DorisDB
0 likes · 14 min read
Youzan Coder
Mar 18, 2020 · Big Data

The Evolution of Youzan’s Data Warehouse in a Big Data Environment

The article traces Youzan’s data warehouse from its chaotic early days lacking structure, through a 2016 Airflow‑driven construction phase that introduced layered ODS/DW/Data Mart architecture and naming standards, to a mature stage focused on efficiency, security, SparkSQL, dimensional modeling, metadata, and ongoing real‑time and governance challenges.

Airflow · Big Data · Data Governance
0 likes · 20 min read
Youzan Coder
Jul 20, 2018 · Big Data

How Youzan Built a Scalable Big Data Development Platform (DP)

This article details the design, architecture, and operational experience of Youzan's Data Platform (DP), covering its scheduling, data‑sync, service, and monitoring modules, the custom Airflow‑based task scheduler, current production metrics, supported task types, and future improvement plans.

Airflow · Big Data · Data Platform
0 likes · 12 min read
DataFunTalk
Jun 24, 2018 · Big Data

OPPO Big Data Platform Operations and R&D Practices: Architecture, Scaling, and Monitoring

This article summarizes the rapid growth of OPPO's big‑data platform, detailing its three‑layer architecture, the evolution of data ingestion from Flume‑Kafka to NiFi, the upgrade of the OFlow task scheduler, comprehensive monitoring of data, resources, and task SLAs, and the development of InnerEye, a self‑service analytics tool, to ensure stability, efficiency, and security.

Airflow · Big Data · NiFi
0 likes · 10 min read
Liulishuo Tech Team
Jun 17, 2016 · Big Data

Building a Scalable Big Data Platform on AWS: Architecture and Execution Service Design

This article details the architectural design and implementation of a scalable big data platform built on AWS services, highlighting the transition from HDFS to S3 for storage, the use of EMR for elastic compute, and a custom Execution Service integrated with Consul and Airflow for automated cluster management and task scheduling.

AWS EMR · Airflow · Big Data Architecture
0 likes · 11 min read
21CTO
Mar 31, 2016 · Big Data

Inside Airbnb’s Massive Big Data Platform: Architecture, Lessons & Scaling Secrets

Airbnb’s engineering team outlines the evolution of its big‑data platform, detailing the philosophy behind its architecture, the dual “gold” and “silver” Hive clusters, migration to Mesos, use of Presto, Airpal, Airflow, and the performance and cost gains achieved through these design choices.

Airbnb · Airflow · Big Data
0 likes · 11 min read