Tag: Airflow

DataFunTalk
May 26, 2024 · Big Data

Athena Data Factory: A One‑Stop Data Development and Governance Platform for Sparkle Thinking

The article details how Sparkle Thinking built the Athena Data Factory—a comprehensive, self‑service data development and governance platform that integrates data integration, ETL, real‑time processing, monitoring, and analytics, describing its architecture, key technologies, implementation timeline, operational practices, performance gains, and future directions.

Airflow · ETL · Flink
26 min read
Python Programming Learning Circle
Dec 9, 2023 · Backend Development

Eight Ways to Implement Python Scheduled Tasks

This article presents a comprehensive guide to implementing periodic tasks in Python, covering eight approaches: a simple while‑loop with sleep, Timeloop, threading.Timer, sched, schedule, APScheduler, Celery, and Apache Airflow, each with code examples and practical notes.

Airflow · Celery · Cron
24 min read
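The standard-library `sched` approach named in the summary above can be sketched as follows. This is a minimal illustration, not code from the listed article; `run_periodically` is a hypothetical helper, and scheduling every run at an absolute time with `enterabs()` is an assumption chosen so that delays do not accumulate.

```python
import sched
import time


def run_periodically(task, interval, count):
    """Call `task` every `interval` seconds, `count` times, and return its results.

    Hypothetical helper built on the standard-library sched module; it blocks
    until the last run has finished.
    """
    if count <= 0:
        return []

    # time.monotonic is unaffected by wall-clock changes; time.sleep does the waiting.
    scheduler = sched.scheduler(time.monotonic, time.sleep)
    start = time.monotonic()
    results = []

    def tick(n):
        results.append(task())
        if n < count:
            # sched has no built-in repeat option, so each run registers the next one.
            # enterabs() targets an absolute time, so a slow run does not push
            # every later run back (no cumulative drift).
            scheduler.enterabs(start + (n + 1) * interval, 1, tick, argument=(n + 1,))

    scheduler.enterabs(start + interval, 1, tick, argument=(1,))
    scheduler.run()  # returns once the event queue is empty
    return results
```

For example, `run_periodically(lambda: time.strftime("%H:%M:%S"), 1.0, 3)` would return three timestamps taken roughly one second apart. If a run takes longer than `interval`, the next event's target time is already in the past and `sched` executes it immediately.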
Python Programming Learning Circle
Aug 14, 2023 · Backend Development

Common Python Scheduling Techniques and Tools

This article reviews multiple ways to implement periodic tasks in Python: simple loops with sleep, the Timeloop library, the standard library's threading.Timer class and sched module, the schedule package, and the APScheduler framework, as well as distributed solutions like Celery and Apache Airflow, and provides code examples for each method.

Airflow · Celery · Cron
23 min read
Data Thinking Notes
May 17, 2023 · Big Data

Inside Wing Pay’s Scalable Big Data Platform: Architecture & Governance

This article details how Wing Pay built a comprehensive data development and governance platform, covering company background, business scenarios, goals, challenges, task development workflow, task types, SparkSQL editor features, double‑environment deployment, Airflow scheduling, DataX data bus, resource isolation, compute optimization, data quality monitoring, cloud‑native practices, future outlook, and a Q&A on data permissions and governance.

Airflow · Spark · big data
17 min read
Python Programming Learning Circle
Apr 14, 2023 · Backend Development

Implementing Periodic Tasks in Python: while‑loop, Timeloop, sched, schedule, APScheduler, Celery, and Airflow

This article reviews several Python approaches for creating scheduled or periodic jobs—including a simple while‑True loop with sleep, the Timeloop library, the built‑in sched module, the schedule package, APScheduler, Celery, and Apache Airflow—explaining their usage, advantages, limitations, and providing ready‑to‑run code samples.

Airflow · Celery · Cron
15 min read
DataFunTalk
Apr 11, 2023 · Big Data

WingPay's Big Data Platform Construction and Development Experience

This article presents a comprehensive case study of WingPay's big data platform, covering company background, data development and governance platform design, task development workflow, architectural choices, scheduling engine selection, data bus implementation, resource isolation, quality monitoring, cloud‑native practices, future challenges, and a Q&A session.

Airflow · Resource Isolation · big data
15 min read
Data Thinking Notes
Nov 15, 2022 · Operations

Why Is Airflow Draining CPU? A Step‑by‑Step Diagnosis and Fix

A high‑CPU anomaly on a Spark‑enabled machine was traced through application checks, network TIME_WAIT analysis, and Airflow inspection, leading to kernel tweaks and an Airflow configuration change that finally restored normal CPU usage.

Airflow · CPU · Performance
4 min read
Big Data Technology Architecture
Jun 4, 2022 · Cloud Native

Deploying an Apache Airflow Cluster on Kubernetes with Helm and GitSync

This guide explains how to set up a production‑grade Apache Airflow cluster on Kubernetes using different executors, Helm charts, Git‑based DAG synchronization, custom Docker images, and related tooling such as Chocolatey, Helm, and Git, providing step‑by‑step commands and configuration details.

Airflow · CI/CD · Cluster Deployment
18 min read
Big Data Technology Architecture
Nov 28, 2021 · Big Data

EMR Studio: Architecture and Features for Simplifying Big Data Development

EMR Studio is a one‑stop, open‑source‑compatible big data development platform that integrates Zeppelin, Jupyter, Airflow and a custom Cluster Manager to streamline job creation, scheduling, monitoring, and cluster switching, thereby addressing common usability challenges in Spark, Flink, Hive, and Presto workflows.

Airflow · Apache Spark · EMR Studio
9 min read
Python Programming Learning Circle
Sep 9, 2021 · Backend Development

Common Python Scheduling Techniques and Libraries

This article provides a comprehensive overview of various Python approaches for implementing periodic tasks, including simple loops with sleep, third‑party libraries such as Timeloop, schedule, APScheduler, as well as distributed solutions like Celery and Apache Airflow, complete with code examples and architectural explanations.

Airflow · Celery · Cron
24 min read
Python Programming Learning Circle
Sep 7, 2021 · Backend Development

Python Scheduling Techniques: From Simple Loops to APScheduler, Celery, and Airflow

This article presents a comprehensive guide to implementing periodic tasks in Python, covering a simple while‑loop with sleep, Timeloop, threading.Timer, the built‑in sched module, the schedule library, APScheduler, Celery, and Apache Airflow, with code examples and practical tips.

Airflow · Celery · Cron
26 min read
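threading.Timer, listed in the summary above, runs a function once after a delay, so a periodic job has to create a fresh timer after every run. A minimal sketch under that assumption follows; `RepeatingTimer` is a hypothetical class for illustration, not code from the listed article.

```python
import threading


class RepeatingTimer:
    """Hypothetical wrapper that re-arms threading.Timer to run `task` every `interval` seconds."""

    def __init__(self, interval, task):
        self.interval = interval
        self.task = task
        self._lock = threading.Lock()
        self._timer = None
        self._stopped = True

    def _arm(self):
        # Caller must hold self._lock. Each Timer object fires at most once.
        self._timer = threading.Timer(self.interval, self._fire)
        self._timer.daemon = True  # do not keep the interpreter alive at exit
        self._timer.start()

    def _fire(self):
        self.task()
        with self._lock:
            if not self._stopped:
                self._arm()  # schedule the next run only after this one finished

    def start(self):
        with self._lock:
            if self._stopped:
                self._stopped = False
                self._arm()

    def stop(self):
        with self._lock:
            self._stopped = True
            if self._timer is not None:
                self._timer.cancel()  # has no effect if the timer already fired
```

Usage would look like `rt = RepeatingTimer(5.0, lambda: print("tick")); rt.start()` followed later by `rt.stop()`. Because the next timer is armed only after `task` returns, the effective period is `interval` plus the task's run time, and an exception raised by `task` ends the cycle; a scheduler that targets absolute times avoids the drift.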
DataFunTalk
Jul 29, 2021 · Big Data

Real-Time Data Warehouse Construction at TAL Using DorisDB

This article details TAL's transition from offline to real-time data warehousing, describing business drivers, pain points, architectural evolution through Hive, Flink+Kudu, and DorisDB, and outlining the system design, data flow, scheduling, monitoring, and the resulting business and cost benefits.

Airflow · DorisDB · Flink
14 min read
Youzan Coder
Mar 18, 2020 · Big Data

The Evolution of Youzan’s Data Warehouse in a Big Data Environment

The article traces Youzan’s data warehouse from its chaotic early days lacking structure, through a 2016 Airflow‑driven construction phase that introduced layered ODS/DW/Data Mart architecture and naming standards, to a mature stage focused on efficiency, security, SparkSQL, dimensional modeling, metadata, and ongoing real‑time and governance challenges.

Airflow · ETL · Hive
20 min read
DataFunTalk
Jun 24, 2018 · Big Data

OPPO Big Data Platform Operations and R&D Practices: Architecture, Scaling, and Monitoring

This article summarizes OPPO's rapid growth of its big‑data platform, detailing the three‑layer architecture, the evolution from Flume‑Kafka to NiFi for data ingestion, the upgrade of the OFlow task scheduler, comprehensive monitoring of data, resources and task SLA, and the development of a self‑service analytics tool called InnerEye to ensure stability, efficiency, and security.

Airflow · NiFi · big data
10 min read
Liulishuo Tech Team
Jun 17, 2016 · Big Data

Building a Scalable Big Data Platform on AWS: Architecture and Execution Service Design

This article details the architectural design and implementation of a scalable big data platform built on AWS services, highlighting the transition from HDFS to S3 for storage, the use of EMR for elastic compute, and a custom Execution Service integrated with Consul and Airflow for automated cluster management and task scheduling.

AWS EMR · Airflow · big data architecture
11 min read
Art of Distributed System Architecture Design
Mar 31, 2016 · Big Data

Airbnb’s Big Data Platform Architecture: Design, Evolution, and Lessons Learned

Airbnb’s engineering team outlines the evolution and design of its massive big‑data platform—detailing the dual “gold” and “silver” Hive clusters, use of Kafka, Presto, Airflow, Mesos, and Spark, along with performance gains, cost reductions, and open‑source contributions.

Airbnb · Airflow · Hadoop
13 min read