Tag

DAG

DeWu Technology
Apr 16, 2025 · Databases

DGraph 2024 Architecture Upgrade and Performance Optimizations

In 2024, DGraph split its single clusters into multiple business‑specific clusters, adopted a sharded active‑active topology, and replaced its 1:N thread pool with an M:N grouped execution model that uses atomic scheduling. It also parallelized FlatBuffer encoding, streamlined SDK conversions, and added DAG debugging, timeline analysis, and dynamic sub‑graph templates to improve scalability, stability, and developer productivity.

DAG · backend engineering · distributed architecture
0 likes · 13 min read
Test Development Learning Exchange
Dec 1, 2024 · Big Data

How to Install Apache Airflow and Build a Simple Data Processing Pipeline

This tutorial guides you through installing Apache Airflow, initializing its database, starting the web server and scheduler, creating a Python DAG that reads, cleans, groups, and saves CSV data, configuring the DAG directory, and monitoring the pipeline via the Airflow web UI.

Apache Airflow · DAG · ETL
0 likes · 6 min read
JD Tech Talk
Jul 25, 2024 · Backend Development

Design and Architecture of JD.com’s Buffalo Distributed DAG Scheduling System

The article details the design, core technical solutions, high‑availability architecture, performance optimizations, and open capabilities of Buffalo, JD.com’s distributed DAG‑based job scheduling platform that supports massive task volumes, complex dependencies, and flexible resource management.

DAG · High Availability · backend
0 likes · 13 min read
iQIYI Technical Product Team
Jul 5, 2024 · Big Data

RiskFactor: An Integrated Real‑Time and Offline Feature Platform for Risk Control

RiskFactor unifies iQIYI’s legacy real‑time and offline feature platforms onto Opal’s DAG‑plus‑SQL engine, accelerating feature production fifteen‑fold, cutting latency from hours to minutes, streamlining development, lowering costs, and delivering more reliable, versioned risk‑control capabilities against sophisticated online threats.

Big Data · DAG · Feature Engineering
0 likes · 14 min read
Bilibili Tech
May 31, 2024 · Backend Development

Design and High‑Availability Practices of Bilibili's Video Submission System

Bilibili’s video submission platform uses a layered micro‑service architecture with a DAG‑based scheduler, extensive observability, and HA tactics such as sharding, 64‑bit ID migration, full‑link stress tests, chaos engineering, and multi‑active data‑center deployment, while tooling like trace correlation and automated alerts ensures stability and guides future hybrid‑cloud migration.

Bilibili · DAG · High Availability
0 likes · 35 min read
DataFunTalk
May 16, 2024 · Big Data

Upgrading Data Warehouse Dependency Model: From Project-Level to Task-Level and External Dependency Integration

This article explains how a data warehouse dependency model was transformed from coarse project-level dependencies to fine-grained task-level DAGs, introduces virtual tasks for external dependencies, describes offset handling, and outlines the technical implementation and future automation plans for large‑scale scheduling systems.

Big Data · DAG · Data Warehouse
0 likes · 13 min read
Test Development Learning Exchange
Mar 31, 2024 · Big Data

Apache Airflow Overview and Advanced Usage Examples

This article introduces Apache Airflow, explains its core concepts such as DAGs, tasks, operators, executors, and the web UI, and provides multiple practical Python code examples for Bash commands, Python functions, SQL queries, task dependencies, sensors, dynamic DAGs, SubDAGs, XCom, email alerts, and error handling.

Apache Airflow · DAG · data pipelines
0 likes · 7 min read
Baidu Geek Talk
Jan 8, 2024 · Backend Development

Exgraph: A Graph Execution Engine for Task Orchestration

Exgraph, Baidu Search’s graph execution engine, uses a human‑readable description language and a robust execution core with dependency injection, object pooling, and interruption handling to orchestrate complex, parallel or conditional tasks, improving code readability and unifying diverse execution scenarios in search architecture.

DAG · Go development · Object Pooling
0 likes · 10 min read
Ctrip Technology
Nov 23, 2023 · Big Data

Optimizing Data Warehouse Timeliness Using Metadata Lineage

This article presents a metadata‑driven approach to improve data warehouse timeliness by extracting upstream lineage, identifying over‑layered, duplicate, and critical‑path tasks, and applying targeted scheduling and code‑level optimizations, demonstrated with a hotel order wide‑table case study.

DAG · Data Warehouse · Optimization
0 likes · 7 min read
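
To make the idea of critical-path tasks in the summary above concrete, here is a minimal sketch of finding the longest chain of dependent tasks in a lineage DAG. It is not the article's implementation: the task names, the durations, and the `critical_path` helper are all hypothetical, and the input is assumed to be acyclic.

```python
from functools import lru_cache


def critical_path(durations, upstream):
    """Return (total_minutes, path) for the slowest chain of dependent tasks.

    durations: task name -> run time in minutes (hypothetical values).
    upstream:  task name -> list of tasks it depends on.
    Assumes the dependency graph is acyclic.
    """

    @lru_cache(maxsize=None)
    def finish(task):
        # A task can finish only after its slowest upstream chain has finished.
        parents = [finish(parent) for parent in upstream.get(task, ())]
        best_total, best_path = max(parents, default=(0, ()))
        return best_total + durations[task], best_path + (task,)

    total, path = max(finish(task) for task in durations)
    return total, list(path)


# Hypothetical lineage for a wide table built from two source tables.
durations = {"ods_orders": 10, "ods_hotels": 5, "dwd_orders": 20, "dws_wide_table": 15}
upstream = {
    "dwd_orders": ["ods_orders"],
    "dws_wide_table": ["dwd_orders", "ods_hotels"],
}

total, path = critical_path(durations, upstream)
print(total, path)
```

Tasks on the returned path are the ones whose speed-up shortens the end-to-end completion time; tasks off the path have slack.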
Cognitive Technology Team
Nov 12, 2023 · Fundamentals

Topological Sorting of Directed Acyclic Graphs with Java Implementation Using Guava

This article explains the definition and properties of directed acyclic graphs (DAG), describes the basic topological sorting algorithm steps, and provides a complete Java implementation using Guava's MutableGraph class, illustrating the process with an example and its execution result.

DAG · Guava · Java
0 likes · 4 min read
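
As a rough illustration of the basic topological sorting steps mentioned in the summary above, the sketch below uses Kahn's algorithm (repeatedly emit a node with no remaining incoming edges). The listed article implements this in Java with Guava; this Python version, including the `topological_sort` name and the example edges, is an independent sketch rather than that article's code.

```python
from collections import deque


def topological_sort(edges):
    """Return the nodes in a topological order; raise ValueError on a cycle."""
    # Build successor lists and in-degree counts from (source, target) pairs.
    successors = {}
    in_degree = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
        successors.setdefault(dst, [])
        in_degree[dst] = in_degree.get(dst, 0) + 1
        in_degree.setdefault(src, 0)

    # Start with every node that has no incoming edge.
    ready = deque(sorted(node for node, degree in in_degree.items() if degree == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        # Removing the node frees any successor whose last dependency it was.
        for nxt in successors[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                ready.append(nxt)

    # Nodes left unprocessed are on a cycle, so the graph is not a DAG.
    if len(order) != len(in_degree):
        raise ValueError("graph contains a cycle")
    return order


example_edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
order = topological_sort(example_edges)
print(order)  # ['a', 'b', 'c', 'd']
```

The cycle check is what ties the algorithm to the DAG definition: a valid ordering exists only when every node eventually reaches in-degree zero.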
Aikesheng Open Source Community
Nov 8, 2023 · Databases

Analyzing OceanBase Freeze Dump Process via Log Parsing

This article explains how to parse OceanBase logs to trace the tenant freeze dump workflow, detailing the roles and log sequences of the freeze check thread, LSFreeze, Flush, DagScheduler, and MiniMerge threads, and illustrating each step with actual log excerpts and code snippets.

Compaction · DAG · Database
0 likes · 16 min read
Baidu Geek Talk
Oct 30, 2023 · Backend Development

Design and Practice of the tanGo Search Presentation Framework

The article presents Baidu’s Aladdin vertical search product and introduces the tanGo framework, which abstracts search pipelines into resources, cards, and scenes, enabling configuration‑driven, graph‑based resource scheduling for single results, demand clusters, and groups, while measuring scale, efficiency, and user satisfaction.

DAG · Search · architecture
0 likes · 10 min read
ByteDance Data Platform
Oct 11, 2023 · Backend Development

How Volcano Engine Rebuilt Its Ad‑Testing Platform for Scalability and Reliability

This article explains how Volcano Engine identified the tangled authorization, data‑fetching, and performance problems of its advertising AB‑testing platform and refactored it by splitting services, redesigning the data model with MySQL and ClickHouse, applying DAG scheduling, time‑wheel algorithms, Domain‑Driven Design, and rigorous unit testing to achieve a more stable, extensible backend solution.

AB testing · DAG · DDD
0 likes · 16 min read
ByteDance Data Platform
Jul 5, 2023 · Cloud Native

How to Seamlessly Integrate ByteHouse Cloud Data Warehouse with Apache Airflow

This guide explains how to combine ByteHouse's cloud‑native data warehouse with Apache Airflow to build scalable, automated, and easy‑to‑manage data pipelines, covering business scenarios, data flow, and step‑by‑step installation and DAG creation.

Apache Airflow · ByteHouse · DAG
0 likes · 10 min read
Python Programming Learning Circle
Apr 24, 2023 · Artificial Intelligence

Implementing a Simple Probabilistic Programming Language in Python

This article explains the principles of probabilistic programming languages and walks through a step‑by‑step implementation of a minimal PPL in Python, covering model definition, variable representation, DAG traversal, log‑density computation, and a posterior grid illustration.

Bayesian inference · DAG · PPL
0 likes · 12 min read
Python Programming Learning Circle
Feb 17, 2023 · Artificial Intelligence

Building a Simple Probabilistic Programming Language in Python

This article explains the principles of probabilistic programming languages and walks through constructing a basic PPL in Python, covering model definition with latent and observed variables, distribution handling, DAG traversal for log‑density computation, and demonstrates evaluation with example code and visualizations.

Bayesian inference · DAG · PPL
0 likes · 13 min read
Bilibili Tech
Feb 17, 2023 · Backend Development

Design and Implementation of the Comet Workflow Engine at Bilibili

The article details Bilibili’s Comet workflow engine—a low‑code, plugin‑extensible platform built since 2019 that uses visual DAG templates, graph‑based legality checks, and asynchronous execution to automate diverse business processes such as SRE automation, permission requests, and push‑task approvals, improving operational efficiency across mobile and web services.

DAG · Go · SRE
0 likes · 18 min read
DevOps Cloud Academy
Nov 22, 2022 · Big Data

Components and Key Terminology in Apache Airflow

Apache Airflow’s architecture consists of schedulers, executors, workers, a web server, and a metadata database, enabling scalable workflow orchestration, while essential terminology such as DAGs, operators, and sensors defines how tasks are organized, executed, and monitored within data pipelines.

Apache Airflow · Big Data · DAG
0 likes · 8 min read
DevOps Cloud Academy
Oct 22, 2022 · Fundamentals

How to Write Your First Apache Airflow DAG (Hello World)

This tutorial walks through creating a simple “Hello World” Apache Airflow DAG by setting up the Python file, importing modules, defining the DAG object, adding a PythonOperator task, writing the callable function, and running the DAG with Airflow’s webserver and scheduler.

Apache Airflow · DAG · Data Engineering
0 likes · 9 min read
DevOps Cloud Academy
Oct 15, 2022 · Big Data

Introduction to Apache Airflow

Apache Airflow is an open‑source platform for programmatically authoring, scheduling, and monitoring workflows using Directed Acyclic Graphs (DAGs), featuring components such as Scheduler, Web Server, Database, and various Executors, and offering easy‑to‑use, extensible, scalable, and robust integrations for data pipeline management.

Apache Airflow · DAG · Executor
0 likes · 10 min read