Tagged articles
58 articles
Bighead's Algorithm Notes
Feb 23, 2026 · Artificial Intelligence

How AlphaPROBE Leverages DAGs for Efficient Alpha‑Factor Mining

AlphaPROBE reformulates alpha‑factor discovery as a strategy‑navigation problem on a directed acyclic graph, combining a Bayesian factor retriever with a DAG‑aware generator to achieve superior prediction accuracy, stable returns, and faster training across three major Chinese stock markets.

Alpha Factor · AlphaPROBE · Bayesian Retrieval
0 likes · 22 min read
DeWu Technology
Apr 16, 2025 · Databases

DGraph 2024 Architecture Upgrade and Performance Optimizations

In 2024 DGraph upgraded its architecture by splitting single clusters into multiple business‑specific clusters, adopting a sharded active‑active topology, and replacing its 1:N thread‑pool with an M:N grouped execution model that uses atomic scheduling, while parallelizing FlatBuffer encoding, streamlining SDK conversions, adding DAG debugging, timeline analysis, and dynamic sub‑graph templates to boost scalability, stability and developer productivity.

Backend Engineering · DAG · distributed architecture
0 likes · 13 min read
DeWu Technology
Apr 7, 2025 · Industry Insights

How DPP Evolved from Fixed Engine to DAG‑Based Orchestration for Faster Recommendation Iterations

This article explains the DPP platform’s overall architecture, its key features for rapid iteration, and the three‑stage evolution of its orchestration engine—from the fixed DPP‑Engine to the flexible BizEngine and finally the graph‑based DagEngine—detailing design decisions, protocols, challenges, and future directions.

DAG · DPP · Orchestration
0 likes · 16 min read
21CTO
Aug 4, 2024 · Operations

How Netflix’s Open‑Source Maestro Powers Scalable Data & ML Workflows

Netflix has open‑sourced its Maestro workflow orchestrator, a highly scalable, DAG‑based system built on Git, Java, Gradle and Docker that handles millions of daily jobs for data scientists, enabling ETL, ML pipelines, A/B testing and more, while meeting strict SLOs.

DAG · Maestro · Netflix
0 likes · 5 min read
JD Tech Talk
Jul 25, 2024 · Backend Development

Design and Architecture of JD.com’s Buffalo Distributed DAG Scheduling System

The article details the design, core technical solutions, high‑availability architecture, performance optimizations, and open capabilities of Buffalo, JD.com’s distributed DAG‑based job scheduling platform that supports massive task volumes, complex dependencies, and flexible resource management.

Backend · DAG · Distributed Scheduling
0 likes · 13 min read
iQIYI Technical Product Team
Jul 5, 2024 · Big Data

RiskFactor: An Integrated Real‑Time and Offline Feature Platform for Risk Control

RiskFactor unifies iQIYI’s legacy real‑time and offline feature platforms onto Opal’s DAG‑plus‑SQL engine, accelerating feature production fifteen‑fold, cutting latency from hours to minutes, streamlining development, lowering costs, and delivering more reliable, versioned risk‑control capabilities against sophisticated online threats.

Big Data · DAG · Real-time Streaming
0 likes · 14 min read
Bilibili Tech
May 31, 2024 · Backend Development

Design and High‑Availability Practices of Bilibili's Video Submission System

Bilibili’s video submission platform uses a layered micro‑service architecture with a DAG‑based scheduler, extensive observability, and HA tactics such as sharding, 64‑bit ID migration, full‑link stress tests, chaos engineering, and multi‑active data‑center deployment, while tooling like trace correlation and automated alerts ensures stability and guides future hybrid‑cloud migration.

Backend Architecture · Bilibili · DAG
0 likes · 35 min read
DataFunTalk
May 16, 2024 · Big Data

Upgrading Data Warehouse Dependency Model: From Project-Level to Task-Level and External Dependency Integration

This article explains how a data warehouse dependency model was transformed from coarse project-level dependencies to fine-grained task-level DAGs, introduces virtual tasks for external dependencies, describes offset handling, and outlines the technical implementation and future automation plans for large‑scale scheduling systems.

DAG · automation · dependency model
0 likes · 13 min read
Test Development Learning Exchange
Mar 31, 2024 · Big Data

Apache Airflow Overview and Advanced Usage Examples

This article introduces Apache Airflow, explains its core concepts such as DAGs, tasks, operators, executors, and the web UI, and provides multiple practical Python code examples for Bash commands, Python functions, SQL queries, task dependencies, sensors, dynamic DAGs, SubDAGs, XCom, email alerts, and error handling.

Apache Airflow · DAG · Python
0 likes · 7 min read
Baidu Geek Talk
Jan 8, 2024 · Backend Development

Exgraph: A Graph Execution Engine for Task Orchestration

Exgraph, Baidu Search’s graph execution engine, uses a human‑readable description language and a robust execution core with dependency injection, object pooling, and interruption handling to orchestrate complex, parallel or conditional tasks, improving code readability and unifying diverse execution scenarios in search architecture.

DAG · Go development · Object Pooling
0 likes · 10 min read
Ctrip Technology
Nov 23, 2023 · Big Data

Optimizing Data Warehouse Timeliness Using Metadata Lineage

This article presents a metadata‑driven approach to improve data warehouse timeliness by extracting upstream lineage, identifying over‑layered, duplicate, and critical‑path tasks, and applying targeted scheduling and code‑level optimizations, demonstrated with a hotel order wide‑table case study.

DAG · Lineage · data pipeline
0 likes · 7 min read
Aikesheng Open Source Community
Nov 8, 2023 · Databases

Analyzing OceanBase Freeze Dump Process via Log Parsing

This article explains how to parse OceanBase logs to trace the tenant freeze dump workflow, detailing the roles and log sequences of the freeze check thread, LSFreeze, Flush, DagScheduler, and MiniMerge threads, and illustrating each step with actual log excerpts and code snippets.

DAG · Freeze Process · OceanBase
0 likes · 16 min read
Baidu Geek Talk
Oct 30, 2023 · Backend Development

Design and Practice of the tanGo Search Presentation Framework

The article presents Baidu’s Aladdin vertical search product and introduces the tanGo framework, which abstracts search pipelines into resources, cards, and scenes, enabling configuration‑driven, graph‑based resource scheduling for single results, demand clusters, and groups, while measuring scale, efficiency, and user satisfaction.

Backend · Configuration · DAG
0 likes · 10 min read
ByteDance Data Platform
Oct 11, 2023 · Backend Development

How Volcano Engine Rebuilt Its Ad‑Testing Platform for Scalability and Reliability

This article explains how Volcano Engine identified the tangled authorization, data‑fetching, and performance problems of its advertising AB‑testing platform and refactored it by splitting services, redesigning the data model with MySQL and ClickHouse, applying DAG scheduling, time‑wheel algorithms, Domain‑Driven Design, and rigorous unit testing to achieve a more stable, extensible backend solution.

AB testing · Advertising · Backend
0 likes · 16 min read
Kuaishou E-commerce Frontend Team
Aug 17, 2023 · Frontend Development

How DAGs Supercharge Frontend Performance and Workflow Automation

This article explains how directed acyclic graphs (DAGs) are applied in frontend development for resource loading optimization, component library construction, and task flow orchestration, and demonstrates a real-world implementation in Kuaishou's e‑commerce advertising engine with detailed architecture and code examples.

Component Library · DAG · Visual Programming
0 likes · 14 min read
Huolala Tech
Jul 6, 2023 · Big Data

How to Optimize DAG Task Scheduling to Cut 30 Minutes from Critical Path

This article explains how to analyze and automatically optimize complex DAG‑based data platform task chains, identify bottlenecks, adjust upstream task timings, and reduce critical‑path execution time by up to 30 minutes while preventing resource contention and peak overloads.

Big Data · DAG · Resource Optimization
0 likes · 15 min read
Python Programming Learning Circle
Feb 17, 2023 · Artificial Intelligence

Building a Simple Probabilistic Programming Language in Python

This article explains the principles of probabilistic programming languages and walks through constructing a basic PPL in Python, covering model definition with latent and observed variables, distribution handling, DAG traversal for log‑density computation, and demonstrates evaluation with example code and visualizations.

Bayesian inference · DAG · PPL
0 likes · 13 min read
Bilibili Tech
Feb 17, 2023 · Backend Development

Design and Implementation of the Comet Workflow Engine at Bilibili

The article details Bilibili’s Comet workflow engine—a low‑code, plugin‑extensible platform built since 2019 that uses visual DAG templates, graph‑based legality checks, and asynchronous execution to automate diverse business processes such as SRE automation, permission requests, and push‑task approvals, improving operational efficiency across mobile and web services.

DAG · Go · Process Engine
0 likes · 18 min read
DevOps Cloud Academy
Nov 22, 2022 · Big Data

Components and Key Terminology in Apache Airflow

Apache Airflow’s architecture consists of schedulers, executors, workers, a web server, and a metadata database, enabling scalable workflow orchestration, while essential terminology such as DAGs, operators, and sensors defines how tasks are organized, executed, and monitored within data pipelines.

Apache Airflow · Big Data · DAG
0 likes · 8 min read
DevOps Cloud Academy
Oct 22, 2022 · Fundamentals

How to Write Your First Apache Airflow DAG (Hello World)

This tutorial walks through creating a simple “Hello World” Apache Airflow DAG by setting up the Python file, importing modules, defining the DAG object, adding a PythonOperator task, writing the callable function, and running the DAG with Airflow’s webserver and scheduler.

Apache Airflow · DAG · Python
0 likes · 9 min read
DevOps Cloud Academy
Oct 15, 2022 · Big Data

Introduction to Apache Airflow

Apache Airflow is an open‑source platform for programmatically authoring, scheduling, and monitoring workflows using Directed Acyclic Graphs (DAGs), featuring components such as Scheduler, Web Server, Database, and various Executors, and offering easy‑to‑use, extensible, scalable, and robust integrations for data pipeline management.

Apache Airflow · DAG · Executor
0 likes · 10 min read
DevOps Cloud Academy
Sep 15, 2022 · Big Data

Understanding Apache Airflow DAGs and Best Practices

This article explains what Apache Airflow DAGs are, describes their architecture and how they model data pipelines as directed acyclic graphs, and provides practical best‑practice guidelines for writing clean, reproducible, and resource‑efficient workflows.

Apache Airflow · DAG · best practices
0 likes · 10 min read
DataFunTalk
Jun 12, 2022 · Big Data

Huya Offline Job Scheduling System: Design, Baseline Scheduling, and Cost Optimization

This article introduces Huya's offline job scheduling platform, covering its positioning, evolution, system architecture, baseline scheduling techniques, cost‑optimization strategies, resource‑balancing methods, and future intelligent data‑warehouse directions, illustrating how data‑driven automation improves YARN utilization and SLA compliance.

Cost Optimization · DAG · YARN
0 likes · 12 min read
Big Data Technology Architecture
Jun 3, 2022 · Operations

Understanding Apache Airflow DAGs, Operators, and Scheduling

This article explains Apache Airflow's core concepts, including DAG definitions, scheduling intervals, task dependencies, various operators such as BashOperator, PythonOperator, Branch operators, sensors, and custom operators, and provides code examples and configuration details for building robust data pipelines.

Apache Airflow · DAG · Scheduling
0 likes · 15 min read
DataFunTalk
Mar 5, 2022 · Big Data

Designing Cross‑Period Dependencies in Data Scheduling Systems

This article explains how data scheduling systems manage task execution, ETL processes, and cross‑period dependencies by linking task versions, data partitions, and time parameters, and introduces the offset‑and‑cnt model to express dynamic dependencies in big‑data pipelines.

DAG · Data Scheduling · ETL
0 likes · 14 min read
TikTok Frontend Technology Team
Oct 22, 2021 · Fundamentals

Understanding DAG Basics and the Dagre Layout Algorithm with Perfect-Process

This article introduces the fundamentals of directed acyclic graphs (DAGs), explains adjacency representations, details the Dagre layout algorithm’s concepts, computation steps, and constraints, and presents the Perfect‑Process front‑end library that implements these techniques for interactive pipeline diagram rendering and editing.

DAG · algorithm · frontend
0 likes · 13 min read
Architect
Aug 29, 2021 · Backend Development

Design and Implementation of a DAG‑Based Task Orchestration Framework

This article explains how to design and implement a DAG‑based task orchestration framework in Java, covering graph representations, dependency management, executor integration, state tracking, and how to persist workflows and tasks in a relational database for platform‑level usage.

DAG · Executor · java
0 likes · 11 min read
ELab Team
Jul 22, 2021 · Fundamentals

Mastering Hierarchical Graph Layout: A Deep Dive into Sugiyama’s Algorithm

This article explains the principles, steps, and various algorithms behind hierarchical (Sugiyama) graph layout, covering node layering, crossing reduction, coordinate computation, drawing, and practical implementation details with JavaScript libraries such as dagre and d3‑dag.

DAG · Sugiyama algorithm · graph layout
0 likes · 16 min read
IT Architects Alliance
Jul 13, 2021 · Backend Development

Design and Implementation of a DAG‑Based Task Scheduling Framework

This article explains how to build a task‑orchestration framework using directed acyclic graphs (DAG), covering graph representations, Java data structures, dependency management, concurrent execution with thread pools, and persisting workflow state to a relational database for platform‑level use.

DAG · java · task scheduling
0 likes · 11 min read
Architect
May 18, 2021 · Big Data

Design and Optimization of Baidu's Image Processing and Ingestion Platform (Imazon) for Multimodal Retrieval

This article details Baidu's multimodal retrieval architecture, explaining the separation of online and offline services, the design of the Imazon image processing and ingestion platform, its technical indicators, large‑scale streaming and batch pipelines, optimization practices for high throughput, and the underlying content‑relationship engine.

DAG · Image Processing · Multimodal Retrieval
0 likes · 13 min read
Baidu Geek Talk
May 17, 2021 · Artificial Intelligence

Design and Optimization of Baidu's Image Processing and Multimodal Retrieval Platform (Imazon)

The Imazon platform unifies Baidu’s image acquisition, feature extraction, and ANN‑based multimodal retrieval into a cloud‑native, real‑time pipeline that ingests billions of images daily, optimizes storage and GPU usage, reduces message‑queue costs, and ensures high‑throughput, low‑latency search across text, visual, and voice queries.

Cloud Native · DAG · Image Processing
0 likes · 13 min read
Taobao Frontend Technology
May 17, 2021 · Operations

Mastering GitLab CI/CD: Core Concepts, Pipelines, and Best Practices

This article provides a comprehensive overview of GitLab CI/CD, covering its core concepts—pipelines, stages, jobs, and runners—along with .gitlab-ci.yml configuration, variables, triggers, DAG pipelines, runner types, cloud‑native capabilities, efficiency management, and practical demo examples to help teams implement robust DevOps workflows.

CI/CD pipelines · DAG · DevOps
0 likes · 19 min read
High Availability Architecture
Dec 16, 2020 · Backend Development

Implementing Task Scheduling Dependencies and Workflow with Go and DAG

This article explains the concepts of task scheduling dependencies and workflow, introduces graph theory basics such as vertices, edges, and DAGs, and provides a complete Go implementation—including graph structures, BFS traversal, topological sorting, and concurrent execution—to efficiently manage dependent tasks in distributed systems.

DAG · Go · concurrency
0 likes · 10 min read
Alibaba Terminal Technology
Nov 23, 2020 · Frontend Development

Unlock Powerful Graph Editing with AntV X6: Features, APIs, and Demos

This article introduces AntV X6, a versatile JavaScript graph‑editing engine, covering its core capabilities such as node and edge creation, custom HTML/React nodes, rich connection styles, grid and background options, interactive tools like ports, grouping, selection, undo/redo, and provides links to tutorials, demos, and extensibility mechanisms.

AntV X6 · DAG · ER Diagram
0 likes · 11 min read
21CTO
Apr 29, 2019 · Big Data

How EasyScheduler Powers Scalable Big Data Workflow Management

EasyScheduler is an open‑source big‑data workflow scheduler that uses a decentralized architecture with Master and Worker nodes coordinated via ZooKeeper, supporting DAG‑based task definitions, various task types, fault tolerance, priority handling, distributed locks, and remote logging, all illustrated with detailed component diagrams.

Big Data · DAG · Distributed Systems
0 likes · 17 min read
Architecture Digest
Apr 29, 2019 · Big Data

EasyScheduler: An Open‑Source Big Data Workflow Scheduling System – Architecture and Design Overview

This article introduces EasyScheduler, an open‑source big data workflow scheduling system, explaining its core terminology, decentralized architecture, distributed lock implementation, thread‑shortage handling, fault‑tolerance mechanisms, task‑retry and priority designs, as well as its logging solution using Logback and gRPC.

DAG · Scheduler · fault tolerance
0 likes · 14 min read
Alibaba Cloud Developer
Dec 20, 2018 · Big Data

Unlocking Alibaba’s Massive Cluster Data V2018: A Treasure Trove for Big‑Data Research

Alibaba has released the comprehensive Cluster Data V2018 dataset, detailing eight days of operation for 4,000 servers and their mixed online and offline workloads, including DAG information, enabling researchers to study large‑scale data‑center performance, resource utilization, scheduling algorithms, and derive new insights.

Big Data · DAG · Dataset
0 likes · 7 min read
Tongcheng Travel Technology Center
Dec 14, 2018 · Big Data

Design and Architecture of Jarvis: A DAG‑Based Big Data Scheduling Platform

The article describes the design goals, architecture, and key components of Jarvis, an internal DAG‑driven job scheduling platform for big‑data pipelines, covering timed‑shard and workflow schedulers, high‑availability mechanisms, task development for Hive and data‑transfer jobs, dependency handling, APIs, monitoring, and future enhancements.

DAG · Job Scheduling · high availability
0 likes · 17 min read
dbaplus Community
Jul 11, 2018 · Big Data

How 360’s Titan Platform Evolved: From Script Templates to Real‑Time DAG‑Based Data Processing

This article outlines the evolution of 360’s Titan big‑data processing platform, describing the challenges of traditional script‑based development, the three architectural stages (pre‑Titan, Titan 1.0, Titan 2.0), the functional modules, the DITTO component framework, and key takeaways for building flexible, self‑service data pipelines.

DAG · DITTO · Data Platform
0 likes · 14 min read
Qunar Tech Salon
Jul 28, 2017 · Backend Development

Ensuring Transaction System Availability with Rate Limiting, Circuit Breaking, Gray Release, Warm‑up, Automated Diff Testing, ARES Regression Tool, and a DAG‑Based Asynchronous Programming Framework

The article describes how a high‑traffic e‑commerce transaction system improves availability through rate limiting, circuit breaking, gray‑release, JVM warm‑up, an online diff testing tool, the ARES regression platform, and a DAG‑driven asynchronous execution framework to boost throughput and reduce latency.

Automated Testing · Backend · Circuit Breaking
0 likes · 6 min read
High Availability Architecture
Jul 19, 2017 · Artificial Intelligence

Weiflow: A Scalable Machine Learning Workflow Framework for Sina Weibo

The article introduces Weiflow, a dual‑layer DAG‑based machine‑learning workflow framework designed for Sina Weibo, and explains how its modular XML configuration, Scala implementation, and integration with Spark, TensorFlow, Hive, Storm, and Flink improve development efficiency, scalability, and execution performance across the entire ML pipeline.

Big Data · DAG · Scala
0 likes · 16 min read
Hulu Beijing
Aug 14, 2015 · Big Data

How Voidbox Bridges Docker and YARN for Scalable Big Data Workloads

Voidbox integrates Docker containers with YARN to simplify distributed application development, improve deployment, boost cluster efficiency, and provide fault‑tolerant, DAG‑based execution modes, enabling seamless resource management for Hadoop‑based big data jobs.

Big Data · Cluster Computing · DAG
0 likes · 17 min read
Efficient Ops
Jun 25, 2015 · Big Data

Inside Baidu’s 8‑Year Evolution of Hadoop and Distributed Computing

This article chronicles Baidu’s eight‑year journey from early Hadoop adoption to advanced MPI, DAG engines, and real‑time streaming platforms, detailing architectural milestones, performance optimizations, and practical lessons for large‑scale offline and online data processing.

Baidu · DAG · Hadoop
0 likes · 21 min read