GuanYuan Data Tech Team
Author

Practical insights from the GuanYuan Data Tech Team

20 articles · 0 likes · 41 views · 0 comments
Recent Articles

GuanYuan Data Tech Team
Jul 28, 2022 · Artificial Intelligence

Unlocking Reinforcement Learning: Core Concepts, Algorithms, and Real‑World Applications

This article introduces reinforcement learning by defining agents, environments, rewards, and policies, explains key concepts such as Markov Decision Processes and Bellman equations, and surveys major algorithms—including dynamic programming, Monte‑Carlo, TD learning, policy gradients, Q‑learning, DQN, and evolution strategies—while highlighting practical challenges and notable case studies like AlphaGo Zero.

Evolution Strategies · MDP · Policy Gradient
0 likes · 27 min read
GuanYuan Data Tech Team
Jul 21, 2022 · Operations

Mastering Data Workflows with DAGs: Scheduling, Configurable UI, and Visual Design

This article explains how to abstract repetitive data‑report tasks into a standardized workflow, describes the core capabilities of scheduling and configuration, shows how to implement DAG‑based visual editors, and compares similar platforms such as n8n and Orange, offering practical code examples and design insights.

DAG · backend · configuration UI
0 likes · 20 min read
GuanYuan Data Tech Team
Jul 14, 2022 · Big Data

How to Train Massive GBDT Models on Spark: A Complete Step‑by‑Step Guide

This article walks through using Apache Spark for large‑scale GBDT training, covering the challenges of massive data, Spark deployment, PySpark code examples, differences from Pandas, feature engineering, mmlspark installation, early‑stopping tricks, performance bottlenecks, and a systematic evaluation of alternative frameworks.

Big Data · GBDT · Spark
0 likes · 38 min read
GuanYuan Data Tech Team
Jun 30, 2022 · Big Data

Why Spark 3.2 OOMs After Upgrade: Deep Dive into AQE and StageMetrics

After upgrading Spark from 3.0.1 to 3.2.1, an ETL job began failing with OutOfMemory errors. This article examines the root causes, including AQE‑related metric accumulation, skipped stages, and stage‑metric growth, then presents the debugging process and a code‑level fix that mitigates the memory pressure.

AQE · Big Data · OutOfMemory
0 likes · 13 min read
GuanYuan Data Tech Team
Jun 16, 2022 · Artificial Intelligence

How Deepchecks Automates Data and Model Validation for Reliable AI Pipelines

This article introduces the open‑source Deepchecks library, explains its core concepts of checks, conditions, and suites, and provides step‑by‑step tutorials for data validation, train‑test validation, and model evaluation to help AI engineers build robust, data‑centric machine‑learning workflows.

Python · data validation · deepchecks
0 likes · 15 min read
GuanYuan Data Tech Team
May 12, 2022 · Backend Development

Why Playwright Beats Selenium for Modern Web Automation

This article compares Playwright and Selenium, highlighting Playwright's superior language support, driver‑less operation, faster startup, reliable auto‑waiting, stable code generation, asynchronous capabilities, and headless mode, then provides step‑by‑step environment setup, practical usage tips, and code examples for Java‑based UI testing.

Java · Playwright · UI Automation
0 likes · 16 min read
GuanYuan Data Tech Team
May 5, 2022 · Artificial Intelligence

Why FLAML Is the Fast, Lightweight AutoML Framework You Should Try

This article introduces Microsoft’s FLAML, a fast and lightweight AutoML library, explains its design principles, cost‑aware search strategy, key observations, properties, and experimental results, and provides practical code examples for integrating FLAML into Python machine‑learning workflows.

AutoML · Cost-aware Search · FLAML
0 likes · 15 min read
GuanYuan Data Tech Team
Apr 14, 2022 · Artificial Intelligence

Mastering Time Series Forecasting: From Moving Averages to Transformers

Time series analysis, essential across weather, finance, and commerce, covers tasks such as classification, clustering, anomaly detection, and above all forecasting. This article explores the definitions and evaluation metrics of forecasting, traditional methods, machine‑learning approaches, deep‑learning models such as TFT, and emerging AutoML tools, offering practical insights and best practices.

AutoML · GBDT · Prophet
0 likes · 27 min read