Author

GuanYuan Data Tech Team

Practical insights from the GuanYuan Data Tech Team

Latest from GuanYuan Data Tech Team

20 recent articles

GuanYuan Data Tech Team

Jul 28, 2022 · Artificial Intelligence

Unlocking Reinforcement Learning: Core Concepts, Algorithms, and Real‑World Applications

This article introduces reinforcement learning by defining agents, environments, rewards, and policies, explains key concepts such as Markov Decision Processes and Bellman equations, and surveys major algorithms (dynamic programming, Monte‑Carlo, TD learning, policy gradients, Q‑learning, DQN, and evolution strategies), while highlighting practical challenges and notable case studies like AlphaGo Zero.

Deep Learning · Evolution Strategies · MDP

0 likes · 27 min read

GuanYuan Data Tech Team

Jul 21, 2022 · Operations

Mastering Data Workflows with DAGs: Scheduling, Configurable UI, and Visual Design

This article explains how to abstract repetitive data‑report tasks into a standardized workflow, describes the core capabilities of scheduling and configuration, shows how to implement DAG‑based visual editors, and compares similar platforms such as n8n and Orange, offering practical code examples and design insights.

DAG · Data Pipeline · Scheduling

0 likes · 20 min read

GuanYuan Data Tech Team

Jul 14, 2022 · Big Data

How to Train Massive GBDT Models on Spark: A Complete Step‑by‑Step Guide

This article walks through using Apache Spark for large‑scale GBDT training, covering the challenges of massive data, Spark deployment, PySpark code examples, differences from Pandas, feature engineering, mmlspark installation, early‑stopping tricks, performance bottlenecks, and a systematic evaluation of alternative frameworks.

GBDT · Spark · big data

0 likes · 38 min read

GuanYuan Data Tech Team

Jun 30, 2022 · Big Data

Why Spark 3.2 OOMs After Upgrade: Deep Dive into AQE and StageMetrics

After upgrading Spark from 3.0.1 to 3.2.1, an ETL job began failing with OutOfMemory errors. This article examines the root causes, including AQE‑related metric accumulation, skipped stages, and stage‑metric growth, and presents a debugging process and a code‑level fix to mitigate memory pressure.

AQE · OutOfMemory · Spark

0 likes · 13 min read

GuanYuan Data Tech Team

Jun 16, 2022 · Artificial Intelligence

How Deepchecks Automates Data and Model Validation for Reliable AI Pipelines

This article introduces the open‑source Deepchecks library, explains its core concepts of checks, conditions, and suites, and provides step‑by‑step tutorials for data validation, train‑test validation, and model evaluation to help AI engineers build robust, data‑centric machine‑learning workflows.

Data validation · Python · deepchecks

0 likes · 15 min read

GuanYuan Data Tech Team

May 12, 2022 · Backend Development

Why Playwright Beats Selenium for Modern Web Automation

This article compares Playwright and Selenium, highlighting Playwright's superior language support, driver‑less operation, faster startup, reliable auto‑waiting, stable code generation, asynchronous capabilities, and headless mode, then provides step‑by‑step environment setup, practical usage tips, and code examples for Java‑based UI testing.

Playwright · Selenium · UI automation

0 likes · 16 min read

GuanYuan Data Tech Team

May 5, 2022 · Artificial Intelligence

Why FLAML Is the Fast, Lightweight AutoML Framework You Should Try

This article introduces Microsoft’s FLAML, a fast and lightweight AutoML library, explains its design principles, cost‑aware search strategy, key observations, properties, and experimental results, and provides practical code examples for integrating FLAML into Python machine‑learning workflows.

AutoML · Cost-aware Search · FLAML

0 likes · 15 min read

GuanYuan Data Tech Team

Apr 21, 2022 · Information Security

Why Does Keycloak PublicKey Retrieval Hang in Spring Boot? Timeout Fixes Explained

This article analyzes intermittent page failures caused by blocked Keycloak public‑key retrieval in a Spring Boot application, explains how default HTTP client timeouts of -1 lead to indefinite waits, and provides a filter‑based solution to set proper timeouts and adjust internal/external URLs.

Authentication · Keycloak · filter

0 likes · 16 min read

GuanYuan Data Tech Team

Apr 14, 2022 · Artificial Intelligence

Mastering Time Series Forecasting: From Moving Averages to Transformers

Time series forecasting is essential in domains such as weather, finance, and commerce. Time series tasks include classification, clustering, anomaly detection, and, above all, prediction. This article covers definitions, evaluation metrics, traditional methods, machine‑learning approaches, deep‑learning models such as TFT, and emerging AutoML tools, with practical insights and best practices.

AutoML · Deep Learning · GBDT

0 likes · 27 min read

GuanYuan Data Tech Team

Mar 24, 2022 · Big Data

Why Do Spark Card Queries Take 10 Seconds? Uncovering a NAS Mount Issue

A customer’s Spark card queries were consistently taking around 10 seconds, prompting a step‑by‑step investigation that revealed a misconfigured NAS mount option (lookupcache=none) as the root cause of the severe slowdown.

Arthas · NAS · Spark

0 likes · 7 min read
