Author

Data Party THU

The official platform of the Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.

318 articles

Latest from Data Party THU

Apr 14, 2026 · Artificial Intelligence

Heterogeneous Hyperbolic Manifolds for Better Vision-Language Tree Alignment

This paper introduces a framework that constructs dual visual and textual trees and aligns them on heterogeneous hyperbolic manifolds. It addresses asymmetric alignment between modalities in hierarchical classification and achieves state-of-the-art results on benchmarks including CIFAR-100, ImageNet, and Rare Species.

Cross-Attention · Hierarchical Classification · Vision-Language Models

0 likes · 8 min read
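The summary does not give the paper's formulas. As generic background only, the sketch below computes the geodesic distance between two points in a Poincaré ball of curvature -c, the model most hyperbolic hierarchy embeddings use. The function name `poincare_distance` is hypothetical. Giving each modality its own curvature `c` is an illustrative assumption about what "heterogeneous" manifolds could mean, not the paper's construction.

```python
import math


def poincare_distance(x, y, c=1.0):
    """Geodesic distance between x and y in a Poincare ball of curvature -c.

    Hypothetical helper for illustration; both points must lie strictly
    inside the ball of radius 1 / sqrt(c).
    """
    if c <= 0:
        raise ValueError("c must be positive")
    sq_x = sum(v * v for v in x)
    sq_y = sum(v * v for v in y)
    if c * sq_x >= 1.0 or c * sq_y >= 1.0:
        raise ValueError("points must lie strictly inside the ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    arg = 1.0 + 2.0 * c * sq_diff / ((1.0 - c * sq_x) * (1.0 - c * sq_y))
    return math.acosh(arg) / math.sqrt(c)


# Assumption for illustration: each modality lives on a ball with its own curvature.
root, child, leaf = (0.0, 0.0), (0.5, 0.0), (0.8, 0.0)
for c in (1.0, 1.5):
    print(c, poincare_distance(root, child, c), poincare_distance(child, leaf, c))
```

At c = 1, moving from 0.5 to 0.8 along the same ray covers the same hyperbolic distance (ln 3) as moving from the origin to 0.5. Distances grow rapidly toward the boundary, and that extra room is what lets a ball hold exponentially many leaves of a tree.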

Apr 14, 2026 · Backend Development

10 Advanced Pydantic V2 Tricks to Harden Your FastAPI Production

Ten essential Pydantic V2 techniques for preventing subtle bugs and keeping production FastAPI APIs robust and secure: strict mode, field constraints, separate create/update/response models, cross-field validators, custom error handling, reusable types, forbidding extra fields, nested models, computed fields, and discriminated unions.

Backend Development · FastAPI · Pydantic

0 likes · 17 min read
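The following is a condensed sketch of several of these techniques, assuming Pydantic 2.x and Python 3.9+. The order and payment models, their field names, and the discount rule are hypothetical examples, not taken from the article. Custom error handling (for instance, a FastAPI exception handler) is left out.

```python
from typing import Annotated, Literal, Optional, Union

from pydantic import (
    BaseModel,
    ConfigDict,
    Field,
    ValidationError,
    computed_field,
    model_validator,
)

# Reusable constrained types; strict=True rejects "2" instead of coercing it to 2.
Quantity = Annotated[int, Field(gt=0, strict=True)]
NonNegative = Annotated[float, Field(ge=0)]


class ApiModel(BaseModel):
    # Unknown keys raise a validation error instead of being silently dropped.
    model_config = ConfigDict(extra="forbid")


class CardPayment(ApiModel):
    kind: Literal["card"]
    last4: str = Field(min_length=4, max_length=4)


class BankPayment(ApiModel):
    kind: Literal["bank"]
    iban: str = Field(min_length=15)


# Discriminated union: the "kind" value picks the nested model to validate against.
Payment = Annotated[Union[CardPayment, BankPayment], Field(discriminator="kind")]


class OrderCreate(ApiModel):
    quantity: Quantity
    unit_price: NonNegative
    discount: NonNegative = 0.0
    payment: Payment

    @model_validator(mode="after")
    def discount_within_subtotal(self) -> "OrderCreate":
        # Cross-field rule that no single-field constraint can express.
        if self.discount > self.quantity * self.unit_price:
            raise ValueError("discount cannot exceed quantity * unit_price")
        return self


class OrderUpdate(ApiModel):
    # Separate update model: every field is optional for partial updates.
    quantity: Optional[Quantity] = None
    discount: Optional[NonNegative] = None


class OrderResponse(BaseModel):
    id: int
    quantity: int
    unit_price: float
    discount: float

    @computed_field
    @property
    def total(self) -> float:
        # Derived value that appears in model_dump() and the serialized JSON.
        return self.quantity * self.unit_price - self.discount


try:
    OrderCreate.model_validate(
        {"quantity": "2", "unit_price": 2.5, "is_admin": True,
         "payment": {"kind": "card", "last4": "4242"}}
    )
except ValidationError as exc:
    print(exc.errors())  # one error for the string quantity, one for the extra key
```

In a FastAPI route, declaring `OrderCreate` as the request body and `OrderResponse` as the `response_model` keeps client-supplied fields separate from server-generated ones such as `id` and `total`.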

Apr 12, 2026 · Artificial Intelligence

What’s Driving the Next Wave of LLM Post‑Training? A Deep Dive into SFT, RLHF, GRPO and Emerging Trends

This article systematically reviews the core post-training techniques for large language models, including supervised fine-tuning, RLHF, PPO, GRPO, DPO, RLVR, and Agentic RL. It traces their evolution, compares their trade-offs, and highlights the most promising research directions for 2025–2026.

AI alignment · GRPO · LLM

0 likes · 20 min read
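To make two of these objectives concrete, here is a minimal pure-Python sketch of GRPO's group-relative advantage and the per-pair DPO loss, in their commonly published forms. The function names are hypothetical. Real trainers operate on batched, token-level log-probabilities, and GRPO variants differ on details such as population versus sample standard deviation.

```python
import math
import statistics


def grpo_advantages(rewards, eps=1e-6):
    """Score each sampled response against its own group: (r - mean) / std.

    The group mean acts as the baseline, which is how GRPO avoids
    training a separate value (critic) model as PPO does.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log(sigmoid(beta * margin)) for one preference pair.

    Inputs are summed log-probabilities of the chosen and rejected responses
    under the policy being trained and under the frozen reference model.
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical example: four sampled answers to one prompt, rewarded 1.0 when a verifier accepts them.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))
# The policy favors the chosen answer more than the reference does, so the loss is below log 2.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

When every response in a group earns the same reward, all advantages are zero, so that prompt contributes no policy-gradient signal in this form.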

Apr 12, 2026 · Artificial Intelligence

Physics‑Informed GP Model Enables Near‑Infinite Stability in Hot Molecular Dynamics

Researchers from the University of Manchester introduced a physics‑informed Gaussian Process atomic energy model that, unlike traditional machine‑learning potentials, remains stable in molecular dynamics simulations up to 1000 K for tens of nanoseconds, demonstrating robust force predictions and reliable long‑time behavior across diverse molecules.

Gaussian Process · Machine Learning Potentials · computational chemistry

0 likes · 7 min read

Apr 11, 2026 · Artificial Intelligence

How LLMs Are Uncovering Ultra‑Hard Carbon Allotropes in Minutes

Researchers at Xi'an Jiaotong University built a closed-loop AI framework, centered on a large language model, that generates and evaluates thousands of carbon structures and discovers ultra-hard, highly anisotropic, and novel carbon allotropes such as C16_3, C12, and C8 within minutes.

AI-driven research · LLM · Materials Discovery

0 likes · 7 min read

Apr 11, 2026 · Artificial Intelligence

How OpenClaw Turns Large Language Models into Actionable AI Agents

This article gives a detailed technical breakdown of the OpenClaw AI agent framework: how it differs from a base large language model, its See-Think-Act-Feedback loop, its four-layer architecture, key capabilities, deployment advantages, and real-world enterprise use cases.

AI agents · Enterprise AI · OpenClaw

0 likes · 17 min read
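The summary does not describe OpenClaw's interfaces, so the sketch below is a generic, hypothetical See-Think-Act-Feedback loop, not OpenClaw code. `AgentLoop`, the `think` callable (a stand-in for a model call), and the tool registry are illustrative names. The point is the control flow: the agent observes, decides, acts through a tool, and feeds the result back into its next decision.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AgentLoop:
    """Hypothetical See-Think-Act-Feedback loop (not OpenClaw's actual API)."""

    think: Callable[[List[str]], Dict[str, str]]  # stand-in for an LLM call
    tools: Dict[str, Callable[[str], str]]
    max_steps: int = 5
    history: List[str] = field(default_factory=list)

    def run(self, task: str) -> str:
        self.history.append(f"task: {task}")  # See: record the request
        for _ in range(self.max_steps):
            decision = self.think(self.history)  # Think: choose the next action
            action, arg = decision["action"], decision["input"]
            if action == "finish":
                return arg
            tool = self.tools.get(action)
            # Act: run the chosen tool, or report an unknown tool back to the model.
            observation = tool(arg) if tool else f"error: unknown tool {action!r}"
            self.history.append(f"observation: {observation}")  # Feedback
        return "stopped: step limit reached"


def scripted_think(history: List[str]) -> Dict[str, str]:
    # Hypothetical stand-in for the model: call one tool, then answer with its result.
    observations = [h for h in history if h.startswith("observation: ")]
    if not observations:
        return {"action": "word_count", "input": "see think act feedback"}
    return {"action": "finish", "input": observations[-1][len("observation: "):]}


agent = AgentLoop(think=scripted_think, tools={"word_count": lambda s: str(len(s.split()))})
print(agent.run("How many words are in 'see think act feedback'?"))  # prints 4
```

The step limit and the error-as-observation path are common safeguards in agent loops. Without them, a model that keeps requesting a missing tool would never terminate.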

Apr 9, 2026 · Fundamentals

Mastering Numeric Feature Scaling: 4 Techniques with Scikit‑Learn

This article explains why numeric feature engineering matters for machine learning and outlines the problems caused by differing feature scales and by outliers. It then demonstrates four preprocessing methods (standardization, robust scaling, power transformation, and normalization) on the California housing dataset, with code examples and visual analysis.

Normalization · feature scaling · numeric preprocessing

0 likes · 11 min read
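To build intuition for three of these transforms, here is a plain-Python sketch on a single toy column, with no scikit-learn required. The helper names and data are hypothetical. The formulas mirror what scikit-learn's `StandardScaler` (population standard deviation), `RobustScaler` (median and 25th to 75th percentile range), and `Normalizer` (per-row L2 norm) compute by default. `PowerTransformer` is omitted because fitting its Yeo-Johnson parameter requires a numerical optimizer.

```python
import math
import statistics


def standardize(column):
    """Zero mean and unit variance, using the population standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column) or 1.0  # constant column: leave scale at 1
    return [(v - mean) / std for v in column]


def robust_scale(column):
    """Subtract the median and divide by the interquartile range (Q3 - Q1)."""
    q1, median, q3 = statistics.quantiles(column, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0
    return [(v - median) / iqr for v in column]


def l2_normalize_row(row):
    """Rescale one sample (a row, not a column) to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in row))
    return list(row) if norm == 0 else [v / norm for v in row]


column = [1.0, 2.0, 3.0, 4.0, 100.0]  # toy feature with one extreme outlier
print(standardize(column))   # the outlier squeezes the other four values together
print(robust_scale(column))  # [-1.0, -0.5, 0.0, 0.5, 48.5]
print(l2_normalize_row([3.0, 4.0]))  # [0.6, 0.8]
```

The outlier inflates both the mean and the standard deviation, so standardization crams the four ordinary values into a narrow band. The median and IQR ignore it, which is why robust scaling keeps them well separated.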

Apr 6, 2026 · Fundamentals

How Energy Distance Detects Distribution Shifts Between Training and Test Sets

Energy distance is a statistical metric that measures how far apart two probability distributions are. It compares the average Euclidean distance between samples drawn from different distributions with the average distance between samples drawn from the same distribution. It can detect data drift, covariate shift, and other multivariate distribution changes, and pairing it with a permutation test turns it into a test of statistical significance.

Data Drift · Energy Distance · Permutation Test

0 likes · 7 min read
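Below is a minimal pure-Python sketch of the statistic and the permutation test. It assumes small samples, since the pairwise loops are O(n²), and the function names are hypothetical. `energy_distance` returns the squared (V-statistic) form 2E|X-Y| - E|X-X'| - E|Y-Y'|. SciPy's `scipy.stats.energy_distance`, which handles 1-D samples, reports the square root of this quantity.

```python
import math
import random


def _mean_pairwise(a, b):
    """Average Euclidean distance over all pairs (p, q) with p in a and q in b."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))


def energy_distance(x, y):
    """Squared energy distance 2E|X-Y| - E|X-X'| - E|Y-Y'| (V-statistic form)."""
    return 2 * _mean_pairwise(x, y) - _mean_pairwise(x, x) - _mean_pairwise(y, y)


def permutation_test(x, y, n_permutations=199, seed=0):
    """Return (statistic, p-value) for H0: x and y share one distribution."""
    rng = random.Random(seed)
    observed = energy_distance(x, y)
    pooled = list(x) + list(y)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random and recompute the statistic.
        rng.shuffle(pooled)
        if energy_distance(pooled[:len(x)], pooled[len(x):]) >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed split itself, so p is never exactly 0.
    return observed, (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical 2-D "train" and "test" samples; the test set's first feature is shifted.
rng = random.Random(42)
train = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
test = [(rng.gauss(0.8, 1), rng.gauss(0, 1)) for _ in range(30)]
stat, p = permutation_test(train, test)
print(f"energy distance = {stat:.3f}, p = {p:.3f}")
```

Shuffling the pooled sample simulates the null hypothesis that both sets come from one distribution. The p-value is the share of shuffles whose statistic is at least as large as the observed one.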

Apr 5, 2026 · Artificial Intelligence

How to Beat Shortcut Learning for Better OOD Generalization in Vision Models

Vision and vision-language models excel on IID benchmarks but often fail on out-of-distribution data because of shortcut learning. This article examines the problem, explains its causes, and presents data-level and model-level interventions, including StillMix, FLASH, and SPARCL, for improving OOD robustness.

AI research · Data Augmentation · OOD generalization

0 likes · 7 min read

Apr 5, 2026 · Artificial Intelligence

How Sequential World Models Enable Scalable Multi‑Robot Cooperation

SeqWM introduces a sequential causal decomposition of multi-robot dynamics in which each robot models its marginal contribution conditioned on the preceding agents. This simplifies learning, improves sample efficiency, and yields natural collaborative behaviors. SeqWM outperforms prior methods both in simulation (Bi-DexHands, Multi-Quadruped) and in real-world tests on Unitree Go2-W robots.

multi-robot · real-robot · reinforcement-learning

0 likes · 7 min read
