Author

Data Party THU

Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.

386

Articles

Likes

592

Views

Comments

Latest from Data Party THU

100 recent articles max

Data Party THU

Apr 12, 2026 · Artificial Intelligence

What’s Driving the Next Wave of LLM Post‑Training? A Deep Dive into SFT, RLHF, GRPO and Emerging Trends

This article systematically reviews the core post‑training techniques for large language models—including supervised fine‑tuning, RLHF, PPO, GRPO, DPO, RLVR and Agentic RL—explains their evolution, compares their trade‑offs, and highlights the most promising research directions for 2025‑2026.

AI alignment · GRPO · LLM

0 likes · 20 min read


Data Party THU

Apr 12, 2026 · Artificial Intelligence

Physics‑Informed GP Model Enables Near‑Infinite Stability in Hot Molecular Dynamics

Researchers from the University of Manchester introduced a physics‑informed Gaussian Process atomic energy model that, unlike traditional machine‑learning potentials, remains stable in molecular dynamics simulations up to 1000 K for tens of nanoseconds, demonstrating robust force predictions and reliable long‑time behavior across diverse molecules.

Gaussian Process · Machine Learning Potentials · computational chemistry

0 likes · 7 min read


Data Party THU

Apr 11, 2026 · Artificial Intelligence

How LLMs Are Uncovering Ultra‑Hard Carbon Allotropes in Minutes

Researchers at Xi'an Jiaotong University built a closed‑loop AI framework centered on a large language model that generates and evaluates thousands of carbon structures, rapidly discovering ultra‑hard, highly anisotropic and novel carbon allotropes such as C16_3, C12 and C8 within minutes.

AI-driven research · LLM · Materials Discovery

0 likes · 7 min read


Data Party THU

Apr 11, 2026 · Artificial Intelligence

How OpenClaw Turns Large Language Models into Actionable AI Agents

This article provides a comprehensive technical breakdown of the OpenClaw AI agent framework, explaining its distinction from base large models, its See‑Think‑Act‑Feedback loop, four‑layer architecture, key capabilities, deployment advantages, and real‑world enterprise use cases.

AI agents · OpenClaw · enterprise AI

0 likes · 17 min read


Data Party THU

Apr 9, 2026 · Fundamentals

Mastering Numeric Feature Scaling: 4 Techniques with Scikit‑Learn

This article explains why numeric feature engineering is essential for machine learning, outlines the challenges of differing scales and outliers, and demonstrates four preprocessing methods—Standardization, Robust Scaler, Power Transformer, and Normalization—using the California housing dataset with detailed code examples and visual analysis.

Normalization · feature scaling · numeric preprocessing

0 likes · 11 min read


Data Party THU

Apr 6, 2026 · Fundamentals

How Energy Distance Detects Distribution Shifts Between Training and Test Sets

Energy Distance is a statistical metric that quantifies the separation between two probability distributions by comparing cross‑distribution and within‑distribution Euclidean distances, enabling detection of data drift, covariate shift, and other multivariate distribution changes, especially when combined with permutation testing for statistical significance.

Energy Distance · data drift · distribution shift

0 likes · 7 min read
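
The summary above describes Energy Distance as a comparison of cross-distribution and within-distribution Euclidean distances, paired with a permutation test. A minimal standard-library sketch of that idea follows. It is an illustration, not the article's own code: the function names, the choice of the plug-in (V-statistic) estimator, and the default of 200 permutations are all assumptions made here.

```python
import math
import random


def mean_pairwise_distance(a, b):
    """Average Euclidean distance over every pair (p in a, q in b)."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))


def energy_distance(x, y):
    """Plug-in estimate of 2*E|X-Y| - E|X-X'| - E|Y-Y'|.

    x and y are sequences of equal-length numeric tuples.
    (Hypothetical helper; the estimator choice is an assumption.)
    """
    return (2 * mean_pairwise_distance(x, y)
            - mean_pairwise_distance(x, x)
            - mean_pairwise_distance(y, y))


def permutation_p_value(x, y, n_permutations=200, seed=0):
    """Share of label shufflings whose statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = energy_distance(x, y)
    pooled = list(x) + list(y)
    n = len(x)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if energy_distance(pooled[:n], pooled[n:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic 2-D data: the "test" set is shifted by 1.5 in each dimension.
    train = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
    test = [(rng.gauss(1.5, 1), rng.gauss(1.5, 1)) for _ in range(40)]
    print("energy distance:", energy_distance(train, test))
    print("permutation p-value:", permutation_p_value(train, test))
```

A small p-value suggests the two samples are unlikely to come from the same distribution. The all-pairs computation is quadratic in sample size, so larger datasets would likely need subsampling or a vectorized implementation.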


Data Party THU

Apr 5, 2026 · Artificial Intelligence

How to Beat Shortcut Learning for Better OOD Generalization in Vision Models

Vision and vision-language models excel on IID benchmarks but often fail on out-of-distribution data because of shortcut learning; this article examines the problem, explains its causes, and proposes data-level and model-level interventions, including StillMix, FLASH, and SPARCL, to improve OOD robustness.

AI research · Model Design · OOD generalization

0 likes · 7 min read


Data Party THU

Apr 5, 2026 · Artificial Intelligence

How Sequential World Models Enable Scalable Multi‑Robot Cooperation

SeqWM introduces a sequential causal decomposition of multi‑robot dynamics, allowing each robot to model its marginal contribution conditioned on preceding agents, which simplifies learning, improves sample efficiency, and yields natural collaborative behaviors both in simulation (Bi‑DexHands, Multi‑Quadruped) and real‑world tests on Unitree Go2‑W, outperforming prior methods.

multi-robot · real-robot · reinforcement-learning

0 likes · 7 min read


Data Party THU

Apr 4, 2026 · Artificial Intelligence

Can a Tiny AI‑Enabled Ring Decode Your Metabolic Odor in Real Time?

A Hong Kong University of Science and Technology team has created a miniature AI‑powered wearable ring that uses a 0.0081 mm² olfactory sensor chip to non‑invasively capture skin‑emitted VOCs, identify diet and activity states, and even quantify alcohol intake, offering a new frontier for continuous health monitoring.

Artificial Intelligence · Nature Communications · health monitoring

0 likes · 8 min read


Data Party THU

Apr 3, 2026 · Artificial Intelligence

Can Attention Replace Residuals? Inside the New Attention Residuals Breakthrough

The article reviews the Kimi team's Attention Residuals approach, which substitutes traditional ResNet additive shortcuts with learned attention‑based weighting. It explains the theoretical motivation linking depth to time, details full‑attention and block‑wise implementations, and presents experimental results showing up to 1.25× compute efficiency and improved performance on reasoning and knowledge tasks.

Deep Learning · Residual Networks · Transformer

0 likes · 11 min read
