Tagged articles
268 articles
Machine Heart
May 20, 2026 · Industry Insights

ByteDance Scholarship Goes Global: Tracking the Careers of 67 Winners Over Five Years

The 2026 ByteDance Scholarship opens to worldwide applicants, expands slots and funding, and now accepts stage‑level results. A five‑year review follows 67 awardees (PhD, master's, and undergraduate students from top universities) who have entered top AI labs, founded startups, or taken faculty positions, illustrating how early‑stage research often precedes industry trends.

AI research · AI scholarship · ByteDance
0 likes · 12 min read
Machine Heart
May 17, 2026 · Artificial Intelligence

What Exactly Is a World Model? History, Technology, and the $10 B Bet

The article traces the two decades‑long, parallel research lines that birthed video world models—dreaming agents in reinforcement learning and learning physics from human video—explains how they converged in 2024‑2025, evaluates current capabilities and limitations, and analyzes the $10 billion investment landscape and strategic moves by NVIDIA, OpenAI, and others.

AI research · Robotics · Video Generation
0 likes · 32 min read
Data Party THU
Apr 30, 2026 · Artificial Intelligence

Turning Transformers into Mamba: How Apple Linearized Inference Costs

Apple introduced a two‑step cross‑architecture distillation method that converts costly quadratic‑time Transformers into cheaper linear‑time Mamba models, preserving most of the original performance while dramatically reducing inference cost.

AI research · Linear Attention · Mamba
0 likes · 8 min read
Machine Heart
Apr 30, 2026 · Artificial Intelligence

Can a Pre‑1930 Language Model Infer Einstein’s Relativity? Insights from the Talkie‑1930 Project

Researchers built Talkie‑1930, a 13‑billion‑parameter model trained only on texts published before 1931, and used surprise‑based metrics, programming tests, and a modern‑twin comparison to explore how far such a historically constrained model can extrapolate future knowledge and to expose data‑leakage challenges.

AI research · HumanEval · data leakage
0 likes · 10 min read
PaperAgent
Apr 30, 2026 · Artificial Intelligence

How Agentic AI is Redefining World Modeling

The article reviews the paper "Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond", introducing a two‑axis framework (capability levels L1‑L3 and law domains) to map diverse world‑modeling systems, highlighting that most current systems stall at L1, that explicit law encoding is crucial for long‑term stability, and that L3 represents the ultimate, self‑evolving model.

AI Agents · AI research · Agentic AI
1 like · 6 min read
Data Party THU
Apr 29, 2026 · Artificial Intelligence

How Far Can Unsupervised RL for Large Models Go? A Systematic Answer from a Tsinghua Team

The article analyzes the scaling limits of unsupervised reinforcement learning for large language models, revealing that intrinsic‑reward methods initially boost performance but inevitably collapse, proposes a unified theory and a model‑collapse metric to predict trainability, and argues that external‑reward approaches are the scalable path forward.

AI research · RL scaling · external rewards
0 likes · 11 min read
Machine Heart
Apr 25, 2026 · Artificial Intelligence

Can Multi-Model Co-Evolution Shatter the Single-Model Ceiling? Squeeze Evolve Achieves Validator-Free SOTA Inference

The paper introduces Squeeze Evolve, a validator‑free multi‑model evolutionary framework that orchestrates diverse large language models to break the performance ceiling of any single model, delivering up to 23‑point accuracy improvements and 1.4‑3.3× cost reductions across math, vision, and scientific benchmarks.

AI research · Inference Optimization · Squeeze Evolve
0 likes · 8 min read
Wu Shixiong's Large Model Academy
Apr 23, 2026 · Industry Insights

Should You Take a Tencent AI Internship? Key Factors to Consider

The article examines whether a Tencent AI internship is worth pursuing by analyzing the program’s growth stage, unique user ecosystem, mentorship structure, compensation model, and early‑year advantages, illustrated with real intern case studies, to help students decide what they aim to gain from the experience.

AI internship · AI research · Tech Industry
0 likes · 14 min read
Machine Learning Algorithms & Natural Language Processing
Apr 16, 2026 · Artificial Intelligence

Evidence Mining for Explainable AI: Methods and Applications

The talk introduces evidence‑mining techniques that extract supporting information from input text to improve model explainability, discusses the shortcut‑learning pitfalls of existing methods, and presents a new approach that enhances reliability and integrates with large‑model chain‑of‑thought compression for more interpretable, efficient reasoning.

AI research · evidence mining · explainable AI
0 likes · 4 min read
Meituan Technology Team
Apr 16, 2026 · Artificial Intelligence

Can End-to-End Diffusion TTS Beat Traditional Pipelines? Inside LongCat-AudioDiT

LongCat-AudioDiT introduces a wave‑VAE plus diffusion Transformer architecture that eliminates intermediate spectrograms, solves training‑inference mismatch with dual constraints, replaces classifier‑free guidance with adaptive projection guidance, and achieves state‑of‑the‑art zero‑shot voice cloning performance on multiple benchmarks.

AI research · audio generation · diffusion model
0 likes · 12 min read
Data STUDIO
Apr 14, 2026 · Artificial Intelligence

Can ChatGPT Deep Research Double Your Research Efficiency?

The article explains how ChatGPT Deep Research transforms ordinary web searches into full‑fledged research reports, compares three leading Deep Research tools, outlines nine practical use cases, warns of common pitfalls, and offers prompt‑engineering tips for both individual and enterprise adoption.

AI research · ChatGPT · Deep Research
0 likes · 16 min read
Machine Learning Algorithms & Natural Language Processing
Apr 9, 2026 · Artificial Intelligence

Google DeepMind’s Deep Think Dominates Eight Language Olympiads and Solves Four AI Challenges

Google DeepMind’s Deep Think model posted top‑tier scores in eight language‑specific Olympiads—from IMO gold to ICPC finals—while also tackling open scientific problems, yet the results rely on internal evaluations without third‑party verification, highlighting both a breakthrough in multilingual AI reasoning and the need for transparent benchmarking.

AI benchmarking · AI research · Deep Think
0 likes · 9 min read
PaperAgent
Apr 6, 2026 · Artificial Intelligence

Can LLMs Self‑Improve After Deployment? Inside Microsoft’s Online Experiential Learning

Microsoft’s Online Experiential Learning framework lets large language models continuously self‑evolve after deployment by extracting experience from user interactions and consolidating it into model parameters, eliminating the need for human labels, reward models, or server‑side environment access, and demonstrating scalable gains across tasks and model sizes.

AI research · LLM · Online Learning
0 likes · 9 min read
Data Party THU
Apr 5, 2026 · Artificial Intelligence

How to Beat Shortcut Learning for Better OOD Generalization in Vision Models

Visual and vision-language models excel under IID benchmarks but often fail on out-of-distribution data due to shortcut learning; this article examines the problem, explains its causes, and proposes data-level and model-level interventions—including StillMix, FLASH, and SPARCL—to improve OOD robustness.

AI research · Model Design · OOD generalization
0 likes · 7 min read
Machine Learning Algorithms & Natural Language Processing
Apr 2, 2026 · Artificial Intelligence

How Large Language Models Can Self‑Improve: A Technical Review and Future Outlook

This article surveys the emerging self‑improvement paradigm for large language models, presenting a closed‑loop lifecycle comprising data acquisition, selection, model optimization, inference refinement, and an autonomous evaluation layer, and discusses current limitations and research directions toward fully autonomous LLM evolution.

AI research · LLM · autonomous evaluation
0 likes · 11 min read
PaperAgent
Mar 28, 2026 · Artificial Intelligence

How ACCORD Breaks Concept Coupling in Custom Text‑to‑Image Generation

The ACCORD framework formalizes the concept‑coupling issue in text‑to‑image diffusion models as a statistical dependency problem and resolves it with two plug‑and‑play regularization losses, dramatically improving fidelity and text control without altering model architecture.

ACCORD · AI research · concept coupling
0 likes · 7 min read
SuanNi
Mar 25, 2026 · Artificial Intelligence

How LeWorldModel Learns Physics from Pixels in Hours – A Deep Dive

LeWorldModel (LeWM) is a compact AI world model that learns real‑world physics directly from raw pixel streams using only two simple mathematical rules, achieving dramatically faster planning and robust physical intuition compared to prior large‑scale models.

AI research · Model Predictive Control · physics learning
0 likes · 14 min read
AI Architecture Hub
Mar 25, 2026 · Artificial Intelligence

How Memento-Skills Enables Continuous Learning for Frozen LLM Agents

The article analyzes the limitations of frozen LLM agents (fixed parameters, loss of state, and costly fine‑tuning) and introduces the Memento‑Skills framework, which adds an external, evolvable skill memory to enable deployment‑time learning; it then details the architecture, optimization knobs, and experimental gains.

AI research · Deployment-Time Learning · LLM agents
0 likes · 14 min read
AIWalker
Mar 20, 2026 · Artificial Intelligence

Plug‑and‑Play reAR Boosts Visual AR to SOTA Quality with Only 177M Parameters

The paper introduces reAR, a plug‑and‑play regularization framework that aligns generator and tokenizer representations in visual autoregressive models, dramatically improving image quality and matching large diffusion models while using far fewer parameters, and validates the approach with extensive experiments, ablations, and scalability analysis.

AI research · Regularization · image generation
0 likes · 20 min read
AIWalker
Mar 17, 2026 · Artificial Intelligence

How a 4B-Parameter Open-Source Model Outperforms 14B Multimodal Giants

InternVL-U, a 4‑billion‑parameter unified multimodal model released as open source, combines a 2B MLLM backbone with a 1.7B visual generation head and, through a reasoning‑centric data pipeline and Chain‑of‑Thought guidance, achieves superior understanding, generation, and editing performance that surpasses much larger 14‑20B models on multiple benchmarks.

AI research · InternVL-U · image generation
0 likes · 22 min read
AI Architecture Path
Mar 17, 2026 · Artificial Intelligence

Automating LLM Tuning with Autoresearch: AI Agents on a Single GPU

Autoresearch, an open‑source project by Andrej Karpathy, enables AI agents to autonomously modify code, run experiments, and evaluate results for LLM tuning on a single GPU, dramatically reducing manual hyper‑parameter work, standardizing experiments, and offering low‑cost, reproducible research with clear limitations and practical setup steps.

AI research · Autonomous Agents · LLM tuning
0 likes · 11 min read
AI Explorer
Mar 15, 2026 · Artificial Intelligence

Large Models May Break Language Training Dependence, Redefining Intelligence

A new study suggests that large AI models could reduce their reliance on massive text corpora by early‑fusing multimodal data such as video and sensor streams, potentially slashing training costs, improving generalization, and prompting a shift toward more embodied notions of intelligence.

AI research · Embodied Intelligence · Multimodal Learning
0 likes · 6 min read
AI Engineering
Mar 10, 2026 · Artificial Intelligence

Yann LeCun’s New AMI Labs Secures $1.03B to Build a World‑Model Alternative to LLMs

Yann LeCun and Alexandre LeBrun have launched AMI Labs, raising $1.03 billion in Europe’s largest seed round to develop JEPA—a world‑model architecture intended to replace LLMs for high‑risk domains, with all code and papers open‑sourced, a 5‑10‑year horizon, and backing from NVIDIA, Samsung, Bezos’ venture, and others.

AI research · AMI Labs · JEPA
0 likes · 3 min read
AIWalker
Mar 8, 2026 · Artificial Intelligence

How VisionPangu’s 1.7B Model Beats Larger LLMs in Detailed Image Captioning

VisionPangu demonstrates that a compact 1.7 B‑parameter multimodal model can generate richly detailed, coherent image descriptions that rival much larger models by leveraging high‑quality dense data, a three‑part architecture, and a two‑stage deep alignment training strategy.

AI research · Data Quality · Image Captioning
0 likes · 13 min read
Machine Learning Algorithms & Natural Language Processing
Mar 6, 2026 · Artificial Intelligence

15‑Person Overseas Chinese Team Builds Uni‑1, a Unified Image Model Surpassing Nano Banana

The article reviews Uni‑1, a decoder‑only transformer that unifies visual understanding and generation, details its architecture, benchmark superiority on RISEBench and ODinW‑13, showcases diverse visual examples where it outperforms GPT Image 1.5 and Nano Banana Pro, and highlights the small elite team behind the breakthrough.

AI research · Luma AI · RISEBench
0 likes · 14 min read
Machine Learning Algorithms & Natural Language Processing
Mar 5, 2026 · Artificial Intelligence

Can AI Self‑Improve? Inside a Stanford PhD Defense on Continually Self‑Improving AI

Zitong Yang’s Stanford PhD defense introduced “continually self‑improving AI,” a system that autonomously refines its own parameters, generates synthetic training data, and even designs its own learning algorithms, with experiments on synthetic continual training, synthetic‑bootstrap pre‑training, and AI‑design‑AI demonstrating measurable gains over static baselines.

AI research · continual learning · pretraining
0 likes · 35 min read
AI Frontier Lectures
Feb 28, 2026 · Artificial Intelligence

Can Reinforcement Learning Revolutionize Text-to-3D Generation? A Deep Dive

This article presents a systematic investigation of applying reinforcement learning to text‑to‑3D generation, detailing reward design, algorithm selection, a new 3D benchmark, a hierarchical GRPO framework, extensive ablations, and the resulting performance gains and limitations.

AI research · Generative Models · reinforcement learning
0 likes · 13 min read
PaperAgent
Feb 25, 2026 · Artificial Intelligence

How Contextual Co-Player Inference Enables Robust Multi-Agent Cooperation

Two recent Google papers advance multi‑agent reinforcement learning: one introduces contextual co‑player inference to achieve robust cooperation without explicit meta‑learning, while the other presents AlphaEvolve, a large‑language‑model‑driven evolutionary framework that automatically discovers novel MARL algorithms such as VAD‑CFR and SHOR‑PSRO.

AI research · CFR · LLM-driven algorithm discovery
0 likes · 13 min read
Machine Learning Algorithms & Natural Language Processing
Feb 14, 2026 · Artificial Intelligence

Latent Forcing: Reordering Diffusion Steps Boosts Pixel‑Level Image Quality

The new Latent Forcing technique from Fei‑Fei Li’s team reorders the diffusion trajectory, first generating a latent structural sketch and then refining pixel details, which restores the efficiency of latent‑space models while preserving 100% pixel fidelity and achieving state‑of‑the‑art FID scores on ImageNet‑256.

AI research · ImageNet · diffusion models
0 likes · 6 min read
Machine Learning Algorithms & Natural Language Processing
Feb 10, 2026 · Artificial Intelligence

LeCun Team’s Triple Breakthrough: Sparse Representations, Gradient Planning, and Lightweight JEPA for World Models

LeCun’s three new papers—Rectified LpJEPA, GRASP, and EB‑JEPA—address dense feature bottlenecks, inefficient gradient‑free planning, and heavyweight codebases by introducing sparsity‑preserving regularization, a parallel gradient‑based planner, and a lightweight modular library, delivering high‑performance world‑model representations that run on a single GPU.

AI research · JEPA · World Models
0 likes · 11 min read
JD Cloud Developers
Feb 4, 2026 · Artificial Intelligence

How Deep Research Transforms LLMs into Autonomous AI Researchers

This article examines Deep Research, an AI system that adds autonomous planning and deep reasoning to large language models, enabling them to browse the web, perform long‑chain reasoning, and generate professional, citation‑rich reports for complex tasks such as industry trend analysis and technical competitive research.

AI research · Autonomous Agents · LLM
0 likes · 22 min read
JD Tech Talk
Feb 4, 2026 · Artificial Intelligence

How Deep Research Turns LLMs into Autonomous AI Researchers

This article explains the background, core features, underlying ReAct‑based architecture, and engineering solutions of Deep Research—a system that equips large language models with autonomous planning, long‑chain reasoning, and professional report generation to tackle complex information‑intensive tasks.

AI research · Autonomous Agents · LLM
0 likes · 21 min read
PaperAgent
Feb 2, 2026 · Artificial Intelligence

How Kimi K2.5 Achieves Multimodal Mastery with Joint Training and Agent Swarms

The Kimi K2.5 technical report reveals how a Chinese team combined joint text‑vision training, a novel Zero‑Vision SFT method, and a parallel agent‑swarm architecture to deliver top‑ranked multimodal performance, dramatically faster inference, and open‑source access for broader AI research.

AI research · Agent Swarm · Kimi-K2.5
0 likes · 9 min read
Data Party THU
Jan 31, 2026 · Artificial Intelligence

Can LLMs Learn While Being Tested? Inside the TTT-Discover Breakthrough

The article examines the Test‑Time Training to Discover (TTT‑Discover) approach, which applies reinforcement learning during inference to let large language models continuously improve on single test problems, and reports strong results across mathematics, GPU kernel optimization, algorithm design, and biology.

AI research · LLM · Scientific Discovery
0 likes · 9 min read
AI Frontier Lectures
Jan 30, 2026 · Artificial Intelligence

How SplatSSC Revolutionizes Semantic Scene Completion with Depth‑Guided Gaussian Splatting

SplatSSC introduces a depth‑guided Gaussian splatting framework that replaces random primitive initialization with geometry‑aware priors and a decoupled aggregation module, achieving state‑of‑the‑art performance on indoor semantic scene completion while dramatically reducing computational overhead and eliminating floaters.

3D perception · AI research · Gaussian splatting
0 likes · 10 min read
Baobao Algorithm Notes
Jan 26, 2026 · Artificial Intelligence

From Search Ads to Foundation Models: My Journey Building the EvoCUA GUI Agent

The author explains why he transitioned from search advertising algorithms to foundation model research, outlines the four typical activities of base‑model teams, and shares detailed technical insights, experimental practices, and scaling strategies that led the EvoCUA GUI Agent to achieve open‑source SOTA on OSWorld.

AI research · GUI agents · Model Scaling
0 likes · 17 min read
PaperAgent
Jan 25, 2026 · Artificial Intelligence

How Deep GraphRAG Solves Retrieval’s Three‑Way Dilemma with Hierarchical Search

Deep GraphRAG tackles the three‑fold dilemma of traditional Retrieval‑Augmented Generation by introducing hierarchical global‑to‑local retrieval, a beam‑search dynamic reordering that cuts latency, and a DW‑GRPO reinforcement‑learning module that adaptively weights rewards, achieving near‑state‑of‑the‑art performance with up to 86% faster inference.

AI research · GraphRAG · Hierarchical Retrieval
0 likes · 5 min read
PaperAgent
Jan 25, 2026 · Industry Insights

Top 10 Chinese Large Models to Watch: Features, Benchmarks, and Download Links

This roundup highlights ten cutting‑edge Chinese AI models—including Qwen3‑TTS, LongCat‑Flash‑Thinking‑2601, GLM‑4.7‑Flash, STEP3‑VL‑10B, Baichuan‑M3, and Youtu‑LLM—detailing their multilingual capabilities, architecture innovations, performance claims, and providing direct repository links for researchers and developers.

AI research · Chinese AI · large language models
0 likes · 7 min read
PaperAgent
Jan 20, 2026 · Artificial Intelligence

How Intrinsic Self‑Critique Boosts LLM Planning Accuracy to 89%

Google DeepMind's new "Intrinsic Self‑Critique" method lets large language models iteratively self‑evaluate and rewrite their plans, raising Blocksworld planning accuracy from 49.8% to 89.3% and setting new records across multiple planning benchmarks.

AI research · LLM · Planning
0 likes · 5 min read
BirdNest Tech Talk
Jan 11, 2026 · Artificial Intelligence

How AI Agents Overcome Context Window Limits: Gemini vs Manus Deep Research

The article analyzes the context‑window bottleneck of large language models, compares two architectural strategies—strengthening the model (Gemini Deep Research) and parallel agent decomposition (Manus Wide Research)—and details a wind‑power investment case study, technical implementation, and future directions.

AI research · Agent Architecture · Context Window
0 likes · 16 min read
Network Intelligence Research Center (NIRC)
Jan 11, 2026 · Artificial Intelligence

Insights from NeurIPS 2025: Modeling Distributions and Venturing Beyond Them

The report summarizes NeurIPS 2025 in San Diego, highlighting four NIRC papers on noise‑robust 3D human pose estimation, LVLM video‑anomaly understanding, and hand‑object reconstruction, and discusses broader industry trends such as feed‑forward generation and large‑scale pre‑training showcased by leading AI companies.

3D human pose estimation · AI research · LVLM
0 likes · 5 min read
Data Party THU
Jan 7, 2026 · Artificial Intelligence

Why the Common KL Penalty in LLM RL Training Is Biased—and How to Fix It

A recent study reveals that the widely used KL regularization in LLM reinforcement learning (RLVR) is mathematically biased, leading to unstable training and poorer generalization, and shows that moving the KL term back to the reward with a simple K1 estimator can boost out‑of‑domain performance by up to 20%.

AI research · KL regularization · LLM training
0 likes · 10 min read
PaperAgent
Jan 6, 2026 · Artificial Intelligence

How Recursive Language Models Enable Unlimited Context for LLMs

Recursive Language Models (RLM) offer a cost‑effective alternative to expanding LLM context windows by storing prompts as variables and enabling recursive calls, allowing models to process over 100,000 tokens, with experiments showing superior performance and lower median costs compared to baseline approaches.

AI research · LLM scaling · Prompt Engineering
0 likes · 5 min read
HyperAI Super Neural
Jan 5, 2026 · Artificial Intelligence

WorldPlay: Real‑Time Interactive World Modeling with Long‑Term Geometry Consistency

Tencent’s HyperAI team introduces WorldPlay, an open‑source real‑time interactive world model that achieves 24 FPS 720p video generation while preserving long‑term geometric consistency through dual‑action representation, dynamic context memory reconstruction, and a novel context‑forcing distillation, and also showcases Maya1 emotional TTS and RFdiffusion3 protein design models.

AI research · WorldPlay · context memory
0 likes · 6 min read
Network Intelligence Research Center (NIRC)
Dec 30, 2025 · Artificial Intelligence

Bridging Tokenizer Gaps: Cross-Tokenizer Knowledge Distillation at AAAI 2026

This paper introduces SeDi, a semantics‑ and distribution‑aware cross‑tokenizer knowledge distillation framework that aligns teacher and student token spaces via bipartite graph components and top‑K re‑encoding, achieving state‑of‑the‑art performance and lower exposure bias on multiple LLM benchmarks.

AI research · cross-tokenizer distillation · entropy alignment
0 likes · 10 min read
PaperAgent
Dec 29, 2025 · Artificial Intelligence

Unveiling Bottom‑up Policy Optimization: Boosting LLM Reasoning with Internal Strategies

This article introduces Bottom‑up Policy Optimization (BuPO), a novel reinforcement‑learning framework that treats large language models as collections of internal layer and modular policies, revealing distinct inference entropy patterns in Llama and Qwen‑3 and demonstrating superior performance on complex mathematical reasoning benchmarks.

AI research · Bottom-up Optimization · Internal Policy
0 likes · 10 min read
PaperAgent
Dec 26, 2025 · Artificial Intelligence

What Google’s 2025 AI Breakthroughs Reveal About the Future of Intelligent Agents

Google’s 2025 research recap highlights eight major breakthroughs—from the Gemini 3 series achieving unprecedented multimodal reasoning and efficiency, to AI‑driven advances in scientific discovery, creative generation, quantum computing, climate resilience, and responsible AI safety—showcasing how intelligent agents are reshaping products, research, and global challenges.

AI Safety · AI research · Quantum Computing
0 likes · 10 min read
HyperAI Super Neural
Dec 19, 2025 · Artificial Intelligence

Weekly AI Paper Digest: Open-Source LLMs, Agent Systems, and Long-Context Reasoning

This week’s AI paper roundup reviews six recent research works—including RecGPT‑V2, Nemotron 3 Nano, FrontierScience benchmark, AutoGLM, Deeper‑GXX, and QwenLong‑L1.5—highlighting advances in large‑language‑model‑driven recommendation, Mixture‑of‑Experts models, expert‑level scientific reasoning, GUI‑based foundation agents, graph neural network deepening, and ultra‑long‑context inference.

AI research · Agent Systems · Benchmark
0 likes · 6 min read
AI Frontier Lectures
Dec 15, 2025 · Artificial Intelligence

How UnityVideo Unifies Multimodal Training to Boost Video Generation

UnityVideo, a new vision framework from HKUST, CUHK, Tsinghua and Kuaishou, unifies training across depth, flow, pose, segmentation and RGB modalities, achieving faster convergence, higher video quality, zero‑shot generalization and stronger physical reasoning compared with existing single‑modality video generators.

AI research · UnityVideo · multimodal video generation
0 likes · 15 min read
PaperAgent
Dec 13, 2025 · Artificial Intelligence

Why Unified Multimodal Models Are the Key to Next‑Gen AGI – A Deep Survey

This article surveys the latest research on Unified Multimodal Foundations (UFM), explaining why integrating understanding and generation across text, image, video, and audio is essential for AGI, and detailing modeling paradigms, encoding/decoding strategies, training pipelines, benchmarks, and real‑world applications.

AI research · Benchmark · Training
0 likes · 10 min read
BirdNest Tech Talk
Dec 7, 2025 · Artificial Intelligence

Recreating DeerFlow’s Multi‑Agent Research Pipeline with LangGraphGo in 30 Minutes

This article walks through the open‑source DeerFlow framework—its multi‑agent architecture, core features, and a step‑by‑step implementation using the Go‑based LangGraphGo library, covering planner, researcher, reporter and podcast nodes, state‑graph design, CLI/web modes, and deployment instructions.

AI research · LLM · LangGraphGo
0 likes · 14 min read
DataFunTalk
Dec 7, 2025 · Artificial Intelligence

Is the World Model the Key to AGI? Inside the AI Debate

The article examines the chaotic rise of “world models” in AI, tracing their origins from early mental‑model theory to modern representation‑ and generation‑based approaches, and argues that the current hype reflects a broader shift away from large language models toward embodied, physics‑grounded intelligence.

AI research · World Models · generative video
0 likes · 13 min read
Is the World Model the Key to AGI? Inside the AI Debate
Data Party THU
Data Party THU
Dec 2, 2025 · Artificial Intelligence

FFGo: Turning the First Frame into a Conceptual Memory for Video Customization

FFGo reveals that the first frame of text‑to‑video models acts as a conceptual memory buffer storing visual entities, and by using a few‑shot LoRA trained on only 20‑50 curated examples with a special transition prompt, it reliably activates multi‑object fusion, enabling high‑quality, controllable video customization without model architecture changes.

AI research · Video Generation · conceptual memory
0 likes · 9 min read
FFGo: Turning the First Frame into a Conceptual Memory for Video Customization
Kuaishou Tech
Kuaishou Tech
Nov 25, 2025 · Artificial Intelligence

How Flow‑GRPO Boosts Image Generation Accuracy to 95% with Online Reinforcement Learning

Flow‑GRPO introduces online reinforcement learning into flow‑matching models by converting deterministic ODE sampling to stochastic SDE sampling and reducing denoising steps, raising SD‑3.5‑Medium's GenEval accuracy from 63% to 95%—surpassing GPT‑4o—and demonstrating strong gains in complex composition, text rendering, and human‑preference alignment across multiple generative tasks.

AI research · Deep Learning · flow matching
0 likes · 8 min read
How Flow‑GRPO Boosts Image Generation Accuracy to 95% with Online Reinforcement Learning
HyperAI Super Neural
HyperAI Super Neural
Nov 19, 2025 · Artificial Intelligence

LocDiff: Achieving Global-Scale Precise Image Geolocation Without Grids or Reference Libraries

The LocDiff framework introduces a spherical‑harmonics Dirac‑delta encoding and a conditional Siren‑UNet diffusion model that enables accurate worldwide image geolocation without relying on predefined grids or external image libraries, outperforming prior methods in precision, generalization, and computational efficiency.

AI research · LocDiff · diffusion models
0 likes · 16 min read
LocDiff: Achieving Global-Scale Precise Image Geolocation Without Grids or Reference Libraries
Data Party THU
Data Party THU
Nov 13, 2025 · Artificial Intelligence

What Makes the Free Transformer a Game‑Changer in AI Decoding?

The Free Transformer paper introduces a decoder architecture that injects random latent variables to condition generation, breaking traditional GPT constraints and achieving notable performance gains on reasoning‑heavy benchmarks such as HumanEval+, MBPP, GSM8K, MMLU, and CSQA.

AI research · Free Transformer · Transformer
0 likes · 10 min read
What Makes the Free Transformer a Game‑Changer in AI Decoding?
Alimama Tech
Alimama Tech
Nov 11, 2025 · Artificial Intelligence

Industrial-Scale Graph Learning: Boosting Ad ROI and Winning Beijing’s Science Award

The award‑winning industrial graph learning system developed by Peking University and Alimama combines novel dynamic graph embedding and GNN techniques, scales to millions of merchants, and has driven an ad ROI improvement of over 12%, while the team has published dozens of top‑conference papers.

AI research · Industrial AI · advertising optimization
0 likes · 6 min read
Industrial-Scale Graph Learning: Boosting Ad ROI and Winning Beijing’s Science Award
Data Party THU
Data Party THU
Nov 5, 2025 · Artificial Intelligence

How VLM‑FO1 Turns Vision‑Language Models into Precise Perception Machines

VLM‑FO1 introduces a generate‑plus‑reference paradigm that replaces coordinate generation with region token referencing, adding plug‑in modules such as a proposal generator, a hybrid fine‑grained encoder, and a region‑language connector to give any pretrained visual language model accurate, fine‑grained perception while preserving its original capabilities.

AI research · Plug-and-Play · VLM
0 likes · 15 min read
How VLM‑FO1 Turns Vision‑Language Models into Precise Perception Machines
Data Party THU
Data Party THU
Oct 29, 2025 · Artificial Intelligence

Can Test-Time Scaling Unlock More Reliable Vision‑Language‑Action Robots?

The paper introduces RoboMonkey, a framework that applies a generate‑and‑verify paradigm and test‑time scaling to Vision‑Language‑Action models, showing that increasing sampling and verification at inference dramatically reduces action error across multiple VLA architectures, and presents scalable verifier training, synthetic data augmentation, and efficient deployment strategies.

AI research · Action Verification · RoboMonkey
0 likes · 8 min read
Can Test-Time Scaling Unlock More Reliable Vision‑Language‑Action Robots?
DataFunTalk
DataFunTalk
Oct 29, 2025 · Artificial Intelligence

OpenAI Unveils $25B AI Initiative and Multi‑Year AGI Roadmap

OpenAI’s recent restructuring created the OpenAI Foundation, pledged $25 billion to health and AI‑resilience research, outlined a multi‑year AGI timeline, announced plans for AI hardware, and set milestones for an AI research intern by September 2026 and a fully autonomous AI researcher by 2028.

AGI · AI hardware · AI research
0 likes · 3 min read
OpenAI Unveils $25B AI Initiative and Multi‑Year AGI Roadmap
Bighead's Algorithm Notes
Bighead's Algorithm Notes
Oct 18, 2025 · Artificial Intelligence

Time Series Paper Digest (Oct 11‑17 2025): FIRE, CauchyNet, EvoRate, CoRA

Covering Oct 11‑17, 2025, this digest presents four recent AI papers on time‑series forecasting: FIRE introduces a frequency‑domain decomposition with independent amplitude‑phase modeling and adaptive weighting; CauchyNet leverages holomorphic activations for compact, data‑efficient learning; EvoRate quantifies learnability via mutual information; and CoRA adds covariate‑aware adaptation to foundation models. All four report significant accuracy gains and enhanced interpretability.

AI research · Deep Learning · covariate-aware adaptation
0 likes · 10 min read
Time Series Paper Digest (Oct 11‑17 2025): FIRE, CauchyNet, EvoRate, CoRA
Meituan Technology Team
Meituan Technology Team
Oct 15, 2025 · Artificial Intelligence

What’s New in Large Model Research? Top Meituan AI Papers Up to Oct 2025

This curated list showcases Meituan’s latest large‑model breakthroughs and academic papers up to October 2025, spanning LLM system optimizations, multimodal generation, evaluation benchmarks, quantization techniques, and reinforcement‑learning‑driven improvements, offering researchers valuable insights and resources across the AI landscape.

AI research · Benchmarking · large language models
0 likes · 10 min read
What’s New in Large Model Research? Top Meituan AI Papers Up to Oct 2025
DataFunTalk
DataFunTalk
Oct 9, 2025 · Artificial Intelligence

From Physics to DeepMind: How a Tsinghua Star Is Shaping AI Research

Google DeepMind hired Shunyu Yao, a Tsinghua physics prodigy and former Anthropic researcher, whose rapid transition from theoretical physics to AI highlights the intense workload, values clash, and the accelerating pace of large‑model research.

AI research · DeepMind · Physics
0 likes · 9 min read
From Physics to DeepMind: How a Tsinghua Star Is Shaping AI Research
Data Party THU
Data Party THU
Oct 1, 2025 · Artificial Intelligence

Why SFT and RL Are Two Sides of the Same Coin: A Unified Gradient Theory for LLM Post‑Training

This article analyzes a recent paper that unifies supervised fine‑tuning (SFT) and reinforcement learning (RL) for large language models under a single gradient estimator, introduces the Unified Policy Gradient Estimator (UPGE) and the Hybrid Post‑Training (HPT) algorithm, and demonstrates their superior performance on math reasoning benchmarks.

AI research · Hybrid Training · LLM
0 likes · 11 min read
Why SFT and RL Are Two Sides of the Same Coin: A Unified Gradient Theory for LLM Post‑Training
AIWalker
AIWalker
Sep 23, 2025 · Artificial Intelligence

Manzano: A Small 3B Multimodal Model That Unifies Image Understanding and Generation with SOTA Performance

Manzano introduces a hybrid vision tokenizer and a three‑stage training recipe that let a 3‑billion‑parameter multimodal LLM achieve state‑of‑the‑art results on both image‑understanding benchmarks and text‑to‑image generation, while scaling smoothly to larger sizes and minimizing task conflict.

AI research · Manzano · hybrid tokenizer
0 likes · 25 min read
Manzano: A Small 3B Multimodal Model That Unifies Image Understanding and Generation with SOTA Performance
Amap Tech
Amap Tech
Sep 19, 2025 · Artificial Intelligence

How FSDrive Uses Spatio‑Temporal CoT to Revolutionize Autonomous Driving

FSDrive introduces a spatio‑temporal chain‑of‑thought approach that enables visual language models to generate future driving scenes as images, improving trajectory planning accuracy and safety by eliminating cross‑modal gaps and enforcing physical constraints in autonomous driving.

AI research · autonomous driving · spatio-temporal CoT
0 likes · 10 min read
How FSDrive Uses Spatio‑Temporal CoT to Revolutionize Autonomous Driving
Data Party THU
Data Party THU
Sep 18, 2025 · Artificial Intelligence

How Reinforcement Learning is Shaping the Future of Large Reasoning Models

This article surveys recent advances in applying reinforcement learning to large reasoning models, outlining the historical background, key breakthroughs like OpenAI o1 and DeepSeek‑R1, current challenges in reward design and scalability, and future research directions toward more capable AI systems.

AI research · RLHF · reasoning
0 likes · 9 min read
How Reinforcement Learning is Shaping the Future of Large Reasoning Models
DataFunTalk
DataFunTalk
Sep 18, 2025 · Artificial Intelligence

How Tongyi DeepResearch Turns Chatty AI into a Research Powerhouse

Tongyi DeepResearch, an open‑source AI model and framework, achieves SOTA on multiple Deep Research benchmarks by combining fully open‑source models, frameworks, and data pipelines, and introduces novel agentic pre‑training, fine‑tuning, and reinforcement‑learning methods to enable complex multi‑step reasoning and real‑world applications.

AI research · agentic reinforcement learning · open source
0 likes · 14 min read
How Tongyi DeepResearch Turns Chatty AI into a Research Powerhouse
AntTech
AntTech
Sep 13, 2025 · Artificial Intelligence

LLaDA‑MoE: The First Native MoE Diffusion Language Model Shattering Autoregressive Limits

Ant Group and Renmin University unveiled LLaDA‑MoE, the industry’s first native MoE‑based diffusion language model, trained on roughly 20T tokens of data. It achieves performance comparable to Qwen2.5 while delivering several‑fold faster inference, and the model will be fully open‑sourced to accelerate global AI research.

AI research · Diffusion Language Model · LLaDA-MoE
0 likes · 6 min read
LLaDA‑MoE: The First Native MoE Diffusion Language Model Shattering Autoregressive Limits
DataFunTalk
DataFunTalk
Sep 12, 2025 · Artificial Intelligence

How Shunyu Yao is Shaping the Second Half of AI with Agents

Shunyu Yao, a Princeton‑trained AI researcher who rose through Tsinghua’s elite Yao class and OpenAI, is known for pioneering works like Tree of Thoughts, SWE‑bench, and ReAct, and now focuses on building general‑purpose agents and exploring the “second half” of AI development.

AI research · Agent · React
0 likes · 12 min read
How Shunyu Yao is Shaping the Second Half of AI with Agents
Data Party THU
Data Party THU
Sep 9, 2025 · Artificial Intelligence

From Chain‑of‑Thought to Graph‑of‑Thought: The Evolution of LLM Reasoning

This article examines how large language model reasoning has progressed from linear Chain‑of‑Thought prompting to parallel Tree‑of‑Thought and flexible Graph‑of‑Thought approaches, highlighting each method’s mechanism, strengths, limitations, computational costs, and the broader shift toward cognitive‑centric AI research.

AI research · Graph-of-Thought · Tree-of-Thought
0 likes · 7 min read
From Chain‑of‑Thought to Graph‑of‑Thought: The Evolution of LLM Reasoning
Tencent Technical Engineering
Tencent Technical Engineering
Sep 6, 2025 · Artificial Intelligence

ARC Lab’s Blueprint: Turning Multimodal AI Research into Real-World Impact

The article outlines ARC Lab’s evolution from its 2019 founding as an internal corporate research unit to a high‑impact AI team that pursues difficult multimodal understanding and generation problems, measures success through a technology‑impact funnel, publishes 30‑40 top‑tier papers annually, and translates research into open‑source tools and products that drive academic, industry, business, and societal value.

AI research · corporate research · multimodal models
0 likes · 19 min read
ARC Lab’s Blueprint: Turning Multimodal AI Research into Real-World Impact
Data Party THU
Data Party THU
Sep 1, 2025 · Artificial Intelligence

Why Intermediate Tokens Make LLMs Reason Better: Insights from Denny Zhou

The article analyzes Denny Zhou's Stanford CS25 lecture on large language model reasoning, explaining how intermediate token generation, chain‑of‑thought prompting, self‑consistency, reinforcement‑learning fine‑tuning, and answer aggregation together unlock powerful reasoning capabilities beyond traditional greedy decoding.

AI research · LLM · Prompt Engineering
0 likes · 17 min read
Why Intermediate Tokens Make LLMs Reason Better: Insights from Denny Zhou
Volcano Engine Developer Services
Volcano Engine Developer Services
Aug 26, 2025 · Artificial Intelligence

From Single LLM to Multi‑Agent: How Context Engineering Drives the Next AI Architecture

This article examines the evolution of LangChain's Open Deep Research project from a monolithic LLM pipeline to a multi‑agent system, highlighting the role of context engineering, architectural trade‑offs, practical code examples, and best‑practice guidelines for building scalable, token‑efficient AI solutions.

AI research · Context Engineering · LLM architecture
0 likes · 16 min read
From Single LLM to Multi‑Agent: How Context Engineering Drives the Next AI Architecture
Baidu Geek Talk
Baidu Geek Talk
Aug 25, 2025 · Artificial Intelligence

How ERNIE‑4.5‑VL Redefines Multimodal AI with 100+ Language Support

The ERNIE‑4.5‑VL visual‑language model breaks single‑modality limits by delivering breakthrough image, video, and text understanding across more than 100 languages, offering lightweight yet competitive performance against models like Qwen2.5‑VL, supporting 128K context, dual “thinking” modes, and extensive deployment resources.

AI research · Ernie · large language model
0 likes · 4 min read
How ERNIE‑4.5‑VL Redefines Multimodal AI with 100+ Language Support
Kuaishou Tech
Kuaishou Tech
Aug 23, 2025 · Artificial Intelligence

How Thyme Enables Models to Think Beyond Images with Code‑Driven Multimodal Reasoning

The Kwai Keye team presents Thyme, a novel multimodal reasoning framework that lets large language models generate and safely execute Python code for image manipulation and complex calculations, achieving significant performance gains over existing vision‑language models across perception, reasoning, and hallucination‑reduction benchmarks.

AI research · Code Generation · large language model
0 likes · 12 min read
How Thyme Enables Models to Think Beyond Images with Code‑Driven Multimodal Reasoning
AI Frontier Lectures
AI Frontier Lectures
Jul 31, 2025 · Artificial Intelligence

What’s Driving the Latest LLM Architecture Trends? DeepSeek, OLMo, Gemma, and More Explained

This article examines the evolution of large language model architectures over the past seven years, comparing key design choices such as Multi‑Head Latent Attention, Grouped‑Query Attention, Mixture‑of‑Experts, sliding‑window attention, normalization placement, and optimizer variants across models like DeepSeek V3, OLMo 2, Gemma 3, Llama 4, Qwen 3, SmolLM 3, and Kimi K2.

AI research · LLM comparison · Mixture of Experts
0 likes · 30 min read
What’s Driving the Latest LLM Architecture Trends? DeepSeek, OLMo, Gemma, and More Explained
AI Frontier Lectures
AI Frontier Lectures
Jul 31, 2025 · Artificial Intelligence

Can a 32‑Token Compressor Generate Images Without Training?

This article reviews a recent study that demonstrates how a highly compressed one‑dimensional tokenizer, using only 32 discrete tokens and gradient‑based test‑time optimization, can generate high‑quality images without training a separate generative model, and explores its methodology, findings, applications, and limitations.

1D tokenizer · AI research · TiTok
0 likes · 10 min read
Can a 32‑Token Compressor Generate Images Without Training?
Data Thinking Notes
Data Thinking Notes
Jul 30, 2025 · Artificial Intelligence

Tracing the Evolution of Large Language Models: Key Papers and Breakthroughs

This article reviews the most influential papers in large language model research since 2017, covering foundational works such as the Transformer, GPT‑3, BERT, scaling laws, and recent innovations like FlashAttention, Mamba, and QLoRA, highlighting their core contributions and impact on AI development.

AI research · Model Optimization · Transformer
0 likes · 28 min read
Tracing the Evolution of Large Language Models: Key Papers and Breakthroughs
Kuaishou Tech
Kuaishou Tech
Jul 22, 2025 · Artificial Intelligence

How Orthus Achieves Lossless Multimodal Generation with a Unified Autoregressive Transformer

Orthus, a new unified multimodal model presented at ICML 2025, leverages an autoregressive Transformer backbone with separate language and diffusion heads to enable lossless image‑text interleaved generation, outperforming existing models on both understanding and generation benchmarks while remaining computationally efficient.

AI research · autoregressive transformer · diffusion models
0 likes · 11 min read
How Orthus Achieves Lossless Multimodal Generation with a Unified Autoregressive Transformer
DataFunTalk
DataFunTalk
Jul 20, 2025 · Artificial Intelligence

Why Meta’s AI Pioneer Yann LeCun Is Being Marginalized: Power Struggles Behind the Scenes

The article examines how Meta CEO Mark Zuckerberg’s aggressive talent‑buying and commercial focus have sidelined Turing Award winner Yann LeCun, detailing the restructuring of Meta’s AI labs, the clash over research directions, and the broader dilemma of balancing academic innovation with business imperatives in the AI industry.

AI industry · AI research · JEPA
0 likes · 14 min read
Why Meta’s AI Pioneer Yann LeCun Is Being Marginalized: Power Struggles Behind the Scenes
DataFunTalk
DataFunTalk
Jul 16, 2025 · Artificial Intelligence

Inside OpenAI: Unfiltered Lessons on AI, Culture, and Rapid Product Launches

A former OpenAI engineer shares a candid, unfiltered account of the company's fast‑paced growth, bottom‑up research culture, engineering practices, product decisions, and the intense seven‑week sprint that delivered Codex, offering valuable insights for AI researchers, product managers, and tech leaders.

AI research · Code Generation · OpenAI
0 likes · 22 min read
Inside OpenAI: Unfiltered Lessons on AI, Culture, and Rapid Product Launches
Baobao Algorithm Notes
Baobao Algorithm Notes
Jul 16, 2025 · Artificial Intelligence

What Small Labs Reveal About RL Training: Multi‑Stage, Entropy, and Resource Strategies

The article analyzes Skywork OR1's technical report, detailing how small‑scale teams use GRPO‑based reinforcement learning with multi‑stage training, advantage‑mask variants, high‑temperature sampling, adaptive entropy loss, and resource‑allocation tricks to improve large language model performance while avoiding premature entropy collapse.

AI research · entropy control · multi-stage training
0 likes · 21 min read
What Small Labs Reveal About RL Training: Multi‑Stage, Entropy, and Resource Strategies
AI Frontier Lectures
AI Frontier Lectures
Jul 11, 2025 · Artificial Intelligence

How Llama Evolved: From Llama‑1 to Llama‑3 – Architecture, Data, and Performance Insights

This article provides a comprehensive technical analysis of Meta's Llama series, tracing the evolution from Llama‑1 through Llama‑2 to Llama‑3, detailing model architectures, training data pipelines, optimization methods, benchmark results, and the broader impact on the open‑source AI community.

AI research · LLaMA · Model architecture
0 likes · 25 min read
How Llama Evolved: From Llama‑1 to Llama‑3 – Architecture, Data, and Performance Insights
Amap Tech
Amap Tech
Jul 9, 2025 · Artificial Intelligence

Bridging Human Perception and Video Motion Generation: VMBench & LD‑RPS

This article introduces VMBench, a perception‑aligned video motion generation benchmark with a five‑dimensional metric suite and meta‑guided prompt generation, and LD‑RPS, a zero‑shot unified image restoration framework using latent diffusion and recurrent posterior sampling, detailing their motivations, innovations, experiments, and future directions.

AI research · Image Restoration · Video Generation
0 likes · 14 min read
Bridging Human Perception and Video Motion Generation: VMBench & LD‑RPS
iQIYI Technical Product Team
iQIYI Technical Product Team
Jul 3, 2025 · Artificial Intelligence

Three iQIYI AI Papers Break New Ground at ACL 2025 & INTERSPEECH 2025

iQIYI’s AI research team had three papers accepted: two at ACL 2025 (one main‑conference paper and one Findings paper) and one at INTERSPEECH 2025. The papers cover long‑context large language model evaluation, Chinese novel summarization, and efficient Thai speech recognition, with links to each work.

ACL 2025 · AI research · INTERSPEECH 2025
0 likes · 7 min read
Three iQIYI AI Papers Break New Ground at ACL 2025 & INTERSPEECH 2025
Kuaishou Tech
Kuaishou Tech
Jul 2, 2025 · Artificial Intelligence

How EvoSearch Supercharges Image and Video Generation with Test‑Time Evolutionary Search

EvoSearch, a test‑time evolutionary search method, dramatically improves image and video generation by increasing inference compute without extra training, outperforming existing scaling techniques on diffusion and flow models while maintaining robustness and diversity across multiple benchmarks.

AI research · Test-Time Scaling · Video Generation
0 likes · 8 min read
How EvoSearch Supercharges Image and Video Generation with Test‑Time Evolutionary Search
DataFunSummit
DataFunSummit
Jun 30, 2025 · Artificial Intelligence

How Large Language Models Are Evolving Toward Autonomous Meta‑Learning Agents

This talk reviews the rapid evolution of generative large‑model AI from rule‑based systems to massive pre‑training, examines the current bottlenecks in continual learning and knowledge discovery, and proposes large‑scale meta‑learning, especially in‑context reinforcement learning (ICRL), as a path toward truly autonomous, self‑learning agents.

AI research · Autonomous Agents · Meta Learning
0 likes · 24 min read
How Large Language Models Are Evolving Toward Autonomous Meta‑Learning Agents
DataFunTalk
DataFunTalk
Jun 27, 2025 · Artificial Intelligence

How Generative AI is Revolutionizing Ad Recommendation Systems

Join Baidu senior algorithm engineer Ji Zhi at the DataFun Summit 2025 to explore how generative AI transforms ad recommendation recall, covering item representation, evolving solution architectures, long‑sequence challenges, and practical insights for building efficient large‑model recommendation systems.

AI research · Ad Tech · Baidu
0 likes · 3 min read
How Generative AI is Revolutionizing Ad Recommendation Systems
AIWalker
AIWalker
Jun 24, 2025 · Artificial Intelligence

How Multimodal Fusion Accelerates Paper Publication: Key Insights and Resources

The article surveys 117 recent multimodal‑fusion papers, classifies them into improvement‑based and combination‑based approaches, highlights representative works such as TimeXL, OGP‑Net, MMR‑Mamba and FusionSight, and provides a free collection of papers, classic models and code repositories for researchers.

AI research · Computer Vision · Deep Learning
0 likes · 8 min read
How Multimodal Fusion Accelerates Paper Publication: Key Insights and Resources
AI Frontier Lectures
AI Frontier Lectures
Jun 20, 2025 · Artificial Intelligence

How GCA Achieves 1000× Length Generalization in Large Language Models

Ant Research introduces GCA, a causal retrieval‑based grouped cross‑attention mechanism that end‑to‑end learns to fetch relevant past chunks, dramatically reducing memory usage and achieving over 1000× length generalization on long‑context language modeling tasks, with near‑constant inference memory and linear training cost.

AI research · Grouped Cross Attention · LLM efficiency
0 likes · 11 min read
How GCA Achieves 1000× Length Generalization in Large Language Models
Kuaishou Audio & Video Technology
Kuaishou Audio & Video Technology
Jun 11, 2025 · Artificial Intelligence

Kuaishou Showcases 12 Cutting-Edge CVPR 2025 Papers on Video Generation and AI

Kuaishou presented twelve peer‑reviewed papers at CVPR 2025 covering video quality assessment, large‑scale video datasets, dynamic 3D avatar reconstruction, 4D scene simulation, controllable video generation, scaling laws for diffusion transformers, multimodal foundations, and more, highlighting the company's leading research in computer vision and AI.

AI research · CVPR2025 · Deep Learning
0 likes · 21 min read
Kuaishou Showcases 12 Cutting-Edge CVPR 2025 Papers on Video Generation and AI
AI Frontier Lectures
AI Frontier Lectures
Jun 9, 2025 · Artificial Intelligence

How DiSA Accelerates Autoregressive Image Generation with Diffusion Step Annealing

The article introduces DiSA, a training‑free diffusion step annealing technique that dramatically speeds up autoregressive image generation by reducing diffusion steps in later generation phases while preserving high visual quality, and validates the method across several state‑of‑the‑art AR‑Diffusion models.

AI research · DiSA · autoregressive
0 likes · 16 min read
How DiSA Accelerates Autoregressive Image Generation with Diffusion Step Annealing