Tagged articles

AI research

295 articles · Page 1 of 3
Data Party THU
Jun 30, 2026 · Artificial Intelligence

Large-Scale Sign Language Datasets: Resources, Benchmarks, and Annotation Standards

This ACL 2026 survey systematically reviews over 120 publicly available sign‑language datasets covering 35 languages, analyzes their modalities, annotation inconsistencies, and benchmark limitations, and proposes a 24‑field datasheet to promote reproducible and comparable AI research in sign language recognition, translation, and generation.

AI research · Multimodal · annotation standards
0 likes · 15 min read

Machine Heart
Jun 30, 2026 · Artificial Intelligence

LiveWorld: A New Paradigm for Video World Models that Keeps Off‑Screen Worlds Evolving

LiveWorld introduces a novel video world modeling paradigm that explicitly separates world evolution from observation rendering, enabling objects and events to continue evolving even when they leave the camera view; extensive experiments on the new LiveBench benchmark show substantial gains over prior camera‑controllable models.

AI research · LiveWorld · benchmark
0 likes · 13 min read

Machine Heart
Jun 28, 2026 · Industry Insights

Where Have the Eight Transformer Pioneers Ended Up?

The article traces the post‑Google journeys of the eight "Attention Is All You Need" authors, detailing recent high‑profile exits to OpenAI and Anthropic, market fallout, each researcher’s contributions to the Transformer architecture, and how their divergent paths continue to shape AI beyond the original paper.

AI research · Essential AI · Google DeepMind
0 likes · 21 min read

Machine Heart
Jun 26, 2026 · Industry Insights

Dawn Song, Leading Computer Security Expert, Joins Meta’s Superintelligence Labs

Dawn Song, a world‑renowned computer security and AI safety scholar and UC Berkeley professor, has become Meta’s VP of AI research, bringing her award‑winning work—including Dynamic Taint Analysis and the ALE benchmark—and her startups Oasis Labs and Virtue AI to strengthen Meta’s agent‑centric safety strategy.

AI research · AI safety · ALE benchmark
0 likes · 5 min read

Machine Heart
Jun 25, 2026 · Industry Insights

Why DeepMind Veterans Are Leaving London: The Ongoing AI Talent Drain at Google

Top DeepMind researchers including Jonas Adler, Alexander Pritzel and Arthur Conmy are departing Google for Anthropic, highlighting a shift from Google's research‑lab culture to a model‑factory focus, a geographic move from London to Mountain View, and growing talent competition in the AI industry.

AI research · AI talent · Anthropic
0 likes · 8 min read

PaperAgent
Jun 23, 2026 · Artificial Intelligence

Arbor Boosts Autonomous Research Performance 150% Over Claude Code

Arbor, a collaborative framework from RUC and Microsoft, uses Hypothesis‑Tree Refinement to turn short‑lived experiments into lasting research progress, achieving over 2.5× held‑out gains across six autonomous optimization tasks and setting a new SOTA on MLE‑Bench Lite.

AI research · Arbor · Autonomous Optimization
0 likes · 10 min read

Avoid Job‑Hunting Pitfalls: How an NLP PhD Secured an OpenAI Offer After 57 Interviews

Alisa Liu, a six‑year NLP PhD, shares a step‑by‑step account of her job hunt (57 interviews across 11 top AI firms, including OpenAI), detailing interview formats, preparation tactics, offer negotiation, and the emotional toll, and offering a practical guide to help future candidates avoid common pitfalls.

AI research · Behavioral Interview · Job Search
0 likes · 12 min read

Machine Heart
Jun 22, 2026 · Artificial Intelligence

Why Dropping VAE and Private Data Boosts Text-to-Image Generation Performance

MiniT2I, a minimalist pixel-space text-to-image model that discards VAE, AdaLN, and private data, achieves 0.87 GenEval and 84.2 DPG-Bench scores with only 258 M parameters, demonstrating that a stripped-down architecture and public data can outperform larger, more complex systems.

AI research · MiniT2I · Transformer
0 likes · 8 min read

SuanNi
Jun 17, 2026 · Artificial Intelligence

Can a 3B Small Model Match Top Closed‑Source LLMs? VibeThinker-3B’s Limits

VibeThinker-3B, a newly open‑sourced 3‑billion‑parameter model, achieves near‑state‑of‑the‑art scores on math competitions (AIME, IMO‑AnswerBench), coding (LiveCodeBench), and verification benchmarks, rivaling trillion‑parameter closed models, thanks to a Spectrum‑to‑Signal training pipeline, multi‑stage SFT, RL, and offline distillation, supporting a new parametric compression‑coverage hypothesis.

AI research · Benchmarking · Parameter Efficiency
0 likes · 8 min read

Machine Heart
Jun 17, 2026 · Artificial Intelligence

Can a 3B Model Rival Opus 4.5 in Programming? Inside the Domestic VibeThinker‑3B

VibeThinker‑3B, a 3‑billion‑parameter Chinese‑built model, achieves programming benchmark scores comparable to top‑tier models like Opus 4.5, excelling in AIME, HMMT, LiveCodeBench and LeetCode contests, thanks to its Spectrum‑to‑Signal training pipeline, Claim‑Level reliability evaluation, and multi‑stage SFT and RL refinements.

AI research · Claim-Level Reliability · Spectrum-to-Signal
0 likes · 7 min read

Machine Heart
Jun 17, 2026 · Artificial Intelligence

Why Transformers Struggle with State Tracking and How Recurrence Could Fix It

The DeepMind paper “The Topological Trouble With Transformers” reveals that the Transformer architecture inherently fails at state tracking, making chain‑of‑thought prompting only a costly patch, and proposes returning to recurrent mechanisms—such as looped or sequence‑wise recurrence—to achieve true, continuous memory.

AI research · Chain-of-Thought · DeepMind
0 likes · 9 min read

Machine Heart
Jun 15, 2026 · R&D Management

How to Become an Outstanding AI Researcher: Lessons from an Anthropic Scientist

The article distills an Anthropic researcher’s candid guide on becoming a truly effective AI researcher, emphasizing deliberate practice of small skills—topic selection, literature reading, writing, rapid experiment cycles—and drawing on historic insights from Hamming, Sutton, Shannon, and others.

AI research · academic writing · historical insights
0 likes · 14 min read

AI Architecture Path
Jun 12, 2026 · Artificial Intelligence

How a New AI Research Skill Gained 2,685 Stars in One Day and Helps Anyone Bridge the Information Gap

The article explains how the open‑source tool last30days‑skill outperforms traditional search by aggregating real‑time community consensus from over 14 platforms—including Reddit, X, YouTube, and Polymarket—into structured, source‑backed reports, and provides detailed installation, configuration, and use‑case guidance for creators, product teams, developers, and investors.

AI research · Polymarket · Prompt Engineering
0 likes · 17 min read

Top Architect
Jun 11, 2026 · Artificial Intelligence

Gemini Omni Review: How One Prompt Turns Sketches into Cinematic Videos

Google DeepMind’s Gemini Omni is presented as a new world model that combines reasoning and generation to enable conversational video editing, multimodal training, and emergent capabilities, contrasting it with Veo while discussing trade‑offs, safety measures, and the model’s broader impact on AI development.

AI research · Gemini Omni · Multimodal AI
0 likes · 10 min read

HyperAI Super Neural
Jun 11, 2026 · Artificial Intelligence

ChartNet: MIT/IBM’s Million‑Scale Synthetic Chart Dataset with 1.5M Diverse Samples

MIT and IBM researchers introduce ChartNet, the largest code‑guided synthetic chart dataset containing 1.5 million multimodal samples across 24 chart types and six libraries, and demonstrate that fine‑tuning visual‑language models on it yields consistent, significant gains on chart reconstruction, data extraction, summarization, and reasoning tasks, outperforming much larger off‑the‑shelf models including GPT‑4o.

AI research · ChartNet · chart understanding
0 likes · 13 min read

Machine Heart
Jun 7, 2026 · Artificial Intelligence

FusionRoute: Token-Level Expert Routing and Self-Correction for Multi-LLM Collaboration

FusionRoute introduces a token‑level routing framework that dynamically selects the most suitable expert LLM for each token and adds a complementary generation step, enabling fine‑grained, stable multi‑model collaboration that outperforms existing sequence‑level and expert‑selection methods across diverse benchmarks.

AI research · Model Merging · expert routing
0 likes · 11 min read

Machine Learning Algorithms & Natural Language Processing
Jun 4, 2026 · Artificial Intelligence

Is This the Last Human-Written Paper? Converting PDFs into AI-Executable Research Artifacts

A collaborative paper by 37 scholars from Stanford, MIT, CMU and others argues that the centuries‑old PDF format imposes hidden storytelling and engineering taxes, proposes a four‑layer Agent‑Native Research Artifact (ARA) to preserve full experimental detail, and shows through benchmarks that ARA dramatically improves AI agents' understanding, reproduction and extension of research.

AI research · Agent-native artifacts · Scientific publishing
0 likes · 10 min read

SuanNi
Jun 2, 2026 · Artificial Intelligence

Harvard’s AutoScientists Lets AI Agents Self‑Organize Research Teams and Outperform Traditional AI Agents

AutoScientists, a Harvard‑built system where nine AI agents self‑organize via a shared state without a central commander, achieves a 74.4% average rank on BioML‑Bench, runs GPT training experiments 1.9× faster, and improves ProteinGym fitness prediction by 12.5%, while ablation studies reveal the critical role of each of its four core mechanisms.

AI Agents · AI research · AutoScientists
0 likes · 12 min read

Machine Heart
Jun 2, 2026 · Artificial Intelligence

When AI Becomes Its Own Data Engineer: Inside DataMaster

DataMaster introduces an autonomous AI data engineer that automatically searches, cleans, combines, and reuses data, enabling fixed models and training pipelines to achieve substantial performance gains across benchmarks such as MLE‑Bench Lite and PostTrainBench, including a 31.0% GPQA score.

AI research · Autonomous Agents · Data Engineering
0 likes · 11 min read

PaperAgent
May 30, 2026 · Artificial Intelligence

DeepSeek Researcher Co‑authors Two New Papers on Autonomous AI Research and Continual Learning

The article summarizes two recent DeepSeek papers—one presenting an L1–L5 taxonomy and four architecture patterns for autonomous research agents, the other proposing a three‑dimensional taxonomy for continual learning, detailing method families, a self‑improvement phase diagram, experimental comparisons, an impossibility theorem, and the production statistics of the Deli AutoResearch framework.

AI research · Autonomous Agents · Continual Learning
0 likes · 12 min read

Machine Heart
May 25, 2026 · Artificial Intelligence

How DeepMind’s AI Solved Nine Erdős Problems for Only a Few Hundred Dollars Each

DeepMind’s AlphaProof Nexus framework enabled an AI agent to automatically prove and verify nine long‑standing Erdős conjectures at a cost of only a few hundred dollars per problem, using a simple “think‑try” loop and a more advanced multi‑agent evolution architecture, and demonstrating a shift toward leveraging raw large‑model reasoning for formal mathematics.

AI research · AlphaProof Nexus · DeepMind
0 likes · 11 min read

SuanNi
May 22, 2026 · Artificial Intelligence

All‑In‑One Image & Video: ByteDance’s Deployable Native Multimodal Model Lance

Lance, ByteDance’s newly open‑sourced 3‑billion‑parameter multimodal model, runs on a single 40 GB GPU, tops HuggingFace trend charts, and achieves leading scores on DPG Bench, GenEval, and video generation benchmarks while surpassing several state‑of‑the‑art single‑modal models.

AI research · ByteDance · Lance
0 likes · 3 min read

Machine Learning Algorithms & Natural Language Processing
May 21, 2026 · Artificial Intelligence

Visual Generation Meets Slow Thinking: Decoding New Multimodal Reasoning Paradigms from CVPR 2026

This article curates ten standout CVPR 2026 papers that introduce novel multimodal interaction frameworks, active video avatars, unified image customization, artistic poster generation, information‑theoretic video compression, all‑purpose visual reasoning models, 3D‑grounded spatial reasoning, interleaved text‑visual generation, and unified fine‑grained video understanding, each achieving state‑of‑the‑art performance.

AI research · CVPR · Multimodal
0 likes · 13 min read

Machine Heart
May 20, 2026 · Industry Insights

ByteDance Scholarship Goes Global: Tracking the Careers of 67 Winners Over Five Years

The 2026 ByteDance Scholarship opens to worldwide applicants, expands slots and funding, and now accepts stage‑level results; a five‑year review shows 67 awardees—spanning PhDs, masters and undergraduates—from top universities who have entered top AI labs, founded startups, or taken faculty positions, illustrating how early‑stage research often precedes industry trends.

AI research · AI scholarship · ByteDance
0 likes · 12 min read

Kuaishou Tech
May 18, 2026 · Artificial Intelligence

How ALM‑MTA Improves Multi‑Touch Attribution with Front‑Door Identification and Adversarial Modeling

The ALM‑MTA method combines front‑door causal adjustment with an adversarial proxy for the unobserved mediator, eliminating hidden confounding in multi‑touch attribution and delivering more reliable uplift estimates that boosted Kuaishou's DAU by 0.6% and AUC by 11% over SOTA baselines, as reported in an ICLR 2026 paper.

AI research · adversarial learning · causal attribution
0 likes · 13 min read

Machine Heart
May 17, 2026 · Artificial Intelligence

What Exactly Is a World Model? History, Technology, and the $10 B Bet

The article traces the two decades‑long, parallel research lines that birthed video world models—dreaming agents in reinforcement learning and learning physics from human video—explains how they converged in 2024‑2025, evaluates current capabilities and limitations, and analyzes the $10 billion investment landscape and strategic moves by NVIDIA, OpenAI, and others.

AI research · Simulation · reinforcement learning
0 likes · 32 min read

Data Party THU
Apr 30, 2026 · Artificial Intelligence

Turning Transformers into Mamba: How Apple Linearized Inference Costs

Apple introduced a two‑step cross‑architecture distillation method that converts costly quadratic‑time Transformers into cheaper linear‑time Mamba models, preserving most of the original performance while dramatically reducing inference cost.

AI research · Linear Attention · Mamba
0 likes · 8 min read

Machine Heart
Apr 30, 2026 · Artificial Intelligence

Can a Pre‑1930 Language Model Infer Einstein’s Relativity? Insights from the Talkie‑1930 Project

Researchers built a 13‑billion‑parameter model trained only on texts published before 1931, called Talkie‑1930, and used surprise‑based metrics, programming tests, and a modern‑twin comparison to explore how far such a historically‑constrained model can extrapolate future knowledge and reveal data‑leakage challenges.

AI research · HumanEval · Language Models
0 likes · 10 min read

PaperAgent
Apr 30, 2026 · Artificial Intelligence

How Agentic AI is Redefining World Modeling

The article reviews the paper "Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond", introducing a two‑axis framework (capability levels L1‑L3 and law domains) to map diverse world‑modeling systems, highlighting that most current systems stall at L1, that explicit law encoding is crucial for long‑term stability, and that L3 represents the ultimate, self‑evolving model.

AI Agents · AI research · Agentic AI
1 like · 6 min read

Data Party THU
Apr 29, 2026 · Artificial Intelligence

How Far Can Unsupervised RL for Large Models Go? A Systematic Answer from a Tsinghua Team

The article analyzes the scaling limits of unsupervised reinforcement learning for large language models, revealing that intrinsic‑reward methods initially boost performance but inevitably collapse, proposes a unified theory and a model‑collapse metric to predict trainability, and argues that external‑reward approaches are the scalable path forward.

AI research · RL scaling · external rewards
0 likes · 11 min read

Machine Heart
Apr 25, 2026 · Artificial Intelligence

Can Multi-Model Co-Evolution Shatter the Single-Model Ceiling? Squeeze Evolve Achieves Validator-Free SOTA Inference

The paper introduces Squeeze Evolve, a validator‑free multi‑model evolutionary framework that orchestrates diverse large language models to break the performance ceiling of any single model, delivering up to 23‑point accuracy improvements and 1.4‑3.3× cost reductions across math, vision, and scientific benchmarks.

AI research · Inference Optimization · Squeeze Evolve
0 likes · 8 min read

Wu Shixiong's Large Model Academy
Apr 23, 2026 · Industry Insights

Should You Take a Tencent AI Internship? Key Factors to Consider

The article examines whether a Tencent AI internship is worth pursuing by analyzing the program’s growth stage, unique user ecosystem, mentorship structure, compensation model, and early‑year advantages, illustrated with real intern case studies, to help students decide what they aim to gain from the experience.

AI internship · AI research · Career Guidance
0 likes · 14 min read

Machine Learning Algorithms & Natural Language Processing
Apr 16, 2026 · Artificial Intelligence

Evidence Mining for Explainable AI: Methods and Applications

The talk introduces evidence‑mining techniques that extract supporting information from input text to improve model explainability, discusses the shortcut‑learning pitfalls of existing methods, and presents a new approach that enhances reliability and integrates with large‑model chain‑of‑thought compression for more interpretable, efficient reasoning.

AI research · evidence mining · explainable AI
0 likes · 4 min read

Meituan Technology Team
Apr 16, 2026 · Artificial Intelligence

Can End-to-End Diffusion TTS Beat Traditional Pipelines? Inside LongCat-AudioDiT

LongCat-AudioDiT introduces a wave‑VAE plus diffusion Transformer architecture that eliminates intermediate spectrograms, solves training‑inference mismatch with dual constraints, replaces classifier‑free guidance with adaptive projection guidance, and achieves state‑of‑the‑art zero‑shot voice cloning performance on multiple benchmarks.

AI research · Text‑to‑Speech · audio generation
0 likes · 12 min read

Data STUDIO
Apr 14, 2026 · Artificial Intelligence

Can ChatGPT Deep Research Double Your Research Efficiency?

The article explains how ChatGPT Deep Research transforms ordinary web searches into full‑fledged research reports, compares three leading Deep Research tools, outlines nine practical use cases, warns of common pitfalls, and offers prompt‑engineering tips for both individual and enterprise adoption.

AI research · ChatGPT · Deep Research
0 likes · 16 min read

Machine Learning Algorithms & Natural Language Processing
Apr 9, 2026 · Artificial Intelligence

Google DeepMind’s Deep Think Dominates Eight Language Olympiads and Solves Four AI Challenges

Google DeepMind’s Deep Think model posted top‑tier scores in eight language‑specific Olympiads—from IMO gold to ICPC finals—while also tackling open scientific problems, yet the results rely on internal evaluations without third‑party verification, highlighting both a breakthrough in multilingual AI reasoning and the need for transparent benchmarking.

AI benchmarking · AI research · Deep Think
0 likes · 9 min read

PaperAgent
Apr 6, 2026 · Artificial Intelligence

Can LLMs Self‑Improve After Deployment? Inside Microsoft’s Online Experiential Learning

Microsoft’s Online Experiential Learning framework lets large language models continuously self‑evolve after deployment by extracting experience from user interactions and consolidating it into model parameters, eliminating the need for human labels, reward models, or server‑side environment access, and demonstrating scalable gains across tasks and model sizes.

AI research · LLM · continuous training
0 likes · 9 min read

Data Party THU
Apr 5, 2026 · Artificial Intelligence

How to Beat Shortcut Learning for Better OOD Generalization in Vision Models

Visual and vision-language models excel under IID benchmarks but often fail on out-of-distribution data due to shortcut learning; this article examines the problem, explains its causes, and proposes data-level and model-level interventions—including StillMix, FLASH, and SPARCL—to improve OOD robustness.

AI research · Data Augmentation · Model Design
0 likes · 7 min read

Machine Learning Algorithms & Natural Language Processing
Apr 2, 2026 · Artificial Intelligence

How Large Language Models Can Self‑Improve: A Technical Review and Future Outlook

This article surveys the emerging self‑improvement paradigm for large language models, presenting a closed‑loop lifecycle comprising data acquisition, selection, model optimization, inference refinement, and an autonomous evaluation layer, and discusses current limitations and research directions toward fully autonomous LLM evolution.

AI research · Autonomous Evaluation · LLM
0 likes · 11 min read

PaperAgent
Mar 28, 2026 · Artificial Intelligence

How ACCORD Breaks Concept Coupling in Custom Text‑to‑Image Generation

The ACCORD framework formalizes the concept‑coupling issue in text‑to‑image diffusion models as a statistical dependency problem and resolves it with two plug‑and‑play regularization losses, dramatically improving fidelity and text control without altering model architecture.

ACCORD · AI research · Diffusion Models
0 likes · 7 min read

SuanNi
Mar 25, 2026 · Artificial Intelligence

How LeWorldModel Learns Physics from Pixels in Hours – A Deep Dive

LeWorldModel (LeWM) is a compact AI world model that learns real‑world physics directly from raw pixel streams using only two simple mathematical rules, achieving dramatically faster planning and robust physical intuition compared to prior large‑scale models.

AI research · Model Predictive Control · physics learning
0 likes · 14 min read

AI Architecture Hub
Mar 25, 2026 · Artificial Intelligence

How Memento-Skills Enables Continuous Learning for Frozen LLM Agents

The article analyzes the limitations of frozen LLM agents—fixed parameters, loss of state, and costly fine‑tuning—and introduces the Memento‑Skills framework, which adds an external, evolvable skill memory to achieve deployment‑time learning, detailed architecture, optimization knobs, and strong experimental gains.

AI research · Deployment-Time Learning · LLM Agents
0 likes · 14 min read

AIWalker
Mar 20, 2026 · Artificial Intelligence

Plug‑and‑Play reAR Boosts Visual AR to SOTA Quality with Only 177M Parameters

The paper introduces reAR, a plug‑and‑play regularization framework that aligns generator and tokenizer representations in visual autoregressive models, dramatically improving image quality and matching large diffusion models while using far fewer parameters, and validates the approach with extensive experiments, ablations, and scalability analysis.

AI research · Parameter Efficiency · Regularization
0 likes · 20 min read

AIWalker
Mar 17, 2026 · Artificial Intelligence

How a 4B-Parameter Open-Source Model Outperforms 14B Multimodal Giants

InternVL-U, a 4‑billion‑parameter unified multimodal model released as open source, combines a 2B MLLM backbone with a 1.7B visual generation head and, through a reasoning‑centric data pipeline and Chain‑of‑Thought guidance, achieves superior understanding, generation, and editing performance that surpasses much larger 14‑20B models on multiple benchmarks.

AI research · InternVL-U · Large Language Model
0 likes · 22 min read

AI Architecture Path
Mar 17, 2026 · Artificial Intelligence

Automating LLM Tuning with Autoresearch: AI Agents on a Single GPU

Autoresearch, an open‑source project by Andrej Karpathy, enables AI agents to autonomously modify code, run experiments, and evaluate results for LLM tuning on a single GPU, dramatically reducing manual hyper‑parameter work, standardizing experiments, and offering low‑cost, reproducible research with clear limitations and practical setup steps.

AI research · Autonomous Agents · LLM tuning
0 likes · 11 min read

AI Explorer
Mar 15, 2026 · Artificial Intelligence

Large Models May Break Language Training Dependence, Redefining Intelligence

A new study suggests that large AI models could reduce their reliance on massive text corpora by early‑fusing multimodal data such as video and sensor streams, potentially slashing training costs, improving generalization, and prompting a shift toward more embodied notions of intelligence.

AI research · Embodied Intelligence · Multimodal Learning
0 likes · 6 min read

AI Engineering
Mar 10, 2026 · Artificial Intelligence

Yann LeCun’s New AMI Labs Secures $1.03B to Build a World‑Model Alternative to LLMs

Yann LeCun and Alexandre LeBrun have launched AMI Labs, raising $1.03 billion in Europe’s largest seed round to develop JEPA—a world‑model architecture intended to replace LLMs for high‑risk domains, with all code and papers open‑sourced, a 5‑10‑year horizon, and backing from NVIDIA, Samsung, Bezos’ venture, and others.

AI research · AMI Labs · JEPA
0 likes · 3 min read

AIWalker
Mar 8, 2026 · Artificial Intelligence

How VisionPangu’s 1.7B Model Beats Larger LLMs in Detailed Image Captioning

VisionPangu demonstrates that a compact 1.7 B‑parameter multimodal model can generate richly detailed, coherent image descriptions that rival much larger models by leveraging high‑quality dense data, a three‑part architecture, and a two‑stage deep alignment training strategy.

AI research · Data Quality · Image Captioning
0 likes · 13 min read

Machine Learning Algorithms & Natural Language Processing
Mar 6, 2026 · Artificial Intelligence

15‑Person Overseas Chinese Team Builds Uni‑1, a Unified Image Model Surpassing Nano Banana

The article reviews Uni‑1, a decoder‑only transformer that unifies visual understanding and generation, details its architecture, benchmark superiority on RISEBench and ODinW‑13, showcases diverse visual examples where it outperforms GPT Image 1.5 and Nano Banana Pro, and highlights the small elite team behind the breakthrough.

AI research · Luma AI · Multimodal AI
0 likes · 14 min read

Machine Learning Algorithms & Natural Language Processing
Mar 5, 2026 · Artificial Intelligence

Can AI Self‑Improve? Inside a Stanford PhD Defense on Continually Self‑Improving AI

Zitong Yang’s Stanford PhD defense introduced “continually self‑improving AI,” a system that autonomously refines its own parameters, generates synthetic training data, and even designs its own learning algorithms, with experiments on synthetic continual training, synthetic‑bootstrap pre‑training, and AI‑design‑AI demonstrating measurable gains over static baselines.

AI research · Continual Learning · pretraining
0 likes · 35 min read

AI Frontier Lectures
Feb 28, 2026 · Artificial Intelligence

Can Reinforcement Learning Revolutionize Text-to-3D Generation? A Deep Dive

This article presents a systematic investigation of applying reinforcement learning to text‑to‑3D generation, detailing reward design, algorithm selection, a new 3D benchmark, a hierarchical GRPO framework, extensive ablations, and the resulting performance gains and limitations.

AI research · generative models · reinforcement learning
0 likes · 13 min read
Can Reinforcement Learning Revolutionize Text-to-3D Generation? A Deep Dive
PaperAgent
PaperAgent
Feb 25, 2026 · Artificial Intelligence

How Contextual Co-Player Inference Enables Robust Multi-Agent Cooperation

These two recent Google papers advance multi‑agent reinforcement learning: one introduces contextual co‑player inference to achieve robust cooperation without explicit meta‑learning, while the other presents AlphaEvolve, a large‑language‑model‑driven evolutionary framework that automatically discovers novel MARL algorithms such as VAD‑CFR and SHOR‑PSRO.

AI research · CFR · LLM-driven algorithm discovery
0 likes · 13 min read
How Contextual Co-Player Inference Enables Robust Multi-Agent Cooperation
Machine Learning Algorithms & Natural Language Processing
Machine Learning Algorithms & Natural Language Processing
Feb 14, 2026 · Artificial Intelligence

Latent Forcing: Reordering Diffusion Steps Boosts Pixel‑Level Image Quality

The new Latent Forcing technique from Fei‑Fei Li’s team reorders the diffusion trajectory, first generating a latent structural sketch and then refining pixel details, which restores the efficiency of latent‑space models while preserving 100% pixel fidelity and achieves state‑of‑the‑art FID scores on ImageNet‑256.

AI research · Diffusion Models · ImageNet
0 likes · 6 min read
Latent Forcing: Reordering Diffusion Steps Boosts Pixel‑Level Image Quality
Machine Learning Algorithms & Natural Language Processing
Machine Learning Algorithms & Natural Language Processing
Feb 10, 2026 · Artificial Intelligence

LeCun Team’s Triple Breakthrough: Sparse Representations, Gradient Planning, and Lightweight JEPA for World Models

LeCun’s three new papers—Rectified LpJEPA, GRASP, and EB‑JEPA—address dense feature bottlenecks, inefficient gradient‑free planning, and heavyweight codebases by introducing sparsity‑preserving regularization, a parallel gradient‑based planner, and a lightweight modular library, delivering high‑performance world‑model representations that run on a single GPU.

AI research · JEPA · gradient planning
0 likes · 11 min read
LeCun Team’s Triple Breakthrough: Sparse Representations, Gradient Planning, and Lightweight JEPA for World Models
JD Cloud Developers
JD Cloud Developers
Feb 4, 2026 · Artificial Intelligence

How Deep Research Transforms LLMs into Autonomous AI Researchers

This article examines Deep Research, an AI system that adds autonomous planning and deep reasoning to large language models, enabling them to browse the web, perform long‑chain reasoning, and generate professional, citation‑rich reports for complex tasks such as industry trend analysis and technical competitive research.

AI research · Autonomous Agents · Information Retrieval
0 likes · 22 min read
How Deep Research Transforms LLMs into Autonomous AI Researchers
JD Tech Talk
JD Tech Talk
Feb 4, 2026 · Artificial Intelligence

How Deep Research Turns LLMs into Autonomous AI Researchers

This article explains the background, core features, underlying ReAct‑based architecture, and engineering solutions of Deep Research—a system that equips large language models with autonomous planning, long‑chain reasoning, and professional report generation to tackle complex information‑intensive tasks.

AI research · Autonomous Agents · Information Retrieval
0 likes · 21 min read
How Deep Research Turns LLMs into Autonomous AI Researchers
PaperAgent
PaperAgent
Feb 2, 2026 · Artificial Intelligence

How Kimi K2.5 Achieves Multimodal Mastery with Joint Training and Agent Swarms

The Kimi K2.5 technical report reveals how a Chinese team combined joint text‑vision training, a novel Zero‑Vision SFT method, and a parallel agent‑swarm architecture to deliver top‑ranked multimodal performance, dramatically faster inference, and open‑source access for broader AI research.

AI research · Agent Swarm · Kimi K2.5
0 likes · 9 min read
How Kimi K2.5 Achieves Multimodal Mastery with Joint Training and Agent Swarms
Data Party THU
Data Party THU
Jan 31, 2026 · Artificial Intelligence

Can LLMs Learn While Being Tested? Inside the TTT-Discover Breakthrough

The article examines the Test‑Time Training to Discover (TTT‑Discover) approach, which applies reinforcement learning during inference to let large language models continuously improve on single test problems, and reports strong results across mathematics, GPU kernel optimization, algorithm design, and biology.

AI research · LLM · Scientific Discovery
0 likes · 9 min read
Can LLMs Learn While Being Tested? Inside the TTT-Discover Breakthrough
AI Frontier Lectures
AI Frontier Lectures
Jan 30, 2026 · Artificial Intelligence

How SplatSSC Revolutionizes Semantic Scene Completion with Depth‑Guided Gaussian Splatting

SplatSSC introduces a depth‑guided Gaussian splatting framework that replaces random primitive initialization with geometry‑aware priors and a decoupled aggregation module, achieving state‑of‑the‑art performance on indoor semantic scene completion while dramatically reducing computational overhead and eliminating floaters.

3D Perception · AI research · Gaussian splatting
0 likes · 10 min read
How SplatSSC Revolutionizes Semantic Scene Completion with Depth‑Guided Gaussian Splatting
Baobao Algorithm Notes
Baobao Algorithm Notes
Jan 26, 2026 · Artificial Intelligence

From Search Ads to Foundation Models: My Journey Building the EvoCUA GUI Agent

The author explains why he transitioned from search advertising algorithms to foundation model research, outlines the four typical activities of base‑model teams, and shares detailed technical insights, experimental practices, and scaling strategies that led the EvoCUA GUI Agent to achieve open‑source SOTA on OSWorld.

AI research · Foundation Models · GUI agents
0 likes · 17 min read
From Search Ads to Foundation Models: My Journey Building the EvoCUA GUI Agent
PaperAgent
PaperAgent
Jan 25, 2026 · Artificial Intelligence

How Deep GraphRAG Solves Retrieval’s Three‑Way Dilemma with Hierarchical Search

Deep GraphRAG tackles the three‑fold dilemma of traditional Retrieval‑Augmented Generation by introducing hierarchical global‑to‑local retrieval, a beam‑search dynamic reordering that cuts latency, and a DW‑GRPO reinforcement‑learning module that adaptively weights rewards, achieving near‑state‑of‑the‑art performance with up to 86% faster inference.

AI research · GraphRAG · Hierarchical Retrieval
0 likes · 5 min read
How Deep GraphRAG Solves Retrieval’s Three‑Way Dilemma with Hierarchical Search
PaperAgent
PaperAgent
Jan 25, 2026 · Industry Insights

Top 10 Chinese Large Models to Watch: Features, Benchmarks, and Download Links

This roundup highlights ten cutting‑edge Chinese AI models—including Qwen3‑TTS, LongCat‑Flash‑Thinking‑2601, GLM‑4.7‑Flash, STEP3‑VL‑10B, Baichuan‑M3, and Youtu‑LLM—detailing their multilingual capabilities, architecture innovations, performance claims, and providing direct repository links for researchers and developers.

AI research · Chinese AI · Multimodal
0 likes · 7 min read
Top 10 Chinese Large Models to Watch: Features, Benchmarks, and Download Links
PaperAgent
PaperAgent
Jan 20, 2026 · Artificial Intelligence

How Intrinsic Self‑Critique Boosts LLM Planning Accuracy to 89%

Google DeepMind's new "Intrinsic Self‑Critique" method lets large language models iteratively self‑evaluate and rewrite their plans, raising Blocksworld planning accuracy from 49.8% to 89.3% and setting new records across multiple planning benchmarks.

AI research · LLM · Planning
0 likes · 5 min read
How Intrinsic Self‑Critique Boosts LLM Planning Accuracy to 89%
BirdNest Tech Talk
BirdNest Tech Talk
Jan 11, 2026 · Artificial Intelligence

How AI Agents Overcome Context Window Limits: Gemini vs Manus Deep Research

The article analyzes the context‑window bottleneck of large language models, compares two architectural strategies—strengthening the model (Gemini Deep Research) and parallel agent decomposition (Manus Wide Research)—and details a wind‑power investment case study, technical implementation, and future directions.

AI research · ReAct · agent architecture
0 likes · 16 min read
How AI Agents Overcome Context Window Limits: Gemini vs Manus Deep Research
Network Intelligence Research Center (NIRC)
Network Intelligence Research Center (NIRC)
Jan 11, 2026 · Artificial Intelligence

Insights from NeurIPS 2025: Modeling Distributions and Venturing Beyond Them

The report summarizes NeurIPS 2025 in San Diego, highlighting four NIRC papers on noise‑robust 3D human pose estimation, LVLM video‑anomaly understanding, and hand‑object reconstruction, and discusses broader industry trends such as feed‑forward generation and large‑scale pre‑training showcased by leading AI companies.

3D human pose estimation · AI research · LVLM
0 likes · 5 min read
Insights from NeurIPS 2025: Modeling Distributions and Venturing Beyond Them
Data Party THU
Data Party THU
Jan 7, 2026 · Artificial Intelligence

Why the Common KL Penalty in LLM RL Training Is Biased—and How to Fix It

A recent study reveals that the widely used KL regularization in LLM reinforcement learning (RLVR) is mathematically biased, leading to unstable training and poorer generalization, and shows that moving the KL term back to the reward with a simple K1 estimator can boost out‑of‑domain performance by up to 20%.

AI research · KL regularization · LLM training
0 likes · 10 min read
Why the Common KL Penalty in LLM RL Training Is Biased—and How to Fix It
PaperAgent
PaperAgent
Jan 6, 2026 · Artificial Intelligence

How Recursive Language Models Enable Unlimited Context for LLMs

Recursive Language Models (RLM) offer a cost‑effective alternative to expanding LLM context windows by storing prompts as variables and enabling recursive calls, allowing models to process over 100,000 tokens, with experiments showing superior performance and lower median costs compared to baseline approaches.

AI research · LLM scaling · Long Context
0 likes · 5 min read
How Recursive Language Models Enable Unlimited Context for LLMs
HyperAI Super Neural
HyperAI Super Neural
Jan 5, 2026 · Artificial Intelligence

WorldPlay: Real‑Time Interactive World Modeling with Long‑Term Geometry Consistency

Tencent’s HyperAI team introduces WorldPlay, an open‑source real‑time interactive world model that achieves 24 FPS 720p video generation while preserving long‑term geometric consistency through dual‑action representation, dynamic context memory reconstruction, and a novel context‑forcing distillation, and also showcases Maya1 emotional TTS and RFdiffusion3 protein design models.

AI research · WorldPlay · context memory
0 likes · 6 min read
WorldPlay: Real‑Time Interactive World Modeling with Long‑Term Geometry Consistency
Network Intelligence Research Center (NIRC)
Network Intelligence Research Center (NIRC)
Dec 30, 2025 · Artificial Intelligence

Bridging Tokenizer Gaps: Cross-Tokenizer Knowledge Distillation at AAAI 2026

This paper introduces SeDi, a semantics‑ and distribution‑aware cross‑tokenizer knowledge distillation framework that aligns teacher and student token spaces via bipartite graph components and top‑K re‑encoding, achieving state‑of‑the‑art performance and lower exposure bias on multiple LLM benchmarks.

AI research · Language Models · cross-tokenizer distillation
0 likes · 10 min read
Bridging Tokenizer Gaps: Cross-Tokenizer Knowledge Distillation at AAAI 2026
PaperAgent
PaperAgent
Dec 29, 2025 · Artificial Intelligence

Unveiling Bottom‑up Policy Optimization: Boosting LLM Reasoning with Internal Strategies

This article introduces Bottom‑up Policy Optimization (BuPO), a novel reinforcement‑learning framework that treats large language models as collections of internal layer and modular policies, revealing distinct inference entropy patterns in Llama and Qwen‑3 and demonstrating superior performance on complex mathematical reasoning benchmarks.

AI research · Bottom-up Optimization · Internal Policy
0 likes · 10 min read
Unveiling Bottom‑up Policy Optimization: Boosting LLM Reasoning with Internal Strategies
PaperAgent
PaperAgent
Dec 26, 2025 · Artificial Intelligence

What Google’s 2025 AI Breakthroughs Reveal About the Future of Intelligent Agents

Google’s 2025 research recap highlights eight major breakthroughs—from the Gemini 3 series achieving unprecedented multimodal reasoning and efficiency, to AI‑driven advances in scientific discovery, creative generation, quantum computing, climate resilience, and responsible AI safety—showcasing how intelligent agents are reshaping products, research, and global challenges.

AI research · AI safety · Multimodal AI
0 likes · 10 min read
What Google’s 2025 AI Breakthroughs Reveal About the Future of Intelligent Agents
HyperAI Super Neural
HyperAI Super Neural
Dec 19, 2025 · Artificial Intelligence

Weekly AI Paper Digest: Open-Source LLMs, Agent Systems, and Long-Context Reasoning

This week’s AI paper roundup reviews six recent research works—including RecGPT‑V2, Nemotron 3 Nano, FrontierScience benchmark, AutoGLM, Deeper‑GXX, and QwenLong‑L1.5—highlighting advances in large‑language‑model‑driven recommendation, Mixture‑of‑Experts models, expert‑level scientific reasoning, GUI‑based foundation agents, graph neural network deepening, and ultra‑long‑context inference.

AI research · Agent systems · Long-context reasoning
0 likes · 6 min read
Weekly AI Paper Digest: Open-Source LLMs, Agent Systems, and Long-Context Reasoning
AI Frontier Lectures
AI Frontier Lectures
Dec 15, 2025 · Artificial Intelligence

How UnityVideo Unifies Multimodal Training to Boost Video Generation

UnityVideo, a new vision framework from HKUST, CUHK, Tsinghua and Kuaishou, unifies training across depth, flow, pose, segmentation and RGB modalities, achieving faster convergence, higher video quality, zero‑shot generalization and stronger physical reasoning compared with existing single‑modality video generators.

AI research · UnityVideo · Vision models
0 likes · 15 min read
How UnityVideo Unifies Multimodal Training to Boost Video Generation
PaperAgent
PaperAgent
Dec 13, 2025 · Artificial Intelligence

Why Unified Multimodal Models Are the Key to Next‑Gen AGI – A Deep Survey

This article surveys the latest research on Unified Multimodal Foundations (UFM), explaining why integrating understanding and generation across text, image, video, and audio is essential for AGI, and detailing modeling paradigms, encoding/decoding strategies, training pipelines, benchmarks, and real‑world applications.

AI research · Multimodal · Training
0 likes · 10 min read
Why Unified Multimodal Models Are the Key to Next‑Gen AGI – A Deep Survey
BirdNest Tech Talk
BirdNest Tech Talk
Dec 7, 2025 · Artificial Intelligence

Recreating DeerFlow’s Multi‑Agent Research Pipeline with LangGraphGo in 30 Minutes

This article walks through the open‑source DeerFlow framework—its multi‑agent architecture, core features, and a step‑by‑step implementation using the Go‑based LangGraphGo library, covering planner, researcher, reporter and podcast nodes, state‑graph design, CLI/web modes, and deployment instructions.

AI research · LLM · LangGraphGo
0 likes · 14 min read
Recreating DeerFlow’s Multi‑Agent Research Pipeline with LangGraphGo in 30 Minutes
DataFunTalk
DataFunTalk
Dec 7, 2025 · Artificial Intelligence

Is the World Model the Key to AGI? Inside the AI Debate

The article examines the chaotic rise of “world models” in AI, tracing their origins from early mental‑model theory to modern representation‑ and generation‑based approaches, and argues that the current hype reflects a broader shift away from large language models toward embodied, physics‑grounded intelligence.

AI research · generative video · representation learning
0 likes · 13 min read
Is the World Model the Key to AGI? Inside the AI Debate
Data Party THU
Data Party THU
Dec 2, 2025 · Artificial Intelligence

FFGo: Turning the First Frame into a Conceptual Memory for Video Customization

FFGo reveals that the first frame of text‑to‑video models acts as a conceptual memory buffer storing visual entities, and by using a few‑shot LoRA trained on only 20‑50 curated examples with a special transition prompt, it reliably activates multi‑object fusion, enabling high‑quality, controllable video customization without model architecture changes.

AI research · conceptual memory · few-shot LoRA
0 likes · 9 min read
FFGo: Turning the First Frame into a Conceptual Memory for Video Customization
Kuaishou Tech
Kuaishou Tech
Nov 25, 2025 · Artificial Intelligence

How Flow‑GRPO Boosts Image Generation Accuracy to 95% with Online Reinforcement Learning

Flow‑GRPO introduces online reinforcement learning into flow‑matching models by converting deterministic ODE sampling to stochastic SDE sampling and reducing denoising steps, raising SD‑3.5‑Medium's GenEval accuracy from 63% to 95%—surpassing GPT‑4o—and demonstrating strong gains in complex composition, text rendering, and human‑preference alignment across multiple generative tasks.

AI research · Online RL · deep learning
0 likes · 8 min read
How Flow‑GRPO Boosts Image Generation Accuracy to 95% with Online Reinforcement Learning
HyperAI Super Neural
HyperAI Super Neural
Nov 19, 2025 · Artificial Intelligence

LocDiff: Achieving Global-Scale Precise Image Geolocation Without Grids or Reference Libraries

The LocDiff framework introduces a spherical‑harmonics Dirac‑delta encoding and a conditional Siren‑UNet diffusion model that enables accurate worldwide image geolocation without relying on predefined grids or external image libraries, outperforming prior methods in precision, generalization, and computational efficiency.

AI research · Diffusion Models · LocDiff
0 likes · 16 min read
LocDiff: Achieving Global-Scale Precise Image Geolocation Without Grids or Reference Libraries
Data Party THU
Data Party THU
Nov 13, 2025 · Artificial Intelligence

What Makes the Free Transformer a Game‑Changer in AI Decoding?

The Free Transformer paper introduces a decoder architecture that injects random latent variables to condition generation, breaking traditional GPT constraints and achieving notable performance gains on reasoning‑heavy benchmarks such as HumanEval+, MBPP, GSM8K, MMLU, and CSQA.

AI research · Free Transformer · Transformer
0 likes · 10 min read
What Makes the Free Transformer a Game‑Changer in AI Decoding?
Alimama Tech
Alimama Tech
Nov 11, 2025 · Artificial Intelligence

Industrial-Scale Graph Learning: Boosting Ad ROI and Winning Beijing’s Science Award

The award‑winning industrial graph learning system developed by Peking University and Alimama combines novel dynamic graph embedding and GNN techniques, scales to millions of merchants, and has improved ad ROI by more than 12% while yielding dozens of top‑conference papers.

AI research · Graph Neural Networks · advertising optimization
0 likes · 6 min read
Industrial-Scale Graph Learning: Boosting Ad ROI and Winning Beijing’s Science Award
Data Party THU
Data Party THU
Nov 5, 2025 · Artificial Intelligence

How VLM‑FO1 Turns Vision‑Language Models into Precise Perception Machines

VLM‑FO1 introduces a generate‑plus‑reference paradigm that replaces coordinate generation with region token referencing, adding plug‑in modules such as a proposal generator, a hybrid fine‑grained encoder, and a region‑language connector to give any pretrained visual language model accurate, fine‑grained perception while preserving its original capabilities.

AI research · Multimodal · Plug-and-Play
0 likes · 15 min read
How VLM‑FO1 Turns Vision‑Language Models into Precise Perception Machines
Data Party THU
Data Party THU
Oct 29, 2025 · Artificial Intelligence

Can Test-Time Scaling Unlock More Reliable Vision‑Language‑Action Robots?

The paper introduces RoboMonkey, a framework that applies a generate‑and‑verify paradigm and test‑time scaling to Vision‑Language‑Action models, showing that increasing sampling and verification at inference dramatically reduces action error across multiple VLA architectures, and presents scalable verifier training, synthetic data augmentation, and efficient deployment strategies.

AI research · Action Verification · RoboMonkey
0 likes · 8 min read
Can Test-Time Scaling Unlock More Reliable Vision‑Language‑Action Robots?
DataFunTalk
DataFunTalk
Oct 29, 2025 · Artificial Intelligence

OpenAI Unveils $25B AI Initiative and Multi‑Year AGI Roadmap

OpenAI’s recent restructuring created the OpenAI Foundation, pledged $25 billion to health and AI‑resilience research, outlined a multi‑year AGI timeline, announced plans for AI hardware, and set milestones for an AI research intern by next September and a fully autonomous AI researcher by 2028.

AGI · AI hardware · AI research
0 likes · 3 min read
OpenAI Unveils $25B AI Initiative and Multi‑Year AGI Roadmap
Bighead's Algorithm Notes
Bighead's Algorithm Notes
Oct 18, 2025 · Artificial Intelligence

Time Series Paper Digest (Oct 11‑17 2025): FIRE, CauchyNet, EvoRate, CoRA

Covering Oct 11‑17, 2025, this digest presents four recent AI papers on time‑series forecasting: FIRE introduces a frequency‑domain decomposition with independent amplitude‑phase modeling and adaptive weighting; CauchyNet leverages holomorphic activations for compact, data‑efficient learning; the EvoRate framework quantifies learnability via mutual information; and CoRA adds covariate‑aware adaptation to foundation models. All four report significant accuracy gains and enhanced interpretability.

AI research · covariate-aware adaptation · deep learning
0 likes · 10 min read
Time Series Paper Digest (Oct 11‑17 2025): FIRE, CauchyNet, EvoRate, CoRA
Meituan Technology Team
Meituan Technology Team
Oct 15, 2025 · Artificial Intelligence

What’s New in Large Model Research? Top Meituan AI Papers Up to Oct 2025

This curated list showcases Meituan’s latest large‑model breakthroughs and academic papers up to October 2025, spanning LLM system optimizations, multimodal generation, evaluation benchmarks, quantization techniques, and reinforcement‑learning‑driven improvements, offering researchers valuable insights and resources across the AI landscape.

AI research · Benchmarking · Multimodal AI
0 likes · 10 min read
What’s New in Large Model Research? Top Meituan AI Papers Up to Oct 2025
DataFunTalk
DataFunTalk
Oct 9, 2025 · Artificial Intelligence

From Physics to DeepMind: How a Tsinghua Star Is Shaping AI Research

Google DeepMind hired Shunyu Yao, a Tsinghua physics prodigy and former Anthropic researcher, whose rapid transition from theoretical physics to AI highlights the intense workload, values clash, and the accelerating pace of large‑model research.

AI research · DeepMind · Physics
0 likes · 9 min read
From Physics to DeepMind: How a Tsinghua Star Is Shaping AI Research
Data Party THU
Data Party THU
Oct 1, 2025 · Artificial Intelligence

Why SFT and RL Are Two Sides of the Same Coin: A Unified Gradient Theory for LLM Post‑Training

This article analyzes a recent paper that unifies supervised fine‑tuning (SFT) and reinforcement learning (RL) for large language models under a single gradient estimator, introduces the Unified Policy Gradient Estimator (UPGE) and the Hybrid Post‑Training (HPT) algorithm, and demonstrates their superior performance on math reasoning benchmarks.

AI research · Hybrid Training · LLM
0 likes · 11 min read
Why SFT and RL Are Two Sides of the Same Coin: A Unified Gradient Theory for LLM Post‑Training
AIWalker
AIWalker
Sep 23, 2025 · Artificial Intelligence

Manzano: A Small 3B Multimodal Model That Unifies Image Understanding and Generation with SOTA Performance

Manzano introduces a hybrid vision tokenizer and a three‑stage training recipe that let a 3‑billion‑parameter multimodal LLM achieve state‑of‑the‑art results on both image‑understanding benchmarks and text‑to‑image generation, while scaling smoothly to larger sizes and minimizing task conflict.

AI research · Large Language Model · Manzano
0 likes · 25 min read
Manzano: A Small 3B Multimodal Model That Unifies Image Understanding and Generation with SOTA Performance
Amap Tech
Amap Tech
Sep 19, 2025 · Artificial Intelligence

How FSDrive Uses Spatio‑Temporal CoT to Revolutionize Autonomous Driving

FSDrive introduces a spatio‑temporal chain‑of‑thought approach that enables visual language models to generate future driving scenes as images, improving trajectory planning accuracy and safety by eliminating cross‑modal gaps and enforcing physical constraints in autonomous driving.

AI research · autonomous driving · spatio-temporal CoT
0 likes · 10 min read
How FSDrive Uses Spatio‑Temporal CoT to Revolutionize Autonomous Driving
Data Party THU
Data Party THU
Sep 18, 2025 · Artificial Intelligence

How Reinforcement Learning is Shaping the Future of Large Reasoning Models

This article surveys recent advances in applying reinforcement learning to large reasoning models, outlining the historical background, key breakthroughs like OpenAI o1 and DeepSeek‑R1, current challenges in reward design and scalability, and future research directions toward more capable AI systems.

AI research · RLHF · reasoning
0 likes · 9 min read
How Reinforcement Learning is Shaping the Future of Large Reasoning Models
DataFunTalk
DataFunTalk
Sep 18, 2025 · Artificial Intelligence

How Tongyi DeepResearch Turns Chatty AI into a Research Powerhouse

Tongyi DeepResearch, an open‑source AI model and framework, achieves SOTA on multiple Deep Research benchmarks by combining fully open‑source models, frameworks, and data pipelines, and introduces novel agentic pre‑training, fine‑tuning, and reinforcement‑learning methods to enable complex multi‑step reasoning and real‑world applications.

AI research · agentic reinforcement learning · open source
0 likes · 14 min read
How Tongyi DeepResearch Turns Chatty AI into a Research Powerhouse