Tagged articles
268 articles
AI Algorithm Path
Jun 8, 2025 · Artificial Intelligence

Autoregressive vs Diffusion Language Models: Principles, Trade‑offs, and Future Directions

The article compares autoregressive and diffusion language models, detailing their mathematical foundations, training and inference pipelines, performance trade‑offs such as speed, coherence and diversity, and explores hybrid approaches and emerging research directions for more efficient and controllable text generation.

AI research · Text Generation · Transformer
0 likes · 17 min read
DataFunTalk
Jun 8, 2025 · Artificial Intelligence

Why Autoregressive Video Models Like MAGI-1 May Outperform Diffusion Approaches

The article examines the current dominance of diffusion models in commercial video generation, contrasts them with autoregressive methods, and details how the open‑source MAGI‑1 model combines both paradigms to achieve longer, more controllable video synthesis while addressing scalability and quality challenges.

AI research · Autoregressive Models · Diffusion Models
0 likes · 70 min read
Architect
Jun 7, 2025 · Artificial Intelligence

Mass Framework: Boosting Multi‑Agent Design with Smarter Prompts & Topologies

The Mass framework, developed by Google and Cambridge University, automates multi‑agent system design by jointly optimizing prompts and topologies through three staged processes, demonstrating significant performance gains over existing methods across various tasks while highlighting the importance of coordinated prompt‑topology optimization.

AI research · Mass framework · Topology Design
0 likes · 6 min read
Xiaohongshu Tech REDtech
Jun 6, 2025 · Artificial Intelligence

How dots.llm1 Sets New Benchmarks for Open‑Source MoE Language Models

dots.llm1, an open‑source 142‑billion‑parameter Mixture‑of‑Experts language model from hi lab, achieves Qwen2.5‑72B‑level performance after training on 11.2 T high‑quality tokens, and the release includes full models, intermediate checkpoints, and detailed training pipelines for the research community.

AI research · Mixture of Experts · Training Efficiency
0 likes · 10 min read
Alibaba Cloud Developer
Jun 5, 2025 · Artificial Intelligence

How Deep (Re)Search Transforms Code Search and AI-Powered Knowledge Retrieval

This article systematically explains the concepts of Deep Search and Deep Research, contrasts them with traditional Retrieval‑Augmented Generation, reviews leading commercial and open‑source solutions, details their architecture for code retrieval, and outlines future plans for specialized code‑search agents.

AI research · Knowledge Retrieval · Retrieval Augmented Generation
0 likes · 13 min read
Kuaishou Tech
Jun 4, 2025 · Artificial Intelligence

KwaiCoder-AutoThink-preview: An Automatic‑Thinking Large Model Enhanced with Step‑SRPO Reinforcement Learning

The KwaiPilot team released the KwaiCoder‑AutoThink‑preview model, which introduces a novel automatic‑thinking training paradigm and a process‑supervised reinforcement‑learning method called Step‑SRPO, enabling the model to dynamically switch between thinking and non‑thinking modes, reduce inference cost, and achieve up to 20‑point gains on code and math benchmarks while handling large‑scale codebases.

AI research · Model Optimization · Reinforcement Learning
0 likes · 12 min read
Xiaohongshu Tech REDtech
Jun 3, 2025 · Artificial Intelligence

Beyond One-Size-Fits-All: Tailored Benchmarks for Efficient Evaluation

The TailoredBench framework dramatically reduces large‑language‑model evaluation cost and error by using a global probe set, model‑specific source selection, extensible K‑Medoids clustering, and calibration, achieving up to 300× speedup and a 31.4% MAE reduction across diverse benchmarks.

AI research · K-Medoids · LLM evaluation
0 likes · 10 min read
AntTech
May 31, 2025 · Artificial Intelligence

Machine Reasoning and Deep Thinking: Insights from Ant Financial’s NLP Lead Wu Wei

The article explores how DeepSeek R1 and long‑thinking chains have revived interest in machine reasoning, tracing the evolution of natural‑language models, defining reasoning as logical knowledge composition, and outlining future research directions in efficient reasoning architectures and deep‑thinking applications.

AI research · Efficient Reasoning · Large Language Models
0 likes · 8 min read
ShiZhen AI
May 28, 2025 · Artificial Intelligence

Claude Finally Gets Voice: Anthropic Adds Speech to Its AI Assistant

Anthropic has introduced a voice mode for Claude, enabling English users to speak and type interchangeably with five voice personalities, while a new 3D AI startup, SpAItial, showcases photorealistic room generation and researchers present INTUITOR, a confidence‑driven training method that improves AI reasoning.

AI research · Anthropic · Claude
0 likes · 7 min read
AI Frontier Lectures
May 28, 2025 · Artificial Intelligence

How Token‑Shuffle Enables 2048×2048 Autoregressive Image Generation

The article analyzes the Token‑Shuffle method, which reduces visual token redundancy to allow high‑resolution (2048×2048) autoregressive image generation, detailing its architecture, training pipeline, experimental results, efficiency gains, and comparisons with diffusion and other AR models.

AI research · Autoregressive Models · High‑Resolution Image Generation
0 likes · 17 min read
AI Frontier Lectures
May 27, 2025 · Artificial Intelligence

Can One-Step Generative Modeling Beat Multi-Step Diffusion? Inside MeanFlow

The article presents MeanFlow, a novel one‑step generative modeling framework that replaces instantaneous velocity with an average‑velocity field, achieving a record‑low FID of 3.43 on ImageNet 256×256 with a single function evaluation and outperforming both prior single‑step and multi‑step diffusion models.

AI research · FID · ImageNet
0 likes · 7 min read
Baobao Algorithm Notes
May 26, 2025 · Artificial Intelligence

When Should Large Language Models Think? 10 Cutting‑Edge Strategies to Boost Reasoning Efficiency

This article reviews ten recent papers that tackle the over‑thinking problem in large language models by shortening chain‑of‑thought reasoning, introducing dynamic early‑exit, adaptive thinking triggers, and reinforcement‑learning‑based training, showing how models can maintain or improve accuracy while dramatically reducing token usage and latency.

AI research · Model Pruning · adaptive inference
0 likes · 38 min read
JD Tech
May 20, 2025 · Artificial Intelligence

How Re‑parameterization and Adaptive Learning Boost Visual Deep Learning Efficiency

The award‑winning project from Tsinghua University and JD Retail introduces re‑parameterization model design, cross‑scene adaptive learning, and platform‑aware compression to overcome accuracy‑efficiency trade‑offs in visual deep learning, achieving over 20% accuracy gains and more than 50% inference speedup in real‑world e‑commerce deployments.

AI research · Computer Vision · adaptive models
0 likes · 6 min read
AI Frontier Lectures
May 19, 2025 · Artificial Intelligence

DreamO: Multi‑Condition Image Customization with a 400M Flux‑Based Model

DreamO, a collaborative effort by ByteDance and Peking University, introduces a unified 400M‑parameter framework built on Flux‑1.0‑dev that enables simultaneous control of identity, style, appearance, and virtual try‑on, offering open‑source, low‑cost, and fast image customization comparable to commercial large models.

AI research · DreamO · Flux model
0 likes · 6 min read
Amap Tech
May 19, 2025 · Artificial Intelligence

Group Policy Gradient: Direct Objective Optimization for Faster Reinforcement Learning

The article introduces Group Policy Gradient (GPG), a reinforcement‑learning framework that eliminates surrogate loss functions and critic models, directly optimizes the original objective, reduces bias and variance, and achieves state‑of‑the‑art performance on both single‑modal and multimodal tasks.

AI research · LLM fine-tuning · Policy Gradient
0 likes · 7 min read
Amap Tech
May 12, 2025 · Artificial Intelligence

How G3PT Uses Autoregressive Modeling to Revolutionize 3D Generation

The paper introduces G3PT, a groundbreaking autoregressive 3D generation model that employs a Cross‑Scale Querying Transformer and multi‑scale tokenization to produce high‑quality meshes from a single image, outperforming diffusion‑based methods and revealing a scaling law for 3D generation.

3D generation · AI research · G3PT
0 likes · 9 min read
AI Frontier Lectures
May 10, 2025 · Artificial Intelligence

Can the ‘Canon’ Layer Unlock New Limits in Large Language Models?

A new study introduces the lightweight “Canon” layer for large language models, showing how it improves information flow, inference depth, and scalability across Transformers, linear attention, and state‑space architectures, while offering a controlled synthetic pre‑training benchmark for deeper architectural analysis.

AI research · Large Language Models · Mamba
0 likes · 11 min read
Baobao Algorithm Notes
Apr 28, 2025 · Artificial Intelligence

What Makes Qwen3 the Next Leap in Large Language Models?

The article announces Qwen3, detailing its flagship 235B and smaller MoE models, superior benchmark performance, extensive multilingual support, expanded pretraining data, four-stage post‑training, flexible thinking modes, and deployment guides for SGLang, vLLM, and Ollama, along with future plans toward AGI‑level capabilities.

AI research · Deployment · Qwen3
0 likes · 15 min read
DataFunTalk
Apr 25, 2025 · Artificial Intelligence

Does Reinforcement Learning Really Expand Reasoning Capacity in Large Language Models? Insights from Recent Empirical Study

Recent empirical research by Tsinghua’s LeapLab and Shanghai Jiao Tong University reveals that reinforcement‑learning‑based fine‑tuning (RLVR) improves sampling efficiency but does not extend the fundamental reasoning abilities of large language models beyond their base capabilities, as demonstrated across mathematics, code, and visual reasoning benchmarks.

AI research · Large Language Models · RLVR
0 likes · 12 min read
Architect
Apr 21, 2025 · Artificial Intelligence

Microsoft Research Releases BitNet b1.58 2B4T: A 1‑Bit Native Large Language Model with Ultra‑Low Memory and Energy Consumption

Microsoft Research introduced BitNet b1.58 2B4T, a native 1‑bit large language model with 2 billion parameters trained on 4 trillion tokens, achieving only 0.4 GB non‑embedding memory, 0.028 J decoding energy, and 29 ms CPU latency while matching full‑precision performance.

1-bit LLM · AI research · BitNet
0 likes · 7 min read
21CTO
Apr 17, 2025 · Artificial Intelligence

What’s New in OpenAI’s GPT‑4.1? Bigger Context, Faster, Cheaper AI

OpenAI has launched GPT‑4.1, a multimodal AI model that expands context windows to one million tokens, improves coding and instruction following, offers cheaper Mini and Nano variants, and signals a shift in its release roadmap, including plans to retire GPT‑4 and delay GPT‑5.

AI research · GPT-4.1 · OpenAI
0 likes · 5 min read
Baobao Algorithm Notes
Apr 16, 2025 · Artificial Intelligence

Why Reinforcement Learning Finally Works: The Second Half of AI

The article argues that AI has entered its second half, where reinforcement learning finally generalizes thanks to large‑scale language pretraining and reasoning, shifting focus from building ever better models to redefining problems, evaluation methods, and real‑world utility.

AI research · industry trends
0 likes · 16 min read
DevOps
Apr 13, 2025 · Artificial Intelligence

The Amazing Magic of GPT‑4o and a Speculative Technical Roadmap

This article reviews the breakthrough image‑generation capabilities of GPT‑4o, showcases diverse examples, and offers a detailed speculation on its underlying autoregressive architecture, tokenization methods, VQ‑VAE/GAN advances, and training strategies that could explain its performance.

AI research · GPT-4o · Image Generation
0 likes · 16 min read
AntTech
Apr 10, 2025 · Artificial Intelligence

Ant Group Presents Four AI Research Papers at ICLR 2025 Live Showcase

At the ICLR 2025 live session in Singapore, Ant Group showcased four cutting‑edge papers—CodePlan, Animate‑X, Group Position Embedding, and OmniKV—demonstrating advances in large‑language‑model reasoning, universal character animation, layout‑aware document understanding, and efficient long‑context inference.

AI research · Large Language Models · Multimodal
0 likes · 6 min read
DevOps
Apr 7, 2025 · Artificial Intelligence

Meta Llama 4 Scout, Maverick, and Behemoth: Architecture, NoPE Innovation, and Training Advances

The article introduces Meta's newly open‑sourced Llama 4 series, including Scout with a 10 million‑token context window, Maverick with 400 billion parameters, and the upcoming Behemoth teacher model, and details their mixture‑of‑experts architecture, the NoPE (no positional encoding) layers, training pipelines, performance benchmarks, and infrastructure improvements for large‑scale AI research.

AI research · Context Window · Llama 4
0 likes · 8 min read
AntTech
Mar 31, 2025 · Artificial Intelligence

Ant Group Papers Accepted at ICLR 2025: Summaries and Links

The article presents the abstracts, publication types, links, and research areas of seventeen Ant Group papers accepted at ICLR 2025, covering topics such as embodied robot co‑design, efficient distributed training for large language models, optimization via LLMs, character animation, interactive frame interpolation, KV‑cache management, and privacy‑preserving Transformers.

AI research · Ant Group · ICLR2025
0 likes · 23 min read
AI Frontier Lectures
Mar 30, 2025 · Artificial Intelligence

Do Large Language Models Mirror Human Brain Language Processing? Google’s Groundbreaking Findings

Google researchers discovered a linear relationship between brain activity recorded during natural conversation and the internal embeddings of a speech‑to‑text large language model, revealing that acoustic and lexical representations from the model can accurately predict neural responses in both language comprehension and production.

AI research · Google · Large Language Models
0 likes · 8 min read
AI Frontier Lectures
Mar 30, 2025 · Artificial Intelligence

How NOVA Generates High‑Quality Video Autoregressively Without Vector Quantization

This article provides an in‑depth analysis of the NOVA model, a non‑quantized autoregressive video generation framework that combines frame‑by‑frame temporal prediction with set‑by‑set spatial prediction, uses diffusion loss for token estimation, and achieves state‑of‑the‑art results on multiple video and image benchmarks.

AI research · Autoregressive Model · NOVA
0 likes · 15 min read
MaGe Linux Operations
Mar 26, 2025 · Artificial Intelligence

Why Qwen2.5‑VL‑32B Is the New AI Breakthrough for Vision and Math

Alibaba's newly released Qwen2.5‑VL‑32B multimodal model delivers state‑of‑the‑art visual and textual performance, offering human‑aligned responses, superior mathematical reasoning, fine‑grained image understanding, and efficient deployment features that make it a compelling tool for developers and AI researchers alike.

AI research · Qwen2.5-VL-32B · large language model
0 likes · 9 min read
AI Frontier Lectures
Mar 25, 2025 · Artificial Intelligence

What Drives Alignment in Multimodal Large Language Models? A Comprehensive Review

This article provides an in‑depth review of alignment algorithms for multimodal large language models, covering application scenarios, dataset construction methods, evaluation benchmarks, current challenges, and future research directions, while summarizing contributions from leading academic institutions.

AI research · alignment algorithms · dataset construction
0 likes · 22 min read
AI Algorithm Path
Mar 20, 2025 · Artificial Intelligence

Understanding Multimodal Large Language Models: Recent Advances and Comparative Analysis

This article surveys the latest multimodal large language model research, dissecting the design, training strategies, and performance trade‑offs of models such as Llama 3.2, Molmo, NVLM, Qwen2‑VL, Pixtral, MM1.5, Emu3, and Janus, and highlights the challenges of fair cross‑model evaluation.

AI research · Cross-Attention · Large Language Models
0 likes · 16 min read
AIWalker
Mar 14, 2025 · Artificial Intelligence

Dynamic Tanh Lets He Kaiming and LeCun Drop Transformer Normalization in 9 Lines

Researchers He Kaiming, Yann LeCun and colleagues propose a 9‑line Dynamic Tanh (DyT) layer that replaces LayerNorm/RMSNorm in Transformers, showing comparable or superior accuracy across vision, language, speech and DNA tasks while also reducing inference latency on modern GPUs.

AI research · Deep Learning · Dynamic Tanh
0 likes · 18 min read
AIWalker
Mar 11, 2025 · Artificial Intelligence

Introducing FAR: A Frequency‑Progressive Autoregressive Paradigm for Image Generation

The paper presents FAR, a frequency‑aware autoregressive framework that predicts image tokens from low‑frequency to high‑frequency components using a continuous tokenizer, and demonstrates its efficiency and quality on ImageNet and text‑to‑image benchmarks compared with existing AR and VAR methods.

AI research · Autoregressive Models · FAR
0 likes · 20 min read
AIWalker
Mar 5, 2025 · Artificial Intelligence

Attention Distillation in Diffusion Models: CVPR 2025 Technique Outperforms Traditional Image Generation

The paper introduces a novel attention‑distillation loss and a guided‑sampling scheme that together enable diffusion models to faithfully transfer visual features from reference images, dramatically speeding synthesis and surpassing prior plug‑and‑play attention methods across style transfer, text‑to‑image generation, and texture synthesis tasks.

AI research · Diffusion Models · Image Generation
0 likes · 15 min read
Data Thinking Notes
Mar 4, 2025 · Artificial Intelligence

Unlock AI-Powered Research: The DeepSeek‑R1 & DeepResearch Guide

Compiled by Tsinghua University experts, this guide systematically analyzes the DeepSeek‑R1 inference model and DeepResearch platform, offering multi‑model comparisons, real‑world case studies, and end‑to‑end AI‑driven solutions from data collection to report generation for researchers.

AI research · Data Automation · DeepSeek
0 likes · 6 min read
Architect
Mar 3, 2025 · Artificial Intelligence

Unlocking Reasoning LLMs: Methods, DeepSeek R1 Insights, and Cost‑Effective Strategies

This article examines how to build and improve reasoning‑capable large language models, explains the definition and use‑cases of reasoning models, details DeepSeek‑R1’s training pipeline, compares four key enhancement methods—including inference‑time scaling, pure RL, SFT + RL, and distillation—and offers budget‑friendly advice.

AI research · DeepSeek · Inference Scaling
0 likes · 27 min read
Tencent Cloud Developer
Feb 27, 2025 · Artificial Intelligence

DeepSeek LLM Series (V1‑V3, R1) Technical Overview and Analysis

The DeepSeek technical overview traces the evolution from the dense 67 B V1 model through the 236 B MoE‑based V2 and the 671 B V3 with FP8 training, to the RL‑only R1 series that learns reasoning without supervised fine‑tuning. It highlights innovations such as Grouped‑Query Attention, Multi‑Head Latent Attention, auxiliary‑loss‑free load balancing for MoE, Multi‑Token Prediction, and knowledge distillation, and reports state‑of‑the‑art benchmark results and open‑source reproduction projects.

AI research · DeepSeek · Mixture of Experts
0 likes · 37 min read
Architecture Digest
Feb 25, 2025 · Artificial Intelligence

DeepSeek Distillation Technology: Overview, Innovations, Architecture, Training, Performance, and Challenges

DeepSeek’s distillation technology combines data and model distillation to transfer knowledge from large teacher models to compact student models, detailing its definitions, principles, key innovations, architecture, training methods, performance gains, and challenges, especially in multimodal contexts.

AI research · DeepSeek · Large Language Models
0 likes · 16 min read
Architect
Feb 22, 2025 · Artificial Intelligence

How Open‑Source Projects Reproduced DeepSeek‑R1 and Pushed LLM Limits

This article reviews the most notable open‑source reproductions of DeepSeek‑R1—including Open R1, OpenThoughts, LIMO and DeepScaleR—detailing their data pipelines, training steps, reinforcement‑learning strategies, dataset constructions, and benchmark results that demonstrate how small, high‑quality data can rival massive‑scale models.

AI research · DeepSeek-R1 · Model Scaling
0 likes · 26 min read
NewBeeNLP
Feb 21, 2025 · Artificial Intelligence

Do Scaling Laws Still Hold? Analyzing Grok‑3, DeepSeek and LLM Training Trends

The article examines whether pre‑training scaling laws remain valid, compares Grok‑3’s architecture and training strategy with DeepSeek models, and explores how different scaling approaches (pre‑training, RL‑based, and test‑time) affect the cost‑effectiveness and intelligence of large language models.

AI research · Grok-3 · scaling laws
0 likes · 11 min read
Architect
Feb 20, 2025 · Artificial Intelligence

Why Long CoT and In‑Context RL Are the Next Frontier for LLMs

The article analyses recent breakthroughs such as OpenAI's o1, Long CoT, and test‑time search, arguing that enabling LLMs to perform self‑critique and reinforcement learning with long output sequences is essential for future AI performance, while warning against overly structured workflows.

AI research · In‑Context RL · LLM
0 likes · 12 min read
AIWalker
Feb 20, 2025 · Artificial Intelligence

Transfusion: A Single Model for Unified Image Generation and Understanding

Transfusion is a 7B‑parameter transformer that jointly trains language modeling and diffusion losses on mixed text‑image data, enabling seamless text generation, image generation, and image understanding within one model and outperforming prior multimodal approaches such as Chameleon across multiple benchmarks.

AI research · Image Generation · Language Modeling
0 likes · 20 min read
AIWalker
Feb 16, 2025 · Artificial Intelligence

VARGPT: A Unified Autoregressive Architecture for Multimodal Understanding and Generation

VARGPT is a novel multimodal large language model that unifies visual understanding and autoregressive image generation within a single architecture, extending LLaVA with next‑token and next‑scale prediction, trained through three staged data‑curated phases and achieving superior performance on numerous vision‑language benchmarks.

AI research · Image Generation · Multimodal
0 likes · 20 min read
Baobao Algorithm Notes
Feb 13, 2025 · Artificial Intelligence

How to Build and Improve Reasoning LLMs: Methods, Trade‑offs, and DeepSeek Insights

This article explains what reasoning language models are, when they are needed, and reviews four main techniques— inference‑time scaling, pure reinforcement learning, combined SFT + RL, and distillation—illustrated with DeepSeek‑R1’s development, cost analysis, and low‑budget alternatives.

AI research · DeepSeek · Inference Scaling
0 likes · 27 min read
ZhongAn Tech Team
Feb 10, 2025 · Artificial Intelligence

Weekly AI Technology Overview: OpenAI ChatGPT Search, Deep Research, DeepSeek Advances, and Industry Insights

This week’s AI roundup covers OpenAI opening ChatGPT Search to all users, the launch of Deep Research for automated multi‑step research, NetEase Youdao’s integration of DeepSeek‑R1, Figure ending its robotics partnership with OpenAI, the low‑cost s1 model, OpenAI’s Stargate data‑center plans, Google’s antitrust probe, DeepSeek’s traffic surge, and top AI scientist Xu joining Alibaba.

AI research · ChatGPT · DeepSeek
0 likes · 9 min read
Top Architect
Feb 9, 2025 · Artificial Intelligence

DeepSeek‑R1: Training Pipeline, Reinforcement‑Learning Techniques, and Experimental Results

The article reviews DeepSeek‑R1’s training methodology—including cold‑start data collection, multi‑stage RL fine‑tuning, SFT data generation, and model distillation—highlights its performance comparable to OpenAI‑o1‑1217, and discusses key contributions, reward design, successful experiments, and failed attempts.

AI research · DeepSeek · LLM
0 likes · 12 min read
Architect
Feb 6, 2025 · Artificial Intelligence

DeepSeek‑R1: Reinforcement‑Learning‑Driven Long‑Chain Reasoning for Large Language Models

The article reviews DeepSeek‑R1, detailing its reinforcement‑learning‑based training pipeline that uses minimal supervised data, cold‑start fine‑tuning, multi‑stage RL, rejection‑sampling SFT, and distillation to achieve reasoning performance comparable to OpenAI‑o1‑1217, while also discussing successful contributions and failed experiments.

AI research · DeepSeek-R1 · LLM reasoning
0 likes · 11 min read
AIWalker
Jan 17, 2025 · Artificial Intelligence

InternLM 3.0: Boosting Model Performance with Only 4T Tokens of Training Data

Shanghai AI Laboratory’s InternLM 3.0 upgrade demonstrates that refining data quality, measured as intelligence per token, can replace massive datasets, achieving higher reasoning and dialogue capabilities with just 4T (4 trillion) training tokens and cutting training cost by over 75% while approaching GPT‑4‑level performance.

AI research · InternLM · Model Evaluation
0 likes · 9 min read
DevOps
Jan 7, 2025 · Artificial Intelligence

Microsoft’s 2025 AI Predictions: Stronger Models, AI Agents, AI Companions, Efficient Resources, Testing & Customization, and Accelerated Scientific Research

Microsoft outlines six 2025 AI forecasts—including more powerful models, autonomous AI agents reshaping work, AI companions aiding daily life, greener resource use, rigorous testing and customization, and AI-driven scientific breakthroughs—highlighting how these advances will transform industries, research, and everyday experiences.

2025 predictions · AI models · AI research
0 likes · 8 min read
21CTO
Jan 2, 2025 · Artificial Intelligence

2025 AI Breakthroughs: Unlimited Memory & Intelligent Agents, Says Eric Schmidt

Former Google CEO Eric Schmidt warns that AI is on the brink of a transformative era, highlighting three 2025 breakthroughs—unlimited context memory, autonomous AI agents, and text‑to‑action programming—while also stressing the looming risks of energy consumption, security threats, and the need for ethical safeguards.

AI Safety · AI memory · AI research
0 likes · 14 min read
DaTaobao Tech
Dec 30, 2024 · Artificial Intelligence

AI Research Highlights: AAAI 2025 & NeurIPS 2024 Breakthroughs in Image Generation

This article compiles recent AI research breakthroughs presented at AAAI 2025 and NeurIPS 2024, summarizing eight papers on multi‑condition image generation, mixed auto‑regressive models, hallucination mitigation in vision‑language models, quantized diffusion denoising, facial part swapping, language‑guided concept vectors, attribution consistency, and video virtual try‑on, with links to each work.

AAAI 2025 · AI research · Diffusion Models
0 likes · 13 min read
Baobao Algorithm Notes
Dec 16, 2024 · Artificial Intelligence

What Do Leading Open‑Source LLMs Do After Pretraining? A Deep Dive into Post‑Training Strategies

This article surveys the post‑training pipelines of major open‑source large language models released this year, detailing their alignment algorithms, data synthesis, reward modeling, DPO/GRPO variants, long‑context handling, tool use, and model‑averaging techniques, and highlights emerging trends such as data‑centric pipelines and iterative weak‑to‑strong alignment.

AI research · Alignment · LLM
0 likes · 99 min read
DataFunTalk
Nov 30, 2024 · Artificial Intelligence

Interview with Rich Sutton on Continuous Learning, Reinforcement Learning, and the Future of AI

In this extensive interview, Rich Sutton critiques the focus on transient deep learning, advocates for continuous learning, discusses the reward hypothesis, outlines research challenges, offers advice to emerging scholars, and predicts breakthroughs in AI understanding by 2030‑2040.

AI research · Reinforcement Learning · continuous learning
0 likes · 27 min read
Alipay Experience Technology
Nov 27, 2024 · Artificial Intelligence

EchoMimicV2: High‑Quality Audio‑Driven Half‑Body Human Animation with Simple Inputs

EchoMimicV2 is an open‑source digital‑human framework that generates high‑quality half‑body animation videos from a single reference image, an audio clip, and a hand‑gesture sequence, addressing challenges of facial portrait limits, complex condition injection, and inference latency in audio‑driven animation.

AI research · Diffusion Models · Digital Human
0 likes · 18 min read
360 Tech Engineering
Nov 15, 2024 · Artificial Intelligence

Advances in Multimodal Large Models and Document Understanding Presented at the 2024 Global Machine Learning Conference (Beijing)

At the 2024 Global Machine Learning Conference in Beijing, 360 AI Research Institute showcased cutting‑edge multimodal large‑model research, fine‑grained open‑world object detection, and document understanding technologies, highlighting open‑source releases, real‑world deployments, and competitive achievements in AI competitions.

AI research · Multimodal AI · document understanding
0 likes · 7 min read
Tencent Cloud Developer
Nov 6, 2024 · Artificial Intelligence

Overview of Tencent Hunyuan Large and 3D Generation Model Open‑Source Release

Tencent has open‑sourced its 389‑billion‑parameter Hunyuan Large Mixture‑of‑Experts model—featuring 52 B active parameters, 256 K token context, novel routing, KV‑cache compression, and advanced training optimizations that beat leading open‑source models—and its first text‑to‑3D/image‑to‑3D Hunyuan 3D Generation model, both downloadable via GitHub, Hugging Face, and Tencent Cloud.

3D generation · AI research · Mixture of Experts
0 likes · 9 min read
Meituan Technology Team
Oct 31, 2024 · Artificial Intelligence

Selected Meituan Papers from CIKM 2024: Summaries of Eight Research Works

This article highlights eight Meituan research papers accepted at CIKM 2024—spanning self‑supervised sequential recommendation, rating‑consistent explanation generation, CTR prediction via recommendation pre‑training, cross‑domain interest transfer, multimodal vector retrieval, design‑aware poster layout, order‑fulfillment cycle‑time forecasting, and delivery‑scope substitution—offering insights from both internal and university collaborations.

AI research · CTR prediction · Cross‑Domain Recommendation
0 likes · 16 min read
Baobao Algorithm Notes
Oct 30, 2024 · Artificial Intelligence

How to Choose High-Quality Instruction Data for LLM Fine‑Tuning: Methods Compared

This article surveys and categorizes instruction data selection techniques for large language model fine‑tuning, explaining metric‑based, trainable‑LLM, powerful‑LLM, and small‑model approaches, detailing representative papers, their pipelines, and empirical findings on data quality and diversity.

AI research · Data Quality · Instruction Tuning
0 likes · 15 min read
AntTech
Oct 29, 2024 · Artificial Intelligence

Three Ant Group Papers Featured at EMNLP 2024: Dynamic Transformers, Plug‑and‑Play Visual Reasoner, and Efficient Fine‑Tuning of Large Language Models

This announcement introduces three Ant Group papers accepted at EMNLP 2024—Mixture‑of‑Modules for dynamic Transformer assembly, a plug‑and‑play visual reasoning framework built via data synthesis, and a layer‑wise importance‑aware efficient fine‑tuning method for large language models—highlighting their innovations and upcoming live presentations.

AI research · EMNLP 2024 · Large Language Models
0 likes · 6 min read
Baobao Algorithm Notes
Oct 24, 2024 · Artificial Intelligence

How NoteLLM-2 Boosts Multimodal Recommendations with In-Context Learning

NoteLLM-2 introduces multimodal In-Context Learning and Late Fusion to overcome visual‑modality bias in end‑to‑end fine‑tuned large representation models, delivering significant gains over baseline multimodal LLMs and traditional retrieval methods in recommendation tasks.

AI research · Recommendation Systems · contrastive learning
0 likes · 11 min read
Alibaba Cloud Big Data AI Platform
Oct 16, 2024 · Artificial Intelligence

How VICTORIA Revolutionizes Multi‑Object Image Editing with Language‑Aware Diffusion

The VICTORIA algorithm, presented by Alibaba Cloud AI Platform PAI and South China University of Technology at ACM MM 2024, leverages linguistic dependency parsing to guide cross‑attention in Stable Diffusion, enabling accurate, training‑free multi‑object image editing while preserving spatial structure and achieving state‑of‑the‑art results on benchmark datasets.

AI research · Diffusion Models · Stable Diffusion
0 likes · 10 min read
Alibaba Cloud Big Data AI Platform
Oct 15, 2024 · Artificial Intelligence

How VICTORIA Boosts Text‑Guided Image Editing with Language‑Aware Diffusion

The VICTORIA algorithm, presented by Alibaba Cloud's PAI team at ACM MM 2024, leverages linguistic dependency parsing and cross‑attention control to overcome multi‑object editing challenges in training‑free text‑guided image editing, delivering precise, structure‑preserving results across diverse scenes.

AI research · Diffusion Models · image manipulation
0 likes · 6 min read
Network Intelligence Research Center (NIRC)
Oct 8, 2024 · Artificial Intelligence

Two NIRC Papers Accepted at NeurIPS 2024: FM-Delta Compression and GLAFF Forecasting

The Network Intelligence Research Center (NIRC) at Beijing University of Posts and Telecommunications had two papers accepted to NeurIPS 2024: FM-Delta, a lossless compression technique that halves storage and cuts cloud costs by over 40%, and GLAFF, a global‑local fusion framework that markedly improves the robustness of time‑series forecasting across multiple domains.

AI research · FM-Delta · GLAFF
0 likes · 8 min read
Fighter's World
Sep 30, 2024 · Artificial Intelligence

Exploring Google NotebookLM: Use Cases, Interaction Experience, and Key Insights

The author reviews Google NotebookLM, describing how it aids deep paper reading, uses guided prompts to encourage users to keep chatting, maintains conversation coherence through self‑play insights, and offers an audio‑overview feature, and reflects on AI concepts such as the "bitter lesson" and the limits of self‑play in open scenarios.

AI research · Google · LLM
0 likes · 22 min read
Kuaishou Tech
Sep 27, 2024 · Artificial Intelligence

XPSR: Cross‑modal Priors for Diffusion‑based Image Super‑Resolution

The paper introduces XPSR, a diffusion‑based image super‑resolution method that incorporates cross‑modal semantic priors from a large multimodal language model, achieving state‑of‑the‑art performance on both reference and no‑reference quality metrics across synthetic and real‑world video restoration tasks.

AI research · ECCV2024 · cross‑modal priors
0 likes · 8 min read
DataFunSummit
Sep 13, 2024 · Artificial Intelligence

Research on Domain Large Models by Fudan University Knowledge Workshop Lab

This article presents the Fudan University Knowledge Workshop Lab's comprehensive research on domain large models, covering background, domain adaptation, capability enhancement, collaborative workflows, challenges such as inference cost and alignment, and proposed solutions including source‑enhanced training, self‑correction mechanisms, and hybrid retrieval‑augmented generation.

AI research · Knowledge Graphs · domain adaptation
0 likes · 16 min read
Baobao Algorithm Notes
Sep 5, 2024 · Artificial Intelligence

Why Small LLMs Are the Secret Weapon for Scaling Large Model Research

The article explains how homologous small language models—trained on the same tokenizer and data as their large counterparts—serve as cheap, fast experimental platforms that can predict large‑model performance, guide pre‑training decisions, and support techniques like distillation and reward modeling.

AI research · LLM scaling · Qwen2
0 likes · 13 min read
360 Tech Engineering
Aug 29, 2024 · Artificial Intelligence

FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance

FancyVideo is an open‑source UNet‑based video generation model that supports arbitrary resolutions, aspect ratios, styles, and motion dynamics by introducing a Cross‑frame Textual Guidance Module (CTGM) with temporal injectors, refiners, and boosters, achieving state‑of‑the‑art results on multiple benchmarks and enabling versatile applications such as video extension, backtracking, and frame interpolation.

AI research · UNet · Video Generation
0 likes · 6 min read
AntTech
Aug 28, 2024 · Artificial Intelligence

Ant Group’s Selected Papers at KDD2024: Abstracts and Highlights

The article presents a curated collection of Ant Group's research papers accepted at KDD2024, summarizing each paper's title, type, link, source, relevant fields, and abstract, covering topics such as graph mining, large language models, fraud detection, recommendation systems, and multimodal medical AI.

AI research · Ant Group · KDD2024
0 likes · 31 min read
Alibaba Cloud Big Data AI Platform
Aug 20, 2024 · Artificial Intelligence

How DAFNet Enables Efficient Sequential Editing of Large Language Models

This article introduces DAFNet, a dynamic auxiliary fusion framework that enables efficient sequential editing of large language models by injecting knowledge with reduced resource costs while preserving model reliability, generalization, and mitigating hallucination, and details its dataset, architecture, and evaluation results.

AI research · dynamic auxiliary fusion · model editing
0 likes · 10 min read
Alibaba Cloud Big Data AI Platform
Aug 19, 2024 · Artificial Intelligence

How Long‑Tail Knowledge Boosts Retrieval‑Augmented Large Language Models

The paper introduces a method that classifies user queries into ordinary and long‑tail types, applying retrieval‑augmented generation only to long‑tail queries, which improves large language model efficiency and accuracy by leveraging specialized knowledge detection metrics and an extended RAG pipeline.

AI research · ECE metric · Retrieval Augmented Generation
0 likes · 9 min read
Alibaba Cloud Big Data AI Platform
Aug 11, 2024 · Artificial Intelligence

Alibaba Cloud PAI’s Breakthroughs in Chinese Diffusion, Prompting, and LLM Knowledge Editing

Recent ACL 2024 papers from Alibaba Cloud’s PAI platform showcase open‑source Chinese diffusion models, an interactive multi‑turn prompt generator, a long‑tail knowledge‑aware retrieval‑augmented LLM approach, and a dynamic fusion network for sequential model editing, all integrated into cloud services.

AI research · Diffusion Models · Retrieval Augmented Generation
0 likes · 11 min read
21CTO
Jul 10, 2024 · Information Security

Did a Hacker Breach OpenAI’s Internal AI Discussions? Implications for Security

A New York Times report reveals that a hacker accessed OpenAI's internal messaging system, exposing employee discussions on AI advancements and sparking concerns about foreign espionage, internal security practices, and the broader national‑security implications of AI technology.

AI research · AI security · OpenAI
0 likes · 4 min read
DataFunSummit
Jul 9, 2024 · Artificial Intelligence

Applying Large Language Models to Recommendation Systems at Ant Group

This article details Ant Group's research on integrating large language models into recommendation pipelines, covering background challenges, knowledge extraction, teacher‑student distillation, experimental results, and practical Q&A for improving bias, efficiency, and cold‑start performance.

AI research · Ant Group · Large Language Models
0 likes · 14 min read
Baobao Algorithm Notes
Jul 9, 2024 · Artificial Intelligence

Why Step-Level DPO Is Revolutionizing LLM Math Reasoning

This article reviews recent step‑level DPO research, compares it with instance‑level DPO, explains the underlying Monte Carlo Tree Search formulation, and presents the author’s own replication experiments that demonstrate consistent performance gains across multiple LLM sizes on GSM8K and MATH benchmarks.

AI research · LLM alignment · MCTS
0 likes · 10 min read
Meituan Technology Team
Jun 27, 2024 · Artificial Intelligence

Meituan Technical Team's Three Papers Accepted at SIGIR 2024: Ad Auction Integration, Federated Recommendation, and POI Recommendation

The article highlights three Meituan research papers accepted at SIGIR 2024—covering deep automated mechanism design for ad auction, a retrieval‑enhanced vertical federated recommendation framework, and disentangled contrastive hypergraph learning for next POI recommendation—and announces an online sharing event where the authors will present their work.

AI research · Ad Auction · Federated Recommendation
0 likes · 9 min read
Alibaba Cloud Developer
Jun 27, 2024 · Artificial Intelligence

How to Supercharge Retrieval‑Augmented Generation: Papers, Techniques, and Real‑World Tips

This article surveys the main challenges of deploying large language models, introduces key RAG optimization papers such as RAPTOR, Self‑RAG, and CRAG, and compiles practical engineering tricks—including chunking, query rewriting, hybrid and progressive retrieval—to help practitioners build more accurate and efficient RAG systems.

AI research · LLM optimization · RAG
0 likes · 22 min read
Xiaohongshu Tech REDtech
Jun 20, 2024 · Artificial Intelligence

Xiaohongshu 2024 Large Model Frontier Paper Sharing Live Event

On June 27, 2024, Xiaohongshu’s technical team will livestream a two‑hour session across WeChat Channels, Bilibili, Douyin and Xiaohongshu, showcasing six top‑conference papers on large‑model advances—including early‑stopping and fine‑grained self‑consistency, novel evaluation methods, negative‑sample‑assisted distillation, and LLM‑based note recommendation—followed by a Q&A and recruitment briefing.

AI research · Large Language Models · Model Evaluation
0 likes · 12 min read
Alibaba Cloud Big Data AI Platform
Jun 18, 2024 · Artificial Intelligence

Free-Prompt-Editing: Efficient Text-Guided Image Editing with Stable Diffusion

The paper introduces Free-Prompt-Editing (FPE), a novel, efficient algorithm for text‑guided image editing that leverages probe analysis of cross‑ and self‑attention maps in Stable Diffusion, demonstrates its superiority over existing methods through extensive experiments, and provides open‑source implementation for both synthetic and real‑image editing.

AI research · Stable Diffusion · attention maps
0 likes · 12 min read
Alibaba Cloud Big Data AI Platform
Jun 17, 2024 · Artificial Intelligence

How Free-Prompt-Editing Revolutionizes Text-Guided Image Editing with Stable Diffusion

The paper introduces Free-Prompt-Editing, a concise and efficient algorithm that replaces self‑attention maps during denoising to achieve high‑quality text‑guided image edits without source prompts, and demonstrates its superiority over existing methods on both synthetic and real images.

AI research · Free-Prompt-Editing · attention mechanisms
0 likes · 6 min read
DataFunTalk
Jun 15, 2024 · Artificial Intelligence

Research on Domain Large Models by Fudan University Knowledge Factory Lab

This article presents Fudan University's Knowledge Factory Lab research on domain large models, covering background, challenges, data selection, source‑enhanced tagging, capability improvements, self‑correction, collaborative workflows, and retrieval‑augmented generation for practical AI deployment.

AI research · Large Language Models · domain adaptation
0 likes · 16 min read
DataFunSummit
Jun 6, 2024 · Artificial Intelligence

MetaGPT: Multi‑Agent Collaboration and Agent Capability Enhancement

This article introduces MetaGPT, an open‑source multi‑agent framework that leverages large language models to automate software development, data science, and simulation tasks, detailing its development, impact, experimental results, memory and reasoning enhancements, and comparisons with related systems.

AI research · Agent Memory · LLM agents
0 likes · 21 min read
NewBeeNLP
May 28, 2024 · Artificial Intelligence

How Generative Models Are Redefining Recommendation Systems

This article reviews recent advances in generative recommendation, highlighting challenges such as item representation and multimodal fusion, and summarizing four key research papers that propose novel tokenization, collaborative integration, and transformer-based multimodal approaches to improve recommendation performance.

AI research · Generative Recommendation · LLM
0 likes · 8 min read
360 Tech Engineering
May 17, 2024 · Artificial Intelligence

360VL: An Open‑Source Multimodal Large Language Model Based on Llama‑3‑70B

The article introduces 360VL, an open‑source multimodal large language model built on Llama‑3‑70B, describes its novel C‑abs bridge architecture for high‑resolution visual understanding, outlines the two‑stage training with bilingual data, and presents benchmark results showing superior performance over prior LMMs.

AI research · Llama3 · Multimodal
0 likes · 8 min read
NewBeeNLP
May 15, 2024 · Artificial Intelligence

How Large Language Models and Knowledge Graphs Can Boost Each Other

This talk reviews recent advances in large language models, compares them with knowledge graphs, explores how LLMs enhance knowledge extraction and completion, examines how knowledge graphs aid LLM evaluation and safe deployment, and outlines future interactive integration between the two technologies.

AI research · Knowledge Graphs · Large Language Models
0 likes · 13 min read
Rare Earth Juejin Tech Community
May 15, 2024 · Artificial Intelligence

OpenAI Unveils GPT‑4o: An Omni‑Capable Multimodal Model Offered Free to All Users

OpenAI introduced GPT‑4o, a free, omni‑capable multimodal model that processes text, audio, and images together, delivers near‑human response latency, showcases impressive live demos, and will soon be available via a discounted API, marking a significant step forward in end‑to‑end AI research.

AI research · GPT-4o · Multimodal AI
0 likes · 7 min read
21CTO
Apr 8, 2024 · Artificial Intelligence

How Naver’s HyperCLOVA X Advances Multilingual AI for Asian Languages

Naver’s newly unveiled HyperCLOVA X large‑language model, detailed in an arXiv technical report, claims superior cross‑lingual reasoning for Asian languages, especially Korean, by pre‑training on a data mix of Korean, multilingual text and code, achieving state‑of‑the‑art translation and multilingual capabilities.

AI research · HyperCLOVA X · Korean NLP
0 likes · 4 min read