Tagged articles
374 articles
Page 2 of 4
Fighter's World
Nov 28, 2025 · Artificial Intelligence

Is Gemini 3 Pro Google’s New Starting Point? An In‑Depth Technical and Market Analysis

The article examines Google’s Gemini 3 Pro launch, highlighting its full‑stack vertical integration, advanced System 2 reasoning, dynamic compute budgeting, native multimodal architecture, TPU cost advantages, the Antigravity IDE platform, generative UI capabilities, and the strategic implications for Google’s AI ecosystem and competitive positioning.

AI Infrastructure · Antigravity · Gemini 3 Pro
0 likes · 32 min read
Kuaishou Tech
Nov 28, 2025 · Artificial Intelligence

Keye-VL-671B-A37B Leads Vision, Video, and Math Benchmarks

Kwai has open‑sourced its new flagship multimodal model Keye‑VL‑671B‑A37B, which upgrades visual perception, cross‑modal alignment and complex reasoning, achieving top scores on image, video, and mathematical reasoning benchmarks while detailing its architecture, three‑stage pre‑training, post‑training strategies, and future multimodal agent plans.

Deep Learning · large language model · multimodal
0 likes · 10 min read
AI Large Model Application Practice
Nov 24, 2025 · Artificial Intelligence

How to Turn Text into an AI‑Powered PPT Video: A Step‑by‑Step Guide

This article breaks down the end‑to‑end engineering pipeline that converts a knowledge source such as a URL or PDF into a narrated PPT‑style video, detailing six core stages—from knowledge extraction and script generation to image creation, voice synthesis, and final video stitching—while highlighting practical model choices, prompt design, and stability tricks.

LLM · PPT · TTS
0 likes · 16 min read
Amap Tech
Nov 19, 2025 · Artificial Intelligence

How Gaode’s Spacetime‑GR Model Boosts POI Recommendation with AI‑Powered SFT and DPO

Gaode transforms its map app into a dynamic, AI‑driven “living map” by fine‑tuning the large Spacetime‑GR model through embedding‑based and generative ranking SFT, DPO alignment, and multimodal augmentation, achieving significant offline CTR‑AUC improvements and online CTR gains in POI recommendation.

AI recommendation · DPO · SFT
0 likes · 12 min read
Data Party THU
Nov 5, 2025 · Artificial Intelligence

How VLM‑FO1 Turns Vision‑Language Models into Precise Perception Machines

VLM‑FO1 introduces a generate‑plus‑reference paradigm that replaces coordinate generation with region token referencing, adding plug‑in modules such as a proposal generator, a hybrid fine‑grained encoder, and a region‑language connector to give any pretrained visual language model accurate, fine‑grained perception while preserving its original capabilities.

AI research · Plug-and-Play · VLM
0 likes · 15 min read
AsiaInfo Technology: New Tech Exploration
Nov 4, 2025 · Artificial Intelligence

How Multimodal Large Models Are Revolutionizing Video Analysis

This article examines the evolution from single‑frame video analysis to multimodal large models, detailing their architecture, optimization techniques, experimental validation on edge devices, and practical scenarios, while highlighting current limitations and future directions for AI‑driven video understanding.

AI · Computer Vision · Edge Computing
0 likes · 20 min read
Meituan Technology Team
Nov 3, 2025 · Artificial Intelligence

LongCat-Flash-Omni: 560B Open‑Source Multimodal Model with Real‑Time Interaction

LongCat-Flash-Omni, the latest open‑source model from Meituan, combines a 560 billion‑parameter architecture, efficient multimodal perception and speech reconstruction modules, and a progressive training strategy to deliver real‑time audio‑video interaction and state‑of‑the‑art performance across text, image, audio, and video tasks.

AI · Benchmark · large language model
0 likes · 9 min read
AI Info Trend
Nov 3, 2025 · Industry Insights

2025 Q3 AI Landscape: Key Players, Model Trends, and Hardware Shifts

Artificial Analysis’s Q3 2025 AI report reveals a rapidly accelerating industry across the entire stack, with US and Chinese labs neck‑and‑neck, fierce competition among OpenAI, Google, Anthropic, xAI, DeepSeek and Alibaba, cost‑efficient models, booming multimodal agents, and a hardware race led by NVIDIA’s Blackwell accelerators.

2025 · AI · Benchmark
0 likes · 12 min read
Huawei Cloud Developer Alliance
Nov 3, 2025 · Artificial Intelligence

How AI Agents Are Revolutionizing Technology: The New Engine of Innovation

This article explores the rise of AI agents—from their definition as intelligent digital assistants powered by large language models to their evolution through planning, memory, and tool use—highlighting real‑world applications, core technical mechanisms, code implementations, and future trends such as autonomy, multimodal fusion, standardization, and safety considerations.

AI Agent · Autonomous AI · Tool integration
0 likes · 24 min read
DataFunSummit
Oct 30, 2025 · Artificial Intelligence

How Multimodal Large Models Are Revolutionizing Document Processing and OCR

This article explores how the explosion of unstructured data exposes the limits of traditional OCR and shows how emerging multimodal large language models provide end‑to‑end document understanding, reduce pipeline complexity, cut training costs, enable hybrid retrieval‑augmented generation, and drive real‑world industry deployments.

AI · Document Processing · OCR
0 likes · 28 min read
BirdNest Tech Talk
Oct 30, 2025 · Artificial Intelligence

How to Build Multimodal Prompts with LangChain: A Step‑by‑Step Guide

Learn how LangChain enables multimodal interactions by preparing inputs, constructing prompts, invoking models like GPT‑4o, and processing responses, with a complete example that demonstrates image‑question answering, code walkthrough, environment setup, and key considerations for API keys and image URLs.

LLM · LangChain · OpenAI
0 likes · 9 min read
AntTech
Oct 28, 2025 · Artificial Intelligence

Ming-Flash-Omni-Preview: 103B Open-Source Multimodal Model Excelling in Image, Video, and Speech

Introducing Ming‑Flash‑Omni‑Preview, a 103‑billion‑parameter open‑source multimodal model built on a sparse MoE architecture that delivers state‑of‑the‑art performance in controllable image generation, streaming video understanding, and context‑aware speech recognition, surpassing prior models on GenEval and GEdit benchmarks.

image generation · large language model · multimodal
0 likes · 8 min read
HyperAI Super Neural
Oct 24, 2025 · Artificial Intelligence

Google Teams Unite on Earth AI: Boosting Geospatial Reasoning by 64% with Three Core Data Types

Google Research, X, and Cloud teams introduced Earth AI, an interoperable GeoAI model family that fuses image, population, and environmental data via a Gemini‑driven reasoning Agent, achieving state‑of‑the‑art performance and a 64% reasoning boost over Gemini 2.5 Pro while enabling non‑experts to run real‑time cross‑domain analyses.

Agent · Benchmark · Earth AI
0 likes · 16 min read
Network Intelligence Research Center (NIRC)
Oct 17, 2025 · Artificial Intelligence

LucaOne: Unified Nucleic Acid & Protein Language Model Surpasses Other Models

Researchers present LucaOne, a Transformer‑based foundation model that unifies DNA/RNA and protein sequences using a 39‑token vocabulary, rotary positional encoding, and molecule‑type embeddings, and demonstrate through extensive multi‑task benchmarks that it outperforms domain‑specific models across seven biological tasks.

DNA · Transformer · bioinformatics
0 likes · 5 min read
Wuming AI
Oct 16, 2025 · Industry Insights

Top AI Model Releases This Week: NanoChat, Ring‑1T, Qwen3‑VL, Veo 3.1, Claude Haiku 4.5

This week’s AI landscape saw Karpathy’s NanoChat open‑sourcing an 8K‑line ChatGPT replica, Ant Group unveiling a trillion‑parameter Ring‑1T model, Alibaba releasing the 4B/8B Qwen3‑VL visual language models that outperform Gemini 2.5 Flash Lite and GPT‑5 Nano, Google launching Veo 3.1 for high‑fidelity video generation, and Anthropic announcing Claude Haiku 4.5, a faster and cheaper LLM that excels on SWE‑bench benchmarks.

AI models · Video Generation · large language models
0 likes · 7 min read
Alibaba Cloud Big Data AI Platform
Oct 15, 2025 · Big Data

How MaxCompute’s AI‑Native Data Warehouse Redefines Big Data for the Generative AI Era

The article details Alibaba Cloud's MaxCompute transformation into an AI‑native data warehouse, highlighting its serverless elasticity, multimodal data management, unified model lifecycle, AI Function integration, and new distributed Python engine that together address the bursty, high‑complexity data and compute challenges of the generative AI era.

AI-native · distributed Python · multimodal
0 likes · 11 min read
Bighead's Algorithm Notes
Oct 12, 2025 · Artificial Intelligence

Trading-R1: Open-Source LLM Framework for Explainable Financial Trading

This article reviews Trading‑R1, an open‑source LLM reasoning framework that integrates multimodal financial data, three‑stage supervised fine‑tuning, and reinforcement learning to generate structured investment arguments and risk‑adjusted trade decisions, achieving superior Sharpe ratio and drawdown performance on real‑world stock and ETF tests.

Dataset · Financial Trading · LLM
0 likes · 11 min read
DataFunSummit
Oct 12, 2025 · Artificial Intelligence

How Kuaishou Uses Large Models to Supercharge Ad Targeting with COPE and LEARN

This article reviews Kuaishou's two‑year exploration of multimodal large‑model techniques for advertising, outlining challenges in content‑domain ad estimation, the COPE unified product representation framework, and the LEARN LLM knowledge‑transfer approach that together improve ad system performance.

Advertising · Kuaishou · LLM
0 likes · 6 min read
DataFunSummit
Oct 10, 2025 · Artificial Intelligence

How Kuaishou Boosted Ad Performance with Multimodal Large Models

This article reviews Kuaishou's two‑year exploration of large‑model techniques in advertising, outlining challenges in content‑domain ad estimation, introducing the COPE unified content representation framework and the LEARN LLM knowledge‑transfer approach, and showing how these innovations delivered tangible business gains.

AI · Advertising · Knowledge Transfer
0 likes · 5 min read
DataFunSummit
Oct 9, 2025 · Artificial Intelligence

How Kuaishou Boosted Ad Performance with Multimodal Large Models: COPE & LEARN

This article reviews Kuaishou's two‑year exploration of multimodal large‑model techniques for advertising, detailing challenges of fragmented user behavior, the COPE unified product representation framework, and the LEARN LLM knowledge‑transfer approach that together delivered measurable business gains.

AI · Advertising · Knowledge Transfer
0 likes · 6 min read
Data Party THU
Oct 9, 2025 · Artificial Intelligence

Can One Model Master All Audio‑Visual Tasks? Introducing Crab’s Unified Approach

This article presents Crab, a unified audio‑visual scene understanding model that leverages a novel explicit‑cooperation learning paradigm, introduces the AV‑UIE dataset with explicit reasoning steps, and demonstrates superior performance across temporal, spatial, pixel‑level, and spatio‑temporal tasks through extensive experiments and ablations.

Benchmark · Dataset · LoRA
0 likes · 12 min read
DataFunSummit
Oct 8, 2025 · Artificial Intelligence

How Kuaishou Boosted Ad Performance with Multimodal LLMs and the COPE Framework

This article reviews Kuaishou’s two‑year exploration of large‑model techniques in advertising, detailing the content‑domain estimation challenges, how multimodal and LLM approaches improve full‑domain behavior utilization and external knowledge integration, and introducing the COPE product‑content representation framework and the LEARN LLM knowledge‑transfer system.

Advertising · Kuaishou · LLM
0 likes · 7 min read
Data Party THU
Oct 6, 2025 · Artificial Intelligence

How OneCAT Redefines Multimodal AI with a Decoder‑Only Architecture

OneCAT introduces a unified decoder‑only transformer that eliminates separate visual encoders, employs a modality‑specific MoE, integrates multi‑scale visual generation, and achieves state‑of‑the‑art performance and efficiency across multimodal understanding, text‑to‑image synthesis, and image editing tasks.

AI model · OneCAT · decoder-only
0 likes · 14 min read
DataFunSummit
Sep 30, 2025 · Artificial Intelligence

How Kuaishou Uses Large Models to Boost Ad Performance with COPE and LEARN

This article outlines Kuaishou's two‑year exploration of large‑model techniques in advertising, detailing challenges of sparse cross‑domain data, the COPE unified product representation framework, and the LEARN LLM knowledge‑transfer approach that together improve ad system effectiveness.

COPE · LLM · Recommendation Systems
0 likes · 6 min read
DataFunSummit
Sep 30, 2025 · Artificial Intelligence

How Kuaishou Boosted Ad Performance with Multimodal LLMs: COPE & LEARN Frameworks

Over the past two years, Kuaishou has leveraged multimodal large‑model techniques to overcome sparse advertising data, integrating full‑domain user behavior and external knowledge via the COPE unified product representation framework and the LEARN LLM knowledge‑transfer system, achieving measurable business gains.

Kuaishou · LLM · Recommendation Systems
0 likes · 6 min read
Tech Freedom Circle
Sep 27, 2025 · Artificial Intelligence

What Is an AI‑Native Application and How to Design One?

The article explains the concept of AI‑native applications, distinguishes them from AI‑plugin extensions, outlines core principles such as model‑first design, the data flywheel, event‑driven agents, multimodal semantics, and continuous learning, and provides a seven‑step practical guide with code examples for building an AI‑native app.

AI Assistant · AI-native · Data Flywheel
0 likes · 23 min read
Bighead's Algorithm Notes
Sep 26, 2025 · Artificial Intelligence

Paper Summaries: Recent AI-Driven Finance Research (Sep 20‑26, 2025)

This article presents concise English summaries of four recent arXiv papers that explore AI-driven trading frameworks, dual‑view risk‑relation identification from 10‑K filings, multimodal language models for financial forecasting, and credit‑spread prediction enhanced by non‑financial data, highlighting their methods, datasets, and performance results.

AI · Credit Spreads · Risk Modeling
0 likes · 9 min read
AIWalker
Sep 23, 2025 · Artificial Intelligence

Manzano: A Small 3B Multimodal Model That Unifies Image Understanding and Generation with SOTA Performance

Manzano introduces a hybrid vision tokenizer and a three‑stage training recipe that let a 3‑billion‑parameter multimodal LLM achieve state‑of‑the‑art results on both image‑understanding benchmarks and text‑to‑image generation, while scaling smoothly to larger sizes and minimizing task conflict.

AI research · Manzano · hybrid tokenizer
0 likes · 25 min read
HyperAI Super Neural
Sep 12, 2025 · Industry Insights

Why Apple and ASML Back Mistral AI: Inside Its Tech, Funding and Controversies

The article examines Mistral AI's rapid rise, from its Paris founding and record‑breaking seed round to ASML's €1.3 billion Series C stake and Apple acquisition rumors, detailing its lightweight and multimodal models, open‑source strategy, product ecosystem, and the plagiarism and geopolitical debates that shape its valuation.

AI models · ASML · Apple
0 likes · 15 min read
Architect's Journey
Sep 12, 2025 · Artificial Intelligence

Coze vs Yuanqi: In‑Depth Comparison of Two AI Agent Platforms – Who Will Own the Future?

This article provides a detailed side‑by‑side analysis of ByteDance's Coze and Tencent's Yuanqi, examining their features, performance, ecosystem integration, free‑tier limits, target users, and future prospects to help developers and enterprises choose the platform that best fits their needs.

AI agents · Coze · Ecosystem Integration
0 likes · 13 min read
DataFunTalk
Sep 11, 2025 · Artificial Intelligence

How AI Dressing and Multimodal Models Transform Home Service Experiences

During a pre-conference interview, AI expert Wang Mingzhong details how multimodal AI dressing, video résumé creation, short‑video templates, and interactive digital‑human live streams are technically realized for 58 Home Services, highlighting model training, workflow optimization, and future fusion of template‑based and agent‑driven video generation.

AI · Digital Human · Domestic Service
0 likes · 11 min read
DataFunTalk
Sep 7, 2025 · Artificial Intelligence

Why Apple’s FastVLM Is 85× Faster and What It Means for On‑Device AI

Apple recently open‑sourced its FastVLM and MobileCLIP2 models, showcasing a multimodal vision‑language system that runs up to 85 times faster than comparable models, enabling real‑time AI on iPhones and other edge devices while illustrating Apple’s broader “Plan B”: an on‑device, small‑model AI strategy.

Apple · FastVLM · Vision-Language Model
0 likes · 15 min read
AI2ML AI to Machine Learning
Sep 2, 2025 · Artificial Intelligence

Why Enterprise Large‑Model Digitalization Is So Hard: Key Challenges and Capabilities

The article analyzes why enterprise‑wide large‑model AI projects face steep hurdles, outlining required human capabilities, historical labor shifts, current hot technologies such as RAG, Agent, CoT and multimodal, their limits, a three‑stage implementation roadmap, typical case pitfalls, and the key success factors for sustainable digital transformation.

Agent · CoT · Digital Transformation
0 likes · 15 min read
IT Services Circle
Sep 1, 2025 · Artificial Intelligence

Unlocking Gemini CLI: Extending Google’s AI Agent for Any LLM

This article introduces the rapidly popular Gemini CLI, compares it with Claude Code, explains its core features, demonstrates coding, multimodal, and MCP use cases, and details the author’s Easy LLM CLI fork that enables custom model integration, flexible configuration, and direct code embedding for developers.

AI Agent · Gemini CLI · LLM integration
0 likes · 15 min read
Data Party THU
Aug 31, 2025 · Artificial Intelligence

How Google’s Gemini 2.5 “Nano Banana” Redefines Image Generation and Editing

Google’s Gemini 2.5 Flash model, codenamed “Nano Banana”, dramatically improves visual quality, natural editing, identity consistency, instruction following, and generation speed, while researchers discuss its new metrics, interleaved generation capabilities, comparisons with Imagen, and future directions for smarter, more factual multimodal AI.

AI model · Gemini · image generation
0 likes · 23 min read
DataFunTalk
Aug 26, 2025 · Artificial Intelligence

Exploring Cutting-Edge AI & Knowledge Graph Applications: A Curated Resource Guide

This resource guide presents a curated list of cutting‑edge topics—including multimodal GraphRAG, knowledge‑graph‑driven large‑model applications in finance, traditional Chinese medicine, automotive manufacturing, and knowledge‑management trends—offering insights into AI‑powered knowledge services, and invites readers to scan the QR code to download the full e‑book.

AI · Data Integration · Knowledge Graph
0 likes · 2 min read
Qborfy AI
Aug 25, 2025 · Artificial Intelligence

Unlocking AI Understanding: A Deep Dive into Embeddings and Their Real‑World Applications

This article explains how embeddings transform discrete items such as text, images, or user actions into continuous vectors, walks through the step‑by‑step workflow—from tokenization to normalization—highlights core properties, compares popular models, and showcases practical use cases in e‑commerce intent filtering and medical image retrieval, all backed by concrete examples and code.

AI fundamentals · embeddings · model comparison
0 likes · 7 min read
Kuaishou Tech
Aug 23, 2025 · Artificial Intelligence

How Thyme Enables Models to Think Beyond Images with Code‑Driven Multimodal Reasoning

The Kwai Keye team presents Thyme, a novel multimodal reasoning framework that lets large language models generate and safely execute Python code for image manipulation and complex calculations, achieving significant performance gains over existing vision‑language models across perception, reasoning, and hallucination‑reduction benchmarks.

AI research · Code Generation · large language model
0 likes · 12 min read
Instant Consumer Technology Team
Aug 21, 2025 · Artificial Intelligence

How Data‑Juicer Supercharges LLM Training with High‑Quality Multimodal Data

Data‑Juicer is an open‑source, one‑stop multimodal data processing system that provides fine‑grained operators, scalable pipelines, and ready‑made recipes to deliver high‑quality, diverse, and model‑friendly data for large language model pre‑training, fine‑tuning, and multimodal applications.

AI · LLM · data preprocessing
0 likes · 22 min read
AI Info Trend
Aug 19, 2025 · Industry Insights

What’s Driving the AI Revolution in 2025? Key Trends and Insights

The 2025 H1 AI Core Achievements and Trends report reveals how agents are reshaping productivity, models are gaining reasoning power and becoming smaller, reinforcement learning is overtaking pre‑training, and industry competition is intensifying, with China and the US narrowing their technology gap.

AI · China · Model Trends
0 likes · 10 min read
AI Info Trend
Aug 13, 2025 · Industry Insights

How China’s AI Labs Are Closing the Gap with the US in Q2 2025

The Q2 2025 State of AI report analyzes Chinese AI labs’ rapid progress across language models, open‑source weights, and multimodal generation, showing a shrinking performance gap with US leaders, detailed benchmark scores, ecosystem classifications, and emerging competitive dynamics.

AI · Benchmark · China
0 likes · 10 min read
Data Party THU
Aug 11, 2025 · Artificial Intelligence

Can Hidden Signals Reveal Multimodal Model Jailbreaks? Introducing HiddenDetect

This article presents HiddenDetect, a training‑free method that leverages refusal‑semantic vectors and layer‑wise activation analysis to detect jailbreak attempts in multimodal large language models, revealing distinct safety signals across text and image modalities and demonstrating strong performance on several LVLM benchmarks.

LVLM · activation analysis · jailbreak detection
0 likes · 7 min read
Volcano Engine Developer Services
Aug 6, 2025 · Artificial Intelligence

How VeOmni Revolutionizes Multimodal Model Training with 40% Speed Gains

VeOmni, ByteDance’s open‑source unified multimodal training framework, tackles fragmented training pipelines by integrating LoRA fine‑tuning, FSDP, Ulysses, and Expert Parallel, delivering up to 40% higher throughput, up to 55% memory savings, and streamlined one‑click deployment for LLM, VLM, and video models.

AI · Framework · Parallelism
0 likes · 14 min read
AI Info Trend
Aug 4, 2025 · Industry Insights

How AI Agents and Small Models Are Redefining Productivity in 2025 H1

The report analyzes first‑half‑2025 AI breakthroughs, covering the rise of general‑purpose agents, rapid inference improvements, small‑model proliferation, reinforcement‑learning compute dominance, evolving transformer architectures, and shifting industry dynamics, offering actionable insights for researchers, product leaders, and decision‑makers.

AI · Agent · Trend
0 likes · 9 min read
AIWalker
Aug 4, 2025 · Artificial Intelligence

Can Lumina-mGPT 2.0 Replace Diffusion Models? A Deep Dive into Its Autoregressive Power

Lumina-mGPT 2.0 is a decoder‑only autoregressive image model trained from scratch that rivals diffusion systems like DALL·E 3 in quality while offering unified multimodal tokenization, flexible multi‑task generation, and several inference‑speed tricks, yet it still faces licensing, scaling and sampling‑time challenges.

AI model analysis · Inference Optimization · Lumina-mGPT
0 likes · 22 min read
DataFunTalk
Jul 21, 2025 · Artificial Intelligence

Top AI & Knowledge Graph Resources: A Curated Guide to Emerging Research

This article presents a curated list of cutting‑edge resources covering multimodal GraphRAG, knowledge‑graph‑driven large‑model applications in finance, healthcare, automotive, and more, offering insights into the evolving synergy between AI and knowledge graphs.

AI · Knowledge Graph · Large Model
0 likes · 2 min read
DataFunSummit
Jul 14, 2025 · Artificial Intelligence

How AI Agents Transform E‑commerce Content from Production to Optimization

This presentation explores the evolution of AI agents in e‑commerce content creation, detailing the transition from text‑only industrial production (1.0) to multimodal image and video generation (2.0) and finally to quality‑driven optimization and decision‑making (3.0), highlighting technical architectures, challenges, and future directions.

AI · Automation · Content Generation
0 likes · 27 min read
Fun with Large Models
Jul 10, 2025 · Artificial Intelligence

Grok 4: The ‘Problem‑Solving Champion’ That Falters in Real‑World Use – Detailed Evaluation

The article reviews Grok 4’s flashy launch and claimed first‑principles advantage, then presents benchmark results—showing strong reasoning, multimodal and agent scores but disappointing coding performance versus DeepSeek‑R1—concluding that the model’s real‑world capabilities fall short of its hype.

Agent · Grok4 · LLM
0 likes · 11 min read
DataFunTalk
Jul 10, 2025 · Artificial Intelligence

Inside Elon Musk’s Grok‑4 Launch: Breakthrough AI Capabilities and Pricing

Elon Musk unveiled Grok‑4, a subscription‑based AI reasoning model that claims near‑human performance on elite exams, showcases unprecedented benchmark scores, multimodal understanding, voice synthesis, and a roadmap of upcoming coding and video generation models, while introducing $30/month and $300/month tiers.

AI model · Benchmark · Grok 4
0 likes · 6 min read
Kuaishou Tech
Jul 7, 2025 · Artificial Intelligence

8 Kuaishou Papers Spotlighted at ICML 2025: Multimodal AI, Causal Inference and More

Kuaishou has had eight cutting‑edge papers accepted at the International Conference on Machine Learning 2025, covering breakthroughs in multimodal emotion modeling, monotonic probability learning, causal effect generalization, cascade ranking, multimodal LLM alignment, ultra‑low‑rate image compression, and visual autoregressive super‑resolution, with links to each work and accompanying code repositories.

AI · causal inference · machine learning
0 likes · 13 min read
DataFunSummit
Jul 6, 2025 · Artificial Intelligence

AI-Driven Knowledge Graphs: Key Insights from Multimodal GraphRAG Research

This article presents a comprehensive overview of cutting‑edge research on integrating large language models with knowledge graphs, covering multimodal GraphRAG, financial AI solutions, traditional Chinese medicine decision support, and industry‑specific knowledge services, guiding readers through emerging paradigms and practical implementations.

AI · Enterprise AI · Knowledge Graph
0 likes · 2 min read
AntTech
Jul 3, 2025 · Artificial Intelligence

How Ant Group’s AI Multimodal Evaluation Transforms Image, Speech, and Video Quality Testing

In a QECon 2025 talk, Ant Group’s AI team detailed a comprehensive multimodal evaluation framework that leverages large‑model metrics, custom pipelines, and benchmark datasets to assess image generation, speech recognition, and video quality, while also contributing to industry standards and academic research.

AI Evaluation · image assessment · large models
0 likes · 16 min read
DataFunTalk
Jul 3, 2025 · Artificial Intelligence

How Vivo’s Blue Heart XiaoV Leverages LLMs to Transform Conversational Recommendations

In an interview with Vivo AI engineer Liang Tianan, the article explores the challenges of post‑Q&A recommendation, the integration of large language models into recall, ranking and evaluation pipelines, and the engineering trade‑offs required to deliver high‑quality, diverse suggestions on mobile devices.

LLMMobile AIRecommendation Systems
0 likes · 15 min read
DataFunTalk
Jun 29, 2025 · Artificial Intelligence

Large Models Boost Douyin User Experience: Expert Insights

In an interview at the DA Digital Intelligence Conference, ByteDance AI specialist Cai Conghuai explains how large language models, combined with techniques like SFT, DPO, and RAG, are reshaping Douyin's user‑experience signal detection, root‑cause analysis, and evaluation, while outlining future AI‑agent breakthroughs.

AIDPORAG
0 likes · 12 min read
Alibaba Cloud Developer
Jun 26, 2025 · Artificial Intelligence

How to Build a Multi‑Dimensional Evaluation Framework for AI‑Powered Data Analysis Platforms

This article outlines the design of a scientific, quantifiable, multi‑dimensional evaluation system for the DataV‑Note intelligent analysis platform, addressing the lack of unified standards and accuracy challenges in AI‑driven data reporting, and proposes concrete metrics, model architecture, and future automation plans.

AI EvaluationMetricsModel Design
0 likes · 13 min read
Open Source Linux
Jun 12, 2025 · Artificial Intelligence

From Transformers to DeepSeek‑R1: The Evolution of Large Language Models (2017‑2025)

This article chronicles the rapid development of large language models from the 2017 Transformer breakthrough through the rise of BERT, GPT‑3, multimodal models, alignment techniques like RLHF, and finally the cost‑efficient DeepSeek‑R1 in 2025, highlighting key innovations, scaling trends, and real‑world impacts.

AI AlignmentDeep LearningModel Scaling
0 likes · 26 min read
AI Algorithm Path
Jun 11, 2025 · Artificial Intelligence

OpenAI's O3‑Pro Model: Deep Reasoning, Pricing, Benchmarks, and Access Guide

OpenAI introduced the O3‑Pro multimodal deep‑reasoning model alongside an 80% price cut for O3. The article details its training via large‑scale reinforcement learning, compares its capabilities and costs against GPT‑4o, GPT‑4.1 and O3, lists its core specs, limitations and access methods, and presents benchmark tests that highlight both strengths and weaknesses.

AIBenchmarkO3-Pro
0 likes · 10 min read
Kuaishou Audio & Video Technology
Jun 11, 2025 · Artificial Intelligence

Kuaishou Showcases 12 Cutting-Edge CVPR 2025 Papers on Video Generation and AI

Kuaishou presented twelve peer‑reviewed papers at CVPR 2025 covering video quality assessment, large‑scale video datasets, dynamic 3D avatar reconstruction, 4D scene simulation, controllable video generation, scaling laws for diffusion transformers, multimodal foundations, and more, highlighting the company's leading research in computer vision and AI.

AI researchCVPR2025Deep Learning
0 likes · 21 min read
DataFunSummit
Jun 10, 2025 · Artificial Intelligence

How Quwan’s Kaitian Model Tackles Emotional AI for Social Apps – Architecture, Training Tricks, and Safety

Quwan Technology presents its Kaitian social large model, designed for personalized, emotionally rich, multimodal AI interactions, detailing its scene‑specific goals, CPT+SFT+RLHF training pipeline, data desensitization, LoRA fine‑tuning, evaluation methods, pruning, latency trade‑offs, safety mechanisms, and future feedback loops.

AI SafetyLoRAModel Pruning
0 likes · 13 min read
Kuaishou Tech
Jun 5, 2025 · Artificial Intelligence

7 Kuaishou AI Papers Accepted at ACL 2025: Video Understanding & Safe LLM Decoding

Kuaishou’s foundational large-model team has had seven papers accepted at ACL 2025, spanning alignment bias in training, safety defenses during inference, decoding strategies, fine-grained video-temporal understanding, reward fairness in RLHF, multimodal captioning benchmarks, and methods to curb hallucinations in vision-language models.

ACLAI SafetyBenchmark
0 likes · 13 min read
Fighter's World
Jun 2, 2025 · Artificial Intelligence

Why Is Context King for Large Language Models?

This article provides a comprehensive technical analysis of LLM context, covering its definition, types, tokenization, window‑size evolution, diminishing returns, management techniques such as RAG, CoT, memory‑as‑a‑service, and future challenges like multimodal fusion, privacy, and autonomous agent memory.

Agent MemoryContext managementLLM
0 likes · 48 min read
Baidu MEUX
May 28, 2025 · Artificial Intelligence

Top 10 AI Breakthroughs This Week: New Models, Tools, and Industry Moves

This roundup highlights ten recent AI developments, from Apple's Matrix3D model that creates 3D scenes from photos, to Qwen's Deep Research assistant, Tencent's CodeBuddy 3.0, ByteDance's Seed1.5‑VL, StepFun's open‑source Step1X‑3D, Google's iOS icon refresh, Apple's eye‑tracking scrolling test, Chrome's upcoming Gemini AI assistant, Shanghai's AI Identity Ecosystem Alliance, and Kuaishou's Kling AI 2.0 topping the global video‑generation leaderboard.

3D generationAI assistantsAI models
0 likes · 5 min read
DataFunTalk
May 23, 2025 · Artificial Intelligence

2025 AI Landscape: Inference Models Dominate, Open‑Source Momentum Accelerates

The 2025 Q1 AI report from Artificial Analysis highlights six major trends—including a thousand‑fold drop in inference cost, the rise of MoE models, the growing parity of Chinese open‑source labs, the emergence of autonomous AI agents, native multimodal capabilities, and the trade‑off between performance, cost, and context windows—painting a picture of a rapidly evolving, increasingly competitive AI ecosystem.

AIInferenceagents
0 likes · 11 min read
Baidu Tech Salon
May 21, 2025 · Artificial Intelligence

Baidu AI Day 2024: Wenxin X1 Turbo Sets New Benchmark with Top‑Level Evaluation and Advanced Multimodal Capabilities

At Baidu AI Day in Beijing, the company unveiled the Wenxin 4.5 Turbo and X1 Turbo models, detailing multimodal training breakthroughs, self‑feedback loops, enhanced reasoning and tool‑calling, while the China Academy of Information and Communications Technology awarded X1 Turbo the highest "4+" rating across 24 capability tests, highlighting its leading position in domestic large‑model performance.

BaiduModel EvaluationWenxin
0 likes · 9 min read
Tencent Technical Engineering
May 19, 2025 · Artificial Intelligence

RAG, Agents, and Multimodal Large Models: Evolution, Challenges, and Future Trends

This article examines the evolution of large model technologies—including Retrieval‑Augmented Generation, AI agents, and multimodal models—detailing their technical foundations, practical challenges, industry applications, and future development trends, offering a comprehensive perspective for AI practitioners and researchers.

AI AgentKnowledge RetrievalRAG
0 likes · 14 min read
Bilibili Tech
May 16, 2025 · Artificial Intelligence

How FineVQ Sets New Standards for Fine‑Grained UGC Video Quality Assessment

The article introduces FineVD, the first large‑scale multi‑dimensional UGC video quality dataset, and presents FineVQ, a unified model that predicts quality scores, attributes, and distortion types across six dimensions, achieving state‑of‑the‑art performance on multiple benchmarks and cross‑dataset evaluations.

Computer VisionDatasetDeep Learning
0 likes · 9 min read
Network Intelligence Research Center (NIRC)
May 14, 2025 · Artificial Intelligence

Hands‑On CLIP: Implementing Multimodal Vision‑Language Understanding

This article introduces OpenAI’s CLIP multimodal model, explains its architecture and contrastive training, details hardware and installation steps, and demonstrates a hands‑on zero‑shot image classification workflow that achieves 97% confidence on a cat image without any task‑specific fine‑tuning.

CLIPPythoncontrastive learning
0 likes · 6 min read
DevOps
May 13, 2025 · Artificial Intelligence

The Rise of AI Agents: Current Trends, Core Capabilities, and Future Outlook

This article surveys the rapid emergence of AI agents, outlining their projected 2025 breakthrough, market momentum, key frameworks such as Manus and MCP, the four core abilities of perception, planning, tool use, and memory, and the evolving landscape of multimodal and autonomous AI systems.

AI agentsMemoryPlanning
0 likes · 11 min read
DataFunSummit
May 13, 2025 · Artificial Intelligence

Integrating Large Language Models and Knowledge Graphs for Financial Applications: Challenges, Solutions, and Future Directions

This talk explores the technical challenges of applying large language models and knowledge graphs in finance, discusses solutions such as RAG enhancements, graph‑guided retrieval, multimodal extensions, and presents future research directions including multimodal graph integration, agentic systems, and decision‑making applications.

AIRAGagentic systems
0 likes · 33 min read
Alimama Tech
May 12, 2025 · Artificial Intelligence

Universal Recommendation Model (URM): A General Large‑Model Recall System for Advertising

The article presents the Universal Recommendation Model (URM), a large‑language‑model‑based recall framework that integrates world knowledge and e‑commerce expertise through knowledge injection and prompt‑driven alignment, achieving significant offline recall gains and a 3.1% increase in ad consumption while meeting high‑QPS, low‑latency production constraints.

AdvertisingPrompt engineeringhigh QPS
0 likes · 17 min read
AntTech
May 12, 2025 · Industry Insights

How AI Large Models Are Revolutionizing Multimodal Content Safety

An award‑winning joint project by Shanghai Jiao Tong University and Ant Group unveils a multimodal foundation model and advanced detection techniques that dramatically improve AI‑driven content risk governance across massive online services.

AIAnt GroupContent Safety
0 likes · 3 min read
Alibaba Cloud Developer
May 9, 2025 · Information Security

What’s New in MCP 2025‑03‑26? Deep Dive into OAuth 2.1, Streamable HTTP, and JSON‑RPC Enhancements

The MCP 2025‑03‑26 release introduces mandatory OAuth 2.1 with PKCE, a single‑endpoint Streamable HTTP transport, required JSON‑RPC batch processing, richer tool metadata, structured progress notifications, audio multimodal support, and robust session management, all backed by extensive security hardening and performance gains.

API SecurityJSON-RPCMCP
0 likes · 14 min read
Tencent Cloud Developer
May 8, 2025 · Artificial Intelligence

Advances and Future of AI Agents: Capabilities, Trends, and Applications

AI agents are rapidly evolving toward a 2025 breakthrough in perception, autonomous planning, tool use and memory, driven by multimodal models, neural‑symbolic reasoning and embodied intelligence. Investment is forecast at $27 billion, and the trend is exemplified by general‑purpose agents like Manus and by emerging applications in code generation, research, healthcare, and risk analysis.

AI AgentAgent FrameworkAutonomous Planning
0 likes · 12 min read
AI Algorithm Path
May 2, 2025 · Artificial Intelligence

Qwen3 Launch: Open-Source Models Redefine General AI

The Qwen3 series introduces eight open‑source large language models ranging from 0.6B to 235B parameters, combines dense and Mixture‑of‑Experts architectures, supports multimodal input, offers mixed inference modes, and demonstrates benchmark superiority over leading models such as OpenAI o1 and Gemini 2.5 Pro.

AI agentsBenchmarkMixture of Experts
0 likes · 10 min read
Data Thinking Notes
Apr 29, 2025 · Artificial Intelligence

From Transformers to DeepSeek‑R1: How LLMs Evolved to 2025

This article chronicles the evolution of large language models from the 2017 Transformer breakthrough through BERT, GPT series, multimodal models, and recent cost‑efficient innovations like DeepSeek‑R1, highlighting key architectures, training methods, alignment techniques, and their transformative impact on AI applications.

AI AlignmentTransformerlarge language models
0 likes · 29 min read
DevOps
Apr 27, 2025 · Artificial Intelligence

Large Model Technologies: RAG, AI Agents, Multimodal Applications, and Future Trends

This article examines how Retrieval‑Augmented Generation (RAG), AI agents, and multimodal large‑model techniques are reshaping AI‑industry integration, discusses their technical challenges and practical implementations, and outlines future development directions across algorithms, products, and domain‑specific applications.

AI agentsRAGRetrieval-Augmented Generation
0 likes · 14 min read
Kuaishou Tech
Apr 23, 2025 · Artificial Intelligence

Kuaishou's Accepted Papers at ICLR 2025 and Their Summaries

The article highlights Kuaishou's eleven high‑quality papers accepted at ICLR 2025, covering advances in streaming video understanding, 3D trajectory control, multimodal talking‑face animation, transformer indexing, efficient video generation, industrial recommendation datasets, token gradient conflict in MoE, stable segmentation, multi‑camera video synthesis, large‑scale multimodal instruction tuning, and hallucination detection in retrieval‑augmented generation.

AIResearchDeepLearningICLR2025
0 likes · 20 min read
Liangxu Linux
Apr 22, 2025 · Artificial Intelligence

Top 10 Open-Source OCR Projects on GitHub Ranked by Stars

This article compiles a ranked list of ten popular open-source OCR projects on GitHub, summarizing each tool’s key capabilities—such as multimodal text extraction, PDF linearization, layout analysis, and multilingual support—along with star counts and direct repository links for developers seeking ready-to-use OCR solutions.

Computer VisionGitHubOCR
0 likes · 9 min read
AIWalker
Apr 17, 2025 · Artificial Intelligence

Unveiling DeepSeek’s Janus Series: Decoupled Visual Encoding for Unified Multimodal Understanding and Generation

This article provides an in‑depth analysis of DeepSeek’s Janus and Janus‑Pro models, explaining how decoupling visual encoding resolves the conflict between multimodal understanding and generation, detailing training stages, data scaling, architectural choices, and presenting extensive benchmark results that demonstrate significant performance gains.

BenchmarkDeepSeekJanus
0 likes · 23 min read
58UXD
Apr 17, 2025 · Artificial Intelligence

How Zero‑UI and Gemini’s Multimodal AI Are Redefining Human‑Computer Interaction

Zero‑UI, powered by multimodal AI models like Google Gemini, is shifting design from screen‑based interfaces to natural voice, gesture, and environmental interactions, prompting a fundamental redesign of how devices understand user intent across smart homes, cars, and immersive experiences.

AIHuman-Computer InteractionUX design
0 likes · 9 min read
Baidu Tech Salon
Apr 16, 2025 · Artificial Intelligence

Release of the 'Fangsheng' Large Model Benchmark Results (Q1 2025) and Overview of Baidu's Wenxin 4.5 and X1 Models

The China AI Industry Alliance unveiled its Q1 2025 Fangsheng benchmark, showing Baidu’s new multimodal models—Wenxin 4.5 leading basic abilities and Wenxin X1 excelling in reasoning—available for free on the Wenxin Yiyan platform, while Baidu pledges major 2025 investments in AI, data‑center and cloud infrastructure.

AIBenchmarkFactTesting
0 likes · 4 min read
JD Tech
Apr 15, 2025 · Artificial Intelligence

Reliable Advertising Creative Generation and Personalized Recommendation via Multimodal Feedback and Offline Representation

The article presents a series of technical breakthroughs by JD's advertising team that improve the quality and coverage of AI‑generated ad images through a trustworthy multimodal feedback network, introduce a large human‑annotated image dataset, and enhance creative ranking with offline multimodal representations and online architecture optimizations, ultimately achieving more precise and scalable ad personalization.

AIAIGCAdvertising
0 likes · 10 min read
58 Tech
Apr 11, 2025 · Artificial Intelligence

Optimization of Multimodal Visual Large Model Inference: Pre‑processing, ViT TensorRT, CUDA Graphs, Tokenization, Prefix Cache, and Quantization

This report details a comprehensive set of optimizations for multimodal visual large‑model (VLM) inference—including image pre‑processing acceleration, TensorRT integration for the ViT module, CUDA‑Graph replay, token‑count reduction, prefix‑cache handling, and weight quantization—demonstrating up to three‑fold throughput gains while maintaining accuracy.

CUDA GraphTensorRTinference-optimization
0 likes · 19 min read
AntTech
Apr 10, 2025 · Artificial Intelligence

Ant Group Presents Four AI Research Papers at ICLR 2025 Live Showcase

At the ICLR 2025 live session in Singapore, Ant Group showcased four cutting‑edge papers—CodePlan, Animate‑X, Group Position Embedding, and OmniKV—demonstrating advances in large‑language‑model reasoning, universal character animation, layout‑aware document understanding, and efficient long‑context inference.

AI researchdocument understandinglarge language models
0 likes · 6 min read
Baidu Geek Talk
Apr 9, 2025 · Artificial Intelligence

Baidu's Wenxin X1 Large Model Officially Launches on Qianfan Platform

On April 2, Baidu released its Wenxin X1 large model on the Qianfan platform, offering enterprise users and developers a multimodal, deep‑thinking AI with superior math, coding, and reasoning scores, low token‑price API access, batch inference, one‑click distillation, and rapid RAG/Agent application building.

AIAPI ServiceBaidu
0 likes · 4 min read
AI Algorithm Path
Apr 6, 2025 · Artificial Intelligence

Meta’s Open-Source Llama 4: 2‑Trillion‑Parameter Behemoth Redefines AI

Meta’s newly released Llama 4 models (Maverick, with about 402 billion total parameters, and Scout, with about 109 billion) feature a 128‑expert MoE, 10 million‑token context, native multimodal fusion, and FP8 training, delivering benchmark‑leading performance that outpaces GPT‑4o, Gemini 2.0 Flash and DeepSeek v3, while being openly available on Hugging Face and GitHub.

BenchmarkFP8 trainingLlama 4
0 likes · 8 min read
Fighter's World
Apr 5, 2025 · Artificial Intelligence

Is Gemini 2.5 Pro the Turning Point for Google’s AI Strategy?

The article analyses Google’s Gemini 2.5 Pro as a decisive shift toward a “Reasoning Model”, detailing its architectural focus on inference, benchmark breakthroughs such as Humanity’s Last Exam and GPQA Diamond, long‑context capability, multimodal strengths, Vibe‑coding experience, and the roadmap for future Gemini models.

AI strategyBenchmarkGemini 2.5 Pro
0 likes · 25 min read
Nightwalker Tech
Apr 1, 2025 · Artificial Intelligence

Evaluation of AutoGLM: Features, Architecture, and Practical Test Results

This article reviews AutoGLM, the first "think‑while‑doing" AI agent released by Zhipu AI, detailing its core capabilities, full‑stack architecture, user experience, identified limitations, and the outcomes of three hands‑on tests using both the client application and a Chrome extension.

AI AgentAutoGLMevaluation
0 likes · 4 min read
AIWalker
Mar 31, 2025 · Artificial Intelligence

VBench-2.0: A Next‑Generation Benchmark for Intrinsic Faithfulness in AI Video Generation

VBench-2.0 expands the original VBench suite with five dimensions (Human Fidelity, Controllability, Creativity, Physics, and Commonsense), each broken down into finer‑grained capabilities, to evaluate not only the visual quality of generated videos but also their intrinsic faithfulness to physical laws, common sense, and narrative coherence. It provides open‑source tools, prompts, and human‑aligned metrics for the research community.

AI EvaluationBenchmarkIntrinsic Faithfulness
0 likes · 12 min read
Nightwalker Tech
Mar 28, 2025 · Artificial Intelligence

Comprehensive Evaluation of GPT-4o Multimodal Image Generation Capabilities

This article presents a thorough assessment of GPT‑4o’s new image generation features, detailing multiple test scenarios—from simple portrait creation and style transfer to UI design, product rendering, and educational illustrations—comparing its output with Claude‑3.7‑Sonnet, highlighting strengths in realism and weaknesses in Chinese text handling.

AI EvaluationGPT-4oimage generation
0 likes · 16 min read