Tagged articles

Multimodal

422 articles · Page 3 of 5
Kuaishou Tech
Jul 7, 2025 · Artificial Intelligence

8 Kuaishou Papers Spotlighted at ICML 2025: Multimodal AI, Causal Inference and More

Kuaishou has had eight papers accepted at the International Conference on Machine Learning (ICML) 2025, covering multimodal emotion modeling, monotonic probability learning, causal effect generalization, cascade ranking, multimodal LLM alignment, ultra‑low‑rate image compression, and visual autoregressive super‑resolution. The article links to each paper and its code repository.

AI · Multimodal · Ranking
0 likes · 13 min read
DataFunSummit
Jul 6, 2025 · Artificial Intelligence

AI-Driven Knowledge Graphs: Key Insights from Multimodal GraphRAG Research

This article presents a comprehensive overview of cutting‑edge research on integrating large language models with knowledge graphs, covering multimodal GraphRAG, financial AI solutions, traditional Chinese medicine decision support, and industry‑specific knowledge services, guiding readers through emerging paradigms and practical implementations.

AI · Enterprise AI · Multimodal
0 likes · 2 min read
AntTech
Jul 3, 2025 · Artificial Intelligence

How Ant Group’s AI Multimodal Evaluation Transforms Image, Speech, and Video Quality Testing

In a QECon 2025 talk, Ant Group’s AI team detailed a comprehensive multimodal evaluation framework that leverages large‑model metrics, custom pipelines, and benchmark datasets to assess image generation, speech recognition, and video quality, while also contributing to industry standards and academic research.

AI evaluation · Multimodal · image assessment
0 likes · 16 min read
DataFunTalk
Jul 3, 2025 · Artificial Intelligence

How Vivo’s Blue Heart XiaoV Leverages LLMs to Transform Conversational Recommendations

In an interview with Vivo AI engineer Liang Tianan, the article explores the challenges of post‑Q&A recommendation, the integration of large language models into recall, ranking and evaluation pipelines, and the engineering trade‑offs required to deliver high‑quality, diverse suggestions on mobile devices.

Evaluation · LLM · Multimodal
0 likes · 15 min read
DataFunTalk
Jun 29, 2025 · Artificial Intelligence

Large Models Boost Douyin User Experience: Expert Insights

In an interview at the DA Digital Intelligence Conference, ByteDance AI specialist Cai Conghuai explains how large language models, combined with techniques like SFT, DPO, and RAG, are reshaping Douyin's user‑experience signal detection, root‑cause analysis, and evaluation, while outlining future AI‑agent breakthroughs.

AI · DPO · Large Language Models
0 likes · 12 min read
Alibaba Cloud Developer
Jun 26, 2025 · Artificial Intelligence

How to Build a Multi‑Dimensional Evaluation Framework for AI‑Powered Data Analysis Platforms

This article outlines the design of a scientific, quantifiable, multi‑dimensional evaluation system for the DataV‑Note intelligent analysis platform, addressing the lack of unified standards and accuracy challenges in AI‑driven data reporting, and proposes concrete metrics, model architecture, and future automation plans.

AI evaluation · Model Design · Multimodal
0 likes · 13 min read
Open Source Linux
Jun 12, 2025 · Artificial Intelligence

From Transformers to DeepSeek‑R1: The Evolution of Large Language Models (2017‑2025)

This article chronicles the rapid development of large language models from the 2017 Transformer breakthrough through the rise of BERT, GPT‑3, multimodal models, alignment techniques like RLHF, and finally the cost‑efficient DeepSeek‑R1 in 2025, highlighting key innovations, scaling trends, and real‑world impacts.

AI alignment · Deep Learning · Large Language Models
0 likes · 26 min read
AI Algorithm Path
Jun 11, 2025 · Artificial Intelligence

OpenAI's O3‑Pro Model: Deep Reasoning, Pricing, Benchmarks, and Access Guide

OpenAI introduced the O3‑Pro multimodal deep‑reasoning model alongside an 80% price cut for O3. The article details the model's training via large‑scale reinforcement learning, compares the capabilities and costs of GPT‑4o, GPT‑4.1 and O3‑Pro, lists O3‑Pro's core specs, limitations and access methods, and presents benchmark tests that highlight both strengths and weaknesses.

AI · Benchmark · Multimodal
0 likes · 10 min read
Kuaishou Audio & Video Technology
Jun 11, 2025 · Artificial Intelligence

Kuaishou Showcases 12 Cutting-Edge CVPR 2025 Papers on Video Generation and AI

Kuaishou presented twelve peer‑reviewed papers at CVPR 2025 covering video quality assessment, large‑scale video datasets, dynamic 3D avatar reconstruction, 4D scene simulation, controllable video generation, scaling laws for diffusion transformers, multimodal foundations, and more, highlighting the company's leading research in computer vision and AI.

AI research · CVPR2025 · Deep Learning
0 likes · 21 min read
DataFunSummit
Jun 10, 2025 · Artificial Intelligence

How Quwan’s Kaitian Model Tackles Emotional AI for Social Apps – Architecture, Training Tricks, and Safety

Quwan Technology presents its Kaitian social large model, designed for personalized, emotionally rich, multimodal AI interactions, detailing its scene‑specific goals, CPT+SFT+RLHF training pipeline, data desensitization, LoRA fine‑tuning, evaluation methods, pruning, latency trade‑offs, safety mechanisms, and future feedback loops.

AI safety · LoRA · Model Pruning
0 likes · 13 min read
Kuaishou Tech
Jun 5, 2025 · Artificial Intelligence

7 Kuaishou AI Papers Accepted at ACL 2025: Video Understanding & Safe LLM Decoding

Kuaishou’s foundational large-model team has secured seven papers at ACL 2025, spanning alignment bias in training, safety defenses during inference, decoding strategies, fine-grained video-temporal understanding, reward fairness in RLHF, multimodal captioning benchmarks, and methods to curb hallucinations in vision-language models.

ACL · AI safety · Benchmark
0 likes · 13 min read
Fighter's World
Jun 2, 2025 · Artificial Intelligence

Why Is Context King for Large Language Models?

This article provides a comprehensive technical analysis of LLM context, covering its definition, types, tokenization, window‑size evolution, diminishing returns, management techniques such as RAG, CoT, memory‑as‑a‑service, and future challenges like multimodal fusion, privacy, and autonomous agent memory.

Agent Memory · Context Management · LLM
0 likes · 48 min read
Baidu MEUX
May 28, 2025 · Artificial Intelligence

Top 10 AI Breakthroughs This Week: New Models, Tools, and Industry Moves

This roundup highlights ten recent AI developments, from Apple's Matrix3D model that creates 3D scenes from photos, to Qwen's Deep Research assistant, Tencent's CodeBuddy 3.0, ByteDance's Seed1.5‑VL, Step Star's open‑source Step1X‑3D, Google's iOS icon refresh, Apple's eye‑tracking scrolling test, Chrome's upcoming Gemini AI assistant, Shanghai's AI Identity Ecosystem Alliance, and Kuaishou's Keling AI 2.0 topping the global video‑generation leaderboard.

3D generation · AI assistants · AI models
0 likes · 5 min read
DataFunTalk
May 23, 2025 · Artificial Intelligence

2025 AI Landscape: Inference Models Dominate, Open‑Source Momentum Accelerates

The 2025 Q1 AI report from Artificial Analysis highlights six major trends—including a thousand‑fold drop in inference cost, the rise of MoE models, the growing parity of Chinese open‑source labs, the emergence of autonomous AI agents, native multimodal capabilities, and the trade‑off between performance, cost, and context windows—painting a picture of a rapidly evolving, increasingly competitive AI ecosystem.

AI · Agents · Multimodal
0 likes · 11 min read
Baidu Tech Salon
May 21, 2025 · Artificial Intelligence

Baidu AI Day 2024: Wenxin X1 Turbo Sets New Benchmark with Top‑Level Evaluation and Advanced Multimodal Capabilities

At Baidu AI Day in Beijing, the company unveiled the Wenxin 4.5 Turbo and X1 Turbo models, detailing multimodal training breakthroughs, self‑feedback loops, enhanced reasoning and tool‑calling, while the China Academy of Information and Communications Technology awarded X1 Turbo the highest "4+" rating across 24 capability tests, highlighting its leading position in domestic large‑model performance.

Baidu · Multimodal · Wenxin
0 likes · 9 min read
Tencent Technical Engineering
May 19, 2025 · Artificial Intelligence

RAG, Agents, and Multimodal Large Models: Evolution, Challenges, and Future Trends

This article examines the evolution of large model technologies—including Retrieval‑Augmented Generation, AI agents, and multimodal models—detailing their technical foundations, practical challenges, industry applications, and future development trends, offering a comprehensive perspective for AI practitioners and researchers.

AI Agent · Multimodal · RAG
0 likes · 14 min read
Bilibili Tech
May 16, 2025 · Artificial Intelligence

How FineVQ Sets New Standards for Fine‑Grained UGC Video Quality Assessment

The article introduces FineVD, the first large‑scale multi‑dimensional UGC video quality dataset, and presents FineVQ, a unified model that predicts quality scores, attributes, and distortion types across six dimensions, achieving state‑of‑the‑art performance on multiple benchmarks and cross‑dataset evaluations.

Deep Learning · FineVQ · Multimodal
0 likes · 9 min read
Network Intelligence Research Center (NIRC)
May 14, 2025 · Artificial Intelligence

Hands‑On CLIP: Implementing Multimodal Vision‑Language Understanding

This article introduces OpenAI’s CLIP multimodal model, explains its architecture and contrastive training, details hardware and installation steps, and demonstrates a hands‑on zero‑shot image classification workflow that achieves 97% confidence on a cat image without any task‑specific fine‑tuning.

CLIP · Multimodal · Python
0 likes · 6 min read
DevOps
May 13, 2025 · Artificial Intelligence

The Rise of AI Agents: Current Trends, Core Capabilities, and Future Outlook

This article surveys the rapid emergence of AI agents, outlining their projected 2025 breakthrough, market momentum, key frameworks such as Manus and MCP, the four core abilities of perception, planning, tool use, and memory, and the evolving landscape of multimodal and autonomous AI systems.

AI agents · Artificial Intelligence · Multimodal
0 likes · 11 min read
DataFunSummit
May 13, 2025 · Artificial Intelligence

Integrating Large Language Models and Knowledge Graphs for Financial Applications: Challenges, Solutions, and Future Directions

This talk explores the technical challenges of applying large language models and knowledge graphs in finance, discusses solutions such as RAG enhancements, graph‑guided retrieval, multimodal extensions, and presents future research directions including multimodal graph integration, agentic systems, and decision‑making applications.

AI · Agentic Systems · Multimodal
0 likes · 33 min read
Alimama Tech
May 12, 2025 · Artificial Intelligence

Universal Recommendation Model (URM): A General Large‑Model Recall System for Advertising

The article presents the Universal Recommendation Model (URM), a large‑language‑model‑based recall framework that integrates world knowledge and e‑commerce expertise through knowledge injection and prompt‑driven alignment, achieving significant offline recall gains and a 3.1% increase in ad consumption while meeting high‑QPS, low‑latency production constraints.

Advertising · Multimodal · Prompt engineering
0 likes · 17 min read
AntTech
May 12, 2025 · Industry Insights

How AI Large Models Are Revolutionizing Multimodal Content Safety

An award‑winning joint project by Shanghai Jiao Tong University and Ant Group unveils a multimodal foundation model and advanced detection techniques that dramatically improve AI‑driven content risk governance across massive online services.

AI · Ant Group · Content Safety
0 likes · 3 min read
Alibaba Cloud Developer
May 9, 2025 · Information Security

What’s New in MCP 2025‑03‑26? Deep Dive into OAuth 2.1, Streamable HTTP, and JSON‑RPC Enhancements

The MCP 2025‑03‑26 release introduces mandatory OAuth 2.1 with PKCE, a single‑endpoint Streamable HTTP transport, required JSON‑RPC batch processing, richer tool metadata, structured progress notifications, audio multimodal support, and robust session management, all backed by extensive security hardening and performance gains.

API Security · JSON-RPC · MCP
0 likes · 14 min read
Tencent Cloud Developer
May 8, 2025 · Artificial Intelligence

Advances and Future of AI Agents: Capabilities, Trends, and Applications

AI agents are rapidly evolving toward a 2025 breakthrough in perception, autonomous planning, tool use and memory, driven by multimodal models, neural‑symbolic reasoning and embodied intelligence, with $27 billion investment forecasts, exemplified by general‑purpose agents like Manus and emerging applications in code generation, research, healthcare, and risk analysis.

AI Agent · Agent framework · Autonomous Planning
0 likes · 12 min read
AI Algorithm Path
May 2, 2025 · Artificial Intelligence

Qwen3 Launch: Open-Source Models Redefine General AI

The Qwen3 series introduces eight open‑source large language models ranging from 0.6B to 235B parameters, combines dense and Mixture‑of‑Experts architectures, supports multimodal input, offers mixed inference modes, and demonstrates benchmark superiority over leading models such as OpenAI o1 and Gemini 2.5 Pro.

AI agents · Benchmark · Mixture of Experts
0 likes · 10 min read
Data Thinking Notes
Apr 29, 2025 · Artificial Intelligence

From Transformers to DeepSeek‑R1: How LLMs Evolved to 2025

This article chronicles the evolution of large language models from the 2017 Transformer breakthrough through BERT, GPT series, multimodal models, and recent cost‑efficient innovations like DeepSeek‑R1, highlighting key architectures, training methods, alignment techniques, and their transformative impact on AI applications.

AI alignment · Large Language Models · Multimodal
0 likes · 29 min read
DevOps
Apr 27, 2025 · Artificial Intelligence

Large Model Technologies: RAG, AI Agents, Multimodal Applications, and Future Trends

This article examines how Retrieval‑Augmented Generation (RAG), AI agents, and multimodal large‑model techniques are reshaping AI‑industry integration, discusses their technical challenges and practical implementations, and outlines future development directions across algorithms, products, and domain‑specific applications.

AI agents · Artificial Intelligence · Multimodal
0 likes · 14 min read
Kuaishou Tech
Apr 23, 2025 · Artificial Intelligence

Kuaishou's Accepted Papers at ICLR 2025 and Their Summaries

The article highlights Kuaishou's eleven high‑quality papers accepted at ICLR 2025, covering advances in streaming video understanding, 3D trajectory control, multimodal talking‑face animation, transformer indexing, efficient video generation, industrial recommendation datasets, token gradient conflict in MoE, stable segmentation, multi‑camera video synthesis, large‑scale multimodal instruction tuning, and hallucination detection in retrieval‑augmented generation.

AIResearch · DeepLearning · ICLR2025
0 likes · 20 min read
Liangxu Linux
Apr 22, 2025 · Artificial Intelligence

Top 10 Open-Source OCR Projects on GitHub Ranked by Stars

This article compiles a ranked list of ten popular open-source OCR projects on GitHub, summarizing each tool’s key capabilities—such as multimodal text extraction, PDF linearization, layout analysis, and multilingual support—along with star counts and direct repository links for developers seeking ready-to-use OCR solutions.

GitHub · Multimodal · OCR
0 likes · 9 min read
AIWalker
Apr 17, 2025 · Artificial Intelligence

Unveiling DeepSeek’s Janus Series: Decoupled Visual Encoding for Unified Multimodal Understanding and Generation

This article provides an in‑depth analysis of DeepSeek’s Janus and Janus‑Pro models, explaining how decoupling visual encoding resolves the conflict between multimodal understanding and generation, detailing training stages, data scaling, architectural choices, and presenting extensive benchmark results that demonstrate significant performance gains.

Benchmark · DeepSeek · Janus
0 likes · 23 min read
58UXD
Apr 17, 2025 · Artificial Intelligence

How Zero‑UI and Gemini’s Multimodal AI Are Redefining Human‑Computer Interaction

Zero‑UI, powered by multimodal AI models like Google Gemini, is shifting design from screen‑based interfaces to natural voice, gesture, and environmental interactions, prompting a fundamental redesign of how devices understand user intent across smart homes, cars, and immersive experiences.

AI · Human-Computer Interaction · Multimodal
0 likes · 9 min read
Baidu Tech Salon
Apr 16, 2025 · Artificial Intelligence

Release of the 'Fangsheng' Large Model Benchmark Results (Q1 2025) and Overview of Baidu's Wenxin 4.5 and X1 Models

The China AI Industry Alliance unveiled its Q1 2025 Fangsheng benchmark, showing Baidu’s new multimodal models—Wenxin 4.5 leading basic abilities and Wenxin X1 excelling in reasoning—available for free on the Wenxin Yiyan platform, while Baidu pledges major 2025 investments in AI, data‑center and cloud infrastructure.

AI · Benchmark · FactTesting
0 likes · 4 min read
JD Tech
Apr 15, 2025 · Artificial Intelligence

Reliable Advertising Creative Generation and Personalized Recommendation via Multimodal Feedback and Offline Representation

The article presents a series of technical breakthroughs by JD's advertising team that improve the quality and coverage of AI‑generated ad images through a trustworthy multimodal feedback network, introduce a large human‑annotated image dataset, and enhance creative ranking with offline multimodal representations and online architecture optimizations, ultimately achieving more precise and scalable ad personalization.

AI · AIGC · Advertising
0 likes · 10 min read
58 Tech
Apr 11, 2025 · Artificial Intelligence

Optimization of Multimodal Visual Large Model Inference: Pre‑processing, ViT TensorRT, CUDA Graphs, Tokenization, Prefix Cache, and Quantization

This report details a comprehensive set of optimizations for multimodal visual large‑model (VLM) inference—including image pre‑processing acceleration, TensorRT integration for the ViT module, CUDA‑Graph replay, token‑count reduction, prefix‑cache handling, and weight quantization—demonstrating up to three‑fold throughput gains while maintaining accuracy.

CUDA Graph · Multimodal · Quantization
0 likes · 19 min read
AntTech
Apr 10, 2025 · Artificial Intelligence

Ant Group Presents Four AI Research Papers at ICLR 2025 Live Showcase

At the ICLR 2025 live session in Singapore, Ant Group showcased four cutting‑edge papers—CodePlan, Animate‑X, Group Position Embedding, and OmniKV—demonstrating advances in large‑language‑model reasoning, universal character animation, layout‑aware document understanding, and efficient long‑context inference.

AI research · Large Language Models · Long Context
0 likes · 6 min read
Baidu Geek Talk
Apr 9, 2025 · Artificial Intelligence

Baidu's Wenxin X1 Large Model Officially Launches on Qianfan Platform

On April 2, Baidu released its Wenxin X1 large model on the Qianfan platform, offering enterprise users and developers a multimodal, deep‑thinking AI with superior math, coding, and reasoning scores, low token‑price API access, batch inference, one‑click distillation, and rapid RAG/Agent application building.

AI · API Service · Baidu
0 likes · 4 min read
AI Algorithm Path
Apr 6, 2025 · Artificial Intelligence

Meta’s Open-Source Llama 4: 2‑Trillion‑Parameter Behemoth Redefines AI

Meta’s newly released Llama 4 models, Maverick with 402 billion total parameters and Scout with 109 billion, feature a 128‑expert MoE, 10 million‑token context, native multimodal fusion, and FP8 training, delivering benchmark‑leading performance that outpaces GPT‑4o, Gemini 2.0 Flash and DeepSeek v3, while being openly available on Hugging Face and GitHub.

Benchmark · FP8 training · Llama 4
0 likes · 8 min read
Fighter's World
Apr 5, 2025 · Artificial Intelligence

Is Gemini 2.5 Pro the Turning Point for Google’s AI Strategy?

The article analyses Google’s Gemini 2.5 Pro as a decisive shift toward a “Reasoning Model”, detailing its architectural focus on inference, benchmark breakthroughs such as Humanity’s Last Exam and GPQA Diamond, long‑context capability, multimodal strengths, Vibe‑coding experience, and the roadmap for future Gemini models.

AI Strategy · Benchmark · Gemini 2.5 Pro
0 likes · 25 min read
Nightwalker Tech
Apr 1, 2025 · Artificial Intelligence

Evaluation of AutoGLM: Features, Architecture, and Practical Test Results

This article reviews AutoGLM, the first "think‑while‑doing" AI agent released by Zhipu AI, detailing its core capabilities, full‑stack architecture, user experience, identified limitations, and the outcomes of three hands‑on tests using both the client application and a Chrome extension.

AI Agent · AutoGLM · Evaluation
0 likes · 4 min read
AIWalker
Mar 31, 2025 · Artificial Intelligence

VBench-2.0: A Next‑Generation Benchmark for Intrinsic Faithfulness in AI Video Generation

VBench-2.0 expands the original VBench suite by introducing six fine‑grained dimensions—Human Fidelity, Controllability, Creativity, Physics, Commonsense, and more—to evaluate not only the visual quality of generated videos but also their intrinsic faithfulness to physical laws, common sense, and narrative coherence, providing open‑source tools, prompts, and human‑aligned metrics for the research community.

AI evaluation · Benchmark · Intrinsic Faithfulness
0 likes · 12 min read
Nightwalker Tech
Mar 28, 2025 · Artificial Intelligence

Comprehensive Evaluation of GPT-4o Multimodal Image Generation Capabilities

This article presents a thorough assessment of GPT‑4o’s new image generation features, detailing multiple test scenarios—from simple portrait creation and style transfer to UI design, product rendering, and educational illustrations—comparing its output with Claude‑3.7‑Sonnet, highlighting strengths in realism and weaknesses in Chinese text handling.

AI evaluation · GPT-4o · Multimodal
0 likes · 16 min read
Meituan Technology Team
Mar 27, 2025 · Artificial Intelligence

Q-Eval-100K Dataset and Q-Eval-Score Evaluation Framework for Text-to-Visual Generation

The Q‑Eval‑100K dataset, comprising 100 k AIGC images and videos with separate visual‑quality and textual‑consistency annotations, powers the open‑source Q‑Eval‑Score framework that fine‑tunes multimodal models to deliver state‑of‑the‑art, scalable, and objective evaluation—including a “vague‑to‑specific” strategy for long prompts—surpassing existing benchmarks.

AIGC · Evaluation · Multimodal
0 likes · 9 min read
37 Interactive Technology Team
Mar 26, 2025 · Artificial Intelligence

LUI vs GUI: Choosing the Right Interface for AI Product Design

When designing AI products, choosing between a Language User Interface—leveraging speech recognition, NLP, and conversational flexibility—and a Graphical User Interface—relying on visual icons, layouts, and intuitive interaction—depends on technology maturity, response speed, and user learning cost, while emerging multimodal designs increasingly blend both for richer, context‑aware experiences.

AI · GUI · Interaction
0 likes · 11 min read
JD Retail Technology
Mar 25, 2025 · Artificial Intelligence

2024 Advances in Advertising Creative Generation and Selection

In 2024 the advertising team deployed an end‑to‑end AIGC pipeline that automatically creates high‑quality ad images, uses the multimodal Reliable Feedback Network and the million‑size RF1M dataset to filter outputs, builds rich offline and online multimodal representations with contrastive and list‑wise learning, and optimizes ranking architecture to deliver scalable, personalized creative selection.

AI · AIGC · Advertising
0 likes · 10 min read
Amap Tech
Mar 19, 2025 · Artificial Intelligence

Driving by the Rules: Integrating Lane-Level Traffic Regulations into Online HD Maps

Gaode Map and Xi'an Jiaotong University introduce the “Driving by the Rules” task, releasing the MapDR benchmark that integrates lane‑level traffic‑sign regulations into online‑constructed HD maps, and provide modular (VLE‑MEE) and end‑to‑end (RuleVLM) baselines to evaluate rule extraction and lane association.

AI · Benchmark · HD maps
0 likes · 8 min read
IT Services Circle
Mar 19, 2025 · Artificial Intelligence

ByteDance’s AI Video Generation Model Goku, Streamer‑Sales Live‑Selling Model, and MimicTalk 3D Talking‑Head Project

ByteDance and partners open‑source three AI projects—Goku for high‑quality text‑to‑video generation, Streamer‑Sales for multimodal live‑selling LLMs, and MimicTalk for rapid 3D talking‑head creation—detailing their core features, underlying transformer‑based architectures, training pipelines, and public repositories.

AI video generation · Multimodal · Transformer
0 likes · 5 min read
JD Tech Talk
Mar 19, 2025 · Artificial Intelligence

Reliable Advertising Image Generation and Creative Selection Using Multimodal Feedback and MLLM Representations

The 2024 advertising team introduced a suite of AI‑driven techniques—including a trustworthy feedback network, a large‑scale human‑annotated dataset, multimodal large language model representations, and online ranking architecture upgrades—to dramatically improve the quality, coverage, and personalization of generated ad creatives.

AIGC · Advertising · MLLM
0 likes · 10 min read
JD Cloud Developers
Mar 19, 2025 · Artificial Intelligence

How AIGC Boosts Ad Creative Quality: Trustworthy Image Generation & Selection

2024 saw the advertising team achieve major breakthroughs in AI-generated ad creatives by introducing a multimodal reliable feedback network to improve image usability, releasing a large human-annotated dataset, and leveraging multimodal large language models for richer representation and more effective online/offline creative selection.

AIGC · Multimodal · ad optimization
0 likes · 10 min read
NewBeeNLP
Mar 18, 2025 · Interview Experience

How to Ace Multimodal Model Interviews at Taobao's Search AI Division

This article recounts a three‑stage interview for a multimodal large‑model position at Taobao's Search AI division, detailing typical questions on CLIP, LoRA, BLIP, Qwen‑VL, Transformer fundamentals, RLHF, and coding challenges, and offers insights on what interviewers focus on.

AI · CLIP · LoRA
0 likes · 5 min read
Code Mala Tang
Mar 15, 2025 · Artificial Intelligence

What Makes Google’s New Gemma 3 Model a Game‑Changer for AI Developers?

Google’s Gemma 3, a lightweight open‑source model with up to 27 billion parameters, offers multimodal input, 128K token context, and broad language support, outperforming leading rivals on single‑GPU benchmarks and providing flexible deployment options for developers and researchers alike.

AI model · Gemma 3 · Google AI
0 likes · 9 min read
AIWalker
Mar 7, 2025 · Artificial Intelligence

How GIFNet’s Low‑Level Interaction Breakthrough Enables Universal Multimodal Fusion Across Tasks

The paper introduces GIFNet, a three‑branch network that leverages low‑level visual tasks and a cross‑fusion gating mechanism to achieve a single, task‑agnostic image‑fusion model with dramatically reduced computation, strong generalization to unseen modalities, and even single‑modal enhancement capabilities.

CVPR2025GIFNetImage Fusion
0 likes · 20 min read
DaTaobao Tech
Mar 7, 2025 · Artificial Intelligence

Taobao Content AI: Summary of AIGC Content Generation and Multimodal Model Techniques

Taobao’s AIGC pipeline combines a human‑feedback multimodal reward model, audio‑visual joint pre‑training, and Mixture‑of‑Experts distillation to clean data, align outputs with user preferences, and achieve state‑of‑the‑art multimodal LLM performance that drives content cold‑start and conversion gains in e‑commerce.

AIGCContent GenerationMultimodal
0 likes · 10 min read
Cognitive Technology Team
Mar 7, 2025 · Artificial Intelligence

From Word Embeddings to Large Language Models: A Comprehensive Overview of AI Model Evolution

This article traces the development of AI models—from early word embeddings like Word2Vec and ELMo, through transformer‑based encoders such as BERT and decoder‑only models like GPT‑1/2/3, to recent multimodal systems and scaling laws—explaining their architectures, training methods, and impact on modern AI applications.

AIEmbeddingLarge Language Models
0 likes · 22 min read
DaTaobao Tech
Mar 5, 2025 · Artificial Intelligence

Multimodal Large‑Model Cover Generation AI Agent for Taobao Video and Live Streams

Taobao’s new multimodal AI Agent automatically creates high‑quality static and dynamic video covers by planning tasks, consulting a memory of quality criteria, executing frame selection with ReKV streaming and dual‑stage evaluation, generating marketing copy via fine‑tuned Qwen2.5‑7B, and refining layout, resulting in significantly higher click‑through rates, lower latency, and reduced manual effort.

AIMultimodalVideo Processing
0 likes · 17 min read
DaTaobao Tech
Mar 3, 2025 · Artificial Intelligence

How Taobao’s “Faxiang” AI Model Revolutionizes E‑Commerce Video Generation

Taobao’s AIGC video generation platform, built on a large‑scale “Faxiang” model that evolved from UNet to DiT, leverages over 2 billion curated e‑commerce videos, expert alignment, LoRA fine‑tuning, and multi‑control capabilities to deliver diverse, high‑quality product videos that dramatically boost conversion metrics across the marketplace.

AI video generationAIGCMultimodal
0 likes · 11 min read
JD Retail Technology
Mar 1, 2025 · Industry Insights

How JD Retail’s AI Assistant Uses Multimodal LLMs to Boost E‑Commerce

JD Retail’s AI assistant combines a Master‑Sub agent framework, ReAct paradigm, multimodal integration and MoE architecture to improve sales forecasting, pricing, and recommendation accuracy, while the team’s collaborative culture and open talent pathways illustrate how cutting‑edge AI is applied in real‑world e‑commerce.

AIJD RetailLLM
0 likes · 8 min read
AIWalker
Feb 20, 2025 · Artificial Intelligence

Transfusion: A Single Model for Unified Image Generation and Understanding

Transfusion is a 7B‑parameter transformer that jointly trains language modeling and diffusion losses on mixed text‑image data, enabling seamless text generation, image generation, and image understanding within one model and outperforming prior multimodal approaches such as Chameleon across multiple benchmarks.

AI researchLanguage ModelingMultimodal
0 likes · 20 min read
Architect
Feb 16, 2025 · Artificial Intelligence

DeepSeek-V3, DeepSeek-R1, and Janus‑Pro: Architecture, Training Techniques, and Performance Insights

This article provides an in‑depth technical overview of DeepSeek‑V3, DeepSeek‑R1 and Janus‑Pro models, covering their Mixture‑of‑Experts architecture, novel MLA attention, auxiliary‑loss‑free load balancing, multi‑token prediction, FP8 mixed‑precision training, efficient cross‑node communication, reinforcement‑learning pipelines, multimodal modeling strategies, performance comparisons, cost statistics, and current limitations.

AI ArchitectureDeepSeek-V3FP8 training
0 likes · 18 min read
AIWalker
Feb 16, 2025 · Artificial Intelligence

VARGPT: A Unified Autoregressive Architecture for Multimodal Understanding and Generation

VARGPT is a novel multimodal large language model that unifies visual understanding and autoregressive image generation within a single architecture, extending LLaVA with next‑token and next‑scale prediction, trained through three staged data‑curated phases and achieving superior performance on numerous vision‑language benchmarks.

AI researchMultimodalVARGPT
0 likes · 20 min read
Architects' Tech Alliance
Feb 16, 2025 · Artificial Intelligence

How DeepSeek’s Distillation Breaks Bottlenecks and Boosts Multimodal AI Performance

This article provides an in‑depth technical analysis of DeepSeek’s model distillation technology, covering its core principles, innovative data‑model fusion strategies, architecture design, training optimizations, performance benchmarks, and the remaining challenges of scaling distillation to multimodal tasks.

DeepSeekLarge Language ModelsMultimodal
0 likes · 16 min read
AI Code to Success
Jan 23, 2025 · Industry Insights

Core Tech vs Application Optimization: Where’s the Real Battleground in the AI Large‑Model Race?

The article analyzes the 2025 AI large‑model landscape, contrasting slowing foundational breakthroughs with fierce application competition, highlighting MiniMax’s low‑cost linear‑attention models, multimodal advances, and the strategic shift from price wars to sustainable, technology‑driven growth.

AIIndustry AnalysisMultimodal
0 likes · 7 min read
DataFunSummit
Jan 22, 2025 · Artificial Intelligence

RAG2.0 Engine Design Challenges and Implementation

This article presents a comprehensive overview of the RAG2.0 engine design, covering RAG1.0 limitations, effective chunking methods, accurate retrieval techniques, advanced multimodal processing, hybrid search strategies, database indexing choices, and future directions such as agentic RAG and memory‑enhanced models.

ChunkingHybrid SearchMultimodal
0 likes · 23 min read
AI Code to Success
Jan 16, 2025 · Industry Insights

How MiniMax’s Open‑Source Linear‑Attention Model Is Shaking Up the Global AI Landscape

MiniMax, a Shanghai‑based AI unicorn, has open‑sourced its MiniMax‑01 series featuring large‑scale linear attention, secured $600 million in funding, launched multimodal products like Talkie and Hailuo AI, and is positioning itself as a competitive force amid rising geopolitical tensions in the global artificial‑intelligence market.

AIChina AILinear Attention
0 likes · 4 min read
ZhongAn Tech Team
Jan 12, 2025 · Artificial Intelligence

AI Weekly Digest Issue 10: Market Insights, Industry Solutions, and Notable Technologies

This issue reviews recent AI industry developments, including Kai‑Fu Lee’s clarification of 01.AI’s strategy, Microsoft’s open‑source Phi‑4 model, the multimodal VITA‑1.5 release, and Hailuo AI’s advanced Chinese voice‑cloning technology, providing technical details and market implications.

AIMultimodalVoice Cloning
0 likes · 10 min read
Infra Learning Club
Jan 2, 2025 · Artificial Intelligence

Three Major LLM Trends in 2025: Ubiquitous Agents, Rising Small Models, and Multimodal Fusion

In 2025, large language models will see three key trends—agents becoming pervasive in daily life and industry, the emergence of efficient small models for edge and specialized tasks, and the integration of multimodal capabilities that combine text, images, and audio to enable more natural human‑machine interaction.

AI trendsAgentsLLM
0 likes · 4 min read
Programmer DD
Dec 31, 2024 · Artificial Intelligence

Build an AI‑Powered Expense Tracker with GLM‑4V‑Flash and MaxKB

This article demonstrates how to create an AI‑driven personal expense‑tracking assistant by leveraging Zhipu's GLM‑4V‑Flash multimodal model for receipt OCR, generating SQL statements, and integrating them with MaxKB workflows and a MySQL database, complete with code snippets and deployment steps.

AIGLM-4V-FlashMaxKB
0 likes · 13 min read
Baidu Geek Talk
Dec 25, 2024 · Industry Insights

How to Build a Multimodal Web Page Model for the LLM Era

This article examines the unique multimodal and multi‑granular nature of web pages, compares fusion strategies, proposes a cross‑modal attention approach, outlines fine‑ and coarse‑grained pre‑training tasks, and explores low‑cost adapter methods for adapting large multimodal models to web‑page modeling in the LLM era.

AIHTMLLLM adaptation
0 likes · 10 min read
DevOps
Dec 23, 2024 · Artificial Intelligence

Understanding AIGC Agents: Definition, Core Features, Underlying Logic, and Commercial Applications

This article explains what AIGC agents are, outlines their four main characteristics, describes the underlying transformer‑based architecture, dual‑stage learning, probabilistic generation and feedback optimization, and explores their current and future commercial use cases across content creation, knowledge bases, customer service, internal operations, and product design.

AIGCAgentArtificial Intelligence
0 likes · 14 min read
Tencent Cloud Developer
Dec 5, 2024 · Industry Insights

Why Most RAG Projects Fail and How Tencent’s LeXiang AI Assistant Overcomes Them

The article analyses the rapid growth of Retrieval‑Augmented Generation (RAG) in enterprises, explains why self‑built RAG solutions often collapse under cost and maintenance pressures, and demonstrates how Tencent LeXiang AI Assistant addresses these issues through a robust knowledge‑management core, extensive industry experience, scalable resources, and advanced multimodal capabilities.

AI assistantEnterprise AIKnowledge Management
0 likes · 16 min read
21CTO
Dec 4, 2024 · Artificial Intelligence

Introducing Pi-zero: A General‑Purpose AI Foundation Model for Robotics

Physical Intelligence's new Pi-zero model, built on a vision‑language foundation and fine‑tuned with extensive robot data, outperforms prior baselines across multiple tasks, showcasing the promise of large multimodal foundation models for flexible, robust robot control.

AIFoundation ModelsMultimodal
0 likes · 6 min read
NewBeeNLP
Dec 2, 2024 · Artificial Intelligence

What Are Today’s Unified Generation-and-Understanding Multimodal Model Architectures?

This article surveys current unified generation-and-understanding multimodal large-model architectures, compares LLM-centric and LLM-plus-diffusion designs, extracts common insights, details large-scale training tricks from models like Emu3, Chameleon and Janus, and outlines open research directions for visual encoders.

Large Language ModelsMultimodaldiffusion
0 likes · 5 min read
JD Retail Technology
Nov 14, 2024 · Artificial Intelligence

Improving Advertisement Image Generation with a Multimodal Reliable Feedback Network (ECCV 2024)

The paper introduces a Multimodal Reliable Feedback Network (RFNet) and a consistency‑condition regularization technique that together boost the usable rate of automatically generated advertisement images while preserving visual quality, supported by a new dataset of one million annotated images and extensive experiments presented at ECCV 2024.

AIDiffusion ModelsECCV2024
0 likes · 8 min read
Bilibili Tech
Nov 8, 2024 · Artificial Intelligence

AI-Powered Game Recognition for League of Legends Live Streaming on Bilibili

Bilibili’s AI‑driven game‑recognition system extracts real‑time LoL events through OCR, hero detection and hot‑spot tagging, generating high‑energy timestamps and interactive overlays that let viewers jump to key moments and view detailed statistics, enhancing spectator engagement and analytical capabilities across major esports tournaments.

AIGame RecognitionMultimodal
0 likes · 14 min read
Alibaba Cloud Big Data AI Platform
Nov 6, 2024 · Artificial Intelligence

Unlocking Long-Text Video Understanding and LLM Distillation with Alibaba PAI

Alibaba Cloud’s AI platform PAI had two papers accepted at EMNLP 2024: VideoCLIP‑XL, which improves video‑text representation for long descriptions using a large dataset of videos paired with long descriptions and novel pre‑training tasks, and TAPIR, a curriculum‑planning framework that distills the instruction‑following abilities of large language models. The team also released the associated models, datasets, and integration tools for users.

DistillationEMNLP2024Multimodal
0 likes · 8 min read
DataFunSummit
Nov 1, 2024 · Big Data

DataFun Summit Session Overview and E‑book Access Instructions

The article outlines how to obtain the DataFun Summit e‑book by following the public account instructions and provides concise English summaries of twelve technical sessions covering data lineage, integration, AI language models, multimodal content, game AI agents, lake‑warehouse governance, big‑data architecture, and cluster management.

AIBig DataData Integration
0 likes · 5 min read
AntTech
Oct 28, 2024 · Artificial Intelligence

Highlights of AI Large‑Model Sessions at CNCC 2024

The CNCC 2024 conference featured a series of expert talks on AI large‑model research, covering paradigm shifts in scientific discovery, knowledge enhancement and governance, data‑infrastructure analytics, vertical‑domain inference, diffusion‑model advances, multimodal model progress, and medical AI applications, illustrating the breadth and impact of large‑model technologies across multiple domains.

AIKnowledge GovernanceMultimodal
0 likes · 9 min read
JD Retail Technology
Oct 15, 2024 · Artificial Intelligence

Large‑Model‑Driven Evolution of E‑commerce Search and Recommendation at JD Retail

The article examines how large language models are reshaping JD Retail's e‑commerce search and recommendation pipelines, detailing industry evolution, technical challenges such as knowledge hallucination, intent understanding, personalization, cost, and safety, and presenting JD's end‑to‑end AIGC architecture, data preprocessing, alignment, evaluation, and next‑generation AI search solutions.

AIMultimodale-commerce
0 likes · 36 min read
DataFunTalk
Oct 1, 2024 · Artificial Intelligence

From Early AI to Superintelligence: Challenges and Prospects

The article reviews the evolution of artificial intelligence from early statistical models through deep learning and Transformer architectures, examines current breakthroughs like multimodal models, and discusses the technical, computational, and safety challenges that must be overcome before achieving artificial superintelligence (ASI).

AIArtificial IntelligenceMultimodal
0 likes · 8 min read
Data Thinking Notes
Sep 26, 2024 · Big Data

How Data Platforms Are Shifting from Cost Efficiency to Value in the AI Era

The talk reviews the evolution of data technologies from early database storage to today’s generative AI-driven era, highlighting how massive data, multimodal processing, and advanced analytics are transforming data systems from cost‑centered infrastructures to value‑focused ecosystems that empower intelligent agents, open data ecosystems, and new application paradigms.

Big DataData PlatformsData Value
0 likes · 19 min read
JD Tech Talk
Sep 23, 2024 · Artificial Intelligence

JD Advertising R&D: AI‑Driven Solutions for Traffic Valuation, Multimodal Understanding, Auction Mechanisms, Generative Recommendation, and Large‑Model Engineering

The JD Advertising R&D team applies cutting‑edge AI techniques—including query intent models, multimodal representation pipelines, reinforcement‑learning‑based auction mechanisms, generative recommendation with quantized product tokens, and large‑model infrastructure—to boost traffic valuation, ad relevance, revenue, and creative generation across the platform.

AIAdvertisingGraph Neural Networks
0 likes · 19 min read
JD Cloud Developers
Sep 23, 2024 · Artificial Intelligence

How JD’s Advertising Lab Leverages Large‑Scale AI to Transform E‑Commerce Ads

JD's advertising research team combines deep learning, multimodal modeling, reinforcement‑learning auctions, and generative recommendation to boost ad relevance, improve long‑tail product exposure, and overcome large‑model inference challenges in a high‑traffic e‑commerce environment.

Graph Neural NetworkMultimodaladvertising AI
0 likes · 22 min read
AntData
Sep 9, 2024 · Big Data

From Cost‑Efficiency to Value‑Centric: The Evolution of Data Systems in the Data+AI Era

The article reviews the rapid advances in generative AI and big‑data technologies, traces the historical development of data infrastructure, and argues that modern data systems are shifting from a cost‑efficiency focus to a value‑centric paradigm driven by multimodal, unstructured data, vector search and machine‑oriented services.

@DataArtificial IntelligenceBig Data
0 likes · 18 min read
JD Retail Technology
Sep 4, 2024 · Artificial Intelligence

Multimodal Recommendation Algorithms and System Architecture at JD.com

This article presents JD.com’s multimodal recommendation system architecture, covering content understanding, multimodal ranking and recall models, practical deployment pipelines, and future research directions such as large‑model integration and supply‑side generation, all illustrated with detailed diagrams and Q&A.

AIJD.comMultimodal
0 likes · 14 min read
AI Large Model Application Practice
Aug 29, 2024 · Artificial Intelligence

8 Essential Indexing Strategies to Boost Enterprise RAG Performance

This article presents eight practical optimization recommendations for the indexing stage of enterprise‑level Retrieval‑Augmented Generation (RAG) applications, covering chunk creation, abbreviation handling, multimodal document processing, semantic enrichment, metadata usage, alternative index types, and embedding model selection.

ChunkingIndexingMetadata
0 likes · 15 min read
DataFunSummit
Aug 29, 2024 · Artificial Intelligence

Intelligent NPC Practices in Tencent Games: Multi‑Modal LLM Solutions and System Optimizations

This article details Tencent Games' end‑to‑end approach to building intelligent NPCs, covering the opportunities brought by AI, the practical implementation of multimodal LLM‑driven dialogue, knowledge‑augmented retrieval, long‑context handling, safety measures, multimodal expression (voice and facial animation), and system‑level performance optimizations for real‑time deployment.

AILLMMultimodal
0 likes · 18 min read
DataFunSummit
Aug 25, 2024 · Artificial Intelligence

Applying Large AI Models to Financial Data Governance and Innovative Use Cases

This article presents a comprehensive technical overview of how large AI models are reshaping financial data production, governance, multimodal document understanding, lakehouse storage, private‑domain model deployment, data‑centric engineering methods, and multi‑agent intelligent advisory within the finance sector.

AIMultimodalRAG
0 likes · 21 min read
NewBeeNLP
Aug 15, 2024 · Industry Insights

Decoding Xiaohongshu’s Decentralized Recommendation: Sideinfo and Multimodal Fusion

This article analyzes how Xiaohongshu addresses the decentralization challenge in its recommendation system by strengthening side‑information usage, integrating multimodal signals across the full pipeline, and implementing interest exploration and protection mechanisms, while also outlining future research directions such as generative recommendation and large‑model‑driven user profiling.

Multimodaldecentralized-distributiongraph
0 likes · 25 min read