Tagged articles
374 articles
Page 3 of 4
Meituan Technology Team
Meituan Technology Team
Mar 27, 2025 · Artificial Intelligence

Q-Eval-100K Dataset and Q-Eval-Score Evaluation Framework for Text-to-Visual Generation

The Q‑Eval‑100K dataset, comprising 100 k AIGC images and videos with separate visual‑quality and textual‑consistency annotations, powers the open‑source Q‑Eval‑Score framework that fine‑tunes multimodal models to deliver state‑of‑the‑art, scalable, and objective evaluation—including a “vague‑to‑specific” strategy for long prompts—surpassing existing benchmarks.

AIGCDatasetevaluation
0 likes · 9 min read
Q-Eval-100K Dataset and Q-Eval-Score Evaluation Framework for Text-to-Visual Generation
37 Interactive Technology Team
37 Interactive Technology Team
Mar 26, 2025 · Artificial Intelligence

LUI vs GUI: Choosing the Right Interface for AI Product Design

When designing AI products, choosing between a Language User Interface—leveraging speech recognition, NLP, and conversational flexibility—and a Graphical User Interface—relying on visual icons, layouts, and intuitive interaction—depends on technology maturity, response speed, and user learning cost, while emerging multimodal designs increasingly blend both for richer, context‑aware experiences.

AIDesignGUI
0 likes · 11 min read
LUI vs GUI: Choosing the Right Interface for AI Product Design
JD Retail Technology
JD Retail Technology
Mar 25, 2025 · Artificial Intelligence

2024 Advances in Advertising Creative Generation and Selection

In 2024 the advertising team deployed an end‑to‑end AIGC pipeline that automatically creates high‑quality ad images, uses the multimodal Reliable Feedback Network and the million‑size RF1M dataset to filter outputs, builds rich offline and online multimodal representations with contrastive and list‑wise learning, and optimizes ranking architecture to deliver scalable, personalized creative selection.

AIAIGCAdvertising
0 likes · 10 min read
2024 Advances in Advertising Creative Generation and Selection
Amap Tech
Amap Tech
Mar 19, 2025 · Artificial Intelligence

Driving by the Rules: Integrating Lane-Level Traffic Regulations into Online HD Maps

Gaode Map and Xi'an Jiaotong University introduce the “Driving by the Rules” task, releasing the MapDR benchmark that integrates lane‑level traffic‑sign regulations into online‑constructed HD maps, and provide modular (VLE‑MEE) and end‑to‑end (RuleVLM) baselines to evaluate rule extraction and lane association.

AIBenchmarkDataset
0 likes · 8 min read
Driving by the Rules: Integrating Lane-Level Traffic Regulations into Online HD Maps
IT Services Circle
IT Services Circle
Mar 19, 2025 · Artificial Intelligence

ByteDance’s AI Video Generation Model Goku, Streamer‑Sales Live‑Selling Model, and MimicTalk 3D Talking‑Head Project

ByteDance and partners open‑source three AI projects—Goku for high‑quality text‑to‑video generation, Streamer‑Sales for multimodal live‑selling LLMs, and MimicTalk for rapid 3D talking‑head creation—detailing their core features, underlying transformer‑based architectures, training pipelines, and public repositories.

AI video generationTransformerVirtual digital human
0 likes · 5 min read
ByteDance’s AI Video Generation Model Goku, Streamer‑Sales Live‑Selling Model, and MimicTalk 3D Talking‑Head Project
JD Tech Talk
JD Tech Talk
Mar 19, 2025 · Artificial Intelligence

Reliable Advertising Image Generation and Creative Selection Using Multimodal Feedback and MLLM Representations

The 2024 advertising team introduced a suite of AI‑driven techniques—including a trustworthy feedback network, a large‑scale human‑annotated dataset, multimodal large language model representations, and online ranking architecture upgrades—to dramatically improve the quality, coverage, and personalization of generated ad creatives.

AIGCAdvertisingMLLM
0 likes · 10 min read
Reliable Advertising Image Generation and Creative Selection Using Multimodal Feedback and MLLM Representations
JD Cloud Developers
JD Cloud Developers
Mar 19, 2025 · Artificial Intelligence

How AIGC Boosts Ad Creative Quality: Trustworthy Image Generation & Selection

2024 saw the advertising team achieve major breakthroughs in AI-generated ad creatives by introducing a multimodal reliable feedback network to improve image usability, releasing a large human-annotated dataset, and leveraging multimodal large language models for richer representation and more effective online/offline creative selection.

AIGCad optimizationimage generation
0 likes · 10 min read
How AIGC Boosts Ad Creative Quality: Trustworthy Image Generation & Selection
NewBeeNLP
NewBeeNLP
Mar 18, 2025 · Interview Experience

How to Ace Multimodal Model Interviews at Taobao's Search AI Division

This article recounts a three‑stage interview for a multimodal large‑model position at Taobao's Search AI division, detailing typical questions on CLIP, LoRA, BLIP, Qwen‑VL, Transformer fundamentals, RLHF, and coding challenges, and offers insights on what interviewers focus on.

AICLIPLoRA
0 likes · 5 min read
How to Ace Multimodal Model Interviews at Taobao's Search AI Division
Code Mala Tang
Code Mala Tang
Mar 15, 2025 · Artificial Intelligence

What Makes Google’s New Gemma 3 Model a Game‑Changer for AI Developers?

Google’s Gemma 3, a lightweight open‑source model with up to 27 billion parameters, offers multimodal input, 128K token context, and broad language support, outperforming leading rivals on single‑GPU benchmarks and providing flexible deployment options for developers and researchers alike.

AI modelGemma 3Google AI
0 likes · 9 min read
What Makes Google’s New Gemma 3 Model a Game‑Changer for AI Developers?
AIWalker
AIWalker
Mar 7, 2025 · Artificial Intelligence

How GIFNet’s Low‑Level Interaction Breakthrough Enables Universal Multimodal Fusion Across Tasks

The paper introduces GIFNet, a three‑branch network that leverages low‑level visual tasks and a cross‑fusion gating mechanism to achieve a single, task‑agnostic image‑fusion model with dramatically reduced computation, strong generalization to unseen modalities, and even single‑modal enhancement capabilities.

CVPR2025Computer VisionGIFNet
0 likes · 20 min read
How GIFNet’s Low‑Level Interaction Breakthrough Enables Universal Multimodal Fusion Across Tasks
DaTaobao Tech
DaTaobao Tech
Mar 7, 2025 · Artificial Intelligence

Taobao Content AI: Summary of AIGC Content Generation and Multimodal Model Techniques

Taobao’s AIGC pipeline combines a human‑feedback multimodal reward model, audio‑visual joint pre‑training, and Mixture‑of‑Experts distillation to clean data, align outputs with user preferences, and achieve state‑of‑the‑art multimodal LLM performance that drives content cold‑start and conversion gains in e‑commerce.

AIGCContent GenerationReward model
0 likes · 10 min read
Taobao Content AI: Summary of AIGC Content Generation and Multimodal Model Techniques
Cognitive Technology Team
Cognitive Technology Team
Mar 7, 2025 · Artificial Intelligence

From Word Embeddings to Large Language Models: A Comprehensive Overview of AI Model Evolution

This article traces the development of AI models—from early word embeddings like Word2Vec and ELMo, through transformer‑based encoders such as BERT and decoder‑only models like GPT‑1/2/3, to recent multimodal systems and scaling laws—explaining their architectures, training methods, and impact on modern AI applications.

AIEmbeddingTransformer
0 likes · 22 min read
From Word Embeddings to Large Language Models: A Comprehensive Overview of AI Model Evolution
DaTaobao Tech
DaTaobao Tech
Mar 5, 2025 · Artificial Intelligence

Multimodal Large‑Model Cover Generation AI Agent for Taobao Video and Live Streams

Taobao’s new multimodal AI Agent automatically creates high‑quality static and dynamic video covers by planning tasks, consulting a memory of quality criteria, executing frame selection with ReKV streaming and dual‑stage evaluation, generating marketing copy via fine‑tuned Qwen2.5‑7B, and refining layout, resulting in significantly higher click‑through rates, lower latency, and reduced manual effort.

AIVideo processingcover generation
0 likes · 17 min read
Multimodal Large‑Model Cover Generation AI Agent for Taobao Video and Live Streams
DaTaobao Tech
DaTaobao Tech
Mar 3, 2025 · Artificial Intelligence

How Taobao’s “Faxiang” AI Model Revolutionizes E‑Commerce Video Generation

Taobao’s AIGC video generation platform, built on a large‑scale “Faxiang” model that evolved from UNet to DiT, leverages over 2 billion curated e‑commerce videos, expert alignment, Lora fine‑tuning, and multi‑control capabilities to deliver diverse, high‑quality product videos that dramatically boost conversion metrics across the marketplace.

AI video generationAIGCe‑commerce
0 likes · 11 min read
How Taobao’s “Faxiang” AI Model Revolutionizes E‑Commerce Video Generation
JD Retail Technology
JD Retail Technology
Mar 1, 2025 · Industry Insights

How JD Retail’s AI Assistant Uses Multimodal LLMs to Boost E‑Commerce

JD Retail’s AI assistant combines a Master‑Sub agent framework, ReAct paradigm, multimodal integration and MoE architecture to improve sales forecasting, pricing, and recommendation accuracy, while the team’s collaborative culture and open talent pathways illustrate how cutting‑edge AI is applied in real‑world e‑commerce.

AIJD RetailLLM
0 likes · 8 min read
How JD Retail’s AI Assistant Uses Multimodal LLMs to Boost E‑Commerce
AIWalker
AIWalker
Feb 20, 2025 · Artificial Intelligence

Transfusion: A Single Model for Unified Image Generation and Understanding

Transfusion is a 7B‑parameter transformer that jointly trains language modeling and diffusion losses on mixed text‑image data, enabling seamless text generation, image generation, and image understanding within one model and outperforming prior multimodal approaches such as Chameleon across multiple benchmarks.

AI researchLanguage ModelingTransformer
0 likes · 20 min read
Transfusion: A Single Model for Unified Image Generation and Understanding
Architect
Architect
Feb 16, 2025 · Artificial Intelligence

DeepSeek-V3, DeepSeek-R1, and Janus‑Pro: Architecture, Training Techniques, and Performance Insights

This article provides an in‑depth technical overview of DeepSeek‑V3, DeepSeek‑R1 and Janus‑Pro models, covering their Mixture‑of‑Experts architecture, novel MLA attention, auxiliary‑loss‑free load balancing, multi‑token prediction, FP8 mixed‑precision training, efficient cross‑node communication, reinforcement‑learning pipelines, multimodal modeling strategies, performance comparisons, cost statistics, and current limitations.

AI ArchitectureDeepSeek-V3FP8 training
0 likes · 18 min read
DeepSeek-V3, DeepSeek-R1, and Janus‑Pro: Architecture, Training Techniques, and Performance Insights
AIWalker
AIWalker
Feb 16, 2025 · Artificial Intelligence

VARGPT: A Unified Autoregressive Architecture for Multimodal Understanding and Generation

VARGPT is a novel multimodal large language model that unifies visual understanding and autoregressive image generation within a single architecture, extending LLaVA with next‑token and next‑scale prediction, trained through three staged data‑curated phases and achieving superior performance on numerous vision‑language benchmarks.

AI researchVARGPTimage generation
0 likes · 20 min read
VARGPT: A Unified Autoregressive Architecture for Multimodal Understanding and Generation
Architects' Tech Alliance
Architects' Tech Alliance
Feb 16, 2025 · Artificial Intelligence

How DeepSeek’s Distillation Breaks Bottlenecks and Boosts Multimodal AI Performance

This article provides an in‑depth technical analysis of DeepSeek’s model distillation technology, covering its core principles, innovative data‑model fusion strategies, architecture design, training optimizations, performance benchmarks, and the remaining challenges of scaling distillation to multimodal tasks.

AI OptimizationDeepSeeklarge language models
0 likes · 16 min read
How DeepSeek’s Distillation Breaks Bottlenecks and Boosts Multimodal AI Performance
Ops Development & AI Practice
Ops Development & AI Practice
Feb 10, 2025 · Artificial Intelligence

What’s Inside Google Gemini 2.0 Pro? Free Pricing, Multimodal Power & Real‑Time Streaming

The article reviews Google Gemini 2.0 Pro Experimental, detailing its free‑during‑experiment pricing, multimodal understanding, real‑time streaming, native tool integration, usage limits, latency controls, and practical scenarios such as large‑scale code processing and live media handling.

AIGeminiRealtime Streaming
0 likes · 5 min read
What’s Inside Google Gemini 2.0 Pro? Free Pricing, Multimodal Power & Real‑Time Streaming
AI Code to Success
AI Code to Success
Jan 23, 2025 · Industry Insights

Core Tech vs Application Optimization: Where’s the Real Battleground in the AI Large‑Model Race?

The article analyzes the 2025 AI large‑model landscape, contrasting slowing foundational breakthroughs with fierce application competition, highlighting MiniMax’s low‑cost linear‑attention models, multimodal advances, and the strategic shift from price wars to sustainable, technology‑driven growth.

AIIndustry analysislarge models
0 likes · 7 min read
Core Tech vs Application Optimization: Where’s the Real Battleground in the AI Large‑Model Race?
DataFunSummit
DataFunSummit
Jan 22, 2025 · Artificial Intelligence

RAG2.0 Engine Design Challenges and Implementation

This article presents a comprehensive overview of the RAG2.0 engine design, covering RAG1.0 limitations, effective chunking methods, accurate retrieval techniques, advanced multimodal processing, hybrid search strategies, database indexing choices, and future directions such as agentic RAG and memory‑enhanced models.

Hybrid SearchRAGRetrieval Augmented Generation
0 likes · 23 min read
RAG2.0 Engine Design Challenges and Implementation
AI Code to Success
AI Code to Success
Jan 16, 2025 · Industry Insights

How MiniMax’s Open‑Source Linear‑Attention Model Is Shaking Up the Global AI Landscape

MiniMax, a Shanghai‑based AI unicorn, has open‑sourced its MiniMax‑01 series featuring large‑scale linear attention, secured $600 million in funding, launched multimodal products like Talkie and Hailuo AI, and is positioning itself as a competitive force amid rising geopolitical tensions in the global artificial‑intelligence market.

AIChina AILinear Attention
0 likes · 4 min read
How MiniMax’s Open‑Source Linear‑Attention Model Is Shaking Up the Global AI Landscape
ZhongAn Tech Team
ZhongAn Tech Team
Jan 12, 2025 · Artificial Intelligence

AI Weekly Digest Issue 10: Market Insights, Industry Solutions, and Notable Technologies

This issue reviews recent AI industry developments, including Lee Kai‑fu’s clarification on Zero‑One’s strategy, Microsoft’s open‑source Phi‑4 model, the multimodal VITA‑1.5 release, and HaiLuo AI’s advanced Chinese voice‑cloning technology, providing technical details and market implications.

AImultimodalvoice cloning
0 likes · 10 min read
AI Weekly Digest Issue 10: Market Insights, Industry Solutions, and Notable Technologies
Infra Learning Club
Infra Learning Club
Jan 2, 2025 · Artificial Intelligence

Three Major LLM Trends in 2025: Ubiquitous Agents, Rising Small Models, and Multimodal Fusion

In 2025, large language models will see three key trends—agents becoming pervasive in daily life and industry, the emergence of efficient small models for edge and specialized tasks, and the integration of multimodal capabilities that combine text, images, and audio to enable more natural human‑machine interaction.

AI trendsLLMagents
0 likes · 4 min read
Three Major LLM Trends in 2025: Ubiquitous Agents, Rising Small Models, and Multimodal Fusion
Programmer DD
Programmer DD
Dec 31, 2024 · Artificial Intelligence

Build an AI‑Powered Expense Tracker with GLM‑4V‑Flash and MaxKB

This article demonstrates how to create an AI‑driven personal expense‑tracking assistant by leveraging Zhipu's GLM‑4V‑Flash multimodal model for receipt OCR, generating SQL statements, and integrating them with MaxKB workflows and a MySQL database, complete with code snippets and deployment steps.

AIGLM-4V-FlashMaxKB
0 likes · 13 min read
Build an AI‑Powered Expense Tracker with GLM‑4V‑Flash and MaxKB
Baidu Geek Talk
Baidu Geek Talk
Dec 25, 2024 · Industry Insights

How to Build a Multimodal Web Page Model for the LLM Era

This article examines the unique multimodal and multi‑granular nature of web pages, compares fusion strategies, proposes a cross‑modal attention approach, outlines fine‑ and coarse‑grained pre‑training tasks, and explores low‑cost adaptor methods for adapting large multimodal models to web‑page modeling in the LLM era.

AIHTMLLLM adaptation
0 likes · 10 min read
How to Build a Multimodal Web Page Model for the LLM Era
DevOps
DevOps
Dec 23, 2024 · Artificial Intelligence

Understanding AIGC Agents: Definition, Core Features, Underlying Logic, and Commercial Applications

This article explains what AIGC agents are, outlines their four main characteristics, describes the underlying transformer‑based architecture, dual‑stage learning, probabilistic generation and feedback optimization, and explores their current and future commercial use cases across content creation, knowledge bases, customer service, internal operations, and product design.

AIGCAgentContent Generation
0 likes · 14 min read
Understanding AIGC Agents: Definition, Core Features, Underlying Logic, and Commercial Applications
Tencent Cloud Developer
Tencent Cloud Developer
Dec 5, 2024 · Industry Insights

Why Most RAG Projects Fail and How Tencent’s LeXiang AI Assistant Overcomes Them

The article analyses the rapid growth of Retrieval‑Augmented Generation (RAG) in enterprises, explains why self‑built RAG solutions often collapse under cost and maintenance pressures, and demonstrates how Tencent LeXiang AI Assistant addresses these issues through a robust knowledge‑management core, extensive industry experience, scalable resources, and advanced multimodal capabilities.

AI AssistantEnterprise AIRAG
0 likes · 16 min read
Why Most RAG Projects Fail and How Tencent’s LeXiang AI Assistant Overcomes Them
21CTO
21CTO
Dec 4, 2024 · Artificial Intelligence

Introducing Pi-zero: A General‑Purpose AI Foundation Model for Robotics

Physical Intelligence's new Pi-zero model, built on a vision‑language foundation and fine‑tuned with extensive robot data, outperforms prior baselines across multiple tasks, showcasing the promise of large multimodal foundation models for flexible, robust robot control.

AIPi-zerofoundation-models
0 likes · 6 min read
Introducing Pi-zero: A General‑Purpose AI Foundation Model for Robotics
NewBeeNLP
NewBeeNLP
Dec 2, 2024 · Artificial Intelligence

What Are Today’s Unified Generation-and-Understanding Multimodal Model Architectures?

This article surveys current unified generation-and-understanding multimodal large-model architectures, compares LLM-centric and LLM-plus-diffusion designs, extracts common insights, details large-scale training tricks from models like Emu3, Chameleon and Janus, and outlines open research directions for visual encoders.

diffusionlarge language modelsmultimodal
0 likes · 5 min read
What Are Today’s Unified Generation-and-Understanding Multimodal Model Architectures?
JD Retail Technology
JD Retail Technology
Nov 14, 2024 · Artificial Intelligence

Improving Advertisement Image Generation with a Multimodal Reliable Feedback Network (ECCV 2024)

The paper introduces a Multimodal Reliable Feedback Network (RFNet) and a consistency‑condition regularization technique that together boost the usable rate of automatically generated advertisement images while preserving visual quality, supported by a new million‑image annotated dataset and extensive ECCV‑2024 experiments.

AIECCV2024advertisement generation
0 likes · 8 min read
Improving Advertisement Image Generation with a Multimodal Reliable Feedback Network (ECCV 2024)
Bilibili Tech
Bilibili Tech
Nov 8, 2024 · Artificial Intelligence

AI-Powered Game Recognition for League of Legends Live Streaming on Bilibili

Bilibili’s AI‑driven game‑recognition system extracts real‑time LoL events through OCR, hero detection and hot‑spot tagging, generating high‑energy timestamps and interactive overlays that let viewers jump to key moments and view detailed statistics, enhancing spectator engagement and analytical capabilities across major esports tournaments.

AIComputer VisionGame Recognition
0 likes · 14 min read
AI-Powered Game Recognition for League of Legends Live Streaming on Bilibili
Alibaba Cloud Big Data AI Platform
Alibaba Cloud Big Data AI Platform
Nov 6, 2024 · Artificial Intelligence

Unlocking Long-Text Video Understanding and LLM Distillation with Alibaba PAI

Alibaba Cloud’s AI platform PAI recently saw two papers accepted at EMNLP2024—VideoCLIP‑XL, which enhances video‑text representation for long descriptions using a large video‑long‑description dataset and novel pre‑training tasks, and TAPIR, a curriculum‑planning framework that distills instruction‑following abilities of large language models—while also releasing associated models, datasets, and integration tools for users.

DistillationEMNLP2024large-language-models
0 likes · 8 min read
Unlocking Long-Text Video Understanding and LLM Distillation with Alibaba PAI
DataFunSummit
DataFunSummit
Nov 1, 2024 · Big Data

DataFun Summit Session Overview and E‑book Access Instructions

The article outlines how to obtain the DataFun Summit e‑book by following the public account instructions and provides concise English summaries of twelve technical sessions covering data lineage, integration, AI language models, multimodal content, game AI agents, lake‑warehouse governance, big‑data architecture, and cluster management.

AIBig DataData Integration
0 likes · 5 min read
DataFun Summit Session Overview and E‑book Access Instructions
AntTech
AntTech
Oct 28, 2024 · Artificial Intelligence

Highlights of AI Large‑Model Sessions at CNCC 2024

The CNCC 2024 conference featured a series of expert talks on AI large‑model research, covering paradigm shifts in scientific discovery, knowledge enhancement and governance, data‑infrastructure analytics, vertical‑domain inference, diffusion‑model advances, multimodal model progress, and medical AI applications, illustrating the breadth and impact of large‑model technologies across multiple domains.

AIKnowledge Governancemultimodal
0 likes · 9 min read
Highlights of AI Large‑Model Sessions at CNCC 2024
JD Retail Technology
JD Retail Technology
Oct 15, 2024 · Artificial Intelligence

Large‑Model‑Driven Evolution of E‑commerce Search and Recommendation at JD Retail

The article examines how large language models are reshaping JD Retail's e‑commerce search and recommendation pipelines, detailing industry evolution, technical challenges such as knowledge hallucination, intent understanding, personalization, cost, and safety, and presenting JD's end‑to‑end AIGC architecture, data preprocessing, alignment, evaluation, and next‑generation AI search solutions.

AIKnowledge Graphe‑commerce
0 likes · 36 min read
Large‑Model‑Driven Evolution of E‑commerce Search and Recommendation at JD Retail
DataFunTalk
DataFunTalk
Oct 1, 2024 · Artificial Intelligence

From Early AI to Superintelligence: Challenges and Prospects

The article reviews the evolution of artificial intelligence from early statistical models through deep learning and Transformer architectures, examines current breakthroughs like multimodal models, and discusses the technical, computational, and safety challenges that must be overcome before achieving artificial superintelligence (ASI).

AISuperintelligenceTransformer
0 likes · 8 min read
From Early AI to Superintelligence: Challenges and Prospects
Data Thinking Notes
Data Thinking Notes
Sep 26, 2024 · Big Data

How Data Platforms Are Shifting from Cost Efficiency to Value in the AI Era

The talk reviews the evolution of data technologies from early database storage to today’s generative AI-driven era, highlighting how massive data, multimodal processing, and advanced analytics are transforming data systems from cost‑centered infrastructures to value‑focused ecosystems that empower intelligent agents, open data ecosystems, and new application paradigms.

Big DataData PlatformsData Value
0 likes · 19 min read
How Data Platforms Are Shifting from Cost Efficiency to Value in the AI Era
JD Tech Talk
JD Tech Talk
Sep 23, 2024 · Artificial Intelligence

JD Advertising R&D: AI‑Driven Solutions for Traffic Valuation, Multimodal Understanding, Auction Mechanisms, Generative Recommendation, and Large‑Model Engineering

The JD Advertising R&D team applies cutting‑edge AI techniques—including query intent models, multimodal representation pipelines, reinforcement‑learning‑based auction mechanisms, generative recommendation with quantized product tokens, and large‑model infrastructure—to boost traffic valuation, ad relevance, revenue, and creative generation across the platform.

AIAdvertisinggraph neural networks
0 likes · 19 min read
JD Advertising R&D: AI‑Driven Solutions for Traffic Valuation, Multimodal Understanding, Auction Mechanisms, Generative Recommendation, and Large‑Model Engineering
JD Cloud Developers
JD Cloud Developers
Sep 23, 2024 · Artificial Intelligence

How JD’s Advertising Lab Leverages Large‑Scale AI to Transform E‑Commerce Ads

JD's advertising research team combines deep learning, multimodal modeling, reinforcement‑learning auctions, and generative recommendation to boost ad relevance, improve long‑tail product exposure, and overcome large‑model inference challenges in a high‑traffic e‑commerce environment.

Graph Neural Networkadvertising AIe‑commerce
0 likes · 22 min read
How JD’s Advertising Lab Leverages Large‑Scale AI to Transform E‑Commerce Ads
AntData
AntData
Sep 9, 2024 · Big Data

From Cost‑Efficiency to Value‑Centric: The Evolution of Data Systems in the Data+AI Era

The article reviews the rapid advances in generative AI and big‑data technologies, traces the historical development of data infrastructure, and argues that modern data systems are shifting from a cost‑efficiency focus to a value‑centric paradigm driven by multimodal, non‑structured data, vector search and machine‑oriented services.

@DataBig DataData Value
0 likes · 18 min read
From Cost‑Efficiency to Value‑Centric: The Evolution of Data Systems in the Data+AI Era
JD Retail Technology
JD Retail Technology
Sep 4, 2024 · Artificial Intelligence

Multimodal Recommendation Algorithms and System Architecture at JD.com

This article presents JD.com’s multimodal recommendation system architecture, covering content understanding, multimodal ranking and recall models, practical deployment pipelines, and future research directions such as large‑model integration and supply‑side generation, all illustrated with detailed diagrams and Q&A.

AIJD.comcontent understanding
0 likes · 14 min read
Multimodal Recommendation Algorithms and System Architecture at JD.com
AI Large Model Application Practice
AI Large Model Application Practice
Aug 29, 2024 · Artificial Intelligence

8 Essential Indexing Strategies to Boost Enterprise RAG Performance

This article presents eight practical optimization recommendations for the indexing stage of enterprise‑level Retrieval‑Augmented Generation (RAG) applications, covering chunk creation, abbreviation handling, multimodal document processing, semantic enrichment, metadata usage, alternative index types, and embedding model selection.

RAGchunkingindexing
0 likes · 15 min read
8 Essential Indexing Strategies to Boost Enterprise RAG Performance
DataFunSummit
DataFunSummit
Aug 29, 2024 · Artificial Intelligence

Intelligent NPC Practices in Tencent Games: Multi‑Modal LLM Solutions and System Optimizations

This article details Tencent Game's end‑to‑end approach to building intelligent NPCs, covering the opportunities brought by AI, the practical implementation of multimodal LLM‑driven dialogue, knowledge‑augmented retrieval, long‑context handling, safety measures, multimodal expression (voice and facial animation), and system‑level performance optimizations for real‑time deployment.

AILLMNPC
0 likes · 18 min read
Intelligent NPC Practices in Tencent Games: Multi‑Modal LLM Solutions and System Optimizations
DataFunSummit
DataFunSummit
Aug 25, 2024 · Artificial Intelligence

Applying Large AI Models to Financial Data Governance and Innovative Use Cases

This article presents a comprehensive technical overview of how large AI models are reshaping financial data production, governance, multimodal document understanding, lakehouse storage, private‑domain model deployment, data‑centric engineering methods, and multi‑agent intelligent advisory within the finance sector.

AIMulti-AgentRAG
0 likes · 21 min read
Applying Large AI Models to Financial Data Governance and Innovative Use Cases
NewBeeNLP
NewBeeNLP
Aug 15, 2024 · Industry Insights

Decoding Xiaohongshu’s Decentralized Recommendation: Sideinfo and Multimodal Fusion

This article analyzes how Xiaohongshu addresses the decentralization challenge in its recommendation system by strengthening side‑information usage, integrating multimodal signals across the full pipeline, and implementing interest exploration and protection mechanisms, while also outlining future research directions such as generative recommendation and large‑model‑driven user profiling.

decentralized-distributiongraphinterest-exploration
0 likes · 25 min read
Decoding Xiaohongshu’s Decentralized Recommendation: Sideinfo and Multimodal Fusion
AntTech
AntTech
Aug 13, 2024 · Artificial Intelligence

Ant Group Contributions to ACL 2024: Summaries of 14 Accepted Papers Across NLP and AI

From August 11‑16, 2024 the ACL conference in Bangkok featured 14 Ant Group papers covering large‑scale information extraction, decomposed LLMs for semantic search, multimodal hallucination detection, long‑context attention mechanisms, concept‑reasoning datasets, knowledge‑graph alignment, and more, highlighting the group's breadth in natural language processing and AI research.

ACL2024Information ExtractionNLP
0 likes · 20 min read
Ant Group Contributions to ACL 2024: Summaries of 14 Accepted Papers Across NLP and AI
DataFunSummit
DataFunSummit
Jul 28, 2024 · Artificial Intelligence

Leveraging Large Language Models for Graph Learning: Opportunities, Current Progress, and Future Directions

This article reviews why large language models can be applied to graph learning, outlines their capabilities and graph data characteristics, surveys current research across different graph types and LLM roles, and proposes future research directions for unified cross‑domain graph learning.

AIResearch Directionsgraph learning
0 likes · 19 min read
Leveraging Large Language Models for Graph Learning: Opportunities, Current Progress, and Future Directions
Tencent Cloud Developer
Tencent Cloud Developer
Jul 18, 2024 · Artificial Intelligence

Exploring Large Language Models (LLM): Fundamentals, Applications, and Future Directions

Exploring Large Language Models, this article surveys their core concepts, evolution through Transformers, GPT and BERT, generation challenges, diverse applications such as QA, multimodal creation, summarization and retrieval‑augmented generation, prompt‑engineering frameworks and tools, LangChain‑based pipelines, AI‑driven agents, and future prospects toward domain‑specific use, multimodality, and AGI.

AIAgentLLM
0 likes · 35 min read
Exploring Large Language Models (LLM): Fundamentals, Applications, and Future Directions
Architects' Tech Alliance
Architects' Tech Alliance
Jul 10, 2024 · Industry Insights

Why AI Large Models Are Driving the Next Industrial Revolution

The article analyzes the rapid evolution of AI large models—from their role in advancing AGI through massive pre‑training and fine‑tuning, to current market dynamics led by GPT and domestic Chinese players, and finally to future multimodal applications, content‑factory capabilities, and emerging AIGC revenue models projected to reach trillion‑yuan scales by 2030.

AIAIGCGPT
0 likes · 7 min read
Why AI Large Models Are Driving the Next Industrial Revolution
Baobao Algorithm Notes
Baobao Algorithm Notes
Jul 8, 2024 · Industry Insights

Why Large‑Model Deployment Stalls: Robots, Scaling Laws, and Multimodal Frontiers

The article analyzes current challenges in deploying large AI models, covering robot automation, scaling‑law limits, vertical‑domain use cases, multimodal breakthroughs, algorithmic evolution, and the hardware‑software trade‑offs of training and inference infrastructures, while questioning ROI and practical feasibility.

Roboticsalgorithm evolutioninference infrastructure
0 likes · 21 min read
Why Large‑Model Deployment Stalls: Robots, Scaling Laws, and Multimodal Frontiers
AI Large Model Application Practice
AI Large Model Application Practice
Jul 4, 2024 · Artificial Intelligence

Mastering Multimodal RAG: From PDF Parsing to Advanced Query Rewriting

This article explains how to handle complex multimodal PDFs in RAG systems, outlines extraction, indexing, and multimodal model integration, details four query‑rewriting strategies (HyDE, stepwise, sub‑question, backward), and presents key evaluation metrics and tools for assessing RAG performance.

Document ParsingQuery RewritingRAG
0 likes · 12 min read
Mastering Multimodal RAG: From PDF Parsing to Advanced Query Rewriting
360 Tech Engineering
360 Tech Engineering
Jul 3, 2024 · Artificial Intelligence

360LayoutAnalysis: Open‑Source Lightweight Document Layout Analysis Models for Multiple Scenarios

The 360LayoutAnalysis project from 360 AI Lab releases lightweight, yolov8‑based layout analysis models covering Chinese and English papers, Chinese research reports, and a general document scenario, providing fast inference, paragraph‑level detection, and open‑source code and weights for flexible document‑understanding pipelines.

AI modelLayout AnalysisYOLOv8
0 likes · 9 min read
360LayoutAnalysis: Open‑Source Lightweight Document Layout Analysis Models for Multiple Scenarios
JD Tech
JD Tech
Jun 28, 2024 · Artificial Intelligence

An Overview of Large Language Models: History, Fundamentals, Prompt Engineering, Retrieval‑Augmented Generation, Agents, and Multimodal AI

This article provides a comprehensive introduction to large language models, covering their historical development, core architecture, training process, prompt engineering techniques, Retrieval‑Augmented Generation, agent frameworks, multimodal capabilities, safety challenges, and future research directions.

AI SafetyAI agentsDeep Learning
0 likes · 22 min read
An Overview of Large Language Models: History, Fundamentals, Prompt Engineering, Retrieval‑Augmented Generation, Agents, and Multimodal AI
AntTech
AntTech
Jun 18, 2024 · Artificial Intelligence

Ant Group’s 24 Papers Featured at CVPR2024: Topics and Abstracts

The IEEE CVPR2024 conference in Seattle accepted 2,719 papers out of 11,532 submissions, and Ant Group contributed 24 papers covering computer vision, deep learning, digital humans, large models, multimodal remote sensing, vision‑language distillation, federated incremental learning, model‑stealing defense, and more, with one highlighted as a highlight.

Ant GroupCVPR2024Computer Vision
0 likes · 17 min read
Ant Group’s 24 Papers Featured at CVPR2024: Topics and Abstracts
NewBeeNLP
NewBeeNLP
Jun 18, 2024 · Artificial Intelligence

How Shopee Builds an E‑Commerce Knowledge Graph and Leverages Large Models

This article presents Shopee's comprehensive approach to constructing an e‑commerce knowledge graph, detailing the challenges of heterogeneous data, multi‑language handling, entity disambiguation, and the integration of deep learning and large language models to improve product matching, recommendation, and operational efficiency.

AIKnowledge Graphentity disambiguation
0 likes · 22 min read
How Shopee Builds an E‑Commerce Knowledge Graph and Leverages Large Models
DataFunTalk
DataFunTalk
Jun 14, 2024 · Artificial Intelligence

Shopee's E‑commerce Knowledge Graph Construction and Integration with Large Models

This article presents Shopee's comprehensive exploration of building an e‑commerce knowledge graph, detailing its challenges, construction pipeline, AI‑driven extraction and fusion techniques, multilingual and multimodal modeling, and practical applications ranging from search and recommendation to AI assistants and real‑time updates.

AI applicationsInformation ExtractionKnowledge Graph
0 likes · 21 min read
Shopee's E‑commerce Knowledge Graph Construction and Integration with Large Models
Alibaba Cloud Developer
Alibaba Cloud Developer
Jun 13, 2024 · Artificial Intelligence

Creating a Full AI‑Generated Music Video with Large‑Model Agents

This article documents the end‑to‑end workflow of using large multimodal models and specialized agents to automatically generate a storyboard, compose original music and lyrics, produce keyframes, and assemble a complete music video, while highlighting the remaining manual steps and future automation possibilities.

AIMusicStoryboard
0 likes · 10 min read
Creating a Full AI‑Generated Music Video with Large‑Model Agents
Baobao Algorithm Notes
Baobao Algorithm Notes
Jun 5, 2024 · Artificial Intelligence

Is GLM‑4‑9B the New Powerhouse? A Deep Dive into Its Performance and Usage

This article reviews the open‑source 9‑billion‑parameter GLM‑4‑9B model, covering installation, quick‑start inference code, quirky Chinese riddles that highlight its strengths over GPT‑4, extensive benchmark tables for dialogue, multilingual, tool‑calling and multimodal tasks, and its broader impact on the Chinese AI ecosystem.

AIGLM-4-9BModel Evaluation
0 likes · 14 min read
Is GLM‑4‑9B the New Powerhouse? A Deep Dive into Its Performance and Usage
DataFunSummit
DataFunSummit
Jun 4, 2024 · Artificial Intelligence

Multimodal and Graph Neural Network Techniques for eBay Recommendation Systems

This article details eBay's practical experience integrating multimodal data and graph neural networks into its recommendation pipeline, covering pain‑point analysis, a twin‑tower multimodal embedding model with triplet loss and TransH, engineering design, experimental results, and key takeaways for future AI‑driven product development.

EmbeddingGNNGraph Neural Network
0 likes · 19 min read
Multimodal and Graph Neural Network Techniques for eBay Recommendation Systems
Alimama Tech
Alimama Tech
May 29, 2024 · Artificial Intelligence

Mixture of Multi‑Modal Experts for Advertising Recall

The Mixed‑Modal Expert Model combines ID features with image and text embeddings through optimized representations and conditional output fusion, dramatically improving advertising recall—especially for long‑tail items—and delivering measurable gains in click‑recall, revenue, CTR, and page views in large‑scale online tests.

Modelmachine learningmultimodal
0 likes · 15 min read
Mixture of Multi‑Modal Experts for Advertising Recall
NewBeeNLP
NewBeeNLP
May 28, 2024 · Artificial Intelligence

How Generative Models Are Redefining Recommendation Systems

This article reviews recent advances in generative recommendation, highlighting challenges such as item representation and multimodal fusion, and summarizing four key research papers that propose novel tokenization, collaborative integration, and transformer-based multimodal approaches to improve recommendation performance.

AI researchGenerative RecommendationLLM
0 likes · 8 min read
How Generative Models Are Redefining Recommendation Systems
DataFunTalk
DataFunTalk
May 20, 2024 · Artificial Intelligence

Deploying OPPO Multi‑Modal Pretrained Models in Edge‑Cloud Scenarios: Techniques and Optimizations

This article presents OPPO's practical research on deploying multi‑modal pre‑training models across mobile devices and cloud, covering edge image‑text retrieval, text‑image generation and understanding optimizations, and lightweight diffusion model techniques, with detailed algorithmic improvements, performance results, and real‑world application cases.

AIGCOPPOdiffusion
0 likes · 18 min read
Deploying OPPO Multi‑Modal Pretrained Models in Edge‑Cloud Scenarios: Techniques and Optimizations
21CTO
21CTO
May 18, 2024 · Artificial Intelligence

What Makes GPT‑4o Faster, Smarter, and More Multimodal Than GPT‑4?

This article examines OpenAI's GPT‑4o, outlining its key performance, speed, accuracy, latency, multimodal, and resource‑efficiency improvements over GPT‑4, and explains why these enhancements broaden the model's applicability across various AI‑driven applications.

AI modelGPT-4omultimodal
0 likes · 6 min read
What Makes GPT‑4o Faster, Smarter, and More Multimodal Than GPT‑4?
360 Tech Engineering
360 Tech Engineering
May 17, 2024 · Artificial Intelligence

360VL: An Open‑Source Multimodal Large Language Model Based on Llama‑3‑70B

The article introduces 360VL, an open‑source multimodal large language model built on Llama‑3‑70B, describes its novel C‑abs bridge architecture for high‑resolution visual understanding, outlines the two‑stage training with bilingual data, and presents benchmark results showing superior performance over prior LMMs.

AI researchLlama3large language model
0 likes · 8 min read
360VL: An Open‑Source Multimodal Large Language Model Based on Llama‑3‑70B
CSS Magic
CSS Magic
May 14, 2024 · Artificial Intelligence

First Look at GPT-4o: Hands‑On Experience, FAQs, and New Free‑User Benefits

The article provides a hands‑on review of OpenAI's newly released GPT‑4o model, covering its multimodal capabilities, real‑time voice demo, desktop client rollout, access options for paid and free users, practical usage tips, and early observations on API performance and limitations.

AI modelAPIChatGPT
0 likes · 9 min read
First Look at GPT-4o: Hands‑On Experience, FAQs, and New Free‑User Benefits
DataFunSummit
DataFunSummit
Apr 24, 2024 · Artificial Intelligence

Multimodal Content Understanding in Baidu Commercial Systems: The ViCAN Model and Its Applications

This article presents Baidu's exploration of multimodal content understanding for commercial advertising, detailing the ViCAN pre‑training model, its contrastive and mask‑language learning tasks, integration across recall, ranking and risk‑control pipelines, quantization with MMDict, and future AIGC‑driven generation, all backed by extensive experiments and Q&A.

AIAIGCAdvertising
0 likes · 27 min read
Multimodal Content Understanding in Baidu Commercial Systems: The ViCAN Model and Its Applications
21CTO
21CTO
Apr 20, 2024 · Artificial Intelligence

What Developers Need to Know About Meta’s New Open‑Source Llama 3 Model

Meta’s newly open‑source Llama 3 model pushes the frontier of large language models with a larger context window, Mixture‑of‑Experts architecture, multilingual support, and multimodal capabilities, while facing challenges in transparency, bias, and computational resources, and offering diverse applications from NLU to code generation.

AIBenchmarkLlama3
0 likes · 10 min read
What Developers Need to Know About Meta’s New Open‑Source Llama 3 Model
Architects' Tech Alliance
Architects' Tech Alliance
Apr 7, 2024 · Artificial Intelligence

How Sora Is Redefining Text‑to‑Video Generation: Inside the New AI Model

Sora, the newly announced text‑to‑video large model, can generate one‑minute high‑fidelity videos from textual prompts or static images, handling complex scenes, expressive characters, and sophisticated camera motions while also supporting video extension and frame‑filling, positioning it at the forefront of multimodal AI research.

AI modelSoraVideo Generation
0 likes · 6 min read
How Sora Is Redefining Text‑to‑Video Generation: Inside the New AI Model
DataFunSummit
DataFunSummit
Mar 27, 2024 · Artificial Intelligence

Generative Multimodal Pretraining (OFA) and Representational Multimodal Pretraining (ONE-PEACE): Research Overview and Findings

This article reviews Tongyi Lab's work on the OFA framework for generative multimodal pretraining and the ONE-PEACE model for unified multimodal representation learning, detailing their architectures, training strategies, experimental results across vision‑language and audio tasks, and future research directions.

OFAONE-PEACEmultimodal
0 likes · 15 min read
Generative Multimodal Pretraining (OFA) and Representational Multimodal Pretraining (ONE-PEACE): Research Overview and Findings
Alibaba Cloud Big Data AI Platform
Alibaba Cloud Big Data AI Platform
Mar 18, 2024 · Artificial Intelligence

How MuLTI Achieves Memory‑Efficient Video‑Language Understanding with Text‑Guided MultiWay Sampling

The paper presents MuLTI, a multimodal video‑language model that tackles the memory and efficiency challenges of long video‑text sequences by introducing a Text‑Guided MultiWay Sampler and a Multiple Choice Modeling pre‑training task, achieving state‑of‑the‑art results on video QA and retrieval while drastically reducing GPU memory consumption.

efficient-aifeature fusionmultimodal
0 likes · 19 min read
How MuLTI Achieves Memory‑Efficient Video‑Language Understanding with Text‑Guided MultiWay Sampling
NewBeeNLP
NewBeeNLP
Feb 27, 2024 · Artificial Intelligence

Boosting E‑Commerce AIGC with Knowledge Graphs: From Multimodal Inputs to Controlled LLMs

The article details how JD.com leverages domain‑specific and generic knowledge graphs to enhance multimodal product information, improve controlled text generation, and boost LLM performance for e‑commerce copywriting, covering model architecture, copy‑only mechanisms, token‑type encoding, experimental results, and practical deployment scenarios.

AIGCKnowledge GraphLLM
0 likes · 23 min read
Boosting E‑Commerce AIGC with Knowledge Graphs: From Multimodal Inputs to Controlled LLMs
21CTO
21CTO
Feb 17, 2024 · Artificial Intelligence

How OpenAI’s Sora Is Pushing Video Generation to New Frontiers

OpenAI’s Sora model demonstrates large‑scale text‑conditional video generation using a diffusion transformer that operates on spatiotemporal patches, supporting variable durations, resolutions, and aspect ratios while showcasing emergent simulation abilities, flexible sampling, and multimodal editing capabilities, though it still has notable limitations.

AI researchSoraTransformer
0 likes · 19 min read
How OpenAI’s Sora Is Pushing Video Generation to New Frontiers
Java Tech Enthusiast
Java Tech Enthusiast
Feb 16, 2024 · Artificial Intelligence

Google's Gemini 1.5: Breakthrough in Long-Context Understanding and Multimodal Capabilities

Google’s Gemini 1.5, a new multimodal Mixture‑of‑Experts model, supports up to a million‑token context (10 million internally), can understand text, video, audio and code, learns a new language from a single prompt, and is already being used by Samsung, Jasper and Quora, positioning it as a direct challenger to OpenAI’s flagship models.

Gemini 1.5Google AILLM
0 likes · 7 min read
Google's Gemini 1.5: Breakthrough in Long-Context Understanding and Multimodal Capabilities
Baobao Algorithm Notes
Baobao Algorithm Notes
Feb 4, 2024 · Industry Insights

Balancing Fun, Utility, and Slow Thinking: The Future of AI Agents

In this talk, the speaker examines the dual goals of AI agents—being entertaining and useful—while introducing the concepts of fast and slow thinking, multimodal perception, long‑term memory, retrieval‑augmented generation, and tool integration as essential steps toward building truly valuable digital companions.

AI agentsFuture AILong-term Memory
0 likes · 18 min read
Balancing Fun, Utility, and Slow Thinking: The Future of AI Agents
DataFunSummit
DataFunSummit
Jan 10, 2024 · Artificial Intelligence

Baidu Commercial Multimodal Understanding and AIGC Innovation Practices

This article presents Baidu's commercial multimodal understanding and AIGC innovations, detailing rich‑media multimodal perception, a unified large‑scale representation framework, scenario‑specific fine‑tuning, and practical applications such as marketing copy, digital‑human video, and poster generation.

AIGCAdvertisingBaidu
0 likes · 12 min read
Baidu Commercial Multimodal Understanding and AIGC Innovation Practices
NetEase Smart Enterprise Tech+
NetEase Smart Enterprise Tech+
Jan 4, 2024 · Artificial Intelligence

How to Strengthen AIGC Content Safety with Multimodal Data and Model Upgrades

The article examines the security challenges introduced by large‑model AIGC, outlines three technical upgrade paths—richer training data, few‑shot model fine‑tuning, and multimodal fusion—and demonstrates practical implementations that dramatically improve detection efficiency, accuracy, and scalability.

AI securityAIGCContent Safety
0 likes · 24 min read
How to Strengthen AIGC Content Safety with Multimodal Data and Model Upgrades
Rare Earth Juejin Tech Community
Rare Earth Juejin Tech Community
Dec 22, 2023 · Artificial Intelligence

Machine Learning-Based Text‑Image Correlation Analysis

This article introduces a machine‑learning approach for correlating text and image data, covering preprocessing, feature extraction, model training, experimental results, and future directions, and provides complete Python code examples using NLP and deep‑learning libraries.

machine learningmultimodaltext-image correlation
0 likes · 17 min read
Machine Learning-Based Text‑Image Correlation Analysis
Rare Earth Juejin Tech Community
Rare Earth Juejin Tech Community
Dec 9, 2023 · Artificial Intelligence

Google Unveils Gemini: A New Multimodal Large Model Family (Ultra, Pro, Nano)

Google announced Gemini, a suite of multimodal large language models—including Ultra, Pro, and Nano—that achieve state‑of‑the‑art results on dozens of benchmarks, support native multimodal pre‑training, and are being integrated across Google products such as Bard, Search, and upcoming Pixel devices.

BenchmarkGeminiGoogle AI
0 likes · 7 min read
Google Unveils Gemini: A New Multimodal Large Model Family (Ultra, Pro, Nano)
DataFunSummit
DataFunSummit
Dec 8, 2023 · Artificial Intelligence

Multimodal Cold‑Start Techniques for Music Recommendation at NetEase Cloud Music

This article presents NetEase Cloud Music's multimodal cold‑start solution, detailing the problem background, feature selection using CLIP, two modeling approaches (I2I2U indirect and U2I DSSM direct), contrastive learning enhancements, interest‑boundary modeling, and evaluation results showing significant gains in user engagement.

AIcold-startcontrastive learning
0 likes · 15 min read
Multimodal Cold‑Start Techniques for Music Recommendation at NetEase Cloud Music
360 Smart Cloud
360 Smart Cloud
Nov 20, 2023 · Artificial Intelligence

Overview of Recent Open‑Source AI Models and Tools (November 2023)

This article summarizes a collection of newly released open‑source AI projects covering natural‑language processing, multimodal processing, intelligent agents, recommendation systems, and model training acceleration, providing brief descriptions, key capabilities, and links to their repositories.

AIRecommendation Systemslarge language models
0 likes · 9 min read
Overview of Recent Open‑Source AI Models and Tools (November 2023)
php Courses
php Courses
Nov 10, 2023 · Artificial Intelligence

OpenAI Announces Data Partnership Program for Public and Private Training Datasets

OpenAI revealed a new data partnership initiative to collect large‑scale public and private datasets across multiple modalities, aiming to improve AI model safety and usefulness by incorporating diverse, hard‑to‑access human‑generated content while respecting privacy and intent.

AI training dataData PartnershipOpenAI
0 likes · 3 min read
OpenAI Announces Data Partnership Program for Public and Private Training Datasets
DataFunTalk
DataFunTalk
Nov 2, 2023 · Artificial Intelligence

Enhancing Language and Vision Models with External Knowledge and Tools: OREO‑LM, REVEAL, and AVIS

This article reviews recent research on augmenting language and multimodal models with external knowledge sources and tool‑calling mechanisms, covering three systems—OREO‑LM for knowledge‑graph reasoning, REVEAL for multi‑source visual‑language pretraining, and AVIS for dynamic tool selection—and their experimental results and implications.

Knowledge GraphLanguage ModelTool integration
0 likes · 28 min read
Enhancing Language and Vision Models with External Knowledge and Tools: OREO‑LM, REVEAL, and AVIS
Baobao Algorithm Notes
Baobao Algorithm Notes
Oct 23, 2023 · Artificial Intelligence

Why Multimodal AI Agents Could Be the Next Killer App for Large Models

The article recounts a personal test of a multimodal AI agent in Newport Beach and expands into a detailed analysis of current multimodal LLM architectures, memory mechanisms, task planning, tool usage, personality modeling, cost constraints, evaluation challenges, and the broader social and reliability implications of deploying such agents.

AI agentsCostMemory
0 likes · 44 min read
Why Multimodal AI Agents Could Be the Next Killer App for Large Models
DataFunSummit
DataFunSummit
Oct 17, 2023 · Artificial Intelligence

Enhancing Vision and Language Models with External Knowledge Graphs and Tool Integration

This article reviews recent research on augmenting language and vision models by incorporating external knowledge sources such as knowledge graphs, multi‑source retrieval, and dynamic tool‑calling frameworks, presenting three systems—OREO‑LM, REVEAL, and AVIS—and their experimental results.

AI researchLanguage ModelTool integration
0 likes · 27 min read
Enhancing Vision and Language Models with External Knowledge Graphs and Tool Integration
DataFunTalk
DataFunTalk
Sep 26, 2023 · Artificial Intelligence

MiniGPT-4: Enhancing Vision‑Language Understanding with Large Language Models

This article presents MiniGPT-4, a multimodal system that combines a frozen visual encoder (Q‑Former + ViT) with an open‑source large language model (Vicuna), describes its motivation, training pipeline, demo capabilities, observed limitations, and includes a brief Q&A session.

AI researchImage CaptioningMiniGPT-4
0 likes · 15 min read
MiniGPT-4: Enhancing Vision‑Language Understanding with Large Language Models
DataFunTalk
DataFunTalk
Sep 19, 2023 · Artificial Intelligence

Simultaneous Speech Translation: Technical Background, System Architecture, and Key Challenges

This article reviews the technical background of simultaneous speech translation, compares offline and real‑time scenarios, details ASR and MT technologies, describes the system architecture and design strategies, and discusses the major challenges and solutions for deploying robust, low‑latency translation services.

ASRHuaweiReal-Time
0 likes · 16 min read
Simultaneous Speech Translation: Technical Background, System Architecture, and Key Challenges
DaTaobao Tech
DaTaobao Tech
Sep 13, 2023 · Artificial Intelligence

Integrating Large Language Models with Recommendation Systems: Paradigms, Methods, and Experiments

The article surveys how large language models can be integrated into recommendation systems, either as feature extractors or as end‑to‑end recommenders, showing that LLM‑derived semantics improve recall, ranking, diversity, and user experience, and outlining future multimodal, efficiency, and re‑ranking directions.

EmbeddingLLMPrompt engineering
0 likes · 19 min read
Integrating Large Language Models with Recommendation Systems: Paradigms, Methods, and Experiments
DataFunTalk
DataFunTalk
Sep 5, 2023 · Artificial Intelligence

Baidu Commercial Multimodal Understanding and AIGC Innovation Practices

This article presents Baidu's commercial multimodal understanding framework and AIGC innovations, detailing rich-media multimodal perception, the VICAN‑12B multimodal representation‑generation model, scenario‑specific fine‑tuning, feature quantization for ranking, and practical applications such as marketing content generation, digital‑human video creation, and poster synthesis.

AIGCBaiduVisual Language
0 likes · 12 min read
Baidu Commercial Multimodal Understanding and AIGC Innovation Practices
Huolala Tech
Huolala Tech
Jul 21, 2023 · Artificial Intelligence

Visual Language Models Power Open-Set Detection and Surgical Tool Segmentation

Recent advances in visual language models enable zero-shot multimodal tasks, and this article explores their application to open-set object detection, prompt learning, and promptable surgical instrument segmentation, highlighting methods like CLIP, CoOp, and the DetPro framework with experimental results across multiple benchmarks.

Computer VisionVisual-Language Modelsmultimodal
0 likes · 12 min read
Visual Language Models Power Open-Set Detection and Surgical Tool Segmentation
360 Tech Engineering
360 Tech Engineering
Jul 6, 2023 · Artificial Intelligence

CSIG Enterprise Visit to Qihoo 360: Multimodal and Cross‑Modal Learning in the Era of Large Models

The CSIG‑hosted "Enterprise Visit – Into Qihoo 360" event on June 29, 2023 gathered over a thousand participants to explore multimodal and cross‑modal learning in the large‑model era, featuring keynote speeches from leading university researchers and Qihoo 360 AI experts, a tour of the company's facilities, and discussions on future AI research directions.

CSIGCross-modalQihoo360
0 likes · 8 min read
CSIG Enterprise Visit to Qihoo 360: Multimodal and Cross‑Modal Learning in the Era of Large Models
Tencent Cloud Developer
Tencent Cloud Developer
Jun 28, 2023 · Artificial Intelligence

Prompt Engineering: Fundamentals, Techniques, and Advanced Strategies

Prompt engineering teaches how to craft effective instructions, context, input data, and output formats for large language models, using clear commands, iterative refinement, and advanced methods such as zero‑shot, few‑shot, chain‑of‑thought, Tree of Thoughts, retrieval‑augmented and progressive‑hint prompting to achieve precise, reliable results across diverse tasks.

AIFew‑Shot LearningKnowledge Retrieval
0 likes · 17 min read
Prompt Engineering: Fundamentals, Techniques, and Advanced Strategies
Efficient Ops
Efficient Ops
Jun 26, 2023 · Artificial Intelligence

How Multimodal AI Is Revolutionizing Credit Card Fraud Detection

Amid tightening financial regulations, ICBC's software team proposes a multimodal AI anti‑fraud framework that combines image, video, and structured data to detect deep‑fake, mask, and forged‑document attacks, enriches verification with cross‑modal cues, and outlines future expansion to text and speech modalities.

AIComputer VisionDeep Learning
0 likes · 7 min read
How Multimodal AI Is Revolutionizing Credit Card Fraud Detection