Tagged articles

Multimodal

422 articles · Page 4 of 5
AntTech
Aug 13, 2024 · Artificial Intelligence

Ant Group Contributions to ACL 2024: Summaries of 14 Accepted Papers Across NLP and AI

At ACL 2024, held in Bangkok from August 11 to 16, 2024, Ant Group presented 14 papers covering large‑scale information extraction, decomposed LLMs for semantic search, multimodal hallucination detection, long‑context attention mechanisms, concept‑reasoning datasets, knowledge‑graph alignment, and more, reflecting the group's breadth in natural language processing and AI research.

ACL2024 · Large Language Models · Multimodal
20 min read

DataFunSummit
Jul 28, 2024 · Artificial Intelligence

Leveraging Large Language Models for Graph Learning: Opportunities, Current Progress, and Future Directions

This article reviews why large language models can be applied to graph learning, outlines their capabilities and graph data characteristics, surveys current research across different graph types and LLM roles, and proposes future research directions for unified cross‑domain graph learning.

AI · Graph Neural Networks · Large Language Models
19 min read

Tencent Cloud Developer
Jul 18, 2024 · Artificial Intelligence

Exploring Large Language Models (LLM): Fundamentals, Applications, and Future Directions

This article surveys large language models: their core concepts, their evolution through Transformers, GPT and BERT, generation challenges, applications such as QA, multimodal creation, summarization and retrieval‑augmented generation, prompt‑engineering frameworks and tools, LangChain‑based pipelines, AI‑driven agents, and future prospects in domain‑specific use, multimodality, and AGI.

AI · Agent · LLM
35 min read

Architects' Tech Alliance
Jul 10, 2024 · Industry Insights

Why AI Large Models Are Driving the Next Industrial Revolution

The article analyzes the rapid evolution of AI large models—from their role in advancing AGI through massive pre‑training and fine‑tuning, to current market dynamics led by GPT and domestic Chinese players, and finally to future multimodal applications, content‑factory capabilities, and emerging AIGC revenue models projected to reach trillion‑yuan scales by 2030.

AI · AIGC · GPT
7 min read

Baobao Algorithm Notes
Jul 8, 2024 · Industry Insights

Why Large‑Model Deployment Stalls: Robots, Scaling Laws, and Multimodal Frontiers

The article analyzes current challenges in deploying large AI models, covering robot automation, scaling‑law limits, vertical‑domain use cases, multimodal breakthroughs, algorithmic evolution, and the hardware‑software trade‑offs of training and inference infrastructures, while questioning ROI and practical feasibility.

Multimodal · algorithm evolution · inference infrastructure
21 min read

AI Large Model Application Practice
Jul 4, 2024 · Artificial Intelligence

Mastering Multimodal RAG: From PDF Parsing to Advanced Query Rewriting

This article explains how to handle complex multimodal PDFs in RAG systems, outlines extraction, indexing, and multimodal model integration, details four query‑rewriting strategies (HyDE, stepwise, sub‑question, backward), and presents key evaluation metrics and tools for assessing RAG performance.

Document Parsing · Evaluation · Multimodal
12 min read

360 Tech Engineering
Jul 3, 2024 · Artificial Intelligence

360LayoutAnalysis: Open‑Source Lightweight Document Layout Analysis Models for Multiple Scenarios

The 360LayoutAnalysis project from 360 AI Lab releases lightweight, YOLOv8‑based layout analysis models for three scenarios (Chinese and English papers, Chinese research reports, and general documents), offering fast inference, paragraph‑level detection, and open‑source code and weights for flexible document‑understanding pipelines.

AI model · Layout Analysis · Multimodal
9 min read

JD Tech
Jun 28, 2024 · Artificial Intelligence

An Overview of Large Language Models: History, Fundamentals, Prompt Engineering, Retrieval‑Augmented Generation, Agents, and Multimodal AI

This article provides a comprehensive introduction to large language models, covering their historical development, core architecture, training process, prompt engineering techniques, Retrieval‑Augmented Generation, agent frameworks, multimodal capabilities, safety challenges, and future research directions.

AI agents · AI safety · Deep Learning
22 min read

AntTech
Jun 18, 2024 · Artificial Intelligence

Ant Group’s 24 Papers Featured at CVPR2024: Topics and Abstracts

IEEE CVPR 2024 in Seattle accepted 2,719 of 11,532 submissions. Ant Group contributed 24 of the accepted papers, covering computer vision, deep learning, digital humans, large models, multimodal remote sensing, vision‑language distillation, federated incremental learning, model‑stealing defense, and more; one was selected as a Highlight paper.

Ant Group · CVPR2024 · Deep Learning
17 min read

NewBeeNLP
Jun 18, 2024 · Artificial Intelligence

How Shopee Builds an E‑Commerce Knowledge Graph and Leverages Large Models

This article presents Shopee's comprehensive approach to constructing an e‑commerce knowledge graph, detailing the challenges of heterogeneous data, multi‑language handling, entity disambiguation, and the integration of deep learning and large language models to improve product matching, recommendation, and operational efficiency.

AI · Multimodal · e-commerce
22 min read

DataFunTalk
Jun 14, 2024 · Artificial Intelligence

Shopee's E‑commerce Knowledge Graph Construction and Integration with Large Models

This article presents Shopee's comprehensive exploration of building an e‑commerce knowledge graph, detailing its challenges, construction pipeline, AI‑driven extraction and fusion techniques, multilingual and multimodal modeling, and practical applications ranging from search and recommendation to AI assistants and real‑time updates.

AI Applications · Large Language Models · Multimodal
21 min read

Alibaba Cloud Developer
Jun 13, 2024 · Artificial Intelligence

Creating a Full AI‑Generated Music Video with Large‑Model Agents

This article documents the end‑to‑end workflow of using large multimodal models and specialized agents to automatically generate a storyboard, compose original music and lyrics, produce keyframes, and assemble a complete music video, while highlighting the remaining manual steps and future automation possibilities.

AI · Agents · Multimodal
10 min read

Baobao Algorithm Notes
Jun 5, 2024 · Artificial Intelligence

Is GLM‑4‑9B the New Powerhouse? A Deep Dive into Its Performance and Usage

This article reviews the open‑source 9‑billion‑parameter GLM‑4‑9B model, covering installation, quick‑start inference code, quirky Chinese riddles that highlight its strengths over GPT‑4, extensive benchmark tables for dialogue, multilingual, tool‑calling and multimodal tasks, and its broader impact on the Chinese AI ecosystem.

AI · GLM-4-9B · Multimodal
14 min read

DataFunSummit
Jun 4, 2024 · Artificial Intelligence

Multimodal and Graph Neural Network Techniques for eBay Recommendation Systems

This article details eBay's practical experience integrating multimodal data and graph neural networks into its recommendation pipeline, covering pain‑point analysis, a two‑tower multimodal embedding model with triplet loss and TransH, engineering design, experimental results, and key takeaways for future AI‑driven product development.

Embedding · GNN · Graph Neural Network
19 min read
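
The eBay summary names two concrete techniques, triplet loss and TransH, without saying how they are combined. The sketch below is a generic illustration of each in plain Python, not eBay's model: TransH projects head and tail embeddings onto a relation-specific hyperplane with normal `w_r` and then translates by `d_r`, while a margin-based triplet loss asks the positive to be at least `margin` closer than the negative. All function names and toy vectors are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Vector) -> float:
    return math.sqrt(dot(a, a))


def project_to_hyperplane(e: Vector, w: Vector) -> Vector:
    """TransH projection: e_perp = e - (w . e) w, with w normalized to unit length."""
    length = norm(w)
    unit = [x / length for x in w]
    coeff = dot(unit, e)
    return [x - coeff * u for x, u in zip(e, unit)]


def transh_distance(h: Vector, t: Vector, w_r: Vector, d_r: Vector) -> float:
    """||h_perp + d_r - t_perp||; a smaller value means (h, r, t) is more plausible."""
    h_p = project_to_hyperplane(h, w_r)
    t_p = project_to_hyperplane(t, w_r)
    return norm([hp + d - tp for hp, d, tp in zip(h_p, d_r, t_p)])


def triplet_loss(d_pos: float, d_neg: float, margin: float = 1.0) -> float:
    """Hinge loss: zero once the positive is at least `margin` closer than the negative."""
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    w_r, d_r = [0.0, 1.0], [1.0, 0.0]  # toy relation: hyperplane normal and translation
    anchor, positive, negative = [0.0, 0.0], [1.0, 5.0], [-1.0, 0.0]
    d_pos = transh_distance(anchor, positive, w_r, d_r)
    d_neg = transh_distance(anchor, negative, w_r, d_r)
    print(d_pos, d_neg, triplet_loss(d_pos, d_neg))
```

In a two-tower setup these distances would be computed on the towers' learned embeddings; the toy vectors here only exercise the arithmetic.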

Alimama Tech
May 29, 2024 · Artificial Intelligence

Mixture of Multi‑Modal Experts for Advertising Recall

The Mixture of Multi‑Modal Experts model combines ID features with image and text embeddings through optimized representations and conditional output fusion, substantially improving advertising recall (especially for long‑tail items) and delivering measurable gains in click recall, revenue, CTR, and page views in large‑scale online tests.

Model · Multimodal · machine learning
15 min read
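
The Alimama summary describes fusing ID features with image and text embeddings through "conditional output fusion" but gives no formula. As an assumption, the sketch below shows the generic soft-gated mixture-of-experts pattern instead: each expert emits a vector, and a softmax over gate logits decides how much each contributes. In a trained model a gating network would produce the logits from item features; here they are passed in by hand, and all names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_experts(expert_outputs: Sequence[Sequence[float]],
                 gate_logits: Sequence[float]) -> List[float]:
    """Weighted sum of expert output vectors, with weights = softmax(gate_logits)."""
    if len(expert_outputs) != len(gate_logits):
        raise ValueError("need one gate logit per expert")
    weights = softmax(gate_logits)
    dim = len(expert_outputs[0])
    return [sum(w * out[k] for w, out in zip(weights, expert_outputs)) for k in range(dim)]


if __name__ == "__main__":
    # Hypothetical per-item outputs from an ID expert, an image expert and a text expert.
    id_out, img_out, txt_out = [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]
    # A long-tail item with sparse ID history might receive a low ID gate logit.
    print(fuse_experts([id_out, img_out, txt_out], gate_logits=[-2.0, 1.0, 1.0]))
```

Down-weighting the ID expert for items with little interaction history is one plausible reason such fusion would help long-tail recall, but that mapping is an illustration, not the article's stated design.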

NewBeeNLP
May 28, 2024 · Artificial Intelligence

How Generative Models Are Redefining Recommendation Systems

This article reviews recent advances in generative recommendation, highlighting challenges such as item representation and multimodal fusion, and summarizing four key research papers that propose novel tokenization, collaborative integration, and transformer-based multimodal approaches to improve recommendation performance.

AI research · LLM · Multimodal
8 min read

DataFunTalk
May 20, 2024 · Artificial Intelligence

Deploying OPPO Multi‑Modal Pretrained Models in Edge‑Cloud Scenarios: Techniques and Optimizations

This article presents OPPO's practical research on deploying multi‑modal pre‑training models across mobile devices and cloud, covering edge image‑text retrieval, text‑image generation and understanding optimizations, and lightweight diffusion model techniques, with detailed algorithmic improvements, performance results, and real‑world application cases.

AIGC · Multimodal · OPPO
18 min read

21CTO
May 18, 2024 · Artificial Intelligence

What Makes GPT‑4o Faster, Smarter, and More Multimodal Than GPT‑4?

This article examines OpenAI's GPT‑4o, outlining its key performance, speed, accuracy, latency, multimodal, and resource‑efficiency improvements over GPT‑4, and explains why these enhancements broaden the model's applicability across various AI‑driven applications.

AI model · GPT-4o · Multimodal
6 min read

360 Tech Engineering
May 17, 2024 · Artificial Intelligence

360VL: An Open‑Source Multimodal Large Language Model Based on Llama‑3‑70B

The article introduces 360VL, an open‑source multimodal large language model built on Llama‑3‑70B, describes its novel C‑abs bridge architecture for high‑resolution visual understanding, outlines the two‑stage training with bilingual data, and presents benchmark results showing superior performance over prior LMMs.

AI research · Llama3 · Multimodal
8 min read

CSS Magic
May 14, 2024 · Artificial Intelligence

First Look at GPT-4o: Hands‑On Experience, FAQs, and New Free‑User Benefits

The article provides a hands‑on review of OpenAI's newly released GPT‑4o model, covering its multimodal capabilities, real‑time voice demo, desktop client rollout, access options for paid and free users, practical usage tips, and early observations on API performance and limitations.

AI model · API · ChatGPT
9 min read

DataFunSummit
Apr 24, 2024 · Artificial Intelligence

Multimodal Content Understanding in Baidu Commercial Systems: The ViCAN Model and Its Applications

This article presents Baidu's exploration of multimodal content understanding for commercial advertising: the ViCAN pre‑training model and its contrastive and masked‑language‑modeling tasks, its integration across recall, ranking and risk‑control pipelines, quantization with MMDict, and future AIGC‑driven generation, all backed by extensive experiments and a Q&A session.

AI · AIGC · Advertising
27 min read

21CTO
Apr 20, 2024 · Artificial Intelligence

What Developers Need to Know About Meta’s New Open‑Source Llama 3 Model

According to the article, Meta’s newly open‑sourced Llama 3 pushes the frontier of large language models with a larger context window, a Mixture‑of‑Experts architecture, multilingual support, and multimodal capabilities; the article also discusses challenges in transparency, bias, and computational resources, and applications ranging from NLU to code generation.

AI · Benchmark · Llama3
10 min read

Architects' Tech Alliance
Apr 7, 2024 · Artificial Intelligence

How Sora Is Redefining Text‑to‑Video Generation: Inside the New AI Model

Sora, the newly announced text‑to‑video large model, can generate one‑minute high‑fidelity videos from textual prompts or static images, handling complex scenes, expressive characters, and sophisticated camera motions while also supporting video extension and frame‑filling, positioning it at the forefront of multimodal AI research.

AI model · Multimodal · Sora
6 min read

DataFunSummit
Mar 27, 2024 · Artificial Intelligence

Generative Multimodal Pretraining (OFA) and Representational Multimodal Pretraining (ONE-PEACE): Research Overview and Findings

This article reviews Tongyi Lab's work on the OFA framework for generative multimodal pretraining and the ONE-PEACE model for unified multimodal representation learning, detailing their architectures, training strategies, experimental results across vision‑language and audio tasks, and future research directions.

Multimodal · OFA · ONE-PEACE
15 min read

Alibaba Cloud Big Data AI Platform
Mar 18, 2024 · Artificial Intelligence

How MuLTI Achieves Memory‑Efficient Video‑Language Understanding with Text‑Guided MultiWay Sampling

The paper presents MuLTI, a multimodal video‑language model that tackles the memory and efficiency challenges of long video‑text sequences by introducing a Text‑Guided MultiWay Sampler and a Multiple Choice Modeling pre‑training task, achieving state‑of‑the‑art results on video QA and retrieval while drastically reducing GPU memory consumption.

Multimodal · efficient-ai · feature fusion
19 min read

NewBeeNLP
Feb 27, 2024 · Artificial Intelligence

Boosting E‑Commerce AIGC with Knowledge Graphs: From Multimodal Inputs to Controlled LLMs

The article details how JD.com leverages domain‑specific and generic knowledge graphs to enhance multimodal product information, improve controlled text generation, and boost LLM performance for e‑commerce copywriting, covering model architecture, copy‑only mechanisms, token‑type encoding, experimental results, and practical deployment scenarios.

AIGC · LLM · Multimodal
23 min read

21CTO
Feb 17, 2024 · Artificial Intelligence

How OpenAI’s Sora Is Pushing Video Generation to New Frontiers

OpenAI’s Sora model demonstrates large‑scale text‑conditional video generation using a diffusion transformer that operates on spatiotemporal patches, supporting variable durations, resolutions, and aspect ratios while showcasing emergent simulation abilities, flexible sampling, and multimodal editing capabilities, though it still has notable limitations.

AI research · Diffusion Models · Multimodal
19 min read
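
The Sora summary says the diffusion transformer operates on spatiotemporal patches. OpenAI's technical report describes first compressing video into a latent space and then cutting that latent into spacetime patches; the sketch below skips the compression step and simply cuts a toy single-channel video, stored as nested lists indexed `[t][y][x]`, into non-overlapping `pt × ph × pw` blocks, each flattened into one token. The function name and patch sizes are illustrative assumptions.

```python
from typing import List

Video = List[List[List[float]]]  # indexed as video[t][y][x], single channel


def spatiotemporal_patches(video: Video, pt: int, ph: int, pw: int) -> List[List[float]]:
    """Split a T x H x W video into flattened, non-overlapping pt x ph x pw patches."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    if T % pt or H % ph or W % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    patches = []
    for t0 in range(0, T, pt):          # iterate over time blocks, then rows, then columns
        for y0 in range(0, H, ph):
            for x0 in range(0, W, pw):
                patches.append([
                    video[t][y][x]
                    for t in range(t0, t0 + pt)
                    for y in range(y0, y0 + ph)
                    for x in range(x0, x0 + pw)
                ])
    return patches


if __name__ == "__main__":
    # 4 frames of 4 x 4 pixels; each value encodes its (t, y, x) position.
    video = [[[100 * t + 10 * y + x for x in range(4)] for y in range(4)] for t in range(4)]
    tokens = spatiotemporal_patches(video, pt=2, ph=2, pw=2)
    print(len(tokens), tokens[0])  # 8 tokens of 8 values each
```

Because the token count is (T/pt)·(H/ph)·(W/pw), a shorter or lower-resolution clip simply yields fewer tokens, which is one way a transformer can accept variable durations, resolutions, and aspect ratios.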

Java Tech Enthusiast
Feb 16, 2024 · Artificial Intelligence

Google's Gemini 1.5: Breakthrough in Long-Context Understanding and Multimodal Capabilities

Google’s Gemini 1.5, a new multimodal Mixture‑of‑Experts model, supports a context of up to one million tokens (with up to 10 million tested in research), understands text, video, audio and code, can learn to translate a new language from material supplied in a single prompt, and is already being used by Samsung, Jasper and Quora, positioning it as a direct challenger to OpenAI’s flagship models.

Gemini 1.5 · Google AI · LLM
7 min read

Baobao Algorithm Notes
Feb 4, 2024 · Industry Insights

Balancing Fun, Utility, and Slow Thinking: The Future of AI Agents

In this talk, the speaker examines the dual goals of AI agents—being entertaining and useful—while introducing the concepts of fast and slow thinking, multimodal perception, long‑term memory, retrieval‑augmented generation, and tool integration as essential steps toward building truly valuable digital companions.

AI agents · Future AI · Multimodal
18 min read

DataFunSummit
Jan 10, 2024 · Artificial Intelligence

Baidu Commercial Multimodal Understanding and AIGC Innovation Practices

This article presents Baidu's commercial multimodal understanding and AIGC innovations, detailing rich‑media multimodal perception, a unified large‑scale representation framework, scenario‑specific fine‑tuning, and practical applications such as marketing copy, digital‑human video, and poster generation.

AIGC · Advertising · Baidu
12 min read

NetEase Smart Enterprise Tech+
Jan 4, 2024 · Artificial Intelligence

How to Strengthen AIGC Content Safety with Multimodal Data and Model Upgrades

The article examines the security challenges introduced by large‑model AIGC, outlines three technical upgrade paths—richer training data, few‑shot model fine‑tuning, and multimodal fusion—and demonstrates practical implementations that dramatically improve detection efficiency, accuracy, and scalability.

AI security · AIGC · Content Safety
24 min read

Rare Earth Juejin Tech Community
Dec 22, 2023 · Artificial Intelligence

Machine Learning-Based Text‑Image Correlation Analysis

This article introduces a machine‑learning approach for correlating text and image data, covering preprocessing, feature extraction, model training, experimental results, and future directions, and provides complete Python code examples using NLP and deep‑learning libraries.

Multimodal · machine learning · text-image correlation
17 min read

Rare Earth Juejin Tech Community
Dec 9, 2023 · Artificial Intelligence

Google Unveils Gemini: A New Multimodal Large Model Family (Ultra, Pro, Nano)

Google announced Gemini, a suite of multimodal large language models—including Ultra, Pro, and Nano—that achieve state‑of‑the‑art results on dozens of benchmarks, support native multimodal pre‑training, and are being integrated across Google products such as Bard, Search, and upcoming Pixel devices.

Artificial Intelligence · Benchmark · Gemini
7 min read

DataFunSummit
Dec 8, 2023 · Artificial Intelligence

Multimodal Cold‑Start Techniques for Music Recommendation at NetEase Cloud Music

This article presents NetEase Cloud Music's multimodal cold‑start solution, detailing the problem background, feature selection using CLIP, two modeling approaches (I2I2U indirect and U2I DSSM direct), contrastive learning enhancements, interest‑boundary modeling, and evaluation results showing significant gains in user engagement.

AI · Multimodal · cold-start
15 min read
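
The NetEase Cloud Music summary mentions CLIP features and contrastive-learning enhancements without giving the objective. A common choice, assumed here, is an InfoNCE loss with in-batch negatives, as in CLIP: row `i` of one batch should score highest against row `i` of the other. The sketch computes the one-directional version in plain Python (CLIP averages both directions); the temperature of 0.07 and all names are assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def info_nce(queries: List[Sequence[float]], keys: List[Sequence[float]],
             temperature: float = 0.07) -> float:
    """Mean cross-entropy where queries[i] should match keys[i]; other keys act as negatives."""
    n = len(queries)
    total = 0.0
    for i in range(n):
        logits = [cosine(queries[i], keys[j]) / temperature for j in range(n)]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / n


if __name__ == "__main__":
    users = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical user-side embeddings
    songs = [[0.9, 0.1], [0.1, 0.9]]  # hypothetical matching song embeddings
    print(info_nce(users, songs), info_nce(users, songs[::-1]))
```

The aligned batch gives a much lower loss than the shuffled one; training pushes the two encoders toward that aligned state.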

360 Smart Cloud
Nov 20, 2023 · Artificial Intelligence

Overview of Recent Open‑Source AI Models and Tools (November 2023)

This article summarizes a collection of newly released open‑source AI projects covering natural‑language processing, multimodal processing, intelligent agents, recommendation systems, and model training acceleration, providing brief descriptions, key capabilities, and links to their repositories.

AI · Large Language Models · Multimodal
9 min read

php Courses
Nov 10, 2023 · Artificial Intelligence

OpenAI Announces Data Partnership Program for Public and Private Training Datasets

OpenAI revealed a new data partnership initiative to collect large‑scale public and private datasets across multiple modalities, aiming to improve AI model safety and usefulness by incorporating diverse, hard‑to‑access human‑generated content while respecting privacy and intent.

AI training data · Data Partnership · Multimodal
3 min read

DataFunTalk
Nov 2, 2023 · Artificial Intelligence

Enhancing Language and Vision Models with External Knowledge and Tools: OREO‑LM, REVEAL, and AVIS

This article reviews recent research on augmenting language and multimodal models with external knowledge sources and tool‑calling mechanisms, covering three systems—OREO‑LM for knowledge‑graph reasoning, REVEAL for multi‑source visual‑language pretraining, and AVIS for dynamic tool selection—and their experimental results and implications.

Language Model · Multimodal · Tool Integration
28 min read

Baobao Algorithm Notes
Oct 23, 2023 · Artificial Intelligence

Why Multimodal AI Agents Could Be the Next Killer App for Large Models

The article recounts a personal test of a multimodal AI agent in Newport Beach and expands into a detailed analysis of current multimodal LLM architectures, memory mechanisms, task planning, tool usage, personality modeling, cost constraints, evaluation challenges, and the broader social and reliability implications of deploying such agents.

AI agents · Evaluation · Multimodal
44 min read

DataFunTalk
Sep 26, 2023 · Artificial Intelligence

MiniGPT-4: Enhancing Vision‑Language Understanding with Large Language Models

This article presents MiniGPT-4, a multimodal system that combines a frozen visual encoder (Q‑Former + ViT) with an open‑source large language model (Vicuna), describes its motivation, training pipeline, demo capabilities, observed limitations, and includes a brief Q&A session.

AI research · Image Captioning · MiniGPT-4
15 min read

DataFunTalk
Sep 19, 2023 · Artificial Intelligence

Simultaneous Speech Translation: Technical Background, System Architecture, and Key Challenges

This article reviews the technical background of simultaneous speech translation, compares offline and real‑time scenarios, details ASR and MT technologies, describes the system architecture and design strategies, and discusses the major challenges and solutions for deploying robust, low‑latency translation services.

ASR · Huawei · Machine Translation
16 min read

DaTaobao Tech
Sep 13, 2023 · Artificial Intelligence

Integrating Large Language Models with Recommendation Systems: Paradigms, Methods, and Experiments

The article surveys how large language models can be integrated into recommendation systems, either as feature extractors or as end‑to‑end recommenders, showing that LLM‑derived semantics improve recall, ranking, diversity, and user experience, and outlining future multimodal, efficiency, and re‑ranking directions.

Embedding · LLM · Multimodal
19 min read

DataFunTalk
Sep 5, 2023 · Artificial Intelligence

Baidu Commercial Multimodal Understanding and AIGC Innovation Practices

This article presents Baidu's commercial multimodal understanding framework and AIGC innovations, detailing rich-media multimodal perception, the ViCAN‑12B multimodal representation‑generation model, scenario‑specific fine‑tuning, feature quantization for ranking, and practical applications such as marketing content generation, digital‑human video creation, and poster synthesis.

AIGC · Baidu · Multimodal
12 min read

Huolala Tech
Jul 21, 2023 · Artificial Intelligence

Visual Language Models Power Open-Set Detection and Surgical Tool Segmentation

Recent advances in visual language models enable zero-shot multimodal tasks, and this article explores their application to open-set object detection, prompt learning, and promptable surgical instrument segmentation, highlighting methods like CLIP, CoOp, and the DetPro framework with experimental results across multiple benchmarks.

Multimodal · Semantic Segmentation · computer vision
12 min read

360 Tech Engineering
Jul 6, 2023 · Artificial Intelligence

CSIG Enterprise Visit to Qihoo 360: Multimodal and Cross‑Modal Learning in the Era of Large Models

The CSIG‑hosted "Enterprise Visit – Into Qihoo 360" event on June 29, 2023 gathered over a thousand participants to explore multimodal and cross‑modal learning in the large‑model era, featuring keynote speeches from leading university researchers and Qihoo 360 AI experts, a tour of the company's facilities, and discussions on future AI research directions.

CSIG · Multimodal · Qihoo360
8 min read

Tencent Cloud Developer
Jun 28, 2023 · Artificial Intelligence

Prompt Engineering: Fundamentals, Techniques, and Advanced Strategies

Prompt engineering teaches how to craft effective instructions, context, input data, and output formats for large language models, using clear commands, iterative refinement, and advanced methods such as zero‑shot, few‑shot, chain‑of‑thought, Tree of Thoughts, retrieval‑augmented and progressive‑hint prompting to achieve precise, reliable results across diverse tasks.

AI · Chain-of-Thought · Multimodal
17 min read
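
The prompt-engineering summary lists zero-shot, few-shot and chain-of-thought prompting. As a minimal illustration (not taken from the article), a few-shot chain-of-thought prompt places worked examples, each with its reasoning written out, before the new question so the model continues in the same format; with no examples the same builder produces a zero-shot prompt. The function name and the `Q:`/`A:` layout are assumptions.

```python
from typing import List, Tuple

# Each example: (question, step-by-step reasoning, final answer)
Example = Tuple[str, str, str]


def build_cot_prompt(instruction: str, examples: List[Example], question: str) -> str:
    """Assemble a few-shot chain-of-thought prompt; with no examples it is zero-shot."""
    lines = [instruction.strip(), ""]
    for q, reasoning, answer in examples:
        lines.append(f"Q: {q}")
        lines.append(f"A: {reasoning} The answer is {answer}.")
        lines.append("")
    lines.append(f"Q: {question}")
    lines.append("A:")  # the model is expected to continue with reasoning, then an answer
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [(
        "A shop has 3 boxes of 12 pens and sells 10 pens. How many are left?",
        "3 boxes of 12 pens is 36 pens. 36 - 10 = 26.",
        "26",
    )]
    print(build_cot_prompt("Answer the math question.", demo,
                           "A class has 4 rows of 7 desks and 5 are broken. How many work?"))
```

Writing the reasoning out in each example, rather than only the answer, is what distinguishes chain-of-thought exemplars from ordinary few-shot ones.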

Efficient Ops
Jun 26, 2023 · Artificial Intelligence

How Multimodal AI Is Revolutionizing Credit Card Fraud Detection

Amid tightening financial regulations, ICBC's software team proposes a multimodal AI anti‑fraud framework that combines image, video, and structured data to detect deep‑fake, mask, and forged‑document attacks, enriches verification with cross‑modal cues, and outlines future expansion to text and speech modalities.

AI · Deep Learning · Multimodal
7 min read

DataFunSummit
Jun 14, 2023 · Artificial Intelligence

DataFun Summit 2023: Large Language Models and AIGC Conference

DataFun will host the DataFun Summit 2023 on June 17‑18, featuring three chairs and eight presenters who will discuss core topics such as large language model research, multimodal generation, reinforcement learning, tool learning, distributed training, and industry applications, with free registration via QR code.

AI Conference · AIGC · Large Language Models
42 min read

Rare Earth Juejin Tech Community
Jun 12, 2023 · Artificial Intelligence

Comprehensive Guide to Using OpenAI APIs: Models, Prompts, Embeddings, Fine‑Tuning, LangChain, and Multimodal Applications

This article provides a detailed, step‑by‑step tutorial on OpenAI’s language models, API endpoints, prompt engineering, embeddings, moderation, fine‑tuning, LangChain workflows, memory management, and multimodal capabilities such as audio transcription and image generation, complete with code examples and practical usage tips.

API · Embedding · LangChain
45 min read

Rare Earth Juejin Tech Community
Jun 11, 2023 · Artificial Intelligence

Comprehensive Technical Overview of GPT Series, Transformers, and Emerging Capabilities in Large Language Models

This article provides a detailed technical review of the evolution of GPT models, the Transformer architecture, large language model training methods, emergent abilities such as in‑context learning and chain‑of‑thought, multimodal extensions, and the challenges of data, scaling, and alignment, offering a holistic view for researchers and practitioners.

AI · GPT · InstructGPT
28 min read

NetEase LeiHuo Testing Center
Jun 2, 2023 · Artificial Intelligence

AI Techniques for a Global Search Platform: Word Segmentation, Text Similarity, Image Retrieval, and Multimodal Models

This article shares the development of a global search platform that leverages AI technologies such as Chinese word segmentation, part‑of‑speech tagging, text similarity via Simhash and Synonyms, image similarity using histogram, Hamming distance and ResNet‑50, and multimodal CLIP‑based models to improve search efficiency and accuracy.

AI · Multimodal · NLP
12 min read
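
The search-platform summary uses Simhash for text similarity and Hamming distance for comparing image hashes. Below is a minimal 64-bit Simhash sketch, assuming the text is already segmented into tokens (for Chinese, by a word segmenter like the one the article describes) and that every token has weight 1; BLAKE2b from the standard library stands in for whatever token hash the platform actually uses, and the function names are hypothetical. Near-duplicate texts tend to produce fingerprints with a small Hamming distance.

```python
import hashlib
from typing import Iterable

BITS = 64


def token_hash(token: str) -> int:
    # 8-byte BLAKE2b digest -> 64-bit integer (the choice of hash is an assumption)
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def simhash(tokens: Iterable[str]) -> int:
    """64-bit Simhash: per bit, +1 if a token's hash sets it, else -1; keep bits with positive sums."""
    counts = [0] * BITS
    for tok in tokens:
        h = token_hash(tok)
        for i in range(BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of differing bits; the same comparison works for perceptual image hashes."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    doc_a = "global search platform for game assets and design documents".split()
    doc_b = "global search platform for game assets and design docs".split()
    print(hamming(simhash(doc_a), simhash(doc_b)))  # usually small for near-duplicates
```

A platform would typically flag two documents as near-duplicates when the distance falls below a chosen threshold; the threshold is a tuning decision the summary does not specify.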

Programmer DD
May 5, 2023 · Artificial Intelligence

How Microsoft’s Bing Chat Upgrade Turns Search into an AI Copilot

Microsoft has fully opened Bing Chat to all users, introducing multimodal responses, a multilingual Image Creator, persistent chat history, and upcoming plugin support, while sharing usage statistics and outlining weekly update plans that position Bing as an AI‑driven search copilot competing with ChatGPT.

AI · BingChat · Microsoft
8 min read

DataFunSummit
Apr 20, 2023 · Artificial Intelligence

Mengzi Lightweight Model Technology System and Advances in Small‑Scale and Retrieval‑Augmented Pretraining

This presentation introduces the Mengzi lightweight model technology stack, covering large‑scale pre‑training, motivations for lightweight models, detailed techniques such as knowledge and sequence‑relation enhancement, training optimization, model compression, retrieval‑augmented pre‑training, multimodal extensions, open‑source releases, and real‑world applications.

Knowledge Distillation · Large Language Models · Multimodal
0 likes · 23 min read
21CTO
Apr 2, 2023 · Artificial Intelligence

Can GPT‑4 Be Considered Early AGI? Insights from Microsoft’s 155‑Page Study

This article reviews Microsoft's 155‑page paper on early experiments with GPT‑4. It covers how close the model comes to artificial general intelligence, the testing methodology, multimodal capabilities, programming and mathematical performance, interaction with tools and humans, limitations, societal impact, and future research directions.

AI safety · Artificial General Intelligence · GPT-4
0 likes · 15 min read
DataFunTalk
Apr 1, 2023 · Artificial Intelligence

Nvidia Meets OpenAI: Highlights from the GTC Fireside Chat on GPT‑4, Deep Learning History, and the Future of AI

In a GTC fireside chat, Nvidia CEO Jensen Huang and OpenAI co‑founder Ilya Sutskever discuss GPT‑4's multimodal advances, the evolution of deep learning from early neural networks to large‑scale models, the pivotal role of GPUs and datasets like ImageNet, and their vision for more reliable, scalable artificial intelligence.

Artificial Intelligence · Deep Learning · GPT-4
0 likes · 10 min read
Programmer DD
Mar 22, 2023 · Artificial Intelligence

How Baidu’s Ernie Bot Stacks Up Against GPT‑4: A Deep Dive

The article reviews Baidu’s newly launched Ernie Bot, a multimodal large language model, comparing its literary, business, mathematical, Chinese comprehension, and multimodal abilities with GPT‑4, while detailing the underlying technologies, knowledge‑enhancement techniques, and deployment strategy behind the model.

AI comparison · Baidu · Ernie Bot
0 likes · 10 min read
Python Programming Learning Circle
Mar 18, 2023 · Artificial Intelligence

Baidu’s ERNIE Bot (Wenxin Yiyan) Launch: Features, Use Cases, and Technical Architecture

Baidu unveiled its new generative AI chatbot ERNIE Bot, showcasing five practical scenarios, multimodal generation, a detailed technical stack based on the ERNIE and PLATO models, and a comparison with ChatGPT and Bing Chat, while also announcing its invitation‑only testing program and API access for enterprises.

Artificial Intelligence · Baidu · Chatbot
0 likes · 12 min read
Architecture Digest
Mar 17, 2023 · Artificial Intelligence

Baidu’s Ernie Bot (Wenxin Yiyan) vs GPT‑4: Capabilities, Technical Foundations, and Market Reaction

The article reviews Baidu's launch of the multimodal large language model Wenxin Yiyan, compares its literary, business, mathematical, Chinese‑understanding and multimodal abilities with GPT‑4, explains the six core technologies and hardware stack behind it, and reports the mixed reaction from the market and netizens.

AI · Baidu · Ernie Bot
0 likes · 11 min read
DataFunSummit
Mar 15, 2023 · Artificial Intelligence

Key Features and Capabilities of OpenAI's GPT‑4

OpenAI's GPT‑4, a large multimodal language model, raises the context (token) limit and adds image understanding. It performs strongly on reasoning in professional exams and supports many languages. It is already integrated into Microsoft Bing and offers several access options, with improved safety over its predecessor.

AI · GPT-4 · Microsoft Bing
0 likes · 9 min read
Alimama Tech
Feb 1, 2023 · Artificial Intelligence

CapOnImage: Context-driven Dense Captioning on Images

The paper presents CapOnImage, a novel captioning‑on‑image task that generates location‑specific decorative text to be placed directly on product images. It introduces the 2.1‑million‑image CapOnImage2M dataset and proposes a mixed‑modality transformer with position‑aware pre‑training and progressive training. The method achieves superior accuracy and diversity and is already deployed in Alibaba's advertising platforms with measurable business impact.

Context-Aware · Deep Learning · Image Captioning
0 likes · 9 min read
NetEase Cloud Music Tech Team
Jan 4, 2023 · Artificial Intelligence

Relevance Modeling and Ranking for Cloud Music Video Search

The paper details Cloud Music’s video‑search pipeline—query understanding, recall, relevance, ranking and re‑ranking—highlighting challenges such as ambiguous content, timeliness and multi‑objective goals, and describes two deployed models (a twin‑tower aspect relevance network and a click‑graph propagator) that together boost click‑through rate by 1.5 % and effective CTR by 2.3 %.

Multimodal · Ranking · click graph
0 likes · 24 min read
DataFunTalk
Dec 17, 2022 · Artificial Intelligence

Multimodal Pre‑training Techniques and Applications – Overview, OPPOVL Dataset, Architecture, and Performance

This article presents a comprehensive overview of multimodal pre‑training, describing its motivation, architecture choices, large‑scale Chinese image‑text dataset construction, training optimizations, performance benchmarks, downstream applications, and a Q&A session that highlights practical deployment considerations.

Deep Learning · Large-Scale Data · Multimodal
0 likes · 16 min read
DataFunSummit
Oct 20, 2022 · Artificial Intelligence

End-to-End Speech Relation Extraction

This paper presents an end‑to‑end approach for extracting relational triples directly from speech signals, bypassing intermediate transcription, and demonstrates its effectiveness on synthesized speech versions of the CoNLL04 and TACRED datasets, highlighting challenges such as length constraints and cross‑modal alignment.

End-to-End · Multimodal · natural language processing
0 likes · 17 min read
HelloTech
Oct 19, 2022 · Artificial Intelligence

Intelligent Creative System: Types, Quality Evaluation, Generation Models, and Optimization

The Intelligent Creative System defines advertising creatives across formats and evaluates image and text quality with reference‑based metrics as well as blind (no‑reference) models such as DeepBIQ. It generates multimodal ads via GANs and Transformers, then selects the best variants through bandit‑based CTR prediction and multimodal fusion, enabling scalable, data‑driven creative production.

AI · Bandit Model · GAN
0 likes · 10 min read
DataFunTalk
Sep 13, 2022 · Artificial Intelligence

Intelligent Question Answering in QQ Browser Search: Background, Key Technologies, and Frontier Research

This article presents an in‑depth overview of intelligent question answering in QQ Browser search, covering its background, the core KBQA and DeepQA technologies, system architecture, challenges, recent advances such as end‑to‑end, knowledge‑guided and multimodal QA, and practical Q&A for deployment.

AI · Deep Learning · Multimodal
0 likes · 22 min read
NetEase Cloud Music Tech Team
Aug 17, 2022 · Artificial Intelligence

Live Streaming Recommendation Practices in NetEase Cloud Music: Real-time, Multi-target, and Multimodal Approaches

The paper describes NetEase Cloud Music’s LOOK live‑streaming recommendation system for the song‑playback page, which combines millisecond‑level real‑time feature pipelines, multi‑target optimization (click, watch, gift, comment) via ESMM+FM and MMoE models, GradNorm‑based loss fusion, and a multimodal avatar‑text‑host ranking model, achieving double‑digit CTR and CTCVR gains while balancing producer and consumer retention.

ESMM · GradNorm · Live Streaming
0 likes · 26 min read
Alibaba Cloud Big Data AI Platform
Jul 29, 2022 · Artificial Intelligence

Unlock Chinese Text-to-Image Generation with EasyNLP’s Open‑Source Models

This article introduces EasyNLP’s newly integrated Chinese text‑to‑image generation framework, explains the underlying Transformer‑VQGAN architecture, provides model specifications, code snippets, performance benchmarks on multiple datasets, and step‑by‑step tutorials for fine‑tuning and inference using open‑source checkpoints.

AI generation · Chinese NLP · EasyNLP
0 likes · 20 min read
DataFunSummit
Jul 27, 2022 · Artificial Intelligence

DataFun 2022 Natural Language Processing Summit – Leading Experts Discuss Large‑Scale Language Models, Multimodal Understanding, Dialogue Systems and AI Applications

The DataFun 2022 NLP Summit, scheduled for July 30, brings together top researchers and industry leaders from Alibaba, Baidu, Microsoft, Amazon, and other companies. They will present the latest advances in large‑scale pre‑training, multimodal perception, information extraction, dialogue interaction, machine translation, and practical AI deployments. The event offers live streaming and free registration via QR code.

AI · Dialogue Systems · Large Language Models
0 likes · 44 min read
DataFunSummit
Jul 27, 2022 · Artificial Intelligence

Intelligent Creative Advertising: Content Understanding, Generation, and Distribution at JD.com

This article presents JD.com's end‑to‑end intelligent creative system, covering the background of content‑driven e‑commerce, a multi‑stage content understanding pipeline, AI‑powered video, image and copy generation, multimodal creative selection and distribution, and real‑world business impact.

AI · Advertising · Multimodal
0 likes · 27 min read
Alimama Tech
Jul 13, 2022 · Artificial Intelligence

Fully Automatic Template‑Free Image‑Text Creative Generation System

Alibaba Alimama’s fully automatic, template‑free image‑text creative generation system uses deep‑learning models across material mining, layout synthesis, on‑image copy generation, and visual attribute rendering to produce personalized ad creatives directly from product images and metadata, achieving roughly 19 % CTR lift over prior template‑based methods.

AI · Ad Creative · Automation
0 likes · 19 min read
DataFunTalk
Jul 9, 2022 · Artificial Intelligence

Education Knowledge Graph: Opportunities and Challenges

The article provides a comprehensive overview of education knowledge graphs, explaining their definition, significance, diverse application scenarios such as smart textbooks, deep reading, subject insight, and intelligent services, while also analyzing technical challenges like data heterogeneity, granularity, multimodality, quality control, and proposing future research directions.

Artificial Intelligence · Intelligent Tutoring · Multimodal
0 likes · 25 min read
Xiaohongshu Tech REDtech
Jun 20, 2022 · Artificial Intelligence

Action Sequence Verification in Videos with CosAlignment Transformer (CAT)

The paper introduces Action Sequence Verification (ASV), a task that determines whether two videos follow the same ordered actions, provides the Chemical Sequence Verification dataset and re‑annotated COIN‑SV and Diving48‑SV sets, and proposes the CosAlignment Transformer (CAT) with intra‑step feature extraction, a Transformer‑based inter‑step encoder, and a sequence‑alignment loss that outperforms prior baselines and serves as a pre‑training model for video retrieval and classification.

Action Verification · Multimodal · Transformer
0 likes · 7 min read
JD Retail Technology
Jun 16, 2022 · Artificial Intelligence

2022 Global AI Technology Innovation Competition – Algorithm Challenge: Connecting AI with E‑commerce

The 2022 Global AI Technology Innovation Competition – Algorithm Challenge, co‑hosted by JD Retail and academic partners, brought together 12 finalist teams from over 3,000 entrants to tackle e‑commerce‑focused AI problems such as multimodal image‑text matching and product‑title entity recognition, highlighting real‑world business impact and fostering talent exchange.

AI competition · JD Retail · Multimodal
0 likes · 8 min read
AntTech
Jun 15, 2022 · Artificial Intelligence

XYLayoutLM: Towards Layout-Aware Multimodal Networks for Visually-Rich Document Understanding

XYLayoutLM introduces a layout‑aware multimodal network that improves visually‑rich document understanding by augmenting XY‑Cut for robust reading order generation and employing a Dilated Conditional Position Encoding to handle variable‑length inputs, achieving state‑of‑the‑art performance on XFUN and FUNSD datasets.

Multimodal · Vision Transformer · XYCut
0 likes · 10 min read
DaTaobao Tech
May 27, 2022 · Artificial Intelligence

Multimodal Pretraining for Search Recall in E-commerce

The paper proposes a multimodal pre‑training framework that jointly encodes query text and item titles with images via shared and single‑stream towers, using MLM, MPM, QIC, and matching tasks, and demonstrates substantial Recall@K gains on a billion‑item e‑commerce catalog by leveraging visual cues to bridge the semantic gap.

Multimodal · Vector Retrieval · e-commerce
0 likes · 17 min read
DataFunTalk
May 20, 2022 · Artificial Intelligence

Hierarchical Graph Convolutional Networks for Video Social Relationship Modeling

This article presents a multimodal approach that combines dynamic analysis and graph machine learning to generate and apply social relationship graphs in videos, detailing problem background, graph generation modules, applications such as video retrieval, experimental results, and future research directions.

AI · Graph Neural Network · Multimodal
0 likes · 11 min read
Laiye Technology Team
May 18, 2022 · Artificial Intelligence

Overview of Document Intelligence Models: StrucText, LayoutLMv3, and GraphDoc

This article reviews three representative document intelligence models—StrucText, LayoutLMv3, and GraphDoc—detailing their input features, feature fusion strategies, self‑supervised tasks, and underlying architectures, and explains how they learn embeddings for segments, words, or regions to enable classification and key‑value extraction.

Graph Neural Networks · Layout Analysis · Multimodal
0 likes · 15 min read
Bilibili Tech
May 10, 2022 · Artificial Intelligence

Glance Supervised Video Moment Retrieval via the ViGA Framework

The paper presents a glance‑supervised approach to video moment retrieval in which annotators label only a single frame ("glance") inside each target moment. It introduces the ViGA contrastive learning framework to exploit this weak temporal cue. On three benchmarks, ViGA performs comparably to fully supervised methods while keeping annotation cost minimal.

Glance Supervision · Multimodal · ViGA
0 likes · 8 min read
Tencent Tech
Apr 21, 2022 · Artificial Intelligence

How Tencent’s HunYuan Model Dominated All Major Video Retrieval Benchmarks

Tencent's newly unveiled HunYuan AI model achieved a grand slam, ranking first on the five most authoritative cross‑modal video retrieval datasets. It uses a hierarchical multimodal approach that substantially improves retrieval precision and could have broad impact on both research and industry applications.

AI · Multimodal · Tencent
0 likes · 5 min read
DaTaobao Tech
Apr 6, 2022 · Artificial Intelligence

Improving New User Experience in Taobao Live Recommendation via Multi‑Channel Lifelong Product Sequence Modeling

The paper tackles Taobao Live’s cold‑start problem for new users by introducing a multi‑channel lifelong product‑sequence network that enriches purchase histories with side information, extracts relevance‑focused subsequences across five channels, and integrates them via target‑attention DIN, achieving substantial offline and online performance gains, especially for low‑activity users.

Multimodal · Recommendation Systems · User Modeling
0 likes · 23 min read
DataFunTalk
Mar 28, 2022 · Artificial Intelligence

Construction and Application of Meituan's On‑site Comprehensive Knowledge Graph

This article introduces Meituan's on‑site comprehensive knowledge graph, detailing its multi‑layer design, data‑driven construction pipeline, challenges of diverse user demands and industry complexity, and showcases practical applications in search, recommendation, intelligent display, as well as future expansion plans.

Meituan · Multimodal · knowledge graph
0 likes · 22 min read
DataFunTalk
Jan 22, 2022 · Artificial Intelligence

Multimodal Content Understanding Techniques in Search Systems

This talk presents Tencent's multimodal content understanding framework for search, covering hierarchical content features, large‑scale ranking, fine‑grained image semantic vectors, video and document analysis, quality detection, duplicate removal, and future directions in AI‑driven search.

AI · Image Embedding · Multimodal
0 likes · 17 min read
DataFunTalk
Dec 26, 2021 · Artificial Intelligence

Neural–Symbolic Learning and Multimodal Knowledge Discovery: Recent Advances, Methods, and Challenges

This talk reviews recent progress in neural‑symbolic learning and multimodal knowledge discovery, highlighting examples such as GPT‑3 reasoning failures, the need for symbolic knowledge, historical developments, various integration methods, challenges in multimodal knowledge graphs, and future research directions.

AI · Multimodal · Neural-symbolic
0 likes · 20 min read
Alibaba Cloud Developer
Dec 6, 2021 · Artificial Intelligence

Can AI Design Full Clothing Lines? Inside Alibaba’s M6-UFC Generator

Alibaba’s DAMO Academy and Tsinghua University introduced M6‑UFC, a non‑autoregressive multimodal transformer that unifies arbitrary text and image controls to generate high‑quality, editable fashion designs, dramatically reducing carbon emissions and outperforming GAN‑based models in fidelity and relevance while accelerating production speed.

AI · M6-UFC · Multimodal
0 likes · 11 min read
DataFunSummit
Dec 3, 2021 · Artificial Intelligence

Real‑Time Voice Dialogue: Practices, Challenges, and Duplex Conversation

This article presents an in‑depth overview of Alibaba's real‑time voice dialogue system, covering the Hotline XiaoMi robot, the unique challenges of spoken interactions such as colloquialism, multimodality and duplex communication, and the research advances in ASR‑robust SLU, emotion detection, colloquial processing, and duplex conversation modeling.

ASR · Multimodal · SLU
0 likes · 22 min read
AntTech
Oct 29, 2021 · Artificial Intelligence

Ant Insurance Technology and CASIA Win Two Tracks at MuSe2021 Multimodal Sentiment Challenge (ACM MM 2021)

The Ant Insurance Technology team, together with the Institute of Automation of the Chinese Academy of Sciences, secured first place in both the MuSe‑Wilder and MuSe‑Sent tracks of the MuSe2021 Multimodal Sentiment Challenge held at the 29th ACM International Conference on Multimedia in Chengdu, showcasing advanced multimodal AI techniques.

BiLSTM · Deep Learning · MuSe2021
0 likes · 4 min read
DataFunTalk
Sep 30, 2021 · Artificial Intelligence

Advances in Knowledge Graph Construction and Applications by Alibaba's AliMe Team

This article presents Alibaba's AliMe team's year‑long progress on knowledge graph research, covering the fundamentals of knowledge graphs, domain and multimodal graph construction techniques, practical e‑commerce applications such as dialogue‑driven recommendation, virtual‑anchor script generation, and insights on future directions.

AI · Multimodal · dialogue system
0 likes · 23 min read
DataFunSummit
Sep 26, 2021 · Artificial Intelligence

Contrastive Learning and Its Applications in Weibo Content Representation

This article explains the fundamentals of contrastive learning, reviews typical models such as SimCLR, MoCo, SwAV, BYOL, SimSiam and Barlow Twins, and demonstrates how these methods are applied to Weibo text and multimodal (text‑image) representation tasks like hashtag generation and image‑text matching.

Multimodal · NLP · Weibo
0 likes · 18 min read
Meituan Technology Team
Sep 2, 2021 · Artificial Intelligence

Construction and Application of Retail Product Knowledge Graph at Meituan

The paper describes Meituan's retail product knowledge graph, a multi‑layered, multimodal system that structures billions of SKUs, attributes, and user insights. It is built with hierarchical categories, graph‑enhanced NER, semi‑supervised learning, and expert‑in‑the‑loop validation, and it enables precise search, ranking, recommendation, and real‑time merchant optimization.

AI · Multimodal · Retail
0 likes · 25 min read
DataFunTalk
Aug 30, 2021 · Artificial Intelligence

Contrastive Learning: Foundations, Typical Models, and Applications to Weibo Content Representation

This article explains the concept of contrastive learning, its relationship to self‑supervised and metric learning, describes key system components and loss functions, reviews major image, NLP and multimodal models such as SimCLR, MoCo, SwAV, BYOL, and demonstrates how contrastive learning is applied to Weibo hashtag generation, similar‑post retrieval, and text‑image matching using CD‑TOM and W‑CLIP models.

AI · Multimodal · Weibo
0 likes · 19 min read
Tencent Advertising Technology
Aug 18, 2021 · Artificial Intelligence

2021 Tencent Advertising Algorithm Competition: Winners, Accepted Papers, and Reviewer Feedback

The 2021 Tencent Advertising Algorithm Competition, held as the ACM MM 2021 Grand Challenge, announced the top three teams for two tracks, presented the accepted multimodal video advertising papers with detailed reviewer comments, and highlighted the significance of algorithmic innovation over ranking alone.

ACM MM · AI · Advertising
0 likes · 8 min read
DataFunTalk
Jul 12, 2021 · Artificial Intelligence

Tencent Music Live Streaming Recommendation System: Architecture, Challenges, and Model Design

This article presents an in‑depth overview of Tencent Music's live‑streaming recommendation system, covering business background, system architecture, recall and ranking model designs, multi‑modal extensions, and advanced training techniques such as DSSM, ESMM, GradNorm, and CGC to improve user engagement and conversion.

AI · DSSM · Live Streaming
0 likes · 13 min read
DataFunTalk
Jul 1, 2021 · Artificial Intelligence

Pre‑Trained Models: Past, Present, and Future – A Comprehensive Survey

This article surveys the evolution of pre‑trained models, covering the origins of transfer and self‑supervised learning, the rise of transformer‑based PTMs such as BERT and GPT, efficient architecture designs, multimodal and multilingual extensions, theoretical analyses, and future research directions for scalable and robust AI systems.

AI research · Efficient Training · Large Language Models
0 likes · 27 min read
Xianyu Technology
Jul 1, 2021 · Artificial Intelligence

Improving Search Relevance in Xianyu: System Design and Model Implementation

The paper describes Xianyu’s new relevance‑matching pipeline—integrating basic, text‑matching, semantic (BERT‑based dual‑tower), multimodal, and click‑graph features and fusing them with a GBDT model—which boosts search DCG@10 by 6.5 %, query satisfaction by 24 % and click interaction by over 20 % while outlining future enhancements for finer attribute matching and richer structured data.

Multimodal · Ranking · e-commerce
0 likes · 13 min read
Tencent Advertising Technology
May 28, 2021 · Artificial Intelligence

Insights from the Tencent Advertising Algorithm Competition: Model Framework and Optimization Strategies

The article shares a Tencent competition champion’s practical TensorFlow‑based video ad solution, detailing data handling, model architecture, optimization tricks, multimodal fusion techniques, and experimental observations to help participants improve performance in the 2021 Tencent Advertising Algorithm Contest.

Multimodal · TensorFlow · advertising algorithm
0 likes · 7 min read