Tagged articles

Multimodal

422 articles · Page 5 of 5
DataFunTalk
Feb 25, 2021 · Artificial Intelligence

Applying Graph Embedding and Vector Recall for Personalized Recommendation in a UGC Community

This article describes how a UGC app tackled user and content cold‑start problems by introducing a personalized vector‑recall pipeline based on network representation learning and multimodal embeddings, detailing graph construction, GraphSAGE and GAT implementations, offline experiments, A/B test results, and future directions.

GNN · Multimodal · graph-embedding
0 likes · 14 min read
JD Cloud Developers
Feb 5, 2021 · Artificial Intelligence

2020 NLP Milestones & Future Trends: Insights from JD’s AI Scientist

In an InfoQ interview, JD Technology senior algorithm scientist Wu Youzheng reviews the rapid advances of natural language processing in 2020—including GPT‑3, multimodal dialogue, knowledge‑enhanced pre‑training, and knowledge graphs—while outlining the most promising research directions and practical challenges for the coming year.

AI Applications · GPT-3 · Multimodal
0 likes · 18 min read
DataFunTalk
Nov 16, 2020 · Artificial Intelligence

Deep Semantic Relevance and Multimodal Video Search at Alibaba Entertainment

The presentation by Alibaba Entertainment's senior algorithm expert details the challenges of video search in the 4G/5G era and describes a comprehensive framework covering business overview, relevance and ranking, multimodal retrieval, deep semantic modeling, dataset construction, and practical deployment techniques.

Deep Learning · Information Retrieval · Multimodal
0 likes · 27 min read
JD Cloud Developers
Nov 4, 2020 · Artificial Intelligence

Multimodal AI Breakthroughs Unveiled at NLPCC 2020 Workshop

The article recaps the inaugural Multimodal Natural Language Processing workshop at NLPCC 2020, highlighting breakthroughs in multimodal summarization, pre‑training models, AI‑driven art, visual‑language interaction, and multimodal dialogue systems, and showcases research from leading institutions and industry partners.

AI · Multimodal · NLP
0 likes · 9 min read
DataFunSummit
Nov 3, 2020 · Artificial Intelligence

Deep Semantic Relevance and Multi‑Modal Video Search at Alibaba Entertainment

This presentation details Alibaba Entertainment's video search system, covering its business scope, user‑value metrics, a layered algorithm framework, relevance challenges, multi‑modal retrieval, deep semantic relevance techniques, model selection, asymmetric twin‑tower deployment, multi‑stage knowledge distillation, and practical effect cases.

Alibaba · Multimodal · Search Algorithms
0 likes · 25 min read
Alibaba Cloud Developer
Jun 2, 2020 · Artificial Intelligence

How FashionBERT Boosts E‑Commerce Image‑Text Matching with Patch Embeddings

This article introduces FashionBERT, a multimodal BERT‑based model that replaces ROI‑based image tokens with uniform image patches to overcome e‑commerce specific challenges, details its architecture, adaptive loss balancing, deployment in Alibaba search, and reports significant performance gains on public and internal datasets.

BERT · Deep Learning · Image-Text Matching
0 likes · 13 min read
Didi Tech
May 25, 2020 · Artificial Intelligence

How Didi Harnesses Cutting‑Edge Speech Recognition: From ASR Basics to Transformer Models

This article provides a comprehensive technical overview of modern speech recognition, covering Didi’s driver‑assistant and smart‑customer‑service applications, fundamental ASR concepts, classic GMM‑HMM methods, deep‑learning breakthroughs such as DNN‑HMM, CTC, attention‑based and transformer models, practical training tricks, signal‑processing steps, and multimodal fusion techniques.

ASR · CTC · Deep Learning
0 likes · 16 min read
DataFunTalk
May 23, 2020 · Artificial Intelligence

iQIYI Deep Semantic Representation Learning Framework for Video Recommendation and Search

Based on academic and industry experience, iQIYI has designed a deep semantic representation learning framework that integrates multimodal side information and deep models such as Transformers and graph neural networks, improving recall, ranking, deduplication, diversity and semantic matching across recommendation and search scenarios.

Deep Learning · Graph Neural Networks · Multimodal
0 likes · 27 min read
iQIYI Technical Product Team
May 15, 2020 · Artificial Intelligence

iQIYI Deep Semantic Representation Learning Framework: Design, Challenges, and Applications

iQIYI’s deep semantic representation learning framework integrates multimodal content, knowledge graphs, and user behavior through layered data, feature, strategy, and application components, employing early, late, and hybrid fusion with Transformers, GCNs, and other deep models to deliver high‑quality embeddings that boost recommendation, search, and streaming performance across dozens of business scenarios.

Graph Neural Networks · Multimodal · Search
0 likes · 28 min read
DataFunTalk
Apr 20, 2020 · Artificial Intelligence

Video Search at Youku: Algorithmic Practices, Relevance, Ranking, and Multimodal Techniques

This article presents a comprehensive overview of Youku's video search system, covering business background, evaluation metrics, system and algorithm frameworks, relevance and ranking feature engineering, dataset construction, semantic matching, multimodal video understanding, and practical case studies that illustrate the impact of deep learning and AI techniques on search performance.

AI · Deep Learning · Multimodal
0 likes · 18 min read
Youku Technology
Apr 16, 2020 · Artificial Intelligence

Multimodal Video Classification: Image Feature Improvements and System Insights

The talk presents Alibaba’s hierarchical video‑category system and a multimodal classification pipeline that leverages EfficientNet, NeXtVLAD fusion, attention‑dropping augmentation, and MoCo contrastive learning. Together these boost cold‑start recall by 43% and improve program classification by over 20%, setting the stage for larger models and advanced unsupervised methods.

AI · EfficientNet · Multimodal
0 likes · 17 min read
DataFunTalk
Mar 30, 2020 · Artificial Intelligence

Enhancing Multimodal Video Classification with Improved Image Features and Category System

This article presents a comprehensive overview of Alibaba Entertainment's category system and multimodal video classification algorithm, detailing the construction of a high‑accuracy hierarchical taxonomy, improvements to image feature extraction using EfficientNet and data augmentation, unsupervised training techniques, experimental results, practical pitfalls, and future research directions.

AI · Multimodal · category system
0 likes · 17 min read
iQIYI Technical Product Team
Mar 27, 2020 · Artificial Intelligence

Multimodal Short Video Content Tagging Techniques and Applications at iQIYI

The article surveys iQIYI’s multimodal short‑video content‑tagging pipeline, detailing extraction‑ and generation‑based methods, challenges of open‑world tags, model evolution from rule‑based to Transformer generators, visual‑text fusion techniques, and applications such as recommendation, search, clustering, and future enhancements.

Multimodal · NLP · content tagging
0 likes · 18 min read
DataFunTalk
Feb 3, 2020 · Artificial Intelligence

Alibaba Entertainment Search Algorithm Practice and Insights – Video Search Case Study with Youku

In this live session, Alibaba Entertainment’s senior algorithm expert discusses Youku’s video search business, relevance and ranking models, multimodal search challenges, and practical AI techniques, giving attendees a comprehensive view of modern video retrieval systems and their implementation.

AI · Information Retrieval · Multimodal
0 likes · 3 min read
DataFunTalk
Nov 20, 2019 · Artificial Intelligence

Advances and Reflections on Human‑Machine Dialogue Technologies

This presentation reviews recent progress in spoken and multimodal dialogue systems, covering X‑driven architectures, task‑oriented and open‑domain approaches, NLU/DM integration, FAQ, KB/KG‑driven methods, document‑driven dialogue, and outlines remaining challenges and future research directions.

Artificial Intelligence · Dialogue Systems · Multimodal
0 likes · 21 min read
iQIYI Technical Product Team
Sep 27, 2019 · Artificial Intelligence

iQIYI-VID: A Large-Scale Multimodal Video Dataset for Person Recognition

iQIYI-VID is the world’s largest multimodal video dataset for person recognition, containing 10,000 celebrity identities and 600,000 video clips drawn from millions of videos, supporting tasks such as detection, identification, attribute and audio analysis, and serving as the basis for 2018‑2019 challenges and a face‑recognition subset, thereby driving research while performance gaps remain.

AI · Multimodal · iQIYI-VID
0 likes · 7 min read
iQIYI Technical Product Team
Aug 9, 2019 · Artificial Intelligence

iQIYI 2019 Multimodal Video Person Recognition Competition Report by Zheey Team

The Zheey team from Beijing University of Posts and Telecommunications tackled the iQIYI 2019 Multimodal Video Person Recognition Challenge with a three‑layer MLP on the official face features, raising the baseline score from 0.8742 to 0.8949 through model fusion, quality filtering and fine‑tuning, ultimately ranking sixth and open‑sourcing their code.

MLP · Multimodal · competition
0 likes · 9 min read
iQIYI Technical Product Team
Jun 6, 2019 · Artificial Intelligence

Large-Scale Hierarchical Classification Algorithm for iQIYI Short Videos

iQIYI’s large‑scale hierarchical classification system combines multimodal text and image embeddings, low‑rank multimodal fusion, and a dense hierarchical multilabel network with cascade‑style weighting to assign accurate type tags to short videos, boosting production efficiency and personalized recommendation diversity.

AI · Hierarchical Classification · Multimodal
0 likes · 16 min read
iQIYI Technical Product Team
Apr 4, 2019 · Artificial Intelligence

My Experience and Methods in the iQIYI Multimodal Person Recognition Challenge

In the iQIYI Multimodal Person Recognition Challenge, I leveraged the provided facial features, weighted face‑quality averaging, DBSCAN‑based noise clustering and a dynamic extra noise class within an iterative KNN‑to‑neural‑network training pipeline, ultimately finishing in the top 5 and open‑sourcing the full workflow on GitHub.

DBSCAN · Multimodal · iQIYI
0 likes · 7 min read
iQIYI Technical Product Team
Mar 22, 2019 · Artificial Intelligence

Experience Report of the 2018 iQIYI Multimodal Video Person Identification Challenge (WitcheR Team)

The WitcheR team won the 2018 iQIYI multimodal video person identification challenge by building a fast pipeline that combined a custom face‑and‑keypoint detector, ArcFace‑trained face embeddings, scene classification, and a three‑layer MLP with several training tricks, achieving a final mAP of 88.6% and demonstrating the value of rapid idea validation and open‑sourced code for future challenges.

MLP · Model Fusion · Multimodal
0 likes · 12 min read
DataFunTalk
Mar 15, 2019 · Artificial Intelligence

Designing Personalized, Dynamic, and Multimodal Knowledge Graphs for Chatbots

The article explores how chatbots require personalized dense knowledge graphs, dynamic temporal graphs, subjective emotion modeling, integration with external services, and multimodal media support, while also promoting a new NLP book and a related giveaway for readers.

AI · Chatbot · Dynamic Graph
0 likes · 9 min read