Tagged articles

Multimodal

422 articles · Page 5 of 5

Feb 25, 2021 · Artificial Intelligence

Applying Graph Embedding and Vector Recall for Personalized Recommendation in a UGC Community

This article describes how a UGC app tackled user and content cold‑start problems by introducing a personalized vector‑recall pipeline based on network representation learning and multimodal embeddings, detailing graph construction, GraphSAGE and GAT implementations, offline experiments, A/B test results, and future directions.

GNN · Multimodal · graph-embedding

0 likes · 14 min read

JD Cloud Developers

Feb 5, 2021 · Artificial Intelligence

2020 NLP Milestones & Future Trends: Insights from JD’s AI Scientist

In an InfoQ interview, JD Technology senior algorithm scientist Wu Youzheng reviews the rapid advances of natural language processing in 2020—including GPT‑3, multimodal dialogue, knowledge‑enhanced pre‑training, and knowledge graphs—while outlining the most promising research directions and practical challenges for the coming year.

AI Applications · GPT-3 · Multimodal

0 likes · 18 min read

DataFunTalk

Nov 16, 2020 · Artificial Intelligence

Deep Semantic Relevance and Multimodal Video Search at Alibaba Entertainment

The presentation by Alibaba Entertainment's senior algorithm expert details the challenges of video search in the 4G/5G era and describes a comprehensive framework covering business overview, relevance and ranking, multimodal retrieval, deep semantic modeling, dataset construction, and practical deployment techniques.

Deep Learning · Information Retrieval · Multimodal

0 likes · 27 min read

JD Cloud Developers

Nov 4, 2020 · Artificial Intelligence

Multimodal AI Breakthroughs Unveiled at NLPCC 2020 Workshop

The article recaps the inaugural Multimodal Natural Language Processing workshop at NLPCC 2020, highlighting breakthroughs in multimodal summarization, pre‑training models, AI‑driven art, visual‑language interaction, and multimodal dialogue systems, and showcases research from leading institutions and industry partners.

AI · Multimodal · NLP

0 likes · 9 min read

DataFunSummit

Nov 3, 2020 · Artificial Intelligence

Deep Semantic Relevance and Multi‑Modal Video Search at Alibaba Entertainment

This presentation details Alibaba Entertainment's video search system, covering its business scope, user‑value metrics, a layered algorithm framework, relevance challenges, multi‑modal retrieval, deep semantic relevance techniques, model selection, asymmetric twin‑tower deployment, multi‑stage knowledge distillation, and practical effect cases.

Alibaba · Multimodal · Search Algorithms

0 likes · 25 min read

DataFunTalk

Jul 8, 2020 · Artificial Intelligence

Multi‑Level Multi‑Modal Search Engine and Graph Engine for Video Content at Youku

The article presents a detailed technical overview of Youku's video search system, covering multi‑modal inputs, multi‑level element indexing, face search, cross‑level and cross‑modal retrieval, and the design and applications of a multimodal graph engine with knowledge‑graph integration.

AI · Multimodal · face search

0 likes · 12 min read

Alibaba Cloud Developer

Jun 2, 2020 · Artificial Intelligence

How FashionBERT Boosts E‑Commerce Image‑Text Matching with Patch Embeddings

This article introduces FashionBERT, a multimodal BERT‑based model that replaces ROI‑based image tokens with uniform image patches to overcome e‑commerce specific challenges, details its architecture, adaptive loss balancing, deployment in Alibaba search, and reports significant performance gains on public and internal datasets.

BERT · Deep Learning · Image-Text Matching

0 likes · 13 min read

Didi Tech

May 25, 2020 · Artificial Intelligence

How Didi Harnesses Cutting‑Edge Speech Recognition: From ASR Basics to Transformer Models

This article provides a comprehensive technical overview of modern speech recognition, covering Didi’s driver‑assistant and smart‑customer‑service applications, fundamental ASR concepts, classic GMM‑HMM methods, deep‑learning breakthroughs such as DNN‑HMM, CTC, attention‑based and transformer models, practical training tricks, signal‑processing steps, and multimodal fusion techniques.

ASR · CTC · Deep Learning

0 likes · 16 min read

DataFunTalk

May 23, 2020 · Artificial Intelligence

iQIYI Deep Semantic Representation Learning Framework for Video Recommendation and Search

Based on academic and industry experience, iQIYI has designed a deep semantic representation learning framework that integrates multimodal side information and deep models such as Transformers and graph neural networks, improving recall, ranking, deduplication, diversity and semantic matching across recommendation and search scenarios.

Deep Learning · Graph Neural Networks · Multimodal

0 likes · 27 min read

iQIYI Technical Product Team

May 15, 2020 · Artificial Intelligence

iQIYI Deep Semantic Representation Learning Framework: Design, Challenges, and Applications

iQIYI’s deep semantic representation learning framework integrates multimodal content, knowledge graphs, and user behavior through layered data, feature, strategy, and application components, employing early, late, and hybrid fusion with Transformers, GCNs, and other deep models to deliver high‑quality embeddings that boost recommendation, search, and streaming performance across dozens of business scenarios.

Graph Neural Networks · Multimodal · Search

0 likes · 28 min read

DataFunTalk

Apr 20, 2020 · Artificial Intelligence

Video Search at Youku: Algorithmic Practices, Relevance, Ranking, and Multimodal Techniques

This article presents a comprehensive overview of Youku's video search system, covering business background, evaluation metrics, system and algorithm frameworks, relevance and ranking feature engineering, dataset construction, semantic matching, multimodal video understanding, and practical case studies that illustrate the impact of deep learning and AI techniques on search performance.

AI · Deep Learning · Multimodal

0 likes · 18 min read

Youku Technology

Apr 16, 2020 · Artificial Intelligence

Multimodal Video Classification: Image Feature Improvements and System Insights

The talk presents Alibaba’s hierarchical video‑category system and a multimodal classification pipeline that leverages EfficientNet, NeXtVLAD fusion, attention‑dropping augmentation, and MoCo contrastive learning. Together these boost cold‑start recall by 43%, improve program classification by over 20%, and set the stage for larger models and advanced unsupervised methods.

AI · EfficientNet · Multimodal

0 likes · 17 min read

DataFunTalk

Mar 30, 2020 · Artificial Intelligence

Enhancing Multimodal Video Classification with Improved Image Features and Category System

This article presents a comprehensive overview of Alibaba Entertainment's category system and multimodal video classification algorithm, detailing the construction of a high‑accuracy hierarchical taxonomy, improvements to image feature extraction using EfficientNet and data augmentation, unsupervised training techniques, experimental results, practical pitfalls, and future research directions.

AI · Multimodal · category system

0 likes · 17 min read

iQIYI Technical Product Team

Mar 27, 2020 · Artificial Intelligence

Multimodal Short Video Content Tagging Techniques and Applications at iQIYI

The article surveys iQIYI’s multimodal short‑video content‑tagging pipeline, detailing extraction‑ and generation‑based methods, challenges of open‑world tags, model evolution from rule‑based to Transformer generators, visual‑text fusion techniques, and applications such as recommendation, search, clustering, and future enhancements.

Multimodal · NLP · content tagging

0 likes · 18 min read

DataFunTalk

Feb 3, 2020 · Artificial Intelligence

Alibaba Entertainment Search Algorithm Practice and Insights – Video Search Case Study with Youku

In this live session, Alibaba Entertainment’s senior algorithm expert discussed Youku’s video search business, relevance and ranking models, multimodal search challenges, and practical AI techniques, offering attendees a comprehensive view of modern video retrieval systems and their implementation.

AI · Information Retrieval · Multimodal

0 likes · 3 min read

DataFunTalk

Nov 20, 2019 · Artificial Intelligence

Advances and Reflections on Human‑Machine Dialogue Technologies

This presentation reviews recent progress in spoken and multimodal dialogue systems. It covers X‑driven architectures, task‑oriented and open‑domain approaches, NLU/DM integration, FAQ, KB/KG‑driven methods, and document‑driven dialogue, and it outlines remaining challenges and future research directions.

Artificial Intelligence · Dialogue Systems · Multimodal

0 likes · 21 min read

iQIYI Technical Product Team

Sep 27, 2019 · Artificial Intelligence

iQIYI-VID: A Large-Scale Multimodal Video Dataset for Person Recognition

iQIYI-VID is the world’s largest multimodal video dataset for person recognition, containing 10,000 celebrity identities and 600,000 video clips drawn from millions of videos. It supports tasks such as detection, identification, attribute and audio analysis, and it serves as the basis for the 2018‑2019 challenges and a face‑recognition subset, driving research in an area where performance gaps remain.

AI · Multimodal · iQIYI-VID

0 likes · 7 min read

iQIYI Technical Product Team

Aug 9, 2019 · Artificial Intelligence

iQIYI 2019 Multimodal Video Person Recognition Competition Report by Zheey Team

The Zheey team from Beijing University of Posts and Telecommunications tackled the iQIYI 2019 Multimodal Video Person Recognition Challenge with a three‑layer MLP on the official face features, raising the baseline score from 0.8742 to 0.8949 through model fusion, quality filtering and fine‑tuning, ultimately ranking sixth and open‑sourcing their code.

MLP · Multimodal · competition

0 likes · 9 min read

iQIYI Technical Product Team

Jun 6, 2019 · Artificial Intelligence

Large-Scale Hierarchical Classification Algorithm for iQIYI Short Videos

iQIYI’s large‑scale hierarchical classification system combines multimodal text and image embeddings, low‑rank multimodal fusion, and a dense hierarchical multilabel network with cascade‑style weighting to assign accurate type tags to short videos, boosting production efficiency and personalized recommendation diversity.

AI · Hierarchical Classification · Multimodal

0 likes · 16 min read

iQIYI Technical Product Team

Apr 4, 2019 · Artificial Intelligence

My Experience and Methods in the iQIYI Multimodal Person Recognition Challenge

In the iQIYI Multimodal Person Recognition Challenge, I leveraged the provided facial features, weighted face‑quality averaging, DBSCAN‑based noise clustering and a dynamic extra noise class within an iterative KNN‑to‑neural‑network training pipeline, ultimately reaching the top 5 and open‑sourcing the full workflow on GitHub.

DBSCAN · Multimodal · iQIYI

0 likes · 7 min read

iQIYI Technical Product Team

Mar 22, 2019 · Artificial Intelligence

Experience Report of the 2018 iQIYI Multimodal Video Person Identification Challenge (WitcheR Team)

The WitcheR team won the 2018 iQIYI multimodal video person identification challenge by building a fast pipeline that combined a custom face‑and‑keypoint detector, ArcFace‑trained face embeddings, scene classification, and a three‑layer MLP with several training tricks, achieving a final mAP of 88.6% and demonstrating the value of rapid idea validation and open‑sourced code for future challenges.

MLP · Model Fusion · Multimodal

0 likes · 12 min read

DataFunTalk

Mar 15, 2019 · Artificial Intelligence

Designing Personalized, Dynamic, and Multimodal Knowledge Graphs for Chatbots

The article explores how chatbots require personalized dense knowledge graphs, dynamic temporal graphs, subjective emotion modeling, integration with external services, and multimodal media support, while also promoting a new NLP book and a related giveaway for readers.

AI · Chatbot · Dynamic Graph

0 likes · 9 min read
