Tagged articles
8 articles
Alibaba Cloud Big Data AI Platform
Mar 18, 2024 · Artificial Intelligence

How MuLTI Achieves Memory‑Efficient Video‑Language Understanding with Text‑Guided MultiWay Sampling

The paper presents MuLTI, a multimodal video‑language model that tackles the memory and efficiency challenges of long video‑text sequences by introducing a Text‑Guided MultiWay Sampler and a Multiple Choice Modeling pre‑training task, achieving state‑of‑the‑art results on video QA and retrieval while drastically reducing GPU memory consumption.

efficient-ai · feature fusion · multimodal
19 min read
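The summary does not spell out how MuLTI's Text-Guided MultiWay Sampler works. As a rough, hypothetical illustration of the general idea behind text-guided sampling (explicitly not MuLTI's method), the Python sketch below scores each frame feature against a text embedding and keeps only the top-k frames. Fewer frames means fewer tokens for later fusion layers, which is where the memory savings come from. The function names and the cosine-similarity scoring rule are assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def text_guided_select(frame_feats: List[List[float]], text_feat: Sequence[float], k: int) -> List[int]:
    """Return indices of the k frames most similar to the text, in temporal order.

    Hypothetical simplification: a hard top-k by cosine similarity stands in for
    MuLTI's actual sampler, whose details are not given in the summary.
    """
    if not 0 < k <= len(frame_feats):
        raise ValueError("k must be in 1..len(frame_feats)")
    scores = [cosine(f, text_feat) for f in frame_feats]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)  # restore temporal order for downstream temporal modeling
```

For example, with frame features `[[1, 0], [0, 1], [0.9, 0.1], [0.2, 0.8]]` and text feature `[1, 0]`, keeping `k=2` frames selects indices `[0, 2]`.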
iQIYI Technical Product Team
Jul 19, 2019 · Artificial Intelligence

Face Quality‑Driven Feature Denoising and Fusion for iQIYI‑VID‑2019 Video Person Recognition

The seefun team used face detection scores and quality metrics to denoise facial features and fuse them with quality-based weights during both training and testing. With a three-layer MLP using Swish activation and dropout, the team reached 0.8983 mAP and fourth place in the iQIYI-VID-2019 video person-recognition challenge.

MLP · face quality weighting · feature fusion
10 min read
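One way to picture quality-driven denoising and fusion is the Python sketch below. It drops faces whose detection score falls under a threshold, then averages the remaining L2-normalized embeddings with weights proportional to their quality scores. This is an assumption-laden illustration, not the seefun team's code: the 0.9 threshold, the proportional weighting, and the function names are all hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def quality_weighted_fusion(
    embeddings: List[List[float]],
    det_scores: Sequence[float],
    quality: Sequence[float],
    det_threshold: float = 0.9,  # hypothetical cut-off, not from the article
) -> List[float]:
    """Fuse per-frame face embeddings into one unit-length video-level feature."""
    # Denoising step: drop faces whose detection score is below the threshold.
    kept = [(e, q) for e, d, q in zip(embeddings, det_scores, quality) if d >= det_threshold]
    if not kept:
        raise ValueError("no face passed the detection threshold")
    total = sum(q for _, q in kept)
    if total <= 0:
        raise ValueError("quality scores of kept faces must sum to a positive value")
    fused = [0.0] * len(kept[0][0])
    for emb, q in kept:
        unit = l2_normalize(emb)  # compare directions, not magnitudes
        for i, x in enumerate(unit):
            fused[i] += (q / total) * x  # weight proportional to face quality
    return l2_normalize(fused)
```

Weighting by quality lets a few sharp, frontal faces dominate the video-level feature instead of being diluted by blurry or occluded frames.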
iQIYI Technical Product Team
Jul 12, 2019 · Artificial Intelligence

Multimodal Video Retrieval Solution for iQIYI Challenge: Feature Fusion and Model Ensemble

The ‘One Name’ team from Nanjing University placed third in the iQIYI multimodal video retrieval challenge with an mAP of 0.8986. Its approach fused the official face embeddings with scene features, using channel-attention-based video feature fusion, a multimodal SE-ResNeXt module, and a carefully partitioned model ensemble.

Multimodal Retrieval · feature fusion · iQIYI challenge
7 min read
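Channel-attention fusion in the squeeze-and-excitation (SE) style can be sketched as follows. Frame features are averaged over time ("squeeze"), passed through a small FC → ReLU → FC → sigmoid bottleneck ("excitation"), and the resulting per-channel gates re-weight the pooled feature. This is a generic SE sketch under stated assumptions, not the One Name team's architecture: the weight matrices are passed in by hand, biases are omitted, and the function names are hypothetical.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def se_temporal_pool(frames: List[List[float]], w1: List[List[float]], w2: List[List[float]]) -> List[float]:
    """Pool T frame features (T x C) into one C-dim feature with SE-style channel gates.

    w1: H x C reduction weights; w2: C x H expansion weights (hypothetical shapes,
    biases omitted for brevity).
    """
    t, c = len(frames), len(frames[0])
    # Squeeze: average each channel over time.
    squeeze = [sum(f[j] for f in frames) / t for j in range(c)]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one gate in (0, 1) per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeeze))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Re-weight the pooled channels by their gates.
    return [g * s for g, s in zip(gates, squeeze)]
```

With all-zero expansion weights every gate is 0.5, so the output is simply half the temporal mean; learned weights let the model suppress uninformative channels instead.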
iQIYI Technical Product Team
Jul 5, 2019 · Artificial Intelligence

Residual Dense Network with Feature Fusion for Multimodal Video Person Identification (iQIYI-VID-2019)

The authors introduce a feature‑fusion pipeline and a Residual Dense Net that leverages multi‑frame face embeddings to identify persons in iQIYI‑VID‑2019 videos, achieving 0.9035 mAP (second place) with only ≈0.5 GFLOPs and processing the full test set in minutes.

Multimodal Learning · feature fusion · iQIYI-VID-2019
11 min read
iQIYI Technical Product Team
Jun 28, 2019 · Artificial Intelligence

Watchdog Team's TOP1 Solution for the iQIYI & ACMMM2019 Multimodal Video Person Recognition Challenge

The Watchdog team took first place in the iQIYI & ACMMM2019 multimodal video person recognition challenge with about 91% test accuracy. The solution combined pre-extracted multimodal features, a 2048-dim classifier trained with BCE loss, re-ranking, DALI-accelerated re-detection, a fine-tuned InsightFace model, and multi-model ensembling.

Multimodal Learning · feature fusion · model ensemble
12 min read
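The summary names BCE loss for the classifier but gives no further detail. BCE scores each identity as an independent yes/no output instead of a softmax over all identities. The Python sketch below shows only the loss, in the numerically stable "with logits" form that works on raw classifier outputs. The function name and the mean-over-classes reduction are assumptions, not the team's code.

```python
import math
from typing import Sequence


def bce_with_logits(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Mean binary cross-entropy over classes, computed from raw logits.

    Uses max(x, 0) - x*y + log(1 + exp(-|x|)), which equals
    -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))] but never overflows.
    """
    if len(logits) != len(targets) or not logits:
        raise ValueError("logits and targets must be non-empty and the same length")
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)
```

A logit of 0 gives a loss of log 2 whatever the target, and a confidently wrong logit such as -1000 for a positive target yields a loss of 1000 instead of an overflow.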
iQIYI Technical Product Team
Jun 6, 2019 · Artificial Intelligence

Large-Scale Hierarchical Classification Algorithm for iQIYI Short Videos

iQIYI’s large‑scale hierarchical classification system combines multimodal text and image embeddings, low‑rank multimodal fusion, and a dense hierarchical multilabel network with cascade‑style weighting to assign accurate type tags to short videos, boosting production efficiency and personalized recommendation diversity.

AI · Hierarchical Classification · feature fusion
16 min read
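The summary does not detail the fusion layer. As a sketch, assuming it follows the widely used low-rank multimodal fusion (LMF) formulation, each modality vector gets a constant 1 appended and is projected by r modality-specific factor matrices. The projections are multiplied element-wise across modalities and summed over the r ranks. This equals applying a rank-r weight tensor to the outer product of the two vectors without ever materializing that tensor. Function names, shapes, and the two-modality restriction are illustrative assumptions; biases are omitted.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def project(z: Sequence[float], w: Matrix) -> List[float]:
    """Compute z^T W for a vector z (length n) and matrix W (n x d_out)."""
    return [sum(z[j] * w[j][k] for j in range(len(z))) for k in range(len(w[0]))]


def low_rank_fusion(text: Sequence[float], image: Sequence[float],
                    text_factors: List[Matrix], image_factors: List[Matrix]) -> List[float]:
    """Fuse a text and an image embedding with rank-r modality-specific factors.

    Each factor list holds r matrices of shape (len(modality) + 1) x d_out; the
    extra row multiplies the appended constant 1, which preserves unimodal terms.
    """
    if len(text_factors) != len(image_factors):
        raise ValueError("both modalities need the same rank r")
    zt = list(text) + [1.0]
    zi = list(image) + [1.0]
    fused = [0.0] * len(text_factors[0][0])
    for wt, wi in zip(text_factors, image_factors):
        pt, pi = project(zt, wt), project(zi, wi)
        for k in range(len(fused)):
            fused[k] += pt[k] * pi[k]  # element-wise product across modalities
    return fused
```

The cost grows linearly with the embedding sizes and the rank, instead of with the product of the embedding sizes as in full tensor fusion.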
Didi Tech
May 1, 2019 · Artificial Intelligence

Didi AI Labs' DFS Face Detection Algorithm Achieves Top Rankings on the WIDER FACE Benchmark

The DFS face-detection algorithm, developed jointly by Didi AI Labs and the PRIS lab at Beijing University of Posts and Telecommunications, took five first places and one second place on the WIDER FACE benchmark, reaching 96.3% (Easy), 95.4% (Medium) and 90.7% (Hard) AP. It combines a Feature Fusion Pyramid with semantic-segmentation supervision and is already deployed in Didi's driver identity verification and in-vehicle privacy systems.

WIDER FACE · feature fusion · semantic segmentation
5 min read