Tagged articles
1 articles
Page 1 of 1
Alibaba Cloud Developer
Alibaba Cloud Developer
Oct 25, 2017 · Artificial Intelligence

How Hierarchical Multimodal LSTM Boosts Image Captioning Accuracy

This article reviews an ICCV paper introducing a hierarchical multimodal LSTM that jointly embeds images, phrases, and whole sentences, enabling detailed image descriptions and superior performance on Flickr30K, MS‑COCO, and region‑phrase datasets compared to previous methods.

Computer VisionImage CaptioningMultimodal Learning
0 likes · 8 min read
How Hierarchical Multimodal LSTM Boosts Image Captioning Accuracy