Alibaba Cloud Developer
Oct 25, 2017 · Artificial Intelligence
How Hierarchical Multimodal LSTM Boosts Image Captioning Accuracy
This article reviews an ICCV paper introducing a hierarchical multimodal LSTM that jointly embeds images, phrases, and whole sentences, enabling detailed image descriptions and superior performance on Flickr30K, MS‑COCO, and region‑phrase datasets compared to previous methods.
Computer VisionImage CaptioningMultimodal Learning
0 likes · 8 min read
