JD AI Presents Eight Papers at AAAI 2019 Showcasing Advances in Machine Learning, NLP, and Computer Vision
At AAAI 2019 in Hawaii, JD AI Research Institute had eight papers accepted covering machine learning, natural language processing, computer vision, and multimodal AI, highlighting innovations such as AutoZOOM black‑box attacks, SACN for knowledge base completion, and temporally aware video captioning models.
On January 27 (U.S. time), the AAAI 2019 conference opened in Hawaii, featuring top AI research worldwide. JD AI Research Institute had eight papers accepted, spanning machine learning, natural language processing, video and image processing, and multimodal AI, demonstrating the company’s strong technical capabilities and industry impact.
Machine Learning
The paper AutoZOOM: Autoencoder-based Zeroth Order Optimization Method for Attacking Black-box Neural Networks proposes a framework that sharply reduces the number of model queries a black‑box attack requires. It combines an adaptive stochastic gradient estimator with an auto‑encoder, achieving high attack success rates without sacrificing the visual quality of the adversarial examples.
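To make the idea of zeroth-order optimization concrete, the following is a minimal sketch of a generic random-direction gradient estimator that only queries a loss function and never reads its gradients. It is not the paper's implementation: the `decode` function is a hypothetical stand-in for a trained decoder (it simply repeats each latent value), and the names `estimate_gradient`, `beta`, and `num_queries` are assumptions chosen for illustration.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere in `dim` dimensions."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def decode(z, factor):
    """Hypothetical decoder stand-in: repeat each latent value `factor` times.

    A real attack would use a trained decoder here; this only illustrates
    searching in a space smaller than the model input.
    """
    return [value for value in z for _ in range(factor)]


def estimate_gradient(loss, point, num_queries, beta=1e-4, seed=0):
    """Estimate the gradient of `loss` at `point` using function values only.

    Each random direction u costs one extra query: the finite difference
    (loss(point + beta * u) - loss(point)) / beta approximates the slope
    along u. Scaling by the dimension and averaging over directions gives
    an approximately unbiased estimate of the full gradient.
    """
    rng = random.Random(seed)
    dim = len(point)
    base = loss(point)  # one query shared by all directions
    grad = [0.0] * dim
    for _ in range(num_queries):
        u = random_unit_vector(dim, rng)
        probe = [p + beta * ui for p, ui in zip(point, u)]
        slope = (loss(probe) - base) / beta
        for i in range(dim):
            grad[i] += dim * slope * u[i] / num_queries
    return grad
```

In this sketch the attacker would call `estimate_gradient` on a loss of the form `lambda z: black_box(decode(z, factor))`, so every query perturbs the short latent vector `z` rather than the full input. Fewer dimensions means fewer random directions are needed for a usable estimate, which is the intuition behind the query savings described above.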
The work MPD-AL: An Efficient Membrane Potential Driven Aggregate-Label Learning Algorithm for Spiking Neurons introduces a membrane‑potential‑driven learning algorithm that identifies optimal spike times and guides synaptic adaptation, outperforming previous spiking‑neuron methods and improving classification accuracy on language‑recognition tasks.
Natural Language Understanding
End-to-end Structure-Aware Convolutional Networks for Knowledge Base Completion (SACN) combines a weighted graph convolutional network encoder with a Conv‑TransE decoder to produce richer node embeddings, achieving roughly 10% relative improvement on the FB15k‑237 and WN18RR benchmarks.
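As a rough illustration of what "weighted" graph convolution can mean, the sketch below aggregates neighbor features with one scalar weight per relation type before applying a ReLU. This is a simplified assumption for illustration only: it omits the learned linear transformation and training loop that a real model needs, and the function name `weighted_gcn_layer` and its parameters are hypothetical rather than taken from the paper.

```python
def weighted_gcn_layer(features, edges, relation_weight, self_weight=1.0):
    """One simplified graph-convolution step with per-relation weights.

    features:        dict mapping node name -> list of floats
    edges:           list of (source, relation, destination) triples
    relation_weight: dict mapping relation name -> scalar weight
                     (learned in a real model, fixed here for illustration)
    """
    dim = len(next(iter(features.values())))
    # Start from each node's own features (a self-loop).
    out = {node: [self_weight * v for v in vec] for node, vec in features.items()}
    # Add each neighbor's features, scaled by the weight of the edge's relation.
    for src, rel, dst in edges:
        alpha = relation_weight[rel]
        for i in range(dim):
            out[dst][i] += alpha * features[src][i]
    # ReLU non-linearity.
    return {node: [max(0.0, v) for v in vec] for node, vec in out.items()}


if __name__ == "__main__":
    feats = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, -1.0]}
    triples = [("a", "likes", "b"), ("c", "knows", "b")]
    weights = {"likes": 0.5, "knows": 0.25}
    print(weighted_gcn_layer(feats, triples, weights))
```

The point of the per-relation weight is that neighbors connected by different relation types contribute differently to a node's embedding, which is one way a graph encoder can make use of knowledge-graph structure.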
Attentive Tensor Product Learning (ATPL) presents an unsupervised method for extracting syntactic role vectors, enhancing downstream NLP tasks such as image captioning and part‑of‑speech tagging while reducing annotation costs.
Computer Vision & Video Understanding
Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation uses a high‑level LSTM manager to set coherent topics and a low‑level Semantic Compositional Network worker to generate the corresponding story sentences, outperforming existing models on the VIST dataset.
Structured Two‑stream Attention Network for Video Question Answering integrates structured video segments and textual features to focus on salient visual content, improving open‑ended video QA performance.
Temporal Sentence Localization in Video with Attention Based Location Regression (ABLR) introduces a bi‑directional LSTM encoder and a co‑attention mechanism to regress the start and end times of the video segment that a given sentence describes, achieving superior accuracy on ActivityNet and TACoS.
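Accuracy in temporal localization is commonly measured by the temporal intersection-over-union (IoU) between a predicted segment and the annotated one. The sketch below shows that metric; treating it as the measure behind the results above is an assumption, and the function name `temporal_iou` is chosen here for illustration.

```python
def temporal_iou(predicted, reference):
    """Intersection-over-union of two time segments given as (start, end) in seconds."""
    pred_start, pred_end = predicted
    ref_start, ref_end = reference
    # Length of the overlap, or zero if the segments do not intersect.
    intersection = max(0.0, min(pred_end, ref_end) - max(pred_start, ref_start))
    # Union = total covered length = sum of lengths minus the overlap.
    union = (pred_end - pred_start) + (ref_end - ref_start) - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # A prediction of 10s-20s against an annotation of 15s-25s overlaps for 5s
    # out of a combined 15s.
    print(temporal_iou((10.0, 20.0), (15.0, 25.0)))
```

A prediction is typically counted as correct when its IoU with the annotation exceeds a chosen threshold, so a model that regresses segment boundaries directly is judged on how tightly those boundaries match.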
Temporal Deformable Convolutional Encoder‑Decoder Networks for Video Captioning (TDConvED) employs temporal deformable convolutions in the encoder and shifted convolutions in the decoder to model long sequences, mitigating the vanishing and exploding gradient problems of recurrent networks and accelerating training for video captioning tasks.
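The core idea of a deformable convolution along the time axis is that each kernel tap samples the sequence at its usual position plus a fractional offset, using interpolation. The following is a minimal one-channel sketch under stated assumptions: the offsets are passed in directly (in a real network they would be predicted from the input), and the names `sample_linear` and `temporal_deformable_conv` are hypothetical.

```python
import math


def sample_linear(seq, pos):
    """Linearly interpolate `seq` at fractional index `pos`; zero outside the sequence."""
    lo = math.floor(pos)
    frac = pos - lo

    def at(i):
        return seq[i] if 0 <= i < len(seq) else 0.0

    return (1.0 - frac) * at(lo) + frac * at(lo + 1)


def temporal_deformable_conv(seq, kernel, offsets):
    """1-D convolution whose sampling positions are shifted by per-tap offsets.

    seq:     list of floats, one value per time step
    kernel:  list of floats (odd length), the filter weights
    offsets: offsets[t][k] is the shift applied to tap k at output step t;
             all zeros reduces this to an ordinary zero-padded convolution
    """
    half = len(kernel) // 2
    out = []
    for t in range(len(seq)):
        total = 0.0
        for k, weight in enumerate(kernel):
            pos = t + (k - half) + offsets[t][k]
            total += weight * sample_linear(seq, pos)
        out.append(total)
    return out
```

With all offsets set to zero the function behaves like a standard convolution over a fixed window; non-zero offsets let each output step look slightly earlier or later in time, which is what allows the sampling pattern to adapt to the content of the video.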
Overall, JD AI’s research demonstrates a focus on multimodal data processing, bridging text, image, and video modalities, and applying these advances across JD’s e‑commerce, logistics, and finance ecosystems, thereby accelerating the integration of AI into real‑world industry applications.
JD’s AI platform, NeuHub, supported over 148.7 billion calls during the 11.11 shopping festival, with a single‑day peak of 15.3 billion, illustrating the scale of AI deployment in the company’s retail chain.
JD Tech
Official JD technology sharing platform. All the cutting‑edge JD tech, innovative insights, and open‑source solutions you’re looking for, all in one place.
