Trajectory Prediction Algorithm for Autonomous Vehicles: Winning Solutions in NeurIPS 2020 INTERPRET Challenge
Meituan’s unmanned delivery team secured first place in the Generalizability track and second in the Regular track of the NeurIPS 2020 INTERPRET trajectory‑prediction challenge by employing a mixed‑attention graph‑transformer with dual‑channel GRU and adaptive map processing, achieving ADEs of 0.5339 m and 0.1912 m respectively.
Meituan's unmanned delivery vehicle team took first place in the Generalizability track and second place in the Regular track of the NeurIPS 2020 INTERPRET trajectory prediction challenge. This article walks through the algorithm behind both results.
01 Background
NeurIPS (Conference on Neural Information Processing Systems) is a top conference in machine learning and computational neuroscience. The INTERPRET trajectory prediction challenge, part of the NeurIPS 2020 Workshop Competition Track, was organized by UC Berkeley MSC Lab to provide a public dataset for evaluating trajectory prediction methods in autonomous driving.
02 Competition Overview
The competition consists of two tracks:
Generalizability Track – test trajectories differ significantly from training data (collected from different scenes) and contain no high‑definition map.
Regular Track – test and training trajectories are from the same scenes and include high‑definition maps.
The dataset covers multiple countries (USA, China, Germany) and includes highway, urban, roundabout, and unprotected left‑turn scenarios with pedestrians, bicycles, and vehicles.
For each obstacle, participants receive 1 second (10 frames) of past motion and must predict the next 3 seconds (30 frames). Up to 50 candidate trajectories can be submitted per obstacle, but ranking is based on the Average Displacement Error (ADE) of the best (rank‑1) trajectory.
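As a rough illustration of the metric, the sketch below computes ADE as the mean Euclidean distance between predicted and ground-truth positions over the 30 future frames. The function name, the toy trajectories, and the reading that only the first candidate in the submitted order is scored are assumptions made for this example, not details taken from the challenge's evaluation code.

```python
import math

def average_displacement_error(predicted, ground_truth):
    """Mean Euclidean distance between matching points of two trajectories."""
    if len(predicted) != len(ground_truth):
        raise ValueError("trajectories must have the same number of frames")
    distances = [
        math.hypot(px - gx, py - gy)
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
    ]
    return sum(distances) / len(distances)

# Toy ground truth: 30 future frames, moving 1 m per frame along x.
ground_truth = [(float(t), 0.0) for t in range(1, 31)]

# Hypothetical candidates, assumed to be ordered by the submitter's confidence.
candidates = [
    [(float(t), 0.5) for t in range(1, 31)],  # rank 1: constant 0.5 m lateral offset
    [(float(t), 0.0) for t in range(1, 31)],  # rank 2: matches the ground truth
]

# Under the assumed reading, only the rank-1 candidate counts toward the ranking.
rank1_ade = average_displacement_error(candidates[0], ground_truth)
```

In this toy case the second candidate is closer to the ground truth, which shows why a well-calibrated ordering of candidates matters as much as the candidates themselves.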
03 Algorithm Introduction
Part 1: Map Data Processing
Because the two tracks have different map availability, two representations are used:
Regular Track – high‑definition map is queried to obtain lane geometry around each position.
Generalizability Track – a location‑based semantic map is constructed from historical obstacle trajectories. The process involves:
Dividing a region (e.g., 50 m × 50 m) into a grid with a chosen resolution (e.g., 0.2 m), yielding 250 × 250 cells.
Assigning each trajectory point’s direction to the corresponding cell.
Aggregating direction statistics per cell by dividing 360° into eight 45° bins and normalizing.
The resulting semantic map has dimensions grid_x × grid_y × 8, i.e. 250 × 250 × 8 for the example grid above.
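The three steps above can be sketched as follows. This is a minimal illustration rather than the team's implementation: it assumes the region's origin is at (0, 0), derives each point's direction from the displacement to the next point, and stores only non-empty cells in a dictionary instead of a dense array. The function name and its parameters are hypothetical.

```python
import math

def build_semantic_map(trajectories, size_m=50.0, resolution_m=0.2, num_bins=8):
    """Per-cell normalized heading histograms, keyed by (row, col)."""
    cells_per_side = round(size_m / resolution_m)  # 50 m / 0.2 m -> 250
    bin_width = 360.0 / num_bins                   # 45 degrees per bin
    counts = {}
    for trajectory in trajectories:
        # Assumption: the heading of a point is the direction to the next point.
        for (x0, y0), (x1, y1) in zip(trajectory, trajectory[1:]):
            if (x0, y0) == (x1, y1):
                continue  # a stationary step has no direction
            col = math.floor(x0 / resolution_m)
            row = math.floor(y0 / resolution_m)
            if not (0 <= row < cells_per_side and 0 <= col < cells_per_side):
                continue  # point lies outside the mapped region
            heading = math.degrees(math.atan2(y1 - y0, x1 - x0)) % 360.0
            bin_index = int(heading // bin_width) % num_bins
            counts.setdefault((row, col), [0] * num_bins)[bin_index] += 1
    # Normalize each cell's histogram so its bins sum to 1.
    semantic_map = {
        cell: [c / sum(hist) for c in hist] for cell, hist in counts.items()
    }
    return semantic_map, cells_per_side

# Toy data: two eastbound passes and one northbound pass through the same cell.
eastbound = [(10.1, 10.1), (10.5, 10.1)]
northbound = [(10.1, 10.1), (10.1, 10.5)]
semantic_map, cells_per_side = build_semantic_map([eastbound, northbound, eastbound])
histogram = semantic_map[(50, 50)]
```

Here the cell at row 50, column 50 ends up with two thirds of its mass in the 0° to 45° bin and one third in the 90° to 135° bin, which is the kind of lane-direction prior the semantic map is meant to supply when no HD map exists.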
Part 2: Prediction Model Design
The core of the solution is a mixed‑attention mechanism built on a graph‑attention backbone. The model follows an encoder‑decoder architecture:
Feature Embedding Network: uses Timewise + Agentwise attention and a dual‑channel GRU to encode obstacle trajectories and map information.
Interaction & Prediction Network: employs Agentwise + Conditional attention to model interactions among agents and between agents and lanes, outputting multimodal trajectories with probabilities.
Two novel attention modules, Enc‑MAT and Dec‑MAT (Mixture Attention Transformer), extend the standard Transformer encoder by adding a second attention channel:
Enc‑MAT combines Timewise and Agentwise attention.
Dec‑MAT combines Agentwise and Conditional (distance‑based) attention.
All three attention types share the same computation formula; they differ only in how Q, K, V are generated.
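To make the "same formula, different Q, K, V" point concrete, here is a small sketch. It assumes the shared formula is the standard scaled dot-product attention, softmax(QKᵀ/√d)V, and it uses the raw features as Q, K, and V in place of learned projections. The feature values and variable names are illustrative only, and the Conditional (distance-based) variant is omitted because the article does not describe how its inputs are built.

```python
import math

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        weights = softmax(
            [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        )
        outputs.append([
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ])
    return outputs

# features[agent][frame] is a feature vector (toy values: 2 agents x 3 frames).
features = [
    [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]],
    [[0.0, 1.0], [0.1, 0.9], [0.2, 0.8]],
]

# Timewise: the frames of one agent attend to each other.
timewise = [attention(frames, frames, frames) for frames in features]

# Agentwise: at each frame, the agents attend to each other.
num_frames = len(features[0])
agentwise_by_frame = []
for t in range(num_frames):
    snapshot = [agent[t] for agent in features]
    agentwise_by_frame.append(attention(snapshot, snapshot, snapshot))
```

The only difference between the two calls is which axis of the feature tensor supplies the sequence, which is one plausible way to read the statement that the attention types differ only in how Q, K, and V are generated.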
Part 3: Trajectory Prediction Process
Encoding stage:
Obstacle trajectories are first enhanced by Enc‑MAT, then passed through a Slow + Fast Channel GRU to capture fine‑grained and coarse motion features.
Road topology is encoded differently depending on map availability: a standard Transformer encoder for HD‑map scenes, and separate horizontal/vertical Bi‑GRUs for semantic‑map scenes.
Decoding stage:
High‑level interaction – a global graph composed of obstacle and map features is processed by Dec‑MAT (Agentwise + Conditional attention) to produce refined obstacle features.
Trajectory prediction – two MLP heads generate the predicted trajectory and its probability.
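The article does not say how the probability head's output is normalized or how the candidates are ordered for submission. One plausible sketch, assuming a softmax over per-mode scores and a descending sort by probability, is shown below; `rank_modes`, the three toy modes, and their scores are all hypothetical.

```python
import math

def rank_modes(trajectories, scores):
    """Convert raw mode scores to probabilities and sort modes by them."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probabilities = [e / total for e in exps]
    order = sorted(
        range(len(scores)), key=lambda i: probabilities[i], reverse=True
    )
    return [(trajectories[i], probabilities[i]) for i in order]

# Three toy modes with two future points each (a real output has 30 frames).
modes = [
    [(1.0, 0.0), (2.0, 0.0)],    # go straight
    [(1.0, 0.2), (2.0, 0.6)],    # drift left
    [(1.0, -0.2), (2.0, -0.6)],  # drift right
]
ranked = rank_modes(modes, scores=[0.3, 1.7, -0.5])
top_trajectory, top_probability = ranked[0]
```

With these scores the "drift left" mode is ranked first, so it is the one that would be evaluated if only the top-ranked trajectory is scored.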
The method achieved ADE 0.5339 m on the Generalizability track (champion) and ADE 0.1912 m on the Regular track (runner‑up).
04 Summary
Accurate obstacle trajectory prediction is crucial for safe autonomous driving. The presented mixed‑attention approach demonstrates strong performance on both tracks and provides a practical solution for Meituan’s real‑world unmanned delivery vehicles.
05 Author Information
Yan Liang, Fu Zhuang, De Heng, Dong Chun, and others are algorithm engineers at the Meituan Unmanned Delivery Center.
Meituan Technology Team
Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.