VAD
TL;DR
Motivations & Innovations
Approach
[Image]
Scene Representation
Map token: predicts the vectorized map representation $$\hat{V}_{\text{map}} \in \R^{N_m \times (N_p \times 2 + C)}$$, where $$N_m$$, $$N_p$$, and $$C$$ denote the number of predicted map vectors, the number of points in each map vector, and the number of classes (lane centerline, lane divider, road boundary, and pedestrian crossing).
Agent token: predicts agent information (location, orientation, size, speed, category, and multi-modal future trajectories) $$\hat{V}_{\text{agent}} \in \R^{N_a \times (N_k \times N_t \times 2 + N_k)}$$, where $$N_a$$, $$N_k$$, and $$N_t$$ denote the number of predicted agents, the number of modalities (driving intentions), and the number of future timestamps. VAD outputs a probability score for each modality.
Traffic signal token: predicts the state of traffic signals (traffic lights and stop signs).
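As a quick check of the dimensions above, the following is a minimal sketch in plain Python. The concrete sizes and the mode-major memory layout used in `split_agent_vector` are assumptions made for illustration only; the notes do not specify them, and the helper names are hypothetical.

```python
# Hypothetical sizes, chosen only to make the arithmetic concrete.
N_m, N_p, C = 100, 20, 4     # map vectors, points per map vector, map classes
N_a, N_k, N_t = 300, 6, 6    # agents, modalities, future timestamps


def map_vector_dim(n_points: int, n_classes: int) -> int:
    """Width of one map vector: (x, y) per point plus one score per class."""
    return n_points * 2 + n_classes


def agent_vector_dim(n_modes: int, n_steps: int) -> int:
    """Width of one agent vector: (x, y) per mode and timestamp plus one score per mode."""
    return n_modes * n_steps * 2 + n_modes


def split_agent_vector(vec, n_modes: int, n_steps: int):
    """Split one flat agent vector into per-mode trajectories and mode scores.

    Assumes a mode-major layout (all waypoints of mode 0, then mode 1, ...),
    followed by the N_k scores. This layout is an assumption, not from the notes.
    """
    n_traj = n_modes * n_steps * 2
    flat, scores = vec[:n_traj], vec[n_traj:]
    trajs = [
        [(flat[(k * n_steps + t) * 2], flat[(k * n_steps + t) * 2 + 1])
         for t in range(n_steps)]
        for k in range(n_modes)
    ]
    return trajs, scores


map_shape = (N_m, map_vector_dim(N_p, C))
agent_shape = (N_a, agent_vector_dim(N_k, N_t))
print(map_shape)    # (100, 44)
print(agent_shape)  # (300, 78)
```

With these sizes, each map vector holds 20 points (40 coordinates) plus 4 class scores, and each agent vector holds 6 trajectories of 6 waypoints (72 coordinates) plus 6 modality scores.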
- Navigation information: encoded with an MLP
- Ego status: encoded with an MLP
Planning via Interaction
Ego-Agent Interaction $$Q'_{\text{ego}}$$
[Image]
Ego-Map Interaction $$Q''_{\text{ego}}$$
[Image]
[Image]
Planning Head (MLP) $$\hat{V}_{\text{ego}} \in \R^{N_c \times N_k \times N_t \times 2}$$, where $$N_c$$ denotes the number of high-level commands (turn left, turn right, go straight)
[Image]
End-to-End Learning
[Image]
Map Loss
- Focal loss: classification loss
- Manhattan distance: regression loss between the predicted map points and the ground truth map points.
Agent Loss
- Detection
- L1 loss: regression loss
- Focal loss: classification loss
- Motion
Use the trajectory which has the minimum final displacement error (minFDE) as a representative prediction.
- L1 loss: motion regression loss between the representative trajectory and the ground truth trajectory
- Focal loss: multi-modal motion classification loss
Constraint Loss
[Image]
Ego-Agent Collision Constraint
- Agent selection: filter out low-confidence agent predictions by a threshold
- Multi-modality motion selection: select the trajectory with the highest confidence score as the final prediction
[Image]
Ego-Boundary Overstepping Constraint
Ego-Lane Directional Constraint
Imitation Learning Loss (L1 loss)
VAD adopts an L1 loss between the predicted ego trajectory and the ground truth ego trajectory, so that expert driving behavior guides the planned trajectory.
[Image]
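The L1 terms in these notes (the imitation loss here, and the motion regression loss on the minFDE representative trajectory under Agent Loss) can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, trajectories are assumed to be lists of (x, y) waypoints, and averaging over coordinates (rather than summing) is an assumption.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
Trajectory = Sequence[Point]


def l1_loss(pred: Trajectory, gt: Trajectory) -> float:
    """Mean absolute coordinate error between two equal-length trajectories."""
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("trajectories must be non-empty and of equal length")
    total = sum(abs(px - gx) + abs(py - gy)
                for (px, py), (gx, gy) in zip(pred, gt))
    return total / (2 * len(gt))  # mean reduction is an assumption


def final_displacement_error(pred: Trajectory, gt: Trajectory) -> float:
    """Euclidean distance between the last predicted and last ground truth waypoint."""
    (px, py), (gx, gy) = pred[-1], gt[-1]
    return math.hypot(px - gx, py - gy)


def min_fde_index(modes: Sequence[Trajectory], gt: Trajectory) -> int:
    """Index of the modality whose endpoint is closest to the ground truth endpoint."""
    return min(range(len(modes)),
               key=lambda k: final_displacement_error(modes[k], gt))


# Toy example: two predicted modalities for a single agent.
gt = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
modes = [
    [(1.0, 0.5), (2.0, 1.0), (3.0, 2.0)],  # drifts sideways, FDE = 2.0
    [(1.0, 0.0), (2.0, 0.0), (3.5, 0.0)],  # overshoots slightly, FDE = 0.5
]
best = min_fde_index(modes, gt)
print(best)                                # 1
print(round(l1_loss(modes[best], gt), 4))  # 0.0833
```

For the imitation loss, the same `l1_loss` would be applied directly to the predicted ego trajectory and the expert trajectory, with no modality selection step.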