
Building an Autonomous Future (ICCV 2025 WDFM-AD)

Ashok Elluswamy, VP, Tesla

Recent Achievements

  • 2025.06: launched the robotaxi service.
  • Delivered the first self-driving production vehicle from the Tesla factory in Austin to a customer’s home in Austin (a 20-30 minute drive).
  • In the US, production vehicles drive themselves from the manufacturing line to the loading docks (a couple of miles away).

End-to-End Foundation Model at Scale

  • Maps raw sensor inputs directly to control signals: the model predicts the next steering and acceleration (two tokens), which are converted to steering angle, throttle, and brake.
  • Runs at 36 Hz
  • Perception can stay implicit; explicit perception outputs can be trained as auxiliary tasks.
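To make the two-token output concrete, here is a minimal sketch of how a pair of discrete control tokens could be decoded into steering angle, throttle, and brake. The talk does not describe the decoding, so the bin count (`NUM_BINS`), the value ranges, and the linear split of acceleration into throttle and brake are all hypothetical choices for illustration.

```python
from dataclasses import dataclass

# Hypothetical discretization: each control value is quantized into NUM_BINS bins.
NUM_BINS = 256
MAX_STEER_RAD = 0.5  # hypothetical steering range: [-0.5, 0.5] rad
MAX_ACCEL = 4.0      # hypothetical acceleration range: [-4, 4] m/s^2


@dataclass
class Control:
    steering_angle: float  # radians
    throttle: float        # 0..1
    brake: float           # 0..1


def token_to_value(token: int, limit: float) -> float:
    """Map a bin index in [0, NUM_BINS) to its bin centre in [-limit, limit]."""
    if not 0 <= token < NUM_BINS:
        raise ValueError(f"token {token} out of range")
    return -limit + (token + 0.5) * (2 * limit / NUM_BINS)


def decode(steer_token: int, accel_token: int) -> Control:
    """Turn the (steering, acceleration) token pair into actuator commands."""
    steering = token_to_value(steer_token, MAX_STEER_RAD)
    accel = token_to_value(accel_token, MAX_ACCEL)
    # Hypothetical split: positive acceleration -> throttle, negative -> brake.
    throttle = max(accel, 0.0) / MAX_ACCEL
    brake = max(-accel, 0.0) / MAX_ACCEL
    return Control(steering, throttle, brake)


print(decode(128, 200))
# Control(steering_angle=0.001953125, throttle=0.56640625, brake=0.0)
```

A real controller would likely map acceleration to pedal commands through a vehicle model rather than a linear rescale; the linear version just keeps the sketch short.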

/posts/community/tesla/iccv2025/images/1763442624131.webp

Why End-to-End?

/posts/community/tesla/iccv2025/images/1763443008269.webp

Codifying human values is incredibly difficult

/posts/community/tesla/iccv2025/images/1763443817575.webp

Interface between perception, prediction and planning is ill-defined

/posts/community/tesla/iccv2025/images/1763444181379.webp

Challenges of End-to-End

Curse of dimensionality

  • Problem: scale mismatch between the very high-dimensional input (raw sensor streams) and the tiny output (two control tokens)
  • Solution:
    • Large data: the Tesla fleet can provide 500 years of driving data every single day.
    • Data engine
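The scale figures above can be put into numbers with a quick back-of-the-envelope calculation. The 500-years-per-day figure is from the talk; reading it as continuous driving time, and the camera count and resolution used for the input side (8 cameras at 1280×960), are assumptions for illustration, not Tesla specifications.

```python
HOURS_PER_YEAR = 365.25 * 24  # 8766 hours in an average year


def fleet_hours_per_day(years_per_day: float) -> float:
    """Driving hours logged by the fleet in one calendar day."""
    return years_per_day * HOURS_PER_YEAR


def avg_concurrent_vehicles(years_per_day: float) -> float:
    """Average number of cars driving at any instant, ASSUMING 'years of data'
    means continuous driving time: driven days per calendar day."""
    return years_per_day * 365.25


def pixels_per_output_token(cameras: int, width: int, height: int, output_tokens: int) -> float:
    """Input pixels in a single multi-camera frame per output control token."""
    return cameras * width * height / output_tokens


print(fleet_hours_per_day(500))                  # 4383000.0
print(avg_concurrent_vehicles(500))              # 182625.0
# Hypothetical camera setup: 8 cameras at 1280x960, 2 output tokens.
print(pixels_per_output_token(8, 1280, 960, 2))  # 4915200.0
```

Under these assumptions, every 36 Hz step compresses millions of input pixels into two tokens, which is the mismatch that the large-data and data-engine bullets are meant to address.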

/posts/community/tesla/iccv2025/images/1763444333478.webp

/posts/community/tesla/iccv2025/images/1763445111214.webp

/posts/community/tesla/iccv2025/images/1763445654521.webp

Interpretability, Safety Guarantees and Internal Supervision

Rich Intermediate Outputs: Perception, 3DGS, Language

  • Queryable with prompts
  • Auxiliary to the driving task, but helpful for interpretability and supervision

/posts/community/tesla/iccv2025/images/1763445860960.webp

/posts/community/tesla/iccv2025/images/1763445936167.webp
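One common way to train intermediate outputs as auxiliary supervision is to add their losses, with small weights, to the main driving loss. The sketch below assumes this setup; the head names and weights are hypothetical, and the talk does not say how Tesla weights its heads.

```python
def total_loss(driving_loss: float,
               aux_losses: dict[str, float],
               aux_weights: dict[str, float]) -> float:
    """Main driving loss plus weighted auxiliary losses.

    A head without a weight contributes nothing, so auxiliary outputs
    (perception, 3D Gaussians, language) can be switched off independently.
    """
    aux = sum(aux_weights.get(name, 0.0) * loss for name, loss in aux_losses.items())
    return driving_loss + aux


# Hypothetical per-head losses and weights for one training batch.
losses = {"perception": 0.5, "gaussians": 0.2, "language": 0.8}
weights = {"perception": 0.1, "gaussians": 0.1, "language": 0.05}
print(round(total_loss(1.0, losses, weights), 4))  # 1.11
```

Keeping the auxiliary weights small is meant to let the extra heads shape the shared representation without dominating the driving objective.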

Efficient 3D Gaussian Splatting for System Debugging

/posts/community/tesla/iccv2025/images/1763446126710.webp
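For readers unfamiliar with 3D Gaussian Splatting, the core rendering step is front-to-back alpha compositing of depth-sorted Gaussians: each pixel's color is C = Σ_i c_i α_i Π_{j<i} (1 - α_j). The sketch below renders a single pixel from already-projected 2D Gaussians. It is a generic illustration of the technique, not Tesla's renderer; the `Splat` type and the thresholds (alpha capped at 0.99, splats below 1/255 skipped, early stop once transmittance drops below 1e-4) are choices in the spirit of the original 3DGS rasterizer.

```python
import math
from dataclasses import dataclass


@dataclass
class Splat:
    mean: tuple[float, float]          # projected 2D centre (pixels)
    cov: tuple[float, float, float]    # 2D covariance [[a, b], [b, c]] stored as (a, b, c)
    opacity: float                     # learned opacity in [0, 1]
    color: tuple[float, float, float]  # RGB
    depth: float                       # view-space depth, used for sorting


def splat_alpha(s: Splat, px: float, py: float) -> float:
    """Opacity times the Gaussian falloff at pixel (px, py)."""
    a, b, c = s.cov
    det = a * c - b * b
    dx, dy = px - s.mean[0], py - s.mean[1]
    # d^T Sigma^{-1} d, using inv([[a, b], [b, c]]) = [[c, -b], [-b, a]] / det
    mahal = (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return min(0.99, s.opacity * math.exp(-0.5 * mahal))


def render_pixel(splats: list[Splat], px: float, py: float,
                 t_min: float = 1e-4) -> tuple[tuple[float, float, float], float]:
    """Front-to-back alpha compositing; returns (color, remaining transmittance)."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for s in sorted(splats, key=lambda s: s.depth):  # nearest first
        alpha = splat_alpha(s, px, py)
        if alpha < 1 / 255:  # negligible contribution
            continue
        for k in range(3):
            color[k] += s.color[k] * alpha * transmittance
        transmittance *= 1 - alpha
        if transmittance < t_min:  # pixel is effectively opaque
            break
    return (color[0], color[1], color[2]), transmittance


front = Splat((0.0, 0.0), (1.0, 0.0, 1.0), 0.5, (1.0, 0.0, 0.0), depth=1.0)
back = Splat((0.0, 0.0), (1.0, 0.0, 1.0), 1.0, (0.0, 0.0, 1.0), depth=2.0)
color, t = render_pixel([back, front], 0.0, 0.0)  # input order does not matter
```

Here the half-transparent red splat in front lets half of the light through, so the blue splat behind it still contributes about 0.495 to the blue channel.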

Real-Time and Reflective Modes in a Single Model (Dual-Mode)

  • A fast path for low-latency control, used in normal driving
  • [optional] A reflective mode for introspection, where the model can emit reasoning tokens and natural language summaries of its decision logic when more time is available.

/posts/community/tesla/iccv2025/images/1763446392802.webp
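The scheduling idea behind the dual-mode design can be sketched as follows: the fast path always produces a control, and reasoning is only requested when the time budget allows. In the talk both modes live in a single model; here they are two hypothetical callables (`fast_policy`, `explain`), and the reasoning cost is made up for illustration. At 36 Hz the per-step budget is 1/36 s, roughly 27.8 ms.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

ControlPair = tuple[float, float]  # (steering, acceleration)


@dataclass
class Decision:
    control: ControlPair
    reasoning: Optional[str] = None  # filled only in reflective mode


def drive_step(fast: Callable[[Any], ControlPair],
               reflect: Callable[[Any, ControlPair], str],
               obs: Any,
               budget_s: float,
               reflect_cost_s: float) -> Decision:
    """Control is computed on the fast path first; reasoning is added only
    when the budget allows and never alters that control."""
    control = fast(obs)
    if budget_s >= reflect_cost_s:
        return Decision(control, reflect(obs, control))
    return Decision(control)


CONTROL_PERIOD_S = 1 / 36  # one control step at 36 Hz


def fast_policy(obs: Any) -> ControlPair:
    return (0.0, 0.5)  # hypothetical fixed output


def explain(obs: Any, control: ControlPair) -> str:
    return "lane clear, holding speed"  # hypothetical reasoning summary


driving = drive_step(fast_policy, explain, None, CONTROL_PERIOD_S, reflect_cost_s=0.2)
idle = drive_step(fast_policy, explain, None, 1.0, reflect_cost_s=0.2)
print(driving.reasoning)  # None
print(idle.reasoning)     # lane clear, holding speed
```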

Evaluation (Hardest of All)

  • Training loss and open-loop metrics do not reliably indicate closed-loop performance.
  • Safety-critical driving policy is multi-modal and cannot be judged by distance to ground truth alone.

/posts/community/tesla/iccv2025/images/1763448148809.webp
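A toy example shows why distance to ground truth misleads when the policy is multi-modal. Suppose an obstacle sits straight ahead and the logged human passed it on the left. A pass on the right is equally safe but scores badly, while the average of the two modes drives into the obstacle yet scores better. Average displacement error (ADE) stands in here for a typical open-loop metric; the coordinates and obstacle box are invented for illustration.

```python
import math

Point = tuple[float, float]  # (x forward, y lateral), metres


def ade(pred: list[Point], gt: list[Point]) -> float:
    """Average displacement error: mean Euclidean distance between matched waypoints."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def collides(traj: list[Point], x_range=(4.0, 6.0), y_range=(-0.5, 0.5)) -> bool:
    """True if any waypoint falls inside the (hypothetical) obstacle box."""
    return any(x_range[0] <= x <= x_range[1] and y_range[0] <= y <= y_range[1]
               for x, y in traj)


xs = [float(x) for x in range(1, 9)]
gt_left = [(x, 1.0) for x in xs]      # what the human actually did
pass_right = [(x, -1.0) for x in xs]  # an equally valid alternative
averaged = [(x, 0.0) for x in xs]     # mean of the two modes

for name, traj in [("pass_right", pass_right), ("averaged", averaged)]:
    print(name, ade(traj, gt_left), collides(traj))
# pass_right 2.0 False
# averaged 1.0 True
```

The metric ranks the colliding trajectory above the safe one, which is exactly the failure the second bullet describes.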

Neural Network Closed-loop World Simulator

  • closed-loop evaluation
  • closed-loop reinforcement learning

/posts/community/tesla/iccv2025/images/1763448562085.webp

/posts/community/tesla/iccv2025/images/1763448750146.webp

/posts/community/tesla/iccv2025/images/1763448900634.webp

/posts/community/tesla/iccv2025/images/1763449061821.webp

/posts/community/tesla/iccv2025/images/1763449148065.webp
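The gap between open-loop and closed-loop evaluation can be shown with a one-dimensional toy. Two hypothetical lane-keeping policies share the same small bias, so on logged states (car centred, y = 0) they have identical open-loop error. Rolled out in closed loop, their different corrective gains lead to very different drift. The linear `world_step` is only a stand-in for a learned neural simulator, not a model of Tesla's.

```python
from typing import Callable

DT = 0.1    # seconds per step (hypothetical)
BIAS = 0.1  # small systematic error shared by both policies (hypothetical)

Policy = Callable[[float], float]


def make_policy(gain: float, bias: float = 0.0) -> Policy:
    """Lateral command that steers back toward the lane centre (y = 0)."""
    return lambda y: -gain * y + bias


def world_step(y: float, action: float) -> float:
    """Toy dynamics standing in for the learned simulator: y' = y + dt * a."""
    return y + DT * action


def rollout(policy: Policy, y0: float, steps: int) -> float:
    """Closed loop: the policy sees the states its own actions produce."""
    y = y0
    for _ in range(steps):
        y = world_step(y, policy(y))
    return y


expert = make_policy(gain=0.5)
policy_a = make_policy(gain=0.5, bias=BIAS)
policy_b = make_policy(gain=0.05, bias=BIAS)

# Open loop: score each action on logged expert states (always y = 0).
open_loop_a = abs(policy_a(0.0) - expert(0.0))
open_loop_b = abs(policy_b(0.0) - expert(0.0))

# Closed loop: let each policy drive for 100 steps (10 s) from the centre.
drift_a = rollout(policy_a, 0.0, 100)
drift_b = rollout(policy_b, 0.0, 100)
print(open_loop_a == open_loop_b)               # True
print(round(drift_a, 2), round(drift_b, 2))     # 0.2 0.79
```

Both policies look identical to the open-loop metric, yet the weakly corrective one drifts about four times further within 10 s; only a closed-loop rollout exposes this.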

What’s Next

/posts/community/tesla/iccv2025/images/1763449299812.webp

/posts/community/tesla/iccv2025/images/1763449836356.webp

/posts/community/tesla/iccv2025/images/1763449886240.webp

/posts/community/tesla/iccv2025/images/1763449923047.webp
