πŸš— Taking the Driving Theory Test with Vision-Language Models

πŸ“ EPFL – Bachelor in Communication Systems, Year 3 (2024)
πŸ“š Supervised by: Prof. Alexandre Alahi, Dr. Charles CorbiΓ¨re
πŸ”— Final Report: Report


This project explores the capabilities of Vision-Language Models (VLMs) in interpreting and answering driving theory test questions, which often combine visual inputs (e.g., road signs, traffic situations) with linguistic cues.

The focus is on evaluating the zero-shot and few-shot performance of multimodal transformers on this high-stakes reasoning task. By additionally fine-tuning open models on driving-related datasets, we assess their generalization, interpretability, and real-world usability in an educational context.
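
The snippet below is a minimal sketch of what zero-shot evaluation on a single question can look like, assuming an open VLM served through Hugging Face `transformers` (LLaVA-1.5 here). The model checkpoint, image file, and multiple-choice prompt are illustrative placeholders, not the project's exact configuration:

```python
# Zero-shot VLM inference on one driving-theory question.
# Assumptions: LLaVA-1.5 as the open VLM, a local image of the road sign,
# and a letter-choice prompt format; all are illustrative, not the
# project's exact setup.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# A driving-theory question paired with its road-scene image (hypothetical file).
image = Image.open("question_image.png")
question = (
    "What does this road sign mean?\n"
    "A) No entry  B) Priority road  C) Give way  D) End of speed limit\n"
    "Answer with a single letter."
)
prompt = f"USER: <image>\n{question} ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=10, do_sample=False)
answer = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
print(answer.split("ASSISTANT:")[-1].strip())  # e.g. "C"
```

Scoring accuracy over a test set then reduces to comparing the predicted letter against the ground-truth option; a few-shot variant would prepend worked example question-answer pairs to the prompt.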


πŸ›  Tools & Libraries:

🧠 Techniques: