Taking the Driving Theory Test with Vision-Language Models
EPFL – Bachelor in Communication Systems, Year 3 (2024)
Supervised by: Prof. Alexandre Alahi, Dr. Charles Corbière
Final Report: Report
This project explores the capabilities of Vision-Language Models (VLMs) in interpreting and answering driving theory test questions, which often combine visual inputs (e.g., road signs, traffic situations) with linguistic cues.
The focus was on evaluating the zero-shot and few-shot performance of multimodal transformers on this high-stakes reasoning task. By fine-tuning open models on driving-related datasets, we assessed their generalization, interpretability, and real-world usability in an educational context.
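To make the zero-shot setting concrete, the sketch below shows one way a multiple-choice theory question can be scored with CLIP from HuggingFace Transformers. The checkpoint name, image path, and answer options are illustrative assumptions, not the prompts or data actually used in the project.

```python
# Minimal zero-shot sketch: score candidate answers against a road-sign image
# with CLIP and pick the most likely option. All concrete values below
# (checkpoint, file path, options) are placeholders for illustration.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("road_sign.jpg")  # hypothetical question image
options = [
    "This sign means no entry for all vehicles.",
    "This sign indicates a mandatory right turn.",
    "This sign warns of a pedestrian crossing ahead.",
]

# Encode the image together with each answer option and compare similarities.
inputs = processor(text=options, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # one probability per option
print(options[probs.argmax().item()])
```

Few-shot and fine-tuned variants follow the same question-as-text, image-as-context framing; only the model weights and prompting change.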
Tools & Libraries:
- Python
- PyTorch
- HuggingFace Transformers
- CLIP
- BLIP
Techniques:
- Vision-Language Modeling
- Zero-Shot Learning
- Fine-Tuning
- Educational AI