Taking the Driving Theory Test with Vision-Language Models
EPFL – Bachelor in Communication Systems, Year 3 (2024)
Supervised by: Prof. Alexandre Alahi, Dr. Charles Corbière
Final Report: Report
This project explores the capabilities of Vision-Language Models (VLMs) in interpreting and answering driving theory test questions, which often combine visual inputs (e.g., road signs, traffic situations) with linguistic cues.
The focus was on evaluating the zero-shot and few-shot performance of multimodal transformers on this high-stakes reasoning task. By fine-tuning open models on driving-related datasets, we assessed their generalization, interpretability, and real-world usability in an educational context.
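To make the zero-shot setting concrete, the sketch below shows one way a multiple-choice theory question can be scored with CLIP from HuggingFace Transformers. The checkpoint name, image path, and answer options are illustrative assumptions, not the prompts or data actually used in the project.

```python
# Minimal zero-shot sketch: score candidate answers against a road-sign image
# with CLIP and pick the most likely option. All concrete values below
# (checkpoint, file path, options) are placeholders for illustration.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("road_sign.jpg")  # hypothetical question image
options = [
    "This sign means no entry for all vehicles.",
    "This sign indicates a mandatory right turn.",
    "This sign warns of a pedestrian crossing ahead.",
]

# Encode the image together with each answer option and compare similarities.
inputs = processor(text=options, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # one probability per option
print(options[probs.argmax().item()])
```

Few-shot and fine-tuned variants follow the same question-as-text, image-as-context framing; only the model weights and prompting change.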
Tools & Libraries:
- Python
- PyTorch
- HuggingFace Transformers
- CLIP
- BLIP
Techniques:
- Vision-Language Modeling
- Zero-Shot Learning
- Fine-Tuning
- Educational AI