🗺️ Robust Journey Planner

📍 EPFL – COM-490 Big Data (2026)
👥 Team: Matthias Wyss, Thierry Sokhn, Léan Bruttin, Alain Girard, Sofia Taouhid
🎥 Video presentation: Video
🔗 Code Repository: GitHub


Traditional public transport applications optimize solely for the shortest travel time and often suggest risky connections. This project introduces a robust SBB journey planner that addresses this by guaranteeing an arrival time with a user-defined statistical confidence level (Q%). The engine processes large volumes of historical actual-operations data (IstDaten) and timetable data (GTFS) to estimate delays and connection reliability across the Lausanne and Ouest Lausannois districts.

The core of the engine is a Backward Stochastic Multi-Criteria Time-Dependent Dijkstra algorithm. By traversing backward through time from the destination, the algorithm identifies the latest possible departure. It optimizes three conflicting criteria simultaneously (Departure Time, Path Confidence, and Total Walking Distance) and prunes dominated branches as it goes, constructing a Pareto Dominance Frontier of optimal itineraries.
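
The dominance pruning described above can be illustrated with a minimal sketch. The `Label` structure, its field names, and the `insert_label` helper are assumptions made for this example and are not taken from the project code; the sketch covers only the pruning step, not the backward search itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """One candidate itinerary reaching a stop (hypothetical structure)."""
    departure: int     # departure time, seconds since midnight; later is better
    confidence: float  # probability of arriving on time; higher is better
    walking_m: float   # total walking distance in metres; lower is better


def dominates(a: Label, b: Label) -> bool:
    """True if `a` is no worse than `b` on every criterion and better on one."""
    no_worse = (
        a.departure >= b.departure
        and a.confidence >= b.confidence
        and a.walking_m <= b.walking_m
    )
    strictly_better = (
        a.departure > b.departure
        or a.confidence > b.confidence
        or a.walking_m < b.walking_m
    )
    return no_worse and strictly_better


def insert_label(frontier: list[Label], candidate: Label) -> list[Label]:
    """Offer `candidate` to the frontier and return the updated frontier."""
    if any(kept == candidate or dominates(kept, candidate) for kept in frontier):
        return frontier  # candidate is pruned
    # Drop every label the candidate makes obsolete, then keep the candidate.
    survivors = [kept for kept in frontier if not dominates(candidate, kept)]
    return survivors + [candidate]


frontier: list[Label] = []
for lab in [
    Label(departure=8 * 3600, confidence=0.95, walking_m=300),
    Label(departure=8 * 3600 + 600, confidence=0.80, walking_m=300),  # later but riskier
    Label(departure=8 * 3600 - 300, confidence=0.90, walking_m=450),  # worse on all three
]:
    frontier = insert_label(frontier, lab)
```

In this example the first two labels survive because neither beats the other on every criterion, while the third is discarded. A label with confidence below the requested Q could additionally be filtered out before insertion, though that step is not shown here.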

To estimate connection reliability and handle data sparsity, we designed an empirical 5-Level Hierarchical Fallback Delay Model whose levels range from highly contextual (Trip + Stop + Weather) down to a global statistical safety net. The data processing architecture leverages PySpark and Trino on an EPFL Hadoop (HDFS) cluster to handle the scale of the historical records.
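
As a rough illustration of the fallback idea, the sketch below looks up the empirical probability that a delay stays within a transfer slack, starting at the most specific level and falling back whenever too few observations exist. Only the first level (Trip + Stop + Weather) and the global level come from the description above; the three intermediate levels, the `MIN_SAMPLES` threshold, and all field names are assumptions for this example. It uses plain Python rather than PySpark to stay self-contained.

```python
from collections import defaultdict

MIN_SAMPLES = 30  # assumed threshold below which a level counts as too sparse

# Assumed hierarchy, most specific first. The three middle levels are placeholders.
LEVELS = [
    ("trip_id", "stop_id", "weather"),
    ("trip_id", "stop_id"),
    ("route_id", "stop_id"),
    ("stop_id",),
    (),  # global safety net
]


def build_index(records):
    """Group observed arrival delays (seconds) under the key of every level."""
    index = defaultdict(list)
    for rec in records:
        for depth, fields in enumerate(LEVELS):
            key = (depth,) + tuple(rec[f] for f in fields)
            index[key].append(rec["delay_s"])
    return index


def connection_probability(index, context, slack_s):
    """Return (level used, empirical P(delay <= slack_s)) for a connection."""
    last = len(LEVELS) - 1
    for depth, fields in enumerate(LEVELS):
        key = (depth,) + tuple(context[f] for f in fields)
        delays = index.get(key, [])
        # Accept a level with enough data; the global level accepts any data.
        if len(delays) >= MIN_SAMPLES or (depth == last and delays):
            on_time = sum(d <= slack_s for d in delays)
            return depth, on_time / len(delays)
    raise ValueError("no delay observations available")


# Synthetic records: 30 arrivals 60 s late and 10 arrivals 300 s late.
records = [
    {"trip_id": "T1", "route_id": "R1", "stop_id": "S1", "weather": "rain",
     "delay_s": 60 if i < 30 else 300}
    for i in range(40)
]
index = build_index(records)

seen = {"trip_id": "T1", "route_id": "R1", "stop_id": "S1", "weather": "rain"}
unseen = {"trip_id": "T2", "route_id": "R2", "stop_id": "S1", "weather": "clear"}
exact = connection_probability(index, seen, slack_s=120)
fallback = connection_probability(index, unseen, slack_s=120)
```

Here `exact` is answered at level 0, while `fallback` has no matching trip or route history and is answered at the stop-only level (index 3). Both return 0.75 because the synthetic data is identical at every level.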

The final product includes a rigorous backtesting framework to validate the confidence levels against real-world scenarios, alongside an interactive UI built with Plotly and Mapbox to visualize the Pareto-optimal routes, transit paths, and walking transfers.
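
One simple way to check confidence levels against observed outcomes is a calibration table: for each requested Q, compare it with the share of replayed journeys that actually arrived on time. The sketch below assumes a hypothetical input format of `(requested_q, arrived_on_time)` pairs and is not the project's backtesting framework.

```python
def calibration_report(results, q_levels=(0.80, 0.90, 0.95)):
    """Map each requested confidence Q to the observed on-time rate.

    `results` holds one (requested_q, arrived_on_time) pair per replayed
    journey. Levels with no journeys are left out of the report.
    """
    report = {}
    for q in q_levels:
        outcomes = [on_time for requested, on_time in results if requested == q]
        if outcomes:
            report[q] = sum(outcomes) / len(outcomes)
    return report


# Synthetic outcomes: 9 of 10 journeys on time at Q=0.90, 4 of 5 at Q=0.80.
results = (
    [(0.90, True)] * 9 + [(0.90, False)]
    + [(0.80, True)] * 4 + [(0.80, False)]
)
report = calibration_report(results)
```

A planner is well calibrated when each observed rate is at least the requested Q. A rate far above Q suggests the delay model is overly conservative.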


🛠 Tools & Libraries:

🧠 Techniques: