pipeline_tag: tabular-classification
F1 Race Prediction & Visualization System
A full-stack machine learning project that predicts Formula 1 podium finishers and race winners using real telemetry data, with a live race replay simulation and an autonomous AI driver.
Screenshots
Pre-Race Predictions & Track Map
Full Driver Rankings
Model Accuracy Dashboard
What It Does
- Predicts podium finishers and race winners using two separate Random Forest models trained on 3+ seasons of real F1 data
- Serves predictions through a Streamlit web app with qualifying order input, race condition settings, and a model accuracy dashboard
- Replays real races in a live Arcade simulation with actual GPS telemetry from FastF1: cars move around the real circuit lap by lap
- Updates predictions live during the replay as race positions evolve
- Autonomous AI driver: a Deep Q-Network (DQN), ported from C++, drives around real F1 circuits using LIDAR sensors
Results
| Model | Accuracy | F1 Score | AUC-ROC |
|---|---|---|---|
| Podium (Top 3) | 94% | 0.86 | ~0.93 |
| Winner (P1) | 87% | 0.55 | ~0.85 |
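The gap between accuracy and F1 for the winner model is typical of a rare positive class. The sketch below is a worked illustration only: the confusion counts are hypothetical and are not the project's results. It computes the standard metrics from raw counts and shows that a classifier which always predicts "no win" on a 20-car grid still scores 95% accuracy while its F1 is zero.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical baseline: one 20-car race, classifier always predicts "no win".
baseline = binary_metrics(tp=0, fp=0, fn=1, tn=19)
print(baseline["accuracy"], baseline["f1"])  # 0.95 0.0
```

This is why F1 and AUC-ROC are reported alongside accuracy in the table above.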
Tech Stack
| Area | Tools |
|---|---|
| Data Collection | FastF1, Pandas |
| Machine Learning | Scikit-learn (Random Forest) |
| AI Driver | PyTorch (DQN), LIDAR ray casting |
| Web UI | Streamlit, Plotly |
| Simulation | Python Arcade, NumPy |
| Version Control | Git, GitHub |
Project Structure
f1podiumpredictor/
├── app.py                  # Streamlit web app
├── main.py                 # Full pipeline runner
├── replay.py               # Arcade race replay entry point
├── multiclass.py           # Optional P1-P20 position predictor
│
├── f1predictor/            # Core ML package
│   ├── collect.py          # FastF1 data collection
│   ├── features.py         # Feature engineering
│   ├── train.py            # Model training (podium + winner)
│   └── predict.py          # Prediction logic
│
├── src/                    # Replay + AI modules
│   ├── arcade_window.py    # Main Arcade window
│   ├── replay_data.py      # Lap position loader
│   ├── live_predict.py     # In-race prediction updates
│   ├── ai_driver.py        # DQN agent (PyTorch)
│   └── ai_track.py         # F1 circuit surface + LIDAR
│
├── models/
│   └── best_time.pt        # Trained DQN model
├── data/                   # Generated after running pipeline
└── model/                  # Generated after training
Setup
1. Clone the repo
git clone https://github.com/sarthak-codes11/f1podiumpredictor.git
cd f1podiumpredictor
2. Install dependencies
pip install -r requirements.txt
pip install torch --index-url https://download.pytorch.org/whl/cpu
3. Collect data and train models
python main.py
This collects 3 seasons of F1 data from the FastF1 API and trains both models. It takes 15 to 30 minutes due to API rate limits.
4. If you already have data, skip collection
python main.py --skip-collect
Usage
Streamlit Predictions App
streamlit run app.py
- Enter qualifying order and race conditions
- Get win % and podium % for each driver
- Check model accuracy, confusion matrices, and feature importances
Race Replay Simulation
python replay.py --year 2024 --round 12
Replay Controls
| Key | Action |
|---|---|
| TAB | Switch between F1 Replay and AI Driver mode |
| SPACE | Pause / Resume (replay) or Restart AI (AI mode) |
| ← / → | Step back / forward one lap |
| ↑ / ↓ | Change playback speed |
| 1-5 | Set speed directly (0.25x to 4x) |
| L | Toggle LIDAR rays (AI mode) |
| R | Restart replay from lap 1 |
Features Used by the Models
| Feature | Description |
|---|---|
| grid_pos | Starting grid position |
| quali_pos | Qualifying position |
| avg_finish_last3 | Rolling avg finishing position (last 3 races) |
| avg_quali_last3 | Rolling avg qualifying position (last 3 races) |
| track_temp | Race day track temperature |
| is_wet | Binary wet/dry race flag |
| driver_id | Encoded driver identifier |
| team_id | Encoded constructor identifier |
| circuit_podium_rate | Driver's historical podium rate at this circuit |
| cumulative_podiums | Podiums accumulated this season (championship form) |
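As an illustration of the rolling features, here is a minimal sketch of how avg_finish_last3 could be computed. The row keys ("driver", "finish_pos"), the helper name, and the sample results are hypothetical. The sketch also assumes the feature uses only races before the current one, so the value being predicted never feeds its own input; the project's features.py may handle this differently.

```python
from collections import defaultdict, deque

def add_avg_finish_last3(results):
    """Attach avg_finish_last3 to each row, using only earlier races.

    `results` is a list of dicts with hypothetical keys "driver" and
    "finish_pos", already sorted in chronological order.
    """
    history = defaultdict(lambda: deque(maxlen=3))  # last 3 finishes per driver
    enriched = []
    for row in results:
        past = history[row["driver"]]
        avg = sum(past) / len(past) if past else None  # None: no history yet
        enriched.append({**row, "avg_finish_last3": avg})
        past.append(row["finish_pos"])  # update only after computing the feature
    return enriched

# Made-up results for one driver, oldest race first.
races = [
    {"driver": "VER", "finish_pos": 1},
    {"driver": "VER", "finish_pos": 2},
    {"driver": "VER", "finish_pos": 3},
    {"driver": "VER", "finish_pos": 6},
    {"driver": "VER", "finish_pos": 1},
]
enriched = add_avg_finish_last3(races)
for row in enriched:
    print(row["finish_pos"], row["avg_finish_last3"])
```

The same pattern would apply to avg_quali_last3 by swapping the finishing position for the qualifying position.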
How the Models Work
Two separate Random Forest classifiers are trained:
Podium Model: predicts whether a driver finishes in the top 3. Class imbalance is handled with class_weight={0:1, 1:6}, since only 3 of 20 drivers reach the podium in each race. The decision threshold is set to 0.4.
Winner Model: predicts whether a driver wins the race. This is a far rarer event (1 in 20), so the model uses class_weight={0:1, 1:55} and a lower decision threshold of 0.3 to avoid always predicting "no win".
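The thresholds quoted above replace the usual 0.5 cut-off on the positive-class probability. The sketch below shows the idea with made-up probabilities. The helper name is hypothetical, and the use of >= rather than > at the boundary is an assumption. The last two lines print the negative-to-positive ratio on a 20-car grid for comparison with the class weights.

```python
PODIUM_THRESHOLD = 0.4  # value quoted in this README
WINNER_THRESHOLD = 0.3  # value quoted in this README

def apply_threshold(probabilities, threshold):
    """Map each driver's positive-class probability to a 0/1 label."""
    return {driver: int(p >= threshold) for driver, p in probabilities.items()}

# Hypothetical win probabilities, e.g. the positive-class column of predict_proba.
win_proba = {"VER": 0.46, "NOR": 0.33, "LEC": 0.12}
lowered = apply_threshold(win_proba, WINNER_THRESHOLD)
default = apply_threshold(win_proba, 0.5)
print(lowered)  # {'VER': 1, 'NOR': 1, 'LEC': 0}
print(default)  # {'VER': 0, 'NOR': 0, 'LEC': 0}

# Negative-to-positive ratios on a 20-car grid.
print(round(17 / 3, 2))  # 5.67, close to the podium weight of 6
print(19 / 1)            # 19.0, so a weight of 55 pushes harder than the raw ratio
```

With a lowered threshold more than one driver can be flagged in the same race, so ranking drivers by probability is one way to recover a single pick.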
Both models use a time-ordered train/test split (trained on older seasons, tested on newer ones) to prevent data leakage.
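A time-ordered split can be sketched in a few lines. The function name, the "season" key, and the sample rows are hypothetical; the README does not state which seasons are held out for testing.

```python
def time_ordered_split(rows, test_seasons):
    """Split rows by season so every training row predates the test set."""
    test_seasons = set(test_seasons)
    cutoff = min(test_seasons)
    train = [r for r in rows if r["season"] < cutoff]
    test = [r for r in rows if r["season"] in test_seasons]
    return train, test

# Made-up rows; only the season matters for the split.
rows = [
    {"season": 2022, "driver": "VER", "podium": 1},
    {"season": 2023, "driver": "VER", "podium": 1},
    {"season": 2024, "driver": "NOR", "podium": 1},
]
train, test = time_ordered_split(rows, test_seasons=[2024])
train_seasons = [r["season"] for r in train]
held_out = [r["season"] for r in test]
print(train_seasons, held_out)  # [2022, 2023] [2024]
```

A random shuffle would instead mix races from the same season into both sets, letting rolling and cumulative features leak information about the test races.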
AI Driver
The AI driver is a Deep Q-Network trained in C++ using Raylib and ported to Python for inference. It uses a 23-dimensional state space:
- Speed, heading (sin/cos), normalized position (5 dims)
- 13 short-range LIDAR rays for wall danger detection
- 5 long-range LIDAR rays for corner anticipation
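The three groups listed above add up to 5 + 13 + 5 = 23 values. A minimal sketch of assembling such a vector follows. The function name, the argument names, and the ordering of components are assumptions, as is the idea that speed and ray readings are already normalized; the ported model would need whatever layout it was trained with.

```python
import math

def build_state(speed, heading_rad, x_norm, y_norm, short_rays, long_rays):
    """Assemble a 23-dimensional state: 5 kinematic values + 13 + 5 LIDAR readings."""
    if len(short_rays) != 13 or len(long_rays) != 5:
        raise ValueError("expected 13 short-range and 5 long-range rays")
    # Heading is encoded as sin/cos so the wrap-around at 360 degrees is continuous.
    kinematics = [speed, math.sin(heading_rad), math.cos(heading_rad),
                  x_norm, y_norm]
    return kinematics + list(short_rays) + list(long_rays)

# Hypothetical inputs: all rays report clear track (1.0).
state = build_state(0.6, math.pi / 2, 0.25, 0.75, [1.0] * 13, [1.0] * 5)
print(len(state))  # 23
```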
The agent drives around real F1 circuit centerlines extracted from FastF1 telemetry, with wall detection based on distance to the circuit edge rather than pixel colors.
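Distance-based wall detection can be illustrated as follows. This is a sketch under stated assumptions, not the code in ai_track.py: it treats the track as every point within a fixed half_width of the centerline polyline, and it casts a ray by marching in fixed steps until the point leaves that band. All names, the step size, and the sample straight are hypothetical.

```python
import math

def dist_to_segment(p, a, b):
    """Shortest distance from point p to segment a-b (all (x, y) tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def on_track(p, centerline, half_width):
    """A point is on track if it lies within half_width of the centerline."""
    segments = zip(centerline, centerline[1:])
    return min(dist_to_segment(p, a, b) for a, b in segments) <= half_width

def cast_ray(origin, angle_rad, centerline, half_width, max_range, step=0.5):
    """March along a ray; return the distance at which it leaves the track."""
    d = 0.0
    while d < max_range:
        p = (origin[0] + d * math.cos(angle_rad),
             origin[1] + d * math.sin(angle_rad))
        if not on_track(p, centerline, half_width):
            return d
        d += step
    return max_range  # no wall within range

# Hypothetical straight: centerline along the x-axis, track 10 units wide.
centerline = [(0.0, 0.0), (100.0, 0.0)]
side = cast_ray((50.0, 0.0), math.pi / 2, centerline, half_width=5.0, max_range=20.0)
ahead = cast_ray((50.0, 0.0), 0.0, centerline, half_width=5.0, max_range=20.0)
print(side, ahead)  # 5.5 20.0
```

The sideways ray reports the first sampled point beyond the edge, so its resolution is bounded by the step size; a smaller step trades speed for accuracy.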
Disclaimer
Data is sourced from the FastF1 library, which accesses publicly available F1 timing data. This project is for educational purposes only. Formula 1 and related trademarks are the property of their respective owners.


