1. Describing the data
To predict the Q3 qualifying times, I ran an SQL query against the Ergast database (https://ergast.com/mrd/). Using Python, I converted the Q3 times into seconds (q3_sec), which makes the times easy to compare and usable for predicting future lap times.
Overall, the dataset has 125 rows covering 2006 to 2019. The Q3 times range from a minimum of 87.87 seconds to a maximum of 116.31 seconds. The overall median is 93.11 seconds, while the mean is 93.98 seconds.
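The conversion itself was done in Python and is not shown here. As a rough R equivalent, the following minimal sketch assumes the Q3 times arrive as strings of the form "m:ss.sss". The helper name q3_to_seconds is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: convert lap-time strings such as "1:27.866" to seconds.
# Assumes the format "m:ss.sss"; empty or missing entries become NA.
q3_to_seconds <- function(x) {
  x <- as.character(x)
  x[!is.na(x) & x == ""] <- NA            # drivers without a Q3 time
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) != 2 || anyNA(p)) return(NA_real_)
    as.numeric(p[1]) * 60 + as.numeric(p[2])   # minutes * 60 + seconds
  }, numeric(1))
}

q3_to_seconds(c("1:27.866", "1:33.110", NA))
```

Applied to a whole column, something like df$q3_sec <- q3_to_seconds(df$q3) would produce the q3_sec values used below, assuming the raw column is called q3.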
# Boxplot of all Q3 Times >> boxplot(df$q3_sec ~ df$year)
Looking at the boxplots by year, 2010 stands out as an outlier, with significantly higher qualifying times than in any other year. One might suspect a session in the rain, but after a little research I found that the 2010 event was actually run on a different circuit layout. The track was 887 meters longer, which led to higher lap times. For this reason, I exclude the 2010 data from the following analyses.
# Exclude 2010-times >> cond <- df$year != 2010 >> df <- df[cond,] >> boxplot(df$q3_sec ~ df$year)
As we can see, the average Q3 times are now lower than they have ever been. This will play an important role in the prediction model. The yearly means are distributed as shown in the following boxplot:
# Group by years and calculate the mean (stored as q3_mean) >> years_grouped <- aggregate(q3_sec ~ year, data = df, FUN = mean) >> names(years_grouped)[2] <- "q3_mean" >> boxplot(years_grouped$q3_mean)
2. Find a Prediction Model
Applying a linear regression to the historical Q3-Laptimes:
# Build the first Prediction Model (plotModel() comes from the mosaic package) >> library(mosaic) >> result_all <- lm(q3_mean ~ year, data = years_grouped) >> plotModel(result_all) >> summary(result_all)
The model explains 53.62 % of the variance in the yearly means (R²), and the average times decrease by 0.348 seconds per year. Applying this model would give average Q3 times of 89.633 seconds in 2020 and 89.285 seconds in 2021. Since the 2019 Q3 times were already below these predicted values, it makes sense to adjust the model.
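The extrapolation is simple arithmetic along the fitted line. The sketch below uses only the slope and the 2020 prediction quoted above. The helper name predict_q3 is hypothetical. With the fitted model at hand, predict(result_all, newdata = data.frame(year = c(2020, 2021))) should give the same values up to rounding.

```r
# Values quoted in the text for the first model
slope     <- -0.348   # change in mean Q3 time per year (seconds)
pred_2020 <- 89.633   # predicted mean Q3 time for 2020 (seconds)

# Hypothetical helper: step along the regression line from the 2020 prediction
predict_q3 <- function(year) pred_2020 + slope * (year - 2020)

predict_q3(c(2020, 2021))
```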
Adjusted Model (considering only data from 2016-2019)
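The code for the adjusted model is not shown. A minimal sketch of how such a refit might look is given below, assuming a data frame with the columns year and q3_mean like years_grouped above. The helper fit_recent and the toy data are hypothetical. The toy values are made up purely so the snippet runs on its own and are not the real yearly means.

```r
# Hypothetical helper: refit the linear model on seasons from `from_year` onward.
# Expects a data frame with columns `year` and `q3_mean`, like years_grouped.
fit_recent <- function(years_grouped, from_year = 2016) {
  recent <- years_grouped[years_grouped$year >= from_year, ]
  lm(q3_mean ~ year, data = recent)
}

# Made-up placeholder values, NOT the real data:
# a perfect line that loses 0.5 seconds per year
toy <- data.frame(year = 2012:2019)
toy$q3_mean <- 95 - 0.5 * (toy$year - 2012)

result_recent <- fit_recent(toy)
coef(result_recent)["year"]   # slope in seconds per year
```

With the real data, fit_recent(years_grouped) followed by summary() would report the slope and R² for the 2016-2019 seasons only.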
3. Prediction for 2020/2021
The adjusted model explains 96.72 % of the variance (R²) with a p-value of 0.0165 and shows that lap times decrease by 0.662 seconds per year. This would mean average Q3 times of 87.777 seconds (1:27.777) in 2020 and 87.115 seconds (1:27.115) in 2021 at the upcoming races in Bahrain. Of course, this also depends on track conditions (tarmac, temperature and weather) and, more importantly, on how much effort the constructors are able to put into developing the cars further.
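To turn predicted seconds back into the familiar lap-time notation, a small formatter is enough. This sketch uses only the figures quoted above. The helper name format_laptime is hypothetical.

```r
# Figures quoted in the text for the adjusted model
slope_recent <- -0.662                      # seconds per year
pred_2020    <- 87.777                      # predicted mean Q3 time for 2020
pred_2021    <- pred_2020 + slope_recent    # one more year along the line

# Hypothetical helper: format seconds as "m:ss.sss"
format_laptime <- function(sec) {
  total_ms <- round(sec * 1000)             # work in whole milliseconds
  mins <- total_ms %/% 60000
  rest <- (total_ms %% 60000) / 1000
  sprintf("%d:%06.3f", as.integer(mins), rest)
}

format_laptime(c(pred_2020, pred_2021))
```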
Thanks for reading, hope you liked it 🙂