I found this dataset on Kaggle and tried to build a model that predicts the quality of wine. The dataset contains 1599 observations, with quality ratings between 3 and 8.
The variables are:
- fixed acidity: 1599 non-null, float64
- volatile acidity: 1599 non-null, float64
- citric acid: 1599 non-null, float64
- residual sugar: 1599 non-null, float64
- chlorides: 1599 non-null, float64
- free sulfur dioxide: 1599 non-null, float64
- total sulfur dioxide: 1599 non-null, float64
- density: 1599 non-null, float64
- pH: 1599 non-null, float64
- sulphates: 1599 non-null, float64
- alcohol: 1599 non-null, float64
To find the best-performing model, I tested three different classification models:
- Nearest Neighbor
- Decision Tree
- Random Forest
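A comparison like this can be sketched with scikit-learn. The snippet below is only an illustration, not the code from the notebook: the helper name `compare_models`, the hyperparameters (5 neighbors, 100 trees), the 5-fold cross-validation, and the feature scaling for the nearest-neighbor model are all assumptions. So that it runs on its own, it uses synthetic stand-in data with 11 numeric features rather than the wine data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, cv=5, seed=0):
    """Return the mean cross-validated accuracy of each classifier (hypothetical helper)."""
    models = {
        # Nearest neighbor is distance based, so features are standardized first (assumption).
        "Nearest Neighbor": make_pipeline(
            StandardScaler(), KNeighborsClassifier(n_neighbors=5)
        ),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }


# Synthetic stand-in data (NOT the wine dataset): 11 numeric features, 3 classes.
X, y = make_classification(
    n_samples=300, n_features=11, n_informative=5, n_classes=3, random_state=0
)
scores = compare_models(X, y)

# Print the models from best to worst mean accuracy.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

On the real data, `X` would hold the 11 columns listed above and `y` the quality rating. Cross-validated accuracy is used here because a single train/test split can favor one model by chance.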
The Jupyter notebook with all the results can be found on Kaggle.