Baseline Model

Model used:

Random Forest Classifier
feature to be predicted: position

Quantitative Features

kills
deaths
assists
damageshare
wardsplaced
earnedgoldshare
total: 6

Nominal Features

year
patch
total: 2

Ordinal Features

no ordinal features were used
total: 0

Encodings

A Random Forest Classifier (RFC) does not need numerical data to be scaled or otherwise transformed, so all quantitative features were left as is
We applied one-hot encoding to the year and patch features
- although year and patch are stored as numbers, we treat them as categories because their numeric values carry no meaningful order or weight for this task
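
The encoding step described above can be sketched as follows. This is a minimal illustration, assuming scikit-learn's ColumnTransformer and OneHotEncoder were used; the write-up does not say which tools were used. The rows in `df` are made-up values that only show the shape of the transformation.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

quantitative = ["kills", "deaths", "assists",
                "damageshare", "wardsplaced", "earnedgoldshare"]
nominal = ["year", "patch"]

# Hypothetical rows, not taken from the real dataset.
df = pd.DataFrame({
    "kills": [3, 0, 7],
    "deaths": [2, 4, 1],
    "assists": [5, 12, 6],
    "damageshare": [0.25, 0.08, 0.31],
    "wardsplaced": [10, 45, 12],
    "earnedgoldshare": [0.22, 0.10, 0.28],
    "year": [2022, 2022, 2023],
    "patch": [12.01, 12.05, 13.01],
})

preprocessor = ColumnTransformer(
    # One-hot encode the nominal columns; unseen categories at
    # prediction time are encoded as all zeros instead of raising.
    transformers=[("onehot", OneHotEncoder(handle_unknown="ignore"), nominal)],
    # The quantitative columns pass through unchanged (no scaling).
    remainder="passthrough",
)

X = preprocessor.fit_transform(df[quantitative + nominal])
print(X.shape)  # (3, 11): 2 year + 3 patch one-hot columns, then 6 passthrough
```

The one-hot columns come first in the output because the transformer's own columns precede the passthrough remainder.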

Model Performance

The five positions our model can predict are:

Because the consequence of an incorrect prediction is the same for all positions, we use accuracy as our main evaluation metric rather than precision, recall, or F1 score.
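
For reference, accuracy is the fraction of predictions that exactly match the true label, with every class and every kind of mistake weighted equally. A small sketch follows; the label strings are placeholders and may not match the values stored in the dataset.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that equal the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Placeholder labels for illustration only.
y_true = ["top", "jng", "mid", "bot", "sup"]
y_pred = ["top", "mid", "mid", "bot", "sup"]

print(accuracy(y_true, y_pred))  # 0.8 (4 of 5 correct)
```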

Model hyperparameters:

These were chosen arbitrarily to give us a reference for how well the model performs without any tuning

max_depth: None
- None because we were curious to see how deep a decision tree would grow when its depth is not limited
n_estimators: 300
- 300 was chosen as a reasonably large number of trees for the size of our dataset
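
Assuming the model is scikit-learn's RandomForestClassifier, the baseline configuration above would look roughly like this. The `random_state` value is an assumption added for reproducibility; it is not stated in the write-up.

```python
from sklearn.ensemble import RandomForestClassifier

baseline = RandomForestClassifier(
    n_estimators=300,  # number of trees in the forest
    max_depth=None,    # grow each tree until its leaves are pure
    random_state=0,    # assumption: seed not given in the write-up
)

params = baseline.get_params()
print(params["n_estimators"], params["max_depth"])  # 300 None
```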

Training Data

accuracy score: 100%
this strongly suggests the model took advantage of the unlimited max_depth, so the random forest likely contains many deep trees
there is a very good chance this model overfit to the training data
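
One way to check the "many deep trees" suspicion is to read the depth of each fitted tree. The sketch below runs on synthetic data rather than the real dataset, so its numbers will not match the scores reported here. It uses fewer trees (50) only to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 5 classes, 8 numeric features.
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=5, n_redundant=0,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

forest = RandomForestClassifier(n_estimators=50, max_depth=None, random_state=0)
forest.fit(X_train, y_train)

# Each fitted tree in the forest reports its own depth.
depths = [tree.get_depth() for tree in forest.estimators_]

train_acc = forest.score(X_train, y_train)
test_acc = forest.score(X_test, y_test)
print("train accuracy:", train_acc)
print("test accuracy:", test_acc)
print("tree depth (min / max):", min(depths), max(depths))
```

A training accuracy near 1.0 combined with large tree depths and a noticeably lower test accuracy is the pattern described in this section.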

Testing Data

accuracy score: 62.76%
comparing this to the training data score, it is clear our baseline model was overfitting
difference between training and testing scores: 37.24 percentage points
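
The gap is a simple subtraction of the two reported scores, expressed in percentage points:

```python
train_accuracy = 100.00  # percent, as reported for the training data
test_accuracy = 62.76    # percent, as reported for the testing data

# Round to two decimals to avoid floating-point noise in the result.
gap = round(train_accuracy - test_accuracy, 2)
print(f"train/test gap: {gap} percentage points")  # train/test gap: 37.24 percentage points
```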

Conclusion

We need to tune our model’s hyperparameters, starting with max_depth, so that it does not overfit.
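
As one possible starting point, a cross-validated grid search over max_depth could look like the sketch below. This is not the tuning procedure the project used. The grid values, the 3-fold split, and the synthetic data are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the real training data.
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=5, n_redundant=0,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

# Hypothetical grid: limiting depth restricts how closely each tree
# can fit the training data.
param_grid = {"max_depth": [4, 8, 16, None]}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid,
    scoring="accuracy",  # same metric as the baseline evaluation
    cv=3,
)
search.fit(X, y)

print("best params:", search.best_params_)
print("mean cross-validated accuracy:", search.best_score_)
```

Because the score is averaged over held-out folds, a depth that only memorizes the training rows gains no advantage in the comparison.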

LoL Position Classification Data Analysis