Question: Do tanks/bruisers have higher damage mitigated per minute than tanky supports?
Null Hypothesis: Tanks/bruisers and tanky supports have the same mean damage mitigated per minute; any observed difference is due to chance.
Alternate Hypothesis: Tanks/bruisers have a higher mean damage mitigated per minute than tanky supports.
Test Statistic: Difference in means (tanky support minus tank/bruiser)
Alpha Level: .05
Explanation:
Null Hypothesis: A permutation test needs a null hypothesis of no difference, so that shuffling the values across classes simulates it.
Alternate Hypothesis: We expect tanks/bruisers to mitigate more damage per minute because they are solo laners (Top Lane), whereas tanky supports share a lane with their ADC and focus on protecting and shielding them.
Test Statistic: Damage mitigated per minute is a numerical value, so a difference in group means captures the direction of the effect. A large negative value supports the alternate hypothesis.
Alpha Level: We set an alpha level of .05 because it is a common convention; we reject the null hypothesis if the p-value falls below it.
filtered_df.groupby('class_actual')['damagemitigatedperminute'].agg(['mean','count'])
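We now need to observe the difference in means between tanks/bruisers and tanky supports. Here is an interactive graph of damage mitigated per minute for each class.
To test whether the observed difference could arise by chance, we shuffle the damage mitigated per minute values across rows, which breaks any link between class and the value.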
with_shuffled = filtered_df.assign(Shuffled_Weights=np.random.permutation(filtered_df['damagemitigatedperminute']))
with_shuffled.head()
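# numeric_only=True skips non-numeric columns, which newer pandas versions refuse to average
group_means = with_shuffled.groupby('class_actual').mean(numeric_only=True)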
group_means
We can now compare the original graph with the graph of the shuffled column.
Using the difference in means, we found an observed difference of about -330.58 (tanky support minus tank/bruiser) and a p-value of 0 over 500 shuffles. This suggests that tanks/bruisers have higher damage mitigated per minute than tanky supports.
n = 500
differences = []
for _ in range(n):
    # Shuffle the column to break any link between class and damage mitigated
    with_shuffled = filtered_df.assign(Shuffled_Weights=np.random.permutation(filtered_df['damagemitigatedperminute']))
    # Select the shuffled column before averaging so non-numeric columns are never averaged
    group_means = (
        with_shuffled
        .groupby('class_actual')['Shuffled_Weights']
        .mean()
    )
    difference = group_means.loc['tanky support'] - group_means.loc['tank/bruiser']
    differences.append(difference)
mean_weights = filtered_df.groupby('class_actual')['damagemitigatedperminute'].mean()
observed_difference = mean_weights['tanky support'] - mean_weights['tank/bruiser']
print(observed_difference)
p_value = (np.array(differences) <= observed_difference).mean()
print(p_value)
-330.57659934042726
0.0
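Conclusion: p-value = 0.0. None of the 500 shuffled differences was as low as the observed difference, so the true p-value is best read as smaller than 1/500 rather than exactly zero. Since our p-value is less than our significance level (alpha = .05), the observed difference is statistically significant, which suggests that tanks/bruisers have higher damage mitigated per minute than tanky supports.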
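As a side note on the p-value of 0.0, here is a minimal sketch of the same one-sided permutation test that also counts the observed arrangement, so the estimate can never be exactly zero. The function name permutation_p_value, its parameters, and the synthetic data are assumptions made for illustration; they are not part of the original analysis, and the printed numbers will not match the real match data.

```python
import numpy as np

def permutation_p_value(values, labels, group_a, group_b, n=500, seed=0):
    """One-sided permutation test on mean(group_a) - mean(group_b).

    Returns the observed difference and the share of arrangements
    (n shuffles plus the observed one) whose difference is <= the observed one.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    is_a = labels == group_a
    is_b = labels == group_b

    observed = values[is_a].mean() - values[is_b].mean()

    at_least_as_low = 0
    for _ in range(n):
        # Shuffling the values while keeping labels fixed simulates "no difference"
        shuffled = rng.permutation(values)
        diff = shuffled[is_a].mean() - shuffled[is_b].mean()
        at_least_as_low += int(diff <= observed)

    # Adding 1 to both counts includes the observed arrangement itself
    p_value = (at_least_as_low + 1) / (n + 1)
    return observed, p_value

# Synthetic stand-in data, NOT the real match data
data_rng = np.random.default_rng(42)
values = np.concatenate([data_rng.normal(500, 100, 200), data_rng.normal(800, 100, 200)])
labels = np.array(['tanky support'] * 200 + ['tank/bruiser'] * 200)

observed, p_value = permutation_p_value(values, labels, 'tanky support', 'tank/bruiser')
print(observed, p_value)
```

With this correction, a run in which no shuffle is as extreme as the observed difference reports 1/(n + 1) instead of 0, which makes the dependence on the number of shuffles explicit.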
We will now focus on framing a prediction problem:
Our goal is to build a multiclass classification model that predicts the role or position (bot, jng, mid, sup, or top) of a player in a game. The response variable we are predicting is position, which is categorical with five distinct classes.
This is a multiclass classification problem because:
Choice of Response Variable: We chose position as the response variable because:
Evaluation Metric Choice: Since we are dealing with a multiclass classification problem, our choice of evaluation metric is accuracy. Accuracy is appropriate in this case because:
Features Used for Prediction: We ensured that all features used in training would be available at the time of prediction.
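To make the evaluation metric concrete, here is a minimal sketch of how accuracy is computed for the five position labels, alongside the accuracy of always guessing the most common label as a reference point. The helper names accuracy and majority_baseline and the label lists are made up for illustration; they are not the project's actual predictions.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError('y_true and y_pred must have the same length')
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most common true label."""
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)

# Made-up labels for illustration only: two players per position
y_true = ['bot', 'jng', 'mid', 'sup', 'top', 'bot', 'jng', 'mid', 'sup', 'top']
y_pred = ['bot', 'jng', 'mid', 'sup', 'top', 'sup', 'jng', 'mid', 'bot', 'top']

print(accuracy(y_true, y_pred))     # 0.8 (two bot/sup mix-ups out of ten)
print(majority_baseline(y_true))    # 0.2 (five equally frequent classes)
```

Comparing a model's accuracy against the majority baseline shows how much it improves on guessing; in this toy example the classes are equally frequent by construction, which is an assumption rather than a property verified on the real data.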