Question: Do tanks/bruisers have higher damage mitigated per minute than tanky supports?
Here’s our initial data:
print(raw.head())
The raw data contains many columns that do not help answer our question, such as first dragon, first baron, and golddiffat20, so we dropped most of them. We then used the “datacompleteness” column to filter out incomplete rows and dropped all remaining rows containing NA values.
Here’s our cleaned data:
print(cleaned_df.head())
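The cleaning steps described above can be sketched on a small made-up table. This is only an illustration: the toy rows, the list of kept columns, and the assumption that complete rows are marked with the string "complete" are not taken from our actual notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the raw match data (all values are made up).
toy_raw = pd.DataFrame({
    "champion": ["Ornn", "Leona", "Sett", "Braum"],
    "datacompleteness": ["complete", "partial", "complete", "complete"],
    "damagemitigatedperminute": [1200.0, np.nan, 950.0, 700.0],
    "wardsplaced": [10, 35, 8, 40],
    "firstdragon": [1, 0, 1, 0],  # example of a column we do not need
})

# Hypothetical list of columns kept for the analysis.
keep_cols = ["champion", "datacompleteness",
             "damagemitigatedperminute", "wardsplaced"]

toy_cleaned = (
    toy_raw[keep_cols]                                    # drop unneeded columns
    .loc[lambda d: d["datacompleteness"] == "complete"]  # keep complete rows only
    .dropna()                                             # drop any remaining NA
    .reset_index(drop=True)
)
print(toy_cleaned)
```

Filtering on completeness before calling dropna keeps the two reasons for removing a row separate, which makes it easier to count how many rows each step removes.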
We then created two new columns, “class” and “class_actual”, which label each champion as either a tank/bruiser or a tanky support. The “class” column is assigned by a rule based on high damage mitigated per minute and the number of wards placed. The “class_actual” column is a manually mapped class that we compare against to measure the accuracy of that rule.
accuracy = (cleaned_df['class'] == cleaned_df['class_actual']).mean()
accuracy
0.7817412624393058
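To make the idea of a rule-based “class” column concrete, here is a minimal sketch on made-up rows. The two cutoffs (MITIGATION_FLOOR and WARD_CUTOFF) and the toy values are hypothetical and are not the thresholds used in our analysis.

```python
import numpy as np
import pandas as pd

# Toy rows; class_actual plays the role of the manually mapped label.
toy = pd.DataFrame({
    "champion": ["Ornn", "Sett", "Leona", "Braum", "Alistar"],
    "damagemitigatedperminute": [1300.0, 1000.0, 800.0, 750.0, 1100.0],
    "wardsplaced": [9, 11, 38, 42, 12],
    "class_actual": ["tank/bruiser", "tank/bruiser", "tanky support",
                     "tanky support", "tanky support"],
})

MITIGATION_FLOOR = 700  # hypothetical: only "tanky" rows are classified
WARD_CUTOFF = 25        # hypothetical: many wards suggests a support

# Keep rows with high damage mitigation, then split them by wards placed.
tanky = toy[toy["damagemitigatedperminute"] >= MITIGATION_FLOOR].copy()
tanky["class"] = np.where(tanky["wardsplaced"] >= WARD_CUTOFF,
                          "tanky support", "tank/bruiser")

# Share of rows where the rule agrees with the manual label.
toy_accuracy = (tanky["class"] == tanky["class_actual"]).mean()
print(toy_accuracy)
```

In this toy table the rule mislabels one row with few wards placed, which is the kind of disagreement an accuracy below 1 reflects.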
Univariate Analysis:
Bivariate Analysis:
Aggregation:
cleaned_df.pivot_table(index='class', values='damagemitigatedperminute', aggfunc=['mean', 'sum'])
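As a small illustration of what this aggregation returns, the sketch below runs the same kind of pivot table on made-up numbers. The toy values are assumptions for demonstration only.

```python
import pandas as pd

# Toy data with two rows per class (values are made up).
toy = pd.DataFrame({
    "class": ["tank/bruiser", "tank/bruiser",
              "tanky support", "tanky support"],
    "damagemitigatedperminute": [1200.0, 1000.0, 800.0, 600.0],
})

# One row per class; columns are a MultiIndex of (aggfunc, value column).
summary = toy.pivot_table(index="class",
                          values="damagemitigatedperminute",
                          aggfunc=["mean", "sum"])
print(summary)

# Pull out a single cell using the (aggfunc, column) pair.
tank_mean = summary.loc["tank/bruiser", ("mean", "damagemitigatedperminute")]
print(tank_mean)
```

The mean is the more useful number for comparing classes, since the sum also grows with the number of rows in each class.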
Assessment of missingness:
Much of the data is missing in general, as indicated by the “datacompleteness” column, which we already used to keep only complete rows. We therefore believe that any missing data that remains is NMAR. For example, ‘goldat25mins’ has some missing values, which is expected because not all games last 25 minutes.
Tanks/bruisers tend to destroy towers more often than tanky supports, because that is one of the jobs of a top laner. One pair of columns where we expect a dependency is towers and gamelength: the longer the game lasts, the more towers are destroyed. The permutation test below checks whether the missingness of towers depends on gamelength:
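import numpy as np
import plotly.express as px
nmar_col = 'towers'
other_col = 'gamelength'
# Keep rows where other_col is observed and flag rows where nmar_col is missing
miss = raw.dropna(subset=[other_col])[[other_col, nmar_col]].assign(missing=raw[nmar_col].isna())
# Observed statistic: mean of other_col when nmar_col is missing minus the mean when it is present
observed_diff = miss.groupby('missing')[other_col].mean().diff().iloc[1]
reps = 500
diffs = []
for _ in range(reps):
    # Shuffle the missingness labels (not the tower values) to simulate the null hypothesis
    shuf_df = miss.assign(shuffled=np.random.permutation(miss['missing']))
    diff = shuf_df.groupby('shuffled')[other_col].mean().diff().iloc[1]
    diffs.append(diff)
fig = px.histogram(np.array(diffs))
fig.add_vline(x=observed_diff, line_color='red')
fig.show()
# Share of shuffled differences at or below the observed difference
print((np.array(diffs) <= observed_diff).mean())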
.498
Tanks/bruisers destroying towers more often than tanky supports should not affect barons, since killing Baron requires a team effort. One pair of columns where we expect no dependency is towers and barons. The permutation test below checks whether the missingness of towers is unrelated to barons:
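nmar_col = 'towers'
other_col = 'barons'
# Keep rows where other_col is observed and flag rows where nmar_col is missing
miss = raw.dropna(subset=[other_col])[[other_col, nmar_col]].assign(missing=raw[nmar_col].isna())
# Observed statistic: mean of other_col when nmar_col is missing minus the mean when it is present
observed_diff = miss.groupby('missing')[other_col].mean().diff().iloc[1]
reps = 500
diffs = []
for _ in range(reps):
    # Shuffle the missingness labels (not the tower values) to simulate the null hypothesis
    shuf_df = miss.assign(shuffled=np.random.permutation(miss['missing']))
    diff = shuf_df.groupby('shuffled')[other_col].mean().diff().iloc[1]
    diffs.append(diff)
fig = px.histogram(np.array(diffs))
fig.add_vline(x=observed_diff, line_color='red')
fig.show()
# Share of shuffled differences at or below the observed difference
print((np.array(diffs) <= observed_diff).mean())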
0.0
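Since both tests follow the same recipe, the logic can be wrapped in one helper. The sketch below is one possible version, run on synthetic data rather than our dataset. The function name missingness_permutation_test, the use of an absolute difference in means as the test statistic, and the way the toy data is generated are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def missingness_permutation_test(df, missing_col, other_col, reps=500, seed=0):
    """Test whether the missingness of missing_col depends on other_col.

    Statistic: absolute difference in the mean of other_col between rows
    where missing_col is missing and rows where it is present.
    """
    rng = np.random.default_rng(seed)
    data = df.dropna(subset=[other_col])
    is_missing = data[missing_col].isna().to_numpy()
    values = data[other_col].to_numpy()

    def stat(mask):
        return abs(values[mask].mean() - values[~mask].mean())

    observed = stat(is_missing)
    # Shuffle the missingness labels to simulate the null hypothesis.
    sims = np.array([stat(rng.permutation(is_missing)) for _ in range(reps)])
    p_value = (sims >= observed).mean()
    return observed, p_value

# Synthetic data: towers is missing exactly in the longer half of games,
# while barons is generated independently of everything else.
rng = np.random.default_rng(42)
n = 200
gamelength = rng.normal(1900, 300, n)
towers = rng.integers(0, 12, n).astype(float)
towers[gamelength > np.median(gamelength)] = np.nan
barons = rng.integers(0, 3, n).astype(float)
toy = pd.DataFrame({"gamelength": gamelength, "towers": towers, "barons": barons})

obs_len, p_len = missingness_permutation_test(toy, "towers", "gamelength")
obs_bar, p_bar = missingness_permutation_test(toy, "towers", "barons")
print(p_len, p_bar)
```

A small p-value suggests the missingness depends on the other column, while a large one gives no evidence of dependence. Using the absolute difference makes the test two-sided, so the direction of the comparison does not need to be chosen in advance.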