Initial Cluster Analysis of November 2023 Makuuchi: Two Clusters
Guess who the two clusters are...
I’ll just tell you: Takakeisho and Hokuseiho.
Now you may be thinking that I am trying to confuse you…
No, no, this is not sumo chaos. I promise you, this will make sense before it is over.
Reminder: Hellinger metric as distance between rikishi
This is a distance between two discrete distributions, and the discrete distributions I’m using are the kimarite each rikishi wins by. The sumo database handily gives the cumulative kimarite distribution for each rikishi.
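For reference, the Hellinger distance between two discrete distributions P = (p_1, …, p_n) and Q = (q_1, …, q_n) is H(P, Q) = (1/√2)·√(Σ(√p_i − √q_i)²). It runs from 0 (identical distributions) to 1 (no overlap at all). The VBA function behind this post isn't shown, so the Python below is only a minimal sketch of the standard formula. It assumes the usual 1/√2 normalization and assumes each rikishi's kimarite record arrives as a dict of win counts; the name `hellinger` and that input layout are illustrative, not taken from the spreadsheet.

```python
import math

def hellinger(p_counts, q_counts):
    """Hellinger distance between two kimarite win distributions.

    p_counts, q_counts: dicts mapping kimarite name -> number of wins
    (hypothetical layout). Counts are normalized to probabilities first;
    a kimarite missing from one dict counts as zero wins for that rikishi.
    """
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    kimarite = set(p_counts) | set(q_counts)
    # Sum of squared differences between square-rooted probabilities
    s = 0.0
    for k in kimarite:
        p = p_counts.get(k, 0) / p_total
        q = q_counts.get(k, 0) / q_total
        s += (math.sqrt(p) - math.sqrt(q)) ** 2
    # The 1/sqrt(2) factor scales the result into [0, 1]
    return math.sqrt(s) / math.sqrt(2)
```

Because the counts are normalized inside the function, a rikishi with hundreds of career wins can be compared directly with one who has only a few dozen.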
In prior posts, I used a common “average” distribution from the whole database as the reference distribution. But I’m going to do something different now.
Clusters!
No, not these kinds of clusters.
Now, I’m not doing k-means clustering. I’m doing something different, meant to reduce potential overlapping or abutting between clusters.
I’m doing something extreme: I pick the two “discriminant” points that are farthest from each other in the original data set, and then assign each point in the set to whichever of those two “extremal” points it’s closest to.
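Here is a minimal Python sketch of that two-pole idea, assuming the pairwise distances are already in a square, symmetric list-of-lists with zeros on the diagonal. The function name `two_pole_clusters` is hypothetical. A point exactly equidistant from both poles is arbitrarily sent to the first pole here, since the post doesn't say how ties are handled.

```python
def two_pole_clusters(names, dist):
    """Split items into two clusters around their farthest-apart pair.

    names: list of labels; dist: square, symmetric list-of-lists of
    pairwise distances. Returns (pole_a, pole_b, cluster_a, cluster_b).
    """
    n = len(names)
    if n < 2:
        raise ValueError("need at least two items to pick two poles")
    # Step 1: the pair (i, j) with the largest distance becomes the two poles
    i, j = max(
        ((a, b) for a in range(n) for b in range(a + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    cluster_a, cluster_b = [], []
    # Step 2: assign every item to whichever pole it is closer to
    # (ties go to pole i)
    for k in range(n):
        if dist[k][i] <= dist[k][j]:
            cluster_a.append(names[k])
        else:
            cluster_b.append(names[k])
    return names[i], names[j], cluster_a, cluster_b
```

Unlike k-means, nothing is iterated: the poles are fixed by the single largest distance, and each point is assigned once.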
So I took all the rikishi and calculated the pairwise Hellinger distances… and I got a matrix. Here is a conditionally formatted version (very red = large distance, white = 0).
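Building that matrix, and reading off each rikishi's farthest partner row by row, could look like the sketch below. It assumes kimarite win counts stored in a hypothetical dict of dicts (rikishi → kimarite → wins), and it repeats a compact Hellinger function so the block runs on its own. The function names are illustrative; the actual numbers live in the spreadsheet, not here.

```python
import math

def hellinger(p_counts, q_counts):
    # Hellinger distance between two win-count dicts, normalized to probabilities
    pt, qt = sum(p_counts.values()), sum(q_counts.values())
    s = sum(
        (math.sqrt(p_counts.get(k, 0) / pt) - math.sqrt(q_counts.get(k, 0) / qt)) ** 2
        for k in set(p_counts) | set(q_counts)
    )
    return math.sqrt(s / 2)

def distance_matrix(wins):
    """wins: dict rikishi -> {kimarite: win count}. Returns (names, matrix)."""
    names = sorted(wins)
    matrix = [[hellinger(wins[a], wins[b]) for b in names] for a in names]
    return names, matrix

def farthest_partner(names, matrix):
    """For each rikishi (row), the rikishi farthest away and that distance."""
    result = {}
    for r, name in enumerate(names):
        c = max(range(len(names)), key=lambda j: matrix[r][j])
        result[name] = (names[c], matrix[r][c])
    return result
```

The row with the largest maximum in `farthest_partner` is the "most extreme" row in the sense used here.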
The Hokuseiho row is obviously the most extreme, and the rikishi he’s farthest from is Takakeisho.
Comparison of Hokuseiho and Takakeisho
So let’s see what made Hokuseiho and Takakeisho the two poles under this metric.
Basically, the two most common kimarite, oshidashi and yorikiri, are each more or less zeroed out for one of the two: Hokuseiho has almost no oshidashi wins, and Takakeisho has almost no yorikiri wins.
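A quick algebra step shows why having almost none of the other's signature kimarite pushes two rikishi apart. Assuming the 1/√2-normalized form of the Hellinger distance, and distributions whose probabilities each sum to 1, expanding the square gives:

```latex
H^2(P,Q) = \frac{1}{2}\sum_i \left(\sqrt{p_i}-\sqrt{q_i}\right)^2
         = \frac{1}{2}\left(\sum_i p_i + \sum_i q_i - 2\sum_i \sqrt{p_i q_i}\right)
         = 1 - \sum_i \sqrt{p_i q_i}
```

Each term √(p_i q_i) only contributes when both rikishi win by kimarite i. If each has essentially zero of the other's most-used kimarite, the biggest overlap terms vanish and H heads toward its maximum of 1.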
Where do the other rikishi fall?
What’s really neat is that November Makuuchi splits almost 50/50 between these two poles: 22 rikishi in the Takakeisho camp and 20 in the Hokuseiho camp.
To be sure, some are very close to the middle (and once I pick a third “pole”, we’ll start seeing more interesting clusters), but let’s just start with these two extremes.
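One way to put a number on “close to the middle” is a signed margin: the distance to one pole minus the distance to the other. The post doesn't define such a measure, so this is an assumption. `pole_margins` is a hypothetical helper that takes a square distance matrix like the one above and returns each rikishi's margin, sorted from most like pole A to most like pole B.

```python
def pole_margins(names, dist, pole_a, pole_b):
    """Signed margin for each item: d(x, pole_b) - d(x, pole_a).

    Positive -> closer to pole_a; negative -> closer to pole_b;
    near zero -> sitting close to the middle.
    Returns (name, margin) pairs sorted from most pole_a-like
    to most pole_b-like.
    """
    ia, ib = names.index(pole_a), names.index(pole_b)
    margins = [(name, dist[k][ib] - dist[k][ia]) for k, name in enumerate(names)]
    return sorted(margins, key=lambda t: t[1], reverse=True)
```

Rikishi with a margin near zero are the borderline cases between the two camps.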
Takakeisho camp (oshidashi favorers)
Takakeisho (duh)
Daieisho
Abi
Hokutofuji
Ura
Tobizaru
Takayasu
Gonoyama
Midorifuji
Onosho
Shonannoumi
Takanosho
Kinbozan
Mitakeumi
Myogiryu
Kotoeko
Oho
Tamawashi
Tomokaze
Ichiyamamoto
Churanoumi
Tohakuryu
Hokuseiho camp (yorikiri favorers)
Terunofuji
Kirishima
Hoshoryu
Kotonowaka
Wakamotoharu
Asanoyama
Shodai
Meisei
Nishikigi
Hokuseiho
Endo
Atamifuji
Ryuden
Hiradoumi
Sadanoumi
Tsurugisho
Takarafuji
Roga
Nishikifuji
Kitanowaka
Later I’ll look more closely at the oshidashi/yorikiri split, which is the big one. This is like principal component analysis, where one finds the big differences first and then gets into the more detailed ones.
UPDATE: Spreadsheet
Note: I implemented the Hellinger distance as a user-defined function in VBA, and Substack doesn’t support embedding that right now.
So, this is a spreadsheet with just the values and not the calculation of the function. You’ll have to check my calculations… or just trust me.