
Ratings Are Broken

Interesting trends with ratings are also observable on lichess. This is not a scientific analysis but merely my own observation.

For instance, since the average rating in Arena Tournaments is usually around 1800, players below that rating tend to gain rating points in arenas, while players above it tend to lose them. (Thanks, berserker button!)

At higher levels (>2000) you are actually much more likely to play against lower-rated opponents in quick pairings, no matter who you are. This is a by-product of the lichess player population, which thins out sharply as ratings rise.

For instance, if I am 2300 and play quick pairings, I will get 2100s and 2200s all the time, whereas a 2500 would be a very rare sight for me even in hundreds of games. In return, that 2500 is probably getting 2300s all the time.
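To put a rough number on that: a minimal sketch, assuming ratings are roughly bell-shaped. The mean of 1500, the spread of 350 and the 200-point pairing window are illustrative assumptions, not measured lichess values, and `share_lower_rated` is a hypothetical helper.

```python
from statistics import NormalDist

# Assumption: a bell-shaped rating population. The mean and spread are
# illustrative only, not measured lichess values.
population = NormalDist(mu=1500, sigma=350)


def share_lower_rated(my_rating: float, window: float = 200) -> float:
    """Among players within `window` points of my_rating, the share rated below it."""
    below = population.cdf(my_rating) - population.cdf(my_rating - window)
    above = population.cdf(my_rating + window) - population.cdf(my_rating)
    return below / (below + above)


# The further a rating sits above the mean, the more lopsided the pool becomes.
for rating in (1500, 1900, 2300):
    print(rating, round(share_lower_rated(rating), 2))
```

Under these assumptions, a player at the mean sees an even split, while a player far above the mean finds that most of the nearby pool is rated lower.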
The Glicko system is a lot better than the Elo system.
Using it would probably solve the problem of underrated players ...
FIDE should probably just adopt Glicko (or Glicko-2). The only real reason they don't is 'we have always used Elo, so why stop now?'
I guess the (initial value of the) K-factor on Lichess fluctuates a bit according to the number of games played in a given date range. This can counterbalance the influx of new players and the burnout of declining ones.
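For comparison, here is a minimal sketch of a fixed-K Elo update next to a single-game Glicko-1 update, where the rating deviation (RD) plays the role of a variable K-factor. Lichess uses Glicko-2; Glicko-1 is used here only because it is shorter. The function names and the default `k=20` are assumptions for illustration.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant


def elo_update(r, r_opp, score, k=20):
    """Elo: the step size k is fixed, whatever the player's history."""
    expected = 1 / (1 + 10 ** ((r_opp - r) / 400))
    return r + k * (score - expected)


def g(rd):
    """Glicko-1 factor that discounts a result against an uncertain opponent."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def glicko1_update(r, rd, r_opp, rd_opp, score):
    """Glicko-1 update for a rating period containing a single game."""
    g_opp = g(rd_opp)
    expected = 1 / (1 + 10 ** (-g_opp * (r - r_opp) / 400))
    d2 = 1 / (Q**2 * g_opp**2 * expected * (1 - expected))
    denom = 1 / rd**2 + 1 / d2
    new_r = r + (Q / denom) * g_opp * (score - expected)
    new_rd = math.sqrt(1 / denom)  # uncertainty shrinks after every game
    return new_r, new_rd


# Same win against the same 1500 opponent, but different uncertainty:
newcomer, _ = glicko1_update(1500, 350, 1500, 50, 1.0)
regular, _ = glicko1_update(1500, 50, 1500, 50, 1.0)
print(round(newcomer - 1500, 1), round(regular - 1500, 1))
```

The newcomer with a high RD moves much further on one win than the settled regular, which is the behaviour that lets underrated players climb quickly.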

Mr Sonas, as a statistician, should not issue recommendations about Elo or a similar system.
Only after a proposal by the rule makers might he give some.
In a few countries a sudden rating change has been made, with lots of complaints etc. as a consequence.
Sonas seems to be unaware of this.

Just like monetary inflation, there is perceived inflation and real inflation. But how do you measure those?

BTW, did Sonas neutralize the K-factor according to rating classes in his data?
I always find it strange how players have an opinion about OTB chess but don't play it themselves. If you haven't played any FIDE-rated game in recent months, then it is very unlikely that you understand the problem, and impossible for you to propose a good solution.
This seems to be about Elo making some kind of steady-state (or stationary?) assumption about the population, while what we have is population dynamics or a transient flow.

Or assuming that the population average should be constant, or sit somewhere we know.

Are there more flexible systems out there, like the one on lichess? I have only looked at your blog here, not the sources it refers to. But the suggested change of initial conditions might be a good temporary perturbation experiment to see how the population distribution reacts.

But alternative competition pairing systems could be run on the same games from any point in time, to test which behaves most robustly as the population distribution evolves through influx and outflux. Whether the transients come from sampling uncertainty (like the assumptions of Glicko, in my words; sampling is, I think, not the proper term), from improvement trajectories, or from changes in the whole population of players, the games, the players, and the times of events are all there. These time series could serve many simulations of rating systems... something like that. Or where am I misconceiving?
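A minimal sketch of that replay idea, assuming plain Elo with different K-factors as the competing systems and a mean squared prediction error as the score. The `games` list is a made-up toy history, and `replay` is a hypothetical helper, not anything lichess or FIDE provides.

```python
from collections import defaultdict


def expected(r_a, r_b):
    """Elo expected score of player A against player B."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))


def replay(games, k, start=1500.0):
    """Replay games in time order with Elo factor k.

    Returns the final ratings and the mean squared error of the
    pre-game predictions (lower means better predictions).
    """
    ratings = defaultdict(lambda: start)
    total_error = 0.0
    for _, white, black, score in sorted(games):
        e = expected(ratings[white], ratings[black])
        total_error += (score - e) ** 2
        delta = k * (score - e)
        ratings[white] += delta
        ratings[black] -= delta
    return dict(ratings), total_error / len(games)


# Hypothetical toy history: (timestamp, white, black, white's score)
games = [(1, "a", "b", 1.0), (2, "b", "c", 0.5), (3, "a", "c", 1.0), (4, "c", "a", 0.0)]
for k in (10, 20, 40):
    ratings, error = replay(games, k)
    print(k, round(error, 4))
```

With a real game archive, the same loop could compare any update rule on identical data, including periods with a heavy influx of new players.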
Strange coincidence: some days ago I raised a similar (almost identical!) topic in Germany's biggest forum. We have many underrated players after the lockdown, and several booms in every tournament, which feels like it is dragging everyone down.
@dboing said in #7:
> This seems to be about Elo making some kind of steady-state (or stationary?) assumption about (...)

At least in physics, the word is equilibrium. :-)
@peppie23 said in #6:
> I always find it strange how players have an opinion about OTB chess but don't play it themselves. If you haven't played any FIDE-rated game in recent months, then it is very unlikely that you understand the problem, and impossible for you to propose a good solution.

I play a lot of OTB chess, and still believe Glicko is the right solution.