lichess.org

Ratings Are Broken

The FIDE proposal by Sonas only discusses the deflation that has "stretched" the lower echelons of players due to the huge number of people coming into the rating pool. There are obviously all sorts of mechanisms creating inflation or deflation; actually, I think one or another will be happening all the time.
Yes, people need to understand that it is most likely impossible to build a rating system that is never subject to inflation or deflation. So you try to keep it within limits, and 400 points off for some ranges is clearly not within limits. Personally, I think 50 points is about the maximum I would like to see.
Nate, you raised your rating from 2273 to 2305 so what 2200s to 2400s are you talking about?
I suspect rating noise can accumulate in the pool. More noise means less accurate ratings, for you and the players you meet. Less accuracy means less predictive quality, which was measured here.

If you increased the noise for a single player, for example by changing Magnus' rating by -1000 or +1000, he would stabilize again around his true rating in 30 matches or so. But during those matches, he would spread the noise to the opponents he meets. So the noise stays in the system; it just gets distributed to other players, who then have to play more to get rid of it themselves.
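
To make the spreading argument concrete, here is a minimal sketch. It assumes plain Elo with a fixed K for both players, which is not what Lichess uses (Glicko-2 moves a high-RD player faster and is not zero-sum), and it replaces random results with each game's expected score from true strength. The names `spread_noise`, `true_strength`, `offset` and the numbers are all made up for illustration.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def spread_noise(true_strength=2800.0, offset=-1000.0, games=30, k=32.0):
    """Mis-rate one player by `offset`, then let him play `games` opponents.

    Assumptions (not from the thread): every opponent is a different player,
    accurately rated and exactly as strong as our player, and each game's
    score is the expectation under true strength rather than a random result.
    Returns the player's final rating and the rating change each opponent got.
    """
    player = true_strength + offset
    transfers = []
    for _ in range(games):
        opponent = true_strength
        actual = expected_score(true_strength, true_strength)  # 0.5 on average
        predicted = expected_score(player, opponent)  # what the ratings claim
        delta = k * (actual - predicted)
        player += delta           # the mis-rated player drifts back up
        transfers.append(-delta)  # this opponent is now off by that amount
    return player, transfers


final, transfers = spread_noise()
print(f"player recovered {final - 1800.0:.1f} points")
print(f"opponents absorbed {sum(transfers):.1f} points in total")
```

Under these assumptions every point the player recovers comes out of an opponent who was correctly rated before the game, so the total error only moves around; it does not shrink.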

I'm not sure this post gets to the cause of the problem. I don't think the newcomers from "The Queen's Gambit" are a major cause of the increased noise measured here, because the system corrects for them by giving them a high RD, doesn't it?
@Yersinia_Pestis said in #5:

> Mr Sonas - as statistician - should not spawn recommendations about an ELO or alike system.

Why not?

He's fully entitled to make recommendations as he pleases. He's doing the community a service. FIDE officials can do whatever they want with his proposal--ignore it, adopt it, or use it as inspiration for a different solution.

BTW, Jeff Sonas isn't just a statistician. He's served on the FIDE ratings committee before.
<Comment deleted by user>
I just read the full 19-page mathematical explanation, and he has convinced me: it's all backed up statistically.

1) He proves there is a deflation
2) He proves that it is getting worse
3) He shows that this is a problem (e.g. some master-level players do not get a master-level Elo; see the graph of player quantity against Elo)
4) He shows how to solve it
5) He backs it up with simulations and data from 15 000 000 games

Let's do this !!!
Aha

That's what I was looking for. Just became too lazy to read up on it. But reading Sonas from way back (90s), I knew he would simulate stuff.

So that's good.

But still, I'm hoping someone would simulate using other rating systems like Glicko.

I wonder if Nate has also done something in chess ratings.
I think a system that does not assume that one specific, very low-complexity stationary distribution governs all the dynamics of a real, multidimensional population of paired competitors would change more at the population level and less at the individual level. If you leave room for the population distribution to emerge from the assumptions about individual trajectories, I would think the changes would have less impact on the notion of relative strength, or game difficulty, that we would like to attach to the individual we are about to play next.

I am not talking only about Elo and discrete tournament events with their tier structures, but also about the round-the-clock, non-discrete grouping of online games, which have no tiers. (I have no understanding of the tiering effect anyway; I suspect it might be something that Elo is tied to, along with the inertia against deviating from it.)

I think we should be clear about what we want a rating to be interpreted as: not its absolute value as a number, but its value relative to the pool where it is used in practice. I do not think, for example, that even the average should be fixed. How much history can we base such a value on? With online chess, the population of players is not of the same order of magnitude as before the internet existed. The same was true earlier, with the democratization of chess that may have followed the increase in free time that some societies afforded a wider class of their citizens.

I only care about expected game difficulty relative to my current abilities. What else could there be?
I'm so confused by this. When I used to play OTB in Canada, I was given a "provisional rating" until I completed 25 games. So I had a pseudo-rating (the average of my opponents' ratings + 400 * r, where -1 ≤ r ≤ 1 reflects my win-loss record from -100% to +100%), and my opponents could not win or lose rating points by playing me (games between two provisionally rated players didn't count for rating, I think). After 25 games, the provisional rating, which was a reasonable approximation of my true strength, was considered established. From then on it was the same as everyone else's rating, going up and down with my results according to an Elo-like system. I think this system somewhat nullifies the underrated-newbie effect. I was under the impression that all systems use this or something similar. Glicko just sets a very high initial RD, which accomplishes the same thing in another way. I have a hard time believing other systems don't already take this into account by some similar mechanism.
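
For anyone who wants to play with the provisional formula, here is a rough sketch as I remember it, not the federation's official rule. I'm assuming r = (wins - losses) / games with draws counting as zero; the function name `provisional_rating` and the sample numbers are made up.

```python
def provisional_rating(opponent_ratings, wins, losses):
    """Average opponent rating + 400 * r, with r in [-1, 1].

    Assumption: r = (wins - losses) / games, so draws count as zero.
    """
    games = len(opponent_ratings)
    if games == 0:
        raise ValueError("need at least one rated game")
    if wins < 0 or losses < 0 or wins + losses > games:
        raise ValueError("wins and losses must fit within the games played")
    r = (wins - losses) / games
    return sum(opponent_ratings) / games + 400 * r


# Hypothetical newcomer: 3 wins and 1 loss against a field averaging 1600.
print(provisional_rating([1500, 1600, 1700, 1600], wins=3, losses=1))  # 1800.0
```

A perfect score puts the newcomer 400 points above the average opponent and a clean sweep of losses puts them 400 below, which is why the estimate gets close to true strength quickly without moving the opponents' ratings.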

(Canada also used to have a bunch of screwed-up scaling and bonus points that made players under 2200 overrated; obviously, I'm not recommending anyone adopt that nonsense.)