Proposed Modification to Lehman Rating System
Preamble
First of all let me say that I am a great fan of the Lehman Rating system
used on OKBridge. If I had worked on devising such a system, I hope that
I would have come up with the same one. I think that once a player has
played a reasonable number of boards, his/her Lehman rating reflects pretty
accurately the player's true level. Of course, no rating system is perfect
but I think that Lehmans are the best we've got (certainly more accurate
than, for example, the accumulation of ACBL master points).
The problem - an example
The modification which this document proposes is an attempt to remedy a
situation which I think leads to inaccuracies in the Lehman rating. The
situation is caused by new members (or members who have prevailed upon
Matt to reset their rating) joining OKBridge when their true playing
level is not close to 50 (the starting value for every new player). Let
us suppose that a novice (true rating somewhere in the low thirties) signs
on to OKB. Some nice intermediate player (myself, perhaps) agrees to play
with this novice. Let's say that two more intermediate players join as
opps (all the intermediate players have a rating of 45, the novice who
has never played before has a rating of 50). We play a board and we score
42% (MPs or converted IMPs). According to the Lehman scheme, the pseudo-ratings
for this board (which will be added in at the end of the week to form a
new rating) are as follows: N: 40.89; S: 36.81; E/W: 53.65. As expected
North's rating will go down somewhat, and the E/W pair get a boost. But
look what happens to South's pseudo-rating: it is 36.81! This is because
the expected MP score for N/S should have been 51.35% and "we" only
achieved 42%. That's a relatively big hit because North, with the
higher current "rating", gets a proportionately bigger share of our "take".
[Note, however, that South's rating doesn't change quite as much as North's
because the differences are proportional to the ratings]. After a
few boards like this, South is likely to say to North, "It's been nice
playing with you, good luck with OKB, I have to go...". That's a shame
because if everyone does that (I know many players who would have quit
after one hand, or even before the end of the first hand!), our newbie
is likely to get a bit despondent (like he has electronic B.O. or something).
Furthermore, even though we might feel that we can live with the problem
described above re: newbies, it is nevertheless a fact that the resulting
inaccuracies feed, in the long term, into every player's rating.
The solution - high-level - with example
My proposed modification is to keep track of an estimate of the error in
the rating (also called the "error bound"), as well as the rating itself.
Note that if we use the standard error estimate of 1/sqrt(N-1), then all
the server needs to remember is the total number of boards played (which
it must do anyway). [Note that N itself is also subject to the standard
Lehman decay of 6.7% per week, as it represents the number of boards contributing
to the player's history]. Now there is a fairly minor modification
to the Lehman formula which calculates the pseudo-ratings (details below).
The result would be the following pseudo-ratings: N: 30.53; S: 44.45; E/W:
55.01. This time, North's pseudo-rating is quickly adjusted down to the
generally appropriate area, while South's rating hardly changes at all.
It's true that E/W get a better boost than perhaps they deserve, but it's
mainly at the expense of the newbie. After the first week of playing, even
if it's only 25 boards, his error estimate will drop to 20% instead of
100% and he won't suffer such big effects. It's almost as if the new player
is getting rated according to the old scratch system - which is appropriate
given that so far we know nothing about his playing ability. Needless to say,
if a player gets his account "reset", then the number of boards played
will also go back to zero and he'll be treated exactly like a totally new
player.
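To make the error estimate concrete, here is a minimal Python sketch of the 1/sqrt(N-1) rule. The function names, the cap at 100% for fewer than two boards, and the way the weekly decay is applied to the board count are assumptions made for illustration, not a description of the OKBridge server.

```python
from math import sqrt

WEEKLY_DECAY = 0.067  # the 6.7% weekly Lehman decay mentioned above


def error_bound(boards: float) -> float:
    """Standard error estimate 1/sqrt(N-1), capped at 1.0 (i.e. 100%).

    Assumption: with fewer than 2 boards the estimate is undefined,
    so it is treated here as 100%.
    """
    if boards < 2:
        return 1.0
    return min(1.0, 1.0 / sqrt(boards - 1))


def decayed_boards(boards: float, weeks: int = 1) -> float:
    """Board count after applying the weekly decay (assumed multiplicative)."""
    return boards * (1.0 - WEEKLY_DECAY) ** weeks


print(f"{error_bound(0):.0%}")   # brand-new player
print(f"{error_bound(25):.0%}")  # after a first week of 25 boards
```

With these assumptions a brand-new player starts at 100%, and 25 boards bring the estimate down to roughly 20%, as in the example above.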
The solution - the gory details
First, here's a helpful link to the description available from the OKBridge
Club page: Lehman Ratings. That page in turn links to the original
specification.
Notation:
Terms in standard Lehman formula
- Rn: Current Rating for North
- Rns: Combined Rating for North/South (arithmetic mean)
- Rnsew: Combined Rating for North/South/East/West
- Pns: Actual score for North/South
- P^ns: Predicted score for North/South = 50 Rns / Rnsew
- Qn: Pseudo-rating (based on this result) for North = Rn * Pns / P^ns
- DELTAn: Difference = Qn - Rn = Rn * ( Pns / P^ns - 1 ) (the difference that will be applied to Rn after being weighted according to the number of boards in N's history)
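As a check on the notation, the following Python sketch applies the standard formula to the example from "The problem" (North 50, the other three players 45, N/S scoring 42%). The function name is hypothetical, and two details are assumptions: Rnsew is taken as the mean of the two pair ratings, and the E/W score is taken as 100 minus the N/S score, with E/W handled symmetrically.

```python
def lehman_pseudo_ratings(r_n, r_s, r_e, r_w, p_ns):
    """Return the pseudo-ratings Q for all four seats under the standard formula."""
    r_ns = (r_n + r_s) / 2          # Rns: combined N/S rating
    r_ew = (r_e + r_w) / 2          # combined E/W rating
    r_nsew = (r_ns + r_ew) / 2      # Rnsew (assumed: mean of the two pairs)
    p_ew = 100 - p_ns               # assumed: pair scores sum to 100%
    pred_ns = 50 * r_ns / r_nsew    # P^ns: predicted N/S score
    pred_ew = 50 * r_ew / r_nsew    # predicted E/W score
    return {
        "N": r_n * p_ns / pred_ns,
        "S": r_s * p_ns / pred_ns,
        "E": r_e * p_ew / pred_ew,
        "W": r_w * p_ew / pred_ew,
    }


q = lehman_pseudo_ratings(50, 45, 45, 45, 42)
for seat, value in q.items():
    print(seat, round(value, 2))
```

Under these assumptions the sketch gives about 40.89 for North, 36.81 for South and 53.65 for each of East and West, the figures quoted in the example.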
Notation:
Terms in modified Lehman formula (note the use of the prime character ' to show modifications)
- En: Current Error bound in rating for North
- Ens: Mean Error bound for N/S
- R'n = Rn * En / Ens
- P^'ns = 50 R'ns / R'nsew
- DELTA'n = R'n * ( Pns / P^'ns - 1 )
- Q'n = Rn + DELTA'n
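The modified terms can be sketched in Python in the same way. Several things here are assumptions rather than part of the proposal as written: the function name, taking R'ns and R'nsew as arithmetic means of the adjusted ratings, taking the E/W score as 100 minus the N/S score, and giving the three experienced players an error bound of 1/sqrt(1000) (about 1001 boards of history), which the text does not state.

```python
from math import sqrt


def modified_pseudo_ratings(ratings, errors, p_ns):
    """Return (pseudo_ratings, deltas) for seats 'N', 'S', 'E', 'W'.

    ratings and errors are dicts keyed by seat; p_ns is the N/S score in %.
    """
    adjusted = {}
    for a, b in (("N", "S"), ("E", "W")):
        mean_err = (errors[a] + errors[b]) / 2             # Ens (or Eew)
        adjusted[a] = ratings[a] * errors[a] / mean_err    # R'n = Rn * En / Ens
        adjusted[b] = ratings[b] * errors[b] / mean_err
    adj_ns = (adjusted["N"] + adjusted["S"]) / 2           # R'ns (assumed mean)
    adj_ew = (adjusted["E"] + adjusted["W"]) / 2
    adj_all = (adj_ns + adj_ew) / 2                        # R'nsew (assumed mean)
    score = {"N": p_ns, "S": p_ns, "E": 100 - p_ns, "W": 100 - p_ns}
    predicted = {
        "N": 50 * adj_ns / adj_all, "S": 50 * adj_ns / adj_all,  # P^'ns
        "E": 50 * adj_ew / adj_all, "W": 50 * adj_ew / adj_all,
    }
    deltas = {s: adjusted[s] * (score[s] / predicted[s] - 1) for s in ratings}
    pseudo = {s: ratings[s] + deltas[s] for s in ratings}  # Q' = R + DELTA'
    return pseudo, deltas


ratings = {"N": 50, "S": 45, "E": 45, "W": 45}
veteran = 1 / sqrt(1000)  # assumed error bound for the experienced players
errors = {"N": 1.0, "S": veteran, "E": veteran, "W": veteran}
pseudo, deltas = modified_pseudo_ratings(ratings, errors, 42)
for seat in ratings:
    print(seat, round(pseudo[seat], 2))
print("sum of deltas:", round(sum(deltas.values()), 6))
```

With these inputs the sketch gives roughly 30.53 for North, 44.45 for South and 55.01 for East and West, in line with the pseudo-ratings quoted in the high-level description, and the four DELTA' terms sum to zero up to rounding.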
The important aspects of the modified formula are:
- As before, the DELTA (the difference from the current rating to the pseudo-rating) is proportional to the rating; however, in addition, it is also proportional to the current error bound (estimate of player's error based on existing history).
- As before, the sum of the four DELTA terms must be zero - otherwise the Lehman rating system would need recentering each week.
If you're interested in the really gory details and/or would like to experiment
a little, I've provided a spreadsheet
for your interest. Note also that any real implementation must address
the fact that according to the standard error formula the error bound when
there are fewer than 2 samples is essentially infinite (i.e. 100% in our
situation). If you have any trouble downloading and viewing the spreadsheet,
please MailMe.
Another, more detailed, example
These results are based on a simulation (using another Excel
spreadsheet). In the two simulations a new player sits down with
three other players of average ability (i.e. 50% guessers, having ratings
of 50). However, the new player is only a 40% guesser (that's to
say on key decisions, he only makes the right bid/play 40% of the time).
Of course, he has a 50 rating too, to begin with. They play
five boards, then break up. This happens ten times each week.
The newbie's partner is always the same person, but the opponents are different
each time (and always 50% guessers/ratings). The simulation goes
on for 10 weeks. Before the simulation, newbie's partner has played
500 boards (i.e. he's been on OKB about 10 weeks, say). Here are the
results (the top row is the week number, the second is the newbie's rating
at the end of the week, and the third is the partner's rating at the end
of the week):
Using existing Lehman formula:
Week      1     2     3     4     5     6     7     8     9     10
Newbie    47.3  46.5  44.1  43.0  42.7  42.9  42.0  41.4  41.2  40.9
Partner   49.7  49.6  49.0  48.6  48.5  48.6  48.2  47.9  47.8  47.6
Using proposed modified Lehman formula:
Week      1     2     3     4     5     6     7     8     9     10
Newbie    44.8  44.7  41.8  40.7  40.6  41.0  40.1  39.5  39.4  39.1
Partner   50.0  50.0  49.7  49.5  49.5  49.6  49.3  49.1  49.0  48.9
Note that with the existing Lehman formula, newbie approaches his "true"
40 rating fairly slowly (still not there after 10 weeks). The partner
drops to 47.6 by that time. With the modified formula, the newbie
gets quite quickly to around 40 (actually a little lower) while the partner
is much less affected (finishes around 49). The numbers end up perhaps
a little lower than expected largely because this was a random simulation
and the pair actually achieved an overall result of 44.85% against the
constantly average opponents (where perhaps 45% might have been expected).
On the other hand, there is no perfect correlation between guessing ability
(the input to the simulation) and Lehman rating (the output) although I
have found a pretty good correspondence with other simulations I've done.
Note also that if I had chosen a different number for the partner's
history (say 2000 boards instead of 500), the partner's rating would be
affected even less with the modified formula than in the example shown.
In fact, with 2000 boards of history, partner's rating would drop only to 49.8
and newbie's would drop to 39.2.
Please let me know (MailMe) if you like this suggestion or have questions
about it. I am very serious about proposing that this change should be made
in the OKBridge server.
Matt Clegg's further suggestion, and an even further one by myself
In an email discussion we had, Matt proposed an interesting way of
presenting the rating/error-bound information described above in a
user's stats. Instead of showing the rating and error bound independently,
he suggested showing only the rating less the error bound.
So, for example, instead of showing a rating of 45.35 plus or minus 2.65%,
the rating would simply show 42.7. This value represents the worst
true rating the player could have, based on current information.
I think a further improvement might be to remove any extra significant
figures not justified by the error bound. In the above example, the
rating would show as simply 43, on the grounds that pretending that there
is precision to 1/100th of a rating point, when we know that the
error bound in this case is greater than 1, is silly.
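A small Python sketch of both display suggestions follows. The function name is hypothetical, the error bound is taken in rating points (as in the 45.35 and 2.65 example), and the rule for how many decimals to keep (none once the bound reaches 1 point, at most two otherwise) is just one assumed way of dropping the unjustified figures.

```python
from math import floor, log10


def displayed_rating(rating: float, error_bound_points: float) -> str:
    """Show the rating less its error bound, without unjustified decimals."""
    conservative = rating - error_bound_points  # Matt's suggestion
    if error_bound_points <= 0:
        return f"{conservative:.2f}"
    # Keep decimals only down to the leading digit of the error bound
    # (assumed rule), and never more than the two decimals shown today.
    decimals = min(2, max(0, -floor(log10(error_bound_points))))
    return f"{conservative:.{decimals}f}"


print(displayed_rating(45.35, 2.65))  # bound above 1 point: whole number only
print(displayed_rating(50.0, 0.3))    # bound below 1 point: one decimal kept
```

For the example above this prints 43 rather than 42.70.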
Other excerpts from R.G.B.O newsgroup correspondence
Note
I have slightly modified this document from that first pointed to by
Matt Clegg's page "On the Horizon" in the OKBridge Spectator of February 1998,
but the essentials are the same - only more corroborative detail.
This page created by Robin Hillyard (aka spider) and last modified February 9th, 1998.