Proposed Modification to Lehman Rating System

Preamble

First of all, let me say that I am a great fan of the Lehman Rating system used on OKBridge. If I had worked on devising such a system, I hope that I would have come up with the same one. I think that once a player has played a reasonable number of boards, his/her Lehman rating reflects the player's true level pretty accurately. Of course, no rating system is perfect, but I think that Lehmans are the best we've got (certainly more accurate than, for example, the accumulation of ACBL master points).

The problem - an example

The modification which this document proposes is an attempt to remedy a situation which I think leads to inaccuracies in the Lehman rating. The situation arises when new members (or members who have prevailed upon Matt to reset their rating) join OKBridge with a true playing level that is not close to 50 (the starting value for every new player).

Let us suppose that a novice (true rating somewhere in the low thirties) signs on to OKB. Some nice intermediate player (myself, perhaps) agrees to play with this novice. Let's say that two more intermediate players join as opps (all the intermediate players have a rating of 45; the novice, who has never played before, has a rating of 50). The novice sits North and I sit South. We play a board and we score 42% (MPs or converted IMPs). According to the Lehman scheme, the pseudo-ratings for this board (which will be added in at the end of the week to form a new rating) are as follows: N: 40.89; S: 36.81; E/W: 53.65.

As expected, North's rating will go down somewhat, and the E/W pair get a boost. But look what happens to South's pseudo-rating: it is 36.81! This is because the expected MP score for N/S was 51.35% and "we" only achieved 42%. That's a relatively big hit because North, with the higher current "rating", gets a proportionately bigger share of our "take". [Note, however, that South's rating doesn't change quite as much as North's because the differences are proportional to the ratings.]

After a few boards like this, South is likely to say to North, "It's been nice playing with you, good luck with OKB, I have to go...". That's a shame, because if everyone does that (I know many players who would have quit after one hand, or even before the end of the first hand!), our newbie is likely to get a bit despondent (like he has electronic B.O. or something).

Furthermore, even though we might feel that we can live with the problem described above re: newbies, it is nevertheless a fact that the incorrectly calculated ratings also feed, in the long term, into every player's rating.

The solution - high-level - with example

My proposed modification is to keep track of an estimate of the error in the rating (also called the "error bound"), as well as the rating itself. Note that if we use the standard error estimate of 1/sqrt(N-1), then all the server needs to remember is the total number of boards played (which it must do anyway). [Note that N itself is also subject to the standard Lehman decay of 6.7% per week, as it represents the number of boards contributing to the player's history.]

Now there is a fairly minor modification to the Lehman formula which calculates the pseudo-ratings (details below). For the board in the example above, the result would be the following pseudo-ratings: N: 30.53; S: 44.45; E/W: 55.01. This time, North's pseudo-rating is quickly adjusted down to the generally appropriate area, while South's rating hardly changes at all. It's true that E/W get a better boost than perhaps they deserve, but it's mainly at the expense of the newbie.

After the first week of playing, even if it's only 25 boards, his error estimate will drop to about 20% instead of 100% and he won't suffer such big effects. It's almost as if the new player is getting rated according to the old scratch system, which is appropriate given that so far we know nothing about his playing ability. Needless to say, if a player gets his account "reset", then the number of boards played will also go back to zero and he'll be treated exactly like a totally new player.
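The error estimate itself is easy to sketch. The following is only an illustration: the function names are hypothetical, the cap at 100% for fewer than 2 boards is an assumption, and applying the 6.7% decay as a simple weekly multiplier on N is my reading of the note above, not something taken from the OKBridge server.

```python
import math

WEEKLY_DECAY = 0.067  # the 6.7% per week Lehman decay quoted above


def error_bound(boards: float) -> float:
    """Standard error estimate 1/sqrt(N-1), capped at 1.0 (i.e. 100%)."""
    if boards < 2:
        # Assumption: with fewer than 2 boards the bound is treated as 100%.
        return 1.0
    return min(1.0, 1.0 / math.sqrt(boards - 1))


def decayed_boards(boards: float, weeks: int = 1) -> float:
    """Board count after the weekly decay (hypothetical helper)."""
    return boards * (1.0 - WEEKLY_DECAY) ** weeks


print(f"{error_bound(0):.3f}")   # brand-new player
print(f"{error_bound(25):.3f}")  # after 25 boards, roughly 20%
```

With 25 boards the estimate is 1/sqrt(24), a little over 0.20, which matches the "20% instead of 100%" figure in the text.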

The solution - the gory details

First, here's a helpful link to the description available from the OKBridge Club page: Lehman Ratings. And this links to the original specification.

Notation:

Terms in standard Lehman formula
Rn: Current rating for North
Rns: Combined rating for North/South (arithmetic mean)
Rnsew: Combined rating for North/South/East/West
Pns: Actual score for North/South
P^ns: Predicted score for North/South = 50 * Rns / Rnsew
Qn: Pseudo-rating (based on this result) for North = Rn * Pns / P^ns
DELTAn: Difference that will be applied to Rn after being weighted according to the number of boards in N's history = Qn - Rn = Rn * ( Pns / P^ns - 1 )
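As a check on the standard formula, here is a minimal Python sketch that recomputes the novice example (North 50, the other three players 45, N/S score 42%). The function name is hypothetical, and two things are assumptions on my part: that Rnsew is the arithmetic mean of all four ratings, and that the E/W values follow by symmetry with a score of 100 - Pns. Both reproduce the figures quoted earlier.

```python
def lehman_pseudo_ratings(r_n, r_s, r_e, r_w, p_ns):
    """Return (Qn, Qs, Qe, Qw) for one board; p_ns is the N/S score in percent."""
    r_ns = (r_n + r_s) / 2                 # Rns
    r_ew = (r_e + r_w) / 2                 # E/W counterpart of Rns
    r_nsew = (r_n + r_s + r_e + r_w) / 4   # Rnsew (assumed: mean of all four)
    pred_ns = 50 * r_ns / r_nsew           # P^ns
    pred_ew = 50 * r_ew / r_nsew           # E/W counterpart of P^ns
    p_ew = 100 - p_ns                      # E/W score (assumed complementary)
    return (
        r_n * p_ns / pred_ns,
        r_s * p_ns / pred_ns,
        r_e * p_ew / pred_ew,
        r_w * p_ew / pred_ew,
    )


q_n, q_s, q_e, q_w = lehman_pseudo_ratings(50, 45, 45, 45, 42)
print(f"N: {q_n:.2f}; S: {q_s:.2f}; E/W: {q_e:.2f}")
# prints "N: 40.89; S: 36.81; E/W: 53.65"
```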

Notation:

Terms in modified Lehman formula (note the use of the prime character: ' to show modifications)
En: Current error bound in rating for North
Ens: Mean error bound for N/S
R'n = Rn * En / Ens
P^'ns = 50 * R'ns / R'nsew
DELTA'n = R'n * ( Pns / P^'ns - 1 )
Q'n = Rn + DELTA'n
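The modified formula can be sketched in the same way. Several things here are assumptions rather than part of the specification: the function names are hypothetical, R'ns and R'nsew are taken to be arithmetic means of the primed ratings (by analogy with the standard formula), E/W are treated symmetrically, and the three intermediate players are each given 1000 boards of history, a figure this page does not state. Under those assumptions the sketch comes out very close to the pseudo-ratings quoted for the novice example.

```python
import math


def error_bound(boards):
    """1/sqrt(N-1), treated as 100% when fewer than 2 boards (assumption)."""
    return 1.0 if boards < 2 else min(1.0, 1.0 / math.sqrt(boards - 1))


def modified_pseudo_ratings(ratings, boards, p_ns):
    """ratings and boards are dicts keyed by 'N', 'S', 'E', 'W'; p_ns in percent."""
    pair_of = {"N": "NS", "S": "NS", "E": "EW", "W": "EW"}
    err = {seat: error_bound(boards[seat]) for seat in ratings}        # En
    pair_err = {"NS": (err["N"] + err["S"]) / 2,                       # Ens
                "EW": (err["E"] + err["W"]) / 2}
    # R'n = Rn * En / Ens, using each player's own partnership for Ens.
    adj = {seat: ratings[seat] * err[seat] / pair_err[pair_of[seat]]
           for seat in ratings}
    adj_pair = {"NS": (adj["N"] + adj["S"]) / 2,                       # R'ns
                "EW": (adj["E"] + adj["W"]) / 2}
    adj_all = sum(adj.values()) / 4                                    # R'nsew
    score = {"NS": p_ns, "EW": 100 - p_ns}
    result = {}
    for seat in ratings:
        pair = pair_of[seat]
        predicted = 50 * adj_pair[pair] / adj_all                      # P^'ns
        delta = adj[seat] * (score[pair] / predicted - 1)              # DELTA'n
        result[seat] = ratings[seat] + delta                           # Q'n
    return result


ratings = {"N": 50, "S": 45, "E": 45, "W": 45}
boards = {"N": 0, "S": 1000, "E": 1000, "W": 1000}  # 1000 is an assumed history
for seat, q in modified_pseudo_ratings(ratings, boards, 42).items():
    print(seat, round(q, 2))
```

Changing the assumed board counts shows how strongly the split of the "hit" between North and South depends on their relative error bounds.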
The important aspects of the modified formula are:

If you're interested in the really gory details and/or would like to experiment a little, I've provided a spreadsheet for your interest. If you have any trouble downloading and viewing the spreadsheet, please mail me.

Note also that any real implementation must address the fact that, according to the standard error formula, the error bound when there are fewer than 2 samples is essentially infinite (i.e. 100% in our situation).

Another, more detailed, example

These results are based on a simulation (using another Excel spreadsheet). In each of the two simulations, a new player sits down with three other players of average ability (i.e. 50% guessers, having ratings of 50). However, the new player is only a 40% guesser (that is to say, on key decisions he only makes the right bid/play 40% of the time). Of course, he has a 50 rating too, to begin with. They play five boards, then break up. This happens ten times each week. The newbie's partner is always the same person, but the opponents are different each time (and always 50% guessers with ratings of 50). The simulation goes on for 10 weeks. Before the simulation, the newbie's partner has played 500 boards (i.e. he's been on OKB about 10 weeks, say). Here are the results, showing the week number, the newbie's rating at the end of the week, and the partner's rating at the end of the week:

Using existing Lehman formula:
Week      1     2     3     4     5     6     7     8     9     10
Newbie    47.3  46.5  44.1  43.0  42.7  42.9  42.0  41.4  41.2  40.9
Partner   49.7  49.6  49.0  48.6  48.5  48.6  48.2  47.9  47.8  47.6

Using proposed modified Lehman formula:
Week      1     2     3     4     5     6     7     8     9     10
Newbie    44.8  44.7  41.8  40.7  40.6  41.0  40.1  39.5  39.4  39.1
Partner   50.0  50.0  49.7  49.5  49.5  49.6  49.3  49.1  49.0  48.9

Note that with the existing Lehman formula, the newbie approaches his "true" 40 rating fairly slowly (still not there after 10 weeks). The partner drops to 47.6 by that time. With the modified formula, the newbie gets quite quickly to around 40 (actually a little lower) while the partner is much less affected (finishes around 49). The numbers end up perhaps a little lower than expected, largely because this was a random simulation and the pair actually achieved an overall result of 44.85% against the constantly average opponents (where perhaps 45% might have been expected). On the other hand, there is no perfect correlation between guessing ability (the input to the simulation) and Lehman rating (the output), although I have found a pretty good correspondence in other simulations I've done.

Note also that if I had chosen a different number for the partner's history (say 2000 boards instead of 500), the partner's rating would be affected even less with the modified formula than in the example shown. In fact, with 2000 boards of history, the partner's rating would drop only to 49.8 and the newbie's would drop to 39.2.

Please mail me to let me know if you like this suggestion or have questions about it. I am very serious about proposing that this change should be made in the OKBridge server.


Matt Clegg's further suggestion, and an even further one by myself

In an email discussion we had, Matt proposed an interesting manner of presenting the above-described rating/error-bound information in a user's stats. Instead of showing the rating and error bound independently, he suggested showing only the rating less the error bound. So, for example, instead of showing a rating of 45.35 plus or minus 2.65%, the rating would simply show 42.7. This value represents the worst true rating the player could have, based on current information. I think a further improvement might be to remove any extra significant figures not justified by the error bound. In the above example, the rating would show as simply 43, on the grounds that pretending that there is precision to 1/100th of a rating point, when we know that the error bound in this case is greater than 1, is silly.
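A minimal sketch of this presentation follows. The function name is hypothetical, and the rounding rule (keep decimals only down to the leading digit of the error bound) is just one possible reading of "significant figures justified by the error bound", not an agreed specification.

```python
import math


def displayed_rating(rating, error):
    """Rating less its error bound, rounded to the precision the bound justifies."""
    worst_case = rating - error
    if error > 0:
        # Assumption: an error bound of 1 or more justifies no decimals,
        # 0.1 to 1 justifies one decimal, and so on.
        decimals = max(0, -math.floor(math.log10(error)))
    else:
        decimals = 2  # no error information: fall back to two decimals
    rounded = round(worst_case, decimals)
    return int(rounded) if decimals == 0 else rounded


print(displayed_rating(45.35, 2.65))  # prints 43
```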


Other excerpts from R.G.B.O newsgroup correspondence


Note

I have slightly modified this document from the version first pointed to by Matt Clegg's page "On the Horizon" in the OKBridge Spectator of February 1998, but the essentials are the same, only with more corroborative detail.


This page created by Robin Hillyard (aka spider) and last modified February 9th, 1998.