1. Objective
A unified pan-atlantic (European/American) integrated system of Elo ratings, dan/kyu grades, and handicaps. To achieve this, the following issues have to be addressed:
2. Revision history
3. Background
Five years ago John Kenney and I made a first pass at establishing a correspondence table between dan/kyu grades and Elo ratings. Our ultimate goal was to produce a promotion system based on Elo ratings, as the point system that I had devised (current then and now) is inflationary by its very nature. This is one of the reasons Europe has so many 3-dans and relatively few 1- and 2-dans. The European correspondence table also has a serious problem with compression of grades, as will become clear in section 5. In America, over the past years Larry Kaufman has designed an integrated system that relates grades to ratings and rating differences to handicaps. It has proven to be very stable. Statistical analysis of the limited amount of data available seems to indicate that the correspondence between handicap and rating difference is the same for all strengths. The rating officers on both sides of the Atlantic (Larry Kaufman and Eric Cheymol) and I think the time is ripe to establish an integrated system so that a player's rating and grade mean exactly the same thing no matter on which continent he plays. I am involved in this matter because I have knowledge of and experience with both the European and the American systems and because unifying the two is a long-standing wish of mine. Considering that calculation of Elo ratings on both sides of the ocean is done almost identically, the potentially highest hurdle to unification has already been cleared. Larry, Eric and I (as well as Hans Secelle, Reijer Grimbergen and George Fernandez) have discussed the issues extensively, and this document is the result of these discussions.
One requirement for arriving at a unified system is that tournament practices on either side of the Atlantic are accepted by the other side. This proposal does not aim to change these practices or impose new rules. There are differences in the minimum time allotment that makes games ratable (i.e., valid for rating calculations). In Europe this is 45 minutes plus byoyomi, in America 20 minutes plus byoyomi. This proposal does not intend to change these regulations; both sides need only accept that these differences exist and agree that they do not interfere with establishing reliable ratings. Of course, FESA might decide to regard games as ratable if the time limits are 20 minutes+30 seconds, while recommending 45+30 in general and imposing 60+30 for Grand Prix tournaments. In America handicap games are an integral part of the rating system. Again, the intention of this proposal is not that FESA accept handicap games as ratable in European tournaments, but only that regarding handicap games as ratable in America does not preclude a unified rating system. It would be preferable if these differences disappeared over time, but this is not a requirement for the present purpose of establishing a single rating system. A unified rating system would also enable us to have a single pan-atlantic rating list, but again this is not a necessity. For all practical purposes it is easier to maintain two separate lists, as long as the ratings of the few players who appear on both lists are the same on both.
5. Relationship between Elo ratings, dan/kyu grades and handicaps
The Elo rating is the primary indicator of current actual strength, while the dan and kyu grades are titles based on historic peak performances (much like Grand Master, Master and FIDE Master titles in chess). When unifying the current rating systems, establishing the relationship between Elo ratings and dan/kyu grades is of crucial importance. I will focus on the European situation, as the American system is already being brought in line with the system proposed here (see Table I). In Europe in 1992, the Elo difference between average 3- and 4-dans was 145 points; it was 125 on average for all dan grades. Nowadays, the "theoretical" Elo difference between two consecutive dan grades is 100 points. This indicates that the European grades are too close together in terms of the number of Elo points that separate them. Based on the following observations, Eric Cheymol and I have come to the conclusion that this is indeed the case.
A) In Japan some clubs distinguish between weak and strong 1-dans, 2-dans, 3-dans, and 4-dans. With an Elo difference of only 100 between grades, it is virtually impossible to distinguish between weak and strong players of the same dan grade. Also, European 1-dans and 2-dans have good chances of beating a 3-dan. In Japan, a 1-dan would hardly ever beat a 3-dan if they both received their grades in the same club. Therefore, it seems logical to widen the ranges of Elo ratings that correspond to each dan grade.
B) European kyu and lower dan grades are probably at least 1 grade "harder" than their average Japanese counterparts. Assuming that 4-dans/top 3-dans in Europe and Japan are approximately of equal strength, this indicates that the Elo range for each grade below 4-dan is too narrow.
C) Based on his personal experience with playing in Japan, Larry Kaufman believes that 200 Elo points per dan grade is on the conservative side for Japan: the range is probably about 225-250 there and certainly larger than the 100 points now used in Europe. Since America uses 200 points, it seems reasonable to compromise and use 150 points for the lower dan grades.
D) The 2-dan and 1-dan populations in Europe (9 and 11, respectively) each are about half the size of the 3-dan population (17). Also, the actual average ratings for the lower dan grades are systematically lower than the "theoretical" ratings (see Table I). These two observations again indicate that the Elo ranges for the lower dan grades are too narrow.
Taking the above considerations into account and using the divider between 3-dan and 4-dan (2120 in Europe; 2100 in America) as an anchoring point, the pan-atlantic correspondence detailed in Table I is proposed. It follows the European system in that going from 3-dan to 15-kyu the Elo range per grade becomes progressively narrower (pan-atlantic: 200, 150, 100; European: 100, 80, 60, 40). It remedies the problems with the European ranges, which are too narrow overall (as exemplified by items A-D above). Compared to the present European system, it doubles the width of the ranges for the higher dan grades (4-dan and above). This brings them more in line with Japan and avoids inflation of these higher grades by making it more difficult to break through into the 5/6-dan grades (which are not awarded in Europe (yet?)). With 150 rather than 200 Elo points for the lower dan grade ranges, the system should still lead to grades that are "harder" than the Japanese ones. Promotion to 1-dan and 1-kyu should be more difficult than promotion within the other kyu grades; the proposed system addresses this by making the 1- and 2-kyu Elo ranges appropriately wide (150 points).
Table I
Grade | Europe "theoretical" | Europe average * | America | Proposed pan-atlantic range | Proposed pan-atlantic width |
6-dan | 2320 - 2419 | -- | 2500 - 2699 | 2500 - 2699 | |
5-dan | 2220 - 2319 | -- | 2300 - 2499 | 2300 - 2499 | |
4-dan | 2120 - 2219 | 2159 (8) | 2100 - 2299 | 2100 - 2299 | |
3-dan | 2020 - 2119 | 1978 (17) | 1900 - 2099 | 1900 - 2099 | |
2-dan | 1920 - 2019 | 1869 (9) | 1700 - 1899 | 1750 - 1899 | |
1-dan | 1820 - 1919 | 1794 (11) | 1500 - 1699 | 1600 - 1749 | |
1-kyu | 1740 - 1819 | 1694 (16) | 1400 - 1499 | 1450 - 1599 | |
2-kyu | 1680 - 1739 | 1575 (9) | 1300 - 1399 | 1300 - 1449 | |
3-kyu | 1620 - 1679 | -- | 1200 - 1299 | 1200 - 1299 | |
4-kyu | 1560 - 1619 | 1435 (8) | 1100 - 1199 | 1100 - 1199 | |
5-kyu | 1500 - 1559 | 1353 (4) | 1000 - 1099 | 1000 - 1099 | |
6-kyu | 1440 - 1499 | 1375 (6) | 900 - 999 | 900 - 999 | |
7-kyu | 1380 - 1439 | 1392 (2) | 800 - 899 | 800 - 899 | |
8-kyu | 1320 - 1379 | -- | 700 - 799 | 700 - 799 | |
9-kyu | 1260 - 1319 | -- | 600 - 699 | 600 - 699 | |
10-kyu | 1200 - 1259 | 1182 (7) | 500 - 599 | 500 - 599 | |
11-kyu | 1160 - 1199 | -- | 400 - 499 | 400 - 499 | |
12-kyu | 1120 - 1159 | 1212 (3) | 300 - 399 | 300 - 399 | |
13-kyu | 1080 - 1119 | 1173 (2) | 200 - 299 | 200 - 299 | |
14-kyu | 1040 - 1079 | -- | 100 - 199 | 100 - 199 | |
15-kyu | 1000 - 1039 | 1224 (3) | 1 - 99 | 1 - 99 | |
* Average of the July 1997 Elo list (numbers of players in parentheses). -- indicates 0 or 1 players. |
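The proposed pan-atlantic column of Table I can be read as a simple lookup from rating to corresponding grade. The following is a minimal sketch, not part of the proposal itself: the names `PROPOSED_LOWER_BOUNDS` and `proposed_grade` are hypothetical, and treating ratings of 2700 and above as 6-dan is an assumption, since the table stops at 2699.

```python
# Lower bounds of the proposed pan-atlantic ranges, copied from Table I.
PROPOSED_LOWER_BOUNDS = [
    (2500, "6-dan"), (2300, "5-dan"), (2100, "4-dan"), (1900, "3-dan"),
    (1750, "2-dan"), (1600, "1-dan"), (1450, "1-kyu"), (1300, "2-kyu"),
]
# 3-kyu (1200) down to 14-kyu (100), in steps of 100 points.
PROPOSED_LOWER_BOUNDS += [(1200 - 100 * (k - 3), f"{k}-kyu") for k in range(3, 15)]
PROPOSED_LOWER_BOUNDS.append((1, "15-kyu"))


def proposed_grade(rating):
    """Return the grade whose proposed range contains `rating`.

    Assumption: ratings of 2700 and above are reported as 6-dan.
    """
    for lower, grade in PROPOSED_LOWER_BOUNDS:
        if rating >= lower:
            return grade
    raise ValueError("rating is below the 15-kyu range")
```

This gives only the grade that corresponds to a rating; a player's actual grade remains a title and is governed by the promotion rules.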
Since the Elo ranges are widened overall in the proposed system, a one-time adjustment of ratings is appropriate. If this were not done, the rating of many lower kyu players would correspond to a grade that is 4 or 5 levels higher than their present one. The following adjustment addresses that issue adequately:
The consequences of these adjustments for the present population would be that:
handicap | DElo | handicap | DElo | |
sente | 25 | 3 p (right Lance) | 675 | |
Lance | 75 | 3 p (left Lance) | 700 | |
Bishop | 225 | 4 p | 750 | |
Rook | 300 | 5 p (right kNight) | 900 | |
Rook + Lance | 400 | 5 p (left kNight) | 1050 | |
2 p | 600 | 6 p | 1200 |
When computing the rating difference between opponents in a handicap game for rating calculations, the above handicap values are added to the rating of the handicap receiver first. Thus, if a player rated 1800 gives Bishop (225 points) to a player rated 1600, the game will be rated as if the 1600 player were 25 points higher rated (1600+225=1825) than the 1800 player.
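The adjustment described above can be sketched as follows. The handicap values are copied from the table; the dictionary keys and the function name `rated_difference` are hypothetical names chosen for this illustration.

```python
# Handicap values in Elo points, copied from the handicap table.
HANDICAP_ELO = {
    "sente": 25,
    "Lance": 75,
    "Bishop": 225,
    "Rook": 300,
    "Rook + Lance": 400,
    "2 p": 600,
    "3 p (right Lance)": 675,
    "3 p (left Lance)": 700,
    "4 p": 750,
    "5 p (right kNight)": 900,
    "5 p (left kNight)": 1050,
    "6 p": 1200,
}


def rated_difference(giver_rating, receiver_rating, handicap):
    """Rating difference used for the calculation, from the receiver's side.

    The handicap value is first added to the receiver's rating; a positive
    result means the receiver is treated as the higher-rated player.
    """
    adjusted_receiver = receiver_rating + HANDICAP_ELO[handicap]
    return adjusted_receiver - giver_rating


# The example from the text: 1800 gives Bishop to 1600.
print(rated_difference(1800, 1600, "Bishop"))
```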
Larry Kaufman has carried out a careful statistical analysis of a large number of games in terms of actual Elo rating differences, handicaps used and the outcome of the games. On the basis of that analysis and the grade-rating correspondences of Table I, he proposes the system detailed in Table II for relating handicaps to Elo rating differences. As said before, this handicap system does not need to be adopted by FESA, but it might be advantageous to introduce it, e.g., in clubs where handicap games are often played. This would help establish fairly reliable ratings quickly, even for players who do not often play in tournaments and in cases where large differences in strength exist between players within a club.
In order to avoid drift of the population as a whole, every year the ratings of all active players with a rating of 1900 or above and at least fifty rated games are recorded, and one year from that moment the average of those of them that are still active is compared to their previous year's average. If any year's average is significantly lower than the previous year's average, the rating officers by unanimous agreement may raise the average rating of all players by an amount not to exceed the calculated difference. These calculations are carried out on the basis of ratings, not of grades.
6. Elo calculation and promotion based on Elo ratings
The following consensus rules and regulations governing Elo rating determination and promotions based on these ratings are being proposed.
6A) Gaining and losing Elo points
The FIDE "Logistic" Elo rating calculation formula (explained in the "Official Laws of Chess"), which is currently used to calculate European ratings, will be adopted without any changes. It relates a player's rating gain or loss (DElo in the formula below) upon completion of a tournament to the results of his games and the difference in ratings between him and his opponents:
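In its usual logistic form the formula can be written as follows. This is a reconstruction from the definitions given in the text, with DElo written as ΔElo; the 400-point scale is the standard Elo convention and is assumed here rather than taken from this document.

```latex
\Delta\mathrm{Elo} \;=\; K \sum_{i} \left[\, V(i) \;-\; \frac{1}{1 + 10^{\,(\mathrm{Elo}(i) - \mathrm{Elo})/400}} \,\right]
```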
where the summation extends over all games this player played in the tournament, V(i) is the result of the game against his i-th opponent (1 for a win, 0.5 for a draw, 0 for a loss), Elo(i) is the rating of his i-th opponent, Elo is his own rating before the tournament, and K is a coefficient that, when divided by 2, indicates how many points a player gains (loses) when winning (losing) against a player of the same rating. A K-value of 20 is adequate for the higher dan grades, but lower-graded players, who can progress rapidly, should be able to move faster, which the value of 40 enables them to do.
If a player with a rating below 1200 gains points in a "tournament", his gain shall be doubled unless such doubling brings him over 1200. In that case his new rating shall be 1200 or the rating he would have obtained if his gain were not doubled (whichever is higher).
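A minimal sketch of the calculation and of the doubling rule for players below 1200 is given here. It assumes the usual logistic expected score on a 400-point scale, which this text does not spell out, and it leaves K as a parameter because the rating at which K switches from 40 to 20 is not stated. The function names are hypothetical.

```python
def expected_score(own, opponent):
    """Logistic expected score (assumed 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((opponent - own) / 400.0))


def rating_change(own, results, k):
    """Gain or loss over one "tournament".

    `results` is a list of (opponent_rating, score) pairs, with score
    1 for a win, 0.5 for a draw and 0 for a loss.
    """
    return k * sum(score - expected_score(own, opp) for opp, score in results)


def new_rating(own, results, k):
    """Apply the change, doubling gains for players rated below 1200."""
    gain = rating_change(own, results, k)
    if own < 1200 and gain > 0:
        doubled = own + 2 * gain
        if doubled > 1200:
            # Doubling would carry the player over 1200: use 1200 or the
            # undoubled result, whichever is higher.
            return max(1200, own + gain)
        return doubled
    return own + gain
```

With K = 40, a win against an equally rated opponent is worth 20 points before doubling, which matches the remark that K divided by 2 is the gain for such a win.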
6B) Time table for calculation of Elo ratings on the basis of tournament results
Events are to be rated in order of their finishing date if at all possible. Therefore, an event should not be rated if the results of a prior event have not yet been received.
For purposes of the rating system, a "tournament" is defined as all games which are to be rated at one time, using the same starting rating. If multiple events are held in a short time, they may all be rated as if they were part of one large tournament.
No "tournament" can include more than twenty games by any one player. If necessary to avoid this, an event will be broken into two or more parts for rating purposes.
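The twenty-game limit can be illustrated with a short sketch. The text does not say where an event should be split, so cutting a player's games in playing order into consecutive parts of at most twenty is an assumption made only for this example; `split_for_rating` is a hypothetical name.

```python
MAX_GAMES_PER_TOURNAMENT = 20  # limit stated in the text


def split_for_rating(games, max_games=MAX_GAMES_PER_TOURNAMENT):
    """Split one player's games (in playing order) into rating parts.

    Assumption: parts are consecutive runs of at most `max_games` games.
    """
    return [games[i:i + max_games] for i in range(0, len(games), max_games)]


# A player with 45 games in one long event would be rated in three parts.
parts = split_for_rating(list(range(1, 46)))
print([len(p) for p in parts])
```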
Club events are rated based on the date of the last game of the event. It is recommended that any event that takes a long time (e.g. more than 4 months), at the discretion of the club running the event, be split into two or more portions for rating purposes.
A tournament is ratable if at least two (provisional or established, see section 6C below) rated players participate. Once a "tournament" is rated, the new ratings will be used for subsequent events. The present European custom of using the semi-annual rating as the basis for all events over the next six months will be discontinued.
6C) Elo categories and procedure for rating tournaments
A player with fewer than 4 games is unrated. A player gets a provisional rating after 4 games and an established rating after 15 games. A rating remains in effect for an inactive player even if it is no longer published, unless unusual circumstances indicate that it would be more accurate to treat the player as a newcomer. A player who earns a club grade (either in a western club or in Japan) from unrated play that is more than a grade above the grade corresponding to his current rating and who has not played a rated game for at least one year may have his rating redetermined at the discretion of the rating officer. In that case, he will be regarded as an unrated player with two games assumed at his new grade (see section 6C1 below).
When rating a tournament, first the unrated players are rated, then the players with provisional ratings, and finally the established players:
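The three categories reduce to a check on the number of rated games played. A minimal sketch, with the hypothetical function name `rating_status`:

```python
def rating_status(rated_games):
    """Classify a player by number of rated games played."""
    if rated_games < 4:
        return "unrated"
    if rated_games < 15:
        return "provisional"
    return "established"
```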
This procedure gives provisional players fair POST-event ratings, but more importantly it uses ratings that are as reliable as possible for the calculation of POST-event ratings of established players.
6D) Determination of a promotion system based on Elo ratings
Promotion is an important issue: as demotions cannot occur, one must be very careful when promoting players. As indicated in sections 3 and 5 above (and known even when it was introduced), the point system is by definition inflationary. For example, a 1-dan with a 1923 rating is much stronger than one with a 1678 rating, and this should be taken into account in any promotion system. An Elo-based promotion system satisfies this condition.
Considering Elo ratings of European kyu players, it is clear that Elo ratings and kyu grades hardly correlate. In fact, Dutch kyu grades used to be "hard," but based on the current Elo list they seem "softer" than most others. The practice of kyu promotions being awarded by individual associations without any guiding principles has not worked and has led to the present discrepancies between ratings and grades. Replacing the current system of unregulated kyu grade assignments by an Elo-rating-based system would solve that problem.
Based on these observations, we discourage use of the point system for promotions and propose an Elo-based promotion system (for dan and kyu players alike) instead. In this system, promotion to a certain grade requires that a player satisfies one of the following three conditions:
The numbers in Table III are chosen as a compromise between European and American practices. Refer to section 6E for promotions to 5 and 6-dan.
Although new Elo ratings are calculated upon completion of an "event", for purposes of promotion the ratings officers will keep track exactly after which game a player
A) exceeded the lower-bound and mid-point ratings of the grades above his current grade, and
B) has maintained those ratings during the number of consecutive games listed in Table III (which earns him promotion).
For this reason it is very important that results of an event be reported to the ratings officers in the actual playing order. Only if the actual order in which games took place cannot possibly be determined will it be assumed that rating points were gained at a constant rate, and an interpolation scheme will be used to determine after which game a player crossed the promotion threshold for the first time (A) or when he earned promotion by maintaining a rating above that threshold for enough consecutive games (B). The game 'thr' after which the threshold Elo(thr) was passed is given by:
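Under the constant-rate assumption this is a linear interpolation, which can be written as below. The expression is a reconstruction from the definitions in the text, and rounding up to the next whole game is an assumption. For instance, a player going from 1580 to 1620 over 8 games would be taken to cross a threshold of 1600 after game 4.

```latex
\mathrm{thr} \;=\; \left\lceil\, \#\mathrm{games} \times
  \frac{\mathrm{Elo}(\mathrm{thr}) - \mathrm{Elo}(0)}
       {\mathrm{Elo}(\mathrm{end}) - \mathrm{Elo}(0)} \,\right\rceil
```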
where Elo(0) and Elo(end) are the Elo ratings at the beginning and end of the event, respectively, while #games is the number of games making up the event. This interpolation scheme should be avoided whenever possible, though.
As mentioned in section 6A, new Elo ratings can and should be calculated after each "event" or (mostly relevant for lower kyu players) after each club night. This makes it convenient for national associations and clubs to base promotions of their own players on Elo ratings. For an Elo system and an Elo-based promotion system to be conveniently used in tournaments and clubs alike, it would be advantageous if a pan-atlantic shogi pairing/Elo-calculation program became available for both PC and Macintosh. It is proposed that this option be considered seriously by FESA and SFA.
6E) Promotion to 5 and 6-dan
Nihon Shogi Renmei (NSR) had asked foreign shogi organizations not to award 5 or 6-dan promotions unless they were obtained in Japan. It is proposed that neither European nor American organizations will promote a player to 5 or 6-dan at this point in time, unless the promotion is ratified by NSR.
* If a player has had a rating of at least 2300 (5-dan lower bound) for 16 consecutive games and his rating was above 2400 (5-dan mid-point) at least once, NSR will be requested to ratify promotion to 5-dan.
* If a player has had a rating of at least 2500 (6-dan lower bound) for 16 consecutive games and his rating was above 2600 (6-dan mid-point) at least once, NSR will be requested to ratify promotion to 6-dan.
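The two conditions above share one pattern, which can be sketched as a check over a player's game-by-game rating history. This is an illustration only: the function name is hypothetical, and requiring the mid-point to be exceeded within the same run of consecutive games is an assumption, since the text says only "at least once".

```python
def qualifies_for_ratification(history, lower_bound, mid_point, run_length=16):
    """Check the 5/6-dan ratification condition.

    `history` holds the player's rating after each game, in playing order.
    Assumption: the mid-point must be exceeded during the same run of
    consecutive games at or above the lower bound.
    """
    streak = 0
    exceeded_mid = False
    for rating in history:
        if rating >= lower_bound:
            streak += 1
            exceeded_mid = exceeded_mid or rating > mid_point
        else:
            # The run is broken; start counting again.
            streak = 0
            exceeded_mid = False
        if streak >= run_length and exceeded_mid:
            return True
    return False


# 5-dan: lower bound 2300, mid-point 2400.
print(qualifies_for_ratification([2310] * 15 + [2410], 2300, 2400))
```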