Rebirth of Dynamic Grading
When I recently introduced the new grading system iDG (improved Dynamic Grading), I compared its performance with that of DG (Dynamic Grading) over the batch of game results used by the AC Ranking Review Committee in all its tests and comparisons. That batch comprises the games in the AC database from January 2000 until October 2010. It seemed a reasonable choice at the time: the database was in full swing by then, and the batch was the most recent available and sufficiently plentiful (160324 games in all). However, a few days ago I decided to make a more elaborate comparison of the two systems. This led to disappointing discoveries. The official system DG, in operation now for about three years, was not performing as expected. Its Grade Deviation was sometimes not even up to the standard of the simple grading system Idx20. Further investigation revealed that the long-held tacit assumption, that games over a recent 11-year span give a suitable batch over which to optimise parameters, is seriously flawed.
This article begins by substantiating the claims just made. It goes on to offer an explanation of why the mentioned assumption is flawed and to call attention to the existence of simple but effective remedial measures.
Background information is available at the following sources.
[ODG] is the article in which the original Dynamic Grading system (hereafter denoted oDG) was introduced by the AC Ranking Review Committee. A plain English introduction appears in [PEI]. Details about iDG can be found in [IDG].
The Grade Deviation statistic (GDEV) of a grading system is always calculated over a batch of game results deemed representative of the sport. In the case of the AC game results, the batch is defined by the start of its First Test Year (FTY) and the end of its Last Test Year (LTY). The column headed TGTOT gives the Test Game Total. The table to follow gives GDEVs for three known grading systems as calculated over recent 11-year periods.
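As a minimal sketch of how a test batch is delimited by FTY and LTY, the following Python fragment selects the games played between the start of the First Test Year and the end of the Last Test Year and counts them to obtain TGTOT. The Game record, its field names and the sample games are assumptions made for illustration only; they do not describe the layout of the AC database, and the GDEV calculation itself is not shown.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Game:
    # Hypothetical record layout, for illustration only.
    played: date
    winner: str
    loser: str


def select_batch(games, fty, lty):
    """Return the games played from 1 January of fty to 31 December of lty."""
    return [g for g in games if fty <= g.played.year <= lty]


# Made-up games; only the dates matter for the selection.
games = [
    Game(date(1999, 12, 31), "A", "B"),
    Game(date(2000, 1, 1), "B", "C"),
    Game(date(2010, 10, 15), "A", "C"),
    Game(date(2011, 3, 2), "C", "A"),
]

batch = select_batch(games, fty=2000, lty=2010)
tgtot = len(batch)  # Test Game Total for this batch
print(tgtot)
```

A Total Batch, as used later in this article, would correspond to setting FTY to the first year for which the database holds games.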
In each of these three assessments oDG did not reach the standard of the simple grading system Idx20. This was not supposed to be happening! What went wrong?
In order to figure this out we need to consider the nature of a dynamic grading system.
It arises as the solution to an operations research problem: given a set of game results, which choice of parameters (of a predetermined kind) will provide the best predictive system? The "problem" changes from year to year. Some players become inactive. New players arrive on the scene, with new starting grades. New rapid improvers appear, old ones disappear. Cultural practices change. It is unrealistic to expect the "problem" to remain unchanged and equally unrealistic to expect the "solution" to remain the same. Yet that is exactly what we have done: we expected the same "solution" (i.e. the chosen set of optimal parameters) to remain optimal year after year, no matter how the situation may change.
With the wisdom of hindsight it can now be recognised that a flawed assumption was at work. A test batch based on a recent 11-year period may have peculiarities that need not be typical of the sport in the long run. When parameters are optimised over such a batch they provide the best accommodation of those peculiarities. When the batch changes, so do the peculiarities, which may no longer be well accommodated, i.e. the parameters may no longer be optimally chosen. The above table illustrates that well.
To remedy this awkward situation we need to optimise parameters over the largest set of game results available at the time of the optimisation. This will be called Total Batch Optimisation. Furthermore, it is not a good idea to decide on a set of parameters once and for all and keep them year after year. Such a practice is contrary to the very philosophy of dynamic grading: one should use the feedback of past experience all the time. Accordingly, instead of having an official grading system (defined in terms of parameters with fixed values) we should go for an official grading method, for example an algorithm defined in terms of parameters whose values are updated at the end of each calendar year.
For the purpose of the ensuing discussion I'm going to introduce common names for the three parameters common to both oDG and iDG, as follows.
LM = Lowest Modulator = lowest value that the modulator is allowed to attain.
These are fractionally adjustable parameters, i.e. they can be incremented by arbitrarily small fractions. In addition to these three, oDG also has two integer-valued parameters:
PRN = Primary Retrospection Number
PRN specifies the number of preceding games whose data is used for calculation of the Recent Performance Deviation of the player (see [ODG] for more details). SRN is the number of Recent Performance Deviations whose average determines the PDT (Performance Deviation Trend) of the player. The official optimal values of the above parameters that complete the definition of the (currently official) grading system oDG are as follows:
PRN = 30, SRN = 8, LM = 16.0, SM = 24.0, HM = 35.2
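The role of SRN can be sketched in a few lines of Python. The fragment below takes a player's Recent Performance Deviations as given (their calculation from the PRN preceding games is defined in [ODG] and is not reproduced here) and averages the latest SRN of them to obtain the PDT. The function name, the sample deviations, the treatment of a player with fewer than SRN deviations and the value returned for an empty history are all assumptions made for illustration.

```python
from statistics import fmean

SRN = 8  # number of Recent Performance Deviations averaged into the PDT


def performance_deviation_trend(recent_deviations, srn=SRN):
    """Average the latest srn Recent Performance Deviations.

    Assumption: with fewer than srn deviations on record, all of them are
    averaged; with none on record, 0.0 is returned.
    """
    window = recent_deviations[-srn:]
    return fmean(window) if window else 0.0


# Made-up deviations, oldest first; only the last eight enter the average.
deviations = [9.0, 9.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
pdt = performance_deviation_trend(deviations)
print(pdt)
```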
To optimise parameters is tedious and tiring. The task that lies ahead motivated me to do something I should have done long ago – develop software to aid this process. I'm not a professional programmer but what I have now is a great help. It converts a given list of start parameter values to a set of optimal values. I'll gladly share it with anybody who wants to join in the search for optimal parameters for the systems under study.
The optimisation process is reminiscent of the action of water sprinkled over an uneven terrain. The water falling on a particular spot runs off to form a puddle nearby. In general there is no unique puddle, but each puddle indicates the position of a local minimum height above sea level, which is generally not a global minimum. Accordingly, an optimised set of parameters is not unique and does not in general give a global minimum value for the GDEV statistic, only a local minimum arrived at from the starting values. The chosen starting values thus have considerable influence on the eventual optimised values obtained.
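The puddle analogy can be illustrated with a small Python sketch. It applies a simple step-halving pattern search to a toy one-parameter terrain with two puddles and shows that the minimum reached depends on the start value. Both the search routine and the terrain function are assumptions chosen for illustration: the terrain is not the GDEV statistic, and the routine is not a description of the software mentioned above.

```python
def local_minimise(f, start, step=1.0, tol=1e-6):
    """Pattern search: try +/- step on each parameter, halve the step when stuck."""
    x = list(start)
    best = f(x)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x.copy()
                trial[i] += delta
                value = f(trial)
                if value < best:  # accept any strict improvement
                    x, best = trial, value
                    improved = True
        if not improved:
            step /= 2  # no neighbour is lower: look closer by
    return x, best


def toy_terrain(p):
    """Toy stand-in for GDEV: a shallow puddle at x = -2, a deeper one at x = 3."""
    x = p[0]
    return min((x + 2) ** 2 + 1.0, (x - 3) ** 2)


left_x, left_height = local_minimise(toy_terrain, [-4.0])
right_x, right_height = local_minimise(toy_terrain, [5.0])
print(left_x, left_height)
print(right_x, right_height)
```

Starting at -4 the search settles in the shallow puddle, while starting at 5 it finds the deeper one, which is why the choice of start values matters.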
Let us go back in time and redo the definition of oDG that was made at the end of 2010. Instead of optimising the parameters over the games of the most recent 11-year period, we use the Total Batch of all games in the database up to that point. There is no change to the update algorithm, only to the values chosen for the parameters. The integer-valued parameters PRN = 30 and SRN = 8 could be chosen once and for all. Instead of choosing the fractionally adjustable parameters once and for all, we prescribe the method by which they are redetermined at the end of each calendar year. We start from an experimentally determined set of start values. These start values could also be chosen once and for all. Once a year we run the optimisation procedure from these start values. Let us illustrate how this procedure would play out for the calendar years 2010, 2011, 2012, using an end-of-calendar-year recalculation date. It gives fairly consistent parameters and an excellent GDEV throughout.
In other words, instead of the static official system oDG we can have a variable official system whose fractionally adjustable parameters are updated at the end of each calendar year. In the above example we start from the experimentally determined fixed start values 17.125, 27.5, 34.7625 (not much different from the fixed values of oDG). These three are updated at the end of 2010, resulting in only a slight change in the HM value, to serve for the next twelve-month period; they are then updated slightly at the end of 2011 and again at the end of 2012. The update is never a radical departure from the values of the previous year, but it is enough to keep them reasonably optimal, as shown by the fairly consistent and excellent GDEV statistic that is produced.
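The annual procedure described here can be outlined as a short Python sketch. At each year end it assembles the Total Batch (every result recorded up to and including that year) and re-runs a local search from the same fixed start value. Everything specific in it is an assumption made for illustration: a single parameter stands in for the three fractionally adjustable ones, the results are made-up numbers rather than games, and toy_objective is a stand-in, not the GDEV statistic.

```python
from statistics import fmean


def descend(objective, start, step=1.0, tol=1e-6):
    """Step-halving local search on one parameter, from a given start value."""
    x, best = start, objective(start)
    while step > tol:
        for trial in (x + step, x - step):
            value = objective(trial)
            if value < best:
                x, best = trial, value
                break
        else:
            step /= 2  # neither neighbour is better: refine the step
    return x, best


def annual_updates(results_by_year, start, objective, years):
    """Re-optimise over the Total Batch at each year end, always from `start`."""
    updates = {}
    for year in years:
        total_batch = [r for y, rs in sorted(results_by_year.items())
                       if y <= year for r in rs]
        updates[year] = descend(lambda p: objective(p, total_batch), start)
    return updates


def toy_objective(param, batch):
    """Toy stand-in for GDEV: mean squared distance from made-up results."""
    return fmean((param - r) ** 2 for r in batch)


results_by_year = {2009: [10.0, 12.0], 2010: [11.0], 2011: [13.0], 2012: [12.0, 14.0]}
updates = annual_updates(results_by_year, 11.0, toy_objective, [2010, 2011, 2012])
for year, (value, score) in updates.items():
    print(year, round(value, 4), round(score, 4))
```

Because the batch only grows from year to year, the optimised value in this toy example drifts gradually instead of jumping, which mirrors the behaviour described above.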
iDG suffers from the very same birth defect as oDG. It could be reborn in the same way, as suggested by the following list of experimentally determined start values for its fractionally adjustable parameters and the annually optimised adjustments.
The MM parameter is the Modulator Multiplier (see [IDG]).
In the above examples an end-of-calendar-year recalculation date was used for simplicity. In practice, the end of March or September will have the advantage of not being mid-season anywhere. On the other hand, the recalculation is not expected to be disruptive. It ought to make a smooth transition that will be invisible to the general eye.
In order to have a better overall perspective, let us now compare the following grading systems over Total Batches as regards the GDEV statistic they produce. Idx20 and CGS are well known and oDG and iDG have been described above.
RDG (Reviewed Dynamic Grading) and SDG (Simplified Dynamic Grading) are the reborn versions of oDG and iDG respectively obtained at the end of 2012. Thus RDG and oDG use exactly the same algorithm while differing only in parameter values, as follows:
Similarly, SDG and iDG differ only in parameter values, as follows:
The charts to follow illustrate the above table as usual.
The preceding comparisons show that optimisation over the Total Batch from 1985 to 2012 gives parameter sets that retain reasonable optimality over all smaller Total Batches. By contrast, Table 1 has shown that optimisation over recent 11-year batches gives optimality that is very quickly lost, even for adjacent batches. This is consistent with the diagnosis given above for what went wrong with oDG.
The original purpose that got this study underway, namely to make a more thorough comparison of oDG and iDG, has an anticlimactic outcome. Neither of the two is worth having. Both are best abandoned. As regards which has the more promising reborn version, it seems that the oDG version RDG is marginally stronger in the long run. We may as well simplify things by now forgetting about iDG and its reborn version SDG. Anybody who may have resented the introduction of iDG in the first place could derive consolation from the fact that without iDG the birth defect of oDG may well have remained undiscovered for many years to come.
In fact, the serendipitous way in which this defect came to light should serve as a reminder of the usefulness of the GDEV statistic as a warning signal. If the remedial action described above is adopted, it would be good to report the GDEV whenever the optimisation is updated.
All rights reserved © 2013-2018