Technical
May the Best Man
Win?
The Statistican (1995) 44, No. 4, pp. 529538 By DAVID R. APPLETON University of Newcastle upon Tyne, UK [Received March 1995. Final revision July 1995] SummarySimulations of various kinds of sporting tournament have been carried out to assess their relative ability to produce as winner the best of the entrants. A strong contender when it is necessary to play relatively few games is the seeded draw and process. When the players are closely matched and there can be more games the round robin played twice is most effective. Keywords: Croquet; Paired comparisons; Sporting tournaments 1. Introduction'It takes a good team to win the league, but you need luck to win the cup.' 'You get some funny results over five frames but the best snooker player will win the first to 19.' 'I don't know why they bother seeding tennis tournaments; the seeds always seem to get knocked out.' Observations like these are the clichés of the sports fan, and it is the intention of this paper to look at such points and to assess their validity. What we shall say will often have applicability to many sports, but it will in fact refer almost exclusively to Association Croquet, not only because it is the sport with which the author is most familiar, but also because the ingenuity of managers of croquet events in devising new formats gives us the opportunity to look at many different kinds of tournament. It is not our intention to describe the game itself; several books are available, from the simple (Mulliner, 1989), through the more detailed (Lamb, 1990) to one which makes it clear that only those with a highly developed interest in decision theory should attempt to play (Wylie, 1985). What we must do, however, is point out that a game is either won or lostthere are no drawsand that the score is generally agreed to be irrelevant: a player may win a close game 2625, or 269 or even (some would maintain) 260. The reason for this is that (as in snooker) a game may begin with defensive tactics on either side, then one player may gain a slight initiative and make a break which (possibly after another spell of evenly matched play) proves decisive. The loser may not have done much wrong. Our main aim will be to determine which form of tournament is most likely to produce as winner the best of the entrants, and we shall do this by simulating various situations. Nevertheless we shall remember that many tournaments are also devised to make sure that all entrants have a reasonable amount of play, and that who wins a friendly weekend event is relatively unimportant. Prestigious events might take place over a week or more, though some are scheduled for a long weekend; club events might involve intermittent matches throughout a season. These considerations affect how many matches it is feasible to schedule for a tournament, and hence its reliability of finding the 'right' winner. Most croquet games take less than three hours, and indeed such a limit may be imposed; some take less than an hour. This variability makes it difficult to estimate how many games or rounds can be completed in a given time. In Section 2 we shall describe some tournament designs and in Section 3 discuss how ties for first place may be resolved for those designs which do not guarantee a unique winner. We shall then give details of the simulation methods, present the results and conclude with a discussion of the relevance of the simulations to real life. We shall refer throughout to the (English) Croquet Association as the CA, and to its Scottish equivalent as the SCA. 2. Types of tournament2.1. The knockoutThe knockout format is important in its own right and also provides the basis for several other designs of tournament. The format is too well known to need much explanation. If there are n players then n  1 matches are required, taking [log_{2} n] rounds. If n is not an integer power of 2 some players receive byes into the second round, which has the unfortunate effect of giving players different numbers of matches; this may be done by the BagnallWild method (Croquet Association, 1989), but we shall not consider its repercussions further here as in the tournaments we have in mind it is usually possible to have either eight or 16 players. The event culminates in a final, which is good for spectators. The CA play their men's and women's tournaments in this format. 2.2. Bestofthree knockoutLosing a single game, for which an unaccountable stroke of ill fortune may be responsible, is enough to cause elimination from a knockout tournament if each match consists of only one game. A greater degree of reliability is achieved by playing each match as the best of three games. The extent of this will be investigated. However, apart from taking longer to play, this modification makes the tournament more difficult to schedule as time in each round must be allowed for three games when only two may be required especially in the latter stages when there are fewer matches (although they may be closer), The CA Open Championship is played as a best of three. It is easy to show that if a player has probability p of winning a single game he has probability p^{2}(3  2p) of winning a bestofthree match if results are independent. If we write p = ½ + d this is ½ + 3/2 d  2d ^{3}. 2.3. Seeded knockoutSeeding is a wellknown way of trying to ensure that the best players will meet late in the tournament and that the final stages will therefore provide the highest quality of play and most excitement. If games go to form, the number one seed (the pretournament assessment of the best player) should meet the number two seed in the final, the number four seed in the semifinal, the number eight before that, etc. The second seed should meet the third in the semifinal, number seven in the previous round, etc. There are different ways of organizing seeding; the following (for 16 players) illustrates the scheme used in this study: 1 versus 16; 8 versus 9; 4 versus 13; 5 versus 12; 11 versus 6; 14 versus 3; 10 versus 7; 15 versus 2. It can be difficult to justify seeding unless there is a wellaccepted ranking list of the entrants. 2.4. Draw and processThe draw and process arrangement is another attempt to reduce the effect of a single stroke of luck: an individual must lose two games before being eliminated. A draw is made for a knockout in the usual way, and a second independent knockout (the process) is formed deterministically from the first. If the order of players in the draw is 1 versus 2, 3 versus 4, ... , 15 versus 16, then the order in the process is 1 versus 9, 5 versus 13, 3 versus 11, 7 versus 15, 2 versus 10, 6 versus 14, 4 versus 12, 8 versus 16. Similar schemes exist for other numbers of entrants (Croquet Association, 1989). What is ensured is that any players who meet in the first or second rounds of the draw cannot meet until the final or semifinal respectively of the process (and vice versa for the first two rounds of the process). If the same player wins both knockouts, there is no need for a playoff, the lack of which can be a slight anticlimax; otherwise two players have lost only one match each and they play to decide the overall winner. 2.5. Seeded draw and processThe seeded draw and process is a combination of the previous two schemes. The first two seeds are scheduled to meet in both finals. If this occurs there is essentially a bestofthree final consisting of the final matches of the draw and the process, and the playoff. 2.6. Round robinIn contrast with the knockout (cup) format, the league format is at the other end of the spectrum in terms of matches played. Also known as an American block, this format involves each player playing each of the others once, so that there are n  1 rounds and ½ n(n  1) matches. It is therefore impractical, in terms of both playing time and facilities, for most tournaments with a substantial number of players but is feasible for eight players at a weekend tournament. With players over a wide range of ability there may be many games of little interest. Used throughout a season it can prove difficult to complete in some sports once players have no chance of winning it. There is no final to provide a focal point for the tournament, and there may not be a unique winner, but unlike the previous designs it offers each player the same number of matches. Many books on graph theory use the round robin tournament as an application of a directed graph; a useful summary of the main points is given by Harary and Moser (1966). 2.7. Repeated round robinThe repeated round robin is a popular choice for croquet invitation events with small numbers of players. The CA's Masters tournament and the SCA's Chairman's Salver both use this form. With eight good players it should be possible to play the 14 rounds required over four days, though there is still the possibility that there will be a tie for first place. 2.8. American blocks with playoffA mixture of the two main forms is provided by having blocks of 4, the winners of which go through to a knockout. The main problem with this format is the very strong possibility that the six matches per block will fail to provide a unique winner to proceed to the next stage. 2.9. SwissThe Swiss style of tournament also has some of the attributes of the knockout and the round robin. A draw is made as for a knockout tournament, but only the first round is played. Everyone participates in the second round, in which the players are ordered in terms of their number of wins, ties being resolved by maintaining the order of the previous round (or possibly of the original draw). This system continues, except that two players may not meet twice in the tournament. It is usual to play at least two rounds more than would be the case with a knockout, so that a player may lose once (or more) and still win the tournament. We shall provide an example of a tournament (with eight players), because this is the most difficult type of event to manage. It has the advantage that all players have the same number of matches, and since players usually meet opponents with a similar record in the tournament fewer matches than in a round robin are likely to be one sided. The last round may see the two principal contenders in opposition, but frequently does not because they have played previously. Again there may be a tie for first place. Suppose that the ties in the first round are as follows, and the winners are shown in bold type: 1 versus 2; 3 versus 4; 5 versus 6, 7 versus 8. Then the second round (with winners again shown) is 1 versus 3, 6 versus 7, 2 versus 4 and 5 versus 8. The natural order for the third round is 1 versus 6, 3 versus 2, 7 versus 8 and 4 versus 5, but players 7 and 8 have already met. As little disruption as possible is caused to the top of the table, and the matches are in fact 1 versus 6, 3 versus 2, 7 versus 5 and 8 versus 4. The present state in terms of games won is 1 (3), 6 (2), 2 (2), 8 (2), 3 (1), 7 (1), 5 (1) and 4 (0). The first player in the list that player 1 has still to play is number 8; number 6 can play number 2; number 3 can play 7 and 5 can meet 4. After 1 versus 8, 6 versus 2, 3 versus 7 and 5 versus 4 we have 1 (4), 2 (3), 8 (2), 6 (2), 7 (2), 5 (2), 3 (1) and 4 (0). For the fifth and final round 1 has only 4, 5 and 7 to play and 7 is highest of these; should number 2 play number 8, number 6 cannot play 5 because they have already met, nor number 3, as this leaves 5 and 4 who have just played one another, but 6 can play 4 and 5 finishes by playing 3. The algorithm for creating a round when the obvious games lead to problems can take several forms, but the exact details are not very important as long as players are confident that a consistent algorithm is being used. The method given by Croquet Association (1989) treats the top and bottom of the table alike; although this is pleasingly symmetric it is illogical if it is more important to ascertain who comes first than who comes last. An interesting feature of the Swiss tournament is that it can be made the basis of a schedule for a round robin. If we make the assumption that the higher player in the draw always wins, and we treat the bottom of the table as being of equal importance as the top (to maintain symmetry) we can derive a pattern such as that for 12 players (A  L) shown in Table 1. Such symmetry can only be obtained if the number of players is a multiple of 4. The rounds may, of course, be played in any order, so they have been denoted by symbols. 2.10. Other formatsThe descriptions above are of the types of tournament which have been simulated; there are many minor variants. If there are blocks followed by a knockout the blocks may be seeded or the subsequent knockout may be best of three, both of which adaptations occur in the Australian Open croquet championship. There may also be consolation events of practically any description after one of the knockout forms. Sometimes a Swiss tournament is run simultaneously with the main knock out event and continues as a 'plate' event while a bestofthree final is held. One interesting version known as a Swiss robin tries to simplify the management of a Swiss event by establishing a round robin design as above, drawing the first round at random and determining subsequent rounds by those implied by what would be the top pairing in a Swiss tournament (Hunter, 1990). In 1993 the SCA Open consisted of a Swiss tournament to determine which four players would play off in the semifinals and final. It could have been anticipated that there would be a tie for fourth place (by four players in fact), necessitating further rounds. The World Championship has begun with blocks of 7, with four players from each progressing to the next stage; again there is likely to be a tie for fourth place in at least one block, At present it has the rather strange feature of letting four of the five players in each block progress to a knockout stage, using the blocks essentially as a method of seeding that knockout. Other formats, including the popular 'Egyptian' system (Hands, 1990), are designed to fit in as many games as possible rather than to determine a winner and will not be considered here; nor will the 'ladder' style of event which is too poorly defined to admit analysis. 3. The question of tiesAs we have indicated, any scheme which involves a round robin or a Swiss element may lead to two or more players tying for first (or another) place. If time allows there may be a playoff, which is the most satisfactory method of resolving a tie between two players, but with more than two this option is unlikely to be available, and even with two it frequently happens that this last game must be rushed and is not a fair reflection of the players' abilities. In the simulation studies we have resolved ties by lot, so the frequency with which such schemes identify the best player will be slightly underestimated in comparison with having a proper match. The effect of this will be quantified, but since the option of an extra game may be impossible in practice such an approach is quite reasonable. Another common way of resolving ties is to give the title to any player who has beaten all the others in the tie. This is only likely to be workable if two players tie in a Swiss tournament or round robin where they have met only once, but though it is an intuitively plausible rule it is in fact illogical, as we shall show with a very small example. Suppose that in a small tournament of five players A has beaten B, C and D, B has beaten C, D and E, C has beaten D and E, D has beaten E, and E has beaten A. A and B therefore tie for first place and since A beat B he is the overall winner. However, should not the fact that A has lost to the worst player in the field penalize him with respect to B? A possible approach is via Bradley and Terry's (1952) model, which in this context assumes that when n players meet (in pairs) there are n parameters (pi)_{i} >= 0 such that in a game between players i and j the odds of a win for player i are (pi)_{i}/(pi)j. Since these parameters are only identifiable to within a scale factor we may impose an arbitrary condition, e.g. that min{(pi)_{i} = 1. If we define (theta)_{i} = log (pi)_{i} we may evaluate the (theta)_{i} by using the GLIM program in Appendix A. This gives values (theta)a = (theta)b = 2.024, (theta)c = 1.012 and (theta)d = (theta)e, = 0, showing that in the view of this plausible model A and B cannot be separated. However, it might cause some dissension at most tournaments if the BradleyTerry model were used, particularly if the only competitor who understood it was one of those in the tie and he was applying it on his portable computer! There is also a potential problem in identifiably if a player loses all his games. A simpler way of attempting to break ties is to count the number of games won by each player's defeated opponents. This is one of the many techniques which have come to croquet from chess tournaments (Kazic (1980), p. 64) and is designed to credit a player with beating the better rather than the worse players. It has the added advantage of making all games in a tournament likely to contribute towards the identification of the winner, thereby adding interest to games which might otherwise have little. However, as can be seen from the example above, it may be at odds with the BradleyTerry model. Better is the method proposed by David (1987) where each player is given a score of their own wins minus their own losses plus the total number of wins of players that they beat minus the total number of losses of players they lost to. This calculation gives a score of 5 to each of players A and B in the above example. 4. Simulation methodsSince our intention is to compare tournament designs, and not to estimate the chance of the best player winning any particular tournament, we may within reason take whatever model of determining winners that we please. What we have assumed is that there is an underlying rating for each player in a tournament and that these ratings are normally distributed (without loss of generality with zero mean and unit variance). For each game a player's actual rating is his underlying rating with a normal error with mean 0 and variance (rho)^{2} . If the ratings for two opponents are then xA and xB the probability that A beats B is assumed to be (Phi){k(xA  xB)} for some k, where (Phi)( ) is the cumulative normal distribution function. We shall report briefly the effect of changing (rho) in the context of a knockout tournament and thereafter give results for (rho) = 0.2. We have used k = 1 to simulate tournaments with considerable disparity of ability and k = 0.25 for those, such as invitation events or events played to handicap, where the standard is more homogeneous. We have considered only tournaments of eight or 16 players. As indicated earlier, when a tie occurs it is broken at random, but results will be reported also for cases when there was a unique winner. When seeding is involved it is assumed that it is done perfectly. In each case 10,000 simulations were carried out (with different ratings for the participants), so two standard errors of the percentages given correspond to at most one percentage point. All programs were written in APL which uses the random number generator x_{(n+ 1)} =7^{5}x(n) (mod 2^{31}  1). 5. ResultsTable 2, row (a), shows the effect of changing the withinplayer variance relative to the between player variance for a knockout tournament of eight players. Not surprisingly the best player wins most often the more consistently players perform. Row (b) shows how the games per match increase with increasing error variance in a bestofthree tournament, as does the average number of games lost by the winner (row (c)). Row (d) shows the frequency of ties in the block stage of a blocks and playoff tournament. TABLE 2
The principal results, detailing the percentage of times on which the best player won each style of tournament, are shown in Table 3, and Table 4 gives some further details about Swiss and round robin tournaments. Table 5 shows the percentage of times that there is a unique winner, and the number of games lost by that winner. Of the 25% of occasions when there is no outright winner of a round robin ((rho) = 0.2 and k = 1), the best player is involved in a tie with one other player on 63% (16% of the total) and with more than one other on a further 22% (5% of the total). TABLE 3
** based on 10 000 simulations of each with (rho) = 0.2 (see text). Tournaments are ordered by the total number of games played. TABLE 4
TABLE 5
+ k = 0.25, (rho) = 0.2. Fig. 1 relates the probability that the best player wins to the total number of matches played for toumwnents with eight players and k = 1, (rho) = 0.2. The arrows indicate the effect of playing a single extra game as a tiebreaker when only two competitors have tied for first place. Fig, 2 shows how the probability of winning a particular type of tournament differs for players of all rankings. The tournaments illustrated are for eight players of comparable abilities (k = 0.25, (rho) = 0.2). 6. DiscussionIt cannot be denied that many of the most fascinating aspects of competition have been excluded from this investigation. What psychological effects do knockout tournaments engender compared with round robins? Is either player favoured in the deciding game of a best of three? Is it more difficult to win as a top set or as a defending champion? Are there particular styles of play such that, although A is likely to beat B and B to beat C, nevertheless C is likely to beat A? Is it possible that what really, makes winners is their smaller withinsubject variability (as well as general ability)? There is no doubt that these are interesting questions to the sportsman, but we must assume that they are secondorder effects in the context of our present investigation. Similarly, if our purpose had been to find the probability that the best player would win a particular tournament we would have had to justify by recourse to data the assumptions that we have made  but how can we tell who the best player is? Since our aim has been to compare formats we believe our assumptions to be adequate, but we shall emphasize that the results in the tables are for comparative purposes only. The assumption of normality of the distribution of players' ratings is not important in a comparative study: if the distribution is actually skew to the right the best player is more likely to win, and conversely if it is skew to the left he is less likely to. The results that we have obtained can certainly be used to justify many of the current designs of croquet tournaments. Where the spread of ability is small a good case can be made for the use of the round robin tournament played twice, as long as the title at stake is sufficiently prestigious to justify the amount of time which will be taken up by this format. It is probably out of the question for more than eight (or at most 10) players. The best alternative in most circumstances appears to be the draw and process, and this is the more effective if it is seededthough because of that it is important that there should be good grounds for the seeding. It compares well with the bestofthree knockout format and is as easy to schedule, though the absence of a guaranteed final could be a problem, and it should be noted that for eight players there are as many rounds, though not as many matches, as for a round robin. It is possible to expand slightly on the effect of seeding. We can use Spearman's rank correlation coefficient Rs to summarize the agreement between the correct seeding and the actual draw order in an ordinary knockout tournament. Since a negative correlation is just as good as a positive correlation (getting the seeding completely wrong produces the correct games), we are only interested in the absolute value of Rs. Its expectation for a sample of size 8 can be shown to be 0.31. Thus, from Table 3 (k = 3) wwe can see the when Rs = 1 the best player wins 62% of tournaments; when Rs = 0.31 he only wins 58%. If we were to have the worst seeding possible (i.e. Rs = 0) he would win about 56%. The Swiss tournament is a useful format, as it performs reasonably well in comparison with the round robin in eliciting the best player as winner, and it gives all players an equal number of games, an important criterion if players have travelled long distances. Its main failing is that of the round robin, but more pronounced: its frequent inability to produce a unique winner and the consequent need for a tiebreak or the sharing of a trophy. If that is not regarded as a major problem it may be considered to do very well, in as much as the best player often at least ties for first place. The results in Table 5 indicate that in terms of demanding of the winner that he win every game the competing formats vary. The draw and process and the bestofthree schemes are similar and are more tolerant to a single loss than is the Swiss system. The larger number of games in a round robin, even with eight players, means that a loss is well tolerated, and when this is played twice the winner is likely to have lost several games. Although the method of allocating wins in the simulation exercise was not
based on data from real tournaments, the results suggest quite reasonably that
the 'favourite' in a competitive field of 16 might start at odds of about 3:1
or 4:1 against in one of the shorter tournaments, but that more than evens
in a round robin tournament played twice would be generous. However, the corresponding
results for entrants of more widely varying ability (where for eight players
the best has about twice the chance of winning than in the more homogeneous
case) indicate the dangers of extrapolating the numbers that we have produced
in any other than a comparative sense. Appendix A: GLIM program to apply BradleyTerry model to example of Section 3
ReferencesBradley, R. A. and Terry, M. E. (1952) Rank analysis of incomplete block designs:
I, The method of paired comparisons. Biometrika, 39,
324345. All rights reserved © 1995
