Is skating really unfair? Yes, even in extra stringent analysis.

TL;DR

Yesterday, we reported that random variability in the starting procedure of racing sports can bias competitions, even at Olympic events. Not everyone was keen to believe this, and some people have made suggestions for things we should control for. Some even went so far as to criticise our methods. In this post we address all questions, and provide an extra analysis that looks at within-athlete effects of changes in the ready-start interval on changes in race times. This analysis is robust to differences between skaters’ individual qualities, and has causal power. Our results indicate that there still is evidence that random differences in ready-start intervals might bias competitions. At the very least, this calls for future research into the starting procedure of racing sports. Which is exactly what we intended to provoke with yesterday’s publication.

What happened?

Perspective article in Frontiers in Psychology

Yesterday, we published a Perspective article in the academic journal Frontiers in Psychology. The article made a theoretical point about a potential bias in the starting procedure in some racing sports, including speed skating. We explained the alerting effect, which makes people quicker to respond when they have just been alerted:

The starting procedure in racing sports closely resembles a classical experiment, where participants receive an alerting cue before having to respond to a target stimulus. The cue is a general, non-spatial signal that precedes the target stimulus by a variable interval. In the lab, participants are quicker (Posner and Boies, 1971; Adams and Lambos, 1986) and more precise (Klein and Kerr, 1974) to respond after an optimal interval duration of around 500 ms, and are progressively slower and less precise after longer durations.

Inconsistent starts in racing sports

In some racing sports, the referee signals the competing athletes to get ready, and fires the starting shot after a certain interval. We refer to this as the ready-start interval. The crucial point is that this interval is variable. It depends on how quickly athletes assume their starting position, but also on regulations that specifically call for variability. The latter is true in speed skating. In addition, speed skaters compete in pairs, and whoever ends up with the lowest total time over their two races wins the gold. Crucially, each pair starts with its own ready-start interval.

Alerting differences could bias competitions

In our article, we argued that athletes that start with a shorter ready-start interval should have a theoretical benefit over those that start with a longer ready-start interval, due to the alerting effect we describe above. Of course, we realised that this was quite a strong claim to make based on data from psychological studies that were collected in a highly controlled environment. Surely, at Olympic events, factors like talent and training should strongly outweigh the potential benefits of a short ready-start interval? This is why we collected data from the 500 meter speed skating competition at the 2010 Winter Olympics. We extracted millisecond-accurate ready-start intervals from the audio trace of the event’s broadcast, and correlated them with the race times of each athlete.

What we show is that ready-start intervals and race times correlate. And the effect was quite large at that: several hundreds of milliseconds. That does not sound like much, but it can make a real difference in racing sports. Enough, in some cases, to mean the difference between winning the gold and not winning a medal at all!

SPEED SKATING OLYMPICS 2010 REGRESSION

The longer a referee waits between “Ready” and “Start”, the slower athletes finish!

So what’s wrong?

Lots of attention for our article…

The story got picked up by media and was reported on in newspapers, websites, and radio shows in several countries. In the Netherlands, where Dutch people live, reporting was especially pronounced: Beorn (second author and former speed skater) even made it to national television. There are two things you should know about the Dutch: they love speed skating, and they are a naturally sceptical kind of people. This quality makes them very good at evaluating science, and so they did: mere minutes after publication, I was already receiving emails from colleagues that suggested further analyses and control studies that could be run. This was great! Post-publication discussion of the starting procedure is exactly what we wanted!

Some bad attention…

In addition to the constructive comments, some people simply dismissed our findings. Some did so because they simply “do not believe them”, but some had more substantiated criticisms. The most prevalent of these was that we relied on what is essentially a correlation between ready-start intervals and race times. As we have all heard, correlation does not imply causation! What people argued was that the causality might run in the opposite direction: maybe good skaters take less time to get ready, and thus have shorter ready-start intervals!

In addition, some people argued that we should not have been using a correlation in the first place, because our data included up to two data points from the same athlete. The assumption here is that the data points coming from the same athlete might cause non-independence in our data set. At least, that is what I assume they meant, because they did not do a very good job at explaining their point. They also did not make any suggestions about what would be the appropriate analysis, and they hid behind an anonymous website. Furthermore, they did not contact us before publication, and they failed to recognise that our manuscript has passed scientific peer review, which should vouch for its methodological and statistical soundness. Instead, TopSport Topics decided to post a rather shallow discussion of our paper that came to a very strong conclusion without doing more than simply handwaving towards potential issues.

Cool scientific discussion

From the previous paragraph, you might conclude that we were a bit upset. And we were. But not because they criticised us! Other people have contacted us about similar issues, and we had very good discussions about our data. For example, Prof. Lex Borhans of Maastricht University contacted us with the question highlighted before: what direction was causality in? Were skaters really quicker due to shorter ready-start intervals, or could it be that good skaters assume their starting positions quicker and thus have shorter ready-start intervals? I should note that Lex was not the only one asking this question, but he was the only one (to my knowledge) that wanted to write a blog post on it. He asked for our data, and we were very happy to send it over for Lex to re-analyse. The resulting blog post can be found here.

The method that Lex applied was really clever: he tried to predict the race times of skaters’ second race by using the ready-start interval from the first race. This allowed him to infer the direction of causality. If shorter ready-start intervals really cause shorter race times, there should be no correlation in Lex’ analysis. If, on the other hand, short ready-start intervals were caused by athletes being really good, there should be a correlation. Lex’ findings were more in line with the latter suggestion: the ready-start intervals of one race correlated with the race times of the other! The correlation was less strong than the correlation we found, but that difference was not significant.
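For readers who want to see the logic of such a cross-race check, here is a minimal sketch. It is not Lex’ actual code, and the numbers are made up for illustration; the helper name `pearson_r` and all variable names are my own assumptions.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up values in seconds, one entry per skater; NOT the Olympic data.
interval_race1 = [3.1, 3.6, 2.9, 4.0, 3.3, 3.8]
time_race1 = [35.10, 35.42, 35.21, 35.50, 35.05, 35.61]
time_race2 = [35.15, 35.38, 35.02, 35.65, 35.27, 35.60]

# Same-race correlation: interval and race time from the same race.
r_same = pearson_r(interval_race1, time_race1)

# Cross-race correlation: interval of race 1 against race time of race 2.
# The interval of race 1 cannot have alerted anyone in race 2, so a
# correlation here points at something other than alerting.
r_cross = pearson_r(interval_race1, time_race2)

print(f"same-race r = {r_same:.2f}, cross-race r = {r_cross:.2f}")
```

Comparing `r_same` with `r_cross` is the core of the idea; whether the two differ significantly needs a separate test.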

We have to go deeper!

So where does that leave us? Did we publish too soon? Not necessarily, because Lex’ method is a bit like a statistical sledgehammer: It’s not really sensitive to the kind of subtle effects that our alerting hypothesis would predict. In addition, if you assume skaters are causing both their ready-start intervals and their race times to be longer or shorter at the same time, then the skater would be a moderator. In Lex’ analysis, this would mean that the relationship between the ready-start interval of one race and the finish time of the other race is inflated. For example, if a skater performs badly (e.g. due to a long ready-start interval) during the first race and gets demotivated after the first race, they will show both a longer ready-start interval and a slower race time in the second race. Thus the correlation between the first ready-start interval and the finish time of the second race would be moderated by the skater’s state. In the new analysis we describe below, this is not an issue: When the state of the skater is assumed to have an effect on both ready-start interval and race time, this does not change the direction of the relationship between ready-start interval and race time. And that direction is exactly the problem in the current starting procedure. Please do read on if you want to learn about our new super-stringent analysis.

So we set out to do what every scientist does when they are being critiqued: we dove back into our own data, and we scienced the shit out of it. The results are presented below.

What’s really going on?

One analysis to rule them all

There is one way in which we can address all criticisms at the same time, and that is by looking at individual differences within each speed skater. In that way we can measure the effect that differences in ready-start interval have on each individual skater’s performance. This answers the causality question, because we use a difference in ready-start interval (caused by the referee) to explain a difference in race times using a linear regression. It also bypasses the issue of differences in skaters’ abilities, because we are looking at differences in each individual’s performance. Finally, it bypasses all other confounds that others have brought up here or there, for example the idea that both referees and skaters get quicker as the competition progresses due to excitement building up. The following analysis addresses those issues, because it is a direct test of the effect of within-skater differences in ready-start interval on differences in within-skater race times.

Methods, methods, methods

We want to be absolutely clear about our methods here, so here they are. We excluded all skaters that fell or nearly fell during a race, and also those that did not complete one of their two races. These are Mitchell Whitmore (nearly fell in first race), Maciej Biega (nearly fell in second race), Shani Davis (gave up after first race), Annette Gerritsen (fell in first race), and Yulia Nemaya (fell in second race). Falls or near falls have such a massive impact on race time that they obscure everything else, including athletes’ talent and training, but also the effects of ready-start intervals. Therefore we strongly feel that we should exclude these races from further analysis. For all remaining skaters, 70 in total, we calculated the difference between the first and the second race (race 1 minus race 2) in both their ready-start interval and their race time. We combined the data for men and women, because we have no theoretical reason to split them up, because their individual differences are on the same scale (unlike their race times), and because we need the sample to be sufficiently large for any regression or correlation to be sensitive enough to pick up the effects that we predicted.
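To make the difference scores concrete, the steps above could be sketched roughly as follows. The record layout, the `excluded` flag, and all values are hypothetical and only serve as illustration; this is not our actual analysis script.

```python
# Hypothetical records: (race 1, race 2) values in seconds per skater.
# Names and numbers are made up; they are NOT the Olympic data.
skaters = [
    {"name": "A", "interval": (3.10, 3.45), "time": (35.10, 35.18), "excluded": False},
    {"name": "B", "interval": (3.80, 3.20), "time": (35.60, 35.41), "excluded": False},
    # Skater C stands in for someone who fell in the second race.
    {"name": "C", "interval": (3.30, 3.90), "time": (35.25, 36.90), "excluded": True},
]

def within_skater_differences(records):
    """Return (interval_diff, time_diff) per included skater, race 1 minus race 2."""
    diffs = []
    for rec in records:
        if rec["excluded"]:
            continue  # falls, near falls, and unfinished races are left out
        interval_diff = rec["interval"][0] - rec["interval"][1]
        time_diff = rec["time"][0] - rec["time"][1]
        diffs.append((interval_diff, time_diff))
    return diffs

diffs = within_skater_differences(skaters)
for interval_diff, time_diff in diffs:
    print(f"interval diff = {interval_diff:+.2f} s, time diff = {time_diff:+.2f} s")
```

Because each skater is compared only with themselves, stable differences in talent or training cancel out of these difference scores.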

Way more stringent analysis, but same results

The linear regression between the individual differences in ready-start intervals and race times demonstrates that there is a significant positive effect of ready-start interval on race time. When the difference in ready-start interval is negative (i.e. the second race had a longer ready-start interval), the difference in race times was also negative (i.e. the second race time was longer). The Pearson correlation is significant (p = 0.003), and explains about 12% of the variance in race time differences.
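A regression like this can be sketched in a few lines of plain Python. The difference scores below are invented for illustration, and `fit_line` is a hypothetical helper name; it is an ordinary least-squares fit, not necessarily the exact routine we used.

```python
from math import sqrt

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys on xs; returns (slope, intercept, r)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

# Made-up within-skater differences (race 1 minus race 2), in seconds.
interval_diffs = [-0.35, 0.60, -0.10, 0.25, -0.50, 0.40, 0.05, -0.20]
time_diffs = [-0.08, 0.19, 0.11, -0.04, -0.15, 0.02, 0.09, -0.12]

slope, intercept, r = fit_line(interval_diffs, time_diffs)
print(f"slope = {slope * 1000:.0f} ms of race time per second of interval")
print(f"r = {r:.2f}, explained variance = {r ** 2:.1%}")
```

The squared correlation is the share of variance in race-time differences that the interval differences account for.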

In the current dataset, assuming a linear relationship, one extra second of ready-start interval difference caused 174 ms of difference in race times. Both the explained variance and the magnitude of the effect are less than what we demonstrated in the analysis in our article. This means that at least some variance in that data can be explained by what several people, including Lex, suggested: quicker skaters are quicker to assume their starting position, and thus have shorter ready-start intervals. However, we can still explain 12% of the variance, whereas this should be 0% in a fair competition. And the remaining effect of 174 ms of added race time per extra second of ready-start interval is still very worrying in sports where the difference between winning gold or silver (or nothing!) can sometimes be only a few milliseconds.
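To get a feel for the size of this effect, the slope can be turned into a quick back-of-the-envelope calculation. This is only a sketch that assumes the linear relationship holds across the range of intervals; the function name is made up for illustration.

```python
# Slope reported above: 174 ms of race time per extra second of interval.
SLOPE_MS_PER_S = 174

def predicted_time_difference_ms(interval_difference_s):
    """Predicted race-time difference (ms) for a given interval difference (s)."""
    return SLOPE_MS_PER_S * interval_difference_s

for extra_seconds in (0.25, 0.5, 1.0):
    extra_ms = predicted_time_difference_ms(extra_seconds)
    print(f"{extra_seconds:.2f} s longer interval: about {extra_ms:.1f} ms slower")
```

Even a quarter of a second of extra waiting would then cost tens of milliseconds, which is more than the margins that sometimes separate medals.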

DIFFERENCE REGRESSION PLOT

Shorter ready-start intervals do cause shorter race times, even when controlling for confounds.

Is this sample too small?

After collecting data and computing a correlation, you can calculate the statistical power of your results. Ours is 91.42%. That number indicates that our sample was big enough to reliably test the effect that we found.
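For those curious how such a number can be obtained, here is a rough sketch based on the Fisher z approximation. The significance level of 0.05 and the choice of one or two tails are assumptions on my part, and dedicated power software uses exact methods, so this approximation will not necessarily reproduce the figure above.

```python
from math import atanh, sqrt
from statistics import NormalDist

def correlation_power(r, n, alpha=0.05, two_tailed=True):
    """Approximate power to detect a correlation r with n pairs (Fisher z)."""
    std_normal = NormalDist()
    # Expected value of the test statistic if the true correlation is r.
    z_effect = atanh(r) * sqrt(n - 3)
    if two_tailed:
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        upper = 1 - std_normal.cdf(z_crit - z_effect)
        lower = std_normal.cdf(-z_crit - z_effect)
        return upper + lower
    z_crit = std_normal.inv_cdf(1 - alpha)
    return 1 - std_normal.cdf(z_crit - z_effect)

# 12% explained variance corresponds to r = sqrt(0.12); 70 skaters remained.
r_observed = sqrt(0.12)
power_two_tailed = correlation_power(r_observed, 70)
power_one_tailed = correlation_power(r_observed, 70, two_tailed=False)
print(f"two-tailed: {power_two_tailed:.1%}, one-tailed: {power_one_tailed:.1%}")
```

The one-tailed version applies when the direction of the effect is predicted in advance, as it is by the alerting hypothesis.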

What happened to the differences between men and women?

They are not there in the current analysis, with the individual Pearson R for men being 0.21, and 0.26 for women. This means they were likely due to noise in the men’s analysis from our article.

Final note

The most sceptical of people might now argue something along the lines of the following: “But wait a minute… Maybe skaters that are very good are both more consistent in assuming their position (and thus their ready-start interval), and in their race times! So you ARE wrong! HA!”. In that case, one would still expect the minor differences you assume to correlate in the same direction. So they should still be picked up by our regression. In other words: our concerns with starting procedures still hold.

Conclusion

The theoretical issues that we put forward in our article are valid, and the data we provided to support our claims are still valid. This holds after we corrected for skaters’ individual qualities, using a regression on within-participant differences from which causal inferences can be made. Our article was a Perspective, a type of article intended to highlight important areas of future research. Our article and the discussion following its publication illustrate precisely that: There is a need to thoroughly investigate the starting procedure of racing sports, and speed skating in particular.

Further comments

If you have any comments, objections, compliments, or tips for funny cat videos, please post them in the comment section below. You can also direct them to me directly, using my email address: edwin.dalmaijer@psy.ox.ac.uk.

Reference

  • Dalmaijer, E.S., Nijenhuis, B.G., & Van der Stigchel, S. (2015). Life is unfair, and so are racing sports: Some athletes can randomly benefit from alerting effects due to inconsistent starting procedures. Frontiers in Psychology, 6(1618). doi: 10.3389/fpsyg.2015.01618

30 Comments:

  1. Interesting stuff. I always like science and sports.

    Even more unfair is that speedskating is a time-trial with an opponent that can help you or blow your race. That would be the first step to solve if you want to make the sport more fair.

    If you want to keep traditions, don’t change anything. Like the guy with the bell still rings for the last lap, even in a 500m -race

  2. Wonderful real world application! I still don’t think the causality question is 100% settled though – you’ve removed ‘stable trait’ influences with the analyses on the differences in ready start interval and race times, but I don’t think you’ll find a way to control out fluctuations in an individual’s state that may lead to both longer ready start intervals and longer race times… simply being more tired than in their previous race could make someone slower at both.

    • Thanks for your comment! The issue in any non-experimental study is always causality. I think the current analysis does have causal power, but you are right about random fluctuations potentially still being a factor. I would argue that all of the factors that you could think of, including tiredness, are post-hoc explanations that should definitely be addressed in future research. They are, however, currently unsupported by any available data, nor by the experience of our former prof-skating co-author. Specifically, there is no reason for us to assume that quicker athletes (due to individual quality or tiredness) are also quicker at assuming their starting position, in combination with a referee that’s quicker to sound the starting shot. Our theoretical argument is based on a large body of alerting literature, and our analyses are enough to demonstrate something is likely to be going on. That, in my mind, should be enough to warrant further investigation in direct experimental designs, e.g. testing professional athletes in competition-like environments.

      • Well, sometimes even non-experimental work has things we can reasonably treat as random… if the ready start interval was not influenced by the skater for instance (there must be sports where this is the case!). But I agree that you’ve shown something interesting.

        • Despite the potential influence of skater on ready-start interval, there’s definitely some causal effect of ready-start interval on performance in our non-experimental work. Which, I think, is what you’re hinting at too – so we agree :)

          Why this relation is there (alerting, muscle fatigue, etc.) is food for further experimental investigation.

  3. Allard neijmeijer

    Edwin, thank you for taking the time to respond to all the critique of your research. Being a speedskater myself I have the same kind of intuitive feelings (based on experiences in races) as Michel Mulder has, that the conclusion of your research that the start-interval has such a large effect on finishing times is wrong.

    The main question I have about your research is this:
    The literature and theoretical explanation of possible effects is mainly about factors influencing the reaction time of a test subject. Isn’t it odd then to take the finishing time of a skater as a measure of this response? This introduces a lot of noise into the data from factors that are not accounted for in the analysis. Wouldn’t taking a 100 meter time, or even measuring the actual response time deliver more valid results?

    Of course the only thing that matters in speed skating is the time at the finish line, but in order to conclude that the starting procedure is unfair, you should choose a measure that actually says something about the result of the start, while introducing as little noise in the data as possible

    • Hi Allard, thanks for replying! I appreciate your intuitive concerns, and I would like to stress that science never is a one-study kind of thing. It’s an iterative process in which a lot of people approach a problem from a lot of different angles, and in the end someone takes the time to summarise these findings in a review. Our paper is a Perspective article, which is meant to highlight an important issue that receives too little attention in the current literature, at least in our opinion. We provide evidence that is based on observations from real-world conditions, and this evidence is by definition less strong than direct experimental evidence. However, direct experimental evidence is time-consuming, costly, and would in this case mean someone would have to go test professional athletes at their peak form on an experiment that will take hours (you need a lot of repeated measurements for solid stats). Our work provides an indication that there might well be an issue with the current starting procedure, and we can back this up with tons of experimental research on alerting, and the data from the 2010 Winter Olympics described above. I think that’s more than enough to warrant future research into, and caution with, the current starting procedures.

      Regarding your question: We used the finishing times, because – as you say – this is all that matters. A direct relation between ready-start interval and race time is the only way in which we could demonstrate that this is a problem. Also taking into account the 100 meter times would be a very interesting thing to do, but it would also mean that compensatory strategies can be present in the first 100 or the last 300 meters. Any analysis that would not include the actual race times would miss those. An example: Suppose you realise (perhaps unconsciously) that your start was kinda slow due to a long ready-start interval. You could put in extra effort to make up for that, and show a decent time at the 100-meter point. However, you would also suffer from putting in that extra effort in the following 300 meters, therefore you would still experience a negative effect of the longer ready-start interval, which might not have been present at the 100 meter point yet.

  4. Great stuff, great analyses, puzzling results.
    One hypothesis is that skaters who feel less primed and confident at any of the two races take more time to get ready and also have slower times.
    To rule that out, you could consider running your analysis on the opener time only, as one would expect that the slow getting down should not have an impact on the 400 m lap time, 100 meters later…
    Also, if this is true (which certainly seems to be the case), one should see a similar trend in the opener of the 1000 m, although you won’t have two races in the same skater to compare…

    • Thanks for your comment and compliments!

      We had no prior reason to assume that quicker (due to individual quality, tiredness, or self-confidence) skaters also take less time to get ready, and cause referees to shoot the gun earlier. This has never been reported on, and it was not something that our former prof-speedskating co-author had ever experienced. Our second analysis does seem to suggest that some variance can be explained by your suggestion, but that there definitely still is a relatively strong effect of ready-start interval on race time.

      Regarding your question about using the opener time: We used the finishing times, because this is all that matters in the end. A direct relation between ready-start interval and race time is the only way in which we could demonstrate that this is a problem in actual competitions. The 100 meter times would be a very interesting thing to look at, but it would also mean that compensatory strategies can be present in the first 100 or the last 300 meters. Any analysis that would not include the actual race times would miss those. An example: Suppose a skater realises (perhaps unconsciously) that their start was kinda slow due to a long ready-start interval. They could put in extra effort to make up for that, and show a decent time at the 100-meter point. However, they would also suffer from putting in that extra effort in the following 300 meters, therefore they would still experience a negative effect of the longer ready-start interval, which might not have been present at the 100 meter point yet.

  5. Edwin, thanks a lot for taking the time to ‘meet the Dutch (= critics)’ 😉

    Skipping in your article to the paragraph ‘A Potential Solution’

    “The fairness of any racing sports’ starting procedure could be improved by introducing an extra step, and removing temporal variability. In our remedied start, a referee signals athletes to get ready (“Get Set”), and explicitly cues the impending start only after everyone has assumed position (“Ready”). After a fixed time, the starting shot should sound. Ideally, to prevent human timing error, the referee could simply press a button after all athletes have assumed their starting positions. This would activate a computerized system that produces both the “Ready” cue and the starting shot, separated by a fixed interval.

    Although using a fixed interval will equalize general alerting benefits, it will not remove the effects of selective temporal expectancy. In fact, it will increase athletes’ ability to anticipate the starting shot. Importantly, incorporating a fixed ready-start interval in official rules and regulations will allow athletes to train their response times for that specific interval. This would make an athlete’s temporal attention an individual quality that contributes to their likelihood to win. In our view, this would be fairer than the current starting procedure, which is essentially a lottery.”

    a few things spring to mind; this is one of them:

    Knowing (beforehand – everyone does, as it is clear in all races) that a substantial part of the ready-start interval is taken by the ‘setting’ procedure of the two competitors and knowing too that some competitors’ setting takes (deliberately or not) quite a bit longer than others’ (thus influencing the overall ready-start interval value), why did you take the ready-start interval (being the same for both skaters) and not the set-start interval for the individual skaters?

    • Hi Peter, no worries, I love a good discussion!

      Fixing the time an athlete needs to get ready is quite challenging, as they are human and thus variable by nature. Instead, we argue for sounding another alerting cue after all athletes have assumed their starting positions, then waiting for a fixed time, and then shooting the starting pistol. This will equalise all alerting benefits.

      Other concerns with the starting procedure’s timing include muscle fatigue. It’s probably not a good thing to be in the starting position for too long (skaters and coaches have argued this before), thus you want athletes to assume positions within a time window with as little variability as possible. You could, for example, enforce a rule that athletes would have to assume position within 2-2.5 seconds; not quicker and not slower. However, as I said, athletes are human and thus variable by nature. It might be very hard to make them assume their starting position within a fixed time. Also, as always in science, we need further experimental studies that demonstrate the relation between the time spent in the starting position and the actual finishing time, to make sure that there is in fact a causal relationship there.

  6. Hi Edwin, thanks for your quick reply.

    There’s no denying that measuring a set-start interval in the current set up is more challenging than measuring the ready-start interval.

    However your focusing on (repeating) a potential ‘solution’ doesn’t answer the/my question. In fact, all your concerns mentioned only emphasise limitations of having taken the ready-start interval!

    One could argue that (re-)doing an analysis on the same race with individual set-start intervals, rather than ready-start intervals for pairs, would give a better idea of any causal link. Only after that should one come up with suggestions for changing starting procedures.

    • Ah, yes, I see your point now. The issue with getting the set-start interval, is that it isn’t possible to get it without using video equipment with a very high temporal resolution. In addition, it would require us to know when the referee thinks the skaters have stopped moving, which is not an objective decision (there’s bound to be micro-movements).

      Another, more fundamental issue is that skaters will not be able to be fully aware of when the referee thinks both skaters have assumed their starting position. This is a decision process, and the (gradual!) decision is not likely potent enough to produce an alerting effect. Therefore, opting for the cue that we can be sure will cause an alerting effect (“Ready”) was the best choice for the current study.

  7. Like I said, it’d be more challenging to (re)without a proper cue, but without it I’d argue that your results have been affected by the lack of doing so and thus your conclusions too: the slower setter of a pair may/will have lowered the opponent’s alertness and therefore his/her 500m time, whereas two quick setters don’t.

  8. .. to (re)do ..

    • Two things which I think we’re currently agreeing on?

      1) There is an alerting effect from the “Ready” cue, but not from both skaters having assumed their starting position.

      2) The time it takes for both skaters to get ready determines how alert they will be when the starting shot sounds. This is what you are saying, and is in line with our argument. The ready-start interval consists of the setting of the athletes and the extra time a referee waits before firing the pistol. The longer this is, the longer ago the alerting “Ready” cue was, and thus the less aroused both athletes will be. Thus the slower they will be to respond to the starting shot. Two quick setters (provided they also have a low added time from the referee) will be more alert, and will thus be able to respond to the starting shot quicker.

      Maybe you shouldn’t have skipped most of the paper, because we explain where we’re coming from, and you seem to be arguing for the same things 😉

      • Yes, we agree on both things.

        But the difference is that I’m arguing it based on common knowledge as well as the literature you refer to – they would predict your outcomes qualitatively.
        To get to reliable quantitatively significant conclusions you’d háve to split the ready-start intervals into the various relevant subintervals for the individual skaters.

        PS: don’t worry, I read the whole paper, but found the potential solution offered so obvious based on only the theory that I was surprised that you chose to test the theoretical knowledge (which to me seems to be more applicable to a set-start interval measurement) with the ready-start intervals.

        • Ah, I see where the issue is. In our opinion, the alerting cue is the referee’s “Ready”. The target signal is the starting shot. Therefore, the appropriate window for testing alerting effects is the ready-start interval.

          But please do feel inspired to do a follow-up study that separates the ready-set and the set-start intervals. I’d love to see that! Please do let me know if you need any help or further input :)

          • Like I said in my rephrased comment (@15:41):

            Based on what literature did you decide to include the ready-set interval, thus opting for ready-start rather than set-start, apart from the perceived practical obstacles to measuring the set-start intervals reliably with the limited sources available to you?

            I.e. what studies show a causal link between alertness and ready-set intervals, given that to me, in speedskating reality, the length of that (ready-set) interval is sometimes due to deliberate actions from (at least) one of the two skaters?

  9. PS: could you re-do the analysis, but separating the (visually) slower/faster setter for each ready-start interval, just to see if the slower setters of longer ready-start intervals are significantly faster than the faster setter (= longer waiting) of the same (longer) ready- start intervals?

  10. It’s of enough quality to qualitatively separate the slower from the faster setter within a pair and thus giving you the option to differentiate between them in your analysis. If you want more, ask the NOS for the original tapes.

  11. Re-reading more carefully what you wrote about what you think we agree about, I’d have to rephrase my comment:

    I think the alerting period should be counted from the set-position, not from the ready cue. Where in the literature does the ready-set period prove to have a causal relation with alertness?

    • As I argue, the ‘set’ moment is not one precise instant in time. It’s a decision process. Alerting, from the outset, has been described and tested as a response to a non-specific cue, usually an auditory and sometimes a visual event. With a very clear onset, unlike the ‘set’ moment.

      To highlight a few references on alerting (quoted in our paper):

      1. Posner, M.I., and Boies, S.J. (1971). Components of attention. Psychol. Rev. 78, 391–408. doi:10.1037/h0031333
      2. Lawrence, M.A., and Klein, R.M. (2013). Isolating exogenous and endogenous modes of temporal attention. J. Exp. Psychol. Gen. 142, 560–572. doi:10.1037/a0029023
      3. Weinbach, N., and Henik, A. (2012). Temporal orienting and alerting – the same or different? Front. Psychol. 3:236. doi:10.3389/fpsyg.2012.00236
      4. Weinbach, N., and Henik, A. (2012). The relationship between alertness and executive control. J. Exp. Psychol. Hum. Percept. Perform. 38, 1530–1540. doi:10.1037/a0027875
      5. Weinbach, N., and Henik, A. (2013). The interaction between alerting and executive control: dissociating phasic arousal and temporal expectancy. Attent. Percept. Psychophys. 75, 1374–1381. doi:10.3758/s13414-013-0501-6

      In all fairness, I think it would be a great idea to test to what extent an ambiguous decision process could produce a similar alerting effect. It would be a hard study to do, as it requires you to find a way to pinpoint the decision moment in some way, but I’m sure one could find a way. Probably through modelling: the decision making literature offers so many different options to do that, that it seems like a plausible option.
