Big Data

The era of big data has arrived in sports science research, and I couldn’t be happier. For a long time I was skeptical about sports science as a source of useful information about how to train effectively as an endurance athlete. The typical study was just too limited in scope and too simplified in comparison to the real world for me as a coach to put much stock in its findings. Even basic truths like the importance of training at high volume to maximize endurance fitness had virtually zero support in the scientific literature because it was almost impossible to prove or disprove within the constraints of a typical sports science study.

But the advent of big data has changed all that. Now scientists can answer specific training questions with a high degree of confidence by collecting training data from tens of thousands of athletes and teasing out correlations between training inputs and fitness and performance outputs.

The latest example is a study on tapering in marathon training that was conducted by Barry Smyth and Aonghus Lawlor at University College Dublin and published in Frontiers in Sports and Active Living. Smyth and Lawlor analyzed data from the devices of more than 158,000 runners in the final weeks of marathon training, focusing on 1) how long their taper period was (i.e., how many weeks out from race day their training volume began to decline), 2) how disciplined their taper was (i.e., how consistently their training volume decreased throughout it), and 3) how well they performed in the marathon relative to their best 10K time. Here are the main findings:

  1. A more disciplined taper (i.e., consistent decline in volume) was the strongest predictor of better marathon performance. Once runners began to taper, they were better off continuing to taper.
  2. Runners who tapered for three weeks tended to perform better than runners who tapered for two weeks or less. Extending the taper to four weeks resulted in no additional gains.
  3. Runners who trained at higher volumes prior to tapering tended to taper longer and to execute more disciplined tapers.
  4. A majority of the runners (64 percent) tapered for two weeks or less and in an undisciplined way.
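Neither of the two taper metrics at the heart of this analysis is exotic. Here's a rough sketch of how taper duration and discipline might be computed from weekly volume data; the definitions below are my illustrative assumptions, not the exact formulas Smyth and Lawlor used:

```python
# Rough illustration: estimating taper duration and "discipline" from weekly
# mileage. These definitions are assumptions, not the paper's exact metrics.

def taper_metrics(weekly_miles: list[float]) -> tuple[int, float]:
    """weekly_miles is ordered oldest first and ends with race week."""
    peak_week = weekly_miles.index(max(weekly_miles))
    taper_weeks = len(weekly_miles) - 1 - peak_week  # weeks after peak volume

    # Discipline here means: in what fraction of taper weeks did volume
    # actually decline relative to the week before?
    taper = weekly_miles[peak_week:]
    declines = sum(1 for a, b in zip(taper, taper[1:]) if b < a)
    return taper_weeks, declines / max(len(taper) - 1, 1)

# A three-week taper with one rebound week scores lower on discipline:
print(taper_metrics([40, 45, 50, 40, 44, 25]))  # -> (3, 0.666...)
```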

The authors concluded, “An important practical implication of this work is that there could be an opportunity for many runners to improve their relative performance by implementing a more disciplined form of taper. This is likely to be of considerable interest to recreational marathoners and coaches.”

They’re certainly right on that last point. As a coach to many marathon runners, I take considerable interest in these findings. But I’m not exactly sure yet what to do with them. I’ve always believed that the duration of a taper should be determined by how hard the athlete trains before the taper, and that most recreational runners don’t train hard enough to require a long taper. In this study, appropriately, high-volume runners were found to have engaged in the longest tapers, but even lower-volume runners tended to gain a slight benefit from a three-week taper versus a shorter one. The impact of a disciplined taper was greater than that of a lengthier taper, however, and like any coach with half a brain I always prescribe disciplined tapers, so that won’t change.

Come to think of it, I don’t know if anything will change. Off the top of my head, I can’t think of a single athlete I’ve ever coached who underperformed in a marathon as a consequence of feeling under-tapered going into the race. As they say, if it ain’t broke, don’t fix it. On the other hand, my curiosity is piqued, so I will probably give one of my athletes an opportunity to experiment with a slightly longer marathon taper in the near future. If it doesn’t work, we can both blame Barry Smyth and Aonghus Lawlor.

As a final note, although this study focused on marathon tapering, its most striking finding had to do with marathon pacing. Specifically, female runners were found to pace their marathons far more skillfully than male runners, who on average added 4.49 minutes to their finish times by starting too aggressively and hitting the wall. For me, this finding points to the need for a comprehensive guide to developing pacing skill. Stay tuned.

“We can neither deny what science affirms nor affirm what science denies.” I forget who said this, but whoever said it, it’s true. If you’re not so sure about that, it’s likely because you’re misinterpreting the statement as meaning that science is always right about everything. But that’s not at all what it says. What it says is that if you want to be “right” about anything, you must use the scientific method to address whatever it is you want to be right about. For example, if the scientific method is used to arrive at the conclusion that earth’s climate is changing, and that human activity is the primary driver of that change, then no one should put any stock in a denial of this conclusion unless it, too, is arrived at through the scientific method. A person who denies the current scientific consensus on this matter because it snowed in April one time last year is not really “right,” even if the consensus later proves mistaken; at best they are right only in the sense that a stopped clock is right twice a day. Indeed, the only way it could really “turn out” that earth’s climate is not changing, or that human activity is not its primary driver, is for science itself to come to that new conclusion.

The scientific method is really nothing more, and nothing less, than intellectual integrity. By nature, individual human beings tend to form highly biased beliefs. A highly biased belief can be true, but in general, biased beliefs are unreliable. The scientific method was developed as a way to remove bias from the process of belief formation as much as possible. It is by no means a perfectly reliable method of forming beliefs, but it is more reliable than any other method.

Granted, the applicability of the scientific method is limited. It cannot be used to settle questions such as whether the Beatles are better than the Rolling Stones or whether prisoners should be allowed to vote—in other words, aesthetic or moral questions. Science is also of limited value in the domain of real-world problem solving. For example, I’d put more trust in an experienced general with a record of winning battles to win the next battle than in a scientist who came up with a new strategy for winning battles by running a bunch of computer simulations.

Endurance sports training is another example. Historically, elite coaches and athletes have been way out ahead of the scientists with respect to identifying the methods that do and don’t work. The crucible of international competition is not a controlled study, but it’s enough like one in its ruthless determination of winners and losers to have given lower-level coaches and athletes like me a high degree of confidence in their beliefs about the best way to train. In contrast, it’s actually surprisingly difficult to design and execute a controlled scientific study that has any substantive relevance to real-world endurance training. For example, one of the greatest certainties of endurance training is that high-volume training is essential to maximizing fitness and performance, yet there is virtually zero scientific evidence to support this certainty because it’s impractical to execute the kind of strictly controlled, long-term prospective study needed to supply such evidence.

But things are changing. The advent of wearable devices has made it possible for sport scientists to take a “big data” approach to investigating what works and what doesn’t in endurance training. In this approach, scientists dispense with the familiar method of generating hypotheses and testing them by actively intervening in the training of a small group of athletes. Instead, they collect relevant data from very large numbers of athletes and use statistical tools to quantify correlations between particular inputs (e.g., training volume) and specific outputs (e.g., marathon performance). While this approach lacks the tidiness of the traditional controlled study, the sheer volume of data involved gives its results the potential for equal empirical validity. And because these studies are done in situ, they sidestep the controlled prospective study’s questionable real-world relevance.
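In code, the core of this approach is surprisingly plain. A minimal sketch, assuming a table of per-runner season aggregates (the file name and column names are invented for illustration):

```python
# Minimal sketch of the big-data correlational approach described above.
# Assumes a CSV of per-runner season aggregates; "runner_aggregates.csv",
# "weekly_km", and "marathon_minutes" are invented names for illustration.
import pandas as pd

runners = pd.read_csv("runner_aggregates.csv")

# No intervention, no control group: just quantify the association between a
# training input and a performance output across a very large sample.
r = runners["weekly_km"].corr(runners["marathon_minutes"], method="spearman")
print(f"Spearman correlation, weekly volume vs. finish time: {r:.2f}")
```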

The Science of Running

As an experienced endurance coach who respects science, I have long been circumspect about using science to inform my coaching, always checking new findings against what I know from real-world experience before incorporating them into my practice. But studies based on the big-data approach are my kind of science, because they’re really just a formalized version of the learning we coaches do in the real world.

So I was particularly excited to see a new study titled “Human Running Performance from Real-World Big Data” in the journal Nature Communications. It’s a true landmark investigation, drawing on data representing 1.6 million exercise sessions completed by roughly 14,000 individuals. Its authors, Thorsten Emig of Paris-Saclay University and Jussi Peltonen of the Polar Corporation, are clearly very smart guys who understand both statistics and running. The paper is highly readable even for laypersons like me, and it’s also available free online, so I won’t belabor its finer points here. What I will say is that its three key findings squarely corroborate the conclusions that elite coaches and athletes have come to heuristically over the past 150 years of trying stuff. Here they are:

Key Finding #1 – Running More Is the Best Way to Run Faster

One of the key variables in the performance model developed by Emig and Peltonen is speed at maximal aerobic power (roughly equivalent to velocity at VO2max), which they are able to “extract” from race performance data. The collaborators found that the strongest training predictor of this variable was mileage. Simply put, runners who ran more were fitter and raced faster. Emig and Peltonen speculated that high-mileage training achieved this effect principally by improving running economy.
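Their model is well beyond a blog-sized sketch, but the flavor of “extracting” a fitness parameter from performance data can be suggested with a simple power-law speed-duration fit. Everything below, including the sample efforts and the six-minute anchor duration (a common rough proxy for duration at VO2max), is my assumption, not their method:

```python
# Sketch of extracting a single fitness parameter from race-like efforts via
# a power-law speed-duration fit. The sample efforts and the six-minute
# anchor duration are illustrative assumptions.
import numpy as np

durations_s = np.array([240.0, 720.0, 1800.0, 3600.0])  # best efforts
speeds_ms = np.array([5.4, 4.9, 4.5, 4.2])              # mean speed of each

# Fit v = C * t^(-k) by linear regression in log-log space.
slope, log_c = np.polyfit(np.log(durations_s), np.log(speeds_ms), 1)
map_speed = np.exp(log_c) * 360.0 ** slope  # evaluate fit at a 6-minute effort
print(f"Estimated MAP-like speed: {map_speed:.2f} m/s")
```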

Key Finding #2 – There Is No Such Thing As Too Slow in Easy Runs

Another clear pattern in the data collected by Emig and Peltonen was that runners with a higher MAP speed tended to spend more time training at lower percentages of this speed. In other words, faster runners tended to train slower relative to their ability. As an example, the collaborators tell us that a runner with a MAP speed of 4 meters per second (6:42/mile) will do most of their training between 64 and 84 percent of this speed, whereas a runner with a MAP of 5 meters per second (5:21/mile) will cap their easy runs at 66 percent of this speed. Here we have clear validation of the 80/20 rule of intensity balance, which I always like to see.
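For anyone who wants to check the arithmetic on those pace conversions, here is a quick sketch; the speeds and percentages come straight from the figures quoted above:

```python
# Quick arithmetic check of the pace figures quoted above.
METERS_PER_MILE = 1609.34

def pace_per_mile(speed_ms: float) -> str:
    seconds = METERS_PER_MILE / speed_ms
    return f"{int(seconds // 60)}:{int(seconds % 60):02d}/mile"

print(pace_per_mile(4.0))         # 6:42/mile at MAP
print(pace_per_mile(4.0 * 0.64))  # 10:28/mile, slow end of the easy range
print(pace_per_mile(4.0 * 0.84))  # 7:58/mile, fast end of the easy range
print(pace_per_mile(5.0))         # 5:21/mile at MAP for the faster runner
```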

Key Finding #3 – Training Load Is Not the Gift That Keeps on Giving

Perhaps the “freshest” key finding of this study is one that validates the practice of training in macrocycles not exceeding several months in length. What Emig and Peltonen discovered on this front was that individual runners appeared to have an optimal cumulative training load representing the accumulated seasonal volume and intensity of training that yielded maximal fitness and performance. Runners gained fitness in linear fashion as the season unfolded and as they approached this total, but when they went beyond it, their fitness regressed. In short, training is not the gift that keeps on giving. Runners can train only so much and get only so fit before they need a break.
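As a toy illustration of that rise-peak-regress shape, consider the sketch below. The quadratic form and the numbers are my inventions, purely to visualize the pattern; they are not Emig and Peltonen’s fitted model:

```python
# Toy model of the pattern described above: fitness rises roughly linearly
# with cumulative training load up to a runner-specific optimum, then
# regresses. The quadratic form and numbers are illustrative assumptions.

def fitness(cumulative_load: float, optimal_load: float = 2000.0) -> float:
    x = cumulative_load / optimal_load
    return max(0.0, 2 * x - x * x)  # peaks at x = 1, declines beyond it

for load in (500, 1000, 2000, 3000):
    print(load, round(fitness(load), 2))  # 0.44, 0.75, 1.0, 0.75
```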

That’s science.
