Sunday, April 27, 2014

Introducing CrowdPace

In the previous post, I presented a method to estimate how fast a person can run a given distance based on a recent race result. I have implemented a web application, called CrowdPace, where you can see this method in action. On the right side of this blog you will find a form labeled CrowdPace. There you can enter the distance you have run and the time it took you to run it, and then another distance for which you want to know your estimated finishing time.

There are some caveats to keep in mind. If the time you enter was for a race on a hilly course, or on a hot day, then the estimated time will be slower than what you could run on a flat course in ideal conditions. It's also important to keep in mind that CrowdPace compares your time to the times of tens of thousands of runners to make the prediction. Some distances are more popular than others, which makes the prediction more accurate when comparing times between popular distances. As more data becomes available, the estimated times will become more accurate. Finally, when calculating a time for a distance that is much shorter or much longer than the distance of the input race, please keep in mind that CrowdPace assumes that you have done equivalent training for both distances (find here some examples of equivalent training plans).

Please share your comments to help us make CrowdPace better.

Wednesday, April 23, 2014

Let the crowd suggest the pace

Meb Keflezighi's win at the Boston Marathon is one of the finest artistic performances I have ever seen. What makes Meb's win so remarkable is that he had no margin for error and still delivered a perfect performance. A major factor in his victory was his choice of running pace, which was right on target from the beginning. It is difficult, even for elite marathoners, to pin down the best race-day pace. In this post I want to present a way to choose the right race-day pace given a (shorter) race effort a few weeks before.

Meb ran the 26.2 miles from Hopkinton to Boston with impeccable technique and a level of mastery rarely seen in the arts, let alone running. It is fair to say that Meb was the underdog in a field that included a handful of sub-2:06:00 marathoners. With a personal record of 2:09:08 (Houston, Jan. 2012), Keflezighi's only chance at the win was to run a personal best and hope that the rest of the field had an off day.

The lead pack covered the first 5 km in 15:09. At this pace the finish time would have been 2:07:51. The second 5 km was run in 15:20. Had Meb kept running at this pace he would have finished in 2:09:13, right around his best shot for a win. The brilliance of Meb was to keep running at this pace, independently of what others were doing, and hope for the best. Meb followed this plan except for the 20 to 25 km stretch, which he covered in 14:55, opening a 1-minute gap on the pack. All other splits were run in times between 15:10 and 15:18. The chasing pack waited too long to catch up and Meb crossed the finish line first in 2:08:37, his personal best to date, just 11 seconds ahead of the runner-up.
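For readers who want to check the projected finish times, here is a minimal Python sketch of the arithmetic. It assumes the projection is the time already elapsed plus the remaining distance run at the latest 5 km split pace. The function names are mine, chosen only for illustration. Under that assumption, the two projections quoted above (2:07:51 and 2:09:13) are reproduced.

```python
MARATHON_KM = 42.195  # official marathon distance


def projected_finish(elapsed_s, covered_km, split_s, split_km=5.0):
    """Elapsed time plus the remaining distance run at the latest split pace.

    Illustrative helper (an assumption about how the projection is made).
    """
    remaining_km = MARATHON_KM - covered_km
    return elapsed_s + remaining_km / split_km * split_s


def hms(seconds):
    """Format a duration in seconds as h:mm:ss, rounded to the nearest second."""
    total = round(seconds)
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


first_split = 15 * 60 + 9    # first 5 km in 15:09
second_split = 15 * 60 + 20  # second 5 km in 15:20

# Whole race at the pace of the first 5 km.
print(hms(projected_finish(0, 0, first_split)))             # 2:07:51
# First 5 km as run, remaining 37.195 km at the second-split pace.
print(hms(projected_finish(first_split, 5, second_split)))  # 2:09:13
```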

Talented runners may be able to pick their race pace by following their gut instinct, but that is not the only option. People training for a marathon usually include a shorter race a few weeks before the big event. Knowing the finishing time at the shorter race, one can predict the right pace for the marathon by leveraging historical data from tens of thousands of runners. The method I propose here is rather simple. Suppose John Doe ran a half marathon today in 1:40:00. This means that John Doe is faster than 75% of runners at the half marathon (see top plot on the figure below). I propose that John Doe should choose a pace for the marathon such that he is faster than the same 75% of runners. In this case it means that John Doe should aim to finish the marathon in 3:39:14.



The method I am proposing can be stated succinctly: under equivalent race conditions and equivalent training, a runner's finishing times fall at the same percentile of the distribution of finishing times, independently of the race distance.
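To make the statement concrete, here is a minimal Python sketch of percentile matching between two race distances. It is not the CrowdPace code. The function names, the linear interpolation between neighboring finishers, and the tiny sample lists are all assumptions made for illustration, and a real prediction would use tens of thousands of finishing times.

```python
from bisect import bisect_right


def quantile_position(sorted_times, t):
    """Position of time t within the sorted sample, as a fraction from 0 to 1."""
    n = len(sorted_times)
    if t <= sorted_times[0]:
        return 0.0
    if t >= sorted_times[-1]:
        return 1.0
    i = bisect_right(sorted_times, t)  # sorted_times[i-1] <= t < sorted_times[i]
    lo, hi = sorted_times[i - 1], sorted_times[i]
    return (i - 1 + (t - lo) / (hi - lo)) / (n - 1)


def value_at(sorted_times, q):
    """Time at fractional position q, interpolating between neighbors."""
    n = len(sorted_times)
    pos = q * (n - 1)
    i = int(pos)
    if i >= n - 1:
        return sorted_times[-1]
    frac = pos - i
    return sorted_times[i] + frac * (sorted_times[i + 1] - sorted_times[i])


def predict_time(t, pivot_times, target_times):
    """Map a time at the pivot distance to the same percentile at the target distance."""
    q = quantile_position(sorted(pivot_times), t)
    return value_at(sorted(target_times), q)


# Hypothetical finishing times in minutes, not real race data.
half_times = [90, 100, 110, 120, 130]
marathon_times = [200, 220, 240, 260, 280]
print(predict_time(105, half_times, marathon_times))  # 230.0
```

The two samples do not need to have the same size, since only the fractional position within each distribution is compared.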

To validate this method, I tested how well it predicts the finishing times of actual runners. For the comparison I looked at runners who ran both the 2011 NYC Marathon and the 2011 Grete's Great Gallop half-marathon. I used the half-marathon as the pivot race to predict the finishing times for the marathon. The following plot shows the results.



These were good races to choose because the weather conditions were fair and similar for both of them. The time span between them, 35 days, is also long enough for runners to run the half-marathon at race effort, but not so long that their fitness level changes significantly between the half and the marathon. The field of the Grete's half was large: of the 4,974 people who ran it, 2,169 also ran the NYC Marathon a month later. The predicted times are evenly distributed around the perfect prediction (49.4% of the data lies above the blue line).

The relationship between the marathon and the half marathon times is nearly linear. It is reproduced, rather accurately, by the following formulas:

M=2.50*HM-29.15 (for men)

and

M=2.41*HM-27.02 (for women),

where M stands for the marathon time and HM stands for the half-marathon time, both in minutes. I hope these were convincing arguments for letting the crowd suggest the pace.
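The two linear fits above can be wrapped in a small helper. This is only a sketch: the coefficients are the ones quoted in this post, while the function names and the `"men"`/`"women"` keys are my own choices for illustration.

```python
# (slope, intercept) pairs taken from the formulas above, times in minutes.
COEFFS = {"men": (2.50, -29.15), "women": (2.41, -27.02)}


def marathon_minutes(half_minutes, group):
    """Estimated marathon time in minutes from a half-marathon time in minutes."""
    slope, intercept = COEFFS[group]
    return slope * half_minutes + intercept


def hms_from_minutes(minutes):
    """Format a duration in minutes as h:mm:ss, rounded to the nearest second."""
    total = round(minutes * 60)
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


# A 1:40:00 half marathon is 100 minutes.
print(hms_from_minutes(marathon_minutes(100, "men")))    # 3:40:51
print(hms_from_minutes(marathon_minutes(100, "women")))  # 3:33:59
```

Keep in mind that these fits come from a single pair of races, so they are best read as a rough guide.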

Wednesday, April 9, 2014

Start slow finish strong

A common piece of advice on race strategy is to start running slower than your goal pace and keep increasing the pace as you cover miles. Start at a conservative pace and finish strong. Most of us have had the experience of starting a race too fast and ending it with some very slow and painful miles. We can actually quantify what difference it makes to run negative splits. This means running in a way that your split times decrease as the race progresses, as opposed to running positive splits, which means slowing down as you cover miles.

For the sake of argument, I will concentrate again on data from the 2014 NYC half-marathon. This race is one of the few events that have split times at 5 km, 10 km, 15 km and 20 km. I will separate runners into 3 groups. The first group will be the negative split times group. A runner is in this first group if he or she consistently ran each 5 km split faster than the previous one. The second group will be the positive split times group. A runner is assigned to this group if he or she ran each 5 km split slower than the previous one. The third group is composed of everyone else: runners who ran some splits faster than the previous one and other splits slower.
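The grouping rule can be written in a few lines of Python. This is a sketch of the rule as described above, not the script used for the analysis. I am assuming that the race results list cumulative times at each checkpoint and that a runner with two equal consecutive splits falls into the mixed group. The function names are illustrative.

```python
def segments_from_checkpoints(cumulative_s):
    """Turn cumulative times at 5, 10, 15 and 20 km into per-segment times."""
    previous = [0] + list(cumulative_s[:-1])
    return [now - before for now, before in zip(cumulative_s, previous)]


def classify_splits(segment_s):
    """Return 'negative', 'positive' or 'mixed' for a list of 5 km split times."""
    if len(segment_s) < 2:
        raise ValueError("need at least two splits to compare")
    pairs = list(zip(segment_s, segment_s[1:]))
    if all(later < earlier for earlier, later in pairs):
        return "negative"  # every split faster than the one before
    if all(later > earlier for earlier, later in pairs):
        return "positive"  # every split slower than the one before
    return "mixed"


# Hypothetical runner: cumulative seconds at 5, 10, 15 and 20 km.
checkpoints = [1500, 2990, 4470, 5940]
splits = segments_from_checkpoints(checkpoints)
print(splits)                   # [1500, 1490, 1480, 1470]
print(classify_splits(splits))  # negative
```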

We can now look at the distribution of finishing times for each of these groups of runners. We find that runners in the "negative splits" group are indeed faster than runners in the other two groups. In fact, the slowest runners are the ones in the "positive splits" group. 



We have not only confirmed the common wisdom that it is best to start a race at a conservative pace and then pick up the pace, we can now quantify how much better that strategy is. From the fitted curves for the distribution of finishing times (previous plot), we see that the typical male runner who runs negative splits finishes the half marathon in a time close to 1h48min, the typical runner who runs mixed splits finishes the half in about 1h54min, and the typical runner who runs positive splits finishes the half in about 2h06min. This indicates that most runners would be able to lower their finishing time by about 6 minutes if they followed a negative splits race strategy. This conclusion would be rock solid if one could show that the ability to run negative or constant split times is mostly determined by race strategy. Although that is the case, training also makes a big difference in our ability to control our pace. In other words, choosing the right race strategy helps, but it only takes you as far as your training allows.

For women we find a similar picture. From the fitted curves in the following plot, the typical finishing times for our three groups of runners are 1h56min, 2h08min and 2h20min, respectively.



Most women who ran mixed split times would be able to lower their finishing times by about 12 minutes if they followed a negative splits strategy for their next half-marathon.

Running negative splits is by no means an easy task. The main difficulty is to gauge your best race pace for a given day. If you initially aim too high, you are bound to slow down by the end of the race. Of the roughly 11,000 women who ran the 2014 NYC half-marathon, only 1,345 (12%) ran negative splits, 7,845 ran mixed splits and 1,824 ran positive splits. Of the almost 10,000 men, 1,462 (about 15%) ran negative splits, 7,004 ran mixed splits and 1,331 ran positive splits.

Friday, April 4, 2014

2014 NYC half-marathon

The popularity of running is undeniable. Road races with fields in the thousands sell out in days, if not hours, and lotteries to get in are becoming the norm. Who are these passionate people who put their toes on the line? Let's narrow down the question and concentrate on the runners in this year's New York City half-marathon. The NYC half is fun to look at, well, because it's NYC, but mostly because it has a massive field that allows for great analytics.

If you ran this past March through Central Park and then Times Square, and you are a woman, chances are you are around 25 years old. If you are a man then you may be in your early 30s. But a plot takes us farther than 13.1 miles, so here we have the distribution by age and gender for the 2014 NYC half (you can access race results at this link).


Women dominated the field with over 11 thousand participants. The number of men was just under 10 thousand. About a third of all women, 3,467 of them to be more precise, were aged between 20 and 30. For the same age group there were only 1,801 men. Not a bad ratio if you are a single guy who missed one of the many Valentine's runs. The number of runners dwindles as age increases, somewhat resembling the actual population of the US, and reminds us of the finiteness of life. What does not resemble the actual population of the US at all is the number of young runners. Is the sharp jump at the age of 20 a result of human physiology or is it cultural? Even more, should young people run long distances at all? I might even point out that the hefty fees for the NYC half could be to blame for at least a fraction of that drop (sure enough, NYRR makes up for it with their Team for Kids initiative).


If you are running a half marathon you should absolutely plan for brunch with your friends after finishing. Just be sure to send invites for about 2 to 3 hours after the race start.



Most runners finish in about 2 hours. The mode for men is around 1:50 and for women about 2:05. It's interesting to see the bump from the pros at the lower end of the time distribution. Not surprisingly, the NYC half attracts a sizable number of professional athletes. For the numerically inclined, the solid fitted lines are log-normal distributions. In other words, the distribution of the logarithm of the finishing times closely resembles a Gaussian. There's a piece of wisdom one can extract from this plot. If you are a guy looking to run your next half-marathon in under 1:30, and lack a running buddy to pace you, try to stick to a female runner after mile 9 or so. You can be sure she knows what she's doing. After all, she's faster than 95% of all other women.
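For those who want to try a log-normal fit on their own race data, here is a minimal Python sketch. It assumes the simplest possible fit, namely taking the mean and standard deviation of the logarithm of the finishing times. This may differ from how the curves in the plot were obtained. The function names and the sample times are made up for illustration.

```python
import math
import statistics


def fit_lognormal(times_min):
    """Return (mu, sigma): mean and standard deviation of log finishing times."""
    logs = [math.log(t) for t in times_min]
    return statistics.mean(logs), statistics.pstdev(logs)


def lognormal_mode(mu, sigma):
    """Most common finishing time (peak of the fitted curve)."""
    return math.exp(mu - sigma ** 2)


def lognormal_median(mu):
    """Finishing time that splits the fitted field in half."""
    return math.exp(mu)


# Hypothetical finishing times in minutes, not real race data.
times = [95, 105, 110, 118, 125, 133, 150, 175]
mu, sigma = fit_lognormal(times)
print(round(lognormal_mode(mu, sigma), 1), round(lognormal_median(mu), 1))
```

For a log-normal distribution the mode is always below the median, which matches the long tail of slower finishers that the plot shows.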


More than one third of all runners are aged 40 or older. To be precise, 7,140 finishers were over 40. I think this is a remarkable number, especially if you compare it with other sports. Probably the second most popular sport among 40+ people is golf (assuming you agree on calling golf a sport). It may be surprising to know that performance is not affected by age as much as one might expect.


The typical man between 20 and 25 years old finishes in around 1:50. Men more than twice that age, between 50 and 55 years old, finish just about 10 minutes later. A similar drop in pace is observed for women. It is not until the age of 65 that the rate of slowing down accelerates.

The previous plot only shows the typical finishing time for the average runner. But your average 20-year-old pal may become part of the fastest 10% by age 65. Not a far-fetched thought given the 45 years of happy running. The great news is that the typical 20-year-old may become an even faster runner at age 65! At age 65 the top 10% of runners finish the half marathon in under 1:45.




I'll end with this plot showing the evolution of finishing times for runners between the top 5% and 15%. I hope you enjoyed this post. Please share your thoughts in the comments.

Wednesday, April 2, 2014

Running

It was the summer of 2008 when for the first time I approached running as a sport, and not just as something to do when one is late for a plane. Six years later running is part of what defines me. My first road race was the 2009 NYRR's Brooklyn Half-Marathon. Most memories from that race are blurry except for the clear thoughts, from about mile 9 or so, that I would never ever engage in such a crazy activity again.

I'm not sure when I changed my mind. It could have been 3 hours after the race, or perhaps the week after, when my legs were not sore anymore. The fact is that 4 months later I was running my second half-marathon, this time in Queens. Later that year, in October, Staten Island got me visiting for my third half. The rest is the story I'll blog about.

Here I will be posting about my running experiences, and to spice things up I will be salting my words with a healthy dose of stats and number crunching. Stay tuned!