The NCAA Division III Cross Country Championships are Saturday in Louisville, Kentucky. As we have for the past two years, we have performed speed rating analysis on a set of meets over the season and are ready to predict what will happen in the championship races. What follows is an analysis of the team competition exclusively; we make no predictions about the overall individual finishes or All-American selections.

Data and Methods

This year, we have calculated speed ratings for races from Week 1's season openers through Week 12's regional championships. These races were selected before the season began based on the schedules of the teams ranked in the preseason coaches poll. We used this sampling strategy to make the task more manageable and organized than in prior years. All of the biggest D3 meets were included, and we added smaller meets to give us a more accurate sense of season-long dynamics and trends.

As for our method, this is not the place for a detailed explanation of how speed rating works. In short, speed rating is a process by which we convert times run on any given cross country course to a standardized time. This standardized time depends, in part, on how much faster or slower a given race ran compared to our baseline race. In all cases, we use the 2015 Mike Woods Invitational hosted by SUNY Geneseo as our baseline comparison. Speed ratings also work the opposite of normal race times: higher is better! To estimate the course correction for each race, we employed a combination of predictive modeling and other automated techniques to produce more consistent estimates across races. Humans still checked that the course corrections produced speed ratings within reason (e.g., an entire race should not come out faster or slower than expected).
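
For intuition, here is a minimal sketch of that conversion in Python. The baseline time, the seconds-per-point scale, and the resulting numbers are all illustrative assumptions, not our actual formula:

```python
def speed_rating(raw_seconds, course_correction, baseline_seconds,
                 secs_per_point=3.0):
    """Convert a raw time to a speed rating (sketch only).

    `course_correction` is the estimated number of seconds the course and
    conditions added relative to the baseline race, and `secs_per_point`
    is an assumed scale. Faster adjusted times map to higher ratings.
    """
    adjusted = raw_seconds - course_correction  # time "as if" run at the baseline
    return (baseline_seconds - adjusted) / secs_per_point

# e.g., a 21:30 (1290 s) on a course estimated 45 s slow, against a
# hypothetical 22:30 (1350 s) reference time
print(speed_rating(1290, 45, 1350))  # 35.0
```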

To predict the outcomes of Saturday's team races, we first calculated the weighted average speed rating for each runner competing. Our weighting scheme gives the most weight to each runner's most recent race, plus extra weight to their best performance. We also calculated the uncertainty around these estimates. Then, using these values, we simulated the race 10,000 times, making random draws from a normal distribution defined by the weighted average and variance for each runner. We then scored each race. This probabilistic forecast lets us consider how likely it is that each team will finish in a given place overall, as well as what their likely score will be.
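
Something like the following sketch captures the idea of the weighted average and variance; the specific weight values here are assumptions for illustration, not our actual scheme:

```python
import numpy as np

def weighted_stats(ratings, recency_weight=3.0, best_weight=2.0):
    """Weighted mean and variance of a runner's season of speed ratings.

    Illustrative weights: every race starts at weight 1, the most recent
    race is up-weighted, and the season's best race is up-weighted.
    """
    ratings = np.asarray(ratings, dtype=float)  # ordered oldest -> newest
    w = np.ones_like(ratings)
    w[-1] *= recency_weight                     # most recent race
    w[np.argmax(ratings)] *= best_weight        # season's best race
    mean = np.average(ratings, weights=w)
    var = np.average((ratings - mean) ** 2, weights=w)
    return mean, var

# e.g., a season of four ratings, best race second, most recent last
mean, var = weighted_stats([138.5, 144.2, 136.8, 141.0])
```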

Simulation Results - Individuals

All of our estimated values for each runner come with their own degree of uncertainty. This uncertainty reflects how consistent each runner has been over the year. Most runners experience ups and downs over the course of a season for a variety of reasons, including treating competitions as workouts, returning from injury, or facing difficult racing conditions. Our speed ratings cannot account for these factors, although we have tried to drop performances that we know for a fact to be workouts. Instead, we rely on our weighting scheme, which gives more value to recent and best performances and less value to older and worse performances. As such, we are left with a measure of the weighted variance in each runner's performances. By accounting for this variance, we can assess a whole set of possible race outcomes and evaluate their likelihood of occurring.

To do this accounting, we simulated the meet 10,000 times, each time taking a random draw from each runner’s unique distribution of possible performances. We then ordered each race and scored the meet. We can then use all 10,000 race results to assess the most likely outcomes on Saturday.
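
In code, that simulate-order-score loop looks roughly like the sketch below. Note a simplification: it scores each team as the sum of its top five places and ignores the rule that only complete teams score and displace:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def simulate_meet(runners, n_sims=10_000):
    """Monte Carlo sketch of the meet.

    `runners` is a list of (team, weighted_mean, weighted_sd) tuples.
    Each simulation draws one rating per runner from a normal
    distribution, ranks the field (higher rating = better place), and
    scores each team as the sum of its top five places.
    """
    teams = np.array([t for t, _, _ in runners])
    means = np.array([m for _, m, _ in runners])
    sds = np.array([s for _, _, s in runners])

    scores = {t: [] for t in set(teams)}
    for _ in range(n_sims):
        draws = rng.normal(means, sds)     # one rating per runner
        order = np.argsort(-draws)         # best rating first
        places = np.empty(len(order), dtype=int)
        places[order] = np.arange(1, len(order) + 1)
        for t in scores:
            scores[t].append(np.sort(places[teams == t])[:5].sum())
    return {t: np.array(s) for t, s in scores.items()}
```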

First, though, let's consider the individual results from the simulations, presented in the following table. The table is ordered by the mean speed rating observed across all 10,000 simulations. To get a sense of how variable some runners are compared to others, we also present the standard deviation of each runner's simulated ratings, along with their median, 25th percentile, and 75th percentile speed ratings from all 10,000 simulations.

One conclusion that should be obvious from the table is that some runners have a much wider range of possible outcomes across our simulations than others. The best example may be Parley Hannan of Ithaca, who has a mean speed rating of 144.26 but a standard deviation of 6.49 speed rating points. This high variance is partly due to Hannan's slower start to the year in a few smaller meets against weaker competition. If she runs her best, we would not be surprised to see her win. Yet we don't actually believe she will run to the low end of her projections, unless her recent improvements have come from all-out tapering. Genny Corcoran of Geneseo is an interesting comparison to Hannan, given the number of times they have competed against each other this year. Corcoran, though, has run aggressively in many more of her meets, reflected in a slightly lower uncertainty. Her standard deviation of 5 is also inflated by a relaxed conference championship meet with no real competition.

As an intermediate step, we can calculate our first team simulation results directly from these individual rankings. These results are presented in the following table, with estimates based on either runners' mean or median speed ratings. For each team, we present their mean speed rating based on each runner's mean or median, as well as the points scored based on these rankings. Clicking on a team reveals the individual data for each of its seven runners.

Off the top, we can tell that Johns Hopkins and MIT are very evenly matched, with Hopkins squeaking out a win on both mean and median speed ratings. We will need to better account for the variability in team performance before making a final determination of which team should be the slim favorite on Saturday. Notably, we have another close race between the third- and fourth-best teams in our projections: Wash U. and Carleton are 3 points apart.

Simulation Results - Teams

Just as individual performances in our simulation come with a variable amount of uncertainty, so do our team performances. As we can see from the results compiled just from the average individual performances in the 10,000 simulations, this year’s championship is likely to be a very close affair, making the variability observed in our simulations all the more important.

As we can see in the next table, Hopkins and MIT are close, but maybe not as close as our initial estimates suggested. Hopkins looks like a slim favorite, though with no margin for error. However, we should remember that Hopkins' best runners have taken a conservative approach to racing. It is entirely possible their actual ability is much better.

To get a better sense of just how close some of these teams are, consider the following figure visualizing the median score across all 10,000 simulations, with bars indicating the 25th and 75th percentiles of scores. The visualization helps us see, first, the gap between the top two teams and the rest of the field. Second, we can see that some teams carry more uncertainty about their ability than others. These teams have larger windows of likely outcomes because their prior performances have not been consistent. This makes it hard to argue against them improving on their predicted rank. However, it remains an open question whether such large variance could actually lead to a surprise team on the podium.

The median score for each team across 10,000 simulations. Bars mark the range from the 25th to the 75th percentile of scores for each team. Teams are ordered by simulation rank.


To get at this question of whether we could see a surprise team on the podium, we can calculate the probability that a team finishes on the podium based on our simulation results. In the figure below, you can see the probability that each team, ordered by simulation rank, earns any podium spot, first place, second place, third place, or fourth place. From the two leaders on, the probability of making the podium declines to just above zero for Dickinson, the tenth-best team in our simulations.

The probabilities each team earns any spot on the podium, first place, second place, third place, or fourth place. Schools are ordered based on simulation rankings. Probabilities calculated based on the number of occurrences of each event across all 10,000 simulations.

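Computing these probabilities from the simulation output is straightforward counting. A sketch, reusing the output format of the hypothetical `simulate_meet` function above:

```python
import numpy as np

def podium_probabilities(team_scores, spots=4):
    """Estimate finishing probabilities by counting events across simulations.

    `team_scores` maps team -> array of simulated scores, one per
    simulation. Lower scores are better. Ties are broken arbitrarily
    here rather than by the real sixth-runner tiebreak.
    """
    teams = list(team_scores)
    scores = np.column_stack([team_scores[t] for t in teams])
    ranks = scores.argsort(axis=1).argsort(axis=1) + 1  # 1 = winning team
    return {team: {"win": np.mean(ranks[:, i] == 1),
                   "podium": np.mean(ranks[:, i] <= spots)}
            for i, team in enumerate(teams)}
```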

Conclusion - Who is Going to Win it All?

Based on the fact that Hopkins already holds a small advantage, and our belief that the advantage will grow with a tapered and peaked performance this weekend, we are selecting Johns Hopkins in a narrow but comfortable win over MIT. Behind them, we will go with our simulations and take Carleton and Wash U., although we could easily be convinced to reverse that order.

Why You Should Listen to Us

We think the refinements to our method warrant serious consideration. Last year, we developed an approach that weighted races over the course of the season differently, based on when in the season they fell and how much they deviated from a runner's average. A bad race received less weight, a good race received more, and the most recent race received the most. However, we think we spread these weights too evenly. Based on exploratory analyses, we believe much more of the weight should go to a runner's most recent race and best race of the season. As such, we've adjusted our methods.

Similarly, we've developed a more consistent approach to calculating speed ratings. In the past, as we developed the software we use to conduct our analyses, we relied heavily on human coding of races, which inevitably introduced certain biases and inconsistencies between races. This year, we relied more on predictive modeling and other automated techniques to generate consistent course corrections. These were still reviewed by humans, but required far less manual judgment than before.
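
We won't detail the exact model here, but one plausible automated approach, sketched below as an illustration rather than our actual pipeline, estimates race effects jointly from runners who appear in multiple races:

```python
import numpy as np

def estimate_race_effects(results):
    """Estimate per-race course corrections from overlapping runners.

    A sketch of one common approach, not necessarily the model we use:
    treat each observed time as runner ability plus a race effect and
    solve the system by least squares. Runners who appear in multiple
    races tie the race effects to a common scale. `results` is a list
    of (runner_id, race_id, seconds) tuples.
    """
    runners = sorted({r for r, _, _ in results})
    races = sorted({q for _, q, _ in results})
    r_idx = {r: i for i, r in enumerate(runners)}
    q_idx = {q: i for i, q in enumerate(races)}

    # Design matrix: one indicator column per runner, one per race.
    X = np.zeros((len(results), len(runners) + len(races)))
    y = np.empty(len(results))
    for row, (r, q, secs) in enumerate(results):
        X[row, r_idx[r]] = 1.0
        X[row, len(runners) + q_idx[q]] = 1.0
        y[row] = secs

    # Drop one race column, pinning that race's effect at zero so the
    # system is identified; in practice you would pin the baseline race.
    coef, *_ = np.linalg.lstsq(X[:, :-1], y, rcond=None)
    effects = np.append(coef[len(runners):], 0.0)
    return dict(zip(races, effects))
```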

Altogether, we see ourselves making progress on these two essential areas of speed rating. The collegiate context is much more difficult to deal with, as there is greater inequality in talent within certain conferences and regions than you would find in most geographically organized high school leagues, especially those in New York State, where speed rating analysis has been most widely developed and employed. This means we know we probably don't have the formula perfectly tuned just yet. But we think we are much closer this year than last, and we hope the results bear that out come Saturday.