The NCAA Division 3 Cross Country Championships are Saturday in Louisville, Kentucky. As we have for the past two years, we have performed speed rating analysis for a set of meets over the season and are ready to predict what will happen in the championship races on Saturday. What follows is an analysis of the team competition exclusively; we do not take into account or make any predictions for the overall individual finishes and All-American selections.

Data and Methods

This year, we calculated speed ratings for 58 races, from Week 1's season openers through Week 12's regional championships. These 58 races were selected before the season began based on the schedules of the teams ranked in the preseason coaches poll. This sampling strategy kept the task more manageable and organized than in prior years. All of the biggest D3 meets were included, and we added smaller meets to give us a more accurate sense of season-long dynamics and trends.

As for our method, this is not the place for a detailed explanation of how speed rating works. We will say, however, that speed rating is a process by which we convert times run on any given cross country course to a standardized time. This standardized time depends, in part, on how much faster or slower a given race ran compared to our baseline race. In all cases, we use the 2015 Mike Woods Invitational hosted by SUNY Geneseo as our baseline comparison. Speed ratings also work the opposite of normal race times: higher is better! To estimate the course correction for each race, we employed a combination of predictive modeling and other automated techniques to produce more consistent estimates across races. Humans still checked that the course corrections produced speed ratings within reason (e.g., an entire race should not come out markedly faster or slower than expected).
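
To make the mechanics concrete, here is a minimal sketch of the conversion in Python. The baseline race and the higher-is-better orientation come from the description above; the anchor constant and the roughly 4.5-seconds-per-point scale are illustrative assumptions, not our production formula.

```python
# A minimal sketch of the course-correction idea, not our exact formula.
# anchor_sec and sec_per_point are assumed constants chosen only to put the
# output on a familiar-looking scale (higher is better).

def speed_rating(raw_time_sec: float, course_correction_sec: float,
                 anchor_sec: float = 2330.0, sec_per_point: float = 4.5) -> float:
    """Convert a raw time on some course into a higher-is-better speed rating.

    raw_time_sec          -- the runner's actual finishing time on this course
    course_correction_sec -- seconds this race ran slow (+) or fast (-) versus the baseline
    """
    standardized_sec = raw_time_sec - course_correction_sec  # baseline-equivalent time
    return (anchor_sec - standardized_sec) / sec_per_point


# Example: a 24:55 (1495 s) on a course judged 40 s slow standardizes to 1455 s.
print(round(speed_rating(1495, 40), 2))  # ~194.4 with these assumed constants
```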

To predict the outcomes from Saturday’s team races, we first calculated the weighted average speed rating for each runner competing. Our weighting scheme gives a lot of weight to each runner’s most recent race, as well as extra weight to their best performance. We also calculated the uncertainty around these estimates. Then, using these values, we simulated the race 10,000 times by making random draws from a normal distribution defined by the weighted average and variance for each runner. We then scored each race. This probabilistic forecast allows us to consider how likely it is that each team will finish in a given place overall, as well as what their likely score will be.
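
The sketch below shows, under assumed weights, how a runner's weighted mean and standard deviation might be computed and then used for the random draws. The recency and season-best multipliers are illustrative stand-ins, not our exact weighting scheme, and the sample season of ratings is made up.

```python
import numpy as np

# A hedged sketch of the per-runner estimates feeding the simulation. The exact
# weights are not spelled out in the text, so the multipliers below are stand-ins.

def runner_estimate(ratings, recency_weight=3.0, best_bonus=2.0):
    """Return the weighted mean and weighted standard deviation of a runner's ratings.

    ratings -- speed ratings in chronological order, most recent last
    """
    ratings = np.asarray(ratings, dtype=float)
    weights = np.ones_like(ratings)
    weights[-1] *= recency_weight               # most recent race counts the most
    weights[np.argmax(ratings)] += best_bonus   # extra weight on the season best
    mean = np.average(ratings, weights=weights)
    var = np.average((ratings - mean) ** 2, weights=weights)
    return mean, np.sqrt(var)


# Each simulated performance is one draw from Normal(weighted mean, weighted sd).
rng = np.random.default_rng(seed=42)
mean, sd = runner_estimate([188.5, 191.0, 186.2, 192.3, 171.2])  # made-up season
simulated = rng.normal(mean, sd, size=10_000)                    # 10,000 simulated races
```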

Simulation Results - Individuals

All of our estimated values for each runner come with their own degree of uncertainty. This uncertainty reflects how consistent each runner has been in their performances over the year. Most runners experience ups and downs over the course of the season for a variety of reasons, including treating a competition as a workout, returning from injury, or facing difficult racing conditions. Our speed ratings cannot account for these factors, although we have tried to drop performances that we know for a fact to be workouts. Instead, we rely on our weighting scheme, which gives more value to recent and best performances and less value to older and worse performances. As such, we are left with a measure of the weighted variance in each runner's performances. By accounting for this variance, we can assess a whole set of possible race outcomes and evaluate how likely each is to occur.

To do this accounting, we simulated the meet 10,000 times, each time taking a random draw from each runner's unique distribution of possible performances. We then ordered the finishers in each simulated race and scored the meet. All 10,000 race results together let us assess the most likely outcomes on Saturday.
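
A single iteration of that scoring step might look like the sketch below, which applies standard cross country scoring; `draws` (runner to simulated rating) and `teams` (runner to school) are hypothetical inputs, not our actual data structures.

```python
# One simulated meet, scored with standard cross country rules: places are assigned
# from the highest simulated rating down, each team scores the sum of its first five
# places, and its sixth and seventh runners displace other teams without scoring.

def score_meet(draws: dict, teams: dict) -> dict:
    order = sorted(draws, key=draws.get, reverse=True)   # higher rating = better finish
    finishers_by_team, place, scores = {}, 0, {}
    for runner in order:
        team = teams[runner]
        n = finishers_by_team.get(team, 0) + 1
        finishers_by_team[team] = n
        if n > 7:
            continue                                     # runners past seven are ignored
        place += 1                                       # scorers and displacers take places
        if n <= 5:
            scores[team] = scores.get(team, 0) + place   # only the first five score
    return scores                                        # lowest total wins
```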

First, though, let's consider the individual results from the simulations, presented in the following table. The table is ordered by the mean speed rating observed across all 10,000 simulations. To get a sense of how variable some runners are compared to others, we also present the standard deviation of each runner's simulated ratings, as well as their median, 25th percentile, and 75th percentile speed ratings from all 10,000 simulations.
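
For any one runner, those table columns are just summary statistics over the 10,000 draws; the sketch below uses a stand-in array in place of the real draws.

```python
import numpy as np

# Stand-in for one runner's 10,000 simulated ratings (in practice, the draws above).
simulated = np.random.default_rng(seed=1).normal(190.0, 3.0, size=10_000)

summary = {
    "mean":   simulated.mean(),
    "sd":     simulated.std(ddof=1),
    "median": np.median(simulated),
    "p25":    np.percentile(simulated, 25),
    "p75":    np.percentile(simulated, 75),
}
```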

One conclusion that should be obvious from looking at the table is that some runners have a much wider range of possible outcomes across our simulations than others. The best example of this may be Sam Pinkowski of Wartburg, who has a mean speed rating of 192.23, but a standard deviation of 6.29 speed rating points. This high level of variance is in part due to the fact that Pinkowski had a notably underwhelming race at his regional meet, running a season worst 171.16. Given that Wartburg competes in the American Rivers Conference, and their conference meet was effectively unrateable and dropped from our analyses, we have very little certainty about what to expect from Pinkowski come Saturday. If he runs his best, we would not be surprised to see him win. But, if he continues to tail off, he could end up well over 100th place. Logan McKenzie of Berea is in a similar boat after a disappointing performance in the South Regional, his worst over the last three years.

As an intermediate step, we can calculate our first team simulation results directly from these individual rankings. Such results are presented in the following table, with estimates based on each runner's mean or median speed rating. For each team, we present their average team speed rating under either measure, as well as the points scored based on these rankings. Clicking on a team reveals the individual data for each of their seven runners.

Off the top, we can tell that North Central and Williams are very evenly matched, with North Central squeaking out a win based on mean speed ratings and Williams winning if median performances are considered instead. We will need to better account for the variability in team performance before making a final determination about which team should be the slim favorite on Saturday.

Simulation Results - Teams

Just as individual performances in our simulation come with a variable amount of uncertainty, so do our team performances. As we can see from the results compiled just from the average individual performances in the 10,000 simulations, this year’s championship is likely to be a very close affair, making the variability observed in our simulations all the more important.

As we can see in the next table, Williams and North Central are nearly indistinguishable across all 10,000 simulations. Their mean scores are just over 2 points apart and their median scores are only 1 point apart! Their scores across all simulations also show similar amounts of variance, with North Central's results just a little more erratic than Williams's. This marginally greater dispersion means that North Central has a slightly lower floor across our simulations, but also a marginally higher ceiling. All things considered, this race appears too close to call.

Interestingly, the race behind Williams and North Central is much more crowded, with several teams clustered in the 250-300 and 315-375 point ranges. Many of these teams have been rated highly by the USTFCCCA Coaches Poll and FloTrack's FloXC Rankings, yet our analysis suggests some slight reordering. Some teams ranked highly in these polls do not appear to be nearly as strong contenders as their evaluations would suggest. In particular, Johns Hopkins, ranked 6th in the USTFCCCA poll and 4th by FloTrack, is only the 15th best team according to our simulations. Our model, though, gives a lot of weight to the most recent results, and Hopkins faced relatively weak competition in both their conference and regional championships. It is entirely possible (indeed, quite likely) that these performances do not reflect 100% efforts from the Hopkins runners. The same could likely be said of SUNY Geneseo and Carnegie Mellon, two other teams that have shown flashes of dominance but have not matched their best performances during this championship season. We would expect, then, that these teams will run closer to their best performances than to their averages come Saturday. Similarly, the more a team races within its region rather than outside of it, the harder it is for us to gauge how accurate their ratings really are. This problem mainly affects teams from the South and West regions, who have to travel farther to participate in large inter-region competitions.

We also want to give some attention to Wartburg's chances. As we can see in the table, Wartburg has the third best average score across our 10,000 simulations. Such a result, though, depends heavily on Sam Pinkowski, whom we discussed earlier, running much as he did earlier this season and late last season. As we noted above, his regional race was a dramatic negative outlier compared to his average. If he runs even 4-5 speed rating points below his average, a fraction of the departure we saw in his regional race, Wartburg would quickly fall off the podium, as their third runner seems unlikely to make the jump to Pinkowski's historical best.

To get a better sense of just how close some of these teams are, consider the following figure, which visualizes the median score across all 10,000 simulations, with bars indicating the 25th and 75th percentiles of scores. The visualization helps us see, first, the gap between the top two teams and the rest of the field and, second, how some teams carry more uncertainty about their ability than others. These teams have larger windows of likely outcomes because their prior performances have not been consistent. That makes it hard to argue against them improving on their predicted rank. However, it remains an open question whether such large variance could actually lead to a surprise team on the podium.

The median score for each team across 10,000 simulations. Bars mark the range from the 25th to the 75th percentile of scores for each team. Teams are ordered by simulation rank.

To get at this question of whether we could see a surprise team on the podium, we can calculate the probability that a team finishes on the podium based on our simulation results. In the figure below, you can see the probability that each team, ordered by simulation rank, earns any podium spot, first place, second place, third place, or fourth place. From the two leaders on, the probability of making the podium declines to just above zero for MIT, the ninth best team in our simulations. However, that probability rises slightly for the next four teams: Chicago, Carnegie Mellon, Geneseo, and RPI. This pattern suggests that these teams have carried a degree of variance in their performances all season that could be masking their true ability. As we've already explained, holding back at conference and regional championships compared to going all out at large interregional meets could produce just such a discrepancy. As such, these teams should not count themselves out of the mix, even if we would not recommend picking them for the podium given the low odds of them actually doing so.
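
Those probabilities are simply event frequencies computed over the simulations; the sketch below assumes a hypothetical `sim_places` array holding each team's finishing place in every simulation.

```python
import numpy as np

# Podium probabilities from simulation output. `sim_places` is an assumed
# (10_000, n_teams) array where sim_places[i, j] is team j's finishing place in
# simulation i, obtained by ranking the team scores from each simulated meet.

def podium_probabilities(sim_places: np.ndarray) -> dict:
    return {
        "podium": (sim_places <= 4).mean(axis=0),   # any of the top four spots
        "first":  (sim_places == 1).mean(axis=0),
        "second": (sim_places == 2).mean(axis=0),
        "third":  (sim_places == 3).mean(axis=0),
        "fourth": (sim_places == 4).mean(axis=0),
    }
```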

The probabilities each team earns any spot on the podium, first place, second place, third place, or fourth place. Schools are ordered based on simulation rankings. Probabilities calculated based on the number of occurrences of each event across all 10,000 simulations.

Conclusion - Who is Going to Win it All?

So, who should you expect to win on Saturday? The best answer is probably to hedge and say one of Williams or North Central. However, this is rather unsatisfying. The race between these two top teams could come down to how well Ryan Cox performs. Last year, Cox finished 6th overall. He appears to have been dealing with some sort of injury that has set back his performances. Yet, if he can produce a big race like he did last year and finish with single-digit points, North Central may not be able to overcome that gap. Nick Gannon and Elias Lindgren also face the tough challenge of not sliding back at all from their projections. Williams could lose roughly 15 points between their fifth runner, Kenneth Marshall, and North Central's, Alec Beutel, so Gannon and Lindgren carry the extra weight of trying to make up that deficit.

If North Central is to win, they will need to hit their peak perfectly. On paper, it appears likely that the team has been holding back in their conference and regional performances. While this suggests that our estimates of their ability are conservative, it does not help us predict just how much they should improve. We would wager that if each North Central runner improves by 2-3 speed rating points, or about 9-13 seconds, without a similar improvement across Williams's whole squad, they may just be able to offset a potential big race from Ryan Cox. Is such an improvement likely? For either squad? The answers here seem to rest in North Central's favor. Williams's third through fifth runners have run consistently within about 3 speed rating points all year. Gannon has a standard deviation of 1.20 speed rating points, the eighth lowest of any runner in Saturday's team competition. Any large improvement from Gannon, then, would come as an absolute surprise. North Central, on the other hand, has only one top-five runner with a standard deviation under 3 points: Gabriel Pommier. If we score the meet based on the 75th percentile of each competitor's speed ratings, North Central ends up with four runners ahead of Williams's third runner.

For this reason, we will be predicting a close North Central victory on Saturday. We believe that North Central’s top-five runners will have the modest improvement across the board necessary to hold off big improvements from Williams’s top two.

Why You Should Listen to Us

This time of year is rife with discussion among cross country fans about who will surprise and who will disappoint at the NCAAs on Saturday. Over the last two years we have channeled that same energy into calculating speed ratings and predicting the team results from those ratings. Last year, our predictions were serviceable. We predicted the eventual winner (North Central), but that was obvious to any moderately informed follower of the sport. We also correctly picked second and third, which we think is pretty good. Our pick for fourth place, Calvin, ended up finishing a disappointing 19th, but only after suffering a bad fall early in the race. (RPI and Claremont-Mudd-Scripps, teams we picked to finish 7 spots better than they did, also got caught up in this fall.) Our pick for the 32nd team, the last in the race, ended up in 21st. On the whole, our average error in rank order was only 4.5 places, a measure we do not think is particularly noteworthy, but also not horrible. In terms of points, our average error was just over 81 points, which again, we think is not bad when teams can score over 600 points.

But moderate success last year is not the real reason you should take our predictions seriously this year. Rather, we think the refinements to our method warrant serious consideration. Last year, we developed an approach that weighted races over the course of the season differently, based on when in the season they fell and how much they deviated from a runner's average. A bad race received less weight, a good race received more, and the most recent race received the most. However, we think we spread these weights out too evenly. Based on exploratory analyses, we think much more of the weight should go to a runner's most recent race and their best race of the season. As such, we've adjusted our methods.

Similarly, we've developed a more consistent approach for calculating speed ratings. In the past, as we developed the software we use to conduct our analyses, we relied heavily on human coding of races, which inevitably introduced certain biases and inconsistencies between races. This year, we relied more on predictive modeling and other automated techniques to generate consistent course corrections. The corrections were still reviewed by humans, but the process ultimately required far less manual judgment.
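
We have not spelled out the automated procedure here, so the following is only one plausible sketch of how a course correction could be estimated automatically: compare runners who appear both in the race being rated and in already-rated races, and take the median gap between their raw and baseline-equivalent times.

```python
import numpy as np

# One plausible, simplified way to automate a course correction, not our exact method.
# A robust median keeps a single workout effort or bad day from skewing the estimate.

def estimate_course_correction(raw_times: dict, standardized_times: dict) -> float:
    """raw_times          -- runner -> raw time (s) in the race being rated
       standardized_times -- runner -> baseline-equivalent time (s) from prior rated races
    """
    shared = raw_times.keys() & standardized_times.keys()
    diffs = [raw_times[r] - standardized_times[r] for r in shared]
    return float(np.median(diffs))   # positive means this race ran slow vs. the baseline
```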

Altogether, we see ourselves making progress on these two essential pieces of speed rating. The collegiate context is much more difficult to deal with, as there is greater inequality in talent within certain conferences and regions than you would find in most geographically organized high school leagues, especially those in New York State, where speed rating analysis has been widely developed and employed. This means we know we probably don't have the formula perfectly tuned just yet. But we think we are much closer this year than last, and we hope the results bear that out come Saturday.