When to Pull the Goalie: Running the Numbers on NHL Goalie Pulls

Jan 15, 2021

For hockey fans, it’s a familiar story. As the clock runs down in the final (3rd) period, teams losing by a goal or two will look to pull their goalie and send out an extra skater in their place. This usually results in a 6-on-5 skater advantage, creating offensive pressure and a late-game push.

This move can be effective, but it dramatically increases the chance of the opposition scoring, since they get to shoot on an empty net. Usually it’s just a matter of time until this happens, at which point it’s pretty much game over. Still, this is a smart risk to take, given that the trailing team would most likely lose anyway if the game were played out at even strength.

The question isn’t whether to pull the goalie, but when. Too early and there’s a big chance of being scored on and missing out on some 5-on-5 opportunities to score. Too late and you won’t maximize the potential of your 6-on-5 advantage.

When should you pull the goalie?

In this post I look at indicators for optimal goalie pull times. Using historical data, I model the odds of scoring as a function of the time when the goalie was pulled in the 3rd period.

I start by discussing some previous work done on this problem.

Then I explain how my training dataset was created, and I’ll walk through some technical details of the models (including some Python code).

Lastly, I discuss the findings.

TL;DR
As discussed in the results section of this post, I found that it’s optimal to pull an NHL goalie when there’s 3:00 left in the period. In this case, you would have 1 in 4 odds of scoring.

Source Code

The source code for this project is available on GitHub. If you notice something wrong below, then you can submit a ticket on the repo or open a pull request.

https://github.com/agalea91/nhl-goalie-pull-optimization

Previous works lack source data and visual aids

Analysts tend to agree [TSN, WSJ, Sportsnet] that pulling the goalie early results in better outcomes. Many of the popular media articles rely on small data sets and report on raw statistics instead of building models.

For example, the Sportsnet article reports:

During the 2015–16 NHL regular season …
Pull between 1:30 — 5:00 remaining — 16 % chance of success
Pull < 1:30 remaining — 10 % chance of success

It would be nice to know the error on each statistic. Assuming N=700 goalie pulls in a season (where 600 of those are in the last 1:30) I can add binomial error estimates:

16 +/- 3% chance of success with 1:30–5:00 remaining
10 +/- 1 % chance of success with < 1:30 remaining

This suggests good confidence that it’s better to pull the goalie before the 1:30 mark.
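The binomial error estimates above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the same split of N=700 pulls (600 in the last 1:30, leaving 100 for the earlier window) and treating each pull as an independent trial. The function name binomial_std_error is my own.

```python
import math


def binomial_std_error(p: float, n: int) -> float:
    """Standard error of an observed proportion p over n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)


# Assumed split from the text: 700 pulls in a season, 600 of them in the last 1:30.
n_total = 700
n_late = 600
n_early = n_total - n_late  # pulls made with 1:30-5:00 remaining

early_err = binomial_std_error(0.16, n_early)
late_err = binomial_std_error(0.10, n_late)

print(f"1:30-5:00 remaining: 16% +/- {100 * early_err:.1f}%")
print(f"< 1:30 remaining:    10% +/- {100 * late_err:.1f}%")
```

Under these assumptions the standard errors come out to roughly 3.7 and 1.2 percentage points, close to the rounded figures quoted above. The exact values depend on how the 700 pulls are assumed to split between the two windows.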

Asness and Brown [2018] have published a model that suggests 6:10 is the optimal goalie pull time for a one-goal deficit.

Included in their paper is a literature review that’s reproduced below:

Source: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3132563

Overall it seems that previous works are lacking in interpretability through visual aids. Charts can also help us identify trends.

Additionally, there was very little access to datasets the studies relied upon. Since I couldn’t find a good training dataset for my model, I went in search of goalie pull times.

The next section describes how I generated the Goalie Pull Dataset. For results, skip down to Pulling Earlier and More Often.

Goalie Pull Dataset

Here’s a link to the goalie pull game time data set from 2003–2019, which was used for the analysis below.

In this section, I’ll explain how I created this dataset and you can see some of the assumptions that went into my algorithm.

To obtain a suitable data source, I parsed goalie pull information directly from the play-by-play game sheets on NHL.com.

Here’s a sample of the result (full file at link above):

From 2003–2007, each goalie pull was recorded as a timestamped row with the description "<#> <NAME>, Goalie Pulled", where <#> and <NAME> label the goalie that was pulled [example].

After finding a row like this, we scan the remainder of the table looking for a goal.

If a goal is found, then we cross-reference the players on the ice to make sure the goalie is not present. This step is important because there’s no data on when goalies return to the net, which happens quite commonly, e.g. in the case of a defensive zone face-off.

Also, I noticed that goalie pulls are recorded when penalties are called, where the goalie goes to the bench for what is usually just a few seconds. To minimize these false positives, I only searched for goalie pulls in the last 5 minutes of game time.

From 2007 onwards, goalie pulls were no longer recorded explicitly on the game sheet [example]. For these games, each row is labeled with the players on ice, so I used this to infer the estimated pull time.

Here’s the simplified algorithm:

pulls = []
for season in seasons:
  for game in season.games:
    goal_scan = False
    for row in game.game_sheet_rows:
      # Look for goalie pull
      if row.is_goalie_pull:
        goal_scan = True
        pulled_goalie = row.pulled_goalie
        pulled_time = row.time
      if goal_scan:
        # There has been a pull, scanning for a goal
        if row.is_goal:
          if pulled_goalie not in row.players_on_ice:
            # We have found a goal scored while the goalie was off the ice
            pulls.append({
              "season_game": [season, game],
              "pulled_time": pulled_time,
              "goal_time": row.time  # time of the goal row
            })

Additionally, I record which team scored the goal (for / against) and track goalie pulls that result in no goal.
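For the 2007-onward game sheets, where the pull has to be inferred from the players on ice, the logic might look something like the sketch below. It is not the project's actual parser: the Row schema, the field names and the infer_pull function are all hypothetical, and it assumes rows are sorted by time and that we already know which team is trailing. It also labels the outcome (goal for, goal against, no goal) as described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    """One play-by-play row (hypothetical, simplified schema)."""
    time: float                     # minutes elapsed in the 3rd period
    trailing_goalie_on_ice: bool    # is the trailing team's goalie listed on ice?
    goal_by: Optional[str] = None   # "trailing", "leading" or None


def infer_pull(rows: List[Row], window_start: float = 15.0) -> Optional[dict]:
    """Infer the first goalie pull in the last five minutes and its outcome."""
    pulled_time = None
    for row in rows:
        if row.time < window_start:
            continue  # skip brief early absences, e.g. delayed penalties
        if row.trailing_goalie_on_ice:
            pulled_time = None  # goalie is (back) in net, so reset
            continue
        if pulled_time is None:
            pulled_time = row.time  # first row without the goalie
        if row.goal_by is not None:
            outcome = "goal_for" if row.goal_by == "trailing" else "goal_against"
            return {"pulled_time": pulled_time,
                    "goal_time": row.time,
                    "outcome": outcome}
    if pulled_time is not None:
        # The game ended with the net still empty
        return {"pulled_time": pulled_time, "goal_time": None, "outcome": "no_goal"}
    return None


rows = [
    Row(14.0, False),                  # outside the 5-minute window, ignored
    Row(17.9, True),
    Row(18.5, False),                  # goalie no longer on the ice
    Row(19.1, False),
    Row(19.4, False, goal_by="leading"),
]
result = infer_pull(rows)
print(result)
```

Resetting the pull when the goalie reappears is a design choice of this sketch, made so that a return to the net for a defensive zone face-off does not get counted as part of the same pull.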

Pulling Earlier and More Often

Let’s start with looking at some overall trends and statistics in our newly obtained data set.

The full source code for this analysis is available in a Jupyter Notebook here: https://nbviewer.jupyter.org/github/agalea91/nhl-goalie-pull-optimization/blob/master/notebooks/src/3_exploratory_analysis_2003-2019.ipynb

Goalie Pulls Trending Up

Plotting the goalie pull count histogram over time, we see a pretty even distribution through each season (as would be expected) with gaps during the off-seasons and some lockout years.

We see a gradual increase in the total number of goalie pulls over this time. Expansion teams entering the league would naturally push the total counts up, but we also see the average number of pulls per game increasing:

Marginal Gains on Positive Outcomes

Below I split this chart out based on the outcome: goal for, goal against or no goal.

The increase in goalie pulls over the years has only resulted in slight increases of goals for, with most of the additional pulls resulting in a goal against.

You might have noticed the blue outlier point in the chart above this one for the 2015/2016 season. Here we see that it is mostly due to an unusually high number of goals against (red), as opposed to goals for (blue) or no goal outcomes (yellow). Perhaps this poor return on good outcomes explains why the next year we see average pulls per game return to the trend.

Interestingly, we see a downwards trend in goalie pulls where no goal is scored (i.e. the game ends).

Emerging Trend to Pull Goalies Earlier

From 2003–2013, average goalie pull times gradually increased from about 1.2 to 1.3 minutes remaining in the game. Now, as of 2019, goalies are being pulled an average of 45% sooner at 1.9 minutes remaining.

This is illustrated in the following box plot of average goalie pull times:

The average time remaining for each season is marked with a solid black line through the middle of the bar, while the upper whiskers give a sense of the variation in each segment.

In recent years we can see an increasingly large contribution of relatively early goalie pulls (e.g. above 3 minutes remaining). Historically, these points were just outliers.

Goalie Pulls are Left Skewed

As would be expected, goalie pulls occur more frequently as the game clock winds down.

Labelling by outcome, we see that late game pulls tend to have no-goal outcomes (yellow). Not having normalized the histograms, we can visualize the high likelihood of a goal against (red), compared to goal for (green).

Note the sparsity of data below ~17.5 mins and above ~19.5 mins. This will end up leading to huge uncertainty in the likelihood calculations below.

“Goals for” Lead “Goals Against”

Whereas the charts above represent goalie pull times, below we look at the times when goals were scored following a pull.

These tend to occur very late in the game, with goals against (red) slightly lagging goals for (green). This is logical given that teams intentionally pull the goalie when they are in a strong offensive position and usually get a scoring opportunity before the opposition does.

Overall, exploratory analysis reveals that we have a highly noisy dataset where statistically significant optimization will be difficult, especially due to a lack of data for early pulls (prior to ~2.5 min remaining).

Despite this, I feel that our dataset showed some interesting trends and yielded valuable insights. It is also much larger than data sets used in other studies and open source.

I encourage others to study and help validate the dataset, which is available on GitHub.
New seasons can be added to the analysis by forking the project and expanding the source code.
⭐ Pull requests are welcome
⚠️ Please respect the API quotas when using this code

Building a Bayesian Model

Bayesian statistics tends to be well suited for sports modeling, in part because of the scarcity of training data. In general, Bayesian methods are widely applicable, relatively easy to work with and helpful for error estimation (as seen below).

The full source code for this modeling is available in a Jupyter Notebook here: https://nbviewer.jupyter.org/github/agalea91/nhl-goalie-pull-optimization/blob/master/notebooks/src/4_bayes_gamma.ipynb

Since I am interested in the optimal pull time, I’ll first fit the outcome (goal for, goal against, no goal) distributions above. The Gamma distribution is a suitable choice for modeling the data:

where t is the time elapsed in the 3rd period, alpha and beta are parameters to be determined using Bayes’ rule, and P will be the posterior probability of an outcome.
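For reference, in the shape-rate parameterization the Gamma density can be written as shown below. Using this parameterization is an assumption on my part; it is the one PyMC3's Gamma distribution accepts through its alpha and beta arguments.

```latex
P(t;\, \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, t^{\alpha - 1}\, e^{-\beta t},
\qquad t > 0,\ \alpha > 0,\ \beta > 0
```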

Using our full 2003–2019 goalie pull dataset X, I’ll solve for the probability of the outcome y, i.e. P(y|X; t). This is done computationally using PyMC3’s Markov Chain Monte Carlo (MCMC) algorithm. The outcomes of interest are y={goal for, goal against, no goal}.

I set up uniform priors on the Gamma parameters alpha and beta, and solve for these using MCMC and our observations on Gamma. With PyMC3 handling the heavy lifting, the code for this is deceptively simple. For more details about the calculation, you can check out the source code.
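To give a feel for what the sampler is doing, here is a self-contained sketch of the same idea: uniform priors on alpha and beta, a Gamma likelihood, and MCMC to draw posterior samples. It is not the PyMC3 model from the notebook. It swaps in a hand-rolled random-walk Metropolis sampler, runs on synthetic data, and every name and setting in it (the prior bounds, step size and number of steps) is an assumption chosen for illustration.

```python
import math
import random


def log_posterior(alpha, beta, n, sum_x, sum_log_x, upper=50.0):
    """Gamma log-likelihood (shape alpha, rate beta) with flat priors on (0, upper)."""
    if not (0.0 < alpha < upper and 0.0 < beta < upper):
        return -math.inf  # outside the uniform prior support
    return (n * alpha * math.log(beta) - n * math.lgamma(alpha)
            + (alpha - 1.0) * sum_log_x - beta * sum_x)


def metropolis(data, n_steps=20000, step=0.1, seed=0):
    """Random-walk Metropolis over (alpha, beta); returns every visited state."""
    rng = random.Random(seed)
    n = len(data)
    sum_x = sum(data)
    sum_log_x = sum(math.log(x) for x in data)
    alpha, beta = 1.0, 1.0
    current = log_posterior(alpha, beta, n, sum_x, sum_log_x)
    samples = []
    for _ in range(n_steps):
        a_new = alpha + rng.gauss(0.0, step)
        b_new = beta + rng.gauss(0.0, step)
        proposed = log_posterior(a_new, b_new, n, sum_x, sum_log_x)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, proposed - current)):
            alpha, beta, current = a_new, b_new, proposed
        samples.append((alpha, beta))
    return samples


# Synthetic observations (NOT the goalie pull data): shape 4, rate 2.
# random.gammavariate takes a scale parameter, hence 1 / rate.
data_rng = random.Random(1)
data = [data_rng.gammavariate(4.0, 1.0 / 2.0) for _ in range(500)]

samples = metropolis(data)
kept = samples[5000:]  # discard burn-in
alpha_mean = sum(a for a, _ in kept) / len(kept)
beta_mean = sum(b for _, b in kept) / len(kept)
print(f"alpha ~ {alpha_mean:.2f}, beta ~ {beta_mean:.2f}")
```

The standard deviations of the kept alpha and beta samples are the kind of quantity used later for the error bands. In practice a library sampler such as PyMC3's is a better choice than this toy version, since it handles tuning and convergence diagnostics for you.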

MCMC Samples

Looking at the trace of our MCMC calculation for alpha and beta, we see convergence rather quickly:

When performing this calculation, PyMC3 also samples P(y|X; t) for us. Below I plot those samples along with the theoretical distributions (i.e. using values I calculated for alpha and beta).

Normalizing these as per population sizes in the training data, we see the following charts:

Early Pulls Yield More Goals

Here we can see maxima to the left of the 19 minute mark for goals and to the right of the 19 minute mark for no goals. These represent the most common times goalies are pulled for each outcome. The exact values are:

+--------------+----------+--------------+---------+
|              | Goal For | Goal Against | No Goal |
+--------------+----------+--------------+---------+
| Time Elapsed | 18.6     | 18.7         | 19.3    |
+--------------+----------+--------------+---------+
| Game Clock   | 01:24    | 01:19        | 00:41   |
+--------------+----------+--------------+---------+
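The two rows of the table are related by a simple conversion from minutes elapsed to time remaining on a 20-minute period clock. A minimal sketch (the helper name game_clock is my own):

```python
def game_clock(elapsed_minutes: float, period_length: float = 20.0) -> str:
    """Convert minutes elapsed in a period to an MM:SS time-remaining string."""
    remaining_seconds = round((period_length - elapsed_minutes) * 60)
    minutes, seconds = divmod(remaining_seconds, 60)
    return f"{minutes:02d}:{seconds:02d}"


for label, elapsed in [("Goal For", 18.6), ("Goal Against", 18.7), ("No Goal", 19.3)]:
    print(f"{label}: {game_clock(elapsed)}")
```

With the rounded elapsed times from the table this gives 01:24, 01:18 and 00:42. The one-second differences from the table's 01:19 and 00:41 presumably come from the table being computed on unrounded times.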

Successful Outcomes are Unlikely

Looking at the cumulative distributions tells us about the average outcome rates:

On the right hand side of the chart, we see that no goal outcomes are about 1.6 times as likely as goal against outcomes, which in turn are about 2.5 times as likely as a goal for (the success case). This is summarized as follows:

+------------------+----------+--------------+---------+
|                  | Goal For | Goal Against | No Goal |
+------------------+----------+--------------+---------+
| Mean Probability | 0.13     | 0.33         | 0.53    |
+------------------+----------+--------------+---------+

Odds of Success are 20% if Pulled Early

In order to determine the optimal pull time, I re-normalize the posterior probabilities such that each time slice (along the y-axis) adds up to one. This way I can see how the odds of each outcome fluctuate over time.

Mathematically this is done by multiplying P(y|X; t) with a function c(t), as defined by:

Re-normalization function
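Read literally, the requirement that each time slice adds up to one implies c(t) = 1 / (sum over outcomes y of P(y|X; t)). The sketch below applies that to three population-weighted Gamma densities. The alpha and beta values are illustrative placeholders picked so the peaks fall near the elapsed times in the table above, not the fitted parameters, and using the mean probabilities as population weights is also an assumption.

```python
import math


def gamma_pdf(t, alpha, beta):
    """Gamma density with shape alpha and rate beta, computed in log space."""
    log_p = (alpha * math.log(beta) - math.lgamma(alpha)
             + (alpha - 1.0) * math.log(t) - beta * t)
    return math.exp(log_p)


# Hypothetical (alpha, beta, population share) per outcome. These are
# illustrative placeholders, NOT the fitted values from the model.
PARAMS = {
    "goal_for": (1500.0, 80.6, 0.13),
    "goal_against": (1500.0, 80.2, 0.33),
    "no_goal": (3000.0, 155.4, 0.53),
}


def outcome_odds(t):
    """Population-weighted densities at elapsed time t, rescaled to sum to one."""
    weighted = {y: share * gamma_pdf(t, a, b) for y, (a, b, share) in PARAMS.items()}
    c_t = 1.0 / sum(weighted.values())  # re-normalization factor c(t)
    return {y: c_t * p for y, p in weighted.items()}


odds = outcome_odds(19.5)
print({y: round(p, 3) for y, p in odds.items()})
```

Because c(t) is recomputed at every t, the three curves always sum to one, which is what lets them be read as the odds of each outcome given a pull at that time.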

The result is as follows. Keep in mind that the x-axis corresponds to the time when the goalie is pulled. For example, if pulling the goalie at t=19 min (01:00 game clock) there’s a 30% chance of a goal against outcome.

This chart leads to several interesting observations:

The odds of a goal for are ~20% up until the 02:00 mark (peaking at 03:00). Then they approach zero gradually through 02:00–01:00 remaining, and more rapidly in the final minute.
Odds of a goal against drop off linearly up to the 02:00 mark, falling from a high of ~60% to ~40%. From 02:00 onwards they follow the same trend as goals for.
Odds of no goal start low and increase exponentially as the game clock ticks down.
If pulling the goalie with 30 seconds left, the odds are 5% goal for, 15% goal against and 80% no goal.

Outcomes are Uncertain for Very Early Pulls

When interpreting this chart we must be careful to think about the high statistical uncertainty associated with earlier pulls. Using the standard deviation of the alpha and beta MCMC samples (seen above in the trace plots), we can perform error propagation to estimate these uncertainties:

Error propagation with partial derivatives
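A standard first-order form of this propagation is sketched below. It assumes the alpha and beta samples are treated as uncorrelated (a covariance term would be added otherwise) and uses the shape-rate Gamma density for the partial derivatives; both are assumptions rather than a transcription of the original calculation.

```latex
\sigma_P^2(t) \approx
\left( \frac{\partial P}{\partial \alpha} \right)^{2} \sigma_\alpha^{2}
+ \left( \frac{\partial P}{\partial \beta} \right)^{2} \sigma_\beta^{2},
\qquad
\frac{\partial P}{\partial \alpha} = P \left( \ln \beta - \psi(\alpha) + \ln t \right),
\qquad
\frac{\partial P}{\partial \beta} = P \left( \frac{\alpha}{\beta} - t \right)
```

Here \psi is the digamma function, and \sigma_\alpha and \sigma_\beta are the standard deviations of the MCMC samples.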

This results in the following error band estimates:

As expected, uncertainty plays a large role for early pull times, and odds for times earlier than 03:45 cannot be accurately distinguished. Note that the singular points are a result of error propagation with partial derivatives and should not be interpreted literally.

Look to Pull ASAP after the game clock hits 03:00

Following from the result above, we can calculate the odds of success when pulling the goalie at time t in the 3rd period.

The odds of success peak at 26% ± 4% at the 03:00 mark on the game clock. In other words, pulling the goalie with 3 mins left in the 3rd period has historically yielded roughly a 1 in 4 chance of success.

Following the line over to the right, we see the odds of success drop to zero as the game clock winds down. Like the chart above, we have very little statistical confidence in our model for earlier goalie pulls, due to a lack of training data.

Conclusion

It’s generally well accepted that so-called “analytics” tells us we should be trying to pull the goalie earlier than was previously done.

This work supports this view through use of visual aids and models of goalie pull results that vary as a function of time left in the game.

The dataset and statistical method used for this work are open source, and I hope they can influence future research on the subject.

Thanks for reading 🏒
- Alex
alexgalea.ca

Special thanks to Willem Klumpenhouwer @wklumpen for reviewing this work and offering very helpful advice.

Please direct technical questions, comments or concerns through GitHub’s issue tracker.