Predicting PGA Golfers’ Approach Shot Location


We’re launching something new here at BigML:  a highlighted Model of the Week.  Each week we’ll detail the ins and outs of a model that has been built and shared on the BigML gallery.

Have you built a model that you’d like to share with the BigML community?  If so, let us know and if we like it we’ll feature your work on the BigML blog!

PGA Tour logo (Photo credit: Wikipedia)

What is the objective of the model?
With the PGA season drawing to a close, we wanted to predict the outcome of approach shots by professional golfers at the PGA Tour Championship golf tournament, which is played every year at the East Lake Golf Club in Atlanta, GA, and this year takes place September 19-22.  You can view the full model here.

What is the data source?
The PGA Tour made ShotLink data available for research purposes, and you can download the data set here.

What was the modeling strategy?
To prepare the data for modeling, we first converted the .txt file to a .csv, and then filtered the data so that the only entries were approach shots (distances under 250 yards) on par 4 holes.  In addition, we trimmed the fields to those most relevant to the study, thereby eliminating redundant or irrelevant data.  Once the data source was uploaded to BigML, we iterated through a number of models before settling on the key fields for the model.
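The preparation steps above can be sketched in a few lines of Python. This is only an illustration, not the script we used: the column names, the semicolon delimiter, and the sample rows are all made up for the example, and the real ShotLink export may be laid out differently.

```python
import csv
import io

# Hypothetical column names; the real ShotLink export may use different ones.
KEEP_FIELDS = ["hole", "from_location", "distance_to_pin_yards", "to_location"]
MAX_APPROACH_YARDS = 250  # cutoff for "approach shot" used in this post


def filter_approach_shots(txt_lines, delimiter=";"):
    """Yield trimmed rows for shots under 250 yards on par 4 holes.

    The delimiter is an assumption about the .txt layout.
    """
    reader = csv.DictReader(txt_lines, delimiter=delimiter)
    for row in reader:
        if int(row["par"]) != 4:
            continue  # keep par 4 holes only
        if float(row["distance_to_pin_yards"]) >= MAX_APPROACH_YARDS:
            continue  # too far out to count as an approach shot
        # Drop every column that is not needed for modeling.
        yield {name: row[name] for name in KEEP_FIELDS}


def write_csv(rows, out_file):
    """Write the filtered rows as comma-separated values with a header."""
    writer = csv.DictWriter(out_file, fieldnames=KEEP_FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Invented sample data, for illustration only.
SAMPLE_TXT = """hole;par;from_location;distance_to_pin_yards;to_location;player
1;4;Fairway;152;Green;A
2;3;Tee Box;210;Green;A
3;4;Tee Box;440;Fairway;B
3;4;Primary Rough;195;Bunker;B
"""


def convert(sample_text):
    """Run the whole .txt -> filtered .csv pipeline in memory."""
    out = io.StringIO()
    write_csv(filter_approach_shots(io.StringIO(sample_text)), out)
    return out.getvalue()


print(convert(SAMPLE_TXT))
```

With real files, the same two functions would take an open .txt file for reading and an open .csv file for writing in place of the in-memory buffers.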

What were the fields?
Fields selected for this model were:  hole number (there are 12 different par 4 holes), from location (fairway, rough, bunker, etc.), pre-shot distance to pin (in yards), ball lie (good or unknown), elevation of ball lie (level, or ball above/below feet), and slope of ball lie (level, uphill/downhill).  Our objective field was the “to location” (green, fairway, primary rough, bunker, water, etc.).  In total there were 1,435 instances.

What did we find?
While the top node of the tree was the “from location” (primary rough or not primary rough, the latter most likely being the fairway), the field with the greatest importance (61.5%) was distance to pin.  It seems that even professionals are generally better off being nearer the pin (even in a poor lie) than farther back.


Using the sunburst view, we were able to see that the most confident prediction (87.55%) was that an approach shot would reach the green if it came from the fairway, with a good lie, from between 54 and 121 yards.

[Image: sunburst view of the PGA approach shot model]

As we can sympathize with the weekend hacker more than the skilled professional golfer, we naturally wanted to see what was most likely to cause someone to hit into a sand bunker on their approach shot.  The most confident ‘bunker’ scenario is shown below: a golfer on the 10th hole, approaching from the rough from between 190 and 197 yards, is predicted to land in the bunker with 51% confidence (although this prediction rests on only 4 instances).
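A figure of 51% backed by only 4 instances is about what you would get if the confidence were the lower bound of a Wilson score interval and all 4 shots in that node ended up in the bunker. Both of those are assumptions on our part for this sketch; the post itself does not state how the number was computed or how the 4 instances were distributed. The function name below is ours, not part of any BigML API.

```python
import math


def wilson_lower_bound(successes, total, z=1.96):
    """Lower bound of the Wilson score interval for a proportion.

    z=1.96 corresponds to a 95% interval (an assumption for this sketch).
    """
    if total == 0:
        return 0.0
    p = successes / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    return (centre - margin) / (1 + z2 / total)


# Assumed scenario: 4 of 4 shots in the node landed in the bunker.
print(round(wilson_lower_bound(4, 4), 4))  # 0.5101

# The same 100% hit rate with ten times the data would score much higher.
print(round(wilson_lower_bound(40, 40), 4))
```

The point of the sketch is that a small node is penalized heavily: even a perfect record over 4 shots only earns about 51%, which is why a low instance count deserves the caveat above.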


We encourage you to clone the dataset to your own account, and start running your own models and predictions.  Instead of the approach shot location, you can change the objective field to predict distance to pin, or you can include additional fields such as player name to see which golfers were most likely to be successful on their approach shots.

If you’re tuned into the PGA Tour Championship later this month, impress your friends by predicting the outcomes of approach shots before they happen!
