At the start of Ben Lindbergh and Travis Sawchik's new book, The MVP Machine, the authors focus on players who have drastically improved their game by embracing data. Checking in on Trevor Bauer during the winter of 2018, Lindbergh and Sawchik tell the story of Bauer's exhaustive data collection from the mound, using high-definition photos of his delivery and the resulting pitch.
There have been many debates and great articles written about the arm slot. From The Hardball Times' article on varying arm slots and the pitch types they create to Graeme Lehman's extensively detailed breakdown of biomechanics, there is a lot to be uncovered in the angles. Here, I'll detail how a deep-learning model could potentially give great insight into a pitcher's future and valuation.
What is deep-learning?
Deep-learning is a branch of machine learning inspired by the human brain and the firing neurons that let us transmit information all over our bodies. Deep-learning models are built from neural networks, which pass data through a vast network of similar artificial neurons and make decisions based on their analysis. For more on deep-learning, visit the best resource for such things, Machine Learning Mastery.
One type of deep-learning network is called a convolutional neural network (CNN). This network is best at what is called computer vision. Basically, computer vision is a form of AI that allows a computer to interpret images the way a human might. Here's a version of this type of network that I created to analyze images of clothing and determine which category of clothing (hat, shoe, coat, shirt) each one belongs to. The network takes in an image, interprets it pixel by pixel, does some serious mathematics and then spits out a category, like "This is a shirt" or, in the case of an image of a pitcher, "This is a pitcher who is likely to develop a shoulder issue in the next 6 months if they do not change their arm slot."
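To make "interprets it pixel by pixel" a little more concrete, here is a minimal, pure-Python sketch of the core operation inside a CNN's convolutional layer: a small filter (kernel) slides across the image and produces a feature map that lights up wherever a pattern appears. The 4x4 "image" and the hand-picked edge-detecting kernel are made-up assumptions for illustration; a real CNN learns its kernel values during training and stacks many such layers before the final "This is a shirt" step.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    Like most deep-learning libraries, this computes cross-correlation,
    which is what CNN layers call "convolution".
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch currently under the kernel.
            total = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


def relu(feature_map):
    """Zero out negative responses, the usual activation after a conv layer."""
    return [[max(0, v) for v in row] for row in feature_map]


# Toy 4x4 grayscale image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1] for _ in range(4)]
# Hand-picked kernel that responds to dark-to-bright vertical edges.
edge_kernel = [[-1, 1],
               [-1, 1]]

print(relu(conv2d(image, edge_kernel)))  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The middle column of 2s marks where the edge sits in the image. In a network trained on pitching deliveries, learned kernels would respond to patterns like limb angles instead of a simple edge.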
What can a CNN do?
Let's take a hot topic: injuries. Wouldn't it be nice if MLB organizations could better predict the likelihood of a young pitching prospect developing a serious arm injury? Think of all the young prospects who show so much potential only to be sidelined for two years with a UCL injury. Could we have predicted Michael Kopech's injury without a deep-learning model? Probably. But what about all the other young pitching prospects who show so much promise? How can a CNN help?
Going back to the story of Trevor Bauer and his off-season workouts, think about all the images he was collecting. Let's imagine that over the course of the winter, Bauer and a number of other pitchers collected thousands of images. What if those images were saved and stored? Who would do that? Well, a place like Driveline Baseball, the gym where Bauer was working on his mechanics, could do something like that. But the question is, what do you do with this dataset of images? Time to let the CNN shine!
First, we need a baseline: a database of pitching-delivery images of pitchers who later developed a shoulder issue or a UCL issue, needed Tommy John surgery, or suffered some other common pitching injury. We should also have a large dataset of pitchers who did not develop an injury. We would give each pitcher's images a category, either a type of injury or no injury, and then train the network on these categorized images. If we labeled images with no injury as 0, we could say 1 is a UCL injury, 2 is a shoulder issue, 3 is a forearm issue and so forth. Next, we could use this network to help determine an upcoming player's valuation. Do they have a delivery that our network categorizes as a potential injury (a 1, 2 or 3) candidate? If so, can we get the young pitcher to alter their delivery to save their arm and then re-analyze?
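The 0/1/2/3 labeling scheme maps naturally onto a CNN's output layer, which produces one raw score per category. As a minimal sketch, assuming a trained network has already produced four raw scores for a single delivery image (the numbers below are invented, not output from a real model), the scores can be turned into probabilities with a softmax, the most likely category picked, and any injury category flagged. `INJURY_LABELS`, `classify_delivery` and `flag_for_review` are hypothetical helper names.

```python
import math

# Label scheme from the post: 0 = no injury, 1 = UCL, 2 = shoulder, 3 = forearm.
INJURY_LABELS = {
    0: "no injury",
    1: "UCL injury",
    2: "shoulder issue",
    3: "forearm issue",
}


def softmax(scores):
    """Turn raw network scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_delivery(scores):
    """Return (label, label name, probability) for the most likely category."""
    probs = softmax(scores)
    label = max(range(len(probs)), key=lambda k: probs[k])
    return label, INJURY_LABELS[label], probs[label]


def flag_for_review(label):
    """Any injury category (1, 2, 3, ...) flags the delivery for a closer look."""
    return label != 0


# Hypothetical raw scores for one delivery image (invented, not from a real model).
scores = [0.2, 2.1, 0.4, -1.0]
label, name, prob = classify_delivery(scores)
print(label, name, round(prob, 2), flag_for_review(label))
```

A flagged delivery is where the "alter the delivery and re-analyze" loop would start: new images go back through the same network and the new label is compared with the old one.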
Can we devalue those players who are unable to change their delivery and keep driving down the road to injury? A computer-vision CNN like this would be very beneficial to MLB organizations evaluating draft picks, developing scouting and valuing free agents. Of course, as with all big data projects, the requirements are substantial, but the technology to acquire and analyze this data is already present in many baseball companies and organizations. Just let me know if you would like it done for yours. I'm open to working weekends.
I love good defense. Watching a center fielder chase down what should have been a blooped-in single, and seeing the shocked reaction of the baserunner as he turns and realizes he's out, is priceless. That classic snag, with one hand in the dirt and the rest of the shortstop's body flying through the air, is truly my favorite. I know what people say about the excitement of a home run, and I get it: the rifle-like crack of bat on ball, closely followed by fans standing and cheering and spilling and spitting! God, I'm going to miss baseball this winter!
As the season comes to a close, we celebrate more than just home runs. We celebrate and award players for everything they do on and off the field. With that, it's nearly time to honor the best defensive players of the year with the Rawlings Gold Glove Award. There's nothing like having a Gold Glover on your team and being able to watch them hold it down in the field all season long.
The Rawlings Gold Glove Award
As with many awards, managers and team coaches get to vote. Managers can't vote for players on their own team, and they have to stay within their own league (AL/NL). In addition, they have to vote for players who qualify (in most cases, at least 713 total innings) as laid out by Rawlings. It's nice to have the men who are closest to the game voting and handing out these awards. But there should also be some quantifiable way to determine who is deserving. According to Rawlings, 25% of the vote is left up to metrics. Through the SABR Defensive Index, advanced analytics are now built into the award. This index includes:
- Defensive Runs Saved (DRS)
- Ultimate Zone Rating (UZR)
- Runs Effectively Defended
- Defensive Regression Analysis
- Total Zone Rating
Sometimes players just jump over the gold and go platinum.
What about Machine Learning?
What if we could take that 25% of the vote coming from data and boost it? What if we still left the other 75% up to a vote among the coaches, but were able to give them a pool of the most qualified players based on the metrics? What if we learned from the past to predict the future? Let's let computers create a pool of players each year to vote from. Okay, sure, old-school coaches and managers could still vote outside the pool of candidates if they felt the computer just doesn't know what it's talking about. Just check the 'Other' box and fill in your vote. Here's how it's done. Using Fangraphs.com's leaderboard, I downloaded standard and advanced defensive metrics from 2002 on. The data starts in 2002 because that is the first season in which advanced metrics such as UZR, DFS and BIZ were recorded. The model was trained on the following metrics:
If you compare the features in this model with the SABR Defensive Index used to evaluate Gold Glove (GG) candidates, you can see how much more detailed and complete a picture the ML model works from. Next, I attached a target column to this defensive fielding data indicating whether the player was awarded a Gold Glove at the end of the season (1) or not (0). Then we let the computer learn from these metrics which players were awarded a GG, and voila! We have a trained model. The results were right in line with the SABR Defensive Index rankings (at least through August 18th, as the metrics had not yet been updated to reflect the end of the season).
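The attach-a-target-then-train step can be sketched in pure Python. This is a minimal sketch under several assumptions: logistic regression stands in for whatever model the linked notebook actually uses, only two of the metrics (DRS and UZR) are included, and the five fielder-seasons and the `winners` set are invented toy numbers, not real statistics. `attach_target`, `train_logistic` and the other helpers are hypothetical names, not code from the notebook.

```python
import math


def attach_target(rows, winners):
    """Add a 0/1 'gold_glove' column: 1 if (player, season) won the award."""
    return [dict(row, gold_glove=int((row["player"], row["season"]) in winners))
            for row in rows]


def standardize(X):
    """Scale each feature column to mean 0, standard deviation 1."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]
    return scaled, means, stds


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=1000):
    """Fit logistic-regression weights with plain batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Model score: estimated probability of winning a Gold Glove."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Invented toy rows (NOT real player stats): DRS and UZR for five fielder-seasons.
rows = [
    {"player": "A", "season": 2018, "drs": 15, "uzr": 10.0},
    {"player": "B", "season": 2018, "drs": -6, "uzr": -4.0},
    {"player": "C", "season": 2018, "drs": -2, "uzr": -1.5},
    {"player": "D", "season": 2017, "drs": 18, "uzr": 12.0},
    {"player": "E", "season": 2017, "drs": -8, "uzr": -5.0},
]
winners = {("A", 2018), ("D", 2017)}

labeled = attach_target(rows, winners)
X_raw = [[r["drs"], r["uzr"]] for r in labeled]
y = [r["gold_glove"] for r in labeled]
X, means, stds = standardize(X_raw)
w, b = train_logistic(X, y)
for r, x in zip(labeled, X):
    print(r["player"], r["gold_glove"], round(predict_proba(w, b, x), 3))
```

Standardizing first keeps metrics on different scales (runs saved vs. zone rating) from dominating the gradient; new seasons would be scaled with the same `means` and `stds` before scoring.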
Here are the model's predictions for GG awards in 2019. Three players are listed as candidates for each position in each league, ordered from highest model score to lowest.
1B:
Hopefully you can see that the pool of candidates provided by the ML model makes sense. For details on the work and code behind these predictions, please see my GitHub links below.
GitHub links to the notebooks:
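Turning model scores into the three-candidates-per-position lists is a simple group-and-sort step. A minimal sketch, assuming each row already carries a model probability in a `score` field along with `league` and `position` (hypothetical field names); the player names and scores are placeholders, not the actual 2019 predictions.

```python
from collections import defaultdict


def top_candidates(scored_rows, k=3):
    """Group rows by (league, position) and keep the k highest model scores."""
    groups = defaultdict(list)
    for row in scored_rows:
        groups[(row["league"], row["position"])].append(row)
    return {
        key: [r["player"] for r in sorted(rows, key=lambda r: r["score"], reverse=True)[:k]]
        for key, rows in groups.items()
    }


# Hypothetical model scores (invented for illustration, not the 2019 predictions).
scored = [
    {"player": "P1", "league": "AL", "position": "1B", "score": 0.41},
    {"player": "P2", "league": "AL", "position": "1B", "score": 0.87},
    {"player": "P3", "league": "AL", "position": "1B", "score": 0.12},
    {"player": "P4", "league": "AL", "position": "1B", "score": 0.65},
    {"player": "P5", "league": "NL", "position": "1B", "score": 0.58},
]

print(top_candidates(scored))
# {('AL', '1B'): ['P2', 'P4', 'P1'], ('NL', '1B'): ['P5']}
```

Keeping the league in the grouping key mirrors the voting rule that managers only vote within their own league.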
https://github.com/lucaskelly49/Machine-Learning-Model-Predicting-MLB-Gold-Glove-Award-Winners/blob/master/Student_Final.ipynb
https://github.com/lucaskelly49/Work-Samples-from-The-Pick-Off---A-Baseball-Blog/blob/master/Machine%20Learning%20Our%20Way%20to%20the%20Gold%20Glove%20Award/2019%20GG%20Predictions.ipynb
Resources:
https://sabr.org/sdi
https://www.rawlings.com/site-content/gold-glove-selection-criteria.html
This blog is dedicated to baseball analytics and general baseball discussion.