In a quiet offseason, any topic is good to write about. This post is about the Gold Glove (GG). The GG lacks the luster of the MVP and CY, but it still is relevant enough to encourage some healthy discussion (check this and this out). I will attempt to answer three questions:
As we see above, GG winners in 2006 were, on average, the 11th best in FP% and the 10th best in Def. While FP%'s average rank continuously slipped until 2010-2011, it has since climbed back into the 10th-12th range. Conversely, the average Def rank remained steady (at 10th) until 2013. For the last 3 years, however, Gold Glove winners have been the 6th best in Def on average, a 40% improvement. So it looks like we are moving away from fielding percentage but, most importantly, away from the "scout's eye test" (e.g. Torii Hunter, 2007) or simply rewarding the best or most popular hitter at the position (e.g. Derek Jeter in 2010). Now, some of this shift can be attributed to a 2013 agreement between MLB, Rawlings and the Society for American Baseball Research (SABR) to develop mixed criteria for determining GG winners. The new selection process kept managers' and coaches' votes in the driver's seat, accounting for 75% of the total, but it added the SABR Defensive Index (SDI) to improve the results. While the results might have been tilted by SDI, the votes have shifted above and beyond what SDI alone would explain, probably reflecting voters' deeper understanding of the game and the widely available data on fielding performance.
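The average-rank comparison described above can be sketched in a few lines of pandas. The players and values below are made up purely for illustration; the real analysis would pull Def and FP% for every qualified fielder from Fangraphs.

```python
# Rank every qualified fielder within his year on Def and FP%,
# then average the Gold Glove winners' ranks (toy data, not real stats).
import pandas as pd

df = pd.DataFrame({
    "year":   [2016, 2016, 2016, 2016],
    "player": ["A", "B", "C", "D"],
    "Def":    [12.0, 8.5, -1.0, 3.0],      # defensive runs above average
    "FP%":    [0.990, 0.996, 0.985, 0.991],
    "won_gg": [True, False, False, False],
})

# Rank 1 = best within each year, for both metrics.
df["def_rank"] = df.groupby("year")["Def"].rank(ascending=False)
df["fp_rank"] = df.groupby("year")["FP%"].rank(ascending=False)

# Average rank of the winners per year: the quantity charted above.
avg_ranks = df[df["won_gg"]].groupby("year")[["def_rank", "fp_rank"]].mean()
print(avg_ranks)
```

With the real data, a def_rank line falling toward 1 after 2013 would reproduce the shift described above.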
Now, let's validate whether voters evaluate each position differently.
First basemen: 1B seems to be a position where FP% still holds large relevance. In the last 11 years, the Def rank has been lower than the FP% rank only twice, in 2014 and 2016. In 2014, though, both FP% and Def rated Eric Hosmer's and Adrian Gonzalez's defense poorly. In 2016, Mitch Moreland and Anthony Rizzo took the GG home. Moreland ranked 1st in both Def and FP%, making it harder to evaluate whether voters changed their mindset; Rizzo, however, ranked 7th in FP% and 3rd in Def.
Third basemen: 3B paints a different picture from 1B. Voters appear to rely on Def as an input for defensive performance. 2012 was the last year when the FP% rank was lower than the Def rank, and it was by a small margin. That year Chase Headley won the Gold Glove with San Diego, edging David Wright, who had a better FP% and a run value twice as good as Headley's. Events such as 2007, when Beltre and Wright won despite posting average Def and FP% numbers, are outliers at this position.
Second basemen: 2B is one position where a shift can easily be spotted. First, voters are giving the award to the best fielders: the height of the bars is trending down, and that's a good thing. Second, the Def rank has been lower than the FP% rank in 5 of the last 6 years, with 2015 being the exception.
Shortstop: Even more so than 2B, SS has shifted towards Def more than any other position. Interestingly, since 2013 voters seem to pay little attention to FP% (or perhaps FP% has little relation to Def?), as GG winners have ranked approximately 2nd in Def but 8th in FP%, the opposite of what happened pre-2013. 2012's winners were an obvious choice for voters: Robinson Cano and Darwin Barney performed exceptionally well in both FP% and Def and left no room for disagreement.
Outfielders: Just like with SS, there is a clear turning point; in this case it was 2012. Since then, the Def rank has always been lower than the FP% rank, by a considerable margin. Long gone are the days when poor defenders with high FP% were recognized ('08 Nate McLouth and '11 Nick Markakis, I'm looking at you).
I can't pinpoint exactly when it happened, but the Gold Glove award's voters are embracing analytical defensive metrics more than in the past, in particular Def, as highlighted in this exercise. Even though the sabermetric community has had defense analysis at the heart of its research, that area has admittedly not progressed as quickly as others. New developments coming out of Statcast make the future bright, though. The incorporation of SDI in 2013 was definitely a boost, but more needs to happen before the Gold Glove is neither a popularity contest nor an award for players who do what scouts love but what doesn't necessarily work.
Lastly, I will leave you with a table of Gold Glove recipients who did not deserve to win and some players who should have won based on what the raw numbers say.
All Def and FP% stats are from Fangraphs. Gold Glove data is from baseball-reference.
By Oswaldo Gonzalez
Randomness and circumstances are important driving forces in everything that happens in the world. Although they usually work hand in hand with our own actions and decisions, they have the ability to pick you up when you hit the jackpot at the casino, or throw you down when your car gets crushed by a falling tree (hopefully you’re comfortably sleeping in your bed when that happens). They can also be the difference between a pitcher having an average season on the mound, and having an outstanding one. Such is the case with the seasons Jon Gray and Kyle Hendricks had this year.
I’m not going to make the argument that these two pitchers performed equally well this season, with the main differences being random chance and circumstances, because they didn’t. Hendricks was the better pitcher; it just wasn’t the 2.48-run difference their ERAs show. The similarities between the two performances can be summarized in basically two stats. If we take a look at xFIP and SIERA (two important ERA estimators available at Fangraphs), Hendricks’ numbers of 3.59 and 3.70, respectively, are eerily similar to Gray’s 3.61 and 3.72. From there on, however, the numbers separate abruptly.
Much like Dr. Jekyll and Mr. Hyde represent the good and the bad within a person, Hendricks’ and Gray’s seasons represent two sides of the same coin. On the one hand, circumstantial factors and good fortune turned Hendricks’ very good performance into an historical season, while a different set of circumstances and some bad fortune turned Gray’s good performance into merely an average one. In this piece, we’ll take a look at the factors that influenced these diametrically opposed results.
I’ll start by saying that Kyle Hendricks had a remarkable and impressive season. He had an average strikeout rate (8.05 K/9), didn’t walk many batters (2.08 BB/9), and allowed very few long balls (0.71 HR/9), which resulted in a really good 3.20 FIP that ranked 4th in the majors. His ERA, however, ended up all the way down at 2.13, a whopping 1.07 runs below his FIP. While that is a big difference, it’s not all that uncommon, as nearly 2% of individual seasons by starters in the history of the game have had an E-F (ERA minus FIP) of -1.07 or lower. Nonetheless, that difference is hardly sustainable across multiple seasons. In major-league history, out of 2259 pitchers with at least 500 innings pitched, only two had a career E-F below -1.00, and both of them were full-time relievers (in case you’re curious, they are Alan Mills and Al Levine).
On the other end of the spectrum, Jon Gray also had a very solid season. He had an outstanding 9.91 strikeouts per 9 innings (which ranked him 9th among qualifying starters), an average walk rate of 3.16 BB/9, and a solid home-run rate (0.94 HR/9), lower than league average despite pitching half of his innings at Coors Field. His performance was good enough for a 3.60 FIP, but his actual ERA rocketed to 4.61. This positive difference of 1.01 is just as unusual as Hendricks’ negative one, as about 2% of individual seasons throughout history have resulted in differences of 1.01 or higher. For visualization purposes, here’s a table summarizing both pitchers’ numbers.
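As a back-of-the-envelope check on the figures quoted above, here is a sketch using the simplified FIP formula from per-9 rates. It ignores HBP and IBB, and it assumes a league constant of roughly 3.15 (the real constant changes every year so that league FIP equals league ERA), so the results approximate rather than match the published values.

```python
# Simplified FIP from per-9 rates: (13*HR9 + 3*BB9 - 2*K9)/9 + constant.
FIP_CONSTANT = 3.15  # assumed league constant; varies by season

def fip_from_rates(hr9, bb9, k9, constant=FIP_CONSTANT):
    """Fielding Independent Pitching estimated from per-9-inning rates."""
    return (13 * hr9 + 3 * bb9 - 2 * k9) / 9 + constant

# Rates quoted in the text.
hendricks_fip = fip_from_rates(hr9=0.71, bb9=2.08, k9=8.05)
gray_fip = fip_from_rates(hr9=0.94, bb9=3.16, k9=9.91)

# E-F = ERA minus FIP: negative means the pitcher beat his
# fielding-independent numbers, positive means he fell short of them.
print(f"Hendricks: FIP ~{hendricks_fip:.2f}, E-F ~{2.13 - hendricks_fip:+.2f}")
print(f"Gray:      FIP ~{gray_fip:.2f}, E-F ~{4.61 - gray_fip:+.2f}")
```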
So the question still remains: what were the determining factors in these two pitchers having such an abysmal difference in results? Let’s dive right into it.
First of all, I decided to look at the correlation coefficients between E-F and a wide array of pitching stats, using data from every pitcher in MLB history with 500+ innings. As a general rule of thumb, a correlation coefficient between 0.40 and 0.69 indicates a strong relationship between the two variables. The following table shows the stats that had at least a 0.40 correlation with E-F:
Welp, that’s a pretty lame table. Keep in mind, I analyzed correlations for stats as varied as pitch-type percentages, pitch-type vertical and horizontal movement, and Soft-, Medium-, and Hard-hit rates, as well as K, BB, and HR per 9 and HR/FB%. None of those had even a moderate relationship with E-F. So let’s stick with the stats presented in the table.
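The screening step described above can be sketched as follows. The data below is randomly generated for illustration (E-F is constructed to move with BABIP and against LOB%, mimicking the article's findings); the actual study used career lines for every 500+ IP pitcher.

```python
# Correlate E-F with candidate stats and keep those with |r| >= 0.40.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
babip = rng.normal(0.295, 0.015, n)
lob = rng.normal(0.72, 0.03, n)
# Synthetic E-F: rises with BABIP, falls with LOB%, plus noise.
e_minus_f = 8 * (babip - 0.295) - 5 * (lob - 0.72) + rng.normal(0, 0.05, n)

candidates = pd.DataFrame({
    "BABIP": babip,
    "LOB%": lob,
    "K/9": rng.normal(7.5, 1.5, n),  # unrelated stat, should be screened out
})
corr = candidates.corrwith(pd.Series(e_minus_f))
strong = corr[corr.abs() >= 0.40]
print(strong)
```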
The first two stats are really no surprise. FIP basically assumes league-average BABIP and LOB% to estimate what a pitcher’s ERA should look like. So, if a pitcher has a high BABIP, FIP is going to estimate a lower ERA than the actual one, resulting in a higher E-F; thus the positive correlation. On the other hand, if a pitcher has a higher LOB%, he’ll allow fewer runs than his FIP would suggest, resulting in a lower E-F. This explains the negative correlation shown in the table. The last stat, however, came as a real surprise, at least for me. ERA seems to be positively correlated with E-F, which means that pitchers with higher ERA tend to have higher E-F than pitchers with lower ERA.
The next logical step is to determine which factors, if any, explain BABIP and/or LOB% among pitchers. Using the same pitching stats as in the previous step, I ran correlations with BABIP and LOB% separately. The following table shows the stats that had a strong (0.40 to 0.69) or moderate (0.30 to 0.39) relationship.
As was the case in the first table, both of these stats correlate strongly with E-F, showing coefficients of 0.58 and -0.42, respectively. It doesn’t come as a shock, either, that they are strongly correlated with each other. The negative coefficient (-0.42) indicates, as you would expect, that a high BABIP leads to a low LOB%, and vice versa. On the BABIP side, a strong positive relationship with ERA is almost too obvious, as more balls in play falling for hits leads to more runs being scored. Also, since fly balls in play (not counting home runs) turn into outs more often than ground balls do, it makes sense that BABIP holds a negative relationship with the former and a positive one with the latter. This fact, however, goes against a somewhat popular belief that ground-ball pitchers tend to have lower BABIPs.
The factors that correlate with LOB% are more interesting. The first one is not unexpected: a higher strikeout rate seems to lead to more runners getting stranded, and that’s a pretty easy concept to wrap your head around. The second one, however, is really mind-boggling, and I can’t say I have found a reasonable explanation for it. It indicates that the higher the home-run rate a pitcher allows, the more runners are going to be left on base. It is quite possible that this is just a spurious correlation, with no causality at all. Finally, the last factor listed in the table is very interesting and useful in this particular case. It suggests that high percentages of soft contact lead to higher LOB%. We’ll get to that later in this article.
So let’s go back to our pitchers and check if any of this makes sense. We know that E-F is mainly affected by BABIP and LOB%. Hendricks and Gray had very different numbers in these two stats. The Cubs’ righty had a .250 BABIP and a LOB% of 81.5, while the Rockies’ fireballer had .308 and 66.4%. Considering that the league averages were .298 and 72.9%, respectively, we can say that Hendricks did considerably better than average, while Gray did just the opposite. So far so good, right? These facts go a long way towards explaining the differing outcomes. However, BABIP and LOB% aren’t exactly pitcher-dependent; in fact, they’re the marquee stats for the generic term “luck”.
Looking at the stats from the second table, few of them help in figuring this out. High strikeout rates, for example, are supposed to increase LOB%, but Gray still managed a really low 66.4% despite a 9.91 K/9. On the other hand, Hendricks’ 81.5% LOB% ranked 5th among qualified starters, even though his strikeout rate of 8.05 was right around league average. Similarly, ground-ball percentage is shown to have a positive correlation with BABIP. Nonetheless, Hendricks’ higher-than-average rate of 48.4% (league average was 44.7%) came with a ridiculously low BABIP of .250, while Gray’s below-average rate of 43.5% came with a .308 BABIP. Almost the same thing happens when you look at the fly-ball rates.
The only factor from that second table that does make sense in these particular examples is soft-contact rate. Hendricks ranked 1st in this regard among qualified starters, with an impressive 25.1% (league average was 18.8%), while Gray had a below-average rate of 17.8%, which ranked him 50th out of 73 qualified starters. This stat is very much pitcher-dependent, and it does help explain some of the difference in LOB%. It has, however, only a moderate relationship with LOB%, as evidenced by its coefficient of -0.37. Is that enough to account for the massive difference in the results? Intuitively, I’ll say no. There is one more factor, however, that we haven’t even discussed yet.
FIP stands for Fielding Independent Pitching, so the very thing FIP is trying to subtract from the equation might hold the key to answering our question. Defensive performances can heavily influence the outcome of a game, and they make up a big chunk of what we generally call “luck” in a pitcher’s final results. To get numerical confirmation of this idea, I looked at the correlations between each team’s yearly defensive component of WAR and its pitching staff’s BABIP, LOB%, and E-F. The data I used for this exercise was every individual team season from 1989 (the first year in which play-by-play data contained information on hit and out locations) to 2016.
We can see here that a team’s defense has a strong correlation with all three of the stats, especially E-F. Higher values of the defensive component of WAR lead to lower BABIP, higher LOB%, and lower E-F, just as you would expect.
Saying that the Cubs had a great defensive performance this year is an understatement. Not only was it the best defense in 2016 by a bunch, it was also the best defense of the last 17 years, according to Fangraphs’ defensive component of WAR. Of the 814 individual team seasons played in MLB since 1989, this year’s Cubs rank 8th. That’ll put a serious dent in opponents’ BABIP. In fact, the Cubs’ average on balls in play of .255 (yes, that is the whole pitching staff’s BABIP) is the absolute lowest since the ’82 Padres. Oh, and the Cubs pitching staff’s LOB% of 77.5% is also tied for 2nd highest since 1989. All of this adds up to a team E-F of -0.62. Wow. Just wow.
The Rockies’ defense, on the other hand, wasn’t bad, but it also wasn’t great. According to Fangraphs, it was 17.9 runs above average, which ranked 12th in MLB. Again, that’s really not bad at all, just miles away from the 115.5 runs above average the Cubs had. The Rockies’ staff as a whole had a .317 BABIP and a 68.0% LOB%; not unexpected for a team that plays half its games at altitude. Still, both of these values are worse than league average, resulting in a team E-F of 0.54.
All in all, Kyle Hendricks still had a better season than Jon Gray, and people will remember the 2.13 ERA and not the 4.61. This analysis just puts it a little bit more in perspective, and helps shed some light on the little details that make big differences in the course of a long season.
The old football adage says that “defense wins championships”. That doesn’t really apply to baseball, but in the future, when I think back to the 2016 Cubs, I’ll definitely think about their defense.
by Juan Pablo Zubillaga
With less than one month to go, the American League MVP race is very close. While usually nothing is set in stone in early September, during the last few years the AL MVP has been a two-man race (Mike Trout with either Josh Donaldson or Miguel Cabrera). This year, however, features five remarkable candidates: Mookie Betts, David Ortiz, Jose Altuve, Mike Trout and Josh Donaldson. Yes, I expect a few others to grab some top-5 votes (e.g. Cano, Cabrera, Lindor and Machado), but I don't anticipate the award falling outside those 5 players.
Let's look at the classic, old-school numbers first, which are not only referenced in casual conversations at local bars and pubs but also frequently (and occasionally unfortunately) followed by voters. I've plotted R, RBI, HR, OBP, SLG and SB as percentiles of the entire population. Let's take a quick look.
If you like well-rounded players, this year you're probably excited about Altuve, Trout and Betts, who dominate across the board. In an era where stolen bases keep declining, 20+ SB will get you to the 90th percentile. On the other hand, if you're into true sluggers, then the show Ortiz has put on this season should be one to remember. Then again, these metrics paint only part of the picture: they don't take into account when or where each event happened, nor do they include defense or base running in their most complete form.
Let's take a deeper look at WAR and a quick indicator for each of batting, fielding and base running performance.
Obviously, when we move away from batting, David Ortiz loses ground; he only contributes to one aspect of the game, and while he has been outstanding in the batter's box, it likely won't be enough for him to win. When we adjust by park and league, we realize the Trout-Betts race for the best OF is not as close as I initially thought. Trout has quietly put up a(nother) great season on an awful team (again): he's already at 8.1 WAR and a 175 wRC+, both of which easily lead the league. His defense is slightly below average at best, but he compensates by running extremely well. Altuve and Donaldson have had similar seasons offensively; Altuve, however, is having a down season in both defense and base running (remarkably low in Ultimate Base Running (UBR), which measures how frequently and effectively a runner takes an extra base). Betts derives his value largely from his defense, where he has settled in nicely as one of the best OFs this year.
One of the metrics I tend to assess when I look at awards is how performance was spread across the entire season. I want an MVP to be someone I can rely on throughout the year, not only during a hot stretch. Additionally, a single big month can really inflate the numbers and build a misleading argument in favor of someone. Let's look at how wRC+ breaks down by month.
This picture is interesting to me for a couple of reasons. First, part of the argument for Betts' candidacy is that he's getting better, delivering when it matters most: in the middle of a pennant race. After a below-average Mar/Apr, Betts has been a beast since July, when Ortiz cooled off a bit. Then again, Mike Trout has also followed an upward-trending curve, peaking at 206 in August, and his lowest point is 144, the highest of all the lowest points in the sample. From my perspective, everything else being equal, I'd rather have a Trout-esque curve than Donaldson's, which has the highest single-month wRC+ (213 in June) but also the largest swing (a 118-point difference between May and June). And then you have the remarkably consistent Altuve, with the narrowest gap between his highest and lowest points and at least a 140 wRC+ in every month.
Now, most of what we have shown so far is context-neutral. An argument could be made that every single game is worth the same, regardless of whether it's in April or July; what's really important is to deliver in key, high-leverage situations. That is where true MVPs show their full potential to influence a team and define its fate. As they say, a home run against a non-contender when you are losing by 5 runs is not as valuable as a game-winning double off your wild-card rival's closer in the 9th inning. I'll admit neither OPS in high-leverage situations nor Win Probability Added (WPA) is the perfect metric to evaluate this, but they provide a very good proxy for how well these players have fared in tough, game-changing situations. If you are not familiar with WPA, please click here.
Again we see the usual suspect, Mike 'King' Trout, leading not only this graph but all of MLB with his 5.66 WPA, closely followed by Josh Donaldson; they are the only two players in this sample with a higher OPS in high-leverage situations than in low-leverage ones. Interestingly, Boston's Betts and Ortiz see their OPS drop by 9% and 15%, respectively, when the stakes are high. I definitely don't want to say that Altuve's 0.841 OPS in high leverage is bad, but I certainly want to recognize Donaldson's and Trout's clutch performances.
Another way of looking at the MVP is to ask yourself: where would that team be if that player hadn't been part of it? While it is essentially impossible to know the answer for sure, a nice proxy is to measure what percentage of the team's WAR that player is responsible for, i.e. what share of the total he represents.
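The WAR-share proxy described above amounts to a single division, but the edge cases are worth a comment. The sketch below uses illustrative figures, not the actual 2016 numbers.

```python
def war_share(player_war, team_war):
    """Fraction of a team's total WAR attributable to one player.

    Can exceed 1.0 when the rest of the roster combines for negative
    WAR, as in the Athletics example cited in the text.
    """
    return player_war / team_war

# A hypothetical 8.1-WAR player on a team totalling 17 WAR:
print(f"{war_share(8.1, 17.0):.0%}")
```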
Well, this is another way to see Mike Trout's leadership on the field. Almost half of the Angels' WAR has Trout's name attached to it, which is amazing. (For reference, the leaders in this table are Khris Davis and Marcus Semien at 122%, with 2.2 WAR each out of the Athletics' 1.6 total WAR.) Now, Donaldson and Altuve also own a remarkable 33% and 35% of their teams' totals, but Betts probably falls short again with his 23%.
When all is said and done, the numbers indicate this should come down to a Donaldson vs. Trout race, just as it was in 2015. Ortiz has had an amazing season, but his baserunning and defense (or lack thereof) limit his overall impact on his team. Betts is definitely an exciting 5-tool player, but his performance hasn't been as good as Donaldson's or as consistent as Trout's. Additionally, Boston's talent-loaded roster reduces his value (is this the opposite of the Trout-Angels argument: how valuable can you be when your team would perform well even without you?). His future is extremely bright, though. Finally, you have Altuve, who may have a legitimate case but falls (a bit) short of Donaldson and Trout in overall performance. Houston has underperformed, and arguably that's a worse outcome than Trout's situation, because we knew the Angels were going to be bad, but we thought the Astros would be better.
Last year, Donaldson built his case with a magnificent August, when he posted a 1.132 OPS and Toronto reached first place in the AL East. This year it was Trout who had a torrid August, but the Angels are not in the Wild Card race. It surely seems to me as if we are measuring the MVP as a team award. Though I understand the rationale for having an MVP on a winning team, there is more to it. If I had a vote, and with still a few games left in the season, I'd support Trout (as of today) in his quest for his second MVP. But it looks like momentum and narrative are gaining traction around Donaldson, who has posted much better numbers than in his MVP season; Altuve, who brings new blood to the MVP discussion and might get an extra push if Houston makes the playoffs; and Betts, who is clearly the face of Boston's extremely talented young generation. All three, though, despite great Septembers, will post worse numbers than Trout. Yes, the Angels are a bad team, but to what extent is that Trout's fault? What else could he have done? When did 'valuable' translate into 'winning by himself beyond reasonable expectations'? When did we change this award to 'best player on the best team'? In 2012 it was Cabrera's Triple Crown, and in 2015 it was Donaldson's 'ability' to get Toronto to the postseason for the first time in many years. In 2016, Trout has been comprehensively better, avoided any deep slumps, performed very well under pressure and shown that you can put up counting stats on a bad team. We are running out of excuses this year.
By Oswaldo Gonzalez
Statistics are sourced from Fangraphs as of September 4th, 2016, for qualified hitters only.
David Ortiz will retire at the end of this season and the Red Sox will probably miss him. Over his past 3 seasons David Ortiz has defied Father Time.
I ran a query for batters 38 years old or over and found 128 qualified seasons since 1950. Compared to this group, Ortiz has gotten progressively better. Ortiz has followed an unconventional career path, and his aging curve looks skewed to the right. This has triggered not only unsupported PED rumors but also questions about whether his retirement should be postponed. While, on average, 38-40 year olds are ~10% better than the average hitter (in part because only very good players accumulate enough at-bats to qualify; this list is filled with Hall of Famers), Ortiz has been consistently 34%+ better, up to an insane, career-best 178 wRC+ this year, which works out to a 2.1 Z-score.
No player in history aged 40 or older has had a higher ISO, OPS or wRC+ than Ortiz's 2016 marks, a testament to bring back in 5 years when Hall of Fame conversations begin and perspective allows us to ponder fairly what he did. Ortiz has been a top-10 hitter since 2009 by almost any metric, and while it'd be hard to single out one root cause for his longevity (e.g. at-bat pacing or offseason preparation), he has certainly shown us what physical and mental preparation for a high-performance athlete looks like.
These are the top 22 seasons produced by players aged 38 or older, sorted by wRC+:
Statistics as of July 27th.
By Oswaldo Gonzalez
The MLB Draft has passed, but its impact will last. Some selections will go down as busts (e.g. Matt Anderson by the Tigers in 1997). Others will prove real bargains, such as Carlos Beltran with the 49th pick in 1995. I decided to look at the numbers in an attempt to answer the following questions, which I came across over the last few weeks:
As I usually do, let's define the data sources and assumptions. First, my data source is Baseball Reference. There are many assumptions and disclaimers in this process, but the most important ones are:
Question 1 - How many Round 1 picks end up in the big leagues? What's the average impact of a Round 1 pick compared to a Round 2 pick? Are there differences between pitchers and batters?
The table below outlines how many players were called up to the majors and how many actually had a positive career WAR, i.e. 0.1 or above. I have also added the average career WAR per player, and I have broken the data down by round and by position (pitcher and batter) to make the differences easy to grasp. Just take a moment with this table:
Three things come to my mind:
First, this provides some empirical validation of what we intuitively thought: first-round picks produce greater WAR values than the others. While I only have data for the first 3 rounds, it's worth noting that the gap from Round 1 to Round 2 (10%) is smaller than from Round 2 to Round 3 (41%).
Second, I actually found it surprising that 67% of first-rounders reached MLB at some point. That is 2 players out of every 3, and it's a testament to how important raw skills are when it comes to moving up through the minors.
Lastly, the answer to the question of whether to draft pitchers or batters looks like an easy one. Batters not only reached MLB at a higher rate but also delivered better results, both as a group and as individuals. While these results are not statistically significant, they provide a pragmatic answer to the question and suggest a sound strategy might be to draft batters and trade for pitchers later down the road.
Question 2 - What has been the best draft class for the 1993-2008 period?
This table should provide guidance on how to answer the question, but it does not settle it. If we think of it as the number of players that got to MLB, then 2008 is the best year. That class highlights Eric Hosmer, Buster Posey, Brett Lawrie, Craig Kimbrel and Gerrit Cole as the most prominent stars, but offers a very low total career WAR since most of its players are still playing; they're the youngest generation of my sample. In this class, 27 out of the top 30 picks have reached MLB, though a few only for very short stints, e.g. Kyle Skipworth or Ethan Martin.
If we think of the highest total career WAR, then the winner is 2002. This class is led by two of the best picks in the sample (Zack Greinke and Joey Votto) but also features Prince Fielder, Jon Lester and Curtis Granderson. If we think of the highest concentration of skills, then the 1995 class has to be the winner, with an average of 11.83 WAR per MLB player. On the other hand, only 41 of its players got the MLB call, the lowest in the sample. While Carlos Beltran and Roy Halladay are the most notable names in that draft, players such as Darin Erstad, Kerry Wood, Randy Winn and Bronson Arroyo enjoyed nice peaks.
Question 3 - What teams have done a better job?
Evidently, not every team has selected from the same combination of draft slots; some teams have had the opportunity to choose top picks (the Rays, for example), while others have frequently picked from mid-to-bottom draft slots (the Yankees). It would not be fair to compare the total career WAR of players the Yankees have selected against those the Rays have, because the latter had access to a different pool of players. How do we fix that? I am comparing what each team did with the overall pick it was slotted. Using 2016 as an example, I would compare how good Philadelphia's choice of Mickey Moniak at pick 1 was against the average of all other No. 1 picks in the time frame (1993-2008). Once I know the WAR gap between a particular pick and the average WAR for that slot, I standardise that number by the slot's standard deviation, i.e. I calculate Z-scores. In simple terms, this measures how good or bad a pick was relative to the entire distribution for a particular draft slot. The Z-score allows us to compare, say, how good a 14th pick was relative to a 3rd pick. Finally, to identify which teams have fared better, I calculate the average of the Z-scores across all of a team's picks.
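The per-slot Z-score method described above can be sketched as follows: standardize each pick's career WAR against every pick ever made at the same slot, then grade a team by the mean of its picks' Z-scores. The slot histories below are invented for illustration.

```python
import statistics

slot_history = {
    1: [60.0, 5.0, 20.0, 0.0, 35.0],   # career WAR of past No. 1 picks (toy)
    14: [15.0, 0.0, 2.0, 8.0, 0.0],    # career WAR of past No. 14 picks (toy)
}

def pick_z_score(slot, player_war):
    """Standardize one pick against all picks made at the same slot."""
    history = slot_history[slot]
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    return (player_war - mean) / stdev

def team_grade(picks):
    """Average Z-score over a team's picks, given as [(slot, war), ...]."""
    return statistics.mean(pick_z_score(slot, war) for slot, war in picks)

# A team that landed a 63.4-WAR player at slot 14 grades far above average,
# even if its No. 1 pick was merely ordinary:
print(round(team_grade([(14, 63.4), (1, 10.0)]), 2))
```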
Again, there are many caveats here, but this should give us a ballpark estimate of how well teams drafted from 1993-2008. Keep in mind, this methodology does not assume WAR is linear by draft slot. It is not guaranteed, for example, that overall pick 4 produces greater WAR than pick 5: on average, the 4th pick has produced 6.21 WAR, while the 5th has produced 14.26. While this might be counterintuitive (it is at least for me), that is what the empirical evidence of this sample shows.
Perhaps surprisingly, the Phillies come out at the top of the list. The Phillies' advantage came from 3 picks. First, Chase Utley was drafted in 2000 with the 15th pick and has had a great career that is up to 63.4 WAR. Second, in 1993, the Phillies chose Scott Rolen (70 career WAR) with the 46th overall pick, which seems like a bargain now. Finally, Randy Wolf was selected 54th in 1997 and went on to a 23.1 career WAR. The Nationals have had plenty of success in their first few years as a franchise with both Jordan Zimmermann and Ryan Zimmerman. The sample does not include Bryce Harper or Stephen Strasburg, who may push the Nats to the top of the list in the near future.
The Astros, Expos, Yankees, Cubs and Indians are the bottom 5 teams. Coincidentally or not, these teams have had long droughts (the Yankees excepted). It would be interesting to see if there is a relationship between draft performance and wins, but I guess that's another post.
We could dig deeper into what each team has done well and not so well, but it would not make much sense. Teams make mistakes, and it looks like draft selection is pretty damn hard, with an extremely high WAR standard deviation (11.57 WAR through the first 30 picks).
Question 4 - Who was the best selection at each of the top 10 overall picks?
This question is about finding the best selection at each of the first 10 picks. I have used the Z-score to identify which pick was really ahead of the curve.
Well, this is quite a nice group of players. A-Rod is the WAR leader of our sample. Even as a first pick, which on average has yielded the highest WAR, he manages to be 3 standard deviations above the mean. Five other players are active, and two of them (Greinke and Kershaw) are still among the best starting pitchers in the game. They will continue to cement their positions as great draft picks for the Royals and Dodgers. Interestingly enough, Barry Zito and Eric Chavez were part of the A's Moneyball team that frequently overperformed a few years ago - a reminder of how important it is to build a strong core of players.
As a bonus question - these are the top 10 picks, according to this methodology:
As always, feel free to share your thoughts and comments in the section below or through our Twitter account @imperfectgameb.
By Oswaldo Gonzalez
It seems that 2016 will be the year that Statcast begins to permeate Fantasy Baseball analysis. Recently there has been a wealth of articles exploring the possibilities of using these kinds of data. These pieces have provided relevant insights on how to improve our understanding of well-hit balls and launch angles. Also, they’ve facilitated access to information on exit velocity leaders and surgers, as well as provided thoughtful analyses to the possible workings behind some early season breakouts.
However, there is still a lot we don’t know about Statcast data. For instance, we are uncertain how consistent these skills are over time, both across and within seasons. We also don’t know what constitutes a relevant sample size or when rates are likely to stabilize. All in all, this makes using 2016 Statcast data to predict rest-of-season performance a potentially rash and faulty proposition. Having said that, we can’t help but try; so here’s our attempt at using early season 2016 Statcast data to partially predict future performance.
One of the early gospels of Statcast data analysis posits that the “sweet spot” for hitting homers comes from a combination of a launch angle in the range of 25 - 30 degrees and a 95+ MPH exit velocity. If this is indeed the ideal combination for hitting home runs, one could argue that players that have a higher share of fly balls that meet these criteria should perform better in other traditional metrics such as HR/FB%.
Following this line of thought, we dug up all the batted balls meeting the “sweet spot” criteria and divided them by all balls hit at a launch angle of 25 degrees or higher (which MLB classifies as fly balls) to come up with a Sweet Spot%. In an attempt to identify potential HR/FB% surgers, we compared Sweet Spot% and HR/FB% z-scores (to normalize each rate) for all qualified hitters with at least 25 fly balls and highlighted the biggest gaps. Here are the Top 5 gaps considering games through May 28th:
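A minimal sketch of the Sweet Spot% calculation, assuming a Statcast-style table with one row per batted ball; the column names and sample values below are made up for illustration.

```python
import pandas as pd

# Hypothetical batted-ball data (launch angle in degrees, EV in MPH)
bb = pd.DataFrame({
    "batter":        ["Calhoun"] * 4 + ["Span"] * 4,
    "launch_angle":  [27, 29, 40, 10, 26, 50, 55, 5],
    "exit_velocity": [98, 96, 90, 102, 80, 85, 88, 95],
})

# MLB treats balls hit at a 25+ degree launch angle as fly balls
fb = bb[bb["launch_angle"] >= 25]

# "Sweet spot": 25-30 degree launch angle with 95+ MPH exit velocity
sweet = fb[(fb["launch_angle"] <= 30) & (fb["exit_velocity"] >= 95)]

# Share of each batter's fly balls that land in the sweet spot
sweet_spot_pct = (sweet.groupby("batter").size()
                  / fb.groupby("batter").size()).fillna(0)

def zscore(s):
    return (s - s.mean()) / s.std()

# Surge candidates would then come from the biggest gaps, e.g.:
# gap = zscore(sweet_spot_pct) - zscore(hr_fb_pct)   # hr_fb_pct from FanGraphs
```

The final z-score gap needs each hitter's HR/FB% (here a hypothetical `hr_fb_pct` series) joined in from a stats source such as FanGraphs.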
Calhoun seems like a good candidate for a power uptick. He has the third highest Sweet Spot% of 2016, has sustained Hard% and FB% similar to the previous two seasons, and plays in one of the most homer-prone parks in the majors. Yet somehow he has managed to cut his HR/FB% to less than half of what he put together in either 2014 or 2015. What’s more, he has had some bad luck with balls hit in the “sweet spot”: his batting average on these balls is .500, whereas the league average is around .680. He is not killing fly balls in general, with an average exit velocity of 84.6 MPH, but if he keeps consistently hitting balls in the “sweet spot” range he should improve in the power department. Look out for a potential turnaround in the coming weeks and a return to 2015 HR/FB% levels.
Piscotty holds second place in the Sweet Spot% rankings. However, his FB% is very similar to what he did in 2015, whilst his Hard% is down from 38.5% to 32.5%. Lastly, he plays half of his games in Busch Stadium, which has a history of suppressing home runs. I would be cautious of expecting a major home run surge, but in any case Piscotty is likely to at least sustain his performance in the power department, which would be welcome news to owners that got him at bargain prices.
Carpenter is another dweller of Busch Stadium, but his outlook might be a bit different. He is the absolute leader in Sweet Spot%, and he is posting the highest Hard% and FB% marks of his career. Carpenter is also crushing his fly balls in general, with an average exit velocity of 93.7 MPH. Just as a point of reference, Miguel Cabrera, Josh Donaldson and Giancarlo Stanton all fail to reach an average of 93 MPH on their own fly balls. Lastly, he has had some tough luck with balls hit in the “sweet spot”, posting a batting average of just .420. Carpenter is already putting up the highest HR/FB% of his career, and he is a 30-year-old veteran of slap-hitting fame, but the power looks legit and perhaps there is more to come.
Denard Span and Yonder Alonso show up on this list not because of their Sweet Spot% prowess but rather due to their putrid HR/FB%. They barely crack the Top 50 in Sweet Spot%, and they play half their games in two of the bottom 3 parks by HR Park Factor. Span is putting up his lowest FB% and Hard% rates since 2013, when he ended up with a HR/FB% of 3.4%. Meanwhile, Yonder’s rates most closely resemble those of 2012, when he had a HR/FB% of 6.2%. While their batting average on “sweet spot” batted balls is just .500, there is nothing to see here. At best, their power situation looks to improve from bad to mediocre.
If you are interested in perusing the Top 50 gaps between HR/FB% and Sweet Spot%, please find them below:
As always, feel free to share your thoughts and comments in the section below or through our Twitter account @imperfectgameb.
By Douglas Barrios
Back in college, I remember being fascinated by a concept I learned in one of the first chemistry classes I took: atomic orbitals. Contrary to what I thought at the time, electrons don’t orbit around the atom’s nucleus in a defined path, the way the planets orbit around the sun. Instead, they move randomly in the vicinity of the nucleus, making it really hard to pinpoint their location. In order to describe the electrons’ whereabouts within the atom, scientists came up with the concept of orbitals, which, simply put, are areas where there’s a high probability of finding an electron. That’s pretty much how I see baseball projections.
A term that is very often used by the sabermetric community is “true talent level”, and just like an electron’s position, it is a very hard thing to pinpoint. Projections, however, do a very good job of defining the equivalent of an atomic orbital, sort of like a range of values where there’s a high probability of finding a certain stat. I know what you’re thinking; projections are not a range of values. But you can always convert them very quickly just by adding a ±20% error (or any other percentage you consider fitting). So, for example, if a certain player is projected to hit 20 home runs, you can reasonably expect to see him slug 16 to 24 homers.
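The orbital analogy can be made literal with a couple of lines; the 20% figure is just the example error margin from the text, not a calibrated value.

```python
def projection_band(projected, error=0.20):
    """Turn a point projection into a range ("orbital") of plausible values."""
    return (projected * (1 - error), projected * (1 + error))

projection_band(20)  # a 20-HR projection becomes roughly (16.0, 24.0)
```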
As a 12-year veteran fantasy baseball manager (and not a very good one at that), I’ve never used projected stats as a player-evaluating tool when going into a draft. For some reason (probably laziness), I’ve mainly focused on “last year’s” stats, and felt that players repeating their last season’s numbers was as good a bet as any. This year, after taking a lot of heat for picking Francisco Lindor and Joe Panik much higher than my buddies thought they should’ve been taken, I started wondering how much of a disadvantage it was to use simple prior-year data instead of a more elaborate method.
To satisfy my curiosity, I decided to evaluate how good a prediction “last year’s” numbers are, and compare them to other options such as using the last two or three years, or using some publicly available projections. In this particular piece, I’ll limit the study to offensive stats, but I’ll probably tackle pitching stats in a second article.
The first step for this little research project was to determine the criteria with which to compare the different projections. A simple way to evaluate projection performance is the sum of squared errors; the greater the sum, the worse the projection (in case you’re wondering, errors are squared to make negative errors positive so they can be added; squaring also penalizes bigger errors more than smaller ones). In this particular case, however, I wanted to evaluate projections for a number of different stats, so a simple sum of squared errors would have an obvious caveat in that stats with bigger values have bigger errors. For example, an error of 10 at-bats is a very small one, given that most players log 450+ of them per season. On the other hand, an error of 10 HR is huge. Additionally, not every stat has the same variation among players. Home runs, for example, have a standard deviation of around 70% of the mean, while batting average’s standard deviation is only about 11% of the mean. So, you could say that it’s harder to predict HR than it is to predict AVG.
Long story short, I divided each squared error by the squared standard deviation, and calculated the average of all those values for each stat. Finally, I converted those averages to a 0 to 1 scale, with 1 being a perfect prediction (in reality, these values could be less than zero when errors are greater than 1.5 standard deviations, but I scaled it so that none of the averages came out negative).
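Here is one way to implement the scoring described above. The 1.5-standard-deviation rescaling constant is my reading of the parenthetical, so treat it as an assumption rather than the author's exact formula.

```python
import numpy as np

def projection_score(actual, predicted, scale=1.5):
    """Score one stat's predictions on a roughly 0-to-1 scale (1 = perfect).

    Squared errors are divided by the stat's variance so volatile stats
    (HR) and stable stats (AVG) are comparable; the average normalized
    error is then rescaled so errors of `scale` standard deviations map to 0.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    normalized = (actual - predicted) ** 2 / np.var(actual)
    return 1.0 - normalized.mean() / scale ** 2

projection_score([30, 20, 10], [30, 20, 10])  # a perfect projection scores 1.0
```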
For this study, only players with at least 250 AB on the season were considered. Also, players that were predicted to have less than 100 AB were not considered, even if they did amass more than 250 AB on the season. The analysis was done on five different sets of predicting data:
1. Last season's stats.
2. A weighted average of the two preceding seasons, with a weight of 67% for year n-1, and 33% for year n-2.
3. A weighted average of the last three seasons, with 57.5% for year n-1, 28.5% for year n-2, and 14% for year n-3.
4. ZiPS projections (Created by Dan Szymborski, available at Fangraphs)
5. Steamer projections (Created by Jared Cross, Dash Davidson, and Peter Rosenbloom. Also available at Fangraphs)
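Options 2 and 3 are simple weighted averages, which can be sketched as follows; the home-run totals in the example are made up.

```python
def weighted_projection(history, weights):
    """history: a stat for seasons [n-1, n-2, ...]; weights should sum to 1."""
    return sum(stat * w for stat, w in zip(history, weights))

# Hypothetical HR totals for the last three seasons: 25, 19, 31
two_year = weighted_projection([25, 19], [0.67, 0.33])                # ~23.0
three_year = weighted_projection([25, 19, 31], [0.575, 0.285, 0.14])  # ~24.1
```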
The following graph shows the average score of each of the 5 projections for each individual stat considered in this study. The graph also shows the overall score for each stat, in order to have an idea of the “predictability” of each one of them. Remember, higher scores indicate better performance, with 1 being a perfect prediction.
Other than hinting that it is in fact a very poor decision to use only last year’s data, this graph doesn’t tell us much about which predicting data has a better overall performance. It does provide, however, a very good idea of the comparative reliability of each stat within the projections.
Aside from stolen bases (which honestly surprised me as the most predictable stat of the bunch), the three most reliable stats are the ones you would’ve expected: HR, BB, and K. They’re called the “three true outcomes” for a reason: they depend a great deal on true talent level and involve very few external factors such as luck or the opponent’s defensive ability.
On the other end of the spectrum, it’s really no surprise to find three-baggers as the least reliable stat. This may seem counterintuitive at first, given that the players who lead the league in triples share a distinctive characteristic: they are usually speedy guys. Nonetheless, 3B almost always involve an outfielder misplaying a ball and/or a weird feature of the park, such as the Green Monster in Fenway or Tal’s Hill in Minute Maid’s center field, making triples unusual and random events. Playing time (represented in this case by at-bats) also has an understandably low overall score. Most injuries, which are a major modifier of playing time, are random and hard to predict. Managerial or front office decisions can also affect a player’s playing time. It does surprise me, however, to see doubles so far down in this graph, and I really can’t find a logical explanation for it.
Let’s move on now to the real reason why we started doing all this in the first place. Here’s a graph that shows the average score for each predicting data, for years 2013, 2014, and 2015. It also shows the three-year average score.
The one fact that clearly stands out in this graph is that last-year numbers are a very poor predicting tool. Their performance is consistently and considerably worse than any other set of data used. So my initial question is answered pretty definitively: it is a huge mistake to rely on just last season’s numbers when trying to predict future performance.
Turning our attention to the other four projections, it becomes a bit harder to separate them from each other, especially using only three years’ worth of data. The average performance over the three-year period gives us a general idea of the accuracy of each option, but looking at the year-by-year numbers, it’s not really clear which one is better. Steamer seems to be the winner here, since it had the better score in all three years. ZiPS, on the other hand, despite having a better overall score than the 3-year weighted average, has a worse score in two of the three years. They were really close in 2014 and 2015, but ZiPS was considerably better in 2013, which, interestingly, was a less predictable year than the other two.
The biggest point in favor of ZiPS when comparing it against the 3-year weighted average is that ZiPS doesn’t actually need players to have three years’ worth of MLB data in order to predict future performance, and that makes a huge difference. Another major point in favor of ZiPS is that it’s doing all the work for you! Believe me, you do not want to be matching data from three different years every time drafting season comes around (I just did it for this piece and it’s really dull work).
After all is said and done, projection systems such as Steamer or ZiPS do a fine job of giving us a good indication of what to expect from players. We’re much better off using them as guidelines when constructing our fantasy teams than any home-made projection we could manufacture (unless you’re John Nash or Bill freaking James). I know next March I’ll be taking advantage of these tools, hoping they translate into my very elusive first fantasy league title.
by Juan Pablo Zubillaga
Adam Wainwright is the star of the St. Louis Cardinals pitching staff and one of the best aces in the majors. The righty has 121 wins, a 3.04 career ERA, 1,335 strikeouts and four top-three Cy Young finishes under his belt. He started 2015 as expected: cruising. In 4 starts he posted a 1.44 ERA and a 2.05 FIP over 25 innings, with 18 SO, just 1 BB and 8 XBH (35% of the hits allowed). When everything was looking promising for another dominant season, he suffered a ruptured Achilles tendon during a plate appearance against the Milwaukee Brewers on April 25th. The injury sent him to the disabled list until late September, when he got to pitch only another 3 innings.
Before the start of this season his name was part of lots of baseball discussions: Which Adam Wainwright should we expect? The ace? Or will he show signs of decline due to the long stint on the DL, his 34 years and 500+ recent innings? The numbers speak for themselves: a 7.25 ERA and 4.87 FIP in 22.1 innings, with just 9 SO, 10 BB and 13 XBH (45% of the hits allowed). A complete disaster if we compare this start with last April.
Those facts lead us to the question: What is wrong with Adam Wainwright? Using data from April 2015 and April 2016, we will try to figure out the reasons behind this horrible start to the season and what changes could help Waino get back on track.
Pitch velocity and movement
The first reason that jumped to my mind was that he may be having trouble with the speed of his fastball or the break of his nasty curveball. I went to Brooks Baseball to check these values and compare April ’15 with April ’16.
Using the 4 starts from last year, Waino's fast pitches were the four-seamer, the sinker and the cutter, averaging 90.33 MPH, 90.4 MPH and 86.4 MPH respectively. Contrary to my first hypothesis, the April 2016 speed chart did not show any significant variance, averaging 90.83 MPH, 90.33 MPH and 87.06 MPH. If anything, he is throwing faster. What about the breaking stuff? During 2015 the nasty curveball and the changeup averaged 75.36 MPH and 83.71 MPH, values really similar to what we have seen this year: 75.43 MPH and 83.54 MPH.
We can conclude from this data that speed is not an issue, but what do the numbers say about the ball's movement? All his pitches show very similar vertical and horizontal movement compared to last year's data and Wainwright's career norms. This means the first hypothesis has to be dismissed: the power on his fast pitches and the break on the slow ones are still there.
Location and control
Another potential cause of the bad start could be the location of Adam's pitches and his control of them. A good way to visually understand the location of his pitches is a heatmap over the K-zone: the darker the color, the higher the frequency. To generate the great graphs below I used the PITCHf/x tool from Baseball Savant, posting the career, 2015 and 2016 values side by side.
The heatmaps really help us get quick answers. Let's start with the four-seamer. We can clearly see that during this season the dark cluster is located up in the zone. Compared to his career profile, Wainwright is locating the fastball higher than his typical zone, which is not a good sign for a pitcher who only throws it at 90 MPH and depends so much on control to minimize damage.
The case of the cutter is similar: poor control of the pitch. The 2016 graph shows a problem locating this pitch in the strike zone. The career profile indicates that he likes to throw it down and outside to RHB and down and in to LHB, something that has been difficult this season, as the cutter is also landing higher than normal.
In the case of the sinker I split the heatmaps between LHB and RHB, since Waino uses this pitch very differently depending on the batter's handedness. Against lefties the heatmaps show clearly that he is following his typical profile, so there should not be a problem there. Against righties, however, Wainwright has had trouble locating the pitch on the outer part of the zone as he is used to. This year, many of his sinkers to RHB have ended up over the center of the plate: low in the zone, but still in an area that MLB batters can crush easily.
Exactly the same thing happens with the curveball graphs. Career data shows that he has been really successful hitting the low part of the strike zone, especially last year, when this pitch was falling in the ideal place, just below the K-zone frame. But this year the story has changed. The curveball has been located higher than ever, in the hitters' power zone.
There is no doubt that Wainwright is having a hard time controlling his pitches this season, with the fast ones drifting up in the zone and the breaking ones landing right in the middle. He is showing significant differences from his career profile that could be a direct cause of the bad start to 2016.
The speed and break are still there; the location, not so much. So what about his approach to at bats? Is it similar, or has he changed it due to the lack of control of his pitches? Let's try to answer this question using data on his pitch mix and the results of balls in play, comparing Wainwright's career profile with the 2016 sample.
As you can see in the table below, two things need to be addressed: First, this season he has largely ditched his sinker (-9%) in favor of more cutters (+8%) and curves (+4%). Second, his ground balls have dropped dramatically (-10%), leading to an increase in fly balls (+9%) and line drives (+1%). Why such a change in Waino's approach to the plate?
Some conclusions come quickly. The sinker is an excellent groundball pitch, so obviously if you throw fewer sinkers, you get fewer ground balls. But as we saw in the previous section, Wainwright is also having tons of problems with the location of his sinker: the majority of these pitches stay in the hitter-friendly zone, resulting in an increase in line drive percentage (+17%) and a .500 batting average on balls in play.
As if the sinker issues were not enough, the high location of his four-seamer is causing 18% more fly balls and 24% fewer ground balls. This critical situation leaves just one option in Wainwright's fastball arsenal: the cutter. As a last resort he has increased its usage by 8%, and some results have been good. It is the only fast pitch with an increase in groundball percentage (+4%) and a drop in flyball percentage (-12%). Nevertheless, the resulting batting average on balls in play is .400, so please don't take this as a silver bullet. Remember that we also pointed out previously that his control of the cutter has not been the best in 2016.
The other pitch favored this season has been the curveball. Although the whiff rate has dropped from a career average of 17% to only 9% and fly balls (+11%) have increased significantly, opponents are hitting only .118 against the curve. This is really impressive, especially after analyzing the pitch's poor location, but he keeps using it since it is the only pitch giving Wainwright good results.
Even with a small sample of 2016 data we can draw some conclusions: The arm power and the movement on Adam Wainwright's five pitches are still there. The long rest due to the injury, the 500+ innings from 2013 to 2015 and his 34 years do not seem to be a problem right now. The problem seems to be the location of his pitches. The four-seamer high in the zone and the sinker in the middle of the plate have been destroyed by batters, dramatically reducing his ground balls and increasing his line drives and fly balls.
Wainwright is clearly trying to make adjustments to reduce the damage. For now, his nasty curve is saving the day as his only effective pitch, even though it has been located in a dangerous zone. The cutter is not helping enough, so his focus should be on regaining control of his pitch locations. In his last outing he showed some positive signs. Let's see what happens in the next one against Arizona: whether we get more of the ace, or he keeps struggling to get back on track.
In the previous post we discussed essentially two questions: First, is there a relationship between team payroll and wins? Second, has this relationship changed over time? If so, where are the peaks, and where are we now? Let's continue digging into this topic.
Question 3: Will money buy you a ring or a post-season ticket? If so, how much should we spend?
Let's start by saying that nothing will buy you a championship ring. But money can and will improve your odds! I'd say it can get your foot in the door.
The following graph shows the probability of reaching the playoffs, winning the American or National League, or winning the World Series as of the beginning of each season (BoS). I have split teams into 3 tiers based on their total payroll each year: the low tier covers the bottom 33% of payrolls in a season, the medium tier goes from 33% to 66%, and the top tier is the top 34%. Keep in mind I am analyzing data from 1976 to 2015, excluding 1994 due to the strike. I have also added to the graph the expected probability for each event, i.e. playoff appearance, league win and World Series win. The expected probability is the natural probability each team has at the beginning of the season; for example, each team has a 1/30, or ~3.3%, chance of winning the World Series. In the long run, in a very competitive and balanced league, the actual numbers should be close to the expected rates; however, they are not.
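The tier assignment can be reproduced per season using the 33rd and 66th percentiles; the team names and payroll figures below are placeholders, not the study's data.

```python
import numpy as np
import pandas as pd

# Hypothetical payrolls (in $MM) for one season; in the real study this
# is done within every season from 1976 to 2015
df = pd.DataFrame({
    "year":    [2015] * 6,
    "team":    ["LAD", "NYY", "BOS", "STL", "TBR", "MIA"],
    "payroll": [272,   219,   187,   120,   74,    68],
})

# Year-specific cutoffs: bottom 33% is "low", top 34% is "top"
low_cut = df.groupby("year")["payroll"].transform(lambda p: p.quantile(1 / 3))
high_cut = df.groupby("year")["payroll"].transform(lambda p: p.quantile(2 / 3))
df["tier"] = np.where(df["payroll"] > high_cut, "top",
                      np.where(df["payroll"] <= low_cut, "low", "mid"))
```

Computing the cutoffs within each year keeps a 1976 payroll from being compared against a 2015 one.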
Did you see that? Let's state the obvious first: large payroll teams have done better than the rest, i.e. they got to the playoffs, reached the WS and won it more frequently than the low and medium tiers. Let's digest that again: top tier teams are almost 4 times more likely to reach the playoffs than low tier teams. As we move along in the postseason, as expected, high budget teams win more often. While rich teams got to the playoffs at a ~80% better rate than expected, they won the WS at a ~106% better rate than expected.
Let's look at the tiny 0.3% of low-tier teams that won the WS. I should say team, singular: the 2003 Florida Marlins. They are the only low-tier team to have won the WS since 1976. Amusingly, they beat the NYY.
Now, these numbers do not show the full picture, because I am compounding the effect of being eliminated in the previous round of the event I am measuring. For example, you can't win the WS if you did not win your league, and you can't win the league if you did not make the playoffs. Let's dial back and think of the probability of winning the World Series once you are in the World Series, and likewise for the league championship, calculated only over the teams that are already in the playoffs. The graph below shows the probability of winning at the beginning of each event (BoE). Does that make sense? I hope it does.
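The difference between the BoS and BoE views is just a change of denominator. A toy sketch with invented counts (these are not the post's actual tallies):

```python
# Invented illustrative counts per payroll tier over the sample period
team_seasons = {"top": 390, "mid": 390, "low": 390}  # ~39 seasons x 10 teams
playoff_apps = {"top": 180, "mid": 90,  "low": 50}
ws_wins      = {"top": 25,  "mid": 13,  "low": 1}

# BoS: chance of winning the WS measured from the start of the season
bos = {t: ws_wins[t] / team_seasons[t] for t in ws_wins}

# BoE: chance of winning the WS given the team reached the playoffs
boe = {t: ws_wins[t] / playoff_apps[t] for t in ws_wins}
```

The BoE rate conditions away the earlier eliminations, which is exactly the compounding effect the paragraph above describes.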
Let's go over each event, from left to right: First, the playoff appearance probability remains the same as before; mid- and low-tier budget teams reached the playoffs with a lower probability than you would expect. The second bucket relates to winning the league (read: reaching the WS) once you are in the playoffs. For example, in 2015 there were 10 playoff teams (5 per league), so the expected probability of each reaching the WS was 20%. As the Wild Card and later the second Wild Card expanded the field, that number has decreased, but historically it sits at 31%. While top- and mid-tier payroll teams have reached the WS more frequently than the benchmark would suggest, the difference is small and, interestingly, higher for mid-tier teams. It is important to notice that poor teams have little more than half the expected chance of reaching the WS once they get to the playoffs. So even if you assume low-tier teams at this stage are good (they are in the playoffs, after all), they have performed considerably worse than the rest. This is a finding in itself.
If we move to the World Series, the situation gets even tougher for low budget teams. Similarly to the league win breakdown, rich and mid-tier teams have performed better than average, but in this case, the rich ones have done slightly better than the mid tiers. If we think about it, we would expect this result, because two very good teams are facing each other, no matter how much they are paying their players. On the other hand, low-tier ball clubs have fared badly in this situation, accomplishing only one WS win (the aforementioned Marlins in 2003) in 7 attempts. It looks like their chances are reduced by ~71%. Again, remember we are talking about good/great teams playing in the WS, yet again and again they have failed to deliver.
So I would like to highlight the findings so far in this question:
Question 4: Are there big spenders? If so, who are they? Have they changed over the years?
If you are still reading, I have reached my objective.
To answer this question I have plotted the average versus the standard deviation of the payroll z-score for each team. I have also bucketed teams into 4 types of spenders: high, mid-high, mid-low and low. The table below shows the number of seasons per team with their payroll labelled as high, medium and low tier. Please take a look:
Please remember that the low tier refers to the bottom 33% of payroll totals in a season, the medium tier goes from 33% to 66% and the top tier is the top 34%. The answer to our first sub-question seems relatively straightforward. As you can see, there are 3 teams (NYY, BOS and LAD) that have been significantly above the pack in terms of average payroll. The NYY have been a high tier payroll team in 39 out of 40 seasons. Boston and LAD have been in the top tier 33 and 28 times out of 40, respectively. These teams have big payrolls consistently and therefore are the truly big market teams. You may argue that the NYM or LAA are big market teams, and you would not be entirely wrong. They are definitely wealthy, but the payroll comparison shows they have not been in the league's top 34% payroll in at least 40% of the last 40 seasons.
I have also highlighted in red the teams that I have classified as low spenders: PIT, MIA, MIN, TBR and WAS. TBR has never been in the top tier, making it the lowest spender in the league, followed by MIA - what is going on in Florida? You may argue that SDP and/or MON are low spenders, and I would not try to persuade you otherwise. The line is thin but had to be drawn somewhere.
Another interesting insight is payroll variance. No team has been more consistent than STL or TBR. On the other side of the spectrum we have PHI and SEA. This is probably a reflection of how these organizations are run. Below is a plot of accumulated payroll z-scores and win percentage (for the entire 1976-2015 period). If you have been following baseball for a few years, most of this will resonate with you: CHC, SEA, COL and NYM have historically underperformed, while STL, ATL, CIN and OAK usually find non-payroll-related ways to win.
With the best-fit line information (Expected W% = 0.0296*Payroll Z-score + 0.4994), I have calculated the expected winning percentage (read: what 'should' have happened) and compared it to what actually happened. This quickly allows us to identify good performers over the 40-year period. In essence, the table below highlights the teams furthest from the dotted line in the graph above.
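With the fitted coefficients from the text, the expected winning percentage and the over/under-performance residual are one-liners; the function names here are just illustrative.

```python
def expected_win_pct(payroll_z):
    """Best-fit line from the post: Expected W% = 0.0296*z + 0.4994."""
    return 0.0296 * payroll_z + 0.4994

def overperformance(actual_win_pct, payroll_z):
    """Positive values mean a team won more than its payroll predicts."""
    return actual_win_pct - expected_win_pct(payroll_z)

expected_win_pct(0)  # an average-payroll team is expected to play .4994 ball
```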
We have one last question to discuss in this post: whether the deep-pocketed teams have changed over time. I think by now you know the short answer is 'yes, they have', but the truth of the story lies in the details. I partly addressed this question with the standard deviation of the z-scores before, but I would also like to share a view of how this picture has evolved by decade.
I sliced teams into 4 categories. First there are the downward spenders. It is interesting how some teams, e.g. MON, MIL, CIN and PIT, moved from mid-high payroll spenders to (very) low ones. It looks as if they re-shifted their spending priorities in the mid-80's and have stuck with that strategy since. The second bucket (Swingers) comprises teams that have swung between high and low payroll tiers, depending on how the wind blows; teams such as CLE, PHI, MIN, COL and DET are here. The third group (Upward) is comprised of teams that have progressively moved into the upper tier, e.g. SEA and WAS. These are big city, relatively new franchises that have not had on-field success. Finally there is a group (Keepers) that has remained constant on payroll spending: the likes of NYY, BOS, LAA, LAD, SDP, MIA and TBR.
In summary, it looks like money matters, since the relationship between payroll and wins is weak but statistically significant. However, the influence of payroll is not as big as we may have originally thought. Money definitely influences which teams go to the postseason, i.e. postseason chances are directly proportional to payroll, but once a team is in the postseason, payroll's predictive power goes down, i.e. it does not pay off to over-invest in payroll (did you hear that, Theo?). Thus, there seems to be a diminishing returns curve during the season, as the value of $1 extra in payroll changes depending on where you are on the curve. Ideally, a GM wants to spend just enough to get his/her team to the playoffs because, after that point, the field is more level, raising the question of whether those resources should be allocated to other areas, e.g. manager, front office, player development. I guess that's part of another post.
As always, feel free to share your thoughts and comments in the section below or through our Twitter account @imperfectgameb.
By Oswaldo Gonzalez
Money in baseball has been an infinite source of criticism. MLB has no salary cap, unlike other major sports, and the luxury tax is relatively recent. The media have made us believe that the small fish (i.e. small-market teams) will always be eaten by the big ones (i.e. big-market teams). The Kansas City Royals’ performance over the last couple of years, along with the tricky and often misunderstood Moneyball concept, has brought salary back to the newspaper headlines, even though it is safe to say the Royals were not even a low-end payroll team. In any case, this post is an attempt to see if popular beliefs regarding money, power and on-field performance pass the numerical test.
There are many interesting questions related to this topic. However, I will limit myself to the following ones over two posts:
My assumptions and caveats are the following:
Without further ado, let's get to it.
Question 1: Is there a relationship between payroll and wins? If so, how strong is it?
To answer this question, I found the correlation between yearly payroll and winning percentage for every individual season played from 1976 to 2015. Because payroll values have changed so much in 40 years, I used z-scores (standard scores), which allow us to compare different seasons regardless of payroll differences. A payroll number on its own does not mean much and should be compared to the pool of teams on a yearly basis, i.e. it is the distribution of payroll across the league that matters. Here’s a link in case you are not familiar with the concept of z-scores. Also, please keep in mind that correlation does not imply causation.
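The per-season normalization described above can be sketched in a few lines. This is a minimal illustration, and the payroll figures below are invented for the example, not real team payrolls:

```python
# Sketch of per-season payroll z-score normalization: within each season,
# z = (payroll - league mean) / league standard deviation.
# All payroll numbers below are made up for illustration.
from statistics import mean, pstdev

def payroll_z_scores(payrolls):
    """Convert one season's raw payrolls into z-scores."""
    mu = mean(payrolls.values())
    sigma = pstdev(payrolls.values())
    return {team: (p - mu) / sigma for team, p in payrolls.items()}

season_payrolls = {  # year -> team -> payroll in $M (fabricated)
    1998: {"NYY": 73.0, "BAL": 71.8, "OAK": 20.1, "TBR": 27.3},
    2015: {"NYY": 219.3, "BAL": 118.9, "OAK": 86.1, "TBR": 74.8},
}

for year, payrolls in season_payrolls.items():
    z = payroll_z_scores(payrolls)
    print(year, {team: round(v, 2) for team, v in z.items()})
```

Because the normalization is done season by season, a z-score of +1 always means "one standard deviation above that season's league mean", which is what makes 1976 and 2015 comparable.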
A couple of interesting insights can be drawn from this graph. The first one, quite obvious, is that there’s a positive slope, implying that more money affects wins positively. The second point, though, is that payroll alone does not wholly explain the total number of wins. We inherently knew that. Over 40 years, we can find teams that satisfied every situation: low-payroll teams that were awful (Houston 2013), low-payroll teams that played over .600 baseball (Oakland 2001 and 2002), high-payroll teams that underperformed (Boston 2012) and high-payroll teams that exceeded expectations and went on to win 114 games (NYY 1998). There is even a mid-tier team that did extremely well (SEA 2001). These are all outliers, though people can (will?) use every one of these cases to support a preconceived idea, e.g. “baseball is a sport and it is attitude and effort that matter”, “money will buy you handshakes at the end of each game”, “big-money teams won’t win because they lack camaraderie”, etc. Therefore, let’s focus on the big picture.
The third point I’d like to highlight is the R-square. The R-square measures how successful the fit line is in explaining the variation of the overall data on a 0-to-1 scale. In this case the R-square is 0.1905, so ~19% of the total variation in wins can be explained by the linear relationship between payroll and wins. Also, the slope of the best-fit line is 0.0302. This means that for a 1-unit increment in z-score, there is a 0.0302 increment in winning percentage. Remember that z-score increments do not map linearly to dollars: moving one standard deviation in payroll costs a different amount of money in different seasons.
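For readers who want to see where numbers like the 0.0302 slope and the 0.1905 R-square come from, here is a minimal least-squares fit written out by hand. The data points are invented for illustration, not the post's actual dataset:

```python
# Minimal ordinary-least-squares fit of winning percentage on payroll z-score,
# showing how the slope, intercept and R-square quoted above are computed.
def fit_line(xs, ys):
    """Return (slope, intercept, r_square) of the best-fit line y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    # R-square = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

# Fabricated example points: (payroll z-score, winning percentage)
xs = [-1.5, -0.5, 0.0, 0.8, 1.6]
ys = [0.440, 0.485, 0.500, 0.530, 0.545]
slope, intercept, r2 = fit_line(xs, ys)
print(f"slope={slope:.4f}, intercept={intercept:.4f}, R-square={r2:.4f}")
```

The slope reads directly in the units of the y-axis: each additional standard deviation of payroll buys roughly `slope` points of winning percentage, on average.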
However, the potential drivers behind the total number of wins are complex (injuries, roster construction, plain luck, etc.), and the R-square, along with the F-test and p-value, shows that money matters but seems to be overrated. Again, remember that correlation does not imply causation.
Question 2: Has this relationship changed in time? If so, where are the peaks? Where are we now?
We have established that team payroll can predict winning percentage with a low confidence level; however, has that always been the case? Was money more important in the ’80s than now? The following graph shows the R-square value for every 2-year period from 1976 to 2015. Keep in mind that the higher the R-square value, the stronger the relationship between payroll and winning percentage.
The answer to our question of whether the relationship has changed over time is definitely yes. There are noticeable peaks and valleys. There have been two periods (which I highlighted in green) when money was a better predictor of winning percentage: from 1976 to 1979 and from 1996 to 1999. The first period corresponds to the first four years of free agency. Team owners flooded the league with new money as they went after key players, e.g. Mike Schmidt or Reggie Jackson, and payroll increased drastically (60% in 1977, 34% in 1978), as shown below. These years have been widely documented (here, here and here).
The second period (1996 - 1999) is linked to NYY, BAL (though they dramatically underperformed in 1998), CLE and ATL spending successfully (read: lots of won games) and to the lack of Cinderella stories (perhaps only Houston in 1998 and Cincinnati in 1999). This period was also characterized by, firstly, the sequel to league expansion: Tampa Bay and Arizona joined the league in 1998 and, understandably, underperformed. Secondly, MLB revenues grew an average of 17% year over year from 1996 to 1999 (not adjusted), so teams probably redirected that surplus to the salary pool. Lastly, in the late ’90s, MLB was increasingly becoming a rich team’s game. The graph below shows the payroll coefficient of variation for the 1976 - 2015 timeframe. This number, which I will call payroll spread, is simply the standard deviation divided by the mean; it allows us to quickly assess how spread out payroll is across the league over time. Do you see the trend after ~1985? By 1999, this number had increased continuously for almost 15 years, and MLB had had enough. As the power of money increased AND the gap widened, MLB commissioned the Blue Ribbon Panel to come up with initiatives to level the field, a.k.a. a revenue-sharing program to increase competition. Entertainingly, the correlation of money and winning percentage has decreased steadily while the payroll spread has remained pretty much constant. I am hesitant to attribute the decline in R-square to the Blue Ribbon Panel rather than to other factors (read: is this a coincidence?).
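The payroll-spread metric defined above is a one-liner. The example payrolls are made up to show how a wider gap between rich and poor teams pushes the number up:

```python
# The "payroll spread" is the coefficient of variation:
# population standard deviation divided by the mean.
from statistics import mean, pstdev

def payroll_spread(payrolls):
    """Coefficient of variation of a season's payrolls (fabricated data below)."""
    return pstdev(payrolls) / mean(payrolls)

# Same league mean ($20M), different dispersion:
print(payroll_spread([18.0, 20.0, 22.0]))  # tightly packed -> small spread
print(payroll_spread([5.0, 20.0, 35.0]))   # rich/poor gap  -> large spread
```

Because it is normalized by the mean, the spread is comparable across eras even as raw payrolls ballooned, which is exactly why it works for the ~1985-onwards trend the post describes.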
If we go back to the yearly payroll and winning percentage correlation graph, you’ll notice that I highlighted two periods in red too: from 1982 to 1993 and from 2012 until last season. Those were moments when the correlation between salary power and winning percentage was remarkably low. The first period seems to be closely related to the MLB collusion crisis (check out this link as well). The lowest point was in 1984-1987, when the correlation was only 0.03 and the salary spread was 0.22.
The 2012-onwards period has brought the R-square down to a 20-year low (0.06 in 2012-2013). While TV revenue keeps rising, the baseball landscape has changed and new variables are in the mix. There is a redefined revenue-sharing model, we have analytically inclined organizations, an extended Wild Card system and international signings; all these factors have added more complexity to the winning equation, effectively diminishing the relationship between payroll and winning percentage, even with the salary spread still at ~0.40. We are living in interesting times in baseball indeed: if investing money in players doesn’t lead to better on-field results, where do teams need to invest, e.g. analytics, managers or front office?
As always, feel free to share your thoughts and comments in the section below or through our Twitter account @imperfectgameb.
By Oswaldo Gonzalez