How Can Value Add Be Right If WAR, ERA, and QS Are Wrong?

The Associated Press

When critics at the New Republic and the Baseball Think Factory (BTF) attacked Breitbart Sports’ Value Add Baseball, the responses ranged from vitriolic (see comments 84 and 10 comparing Breitbart getting into baseball to a communist plot and indicating Andrew Breitbart burns in hell) to very constructive criticism that led to improvements in the new 2.0 Version.

While we responded to the New Republic immediately, I did not see the BTF hit until much later so my response there starts at comment 82. However, with the release of Baseball Value Add 2.0 this week, here are the key points:

1. Game-by-Game Analysis: How Can Value Add Be Right and WAR, ERA and Quality Starts Wrong?

Pitcher A throws six innings and allows three earned runs in all 36 starts. Pitcher B gets bombed for nine earned runs in three innings in nine of his 36 starts, but in all the others allows only 1 earned run in 7 innings. Who is more valuable?

According to ERA they are equal (4.50), Quality Starts indicates Pitcher A is better (36 to 27), and WAR calculates that both are equal pitchers worth 1.2 wins to their team. All three are wrong. The table of winning percentages for each combination of innings pitched and earned runs, covering all 7,000+ games since Opening Day 2014, indicates that Pitcher B’s team would actually win four more games than Pitcher A’s.

Win% by SP’s IP (rows) and ER (columns)
IP \ ER            0     1     2     3     4     5     6     7     8     9    10    11
less than 3 IP   40%   46%   40%   33%   13%   17%   10%    4%   10%    0%          0%
less than 5 IP   67%   47%   31%   33%   16%   13%   11%    8%    8%    0%
5 IP             68%   65%   51%   37%   31%   17%   10%    6%    7%
6 IP             73%   66%   58%   47%   28%   24%   11%   19%    0%    0%
7 IP             86%   78%   60%   42%   37%   17%    5%    0%
8 IP             89%   78%   63%   60%   20%   43%    0%
9 IP             98%   87%   83%   67%  100%    0%
All Starts       82%   73%   56%   43%   28%   18%   10%    9%    8%    0%    0%    0%

Bold numbers show a fluke fluctuation that would even out over time, while italicized numbers indicate there were 10 or fewer occurrences. 

This is why we went to the trouble of calculating the starting pitcher’s rating in each of 7,114 games (click here) rather than simply calculating the cumulative stats of the 253 pitchers to determine ERA or WAR or just crediting a pitcher with Quality Starts.

The reason is that, for a pitcher, the 2nd, 3rd and 4th earned runs allowed in a given game are much more damaging than the others. The one run Pitcher B gives up in most games hardly hurts him; it only lowers his chance to win from 86% to 78%. In every fourth game he has already pretty much lost by the time he has given up four runs in three innings (17% chance of a win), so the extra few runs he gives up hardly lower his chance any more.

Pitcher A loses most of his games because he gives up the damaging 2nd and 3rd runs every game. He only enables his team to win 47% of the time despite every game counting as a Quality Start (6 innings, 3 earned runs).
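The comparison between Pitcher A and Pitcher B can be checked with a few lines of arithmetic. The Python sketch below is only an illustration, not the Value Add program itself: the function names and the data layout are made up for this example, and the three win percentages are copied from the table above. It assumes a three-inning start is read from the "less than 5 IP" row (the 9-earned-run cell is 0% in the "less than 3 IP" row as well, so the result does not depend on that choice).

```python
# Win probabilities copied from the IP/ER table above: (row label, earned runs) -> win%.
WIN_PCT = {
    ("6 IP", 3): 0.47,
    ("7 IP", 1): 0.78,
    ("less than 5 IP", 9): 0.00,  # assumption: a 3-inning start is read from this row
}

# Hypothetical layout: (innings, earned runs, table row, number of such starts).
PITCHER_A = [(6, 3, "6 IP", 36)]
PITCHER_B = [(3, 9, "less than 5 IP", 9), (7, 1, "7 IP", 27)]


def era(season):
    """Earned runs per nine innings over the whole season."""
    innings = sum(ip * n for ip, er, _, n in season)
    earned = sum(er * n for ip, er, _, n in season)
    return 9 * earned / innings


def quality_starts(season):
    """Starts of at least 6 innings with 3 or fewer earned runs."""
    return sum(n for ip, er, _, n in season if ip >= 6 and er <= 3)


def expected_team_wins(season):
    """Sum of the table's win probability over every start."""
    return sum(WIN_PCT[(row, er)] * n for ip, er, row, n in season)


for name, season in (("Pitcher A", PITCHER_A), ("Pitcher B", PITCHER_B)):
    print(name, round(era(season), 2), quality_starts(season),
          round(expected_team_wins(season), 2))
```

Both pitchers come out with a 4.50 ERA, Pitcher A leads in Quality Starts 36 to 27, and yet the expected team wins are roughly 16.9 for Pitcher A and 21.1 for Pitcher B, which is the gap of about four games described above.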

The only reason I’ve had things written about me in ESPN, Fox Sports and Sports Illustrated is that I was able to take the principle of WAR and apply it to college basketball, so no one is a bigger WAR fan than me. It just doesn’t work as well on starting pitchers as it does on everyday players and relievers, who have a smaller impact on many more games.

Support Neutral Won-Loss records did address the need for game-by-game analysis, but it does not appear the versions I saw over the years accounted for opposing line-ups, ballparks, etc., and they did not put the value into the WAR terminology that lets us compare starting pitchers to other players based on additional wins.

So how could I write that Value Add could be more accurate than WAR for starting pitchers? Precisely because of cases like this where WAR is four games off of the truth when calculating that Pitcher A and Pitcher B would give a team the same number of wins (1.2) over a replacement pitcher.

2. Most Valuable Player vs. Best Player

Some critics are open to the need for game-by-game analysis of starting pitchers, but disagree with giving a starting pitcher credit or blame for winning or losing the game. I disagree, but to give those critics an alternative, Value Add 2.0 now runs a separate calculation for the BEST player, also on a game-by-game basis, but from the perspective of how easy the pitcher’s performance made it for his team to win the game.

This gives us three calculations. The fourth column gives the “Value Add” calculation, the sixth column indicates “How Good” the pitcher was on a game-by-game basis, and the final column indicates how good the pitcher has been per game pitched.

Johnny Cueto was first in all three calculations even before his complete game shutout with 11 strikeouts and two hits allowed last Tuesday. David Price is the 2nd most valuable while he has been the 4th best pitcher including playoff appearances, and the 9th best per game, thus “David Price (2,4,9).”

Corey Kluber has been the 7th most valuable pitcher but has been the second best pitcher. A few pitchers, such as the injured Yu Darvish, have not had enough starts to make the top 50 in the first two categories, but Darvish is averaging 0.20 wins per game in his 22 starts, which would have him on pace to be the 6th best pitcher in baseball if he had pitched all of 2014 and the first half of 2015:

Pitcher, 2014/15 (Value, Best, Per Game)    GS  Vict  Value Add  Team(s)                       How Good?  Per Game
Johnny Cueto (1,1,1)                        49    31      16.36  Cincinnati Reds                   10.69      0.22
David Price (2,4,9)                         52    34      14.84  DET 7.60, TB 7.23                  9.92      0.19
Felix Hernandez (3,6,12)                    51    33      14.37  Seattle Mariners                   9.19      0.18
Max Scherzer (4,3,3)                        50    34      14.21  DET 9.50, WAS 4.70                10.08      0.20
Zack Greinke (5,11,27)                      50    31      12.26  Los Angeles Dodgers                7.71      0.15
Lance Lynn (6,17,46)                        50    29      11.89  St. Louis Cardinals                6.12      0.12
Corey Kluber (7,2,2)                        51    26      10.68  Cleveland Indians                 10.35      0.20
Chris Sale (8,8,4)                          41    23      10.25  Chicago White Sox                  8.22      0.20
Jon Lester (9,12,25)                        49    29      10.24  BOS 6.24, CHC 2.43, OAK 1.56       7.58      0.15
Dallas Keuchel (10,9,15)                    46    29      10.00  Houston Astros                     8.13      0.18
Clayton Kershaw (11,5,5)                    46    31       9.77  Los Angeles Dodgers                9.20      0.20
Chris Archer (12,7,14)                      50    27       9.50  Tampa Bay Rays                     8.85      0.18
Jorge De La Rosa (67,10,16)                 45    23       4.65  Colorado Rockies                   7.81      0.17
Yu Darvish (87,53,6)                        22    13       3.50  Texas Rangers                      4.37      0.20
Chad Bettis (111,131,7)                     10     7       2.39  Colorado Rockies                   1.98      0.20
Masahiro Tanaka (30,21,8)                   30    21       6.78  New York Yankees                   5.92      0.20
Eduardo Rodriguez (136,143,10)               8     5       1.58  Boston Red Sox                     1.49      0.19

I realize some would be happy if Value Add simply measured the “best” pitcher and did not worry about holding pitchers accountable for whether their team wins or loses by calculating their “Value Add.”

For years the Won-Loss record dominated pitcher evaluations even more than ERA. This was wrong, but too many in the baseball analytics community threw out the baby with the bathwater by making winning irrelevant in their calculations. In fact, Bill James reminded us years ago that winning is not only important, it is the gold standard.

Starting pitchers, quarterbacks, and hockey goalies are unique in that they can account for 1/3 up to close to 1/2 of the results in their games. A player with that much control should be evaluated in part on whether he can get a win.

The key is to measure the success against how tough an obstacle the quarterback, goalie, starting pitcher, or occasionally even a dominant basketball player, had to overcome for the wins. The fact that Stephen Curry’s Golden State team beat LeBron James’ Cleveland Cavaliers for the NBA title four games to two does not make us conclude that Curry played a better series than James.

We give Curry credit for four wins and James credit for two wins. Then we look at how much either player had to overcome for wins. Curry had the support of one of the greatest shooting teams ever assembled in the NBA, so his four wins were much easier to come by. James had to overcome that team after losing the other two All-Stars on his team—meaning his degree of difficulty was so much higher than Curry’s that his two wins make us conclude he had a better series than Curry and his four wins.

If LeBron had sulked through a 4-0 sweep, then it is quite likely we would have concluded Curry was the better player in the series.

Part of the standing of Joe Montana and John Elway is based on them driving their teams down the field against all odds to win. We don’t treat “the Drive” and “the Catch” as just a few more completed passes in a long career.

So a key component of Value Add Baseball is giving a pitcher the credit for a Victory when he pitches and the team wins, unless he fails to last five innings (the average starter gets a Victory in 47% of his starts, while 47% of the time his opposing pitcher gets the Victory, and in 6% of games neither gets a Victory because the pitcher on the winning team fails to last five innings).

Easy Victories. As with Curry and James, we realize some victories are much harder to come by than others. For example, the Baltimore Orioles have provided great support to Chris Tillman and Bud Norris, and the two have delivered by getting victories in almost 60% of their starts.

Tough Victories. Aaron Harang and AJ Burnett only managed victories in 40% of their starts in large part because both had to spend part of the past two seasons getting almost no support from the Phillies.

We need to know how many victories the same replacement player would have gotten in the place of these four pitchers to determine how many victories each pitcher really was worth above that replacement player (precisely what WAR is supposed to measure).

The factors measured in the Value Add formula include:

  1. Runs scored by the pitcher’s team
  2. Runs allowed by relief pitchers who come in after him
  3. Any RBIs or runs scored by the pitcher in a game in an NL Park (this is not in the spreadsheet provided due to the logistical issues running the program on that many games, but will be added)
  4. Unearned runs allowed by the pitcher’s defense
  5. How much the pitcher’s defense helped him or hurt him based on how many outs they were able to record relative to the balls that were put in play off his pitches (see explanation under sub header 3 below)
  6. An “ERA Needed” to win the game based on above, then
  7. How good the line-up the pitcher faced was
  8. The ballpark in which the game was played
  9. Whether or not an AL opponent had to sit their DH due to playing in an NL park, or if an NL opponent was able to add a DH when playing in an AL park
  10. An “Adjusted ERA Needed” to win the game based on steps 7 to 9, and
  11. The percent chance that a replacement pitcher would have won the game given steps 1 to 10, and finally
  12. Whether or not the pitcher was able to get a “Victory” for his team winning (if a replacement pitcher would have had a 33% chance to win and the pitcher won, he gets a +0.67 for a win, and if not he gets a -0.33).
  13. Over the course of the two seasons, we do see that even though Norris and Tillman had many more victories (54 to 40) than Harang and Burnett, Harang and Burnett have been much more valuable because a replacement pitcher would have just under 27 victories in place of Harang and Burnett, while a replacement could have achieved just under 45 victories in place of Tillman and Norris.
  14. In 21% of games teams give their starting pitcher no chance to win the game (a 0.00 ERA Needed to win), so in those cases the pitcher automatically gets a Value of 0.18 as long as he completes five innings to not drain the bullpen.
  15. In 14% of games, the pitcher gives his team almost no chance to win the game by not lasting five innings, so in those cases he gets a -0.50, the worst rating a pitcher can get for one game.

Pitcher (2014, Playoffs & 2015)  GS  Victories  Victory %  How Valuable  Rnk
Aaron Harang                     50         20        40%          6.63   32
A.J. Burnett                     50         20        40%          6.60   33
Chris Tillman                    52         30        58%          5.24   54
Bud Norris                       41         24        59%          3.90   77
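Step 12 and the four-pitcher comparison can be illustrated with a short Python sketch. This is not the Value Add program; the function names are hypothetical, the victory totals and Value Add figures are copied from the table above, and the sketch assumes a season's Value Add is simply victories minus the victories a replacement would be expected to get. It ignores the flat 0.18 and -0.50 special cases from steps 14 and 15.

```python
def game_value(replacement_win_prob, got_victory):
    """Credit for one start: the result (1 or 0) minus the replacement's chance."""
    return (1.0 if got_victory else 0.0) - replacement_win_prob


# (victories, Value Add) copied from the table above.
SEASONS = {
    "Aaron Harang": (20, 6.63),
    "A.J. Burnett": (20, 6.60),
    "Chris Tillman": (30, 5.24),
    "Bud Norris": (24, 3.90),
}


def implied_replacement_victories(names):
    """Assumed relation: replacement victories = actual victories - Value Add."""
    return sum(SEASONS[n][0] - SEASONS[n][1] for n in names)


# A replacement with a 33% chance: +0.67 for a Victory, -0.33 without one.
print(round(game_value(0.33, True), 2), round(game_value(0.33, False), 2))
print(round(implied_replacement_victories(["Aaron Harang", "A.J. Burnett"]), 2))
print(round(implied_replacement_victories(["Chris Tillman", "Bud Norris"]), 2))
```

Under that assumption the implied replacement totals are about 26.8 victories in place of Harang and Burnett and about 44.9 in place of Tillman and Norris, in line with the "just under 27" and "just under 45" figures in step 13.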

The reason for the harsh criticism in the links above was that in 2013 I noted that while Clayton Kershaw was far and away the best pitcher in baseball, it was a close call on him being the most valuable.

From May 26 through the end of the season, the Dodgers scored either two or three runs seven times with Kershaw on the mound. In all his other starts that year, Kershaw had an unbelievable 1.52 ERA. If he had done that those seven times the Dodgers needed a gem the most, we would have expected the Dodgers to go at least 4-3 if not better for huge tough wins.

Instead, in those seven games his ERA was twice as high at 3.02, and the result was the Dodgers were only 1-6, about what they would have done with any decent pitcher on the mound.

The point I made then was that Pat Corbin was 6-5 when Arizona gave him 1, 2 or 3 runs en route to getting 21 Victories in his first 25 starts, so to that point he had been every bit as valuable as the much better pitcher Kershaw.

Contrast Kershaw’s 3.02 ERA in tight games to the best performance of any pitcher in 2014.

Last July 4, the Value Add 2.0 system gave Kershaw the best “How Good?” mark of the season, 0.70, as he barely allowed a runner in the ultimate hitters’ park (Coors Field has a 1.39 factor for enabling 39% more runs than other parks in an average game).

However, his “How Valuable?” rating was just 0.23, because the Dodgers won 9-0 and would have therefore had a 77% chance of winning even with a borderline replacement player starting in place of Kershaw.

The next day Jon Lester turned in a more valuable performance when he allowed no walks or earned runs in eight innings and overcame two unearned runs to beat Baltimore 3-2. His “How Good?” rating was great but nowhere near Kershaw’s (0.42 to 0.70), yet his performance was more valuable (0.99 to 0.23) because the Red Sox would have had only a 1% chance of winning with a borderline replacement pitcher.

For those who believe Kershaw should get more credit than Lester for that weekend, they can just focus on the “How Good?” rating. The problem with saying you cannot give a pitcher less credit for an easy 9-0 win is that it means you cannot give the same pitcher credit when he pulls out a tough win against the odds.

The league wins just 27% of the time when scoring one-to-three runs, and only seven out of 253 pitchers have been able to get their teams wins in those games more than half the time. Lester has won 10 of 18 (56%) with one-three runs of support. Adam Wainwright is not in the overall leaders’ list because he was injured after four games this year, but he has given the Cardinals an incredible 9-4 mark (69%) in games over the past two seasons with wins by 1-0 three times, 2-0 three times, 3-0 two times, and 2-1 once. His ability to allow one run in those nine starts adds tremendous value, and shows why it is perfectly legitimate for the Value Add system to measure this tremendous value for getting victories in very difficult situations.

With 1, 2 or 3 runs of support  Victory  Total  Win%
Adam Wainwright                       9     13   69%
Masahiro Tanaka                       7     12   58%
Henderson Alvarez                     9     16   56%
Jon Lester                           10     18   56%
Shane Greene                          6     11   55%
Johnny Cueto                          9     17   53%
Felix Hernandez                      14     27   52%
David Price                           8     16   50%
Chris Archer                          7     14   50%
Gerrit Cole                           6     12   50%
Kyle Hendricks                        6     12   50%
All Pitchers 2014 and 2015          785   2936   27%

As noted above, Value Add Baseball measures more than just the offensive support, but this gives an idea of the toughest of wins.

3. Credit/Blame Defense for More Than Unearned Runs

One legitimate question raised in response to Value Add 1.0 was why the only defensive calculation was for unearned runs. This led to additions in Version 2.0.

The best thing a pitcher can do in most cases is strike out the opposing batter so that the only help he needs is the catcher catching the strike. A pitcher can also do a few things that hurt him without giving his defense any chance (home runs allowed, walks, hit by pitch).

When the pitcher allows a line drive, it turns into a hit 69% of the time, while a grounder or fly ball turns into a hit less than 25% of the time based on a Fangraphs table here.

Occurrence        How often  Batting average  Value Add Anticipated Outs
Ground Ball (GB)  44%        0.239            0.79
Line Drive (LD)   21%        0.685            Unfortunately cannot distinguish from FB in box score
Fly Ball (FB)     35%        0.207            0.69 (includes LD, FB and subset of IF)
Infield Fly (IF)  11%        very low         (a subset of Fly Balls)

We know a ground ball should result in an out 76.1 percent of the time (1 minus the 0.239 batting average), and when you add the extra outs from double plays and other outs that do not happen on a fly ball or ground ball (outs on the base paths due to pickoffs, etc.) we end up expecting the defense to record about 0.79 outs per ground ball the pitcher allows in a game.

The problem is that I don’t believe we can do an accurate version of Defense Independent Pitching Statistics (DIPS) on a game-by-game basis because we cannot break out line drives and infield flies, so we have to lump them all together with fly balls. Obviously home runs are never caught, so we take them out of the equation, just as we take the pitcher’s strikeouts out of the total number of outs (e.g. a pitcher who completes 6 innings including 5 strikeouts had 13 outs recorded some other way).

The math to make the total outs recorded in all 7,000+ games add up to innings times 3 minus strikeouts yields an estimated 0.69 outs recorded per non-grounder put in the field of play.

So if a pitcher allowed 10 fly balls (not including homers) we expect 6.9 total outs, and if he also induces 10 ground balls we add another expected 7.9 outs meaning on average his defense should produce 14.8 outs. If he pitched 7 innings (21 outs) and had 3 by strikeout, then the rest of the defense recorded 18 outs.

That means the defense appears to have done a good job by supporting him with 3.2 more outs than expected. Keep in mind that this does not mean all 18 outs actually came on a ground out or fly out. Some could have come from runners being thrown out stealing, or trying to take an extra base, etc.

A team that had 10 grounders and 10 non-home run fly balls in just five innings with three strikeouts (12 – 14.8 = -2.8) would be judged to have a weaker game with almost three fewer outs recorded than expected.

While that is the estimate, we only credit or blame the defense with that figure x 0.1 runs to be conservative, so the first defense is credited with saving the pitcher about 0.32 runs, while the second defense is blamed for costing the pitcher about 0.28 runs.
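The defensive adjustment described in this section reduces to a few lines of arithmetic. The Python sketch below uses a hypothetical function name and takes innings as whole numbers for simplicity; the 0.79, 0.69 and 0.1 constants are the ones quoted in the text, and fly balls here means non-home-run balls in the air.

```python
OUTS_PER_GROUND_BALL = 0.79  # expected outs per ground ball, from the text
OUTS_PER_FLY_BALL = 0.69     # expected outs per non-grounder (excluding home runs)
RUNS_PER_EXTRA_OUT = 0.1     # conservative weighting described in the text


def defense_runs_saved(innings, strikeouts, ground_balls, fly_balls):
    """Runs credited (+) or charged (-) to the defense behind the starter."""
    outs_by_defense = innings * 3 - strikeouts
    expected_outs = (ground_balls * OUTS_PER_GROUND_BALL
                     + fly_balls * OUTS_PER_FLY_BALL)
    return RUNS_PER_EXTRA_OUT * (outs_by_defense - expected_outs)


# Seven innings, 3 strikeouts, 10 grounders, 10 fly balls: 18 outs vs 14.8 expected.
print(round(defense_runs_saved(7, 3, 10, 10), 2))
# Same balls in play in only five innings: 12 outs vs 14.8 expected.
print(round(defense_runs_saved(5, 3, 10, 10), 2))
```

The two calls reproduce the examples above: a credit of about 0.32 runs for the first defense and a charge of about 0.28 runs against the second.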

We will tweak this number, and if we end up able to incorporate a breakdown of how many balls put into play are line drives then we may weight the defensive factor as a heavier component in line with DIPs.

4. Non-Constructive Criticism

Obviously none of this explanation or fine tuning of Value Add Baseball will do anything to gain acceptance from those who hurl non-constructive criticisms. When one of the truly harsh critics submitted a comment attacking me for not using DIPS or some other system two years ago, I remember going back to look up the original writing on the method the critic held out as the perfect way to do things. I laughed reading old writings from the original author (it may have been Voros McCracken but the old link is now lost so I may be wrong), which read something along the lines of “Why is everyone attacking me for my simple little way of measuring things?”

Some things never change.

There will need to be a version 3.0 and 4.0 of Value Add Baseball as I look for ways to address issues such as a reliever giving up runs in a mop-up role, an easier way to program pitchers’ offensive ratings, and truly developing the “best” rating, which still makes some use of the original Bill James Game Score (see the Society for American Baseball Research (SABR) piece on winning percentage based on Game Score, written almost two decades after its invention).

To get the Game Score to truly correspond to the likely winning percentage, we needed to make adjustments such as never allowing a pitcher to receive a higher Game Score than (Innings Pitched x 10) + 10. A pitcher who gets bombed early does not do enough bad things for the Game Score to calculate just how little a chance he has given his team, but an Adjusted Game Score of 23 for only going 1/3 of an inning is a good building block even before considering the opposing line-up, the ballpark, and how good the defense was behind him while he was getting shelled.

For example, a pitcher who gives up five straight home runs in a pitchers’ park without recording an out has truly almost doomed his team in about the worst performance possible. A pitcher who gives up 5 runs without recording an out on seven straight ground-ball singles through the infield in Coors Field cannot quite get the full blame, since a team can come back from down 5-0 in Coors Field, and you would think that no matter how hard all seven grounders were hit, someone should have gotten to at least a couple of them to help him out.

Certainly you can argue over the best system for determining the best pitcher (see CBS Sports comparison here), but if we want to truly follow the logic of WAR, the ultimate goal is to determine how many WINS a player gives his team ABOVE what a borderline REPLACEMENT would do.

I anticipate some will charge that game values are too high and should be closer to zero, particularly since I stated a pitcher can control 1/3 to 1/2 of the game whereas the scoring allows them to get well over half of the credit for an extremely dominant game or be blamed for the full half a loss for a terrible outing preventing a Victory. While it is true that overall pitching is less than half the game over the course of a season, the bell curve of the actual game can give him more than half the credit or more blame. This is because nine individuals in a line-up are going to tend more toward the middle in a given game (four may have great games, two strike out every time, and three go one-for-four at the plate). The overall batting contribution reflected in their WAR ratings will fall toward average when you add up all nine players, while one hot or cold pitcher can have a much more extreme impact on the game for better or worse.

If we truly want to measure the value of a pitcher by figuring out the chance of a team winning if a replacement pitcher took the mound in place of him every time out, then the Value Add should be determined by measuring the chance of a replacement player getting a win that the player in question either achieved or failed to achieve. A long time ago Bill James reminded us that Wins are the gold standard, and to that end the top Value Add pitchers are the truest attempt at this measurement, whether or not the most valuable pitcher happens to be the best pitcher in a given stretch.

But if nothing else, go ahead and click on the google sheet to look over the starts of all 253 pitchers in one sheet, and draw your own conclusions.
