Thursday, June 5, 2014

Serious Problems with WAR



Right now, as of the afternoon of June 5, 2014, the best team in Major League Baseball is the San Francisco Giants. They are 38-21, good for a .644 winning percentage and the best record in all of baseball. According to Fangraphs, the Giants also own the 19th-highest total of Wins Above Replacement. Judging by WAR, then, they are the 19th-best team in baseball and the third-best team in their own division. I wondered how this could possibly be the case, and discovered a few interesting, and troubling, facts about WAR -- Wins Above Replacement, the most advanced of advanced statistics, the golden stat that attempts to boil down a player's entire contribution to his team's winning into one meaningful number.



Let's first take a step back and try to figure out what WAR is, and specifically, what team WAR is supposed to tell us. WAR, as we've discussed before on this blog, essentially tries to tell you how many more wins a team will have because of a given player's contributions versus a "replacement-level" player, who is described as a minor league veteran, or someone who would basically be paid the minimum as a fill-in on a Major League roster. According to Fangraphs and Baseball-Reference.com, this level is defined mathematically as such: if a team consisted solely of replacement players, that team would tend to end the season at around a .294 winning percentage. An everyday Major League player should gain at least 2 WAR in a season.

The first problem with WAR on Fangraphs is what I mentioned in the first paragraph: team WAR does not equate to team wins. It's actually not even particularly close. I put together a chart that compares each team's real winning percentage with its WAR-adjusted winning percentage.




For the record, I could not FOR THE LIFE OF ME figure out how to label each data point so that it showed the team that each corresponded to. I tried for like over an hour. It was very frustrating. If anyone knows a way to do this, let me know. Great.

ANYWAY, this is what I did. I took each team's winning percentage in real life, and that was my X axis. I then figured out each team's WAR-adjusted winning percentage. I did this by taking each team's total WAR from Fangraphs and adding it to the number of games that team would have won by now if it were made up entirely of replacement-level players (the number of games the team has played multiplied by .294). Dividing that win total by games played gives the WAR-adjusted winning percentage.
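For anyone who wants to redo the arithmetic, here is a minimal Python sketch of that calculation. The function name and the WAR value in the example are my own placeholders, not Fangraphs numbers; only the .294 replacement level and the Giants' 59 games played come from this post.

```python
REPLACEMENT_WIN_PCT = 0.294  # replacement-level winning percentage cited above


def war_adjusted_win_pct(team_war, games_played, replacement_pct=REPLACEMENT_WIN_PCT):
    """Replacement-level wins plus team WAR, divided by games played."""
    replacement_wins = games_played * replacement_pct
    return (replacement_wins + team_war) / games_played


# The Giants are 38-21, so they have played 59 games.
giants_games = 38 + 21

# Placeholder WAR total for illustration only (NOT the real Fangraphs figure).
placeholder_team_war = 15.0

print(round(war_adjusted_win_pct(placeholder_team_war, giants_games), 3))
```

A team with 0 WAR comes out at exactly .294 under this formula, which is a quick sanity check that the replacement baseline is wired in correctly.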

The blue line represents a perfect correlation between WAR-adjusted winning percentage and real winning percentage (a line with a slope of 1). If each team's winning percentage and WAR-adjusted winning percentage were equal, then all points on this plot would fit on that line. Perfection is a lot to ask for, and something one should not expect, but it's good to see what the ideal would look like.

The black line is the line of best fit for the scatter plot. The slope of that line is 0.4034. The relationship between real winning percentage and WAR winning percentage is much closer to random (a line with a slope of 0) than to perfect. That's not a good sign. It also confirms what we know from the case of the Giants: a team's actual wins and WAR wins are barely related, if they're related at all.
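One caveat worth checking for yourself: the slope of a best-fit line and the correlation coefficient are related but not the same number, so a slope near .4 does not by itself pin down how tightly the points hug the line. The sketch below computes both by hand using the standard least-squares and Pearson formulas. The three data points are invented purely for illustration and are not real team numbers.

```python
from math import sqrt


def slope_and_correlation(xs, ys):
    """Return (least-squares slope of y on x, Pearson correlation r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r = sxy / sqrt(sxx * syy)
    return slope, r


# Invented points (NOT real data): real win pct on x, WAR-adjusted on y.
real_pct = [0.400, 0.500, 0.600]
war_pct = [0.540, 0.580, 0.620]

slope, r = slope_and_correlation(real_pct, war_pct)
print(round(slope, 3), round(r, 3))
```

In this made-up example the points fall on a perfectly straight but shallow line, so the two numbers come out very different from each other. With the real team data, both would be worth reporting side by side.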

This is a troubling fact. Though we are only one-third of the way through the season, this relationship isn't even close to what it is supposed to be. It raises the question: what is going into this WAR calculus? How valid is it? Are we measuring the right things to determine what each player's and team's contribution is to winning? At least for this year, this plot would tend to say that we are missing the mark by a pretty big margin.

A second fact about this plot, less troubling though still striking, is that according to WAR almost every team should have MANY more wins than it actually does. There are only two teams whose actual winning percentage is higher than their WAR-adjusted winning percentage (the Giants and the Marlins). In fact, according to WAR, every single team should be above .500 this year in baseball. What does that tell us about WAR?

Firstly, if this were the only problem, it would be forgivable. WAR is useful when compared across players and teams, not in isolation. As long as the ratios are all correct, one can compare numbers that are out of whack. As we see, however, the ratios are definitely not all correct. But generally speaking, this isn't a kiss of death for WAR.

Secondly, this might give us a hint into how we are evaluating WAR. What are we overvaluing so much, to the point that even the Astros are expected to have a .511 winning percentage? If we can isolate the aspects that go into WAR that make expected win totals balloon, perhaps we can get to the bottom of this other issue.

One might argue that WAR is not meant to be taken so literally: a WAR win is not the same as a regular win. It is an independent statistic that is useful in other ways. My response to that is: what are we measuring, then? Are we measuring how much a player contributes to actual winning (y'know, the point of sports), or are we measuring arbitrary stuff that we like? If WAR is not based in actual wins, then it's not what it ultimately claims to be, and absolutely should not be taken so seriously, or seriously at all, as a statistic.

One might argue that WAR is a measure of how teams should be performing, sort of in a vacuum, based on their intrinsic talent, so expecting a perfect match between WAR wins and actual wins is not helpful. I agree with that, but would also retort that WAR wins and real wins should at least resemble each other. If the slope of the best-fit line is not 1, then it should at least definitely be above .5, and probably closer to .8 or .9. A slope of .4, as we see here, is closer to no relationship than to a perfect one. I have a hard time believing that teams are performing so colossally out of line with their intrinsic talent level, to the point where the two variables almost have nothing to do with each other.

This might be a small sample size. Fortunately for us, Fangraphs has data on team WAR for each year. Let's look at the full year of 2012 (I don't want to keep harping on 2013 and seem like a homer after all). Our second problem is not an issue with full-season totals because Fangraphs calibrates team WAR to reflect win totals at the end of the year (not sure what goes into this calibration, so I'm not sure how I feel about it). We should, however, be able to see if our first, main issue holds true in our full-year analysis.



MUCH better! Our correlation is now 0.75. This, of course, came after Fangraphs's "calibration" to reflect actual win totals. PHEW! WAR is safe. Maybe.

We will return to this issue before Fangraphs calibrates its WAR win totals to actual win totals at the end of the year. Perhaps our two-month sample size is insufficient to yield acceptable results. Check back in September when we dive back into this issue to see if WAR is great or awful. Could be either.
