Thursday, 14 April 2011

The Wonderful World of Stats

Okay, so over the past couple of decades, baseball statistics have evolved into monstrous calculations based on correlations, fielding-independent averages, and statistical trends.  For some reason, the mainstream media thinks all of you are too dumb to understand these statistics and hasn't explained why they can be so useful for understanding how your favorite teams and players are doing.  So, we're going to delve into the wonderful world of statistical analysis.  Think of it as a trip through history to help people better understand what the heck we're talking about when we say a guy has a great LD% or OPS+, and why the heck those stats are important.

The Basics:

These are the stats you've probably heard of before.  Since a lot of them are really self-explanatory, I won't bother with giving any kind of detailed description:

1) Wins/Losses- This is the decision given to a pitcher (stupidly) after a game is concluded.  First, if a starting pitcher wants a win, he needs to go 5 innings at a minimum.  If he exits with a lead and the team never surrenders the lead, he gets a win.  If he exits with a deficit and the team never rallies to overcome it, he gets a loss.  If the game is tied at any point after he leaves, he is given a "No Decision."

2) ERA- This is a pitcher's earned run average.  Basically, an earned run is a run that scores without the help of an error (so runs that come around on hits, walks, wild pitches, etc. are earned).  The important thing to remember with ERA is that it is scaled to 9 innings pitched: earned runs allowed, divided by innings pitched, times 9.

3) WHIP- The stat is the formula: the number of Walks + Hits divided by the number of Innings Pitched.  Essentially, this is the number of batters a pitcher allows to reach base in the "earned" way each inning (there are exceptions, but that is the general idea).

4) K/BB- This is a pitcher's strikeout to walk ratio.  Basically, you want to have your pitcher strike out a lot of guys so the ball isn't put into play, and you don't want to walk batters.  This just gives you an idea of how good he is at doing those things.

Now that you know these four, you might be asking what a "good" stat line for a pitcher would be.  Well, that's a vague term, but I'll give what I would consider to be a "good" stat line for a National League pitcher (adjust the stats a little higher for the AL because of the DH):

12 wins/ 3.80 ERA/ 1.25 WHIP/ 2.5 K/BB---This stat line is good enough to be "good", but won't put the pitcher among the NL's best.
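If you'd rather see the arithmetic than take my word for it, here's a minimal Python sketch of ERA, WHIP, and K/BB as described above.  The function names and the season totals are made up for illustration (no real pitcher is being quoted), and innings are passed in as a plain decimal number.

```python
def era(earned_runs, innings):
    """Earned runs allowed, scaled to a 9-inning game."""
    return 9 * earned_runs / innings


def whip(walks, hits, innings):
    """Walks plus hits allowed per inning pitched."""
    return (walks + hits) / innings


def k_per_bb(strikeouts, walks):
    """Strikeout-to-walk ratio."""
    return strikeouts / walks


# Hypothetical season totals, invented for this example:
# 200 IP, 84 earned runs, 190 hits, 60 walks, 150 strikeouts.
print(round(era(84, 200), 2))        # 3.78
print(round(whip(60, 190, 200), 2))  # 1.25
print(round(k_per_bb(150, 60), 2))   # 2.5
```

That made-up season lands right around the "good" stat line above.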

Batters (known as the "Slash Line"):
1) BA- This is a player's batting average.  Very simply, it is the ratio of the number of hits a hitter gets to how many at bats it takes him to get those hits (an at-bat is any plate appearance that results in a hit, out, error, or fielder's choice).  This tells you how good the batter is at getting hits.
2) OBP- This is a little more telling than BA.  On-Base Percentage tells you how often a player reaches base in the "earned" way, written as a decimal.  It is Walks + Hits + Hit by Pitch, all divided by plate appearances (any time you step to the plate; strictly, the official denominator is at-bats + walks + hit by pitch + sacrifice flies, so sacrifice bunts don't count against you).
3) SLG- This is Slugging Percentage.  This stat tells you how "far" a player is going on the base paths every time he has an at-bat.  SLG is based on Total Bases, which is just the number of bases you earn.  A single is 1, double is 2, a triple 3, and a home run is 4.  To get a SLG, you divide the number of total bases by the number of at-bats.

What's good?

.270 BA/ .350 OBP/ .425 SLG
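Here's a rough Python sketch of the slash line, in case the formulas are easier to follow as code.  The function names and the season totals are hypothetical.  One assumption to flag: for OBP I'm using the official denominator (at-bats + walks + hit by pitch + sacrifice flies), which is slightly narrower than "every plate appearance."

```python
def batting_average(hits, at_bats):
    """Hits per at-bat."""
    return hits / at_bats


def on_base_percentage(hits, walks, hbp, at_bats, sac_flies):
    """Times on base (H + BB + HBP) over AB + BB + HBP + SF."""
    return (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)


def total_bases(singles, doubles, triples, home_runs):
    """1 for a single, 2 for a double, 3 for a triple, 4 for a homer."""
    return singles + 2 * doubles + 3 * triples + 4 * home_runs


def slugging(singles, doubles, triples, home_runs, at_bats):
    """Total bases per at-bat."""
    return total_bases(singles, doubles, triples, home_runs) / at_bats


# Hypothetical hitter, invented for this example:
# 550 AB, 150 H (100 singles, 30 doubles, 3 triples, 17 HR), 60 BB, 5 HBP, 5 SF.
ba = batting_average(150, 550)
obp = on_base_percentage(150, 60, 5, 550, 5)
slg = slugging(100, 30, 3, 17, 550)
print(f"{ba:.3f} / {obp:.3f} / {slg:.3f}")  # 0.273 / 0.347 / 0.431
```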

Now that you've got some general stats that we might use, let's move up to the intermediate level.  In our previews, you won't see us use Errors, Assists, and other basic fielding stats, because they aren't very telling and we are really against using them.  Here are some intermediate stats that can really help people analyze their favorite players:

1) ERA+...Wait, you just put a plus sign at the end?  Yeah, but that plus represents a lot of things.  ERA+ is an adjusted measurement of ERA.  Its goal is to compare a pitcher's ERA to the league average while accounting for the parks he pitches in.  It is on a percentage scale, and 100 is perfectly average.  So, if a pitcher has a 112 ERA+, he performed 12% better than the average pitcher in his league given his park environment.  Many people like to throw this in along with the raw ERA, because it saves you from having to make those adjustments yourself.
2) BABIP...BABIP is used in an attempt to measure the "luck" factor.  Say a pitcher has a really low ERA, and people have no idea why.  The explanation is that his Batting Average on Balls in Play is probably really low.  This means that he is recording a large number of outs even though hitters are making contact off of him.  A pitcher with a low BABIP will likely have a low ERA, and a pitcher with a high BABIP will likely have a higher ERA.  This is one trend that tends to normalize over a larger sample size, so if a pitcher has an abnormally low BABIP, expect that to start climbing along with his ERA.
3) Hit Tendencies ...These are important in knowing what a pitcher is likely to do in the future.  Giving up a lot of line drives is bad, giving up fly balls can lead to homers, and giving up ground balls tends to result in the least amount of damage.  Ground ball pitchers tend to have high BABIP values, but low ERA values, simply because they get hitters to pound the ball into the ground.  Flyball pitchers may have really low BABIP values, but really high ERAs, because they can give up more homers.
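For the curious, here's a minimal sketch of the ERA+ idea and the usual BABIP formula.  Heavy caveats: the ERA+ function is a simplified version (league ERA over pitcher ERA, times 100, with a hypothetical park_factor multiplier where 1.0 means a neutral park), not the exact calculation any particular site publishes.  The BABIP formula is the commonly cited one: hits minus homers, over at-bats minus strikeouts minus homers plus sacrifice flies.  All of the numbers are invented.

```python
def era_plus(era, league_era, park_factor=1.0):
    """Simplified ERA+: 100 is average, higher is better.

    park_factor is a hypothetical multiplier (above 1.0 for a
    hitter-friendly park), used here only to show the idea.
    """
    return 100 * league_era * park_factor / era


def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play (homers and strikeouts removed)."""
    return (hits - home_runs) / (at_bats - strikeouts - home_runs + sac_flies)


# Invented numbers: a 3.57 ERA in a league with a 4.00 ERA, neutral park.
print(round(era_plus(3.57, 4.00)))  # 112

# Invented numbers: 190 H, 20 HR, 760 AB against, 150 K, 5 SF.
print(round(babip(190, 20, 760, 150, 5), 3))  # 0.286
```

The same babip function works for hitters; just feed it the hitter's own totals instead of what a pitcher allowed.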

1) OPS+...Same concept as ERA+.  You take a hitter's OPS (OBP plus SLG) and adjust it to his league and his park.  However, be wary of this statistic.  If two guys are close in OPS+ values, take a gander at the individual OBP and SLG values.  Having an OBP-heavy OPS tends to correlate with scoring more runs, because you're on base more often.  A point of OBP is worth more than a point of SLG.
2) BABIP...Yes, you want to know what this is for both hitters and pitchers, because the "luck" factor knows no boundaries.  BABIP can be associated with "fluke" performance, but once again, be wary.  A contact hitter will have a higher BABIP, because he puts the ball in play more often.  A ground-ball slap hitter will tend to have a higher BABIP, a fly-ball power hitter will have a lower BABIP, and your Albert Pujols-type line-drive hitting machines will have a normalized BABIP, because they hit the ball all over the place.  The biggest thing about offensive BABIP is that it helps to know how often the hitter is making contact, so look at his contact rates and plate discipline.
3) ISO...Screw singles, we want to know how much POWER a guy has.  Take the singles out of the SLG equation (that is, SLG minus BA), and you have ISO.  For example, Jose Bautista, who led the league in HR, had an ISO of .357, which is better than the SLG for a good number of the league's players.  If a guy has a high ISO, he's getting a lot of extra-base hits.
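To make the hitter stats above concrete, here's a small sketch of OPS, a simplified OPS+, and ISO.  Assumptions to flag: the OPS+ function uses the commonly cited form 100 x (OBP/lgOBP + SLG/lgSLG - 1) and skips the park adjustment entirely, and the league averages and player line are invented numbers, not real data.

```python
def ops(obp, slg):
    """On-base plus slugging."""
    return obp + slg


def ops_plus(obp, slg, league_obp, league_slg):
    """Simplified OPS+ with no park adjustment: 100 is league average."""
    return 100 * (obp / league_obp + slg / league_slg - 1)


def iso(slg, ba):
    """Isolated power: slugging with the singles stripped out."""
    return slg - ba


def iso_from_hits(doubles, triples, home_runs, at_bats):
    """Same thing from counting stats: extra bases per at-bat."""
    return (doubles + 2 * triples + 3 * home_runs) / at_bats


# Invented hitter: .273/.347/.431 in a league that hits .330 OBP / .410 SLG.
print(round(ops(0.347, 0.431), 3))                  # 0.778
print(round(ops_plus(0.347, 0.431, 0.330, 0.410)))  # 110

# Invented hitter: 550 AB, 150 H, 30 doubles, 3 triples, 17 HR (237 total bases).
print(round(iso(237 / 550, 150 / 550), 3))          # 0.158
print(round(iso_from_hits(30, 3, 17, 550), 3))      # 0.158
```

Both ISO routes give the same answer, which is a nice sanity check that "SLG minus BA" really is just the extra bases.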

Okay, so you've seen some intermediate stats that aren't correlations.  You're simply seeing some adjusted and isolated statistics.  With a good knowledge of the first two tiers of statistics, you can probably show up most of your friends in class or your buddies over at the water cooler.  Well, it's time to bring you up to an advanced status.  These are the statistics sought by scouts, GMs, and SABR fanatics across the nation.  If you bring these up at the water cooler, you might want to invite people to sit down and discuss this over lunch.

1) xFIP...You might want to check out FIP first, but this is the adjusted version of that.  We are now in the spooky world of fielding-independent statistics.  Basically, we are taking that pesky game-saving or day-ruining defense out of the equation.  Focus on what a pitcher can control the most: where the ball goes and the patterns he throws certain pitches with.  We want to know, given a league-average defensive performance, how the pitcher did.  So, we look at his K rate, BB rate, Hit-By-Pitches, and Home Run rate (the "x" adjustment swaps his actual home run total for what a league-average home-run-per-fly-ball rate would predict).  After all, these are the things that a defense can't control.  xFIP will show you how a pitcher would have done if you put a group of robots in the field programmed to be average in every way on defense.
2) tERA/tRA...Okay, now we're getting pretty crazy with correlations.  Take xFIP and account for the correlations for ground balls, line drives, fly balls, and popups and suddenly you have created the monster that is tERA (true earned runs average).  The simplest way to describe this is that you are trying to take the kind of balls in play a pitcher gives up, account for K rates, BB rates, HBP, and HR rates, and then normalize all of that around a league average defense.  For some people, this is just where the line is crossed.  Don't worry, I feel you on this one.  It's possible to simply look at BIP types and come up with a logical conclusion, but if you get lazy, tERA does a good job.
3) WAR...Okay, so now we've gotten so far into a discussion that you're splitting hairs with your buddies and you want a good way to simply account for...well...everything.  That is what Wins Above Replacement tries to do.  If the definition of MVP holds true, the player with the highest WAR will win it every year.  WAR's basic idea is to try to account for how much better a player is than a "replacement player" at his position.  WAR is not an average, but a cumulative stat.  You can add and lose points of WAR.  For example, Alex Rodriguez once had a WAR over 11 (he was 11 wins more valuable than the replacement shortstop).  Other players have put up negative WAR values, which means that the team would be foolish to keep playing them at their positions (time to bring up the rookie in this case).
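If you want to see what "fielding independent" looks like as arithmetic, here's a minimal sketch of the commonly cited FIP and xFIP formulas.  Two things are assumptions on my part: the constant (it's recalculated every season to put FIP on the ERA scale, and 3.10 here is only a ballpark placeholder) and the league home-run-per-fly-ball rate (10% here, also a placeholder).  The pitcher's totals are invented.

```python
def fip(home_runs, walks, hbp, strikeouts, innings, constant=3.10):
    """Fielding Independent Pitching.

    constant is an assumed placeholder; the real value changes by season.
    """
    return (13 * home_runs + 3 * (walks + hbp) - 2 * strikeouts) / innings + constant


def xfip(fly_balls, walks, hbp, strikeouts, innings,
         league_hr_per_fb=0.10, constant=3.10):
    """xFIP: like FIP, but with expected homers instead of actual homers.

    Expected homers = fly balls allowed x an assumed league HR/FB rate.
    """
    expected_hr = fly_balls * league_hr_per_fb
    return fip(expected_hr, walks, hbp, strikeouts, innings, constant)


# Invented pitcher: 200 IP, 20 HR, 60 BB, 5 HBP, 150 K, 220 fly balls allowed.
print(round(fip(20, 60, 5, 150, 200), 2))    # 3.88
print(round(xfip(220, 60, 5, 150, 200), 2))  # 4.0
```

In this made-up case the pitcher allowed fewer homers (20) than his fly balls would predict (22), so his xFIP comes out a bit higher than his FIP.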

1) wRC+...So now you want to know how many runs a player in your offense is creating.  Well, adjust your OBP to a wOBA (weighted on-base average), then convert that to the number of runs you have created (RC), and weight that against your peers.  Yes, this is a very complicated process and is not one you want to have to explain to everyone, so get the links ready.  wRC+ is a lot like OPS+: you are weighting the number of runs you have created against your peers.
2) Plate Discipline...This will allow you to call "bullcrap" on your friends if they say a guy has a "good eye" or a "bad eye" at the plate.  This is all about whether or not you swing at the pitches you should (and whether or not you hit them when you do swing).  If a guy has a high SwStr%, that mumbo-jumbo means he is swinging and missing at a lot of pitches (the league average was only 8.5%), so if you think guys are swinging and missing on a lot of pitches, you are caught in the moment.  This will help you break down how disciplined a player is.  Just remember one thing: you don't have to swing at everything in the zone, and you don't want to.
3) WAR...Yep, this goes both ways.  You want to know how much more valuable a hitter is than his AAA partner who would replace him should he go down.  Guys like Pujols put up WAR values between 7 and 10.  Legendary players have gone as high as 12 in an individual season.  On a 25-man roster, one guy solely accounting for 10 wins is pretty dang valuable.
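Here's a rough sketch of a few plate discipline rates and a stripped-down wRC+, just to show the shape of the math.  Lots of assumptions here: the rate definitions follow the usual conventions (swinging strikes per pitch seen, swings per pitch inside or outside the zone, contact per swing), the wRC+ function skips park adjustments entirely, and the wOBA scale, league wOBA, and league runs per plate appearance are placeholder values I picked for the example.  Every number is invented.

```python
def rate(events, opportunities):
    """Generic rate: how often something happened per chance."""
    return events / opportunities


def simple_wrc_plus(woba, league_woba, woba_scale, league_runs_per_pa):
    """Simplified wRC+ with no park adjustment: 100 is league average."""
    runs_above_avg_per_pa = (woba - league_woba) / woba_scale
    runs_created_per_pa = runs_above_avg_per_pa + league_runs_per_pa
    return 100 * runs_created_per_pa / league_runs_per_pa


# Invented hitter: 2400 pitches seen, 1100 in the zone and 1300 outside it.
# He swung at 720 in the zone and 360 outside, and whiffed 204 times.
swings = 720 + 360
print(round(rate(204, 2400), 3))             # SwStr%   0.085
print(round(rate(720, 1100), 3))             # Z-Swing% 0.655
print(round(rate(360, 1300), 3))             # O-Swing% 0.277
print(round(rate(swings - 204, swings), 3))  # Contact% 0.811

# Invented inputs: .400 wOBA, .320 league wOBA, 1.25 scale, 0.11 league R/PA.
print(round(simple_wrc_plus(0.400, 0.320, 1.25, 0.11)))  # 158
```

A guy with a low O-Swing% and a high Contact% is the one who actually has the "good eye" your friends keep talking about.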

Now, I for one don't consider any fielding metric really anything but advanced.  Most fielding metrics are very complicated and are based on things that most people won't understand.  Let's look at a few of them:

1) UZR...Basically, split up your field into zones and measure how well your defender holds his ground.  UZR takes into account almost every aspect of defense.  Throwing arm, range, double plays, and errors are all taken into account.  You want to know the basics: how well a guy can get to the ball, how accurate and strong his throwing arm is, how often he's able to make multiple outs, and how often he makes mistakes.  I personally am not a huge fan of UZR, but it's a good way to get a general sense of how good a fielder is.
2) RTOT...Not a really good link for this one.  Go to baseball-reference.com for more details on it (at the bottom of the page).  Go to any player page, and you'll find it under the defensive metrics.  Basically, this is an attempt to find out how many runs a player prevented above the average defender with his defense.  With defense, the best way to go about things is probably to read a scouting report, because defensive metrics can be pretty misleading.
3) +/-...This is a system developed to tell how many good plays a player made versus how many bad plays he made.  This system is actually run by scouts (professional ones) in order to tell how good a player was.  This really is the best way to tell how good a defender is.  Pick up a copy of the Fielding Bible, and you'll be able to own the dudes down at the water cooler.

Any questions?  Well, ask FanGraphs.  They know a lot about this statistical nonsense.

