I know I haven't been around much, but I've gotten big into sabermetrics and compiled some interesting data. I posted this to FanGraphs and Reddit and thought you guys would like to see it.

exOUT%

In the early 21st century, Oakland Athletics General Manager Billy Beane revolutionized baseball. He was the first general manager in baseball to rely heavily on sabermetrics in his baseball operations. This isn't a history lesson, though; I bring him up because of his idea that outs are precious, and that a hitter's goal is to not make an out, which is why he prioritized OBP so heavily. In the years since, baseball statistics have made phenomenal progress for both hitters and pitchers. While I believe FIP and xFIP are both very useful for measuring a pitcher's skill, my problem is that they essentially ignore all the batted ball data we have (GB%, FB%, LD%). SIERA and tERA have solved some of these problems, but they are far from perfect, and I believe the more statistics we have, the better. As I mentioned with Beane, we largely focus on a hitter's ability to not make outs, but we still don't have a catch-all statistic for how effective pitchers are at getting batters out, and if the batter's goal is to not make an out, the pitcher's goal is to get him out. So I present to you expected out percentage, or exOUT% (the name is certainly a work in progress).

exOUT% sets out to answer a simple question: for any plate appearance, what is the likelihood that the pitcher will get the batter out? You could approximate this by looking at a pitcher's opponent OBP, but that is rather primitive. We can get a better estimate by focusing on the pitcher's own skills (striking people out, not walking batters, and the type of contact he gives up) while trying to remove the effect of the defense behind him by using league averages.
To calculate a pitcher's exOUT%, I used K%, BB%, GB%, LD%, FB%, IFFB% (infield fly ball percentage), and 2014 league averages on ground balls, line drives, and fly balls. (HBPs are essentially ignored but can certainly be incorporated in a future version; this is very much exOUT% v0.01.) In full disclosure, I am not a statistician or anything close to one. Math and statistics are an area of interest, and I am currently pursuing a degree in math-economics, but I am far from a professional, so I recognize there are going to be errors in my data. This is an extremely rough version, and it even combines data from this year and last year, so there will be inconsistencies, because I don't have the resources to gather all the data I need. If after reading this you are interested and would like to take this further, please feel free to contact me, whether or not you have the skills to advance it.

I will first post a simple step-by-step breakdown of how to calculate exOUT%, and then go into more detail and walk you through it with Clayton Kershaw, because, well, he is awesome.

1- Add K% and BB% and subtract the total from 100%. This leaves you with a balls-in-play percentage; let's call it BIP%.
2- Multiply the pitcher's GB% (as a decimal, for example 40% is .4) by BIP% (left between 1 and 100, for example 40). This gives you a GB% for all PAs, not just balls in play; we'll call this overall GB%, or oGB%. Now multiply oGB% (still between 1 and 100) by the league average rate at which ground balls don't go for hits, as a decimal. (The league batting average on ground balls in 2014 was .239, so the out rate on ground balls is 76.1%, or .761.)
This gives you a percentage you can leave between 1 and 100: if the number is 20%, there's a 20% chance the pitcher will induce a ground ball out in that PA, assuming league average defense. We can assume that because we're using the league batting average on ground balls. We'll call this exgbOUT%.
3- Follow the same steps with LD% to get exldOUT%, the chance in any given PA that the pitcher will produce a line drive out. (The league batting average on line drives last season was .685 (!), so there is only a 31.5% chance that a line drive will result in an out.)
4- Do roughly the same thing with FB%, except that we also want to incorporate IFFB%. Multiply the pitcher's FB% by his IFFB%; this gives you the percentage of balls in play that are infield fly balls (bipIFFB%). Multiply that by his BIP% to get the percentage of all PAs that end in an infield fly. This is also his exiffbOUT%, because any infield fly should be converted into an out, and if it isn't, it's no fault of the pitcher, so we won't punish him. Next, subtract the pitcher's IFFB% from 1 (or 100) and multiply by his FB%; this is the percentage of balls in play that are normal fly balls to the outfield. Multiply this by his BIP% to get his overall normal FB%, covering all PAs rather than just balls in play. Multiply this by .793 (the league batting average on fly balls in 2014 was .207, so there's a 79.3% chance that a fly ball will result in an out). The result is the chance that, in any given PA, the pitcher will produce a fly ball out to the outfield; call it exnfbOUT% (n for normal). Add exnfbOUT% and exiffbOUT% and you have his exfbOUT%, the chance that in any given PA the pitcher will produce a fly ball out, to the infield or the outfield.
5- Add K% + exgbOUT% + exldOUT% + exfbOUT%.
6- You have your exOUT%.

The terms are not that technical or scientific, so as not to confuse anyone; I tried to simplify a complicated procedure as much as possible.

To clarify, let's go through an example with Clayton Kershaw. Kershaw's profile (data compiled on 8/21): 32.3 K%, 4.9 BB%, 52.8 GB%, 26 FB%, 11.8 IFFB%, 21.2 LD%.

First, the plate appearances that don't put a ball in play: strikeouts and walks. Adding the two, the not-in-play percentage is 37.2. The 4.9% that are walks won't be outs, and the 32.3% that are strikeouts will be. So far, Kershaw's exOUT% is 32.3 (of a possible 37.2).

Now let's look at the balls in play. People will often say that a pitcher can't control what happens once a ball is in play, but I vehemently disagree: the type of contact the pitcher gives up can't be ignored, and it largely determines what happens to the ball in play. I will quote a FanGraphs article to explain: "Generally speaking, line drives go for hits most often, ground balls go for hits more often than fly balls, and fly balls are more productive than ground balls when they do go for hits (i.e. extra base hits). Additionally, infield fly balls are essentially strikeouts and almost never result in hits or runner advancement." FanGraphs also gives us this data from 2014:

GB: AVG .239, ISO .020, wOBA .220
LD: AVG .685, ISO .190, wOBA .684
FB: AVG .207, ISO .378, wOBA .335

So fly ball pitchers are the most likely to get outs, although they may be less effective overall, because when they don't get outs, the damage is greater than it is for ground ball pitchers. But remember, this statistic only estimates the chance that the pitcher will get a hitter out. Alright, let's calculate Kershaw's exgbOUT%, exldOUT%, and exfbOUT%; you can follow the numbers along with the steps listed above.
GB% = 52.8
62.8 x .528 = 33.1584
33.1584 x .761 = 25.2335424 (exgbOUT%)

LD% = 21.2
62.8 x .212 = 13.3136
13.3136 x .315 = 4.193784 (exldOUT%)

FB% = 26
26 x .118 = 3.068 (bipIFFB%)
26 x .882 = 22.932 (bipFB%)
62.8 x .22932 = 14.401296 (onFB%)
14.401296 x .793 = 11.4202277 (exnfbOUT%)
62.8 x .03068 = 1.926704 (oIFFB%, which is also exiffbOUT%)
exnfbOUT% + exiffbOUT% = 13.3469317 (exfbOUT%)

Now add them all up:
32.3 + 25.2335424 + 4.193784 + 1.926704 + 11.4202277 = 75.07%
K% + exgbOUT% + exldOUT% + exiffbOUT% + exnfbOUT% = exOUT%

The league average exOUT%, using 2014 league average statistics for the inputs, is 69.8%. Scherzer leads the majors (well, the 89 pitchers whose data I was able to export from FanGraphs) with a 76.43 exOUT%. If you want to look at it as a more concise and better version of opponent OBP, his is .236 (1 - .7643), so, you know, good. Here is a picture of the data for the top 37 (http://imgur.com/um42PkI); the J column is the one you're looking for. Betances is in there because I wanted to calculate one reliever.

Alright, I explained it a bit in the prologue, but now that you've seen it, let me explain more about why I like this stat. Well, first, I created it and calculated it, so, well, yeah... but I also like this stat because it answers a very simple question: "How good is a pitcher at getting people out?" Pitching, in its simplest form, is exactly that: getting people out. The stat recognizes that there are basically only five outcomes for a plate appearance (strikeout, walk, ground ball, line drive, fly ball) and looks at the pitcher's stats in these categories to determine how many people he should be getting out.
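For anyone who wants to reproduce the calculation outside a spreadsheet, here is a minimal Python sketch of steps 1 to 6. The function names and the league-rate dictionary are hypothetical (mine, not from any library). It assumes the 2014 league batting averages quoted above (.239 GB, .685 LD, .207 FB), counts every infield fly as an out, and, like v0.01, ignores HBPs. All inputs are percentages on the 0-100 scale.

```python
# 2014 league batting averages by batted-ball type, as quoted from FanGraphs above.
LEAGUE_AVG_2014 = {"GB": 0.239, "LD": 0.685, "FB": 0.207}


def ex_out_components(k, bb, gb, ld, fb, iffb, league_avg=LEAGUE_AVG_2014):
    """Split exOUT% into its pieces, each as a percentage of all plate appearances.

    k and bb are percentages of PAs; gb, ld and fb are percentages of balls in
    play; iffb is the percentage of fly balls that are infield flies.
    """
    bip = 100.0 - k - bb                               # step 1: BIP%
    ex_gb = bip * gb / 100 * (1 - league_avg["GB"])    # step 2: exgbOUT%
    ex_ld = bip * ld / 100 * (1 - league_avg["LD"])    # step 3: exldOUT%
    fb_all = bip * fb / 100                            # fly balls as a share of all PAs
    ex_iffb = fb_all * iffb / 100                      # step 4: every infield fly is an out
    ex_nfb = fb_all * (1 - iffb / 100) * (1 - league_avg["FB"])  # outfield fly outs
    return {"K": k, "GB": ex_gb, "LD": ex_ld, "IFFB": ex_iffb, "NFB": ex_nfb}


def ex_out_pct(k, bb, gb, ld, fb, iffb, league_avg=LEAGUE_AVG_2014):
    """Steps 5-6: strikeouts plus the expected batted-ball outs (HBPs ignored)."""
    return sum(ex_out_components(k, bb, gb, ld, fb, iffb, league_avg).values())


# Kershaw's line as of 8/21, from the walk-through above.
kershaw = dict(k=32.3, bb=4.9, gb=52.8, ld=21.2, fb=26.0, iffb=11.8)
print(round(ex_out_pct(**kershaw), 2))
```

With Kershaw's numbers this prints 75.07, matching the hand calculation, and 1 - exOUT%/100 gives the opponent-OBP-style figure used for Scherzer.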
The stat is more predictive than evaluative, because you can calculate a pitcher's actual out percentage, but that doesn't come close to telling the whole story: a lot of luck is involved with balls in play and other fluky outcomes. exOUT% operates on the assumption that a ground ball will perform the way the average ground ball does, a line drive the way an average line drive does, and a fly ball the way a typical fly ball does. Some guys could be getting very fortunate: on ground balls, by having a great infield behind them and balls not squeaking through the holes; on line drives, by having them hit right at people; and on fly balls, by having them stay in the park and having outfielders who cover a lot of ground. And some guys could be getting unlucky: ground balls are getting through the holes, the infielders don't have range, line drives always seem to go for hits, and fly balls are falling in. exOUT% says that a pitcher can't control that, but he can control how often he strikes people out, how often he walks people, and how often he gives up ground balls, line drives, and fly balls, and if those balls in play behaved the way they should, the pitcher should be getting this percentage of people out.

Now for the flaws I have found with it. As important as getting people out is, what happens in the plate appearances that don't end in outs is sometimes almost as important. exOUT% only deals with batting average on balls in play, but wOBA matters too. Fly balls are more likely to be outs than ground balls, but the wOBA on fly balls is over 100 points higher (.335 vs. .220).
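To make the wOBA flaw concrete, here is a small Python sketch that uses only the 2014 FanGraphs batted-ball figures quoted earlier. exOUT% effectively keeps just the out-rate column (1 - AVG); the wOBA column, which it ignores, points the other way for fly balls versus ground balls. The names are illustrative only.

```python
# 2014 league figures by batted-ball type, as quoted from FanGraphs above.
BATTED_BALL_2014 = {
    "GB": {"avg": 0.239, "woba": 0.220},
    "LD": {"avg": 0.685, "woba": 0.684},
    "FB": {"avg": 0.207, "woba": 0.335},
}


def out_rate(kind):
    """Share of this batted-ball type that becomes an out (1 - AVG), the only part exOUT% uses."""
    return 1 - BATTED_BALL_2014[kind]["avg"]


for kind, stats in BATTED_BALL_2014.items():
    print(f"{kind}: out rate {out_rate(kind):.3f}, wOBA {stats['woba']:.3f}")

# Fly balls become outs more often than grounders, yet their wOBA is far higher.
gap = BATTED_BALL_2014["FB"]["woba"] - BATTED_BALL_2014["GB"]["woba"]
woba_gap_points = round(gap * 1000)
print(f"FB wOBA exceeds GB wOBA by {woba_gap_points} points")
```

Fly balls come out ahead on out rate (.793 vs. .761) but 115 wOBA points behind, which is exactly the information a wOBA-based version would need to capture.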
Additionally, instead of ground balls, line drives, and fly balls, I'd prefer to use soft, medium, and hard contact, because that is a truer test of pitcher skill. However, I did not have league averages for the batting average on soft, medium, and hard contact at my disposal (if someone does, please contact me, like I said). So what I have will do for now, and this batted ball data is still a good measure. I set out to calculate what percentage of batters a pitcher should be getting out, and that is exactly what I found. So while it's not perfect, it has its uses, and it's something to build on.

exRP27

And build on it I did. While the out percentage is nice, it doesn't give us a measure like ERA, FIP, or xFIP that tells us how many runs a pitcher should be giving up. So, using the data I used to calculate exOUT%, I present to you exRP27 (expected runs per 27 outs; a stupid name for a hopefully not-stupid stat). The basis for this stat is this data from FanGraphs: "Line drives are death to pitchers, while ground balls are the best for a pitcher. In numerical terms, line drives produce 1.26 runs/out, fly balls produce 0.13 R/O, and ground balls produce only 0.05 R/O." (I don't know how this was calculated or what period it applies to, but this is what I found.) We don't have this for soft, medium, and hard contact, so again I'm sticking with ground balls, line drives, and fly balls. Using these rates and the pitcher's K%, BB%, GB%, LD%, and FB%, I calculate how many runs the pitcher should allow over 27 outs, and then adjust the result to put it on a scale similar to ERA, FIP, and xFIP.

Keeping Clayton Kershaw as our example, let's take a look. Kershaw's K% is 32.3; multiplying this (as a decimal) by 27 (outs in a game) gives 8.721 strikeouts, and 0 runs so far, because a strikeout will never produce a run.

Now GB%.
His exgbOUT% is 25.2335424; multiplying it (as a decimal) by 27, we get about 6.8 (the final number will be exact via the Excel doc). Multiply this by .05 (the runs per GB out) and we get .34 runs.
LD%: his exldOUT% is 4.193784; multiply by 27 to get 1.13232168, then multiply this by 1.26 (LD runs/out) and we get 1.43 runs.
FB%: his exfbOUT% is 13.3469317; multiply by 27 to get 3.6, then multiply that by .13 and you get .47 runs.

Add up all these exRUNS and Kershaw's total is 2.24. However, we can't stop here, because the number of outs he's recorded is only about 20.3 (8.7 + 6.8 + 1.1 + 3.6, rounded). So scale this 20.3 (or whatever the pitcher's exOUTS is) up to 27 by multiplying by whatever it takes (27/20.3), and then multiply his exRUNS by that same number. For Kershaw you end up with 2.97, which I'll call exRP27unofficial. The league average would be 3.78. Last year's average ERA/FIP/xFIP was 3.74, but when I adjust everything to that, everyone's exRP27 just goes down slightly (Kershaw's from 2.97 to 2.94). I want it on a more realistic scale, where everyone's totals are lower and a really good exRP27 is comparable to a really good FIP, in the low 2s.

I don't know what the statistically correct way to do this is, but here is what I did to make it work. I calculated what his "ERA" would be by multiplying his exRUNS by 9 and then dividing that by his exOUTS. His was .99; the league average was 1.26. I then divided .99 by 1.26 to get about .79, and multiplied his exRP27unofficial by that to get 2.34. I felt this was more realistic and more in line with his ERA/FIP/xFIP. Obviously they can't match exactly, because they measure different things, but this puts it in the same range. The same is done for every pitcher, each with his own multiplier rather than Kershaw's. Here is the leaderboard for that (S column); you will need to zoom in.
http://imgur.com/1AvTDiG

I really like this stat a lot, and I feel like it does what I wanted it to: figure out how many runs a pitcher should allow per 27 outs given his K%, BB%, GB%, LD%, and FB%, on the notion that balls in play will behave the way they normally do, since anything else is likely luck and not indicative of the pitcher's performance.

Sonny Gray is someone this stat is perfect for. His ERA is outstanding at 2.04, but his FIP is 3.00, his xFIP is 3.47, and his SIERA is 3.50. The problem, at least with FIP and xFIP, is that they ignore what happens when the ball is in play. He doesn't strike out too many people, his BB% is good but not spectacular, and he's given up 10 home runs, a fair amount, so this hurts his FIP and the rest. However, instead of saying "well, he will regress, look at his FIP/xFIP/SIERA," exRP27 looks at why he's having this success, and it has to do with the balls in play, which are getting ignored. Gray's LD% is just 14.6! That is really good, second best of the 90 pitchers I did this for. And his GB% is 54%, 9th best, also really good. The pitcher does have control over the type of contact he allows, and the fact that Gray is producing a ton of ground balls and very few line drives is why he's been so successful. His 2.34 exRP27 suggests that he has not been as good as his 2.04 ERA suggests, but he's not as far off as the other stats suggest.

Obviously exRP27 is far from perfect, and it is in no way supposed to replace FIP/xFIP/SIERA, but it is something to look at alongside them. I am a big believer in aggregation, so I think that averaging some or all of these four stats together is an even better way to evaluate a pitcher. We've got more data than ever, so it makes sense to use it; exRP27 and exOUT% are just more examples of using this data to evaluate pitchers better. I hope you guys enjoyed this. If you have any feedback, please comment or contact me.
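The exRP27 arithmetic from the Kershaw walk-through can be sketched in Python as well. This is a minimal sketch under stated assumptions: the FanGraphs runs-per-out rates quoted above (GB .05, LD 1.26, FB .13, with the FB rate applied to all fly ball outs including infield flies, as in the walk-through), zero runs for strikeouts, and the league-average pseudo-"ERA" of 1.26. The function and constant names are hypothetical. It takes K% and the exgbOUT%/exldOUT%/exfbOUT% pieces as inputs, on the 0-100 scale.

```python
# Runs per out by batted-ball type, as quoted from FanGraphs above.
RUNS_PER_OUT = {"GB": 0.05, "LD": 1.26, "FB": 0.13}
# League-average pseudo-"ERA" (exRUNS * 9 / exOUTS) reported in the post.
LEAGUE_PSEUDO_ERA = 1.26


def ex_rp27(k, ex_gb, ex_ld, ex_fb, league_pseudo_era=LEAGUE_PSEUDO_ERA):
    """Return (exRUNS, exOUTS, unofficial exRP27, scaled exRP27).

    Inputs are percentages of plate appearances (0-100): K% and the
    exgbOUT%, exldOUT% and exfbOUT% pieces of exOUT%.
    """
    # Turn each expected-out percentage into outs over 27, as in the walk-through.
    outs = {"K": k * 27 / 100, "GB": ex_gb * 27 / 100,
            "LD": ex_ld * 27 / 100, "FB": ex_fb * 27 / 100}
    ex_outs = sum(outs.values())
    # Strikeouts never produce runs; the other outs use the runs-per-out rates.
    ex_runs = sum(outs[kind] * rate for kind, rate in RUNS_PER_OUT.items())
    # Scale the runs up to a full 27 outs.
    unofficial = ex_runs * 27 / ex_outs
    # Rescale by the pitcher's pseudo-"ERA" relative to the league's.
    pseudo_era = ex_runs * 9 / ex_outs
    scaled = unofficial * pseudo_era / league_pseudo_era
    return ex_runs, ex_outs, unofficial, scaled


# Kershaw's K% and expected-out pieces from the walk-through above.
runs, outs, unofficial, scaled = ex_rp27(32.3, 25.2335424, 4.193784, 13.3469317)
print(f"exRUNS {runs:.2f}, exOUTS {outs:.2f}, unofficial {unofficial:.2f}, exRP27 {scaled:.2f}")
```

With Kershaw's numbers it gives about 2.24 exRUNS over 20.27 exOUTS, an unofficial figure of about 2.98, and a scaled exRP27 of about 2.35; the 2.97 and 2.34 in the walk-through differ only in the last digit, presumably from rounding along the way. Note that the initial multiplication by 27 cancels out of the unofficial figure, so it is simply 27 times the expected runs per expected out.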
Next I will be looking at exWOBA against for pitchers using similar data, and at exWOBA for hitters using the same kind of data.
Your long, hard devotion to diligence in your work is honorable..... Your view of metrics is, for most, too much information (TMI)..... In other words, your work is good, but way too long for any attention span other than Bill James's, and he is who you should be addressing, instead of us..... Stats are good for many things... metrics can and have been used to make a rose smell like shit, and shit smell like perfume. Some use of metrics is often required, i.e., "If you can measure it, you can see it, and if you can see it, you can fix it"..... However good your intent, you should seriously consider submitting your work to Bill James (of the Abstract). You may even find an email for his publishing firm. I'm sure his snail mail is available, perhaps through his publisher, or the BoSox.....? PS: best of luck to ya'.....
Good to see you tsky! Holy crap.... Could you give a key to what all the abbreviations used stand for? What is 1FFB? Expected anything isn't a stat... only what has been done is. So I'd remove the "ex" portion.... The rest I'll actually need to sit at a computer & read with the key before I can adequately respond.
For someone who claims he is not a statistician, nor even close to one, I'd say you've done your homework well. I'll second that it's good to see you on the board.
So, tsky, please explain how the Yankees keep a dud like Drew on their roster... statistically speaking? lol PS: I know you are concentrating on pitching, and I apologize for that. Yet, are there stats that keep him on the Yankee roster? Just a curious thought!
...guys, fwiw, I would not look for the OP to post here on a regular basis...he spams essentially the same thing every so often on multiple boards.
not sure I'd call this spam but okay... I am a big yankee fan/baseball fan and like statistics. Just because I don't post here all the time doesn't mean I can't share some of my findings if I feel like they are interesting
I can't hear you, can you look this way so I can read your lips, and I hope you don't mind me writing out what I can't say.......!