I’ve been pretty vocal in the past about my skepticism toward using college production for NFL evaluation purposes. But I believe in being informed and personally testing everything, even methodologies I don’t buy into. So I started to look into production so I could see exactly how effective or ineffective it truly could be on a macro scale.
I started at the quarterback position. And I’ll be honest, I thought that I would find that college production was not predictive for signal callers. I could not have been more off-base. When optimized, college production can be very predictive at quarterback.
Before you jump around and yell about “box score scouting” — believe me, that’s the last thing I buy into — keep in mind that I am a tape evaluator first and foremost. Using film to identify translatable traits is vital to the NFL evaluation process and is always the first thing that should be looked at when analyzing a player. And when I say production is predictive, I don’t mean your typical passing yards and touchdowns. In fact, typical raw box score numbers have minuscule correlations to NFL success since 1999:
MAVPY stands for Modified Approximate Value Per Year. You can read about it and the MAVEM system here.
Situationally Enhanced Metrics
So, if these box-score numbers are essentially worthless, what do I mean by “production can be predictive at QB when optimized”? I am referring to Situationally Enhanced Metrics rather than raw statistics, meant to isolate player performance by accounting for external factors outside a player’s control. Put simply, these metrics are created by adding numerical context to position-specific rate statistics, to provide a clearer picture of how a player performs in their respective environment. Here are the enhancements applied to our QB rate stats:
Accounting for level of competition: To account for the level of competition differences among players, all of the rate stats used are given a conference-based adjuster.
Accounting for era: To account for the period in time in which a prospect played, a year-based adjustment is applied to all the quarterback rate stats used in the dataset.
Accounting for surrounding talent: To account for supporting cast, the draft capital spent on a prospect’s college teammates is added up and factored into the dataset. For current prospects, a consensus big board is used to determine projected draft position.
Accounting for pace and translatable schemes: This adjustment requires a bit more explanation. Quantifying the translatability of a college offense to the NFL is difficult to nail down. To remove the subjectivity and time cost associated with manually tagging each team’s scheme in every season, we can use an offense’s era-adjusted plays per game instead. No solution will be perfect for tackling this, but using adjusted plays per game makes sense from a methodological perspective (because gimmicky offenses run more plays). And over a large dataset, it works quite well as a macro solution and captures many of the elements we are looking to account for.
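To make the four enhancements above concrete, here is a minimal sketch of how they could be chained onto a single rate stat. To be clear, this is not the actual SEMTEX formula: the multiplicative form, the `enhanced_ypa` function, the `QBSeason` fields, and every factor value are hypothetical placeholders chosen purely for illustration.

```python
from dataclasses import dataclass

# Every number below is a made-up placeholder, not a SEMTEX value.
CONFERENCE_FACTOR = {"SEC": 1.05, "MAC": 0.92}   # assumed level-of-competition adjusters
ERA_BASELINE_YPA = {2005: 7.0, 2015: 7.4}        # assumed FBS-average yards per attempt
ERA_BASELINE_PLAYS = {2005: 68.0, 2015: 72.0}    # assumed FBS-average plays per game


@dataclass
class QBSeason:
    name: str
    year: int
    conference: str
    yards_per_attempt: float
    team_plays_per_game: float
    teammate_draft_capital: float  # summed draft value of college teammates (assumed scale)


def enhanced_ypa(s: QBSeason, cast_weight: float = 0.001, pace_weight: float = 0.5) -> float:
    """Apply era, conference, supporting-cast, and pace adjustments in sequence."""
    # Era: express the stat relative to that season's average (1.0 = average).
    value = s.yards_per_attempt / ERA_BASELINE_YPA[s.year]
    # Level of competition: scale by a conference-based adjuster.
    value *= CONFERENCE_FACTOR[s.conference]
    # Surrounding talent: discount passers with more drafted teammates.
    value *= 1.0 - cast_weight * s.teammate_draft_capital
    # Pace: penalize offenses running more plays than the era-adjusted norm.
    pace_ratio = s.team_plays_per_game / ERA_BASELINE_PLAYS[s.year]
    value *= 1.0 - pace_weight * max(0.0, pace_ratio - 1.0)
    return value
```

Under these assumptions, two passers with identical raw yards per attempt come out with different enhanced values once conference, supporting cast, and pace are taken into account, which is the whole point of the adjustments.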
After applying these adjustments to our positional rate statistics, Situationally Enhanced Metric Testing for Efficiency and eXplosion (SEMTEX) is conducted. SEMTEX’s testing parameters are inspired by machine learning protocols, and when run over this enhanced production data set, indicate quarterbacks with the highest and lowest chances of NFL success based on past data. These protocols are trained on the MAVPY values discussed above, as well as Draft Capital Adjusted Returns (DCAR) for players since 1999. If you aren’t familiar, DCAR is a MAVPY-based metric that evaluates a player relative to where they were selected in the draft.
SEMTEX separates Football Bowl Subdivision (FBS) passers into three different groups based purely on their college enhanced production metrics. SEMTEX is limited to FBS prospects, as inconsistent stat keeping at the lower levels makes it hard to not only find accurate historical stats, but to also apply sufficient situational enhancements to said stats as well. We will call these three groups the Gold, Silver, and Bronze buckets for now. The premise is that members of the Gold group are the most likely to succeed in the NFL, while Bronze passers have the lowest likelihood of becoming starters at the next level.
SEMTEX Gold QBs
Here is every Gold passer drafted since 1999, sorted by their NFL MAVPY output. Remember that MAVPY is not a part of SEMTEX formulation, but rather a way to evaluate the success of Gold, Silver, and Bronze buckets quantitatively after the fact. MAVPY is generated in the pros, and SEMTEX is meant to be applied to college QBs before they are drafted:
Italics Indicate Position Conversion
If you aren’t familiar with MAVPY, in very general terms, a MAVPY of around 4 is NFL starter level, while a MAVPY of 8 is around Pro Bowl level. Keep in mind that MAVPY is an average of a player’s outputs over his career. Obviously this hurts players like Tyrod Taylor, who were backups for multiple seasons.
For readability and relevancy purposes, this list is all the Gold quarterbacks who have been drafted. In total, there have been 131 college quarterbacks in the Gold SEMTEX category since 1999, and when we evaluate success and starter rates for each group later on, these UDFAs will be included. Notable Gold UDFAs include Case Keenum, Billy Volek and Kellen Moore.
I’ve noticed that a good chunk of the failures in the Gold group fall below the lower bound QB threshold in Slaytics. As much sense as it makes to filter out lower bound passers who don’t have the baseline physical traits to be NFL starting QBs, I plan to keep Slaytics and SEMTEX as separate standalones, and won’t intertwine them here.
The main reason for this is that SEMTEX doesn’t cover other positions at the moment, and we are a ways from knowing whether it will work at other spots. If we did blend the systems together, though, David Carr, Brian Brohm, Garrett Grayson, Greg McElroy, Case Keenum (UDFA), Kellen Moore (UDFA) and Graham Harrell (UDFA) would be removed from the Gold list. So obviously, there is some credence to the idea that removing these players would be valuable, and it’s something I will be monitoring as an evaluator.
The position converts will be included in all these lists and listed in italics. The line between a quarterback and a position convert can get muddy at times, so to keep it simple, we will just keep everyone in. Isaiah Stanback and Pat White are the main Gold examples, as they were college QBs even though Stanback was drafted as a WR and White had more than four times as many carries as throws in the NFL. White was intended to be a QB, though, so he straddles the line enough that he isn’t in italics.
SEMTEX Silver QBs
Here is every Silver passer selected since the 1999 NFL Draft, also sorted by MAVPY:
Italics Indicate Position Conversion
I like to call this the “gunslinger” group due to the type of passers it attracts. Stylistically, these are largely volume passers. The Silver group has a few Pro Bowl-caliber passers, but with this being the intermediate group, traits like mental make-up and supreme arm talent become vital for prospects in this group to succeed as volume types. Winston, Stafford, Palmer, and Cutler fall into the elite-level arms and body types category, with cannons and frames to match.
On the other hand, passers like Bulger, Hasselbeck, and Cousins proved their intellectual acuity with pro-style concepts in college, then paired up with innovative offensive minds like Mike Martz, Mike Holmgren, and Jay Gruden in the NFL to fully weaponize that ability. While these types of passers likely won’t win you a Super Bowl by themselves, these play caller/quarterback duos are a great way to build an identity for teams struggling to find one. Obviously, though, it takes the right offensive coordinator to build a unique scheme that showcases a passer’s strengths while hiding their flaws, as in those cases.
If you don’t evaluate a Silver passer as clearly fitting one of those buckets, or lack the offensive system to build around the latter, I would recommend avoiding those prospects. Looking at you, Jimmy Clausen.
Since 1999, there have been 112 college passers in the Silver group. Notable UDFAs include Vernon Adams, Brett Smith, Timmy Chang, Brian Hoyer, Cleo Lemon, and Anthony Wright.
Obviously, Antwaan Randle El was a starter-level player at WR, not QB. Braxton Miller is following this path as well. They are both italicized to indicate their position conversion.
SEMTEX Bronze QBs
Here is every Bronze passer drafted since 1999:
Italics Indicate Position Conversion
Since 1999, there have been 865 college QBs in the Bronze category. Notable UDFAs include Chase Daniel, Matt Moore, Connor Halliday, Taylor Heinicke, Seth Doege, Matt McGloin, Caleb Hanie, Tim Hasselbeck, Chris Leak and Juice Williams.
Of the drafted Bronze prospects that have a MAVPY above zero, 10 of them are italicized position conversions. After factoring out these position converts, only three passers have a MAVPY above starter level out of the 865 quarterbacks in the Bronze group since 1999.
One of them is Josh Freeman, a first-rounder who didn’t even make it through his rookie contract in Tampa Bay. The main reason he is over starter level is how quickly he left the league after a couple of successful seasons early in his career. He did not outperform his draft capital.
The second one is David Garrard, who continues to spit in the face of predictive thresholds. In case you don’t know, Garrard is the only starter-level QB under the lower bound Slaytic threshold since 1999, and continues to be a special exception to most rules.
The last and by far biggest outlier of the Bronze group is Ryan Tannehill.
Why did he slip through the cracks?
Looking into the context of his college background may help answer that. Tannehill actually played WR his first two and a half seasons at Texas A&M, getting only one full year to start at quarterback for the Aggies. It is not like Tannehill was starting from scratch and had never played the position before, though. He played quarterback as well as safety in high school. But the lack of quarterback reps over those two-plus years may have set his long-term development behind, and the break likely hurt his college stats as well.
Now these aren’t excuses. Tannehill is certainly a big miss, and his success shows the Bronze filter is not airtight, but his unique circumstances may have contributed to his placement in the bottom grouping.
Machine Learning Protocols
Now, before we quantifiably evaluate our threshold groups with MAVEM valuation metrics, let’s circle back to something we briefly talked about at the beginning: machine learning. Or more specifically, the potential for overfit models in machine learning.
Overfitting is an issue that can occur when machine learning protocols match their training dataset too well and make random noise from the dataset part of the model rather than just translatable data. This is an exaggerated example, but this would be like if a parameter was established that all QBs with the last name Leinart should be avoided. This would be a problem when Matt Leinart Jr. shows up in 7 years.
Now, this couldn’t happen with the machine learning protocols used here, as they are all heavily supervised for this purpose. Additionally, as mentioned before, SEMTEX parameters are inspired by machine learning protocols, not completely generated by them. This distinction means I have to methodically buy into a parameter being viable when replicated with future draft classes for it to be included. This isn’t just a casually trained macro set, and I believe it takes a “football person” to fully construct and understand how to implement this strategy to full capacity.
The standard way to evaluate if a model is overfit is to separate a dataset into a training portion and a testing portion. As discussed, overfit issues can occur if the training portion makes up the whole dataset. This isn’t possible with Slaytics though, as measurable data doesn’t go far back enough to split. That is why Slaytic thresholds are so strictly supervised and have to make sense in a football capacity to be implemented.
The whole idea of using machine learning to inspire rather than generate parameters came from this issue. It is less of a problem with production, though, as the base statistics go much further back than athletic testing. We still take the same precautions in SEMTEX that we do with Slaytics, as we want these results to be as accurate as possible going forward, but we can also apply the tests trained on our post-1999 results to older data.
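For anyone who wants to picture the train/test idea, here is a minimal sketch of an era-based holdout check. It assumes each player has already been assigned a bucket by rules built only on the 1999-and-later group, and it simply compares starter rates per bucket between the two eras. The function names, the dictionary keys, and the exact 4.0 cutoff are my own illustrative assumptions, not part of SEMTEX or MAVEM.

```python
from collections import defaultdict

STARTER_MAVPY = 4.0  # "around 4" is starter level; using exactly 4.0 is an assumption


def split_by_era(players, cutoff_year=1999):
    """Training era: drafted in cutoff_year or later. Testing era: drafted before it."""
    train = [p for p in players if p["draft_year"] >= cutoff_year]
    test = [p for p in players if p["draft_year"] < cutoff_year]
    return train, test


def starter_rate_by_bucket(players):
    """Fraction of players in each bucket at or above the starter cutoff."""
    tallies = defaultdict(lambda: [0, 0])  # bucket -> [starters, total]
    for p in players:
        tally = tallies[p["bucket"]]
        if p["mavpy"] >= STARTER_MAVPY:
            tally[0] += 1
        tally[1] += 1
    return {bucket: starters / total for bucket, (starters, total) in tallies.items()}


def holdout_gap(players, cutoff_year=1999):
    """Testing-era starter rate minus training-era starter rate, per bucket.

    Large gaps would hint that the bucket rules fit noise in the training era.
    """
    train, test = split_by_era(players, cutoff_year)
    train_rates = starter_rate_by_bucket(train)
    test_rates = starter_rate_by_bucket(test)
    return {b: test_rates[b] - train_rates[b] for b in train_rates if b in test_rates}
```

If the bucket rules generalize, the per-bucket gaps should stay small; a Gold group that looks great on the training era but ordinary on the older era would be a red flag for overfitting.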
To fully accomplish evaluating this new time period, the MAVEM system had to be extended further back, as it went to 1999 originally as well. So, MAVEM now supports analysis all the way back to 1983. Now, let’s look at the SEMTEX results for passers in this older era:
SEMTEX Gold QBs (Pre-1999)
Notable Gold UDFAs in this testing era include Steve Young, Jeff Garcia and Mike Elkins.
SEMTEX Silver QBs (Pre-1999)
The Silver UDFA group includes Jake Delhomme, Mike Gundy and Darrell Bevell.
SEMTEX Bronze QBs (Pre-1999)
Damon Huard, Chad Hutchinson and Kirk Herbstreit were the most notable undrafted Bronze passers from the pre-1999 group.
As you can see, SEMTEX translated to our testing group very well. From 1983 to 1998, only one drafted Bronze passer turned into a starter-level QB. And that one starter wasn’t any of the six first-round, five second-round, seven third-round, or nine fourth-round Bronze passers selected over that span, but rather eighth-rounder Elvis Grbac, a journeyman who managed to stay above the starter baseline after starting a few years in Kansas City, partially because he never came back and played after being displaced.
Of the six first-rounders in this Bronze testing group, the biggest wastes of draft capital were Rick Mirer (second overall pick in 1993) and Todd Blackledge (seventh overall pick in 1983).
Looking at the player list for this Silver group, I believe the volume passer moniker still fits with the pre-1999ers. Top-level arm talent or elite structure paired with quick mental processing is needed for success in this group. Dan Marino certainly fits the bill, having one of the best arms ever, while the middle-tier guys showcase the latter portion.
The biggest wastes of draft capital among the Silver group were Ryan Leaf (second overall pick), Kelly Stouffer (sixth overall pick), and Andre Ware (seventh overall pick).
The Gold group has its fair share of busts as well, though. Heath Shuler (third overall pick) and David Klingler (sixth overall pick) lead the way in wasted draft capital, and even from our original post-1998 training group, there are misses like Blaine Gabbert and Joey Harrington as well. That said, just from analyzing the player lists, it is clear that if you want a starter- or Pro Bowl-level quarterback, you should avoid drafting Bronze passers and target Gold ones.
Using the MAVEM framework, we can back this up and quantify how each group has performed on a macro scale across the testing/trained eras, giving us a full look among drafted FBS quarterbacks since 1983:
Starter Percentage: The percentage of players in each group above starter level NFL MAVPY output.
Success Rate: The percentage of players in each group that have a positive DCAR, meaning they have outperformed their draft-capital investment.
Impact Score: The average of the top five MAVPY outputs in each grouping. These are meant to showcase the upside of prospects in these groupings. In very general terms, a MAVPY of around 4 is NFL starter level, while a MAVPY of 8 is around Pro Bowl level. Because we are using MAVPY, which is scaled horizontally in a cross-positional manner as well as vertically to match surplus values given in NFL contracts, impact scores can be compared this way as well.
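As a rough sketch, the three definitions above can be computed for any one group like this. The `group_metrics` function, the `mavpy` and `dcar` keys, and the choice to treat exactly 4.0 as the starter cutoff are my own assumptions for illustration; the real MAVEM calculations may differ in the details.

```python
STARTER_MAVPY = 4.0  # "around 4" is starter level; using exactly 4.0 is an assumption


def group_metrics(players, top_n=5):
    """Starter percentage, success rate, and impact score for one SEMTEX group.

    Each player is a dict with a career 'mavpy' and a 'dcar' value.
    """
    total = len(players)
    starters = sum(1 for p in players if p["mavpy"] >= STARTER_MAVPY)
    successes = sum(1 for p in players if p["dcar"] > 0)  # positive DCAR = beat draft slot
    top = sorted((p["mavpy"] for p in players), reverse=True)[:top_n]
    return {
        "starter_pct": 100.0 * starters / total,
        "success_rate": 100.0 * successes / total,
        "impact_score": sum(top) / len(top),  # mean of the best top_n MAVPY outputs
    }
```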
For reference on what these numbers mean in context, the starter percentage among all drafted QBs is 17.78 percent, the average success rate for a drafted QB is 21.39 percent, and the average NFL MAVPY output of a drafted QB is 1.69.
As you can see, the Gold group wins in every category. The Silver group’s performance is interesting because it has a higher starter percentage than success rate, which is not normal. That means, in general terms, that players in this group are overdrafted. I would imagine that the supreme arm talent half of the “volume passer” bucket discussed earlier causes that.
The Impact Scores are pretty telling. The Gold impact score is extremely impressive, and scrolling through the player lists showcases why. The upside on the Bronze QBs is extremely low, however, as the best case impact score scenarios in that group come in barely above starter level, and that is with Tannehill’s odd path bringing that score up.
- 96.1 percent of starter-level quarterbacks drafted since 1983 have been Gold or Silver passers.
- There has never been a Bronze first-team All-Pro QB in the SEMTEX era.
- The Starter Percentage for drafted Gold QBs (34.38%) is nearly double the Starter Percentage among all drafted QBs (17.78%).
- Correction: Mark Rypien (Bronze) is the only eligible non-Gold QB to win a Super Bowl as a starter. Bernie Kosar (Silver), Elvis Grbac (Bronze) and Brock Osweiler (Bronze) were on Super Bowl-winning teams as backups, behind Gold quarterbacks Troy Aikman, Steve Young and Peyton Manning, respectively. Super Bowls are obviously team achievements, but this is interesting to note.