ESPN needs to adjust their statistical model


In my rant tonight before the big day on Thursday (and as I wait to get a read on Providence and USC in tonight’s game!), I am really disappointed with ESPN and their analytics group. All year, they have been pushing their BPI statistic. I have read their pieces about which teams are most likely to lose because of their BPI. I have seen them argue that the Selection Committee should be using better analytics than the RPI, with KenPom and BPI as its guide.

The problem is that the models are hard to find credible at the moment. They might have the last laugh, but it is hard to take them seriously given some of the ridiculous rankings they produce. Since ESPN is marketing their model so heavily, here is how ridiculous these rankings have been.

  • BPI agrees with three of the #1 seeds (Villanova, Gonzaga and UNC), but it has Kansas as only the 10th best team, which would make them a 3 seed.
  • So who is the 4th #1? Virginia. Yes, that is right: not Arizona, Kentucky, Duke, Louisville, or pick your 3 or 4 seed. BPI has Virginia so badly misplaced that they are off by about 16 teams.
  • Kentucky, which won the SEC by 2 games, is behind Florida. I guess that isn’t as bad as Kansas, which won the Big 12 by 4 games and is behind West Virginia.
  • Syracuse and Indiana would both be in the field as 8 seeds – yes, that is right.  They are not on the bubble, they are safely in the field.
  • Clemson would have been a 9 seed and Texas Tech and Houston would have been in as 11 seeds.
  • Minnesota would have been a 10 seed if we listened to the BPI.
  • So which 5 teams would have been left out? Those would have been the two teams playing right now (USC and Providence), #9 seed Seton Hall, #9 seed Virginia Tech, and #6 seed Maryland. That’s right, they are saying that Maryland is 23 teams worse than their current seed.
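For anyone who wants to check the seed math in the list above, here is a rough sketch of how an overall rank maps to a seed line. It assumes the simplest possible rule, four teams per seed line in straight rank order, and ignores automatic bids and bracketing rules, so it is not how the Selection Committee or BPI actually builds a bracket. The function names are made up for illustration.

```python
# Rough sketch: map an overall ranking to a seed line, assuming
# four teams per line in strict rank order (a simplification).

def seed_line(rank: int) -> int:
    """Return the seed line for a 1-based overall rank."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    return (rank - 1) // 4 + 1


def ranks_on_line(seed: int) -> tuple[int, int]:
    """Return the (first, last) overall ranks that fall on a seed line."""
    if seed < 1:
        raise ValueError("seed must be 1 or greater")
    return (4 * (seed - 1) + 1, 4 * seed)


# Kansas as the 10th best BPI team lands on the 3 line.
print(seed_line(10))      # 3
print(ranks_on_line(3))   # (9, 12)
```

Under this rule, ranks 1 through 4 are the #1 seeds, which is why a team ranked 10th would sit on the 3 line.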

I shouldn’t only hammer ESPN on this. Some of the better-known statistical ratings like KenPom and Sagarin also have some really strange rank orderings (and they love some of the same teams that BPI loves), which really makes me sad. I think you really can leverage statistics to predict these events, and that should make them excellent for creating a ranking. But they are breaking down this year, because they are inviting teams like Indiana, Clemson and Texas Tech that no one (except maybe fans of those teams) would argue should be in the field.

So, I did the obvious thing: I have entered all their ratings into the pool, just for fun, so we can see how much better we can do than the models. Fortunately for the models, the Selection Committee helps them out in a few places (for example, in KenPom 6 of the top 13 teams are all in the East, making games that could be Elite 8 type games happen in the 2nd round). But my guess is that we are going to destroy them, and then hopefully their owners will go and rebuild them over the off-season.

OK – going to stop my rant and enjoy the game.

 


2 responses to “ESPN needs to adjust their statistical model”

  1. Tom,

    Of course, thank you again for pulling me out of my humdrum existence and re-elevating me to a higher plane. My current job has me not touching data a whit (whatever a whit is). The team I am on is all so much better than I at real work that they no longer trust me to so much as sweep up the punched card chads. (I keep meaning to ask where the card reader is so I can submit a job.) So, it is with great relish I launch into your data and once again realize that after 15 years of Bradley-Terry modeling I still have not documented my code and I again had to repuzzle it all out. What a joy.

    I find, again, that the NCAA professionals have pretty well used all my tricks–their seedings are almost perfect–I hardly have an inversion. Sure am glad I don’t have to beat a market consensus view on predicting uncertain outcomes in order to earn a living.

    And, I even got my simulator to work. Using your scoring system I can say my 95% end of tournament score range is 49 to 117. Median is 77. (I think you multiply everyone by 10–gotta love the score inflation.) The average game is a .6 correct prediction. (More predictable than a coin flip of .5.) The differences between my predictions and best seed prediction are negligible–totally in the binomial noise.

    Of course, if I ever win I will like totally (that expression dates me as having raised my kids 20 years ago) credit my modeling. And I allocate every year that I lose to binomial variability. Human psychology is simply wonderful in how it protects the pride.

    Two evenings now of futzing. Can’t say SAS/Studio and its Oracle Virtual Machine has grown on me, but nice that it is available for free home use at all.

    Now after 100,000 tournament simulations I simply check out and check back in again in 20 days and see how simulation 100,001 turned out. Does a real tournament actually exist? How would I know? (I have never actually watched an entire basketball game. I think I did watch 5 minutes once.)

    Having a great time. Glad to be sharing it with you. So glad you are sharing this all with us.

    Best,

    Bill

  2. Thanks, Bill!!!!! I love this time of year, and I love doing all this analysis over the games. Considering your Bradley-Terry model went 16 for 16, I would be thrilled if I could ever create a model or simulator as good as the ones you build.

    Enjoy the tournament!!!!! Good luck to you!!!!!
