As many of you know, one of my insane features is that I try to provide people with data about the teams in case they want to do some research. Each year, several people demonstrate the power of statistics by building models to predict the games. Some of them have been extremely successful, especially Bill Kahn with his Bradley-Terry models, showing that even something as unpredictable as sports can be forecast with good statistical techniques. But the part that has made me happiest, and the reason I do this, is that a few people who were not statisticians but were taking a stats training course at work used this data for their class project and ended up having some success, including our 2006 champion, David Shaddick.
So, since then, I have provided the scores to everyone, to give people as much of a chance as possible to use data in making their decisions. I realize that most of you will probably spend three to five minutes just looking at the teams and figuring out who will do best (I probably don’t need a model to decide that the number 1 seeds will beat the 16 seeds). In fact, the way this year has been going with trying to innovate on the site, I will likely be doing the same thing late on Wednesday night.
However, if I can give people a chance to learn something about statistics in a very fun environment, it is well worth the effort. So, the 2011 Statistics section has a link to an Excel file with the basic box score numbers for every game played by every Division I team.
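For anyone wondering what a Bradley-Terry model looks like in practice, here is a minimal sketch in Python. It is not Bill Kahn's actual model, just the textbook version: each team gets one strength number, and the chance that one team beats another is its strength divided by the sum of the two. The team names and results are made-up toy data, and the function names (`fit_bradley_terry`, `win_probability`) are mine. Real input would have to be built from the game results in the file.

```python
from collections import defaultdict

# Toy results as (winner, loser) pairs. Team names are made up.
toy_games = [
    ("A", "B"), ("A", "B"), ("B", "A"),
    ("B", "C"), ("B", "C"), ("C", "B"),
    ("A", "C"), ("C", "A"),
]

def fit_bradley_terry(games, iterations=200):
    """Estimate one strength per team with the classic iterative update.

    Assumes every team has at least one win and one loss; otherwise
    the estimates drift toward zero or infinity.
    """
    teams = sorted({team for game in games for team in game})
    wins = defaultdict(int)
    meetings = defaultdict(int)  # games played between each pair of teams
    for winner, loser in games:
        wins[winner] += 1
        meetings[frozenset((winner, loser))] += 1

    strength = {team: 1.0 for team in teams}
    for _ in range(iterations):
        updated = {}
        for team in teams:
            denom = 0.0
            for pair, n in meetings.items():
                if team in pair:
                    (other,) = pair - {team}
                    denom += n / (strength[team] + strength[other])
            updated[team] = wins[team] / denom
        # Rescale so strengths average 1; only the ratios matter.
        scale = len(teams) / sum(updated.values())
        strength = {team: value * scale for team, value in updated.items()}
    return strength

def win_probability(strength, team_a, team_b):
    """Modeled probability that team_a beats team_b."""
    return strength[team_a] / (strength[team_a] + strength[team_b])

strengths = fit_bradley_terry(toy_games)
print(strengths)
print(win_probability(strengths, "A", "C"))
```

This version ignores margin of victory and game site. Those are exactly the kinds of extras the box score numbers would let you experiment with.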
My classic caveats on this data are the following: I have only checked that the standings match ESPN’s page. I don’t have time to error check over 5000 box scores. I also tried to identify all the neutral court games. I believe I have done that, since I was able to match the March 6th RPI standings, which use the site of each game as a weight in the calculation. But no guarantees. Also, obviously, the 4 championship games on Sunday are not in the file. If you notice something terribly wrong, let me know. No promises that I will have time to fix it, but at least everyone will know.
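To show why the neutral court flags matter for matching the RPI, here is a small Python sketch of the site-weighted winning percentage. It assumes the weights commonly published for the men's basketball RPI of this era: a home win counts 0.6, a road win 1.4, a home loss 1.4, a road loss 0.6, and neutral games 1.0. The function name and the input layout are my own, not anything from the Excel file, and this is only the winning percentage piece, not the full RPI.

```python
# Assumed site weights for the RPI winning percentage component.
WIN_WEIGHT = {"home": 0.6, "away": 1.4, "neutral": 1.0}
LOSS_WEIGHT = {"home": 1.4, "away": 0.6, "neutral": 1.0}

def weighted_win_pct(results):
    """results: list of (won, site) tuples from one team's point of view.

    won is True/False; site is "home", "away", or "neutral".
    """
    weighted_wins = sum(WIN_WEIGHT[site] for won, site in results if won)
    weighted_losses = sum(LOSS_WEIGHT[site] for won, site in results if not won)
    total = weighted_wins + weighted_losses
    return weighted_wins / total if total else 0.0

# Same 1-1 record, but the result depends on where the games were played.
split_at_home = weighted_win_pct([(True, "home"), (False, "home")])
split_on_neutral = weighted_win_pct([(True, "neutral"), (False, "neutral")])
print(split_at_home, split_on_neutral)
```

A team that splits two home games comes out lower than a team that splits two neutral court games, so a single mislabeled site can shift the standings.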
Enjoy the data!!!!