One advantage for my loyal statisticians of my having to completely rebuild the website is that I am ahead of schedule on pulling box scores. If you click on the research link, our traditional Excel spreadsheet with 2024 game data is there. I will continue to update it throughout March, although maybe not every day. Can’t start my sleep deprivation too early.
The typical disclaimer is there. My process has some basic quality checks that match team records against what the NCAA releases on its NET rankings page. The research page also gives you a link to the NCAA Statistics site, where I get the box scores (they have more data there, including play-by-play data that I wish I had time to pull and scrub).
For those of you wondering why I do this insane data collection, it is so that people can build models with it. Each year, several people demonstrate the power of statistics by building models to predict the games, some with tons of success. Bill Kahn has been near the top of the standings in multiple years by building Bradley-Terry models. And our 2006 champion, David Shaddick, won by using a model he created with this data for a stats training course he ran for analysts at work. At the end of the day, while strange things can happen in the tournament, good statistical techniques can do a great job of forecasting the games.
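If you have never seen a Bradley-Terry model, here is a minimal Python sketch of the idea. To be clear, this is not Bill's model or anyone else's from the pool: the function names, the three made-up teams, and the handful of results are all my own placeholders, and the fit uses a standard iterative update on plain (winner, loser) pairs rather than the real spreadsheet layout. Each team gets a strength, and the chance that one team beats another is its strength divided by the sum of the two.

```python
from collections import defaultdict


def fit_bradley_terry(games, iterations=200):
    """Estimate team strengths from a list of (winner, loser) pairs.

    Assumes every team has at least one win and one loss; otherwise
    the estimates do not settle to finite, nonzero values.
    """
    teams = sorted({team for game in games for team in game})
    wins = defaultdict(int)
    meetings = defaultdict(int)  # games played between each pair of teams
    for winner, loser in games:
        wins[winner] += 1
        meetings[frozenset((winner, loser))] += 1

    strength = {team: 1.0 for team in teams}
    for _ in range(iterations):
        updated = {}
        for team in teams:
            # Sum over opponents: games played / (own strength + opponent strength)
            denom = sum(
                count / (strength[team] + strength[other])
                for pair, count in meetings.items() if team in pair
                for other in pair if other != team
            )
            updated[team] = wins[team] / denom
        # Rescale so the average strength stays at 1 (only ratios matter)
        total = sum(updated.values())
        strength = {t: s * len(teams) / total for t, s in updated.items()}
    return strength


def win_probability(strength, team_a, team_b):
    """Probability that team_a beats team_b under the fitted model."""
    return strength[team_a] / (strength[team_a] + strength[team_b])


# Hypothetical results, not real game data: (winner, loser)
example_games = [
    ("A", "B"), ("A", "B"), ("B", "A"),
    ("A", "C"), ("A", "C"), ("C", "A"),
    ("B", "C"), ("C", "B"),
]
example_strength = fit_bradley_terry(example_games)
print(win_probability(example_strength, "A", "B"))
```

With real data you would feed in one (winner, loser) pair per game from the spreadsheet. A serious entry would probably also account for things like home court and margin of victory, which this sketch ignores.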
If you are looking for a fun way to learn statistics (or want to show off your statistical expertise by building the best model), the data is there for you to enjoy. You might be the next winner of the pool thanks to your efforts.
If you notice any issues with the data, please let me know! Enjoy the data!!!