Modelers – the 2017 Schedule Data is here

Well, it is ready through Saturday’s games! For the most part, the file is the same as normal, with maybe a couple of extra fields of information. I have continued with the approach that conference tournament games are counted as conference games instead of post-season games.

No promises that I will update every day, but I wanted to get what was available out to everyone. I will update as I have time throughout the week, and the spreadsheet has a page that says when it was last updated.

But for those of you who want to do crazy statistical research, build models, or just have all the schedule data at your fingertips to evaluate teams, the data is there in the research links under 2017 Schedule.

Obviously, remember the traditional Lunatic disclaimers. I have done some basic cleaning and quality checks against RPI data – but there are a lot of games, and so I will not make the claim that I have checked every piece of the dataset.

For those of you who are not familiar with this tradition, I will give you some more details.

As many of you know, one of my insane features is that I try to provide people with data about the teams in case they want to do research on them. Each year, several people demonstrate the power of statistics by building models to predict the games. Some of them have been extremely successful with this, especially Bill Kahn with his Bradley-Terry models, showing that even something as unpredictable as sports can be forecast through good statistical techniques. But the part of this that has made me happy, and the reason I do this, is that a few people who were not statisticians but were taking a stats training course at work used this data for their class project and ended up having some success, including our 2006 champion, David Shaddick.
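For anyone curious what a Bradley-Terry model looks like, here is a minimal sketch in Python. To be clear, this is not Bill's actual method, just the textbook idea: each team gets a strength, and the chance that team A beats team B is A's strength divided by the sum of the two strengths. The function names, the (winner, loser) input format, and the team names are all made up for illustration; the real spreadsheet has its own layout, so you would need to pull the winner and loser out of each row yourself.

```python
from collections import defaultdict


def fit_bradley_terry(games, iterations=200):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Uses the standard iterative update:
        strength[i] = wins[i] / sum_j( n_ij / (strength[i] + strength[j]) )
    where n_ij is the number of games played between teams i and j.
    Strengths are rescaled to sum to 1 after every pass.
    """
    wins = defaultdict(int)
    games_between = defaultdict(lambda: defaultdict(int))
    for winner, loser in games:
        wins[winner] += 1
        games_between[winner][loser] += 1
        games_between[loser][winner] += 1

    teams = list(games_between)
    strength = {team: 1.0 / len(teams) for team in teams}

    for _ in range(iterations):
        updated = {}
        for team in teams:
            denom = sum(
                n / (strength[team] + strength[opponent])
                for opponent, n in games_between[team].items()
            )
            updated[team] = wins[team] / denom
        total = sum(updated.values())
        strength = {team: s / total for team, s in updated.items()}
    return strength


def win_probability(strength, team_a, team_b):
    """Probability that team_a beats team_b under the fitted strengths."""
    return strength[team_a] / (strength[team_a] + strength[team_b])


# Hypothetical toy results, NOT taken from the 2017 schedule file.
toy_games = [
    ("Alpha", "Beta"), ("Alpha", "Beta"), ("Beta", "Alpha"),
    ("Beta", "Gamma"), ("Gamma", "Beta"),
    ("Alpha", "Gamma"), ("Alpha", "Gamma"), ("Gamma", "Alpha"),
]
toy_strength = fit_bradley_terry(toy_games)
print(win_probability(toy_strength, "Alpha", "Beta"))
```

One caveat with this simple version: a team with no wins ends up with a strength of zero, and an undefeated team drags everyone else toward zero, so real models usually add some kind of smoothing. It also ignores margin of victory and home court, which the schedule data could let you add.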

So, since that point, I have provided the scores to everyone to give people as much of a chance as possible to leverage data in making their decisions. I realize that most of you will probably spend three to five minutes just looking at the teams and figuring who will do best – I probably don’t need a model to decide that the number 1 seeds will beat the 16 seeds… In fact, I typically spend so much effort maintaining the site that I just randomly pick late Wednesday evening.

However, if I can give people a chance to try to learn something about statistics in a very fun environment, it is well worth the effort.

If you notice something terribly wrong, let me know – no promises that I will have time to fix it, but at least everyone will know.

Enjoy the data!!!!

Stomp The Lunatic
