Well, it is ready through Saturday’s March 6th games! For the most part, the file is the same as normal. I have continued with the approach that conference tournament games are counted as conference games instead of post-season games. Since I had the new NET ranking pulled from the NCAA website and added to the standings, I added a couple of other things. The NCAA NET page lists each team’s record for the 4 quads, so I copied those into the Standings page. And because I had the quad rules and the NET ranking, I merged them onto the Schedule page so that you can see the NET ranking for each opponent and what quad that game is. I did not validate that this matched the NCAA site, so take the data for what it is...
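For anyone who wants to reproduce the quad labels on their own, here is a minimal Python sketch of the quadrant rule as the NCAA publishes it (opponent NET rank plus game location). The function name, the dictionary, and the "home"/"neutral"/"away" labels are my own choices for illustration, not column names from the spreadsheet.

```python
# Quadrant cutoffs by game location: the worst opponent NET rank that
# still counts as Quad 1, Quad 2, and Quad 3. Anything worse is Quad 4.
# The location labels are assumed for this sketch.
QUAD_CUTOFFS = {
    "home": (30, 75, 160),
    "neutral": (50, 100, 200),
    "away": (75, 135, 240),
}


def quadrant(opponent_net: int, location: str) -> int:
    """Return the quadrant (1-4) for a game against an opponent
    with the given NET rank, played at the given location."""
    if opponent_net < 1:
        raise ValueError("NET rank must be 1 or higher")
    cutoffs = QUAD_CUTOFFS[location.lower()]
    for quad, cutoff in enumerate(cutoffs, start=1):
        if opponent_net <= cutoff:
            return quad
    return 4
```

So a road game against the 75th-ranked team is a Quad 1 game, while the same opponent at home is only Quad 2.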
The NCAA Archive site doesn’t appear to be loading the PDFs that I had been downloading – they keep moving more and more content to their Statistics webpage. But why not simply use that as an excuse to give you more access to the data I find? The 2021 NET Nitty Gritty link actually goes to the NCAA Statistics website – it gives you an option to download the Nitty Gritty as a PDF. If you click on one of the team names, it takes you to their schedule, and if you click on their NET Ranking on the schedule page, it will give you a team sheet that is similar to what they provide in the PDF. The only thing I don’t like about that is it doesn’t have all the other computer scores (the old PDF used to have KenPom and Sagarin scores, for example, on their team sheets). It also isn’t in one large document, which is annoying. If I find a better option, I will add information as we go...
But that does have the benefit that there is now only one file to update instead of 3. No promises that I will update the Schedule Excel file every day, but I wanted it to be available to everyone as early as possible. I will update as I have time throughout the week – the spreadsheet has a page that says when it was last updated.
Obviously, remember the traditional Lunatic disclaimers. I have done some basic cleaning and quality checks to confirm that the records from my schedule match the official NCAA site – but there are thousands of games, so I will not claim that I have checked every piece of the dataset. To be honest, I simply check to make sure the records match – I figure if I can get lucky enough that all 347 teams have the correct records, the rest of the data is probably right.
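If you want to run the same kind of sanity check yourself, the idea can be sketched in a few lines of Python: tally each team's wins and losses from the game results and compare them against the official records. The tuple layout and the function names below are assumptions for illustration only; the spreadsheet's actual columns may look different.

```python
from collections import Counter


def tally_records(games):
    """Build {team: (wins, losses)} from game results.

    Each game is assumed to be a tuple
    (team_a, score_a, team_b, score_b), listed once per game.
    """
    wins, losses = Counter(), Counter()
    for team_a, score_a, team_b, score_b in games:
        if score_a == score_b:
            raise ValueError(f"Tie score for {team_a} vs {team_b}")
        if score_a > score_b:
            winner, loser = team_a, team_b
        else:
            winner, loser = team_b, team_a
        wins[winner] += 1
        losses[loser] += 1
    teams = set(wins) | set(losses)
    return {team: (wins[team], losses[team]) for team in teams}


def find_mismatches(games, official):
    """Return {team: (computed, official)} for every team whose
    tallied record differs from the official (wins, losses)."""
    computed = tally_records(games)
    return {
        team: (computed.get(team, (0, 0)), record)
        for team, record in official.items()
        if computed.get(team, (0, 0)) != record
    }
```

An empty result means every record lines up. It does not prove every score is right, only that nothing is obviously missing or duplicated.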
That being said, one really interesting thing that this file does create is a side-by-side comparison of the old RPI calculation (which my tool still calculates – as do some other webpages) vs. the new NET model that the Selection Committee is using to rank games into the quadrants. I do think that the NET score is giving the Selection Committee a better ranking. For example, Illinois right now would be 17th in the RPI instead of 4th in the NET. And Iowa would be at 39th in the RPI instead of 6th. I am sure that it could be even better – KenPom has Colgate as the 88th ranked team instead of 8th for the NET (or 7th for the RPI). But I am glad that the NCAA is moving a step forward...
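For readers who have never seen how the RPI is put together, here is a rough Python sketch of the basic published formula: 25% the team's own winning percentage, 50% its opponents' winning percentage (leaving out games against the team itself), and 25% its opponents' opponents' winning percentage. This is not necessarily how the spreadsheet computes it. In particular, the NCAA's men's basketball version also weighted home and road results differently, and that adjustment is left out here. The (winner, loser) input layout is an assumption for the example.

```python
from collections import defaultdict


def rpi(games):
    """Basic unweighted RPI from a list of (winner, loser) pairs."""
    results = defaultdict(list)  # team -> [(opponent, won), ...]
    for winner, loser in games:
        results[winner].append((loser, True))
        results[loser].append((winner, False))
    results = dict(results)

    def wp(team, exclude=None):
        # Winning percentage, optionally ignoring games vs. `exclude`.
        played = [won for opp, won in results[team] if opp != exclude]
        return sum(played) / len(played) if played else 0.0

    def owp(team):
        # Average opponent winning percentage, one entry per game played,
        # with games against `team` removed from each opponent's record.
        opps = [opp for opp, _ in results[team]]
        return sum(wp(opp, exclude=team) for opp in opps) / len(opps)

    def oowp(team):
        # Average of the opponents' own OWP values.
        opps = [opp for opp, _ in results[team]]
        return sum(owp(opp) for opp in opps) / len(opps)

    return {
        team: 0.25 * wp(team) + 0.50 * owp(team) + 0.25 * oowp(team)
        for team in results
    }
```

Notice that three quarters of the number comes from who you played rather than how you played, which is a big part of why a team with a soft-but-winning schedule can land so far from where the NET or KenPom puts it.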
For those of you who are not familiar with this tradition of me doing insane data pulls to grab all this great college basketball data, I will give you some more details.
As many of you know, one of my insane features is that I try to provide people with data about the teams in case they want to do research on them. Each year, we get several people who have demonstrated the power of statistics by building models to predict the games. Some of them have been extremely successful with this – especially Bill Kahn with his Bradley-Terry models, showing that even something as unpredictable as sports can be forecast through good statistical techniques. But the part of this that has made me happy – and why I do this – is that a few people who were not statisticians but were taking a stats training course at work used this data for their class project and ended up having some success – including our 2006 champion, David Shaddick.
So, since that point, I have provided the scores to everyone to give people as much of a chance as possible to leverage data in making their decisions. I realize that most of you will probably spend three to five minutes just looking at the teams and figuring out who will do best – I probably don’t need a model to decide that the number 1 seeds will beat the 16 seeds... In fact, I typically spend so much effort maintaining the site that I pick Purdue to go far and just randomly pick the other games late Wednesday evening. So, I am not really sure what I am going to do this year with my Boilers looking like they might not be dancing.
However, if I can give people a chance to try to learn something about statistics in a very fun environment, it is well worth the effort.
If you notice something terribly wrong, let me know – no promises I have time to fix it, but at least everyone will know.
Enjoy the data!!!!