{"id":1720,"date":"2020-03-09T02:24:11","date_gmt":"2020-03-09T02:24:11","guid":{"rendered":"http:\/\/www.tehodgson.com\/stompthelunatic\/?p=1720"},"modified":"2020-03-09T02:35:49","modified_gmt":"2020-03-09T02:35:49","slug":"modelers-the-2020-schedule-data-is-available","status":"publish","type":"post","link":"http:\/\/www.tehodgson.com\/stompthelunatic\/?p=1720","title":{"rendered":"Modelers &#8211; the 2020 Schedule Data is available"},"content":{"rendered":"<p>Well, it is ready through Saturday&#8217;s March 7th\u00a0games!\u00a0\u00a0 For the most part, the file is the same as normal.\u00a0\u00a0 I have continued with the approach that conference tournament games are counted as conference games instead of post-season games.\u00a0 There is one additional data point &#8211; while I have not calculated it myself, I used the NCAA&#8217;s new NET rankings to validate the records (since they list the official neutral-site game records).\u00a0 And since I had the new rankings on the file, I added them to the standings file.<\/p>\n<p>I have also added two PDF files for those of you who would like to see some of the data sheets the Selection Committee gets when making its decisions.\u00a0 Fortunately for all of us, the NCAA posts its Team Sheets (which break each team&#8217;s schedule into ranking quadrants) and NET Nitty Gritty summary files on its RPI Archives Page &#8211; so I have copied them and loaded them to the Research tab along with the Schedule 2020 Excel document.<\/p>\n<p>No promises that I will update these three files every day, but I wanted to make this available for everyone &#8211; and I will update them as I have time throughout the week.\u00a0 The spreadsheet has a page that says when it was last updated.<\/p>\n<p>Obviously,\u00a0remember the traditional Lunatic disclaimers.\u00a0 I have done some basic cleaning and quality checks to confirm that the records in my schedule match the official NCAA site \u2013 but there are thousands of games, 
and so I will not claim that I have checked every piece of the dataset.\u00a0 To be honest, I simply check to make sure the records match &#8211; I figure that if all 354 teams have the correct records, the rest of the data is probably right.\u00a0 More importantly, because of multiple changes to the NCAA&#8217;s website, plus an unexplained problem that forced me to copy the box scores for all of Oral Roberts&#8217; games by hand, there may be errors I have not caught.\u00a0 As one would expect, while I was manually setting up the data, Oral Roberts started its conference tournament this evening.\u00a0 And of course, they won &#8211; so I have at least two more box scores to update by hand &#8211; and with my luck, they will make it to the Summit finals&#8230;\u00a0 Anyway, I suspect the scores are right since the overall records are correct.\u00a0 But take the data with a grain of salt.<\/p>\n<p>That being said, one really interesting thing this file creates is a side-by-side comparison of the old RPI calculation (which my tool still calculates, as do some other websites) vs. the new NET model that the Selection Committee is using to sort games into the quadrants.\u00a0 I\u00a0am sure that at some point I will ramble about statistical rankings, but the comparison is pretty interesting.\u00a0 There are definitely some teams (such as Texas Tech and Purdue) that must be much happier that their bubble chances are based on the NET score instead of the RPI score.\u00a0 Then again, after Purdue lost to Rutgers on Saturday, even a good NET score might not save them.<\/p>\n<p>For those of you who are not familiar with my tradition of doing insane data pulls to grab all this great college basketball data, here are some more details.<\/p>\n<p>As many of you know, one of the insane features of this site is that I try to provide people with data about the teams in case they want to do their own research. 
Each year, several people have demonstrated the power of statistics by building models to predict the games. Some of them have been extremely successful \u2013 especially Bill Kahn with his Bradley-Terry models, showing that even something as unpredictable as sports can be forecast with good statistical techniques. But the part that has made me happiest \u2013 and the reason I do this \u2013 is that a few people who were not statisticians, but were taking a stats training course at work, used this data for their class projects and ended up having some success \u2013 including our 2006 champion, David Shaddick.<\/p>\n<p>Since then, I have provided the scores to everyone to give people as much of a chance as possible to leverage data in making their picks. I realize that most of you will probably spend three to five minutes just looking at the teams and figuring out who will do best \u2013 I probably don\u2019t need a model to decide that the number 1 seeds will beat the 16 seeds\u2026 In fact, I typically spend so much effort maintaining the site that I pick Purdue to go far and just randomly pick the other games late Wednesday evening.\u00a0 So, I am not really sure what I am going to do this year with my Boilers looking like they might not be dancing.<\/p>\n<p>However, if I can give people a chance to learn something about statistics in a very fun environment, it is well worth the effort.<\/p>\n<p>If you notice something terribly wrong, let me know \u2013 no promises I will have time to fix it, but at least everyone will know.<\/p>\n<p>Enjoy the data!!!!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Well, it is ready through Saturday&#8217;s March 7th\u00a0games!\u00a0\u00a0 For the most part, the file is the same as normal.\u00a0\u00a0 I have continued with the approach that conference tournament games are counted as conference games instead of post-season games.\u00a0 There is one 
additional data point &#8211; while I have not calculated it, I used the NCAA&#8217;s [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[37],"tags":[],"class_list":["post-1720","post","type-post","status-publish","format-standard","hentry","category-blog2020"],"_links":{"self":[{"href":"http:\/\/www.tehodgson.com\/stompthelunatic\/index.php?rest_route=\/wp\/v2\/posts\/1720","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.tehodgson.com\/stompthelunatic\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.tehodgson.com\/stompthelunatic\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.tehodgson.com\/stompthelunatic\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.tehodgson.com\/stompthelunatic\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1720"}],"version-history":[{"count":2,"href":"http:\/\/www.tehodgson.com\/stompthelunatic\/index.php?rest_route=\/wp\/v2\/posts\/1720\/revisions"}],"predecessor-version":[{"id":1723,"href":"http:\/\/www.tehodgson.com\/stompthelunatic\/index.php?rest_route=\/wp\/v2\/posts\/1720\/revisions\/1723"}],"wp:attachment":[{"href":"http:\/\/www.tehodgson.com\/stompthelunatic\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1720"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.tehodgson.com\/stompthelunatic\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1720"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.tehodgson.com\/stompthelunatic\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1720"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}