{"id":1433,"date":"2019-03-12T03:03:39","date_gmt":"2019-03-12T03:03:39","guid":{"rendered":"http:\/\/www.tehodgson.com\/stompthelunatic\/?p=1433"},"modified":"2019-03-12T03:03:39","modified_gmt":"2019-03-12T03:03:39","slug":"modelers-the-2019-schedule-data-is-available","status":"publish","type":"post","link":"http:\/\/www.tehodgson.com\/stompthelunatic\/?p=1433","title":{"rendered":"Modelers &#8211; the 2019 Schedule Data is Available"},"content":{"rendered":"<p>Well, it is ready through Sunday&#8217;s March 10th games!\u00a0\u00a0 For the most part, the file is the same as normal.\u00a0\u00a0 I have continued with the approach that conference tournament games are counted as conference games instead of post-season games.\u00a0 There is one additional data point &#8211; while I have not calculated it, I used the NCAA&#8217;s new NET rankings to validate the records (since they list the official neutral game records).\u00a0 And since I had the new ranking on the file, I added it to the standings file.<\/p>\n<p>I have also added two PDF files &#8211; for those of you who would like to see some of the data sheets that the Selection Committee gets when making its decisions.\u00a0 Fortunately for all of us, the NCAA puts their Team Sheets (which break each team&#8217;s schedule into different ranking quadrants) and NET Nitty Gritty summary files on their RPI Archives Page &#8211; and so I have copied them and loaded them to the Research tab along with the Schedule 2019 Excel document.<\/p>\n<p>No promises that I will update these three files every day, but I wanted to make this available for everyone &#8211; and I will update as I have time throughout the week.\u00a0 The spreadsheet has a page that says when it was last updated.<\/p>\n<p>For those of you who want to do crazy statistical research, build models, or just have all the schedule data at your fingertips to evaluate teams, the data is there in the research links under Schedule 
2019.<\/p>\n<p>Obviously,\u00a0remember the traditional Lunatic disclaimers.\u00a0 I have done some basic cleaning and quality checks to confirm that the records from the schedule I have match the official NCAA site \u2013 but there are a lot of games, and so I will not make the claim that I have checked every piece of the dataset.\u00a0 More importantly, because of multiple changes to the NCAA&#8217;s website (and my ramblings last week about difficulties pulling this data due to blanks in the box scores), there are potential issues to be checked.\u00a0 For example, I have noticed that some of the home sites seem to be attendance figures due to missing information on the box score summary.\u00a0 I suspect the scores are right since the complete records are correct.\u00a0 But take the data with a grain of salt.<\/p>\n<p>That being said, one really interesting thing that this file does create is a side-by-side comparison of the old RPI calculation (which my tool still calculates &#8211; as do some other webpages) vs. the new NET model that the Selection Committee is using to rank games into the quadrants.\u00a0 I will probably have to ramble about it &#8211; but let&#8217;s just say that from a quick glance, North Carolina State and Indiana are thanking their lucky stars that the NCAA has moved to the NET score, and Arizona State, Seton Hall and Temple might eventually be wishing that the RPI was still the NCAA&#8217;s ranking system.<\/p>\n<p>For those of you who are not familiar with this tradition of me doing insane data pulls to grab all this great college basketball data, I will give you some more details.<\/p>\n<p>As many of you know, one of my insane features is that I try to provide people with data about the teams in case they want to do research on them. Each year, we get several people who have demonstrated the power of statistics by building models in order to predict the games. 
Some of them have been extremely successful with this \u2013 especially Bill Kahn with his Bradley-Terry models, showing that even something as unpredictable as sports can be forecasted through good statistical techniques. But the part of this that has made me happy \u2013 and why I do this \u2013 is that a few people who were not statisticians but were taking a stats training course at work used this data for their class project and ended up having some success \u2013 including our 2006 champion, David Shaddick.<\/p>\n<p>So, since that point, I decided to provide the scores to everyone in an attempt to give people as much of a chance as possible to leverage data to make their decisions. I realize that most of you will probably spend three to five minutes just looking at the teams and figuring who will do best \u2013 I probably don\u2019t need a model to decide that the number 1 seeds will beat the 16 seeds\u2026 In fact, I typically spend so much effort maintaining the site that I pick Purdue to go far and just randomly pick the other games late Wednesday evening.<\/p>\n<p>However, if I can give people a chance to try to learn something about statistics in a very fun environment, it is well worth the effort.<\/p>\n<p>If you notice something terribly wrong, let me know \u2013 no promises that I will have time to fix it, but at least everyone will know.<\/p>\n<p>Enjoy the data!!!!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Well, it is ready through Sunday&#8217;s March 10th games!\u00a0\u00a0 For the most part, the file is the same as normal.\u00a0\u00a0 I have continued with the approach that conference tournament games are counted as conference games instead of post-season games.\u00a0 There is one additional data point &#8211; while I have not calculated it, I used the 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[35],"tags":[],"class_list":["post-1433","post","type-post","status-publish","format-standard","hentry","category-blog2019"],"_links":{"self":[{"href":"http:\/\/www.tehodgson.com\/stompthelunatic\/index.php?rest_route=\/wp\/v2\/posts\/1433","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.tehodgson.com\/stompthelunatic\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.tehodgson.com\/stompthelunatic\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.tehodgson.com\/stompthelunatic\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.tehodgson.com\/stompthelunatic\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1433"}],"version-history":[{"count":1,"href":"http:\/\/www.tehodgson.com\/stompthelunatic\/index.php?rest_route=\/wp\/v2\/posts\/1433\/revisions"}],"predecessor-version":[{"id":1434,"href":"http:\/\/www.tehodgson.com\/stompthelunatic\/index.php?rest_route=\/wp\/v2\/posts\/1433\/revisions\/1434"}],"wp:attachment":[{"href":"http:\/\/www.tehodgson.com\/stompthelunatic\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1433"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.tehodgson.com\/stompthelunatic\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1433"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.tehodgson.com\/stompthelunatic\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1433"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}