Category: 2014 User Blog

  • Data junkies – go and start your research!!!

    That’s right.  In an incredible twist of fate, the Lunatic appears to be ahead of the game with getting box scores for all you statisticians out there who want to build the statistical model that will win this year’s contest.

    For those of you who are not familiar with this tradition, I will give you some more details.

    As many of you know, one of my insane features is that I try to provide people with data about the teams in case they want to do research on them. Each year, we get several people who demonstrate the power of statistics by building models to predict the games. Some of them have been extremely successful with this – especially Bill Kahn with his Bradley-Terry models, showing that even something as unpredictable as sports can be forecasted through good statistical techniques. But the part of this that has made me happy – and the reason I do this – is that a few people who were not statisticians but were taking a stats training course at work used this data for their class project and ended up having some success – including our 2006 champion, David Shaddick.
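
    For anyone curious what a Bradley-Terry model looks like in practice, here is a minimal sketch in Python. It is not Bill's actual method, just the textbook version: every team gets a positive strength, and the chance that one team beats another is its strength divided by the sum of the two strengths. The team names and results below are made up, and the function names are my own. The simple fitting loop assumes every team has at least one win and one loss; real data with unbeaten or winless teams needs extra care.

```python
def fit_bradley_terry(games, iterations=200):
    """Fit team strengths from a list of (winner, loser) pairs.

    Uses the classic iterative update: a team's new strength is its win
    count divided by the sum, over its games, of 1 / (own + opponent strength).
    Strengths are rescaled to sum to 1 after every pass.
    """
    teams = sorted({team for game in games for team in game})
    wins = {team: 0 for team in teams}
    meetings = {}  # (team_a, team_b) in sorted order -> games played
    for winner, loser in games:
        wins[winner] += 1
        pair = tuple(sorted((winner, loser)))
        meetings[pair] = meetings.get(pair, 0) + 1

    strength = {team: 1.0 / len(teams) for team in teams}
    for _ in range(iterations):
        denom = {team: 0.0 for team in teams}
        for (a, b), count in meetings.items():
            share = count / (strength[a] + strength[b])
            denom[a] += share
            denom[b] += share
        updated = {team: wins[team] / denom[team] for team in teams}
        total = sum(updated.values())
        strength = {team: value / total for team, value in updated.items()}
    return strength


def win_probability(strength, team_a, team_b):
    """Modelled probability that team_a beats team_b."""
    return strength[team_a] / (strength[team_a] + strength[team_b])


# Made-up results for three hypothetical teams, as (winner, loser).
sample_games = [
    ("Alpha", "Beta"), ("Alpha", "Beta"), ("Beta", "Alpha"),
    ("Beta", "Gamma"), ("Beta", "Gamma"), ("Gamma", "Beta"),
    ("Alpha", "Gamma"), ("Gamma", "Alpha"),
]
sample_strength = fit_bradley_terry(sample_games)
print(sample_strength)
print(win_probability(sample_strength, "Alpha", "Gamma"))
```

    To pick a bracket with something like this, you would fit it on the season's results and take the team with the higher win probability in each matchup.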

    Since then, I have provided the scores to everyone, to give people as much of a chance as possible to use data in making their picks. I realize that most of you will probably spend three to five minutes just looking at the teams and figuring out who will do best – I probably don’t need a model to decide that the number 1 seeds will beat the 16 seeds… In fact, I typically spend so much effort maintaining the site that I just randomly pick late Wednesday evening.

    However, if I can give people a chance to try to learn something about statistics in a very fun environment, it is well worth the effort. So, just click on 2014 Schedule in the Research menu and get an Excel spreadsheet with summary box score and standing / RPI information for each game for every Division I team.

    Now – my disclosures:

    • I have updated the data as of Monday, March 10th – as the week goes on, I will continue to update this file.  No promises on daily updates.  But I will definitely update on Sunday once I can get all of Saturday night’s conference championships – that typically leaves you the complete schedule minus the 3 or 4 championships on Sunday.
    • I have gone to countless lengths to validate this data, only to realize that I have no faith in my ability to validate it.  What I can tell you is that my overall records match the standings pages you can see on the commercial sites (to a point – which I will continue to explain).
    • There are two instances of games played between conference teams that are considered non-conference games.  Rhode Island and George Mason from the Atlantic 10 played a non-conference game in November (I believe it was on their schedules before George Mason joined the conference this year and they just decided to keep it).  NC Central and Hampton from the MEAC also played a game in January.  ESPN has it listed as a non-conference game and CBS has it as a conference game.  The MEAC website (yes, the MEAC has a website) lists it as a non-conference game – and so that is how I have listed it as well.
    • I have gathered neutral site games by going on the NCAA official site and checking where they said the games were neutral.  Hopefully, I did not miss any, but this is sadly not a straightforward process.
    • If I did, it would be tough to tell.  This is because I have found that ESPN, CBS, and the NCAA site all disagree on whether the venue is a neutral site or not.  I have obviously leaned on the NCAA site.
    • But then, I noticed that in one case between two SWAC teams (my guess is this will not crush your analysis), the NCAA site had Grambling State and Arkansas-Pine Bluff playing twice in conference with both games at Arkansas-Pine Bluff.  This made no sense to me – and so I kept digging, and the box scores on the NCAA site even said the games were played at two different venues.  So I am not sure how this translated to two home games for the one team.  I chose not to change my data, which follows the box scores and lists one home game and one away game.
    • My RPI standings, while close to what the NCAA published, are not a 100% match.  This had me discouraged for several hours this weekend as I tried to figure out my error.  Then, I had a realization.  The CBS site and ESPN site don’t match what the NCAA published either.  This is because CBS considered different games as neutral games, and ESPN included Abilene Christian and Incarnate Word – two first-year provisional Division I members which are not supposed to be included in the RPI calculation (and are not eligible for the tournament).  I imagine ESPN has issues with the neutral games as well.  So, at this point, I gave up.  If ESPN and CBS with all their resources can’t get it right – and to be fair, the NCAA can’t even keep track of home and away games, so their data is suspect as well – why should I be held accountable to this level of accuracy?
    • So – trust the data as much as you can.  If you find serious errors in it, let me know and I will fix it for everyone.
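
    To see why one neutral-site flag can move an RPI number, here is a rough sketch of the calculation as it is commonly described: 25% your own winning percentage, 50% your opponents' winning percentage (leaving out their games against you), and 25% your opponents' opponents' winning percentage. In the men's formula, home wins and road losses count 0.6, road wins and home losses count 1.4, and neutral games count 1.0. This is not the code behind my spreadsheet; the teams, scores, and function names are made up for illustration.

```python
from collections import defaultdict

WIN_WEIGHT = {"home": 0.6, "away": 1.4, "neutral": 1.0}
LOSS_WEIGHT = {"home": 1.4, "away": 0.6, "neutral": 1.0}


def build_results(games):
    """games: (home, away, home_pts, away_pts, is_neutral) tuples.
    Returns team -> list of (opponent, won, site)."""
    results = defaultdict(list)
    for home, away, home_pts, away_pts, is_neutral in games:
        home_site = "neutral" if is_neutral else "home"
        away_site = "neutral" if is_neutral else "away"
        results[home].append((away, home_pts > away_pts, home_site))
        results[away].append((home, away_pts > home_pts, away_site))
    return results


def average(values):
    values = [v for v in values if v is not None]
    return sum(values) / len(values) if values else 0.0


def weighted_wp(results, team):
    """Winning percentage with the home/away/neutral weights applied."""
    won = sum(WIN_WEIGHT[site] for _, w, site in results[team] if w)
    lost = sum(LOSS_WEIGHT[site] for _, w, site in results[team] if not w)
    return won / (won + lost) if won + lost else 0.0


def plain_wp(results, team, exclude):
    """Unweighted winning percentage, ignoring games against `exclude`."""
    outcomes = [w for opp, w, _ in results[team] if opp != exclude]
    return sum(outcomes) / len(outcomes) if outcomes else None


def owp(results, team):
    """Average opponent winning percentage, one entry per game played."""
    return average(plain_wp(results, opp, team) for opp, _, _ in results[team])


def rpi(results, team):
    oowp = average(owp(results, opp) for opp, _, _ in results[team])
    return 0.25 * weighted_wp(results, team) + 0.50 * owp(results, team) + 0.25 * oowp


def sample_games(first_game_neutral):
    # Hypothetical schedule; only the first game's venue flag changes.
    return [
        ("Alpha", "Beta", 70, 60, first_game_neutral),
        ("Beta", "Gamma", 65, 60, False),
        ("Gamma", "Alpha", 80, 75, False),
        ("Alpha", "Gamma", 70, 68, False),
    ]


print(rpi(build_results(sample_games(False)), "Alpha"))
print(rpi(build_results(sample_games(True)), "Alpha"))
```

    The two printed numbers differ even though the scores are identical, which is the kind of gap you get when two sites disagree about which games were neutral.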

    Enjoy the data – but remember, don’t let it stop you from watching a couple of the games – they have been really good so far (although a little sad for a few regular season champions – which will have to be a post for another time).

  • 2014 User Blog

    This is your opportunity to put your own thoughts on the tournament onto the site. Whether it be comments on the games, telling the world who you think will win, or simply wanting to have fun – this is your chance to be heard!!!!

    My only rule is that you keep things clean – we have families who come to the site. I reserve the right to remove any inappropriate comments.

    All you have to do is reply using the form at the bottom of the page!!!!