Monday, March 24, 2008

Fooled by Randomness: March Madness edition (how to fill out your brackets... again)

A few weeks ago I reviewed a life-changing book with the same title as this post. Apparently, it didn't change my life enough, because I found myself, along with thousands of others, fooled by the randomness that is the NCAA tournament.

There are a lot of websites out there trying to help you fill out a more accurate bracket. Everyone is looking for a pattern or a statistical explanation of previous tournaments that will help them predict the outcomes of future ones. I joined the quest and, judging from the emails I've received from PhDs, students, and other bracketologists, I did so much more rigorously than most.

In the book, Taleb decries some uses of econometrics as quasi-science. He describes Wall Street traders who spend long hours looking for correlations between prices and other events, ranging from the weather to economic variables. Taleb is convinced that there is some price out there whose changes are correlated with temperature changes in Outer Mongolia. Of course, the two aren't related; the correlation is purely random. But many traders will invest based on these things as if they were a sign from God.
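To see Taleb's point in action, here is a minimal Python sketch (my own illustration, not anything from the book). It generates one random "temperature change" series for a hypothetical 30-day month and 1,000 equally random, completely unrelated "price change" series, then reports the strongest correlation it finds. The series length, the number of series, and the function names are all assumptions chosen for illustration. Every series is pure noise, yet searching enough of them reliably turns up one that appears to track the temperature.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def best_spurious_correlation(n_series=1000, n_days=30, seed=1):
    """Search many independent random 'price change' series for the one that
    best tracks an equally random 'temperature change' series.
    All data here is simulated noise; nothing is real market or weather data."""
    rng = random.Random(seed)
    temperature = [rng.gauss(0, 1) for _ in range(n_days)]
    best = 0.0
    for _ in range(n_series):
        price_changes = [rng.gauss(0, 1) for _ in range(n_days)]
        r = pearson(temperature, price_changes)
        if abs(r) > abs(best):  # keep the strongest correlation, either sign
            best = r
    return best


if __name__ == "__main__":
    print(f"Strongest correlation found: {best_spurious_correlation():.2f}")
```

The more series you search, the stronger the best "relationship" looks, even though none of them means anything. That is exactly the trap of mining past data for patterns.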

Taleb makes the point that sometimes there are so many underlying and unrecorded factors that it's hard to say that past data is useful in understanding anything.

And so it is with NCAA tournament performance. This is why the odds of predicting a 100% accurate bracket are somewhere between 1 in 2.1 billion and 1 in the trillions, depending on who you ask.

Don't get me wrong: the "tempo-free" data analysis is right on the money in evaluating team and player performance, and it is more useful than the dribble from Digger Phelps and Jay Bilas. There's a lot that fans and coaches can learn from it. It's just not useful in predicting tournament wins.

One website put data from a couple of sources into a Monte Carlo simulator and simulated the games 10,000 times. According to it, the odds of Siena, Villanova, and San Diego all winning in the first round on the same day were 0.6%. The odds of the entire tournament unfolding the way it has are astronomically small (as they are every year). Do you know what it's called when astronomically improbable events occur? RANDOMNESS.
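For readers curious how a Monte Carlo estimate like that works, here is a minimal Python sketch. It is not the website's simulator: the per-game win probabilities below are made-up placeholders, not the figures the site used, and the sketch assumes the three games are independent. Each simulated "tournament day" flips three weighted coins; the estimate is the fraction of the 10,000 runs in which all three underdogs win. Because the games are treated as independent, the exact answer is simply the product of the three probabilities, which lets you check the simulation.

```python
import random

# Hypothetical first-round win probabilities for the three underdogs.
# These are illustrative placeholders, NOT the figures the website used.
UNDERDOG_WIN_PROBS = {"Siena": 0.25, "Villanova": 0.20, "San Diego": 0.15}


def simulate_all_upsets(win_probs, n_sims=10_000, seed=2008):
    """Estimate the chance that every underdog wins by simulating the games
    n_sims times, treating each game as an independent weighted coin flip."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        if all(rng.random() < p for p in win_probs.values()):
            hits += 1
    return hits / n_sims


def exact_all_upsets(win_probs):
    """Closed-form answer under the same independence assumption."""
    prob = 1.0
    for p in win_probs.values():
        prob *= p
    return prob


if __name__ == "__main__":
    print(f"Simulated: {simulate_all_upsets(UNDERDOG_WIN_PROBS):.4f}")
    print(f"Exact:     {exact_all_upsets(UNDERDOG_WIN_PROBS):.4f}")
```

Note that the simulation can only be as good as the probabilities fed into it. If season data misjudges how likely each upset really is, running the games 10,000 times just repeats that misjudgment 10,000 times.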

Of course, these games were simulated using data from the season. I question how rare these events actually are because, as Taleb writes, perhaps past events shouldn't determine the likelihood of an event in the future.

I see this with the point spreads. I started out tracking the predicted point spreads as reported and averaged by Tbeck. Just about every game in the last two days of tourney play has finished several standard deviations beyond the mean prediction. And this is using "reliable" sources like Sagarin's Pure Points, which supposedly get more accurate as the season goes on.
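Here is one way to measure "several standard deviations beyond the mean prediction," sketched in Python under my own assumptions: the standard deviation is taken across the different rating systems' predicted spreads for a single game, and margins are expressed from the favorite's point of view. The function name and the four predictions below are hypothetical, not Tbeck's actual numbers.

```python
from statistics import mean, stdev


def spread_z_score(predicted_spreads, actual_margin):
    """How many standard deviations the actual margin sits from the mean
    predicted spread. Margins are from the favorite's point of view
    (positive = favorite won by that many points)."""
    mu = mean(predicted_spreads)
    sigma = stdev(predicted_spreads)  # sample standard deviation across systems
    return (actual_margin - mu) / sigma


# Hypothetical example: four rating systems favor a team by 6 to 9 points,
# but the underdog wins by 3 (a margin of -3 for the favorite).
predictions = [6.0, 7.5, 8.0, 9.0]
print(round(spread_z_score(predictions, -3.0), 2))  # prints -8.5
```

Which standard deviation you use matters a lot. The rating systems tend to agree closely with one another, so their spread is narrow, while the historical error of those predictions against actual results is much wider. The same game can look like a freak event by the first yardstick and fairly ordinary by the second.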

The analysis of past data told me that Davidson, Butler, St. Mary's, and Drake were all really good teams. If you were going to use that fact to choose some upsets, you'd go with them. But only Davidson really did anything, pulling off the "upset." And it upset a Georgetown team that past data told me was underrated itself. Siena, San Diego, and Villanova shouldn't have won. But they did. There's no explanation for this other than randomness. Basketball games involve too many unobserved, unquantifiable human factors to predict accurately.

So, here's my suggestion for filling out your next bracket:
Flip a weighted coin. Go to BracketScience or Wikipedia or someplace similar and find out how often each seed has won its first-round game. Find a computer program that generates a 1 or a 0 based on those odds, like flipping a weighted coin. Fill out the bracket with whatever it randomly generates. Then fill out the 9 more you're allotted on ESPN to increase your chances of randomly getting a more accurate bracket than others.
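If you don't have a program handy, a minimal weighted coin takes only a few lines of Python. This is a sketch under loud assumptions: the win rates in the table are hypothetical placeholders, not historical figures, so replace them with the numbers you look up. The function names are mine, and the sketch only covers the first round of one region; later rounds would need win rates for those matchups too.

```python
import random

# HYPOTHETICAL first-round win rates for the HIGHER seed in each matchup.
# Replace these with the historical rates you look up
# (e.g. on BracketScience or Wikipedia).
HIGHER_SEED_WIN_RATE = {
    (1, 16): 0.99,
    (2, 15): 0.95,
    (3, 14): 0.85,
    (4, 13): 0.80,
    (5, 12): 0.67,
    (6, 11): 0.70,
    (7, 10): 0.60,
    (8, 9): 0.50,
}


def flip_weighted_coin(p_higher, rng):
    """Return 1 if the higher seed wins, 0 if the lower seed wins."""
    return 1 if rng.random() < p_higher else 0


def fill_first_round(rng):
    """Pick the winning seed of each first-round matchup in one region."""
    picks = {}
    for (high, low), p in HIGHER_SEED_WIN_RATE.items():
        picks[(high, low)] = high if flip_weighted_coin(p, rng) else low
    return picks


if __name__ == "__main__":
    rng = random.Random()  # unseeded: every bracket comes out different
    # Ten brackets: the first one plus the 9 more allotted on ESPN.
    for n in range(1, 11):
        print(f"Bracket {n}: {fill_first_round(rng)}")
```

Because each flip is independent, the ten brackets will differ from one another, which is the point: you are spreading your picks across the likely outcomes instead of betting everything on one set of upsets.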

So, will I do my regression analysis and forecasts next year? You bet! Because it's fun, and interesting, and I've made new friends doing it.


Jessica said...

I used the data I bought from you to complete one of my brackets. I'm currently 3rd in that group.

I used my own "data" (aka: choosing teams I've heard of or who have cute uniforms) to complete a bracket in another group. I'm currently 10th in that group.

Random or not, there must have been something to your data. I'll let you know how it turns out.

JTapp said...

LOL. My model is in the 3rd percentile on ESPN. Over 3 million people have picked a better bracket thus far.

My second model is in the 9th percentile. Only 2.8 million people did better.

Your odds with the cute uniforms are better, within reason, so long as you didn't pick a 16 seed over a 1 or something like that. Again, follow the weighted coin's projections.

TaylorW said...

it's always crazy, but that's why we like it!

solid thoughts...