Monday, February 08, 2016

Superforecasting by Gardner and Tetlock (Book Review #6 of 2016)


Superforecasting: The Art and Science of Prediction

(This review appears on my personal blog, hence the personal nature of the review.) I work in a state government office responsible for forecasting large-dollar revenues and expenditures. Given the billions of dollars at stake, you would hope the profession would follow the advice of this book; I can assure you, taxpayer, that it rarely does. I have been to many econometrics and forecasting training seminars, as well as seminars put on by analysts at the Federal Reserve. These are led by academics and practitioners and attended by forecasters in both government and the private sector (energy, finance, real estate, etc.). I can say, sadly, that the studies and Nobel Prize-winning research underpinning this book are never mentioned.

I like this book because it's in the logical progression of books produced by several authors who are cited in it, all of whom were influential to me: Kahneman (who sits on the board of the Good Judgment Project), Taleb (and Mandelbrot), Silver, Thaler, and a couple of books on chaos theory I read long ago (Ziauddin Sardar was one author), for example. The book lays out several concepts those authors presented, such as cognitive biases, not being fooled by randomness, and thinking probabilistically, that I already try to be aware of daily; I try to be a Superforecaster. One of the authors (Tetlock) gives a good summary of the book in his interview with the Freakonomics podcast, and he goes into more depth on particular details in an interview on Russ Roberts' EconTalk as well; I recommend listening to both.

This book is the result of The Good Judgment Project, an experiment funded by IARPA (i.e., the U.S. intelligence community) to see if crowdsourced forecasts of world events by thousands of closely scrutinized and measured forecasters could beat "experts" with greater access to classified intelligence. IARPA's involvement follows more than 20 years of study by Tetlock and others involving hundreds of researchers making tens of thousands of forecasts. The authors examine what separates accurate forecasters from inaccurate ones. The researchers were aware that success can come from luck and experimented with regrouping forecasters to see how the results would be affected. The Brier score is used to evaluate success, and the book explains the concept well. IARPA was interested in short-term forecasts as opposed to long-term ones.

The book, alarmingly, opens with Tom Friedman as an example pundit/forecaster. (Friedman is of course laughable for his repeated "wait six months" comments from 2003-2007 during the Iraq war, which spawned the term "Friedman unit" for a deadline that keeps sliding before anyone can be proven right or wrong. The authors are nowhere near harsh enough with Friedman.) How did Friedman, who travels the Middle East and gets news translated from various languages, not forecast the Arab Spring? The difficulty with the Arab Spring is that nobody could have forecast it. Conditions weren't really different from previous years, some of which saw pundits predict upheaval to no result. The Tunisian man who set himself on fire and sparked protests that spread like wildfire is a bit like the butterfly effect in chaos theory. Gardner and Tetlock cite Lorenz's insight that such events aren't easily predictable because the underlying phenomena (like weather) are non-linear and cannot be precisely modeled.

Kahneman's System 1 and System 2 are discussed (as explained in Thinking, Fast and Slow). One of Kahneman's insights that sticks with me is that algorithms and mathematical models will beat subjective judgment almost every time; even in the cases where they don't, they usually only tie it. There is a good summary of various cognitive biases that plague decision-makers and forecasters. In hindsight, we look for reasons for everything. It pains me to watch PBS Newshour and hear "the Dow dropped 5 points on disappointing earnings news." Really? Millions of shares changed hands over the course of the day, one index moved by a tiny fraction of a percent, and there is exactly one reason for it? The best forecasters temper intuition with caution.

One caution that I need to heed in my own job is to avoid vague adjectives. "Significant" or "serious" mean different things to different people. The most famous example of this in the book is the CIA's assessment of the odds of success at the Bay of Pigs: Kennedy interpreted the CIA's "fair chance" of success to mean better than 50%, when the CIA meant something more like 30%. (I'm not sure of the veracity of this story.) Leaders need to know probabilities and possible outcomes.

Brier scores are concerned with resolution and calibration. The score is a bit like mean squared error, but applied to probability forecasts rather than to specific forecasted variables. A weatherman who forecasts a 60% chance of rain can be measured over time to see how often it actually rained. If it rains exactly 60% of the times he predicts a 60% chance, he is perfectly calibrated; a Brier score of 0, the best possible, would require forecasting with complete confidence and always being right. All the "sabermetrics" found in sports these days have me looking at probabilities of teams winning. The best team does not always win; it just wins the majority of the time. College basketball and football don't have best-of-7 series; it's an n=1 deal. So a model may predict an overwhelming 75% chance of success for a team, which still means the team would lose 25 times out of 100, and perhaps the very next game. (This drives me nuts every March when people marvel at bracket predictions. Don't be fooled by randomness; the best team will rarely win a one-and-done tournament.)
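For the curious, here is a minimal sketch of how a Brier score gets computed (my own illustration in Python, not code from the book; the rain numbers are made up):

```python
def brier_score(forecasts, outcomes):
    """Average squared gap between probability forecasts and what actually happened.

    forecasts: probabilities between 0 and 1 (e.g., 0.6 for a "60% chance of rain")
    outcomes:  1 if the event occurred, 0 if it did not
    Lower is better; 0.0 requires always forecasting with total confidence and being right.
    """
    pairs = list(zip(forecasts, outcomes))
    return sum((f - o) ** 2 for f, o in pairs) / len(pairs)

# Five days of "60% chance of rain"; it actually rains on three of them.
print(round(brier_score([0.6] * 5, [1, 1, 1, 0, 0]), 2))  # 0.24 -- well calibrated, but not 0
```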

The best forecasters are also Bayesian: they update their forecasts when new information becomes available. Nate Silver has probably written the most about Bayesian statistics from a popular standpoint. The book notes that Silver got famous by calling each state's presidential votes, but that a simple "no change" prediction would have gotten you 48 out of 50 states, almost as good. Adjusting and updating rather than sticking to your original forecast is crucial. Try, fail, analyze, adjust, try again. Don't let noise sway you (though it's hard to separate the signal from the noise). Research shows that even when people adjust their positions to new information, they do so less than Bayes' theorem says is optimal; this is conservatism bias. It is a slightly better cognitive weakness to have than confirmation bias, where you selectively screen out information that contradicts what you think is true.
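A toy sketch of this kind of updating, with numbers I invented purely for illustration (only Bayes' rule itself comes from the book):

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Posterior probability of a hypothesis after seeing one new piece of evidence.

    prior:               belief before the evidence (e.g., 0.30)
    likelihood_if_true:  chance of seeing this evidence if the hypothesis is true
    likelihood_if_false: chance of seeing this evidence if the hypothesis is false
    """
    numerator = prior * likelihood_if_true
    return numerator / (numerator + (1 - prior) * likelihood_if_false)

# Hypothetical question: will the regime fall this year? Start at 30%. A credible
# defection is three times as likely to be observed if the regime really is collapsing.
print(round(bayes_update(0.30, 0.60, 0.20), 2))  # 0.56 -- move toward the evidence, not all the way
```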

Blending forecasts, as Silver also does, leads to better results than any single forecast. Interestingly, the researchers culled the top 2% of forecasters after one year, grouped them together, and found that their forecasts significantly improved; they became even more "super." 30% of "superforecasters" regressed to the mean, suggesting their initial success was luck. The authors admit they have no way of knowing how many of the remaining "superforecasters" are still successful by chance, but as time passes and more predictions are made, it appears pretty certain that there is a real difference among those at the top. This is not Bill Miller beating the S&P 500 for 10 years straight; this is more akin to someone beating the S&P 500 thousands of times.
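A small sketch of blending with numbers I made up (if I recall correctly, the project also nudged the blended average toward 0 or 1, called "extremizing"; the exponent below is arbitrary and only shows the idea):

```python
# Five hypothetical forecasters answer the same yes/no question with a probability.
forecasts = [0.65, 0.70, 0.80, 0.55, 0.75]

# Simple blend: the average usually beats most of the individual forecasts.
blended = sum(forecasts) / len(forecasts)

# Toy "extremizing": independent forecasters who all lean the same way jointly
# justify more confidence than any one of them, so push the average toward 1 or 0.
extremized = blended ** 2 / (blended ** 2 + (1 - blended) ** 2)

print(round(blended, 2), round(extremized, 2))  # 0.69 0.83
```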

The authors seem to steal language from Jim Collins, foxes versus hedgehogs, and assign their own definitions. (Collins himself borrowed from the Greek poet Archilochus.) Hedgehogs are people who have only one big idea. These may be the Nouriel Roubinis or the Jim Cramers of the world: everything is "recession" or "inflation." A broken clock is right twice a day, but that is not very useful for predictions. Foxes, meanwhile, have a broader range of knowledge and are significantly better at forecasts, particularly short-term ones. Foxes also put more accurate confidence levels on their forecasts.

Another key to forecasting is to be sure to get the "outside view," again espoused by Kahneman and Tversky. The infamous pre-war National Intelligence Estimate on Iraq's WMD capabilities never had any "red team" check it for accuracy or question its assumptions. "Red teams" of people who will push back on key points and find weaknesses in the argument are critical. The CIA did a better job of this vetting in finding Osama bin Laden, and the authors give illustrations from the movie Zero Dark Thirty. Each CIA analyst gave a probability estimate, and each key decision-maker read those probabilities differently. The authors contrast President Obama's decision with Leon Panetta's. (One now-famous analyst had the guts to say "100%" when going around the table.) President Obama supposedly put the odds at 50-50 and gave the order to go in anyway, while Panetta allegedly did not want the risk. (I've read Panetta's account of this in his memoir; there are differences.)

One interesting aspect of the research is that a forecaster's "faith score," i.e., how strongly he felt about God's intervention in the world, or fate, correlated negatively with accuracy. Basically, people who believe events happen for a reason or are foreordained make worse forecasters than those who chalk things up to randomness. The authors wonder, "is misery the price of accuracy?" (As a Calvinist/Augustinian-oriented Christian who happens to make forecasts and judgments about those forecasts, I find I can be detached enough and stay mostly aware of the cognitive biases I might succumb to. As it is, I'm always warning people not to be fooled by randomness. How I square this with believing that no molecule in the universe spins outside God's control is a difficult matter that I have not pinned down yet. I have listened to R.C. Sproul lectures on the subject of chance and lectures by Christian physicists. Sproul has a better grasp on it as a philosopher than I do. I have also not yet determined which theory of time I subscribe to, and it's all related.)

Interestingly, the book contains a decent critique of Taleb's thoughts on antifragility. I agree with the authors that it is "expensive" and "impractical" to ensure that decisions are antifragile. Fragility is essentially exposure to fat tails: events with very small probabilities but enormously bad outcomes. I fly on airplanes because it would be inconvenient not to, even though air travel clearly carries the fat-tail risk of a crash.

It's hard to make predictions about everything. "Not everything that can be counted counts, and some things that count can't be counted." In the end, the researchers beat IARPA's objectives: their forecasters were up to 60% more accurate on the questions IARPA posed than analysts who had access to more sensitive information. The book closes with a less flattering look at Tom Friedman. The authors argue that our nation's critical institutions must change how they forecast in order to improve outcomes and the lives of everyone at stake.

I give this book five stars out of five. It is a good illustration of how I try to problem-solve. I would give it to anyone who wants to know how I think.
