Monday, August 17, 2009

Data Mining for Stocks

Recently, I linked to an NYT article about data mining and statistics. I wrote about how valuable these skills are becoming, but indicated that it's possible to make big mistakes if you don't know what you're doing. This is why it's a good idea to get trained by experts who use statistical analysis in their own work all the time (such as our faculty here at the University of Utah's David Eccles School of Business).

And, as if by divine command, the WSJ has posted an article illustrating just this point:

Data Mining Isn't a Good Bet For Stock-Market Predictions

Cool nugget in the article: Data mining techniques suggest that Bangladeshi butter production is highly correlated with US stock returns. So, get the butter data and make a fortune in the stock market, right?

This, of course, is completely ridiculous. As the article reminds us, correlation does not imply causation, and you have to be both smart and well-trained (see plug to DESB, above) to know the difference.
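To see how pure chance can produce a "butter" result, here is a minimal Python sketch on simulated data. Everything in it is a hypothetical assumption: the "returns" and the 1,000 candidate indicators are independent random noise, so by construction none of them predicts anything. Mining the first 50 observations for the best-correlated candidate still turns up an impressive-looking correlation, which typically fades toward zero on the next 500 observations.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mine_for_indicator(n_candidates=1000, n_train=50, n_test=500, seed=0):
    """Data-mine pure noise: pick the candidate that best 'predicts' returns
    in a training window, then check it on data it has never seen."""
    rng = random.Random(seed)
    total = n_train + n_test
    # Hypothetical "stock returns": pure noise, so no real predictor exists.
    returns = [rng.gauss(0, 1) for _ in range(total)]
    # Hypothetical candidate indicators (think: butter production), also pure noise.
    candidates = [[rng.gauss(0, 1) for _ in range(total)] for _ in range(n_candidates)]
    # The "mining" step: keep whichever candidate has the largest in-sample |correlation|.
    best = max(candidates, key=lambda c: abs(pearson(c[:n_train], returns[:n_train])))
    in_sample = pearson(best[:n_train], returns[:n_train])
    out_of_sample = pearson(best[n_train:], returns[n_train:])
    return in_sample, out_of_sample


in_sample, out_of_sample = mine_for_indicator()
print(f"in-sample correlation of the 'winner': {in_sample:+.2f}")
print(f"out-of-sample correlation:            {out_of_sample:+.2f}")
```

The exact numbers depend on the seed, which is why no output is shown. The point is the pattern: with enough candidates, something will always look good in the data you searched, and a fresh stretch of data is the first honest test of it.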

But there's more to the stock-market/data-mining interface than just this. To illustrate my point, note that we can probably summarize the WSJ article as follows:

Data mining with stock market data might give you spurious correlations.

But the deeper point (which isn't discussed in the WSJ article but should be) is this:

Any correlations you find doing data mining on stock market data are probably spurious correlations.
Why?

It's the profit motive together with the price mechanism.

Here's how it works: Suppose data mining uncovers a real, causal relationship between some variable and future equity prices. As an example, suppose that any time it snows more than 6 inches at Alta on Martin Luther King Jr. Day, then the S&P 500 goes up by 10% in February. Well, the profit motive means there are lots and lots of people out there looking for such patterns in the data. And once these patterns are discovered, people will start to trade using this information.

We'll end up with a lot of people watching the MLK Day snowfall at Alta. When it snows, they'll all buy stock, hoping to cash in on that 10% return in February. And what will their purchases do to market prices? Drive prices up, of course. How far? Prices will be driven up the full 10% on the day right after MLK Day. This means that prices won't go up by 10% in February, because that increase has already been priced into the market.
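The price mechanism in the Alta example can be sketched as a toy simulation. This is a deliberately simplified, hypothetical model, not a description of how real markets clear: it assumes everyone agrees the stock will be worth 10% more at the end of February, ignores discounting and risk, and assumes each informed purchase nudges the price up by a fixed 0.2% (the `impact` parameter is made up for illustration). Buying continues only while some expected February gain remains, so the price climbs until the gain is gone.

```python
def arbitrage_away(price, expected_feb_end, impact=0.002):
    """Toy model: informed traders keep buying while an expected gain remains.

    Each purchase raises the price by the fractional `impact` (a hypothetical,
    fixed price-impact assumption). Returns the final price, the number of
    purchases, and the expected February return left over.
    """
    purchases = 0
    while expected_feb_end / price - 1 > 0:
        price *= 1 + impact  # buying pressure pushes the price up
        purchases += 1
    leftover = expected_feb_end / price - 1
    return price, purchases, leftover


# Hypothetical numbers: the index sits at 100 on MLK Day, it snowed more than
# 6 inches at Alta, and everyone expects it to be worth 10% more by the end of February.
mlk_price = 100.0
feb_end_value = mlk_price * 1.10
final_price, purchases, leftover = arbitrage_away(mlk_price, feb_end_value)
print(f"price after trading: {final_price:.2f}")
print(f"expected February return left: {leftover:.4%}")
```

Whatever value you pick for `impact`, the leftover return ends up within one purchase's worth of zero; a larger impact just means fewer trades are needed to close the gap. In this toy setup the whole 10% shows up immediately, not in February.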

The profit motive together with the price mechanism will tend to knock out any "real" data-mining/stock-market connections, pretty much immediately.

So the question to ask about data-mining for stocks is this: If the connection is real, then why haven't traders found it (and exploited it) already?

This isn't to say that one could never make money doing data-mining type stuff on stocks. I do know people who do just this, and they seem to make money (sometimes, at least). But these guys have really big computers, and really big data sets, and they're doing really complicated stuff, and on top of that they're always asking themselves why the correlations they've found haven't been found (and exploited) by others. It's an extremely, extremely competitive area. So you gotta be careful.

4 comments:

Unknown said...

Regular people should remember that there are scary smart people who do this for a living.

The smartest math guy I knew at Stanford got a math BS at 19 and started the stats PhD program. He got snatched up by some Wall Street firm before he even finished his thesis. There he was part of a group of equally smart guys who had access to the best computers.

If he wasn't trading on the correlation...

Scott Schaefer said...

My sophomore year roommate was both scary and smart, but I don't think that's what you mean.

SeanM said...

As Yogi Berra once said: "Predicting is difficult, especially about the future." Models can do a good job of predicting, as the WSJ article indicates; however, variables can change in the future and throw models off.

The real problem with data mining as it relates to stock picking is that past performance is no guarantee of future results. I do believe, however, that market anomalies exist and can be turned into investment strategies. Markets are not efficient and arbitrage opportunities exist, but anomalies from data mining must persist for gains to occur.

Unknown said...

I think the author of "Black Swan," Nassim Nicholas Taleb, did something like this. His book talks about his method of winning it big, and it's all about betting on the long shot with a known downside. He essentially concludes that predicting is impossible, so play a bunch of long shots and hopefully one will pay off big.

Safe bets don't pay off big; risky bets do. That's why something with a gov't bailout as insurance is the best bet.