Wednesday, May 20, 2009

Google Runs a Regression

A former Kellogg student forwards this cool article about Google using an "algorithm" to try to figure out which of its employees are likely to quit.

Google Searches for Staffing Answers

While Google is being tight-lipped about what they're doing, I am here to tell you that it's not so complicated. I doubt they're doing very much more than running a simple regression, not that much more advanced than what MBAs learn in b-school. Ordinary least squares regression is, as MBAs know, the best linear unbiased estimator of an underlying linear relationship (at least under the standard textbook assumptions). So it's a natural tool for picking out the relationship between "quits" and various factors affecting the work environment.

Here's how you do it:

The simplest way is to estimate what's called a "linear probability model." Let the dependent variable in your regression be a zero if the employee doesn't quit, and a one if he does. Let the independent variables be all the data you have pertaining to the employee's personal characteristics and work environment. Run a linear regression, and look for factors that strongly predict quit behavior. In a linear probability model, you can interpret the coefficient on each explanatory variable as how much a one-unit change in that variable changes the probability of a quit, holding the other variables fixed.
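To make that concrete, here is a minimal sketch of a linear probability model with a single explanatory variable. The data are made up for illustration, and the variable names (years_since_promotion, quit_flag) and the helper ols_fit are my own inventions, not anything Google has described.

```python
# Hypothetical data: one row per employee.
# quit_flag is 1 if the employee quit and 0 if they stayed.
years_since_promotion = [0, 1, 1, 2, 2, 3, 3, 4, 4, 5]
quit_flag = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def ols_fit(x, y):
    """Ordinary least squares with one regressor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = ols_fit(years_since_promotion, quit_flag)

# In a linear probability model the slope reads directly as the change in
# quit probability associated with one more year since the last promotion.
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

With real data you would have many explanatory variables and would use a statistics package rather than hand-rolled formulas, but the interpretation of each coefficient is the same.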

Now the linear probability model isn't exactly the best thing to do, because of some funny characteristics of probabilities. Specifically, because probabilities have to be between zero and one but the linear regression model doesn't account for that, you can get situations where the predicted quit probability for a given individual is less than zero or greater than one. So keep that in mind. But the linear probability model is quick and dirty, and it will give you insight into the relationships in your data.
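Here is a small sketch of how that problem shows up. The intercept and slope are hypothetical numbers standing in for a fitted linear probability model, chosen only to illustrate the point.

```python
# Hypothetical fitted linear probability model (illustrative numbers only):
#   P(quit) = INTERCEPT + SLOPE * years_since_promotion
INTERCEPT = -0.11
SLOPE = 0.24


def predicted_quit_probability(years):
    """Fitted value from the linear model; nothing forces it into [0, 1]."""
    return INTERCEPT + SLOPE * years


# Collect every fitted value that is not a valid probability.
out_of_range = {}
for years in range(0, 6):
    p = predicted_quit_probability(years)
    if p < 0.0 or p > 1.0:
        out_of_range[years] = p

print(out_of_range)
```

In this made-up example the nonsensical predictions turn up at the extremes of the explanatory variable, which is typically where a straight line strays outside the zero-to-one range.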

To do something a bit more rigorous, you can do a logit or probit regression. Most of your standard statistics packages can handle this kind of regression pretty easily, and these methods always yield predicted quit probabilities between zero and one. You have to do some more work to figure out the strength of the relationship using these methods, because the coefficients no longer translate directly into changes in probability, but it's not too tough.
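As a rough sketch of what the statistics package is doing for you, the code below fits a one-variable logit by Newton's method and then computes an average marginal effect, which is one common way to express the strength of the relationship in probability terms. The data and all the names are hypothetical, and a real analysis would use a packaged routine rather than this hand-rolled version.

```python
import math

# Hypothetical data: quit_flag is 1 if the employee quit, 0 otherwise.
years_since_promotion = [0, 1, 1, 2, 2, 3, 3, 4, 4, 5]
quit_flag = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(x, y, iterations=25):
    """Maximum likelihood logit with one regressor, via Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative Hessian), a symmetric 2x2.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


b0, b1 = fit_logit(years_since_promotion, quit_flag)
fitted = [sigmoid(b0 + b1 * xi) for xi in years_since_promotion]

# Average marginal effect: the derivative of P(quit) with respect to x,
# which is b1 * p * (1 - p), averaged over the sample.
average_marginal_effect = sum(b1 * p * (1.0 - p) for p in fitted) / len(fitted)

print(f"logit coefficient = {b1:.3f}")
print(f"average marginal effect = {average_marginal_effect:.3f}")
```

The logit coefficient itself is on the log-odds scale, so it is the average marginal effect, not the raw coefficient, that is comparable to a linear probability model slope.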

After you've identified factors that are associated with quits, you can think about doing something about those factors --- and perhaps reducing your turnover.

One danger in this kind of analysis, though, is misinterpreting what causes what. People who know they're likely to quit are probably people who aren't going to undertake long-term projects at work. So the data might well tell you that people who work on a succession of short-term projects are more likely to quit than people who work on long-term projects. But this doesn't mean that shifting everyone to long-term projects will reduce turnover, because it's the employee's turnover intentions that drive project choice, not vice versa. So you need to think hard about what your data means before making wholesale changes in your organization.
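A toy simulation makes the point. Everything here is invented for illustration: the probabilities, the function names, and the assumption that an unobserved intention to quit drives both project choice and quitting, while project type has no effect of its own.

```python
import random


def simulate(n, seed, force_long_term=False):
    """Return a list of (on_short_term_project, quits) pairs."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Unobserved trait: does this employee already intend to leave?
        intends_to_quit = rng.random() < 0.2
        # People who intend to leave mostly pick short-term projects.
        prefers_short = rng.random() < (0.8 if intends_to_quit else 0.3)
        short_term = False if force_long_term else prefers_short
        # Quitting depends only on intention, never on project type.
        quits = rng.random() < (0.7 if intends_to_quit else 0.05)
        rows.append((short_term, quits))
    return rows


def quit_rate(rows):
    return sum(1 for _, quits in rows if quits) / len(rows)


observed = simulate(10_000, seed=0)
rate_short = quit_rate([r for r in observed if r[0]])
rate_long = quit_rate([r for r in observed if not r[0]])

# "Intervention": same employees, but everyone is put on long-term projects.
reassigned = simulate(10_000, seed=0, force_long_term=True)

print(f"quit rate, short-term projects: {rate_short:.3f}")
print(f"quit rate, long-term projects:  {rate_long:.3f}")
print(f"overall quit rate before: {quit_rate(observed):.3f}")
print(f"overall quit rate after:  {quit_rate(reassigned):.3f}")
```

In this setup short-term project workers quit far more often, yet moving everyone to long-term projects leaves overall turnover exactly where it was, because the regression was picking up intentions rather than any effect of the projects.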

Now you're as smart as Google. But unfortunately not as cool.
