Wednesday, December 24, 2008

Linear Regression Analysis - 3 Common Causes of Multicollinearity and What to Do About Them

Multicollinearity in regression is one of those issues that strikes fear into the hearts of researchers. You've heard about its dangers in statistics classes, and colleagues and journal reviewers question your results because of it.

Multicollinearity is simply redundancy in the information contained in predictor variables. If the redundancy is moderate, it usually only affects the interpretation of regression coefficients. But if it is severe, at or near perfect redundancy, it causes the model to "blow up." (And yes, that's a technical term.)
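
The post doesn't prescribe a particular diagnostic, but one common way to put a number on this redundancy is the variance inflation factor (VIF): regress each predictor on all the others and compute 1 / (1 - R^2). Below is a minimal NumPy sketch under that assumption. The `vif` helper and the simulated data are hypothetical, for illustration only. Moderate redundancy gives VIFs near 2, while a near-copy of a predictor drives them into the thousands.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (X has no intercept column)."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        # Regress predictor j on an intercept plus all the other predictors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
moderate = x1 + rng.normal(scale=1.0, size=n)   # correlation with x1 around 0.7
severe = x1 + rng.normal(scale=0.01, size=n)    # almost an exact copy of x1

print(vif(np.column_stack([x1, moderate])))  # should be roughly 2 for both columns
print(vif(np.column_stack([x1, severe])))    # should be on the order of 10,000
```

A VIF of 1 means a predictor shares no linear information with the others. As the redundancy approaches perfect, 1 - R^2 approaches 0 and the VIF grows without bound, which is the "blow up."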

But the reality is that there are only five situations where it commonly occurs. And three of them have very simple solutions. These are:

1. Improper dummy coding.

When you convert a categorical variable into dummy variables, you need one fewer dummy variable than there are categories. That's because the last category is already indicated by a 0 on all the other dummy variables. Including a dummy for the last category just adds redundant information: in a model with an intercept, the full set of dummies always sums to 1, exactly duplicating the intercept, so the multicollinearity is perfect. So always check your dummy coding if it seems you've got a multicollinearity problem.
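
To see the redundancy directly, here is a small NumPy sketch, assuming a hypothetical three-category variable and a model with an intercept. It checks the column rank of the design matrix: with all three dummies, the four columns carry only three columns' worth of information; dropping one dummy restores full rank.

```python
import numpy as np

# Hypothetical categorical predictor with three categories.
colors = ["red", "green", "blue", "green", "red", "blue"]
categories = ["red", "green", "blue"]

def dummy_matrix(values, levels, drop_last):
    """Build an intercept column plus 0/1 dummy columns (hypothetical helper)."""
    used = levels[:-1] if drop_last else levels
    dummies = np.array([[1.0 if v == lvl else 0.0 for lvl in used] for v in values])
    intercept = np.ones((len(values), 1))
    return np.hstack([intercept, dummies])

X_all = dummy_matrix(colors, categories, drop_last=False)  # intercept + 3 dummies
X_ok = dummy_matrix(colors, categories, drop_last=True)    # intercept + 2 dummies

# Column count vs. rank: equal means no perfect redundancy.
print(X_all.shape[1], np.linalg.matrix_rank(X_all))  # 4 3
print(X_ok.shape[1], np.linalg.matrix_rank(X_ok))    # 3 3
```

The category left out ("blue" here) becomes the reference group, and each remaining dummy's coefficient is interpreted relative to it.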

2. Including a predictor that is computed from other predictors.

For example, I once had a client who was trying to test whether larger birds had a higher probability of finding a mate. This bird had a special tail, and he wondered whether the size of the whole bird or the size of the tail was more helpful to the bird in finding a mate. To compare them, he put three measures of size into the model: body length, tail length, and total length of the bird. Total length was the sum of the first two, so it carried no information the other two didn't, and the model blew up. Include two, but not all three.
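
A minimal NumPy sketch of the bird example, using simulated lengths (the numbers are hypothetical, not the client's data). Because total = body + tail, the design matrix loses a rank, and shifting the coefficients along the direction (0, 1, 1, -1) leaves every fitted value unchanged, so no unique set of coefficients exists.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
body = rng.normal(20.0, 2.0, n)   # hypothetical body lengths
tail = rng.normal(8.0, 1.5, n)    # hypothetical tail lengths
total = body + tail               # computed exactly from the other two

intercept = np.ones(n)
X_three = np.column_stack([intercept, body, tail, total])
X_two = np.column_stack([intercept, body, tail])

print(np.linalg.matrix_rank(X_three), X_three.shape[1])  # 3 4
print(np.linalg.matrix_rank(X_two), X_two.shape[1])      # 3 3

# body + tail - total is zero for every bird, so adding any multiple of this
# direction to the coefficients produces identical predictions.
direction = np.array([0.0, 1.0, 1.0, -1.0])
print(np.allclose(X_three @ direction, 0.0))  # True
```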

3. Using the same or nearly the same variable twice.

A similar situation occurs when two measures of the same concept are included in a model. Sometimes researchers want to see which predicts an outcome better. For example, does personal income or household income predict stress level better? If they are both just measuring income, combine them into a single income variable using Principal Components Analysis.
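
One way to do that combination, sketched in NumPy under stated assumptions: the two income measures are simulated (hypothetical data), both are standardized first, and the combined variable is the score on the first principal component of their correlation matrix. The sign flip is a convention so that a higher score means higher income.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
# Hypothetical data: household income closely tracks personal income.
personal = rng.normal(50.0, 10.0, n)
household = personal + rng.normal(15.0, 3.0, n)

# Standardize each measure so neither dominates because of its scale.
Z = np.column_stack([personal, household])
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)

# Eigendecomposition of the correlation matrix; eigh returns eigenvalues in ascending order.
corr = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
first = eigvecs[:, -1]        # loadings of the largest component
if first.sum() < 0:           # eigenvector sign is arbitrary; make loadings positive
    first = -first

income_score = Z @ first      # the single combined income predictor
share = eigvals[-1] / eigvals.sum()
print(f"correlation = {corr[0, 1]:.2f}, first component explains {share:.0%}")
```

`income_score` then replaces both original measures in the regression. When the two measures are highly correlated, the first component captures nearly all of their shared variance.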

To learn more about detecting and correcting multicollinearity, I invite you to download a free 75-minute training audio from The Analysis Factor website. Visit http://www.analysisfactor.com to get started today.

© 2008 Karen Grace-Martin - Statistical Consultant and founder of The Analysis Factor

Karen Grace-Martin has helped social science researchers practice statistics for 9 years, as a statistical consultant at Cornell University and in her own business. She knows the kinds of resources and support that researchers need to practice statistics confidently, accurately, and efficiently, no matter what their statistical background. To answer your questions, receive advice, and view a list of resources to help you learn and apply appropriate statistics to your data, visit http://www.analysisfactor.com
