A challenge that many novice researchers face is deciding on the appropriate statistical test for their research problem or research question. Even experienced researchers or those who have spent time away from statistics may feel a bit rusty and need a refresher. Let’s say researchers are interested in looking at the relationship(s) across variables. They may ask themselves, “In this situation, do I apply a Pearson’s or Spearman’s correlation? Or should I consider setting up cross tabulations or maybe a regression analysis?” Then, there are those times when researchers want to compare means or test hypotheses; the researcher might think about using t-tests or ANOVA based on the sample data.
To make life easier, I have developed this statistical analysis decision tool. Read on to learn more.
This tool is designed to assist the novice and experienced researcher alike in selecting the appropriate statistical procedure for their research problem or question. Below we provide commonly used statistical tests along with easy-to-read tables that are grouped according to the desired outcome of the test. Also provided below are a variety of links for added support. Enjoy, and happy number crunching!
To use this tool, select the goal of your data analysis, then work through the tables from left to right to select the correct statistical test. Make sense? Let’s try one.
Often the first step in a study is to organize the data and understand patterns. This can be accomplished with descriptive statistics such as frequencies, means, and standard deviations.
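For readers who like to see the numbers, here is a minimal sketch of descriptive statistics using only Python's standard library. The exam scores are made-up illustration data, not results from any study, and the variable names are my own.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical sample: exam scores for ten students (made-up data).
scores = [72, 85, 85, 90, 68, 77, 85, 90, 72, 95]

frequencies = Counter(scores)   # how often each score occurs
average = mean(scores)          # arithmetic mean
middle = median(scores)         # middle value of the sorted data
spread = stdev(scores)          # sample standard deviation (divides by n - 1)

print("Frequencies:", dict(frequencies))
print("Mean:", average)
print("Median:", middle)
print("Standard deviation:", round(spread, 2))
```

The same summaries are available from the descriptive statistics menus in packages such as SPSS; the point here is only to show what each number represents.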
Relational analysis helps the researcher understand the relationship between two or more variables. “The notion of relation expresses the rapport that exists between two random variables” (Dodge, 2010, p. 455).
Correlation analysis identifies the strength and direction of the relationship between two variables. The closer the coefficient is to 1, the more positive the relationship; the closer it is to -1, the more negative. A positive relationship means both variables move in the same direction, whereas a negative relationship means they move in opposite directions. A value of zero means there is no linear relationship between the two variables.
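To make the -1 to 1 scale concrete, here is a small sketch that computes a Pearson correlation coefficient by hand. The function name pearson_r and the data are assumptions made for illustration. The third data set is included to show that a coefficient of zero rules out only a straight-line relationship, not a curved one.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Made-up illustration data.
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # moves in the same direction as hours
falling = [10, 8, 6, 4, 2]   # moves in the opposite direction
curved = [4, 1, 0, 1, 4]     # (hours - 3) squared: related, but not linearly

print(pearson_r(hours, rising))
print(pearson_r(hours, falling))
print(pearson_r(hours, curved))
```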
“Regression analysis is a technique that permits one to study and measure the relation between two or more variables. […] The goal is to estimate the value of one variable as a function of one or more other variables” (Dodge, 2010, p. 450).
Key differences between correlation and regression are articulated at these links: http://www.graphpad.com/faq/viewfaq.cfm?faq=1141 and http://www.statpac.com/statistics-calculator/correlation-regression.htm. In summary, regression can do more for your research than correlation because it allows you to model multiple variables.
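As a rough illustration of estimating one variable as a function of another, here is a sketch of simple (one-predictor) least-squares regression. The function name fit_line and the study-hours data are hypothetical. A model with several predictors would normally be fitted with a statistics package rather than by hand.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: cross-products of deviations over squared deviations of x.
    slope = (
        sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        / sum((a - mean_x) ** 2 for a in x)
    )
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up illustration data: hours studied and exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, scores)
predicted = intercept + slope * 6   # estimated score for 6 hours of study

print("Slope:", round(slope, 2))
print("Intercept:", round(intercept, 2))
print("Predicted score for 6 hours:", round(predicted, 2))
```

Unlike the correlation coefficient, the fitted line gives a prediction in the units of the outcome variable, which is the practical difference noted above.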
Comparison analysis seeks to test hypotheses on a sample mean or to compare means of two samples. The outcome of this type of analysis is usually “there is a statistically significant difference” or that “there is not a statistically significant difference” between/among data sets.
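Here is a minimal sketch of comparing the means of two independent samples. It computes Welch's version of the t statistic, which does not assume equal variances; this is my choice for the example, not a recommendation from the tables. The function name welch_t and the two groups of scores are made up. The sketch stops at the test statistic and degrees of freedom, since turning them into a p-value requires the t distribution, which a statistics package provides.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df

# Made-up illustration data: scores from two independent groups.
group_a = [82, 85, 88, 90, 85]
group_b = [75, 78, 80, 72, 75]

t_stat, dof = welch_t(group_a, group_b)
print("t statistic:", round(t_stat, 2))
print("degrees of freedom:", round(dof, 1))
```

A large absolute t value relative to the critical value for the given degrees of freedom is what leads to the conclusion that there is a statistically significant difference.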
“A parametric test is a form of hypothesis testing in which assumptions are made about the underlying distribution of observed data. […] The Student test is an example of a parametric test. It aims to compare the means of two normally distributed populations” (Dodge, 2010, p. 412). Nonparametric procedures are really handy when you plan to use one of the procedures we have discussed but, for one reason or another (often a small sample size), you cannot. Below is a list of parametric tests along with their nonparametric equivalents.
Click on the links below for more information about the test, what it does, and how to use it.
Comments
Thank you for providing this, Scott.
I would like to have a pdf version to send to my students. Do you have one?
Would you make one for multivariate analysis?
Here is a link to the PDF. Enjoy!
Click to view or download the PDF version
Hi Kathleen - I'm sure we can post a PDF within the next few days or so. While ANOVA and Regression are designed to handle multivariate analysis a follow-up to this post could be to address this in more detail. Thanks, Scott
Scott,
I've talked to you briefly at various meetings. I'm very interested in analytics. I'm trying to decide which to focus on: R ? Python? Rapidminer? I know you have told me you use SPSS. As faculty, I have access to SPSS, but I think you need the SPSS "Modeler" for most of the clustering techniques, etc. Do we have access to this? Or should I just focus on the capabilities in the base SPSS product?
Any direction you can give will be greatly appreciated.
thanks!
Lou