# Simple Linear Regression

When used

Simple linear regression is used to identify the direct predictive relationship between one predictor and one outcome variable. With linear regression we determine whether the relationship between the variables is statistically significant, as well as its direction and magnitude. Additionally, we can find how much of the variation in the dependent variable is explained by the independent variable. For example, we can predict salaries (dependent variable) based on experience (independent variable).

Variable Types

The analysis is used when both variables are interval or ratio. Categorical variables can also be used in simple regression analysis (and in regression analysis in general). However, if a categorical variable has more than two categories and there is no clear ranking among them (unlike, for example, class standing, where sophomores are higher in academic rank than freshmen), it might not be possible to make the expected interpretation about the relationship between the variables.

We can analyze a categorical variable using a technique called ‘dummy’ coding. With two categories, it is as simple as assigning 0 to one category and 1 to the other. The same technique applies to variables with more than two categories, but more work is needed: we create a series of dummy variables, one fewer than the number of categories (Keith, 2015). Each dummy variable compares one category against the reference category, so the resulting coefficients allow us to analyze the effect of every category in a single model.
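As a rough illustration of dummy coding, the sketch below turns one categorical variable with k categories into k - 1 columns of 0s and 1s. The function name `dummy_code`, the choice of reference category, and the sample data are assumptions made for this example; statistical packages usually provide their own routines for this.

```python
def dummy_code(values, reference=None):
    """Return (column_names, rows) with k - 1 dummy columns for k categories.

    Hypothetical helper for illustration only.
    """
    categories = sorted(set(values))
    if reference is None:
        # Assumption: default to the alphabetically first category as reference.
        reference = categories[0]
    if reference not in categories:
        raise ValueError(f"reference {reference!r} not found in values")
    # One column per non-reference category.
    columns = [c for c in categories if c != reference]
    # Each row has a 1 in the column matching its category, 0 elsewhere;
    # the reference category is coded as all zeros.
    rows = [[1 if v == c else 0 for c in columns] for v in values]
    return columns, rows


# Made-up data: class standing for five students.
class_standing = ["freshman", "sophomore", "junior", "freshman", "senior"]
columns, rows = dummy_code(class_standing, reference="freshman")

print(columns)
for value, row in zip(class_standing, rows):
    print(value, row)
```

Here the four categories produce three dummy variables, and each coefficient estimated for a dummy variable would be read as the difference between that category and the reference category (freshman).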

Sample size

A number of suggestions have been made for sample size in correlation and regression analysis, with more focus on multiple regression than on simple linear regression. Some of these are: (i) N > 50 + 8m (where m is the number of independent variables) for testing the multiple correlation and N > 104 + m for testing individual predictors (Green, 1991); (ii) the number of participants should exceed the number of predictors by at least 50 (Harris, 1985); (iii) if there are 6 or more predictors, the absolute minimum should be 10 participants per predictor, though it is better to aim for 30 participants per variable. According to Field (2013), these rules of thumb oversimplify things because they do not take power and effect size into consideration. The best course of action is to calculate the required sample size from the desired power and the expected effect size. One tool for this calculation is G*Power (see the resources below).
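The arithmetic behind Green's (1991) rules of thumb, as cited above, can be sketched in a few lines. The function name `green_minimum_n` is made up for this example, and the result is only the smallest whole number satisfying each inequality, not a substitute for a power analysis.

```python
def green_minimum_n(num_predictors):
    """Smallest N satisfying N > 50 + 8m and N > 104 + m (Green, 1991).

    Hypothetical helper for illustration only.
    """
    if num_predictors < 1:
        raise ValueError("num_predictors must be at least 1")
    m = num_predictors
    return {
        # Testing the overall model (multiple correlation): N > 50 + 8m
        "overall_model": 50 + 8 * m + 1,
        # Testing individual predictors: N > 104 + m
        "individual_predictors": 104 + m + 1,
    }


# Simple linear regression has a single predictor (m = 1).
simple = green_minimum_n(1)
print(simple)

# A model with four predictors, for comparison.
print(green_minimum_n(4))
```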

Specific Characteristics

Simple linear regression identifies the main effect of one independent variable on the outcome variable. When we run multiple linear regression, we can see the impact of each individual independent variable on the dependent variable. These effects are presented as coefficients that indicate how much the dependent variable is expected to increase (or decrease) for a one-unit increase in the independent variable.
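To make the meaning of the coefficient concrete, here is a minimal sketch of an ordinary least squares fit with one predictor, written in plain Python. The function name `fit_simple_regression` and the salary and experience figures are invented for illustration; in practice the fit would be done in SPSS or another statistics package.

```python
def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least squares line y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares of the predictor.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data: years of experience and salary in thousands.
years = [1, 2, 3, 4, 5]
salary = [42, 45, 50, 52, 56]

intercept, slope = fit_simple_regression(years, salary)
predicted_at_6 = intercept + slope * 6

print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
print(f"predicted salary at 6 years = {predicted_at_6:.2f}")
```

With these invented numbers the slope is read as the expected change in salary (in thousands) for each additional year of experience, and the intercept as the predicted salary at zero years.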

Data Analysis

The main statistics in simple regression analysis (as well as in multiple regression analysis) are R² and the F-ratio. R² is the proportion of variance in the outcome variable that the model explains. If the F-ratio is statistically significant, the regression model is a significantly better predictor of the outcome variable than simply using the mean value of the outcome variable (Field, 2009).
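The following sketch shows one way R² and the F-ratio can be computed for a one-predictor model from the sums of squares. The function name `r_squared_and_f` and the data are assumptions for this example. It does not compute the p-value, which requires the F distribution and is normally taken from statistical software.

```python
def r_squared_and_f(x, y):
    """Return (r_squared, f_ratio, (df_model, df_residual)) for a one-predictor model."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * xi for xi in x]

    # Total variation around the mean (the "mean only" baseline model).
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    # Variation left unexplained by the regression line.
    ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    if ss_total == 0 or ss_residual == 0:
        raise ValueError("degenerate data: R-squared or F is undefined")
    ss_model = ss_total - ss_residual

    df_model, df_residual = 1, n - 2
    r_squared = ss_model / ss_total
    f_ratio = (ss_model / df_model) / (ss_residual / df_residual)
    return r_squared, f_ratio, (df_model, df_residual)


# Made-up data: years of experience and salary in thousands.
years = [1, 2, 3, 4, 5]
salary = [42, 45, 50, 52, 56]

r2, f_ratio, dfs = r_squared_and_f(years, salary)
print(f"R^2 = {r2:.3f}, F{dfs} = {f_ratio:.2f}")
```

The F-ratio here compares the variation explained by the regression line with the variation left over, which is the comparison against the mean-only model described above.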

Write up of Results

An example of reporting the results of a multiple regression analysis (simple regression can follow the same format):

“The ANOVA table showed the regression is statistically significant. F (2,666) = 32.286, p<.001. The model accounts for approximately 9% of the variance in knowledge sharing (R² = .089, R²adj = .086, p<.001). However, the relationship between cohesive-affective responses with knowledge sharing was not found to be statistically significant” (Topchyan, p. 656).

Additionally, we can use a scatterplot to show the linear regression graphically: it plots the dependent variable against the independent variable and can also show the line of best fit.

Resources

Video clips:

Linear Regression - SPSS (Part 1)

Simple Linear Regressions

Using G*Power to calculate Sample Size (A Priori) HD

G*Power: Statistical Power Analysis for Windows and Mac

G*Power Manual

How to Work Out Required Sample Size for a Correlation and a Regression Using G*Power

Calculating Statistical Power Tutorial

References:

Allum, N. (2015). Multiple Regression and the British Crime Survey (2007-2008): Worry About Crime and Confidence in the Police. Retrieved from http://methods.sagepub.com/dataset/multiple-reg-in-bcs-2007-8 doi:10.4135/9781473937765

Allum, N. (2015). Simple Regression and the Race Implicit Attitudes Test (2012): Implicit and Self-Reported Racial Attitudes. Retrieved from http://methods.sagepub.com/dataset/simple-reg-in-iat-2012 doi:10.4135/9781473947634

Field, A. (2009). Discovering statistics using SPSS. Sage publications.

Field, A. (2013). Discovering statistics using SPSS (4th ed.). Thousand Oaks, CA: Sage Publications.

Green, S. B. (1991). How many subjects does it take to do a regression analysis. Multivariate behavioral research, 26(3), 499-510.

Harris, R. J. (2001). A primer of multivariate statistics. Psychology Press.

Institute, T. O. (2015). Simple Regression and the U.S. Statistical Abstracts (2012): Infant Mortality and Poverty Across the U.S. Retrieved from http://methods.sagepub.com/dataset/simple-reg-in-us-stats-abstract-2012 doi:10.4135/9781473947580

Keith, T. Z. (2015). Multiple Regression and Beyond: An Introduction to Multiple Regression and Structural Equation Modeling, Kindle Edition