Quantitative Experimental

Leader: Dr. Brian Sloboda

• When do we use the design?

The premise of quantitative research is to determine the relationship between an independent variable (IV) and a dependent variable (DV) in a population. More specifically, quantitative research designs are either descriptive (subjects are usually measured once) or experimental (subjects are measured before and after a treatment). A descriptive study establishes only associations between variables. An experimental design attempts to establish causality between the variables designated for the research.

• Type of problem appropriate for this design

This type of research design is called true experimentation because it establishes the cause-effect relationship among a group of variables. In a true experimental design, an effort is made to identify and impose control over all variables except one. The independent variable is manipulated to determine its effects on the dependent variable.

An example of this research design would be a study of a new teaching method. The researchers form two groups and keep them as similar as possible, then expose only one group to the new teaching method. Doing so enables the researchers to isolate the effects of the new teaching method. That is, the control group should be almost identical to the experimental group. The experimental group is then exposed to the change designated by the researchers, while the control group remains constant.

• Theoretical framework/discipline background


Randomization in quantitative research is not new: it was introduced by Fisher (1935) and further expanded by Pitman (1937, 1938). However, the fundamental problem during the early years of quantitative experimental design was the availability of appropriate data to conduct these experiments. Fisher (1935) provided a solid introduction to the methods of quantitative experimentation, but as McCall (1923) had already observed, good data for conducting these experiments were lacking, especially in education. The lack of good data has plagued researchers for many years and remains a concern in modern research.

Completely Randomized Design (CRD): The Simplest Version of Randomized Experiments

The Completely Randomized Design (CRD) is the simplest version of randomized experiments. When using the CRD, the researcher is interested in the mean responses of the treatment groups. One obvious place to begin is to decide whether the means are all the same or whether at least one of them differs. Restating this question in terms of models, we ask whether the data can be adequately described by the model of a single mean or whether we need the model of separate treatment group means. Recall that the single mean model is a special case of the group means model. That is, we can choose the parameters of the group means model so that every group has the same mean. The single mean model is therefore said to be a reduced or restricted version of the group means model. Analysis of Variance (ANOVA) is the statistical method for comparing the fit of two models, one a reduced version of the other. Put another way, ANOVA compares the relative sizes of the treatment variation (between-group variation) and the error variation (within-group variation). The error variation is unaccounted-for variation and can be viewed as variation due to individual differences within treatment groups. Typically, the null hypothesis is that all the means are the same, and the alternative hypothesis states that at least one of the means differs. If a significant difference in treatments is present, the treatment variation should be large relative to the error variation (Maxwell and Delaney, 2004; Creswell, 2013).
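As a minimal sketch of the comparison described above, the following Python code computes the treatment and error sums of squares and the resulting F statistic for a one-way layout. The function name one_way_anova and the scores are hypothetical and chosen only for illustration; a statistical package would normally also report the p-value.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (ss_treatment, ss_error, f_statistic) for a list of groups."""
    all_values = [y for g in groups for y in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of treatment groups
    n = len(all_values)    # total number of observations

    # Treatment (between-group) variation: how far the group means sit
    # from the single mean of the reduced model.
    ss_treatment = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

    # Error (within-group) variation: what the group means leave unexplained.
    ss_error = sum((y - mean(g)) ** 2 for g in groups for y in g)

    ms_treatment = ss_treatment / (k - 1)
    ms_error = ss_error / (n - k)
    return ss_treatment, ss_error, ms_treatment / ms_error


# Hypothetical scores for three treatment groups (illustrative data only).
scores = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
ss_t, ss_e, f_stat = one_way_anova(scores)
print(ss_t, ss_e, f_stat)
```

A large F statistic indicates that the treatment variation is large relative to the error variation, which is evidence against the single mean model.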

When the researcher has more than two samples, the researcher must make multiple comparisons, or conduct simultaneous inference, to identify which differences are statistically significant. When doing multiple comparisons, the researcher must consider the error rate, namely the Type I error rate. As a family contains more and more true null hypotheses, the probability that one or more of them is rejected increases, and the probability of at least one Type I error can become quite large. Consequently, multiple comparison methods are designed to control the Type I error rate (Miller, 1981).
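To illustrate how quickly this probability grows, the sketch below assumes that the m tests in the family are independent and that every null hypothesis is true. Under those assumptions the probability of at least one Type I error is 1 - (1 - alpha)^m. The independence assumption and the function name are illustrative and are not part of the discussion above.

```python
def familywise_error_rate(alpha, m):
    """Probability of at least one false rejection among m independent
    tests of true null hypotheses, each run at level alpha."""
    return 1 - (1 - alpha) ** m


# Error rate for a growing family of comparisons at alpha = 0.05.
for m in (1, 3, 10):
    print(m, round(familywise_error_rate(0.05, m), 4))
```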

There is no dearth of multiple comparison methods that can be used. Fisher (1935) suggested the first multiple comparisons technique, followed by the SNK (Student-Newman-Keuls) method of Newman (1939). There were no further advances in multiple comparison methods until the 1950s, when the following methods were introduced: Duncan’s multiple range procedure (Duncan 1955), Tukey’s HSD (Tukey 1952), Scheffé’s all contrasts method (Scheffé 1953), Dunnett’s method (Dunnett 1955), and another proposal for SNK (Keuls 1952). The simplest multiple comparison method is based on the Bonferroni inequality, and improvements to it led to the modified Bonferroni procedures of the 1970s and later (Holm 1979; Simes 1986; Hochberg 1988; Benjamini and Hochberg 1995).
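The following plain-Python sketch shows two of the procedures named above, the Bonferroni adjustment and Holm's (1979) sequentially rejective procedure, expressed as adjusted p-values. The function names and the p-values are hypothetical; statistical packages provide tested implementations that should be preferred in practice.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply each p-value by the family size."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw p-values from three pairwise comparisons.
raw = [0.01, 0.04, 0.03]
bonf = bonferroni(raw)
holm_adj = holm(raw)
print(bonf, holm_adj)
```

A comparison is declared significant when its adjusted p-value falls below the chosen level. Holm's adjusted values are never larger than the Bonferroni ones, which is why it is regarded as an improvement.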

Randomized Block Design (Two Way ANOVA)

Another variation of the CRD is the randomized block design. The randomized block design focuses on one independent variable (treatment variable). However, the randomized block design also includes a second variable, referred to as a blocking variable. A blocking variable is used to control for the confounding variables in the research study. It is a variable that the researcher wants to control but is not the treatment variable that is of interest to the researcher. A special case of the randomized block design is the repeated measures design. This design is a randomized block design in which each block level is an individual item or person, and that person or item is measured across all treatments. For example, in an ordinary randomized block design a block level might be the night shift, and items produced under different treatment levels on the night shift are measured. In a repeated measures design, by contrast, a block level might be an individual machine or person; items produced by that person or machine are then randomly chosen across all treatments. Thus, a repeated measure of the person or machine is made across all treatments (Maxwell and Delaney, 2004; Creswell, 2013).
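A minimal sketch of the randomized block analysis, assuming one observation per block and treatment combination, is given below. The total variation is split into treatment, block, and error parts, so variation due to the blocking variable no longer inflates the error term. The function name and the data table are hypothetical.

```python
from statistics import mean


def randomized_block_anova(table):
    """table[b][t] is the response in block b under treatment t.

    Returns (ss_treatment, ss_block, ss_error, f_treatment).
    """
    n_blocks = len(table)
    n_treat = len(table[0])
    grand = mean(y for row in table for y in row)
    treat_means = [mean(row[t] for row in table) for t in range(n_treat)]
    block_means = [mean(row) for row in table]

    ss_treatment = n_blocks * sum((m - grand) ** 2 for m in treat_means)
    ss_block = n_treat * sum((m - grand) ** 2 for m in block_means)
    ss_total = sum((y - grand) ** 2 for row in table for y in row)
    # Whatever is left after removing treatment and block variation is error.
    ss_error = ss_total - ss_treatment - ss_block

    df_treatment = n_treat - 1
    df_error = (n_treat - 1) * (n_blocks - 1)
    f_treatment = (ss_treatment / df_treatment) / (ss_error / df_error)
    return ss_treatment, ss_block, ss_error, f_treatment


# Hypothetical data: three blocks (rows) by three treatments (columns).
data = [
    [1, 2, 3],
    [2, 4, 3],
    [3, 3, 6],
]
ss_treat, ss_block, ss_err, f_treat = randomized_block_anova(data)
print(ss_treat, ss_block, ss_err, f_treat)
```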

Factorial Treatment Structure (Two Way ANOVA)

The ANOVAs discussed in the preceding subsection are completely randomized designs in which the treatments are assigned randomly. Those treatments have no structure; they are just treatments. Factorial treatment structure exists when the treatments are the combinations of the levels of two or more factors. Put another way, some randomized experiments are designed so that two or more treatments (independent variables) are explored simultaneously. The research is still a completely randomized design, but the researcher has added a structure to the treatments. In the factorial treatment structure, every level of each treatment is studied under the conditions of every level of all other treatments. Some researchers use the factorial design as a way to control confounding variables in a study. By building variables into the design, the researcher attempts to control for the effects of multiple variables in the experiment. Recall that under a CRD, the variables are studied in isolation. With the factorial design, there is potential for increased power over the CRD because the additional effects of the second variable are removed from the error sum of squares (Maxwell and Delaney, 2004; Creswell, 2013).

Many researchers analyze factorially structured data only by looking at which main effects and interactions are significant. Instead, the researcher should look carefully at the data and attempt to determine what the underlying data tell us. For example, reporting that factor X only affects the response at the high level of factor Y is more informative than reporting that factors X and Y have significant main effects and interactions. That is, we should not just report the significant effects. There should be a clear examination of the multiple comparisons, which can easily be done with the methods discussed earlier, together with modeling of the interaction effects as proposed by Johnson and Graybill (1972), Cook and Weisberg (1982), and Mandel (1961).
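The kind of statement recommended above can be read directly from the cell means. The sketch below uses a hypothetical 2x2 factorial with two replicates per cell; the factor labels, data, and helper name are illustrative assumptions. It computes the simple effect of X at each level of Y, the main effect of X, and the interaction contrast, defined here as the difference between the two simple effects.

```python
from statistics import mean

# Hypothetical 2x2 factorial data: cells[(x_level, y_level)] holds the replicates.
cells = {
    ("low", "low"): [10, 12],
    ("high", "low"): [11, 11],
    ("low", "high"): [10, 12],
    ("high", "high"): [19, 21],
}
cell_mean = {key: mean(values) for key, values in cells.items()}


def simple_effect_of_x(y_level):
    """Change in mean response when X goes from low to high at a fixed level of Y."""
    return cell_mean[("high", y_level)] - cell_mean[("low", y_level)]


effect_at_low_y = simple_effect_of_x("low")
effect_at_high_y = simple_effect_of_x("high")
main_effect_x = (effect_at_low_y + effect_at_high_y) / 2
interaction = effect_at_high_y - effect_at_low_y
print(effect_at_low_y, effect_at_high_y, main_effect_x, interaction)
```

With these illustrative numbers, X has no effect at the low level of Y and a sizeable effect at the high level, a pattern that a bare list of significant main effects and interactions would not convey.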

Other Approaches to Randomized Design

Rubin (1973a, 1973b, 1974, 1977, 1978) published a series of papers that serve as the foundation for the dominant approach to the analysis of causal effects. More specifically, Rubin proposed interpreting causal statements as comparisons of so-called potential outcomes: pairs of outcomes defined for the same unit given different levels of exposure to the treatment, with the researcher only observing the potential outcome corresponding to the level of the treatment received. Put another way, these models are developed for a pair of possible outcomes rather than for a single outcome. Rubin’s formulation of the problem of causal inference is known as the Rubin Causal Model (RCM), a term coined by Holland (1986). The RCM serves as the standard in experimental-type statistical and econometric analysis.

Rubin’s approach emphasizes the relationship between the assignment of the treatment and the potential outcomes. The simplest assignment mechanism is randomization, in which each subject is randomly assigned to a treatment or control group; consequently, the assignment is independent of the covariates as well as the potential outcomes. In such randomized experiments, the researcher is able to obtain estimators for the average effect of the treatment that have attractive properties under repeated sampling (e.g., the difference in means by treatment). The RCM uses potential outcomes; however, only one of these potential outcomes can actually be observed, and which one depends on the assignment mechanism, that is, on how the subject is assigned to a treatment. Holland (1986) calls this the “fundamental problem of causal inference.” In other words, for individuals whom we observe under treatment, we have to form an estimate of what they would have looked like if they had not received the treatment. The observed outcome can be written as the outcome in the absence of treatment plus the product of the treatment effect for that individual and the treatment dummy variable. Imbens and Wooldridge (2009) discuss the advantages of thinking in terms of potential outcomes.
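The verbal description of the observed outcome can be written compactly. The notation below is an assumption introduced for illustration and does not appear in the discussion above: Y_i(0) and Y_i(1) denote the potential outcomes of unit i without and with treatment, D_i is the treatment dummy, and Y_i is the observed outcome.

```latex
% Observed outcome: outcome without treatment plus the individual
% treatment effect multiplied by the treatment dummy.
Y_i = Y_i(0) + \bigl( Y_i(1) - Y_i(0) \bigr) D_i , \qquad D_i \in \{0, 1\}

% Average treatment effect, and the difference in means that estimates it
% when D_i is randomly assigned.
\tau = E\bigl[ Y_i(1) - Y_i(0) \bigr] , \qquad
\hat{\tau} = \bar{Y}_{D=1} - \bar{Y}_{D=0}
```

Setting D_i = 1 returns Y_i(1) and setting D_i = 0 returns Y_i(0), so only one of the two potential outcomes is ever observed for a given unit.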

What role does the RCM play in the formulation of causal analysis? First, it forces the researcher to think clearly about the causal effects of specific manipulations in the research design. Questions about the ‘effect’ of fixed individual characteristics (e.g., gender or race) therefore do not fit well in this framework or need to be carefully construed. Holland (1986) and Rubin stressed that there is “no causation without manipulation.” The second role of the RCM concerns the estimation of the treatment effects and the uncertainty associated with it. Holland (1986) and Rubin stress that this uncertainty is not about sampling variation. Access to the entire population of observed outcomes, y, would not redress the fact that only one potential outcome is observed for each individual unit, so the counterfactual outcome must still be estimated, with some uncertainty, in such cases.
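The point about uncertainty can be illustrated with a small sketch. It assumes a hypothetical population of four units in which both potential outcomes are known, something that is never true in real data, and enumerates every way of assigning two units to treatment. All names and numbers are illustrative.

```python
from itertools import combinations
from statistics import mean

# Hypothetical finite population in which BOTH potential outcomes are known.
y0 = [1, 2, 3, 4]  # outcome of each unit without treatment
y1 = [3, 2, 6, 5]  # outcome of each unit with treatment
units = range(len(y0))
true_ate = mean(b - a for a, b in zip(y0, y1))


def difference_in_means(treated):
    """Estimate from one assignment: only y1 of treated units and
    y0 of control units would actually be observed."""
    control = [i for i in units if i not in treated]
    return mean(y1[i] for i in treated) - mean(y0[i] for i in control)


# Every possible assignment of two of the four units to treatment.
estimates = [difference_in_means(t) for t in combinations(units, 2)]
average_estimate = mean(estimates)
print(true_ate, average_estimate, min(estimates), max(estimates))
```

Averaged over all possible random assignments, the difference in means equals the true average effect, yet any single assignment can give a quite different estimate even though the whole population is observed. The remaining uncertainty comes from the unobserved counterfactuals, not from sampling.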

The literature regarding causal inference has been proliferating over the past few decades. There is no dearth of methods for estimating causal effects: instrumental variable methods, panel regressions, regression discontinuity designs, quantile regressions, and other empirical approaches. The reader is referred to Angrist and Pischke (2009) for the details of these empirical methods. Despite the proliferation of the literature, the RCM remains the dominant framework. The more recent literature has stressed relaxing the functional form and distributional requirements (not covered in this discussion), and these changes have allowed for general heterogeneity in the effects of the treatment.

Specific Characteristics

Sample Size

In quantitative experimental design, there is no fixed minimum sample size. However, a small sample can leave the researcher with low statistical power, which limits the ability to reject a false null hypothesis. If the researcher fails to reject the null hypothesis, the researcher should consider whether the study had adequate power and, if it did not, collect more data to increase the statistical power.
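A rough sense of how sample size drives power can be obtained from a normal approximation. The sketch below assumes a two-sided comparison of two equally sized groups, with the effect size expressed in standard deviation units; these assumptions, the function names, and the 0.80 power target are illustrative choices and not recommendations from the discussion above. It is only a planning aid, and dedicated power-analysis software should be used for an actual study.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the mean difference divided by the common
    standard deviation (an assumed, standardized effect).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the test statistic lands in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


def smallest_n(effect_size, target_power=0.80, alpha=0.05):
    """Smallest per-group sample size whose approximate power meets the target."""
    n = 2
    while approx_power(effect_size, n, alpha) < target_power:
        n += 1
    return n


# Power for a medium standardized effect at several sample sizes.
for n in (10, 30, 63):
    print(n, round(approx_power(0.5, n), 3))
print(smallest_n(0.5))
```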

Sampling Method

Like other inferential statistical tests, these designs require that the observations be independent of each other and that they be collected through a simple random sample, so that the sample adheres to the tenets of probability sampling.
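As a minimal sketch, a simple random sample followed by random assignment to groups could be drawn as shown below. The sampling frame of 100 subject IDs, the sample size, the seed, and the function name are all hypothetical.

```python
import random


def sample_and_assign(frame, n, seed=None):
    """Draw a simple random sample of size n from the sampling frame and
    split it at random into a treatment group and a control group."""
    rng = random.Random(seed)
    # sample() draws without replacement and returns the units in random
    # selection order, so each half of the list is itself a random subset.
    chosen = rng.sample(frame, n)
    half = n // 2
    return chosen[:half], chosen[half:]


frame = list(range(1, 101))  # hypothetical sampling frame of subject IDs
treatment, control = sample_and_assign(frame, 20, seed=1)
print(sorted(treatment), sorted(control))
```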

Data Analysis

In the experimental design, there are two main types of ANOVA: one-way ANOVA and two-way ANOVA (with or without replication).

●        One-way ANOVA: a researcher wants to test at least two groups to assess differences between these groups.

●        Two-way ANOVA (without replication): a researcher has one group and is double-testing that same group.

●        Two-way ANOVA (with replication): a researcher has two groups whose members are each engaged in more than one activity. For example, two groups of patients from different hospitals each try two different drug therapies.

Not discussed in this section is multivariate ANOVA (MANOVA), which extends ANOVA to several dependent variables within a completely randomized design. As with ANOVA, the purpose of MANOVA is to find out whether the dependent variables change when the independent variable is manipulated.

Depending on the nature of the research questions, the researcher may need to use instrumental variable methods, panel regressions, regression discontinuity designs, quantile regressions, or other empirical approaches.


References Covering this Research Design:  

Angrist, J. D., & Pischke, J. S. (2009). Mostly harmless econometrics: An empiricist's companion. Princeton University Press. (Good for instrumental variable methods, panel regressions, regression discontinuity designs, and quantile regressions)

Leedy, P. D., & Ormrod, J. E. (2013). The nature and tools of research. Practical research: Planning and design (A good introduction to developing the appropriate research design)

Maxwell, S. E., & Delaney, H. D. (2004). Designing experiments and analyzing data: A model comparison perspective (Vol. 1). Psychology Press. (A good discussion concerning ANOVA and other related methods)

Pallant, J. (2016). SPSS Survival Manual. Buckingham, PA: Open University Press. (Good for the ANOVA, MANOVA, multiple comparisons, and factorial treatment structure)

Video clips

Khan Academy has good introductory videos on ANOVA:

Vimeo has good videos on MANOVA:


Angrist, J. D., & Pischke, J. S. (2009). Mostly harmless econometrics: An empiricist's companion. Princeton University Press.

Benjamini, Y. and Y. Hochberg (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B 57, 289–300.

Cook, R. D. and S. Weisberg (1982). Residuals and influence in regression. London: Chapman and Hall.

Creswell, J. W. (2013). Research design: Qualitative, quantitative, and mixed methods approaches. Sage Publications.

Duncan, D. B. (1955). Multiple range and multiple F tests. Biometrics 11, 1–42.

Dunnett, C. W. (1955). A multiple comparisons procedure for comparing several treatments with a control. Journal of the American Statistical Association 50, 1096–1121.

Fisher, R. A. (1935). The design of experiments (1st ed.). London: Oliver & Boyd.

Hochberg, Y. and A. C. Tamhane (1987). Multiple comparison procedures. New York: Wiley.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 6, 65–70.

Johnson, D. E. & Graybill, F.A. (1972). Analysis of two-way model with interaction and no replication. Journal of the American Statistical Association 67, 862–868.

Keuls, M. (1952). The use of the studentized range in connection with an analysis of variance. Euphytica 1, 112–122.

Imbens, G. W. & Wooldridge, J. M. (2009). Recent developments in the econometrics of program evaluation. Journal of Economic Literature 47(1), 5–86.

Mandel, J. (1961). Non-additivity in two-way analysis of variance. Journal of the American Statistical Association 56, 878–888.

Maxwell, S. E., &  Delaney, H. D. (2004). Designing experiments and analyzing data: A model comparison perspective (Vol. 1). Psychology Press.

McCall, W. A. (1923). How to experiment in education. New York: Macmillan.

Newman, D. (1939). The distribution of the range in samples from a normal population, expressed in terms of an independent estimate of the standard deviation. Biometrika 31, 20–30.

Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association 81(396), 945–970.

Pitman, E. J. G. (1937). Significance tests which may be applied to samples from any populations: I and II. Journal of the Royal Statistical Society, Series B 4, 119–130, 225–237.

Pitman, E. J. G. (1938). Significance tests which may be applied to samples from any populations: III. Biometrika 29, 322–335.

Rubin, D.B. (1973a). Matching to remove bias in observational studies. Biometrics, 29(1), 159-83.

Rubin, D.B. (1973b). The use of matched sampling and regression adjustment to remove bias in observational studies. Biometrics, 29(1), 184-203.

Rubin, D.B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5), 688-701.

Rubin, D.B. (1976). Inference and missing data. Biometrika, 63(3), 581-92.

Rubin, D.B. (1977). Assignment to treatment group on the basis of a covariate.  Journal of Educational Statistics, 2(1), 1-26.

Rubin, D.B. (1978). Bayesian inference for causal effects: The role of randomization. Annals of Statistics, 6(1), 34-58.

Scheffé, H. (1953). A method for judging all contrasts in the analysis of variance. Biometrika 40, 87–104.

Simes, R. J. (1986). An improved Bonferroni procedure for multiple tests of significance. Biometrika 73, 751–754.

Tukey, J. W. (1952). Allowances for various types of error rates. Unpublished IMS address.