This blog post introduces public use data sets: what they are, how they might benefit your research agenda, the benefits and drawbacks of using them, and the ethical considerations involved. It concludes with examples of public use data sets and hints on how to start identifying them in your research area.
You are likely already aware of research conducted with public use data sets (PUDS), such as data from the US Census or the National Center for Health Statistics. A PUDS is a data set that was collected for a research project and then made available for public use. PUDS are maintained by independent agencies, typically government agencies or organizations supported by government funds, and have fixed surveys, samples, and variables that cannot be changed.
PUDS have set data collection methods and logistics, including surveys, sampling designs, manuals (often referred to as codebooks or data dictionaries), and complete data files (e.g., SAS, SPSS, Stata, ASCII). Many PUDS come in two formats: a public-use data set and a restricted-use data set. The latter requires completing a confidentiality contract to gain access to additional variables or identifiers. Many PUDS are free, but some charge fees to offset costs. Before subscribing to a fee-based PUDS, you should carefully review the methods and data available.
Public use data sets might also be called secondary data or archival data. However, not all archival, publicly available, or secondary data are PUDS. For example, many financial websites track the profit margins of companies over many years, which a researcher could compile to examine archival research questions; the compiled data would be secondary data but not a PUDS. While primary data analysis refers to a researcher examining a data set they designed and collected, secondary data analysis refers to a researcher examining a data set that they did not design or collect. Secondary data analysis on PUDS is common among social scientists.
Primary data, or data that you have designed and collected yourself, has the advantage of providing you with the ability to answer specific questions with your specific population. However, secondary data analysis on PUDS can be a time- and money-saving resource if the PUDS is able to address your research questions and population.
Using PUDS could benefit your research agenda. If you are developing a research agenda, PUDS might be a good place from which to start developing a line of research questions. Over time, you might find that you progress to a set of research questions that are no longer supported by a PUDS and require primary data, or you might find that you can develop a series of research questions that can be answered within PUDS. You might also use PUDS to explore a research area and build a better foundation for a larger research project. Last, a literature review might identify a current gap or a needed next step in the knowledge developed by a study that used a PUDS, which you could then follow up on.
Most researchers find that the benefits of using PUDS outweigh the drawbacks, but these characteristics need to be considered in conjunction with your research questions and overall research agenda. Below is a list of common benefits and drawbacks.
One critical consideration is how well the sample design, data collection period, and variables can address your research question. For example, if you are interested in the transition from higher education graduation to a career, and a data set surveys participants only once a year, you might not be able to capture the change. Likewise, if you are interested in the household composition of different ethnic groups, you could use Census data but would be limited to the Census's ethnic group categories. You also need to consider whether the variables in the PUDS align with how the underlying constructs are conceptualized in the larger literature.
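One way to make this check concrete is to compare the variables your research question needs against the variable names listed in the data set's codebook. The following is only a minimal sketch: the function name, the codebook excerpt, and the variable names are all hypothetical and do not come from any real PUDS.

```python
def check_codebook_coverage(required, codebook):
    """Split the required variable names into those the codebook lists and those it lacks."""
    # Compare case-insensitively, since codebooks often use upper-case names.
    available = {name.lower() for name in codebook}
    found = [v for v in required if v.lower() in available]
    missing = [v for v in required if v.lower() not in available]
    return found, missing


# Hypothetical codebook excerpt: variable name -> description.
codebook = {
    "AGE": "Age in years at interview",
    "EMPSTAT": "Employment status",
    "DEGYEAR": "Year highest degree was awarded",
}

# Hypothetical variables needed for a graduation-to-career question.
required = ["age", "empstat", "first_job_month"]

found, missing = check_codebook_coverage(required, codebook)
print("Covered by the codebook:", found)
print("Not covered:", missing)
```

A name match is only a first pass. You would still need to read each variable's description to confirm it measures the construct the way the literature does.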
Research studies using PUDS can most often be categorized as involving minimal risk. In this case, researchers can submit a request for an expedited or exempt review. Exemption from review is a form of approval that the IRB of the University of Phoenix (or of the sponsoring institution) must provide; it is not up to the researcher to make this decision. Review status often depends on the data set and research questions. In addition, quality academic journals hold and abide by their own ethical standards, which might require documentation of IRB exempt approval. At the Office of Scholarship Support, we advise all faculty to seek IRB approval or consultation before starting any research study.
The University of Phoenix Institutional Review Board (IRB) maintains a Human Research Protection Program that manages the IRB process and approvals and provides a wealth of knowledge and guidance for further reference. During the course of a research project, researchers would need to work with the IRB again if they identify additional research questions to investigate or find it necessary to change any aspect of an already approved study.
Once you have a research topic or question identified, your next step is to identify a PUDS that addresses this research topic or question. Below are six key hints to start to locate a PUDS in your research area:
Data.gov is an excellent place to start. It is multidisciplinary and houses the largest catalogue of data sets sponsored with public funds. You can search the data by keyword, category, location, topic, and more. However, given the volume of data, it can be quite overwhelming to use at first. The Inter-University Consortium for Political and Social Research (ICPSR), run out of the University of Michigan, is another excellent resource for locating data appropriate for your research question.
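If you prefer to script a keyword search rather than browse, the sketch below shows one possible approach. It assumes that the Data.gov catalogue exposes a CKAN-style `package_search` action at `https://catalog.data.gov/api/3/action/package_search` and returns results under `result.results`. Treat the endpoint and the response shape as assumptions to confirm against the current Data.gov documentation. The helper names and the sample response are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint (CKAN-style search action); confirm before relying on it.
CATALOG_SEARCH_URL = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(keywords, rows=5):
    """Build a keyword search URL that asks for at most `rows` data sets."""
    return CATALOG_SEARCH_URL + "?" + urlencode({"q": keywords, "rows": rows})


def dataset_titles(response):
    """Extract data set titles from a parsed CKAN-style search response."""
    return [item["title"] for item in response.get("result", {}).get("results", [])]


def search_catalog(keywords, rows=5):
    """Run the live search; this needs network access and is not called below."""
    with urlopen(build_search_url(keywords, rows), timeout=30) as reply:
        return dataset_titles(json.load(reply))


# Offline illustration with a made-up response in the assumed shape.
sample_response = {
    "success": True,
    "result": {"count": 1, "results": [{"title": "Example Housing Survey"}]},
}
search_url = build_search_url("housing survey")
titles = dataset_titles(sample_response)
print(search_url)
print(titles)
```

Calling `search_catalog("housing survey")` would run the same search against the live catalogue, provided the assumptions above hold.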
The largest and most widely recognized PUDS is the US Census. The US Census maintains and houses a collection of surveys and programs with data sets, such as the American Community Survey and the American Housing Survey. Below are examples of quality PUDS projects and the type of project they primarily reflect. They are a good place to start becoming familiar with what PUDS look like and what they can do for research.
Government-based projects tend to house a number of different projects and tend to be cross-sectional. Their focus tends to be on trends and current descriptive statistics.
University-based projects are generally specific to a study and tend to be longitudinal. Their focus tends to be on answering developmental questions.
Collaboration projects are sponsored by multiple organizations, or their data collection is carried out jointly by multiple organizations. These may be housed in a government center, university, or private organization.
Independent projects capture other PUDS that private organizations develop and maintain.
When you locate a PUDS, you might consider asking yourself the following questions about the quality and availability of the data:
We hope this blog post provides valuable information about PUDS in relation to your agenda as a scholar. We invite you to watch for additional trainings and webinars on PUDS, and to share your questions, comments, or the data sets you use with us below.