An Introduction to Public Use Data Sets
This blog post introduces what public use data sets are, how they might benefit your research agenda, the benefits and drawbacks of using them, and the ethical considerations involved. It concludes with examples of public use data sets and hints on how to start identifying public use data sets in your research area.
What are Public Use Data Sets?
You are likely already aware of research conducted with public use data sets (PUDS), such as those from the US Census or the National Center for Health Statistics. A PUDS is a data set from a completed research project that has been made available for public use. PUDS are maintained by independent agencies, typically government agencies or organizations supported by government funds, and have fixed surveys, samples, and variables that cannot be changed.
PUDS have set data collection methods and logistics, including surveys, sampling designs, manuals (often referred to as codebooks or data dictionaries), and complete data files (e.g., SAS, SPSS, Stata, ASCII). Many PUDS come in two formats: a public-use data set and a restricted-use data set. The latter requires a signed confidentiality agreement to access additional variables or identifiers. Many PUDS are free, but some charge fees to offset costs. Before subscribing to a fee-based PUDS, you should carefully review the methods and data available.
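To make the relationship between a codebook and an ASCII data file concrete, here is a minimal Python sketch. The column layout, variable names, and records are hypothetical and invented only for illustration; a real PUDS codebook defines its own column positions.

```python
# Hypothetical codebook: variable name -> (start column, end column),
# 1-indexed and inclusive, the way many codebooks list positions.
CODEBOOK = {
    "respondent_id": (1, 4),
    "age": (5, 6),
    "state": (7, 8),
}

def parse_record(line, codebook=CODEBOOK):
    """Slice one fixed-width line into a dict of variable name -> string value."""
    return {
        name: line[start - 1:end].strip()
        for name, (start, end) in codebook.items()
    }

# Two made-up records that follow the layout above.
sample_lines = ["000134AZ", "000207NY"]
records = [parse_record(line) for line in sample_lines]
```

Values are kept as strings here so that leading zeros and any special codes described in the codebook are preserved until you decide how to handle them.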
Public use data sets might also be called secondary data or archival data. However, not all archival, publicly available, or secondary data are PUDS. For example, many financial websites track the profit margins of companies over many years. A researcher could compile those figures to examine archival research questions, but the result would not be a PUDS. While primary data analysis refers to a researcher examining a data set they designed and collected, secondary data analysis refers to a researcher examining a data set that they did not design and collect. Secondary data analysis on PUDS is common among social scientists.
How Might PUDS Benefit My Research Agenda?
Primary data, or data that you have designed and collected yourself, has the advantage of providing you with the ability to answer specific questions with your specific population. However, secondary data analysis on PUDS can be a time- and money-saving resource if the PUDS is able to address your research questions and population.
Using PUDS could benefit your research agenda in several ways. If you are developing a research agenda, PUDS might be a good place from which to start developing a line of research questions. Over time, you might find that you progress to a set of research questions that are no longer supported by a PUDS and require primary data, or you might find that you can develop a series of research questions that are answered within PUDS. You might also use PUDS to explore a research area and build a better foundation for a larger research project. Last, a literature review might identify a current gap or next step arising from a study that used a PUDS, which you could then follow up on.
Benefits and Drawbacks to PUDS
Most researchers find that the benefits of using PUDS outweigh the drawbacks, but these characteristics need to be considered in conjunction with your research questions and overall research agenda. Below is a list of common benefits and drawbacks.
One critical consideration is how well the sample design, data collection period, and variables can address your research question. For example, if you are interested in the transition from higher education graduation to a career, and a data set surveys participants only once a year, you might not be able to capture the change. Likewise, if you are interested in the household composition of different ethnic groups, you could use Census data but would be limited to the Census ethnic group categories. You also need to consider whether the variables in the PUDS align with how the underlying concepts are defined in the larger literature.
Ethical Implications and IRB
Research studies using PUDS can most often be categorized as minimal risk. In this case, researchers can submit a request for an expedited or exempt review. Exemption from review is a form of approval that the University of Phoenix’s (or your sponsoring institution’s) IRB must provide; it is not up to the researcher to make this decision. Review status often depends on the data set and research questions. In addition, quality academic journals hold and abide by their own ethical standards, which might require documentation of an IRB exemption. At the Office of Scholarship Support, we advise all faculty to seek IRB approval or consultation before starting any research study.
The University of Phoenix Institutional Review Board (IRB) maintains a Human Research Protection Program that manages the IRB process and approvals and provides a wealth of knowledge and guidance for further reference. During the course of a research project, researchers would need to work with the IRB again if they identify additional research questions to investigate or find it necessary to change any aspect of an already approved study.
How to Find PUDS
Once you have a research topic or question identified, your next step is to identify a PUDS that addresses this research topic or question. Below are six key hints to start to locate a PUDS in your research area:
- Searching data repositories, which help organize and house lists of PUDS. Two key data repositories are worth mentioning:
Data.gov is an excellent place to start. It is multidisciplinary and houses the largest catalogue of data sets sponsored with public funds. Data can be searched for using keywords, categories, location, topics, and more. However, given the volume of data it can be quite overwhelming to use at first. The Inter-University Consortium for Political and Social Research (ICPSR), run out of the University of Michigan, is another excellent resource for locating data appropriate for your research question.
- National and International Centers, such as the ones provided in the examples below, produce and house many PUDS.
- Associations in your research area may maintain a list of data sets and repositories or house PUDS. For example, the American Psychological Association maintains a valuable list of credible PUDS. You might also locate key research agencies in your field that maintain PUDS, such as the Agency for Healthcare Research and Quality.
- The University of Phoenix’s Library has a number of resources to identify PUDS. From the UOPX library’s home page, locate Library Resource. There are three useful resources: 1) Company Directories and Financials, 2) Country Profiles and Economic Data, and 3) Government Resources.
- Literature reviews of the methods in academic papers in your field might identify a PUDS available in your research area. Likewise, your peers might also be a source of knowledge.
- A general web search for public use data sets, open data sets, or secondary data sets combined with your research area key words can be fruitful. Many universities maintain lists of PUDS, such as SUNY-Geneseo.
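For readers comfortable with a little scripting, a Data.gov keyword search can also be expressed as a URL. The sketch below assumes that the Data.gov catalog exposes the standard CKAN `package_search` action at catalog.data.gov; please verify that assumption against the current Data.gov documentation before relying on it. The helper function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint (standard CKAN action API); confirm against Data.gov docs.
SEARCH_ENDPOINT = "https://catalog.data.gov/api/3/action/package_search"

def build_search_url(keywords, rows=5):
    """Build a keyword search URL that asks for at most `rows` data sets."""
    return f"{SEARCH_ENDPOINT}?{urlencode({'q': keywords, 'rows': rows})}"

def extract_titles(response):
    """Pull data set titles out of a CKAN-style search response dict."""
    results = response.get("result", {}).get("results", [])
    return [dataset.get("title", "") for dataset in results]

def search_titles(keywords, rows=5):
    """Fetch matching data set titles (requires network access)."""
    with urlopen(build_search_url(keywords, rows)) as resp:
        return extract_titles(json.load(resp))
```

Calling `search_titles("household composition")` would then return a short list of titles to scan, which can be less overwhelming than browsing the full catalogue.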
Examples of Public Use Data Sets
The largest and most widely recognized PUDS is the US Census. The US Census maintains and houses a collection of surveys and programs with data sets, such as the American Community Survey and the American Housing Survey. Below are examples of quality PUDS projects and the type of project they primarily reflect. They are a good place to start becoming familiar with what PUDS look like and what they can do for research.
Government-based projects tend to house a number of different projects and tend to be cross-sectional. Their focus tends to be on trends and current descriptive statistics.
- The US Census
- Bureau of Labor Statistics
- National Center for Education Statistics
- National Center for Health Statistics
University-based projects generally are specific to a study and tend to be longitudinal. Their focus tends to be on answering developmental questions.
Collaboration projects are sponsored by multiple organizations, which may also share data collection. These may be housed in a government center, university, or private organization.
- Longitudinal Immigrant Student Adaptation Study (LISA) with the National Science Foundation, the Spencer Foundation, and the W. T. Grant Foundation.
- United Nations
- World Bank Data
Independent projects capture other PUDS that private organizations develop and maintain.
- Hoover’s Company Profiles (available through the UOPX’s library)
Factors to Consider when Reviewing a PUDS
When you locate a PUDS, you might consider asking yourself the following questions about the quality and availability of the data:
What is the quality of the data?
- How well can I understand their data collection methods?
- What was the original purpose of collecting the data?
- What are the credentials of the author or sponsored organization?
- Have others published from this data set (perhaps through library search)? This might also be an opportunity to look at the journals in which analyses using these data are being published.
- Is there a data codebook or manual available?
What data is available?
- What variables are available? How were they defined?
- How were variables measured? What analysis can I use?
- When were the data collected?
- From whom and where were data collected?
- How did the PUDS team manage missing data?
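Several of the questions above can be checked quickly once you have a data extract in hand. The following is a minimal Python sketch, assuming a small CSV extract and a codebook that flags missing values with special codes. The variable names, the codes (`-9` and an empty field), and the data are all hypothetical.

```python
import csv
import io

# Hypothetical extract; a real PUDS defines its own variables and missing codes.
SAMPLE_CSV = """id,age,income
1,34,52000
2,-9,48000
3,51,
4,-9,-9
"""

# Assumed missing-value codes, as they might be listed in a codebook.
MISSING_CODES = {"", "-9"}

def missing_share(rows, missing_codes=MISSING_CODES):
    """Return variable name -> share of rows whose value is a missing code."""
    rows = list(rows)
    if not rows:
        return {}
    return {
        name: sum(row[name].strip() in missing_codes for row in rows) / len(rows)
        for name in rows[0]
    }

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
shares = missing_share(rows)
```

In this made-up extract, half of the `age` and `income` values are flagged as missing, which would be worth knowing before committing to an analysis that depends on those variables.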
We hope this blog post provides valuable information about PUDS in relation to your agenda as a scholar. We invite you to watch for additional trainings and webinars on PUDS, and to share your questions, comments, or the data sets you use below.