Good analysis requires good data. While no dataset will be perfect, it’s important to rigorously evaluate the quality of your data and do your best to account for shortcomings you find.
Determine fit with your research goals
Using a valid, appropriate measure of what you’re studying is key. A good dataset for your project needs to include variables that actually capture the concept you’re trying to measure.
To determine fit, ask yourself:
Understand the data generating process
The data generating process (DGP) is the real-world process that creates the data you observe. How your data was generated, for example who collected it and why, bears on its quality. Clear and transparent documentation for your dataset helps you understand the DGP and can also be a sign of data quality.
Answering the following questions can help you better understand the DGP:
Examine coverage and bias
High-quality data is unbiased and complete.
Run through these questions to check your data for comprehensive coverage and lack of bias:
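Some coverage checks can be partly automated. The following is a minimal Python sketch, not a prescribed method: it reports the share of missing values per variable and compares each group's share in the data with a reference share you supply. The function names (`missing_share`, `coverage_gap`), the sample records, and the reference shares are all hypothetical and invented for illustration. It assumes a non-empty list of records and treats `None` or an empty string as missing.

```python
from collections import Counter


def missing_share(rows, columns):
    """Return the fraction of rows where each column is missing (None or "")."""
    total = len(rows)
    return {
        col: sum(1 for r in rows if r.get(col) in (None, "")) / total
        for col in columns
    }


def coverage_gap(rows, column, reference_shares):
    """Compare each group's share in the data with a known reference share.

    Positive values mean the group is over-represented in the data,
    negative values mean it is under-represented.
    """
    counts = Counter(r[column] for r in rows if r.get(column) not in (None, ""))
    observed_total = sum(counts.values())
    return {
        group: counts.get(group, 0) / observed_total - expected
        for group, expected in reference_shares.items()
    }


# Hypothetical survey records, invented for illustration.
rows = [
    {"region": "north", "income": 52000},
    {"region": "north", "income": None},
    {"region": "north", "income": 61000},
    {"region": "south", "income": 48000},
]

# Hypothetical population shares, e.g. taken from an external benchmark.
reference = {"north": 0.5, "south": 0.5}

print(missing_share(rows, ["region", "income"]))
print(coverage_gap(rows, "region", reference))
```

A large gap or a high missing share does not by itself prove bias, but it flags where to look more closely at how the data was collected.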
Document (and do your best to account for!) shortcomings
It is exceedingly rare to find a dataset that perfectly captures what you’re trying to test and is totally unbiased and complete. As a researcher, it is your responsibility to understand where your data is imperfect and try to account for those imperfections in your analysis, or at the very least, transparently discuss the shortcomings of your data and how they might impact the results you’re presenting.
Ask for help
If you have any questions about evaluating your data, please reach out to the Research Facilitation team at Dartmouth Libraries. Our team page and email can be found here.