
Data Management

Background and links to more information about data management issues.

Workshops and Training

We offer a variety of workshops to expand on your research data best practices.  See our current workshop offerings on the Dartmouth Library calendar.  

Contact us at ResearchDataHelp@groups.dartmouth.edu to request a workshop.

Spring 2020 Reproducible Research Workshops

Interested in producing more transparent, clear, organized, and reproducible research?

The Library and ITC are pleased to announce a virtual workshop series aimed at expanding your reproducible research best practices.  This workshop series will explore different tools, concepts, and strategies to make your research more computationally reproducible.  Practicing reproducible research techniques will enable you to do better science, faster, and with fewer mistakes.

Please visit dartgo.org/RRADworkshops for more information and to register to attend live workshops or to view pre-recorded events below.

LIVE SESSIONS

Introduction to Version Control with Git

Instructor: Lora Leligdon 

Date and time: Tuesday, April 28, 2020,  3 - 4 pm

 

Version control allows you to keep track of changes to documents over time and to revert to a previous version easily. Originally developed for source code management, it is a practical way to work with any text document and more. Git, in conjunction with online platforms like GitHub, GitLab, or Bitbucket, allows you to back up your work in the cloud, share it with collaborators anywhere in the world, synchronize your work between several machines (including HPC environments) regardless of the operating system, and publish your work so that it is accessible across the internet.

 

This was a live session via Zoom, no recording available.

 

Workshop Slides: https://drive.google.com/open?id=1lIVOtiIJrUHJUEjAx5zIdsVuZnBBSD29

 

Webinar: Introduction to Database Design and Implementation

Instructor: Christian Darabos

Date and time: Thursday, April 30, 2020,  3 - 5 pm

 

Research Computing offers this hands-on workshop providing an introduction to database design. Using a relational database can help you store and analyze your research data and results more efficiently than flat text files. We will work with the relational database paradigm, the Unified Modeling Language (UML), and the Entity-Relationship (ER) model, and we will implement a simple MySQL database.
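As a rough illustration of the relational idea described above, the sketch below defines two linked tables and shows the database rejecting a record that breaks the link. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server. The table and column names (sites, samples) are hypothetical and are not taken from the workshop materials.

```python
import sqlite3

# In-memory database; SQLite stands in for MySQL here (an assumption for portability).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# One-to-many relationship: each site can have many samples.
conn.executescript("""
CREATE TABLE sites (
    site_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE samples (
    sample_id INTEGER PRIMARY KEY,
    site_id   INTEGER NOT NULL REFERENCES sites(site_id),
    value     REAL NOT NULL
);
""")

conn.execute("INSERT INTO sites (site_id, name) VALUES (1, 'Site A')")
conn.execute("INSERT INTO samples (site_id, value) VALUES (1, 3.5)")


def insert_orphan_sample(connection):
    """Try to insert a sample for a site that does not exist; return True if rejected."""
    try:
        connection.execute("INSERT INTO samples (site_id, value) VALUES (99, 1.0)")
    except sqlite3.IntegrityError:
        return True
    return False


orphan_rejected = insert_orphan_sample(conn)
sample_count = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
```

A flat text file would accept the orphaned row silently, whereas here the foreign key constraint catches it at insert time.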

 

MySQL Workbench: https://dev.mysql.com/downloads/workbench/

 

This was a live session via Zoom, no recording available.

 

The Reproducible Research Workflow

Instructor: Lora Leligdon

Date and time: Tuesday, May 5, 2020, 3 - 4 pm

 

A research project can be considered reproducible if a second investigator (including you in the future!) can recreate the final reported results of the project, including key quantitative findings, tables, and figures, given only a set of files and written instructions.  In this session, we will discuss best practices and a reproducible workflow that will help make your research more clear, transparent, and organized from the start.

 

This was a live session via Zoom, no recording available.

 

Workshop slides: https://drive.google.com/open?id=1-qY2HsbqrZSe9pQ6w7UKqKVji6DagmBs

 

Workshop handout: https://drive.google.com/open?id=1vrU0CBQtrQ-OFYKy_W0wL9-X8l0B9lOw

 

Workshop reproducibility checklist: https://drive.google.com/open?id=1xLvFUWreuBqi-oo3jOarR_MlP_evTSAO

Webinar: Introduction to Database Query and Analytics

Instructor: Christian Darabos

Date and time: Thursday, May 7, 2020, 3 - 5 pm 

 

Research Computing offers this hands-on workshop providing an introduction to database query in Structured Query Language (SQL). Using a relational database can help you store and analyze your research data and results more efficiently than flat text files. We will be using SQL on a pre-populated database to extract, filter and answer simple analytics questions. If time permits, we will explore ways of programmatically accessing databases from tools such as R or Python in order to automate the analytical process.
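To give a sense of what programmatic access from Python can look like, here is a minimal sketch that filters and summarizes rows with SQL. It assumes SQLite (via the standard-library sqlite3 module) rather than the workshop's own database, and the measurements table and its values are made up for illustration.

```python
import sqlite3

# Build a small in-memory table; the data are invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (subject TEXT, treatment TEXT, score REAL)")
rows = [
    ("s1", "control", 4.0),
    ("s2", "control", 6.0),
    ("s3", "drug", 9.0),
    ("s4", "drug", 7.0),
    ("s5", "drug", 2.0),
]
conn.executemany("INSERT INTO measurements VALUES (?, ?, ?)", rows)


def mean_score_by_treatment(connection, min_score):
    """Filter rows by a minimum score, then count and average per treatment."""
    query = """
        SELECT treatment, COUNT(*) AS n, AVG(score) AS mean_score
        FROM measurements
        WHERE score >= ?
        GROUP BY treatment
        ORDER BY treatment
    """
    # The ? placeholder keeps the threshold out of the SQL string itself.
    return connection.execute(query, (min_score,)).fetchall()


summary = mean_score_by_treatment(conn, 3.0)
```

Because the query lives in a script, the same analysis can be rerun unchanged whenever the underlying data are updated.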

 

This was a live session via Zoom, no recording available.

Excel Best Practices for Reproducible Research 

Instructor: Pamela Bagley

Date and time: Tuesday, May 12, 2020,  3 - 4 pm

 

Actively managing your research data is an important part of reproducible research, and Excel is one of the most widely used tools.  In this session, we will discuss best practices for data management in Excel, along with tips on filenames, README files, and metadata.  We’ll introduce the free version of Colectica for Excel, a tool that can help you document your spreadsheet data. 

 

This was a live session via Zoom, no recording available.

 

Reproducible Statistical Data Analysis with R

Instructor: Jianjun Hua

Date and time: Thursday, May 14, 2020, 11 am  - 12 pm 

 

R is a free, open-source programming language that is known for its approachability and is increasingly popular for data analysis and visualization.  In this hands-on session, you will learn how to use R to conduct basic statistical data analysis, and how it can save you time and increase the reproducibility of your research.

 

This was a live session via Zoom, no recording available.

 

Getting Organized for Reproducible Research: File Management Systems

Instructors: Pamela Bagley and Elaina Vitale

Date and time: Tuesday, May 19, 2020, 3 - 4 pm

Format: Live Zoom

 

This workshop will introduce basic concepts for keeping track of data files.  Learn good habits in file organization: naming files, creating organized file structures, tagging files, version control, and documenting your process.  Following these habits will help you locate files easily, avoid confusion when working on teams or sharing files, and prevent data loss from accidentally overwriting data files.
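One way to make a naming habit stick is to generate file names with a small helper instead of typing them by hand. The sketch below assumes a hypothetical convention of project_description_YYYYMMDD_vNN.ext; the convention and the function name are illustrative only and are not prescribed by the workshop.

```python
import re
from datetime import date


def build_filename(project, description, file_date, version, extension):
    """Return a name like 'project_description_YYYYMMDD_v02.ext'."""

    def slug(text):
        # Lowercase the text and replace runs of other characters with a hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    stem = f"{slug(project)}_{slug(description)}_{file_date:%Y%m%d}_v{version:02d}"
    return f"{stem}.{extension.lstrip('.')}"


name = build_filename("Lake Survey", "raw counts", date(2020, 5, 19), 2, ".csv")
```

Putting the date in year-month-day order and zero-padding the version number means the files sort correctly in an ordinary directory listing.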

 

This was a live session via Zoom, no recording available.

 

ON DEMAND / VIEW ANYTIME SESSIONS

Python for Reproducible Research

Instructor: Paige Scudder

Format: Pre-recorded, Available now

 

Python is a free, open-source programming language used by programmers and researchers of all levels.  In this hands-on session, we will introduce basic programming concepts using Python, and show you how they can save you time and increase the reproducibility of your research.

Python for Reproducible Research  - Videos and Accompanying Files

Reproducible Research with Spatial Data

Instructor: Steve Gaughan

Format: Pre-recorded, Available now

 

We'll introduce the concepts of spatial data analysis and show how to create reproducible spatial workflows, from data input to maps to exporting spatial overlay and other analysis results. 

 

Reproducible Research with Spatial Data Workshop - Videos and Accompanying Files

The first two videos are slides.  The last two are screen captures of live-coding using R and RStudio.  Instructions for installing R and RStudio can be found at the "Accompanying Files" link above.  This link also contains sample datasets and a solution file for the R code.

Stata Refresher for Reproducible Research 

Instructors: John Cocklin, Catrina Cuadra

Format: Pre-recorded, Available now

 

Stata is a popular statistical analysis package that is used extensively by economists. In session one, attendees will load datasets into Stata. In session two, attendees will wrangle the data in preparation for data analysis and reproducible research. In session three, attendees will run basic summary statistics and a linear regression on the data they prepared in sessions one and two. This workshop is appropriate for those who have learned Stata in the past and would like a refresher, or for those who are using Stata for the first time.

 

This is a pre-recorded session, available for viewing anytime at https://researchguides.dartmouth.edu/econ/stata

 

Stata Refresher for Reproducible Research - A Three Part Video Tutorial

If you would like to follow the videos using Stata, please download these files and place them in a single folder on your computer where you will be able to find them. For information on downloading Stata, see the "Statistical Software" research guide under Additional Resources below.

R for Reproducible Research

Instructor: James Adams

Format: Pre-recorded, Available soon

 

R is a free, open-source programming language that is known for its approachability and is increasingly popular for data analysis and visualization.  In this hands-on session, we will introduce basic programming concepts using R, and show you how they can save you time and increase the reproducibility of your research.

 

This will be a pre-recorded session, available for viewing anytime.