Research Data Management

Background and links to more information about data management issues.

Previous RDM Workshop Series

Interested in learning more about research data management? The Library hosts a variety of workshops aimed at strengthening your data management practices. The workshops explore different stages of the research data lifecycle, including data management planning, data cleaning, visualization, storage, sharing, and preservation.

Presentation materials and handouts for previous sessions are included below.

Data Management Planning with the DMPTool

Data management plans are critical for compliance on most sponsored projects. But what are the essentials of a fundable data management plan? This workshop will cover the basics of data management planning, the top 10 things to consider when writing DMPs for grants, and how to use the DMPTool. This hands-on session will include time to work on your own DMP using the DMPTool.

Presenters: Lora Leligdon, Physical Sciences Librarian and Pamela Bagley, Research and Education Librarian

Workshop materials:

Data Cleaning with OpenRefine and R

Data comes in all kinds of shapes and sizes, and it usually needs a lot of cleaning up before it’s ready to be analyzed. This hands-on session will serve as an introduction to tools for tidying up your data, including the intuitive point-and-click tool OpenRefine and the programming language R. We’ll use these tools to correct errors, reshape data, and make life easier for you and anyone else who will be using your data. The first half of the class will focus on OpenRefine, and the second half will focus on the popular R packages dplyr and tidyr. We want to get started right away, so please install the following on your device prior to the workshop:

Presenter: James Adams, Data and Visualization Librarian

Workshop materials:

Data Visualization with Tableau Public and R

Data visualization helps us better understand our data, whether it’s a few data points or a few million. This hands-on workshop will help you get started with data visualization using the point-and-click tool Tableau Public and the programming language R. We’ll practice creating a variety of plots with each tool, which will help you explore and present your own data. The first half of the class will focus on Tableau Public, and the second half will focus on the ggplot2 package for R. We want to get started right away, so please install the following on your device prior to the workshop:

Presenter: James Adams, Data and Visualization Librarian

Workshop materials:

Research Data Storage on Campus and Beyond

Dartmouth College offers a comprehensive set of storage solutions for your research data. Storage comes in many models, each suited to its own use cases (OneDrive, Box, AFS, SMB, SQL and NoSQL databases, etc.). During this workshop, we will cover strategies, tools, and computational solutions for ensuring the availability, safety, and security of your input datasets, results, and documents. We will compare and contrast online and on-premises solutions, databases and directory-based storage (file systems), and personal and shared (lab) spaces.

Presenter: Christian Darabos, Research Computing Life Science Informatics Specialist

Workshop materials:

Strategies for Data Sharing

Want to make your data discoverable and usable by other researchers?  Interested in sharing your data, but not sure how to do it?  In this workshop, we will discuss some practical strategies and best practices for sharing your data.  Topics will include data formats, metadata, versioning, options for data sharing such as data journals and repositories, licensing and publishing your data, and data citation. 

Presenter: Katie Harding, Physical Sciences Librarian

Workshop materials:

Data Preservation

What happens to your data after the project is complete? What will you do to ensure that your research data will be reusable by you or others tomorrow, five years from now, or even ten years from now? This workshop is designed to provide researchers with best practices for preserving their digital and analog research data.

Presenter: Jenny Mullins, Interim Head of Preservation Services

Workshop materials:

Data Management with Excel

Actively managing your research data is an important part of the research data lifecycle and a critical component for compliance.  In this session, we will discuss best practices for data management in Excel, along with tips on filenames, README files, and metadata.

Workshop materials:

Workshops and Training

We offer a variety of workshops to help you build on your research data best practices. See our current workshop offerings on the Dartmouth Library calendar.

Contact us at ResearchDataHelp@groups.dartmouth.edu to request a workshop.

SPRING 2020 PREVIOUS SESSIONS

Introduction to Version Control with Git

Instructor: Lora Leligdon 

Date and time: Tuesday, April 28, 2020, 3 - 4 pm

Version control allows you to keep track of changes to documents over time and easily revert to a previous version. It was originally developed for source code management, but it is a practical way to work with any text-based document and more. Git, in conjunction with online platforms like GitHub, GitLab, or Bitbucket, allows you to back up your work in the cloud, share it with collaborators anywhere in the world, synchronize it across several machines regardless of operating system (including high-performance computing environments), and publish it so that it is accessible across the internet.

This was a live session via Zoom; no recording is available.

Workshop slides: https://drive.google.com/open?id=1lIVOtiIJrUHJUEjAx5zIdsVuZnBBSD29

Webinar: Introduction to Database Design and Implementation

Instructor: Christian Darabos

Date and time: Thursday, April 30, 2020, 3 - 5 pm

Research Computing offers this hands-on workshop providing an introduction to database design. Using a relational database can help you store and analyze your research data and results more efficiently than flat text files. We will use the relational database paradigm, the Unified Modeling Language (UML), and the Entity-Relationship (ER) model, and then implement a simple MySQL database.

MySQL Workbench: https://dev.mysql.com/downloads/workbench/

This was a live session via Zoom; no recording is available.

The Reproducible Research Workflow

Instructor: Lora Leligdon

Date and time: Tuesday, May 5, 2020, 3 - 4 pm

A research project can be considered reproducible if a second investigator (including you in the future!) can recreate the final reported results of the project, including key quantitative findings, tables, and figures, given only a set of files and written instructions. In this session, we will discuss best practices and a reproducible workflow that will help make your research clearer, more transparent, and better organized from the start.

This was a live session via Zoom; no recording is available.

Workshop slides: https://drive.google.com/open?id=1-qY2HsbqrZSe9pQ6w7UKqKVji6DagmBs

Workshop handout: https://drive.google.com/open?id=1vrU0CBQtrQ-OFYKy_W0wL9-X8l0B9lOw

Workshop reproducibility checklist: https://drive.google.com/open?id=1xLvFUWreuBqi-oo3jOarR_MlP_evTSAO

Webinar: Introduction to Database Query and Analytics

Instructor: Christian Darabos

Date and time: Thursday, May 7, 2020, 3 - 5 pm

Research Computing offers this hands-on workshop providing an introduction to querying databases with Structured Query Language (SQL). Using a relational database can help you store and analyze your research data and results more efficiently than flat text files. We will use SQL on a pre-populated database to extract and filter data and to answer simple analytical questions. If time permits, we will explore ways of programmatically accessing databases from tools such as R or Python in order to automate the analytical process.

This was a live session via Zoom; no recording is available.
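
The programmatic database access mentioned in this session can be illustrated with a minimal Python sketch using the standard-library `sqlite3` module. It assumes an in-memory SQLite database as a stand-in for the MySQL database used in the workshops. The `samples` table, its columns, and the helper function names are hypothetical. The query extracts rows, filters out missing values, and answers one simple analytical question: the mean measurement per site.

```python
import sqlite3

# Hypothetical example: an in-memory SQLite database stands in for the
# MySQL database used in the workshop; table and column names are made up.


def build_demo_db():
    """Create and populate a small 'samples' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples ("
        "id INTEGER PRIMARY KEY, site TEXT NOT NULL, value REAL)"
    )
    rows = [("north", 2.0), ("north", 4.0), ("south", 1.0),
            ("south", None), ("east", 5.0)]
    # '?' placeholders let the driver insert the values safely
    conn.executemany("INSERT INTO samples (site, value) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def mean_by_site(conn):
    """Return (site, n, mean_value) tuples, highest mean first."""
    query = """
        SELECT site, COUNT(*) AS n, AVG(value) AS mean_value
        FROM samples
        WHERE value IS NOT NULL  -- filter out missing measurements
        GROUP BY site
        ORDER BY mean_value DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for site, n, mean_value in mean_by_site(conn):
        print(f"{site}: n={n}, mean={mean_value:.2f}")
    conn.close()
```

The `?` placeholders pass values to the database driver separately from the SQL text, so the script never builds queries by string concatenation. Python drivers for MySQL generally follow the same pattern, although their placeholder syntax may differ (for example, `%s`).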

Excel Best Practices for Reproducible Research 

Instructor: Pamela Bagley

Date and time: Tuesday, May 12, 2020, 3 - 4 pm

Actively managing your research data is an important part of reproducible research, and Excel is one of the most widely used tools for working with data. In this session, we will discuss best practices for data management in Excel, along with tips on filenames, README files, and metadata. We’ll introduce the free version of Colectica for Excel, a tool that can help you document your spreadsheet data.

This was a live session via Zoom; no recording is available.

Reproducible Statistical Data Analysis with R

Instructor: Jianjun Hua

Date and time: Thursday, May 14, 2020, 11 am - 12 pm

R is a free, open-source programming language known for its approachability, and it has become an increasingly popular tool for data analysis and visualization. In this hands-on session, you will learn how to use R to conduct basic statistical data analysis in ways that save you time and increase the reproducibility of your research.

This was a live session via Zoom; no recording is available.

Getting Organized for Reproducible Research: File Management Systems

Instructors: Pamela Bagley and Elaina Vitale

Date and time: Tuesday, May 19, 2020, 3 - 4 pm

Format: Live Zoom

This workshop will introduce basic concepts for keeping track of data files. Learn good habits in file organization: naming files, creating organized file structures, tagging files, using version control, and documenting your process. Following these habits will help you locate files easily, avoid confusion when working in teams or sharing files, and prevent data loss from accidentally overwriting data files.

This was a live session via Zoom; no recording is available.
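
The file-naming habits this session covers can be made concrete with a small helper. This is a minimal Python sketch of one common convention, assumed here for illustration rather than taken from the workshop. Each name starts with an ISO 8601 date so that names sort chronologically. Words are lowercase and joined by hyphens, fields are separated by underscores, and the name ends with a zero-padded version number. The `make_filename` function and its parameters are hypothetical.

```python
import re
from datetime import date

# Hypothetical helper illustrating one common naming convention;
# it is not a Dartmouth Library standard.


def _slug(text):
    """Lowercase text and replace runs of other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def make_filename(project, description, version, ext, on=None):
    """Build a name like '2020-05-19_soil-survey_site-counts_v02.csv'."""
    on = on or date.today()  # default to today's date
    ext = ext.lstrip(".").lower()
    return (f"{on.isoformat()}_{_slug(project)}_{_slug(description)}"
            f"_v{version:02d}.{ext}")


if __name__ == "__main__":
    names = [
        make_filename("Soil Survey", "Site counts", 2, "CSV", on=date(2020, 5, 19)),
        make_filename("Soil Survey", "Site counts", 1, "csv", on=date(2020, 4, 28)),
    ]
    # ISO dates sort chronologically when compared as plain strings
    for name in sorted(names):
        print(name)
```

Because the date comes first in YYYY-MM-DD form, sorting names alphabetically also sorts them by date. An explicit version suffix means a new file never silently overwrites an earlier one.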