Research Data Management

Background and links to more information about data management issues.

Reproducible Research Training

Interested in learning new tools and skills to create more organized and reproducible research?

Each term, the Library and ITC host a series of workshops to build your skills in computational research and data management. The workshops explore different tools, concepts, and strategies to make your research more efficient and reproducible.

Join us for workshops on Text Analysis, Machine Learning, HPC, data repositories, bibliometrics and more!

Registration for all events: https://dartgo.org/RRADworkshops

Previous Workshop Tracks

Intro to Python 1: The Basics

An introduction to programming in Python for first-time programmers as well as those new to Python. We will begin discussing why researchers - whether scientists, humanists, artists, or engineers - may choose to learn to program and why Python is a great language for coding novices. We will then get hands-on practice in Python working with different data types (numbers, text strings, etc.), performing basic calculations, reading data from files, and creating and modifying different data structures (lists, sets, dictionaries).
This is the first of a four-part workshop series, with subsequent sessions on working with tabular data (4/25), creating visualizations (5/2), and analyzing texts (5/16). Participants can sign up for all four or - if they already have some Python experience - pick and choose from these options.

Besides these four sessions, we also recommend new Python programmers consider taking “Classy Code: Object-Oriented Programming” to better understand how an object-oriented programming (OOP) language like Python works.
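As a small taste of the topics in the first session, here is a minimal sketch of basic data types, calculations, and data structures. The variable names and values are made up for illustration and are not taken from the workshop materials.

```python
# Basic data types
count = 3            # integer
ratio = 0.25         # floating-point number
label = "sample"     # text string

# A list keeps items in order and allows duplicates
readings = [4, 8, 8, 15]
readings.append(16)

# A set keeps only the unique items
unique_readings = set(readings)

# A dictionary maps keys to values
summary = {"n": len(readings), "total": sum(readings)}
summary["mean"] = summary["total"] / summary["n"]

print(unique_readings)
print(summary)
```

The list still contains the duplicate 8, while the set stores it only once.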

Intro to Python 2: Dataframes
Nearly all researchers work with tabular data at one time or another. In this lesson, we will practice with dataframes, a Python data structure designed to work with tabular data, using the Pandas library. We will create our own dataframes, read in data from .csv files to a dataframe, subset and combine dataframes, and add or modify columns (variables) and observations (rows). We will also examine how these tasks work with messy and large datasets (e.g., millions of rows).

Intro to Python 3: Visualization

Regardless of what type of data we work with, visualization allows us to recognize patterns, observe distributions, discover relationships, and tell stories. In this lesson, we will work with Python’s Seaborn library to analyze data (both quantitative and qualitative) by creating a variety of different plots, from basic line, bar, and distribution plots to more advanced network graphs and maps. We will also learn how to modify the extents, axes, and display of each plot and how to export them as high-resolution images ready for publication.

Intro to Python 4: Text Analysis
Since the invention of different forms of writing, most information has been recorded in unstructured texts. Even today, far more data is stored in text form (rather than in quantitative databases, for example). Fortunately, Python offers a robust set of tools to read, process, analyze, interpret, and visualize texts, whether they be five-hundred-year-old historical chronicles or five-second-old tweets. In this lesson, we will practice with reading, processing (i.e. dividing a text into words), quantifying (calculating word counts), and analyzing patterns in texts. These basic steps will, in turn, provide the building blocks for more advanced text analysis techniques.
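The basic steps named above (dividing a text into words, then calculating word counts) can be sketched with Python's standard library alone. This is a minimal illustration, not the workshop's own code; the `tokenize` and `word_counts` helpers and the sample sentence are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(text):
    """Count how often each word occurs in the text."""
    return Counter(tokenize(text))


sample = "The quick brown fox jumps over the lazy dog. The dog sleeps."
counts = word_counts(sample)

# The two most frequent words with their counts
print(counts.most_common(2))  # [('the', 3), ('dog', 2)]
```

Real texts usually need more careful processing (punctuation, accents, hyphenation), which is where dedicated text analysis libraries come in.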


Gentle Introduction to Machine Learning: Statistics


Machine learning and artificial intelligence were moving into more and more areas of our work and personal lives well before the arrival of Large Language Models. These models and algorithms can appear opaque, overly complex, incomprehensible, or even downright scary. Part of the problem is that the field uses a very specific language and set of concepts that is unfamiliar to many non-technical people. In this introductory series, we want to pull back the curtain on machine learning and start building up an intuitive understanding of the involved concepts that will help participants better understand the capabilities and challenges of AI. The target audience is people new to the field; no technical background is needed. We will use only very basic math and almost no code to approach the subject matter in an accessible, intuitive, and hopefully entertaining way.

In this session, we will look at the fundamentals of statistics that will help us better understand the techniques described in the other two introductory sessions. You will gain (or refresh) knowledge on basic descriptive statistics, as well as probability distributions and what it all means for how AI sees the world.

This session is a great primer for the workshops on Classification and Regression later in the term.

Gentle Introduction to Machine Learning: Regression

In this session, we will discuss algorithms that model the relationship between two or more numerical variables: estimating the USD price of a house based on its features, predicting trends in college registration numbers, or estimating the number of library visitors on a given day based on weather or other factors. You will gain an intuitive understanding of how this relationship is found and learn about the limitations and pitfalls of these tasks.

It will be helpful, but not required, to attend the session on Gentle Introduction to Machine Learning: Statistics earlier in the term.

Complementing this session, there is also a workshop on Classification, which highlights another main branch of machine learning techniques.
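For readers who do want to see a little code, the idea of finding a relationship between two numerical variables can be sketched as a least-squares line fit. This is only an illustration, not part of the session itself; the `fit_line` helper and the house sizes and prices are made up.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: house size (square feet) and price (thousands of USD)
sizes = [1000, 1500, 2000, 2500]
prices = [200, 275, 350, 425]

slope, intercept = fit_line(sizes, prices)

# Estimate the price of a hypothetical 1800 square foot house
predicted = slope * 1800 + intercept
print(round(slope, 3), round(intercept, 1), round(predicted, 1))
```

Because the made-up data lie exactly on a line, the fit is perfect here; real data are noisy, which is one source of the limitations and pitfalls the session discusses.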

Gentle Introduction to Machine Learning: Classification

In this session, we will discuss algorithms that associate an “observation” with a label. These so-called classifiers can be found all over the place: Spam detection, sentiment analysis, loan default prediction, facial recognition, and many more applications are all based on classifiers of some sort. You will gain an understanding of how a classifier “sees” an observation, how it arrives at a decision, and how we can gain insights into the decision making process.

It will be helpful, but not required, to attend the session on Gentle Introduction to Machine Learning: Statistics earlier in the term.

Complementing this session, there is also a workshop on Regression, which highlights another main branch of machine learning techniques.


Classy Code: Object-Oriented Programming for Fun and Profit

Have you ever gotten lost in a maze-like piece of code you wrote? Are you interested in going beyond the basics of programming and learning more about how you can design and structure a robust, extensible, and easy-to-read program?
This workshop will introduce you to the paradigm of Object-Oriented Programming. We will introduce the main philosophy behind it, discuss use cases, show its limitations, and work through a variety of examples hands-on.
Participants will be introduced to the basic principles of OOP, learn about objects, classes, fields, and methods, and work through a series of hands-on examples in Python (although the concepts apply to any language supporting OOP). Finally, we will take a look at some reusable templates called design patterns that can be applied in a number of different contexts.
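To make the terms concrete, here is a minimal sketch of a class with fields and methods, plus a subclass that extends it. The `Dataset` and `TemperatureDataset` classes are hypothetical examples, not material from the workshop.

```python
class Dataset:
    """A named collection of numeric measurements."""

    def __init__(self, name, values):
        self.name = name            # field: a label for the data
        self.values = list(values)  # field: the measurements themselves

    def mean(self):
        """Method: compute the average of the stored values."""
        return sum(self.values) / len(self.values)

    def describe(self):
        """Method: build a short human-readable summary."""
        return f"{self.name}: n={len(self.values)}, mean={self.mean():.2f}"


class TemperatureDataset(Dataset):
    """A subclass that reuses Dataset and adds a unit to the summary."""

    def describe(self):
        return super().describe() + " degrees C"


# An object is one concrete instance of a class
readings = TemperatureDataset("lab readings", [20.5, 21.0, 22.5])
print(readings.describe())  # lab readings: n=3, mean=21.33 degrees C
```

The subclass overrides only `describe` and inherits `mean` unchanged, which is the kind of reuse that keeps larger programs easier to extend.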

 

 

 

 

 

NIH Data Management Plans and Practices

Effective January 25, 2023, the National Institutes of Health (NIH) is implementing its Policy for Data Management and Sharing (DMS) to promote the management and sharing of scientific data generated from NIH-funded or conducted research.

This workshop will introduce the DMS policy, including required elements of the plan and institutional resources. We will also explore the DMPTool, an online application that helps researchers create data management plans by providing funder and institutional guidance, as well as basic concepts for data management and sharing, such as file naming, organization, and documenting your process.

Representatives from the Office of Sponsored Projects, the Library, and ITC will be available to answer your questions.

 

Introduction to Data Repositories

Many funders and publishers require researchers to share their data in an appropriate public data repository. In this session, we will discuss how to find, evaluate, and select trusted data repositories, and discuss options for discipline-specific and generalist repositories, including IEEE DataPort, Dataverse, and ICPSR.


Bibliometric Analysis and Visualization

Bibliometric data can provide important insights on research groups, trends, collaborations, and even on an individual’s research impact.  In this session, we will explore the different research metrics (author, groups, institution, journals, etc.) available and introduce how to use SciVal, a benchmarking and competitive analysis tool, and other citation datasets to create reports, compare, and benchmark many different types of entities.

 

Introduction to Databases and SQL  
A database is a structured, organized collection of data. Databases are used to store, manage, and retrieve data. Queries and reports from database tables drive visualizations, analysis, and reproducible research. The Structured Query Language (SQL), along with R and Python, can be used to interact with databases to extract, transform, and load data for analysis and visualizations. We'll use Software Carpentry's 'Databases and SQL' lesson for an introduction to these concepts: https://swcarpentry.github.io/sql-novice-survey/
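As a rough sketch of how SQL and Python can work together, the example below uses Python's built-in `sqlite3` module with an in-memory database. The `measurements` table and its contents are hypothetical and are not the schema used in the Software Carpentry lesson.

```python
import sqlite3

# An in-memory database exists only while the program runs
conn = sqlite3.connect(":memory:")

# Create a hypothetical table and load a few made-up rows
conn.execute("CREATE TABLE measurements (site TEXT, quantity TEXT, reading REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [("A", "temp", 20.0), ("A", "temp", 22.0), ("B", "temp", 18.0)],
)

# Query: average reading per site, sorted by site name
rows = conn.execute(
    "SELECT site, AVG(reading) FROM measurements GROUP BY site ORDER BY site"
).fetchall()
conn.close()

print(rows)  # [('A', 21.0), ('B', 18.0)]
```

The query result comes back as a list of tuples, ready to be passed on to analysis or plotting code.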

 

Getting started with the Discovery cluster SLURM
This class is for users new to the Discovery cluster. It covers how to set up your environment, submit jobs, transfer files to and from the cluster, how to use available storage and how to monitor your jobs.

 

Texturing in Adobe Substance Painter
This workshop will cover the fundamentals of texturing using Adobe Substance Painter. Participants will learn how to bake, texture, and render a premade model. This workshop will also briefly cover aspects of model preparation such as UV mapping and texel density. No prior knowledge of Adobe Substance Painter is required for this workshop.

 

Intro to Qualtrics
Qualtrics is a web-based survey generation tool that allows users to build surveys with drop-down selection menus, open-format questions, multiple-choice questions, Likert scales, and more. It supports sophisticated branching logic, distribution, and result analytics. It also allows students, staff, and researchers to share and collaborate on surveys with their colleagues. Users can also capture qualitative insights quickly and easily through Qualtrics text tools. Most importantly, Qualtrics meets the highest HIPAA, PHI, IRB and security standards in the industry, giving everyone involved peace of mind.

 

Data Analysis with SAS - I
Join this hands-on workshop and learn the fundamental techniques for analyzing data using SAS software. Get equipped with essential skills for descriptive statistics, linear regression, ANOVA, and other key statistical methodologies. Enroll now to elevate your data analysis skills with SAS!

 

Introduction to Building Containers
Docker is the #1 most wanted and #2 most loved software developer tool, and helps millions of researchers and IT professionals build, share, and run any app, anywhere: locally, in the institution's infrastructure, or in the cloud. It is a pivotal tool for Reproducible Research that allows anyone with a container to run any analysis the way it was designed, including all the dependencies and necessary software. Research Computing is offering this hands-on workshop as an introduction to containerization. In this introduction, you'll learn the fundamentals of containerization and how to build and use Docker container images.

 

Introduction to Recommender Systems in Python  
Discover how to build personalized recommendation systems in Python! This workshop teaches you how to create and evaluate models using real-world data. You'll learn how to recommend products, movies, and more to users based on their interests. No prior experience needed. Join us and take the first step towards building your own recommendation systems!