Interested in learning new tools and skills to create more organized and reproducible research?
During the winter term, the Library and ITC will host a series of workshops aimed at expanding your computational research and data management best practices. The workshops will explore different tools, concepts, and strategies to make your research more efficient and reproducible.
Join us for workshops on Text Analysis, Machine Learning, GIS, Introduction to Data Science, High-Performance Computing (HPC), Research Data Management (RDM), and Motion Data!
Please see the tabs below for more information on the sessions in each track. Tracks contain related sessions, but each session is designed to stand on its own.
How can we use computational techniques to analyze texts and then visualize patterns buried within them? In this six-lesson series, you will learn how to get started with the Python programming language and how to apply Python to perform digital text analysis. You will practice identifying and visualizing patterns within individual texts and across large collections or corpora of texts.
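As a small taste of the kind of analysis the series covers, here is a minimal sketch of counting word frequencies with Python's standard library (the quotation is just a made-up example, not a workshop dataset):

```python
from collections import Counter
import re

text = "It was the best of times, it was the worst of times."

# Lowercase the text and split it into word tokens
words = re.findall(r"[a-z']+", text.lower())

# Count how often each word appears
freq = Counter(words)
print(freq.most_common(3))  # → [('it', 2), ('was', 2), ('the', 2)]
```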
6 sessions: 1/18, 2/1, 2/8, 2/15, 2/22, 3/1 at 12 pm
What can we learn about texts by applying text analysis in Python? How do we get started?
In this session, participants will:
Exploring the frequency of words and phrases: what can frequency counts tell us about a text?
In this session, participants will:
How does the language of a select group of texts change over time? How can we compare texts using the frequency counts of words?
In this session, participants will:
How can we sort and classify texts by emotional register (anger, sadness, joy, etc.) and topic?
In this session, participants will:
In previous sessions of the Text Analysis series, we have learned how to describe texts in terms of, e.g., word counts, topics, or expressed sentiments. In this session, we will try to identify, visualize, and exploit patterns in these features: Which texts are more similar? Which are different? Can we use these features to classify texts into categories?
In this session, we will dig deeper into the extracted features and use dimensionality reduction techniques to visualize emerging patterns. Using the State of the Union dataset, we will practice what we have learned by trying to automatically guess if a speech was delivered by a Democratic or a Republican president.
While not strictly required, attending the previous sessions in the series and “Intro to Machine Learning with scikit-learn” is highly recommended.
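One way to approach this kind of classification is to turn texts into TF-IDF features and fit a linear classifier on top. A minimal sketch with scikit-learn, using a few invented stand-in sentences rather than the actual State of the Union dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented stand-in for labeled speech excerpts
texts = [
    "we must cut taxes and shrink government",
    "lower taxes will free our economy",
    "we will expand healthcare for every family",
    "investing in education and healthcare for all",
]
labels = ["R", "R", "D", "D"]

# Chain TF-IDF feature extraction and a linear classifier into one pipeline
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["healthcare and education for families"]))
```

With real data you would of course evaluate on held-out texts rather than the training set, which is exactly the kind of workflow the session walks through.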
How can we find and construct our own corpora?
In this session, participants will:
Scikit-learn (also known as sklearn) is a machine learning library written in Python. It features models and methods for supervised and unsupervised learning, dimensionality reduction, model selection and evaluation, and even some techniques for the visualization of results. Because of its efficient implementations and accessible interface, scikit-learn is very popular in educational, research, and production environments, and runs “under the hood” of many other, more streamlined libraries (e.g. NLTK, auto-sklearn, or PyCaret).
In this code-along workshop, we will introduce various components of scikit-learn. By the end of the session, you will be able to implement a typical machine learning workflow.
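As a preview, the "typical machine learning workflow" mentioned above might look roughly like this in scikit-learn (a sketch using a built-in toy dataset, not the workshop's actual materials):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load a built-in toy dataset
X, y = load_iris(return_X_y=True)

# Split into training data and held-out test data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a model, then evaluate it on data it has never seen
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {accuracy:.2f}")
```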
Deep Learning for everyone! PyTorch is a free and open source machine learning framework for the rapid development of neural networks for applications in computer vision, natural language processing, or speech recognition. It provides a simple Python interface, which makes it equally popular in education, research, and production environments. If a problem falls into fairly standard categories, powerful pre-trained models are available out-of-the-box. If a custom model is required, PyTorch makes it easy to define, train, and test neural networks using state-of-the-art algorithms and components from simple feed-forward networks to convolutional networks to LSTMs, transformers and more.
In this session, you will get a brief overview of the components provided by PyTorch. We will apply a pre-trained model to a problem with just a few lines of code, and we will define our own neural network! Finally, we will introduce the concept of transfer learning, which allows you to benefit from pre-trained models even if your particular problem is different from what the model was originally trained on!
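To give a flavor of how little code defining a network takes, here is a minimal sketch of a feed-forward network in PyTorch (the layer sizes are arbitrary choices for illustration):

```python
import torch
from torch import nn

# A tiny feed-forward network: 4 inputs -> 8 hidden units -> 3 outputs
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 3),
)

# Forward pass on a batch of 2 made-up samples
x = torch.randn(2, 4)
logits = model(x)
print(logits.shape)  # → torch.Size([2, 3])
```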
The caret package (short for Classification And Regression Training) is one of the most popular R packages for statistics and machine learning. It contains functions to streamline the model training process for complex regression and classification problems, and makes training, tuning, and evaluating machine learning models in R consistent and easy.
During this session, you will be introduced to the basic functionalities of the caret package.
Basic knowledge of R and linear regression is helpful for understanding the content of this webinar.
PyCaret is an open-source Python machine learning library inspired by the popular R package caret. The goal of caret is to automate the major steps of evaluating and comparing machine learning algorithms for classification and regression; its main benefit is that one can achieve a lot with only a few lines of code and little manual configuration. PyCaret brings these capabilities to Python. It is well suited both for seasoned data scientists who want to increase the productivity of their machine learning experiments, and for citizen data scientists and newcomers with little or no coding background.
This is a suitable training session for people who already have basic knowledge of Python and are interested in learning more to perform high-level data analysis.
Note: Google Colab will be used for all demos.
We'll introduce the concepts of Geographic Information Systems (GIS) to make maps, analyze geographic data, and create new geographic data. We'll discuss and use some of the many different types of GIS data.
This workshop will examine the use of mapping and geographic information systems in the Humanities and Social Sciences, and teach the use of some basic tools and techniques to create and edit GIS data as well as query existing geospatial datasets for information.
We'll use R and Python to show how to work with geospatial data and create reproducible workflows, results and maps.
This class is for users new to the Discovery cluster. It covers how to set up your environment, submit jobs, transfer files to and from the cluster, how to use available storage and how to monitor your jobs.
Most modern programming libraries for computational work make it easy to parallelize your code and thus leverage the power of all CPU cores on your machine. But what if even that is not enough? How can we truly unleash the power of a High Performance Cluster like Dartmouth’s Discovery and use hundreds of CPUs distributed across multiple nodes?
The answer: MPI. The Message Passing Interface (MPI) is a standard, implemented by various libraries, that allows processes on different nodes within a cluster to communicate. By sending status messages and data back and forth between nodes, the computational load can be distributed across any number of available nodes.
Unix is a command-line-based platform that is a highly powerful and flexible tool for data management and analysis. It helps users automate repetitive tasks and easily combine smaller tasks into larger, more powerful workflows. Use of the shell is fundamental to a wide range of advanced computing tasks, including high-performance computing. This workshop introduces the basic concepts of UNIX operating system and shell scripting. We will explore essential hands-on skills to confidently use the command line interface.
R is a free, open-source programming language known for its approachability and increasingly popular for data analysis and visualization. In this basic, hands-on, 60-minute session, we will introduce core programming concepts in R, such as data frames and plots, and show how they can save you time and increase the reproducibility of your research.
In this session, we will introduce MPI in Python covering basic concepts, one-to-one, one-to-many, many-to-one communications, and we will close out with a few notes on pitfalls and good practices.
Shiny is an R package that makes it easy to build interactive web applications straight from R. The Shiny package lets researchers transform any piece of R analysis code into an interactive app usable by a broad audience, without requiring additional coding or web-development skills.
It is strongly suggested that you have experience in R; if not, please sign up for "Getting Started with R" (1/11 at 12 pm). In this session, we will get you started building Shiny apps right away.
Click the link (https://rstudio-connect.dartmouth.edu/connect/#/apps/389dc504-a863-4b4f-bb3c-68673c82c79a/access) for recommended installation prior to the workshop.
R is a free, open-source programming language known for its approachability and increasingly popular for data analysis and visualization. In this hands-on session, you will learn how to use R to conduct basic statistical data analysis, and how it can save you time and increase the reproducibility of your research.
Research Computing, in partnership with the Reproducible Research Group and the Dartmouth Library, invites researchers of all levels to participate in a workshop focused on developing publication-ready data visualizations for health and biology research using R. Some prior experience in R is expected; we recommend participation in Getting Started with R on January 11th.
Workshop Goals:
Attendees will be provided lecture slides and R notebooks. Publicly available experimental data will be provided, but attendees may choose to bring their own data for additional interpretation.
Effective January 25, 2023, the National Institutes of Health (NIH) is implementing its Policy for Data Management and Sharing (DMS) to promote the management and sharing of scientific data generated from NIH-funded or conducted research.
This workshop will introduce the DMS policy, including required elements of the plan and institutional resources. We will also explore the DMPTool, an online application that helps researchers create data management plans by providing funder and institutional guidance, as well as basic concepts for data management and sharing, such as file naming, organization, and documenting your process.
Representatives from the Office of Sponsored Projects, the Library, and ITC will be available to answer your questions.
This workshop is for participants with basic knowledge of Autodesk Maya and will cover the basics of character design and modeling.
In this workshop participants will:
This workshop is for participants with a basic knowledge of Autodesk Maya. In this workshop participants will: