
Text Analysis

Dartmouth College Library's guide to text analysis tools and platforms

Learning Python for Text Analysis

There are many ways to do text analysis. You can use ready-made software such as Voyant, or you can learn to code in Python or R.

Using a programming language gives you far more flexibility than any existing software application offers, and Python is the most widely used programming language for text analysis.

If you are new to Python and text analysis, a great place to begin is Melanie Walsh's free online book, Introduction to Cultural Analytics and Python (2021). In this book you can learn how to get started with Python (Ch. 1), store data (including texts) in data tables known as dataframes (Ch. 2), and perform some basic text analysis (Ch. 5). You can even work with the book's code interactively by clicking the Binder (rocket) logo at the top of the screen.
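To give a sense of what "basic text analysis" in Python can look like, here is a minimal sketch that counts word frequencies using only the standard library. It is not taken from Walsh's book; the sample texts and the names `texts`, `tokenize`, and `word_counts` are made up for illustration, and the tokenizer is deliberately simple.

```python
import re
from collections import Counter

# Hypothetical sample texts; replace these with your own documents.
texts = {
    "doc_a": "The whale surfaced. The crew watched the whale.",
    "doc_b": "A quiet sea, a quiet crew.",
}


def tokenize(text):
    """Lowercase the text and return its runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(text):
    """Count how often each token appears in a single text."""
    return Counter(tokenize(text))


# Build one frequency table per document.
counts = {name: word_counts(text) for name, text in texts.items()}

# Show the three most frequent words in each document.
for name, counter in counts.items():
    print(name, counter.most_common(3))
```

A real project would usually need a more careful tokenizer (for example, one that handles accented characters and hyphenated words) and would often store the results in a dataframe, as the book describes.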

Dartmouth also subscribes to Constellate and ProQuest TDM Studio, both of which provide tutorials for applying Python to their text collections (or to your own). For details, click the Constellate and ProQuest TDM tabs to the left.

Other good places to start include:

1. The Programming Historian (over 100 DH lessons in English, including several on text analysis, as well as dozens of lessons in French, Spanish, and Portuguese).

2. The NLTK Book (also known as Natural Language Processing with Python). NLTK is one of the most popular Python packages for Natural Language Processing (NLP) and is almost certainly the most popular among people learning text analysis.

If you start the lessons above and find you need to begin with the basics of Python, the Software Carpentry tutorials are a great resource (they are available online, or you can sign up for in-person Software Carpentry the next time we host it).