Text Analysis

Dartmouth College Library's guide to text analysis tools and platforms

Need help?

Dartmouth College Library offers text analysis platforms, training, and support for your research and teaching needs. We currently provide access to two text analysis platforms: Constellate and ProQuest TDM Studio. We also offer training and support for individuals and groups. To get started, contact the Research Data Help team at researchdatahelp@groups.dartmouth.edu.

Introduction to Text Analysis

Text analysis platforms let researchers search, manipulate, and explore the scholarly record at scale, surfacing patterns and insights that would be impractical to find by reading alone. Text analysis can help answer questions about how texts are interconnected, what sentiments they express, and when the use of significant terms changes within a collection of unstructured texts.

There are five main questions that text analysis can help answer:

1. What are these texts about?
2. How are these texts connected?
3. What emotions (or affects) are found within these texts?
4. What names are used in these texts?
5. Which of these texts are most similar?

Question 1: What are these texts about?

• Word Frequency (Beginner)

Counting how often each word occurs in a given text. Common approaches include Bag of Words and TF-IDF. Example: "Which of these texts focus on women?"

• Collocation (Beginner)

Examining which words tend to occur near one another. Example: "Where are women mentioned in relation to home ownership?"

• Topic Analysis (or Topic Modeling) (Intermediate)

Discovering the topics, or groups of words that tend to appear together, within a collection of texts. Example: "What are the most frequent topics discussed in this newspaper?"

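• TF-IDF (Intermediate)

Finding the words that are most significant to a text relative to the rest of the collection, by weighing how often a term appears in that text (term frequency) against how many texts in the collection contain it (inverse document frequency). Example: "What language is most significant within 1970s political speech?"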
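
The word-counting methods in this section can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not how Constellate or TDM Studio compute these measures: the tokenizer is a simple regular expression, the collocation measure is a raw count of adjacent word pairs (many tools use statistics such as pointwise mutual information instead), and the TF-IDF weighting shown (relative term frequency times the natural log of N divided by document frequency) is one of several common variants. The function names and the three sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text: str) -> Counter:
    """Bag of Words: how often each token appears in one text."""
    return Counter(tokenize(text))


def bigram_counts(text: str) -> Counter:
    """A simple collocation count: how often each pair of adjacent words occurs."""
    tokens = tokenize(text)
    return Counter(zip(tokens, tokens[1:]))


def tf_idf(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """TF-IDF score for every word in every document.

    tf  = (count of word in document) / (total words in document)
    idf = log(number of documents / number of documents containing the word)
    """
    counts = {name: word_frequencies(text) for name, text in docs.items()}
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())  # each document counts once per word
    n_docs = len(docs)
    scores = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {
            word: (n / total) * math.log(n_docs / doc_freq[word])
            for word, n in c.items()
        }
    return scores


# Hypothetical sample documents for illustration.
docs = {
    "a": "the women organized the vote",
    "b": "the men organized the strike",
    "c": "the vote was counted",
}
print(word_frequencies(docs["a"]).most_common(1))  # [('the', 2)]
scores = tf_idf(docs)
print(max(scores["a"], key=scores["a"].get))  # women
```

Raw counts make "the" the most frequent word in document "a", but because "the" appears in every document its IDF is zero, so TF-IDF instead ranks "women", the word that distinguishes "a" from the rest of the collection.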

Question 2: How are these texts connected?

• Concordance (Beginner)

Where is this word or phrase used in these documents? Example: "Which journal articles mention Maya Angelou's phrase, 'If you're for the right thing, then you do it without thinking'?"

• Network Analysis (Advanced)

How are the authors of these texts connected? Example: "What local communities formed around civil rights in 1963?"
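
A concordance (often displayed as keyword-in-context, or KWIC) can be approximated by searching each document for a phrase and keeping a fixed window of characters on either side of every match. The sketch below assumes plain-text documents held in a Python dictionary; the `concordance` function, its `width` parameter, and the sample texts are hypothetical illustrations, not part of any platform's API.

```python
import re


def concordance(docs: dict[str, str], phrase: str, width: int = 30) -> list[tuple[str, str]]:
    """Find every case-insensitive occurrence of `phrase` in each document.

    Returns (document name, snippet) pairs, where each snippet keeps up to
    `width` characters of context on either side of the match.
    """
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    hits = []
    for name, text in docs.items():
        for match in pattern.finditer(text):
            start = max(0, match.start() - width)
            end = min(len(text), match.end() + width)
            hits.append((name, text[start:end]))
    return hits


# Hypothetical sample documents for illustration.
docs = {
    "article_1": "As Angelou put it, if you're for the right thing, then you do it without thinking.",
    "article_2": "This essay discusses a different poem entirely.",
}
for name, snippet in concordance(docs, "for the right thing"):
    print(f"{name}: ...{snippet}...")
```

Exact phrase matching like this misses paraphrases and variant punctuation, which is one reason concordance is usually a starting point rather than a final answer.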

Question 3: What emotions (or affects) are found within these texts?

• Sentiment Analysis (Intermediate)

Does the author use positive or negative language? Example: "How do presidents describe gun control?"
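
One common approach to sentiment analysis scores a text by matching its words against lists of positive and negative terms. The sketch below uses a tiny, made-up lexicon purely for illustration; a real study would rely on a published, validated lexicon or a trained model. Simple word matching also ignores negation ("not safe") and context, so treat the scores as rough signals.

```python
import re

# Hypothetical toy lexicon for illustration only. Real studies use a published,
# validated sentiment lexicon or a trained model instead of a hand-made list.
POSITIVE = {"good", "safe", "protect", "support", "hope"}
NEGATIVE = {"bad", "danger", "violence", "fear", "tragic"}


def sentiment_score(text: str) -> float:
    """Net sentiment: (positive matches - negative matches) / number of tokens.

    The result lies between -1 (all negative) and 1 (all positive); 0 means
    balanced or no lexicon words. Negation such as "not safe" is ignored.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    positive = sum(token in POSITIVE for token in tokens)
    negative = sum(token in NEGATIVE for token in tokens)
    return (positive - negative) / len(tokens)


print(sentiment_score("We must protect families and support safe communities"))  # 0.375
print(sentiment_score("Gun violence is a tragic danger"))  # -0.5
```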

Question 4: What names are used in these texts?

• Named Entity Recognition (Intermediate)

Listing every instance of a kind of entity, such as people, places, or organizations, in these texts. Example: "What are all of the geographic locations mentioned by Tolstoy?"
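
Statistical NER models, such as those distributed with libraries like spaCy, learn to recognize entities they have never seen before. A much simpler technique, gazetteer lookup, only searches for names from a known list. The sketch below uses that simpler technique, not a trained NER model; the `find_entities` function, the place list, and the sample sentence are hypothetical.

```python
import re

# Hypothetical gazetteer: a hand-built list of place names to look for.
PLACES = {"Moscow", "Saint Petersburg", "Borodino", "Smolensk"}


def find_entities(text: str, gazetteer: set[str]) -> list[str]:
    """Return the gazetteer names that occur in `text`, ordered by first appearance.

    Matching is exact and case-sensitive on whole words, so spelling variants
    and names missing from the list are not found.
    """
    found = []
    for name in gazetteer:
        match = re.search(r"\b" + re.escape(name) + r"\b", text)
        if match:
            found.append((match.start(), name))
    return [name for _, name in sorted(found)]


sentence = "The army retreated from Borodino toward Moscow, far from Saint Petersburg."
print(find_entities(sentence, PLACES))  # ['Borodino', 'Moscow', 'Saint Petersburg']
```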

Question 5: Which of these texts are most similar?

• Authorship Attribution (Advanced)

Identifying the likely author of an anonymous or disputed document. Example: "Who wrote the disputed essays in The Federalist Papers?"

• Clustering (Advanced)

Which texts are the most similar? Example: "Is this play closer to comedy or tragedy?"

• Supervised Machine Learning (Advanced)

Are there other texts similar to a set of labeled examples? Example: "Are there other Jim Crow laws like these we have already identified?"
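
Clustering, authorship attribution, and supervised classification often rely on some measure of how alike two texts are. One widely used measure is the cosine similarity between word-count vectors. The sketch below computes it and reports the most similar pair in a small collection; it is not a clustering or classification algorithm on its own, and the function names and the three sample "plays" are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def bag_of_words(text: str) -> Counter:
    """Count lowercase word tokens (letters and apostrophes)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors.

    0 means no shared words; 1 means the words occur in identical proportions.
    """
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_pair(docs: dict[str, str]) -> tuple[str, str]:
    """Return the names of the two documents with the highest cosine similarity."""
    vectors = {name: bag_of_words(text) for name, text in docs.items()}
    return max(
        combinations(vectors, 2),
        key=lambda pair: cosine_similarity(vectors[pair[0]], vectors[pair[1]]),
    )


# Hypothetical sample "plays" for illustration.
docs = {
    "comedy_1": "a wedding a feast and laughter",
    "comedy_2": "laughter at the wedding feast",
    "tragedy_1": "death and grief in the tomb",
}
print(most_similar_pair(docs))  # ('comedy_1', 'comedy_2')
```

In a fuller pipeline, pairwise scores like these could feed a clustering algorithm (grouping texts without labels) or a nearest-neighbor classifier built from labeled examples; replacing raw counts with TF-IDF weights is a common refinement.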

This lesson is a remixed version of Teaching Text Analysis with Constellate, a Jupyter Book by Nathan Kelber and Ted Lawless for Constellate, licensed under CC BY.