Dartmouth Libraries

Text Analysis

Dartmouth College Library's guide to text analysis tools and platforms

Creating Digital Text Collections or Corpora

Digital Text Collections

Digital text repositories, such as the HathiTrust Digital Library, store millions of digitized texts. For copyrighted material, your use of these texts may be restricted to keyword searches (returning only snippets) or, when allowed, renting these books for a set amount of time. More valuably, these repositories offer access to millions of full-text copies of books and other texts that are in the public domain. You may view page images of these texts (like the image to the top left above) through the repository's web viewer. For larger text analysis projects, however, you may download plain text copies (bottom left) of these texts and construct your own digital corpus or text collection for analysis. Note: HathiTrust and similar repositories digitize these texts automatically using optical character recognition (OCR) software, so the plain text contains some errors, depending on the quality of the page image.
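
As a minimal sketch of the corpus-building step, the Python below reads a folder of downloaded plain text files into one collection and applies light cleanup. It assumes the files are UTF-8 .txt files in a single folder; the function names clean_ocr_text and load_corpus are hypothetical, and the cleanup rules are illustrative rather than a complete remedy for OCR errors.

```python
import re
from pathlib import Path


def clean_ocr_text(raw: str) -> str:
    """Apply light, conservative cleanup to OCR-derived plain text."""
    # Rejoin words hyphenated across a line break: "exam-\nple" -> "example".
    # This can also merge genuine hyphenated compounds, so review the results.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse runs of spaces and tabs left behind by page layout.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


def load_corpus(folder: str) -> dict[str, str]:
    """Read every .txt file in `folder` into a {filename: cleaned text} mapping."""
    corpus = {}
    for path in sorted(Path(folder).glob("*.txt")):
        # errors="replace" keeps the load going if a file has stray bytes.
        raw = path.read_text(encoding="utf-8", errors="replace")
        corpus[path.name] = clean_ocr_text(raw)
    return corpus
```

Calling load_corpus on your download folder returns a dictionary you can pass to later analysis steps. Keeping the cleanup in its own function makes it easy to inspect what each rule changes before trusting it on a large collection.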

Converting Formulaic Texts into Structured Data

Would you like to convert a formulaic text - like an encyclopedia, gazetteer, statistical abstract, directory, etc. - into a structured dataset that may be queried, sorted, and filtered?

Regular Expression techniques offer a powerful means to parse and extract specific types of information from such texts. For an introductory tutorial in this technique using LibreOffice, see the Programming Historian's Understanding Regular Expressions lesson. For more advanced applications of this technique using the programming language Python, see PH's Generating an Ordered Dataset from a Text File lesson.
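
To give a sense of the technique, here is a short Python sketch that turns formulaic lines into rows that can be sorted and filtered. The entry format ("place, county County, pop. number"), the sample lines, and the parse_entries function are all invented for illustration; a real gazetteer or directory will need its own pattern.

```python
import re

# Hypothetical entry format: "<place>, <county> County, pop. <number>"
ENTRY = re.compile(
    r"^(?P<place>[^,]+),\s+(?P<county>[^,]+) County,\s+pop\.\s+(?P<population>[\d,]+)$"
)


def parse_entries(lines):
    """Return one dict per line that matches ENTRY; skip lines that do not."""
    rows = []
    for line in lines:
        match = ENTRY.match(line.strip())
        if match is None:
            continue  # in practice, set these lines aside for manual review
        row = match.groupdict()
        # Turn "1,204" into the integer 1204 so rows can be sorted numerically.
        row["population"] = int(row["population"].replace(",", ""))
        rows.append(row)
    return rows


# Invented sample lines, including one that does not fit the pattern.
sample = [
    "Ashford, Kent County, pop. 1,204",
    "Millbrook, Dale County, pop. 312",
    "See also the appendix.",
]
rows = sorted(parse_entries(sample), key=lambda r: r["population"], reverse=True)
```

Lines that do not match are skipped here; with a real source it is usually worth collecting them, since they often reveal OCR errors or variations in the formula.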

Frequency Lists and n-grams

For copyrighted texts, some repositories offer word and term frequency lists to researchers in place of full-text content. JSTOR, for example, allows researchers to download n-gram lists (one-, two-, and three-word terms ordered by their frequency) for journal articles and book reviews in their database.
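
If you have full text of your own, a frequency list of this kind can be approximated in a few lines of Python. This is a sketch under simple assumptions: the tokenizer only keeps lowercase letters and apostrophes, and the function names are hypothetical. It is not a description of how JSTOR prepares its lists.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    # zip over n staggered copies of the token list to form each n-gram.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


tokens = tokenize("The quick fox and the quick dog saw the quick fox.")
bigrams = ngram_counts(tokens, 2)
top = bigrams.most_common(1)  # the most frequent two-word term and its count
```

Calling ngram_counts with n set to 1, 2, or 3 gives the one-, two-, and three-word lists described above.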

Annotated / Encoded Text Corpora

Annotated text corpora include special encoding or "tags" to identify and add commentary on the content of texts. By encoding texts with XML tags, these corpora allow web developers to sort, filter, or represent given elements of texts in a variety of ways. These tags have an additional use, however: they also allow researchers to transform these texts into searchable databases.
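
The following Python sketch shows how tags make a text queryable. The sample letter and its contents are invented, and the tag names (persName, placeName) are used only as an illustrative assumption; an actual corpus will document its own encoding scheme.

```python
import xml.etree.ElementTree as ET

# Invented sample document for illustration only.
SAMPLE = """
<letter>
  <p><persName>Jane Doe</persName> wrote from <placeName>Hanover</placeName>
  to <persName>John Roe</persName> in <placeName>Boston</placeName>.</p>
  <p><persName>Jane Doe</persName> later returned to <placeName>Hanover</placeName>.</p>
</letter>
"""


def tagged_values(xml_text, tag):
    """Return the text of every element named `tag`, in document order."""
    root = ET.fromstring(xml_text.strip())
    return [el.text.strip() for el in root.iter(tag) if el.text]


people = sorted(set(tagged_values(SAMPLE, "persName")))  # unique names, sorted
places = tagged_values(SAMPLE, "placeName")  # every mention, in order
```

Because the names and places are marked explicitly, they can be listed, counted, or filtered without guessing from the surrounding prose.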

Metadata Analysis

Authors

While full-text analyses of large corpora attract much of the attention, we can still learn much by reviewing the metadata of a corpus (i.e. the titles, authors, dates, and abstracts of books and articles). In a 2014 blog post, "Still Playing Catch Up," Cameron Blevins examines the American Historical Review's slow progress toward gender equality. Blevins observes that while female dissertation authors had more or less caught up to their male counterparts, male authors of books reviewed by the AHR (as of 2013) still outnumbered female authors 2 to 1.* A curious gender imbalance also exists among the reviewers. While the reviewers of female-authored books have achieved near gender parity, 3 out of 4 reviewers of male-authored books are men.

Blevins Gender Study

* As Blevins notes, using a study of first names to identify the gender of authors is not without its pitfalls, including the fact that it "subtly reinforces an insidious gender binary framework."

Titles

Revolution Graph

In "Searching for the Victorians," Dan Cohen examines trends in the words found in book titles. Some words, such as "revolution," demonstrate predictable trends (see the graph to the right). Others indicate patterns that deserve further exploration. For example, Cohen charts indicators of a growing pessimism in the nineteenth century, such as the decline in use of words like "progress" and "happiness."
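
A title-based trend of this kind can be sketched in Python from nothing more than publication years and titles. The records below are invented for illustration, the function name is hypothetical, and this is not a reconstruction of Cohen's own method or data.

```python
import re
from collections import Counter


def title_word_share_by_decade(records, word):
    """Return {decade: share of titles containing `word`} for (year, title) pairs."""
    totals, hits = Counter(), Counter()
    # Match the word as a whole word, ignoring case.
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    for year, title in records:
        decade = (year // 10) * 10
        totals[decade] += 1
        if pattern.search(title):
            hits[decade] += 1
    return {decade: hits[decade] / totals[decade] for decade in sorted(totals)}


# Invented records for illustration only.
records = [
    (1831, "The Progress of Nations"),
    (1835, "A Treatise on Husbandry"),
    (1872, "Essays on Faith and Doubt"),
    (1878, "Progress and Its Critics"),
    (1879, "Letters from Abroad"),
    (1874, "Travels in the North"),
]
shares = title_word_share_by_decade(records, "progress")
```

Reporting a share of titles rather than a raw count matters because the number of books published changes from decade to decade.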

Word Searches, Frequencies, and N-Grams

[other examples]

Exploring Changes Over Time

SOTU graph

One of the most commonly mined sets of texts, at least for the United States, is the corpus of State of the Union addresses that U.S. presidents have delivered annually since 1790. A 2015 article in The Atlantic describes some of the insights that text analysis provides about how presidential priorities have changed over those 200+ years.
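
One simple way to compare texts of different lengths over time is to express a term's frequency per 1,000 words. The Python sketch below assumes you already have one text per year in a dictionary; the snippets are invented placeholders, not quotations from any real address, and the function name is hypothetical.

```python
import re


def rate_per_thousand(texts_by_year, term):
    """Return {year: occurrences of `term` per 1,000 words} for each text."""
    rates = {}
    for year, text in sorted(texts_by_year.items()):
        words = re.findall(r"[a-z']+", text.lower())
        if not words:
            continue  # skip empty documents rather than divide by zero
        rates[year] = 1000 * words.count(term.lower()) / len(words)
    return rates


# Invented placeholder snippets, not quotations from any real address.
speeches = {
    1800: "commerce and treaties and commerce again",
    1900: "industry and labor and commerce",
}
rates = rate_per_thousand(speeches, "commerce")
```

Normalizing by length keeps a long address from appearing to emphasize a term simply because it contains more words overall.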

[work in progress - more examples will be added here...]

  • Last Updated: Mar 11, 2025 10:11 AM
  • URL: https://researchguides.dartmouth.edu/textanalysis
  • Dartmouth College
  • Copyright © 2025 Trustees of Dartmouth College