Have questions about text analysis at Dartmouth? Ask the Research Data Help team.
Dartmouth College Library offers text analysis platforms, training, and support for your research and teaching needs. We currently provide access to two text analysis platforms, Constellate and ProQuest TDM Studio. We also offer individual or group training and support. To get started, contact the Research Data Help team at researchdatahelp@groups.dartmouth.edu.
Text analysis platforms provide researchers with the ability to search, manipulate, and explore knowledge in the scholarly record at a large scale to uncover important information and gain new insights. Text analysis can enable us to answer questions on how texts are interconnected, what sentiments they contain, and when significant terms change within a collection of unstructured texts.
Text analysis research questions explore a wide range of topics, from biomedical discovery to literary history. Research questions that are well suited to text analysis methods may involve these characteristics:
There are five main questions that text analysis can help answer:
Question 1: What are these texts about?
Counting the frequency of a word in any given text. This includes Bag of Words and TF-IDF. Example: "Which of these texts focus on women?"
Examining where words occur close to one another. Example: "Where are women mentioned in relation to home ownership?"
Discovering the topics within a group of texts. Example: "What are the most frequent topics discussed in this newspaper?"
Finding the significant words within a text. Example: "What language is most significant within 1970s political speech?"
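The word-counting approaches named above (Bag of Words and TF-IDF) can be sketched with the Python standard library alone. This is a minimal illustration, not how Constellate or TDM Studio compute these values; the three-document corpus and the `tokenize` and `tf_idf` helpers are invented for the example, and the TF-IDF variant shown (relative term frequency times the natural log of inverse document frequency) is one of several common formulations.

```python
import math
import re
from collections import Counter

# Hypothetical three-document corpus, invented for illustration only.
docs = {
    "a": "Women bought homes. Women voted.",
    "b": "The council debated home loans.",
    "c": "Farmers sold wheat and corn.",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

# Bag of words: per-document term counts that ignore word order.
bags = {name: Counter(tokenize(text)) for name, text in docs.items()}

def tf_idf(term, name):
    """Relative term frequency in one document times log inverse document frequency."""
    bag = bags[name]
    tf = bag[term] / sum(bag.values())
    df = sum(1 for b in bags.values() if term in b)  # documents containing the term
    if df == 0:
        return 0.0
    return tf * math.log(len(bags) / df)

# Raw frequency answers "how often?"; TF-IDF answers "how distinctive?"
print(bags["a"]["women"])
print(round(tf_idf("women", "a"), 3))
```

A term that appears in every document gets a TF-IDF score of zero under this formulation, which is why common words such as "the" drop out of lists of significant terms.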
Question 2: How are these texts connected?
Where is this word or phrase used in these documents? Example: "Which journal articles mention Maya Angelou's phrase, 'If you're for the right thing, then you do it without thinking'?"
How are the authors of these texts connected? Example: "What local communities formed around civil rights in 1963?"
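Locating where a word or phrase is used, the first item above, is often done with a keyword-in-context (concordance) listing. The sketch below assumes the text is already loaded as a plain string; the `kwic` function and its `width` parameter are hypothetical names chosen for this example.

```python
import re

def kwic(text, phrase, width=30):
    """Return (left context, match, right context) for each occurrence of phrase.

    Matching is case-insensitive; `width` is the number of characters of
    context kept on each side of the match.
    """
    hits = []
    for m in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits

# Invented sample sentence, for illustration only.
sample = "She said do the right thing. Then the Right Thing happened."
for left, match, right in kwic(sample, "right thing", width=10):
    print(f"{left}[{match}]{right}")
```

Exact string matching like this misses paraphrases and variant spellings, so it is best treated as a first pass.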
Question 3: What emotions (or affects) are found within these texts?
Does the author use positive or negative language? Example: "How do presidents describe gun control?"
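One simple way to approach the positive-or-negative question is lexicon-based scoring: count words from a positive list and a negative list and compare the totals. The sketch below is a toy version under loud assumptions: the two word lists are invented for illustration, the `polarity` function is a hypothetical helper, and the method ignores negation and context, which published sentiment tools handle more carefully.

```python
import re

# Hypothetical toy lexicons, invented for this example.
POSITIVE = {"good", "safe", "support", "protect"}
NEGATIVE = {"bad", "danger", "oppose", "threat"}

def polarity(text):
    """Score text from -1.0 (all negative words) to 1.0 (all positive words).

    Returns 0.0 when the text contains no lexicon words at all.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    scored = pos + neg
    return 0.0 if scored == 0 else (pos - neg) / scored

print(polarity("We support and protect safe communities"))
print(polarity("A bad threat"))
```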
Question 4: What names are used in these texts?
List every example of a kind of entity from these texts. Example: "What are all of the geographic locations mentioned by Tolstoy?"
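Listing entities is usually done with a trained named entity recognition model. As a much simpler stand-in, the sketch below matches text against a gazetteer, that is, a fixed list of known names. The `PLACES` list, the `find_places` helper, and the sample sentence are all invented for illustration; a gazetteer only finds names it already knows, which is its main limitation compared with a trained model.

```python
import re
from collections import Counter

# Hypothetical gazetteer of place names, invented for this example.
PLACES = ["Moscow", "St. Petersburg", "Smolensk"]

def find_places(text, gazetteer=PLACES):
    """Count whole-word, case-sensitive occurrences of each gazetteer entry."""
    counts = Counter()
    for place in gazetteer:
        pattern = r"\b" + re.escape(place) + r"\b"
        n = len(re.findall(pattern, text))
        if n:  # record only places that actually occur
            counts[place] = n
    return counts

# Invented sample sentence, for illustration only.
sample = "He left Moscow for St. Petersburg, then returned to Moscow."
print(find_places(sample).most_common())
```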
Question 5: Which of these texts are most similar?
Find the author of an anonymous document. Example: "Who wrote The Federalist Papers?"
Which texts are the most similar? Example: "Is this play closer to comedy or tragedy?"
Are there other texts similar to this? Example: "Are there other Jim Crow laws like these we have already identified?"
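Similarity questions like these are often approached by turning each text into a word-count vector and comparing vectors with cosine similarity. The sketch below shows that idea with the standard library; the `bag`, `cosine`, and `most_similar` helpers and the sample strings are hypothetical, and real authorship or genre studies typically use richer features than raw word counts.

```python
import math
import re
from collections import Counter

def bag(text):
    """Build a bag-of-words count vector from lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity of two count vectors: 1.0 is identical direction, 0.0 is no shared words."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def most_similar(query, candidates):
    """Return the name of the candidate text whose word counts best match the query."""
    q = bag(query)
    return max(candidates, key=lambda name: cosine(q, bag(candidates[name])))

# Invented reference texts, for illustration only.
references = {
    "comedy": "laughter and a wedding",
    "tragedy": "death and a funeral",
}
print(most_similar("the wedding ends in laughter", references))
```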
This lesson is a remixed version of Teaching Text Analysis with Constellate, a Jupyter Book by Nathan Kelber and Ted Lawless for Constellate, licensed CC BY.
Like all computational research methodologies, text analysis has several steps required to produce clear, reproducible results. Regardless of platform, users must select the content they need, build a corpus dataset of rights-cleared texts, run the analysis using either templates or custom computational scripts, and then export the derived results.
Select Content
- Select specific publication titles or all titles in a database (note: only selected, rights-cleared content should be used)
- Refine content by keyword, date, source type or document type
Create Dataset
- Use standard platform visualizations, or
- Copy or import the dataset into a computational notebook environment (Jupyter/Colab)
Explore using Computational Notebooks
- Use sample scripts to explore your dataset
Create Custom Scripts
- Set up an R or Python environment to create your own scripts
Export Derived Results
- Export files and visualizations for dissemination
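The workflow above can be sketched end to end in a notebook cell. This is a rough illustration under stated assumptions, not the export format or API of Constellate or TDM Studio: the `records` list, its field names, and the `refine`, `derive_counts`, and `export_csv` helpers are all hypothetical. Note that the export step writes derived word counts rather than the source texts.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical dataset records, invented for illustration only.
records = [
    {"id": "d1", "year": 1963, "type": "article", "text": "Civil rights march in Birmingham"},
    {"id": "d2", "year": 1975, "type": "article", "text": "Budget debate continues"},
    {"id": "d3", "year": 1963, "type": "review", "text": "A review of civil rights essays"},
]

def refine(records, keyword, start, end, doc_type):
    """Keep records matching a keyword, an inclusive year range, and a document type."""
    kw = keyword.lower()
    return [
        r for r in records
        if start <= r["year"] <= end and r["type"] == doc_type and kw in r["text"].lower()
    ]

def derive_counts(dataset):
    """Derive word counts across the dataset."""
    counts = Counter()
    for r in dataset:
        counts.update(re.findall(r"[a-z]+", r["text"].lower()))
    return counts

def export_csv(counts, handle):
    """Write term counts as CSV rows to any open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["term", "count"])
    for term, n in counts.most_common():
        writer.writerow([term, n])

dataset = refine(records, "civil rights", 1960, 1969, "article")
buffer = io.StringIO()  # stands in for a file opened with open(path, "w", newline="")
export_csv(derive_counts(dataset), buffer)
print(buffer.getvalue())
```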