While full-text analyses of large corpora attract much of the attention, we can still learn a great deal by reviewing the metadata of a corpus (i.e., the titles, authors, dates, and abstracts of books and articles). In a 2014 blog post, "Still Playing Catch Up," Cameron Blevins examines the American Historical Review's slow progress toward gender equality. He observes that female dissertation authors had more or less caught up to their male counterparts, but among books reviewed by the AHR, male authors (as of 2013) still outnumbered female authors 2 to 1.* A curious gender imbalance also exists among the reviewers: while the reviewers of female-authored books have achieved near gender parity, 3 out of 4 reviewers of male-authored books are men.
* As Blevins notes, using a study of first names to identify the gender of authors is not without its pitfalls, including the fact that it "subtly reinforces an insidious gender binary framework."
In "Searching for the Victorians," Dan Cohen examines trends in the words found in book titles. Some words, such as "revolution," demonstrate predictable trends (see the graph to the right). Others indicate patterns that deserve further exploration. For example, Cohen charts indicators of a growing pessimism in the nineteenth century, such as the declining use of words like "progress" and "happiness."
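To make the idea of charting title words concrete, here is a minimal sketch of how one might compute, decade by decade, the share of titles that contain a given word. It is not Cohen's method or data: the records in TITLES are invented placeholders, and the function name title_share_by_decade is hypothetical. A real study would load titles and dates from a library catalog or similar metadata source.

```python
import re
from collections import defaultdict

# Hypothetical toy records (year, title), invented purely for illustration.
# They are NOT drawn from any real catalog or from Cohen's study.
TITLES = [
    (1805, "The Progress of Liberty"),
    (1808, "A History of the Revolution"),
    (1851, "Happiness and Duty"),
    (1853, "Notes on Progress and Industry"),
    (1857, "Travels in the East"),
    (1891, "Essays on Science"),
]


def title_share_by_decade(records, word):
    """Return {decade: fraction of that decade's titles containing `word`}."""
    totals = defaultdict(int)  # number of titles per decade
    hits = defaultdict(int)    # number of titles per decade containing the word
    target = word.lower()
    for year, title in records:
        decade = (year // 10) * 10
        totals[decade] += 1
        # Split the title into lowercase alphabetic tokens so that
        # "Progress" matches "progress" but not "progressive".
        tokens = re.findall(r"[a-z]+", title.lower())
        if target in tokens:
            hits[decade] += 1
    return {decade: hits[decade] / totals[decade] for decade in sorted(totals)}


shares = title_share_by_decade(TITLES, "progress")
for decade, share in shares.items():
    print(f"{decade}s: {share:.0%} of titles contain 'progress'")
```

Reporting a share of titles rather than a raw count matters because the number of books published grows sharply over the century; a raw count would rise even for a word that is falling out of favor.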
[other examples]
One of the most commonly mined sets of texts, at least for the United States, is the corpus of State of the Union speeches delivered every year by the U.S. president since 1790. A 2015 article in The Atlantic describes some of the insights that text analysis provides about how presidential priorities have changed over more than 200 years.