
HathiTrust: Introduction for text & data mining

Getting started with the HTRC

Most services of the HathiTrust Research Center (HTRC) require an account: you must log in before you can interact with the tools.


The one major HTRC feature that does not require an account is HathiTrust+Bookworm, which allows users to visualize and analyze word usage trends in the 4.8 million public domain works in the HathiTrust Digital Library. It operates much like the Google Books Ngram Viewer, but is specific to the HathiTrust Digital Library (in which Google Books is a participant).

I'm logged in. Now what?

In the HathiTrust Digital Library, you can build "collections": groups of items around a particular theme, often limited to a specified period of time and/or tied to a particular location. These collections transfer into the HathiTrust Research Center as "worksets", which you can analyze for a specific purpose using a variety of algorithmic approaches.

Worksets have two primary functions:

Organizational

  • Gather material of interest in one place
  • Use search and metadata criteria to "slice and dice" results into a usable workset (e.g. a subset of a larger workset or collection)

Algorithmic

  • Delimit the scope of the analysis to a certain feature or element
  • Execute an analysis on your workset and observe the results
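
The two functions above can be pictured with a small sketch. This is a conceptual illustration only, not the HTRC API: the `Volume` record, the `subset_workset` and `word_counts` helpers, and the sample data are all hypothetical names invented for this example. It treats a workset as a plain list of items, narrows it by date and place (the organizational step), and then runs a simple word count over what remains (the algorithmic step).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    """Hypothetical stand-in for one item in a workset (not an HTRC data type)."""
    volume_id: str
    title: str
    year: int
    place: str
    text: str


def subset_workset(workset, start_year, end_year, place=None):
    """Organizational step: narrow a workset by date range and, optionally, place."""
    return [
        v for v in workset
        if start_year <= v.year <= end_year and (place is None or v.place == place)
    ]


def word_counts(workset):
    """Algorithmic step: count lowercase word tokens across all volumes in the workset."""
    counts = Counter()
    for v in workset:
        counts.update(v.text.lower().split())
    return counts


# Invented sample records, for illustration only.
workset = [
    Volume("vol-001", "Prairie Farming", 1885, "Chicago", "corn and wheat and corn"),
    Volume("vol-002", "River Trade", 1902, "St. Louis", "boats carry wheat"),
    Volume("vol-003", "City Markets", 1891, "Chicago", "markets sell corn"),
]

chicago_subset = subset_workset(workset, 1880, 1899, place="Chicago")
counts = word_counts(chicago_subset)

print([v.volume_id for v in chicago_subset])  # ['vol-001', 'vol-003']
print(counts.most_common(2))                  # [('corn', 3), ('and', 2)]
```

In the HTRC itself, the filtering is done through the search and metadata tools and the analysis through the provided algorithms; the sketch only shows how the two steps relate to each other.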

Learn how to create a workset in the HTRC documentation: https://wiki.htrc.illinois.edu/x/cIFnAQ. (You will need a HathiTrust login to access their wiki.)