The Lab offers a platform and graphical user interface for running familiar, open-source packages (no coding experience necessary).
The following analyses are available (source packages, where relevant, are given in parentheses); for readers who want to peek under the hood, minimal code sketches of each method follow the list:
- Part-of-speech tagging - label each word with its grammatical category, such as noun, verb, or adjective (spaCy)
- Named entity recognition - find named entities such as people, places, and dates (spaCy)
- Ngrams - track words or phrases in your content set as unigrams ("turtle"), bigrams ("the turtle"), or trigrams ("after the turtle")
- Sentiment analysis - score the documents in a content set as positive, negative, or neutral using word-level sentiment weights (AFINN lexicon)
- k-means clustering - group the documents in a content set by lexical and overall linguistic similarity (scikit-learn)
- Topic modelling - identify salient groupings of words that co-occur across content set(s) (MALLET)
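For the curious, here is roughly what part-of-speech tagging and named entity recognition look like in spaCy itself. The Lab runs this for you; the model name and sample sentence below are illustrative only, not the Lab's own pipeline:

```python
import spacy

# Assumes: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("Ada Lovelace visited London in December 1842.")

# Part-of-speech tagging: one grammatical category per token
for token in doc:
    print(token.text, token.pos_)   # e.g. "Ada PROPN", "visited VERB"

# Named entity recognition: labelled spans such as PERSON, GPE, DATE
for ent in doc.ents:
    print(ent.text, ent.label_)     # e.g. "Ada Lovelace PERSON"
```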
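Ngram counting needs no special library at all; a minimal sketch in plain Python (the sample text is invented):

```python
from collections import Counter

tokens = "after the turtle crossed the road the turtle rested".split()

def ngrams(tokens, n):
    # Slide a window of width n across the token list
    return zip(*(tokens[i:] for i in range(n)))

# Count bigrams; the same call with n=1 or n=3 gives unigrams or trigrams
bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts.most_common(2))  # ("the", "turtle") occurs twice
```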
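Lexicon-based sentiment analysis works by summing per-word weights. A sketch using the `afinn` package, one Python implementation of the AFINN lexicon (the sentences are illustrative, and this is not necessarily the Lab's exact code):

```python
# Assumes: pip install afinn
from afinn import Afinn

afinn = Afinn()
for sentence in ["What a wonderful discovery!", "This result is terrible."]:
    # A positive total leans positive, a negative total leans negative
    print(sentence, afinn.score(sentence))
```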
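k-means clustering first turns each document into a vector of word weights, then groups similar vectors. A sketch with scikit-learn on four toy documents; the vectorizer and the cluster count are assumptions you would tune, not fixed choices:

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

documents = [
    "whales swim in the cold ocean",
    "the ocean is home to whales and fish",
    "parliament passed the trade bill",
    "the bill was debated in parliament",
]

# Represent each document as a weighted word-frequency vector
vectors = TfidfVectorizer().fit_transform(documents)

# Ask for two clusters; n_clusters is a choice you revisit, not a given
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)
print(km.labels_)  # documents with the same label were grouped together
```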
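MALLET is a Java toolkit, so the Lab handles it for you; the sketch below shows the same underlying idea (LDA topic modelling) using scikit-learn's `LatentDirichletAllocation` instead, on four invented documents:

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

documents = [
    "the ship sailed across the harbour at dawn",
    "sailors loaded cargo onto the ship in the harbour",
    "the court heard the case and the judge ruled",
    "the judge adjourned the court until morning",
]

# Word counts per document, with common stop words removed
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(documents)

# Fit two topics; like the cluster count above, this is a tuning choice
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Print the most heavily weighted words in each topic
words = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [words[j] for j in topic.argsort()[::-1][:4]]
    print(f"topic {i}:", ", ".join(top))
```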
All of these methods can be applied to Gale-derived content sets or to content sets of your own that you upload to the platform. The Lab provides documentation, help files, sample content sets, and sample projects to help you learn to apply these methods; for additional help, contact the literary informatics librarian, Heather Froehlich (hgf5@psu.edu).
Your first pass with these approaches will not necessarily be the most informative one.
This is an iterative process, not one that will automagically solve your problems for you. Be prepared to take a trial-and-error approach and to move beyond the most immediately obvious findings. Computers are good at keeping track of things that we as readers are bad at noticing: if you didn't need the computer to find it, it probably isn't a very meaningful discovery. Be open to finding something new or unexpected in this process, and be prepared to explain how it does or does not fit with your understanding of the data. Contact Heather Froehlich (hgf5@psu.edu) for help with the interpretive side of these methods.