
Gale Digital Scholars Lab: an introduction

A brief survey of offerings from the Gale Digital Scholars Lab for Penn State researchers

DSL offers six separate methods for performing analyses

The Lab offers a platform and graphical user interface for running familiar, open-source packages; no coding experience is necessary.

The following analyses are available (source packages, where relevant, are noted in parentheses); short code sketches illustrating two of them follow the list:

  • Part-of-speech tagging - identify grammatical categories such as nouns, verbs, and adjectives (spaCy)
  • Named entity recognition - find proper nouns: people, places, dates (spaCy)
  • Ngrams - track words or phrases in your content set as unigrams (e.g. "turtle"), bigrams ("the turtle"), or trigrams ("after the turtle")
  • Sentiment analysis - score documents in a content set to measure positive, negative, and neutral statements (AFINN lexicon)
  • k-means clustering - cluster documents in a content set based on lexical and overall linguistic similarity (scikit-learn)
  • Topic modelling - identify salient groupings of words that co-occur across content set(s) (MALLET)
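
For readers curious about what happens behind the Lab's interface, here is a minimal sketch of part-of-speech tagging and named entity recognition using spaCy directly. The model name "en_core_web_sm" and the sample sentence are illustrative assumptions; the Lab runs its own configuration for you without any code.

```python
import spacy

# Load a small English pipeline (an assumption; the Lab's configuration may differ)
nlp = spacy.load("en_core_web_sm")
doc = nlp("After the turtle crossed the road, Jane Austen arrived in Bath.")

# Part-of-speech tagging: print each token with its grammatical label
for token in doc:
    print(token.text, token.pos_)

# Named entity recognition: print detected people, places, and similar entities
for ent in doc.ents:
    print(ent.text, ent.label_)
```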
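Similarly, the sketch below suggests what document clustering looks like with scikit-learn's k-means. The sample documents and the choice of two clusters are assumptions for illustration only, not the Lab's actual settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Tiny illustrative "content set" (an assumption, not Gale data)
documents = [
    "The turtle crossed the road slowly.",
    "After the turtle, a tortoise followed.",
    "Parliament debated the new tax bill.",
    "The tax bill passed after a long debate.",
]

# Represent each document as a weighted word-frequency vector
vectors = TfidfVectorizer(stop_words="english").fit_transform(documents)

# Group documents into two clusters by lexical similarity
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
print(labels)  # e.g. [0 0 1 1]: the turtle texts vs. the tax-bill texts
```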

All of these methods can be applied to Gale-derived content sets as well as to your own content sets uploaded to the platform. The Lab provides documentation, help files, sample sets, and sample projects to help you learn to apply these methods; for additional help, please contact the literary informatics librarian, Heather Froehlich (hgf5@psu.edu).


Your first pass with these approaches will not necessarily be the most informative

This is an iterative process, not one that will automagically solve your problems for you. Be prepared to take a trial-and-error approach to your methods and allow yourself to move beyond the most immediately obvious findings. Computers are good at keeping track of things that we as readers are bad at noticing; if you didn't need the computer to find it, it probably isn't a very meaningful discovery. Be open to finding something new or unexpected in this process, and be prepared to explain how it does or does not fit with your understanding of the data. Contact Heather Froehlich (hgf5@psu.edu) for help with the interpretive process of using these methods.