Constellate: An Introduction

Working with Datasets

Constellate's dataset builder steers users toward materials from particular journals, but the search facets let you widen or narrow your parameters to develop a more specific dataset. You can preview your changes in the results interface before finalizing the output.

You can build your dataset through the search interface, starting from particular words or concepts of interest. Some options include:

  • searching across different kinds of documents (articles, chapters, books, newspapers, pamphlets, etc.)
  • limiting results to certain providers (for example, newspapers but not those from Chronicling America, or articles only from the journal Past and Present)
  • limiting results to particular tagged fields (for example, Art, but excluding materials tagged 'literature' or 'performing arts')
  • leaving fields blank or untouched, which widens your search

You may want to explore these settings before you settle on a final output. 

I want to keep information about my dataset for future reference.

Once you've created a dataset, or found one you want to use among Constellate's pre-selected material, you can download several kinds of data from it. For example, if you wanted information related to African-American history covering all the materials available from African American Review, Black American Literature Forum, and Negro American Literature Forum between 1967 and 2020, you could download the following data. (The relevant file type is included in parentheses.)

  • metadata (.csv)
  • metadata, ngrams, full-text (.json)
  • sampled metadata (.csv)
  • sampled metadata, ngrams, full-text (.json)
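
Once downloaded, a metadata (.csv) file can be inspected with a few lines of Python. The following is a minimal sketch rather than Constellate's own code: the column names (id, title, isPartOf, publicationYear) and the sample rows are assumptions made for illustration, so check the header row of your own file before reusing it.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded metadata .csv file.
# The column names below are assumptions; your download may differ.
SAMPLE_CSV = """id,title,isPartOf,publicationYear
doc-1,First Example Article,African American Review,1995
doc-2,Second Example Article,Black American Literature Forum,1980
doc-3,Third Example Article,Negro American Literature Forum,1970
"""


def load_metadata(handle):
    """Read metadata rows from an open CSV file into a list of dicts."""
    return list(csv.DictReader(handle))


def filter_by_year(rows, start, end, year_column="publicationYear"):
    """Keep rows whose year falls within start..end (inclusive)."""
    kept = []
    for row in rows:
        try:
            year = int(row[year_column])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric year
        if start <= year <= end:
            kept.append(row)
    return kept


rows = load_metadata(io.StringIO(SAMPLE_CSV))
recent = filter_by_year(rows, 1976, 2020)
for row in recent:
    print(row["publicationYear"], row["isPartOf"], row["title"])
```

To run this against a real download, replace `io.StringIO(SAMPLE_CSV)` with `open("metadata.csv", newline="", encoding="utf-8")`, where the file name is whatever you saved the download as.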

You can also request all metadata, as well as sheets of unigrams, bigrams, and/or trigrams (all as .csv files). Finally, you can request a Constellate Document Format JSON file, which covers all metadata, unigrams, bigrams, trigrams, and full-text. Different people have different needs, and these options provide a range of ways to get what you need. Read more about what these download options offer on the Constellate Help Pages:

How do I analyze my dataset?

Constellate offers a Jupyter Notebooks environment for analyzing and manipulating your datasets using the Python programming language. The notebooks come in two versions: an annotated "Tutorial" option, which is designed to help beginners, and a more minimalist "Analysis" option for research. Constellate provides several pre-established Jupyter Notebooks, including scripts for metadata and pre-processing, simple word frequencies, more complex word frequencies (TF-IDF), and topic modeling. If you have your own scripts, you can also import your own notebook from any existing GitHub repository.
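
To give a sense of what the "more complex word frequencies" involve, here is a small pure-Python sketch of TF-IDF. It is not taken from the Constellate notebooks, which may compute the measure differently; it assumes one common variant (term frequency divided by document length, multiplied by the natural log of total documents over documents containing the term), and the three toy documents are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document.

    documents: a list of token lists.
    tf  = count of term in document / number of tokens in document
    idf = log(number of documents / number of documents containing term)
    """
    n_docs = len(documents)

    # Count how many documents each term appears in.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Toy documents, already split into lowercase tokens.
docs = [
    ["jazz", "poetry", "jazz"],
    ["poetry", "novel"],
    ["novel", "poetry", "drama"],
]
scores = tf_idf(docs)
for i, doc_scores in enumerate(scores):
    top = max(doc_scores, key=doc_scores.get)
    print(f"document {i}: highest-scoring term is {top!r}")
```

A word that appears in every document ("poetry" here) scores zero, while a word concentrated in one document ("jazz") scores highest there. That contrast is what distinguishes TF-IDF from a simple word count.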

You can also run your own analyses on data downloaded directly from Constellate.
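
As one example of a local analysis, the sketch below adds up word counts across documents in a downloaded JSON file. It rests on assumptions that should be checked against your own download: that the file holds one JSON object per line, and that each object stores its word counts in a field named `unigramCount`. The sample data and the short stopword list are invented for illustration.

```python
import io
import json
from collections import Counter

# Hypothetical stand-in for a downloaded file: one JSON object per line.
# The "unigramCount" field name is an assumption about the file layout.
SAMPLE_JSONL = "\n".join([
    json.dumps({"id": "doc-1", "unigramCount": {"blues": 3, "the": 10}}),
    json.dumps({"id": "doc-2", "unigramCount": {"blues": 1, "Harlem": 2, "the": 7}}),
])

STOPWORDS = {"the", "of", "and", "a", "in"}  # tiny illustrative list


def total_word_counts(handle, field="unigramCount"):
    """Sum per-document word counts into one Counter, skipping stopwords."""
    totals = Counter()
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        doc = json.loads(line)
        for word, count in doc.get(field, {}).items():
            word = word.lower()  # merge "Harlem" and "harlem"
            if word not in STOPWORDS:
                totals[word] += count
    return totals


totals = total_word_counts(io.StringIO(SAMPLE_JSONL))
print(totals.most_common(5))
```

For a real file, pass an open file handle instead of `io.StringIO(SAMPLE_JSONL)`. If the download is gzip-compressed, `gzip.open(path, "rt", encoding="utf-8")` from the standard library returns a handle that works the same way.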

I need help with the analysis functions provided by Constellate.

The Constellate team has developed robust help documents to help you get started with the platform.

They also provide how-to guides that outline the material covered here in a step-by-step format:

If you need more help