Voyant Tools: An introduction

Voyant Tools (http://voyant-tools.org) is a web-based platform for analyzing texts through a variety of visualizations. This guide walks you through several ways to analyze texts with it.

Voyant Visualizations

 

By default, Voyant displays five visualizations of the text you upload.

Here we are looking at a lightly cleaned version of Volume 1 of Aimé Césaire's Cahiers césairiens (available online), produced with optical character recognition (OCR). Before analyzing the file, we removed all boilerplate (headers, page numbers) and any text not in French.

Visualization panels from Voyant Tools, numbered by purpose. On the upper left, a word cloud labeled (1); in the upper middle, a reading pane labeled (2); on the upper right, a series of graphs showing general trends over time, labeled (3). On the bottom left, some basic statistics about our document(s), labeled (4). Finally, on the bottom right, a concordance for observing words in context, labeled (5).

 

Interpreting the Voyant Tools Analysis box

Each panel of the Voyant Tools interface above is numbered. The list below explains what each panel shows:

  1. Word cloud – visualizes medium- to high-frequency content words in the corpus. You can increase or decrease the number of words shown using the TERMS bar below the visualization.
  2. Reading pane – shows the source material.
  3. General trends over time – shows overall variation across the document by graphing the most frequent content words identified by Voyant, so you can compare them. The document is divided into 10 equal segments (based on the size of the file). If you upload multiple documents, they will appear as faint boxes to show scope and space.
  4. Basic statistics about our document(s) – information about our documents, including the longest and shortest, a type/token ratio or lexical density score for each document, and the most frequent content words in each document.
  5. Concordance – identifies patterns in word use with a keyword-in-context concordance format (words to the left, the specified keyword, words to the right). You can expand the columns to get a better view, alphabetize results by clicking on the word ‘left’ or ‘right’, and look up specific words in the search box on the bottom left.
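
To make the statistics in panel (4) more concrete, here is a minimal Python sketch of how a type/token ratio and a list of frequent content words could be computed. It is not Voyant's own code: the sample sentence, the tiny stopword list, and the function names are all hypothetical, and Voyant's tokenization and stopword handling may differ.

```python
import re
from collections import Counter

# Hypothetical sample text and stopword list, for illustration only.
SAMPLE = "La politique est une affaire de culture, et la culture est une affaire de politique."
STOPWORDS = {"la", "le", "les", "de", "et", "est", "une", "un"}

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters only, accents kept)."""
    return re.findall(r"[^\W\d_]+", text.lower())

def type_token_ratio(tokens):
    """Distinct words (types) divided by total words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def top_content_words(tokens, stopwords, n=5):
    """Most frequent tokens once stopwords are filtered out."""
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(n)

tokens = tokenize(SAMPLE)
print(len(tokens))                                # 15 tokens
print(round(type_token_ratio(tokens), 3))         # 8 types / 15 tokens = 0.533
print(top_content_words(tokens, STOPWORDS, n=3))  # politique, affaire, culture (2 each)
```

A ratio closer to 1 means more varied vocabulary; a ratio closer to 0 means more repetition. Longer texts tend to score lower, so compare documents of similar length with care.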

Interpreting our visualization

The Voyant Tools interface provides lots of information to sort through, and some parts are easier to interpret than others.

The biggest challenge is moving from the most obvious words to the less obvious ones. We do not need a computer to tell us that the extremely high-frequency words in the Word Cloud (#1) are present; we could have learned this simply by reading the materials. For example, it is no surprise that words like césaire and politique are very frequent, since we know they are intrinsic to the text (which contains interviews with Aimé Césaire about politics and political thinking). More interesting are the less immediately obvious words in the word cloud, such as première, christophe, and française. We might already know that Césaire is writing about La Tragédie du roi Christophe, but we can examine how he uses this word with the Keyword in Context viewer (#5; look up 'christophe' in the box on the bottom left) to observe patterns of usage in context.
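
If it helps to see what a keyword-in-context lookup does behind the scenes, the following Python sketch builds a small concordance by hand. It is an illustration only, not how Voyant implements its viewer; the `kwic` function, its `width` parameter, and the sample sentence are hypothetical.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left, keyword, right) tuples for each occurrence of keyword.

    `width` is the number of context words kept on each side.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    target = keyword.lower()
    rows = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows

# Hypothetical sample sentences, not taken from the corpus.
sample = "Le roi Christophe veut bâtir une citadelle. Christophe parle au peuple."

for left, kw, right in kwic(sample, "christophe"):
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>25} | {kw} | {right}")
```

Sorting the rows by their left or right context, as Voyant does when you click 'left' or 'right', makes recurring phrases around the keyword easier to spot.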

Second, we might want to see where particular words or phrases are used across the whole document. In the Reading pane (#2), we might find that certain words or phrases cluster near the top of the document or towards the end. This would be harder to notice when reading linearly.
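
One way to think about clustering is to split the document into equal segments and count how often a word appears in each, much as the trends graph (#3) does with its 10 parts. The Python sketch below shows the idea under simple assumptions; the `segment_counts` function and the toy text are hypothetical, and Voyant's own segmentation may differ.

```python
import re

def segment_counts(text, keyword, segments=10):
    """Count occurrences of keyword in each of `segments` equal slices of the text."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = [0] * segments
    target = keyword.lower()
    for i, tok in enumerate(tokens):
        if tok == target:
            # Map token position i to a segment index between 0 and segments - 1.
            counts[i * segments // len(tokens)] += 1
    return counts

# Toy 20-word text: the keyword appears twice at the start and once at the end.
sample = "christophe christophe " + "mot " * 17 + "christophe"
print(segment_counts(sample, "christophe"))
# [2, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```

Here the counts show the word concentrated at the very beginning and very end of the text, the kind of pattern that is easy to miss when reading straight through.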

This is an iterative process! You might start with one word that looks interesting and end up somewhere totally different. That's OK and to be expected.