Data Visualization and Management: The Basics

Our October 25 workshop, “Data Management and Visualization: The Basics,” went wonderfully! You can access some of the workshop materials here if you want to get your hands on them now. Stacy, Ted, and I plan to clean them up and release a version with more detailed instructions for other libraries that may want to replicate our workshop. We also gathered survey data from participants, which we will be coding and analyzing over the coming months. Hopefully that will show us where to improve future workshops. We do plan to publish on our experience at some point.

My particular section of the workshop focused on topical analysis: visualizing a corpus of TEI-encoded texts from the Victorian Women Writers Project (VWWP). The VWWP texts, along with many other digital collections that have been encoded at IU, were recently made available by Michelle Dalmau, Interim Head of Digital Collections Services at IU. They are available for download on GitHub, and I would strongly encourage you to check them out.


Originally I hoped to use the HathiTrust Research Center tools to perform topical analysis, but they weren’t yet available when we were planning the workshop. Instead, I turned to Voyant, a browser-based tool that allows users to upload texts and interact with them in various ways. Its functionality is limited, but it’s great at measuring word frequency. The interface consists of multiple panes for digging into the text(s) you’ve uploaded. Overall, it was an excellent fit for our beginner-level workshop.


A screenshot of the Voyant interface. Stop words have been applied to the corpus. The Word Trends pane is automatically displaying the five most frequently appearing words across the corpus.


This screenshot shows the term “love” being selected in the Corpus Reader, which causes the Word Trends pane to visualize the term’s frequency within a single text rather than across all the texts. The Keywords in Context pane allows for closer inspection of these terms.
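
For readers curious about what sits behind a pane like Word Trends, the basic count can be approximated in a few lines of Python. This is only a sketch, not Voyant’s implementation: the tiny stop word list and the function names `top_terms` and `relative_frequency` are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list for illustration only;
# it is not the list Voyant applies.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "her", "his", "was"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_terms(text, n=5, stop_words=STOP_WORDS):
    """Return the n most frequent words in text, ignoring stop words."""
    counts = Counter(w for w in tokenize(text) if w not in stop_words)
    return counts.most_common(n)


def relative_frequency(term, text):
    """Return the share of all tokens in text that match term."""
    words = tokenize(text)
    return words.count(term.lower()) / len(words) if words else 0.0
```

Calling `top_terms` on the whole corpus gives the across-corpus view, while calling `relative_frequency("love", one_text)` on each text separately gives the per-text view shown in the second screenshot.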

It’s worth noting that the way we used the corpus in this workshop doesn’t do it justice. While Voyant recognizes TEI, it only strips out administrative information, such as who encoded the text and other details that are not semantically relevant for analysis and visualization. Unfortunately, the tool cannot select individual TEI elements and explore them in detail.
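
If you do want to work with individual TEI elements, a short script outside of Voyant is one option. The sketch below uses Python’s standard `xml.etree.ElementTree` and assumes the files use the TEI P5 namespace (`http://www.tei-c.org/ns/1.0`); I have not confirmed that this matches the VWWP files, and older TEI files without a namespace would need the prefix removed. The function name `element_texts` and the sample document are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the documents use the TEI P5 namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample, not taken from the VWWP corpus.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Sample</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <lg><l>My love is like a rose,</l><l>and love endures.</l></lg>
    <p>A prose paragraph about duty.</p>
  </body></text>
</TEI>"""


def element_texts(tei_xml, tag):
    """Return the text of every <tag> element inside <text>, skipping the header."""
    root = ET.fromstring(tei_xml)
    text_el = root.find(".//tei:text", TEI_NS)
    if text_el is None:
        return []
    # Join nested text nodes, then collapse runs of whitespace.
    return [
        " ".join("".join(el.itertext()).split())
        for el in text_el.iterfind(f".//tei:{tag}", TEI_NS)
    ]


verse_lines = element_texts(SAMPLE, "l")  # only lines of verse
paragraphs = element_texts(SAMPLE, "p")   # only prose paragraphs
```

The resulting lists of strings could then be pasted or uploaded into Voyant, or counted directly, so that verse and prose (for example) are analyzed separately.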

I’ll be sure to post our final slides and workshop materials when they become available!


