
Decoding the Words: A Textual Data Analysis of Jane Addams' Papers

The Jane Addams Papers is a project built largely on textual documents. Its letters and documents not only record the activities and public addresses of Jane Addams and her associates but also point toward the cultural and ideological currents that underlie the texts. From an academic perspective, textual data carries rich information about social and historical narratives. In this context, textual analysis helps us understand how Addams and her associates communicated their ideologies, thoughts, and plans through their correspondence.

There are different methods of textual analysis, and each yields different information. One such method is visualizing data with word clouds. A word cloud is a cluster of words drawn in different sizes according to how often each word occurs in the source text: the more often a word occurs, the bigger and bolder it appears. For instance, the following word cloud shows that “women” is the most frequent word in the texts used to make it.

In this analysis, word clouds are built from subsets of the original dataset, each covering a chosen time range, so that word clouds for different time ranges can be compared. For instance, the following word cloud is based on documents from the year 1912. The first step is to combine every word in the chosen attribute into a single string (a "sentence"). Here the transcription text is used: every word in each 1912 transcription is joined into one string. That string is then used to create a word cloud, which counts the frequency of each word and sizes it according to its number of occurrences.
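The steps above can be sketched in Python. This is a minimal sketch, not the author's code: the record layout (a list of dicts with hypothetical `year` and `text` keys), the tiny stop-word list, and the linear font-size scaling are all assumptions made for illustration. It joins one year's transcriptions into a single string, counts word frequencies, and maps each count to a font size, which is the core of what a word cloud draws.

```python
import re
from collections import Counter

# Hypothetical records standing in for the project's dataset;
# the real column names and contents may differ.
records = [
    {"year": 1912, "text": "Women and the Progressive Party support votes for women."},
    {"year": 1912, "text": "Women will vote; the party platform says so."},
    {"year": 1913, "text": "Testimony before Congress on woman suffrage."},
]

# A deliberately tiny stop-word list (an assumption; real analyses use a fuller one).
STOPWORDS = {"and", "the", "for", "will", "says", "so", "on", "before"}


def year_text(records, year, column="text"):
    """Join every value of `column` for one year into a single string."""
    return " ".join(r[column] for r in records if r["year"] == year)


def word_frequencies(text):
    """Lowercase the text, split it into words, drop stop words, and count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def font_sizes(freqs, min_size=10, max_size=60):
    """Scale each word's count linearly into [min_size, max_size]."""
    if not freqs:
        return {}
    low, high = min(freqs.values()), max(freqs.values())
    span = (high - low) or 1  # avoid dividing by zero when all counts are equal
    return {w: min_size + (c - low) * (max_size - min_size) / span
            for w, c in freqs.items()}


freqs = word_frequencies(year_text(records, 1912))
print(freqs.most_common(2))  # [('women', 3), ('party', 2)]
print(font_sizes(freqs)["women"])  # 60.0
```

A plotting library would then handle the layout. For example, the `wordcloud` package's `WordCloud.generate_from_frequencies` accepts a word-to-frequency mapping, though the post does not say which tool was actually used.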

Word clouds thus turn text-based data into measurable analytics and can reveal the most prominent terms in a given text. Without such analytical tools, it would be very difficult to compare the content of the texts by reading each description individually. Comparing the 1912 word cloud above with the 1913 word cloud depicted below, we can see some differences, though not many. The most obvious are the appearance of “Progressive Party” in the 1912 word cloud and of “Congress” in the 1913 one. A probable cause of “Progressive Party” appearing in 1912 is Jane Addams's support for Theodore Roosevelt's presidential run that year as the Progressive Party's candidate. The appearance of “Congress” in 1913 can be attributed to the Congressional Testimony on Woman Suffrage given in Washington, DC, that year. It is also no surprise that “women” is the most frequent word in both years, since the dataset was extracted using the “women suffrage” keyword.
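The visual comparison of two years can also be made explicit. Here is a minimal sketch, assuming each year's text has already been joined into one string: it takes the top-n words of two years and reports which are shared and which appear in only one year. The function names, the sample texts, and the cutoff `n=3` are hypothetical. Because it counts single words, a phrase such as “Progressive Party” would surface as “progressive” and “party” separately unless word pairs are counted too.

```python
import re
from collections import Counter


def top_words(text, n=3, stopwords=frozenset()):
    """Return the set of the n most frequent words in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return {word for word, _ in counts.most_common(n)}


def compare_years(text_a, text_b, n=3, stopwords=frozenset()):
    """Split the two top-n word sets into shared and year-specific words."""
    a = top_words(text_a, n, stopwords)
    b = top_words(text_b, n, stopwords)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# Hypothetical joined texts for two years (not real project data).
text_1912 = ("Women women suffrage progressive women suffrage "
             "party progressive women suffrage")
text_1913 = ("Women suffrage congress women testimony congress "
             "women suffrage women suffrage")

result = compare_years(text_1912, text_1913)
print(sorted(result["shared"]))  # ['suffrage', 'women']
print(result["only_a"], result["only_b"])  # {'progressive'} {'congress'}
```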

To understand the relationship between words and the years in which they were most frequently used, a series of charts was created. The main dataset is first divided into subsets, one per year. The program then searches for a chosen word in every value of a chosen attribute and counts its occurrences. A stem chart is created with the year on the x-axis and the frequency of the chosen word in that year on the y-axis. The first chart shows the occurrences of a chosen word, “Progressive” in this case, across the years, based on the text attribute of the dataset.
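The per-year counting step can be sketched as follows. This is a minimal sketch under stated assumptions: the records and their `year`/`text` keys are hypothetical, and matches are counted as whole words, ignoring case, which may differ from how the original program searched.

```python
import re
from collections import defaultdict

# Hypothetical records; the real dataset's "Text" column may be named differently.
records = [
    {"year": 1910, "text": "A progressive reform club met in Chicago."},
    {"year": 1912, "text": "The Progressive Party convention; progressive delegates cheered."},
    {"year": 1912, "text": "Notes on the Progressive platform."},
    {"year": 1913, "text": "A letter about Congress."},
]


def yearly_counts(records, word, column="text"):
    """Count whole-word, case-insensitive matches of `word` per year."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    counts = defaultdict(int)
    for r in records:
        # Adding 0 still creates the key, so years without a match stay on the axis.
        counts[r["year"]] += len(pattern.findall(r[column]))
    return dict(sorted(counts.items()))


series = yearly_counts(records, "Progressive")
print(series)  # {1910: 1, 1912: 3, 1913: 0}
```

The years and counts could then be drawn as a stem chart, for example with `matplotlib.pyplot.stem(list(series), list(series.values()))`. Whole-word matching is a design choice: plain substring matching would also count words such as “Progressives”.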

As the chart shows, “Progressive” was used most frequently in 1912 and only once or twice in 1910, 1913, 1914, and 1916. “Congress,” on the other hand, appears frequently, with the most occurrences in 1915. Delving further into the 1915 texts that mention “Congress” and examining their corresponding descriptions shows the contexts in which the word was used; these are listed in the table below.
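This drill-down step can be sketched as a simple filter: keep the records from one year whose text contains the chosen word, and return their descriptions. It is a minimal sketch; the `description` field, the records, and the function name `mentions` are hypothetical placeholders, not the project's actual data or code.

```python
import re

# Hypothetical records with a descriptive field; names and values are placeholders.
records = [
    {"year": 1915, "text": "Letter that mentions Congress.", "description": "Description A"},
    {"year": 1915, "text": "Letter about travel plans.", "description": "Description B"},
    {"year": 1914, "text": "Note addressed to Congress.", "description": "Description C"},
]


def mentions(records, word, year, column="text", detail="description"):
    """Return `detail` for every record from `year` whose `column` contains `word`."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [r[detail] for r in records
            if r["year"] == year and pattern.search(r[column])]


print(mentions(records, "Congress", 1915))  # ['Description A']
```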

These figures were produced by taking a chosen word (“Progressive” and “Congress” here) and counting its occurrences in each year within the chosen column, which in both cases is “Text”.

Examining the textual data of the Jane Addams Papers opens up fresh interpretations of the sentiments and patterns in a piece of writing. Sorting the text not only illustrates the themes or topics under discussion but also reveals information about a person's life and what mattered to them, which we are unlikely to appreciate without analytical tools. Overall, word-based textual data can provide valuable insights into the way people communicate and the issues that are important to them.


This blog post was composed in the Fall of 2022 by Anushka Acharya, a senior Computer Science major at Ramapo College of New Jersey.