Emily Dickinson and the Meter of Mood: An Experiment in Text Analysis

Article by Jen Looper, Principal Cloud Developer Advocate Lead at Microsoft on the Next Generation Team.

I tie my Hat — I crease my Shawl —
Life’s little duties do — precisely —
As the very least
Were infinite — to me —

I put new Blossoms in the Glass —
And throw the old — away —
I push a petal from my gown
That anchored there — I weigh
The time ’twill be till six o’clock
I have so much to do —
And yet — Existence — some way back —
Stopped — struck — my ticking — through —

(443)

The enigmatic young lady staring directly into our eyes in the famous daguerreotype pictured above challenges us. What is she thinking, with her slightly pursed lips, small nosegay, and plain dress? Perhaps she is composing another of the nearly 1800 poems she wrote in her lifetime. Perhaps she is thinking of her garden, where she gathered some fresh flowers. Perhaps she is pondering the many things she has to do before six o’clock.

The poet Emily Dickinson, pictured above, is, of course, drawing on the “carpe diem” trope in her poem. As Robert Pinsky noted, the poet is well known for her somber, “steely perception” that time runs on. Her style and vocabulary have led readers of her work to portray her as a negative, austere spinster poet. Critics have also dismissed her as a hysterical or depressive recluse. Known as the “lady in white”, Dickinson was better known in her lifetime for her garden than for her poetry. It is easy to read through some of her poems and find darkness:

A train went through a burial gate,
A bird broke forth and sang,
And trilled, and quivered, and shook his throat
Till all the churchyard rang;

And then adjusted his little notes,
And bowed and sang again.
Doubtless, he thought it meet of him
To say good-by to men.

Emily Dickinson is now recognized as one of America’s great poets, on a par with Walt Whitman. But what was she really feeling when she wrote? Is there any way to know?

According to the Emily Dickinson Lexicon, the poet wrote “over 1,789 poems from 1850–1886. She wrote over 1,046 letters from 1842–1886. The collected poems contain over 9,275 unique words and nearly 100,000 word occurrences.” Her most prolific period as a poet was from 1858–65.

While some of her poems appeared during her life, most were published after her death. Her younger sister, Lavinia, discovered the bulk of her poetry in a box of papers. The poems were written on scraps of paper and the backs of envelopes, and some might even have wrapped a bouquet from her celebrated garden; eventually they were bound into booklets (“fascicles”). The digitized collection kept by Amherst College shows the varied formats of the writing. Early partial editions, heavily edited, were published in the 1890s and again in the 1920s. The first scholarly edition of her poetry is the 1955 edition of The Complete Poems of Emily Dickinson, edited by Thomas H. Johnson.

From a stylistic standpoint, Dickinson’s poetics were revolutionary. She was prone to writing poetry without titles and with nonstandard punctuation. Her poetry is probably best known for its fascinating use of ‘slant rhyme’, pairing line endings by either shared consonants or vowels. In the sample below, “rides” pairs with “is” by the shared ‘s’ consonant. The effect is often dissonant and startling (and in this case, reminiscent of a snake’s hiss), but not without charm:

A narrow fellow in the grass
Occasionally rides;
You may have met him,– did you not?
His notice sudden is.

Scholarly reception of the poet has evolved over the years just as the editions of the poetry have evolved. Early critics dismissed the work after its initial publication. Dickinson’s rejection of 19th-century form, however, led more recent literary critics to label her a modernist. Feminist readings such as that of Adrienne Rich have raised her as an iconic woman writer. Currently, a large body of articles continues to uncover new facets of this fascinating author. The Emily Dickinson Journal is dedicated to her scholarship and is sponsored by the Emily Dickinson International Society (EDIS).

What questions can we ask of this historical poetic corpus? How can data mining and machine learning techniques help us unlock new aspects of an author whose work defied categorization?

Let’s start with the data. This poet’s data comprises thousands of words in nineteenth-century American English. If we could discover, scrape, or otherwise gather a good dataset of the nearly 1800 poems, it would make for an interesting exercise in data mining. But what thorny question about Dickinson can data mining answer?

Can data mining help answer controversial questions about Dickinson’s state of mind as reflected in her poetry? An interesting line of inquiry is suggested by John McDermott, MD. In an article in the American Journal of Psychiatry, the author suggests that Dickinson may have suffered from bipolar and seasonal affective disorders. He proposes that she was strongly affected by changes in the seasons and that her moods are reflected in her output. Laying aside the risk of doing a disservice to the artist by attempting to analyze her mental health purely by text analysis, let’s see if, by means of data mining and machine learning, we can predict, by the sentiment of a poem, in what season of the year it might have been authored.

Acquiring the Dataset

The problem with data science is…it requires data! To work with this poetry in a digital setting, you will need to find a way to acquire an adequate dataset from a reliable resource. Not all of the Emily Dickinson poetry available on the internet is suitable for our purpose.

Ideally, there would exist a high-quality online website containing an authoritative version of each poem available to be gathered into a dataset. Various websites, however, offer content of varied quality.

  • The PoetryDB API is a handy tool that allows a user to gather poems via an endpoint provided by the API (Application Programming Interface). An API lets a program, or a web browser, request data from a database over the web. To get a listing of several poems by Emily Dickinson, visit PoetryDB in a web browser with an appropriately formatted URL. This database, however, appears to be crowdsourced; it is unclear where the poems came from. The purpose of this database is to inspire today’s poets, not necessarily to provide scholars with datasets.
  • The 1891 Loomis edition of poetry is available via Project Gutenberg as an HTML page. However, since it is well documented that the poems were heavily edited in this edition, it is not as useful for the data scientist bent on analyzing vocabulary.
  • The 1924 Bianchi edition available online contains 593 poems, thus it is incomplete. However, it is available on Bartleby’s website.
  • The 1955 Johnson edition was the first modern edition of the full corpus of poems. Importantly, Johnson attempted to assign a chronology by year to the poetry. A dataset is available but it would need considerable cleaning to render it usable. It is presented in one flat file and contains typographical errors.
  • The first truly scholarly edition of the poems is the three-volume 1998 Variorum edition by R. W. Franklin. It is available to the would-be data scientist by means of the brilliant edickinson.org project. This online database contains a wealth of data obtained from the other editions and most importantly the Variorum. It includes metadata by Franklin who attempted to assign seasons and years to many of the poems. This is a treasure trove!
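As a rough illustration of the PoetryDB option listed above, the sketch below builds an author query and flattens the JSON reply into one row per poem. The URL pattern `https://poetrydb.org/author/<name>` and the `title` and `lines` fields are assumptions about PoetryDB's public API, and the helper names are hypothetical; check them against the service before relying on them.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the public PoetryDB service.
POETRYDB_BASE = "https://poetrydb.org"


def author_url(author):
    """Build the author-search URL, percent-encoding spaces in the name."""
    return f"{POETRYDB_BASE}/author/{quote(author)}"


def poems_to_rows(payload):
    """Flatten a PoetryDB-style JSON list into one dict per poem."""
    rows = []
    for number, poem in enumerate(payload, start=1):
        rows.append({
            "id": number,
            "title": poem.get("title", ""),
            # Each poem is assumed to arrive as a list of lines.
            "poem": "\n".join(poem.get("lines", [])),
        })
    return rows


def fetch_poems(author):
    """Download and flatten every poem listed for an author (network call)."""
    with urlopen(author_url(author), timeout=30) as response:
        return poems_to_rows(json.load(response))


# Usage (requires network access):
# rows = fetch_poems("Emily Dickinson")
```

Keeping the parsing step separate from the download makes it easy to inspect a saved response by hand, which matters here because the provenance of the crowdsourced texts is uncertain.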

Unfortunately, the dynamic panel layout of the edickinson website favors analyzing Dickinson’s handwritten copy over the text itself, which is only available after opening a tab. In addition, the Variorum edition includes all the variants of the poems and many notes, so the data would need to be carefully cleaned to reflect an authoritative version for linguistic analysis. Still, given the quality and completeness of this edition and the website, it is a critical tool for our project.

R. W. Franklin, most notably for our purposes, has attempted to assemble and tentatively date the poems. His edition is as accurate as possible, given Dickinson’s habits of rewriting poems and destroying the originals. Franklin, who dedicated years to untangling this poetic corpus, attempted to assign seasons or parts of a year to the poems based on observation of minutiae such as stamps on paper scraps. McDermott relied on this edition to help him plot the rise and fall in output, season over season, of Dickinson as a writer.

Our method, then, to determine whether Dickinson’s vocabulary reflects seasonal mood changes, will have to rely on the incomplete, non-scholarly digital edition of her poetry on Bartleby.com. We can then cross-check these poems against the Variorum edition via edickinson.org. In doing so we can test whether perceived ‘negative’ or ‘positive’ vocabulary, as determined by a machine learning algorithm, can be plotted against Franklin’s estimation of the season in which a poem was written.

Now that we have decided which dataset we will use, and how we will cross-check it for periodicity, we can start the process of data mining.

Note: there are various ways of acquiring data from an online source, whether via an API call to a database such as the poetrydb or some kind of scraping technology. Carefully check a site’s terms of service to see whether web scraping is permitted. If not, you will be obliged to gather your data manually into a private spreadsheet.

Add the seasons

Once you have your Excel file ready with one poem per row, you are ready to add their seasons, a somewhat tedious and manual task. Add a column called ‘seasons’ and populate it with data from the Variorum edition. You will need to search for each poem in this edition as listed on the edickinson.org project. Once you find the poem, drill down into its metadata and add Franklin’s best guess as to the season in which it was written. The terms he uses are:

  • Early in the year (we surmise Jan-March)
  • Second half of the year (we surmise Oct-Dec)
  • Late in the year (we surmise Nov-Dec)
  • Spring
  • Summer
  • Autumn
  • Winter

You can use the terms ‘early/second/late/spring/summer/autumn/winter’ for consistency, and ‘none’ where there is no guess.
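If you would rather paste Franklin's wording into the spreadsheet and shorten it afterwards, a small lookup can apply the labels consistently. This is only a sketch: the function name is hypothetical, and any note that does not exactly match one of the seven phrases above falls back to ‘none’.

```python
# Short labels suggested in the text, keyed by Franklin's phrasing (lower-cased).
SEASON_LABELS = {
    "early in the year": "early",
    "second half of the year": "second",
    "late in the year": "late",
    "spring": "spring",
    "summer": "summer",
    "autumn": "autumn",
    "winter": "winter",
}


def normalize_season(note):
    """Map a free-text dating note to one of the short labels, or 'none'."""
    if not note:
        return "none"
    cleaned = " ".join(note.lower().split())  # collapse case and stray spaces
    return SEASON_LABELS.get(cleaned, "none")
```

For example, `normalize_season("Early in the year")` gives `"early"`, while an empty cell gives `"none"`.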

Working with the Data in a Notebook

On your local computer, create a Jupyter notebook called emily.ipynb. Create a folder called input and add your Excel spreadsheet, saved as a .csv file. A good option for working with notebooks locally is Visual Studio Code with the Jupyter extension installed. You will use NumPy and pandas, so import those at the top:
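A first cell might look something like the sketch below. The file name `poems.csv` is a placeholder, and the `seasons` column name follows the column added in the previous step; adjust both to match your own spreadsheet.

```python
from pathlib import Path

import numpy as np
import pandas as pd

# Hypothetical file name; use whatever you called your exported spreadsheet.
CSV_PATH = Path("input") / "poems.csv"


def load_poems(source):
    """Read the poems into a DataFrame with one poem per row."""
    df = pd.read_csv(source)
    # Treat a missing season as 'none', the label used for undated poems.
    df["seasons"] = df["seasons"].fillna("none").str.strip().str.lower()
    return df


# Usage in the notebook:
# df = load_poems(CSV_PATH)
# df.head()
```

Filling empty season cells explicitly is a precaution: depending on how the file was exported, pandas may read blank or `None`-like cells as missing values rather than as the text ‘none’.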

Next, visualize the data to get an idea of its nature. Use Matplotlib to show the most common words in your two columns in a bar chart:
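One way such a chart might be drawn is sketched below. The function name `plotWordFrequency` is the one used in this walkthrough, but its signature, the `top_n` parameter, and the styling choices are assumptions made for illustration.

```python
from collections import Counter

import matplotlib.pyplot as plt


def plotWordFrequency(words, title, top_n=20):
    """Draw a bar chart of the top_n most frequent items in `words`."""
    # most_common() returns (item, count) pairs sorted by descending count.
    pairs = Counter(words).most_common(top_n)
    labels = [word for word, _ in pairs]
    counts = [count for _, count in pairs]

    fig, ax = plt.subplots(figsize=(10, 5))
    ax.bar(labels, counts, color="slateblue")
    ax.set_title(title, fontsize=14)
    ax.set_ylabel("Occurrences")
    ax.tick_params(axis="x", labelrotation=60)
    fig.tight_layout()
    return ax


# Usage, assuming `df` holds the spreadsheet with 'poem' and 'seasons' columns:
# plotWordFrequency(" ".join(df["poem"]).split(), "Most common words")
# plotWordFrequency(df["seasons"], "Season distribution")
# plt.show()
```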

In this code, the data is sorted by how often words are found. Then, the graphs are drawn to the screen with colors, labels, and fonts specified. Experiment with changing fonts and colors to make the chart more readable.

To show the graph, you need to invoke the methods that you just set up. But if you do that, you will show words such as ‘a’, ‘and’, or ‘the’, as they will indeed be the most common. To avoid this, import one more package: nltk. This library is a great resource for natural language processing.
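A minimal sketch of the filtering step follows. NLTK supplies the English stop-word list through `nltk.corpus.stopwords`; for tokenizing, a plain regular expression stands in for NLTK's tokenizer so that the sketch needs no extra tokenizer data. The helper names are hypothetical.

```python
import re

# Matches runs of letters, keeping internal apostrophes (straight or curly).
WORD_PATTERN = re.compile(r"[a-z]+(?:['’][a-z]+)*")


def load_stop_words(language="english"):
    """Fetch NLTK's stop-word list (downloads the corpus on first use)."""
    import nltk
    from nltk.corpus import stopwords

    nltk.download("stopwords", quiet=True)
    return set(stopwords.words(language))


def filter_words(text, stop_words):
    """Lower-case, tokenize, and drop stop words and punctuation."""
    tokens = WORD_PATTERN.findall(text.lower())
    return [token for token in tokens if token not in stop_words]


# Usage, assuming `df` and plotWordFrequency exist in the notebook:
# stop_words = load_stop_words("english")
# words = filter_words(" ".join(df["poem"]), stop_words)
# plotWordFrequency(words, "Most commonly used words in the poems")
```

Because the pattern only matches letters, Dickinson's dashes and other punctuation drop out during tokenizing rather than needing a separate pass.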

Here, you have imported libraries that help you filter out any ‘stop words’ as defined by these packages. Words are also lower-cased and the language is specified. The poetry is ‘tokenized’ into an array of individual words. Punctuation is filtered out. The filtered words are then fed to the plotWordFrequency method so the graph can be drawn.

Figure: Most commonly used words in the poems

Figure: Season distribution

This is a great exercise when trying to discover the ‘flavor’ or ‘feel’ of a corpus, as long as the language is not too archaic. Stopwords exist for many languages or you can create your own; try the technique on literature in other languages.

Keep in mind that this dataset is just a third of the entire corpus in size, which is already a subset of the poet’s large body of work.

Using Python scripts to skim through literary datasets is a great way to introduce yourself to the vocabulary employed by an author. First, you might notice which words are most common in the dataset. You also might note the ‘flatness’ of the graph. The graph is flattened due to the richness of Dickinson’s vocabulary. A useful comparison is a dataset of the 250-odd Beatles song lyrics, where the graph is quite steep. The Beatles tended to repeat words like ‘love’ and had a reduced vocabulary, compared to Dickinson’s rich, varied stock of words.
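The ‘flatness’ of a frequency chart can also be put into numbers. The sketch below uses two simple measures, neither of which comes from the original analysis: the share of all tokens taken up by the most common words, and the ratio of distinct words to total words. A steep chart corresponds to a high top share and a low ratio.

```python
from collections import Counter


def top_share(words, top_n=10):
    """Fraction of all tokens accounted for by the top_n most common words."""
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top_total = sum(count for _, count in counts.most_common(top_n))
    return top_total / total


def type_token_ratio(words):
    """Distinct words divided by total words; higher means more varied."""
    words = list(words)
    return len(set(words)) / len(words) if words else 0.0
```

Both measures depend on the length of the text, so compare samples of similar size.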

The other interesting outcome of this exercise is the revelation of the actual words most common to Dickinson: day, sun, time, life, and heaven all precede night, death, god, and soul. In this corpus, life is more often invoked than death. The most common word here is ‘like’ — probably because of Dickinson’s extensive and rich use of metaphor. Even so, does the vocabulary here equate to darkness and depression?

Another interesting chart is the seasons chart, which helps us understand how much we do NOT know about the season in which each poem was written. The seasonality that McDermott notes in his article is confirmed here: as the year wears on, the number of poems diminishes. But does the nature of the poetry itself change? Determining this is our next task.

The next step as we analyze this poetry is to assign each poem a ‘sentiment’ — an idea of the positive, negative, or neutral tone of the poem. This can be done by hand using natural language processing techniques. A good option is to use a Cognitive Service such as Microsoft’s Text Analytics.

Set up an instance of a Text Analytics service by following this tutorial. Since your data is already in a spreadsheet, you can use Power Automate to skim through the poems in their rows and assign an integer to each poem based on an analysis of its sentiment. While this type of service is generally used to gather product feedback, it’s interesting, and sometimes enlightening, to try it on other types of literature.

Power Automate is a low-code tool that allows you to set up ‘flows’ to perform automated tasks on data. To use it, you will need to add your spreadsheet to OneDrive, a cloud storage provider, so that Power Automate can find it. Convert your spreadsheet to a table in Excel by selecting the data and choosing Insert > Table. A table with column headers will be created.

Then, open Power Automate and create a ‘flow’ to append sentiment from Text Analytics. The flow will go through the spreadsheet, row by row, and assign a perceived sentiment (positive, negative, mixed, or neutral) to each poem.

Figure: Power Automate

Using the flow builder, create a three-part flow. First, use the “Manually Trigger a Flow” block. Attach that to the “List Rows Present in a Table” block. In this block, specify the location of your spreadsheet and the table to analyze in the spreadsheet.

Add one more block: “Apply to each”. In this block, use an ‘AI Builder’ block to add “Analyze Positive or Negative Sentiment in a Text”. Specify the language as English and the text to be ‘poem’, referring to the poem column header.

Attach a block to that AI Builder block: “Update a row”. Specify the key column as ‘id’ and the key value as the ‘id’ column from your spreadsheet. In the ‘sentiment’ area of this block, specify ‘overall text sentiment’ as the value you want the flow to append to your spreadsheet’s sentiment column.

Save and run the flow using the Test panel. For this dataset, it will take several minutes for the flow to run. When it is complete, your ‘sentiment’ column should be populated with the words positive, negative, mixed, or neutral.

You may need to run this flow in batches to ensure that all data is processed.

With your updated spreadsheet, you can now do some more data mining in your notebook to determine if any patterns can be detected to correlate sentiment with seasonality.

Because the spreadsheet contains text rather than numbers, it cannot be plotted directly in anything other than the bar charts you created earlier. To compare patterns, a line chart is preferable. With a line chart, you can visualize several groups of data, superimposed on each other. We want to see if we can find a pattern of seasonality based on a group of poems, batched by season. We can sort these poems by the sentiment that the Text Analytics cognitive service assigned to them.

You can use the pandas package to create a dataframe of poems, grouped by season. In your notebook, sort the data by season and by sentiment:
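A minimal sketch of that grouping is shown below. It uses a cross-tabulation (`pd.crosstab`) in place of an explicit sort, and assumes the spreadsheet columns are named `seasons` and `sentiment` as set up earlier; the function names are hypothetical.

```python
import pandas as pd


def sentiment_by_season(df, as_share=False):
    """Cross-tabulate poems: one row per season, one column per sentiment."""
    return pd.crosstab(
        df["seasons"],
        df["sentiment"],
        # normalize="index" turns each season's counts into proportions.
        normalize="index" if as_share else False,
    )


def plot_sentiment_lines(table):
    """Draw one line per season across the sentiment categories."""
    # Transpose so sentiments sit on the x-axis and each season is a line.
    ax = table.T.plot(kind="line", marker="o", figsize=(10, 5))
    ax.set_xlabel("Sentiment")
    ax.set_ylabel("Poems")
    ax.set_title("Sentiment by season")
    return ax


# Usage, assuming `df` holds the updated spreadsheet:
# table = sentiment_by_season(df)
# plot_sentiment_lines(table)
```

Passing `as_share=True` compares proportions rather than raw counts, which helps when one group, such as the undated poems, is much larger than the rest.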

Figure: A pattern of seasonal sentiment

The result is consistent: no matter what season, the sentiment of the poems evolves in basically the same pattern. We find similar proportions of negative, positive, mixed, or neutral poems written regardless of the season. The outlier is ‘winter’ because there is not enough data — only one poem is assigned ‘winter’ by the editor. There is also one extra ‘early’ data point that is also an outlier. There are many poems not assigned a season (the ‘none’ column) but even this larger group follows the same general curve of the corpus’s sentiment track.

Figure: “none” for season

Artificial intelligence is not needed to understand that, by and large, the predominant sentiment in Dickinson’s corpus, no matter the season, was indeed negative, although in the Spring, specifically, she produced a broader mix of ‘mixed’ and ‘neutral’ sentiments in her poetry. To get a more exact, detailed analysis against which the dataset categorized without season might be compared, we could turn again to Sentiment Analysis cognitive services to get a sentence-by-sentence reading of the poems which would clear some of the ‘mixed’ determination.

There are many questions that remain unanswered in this exercise. Was Franklin at all influenced by the content of the poems in his task of determining their seasons? Did the Bianchi edition cherry-pick a subset of poetry based on a particular aesthetic, mood, or sentiment? Is a Cognitive Service that is designed to analyze business-oriented text the proper tool for a poetic corpus? Does Dickinson’s experimentation with sentence form influence or confuse the Cognitive Service? Would a more custom solution for stop words and sentiment analysis work better for this dataset? And is it even appropriate to try to guess the mood of a reclusive poet writing over 150 years ago?

When applying data science and machine learning techniques to a historical dataset, all these questions should be kept in mind. The curious humanist is well-served by researching the nature of the dataset and accounting for its shape before trying these techniques. Dickinson, in her own work, asked whether she and her poetry were ‘alive’, seeking proof of her own existence through her pen:

I am alive — because
I do not own a House —
Entitled to myself — precise —
And fitting no one else —

If we can be excused for trying to fit the literary endeavors of this enigmatic and extraordinary woman into narrow confines, we still must allow that certain patterns can be determined. Fascinatingly, her vocabulary points to less dark imagery than would be expected given the negative overtones. The patterns that emerge when charting a season’s sentiment, however, show a remarkable similarity regardless of season, at least as calculated by a pre-trained text analytics tool.

Bibliography

Cynthia L. Hallen (2001). “At Home in Language: Emily Dickinson’s Rhetorical Figures.” Emily Dickinson at Home: Proceedings of the Third International Conference of the EDIS. Eds. Gudrun M. Grabher and Martina Antretter. Innsbruck, Austria: Wissenschaftlicher Verlag Trier, 2001, 201–222.

John McDermott (2001). “Emily Dickinson Revisited: A Study of Periodicity in Her Work.” https://ajp.psychiatryonline.org/doi/full/10.1176/appi.ajp.158.5.68.

Marianne Novy (1990). Women’s Re-visions of Shakespeare: On the Responses of Dickinson, Woolf, Rich, H.D., George Eliot, and Others. University of Illinois Press. p. 117.

Adrienne Rich (1975), “Vesuvius at Home: The Power of Emily Dickinson”.

Lena Christianson (2007). Editing Emily Dickinson. Taylor and Francis.

R. W. Franklin (1967). Editing of Emily Dickinson: A Reconsideration. University of Wisconsin Press.

With grateful recognition of Dmitry Soshnikov’s review and tidy-up of parts of the Python!

Originally posted here. Reposted with permission.

About the Author: Jen Looper is a Principal Cloud Developer Advocate Lead at Microsoft on the Next Generation Team and a Google Developer Expert with over 20 years’ experience as a web and mobile developer, specializing in creating cross-platform mobile and web apps. She’s a multilingual multiculturalist with a passion for web technologies, applied machine learning and discovering new things every day. With a PhD in medieval French literature, Jen’s area of focus is curriculum development and the application of sound pedagogy to technical topics. She is the founder and CEO of Front-End Foxes, Inc., an international nonprofit charity that promotes diversity in front-end developer communities. Visit Front-End Foxes Inc at https://www.frontendfoxes.org and learn about our bootcamp for women at https://frontendfoxes.school. Visit Jen’s personal site at https://www.jenlooper.com, or connect via Twitter @jenlooper.
