Text Mining “Interchange: The Promise of Digital History”

One of the readings assigned to participants in the Doing Digital History Institute was the 2008 Journal of American History (JAH) article, “Interchange: The Promise of Digital History.” The “Interchange” sought to explore the burgeoning field of digital history, tackling questions of definition, pedagogy, forms of institutional support, possible effects on the meaning and process of historical research, and the resonance digital history might have with the various publics who encounter it.

Today, Fred Gibbs introduced us to the concepts of data and text mining, and so I decided to see if I could apply what I learned to the JAH article. Would interesting patterns emerge from the various interviews that appeared? My initial work focused on converting the article into a plain text (.txt) file. I then divided it into a variety of smaller files: the questions posed by the JAH editor, all of the responses offered by each individual participant, and each question accompanied by its related set of answers (and here I do wish I knew how to automate this process instead of cutting and pasting for an hour). In the end, I had one large question file, eight participant files, and nine individual question files, as well as the original .txt file. I then ran the various computations through Voyant Tools.
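For anyone hoping to skip the hour of cutting and pasting, the splitting step could be automated along the following lines. This is only a sketch in Python, not what was done for this post: it assumes that every contribution in the .txt file opens with a speaker label followed by a colon at the start of a line (for example "JAH:"), and the function name split_by_speaker and the label format are hypothetical.

```python
import re
from collections import defaultdict


def split_by_speaker(text, speakers):
    """Group the text of each turn under the speaker label that opens it.

    Assumption: a turn starts with "<label>:" at the beginning of a line.
    Adjust the pattern if the transcript marks speakers differently.
    """
    labels = "|".join(re.escape(s) for s in speakers)
    pattern = re.compile(r"^(%s):\s*" % labels, re.MULTILINE)
    matches = list(pattern.finditer(text))
    turns = defaultdict(list)
    for i, match in enumerate(matches):
        # A turn runs from the end of its label to the start of the next label.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        turns[match.group(1)].append(text[match.end():end].strip())
    return dict(turns)


def write_speaker_files(turns, prefix="interchange_"):
    """Write one plain text file per speaker (file names are hypothetical)."""
    for speaker, parts in turns.items():
        with open(prefix + speaker.replace(" ", "_") + ".txt", "w", encoding="utf-8") as f:
            f.write("\n\n".join(parts))
```

Calling split_by_speaker on the full text with the list of labels returns one list of turns per speaker, and the "JAH" entry then holds all of the questions. Grouping each question with its answers would need one more pass that starts a new group at every "JAH" turn.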

Caveat: it is important to note that I have no idea how this interview was edited. I am assuming that the final printed comments reflected the overall contributions of each of the interviewees, but I most certainly cannot be sure.

Overall, one can see the general emphasis of the article through a simple word cloud:

The cloud specifically excludes common English words, as well as other common, but probably less helpful, words: digital, history, historians, historical, each of the authors’ names (which were used to signal the start of each of their contributions in the article), and the word “JAH”, which was used before each question. We are left with the following twenty most frequently used words:
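The counts behind a cloud like this can be approximated outside Voyant Tools. The Python sketch below is an illustration only: the tokenizer is a simple regular expression rather than Voyant's own, the stop list shown is a small stand-in for a full English list, and the function name top_words is hypothetical.

```python
import re
from collections import Counter

# Stand-in stop list: a real run would use a full English list plus the
# extra exclusions named above (the authors' names and "JAH" included).
STOP_WORDS = {
    "the", "a", "an", "and", "of", "to", "in", "is", "it", "that",
    "digital", "history", "historians", "historical", "jah",
}


def top_words(text, stop_words=STOP_WORDS, n=20):
    """Return the n most frequent words that are not in the stop list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stop_words)
    return counts.most_common(n)
```

Because the tokenizer and stop list differ from Voyant's, the counts from a sketch like this will not necessarily match the figures reported here.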

Certainly a couple of themes begin to emerge from this basic analysis. First, the emphasis digital historians placed on the field being “new” is clearly apparent, especially as “new” is frequently followed by “media” or “digital technology” in the article. Given the article’s goal of identifying and describing digital history as a new enterprise historians were embarking upon, this may not be surprising; however, the heavy use of the word does supply evidence for Fred Gibbs’ point today about the somewhat overstated dichotomy between “traditional” (textually-based) history and “new” (digitally-based) history.

The interviewees also signaled a strong interest in thinking about “research” and “scholarship,” both of which appear more frequently than the word “student.” What might be even more interesting is the way that “research” and “scholarship” appear throughout the article, whereas “student” is mainly concentrated in the early questions on pedagogy:
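A rough way to check how a word is spread across the article is to compute its relative frequency in each segment, in document order. The Python sketch below assumes the article has already been divided into one string per question; the function name term_trend is hypothetical, and it matches exact word forms only, so "student" and "students" are counted separately.

```python
import re


def term_trend(segments, term):
    """Relative frequency of one word in each segment, in document order."""
    trend = []
    for segment in segments:
        tokens = re.findall(r"[a-z']+", segment.lower())
        # Share of this segment's words that are the term (0.0 if empty).
        trend.append(tokens.count(term) / len(tokens) if tokens else 0.0)
    return trend
```

A word concentrated in the early questions would show high values at the start of the list and values near zero afterwards, while a word used throughout would stay roughly level.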

Yet, despite the importance of words like “new”, “research”, and “scholarship” in the printed discussion, it is also worth noting how similarly the remaining words appear in frequency. In fact, 86% of the most frequently used fifty words in the article fall within one standard deviation of the mean usage of those fifty words (85% if the top three results are excluded). Thus participants appear to have been equally interested in most of the topics covered in the article.
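The standard deviation check can be reproduced from a list of word counts. The Python sketch below is a guess at the calculation rather than a record of it: it uses the population standard deviation (a sample standard deviation would give slightly different bounds), and the function name share_within_one_sd is hypothetical.

```python
from statistics import mean, pstdev


def share_within_one_sd(counts):
    """Fraction of counts lying within one standard deviation of the mean."""
    mu = mean(counts)
    sd = pstdev(counts)  # assumption: population, not sample, deviation
    inside = [c for c in counts if abs(c - mu) <= sd]
    return len(inside) / len(counts)
```

Passing the counts of the fifty most frequent words gives the first figure; dropping the three largest counts before the call gives the second.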

If we examine the responses by interviewee, though, we do see some interesting differences begin to emerge. First, it is worth noting that each interviewee is not represented equally in the interview:

43% of all the text is supplied by two individuals. Consequently, it might be useful to see if Cohen and Thomas had a particular effect on the overall pattern of words in the article.
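Each interviewee's share of the text can be computed from the per-participant files. The Python sketch below measures share by whitespace-separated word count, which is an assumption (the share could equally be measured in characters), and the function name word_shares is hypothetical.

```python
def word_shares(texts_by_speaker):
    """Map each speaker to their fraction of the total word count."""
    totals = {name: len(text.split()) for name, text in texts_by_speaker.items()}
    grand_total = sum(totals.values())
    return {name: count / grand_total for name, count in totals.items()}
```

Summing the two largest values in the result gives the combined share of the two most prolific interviewees.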

A graph of the seven most common words broken down by interviewee reveals some important trends:

First, Cohen’s responses overwhelmingly focused on “research”, “new”, “web”, “scholarship,” and “work.” Given his position as Director of the Roy Rosenzweig Center for History and New Media at the time of the interview, this is probably not very surprising. “Medium” and “scholarship” also appear quite frequently in Thomas’ interview, which given his role in the Valley of the Shadow project should also not be too surprising.

Fewer clear trends appear when these seven highly ranked words are analyzed by question:

In the end, this post is mainly an experiment to see if I could use the tools we were taught, but the data does allow for some broad conclusions to be drawn. Overall, it seems that the interview and interviewees were mainly concerned with thinking about the “newness” of digital history in 2008 – figuring out what it might mean particularly for scholarship, though with a reasonably strong emphasis on pedagogy. It is worth noting that certain topics that have dominated the 2014 Institute discussion, such as the place of public history and museums within and around digital history, are present, but are much lower in the list of frequently used 2008 words. Moreover, “questions”, “methods”, and “process” are also quite low in the list, possibly indicating some uncertainty about these topics six years ago.

For simple comparison, one can find a Wordle compiled by Spencer Roberts of participants’ blog posts, and one can see a much stronger emphasis placed on “students”, “project,” “comments”, and “sources.” Whether this change signals a shift in DH conversation, results from who the participants of the Institute are (mainly from Master’s-granting degree programs, instead of larger research universities), or arises from the structure of the Institute is beyond the scope of this overly long post.

 
