David Patrick McKenzie

Historian working in academic, digital, and public forms

Month: October 2011

Data Mining & Distant Reading: Valuable Tools, but Merely Tools

This week’s readings (scroll to Week 10) concerned using digital technology to “read” texts in different ways.

I use the term “read” in quotation marks to draw attention to it, as this is not what many of us colloquially call reading–that is, what you are doing now, going over my post with your eyes. That term nonetheless applies–it describes what, for example, Google is doing with this post, going through it with algorithms to fish certain information out of it.

For me, the readings harkened back to those from week 3, particularly Susan Hockey’s “History of Humanities Computing.” In my post for that week, I mentioned my surprise, based on my own experience, at how long a history humanities computing had. Through most of that history, computers had been used for production of knowledge rather than its dissemination, beginning with Father Busa’s use of punchcards to index the works of Thomas Aquinas. This week’s readings focused on new, and not-so-new, ways of using digital technology in humanities research, particularly with texts.

Digital technology has aided knowledge production in the humanities by helping us with the problem of quantity. Besides the basic function of searching through mountains of material to pull out what we need, the technology enables us to find patterns and quantities in the material itself.

As the readings all make clear, however, these tools are merely tools–means to an end, not ends in themselves. Nor should they be ends in themselves. To show that, I’ll use an example from my own work here.

For my American Revolution seminar at GW in 2006, I wrote a paper comparing ideology in the American Revolution and the contemporaneous Tupac Amaru Rebellion in Peru. Referencing other works’ historiography, I stated that interest in the Tupac Amaru Rebellion had picked up in the 1960s and 1970s. I revised that paper for my Ph.D. program writing sample in late 2010–just after the debut of Google’s N-gram Viewer.

So just for fun, I used the N-gram Viewer to find instances of the term “Tupac Amaru” in the English and Spanish corpuses since 1780. The results largely bore out what the historiography said: at least in English, a rise in mentions of that combination of terms in the 1960s and 1970s. Interestingly, though, the Spanish corpus shows a rise–indeed, a peak–in the 1950s.
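For readers curious about what a tool like the N-gram Viewer is doing when it charts a phrase over time, here is a minimal sketch in Python. It is not Google’s implementation: the four-sentence corpus, the crude tokenizer, and the function name phrase_frequency_by_year are all assumptions made up for illustration. The idea is simply to count how often a phrase appears in each year’s texts, relative to all phrases of the same length published that year.

```python
import re
from collections import defaultdict

# Hypothetical toy corpus of (publication year, text) pairs, invented for
# illustration only. A real corpus would hold millions of scanned books.
TOY_CORPUS = [
    (1955, "The rebellion of Tupac Amaru shook the viceroyalty."),
    (1968, "Tupac Amaru led a rebellion in Peru. Historians revisit Tupac Amaru today."),
    (1968, "A study of colonial trade in the Andes."),
    (1996, "Tupac Amaru Shakur released a new album."),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens (deliberately crude)."""
    return re.findall(r"\w+", text.lower())


def phrase_frequency_by_year(corpus, phrase):
    """Return {year: share of that year's n-grams that match the phrase}."""
    target = tuple(tokenize(phrase))
    n = len(target)
    hits = defaultdict(int)    # matches of the phrase, per year
    totals = defaultdict(int)  # all n-grams of the same length, per year
    for year, text in corpus:
        tokens = tokenize(text)
        for i in range(len(tokens) - n + 1):
            totals[year] += 1
            if tuple(tokens[i:i + n]) == target:
                hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


if __name__ == "__main__":
    for year, share in phrase_frequency_by_year(TOY_CORPUS, "Tupac Amaru").items():
        print(year, round(share, 3))
```

Note that the count is blind to meaning: it cannot tell the leader of an eighteenth-century rebellion from anyone else who shares the name, so a chart built this way is a starting point for questions rather than an answer.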

As Dan Cohen correctly points out, using this tool is merely a start. Indeed, it leads to a host of other questions. For example, why do the English and Spanish corpuses have their peaks at different times? As Franco Moretti does with 18th- and 19th-century English novels, we need to look at the social contexts of those times to understand those peaks. In the case of Tupac Amaru, the rise of the term, in the English corpus at least, coincides–not coincidentally–with the rise of anticolonial movements and subaltern history. That’s what the historiographies in recent works said, at least. Why an earlier rise of the term’s frequency in the Spanish corpus? That is a question for further research.

To tease out other issues, we need to look more closely at the works cited. For example, the English corpus shows a rise of that combination of terms in the 1990s–not surprisingly, corresponding with the rise in popularity of the rapper Tupac Amaru Shakur, and, I’m guessing to a lesser extent, the Tupac Amaru Revolutionary Movement’s 1996-97 seizure of the Japanese ambassador’s residence in Lima. Only by reading deeper–i.e., reading in the traditional, commonly-understood sense of the term–would one be able to learn whether that 1990s rise had to do with increased scholarship about the 1780-83 rebellion or the prominence of an individual and a group named for that rebellion’s leader.

Thus, my takeaway from this week’s readings: caveats similar to those that apply to the N-gram Viewer apply to other data mining and distant reading tools. The tools help us formulate questions, help us answer those and other questions, help us make sense of a mass of information. And they are super-cool. But they do not provide answers in themselves. For that, we still need to rely on the oldest tool in the humanities arsenal: the human brain.

Scholarship in the digital realm

This week’s readings concerned the question of scholarship in the digital realm. Specifically, what is digital scholarship, and how is it evaluated?

As the semester has gone on, we’ve learned how the digital makes a difference in format. As Lev Manovich discussed in last week’s reading, The Language of New Media, the rectangular computer screen forces a different language upon us. To make full use of the power of the digital, we must adapt our forms to its forms. This means that, while there is a place for the traditional monograph and article model of historical scholarship, we must think in new ways for the digital–not just replicating the old means on a screen.

What might digital scholarship consist of? The University of Nebraska’s Bill Thomas discusses one of the many possibilities in his article “Writing A Digital History Journal Article from Scratch: An Account.” He discusses his experience with an article that he and Edward Ayers of the University of Virginia wrote. This article was published in the American Historical Review print edition, and also online. In the print edition the scholars followed the traditional format. In the online edition, though, they experimented with new ways of making their argument. As they detail, some of these ways worked, some did not–in part, depending upon the audience. For me, the most interesting part of this article was its conclusion–where Ayers and Thomas challenged us to think outside of our usual paradigms–breaking down categories of archives, exhibits, etc.

As is becoming clear, “digital scholarship” can mean many things. The fascinating Our Cultural Commonwealth, produced by the American Council of Learned Societies, offered as its starting point five categories worth quoting in full:

a) Building a digital collection of information for further study and analysis
b) Creating appropriate tools for collection-building
c) Creating appropriate tools for the analysis and study of collections
d) Using digital collections and analytical tools to generate new intellectual products
e) Creating authoring tools for these new intellectual products, either in traditional forms or in digital form

As the report notes, only category (d) has been considered scholarship. As we have discussed in class, though, there are currently debates about inclusion of those other activities as scholarship. I’d like to see this discussion continued and expanded.

I found the Council on Library and Information Resources’ “Working Together or Working Apart: Promoting the Next Generation of Digital Scholarship” particularly valuable for its argument in favor of including the other aforementioned categories in the realm of scholarship. The portion that resonated most with me was Caroline Levander’s “The Changing Landscape of American Studies in a Global Era” (pages 27-33). First of all, as someone who gets on my high horse about how small the Rio Grande is physically, and yet how there seems to be more scholarship about transoceanic connections than connections across that little desert stream, I found myself saying “amen” when she gave a hemispheric definition of American Studies (I also wished I had looked at this report and quoted parts in my project proposal!). More to the point, though, she argues that the very content of an archive, and how it is formed, helps shape the questions that can be asked of it. Thus, she and her colleagues creating the Our Americas Archive Partnership–a collaboration of Rice University, the University of Maryland, and the Instituto Mora in Mexico City–are bringing together primary sources from throughout the Americas, as a way to bring about new questions. In this case, the formation of the archive–what to include, what to exclude, and how to search it–is the argument.

So if digital scholarship can mean many things beyond producing the traditional article or monograph, how is it evaluated? The evaluation of history produced in formats beyond those is not a new question, as the report and white paper of the Working Group on Evaluating Public History Scholarship make clear. Both of these–it’s worth reading the white paper and not just the final report–discuss how history departments can evaluate public history work. Their recommendations–that public history work be valued not just as service or teaching but as scholarship–carry over into the digital realm. Most importantly, they see scholarship as a process, not just the end product.

So in the end, what is “digital scholarship”? As I wrote this post, I noted that I skirted actually defining it–the closest I got was quoting Our Cultural Commonwealth. Based on these readings, it seems we can define digital scholarship as peer-reviewed, intellectually rigorous research and dissemination of that research using digital means. In other words, it is scholarship whose form is specifically digital. For example, although one can more easily read a traditional article or monograph on an electronic device like a Kindle or iPad–and even produce a “Kindle single”–I would not classify those as digital formats because they do not depend on the existence of the digital for their format. One could print them out and have the same.

Is that a satisfactory definition, or am I excluding too much by arguing that the form needs to be digital? Do we even want to define the term, or would that preclude too much? As my colleague Megan points out, public history doesn’t have a rigorous definition. What do others think?

Final grant proposal

Can be found here.

Week 8: Theory of New Media

John and I are this week’s discussion leaders. We’ve been emailing thoughts back and forth, and decided that each of us will post our own thoughts and questions for the discussion on our respective blogs, and comment further there. So here are my thoughts:

We envision the discussion going along two interconnected strands: considering Manovich’s work on its own, with its broader implications, and then more narrowly with its implications for digital history. Since the book was published in 2001, one thing about which I’m curious is what others think of how it holds up. In my opinion, all in all it does, although some parts have changed over time.

Overall, I’ve liked the book. I don’t have a formal background in media studies, but I related to the book more than I initially expected.

One of my favorite things about the book was how Manovich related new media to other forms. His use of examples helped make these relations concrete. Manovich’s technique of going from the “inside” to the “outside” worked well–he brought us from the technology of the computer to what we see on the surface. His broad definition of new media–not just what is disseminated via computers but also what is produced (p. 19)–strengthens his analysis. As he notes, “the computer media revolution affects all stages of communication, including acquisition, manipulation, storage, and distribution; it also affects all types of media–texts, still images, moving images, sound, and spatial constructions” (p. 19).

This insight underpins much of what we have discussed so far this semester. The previous changes in media technology Manovich points out, such as cinema, television, and radio, have incrementally affected production of historical knowledge and its dissemination; the digital revolution has been just that, a revolution. It has forced us, as historians, to consider our craft in new ways.

So, some questions and discussion points that arise from this:

On page 7, Manovich discusses how the “previous cultural forms shaping [cinema] were still clearly visible and recognizable [in the early 20th century], before melting into a coherent language,” and relates that time in cinematic history to the present in new media, at least the present as of the book’s 2001 publication. Do we think the previous forms shaping new media have melted “into a coherent language,” particularly for digital history? Based on tweets from CHNM’s Sheila Brennan the other day (scroll to 10/13), and generally looking at what museums are doing online, I would argue that, particularly for history museums, new media is largely a reflection of the old. Even Omeka takes the form of items, collections, and exhibitions, much like a physical museum or archive does. That platform’s major breakthrough is facilitation of digitization on the producer end and ease of access and information on the consumer end. So, are we any closer to a “coherent language” in 2011 than we were in 2001? Will we ever be, or is new media (and digital history) changing so fast–and is it by nature so varied–that we won’t have a “coherent language”?

On page 60, Manovich speaks of the “modern desire to externalize the mind.” He further states, “what before had been a mental process, a uniquely individual state, now became part of the public sphere.” This brought me immediately to think of technologies like Facebook and Twitter, particularly the idea that some people have ceased caring about privacy. Are these technologies the fulfillment of Manovich’s statements? How does this relate to the changes–and new demands–digital technology has brought to the history profession, particularly the (in my opinion, justified) demand that we make our work more transparent than before?

On page 65, page 97, and really throughout the book, Manovich talks about the windows present in the graphical user interface (GUI) first popularized by the Macintosh in 1984. Lately we’ve seen some moving away from that, as the iPhone, iPad, and now OS X Lion have adopted full-screen apps. But with those we still have the option of sliding between other full screens–for example, right now I have TweetDeck open on one screen, a browser open on another, Zotero on another, Mail on another, and iTunes open on yet another. I can switch between them when I need (or, really, don’t need) a distraction. How has this affected how we interact with the computer? How will full-screen applications affect our digital history work? Will they help our end users get more immersed in what we’re presenting?

On page 75, Manovich discusses how virtual reality enthusiasts envisioned a future 3-D Web. I’m curious as to the state of this. I haven’t heard much about things like Second Life in a while–even with places like the Smithsonian Latino Center making investments in a presence there. What do others know? Is there a future for a 3-D Web? Or will 3-D history projects be executed through more specific means, like Rome Reborn? What implications does the 3-D Web, or lack thereof, have for the way we do our work as historians?

Similarly, on page 86 Manovich discusses the importation of cinematic forms to new media–throughout, though, he largely discusses this through gaming. We are also seeing some applications of cinematic forms for digital history work, e.g., in reconstructing and representing spaces. Yet it seems a lot of digital history work is driven more by our older conventions of text-based media like the book and the article. How is all of this working together? To me, it seems we are using more cinematic technologies and techniques in concert with traditional means than previously–indeed, the ability to combine previously separate media is the power of new media for the history field.

In chapter 2, Manovich talks about (but does not bemoan) the idea of a person being a “prisoner” when it comes to media consumption. Even virtual reality constrains the body. How does this premise apply to the advent of mobile devices? Do these make us even more “prisoners” of the screen, as we focus our attention on those small screens rather than the real environment surrounding us? What about augmented reality?

On page 120 Manovich refers to the “overlap between producers and consumers,” a blurring of the lines. At the same time, he notes that as knowledge about particular aspects of programming becomes more widespread, programmers come up with more complex formats. We are seeing a parallel in history–we have the idea that digital technology democratizes history, allowing “everyone to do history,” but we are also seeing increased professionalization. After all, we in the class are all in advanced degree programs. Is it the case that everyone can do “the lower order tasks,” for lack of a better term, of history (e.g., transcription), while professionals are doing the “higher order” parts? How does this relate to what Manovich discusses? Is this how historical labor should be “divided”?

In parts of chapter 3, like page 130, Manovich discusses “authorship as selection.” On page 143 he states that “along with selection, compositing is the key operation of postmodern, or computer-based, authorship.” He contrasts that with the artistic ideal of starting with a blank canvas–while noting that collages, etc., have become more popular and accepted as art. For me, this evoked some of our discussions about the nature of authorship in the digital age–particularly Sharon’s argument that assembling an archive is a form of scholarship, with her caveat that many do not agree. Can we say that history work has generally consisted of “authorship as selection,” that we have always been “compositing,” and that new media is now making that more obvious? When writing a book or article we don’t just write what comes to mind. We assemble our evidence, and narrate an argument based upon it. I would like to discuss Manovich’s definition of compositing further in relation to other aspects of new media, particularly mashups and such, and the relations of compositing to digital history.

Along the same lines, in Manovich’s section on “digital compositing,” he notes that various images, sounds, etc., can be put together so that “the result is a single seamless image, sound, space, or scene.” With regard to images, Manovich notes on page 158 that “borders between different worlds do not have to be erased.” This has led to one of the dangers of new media for history–the questions of provenance and authenticity that we have discussed in class. As an example, Cohen and Rosenzweig point to an image of Lee Harvey Oswald seemingly jamming in the basement of a Dallas police station. One could argue we in digital history are trying to keep everything from being seamless. Where do we see this going in the future? Will this be even more difficult, or has new technology given us more tools to show provenance and authenticity?

In a similar vein, in chapter 4, Manovich argues that some digitally-produced 3-D images have been “too perfect,” and invokes the film Jurassic Park as an example–pointing out that the filmmakers had to make the dinosaurs less perfect. This made me think of depictions of the past on-screen that use computer-produced images. In many movies, and even in recent attempts to build “virtual cities” (to say nothing of historic sites), are we seeing the past as too clean, too perfect, when compared to reality? Is “dirtiness,” for lack of a better term, something we need to add to 3-D reconstructions? Even though I doubt someone would take, for example, Rome Reborn as what ancient Rome literally looked like, do we need to add, say, some dirt?

One section I particularly would like to discuss is Manovich’s discussion of the form, particularly his argument that narrative and database are “natural enemies.” As Andi notes, history needs narrative – even if not necessarily linear. As a case in point, I originally put a splash page on my planned site (before being rightfully smacked down). Is narrative the enemy of the database? I’m not so sure, and I feel (perhaps naively?) that we can reconcile what Manovich argues are two poles. For our work, we need to reconcile these. Do we turn to other forms of narrative, whatever those may be (no literary theorist here)? Or redefine what narrative is? As Manovich discusses on p. 243, have narrative and database yet successfully been merged “into a new form”?

What are some questions that others would like to discuss? Please feel free to post here, and we’ll make sure to address your questions and comments!

Reflections on the proposal for “Familiar Strangers”

In this post, I am answering several questions about my proposal for “Familiar Strangers,” a website about U.S. and Mexican visitors to each other’s countries between 1776 and 1846. Thanks to my lovely wife Laura, Sharon (the professor), Andrea, and my other classmates for their feedback on the proposal draft [PDF] and the presentation.

What is your inquiry question?

How did visitors between the United States and Mexico view each other’s countries as the two countries moved toward war?

What do you want your users to learn?

I want my users to understand the deteriorating relationship between the United States and Mexico as it played out “on the ground.” In particular, I want them to learn how people from the United States and Mexico interacted, especially in the “cores” of each country, in the decades before the two countries went to war with each other. As a corollary to that, I’d like my users to make comparisons to today, to understand the origins of mutual perceptions that people from the United States and Mexico have of each other.

What is your methodological stance?

For this project, I plan to present both primary sources and interpretation. The interpretation will help users place the primary sources in context and gain a greater understanding of the period.

This site approaches this history on a transnational basis, seeking to understand the lead-up to the U.S.-Mexican War from both sides of the border through the interactions between the peoples of each country. It offers five means of accessing the primary sources and interpretation, allowing users to learn about different facets of this period (see below).

How does your design work to support these goals?

The site’s design allows users to retrieve primary sources and interpretations through five means: by visit (in my final draft, I’ve decided to use that term instead of “journey”), by person, by place, by a time period search, or by a keyword search.

For example, one can see how the people of a particular locality, such as Lexington, Kentucky, or the port of Tampico, Tamaulipas, interacted with visitors from the other country.

One could learn about a particular visit–whether it was a journey, as Antonio López de Santa Anna and Juan Almonte undertook to Washington from Texas in 1836-37, or a case of immigration, such as occurred with Spaniards who settled in New Orleans after Mexico expelled them.

One could also learn about a particular visitor and his or her interactions with the other country through time. For example, Juan Almonte, who accompanied Santa Anna to Washington in 1836-37, had been educated in the United States (even working in a store in New Orleans after the 1815 execution of his presumed father, the rebel leader José María Morelos, during Mexico’s War of Independence) and later served as Mexico’s minister in Washington. Another example is the Kentuckian John Davis Bradburn, who joined a Mexican rebel group during the 1810-21 War of Independence and later served in Mexico’s army.

The search by time period option, meanwhile, offers users the ability to see raw numbers of visits during a particular time period, and learn more about that time through visits. A keyword search allows for the finding of particular terms, such as “gringo,” in the primary sources.
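To make the five access points concrete, here is a rough sketch in Python of how records might be organized so that each means of access becomes a simple lookup. This is only an illustration under stated assumptions: it is not how Omeka stores items, the class and function names (Visit, Source, by_person, and so on) are hypothetical, and the source texts and the second record are placeholders rather than real transcriptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Source:
    title: str
    text: str  # transcription of the primary source


@dataclass
class Visit:
    name: str
    people: List[str]
    places: List[str]
    start_year: int
    end_year: int
    sources: List[Source] = field(default_factory=list)


# Sample records. The first draws on the journey described in the proposal;
# its source text is a placeholder. The second record is entirely invented.
VISITS = [
    Visit(
        name="Santa Anna and Almonte to Washington",
        people=["Antonio López de Santa Anna", "Juan Almonte"],
        places=["Texas", "Kentucky", "Washington"],
        start_year=1836,
        end_year=1837,
        sources=[Source("Placeholder diary entry", "Placeholder text about a stay at an inn.")],
    ),
    Visit(
        name="Placeholder visit",
        people=["Placeholder Traveler"],
        places=["Tampico, Tamaulipas"],
        start_year=1825,  # hypothetical year
        end_year=1825,
        sources=[Source("Placeholder letter", "Placeholder transcription containing the word gringo.")],
    ),
]


def by_visit(visits, name):
    """Look up a single visit by its name, or return None."""
    return next((v for v in visits if v.name == name), None)


def by_person(visits, person):
    return [v for v in visits if person in v.people]


def by_place(visits, place):
    return [v for v in visits if place in v.places]


def by_period(visits, start, end):
    """Visits that overlap the inclusive range of years from start to end."""
    return [v for v in visits if v.start_year <= end and v.end_year >= start]


def by_keyword(visits, term):
    """(visit name, source title) pairs whose source text contains the term."""
    term = term.lower()
    return [
        (v.name, s.title)
        for v in visits
        for s in v.sources
        if term in s.text.lower()
    ]
```

With records shaped like this, len(by_period(VISITS, 1830, 1840)) gives the raw number of visits in a period, and by_keyword(VISITS, "gringo") finds the sources that use a particular term. A real site would lean on whatever the chosen platform provides for search and browsing instead of hand-written functions like these.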

What new things do you need to learn?

Many things. First and foremost, I will need to learn Omeka. I’ve had very cursory experience with it before, playing around a bit on Omeka.net and attending a couple of sessions at THATCamp Prime this summer. Indeed, the other day I had trouble with the one-click installation of Omeka on Dreamhost, and was thankful to be at CHNM when Sharon was around!

I also need to learn more principles of web design.

Also, I need to learn more about the sources that exist for this project. I have done some research on Santa Anna and Almonte’s journey to Washington in 1836-37, and through that have picked up some sources for other visits. But I need to learn more of what is out there.

How will you go about learning these things?

Just through my presentation and feedback I learned some more about what Omeka can and cannot do. The prototype that we are building for the second project will help me learn Omeka. As I work on the prototype, I will use Omeka’s extensive documentation to learn more about how to use the software.

Meanwhile I will go back to Teach Yourself HTML and CSS in 24 Hours and some of our other readings to learn more about web design.

Learning more about where primary sources are will largely come with my dissertation research, but in the meantime I plan to bring in more sample sources for the prototype.

What is the rationale for the decisions you’re making about source choices (by type, collection, time period, etc.)?

Time period: The ending date I chose for the site was easy: The outbreak of the U.S.-Mexican War in 1846.

The beginning date I chose was more arbitrary. Initially when conceiving of this project/my dissertation, I had thought about beginning at 1821, when Mexico gained independence from Spain. But that would leave out influential interactions before that time period. After Spain declared war on Great Britain in 1779 in support of U.S. independence, some aid came to the United States from New Spain. Spanish agents later operated in what was then the western United States to gain settlers’ loyalty–and split that region from the United States. Even in a recent conversation with the curator of the Alamo, who has been a mentor through the years, I learned of many visits pre-dating Mexican independence. Meanwhile, between 1800 and 1820, adventurers from the United States–filibusters–formed private armies to invade portions of northern New Spain, particularly Texas. Others, like John/Juan Davis Bradburn, joined Mexican rebel groups. These stories influenced the trajectory toward war in 1846 and deserve to be included. As such, I chose the date of 1776 as the beginning of the archive.

Types of sources: The sources for this project are spread across archives throughout the United States and Mexico. Part of the project’s rationale is uniting these sources. The sources are diverse, and I would use any from the period that cover visits between the United States and Mexico. Newspapers on both sides of the border reported on visits between the countries–whether by covering (or publishing letters from) visitors, or by covering travelers passing through a locality. Some travelers, such as Almonte, left diaries of their travels. Meanwhile, through the magic of ArchiveGrid I found Calista Long’s published diary of a trip through Kentucky in 1836, when she and her family stayed at the same inn as Santa Anna and Almonte. She reported a near-riot.

Others have left family papers. Other sources are more surprising; for example, at the Historical Society of Washington, D.C., I found a ship captain’s log that would be relevant to the project. This U.S. Navy captain, connected to a Washington family, escorted U.S. merchant vessels from New Orleans to Mexican ports in 1836-37, around the Texas Navy’s blockade. In several of those ports, the captain reported picking up merchants from the United States who, essentially, needed a ride home.

All of these sources would be included in the archive.

What questions remain for you to provide a convincing grant application?

The class presentation, the comments I’ve received, and this exercise have helped me answer some of my questions, particularly about what to include in the application. Besides the parts that Andrea and Sharon pointed out, I especially need to work on my work plan and my project team. I will look at other similar projects to gain a better idea of the timeframe and the people involved in making the project happen.

Presentation: “Familiar Strangers”

Here is my presentation for class on October 11.

Week 6: Digital Collections and Digital Preservation

This week’s readings focused on efforts to preserve and collect the past online, and assessments of those efforts. As the readings make clear, digitization of primary sources–and creation of new ones in the digital medium–has been one of the main ways that digital technology has affected history research. As Alison Babeu’s Rome Wasn’t Digitized in a Day noted (and I alluded to in my Week 3 post), thus far it seems many have simply incorporated the ability to search and find documents into their already-established techniques for dealing with “analog” documents. But these articles also allude to other ways that scholars can more specifically use the power of computing in exciting ways, to mine these primary sources. Babeu, in particular, gives an excellent analysis of the challenges and accomplishments of digital technology for the classics.

Another major focus of the readings–and what I found most interesting–was what it takes to build such online archives. While I knew that building online archives was complicated, I didn’t realize just how much so until these readings–and indeed, I gained a new appreciation for the complexity. This applies both to digitization of extant texts and to online collecting efforts. T. Mills Kelly and Sheila Brennan discuss difficulties–such as soliciting contributions–in creating the Hurricane Digital Memory Bank, and suggest that creating an online archive for even major events like the 2005 hurricanes is more difficult than they anticipated. It makes me feel better that some of my work’s online solicitations of material–about things nowhere near as significant as the hurricanes–have not worked so well! This article and Dan Cohen’s comparison of collecting efforts after the attack on Pearl Harbor and the attacks of September 11, 2001, also showed the importance of leaping into action to collect right after major events, and indeed of having an infrastructure in place.

One thing I wonder, though: in this age of social media, might it be easier to get people to share for a project like the HDMB? Might it be easier now than even 6 years ago? Or might there be a barrier for many people in sharing for an archive versus on a largely public forum like Facebook or a completely public (and now even archived!) forum like Twitter?

All in all, this week’s readings gave me a greater appreciation of efforts to collect and preserve the past online. With the increased research power that digital technology provides comes increased effort to get extant material online, to collect new material online, and to preserve what is already online. Kudos to those making these efforts!

Draft: Project #1

See the attachment. Fellow students and Sharon: I’ve left the criteria in for now–hence why the narrative extends beyond six pages. I plan to remove them for the final. Will look forward to your comments!

NEH-ODH grant draft