Thursday 13 August 2015

Biography of the future? Digital Humanities and a hypothetical biography of John de Witt (1625-1672)

Introduction[1]
Like any humanities scholar of the twenty-first century, the modern biographer has to decide what to do, or not to do, with the advances of the so-called digital humanities. A wide variety of biographical sources, both primary sources such as correspondence and secondary sources such as biographical dictionaries, have been brought online over the past decades and will increasingly be consulted by future biographers. The digital turn does more, however, than let biographers consult sources from behind their own computer rather than in a library or archive. Digitized sources can be analyzed in new, more advanced ways, visualizations help to reveal patterns, biographies can be presented in different forms, et cetera.
            The question remains whether this digital turn, this increasing availability of ‘biographical data’ and of new ways to consult them, really changes the biographies of the future. This blog provides a critical reflection on the possibilities of digital humanities technologies for biographical research. I will take the life of grand pensionary John de Witt, in effect the highest official of the Dutch Republic (1653-1672), as an example of how a biographer could use digital humanities technology for biographical research, and to illustrate its potential and shortcomings. The choice of John de Witt stems from personal interest, from the availability of plenty of primary sources and from the fact that he has already been the subject of several larger biographies, both in and outside the Netherlands.




I John de Witt: a life of diplomacy and writing

John de Witt is one of the most prominent figures in Dutch history. As grand pensionary of the province of Holland, the most powerful province of the Dutch Republic, he was considered the leader of the Republic and was treated with full honors by foreign rulers. His untimely death in 1672, murdered and torn apart by an angry mob, has contributed to his lasting fame. To date there is no conclusive evidence that pins the blame for this murder on any particular person(s), though his political enemy the prince of Orange, William III (later king of England), is often mentioned as being (partly) responsible. Several biographies of John de Witt have appeared, in English and in Dutch.[2]

Statue of John de Witt in The Hague
 De Witt's archival legacy is immense. From his role as a state official alone, eight meters of letters are preserved in the Dutch National Archive, bundled in twenty volumes and written in circuitous official jargon.[3] He was not only a politician, but also a mathematician who is still considered a founding father of modern life insurance mathematics.
            To write a biography of De Witt is no small feat. The nineteenth-century politician J. R. Thorbecke noted that ‘To give us a life of De Witt worthy of the man is to assure oneself a place among historians of all time.’[4] Rowen needed almost 1000 pages to capture De Witt’s life in 1978. Panhuysen used slightly more than 500 pages to describe the life of De Witt in relation to that of his brother Cornelis in 2005. Recently, Prud’Homme van Reine needed more than 200 pages to describe the murder of De Witt and his brother alone.
            Rowen in particular consulted a tremendous variety of sources to compose his work. Necessarily, however, the biographer of De Witt, and of many other noteworthy persons in history, has to be selective in the topics to address and the sources to consult. One person can only ‘close-read’ a limited number of sources, especially in the twenty-first century, when there is little patience for academic output that takes longer than a project of four or five years. Digital humanities technology allows biographers to combine close reading with ‘distant reading’, in which larger bodies of text are analyzed by the computer to help find patterns, test hypotheses and find leads for further research.


II Analyzing the texts
Computer software is very good at processing text, as long as it is modern text in a machine-readable (digitized) format. The first problem we encounter when we want to use digital humanities technology for a biography of De Witt is that his correspondence has not been digitized yet, and even if it were, the OCR (Optical Character Recognition) that would have to convert the handwriting into computer-readable text might not deliver great results. Computers can be trained to recognize handwriting, with the help of the crowd, to improve performance, but manual transcriptions are definitely to be preferred.[5]
            Let us assume, however, that we have all eight meters of De Witt’s correspondence in a computer-readable format of decent quality. The first thing we would want to examine is the texts themselves, since their form can tell us a bit more about the man we are writing about. Questions we could ask with the help of digital tools, but could previously answer only with great difficulty, are: How long were De Witt’s letters? Did he use many words per sentence and many sentences per letter compared to his contemporaries? (This could tell us something about his work ethic and personality.) How does this change over time, and why?
            Since De Witt’s correspondence is not actually available in a digitized format, we used as an illustration a transcription of a famous political text by De Witt from 1654, his Deductie (deduction), in which he defends himself against serious allegations after conceding to the English lord protector Oliver Cromwell that no member of the House of Orange would be appointed to the highest state offices.[6] Using simple and free online tools like Voyant Tools and WordCounter, we find that De Witt used 34,456 words in this political text, of which 5,185 are unique. He uses 749 sentences with an average of 46 words per sentence. The most frequently used noun is ‘provincien’ (provinces). We can also see that he uses that word frequently throughout the entire text and not just in one particular section, which shows the importance of the relations between the different provinces of the Republic that De Witt addresses. A word like ‘Brabant’, the province, is only used in a very restricted part of the text. The name Orange (Oraingne), that of De Witt’s political adversaries, is used often in the beginning of the text, as well as slightly after the middle and at the very end.
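The kind of counts reported here can be approximated with a few lines of code. The following is a minimal sketch in Python, not a description of how Voyant Tools or WordCounter work internally: the function names, the naive sentence splitter and the short sample text are all assumptions made for illustration, and a different tokenizer will give slightly different numbers.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters (digits and punctuation dropped)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def text_statistics(text):
    """Return word, unique-word and sentence counts plus the mean sentence length."""
    words = tokenize(text)
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return {
        "words": len(words),
        "unique_words": len(set(words)),
        "sentences": len(sentences),
        "words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        "most_common": Counter(words).most_common(5),
    }


def dispersion(text, term, segments=10):
    """Count how often `term` occurs in each of `segments` equal slices of the text."""
    words = tokenize(text)
    counts = [0] * segments
    for position, word in enumerate(words):
        if word == term.lower():
            counts[position * segments // len(words)] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample sentences, NOT a quotation from the Deductie.
    sample = "De provincien zijn vrij. De provincien zijn souverein! Wie twijfelt daaraan?"
    print(text_statistics(sample))
    print(dispersion(sample, "provincien", segments=2))
```

The `dispersion` function mirrors the observation about where in the text a word occurs: a term used throughout produces counts in every slice, while a term like ‘Brabant’ would show up in only one or two.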

Voyant Tools as a means to analyze John de Witt's texts
          

Another word De Witt uses often is ‘God’ (in several spelling variations), which we find 37 times. If we wanted to write a paragraph in our biography about how religious De Witt was in his thinking, it would be a valuable exercise to compare the relative occurrence of ‘God’ in this political text with the mentions of God in texts by other politicians of his time. This is necessary to contextualize De Witt and to see how he compares to similar individuals. Once again, this would require access to full-text, computer-readable versions of as many political texts as possible.
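To compare texts of different lengths, raw counts have to be turned into relative frequencies. With the figures given above, 37 occurrences in 34,456 words comes to roughly 1.07 per 1,000 words. A minimal Python sketch of that calculation follows; the function name and the list of spelling variants are assumptions for illustration, since the actual variants would have to be taken from the transcription itself.

```python
import re


def relative_frequency(text, variants, per=1000):
    """Occurrences of any spelling variant per `per` words of text."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return 0.0
    targets = {variant.lower() for variant in variants}
    hits = sum(1 for word in words if word in targets)
    return per * hits / len(words)


if __name__ == "__main__":
    # Hypothetical variant list; replace it with the spellings found in the source.
    god_variants = ["god", "godt", "gode", "godts"]
    # Made-up sample sentence, not a quotation from De Witt.
    print(relative_frequency("God is groot en Godt is goed", god_variants))
    # The figure reported for the Deductie: 37 hits in 34,456 words.
    print(round(1000 * 37 / 34456, 2))
```

Running the same function over texts by other seventeenth-century politicians would yield directly comparable numbers, provided the same variant list and tokenization are used for every text.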
            Finally, when we are dealing with a wide variety of texts, we should definitely consider topic modelling, a popular exercise in digital humanities. With topic modelling the computer extracts topics from text by looking at words that occur together in statistically meaningful ways. In this way we could globally assess what is discussed in which documents, without having to read them in full. If, for example, we wanted to know in which letters De Witt mentions the strength of the Dutch fleet, topic modelling could point the way.


III Patterns in Networks
One of the main ways for a biographer to contextualize his or her subject is to analyze the networks that person was part of. The analysis of correspondence with digital methods is a key component in finding out who had contact with whom and for what reasons. John de Witt was a statesman who led the anti-Orange party, who corresponded with foreign colleagues, ambassadors, scholars and international friends, and who had an extensive patronage network. It is therefore of great interest to map his correspondence. His biographers have also used his letters extensively to define his relationships with other people.
            A relatively simple exercise would be to analyze all the recipients of John de Witt’s letters and all the people who sent him letters. If we visualized this with maps, graphs and figures (several off-the-shelf tools exist to do this), we would get a picture that allows us to see patterns we could not see before.[7] Stanford University’s Mapping the Republic of Letters provides good examples of such visualizations. If we take the case of the French philosopher Voltaire, we can see graphs of the nationality and social background of his correspondents, all visualized on a map with the most modern techniques. For one, we may deduce that Voltaire’s correspondence was not as cosmopolitan as he might have wanted it to appear.
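Before anything is drawn on a map, such an exercise starts with simple counting of senders and recipients. The sketch below, in Python, assumes that each letter has been reduced to a (sender, recipient, year) record; that record format, the function name and the placeholder correspondents are illustrative assumptions and do not reflect De Witt’s actual correspondence.

```python
from collections import Counter


def correspondence_summary(letters, person):
    """Count letters sent to and received from each correspondent of `person`.

    `letters` is an iterable of (sender, recipient, year) tuples.
    Returns three Counters: sent, received and the combined totals.
    """
    sent = Counter(recipient for sender, recipient, _ in letters if sender == person)
    received = Counter(sender for sender, recipient, _ in letters if recipient == person)
    return sent, received, sent + received


if __name__ == "__main__":
    # Placeholder records for illustration only, not real archival data.
    letters = [
        ("De Witt", "Correspondent A", 1660),
        ("Correspondent A", "De Witt", 1661),
        ("De Witt", "Correspondent B", 1662),
        ("Correspondent A", "De Witt", 1663),
    ]
    sent, received, contacts = correspondence_summary(letters, "De Witt")
    for name, total in contacts.most_common():
        print(name, total, "sent:", sent[name], "received:", received[name])
```

The combined totals can serve as edge weights in a network graph, and the year field makes it possible to repeat the count per period to see how the network changes over time.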
            An initiative directly relevant for John de Witt is the Circulation of Knowledge and Learned Practices in the 17th-century Dutch Republic. Even though De Witt was primarily a statesman, he is likely to have been in contact with the most prominent intellectuals of his time.[8] When searching the database we find five letters from the scientist (theoretical physicist) and inventor Christiaan Huygens to De Witt, from 1658 to 1670. These letters seem to reflect quite a formal relationship between the two, in which Huygens calls himself De Witt’s ‘humble servant’, as was the custom at the time.
            In one of these letters (1 February 1664) we also find a mention of ‘lord Brus, brother-in-law of the lord of Somerdijck’. Ideally, we want to know who this lord Brus is and what his relation to De Witt might have been, and the same goes for any other people who are mentioned in the letters to and from De Witt. We also want to be able to match this lord Brus to other mentions of him in the correspondence. A computer is able to search for other instances of lord Brus, but it cannot establish (without great difficulty) whether two mentions refer to the same person, other than by giving a negative match if two Brusses are mentioned in letters that are chronologically too far apart.
           
    
           

John de Witt, by Adriaen Hanneman
Another problem lies in the different spellings of names and in the different ways people are addressed or refer to themselves. Christiaan Huygens signed the same letter with ‘Chr. Huygens van Zuylichem’, after the castle and estate his father had acquired in 1630. Similarly, a computer would have great difficulty matching a mention of a ‘master Vincent’ to the right person without knowing the context. The problem of recognizing and matching individuals automatically (NERD: named entity recognition and disambiguation) is common in projects that deal with biographical data. Statistics combined with domain expert knowledge are, however, applied with increasing success to match names in separate documents.[9]
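A very crude illustration of why such matching is hard can be given with plain string similarity. The Python sketch below uses `difflib.SequenceMatcher` from the standard library; it is not how NERD systems work (they combine statistics, context and expert knowledge, as noted above), and the function names and the 0.8 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase a name, drop punctuation and collapse repeated whitespace."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity of two names between 0.0 (nothing shared) and 1.0 (identical)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(mention, known_names, threshold=0.8):
    """Return (name, score) for the closest known name, or None below the threshold."""
    scored = [(similarity(mention, name), name) for name in known_names]
    if not scored:
        return None
    score, name = max(scored)
    return (name, score) if score >= threshold else None


if __name__ == "__main__":
    known = ["Christiaan Huygens", "Lord Brus", "Master Vincent"]
    # Differences in case and punctuation are handled by normalization.
    print(best_match("lord  Brus.", known))
    # The abbreviated signature with the estate name added scores too low to match.
    print(best_match("Chr. Huygens van Zuylichem", known))
```

The second call shows the limitation discussed here: an abbreviated first name plus an added estate name is enough to push the score below the threshold, even though a human reader recognizes Huygens at once. Lowering the threshold would catch this case but would also produce more false matches.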

IV Comparisons
When using computers you want to make use of their calculating power. For biographers it is particularly interesting to compare their subject to similar people, in order to frame that individual in the context of his or her time.[10] To this end we need structured biographical data on as many people as possible.
            In the case of John de Witt there are several groups of people with whom we want to make comparisons. De Witt started to study law at the University of Leiden at the age of sixteen. If we were able to draw up schematic biographies of all students in the years he studied there, we would know how he compared to them with regard to age, social and geographical background and later career. De Witt was pensionary of Dordrecht shortly before becoming grand pensionary of Holland. By analyzing the previous office holders, and holders of the same office in other towns, we would learn how unique his appointment was at that time. The same goes for the office of grand pensionary and practically any other network or group De Witt belonged to. Such prosopographical, structured analyses would allow us to make much better substantiated claims about the person of De Witt than is usually done in biographies.
            The reason that such extensive comparisons often do not find their way into biographies is that they are very time consuming. With the help of digital methods, however, such comparisons become easier to make. To do so we would need a large amount of biographical data online in a structured format.
 Digital biographical data can either be ‘digitized’, for example from a scanned book, or ‘born digital’.[11] We can make a distinction between resources with primarily ‘generic’ biographical data, like dates and places of birth, marriage dates and dates and places of death, and resources with more narrative data that describe a person’s life. If, for example, we take the Wikipedia entry on John de Witt, we have a quite extensive biography of his life (narrative data), accompanied by an info box with structured data on his life. For a computer it is relatively easy to read and analyze the info boxes, but difficult to interpret the main text.
             
 Wikipedia is the biggest player in the field of online biographical data. Several studies have shown that for factual knowledge Wikipedia can compete with authoritative sources written by professionals, especially in the field of data on people.[12] DBpedia publishes the data of Wikipedia’s info boxes in linked data format. This allows analyses across different datasets and therefore increases the potential to compare John de Witt to different groups of people. Structured data on no fewer than three quarters of a million individuals worldwide are available through DBpedia. The advantages of having biographical data online are recognized by editors of biographical dictionaries as well.[13] These dictionaries contain relatively short biographies of people who were considered worth describing at the time of publication. Especially since the nineteenth century, such dictionaries have been published in multiple volumes all over Europe, containing descriptions of thousands of people.[14] They form a rich source of information that no human could ever fully analyze with traditional methods.


V Publishing the Findings

Biographers, like historians in general, have a tendency to amass a vast amount of information on the person who is the topic of their research. As early as 1946 the Dutch biographer Jan Romein noted that biographies were often too long. The ideal length of a biography would be 200 pages, for which the biographer had made a conscious selection of his source material.[15] When looking at the most prominent biographies of John de Witt, and at the length of modern biographies in general, we must conclude that Romein’s 200-page limit is rarely adhered to. Even if this is not necessarily a problem, it does reflect how much biographers struggle to select the right material and to keep a book within a certain page limit. Digital publishing might be the answer to both producing a manuscript of manageable size and being as ‘complete’ as possible. The monograph is no longer the only way, or maybe even the most obvious way, to publish your work.[16] With more authoritative sources online, a change of attitude towards digital sources has taken place as well. Recently, for example, academics have started to prefer the online version of the Oxford Dictionary of National Biography to the printed version.[17]
            There are many advantages to digital publishing. First of all, if we published our new biography of John de Witt online, we could easily rectify mistakes. If, for example, we identified the previously mentioned lord Brus in the correspondence incorrectly, we could simply correct that and account for the change.
            Secondly, digital biographies make it easier to let go of the traditional narrative of an individual’s life from birth to death. We could divide our biography into ‘themes’ (e.g. the murder of John de Witt), provide hyperlinks to more detailed information (e.g. the Dutch Republic and the navy) and even publish the source material we used, or left out, for a particular topic (e.g. the letter from Christiaan Huygens to John de Witt discussed in section III). The Ludwig Boltzmann Institut für Geschichte und Theorie der Biographie is a forerunner in publishing biographies this way. In particular, they work on alternative modes of presentation for the lives of the Austrian writers Ernst Jandl and Karl Kraus. They developed a content management system called Biographeme, “which breaks down the closed linear mode of life narratives in favour of a modular form of biography, the individual components of which can be combined and recombined according to interest or the question asked.”[18]
            Thirdly, the digital era offers unprecedented possibilities for researchers to also put their raw material online. We could publish transcriptions of letters of John de Witt, systematically gathered information in a database and, when copyright permits, original material as part of an online publication. This would be a highly efficient way to facilitate further research and to allow others to check our findings. It is also a good way to show that the tax money invested in the research was well spent. Facilitating the storage of these data and making them available for further use could, or should, be part of the data management policy of any academic institution. This would, of course, also mean that a biographer should not ‘claim’ his or her subject, as is sometimes the case, but let go of the subject once a biography is published (or maybe even before that).[19]
            Finally, putting a project online before a printed publication makes it possible to work on it together with others. By bringing your material online you open up a dialogue that sits midway between writing and speaking.[20] In our case, for example, we could simply ask online whether anyone knows who lord Brus was. We could also publish our NERD results and ask visitors of the website to spot incorrect matches. This way we avoid too many computer-generated mistakes and get more input to refine our algorithms.

       

VI Problems
In comparison to previous biographies of John de Witt, this hypothetical biography would be based on more resources, would quantify the topics John de Witt addressed most frequently, would say more about John de Witt as an individual compared to his contemporaries, would include detailed and richly visualized network analyses, and would be presented in a more dynamic way with room for all the material we have gathered. Unfortunately, at the moment this still remains largely hypothetical.
            Perhaps the most fundamental problem that needs to be discussed is that of representativeness. It is self-evident, but cannot be stressed often enough, that the scope of digital research is limited by the availability of computer-readable digital data. The results of any exercise with computational methods should therefore be accompanied by a critical account of the completeness and biases of the sources. It would be very nice if we had all the correspondence of John de Witt in a machine-readable format, but that is not going to happen any time soon. Even just digitizing his correspondence would cost a lot of time and money.[21] In general, there is a huge amount of material that is not digitized (yet) and remains out of the scope of digital humanities research.[22]
            Another problem is the relatively high rate of mistakes and inconsistencies one has to deal with when using digital tools on biographical data. The OCR quality of digitized texts alone can lead to problems in the analysis, especially when looking for names.[23] In the case of De Witt, his seventeenth-century handwriting poses another (high) hurdle to clear.
            Finally, computers may be great at making calculations, but they are bad at interpreting text. They can only work with the algorithms we feed them. If we ask them to match names from different documents they are bound to make more mistakes than a human would make. It is, for example, difficult for a computer to distinguish the lord Wassenaar from the location Wassenaar. They would, however, do the work much faster. Detailed documentation of how computer tools were used for our research on John de Witt should be provided so that other researchers can check the analyses. Unfortunately, the complex tools that are most likely to provide precise results are also the most difficult for the digital layperson to understand beyond a basic level.[24]

Conclusions
I have discussed the potential of digital humanities tools for a hypothetical biography of John de Witt. The hypothetical biography is based on more documents and provides detailed network analyses, comparisons to contemporaries and visualizations. Some questions that could not easily be answered before, such as how religiously influenced De Witt’s writing is compared to that of his contemporaries, could even be answered. The hypothetical biography would be presented in a dynamic and interactive manner, providing possibilities for additions, dialogue and links to more information.
            Despite the apparent opportunities for biographical research, there is still a long way to go before this hypothetical biography of John de Witt could be written. First of all, many more archival sources should be put online in a format that makes it possible to analyze them with digital methods. Right now, we are still at the beginning of digitizing our cultural heritage. Techniques to convert handwriting into computer-readable text are likewise still in the early stages of development.
            Some things, however, can already be done for biographical research thanks to digital humanities technologies. New forms of presenting biographies already exist. In this blog we have also shown some very basic examples of what can be done with digitized texts relating to the life of John de Witt. Even though the analyses do not provide any conclusive evidence, they do inspire new research questions and provide leads on what may be worth investigating further.


Notes


[1] All URLs in the notes were last retrieved on 1 July 2015.
[2] Most notably: Panhuysen, De ware vrijheid; Rowen, John de Witt; Prud’Homme van Reine, Moordenaars van Jan de Witt.
[3] Panhuysen, De ware vrijheid, 7.
[4] Cited by Rowen, John de Witt, vii.
[5] See for example the Monk project.
[6] De Witt, De Deductie van Johan de Witt.
[7] Geerlings, 'A Visual Analysis of Rosey E. Pool’s Correspondence Archives', 65.
[8] Rowen, John de Witt, chapter 20.
[9] E.g. Bell and Ranade, 'Traces through time'.
[10] Ter Braake, ‘Het Individu en zijn Tijdgenoten.’
[11] See on this also: Haber, Digital Past, 103; Zaagsma, ‘On Digital History’, 24.
[12] Arthur, ‘Exhibiting history’, 33-50; Currie, ‘The Feminist Critique’; Haber, Digital Past, 75-79; Liu, ‘Where is Cultural Criticism in the Digital Humanities?’, 496; Rosenzweig, ‘Wikipedia’, 62.
[13] E.g. Reinert, Schrott and Ebneth, ‘From Biographies to Data Curation’, 13-19.
[14] Caine, Biography and History, 49-53; Slocum, Biographical Dictionaries, xv- xvii.
[15] Romein, Biografie, 178-181.
[16] Van den Akker, 'History as Dialogue'; Rigney, 'When the Monograph Is No Longer the Medium'.
[17] Carter, 'Opportunities for National Biography Online', 345.
[18] Hannesschläger and Prager, 'Ernst Jandl and Karl Kraus', 1.
[19] On the claims of biographers on their subjects: Wilson, 'A Love Triangle.'
[20] Van den Akker, 'History as Dialogue'.
[21] Jeurgens, 'The Scent of the Digital Archive', 45.
[22] Sample, 'Unseen and Unremarked On', 187-88, 198-99; Earhart, 'Can Information Be Unfettered?', 309-314.
[23] Piersma and Ribbens, 'Digital Historical Research', 93.
[24] Lin, 'Transdisciplinarity and Digital Humanities', 306.

Reference List


Caine, B. Biography and History. Houndmills: Palgrave Macmillan, 2010.
Carter, Ph. “Opportunities for National Biography Online: The Oxford Dictionary of National Biography, 2005-2012.” In The ADB’s Story, edited by Melanie Nolan and Christine Fernon, 345–71. Canberra: ANU Press, 2013.
Haber, P. Digital Past. Geschichtswissenschaft im digitalen Zeitalter. München: Oldenbourg Verlag, 2011.
Lin, Y-w. “Transdisciplinarity and Digital Humanities: Lessons Learned from Developing Text-Mining Tools for Textual Analysis.” In Understanding Digital Humanities, edited by David M. Berry, 295–314. Basingstoke: Palgrave Macmillan, 2012.
Panhuysen, L. De ware vrijheid : de levens van Johan en Cornelis de Witt. Amsterdam: Atlas, 2005.
Prud'homme van Reine, R. B. Moordenaars van Jan de Witt : de zwartste bladzijde van de Gouden Eeuw. Utrecht: De Arbeiderspers, 2013.
Rigney, A. “When the Monograph Is No Longer the Medium: Historical Narrative in the Online Age.” History and Theory 49, no. 49 (2010): 100–117. doi:10.1111/j.1468-2303.2010.00562.x.
Romein, J. De Biografie. Een Inleiding. Amsterdam: Ploegsma, 1946.
Rosenzweig, R. “Wikipedia: Can History Be Open Source?” In Clio Wired. The Future of the Past in the Digital Age, 51–82. New York: Columbia University Press, 2011.
Rowen, H. H. John de Witt, Grand Pensionary of Holland, 1625-1672. Princeton, NJ: Princeton University Press, 1978.
Slocum, R. B. Biographical Dictionaries and Related Works. An International Bibliography of Collective Biographies, Bio-Bibliographies, Collections of Epitaphs, Selected Genealogical Works, Dictionaries of Anonyms and Pseudonyms, Historical and Specialized Dictionaries. Detroit: Gale Research Co., 1967.
Ter Braake, S. “Het Individu En Zijn Tijdgenoten. Wat Een Biograaf Kan Doen Met Prosopografie En Biografische Woordenboeken.” Tijdschrift Voor Biografie 2, no. 2 (2013): 52–61.
Wilson, F. “A Love Triangle.” In Lives for Sale. Biographers’ Tales, edited by M Bostridge, 38–42. London and New York: Continuum, 2004.
Witt, J. de. De Deductie van Johan de Witt : manifest van de ware vrijheid uit 1654. Ingeleid en hertaald door Serge ter Braake. Arnhem: Sonsbeek Publishers, 2009.