Monday 29 August 2016

The Next Rembrandt: why did you do that?!

Michael Crichton's book Jurassic Park caused a true 'dinomania' when Steven Spielberg decided to make a movie out of it in the early 1990s. The brilliance of the book lies in the scary message it conveys: even if science shows that you can do something, there may still be good reasons not to do it. It may not be a good idea to create a weapon of mass destruction (too late for that). It may not be a good idea to clone a dinosaur (still quite impossible). The question every scientist should ask him or herself is the same one they ask their children on a daily basis: 'Why did you do that?!' It is tempting to say that the humanities are very good at asking the 'why' question, as suggested by the picture below, but that seems to be mere vanity, as if natural scientists never consider whether they should be doing what they are doing.

This year a digital humanities project was in the news that was highly impressive and intriguing, but that also immediately raised the question with me: 'Why did you do that?!' A team from Delft University of Technology collaborated with Microsoft and the Mauritshuis museum to create the 'next Rembrandt', a new painting in the style of the seventeenth-century Dutch artist Rembrandt van Rijn. Their work was generously sponsored by the Dutch ING bank. They have put a very nice video on their website in which they explain how they went about creating the 'next Rembrandt'.

According to ING spokesperson Tjitske Benedictus, ING wanted to bring its 'innovative spirit' to art and culture. Ron Augustus from Microsoft says they use 'technology and data' to create something like Rembrandt did with his paintbrushes. They had a computer analyse a large number of (portrait) paintings by Rembrandt to extract the typical Rembrandt 'nose', 'eyes', 'ears' and 'mouth', and to calculate the distances between them on a face. They planted these features on a typical Rembrandt subject, which not surprisingly is a 30- to 40-year-old Caucasian male, leading to the 'next Rembrandt' (or the 'average Rembrandt', maybe), printed with a 3D printer.

Even though I am very much impressed by the science behind all of this, the question remains: 'Why would you want to do that?' David de Witt from the Mauritshuis mentioned that Rembrandt was famous for being able to portray human emotions better than his contemporaries. Emotions of real people, immortalised by the grand master, are what make a Rembrandt a Rembrandt. This 'next Rembrandt', however, is nothing more than an average of all those emotions, planted on the average face of an average person who never existed, and as such not interesting. The creators may believe that Rembrandt would be pleased if he knew his work still lives on like this, but I find it more probable he would ask: 'Why did you do that? My real paintings are still displayed all over the world.'
At the end of the video ING's T. Benedictus concludes: 'The Next Rembrandt makes you think about where innovation can take us, what's next?' Despite my reservations about the product itself, I am indeed thinking of where this innovation could take us. The technology used to analyse Rembrandt's paintings could be invaluable for art historians. To name a few possibilities:

1) Use the data on the average Rembrandt to identify unknown paintings as belonging to Rembrandt (or not).
2) Compare the data on the average Rembrandt to data on the average contemporary Whatever Other Painter to see in what ways the grand master was really unique.
3) Compare average styles over time to see what developments took place.
4) Compare average styles per location to see what regional differences existed.

I would, for example, really like to know, and to see quantified, how the faces Rembrandt painted differed from those painted by other Dutch seventeenth-century masters, their sixteenth-century predecessors, or their eighteenth-century followers. Or to know what styles Rembrandt may have borrowed from sources that are not so apparent at first glance. To answer such questions, a huge database of paintings by a huge number of artists would have to be analysed in the same way as the work of Rembrandt.

Fortunately ING thinks it is important to bring innovation to culture, so it should only be a matter of time before they sponsor such a project.

Tuesday 8 September 2015

Digital Humanities: From Source Criticism to Tool Criticism *

The political history of the county of Holland in the first half of the sixteenth century is rather well documented. There was a lively correspondence between the president of the Council of Holland in The Hague, Gerrit van Assendelft, and the regent and stadtholder in Brussels. It is one of those coincidences of history that, just because stadtholder Anton van Lalaing frequently resided in Brussels (and not in The Hague), we have these sources at our disposal. This rich correspondence, however, places the historian, like the younger version of myself ten years ago, in a difficult position. By looking at history only through the eyes of Van Assendelft, the image gets distorted. His correspondence is biased by his personal views on friends, enemies and relatives, and by his personal interests. Other opinions and visions are hardly available, and when they are, for example when Van Assendelft was accused of corruption, heresy and nepotism, it is often difficult for the historian to assess which source is 'right'. Every student of history is therefore trained in source criticism and taught to approach the sources related to his or her subject as objectively as possible. Of course, few people would claim they can approach a document completely objectively. Everyone is shaped by his or her own time, location and surroundings, and develops sympathies and antipathies towards his or her subject.

So far nothing new. A good historian will always look at his or her sources critically and be aware that perspectives, including his or her own, are subject to change. What is less obvious, and what people seem to be only aware of to a small extent, is that tools for digital historical research are also far from objective. Just like a historian, a tool gathers data and uses it to provide a synthesis, answer or visualisation. Just like a historian, tools are filled with preconceptions and assumptions that can heavily influence the results of research. [1] If a tool always chooses a certain probability, for example that everyone without an exact date of birth lived before the twentieth century, this can be a useful filter for one research question, but could have large and unwanted repercussions for another. This realisation has the necessary consequences: every tool a historian uses should be criticised like a fellow historian, or even like a (sometimes very sloppy) co-author or student assistant. This means that all choices made when developing a tool should be made explicit, and that ideally the complex algorithms which form the core of a tool should be understandable for the person using it. There are very few historians, however, who have the necessary technical expertise, at least that of a bachelor in computer science, at their disposal to truly understand the finer nuances of computer code.

The question then is what could be done to bridge the gap between the historian and technology. The simplest answer, of course, would be that the historian must also become a computer scientist [2], or the other way around. Even though in the future there will hopefully be more such hybrid academics than now, it is unlikely that we will have thousands of them in the near future. One of my history teachers at university once said: 'A historian needs to be an amateur in every field.' Maybe it is enough to become an amateur in computer science as well. Traditionally, historians become amateurs in the fields of law, ancient languages, geography, archival science, art history, psychology, codicology and sociology. Computer science could simply be added to this list. Just like the other fields of study, computer science is an aid to interpret all of the available data correctly.

It is still an open question what level of amateurism in computer science is acceptable in order to use digital tools wisely. Since digital humanities is still an emerging field, this question has many answers. Historians have used methods from other fields to various degrees over the centuries. The historian of a hundred years ago could not have predicted that statistics is now a widely accepted skill for analysing historical material and that Latin is becoming obsolete in many curricula. I would say that the necessary and (for now) sufficient conditions for using digital tools properly are: 1) the availability of detailed documentation of the choices made by the computer scientist, and 2) an understanding of how a computer scientist works and why he or she had to make certain choices. Or in other words: to a certain extent we need to master the languages of the computer scientist passively, which is also the level at which historians grasp most other fields. I read medieval French, have a basic knowledge of the work of the sociologist Bourdieu, and I know what the legal terms in medieval verdicts mean. I would, however, never be able to speak medieval French (or even decent modern French), lack the knowledge to criticise Bourdieu's work, and have no clue whether a medieval verdict is in line with how justice was generally applied at the time ... and I get away with it.

To grasp how a tool works, historians should therefore not necessarily be able to convert a text to linked data themselves, but should understand at a basic level how this process works and what RDF triples are. This would entail a cultural change, in which tools are not used only as household appliances, but approached as the product of another academic field that needs to be examined critically before you can use it. Often historians stop at asking themselves how a tool can help them answer their questions, while the importance of knowing how a tool is built and can be approached eludes them. Without such knowledge there can be no decent tool criticism, which will become increasingly important alongside the familiar source criticism.
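To make the notion of an RDF triple slightly more concrete: linked data expresses every statement as a subject-predicate-object triple. The sketch below, in Python, uses plain tuples and made-up identifiers rather than a real triple store or actual URIs; it illustrates the model only, not any particular conversion tool.

```python
# A toy illustration of the RDF triple model: every statement is a
# (subject, predicate, object) tuple. The identifiers are made up.
triples = [
    ("ex:VanAssendelft", "ex:presidentOf", "ex:CouncilOfHolland"),
    ("ex:VanAssendelft", "ex:correspondedWith", "ex:VanLalaing"),
    ("ex:VanLalaing", "ex:residedIn", "ex:Brussels"),
]

def query(data, subject=None, predicate=None, obj=None):
    """Return all triples matching the given pattern (None = wildcard)."""
    return [t for t in data
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

# Who did Van Assendelft correspond with?
print(query(triples, subject="ex:VanAssendelft",
            predicate="ex:correspondedWith"))
# → [('ex:VanAssendelft', 'ex:correspondedWith', 'ex:VanLalaing')]
```

A real linked data stack would use proper URIs and a query language like SPARQL, but the pattern-matching idea, which is what a historian needs to grasp passively, is the same.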

* This is a (bad) translation and slight adaptation of my blog post of 24 June 2014.

[1] See the important article by B. Rieder and T. Röhle, 'Digital methods: Five challenges', in: D. M. Berry ed., Understanding Digital Humanities (2012) 67–84.
[2] Throughout the text 'computer scientist' can also be read as 'computational linguist'.

Thursday 13 August 2015

Biography of the future? Digital Humanities and a hypothetical biography of John de Witt (1625-1672)

Like any humanist scholar of the twenty-first century, the modern biographer must decide what to do, or not to do, with the advances of the so-called digital humanities. A wide variety of biographical sources, primary sources like correspondence and secondary sources like biographical dictionaries, have been brought online over the past decades and will increasingly be consulted by future biographers. The digital turn does more, however, than merely let biographers consult sources from behind their own computers rather than in a library or archive. The digitized sources can be analyzed in new, more advanced ways, visualizations of material help to reveal patterns, biographies can be presented in different ways, et cetera.
            The question remains whether this digital turn, this increasing availability of 'biographical data' and new ways to consult it, really changes the biographies of the future. This blog provides a critical reflection on the possibilities of digital humanities technologies for biographical research. I will take the life of grand pensionary John de Witt, the highest official of the Dutch Republic (1653-1672), as an example of how a biographer could use digital humanities technology for biographical research, and to illustrate its potential and shortcomings. The choice of John de Witt springs from personal interest, the availability of plenty of primary sources to consult, and the fact that he has already been the subject of several larger biographies, both in and outside the Netherlands.

I John de Witt: a life of diplomacy and writing

John de Witt is one of the most prominent figures in Dutch history. As grand pensionary of the province of Holland, the most powerful province of the Dutch Republic, he was considered to be the leader of the Republic and was treated with all honors by foreign rulers. His untimely death in 1672, murdered and ripped apart by an angry mob, has contributed to the fame of his legacy. To date there is no conclusive evidence to lay the blame for this murder on any particular person(s), though his political enemy, the prince of Orange, William III (the later king of England), is often mentioned as being (partly) responsible. Several biographies of John de Witt have appeared, in English and in Dutch.[2]

Statue of John de Witt in The Hague
De Witt's archival legacy is immense. From his role as a state official alone, eight meters of letters are preserved in the Dutch National Archive, bundled in twenty volumes and written in circuitous officialese. [3] He was not only a politician, but also a mathematician who is still considered a founding father of modern life insurance mathematics.
            To write a biography of De Witt is no small feat. The nineteenth-century politician J. R. Thorbecke noted that 'To give us a life of De Witt worthy of the man is to assure oneself a place among historians of all time.'[4] Rowen needed almost 1,000 pages to capture De Witt's life in 1978. Panhuysen used slightly more than 500 pages in 2005 to describe the life of De Witt in relation to that of his brother Cornelis. Recently, Prud'Homme van Reine needed more than 200 pages to describe the murder of De Witt and his brother alone.
            Rowen especially has consulted a tremendous variety of sources to compose his work. Necessarily, however, the biographer of De Witt, as of many other noteworthy persons in history, has to be selective in the topics to address and the sources to consult. One person can only 'close-read' a limited number of sources, especially in the twenty-first century, where there is no patience for academic output that takes longer than projects of four or five years. Digital humanities technology allows biographers to combine close reading with 'distant reading', in which large bodies of text are analyzed by the computer to facilitate finding patterns, testing hypotheses and finding leads for further research.

II Analyzing the texts
Computer software is very good at processing text, as long as it is modern text in a software-interpretable (digitized) format. The first problem we encounter when we want to use digital humanities technology for a biography of De Witt is that the correspondence of De Witt has not been digitized yet, and even if it had been, the OCR (Optical Character Recognition) software that would have to translate the handwriting into computer-readable text might not deliver great results. Computers can be trained, with the help of the crowd, to recognize handwriting and improve performance, but handmade transcriptions are definitely to be preferred.[5]
            Let us assume, however, that we have all eight meters of De Witt's correspondence in a computer-readable format of decent quality. The first thing we would want to know is everything about the texts themselves, to tell us a bit more about the man we are writing about. Questions we could ask with the help of digital tools, but could previously only answer with great difficulty, are: How long were De Witt's letters? Did he use many words per sentence and many sentences per letter compared to his contemporaries? (This could tell us something about his work ethic and personality.) How does this change over time, and why?
            For lack of De Witt's actual correspondence in a digitized format, we used a transcription of a famous political text by De Witt from 1654, his Deduction, as an illustration; in it he defends himself against serious allegations after conceding to the English lord protector Oliver Cromwell not to appoint a member of the House of Orange to the highest state offices. [6] Using simple and free online tools like Voyant Tools and WordCounter, we find that De Witt used 34,456 words in this political text, 5,185 of which are unique. He uses 749 sentences, with an average of 46 words per sentence. The most frequently used noun is 'provincien' (provinces). We can also see that he uses that word frequently throughout the entire text, and not just in one particular section, showing the importance of the relation between the different provinces of the Republic that De Witt addresses. A word like 'Brabant', the province, is only used in a very restricted part of the text. The name Orange (Oraingne), the political adversaries of De Witt, is used often at the beginning of the text, as well as slightly after the middle and at the very end.
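The counts that tools like Voyant Tools and WordCounter report can, in principle, be reproduced with a few lines of code once a transcription is available as a plain string. A minimal sketch in Python; the sample text is an invented stand-in for the Deduction, and the naive tokenization would need refinement for real seventeenth-century spelling and punctuation:

```python
import re

def text_stats(text):
    """Count total words, unique words, sentences and average sentence length."""
    # Naive rules: sentences end on . ! or ?, words are runs of letters.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-zA-Z]+", text.lower())
    return {
        "words": len(words),
        "unique_words": len(set(words)),
        "sentences": len(sentences),
        "avg_sentence_length": len(words) / len(sentences),
    }

# Invented sample standing in for a real transcription.
sample = ("De provincien van Holland ende West-Vrieslandt. "
          "De provincien hebben besloten. Oraingne was niet benoemt.")
stats = text_stats(sample)
print(stats["words"], stats["sentences"])  # → 15 3
```

Run over a full corpus of letters, the same function would answer the questions above about letter length and words per sentence, and allow comparisons between De Witt and his contemporaries.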

Voyant Tools as a means to analyze John de Witt's texts

Another word De Witt uses often is 'God' (in several spelling variations), which we find 37 times. If we wanted to write a paragraph in our biography about how religious De Witt was in his thinking, it would be a valuable exercise to compare the relative frequency of 'God' in this political text to the mentions of God in texts by other politicians of his time. This is necessary to contextualize De Witt and see how he compares to similar individuals. Once again, this would mean having access to full-text computer-readable versions of as many political texts as possible.
            Finally, when dealing with a wide variety of texts, we should definitely consider topic modelling, a popular exercise in digital humanities. With topic modelling the computer extracts topics from texts by looking at words that occur together in statistically meaningful ways. In this way we could globally assess what is discussed in which documents, without having to read them in full. If, for example, we wanted to know in which letters De Witt mentions the strength of the Dutch fleet, topic modelling could point the way.

III Patterns in Networks
One of the main techniques for biographers to contextualize their individual is analyzing the networks he or she was part of. The analysis of correspondence with digital methods is a key component in finding out who had contact with whom and for what reasons. John de Witt was a statesman who led the anti-Orange party, corresponded with foreign colleagues, ambassadors, scholars and international friends, and had an extensive patronage network. It is therefore of great interest to map his correspondence. His biographers have also used his letters extensively to define his relationships with other people.
            A relatively simple exercise would be to analyze all the recipients of John de Witt's letters and all the people who sent him letters. If you visualized this with maps, graphs and figures (several off-the-shelf tools exist to do this), you would get a picture that allows us to see patterns we could not see before.[7] Stanford University's Mapping the Republic of Letters provides good examples of such visualizations. If we take the use case of the French philosopher Voltaire, we can see graphs of the nationality and social background of his correspondents. All is visualized on a map with the most modern techniques. For one, we may deduce that Voltaire's correspondence was not as cosmopolitan as he might have wanted it to appear.
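A first version of such a sender-recipient analysis can be sketched with a general-purpose network library such as NetworkX. The correspondents and letter counts below are invented placeholders, not figures from De Witt's actual archive:

```python
import networkx as nx

# Hypothetical (sender, recipient, number of letters) records.
letters = [
    ("Johan de Witt", "Christiaan Huygens", 5),
    ("Christiaan Huygens", "Johan de Witt", 3),
    ("Johan de Witt", "Coenraad van Beuningen", 12),
    ("Coenraad van Beuningen", "Johan de Witt", 9),
]

# Build a directed graph: an edge points from sender to recipient.
g = nx.DiGraph()
for sender, recipient, n in letters:
    g.add_edge(sender, recipient, weight=n)

# Weighted out-/in-degree shows how many letters De Witt sent and received.
sent = g.out_degree("Johan de Witt", weight="weight")
received = g.in_degree("Johan de Witt", weight="weight")
print(sent, received)  # → 17 12
```

From such a graph, off-the-shelf functions can compute centrality measures or export the data to the kinds of map and graph visualizations that Mapping the Republic of Letters demonstrates.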
            An initiative directly relevant to John de Witt is the Circulation of Knowledge and Learned Practices in the 17th-century Dutch Republic. Even though De Witt was primarily a statesman, he is likely to have been in contact with the most prominent intellectuals of his time.[8] When searching the database we find five letters from the scientist (theoretical physicist) and inventor Christiaan Huygens to De Witt, from 1658 to 1670. These letters seem to give insight into a quite formal relationship between the two, in which Huygens calls himself De Witt's 'humble servant', as was the custom at that time.
            In one of these letters (1 February 1664) we also find mention of 'lord Brus, brother-in-law of the lord of Somerdijck'. Ideally, we want to know who this lord Brus is and what his relation to De Witt might have been, and the same goes for any other people mentioned in the letters to and from De Witt. We also want to be able to match this lord Brus to other mentions of him in the correspondence. A computer is able to search for other instances of lord Brus, but it cannot easily determine whether two such mentions refer to the same person, other than giving a negative match if two different Brusses are mentioned in letters that are chronologically too far apart.

John de Witt, by Adriaen Hanneman
Another problem is differences in the spelling of names and in the way people are called or call themselves. Christiaan Huygens signed the same letter as 'Chr. Huygens van Zuylichem', after the castle and estate his father had acquired in 1630. Similarly, a computer would have great difficulty matching a mention of a 'master Vincent' to the right person without knowing the context. The problem of recognizing and matching individuals automatically (NERD: named entity recognition and disambiguation) is common in projects that deal with biographical data. Statistics combined with domain expert knowledge are, however, increasingly successfully applied to match names in separate documents.[9]
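A crude first step towards such matching is plain string similarity. The sketch below uses Python's standard difflib module, with a simple normalization step and examples chosen for illustration; real NERD systems combine string similarity with context, dates and domain knowledge:

```python
from difflib import SequenceMatcher

def normalize(name):
    """Lowercase and strip abbreviation dots for a fairer comparison."""
    return name.lower().replace(".", "").strip()

def similarity(a, b):
    """Similarity between two names, from 0.0 (different) to 1.0 (identical)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

# Spelling variants of the same person score high...
print(round(similarity("Johan de Witt", "John de Witt"), 2))  # → 0.96
# ...but similarity alone cannot settle harder cases.
print(similarity("Christiaan Huygens",
                 "Chr. Huygens van Zuylichem") > 0.9)  # → False
```

The second comparison shows why similarity scores alone are not enough: the two Huygens variants refer to the same man yet score low, which is exactly where context and expert knowledge must come in.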

IV Comparisons
When using computers you want to make use of the strength of their calculating power. For biographers it is particularly interesting to compare their individual to similar people, to be able to frame the individual in the context of his or her time.[10] To this end we need structured biographical data on as many people as possible.
            In the case of John de Witt there are several groups of people to whom we want to make comparisons. De Witt started to study law at the University of Leiden at the age of sixteen. If we were able to draw up schematic biographies of all students in the years he studied there, we would know how he compared to them with regard to age, social and geographical background, and later careers. De Witt was pensionary of Dordrecht shortly before becoming grand pensionary of Holland. By analyzing the previous office holders, and holders of the same office in other towns, we would get to know how unique his appointment was at that time. The same goes for the office of grand pensionary and practically any other network or group De Witt belonged to. Such prosopographical, structured analyses would allow us to make much more strongly substantiated claims about the person of De Witt than is usually done in biographies.
            The reason that such extensive comparisons often do not find their way into biographies is that they are very time-consuming. With the help of digital methods, however, such comparisons can be made more easily. In order to do so we would need a great deal of biographical data online in a structured format.
 Digital biographical data can either be 'digitized', for example from a scanned book, or 'born digital'.[11] We can make a distinction between resources with primarily 'generic' biographical data, like dates and places of birth, marriage dates, and dates and places of death, and resources with more narrative data describing a person's life. If, for example, we take the Wikipedia entry on John de Witt, we have a quite extensive biography of his life (narrative data), accompanied by an infobox with structured data on his life. For a computer it is relatively easy to read and analyze the infoboxes, but difficult to interpret the main text.
 Wikipedia is the biggest player in the field of online biographical data. Several studies have shown that for factual knowledge Wikipedia can compete with authoritative sources from professionals, especially when it comes to data on people.[12] DBpedia publishes the data of Wikipedia's infoboxes in linked data format. This allows analyses across different datasets and therefore increases the potential to compare John de Witt to different groups of people. Structured data on no fewer than three quarters of a million individuals worldwide are available through DBpedia. The advantages of having biographical data online are recognized by editors of biographical dictionaries as well.[13] These dictionaries contain relatively short biographies of people who were considered worth describing at the time of publication. Especially since the nineteenth century, such dictionaries have been published in multiple volumes all over Europe, containing descriptions of thousands of people.[14] They form a rich source of information that no human could ever fully analyze with traditional methods.

V Publishing the Findings

Biographers, like historians in general, have a tendency to amass a vast amount of information on the person who is the topic of their research. Already in 1946 the Dutch biographer Jan Romein noted that biographies were often too long. The ideal length of a biography would be 200 pages, for which the biographer makes a conscious selection from his source material.[15] Looking at the most prominent biographies of John de Witt, and at the length of modern biographies in general, we must conclude that Romein's 200-page limit is rarely adhered to. Even if this is not necessarily a problem, it does reflect an above-average struggle of biographers to select the right material and to keep a book within a certain page limit. Digital publishing might be the answer to both producing a manuscript of manageable size and being as 'complete' as possible. The monograph is no longer the only way, or maybe even the most evident way, to publish your work.[16] With more authoritative sources online, a change of attitude towards digital sources has taken place as well. Recently, for example, academics have started to prefer the online version of the Oxford Dictionary of National Biography over the printed version.[17]
            There are many advantages to digital publishing. First of all, if we published our new biography of John de Witt online, we could easily rectify mistakes. If, for example, we had identified the previously mentioned lord Brus in the correspondence incorrectly, we could simply correct that and account for the change.
            Secondly, digital biographies make it easier to let go of the traditional narrative of an individual's life from birth to death. We could divide our biography into 'themes' (e.g. the murder of John de Witt), provide hyperlinks to more detailed information (e.g. the Dutch Republic and the navy), and even publish the source material we used, or left out, for a particular topic (e.g. the letter from Christiaan Huygens to John de Witt of the previous paragraph). The Ludwig Boltzmann Institut für Geschichte und Theorie der Biographie is a forerunner in publishing biographies this way. In particular, they work on alternative modes of presentation for the lives of the Austrian writers Ernst Jandl and Karl Kraus. They developed a content management system called Biographeme, "which breaks down the closed linear mode of life narratives in favour of a modular form of biography, the individual components of which can be combined and recombined according to interest or the question asked."[18]
            Thirdly, the digital era offers unprecedented possibilities for researchers to also put their raw material online. We could put transcriptions of John de Witt's letters, systematically gathered information in a database and, when copyright permits, original material online as part of an online publication. This would be a highly efficient way to facilitate further research and to allow others to check our findings. It is also a good way to show that the tax money invested in the research was well spent. It could, or should, be part of the data management policies at academic institutions to facilitate storing these data and making them available for further use. This would, of course, also mean that a biographer should not 'claim' his or her subject, as is sometimes the case, but let go of the subject once a biography is published (or maybe even before that).[19]
            Finally, there is the possibility of collaborating on a project if you put it online before a printed publication. By bringing your material online you open up a dialogue that sits somewhere between writing and speaking.[20] In our case, for example, we could simply ask online whether anyone knows who lord Brus was. We could also put our NERD results online and ask visitors of the website to spot incorrect matches. This way we avoid too many computer-generated mistakes and get more input to refine our algorithms.


VI Problems
In comparison to previous biographies of John de Witt, this hypothetical biography would be based on more resources, include quantifications of the topics John de Witt addressed most frequently, say more about John de Witt as an individual compared to his contemporaries, include detailed and strongly visualized network analyses, and be presented in a more dynamic way, with room for all the material we have gathered. Unfortunately, at the moment this remains largely hypothetical.
            Perhaps the most fundamental problem which needs to be discussed is that of representativeness. It is self-evident, but cannot be stressed often enough, that the scope of digital research is limited by the availability of computer-readable digital data. The results of any exercise with computational methods should therefore be accompanied by a critical account of the completeness and biases of the sources. It would be very nice if we had all of John de Witt's correspondence in a machine-readable format, but that is not going to happen any time soon. Even just digitizing his correspondence would cost a lot of time and money.[21] In general, there is a huge amount of material that has not been digitized (yet) and remains outside the scope of digital humanities research.[22]
            Another problem is the relatively high ratio of mistakes and inconsistencies one has to deal with when using digital tools on biographical data. The OCR quality of digitized texts alone can lead to problems in the analysis, especially when looking for names.[23] In the case of De Witt, his seventeenth-century handwriting poses another (high) hurdle to take.
            Finally, computers may be great at making calculations, but they are bad at interpreting text. They can only work with the algorithms we feed them. If we ask them to match names from different documents, they are bound to make more mistakes than a human would. It is, for example, difficult for a computer to separate the lord of Wassenaar from the location Wassenaar. They would, however, do the work much faster. Detailed documentation of how computer tools were used for our research on John de Witt should be provided so that other researchers can check the analyses. Unfortunately, complex tools are more likely to provide precise results, but they are more difficult to comprehend at a more than basic level for users without technical training.[24]

I have discussed the potential of digital humanities tools for a hypothetical biography of John de Witt. This hypothetical biography is based on more documents and provides detailed network analyses, comparisons to contemporaries and visualizations. Some questions which could not easily be answered before, like how religiously influenced De Witt's writing is compared to that of his contemporaries, could even be answered. The hypothetical biography would be presented in a dynamic and interactive manner, providing possibilities for additions, dialogue and links to more information.
            Despite the apparent opportunities for biographical research, there is still a long way to go before this hypothetical biography of John de Witt could be written. First of all, many more archival sources should be put online in a format that makes them suitable for analysis with digital methods. Right now, we are still at the beginning of digitizing our cultural heritage. Techniques to translate handwriting into computer-readable text are likewise still in the early stages of development.
            Some things, however, can already be done for biographical research thanks to digital humanities technologies. New forms of presenting biographies already exist. In this chapter we have also shown some very basic examples of what can be done with digitized texts relating to the life of John de Witt. Even though the analyses do not provide any conclusive evidence, they do inspire new research questions and provide leads as to what may be worth investigating further.


[1] All URLs in the notes were last retrieved on 1 July 2015.
[2] Most notably: Panhuysen, De ware vrijheid; Rowen, John de Witt; Prud’Homme van Reine, Moordenaars van Jan de Witt.
[3] Panhuysen, De ware vrijheid, 7.
[4] Cited by Rowen, John de Witt, vii.
[5] See for example the Monk project.
[6] De Witt, De Deductie van Johan de Witt.
[7] Geerlings, 'A Visual Analysis of Rosey E. Pool’s Correspondence Archives', 65.
[8] Rowen, John de Witt, chapter 20.
[9] E.g. Bell and Ranade, 'Traces through time'.
[10] Ter Braake, ‘Het Individu en zijn Tijdgenoten.’
[11] See on this also: Haber, Digital Past, 103; Zaagsma, ‘On Digital History’, 24.
[12] Arthur, 'Exhibiting History', 33-50; Currie, 'The Feminist Critique'; Haber, Digital Past, 75-79; Liu, 'Where is Cultural Criticism in the Digital Humanities?', 496; Rosenzweig, 'Wikipedia', 62.
[13] E.g. Reinert, Schrott and Ebneth, 'From Biographies to Data Curation', 13-19.
[14] Caine, Biography and History, 49-53; Slocum, Biographical Dictionaries, xv- xvii.
[15] Romein, Biografie, 178-181.
[16] Van den Akker, 'History as Dialogue'; Rigney, 'When the Monograph Is No Longer the Medium'.
[17] Carter, 'Opportunities for National Biography Online', 345.
[18] Hannesschläger and Prager, 'Ernst Jandl and Karl Kraus', 1.
[19] On the claims of biographers on their subjects: Wilson, 'A Love Triangle.'
[20] Van den Akker, 'History as Dialogue'.
[21] Jeurgens, 'The Scent of the Digital Archive', 45.
[22] Sample, 'Unseen and Unremarked On', 187-88, 198-99; Earhart, 'Can Information Be Unfettered?', 309-314.
[23] Piersma and Ribbens, 'Digital Historical Research', 93.
[24] Lin, 'Transdisciplinarity and Digital Humanities', 306.

Reference List

Caine, B. Biography and History. Houndmills: Palgrave Macmillan, 2010.
Carter, Ph. “Opportunities for National Biography Online: The Oxford Dictionary of National Biography, 2005-2012.” In The ADB’s Story, edited by Melanie Nolan and Christine Fernon, 345–71. Canberra: ANU Press, 2013.
Haber, P. Digital Past. Geschichtswissenschaft im digitalen Zeitalter. München: Oldenbourg Verlag, 2011.
Lin, Y-w. “Transdisciplinarity and Digital Humanities: Lessons Learned from Developing Text-Mining Tools for Textual Analysis.” In Understanding Digital Humanities, edited by David M. Berry, 295–314. Basingstoke: Palgrave Macmillan, 2012.
Panhuysen, L. De ware vrijheid: de levens van Johan en Cornelis de Witt. Amsterdam: Atlas, 2005.
Prud'homme van Reine, R. B. Moordenaars van Jan de Witt : de zwartste bladzijde van de Gouden Eeuw. Utrecht: De Arbeiderspers, 2013.
Rigney, A. “When the Monograph Is No Longer the Medium: Historical Narrative in the Online Age.” History and Theory 49 (2010): 100–117. doi:10.1111/j.1468-2303.2010.00562.x.
Romein, J. De Biografie. Een Inleiding. Amsterdam: Ploegsma, 1946.
Rosenzweig, R. “Wikipedia: Can History Be Open Source?” In Clio Wired. The Future of the Past in the Digital Age, 51–82. New York: Columbia University Press, 2011.
Rowen, H. H. John de Witt, Grand Pensionary of Holland, 1625-1672. Princeton, NJ: Princeton University Press, 1978.
Slocum, R. B. Biographical Dictionaries and Related Works. An International Bibliography of Collective Biographies, Bio-Bibliographies, Collections of Epitaphs, Selected Genealogical Works, Dictionaries of Anonyms and Pseudonyms, Historical and Specialized Dictionaries. Detroit: Gale Research Co., 1967.
Ter Braake, S. “Het Individu En Zijn Tijdgenoten. Wat Een Biograaf Kan Doen Met Prosopografie En Biografische Woordenboeken.” Tijdschrift Voor Biografie 2, no. 2 (2013): 52–61.
Wilson, F. “A Love Triangle.” In Lives for Sale. Biographers’ Tales, edited by M Bostridge, 38–42. London and New York: Continuum, 2004.
Witt, J. de. De Deductie van Johan de Witt: manifest van de ware vrijheid uit 1654. Ingeleid en hertaald door Serge ter Braake. Arnhem: Sonsbeek Publishers, 2009.

Thursday 11 June 2015

'Unfortunately our Computer Scientist isn't here': DHBenelux 2015 and the search for the Renaissance digital humanist

DHBenelux has quickly become THE meeting place for scholars working in the field of digital humanities in the Benelux. The purpose of the conference is to gather senior and junior researchers to discuss the latest DH projects, methods and technologies. With an acceptance rate of 94% of the submitted abstracts for this year's edition in Antwerp, it is a feel-good, low-threshold meeting place. Not only were many DH researchers from the Netherlands, Belgium and Luxembourg present, but also a few researchers from neighbouring countries.

With two keynote speakers, six parallel sessions in three rooms, and a poster and demo session, DHBenelux 2015 had a full program, spread over two days. I was present with my BiographyNet colleague Antske Fokkens and our two Academy Assistants, Yassine Karimi and Miel Groten, who presented their research on changing perspectives on people in biographical dictionaries over time.

DHBenelux group photo, after dinner at the Antwerp Zoo. Photo Saskia Scheltjens

The program offered a nice mix of presentations from different kinds of digital humanists. I think we can divide them into a few categories, of course with the necessary overlap:

1) Humanist researchers who have little background in DH but try to apply ready-made tools to their own datasets.

2) Humanist researchers who are experienced in using DH tools for their research but do not build tools themselves.

3) Humanist researchers who are part of an interdisciplinary team that builds DH tools (a category I belong to myself).

4) Computer scientists (a minority) who build DH tools in collaboration with humanist researchers (from any of the other categories).

Whom we may have missed at DHBenelux, though I believe this is a general problem, are researchers who have an extensive knowledge of both the humanities and digital technologies. I was present with a computational linguist and a (future) computer scientist and therefore did not have to fear any difficult technical questions. Other teams, however, were often represented only by the humanists, or relied only on ready-made tools. This led on several occasions to an apology for not being able to answer a question 'because our computer scientist isn't here.' Maybe even more worrying is that tools are used without really knowing what they exactly do (black-box tooling). You can compare the outcomes of different tools and evaluate what seems to work best, but it is also necessary to know why some tools work better than others, to explain the differences and, if necessary, to be able to adjust a tool for your own purpose. DH tools cannot be treated like text-editing programs that may or may not function properly. DH tools read, interpret and manipulate the humanist data that are the core of our research. Proper tool criticism is necessary when using them.
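The black-box problem can be illustrated with a toy example of my own (not an actual DH tool): two "tools" that both claim to extract capitalized name candidates from a sentence quietly disagree, and only by reading their code, rather than merely comparing their outputs, can you explain why.

```python
import re

text = "Johan de Witt met Lord Wassenaar in The Hague."

def tool_a(text):
    """'Tool' A: any whitespace-separated token that starts with a capital."""
    return [t for t in text.split() if t[0].isupper()]

def tool_b(text):
    """'Tool' B: capitalized words only; the regex strips punctuation."""
    return re.findall(r"\b[A-Z][a-z]+\b", text)

a, b = tool_a(text), tool_b(text)
print(a)  # tool A keeps the trailing period on 'Hague.'
print(b)  # tool B does not
```

If you only saw the two output lists, you would know the tools differ on one item but not why; opening the box reveals that the difference is an arbitrary tokenization choice, which is exactly the kind of thing tool criticism should surface.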

The question remains, of course, and has been asked many times before, how digital humanists, and current humanities students, can best be trained. What is practical? What is needed? Where do we find the time and the resources? But maybe it is just a matter of waiting for the next generation to take over. At DHBenelux there were quite a few very young junior researchers who are already becoming experts in both the humanities and computer science (or at least programming) and who will be more comfortable uniting the best of both worlds than most more experienced academics are now.

Wednesday 12 November 2014

HistoInformatics, Barcelona 10 November 2014: the need for sceptics and heretics

This November the 6th International Conference on Social Informatics (SocInfo) took place in Barcelona, bringing together researchers from the social sciences and computer sciences worldwide. In conjunction with the main conference, a set of workshops was held on related topics. One of these was HistoInformatics 2014, the second international workshop on computational history. As far as I know this is the only workshop dedicated solely to the use of computational methods for historical research, which sets it apart from the many international gatherings on digital humanities in general.

Plaça de Catalunya, Barcelona, 9 November

The organizers claim they had to be quite rigorous in accepting papers for this workshop, since they received many submissions. Eleven papers made the cut, including a paper by Antske Fokkens, Fred van Lieburg and myself on making an 'old' historical dataset of Dutch ministers available for current and future enhanced use. Considering the claimed high rejection rate, it could be expected that the best of what digital history currently has to offer would be presented there. So with good hopes I took the plane to Barcelona, together with VU professor of Computer Science Guus Schreiber.

So was the workshop as good as anticipated? I would say maybe and no. Let me start with the maybe. Were the papers presented really the best of what digital history has to offer? This is impossible to tell, since all participants came from Western Europe, which leaves out a huge field of digital historians from the United States, Asia and Australia. The papers dealt with topics and techniques like linking different datasets, automatic data extraction, Named Entity Recognition, and methodological reflections on the use of computational techniques for historical research. All of these are worthwhile topics to investigate, and they resulted in interesting and useful papers. Overall, however, I did not hear much that was new to me, and I feel this was true for most other participants as well. Maybe this is just the current state of digital history, that we all know what we are and should be doing, or maybe there was a lack of groundbreaking submissions for this conference only. It does make me wonder, however, whether initiatives like these are a bit too much 'preaching to one's own church'. Should we not rather concentrate on missionary work to convince other historians of the benefits of what we are doing?

Prof. dr. Guus Schreiber presenting his paper

Then there is the no of the workshop, which may or may not have everything to do with the signalled lack of truly innovative papers. The discussions lacked liveliness and there were hardly any questions that took the presenters by surprise. This may partly be the fault of the location of the workshop: a lecture room in one of the buildings of the Universitat Pompeu Fabra. The room was dark, the way people were seated did not facilitate interactive discussion, and there were no facilities to stimulate a more informal exchange of ideas during the breaks. The fact that the keynote speaker did not show up, I assume for very good reasons, of course did not help to stimulate a great sense of commitment to the workshop either. While one would think that the city of Barcelona normally provides a more than stimulating environment, the city now only underlined the darkness and greyness of the workshop venue.

Sagrada Família, Barcelona, 10 November

Of course not everything in academia has to be exciting, and it is perfectly OK to have a 'boring' workshop in a boring location. When there is a general consensus on what is presented, however, one might wonder whether it would not be a good idea to stimulate contributions and attendance from digital sceptics and digital heretics as well. Maybe invite a keynote speaker who does not believe at all in the benefits of digital history and who keeps everyone sharp during the day? I at least look forward to giving a presentation on the same topic for an audience of hopefully sceptical humanities scholars on 21 November.

Tuesday 8 July 2014

Academic Media Literacy

Media literacy was defined in 2005 by the Raad voor Cultuur (the Dutch Council for Culture) as "... the whole of knowledge, skills and mentality with which citizens can move consciously, critically and actively through a complex, changeable and fundamentally mediatized world."[1] In the twenty-first century, people are bombarded with a heterogeneous stream of information from a diverse range of channels. News no longer reaches people only through the NOS television news and the newspaper, but also through social media such as Twitter and Facebook and through more static websites of various signatures. Pupils are increasingly taught to deal wisely not only with the information they take in, but also with the information they themselves send out into the world. Through copy-pasting, news items are quickly reproduced, which sometimes leads to blunders, even by national newspapers.[2]
                Media literacy is also a relevant concept for the academic world. On the one hand, there is of course the problem that websites often do not cite a source for their information, which makes its value difficult to verify. Wikipedia has in this respect acquired an increasingly good name, thanks to the self-correcting capacity of its users and the use of references and a standard format. On the other hand, the possibilities for spreading information at a rapid pace in particular have caused researchers quite a few headaches. Genealogist Kees van Schaik eloquently described the resulting problems in De Nederlandsche Leeuw in 2013: "The problem with incorrect lineages on the internet is the persistence with which they survive. Because internet genealogists only have an eye for persons who appear on the internet, doubles who are not visible online are overlooked. Secondary sources that could confirm baptism, marriage and burial records are rarely consulted. Source references are lacking. Errors therefore spread like a flu virus and eventually form a canon that completely overgrows a sound publication. It is a hopeless task to refute all that nonsense by means of well-documented publications."[3] The danger of the rapid spread of information, then, is that if it is incorrect, it is very difficult to correct in the media and in public opinion.
                Van Schaik, however, also puts his finger on another sore spot: even if the available and universally used information is correct, that does not necessarily mean that good scholarship is being done with it. We are flooded with information, and few academics can still keep up with all the literature in their own field. There could therefore be a tendency to use only, or mainly, the sources and literature that are available online. In doing so, however, you run the risk that only what is already known gets rearranged and analysed a thousandfold, and that new insights lying outside the now well-trodden media paths are left out of consideration. The utopia of a society in which we think we have all information can thus quickly take the shape of a golden cage into which nothing new or refreshing ever penetrates. Related to this, we should of course be very happy with citation indexes, but often (always?) these are based only on journals published online. People who want to know something about a certain subject end up at a certain article, which is then cited again, whose citation index consequently shoots up, and which is therefore cited even more. Google will subsequently also return this article increasingly often as the most relevant hit for a certain search query.
                  This "information rehashing" process is further reinforced by the fact that people remember things they share with others more readily.[4] More unique finds, for which there is thus a smaller frame of reference, disappear from active working memory all the faster. Articles of comparable quality to the so-called top articles that everyone cites thus threaten to fall undeservedly into oblivion. Ironically, that was for example the fate, until the 1970s, of Maurice Halbwachs's now classic work on collective memory, until it was 'rediscovered'. In addition, there is the danger that people refer almost dutifully to certain works on a subject, because that is simply what seems to be required, while they have hardly looked at the work itself, if at all.
                Now these problems are certainly not new, but digital media have given them a greater urgency. For the scholar, this means above all continuing to do well what a scholar is trained to do: taking in the most relevant information in a responsible way. That responsible way implies also looking beyond the often wonderful possibilities of the digital world and trying to break out of the internet bubble in which he or she may find himself or herself. Historians in particular must keep diving into the archives, search for relevant literature off the beaten track, and not let themselves be guided only by digitized sources.

[2] For example, De Telegraaf several times copied items from the satirical page of "De Speld" in De Pers. (retrieved 31-10-2013).
[3] K. van Schaik, ‘Pek en Veren’, De Nederlandsche Leeuw 130 (2013) 54.
[4] M. Halbwachs, Das kollektive Gedächtnis. Mit einem Geleitwort zur deutschen Ausgabe von Heinz Maus (Frankfurt am Main 1991) 29.