Data Exchange and Strategic Planning

Abstract

WG 5 aims to negotiate principles and standards of data exchange as well as plans for digitizing and making freely accessible learned correspondence from the early modern period. Building on the survey of collections of printed and manuscript letters, inventories, and finding aids undertaken in WG 4, WG 5 will develop a master plan for a joint digitization program. In addition, the WG will address issues of Open Access and legal restrictions, content syndication (e.g. to EMLO, Europeana, etc.), the development of worldwide unique persistent identifiers for letters, the connecting of letters through semantic web techniques, and the long-term archiving of digitized letters.

WG 5 is led by Dr Thomas Stäcker, the Deputy Director of the Herzog August Bibliothek, Wolfenbüttel, responsible for the library’s programme of digitization and the Wolfenbüttel Digital Library.


Agenda


The census of collections of early modern learned correspondence (pursued in WG 4) should be complemented by investigation into the most efficient and reliable methods of generating large quantities of digital catalogue records.

  1. For generating metadata on uncatalogued collections, WG 5 needs to explore the rapidly developing area of crowd-sourcing — scholarly, semi-scholarly and otherwise. Zooniverse is piloting flexible software for this purpose, and is looking for humanistic projects on which to pilot it (Oxford to lead).
  2. For existing catalogue data in traditional card files and related formats, experience of the most cost-effective and reliable means of scanning and keying should be pooled (library community to lead).

  1. This Action devolves the negotiation of individual components of the data model to the specialists assembled in WGs 1–4. WG 5 will chair the Action subcommittee within which these individual components will be integrated into a comprehensive standard data model.
  2. Since individual letters can exist in multiple manifestations (drafts, copies, extracts, abstracts, etc. in manuscript, print, and digital form), a unique identifier scheme for individual letters is needed. This project should build on the experience of similar schemes for people (e.g. VIAF) and printed books (e.g. VD 16).
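
One way to picture such a scheme is to separate the abstract letter, which carries the unique identifier, from the manifestations that witness it. The following is a minimal sketch under stated assumptions: the class names, the fields, and the `letter:` prefix with a random UUID are hypothetical illustrations, not a format proposed by this Action or used by VIAF or VD 16.

```python
import uuid
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Manifestation:
    """One witness of a letter (field names are hypothetical)."""
    kind: str      # e.g. "draft", "copy", "extract", "abstract"
    medium: str    # "manuscript", "print", or "digital"
    location: str  # free-text shelfmark, citation, or URL


@dataclass
class Letter:
    """The abstract letter shared by all of its manifestations."""
    sender: str
    recipient: str
    date: str  # ISO 8601 where known
    # Assumed identifier format: a prefix plus a random UUID.
    letter_id: str = field(default_factory=lambda: f"letter:{uuid.uuid4()}")
    manifestations: List[Manifestation] = field(default_factory=list)

    def add(self, kind: str, medium: str, location: str) -> Manifestation:
        """Attach a new manifestation without changing the letter's identifier."""
        manifestation = Manifestation(kind, medium, location)
        self.manifestations.append(manifestation)
        return manifestation


# Placeholder data, not a real letter.
letter = Letter(sender="Sender A", recipient="Recipient B", date="1650-01-01")
letter.add("draft", "manuscript", "Example Library, MS 1")
letter.add("copy", "print", "Example printed edition, p. 10")
```

The identifier is assigned once, when the letter record is created, and stays fixed however many drafts, copies, or digital surrogates are later attached to it.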

The census of correspondence collections (produced by WG 4) could also provide one basis for drawing up a master plan to digitize collections of learned correspondence.

  1. Collections of printed correspondence now out of copyright can be digitized and web-mounted in the manner pioneered by CERA. WG 5’s initial role is to recommend standards and best practice.
  2. Collections of manuscript correspondence can be digitized and web-mounted.
  3. The challenge of funding such an enterprise could be assisted by drawing together information on various relevant funding schemes at the European, national, regional, civic, and institutional levels.

Generating large quantities of standardized digital metadata should be coupled with a campaign to persuade those in possession of such data to share it.

  1. Part of this challenge is legal: we need to develop a range of Open Access policies for metadata, images, documents, and full digital editions (drawing in this case on the experience of CC, Open Data, GNU, etc.).
  2. Another part of this challenge relates to scholarly conventions: we need to develop a concise and accurate citation standard for digital materials. This might involve employing a persistent identifier scheme (such as DOI, URN, or Handle), as well as XPath or XPointer techniques to cite more granular portions of text. A particular challenge is that, unlike printed materials, digital resources change over time, for which versioning pointers might be developed.
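
A granular, versioned citation of this kind could combine three parts: a persistent identifier, a version pointer, and a path into the text. The sketch below is only an illustration under stated assumptions: the `pid@version#path` syntax, the `Citation` class, the `resolve` function, and the in-memory `STORE` are all hypothetical, the identifier is a placeholder rather than a real DOI, URN, or Handle, and the path uses the limited XPath subset supported by Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    pid: str      # persistent identifier (placeholder value)
    version: str  # versioning pointer
    path: str     # locator in the ElementTree XPath subset

    def __str__(self) -> str:
        # Assumed citation syntax, invented for this example.
        return f"{self.pid}@{self.version}#{self.path}"


# Toy store standing in for a repository: (pid, version) -> XML text.
STORE = {
    ("pid:example-letter", "v1"): "<letter><p>First draft.</p></letter>",
    ("pid:example-letter", "v2"): (
        "<letter><p>Revised text.</p><p>Postscript.</p></letter>"
    ),
}


def resolve(citation: Citation, store=STORE) -> str:
    """Return the text of the element that the citation points to."""
    root = ET.fromstring(store[(citation.pid, citation.version)])
    node = root.find(citation.path)
    if node is None:
        raise LookupError(f"No element matches {citation.path!r}")
    return node.text


cited = Citation("pid:example-letter", "v2", "./p[2]")
passage = resolve(cited)
```

Because the version is part of the citation, a reference to the first paragraph of `v1` keeps resolving to the same words after `v2` revises that paragraph.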

With such arrangements in place, WG 5 will coordinate a campaign to encourage contributions of relevant metadata, images, texts, and editions from a range of potential contributors.

  1. Repositories contributing metadata render their collections more visible and discoverable.
  2. Publishers of copyrighted editions of correspondence may regard digital catalogue records as advertisements for their products.
  3. Collaborative research projects gain access to the digital tools and larger pools of data available on shared infrastructure.
  4. Individual researchers render contributed data accessible and future-proof at minimal trouble and cost.

Issues of long-term preservation and sustainability require careful consideration.

  1. One technical challenge is to develop means to allow central repositories of digitized data to be regularly upgraded and updated without contaminating the data or disrupting the functionality of the digital tools developed to process them.
  2. Another is to develop an ontology cloud which allows data models, standards, and authorities to be incrementally refined over time as well. Here the experience of Aalto’s Semantic Computing Research Group will be particularly valuable, linking this strategic strand with work being undertaken in WG 2.
  3. The financial challenge is to develop funding sources and mechanisms which allow the preservation and upgrading of both data and platforms to be sustained indefinitely.