Cover Sheet for Proposals
JISC Grant Funding Call

Name of Programme & Strand: Information Environment 2011 Programme: Deposit of research outputs and Exposing digital content for education and research
Programme Tags: "INF11" and "JISCexpo"
Name of Call Area Bidding For: Strand B - Expose
Name of Lead Institution: University of Birmingham
Name of Department where project would be based: Institute for Textual Scholarship and Electronic Editing, School of Philosophy, Theology and Religion
Full Name of Proposed Project: Linking documents, works and texts
Full Contact Details for Primary Lead and/or Contact for the Project:
Name: Peter Robinson
Position: Senior Research Fellow
Email: p.m.robinson@bham.ac.uk
Tel: 0121 415 8441
Skype/VoIP: peterrr73
Address: College of Arts and Law, University of Birmingham
Postal Code: B29 6LG
Length of Project: 9 months
Project Start Date: 1 June 2010
Project End Date: 28 February 2011
Total Funding Requested from JISC: £48,315
Funding Broken Down over Financial Years (April - March): £48,315
Project Description / Abstract: This project will deploy an ontology of works, documents and texts to create over 500,000 RDF records. It will link these to item-level records for the works and documents and to books and articles dealing with specific text segments and manuscripts. These records will be made available over the web for harvesting by RDF federators. The project will create and document a public access demonstrator, showing how a dynamic interface can be built on the harvested RDF records. A public report will outline the lessons learnt from this work.
Keywords describing project: Web 2.0; Resource Discovery; Research and innovation; Digital libraries
I have looked at the example FOI form at Appendix B and included an FOI form in the attached bid: YES
I have read the Call, Briefing Paper and associated Terms and Conditions of Grant at Appendix D: YES

1. Project overview

1.1. Summary

1.1.1 This project begins with a request from a scholar: show me all the manuscripts which have the New Testament Greek Text of Chapter 1, Verse 1, of the Gospel of John. Further requests will follow: for the exact pages of those manuscripts containing those verses; for digital images of the pages; transcripts of the text on those pages; annotations on these; links to articles and books discussing these pages. Many of these materials are on the web. Yet, locating them is extraordinarily difficult: a highly-skilled expert scholar could spend hours with search engines and portals, and still not find all there is.

1.1.2 This is exactly the kind of task for which Web 2.0 technologies were created. We could create unambiguous metadata for each of the objects mentioned in the last paragraph; web crawlers could harvest all this metadata, and purpose-designed search engines could lead the reader to the materials sought. More specifically: we could use an ontology to define the various objects (manuscripts, texts, images, books and more) and their relationships. We could then use RDF statements, following the 'linked data' model, to populate our ontology with manuscripts, texts, images and other resources. We could place the RDF statements in a repository, open to harvesting; we could offer an interface with key search functionality, and open our metadata so that others can build their own access routes to our data.

1.1.3 That is what this project will do, for digital data it holds for four large textual traditions: the New Testament; Dante's Commedia and Monarchia; Chaucer's Canterbury Tales.
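The approach outlined in 1.1.2 can be illustrated with a minimal Python sketch. It does not reproduce the project's ontology: the predicate names (instanceOfWork, carriedBy), the identifier scheme and the second manuscript are hypothetical placeholders, and plain tuples stand in for RDF statements. Under those assumptions, the scholar's opening request (which documents carry a text of John 1:1?) becomes a two-step lookup, from the work to its texts and from the texts to their documents.

```python
# Each statement is a (subject, predicate, object) tuple standing in for an RDF triple.
# All identifiers and predicate names below are hypothetical, for illustration only.
TRIPLES = [
    ("text:John.1.1@Sinaiticus", "instanceOfWork", "work:John/1/1"),
    ("text:John.1.1@Sinaiticus", "carriedBy", "doc:CodexSinaiticus"),
    ("text:John.1.1@ManuscriptB", "instanceOfWork", "work:John/1/1"),
    ("text:John.1.1@ManuscriptB", "carriedBy", "doc:ExampleManuscriptB"),
    ("text:John.1.2@Sinaiticus", "instanceOfWork", "work:John/1/2"),
    ("text:John.1.2@Sinaiticus", "carriedBy", "doc:CodexSinaiticus"),
]


def documents_containing(work_part, triples):
    """Return the sorted documents that carry a text of the given part of a work."""
    # Step 1: find every text that is an instance of the requested work part.
    texts = {s for s, p, o in triples if p == "instanceOfWork" and o == work_part}
    # Step 2: find the document that carries each of those texts.
    return sorted({o for s, p, o in triples if p == "carriedBy" and s in texts})


if __name__ == "__main__":
    print(documents_containing("work:John/1/1", TRIPLES))
```

In a deployed system the same question would more plausibly be put as a SPARQL query against an RDF store; the tuple list only shows the shape of the data the query relies on.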
The project will use an ontology of works, documents and texts to answer the first question given above: show me the manuscripts (documents) which contain a text of the work the Gospel of John.[1] Linkage to further ontologies will permit the reader to find resources relating to these documents, texts and works: images, transcripts, catalogue entries, online and offline materials of every kind. Much work has already been done in creating ontologies for declaration and linkage of 'item-level' objects (books, articles), for example in the EU Discovery and DELOS Projects and in the AUSTLIT FRBR-based ontology. This project will extend these 'item-level' ontologies through an ontology of works, texts and documents jointly developed by the PI and Federico Meschini (Loyola University). The project base in the Institute for Textual Scholarship and Electronic Editing (ITSEE) holds some fourteen thousand pages of manuscript transcripts and other information drawn from over 140 manuscripts, across four major textual traditions (New Testament Greek, Chaucer's Canterbury Tales, Dante's Monarchia and Commedia). The project will create RDF triple records for every text of each of these four works on every one of these pages: around 500,000 records.[2] It will link these to item-level RDFs for the works and documents, so that a reader may go seamlessly from an entry for the Gospel of John to a listing of the chapters and verses which compose it, to the manuscripts and manuscript pages which hold these, and to the images and transcripts on the web for those pages. Linkages will also be made in the other direction, to books and articles dealing with specific text segments and manuscripts, for two of the works: to resources catalogued in the Birmingham Research Publications Database and to the Chaucer Bibliography Online.

[1] The use of the terms 'document', 'work' and 'text' follows the usual practice of textual scholarship (e.g. T. Tanselle, A Rationale of Textual Criticism). A 'document' is the physical carrier of a text, e.g. Codex Sinaiticus (cf. FRBR 'item'; CIDOC-CRM 'information carrier'); a 'work' is the intellectual object, e.g. the Gospel of John (cf. FRBR 'work'; CIDOC-CRM 'information object'); a 'text' is an instance of a work in a document, as in the text of the Gospel of John in Codex Sinaiticus. For FRBR, 'Functional Requirements for Bibliographic Records', see the IFLA website http://www.ifla.org/publications/functional-requirements-for-bibliographic-records; for CIDOC-CRM, representing an effort comparable to FRBR for cultural heritage documentation, see http://cidoc.ics.forth.gr/ .

[2] A separate bid by the PI to the 'deposit' strand of this JISC call also proposes to create separate metadata for these same records. However, there is no overlap between the bids in their aims, though some economies will be achieved in resource use if both are funded. These economies will be applied to further development of the interface tools to be developed by the two projects.

1.1.4 These records will be made available over the web for harvesting by RDF federators. The project will then create and document a public access demonstrator, showing how a dynamic interface can be built on the harvested RDF records. Some of the RDF records will be derived from ITSEE's partners in Münster and Florence, to show handling of distributed records. A public report will outline the lessons learnt from this work.

1.2 Response to JISC Objectives

1.2.1 This project has three parts:
1. Creation of some 500,000 RDF records from existing research data;
2. Creation of an open-access demonstrator;
3. A publicly-available report detailing the lessons of this project, in terms of barriers encountered, opportunities exposed, and paths for further exploitation.
These three parts correspond to the three areas of work detailed in §29 of the grant funding call. In summary: the project will create a body of data of sufficient critical mass to test the hypothesis of this JISC call: that the linked data model, as outlined in the "Four rules for linked Data" (http://data.gov.uk/wiki/Linked_Data), will have considerable benefits for research data.

1.3 Value to JISC community

1.3.1 Over the last two decades, large quantities of data in digital form relating to documents, texts and works have been accumulated by many projects around the world. Almost all of this has been encoded at the level of the page (for example, digital images of manuscript pages); much has been encoded at the level of parts of works (for example, transcripts of particular pages holding particular parts of works).

1.3.2 However, there is a significant gap between the capacities of cataloguing systems and the level of detail now available within this data. Standard cataloguing systems inherit the print model of item-level identification, and so are extremely powerful at identifying particular copies of particular books. More recently, application of higher-level abstractions to item-level records has permitted more complex grouping and retrieval. The well-known FRBR entity definitions represent not only documents, works and texts as defined by this project but also persons, corporate bodies, concepts, objects, events and places, and represent also the relationships between the entities. CIDOC-CRM provides similar abstractions for cultural heritage materials, and a harmonization of the FRBR and CIDOC-CRM definitions has been created, as a single ontology ('Modelling Intellectual Processes: The FRBR-CRM Harmonization', at http://cidoc.ics.forth.gr/docs/doer_le_boeuf.pdf). This concentration on item-level data has created successful models for complex linkages of (for example) books and articles.
The AUSTLIT database (http://www.austlit.edu.au/), built entirely on FRBR ontologies, illustrates this. Through AUSTLIT one can go to an author, see a list of the works of that author, then various expressions of those works (films, plays, as well as novels, poems, articles), then be taken to catalogue records for individual copies of those items (and, recently, to electronic versions of those).

1.3.3 In the print world, there was no need to consider how records might point to individual pages of documents, or separate parts of works. But in the digital world, the standard unit of information is one browser screen: an image or text transcript of a single manuscript page. Accordingly, there is now (as stated) a large body of 'born-digital' materials for which we have information (often, immense amounts of it) below item-level. To take just one dramatic instance: for Codex Sinaiticus, which itself as a whole might be just one 'item-level' record, we have information about the exact placing of each of the half-million words transcribed in the manuscript on each of the 800 surviving manuscript pages. For each word, we also know exactly its place in the verse, chapter and biblical book to which it belongs: altogether, over one million separate pieces of information. Creating an ontology which extends linkages between textual objects below the item level will permit expression of all this in a form accessible through existing item-level ontologies. One could (in the AUSTLIT example) navigate further, to a particular digital image of a particular page.

1.3.4 It would be difficult to overstate the possible impact of this work on that part of the JISC community which deals with documents and the texts contained in them. At present, finding individual parts of works or documents is difficult, and usually dependent on a particular project interface (for example, the Codex Sinaiticus interface at www.codexsinaiticus.org).
The methods proposed by this project will make finding a part of a work or document precise and certain. The expression of this information as publicly-available metadata will make it possible for users to go directly to the resource, independent of the project interface. See further 2.5 below.

1.3.5 This project may also prepare the way for a much larger impact. Typically, texts exist in many copies, distributed in many places. Finding, identifying, digitizing, transcribing and editing them is a task for whole communities, not just for the very few scholars who have so far been able to do this work. But for this to happen, we need a secure means of identifying all the individual parts of all these individual texts. This project offers a crucial first step towards that.

1.4 Innovation

1.4.1 As explained in the last section, much work has been done within the broad digital library community on item-level records and the relationships among them and other entities. Very little work has been done on developing a formal ontology for constructs below the item-level. Here is a paradox, and an opportunity. The paradox is that almost all digital projects dealing with text find a need to declare objects below the item level: for example, stating the sequence of pages in a manuscript. Yet, very little of this information is exposed to public view: typically, the reader must go through the project interface to access this. The opportunity is to create a means by which this data can be reliably and efficiently exposed.

1.4.2 With Federico Meschini, the PI has developed an ontology for documents, works and texts which provides the level of granularity required to support identification and linkages below the item-level: not just to the level of the page or text segment, but to words, to individual characters, even to the smallest mark on the page.
This ontology is based on some five years of preparation by the PI (first published in 'Current directions in the making of digital editions: towards interactive editions.' Ecdotica 2007). It will be formally presented in a joint paper at the 2010 ADHO conference in London. This project will be the first substantial instantiation of this ontology.

1.4.3 Do we need a new ontology? Are there existing systems which could achieve what we want? There are three other efforts to create formal structures which might be used to address the needs of this project. The first is the Canonical Text Services initiative (http://chs75.harvard.edu/projects/diginc/techpub/cts). The CTS scheme is not expressed as a formal ontology; it does not provide a secure discrimination between documents, works and texts; it is optimized for efficient retrieval of text fragments by applications, rather than formal definition of a scheme for labelling fragments. However, translation of CTS data to and from the ontology here proposed would be straightforward.

1.4.4 The other two possible methods are both digital library systems which create 'digital wrappers' for related objects, and define the relationships among them. These are the Library of Congress Metadata Encoding and Transmission Scheme (METS: http://www.loc.gov/standards/mets/mets-home.html) and the Open Archives Initiative Object Reuse and Exchange initiative (OAI-ORE: http://www.openarchives.org/ore/). These provide powerful systems for managing objects within digital libraries. However, the resources this project addresses may be anywhere, and not within digital library systems: that is the nature of linked data. Indeed, many of the objects to be referenced by this project are not digital at all.
A manuscript is made of parchment, not bytes, and the particular strength of the linked data model is that it is built on a clear distinction between an 'information resource' (an object in digital form, such as a digital image) and a 'non-information resource' (an object not in digital form, such as a manuscript).[3] That said, there are elements within these systems which are highly relevant to the needs of this project. For example, from this project's metadata one could readily create a sequence of digital images representing all the pages of a manuscript. From this, a METS list of all the images could be created, and then sent to a METS image viewer. This suggests that the linked data to be created by this project and systems such as METS and OAI-ORE are complementary. This project's ontology could be used to declare fine-grained relationships among objects, which would then enable intelligent handling of these objects by digital library systems.

[3] For this rather ugly terminology, see http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/.

2. Project plan

2.1 Timetable and deliverables

2.1.1 Months 1-3: Creation of c.500,000 RDF records from materials held within ITSEE. The estimate is the product of manuscripts, pages per manuscript and text units per page:
Greek NT: 100 mss of parts of the Greek NT, each of 50 pages, each page containing 30 verses: 100*50*30=150,000;
Dante's Monarchia: 22*80*25=44,000;
Dante's Commedia: 9*200*70=126,000;
Canterbury Tales: 12*500*35=210,000.
Together these give 530,000 records. Deposit of these in the University Institutional Repository; exposure of these to RDF federating systems. Creation of a project website.
Deliverable 1: the RDF records, mounted on the Birmingham IR

2.1.2 Months 4-6: Building two access demonstrators. The first will show navigation strategies for movement within the RDF records created by this project, from documents and works to their parts. It will show how alternative interfaces to the data may be developed from the metadata alone. The second demonstrator will show linkages between the RDF records using the project ontology and other resources. We will implement two kinds of linkage:
i. From item-level catalogue records into the records made by this project. That is: a query to an online catalogue for 'The Gospel of John' should link to the records for that work in this project's ontology. The reader should be able to go from the catalogue entry through the project ontology to a list of all manuscripts containing this work, and thence to associated digital images and transcripts.
ii. From the records made by this project to resources outside the ontology developed by this project. The project will implement these links for two sets of data. First, it will implement links to the publications (books and articles) relating to the New Testament texts by members of the New Testament editing team in ITSEE, as listed in the Birmingham Research Publications Database, maintained by the Birmingham Institutional Repository team. Second, it will implement links to the Chaucer Bibliography Online for the Canterbury Tales materials. For example: the Chaucer Bibliography Online lists an article by Hugh Keenan on lines 345-346 of the General Prologue. The project will create a link between the instances of those lines in the RDF records and the online Bibliography, and express that link too in RDF form.
The demonstrators will then be linked to the project website.
Deliverables 2 and 3: the access demonstrators

2.1.3 Months 7-9: Dissemination activities. The project will host a one-day workshop showing its methods and results. A report will be placed on the project website.
Deliverables 4 and 5: the workshop; public report.

2.2 Project management

2.2.1 Project management will follow the model offered by JISC's Project Management Guidelines, May 2008, p. 9 ff, with responsibilities divided into three:
1. A project steering group. This will meet three times: at the commencement of the project in June 2010, at the end of month 3, and at the project end.
The steering group will consist of the PI, two senior researchers in the university outside the project, and the codirector (with the PI) of ITSEE, Professor David Parker. The project manager will report monthly to the steering group.
2. The project manager: the PI, Robinson. One half-day a week to the project throughout.
3. The project technical officer (Green), reporting weekly to the project manager.

2.3 Risks: staff recruitment

2.3.1 The two key project staff, the PI and technical officer, are in post already. For the PI: scheduled completion of other projects before June 2010 will free time to work on this project. The Technical Officer (Green) is currently employed at 0.4 time, and so will be available on 1 June for this post. In the event that he is not so available: posts of this nature, even short term ones, invariably draw a strong field of applicants in this university, and can be filled quickly. This might, however, delay the project by one or two months.

2.4 IPR

2.4.1 While in many cases the data to which the metadata points (manuscript images, transcripts) has IPR restrictions, no such restrictions apply to any of the metadata, in the form of RDF records, to be generated by this project. All these RDF records will be made available free to all under the Creative Commons Attribution-ShareAlike licence. This will permit the widest possible use and re-use of the records. Note that we will not apply a 'non-commercial' restriction to the licence. There are important commercial users of metadata such as this project will create (e.g. Talis) and the metadata should be as readily available to them as it is to anyone else.

2.5 Sustainability

2.5.1 In the first place, the project will secure the longevity of the metadata by depositing it in the University Institutional Repository, as sets of RDF-XML files containing multiple RDF records.
The RDF-XML files will themselves have OAI-PMH compliant metadata, which will expose the data to worldwide RDF aggregators, and enable retrieval through RDF federation systems (e.g. JENA, SESAME) and the RDF standard query language, SPARQL.

2.5.2 We see the linked-data model behind the project as having a much more important impact on sustainability. One of the premises of this project is that the current model, where most access to high-quality digital resources depends entirely on the interface to those resources made by the projects which created those resources, is fundamentally flawed.[4] This model means that the data is only available so long as the interface is available: and as interfaces are extremely system- and browser-dependent, this is likely to be a rather short time. This project offers an alternative: by creating rich metadata for each distinct digital element (even a single character in a text on one page of a manuscript) it will be possible to create multiple access routes to the data from the metadata alone. These would complement, and could ultimately replace, the dedicated interfaces so far created. To return to the example at the beginning of this proposal: one could create an interface for resources relating to the first verse of the Gospel of John, giving access to each manuscript which has this text, and to images, transcripts and other materials relating to these, from the metadata alone.

[4] Key documents, setting out the approach to the interface which lies behind this proposal, are the papers by Roger Bagnall, Greg Crane and the PI at the 'Shape of Things to Come' conference, Charlottesville, March 2010: http://shapeofthings.org/papers/ (user name shapeofthings, password papers; to be published by Rice University Press in April 2010).

3. Engagement with the community

3.1 Project stakeholders

3.1.1 It is in the nature of 'linked data' that everyone, everywhere is a stakeholder: the road leads to every door.
The following groups have a special interest in this project, in order of widening focus:
i. Scholars interested in the text of the four textual traditions
ii. Scholars interested in other works and documents susceptible to the same methodology as developed for this project
iii. Linked data developers, for whom the volume and characteristics of data on documents, works and texts will present challenges
iv. Everyone interested in these texts.
Even for the narrowest of these communities, (i) above, the numbers are large. The annual conferences of the Society of Biblical Literature draw several thousand professional scholars; over 25,000 copies of the fundamental text-critical edition of the Greek New Testament, the Nestle-Aland edition, are sold or given away each year, mostly to students in seminaries and universities. For the largest of these groupings, (iv) above: one may point to the more than one million individual visitors to the Codex Sinaiticus website from July to November 2009.

3.2 Dissemination

3.2.1 The project will target the first three stakeholder groups listed above, as follows:

Scholars interested in the text of the four textual traditions: ITSEE co-director Parker will be responsible for the New Testament texts, and will present the project at the annual meeting of all participants in the Birmingham-Münster NT editing projects. PI Robinson will be responsible for the Dante and Chaucer sets of materials. He will present these at the annual Kalamazoo medieval conference, the most widely-attended single conference in medieval studies. In addition, links to the access demonstrator will be provided from websites for all four editorial groups.

Scholars interested in other works and documents: these will be targeted through presentations at the two major international conferences on textual scholarship: the Society for Textual Scholarship, New York (March 2011) and the European Society for Textual Scholarship, Pisa (November 2010).
The PI is the UK representative on the ESF-COST InterEdition project, and the project will be presented to those groups also.

Linked data developers: A workshop in the last months of the project will present the project's methodologies, focussing on the possibilities for dynamic interface development from the metadata created by the project. The ADHO 2010 presentation of the documents, works and texts ontology by the PI and Meschini will be developed into an article to be submitted to a major digital humanities journal.

There is no direct way of targeting the fourth group: everyone interested in these texts. This project will focus on the first three groups. Later projects may seek to reach and foster wider textual communities, from the starting point provided by this project. Linkage of the records created by this project for document and text segments below the item level, to the item-level records for the whole documents and texts of which they are part, will mean that any reader coming through a catalogue interface to the whole document or text will also be able to navigate through to the parts of the documents or texts recorded according to this ontology, and to the links between them. This will make resources related to the individual pages of documents and segments of texts considerably more visible than they are at present.

4. Impact

4.1 The project and the wider community

4.1.1 The project will have achieved its immediate aims if it reaches the first three stakeholder groups identified in 3.1 above. However, there is a further aim, toward which this project is a critical first step. This is the creation of 'textual communities' for the editing of large textual traditions based on digital technology. Within the communities, scholars and readers will execute the entire editing process, using (among others) the editorial tools and standards developed by the PI.
The textual communities will be open, where anyone interested in a text (say, Dante's Commedia) can find which manuscripts and printed editions hold the text; can locate digital images and transcripts of these; can compare them, search them and analyse them using many different tools; and can contribute his or her own knowledge and materials for others to use.

4.1.2 The ontology created and implemented by this project will be a key enabling technology towards achievement of this vision. Other tools have been or are being made for these communities (e.g. the 'son of SUDA online' in development by the Integrating Digital Papyrology project). Consider the following scenario:
i. A reader notices that a new set of digital images for a manuscript of the New Testament has been created. He or she knows what part of the New Testament is contained on each page. The browser presents a tool which allows the reader to state, for each page, what text is on it; this is converted into RDF statements using the ontology here created and deposited in an RDF store;
ii. A reader, somewhere else in the world, has declared he or she is interested in this particular text. RDF records, using this project's ontology, are generated stating this reader's interest.
iii. Elsewhere: an RDF federating application matches the availability of the new images of the text in (i) with the reader's interest in this text in (ii), and generates an RSS record which is sent to the reader: 'you might be interested in this website, which contains an image of a text of A, in document B'.
iv. The reader in (ii) discovers that there is no transcript of this text on this page available, by submitting a query through the browser to the RDF store. He or she makes a transcript of this page, and places it on a website. Again, RDF statements about this new transcript are generated, and deposited in an RDF store.
One could extend this scenario indefinitely.
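Steps (i) to (iii) of this scenario can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's implementation: tuples stand in for RDF statements, a Python list stands in for the RDF store, and every identifier and predicate name (depicts, instanceOfWork, carriedBy, interestedIn) is hypothetical. The sketch pairs each declared interest in a work with every image of a text of that work and composes the message of step (iii).

```python
# A list of (subject, predicate, object) tuples stands in for the RDF store.
# All identifiers and predicate names are hypothetical, for illustration only.
STORE = [
    # Step (i): a reader states which text appears on a newly imaged page.
    ("image:001", "depicts", "text:A@B"),
    ("text:A@B", "instanceOfWork", "work:A"),
    ("text:A@B", "carriedBy", "doc:B"),
    # Step (ii): another reader declares an interest in the work.
    ("reader:1", "interestedIn", "work:A"),
]


def notifications(store):
    """Step (iii): match each declared interest with images of texts of that work."""
    work_of = {s: o for s, p, o in store if p == "instanceOfWork"}
    doc_of = {s: o for s, p, o in store if p == "carriedBy"}
    interests = [(s, o) for s, p, o in store if p == "interestedIn"]
    images = [(s, o) for s, p, o in store if p == "depicts"]

    messages = []
    for reader, work in interests:
        for image, text in images:
            if work_of.get(text) == work:
                messages.append((
                    reader,
                    f"you might be interested in {image}, which contains "
                    f"an image of a text of {work}, in document {doc_of.get(text)}",
                ))
    return messages


if __name__ == "__main__":
    for reader, message in notifications(STORE):
        print(reader, "<-", message)
```

Step (iv) would follow the same pattern: a query finding no transcript statement for text:A@B, followed by new statements added to the store once a transcript is made.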
Other readers could find the new transcript, correct it and augment it; others could then compare the transcript with other texts of the same part of the New Testament found in other documents; others could annotate it in various ways. In every case, the ontology first instantiated by this project would have a crucial role, in setting out the links in the chain.

4.1.3 Or, another scenario: a reader is looking at the first line of the Canterbury Tales in their browser. A piece of software running in the background notices this and thinks: that person is reading the first line of the Tales. What is there out there, relevant to what he or she is reading? The computer queries the RDF store and locates records using this project's ontology. It sorts the information into transcripts, images, commentaries, etc., and sends a message to the reader by the browser: 'you might be interested in ... '.

4.2 This project, the community and sustainability

4.2.1 It is usual to see the two issues of sustainability and communities as requiring separate strategies. One could pursue a centrally-based model of sustainability, and deal separately with community building. This project takes a different view. We believe that there is one solution to both problems. We aim to build a single model for the making of scholarly editions in the digital age which is both sustainable and which permits the widest engagement with the community. Our model is: the creation of textual communities for collaborative editing of large textual traditions based on digital technology, through services and data distributed across the web.

4.2.2 How will the creation of textual communities address sustainability? Sustainability is not only a matter of data handling standards and routines. Sustainability, of any kind, depends on community will. So long as people want to read the Commedia, they will want access to editions and information about it.
However, the will of the community must be given practical shape, as crucial materials can be lost through negligence. The open architecture proposed by this project, interlocking with existing and foreseen data storage and migration facilities (particularly, the institutional repository movement), offers a route towards sustainability of distributed resources.

4.3 Evaluation

4.3.1 The project will be able to provide statistical measures of its progress, as follows:
i. RDF records created, categorized by type
ii. Accesses to the RDF records on the Birmingham IR
iii. Accesses to digital materials on ITSEE servers referenced from the RDF records
iv. Users of the access demonstrator
v. Incorporations of access demonstrator elements on other websites.
These statistical measures will be used throughout the project to assess its progress. They will be considered particularly at the steering group meeting at the end of month 3. These measures will be supplemented by a user survey in the last three months of the project. Resources are allocated in the project to commission a draft evaluation report, based on the survey results and statistics. This will then be revised by the Project PI, and submitted as a final evaluation report.

5. Previous Experience of Project Staff

Peter Robinson, PI: codirector of ITSEE and of the Canterbury Tales project. Involved in the making of digital editions since 1990. His publications, as editor or facilitator, include twenty digital publications. He led the EU-funded MASTER project, which created the manuscript description encoding which is the basis of the TEI P5 manuscript description element, and was the major contributor to the TEI P4 chapters on text transcription and apparatus encoding. He most recently served on the Technical Standards Working Party of the Codex Sinaiticus project, and led the JISC-funded Virtual Manuscript Room project.
[10% time, directly allocated]

Jill Russell: manages the University of Birmingham institutional repository and is closely involved in developing the University’s emerging strategies for archiving digital documents and other research outputs. She holds a Masters Degree in Library and Information Studies and has extensive experience of work in Higher Education libraries. She has a successful record of managing internal and external projects. [Member of the steering group]

David Parker: codirector of ITSEE and Executive Editor of the International Greek New Testament Project; PI of the Codex Sinaiticus Project and the IGNTP Project; Co-PI of the Vetus Latina Iohannes Project [Member of steering group and NT consultant]

Zeth Green, technical officer: holds an undergraduate degree in Theology and a Masters in Electronic Editing. Ten years' experience in web development; particular expertise in Python, XML databases; vice-chair of Python UK Society [50%, directly allocated]

6. Budget

Directly Incurred Staff                    Apr10–Mar11   Apr11–Mar12   TOTAL
Robinson: 10%, grade 9, point 51           £5,473        £             £5,473
Green: 50%, grade 7, point 30              £14,987       £             £14,987
Total Directly Incurred Staff (A)          £20,460       £             £20,460

Non-Staff                                  Apr10–Mar11   Apr11–Mar12   TOTAL
Travel and expenses                        £1,000        £             £1,000
Hardware/software                          £2,000        £             £2,000
Dissemination                              £2,000        £             £2,000
Evaluation                                 £1,500        £             £1,500
Other                                      £             £             £
Total Directly Incurred Non-Staff (B)      £6,500        £             £6,500

Directly Incurred Total (C) (A+B=C)        £26,960       £             £26,960

Directly Allocated                         Apr10–Mar11   Apr11–Mar12   TOTAL
Estates                                    £3,438        £             £3,438
Other                                      £             £             £
Directly Allocated Total (D)               £3,438        £             £3,438

Indirect Costs (E)                         £17,917       £             £17,917

Total Project Cost (C+D+E)                 £48,315       £             £48,315
Amount Requested from JISC                 £48,315       £             £48,315
Institutional Contributions                £0            £             £0

Percentage Contributions over the life of the project: JISC 100%; Partners X%; Total 100%

No. FTEs used to calculate indirect and estates charges, and staff included: No FTEs: 0.6; Which Staff: Robinson, Green

FOI Withheld Information Form

1. We would like JISC to consider withholding the following sections or paragraphs from disclosure, should the contents of this proposal be requested under the Freedom of Information Act, or if we are successful in our bid for funding and our project proposal is made available on JISC’s website.

2. We acknowledge that the FOI Withheld Information Form is of indicative value only and that JISC may nevertheless be obliged to disclose this information in accordance with the requirements of the Act. We acknowledge that the final decision on disclosure rests with JISC.

Section / Paragraph No. - Relevant exemption from disclosure under FOI - Justification -

Professor Vincent Gaffney
Director of Research and Knowledge Transfer
College of Arts and Law
University of Birmingham
Edgbaston B15 2TT
United Kingdom

16th April 2010

Statement of support

I am writing to affirm the support of the University of Birmingham for the application submitted by Dr Peter Robinson under the name Linking documents, works and texts, closing 12 noon UK time on Tuesday 20th April 2010. This project has implications for, and so involves staff from, many segments of the university. I affirm that there has been wide consultation among all these divisions of the university in the preparation of the bid, that appropriate commitments of staff time have been made for this project in the event of the bid's success, and that the project costings have been prepared and approved by the University finance office. I affirm also that the University will administer this project, should the bid be successful.

Yours sincerely,
Professor Vincent Gaffney

Appendix: the Virtual Manuscript Room project, funded by JISC 2008-2009

Project facts: start date 1 September 2008; finished 30 September 2009.
Funding: £69,000 from JISC, matched by the University of Birmingham
Resources: one full-time member of staff; one manager 10% time (the PI of this project)
URL: http://vmr.bham.ac.uk

This project, based in the Institute for Textual Scholarship and Electronic Editing (ITSEE) at the University of Birmingham, addressed both the issues of cost and of metadata. The first aim was to establish a pipeline for efficient submission of a full set of manuscript images, with accompanying metadata, to a web interface. This was achieved for 138 sets of manuscript images: 71 from the Mingana collection, 22 of Geoffrey Chaucer's Canterbury Tales, 38 minuscules of the Greek New Testament, 7 of Dante's Commedia, amounting to around 40,000 manuscript images. At a total cost to JISC of around £1.50 per image, this represents excellent value. Indeed, the marginal cost of adding an additional set of manuscript images to the system is much lower than that. It is around 15 minutes work to add a folder containing a full set of images for a whole manuscript to the VMR, inclusive of metadata generation: thus, pennies per image. The images for the manuscripts can then be viewed through the image viewer online: for example, at http://vmr.bham.ac.uk/Collections/Mingana/Islamic_Arabic_1572/table/.

The second aim was to create appropriate metadata for each manuscript and each image which would allow the images and manuscripts to be accessed through the University of Birmingham Institutional Repository. This would greatly add to their exposure on the web, and also provide a route for long-term sustainability. This was achieved for the 71 Mingana manuscripts. Thus, http://epapers.bham.ac.uk/116/ provides parallel access to the same manuscript. Subject to copyright agreement, currently in negotiation with many of the manuscript holding institutions, images from the other three collections (New Testament, Chaucer and Dante) will be made available for public access as are the Mingana manuscripts.
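
The step of turning one folder of manuscript images into manuscript-level and image-level metadata is described above only in outline. As an illustration, and not as a description of the VMR software itself, a minimal sketch in Python might look like the following. It assumes that one folder holds the images of one manuscript and that file names sort into page order. The function name build_manuscript_record, the record fields and the list of image suffixes are all hypothetical.

```python
from pathlib import Path

# Assumed image formats; the formats actually used by the VMR are not stated.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def build_manuscript_record(folder, collection):
    """Build one manuscript-level record with one entry per image file.

    Hypothetical helper: the field names are illustrative only.
    """
    folder = Path(folder)
    # Keep only image files, sorted by name so that sequence follows file order.
    images = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    return {
        "collection": collection,
        "manuscript": folder.name,  # the folder name stands in for the shelfmark
        "image_count": len(images),
        "images": [
            {"sequence": n, "file": p.name}
            for n, p in enumerate(images, start=1)
        ],
    }
```

A record of this kind could then be serialised in whatever form the web interface or the Institutional Repository requires; that mapping is outside the scope of this sketch.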
These achievements have provided a sound foundation on which to build, with the VMR as the base for digitization and for the range of editing activities carried on within ITSEE and its partners. As a pathway towards digitization: a plan has been developed to digitize all 3000 manuscripts of the Mingana Collection and use the same combination of the VMR and the Institutional Repository to present them to the world. As a base for editing: ITSEE's partner in the New Testament work, the Institute for New Testament Textual Research, has implemented its own version of the Virtual Manuscript Room, at http://intf.uni-muenster.de/vmr/NTVMR/IndexNTVMR.php, with links between the Münster and Birmingham implementations. The VMR project in Birmingham is also adding facilities to allow scholars to provide further information on materials held on the site, and a JISC-sponsored Workshop on Collaborative Editing in September 2009 explored the use of the VMR as a host for community-based editing.