Cover Sheet for Proposals
JISC Grant Funding Call

Name of Programme & Strand: Information Environment 2011 Programme: Exposing digital content for education and research
Programme Tags: "INF11" and "JISCexpo"
Name of Call Area Bidding For: Strand B - Expose
Name of Lead Institution: Queen Mary University of London
Name of Department where project would be based: School of Electronic Engineering and Computer Science
Full Name of Proposed Project: Linked Music Metadata
Name(s) of Partner HE/FE Institutions Involved: None
Name(s) of Partner Company/Consultants Involved: The Metabrainz Foundation
Full Contact Details for Primary Lead and/or Contact for the Project:
  Name: Simon Dixon
  Position: Lecturer
  Email: simon.dixon@elec.qmul.ac.uk
  Tel: 020 7882 7681
  Address: Elec Eng, Queen Mary University of London, Mile End Rd, London
  Postal Code: E1 4NS
Length of Project: 12 months
Project Start Date: 01/07/10
Project End Date: 30/06/11
Total Funding Requested from JISC: £94,894.00
Funding Broken Down over Financial Years (April - March):
  July 2010 - March 2011: £71301
  April 2011 - June 2011: £23593
Project Description / Abstract: The MusicBrainz user community has created a metadatabase describing over 9 million musical recordings, which is used by media players and web services such as the BBC's /music. The data is structured but not linked. Part of this data has been made available on the semantic web in the past, but it lacks recent extensions and updates, as it was not based on a sustainable model. With the upcoming release of MusicBrainz' Next Generation Schema (May 2010, beta), it is an appropriate time to map the new metadata to RDF and publish the Linked Data directly from the MusicBrainz website. This data will be linked to music metadata on the semantic web (e.g. DBpedia, BBC), and exposed via a SPARQL endpoint. We will engage with end users in the Music and Music Informatics fields, providing tutorial materials and workshops to encourage uptake of project outputs. We will investigate and report on major issues arising in the project, such as scalability, provenance and sustainability.
Keywords describing project: Music metadata, MusicBrainz
I have looked at the example FOI form at Appendix B and included an FOI form in the attached bid: YES
I have read the Call, Briefing Paper and associated Terms and Conditions of Grant at Appendix D: YES

1 Appropriateness and Fit to Programme Objectives & Overall Value to JISC Community

1. Entertainment industries have been revolutionised by digital technologies, which have fundamentally changed the means of creation, production and distribution of media items. Alongside the new business models, new fields of research, such as Music Informatics, have arisen, and have a strong presence in UK Higher Education (HE) institutions (Lee et al., 2009). Likewise, traditional academic fields such as Musicology have benefited from the new opportunities provided by technology, in particular the (semi-)automated analysis (Cook, 2007) and visualisation (Cannam et al., 2006) of musical recordings. Technology has radically increased the scale at which work based on collections of musical recordings is performed, but this has also introduced new problems concerning the management, navigation and annotation of these super-collections. Vast quantities of metadata concerning musical recordings can be found on the Web, but this is typically unstructured and difficult to navigate automatically. MusicBrainz (musicbrainz.org) provides the largest free metadatabase about music recordings, and for several years has been in the process of extending its scope from covering simple bibliographic information to becoming a comprehensive music information site. As a community-based non-profit organisation, they lack the resources to port their data to the semantic web.
With the beta release of their next generation schema (NGS) in May 2010, the opportunity is ripe for bringing this valuable metadata into the semantic web and linking it with related resources. 2. Objectives 3. The objectives of this project are to: (1) Publish the MusicBrainz metadatabase directly as a Linked Data resource on the semantic web; (2) Link it to other music metadata on the semantic web; (3) Create a semantic web query facility (SPARQL endpoint) for the MusicBrainz data; (4) Develop tutorial materials to explain and illustrate how the content can be used; (5) Engage the academic music and music informatics communities, and the broader semantic web community, in the development and use of the exposed content; and (6) Investigate and report on issues such as scalability, provenance and sustainability of the resource. Background 4. 5. 6. MusicBrainz is an open user community that collects, maintains and makes available to the public, music metadata in the form of a relational database. The MusicBrainz project was started as a free and open compact disc identification service. It was intended as an alternative to Gracenote’s CDDB which also began as a free service but then adopted strict licensing policies. The MusicBrainz project has grown beyond the CD identification task and now provides a wealth of crowd-sourced structured data about music. Using a well-defined set of community guidelines and a simple hashing approach for creating unique identifiers for music artists, albums, and tracks, the MusicBrainz project has assembled what is one of the cleanest and most comprehensive music metadata repositories on the Web. The Linked Data community has already recognized the immense value of the MusicBrainz project and MusicBrainz identifiers have been adopted by several Linked Data entities including the BBC Music website (http://bbc.co.uk/music). 
Although the MusicBrainz project had previously provided descriptions of resources in RDF/XML, this approach was abandoned in favor of an ad-hoc XML serialization. This decision was made nearly five years ago, before the Linked Data movement had the traction it has now, and when the tools for parsing and authoring RDF were not as mature. Currently, the MusicBrainz project does not directly provide Linked Data although, as mentioned before, it does provide unique identifiers for music-related entities that Linked Data practitioners find particularly attractive for the minting of URIs. For several years, we at the Centre for Digital Music (C4DM) have provided an RDF translation of the MusicBrainz data through the DBTune.org project, which was one of the first Linked Data entities to advocate the use of MusicBrainz identifiers for music-related URIs. Hosted by C4DM, DBTune.org has been an integral part of the Linking Open Data movement (Bizer et al., 2007). C4DM was also the key player in the development of the Music Ontology (Raimond et al., 2007), which is widely used in the Linked Data community and generally accepted as the most comprehensive and flexible ontological model for the music domain. The existing translation of the MusicBrainz data maps the MusicBrainz schema to the Music Ontology and provides a Linked Data version of MusicBrainz in parallel to the original MusicBrainz resource. A SPARQL endpoint for querying the MusicBrainz translation is also provided. (SPARQL is an RDF query language standardised and recommended by the World Wide Web Consortium.) While the MusicBrainz translation has been a useful Linked Data resource, it is not updated automatically from MusicBrainz, and manual updates are not performed regularly. Further, it only translates a subset of the available metadata. 7.
Members of the Linked Data community agree that assisting the MusicBrainz community in making the MusicBrainz website a full-fledged Linked Data source is the way forward (Jacobson et al., 2009a). In this way, MusicBrainz edits will be propagated immediately to the semantic web (the existing translation is currently over a year out of date), including the increasingly rich metadata which is becoming available. As the MusicBrainz project has evolved, additional types of data have been included. These new types of data usually take the form of a MusicBrainz Advanced Relationship (AR). The MusicBrainz community has grown a series of ARs organically, and over the last two years MusicBrainz has been working to crystallise these additional concepts into a new database schema called the Next Generation Schema (NGS) (Kaye, 2008). The NGS includes additional structured data, for example associating a lyricist with a particular track or a live-performance album with the original studio album. Such relationships are not included in the original DBTune.org MusicBrainz RDF translation. The impending release of MusicBrainz NGS (Kaye, 2010) makes the present a highly appropriate time for the Linked Data community to assist the MusicBrainz project in creating an RDF translation of the Next Generation Schema and publishing Linked Data directly from the MusicBrainz website.

Users and Needs

8. In making music metadata available on the semantic web, we are addressing the needs of two primary types of user in the HE sector: (i) those working in the Music Informatics (or Music Information Retrieval) community, primarily in computing and engineering departments; and (ii) musicologists and musicians working in music departments.
This is in addition to the (sizable) international MusicBrainz user community, who also stand to benefit from the linking of their data to other semantic web resources, as well as software developers working with on-line music services, and their users, who will indirectly benefit from this project.

9. During the OMRAS-2 project, we have been working closely with Music Informatics specialists (e.g. Prof. Geraint Wiggins and Tim Crawford at Goldsmiths) and Musicologists (Profs. Dan Leech-Wilkinson of King’s College London and Nick Cook of Cambridge University and their PhD students and RAs). We have also recently commenced work with the British Library, and via the NEMA project (nema.lis.uiuc.edu) we collaborate with international leaders in Music Informatics and Computational Musicology. Through these collaborations we have developed a good understanding of these users’ needs. The need that is central to this project is that of being able to identify musical entities (e.g. artists and recordings) unambiguously. Until the last decade, this was rarely an important issue. Data sets were small and their owners were aware of their content, often at an expert level. But as the size of music collections increases, and data is processed automatically rather than manually, it becomes essential to manage the metadata about the collections in a principled way. Linked data provides a means of joining different information sources, enabling the added value of “unexpected re-use of information” (Berners-Lee, 2006) to be realised.

10. The utility of linked data is, however, limited by the extent of the interlinking that exists between related data sets, and the current multiplicity of URIs for artists, albums and tracks is a potential hindrance to the goals of linked data.
Since MusicBrainz provides the most extensive open metadatabase for musical recordings and is already a de facto standard in its field, it would be advantageous to establish it as a standard for Linked Data as well. For this to take place, the database must be exposed on the semantic web with infrastructure to ensure that updates are automatic or propagated in a timely and sustainable manner.

11. The success of the project can be measured in immediate terms by semantic web traffic: the number of web site hits with an “Accept RDF” header (that is, an HTTP Accept header requesting an RDF media type), or the number of distinct users connecting to the SPARQL endpoint. In the longer term, the use of MusicBrainz URIs in third party tools and services, and eventually new studies (enabled by being able to query the linked data), will also indicate the success of the project (see also Impact below).

2 Quality of Proposal and Robustness of Workplan

12. The work of the project is divided into 5 work packages (WP): three cover the technical work, one covers dissemination and one covers management. WP1-3 address respectively the three areas of work listed in the Call (paragraph 29), namely: (i) “Make a collection of resources available ...”; (ii) “Develop a prototype ...”; and (iii) “Explore and report on the opportunities and barriers ...”. The work plan is illustrated in the Gantt chart overleaf.
Linked Music Metadata - Diagrammatic Work Plan
(Numbers show estimated percentage of effort between work packages)
Key: M = month, D = deliverable, WP = work package

Work package                              M1  M2  M3  M4  M5  M6  M7  M8  M9 M10 M11 M12
WP1 Convert MusicBrainz to linked data    50  70  75  75  70  20   -   -   -   -   -   -
WP2 Prototypes and Tutorials               -   -   -   -   -  40  75  75  75  70  60   -
WP3 Opportunities and barriers            10  10   5   5   5   5   5   5   5   5   5  50
WP4 Dissemination and engagement          30  10  10  10  15  25  10  10  10  15  25  40
WP5 Management                            10  10  10  10  10  10  10  10  10  10  10  10

Tasks shown in the chart:
WP1: 1.1 Mapping NGS schema to RDF; 1.2 Content negotiation; 1.3 Linking to other data sets
WP2: 2.1 Creation of SPARQL endpoint; 2.2 Production of tutorial materials
WP3: 3.1 Trust and provenance; 3.2 Opportunities and barriers
WP4: 4.1 Public engagement; 4.2 Workshop 1; 4.3 Workshop 2; 4.4 Other dissemination activities

Deliverables marked on the chart: D1.1, D1.2, D2.1, D2.2, D2.3, D3.1, D4.1, D4.2 (marked twice), D4.3

WP1: Convert MusicBrainz to linked data (Months 1–6)

13. WP1.1 Mapping of MusicBrainz NGS database schema to RDF: The MusicBrainz community is in the final stages of creating a more expressive schema for describing music-related metadata, called the Next Generation Schema (NGS) (Kaye, 2010). While NGS provides clear structure and semantics for the MusicBrainz data, it does not directly provide Linked Data. To adhere to the principles of Linked Data, the MusicBrainz NGS must be expressed using RDF (Berners-Lee, 2006). A mapping of the MusicBrainz NGS to appropriate OWL/RDFS ontologies is required. This task will involve feedback from the JISC community in the form of mailing list discussions on the Linking Open Data mailing list (Linked Data Community, 2010) as well as the Music Ontology Specification Group and the MusicBrainz community. The process of soliciting community feedback on early iterations of the mapping will ensure the use of the most appropriate and widely accepted ontologies and will encourage use of the resulting Linked Data resources output by the project.

14.
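To make the idea of a schema-to-RDF mapping in WP1.1 concrete, the following is a minimal sketch in Python of how one relational row describing an artist might be turned into N-Triples. It is an illustration only, not the project's mapping: the row layout (`gid`, `name`), the URI pattern with a `#_` suffix and the function names are assumptions made for this example, while `mo:MusicArtist` and `foaf:name` are existing terms from the Music Ontology and FOAF vocabularies.

```python
# Hypothetical sketch: map one artist row to N-Triples using Music Ontology terms.
MO = "http://purl.org/ontology/mo/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(text):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def artist_row_to_ntriples(row, base="http://musicbrainz.org/artist/"):
    """Return N-Triples lines for a row with 'gid' and 'name' keys.

    The URI pattern (base + gid + '#_') is an assumption for illustration.
    """
    subject = "<%s%s#_>" % (base, row["gid"])
    return [
        "%s <%s> <%sMusicArtist> ." % (subject, RDF_TYPE, MO),
        '%s <%sname> "%s" .' % (subject, FOAF, escape_literal(row["name"])),
    ]


if __name__ == "__main__":
    # Made-up identifier and name, not real MusicBrainz data.
    example = {"gid": "00000000-0000-0000-0000-000000000000",
               "name": "Example Artist"}
    for line in artist_row_to_ntriples(example):
        print(line)
```

A real mapping would cover many more tables and relationship types, and the choice of ontology terms is exactly what the community consultation in WP1.1 is meant to settle.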
WP1.2 Implementation of content negotiation and serving of RDF: We will contribute appropriate content negotiation code to the MusicBrainz server code base. Most of the resources described by the MusicBrainz project are non-information resources, that is, the URIs refer to real-world things such as music artists or albums. Following Linked Data practice, these URIs should provide 303 redirects to appropriate information resources when they are dereferenced via HTTP. Depending on the Accept header in the HTTP request, the redirect will point to either a human-readable HTML page or a machine-readable RDF document. If for some unforeseen reason including redirects and content negotiation in the MusicBrainz server proves to be impossible, RDFa can be embedded directly in the MusicBrainz HTML documents.

15. WP1.3 Linking to other data sets: We will provide appropriate links to other datasets in the MusicBrainz RDF data to best meet the Linked Data recommendations. The MusicBrainz NGS contains a wealth of links to external resources including BBC Music, Discogs, IMDB, Wikipedia, and Myspace. These links can be used to create appropriate links to the corresponding Linked Data resources (e.g. DBpedia.org resources). Additional links can be automatically generated using the graph matching approach proposed by Raimond (2009), and then manually checked using the tried-and-tested crowd-sourcing framework that powers the MusicBrainz project.

16. Deliverable D1.1 (M2): RDF mapping of MusicBrainz NGS schema

17. Deliverable D1.2 (M6): Publication of the MusicBrainz metadatabase to the semantic web

WP2: Creation of prototypes and tutorial material (Months 6–11)

18. WP2.1 Creation of SPARQL endpoint: We will create and maintain a SPARQL endpoint that allows users to query the MusicBrainz RDF. In our previous work with DBTune.org we have served RDF from a Postgres database using a D2R server which serves as a translation layer between the relational database and a SPARQL endpoint.
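The content negotiation planned under WP1.2 can be illustrated with a short Python sketch. It is not the MusicBrainz server code: the function names, the two supported media types and the `.html`/`.rdf` URL layout are assumptions chosen for this example. It only shows the decision itself, namely reading the Accept header and returning a 303 redirect to either an HTML or an RDF document.

```python
# Hypothetical sketch of 303 redirects driven by the HTTP Accept header.
# The document suffixes below are an assumed URL layout, not MusicBrainz's.
SUPPORTED = {"text/html": ".html", "application/rdf+xml": ".rdf"}


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header value."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    return prefs


def choose_redirect(resource_path, accept_header=""):
    """Return (status, location) for a non-information resource URI.

    Falls back to HTML when no supported type is requested.
    """
    best_type, best_q = "text/html", 0.0
    for media_type, q in parse_accept(accept_header):
        if media_type in SUPPORTED and q > best_q:
            best_type, best_q = media_type, q
    return 303, resource_path + SUPPORTED[best_type]


if __name__ == "__main__":
    print(choose_redirect("/artist/example", "application/rdf+xml"))
    print(choose_redirect("/artist/example", "text/html,application/xhtml+xml"))
```

A production implementation would also need to handle wildcards such as `*/*` and further RDF serialisations; this sketch simply treats anything it does not recognise as a request for HTML.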
While the D2R software is of great utility, its performance as a SPARQL engine is limited by the underlying database schema. We plan instead to perform an RDF dump from the MusicBrainz database into a purpose-built triple store (e.g. 4store). We will need to address the scalability issues which arise when serialising a database of this size. Assuming the use of a purpose-built triple store provides significant performance gains compared to a D2R server configuration, an infrastructure that automates the RDF dumping task will be implemented such that changes to the MusicBrainz dataset propagate to the SPARQL endpoint at regular intervals making the SPARQL endpoint resource sustainable and up-to-date. 19. WP2.2 Production of tutorial materials: We will design and produce tutorial materials, including sample SPARQL queries, screencasts and videos, describing how the structured data can be queried and accessed by third party tools and services. These tutorial materials will be released on our web site, as well as being presented at our workshops (see WP4). For the first workshop (M6), the more general tutorial material on music data and the semantic web presented at ISMIR 2009 will be refreshed. The second workshop (M11) will focus specifically on the project outputs. 20. Deliverable D2.1 (M10): SPARQL endpoint serving MusicBrainz data 21. Deliverable D2.2 (M6): Semantic web and music data tutorial materials 22. Deliverable D2.3 (M11): Tutorial materials on using SPARQL to query the MusicBrainz data WP3: Evaluation of opportunities and barriers (Months 1–12) 23. WP3.1 Trust and provenance: The MusicBrainz database tracks the provenance of the data (complete edit history), so it would be possible to publish this with the MusicBrainz data, in consultation with the MusicBrainz community. We will investigate other trust and/or provenance issues as necessary in conjunction with WP1 and WP2. 24. 
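As an indication of the kind of sample query the WP2.2 tutorial materials might contain, the sketch below builds a SPARQL query for the tracks made by a named artist and encodes it as a GET request URL. It is a hedged illustration: the endpoint address is a placeholder, the function names are invented for this example, and the choice of `mo:Track`, `foaf:maker` and `dc:title` assumes that the new mapping follows the Music Ontology conventions, which the project has yet to confirm.

```python
# Hypothetical sketch: build a sample SPARQL query and its request URL.
from urllib.parse import urlencode

QUERY_TEMPLATE = """\
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?track ?title WHERE {
  ?artist a mo:MusicArtist ;
          foaf:name "%(name)s" .
  ?track a mo:Track ;
         foaf:maker ?artist ;
         dc:title ?title .
}
LIMIT %(limit)d
"""


def build_query(artist_name, limit=10):
    """Fill the template, escaping characters that would break the literal."""
    safe = artist_name.replace("\\", "\\\\").replace('"', '\\"')
    return QUERY_TEMPLATE % {"name": safe, "limit": limit}


def build_request_url(endpoint, artist_name, limit=10):
    """Encode the query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": build_query(artist_name, limit)})


if __name__ == "__main__":
    # Placeholder endpoint; the project's endpoint address is not yet defined.
    print(build_request_url("http://example.org/sparql", "Example Artist"))
```

The sketch only constructs the request; sending it and parsing the results would depend on the triple store and result format eventually chosen.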
WP3.2 Opportunities and barriers: Issues will be tracked continually throughout the project as they arise (e.g. scalability in WP2.1). They will initially be discussed at weekly project meetings and notable issues will be logged on the project blog. These records will form the basis for a report which will be compiled at the end of the project. 25. Deliverable D3.1 (M12): Report on opportunities and barriers in publishing the MusicBrainz database as a semantic web resource. WP4: Dissemination and engagement (Months 1–12) 26. Different stakeholders will be engaged at various stages of the project. In the initial stages, the MusicBrainz and Linked Open Data communities will be engaged in the design of the RDF mapping (see WP1.1). Engagement with the JISC community will take place throughout the project via JISC programme events and the networking of the JISC Developer Community. End users in the higher education sector will be involved via two workshops in months 6 and 11. 27. WP4.1 Public engagement: A project web site and blog will be established and maintained to report the progress of the project to the general public. 28. WP4.2 Workshop 1: The first workshop will be held in conjunction with the annual Digital Music Research Network workshop at Queen Mary, which is widely attended by national and some international academics in the Digital Music field. We will present the results of publishing the MusicBrainz data (WP1), introduce our plans for the remainder of the project, and seek feedback from the end user community concerning the direction of the project. We shall also present an introductory tutorial on music information and the semantic web for those who are new to the field. The workshops will be open to members of the research community and the general public, and will be advertised in relevant mailing lists and web sites. 29. 
WP4.3 Workshop 2: The second workshop will be held at the end of M11 and seek to promote the use of the linked data resources in the Music and Music Informatics communities. Advanced tutorials will be presented, explaining how to build tools and services using the SPARQL endpoint.

30. WP4.4 Other dissemination activities: Outcomes of the work will be published in journals (e.g. Computer Music Journal, Journal of New Music Research) and major international conferences (e.g. International Society for Music Information Retrieval Conference), as well as JISC’s programme meetings and other events. We will also make the project known through our related projects such as OMRAS-2 (www.omras2.org), Software Sustainability (http://gow.epsrc.ac.uk/ViewGrant.aspx?GrantRef=EP/H043101/1) and NEMA (nema.lis.uiuc.edu). In M12, the RA will visit end user sites to work with users who attended the workshops and require assistance in getting started with building applications.

31. Deliverable D4.1 (M1): Establishment of project blog and web site, Core Resources Form and public version of project plan

32. Deliverable D4.2 (M6, M11): Delivery of two workshops

33. Deliverable D4.3: At least one publication in a refereed journal and international conference. (Due to the length of the project these are likely to occur after the project has ended.)

WP5: Management (Months 1–12)

34. The project will be managed on a day-to-day basis by the PI, with project meetings held weekly to assess progress and problems. This has been our practice throughout OMRAS-2. At the workshops, and at conferences that we visit, we will hold meetings with key representatives of our user communities, and use their feedback to steer remaining parts of the project and future work.

35. Deliverable D5.1: Reports, budgets, plans as required by JISC.

Risk Analysis

36. We have identified a number of risks, which are presented alongside mitigation actions in the following table.
Risk: Unable to recruit RA
Likelihood: Low
Impact: High
Mitigating Action: The named RA (Jacobson) is eager to work on this project; if unable to do so, we have other RAs (e.g. Dan Tidhar) who would also be able to work on this project.

Risk: The named RA does not complete PhD on time
Likelihood: Modest
Impact: Medium
Mitigating Action: In negotiation with JISC we could delay the project start date, or if this is not possible, employ someone else (see above).

Risk: The MusicBrainz user community votes against the proposed changes
Likelihood: Low
Impact: Medium
Mitigating Action: We plan to involve the user community from the beginning of the project (see WP1.1) so that their feedback shapes design decisions, which are then reflected in successive iterations of the RDF mapping. Nevertheless, if the user community still fails to support the project, we can mirror the MusicBrainz data and keep it current via live feed.

Risk: UK HE end users fail to see the relevance of the work
Likelihood: Medium
Impact: High
Mitigating Action: We collaborate with a number of end users in both music information retrieval (e.g. Goldsmiths, University of London) and musicology (e.g. Cambridge University and King’s College London), and will seek feedback from them at the planned workshops (see WP4) to ensure that the project outputs will be highly relevant to users.

Risk: Software development deadlines not met
Likelihood: Modest
Impact: Medium
Mitigating Action: Our group has extensive experience in software engineering that we can draw upon if problems are encountered. We shall also contribute to and draw upon the mutual support available via the JISC developer community (http://devcsi.ukoln.ac.uk/).

Risk: Management difficulties encountered
Likelihood: Low
Impact: High
Mitigating Action: The PI leads a group of 5 PhD students and has experience as CI and/or RA on a number of national and EU projects. He can draw on the wealth of experience of the CI (Sandler) and other senior members of C4DM as necessary.

Intellectual Property Rights

37. The MusicBrainz database contains two types of information.
The core data, consisting of factual data such as the names of albums, tracks and artists, is in the public domain. The remaining data, such as annotations, tags, opinions and ratings, are protected by the Creative Commons Attribution-NonCommercial-ShareAlike 2.0 licence, which allows non-commercial use of the data under the conditions that MusicBrainz is given credit and that any derivative works are also made available under the same licence. Thus the linked data published by this project will follow the same licence arrangement. In particular, this allows the outputs of the project to be used freely in teaching and research. Software produced in the project will be protected by an open source licence in order to freely allow non-commercial use of the code. The exact terms will be determined in consultation with OSS Watch.

Sustainability

38. C4DM has been addressing the issues of sustainability and reproducible research in our field, as evidenced by our recently awarded £940k EPSRC Grant, Sustainable Software for Digital Music and Audio Research (EP/H043101/1, 2010-2014). The Sustainable Software project will provide us with infrastructure to ensure the maintenance of the current project’s outputs at least until 2014 (i.e. beyond JISC’s minimum of 2 years), but also seeks to develop a model by which an income stream is generated to fund ongoing maintenance of software and data.

Recruitment

39. We do not anticipate any difficulties with recruitment, as we have a large research group, part of which is actively involved in researching and using semantic web technologies related to music data. In particular, the named RA is currently completing his PhD thesis and plans to be ready for the start of this project. If this does not work out, we have flexibility in the OMRAS-2 grant to release one of the RAs to work on the project (see Risk Analysis above).

3 Engagement with the Community

40.
The main stakeholders in the project are people who use information about musical recordings. In the HE sector, this includes Music Informatics researchers and practitioners, who develop new algorithms for analysing, navigating, manipulating and understanding musical works and collections; and music teachers, students and researchers, for whom recordings are works of art exemplifying the performers’ technical mastery and interpretative skill. Other stakeholders include the Linked Data community, JISC community and MusicBrainz user community.

41. Various stakeholders will be engaged at different stages of the project, as described in WP4. The NGS mapping (WP1) will take place in consultation with the MusicBrainz user community via their mailing list, IRC channel and forums, and with the Linked Data community via the Linked Open Data (LOD) mailing list. We will promote the project to key representatives of the end-user community early in the project, at the ISMIR conference (Aug 2010), by a planned presentation in the Late Breaking session and by personal communication. The UK user community will be reached via our workshops (see WP4), the first of which will be held in conjunction with DMRN 2010 (www.dmrn.org) at Queen Mary. We have direct experience of this type of interaction, having organised a workshop for invited researchers from the UK, Europe and USA in December 2008, as part of the OMRAS-2 project. The feedback from this involvement will be taken into account in further work. The second workshop in month 11 will showcase the SPARQL endpoint for the MusicBrainz database and feature tutorials demonstrating its use in solving typical information needs and potential use in answering new types of research questions. We will ensure that the tutorials are suitable for a wide range of users, including non-experts in semantic web technologies. The semantic web community will be reached via JISC events and the LOD list.

4 Impact

42.
The current RDF translation of the MusicBrainz database contains only basic metadata and does not map any of MusicBrainz’ Advanced Relationship data. This richer data is being refactored according to the soon-to-be-released NGS, which, if exposed as linked data, will allow users to make more expressive queries and receive more useful responses. For example, a musicologist or music student searching for recordings of a particular orchestra or composer immediately faces a problem which renders the semantic web useless to them: the current version of the MusicBrainz database uses the categories artist, album and track, and for classical music, users might have entered the performer, composer, or a mixture of the two in the artist field. This problem is addressed by the NGS. Likewise a music student might want to find tracks featuring a particular saxophonist or lyricist, or find out if there are any live recordings of a particular piece for which they have only a studio version. Further, an MIR researcher developing an algorithm that classifies music by genre using audio features might want to test whether the recording engineer or studio have an effect on classification results. These are the types of information needs that a linked data version of the NGS could satisfy. For the wider semantic web community, the exposure of MusicBrainz’ NGS will provide a hub for talking about music and lay the groundwork for advanced semantic web applications. 43. Sustainability of this work is ensured by publishing the data on the MusicBrainz server itself. MusicBrainz has been running for about 10 years, and has support from major organisations such as Google and the BBC. In the unlikely event of MusicBrainz ceasing to exist, the database will be set up on the DBTune.org site. 
Software created in the project will be maintained by our project Sustainable Software for Digital Music and Audio Research, which is funded by EPSRC until 2014, and has plans for supporting long-term sustainability of research software. This strategy allows the project to have a continuing impact after it has ended. 44. A cursory examination of the current DBTune.org MusicBrainz translation server logs shows that the service receives over 1000 unique visitors a month with the SPARQL endpoint alone receiving around 750 unique visitors. (Note that this represents less than 1% of MusicBrainz users, the majority of whom would not currently be aware of the DBTune service.) We can compare this baseline level of traffic with the traffic visiting the new SPARQL endpoint produced in WP2.1. We would expect this new endpoint to spark additional interest as it would allow for more efficient execution of more expressive queries. This can be measured both by the number of unique visitors and by the number of requests for linked data, i.e. containing the “Accept RDF” header. 45. The longer term impact of the project, i.e. beyond the end of the project, can be measured in terms of uptake in the user communities, for example by the tools and services which are enriched or enabled by the availability of the data on the semantic web. In particular, we envisage that music students will have tools for identifying and researching recordings, and that musicologists and MIR specialists will be able to perform new types of research, answering new research questions involving orders of magnitude more data than they would have previously been able to consider. 5 Budget (see table overleaf) 46. A full-time postdoctoral research assistant (RA) with skills in software development and semantic web technologies is required for the 12-month duration of the project. The RA will perform the design, software development, updating the blog, and promoting the project to the user community. 
The named RA (Jacobson) has a track record in developing semantic web services for music and performing system administration for the DBTune.org site. He has also promoted the use of the semantic web for music data by giving a tutorial on this topic at the ISMIR 2009 conference in Japan. 47. The time allocation for the PI (Dixon) is factored at 10% which will be used for project management, staff management, reporting and research, as well as attendance at the start-up meeting, JISC Programme meetings, meetings with the Programme Manager and dissemination and evaluation events. The involvement of CI Sandler at 5% will focus on realisation of opportunities for knowledge transfer arising from the project and the integration of work from the NEMA and OMRAS-2 projects, for which he is PI. 48. Under hardware/software we request one laptop computer (dual-boot Linux/Windows) for the RA to be used for development work, document preparation, presentations, demonstrations and visits to user sites. We request a Lenovo ThinkPad X201 (£941) for this purpose, and allow £500 for any necessary software licences or other consumables. C4DM will provide access to a computing cluster and large-scale storage facilities as an in-kind contribution. 49. For dissemination, we request £2000 to cover the cost of 2 one-day workshops (room and AV hire £400; catering £525 for lunch and two coffee breaks for 30 people; printing costs £75). Other dissemination costs and evaluation costs come under the travel and expenses heading, which includes the cost of the RA travelling to one international conference (ISMIR 2010: travel £450, hotel and subsistence £400, registration £350), and the cost of the project team attending JISC events (Programme meetings, dissemination events, evaluation activities), local conferences and meetings, and visits to user sites to promote the project, which we estimate will amount to £1000. 50. 
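The dissemination and travel figures in paragraph 49 can be cross-checked with a few lines of Python. This is a worked arithmetic check only; it assumes that the itemised workshop costs in parentheses are per workshop, which is the reading under which they add up to the £2000 requested.

```python
# Cross-check of the figures quoted in paragraph 49 (all amounts in GBP).
# Assumption: the itemised workshop costs are per one-day workshop.
WORKSHOP_COSTS = {"room and AV hire": 400, "catering": 525, "printing": 75}
NUM_WORKSHOPS = 2

CONFERENCE_COSTS = {"travel": 450, "hotel and subsistence": 400,
                    "registration": 350}
OTHER_TRAVEL = 1000  # JISC events, local meetings and visits to user sites


def dissemination_total():
    """Cost of both workshops."""
    return NUM_WORKSHOPS * sum(WORKSHOP_COSTS.values())


def travel_total():
    """ISMIR 2010 attendance plus the estimate for other travel."""
    return sum(CONFERENCE_COSTS.values()) + OTHER_TRAVEL


if __name__ == "__main__":
    print("Dissemination:", dissemination_total())
    print("Travel and expenses:", travel_total())
```

Under that assumption the two totals come to £2000 and £2200, matching the Dissemination and Travel and expenses totals in the budget table.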
The host institution has a large research group (see below) working at the interface of music and technology. In particular, we are very active in the Music Information Retrieval (MIR) community, working on the annotation and navigation of large music collections using semantic web technologies (see for example www.omras2.org). Exposing and linking the MusicBrainz database will be beneficial for present and future projects (see Impact above). We will contribute 20% of the cost of this project, on top of the in-kind contributions mentioned above. MusicBrainz will benefit from the exposure of their metadata on the semantic web, and will make in-kind contributions of access to their data and computing resources, and technical assistance in achieving the project goals.

                                              Apr10–Mar11  Apr11–Mar12  TOTAL £
Directly Incurred Staff
  Postdoc RA, Grade 5, 1650 hours, 100% FTE         31191        10397    41588
  Total Directly Incurred Staff (A)                 31191        10397    41588

Non-staff
  Travel and expenses                                1950          250     2200
  Hardware/software                                  1451            0     1451
  Dissemination                                      1000         1000     2000
  Evaluation                                            0            0        0
  Other                                                 0            0        0
  Total Directly Incurred Non-staff (B)              4401         1250     5651

Directly Incurred Total (C=A+B)                     35592        11647    47239

Directly Allocated
  Staff                                              8350         2783    11133
  Estates                                            9413         3137    12550
  Other                                               702          234      936
  Total Directly Allocated (D)                      18465         6154    24619

Indirect Costs (E)                                  35069        11690    46759

Total Project Cost (C+D+E)                          89126        29491   118617
  Amount requested from JISC                        71301        23593    94894
  Institutional contributions                       17825         5898    23723

Percentage Contributions Over Life of Project
  JISC        80%
  Partners    20%
  Total      100%

No. FTEs used to calculate indirect and estates charges, and staff included
  1.15        Jacobson (1.0), Dixon (0.1), Sandler (0.05)

6 Previous Experience of the Project Team

51. The Centre for Digital Music (C4DM) at Queen Mary University of London is a world-leading multidisciplinary research group in the field of Music & Audio Technology.
C4DM has around 50 members working on signal processing of digital music, music informatics, machine listening, audio engineering and interactive performance. Research funding obtained since 2001 totals over £14m, mainly from the EPSRC and EU. Current projects include Sustainable Software for Digital Music and Audio Research (EP/H043101/1, 2010-2014), the £2.5m Online Music Recognition and Searching II (OMRAS-2, EP/E017614/1, with Goldsmiths, University of London, 2007-10), an EPSRC Platform Grant (EP/E045235/1, 2007-12), the £5.9m Doctoral Training Centre in Digital Music and Media for the Creative Economy (EP/G03723X/1, 2009-2017), and Networked Environment for Music Analysis (NEMA), funded by the Andrew W. Mellon Foundation (2008-10).

52. In the area of this proposal, music informatics and the semantic web, we have a strong research track record, including publications on ontologies (Raimond et al., 2007), music recommendation using Linked Data (Passant and Raimond, 2008), and music similarity on the semantic web (Jacobson et al., 2009b). The DBTune.org project has been hosting music-related RDF data since 2007 and has been an integral part of the Linking Open Data movement (Bizer et al., 2007). Created and hosted by C4DM, the DBTune.org project was one of the first Linked Data entities to advocate the use of MusicBrainz identifiers for music-related URIs, and it currently hosts an RDF translation of the MusicBrainz data based on the old database schema. The Music Ontology (Raimond et al., 2007) is widely used in the Linked Data community and generally accepted as the most comprehensive and flexible ontological model for the music domain.

53. Dr Simon Dixon (PI) is a Lecturer in Electronic Engineering at Queen Mary and leads C4DM’s Music Informatics group. He has a PhD (Sydney) in Computer Science (in the area of knowledge representation and reasoning) and an LMusA diploma in Classical Guitar.
His research interests cover various aspects of music informatics, including high-level music signal analysis and the representation of musical knowledge (particularly rhythm and harmony). He is CI on OMRAS-2 and was CI on Interfaces to Music (Vienna Science and Technology Fund, 2004-2007). He is author of the beat tracking software BeatRoot (ranked first in the MIREX 2006 evaluation of beat tracking systems) and the audio alignment software MATCH (Best Poster Award, ISMIR 2005), and co-author of the top-ranked Audio Chord Detection and Music Structure Segmentation systems (MIREX 2009). He was Programme Co-Chair for ISMIR 2007, and co-presented the ISMIR 2006 tutorial on Computational Rhythm Description.

54. Prof Mark Sandler (CI) is Director of C4DM. He became a Professor of Signal Processing at Queen Mary in 2001, following 19 years at King’s College, where he was also Professor of Signal Processing. He is PI on OMRAS-2 and was PI on SIMAC. He was General Chair of DAFx’03 and General Co-Chair of ISMIR’05. He is Chair of the Audio Engineering Society Technical Committee on Semantic Audio Analysis. He is a Fellow of the IET and AES. He is Director of the Doctoral Training Centre in Digital Music and Media for the Creative Economy.

55. Kurt Jacobson (RA) is currently a doctoral student at C4DM working on the OMRAS-2 project. As assistant administrator of DBTune.org he has worked to create Semantic Web services for music, including a service publishing structured data about music artists on Myspace and a service publishing musicological data about classical music composers. He is working on modelling and exploring connections in music using structured data from heterogeneous sources including historical musicology, social networks, and audio analysis. He was co-presenter of the ISMIR 2009 tutorial on the semantic web and music information, titled Share and Share Alike, You Can Say Anything about Music in the Web of Data.

56.
The MetaBrainz Foundation (http://metabrainz.org/) is a non-profit organisation based in San Luis Obispo, California, USA, that operates the MusicBrainz project, a user-maintained community music metadatabase. The MusicBrainz database currently contains metadata on over 9 million recordings from over half a million artists. MetaBrainz is supported by donations from companies such as Sun and Google, as well as private donations, and they license their data commercially to partners such as the BBC and ZEEZEE (zeezee.de). The BBC has also provided them with data from their Orpheus classical works database (containing over 100,000 works) to aid the development of the next generation schema (NGS).

References

Berners-Lee, T. (2006). Linked data. Available at http://www.w3.org/DesignIssues/LinkedData.html.

Bizer, C., Heath, T., Ayers, D., and Raimond, Y. (2007). Interlinking open data on the web. In Demonstrations Track, 4th European Semantic Web Conference.

Cannam, C., Landone, C., Bello, J. P., and Sandler, M. (2006). The Sonic Visualiser: A visualisation platform for semantic descriptors from musical signals. In 7th International Conference on Music Information Retrieval, pages 324–327.

Cook, N. (2007). Performance analysis and Chopin’s mazurkas. Musicae Scientiae, 11(2):183–205.

Jacobson, K., Humfrey, N. J., Raimond, Y., Brickley, D., and Idehen, K. (2009a). What about music-related URIs??? See mailing list archive at http://lists.w3.org/Archives/Public/public-lod/2009Sep/0029.html.

Jacobson, K., Raimond, Y., and Sandler, M. (2009b). An ecosystem for transparent music similarity in an open world. In 10th International Conference on Music Information Retrieval, pages 33–38.

Kaye, R. (2008). Next generation schema: Where we are today. Available online at http://blog.musicbrainz.org/?p=351.

Kaye, R. (2010). NGS beta 2: May 24th 2010. Available online at http://blog.musicbrainz.org/?p=527.

Lee, J., Jones, M., and Downie, J. (2009). An analysis of ISMIR proceedings: Patterns of authorship, topic, and citation. In 10th International Society for Music Information Retrieval Conference, pages 57–62.

Linked Data Community (2010). Linking open data mailing list. Available online at http://lists.w3.org/Archives/Public/public-lod/.

Passant, A. and Raimond, Y. (2008). Combining social music and semantic web for music-related recommender systems. In Semantic Web Workshop.

Raimond, Y. (2009). A Distributed Music Information System. PhD thesis, Queen Mary University of London, Centre for Digital Music.

Raimond, Y., Abdallah, S., Sandler, M., and Giasson, F. (2007). The music ontology. In 8th International Conference on Music Information Retrieval, pages 417–422.

FOI Withheld Information Form

1. We would like JISC to consider withholding the following sections or paragraphs from disclosure, should the contents of this proposal be requested under the Freedom of Information Act, or if we are successful in our bid for funding and our project proposal is made available on JISC’s website.

NONE

2. We acknowledge that the FOI Withheld Information Form is of indicative value only and that JISC may nevertheless be obliged to disclose this information in accordance with the requirements of the Act. We acknowledge that the final decision on disclosure rests with JISC.

1 JISC, Northavon House, Coldharbour Lane, Bristol, BS16 1QD.

Queen Mary, University of London
Mile End Road, London E1 4NS
Telephone: 020 7882 5209
Facsimile: 020 8980 6533
Website: www.dcs.qmul.ac.uk

School of Electronic Engineering and Computer Science
Professor E.P.
Robinson, MA PhD, FBCS, CiTP
Head of School
Email: e.p.robinson@dcs.qmul.ac.uk

20/4/10

To whom it may concern,

I am writing in strong support of the proposal Linked Music Metadata, which is being submitted to the JISC Information Environment Funding Call 2/10: Deposit of research outputs and Exposing digital content for education and research, by Dr Simon Dixon and Prof Mark Sandler of the Centre for Digital Music in the School of Electronic Engineering and Computer Science.

The Centre for Digital Music is a world-leading research group in the area of Music and Technology, with particular strengths in Music Informatics, Machine Listening and Interaction. For several years, the Music Informatics branch has included a strong semantic web component, and they have made important contributions to the Linking Open Data movement, including their work on the Music Ontology and the DBTune.org service. The current project will allow the continuation and expansion of the DBTune project, and will provide a metadata resource which will be of potential benefit to all academics (and others) working with music-related information.

The project fits well with the research aims of the School, and I strongly support this application.

Yours sincerely,
Edmund Robinson

Patron: Her Majesty the Queen
Incorporated by Royal Charter as Queen Mary & Westfield College, University of London