Cover Sheet for Proposals JISC Grant Funding Call

Name of Programme & Strand: Information Environment 2011 Programme: Exposing digital content for education and research
Programme Tags: "INF11" and "JISCexpo"
Name of Call Area Bidding For: Strand B - Expose
Name of Lead Institution: Queen Mary University of London
Name of Department where project would be based: School of Electronic Engineering and Computer Science
Full Name of Proposed Project: Linked Music Metadata
Name(s) of Partner HE/FE Institutions Involved: None
Name(s) of Partner Company/Consultants Involved: The MetaBrainz Foundation
Full Contact Details for Primary Lead and/or Contact for the Project:
Name: Simon Dixon
Position: Lecturer
Email: simon.dixon@elec.qmul.ac.uk
Tel: 020 7882 7681
Address: Elec Eng, Queen Mary University of London, Mile End Rd, London
Postal Code: E1 4NS
Length of Project: 12 months
Project Start Date: 01/07/10
Project End Date: 30/06/11
Total Funding Requested from JISC: £94,894.00
Funding Broken Down over Financial Years (April - March):
July 2010 - March 2011: £71,301
April 2011 - June 2011: £23,593
Project Description / Abstract:
The MusicBrainz user community has created a metadatabase describing over 9 million musical recordings,
which is used by media players and web services such as the BBC's /music. The data is structured but not
linked. Part of this data has been made available on the semantic web in the past, but it lacks recent
extensions and updates, as it was not based on a sustainable model. With the upcoming release of
MusicBrainz' Next Generation Schema (May 2010, beta), it is an appropriate time to map the new metadata
to RDF and publish the Linked Data directly from the MusicBrainz website. This data will be linked to music
metadata on the semantic web (e.g. DBpedia, BBC), and exposed via a SPARQL endpoint. We will engage with
end users in the Music and Music Informatics fields, providing tutorial materials and workshops to encourage
uptake of project outputs. We will investigate and report on major issues arising in the project, such as
scalability, provenance and sustainability.
Keywords describing project: Music metadata, MusicBrainz
I have looked at the example FOI form at Appendix B and included an FOI form in the attached bid: YES
I have read the Call, Briefing Paper and associated Terms and Conditions of Grant at Appendix D: YES
1
Appropriateness and Fit to Programme Objectives & Overall Value to JISC Community
1. Entertainment industries have been revolutionised by digital technologies, which have fundamentally
changed the means of creation, production and distribution of media items. Alongside the new business
models, new fields of research, such as Music Informatics, have arisen, and have a strong presence in UK
Higher Education (HE) institutions (Lee et al., 2009). Likewise, traditional academic fields such as Musicology have benefited from the new opportunities provided by technology, in particular the (semi-)automated
analysis (Cook, 2007) and visualisation (Cannam et al., 2006) of musical recordings.
2. Technology has radically increased the scale at which work based on collections of musical recordings
is performed, but this has also introduced new problems concerning the management, navigation and
annotation of these super-collections. Vast quantities of metadata concerning musical recordings can be
found on the Web, but this metadata is typically unstructured and difficult to navigate automatically. MusicBrainz
(musicbrainz.org) provides the largest free metadatabase about music recordings, and for several years
has been in the process of extending its scope from covering simple bibliographic information to becoming
a comprehensive music information site. As a community-based non-profit organisation, MusicBrainz lacks the
resources to port its data to the semantic web. With the beta release of its Next Generation Schema
(NGS) in May 2010, the time is ripe for bringing this valuable metadata into the semantic web and
linking it with related resources.
Objectives
3. The objectives of this project are to:
(1) Publish the MusicBrainz metadatabase directly as a Linked Data resource on the semantic web;
(2) Link it to other music metadata on the semantic web;
(3) Create a semantic web query facility (SPARQL endpoint) for the MusicBrainz data;
(4) Develop tutorial materials to explain and illustrate how the content can be used;
(5) Engage the academic music and music informatics communities, and the broader semantic web community, in the development and use of the exposed content; and
(6) Investigate and report on issues such as scalability, provenance and sustainability of the resource.
Background
4. MusicBrainz is an open user community that collects, maintains and makes available to the public
music metadata in the form of a relational database. The MusicBrainz project was started as a free and open
compact disc identification service, intended as an alternative to Gracenote’s CDDB, which also
began as a free service but then adopted strict licensing policies. The MusicBrainz project has grown
beyond the CD identification task and now provides a wealth of crowd-sourced structured data about
music. Using a well-defined set of community guidelines and a simple hashing approach for creating
unique identifiers for music artists, albums, and tracks, the MusicBrainz project has assembled what is
one of the cleanest and most comprehensive music metadata repositories on the Web.
5. The Linked Data community has already recognised the immense value of the MusicBrainz project, and
MusicBrainz identifiers have been adopted by several Linked Data entities, including the BBC Music website (http://bbc.co.uk/music). Although the MusicBrainz project previously provided descriptions
of resources in RDF/XML, this approach was abandoned in favour of an ad-hoc XML serialisation. This
decision was made nearly five years ago, before the Linked Data movement had the traction it has now
and when the tools for parsing and authoring RDF were less mature. Currently, the MusicBrainz project
does not directly provide Linked Data although, as mentioned above, it does provide unique identifiers for
music-related entities that Linked Data practitioners find particularly attractive for minting URIs.
6. For several years, we at the Centre for Digital Music (C4DM) have provided an RDF translation of the MusicBrainz data through the DBTune.org project, which was one of the first Linked Data entities to advocate
the use of MusicBrainz identifiers for music-related URIs. Hosted by C4DM, DBTune.org has been an
integral part of the Linking Open Data movement (Bizer et al., 2007). C4DM was also the key player in the
development of the Music Ontology (Raimond et al., 2007), which is widely used in the Linked Data community and generally accepted as the most comprehensive and flexible ontological model for the music
domain. The existing translation maps the MusicBrainz schema to the Music Ontology and provides a Linked Data version of MusicBrainz in parallel to the original MusicBrainz resource.
A SPARQL endpoint for querying the translation is also provided. (SPARQL is an RDF query
language standardised and recommended by the World Wide Web Consortium.) While the DBTune.org MusicBrainz
translation has been a useful Linked Data resource, it is not updated automatically from MusicBrainz, and
manual updates are not performed regularly. Further, it only translates a subset of the available metadata.
Members of the Linked Data community agree that assisting the MusicBrainz community in making the MusicBrainz website a full-fledged Linked Data source is the way forward (Jacobson et al., 2009a). In this
way, MusicBrainz edits will be propagated immediately to the semantic web (the existing translation is currently over a year
out of date), including the increasingly rich metadata which is becoming available.
7. As the MusicBrainz project has evolved, additional types of data have been included. These new types of
data usually take the form of a MusicBrainz Advanced Relationship (AR). The MusicBrainz community has
grown a series of ARs organically, and over the last two years MusicBrainz has been working to crystallise
these additional concepts into a new database schema called the Next Generation Schema (NGS) (Kaye,
2008). The NGS includes additional structured data, for example associating a lyricist with a particular
track or a live-performance album with the original studio album. Such relationships are not included
in the original DBTune.org MusicBrainz RDF translation. The impending release of MusicBrainz NGS
(Kaye, 2010) makes the present a highly appropriate time for the Linked Data community to assist the
MusicBrainz project in creating an RDF translation of the Next Generation Schema and publishing Linked
Data directly from the MusicBrainz website.
Users and Needs
8. In making music metadata available on the semantic web, we are addressing the needs of two primary
types of user in the HE sector: (i) those working in the Music Informatics (or Music Information Retrieval)
community, primarily in computing and engineering departments; and (ii) musicologists and musicians
working in music departments. This is in addition to the (sizable) international MusicBrainz user community, who also stand to benefit from the linking of their data to other semantic web resources, as well as
software developers working with on-line music services, and their users, who will indirectly benefit from
this project.
9. During the OMRAS-2 project, we have been working closely with Music Informatics specialists (e.g. Prof.
Geraint Wiggins and Tim Crawford at Goldsmiths) and musicologists (Profs. Dan Leech-Wilkinson of King’s
College London and Nick Cook of Cambridge University, and their PhD students and RAs). We have also
recently commenced work with the British Library, and via the NEMA project (nema.lis.uiuc.edu) we
collaborate with international leaders in Music Informatics and Computational Musicology. Through these
collaborations we have developed a good understanding of these users’ needs. The need that is central
to this project is that of being able to identify musical entities (e.g. artists and recordings) unambiguously.
Until the last decade, this was rarely an important issue. Data sets were small and their owners were
aware of their content, often at an expert level. But as the size of music collections increases, and data
is processed automatically rather than manually, it becomes essential to manage the metadata about the
collections in a principled way. Linked data provides a means of joining different information sources,
enabling the added value of “unexpected re-use of information” (Berners-Lee, 2006) to be realised.
10. The utility of linked data is, however, limited by the extent of the interlinking that exists between related data
sets, and the current multiplicity of URIs for artists, albums and tracks is a potential hindrance to the goals
of linked data. Since MusicBrainz provides the most extensive open metadatabase for musical recordings
and is already a de facto standard in its field, it would be advantageous to establish it as a standard for
Linked Data as well. For this to take place, the database must be exposed on the semantic web with
infrastructure to ensure that updates are automatic or propagated in a timely and sustainable manner.
11. The success of the project can be measured in immediate terms by semantic web traffic: the number of
website requests whose HTTP Accept header asks for RDF, or the number of distinct users connecting to the SPARQL
endpoint. In the longer term, the use of MusicBrainz URIs in third party tools and services, and eventually
new studies (enabled by being able to query the linked data) will also indicate the success of the project
(see also Impact below).
2
Quality of Proposal and Robustness of Workplan
12. The work of the project is divided into 5 work packages (WP), three of which cover the technical work and
one covers each of dissemination and management. WP1-3 address respectively the three areas of work
listed in the Call (paragraph 29), namely: (i) “Make a collection of resources available ...”; (ii) “Develop a
prototype ...”; and (iii) “Explore and report on the opportunities and barriers ...”. The work plan is illustrated
in the Gantt chart overleaf.
Linked Music Metadata - Diagrammatic Work Plan
(Numbers show estimated percentage of effort between work packages)
Key: M = month, D = deliverable, WP = work package

        M1  M2  M3  M4  M5  M6  M7  M8  M9  M10 M11 M12
WP1     50  70  75  75  70  20   -   -   -   -   -   -
WP2      -   -   -   -   -  40  75  75  75  70  60   -
WP3     10  10   5   5   5   5   5   5   5   5   5  50
WP4     30  10  10  10  15  25  10  10  10  15  25  40
WP5     10  10  10  10  10  10  10  10  10  10  10  10

Work packages and tasks:
WP1 Convert MusicBrainz to linked data: 1.1 Mapping NGS schema to RDF; 1.2 Content negotiation; 1.3 Linking to other data sets
WP2 Prototypes and Tutorials: 2.1 Creation of SPARQL endpoint; 2.2 Production of tutorial materials
WP3 Opportunities and barriers: 3.1 Trust and provenance; 3.2 Opportunities and barriers
WP4 Dissemination and engagement: 4.1 Public engagement; 4.2 Workshop 1; 4.3 Workshop 2; 4.4 Other dissemination activities
WP5 Management

Deliverables: D1.1 (M2), D1.2 (M6), D2.1 (M10), D2.2 (M6), D2.3 (M11), D3.1 (M12), D4.1 (M1), D4.2 (M6, M11), D4.3
WP1: Convert MusicBrainz to linked data (Months 1–6)
13. WP1.1 Mapping of MusicBrainz NGS database schema to RDF: The MusicBrainz community is in
the final stages of creating a more expressive schema for describing music-related metadata, called the
Next Generation Schema (NGS) (Kaye, 2010). While NGS provides clear structure and semantics for the
MusicBrainz data, it does not directly provide Linked Data. To adhere to the principles of Linked Data,
the MusicBrainz NGS must be expressed using RDF (Berners-Lee, 2006). A mapping of the MusicBrainz
NGS to appropriate OWL/RDFS ontologies is required. This task will involve feedback from the JISC
community in the form of mailing list discussions on the Linking Open Data mailing list (Linked Data
Community, 2010) as well as the Music Ontology Specification Group and the MusicBrainz community.
The process of soliciting community feedback on early iterations of the mapping will ensure the use of
the most appropriate and widely accepted ontologies and will encourage use of the resulting Linked Data
resources output by the project.
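To illustrate what such a mapping produces, the following is a minimal sketch, not the project's mapping, that turns one simplified artist row into N-Triples using Music Ontology (mo:SoloMusicArtist, mo:MusicGroup, mo:MusicArtist) and FOAF (foaf:name) terms. The row fields, the "Person"/"Group" type values and the `http://musicbrainz.org/artist/{mbid}#_` URI pattern are assumptions for illustration; the actual ontology choices would come out of the community consultation described above.

```python
# Sketch only: map one (simplified, hypothetical) NGS artist row to N-Triples.
# The artist URI pattern and the artist-type values are assumptions.

MO = "http://purl.org/ontology/mo/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping from an artist type to a Music Ontology class.
ARTIST_TYPE_TO_CLASS = {
    "Person": MO + "SoloMusicArtist",
    "Group": MO + "MusicGroup",
}


def nt_literal(text: str) -> str:
    """Return an N-Triples string literal, escaping backslash, quote, LF and CR."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def artist_uri(mbid: str) -> str:
    """Mint a URI for the artist itself (a non-information resource); pattern assumed."""
    return f"http://musicbrainz.org/artist/{mbid}#_"


def artist_to_ntriples(mbid: str, name: str, artist_type: str) -> list:
    """Translate one artist row into a list of N-Triples statements."""
    subject = f"<{artist_uri(mbid)}>"
    # Unknown or missing types fall back to the general mo:MusicArtist class.
    rdf_class = ARTIST_TYPE_TO_CLASS.get(artist_type, MO + "MusicArtist")
    return [
        f"{subject} <{RDF_TYPE}> <{rdf_class}> .",
        f"{subject} <{FOAF}name> {nt_literal(name)} .",
    ]


if __name__ == "__main__":
    placeholder_mbid = "00000000-0000-0000-0000-000000000000"  # not a real MBID
    for line in artist_to_ntriples(placeholder_mbid, "Example Artist", "Group"):
        print(line)
```

Emitting line-oriented N-Triples keeps each statement independent, which suits both bulk loading and incremental regeneration.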
14. WP1.2 Implementation of content negotiation and serving of RDF: We will contribute appropriate
content negotiation code to the MusicBrainz server code base. Most of the resources described by the
MusicBrainz project are non-information resources, that is, the URIs refer to real-world things such as
music artists or albums. Following Linked Data practice, these URIs should return 303 (See Other) redirects
to appropriate information resources when they are dereferenced via HTTP. Depending on the Accept
header in the HTTP request, the redirect will point to either a human-readable HTML page or a machine-readable RDF document. If, for some unforeseen reason, including redirects and content negotiation in the
MusicBrainz server proves to be impossible, RDFa can be embedded directly in the MusicBrainz HTML
documents.
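The redirect logic can be sketched as a pure function: parse the Accept header, pick the most preferred supported media type, and answer 303 with the location of the matching document. This is a simplified illustration, not MusicBrainz server code; the document suffixes (`.rdf`, `.ttl`, `.html`) are hypothetical, and only exact media types and `*/*` are recognised (partial wildcards such as `text/*` are ignored).

```python
# Sketch only: 303-based content negotiation for a non-information resource URI.
from typing import Optional

# Supported media types, mapped to the (hypothetical) suffix of the describing document.
DOCUMENT_SUFFIX = {
    "application/rdf+xml": ".rdf",
    "text/turtle": ".ttl",
    "text/html": ".html",
    "application/xhtml+xml": ".html",
}
DEFAULT_SUFFIX = ".html"  # clients sending */* (or nothing usable) get the HTML page


def parse_accept(header: str) -> list:
    """Split an Accept header into (media type, q) pairs, highest preference first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sort() is stable, so equally weighted types keep their header order
    prefs.sort(key=lambda pref: pref[1], reverse=True)
    return prefs


def negotiate(resource_path: str, accept: Optional[str]) -> tuple:
    """Return (HTTP status, Location) for a request to a non-information resource."""
    for media_type, q in parse_accept(accept or "*/*"):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media_type in DOCUMENT_SUFFIX:
            return 303, resource_path + DOCUMENT_SUFFIX[media_type]
        if media_type == "*/*":
            return 303, resource_path + DEFAULT_SUFFIX
    # Nothing matched: fall back to the HTML page rather than failing outright.
    return 303, resource_path + DEFAULT_SUFFIX
```

For example, `negotiate("/artist/abc", "application/rdf+xml")` yields `(303, "/artist/abc.rdf")`, while a typical browser header leads to the HTML page. A production server would also send `Vary: Accept` so that caches keep the two representations apart.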
15. WP1.3 Linking to other data sets: We will provide appropriate links to other datasets in the MusicBrainz
RDF data to best meet the Linked Data recommendations. The MusicBrainz NGS contains a wealth of
links to external resources including BBC Music, Discogs, IMDB, Wikipedia, and Myspace. These links
can be used to create appropriate links to the corresponding Linked Data resources (e.g. DBpedia.org
resources). Additional links can be automatically generated using the graph matching approach proposed
by Raimond (2009), and then manually checked using the tried-and-tested crowd-sourcing framework that
powers the MusicBrainz project.
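As one illustration of turning existing NGS URL relationships into Linked Data links, the sketch below maps English Wikipedia article URLs to DBpedia resource URIs (DBpedia names resources after the article title) and emits `owl:sameAs` statements. Restricting to English Wikipedia and choosing `owl:sameAs` as the linking property are assumptions; other link types (e.g. BBC Music) would need their own rules, and the resulting candidates could then be checked by the community as described above.

```python
# Sketch only: derive DBpedia links from Wikipedia URLs stored as external links.
from typing import Optional
from urllib.parse import urlsplit

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def wikipedia_to_dbpedia(url: str) -> Optional[str]:
    """Map an English Wikipedia article URL to the corresponding DBpedia resource URI."""
    parts = urlsplit(url)
    if parts.hostname != "en.wikipedia.org" or not parts.path.startswith("/wiki/"):
        return None  # not an English Wikipedia article link
    title = parts.path[len("/wiki/"):]
    return "http://dbpedia.org/resource/" + title if title else None


def link_triples(subject_uri: str, external_urls: list) -> list:
    """Emit owl:sameAs statements for every external URL that maps to DBpedia."""
    triples = []
    for url in external_urls:
        target = wikipedia_to_dbpedia(url)
        if target is not None:
            triples.append(f"<{subject_uri}> <{OWL_SAME_AS}> <{target}> .")
    return triples
```

URLs for other sites (Discogs, IMDB, Myspace) simply produce no triple here, leaving them to separate rules or to the graph-matching step.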
16. Deliverable D1.1 (M2): RDF mapping of MusicBrainz NGS schema
17. Deliverable D1.2 (M6): Publication of the MusicBrainz metadatabase to the semantic web
WP2: Creation of prototypes and tutorial material (Months 6–11)
18. WP2.1 Creation of SPARQL endpoint: We will create and maintain a SPARQL endpoint that allows users
to query the MusicBrainz RDF. In our previous work with DBTune.org we have served RDF from a Postgres
database using a D2R server which serves as a translation layer between the relational database and a
SPARQL endpoint. While the D2R software is of great utility, its performance as a SPARQL engine is limited by the underlying database schema. We plan instead to perform an RDF dump from the MusicBrainz
database into a purpose-built triple store (e.g. 4store). We will need to address the scalability issues which
arise when serialising a database of this size. Assuming the use of a purpose-built triple store provides
significant performance gains compared to a D2R server configuration, an infrastructure that automates
the RDF dumping task will be implemented so that changes to the MusicBrainz dataset propagate to the
SPARQL endpoint at regular intervals, keeping the SPARQL endpoint sustainable and up to date.
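One way to keep the dump's memory use bounded as the database grows is to stream it in batches with keyset pagination (`WHERE id > last_id ORDER BY id LIMIT n`) rather than loading whole tables. The sketch below uses Python's built-in `sqlite3` as a stand-in for the Postgres database, and the single `artist(id, gid, name)` table is a simplified, hypothetical schema, not the real NGS layout; the URI pattern is likewise assumed.

```python
# Sketch only: stream N-Triples from a relational table in bounded batches.
import sqlite3
from typing import Iterator

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def nt_escape(text: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))


def dump_artists(conn: sqlite3.Connection, batch_size: int = 10_000) -> Iterator[str]:
    """Yield one N-Triples line per artist, fetching at most batch_size rows per query.

    Keyset pagination resumes after the last id seen, so each query stays cheap
    and memory use does not grow with the size of the table.
    """
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, gid, name FROM artist WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        for _, gid, name in rows:
            yield (f"<http://musicbrainz.org/artist/{gid}#_> "
                   f'<{FOAF_NAME}> "{nt_escape(name)}" .')
        last_id = rows[-1][0]


if __name__ == "__main__":
    # In-memory stand-in database with the simplified hypothetical schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE artist (id INTEGER PRIMARY KEY, gid TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO artist (gid, name) VALUES (?, ?)",
        [(f"gid-{i}", f"Artist {i}") for i in range(5)],
    )
    for line in dump_artists(conn, batch_size=2):
        print(line)
```

The output is plain N-Triples, so a scheduled job could regenerate it and bulk-load it into whichever triple store is chosen.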
19. WP2.2 Production of tutorial materials: We will design and produce tutorial materials, including sample
SPARQL queries, screencasts and videos, describing how the structured data can be queried and accessed by third party tools and services. These tutorial materials will be released on our web site, as well
as being presented at our workshops (see WP4). For the first workshop (M6), the more general tutorial
material on music data and the semantic web presented at ISMIR 2009 will be refreshed. The second
workshop (M11) will focus specifically on the project outputs.
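As a flavour of the kind of sample a tutorial might contain, the sketch below builds a SPARQL Protocol GET request (the query passed in the `query` parameter, JSON results requested via the Accept header) and flattens the standard SPARQL JSON results format into plain dictionaries. The endpoint URL is a placeholder, and the query assumes artists are described with `foaf:name`; it uses a SPARQL 1.0 `regex` filter so it runs on older engines too.

```python
# Sketch only: query a SPARQL endpoint over HTTP and read JSON results.
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL; substitute the real SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"

# Sample tutorial query: artists whose name matches "beatles" (case-insensitive).
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?artist ?name WHERE {
  ?artist foaf:name ?name .
  FILTER(regex(?name, "beatles", "i"))
} LIMIT 10
"""


def build_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(results_json) -> list:
    """Flatten SPARQL JSON results into one {variable: value} dict per solution."""
    data = json.loads(results_json)
    variables = data["head"]["vars"]
    return [
        {v: binding[v]["value"] for v in variables if v in binding}
        for binding in data["results"]["bindings"]
    ]
```

Against a live endpoint, the request would be sent with `urllib.request.urlopen(build_request(ENDPOINT, QUERY))` and the response body passed to `rows`; unbound optional variables are simply omitted from each dictionary.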
20. Deliverable D2.1 (M10): SPARQL endpoint serving MusicBrainz data
21. Deliverable D2.2 (M6): Semantic web and music data tutorial materials
22. Deliverable D2.3 (M11): Tutorial materials on using SPARQL to query the MusicBrainz data
WP3: Evaluation of opportunities and barriers (Months 1–12)
23. WP3.1 Trust and provenance: The MusicBrainz database tracks the provenance of the data (complete
edit history), so it would be possible to publish this with the MusicBrainz data, in consultation with the MusicBrainz community. We will investigate other trust and/or provenance issues as necessary in conjunction
with WP1 and WP2.
24. WP3.2 Opportunities and barriers: Issues will be tracked continually throughout the project as they arise
(e.g. scalability in WP2.1). They will initially be discussed at weekly project meetings and notable issues
will be logged on the project blog. These records will form the basis for a report which will be compiled at
the end of the project.
25. Deliverable D3.1 (M12): Report on opportunities and barriers in publishing the MusicBrainz database as
a semantic web resource.
WP4: Dissemination and engagement (Months 1–12)
26. Different stakeholders will be engaged at various stages of the project. In the initial stages, the MusicBrainz
and Linked Open Data communities will be engaged in the design of the RDF mapping (see WP1.1).
Engagement with the JISC community will take place throughout the project via JISC programme events
and the networking of the JISC Developer Community. End users in the higher education sector will be
involved via two workshops in months 6 and 11.
27. WP4.1 Public engagement: A project web site and blog will be established and maintained to report the
progress of the project to the general public.
28. WP4.2 Workshop 1: The first workshop will be held in conjunction with the annual Digital Music Research
Network workshop at Queen Mary, which is widely attended by national and some international academics
in the Digital Music field. We will present the results of publishing the MusicBrainz data (WP1), introduce
our plans for the remainder of the project, and seek feedback from the end user community concerning
the direction of the project. We shall also present an introductory tutorial on music information and the
semantic web for those who are new to the field. The workshops will be open to members of the research
community and the general public, and will be advertised in relevant mailing lists and web sites.
29. WP4.3 Workshop 2: The second workshop will be held at the end of M11 and will seek to promote the use
of the linked data resources in the Music and Music Informatics communities. Advanced tutorials will be
presented, explaining how to build tools and services using the SPARQL endpoint.
30. WP4.4 Other dissemination activities: Outcomes of the work will be published in journals (e.g. Computer
Music Journal, Journal of New Music Research) and major international conferences (e.g. International
Society for Music Information Retrieval Conference), as well as JISC’s programme meetings and other
events. We will also make the project known through our related projects such as OMRAS-2 (www.omras2.org),
Software Sustainability (http://gow.epsrc.ac.uk/ViewGrant.aspx?GrantRef=EP/H043101/1) and
NEMA (nema.lis.uiuc.edu). In M12, the RA will visit end user sites to work with users who attended the
workshops and require assistance in getting started with building applications.
31. Deliverable D4.1 (M1): Establishment of project blog and web site, Core Resources Form and public
version of project plan
32. Deliverable D4.2 (M6,M11): Delivery of two workshops
33. Deliverable D4.3: At least one publication in a refereed journal and international conference. (Due to the
length of the project these are likely to occur after the project has ended.)
WP5: Management (Months 1–12)
34. The project will be managed on a day-to-day basis by the PI, with project meetings held weekly to assess
progress and problems. This has been our practice throughout OMRAS-2. At the workshops, and at
conferences that we visit, we will hold meetings with key representatives of our user communities, and use
their feedback to steer remaining parts of the project and future work.
35. Deliverable D5.1: Reports, budgets and plans as required by JISC.
Risk Analysis
36. We have identified a number of risks, which are presented alongside mitigation actions in the following
table.
Risk: Unable to recruit RA. Likelihood: Low. Impact: High.
Mitigating action: The named RA (Jacobson) is eager to work on this project; if unable to do so, we have
other RAs (e.g. Dan Tidhar) who would also be able to work on this project.

Risk: The named RA does not complete PhD on time. Likelihood: Modest. Impact: Medium.
Mitigating action: In negotiation with JISC we could delay the project start date, or if this is not possible,
employ someone else (see above).

Risk: The MusicBrainz user community votes against the proposed changes. Likelihood: Low. Impact: Medium.
Mitigating action: We plan to involve the user community from the beginning of the project (see WP1.1) so
that their feedback shapes design decisions, which are then reflected in successive iterations of the RDF
mapping. Nevertheless, if the user community still fails to support the project, we can mirror the
MusicBrainz data and keep it current via live feed.

Risk: UK HE end users fail to see the relevance of the work. Likelihood: Medium. Impact: High.
Mitigating action: We collaborate with a number of end users in both music information retrieval (e.g.
Goldsmiths, University of London) and musicology (e.g. Cambridge University and King’s College London),
and will seek feedback from them at the planned workshops (see WP4) to ensure that the project outputs
will be highly relevant to users.

Risk: Software development deadlines not met. Likelihood: Modest. Impact: Medium.
Mitigating action: Our group has extensive experience in software engineering that we can draw upon if
problems are encountered. We shall also contribute to and draw upon the mutual support available via the
JISC developer community (http://devcsi.ukoln.ac.uk/).

Risk: Management difficulties encountered. Likelihood: Low. Impact: High.
Mitigating action: The PI leads a group of 5 PhD students and has experience as CI and/or RA on a number
of national and EU projects. He can draw on the wealth of experience of the CI (Sandler) and other senior
members of C4DM as necessary.
Intellectual Property Rights
37. The MusicBrainz database contains two types of information. The core data, consisting of factual data
such as the names of albums, tracks and artists, is in the public domain. The remaining data, such as annotations, tags, opinions and ratings, is protected by the Creative Commons Attribution-NonCommercial-ShareAlike 2.0 licence, which allows non-commercial use of the data on condition that MusicBrainz
is given credit and that any derivative works are also made available under the same licence. Thus the
linked data published by this project will follow the same licence arrangement. In particular, this allows the
outputs of the project to be used freely in teaching and research. Software produced in the project will be
protected by an open source licence in order to freely allow non-commercial use of the code. The exact
terms will be determined in consultation with OSSwatch.
Sustainability
38. C4DM has been addressing the issues of sustainability and reproducible research in our field, as evidenced by our recently awarded £940k EPSRC Grant, Sustainable Software for Digital Music and Audio
Research (EP/H043101/1, 2010-2014). The Sustainable Software project will provide us with infrastructure to ensure the maintenance of the current project’s outputs at least until 2014 (i.e. beyond JISC’s
minimum of 2 years), but also seeks to develop a model by which an income stream is generated to fund
ongoing maintenance of software and data.
Recruitment
39. We do not anticipate any difficulties with recruitment, as we have a large research group, part of which is
actively involved in researching and using semantic web technologies related to music data. In particular,
the named RA is currently completing his PhD thesis and plans to be ready for the start of this project. If
this does not work out, we have flexibility in the OMRAS-2 grant to release one of the RAs to work on the
project (see Risk Analysis above).
3
Engagement with the Community
40. The main stakeholders in the project are people who use information about musical recordings. In the
HE sector, this includes Music Informatics researchers and practitioners, who develop new algorithms for
analysing, navigating, manipulating and understanding musical works and collections; and music teachers,
students and researchers, for whom recordings are works of art exemplifying the performers’ technical
mastery and interpretative skill. Other stakeholders include the Linked Data community, JISC community
and MusicBrainz user community.
41. Various stakeholders will be engaged at different parts of the project, as described in WP4. The NGS
mapping (WP1) will take place in consultation with the MusicBrainz user community via their mailing list,
IRC channel and forums, and with the Linked Data community via the Linked Open Data (LOD) mailing
list. We will promote the project to key representatives of the end-user community early in the project, at
the ISMIR conference (Aug 2010), by a planned presentation in the Late Breaking session and by personal
communication. The UK user community will be reached via our workshops (see WP4), the first of which
will be held in conjunction with DMRN 2010 (www.dmrn.org) at Queen Mary. We have direct experience
of this type of interaction, having organised a workshop for invited researchers from the UK, Europe and
USA in December 2008, as part of the OMRAS-2 project. The feedback from this involvement will be taken
into account in further work. The second workshop in month 11 will showcase the SPARQL endpoint for
the MusicBrainz database and feature tutorials demonstrating its use in solving typical information needs
and potential use in answering new types of research questions. We will ensure that the tutorials are
suitable for a wide range of users, including non-experts in semantic web technologies. The semantic web
community will be reached via JISC events and the LOD list.
4
Impact
42. The current RDF translation of the MusicBrainz database contains only basic metadata and does not map
any of MusicBrainz’ Advanced Relationship data. This richer data is being refactored according to the
soon-to-be-released NGS, which, if exposed as linked data, will allow users to make more expressive
queries and receive more useful responses. For example, a musicologist or music student searching for
recordings of a particular orchestra or composer immediately faces a problem which renders the semantic
web useless to them: the current version of the MusicBrainz database uses the categories artist, album
and track, and for classical music, users might have entered the performer, composer, or a mixture of the
two in the artist field. This problem is addressed by the NGS. Likewise, a music student might want to find
tracks featuring a particular saxophonist or lyricist, or find out if there are any live recordings of a particular
piece for which they have only a studio version. Further, an MIR researcher developing an algorithm that
classifies music by genre using audio features might want to test whether the recording engineer or studio
have an effect on classification results. These are the types of information needs that a linked data version
of the NGS could satisfy. For the wider semantic web community, the exposure of MusicBrainz’ NGS will
provide a hub for talking about music and lay the groundwork for advanced semantic web applications.
43. Sustainability of this work is ensured by publishing the data on the MusicBrainz server itself. MusicBrainz
has been running for about 10 years, and has support from major organisations such as Google and the
BBC. In the unlikely event of MusicBrainz ceasing to exist, the database will be set up on the DBTune.org
site. Software created in the project will be maintained by our project Sustainable Software for Digital
Music and Audio Research, which is funded by EPSRC until 2014, and has plans for supporting long-term
sustainability of research software. This strategy allows the project to have a continuing impact after it has
ended.
44. A cursory examination of the current DBTune.org MusicBrainz translation server logs shows that the service receives over 1000 unique visitors a month with the SPARQL endpoint alone receiving around 750
unique visitors. (Note that this represents less than 1% of MusicBrainz users, the majority of whom would
not currently be aware of the DBTune service.) We can compare this baseline level of traffic with the
traffic visiting the new SPARQL endpoint produced in WP2.1. We would expect this new endpoint to spark
additional interest as it would allow for more efficient execution of more expressive queries. This can
be measured both by the number of unique visitors and by the number of requests for linked data, i.e.
requests whose HTTP Accept header asks for RDF.
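The two measures above can be computed from access logs with a few lines of code. The sketch below assumes log entries have already been parsed into (client, Accept header, path) records, treats the client address as a proxy for a unique visitor, and uses a hypothetical `/sparql` path for the endpoint; it also ignores q-values when deciding whether a request asked for RDF, which is a deliberate simplification.

```python
# Sketch only: traffic measures from pre-parsed access-log records.
from typing import Iterable, NamedTuple


class LogRecord(NamedTuple):
    """One already-parsed access-log entry (simplified, assumed format)."""
    client: str  # client address, used as a proxy for a unique visitor
    accept: str  # value of the HTTP Accept request header ("" if absent)
    path: str    # requested path


RDF_MEDIA_TYPES = ("application/rdf+xml", "text/turtle", "application/n-triples")


def wants_rdf(accept: str) -> bool:
    """True if the Accept header names an RDF media type (q-values ignored)."""
    accept = accept.lower()
    return any(media_type in accept for media_type in RDF_MEDIA_TYPES)


def summarise(records: Iterable[LogRecord], sparql_path: str = "/sparql") -> dict:
    """Count unique visitors, linked data requests and unique SPARQL visitors."""
    records = list(records)
    return {
        "unique_visitors": len({r.client for r in records}),
        "linked_data_requests": sum(1 for r in records if wants_rdf(r.accept)),
        "sparql_unique_visitors": len(
            {r.client for r in records if r.path.startswith(sparql_path)}
        ),
    }
```

Running the same summary over the existing DBTune.org logs and over the new endpoint's logs would give directly comparable monthly figures.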
45. The longer term impact of the project, i.e. beyond the end of the project, can be measured in terms of
uptake in the user communities, for example by the tools and services which are enriched or enabled by
the availability of the data on the semantic web. In particular, we envisage that music students will have
tools for identifying and researching recordings, and that musicologists and MIR specialists will be able
to perform new types of research, answering new research questions involving orders of magnitude more
data than they would have previously been able to consider.
5
Budget (see table overleaf)
46. A full-time postdoctoral research assistant (RA) with skills in software development and semantic web
technologies is required for the 12-month duration of the project. The RA will carry out the design and software development, update the blog and promote the project to the user community. The named RA
(Jacobson) has a track record in developing semantic web services for music and performing system administration for the DBTune.org site. He has also promoted the use of the semantic web for music data by
giving a tutorial on this topic at the ISMIR 2009 conference in Japan.
47. The PI (Dixon) will devote 10% of his time to the project, covering project management, staff
management, reporting and research, as well as attendance at the start-up meeting, JISC Programme
meetings, meetings with the Programme Manager, and dissemination and evaluation events. The involvement of CI Sandler, at 5%, will focus on realising opportunities for knowledge transfer arising from the
project and on integrating work from the NEMA and OMRAS-2 projects, for which he is PI.
48. Under hardware/software we request one laptop computer (dual-boot Linux/Windows) for the RA to be
used for development work, document preparation, presentations, demonstrations and visits to user sites.
We request a Lenovo ThinkPad X201 (£941) for this purpose, and allow £500 for any necessary software
licences or other consumables. C4DM will provide access to a computing cluster and large-scale storage
facilities as an in-kind contribution.
49. For dissemination, we request £2000 to cover the cost of two one-day workshops at £1000 each (room and AV hire £400;
catering £525 for lunch and two coffee breaks for 30 people; printing costs £75). Other dissemination
costs and evaluation costs come under the travel and expenses heading, which includes the cost of the
RA travelling to one international conference (ISMIR 2010: travel £450, hotel and subsistence £400,
registration £350), and the cost of the project team attending JISC events (Programme meetings, dissemination events, evaluation activities), local conferences and meetings, and visits to user sites to promote
the project, which we estimate will amount to £1000.
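The itemised costs in paragraph 49 add up to the dissemination and travel totals shown in the budget table. The short Python check below simply re-adds the quoted figures; the variable and function names are illustrative and not part of the proposal.

```python
# Per-workshop costs in pounds, as itemised in paragraph 49.
WORKSHOP_COSTS = {"room_and_av_hire": 400, "catering": 525, "printing": 75}
N_WORKSHOPS = 2

# RA attendance at ISMIR 2010, in pounds.
ISMIR_2010_COSTS = {"travel": 450, "hotel_and_subsistence": 400, "registration": 350}
# Estimate for JISC events, local conferences and meetings, and visits to user sites.
OTHER_TRAVEL = 1000


def dissemination_total() -> int:
    """Total requested under the dissemination heading."""
    return N_WORKSHOPS * sum(WORKSHOP_COSTS.values())


def travel_total() -> int:
    """Total requested under the travel and expenses heading."""
    return sum(ISMIR_2010_COSTS.values()) + OTHER_TRAVEL


if __name__ == "__main__":
    print(dissemination_total(), travel_total())  # 2000 2200
```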
50. The host institution has a large research group (see below) working at the interface of music and technology. In particular, we are very active in the Music Information Retrieval (MIR) community, working on the
annotation and navigation of large music collections using semantic web technologies (see for example
www.omras2.org). Exposing and linking the MusicBrainz database will be beneficial for present and future
projects (see Impact above). We will contribute 20% of the cost of this project, on top of the in-kind contributions mentioned above. MusicBrainz will benefit from the exposure of their metadata on the semantic
web, and will make in-kind contributions of access to their data and computing resources, and technical
assistance in achieving the project goals.
Directly Incurred Staff                       Apr10–Mar11  Apr11–Mar12  TOTAL £
Postdoc RA, Grade 5, 1650 hours, 100% FTE           31191        10397    41588
Total Directly Incurred Staff (A)                   31191        10397    41588

Directly Incurred Non-staff                   Apr10–Mar11  Apr11–Mar12  TOTAL £
Travel and expenses                                  1950          250     2200
Hardware/software                                    1451            0     1451
Dissemination                                        1000         1000     2000
Evaluation                                              0            0        0
Other                                                   0            0        0
Total Directly Incurred Non-staff (B)                4401         1250     5651

Directly Incurred Total (C=A+B)                     35592        11647    47239

Directly Allocated                            Apr10–Mar11  Apr11–Mar12  TOTAL £
Staff                                                8350         2783    11133
Estates                                              9413         3137    12550
Other                                                 702          234      936
Total Directly Allocated (D)                        18465         6154    24619

Indirect Costs (E)                                  35069        11690    46759

Total Project Cost (C+D+E)                          89126        29491   118617
Amount requested from JISC                          71301        23593    94894
Institutional contributions                         17825         5898    23723

Percentage Contributions Over Life of Project: JISC 80%, Partners 20%, Total 100%
No. FTEs used to calculate indirect and estates charges, and staff included: 1.15 (Jacobson 1.0, Dixon 0.1, Sandler 0.05)
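The rows of the budget table are related by the formulas in their labels (C = A + B; total = C + D + E), and the JISC and institutional lines should split each year's cost in the stated 80/20 ratio. A small Python consistency check follows, with the table's figures typed in by hand; the data layout and function names are our own illustrative choices.

```python
# Figures in pounds from the budget table: (Apr10–Mar11, Apr11–Mar12).
STAFF_A = (31191, 10397)
NON_STAFF = {
    "travel_and_expenses": (1950, 250),
    "hardware_software": (1451, 0),
    "dissemination": (1000, 1000),
    "evaluation": (0, 0),
    "other": (0, 0),
}
DIRECTLY_ALLOCATED = {"staff": (8350, 2783), "estates": (9413, 3137), "other": (702, 234)}
INDIRECT_E = (35069, 11690)
JISC = (71301, 23593)
INSTITUTION = (17825, 5898)
FTES = {"Jacobson": 1.0, "Dixon": 0.1, "Sandler": 0.05}


def column_sums(rows):
    """Sum a dict of (year 1, year 2) pairs column by column."""
    return tuple(sum(pair[i] for pair in rows.values()) for i in range(2))


def check_budget():
    b = column_sums(NON_STAFF)
    c = tuple(a + x for a, x in zip(STAFF_A, b))  # C = A + B
    d = column_sums(DIRECTLY_ALLOCATED)
    total = tuple(ci + di + ei for ci, di, ei in zip(c, d, INDIRECT_E))  # C + D + E
    # Each year's cost should be split exactly between JISC and the institution.
    assert all(j + inst == t for j, inst, t in zip(JISC, INSTITUTION, total))
    return {
        "B": b,
        "C": c,
        "D": d,
        "total": total,
        "grand_total": sum(total),
        "jisc_share": round(sum(JISC) / sum(total), 2),
        "ftes": round(sum(FTES.values()), 2),
    }


if __name__ == "__main__":
    print(check_budget())
```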
Previous Experience of the Project Team
51. The Centre for Digital Music (C4DM) at Queen Mary University of London is a world-leading multidisciplinary research group in the field of Music & Audio Technology. C4DM has around 50 members working
on signal processing of digital music, music informatics, machine listening, audio engineering and interactive performance. Research funding obtained since 2001 totals over £14m, mainly from the EPSRC and
EU. Current projects include Sustainable Software for Digital Music and Audio Research (EP/H043101/1,
2010-2014), the £2.5m Online Music Recognition and Searching II (OMRAS-2, EP/E017614/1, with Goldsmiths, University of London, 2007-10), an EPSRC Platform Grant (EP/E045235/1, 2007-12), the £5.9m
Doctoral Training Centre in Digital Music and Media for the Creative Economy (EP/G03723X/1, 2009-2017), and the Networked Environment for Music Analysis (NEMA), funded by the Andrew W. Mellon Foundation
(2008-10).
52. In the area of this proposal, music informatics and the semantic web, we have a strong research track
record, including publications on ontologies (Raimond et al., 2007), music recommendation using Linked
Data (Passant and Raimond, 2008), and music similarity on the semantic web (Jacobson et al., 2009b).
The DBTune.org project has been hosting music-related RDF data since 2007 and has been an integral
part of the Linking Open Data movement (Bizer et al., 2007). Created and hosted by C4DM, DBTune.org
was one of the first Linked Data projects to advocate the use of MusicBrainz identifiers in music-related URIs, and it currently hosts an RDF translation of the MusicBrainz data based on the old database
schema. The Music Ontology (Raimond et al., 2007) is widely used in the Linked Data community and
is generally accepted as the most comprehensive and flexible ontological model for the music domain.
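To illustrate the kind of mapping involved, here is a minimal Python sketch that renders a single MusicBrainz artist record as Turtle using Music Ontology and FOAF terms, minting the resource URI from the artist's MusicBrainz identifier. The base URI is a hypothetical placeholder (not the DBTune.org or MusicBrainz URI scheme), and a real translation would cover many more entity types and properties of the schema.

```python
import uuid

# Vocabulary namespaces: Music Ontology and FOAF.
PREFIXES = (
    "@prefix mo: <http://purl.org/ontology/mo/> .\n"
    "@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n"
)

# Hypothetical base URI, used only for this example.
BASE_URI = "http://example.org/musicbrainz/artist/"


def artist_to_turtle(mbid: str, name: str) -> str:
    """Render one MusicBrainz artist as Turtle using Music Ontology terms."""
    # MusicBrainz identifiers are UUIDs; uuid.UUID raises ValueError otherwise.
    guid = str(uuid.UUID(mbid))
    # Escape backslashes and double quotes for a Turtle string literal.
    literal = name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        PREFIXES
        + "\n"
        + f"<{BASE_URI}{guid}> a mo:MusicArtist ;\n"
        + f'    foaf:name "{literal}" ;\n'
        + f'    mo:musicbrainz_guid "{guid}" .\n'
    )
```

Because the URI is derived from the MusicBrainz identifier, any other dataset that knows the same identifier can link to the resource without a lookup table.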
53. Dr Simon Dixon (PI) is a Lecturer in Electronic Engineering at Queen Mary and leads C4DM’s Music
Informatics group. He has a PhD (Sydney) in Computer Science (in the area of knowledge representation
and reasoning) and an LMusA diploma in Classical Guitar. His research interests cover various aspects of
music informatics, including high-level music signal analysis and the representation of musical knowledge
(particularly rhythm and harmony). He is CI on OMRAS-2 and was CI on Interfaces to Music (Vienna
Science and Technology Fund, 2004-2007). He is author of the beat tracking software BeatRoot (ranked
first in the MIREX 2006 evaluation of beat tracking systems) and the audio alignment software MATCH
(Best Poster Award, ISMIR 2005), and co-author of the top-ranked Audio Chord Detection and Music
Structure Segmentation systems (MIREX 2009). He was Programme Co-Chair for ISMIR 2007, and co-presented the ISMIR 2006 tutorial on Computational Rhythm Description.
54. Prof Mark Sandler (CI) is Director of C4DM. He became Professor of Signal Processing at Queen
Mary in 2001, following 19 years at King’s College London, where he was also Professor of Signal Processing. He
is PI on OMRAS-2 and was PI on SIMAC. He was General Chair of DAFx’03 and General Co-Chair of ISMIR’05.
He is Chair of the Audio Engineering Society Technical Committee on Semantic Audio Analysis. He is a
Fellow of the IET and AES. He is Director of the Doctoral Training Centre in Digital Music and Media for
the Creative Economy.
55. Kurt Jacobson (RA) is currently a doctoral student at C4DM working on the OMRAS-2 project. As assistant administrator of DBTune.org, he has helped to create semantic web services for music, including one publishing structured data about music artists on Myspace and another publishing musicological data about classical music composers. He is working on modelling and exploring connections in music using structured
data from heterogeneous sources, including historical musicology, social networks and audio analysis. He
was co-presenter of the ISMIR 2009 tutorial on the semantic web and music information, titled “Share and
Share Alike, You Can Say Anything about Music in the Web of Data”.
56. The MetaBrainz Foundation (http://metabrainz.org/) is a non-profit organisation based in San Luis
Obispo, California, USA, that operates the MusicBrainz project, a user-maintained community music metadatabase. The MusicBrainz database currently contains metadata on over 9 million recordings from over
half a million artists. MetaBrainz is supported by donations from companies such as Sun and Google,
as well as by private donations, and it licenses its data commercially to partners such as the BBC and
ZEEZEE (zeezee.de). The BBC has also provided MetaBrainz with data from its Orpheus classical works
database (containing over 100,000 works) to aid the development of the Next Generation Schema (NGS).
References
Berners-Lee, T. (2006). Linked data. Available at http://www.w3.org/DesignIssues/LinkedData.html.
Bizer, C., Heath, T., Ayers, D., and Raimond, Y. (2007). Interlinking open data on the web. In Demonstrations
Track, 4th European Semantic Web Conference.
Cannam, C., Landone, C., Bello, J. P., and Sandler, M. (2006). The Sonic Visualiser: A visualisation platform for
semantic descriptors from musical signals. In 7th International Conference on Music Information Retrieval,
pages 324–327.
Cook, N. (2007). Performance analysis and Chopin’s mazurkas. Musicae Scientiae, 11(2):183–205.
Jacobson, K., Humfrey, N. J., Raimond, Y., Brickley, D., and Idehen, K. (2009a). What about music-related
URIs??? See mailing list archive at http://lists.w3.org/Archives/Public/public-lod/2009Sep/0029.html.
Jacobson, K., Raimond, Y., and Sandler, M. (2009b). An ecosystem for transparent music similarity in an open
world. In 10th International Conference on Music Information Retrieval, pages 33–38.
Kaye, R. (2008). Next generation schema: Where we are today. Available online at http://blog.musicbrainz.org/?p=351.
Kaye, R. (2010). NGS beta 2: May 24th 2010. Available online at http://blog.musicbrainz.org/?p=527.
Lee, J., Jones, M., and Downie, J. (2009). An analysis of ISMIR proceedings: Patterns of authorship, topic,
and citation. In 10th International Society for Music Information Retrieval Conference, pages 57–62.
Linked Data Community (2010). Linking open data mailing list. Available online at http://lists.w3.org/Archives/Public/public-lod/.
Passant, A. and Raimond, Y. (2008). Combining social music and semantic web for music-related recommender systems. In Semantic Web Workshop.
Raimond, Y. (2009). A Distributed Music Information System. PhD thesis, Queen Mary University of London,
Centre for Digital Music.
Raimond, Y., Abdallah, S., Sandler, M., and Giasson, F. (2007). The music ontology. In 8th International
Conference on Music Information Retrieval, pages 417–422.
FOI Withheld Information Form
1. We would like JISC to consider withholding the following sections or paragraphs from
disclosure, should the contents of this proposal be requested under the Freedom of
Information Act, or if we are successful in our bid for funding and our project proposal is
made available on JISC’s website.
NONE
2. We acknowledge that the FOI Withheld Information Form is of indicative value only and that
JISC may nevertheless be obliged to disclose this information in accordance with the
requirements of the Act. We acknowledge that the final decision on disclosure rests with
JISC.
JISC, Northavon House,
Coldharbour Lane,
Bristol, BS16 1QD.
Queen Mary, University of London
Mile End Road, London E1 4NS
Telephone: 020 7882 5209
Facsimile: 020 8980 6533
Website: www.dcs.qmul.ac.uk
School of Electronic Engineering
and Computer Science
Professor E.P. Robinson,
MA PhD, FBCS, CiTP
Head of School
Email: e.p.robinson@dcs.qmul.ac.uk
20/4/10
To whom it may concern,
I am writing in strong support of the proposal Linked Music
Metadata which is being submitted to the JISC Information
Environment Funding Call 2/10: Deposit of research outputs
and Exposing digital content for education and research, by Dr
Simon Dixon and Prof Mark Sandler of the Centre for Digital
Music in the School of Electronic Engineering and Computer
Science. The Centre for Digital Music is a world-leading
research group in the area of Music and Technology, with
particular strengths in Music Informatics, Machine Listening and
Interaction. For several years, the Music Informatics branch has
included a strong semantic web component, and they have
made important contributions to the Linking Open Data
movement, including their work on the Music Ontology and the
DBTune.org service. The current project will allow the
continuation and expansion of the DBTune project, and will
provide a metadata resource which will be of potential benefit to
all academics (and others) working with music-related
information. The project fits well with the research aims of the
School, and I strongly support this application.
Yours sincerely,
Edmund Robinson
Patron: Her Majesty the Queen
Incorporated by Royal Charter as
Queen Mary & Westfield College,
University of London