JISC Grant Funding 03/09
Cover Sheet for Proposals (All sections must be completed)
Rapid Innovation Programme

Name of JISC Initiative: JISC Rapid Innovation Grants
Name of Lead Institution: University of Bradford
Name of Proposed Project: Concept Linkage in Knowledge Repositories
Name(s) of Project Partner(s): National Media Museum, Bradford (part of the National Museums of Science and Industry Group)

Full Contact Details for Primary Contact:
Name: Professor Peter Cowling
Position: Associate Dean (Research), Professor of Computer Science, School of Computing, Informatics & Media
Email: p.i.cowling@bradford.ac.uk
Tel: 01274 234005
Fax:
Address: School of Computing, Informatics & Media, University of Bradford, Bradford BD7 1DP

Length of Project: 6 months
Project Start Date: 1/6/09
Project End Date: 30/11/09
Total Funding Requested from JISC: £32,000
Funding Broken Down over Academic Years (Aug-July): Aug08 – July09 £10,667; Aug09 – July10 £21,333
Total Institutional Contributions:

Outline Project Description
Knowledge repositories proliferate at an accelerating rate. While these offer excellent support for specific information searches, there is limited support for unstructured browsing or semi-structured information gathering, when a user does not know what there is to know (but wants to find information connecting known concepts). Students making the transition from School to University often feel swamped by information and need to develop skills in information literacy. There is strong evidence that Wikipedia is a very important source of information for University students (consider the JISC SEEL project), especially in year one. Tools for understanding the structure of information in these large repositories and for conducting semi-structured queries are needed by University students and by the general public.
This project will build a tool for semi-structured searching of knowledge repositories based on finding previously unknown concepts that lie between other concepts. Consider a user who wants to know about optimisation of crystal structures. A search for concepts which lie between, and hence connect, “optimisation” and “crystal structure” may turn up previously unknown concepts such as “genetic algorithms” or “space groups”, which would be very difficult to find via conventional approaches to search (which assume that the user has a good understanding of what terms to search for).

This project brings together the School of Computing, Informatics and Media and the Teaching Quality Enhancement Group at the University of Bradford, together with the National Media Museum (based in Bradford). Hence the project has a range of critical friends to increase the applicability, take-up and longevity of the developed tool. The National Media Museum is in the process of assembling a gallery about the history, evolution and social impact of the Internet, and the proposed project may yield a tool that could form part of that gallery, in which case a very large number of users will try the system via the National Media Museum’s web site and physical gallery space. Working with the National Media Museum provides a unique opportunity to add value to the project by raising public understanding and awareness of new ways to understand and use information repositories, including the largest repository of them all, the Internet.

Please note that the future Internet gallery is not yet in the public domain and therefore the National Media Museum would need to review and approve any press releases or information concerning the gallery, or the relationship of this project to the gallery.
List of priority areas, highlight each that applies:
Mashups of open data ***
Aggregating tags and feeds
Semantic web/linked data ***
Data search ***
Visualising Data ***
Personalisation
Mobile Technologies
Lightweight Shared Infrastructure Service
User Interface Design ***

I have looked at the example FOI form at Appendix A and included an FOI form in the attached bid (Tick Box): YES √ NO
I have read the Funding Grant and associated Terms and Conditions of Grant at Appendix B (Tick Box): YES √ NO

B. FOI Withheld Information Form
We would like JISC to consider withholding the following sections or paragraphs from disclosure, should the contents of this proposal be requested under the Freedom of Information Act, or if we are successful in our bid for funding and our project proposal is made available on JISC’s website. We acknowledge that the FOI Withheld Information Form is of indicative value only and that JISC may nevertheless be obliged to disclose this information in accordance with the requirements of the Act. We acknowledge that the final decision on disclosure rests with JISC.

Section / Paragraph No.: F
Relevant exemption from disclosure under FOI: Budget
Justification: Pricing is commercial in confidence

C. Appropriateness and Fit to Programme Objectives and Overall Value to the JISC Community
'Long before its competitors, Google's knack was to realise that people navigated their way through Cyburbia not by information per se, but by examining the relationship between bits of information'. James Harkin, Cyburbia, 2009.

1. Consider a first year undergraduate student writing a first essay. The information resources available are immense, and possibly intimidating. Many students will use Wikipedia as a first point of reference. However, there is a problem with using Wikipedia, the Internet, or another knowledge repository: the student will find it very difficult to find new concepts.
For example, suppose they wish to find a link between the second world war and the local area. A search for “second world war” or for the local area using a search engine such as Google is unlikely to yield useful information. The required information lies between these two concepts (in a space of concepts which are unknown, and hence unsearchable, for the student).

2. This project aims to address the first priority area in the JISC 03/09 Rapid Innovation Grants call, that of “Information seeking by learners, teachers and researchers”. We will build a software tool which allows the user to search for concepts which lie in an unknown (and hence unsearchable) space of concepts “between” other known concepts (the notion of “betweenness” is explained in more detail below). There is the potential here to revolutionise ideas of Internet and knowledge repository searching, but at a realistic level there is great potential to provide a powerful tool for learning and research, and to provide greater understanding of the structure of knowledge in the repository.

3. We consider the concepts in the knowledge repository as the nodes in a network, with the connections between concept “nodes” giving the degree to which two concepts are related. The “relatedness” value which we use can take into account the number of common words, the number of common links, and similar measures. Experimental analysis of appropriate “relatedness” values is an important goal of this project. Once we have constructed the network we can investigate it using a range of search and visualisation approaches. For example, we can use shortest path algorithms to find concepts which lie between two concept nodes, or we can explore the degree and distance by which concepts are related, to understand the structure of knowledge in an area. The second part of the project will investigate algorithms and visualisation tools for exploration of the knowledge repository network.
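The network idea described above can be illustrated with a minimal sketch. It is not the project's design: the link sets are invented toy data rather than Wikipedia content, the Jaccard overlap of link sets is only one assumed choice of “relatedness” value, and Dijkstra's algorithm is used as a simple stand-in for the shortest path search. The function names (relatedness, build_network, shortest_path) are hypothetical.

```python
import heapq

# Hypothetical toy data: each concept maps to the set of pages it links to.
# These link sets are invented for illustration, not taken from Wikipedia.
LINKS = {
    "optimisation": {"algorithm", "search", "mathematics", "fitness"},
    "genetic algorithms": {"algorithm", "search", "fitness", "energy", "lattice"},
    "crystal structure": {"lattice", "energy", "symmetry", "atom"},
    "space groups": {"symmetry", "lattice", "mathematics"},
    "photography": {"camera", "film"},
}


def relatedness(a, b):
    """Jaccard overlap of two link sets: one assumed relatedness value in [0, 1]."""
    union = LINKS[a] | LINKS[b]
    return len(LINKS[a] & LINKS[b]) / len(union) if union else 0.0


def build_network(threshold=0.0):
    """Connect concept pairs whose relatedness exceeds the threshold.

    Edge cost is 1 - relatedness, so closely related concepts are 'near'.
    """
    graph = {concept: {} for concept in LINKS}
    names = sorted(LINKS)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = relatedness(a, b)
            if r > threshold:
                graph[a][b] = graph[b][a] = 1.0 - r
    return graph


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm: list of concepts from start to goal, or None."""
    queue = [(0.0, start, [start])]
    done = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if node in done:
            continue
        done.add(node)
        for neighbour, weight in graph[node].items():
            if neighbour not in done:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return None


network = build_network()
path = shortest_path(network, "optimisation", "crystal structure")
print(path[1:-1])  # the concepts lying "between" the two known concepts
```

In this toy network “optimisation” and “crystal structure” share no links, so they are not directly connected, and the cheapest route between them passes through an intermediate concept. Changing the relatedness function or the threshold changes the network, which is the kind of user-selected measure the proposal intends to investigate experimentally.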
4. The project is linked to the following priority areas: “Mashups of open data”, “Semantic web/linked data”, “Data search”, “Visualising Data”, and “User Interface Design”, since it will provide an easy-to-use interface for software tools for data search using the publicly available Wikipedia data repository (although the techniques we will use could be applied to another repository, such as a publication database or a subset of the Internet, in a follow-on project). In particular, the project will provide a novel way of visualising and exploring information by considering the information as a network of connected concepts.

5. The project brings together the research, software and project management skills of the School of Computing, Informatics and Media at the University of Bradford, the expertise in new methods for teaching and learning of the Teaching Quality Enhancement Group at the University, and the public engagement focus of the National Media Museum. Brief CVs are given towards the end of this document. We have demonstrated the technical (software, modelling, algorithm development, user interface development) skills for this project over many related projects, which have attracted several million pounds of funding from research council, EU and industry sources. Engagement with user groups which represent both University students and a broader interested public should provide a software tool which is effective in a broad range of teaching, learning, research and public understanding applications, and likely to provide long-term benefits to a wide range of users.

6. Sustainability of the project and longevity of the software will be assured by involving two large user groups (representing University of Bradford students and a broader scientifically curious public), and by releasing all source code using an open source arrangement to allow broader dissemination to other Universities and groups.

D.
Quality of Proposal and Robustness of Workplan

Outline of Project
7. This project will aim to provide software tools for visualisation and analysis of large knowledge repositories by analysing concept linkage. We will use information from the online encyclopaedia Wikipedia as it is easily accessible (and can be downloaded in its entirety onto a PC for development/analysis), and contains a huge amount of connected data.
8. Visualising large networks is a difficult task. This project will produce tools which allow visualisation of dynamically changing networks. The networks will change during a user search, and due to changes in the user-selected measures of connectedness between concepts.
9. We will conduct a short study of existing algorithms for comparing two online documents, using textual and link analysis.
10. Finding a “path” between two knowledge items will identify new and previously unknown knowledge to the user. Shortest path algorithms such as the A* algorithm can allow the user to tailor their search. We will implement a “user-directed” version of the A* shortest path algorithm.
11. The project will develop tools for visualisation of the search process. A graph of intermediate nodes which connect the start concept to the end concept will be presented in such a way that the user can browse at an appropriate level of detail, from simply discerning intermediate network structure, through to inspection of highly relevant connecting concepts, to inspection of less immediately relevant concepts. In all cases, it should be possible to inspect a node to determine the detailed contents, and then easily return to the network search.

Deliverables
12. This project will deliver the following:
a. Algorithms for quantifying similarity between knowledge items in Wikipedia, and a user interface to allow the user to easily specify their similarity measurement preferences. These algorithms may later be generalised across other knowledge repositories and the Internet.
b. Software for visualisation of concept linkage. This tool will use the above algorithms to provide a visual representation of how two knowledge items are linked, and to visualise concepts lying close to a “path” of knowledge items linking them.
c. Evaluation of the tools. Members of the Teaching Quality Enhancement Group and the National Media Museum will evaluate the software tools in the context of student learning and research and in the context of public engagement, respectively.

Workplan
13. Preparatory work will take place and will include the following: (1 month)
a. Project website setup.
b. Investigation of requirements in brainstorming/planning meetings of the whole project group, and finalisation of a detailed project plan.
c. Identification and assessment of project risks.
14. Analyse/design/implement knowledge item similarity metrics: (1 month)
a. Analyse data in the Wikipedia database to understand the format which our algorithms must use.
b. Concept similarity measures will be investigated and algorithms/software created to empirically test these methods.
15. Analysis/design of search algorithms for concept linkage. (1 month)
a. Investigate/implement search algorithms using previously implemented similarity measures.
b. Empirically test efficiency at finding “interesting” paths between concepts.
16. Visualisation/user interface software development (2 months)
a. Design/implement software for fast retrieval of partial/whole concept data from Wikipedia at three scales (title only, title/brief description, full).
b. Create resizable visualisations of subnetworks of the overall concept network.
c. Design/implement software for visualising the search methods at different levels of detail as a tool for exploring concept linkage.
17. Evaluation and testing (1 month)
Throughout the project, there will be continuing evaluation of designs and software prototypes. At the end of the project there will be a more detailed user evaluation:
a. Testing/evaluation of the developed software as an educational tool, by undergraduate/masters/research students from the University of Bradford.
b. Testing/evaluation of the developed software as a tool for public understanding of knowledge repositories and the Internet, by National Media Museum staff.
18. Documentation/dissemination:
a. Minutes of Meetings will be recorded and project progress blogged. The final project will be well documented with help files for easy dissemination.

Project Management Arrangements
19. The Project Roles are:
Stephen Remde – Software analysis/design/development. Day-to-day task management.
Peter Cowling – Overall project management and high level design. Steering/Quality assurance for researcher stakeholder group.
Peter Hartley/Will Stewart – Steering/quality assurance for University teaching and learning stakeholder group.
Joe Stocks-Brook/Tom Woolley – Steering/quality assurance for public engagement/National Media Museum stakeholder group.
20. The Project Steering Committee will consist of all the above people. The group will have two half day meetings at the start of the project (to brainstorm and develop stakeholder requirements and high level design ideas). The Steering group will also meet half way through the task given above in point 16 (the earliest point when there is a fully functioning piece of software) to assess progress, to identify needed modifications of the part-finished design and to consider possible follow-up projects. A final meeting will occur at the end of the project to finalise dissemination routes that will be used to publish the software.
21. If no steering group meeting is planned, a shorter virtual meeting of the steering group (using Skype) will take place to keep all project members up to date with project progress.
22. The project will be managed by Peter Cowling and the work undertaken by Stephen Remde.
Documented weekly project meetings between these two will take place to monitor progress and set short-term goals, in addition to daily ad hoc meetings.

E. Engagement with the Community
23. Engagement with the Teaching Quality Enhancement Group at the University of Bradford means that the student learning stakeholder group is well represented in this proposal. Through the Teaching Quality Enhancement Group and directly with students from the School of Computing, Informatics and Media, we will ensure that our software addresses the needs of taught course students. In particular, this may point to the possibility of a follow-on project which uses other data repositories for student learning.
24. Engagement with the National Media Museum, at a time when they are at the early stages of assembling a gallery about the history, evolution and social impact of the Internet, provides a golden opportunity to engage members of the public. Our software should enable members of the public to access and understand data repositories and the Internet in new ways. In particular, this may point to the possibility of a follow-on project which considers a larger part of the Internet.
25. Peter Cowling and Stephen Remde are both active researchers. In addition to providing a dissemination route for project deliverables via the scientific literature, they will work with other researchers within the School of Computing, Informatics and Media to evaluate the applicability of the developed tools to the research domain. In particular, this may point towards the possibility of a follow-on project using research data and research publication repositories (such as the Bradford University Repository Project funded by JISC).
26. Software will be released using an open source licence agreement to be used as a teaching aid or research tool for a wide range of knowledge repository users.
27. Progress will be blogged and a project website will be maintained throughout the project.
28. We will work with other related JISC project groups, such as the Dynamic Learning Maps project at Newcastle University, for which Peter Hartley is an external critical friend.

F. Budget

Directly Incurred Staff                                           Aug 08–Jul 09  Aug 09–Jul 10  TOTAL £
Software Developer (Stephen Remde), grade 7, 6 months full time   £5,633         £11,267        £16,900
Technical support – contribution 0.1 FTE                          £508           £1,016         £1,524
Total Directly Incurred Staff (A)                                 £6,141         £12,283        £18,424

Non-Staff                                                         Aug 08–Jul 09  Aug 09–Jul 10  TOTAL £
PC + development software                                         £1,500                        £1,500
Total Directly Incurred Non-Staff (B)                             £1,500                        £1,500

Directly Incurred Total (C) (A+B=C)                               £7,641         £12,283        £19,924

Directly Allocated                                                Aug 08–Jul 09  Aug 09–Jul 10  TOTAL £
Project Manager/Systems Architect 0.2 FTE (Peter Cowling)         £2,753         £5,505         £8,258
Co-investigator 0.05 FTE (Peter Hartley)                          £701           £1,403         £2,104
Int. critical friend 0.05 FTE (Will Stewart)                      £456           £912           £1,368
Estates                                                           £2,600         £5,200         £7,800
Media Museum staff (5 days)                                       £248           £496           £744
Directly Allocated Total (D)                                      £6,758         £13,516        £20,274

Indirect Costs (E)                                                £8,360         £16,719        £25,079

Total Project Cost (C+D+E)                                        £22,759        £42,518        £65,277
Amount Requested from JISC                                        £10,667        £21,333        £32,000
Institutional Contributions                                       £12,092        £21,185        £33,277

Percentage Contributions over the life of the project: JISC 49%, Partners 51%, Total 100%

No. FTEs used to calculate indirect and estates charges, and staff included:
No. FTEs: 0.65
Which Staff:
Stephen Remde (1.0 FTE x 6 months)
Peter Cowling (0.2 FTE x 6 months)
Peter Hartley (0.05 FTE x 6 months)
Will Stewart (0.05 FTE x 6 months)

G. Previous Experience of the Project Team
29. Professor Peter Cowling (Principal Investigator) is Associate Dean (Research) and Professor of Computer Science in the School of Computing, Informatics and Media at the University of Bradford.
He is very active in computer science research and knowledge transfer, having published over 60 articles in high quality scientific journals and conferences, and edited 2 books, as well as securing substantial research funding. His research concerns the application of Artificial Intelligence to the modelling and search themes of this proposal. He has a passion for teaching, and has secured funding from Microsoft in recent years to develop three environments for teaching computer science and programming skills. He has managed several medium-sized (1-10 person-year) software projects, and was a software project manager for AI systems BV, Belgium, prior to joining academia.

30. Stephen Remde (Software Designer/Developer) will submit his PhD thesis at the end of April 2009. His PhD was undertaken in collaboration with an active industrial partner (Trimble MRM Ltd.), which has provided him with excellent software development, project management and team working skills. The topic of his PhD was systems for mobile workforce scheduling, requiring substantial expertise in the search and other AI techniques in this proposal, as well as user interface design methods. He has significant experience working as a professional software engineer and web application designer (for Intrica Ltd.).

31. Prof Peter Hartley (co-investigator). Since moving to Bradford in 2003, he has been proactive in developing the university’s policies and practices to enhance student learning, extending support for e-learning and e-assessment, and establishing new project and e-learning initiatives. His national involvement includes the University’s two successful partnership CETLs (LearnHigher and ALPS), Project Director/sponsor for 4 previous JISC projects (ELP1, ELP2, IT4SEA and ASEL), and work for three JISC Advisory Boards. He led the University’s Pathfinder project following Benchmarking and is External Evaluator for one CETL focussed on e-learning (SOLSTICE).
He is currently one of the Critical Friends on the JISC Curriculum Delivery Programme. His work as National Teaching Fellow (NTF) has included multimedia software to support communication via virtual learning (The Interviewer, Gower 2004). Publications include research on assessment feedback and applications of ICT.

32. Will Stewart (internal critical friend) began working in 2002 with the NLN Materials Team at Becta and was involved in promoting the integration of e-learning into the curriculum and encouraging the use of the NLN Materials in teaching and learning in the FE and ACL sectors. In his previous role as e-learning advisor with the JISC Regional Support Centre for Yorkshire and Humber, he actively supported staff development in this field and worked closely with other national services and initiatives, such as the HEA, LSDA Q Projects, DfES Standards Unit and other JISC services such as JISC Infonet and the Plagiarism Advisory Service. In his present role as University e-Learning Advisor, his main responsibility is to support teaching staff in the use of technology to enhance learning and teaching.

33. Joe Stocks-Brook (external critical friend) is Gallery Development Manager at the National Media Museum. He has extensive experience in driving and creating new visitor experiences at the National Media Museum and is instrumental in implementing new ways of using technology to deliver the museum’s public programme. While leading the Gallery Development department, Joe has successfully managed the technical delivery of all the museum’s current permanent and temporary exhibitions and is leading the development of the interactive, technological and 3D elements of the proposed Internet Gallery.

34. Tom Woolley (external critical friend) is Curator of New Media at the National Media Museum. He has five years’ experience as a web designer, with in-depth knowledge of HTML web page structure, CSS layout and Adobe Flash.
He also has over two years’ experience as a Museum curator, developing gallery content and working with designers to interpret information accordingly.

Pro Vice-Chancellor
Professor of Electronic Imaging and Media Communications
Rae Earnshaw PhD FBCS FInstP FRSA CEng CITP

JISC
Northavon House
Coldharbour Lane
Bristol BS16 1QD

20 April 2009

Dear Sir or Madam,

JISC Call – Rapid Innovation Programme 03/09

The University wishes to confirm its full support for this bid on Concept Linkage in Knowledge Repositories. This bid seeks to address the challenge of unstructured information gathering and the development of tools and techniques to surf knowledge repositories based on finding concepts that lie between other concepts. The project will provide new visualization and analysis of large networks of related text. Initially this will be from the online encyclopaedia Wikipedia. The software will also be used in the National Media Museum as a way of visualizing information and determining how it can be used within the context of the Internet to intelligently find related articles. As the University is currently seeking to develop its learner environment, particularly for active and collaborative learning, this proposal is especially important and relevant.

Background
The University has a substantial E-Strategy programme (2004-2012) in support of the institution’s Corporate Plan which is designed to give students greater flexibility of learning and working via online information and learning environments. The University has adopted a blended learning strategy whereby online information is used to support, facilitate, and enhance the student learning process which is initiated in the lecture theatre, classroom, or laboratory.

Perceived Advantages of Emergent Technologies
A recent development in e-administration in the Student Support Services has enabled us to integrate a number of service units that previously operated separately. Technology has enabled this integration to be accomplished, and it has also facilitated the change of working practices so that the services are now more accessible and more student-focussed. This is currently being extended into a virtual Student Support Service so that the benefits are available to students 24 hours a day, 7 days a week, from wherever they are located. We therefore see greater utilisation of Web 2.0 technologies and mobile devices as the next logical step in terms of service delivery and support. This project will also be able to build on insights gained from previous and ongoing projects which have looked at the implications and applications of Web 2.0 technologies for the student experience, for example the University Pathfinder project funded by the Higher Education Academy (HEA).

Yours sincerely
Professor Rae Earnshaw
R.A.Earnshaw@bradford.ac.uk