Digital information discovery systems for universities

Prepared to support the presentation of an invited lecture at the International Conference on Digital Libraries = ICDL, in Delhi, India, 27-29 November 2013
by Paul.Nieuwenhuysen@vub.ac.be
2B114, Vrije Universiteit Brussel, B-1050 Brussel, Belgium

Text published in Proceedings ICDL2013.
These slides should be available from the WWW site http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/ (note: BIBLIO and not biblio)

contents = summary = structure = overview of this presentation
• Introduction
• Research problem
• Findings
1. Federated search
2. Merging information
3. Commercial discovery systems versus Google Scholar
4. Empirical case studies
• Concluding remarks

INTRODUCTION: Information discovery & access
1. Information discovery
2. Information delivery / access
Information discovery system
Information delivery / access system

INTRODUCTION: Information discovery is important
• The quantity of information in digital form is growing fast.
• Information sources are scattered.
• Even metadata are scattered.
NO single simple way to discover suitable sources in a dynamic digital network environment.

INTRODUCTION: Information discovery process
Numerous indexing and abstracting services
Primary, scholarly information sources

INTRODUCTION: Scattering of sources
• Integration / aggregation is still far from perfect.
☹

INTRODUCTION: Scattering of sources: difficulties
Using several information retrieval systems costs time:
»to learn about the contents and purposes of each database,
»to choose one or several suitable databases,
»to learn about the various user interfaces and efficient ways to query each chosen database, which is confusing,
»to formulate a suitable query adapted to the target database,
»to inspect the results…
»to repeat all the actions above for each further selected database,
»to merge and deduplicate the interesting results & to save these in some way (which is hindered by variations among the visible output formats on the computer display and by variations among the field structures of records from various databases). ☹

Besides this user’s viewpoint, the viewpoint of librarians is that they spend a considerable part of their budget on databases, while these may not be well appreciated and exploited effectively by their clients. ☹

PROBLEM STATEMENT
1. Which methods are available these days to make it easier and more efficient to find relevant information sources?
2. What are the pros and cons of these methods?

In particular we focus on methods suitable for exploratory discovery of scholarly information sources in an academic environment by users who do not have much experience with selecting and searching specific databases, such as undergraduate students & others without expertise in the domain where their new information need occurs.

FINDINGS

Scattering of sources: difficulties, solutions?!
Solutions?!
• Federated searching
• Merging of databases

• Federated search

Federated searching through scattered databases
[Diagram: users; federated search system; three search engines, each with its own database.]

Federated searching: terminology / vocabulary / synonyms
federated searching = meta-searching = metasearching = cross-database searching = multi-database searching = multi-threaded searching = one-stop searching = poly-searching = polysearching = broadcast searching = searching through a portal / gateway

Information discovery process with federated search
[Diagram: some federated search system; numerous indexing and abstracting databases; primary, scholarly information sources.]

• Merging of information into 1 database

Merging information into a searchable database
[Diagram: users; search engine; aggregated database, fed by several databases or web sites or…]

Example
Merging: applications: union catalogues of libraries

Information discovery process with an integrating system based on merging into 1 database
[Diagram: some integrating information discovery system; numerous indexing and abstracting databases; primary, scholarly information sources.]

• Comparison of merging databases with federated searching

Comparison of methods for information retrieval
• The more general evolution of information and communication technology has partially determined the evolution of the information retrieval systems discussed here:
»Federated searching has been pushed forward since the Internet made implementations possible with acceptable speed.
»Merging information sources has more recently seen more implementations due to increasing capacities of computers and hard disks at decreasing prices.

Comparison of methods for information retrieval
Criteria:
»Size of the coverage
»Independent of the Internet / WWW
»Up-to-date information
»Pre-search analysis of all data (for better relevance ranking, to eliminate duplicates, to merge related database records into 1 record, etc…)
»Speed of retrieval and display
Scores (Merging databases, Federated searching): -+ +- +- -+ + + - +-+

Comparison of methods for information retrieval: conclusions
• A single, simple, standard method = approach = solution does NOT (yet) exist.
• Two basic methods are common.
• They have their own
»advantages and
»disadvantages.

• Commercially available information discovery systems / services

Federated search versus merging in digital libraries
• In digital library searches:
»Up to date information is not crucial in most cases, so that federated search is not required.
»The method of a priori merging sources can perform better than federated search.
• Therefore a few big players in the library information industry have built services based on this method, even though considerable investments are needed in terms of computer systems, manpower, internet connectivity etc.

Commercial information discovery services
• Several companies offer discovery services that are based mainly on collocating existing bibliographic databases into bigger merged databases to obtain a fast and panoramic discovery system that is hosted somewhere on the WWW = ‘in the cloud’. Such a discovery system can include > 1 BILLION items!

Commercial information discovery services as OPAC
• The contents of the catalog of the library that implements such a system can also be imported in the database of the system. ☺

Commercial information discovery services
• Terms used for such systems are
»Information discovery systems
»Resource discovery systems
»Web-scale discovery systems
• Their strength in usability and searching makes them also usable as “next generation library catalogs”.

Commercial information discovery services
A few producers and systems / services:
»EBSCO Publishing offers EBSCO Discovery Service = EDS
»Ex Libris offers Primo
»Innovative Interfaces
»OCLC
»(ex-Serials Solutions) Proquest offers Summon

Information discovery process with an “information discovery system” based on merging into 1 database
[Diagram: some integrating information discovery system; numerous indexing and abstracting databases; primary, scholarly information sources.]

The online catalog: evolution
[Figure: two axes, COVERAGE (from less ☹ to more ☺) and FUNCTIONALITY (from less to more).]

Commercial information discovery services
Comments by librarians range from enthusiastic to skeptical.
☺ ☹

Commercial information discovery services
Decentralized:        Centralized / integrated:
Native database 1     Content of database 1
Native database 2     Content of database 2
Native database 3     Content of database 3
Native database 4     Content of database 4
Etc…                  Etc…
OPAC                  OPAC

Commercial information discovery services
Decentralized:                      Centralized / integrated:
Content centered                    User centered
Coupled content & user interface    Decoupled content & user interface
Many different user interfaces      One uniform user interface

Commercial information discovery services
[Diagram, built up in steps: user(s) with an OPAC and its catalog database; then a classical, integrated library management system (OPAC, catalog database, lending management); then 1 or several union catalogs; then 1 or several federated search systems; finally an information discovery system.]

Information discovery services: limitations / drawbacks
• These discovery systems offer a huge amount of metadata, but they can and do NOT cover
»ALL information published,
»ALL information available directly in full text,
»ALL publications that have been licensed by the local library for a fee. ☹

• Commercially available information discovery systems compared with free discovery services

Commercial discovery services versus free discovery services
• Besides the various discovery systems mentioned above, which can be implemented by a digital library service, many great discovery systems have become available relatively recently, which offer
»a high coverage,
»a user friendly interface,
»all this free of charge.
Commercial discovery services versus free discovery services
• The availability of more high-quality free discovery services leads to
»the declining value of subscription-based abstracting and indexing services
»doubts among librarians about the cost-effectiveness of the commercial information discovery services
• Example: Of course the popular general WWW search systems, led by Google for a few years now.
• Example: More specialized but similar systems devoted to scholarly information, such as Google Scholar. This is a relatively ‘new kid on the block’. The system provides good coverage and is increasingly used by students and researchers as a discovery system. ‘It appears that Google Scholar has supplanted the traditional library bibliographic database as a means of subject searching for journal full-text.’
• Producers and vendors of commercially available discovery systems talk and write about their competition as if this consists of the few other similar commercially available products.
• This is misleading.
• Who is a really significant competitor?

[Figure: Google Scholar screenshot]

Google Scholar coverage and quality
Google Scholar is steadily improving in coverage and quality.
Chen, Xiaotian (2010) Google Scholar's Dramatic Coverage Improvement Five Years after Debut. Serials Review, vol. 36, no. 4, pp. 221-226.
“Google Scholar’s coverage is also comprehensive”
Harzing, A.W. (2013) A preliminary test of Google Scholar as a source for citation data: A longitudinal study of Nobel Prize winners. Scientometrics, vol. 93, no. 3, pp. 1057-1075.
“Our data suggest that Google Scholar coverage is now increasing at a stable rate”
Harzing, A.W.
(2013) A longitudinal study of Google Scholar coverage between 2012 and 2013. http://www.harzing.com/download/gs_coverage.pdf

Information discovery services versus Google Scholar
• It is hard for companies in the information industry to compete with the leading big company Google that produces Google Scholar & that offers this free of charge on the public internet.

Information discovery services versus Google Scholar
[Diagram: some federated search system or some integrating information discovery system, over numerous indexing and abstracting databases, OR? the Google Scholar search & discovery system/service; primary, scholarly information sources.]

Information discovery services versus Google Scholar
Coverage supports exploratory search
Google Scholar: +
Commercially available discovery systems: +

Information discovery services versus Google Scholar
Search results offer the user a link to local library holdings and access rights (if the local library integrates its knowledge base with the discovery system in a link generator). If the desired document is not directly available, then the user can directly request the document from the local library document delivery service (if the local library integrates this service with the discovery system in a link generator).
Google Scholar: +
Commercially available discovery systems: +

Information discovery services versus Google Scholar
Can be used / implemented by a library free of charge
Google Scholar: +
Commercially available discovery systems: - (a few 1000 $ per implementation in/by a library, besides costs of access to databases)
Coverage includes not only classical publications, but also other files on the WWW, such as web pages and presentation files
Google Scholar: +
Commercially available discovery systems: -

Information discovery services versus Google Scholar
Provides links from a bibliographic description NOT only to the publication on the site of the publisher (which is perhaps NOT accessible), but also to open access copies on the website of the author at a university
Google Scholar: +
Commercially available discovery systems: -

Information discovery services versus Google Scholar
Ranking of results exploits citations received by the retrieved document: more influential documents rank higher
Google Scholar: +
Commercially available discovery systems: -
Each document is accompanied by the number of citations received from other documents & by links to those citing documents
Google Scholar: +
Commercially available discovery systems: -

Information discovery services versus Google Scholar
Offers search for documents on the WWW, with a similar user interface
Google Scholar: + (Google = classic WWW search)
Commercially available discovery systems: -

Information discovery services versus Google Scholar
Offers search for images on the WWW, with a similar user interface
Google Scholar: + (images.google)
Commercially available discovery systems: -

Information discovery services versus Google Scholar
If the service / system is chosen by a library, then branding by the library is possible. (But this is probably more important for libraries as organizations than for the users they are serving, who do not care about who provides a good service.)
Google Scholar: -
Commercially available discovery systems: +

Information discovery services versus Google Scholar
The local library can export local catalogue / holdings, for import in the database of the discovery system, mainly to add bibliographic descriptions of unique, local items that are not yet included from other sources
Google Scholar: -
Commercially available discovery systems: +

Overlaps of bibliographical databases
[Diagram: overlaps between a commercial information discovery service, the catalogue of the library, and the Google Scholar discovery service.]

Information discovery services versus Google Scholar: limiting to the library
• An additional aspect in the comparison: Can the system / service limit searches to documents available in the library?
• Commercially available information discovery systems are built to do this, while Google Scholar does NOT work in this way.
• This aspect is exploited by producers of discovery systems to convince librarians that their system is superior & that Google Scholar should not even be considered as an alternative.

• Commercially available information discovery systems versus Google Scholar in case studies

Information discovery services versus Google Scholar: case studies
• Empirical case studies have been carried out
»to assess the validity and reality of the general comparison
»to compare the precision in the results of searches
• The precision of the first 10 results is determined by several factors such as
1. coverage,
2. enrichment of metadata,
3. inclusion or not of the full text of the document in indexing,
4. taking into account or not the citations and links received from other document files,
5. indexing algorithms,
6. relevance ranking algorithms, etc…

Information discovery services versus Google Scholar: test method
• Identical searches have been carried out in different information discovery systems.
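The measure compared in these case studies, the precision of the first 10 results, can be illustrated with a minimal Python sketch. This is not the procedure used in the study: the function name precision_at_k, the treatment of missing results as not relevant, and the sample relevance judgements are all hypothetical and serve only to show the arithmetic.

```python
from typing import Sequence


def precision_at_k(relevant_flags: Sequence[bool], k: int = 10) -> float:
    """Fraction of the first k results that the evaluator judged relevant.

    relevant_flags[i] is True when result number i+1 was judged relevant.
    Assumption of this sketch: if fewer than k results were returned,
    the missing positions count as not relevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top = list(relevant_flags[:k])
    return sum(top) / k


# Hypothetical judgements for one query in two systems (NOT data from the study).
system_a = [False] * 10
system_b = [True, True, False, True] + [False] * 6

print(precision_at_k(system_a))  # 0.0
print(precision_at_k(system_b))  # 0.3
```

Because the same evaluator judges the same query in each system, the two values can be compared directly, which is the kind of comparison the test cases report.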
• As examples of information discovery services, we used the implementation of Summon
»at Chalmers University in Sweden &
»at ULB in Belgium.
• These services can be used by anyone free of charge from anywhere (at least the discovery / search phase; access / delivery of the full document in the case of licensed materials is restricted).
• The search options are set
»to include material that is NOT directly made accessible by the library, to expand coverage (this is NOT a default setting),
»to rank results according to relevance as estimated by the system.
• The topics for searches are well known to the user who performs the tests.
• The queries are simple, using only 1 or 2 words and no operators, to simulate queries of non-expert users for whom these systems have been developed in the first place.
• The relevance of results and links to further information have been evaluated.

Information discovery services versus Google Scholar: test case 1
• A test of finding information on a particular, concrete subject / topic: the wooden pillars / poles / posts of the meeting house for communal decision making, with a low ceiling, which is present in each village of the Dogon people in Mali, West Africa; these are often decorated with a protective spirit in the form of a female (or male) figure; such a house is named toguna or togu na (with a space).
• Search query to start with is: toguna

Information discovery services versus Google Scholar: test case 1, results IDS
• The Summon information discovery services gave NO relevant results in the top 10 results in 2 tests that were performed with an interval of a few weeks.
☹
• Using as search term togu na:
»At Chalmers University again NO relevant result, even though a book has been published with this title; restricting results to books gives again NO relevant result. ☹
»At ULB result 1 refers to the book published with this title and to the copy available in the ULB library.
• So when the book is not in the library collection, the user does not discover it, even though this is the most important publication on the subject. ☹

Information discovery services versus Google Scholar: test case 1, results GS1
• Google Scholar, in a first test, extended the search automatically to include togu na with a space. ☺
• Results 1, 2, 4 point immediately to the important publication = the printed book with title Togu na published in 1977. ☺
• These results also link to other documents that include a citation to this book! ☺
• Each result also offers links to related documents and some of these are also relevant. ☺

Information discovery services versus Google Scholar: test case 1, results GS2
• In a test that was performed a few weeks later, Google Scholar does NOT extend the search automatically to include togu na with a space; the first 10 results are NOT relevant.
• When the query is changed to togu na (with the space), then results 2, 3, 4 point immediately to the important publication = the printed book with title Togu na.
• These results also link to other documents that include a citation to this book! ☺
• Each result also offers links to related documents and some of these are also relevant. ☺
• Using classical Google web search with both queries does not directly reveal the book.
• Using Google Books with the term toguna, the system does NOT reveal the book, but when the query togu na is used, then result 1 gives the book. ☺
• A few book titles that have been published were searched in Google Scholar; they were NOT found, in July 2013. All these findings indicate that Google Scholar has changed search functionality as well as inclusion of scholarly books.

Information discovery services versus Google Scholar: test case 2
1. A test of finding more information about the only journal article that has been published in a well-known scholarly journal by two specific authors known by the user.
2. Search query is: mettrop nieuwenhuysen

Information discovery services versus Google Scholar: test case 2, results IDS
The Summon implementations give directly the bibliographical description of the article, as result 1, as expected. ☺

Information discovery services versus Google Scholar: test case 2, results GS
1. Google Scholar also gives directly the bibliographical description of the journal article, as result 1, as expected! ☺
2. Furthermore, the service informs us directly that this article has received 84 citations, as found by Google ☺ & it provides links to those citing documents! ☺
3. Results 2, 3, 5, 6, 8 give descriptions of conference presentations and preliminary publications that are related to the main published article and that may also be relevant! ☺

Information discovery services versus Google Scholar: test case 3
1. A test of finding information about the famous type of mask created by the Songye people in DRC, Africa, which is named kifwebe.
2. Search query is: kifwebe

Information discovery services versus Google Scholar: test case 3, results IDS
• The Summon implementations give links to at most 5 relevant document descriptions in the first 10 results; most results link to an image and not to a document.
☹
• When the search is refined by choosing Limit to articles from scholarly publications, then about 5 of the first 10 results are relevant.

Information discovery services versus Google Scholar: test case 3, results GS
• Google Scholar always gives mainly links to scholarly sources by default & here the first 10 results are all relevant, most of them written by authors respected in the field of study; citations to printed books are included. ☺
• Furthermore, as always, the service gives us the number of citations received by each document, as found by Google ☺ & it provides links to those citing documents! ☺

Information discovery services versus Google Scholar: results
The outcome of these case studies can be formulated briefly and roughly by the scores:
Information discovery service 0-3

Information discovery services versus Google Scholar: discussion
• Evaluating and comparing the various systems / services is hindered
»by lack of information about their coverage, their indexing methods, and their search & ranking algorithms
»by the changes over time in their functions and performance

CONCLUDING REMARKS
• Increasing the chance that users discover the most suitable relevant information is an important task of libraries and information services.
• More methods / techniques / services / systems are available than ever before to assist users with information discovery.
• The systems are evolving fast.
• Implementing a commercial information discovery system brings an additional COST to an information service.
• The aim of this work is to help librarians to make decisions on the way forward in information discovery.

Questions are welcome