V-Index: an Index based on Consistent Researcher Productivity

Submitted By:
S. M Saleem Yasir
364-FBAS/MSCS/F07

Supervised By:
Dr. Ali Daud
Assistant Professor
Department of Computer Science and Software Engineering
International Islamic University, Islamabad

Department of Computer Science and Software Engineering
Faculty of Basic and Applied Sciences
International Islamic University, Islamabad

A dissertation submitted in partial fulfillment of requirements for the degree of MS in Computer Science at the Faculty of Basic and Applied Sciences, International Islamic University Islamabad, Pakistan

May 2012

In the name of Allah, Most Merciful and Compassionate, the most gracious and beneficent, whose help and guidance we always solicit at every step and moment.

Dedicated to my Parents, Teachers and Muslim Ummah.

Department of Computer Science and Software Engineering, International Islamic University Islamabad, Pakistan
Date: 30/08/2012

Final Approval

This is to certify that we have read and evaluated the thesis entitled "V-Index: an Index based on Consistent Researcher Productivity" submitted by S. M Saleem Yasir under Reg No. 364-FBAS/MSCS/F07 and that in our opinion it is fully adequate in scope and quality as a thesis for the degree of Master of Science in Computer Science.

Committee

External Examiner:
Dr. Zia Ul Qayyum, Professor, Department of Computing and Technology, IQRA University, H-9 Islamabad

Internal Examiner:
Dr. Ayyaz Hussain, Assistant Professor, Department of CS & SE, IIU Islamabad

Supervisor: Dr.
Ali Daud, Assistant Professor, Department of CS&SE, IIUI

Declaration

I hereby certify that the work presented in this thesis is, to the best of my knowledge and belief, original, except as acknowledged in the text, and that the material has not been submitted, either in whole or in part, for a degree at this or any other university. I acknowledge that I have read and understood the University's rules, requirements, procedures and policy relating to my higher degree research award and to my thesis. I certify that I have complied with the rules, requirements, procedures and policy of the University (as they may be from time to time).

Name: _______________________________________
Signature: ____________________________________
Date: ________________________________________

ACKNOWLEDGMENT

In the name of Allah, Most Gracious, Most Merciful

Thanks to Almighty ALLAH for giving me the courage and patience to carry out this work. I am very thankful to International Islamic University for providing such a good research environment.

I wish to thank my supervisor Dr. Ali Daud for his continuous advice, support and encouragement throughout this work. He has instilled in me a confidence with which I now feel that I can do research on any new topic by following his research guidelines.

I am grateful to the Department of Computer Science and Software Engineering, IIU Islamabad, and its faculty members for providing a healthy environment for research.

I would be failing in my duties if I did not thank my fellow graduate students, especially Mr. Khalid Mahmood, Mr. Asad Mehmood Khan, Mr. Naveed Ahmad, Mr. Zafar Mahmood and Mr. Muhammad Abid, for their continuous motivational support. I am looking forward to continued collaboration with them in the future.

I would also like to thank my dear friends Mr. Muhammad Mehran Ajmal, Mr. Ibrar Munsif, Mr. Waqar Ahmad, Mr. Saqib Hanif, Mr. Haris, Mr. Haider Ali Farooq and Mr. Waqas Mirza, who have been a continuous motivation behind my success. I am especially thankful to Mr. Kashif Abbasi for helping me during my implementation.

Finally, I am eternally grateful to my parents and my whole family. Their endless support, encouragement and stimulation have been a true source of strength and inspiration for me. I also thank my wife for her consistent optimism whenever I was frustrated.

Abstract

In the current era, a tremendous amount of scientific research is published by thousands of researchers annually. Different methods have been proposed for indexing researcher productivity based on the quantity and quality of publications. Unfortunately, none of them considers the variation among the numbers of citations received by a researcher's papers. In this thesis, a new method named Variation-Index (v-index) is proposed to handle this issue. It considers the variation in the number of citations received by the researcher's publications: in addition to the quantity and quality of publications, the v-index takes the consistency of their citations into account. We also propose incorporating a time factor into the basic v-index by merging in the capabilities of the m-quotient. This citation-variation enhancement is quite general and can easily be merged into any of the existing indexing measures. We use the h and g indices as our base study and compare the results obtained on a real dataset of scientists from the Google Scholar database. In the results analysis we compare the simple and the time-normalized v-index, and the time normalization has a clear impact on scientist ranking: a scientist ranked highly by the v-index is affected by the time-normalized v-index and the ranking changes accordingly. This shows that the time factor has a strong impact on scientist ranking. Using a statistical measure, we show that our method evaluates scientists better than other methods in terms of consistency.
Quantitatively, we calculate the standard deviation and apply it to the final results for ranking: the higher the standard deviation, the lower the consistency and thus the lower the ranking. This is our quantitative measure for ranking. We have developed an application that calculates the v-index of scientists and presents the results in comparison with the existing h and g indices.

Table of Contents

Chapter 1 Introduction
1.1- Introduction
1.2- Motivation
1.3- Objective of Study
1.4- Scope of Study
1.5- Thesis Organization

Chapter 2 Related Work
2.1 Related Work
2.2 Ranking Methods
  2.2.1- Citations Count
  2.2.2- Impact Factors
  2.2.3- Index
2.3- Existing Methodologies
  2.3.1- H-Index
  2.3.2- G-Index
2.4- What is Problem Statement?
2.5- Consistency
2.6- Problem Statement
2.7- Summary

Chapter 3 Methodology
3.1- Methodology
3.2- Standard Deviation
3.3- Proposed Method
  3.3.1- V-Index: Variation Index
    3.3.1.1- Example
  3.3.2- Time Normalized V-index
  3.3.3- Granularity of results
  3.3.4- Results and Discussions of solved example
3.4- Summary

Chapter 4 Experiments and Implementation
4.1- Experiment and Implementation
4.2- Dataset
  4.2.1- Publish and Perish Utility
  4.2.2- Data Extraction and Preprocessing
  4.2.3- Database Tables
4.3- Development Tool and Programming language
  4.3.1- Visual Studio 2008
  4.3.2- C#
4.4- Application
  4.4.1- Screen shots and Descriptions
4.5- Summary

Chapter 5 Results and Analysis
5.1- Results and Analysis
5.2- Scientists with same h-index
5.3- Scientists with same g-index
5.4- Scientists with same h and g-index
5.5- Summary

Chapter 6 Research Contribution and Conclusions
6.1 Research Contribution
  6.1.1 Productivity and Efficiency with Consistency
  6.1.2 Proposal of Simple Method
6.2 Conclusion

References

List of Tables

1. Table 2.1: Scientist A's sample citation distribution
2. Table 3.1: Sample data scientist A
3. Table 3.2: Sample data scientist B
4. Table 3.3: H, G and V Indices
5. Table 4.1: Dataset Table
6. Table 4.2: Index Table
7. Table 5.1: Scientist with same h-index 15
8. Table 5.2: Result of scientists with h-index 15
9. Table 5.3: Scientists with h-index 16
10. Table 5.4: Result of scientists with h-index 16
11. Table 5.5: Scientist with same g-index 25
12. Table 5.6: Result of scientists with g-index 25
13. Table 5.7: Scientist with same g-index 21
14. Table 5.8: Result of scientists with g-index 21
15. Table 5.9: Scientists with same h and g indices
16. Table 5.10: Results of scientists with same h and g indices

List of Figures

1. Figure 1.1: Comparing Scientist's work
2. Figure 3.1: Comparison between the received citations of author A and author B
3. Figure 4.1: Screen shot V-index simulation form
4. Figure 4.2: Screen shot index calculation form
5. Figure 5.1: Chart of scientists with h-index 15
6. Figure 5.2: Chart of scientists with h-index 16
7. Figure 5.3: Chart of scientists with g-index 25
8. Figure 5.4: Chart of scientists with g-index 15
9. Figure 5.5: Chart of scientists with same h and g indices

Chapter 1 Introduction

V-Index: An Index Based on Consistent Researcher Productivity

1.1- Introduction

In the current era, scientific success is based on the research produced by scientists in different fields of study. A researcher's success is based on the papers he or she publishes in different journals and conferences. A large amount of money is invested in scientific research in advanced countries, which makes the competition tougher every day. A massive amount of scientific research is currently published, and organizations need to evaluate researchers' work in order to find suitable researchers for emerging industry requirements [7,11]. All scientific progress depends on the quality of the research work produced by researchers.
Scopus, Thomson ISI, Google Scholar and Microsoft Academic Search have built and maintain large databases of researchers' publications in journals and conference proceedings, together with the citations those publications receive. They provide the h and g indices of researchers, which are the measures most often used for judging researcher productivity. The ranking of scientific journals, conferences and individual scientists has created a competitive environment, and because ranking depends on the number of publications, this competition leads researchers to produce an enormous amount of research work. The major problem for ranking is to identify the quality work within this huge volume of data.

Many measures have been proposed and used to evaluate individual work as well as journals and conferences. Different measures use different techniques for evaluation: some consider the number of publications, and some consider the citations received by the published work. Over time, new variations have been introduced to overcome the problems of previous methods, and a number of other factors have been introduced, such as time, age, field, area and application.

Different indexing methods have been proposed and used to measure the quantity and quality of researchers' work. In the past, the impact factor (IF) [13] was considered the best indexing method for evaluating journal articles. It is based on the average number of citations received by an article published in a science journal. Journals with a high IF are considered more productive than those with a lower IF. The Impact Factor was limited to journal indexing; consequently, a general indexing scheme useful for journals, conferences and researchers, named the h-index [15], was proposed. Unlike IF, it does not rely on the average number of citations per published document. One can use the h-index to assess the work of an individual researcher as well as of a group or team of researchers. Later the g-index [9] was proposed, which uses the same method as the h-index to calculate the impact and quantity of a researcher's published work but is more sensitive than the h-index because it gives more importance to researchers with highly cited papers. A number of variants of the h-index and g-index have been proposed by different researchers, suggesting enhancements to the existing methods that remove their weaknesses [5,6,16,17]. The h-index and g-index have also been merged to benefit from both at the same time [2]. All the existing indexing methods ignore the variation in the citations of a researcher's papers.

1.2- Motivation

Identifying a scientist's contribution in terms of producing quality work has always been a problem. Any method can be used to evaluate a scientist's worth, but it should be transparent and fair, because this evaluation drives decisions on awarding limited grants, promotions, fellowships and awards. Our motivation is to find the best way to assess the contribution of a scientist by removing the gray areas of the existing h and g indices. We have based our proposed solution on the "consistency" factor, which has been ignored by the previous methods. In our method, a scientist who produces quality work more consistently, as measured by received citations, is considered the greater contributor.

Figure 1.1: Comparing Scientist's work

As shown in Figure 1.1, we have a situation in which all the scientists have the same h-index value and we have to find the best scientist among them.

1.3- Objective of Study

The objective of our study is to introduce a new index that uses consistency as the key parameter for evaluating a scientist's work, and to compare this index with the existing well-known h-index and g-index. The main objectives of this study are as follows.
Proposing a new v-index (variation index) that identifies the variations and time efficiency in received citations and finds the most consistent scientist.
Comparing the results of the proposed method with the existing h and g indices.
Improving the efficiency of existing ranking methods.

1.4- Scope of Study

The scope of our study is to implement our proposed solution using real data of scientists and to generate results. We have developed simulation software in C# using the Visual Studio 2008 environment. The dataset of all the scientists has been extracted from the Google Scholar database using the Publish or Perish (PoP) utility. We have performed all calculations on this real dataset.

1.5- Thesis Organization

The rest of the thesis is organized as follows.

Chapter 2: This chapter describes in detail the different evaluation methods for ranking scientists, journals and conferences, together with the history of these methods and their applications. All the related work is presented in detail. The different indices and previously proposed methods for evaluating scientists are described along with their strengths and drawbacks. On the basis of this literature review, the problem statement is formulated and stated in this chapter.

Chapter 3: This chapter presents the methodology used to support our proposed solution. The definition and explanation of standard deviation and variance are given in this chapter, and then we explain our proposed solution and how it is applied. A sample problem is also solved with dummy data to support our proposed solution.
Chapter 4: This chapter describes the tools and utilities used to collect and preprocess the data, and the tools and programming language used to develop the simulation software that calculates our variation index. Screen shots with descriptions are included to show the user interface to the reader.

Chapter 5: This chapter discusses in detail the input data and the output produced by our program, and presents the analysis of the results. We consider different cases to support our proposed methodology for evaluating scientists, namely scientists with the same h-index, scientists with the same g-index, and scientists with the same h and g indices.

Chapter 6: This chapter concludes the thesis and discusses enhancements. The research contributions are also stated concisely in this chapter.

Chapter 2 Related Work and Problem Statement

2.1 Related Work

We have studied in detail a number of articles and research papers on scientist ranking and evaluation. A number of different methods were explored, including citation count, impact factor, and the h-index with its variations and enhancements. During our literature review we considered the strengths and weaknesses of the different methods and their application, impact and research contribution. On the basis of this review we selected the well-known h-index and g-index as our base indices, and then studied in detail the enhancements and variations of these indices. Through this extensive study we found that the element of consistency of received citations is missing from the process of evaluating a scientist's work. Details of the related work, with selected papers from different scientists, are explained in the coming section on existing methodologies.
2.2 Ranking Methods

Ranking a scientist on the basis of his or her research contribution has been a serious and highly discussed issue for many years. Different types of methods are used to rank scientists, but no single method is able to compute a ranking effectively while considering all aspects [21]. If an evaluation method is good in one aspect, it is lacking in another. Some of the most widely used ranking methods are discussed below.

2.2.1- Citations Count

A citation is the acknowledgment of another scientist's work by using it as a reference in our own work; that is, if we use an idea or work published in a journal or conference as a reference in our publication or article, we have cited that paper and its citation count increases by one. Citation count is one of the earliest methods used to evaluate the ranking of a scientist, journal or conference, by counting the number of citations received in a certain period of time [24][25]. Citation count can be applied at the following levels.

Individual publication (how many works have cited this publication)
Scientist or author (total number of citations received over all publications of a scientist)
Journal or conference (average citation count received by an article or paper published in a specific journal or conference)

2.2.2- Impact Factors

Eugene Garfield proposed a method to assess the quality of work published by a journal, known as the Impact Factor (IF). The Impact Factor has been widely used as the standard method for evaluating journal ranking. A journal with a higher Impact Factor was considered more valuable than others. The Impact Factor of a journal is calculated as the average number of citations received per paper published during the previous two years. The Impact Factor of a journal for 2011 would be measured as follows.
T = number of citations received during 2011 by items the journal published in 2009 and 2010.
C = total number of citable items the journal published in 2009 and 2010.

Then the Impact Factor of that journal for the year 2011 is calculated as

IF(2011) = T/C

The Impact Factor was used to rank journals and cannot be applied to an individual scientist's work [5]. Some other improvements and variations of the Impact Factor were also proposed by the same organization. The Immediacy Index is calculated by dividing the number of citations received in a given year by the papers a journal published in that same year by the number of those papers. Other enhancements, such as the Aggregate Impact Factor, were then introduced for the subject categories of journals. All these measures can only be applied to journal ranking, not to the work of an individual or a group of people (team).

2.2.3- Index

A new way of ranking a scientist and evaluating his or her work was proposed by J. E. Hirsch in 2005 and later became known as the h-index. This index-based method incorporates both the quality and the quantity of the work produced by the scientist, without complex mathematical calculation, and yields a single index value. The index uses citations to capture the quality of the published work and the number of publications to capture the quantity of the work produced at the same time. The Hirsch index is very easy to use and is therefore widely used. A number of other proposed indices are variations of the h-index.

2.3- Existing Methodologies

Garfield [13] proposed a method to assess the quality of work published by a journal, known as the Impact Factor (IF). A journal with a higher Impact Factor was considered more valuable than others. The Impact Factor of a journal is calculated as the average number of citations received per paper published during the previous two years.
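As a small worked illustration of the two-year Impact Factor described above, the following Python sketch computes IF = T/C. The function name and the journal figures (150 citations, 60 citable items) are hypothetical values chosen for illustration and are not taken from this study; Python is used only for illustration and is not the language of the thesis software.

```python
def impact_factor(citations_received: int, citable_items: int) -> float:
    """Two-year Impact Factor: citations in the target year (T) divided by
    the number of citable items from the two preceding years (C)."""
    if citable_items <= 0:
        raise ValueError("number of citable items C must be positive")
    return citations_received / citable_items


# Hypothetical journal: 150 citations in 2011 to items published in
# 2009 and 2010, of which 60 were citable.
if_2011 = impact_factor(150, 60)
print(if_2011)
```

With these assumed figures the journal's Impact Factor for 2011 is 2.5, and every author who published in the journal inherits that same value regardless of how often his or her own paper was cited.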
The Impact Factor was used to rank journals and cannot be applied directly to an individual researcher's work [1]. Because IF represents the whole journal, a researcher whose paper in that journal received no citations gets the same IF as other researchers in the same journal whose papers received many citations. Individuals should be indexed on the basis of the quality and quantity of their own publications and the citations received by those publications, not by the journal in which they publish. To compare and evaluate individual research, Hirsch [15] proposed the h-index. Papers are arranged in descending order of the citations they have received. The h-index is the largest rank N such that the paper at rank N has at least N citations; all subsequent papers have N or fewer citations. The h-index is robust in the sense that it does not penalize a researcher for having uncited papers alongside highly cited ones [4,12]. One lasting limitation of these indexing methods has also been discussed: they cannot measure the impact of researchers who were awarded a Nobel Prize for extraordinary work [15].

In recent years the h-index has become the most widely used index for assessing the quality and quantity of an individual researcher's work directly, unlike IF, which measures researcher productivity only indirectly through journal citations. It can be applied to journal publications as well as to articles appearing in conferences, but it has been found to be insensitive to some factors, such as giving more importance to highly cited papers. Consequently, Egghe [9,10] proposed a new index called the g-index, which gives extra weight to highly cited papers. If the publications of a scientist are ranked in descending order of citations, then the g-index is the largest rank g such that the top g publications together received at least $g^2$ citations.
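The two definitions above can be sketched in Python as follows. This is a minimal illustration and not the implementation used in this thesis; the function names and the sample citation list are assumptions made for the example, and the g-index is assumed here to be capped at the number of published papers.

```python
def h_index(citations):
    """Largest rank h such that the paper at rank h has at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break  # counts only decrease, so no later rank can qualify
    return h


def g_index(citations):
    """Largest rank g such that the top g papers together have >= g*g citations."""
    ranked = sorted(citations, reverse=True)
    cumulative = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        cumulative += count
        if cumulative >= rank * rank:
            g = rank
    return g


# Hypothetical citation counts for six papers of one scientist.
sample = [10, 8, 5, 4, 3, 0]
print(h_index(sample), g_index(sample))
```

For this hypothetical list the sketch gives h = 4 and g = 5: the fifth paper has only 3 citations, so it does not enter the h-core, but the top five papers together have 30 citations, which exceeds 25.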
The g-index calculation resembles that of the h-index and makes the ranking of scientists more sensitive [8]; however, because both indices take only natural-number values, both lack discriminatory power.

Both the h-index and the g-index ignore the career length of the researcher. This is discussed by Burrell [5], and an enhancement of the h-index that includes career length, named the m-quotient, was proposed. In the m-quotient, the h-index value is divided by the number of years of research activity. Later, Burrell [6] proposed the a-index, arguing that the most prolific core of a scientist's output can be expressed as the average number of citations per published paper in the h-core. Instead of using the arithmetic average to measure the central tendency of the citation distribution, a new method based on the median, named the m-index, was introduced [20], motivated by the effect of extreme values on the arithmetic average. Another variation of the g-index and h-index, known as the h(2) index, was presented by Kosmulski [17]. The h(2) index adds sensitivity to the h-index and, like the g-index, gives importance to more highly cited papers. The h(2) index of a scientist is the largest natural number h(2) such that the h(2) most cited publications each received at least $(h(2))^2$ citations [17].

A weakness of the a-index was discussed in [16]: its calculation involves division by the h-index, which penalizes a good researcher with a higher h-index. Jin et al. [16] addressed this unfair behavior of the a-index and proposed a new solution in the form of the r-index. In the r-index, instead of dividing by the h-index value of the researcher, the index is calculated by taking the square root of the sum of the citations of the published papers in the Hirsch core. Along with the r-index, Jin et al. [16] also proposed the AR-index, which builds on the r-index.
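The h-core based variants described above (m-quotient, a-index and r-index) can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the implementation used in this thesis: the function names, the sample citation list and the career length of 8 years are hypothetical.

```python
import math


def h_core(citations):
    """Citation counts of the papers in the Hirsch core (the top h papers)."""
    ranked = sorted(citations, reverse=True)
    # Counts decrease while ranks increase, so the condition holds exactly
    # for the first h papers and for none after them.
    return [count for rank, count in enumerate(ranked, start=1) if count >= rank]


def m_quotient(citations, career_years):
    """h-index divided by the number of years of research activity."""
    return len(h_core(citations)) / career_years


def a_index(citations):
    """Average number of citations per paper in the h-core."""
    core = h_core(citations)
    return sum(core) / len(core) if core else 0.0


def r_index(citations):
    """Square root of the total number of citations in the h-core."""
    return math.sqrt(sum(h_core(citations)))


# Hypothetical scientist: six papers, eight years of research activity.
sample = [10, 8, 5, 4, 3, 0]
print(m_quotient(sample, 8), a_index(sample), r_index(sample))
```

For this hypothetical list the h-core is the four papers with 10, 8, 5 and 4 citations (27 in total), so the m-quotient is 0.5, the a-index is 6.75 and the r-index is the square root of 27, about 5.196. Note how the a-index divides by h while the r-index does not, which is the difference discussed above.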
The AR-index considers not only the citation intensity of the published articles but also the age of each publication, which makes it more sensitive: with the passage of time a scientist's index can decrease as well as increase. New methods have been suggested to complement the h-index by addressing its weakness of ignoring details [18]. The idea is to create the h-sequence and h-matrix of a scientist in order to find his or her rank at different spans of the scientific career, while the original Hirsch index of that scientist can also be read from the h-sequence and h-matrix. Egghe and Rousseau [11] proposed the weighted h-index, written as the hw-index. It depends on the number of citations obtained by the published papers in the Hirsch core. It was presented in both continuous and discrete settings. It was observed that the index works well and shows good results in the continuous setting, while in the discrete setting some deviations from the ideal results are encountered. Alonso et al. [2] tried to reduce the weaknesses of the h-index and g-index by merging the properties of both indices into a new index known as the hg-index. The relationship between journals and researchers has also been discussed, and an indexing criterion that considers the journal and the scientist at the same time has been proposed [3]. The intuition is that the two entities are interrelated, as highly ranked journals carry the publications of highly ranked scientists. In the related work studied so far, no one handles the problem of variation in the citations of a researcher's published work, which motivated us to propose the v-index.

2.3.1- H-Index

Jorge E. Hirsch proposed this index, which is known as the h-index [1]. It is an index used to measure the productivity and impact of the published research or work of a scholar.
This index uses the set of the most cited papers of a scientist and the number of citations that each of these papers has received in the work of other scientists. The Hirsch index can also be used to measure the productivity and impact of the work produced by a team of scientists. The h-index requires the distribution of citations received by the scientist's publications. Hirsch defines it as follows: "A scientist has index h if h of [his/her] Np papers have at least h citations each, and the other (Np − h) papers have at most h citations each." Put simply, a scientist with index h has produced h papers, each of which has received at least h citations from other researchers. Thus the h-index covers both the publications and the number of citations they have received. The index was intended to compare the work of different scientists in the same field, and it improves on simple measures such as the total number of citations or publications. The following equation estimates the h-index from the total number of citations.

$$h = \sqrt{\frac{N_{cT}}{a}} \qquad (1)$$

where NcT is the total number of citations received and a is a proportionality constant that ranges between 3 and 5.

The h-index is widely used as an alternative to the customary impact factor to evaluate the effort and contribution of a particular scientist. Because the h-index considers only the most cited papers of an individual scientist, its calculation is simple. The h-index of a scientist increases as citations accumulate, so it depends on the length of the scientist's career.

Hirsch proposed the h-index to address the main weaknesses of other evaluation indicators, such as the total number of papers or the total number of citations.
The number of papers does not reflect the quality of the work produced by a scientist, while the total number of citations can be inflated by a large number of citations received by one or two publications even though the other publications of the same scientist have few or no citations. The h-index accounts for both the number of citations and the number of publications at the same time, and it is not affected by a large number of citations to one or two papers.

Example 1
Scientist A has:
Number of documents published: 12
Total number of citations: 249
Then the h-index estimated by Eq. 1, taking a = 4, is

$$h = \sqrt{\frac{N_{cT}}{a}} = \sqrt{\frac{249}{4}} \approx 7.889$$

where
NcT = total number of citations
a = proportionality constant that ranges between 3 and 5
h = h-index

2.3.2- G-Index
In 2006 Leo Egghe presented a new solution, based on the existing h-index, to assess the contribution of a scientist [2]. This solution, known as the g-index, also depends on the number of citations received by the published work of an individual scientist. The g-index uses the same procedure as the h-index: documents are ranked in descending order of their citation counts. The index is measured from the distribution of citations received by the published articles of the given scientist [3]. Given a set of published papers sorted in descending order of the number of citations they received, the g-index is the largest number g such that the top g papers together have at least g² citations [3]. The g-index closely resembles the h-index and tries to remove its weaknesses. It makes the ranking of scientists more sensitive, but because both indices take natural-number values, both lack discriminatory power.
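The two definitions above can be sketched in a few lines of Python. This is only an illustrative sketch, not the application developed for this thesis: the function names `h_index` and `g_index` and the sample list are hypothetical. One assumption is made explicit in the code: ranks beyond the last real paper are treated as papers with zero citations, as in the starred rows of Table 2.1, so g may exceed the number of publications.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites < rank:
            break
        h = rank
    return h


def g_index(citations):
    """Largest g such that the top g papers together have at least g*g citations.

    Assumption: ranks past the last real paper count as zero-citation papers,
    so g is allowed to exceed the number of publications.
    """
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    while True:
        rank = g + 1
        if rank <= len(ranked):
            total += ranked[rank - 1]
        if total < rank * rank:
            return g
        g = rank


# Hypothetical citation counts, used only to exercise the functions.
sample = [10, 8, 5, 4, 3]
print(h_index(sample), g_index(sample))
```

For the hypothetical list, four papers have at least four citations each, and the five papers together have 30 citations, which covers 5² = 25 but not 6² = 36.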
Example 2
Let scientist A have:
Number of documents published: 12
Total number of citations: 249
The highlighted record (rank 15) in the sample data table shows the calculated g-index of scientist A: g-index = 15.

Table 2.1: Scientist A's sample citation distribution
Column titles: publication rank (PubR), citations (Ci), square of publication rank (PubR²), cumulative sum of citations (∑Ci). Rows marked * lie beyond the 12 published documents and carry zero citations.

PubR   Ci   PubR²   ∑Ci
1      50   1       50
2      44   4       94
3      40   9       134
4      31   16      165
5      25   25      190
6      18   36      208
7      12   49      220
8      10   64      230
9      9    81      239
10     5    100     244
11     4    121     248
12     1    144     249
13*    0    169     249
14*    0    196     249
15*    0    225     249
16*    0    256     249

2.4- What is Problem Statement?
The problem statement is the most sensitive part of any research. A well-formalized problem statement shows the worth of a scientific work and adds value to the proposed methodology [23]. It describes the existing problem, which helps to define the scope and the direction of the research.

2.5- Consistency
Consistency is a measure of how closely the items of a data set are dispersed around the mean or center. Higher consistency means that the data items in a given data set lie closer to each other. In our research, consistency is the basic element in the formulation of the problem statement and the proposed solution. In our study of related work we found that no one has previously considered this basic attribute to evaluate scientists. Our proposed solution takes the view that a scientist whose published papers receive citations more consistently is a more contributing scientist.

2.6- Problem Statement
We have discussed Hirsch's h-index, the g-index and their different variations proposed in recent years.
A detailed study of the h-index and g-index shows that they do not handle the variation in the number of citations between the work of two or more scientists with the same h-index or g-index value. For example, suppose two scientists A and B both have an h-index of 20. A has 15 papers with more than 50 citations each, while B has no paper exceeding 50 citations. This shows that the work of scientist A is worth more than that of scientist B, but the h-index and g-index are not sensitive to this situation and cannot account for the variation in the number of citations of two scientists having the same h-index and g-index values. We propose a new solution which calculates this variation and then divides the scientist's h-index or g-index value by it to obtain the final rank of the quality of the work of a given scientist. Our method makes the h-index and g-index more sensitive to the variation in citation counts.

2.7- Summary
In this chapter we have discussed in detail the methods for ranking the work of scientists, such as the impact factor, average count and indexes. After that, recent work on ranking and on enhancements or variations of the h-index was explained. The h-index and its variations, such as the g-index, m-quotient, a-index, Ar-index and hg-index, were presented together with their weaknesses. We then took the h-index and g-index and solved examples to calculate both indexes. We found that the scientists in the examples resemble each other and that the h-index and g-index are unable to find the difference between them. This forms the basis of our motivation and of the formulation of the problem statement. Our problem statement makes clear that two scientists who resemble each other in every other aspect can still differ in the variation of their received citations.
This citation variation factor plays an important role in scientist ranking.

Chapter 3 Methodology

3.1- Methodology
The process of collecting and extracting data for scientific research is called research methodology. Data may be collected theoretically or practically. In the theoretical approach, a concept is argued through strategic measures that are not proved in practice, whereas in the practical approach real statistical data is used and the idea is proved experimentally or mathematically through rigorous analysis. In our research we have used real data of scientists and applied different mathematical operations to it in a practical implementation in order to validate our proposed solution.

3.2- Standard Deviation
The standard deviation measures the inconsistency and diversity of a data set, that is, how far its values are spread from the mean. A smaller standard deviation means that the values lie closer to the mean, whereas a larger standard deviation indicates values far away from the mean [26]. The standard deviation of a data set is the square root of its variance. It is denoted by σ and calculated as follows.

$$\sigma = \sqrt{\frac{1}{d}\sum_{i=1}^{d}\left(X_i - \mu\right)^2} \qquad (3)$$

where i, running from 1 to d, indexes the publications, X_i is the number of citations received by publication i, µ is the mean number of citations, and d is the total number of publications.

The standard deviation can be used to assess the degree of scattering of the values around their mean; to assess the inaccuracy of the mean of a data set when it is taken as an estimate of the mean of the whole population from which the sample was extracted; and to calculate the probabilities of events occurring in the given data set.
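As a small worked illustration of the standard deviation used here, the sketch below applies the population form (division by the number of publications d) to the twelve citation counts of scientist A from Table 2.1. The function name `citation_std` is hypothetical, and the choice of the population form rather than the sample form is an assumption; it is made because it reproduces the value 16.19 reported for scientist A later in this chapter.

```python
import math


def citation_std(citations):
    """Population standard deviation of citation counts (divides by d)."""
    d = len(citations)
    mean = sum(citations) / d
    variance = sum((x - mean) ** 2 for x in citations) / d
    return math.sqrt(variance)


# Citation counts of scientist A's 12 published papers (Table 2.1).
scientist_a = [50, 44, 40, 31, 25, 18, 12, 10, 9, 5, 4, 1]

print(round(citation_std(scientist_a), 2))  # 16.19
```

The mean is 249 / 12 = 20.75 and the variance is 262.1875, whose square root rounds to 16.19.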
3.3- Proposed Method
Here we present our proposed methods for scientist ranking. Our new indexes are the v-index and the v-index with time normalization. Both indexes are discussed in detail below.

3.3.1- V-Index: Variation Index
In this section our proposed method, named the v-index, is given, where 'v' stands for the variation in the received citations. The case of two scientists A and B, given in Table 3.1 and Table 3.2, is taken to show how the variation in the received citations of published papers plays an important role in differentiating between the work of researchers with the same h-index and g-index values. We deal here with the situation in which both researchers have the same number of published papers and the same total number of citations. In this case the quality of the work produced by the researchers is not differentiated by the h and g indices, because these indices cannot handle citation variation. The v-index takes into consideration the quantity, the quality and the citation variation of the papers together.

3.3.1.1- Example
In the data set given in Table 3.1 and Table 3.2, both scientists have the same number of published articles and the same total number of received citations. The h-index and g-index, shown by the highlighted rows, are the same for researchers A and B. The publication rank is denoted by PubR, the citations by Ci, the square of the publication rank by PubR², and the cumulative citations by ∑Ci.
Table 3.1: Sample data of scientist A

PubR   Ci   PubR²   ∑Ci
1      50   1       50
2      44   4       94
3      40   9       134
4      31   16      165
5      25   25      190
6      18   36      208
7      12   49      220
8      10   64      230
9      9    81      239
10     5    100     244
11     4    121     248
12     1    144     249
13*    0    169     249
14*    0    196     249
15*    0    225     249
16*    0    256     249

Table 3.2: Sample data of scientist B

PubR   Ci    PubR²   ∑Ci
1      108   1       108
2      50    4       158
3      20    9       178
4      14    16      192
5      13    25      205
6      13    36      218
7      11    49      229
8      10    64      239
9      9     81      248
10     1     100     249
11     0     121     249
12     0     144     249
13*    0     169     249
14*    0     196     249
15*    0     225     249
16*    0     256     249

Table 3.1 and Table 3.2 show that researchers A and B have the same h and g indexes. Both indexes are too insensitive to find the difference between the works of the two scientists, because they do not consider the variation in the citation distribution. Figure 3.1 shows the citation variation for the work published by researchers A and B. It clearly shows that the citation variation of researcher A is smaller than that of researcher B. One can say that researcher A has a more stable graph of citations or productivity. To add the citation variation factor to indexes such as h or g, we use the standard deviation given in Eq. 3, which is a commonly used measure of variation in data. The standard deviation can also be calculated by taking the square root of the variance [footnote 1]. The variance is the arithmetic mean of the squared differences between each value and the mean.

Footnote 1: http://www.mathsisfun.com/data/standard-deviation.html

Figure 3.1: Comparison between the received citations of author A and author B.

After the citation variation is calculated, the v-index is obtained by dividing the scientist's existing index value, i.e. the h or g index, by the calculated standard deviation. The time-normalized variant additionally divides by the active lifetime of the scientist.
The new value, the v-index, ranks scientists with all the good features of the h and g indices while also reflecting the consistency of their quality work. In this work the citation variation effect is added only to the h and g indexes (Eq. 4), but the enhancement is general and can be added to any other existing index as well.

$$v = \frac{h}{\sigma} \quad \text{or} \quad v = \frac{g}{\sigma} \qquad (4)$$

where h and g are the existing indices and σ is the citation variation calculated through the standard deviation.

3.3.2- Time Normalized V-index
The results can be improved with the time-normalized v-index, another index derived from the simple v-index. In this index we include the author research age to show the efficiency of the author. The formula for the time-normalized v-index is

$$v_t = \frac{h}{\sigma \, t} \quad \text{or} \quad v_t = \frac{g}{\sigma \, t} \qquad (5)$$

where t denotes the active lifetime of a scientist, calculated as the difference in years between the current year and the year of the first publication of the given scientist.

3.3.3- Granularity of results
The results are decimal values most of the time, which makes them awkward to present as the index of a scientist. An index should be a single whole number, whereas the raw v-index is more fine-grained. We can therefore multiply the results by 10, 100, 1000 and so on until we reach the desired granularity, and then round the results to remove the decimal point.

3.3.4- Results and Discussions of solved example
Table 3.3 shows the calculated index values of scientists A and B for the h, g and v indexes. The v-index is more sensitive than the widely used h and g indices. Neither of the existing indices considers the consistency of producing good work.
One researcher may produce a single paper that receives a great number of citations while all of his or her other publications receive an average or low citation count, whereas a second scientist consistently produces good work with a high number of citations. Both scientists in our example have an h-index of 9 and a g-index of 15, and both of these indices are insensitive to the consistency of publishing highly cited work. Table 3.3 shows the final results and the comparison of the v-index with the other indices, together with the calculated standard deviation. The v-index values are (0.56, 0.90) for scientist A and (0.30, 0.55) for scientist B for the h and g index values, respectively. The higher v-index values of researcher A compared with researcher B, for both the h and g indexes, show that scientist A is better than scientist B because of more consistency in producing quality work.

Table 3.3: h, g and v indices

Scientist   Index   Value   Standard deviation (σ)   V-index   Multiple of 10   Multiple of 100
A           h       9       16.19                    0.56      6                56
A           g       15      16.69                    0.90      9                90
B           h       9       29.20                    0.30      3                30
B           g       15      27.41                    0.55      6                55

We assume that if the v-index outperforms the existing methods on this example of two researchers who resemble each other in ranking, it can be applied to real data for more accurate indexing. Multiplying the results by 10 and 100 gives whole numbers which can be used as the new index of the scientists.

3.4- Summary
In this chapter we have discussed the methodology followed to implement the proposed solution. Variance and standard deviation are the key factors in the v-index calculation, so in the first part we explained both terms and the formula used to calculate them. The v-index method was then applied to solve the example. The calculation of the v-index is simple: first calculate the variance and standard deviation, and then divide the existing h or g index value by the calculated standard deviation.
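The calculation just summarized can be sketched in Python for the two sample scientists. This is a hedged illustration, not the thesis application: the names `population_std`, `v_index` and `pad_with_zeros` are hypothetical. Two assumptions are made, both inferred from the numbers in Table 3.3 rather than stated in the text: the standard deviation is the population form, and the g-based standard deviation is taken over the top g = 15 ranks, including the zero-citation rows marked * in Tables 3.1 and 3.2. The optional `active_years` argument corresponds to the active lifetime t of the time-normalized variant.

```python
import math


def population_std(values):
    """Population standard deviation (divides by the number of values)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))


def v_index(index_value, citations, active_years=None):
    """Existing index (h or g) divided by the citation standard deviation.

    If active_years is given, the result is also divided by it
    (the time-normalized variant).
    """
    sigma = population_std(citations)
    if sigma == 0:
        raise ValueError("v-index is undefined when all citation counts are equal")
    v = index_value / sigma
    if active_years is not None:
        v /= active_years
    return v


def pad_with_zeros(citations, length):
    """Assumption: ranks beyond the published papers count as zero citations."""
    return list(citations) + [0] * max(0, length - len(citations))


SCIENTIST_A = [50, 44, 40, 31, 25, 18, 12, 10, 9, 5, 4, 1]    # Table 3.1
SCIENTIST_B = [108, 50, 20, 14, 13, 13, 11, 10, 9, 1, 0, 0]   # Table 3.2

for name, cites in (("A", SCIENTIST_A), ("B", SCIENTIST_B)):
    v_h = v_index(9, cites)                       # h = 9 for both scientists
    v_g = v_index(15, pad_with_zeros(cites, 15))  # g = 15 for both scientists
    print(name, round(v_h, 2), round(v_g, 2))
```

Under these assumptions the standard deviations come out as 16.19 and 16.69 for scientist A and 29.20 and 27.41 for scientist B, and the resulting v-index values agree with Table 3.3 to within rounding of the last digit.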
Once the v-index is obtained, we multiply the result by 100 and round it to get a single whole number that is easy to understand. The factor of 100 defines the granularity and helps to present the results in a human-readable format. The results table shows that scientist A has a better index ranking than scientist B, a difference that was undetectable by the existing h and g indexes.

Chapter 4 Experiment and Implementation

4.1- Experiment and Implementation
To fulfill the requirements of the thesis, I have developed an application to demonstrate the performance of the variation index. The application carries out the whole process of calculating the h and g indices and the variation index and presents the final results. Different experiments were performed using different criteria. All experiments and simulations were run on a local system using a real data set of scientists extracted from the database of Google Scholar.

4.2- Dataset
Different databases contain information about scientists and their publications in journals and conferences. Google Scholar, Scopus and Web of Science (WoS), also known as Web of Knowledge, provide the facility to calculate the h-index and g-index of a given scientist. All of these organizations maintain their own databases in the backend, so the publications recorded for the same author differ between databases, and so do the resulting h and g indices. We have used the Google Scholar database to extract the data of scientists by means of the Publish or Perish (PoP) utility. Google Scholar has grown into a huge source of online data in recent years, and it is free and easy to access.
4.2.1- Publish or Perish Utility
The Publish or Perish utility by Harzing provides the same facility of retrieving information about an author's published work and his or her rank on different measuring scales. Publish or Perish uses the Google Scholar database in the backend. We have used the Publish or Perish (PoP) utility to extract the information of more than 12 thousand scientists with their respective publications and citation records. All the data was extracted to a comma-separated values (CSV) file.

4.2.2- Data Extraction and Preprocessing
All the extracted data is in the form of a text file, which is difficult to manipulate, so some preprocessing is required to convert the data into the desired form. We have used Microsoft Access for database creation. The preprocessing consists of exporting all the data from the CSV file to the appropriate fields of the respective database table.

4.2.3- Database Tables
We have created two database tables in MS Access. Their fields and structure are shown below. The table named "Dataset" is the main table; it contains information about the scientists and their publications together with the citations received. We use the data of this table to calculate the indices of the scientists.

Table 4.1: Dataset table
Dataset
Number
Citations
Author 1
Author 2
Author 3
Author 4
Paper_Title
Year
Pub_Type
Publisher
Publisher_link
Paper_Link

The fields in the above table store the following data.
Number: An auto-generated sequence number.
Citations: The citations received by the respective publication.
Author1: The name of the first author.
Author2: The name of the second author, if any.
Author3: The name of the third author, if any.
Author4: The name of the fourth author, if any.
Paper_Title: The title of the publication.
Year: The year of publication.
Pub_Type: The publication type, either journal or conference.
Publisher: The name of the publisher.
Publisher_link: The link to the publisher's web site.
Paper_Link: The link to the online view of the paper.

The second table in our database, named "Indextable", contains the calculated index information. The data of this table is calculated at run time during the execution of the application. It is used to show the results and the name of the best scientist during the result and analysis phase. The structure of "Indextable" is shown below.

Table 4.2: Index table
Indextable
Publications
Scientist_name
H_index
G_index
Variance
Standard_deviation
V_index_hindex
V_index_gindex
Vindex_h_time
Vindex_g_time
Author_age

The fields in the above table store the following data.
Publications: The total number of publications of the respective scientist.
Scientist_name: The name of the scientist.
H_index: The calculated h-index of the respective scientist.
G_index: The calculated g-index of the respective scientist.
Variance: The variance of the received citations of the respective scientist.
Standard_deviation: The standard deviation of the received citations.
V_index_hindex: The calculated variation index for the h-index value of the scientist.
V_index_gindex: The calculated variation index for the g-index value of the scientist.
Vindex_h_time: The time-normalized result for the h-index value of the scientist.
Vindex_g_time: The time-normalized result for the g-index value of the scientist.
Author_age: The active research age of the scientist.

4.3- Development Tool and Programming language
Different development tools are available.
To simulate this project I have used a Microsoft application development tool, the Visual Studio 2008 integrated environment, with C# as the programming language.

4.3.1- Visual Studio 2008
Microsoft Visual Studio provides an integrated development environment (IDE) for developers. Users can develop console applications and GUI-based applications such as Windows Forms applications, as well as web applications and services. Visual Studio supports a number of programming languages through language services: its code editor and debugger can support almost any programming language, provided that a language-specific service is present and installed. The built-in languages include C, C++, Visual C++, VB.NET, C#, J# and F# [22].

4.3.2- C#
Visual C# is designed for developing different types of applications on the .NET Framework. C# is simple, versatile, powerful, type-safe and object-oriented. It enables rapid application development while retaining the expressiveness and style of C-style languages. I have coded the entire application in C#.

4.4- Application
Our application is a desktop application consisting of several forms developed in Visual Studio 2008 using C#. The application is integrated with the database tables created in MS Access. Two forms were designed. The first form, named "V-index Simulation", is shown when the application is launched. This form is used to load data from the database and to view the data of an individual scientist selected from the scientist list. The second form, named "Index Form", becomes visible when the user clicks the "Calculate Index" button on the first form. When the second form loads, all the required calculations are made in the backend.
This form is used to compare the scientists that fulfill the filtering criteria, and the best scientist is shown on the form.

4.4.1- Screen shots and Descriptions
Figure 4.1 shows the main form, which is displayed when the application is launched. This form is used to load the database and view the list of scientists. The user can select a scientist to view his or her publications and the relevant information about each publication. The user can filter the list of scientists in two ways, by number of publications and by number of citations. The Calculate Index button calculates the h, g and v indices of all the scientists in the list.

Figure 4.1: Screen shot V-index simulation form.

When the user clicks the Calculate Index button on the first form, a new form opens. This new form, shown in Figure 4.2, is used to filter the scientists according to their indices. The user can select scientists by their h and g indices, individually or in combination. When the user clicks the Apply button after defining the filter criteria in the left pane, the details of all scientists meeting the selection criteria appear, together with the calculated variance, standard deviation and v-index. Meanwhile, the details of the most consistent scientist are shown in the bottom left pane, where the user can see the scientist's name and his or her v-index.

Figure 4.2: Screen shot Index calculation form

4.5- Summary
In this chapter we have discussed the tools and technologies used during the implementation of the project. The data set was generated from Google Scholar using the freely available Publish or Perish (PoP) utility. We have generated data of more than 10000 scientists with more than 16000 publications. The whole data set was extracted to a .csv file, which was converted into an MS Access database table. Another table for the calculated results, "Indextable", was created.
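To make the data flow from the CSV export to the rows of "Indextable" more concrete, the following Python sketch groups publications by first author and derives a few of the Indextable fields. It is not the C# and MS Access application described in this chapter, only an illustration under stated assumptions: the CSV column names `Citations`, `Author1` and `Year` are borrowed from the Dataset table and may differ from the actual Publish or Perish export; only the first author is credited; the sample rows and the function name `build_index_rows` are hypothetical.

```python
import csv
import io
import math
from collections import defaultdict

# Hypothetical sample rows; column names are assumed from the Dataset table.
SAMPLE_CSV = """Citations,Author1,Paper_Title,Year
12,A. Author,Paper one,2001
7,A. Author,Paper two,2004
3,A. Author,Paper three,2008
20,B. Author,Paper four,2010
"""


def h_index(citations):
    """Number of ranks at which the ranked citation count is at least the rank."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)


def population_std(values):
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))


def build_index_rows(csv_text, current_year):
    """Return one dictionary per scientist with a subset of the Indextable fields."""
    papers_by_author = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        papers_by_author[row["Author1"]].append(
            (int(row["Citations"]), int(row["Year"]))
        )

    rows = []
    for name, papers in papers_by_author.items():
        citations = [cites for cites, _ in papers]
        h = h_index(citations)
        sigma = population_std(citations)
        rows.append({
            "Scientist_name": name,
            "Publications": len(citations),
            "H_index": h,
            "Standard_deviation": sigma,
            # The v-index is left undefined when there is no citation variation.
            "V_index_hindex": h / sigma if sigma > 0 else None,
            "Author_age": current_year - min(year for _, year in papers),
        })
    return rows


for row in build_index_rows(SAMPLE_CSV, current_year=2012):
    print(row)
```

Leaving the v-index undefined for a zero standard deviation is a design choice of this sketch; the text does not say how the application treats scientists whose papers all have the same citation count.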
We have used C#.NET and ADO to develop the user interface and the database connection of our implementation. Screen shots of the developed project are attached.

Chapter 5 Results and Analysis

5.1- Results and Analysis
In this chapter we discuss in detail the results generated by our proposed v-index using the real data of scientists extracted from Google Scholar. To analyze the results properly, we have defined three conditions under which two or more scientists may resemble each other and stand on the same rank, a situation to which the existing h and g indices are insensitive. We then used our proposed v-index method to distinguish between the scientists on the basis of the consistency of their quality work. After applying our method we are able to rank the scientists more appropriately.

5.2- Scientists with same h-index
In this section we consider the case in which two or more scientists have the same h-index value. In our data set we found that the seven scientists listed in Table 5.1 have an h-index value of 15. This is the first problem that we have solved using the v-index, in order to distinguish between these scientists and rank them properly based on the consistency of their good work in terms of the citations received by their publications. The table gives the citation distribution over the publications of each scientist, with the calculated h-index highlighted in row 15. A chart drawn from these data values is also given.
In Figure 5.1 we can see that there is a clear difference in the citation rates of the individual scientists, which is undetectable by the h-index.

Table 5.1: Scientists with same h-index 15 (citations per publication rank)

PubR   MR Azimi-Sadjadi   F Hirsch   TF Syeda-Mahmood   PK Bhattacharya   M Bhattacharya   DV Gupta   AK Aziz
1      83                 100        147                132               375              351        770
2      80                 71         111                90                54               195        463
3      69                 45         65                 73                52               105        201
4      57                 43         49                 72                36               90         101
5      50                 40         49                 68                27               87         50
6      30                 38         42                 57                24               74         33
7      27                 37         41                 50                22               63         31
8      26                 36         35                 38                22               51         31
9      23                 32         31                 34                18               50         29
10     22                 31         29                 32                18               29         25
11     20                 25         26                 21                18               28         24
12     19                 18         18                 19                17               22         20
13     18                 16         17                 15                15               20         18
14     16                 16         15                 15                15               15         15
15     15                 15         15                 15                15               15         15
16     14                 12         14                 10                13               13         14
17     14                 11         14                 10                12               13         13
18     14                 10         13                 9                 12               10         13
19     14                 7          12                 9                 12               9          13
20     13                 6          11                 8                 11               8          12
21     13                 6          10                 8                 10               7          11
22     12                 5          9                  8                 10               4          11
23     0                  5          8                  8                 9                2          10
24     0                  4          6                  7                 9                1          9

Figure 5.1: Chart of scientists with h-index 15

We have applied our v-index method to rank the scientists given in Table 5.1. The results of the experiment are given in Table 5.2, and it is clear that the v-index has detected the difference between the contributions of the scientists on the basis of the consistency of their received citations.
Table 5.2: Result of scientists with h-index 15

Scientist Name      h-index   Standard Deviation (σ)   V-index (×100)   V-index Time Normalized (×100)   Author Research Age (yrs)
TF Syeda-Mahmood    15        30.6165221501631         49               2                                21
F Hirsch            15        22.696365347782          66               8                                8
MR Azimi-Sadjadi    15        17.8922702118354         84               3                                26
PK Bhattacharya     15        32.2897042414451         47               1                                49
M Bhattacharya      15        62.5699608438426         24               1                                38
DV Gupta            15        72.1920331835336         21               1                                36
AK Aziz             15        145.03648966151          10               0                                51

The highlighted rows in the results table show the most consistent scientists, who are therefore ranked highest. The table shows that the ranking of the scientists is affected by the time normalization factor in the v-index calculation. With the simple v-index calculation MR Azimi-Sadjadi is ranked highest, but when time normalization is applied F Hirsch moves to the top of the index table, which shows that the time factor has a major impact on the ranking of scientists.

In our second set we found 4 scientists with the same h-index value of 16. We applied the same process as for the scientists of Table 5.1.

Table 5.3: Scientists with h-index 16 (citations per publication rank)

PubR   M Heinz   R Bhattacharya   SY Chung   M Khalid
1      135       110              74         151
2      127       49               74         61
3      81        44               67         50
4      67        41               42         43
5      65        40               42         38
6      56        35               37         35
7      41        30               37         33
8      35        26               30         29
9      32        26               29         22
10     29        21               19         21
11     26        20               19         19
12     22        20               18         18
13     22        19               18         18
14     20        17               17         17
15     18        16               17         16
16     17        16               16         16
17     15        14               15         14
18     13        12               9          13
19     13        12               9          13
20     12        11               8          13
21     11        10               8          12
22     10        10               7          11

The chart in Figure 5.2 is drawn using the values given in Table 5.3. This chart also shows the difference between the scientists.
Figure 5.2: Chart of scientists with h-index 16

We applied our proposed v-index, and the results of the experiment are shown in Table 5.4. The values in the result table show that M Khalid is the most contributing scientist of the group, so with the normal v-index calculation he is placed at the top, highlighted in the table. The result table shows that the standard deviation plays the key role in ranking the scientists: a smaller standard deviation results in a better rank for a scientist.

Table 5.4: Result of scientists with h-index 16

Scientist Name    h-index   Standard Deviation (σ)   V-index (×100)   V-index Time Normalized (×100)   Author Research Age (yrs)
M Khalid          16        19.0702733246098         84               3                                32
SY Chung          16        19.631215331319          82               14                               6
R Bhattacharya    16        20.3232498080061         79               3                                27
M Heinz           16        32.6375875519641         49               12                               4

Again, when we introduce the time-normalized v-index, it changes the ranking of the scientists: SY Chung is now placed higher than M Khalid, and even M Heinz obtains a higher position than M Khalid.

5.3- Scientists with same g-index
In our second case we found multiple scientists with the same g-index value. In this section we have extracted the data of 5 scientists with a g-index value of 25 and then applied our v-index method to rank these scientists properly. Table 5.5 lists all the scientists with their cumulative sums of citations. The highlighted row in the table shows the g-index value.
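Returning briefly to the h-index 16 group, the re-ranking reported in Table 5.4 can be reproduced from the tabulated h-index, standard deviation and research age alone. The Python sketch below is an illustration only; the function name `scaled_v` is hypothetical, and the scaling by 100 with rounding follows the granularity step of Chapter 3.

```python
# (name, h-index, standard deviation, author research age in years), from Table 5.4
TABLE_5_4 = [
    ("M Khalid", 16, 19.0702733246098, 32),
    ("SY Chung", 16, 19.631215331319, 6),
    ("R Bhattacharya", 16, 20.3232498080061, 27),
    ("M Heinz", 16, 32.6375875519641, 4),
]


def scaled_v(index_value, sigma, years=None, scale=100):
    """Index divided by sigma (and by years, if given), multiplied by scale."""
    v = scale * index_value / sigma
    if years is not None:
        v /= years
    return v


by_v = sorted(TABLE_5_4, key=lambda r: scaled_v(r[1], r[2]), reverse=True)
by_v_time = sorted(TABLE_5_4, key=lambda r: scaled_v(r[1], r[2], r[3]), reverse=True)

print("v-index order:         ", [name for name, *_ in by_v])
print("time-normalized order: ", [name for name, *_ in by_v_time])
for name, h, sigma, years in TABLE_5_4:
    print(name, round(scaled_v(h, sigma)), round(scaled_v(h, sigma, years)))
```

With the plain v-index M Khalid comes first, while with time normalization the short research ages of SY Chung (6 years) and M Heinz (4 years) move both of them above M Khalid, as described above.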
Table 5.5: Scientists with g-index value 25 (Cit = citations, ∑Ci = cumulative citations)

             L Egghe           SY Chung          R Bhattacharya    T Syeda-Mahmood   M Mahmood
PubR  PubR²  Cit   ∑Ci         Cit   ∑Ci         Cit   ∑Ci         Cit   ∑Ci         Cit   ∑Ci
1     1      75    75          74    74          110   110         92    92          126   126
2     4      44    119         74    148         49    159         60    152         83    209
3     9      43    162         67    215         44    203         58    210         73    282
4     16     41    203         42    257         41    244         57    267         70    352
5     25     32    235         42    299         40    284         56    323         46    398
6     36     30    265         37    336         35    319         49    372         40    438
7     49     27    292         37    373         30    349         39    411         39    477
8     64     26    318         30    403         26    375         23    434         38    515
9     81     25    343         29    432         26    401         20    454         22    537
10    100    24    367         19    451         21    422         17    471         21    558
11    121    23    390         19    470         20    442         16    487         11    569
12    144    21    411         18    488         20    462         15    502         10    579
13    169    21    432         18    506         19    481         14    516         8     587
14    196    21    453         17    523         17    498         13    529         8     595
15    225    19    472         17    540         16    514         11    540         8     603
16    256    19    491         15    555         15    529         11    551         6     609
17    289    19    510         15    570         14    543         11    562         5     614
18    324    17    527         9     579         12    555         10    572         4     618
19    361    17    544         9     588         12    567         10    582         4     622
20    400    16    560         8     596         11    578         10    592         3     625
21    441    15    575         8     604         10    588         9     601         3     628
22    484    15    590         7     611         10    598         9     610         3     631
23    529    15    605         7     618         10    608         7     617         3     634
24    576    13    618         6     624         9     617         7     624         2     636
25    625    13    631         6     630         9     626         7     631         2     638
26    676    13    644         5     635         9     635         6     637         2     640
27    729    13    657         5     640         9     644         6     643         2     642
28    784    12    669         4     644         7     651         5     648         2     644

Figure 5.3 shows the chart of the citation distributions over the publications of these scientists.
The chart is drawn from the values given in Table 5.5, and it clearly shows the differences and the inconsistency among the scientists in the citations they received. This inconsistency is not captured by the existing g-index. To address this weakness of the g-index we applied our proposed v-index to evaluate the work of the scientists. The results of the experiments are shown in Table 5.6. The v-index found the dissimilarity between scientists that the g-index was unable to detect, and the new ranking is given in Table 5.6. Previous methods do not consider the consistency of received citations; the experiment shows that consistency plays a major role in ranking the scientists.

Figure 5.3: Chart of scientists with g-index 25

The highlighted row displays the most consistent researcher among the given scientists.

Table 5.6: Results for scientists with g-index 25

Scientist Name      g-index  Standard Deviation (σ)  V-index  V-index (Time Normalized)  Author Research Age (yrs)
L Egghe             25       11.2829554969327        222      6                          35
SY Chung            25       19.631215331319         127      21                         6
R Bhattacharya      25       20.3232498080061        123      5                          27
T Syeda-Mahmood     25       20.4859566192221        122      6                          20
M Mahmood           25       29.292154291385         85       3                          27

In the above table we can see that the plain v-index and the time-normalized v-index produce different rankings. L Egghe is ranked highest by the plain v-index, while with time normalization SY Chung is placed at the top.

Another set of scientists with the same g-index was found in our dataset. This set contains two scientists with a g-index value of 21. Table 5.7 shows the calculated cumulative sum of the citations received by each scientist; the g-index value is shown in the highlighted row of the table.
We applied our proposed method to this set of data to find the most consistent scientist with respect to the record of received citations.

Table 5.7: Scientists with g-index value 21 (Cit = citations, ∑Ci = cumulative citations)

             BB Bhattacharya   P Dev
PubR  PubR²  Cit   ∑Ci         Cit   ∑Ci
1     1      65    65          108   108
2     4      62    127         37    145
3     9      41    168         37    182
4     16     27    195         32    214
5     25     26    221         25    239
6     36     23    244         23    262
7     49     18    262         22    284
8     64     17    279         20    304
9     81     17    296         19    323
10    100    16    312         18    341
11    121    16    328         14    355
12    144    15    343         13    368
13    169    14    357         13    381
14    196    14    371         12    393
15    225    13    384         12    405
16    256    13    397         10    415
17    289    13    410         7     422
18    324    13    423         7     429
19    361    13    436         7     436
20    400    12    448         7     443
21    441    12    460         7     450
22    484    12    472         6     456
23    529    12    484         6     462
24    576    12    496         6     468
25    625    12    508         5     473

Figure 5.4 gives a graphical representation of the data in Table 5.7. The chart clearly shows that BB Bhattacharya is more consistent than P Dev, whereas the existing g-index places both scientists at the same rank and neglects the consistency of received citations. With our new index, the v-index, we are able to identify the more consistent contributor of the two.

Figure 5.4: Chart of scientists with g-index 21

Table 5.8 shows the calculated results, which confirm that BB Bhattacharya is more consistent than P Dev. Hence our proposed v-index places him at a higher rank, whereas the existing g-index was unable to find the difference and gives both scientists the same rank. The highlighted row shows the more consistent scientist in both cases, plain and time normalized.
Table 5.8: Results for scientists with g-index 21

Scientist Name      g-index  Standard Deviation (σ)  V-index  V-index (Time Normalized)  Author Research Age (yrs)
P Dev               21       14.7586212570298        142      3                          42
BB Bhattacharya     21       12.3199223944341        171      6                          27

5.4- Scientists with same h and g-index

This is the third and last case, in which we consider the rare situation where two or more scientists have the same h-index and the same g-index. Fortunately, we found such a case in our data set: the two scientists A Dev and DP Chakraborty share both indices, each with an h-index of 7 and a g-index of 15. This is the best real-data example of a case where the two existing indices, even taken together, cannot rank the scientists, whereas our v-index method, which is based on consistency, finds a rank difference between them. Table 5.9 shows the citation distribution of both scientists, with the calculated h and g indices in the highlighted rows.

Table 5.9: Scientists with same h and g indices (Cit = citations, ∑Ci = cumulative citations)

             A Dev             DP Chakraborty
PubR  PubR²  Cit   ∑Ci         Cit   ∑Ci
1     1      57    57          138   138
2     4      45    102         26    164
3     9      41    143         19    183
4     16     22    165         9     192
5     25     21    186         8     200
6     36     9     195         7     207
7     49     9     204         7     213
8     64     6     210         5     218
9     81     5     215         4     222
10    100    5     220         4     226
11    121    5     225         3     229
12    144    5     230         2     231
13    169    5     235         2     233
14    196    4     239         1     234
15    225    4     243         1     235
16    256    4     247         1     236
17    289    3     250         1     237
18    324    3     253         1     238
19    361    2     255         0     238
20    400    2     257         0     238

Figure 5.5 shows a graphical view of the citations received by the publications of both scientists. The chart shows the inconsistency of received citations and the difference between the two scientists, which the existing methods were unable to find even collectively. Our methodology identifies the most consistent scientist effectively.
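To make the tie concrete, the h-index of both scientists can be recomputed from the citation columns of Table 5.9. The sketch below is only an illustration under the standard definition (h is the largest rank at which a paper still has at least that many citations); the function name `h_index` is hypothetical.

```python
def h_index(citations):
    """Largest rank h such that the paper at rank h has at least h citations."""
    ranked = sorted(citations, reverse=True)  # most cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break  # later ranks only have fewer citations
    return h


# Citation columns of Table 5.9 (illustrative input only).
a_dev = [57, 45, 41, 22, 21, 9, 9, 6, 5, 5, 5, 5, 5, 4, 4, 4, 3, 3, 2, 2]
dp_chakraborty = [138, 26, 19, 9, 8, 7, 7, 5, 4, 4, 3, 2, 2, 1, 1, 1, 1, 1, 0, 0]

print(h_index(a_dev), h_index(dp_chakraborty))
```

Both calls return 7 even though the two citation profiles differ markedly (A Dev's citations are spread over several papers, DP Chakraborty's are concentrated in one), which is the situation the v-index is meant to resolve.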
Table 5.10 shows the results after processing the data of both scientists with the v-index methodology. We found a clear difference between the scientists with respect to consistency in received citations. In this case the v-index outperforms the existing h and g indices.

Figure 5.5: Chart of scientists with same h and g indices

Table 5.10: Results for scientists with same h and g indices

Scientist        Index  Value  Standard Deviation (σ)  V-index  V-index (Time Normalized)  Author Research Age (yrs)
A Dev            h      7      13.1708854000869        53       2                          23
A Dev            g      15     13.1708854000869        114      5                          23
DP Chakraborty   h      7      25.6166798339341        27       1                          23
DP Chakraborty   g      15     25.6166798339341        59       3                          23

5.5- Summary

In this chapter we analysed the calculated results. We considered three cases and, for each, computed both the v-index and the time-normalized v-index on the dataset. In the first case, scientists with the same h-index were considered, and both versions of the v-index distinguished between them and ranked them in terms of consistency. Likewise, in the other two cases (scientists with the same g-index, and scientists with the same h and g indices) we applied the v-index with and without time normalization, and time normalization had a major impact on the ranking. The analysis shows that the order of the ranking changes when the time normalization factor is applied, as can be seen from the highlighted rows in the results tables.
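The two quantities compared throughout this chapter can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used for the experiments: the v-index is taken as the base index divided by the standard deviation of citations, the factor of 100 and the rounding to whole numbers are inferred from the tabulated values rather than stated in the text, and the time-normalized value is assumed to be the v-index divided by the author's research age. The function names and the `scale` parameter are hypothetical. The inputs are the h-index, standard deviation and research age columns of Table 5.4.

```python
def v_index(index_value, std_dev, scale=100):
    """Base index (h or g) divided by citation standard deviation.

    The scale of 100 is an assumption inferred from the result tables.
    """
    return index_value * scale / std_dev


def v_index_time_normalized(index_value, std_dev, research_age, scale=100):
    """v-index divided by research age in years (assumed normalization)."""
    return v_index(index_value, std_dev, scale) / research_age


# (name, h-index, standard deviation, research age) taken from Table 5.4.
rows = [
    ("M Khalid", 16, 19.0702733246098, 32),
    ("SY Chung", 16, 19.631215331319, 6),
    ("R Bhattacharya", 16, 20.3232498080061, 27),
    ("M Heinz", 16, 32.6375875519641, 4),
]

for name, h, sd, age in rows:
    print(name, round(v_index(h, sd)), round(v_index_time_normalized(h, sd, age)))

# Rankings under each variant, highest value first.
by_v = [r[0] for r in sorted(rows, key=lambda r: v_index(r[1], r[2]), reverse=True)]
by_vt = [r[0] for r in sorted(
    rows, key=lambda r: v_index_time_normalized(r[1], r[2], r[3]), reverse=True)]
print(by_v)
print(by_vt)
```

Under these assumptions the sketch reproduces the rounded Table 5.4 values (84, 82, 79, 49 and 3, 14, 3, 12) and shows the reordering discussed above: M Khalid leads on the plain v-index, while SY Chung and M Heinz move ahead of him once research age is taken into account.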
Chapter 6
Research Contributions and Conclusions

6.1 Research Contribution

The major contributions of this work are (1) highlighting the importance of consistent citations across publications for researcher productivity indexing, and (2) proposing a simple method for calculating the variation among the citations received by a researcher's papers. To the best of our knowledge, this is the first work that considers the citation variation of papers for researcher productivity indexing.

6.1.1 Productivity and Efficiency with Consistency

Highlighting the importance of consistency in received citations is the major contribution of our research work. We have shown how consistency plays an important role in ranking scientists and in assessing the productivity and efficiency of their published work. Combining the scientist's research age with the v-index refines the ranking further by favouring scientists who reached a given index value in a shorter time.

6.1.2 Proposal of Simple Method

In our research we have proposed a simple method for calculating a ranking index for a scientist's work. It is easy to apply to any data set and can be used with any existing index. Its main purpose is to add a consistency and efficiency factor to an existing index in order to obtain a new index value. We have demonstrated this by applying it to the existing h and g indexes.

6.2 Conclusions

Existing methods for indexing researchers or groups do not consider the important factor of consistent productivity. Adding a consistent-productivity factor, expressed as the citation variation of a researcher's papers, is novel; it rests on the premise that a researcher with a more consistent citation record is more productive.
The idea of consistent productivity is quite general and can be applied to any existing researcher productivity index by simply dividing its value by the citation variation, calculated as the standard deviation, as we did here for the h and g index values. Time normalization, like the m-quotient for research career length [6], can easily be performed for citation variation by considering the research age of an author or scientist. Including this time factor improved the results by identifying the more efficient scientists, that is, those who achieved a given ranking in a shorter period of time. The method could be improved further by considering the year-wise citations of each publication. This is an open enhancement: ranking a scientist over the course of his career by the citations that specific publications receive each year. It would show the quality of a scientist's contributions if a publication keeps receiving more citations every year.

References

[1] Adler, R., Ewing, J. and Taylor, P. (2008) “Citation Statistics.” A report from the International Mathematical Union (IMU).
[2] Alonso, S., Cabrerizo, F.J., Herrera-Viedma, E. and Herrera, F. (2010) “hg-index: A new index to characterize the scientific output of researchers based on the h- and g-indices.” Scientometrics, Vol. 82(2), pp. 391-400.
[3] Bouyssou, D. and Marchant, T. (2010) “Consistent bibliometric rankings of authors and of journals.” Journal of Informetrics, Vol. 4, pp. 365-378.
[4] Braun, T., Glänzel, W. and Schubert, A. (2006) “A Hirsch-type index for journals.” Scientometrics, Vol. 69(1), pp. 169-173.
[5] Burrell, Q.L. (2007) “Hirsch’s h-index: a stochastic model.” Journal of Informetrics, Vol. 1(1), pp. 16-25.
[6] Burrell, Q.L.
(2007) “On the h-index, the size of the Hirsch core and Jin’s A-index.” Journal of Informetrics, Vol. 1(2), pp. 170-177.
[7] Cabrerizo, F.J., Alonso, S., Herrera-Viedma, E. and Herrera, F. (2009) “q2-Index: Quantitative and Qualitative Evaluation Based on the Number and Impact of Papers in the Hirsch Core.” Journal of Informetrics, Vol. 4(1), pp. 23-28.
[8] Costas, R. and Bordons, M. (2008) “Is g-index better than h-index? An exploratory study at the individual level.” Scientometrics, Vol. 77(2), pp. 267-288.
[9] Egghe, L. (2006) “Theory and Practice of the g-index.” Scientometrics, Vol. 69(1), pp. 131-152.
[10] Egghe, L. (2006) “An Improvement to H-index: The G-index.” ISSI Newsletter, Vol. 2(1), pp. 8-9.
[11] Egghe, L. and Rousseau, R. (2008) “An h-index weighted by citation impact.” Information Processing and Management, Vol. 44(2), pp. 770-780.
[12] Egghe, L. and Rousseau, R. (2006) “An informetric model for the Hirsch-index.” Scientometrics, Vol. 69(1), pp. 121-129.
[13] Garfield, E. (2001) “Impact factors, and why they won’t go away.” Nature, Vol. 411(6837), p. 522.
[14] Glänzel, W. (2006) “On the h-index – a mathematical approach to a new measure of publication activity and citation impact.” Scientometrics, Vol. 67, pp. 315-321.
[15] Hirsch, J.E. (2005) “An index to quantify an individual’s scientific research output.” Proceedings of the National Academy of Sciences of the United States of America, Vol. 102, pp. 16569-16572.
[16] Jin, B.H., Liang, L.M., Rousseau, R. and Egghe, L. (2007) “The R- and AR-indices: Complementing the h-index.” Chinese Science Bulletin, Vol. 52(6), pp. 855-863.
[17] Kosmulski, M.
(2006) “A new Hirsch-type index saves time and works equally well as the original h-index.” International Society for Scientometrics and Informetrics (ISSI).
[18] Liang, L. (2006) “h-index sequence and h-index matrix: Constructions and applications.” Scientometrics, Vol. 69(1), pp. 153-159.
[19] Rousseau, R. (2006) “Simple models and the corresponding h- and g-indexes.” http://hdl.handle.net/1942/944
[20] Sidiropoulos, A., Katsaros, D. and Manolopoulos, Y. (2007) “Generalized Hirsch h-index for disclosing latent facts in citation networks.” Scientometrics, Vol. 72(2), pp. 253-280.
[21] M Jagdesh Kumar (2011) “Evaluating Scientists: Citations, Impact Factor, h-Index, Online Page Hits and What Else?” http://mamidala.wordpress.com/2011/07/10/25/
[22] Visual Studio 2008. http://en.wikipedia.org/wiki/Microsoft_Visual_Studio
[23] Editorial, Science Direct (2007) “What is a Problem statement.” Library & Information Science Research, Vol. 29, pp. 307-309.
[24] Garfield, E. (1955) “Citation Indexes for Science: A New Dimension in Documentation through Association of Ideas.” Science, Vol. 122(3159), pp. 108-111.
[25] Garfield, E. (1973) “Citation Frequency as a Measure of Research Activity and Performance.” Essays of an Information Scientist, Vol. 1, pp. 406-408.
[26] Article on standard deviation in Wikipedia. http://en.wikipedia.org/wiki/Standard_deviation
[27] Article on standard deviation and variance. http://www.managers-net.com/stddev.html