This article deals with an application of the h-index to topics. The h-index as defined by Hirsch is the number h such that, for a general group of papers, h papers received at least h citations while the other papers received no more than h citations. Banks extended this definition to topics. In this article the Web of Science, Google Scholar and Exalead are used to determine h-indices for topics of interest to developing countries. It is shown that there are important differences between searches performed in a specialized database (here the Web of Science) and on the public, freely available Web. It is further shown that country rankings for articles in the Web of Science and Google Scholar may differ greatly. Correlation coefficients between Google Scholar article rankings and Exalead website rankings are generally higher than those between the Web of Science and Exalead.
Contents
Introduction
The Hirsch index
Aim
Methods
Basic results
Relation between the number of articles and the h-index
Searches on the Web
Conclusion
Introduction
STIMULATE stands for Scientific and Technological Information Management in Universities and Libraries: an Active Training Environment. It is an international training programme in information management, supported by the Flemish Interuniversity Council (VLIR) and aimed at young scientists and professionals from developing countries. The programme has a dual purpose: it intends to develop the personal professional skills of the participants, and the participants are actively encouraged to transfer their newly acquired knowledge and skills to their colleagues and other stakeholders in their home country (Nieuwenhuysen, 2001).
One of the higher-level STIMULATE courses introduces students to the use of the World Wide Web and of Thomson's Web of Knowledge as tools for library management and research evaluation. Research evaluation should, indeed, not be a matter for administrators alone: information professionals should actively participate in such exercises. This article is the result of the active training part of this particular course. It shows how the Hirsch index, defined in the next section, can be applied to topics of interest to developing countries. It also illustrates differences between data available in professional databases such as the Web of Knowledge and data that are freely available on the Internet.
The Hirsch index
Recently J. Hirsch proposed a new index for evaluating a scientist's lifetime achievements (Hirsch, 2005). A scientist's h-index is $h_0$ if $h_0$ is the largest natural number such that the first $h_0$ publications in this scientist's publication list, ranked according to the number of lifetime citations, each received at least $h_0$ citations. This new index soon attracted a lot of interest, and it was readily observed that simple variations of the basic idea can be used in journal citation analysis (Braun, et al., 2006; Rousseau, 2006) and, more generally, in many source-item relationships (Egghe and Rousseau, 2006; Liu and Rousseau, 2007). Variations on the h-index idea were soon proposed, notably by Egghe (2006) and Jin (2006). An interesting twist on the Hirsch index idea has been provided by Banks (2006), who proposes using topics instead of scientists and studying the relation between the articles related to a topic and the number of citations they received, in the same manner as Hirsch had done for scientists. The idea is that when the period over which citations are received is restricted to the recent past, a high Hirsch index points to a hot topic.
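The definition can be made concrete with a short function. The following is a minimal Python sketch of our own (the name `h_index` is hypothetical, and it makes no claim about how any database computes the index): it takes one citation count per publication, or, following Banks, one per article retrieved for a topic, ranks the counts from most to least cited and returns the largest h for which the first h items each have at least h citations.

```python
def h_index(citations):
    """Largest h such that h of the given items have at least h citations each.

    `citations` is any iterable of non-negative citation counts: a scientist's
    papers (Hirsch) or all articles retrieved for a topic (Banks).
    """
    ranked = sorted(citations, reverse=True)  # most-cited item first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break  # counts only decrease from here, so no larger h is possible
    return h


print(h_index([10, 8, 5, 4, 3]))
```

For the list 10, 8, 5, 4, 3 the fourth item still has at least 4 citations but the fifth has fewer than 5, so the function returns 4.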
Aim
We intend to demonstrate the use of the Web of Science (WoS) and the Internet for the determination of h-indices for topics, in particular topics of interest to developing countries. We will, however, not restrict ourselves to recent or hot articles, but will consider the general interest in such topics. We also investigate differences in the availability of articles or Web sites related to such topics in the Web of Science and on the public Internet.
Methods
Choice of topics
Students performed a search on two topics: the first one is her home country [note: in order to avoid writing his/her all the time, we use her because more female than male students were involved], the second one is a topic of particular interest to her country. More specifically, each student member of the group suggested a topic for which she thought that her country would be among the ten largest producers of articles according to the Web of Science. If that turned out to be incorrect, we searched the Web of Science for such a topic. We finally succeeded for each country except Ghana, where the best topic (malaria and pregnan*) places Ghana at rank 13 in the world. There were two students from the Philippines and two from Vietnam in the group, which explains why these countries have two entries. The best topic we found for Zambia turned out to be one for which Zimbabwe is the largest producer of articles in the world (namely Zambezi*). For this reason Zimbabwe has two topics. The list of topics is shown in Table 1.
Table 1: Countries and their special topics.

| Country | Topic |
|---|---|
| Bangladesh | arsenic AND tube* |
| Cuba | D-003 |
| Ethiopia | tef AND (Zucc* OR Eragrostis) |
| Ghana | malaria AND pregnan* |
| Philippines | Pinatubo |
| Philippines | tilapia AND culture |
| Sri Lanka | water scarcity |
| Sudan | dromedary* |
| Tanzania | Kiliman* |
| Uganda | "Lake Victoria" |
| Vietnam | Mekong |
| Vietnam | Indochina |
| Zambia | Zambezi* |
| Zimbabwe | Zambezi* |
| Zimbabwe | cattle AND tick* |

We now provide some information about those topics that may be unfamiliar to people not living in the topic's country. The query arsenic AND tube* refers to the problem of excessive levels of arsenic in drinking water in Southeast Asia, especially in Bangladesh (Ahmed, et al., 2006). D-003 is a mixture of very high molecular weight aliphatic acids purified from sugar cane wax, showing cholesterol-lowering and anti-platelet effects (Gamez, et al., 2000). The search string tef AND (Zucc* OR Eragrostis) refers to tef (Eragrostis tef (Zucc.) Trotter), the most important cereal crop in Ethiopia, a grain somewhat similar to millet. Pinatubo refers, of course, to Mount Pinatubo, the volcano on the island of Luzon (the Philippines) that erupted in June 1991. Finally, tilapia is a group of plant-eating freshwater fish, easily cultivated in ponds. We note that the study of geographic locations situated in (or in the neighbourhood of) the country, such as Mount Pinatubo, Kilimanjaro, Lake Victoria, the Zambezi, the Mekong and Indochina, gives local researchers an advantage over foreign ones, bringing these developing countries high in the rankings (the exact rankings are given in the results section).
Data in the Web of Science were collected in November 2006 according to the following procedure.
Procedure
1. Open the Web of Science, go to General Search, and do not change the year setting (hence use the period from 1972 to 2006).
2. First use your country for a TOPIC search (NOT an address search!) in order to find articles studying your country. You may or may not truncate (using *) in order to include the citizens of your country or typical events or places related to your country.
3. Take note of the total number of articles found.
4. ANALYSE the result by COUNTRY, and copy the top 10 to an MS Excel file. When searching for your own country, it will probably be number 1. When doing a special topic search, your country may not be among the top 10. In that case, try to find a better topic: one for which your country does make the top 10.
5. Go BACK to the search results.
6. Find the Hirsch index of this list by sorting it by times cited (menu on the right of the page, above ANALYSE): the Hirsch index is the last rank r in the sorted list whose article still has at least r citations. Note it down.
7. Repeat steps 2 to 6 for your special topic (see Table 1).
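Step 4 relies on the Web of Science ANALYSE tool. The ranked lists in the appendix give tied countries the same rank and skip the following rank (for example 5 bangladesh, 5 canada, 7 u.k.), so a "top 10" can contain more than ten countries. The Python sketch below illustrates that ranking rule on a hypothetical list of records, each reduced to the set of countries it is assigned to; the record format and the function name are our own assumptions, and nothing is implied about how the Web of Science implements ANALYSE.

```python
from collections import Counter


def country_ranking(records, top=10):
    """Rank countries by number of records; tied countries share a rank.

    `records` is a hypothetical list of per-article country sets. An article
    counts once for every country in its set. Ties get the same rank and the
    next rank is skipped (1, 2, 2, 4, ...), so ties at the cut-off are kept.
    """
    counts = Counter(c for countries in records for c in set(countries))
    # Highest count first; alphabetical order only to make ties deterministic
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    ranking = []
    for position, (country, n) in enumerate(ordered, start=1):
        if ranking and n == ranking[-1][2]:
            rank = ranking[-1][0]  # same count as previous country: reuse its rank
        else:
            rank = position
        if rank > top:
            break
        ranking.append((rank, country, n))
    return ranking


example_records = [{"bangladesh", "u.s."}, {"u.s."}, {"u.s.", "japan"}, {"japan"}, {"india"}]
for rank, country, n in country_ranking(example_records):
    print(rank, country, n)
```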
Note that we did not use the recently added Citation Report option, for two reasons: first, students learn more if they have to determine the Hirsch index themselves; second, this option does not (at least for the moment) work for more than 10,000 search results.
Basic results
In this section we show the results of the searches.
Search for countries
| Country | Number of articles | Rank | h-index |
|---|---|---|---|
| Bangladesh | 8,199 | 1 | 83 |
| Cuba | 9,427 | 2 (U.S. first) | 53 |
| Ethiopia | 8,654 | 1 | 70 |
| Ghana | 5,759 | 1 | 55 |
| Philippines | 8,896 | 2 (U.S. first) | 69 |
| Sri Lanka | 5,423 | 1 | 58 |
| Sudan | 7,564 | 1 | 64 |
| Tanzania | 9,132 | 1 | 77 |
| Uganda | 5,868 | 1 | 73 |
| Vietnam | 12,752 | 2 (U.S. first) | 97 |
| Zambia | 4,001 | 1 | 52 |
| Zimbabwe | 6,321 | 1 | 55 |

The column Rank refers to the position of the home country (shown in the first column) in the list of countries ranked by the number of articles found. A topic search for Vietnam (search term: Vietnam*) brought the U.S. to the top. Given recent history, this may not come as a surprise. Students were encouraged to include not only the name of the country but also that of its inhabitants. In this way the searches Philippines OR Filipino OR Filipina and Cuba OR Cuban OR Cubans also brought the U.S. to the top of the country list. This did not happen for the other countries. Cuba* yielded too many articles related to cubane (a chemical substance); hence the data refer to the query Cuba OR Cuban OR Cubans.
Topic searches
| Topic | Country | Rank | Number of articles | h-index |
|---|---|---|---|---|
| arsenic AND tube* | Bangladesh | 5 | 381 | 34 |
| D-003 | Cuba | 1 | 53 | 11 |
| tef AND (Zucc* OR Eragrostis) | Ethiopia | 1 | 132 | 13 |
| malaria AND pregnan* | Ghana | 13 | 1,090 | 53 |
| Pinatubo | Philippines | 9 | 1,156 | 67 |
| tilapia AND culture | Philippines | 6 | 1,507 | 27 |
| "water scarcity" | Sri Lanka | 9 | 416 | 16 |
| dromedar* | Sudan | 6 | 1,507 | 27 |
| Kiliman* | Tanzania | 2 | 275 | 19 |
| Lake Victoria | Uganda | 3 | 597 | 39 |
| Indochina | Vietnam | 7 | 892 | 34 |
| Mekong | Vietnam | 1 | 520 | 17 |
| Zambezi* | Zambia | 5 | 428 | 24 |
| Zambezi* | Zimbabwe | 1 | 428 | 24 |
| cattle AND tick* | Zimbabwe | 8 | 1,840 | 46 |

Again, the rank refers to the position of the country shown in column two in the list of countries ranked by number of publications. Hirsch indices for countries range from 52 to 97; for these special topics they range from 11 to 67. That the Hirsch indices for topics are rather low is not really surprising, as these topics refer to scientific subfields that are certainly not mainstream. On the contrary, they were chosen so as to reflect the special interests of these developing countries. Complete lists of country rankings for each topic are given in the appendix. Numerical results for England and Scotland (numbers for Wales and Northern Ireland were negligible) were added together to obtain the U.K. ranking.
Relation between the number of articles and the h-index
One expects that the higher the number of published articles, the higher the h-index. We checked whether this holds for this small data set, and which function best describes the relation. Figure 1 shows the data for the country search, Figure 2 for the special topics search. In the first case a linear relation between the logarithm of the number of articles and the h-index is acceptable: $y = 70.11\,x - 203.85$ (R = 0.69) is the equation of the best-fitting line, with $y$ the h-index and $x = \log_{10}(\text{number of articles})$. For the second case a power law (exponent 2.87) between log(publications) and the h-index describes the data better.
Figure 1: Relation between the logarithm of the number of published articles and the h-index for countries.
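The country fit can be recomputed from the numbers in the country table. The Python sketch below is our own ordinary least-squares computation, not necessarily the spreadsheet routine behind Figure 1. It assumes a base-10 logarithm and reads the reported R as Pearson's correlation coefficient rather than R²; under these assumptions the printed slope, intercept and R come out close to the reported 70.11, -203.85 and 0.69 (small differences are due to rounding).

```python
import math

# (number of articles, h-index) for each country, copied from the country table
COUNTRY_DATA = {
    "Bangladesh": (8199, 83),
    "Cuba": (9427, 53),
    "Ethiopia": (8654, 70),
    "Ghana": (5759, 55),
    "Philippines": (8896, 69),
    "Sri Lanka": (5423, 58),
    "Sudan": (7564, 64),
    "Tanzania": (9132, 77),
    "Uganda": (5868, 73),
    "Vietnam": (12752, 97),
    "Zambia": (4001, 52),
    "Zimbabwe": (6321, 55),
}


def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept, plus Pearson's r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# x = base-10 logarithm of the number of articles, y = h-index
xs = [math.log10(articles) for articles, _ in COUNTRY_DATA.values()]
ys = [h for _, h in COUNTRY_DATA.values()]
slope, intercept, r = fit_line(xs, ys)
print(f"y = {slope:.2f} x {intercept:+.2f}  (R = {r:.2f})")
```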
Figure 2: Relation between the logarithm of the number of published articles and h-index for special topics.
Figure 2 shows the best-fitting power-law relation between $x = \log_{10}(\text{number of articles})$ and the h-index ($y$). Its equation is $y = 1.67\,x^{2.87}$ ($R^2 = 0.74$). There is one clear outlier (the highest h-index), corresponding to the results for Mount Pinatubo. Clearly, articles related to Mount Pinatubo and its eruption attracted relatively more citations than articles related to the other topics studied here. Removing this point yields Figure 3. Here the relation can again be described by a linear equation, $y = 23.1\,x - 34.14$ (R = 0.74), but remember that $x = \log_{10}(\text{number of articles})$.
Figure 3: Relation between the logarithm of the number of published articles and h-index for special topics (without Pinatubo).
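A power law $y = c\,x^k$ is commonly fitted as a straight line on a log-log scale, $\ln y = \ln c + k \ln x$, which is also how spreadsheet power trendlines are typically computed. The text does not say which software produced Figure 2, so this is an assumption. The Python sketch below fits a power law that way and checks itself on synthetic points lying exactly on $y = 1.67\,x^{2.87}$; to apply it to the special topics, pass $x = \log_{10}(\text{number of articles})$ and $y$ = h-index from the topic table.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = c * x**k by least squares on ln(y) = ln(c) + k * ln(x).

    All xs and ys must be positive. Returns (c, k, r2), where r2 is the squared
    correlation coefficient of the log-log fit.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sxx = sum((a - mean_x) ** 2 for a in lx)
    syy = sum((b - mean_y) ** 2 for b in ly)
    k = sxy / sxx                      # exponent = slope on the log-log scale
    c = math.exp(mean_y - k * mean_x)  # prefactor from the log-log intercept
    r2 = sxy * sxy / (sxx * syy)
    return c, k, r2


# Self-check on synthetic points lying exactly on y = 1.67 * x**2.87
# (x values chosen in the range of log10(number of articles) for the topics).
xs = [1.7, 2.1, 2.6, 3.0, 3.3]
ys = [1.67 * x ** 2.87 for x in xs]
c, k, r2 = fit_power_law(xs, ys)
print(f"y = {c:.2f} x^{k:.2f}  (R^2 = {r2:.2f})")
```

Removing an outlier such as Pinatubo simply means dropping that (x, y) pair before calling the function.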
Searches on the Web
We performed the same topic searches (only for the special topics, not for the countries) on the Web. As Google does not allow truncation (and, moreover, we found that the result counts shown by Google are highly unreliable), we used Exalead, a French search engine that does allow truncation. Yet Exalead's counts of retrieved Web sites are not completely stable either. More precisely, we used search queries of the type
arsenic AND tube* AND site:uk
These queries combine a search topic from Table 1 with each of the countries listed in the appendix. For the United States we searched site:edu. Hence we began with arsenic AND tube* AND site:edu and ended with cattle AND tick* AND site:jp (see appendix). Of course, each participant performed the searches related to her own country's topic. We recorded the number of retrieved Web sites.
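The series of queries is mechanical enough to generate. The Python sketch below only builds the query strings; it does not contact Exalead, and the result counts still have to be read off by hand. The country-to-domain table is hypothetical: the text confirms site:edu for the United States and shows site:uk and site:jp, while the remaining entries are the usual country-code top-level domains and are our own assumption.

```python
# Hypothetical country-label -> domain table (labels as used in the appendix).
# Only edu (U.S.), uk and jp appear in the text; the rest are standard ccTLDs.
DOMAINS = {
    "u.s.": "edu",
    "u.k.": "uk",
    "japan": "jp",
    "germany": "de",
    "india": "in",
    "bangladesh": "bd",
    "canada": "ca",
    "china": "cn",
    "poland": "pl",
    "spain": "es",
}


def exalead_queries(topic, countries):
    """Build one 'topic AND site:domain' query per country, in the given order."""
    return {country: f"{topic} AND site:{DOMAINS[country]}" for country in countries}


# The ten countries listed for the Bangladesh topic in the appendix
for country, query in exalead_queries("arsenic AND tube*", DOMAINS).items():
    print(f"{country:12} {query}")
```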
Next, we did the same in Google Scholar, as an example of a freely available scientific Web site. As truncation is not possible in this database, we slightly altered the queries: tube* became tube OR tubes; Zambezi* simply became Zambezi; tick* became tick OR ticks; Kiliman* became Kilimanjaro OR Kilimandjaro; Zucc* was not used in the query; dromedary* became dromedarius OR dromedary; pregnan* became pregnant OR pregnancy.
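The substitutions just listed can be written as a lookup table applied in a single pass, so that a replacement is never itself rewritten. In this Python sketch the mapping comes from the text; wrapping each OR-expansion in parentheses, and handling Zucc* by dropping the whole parenthesised alternative (which gives the query tef AND Eragrostis listed in Table 4), are our own choices, since the exact strings typed into Google Scholar are not given. The function name is hypothetical.

```python
import re

# Truncated WoS/Exalead terms and their Google Scholar replacements (from the text)
EXPANSIONS = {
    "(Zucc* OR Eragrostis)": "Eragrostis",  # Zucc* was simply dropped
    "tube*": "(tube OR tubes)",
    "Zambezi*": "Zambezi",
    "tick*": "(tick OR ticks)",
    "Kiliman*": "(Kilimanjaro OR Kilimandjaro)",
    "dromedary*": "(dromedarius OR dromedary)",
    "pregnan*": "(pregnant OR pregnancy)",
}
# One alternation over all escaped keys, so every term is replaced exactly once
_PATTERN = re.compile("|".join(re.escape(term) for term in EXPANSIONS))


def to_scholar_query(query):
    """Rewrite a truncated query into an equivalent query without truncation."""
    return _PATTERN.sub(lambda m: EXPANSIONS[m.group(0)], query)


for q in ["arsenic AND tube*", "tef AND (Zucc* OR Eragrostis)", "malaria AND pregnan*"]:
    print(q, "->", to_scholar_query(q))
```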
Searches for "D-003" gave many false hits. In fact, Exalead returned only one correct site. For this reason we did not take this query into account.
Once results were obtained, the countries (the same as for the WoS searches) were ranked, and Spearman rank correlation coefficients (denoted RS), using average ranks in case of ties, were calculated between WoS and Exalead, between WoS and Google Scholar, and between Exalead and Google Scholar. Results are shown in Table 4.
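Spearman's coefficient with average ranks for ties is Pearson's correlation computed on the ranks. The Python sketch below shows that computation for two lists of counts for the same countries in the same order; it is our own illustration (function names hypothetical), and the example counts are invented, not data from this study. It assumes neither list consists entirely of tied values, since the correlation is then undefined.

```python
import math


def average_ranks(values):
    """Rank from largest (rank 1) to smallest; tied values get their mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 0-based positions i..j -> mean 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's RS: Pearson correlation of the average ranks of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Invented counts for five countries in two sources, in the same country order
wos_counts = [381, 120, 95, 95, 40]
web_counts = [5000, 800, 1200, 300, 300]
print(round(spearman(wos_counts, web_counts), 3))
```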
Table 4: Spearman rank correlation coefficients (RS) between the country rankings obtained from the Web of Science, Exalead and Google Scholar.

| Country | Topic | RS(WoS-Exalead) | RS(WoS-Google Scholar) | RS(Google Scholar-Exalead) |
|---|---|---|---|---|
| Bangladesh | arsenic AND tube* | 0.349 | 0.518 | 0.499 |
| Ethiopia | tef AND Eragrostis | 0.517 | 0.427 | 0.921 |
| Ghana | malaria AND pregnan* | 0.358 | 0.450 | 0.897 |
| Philippines | Pinatubo | 0.685 | 0.900 | 0.717 |
| Philippines | tilapia AND culture | -0.006 | 0.190 | 0.687 |
| Sri Lanka | "water scarcity" | 0.756 | 0.707 | 0.830 |
| Sudan | dromedar* | -0.321 | 0.000 | 0.560 |
| Tanzania | Kiliman* | 0.200 | 0.670 | 0.782 |
| Uganda | Lake Victoria | 0.479 | 0.018 | 0.620 |
| Vietnam | Mekong | 0.539 | 0.321 | 0.321 |
| Vietnam | Indochina | 0.103 | 0.527 | 0.273 |
| Zambia/Zimbabwe | Zambezi* | 0.140 | 0.235 | 0.887 |
| Zimbabwe | cattle AND tick* | 0.345 | 0.648 | 0.685 |
| Average | | 0.319 | 0.432 | 0.668 |

Low correlation coefficients between the country rankings obtained from the Web of Science and from Exalead can be explained by two factors. One is the generally low presence of developing countries on the Internet, which places these countries low in the Exalead rankings. The second is that we searched with English terms, while many of the countries involved use and prefer other languages. This is particularly true for the dromedary query, which involves countries such as Sudan, the United Arab Emirates, Saudi Arabia, Iran, Egypt and Morocco.
Although Google Scholar and the Web of Science are both scientific databases, their average correlation is considerably lower than that between Google Scholar and Exalead. Clearly, the low Web presence of developing countries is the main factor here. We do not believe that it makes sense to perform statistical tests on this type of data, but just for the record we mention that for ten data points and a one-sided test at the 5 percent level, RS values above 0.56 reject the null hypothesis of no correlation. This happens in only 15 of the 39 cases (38 percent), mainly for the Google Scholar-Exalead comparisons.
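The column averages of Table 4 and the count of coefficients above the critical value are simple arithmetic that can be checked directly. The Python snippet below re-enters the 39 values from Table 4 and uses the 0.56 threshold quoted above; it should reproduce the averages 0.319, 0.432 and 0.668 and the 15 of 39 cases (38 percent).

```python
# RS values from Table 4, one tuple per topic:
# (WoS-Exalead, WoS-Google Scholar, Google Scholar-Exalead)
TABLE4 = [
    (0.349, 0.518, 0.499),   # arsenic AND tube*
    (0.517, 0.427, 0.921),   # tef AND Eragrostis
    (0.358, 0.450, 0.897),   # malaria AND pregnan*
    (0.685, 0.900, 0.717),   # Pinatubo
    (-0.006, 0.190, 0.687),  # tilapia AND culture
    (0.756, 0.707, 0.830),   # "water scarcity"
    (-0.321, 0.000, 0.560),  # dromedar*
    (0.200, 0.670, 0.782),   # Kiliman*
    (0.479, 0.018, 0.620),   # Lake Victoria
    (0.539, 0.321, 0.321),   # Mekong
    (0.103, 0.527, 0.273),   # Indochina
    (0.140, 0.235, 0.887),   # Zambezi*
    (0.345, 0.648, 0.685),   # cattle AND tick*
]
CRITICAL = 0.56  # one-sided 5 percent critical value for ten data points (from the text)

averages = [sum(column) / len(column) for column in zip(*TABLE4)]
significant = sum(value > CRITICAL for row in TABLE4 for value in row)
total = 3 * len(TABLE4)
print([round(a, 3) for a in averages], significant, total, f"{significant / total:.0%}")
```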
Conclusion
Following Banks (2006), we have shown that h-indices can be calculated for topic searches. Only by gaining more experience can the real importance of the h-index, be it for scientists, journals or topics, be judged. We hope we have made a contribution towards this aim. As shown by our small group, scientists and students from any country can contribute to this. It is clear that, whatever the context, the database or search engine used will heavily influence the results obtained.
About the authors
Maisa Abdellatif Abbas Ali (Sudan), Kodjo Atiso (Ghana), Yuda Julius Chatama (Tanzania), Jawet Chiguvare (Zimbabwe), Rina Diaron (Philippines), Loan Thi Phuong Dinh (Vietnam), Matthew Lubuulwa (Uganda), Dilruba Mahbuba (Bangladesh), Urenika Samanthi Millawithanachchi (Sri Lanka), Thi Thu Nga Nguyen (Vietnam), Grace Quijano (the Philippines), Zulia Ramírez Céspedes (Cuba), Ronald Rousseau (Belgium), Molach Weldemichael (Ethiopia), Hazel M. Zulu (Zambia).
All correspondence should be sent to Ronald Rousseau, e-mail = ronald [dot] rousseau [at] ua [dot] ac [dot] be
Acknowledgements
The members of the STIMULATE 6 Group express their sincere thanks to Professor Paul Nieuwenhuysen (VUB, Brussels) and the VLIR (Flemish Interuniversity Council) who made this multinational collaboration possible.
References
M.F. Ahmed, S. Ahuja, M. Alauddin, S.J. Hug, J.R. Lloyd, A. Pfaff, T. Pichler, C. Saltikov, M. Stute and A. van Geen, 2006. Ensuring safe drinking water in Bangladesh, Science, volume 314, number 5806 (15 December), pp. 1687-1688.
Michael G. Banks, 2006. An extension of the Hirsch index: Indexing scientific topics and compounds, Scientometrics, volume 69, number 1, pp. 161-168.
Tibor Braun, Wolfgang Glänzel and András Schubert, 2006. A Hirsch-type index for journals, Scientometrics, volume 69, number 1, pp. 169-173.
Leo Egghe, 2006. An improvement of the h-index: The g-index, ISSI Newsletter, volume 2, number 1, pp. 8-9.
Leo Egghe and Ronald Rousseau, 2006. An informetric model for the h-index, Scientometrics, volume 69, number 1, pp. 121-129.
R. Gamez, S. Mendoza, R. Mas, R. Mesa, G. Castano, B. Rodriguez and D. Marrero, 2000. Dose-dependent cholesterol-lowering effects of D-003 on normocholesterolemic rabbits, Current Therapeutic Research Clinical and Experimental, volume 61, number 7, pp. 460-468.
Jorge E. Hirsch, 2005. An index to quantify an individual's scientific output, Proceedings of the National Academy of Sciences of the United States of America, volume 102, number 46, pp. 16569-16572.
Bihui Jin, 2006. H-index: An evaluation indicator proposed by scientist, Science Focus, volume 1, number 1, pp. 8-9 (in Chinese).
Luxian Liu and Ronald Rousseau, 2007. The Hirsch index, its generalizations and library management: The case of Tongji University Library, preprint.
Paul Nieuwenhuysen, 2001. An international training programme: STIMULATE: Scientific and Technological Information Management in Universities and Libraries, an Active Training Environment, D-Lib Magazine, volume 7, number 2, at http://www.dlib.org/.
Ronald Rousseau, 2006. A case study: evolution of JASIS Hirsch index, Science Focus, volume 1, number 1, pp. 16-17 (in Chinese). English version available at E-LIS, code 5430.
Appendix
List of countries, their special topic and the ranked list of countries publishing the most articles on this topic according to the Web of Science:

- Bangladesh, arsenic AND tube*: 1 u.s.; 2 germany; 3 japan; 4 india; 5 bangladesh; 5 canada; 7 u.k.; 7 china; 9 poland; 9 spain
- Cuba, "D-003": 1 cuba; 2 australia; 3 czech rep; 3 japan; 3 china; 3 spain; 3 u.s.; 8 byelarus; 8 canada; 8 denmark
- Ethiopia, tef AND (Zucc* OR Eragrostis): 1 ethiopia; 2 u.s.; 3 germany; 3 u.k.; 5 south africa; 5 sweden; 7 netherlands; 8 austria; 8 norway; 10 india
- Ghana, malaria AND pregnan*: 1 u.s.; 2 u.k.; 3 france; 4 netherlands; 5 malawi; 6 kenya; 7 tanzania; 8 thailand; 9 switzerland; 10 germany; 11 australia; 12 denmark; 13 cameroon; 13 ghana; 13 sweden
- Philippines, Pinatubo: 1 u.s.; 2 germany; 3 u.k.; 4 japan; 5 france; 6 canada; 7 italy; 8 russia; 9 philippines; 10 new zealand
- Philippines, tilapia AND culture: 1 u.s.; 2 israel; 3 u.k.; 4 thailand; 5 belgium; 6 japan; 6 philippines; 8 netherlands; 9 canada; 9 india
- Sri Lanka, "water scarcity": 1 u.s.; 2 india; 3 netherlands; 4 u.k.; 5 germany; 5 spain; 7 israel; 8 sweden; 9 jordan; 9 sri lanka
- Sudan, dromedar*: 1 india; 2 saudi arabia; 3 egypt; 4 u. arab emirates; 5 u.s.; 6 sudan; 7 france; 8 iran; 9 u.k.; 10 morocco
- Tanzania, Kiliman*: 1 u.s.; 2 tanzania; 3 u.k.; 4 norway; 5 germany; 6 kenya; 7 france; 8 austria; 9 south africa; 10 nigeria; 10 switzerland
- Uganda, Lake Victoria: 1 u.s.; 2 kenya; 3 uganda; 4 u.k.; 5 netherlands; 6 tanzania; 7 canada; 8 germany; 9 japan; 10 belgium
- Vietnam, Indochina: 1 u.s.; 2 france; 3 china; 4 u.k.; 5 japan; 6 australia; 7 vietnam; 8 canada; 9 thailand; 10 germany
- Vietnam, Mekong: 1 vietnam; 2 u.s.; 3 japan; 4 thailand; 5 u.k.; 6 australia; 7 france; 8 netherlands; 9 belgium; 10 china
- Zambia + Zimbabwe, zambezi*: 1 zimbabwe; 2 u.s.; 3 south africa; 4 u.k.; 5 zambia; 6 germany; 7 france; 8 australia; 8 belgium; 10 netherlands
- Zimbabwe, cattle AND tick*: 1 u.s.; 2 australia; 3 u.k.; 4 brazil; 5 zenka; 6 south africa; 7 india; 8 zimbabwe; 9 france; 10 japan
Editorial history
Paper received 23 December 2006; accepted 18 January 2007.
Copyright ©2007, First Monday.
Copyright ©2007, STIMULATE 6 Group.
The Hirsch index applied to topics of interest to developing countries by the STIMULATE 6 Group
First Monday, volume 12, number 2 (February 2007),
URL: http://firstmonday.org/issues/issue12_2/stimulate/index.html