A Universal Citation Database First Monday
A Universal Citation Database as a Catalyst for Reform in Scholarly Communication

A universal, Internet-based, bibliographic and citation database would link every scholarly work ever written - no matter how published - to every work that it cites and every work that cites it. Such a database could revolutionize many aspects of scholarly communication: literature research, keeping current with new literature, evaluation of scholarly work, and choice of publication venue, among others. Models are proposed for the cost-effective operational and technical organization of such a database as well as for a feasible initial goal: the semi-universal citation database.

Contents

Introduction and Motivation
Literature Research
Citation Analysis and the Evaluation of Scholarly Work
The Reform of Scholarly Communication
Lowered Cost of Literature Indexing
Models for a Universal Citation Database
Steady-State Data Production at Source
Distributed Database Organization
An Initial Goal: the Semi-Universal Citation Database
Conclusion
Notes

Introduction and Motivation

Imagine a universal bibliographic and citation database linking every scholarly work ever written - no matter how published - to every work that it cites and every work that cites it. Imagine that such a citation database was freely available over the Internet and was updated every day with all the new works published that day, including papers in traditional and electronic journals, conference papers, theses, technical reports, working papers, and preprints. Such a database would fundamentally change how scholars locate and keep current with the works of others. In turn, this would also affect how scholars publish their own works, in light of the increased visibility of research regardless of publication venue and the increased potential to demonstrate the value of works through citation analysis. In short, a universal citation database would serve as an important catalyst for reform in scholarly communication.

Literature Research

The use of citation data to locate newer works that cite important prior works in a field has long been an important technique for literature research. Unfortunately, the available indexes (principally the Science Citation Index and the Social Sciences and Humanities Citation Index) tend to be quite limited in their coverage, indexing only selected sets of academic journals. The journal selection also tends to change slowly over time, often failing to keep pace with the development of new journals. Thus citation searching must usually be augmented with other methods of searching to find important works in un-indexed journals, conference proceedings and elsewhere. Furthermore, the delays of archival journal publication and post-publication indexing mean that, in some cases, papers may only be found in an index several years after they have been written. As the pace of change in scholarly communication continues to increase, citation searching using the existing indexes tends to become less and less useful.

By definition, however, a universal citation database would provide search results that are comprehensive and up-to-date. As soon as a paper is published in any form, its citation data would be incorporated in the universal database. Citation searches could thus return the latest working papers or technical reports in an area. With well-formulated citation searches on a universal database, it might often be the case that no further searching need be done at all. One could fail to find papers only if the authors themselves had failed to cite the relevant literature.

As well as locating the papers of interest, a citation database system can also provide useful ways of filtering them. For example, one may be interested only in papers that cite two specific earlier works, or ones that both cite a particular work and use certain keywords in their abstracts or titles. One could also use citation counts for filtering; papers would only be selected if they had been cited a certain number of times (or by a certain number of distinct authors). This could be particularly useful for general or initial reading in an area; effort could be concentrated on those papers deemed important by the research community through their cumulative citation counts.
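The filters described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy `CITES` table and the function names are hypothetical, and a real citation database would hold far more data behind a query interface.

```python
# Hypothetical toy data: each citing work maps to the set of works it cites.
CITES = {
    "P1": {"A", "B"},
    "P2": {"A"},
    "P3": {"A", "B", "P1"},
    "P4": {"P1", "P3"},
}


def citing_all(cites, required):
    """Return the works that cite every work in `required`."""
    required = set(required)
    return {work for work, refs in cites.items() if required <= refs}


def citation_counts(cites):
    """Count, for each cited work, how many distinct works cite it."""
    counts = {}
    for refs in cites.values():
        for cited in refs:
            counts[cited] = counts.get(cited, 0) + 1
    return counts


def with_min_citations(cites, candidates, minimum):
    """Keep only the candidates cited at least `minimum` times."""
    counts = citation_counts(cites)
    return {work for work in candidates if counts.get(work, 0) >= minimum}


both = citing_all(CITES, {"A", "B"})          # works citing both A and B
popular = with_min_citations(CITES, CITES, 1)  # works cited at least once
```

Counting distinct citing authors rather than citing works would need author data as well, which this sketch leaves out.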

Another intriguing possibility is the use of citation data in current awareness services. Scholars would be able to establish search and filtering criteria matching their general and specialized interests. As publications and citation data are entered, any new papers matching the criteria would automatically be brought to the scholar's attention, e.g., by e-mail.

Citation Analysis and the Evaluation of Scholarly Work

Beyond the benefits for literature research, the analysis of citation data also has an important role in the evaluation of scholarly work and institutional decision making. One study reported that 35% of biochemistry departments and 60% of sociology departments surveyed had directly used citation data for hiring, promotion or salary decisions [ 9 ]. Appeals of tenure decisions have been fought on the grounds of comparative citation evaluation between those given tenure and those denied tenure [ 19 ]. Citation measures have also been used for rankings of entire academic departments; these rankings can influence applicants for graduate studies and major funding initiatives. Citation analysis has even been used to call senior academic administrators to account by questioning the overall pattern of salary decisions [ 8 ].

Citation analysis also has important indirect effects on scholarly evaluation through citation-based journal rankings. Journal Citation Reports annually ranks journals based on such measures as journal impact factor (the average number of citations received per article published in the journal [ 7 ]), cited half-life (the number of years covered by the most recent half of a journal's citations) and others. These citation measures are often used by research librarians in making or recommending journal purchasing or cancellation decisions [ 3 ]. Scholars are often evaluated by the prestige of the journals in which they publish; both the citation rankings themselves and the availability of journals on library shelves are contributing elements in establishing journal prestige.
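As a rough illustration of the two measures just defined, the following Python sketch computes them from simple counts. It follows the wording of the definitions above rather than the exact procedures of Journal Citation Reports, which may use specific time windows and fractional half-lives; the function names and the whole-year rounding are assumptions made for the example.

```python
def impact_factor(citations_received, articles_published):
    """Average number of citations received per published article."""
    if articles_published <= 0:
        raise ValueError("articles_published must be positive")
    return citations_received / articles_published


def cited_half_life(citations_by_age):
    """Whole-year cited half-life (a simplification).

    citations_by_age[0] is the number of citations to articles from the
    most recent year, citations_by_age[1] to those from the year before,
    and so on. The result is the smallest number of most-recent years
    that together account for at least half of all citations.
    """
    total = sum(citations_by_age)
    if total == 0:
        raise ValueError("no citations to analyse")
    running = 0
    for years, count in enumerate(citations_by_age, start=1):
        running += count
        if 2 * running >= total:
            return years
```

For instance, with 10, 20, 30 and 40 citations to articles from the four most recent years, the first three years account for 60 of the 100 citations, so this whole-year half-life is three years.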

However, there are serious methodological issues in the application of citation analysis to scholarly evaluation. Many studies have been criticized on the grounds that citation counts are sensitive to "fads, foibles and popular trends in science" [ 14 ] and that simple-minded citation counts are often used without correlation of those counts with other relevant data [ 17 ]. Furthermore, use of the existing citation indexes tends to overemphasize the role of the particular journals indexed and devalue all other forms of scholarly communication.

These inadequacies may well be ameliorated, however, with the development of a freely-available universal citation database. The universality of the database would value all forms of publication equally, allowing the impact of works to be judged without measurement bias imposed by the inclusion or non-inclusion in present academic indices. The free availability of the database would allow data therefrom to be easily correlated with other information relevant to the evaluation of scholarly works.

The Reform of Scholarly Communication

Our present-day system of scholarly communication through refereed journals archived in research libraries is in a serious state of crisis. Hyper-inflation in journal prices and proliferation of academic journal titles are straining the materials acquisition budgets of research libraries everywhere. The so-called "serials crisis" has been explored in depth in a recent study for the Andrew W. Mellon Foundation published by the Association of Research Libraries [ 5 ]. Between 1970 and 1990, using price data for U.S. journals only, the Mellon study found an eleven-fold increase in scientific journal prices and a nine-fold increase in journal prices overall, during which time the general price index rose by a factor of three. Another study reported that prices of scientific journals from major European and Japanese commercial publishers rose even more rapidly [ 16 ]. Over the same time period, there has been significant growth in the number of serials published; the Mellon study reports more than a 50% increase in the number of serial publications listed in the Ulrich's catalog and a 25% increase in serial titles held by a 24-library composite. To maintain core journal collections, most research libraries have been spending an increasing percentage of their overall budget on serials, to the detriment of monograph acquisition. Even so, serials cancellation projects have been reported at most research libraries over the last decade. Many have undergone several iterations of serials cancellation [ 4 ].

In response to this crisis, there have been a number of efforts to reform the world of scholarly publication by developing and promoting lower-cost alternatives to expensive print journals. In particular, electronic publication and dissemination of scholarly work has emerged as a serious and potentially lower-cost alternative to the conventional academic journal on paper. Indeed, there have been a number of efforts to establish fully-refereed electronic journals that are distributed free of charge via the Internet. A number of the first-generation e-journals, such as EJournal and Postmodern Culture, started with a no-frills, text-only (ASCII) format [ 1, 12 ]. Subsequently, high-quality typography started to appear in freely-distributed electronic journals; in particular, mathematics journals such as the Electronic Journal of Combinatorics and the New York Journal of Mathematics use typesetting conventions based on the TeX system with the American Mathematical Society extensions. With the development of the World-Wide Web, originally ASCII-based e-journals have generally developed enriched graphics and hypertext formats.

Subscription-based electronic journals are also beginning to be widely available. Some of the efforts in this area, such as the Chicago Journal of Theoretical Computer Science and First Monday, have been to develop new electronic-only journals with a modest subscription basis to provide for long-term stability of the journal. Other projects, such as the Johns Hopkins University Press Project Muse [ 13 ] and the electronic journals of the American Mathematical Society, have been converting existing print journals to electronic form with reduced prices for electronic-only subscription. Many of the major commercial publishers are now starting to experiment with electronic subscriptions, but without commitment to reduced pricing.

At the same time, there have been efforts to raise awareness of the serials crisis among faculty members and discourage publication in high-cost print journals. One study of physics journals showed an 80-fold variation in the per-character cost of refereed physics journals [ 2 ]; the author and the American Physical Society were subsequently sued by one of the publishers whose journals were rated least cost-effective [ 15 ]. Some groups have proposed university copyright policies that require faculty members to consider the cost and accessibility of articles before transferring copyright to journal publishers [ 6 ]. Tenure and promotion policies have been proposed suggesting that faculty members be evaluated based on the perceived quality and contribution of a small number of publications rather than the usual metric of total publication count.

Nevertheless, reform in scholarly communication is slow to come. Although most electronic journals have sought to establish their credibility through strong peer review processes - indeed some have developed novel processes that can be considered a significant improvement over conventional practices [ 10 ] - refereed electronic journals have not yet become widely accepted as the equal of refereed print journals [ 11 ]. When academic careers are on the line, faculty members are slow to eschew the expensive but well-established print journals in favor of lower-cost but unproved alternatives such as electronic journals.

A universal citation database has significant potential to act as a catalyst for reform in scholarly communication by leveling the playing field between alternative forms of scholarly publication. This would happen in two important ways. First, the citation database would ensure that publications in any form are equally visible (but not necessarily equally accessible) to the literature research process. Regardless of which publication venue an author chooses, all that she/he needs to do to make her/his work visible is to cite appropriate previous works. Publication venues would then compete on the important values that they bring to the publication process, such as refereeing standards, editorial control, quality of presentation, timeliness of dissemination and so forth. Publications would no longer enjoy an unfair competitive advantage simply by virtue of being indexed in a particular literature database.

The second way that a universal citation database would promote fairer competition among publication venues is by providing a method for evaluating the significance of individual papers independent of the publication venue chosen. University faculty members are often critically concerned with the recognition that their work receives because of its importance to the evaluation of their academic careers. Because the significance of papers is often judged solely by the perceived quality of the venues in which they are published, this encourages a very conservative approach to choice of publication venues. By providing citation data as an independent means of demonstrating the significance of a particular work, a universal citation database has the potential to encourage authors to choose publication venues for other qualities.

Lowered Cost of Literature Indexing

The literature indexing industry is big business. Hundreds of millions of dollars are spent annually by research libraries in acquiring literature indexes. A freely-distributed universal bibliographic and citation database has considerable potential to reduce these costs.

The costs of production for conventional literature indexes arise primarily from three sources. The first is simply the cost of assembling the basic bibliographic data for published works. This includes the not insignificant costs of acquiring the relevant publications as well as those of preparing and entering bibliographic data in the standard format of a particular index. The second cost area is that of adding classification, review or other information to the basic bibliographic data to aid in bibliographic search and evaluation. This includes the indexing of works by keyword, the classification of works by subject hierarchy and the independent analysis and review of works. The third cost area is that of marketing and distribution. With a universal bibliographic database, there are significant potential cost savings available in each area.

In particular, the cost of assembling basic bibliographic data could be reduced tremendously with a universal bibliographic and citation database organized along the lines suggested in this paper. With present literature indexes, there is a significant cost of post-publication bibliographic data entry and this cost is multiplied by the number of indexes that separately index a work. It is not uncommon to find a journal indexed by ten or more separate literature services. Under a universal database as proposed here, bibliographic data is contributed at source, that is, by authors and/or publishers. As we argue in Section II, the bibliographic data is already routinely developed at source; the cost of developing and contributing it in a standardized format can be made almost trivial and more than compensated by the increased value that a universal database represents. Indeed, in a thoroughly integrated system, bibliographic and citation data would be a derived product of the publication process and there would hence be a complete elimination of costs associated with manual entry of this data.

The reduction of data entry costs could be realized as savings to research libraries in two ways: cancellation of some literature indexes and lower prices for others. Cancellation would be appropriate and logical for those indexes that do not add much value beyond the assembly of basic bibliographic data; they become redundant with the availability of the universal database. Cancellation pressure should also ensure that the remaining indexes pass on data entry savings in the form of reduced subscription prices.

Additional cost reductions may be achieved by canceling indexes even if they do add value in the form of keyword- or classification-based indexes. A universal bibliographic and citation database will provide, at a minimum, searches by author and/or title word as well as citation searching. Arguably, citation searching on a universal database would be sufficiently powerful for most literature research activities. In this event, the value added by manually assembled keyword or classification indexes may not be sufficiently high to justify their continuation.

With the advent of a universal bibliographic and citation database, then, significant cost savings for research libraries should be realized through the cancellation of a number of conventional literature indexes. It should also be possible to arrange for the distribution of the universal database at a cost less than, or comparable to, that of distribution of the canceled indexes. With low and unduplicated costs for entry of basic bibliographic data and no costs associated with manual "value-added" indexing, the universal bibliographic database should cost much less than the literature indexes supplanted.

Models for a Universal Citation Database

In this section, we consider three models for different aspects of a universal bibliographic and citation database. The first of these is an operational model for bibliographic data development in the steady state. The point of this model is to argue that the cost of entering bibliographic and citation data can be made very low and that the process could be controlled entirely by the universities and other organizations that originate scholarly work. The second model describes a technical organization for the bibliographic database, using replicated local storage for the bulk of historical data and a distributed database component on the Internet for access to the latest information. Finally, we present a model for a semi-universal database, as a more realistic goal for initial citation database development.

Steady-State Data Production at Source

In this section, we consider a model for a universal citation database organized as a distributed database over the Internet and maintained primarily by the academic institutions and other organizations that originate scholarly work. Each site would contribute, in a standardized format, the bibliographic and citation data for papers written by its scholars. Two devices would be necessary to make it possible for the data to be developed in this way: an institutional requirement on authors to submit bibliographic and citation data and a bibliographic software system to help prepare it.

Of course, the success of such an approach requires the cooperation of individual authors who may at first consider it a requirement for extra work. However, there is good reason to believe that preparing bibliographic and citation data in this way could in fact be a time saver. At present, authors or their assistants must spend considerable time in selecting and formatting their bibliographic references for every paper they write. Even with the use of standard bibliographic software packages such as BibTeX, ProCite, or EndNote, considerable drudgery is involved. Once a paper has been accepted for publication, additional work is often required to make the citations and references consistent with the ordering and formatting conventions of a particular journal, although the existing bibliographic packages may take care of this in some cases.

Imagine instead that authors have at their disposal a bibliographic software package that is integrated with the universal citation and bibliographic database and that both of these are based on canonical citation identifiers. These identifiers would uniquely identify particular scholarly works and allow for full retrieval of the bibliographic entry from the universal citation database. The bibliographic software package would accept these identifiers and would be able to generate bibliographies ordered and formatted according to the requirements of any particular publication venue. The package could also translate occurrences of the identifiers in the text of papers to the appropriate numeric or symbolic labels associated with the citations. Literature searches using the universal citation database would return the appropriate canonical identifiers automatically. Personal bibliographic databases would no longer need to include the full bibliographic data for each paper, but could simply be lists of canonical identifiers. The net effect should be to considerably simplify the preparation of bibliographies and reference lists for scholarly works.

We should also be aware that scholars will not always have access to their computers, so it is important that the canonical identifiers use a mnemonic scheme that works when reading printed materials and making notes with pencil and paper. For example, a technical report might be cited as U-SFraser-CMPT-TR:95-07, where the U designates a university publication, SFraser is a standard abbreviation for Simon Fraser University from the list published annually by the Association of Commonwealth Universities, CMPT is the department code used at Simon Fraser for the School of Computing Science, TR indicates that this publication is a technical report (vs. MSc and PhD for theses, for example) and 95-07 is the assigned number in the Computing Science Technical Report series. A journal citation might be in the form J-ACM-TOPLAS:11@194 for the first article that begins on page 194 of volume 11 of ACM Transactions on Programming Languages and Systems. Here, the initial J identifies the citation as a reference to a journal article, ACM is a code for the publisher (Association for Computing Machinery) and TOPLAS is an abbreviation for the journal (unique for the publisher). The use of a separate publisher code helps avoid conflicts between journal abbreviations, while keeping to standard abbreviations used in a discipline. For example, TOPLAS is widely accepted in the computing science community as the abbreviation of this particular journal, but may conflict with other uses of that abbreviation. Unfortunately, it may not be possible to create meaningful canonical abbreviations in every case; in particular, it is difficult to see how books can be uniquely identified without resorting to non-mnemonic designations. Nevertheless, it is certainly feasible to devise some scheme for uniquely identifying publications, and making it as mnemonic as possible seems highly desirable.
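To make the two example identifiers concrete, here is a minimal Python sketch that splits them into their parts. The grammar is inferred only from the two examples in the text (U-SFraser-CMPT-TR:95-07 and J-ACM-TOPLAS:11@194); the class and function names are hypothetical, and a real scheme would need many more cases (books, conferences, and so on).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UniversityPublication:
    institution: str   # e.g. "SFraser"
    department: str    # e.g. "CMPT"
    series: str        # e.g. "TR", "MSc", "PhD"
    number: str        # e.g. "95-07"


@dataclass(frozen=True)
class JournalArticle:
    publisher: str     # e.g. "ACM"
    journal: str       # e.g. "TOPLAS"
    volume: int
    first_page: int


def parse_identifier(identifier):
    """Parse an identifier of one of the two illustrative forms."""
    head, colon, tail = identifier.partition(":")
    parts = head.split("-")
    if colon and tail:
        if parts[0] == "U" and len(parts) == 4:
            return UniversityPublication(parts[1], parts[2], parts[3], tail)
        if parts[0] == "J" and len(parts) == 3:
            volume, at, page = tail.partition("@")
            if at and volume.isdigit() and page.isdigit():
                return JournalArticle(parts[1], parts[2], int(volume), int(page))
    raise ValueError(f"unrecognized identifier: {identifier!r}")
```

Calling `parse_identifier("J-ACM-TOPLAS:11@194")` yields a `JournalArticle` with publisher ACM, journal TOPLAS, volume 11 and first page 194; anything outside the two illustrated forms is rejected rather than guessed at.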

With such a bibliographic software system for preparing bibliographies and references, the task of submitting the required bibliographic and citation data for each paper becomes almost trivial. All that need be done is to extract the list of citation identifiers used in the paper (with an appropriate software tool) and to submit this list together with the paper's bibliographic data and abstract. Still, some authors will have difficulties with these tasks, so it will be important to train appropriate support staff (reference librarians and/or computing services personnel) to provide help. With these facilities in place, an institutional requirement that authors submit bibliographic data in the necessary form is certainly not onerous and should be more than compensated by the assistance provided in preparing and formatting reference lists.
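The extraction tool mentioned above, together with the translation of identifiers into numeric labels, might look like the following Python sketch. It assumes, purely for illustration, that authors mark citations in their manuscript as `[@identifier]`; that markup and the function names are not part of the proposal itself.

```python
import re

# Hypothetical in-text markup: a canonical identifier wrapped as [@...].
CITE_PATTERN = re.compile(r"\[@([^\]\s]+)\]")


def extract_identifiers(text):
    """Return the distinct cited identifiers in order of first appearance."""
    ordered = []
    for identifier in CITE_PATTERN.findall(text):
        if identifier not in ordered:
            ordered.append(identifier)
    return ordered


def number_citations(text):
    """Replace each [@identifier] with a numeric label such as [1].

    Returns the rewritten text and the identifier list, which is also the
    citation data an author would submit alongside the paper.
    """
    ordered = extract_identifiers(text)
    labels = {identifier: n for n, identifier in enumerate(ordered, start=1)}
    rewritten = CITE_PATTERN.sub(lambda m: f"[{labels[m.group(1)]}]", text)
    return rewritten, ordered


draft = ("See [@J-ACM-TOPLAS:11@194] and [@U-SFraser-CMPT-TR:95-07]; "
         "also [@J-ACM-TOPLAS:11@194].")
numbered, submitted = number_citations(draft)
```

A different publication venue could be served by swapping the labelling rule (for example, alphabetical or author-year labels) while the submitted identifier list stays the same.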

Although the requirement for preparation of the bibliographic and citation data has been suggested as an institutional requirement on authors, other mechanisms appropriate to a particular institution could be employed instead. The key point is that submission of the publication data be required at some level of institutional operation to ensure the universality of the database. Application of this requirement to authors, however, seems consistent with typical university practices in requiring faculty members to submit publication data for academic career evaluation and publicity purposes.

The model suggested here is but one of many possible ways of implementing a universal citation database. Other possibilities may involve more participation on the part of academic societies, research libraries, and/or publishers in data preparation. However, the development of data by the institutions that originate scholarly work has a number of advantages that should be considered in any scheme. First of all, preparation of the citation and bibliographic data, albeit in a different form, already takes place at the originating institutions as part of the process of writing a paper and including references. Regenerating this data at some later stage in the publication cycle would seem to be an unnecessary duplication of effort. Secondly, scholars will be able to include in the database all of their works that they deem of interest, regardless of publication venue. This freedom to have works included in the database seems highly appropriate in view of the potentially heavy use of the citation database for academic career evaluation. Thirdly, authors can also ensure the most rapid possible dissemination of their work through timely data preparation. Finally, development of data by the originating institutions can be carried out even if some publishers refuse to participate; in this way, the originating institutions regain a measure of control over their publications to balance the loss of copyright typically required for journal publication.

Distributed Database Organization

Although the Internet has been suggested as the platform upon which the universal citation database is implemented, a reasonable organization would only use the Internet for access to the most recent bibliographic and citation information. Local copies of the database would provide access to the bulk of the historical data (up to the current year, say) and would help to minimize network traffic. Nevertheless, it is important to consider the organization of the distributed database component on the Internet both for providing access to the most recent possible information and for providing comprehensive access in the event that a local copy of the database is unavailable.

A natural and efficient approach to organizing the bibliographic database - for serial publications at least - is to distribute the data by publication, i.e., journal, conference, technical report series, and so on. Data for each publication would be provided by a logically distinct server on the Internet. Given any canonical citation identifier, a master index would allow the corresponding publication and its server (and perhaps backup servers) to be identified. Local copies of the master index will allow most of the publication server look-up traffic to be kept off the Internet. If the server for a particular publication changes, the original server would be expected to forward requests until local indexes are updated. For new publications not yet registered in the local copy, a query of the Internet master index server may be required.
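The look-up path described above can be sketched as follows in Python. Plain dictionaries stand in for the local index copy, the Internet master index and the forwarding arrangements of retired servers; all names, and the idea that the text before the colon identifies the publication, are assumptions made for this example.

```python
def publication_key(identifier):
    """Assume the part before ':' names the publication (e.g. J-ACM-TOPLAS)."""
    return identifier.partition(":")[0]


def follow_forwarding(server, forwards):
    """Follow forwarding entries left behind by servers that have moved."""
    visited = set()
    while server in forwards:
        if server in visited:
            raise RuntimeError("forwarding loop detected")
        visited.add(server)
        server = forwards[server]
    return server


def resolve_server(identifier, local_index, master_index, forwards=None):
    """Find the server for an identifier, preferring the local index copy."""
    key = publication_key(identifier)
    server = local_index.get(key)
    if server is None:
        # Stands in for a query to the master index server on the Internet.
        server = master_index.get(key)
        if server is None:
            raise KeyError(f"unknown publication: {key}")
        local_index[key] = server  # remember it for later look-ups
    return follow_forwarding(server, forwards or {})


local = {"J-ACM-TOPLAS": "server-a"}
master = {"J-ACM-TOPLAS": "server-a", "U-SFraser-CMPT-TR": "server-b"}
moved = {"server-a": "server-c"}  # server-a now forwards to server-c
```

Only the second kind of look-up, for a publication missing from the local copy, would generate master index traffic in this arrangement.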

Each publication server will provide both the basic bibliographic data and the citation data (if available) for articles published in the corresponding serial. The citation data will include both citations made and citations received by the article. Initially, a complete record of basic bibliographic data should be created at the time a canonical citation identifier is registered for an article. Ideally, the list of citations made in the article (i.e., canonical citation identifiers of cited articles) will also be entered at this time. In practice, it will be necessary to allow deferred entry of citation data. The field for citations received by an article will be initially set to empty. This field will be updated from time to time as citations from other articles become known. These updates will be received from the publication servers of the citing articles as they enter data for those articles. Thus, each time a publication server adds or amends the list of articles cited for a given article, it should transmit that article's citation identifier to the publication servers for each of the cited articles. The Internet traffic generated by this process will be quite modest: short messages (containing only canonical identifiers of citing and cited works) typically sent from a citing server to a small number of servers for cited works.
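A minimal in-memory sketch of this update protocol is given below in Python. Method calls stand in for the short network messages, bibliographic fields are omitted, and the class and function names are hypothetical; it is meant only to show the flow of identifiers from citing server to cited servers.

```python
def publication_key(identifier):
    """Assume the part before ':' names the publication."""
    return identifier.partition(":")[0]


class PublicationServer:
    """Holds citation data for the articles of one publication."""

    def __init__(self):
        self.records = {}  # identifier -> {"cites": set, "cited_by": set}

    def register(self, identifier):
        # The citations-received field starts out empty.
        return self.records.setdefault(
            identifier, {"cites": set(), "cited_by": set()})

    def receive_citation(self, cited, citing):
        # Message from a citing server: only two identifiers are sent.
        self.register(cited)["cited_by"].add(citing)


def enter_citations(servers, citing, cited_ids):
    """Record the citations made by `citing` and notify each cited server."""
    home = servers[publication_key(citing)]
    home.register(citing)["cites"].update(cited_ids)
    for cited in cited_ids:
        servers[publication_key(cited)].receive_citation(cited, citing)


servers = {
    "J-ACM-TOPLAS": PublicationServer(),
    "U-SFraser-CMPT-TR": PublicationServer(),
}
enter_citations(servers, "U-SFraser-CMPT-TR:95-07", ["J-ACM-TOPLAS:11@194"])
```

Because `register` is idempotent, deferred or amended citation lists can be entered by calling `enter_citations` again; set semantics keep repeated notifications from creating duplicates.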

With the distribution of citation data from citing servers to cited servers, citation searching on the Internet becomes quite efficient. For example, to find all works that cite at least two of a set of four articles, queries are sent to the servers for each of the articles. Each server returns a short message containing lists of canonical citation identifiers of citing works. The client software will then determine which citation identifiers occur in two or more of the lists and retrieve the full bibliographic data for only those identifiers. This approach generally avoids transmission of full bibliographic records until any citation logic which may reduce the retrieval set has been applied.
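The client-side step in this example, finding identifiers that occur in two or more of the returned lists, can be sketched in Python as follows. The function name and the sample lists are hypothetical; each list stands for one server's reply.

```python
from collections import Counter


def citing_at_least(citing_lists, minimum):
    """Identifiers that appear in at least `minimum` of the returned lists."""
    counts = Counter()
    for identifiers in citing_lists:
        counts.update(set(identifiers))  # count each list at most once
    return {identifier for identifier, n in counts.items() if n >= minimum}


# Hypothetical replies from the servers for four articles of interest.
replies = [["X", "Y"], ["Y", "Z"], ["Z"], ["W", "Y"]]
selected = citing_at_least(replies, 2)
```

Only the identifiers in `selected` would then be used to request full bibliographic records, which is where the saving in network traffic comes from.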

Of course, Internet traffic can be reduced even further when a local copy of the universal citation and bibliographic database is available. Even if Internet citation searching is used to obtain the most recent possible citations, the bulk of bibliographic record retrieval will normally take place from the local copy. However, it may also be reasonable to initially restrict citation searching to the local database alone. Before making an Internet query, users could be asked to analyze the results obtainable from a local search. These results should provide all relevant citing works up to the time that the local database copy was made. An option could be provided to update the results of a completed local search with information on the latest citations from the Internet. Indeed, it may be reasonable to restrict Internet access for certain groups of users (e.g., undergraduate students) to this form of query updating only.

In a minimal model, author/title and other forms of keyword searching would not be directly supported by the Internet component of the universal bibliographic and citation database. The reason is that such searches would be very costly in network resources; a keyword search could potentially match an article in any publication and hence would require that a query be issued to every publication server on the Internet. However, keyword searching could be supported indirectly as a filtering operation on a citation search; that is, once a set of records has been retrieved from the citation search, select only those records containing the keyword. Furthermore, general keyword searching could also be performed on the local copy of the universal database. This would miss some of the most recently published items, but would not be too different from keyword searching of present-day CD-ROM literature indexes.

Support for more general keyword and other forms of searching could also be provided through separately compiled subject indexes. For example, a particular scholarly society may want to create a subject index for a discipline X. It could establish a list of publications relevant to X and request that the publication servers for these publications forward bibliographic records as they are created. The scholarly society would be free to create any additional indexing information desired and could provide Internet access to the X subject index for accessing the most up-to-date material via keyword search.

Note that the results from different subject indexes could be freely combined under this model, provided that all search results are returned as lists of canonical citation identifiers. For example, one might be interested in interdisciplinary work involving concepts from two disciplines covered by different indexes. Searches of the disciplinary indexes could each identify papers relevant to one of the concepts; citation searching for papers that cite at least one paper in each list could provide a good starting point for the desired interdisciplinary literature.
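As an illustration of combining two index results, the following sketch finds papers that cite at least one work from each list. The identifiers, the in-memory `citations` mapping, and the function name are assumptions for the example; a real system would obtain the citing sets through citation searches.

```python
# Hypothetical data: each paper's canonical identifier mapped to the identifiers it cites.
citations = {
    "p1": {"bio1", "cs2"},
    "p2": {"bio1", "bio2"},
    "p3": {"cs1"},
    "p4": {"bio2", "cs1", "other"},
}


def cites_one_from_each(citations, first_list, second_list):
    """Return papers citing at least one work in each of the two result lists."""
    first, second = set(first_list), set(second_list)
    return sorted(
        paper
        for paper, cited in citations.items()
        if cited & first and cited & second
    )


# Result lists as two separate subject indexes might return them.
biology_hits = ["bio1", "bio2"]
computing_hits = ["cs1", "cs2"]
candidates = cites_one_from_each(citations, biology_hits, computing_hits)
```

Because both indexes return plain lists of canonical identifiers, neither needs to know anything about the other's internal indexing scheme.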

The model presented here for Internet operation of a universal citation database represents just one vision for how such a database could be organized. The key point is that organization by publication server allows efficient updating and access to the data by canonical citation identifier. Beyond this feasibility argument, however, there are many technical and institutional issues to be explored in the design and deployment of such a database.

An Initial Goal: the Semi-Universal Citation Database

As an initial goal in establishing a universal citation database, and perhaps as an end in its own right, consider the concept of a semi-universal citation database. Such a database would contain all and only the citation links involving works published after a given point in time, hereinafter called the initiation date. Older works would be represented in the database only by their basic bibliographic data and to the extent that they have been cited by post-initiation publications. Of course, the advantage of a semi-universal citation database is that it forgoes the retrospective development of citation data prior to the initiation date. Establishing the semi-universal citation database then requires only two major tasks: (1) retrospective collection of basic bibliographic data for existing works, and (2) establishing the appropriate steady-state procedures for bibliographic and citation data collection.
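The definition of the semi-universal database can be made concrete with a small sketch. The years, identifiers, and the choice of 1997 as initiation date are hypothetical. Since a citing work is published after the work it cites, a link involves a post-initiation work exactly when the citing work is post-initiation.

```python
INITIATION_YEAR = 1997  # assumed initiation date, for illustration only

# Hypothetical works with their publication years.
works = {"a": 1985, "b": 1990, "c": 1998, "d": 1999}

# Citation links as (citing, cited) pairs in a fully universal database.
all_links = [("b", "a"), ("c", "a"), ("d", "c"), ("d", "b")]

# The semi-universal database keeps only links whose citing work is post-initiation.
semi_links = [
    (citing, cited)
    for citing, cited in all_links
    if works[citing] >= INITIATION_YEAR
]

# Older works appear only insofar as post-initiation works cite them.
older_cited = {cited for _, cited in semi_links if works[cited] < INITIATION_YEAR}
```

The one link dropped, from `b` to `a`, is exactly the kind of pre-initiation citation data whose retrospective collection the semi-universal database forgoes.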

Quite soon after initiation, however, a semi-universal citation database would have almost all of the benefits of a fully universal database. Consider the value of a semi-universal citation database five years past initiation. From each of the perspectives of literature research, evaluation of scholarly work and the reform of scholarly communication, this semi-universal citation database would be almost as effective as a fully universal database. From the literature research perspective, existing bibliographic databases would suffice for finding works published prior to the initiation date using standard searching techniques. More recent works could then be found using the semi-universal citation database. In evaluation of scholarly work, the semi-universal citation database helps with only the most recent five years of citations. But these include all the citations of recent work and all the recent citations of older work, precisely the citation information of greatest interest for evaluating academic careers (or departments or journals). Finally, in the reform of scholarly communication, citation databases have their value in modifying the publication behavior of scholars. Once initiated, a semi-universal citation database would have the same effect as a fully universal database in assuring scholars that their works - and the citation credits their works receive - are visible independent of the publication venue chosen.

Building upon existing sources of basic bibliographic data and instituting procedures for citation data collection for new works represents a realistic goal for initial citation database development. Once established, retrospective addition of historical citation data could be contemplated.

Conclusion

One approach towards the development of a universal (or semi-universal) citation database would be the establishment of a consortium of universities, academic societies and research library associations devoted to the purpose. To that end, this paper has argued the desirability and feasibility of such a project. A universal citation database would have considerable value as a tool for both literature research and the evaluation of scholarly work and hence could act as a strong catalyst for overall reform in scholarly communication. Models suggesting the operational and technical feasibility of such a project have also been described. Indeed, a net savings may be achieved by rational and integrated reallocation of existing resources presently devoted to bibliography preparation, curriculum vitae maintenance and literature indexing.

Another possibility is that some form of universal citation database will naturally grow out of World-Wide Web (WWW) developments on the Internet. Indeed, by using WWW "robots" to systematically explore Web space, prototype citation databases have already been created. At present, the usefulness of such databases is limited by several factors, including, in particular, the current implementation of WWW citations as uniform resource locators (URLs). URLs provide specific technical information about the protocol, computer address, port number and file location to be used in retrieving a document. As such, they cannot serve as permanent canonical identifiers of scholarly works. However, there are proposals to replace URLs with more abstract specifications based on uniform resource names (URNs) [18]. It is conceivable that a canonical naming scheme using URNs may serve as the basis of a useful citation database within the universe of WWW space. Nevertheless, it would be regrettable if such a development served to marginalize works because they are not published on the Web or overvalue works because they are.

The Author

Robert D. Cameron is an Associate Professor with the faculty at the School of Computing Science at Simon Fraser University in Canada.
E-mail: cameron@cs.sfu.ca

An earlier version of this paper was published as TR 95-07 by the School of Computing Science, Simon Fraser University. This is a working draft; a revised version may be available at URL http://elib.cs.sfu.ca/project/papers/citebase/citebase.html. Multiple copies of this draft may be made for use in classrooms, discussion groups, or committee meetings, provided that notice of the intent and extent of the copying is sent to the author (e-mail is satisfactory). Archival copying of this preliminary version of the paper is not permitted. All copying requires that the integrity of the paper be preserved and that this copyright notice be reproduced in full.


Notes

1. Eyal Amiran and John Unsworth, 1991. "Postmodern Culture: Publishing in the electronic medium," The Public-Access Computer Systems Review, Vol. 2, No. 1, pp. 67-76, available at http://info.lib.uh.edu/pr/v2/n1/amiran.2n1

2. Henry H. Barschall, 1988. "The cost-effectiveness of physics journals," Physics Today, Vol. 41, No. 7 (July), pp. 56-59. http://dx.doi.org/10.1063/1.881125

3. Robert N. Broadus, 1985. "A proposed method for eliminating titles from periodical subscription lists," College & Research Libraries, Vol. 46, No. 1 (January), pp. 31-35.

4. Tina E. Chrzastowski and Karen A. Schmidt, 1993. "Surveying the damage: Academic library serial cancellations 1987-88 through 1989-90," College & Research Libraries, Vol. 54, No. 2 (March), pp. 93-102.

5. Anthony M. Cummings, Marcia L. White, William B. Bowen, Laura O. Lazarus, and Richard H. Ekman, 1992. University Libraries and Scholarly Communication: A Study Prepared for the Andrew W. Mellon Foundation. Washington, D. C.: Association of Research Libraries, available at http://www.lib.virginia.edu/mellon/mellon.html

6. TRLN Copyright Policy Task Force, 1993. "Model university copyright policy regarding faculty publication in scholarly journals: A background paper and review of the issues," The Public-Access Computer Systems Review, Vol. 4, No. 4, pp. 4-25, available at http://info.lib.uh.edu/pr/v4/n4/trln.4n4

7. Eugene Garfield, 1972. "Citation analysis as a tool in journal evaluation," Science, Vol. 178, No. 4060 (November 3), pp. 471-479.

8. Philip Howard Gary, 1983. "Using science citation analysis to evaluate administrative accountability," American Psychologist, Vol. 38 (January), pp. 116-17. http://dx.doi.org/10.1037/0003-066X.38.1.116

9. Lowell L. Hargens, 1990. "Citation counts and social comparisons: Scientists' use and evaluation of," Social Science Research, Vol. 19 (January), pp. 205-221. http://dx.doi.org/10.1016/0049-089X(90)90006-5

10. Stevan Harnad, 1995. "Implementing peer review on the net: Scientific quality control in scholarly electronic journals," In: R. Peek and G. Newby, (eds.), Electronic Publishing Confronts Academia: The Agenda for the Year 2000. Cambridge, Mass.: MIT Press, available at ftp://princeton.edu/pub/harnad/harnad95.peer.review

11. Stephen P. Harter, 1996. "The impact of electronic journals on scholarly communication: A citation analysis," The Public-Access Computer Systems Review, Vol. 7, No. 5, pp. 5-34, available at http://info.lib.uh.edu/pr/v7/n5/hart7n5.html

12. Edward M. Jennings, 1991. "EJournal: An account of the first two years," The Public-Access Computer Systems Review, Vol. 2, No. 1, pp. 91-110, available at http://info.lib.uh.edu/pr/v2/n1/jennings.2n1

13. Susan Lewis, 1995. "From earth to ether: One publisher's reincarnation," Serials Librarian, Vol. 25, Nos. 3/4, pp. 173-180. http://dx.doi.org/10.1300/J123v25n03_19

14. D. Lindsey, 1989. "Using citation counts as a measure of quality in science: Measuring what's measurable rather than what's valid," Scientometrics, Vol. 15, Nos. 3-4, pp. 189-203. http://dx.doi.org/10.1007/BF02017198

15. Harry Lustig and Ken Ford, 1992. "Statement by the American Physical Society and the American Institute of Physics: Gordon and Breach press release is misleading," Newsletter on Serials Pricing Issues, Vol. 43, (August), article 2, available at ftp://ftp.lib.ncsu.edu/pub/stacks/prices/nspi-ns043

16. Kenneth E. Marks, Steven P. Nielsen, H. Craig Petersen, and Peter E. Wagner, 1991. "Longitudinal study of scientific journal prices in a research library," College & Research Libraries, Vol. 52, No. 2 (March), pp. 125-138.

17. A. Schubert and T. Braun, 1993. "Reference standards for citation based assessments," Scientometrics, Vol. 26, No. 1, pp. 21-35. http://dx.doi.org/10.1007/BF02016790

18. K. Sollins and L. Masinter, 1994. Functional requirements for uniform resource names. RFC 1737, Internet Engineering Task Force, (December), available at http://info.internet.isi.edu/in-notes/rfc/files/rfc1737.txt

19. Nicholas Wade, 1975. "Citation analysis: A new tool for science administrators," Science, Vol. 188, No. 4183 (May 2), pp. 429-432.



Copyright © 1997, First Monday

A Universal Citation Database as a Catalyst for Reform in Scholarly Communication by Robert D. Cameron
First Monday, volume 2, number 4 (April 1997),
URL: http://www.firstmonday.org/?journal=fm&page=article&op=view&path[]=522






© First Monday, 1995-2017. ISSN 1396-0466.