The reliability of information collected from at-large Internet users by open collaborative wikis such as Wikipedia has been a subject of widespread debate. This paper provides a practical proposal for improving user confidence in wiki information by coloring the text of a wiki article based on the venerability of the text. This proposal relies on the philosophy that bad information is less likely to survive a collaborative editing process over large numbers of edits. Colorization would provide users with a clear visual cue as to the level of confidence that they can place in particular assertions made within a wiki article.
Think of Wikipedia as a massive garage where you can build any car you want to ... But every one else, and I mean everyone else in the garage can work on your car with you ... the people who are allowed to work on your car can completely disregard what you were doing with it. They could have flown in from BoolaBoola Island 2 hours ago, not know the language, can’t read the manuals, and just go in and paint your car pink. And drive it. And leave it somewhere. — Jason Scott (2004).
Before providing a recommendation for improving the reliability of open, collaborative wikis, it makes sense to pause and consider whether these resources could ever be considered reliable at all. Wikipedia is the largest and most culturally significant experiment in open, collaborative wikis, and this paper will focus on it. However, the techniques presented in this paper could be applied to any collaborative wiki.
A great deal has been written about the reliability of the information contained in Wikipedia (Wikipedia:Criticisms, 2006). Our society has developed a certain expectation of what an encyclopedia should be. We expect it to be an authoritative, reliable reference that provides basic information about a wide variety of subjects. Encyclopedias have traditionally been produced by companies with teams of subject matter experts who compile information and fact-check its accuracy. The idea that comparable authority could come from a resource that can literally be edited by anyone, regardless of their level of expertise, seems to defy logic. Yet the existence of Wikipedia and its actual utility are, at first glance, evidence to the contrary.
Detractors point out that in spite of the fact that Wikipedia frequently contains correct information (Giles, 2005), it can also contain bad information. For example, in late 2005 American journalist John Seigenthaler publicly criticized Wikipedia because of a collection of inaccuracies in his biography page, including an assertion that he was involved with the assassination of former U.S. President John F. Kennedy (Seigenthaler, 2005). Apparently the inaccuracies remained in Wikipedia for 132 days. Because no single entity takes responsibility for the accuracy of Wikipedia content, and because users have no way of distinguishing accurate content from inaccurate content, it can be argued that Wikipedia content cannot be relied upon even if inaccuracies are rare.
I think that the error at the heart of this controversy is our inclination to hold Wikipedia to our traditional expectations for encyclopedias. In making this observation I do not intend to be an apologist for Wikipedia's failures. There are many applications suited to Wikipedia that cannot be appropriately served by a traditional encyclopedia, and vice versa. It is helpful to observe that these are different tools that are produced in different ways and are useful, ultimately, for different purposes.
Traditional encyclopedias have the benefit of authoritative editing by experts, but this authority comes at a cost, both in the accessibility of the information and in the speed with which it becomes available. This makes traditional encyclopedias particularly well suited to subjects that are relatively static and to contexts where the need for accuracy is greatest.
For example, Wikipedia contains many articles describing the complete technical details of various cryptographic algorithms (Topics in cryptography, 2006). While these articles provide interesting information, it's possible that they contain inaccuracies at any given time. If one is responsible for implementing such an algorithm in software, accuracy is paramount: programmers cannot be sure that the information they obtain from these articles will enable them to faithfully implement the algorithms described. A more authoritative, static resource is warranted.
Wikipedia trades authority for accessibility, breadth, and speed. It usually costs money to access a professionally edited encyclopedia. However, Wikipedia is available free of charge. Users who have a casual interest in a subject may find the ease with which they can access Wikipedia to be worth the risk of occasionally being misled.
Wikipedia covers a broad range of subjects that traditional encyclopedias cannot afford to cover. In March of 2006, Wikipedia announced that the English edition's one millionth article had been added (Press releases/English Wikipedia Publishes Millionth Article, 2006). For the sake of comparison, Encyclopedia Britannica contains around 120,000 articles, depending on the edition (Wikipedia:Size comparisons, 2006). A one-to-one comparison of article or even word count may be misleading because of differences in style and depth. However, there are many specific examples that support the idea that Wikipedia has greater breadth. For example, I doubt that any professionally produced encyclopedia covers pop culture subjects like computer hacker magazines (2600: The Hacker Quarterly, 2006) or religious parodies like the Church of the SubGenius (Church of the SubGenius, 2006).
Another advantage of community-edited wikis is speed. Traditional encyclopedias are typically published on an annual basis, because the process they use for information collection and verification takes a long time. Current events obviously aren't covered in last year's edition. While news organizations provide authoritative coverage of current events, their information is presented in a story format intended for real-time consumption as each new piece of data becomes available. Even a few weeks after a major story has broken, it can be labor-intensive to piece together all of the details of what happened by searching news archives and reading multiple stories.
Wikipedia fills the time gap between the real-time news media and the slow publication of authoritative encyclopedic resources by providing a central collection point for data about a recent event that is available immediately. Wikipedia coverage of the 2005 terrorist attacks on the London transport system, for example, was available and useful almost immediately after the attacks occurred (7 July 2005 London bombings, 2005).
Given enough eyeballs, all bugs are shallow. Eric S. Raymond (1998).
Wikipedia enables users to trade a guarantee of accuracy for several other factors that can be more important, depending on the context. However, the information in Wikipedia is often correct in spite of its apparent lack of authority (Giles, 2005). One explanation for this phenomenon is that the effort required to vandalize a Wikipedia page exceeds the effort required to revert it, making such activities fairly fruitless (Ciffolilli, 2003). However, there may be more to it than that. Internet luminary Joi Ito observed that the authority of Wikipedia comes from the fact that its content has been viewed by hundreds of thousands of people (with the ability to comment) and has survived (Ito, 2004; also see Neus, 2001).
This observation is analogous to the observations that Eric S. Raymond (1998) made in his famous essay about Open Source software, "The Cathedral and the Bazaar." Raymond argued that Open Source software projects eliminate bugs quickly because thousands of users have access to the code and can find and report them. An individual developer might not have enough expertise to identify every bug, but this expertise exists collectively among large numbers of user-developers. The wiki editing process produces good results in the same way, by harnessing the collective knowledge of a large number of individual editors.
However, there is an important distinction between Raymond's Bazaar model of software development and the article editing process employed by Wikipedia. Open Source software projects are centrally maintained, and general users do not have access to code changes submitted by other users until those changes have been centrally approved and incorporated. On Wikipedia, however, user-submitted changes are available immediately. Wikipedia doesn't show its users a version of an article that has survived thousands of edits; it shows the latest version of each article, which has not survived any subsequent edits.
Wikipedia could consider moving to a central editing process. Individuals could volunteer to maintain control of particular articles, serving as gatekeepers for user-submitted changes. Different versions of articles on particular subjects, maintained by different editors, could compete for users just as various software projects that solve similar problems compete with each other in the marketplace.
However, it seems likely that such a model would detract from the usefulness of Wikipedia. Wikipedia users would be faced with the task of reading many different articles on each subject and making a personal evaluation of them, when their primary goal is simply to learn something. New contributors may find it easier to create an entirely new article on a particular subject than to attempt to improve a flawed article maintained by a stubborn editor. This would add to the confusion and increase the amount of reading and evaluation users must do every time they consult Wikipedia.
Instead of maintaining different official versions of each article, Wikipedia could use a reputation system to create a moderator class with the power to approve user-submitted changes. This idea is worthy of further research; unfortunately, a complete consideration of it is beyond the scope of this article. However, electing a moderator class that is large enough to handle the task but also responsible enough to handle it properly is a difficult problem. Such a class would also threaten to introduce subtle, institutional biases that could undermine Wikipedia's attempts at openness.
I propose that it would be better to provide Wikipedia users with a visual cue that enables them to see what assertions in an article have, in fact, survived the scrutiny of a large number of people, and what assertions are relatively fresh, and may not be as reliable. This would enable Wikipedia users to take more advantage of the power of the collaborative editing process taking place without forcing that process to change. Visual cues would improve users' confidence in the information they are gleaning while making it harder to mislead them with bad information.
It is not enough to simply present Wikipedia users with an old revision of an article rather than the current revision. Every revision is kept in the database, including revisions whose edits were later reverted or significantly revised, so an older revision is not necessarily a more reliable one. A better approach is to color the text of the latest revision depending on the age of each word. While I'm aware of several projects that have visualized information in Wikipedia with color, such as IBM's History Flow visualizations (Wattenberg and Viegas, 2003), these projects have been attempts to study the process through which Wikipedia content evolves, rather than to change the way that Wikipedia content is used.
The first consideration that must be addressed is what age means in the context of a Wikipedia article. It's important to observe that Wikipedia articles are not created equal. Articles that cover subjects of intense public interest or present importance are frequently viewed and edited by large numbers of people. Other articles may go for months at a time without being viewed. Incorrect assertions made in popular articles are likely to be quickly repaired because of the volume of scrutiny those articles receive, so assertions in those articles might be considered mature within a short period of time. However, incorrect assertions might persist in unpopular articles for months at a time. Clearly, any visualization system must adapt its concept of age to the popularity of each article.
Ideally, you could judge the age of a particular edit to a Wikipedia article by how often that article has been viewed since the edit. If a large number of people view an article and decide not to edit it, you might conclude that those users haven't seen an inaccuracy worth repairing. From my reading of the MediaWiki source code, it appears that Wikipedia once stored the number of times an article had been viewed in its database. However, the performance cost of collecting that information is significant, and Wikipedia no longer does it. The best information available in MediaWiki about the reliability of an edit is the number of subsequent edits. If an article has been edited a large number of times since a particular edit took place, but the text introduced by that edit has been left unmolested, one might conclude that the later editors felt that this information was reliable.
Other considerations also come into play. Unpopular articles, and articles that are essentially complete, may go for long periods of time without being edited. Eventually, one must assume that the content of such an article is mature in spite of the fact (or perhaps because of the fact) that the information has not been subsequently edited. There needs to be a fixed upper bound on the amount of time that an edit is considered immature when no other edits have taken place. Unfortunately, this time period may need to be very long: John Seigenthaler was complaining about inaccurate information that lived in his Wikipedia biography for 132 days.
On the other hand, as anyone can edit a Wikipedia article, malicious editors may introduce large numbers of inconsequential edits in an attempt to push inaccurate information to maturity on a shorter time scale. Consequently, there must also be a lower bound on the time period that an edit must wait before being considered mature. Fortunately, articles receiving a large number of automated, inconsequential edits are likely to show up frequently on lists of recently changed articles, hopefully prompting attention from other Wikipedia users who may be able to prevent the malicious activity. This policing activity might also be helped by introducing a page in Wikipedia listing articles that have received a sudden surge in editing frequency.
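The surge-listing page proposed above could be driven by a simple rate comparison. The following is a minimal sketch under stated assumptions, not something the paper implements: an article is flagged when its edit rate over a short recent window exceeds a multiple of its rate over a longer lookback. The function `surging_articles` and the `window`, `baseline`, `factor`, and `min_edits` parameters are hypothetical, and their values would need tuning.

```python
from typing import Dict, List


def surging_articles(
    edit_times: Dict[str, List[float]],  # article title -> edit timestamps (seconds)
    now: float,
    window: float,        # length of the "recent" window (hypothetical parameter)
    baseline: float,      # total lookback; must be longer than window
    factor: float = 5.0,  # how many times the baseline rate counts as a surge
    min_edits: int = 5,   # ignore articles with only a handful of recent edits
) -> List[str]:
    """Return titles whose recent edit rate is far above their earlier rate."""
    flagged = []
    for title, times in edit_times.items():
        recent = sum(1 for t in times if now - window <= t <= now)
        earlier = sum(1 for t in times if now - baseline <= t < now - window)
        recent_rate = recent / window
        baseline_rate = earlier / (baseline - window)
        if recent >= min_edits and recent_rate > factor * baseline_rate:
            flagged.append(title)
    return sorted(flagged)
```

Requiring a minimum number of recent edits keeps a single edit to a long-dormant article (whose baseline rate is zero) from being reported as a surge.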
I have modified a version of MediaWiki to implement the colorization process I'm describing in this paper. These changes are not yet robust enough to be integrated with the official MediaWiki project. However, they do help evaluate the technical challenges associated with this proposal and illustrate its value.
MediaWiki articles have a number of tabs at the top which provide access to different features related to the article. The tabs are "article," "discussion," "edit this page," and "history." I added a fifth tab called "reliability," where users can click to access a colorized version of the article.
As this information is likely to be read more often than it is written, it makes sense to perform colorization on edit rather than on read. I added a new mediumtext field to the cur table in the MediaWiki database, called cur_reliability, where a colorized version of each article is written every time the article is edited. This information is displayed when the user clicks on the reliability tab.
There are a number of constants that must be selected in order for this colorization process to work. The most important of these is the number of edits that text in an article must survive before it is considered mature. Call this value Emature. Also important are the upper bound on the amount of time an article can go without being edited before its contents are considered mature, and the lower bound on the amount of time text in an article must wait before being considered mature. Call these values Tvenerable and Tfresh, respectively.
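One way to keep the three constants together, and to enforce the ordering implied by calling Tfresh a lower bound and Tvenerable an upper bound, is a small configuration object. This is an illustrative sketch: the class and field names are hypothetical, and only the example Emature of 50 comes from the text; the 90-day and 1-day durations in the usage line are placeholders, not recommendations.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ReliabilityConfig:
    """The three tuning constants described in the text (names are illustrative)."""

    e_mature: int           # edits text must survive before it is mature (Emature)
    t_venerable: timedelta  # upper bound: untouched this long means mature (Tvenerable)
    t_fresh: timedelta      # lower bound: nothing is mature sooner than this (Tfresh)

    def __post_init__(self) -> None:
        if self.e_mature < 1:
            raise ValueError("e_mature must be at least 1")
        if not timedelta(0) <= self.t_fresh < self.t_venerable:
            raise ValueError("need 0 <= t_fresh < t_venerable")
```

For example, `ReliabilityConfig(50, timedelta(days=90), timedelta(days=1))` is accepted, while swapping the two durations raises `ValueError`.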
In my implementation I chose four colors to represent text of varying degrees of maturity. Most high-tech cultures on Earth employ automobiles and have fairly consistent standards for traffic light coloring, so I used these colors to indicate the age of text. The newest text in an article is colored red, indicating that users should exercise caution in relying on it. Slightly older text is colored yellow. Text that is nearing maturity is colored green. Mature text is colored black.
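Anticipating the cutoffs defined later in this section (green at two thirds of Tblack, yellow at one third), the color scheme amounts to bucketing text by age. The sketch below expresses that bucketing directly; it is a simplification, since the implementation assigns colors by diffing revisions rather than by measuring each word's age, and the function name is hypothetical.

```python
def color_for_age(age: float, age_black: float) -> str:
    """Map the age of a piece of text to one of the four traffic-light colors.

    age_black is the age at which text counts as mature (Tblack in the text);
    the green and yellow cutoffs sit at two thirds and one third of it.
    """
    if age >= age_black:
        return "black"   # mature
    if age > age_black * 2 / 3:
        return "green"   # nearing maturity
    if age > age_black / 3:
        return "yellow"  # slightly older
    return "red"         # newest text
```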
When an edit occurs to an article, the date of the Emature-th previous edit is retrieved from the database. To clarify, if Emature is 50, text must survive 50 edits before it is considered mature, so I go back to the 50th previous edit of the article in question and retrieve its date. This date is referred to as Tblack, and its associated revision is referred to as Rblack. All text in the current revision that is older than Tblack is colored black in the colorized revision.
If the Emature-th previous edit is older than Tvenerable, then I search for the most recent edit that is older than Tvenerable. That edit becomes Rblack, and Tblack is set to Tvenerable. If the Emature-th previous edit is newer than Tfresh, then I search for the most recent edit that is older than Tfresh. That edit becomes Rblack, and Tblack is set to Tfresh. It's also possible that an article won't have Emature previous revisions. In this case, Tblack is set to Tvenerable, and I only have an Rblack revision if there is a revision older than Tvenerable.
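The selection of Tblack and Rblack in the two paragraphs above can be sketched in Python. Assumptions, not taken from MediaWiki: revision history is a list ordered newest first with the edit being saved at index 0 (so the Emature-th previous edit sits at index Emature), timestamps are plain seconds, and times such as Tblack are expressed as ages (seconds before `now`) rather than dates. `Revision`, `newest_older_than`, and `select_black` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Revision:
    rev_id: int
    timestamp: float  # seconds since the epoch (illustrative representation)


def newest_older_than(history: List[Revision], now: float, age: float) -> Optional[Revision]:
    """history is ordered newest first, so the first hit is the newest match."""
    for rev in history:
        if now - rev.timestamp > age:
            return rev
    return None


def select_black(history: List[Revision], now: float, e_mature: int,
                 t_venerable: float, t_fresh: float) -> Tuple[float, Optional[Revision]]:
    """Return (age of Tblack, Rblack). history[0] is the edit being saved."""
    if len(history) > e_mature:
        candidate = history[e_mature]          # the Emature-th previous edit
        age = now - candidate.timestamp
        if age > t_venerable:                  # older than the upper bound
            return t_venerable, newest_older_than(history, now, t_venerable)
        if age < t_fresh:                      # newer than the lower bound
            return t_fresh, newest_older_than(history, now, t_fresh)
        return age, candidate
    # Fewer than e_mature earlier edits: fall back to the upper bound.
    return t_venerable, newest_older_than(history, now, t_venerable)
```

A `None` in the second position corresponds to the case where no Rblack revision exists.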
I also need to find two other times and their associated revisions. Tgreen is two thirds as old as Tblack, and Rgreen is the newest revision older than Tgreen and newer than Tblack. Tyellow is one third as old as Tblack, and Ryellow is the newest revision older than Tyellow and newer than Tgreen.
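Continuing with the same hypothetical representation (history ordered newest first, times as ages in seconds), the green and yellow cutoffs and their revisions might be found as follows. Either result can be `None`, which corresponds to a missing revision; the function names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Revision:
    rev_id: int
    timestamp: float  # seconds since the epoch (illustrative representation)


def newest_between(history: List[Revision], now: float,
                   younger: float, older: float) -> Optional[Revision]:
    """Newest revision whose age lies strictly between `younger` and `older`."""
    for rev in history:  # newest first, so the first hit is the newest
        if younger < now - rev.timestamp < older:
            return rev
    return None


def select_green_yellow(history: List[Revision], now: float,
                        age_black: float) -> Tuple[Optional[Revision], Optional[Revision]]:
    age_green = age_black * 2 / 3   # Tgreen: two thirds as old as Tblack
    age_yellow = age_black / 3      # Tyellow: one third as old as Tblack
    r_green = newest_between(history, now, age_green, age_black)
    r_yellow = newest_between(history, now, age_yellow, age_green)
    return r_green, r_yellow
```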
At the end of this process, I may have four revisions: Rblack, Rgreen, Ryellow, and Rred. Rred is the new revision submitted in the present edit. It's possible that I am missing one or more of these revisions because they do not exist. In any case, I color the text by producing diffs of the revisions that I have.
In the simplest case, I have only one revision. If that revision is Rblack, I simply write it into the database as my reliability text. If the revision is another color, then I color it by adding HTML SPAN tags to the text. I modified the Cascading Style Sheet for the default MediaWiki theme to include classes for each text color. The colorized version is then written to the database as the reliability text.
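A minimal sketch of the single-revision case, assuming CSS class names of the form `reliability-red`, `reliability-yellow`, and `reliability-green` (hypothetical; the paper does not name its classes). It wraps each non-empty line separately rather than the whole text, to avoid a single SPAN crossing paragraph breaks, and it does not HTML-escape the text because the text is still wikitext.

```python
def colorize_whole(text: str, color: str) -> str:
    """Color an entire revision; black (mature) text is left untouched."""
    if color == "black":
        return text
    lines = text.split("\n")
    # Wrap each non-empty line separately so no SPAN crosses a paragraph break.
    return "\n".join(
        f'<span class="reliability-{color}">{line}</span>' if line else line
        for line in lines
    )
```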
The next case to consider is the one where we have two revisions, let's say Rblack and Rgreen. I produce a diff of these two revisions using MediaWiki's built-in Diff function. This function returns lines of text and information indicating how each line has changed. There are four possibilities for each line: copy (the text hasn't changed), add (the text is new in the later revision), delete (text from the old revision is not present in the new revision), and change (something within that line has changed).
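Python's `difflib.SequenceMatcher` can stand in for MediaWiki's Diff class to illustrate the four cases. Its opcodes `equal`, `insert`, `delete`, and `replace` map naturally onto copy, add, delete, and change; the mapping and the function name `line_diff` are illustrative, not MediaWiki's API.

```python
import difflib
from typing import List, Tuple

# Map difflib's opcode names onto the four cases listed in the text.
OP_NAMES = {"equal": "copy", "insert": "add", "delete": "delete", "replace": "change"}


def line_diff(old_lines: List[str], new_lines: List[str]) -> List[Tuple[str, List[str], List[str]]]:
    """Return (operation, old lines, new lines) triples describing the diff."""
    sm = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    return [
        (OP_NAMES[tag], old_lines[i1:i2], new_lines[j1:j2])
        for tag, i1, i2, j1, j2 in sm.get_opcodes()
    ]
```

For instance, diffing `["a", "b", "c"]` against `["a", "B", "c", "d"]` yields a copy, a change, another copy, and an add.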
When a line has been copied, I simply append that line to my final reliability revision. The line will be colored black, as that is the default. When a line has been added, I surround the line with HTML SPAN tags indicating that it should be colored, and then I append it to the final revision. I don't do anything with lines that have been deleted; I'll consider this case later in this paper. The more complex case is where the line has changed. I modified the WordLevelDiff function in MediaWiki's DifferenceEngine to add SPAN tags around new words that have been added to text lines.
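Using `difflib` again as a stand-in for MediaWiki's diff code, the two-revision case might look like the following sketch. `colorize_pair`, `color_changed_line`, and `span` are hypothetical names, and the class naming is assumed. The word-level step splits on whitespace, so runs of spaces inside a changed line are normalized; when a changed block pairs unequal numbers of old and new lines, the extra new lines are treated as added.

```python
import difflib
from typing import List


def span(text: str, color: str) -> str:
    return f'<span class="reliability-{color}">{text}</span>'


def color_changed_line(old: str, new: str, color: str) -> str:
    """Word-level diff: wrap words that are new in `new` (whitespace is normalized)."""
    old_words, new_words = old.split(), new.split()
    out: List[str] = []
    sm = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    for tag, _i1, _i2, j1, j2 in sm.get_opcodes():
        chunk = " ".join(new_words[j1:j2])
        if not chunk:
            continue  # pure deletion: nothing visible in the new line
        out.append(chunk if tag == "equal" else span(chunk, color))
    return " ".join(out)


def colorize_pair(old_text: str, new_text: str, color: str) -> str:
    old_lines, new_lines = old_text.split("\n"), new_text.split("\n")
    out: List[str] = []
    sm = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":            # "copy": the line stays black
            out.extend(new_lines[j1:j2])
        elif tag == "insert":         # "add": wrap the whole line
            out.extend(span(line, color) if line else line for line in new_lines[j1:j2])
        elif tag == "replace":        # "change": word-level diff per line
            olds = old_lines[i1:i2]
            for k, new in enumerate(new_lines[j1:j2]):
                if k < len(olds):
                    out.append(color_changed_line(olds[k], new, color))
                else:
                    out.append(span(new, color) if new else new)
        # "delete": removed lines are simply dropped, as in the text
    return "\n".join(out)
```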
Additional complexity arises when I have completed a diff of Rblack and Rgreen and wish to continue the process by coloring the differences between Rgreen and Ryellow. The text of my colorized version of Rblack and Rgreen is the same as the text in Rgreen, with the addition of some HTML SPAN tags. I perform my diff of Rgreen and Ryellow, and follow along in my colorized version of Rgreen, line for line. When text is unchanged between Rgreen and Ryellow, I copy that line from my colorized version of Rgreen. When lines have changed, I merge the output from the WordLevelDiff with the content of the line in my colorized version of Rgreen. New lines in Ryellow are copied into my output with the addition of SPAN tags.
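Rather than merging HTML line by line as described above, the same layering can be sketched by carrying a color tag for every word through successive diffs: words that survive a diff keep their older color, and words that appear for the first time take the color of the revision that introduced them. This is a deliberately different, simplified technique (it ignores line structure and the SPAN-merging step), shown only to make the layering idea concrete; `layer_colors` and `render` are hypothetical names.

```python
import difflib
from itertools import groupby
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # (word, color) pairs


def layer_colors(revisions: List[Tuple[str, str]]) -> Tagged:
    """revisions: (text, color) pairs ordered oldest first, e.g. Rblack .. Rred."""
    first_text, first_color = revisions[0]
    tagged: Tagged = [(w, first_color) for w in first_text.split()]
    for text, color in revisions[1:]:
        words = text.split()
        sm = difflib.SequenceMatcher(a=[w for w, _ in tagged], b=words, autojunk=False)
        merged: Tagged = []
        for tag, i1, i2, j1, j2 in sm.get_opcodes():
            if tag == "equal":
                merged.extend(tagged[i1:i2])                  # keep the older color
            elif tag in ("insert", "replace"):
                merged.extend((w, color) for w in words[j1:j2])
            # "delete": words removed in this revision disappear
        tagged = merged
    return tagged


def render(tagged: Tagged) -> str:
    """Join words, wrapping each run of same-colored non-black words in one SPAN."""
    parts = []
    for color, group in groupby(tagged, key=lambda item: item[1]):
        chunk = " ".join(w for w, _ in group)
        parts.append(chunk if color == "black" else f'<span class="reliability-{color}">{chunk}</span>')
    return " ".join(parts)
```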
As these reliability texts are created on edit, texts for infrequently edited articles may need to be updated with the passage of time. A cron job is required that searches for articles that haven't been edited in a third of Tvenerable and regenerates their reliability texts.
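The cron job's selection step could be as small as the following sketch, assuming a mapping from article titles to last-edit times; the function name and its inputs are hypothetical, and the regeneration itself is left out.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stale_articles(last_edited: Dict[str, datetime], now: datetime,
                   t_venerable: timedelta) -> List[str]:
    """Titles whose last edit is older than a third of Tvenerable."""
    cutoff = t_venerable / 3
    return sorted(title for title, ts in last_edited.items() if now - ts > cutoff)
```

As written, every dormant article is selected again on each run; a real job might also record when each reliability text was last regenerated and skip recent ones.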
To illustrate the value of these colorized texts I selected a Wikipedia article with vandalism that has survived multiple edits. Glenn Harlan Reynolds is a law professor at the University of Tennessee who maintains a popular weblog called Instapundit (Reynolds, 2002). Competing bloggers, jealous of Instapundit's success, started a rumor that Mr. Reynolds drinks energy drinks made by mixing puppy dogs in a blender, in a satirical attempt to reduce Mr. Reynolds' popularity (J., 2003). This information frequently finds its way into Mr. Reynolds' Wikipedia biography, as it did on 28 February 2005 (Glenn Reynolds, February 2005). This time the information managed to survive two edits before being removed on 14 March (Glenn Reynolds, March 2005).
The attached screen shot shows a colorized reliability text for this Wikipedia article as it appeared on 14 March, before the vandalism was removed. This is simply a demo, and in this case Emature was effectively set to 4. In a real use scenario it's likely that all of the colorized text would be red, given the timeframe. This version of the article makes it clear which text in the article is reliable and which text should be considered suspect.
This example also illustrates two problems with the present implementation. The first is my color choices. Wikipedia already uses the color red to indicate links to wiki keywords that do not yet have articles associated with them. Although the shade of red is different, overloading the meaning of the color could lead to confusion.
The second problem is that one of the editors changed a previous link to the wiki word "Instapundit" into links to two wiki words, "Instapundit" and "weblog." My SPAN tags interfere with MediaWiki's parsing of the special characters that indicate a link to a wiki word, which occurs on read rather than on write. There are a number of special character sequences that my SPAN tags might interfere with, and those cases will need to be handled in a production implementation.
There are many questions about this technique that warrant further investigation. To begin with, the color scheme may need to be reconsidered. Traffic light colors are easy to understand, but they can be confused with links to wiki words, and people who are color-blind may not be able to distinguish them. An alternative approach would be to shade the text with a single color, with the text becoming more opaque, or darker, as it becomes more mature. It may also be appropriate to make accommodations for blind users, who, of course, cannot see any color variations.
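The single-color alternative could map maturity onto a gray level. In this sketch (a hypothetical function, with the lightest gray of 200 out of 255 chosen arbitrarily so that fresh text stays legible), text darkens from light gray to black as its age approaches Tblack.

```python
def shade(age: float, age_black: float, lightest: int = 200) -> str:
    """Return a CSS gray that darkens from `lightest` to black as text matures."""
    maturity = min(max(age / age_black, 0.0), 1.0)  # 0 = brand new, 1 = mature
    level = round(lightest * (1 - maturity))
    return f"#{level:02x}{level:02x}{level:02x}"
```

Halfway to maturity, for example, the text is rendered as `#646464`.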
Appropriate values for Emature, Tvenerable, and Tfresh need to be selected. This choice should be informed by controlled studies involving large numbers of articles and large numbers of editors. It is possible that a one-size-fits-all value for Emature is unworkable in practice and that a more sophisticated scaling method will be required to handle vast differences in article editing frequency.
Deleted text also needs to be handled better. The passage "Glenn Reynolds does not drink puppy smoothies" would have a significantly different meaning if the word "not" were removed, but as currently implemented there would be no indication of this change in my colorized version. One potential approach to this problem is to introduce an icon indicating that text has been removed. When the user moves the mouse over the icon, the deleted text could be displayed.
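One way to sketch the proposed deletion icon is a word-level diff that inserts a small marker wherever words were removed, carrying the deleted words in a `title` attribute so that browsers show them as a tooltip on mouse-over. The marker text `[-]`, the class name, and the function name are hypothetical.

```python
import difflib
import html


def mark_deletions(old: str, new: str) -> str:
    """Word-level diff that leaves a hoverable marker wherever words were removed."""
    old_words, new_words = old.split(), new.split()
    out = []
    sm = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag in ("delete", "replace"):
            removed = html.escape(" ".join(old_words[i1:i2]), quote=True)
            # Browsers display the title attribute as a tooltip on mouse-over.
            out.append(f'<span class="reliability-deleted" title="{removed}">[-]</span>')
        if tag != "delete":
            out.extend(new_words[j1:j2])
    return " ".join(out)
```

Applied to the example above, removing "not" leaves a marker between "does" and "drink" whose tooltip reads "not".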
The present algorithm deals with most vandalism fairly seamlessly. If a vandal makes a page edit that is quickly reverted, the text added by the vandal won't be present in diffs of the latest revision and older revisions. The exception is the case in which a vandal deletes a large amount of text from an article and the vandalized revision happens to be selected by the algorithm as one of the revisions to use for coloring. In this case the deleted text, once restored, is likely to be interpreted as new text by the algorithm and colored accordingly. Protecting against this failure in most cases may be as simple as adding a check to ensure that any revision selected for coloring wasn't reverted by the next edit to the article, and, if it was, selecting the next oldest revision to represent that color instead.
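The revert check suggested above might be sketched as follows, assuming revision texts ordered oldest first and treating a revision as reverted only when the very next edit restores the exact text of the revision before it. Partial reverts would slip through this check, and stepping back can cross into an older color band; `pick_unreverted` is a hypothetical name.

```python
from typing import List


def pick_unreverted(texts: List[str], idx: int) -> int:
    """texts: revision texts ordered oldest first; idx: revision chosen for a color.

    Treat revision idx as vandalism if the very next edit restored the text of
    the revision before it (an exact revert), and step back to older revisions
    until the chosen one was not reverted in that way.
    """
    while 0 < idx < len(texts) - 1:
        reverted = texts[idx + 1] == texts[idx - 1] and texts[idx] != texts[idx - 1]
        if not reverted:
            break
        idx -= 1
    return idx
```

For example, if a vandal blanks the page and the next edit restores it, the blanked revision is skipped in favor of the one before it.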
In spite of its problems, Wikipedia is an enormously important information resource, used by a community of millions of people all over the world. I believe the popularity of Wikipedia stems from the fact that it fills an important niche in the constellation of information resources that was previously unserved. Improvements to this technology can have a positive impact on how these millions of users think and collaborate. Hopefully, I have demonstrated that the utility of collaborative wikis can be improved through the application of some relatively simple techniques. My intent in writing this paper is to spur discussion about the viability of my proposal, and to encourage others to pick up these questions where I've left off.
About the author
Tom Cross is the co-founder of MemeStreams, a reputation-enabled collaborative weblog. His day job involves computer security vulnerability research. He has a B.S. in Computer Engineering from the Georgia Institute of Technology.
Email: tom [at] memestreams [dot] net
Thanks to Tim Alexander, James Baldwin, Greg Conti, Jacob Langseth, Nick Levay, Jason Scott, and others whose conversations and encouragement contributed to this paper.
7 July 2005 London bombings, Wikipedia, 7 July 2005, at http://en.wikipedia.org/w/index.php?title=7_July_2005_London_bombings&oldid=18359349, accessed 25 June 2006.
2600: The Hacker Quarterly, Wikipedia, 16 June 2006, at http://en.wikipedia.org/w/index.php?title=2600:_The_Hacker_Quarterly&oldid=59026595, accessed 25 June 2006.
Church of the SubGenius, Wikipedia, 16 June 2006, at http://en.wikipedia.org/w/index.php?title=Church_of_the_SubGenius&oldid=58979995, accessed 25 June 2006.
Andrea Ciffolilli, 2003. Phantom authority, self-selective recruitment and retention of members in virtual communities: The case of Wikipedia, First Monday, volume 8, number 12 (December), at http://www.firstmonday.org/issues/issue8_12/ciffolilli/, accessed 25 June 2006.
Frank J., 2003. It's Fun to Be Spiteful, IMAO (19 April), at http://www.imao.us/archives/000567.html#000567, accessed 25 June 2006.
Jim Giles, 2005. Internet encyclopaedias go head to head, Nature, volume 438, number 7070 (15 December), pp. 900–901.
Glenn Reynolds, Wikipedia, 28 February 2005, at http://en.wikipedia.org/w/index.php?title=Glenn_Reynolds&oldid=10861064, accessed 25 June 2006.
Glenn Reynolds, Wikipedia, 14 March 2005, at http://en.wikipedia.org/w/index.php?title=Glenn_Reynolds&oldid=11180788, accessed 25 June 2006.
Joi Ito, 2004. Wikipedia attacked by ignorant reporter, (29 August), at http://joi.ito.com/archives/2004/08/29/wikipedia_attacked_by_ignorant_reporter.html, accessed 25 June 2006.
A. Neus, 2001. Managing information quality in virtual communities of practice, In: E. Pierce and R. Katz-Haas (editors). IQ 2001: Proceedings of the 6th International Conference on Information Quality at MIT. Cambridge, Mass.: MIT Sloan School of Management, and at http://opensource.mit.edu/papers/neus.pdf.
Press releases/English Wikipedia Publishes Millionth Article, Wikimedia Foundation, 11 March 2006, at http://wikimediafoundation.org/w/index.php?title=Press_releases/English_Wikipedia_Publishes_Millionth_Article&oldid=13571, accessed 25 June 2006.
Eric S. Raymond, 1998. The Cathedral and the Bazaar, First Monday, volume 3, number 3 (March), at http://www.firstmonday.org/issues/issue3_3/raymond/, accessed 25 June 2006.
Glenn Reynolds, 2002. about me, (24 May), at http://instapundit.com/about.php, accessed 25 June 2006.
Jason Scott, 2004. The Great Failure of Wikipedia, at http://ascii.textfiles.com/archives/000060.html, accessed 25 June 2006.
John Seigenthaler, 2005. A false Wikipedia ‘biography’, USA Today (29 November), at http://www.usatoday.com/news/opinion/editorials/2005-11-29-wikipedia-edit_x.htm, accessed 25 June 2006.
Topics in cryptography, Wikipedia, 4 March 2006, at http://en.wikipedia.org/w/index.php?title=Topics_in_cryptography&oldid=42135113, accessed 25 June 2006.
M. Wattenberg and F. B. Viegas, 2003. history flow, at http://www.research.ibm.com/history/index.htm, accessed 25 June 2006.
Wikipedia:Criticisms, Wikipedia, 28 May 2006, at http://en.wikipedia.org/w/index.php?title=Wikipedia:Criticisms&oldid=55537605, accessed 25 June 2006.
Wikipedia:Size comparisons, Wikipedia, 16 June 2006, at http://en.wikipedia.org/w/index.php?title=Wikipedia:Size_comparisons&oldid=58868290, accessed 25 June 2006.
Paper received 18 July 2006; accepted 20 August 2006.
This work is licensed under a Creative Commons Public Domain License.
Puppy smoothies: Improving the reliability of open, collaborative wikis by Tom Cross
First Monday, volume 11, number 9 (September 2006),
A Great Cities Initiative of the University of Illinois at Chicago University Library.
© First Monday, 1995-2017. ISSN 1396-0466.