The dangers of webcrawled datasets

Graeme Baxter Bell


This article highlights legal, ethical and scientific problems that arise from using large experimental datasets gathered from the Internet, in particular image datasets. Such datasets are currently used in research on topics such as information forensics and image processing. This article strongly recommends against webcrawling as a means of generating experimental datasets, and proposes safer alternatives.


Keywords: internet; webcrawler; webcrawling; data gathering; image-processing; information forensics




A Great Cities Initiative of the University of Illinois at Chicago University Library.

© First Monday, 1995-2019. ISSN 1396-0466.