The Tor network is an encrypted network that allows its users to access the Internet anonymously. The Tor network also hosts hidden services, which constitute the infamous dark Web. These hidden services are used to carry out activities that are otherwise illegal and unethical on the surface Web, including the distribution of child pornography, access to illegal drugs, and the sale of weapons. While Tor hidden services provide a platform for uncensored ventures and the free expression of thoughts, these uses are outnumbered by grey activities. In this paper, we collected the addresses of 25,742 hidden services and analyzed the data for the 6,227 services that were available, with the help of a custom-made crawler written in Python. We manually classified the data into 31 different categories to identify the nature of content available on the dark Web. The results indicate that a large share of hidden services provide illegal content, along with a large number of scam sites. Non-English content was also studied and categorized. Russian was the leading language of the dark Web after English, and hidden services hosting forums and blogs predominated over other content.
2. Related work
3. Collection of data
4. Categorization of hidden services
5. Conclusion and future work
The dark Web is a subset of the Internet that is hidden within an encrypted layer below the regular Web and requires special tools to access. The Onion Router, or Tor, is the most common tool for accessing the dark Web. It provides its users with a platform that enables them to remain anonymous while surfing the Internet, and the same Tor network hosts the dark Web within it. The Tor browser uses the onion routing technique (Dingledine, et al., 2004), in which a circuit consisting of three nodes is established between the client and the server for communication. The client wraps the data in layers of encryption, and each node removes one layer as the data passes through the circuit. This hides the actual IP address of the client and enables a given user to browse the Web without being tracked. Details on the workings of onion routing and the Tor browser can be obtained from the official Web site (Tor Project, at https://www.torproject.org/).
The dark Web is composed of sites called hidden services. A hidden service is a Web service hosted on the dark Web whose IP address is hidden from the outside world, thus rendering it untraceable. A typical address consists of 16 characters that are difficult to read or remember (for example, tgazrkvyctepwxnp.onion). Unlike regular sites, hidden services end with the .onion domain and require the Tor browser for access. Moreover, the user must already know the URL of a given hidden service. Some Web sites (Hidden Wiki, at https://thehiddenwiki.org) provide links to many hidden services, but these lists are limited. Hidden services are not indexed by traditional search engines, like Google, so no search engine provides a complete list of available hidden services.
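As an illustration of the address format described above, the following minimal sketch checks whether a string has the shape of a 16-character onion address. It assumes that such addresses use the base32 alphabet (lowercase letters a-z and digits 2-7); the function name is hypothetical and is not part of any tool used in this study.

```python
import re

# Assumption: 16-character onion addresses are drawn from the base32
# alphabet, i.e. lowercase letters a-z and digits 2-7.
ONION_16 = re.compile(r"[a-z2-7]{16}\.onion")

def looks_like_onion_address(host: str) -> bool:
    """Return True if host has the shape of a 16-character onion address."""
    return ONION_16.fullmatch(host.strip().lower()) is not None

print(looks_like_onion_address("tgazrkvyctepwxnp.onion"))
print(looks_like_onion_address("example.com"))
```

A check of this kind only validates the shape of an address; it says nothing about whether the hidden service exists or is online.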
The dark Web has been used for both legitimate and criminal activities. For some, the dark Web provides the means to exercise their freedom of expression, while others may use it for criminal purposes. Individual users and communities seeking online anonymity can utilize the dark Web for communications (Tor Project, at https://www.torproject.org/). Government bodies, like law enforcement and intelligence agencies, can leverage this platform to conduct covert operations and Internet surveillance (Chertoff and Simon, 2015). This platform is extremely useful in countries where the government controls the access, distribution, and publication of Internet content. In fact, there is a high demand for this platform among residents of repressive regimes who wish to access blocked content (Jardine, 2018a).
Just as ordinary people benefit from the anonymity offered by the Tor platform, so do criminals, militants, and anti-social elements. One study reports that offenders are increasingly relying on online platforms to carry out their crimes (Finklea and Theohary, 2015). For example, traditional crimes, like the sale of prohibited drugs, are now being carried out on the Internet (Van Buskirk, et al., 2014). In addition, the dark Web has a history of being a breeding ground for many illicit activities (Greenberg, 2014). Illegal drugs, weapons, and stolen identities are offered on this platform. Child abuse, extreme human experiments, gore, and violence are available as well (Sanchez-Rola, et al., 2017). A prime example of criminal activity on the dark Web is Silk Road, an online marketplace infamous for the illegal sale of drugs (Christin, 2012). The marketplace was shut down by the U.S. Federal Bureau of Investigation (FBI) in 2013 (Clark, 2013). A report also states that ISIS has been utilizing the dark Web to spread its network and to generate funds (Bertrand, 2015).
As is evident from these examples, dark Web technology has two faces that create a dilemma for policy-makers. The dual use of the technology makes it difficult for law enforcement to bring down the network. On the one hand, it emerges as a boon for the citizens of repressive regimes; on the other hand, it facilitates criminal and illicit activities (Jardine, 2015). Therefore, learning more about dark Web content allows for a better understanding of the cumulative harms and benefits that result from a technology like Tor.
So far, studies on the dark Web appear to address only English content. Findings on non-English content have been limited to the number of services available on the dark Web. As far as we know, there is almost no information available regarding the nature of the non-English content accessible on the dark Web.
The aim of this study is to gain insight into which services are being hosted on the dark Web and their legality. We also attempt to secure information on non-English content and to classify this content into various categories. To gather data, we built a custom Web crawler in Python that collected over 25,000 unique onion domains. Starting from a single URL, the crawler collected onion domains over multiple crawl sessions spanning a period of one month. The collected dataset was then manually classified into different categories.
According to the Tor Project, there were approximately 80,000 unique and active hidden services at the time of the crawl (Tor Metrics, at https://metrics.torproject.org/hidserv-dir-onions-seen.html), yet only 6,227 hidden services were found online. The surprising difference in numbers could be due to the presence of chat services, like Ricochet (Wikipedia, 2019a), TorChat (Wikipedia, 2019b), and Tor Messenger (Wikipedia, 2019c), where each user is identified by a distinct 16-character .onion address.
The rest of this paper is organized as follows. In the next section, we describe the research already completed in this area. Section 3 examines the methodology and tools used to collect data. Section 4 presents the categorization of hidden services. Finally, the last section provides some concluding observations and discusses future opportunities.
2. Related work
Anonymity offered on the dark Web can be leveraged for a variety of tasks. The anonymous platform ensures the secrecy of communications in different domains. For instance, individuals can discuss personally confidential matters on forums and private chat rooms, while business partners can securely communicate to finalize deals and agreements (Tor Project, at https://www.torproject.org/). Tor has also been reportedly used for evading Internet censorship by repressive governments (Jardine, 2018b). Activists protesting against authoritarian regimes may also use it to protect communication from mass surveillance. Whistleblowers utilize Tor for securely sharing classified information with journalists (Jardine, 2018b).
The Tor network makes it possible for criminals to conduct illegal transactions on the dark Web. A varied range of illicit and nefarious activities, like the sale and purchase of legally prohibited drugs, distribution of child abuse content, hiring hitmen, unethical hacking guides, counterfeits, and weapons, are carried out on the dark Web (Chertoff and Simon, 2015). The trading of prohibited drugs is one of the most common activities on the dark Web (Dolliver, 2015), with several markets specializing in exotic drugs (Soska and Christin, 2015). Terrorists may also use the dark Web to spread propaganda, generate funding, and develop terrorist plans (Weimann, 2016).
The variety of services hosted on the dark Web has attracted researchers to identify its nature and composition. These studies have explored content available on the dark Web and classified this content into several categories. While these categorizations focus largely on supply side indicators with potentially limited utility as threat metrics (Jardine, 2019), they provide a clear sense of the types of content that are available on the Tor-hosted dark Web. Guitton (2013) collected a list of hidden services from three available databases. Some 1,171 hidden services were analyzed and classified into 23 different categories. These 23 categories were further divided into two broad categories, as ethical and unethical. Guitton found that 45 percent of content in hidden services falls into an unethical category of which 18 percent appeared as child pornography. This result was further confirmed by Spitters, et al. (2014), using a topic modeling technique. They found that half of the adult content sites offered child pornography. They identified 30 different languages used in dark Web content using a language classifier.
Biryukov, et al. (2014) used shortcomings in the Tor software to collect the onion addresses of 39,824 hidden services and adopted an automated approach to classify them into 18 categories, using MALLET (http://mallet.cs.umass.edu) and uClassify (https://www.uclassify.com). Their study claimed that nearly half of the hidden services were selling compromised accounts, counterfeits, and stolen items. Using Langdetect software, they identified 17 different languages, other than English, in which hidden services were offered, as well as their popularity among users. Owen and Savage (2016) examined the technical side of the dark Web, finding that the majority of services were running on Apache Web servers (https://httpd.apache.org).
In a six-month study, Owen and Savage (2015) explored Tor hidden services and their usage. They found that services related to drugs contributed to 15 percent of their dataset. This was followed by fraud sites (nine percent) and child abuse (two percent). However, in their study, it was observed that around 80 percent of the traffic generated from Tor was directed towards requests for child abuse content.
In yet another quantitative study of Tor, Intelliagg (2016) reported the presence of drug trafficking, weapons, forged documents, and credit card dumps, among other services. Compared with the two percent reported by Owen and Savage (2015), they found that one percent of content was related to child abuse.
Moore and Rid (2016) classified hidden services into 12 categories. Their aim was to develop a picture of what kinds of services were offered by Tor. Their crawler was designed to scrape only textual content, in order to avoid any explicit or illegal content. They clearly reported a profuse presence of illicit content in their results.
To create a more detailed picture of dark Web content, Al Nabki, et al. (2017) increased the number of possible categories for classification of hidden services. They manually divided 7,931 hidden services into 26 categories of which eight contained illegal content. They found that tf–idf (TFIDF or term frequency–inverse document frequency), along with logistic regression, produced the highest accuracy for classifying illegal activities.
These efforts to index hidden services demonstrated that both legal and illegal content is available on the dark Web. The dark Web ecosystem is highly unpredictable: old sites vanish and new sites emerge every day (Owenson, et al., 2018). Frequent changes in the number of hidden services demand continuous indexing efforts in order to keep some sort of record of content. These efforts should examine all content, especially non-English content.
3. Collection of data
The data for the study was collected in two steps. In the first step, we collected as many hidden service addresses as possible over a period of one month. This was accomplished with the help of a custom-made Web crawler built in Python, called Link-Grabber. The crawler was configured to connect to the Internet through the Tor network using a SOCKS proxy (Koblas and Koblas, 1992). As it was not possible to secure a complete list of hidden services at the start of data collection, the crawler was seeded with links available on the Hidden Wiki (https://thehiddenwiki.org). These links were crawled one by one and searched for fresh links. Newly found links were stored in a text file, seeds.txt, and any duplicate links were removed. At the end of this step, we had a list of 25,742 unique hidden service addresses, or onion addresses.
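The link-harvesting step can be sketched as follows. This is not the Link-Grabber code itself, which is not reproduced in this paper; it is a minimal illustration that assumes page markup has already been fetched (for example, through a local Tor SOCKS proxy) and that only 16-character onion addresses are of interest. The function names are hypothetical.

```python
import re

# Assumption: 16-character onion addresses (base32 alphabet a-z, 2-7).
ONION_RE = re.compile(r"\b([a-z2-7]{16}\.onion)\b")

def extract_onion_addresses(markup: str) -> set:
    """Return the distinct onion addresses mentioned in a page's markup."""
    return set(ONION_RE.findall(markup.lower()))

def merge_seeds(known: set, markup: str) -> list:
    """Add newly found addresses to `known`; return only the new ones, sorted."""
    fresh = extract_onion_addresses(markup) - known
    known |= fresh
    return sorted(fresh)

# Hypothetical page content containing one known, one duplicate, one new link.
page = (
    '<a href="http://tgazrkvyctepwxnp.onion/">one</a>'
    '<a href="http://TGAZRKVYCTEPWXNP.onion/index">duplicate</a>'
    '<a href="http://abcdefghijklmnop.onion/">two</a>'
)
seeds = {"tgazrkvyctepwxnp.onion"}
new_links = merge_seeds(seeds, page)
print(new_links)
```

Keeping the known addresses in a set makes duplicate removal implicit, so each newly crawled page contributes only addresses that have not been seen before.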
Once we had collected these onion addresses, we proceeded to the second step, which used another customized crawler, called Content-Grabber. This crawler downloaded the markup (HTML) of the opening page of each hidden service and stored it locally. All other content, such as images, audio, video, downloadable files, and hyperlinks, was ignored by the crawler.
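To illustrate how stored markup can be reduced to text for later inspection, the sketch below uses Python's standard html.parser module. It is an assumption about one workable approach, not a description of Content-Grabber; the class and function names are hypothetical.

```python
from html.parser import HTMLParser

class TextOnlyParser(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def visible_text(markup: str) -> str:
    """Return the text of a page (including its title) as one string."""
    parser = TextOnlyParser()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.chunks)

sample = (
    "<html><head><title>Demo</title><style>p{}</style></head>"
    "<body><img src='x.png'><p>Hello <b>world</b></p></body></html>"
)
print(visible_text(sample))
```

Images and other embedded media contribute nothing to the output because only text nodes are collected.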
Of the 25,742 hidden services discovered in the first step, only 6,227 were online during the second step; the remaining sites could not be accessed at the time of the crawl. This could be attributed to the short uptime of some hidden services (Owenson, et al., 2018). It is therefore difficult to crawl each and every hidden service hosted on the dark Web at any given moment. Scanning the dark Web also exposed hidden services that were cloned. Many of these clones were related to Bitcoin and forged documents. A single hidden service could have multiple clone instances, each running under a unique domain name.
While scraping hidden services, 53 sites forbade the crawler and hence could not be crawled. Respecting this status, we did not try to circumvent the restriction by adjusting the HTTP request header. However, to determine the type of content hosted on those sites, we inspected them manually via the Tor browser. For 31 sites, the browser was unable to connect to the server and the connection timed out. The remaining sites were successfully accessed, including sites that required login details and sites that showed internal server configuration pages.
4. Categorization of hidden services
To determine the nature of the dark Web, it is useful to identify the content hosted by hidden services. Classification involves examining the content of each hidden service and placing it under an appropriate category. We decided to classify hidden services manually rather than with automated tools. Manual classification was quite tedious, as there were more than 6,000 HTML files to classify, but we assumed that it would be more accurate than the output of any automated classifier.
In addition, there was a broad range of topics available on the dark Web, including blogs and multitopic forums which required a close examination in order to classify them correctly.
In the process of classification, we found hidden services that did not fit into any particular category; these were removed. Reasons for dismissal included:
- Text consisting of three or fewer words;
- Error returned by a hidden service, like server configuration error, database error, client-side script error, etc.;
- Only an image was displayed with no accompanying text;
- Empty or blank Web pages; and,
- Sites containing redirection links.
Altogether, 2,125 hidden services were removed for these reasons. After elimination, 4,102 of the 6,227 hidden services remained for classification. Of these 4,102 hidden services, 3,480 were in English and 622 were in other languages.
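The dismissal criteria listed above were applied by manual inspection. As a rough illustration only, they could be approximated in code as follows; the error phrases and the meta-refresh test are assumed heuristics, and the function name is hypothetical.

```python
import re
from typing import Optional

# Assumptions: these marker phrases and the meta-refresh test are illustrative
# heuristics only; the study applied the criteria by manual inspection.
ERROR_MARKERS = ("server configuration error", "database error", "script error")
META_REFRESH = re.compile(r"<meta[^>]+http-equiv=[\"']?refresh", re.IGNORECASE)
IMG_TAG = re.compile(r"<img\b", re.IGNORECASE)

def exclusion_reason(markup: str, text: str) -> Optional[str]:
    """Return why a page would be set aside, or None if it can be classified."""
    words = text.split()
    if not words:
        # No text at all: either an image-only page or an empty page.
        return "image only" if IMG_TAG.search(markup) else "empty page"
    if META_REFRESH.search(markup):
        return "redirection"
    lowered = text.lower()
    if any(marker in lowered for marker in ERROR_MARKERS):
        return "error page"
    if len(words) <= 3:
        return "three words or fewer"
    return None

print(exclusion_reason("<p>Coming soon</p>", "Coming soon"))
print(exclusion_reason("<img src='a.png'>", ""))
```

A filter like this could at most pre-sort pages for a human reviewer; borderline pages would still need to be read.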
4.1. English content
English is the most commonly used language, accounting for 3,480 hidden services. We identified 31 different categories, as described in Table 1. As the legality of content on the dark Web varies from jurisdiction to jurisdiction, we used United States laws as a basis to determine the legality of discovered content. The results clearly indicate that illicit and criminal content is common.
Table 1: Categories and count of Tor hidden services, English content.

| Category | Count | Category | Count | Category | Count |
|---|---|---|---|---|---|
| Adult content | 165 (4.7%) | Electronics | 60 (1.7%) | Other cryptocurrencies | 62 (1.8%) |
| Bitcoin doubling | 188 (5.4%) | Ethical hacking | 112 (3.2%) | Personal Web sites | 50 (1.4%) |
| Bitcoin mixer | 117 (3.4%) | Forged documents | 40 (1.1%) | Political | 9 (0.3%) |
| Bitcoin trading | 125 (3.6%) | Forums & others | 337 (9.7%) | Religious | 6 (0.2%) |
| Bitcoin wallets | 68 (2%) | Gambling-betting | 31 (0.9%) | Services | 358 (10.3%) |
| Books | 36 (1%) | Hosting | 131 (3.8%) | Software | 131 (3.8%) |
| CC dumps & others | 271 (7.8%) | Login | 127 (3.6%) | Tor | 66 (1.9%) |
| Counterfeits | 37 (1.1%) | Marketplace | 355 (10.2%) | Uncensored journalism | 70 (2%) |
| Directory | 128 (3.7%) | Music-entertainment | 44 (1.3%) | Violence | 98 (2.8%) |
| Drugs | 179 (5.1%) | News | 26 (0.7%) | Whistleblowers | 48 (1.4%) |
| Educational | 5 (0.1%) | | | | |
| Total illicit/illegal | 1,315 (38%) | Total licit/legal | 2,165 (62%) | Grand total | 3,480 |
From Table 1, we can see that Bitcoin related categories are a large part of hidden services in the dataset. This includes hidden services for Bitcoin mixing, Bitcoin doubling, Bitcoin wallets, and trading. The authenticity of sites claiming to double Bitcoin could not be confirmed. Such a large share of Bitcoin related services reveals its popularity for buying goods and services on the dark Web, despite the possibilities of fraud (Böhme, et al., 2015).
The Bitcoin mixer enhances the anonymity of Bitcoin transactions by pooling transactions into random combinations, making them highly complicated. This activity ensures that personal details of a given user performing Bitcoin transactions are not traceable. All these sites promise to hide transaction trails for nominal fees.
Bitcoin trading includes services for Bitcoin exchange where one can buy and sell Bitcoin. However, users may have to face the risk of fraud and loss while using these exchanges (Moore and Christin, 2013).
Bitcoin is used for a variety of transactions, including the acquisition of drugs (Christin, 2012). Terrorist groups have also shown interest in cryptocurrencies, which may increase in the future (Brantly, 2014; Bertrand, 2015). Bitcoin is also used for obtaining fake passports, driving licenses, and other identity documents for the U.S. or European Union (EU) countries (Baravalle, et al., 2016). Imagine the threats posed by an individual holding a fake passport for any EU member nation. The same person may secure guns and explosives illegally in exchange for Bitcoin (O’Neill, 2014), affecting the internal security and peace of any country. Figure 1 illustrates how Bitcoin could be used to perform criminal activities through the dark Web.
Figure 1: Role of Bitcoin in potential criminal activities on the dark Web.
Cryptocurrencies, including Bitcoin, are legal in the U.S. and elsewhere and can be used to acquire goods and services. A seller can accept payments in Bitcoin. However, any payments made in cryptocurrencies are taxable, similar to transactions in other forms of currency.
The category of Services contains a broad range, like e-mail, encryption-decryption, anonymity and privacy tools, escrow and jabber, and many others. It contributes around 10 percent of all hidden services. Services with unacceptable content, that may call for legal action, are described in Table 2.
Table 2: Variety of services available on the dark Web.

| Service | Description |
|---|---|
| Border crossings | Services to cross international borders through illegal means |
| DDoS attacks | Organized denial of service attacks against specific servers |
| Doxing | Posting personal information or images online to malign a given individual |
| Malware/ransomware providers | Services that offer malware and ransomware |
| iCloud activation | Services for unlocking stolen Apple devices |
| Pirated software | Distribution of versions of licensed software at highly reduced prices |
| Phishing Web page development | Services that create a clone of any Web page specifically for phishing |
The category Marketplace contains all of the hidden services operating as black markets, selling a range of illegal goods and services. The products may include, but are not limited to, stolen digital goods and jewelry, pirated movies, hacked Netflix accounts, and fake merchandise. One of the hidden services in this category claims to be selling human organs (liver, kidneys) for transplantation.
CC dumps & others contains hidden services offering cloned credit cards, details on hacked PayPal and bank accounts, social security numbers, and other personal information. Dumps refers to details on hacked and stolen credit cards, permitting the generation of cloned cards that can be used worldwide. The categories Counterfeits and Electronics provide fake currencies (U.S. dollars and euros) and stolen mobile phones, respectively.
Drugs were featured on 179 hidden services, offering cannabis, MDMA, stimulants, and others. These sites routinely claimed to deliver illegal products anywhere in the world, accepting payments in Bitcoin.
Adult content was found on 165 hidden services. We did not perform a deep analysis of this information, as doing so would have required downloading the content. Frequently, title tags provided sufficient clues, and our results were in agreement with Guitton (2013).
With respect to the legality of adult pornography, U.S. law agencies routinely apply the Miller test to determine whether content is obscene. Content that fails this test is usually subject to legal action. In India, laws permit watching or possessing adult pornographic content, but its production and distribution are against the law and may lead to legal action. Many European countries have laws that allow the distribution of pornography, but only to individuals above a specified age, often 18 years old. In some countries, like Saudi Arabia, accessing, watching, or distributing pornography is completely prohibited.
However, child pornography is illegal almost everywhere, including the U.S., India, and member states of the EU. Nevertheless, the dark Web features a considerable amount of child pornography. Owen and Savage (2016) found that a large share of Tor network traffic was directed at child abuse content. Bestiality and the distribution of animal pornography have mixed legal status across jurisdictions.
The category Violence includes, obviously, violent content, capable of harming individuals or masses. Content in this category is described in Table 3.
Table 3: Violent content available on the dark Web.

| Service | Description |
|---|---|
| Hit man | Hire a hit man to murder/assassinate your rival. Charges depend on the status of the individual to be assassinated. |
| Arms and ammunition | Acquire guns like AK-47 as well as bullets and bombs illegally |
| Guides | Learn to manufacture explosives at home using easily available ingredients (obviously not for educational purposes) |
| Human experiments | Videos showing extreme torture of victims usually by mutilation |
| Red room | Online live rooms where a victim (usually female) is brutally assaulted per messages received from customers (paying in Bitcoin). |
| Terrorism | Web sites promoting terror propaganda, as well as offering tips on how to become a terrorist. |
| Self-destruction | Step-by-step procedures to physically harm or kill oneself with minimal pain using different methods. |
| Gore | Images showing horrific bloodshed, accidental deaths, medical procedures. |
The category Forged documents includes services that claim to provide genuine passports for a variety of countries, like the United States, Canada, and member nations of the European Union. Also available are driving licenses, birth certificates, social security numbers, academic degrees, and other documents. The prices of these products vary; U.S. citizenship documents cost US$500.
Other important categories that do not seem to be offering illegal content are described below.
The category Forums & others contains subcategories, such as blogs, forums, social networks, anonymous postings, and imageboards. Altogether, this category contributes eight percent of all content. The kinds of discussions that take place in these forums need to be explored in future research.
Another category, Directory, comprises 128 directory and wiki services that provide links to other resources and hidden services. Some also provide information on the current status of a given site.
One contrasting feature of the dark Web that differentiates it from the visible Web is the amount of educational content present on it. We found only five hidden services in the dataset that offered educational content.
Our results found approximately five percent of hidden services offering drugs, in agreement with findings by Guitton (2013). However, other research found up to 15 percent of services that were drug-related (Owen and Savage, 2015; Moore and Rid, 2016). These differences could be due to the high turnover of services and sites within the ecosystem of the dark Web. It must be noted that the percentage of services trading in weapons remained almost constant, at around 1.5 percent, across various indexing efforts. One important finding, which is in line with Moore and Rid (2016), is the near absence of hidden services promoting Islamic terror propaganda. To be specific, only five such sites were found in the present work.
Bitcoin and other cryptocurrency-related services are a growing part of the dark Web; the percentage of these sites doubled in our study compared with previous studies. The growing use of cryptocurrencies also opens up possibilities for scams and fraud. Overall, illicit/illegal content in our study totals 38 percent, less than that reported by Guitton (2013) and Moore and Rid (2016). This lower share might be due to the recent proliferation of Bitcoin-related sites; the collective share of Bitcoin-related sites is nearly 14 percent. Excluding Bitcoin sites from the comparison raises illegal content to 45 percent, close to the levels of illegal content found in past studies. Note that this comparison covers only content available in English, while other studies reported overall content, irrespective of language.
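The shares quoted in this comparison can be reproduced from the counts in Table 1. The short calculation below is a sketch under two stated assumptions: that cryptocurrency-related sites are all counted on the licit side, and that the text does not say whether "Bitcoin sites" includes the Other cryptocurrencies category, so both variants are shown.

```python
# Counts copied from Table 1 (English content).
english_total = 3480
illicit = 1315
bitcoin = {"doubling": 188, "mixer": 117, "trading": 125, "wallets": 68}
other_crypto = 62

bitcoin_sites = sum(bitcoin.values())
share_bitcoin = 100 * bitcoin_sites / english_total
share_illicit = 100 * illicit / english_total

# Assumption: all cryptocurrency sites sit on the licit side, so removing
# them shrinks only the denominator.
share_without_bitcoin = 100 * illicit / (english_total - bitcoin_sites)
share_without_crypto = 100 * illicit / (english_total - bitcoin_sites - other_crypto)

print(f"Bitcoin-related sites: {bitcoin_sites} ({share_bitcoin:.1f}%)")
print(f"Illicit share, all English sites: {share_illicit:.1f}%")
print(f"Illicit share, Bitcoin sites excluded: {share_without_bitcoin:.1f}%")
print(f"Illicit share, all cryptocurrency sites excluded: {share_without_crypto:.1f}%")
```

Under these assumptions the four Bitcoin categories account for about 14.3 percent of English sites, and the illicit share rises from about 37.8 percent to about 44.1 percent (Bitcoin excluded) or 45.0 percent (all cryptocurrency sites excluded).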
4.2. Non-English content
There were 622 hidden services available in non-English languages. We used Google Translate (https://translate.google.com/) to manually classify each hidden service from the dataset, finding 31 different languages. Russian accounts for 29 percent, followed by German, Spanish, and French. Figure 2 illustrates the most common languages besides English and their percentages on the dark Web. European languages top the chart in terms of their availability on the dark Web. The complete list of languages is provided in Table 4.
Figure 2: Major languages other than English present on the dark Web.
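Table 4: Non-English languages available on the dark Web.

| Number | Language | Count |
|---|---|---|
| 1 | Russian | 183 |
| 2 | German | 87 |
| 3 | Spanish | 56 |
| 4 | French | 53 |
| 5 | Portuguese | 44 |
| 6 | Chinese | 30 |
| 7 | Indonesian | 28 |
| 8 | Arabic | 19 |
| 9 | Italian | 18 |
| 10 | Finnish | 15 |
| 11 | Latin | 14 |
| 12 | Japanese | 9 |
| 13 | Czech | 8 |
| 14 | Dutch | 7 |
| 15 | Korean | 7 |
| 16 | Thai | 7 |
| 17 | Turkish | 7 |
| 18 | Polish | 5 |
| 19 | Catalan | 4 |
| 20 | Hebrew | 4 |
| 21 | Swedish | 4 |
| 22 | Danish | 3 |
| 23 | Bosnian | 2 |
| 24 | Afrikaans | 1 |
| 25 | Bengali | 1 |
| 26 | Bulgarian | 1 |
| 27 | Esperanto | 1 |
| 28 | Greek | 1 |
| 29 | Luxembourgish | 1 |
| 30 | Ukrainian | 1 |
| 31 | Vietnamese | 1 |
| | Total | 622 |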
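As a quick check of the language shares, the counts in Table 4 can be converted into percentages of the 622 non-English hidden services. The snippet below is only a worked calculation on the published counts; the variable names are illustrative.

```python
# Counts copied from Table 4.
language_counts = {
    "Russian": 183, "German": 87, "Spanish": 56, "French": 53,
    "Portuguese": 44, "Chinese": 30, "Indonesian": 28, "Arabic": 19,
    "Italian": 18, "Finnish": 15, "Latin": 14, "Japanese": 9, "Czech": 8,
    "Dutch": 7, "Korean": 7, "Thai": 7, "Turkish": 7, "Polish": 5,
    "Catalan": 4, "Hebrew": 4, "Swedish": 4, "Danish": 3, "Bosnian": 2,
    "Afrikaans": 1, "Bengali": 1, "Bulgarian": 1, "Esperanto": 1,
    "Greek": 1, "Luxembourgish": 1, "Ukrainian": 1, "Vietnamese": 1,
}

non_english_total = sum(language_counts.values())

# Rank languages by number of hidden services and keep the four largest.
top_four = sorted(language_counts.items(), key=lambda kv: kv[1], reverse=True)[:4]
for language, count in top_four:
    share = 100 * count / non_english_total
    print(f"{language}: {count} sites, {share:.1f}% of non-English services")
```

The counts sum to 622, and Russian's 183 sites correspond to roughly 29 percent of the non-English total.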
Non-English content was classified into the same categories as used for English content. Only 18 categories, of 31, could be identified for non-English content. Sites displaying messages of “Down for maintenance” or “Moved to a new address” were removed before classification, amounting to 68 out of 622 sites. The number of hidden services in each category, along with their respective languages, is described in Table 5. U.S. laws were applied as a guideline to determine the legality of examined content. The category Forums & others had the largest number of hidden services. Forums and blogs were the most common services appearing on the dark Web whether in English or in other languages.
Table 5: Categorization of non-English content on the dark Web.

| Language | Categories (number of hidden services) | Illegal | Legal |
|---|---|---|---|
| Russian | Adult content 9; Drugs 31; Forged documents 7; Forums & others 38; Hosting 24; Login 4; Marketplace 13; Political 3; Religious 6; Violence 17 | 77 (51%) | 75 (49%) |
| German | Adult content 2; Drugs 4; Forums & others 22; Ethical hacking 2; Login 2; News 1; Services 36; Uncensored journalism 9 | 20 (26%) | 58 (74%) |
| Spanish | Adult content 13; Drugs 1; Forged documents 8; Forums & others 19; Ethical hacking 2; News 6; Personal Web sites 3; Political 4 | 22 (39%) | 34 (61%) |
| French | Adult content 5; Drugs 4; Forged documents 9; Forums & others 11; Ethical hacking 2; Hosting 4; Login 1; Marketplace 4; News 2; Personal Web sites 2; Religious 1; Uncensored journalism 8 | 22 (42%) | 31 (58%) |
| Portuguese | Adult content 8; Bitcoin trading 1; Drugs 1; Forums & others 13; Ethical hacking 5; Music & entertainment 4; News 8; Political 1; Religious 3 | 9 (20%) | 35 (80%) |
| Chinese | Adult content 2; Forums & others 5; News 5; Political 1; Services 8; Software 5 | 10 (38%) | 16 (62%) |
| Indonesian | Adult content 3; Forums & others 3; Ethical hacking 1; Music & entertainment 3; News 5; Personal Web sites 2; Services 4; Software 1; Uncensored journalism 3 | 7 (28%) | 18 (72%) |
| Italian | Adult content 4; Drugs 1; Forums & others 7; News 2; Political 2; Religious 1 | 5 (29%) | 12 (71%) |
| Arabic | Forums & others 4; News 6; Personal Web sites 4; Software 1 | 0 (0%) | 15 (100%) |
| Finnish | Drugs 2; Forged documents 4; Forums & others 2; Hosting 2; Login 2; Software 3 | 6 (40%) | 9 (60%) |
| Japanese | Adult content 6; Forums & others 1; Software 2 | 6 (67%) | 3 (33%) |
| Czech | Bitcoin trading 2; Music & entertainment 3; Forums & others 2 | 0 (0%) | 7 (100%) |
| Korean | Forums & others 3; Login 1; Personal Web sites 3 | 0 (0%) | 7 (100%) |
| Thai | Forged documents 2; Forums & others 1; Ethical hacking 4 | 2 (29%) | 5 (71%) |
| Turkish | Forums & others 3; Software 1; Uncensored journalism 1; Violence 2 | 2 (29%) | 5 (71%) |
| Dutch | Adult content 1; Forums & others 1; News 1; Personal Web sites 1; Software 2 | 1 (17%) | 5 (83%) |
| Polish | Login 3; Software 1 | 0 (0%) | 4 (100%) |
| Catalan | Political 4 | 0 (0%) | 4 (100%) |
| Hebrew | Forums & others 2; Uncensored journalism 2 | 0 (0%) | 4 (100%) |
| Swedish | Drugs 2; Forums & others 1; News 1 | 2 (50%) | 2 (50%) |
| Danish | Ethical hacking 1; Hosting 1; Violence 1 | 1 (33%) | 2 (67%) |
| Bosnian | Hosting 1; Marketplace 1 | 1 (50%) | 1 (50%) |
| Afrikaans | Religious 1 | 0 (0%) | 1 (100%) |
| Bengali | Violence 1 | 1 (100%) | 0 (0%) |
| Bulgarian | News 1 | 0 (0%) | 1 (100%) |
| Esperanto | Forums & others 1 | 0 (0%) | 1 (100%) |
| Greek | Political 1 | 0 (0%) | 1 (100%) |
| Latin | Drugs 1 | 1 (100%) | 0 (0%) |
| Luxembourgish | Personal Web sites 1 | 0 (0%) | 1 (100%) |
| Ukrainian | Political 1 | 0 (0%) | 1 (100%) |
| Vietnamese | Political 1 | 0 (0%) | 1 (100%) |
| Total | Adult content 53 (10%); Bitcoin trading 3 (1%); Drugs 47 (8%); Forged documents 30 (5%); Forums & others 139 (25%); Ethical hacking 17 (3%); Hosting 32 (6%); Login 13 (2%); Marketplace 18 (3%); Music & entertainment 10 (2%); News 38 (7%); Personal Web sites 16 (3%); Political 18 (3%); Religious 12 (2%); Services 48 (9%); Software 16 (3%); Uncensored journalism 23 (4%); Violence 21 (4%) | 195 (35%) | 359 (65%) |
The results indicate that the most common illegal non-English services were akin to those found in English, such as drugs, fake documents, and violent content. The category with the largest number of hidden services was Forums & others, which may simply be because individuals prefer to express themselves in their native languages. The same factor may explain the large number of news Web sites. Bitcoin-related services were almost absent, amounting to less than one percent of active sites, a marked difference from English content, where Bitcoin-related sites amounted to nearly 15 percent of the total. Adult content sites and fake document sites were more prominent in non-English content. Political sites were almost ten times more common than in English content.
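The category shares discussed here can be recomputed from the Total row of the table. The following is a minimal Python sketch, not part of the original study: the counts are copied from that row, while the names `category_counts` and `category_shares` are illustrative assumptions.

```python
# Site counts per category, copied from the "Total" row of the
# non-English table. Variable and function names are hypothetical.
category_counts = {
    "Adult content": 53,
    "Bitcoin trading": 3,
    "Drugs": 47,
    "Forged documents": 30,
    "Forums & others": 139,
    "Ethical hacking": 17,
    "Hosting": 32,
    "Login": 13,
    "Marketplace": 18,
    "Music & entertainment": 10,
    "News": 38,
    "Personal Web sites": 16,
    "Political": 18,
    "Religious": 12,
    "Services": 48,
    "Software": 16,
    "Uncensored journalism": 23,
    "Violence": 21,
}


def category_shares(counts):
    """Return each category's share of all sites, in percent."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


shares = category_shares(category_counts)
largest = max(shares, key=shares.get)

# Print categories from the largest share to the smallest.
for name, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24}{pct:5.1f}%")
```

The counts sum to 554, which matches the 195 illegal plus 359 legal services in the same row, so the rounded percentages in the table can be checked against a single denominator.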
Figure 3 illustrates the percentages of the top ten categories of non-English content compared with the corresponding English content present on the dark Web.
Figure 3: Category-wise comparison (in percentage) of English and non-English content.
French had the most varied content, spanning 12 categories. Hidden services offering drugs were available only in European languages (except for Latin), with the majority of drug-related services offered in Russian. Russian hosting services made up four percent of total non-English content, more than the share of hosting services in English content (3.8 percent).
Illegal content on the non-English dark Web was less common than legal content, and a large share of it was Russian. If illegal Russian content were removed from the total illegal content, the illegal share would decrease from 35 to 21 percent. Overall, the non-English dark Web was slightly less illegal than the English dark Web alone. This gap widened when the comparison took into account the share of Bitcoin sites in English.
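The arithmetic behind such a comparison can be sketched in Python. This is an illustration under stated assumptions, not the authors' procedure: the totals come from the Total row of the table, the function names are hypothetical, and the text does not say whether removed sites also leave the denominator, so both conventions are shown. The Russian counts are not restated in this part of the table, so the usage lines apply the calculation to the Finnish row (6 illegal, 9 legal) instead.

```python
TOTAL_ILLEGAL = 195  # "Total" row: Illegal: 195 (35%)
TOTAL_LEGAL = 359    # "Total" row: Legal: 359 (65%)


def illegal_share(illegal, legal):
    """Illegal sites as a percentage of all sites."""
    return 100.0 * illegal / (illegal + legal)


def illegal_share_without(removed_illegal, shrink_total=False):
    """Illegal share after dropping `removed_illegal` illegal sites.

    shrink_total=False keeps all 554 sites in the denominator;
    shrink_total=True also removes the dropped sites from it.
    Which convention the paper used is an assumption either way.
    """
    total = TOTAL_ILLEGAL + TOTAL_LEGAL
    remaining = TOTAL_ILLEGAL - removed_illegal
    denominator = total - removed_illegal if shrink_total else total
    return 100.0 * remaining / denominator


overall = illegal_share(TOTAL_ILLEGAL, TOTAL_LEGAL)
finnish = illegal_share(6, 9)

print(f"Overall illegal share: {overall:.1f}%")
print(f"Finnish illegal share: {finnish:.1f}%")
print(f"Without Finnish illegal sites: {illegal_share_without(6):.1f}%")
```

Because the two denominator conventions give different results, a reported drop such as 35 to 21 percent is easiest to reproduce when the convention is stated alongside the counts.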
Figure 4 depicts the share of four major categories in the top five languages present on the dark Web. The category Forums & others accounted for the largest share in each of the top five languages.
Figure 4: Comparison of different categories present among the top five languages.
5. Conclusion and future work
This research described the nature of hidden services hosted on the Tor dark Web. To collect onion addresses, we developed custom crawlers in Python that scanned the dark Web and gathered the addresses of 25,742 hidden services, of which only 6,227 were active at the time of the crawl. We found that Russian was the most widely used language on the dark Web after English. We classified the downloaded content into 31 different categories. A number of services provided illegal, unethical, and controversial content, supporting the perception of the dark Web as illicit in nature. Our classification of non-English content found that forums and drug-related services occurred frequently. Apart from English and Russian, content was frequently available in German, French, and Spanish.
Our future research will focus on examining a greater number of hidden services. In addition, this paper investigated dark Web services accessible only through Tor. Other dark Web networks, such as the Invisible Internet Project (I2P, at https://geti2p.net/en/) and Freenet (at https://freenetproject.org/author/freenet-project-inc.html), also need to be examined in order to develop a more complete picture of the dark Web.
About the authors
Mohd Faizan is pursuing a Ph.D. in the Department of Information Technology, Babasaheb Bhimrao Ambedkar University, Lucknow, India.
Direct comments to: imfaizan15 [at] gmail [dot] com
Raees Ahmad Khan is currently working as a Professor and Head of the Department of Information Technology, Babasaheb Bhimrao Ambedkar University, Lucknow, India. His areas of interest are software security, software quality, and software testing.
References
M.W. Al Nabki, E. Fidalgo, E. Alegre, and I. de Paz, 2017. “Classifying illegal activities on Tor network based on Web textual contents,” Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pp. 35–43, and at https://aclweb.org/anthology/E17-1004, accessed 23 April 2019.
A. Baravalle, M.S. Lopez, and S.W. Lee, 2016. “Mining the dark Web: Drugs and fake ids,” Proceedings of the IEEE 16th International Conference on Data Mining Workshops, pp. 350–356.
doi: https://doi.org/10.1109/ICDMW.2016.0056, accessed 23 April 2019.
N. Bertrand, 2015. “ISIS is taking full advantage of the darkest corners of the Internet,” Business Insider (11 July), at https://www.businessinsider.com/isis-is-using-the-dark-web-2015-7, accessed 27 September 2018.
A. Biryukov, I. Pustogarov, F. Thill, and R.-P. Weinmann, 2014. “Content and popularity analysis of Tor hidden services,” ICDCSW ’14: Proceedings of the IEEE 34th International Conference on Distributed Computing Systems Workshops, pp. 188–193.
doi: https://doi.org/10.1109/ICDCSW.2014.20, accessed 23 April 2019.
R. Böhme, N. Christin, B. Edelman, and T. Moore, 2015. “Bitcoin: Economics, technology, and governance,” Journal of Economic Perspectives, volume 29, number 2, pp. 213–238.
doi: https://doi.org/10.1257/jep.29.2.213, accessed 23 April 2019.
A. Brantly, 2014. “Financing terror bit by bit,” CTC Sentinel, volume 7, number 10, pp. 1–5, and at https://ctc.usma.edu/financing-terror-bit-by-bit/, accessed 23 April 2019.
M. Chertoff and T. Simon, 2015. “The impact of the dark Web on Internet governance and cyber security,” Global Commission on Internet Governance, paper series, number 6, at https://www.cigionline.org/sites/default/files/gcig_paper_no6.pdf, accessed 25 September 2018.
N. Christin, 2012. “Traveling the Silk Road: A measurement analysis of a large anonymous online marketplace,” arXiv (28 November), at https://arxiv.org/abs/1207.7139, accessed 25 September 2018.
L. Clark, 2013. “A guide to the Silk Road shutdown,” Wired (9 October), at http://www.wired.co.uk/article/silk-road-guide/, accessed 6 September 2018.
R. Dingledine, N. Mathewson, and P. Syverson, 2004. “Tor: The second-generation onion router,” SSYM’04: Proceedings of the Thirteenth Conference on USENIX Security Symposium, volume 13, p. 21.
D.S. Dolliver, 2015. “Evaluating drug trafficking on the Tor Network: Silk Road 2, the sequel,” International Journal of Drug Policy, volume 26, number 11, pp. 1,113–1,123.
doi: https://doi.org/10.1016/j.drugpo.2015.01.008, accessed 23 April 2019.
K. Finklea and C.A. Theohary, 2015. “Cybercrime: Conceptual issues for Congress and U.S. law enforcement,” Congressional Research Service Report, R42547 (15 January), at https://www.hsdl.org/?view&did=762027, accessed 23 April 2019.
A. Greenberg, 2014. “Hacker lexicon: What is the dark Web?” Wired (19 November), at https://www.wired.com/2014/11/hacker-lexicon-whats-dark-web/, accessed 25 February 2019.
C. Guitton, 2013. “A review of the available content on Tor hidden services: The case against further development,” Computers in Human Behavior, volume 29, number 6, pp. 2,805–2,815.
doi: https://doi.org/10.1016/j.chb.2013.07.031, accessed 23 April 2019.
Intelliagg, 2016. “Deeplight: Shining a light on the dark Web,” at https://media.scmagazine.com/documents/224/deeplight_(1)_55856.pdf, accessed 23 April 2019.
E. Jardine, 2019. “The trouble with (supply-side) counts: The potential and limitations of counting sites, vendors or products as a metric for threat trends on the Dark Web,” Intelligence and National Security, volume 34, number 1, pp. 95–111.
doi: https://doi.org/10.1080/02684527.2018.1528752, accessed 23 April 2019.
E. Jardine, 2018a. “Privacy, censorship, data breaches and Internet freedom: The drivers of support and opposition to Dark Web technologies,” New Media & Society, volume 20, number 8, pp. 2,824–2,843.
doi: https://doi.org/10.1177/1461444817733134, accessed 23 April 2019.
E. Jardine, 2018b. “Tor, what is it good for? Political repression and the use of online anonymity-granting technologies,” New Media & Society, volume 20, number 2, pp. 435–452.
doi: https://doi.org/10.1177/1461444816639976, accessed 23 April 2019.
E. Jardine, 2015. “The dark Web dilemma: Tor, anonymity and online policing,” Global Commission on Internet Governance, paper series, number 21, at https://www.cigionline.org/publications/dark-web-dilemma-tor-anonymity-and-online-policing, accessed 23 April 2019.
D. Koblas and M.R. Koblas, 1992. “SOCKS,” Proceedings of the 1992 Usenix Security Symposium, at https://www.usenix.org/legacy/publications/library/proceedings/sec92/full_papers/koblas.pdf, accessed 23 April 2019.
D. Moore and T. Rid, 2016. “Cryptopolitik and the Darknet,” Survival, volume 58, number 1, pp. 7–38.
doi: https://doi.org/10.1080/00396338.2016.1142085, accessed 23 April 2019.
T. Moore and N. Christin, 2013. “Beware the middleman: Empirical analysis of Bitcoin-exchange risk,” In: A.-R. Sadeghi (editor). Financial cryptography and data security. Lecture Notes in Computer Science, volume 7859. Berlin: Springer, pp. 25–33.
doi: https://doi.org/10.1007/978-3-642-39884-1_3, accessed 23 April 2019.
P.H. O’Neill, 2014. “Enter the Armory, the Dark Web’s Walmart of weaponry,” Daily Dot (4 April), at https://www.dailydot.com/crime/armory-deep-web-weapons/, accessed 8 September 2018.
G. Owen and N. Savage, 2016. “Empirical analysis of Tor hidden services,” IET Information Security, volume 10, number 3, pp. 113–118.
doi: https://doi.org/10.1049/iet-ifs.2015.0121, accessed 23 April 2019.
G. Owen and N. Savage, 2015. “The Tor dark net,” Global Commission on Internet Governance, paper series, number 20, at https://www.cigionline.org/publications/tor-dark-net, accessed 23 April 2019.
G. Owenson, S. Cortes, and A. Lewman, 2018. “The Darknet’s smaller than we thought: The life cycle of Tor hidden services,” Digital Investigation, volume 27, pp. 17–22.
doi: https://doi.org/10.1016/j.diin.2018.09.005, accessed 23 April 2019.
I. Sanchez-Rola, D. Balzarotti, and I. Santos, 2017. “The onions have eyes: A comprehensive structure and privacy analysis of Tor hidden services,” WWW ’17: Proceedings of the 26th International Conference on World Wide Web, pp. 1,251–1,260.
doi: https://doi.org/10.1145/3038912.3052657, accessed 23 April 2019.
K. Soska and N. Christin, 2015. “Measuring the longitudinal evolution of the online anonymous marketplace ecosystem,” 24th USENIX Security Symposium, pp. 33–48, and at https://www.usenix.org/node/190887, accessed 23 April 2019.
M. Spitters, S. Verbruggen, and M. Staalduinen, 2014. “Toward a comprehensive insight into the thematic organization of Tor hidden services,” Proceedings of the 2014 IEEE Joint Intelligence and Security Informatics Conference, pp. 220–223.
doi: https://doi.org/10.1109/JISIC.2014.40, accessed 23 April 2019.
J. Van Buskirk, A. Roxburgh, R. Bruno, and L. Burns, 2014. “Drugs and the Internet,” National Drug and Alcohol Research Centre (Sydney), at https://ndarc.med.unsw.edu.au/sites/default/files/ndarc/resources/Drugs&TheInternet_Issue2.pdf, accessed 23 April 2019.
G. Weimann, 2016. “Terrorist migration to the dark Web,” Perspectives on Terrorism, volume 10, number 3, pp. 40–44, and at http://www.terrorismanalysts.com/pt/index.php/pot/article/view/513, accessed 23 April 2019.
Wikipedia, 2019a. “Ricochet (software),” at https://en.wikipedia.org/wiki/Ricochet_(software), accessed 10 March 2019.
Wikipedia, 2019b. “TorChat,” at https://en.wikipedia.org/wiki/TorChat, accessed 10 March 2019.
Wikipedia, 2019c. “Tor (anonymity network),” at https://en.wikipedia.org/wiki/Tor_(anonymity_network), accessed 10 March 2019.
Received 16 October 2018; revised 3 March 2019; revised 12 March 2019; revised 18 March 2019; accepted 20 March 2019.
This paper is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Exploring and analyzing the dark Web: A new alchemy
by Mohd Faizan and Raees Ahmad Khan.
First Monday, Volume 24, Number 5 - 6 May 2019