The increased commercialisation of Internet domain sales created the unanticipated side effect that domain extensions no longer signify the residence of the domain user. As a result, the analysis of the domain attributes in Web access logs no longer provides accurate information on the origin of the users and thus on the geographical reach of a given site. This study provides an alternative method to assess the geographical reach by calculating the average demand for Web pages in hourly intervals originating from each time zone. The resulting analysis tool, which relates to Greenwich Mean Time, is location independent and can be applied to Web sites worldwide.
Contents
Introduction
Hourly demand as a solution?
Daily human life patterns and Web usage
Distribution of Web users in the world
Introduction
One of the major selling points of an on-line presence is that business is conducted 24/7, that is, at all times of the day on any given day of the year. From a business perspective, it is worth considering, though, whether this actually occurs, or whether it is a mere figment of managerial imagination.
Geographical analysis of the users is becoming increasingly problematic, because (i) many servers calling up a Web site provide unresolved URLs; and, (ii) the .com, .net and .org domains are also used outside the U.S. Unresolved URLs are calls made to a Web server where the DNS entry of the calling server does not extend beyond its numerical IP address. Locational information through a coded domain name (such as .au, .de, .fr, etc.) is not provided. As a result of the sale of domain names it is now possible to register domains in various countries without maintaining a server there. Furthermore, the increasing globalisation of the communications industry has seen an expansion of U.S. communication companies. Many of these now offer services in Germany, Canada, Australia and other countries. While these providers have resolved their DNS entries, they end in .net (such as bellsouth.net, comcast.net, etc.) or .com (such as rr.com, charter.com). While it is often possible to analyse the server logs and break the location code of some of these providers, this is cumbersome. For the unresolved IP addresses it is near impossible to do so with a reasonable level of effort for each.
Standard, publicly available Web site statistics provide information on a wide range of data, both on absolute usage and on a temporal scale, as well as permitting the tracking of users' movements through a Web site (Petersen 2004; 2005). Many of the commercial programs are either too technical for the lay person, or the Web site maintained is not deemed to warrant expenditure on commercial applications or analysis contracts. There is a need for a rough-and-ready method that allows anyone with access to basic Web site statistics to estimate the internationality of a site.
The author, a university academic, maintains The Marshall Islands A Digital Library and Archive, a Web site providing a wide range of primary and secondary source material on that Pacific Island country (http://marshall.csu.edu.au). The subject matter, covering all aspects of history, culture, health, politics and economy of the island country, appeals to the Internet user population as a whole, but to ensure that the site meets the expectations of the audience, it is necessary to understand the origin and nature of the users, as well as what information they are seeking. An analysis of the user population and their needs has been carried out (Spennemann, 2004). One of the key questions to be answered is how international the site usage actually is. Figure 1 sets out the frequency of domains requesting pages from the Marshall Islands site arranged by continent for the quarter November 2003 to January 2004. The heavy preponderance of .com, .net and unresolved domains is self-evident, distorting any meaningful analysis. Attributing all .com and .net domains to users in the U.S. is clearly untenable in the light of the international sales of Web domains.
Figure 1: Frequency of domains requesting pages from the Marshall Islands site arranged by continent.
Alternative options, such as assessing the frequency of links pointing to the site, or the Google rank (rank 3, search term Marshall Islands) only provide an estimate of the popularity of the site among other Web service providers, but do not provide information on the origin of the actual users. The same applies to the unsolicited comments and information queries that the site generates.
In view of this, an alternative method had to be found that provides some measure of the internationality of the Web site. This paper provides a comparatively simple method that allows Web site authors to gain an indication of the range of utilisation, based on the hourly frequency of server calls. The method described below is also designed in such a way that any changes in internationality over time can be assessed.
Hourly demand as a solution?
Figure 2 shows the average hourly demand for pages of the Marshall Islands site (in percentage of requests per day) for the period 2000 to May 2004. The data for 2000 and 2001 (until August inclusive) are based on portal page access only; for September 2001 to March 2002 they are based on all pages, with statistics provided courtesy of Netscape's now-defunct hitometer service. Commencing April 2002, accurate server-level statistics (through logs) became available. To smooth monthly variations in load, the hourly totals for each month were expressed in percent. These monthly percentage values were then averaged for each year.
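The normalisation described above can be sketched in a few lines of Python. This is a minimal illustration only, not the procedure actually used for Figure 2: the function names and the two sample months are hypothetical, and it assumes that hourly request counts per month are already available.

```python
from statistics import mean

def hourly_percentages(hourly_counts):
    """Express 24 hourly request counts for one month as percent of that month's total."""
    if len(hourly_counts) != 24:
        raise ValueError("expected 24 hourly counts")
    total = sum(hourly_counts)
    if total == 0:
        raise ValueError("no requests recorded")
    return [100.0 * count / total for count in hourly_counts]

def yearly_profile(monthly_counts):
    """Average the monthly percentage profiles hour by hour."""
    profiles = [hourly_percentages(month) for month in monthly_counts]
    return [mean(hour_values) for hour_values in zip(*profiles)]

# Hypothetical data: a quiet and a busy month with the same daily shape.
quiet_month = [10] * 12 + [30] * 12
busy_month = [100] * 12 + [300] * 12
profile = yearly_profile([quiet_month, busy_month])
```

Because each month is converted to percentages before averaging, the busy month does not outweigh the quiet one, which is the point of the smoothing step.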
The hourly demand curve shows a gradual flattening out, nearing the theoretical ideal loading for a server. Does this mean that the demand for the service provided by the Marshall Islands site is nearing the ideal worldwide audience? Before we can answer this question, however, we briefly need to look at the patterns of human daily life.
Figure 2: Average hourly demand for pages of the Marshall Islands site (in percentage of requests per day) for 2000 to 2003, compared to the ideal load (time zone is Greenwich Mean Time).
Daily human life patterns and Web usage
The normal day-to-day life pattern of a citizen of a Western, industrialised country follows the routines of work, leisure and sleep. Daily routines, such as working, using the public road network, as well as using communications media such as the telephone, follow a diurnal pattern, whereby usage is at its lowest during the early hours of the morning, and at its highest in the middle of the day. Other activities, such as leisure (e.g., watching television), follow similar curves, taking into account time spent at work.
Figure 3: Diurnal variation in the demand for electricity, road traffic, telephone services and television (whole week).
Figure 3 shows the diurnal variation in the demands for electricity (Albury, Australia); roads (Seattle, Wash.; 2x Sydney, NSW; State of Vermont); telephone (six switches along the West Coast of the U.S.; seven exchanges along the eastern seaboard of Australia) and television (Australia). These data clearly demonstrate the different demands on services by work conditions: telephone usage is concentrated in the standard work hours 8:00 to 17:00, while high road usage starts and ends about one hour outside the standard work hours. Television viewing, on the other hand, is a widespread leisure activity, which, not surprisingly, peaks after the conclusion of the work day. The exception is the curve for electricity. While it shows a small depression in the early hours of the morning, the majority of the curve is flat. This is hardly surprising given that electricity supply is a regulated product, which is generated solely based on demand. Here the providers have the opportunity to level out the curve close to the theoretical ideal.
Web usage, on the other hand, is a combination of work and leisure activity. Thus we can expect a combination of demand during and after work hours, with a low demand in the early hours of the morning. A Web usage standard was calculated based on the average number of dial-up calls made through ten exchanges, five on the western seaboard of the United States and five on the eastern seaboard of Australia. PacWest provided the data for the five U.S. exchanges (Phoenix, Ariz.; Seattle, Wash.; Stockton, Calif.; Los Angeles, Calif. [2x]), while Telstra and Albury Internet provided the five Australian data sets (Albury, Brisbane, Melbourne, Sydney [2x]). Figure 4 shows just how similar the Australian and U.S. averages are, with the Australian Web users showing a slightly higher usage in the hours between 4 pm and 9 pm.
Web usage is high at the start of the work day and then peaks at the end of the work period, with another slight peak by 7 pm, presumably after the dinner period, representing those who come online in the evening (Figure 4). Table 1 provides the raw data set to permit the reader a comparison with their own Web site demand curves.
Table 1: Web usage standard for a single time zone.

| Local time | Percentage |
|---|---|
| 00:00-01:00 | 1.45 |
| 01:00-02:00 | 1.01 |
| 02:00-03:00 | 0.80 |
| 03:00-04:00 | 0.72 |
| 04:00-05:00 | 0.83 |
| 05:00-06:00 | 1.27 |
| 06:00-07:00 | 2.34 |
| 07:00-08:00 | 3.73 |
| 08:00-09:00 | 5.14 |
| 09:00-10:00 | 5.91 |
| 10:00-11:00 | 6.00 |
| 11:00-12:00 | 5.70 |
| 12:00-13:00 | 5.49 |
| 13:00-14:00 | 5.56 |
| 14:00-15:00 | 5.83 |
| 15:00-16:00 | 6.35 |
| 16:00-17:00 | 6.65 |
| 17:00-18:00 | 6.25 |
| 18:00-19:00 | 6.20 |
| 19:00-20:00 | 6.30 |
| 20:00-21:00 | 6.07 |
| 21:00-22:00 | 4.80 |
| 22:00-23:00 | 3.44 |
| 23:00-24:00 | 2.15 |
Figure 4: Diurnal standards of Internet usage in Australia and the U.S. West Coast compared to the ideal load.
Distribution of Web users in the world
Given the observed diurnal variations in human activity patterns, it follows that a Web site which possesses a purely national audience will have a demand curve that resembles the standard diurnal pattern. A site with a truly international audience, therefore, should have a flat curve, caused by superimposing the diurnal curves for all individual world time zones. Such a totally horizontal curve is hypothetical only, however, because the world's population is not evenly spread across the time zones, with the Pacific Ocean being the most obvious and prominent deviation (Figure 5).
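The superposition argument can be written out explicitly. The notation is introduced here for illustration only and does not appear in the original analysis: s(h) is the single-zone standard of Table 1 (percent of daily usage in local hour h, assumed to sum to 100), z is a time zone's offset from GMT in whole hours, and w_z is the fraction of the audience living in that zone.

```latex
% Demand at GMT hour t as a weighted sum of time-shifted diurnal curves
D(t) = \sum_{z} w_z \, s\bigl((t + z) \bmod 24\bigr), \qquad \sum_{z} w_z = 1

% Special case: audience spread evenly over 24 whole-hour zones, w_z = 1/24.
% As z runs over 24 consecutive offsets, (t + z) mod 24 visits every hour once:
D(t) = \frac{1}{24} \sum_{h=0}^{23} s(h) = \frac{100}{24} \approx 4.17
```

Under these assumptions every GMT hour carries about 4.17 percent of the daily demand, which is the flat "ideal load"; any unevenness in the weights w_z reintroduces a diurnal shape.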
Figure 5: Map of the world showing time zones (Hong Kong Observatory, at http://www.hko.gov.hk/gts/time/clock/clockA.htm).
To arrive at an accurate picture, the distribution of Web users per time zone was calculated drawing on country-specific demographic data and Web user estimates. The total world population estimate as of July 2003 has been culled from the CIA World Factbook 2003 (CIA, 2003). The latter source also served for the estimate of the population of Web users. The Factbook, however, does not provide estimates of Web users for some countries/domains that are represented in the logs of the Marshall Islands site. For some of these, such as Guernsey or the Isle of Man, the United Kingdom percentage was applied, while for others small token numbers of 1,000 or less were estimated. Table 3 (see end of paper) sets out these data. The countries of the world were regrouped into the world's time zones, using Coordinated Universal Time (UTC). The time zone information was culled from SwissInfo (2003).
The estimated number of Web users in countries which span multiple time zones, i.e. Australia (3), Brazil (3), Canada (6), Indonesia (3), Mexico (3), Russia (10), and the U.S. (6), was proportionally allocated to the time zones they span, based on the actual total human population distribution per time zone within the country. It is acknowledged that Internet access in countries such as Russia, Indonesia or Brazil may not be uniformly distributed, but may be concentrated in population centres. It is impossible at this time to develop a more detailed approach that would remove any ambiguities, no matter how small they are deemed to be at present.
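The proportional allocation can be illustrated with the Canadian rows of Table 3. The sketch below is a hypothetical reconstruction: the function name is invented, and the national total of 16.84 million Web users is an assumption obtained by adding up the six Canadian rows of that table.

```python
def allocate_users(total_users, zone_populations):
    """Split a national Web user estimate across time zones in proportion to population."""
    national_population = sum(zone_populations.values())
    if national_population <= 0:
        raise ValueError("population must be positive")
    return {
        zone: total_users * population / national_population
        for zone, population in zone_populations.items()
    }

# Population per Canadian time zone as listed in Table 3 (keys are UTC offsets).
canada_population = {
    -8: 4228780,    # Pacific
    -7: 3235934,    # Mountain
    -6: 2254368,    # Central
    -5: 20032559,   # Eastern
    -4: 1904476,    # Atlantic
    -3.5: 550996,   # Newfoundland
}

# Assumed national total: the sum of the six Canadian rows in Table 3, rounded.
canada_users = allocate_users(16_840_000, canada_population)
```

With these inputs the Eastern zone receives roughly 10.47 million users, in line with the corresponding Table 3 entry.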
What becomes obvious from a perusal of Figure 6 is that the relative proportion of Web users across the globe is distributed quite differently from the population. This is not surprising, given the socio-economic differential between the developed countries in North America, Europe and Australia on the one hand, and the populous regions of South-East Asia and China on the other.
Figure 6: Worldwide distribution of Internet users compared to the total population arranged by time zones.
The matter is complicated by the fact that not all Web users are English-speaking. While in the early days of the World Wide Web the fear had been expressed that English would dominate the Web at the expense of small linguistic groups (cf. Spennemann, et al., 1996), the development of the Web and the concomitant development of graphics-based personal computers has seen a multitude of languages being used on the Internet, some of them using different character sets, such as Japanese, Chinese, Arabic or Russian. While English is one of the most widely understood languages, one cannot readily extrapolate from the number of Web users in the world to the number of English-speaking Web users. While there are tables detailing the languages spoken in the countries of the world, there is no compilation that sets out the percentage of people in a given country that either speaks or at least understands English.
For the Web user community, limited estimates were published by Global Reach (2003, which also sets out methodology and references; see also OCOL, 2002). In the absence of reliable and accessible data, a new estimate was compiled for this study based on the languages spoken in each country. Where English is spoken as an official language it was assumed that 95 percent of the population had the language ability to read or use English-language Web sites, taking into account a small minority of non-English speaking immigrants. Where English was listed as a second language or as a language widely spoken, a value of 50 percent was used, while a value of 33.3 percent was used for countries where English is deemed to be widely understood. Those countries where English is learned as a foreign language were split into countries with a well-developed Western-oriented education system (with English-language proficiency set at 15 percent) and the remaining countries, for which a proficiency of five percent was assumed. Japan and China were set at five percent while South Korea was set at 15 percent.
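The tiered estimate amounts to a simple lookup and multiplication. The sketch below is illustrative only: the category labels are invented names for the five tiers described above, and the Web user figures are taken from Table 3.

```python
# Assumed English proficiency (percent of Web users) for each tier described in the text.
# The dictionary keys are hypothetical labels, not terms used in the study.
ENGLISH_PROFICIENCY_PERCENT = {
    "official": 95.0,
    "second_or_widely_spoken": 50.0,
    "widely_understood": 33.3,
    "foreign_western_education": 15.0,
    "foreign_other": 5.0,
}

def english_proficient_users(web_users, category):
    """Estimate the number of English-proficient Web users for one country."""
    return web_users * ENGLISH_PROFICIENCY_PERCENT[category] / 100.0

# Web user estimates as listed in Table 3.
germany = english_proficient_users(32_100_000, "foreign_western_education")
cyprus = english_proficient_users(150_000, "widely_understood")
```

These two calls reproduce the Table 3 entries for Germany (4815000) and Cyprus (49950) up to floating-point rounding.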
It is obvious that such arbitrary figures are not a reflection of the true distribution, but only an approximation. It is probable that the five percent estimates for many of the developing countries are an underestimate. The distribution of English-proficient Web users against populations proficient in English (Figure 7) is even more skewed than the previous comparison on the basis of total population.
Figure 7: Worldwide distribution of people proficient in English compared to English-proficient Web users (see text).
These graphs, however, cannot be taken at face value for comparison with Web demand of a given site. As the Internet usage for each country follows the diurnal pattern set out earlier, these figures need to be modulated. For this reason, the potential Web users for each time zone were distributed in percent according to the diurnal standard set out in Figure 4. This results in the modulated demand curves shown in Figure 8. Table 2 provides the raw data set to permit the reader a comparison with their own Web site demand curves.
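The modulation step can be sketched as follows. This is a hedged illustration rather than the calculation behind Figure 8: the function name and the sample inputs are hypothetical, the diurnal standard is copied from Table 1, and only whole-hour UTC offsets are handled (how half-hour zones were treated is not stated in the text).

```python
# Single-zone diurnal standard from Table 1 (percent per local hour, 00:00-01:00 first).
STANDARD = [1.45, 1.01, 0.80, 0.72, 0.83, 1.27, 2.34, 3.73, 5.14, 5.91, 6.00, 5.70,
            5.49, 5.56, 5.83, 6.35, 6.65, 6.25, 6.20, 6.30, 6.07, 4.80, 3.44, 2.15]

def modulated_demand(users_by_offset):
    """Return the share of world demand (percent) falling in each GMT hour.

    users_by_offset maps a whole-hour UTC offset to the number of users in that zone.
    """
    demand = [0.0] * 24
    for offset, users in users_by_offset.items():
        for gmt_hour in range(24):
            # Local clock time in this zone when it is gmt_hour in Greenwich.
            local_hour = (gmt_hour + offset) % 24
            demand[gmt_hour] += users * STANDARD[local_hour]
    total = sum(demand)
    if total == 0:
        raise ValueError("no users supplied")
    return [100.0 * value / total for value in demand]

# Hypothetical audiences: one zone at UTC+10 only, and an even spread over all zones.
east_only = modulated_demand({10: 1000})
even_spread = modulated_demand({offset: 1 for offset in range(-11, 13)})
```

For the single-zone audience the local 16:00 to 17:00 peak appears at 06:00 to 07:00 GMT, while the even spread yields a flat curve of 100/24 percent per hour.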
Figure 8: Worldwide distribution of Internet users compared to English-proficient Web users modulated for diurnal variations of usage.
Table 2: Web usage standard for all time zones combined.

| GMT | Total Web users | English-proficient Web users |
|---|---|---|
| -11 | 3.57 | 2.97 |
| -10 | 3.75 | 3.60 |
| -9 | 3.95 | 4.24 |
| -8 | 4.18 | 4.71 |
| -7 | 4.27 | 4.89 |
| -6 | 4.14 | 4.83 |
| -5 | 3.92 | 4.77 |
| -4 | 3.88 | 4.86 |
| -3 | 4.05 | 5.11 |
| -2 | 4.33 | 5.33 |
| -1 | 4.56 | 5.52 |
| GMT | 4.61 | 5.54 |
| 1 | 4.52 | 5.38 |
| 2 | 4.48 | 5.24 |
| 3 | 4.54 | 5.08 |
| 4 | 4.65 | 4.75 |
| 5 | 4.71 | 4.18 |
| 6 | 4.63 | 3.55 |
| 7 | 4.35 | 3.01 |
| 8 | 4.12 | 2.70 |
| 9 | 3.94 | 2.51 |
| 10 | 3.78 | 2.36 |
| 11 | 3.58 | 2.34 |
| 12 | 3.49 | 2.52 |
Figure 9: Comparison of the usage of the Marshall Islands site (2003 to May 2004) with the Australian Web standard and the world standard for English-speaking Web users.
Finally, how does the demand curve for the Marshall Islands site, which started this investigation, stack up? Figure 9 shows the demand curve for the Marshall Islands site (average January 2003 to May 2004) compared to the Australian Web standard and to the international Web standard of English-speaking Web users. Clearly, the demand curve for the Marshall Islands site does not conform at all to the national Australian curve, demonstrating a much more international appeal of the site. Yet while the demand curve follows the international Web standard in general terms, it does not follow it closely: it is too flat. Looking at the time zone distribution in relation to GMT, this suggests that the site has a higher than standard appeal in the Australia-Pacific region, a slightly lower than standard appeal in the U.S. and a lower than standard appeal in the South-East Asian region. Given the regional thematic focus of the site, dealing with a small Pacific Island nation, this distribution pattern is to be expected.
Table 3: Population, number of Web users and estimated English-language proficiency per country
| Country | Time zone (UTC offset) | Population | Estimated Web users | Percentage English-speaking | Estimated English-proficient Web users |
|---|---|---|---|---|---|
| Afghanistan * | 4.5 | 28717213 | 200 | 5 | 10 |
| Albania | 1 | 3582205 | 12000 | 5 | 600 |
| Algeria | 1 | 32818500 | 180000 | 5 | 9000 |
| American Samoa * | -11 | 70260 | 1000 | 95 | 950 |
| Andorra | 1 | 69150 | 24500 | 5 | 1225 |
| Angola | 1 | 10766471 | 60000 | 5 | 3000 |
| Anguilla | -4 | 12738 | 919 | 33 | 303 |
| Antigua and Barbuda | -4 | 67897 | 5000 | 95 | 4750 |
| Argentina | -3 | 38740807 | 3880000 | 50 | 1940000 |
| Armenia | 4 | 3326448 | 30000 | 5 | 1500 |
| Aruba | -4 | 70844 | 24000 | 5 | 1200 |
| Australia EST | 10 | 1745365 | 940262 | 95 | 893249 |
| Australia CST | 9.5 | 16060997 | 8652369 | 95 | 8219751 |
| Australia WST | 8 | 1925622 | 1037370 | 95 | 985502 |
| Austria | 1 | 8188207 | 3700000 | 15 | 555000 |
| Azerbaijan | 4 | 7830764 | 25000 | 5 | 1250 |
| Bahamas | -5 | 297477 | 16900 | 95 | 16055 |
| Bahrain | 3 | 667238 | 140200 | 50 | 70100 |
| Bangladesh | 6 | 138448210 | 150000 | 50 | 75000 |
| Barbados | -4 | 277264 | 6000 | 95 | 5700 |
| Belarus | 3 | 10322151 | 422000 | 5 | 21100 |
| Belgium | 1 | 10289088 | 3760000 | 15 | 564000 |
| Belize | -6 | 266440 | 18000 | 95 | 17100 |
| Benin | 1 | 7041490 | 25000 | 5 | 1250 |
| Bermuda | -2 | 64482 | 25000 | 95 | 23750 |
| Bhutan | 6 | 2139549 | 2500 | 5 | 125 |
| Bolivia | -4 | 8586443 | 78000 | 5 | 3900 |
| Bosnia-Herzegovina | 1 | 3989018 | 45000 | 5 | 2250 |
| Botswana | 2 | 1573267 | 33000 | 95 | 31350 |
| Brazil Andes | -5 | 589810 | 45297 | 15 | 6795 |
| Brazil Western | -4 | 16176261 | 1242328 | 15 | 186349 |
| Brazil Eastern | -3 | 165266534 | 12692375 | 15 | 1903856 |
| British Virgin Islands * | -4 | 21730 | 1000 | 95 | 950 |
| Brunei | 8 | 358098 | 35000 | 50 | 17500 |
| Bulgaria | 2 | 7537929 | 585000 | 5 | 29250 |
| Burkina | 0 | 13228460 | 25000 | 5 | 1250 |
| Burma | 6.5 | 42510537 | 10000 | 5 | 500 |
| Burundi | 2 | 6096156 | 6000 | 5 | 300 |
| Cambodia | 7 | 13124764 | 10000 | 50 | 5000 |
| Cameroon | 1 | 15746179 | 45000 | 95 | 42750 |
| Canada Pacific | -8 | 4228780 | 2211084 | 95 | 2100530 |
| Canada Mountain | -7 | 3235934 | 1691959 | 95 | 1607361 |
| Canada Central | -6 | 2254368 | 1178732 | 95 | 1119795 |
| Canada Eastern | -5 | 20032559 | 10474341 | 95 | 9950624 |
| Canada Atlantic | -4 | 1904476 | 995786 | 95 | 945997 |
| Canada Newfoundland | -3.5 | 550996 | 288097 | 95 | 273692 |
| Cape Verde | -1 | 412137 | 12000 | 5 | 600 |
| Cayman Islands * | -5 | 41934 | 5000 | 95 | 4750 |
| Central African Republic | 1 | 3683538 | 2000 | 5 | 100 |
| Chad | 1 | 9253493 | 4000 | 5 | 200 |
| Chile | -4 | 15665216 | 3100000 | 15 | 465000 |
| China | 8 | 1286975468 | 45800000 | 5 | 2290000 |
| Christmas Island (Indian Ocean) * | 7 | 433 | 0 | 95 | 0 |
| Cocos Islands (Indian Ocean) * | 6.5 | 630 | 0 | 95 | 0 |
| Colombia | -5 | 41662073 | 1150000 | 5 | 57500 |
| Comoro Islands | 3 | 632948 | 2500 | 5 | 125 |
| Congo, Dem. Rep. of the | 1 | 56625039 | 6000 | 5 | 300 |
| Congo, Rep. of the | 1 | 2954258 | 500 | 5 | 25 |
| Cook Islands * | -10 | 21008 | 1000 | 50 | 500 |
| Costa Rica | -6 | 3896092 | 384000 | 5 | 19200 |
| Cote d'Ivoire | 0 | 16962491 | 70000 | 5 | 3500 |
| Croatia | 1 | 4422248 | 480000 | 15 | 72000 |
| Cuba | -5 | 11263429 | 120000 | 5 | 6000 |
| Cyprus | 2 | 771657 | 150000 | 33.3 | 49950 |
| Czech Republic | 1 | 10249216 | 2690000 | 5 | 134500 |
| Denmark | 1 | 5384384 | 3370000 | 15 | 505500 |
| Djibouti | 3 | 457130 | 3300 | 5 | 165 |
| Dominica | -4 | 69655 | 2000 | 95 | 1900 |
| Dominican Republic | -4 | 8715602 | 186000 | 33.3 | 61938 |
| East Timor * | 9 | 997853 | 200 | 50 | 100 |
| Ecuador | -5 | 13710234 | 328000 | 5 | 16400 |
| Egypt | 2 | 74718797 | 600000 | 15 | 90000 |
| El Salvador | -6 | 6470379 | 40000 | 5 | 2000 |
| Equatorial Guinea | 1 | 510473 | 900 | 5 | 45 |
| Eritrea | 3 | 4362254 | 10000 | 5 | 500 |
| Estonia | 2 | 1408556 | 429700 | 50 | 214850 |
| Ethiopia | 3 | 66557553 | 20000 | 50 | 10000 |
| Falkland Islands * | -4 | 2967 | 100 | 50 | 50 |
| Faroe Islands | 0 | 46345 | 3000 | 5 | 150 |
| Fiji | 12 | 868531 | 15000 | 95 | 14250 |
| Finland | 2 | 5190785 | 2690000 | 15 | 403500 |
| France | 1 | 60180529 | 16970000 | 15 | 2545500 |
| French Guyana | -3 | 186917 | 2000 | 5 | 100 |
| French Polynesia | -10 | 262125 | 16000 | 5 | 800 |
| Gabon | 1 | 1321560 | 18000 | 5 | 900 |
| Gambia | 0 | 1501050 | 5000 | 95 | 4750 |
| Georgia | 4 | 4934413 | 25000 | 5 | 1250 |
| Germany | 1 | 82398326 | 32100000 | 15 | 4815000 |
| Ghana | 0 | 20467747 | 200000 | 95 | 190000 |
| Gibraltar * | 1 | 27776 | 5000 | 95 | 4750 |
| Greece | 2 | 10665989 | 1400000 | 5 | 70000 |
Greenland
-2
56385
20000
5