The famous Sapir-Whorf Hypothesis posits a linguistic determinism, arguing that language plays a central role in the creation of a worldview. In the sense that language is a product of words, one can say that a culture's worldview is shaped by the words of its particular language. Words both create and communicate worldviews. The greatest potential in history for the observation and analysis of words exists on the Internet. Indeed, the Internet can be considered history's greatest observatory and laboratory of words.
Depth (Synchronic Words)
Cycles (Diachronic Words)
Organization of Information
Beyond Sites and Words
The Promise of Research
"Search is the number-one thing people do on the Web today. Number two is e-mail. And the big complaint we get isn't about speed - we're the fastest there is. What people want is even more information, from sources that are not currently searched. Research reports, news files, historical archives, university projects - they're out there, but not yet in publicly searchable sources. Getting that information indexed is a Herculean, but incredibly important job."
- Eric Schmidt, CEO, Google, in American Spectator (August 2001).
"Words are a mirror of their times. By looking at the areas in which the vocabulary of a language is expanding fastest in a given period, we can form a fairly accurate impression of the chief preoccupations of society at that time and the points at which the boundaries of human endeavour are being advanced."
- John Ayto, Twentieth Century Words.
"The Internet became a mass reservoir of thought, fads and feelings. No one has taken advantage of that reservoir. No one has measured it, dived into it, and seen it for what it is - the first tool of its kind for plumbing the secrets of the public and the private mind."
- Howard Bloom, Author, The Lucifer Principle, Professor, New York University.
There is a relationship between Internet words and leading events, people and products of cultures. More specifically, there is a relationship between Internet words in specific cultures and leading events and things in those cultures.
While our main interest throughout this report is the relationship between leading American Internet words and leading events, people and products of American culture, the techniques and theories we discuss have cross-cultural applications.
Whatever culture one examines, the relationship of Internet words to culture has the potential to offer startling new insights into that culture.
However, these insights will come only when the Internet is viewed as a tool for observation rather than economic production.
Not long ago, the Internet began as an interactive forum where people from all over the world could meet and speak to each other despite the barriers of land, oceans and culture. Then came major corporations that tried to erect a central stage to make the Internet a platform to preach their own messages.
Despite the early interactive history of the Internet, its superstructure was increasingly dominated by major corporations and their business paradigms of one-way communication based around broadcasting models. In effect, the Internet was used as an economic tool to promote business rather than as an informational resource to understand culture. Economic goals dominated cultural insight.
The dot.com crash caused the disappearance of many Internet companies and forced a re-evaluation of business models. Yet the crash had little effect on the Internet's interactive social community and its untapped potential for understanding culture.
Today, the Internet offers a vast unexplored territory of social and cultural insight ready to be mapped and mined by a new generation of Internet explorers. Members of this new generation will be observers of information more than producers of it. Their efforts will help reduce information rather than produce information. As Web inventor Tim Berners-Lee notes in Weaving The Web, "The Web is more a social creation than a technical one."
Back To The Future of Linguistics
In Language and the Internet, one of the world's leading linguists, David Crystal, observes that "as the Internet comes increasingly to be viewed from a social perspective, the role of language becomes central." Crystal notes that "what is immediately obvious when engaging in any Internet function is its linguistic character. If the Internet is a revolution, therefore, it is likely to be a linguistic revolution."
Much of the Internet linguistics revolution might be viewed in the larger context of general linguistic history. In a sense, it involves a journey back to the nineteenth century in order to journey forward into the future.
During the nineteenth century, the fragmentary approach to reality prevented scholars from getting beyond the immediate facts in matters of language. During this time, language was seen as mechanical and atomistic, a conception of language which was reflected in the historical studies of comparative philologists.
The diachronic study of language, or study of the structure of language over a period of time, prevailed over the synchronic study of language, or study of language at a moment in time.
During the early years of the twentieth century, the great Swiss linguist Ferdinand de Saussure changed this nineteenth-century conception of language. As a result of his influential work, particularly his Course in General Linguistics, the atomistic and diachronic methods gave way to a synchronic perspective on language, and attention shifted from the past history of language to its present structure.
In a sense, the Internet allows the return to the study of the pre-Saussure atomistic elements of language in words. At the same time, it also applies the synchronic method Saussure developed to study overall language structure to words. The Internet makes it possible to return to the earlier atomistic parts of language with the powerful new synchronic ability to rank and study these word atoms of language as never before.
Both the diachronic and synchronic methods of linguistic analysis live on in studying words on the Internet. The diachronic method continues in studying not the history of languages but rather the histories and cycles of words. And, the synchronic method continues onward in studying word ascendance (what we term the "rise" index) and word ranking depth.
Whatever the case, it is becoming increasingly obvious that the ability to index, search for and rank words on the Internet is giving new meaning to those word atoms of language we have taken for granted for so long. In a postmodern world of increasing symbols, the electric symbols of words are becoming our greatest symbols.
For members of the new generation of Internet observers, there are three major benefits of being associated with the search function on the Internet.
The first involves the great quantity of traffic passing through Internet search functions. The second involves the ability to quantify this huge amount of traffic. The third involves the ability to segment searches by the communities they come from.
First, the quantity benefit involves the strategic position of search engines at the leading "doors" to cyberspace. Search engines are the virtual equivalents of real-world airports, with vast traffic passing through them each day on its way to particular destinations. As Google CEO Eric Schmidt reminds us in the August 2001 issue of American Spectator, "Search is the number-one thing people do on the Web today." The number two thing is e-mail.
Most of the top Internet search engines handle more than ten million searches each day. The leading search engine, Google, handles 150 million searches a day. Even the smallest of these virtual "airports" represents far more traffic than the greatest real-world airports.
While the search engine traffic is immense, one needs to add the caveat that search still involves only Internet users and not the entire "real world" population. In this sense, search engines are subject to the current overall demographics of Internet users. As Internet use becomes more widespread, these particular demographics will diminish as they "bleed" into the general population.
Second, there is the quantification benefit: search engines are associated with one of the most trackable actions of Internet users. Words keyed into search engines are easily ranked for specific periods of time. Histories of word and phrase ascendance and descendance are part of this quantification process. As we suggest, the dynamics of word movement might tell Internet observers as much as, or even more than, the rankings of words.
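Ranking words "for specific periods of time" can be sketched as a frequency count over a query log restricted to a date window. This is a minimal illustration under assumptions of our own: the log is a list of (date, query) pairs, queries are normalized only by lowercasing and trimming, and `rank_terms` is a hypothetical helper, not any search engine's API. The sample log is made up.

```python
from collections import Counter
from datetime import date


def rank_terms(log, start, end, top_n=10):
    """Rank queries seen between start and end (inclusive) by how often they occur.

    log: iterable of (date, query_string) pairs (a hypothetical query-log format).
    """
    counts = Counter(
        query.strip().lower()          # crude normalization: case and surrounding spaces
        for day, query in log
        if start <= day <= end         # keep only queries inside the time window
    )
    return counts.most_common(top_n)


# Made-up sample log, for illustration only (not real search data).
sample_log = [
    (date(2002, 5, 6), "star wars"),
    (date(2002, 5, 7), "Star Wars "),
    (date(2002, 5, 8), "world cup"),
    (date(2002, 5, 9), "star wars"),
    (date(2002, 5, 10), "world cup"),
    (date(2002, 5, 11), "le pen"),
    (date(2002, 4, 30), "spiderman"),  # falls outside the May window below
]

print(rank_terms(sample_log, date(2002, 5, 6), date(2002, 5, 12), top_n=2))
# [('star wars', 3), ('world cup', 2)]
```

Running the same count over successive windows yields the histories of ascendance and descendance described above.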
With search engine companies, the greatest quantity of traffic on the Internet is matched with the greatest ability for quantification.
Certainly Internet search functions have value as barometers of worldwide Internet activity. However, much greater value resides in the ability to segment searches into various real world communities such as nations and corporations.
The ability for basic segmentation into nations (by identifying origination points for searches) is already possible on most search engines. Further segmentation into various communities will increase the application and practicality of search functions to more real world (rather than virtual world) scenarios.
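Once each search carries an origination point, segmenting by nation reduces to grouping. The sketch below assumes the country code has already been resolved (for example by an IP-to-country lookup, which is outside the sketch) and that the log is a list of (country_code, query) pairs; `segment_by_country` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def segment_by_country(log, top_n=3):
    """Group queries by country of origin and rank them within each country.

    log: iterable of (country_code, query_string) pairs (hypothetical format).
    Returns {country_code: [(query, count), ...]} with the top_n queries per country.
    """
    per_country = defaultdict(Counter)
    for country, query in log:
        per_country[country][query.strip().lower()] += 1
    return {country: counts.most_common(top_n) for country, counts in per_country.items()}


# Made-up sample data for illustration only.
country_log = [
    ("US", "mothers day cards"),
    ("US", "star wars"),
    ("US", "star wars"),
    ("DE", "muttertag"),
    ("DE", "muttertag"),
    ("DE", "star wars"),
    ("ES", "dia de la madre"),
]

segments = segment_by_country(country_log, top_n=1)
print(segments["DE"])  # [('muttertag', 2)]
```

The same grouping would work for any other community key (a corporation, a town, a zip code) that can be attached to a search.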
Segmentation by nations offers a powerful new tool for analysis in a number of disciplines. While the techniques might prove to have high transferability between nations, our interest currently is mostly in segmentation of searches by those originating in America.
Search functions on the Internet are embodied in various search engines. While the search function in general draws large quantities of quantifiable traffic, some search engines draw far more traffic than others. These mega-search engines like Google offer the best positions for observation of Internet activity.
Mega-search engines draw more traffic mainly because of the quality of their results and their commitment to the traffic passing through them rather than to the destination Web sites of that traffic.
These differences become clearer when search engines are divided into three types: portals and directories, advertising-sponsored search (pay for performance), and pure search engines.
Portals & Directories
Portals and directories such as Yahoo! rely on human editors to scour the Web and appropriately categorize pages and their associated links. Portal editors are much like librarians.
One problem for portals is that directories take tremendous effort to maintain. Finding new links, updating old ones, and maintaining the database technology add to a portal's administrative burden and operating costs.
Pay For Performance
The leading pay-for-performance search network, Overture (http://www.overture.com), enables Web sites to enhance their revenues and user functionality by offering Overture's search results to their users. More than 95 percent of Overture's paid introductions are generated through its thousands of affiliate partners, and Overture's search results reach 85 percent of active U.S. Internet users.
Overture's search results are distributed to thousands of sites across the Internet, including Yahoo!, MSN, Netscape, AltaVista, Lycos, HotBot and many others.
Advertisers pay Overture the amount of their bid only when a consumer clicks on their listing, which gives them one of the most cost-effective ways to drive targeted customer leads to their sites.
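The pay-for-performance model can be sketched in a few lines. The billing rule (charge the bid only when a listing is clicked) comes from the text; ordering listings purely by bid amount is our assumption about how such a network might rank paid listings, and all advertiser names and figures are hypothetical. Amounts are kept in integer cents to avoid floating-point rounding.

```python
def rank_by_bid(bids):
    """Order advertisers by bid, highest first (assumed ranking rule).

    bids: {advertiser: bid_per_click_in_cents}
    """
    return sorted(bids, key=bids.get, reverse=True)


def bill(bids, clicks):
    """Charge each advertiser its bid once per click; unclicked listings cost nothing."""
    return {adv: bid * clicks.get(adv, 0) for adv, bid in bids.items()}


# Hypothetical bids (cents per click) and click counts.
bids = {"office-a": 25, "office-b": 40, "office-c": 10}
clicks = {"office-a": 10, "office-b": 3}

print(rank_by_bid(bids))   # ['office-b', 'office-a', 'office-c']
print(bill(bids, clicks))  # {'office-a': 250, 'office-b': 120, 'office-c': 0}
```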
Pure Search Engines
Search engines pay special attention to metadata in the pages that they spider through and add to their index databases. If the new librarians of the Internet run portals, mathematicians run search engines. More precisely, they are run by the formulas of mathematicians.
In the simplest case, this metadata might take the form of content in <meta> tags. Many search engines base their results on how often keywords appear on a Web page.
More advanced search engines, like Google, rely on more subtle information. For example, Google evaluates not only the occurrence of keywords on a page but also the number of outside links to the page itself, as a measure of importance or popularity.
Google developed an advanced search technology that involves a series of simultaneous calculations typically occurring in under half a second, without any human intervention. At the heart of this technology are PageRank™ technology and hypertext-matching analysis developed by Larry Page and Sergey Brin. Google's search architecture is also scalable, which enables it to continue indexing the Internet as it expands.
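A naive keyword-frequency ranking of this kind can be sketched with Python's standard `html.parser`. The scoring rule is made up for illustration: count how often each query word appears in the page text and add a bonus of 1 when the word is also declared in the page's `<meta name="keywords">` tag. Search engines of the period used their own, unpublished weightings.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects words from <meta name="keywords"> and from the page's text."""

    def __init__(self):
        super().__init__()
        self.meta_words = set()
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag == "meta":
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "keywords":
                content = attributes.get("content") or ""
                self.meta_words.update(re.findall(r"[a-z0-9]+", content.lower()))

    def handle_data(self, data):
        self.text_parts.append(data)


def keyword_score(html, query):
    """Naive relevance: occurrences of each query word in the page text,
    plus 1 if the word is also declared in the meta keywords (assumed rule)."""
    parser = PageParser()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    counts = Counter(re.findall(r"[a-z0-9]+", " ".join(parser.text_parts).lower()))
    return sum(counts[w] + (1 if w in parser.meta_words else 0) for w in query.lower().split())


page = ('<html><head><meta name="keywords" content="boston, office space">'
        '<title>Offices</title></head>'
        '<body><p>Furnished office space in Boston. Office rentals.</p></body></html>')

print(keyword_score(page, "office boston"))  # 5
```

Because such a score rewards repetition, a page can inflate it simply by repeating keywords, which is one reason link-based measures like the one described next were attractive.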
PageRank technology performs an objective measurement of the importance of Web pages and is calculated by solving an equation of 500 million variables and more than two billion terms. Google does not count links; instead PageRank uses the vast link structure of the Web as an organizational tool. In essence, Google interprets a link from Page A to Page B as a "vote" by Page A for Page B. Google assesses a page's importance by the votes it receives.
Google also analyzes the pages that cast the votes. Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages important. Important, high-quality pages receive a higher PageRank and are ordered or ranked higher in the results. Google's technology uses the collective intelligence of the Web to determine a page's importance. Google does not use editors or its own employees to judge a page's importance.
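The voting idea can be illustrated with the simplified PageRank iteration published by Brin and Page in 1998: each page splits its current score evenly among the pages it links to, and a damping factor (0.85 in that paper) mixes in a small uniform share. Google's production system is far more elaborate and not public; this is a teaching sketch, and the four-page link graph is invented. Treating pages with no outgoing links as voting for every page is one common convention, assumed here.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links: {page: [pages it links to]}; a link from A to B is a "vote" by A for B.
    Returns {page: score}; the scores sum to 1.
    """
    pages = set(links) | {target for targets in links.values() for target in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page's vote is worth its own current score, split among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outgoing links): spread its score over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Invented four-page web: C collects the most votes; D receives none.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(web)
print(sorted(scores, key=scores.get, reverse=True))  # ['C', 'A', 'B', 'D']
```

A and B each receive a single link, yet A outranks B because A's vote comes undivided from the important page C, while B's comes as half of A's score.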
Finally, unlike conventional search engines, Google is hypertext-based. It analyzes all the content on each Web page and factors in fonts, subdivisions, and the precise positions of all terms on the page. Google also analyzes the content of neighboring Web pages. All of this data enables Google to return results that are more relevant to user queries. As a result, millions of users worldwide look to Google as the fastest, easiest way to find exactly the information they're looking for on the Web the first time.
Search + Advertising
A new service from Google is AdWords Select™. Basically, it combines Google's concepts with those of Overture (http://www.overture.com/, formerly GoTo.com).
An advertiser first chooses how they want to segment their market. (Currently, only large segmentations based on nations and languages are possible. However, someday perhaps we will be able to choose local areas such as states, counties, cities or metro areas.) Secondly, an advertiser chooses the key words specific to their business. This creates an ad group of words.
When a Google user searches for these keywords, an ad in a box appears to the right of the Google search results. Each ad group has a maximum cost-per-click (CPC) the advertiser is willing to pay. This maximum and the ad's click-through rate (CTR) determine where the ad will be shown. If the CTR is high, the advertiser pays less to stay in a top position. If the CTR is about the same as that of competing ads, the advertiser pays only pennies more to stay in a top position. If the CTR is lower, the advertiser pays more for a top position. All of this is monitored by Google's AdWords Discounter, which automatically adjusts rates.
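The interplay of maximum CPC, CTR and the AdWords Discounter can be sketched under two stated assumptions: ads are ordered by maximum CPC multiplied by CTR, and each ad is charged one cent more than it needs to stay ahead of the ad below it, never more than its maximum. These rules, the hypothetical five-cent floor for the last ad, and all the numbers are illustrative assumptions, not Google's published formula. CTR is expressed as clicks per thousand impressions so the arithmetic stays in integers.

```python
from dataclasses import dataclass

MIN_CPC = 5  # hypothetical floor price, in cents, for the lowest-placed ad


@dataclass
class Ad:
    name: str
    max_cpc: int          # most the advertiser will pay per click, in cents
    clicks_per_1000: int  # click-through rate, as clicks per 1,000 impressions


def score(ad):
    # Assumed rank score: maximum CPC multiplied by CTR.
    return ad.max_cpc * ad.clicks_per_1000


def rank_and_price(ads):
    """Return [(name, price_in_cents)] in display order, top position first.

    Assumes every ad has a nonzero CTR.
    """
    ranked = sorted(ads, key=score, reverse=True)
    result = []
    for i, ad in enumerate(ranked):
        if i + 1 < len(ranked):
            # Smallest whole-cent price whose score still beats the next ad's score.
            needed = score(ranked[i + 1]) // ad.clicks_per_1000 + 1
        else:
            needed = MIN_CPC
        result.append((ad.name, min(ad.max_cpc, needed)))
    return result


ads = [
    Ad("A", max_cpc=50, clicks_per_1000=20),   # popular ad, modest bid
    Ad("B", max_cpc=100, clicks_per_1000=5),   # high bid, rarely clicked
    Ad("C", max_cpc=30, clicks_per_1000=10),
]
print(rank_and_price(ads))  # [('A', 26), ('B', 61), ('C', 5)]
```

In this toy run, ad A holds the top position while paying less per click than ad B, echoing the point that a high click-through rate lets an advertiser pay less to stay on top.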
For example, the words "furnished office space boston" yield a list of sponsored links on the right which go directly to sites of the paid advertisers. The box positioned at the top of the right column list is based on the above formula.
Unlike Overture, though, Google does not place these paid results among its regular hypertext links on the left. Rather, it puts them in their own column on the right. A Google user can therefore easily see which links are sponsored and which result from Google's internal mathematical formulas. Pure search resides side by side with paid search.
The Semantic Web
Yet as sophisticated as leading search technology has become, the ultimate quality of search on the Web is a two-way street. Search engines can get more and more sophisticated at locating information. But Web pages need to also get more sophisticated at helping search engines locate information.
Currently, the methods for providing information lag far behind the methods for searching for information, because there is little standardization in how information is provided on the Web.
As Uche Ogbuji writes in the June 2002 issue of New Architect, "The Semantic Web is a vision of a next-generation network that lets content publishers provide notations designed to express a crude 'meaning' of the page, instead of merely dumping arbitrary text onto a page. Autonomous agent software can then use this information to organize and filter data to meet the user's needs."
Semantic Web proponents are looking to XML and RDF to meet these new challenges.
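To make the idea concrete, here is a minimal sketch of page metadata expressed as RDF statements in the N-Triples syntax (one subject-predicate-object statement per line). The Dublin Core element vocabulary, the example.org URL and the `to_ntriples` helper are our own illustrative choices; a real publisher would more likely use an RDF library and RDF/XML, the XML serialization of RDF.

```python
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace


def escape_literal(text):
    """Escape characters that N-Triples does not allow raw inside a string literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_ntriples(page_url, metadata):
    """Turn {dublin_core_term: value} into N-Triples lines describing page_url."""
    lines = [
        f'<{page_url}> <{DC}{term}> "{escape_literal(value)}" .'
        for term, value in metadata.items()
    ]
    return "\n".join(lines)


print(to_ntriples("http://example.org/report", {
    "title": "Sample page",
    "subject": "linguistics",
}))
```

Each output line states one machine-readable fact about the page (its title, its subject) that an agent could filter on without guessing from free text.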
Analysis
"And the big complaint we get isn't about speed - we're the fastest there is. What people want is even more information, from sources that are not currently searched. Research reports, news files, historical archives, university projects - they're out there, but not yet in publicly searchable sources. Getting that information indexed is a Herculean, but incredibly important job."
- Eric Schmidt, CEO, Google, in American Spectator (August 2001).
As one might suspect, a number of search engines post their word search rankings. Some attempt analysis of the relationship between their top words and events in culture.
For the most part, though, their analysis focuses on the top-ranked words, and interpreting their meaning involves a good deal of speculation. Some are niche products of large branded search engines, such as The Lycos 50. The Lycos 50 (http://50.lycos.com/) interprets searches from the Lycos search engine, which receives about 12 million searches each day. It applies strict filters that eliminate words such as those associated with corporations, pornography and computer technology.
There are also a number of independent word-ranking consulting services that combine rankings from several search engines. One of the oldest and best is Beyond Engineering's Word Spot (http://www.wordspot.com/), which customizes searches for words clustered around client products. For example, word clusters around the search word "MP3" offer a client insight into the top words associated with MP3-related products. A sample online report from Word Spot is at http://www.wordspot.com/samplecustom.html.
The search engine Google has focused on the quantity of traffic and information rather than on quantifying and analyzing it. As Google CEO Eric Schmidt notes, indexing information is by itself a "Herculean" task.
It has had incredible success in this effort so far. In just a few years, Google has become the largest search engine, processing over 150 million searches a day against its database of two billion Web pages.
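How Word Spot builds its clusters is not described here; one simple way to approximate a word cluster around a seed term is to count which other words appear in the same queries as the seed. The helper, stopword list and sample queries below are hypothetical.

```python
from collections import Counter

STOPWORDS = frozenset({"the", "a", "an", "of", "for", "to", "and", "in"})


def cluster_around(queries, seed, top_n=5):
    """Count words that co-occur with `seed` in the same query, most frequent first."""
    seed = seed.lower()
    counts = Counter()
    for query in queries:
        words = set(query.lower().split())  # count each word once per query
        if seed in words:
            counts.update(w for w in words if w != seed and w not in STOPWORDS)
    return counts.most_common(top_n)


# Made-up queries for illustration.
queries = [
    "mp3 player", "free mp3 downloads", "mp3 player reviews",
    "napster mp3", "dvd player", "free mp3 player",
]
print(cluster_around(queries, "MP3", top_n=2))  # [('player', 3), ('free', 2)]
```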
While Google has focused on search services rather than interpretation of results, it has created a weekly and monthly "Google Zeitgeist" of the top ten ascending and descending words in Google rankings. It has also created an archive of this information that goes back to June 2001 at http://www.google.com/press/zeitgeist/archive.html.
For example, consider the Google Zeitgeist top gainers and losers for the week ending 13 May 2002. The list is compiled by comparing the week ending 6 May with the week ending 13 May and picking out search queries that rose or dropped by a significant percentage.
Some of the top ascenders were #3 "mothers day cards" for Mother's Day on 12 May, #4 "Star Wars" for the new Star Wars film, which was opening, and #6 "Dia de la Madre" (Spanish for Mother's Day). Some of the top decliners from the previous week were #1 "Cinco de Mayo" (the just-passed Mexican holiday), #2 "Kentucky Derby" (run on 4 May), #3 "Le Pen" (the far-right French politician), #7 "Spiderman" (which had opened the previous week) and #9 "Kirsten Dunst" (the female star of the film Spiderman). All top ten ascenders and descenders are listed in Table 1 below.
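Google does not publish its exact formula, but the week-over-week comparison described above can be sketched as a percentage change per query. The add-one term in the denominator (so a query absent the previous week does not cause a division by zero), the minimum-volume filter and the weekly counts are all assumptions; the counts are made up, not Google's data.

```python
def zeitgeist(prev_counts, curr_counts, min_volume=1, top_n=10):
    """Return (gainers, losers) ranked by percentage change between two periods.

    prev_counts, curr_counts: {query: number_of_searches} for each period.
    """
    changes = {}
    for query in set(prev_counts) | set(curr_counts):
        before = prev_counts.get(query, 0)
        after = curr_counts.get(query, 0)
        if max(before, after) < min_volume:
            continue  # too rare in both periods to be meaningful
        # Add-one smoothing keeps brand-new queries from dividing by zero.
        changes[query] = 100.0 * (after - before) / (before + 1)
    ranked = sorted(changes.items(), key=lambda item: item[1], reverse=True)
    gainers = [q for q, pct in ranked if pct > 0][:top_n]
    losers = [q for q, pct in reversed(ranked) if pct < 0][:top_n]
    return gainers, losers


# Made-up weekly counts for illustration.
week_ending_6_may = {"cinco de mayo": 900, "kentucky derby": 500, "star wars": 100, "weather": 300}
week_ending_13_may = {"cinco de mayo": 50, "kentucky derby": 80, "star wars": 700,
                      "muttertag": 400, "weather": 300}
print(zeitgeist(week_ending_6_may, week_ending_13_may))
# (['muttertag', 'star wars'], ['cinco de mayo', 'kentucky derby'])
```

Note that a steady query such as "weather" appears in neither list, however popular it is: the Zeitgeist measures movement, not rank.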
Table 1: Weekly Google Zeitgeist - 13 May 2002
Rank  Ascenders                Descenders
1     muttertag                cinco de mayo
2     volkert van der graaf    kentucky derby
3     mothers day cards        le pen
4     star wars                hubble
5     rolling stones           may day
6     dia de la madre          vappu
7     judith soltesz-benetton  spiderman
8     luke john helder         muguet
9     world cup                kirsten dunst
10    earthquake               kelli carpenter
Notice the close connection between the weekly Zeitgeist and leading news of popular culture.
The weekly Google Zeitgeist focuses on the fast "revolving door" of popular culture and popular culture's increasingly short attention span. On the other hand, the monthly Google Zeitgeist casts a wider net picking up larger trends or those things that gain attention for a month rather than a week.
For example, some of the top ascenders for April 2002 reflect events and people with more staying power than those in the weekly lists. These were #2 "Linda Lovelace" (the infamous pornography film star, who died in April), #4 "Spiderman" (the movie with the largest opening box office in history), #6 "Le Pen" (the far-right French politician), #8 "Robert Blake" (the film star accused of murdering his wife) and #10 "Lisa Lopes" (the TLC singer who died in a car accident). Some of the top descenders were events that had passed, such as #1 "Oscars," #6 "NCAA" and #7 the film "Ice Age." The full list of ascenders and descenders for April 2002 is reproduced in Table 2 below.
Table 2: Monthly Google Zeitgeist - April 2002
Rank  Ascenders       Descenders
1     loft story      oscars
2     linda lovelace  ostern
3     gran hermano    brooke gordon
4     spiderman       presa canarios
5     tawny kitaen    national geographic
6     le pen          ncaa
7     Israel          ice age
8     robert blake    halle berry
9     irs             formula 1
10    lisa lopes      gareth gates
While the monthly Google Zeitgeist is more indicative of broader trends than the weekly Zeitgeist, in months with major news or events the monthly Zeitgeist might represent even broader cultural trends. For example, April 2002 had no big news events while September 2001 had one huge news event. The top ascenders and descenders for September 2001 are reproduced below.
The ascendance of the September 2001 words on Google is fairly self-explanatory. It is interesting, though, that the word "Nostradamus" ranks first among ascenders, possibly indicating that people saw the event as so enormous that it belonged in the category of the prophetic. The top descending words are also fairly self-explanatory. Interestingly, "travel" is one of the largest decliners, as Americans decided to stay home after the terrorist attacks of September 11th.
Table 3: Monthly Google Zeitgeist - September 2001
Rank  Ascenders           Descenders
1     nostradamus         aaliyah
2     cnn                 powerball
3     world trade center  code red virus
4     osama bin laden     mariah carey
5     taliban             baycol
6     afghanistan         n503is
7     nimda               travel
8     american flag       lisa harrison
9     bbc                 american pie
10    fbi                 etna
Apart from top ascending and descending words on the monthly Google Zeitgeist, Google is also experimenting with other listings.
Table 4: Top Product Searches - April 2002
1. ferrari
2. sony
3. bmw
4. nokia
5. disney
6. ryanair
7. hp
8. ikea
9. adobe
10. nike
For example, for April 2002, Google posted the top brands searched for on Google, in the order shown in Table 4 above. Since advertising and publicity are tied to products, many of these positions may indicate advertising or publicity campaigns driving the rankings.
The Google Zeitgeist (like other search engine word rankings) is relatively new. As the years pass, though, Internet rankings of the top words of each year will develop for particular cultures. Exploring the relationships between these top yearly words and leading cultural events offers much research potential.
Long before the era of Internet word rankings, a number of scholars and organizations were tracking leading words in culture. For example, each issue of American Speech, published by the well-respected American Dialect Society, features leading new words in its "Among The New Words" (ATNW) section. The ATNW section has been a feature of the journal for most of the twentieth century.
A fascinating book project by leading linguists David Barnhart and Allan Metcalf is America in So Many Words. It is an attempt to identify the leading words which have entered the American vocabulary on a yearly basis starting in the year 1555 and ending in 1998.
Table 5: American Words added to Dictionaries in the 1990s
Year  Word
1990  PC
1991  about
1992  Not!
1993  newbie
1994  go postal
1995  Newt
1996  soccer mom
1997  Ebonics
1998  millennium bug
For example, leading words during the beginning years of American culture were "canoe" (1555), "skunk" (1588), "Indian" (1602) and "turkey" (1607). At the other end of American history, the leading words of the 1990s form the revealing list in Table 5 above.
Lacking the ability of great search engines like Google to rank word popularity, the authors simply relied on newly coined words that became prominent enough to be added to American dictionaries.
Theoretically, if historical content analysis of American print media were possible on a massive scale, a particular type of ranking of leading words by year might be possible.
Of course, had Internet search engines like Google been around, we would have immediate rankings of words on a yearly basis.
The exciting thing is that we already have a database of the most popular words in America spanning a few years. This extremely important database of American word rankings continues to grow: Google alone adds 150 million searches to it each day.
Since 1879, when James Murray began the work of compiling the Oxford English Dictionary (OED), readers all over the world have been collecting examples of new words, idioms and meanings for the Oxford database. Once sufficient evidence for a new word has been collected, Oxford's lexicographers prepare a new entry that is then reviewed by expert consultants before it is added to the OED database. The OED database has reached one million words.
During the last century, some 90,000 new words were added to the Oxford English Dictionary and its supplements. This represents a 25 percent increase in the total vocabulary over the previous thousand years. This is not surprising when one realizes the number of native speakers of English in the world nearly tripled in the twentieth century, from 140 million to 400 million. In addition, a further 100 million people added English as a second language.
From these 90,000 words added to the OED in the twentieth century, British lexicographer John Ayto has selected the most salient new words coined in each decade (some 5,000 words in all) for his book Twentieth Century Words. As Ayto notes: "Words are a mirror of their times. By looking at the areas in which the vocabulary of a language is expanding fastest in a given period, we can form a fairly accurate impression of the chief preoccupations of society at that time and the points at which the boundaries of human endeavour are being advanced."
Grouped by decade, the words are a mirror of their times and the events, preoccupations, inventions and discoveries of each decade.
In a February 2000 interview on Lingua Franca, an Australian radio program, Ayto observes that words help us define new decades and help us make a new start. "We throw off the fashions of the previous decade and perhaps make a conscious effort to move into the new one." One example is how the names of popular dances have defined decades. "And so the names for all the dances, for example, that were popular in the 1920s, were very swiftly superseded and made to seem very old hat in the 1930s, and the same thing happened in the '40s and '50s and so on. You can really encapsulate a decade almost by the names of the dances that were popular at a particular time."
In a general sense, Ayto observes in the radio interview that the twentieth century could be characterized as the century of abbreviation. "I suppose you could try and draw all sorts of morals about our hurried lifestyle from that. There has been the acronym, for example, like AIDS and NATO where you take the initial letter of a string of words, put them all together and make another word out of it. That was virtually unknown before the twentieth century."
In the coming years, it will be interesting to see how new areas such as Internet word analysis develop in relation to traditional areas such as lexicography, semantics and linguistics.
Google is a worldwide search engine and its overall, non-segmented results reflect a mixture of interests from its worldwide users. However, Google demographics show that this mixture of interests is heavily weighted to the English language and western nations.
Currently, Google has the ability to segment searches into five languages and most of the major nations of the world. In order for Google to be an effective tool for cultural analysis, searches originating in specific countries need to be segmented from the overall worldwide searches on Google.
Apart from key segmentation into various nations for cultural analysis, searches might also be segmented into various forms of real world (rather than virtual) communities. Search segmentation most likely will be based on physical origination points. In this manner, interests of smaller search communities (within the larger overall search community) are defined and ranked. Examples of some of these word search communities are corporations, cities, universities and zip codes. However, it is conceivable that search segmentation might also be done by non-locality identifiers such as industry SIC codes.
The ability to segment search data into various communities has two important consequences. First, it reveals the interests of community members. Second, and less apparent, it allows cross-community comparisons of word rankings.
For example, in the illustration below (Figure 1), assume four different segmented word search communities: 1, 2, 3 and 4. These might represent four different corporations, universities or towns that are able to separate internally generated searches from externally generated ones via firewalled intranet technology. For comparison purposes, it is important that they are similar communities; for example, all should be corporations or all should be towns. Better comparisons are obtainable when the corporations or towns are also similar to one another.
Assume the letters A, B and C represent word ranking levels (top, middle and lower) in these communities. Communities 1 and 2 are in word alignment with each other. Communities 1 and 2 are also in alignment with the top-ranked words in community 4 but out of alignment with its middle and lower-ranked words. Communities 3 and 4 are out of alignment in their top-ranked words but in alignment in their lower-ranked words.
Figure 1: Word Rankings In Four Communities
Cross Community Comparisons
Of course this is a simplified example. Yet it represents the basic principles of cross community word comparisons. What might this information tell us?
In later sections of this report, we discuss issues of word rankings such as depth, cycles and clusters (with their content and context words). One of our main arguments is that top-ranked words (which we term "content" words) are more reflective of external events in the community, while middle and particularly lower-ranked words (which we term "context" words) are more expressive of collective psychology and of the moods, emotions and attitudes of the community's internal world.
Applying this to the above example, communities 1, 2 and 4 have content word alignment (matching "A"s) yet lack context word alignment ("C"s are not aligned with "B"s). However, communities 3 and 4, while lacking content word alignment, have context word alignment at their lowest level of words. This low-level word alignment might ultimately be more important than the surface alignment of top-ranked content words. (For better understanding, the reader is encouraged to return to this section after reading the upcoming sections on depth, cycles and clusters. However, we feel it is important to discuss segmentation first so that those sections have more practical application and meaning.)
Below are some prime candidates for segmentation: communities whose word searches might be separated by search engines from outside traffic, giving a focused understanding of each community's own dynamics. These are nations, corporations, towns, counties, states, universities, zip codes and industries.
Segmentation of searches by nation is perhaps the first essential step for cultural analysis. But it also offers powerful new tools for international marketers as well as for governments. Searches originating in different nations might be compared to locate commonality or differentiation of interests between nations. A word ranked high in one nation might be ranked low in another nation. A word with a high "Rise Index" in one nation might have a low "Rise Index" in another nation. On the other hand, there may be much commonality between some words.
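The report does not define a numeric measure of "alignment." One possible choice, assumed here purely for illustration, is the Jaccard overlap (shared words divided by all distinct words) between two communities' word sets at each ranking tier. The tier contents below are hypothetical.

```python
def jaccard(a, b):
    """Share of distinct words that two sets have in common (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def tier_alignment(community_x, community_y, tiers=("A", "B", "C")):
    """Compare two communities tier by tier.

    Each community is {tier: set_of_words}, with A = top-, B = middle- and
    C = lower-ranked words. Returns {tier: overlap rounded to two places}.
    """
    return {t: round(jaccard(community_x[t], community_y[t]), 2) for t in tiers}


# Hypothetical word tiers for communities 3 and 4 from the example above.
community_3 = {"A": {"world cup", "star wars"}, "B": {"jobs", "rent"}, "C": {"hope", "worry"}}
community_4 = {"A": {"le pen", "hubble"}, "B": {"jobs", "taxes"}, "C": {"hope", "worry"}}

print(tier_alignment(community_3, community_4))  # {'A': 0.0, 'B': 0.33, 'C': 1.0}
```

High overlap in tier C with none in tier A corresponds to the case of context-word alignment without content-word alignment.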
President Bush (as well as other political and military leaders) has observed that modern wars will increasingly be fought on many fronts using a number of means outside of conventional military force.
As modern wars increasingly become "battles of symbols," it becomes more and more important to gauge the top symbols in other nations and to formulate a strategy in some type of alignment with these symbols. The top ranked words of one's enemies and allies become important in this emerging global scenario.
By segmenting word searches originating in specific nations, search engines are able to create national lists of top ranked words.
Table 6: Fictional Top Search Words of Nations
Word     America   Afghanistan   China   England
Man      #605      #100          #350    #550
Woman    #50       #900          #100    #225
In the above table, assume that the words "Man" and "Woman" carry a close cross-cultural interpretation in Internet searches and that the searches are conducted at nearly the same moment in search time. A lower rank number shows greater importance; a higher rank number shows less importance.
Most interesting are the relative weights of the rankings between nations. For instance, at the particular moment in time represented by the above table, America gives less importance to the word "Man" than all the other nations in the study (#605 against #100 to #550), yet gives greater importance to the word "Woman" (#50 against #100 to #900). Might variations such as these in key symbol words suggest the dominance of a feminine context or zeitgeist over a masculine one? Might this simply be a coincidence? Might there not be enough data to draw any meaningful conclusion?
For a minute, assume there is enough information to draw some broad general conclusions. Assume that the American pattern, a high rank number for "Man" and a low rank number for "Woman," indicates a general feminine cultural zeitgeist in America, in contrast to a general masculine cultural zeitgeist in Afghanistan, where the pattern is reversed. Much can be gained for communication purposes if a culture understands it is in a feminine cycle and is communicating with another culture that is in its masculine cycle.
Beyond such speculation about the top word rankings of various nations, Google Zeitgeist already posts top word rankings from several nations in its monthly listings. The rankings for April 2002 are below.
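Using the fictional figures of Table 6, the comparison can be sketched in a few lines. The Man/Woman rank ratio is our own illustrative measure, not one proposed in the report; a ratio above 1 means "Woman" outranks "Man" in that nation.

```python
# Fictional ranks from Table 6 (a lower rank number means greater importance).
RANKS = {
    "Man":   {"America": 605, "Afghanistan": 100, "China": 350, "England": 550},
    "Woman": {"America": 50,  "Afghanistan": 900, "China": 100, "England": 225},
}


def importance_extremes(word):
    """Return (nation giving the word most importance, nation giving it least)."""
    by_nation = RANKS[word]
    most = min(by_nation, key=by_nation.get)   # lowest rank number
    least = max(by_nation, key=by_nation.get)  # highest rank number
    return most, least


def man_woman_ratio(nation):
    """Rank of "Man" divided by rank of "Woman"; above 1 means "Woman" ranks higher."""
    return round(RANKS["Man"][nation] / RANKS["Woman"][nation], 2)


print(importance_extremes("Man"))    # ('Afghanistan', 'America')
print(importance_extremes("Woman"))  # ('America', 'Afghanistan')
print({n: man_woman_ratio(n) for n in RANKS["Man"]})
# {'America': 12.1, 'Afghanistan': 0.11, 'China': 3.5, 'England': 2.44}
```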
Table 7: April 2002 Word Rankings - Four Nations
Rank  America         UK               Germany           France
1     loft story      spiderman        ice age           loft story
2     linda lovelace  loft story       handyzubehoer     sncf
3     gran hermano    queen mother     servlet schulung  sms gratuit
4     spiderman       alton towers     hochzeit          anpe
5     tawny kitaen    eurostar         fussball          air france
6     le pen          tesco            sms kostenlos     immobilier
7     israel          london marathon  deutsche bahn     olympique de marseille
8     robert blake    football         aldi              pc maroc
9     irs             debenhams        kommunion         tfl
10    lisa lopes      british airways  mallorca          degriftour
In the above, note the perfect alignment between America and France on "Loft Story" as well as a close alignment between America and Britain on Spiderman.
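Alignments like these can be found mechanically by listing the terms two nations share and their positions in each list. The sketch below simply reuses the April 2002 lists from Table 7 as printed.

```python
# Top ten terms per nation, in rank order, as printed in Table 7.
TOP_TEN = {
    "America": ["loft story", "linda lovelace", "gran hermano", "spiderman", "tawny kitaen",
                "le pen", "israel", "robert blake", "irs", "lisa lopes"],
    "UK": ["spiderman", "loft story", "queen mother", "alton towers", "eurostar",
           "tesco", "london marathon", "football", "debenhams", "british airways"],
    "Germany": ["ice age", "handyzubehoer", "servlet schulung", "hochzeit", "fussball",
                "sms kostenlos", "deutsche bahn", "aldi", "kommunion", "mallorca"],
    "France": ["loft story", "sncf", "sms gratuit", "anpe", "air france", "immobilier",
               "olympique de marseille", "pc maroc", "tfl", "degriftour"],
}


def shared_terms(nation_a, nation_b):
    """Map each term in both lists to its (rank in nation_a, rank in nation_b)."""
    ranks_b = {term: rank for rank, term in enumerate(TOP_TEN[nation_b], start=1)}
    return {term: (rank, ranks_b[term])
            for rank, term in enumerate(TOP_TEN[nation_a], start=1)
            if term in ranks_b}


print(shared_terms("America", "France"))   # {'loft story': (1, 1)}
print(shared_terms("America", "UK"))       # {'loft story': (1, 2), 'spiderman': (4, 1)}
print(shared_terms("America", "Germany"))  # {}
```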
One of the more interesting and valuable word search communities is already in use through the Google Search Appliance™. Licensed from Google, this appliance places Google's Internet search technology behind firewalls, segmenting it inside Intranet communities such as corporate Web sites.
Besides offering greater communication within corporations, the various shifts in page hits and word rankings potentially offer an incredible new tool for corporate management in determining key interests of employees.
Towns, Counties And States
While Google can create segmented searches behind firewalls and corporate Intranets, it currently does not have the ability to segment inquiries originating in localized areas (smaller than nations) such as towns, counties or states.
Connected communities or cyber-cities presently utilize Internet technology to increase local "social capital" by attempting to maximize economic, educational, political and social value. Word searches originating in particular cities or local communities could offer a valuable method of gauging local interests.
In the early years of the Internet, many universities were locked behind firewalls. Today, however, almost all universities have removed these firewalls, allowing searches outside their specific communities. Some universities still maintain various forms of firewalls. To the extent that these firewalls capture Intranet activity, they offer school administrators valuable insight into leading issues among faculty and students.
Searches segmented by zip code could be related to what is perhaps the leading marketing segmentation tool, the Claritas PRIZM database.
Basically, Claritas has segmented America into a number of distinct psychographic and demographic groups tied directly to zip codes. For example, if a marketer provides a zip code, Claritas can provide an excellent model of the consumers living in that zip code.
If zip codes could be isolated for search purposes, leading word rankings could be cross-indexed with Claritas data for a powerful new understanding of markets.
As we mentioned earlier, it is conceivable that search segmentation might also be undertaken by non-locality identifiers such as industry SIC codes.
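Cross-indexing could be as simple as tagging each search with the segment assigned to its zip code and totaling searches per segment. The zip-to-segment table, segment names and searches here are hypothetical placeholders, not actual Claritas PRIZM data or segment names.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from zip code to a lifestyle segment label.
ZIP_SEGMENT = {"10001": "Segment A", "10002": "Segment A", "94105": "Segment B"}


def words_by_segment(searches):
    """searches: iterable of (zip_code, word); return {segment: Counter of words}."""
    totals = defaultdict(Counter)
    for zip_code, word in searches:
        segment = ZIP_SEGMENT.get(zip_code, "Unknown")  # unmapped zips kept separately
        totals[segment][word] += 1
    return totals


searches = [("10001", "yoga"), ("10002", "yoga"), ("10001", "sushi"), ("94105", "hiking")]
for segment, counts in sorted(words_by_segment(searches).items()):
    print(segment, counts.most_common(1))
# Segment A [('yoga', 2)]
# Segment B [('hiking', 1)]
```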
For example, searches originating in different industries might be isolated for industry analysis as well as for cross industry comparison. Also possible might be further segmentation of niches within industries.
Industry specific words might be defined and word rankings obtained from this community of words. The rise or decline of certain words (discussed in the following Ascendance chapter) associated with industry products might greatly help in forecasting industry or product trends.
Depth (Synchronic Words)
While the Google Zeitgeist is interesting, for the most part the top ten Google words (weekly or monthly) simply reflect leading things and people coming and going from the attention of popular culture. The top ten words offer few surprises and little insight into the hidden forces behind popular culture.
Far greater cultural insight exists in the larger database of words ranked outside the Google top ten words. It is at the lower ranking levels that words move away from reflecting external cultural events to expressing internal attitudes.
Figure 2: Deep Zeitgeist
Just as the surface of a lake reflects the clouds over it, so too do top ranked words reflect general interest in external current events.
In the above illustration, top ranked words (red dot) are closer to popular culture and the external world. In this sense, they are more reflective of the external world. On the other hand, lower ranked words (orange dot) are closer to collective psychology and the internal world and are more expressive of the internal world.
The arrows in the illustration show where the forces creating the word rankings originate. Top ranked words are reflections of leading events in culture, so the arrow moves down from the external world to the words. Lower ranked words are expressions of attitudes and moods in individuals, so the arrow moves up from this internal world to the words.
The real "zeitgeist" or "spirit of the times" behind the flash-in-the-pan events of popular culture exists within the patterns obtained from these larger and deeper word rankings.
In addition to exploring a deeper Google Zeitgeist, the history of the ranked words may also offer new insights into culture. Here, the speed of a word's rise might be given an index and rating. Words that rise the fastest (and are not related to obvious cultural events) may indicate the collective psychology of a culture, or, what Carl Jung termed the collective unconscious.
A word whose searches accumulate over a short period of time may be more indicative of collective psychology than a word whose searches accumulate over a longer period. Words might therefore be given a "Rise Index" to rate them apart from their "Rank Index."
Table 8: Rise Index
Words Searched For At The Same Time
Indicative Of Collective Factors?
Rank Index  Word    Searches  Time Period    Rise Index
#200        Baby    500,000   1,000 Minutes  500/Minute
#800        Clouds  100,000   10 Minutes     10,000/Minute
In the above table, although the word "Baby" is ranked higher than "Clouds," it may not be as important, since it took 1,000 minutes (about 16.7 hours) to rise to #200 while "Clouds" took only 10 minutes (1/6 hour) to rise to #800. Dividing total searches by the time period yields the "Rise Index," which measures the speed of ascent. Note that in the above example the Rise Index for "Clouds" (10,000 per minute) is 20 times greater than the Rise Index for "Baby" (500 per minute).
Most interesting within the Rise Index are words searched for simultaneously, demonstrating a type of synchronicity of search. One might posit that the higher the Rise Index, the higher the degree of synchronicity involved. For example, a word ranked #1,500 but based on 50,000 simultaneous searches, with no observable external stimulus, offers valuable research potential into the concept of word search synchronicity on the Internet and its relationship with collective psychology.
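The Rise Index arithmetic from Table 8, as a short sketch (the search counts and time periods are the table's illustrative figures):

```python
def rise_index(searches, minutes):
    """Speed of ascent: total searches divided by the minutes taken to reach them."""
    if minutes <= 0:
        raise ValueError("time period must be positive")
    return searches / minutes


baby = rise_index(500_000, 1_000)  # 500.0 searches per minute
clouds = rise_index(100_000, 10)   # 10000.0 searches per minute
print(clouds / baby)               # 20.0
```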
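The report does not say how "simultaneous" should be measured. One possible operationalization, assumed here, is the peak number of searches for a word inside any sliding window of fixed length, flagged only when the word is absent from a list of known external events. The window, threshold, search log and event list are all hypothetical.

```python
from collections import deque


def peak_window_count(timestamps, window):
    """Largest number of searches within any span shorter than `window` seconds."""
    best = 0
    recent = deque()
    for t in sorted(timestamps):
        recent.append(t)
        # Drop searches that fall outside the window ending at t.
        while t - recent[0] >= window:
            recent.popleft()
        best = max(best, len(recent))
    return best


def synchronicity_candidates(search_log, known_events, window=60, threshold=50_000):
    """Words with a burst of near-simultaneous searches and no known external stimulus."""
    return sorted(
        word for word, times in search_log.items()
        if word not in known_events and peak_window_count(times, window) >= threshold
    )


# Tiny hypothetical log: word -> search timestamps in seconds.
log = {"clouds": [0, 1, 2, 3], "super bowl": [0, 1, 2, 3], "baby": [0, 100, 200]}
print(synchronicity_candidates(log, known_events={"super bowl"}, window=10, threshold=3))
# ['clouds']
```

The known-events list does the work of "no observable external stimulus"; in practice it would have to be maintained from news and event calendars.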
Cycles (Diachronic Words)
Cycles are composed of beginnings, endings and a sequence of stages between the two. Research into cycles has shown that cycles move between symbolic opposites inherent in the symbolism of beginnings and endings.
While many cycles have been observed in nature, there is growing evidence there are also cycles in culture. Some of the key researchers in the area of cultural cycles are Arthur Schlesinger Sr. and Arthur Schlesinger Jr., Frank Klingberg, William Mayer, Harold Lasswell, Lloyd deMause, Pitirim Sorokin, William Strauss and Neil Howe.
One of the most famous theories of cultural cycles is the Elliott Wave Theory proposed by Ralph Nelson Elliott, which relates the economy to cyclic social moods. In The Wave Principle of Human Social Behavior, Robert Prechter extends Elliott Wave theory to wider areas of sociology and psychology to create a theory of socionomics. Within Prechter's theory is the suggestion that words go through cycles similar to Elliott waves.
Within the area of cultural word cycles, one needs to distinguish between external word cycles and internal word cycles. The first involves word relationships to annual cyclic events. The second more closely resembles Elliott Wave cycles and collective cultural moods.
Certain words and classes of words demonstrate repeating cycles related to external annual cultural events such as major holidays, sporting events or governmental deadlines. For example, the terms "Internal Revenue Service" and "IRS" show a cyclic rise before the April 15th U.S. tax deadline and a decline after it. As another example, words associated with the "Super Bowl" show an annual rise in January and a decline in February.
Far more interesting, though, are word cycles relating to internal factors rather than external events. According to cycle theory, the general types of words clustered around the beginning of these internal cycles should be different from (in fact, opposite to) the words clustered around the end of these cycles. They should also demonstrate a sequential clustering in stages between the beginnings and endings of these cycles.
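One simple way to test whether a word follows an external annual cycle, assumed here for illustration, is to check whether its search volume peaks in the same calendar month every year. The monthly counts below are invented.

```python
def peak_month_by_year(monthly_counts):
    """monthly_counts maps (year, month) -> searches; return {year: peak month}."""
    best = {}
    for (year, month), count in monthly_counts.items():
        if year not in best or count > best[year][1]:
            best[year] = (month, count)
    return {year: month for year, (month, _) in sorted(best.items())}


def is_annual_cycle(monthly_counts):
    """True when every year peaks in the same calendar month."""
    return len(set(peak_month_by_year(monthly_counts).values())) == 1


# Hypothetical searches for "IRS", rising toward the April 15 deadline.
irs = {
    (2000, 2): 40, (2000, 3): 70, (2000, 4): 95, (2000, 5): 20,
    (2001, 2): 45, (2001, 3): 80, (2001, 4): 99, (2001, 5): 25,
}
print(peak_month_by_year(irs))  # {2000: 4, 2001: 4}
print(is_annual_cycle(irs))     # True
```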
Figure 3: External and Internal Word Cycles
For instance, one would expect to see a dominant cluster of feminine related words at the beginning of an internal word cycle and a dominant cluster of masculine related words at the end of an internal word cycle. Whether the clusters of feminine or masculine words have a relationship to popular culture (feminine film and story genres, political leaders, colors, etc.) is an important research question by itself.
In the above illustration, external word cycles (Yellow) demonstrate repeating cycles related to external annual cultural events such as major holidays, sporting events or governmental deadlines. Internal word cycles (Blue) demonstrate repeating cycles related to biological rhythms and internal psychology.
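If words could be tagged as feminine- or masculine-related, the expected pattern could be checked by finding which cluster dominates each stage of a cycle. Both word lists below are hypothetical placeholders; the report does not supply such a lexicon, and building a defensible one would be the hard part of the research.

```python
# Hypothetical lexicons; a real study would need a validated word list.
FEMININE = {"romance", "home", "garden", "family"}
MASCULINE = {"war", "power", "truck", "football"}


def dominant_cluster(stage_words):
    """Label a stage by which lexicon accounts for more of its searched words."""
    fem = sum(1 for w in stage_words if w in FEMININE)
    masc = sum(1 for w in stage_words if w in MASCULINE)
    if fem == masc:
        return "neither"
    return "feminine" if fem > masc else "masculine"


# Hypothetical lower ranked words at successive stages of one internal cycle.
stages = [
    ["romance", "home", "garden", "weather"],
    ["family", "power", "travel"],
    ["war", "truck", "football", "home"],
]
print([dominant_cluster(s) for s in stages])  # ['feminine', 'neither', 'masculine']
```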
Clusters
"The meaning of an episode was not inside like a kernel but outside in the unseen, enveloping the tale which could only bring it out as a glow brings out a haze."
- Joseph Conrad, Heart of Darkness.
Besides showing a depth archaeology of ranking levels, Internet words also demonstrate a clustering phenomenon centered around key words searched for. This clustering phenomenon can be analogized to the orbit of planets around stars, of moons around planets. The force of certain words to draw other words to them in orbits is similar in many ways to gravitational forces of the universe. Words may have a ranking on the Internet but words also possess their own internal gravity pulling other words into orbit around them. All words are symbols and all symbols refer to other words or symbols.
Much of this sounds esoteric until one considers that many key words on the Internet relate to products. People search the Internet largely to find these leading products of popular culture. But more importantly, they unconsciously search the brand contexts and narrative entertainment genres containing these products. We suggest culture first creates a narrative context before placing products in it.
Just as the outer planets of the solar system are symbols for the context of collective psychology, so too might words in the outer orbits around products present collective factors associated with these core products.
All physical products and non-physical services are wrapped in the images, emotions and words of brands. The core product can be viewed as a few particular words with a number of related words orbiting around it in increasingly large concentric orbits. Brand words most directly related to the product have the smallest (closest) orbits, while those indirectly related to the product have the largest (most distant) orbits.
Words clustering around products can be isolated and given various levels of importance. For example, a word directly related to a product might be given more weight than one in a larger orbit farther away from the core product. As a result, a lower ranking for a direct product attribute might mean as much as, or more than, a higher ranking for an indirect product attribute in a more distant orbit.
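The weighting idea can be sketched with an assumed decay of 1/orbit, where orbit 1 is closest to the product. The decay function, words and counts are hypothetical; they only show how a less-searched direct attribute can outscore a more-searched distant one.

```python
def orbit_weight(orbit):
    """Hypothetical decay: orbit 1 (closest to the product) has full weight."""
    if orbit < 1:
        raise ValueError("orbits are numbered from 1")
    return 1.0 / orbit


def weighted_scores(words):
    """words maps word -> (search count, orbit); return word -> weighted score."""
    return {word: count * orbit_weight(orbit) for word, (count, orbit) in words.items()}


# Hypothetical words around a rugged off-road vehicle.
print(weighted_scores({"four-wheel drive": (3_000, 1), "desert": (10_000, 4)}))
# {'four-wheel drive': 3000.0, 'desert': 2500.0}
```

A steeper or gentler decay (for example 1/orbit²) would change which words dominate; the choice of decay is itself a research assumption.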
Figure 4: Products & Word Orbits
In the illustration above, a product in the green circle is defined by words relating to its specific features and benefits. Words describing the actual product (like color, size and shape) are clustered within the small yellow circle. Outside the yellow circle, various words orbit in growing concentric circles. In effect, these words create the overall "brand" surrounding the product. The Inner Brand Words orbit closer to the product and are more objective in defining it. The Outer Brand Words are less objective and more subjective in defining the product.
One might visualize the green product as placed within an advertisement or a commercial. The words of the outer circles are words relating to the context of the advertisement, the setting or the background environment which best presents the product. For example, assume the product is a rugged off-road vehicle. Key words describing it would center on masculine words associated with freedom and power. Words in the outer circles would most likely center on a non-urban context of the product such as the desert suggestive of the setting for westerns.
As Philip Kotler notes in Marketing Management (8th Edition), there are five levels of a product that expand out in circles. At the center is the Core Benefit, then the Generic Product, the Expected Product, the Augmented Product and finally the Potential Product. Kotler notes that much competition takes place at the Augmented Product level rather than the Core Benefit level. The Augmented Product level is really the brand level. As Harvard Business School's Theodore Levitt notes in The Marketing Mode: "The new competition is not between what companies produce in their factories, but between what they add to their factory output in the form of packaging, services, advertising, customer advice, financing, delivery arrangements, warehousing, and other things that people value."
One might add that this has increasing application to products searched for on the Internet through search engines. As the power of brands and interest in them increase, consumer attention shifts away from our green circle (containing what Kotler calls the product's Core Benefit) and toward Levitt's outer bundle of brand elements in the blue circle.
Attention moves away from key descriptive words of the product itself and toward words relating to the context of the product. For the marketer selling rugged off-road vehicles, word searches involving contextual places like "desert" might indicate a more subliminal and predictive interest trend in the product than straight searches based on words like "rugged vehicle." In this way, the outer words of product brands are similar to the deeper layers of word rankings.
Content And Context Words
We suggest outer brand words and inner brand words might be termed context and content words. In many ways, these words represent symbols on opposite ends of a spectrum or continuum.
In a general sense, one observes that content words are more objective and descriptive of the external world, while context words express more subjective aspects of the internal world. Context is a pervasive, ubiquitous medium surrounding the messages within it. It is similar to the water surrounding a fish, and it is difficult for our visually oriented Western culture to sense. As Marshall McLuhan once commented, "While we're not sure who discovered water, we're pretty sure it wasn't a fish."
Table 9: Content & Context Words As Opposite Symbols
Content Words                     Context Words
Objective, descriptive            Subjective
Shape, size, color, substance     Place, space, time, elements
Hero, product                     Setting (film set, commercial set)
Messages                          Medium
Symbol, Image, Icon               Mythology, Narrative, Story Genres
Top Internet words                Middle and Lower Internet words
Reflective of current culture     Predictive of future trends
Linear rise ascendance            Simultaneous rise ascendance
Follow context words on the Net   Precede content words on the Net
Product                           Brand
Masculine, consciousness          Feminine, unconsciousness
External events & annual cycles   Internal events & natural cycles
Extroversion (Thinking)           Introversion (Feeling)
The oppositions can be shown in a few ways. First, they can be seen (below) in a cluster manner as context and content words orbiting around products.
Figure 5: Content & Context Word Clusters
In the illustration above, the Product is represented by the small green circle (P). Content words are objective, describing product features such as shape, size and color. These words have a close physical relationship to products (P); in effect, they form the clothing of products. Context words are subjective, providing containers for products such as space, place and time. Both content words and context words surround the actual product (P).
They can also be seen in a depth manner as top ranked or lower ranked words. For example, in the illustration below, the yellow box represents top ranked content words close to leading products in culture, while the blue box represents lower ranked words closer to the context of leading products in culture.
Figure 6: Content & Context Word Depth (Rankings)
However one views these words, it is important to see the dichotomies of content and context words and their attachment to products of culture.
(Entertainment & Story Genres)
In the same manner that related brand words cluster around products, so also do related words cluster around narrative story genres. Once word clusters for particular genres can be isolated, their rankings can then be tracked and trends of narrative interest predicted.
Unlike the variations in products and brands, the number of entertainment genres is relatively fixed. Historically, they have derived from the major modes of Greek drama in tragedy, comedy, satire and romance. Today, historical modes of drama find expression in film and novel story genres such as adventure, western, romance and comedy.
As one might suspect, words relating to narrative are found much more in the outer contexts surrounding products than in the inner contents orbiting products. Joseph Conrad might have had narratives in mind when he wrote in Heart of Darkness: "The meaning of an episode was not inside like a kernel but outside in the unseen, enveloping the tale which could only bring it out as a glow brings out a haze."
Products that reach leadership positions envelop themselves in powerful narrative context rather than producing "kernels" of content. A bad product cast into the contextual words of a good story does a lot better than a great product, with all its content words, cast into the context of a bad story.
As with context words generally and the collective forces of culture, there is reason to suspect that narrative words reside in the lower word rankings rather than the top rankings.
Closer Offline & Online Product Clusters
At the height of dot.com mania, there existed a great digital divide between offline "bricks and mortar" business models and online virtual business models. Despite attempts by many key players to suggest the two possessed an underlying synergy, the truth was that the two worlds of business were never able to work together in any productive way.
A little-noticed result of the dot.com crash was the movement of the real and virtual business worlds closer together. The collapse of Internet business models and advertising revenue made the Internet increasingly a tool for promoting offline products. Revenue came from marketing offline products, as sites could no longer depend on site-generated advertising revenue.
The failure of the advertising revenue model is emphasized in a joint research study from McKinsey and Jupiter Media Metrix reported in The McKinsey Quarterly for May 2002. The report, "Can Broadband Save Internet Media," argues that even if broadband comes into wide use, online advertising will never be able to support sites alone. One of the key new business models suggested for sites is marketing offline entertainment.
In this emerging scenario, it is likely that offline entertainment products will have a closer and closer connection to the Internet. The promotional model of movie Web sites will likely spread to other areas.
The closer connection between offline products and the online advertising of these products suggests a new connection between online and offline words. In this respect, online words will offer an increasingly important reflection of real-world products.
Some forms of prediction based on word rankings are fairly obvious and in common use today. One example is tracking word rankings for Hollywood films before release, along with searches for the Web sites of these films. Other large-scale projects, like theme parks, might use this method to predict interest before opening.
Other areas of prediction, though, are much more intuitive and subtle and have yet to see practical application. We offer speculation based on previous concepts we've discussed such as depth, ascendance, cycles and clusters.
As the old method of pre-screening films for test audiences declines, Hollywood relies more and more on Web sites to predict interest in films before they are released. Searched words relating to upcoming movie titles are therefore an important aspect of predicting the popularity of a film.
A film's sales potential relative to other films opening at the same time can be estimated by comparison of word rankings and Web site hits of the film with its opening date competitors. If the predicted competition from another film is too stiff, the release date might be changed to avoid head on competition and improve sales.
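One way to make this comparison concrete, assumed here, is each film's share of the pre-release search volume among films opening on the same date. The titles, counts and the 25% threshold are hypothetical.

```python
def search_share(openings, title):
    """Fraction of pre-release searches captured by `title` among same-date openings."""
    total = sum(openings.values())
    return openings[title] / total if total else 0.0


def should_consider_moving(openings, title, min_share=0.25):
    """Flag a release date where the film's share of search interest looks too small."""
    return search_share(openings, title) < min_share


# Hypothetical weekly searches for three films opening on the same date.
openings = {"Film A": 120_000, "Film B": 360_000, "Film C": 120_000}
print(search_share(openings, "Film A"))            # 0.2
print(should_consider_moving(openings, "Film A"))  # True
```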
In addition to film sites, other pre-product Web sites have great potential. Some of the areas these might be related to are commercial and residential real estate developments, theme parks and urban developments. In this manner, large projects may garner a community of interest long before they formally open their "doors" to the public.
Not only can such sites generate tremendous advance publicity, they can also serve as market research sites, garnering important information about the target markets for these large projects. Problems might be identified and fine-tuned before project completion dates.
While words associated with upcoming films have a practical value, trailer advertising in movie theaters drives much of the word rankings related to upcoming films. Films that are able to preview advance trailers (and their Web sites) with blockbuster films generate the greatest amount of word searches centered on their titles.
In effect, while one might be able to predict a film's success from the number of word searches before it opens, it is more accurate to say that one is able to predict the number of word searches based on the popularity of the film the trailer runs with. Advertising exposure precedes word search popularity.
Far more predictive value lies in the concepts we've discussed, such as ranking depth, ascendance, word cycles and product clusters. These relate less to advertising exposure and marketing dollars than to more subtle factors centered on collective psychology and symbolism.
Word prediction extends the concepts we have discussed in previous sections that lower ranked words are expressive of the internal concerns of collective culture while top ranked words are reflective of the external events and things of popular culture. As we have suggested, lower ranked words are subjective context words while top ranked words are objective content words.
The illustration below represents these predictive concepts in a simplified form. Top ranked content words are represented by the red dot while lower ranked context words are represented by the orange dot.
Figure 7: Content Predicted By Context
The red dot simply reflects the external present popular culture (PC) while the orange dot expresses internal collective psychology and the content of future popular culture.
A few initial hypotheses are suggested below for this exciting new area of cultural prediction based on word search.
- Context Precedes Content
The context of today is the content of tomorrow. As we suggest, contextual words reside in the middle and lower levels of word rankings rather than the top word rankings which consist of content words.
We argue that narrative and words associated with place, space and time are contextual words. As we note, they are words in the outer orbits of products that comprise the overall brand or the settings for products. The collective psychology of a particular culture first finds a context and then creates content in the form of products, people or events to inhabit its contextual environment.
Using our earlier example of a rugged off-road vehicle, collective culture would first show an interest in the background setting for this type of vehicle before showing an interest in the vehicle itself. Contextual words would most likely center on non-urban places, such as the desert suggestive of the setting for westerns, as well as on western genre narratives.
In effect, culture's interest in particular contextual words creates particular background environments before products are placed within these environments. Context precedes content and contextual words precede content words.
- Ascendance Over Rank
The synchronic search for particular words and their resulting rise in ascendance has a closer relationship to collective psychology in culture than top word rankings which simply reflect leading things of culture.
Collective psychology creates the context of culture before content is placed within this context.
- Lower Rank Over Higher Rank
Words at certain levels of the ranking archaeology indicate greater predictive value than words at other levels. In a general sense, we argue that top ranked words reflect culture while lower ranked words predict culture.
- Internal Cycles Over External Cycles
Particular levels of word rankings (for the purpose of example, say words ranked in the level from #700 to #400) might indicate greater internal cyclic activity than words at higher levels which reflect greater external cyclic activity. Top ranked words are subject to cycles of set annual events such as national holidays, sporting events and governmental deadlines and elections. Levels of lower ranked words are more subject to natural, biological and psychological cycles and to collective moods, themes, attitudes and narratives.
Organization of Information
While directories and portals have been categorizing and grouping information in their databases for quite a while, the use of mathematical algorithms to carry out this grouping function is a new concept.
Google's News Search™ (http://news.google.com/) presents information culled from many of the world's news sources collected over the previous week. With continuous updates throughout the day, users are able to keep up to date with what's happening now and learn about the stories that led to the most recent developments.
What's different about Google's News Search is the unique grouping technology Google has developed to automatically put related stories together in the same search result. This makes it easy to quickly scan the headlines while providing the option of reading multiple accounts of a story from different news sources.
The headlines that appear on Google's homepage are selected entirely by a mathematical algorithm, based on how and where the stories appear elsewhere on the Web. There are no human editors at Google selecting or grouping the headlines and no individual decides which stories get top placement.
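The text does not describe Google's grouping algorithm, so the sketch below substitutes a much simpler technique: greedy clustering of headlines by word overlap (Jaccard similarity). The stopword list, headlines and 0.3 threshold are arbitrary assumptions.

```python
import re

STOPWORDS = {"a", "an", "the", "by", "of", "in", "on", "to", "after"}


def tokens(headline):
    """Lower-cased content words in a headline."""
    return set(re.findall(r"[a-z]+", headline.lower())) - STOPWORDS


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_headlines(headlines, threshold=0.3):
    """Put each headline in the first group whose seed headline overlaps enough."""
    groups = []  # list of (seed token set, member headlines)
    for headline in headlines:
        words = tokens(headline)
        for seed, members in groups:
            if jaccard(seed, words) >= threshold:
                members.append(headline)
                break
        else:  # no group matched: start a new one seeded by this headline
            groups.append((words, [headline]))
    return [members for _, members in groups]


headlines = [
    "Central bank raises interest rates",
    "Storm floods coastal towns",
    "Interest rates raised by central bank",
    "Coastal towns flooded after storm",
]
for group in group_headlines(headlines):
    print(group)
```

This prints two groups, one per story. Word overlap misses paraphrases that share few words ("floods" versus "flooded"), which is one reason production systems use richer similarity measures.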
On the Internet, words have always dominated over images and sounds, but there has been a growing number of images and sounds on the Internet. Like words, images and sounds are tied to specific content, and this content can be indexed and searched for on the Internet.
To meet the needs of this exciting new area, Google has developed Google's Image Search™. It is the most comprehensive image search on the Web, with more than 330 million images indexed and available for viewing. The search is conducted by selecting the Image Search feature on the Google search engine or by visiting the image search site at http://images.google.com.
To create their database of images, Google analyzes the text on the page adjacent to the image, the image caption and dozens of other factors to determine the image content. Google also uses sophisticated algorithms to remove duplicates and ensure that the highest quality images are presented first in its results.
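A toy version of the two steps described, assuming an inverted index built only from caption and nearby page text, with exact duplicates removed by hashing image bytes. This is not Google's method, whose other ranking factors are not described here; the image records are hypothetical, and real duplicate detection would also need to catch resized or re-encoded copies.

```python
import hashlib
import re
from collections import defaultdict


def build_image_index(images):
    """images: dicts with 'url', 'data' (bytes), 'caption', 'nearby_text'."""
    index = defaultdict(set)
    seen = set()
    for image in images:
        digest = hashlib.sha256(image["data"]).hexdigest()
        if digest in seen:  # skip exact byte-for-byte duplicates
            continue
        seen.add(digest)
        text = f"{image['caption']} {image['nearby_text']}".lower()
        for word in re.findall(r"[a-z]+", text):
            index[word].add(image["url"])
    return index


images = [
    {"url": "a.jpg", "data": b"\x01\x02", "caption": "Desert sunset", "nearby_text": "Arizona trip"},
    {"url": "b.jpg", "data": b"\x01\x02", "caption": "Sunset copy", "nearby_text": ""},
    {"url": "c.jpg", "data": b"\x03", "caption": "Off-road truck", "nearby_text": "desert trail"},
]
index = build_image_index(images)
print(sorted(index["sunset"]))  # ['a.jpg']
print(sorted(index["desert"]))  # ['a.jpg', 'c.jpg']
```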
When a query is entered in the image search box and the "Search" button is clicked, a number of thumbnail images appear in the results box. The user can then click the thumbnail to see a larger version of the image as well as the Web page on which the image is located.
One of the most exciting imaging possibilities is not necessarily technology from Google but rather technology that allows images to be digitized. The revolution in digital photography and film is certainly indicative of this trend. One of the key new technologies is Foveon's color image sensor chip, which provides a tremendous leap in clarity for images and film.
Google matters to the new Foveon technology for two reasons. First, technologies like Foveon's consume a great deal of storage space, and Google has a lot of storage space. Second, Google holds the key to communicating images through its image search technology.
Certainly one of the broader lessons from the Napster era was that the Internet is increasingly about sounds rather than just words. While most sounds on the Internet might be related to music and songs today, there are many other types of sounds that will find their way to the Internet and the search engines of the future.
As Forbes Publisher Rich Karlgaard notes in the April 2002 Forbes, "Yesterday's text hunt for a New York Times article becomes tomorrow's sound search for an ad jingle."
Sensory Product Clusters
As we have observed, a number of words are clustered around products. One could also say that a number of images and sounds are clustered around products.
We suggested that images associated with products reside in what we termed outer brand orbits or contextual words. The same is true for sounds associated with products. These context words of place, space and time suggest qualitative factors associated with products. They are the background settings of commercials and advertisements, the most effective environments for the advertisements from Madison Avenue.
Soon, context words will turn into images and sounds. In this sense, the future product clusters on the Internet will have a number of images and sounds revolving around them as well as words.
As images and sounds move onto the Net and are indexed and subject to search, there will arise rankings of leading images and leading sounds for particular moments in time.
The great search engines of the future will provide mathematical models that locate underlying correspondences among sounds, words and images. Today's leading words will be tomorrow's leading images and sounds.
Beyond Sites and Words
Apart from search engine data, there are a couple of other rich sources of information on words, those modern electric symbols.
For instance, Usenet and mailing lists are potential "mother lodes" of information about electric words. Here words are placed in an interactive context, and analysis might proceed much like traditional content analysis of print.
Interestingly enough, Google acquired the Deja News Usenet database not long ago (http://groups.google.com/). The conversations in these groups offer a different slant on the Zeitgeist from the one provided by searches alone. So far, to our knowledge, no one is capturing the ascenders and descenders of Usenet topics or words. It might be valuable to look at measures such as word counts, subject areas and changes in the traffic levels of particular newsgroups. It might also be valuable to index the search words people use to reach particular Usenet groups.
Mailing lists provide yet another view of the electric Zeitgeist of words. Many of these are private and unsearchable, but many have public archives. Both types can be found at http://groups.yahoo.com/. Again, no one (to our knowledge) is gathering statistics on the content of these lists, but doing so would provide another interesting source of data, and in the future it will be a natural one.
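Capturing "ascenders and descenders" could start with something as simple as comparing how often each word appears in two periods of postings. The following is a minimal sketch, assuming the posts for each period have already been collected; the function names and sample posts are hypothetical, and change is measured in relative frequency so that a busier period does not inflate every word.

```python
import re
from collections import Counter


def word_counts(posts: list[str]) -> Counter:
    """Count words across a batch of posts (a crude tokenizer, assumed)."""
    counts: Counter = Counter()
    for post in posts:
        counts.update(re.findall(r"[a-z']+", post.lower()))
    return counts


def ascenders_descenders(before: list[str], after: list[str], top: int = 3):
    """Rank words by change in relative frequency between two periods.

    Returns (ascenders, descenders), each a list of (word, change) pairs,
    largest rise first and largest fall first respectively.
    """
    b, a = word_counts(before), word_counts(after)
    b_total = sum(b.values()) or 1
    a_total = sum(a.values()) or 1
    changes = {w: a[w] / a_total - b[w] / b_total for w in set(a) | set(b)}
    ranked = sorted(changes.items(), key=lambda kv: kv[1], reverse=True)
    ascenders = [kv for kv in ranked if kv[1] > 0][:top]
    descenders = [kv for kv in reversed(ranked) if kv[1] < 0][:top]
    return ascenders, descenders


up, down = ascenders_descenders(
    before=["napster lawsuit napster", "dot com stocks"],
    after=["enron scandal enron", "enron stocks"],
)
print(up)
print(down)
```

Run weekly over a newsgroup archive, the same comparison would yield a running list of rising and falling words; the same idea applies to traffic levels per newsgroup by counting posts instead of words.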
Beyond Web Sites
In a sense, one might visualize the entire Internet as a vast collection of electronic words in various segmented states. Seen this way, the potential for analyzing electronic words has barely been tapped, even by the greatest search engines today.
As the dark blue arrow in Figure 8 illustrates, today's great search engines explore only the yellow and light blue areas, not the orange and green boxes to their right. One day, as the red arrow illustrates, word search may extend from the yellow box on the left to the green box on the right. With Google's acquisition of the Deja News database, practical Usenet search might not be far away.
Figure 8: Beyond Web Site Words
Progress will depend not only on the technology of great search engines like Google but also on the evolution and standardization of an Internet language. In simplified terms, the current change is a shift from HTML to XML. Understanding these new languages of the Internet will allow leading search engines to take in well-structured data and then index it.
As the red arrow in Figure 8 grows to index new sectors of words, it is important that Internet technology leaders observe the Internet actions of their major users. Technology needs to stay closely attuned to Internet users and their search needs in the growing universe of Internet words. The question of what technology in this area can do needs to expand to what it should do, and current users need to interact with Internet technologists.
For example, one day it may be possible to analyze the entire content of a person's messages to Usenet groups and mailing lists and match that person with others who write about similar things. New communities could be formed around content analysis of words on the Internet.
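One common way to measure how similar two people's writing is, and a reasonable stand-in here, is the cosine similarity of their word-count profiles. The sketch below assumes each person's postings are already gathered in a dictionary; the user names, sample messages and function names are hypothetical, and a real system would at least remove common words such as "and" before comparing.

```python
import math
import re
from collections import Counter


def profile(messages: list[str]) -> Counter:
    """Word-count profile of everything one person has posted."""
    counts: Counter = Counter()
    for message in messages:
        counts.update(re.findall(r"[a-z']+", message.lower()))
    return counts


def cosine(p: Counter, q: Counter) -> float:
    """Cosine similarity of two word-count profiles (0 = nothing shared)."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def best_matches(user: str, posts: dict[str, list[str]], k: int = 1) -> list[str]:
    """Return the k other users whose postings most resemble `user`'s."""
    mine = profile(posts[user])
    others = [(cosine(mine, profile(msgs)), name) for name, msgs in posts.items() if name != user]
    others.sort(reverse=True)
    return [name for _, name in others[:k]]


posts = {
    "ann": ["jazz records and jazz clubs"],
    "bob": ["jazz clubs in chicago"],
    "cal": ["tax law and tax forms"],
}
print(best_matches("ann", posts))
```

Here "ann" is matched with "bob" because both write about jazz clubs; "cal" shares only the word "and", which is exactly the kind of noise a stop-word list would remove.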
The indexing and ranking of Internet words outside of current Web site searches will greatly expand the searchable cyberspace universe.
Figure 9: Beyond Words
But as words from new areas (like Usenet groups and mailing lists) are indexed, search will also go deeper to include images and sounds. This three-dimensional growth of Internet search is shown in Figure 9. The dark blue arrow represents current word search, and the red arrow represents future word search. As search expands along the red arrow, it will also go deeper, as indicated by the orange arrow.
Propaganda & Privacy
Of course, many people today view this prospect with great trepidation because they see it as an invasion of their privacy rights. But it might be time to ask whether "invasion of privacy rights" is one of those political propaganda phrases neatly manufactured by the powers that be to take our attention away from important things. As with many things today, it is more valuable to consider where such a phrase comes from (its contextual source) than what it consists of (its content). As we suggested earlier, context, or source, precedes content.
In many ways, the right-of-privacy argument (advanced by top American leaders and institutions) attempts to be the "knee jerk" phrase for those who would like to keep America private rather than public. Like any good "knee jerk" slogan, it requires little thought; it asks only that you buy into its "brand."
These are the Americans who perpetuate the "Bowling Alone" thesis of Harvard's Robert Putnam, the Americans who peek out from their windows at the violence on the streets below.
It is a grand political question for all Americans to consider today: is the privacy argument, as applied to the Internet, really an attempt to preclude communication among communities of interest on the Internet?
The Promise of Research
"After all is said and done, a hell of a lot more is said than done."
- H.L. Mencken
Most of the material in this report is what one might term informed speculation. For the ideas and theories discussed within it to be tested, great search engines such as Google need to release deeper word rankings on a periodic basis. They also need to allow their results to be segmented, at least by nation, so that specific cultures can be studied.
If this happens, there are excellent prospects for the development of a new science based on the study of Internet words and eventually Internet images and sounds. The validity of the theories of this new science will increase as more and more people have access to the Internet and use it to search for what interests them most in the world at particular times.
Two key areas of research will be market research and academic research. The new generation of Internet explorers will apply much of their efforts in these two areas.
In an increasingly electronic world, cultural analysis, market research and opinion polling still center on non-electronic techniques that are subject to various interpretations.
Some of the specific beneficiaries of Internet word market research are the following:
Marketing Research Firms
Public Relations Firms
Public Opinion & Polling
Entertainment Companies (Film, TV, Music)
Internet word research involves a number of academic disciplines. Some of these disciplines include the following:
Particle Swarm Intelligence
Forgotten Science of Observation
Overall, Internet word analysis has the potential to return scientific research to the old, seemingly forgotten concept of observation, rather than the emotional speculation that fuels so much of it amid the raging culture wars of contemporary American politics. It offers the possibility of observing not just a few paid people behind the glass of a Madison Avenue focus group, but millions.
Ultimately, it is difficult for one to logically deny the argument that observing what a large part of culture does through electronic actions offers a far more accurate picture than asking a selected few of its members what they do through questions.
As cultural wars increase, cultural analysis becomes filled with greater political spin. In this atmosphere it becomes crucial for those who navigate the future course of America to observe what the majority of the culture does rather than listen to what a selected few people say it does.
A number of Americans believe that a new electronic democracy is emerging parallel to traditional democracy.
In traditional democracy, American political leaders are elected to leadership positions every two or four years by the formal votes of, sadly, around 30 percent of the population. In this new electric democracy, American products, media and Internet words are elected to leadership positions twenty-four hours a day, seven days a week by the informal "votes" of well over 30 percent of the population. As Internet penetration and the tracking of electronic activities grow, daily electronic "votes" might some day approach 100 percent of the American population.
In essence, the freedom of all Americans to access the Internet is the modern symbol of the American Constitution's right to free speech. It is a right far more important than the right to assemble around a building holding banners or to march together in a group, however large. As access to the Internet spreads to all levels of American society, electronic democracy will grow and play an important part in defining new American political parties.
Today, American political leaders move farther away from representing the current interests and needs of the nation, expressed through her actions. At the same time, the leaders of electric democracy (such as words) move closer to representing the immediate interests of the majority of Americans.
Leading Internet words may very well be the new electric symbols, barometers (or perhaps lightning rods) for the stormy forces of this new electronic democracy.
The Promise of the Internet
While Google (and a few other large search engines) has initiated some of the concepts in this report, most still remain as potential rather than reality.
If this potential is translated into reality, a new era of cultural insight will open up.
This is the beginning of a story about this new era. Like any beginning, it's filled with questions rather than answers.
It is a story about the potential of the Internet and its original promise, a promise yet to be realized.
About the Author
John Fraim has a BA in History from UCLA and a JD from Loyola Law School (Los Angeles). After a career as a marketing executive he founded The GreatHouse Company in 1995, a publishing, consulting and research firm with a focus in the area of the symbolism of popular culture. The company's online presence is www.symbolism.org.
His articles and reviews have been published in a number of leading publications including Business 2.0, The Industry Standard, Adbusters, The Journal of Marketing, The Journal of Psychohistory, Spark OnLine, Media & Culture Journal, The Jung Page and Psychological Perspectives.
His book Spirit Catcher won the 1997 Small Press Award for Best Biography. Three of his works on symbolism are expected to be published in the near future: Symbolism of Place, Symbolism of Popular Culture and Battle of Symbols. The first book explores the symbolism of context in stories, mythology and drama. The second applies story concepts to American popular culture. The third, Battle of Symbols, applies the symbolism of American popular culture to a global context. Parts of these books are currently posted on www.symbolism.org.
Thanks to a number of others who have offered advice and comments on this project. First of all, to Raymond Nasr at Google for helpful comments, direction and advice during the initial phases of the project.
And (of course) very special thanks to the Google search engine itself. Most of the research for this project came from searches on Google alone. In addition to searching for information, Google also proved invaluable for locating people, many of whom eventually became enthusiastic advisors on the project. One of the things this demonstrates (again) is that information has the power to draw together communities of interest. Google is often the first step in this process, yet few acknowledge this "downstream" power of Google to attract, in addition to its familiar power to find.
Special thanks to a number of people in this Internet "community" attracted to the subject of this report. Your comments and advice are greatly appreciated. Thanks first to David Crystal, author of The Cambridge Encyclopedia of Language and professor at the University of Wales; it has been tremendously helpful to have one of the world's leading linguists comment on and act as an advisor to this project. Thanks also to Allan Metcalf - Executive Secretary, American Dialect Society and co-author of America In So Many Words; John Landry - Associate Editor, Harvard Business Review; Neil Howe - Author of The Fourth Turning; William Sulis - Canadian psychiatrist and a leading researcher in complexity theory and collective behavior; James Kennedy - Author of Swarm Intelligence; Stuart Sigman - Dean of Communications, Emerson College; Nelson Thall - Creator, McLuhan List, Canada; Barry Wellman - Professor of Sociology, University of Toronto; Robert Delamar - Editor, Spark OnLine; Howard Bloom - Professor, New York University; Douglas Rushkoff - Author and Internet commentator; Charles Grantham - President, Institute for Distributed Work and author of The Future of Work; Peter Forbes Crossman - Editor and Sparkplug, Cyber City Consortium; Ted Goertzel - Professor of Sociology, Rutgers University, Camden, N.J.; Thomas Cermak - Media Theorist, Canada; Arthur Berger - Professor of Communications, University of San Francisco; Stuart Fischoff - Professor of Communications, California State University, Los Angeles; Mark Baker - Professor of Linguistics, Rutgers University and author of The Atoms of Language: The Mind's Hidden Rules of Grammar; and Edward McQuarrie - Professor of Management, Santa Clara University.
David A. Aaker and Erich Joachimsthaler, 2000. Brand Leadership. New York: Free Press.
Walter Truett Anderson, 2001. All Connected Now: Life in the First Global Civilization. Boulder, Colo.: Westview Press.
David K. Barnhart and Allan A. Metcalf, 1997. America in So Many Words: Words That Have Shaped America. Boston: Houghton Mifflin.
Morris Berman, 2000. The Twilight of American Culture. New York: W.W. Norton.
Tim Berners-Lee with Mark Fischetti, 1999. Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web by its Inventor. San Francisco: HarperSanFrancisco.
Howard K. Bloom, 2000. Global Brain: The Evolution of Mass Mind From the Big Bang to the 21st Century. New York: Wiley.
Daniel J. Boorstin, 1992. The Image: A Guide to Pseudo-Events in America. New York: Vintage Books.
Joseph Campbell, 1973. The Hero With A Thousand Faces. Princeton, N.J.: Princeton University Press.
Elias Canetti, 1998. Crowds and Power. New York: Noonday Press.
James W. Carey, 1992. Communication as Culture: Essays on Media and Society. New York: Routledge.
Soumen Chakrabarti, Byron Dom, S. Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins, Jon M. Kleinberg, and David Gibson, 1999. "Hypersearching the Web," Scientific American (June).
Tom Chetwynd, 1993. Dictionary of Symbols. San Francisco: Aquarian-Thorsons.
Jean Chevalier and Alain Gheerbrant, 1994. A Dictionary of Symbols. Translated by John Buchanan-Brown. Cambridge, Mass.: Blackwell.
Juan Eduardo Cirlot, 1971. A Dictionary of Symbols. New York: Philosophical Library.
Randall Collins, 1998. The Sociology of Philosophies: A Global Theory of Intellectual Change. Cambridge, Mass.: Belknap Press of Harvard University Press.
Jean C. Cooper, 1978. An Illustrated Encyclopaedia of Traditional Symbols. London: Thames and Hudson.
David Crystal, 2001. Language And The Internet. Cambridge: Cambridge University Press.
David Crystal, 1997. The Cambridge Encyclopedia of the English Language. Cambridge: Cambridge University Press.
Edward R. Dewey with Og Mandino, 1971. Cycles: The Mysterious Forces That Trigger Events. New York: Hawthorne Books.
Mary Douglas, 1996. Natural Symbols: Explorations in Cosmology. London: Routledge.
Mircea Eliade, 1991. Images and Symbols: Studies in Religious Symbolism. Princeton, N.J.: Princeton University Press.
Mircea Eliade, 1985. Cosmos and History: The Myth of the Eternal Return. New York: Garland.
Erik H. Erikson, 1997. The Life Cycle Completed. New York: W.W. Norton.
Stuart Ewen, 1976. Captains of Consciousness: Advertising and the Social Roots of the Consumer Culture. New York: McGraw-Hill.
John Fraim, 1998. The Symbolism of Popular Culture. Unpublished; sections on www.symbolism.org.
John Fraim, 1995. The Symbolism of Place. Unpublished; sections on www.symbolism.org.
Thomas Frank, 2000. One Market Under God: Extreme Capitalism, Market Populism, and the End of Economic Democracy. New York: Doubleday.
Sir James Frazer, 1964. The Golden Bough. New York: Mentor.
Sigmund Freud, 1989. Group Psychology and the Analysis of the Ego. New York: W.W. Norton.
Sigmund Freud, 1975. The Future of an Illusion. New York: W.W. Norton.
Sigmund Freud, 1963. Character and Culture. New York: Collier.
Erich Fromm, 1995. Escape From Freedom. New York: Henry Holt.
Northrop Frye, 2000. Anatomy of Criticism: Four Essays. Princeton, N.J.: Princeton University Press.
Malcolm Gladwell, 2000. The Tipping Point: How Little Things Can Make a Big Difference. New York: Little, Brown.
Samuel Hayakawa, 1991. Language in Thought And Action. 5th Edition. New York: Harvest Books.
Carl Jung, 1990. The Archetypes of the Collective Unconscious. Princeton, N.J.: Princeton University Press.
Carl Jung, 1977. Mysterium Coniunctionis: An Inquiry into the Separation and Synthesis of Psychic Opposites in Alchemy. Princeton, N.J.: Princeton University Press.
Carl Jung, 1978. Aion: Researches into the Phenomenology of the Self. Princeton, N.J.: Princeton University Press.
Carl Jung, 1976. Symbols of Transformation: An Analysis of the Prelude to a Case of Schizophrenia. Princeton, N.J.: Princeton University Press.
Carl Jung, 1973. Synchronicity: An Acausal Connecting Principle. Princeton, N.J.: Princeton University Press.
Otto F. Kernberg, 1998. Ideology, Conflict, and Leadership in Groups and Organizations. New Haven: Yale University Press.
Roland Marchand, 1985. Advertising The American Dream: Making Way For Modernity, 1920-1940. Berkeley: University of California Press.
Eric McLuhan, 1998. Electric Language: Understanding the Message. New York: St. Martin's Press.
Marshall McLuhan, 1999. The Medium and the Light: Reflections on Religion. Toronto: Stoddart.
Marshall McLuhan, 1994. Understanding Media: The Extensions of Man. Cambridge, Mass.: MIT Press.
Marshall McLuhan, 1989. The Global Village: Transformations in World Life and Media in the 21st Century. New York: Oxford University Press.
Joshua Meyrowitz, 1985. No Sense of Place: The Impact of Electronic Media on Social Behavior. New York: Oxford University Press.
Erich Neumann, 1995. The Origins and History of Consciousness. Princeton, N.J.: Princeton University Press.
Jan Willem Schulte Nordholt, 1995. The Myth of the West: America as the Last Empire. Grand Rapids, Mich.: Eerdmans.
Joseph S. Nye, 2002. The Paradox of American Power: Why the World's Only Superpower Can't Go It Alone. New York: Oxford University Press.
Vance Packard, 1985. The Hidden Persuaders. New York: Pocket Books.
Peter G. Peterson, 1999. Gray Dawn: How the Coming Age Wave Will Transform America - and the World. New York: Times Books.
Sal Randazzo, 1993. Mythmaking on Madison Avenue: How Advertisers Apply The Power of Myth & Symbolism to Create Leadership Brands. Chicago: Probus.
Otto Rank, 1959. The Myth of the Birth of the Hero, and Other Writings. New York: Vintage.
David Riesman with Nathan Glazer and Reuel Denney, 1969. The Lonely Crowd: A Study of the Changing American Character. New Haven: Yale University Press.
Ferdinand de Saussure, 1966. Course in General Linguistics. New York: McGraw-Hill.
Thomas Schatz, 1981. Hollywood Genres: Formulas, Filmmaking, and the Studio System. Philadelphia: Temple University Press.
Jeffrey Scheuer, 1999. The Sound Bite Society: Television and the American Mind. New York: Four Walls Eight Windows.
Elaine Showalter, 1997. Hystories: Hysterical Epidemics and Modern Culture. New York: Columbia University Press.
Huston Smith, 2001. Why Religion Matters: The Fate of the Human Spirit in an Age of Disbelief. New York: HarperCollins.
Barbara Maria Stafford, 2001. Visual Analogy: Consciousness As The Art of Connecting. Cambridge, Mass.: MIT Press.
William Strauss and Neil Howe, 1997. The Fourth Turning: An American Prophecy. New York: Broadway Books.
Donald F. Theall, 2001. The Virtual Marshall McLuhan. Montreal: McGill-Queen's University Press.
Jack Trout, 2000. Differentiate Or Die: Survival in our Era of Killer Competition. New York: Wiley.
Frederick Jackson Turner, 1996. The Frontier in American History. New York: Dover.
Joseph Turow, 1997. Breaking Up America: Advertisers and the New Media World. Chicago: University of Chicago Press.
James B. Twitchell, 1999. Lead Us Into Temptation: The Triumph of American Materialism. New York: Columbia University Press.
Harold L. Vogel, 1998. Entertainment Industry Economics: A Guide for Financial Analysis. 4th Edition. Cambridge: Cambridge University Press.
Michael J. Weiss, 2000. The Clustered World: How We Live, What We Buy, and What It All Means About Who We Are. Boston: Little, Brown.
Michael J. Wolf, 1999. The Entertainment Economy: How Mega-media Forces Are Transforming Our Lives. New York: Times Books.
Appendix A: Related Subject Areas
Particle Swarm Intelligence
Collective, Group Psychology
Appendix B: Contacts/Advisors
Allan Metcalf - Executive Secretary, American Dialect Society and co-author of America In So Many Words.
John Landry - Associate Editor, Harvard Business Review.
Neil Howe - author of The Fourth Turning.
William Sulis - Canadian psychiatrist and a leading researcher in complexity theory and collective behavior.
Robin Lakoff - Professor of Linguistics, University of California, Berkeley and author of The Language War.
James Kennedy - author of Swarm Intelligence.
George Colony - CEO, Forrester Research, Cambridge.
William MacElroy - President, Modalis Research, San Francisco.
Stuart Sigman - Dean of Communications, Emerson College.
Nelson Thall - Creator, McLuhan List, Canada.
Donald Theall - author of The Virtual Marshall McLuhan.
Barry Wellman - Professor of Sociology, University of Toronto.
Clay Shirky - Internet author and commentator.
Andy Oram - O'Reilly & Associates and Editor, Peer to Peer.
David Sims - Editor, O'Reilly & Associates.
Eric McLuhan - Professor, University of Toronto.
Robert Delamar - Editor, Spark OnLine.
Howard Bloom - Professor, New York University.
Joshua Meyrowitz - Professor of Communications, University of New Hampshire.
Paul Duguid - Professor, University of California, Berkeley and co-author of The Social Life of Information.
Douglas Rushkoff - author and Internet commentator.
Charles Grantham - President, Institute for Distributed Work, and author of The Future of Work.
Peter Forbes Crossman - Editor, Spark Plug for the North Bay Cybercity Consortium.
Ted Goertzel - Professor of Sociology, Rutgers University, Camden, N.J.
Thomas Cermak - media theorist, Canada.
Arthur Berger - Professor of Communications, University of San Francisco.
Stuart Fischoff - Professor of Communications, California State University, Los Angeles.
Frederick J. Newmeyer - President, Linguistic Society of America, and Professor of Linguistics, University of Washington.
Winfried Kurth - Chair for Practical Computer Science & Graphics Systems, Department of Computer Science, Technical University at Cottbus, Germany.
Linguistic Society of America - http://www.lsadc.org/.
American Dialect Society - http://www.americandialect.org.
Brian Joseph - Editor, Language, Journal of the LSA, Ohio State University.
Dennis R. Preston - President, American Dialect Society, and Professor of Linguistics, Michigan State University.
Connie Eble - Editor, American Speech, and Professor of English, University of North Carolina at Chapel Hill.
Mark Baker - Professor of Linguistics, Rutgers University and author of The Atoms of Language: The Mind's Hidden Rules of Grammar.
Edward McQuarrie - Professor of Management, Santa Clara University.
David Crystal - Editor, Cambridge Encyclopedia of Language, Professor of Language, University of Wales, and author of Language And The Internet.
Appendix C: Major Search Engines
By SearchEngineWatch.Com Staff
(Updated: Jan. 22, 2002)
AllTheWeb.com (FAST Search)
AllTheWeb.com (also known as FAST Search) consistently has one of the largest indexes of the Web. FAST also offers large multimedia and mobile/wireless Web indexes, available from its site. The site, also known as AllTheWeb.com, is a showcase for FAST's search technologies. FAST's results are provided to numerous portals, including those run by Terra Lycos. FAST Search launched in May 1999.
AltaVista is one of the oldest crawler-based search engines on the Web. It has a large index of Web pages and a wide range of power searching commands. It also offers news search, shopping search and multimedia search. AltaVista opened in December 1995. It was owned by Digital, then run by Compaq (which purchased Digital in 1998), then spun off into a separate company that is now controlled by CMGI.
AOL Search allows its members to search across the Web and AOL's own content from one place. The "external" version, listed above, does not list AOL content. The main listings for categories and Web sites come from the Open Directory (see below). Inktomi (see below) also provides crawler-based results, as backup to the directory information.
Ask Jeeves is a human-powered search service that aims to direct you to the exact page that answers your question.
Direct Hit measures what people click on in the search results presented at its own site and at its partner sites, such as HotBot. Sites that get clicked on more than others rise higher in Direct Hit's rankings. Thus, the service dubs itself a "popularity engine." Aside from running its own Web site, Direct Hit provides the main results that appear at HotBot (see below) and is available as an option to searchers at MSN Search. Direct Hit is owned by Ask Jeeves (above). Some Direct Hit information appears at Ask Jeeves.
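Direct Hit's actual formula was not public. As a rough illustration of the "popularity engine" idea, the sketch below re-ranks the results for a query by how often searchers clicked each one relative to how often it was shown. The class name, the add-one smoothing and the sample URLs are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Optional


class ClickPopularity:
    """Toy popularity engine: re-ranks results by smoothed click-through rate."""

    def __init__(self) -> None:
        self.shown = defaultdict(int)    # (query, url) -> times displayed
        self.clicked = defaultdict(int)  # (query, url) -> times clicked

    def record(self, query: str, shown_urls: list[str], clicked_url: Optional[str]) -> None:
        """Log one search: every displayed URL, plus the one clicked (if any)."""
        for url in shown_urls:
            self.shown[(query, url)] += 1
        if clicked_url is not None:
            self.clicked[(query, clicked_url)] += 1

    def rate(self, query: str, url: str) -> float:
        # Add-one smoothing (an assumption) keeps rarely shown results from
        # jumping to the top after a single lucky click.
        return (self.clicked[(query, url)] + 1) / (self.shown[(query, url)] + 2)

    def rerank(self, query: str, urls: list[str]) -> list[str]:
        """Most-clicked first; ties keep their original order (sort is stable)."""
        return sorted(urls, key=lambda u: self.rate(query, u), reverse=True)


engine = ClickPopularity()
results = ["a.com", "b.com", "c.com"]
for _ in range(3):
    engine.record("cars", results, "c.com")
engine.record("cars", results, "a.com")
print(engine.rerank("cars", results))
```

After three clicks on c.com and one on a.com, c.com moves to the top for the query "cars", while an unseen query keeps the original order.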
Google is a top choice for Web searchers. It offers the largest collection of Web pages of any crawler-based search engine. Google makes heavy use of link analysis as a primary way to rank these pages. This can be especially helpful in finding good sites in response to general searches such as "cars" and "travel," because users across the Web have in essence voted for good sites by linking to them. The system works so well that Google has gained widespread praise for its high relevancy. Google provides Web page search results to a variety of partners, including Yahoo and Netscape Search (see below). Google also lets users search images, Usenet discussions and its own version of the Open Directory (see below).
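The "voting by linking" idea is best known through PageRank, the link-analysis method Google's founders Sergey Brin and Lawrence Page described in 1998. The sketch below is a simplified textbook power-iteration version, not Google's production ranking; the damping factor of 0.85 is the value commonly used in descriptions of PageRank, and the function name and toy link graph are hypothetical.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85, iters: int = 50) -> dict[str, float]:
    """Simplified PageRank by power iteration.

    `links` maps each page to the pages it links to. Each page splits its
    score evenly among the pages it links to (its "votes"); a page with no
    outgoing links spreads its score evenly over all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        # Every page keeps a small baseline score (the "random jump" term).
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank


scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(sorted(scores, key=scores.get, reverse=True))
```

Page c, which receives "votes" from both a and b, ends up ranked highest; b, which nobody links to, keeps only the baseline score. The scores always sum to 1.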
In most cases, HotBot's first page of results comes from the Direct Hit service (see above), and then secondary results come from the Inktomi search engine, which is also used by other services. It gets its directory information from the Open Directory project (see below). HotBot launched in May 1996 as Wired Digital's entry into the search engine market. Lycos purchased Wired Digital in October 1998 and continues to run HotBot as a separate search service.
iWon's results come from both Overture & Inktomi. iWon gives away daily, weekly and monthly prizes in a marketing model unique among the major services. It launched in Fall of 1999.
Originally, there was an Inktomi search engine at UC Berkeley. The creators then formed their own company with the same name and created a new Inktomi index, which was first used to power HotBot. Now the Inktomi index also powers several other services. All of them tap into the same index, though results may be slightly different. This is because Inktomi provides ways for its partners to use a common index yet distinguish themselves. There is no way to query the Inktomi index directly, as it is only made available through Inktomi's partners with whatever filters and ranking tweaks they may apply.
LookSmart is a human-compiled directory of Web sites. In addition to being a stand-alone service, LookSmart provides directory results to MSN Search, Excite and many other partners. Inktomi provides LookSmart with search results when a search fails to find a match from among LookSmart's reviews. LookSmart launched independently in October 1996, was backed by Reader's Digest for about a year, and then company executives bought back control of the service.
Lycos started out as a search engine, depending on listings that came from spidering the web. In April 1999, it shifted to a directory model similar to Yahoo. Its main listings come from AllTheWeb.com with some results from the Open Directory project. In October 1998, Lycos acquired the competing HotBot search service, which continues to be run separately.
Microsoft's MSN Search service is a LookSmart-powered directory of Web sites, with secondary results that come from Inktomi. Direct Hit data is also made available.
Netscape Search
Netscape Search's results come primarily from the Open Directory and Netscape's own "Smart Browsing" database, which does an excellent job of listing "official" Web sites. Secondary results come from Google. At the Netscape Netcenter portal site, other search engines are also featured.
The Open Directory uses volunteer editors to catalog the Web. Formerly known as NewHoo, it was launched in June 1998. It was acquired by Netscape in November 1998, and the company pledged that anyone would be able to use information from the directory through an open license arrangement. Netscape itself was the first licensee. Netscape-owner AOL also uses Open Directory information, as do Google and Lycos.
Yahoo is the Web's most popular search service and has a well-deserved reputation for helping people find information easily. The secret to Yahoo's success is human beings. It is the largest human-compiled guide to the Web, employing about 150 editors in an effort to categorize the Web. Yahoo has well over one million sites listed. Yahoo also supplements its results with those from Google. If a search fails to find a match within Yahoo's own listings, then matches from Google are displayed. Google matches also appear after all Yahoo matches have first been shown. Yahoo is the oldest major Web site directory, having launched in late 1994.
Paper received 15 May 2002; revised version received 20 May 2002; accepted 20 May 2002.
Copyright ©2002, First Monday
Electric Symbols: Internet Words And Culture by John Fraim
First Monday, volume 7, number 6 (June 2002),
A Great Cities Initiative of the University of Illinois at Chicago University Library.
© First Monday, 1995-2017. ISSN 1396-0466.