Desperately seeking the consumer: Personalized search engines and the commercial exploitation of user data
First Monday

by Theo Rohle



Abstract
With reference to surveillance studies theory, this paper critically assesses the role of personalized search engines as a mediator between advertisers and users. It first sketches the economic and technical background of online marketing and personalized searches. Then, it engages in an in–depth discussion of two examples of personalized search engines with regard to the data collection process used and the way in which this data is used for advertising purposes. The discussion shows that users’ information needs, as well as their personal data, are subject to a growing pressure in terms of commercial exploitation. Essentially, search engines now fulfill the task of translating information needs into consumption needs.

Contents

1. Introduction
2. The value of user data
3. The development of search engine marketing
4. The personalization of searching
5. Social search
6. Recommender systems
7. Conclusions

 


 

1. Introduction

For those who thought that search engines primarily deal with finding things on the Internet, the year 2007 offered a surprise. Within only a few months, all three major search engine providers announced the acquisition of online advertising companies: Google bought DoubleClick, Yahoo! bought Right Media and Microsoft bought aQuantive. Additionally, the Google CEO, Eric Schmidt, explained in an interview that one should think of Google’s future role “first as an advertising system”, then as a provider of other kinds of services (Vogelstein, 2007). Such a substantial move towards advertising clearly indicates a shift of priorities within the search engine market (Sachoff, 2007). Whereas advertising earlier provided the means for search engines to improve the quality of their results, search engines have now developed into advertising platforms, with the search function acting as their traffic generator.

The introduction of contextual advertising in 1998, a technology that allowed advertisers to connect search terms to their commercial messages, was the first step in the search engines’ development into a “logical tool to connect advertisers to consumers.” [1] However, this technology is only the beginning of an ever–tighter connection between search and advertising, which aims at the ultimate translation of users’ information needs into consumption needs. Search engines have successfully built a “database of intentions” (Battelle, 2005) from their users’ queries. But to gain real insight into the interests and wishes of their users, they need considerably more data. This is where personalized search enters the picture. Personalization is a feature of a new generation of search engines that are about to take over from today’s Web search. The personal data processed by a personalized search engine can be used to tailor advertising exactly to individual interests.

 

++++++++++

2. The value of user data

The continuing success of search engine marketing can be related to two fundamental problems associated with traditional print and TV advertising. Firstly, traditional forms of advertising rely on comparatively crude demographic models such as lifestyle or milieu. These models have difficulties reflecting a development where consumption patterns are increasingly differentiated [2]. Especially in niche markets, many advertising campaigns therefore face the problem of unsatisfactory returns on investment (ROI). Contextual advertising in search engines provides a solution to this problem. It allows advertisers to reach the “long tail” of users (Waters, 2007) in a more cost–effective way by addressing them with advertising that corresponds to the specific information needs they express in their search queries.

The second fundamental problem for traditional advertising has been a widespread lack of acceptance by the targeted audience. This rejection finds its expression in both practical and psychological avoidance strategies. Turow (2006) offers pop–up blockers and spam protection as well as digital video recorders and zapping as examples of practical avoidance strategies. An example of a psychological avoidance strategy is the drop in attention during commercial breaks on TV (Turow, 2006). According to Turow, advertisers have adopted a range of different strategies to bypass these technical and psychological barriers. These strategies converge towards a logic he termed “Customer Relationship Media”:

“First, a firm must reach wanted consumers in the main stream [sic] of their media activities in ways that make them feel connected to the company and its products. Second, the connectivity must encourage consumers to give up information that can help the company fine–tune discriminatory sales pitches. Third, the selling environment must be structured to make sure that they pay attention — or at least do not eliminate — their commercial messages.” [3]

The first point illustrates one of the major advantages of search engine marketing. Since searching is the second most used service on the Internet (Rainie and Shermack, 2005), users can effectively be addressed in the mainstream of their media activity. Compared to TV advertising, contextual advertising also provides a much more effective way to create a feeling of connectivity between the customer and the product. The second point draws attention to the use of the search query as the basis for targeted advertisements. The information need expressed by the user provides information to advertisers that they can use to automatically adjust their sales pitch.

Contextual advertising has proved successful for search engine providers as well as for advertisers, but recent developments in search engine technology seem even more closely connected to the criteria of Customer Relationship Media. Up until now, both contextual advertising and search itself have relied almost exclusively on the users’ search queries. With personalization, a range of contextual information can be added to the queries, which improves both the relevance of the search results and the targeting accuracy of the displayed advertisements.

However, the collection and commercial use of user data in personalized search engines gives rise to serious privacy concerns. A look at surveillance studies theory can help to put these concerns into a larger perspective. The commercial exploitation of personal data has gained growing attention within the academic discussion of surveillance. Haggerty and Ericson (2000) introduced the concept of the “surveillant assemblage” to describe the transition from the monolithic structures of state surveillance to less centrally organized forms of surveillance which are based on the compilation of heterogeneous data streams into digital “data doubles”.

This transition also entails a shift in the focus of surveillance. Surveillance no longer aims at imposing rigid disciplinary measures on society, but at maintaining a certain level of predictability. Haggerty and Ericson pointed out that the anonymity of the modern city renders surveillance more difficult than in the more directly observable spheres of the village. Decreasing social coherence and increasing social differentiation lead to a lack of control which is countered by the “surveillant assemblage”. It provides a means for certain institutions to be able to predict behavior, which, especially for commercial actors, entails a high value:

“Increasingly important to modern capitalism is the value that is culled from a range of different transaction and interaction points between individuals and institutions. Each of these transactions is monitored and recorded, producing a surplus of information. The monetary value of this surplus derives from how it can be used to construct data doubles which are then used to create consumer profiles, refine service delivery and target specific markets.” [4]

Under growing economic pressure, the protection of personal data has low priority in such a scenario. Due to the sheer amount of available data, it is also of less importance whether the data can be traced back to an individual user: “It is not the personal identity of the embodied individual but rather the actuarial or categorical profile of the collective which is of foremost concern to the surveillant assemblage.” [5] Reflecting on data protection in personalized search from a legal point of view, Boris Rotenberg therefore advocated a focus on data autonomy rather than privacy:

“In this view, it is not so important whether you know the real–world identity of the user who entered the search terms, or whether the information can be linked to a particular real–world identity [...]. Surveillance by market players is intended to induce (as opposed to suppress) users into buying behaviour, but it is no less invasive of our autonomy than government control that may want to prevent users from engaging in certain behaviour. The fact that we are often watched by machines which seem less invasive from a secrecy point of view does not make it less problematic from a data protection point of view.” [6]

The increased collection of user data in new generations of search engines has to be assessed against the background of a growing need for data on the part of commercial interests. Due to increasing computing power and storage capacities, it becomes possible to analyze ever–larger amounts of data with data mining methods. The better these methods are at predicting behavior, the more effective will be the ways they provide to sell products. Technical implementations that increase users’ control over their data are very rare [7]. Instead, the collected user data is used as raw material for an increasingly effective targeting engine.

The following sections of this paper provide the economic and technical background for a better understanding of the current fusion of search and advertising. Section 3 describes the development of search engine marketing, from banner advertising to contextual advertising and behavioral targeting. Section 4 outlines technical developments from simple keyword searches to the increased integration of contextual information in today’s personalized search engines. Finally, Sections 5 and 6 discuss two implementations of personalized searching in depth to reveal which personal data is gathered and how advertisers use it to improve the targeting of individual users.

 

++++++++++

3. The development of search engine marketing

3.1 Banner advertising

Graphical banner advertising on the Internet was introduced by HotWired in 1994 (Fain and Pedersen, 2005). As in traditional print and TV advertising, prices for banners are calculated on a cost–per–mille (CPM) basis, meaning that advertisers pay a fixed price per thousand displays of their ad, whether or not anyone reacts to it. Within this advertising paradigm, the essential asset of the search engines was their popularity, since more visitors to their homepages meant more page impressions of the displayed advertisements. However, typical users usually only stayed on the homepage for a short time, since they were more interested in the search results than in the homepage itself. Search engine providers like AltaVista tried to solve this problem by expanding their homepages into larger portals with e–mail, chat and other services, in order to keep users within their reach. However, this strategy ultimately failed since Google’s clean search interface proved to be more popular with many users.

3.2 Contextual advertising

A major change in online marketing was brought about by a system that was introduced by the product search engine GoTo.com in 1998. Instead of placing banner ads on the homepage, entries in the result lists were auctioned out directly to advertisers. The developer of the system, Bill Gross, summarized its principle as follows: “I realized that when someone types ‘Princess Diana’ into a search engine, they want, in effect, to go into a Princess Diana store — where all the possible information and goods about Princess Diana are laid out for them to see.” [8]

The system created by GoTo.com made it possible for the first time to tie users’ information needs directly to commercial messages. Today, all the major search engines offer contextual advertising, albeit separated more clearly from the so–called “organic” results.

Here, CPM has been replaced by cost–per–click (CPC) as a pricing scheme, meaning that advertisers only have to pay if users actually click on their messages. Since users are more likely to interact with ads that reflect their interests, relevance has become the key to economic success within this advertising paradigm.
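The difference between the two pricing schemes can be made concrete with a small calculation. The following sketch is purely illustrative: the function names, prices and click–through rates are hypothetical assumptions and are not taken from any provider’s price list.

```python
def cpm_cost(impressions: int, price_per_thousand: float) -> float:
    """Cost under cost-per-mille: every display of the ad is billed."""
    return impressions / 1000 * price_per_thousand


def cpc_cost(impressions: int, click_through_rate: float, price_per_click: float) -> float:
    """Cost under cost-per-click: only displays that lead to a click are billed."""
    clicks = impressions * click_through_rate
    return clicks * price_per_click


def effective_price_per_click(total_cost: float, impressions: int,
                              click_through_rate: float) -> float:
    """Price actually paid for each click, whatever the pricing scheme."""
    return total_cost / (impressions * click_through_rate)


# Hypothetical campaign with 100,000 impressions (all figures are assumptions).
impressions = 100_000

# Banner billed per thousand displays, clicked by 0.1% of viewers.
banner_cost = cpm_cost(impressions, price_per_thousand=5.0)
banner_per_click = effective_price_per_click(banner_cost, impressions, 0.001)

# Contextual ad billed per click, clicked by 2% of viewers because it matches the query.
contextual_cost = cpc_cost(impressions, click_through_rate=0.02, price_per_click=0.25)

print(f"banner: {banner_cost:.2f} total, {banner_per_click:.2f} per click")
print(f"contextual: {contextual_cost:.2f} total, 0.25 per click")
```

With these assumed figures both campaigns cost the same in total, but the banner yields 100 clicks and the contextual ad 2,000. Under CPC the provider’s income also scales directly with the click–through rate, which is why relevance becomes the decisive variable.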

In 2001, GoTo.com changed its name to Overture, reflecting a shift in focus towards providing contextual ads for other search engines. Yahoo! acquired Overture in 2003 and, two years later, renamed it Yahoo! Search Marketing. Instead of relying on Overture to provide contextual ads, Google developed its own systems to sell contextual ads to the highest bidders — AdWords and AdSense.

The introduction of contextual advertising was a major novelty, not only for search engines, but also for marketing as a whole. Instead of choosing target groups based on costly demographic research, this type of advertising made it possible for advertisers to communicate directly with users the moment they expressed a certain interest.

The logic behind contextual advertising is expressed clearly in Bill Gross’ quote: a specific information need is taken to be an indicator for a corresponding consumption need. However, there is no empirical evidence for such a general assumption [9]. The connection of a search query and a commercial message therefore seems less like the answer to the users’ wish for a “Princess Diana Store”, but rather like the answer to businesses’ need for a more direct way to reach individual customers. Recent developments in online marketing provide even clearer evidence that the analysis of search queries is only a small part of a growing system of user surveillance in the interests of commercial actors.

3.3 Behavioral targeting

Behavioral targeting is a development within online marketing that is gaining in importance for search engine providers. The term describes a number of methods for analyzing user behavior to determine when a user is most receptive to certain kinds of advertising. Internet users can be observed in a basic way by analyzing IP address, bandwidth, browser type and operating system in server logs. Since 1997, companies like DoubleClick have also provided the means to track users via cookies on several different Web sites and over longer periods of time. With behavioral targeting techniques this user data can be analyzed to generate data models of online purchasing behavior. By correlating current user behavior with these data models, advertisements can be displayed when they are most likely to influence users’ purchasing decisions (Long, 2007). Since behavioral targeting requires large amounts of user data to generate reliable statistical models, it is most effective in larger networks and portals where a lot of user data can be collected.

In 2006, Yahoo! re–launched its behavioral targeting platform in the U.S. (Papadopoulos, 2006). The system is based on an analysis of past search queries, usage patterns and reactions to advertising campaigns that have been displayed within the reach of Yahoo!’s network. In an interview, Yahoo!’s Chief Financial Officer, Sue Decker, explained more explicitly what kind of data is included in these models:

“We’re actually in a fairly unique position to be able to take advantage [...] of the enormous data and insight we have on the largest online audience in the world. [...] We can see what people are putting in their search strings. We can see what kinds of ads they click on. We can see what kinds of sites they were on prior to the site that they are currently on.” [10]

With a growing number of companies offering behavioral targeting services, user queries provide a significant competitive advantage for search engine providers. By combining contextual advertising and behavioral targeting, all user activities, including searching, browsing and shopping, can be accompanied by precisely targeted advertisements. However, search queries encompass especially sensitive user data since they allow deep insights into the personal interests and habits of individual users [11]. Thus, behavioral targeting programs implemented in search engines represent a new quality of user data exploitation for commercial purposes.

The success of behavioral targeting hinges on two factors: the reach of the network in which user behavior can be monitored and the depth of the user data that can be collected. In 2007, Google, Yahoo!, AOL and MSN each extended the reach of their networks considerably by acquiring DoubleClick, Right Media, Tacoda and aQuantive respectively. To be able to extend the depth of the data collection process as well, new methods are needed to make users disclose more of their personal data. As will be shown below, the personalization of search provides the technology that exactly fulfills this purpose.

 

++++++++++

4. The personalization of searching

Online searching is a highly competitive market where the relevance of search results plays a key role. Broder (2002) explained the success of today’s search engines, mainly based on link topology, over older versions, mainly based on text statistics, by a significant boost in relevance. However, a comparable boost in relevance is hardly feasible in the current search engine environment (Enge, 2007). While the amount of data to be indexed continues to grow at an exponential rate, the quality of users’ search queries does not improve. Most queries consist only of one or a few terms with frequently unclear connotations [12].

Since a general, or horizontal, search engine can only use general criteria for determining relevance, it becomes more difficult to manage the amount of available data. Methods to improve relevance in horizontal search engines have drawn on contextual information, e.g. in the form of query refinements, language settings or date restrictions. Vertical search engines that specialize in certain subject areas also have a better chance of finding individually relevant results (Enge, 2007). Personalized search takes a step beyond these rather basic parameters by storing personal preferences over longer periods of time and integrating them into the search process. Keenoy and Levene (2005) have provided a comprehensive list of technical aspects that are involved in the personalization of a Web search. Three of those aspects deserve further attention here: a) the user data collection method; b) profile storage; and, c) the personalization method.

a) User data collection method

User data can be collected either through explicit user interaction or implicitly by automatically analyzing user behavior. Early forms of personalized search often provided a clickable list of topics that users could choose from to specify their interests (Khopkar, et al., 2003). In other forms of personalized search, such as Google Web History (www.google.com/history), the interests of the user are inferred implicitly by analyzing clicking behavior in the result lists.

b) Profile storage

For the collected data to be used in different search processes, it needs to be stored as a personal profile. This can be done either on the server of the search engine provider or client–side as a cookie on the user’s computer. Both options have disadvantages: whereas server–side storage often implies commercial use of the data, a cookie stored on the client’s side might be erased accidentally. Another aspect of profile storage is whether the data is stored adaptively or statically. Adaptive storage implies that the data can be changed manually or automatically to reflect changes in the user’s preferences.

c) Personalization method

Most importantly, the stored data has to be incorporated into the search process. This can be done by expanding or modifying the search query with terms drawn from the personal profile. It is also possible to re–rank the results on the basis of personal preferences [13]. A third option is to filter the results according to some specified criteria.
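The three personalization methods named under c) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of any actual search engine: the `Profile` and `Result` structures, the weights and the example URLs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical stored profile: weighted interest terms and excluded hosts."""
    interests: dict[str, float] = field(default_factory=dict)
    blocked_hosts: set[str] = field(default_factory=set)


@dataclass
class Result:
    url: str
    text: str
    score: float  # relevance score from the non-personalized engine


def expand_query(query: str, profile: Profile, max_terms: int = 2) -> str:
    """Query expansion: append the strongest profile terms not already in the query."""
    present = set(query.lower().split())
    ranked = sorted(profile.interests.items(), key=lambda kv: kv[1], reverse=True)
    extra = [term for term, _ in ranked if term not in present][:max_terms]
    return " ".join([query, *extra])


def rerank(results: list[Result], profile: Profile) -> list[Result]:
    """Re-ranking: add the weight of every profile term found in a result's text."""
    def personalized(r: Result) -> float:
        words = set(r.text.lower().split())
        boost = sum(w for term, w in profile.interests.items() if term in words)
        return r.score + boost
    return sorted(results, key=personalized, reverse=True)


def filter_results(results: list[Result], profile: Profile) -> list[Result]:
    """Filtering: drop results whose URL contains an excluded host (crude substring match)."""
    return [r for r in results if not any(h in r.url for h in profile.blocked_hosts)]


profile = Profile(interests={"cycling": 0.8, "repair": 0.5},
                  blocked_hosts={"spam.example"})
results = [
    Result("http://news.example/jaguar-cars", "jaguar cars review", 1.0),
    Result("http://bikes.example/jaguar", "jaguar cycling frame repair", 0.6),
    Result("http://spam.example/jaguar", "jaguar cycling deals", 0.9),
]

print(expand_query("jaguar", profile))
for r in rerank(filter_results(results, profile), profile):
    print(r.url)
```

In this example the ambiguous query “jaguar” is disambiguated by the profile: the expanded query carries the user’s interests, and the bicycle page overtakes the car review although its general score is lower. The same profile terms that improve the ranking could equally be used to select advertisements.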

Although no single personalization technique has emerged as a standard on any of these three levels, it is generally agreed that personalization is going to play a major role in the future of Web search (Anand and Mobasher, 2005). The following sections discuss two specific applications of personalized search that have evolved beyond research labs. The discussion focuses on the collection of user data and on how this data is used to target users as consumers. Drawing on the theoretical background, the sections also discuss how an increased use of these personalization techniques can be evaluated within a wider context.

 

++++++++++

5. Social search

Personalization features like relevance feedback and filtering are not necessarily confined to individual usage, but can also be used by groups. Collective relevance feedback, implemented in Web 2.0 services such as Digg (www.digg.com) and Reddit (www.reddit.com), has gained in popularity. Collaborative personalization methods have also been implemented in larger search engines in projects such as Google Coop (www.google.com/coop) or Yahoo! MyWeb (www.yahoo.com/myweb). Termed “social search”, these services have attracted growing attention on the part of online marketers (Sherman, 2006).

Among the technically most advanced projects in the area of social search are the so–called Swickis developed by Eurekster. A Swicki (from “search” and “wiki”) is a search engine that can be optimized for searches in certain subject areas. As the developers explain, they are therefore especially useful in connection with special–interest Web sites or weblogs: “It’s a natural extension of creating a Web page or blog or podcast to create a search engine around a topic you and your community care about.” (“Eurekster. About us”, 2007). A Swicki offers numerous possibilities to set search parameters, both for the owner and the user. During the process of creating a Swicki, the owner can specify the sources that should be included in the search, e.g. a blog search, the Yahoo! index, the owner’s Web site and up to ten different other URLs. Additionally, the owner can choose a query expansion that is included in every search and the number of results from each source can be specified.

The result list offers explicit relevance feedback by letting users vote for or against each result and by giving users the possibility of adding new URLs to the index. Additionally, the system uses forms of implicit relevance feedback, since URLs which receive a lot of clicks can move up in the result list and can even be added to the list of URLs which are included in every search. To prevent manipulation by the user, the owner can adapt the stored profile in several ways. URLs can be blocked completely and explicit as well as implicit relevance feedback can be allowed or denied separately for the community and for the owner.
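Eurekster does not publish its ranking method, so the following is only a sketch of how explicit votes, implicit clicks and the owner’s controls described above could interact. All names, weights and URLs are hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    url: str
    base_score: float  # score from the underlying index
    votes: int = 0     # explicit feedback: votes for minus votes against
    clicks: int = 0    # implicit feedback: clicks in the result list


@dataclass
class OwnerSettings:
    """Hypothetical owner controls: toggle each feedback type, block URLs."""
    allow_votes: bool = True
    allow_clicks: bool = True
    blocked: frozenset[str] = frozenset()


def rank(entries: list[Entry], settings: OwnerSettings,
         vote_weight: float = 0.5, click_weight: float = 0.1) -> list[str]:
    """Order URLs by base score plus whichever feedback the owner permits."""
    def score(e: Entry) -> float:
        total = e.base_score
        if settings.allow_votes:
            total += vote_weight * e.votes
        if settings.allow_clicks:
            total += click_weight * e.clicks
        return total

    visible = [e for e in entries if e.url not in settings.blocked]
    return [e.url for e in sorted(visible, key=score, reverse=True)]


entries = [
    Entry("a.example", 1.0),
    Entry("b.example", 0.8, votes=1, clicks=3),
    Entry("c.example", 0.9, votes=-2),
]

print(rank(entries, OwnerSettings()))
print(rank(entries, OwnerSettings(allow_votes=False, allow_clicks=False)))
print(rank(entries, OwnerSettings(blocked=frozenset({"b.example"}))))
```

The sketch makes one point visible: every vote and every click has to be recorded per URL before it can influence the ranking, so the feedback mechanism is at the same time a data collection mechanism.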

When creating a Swicki, the owner can decide whether or not to integrate advertisements, where to display advertising and which advertising program to use [14]. Any revenue from these programs is shared evenly between Eurekster and the Swicki owner. Eurekster does not disclose exactly how advertising is matched to the displayed content, but they do claim that “on swicki–enabled sites the value of search words, ad placements, click–through and other merchandising vehicles is increased benefiting advertisers.” (“Eurekster. swicki FAQs”, 2007)

The way advertising is mapped to the users’ interaction with the Swicki closely resembles the logic of Customer Relationship Media discussed by Turow. The first aspect might not be met completely, since the use of Swickis in connection with special interest Web content is not yet part of the “mainstream” of media activities. The second aspect is, on the other hand, fulfilled very well by Swickis, since they offer especially good conditions for creating a connection between users and advertisers. Compared to the results in traditional horizontal search engines, which hardly disclose any information about their relevance criteria, Swickis offer more transparency and interactivity. Through the subject focus and the possibilities for collaboration, users are addressed as active members of a community rather than as anonymous searchers. This activation of users can also be interpreted as a way of developing a less hostile attitude towards advertisements.

Swicki owners play an ambiguous role in this system. They lay the foundations for the evolution of an online community by creating the content around the Swicki and setting the search parameters. Since they receive a share of the advertising revenue, they are encouraged to keep up this commitment on a regular basis. A weekly report from Eurekster informs them about recent search activity and the development of their Swicki’s worth. The owners thus receive a limited means of control over their user base and they are tightly drawn into the process of commercially exploiting the users’ data. In what can be seen as a version of crowdsourcing (Howe, 2006), the creation of profiled communities as well as the marketing for the corresponding Swickis is thus outsourced to a large number of Swicki owners.

The small number of users per Swicki and the implementation of the Swickis across a large number of dispersed Web sites make Eurekster seem less data–hungry than a search engine giant like Google. Despite the dispersal of the Swickis, though, Eurekster is still able to monitor all click streams from its central location within the network. The various ways of user interaction also create much larger amounts of data per user than do traditional search engines. The activation of Swicki owners and users thus creates the ideal basis for the second part of the Customer Relationship Media criteria: the collection of additional information to fine–tune sales pitches. Whereas increased participation and collaboration is usually portrayed as a step towards more democratic forms of Internet usage, in this context it should rather be seen as a strategy to create tighter bonds between advertisers and users.

 

++++++++++

6. Recommender systems

It is debatable whether or not recommender systems should be seen as part of the development of personalized search. Rather than displaying information as the result of a search query, these systems usually deliver recommendations based on an analysis of a user’s online behavior. However, since both search engines and recommender systems share the aim of helping users find relevant information, it seems legitimate to discuss them in this context. Technically, search engines and recommender systems can overlap to varying degrees, depending on the information sources used in the recommendation process (Keenoy and Levene, 2005).

The following discussion concentrates on PersonalWeb, a client–based application that creates a personalized homepage based on the user’s online behavior. PersonalWeb has been developed by Claria Corporation as a demonstration tool for their personalization software, Axon. Since this software is ultimately aimed at larger Web publishers and advertisers, Claria especially stresses its benefits in terms of advertising targeting possibilities. It therefore seems an especially appropriate example to discuss with regard to developments within online marketing as outlined above.

After its installation on a client computer, PersonalWeb creates a customizable Web page which is substituted for the browser’s opening page. In the beginning, this Web page consists of a number of elements displaying general information like news, weather, etc. As in online applications offering similar services, such as iGoogle (www.google.com/ig) or Netvibes (www.netvibes.com), the user can remove elements or include new ones, such as blog feeds, search engines or maps. Whereas other customizable homepages require considerable user interaction to add new elements, PersonalWeb instead actively recommends new sources of information.

The recommendations are based on user behavior like surfing, searching and navigating within Web sites and are integrated into the personalized site if the user chooses to adopt them. Personal data is thus mainly collected implicitly, since user behavior is monitored without any direct interaction with the application. Data is collected explicitly only during the initial set–up of personal information sources and the feedback on recommendations. The personal profile is stored locally, but Claria is able to access this data to identify additional sources of information that might be relevant to other users (Martin and Veteska, 2006). The profile also contains rules that are used by the system to decide which recommendations and which advertising to present. The collected data as well as the rules are stored adaptively in order to be able to reflect changing behavior patterns. Over time, the system is able to narrow down the interests of the user more precisely by drawing on the growing amount of accumulated data. This way, sources of information about specific subject areas can be inferred from one user and recommended to another with matching interests.
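How sources can be “inferred from one user and recommended to another with matching interests” may be illustrated with a simple overlap measure. The sketch below uses Jaccard similarity over sets of information sources; this is an assumption chosen for brevity, not a description of Claria’s software, and the user names and URLs are invented.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of sources two profiles have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target: str, profiles: dict[str, set[str]],
              min_similarity: float = 0.3) -> list[str]:
    """Suggest sources used by sufficiently similar users but unknown to the target."""
    own = profiles[target]
    scores: dict[str, float] = {}
    for user, sources in profiles.items():
        if user == target:
            continue
        similarity = jaccard(own, sources)
        if similarity < min_similarity:
            continue
        for source in sources - own:
            scores[source] = scores.get(source, 0.0) + similarity
    # Highest accumulated similarity first; ties broken alphabetically.
    return sorted(scores, key=lambda s: (-scores[s], s))


# Hypothetical profiles: the sources each user has adopted.
profiles = {
    "alice": {"bikeblog.example", "trailmaps.example"},
    "bob": {"bikeblog.example", "trailmaps.example", "gearshop.example"},
    "carol": {"cooking.example"},
}

print(recommend("alice", profiles))
```

Note that such a comparison only works if the profiles of different users can be read in one place, which is why locally stored profiles still need to be accessible to the provider.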

Similar to the Swicki developers, Claria stresses the improved targeting possibilities offered by its system. As Claria representative Scott Eagle declared in an interview: “In effect, what search engines do is wrap target content with relevant advertising, and with PersonalWeb we’ll be wrapping targeted personalized content with relevant advertising.” (Sgambelluri, 2006).

The more information the system is able to gather about an individual user, the more specific recommendations it is able to make. The same holds for advertising, which is displayed on the personalized Web page. Since the system is installed as a stand–alone application on the user’s computer, there is no need for a combinatory cookie analysis across a large network of affiliated sites. Instead, the complete user behavior is mapped client–side. In its corporate information, Claria describes the advantages of this system for advertisers:

“Axon can provide a robust understanding of consumer interests by understanding their behavior across the Web. While other systems can provide glimpses into consumer usage patterns from behavior on a site or two, Axon illuminates the most complete view of a consumer’s online interests resulting in more than a 15–fold increase in the ability to automatically personalize for the consumer.” (“Axon White Paper”, 2007).

According to Claria, this “ability to automatically personalize” also improves click–through rates for the displayed advertisements by up to 15 times (“Axon White Paper”, 2007). The application meets at least two of Turow’s criteria for Customer Relationship Media. Firstly, advertising is integrated tightly into the mainstream of media activities, since the personalized homepage, in a way once envisaged by portals [15], serves as the starting point for each online session and is displayed each time a browser window is opened. Since the displayed advertisements fit the interests of the user, they are especially effective in creating a sense of connectedness between the advertiser and the user. Secondly, the stored user data provides extremely rich material that can be drawn upon to fine–tune sales pitches in real time.

While PersonalWeb concentrates on the user’s online behavior as a basis for recommendations, other recommender systems create their recommendations from the documents stored on the user’s computer. Watson, an application described by Budzik and Hammond (1999), uses text documents to automatically generate search queries and to present contextually relevant information and advertising.
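To make the idea of generating queries from documents concrete, the following is a minimal sketch of one possible approach: picking the most frequent content words of a text as a query. It is not the method used by Watson, which Budzik and Hammond describe in detail; the function name `generate_query`, the stop-word list and the `max_terms` parameter are assumptions made purely for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list (an assumption for this
# sketch); a real system would use a far more complete one.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
    "for", "on", "with", "that", "this", "it", "as", "by", "be",
}


def generate_query(document_text, max_terms=3):
    """Return the most frequent content words of a text as a query string."""
    # Lowercase the text and split it into alphabetic tokens.
    words = re.findall(r"[a-z]+", document_text.lower())
    # Count tokens that are neither stop words nor very short.
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    # Join the highest-frequency terms into a single query string.
    return " ".join(term for term, _ in counts.most_common(max_terms))


if __name__ == "__main__":
    text = ("Privacy and search engines. Search engines collect data. "
            "Privacy matters for search.")
    print(generate_query(text))
```

Even this crude frequency count shows why such systems are attractive to advertisers: the terms that summarize a document can serve equally well as keywords for selecting advertisements.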

In recent personalization strategies, personal behavior on the one side and personal documents on the other merge into a single data corpus to be analyzed by the system. The Microsoft research project PSearch seeks to integrate all documents stored on the user’s hard disk as well as past search queries and visited URLs into the personalization process (Dumais, 2006; Teevan, et al., 2005). Such forms of personalization raise especially serious privacy concerns, since they include a wide range of personal data in their analyses. The PSearch developers claim to address this problem by transferring the complete personalization process to the client’s computer. However, since a growing number of personal documents are created and stored server–side via services such as Google Docs and Spreadsheets (www.google.com/docs), the boundary between client–side and server–side data is starting to disappear.

As an example, the service “Interesting items”, which is part of Google Web History (www.google.com/history), is a small–scale recommender system that suggests URLs based on the user’s search and browsing activities. The necessary data is collected from both the personal search history and the click streams during the time the user is logged on. Technically, there is no reason why personal documents stored in associated accounts for Google Docs and Spreadsheets and Gmail should not be included in this system to make more specific recommendations. Neither is there a technical reason not to accompany such recommendations with highly targeted advertising.

Keenoy and Levene (2005) suggested, with reference to Liu, et al. (2003), that behavior patterns (so–called “episodes”) could be identified to improve the presentation of search results. This notion seems very close to the behavioral patterns developed in behavioral targeting systems. Selling such “information episodes”, composed of search query, personal profile and current user behavior, to advertisers would be the epitome of the commercial exploitation of user data that is currently underway. Discriminatory sales pitches could then be fine–tuned so precisely that they might even find ways around users’ psychological avoidance strategies.

 

++++++++++

7. Conclusions

The discussion has shown that the merging of online search and online marketing into a single technical and economic system has already progressed past its initial stages. The crucial factor for the economic success of such a system is the amount of user data that can be collected. The more data is available, the more relevant the results that Web search can deliver and the more effectively advertising can address the interests of individual users. By drawing on user data collected explicitly as well as implicitly, personalized search engines deliver the technical infrastructure to advance this development.

The discussion of social search and recommender systems in terms of Customer Relationship Media has shown that these technologies offer advertisers unmatched conditions for reaching their customers. By matching the users’ information needs very closely to corresponding advertisements, they are able to create an especially strong feeling of connectedness. By analyzing personal information, search queries and online behavior in real–time, sales pitches can be fine–tuned in ways not possible in other forms of advertising.

The development of personalized search implies an expansion of surveillance in the interests of commercial actors. Advertisers are faced with increased social differentiation, resulting in difficulty in effectively reaching larger audiences. Personalized search provides a tool that helps to render consumer behavior predictable again. Importantly, this effect is achieved without impeding the process of differentiation itself. There is no need to force users into any kind of behavioral pattern. On the contrary, the systems activate their users by providing means of interaction with the software. However, in the midst of all this vibrant activity, companies like Eurekster and Claria have gained the means to acquire extensive knowledge about their users and to reach each individual with exactly targeted commercial messages. Since more data means more revenue in this scenario, it seems inevitable that users’ data autonomy will remain the weakest link in this chain for the foreseeable future.

 

About the author

Theo Röhle is a PhD candidate in media culture at Hamburg University. His dissertation seeks to establish Actor–Network–Theory and Foucauldian concepts of power within search engine research. His research interests include knowledge politics, network theory and surveillance and privacy in the context of new media. More information can be obtained at www.netzmedium.de.

 

Acknowledgments

An earlier version of this paper was published in the German online journal kommunikation@gesellschaft. It can be retrieved at http://www.soz.uni-frankfurt.de/K.G/B1_2007_Roehle.pdf.

 

Notes

1. Fabos, 2006, p. 189.

2. Slater, 1997, p. 63 f.

3. Turow, 2006, p. 295 f.

4. Haggerty and Ericson, 2000, p. 615 f.

5. Hier, 2003, p. 402.

6. Rotenberg, 2007, p. 98.

7. A report by the Center for Democracy and Technology (Schwartz and Cooper, 2007) listed some improvements in search engine privacy practice, e.g., shorter data retention periods. However, to date, the service AskEraser developed by the search engine Ask (www.ask.com) seems to be the only technical implementation that actually provides the users themselves with some degree of control over their data.

8. Gross in Battelle, 2005, p. 106 f.

9. Battelle, 2005, p. 28.

10. Decker in Klaasen, 2007, p. 3.

11. This became evident when 20 million search queries were released to the public by AOL researchers in August 2006. Even without the actual IP addresses, the combination of queries made it possible to identify an individual user in at least one case (Barbaro and Zeller, 2006).

12. Spink and Jansen, 2004, p. 77ff.

13. A more detailed overview of various re–ranking methods can be found in Keenoy and Levene (2005).

14. For example, Google AdSense (www.google.com/adsense), Context Web (www.contextweb.com) or Chitika (www.chitika.com).

15. Miller, 2004, p. 175.

 

References

Sarabjot Singh Anand and Bamshad Mobasher, 2005. “Intelligent Techniques for Web Personalization,” In: Bamshad Mobasher and Sarabjot Singh Anand (editors). Intelligent Techniques for Web Personalization. Berlin: Springer, pp. 1–36.

“Axon White Paper,” at http://www.claria.com/assets/white-papers/claria-white-paper-axon.pdf, accessed 16 August 2007.

Michael Barbaro and Tom Zeller, Jr., 2006. “A Face is Exposed for AOL Searcher No. 4417749,” New York Times (9 August), at http://www.freepress.net/news/16973, accessed 16 August 2007.

John Battelle, 2005. The Search. How Google and its rivals rewrote the rules of business and transformed our culture. New York: Portfolio.

Andrei Broder, 2002. “A taxonomy of Web search,” SIGIR Forum, volume 36, number 2, at http://www.acm.org/sigir/forum/F2002/broder.pdf, accessed 16 August 2007.

Jay Budzik and Kristian Hammond, 1999. “Watson: Anticipating and Contextualizing Information Needs,” at http://citeseer.ist.psu.edu/budzik99watson.html, accessed 16 August 2007.

Susan Dumais, 2006. “SIGIR 2006 PIM Workshop Position Paper: Interfaces for Combining Personal and General Information,” at http://pim.ischool.washington.edu/pim06/files/dumais-paper.pdf, accessed 16 August 2007.

Eric Enge, 2007. “Are Vertical Search Engines the Answer to Relevance?” at http://searchenginewatch.com/showPage.html?page=3624377, accessed 16 August 2007.

“Eurekster. About us,” at http://eurekster.com/about/technology, accessed 16 August 2007.

“Eurekster. swicki FAQs,” at http://swickihome.eurekster.com/faqs.htm, accessed 16 August 2007.

Bettina Fabos, 2006. “The Commercial Search Engine Industry and Alternatives to the Oligopoly,” Eastbound, volume 1, number 1, pp. 187–200, at http://eastbound.eu/journal/2006-1/contents/fabos/060109fabos.pdf, accessed 16 August 2007.

Daniel C. Fain and Jan O. Pedersen, 2005. “Sponsored Search: a Brief History,” Bulletin of the American Society for Information Science and Technology, volume 32, number 12, at http://www.asis.org/Bulletin/Dec-05/pedersen.html, accessed 16 August 2007.

Kevin D. Haggerty and Richard V. Ericson, 2000. “The surveillant assemblage,” British Journal of Sociology, volume 51, number 4, pp. 605–622.

Sean P. Hier, 2003. “Probing the Surveillant Assemblage. On the dialectics of surveillance practices as processes of social control,” Surveillance & Society, volume 1, number 3, pp. 399–411, at http://www.surveillance-and-society.org/articles1(3)/probing.pdf, accessed 16 August 2007.

Jeff Howe, 2006. “The Rise of Crowdsourcing,” Wired, volume 14, number 6 (June), at http://www.wired.com/wired/archive/14.06/crowds.html, accessed 16 August 2007.

Kevin Keenoy and Mark Levene, 2005. “Personalisation of Web Search,” In: Bamshad Mobasher and Sarabjot Singh Anand (editors). Intelligent Techniques for Web Personalization. Berlin: Springer, pp. 201–228.

Yashmeet Khopkar, Amanda Spink, C. Lee Giles, Prital Shah and Sandip Debnath, 2003. “Search engine personalization: An exploratory study,” First Monday, volume 8, number 7 (July), at http://www.firstmonday.org/issues/issue8_7/khopkar/, accessed 16 August 2007.

Abbey Klaasen, 2007. “The right ads at the right time — via Yahoo; Web giant looks to offer behavioral–targeting tools outside its own properties,” Advertising Age (5 February), p. 3.

Jiming Liu, Chi Kuen Wong and Ka Keung Hui, 2003. “An adaptive user interface based on personalized learning,” IEEE Intelligent Systems, volume 18, number 2, pp. 52–57.

Danielle Long, 2007. “The Revolution Masterclass on behavioural targeting,” Revolution (1 February), p. 62.

Anthony G. Martin and Eugene Veteska, 2006. U.S. Patent No. 7,149,704: System, method and computer program product for collecting information about a network user.

Vincent Miller, 2004. “Stitching the Web into Global Capitalism: Two Stories,” In: David Gauntlett and Ross Horsley (editors). Web.Studies. Second edition. London: Arnold, pp. 171–184.

Anna Papadopoulos, 2006. “Yahoo’s New Behavior,” at http://www.clickz.com/showPage.html?page=3620586, accessed 16 August 2007.

Lee Rainie and Jeremy Shermak, 2005. “Data Memo: Search Engine use November 2005,” at http://www.pewinternet.org/pdfs/PIP_SearchData_1105.pdf, accessed 16 August 2007.

Boris Rotenberg, 2007. “Towards Personalised Search: EU Data Protection Law and its Implications for Media Pluralism,” In: Marcel Machill and Markus Beiler (editors). Die Macht der Suchmaschinen. The Power of Search Engines. Köln: Herbert von Halem Verlag, pp. 87–104.

Mike Sachoff, 2007. “Schmidt: Google’s Beyond Search Now,” at http://www.webpronews.com/topnews/2007/04/10/schmidt-googles-beyond-search-now, accessed 16 August 2007.

Ari Schwartz and Alissa Cooper, 2007. “Search Privacy Practices: A Work In Progress,” at http://www.cdt.org/privacy/20070808searchprivacy.pdf, accessed 16 August 2007.

Mario Sgambelluri, 2006. “Q&A With Claria’s Scott Eagle,” at http://www.imediaconnection.com/content/8914.asp, accessed 16 August 2007.

Chris Sherman, 2006. “Who’s who in social search,” at http://searchenginewatch.com/showPage.html?page=3623173, accessed 16 August 2007.

Don Slater, 1997. Consumer Culture and Modernity. Cambridge: Polity Press.

Amanda Spink and Bernard J. Jansen, 2004. Web search: Public searching on the Web. Boston: Kluwer Academic Publishers.

Jaime Teevan, Susan Dumais and Eric Horvitz, 2005. “Personalizing Search via Automated Analysis of Interests and Activities,” Proceedings of the 28th Annual ACM Conference on Research and Development in Information Retrieval (SIGIR ’05), at http://research.microsoft.com/~horvitz/SIGIR2005_personalize.pdf, accessed 16 August 2007.

Joseph Turow, 2006. “Cracking the Consumer Code. Advertisers, Anxiety, and Surveillance,” In: Kevin D. Haggerty and Richard V. Ericson (editors). The New Politics of Surveillance and Visibility. Toronto: University of Toronto Press, pp. 279–307.

Fred Vogelstein, 2007. “As Google Challenges Viacom and Microsoft, Its CEO Feels Lucky,” Wired (9 April) at http://www.wired.com/techbiz/people/news/2007/04/mag_schmidt_qa?currentPage=all, accessed 16 August 2007.

Richard Waters, 2007. “Act Two: how Google is muscling its way into the advertising mainstream,” Financial Times, London Edition (19 January), p. 1.

 


 

Editorial history

Paper received 17 August 2007; accepted 21 August 2007.


Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 2.0 Germany License.

Desperately seeking the consumer: Personalized search engines and the commercial exploitation of user data by Theo Röhle
First Monday, volume 12, number 9 (September 2007),
URL: http://firstmonday.org/issues/issue12_9/rohle/index.html




