Does social media users' commenting behavior differ by their local community tie? A computer-assisted linguistic analysis approach
First Monday

by Weiai Wayne Xu, Liangyue Li, Michael A. Stefanone, and Yun Fu

This study is an exploratory attempt to use automatic linguistic analysis for understanding social media users’ news commenting behavior. The study addresses geographically–based dynamics in human–computer interaction, namely, users’ tie to a geographic community. Specifically, the study reveals that commenting behavior differs between users of different levels of local community tie. Comments by local users, those with higher level of local community tie, exhibit different linguistic patterns in comparison to national users who are less involved in local community. The linguistic differences are reflected in the use of pronouns, personal pronouns, social words, swear words, anxiety words and anger words. We argue that identification of the difference is crucial in the practice of mining social media conversations for public opinion.


Community ties and media use
Computer–assisted linguistic analysis
Case study: Hydro–fracking
Discussion and conclusion




Social media, and social networking sites (SNSs) in particular, provide venues for public discussions about a wide range of topics. It is common practice for organizations, news media included, to use social media to distribute content and interact with their audiences. User–generated content in the form of comments responding to information provided by organizations is a potentially rich data source that can be analyzed to gauge collective thinking and public opinion (Walker, et al., 2012).

Given the massive amount of user–generated data available today, analysis of public conversations mediated by social media is usually handled by automated textual analysis tools. Automated text analysis allows extraction of meanings from online interactions. It can be used to examine or even predict user attributes such as gender, cultural identity and personality (Tausczik and Pennebaker, 2010). It can also be used to analyze community attributes such as the well–being of community members (Wang, et al., 2012). Of particular interest here, automated text analysis has been applied to public opinion analysis. For example, researchers used digital conversations to understand public sentiment towards traumatic events such as the events of September 2001 in New York (Back, et al., 2011). In addition to public sentiment, automated text analysis can also reveal what concepts, keywords, and contextual issues are frequently mentioned in public discussion of current events (Russell, et al., 2011). The application of automated text analysis to public opinion analysis has real–world implications. For instance, it can provide data used to predict election results (Tumasjan, et al., 2010). It can also track changes in public understanding of certain issues and identify stages when people are targetable for persuasive communication (Russell, et al., 2011).

While automatic analytic technology may reveal comprehensive pictures of public opinion about a range of topics, social media conversations may differ systematically based on the demographic and socio–economic backgrounds of the conversation participants. For example, Thelwall, et al. (2010) found females’ comments on SNSs were characterized by more positive, emotion–based language. Similarly, Guiller and Durndell (2007) found that conversation by females in online discussion groups exhibited higher levels of agreement, social support and emotionality than conversation by male users.

This evidence suggests that in order to better evaluate the opinions expressed online via automated textual analysis, users should first be categorized into increasingly specific groups. Subsequent analyses should then be able to detect systematic differences in these conversations. For example, SNS users typically provide information about themselves through profile pages, including their age, birthday, sex and political orientation. This generally public demographic and socio–economic information may be an effective way to categorize users along those dimensions.

However, access to profile information could be restricted due to users’ privacy settings (Debatin, et al., 2009). Mining users’ profile information may also raise privacy concerns (Debatin, et al., 2009). Alternatively, a less intrusive approach to infer users’ characteristics would be to focus on where they engage in conversation.

User characteristics are systematically different across different social media sites. For example, White and Asian Americans are more likely to use Facebook in comparison to Hispanics, who are more likely to use MySpace (Hargittai, 2007). In addition, social media choice is also associated with gender differences (Hargittai, 2007). Hargittai and Litt (2011) found that age, gender, ethnicity, and interest in specific types of news are associated with Twitter use.

In this study, we propose that another way to characterize differences in online conversations is based on the level of community ties SNS users exhibit. In the current research, community ties represent the extent to which individuals are connected to others in their physical community and involved in the issues that affect that community. We propose that users’ level of community ties has a systematic relationship with the kinds of online communities where they choose to participate. For example, research shows that people who are more closely tied to their local communities tend to use more traditional local media (i.e., local newspapers; Hoffman and Eveland, 2010; McLeod, et al., 1996). Today, local news organizations are also building an online presence via social media to facilitate public discussion about local issues. These platforms are local online communities based on shared interest in local affairs. Therefore, people who are more tied to their local communities use more local media and might also use more local online communities powered by local news organizations. Take Buffalo, N.Y. as an example. The local newspaper, Buffalo News, local television networks WIVB and WGRZ, and local NPR station WBFO all have fan pages on the popular SNS Facebook. It is logical to expect that people who comment on these local news organizations’ Facebook pages are more closely tied to Buffalo, N.Y., than people who comment on Facebook pages hosted by national news outlets such as CNN, Fox News, and the New York Times. Ultimately the goal of this research is to examine whether users with different levels of community ties produce different discourse concerning issues in the public sphere, while controlling for the demographic variables discussed above.



Community ties and media use

The concept of community is complex and can be characterized along multiple dimensions. Gusfield (1975) distinguished community as a geographical term referring to specific neighborhoods such as countries, towns and cities, as well as a relational term (e.g., professional community, or spiritual communities). In addition, communities are developed around shared interests (e.g., an academic community; Durkheim, 1964; Stacey, 1969). Despite the multiple dimensions, the traditional definition of community was always locally constrained (Gruzd, et al., 2011). New communication technology challenges the traditional concept of community (Gruzd, et al., 2011). Clearly the Internet brings together people who share interests yet are geographically dispersed. These Internet–based interactions and subsequent relationships are the foundation of virtual communities (Gruzd, et al., 2011). Because online communication is not restricted by geographic and temporal constraints (Hiller and Franz, 2004), scholars argued that geographic proximity and offline social ties, the two factors that define traditional community, are less salient in virtual communities (Wellman, 1979; 1988).

However, empirical evidence and daily experience have shown that the importance of physical locality in online interactions persists (Gruzd, et al., 2011; Takhteyev, et al., 2012). On one hand, Internet use facilitates off–line engagement in local communities (Haythornthwaite and Kendall, 2010). On the other hand, online communities form based on locality–specific factors. For example, Takhteyev, et al. (2012) found that distance, national borders and language differences all affected the formation of online communities on Twitter. Further, Kang (2009) found that migrants used Internet–based communication tools to reproduce the cultural and social environments of home countries.

Rothenbuhler (2001) refers to the emotional and cognitive bonds with a geographically bound community as community ties. The emotional bond with a community is called community attachment, which can be conceptualized as one’s psychological closeness and connection to a physical community. The cognitive bond with the community is defined as community involvement (Rothenbuhler, 2001), which includes attending (awareness), orienting (thinking about community issues), agreeing (sharing concerns), connecting (talking and listening to others), and manipulating (working for change) in the community (Hoffman and Eveland, 2010).

Community ties are also associated with traditional mass media use. Specifically, research shows that individuals who have strong community ties tend to consume more local media (Hoffman and Eveland, 2010; McLeod, et al., 1996). Early sociological studies found that individuals who had greater integration into their local communities tended to focus more on local issues, and considered local news as more valuable than national news (McLeod, et al., 1996). While locally oriented individuals used more local media, individuals with cosmopolitan orientations tended to favor national media (McLeod, et al., 1996). Thus, local media use is a function of interest in local issues (Mesch and Talmud, 2010).

Further, shared exposure to local news increases connectedness to the local community, which stimulates increased interest in local news (McLeod, et al., 1996; Mesch and Talmud, 2010). Additionally, one’s closeness to a geographically bounded community is also related to social media use. For example, studies have confirmed that local clustering tends to be very high in online social networks (Ugander, et al., 2011). This is to say that people tend to associate more with local friends in online social networking. Research also shows that SNS users localize social networking sites by using these tools for primarily local social interactions (Khalid and Dix, 2010). In addition, people who are physically separated from their local communities use SNSs to keep in touch with their homes (Komito and Bates, 2009).

As a common practice, local news media rely on SNSs to distribute content and engage audiences. This is one of the many ways traditional mass media organizations are diversifying and adapting to new media systems. Today, individuals can follow local news on their favorite SNS and they can also comment on the text– and video–based content provided by local media on SNSs. Users’ comments on local news may reflect users’ shared interest in and concern for local affairs. We argue that this shared interest and concern form a basis for community. To use Benedict Anderson’s (1983) notion of imagined communities, traditional mass media create shared awareness of public affairs and the shared awareness brings together people who are geographically dispersed and not connected by social ties. With social media, people not only share awareness of public issues but also collectively engage in discussion of the issues. The shared interest in and concern for local affairs, along with the collective commenting on the issue, may form the basis for geographically based online communities powered by local news media.

Moreover, local media typically target audiences in specific regions. Shared concerns and interest in local affairs will also reflect one’s ties to the local community. Thus, those who follow and engage local media on SNSs are likely to reside in the region. Therefore we propose that those individuals who follow and comment on local news sites via SNSs (local users) likely have stronger community ties than those who follow and comment on national news sites (national users). Furthermore, commenting on local news content made available via SNSs is analogous to having conversations in local settings (i.e., conversations with fellow community members). Logically, the perceived geographic proximity and social distance influence local online conversations just as they would influence local off–line conversations (Tom Tong and Walther, 2011).

We concede that there are various ways to infer SNS users’ community ties. For instance, users reveal location–based information through the details they provide on their profile pages. The location information can be systematically gathered for distinguishing local users from national users. However, as explained above, access to information about physical location may be intrusive or otherwise restricted. Moreover, this self–provided information may not accurately reveal one’s actual interest and involvement in a community. A former resident of a community may continue to follow events occurring in that community. Users who have significant others living in a community may follow news of that community. These users may have stronger community ties than their self–disclosed location information suggests. Therefore, we consider that use of SNSs for local news is actually a better indicator of community ties than the self–provided information about physical location often found on SNS profile pages.

To summarize, we consider SNS use for reading and commenting on local news the basis for forming geographically–based online community. We also suggest that consuming and commenting on local news via SNSs is a proxy for local community ties. This is because local media in SNSs bring together users who are geographically proximate to one another as well as users who share common interests and concerns.

If the rationale outlined above is accurate and valid, then systematic differences in local and national conversations about events in the public sphere should be evident. Further, these differences should be a function of the intensity of community ties the conversation participants have. Our goal is to explore the utility of using automated, computer–assisted linguistic analyses to aid in the identification of these differences.

Local news media and community involvement

Because local media use correlates with higher levels of community involvement, local news media themselves likely reflect their own local community involvement through localized reporting. Local media appeal to their audiences by focusing on issues and events relevant to their geographic area. For example, Gartner (2004) found that local media focused on local casualties when reporting international conflicts. The “localizing” effect helps transform distant, foreign events into relevant issues for local audiences. This localizing effect was also observed in coverage of Congressional legislators (Schaffner and Sellers, 2003). Kaniss (1991) suggested that local news coverage of environmental groups often focused on local economic growth (Andrews and Caren, 2000).

To conclude, SNS use for reading and commenting on local news will indicate social media users’ local community ties. In the current study, we are interested in detecting behavioral differences between users with various levels of local community ties. Linguistic characteristics of user–generated comments will reflect users’ and media outlets’ various levels of local community ties. Linguistic characteristics are inferred from uses of certain words. Automated, computer–assisted linguistic analysis examines word frequencies and is able to reveal subtle differences in linguistic styles.



Computer–assisted linguistic analysis

Language use is associated with individual attributes. For example, Freud (1995) saw slips of the tongue as revelations of a person’s hidden intentions. Psychologists have also examined writing samples to better understand psychological dispositions (Tausczik and Pennebaker, 2010). For example, descriptions of inkblots and stories told in response to drawings are related to individuals’ needs, intentions, and motives (Tausczik and Pennebaker, 2010).

Early text analyses were done using human coders (Tausczik and Pennebaker, 2010). However, more recently software applications have been developed for automated examination. The first linguistic analysis program, called General Inquirer, relied on author–defined coding schemes for assessing personality dimensions (Tausczik and Pennebaker, 2010). Some pointed out that the author–defined coding scheme was less transparent because of manipulations of language variables (Tausczik and Pennebaker, 2010). More recent developments in software favor more straightforward approaches like calculating word frequencies. Linguistic inquiry and word count (LIWC), used in the current study, is one such application.

LIWC, developed by Pennebaker and colleagues, calculates frequency of word use across 80 predefined linguistic categories. The central idea behind LIWC is that words have specific psychological meanings, an idea derived from Weintraub’s (1989) reasoning that the use of pronouns and articles in everyday language reflects one’s inner mind. To further expand Weintraub’s reasoning, LIWC associates different dimensions of language use with gender, personality traits, social status, cultural background, and age (Tausczik and Pennebaker, 2010). The dimensions of word use reveal psychological constructs (e.g., affect words, cognition words), personal issues (e.g., leisure, work, anxiety), and linguistic style (e.g., the use of pronouns or verbs).
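The word–counting idea described above can be illustrated with a minimal sketch. The category dictionary below is a hypothetical toy example, not the LIWC dictionary, and the function names are our own. Each category score is expressed as a percentage of all words in the text, which is an assumption about how the normalization works; in this sketch a word may count toward more than one category.

```python
import re
from collections import Counter

# Hypothetical toy dictionary for illustration only; the real LIWC
# dictionary is far larger and is not reproduced here.
TOY_CATEGORIES = {
    "pronoun": {"i", "we", "you", "they", "it", "my", "our"},
    "social": {"talk", "friend", "family", "child", "mate"},
    "anger": {"hate", "kill", "annoyed"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_percentages(text, categories=TOY_CATEGORIES):
    """Return, per category, the percentage of all words that fall in it."""
    words = tokenize(text)
    total = len(words)
    if total == 0:
        return {name: 0.0 for name in categories}
    counts = Counter()
    for word in words:
        for name, vocab in categories.items():
            if word in vocab:
                counts[name] += 1
    return {name: 100.0 * counts[name] / total for name in categories}


example = "We talk to our family and a friend about it"
print(category_percentages(example))
```

In this example three of the ten words are pronouns and three are social words, so both categories score 30 percent, while the anger category scores zero.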

Prior research using LIWC has produced ample results concerning differences in language use across gender, culture and situations. Specifically, females tended to use more social language whereas males used more complex language (Tausczik and Pennebaker, 2010). Compared with introversion, extroversion is associated with the use of fewer large words, more social words (e.g., family, friends), more positive emotion, and less negative emotion (Tausczik and Pennebaker, 2010). Older people used less self–focused language (Tausczik and Pennebaker, 2010). In addition, LIWC captures linguistic differences across cultures as reflected in politeness and social closeness (Tausczik and Pennebaker, 2010). LIWC studies also linked linguistic characteristics to specific communication processes such as deception, impression management, persuasion, social support seeking, and negotiation (Tausczik and Pennebaker, 2010).

Today, our conversations are more often computer–mediated and broadcast to global audiences (Stefanone and Jang, 2007). This complicates the nature of our conversations and their linguistic features. Therefore, more and more studies use LIWC to examine computer–mediated interactions (Thelwall, et al., 2010). For example, on the individual level, LIWC predicts individual behavior like blogging (Yarkoni, 2010), online negotiation (Hine, et al., 2009) and self–disclosure through social networking sites (Pfeil, et al., 2009). On the societal level, LIWC is often applied to evaluating public sentiment (Walker, et al., 2012). The goal of the current research is to explore whether ties to physical community can be used to predict linguistic features of mediated conversations. We present conversations around the energy–related issue of hydraulic fracturing as an example of how local and national conversations differ systematically.



Case study: Hydro–fracking

Hydraulic fracturing, also known as hydrofracking (or fracking), is a technique which uses highly pressurized fluids to release petroleum and natural gas from layers of rock beneath the surface of the earth. While fracking makes it possible to utilize formerly inaccessible energy resources (Howarth, et al., 2011), it is controversial because this practice raises a range of environmental concerns. Environmental risks associated with fracking include the possible contamination of ground water, air pollution, and leaking of chemicals to the earth’s surface (Howarth, et al., 2011). Fracking is active or being debated in states throughout the Northeast and Midwest and is an issue particularly salient to residents in western New York state, including the Buffalo–Niagara Falls metropolitan area.

News media have framed hydrofracking as both an environmental and political issue. In early 2011, the New York Times published an investigative report exposing the potential environmental hazard of hydrofracking and the Wall Street Journal reported that different communities across upstate New York and Pennsylvania hold different views towards fracking. Opposition is particularly strong in western New York (Kiernan, 2012).

Linguistic characteristics of community involvement

As discussed above, community involvement is a function of one’s cognitive investment in local community affairs. Community involvement, operationalized as a type of cognitive involvement, is likely reflected in one’s language use. Language reveals how individuals process and interpret information obtained from the environment (Tausczik and Pennebaker, 2010). Past linguistic studies have associated word use with cognitive investment. Specifically, the depth and complexity of thinking is reflected in the use of exclusion words (e.g., but, without, exclude), conjunctions (e.g., and, also, although), prepositions (e.g., to, with, above), cognitive mechanisms (e.g., cause, know, ought), and words greater than six letters (Tausczik and Pennebaker, 2010). Use of these words indicates that communicators are providing concrete and complex information.

Recall that we consider SNS users’ commenting on local news to be a function of stronger local community ties. Thus, we expect that people with stronger local community ties are more involved in community affairs and should be more highly cognitively involved in their conversations. Therefore, these conversations should reveal linguistic characteristics signaling deeper and more complex thinking. Thus,

H1: In discussions related to hydro–fracking, local users use more (a) exclusion words, (b) conjunctions, (c) prepositions, (d) cognitive mechanisms, and (e) words greater than six letters than national users.

Given local users’ community ties, local issues likely affect them personally. Therefore, community involvement should translate into self–focus when users discuss local issues. In linguistic studies, self–focus language use is a form of attention allocation that is reflected in language use (Tausczik and Pennebaker, 2010). Specifically, the use of pronouns and personal pronouns reflects communicator’s focus of attention (Tausczik and Pennebaker, 2010). Studies have shown that people tend to use more first person pronouns when they describe events which affect them personally (Tausczik and Pennebaker, 2010). Instead of talking about issues in detached and de–personalized ways, their language use should be increasingly personal. We hypothesize that

H2: In discussions of hydro–fracking, local users use more pronouns than national users.

In addition, LIWC provides word categories that reflect one’s involvement in issues. For instance, anxiety and anger words are associated with personal concern and attention. Given the opposition to fracking in western New York (Kiernan, 2012), we hypothesize the following:

H3: In discussions of hydro–fracking, local users use more (a) anxiety words, (b) swear words, and (c) anger words than national users.

Community attachment and linguistic characteristics

Recall that community attachment is a function of emotional links to communities (Rothenbuhler, 2001). Community is typically defined by social relations (Hoffman and Eveland, 2010). We propose a link between community attachment, emotionality and interpersonal relationships in language use. According to Tausczik and Pennebaker (2010), emotionality is reflected in the use of positive emotional words (e.g., love, nice, sweet) and negative emotional words (e.g., hurt, ugly, nasty). Use of emotional words is a measure of degree of immersion (Tausczik and Pennebaker, 2010). It is expected that people with stronger community ties are more immersed in events occurring in that community, and thus reveal more emotions through language. Thus,

H4: In discussions of hydro–fracking, local users use more emotional words than national users.

Because community is defined by social relations, emotional ties with a community will reflect ties with people in that community. A local event may affect the users as well as those in the community who have ties with the users. Social words refer to social roles (e.g., child and mate) and social process (e.g., talk). We expect that people attached to community are more invested in social ties in the community. Therefore their language will reflect a focus on social ties and social process. Thus,

H5: In discussions of hydro–fracking, local users use more social words than national users.

Geographic proximity likely enhances the anticipation of future interaction, and the anticipation of future interaction should lead to more polite and socially desirable communication (Tom Tong and Walther, 2011). Perceived geographic proximity may also lower social distance, leading to more polite and sociable conversation (Tom Tong and Walther, 2011). Therefore, we expect local conversations to be increasingly pro–social and polite. Community attachment may also indicate group identity. Linguistically, Tausczik and Pennebaker (2010) suggest that the use of first–person plural pronouns signals group identity. Thus,

H6: In discussions of hydro–fracking, local users use more first–person plural pronouns than national users.

Gender, culture, and linguistic characteristics

Gender is rooted in the biological distinction of sex. Gender is also a social construction as societies assign roles based on biological sex (Eagly, 1987). Much of the research on gender as culture focuses on revealing behavioral differences between genders and how social forces contribute to these differences (Yates, 1997). One important gender difference lies in language use.

Maltz and Borker (1982) first proposed the gender–as–culture hypothesis by examining male and female language use. According to Gudykunst and Kim (1984), gender differences in communication can be thought of as subcultural differences. Men and women are seen as socialized into distinct subcultures with unique attitudes about the way they communicate. Culture can also be conceptualized along individualistic and collectivist dimensions (Hofstede, 1980) and there is evidence that the communicative features favored by males and females differ along the individualistic and collectivistic division (Mulac, et al., 2001).

Following the gender–as–culture argument, prior LIWC studies have tapped into gender differences in linguistic characteristics. For example, studies found that females tend to use more personal, positive, social, and supportive language than males (Brownlow, et al., 2003; Kapidzic and Herring, 2011). Specifically, females use more self–referent pronouns while males use more de–individualized language including larger words, articles, passive, and third–person speech (Brownlow, et al., 2003). Women are also found to use more words that describe social acts, including appreciation and apology and ‘hug’ (Kapidzic and Herring, 2011). In addition, women tend to use language that shows positive emotion whereas men use more aggressive and violent words such as swear words and words describing violent acts (e.g., killing; Kapidzic and Herring, 2011). All of these differences are evident online and off–line (Thelwall, et al., 2010).

In the current study, we are also interested in the function of gender on linguistic behavior. Given the potential harm of hydro–fracking to one’s personal well–being and the welfare of friends and family, we expect that female language will signal greater levels of concern. In addition, given the salience of social relations in local conversations, we further expect that local female users’ conversations will indicate more personal and social concerns than those of national male users. Recall that self–referent pronouns (e.g., I) and anxiety words indicate personal involvement. Also recall that the use of social words indicates one’s concerns over social ties and others.

Thus, H7: In discussions of hydro–fracking, local female users use more (a) self–referent pronouns, (b) anxiety words, and (c) social words than national male users.

Data gathering and text preparation

We mined all posts and users’ gender information from the Facebook pages of local and national news media through Facebook’s API. WIVB and WGRZ (local CBS and NBC affiliates in Buffalo, N.Y.), WBFO (Buffalo’s NPR station), and the Buffalo News (Buffalo’s daily newspaper) were selected to represent local news media. CNN, FOX News, the Wall Street Journal and the New York Times were selected to represent national–level news media. The complete dataset covers all content posted from 1 June 2010 through 1 June 2012. This timeframe corresponds to the growing public interest in and media attention to the issue as reflected in keyword searches on Google News. Based on Google Trends, searches for the term hydrofracking in the U.S. started to pick up in January 2010 and peaked in November 2011. Searches for the term hydrofracking in New York picked up in April 2011, and the interest carried on until January 2012.

Keyword searches were conducted to filter comments about hydro–fracking. In total, 293 fracking–related posts were gathered from local news media pages. In all, 163 unique users commented about fracking on these local news pages. The national media dataset includes 154 posts and 134 unique users. The unit of analysis is each user who commented on the issue of hydro–fracking. Users were also coded for gender (0 = female, 1 = male). In our analysis, we applied the default LIWC2007 dictionary, which includes 4,500 words grouped into 76 linguistic categories. LIWC assigns each word in the text to a specific linguistic category and normalizes the total number of words in each category (Tausczik and Pennebaker, 2010).
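The filtering and grouping steps described above might be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the keyword list, the record fields (`user_id`, `text`), and the function names are all hypothetical, since the exact search terms and data layout are not reported.

```python
from collections import defaultdict

# Hypothetical keyword list; substring matching on "fracking" also
# catches variants such as "hydrofracking".
KEYWORDS = ("fracking", "hydraulic fracturing")


def mentions_fracking(text, keywords=KEYWORDS):
    """Return True if the comment contains any keyword, ignoring case."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def group_by_user(comments, keywords=KEYWORDS):
    """Collect the texts of fracking-related comments under each user ID."""
    per_user = defaultdict(list)
    for comment in comments:
        if mentions_fracking(comment["text"], keywords):
            per_user[comment["user_id"]].append(comment["text"])
    return dict(per_user)


# Invented sample records, used only to exercise the functions.
sample = [
    {"user_id": "u1", "text": "Fracking will ruin our water."},
    {"user_id": "u2", "text": "Go Bills!"},
    {"user_id": "u1", "text": "No hydrofracking in my town."},
    {"user_id": "u3", "text": "Hydraulic fracturing creates jobs."},
]
users = group_by_user(sample)
print(len(users), sum(len(texts) for texts in users.values()))
```

Grouping by user rather than by comment mirrors the choice of the user as the unit of analysis: each user’s comments can then be pooled before word categories are counted.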




In the local dataset, there were 76 (46.6 percent) female and 87 male users. They sent on average 1.8 comments and used 69 words per comment. The national dataset consisted of 72 female (53.7 percent) and 62 male users. Each national user sent an average of 1.15 comments and used 49 words. Table 1 presents summary statistics for users and their comment behavior.


Table 1: Descriptive statistics for local and national users.
Dataset     Users (gender)   Word count                 Posts
Local       76 (Male)        M=78.85 (SD=122.21)        293
            87 (Female)      M=56.86 (SD=70.31)
National    62 (Male)        M=55.71 (SD=80.26)         154
            72 (Female)      M=43.43 (SD=64.28)


A series of one–way ANOVAs were used to test linguistic differences between local and national users. We applied log–transformation to normalize distributions.
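As a rough illustration of this step, the sketch below computes a one–way ANOVA F statistic on log–transformed scores using only the standard library. The exact transformation is not reported, so the log(x + 1) form is an assumption, as are the function names and the sample values, which are invented.

```python
import math


def log_transform(values):
    """Apply log(x + 1); the +1 offset is an assumption so that zeros are defined."""
    return [math.log(v + 1.0) for v in values]


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented category percentages for a handful of users in each group.
local = log_transform([12.0, 9.5, 14.2, 11.1])
national = log_transform([8.3, 10.0, 7.9, 9.1])
f_stat, df_b, df_w = one_way_anova([local, national])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

With two groups of 163 and 134 users, the degrees of freedom are 1 and 163 + 134 - 2 = 295, which matches the F (1, 295) form of the statistics reported in this section.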

Table 2 presents findings from the one–way ANOVA. Recall that hypothesis 1 addresses the use of exclusion words, conjunctions, prepositions, cognitive mechanism words, and words greater than six letters. Results show no significant differences between local and national users’ language as a function of their community ties. Thus, hypothesis 1 was not supported.


Table 2: ANOVA for local and national user comparison.


Hypothesis 2 addresses the use of pronouns and was supported, F (1,295)=4.87, p <.05. Local users (M=2.60, SD=.68) used significantly more pronouns than national users (M=2.42, SD=.71). Specifically, we found a significant difference in usage of personal pronouns between local users and national users, F (1,295)=9.20, p <.01, where local users (M=2.09, SD=.65) used significantly more personal pronouns than national users (M=1.86, SD=.65).

With regard to hypothesis 3, there were significant differences in the use of swear words, F (1, 295)=6.97, p <.01, anxiety words, F (1, 295)=7.03, p <.01, and anger words, F (1, 295)=9.00, p <.01. Local users (M=1.174, SD=.24) used significantly more swear words than national users (M=1.11, SD=.07). Local users also used significantly more anxiety words (M=1.22, SD=.32) than national users (M=1.14, SD=.16). Finally, local users used more anger words (M=1.36, SD=.49) than national users (M=1.22, SD=.28). Thus, hypothesis 3 was supported.

The results revealed no significant difference in the use of emotional words; thus, hypothesis 4 was not supported. Hypothesis 5 addresses the use of social words. Results show that local users used more social words (M=2.31, SD=.67) than national users (M=2.10, SD=.71), F (1,295)=7.10, p <.01. Thus, hypothesis 5 was supported. Lastly, the results offer no support for hypothesis 6, which addressed the use of first–person plural pronouns.

Hypothesis 7 focuses on gender–based linguistic style and local community ties. To test the interaction between gender and community tie, we applied univariate ANOVA. However, the results offer no support for hypothesis 7.



Discussion and conclusion

This paper reports on an exploratory attempt to use automatic linguistic analysis for understanding commenting behavior on social media outlets regarding issues in the public sphere. We examined whether individuals who post to local and national media outlets about the issue of hydro–fracking express different levels of community tie, reflected by linguistically distinct narratives. This is an important area for research considering that user–generated comments on social media Web sites are potentially rich resources of data for evaluating opinion about a range of topics. The key contention of the current research is that the linguistic characteristics of online conversations differ systematically based on users’ ties to their local community. Specifically, we hypothesized that those users who post on local media Web sites should show higher levels of community tie. We set out to test whether local users’ higher community tie is reflected in the linguistic characteristics of their online conversations.

We began by analyzing differences in language use which reflect cognitive engagement and complexity. It was hypothesized that local conversations would demonstrate greater use of exclusive words, conjunctions, prepositions, cognitive mechanism words and words with more than six letters than national conversations. However, this hypothesis was not supported. There does not seem to be any systematic difference in language use as reflected by the complexity of the conversations. Perhaps this is not surprising considering that individuals who consume news about public affairs and engage in conversations about those affairs are generally well educated. As such, they converse at the same level of complexity, regardless of their level of community engagement.

Next, we found that local users used more personal pronouns, which are associated with personal involvement in conversations (Tausczik and Pennebaker, 2010). We expected this relationship because users posting on local media Web pages are more likely to be directly affected by the issue under discussion, as opposed to those who post on national media sites. In this case, these users are personally invested in the outcomes of debates on hydro–fracking, and their language use is a reflection of this condition. We also hypothesized that local conversations should include more anxiety, swear, and anger–related language. This hypothesis was supported, and provides additional evidence that local conversations are more personal in nature. These are important distinctions to note when considering using online conversations as a proxy for public opinion about issues. The results show that there are systematic differences in linguistic style which are a consequence of locality.

Although no differences were found for the level of emotional language use in online conversations, we found that local conversations were characterized by use of more social words, as expected. Recall that social words are a function of social ties and social/interpersonal behavior. Again, this style of language was expected because of the interpersonal relationships local users have with others living in the geographic areas (or, communities) affected by hydro–fracking. The potential health consequences associated with hydro–fracking heighten the level of concern local users have for their friends and family. The language use reflects users’ belief that the issues surrounding the practice of hydro–fracking are not restricted to just environmental issues. Rather, for these users the issue is viewed as increasingly personal and interpersonal. This is consistent with the use of more swear and anxiety–related words as well, which signifies concern over the range of consequences associated with hydro–fracking. It was also hypothesized that geographic proximity of local conversation participants would result in increasingly pro–social and polite language and language signaling group identity, as reflected in use of first–person plural pronouns. The results do not support this hypothesis.

Finally, we expected systematic differences in linguistic style based on gender. Recall that we operationalized gender as a cultural difference, and hypothesized that female conversations should exhibit higher levels of self–pronoun use, anxiety–related language, and social language because of gender role differences. However, the results show that there were no significant differences between genders based on local vs. national comparisons. Post hoc analysis, however, shows that local female users used more first–person pronouns, anger words, anxiety words, and swear words than national female users. This finding is expected. Given the salience of the personal and interpersonal impact of hydro–fracking in the local community, female users’ comments will reflect their personal involvement in the issue and concerns over their own and others’ well–being.

The current research is not without limitations. First, this study is restricted to online conversations about hydro–fracking. Although hydro–fracking is a salient environmental, social and political issue, it may well be distinct from other typical energy and political issues. The unique nature of hydro–fracking may result in public discourse systematically different from discourses concerning other issues. Future research should apply an approach similar to the one used herein to a range of other issues in the public sphere.

Second, the relationship between local media use and one’s involvement in local community has not been empirically tested, although the results presented herein support that contention. It is possible that those who posted on national media Web sites also follow local media. It is also possible that local users followed national media for national news that could impact local communities. Even if the association between local media use and community involvement is empirically validated, there are potential confounds in this association. It could be that differences in local and national coverage of hydro–fracking drove differences in linguistic styles of users’ comments on the issue. Since previous studies showed that local media localized news events by emphasizing local impacts, local coverage of hydro–fracking may produce narratives that are more personally relevant and sensational to local users. Future studies should attempt to examine the media content to which users are responding.

In addition, we observed that there were two forms of news commenting behavior. One was non–directed commenting, in which users posted feedback to news postings without specifying recipients of the feedback. In this case, it can be assumed that the user was simply replying to the news media’s posting per se. In contrast, users can direct their comments at someone who also commented on the news posting. In this kind of directed news commenting, users can personally address the issue to specified audiences. We suspect that the social nature of directed commenting may result in greater use of personal and social language. We suggest that future studies distinguish these two types of news commenting behavior.

Therefore, readers are advised to interpret our results with caution. Despite these limitations, our study points to the possibility of using automatic linguistic analysis for public opinion mining. It also points to the necessity of identifying users’ attributes when mining public opinion through social media data.


About the authors

Weiai Wayne Xu is a doctoral student in the Department of Communication at the University at Buffalo, The State University of New York. He is interested in political engagement on social networking sites and social impact of technology use.
Direct comments to: weiaixu [at] buffalo [dot] edu

Liangyue Li is a doctoral student in the Department of Electrical and Computer Engineering at Northeastern University in Boston, Mass.
E–mail: shengli [at] ece [dot] neu [dot] edu

Dr. Michael A. Stefanone is Associate Professor in the Department of Communication at the University at Buffalo, The State University of New York, and Director of the Department’s Singapore Program.
E–mail: ms297 [at] buffalo [dot] edu

Dr. Yun Fu is Assistant Professor and Founding Director of the SMILE Lab in the Department of Electrical and Computer Engineering at Northeastern University.
E–mail: yunfu [at] ece [dot] neu [dot] edu



This work was supported by the U.S. Air Force Office of Scientific Research (AFOSR) [GRANT10768352].



B. Anderson, 1983. Imagined communities: Reflections on the origin and spread of nationalism. London: Verso.

K.T. Andrews and N. Caren, 2010. “Making the news: Movement organizations, media attention, and the public agenda,” American Sociological Review, volume 75, number 6, pp. 841–866.

M.D. Back, A.C. Küfner, and B. Egloff, 2011. “‘Automatic or the people?’ Anger on September 11, 2001, and lessons learned for the analysis of large digital data sets,” Psychological Science, volume 22, number 6, pp. 837–838.

S. Brownlow, J.A. Rosamond, and J.A. Parker, 2003. “Gender–linked linguistic behavior in television interviews,” Sex Roles, volume 49, numbers 3–4, pp. 121–132.

B. Debatin, J.P. Lovejoy, A.–K. Horn, and B.N. Hughes, 2009. “Facebook and online privacy: Attitudes, behaviors, and unintended consequences,” Journal of Computer–Mediated Communication, volume 15, number 1, pp. 83–108.

A.H. Eagly, 1987. Sex differences in social behavior: A social–role interpretation. Hillsdale, N.J.: L. Erlbaum Associates.

S.S. Gartner, 2004. “Making the international local: The terrorist attack on the USS Cole, local casualties, and media coverage,” Political Communication, volume 21, number 2, pp. 139–159.

A. Gruzd, B. Wellman, and Y. Takhteyev, 2011. “Imagining Twitter as an imagined community,” American Behavioral Scientist, volume 55, number 10, pp. 1,294–1,318.

W.B. Gudykunst and Y.Y. Kim, 1984. Communicating with strangers: An approach to intercultural communication. Reading, Mass.: Addison–Wesley.

J.R. Gusfield, 1975. The community: A critical response. New York: Harper & Row.

J. Guiller and A. Durndell, 2007. “Students’ linguistic behaviour in online discussion groups: Does gender matter?” Computers in Human Behavior, volume 23, number 5, pp. 2,240–2,255.

S. Freud, 1995. The basic writings of Sigmund Freud. Translated and edited by A.A. Brill. New York: Modern Library.

E. Hargittai, 2007. “Whose space? Differences among users and non–users of social network sites,” Journal of Computer–Mediated Communication, volume 13, number 1, pp. 276–297.

E. Hargittai and E. Litt, 2011. “The tweet smell of celebrity success: Explaining variation in Twitter adoption among a diverse group of young adults,” New Media & Society, volume 13, number 5, pp. 824–842.

C. Haythornthwaite and L. Kendall, 2010. “Internet and community,” American Behavioral Scientist, volume 53, number 8, pp. 1,083–1,094.

H.H. Hiller and T.M. Franz, 2004. “New ties, old ties and lost ties: The use of the internet in diaspora,” New Media & Society, volume 6, number 6, pp. 731–752.

M.J. Hine, S.A. Murphy, M. Weber, and G. Kersten, 2009. “The role of emotion and language in dyadic e–negotiations,” Group Decision and Negotiation, volume 18, number 3, pp. 193–211.

L.H. Hoffman and W.P. Eveland, 2010. “Assessing causality in the relationship between community attachment and local news media use,” Mass Communication and Society, volume 13, number 2, pp. 174–195.

R.W. Howarth, A. Ingraffea, and T. Engelder, 2011. “Natural gas: Should fracking stop?” Nature, volume 477, number 7364 (15 September), pp. 271–275.

G. Hofstede, 1980. Culture’s consequences: International differences in work–related values. Beverly Hills, Calif.: Sage.

P. Kaniss, 1991. Making local news. Chicago: University of Chicago Press.

T. Kang, 2009. “Homeland re–territorialized: Revisiting the role of geographical places in the formation of diasporic identity in the digital age,” Information, Communication & Society, volume 12, number 3, pp. 326–343.

H. Khalid and A. Dix, 2010. “The experience of photologging: Global mechanisms and local interactions,” Personal and Ubiquitous Computing, volume 14, number 3, pp. 209–226.

P.J. Kiernan, 2012. “An analysis of hydrofracturing gubernatorial decision making,” Albany Government Law Review, volume 5, number 3, pp. 769–914.

L. Komito and J. Bates, 2009. “Virtually local: social media and community among Polish nationals in Dublin,” Aslib Proceedings, volume 61, number 3, pp. 232–244.

D.N. Maltz and R.A. Borker, 1982. “A cultural approach to male–female miscommunication,” In: J.J. Gumperz (editor). Language and social identity. New York: Cambridge University Press, pp. 195–216.

J.M. McLeod, K. Daily, Z. Guo, W.P. Eveland, J. Bayer, S. Yang, and H. Wang, 1996. “Community integration, local media use, and democratic processes,” Communication Research, volume 23, number 2, pp. 179–209.

G.S. Mesch and I. Talmud, 2010. “Internet connectivity, community participation, and place attachment: A longitudinal study,” American Behavioral Scientist, volume 53, number 8, pp. 1,095–1,110.

A. Mulac, J.J. Bradac, and P. Gibbons, 2001. “Empirical support for the gender–as–culture hypothesis,” Human Communication Research, volume 27, number 1, pp. 121–152.

U. Pfeil, R. Arjan, and P. Zaphiris, 2009. “Age differences in online social networking — A study of user profiles and the social capital divide among teenagers and older users in MySpace,” Computers in Human Behavior, volume 25, number 3, pp. 643–654.

E.W. Rothenbuhler, 2001. “Revising communication research for working on community,” In: G.J. Shepherd and E.W. Rothenbuhler (editors). Communication and community. Mahwah, N.J.: Lawrence Erlbaum Associates, pp. 159–179.

M.G. Russell, J. Flora, M. Strohmaier, J. Poschko, and N. Rubens, 2011. “Semantic analysis of energy–related conversations in social media: A Twitter case study,” paper presented at the International Conference of Persuasive Technology (Persuasive 2011).

B.F. Schaffner and P.J. Sellers, 2003. “The structural determinants of local Congressional news coverage,” Political Communication, volume 20, number 1, pp. 41–57.

M.A. Stefanone and C–Y. Jang, 2007. “Writing for friends and family: The interpersonal nature of blogs,” Journal of Computer–Mediated Communication, volume 13, number 1, pp. 123–140.

Y. Takhteyev, A. Gruzd, and B. Wellman, 2012. “Geography of Twitter networks,” Social Networks, volume 34, number 1, pp. 73–81.

Y.R. Tausczik and J.W. Pennebaker, 2010. “The psychological meaning of words: LIWC and computerized text analysis methods,” Journal of Language and Social Psychology, volume 29, number 1, pp. 24–54.

M. Thelwall, D. Wilkinson, and S. Uppal, 2010. “Data mining emotion in social network communication: Gender differences in MySpace,” Journal of the American Society for Information Science and Technology, volume 61, number 1, pp. 190–199.

S. Tom Tong and J.B. Walther, 2011. “Just say ‘no thanks’: Romantic rejection in computer–mediated communication,” Journal of Social and Personal Relationships, volume 28, number 4, pp. 488–506.

A. Tumasjan, T.O. Sprenger, P.G. Sandner, and I.M. Welpe, 2010. “Election forecasts with Twitter: How 140 characters reflect the political landscape,” Social Science Computer Review, volume 29, number 4, pp. 402–418.

J. Ugander, B. Karrer, L. Backstrom, and C. Marlow, 2011. “The anatomy of the Facebook social graph,” arXiv (18 November).

M.A. Walker, P. Anand, R. Abbott, J.E.F. Tree, C. Martell, and J. King, 2012. “That is your evidence? Classifying stance in online political debate,” Decision Support Systems, volume 53, number 4, pp. 719–729.

N. Wang, M. Kosinski, D.J. Stillwell, and J. Rust, 2012. “Can well–being be measured using Facebook status updates? Validation of Facebook’s gross national happiness index,” Social Indicators Research (February).

W. Weintraub, 1989. Verbal behavior in everyday life. New York: Springer.

B. Wellman, 1988. “Structural analysis: From method and metaphor to theory and substance,” In: B. Wellman and S.D. Berkowitz (editors). Social structures: A network approach. Cambridge: Cambridge University Press, pp. 19–61.

B. Wellman, 1979. “The community question: The intimate networks of East Yorkers,” American Journal of Sociology, volume 84, number 5, pp. 1,201–1,231.

T. Yarkoni, 2010. “Personality in 100,000 words: A large–scale analysis of personality and word use among bloggers,” Journal of Research in Personality, volume 44, number 3, pp. 363–373.

S.J. Yates, 1997. “Gender, identity and CMC,” Journal of Computer Assisted Learning, volume 13, number 4, pp. 281–290.


Editorial history

Received 2 August 2013; accepted 30 November 2013.

Copyright © 2014, First Monday.
Copyright © 2014, Weiai Wayne Xu, Liangyue Li, Michael A. Stefanone, and Yun Fu.

Does social media users’ commenting behavior differ by their local community tie? A computer–assisted linguistic analysis approach
by Weiai Wayne Xu, Liangyue Li, Michael A. Stefanone, and Yun Fu.
First Monday, Volume 19, Number 1 - 6 January 2014
doi: 10.5210/fm.v19i1.4821.
