Using the Web to Examine the Evolution of the Abortion Debate in Australia 2005-2015

<h1>

Twitter and Facebook. Our research also demonstrates the difficulty of distinguishing behavioural change on the web relating to social phenomena (such as changing attitudes or policies relating to abortion) from technology-induced behavioural change (resulting from the emergence of social media, for example).

<h1>2. Abortion in Australia
Abortion has been widely available in Australia since the early 1970s. While available, it was still legislated through various criminal codes rather than through health legislation until the 1990s. In 2015, abortion is legal in most states but is still a highly contested and controversial issue, although the debate does not have the same heat as that in the US (Albury, 1999). In 2004 the then Federal Health Minister, Tony Abbott, declared an "abortion epidemic" in Australia. Although this claim was disputed by some, few data were available at the time to respond to it. This led to a parliamentary library report on abortion data collection (Parliamentary Library, 2005). Wyatt and Hughes (2009) argue that conservative politics enabled and encouraged a resurgence of the abortion debate in 2004. However, Siedlecky (2005), a long-time pro-choice commentator, insists that the debate never really goes away. McLaren (2013) suggests that the debate in Australia continues because the position of each side of the debate does not change. She argues this is because the debate is grounded in symbolism and emotion, which leads to the language around the arguments also remaining unchanged over time.
Abortion is legislated by individual states and territories and there is considerable inconsistency across the different jurisdictions. In the 10-year period being examined in this chapter, there has been legislative change in Victoria, Queensland (twice) and Tasmania. One high-profile court case also occurred in this period. In April 2009, a 19-year-old Cairns woman was charged with procuring her own miscarriage. Her partner was charged with assisting her. The case was heard in the Cairns District Court in October 2010, where the jury returned not guilty verdicts on both charges. The charges related to the use of RU486. 1
At the federal level there has been a Senate debate on transparency in advertising (a bill designed to ensure consumers are aware of the abortion stance of pregnancy counselling services). There has also been a Senate debate about removing funding subsidies for second-trimester abortions. In 2006 the ban on using the drug known as RU486 was lifted, paving the way for the provision of a medical alternative to surgical abortion. In 2012 RU486 was registered by the Therapeutic Goods Administration, and in 2013 it was made available under the Pharmaceutical Benefits Scheme. 2 Since 2005 there has been continuous debate and legislative change surrounding the provision of abortion services in Australia. Federal changes to the availability and provision of RU486 are evident in the analysis in this chapter.

<h1>3. Hyperlink network and text content analysis: some background
In this section, we provide some context for the two analytical techniques used in this chapter.

<h2>3.1 Hyperlink network analysis
Hyperlink network analysis involves the construction of a network where the nodes are websites or web pages and the connections between nodes are hyperlinks. In the present research we focus on networks of websites, since we are more interested in mapping the inferred connections between organisations or groups who are involved in the abortion debate, rather than the connections between individual resources (web pages).
1 RU486 is the common name for the abortion drug mifepristone, which is typically used in combination with misoprostol.
2 The Pharmaceutical Benefits Scheme makes pharmaceutical products available at subsidised prices.
Social scientists recognised the potential of hyperlink networks to provide insights into society in the first phase of the development of the web, what is now known as the Web 1.0 era. 3 However, while Jackson (1997) argued that the idea of using "a methodology based on the metaphor of a network to examine a communication medium based on the metaphor of a web seems to be so obvious that it threatens to be trivial", the author was not convinced that concepts and methods from social network analysis (SNA) could successfully be implemented with hyperlink networks. In particular, Jackson had concerns about whether nodes in a hyperlink network (pages or sites) could reasonably be described as social actors and also whether a hyperlink network could satisfy one of the core assumptions of SNA: the interdependence of actors. Other authors have similarly expressed caution about the potential use of hyperlink networks for social science research, with Park and Thelwall (2003) noting that hyperlink data can be used to "...potentially discern fingerprints of social relations...", and Brügger (2012) suggesting that hyperlink data may need to be supplemented by other data and methods (for example, interviews) in order for a website to be equated with a node and a hyperlink to be equated with a tie, and for social network analysis techniques to be applied.
The evolution of the web from Web 1.0 to Web 2.0, where there is a blurring of the distinction between webmasters and users with social media services such as Facebook and Twitter enabling nontechnical people to both produce and consume content, has led to broader interest from social scientists in the web as a source of data for social network research. Ackland (2009) noted that while both theoretical and methodological concerns can make it challenging to regard an unobtrusively-collected hyperlink network as a social network, many of these concerns are not present in the case of networks derived from social network services such as Facebook.
3 A typical Web 1.0 website provides content (often reflecting organisational goals, background, services etc.) that does not change regularly and does not allow a lot of interactivity.
It is now possible to describe a typology of online networks, and the place of hyperlink networks within this typology. For example, Ackland and Zhu (2015) identify two dimensions of ties in online networks (Table 1): directionality refers to whether a tie between any pair of nodes is directed versus undirected, while manifestation refers to the substantiality of the relations between nodes, with active acts (e.g., invitation, acceptance) leading to explicit ties, while implicit ties are more inferred (e.g., co-occurrence or interactions). The typology leads to four categories or types of online networks.
Networks which are the closest to the classic notion of social networks result from explicitly undirected ties, i.e., friendships that require mutual consent to establish (Facebook is an example). Explicitly directed ties involve a one-way, public (or broadcast) mode of relations among users (Twitter is an example). Implicitly undirected ties are inferred by social network analysts post hoc, based on semantic similarity (e.g., co-usage or co-occurrence of keywords or tags) between pairs of nodes (the Flickr photo tagging site is an example). Finally, implicitly directed ties can be extracted from the interactions of people in newsgroups or blogs; these ties are implicit because while a person might reply or respond to another person in a newsgroup, such "opinion exchanges" are really only inferred connections between the people. Hyperlinks between websites are also examples of implicitly directed ties, since their existence implies a connection between the sites (or the organisations running the sites) but the exact nature of the connection is generally unknown to the researcher (in the context of large-scale unobtrusive data collection).
[ Table 1 about here]
Hyperlinks have been described as the "essence of the Web" (Jackson, 1997; Foot et al., 2003) and their implicit nature means that various interpretations have been ascribed to the existence of a hyperlink between two websites. At one level, a hyperlink can be thought of purely in terms of information provision and hence a sign of authority (Kleinberg, 1999) or trust (Davenport & Cronin, 2000) regarding the information on the page that is being hyperlinked to and the author of the information (the website owner). However, in the context of debate or contention over a social issue (such as abortion), it is also relevant to think of hyperlinks as reflecting communicative or strategic choices (Rogers & Marres, 2000), organizational alliance building and message amplification (Park, Kim & Barnett, 2004), tools for the construction of information public goods in the context of collective action (Fulk et al., 1996; Shumate & Dewitt, 2008) and online collective identity (Ackland & O'Neil, 2011).
We note that all these potential interpretations of the meaning of a hyperlink imply that the tie has positive affect, that is, the sender of the hyperlink is attempting to confer some positive benefit on the receiver of the hyperlink. As noted by authors such as Brügger (2012), in the absence of some form of analysis of the text surrounding the hyperlink (for example, the "anchor text"), it is problematic to assume that a hyperlink has positive affect. However, it is technically challenging to conduct such text analysis in the context of large-scale unobtrusive data collection. In their study of the hyperlink networks of refugee and asylum seeker advocacy groups in Australia, Lusher and Ackland (2011) purposely did not include government websites such as the Commonwealth Department of Immigration in the analysis (even though this site was connected to advocacy group websites) precisely for the reason that they wanted to remove negative affect relations from the network (e.g. advocacy groups linking to government policies they do not agree with), which would have complicated the interpretation of hyperlinks in that study.
In studying hyperlink networks of actors who are participants in debates over social or political issues, attention has focused on assessing the extent of connections between opposing sides in the debate. Adamic and Glance's (2005) "Divided they blog" study found evidence of marked polarisation in the US political blogosphere. Hargittai, Gallo, and Kane (2008) also found substantial polarisation among political blogs but no evidence that this was increasing over time, leading the authors to refute the existence of cyberbalkanization, a fragmenting of the online population into narrowly-focused groups of individuals who share similar opinions (Putnam, 2000; Sunstein, 2001).
Another interesting aspect of this research has been whether conservatives and liberals display similar levels of political homophily, or the tendency to connect with actors of similar political persuasion. Adamic and Glance (2005) found some evidence that conservative weblogs tended to cite other conservative weblogs more frequently than liberal weblogs cited other liberal weblogs: "Through...visualizations, we see that right-leaning blogs have a denser structure of strong connections than the left, although liberal blogs do have a few exceptionally strong reciprocated connections." (Adamic and Glance 2005, p. 40) Ackland and Shorish (2014) did not find evidence of marked differential political homophily using the 2004 blog data collected by Adamic and Glance (2005), but with a replication of the Adamic and Glance dataset collected in 2011, Ackland and Shorish found that a conservative weblog was around 8 times more likely to hyperlink to another conservative weblog, while a liberal blogger was only about 4 times more likely to link to another liberal blogger.

<h2>3.2 Text content analysis
The second analytical approach used in this chapter is quantitative analysis of web content, and it is useful to first briefly summarise the key features of this approach.
First, it is necessary to identify the population that is being studied and the sampling approach (this point is also relevant to the construction of the hyperlink network). If the objective of the study is to only collect content from websites of organisations that have an offline presence, then it may be possible to sample from, for example, official registers of such organisations (e.g. a register of nonprofit organisations). However, in many examples of Web 1.0 research it is not possible to identify the population from which a sample is being drawn. Authors such as Lusher and Ackland (2011) and Ackland and O'Neil (2011) have identified samples of websites using techniques similar to those proposed by Rogers and Zelman (2002) for researching "issue networks": entering key words or terms into search engines to identify relevant websites and then using a web crawler to iteratively discover other relevant websites (this is an example of what Rogers and Zelman refer to as "public trust logics": finding groups commonly linked to by players trusted to be important in the debate). 4 This technique of using a search engine and web crawler to construct a sample of websites is a form of snowball sampling and it must be emphasised that it does not lead to a representative sample. Caution is therefore required when making inferences about the underlying population, based on analysis of the sample.
Second, one needs to decide whether the focus is on the manifest content (content that exists objectively and unambiguously in the text i.e. what the author actually wrote) or the latent content (content that is more conceptual and not directly observed in the text i.e. what the author meant). Social scientific quantitative web content analysis will often involve latent content; for example, Ackland et al. (2010) conducted a principal components analysis of content from websites of organisations involved in nanotechnology (manufacturing, research, commercialisation) and found three main discourses or orientations: an industrial or proactive discourse (focusing on business, investment and opportunity), a science or education discourse, and a social or critical discourse (stressing health risks and the need for political discussion).
4 A web crawler is software that automatically traverses a website, in a manner similar to the way a human user enters the homepage of a website, and then clicks internal links to visit other parts of the website. The crawler can be designed to collect and store text content and hyperlinks (both internal and external) from each page it visits.

<h1>4. Constructing the sample of websites via Google search results
We attempted to use exactly the same approach for data collection in both 2005 and 2015. Our first step was to search Google using the query "abortion Australia", and collect the top 500 pages returned. The 2005 data were collected in October 2005 (Ackland and Evans, 2005) and the 2015 data were collected in June of that year.
When we collected the 2005 data, Google was the dominant search engine and while there are now significant competitors in the search space (e.g. Bing), Google is still the dominant search engine today. 5 We chose to use Google as our starting point because in 2005 search engines were, and still are today, a first step for many people who are seeking information. Hence, we contend that by using Google we are constructing a sample of websites that people searching for information on abortion are most likely to encounter. 6 In our present example, it is not possible to identify the population of websites run by organisations engaged in the abortion debate in Australia, and instead we are using the Google search engine to identify web pages which Google ranks as being relevant to the topic, and then we identify our sample of websites (the "seed sites") from the list of returned web pages. A second step would be to use the web crawler to identify further websites relevant to the study: a website that sends a hyperlink to or receives a hyperlink from one of our seed sites might also be run by an organisation or group engaged in the abortion debate in Australia (even if it was not in the list of sites returned by Google). 7 As noted above, it is important to mention that our sample is not representative of the underlying population (which, in this case, cannot be identified).
5 According to Experian Hitwise, Google Australia was the top-ranked website in January 2016 with an 11.2% share of traffic, and no other search engine was in the top-10 (source: http://www.experian.com.au/hitwise/online-trends.html, accessed 27th January 2016).
6 These search results would most likely have been affected by Google search customisations associated with the location (based on IP address) of the computer which was used for the search (there are national, and potentially even sub-national differences in search results). There is also a chance the browse and search history of the computer used for conducting the search could have impacted on the search results. We note these potential biases in the search results, but it was beyond the scope of this chapter to investigate their magnitude and significance.
It should also be noted that while our search query was designed to locate web pages focused on the issue of abortion in Australia, we placed no restriction on the actual geographic location of the website or the organisation running the website. That is, we did not restrict Google to return pages only from websites in Australia (based on either IP address or country code top-level domain) and similarly, we did not attempt to identify (using the whois service, for example) the geographic location of the organisation. Thus, as will be clear below, some of the sites in our sample are not Australian, but they are still participating in the abortion debate in Australia via the nature of the content hosted on their websites.
As discussed above, while our unit of data collection is the web page, our analysis is conducted at the level of the website, and our search resulted in 343 unique websites in 2005 and 376 websites in 2015. We identified websites using the hostname part of the URL. For example, Family Planning New South Wales (NSW) has two web pages that were collected in the 2015 Google search: http://www.fpnsw.org.au/374118_8.html, and http://www.fpnsw.org.au/144423_8.html; in the analysis, these pages were collapsed to a single website, based on the hostname: www.fpnsw.org.au. 8
7 We did not take this step here because our initial Google search was fairly extensive and we expect that including additional sites in the analysis is unlikely to qualitatively impact on the research findings.
8 We acknowledge that the use of hostnames is a somewhat rudimentary way of representing websites (and indeed groups or organisations). For example, it could be that a single organisation has more than one sub-domain (e.g. subdomain1.website.com and subdomain2.website.com) and both of these hostnames would be present in the dataset. Another problem is that different organisations could share a hostname (e.g. that of a commercial web hosting company), and these different organisations would then effectively be merged into a single data point. Casual inspection of our data leads us to conclude that this is not a major problem, in that it would not impact qualitatively on our results.
We classified the websites according to abortion stance (Table 2) and the type of site or the organisation/group running the site (Table 3). The classification of abortion stance was done manually. Each site was viewed and a judgement was made about the stance of the organisation. This was relatively easy in most cases; however, some required discussion and further investigation of the organisation or the host organisation. Generally, websites hosting academic articles were classified as neutral unless the article listed in the Google search results was clearly pushing one side of the debate.  Table 2). The proportion of sites that are neutral in the abortion debate was roughly constant between the two years, but the proportion of unrelated sites increased markedly from 3% in 2005 to 22% in 2015. This was due to a large increase in the number of spam pages (containing unrelated content) and "attack" pages (pages or sites identified by Google as potentially hosting malicious code designed to steal private information or otherwise damage computer systems) that appeared in the Google search results. Table 3 shows that the sites ranked by Google as being related to abortion in Australia became less academic (falling from 17% to 6% of the returned sites) and more commercial (growing from 3% to 20% of sites) over the last 10 years. The proportion of media sites increased from 12% to 20%. It is also notable that the presence of political parties and politicians declined; in 2005 there were 8 political party sites in the Google search results, but only 3 sites in 2015. The change for politician websites was even more marked, falling from 4 in 2005 to none in 2015.
Finally, there was a significant decline in religious presence, with the proportion of sites belonging to religious organisations halving from 9% to 4% and the proportion of religious media sites also declining (from 5% to 3%).
The above analysis was of all sites returned by the search query (which collected the first 500 search results), but the reality of search behaviour is that most people do not search beyond the first couple of pages of search results. In order to better assess the visibility of different participants in the abortion debate in Australia (and changes thereof in the past 10 years), Table 4 shows the top 20 sites
[ Table 4 about here]

<h1>5. Hyperlink network and website text content analysis
The VOSON software (Ackland, 2010) incorporates a web crawler, which was used to collect hyperlink and website text content data (meta keywords, body text) in both years. The 2005 hyperlink and text data were collected in October 2005, while the 2015 data were collected in June 2015. So, while this research involves analysis of historical web data (from 2005), the data were collected and archived by the authors using the VOSON software in 2005, rather than via access to institutional repositories of archived web data (we return to this in the discussion section below).
The first step in the data collection involved setting the crawler parameters such that the crawler would visit each of the "seed" pages returned by the Google searches, collect text content from each page, and then leave the page. That is, in this first step, the crawler was set so it would not iteratively crawl throughout the entire website, but only collect text content from the seed page. This was done for practical reasons (the version of VOSON in 2005 was more limited in the amount of text content it could store) but also for methodological reasons: the Google search engine has returned these pages because they contain text content relevant to the topic of abortion in Australia, and by allowing the crawler to collect text content from other pages in the website, this is likely to introduce irrelevant text content into the analysis (this is known as "topic drift" in information retrieval).
The second step in the data collection was the collection of hyperlinks. While text content was collected from all the seed pages, hyperlinks were only collected from the seed pages identified as belonging to websites that are participants in the abortion debate (i.e. either pro-life or pro-choice websites). Again, this was done in order to prevent "topic drift" (by crawling sites deemed irrelevant to the research topic, we would simply be collecting hyperlink data that would not be used in the research) and also as a means of preserving bandwidth resources. The VOSON crawler only collected outbound hyperlinks, and it stopped once it had either collected 1,000 links to external pages or crawled 100 internal pages.
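The stopping rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the VOSON implementation: the `fetch_links` callback, the breadth-first order, the treatment of "internal" as "same hostname as the seed", and the application of both limits per seed page are all our assumptions.

```python
from collections import deque
from urllib.parse import urlparse

MAX_EXTERNAL_LINKS = 1000   # stop after this many links to external pages
MAX_INTERNAL_PAGES = 100    # or after crawling this many internal pages


def crawl_outbound(seed_url, fetch_links):
    """Collect outbound hyperlinks starting from one seed page.

    fetch_links(url) is a hypothetical callback that returns the URLs
    linked from a page; a real crawler would download and parse HTML.
    """
    site = urlparse(seed_url).hostname
    queue = deque([seed_url])
    visited = set()      # internal pages crawled so far
    external = []        # (source page, external target) pairs

    while queue and len(visited) < MAX_INTERNAL_PAGES:
        page = queue.popleft()
        if page in visited:
            continue
        visited.add(page)
        for link in fetch_links(page):
            if urlparse(link).hostname == site:
                if link not in visited:
                    queue.append(link)           # internal: crawl later
            else:
                external.append((page, link))    # external: record the tie
                if len(external) >= MAX_EXTERNAL_LINKS:
                    return external, visited
    return external, visited
```

Whichever limit is reached first ends the crawl for that seed, so a link-heavy site is cut off by the external-link cap while a large but inward-looking site is cut off by the internal-page cap.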

<h2>5.1 Network-level analysis
As discussed in Section 4, our unit of analysis is the website rather than the web page, and this affects the construction of the hyperlink networks. Specifically, the crawling process results in a network of web pages, but a data processing step reduces this to a network of websites where, as was the case with the Google search data discussed above, nodes in this research are websites (identified by hostname) rather than web pages. Thus, in the case of Family Planning NSW, this organisation had 248 web pages in the hyperlink network of web pages (the two seed pages discussed above, and 246 pages that the VOSON crawler identified as being hyperlinked to by various seed pages); however, in the network of websites this organisation is represented by a single node, www.fpnsw.org.au, which reflects all the connections to and from pages in this website. This process of "collapsing" from a network of pages to a network of websites results in a significant reduction in the scale of the data. While the 2005 (2015) full network of pages (by "full", we mean it contains all the seed pages identified by the Google searches and all the new pages identified by crawling these pages) contains 40,776 (71,644) nodes, as shown in Table 5, the corresponding full network of websites contains only 13,240 (6,192) nodes.
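The collapsing step can be illustrated with a short sketch. It assumes a page-level edge list of (source URL, target URL) pairs and assumes that links between pages on the same host are dropped; the chapter does not state how such internal links were handled, and the function names are ours.

```python
from urllib.parse import urlparse


def hostname(url):
    """Return the hostname part of a URL (urlparse lower-cases it)."""
    return urlparse(url).hostname


def collapse_to_sites(page_edges):
    """Reduce a page-level edge list to a set of website-level ties.

    page_edges is an iterable of (source_url, target_url) pairs.
    Multiple page-level links between the same two hosts become one tie,
    and same-host links are discarded (an assumption of this sketch).
    """
    site_edges = set()
    for src, dst in page_edges:
        s, t = hostname(src), hostname(dst)
        if s != t:
            site_edges.add((s, t))
    return site_edges
```

For example, links from the two Family Planning NSW seed pages to two pages on a hypothetical host www.example.org would reduce to the single tie ("www.fpnsw.org.au", "www.example.org").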
[ Table 5 about here]
Table 5 shows key network statistics for four networks for each of the two years: the full network, the participant network (pro-life and pro-choice sites), and separate networks for each of the pro-life and pro-choice groups. 9 The first thing to note is that the size of the full network more than halved between 2005 and 2015 (from 13,240 to 6,192 nodes) and it also became more disconnected, with the number of connected components (sets of nodes that are connected) increasing from 3 to 27 and inclusiveness (the number of non-isolated nodes as a proportion of total network size) falling from 99.4% to 97.9%.
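The two connectivity measures just mentioned can be computed as in the sketch below. It makes two assumptions that the chapter does not state: tie direction is ignored when finding components (weak connectivity), and isolated nodes are not counted as components. The second assumption seems consistent with the reported figures, since an inclusiveness of 99.4% in a network of 13,240 nodes implies far more isolates than the 3 components reported for 2005.

```python
def network_summary(nodes, edges):
    """Return (number of components with 2+ nodes, inclusiveness).

    nodes is a collection of node identifiers; edges is an iterable of
    (source, target) pairs whose endpoints all appear in nodes.
    """
    neighbours = {}
    for src, dst in edges:
        if src != dst:                       # ignore self-links
            neighbours.setdefault(src, set()).add(dst)
            neighbours.setdefault(dst, set()).add(src)

    connected = [n for n in nodes if n in neighbours]
    inclusiveness = len(connected) / len(nodes)

    seen = set()
    components = 0
    for start in connected:
        if start in seen:
            continue
        components += 1                      # new component found
        stack = [start]
        seen.add(start)
        while stack:                         # depth-first traversal
            node = stack.pop()
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
    return components, inclusiveness
```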
The conclusion is that over the past 10 years, pro-life and pro-choice sites collectively significantly reduced the number of hyperlinks they make to other sites.
The decline in hyperlinking activity is even more apparent when we consider the subnetworks for participants (pro-life and pro-choice), and for these networks we can also see a marked decline in network density, which is the number of ties as a proportion of the total possible number of ties that could exist. Researchers such as Adamic and Glance (2005) have found some evidence that conservative actors create denser online networks, compared with their liberal counterparts. As shown in Table 5, the network densities for 2005 for the pro-life and pro-choice subnetworks were very similar (0.0306 for the pro-choice subnetwork, compared with 0.0319 for the pro-life subnetwork). However, when isolates are removed, there is some evidence that the pro-life network is more densely connected, with pro-life sites in 2005 creating 4.28% of the hyperlinks that potentially could be created and pro-choice sites only creating 3.75% of the potential hyperlinks. This difference remained in 2015 (at least as calculated for the networks with isolates removed). Table 5  The changes in the participant subnetwork are visually apparent in Figures 1 and 2. In these visualisations, node size is proportional to indegree and node colour reflects abortion stance (pro-life is red, pro-choice is blue). The force-directed graphing algorithm has produced clusters that are very clearly demarcated according to abortion stance, a visual representation of the existence of homophily in hyperlinking behaviour.
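As an illustration of the density measure, the sketch below computes density with and without isolates. It assumes a directed network without self-links, so that a network of n nodes has n(n-1) possible ties; the chapter does not specify the exact formula used, so this is one common convention rather than a description of the reported calculation.

```python
def density(nodes, edges, drop_isolates=False):
    """Ties as a proportion of possible ties in a directed network.

    With drop_isolates=True, nodes with no inbound or outbound tie are
    removed before the number of possible ties is calculated.
    """
    ties = {(s, t) for s, t in edges if s != t}   # unique ties, no self-links
    if drop_isolates:
        active = {n for tie in ties for n in tie}
        nodes = [n for n in nodes if n in active]
    n = len(nodes)
    if n < 2:
        return 0.0
    return len(ties) / (n * (n - 1))
```

Removing isolates shrinks the denominator but leaves the number of ties unchanged, which is why density rises when isolates are dropped.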
[ Figures 1 and 2 about here]

<h2>5.2 Prominent sites
There are many different node-level metrics that can be used to identify nodes that are taking significant or prominent roles within a network. In this chapter we focus on the simplest of these measures: indegree (number of inbound hyperlinks) as a measure of visibility and outdegree (number of outbound hyperlinks) as a measure of activity. Table 6 shows the top-20 sites by indegree in the full hyperlink networks for the two years. The most striking (but not unexpected) finding is the rise of social media; in 2005 Twitter, Facebook and YouTube either did not exist or had barely been launched, while in 2015 these were the top-3 sites in terms of indegree. 10 These sites are prominent because abortion-related sites are providing links to their accounts on social media (e.g. "follow us on Twitter") but these sites are also providing links to resources such as videos on YouTube. Media sites became more prominent over the last 10 years, with the number of media sites in the top-20 increasing from 5 to 7, and Australian media sites are relatively more highly ranked in 2015, compared with 10 years ago.
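Both measures are simple counts over the website-level ties, as the following sketch shows. It assumes the ties are available as unique (source site, target site) pairs; the function name and the default of 20 are ours.

```python
from collections import Counter


def degree_rankings(site_edges, k=20):
    """Return the top-k sites by indegree and by outdegree.

    site_edges is an iterable of unique (source_site, target_site) pairs.
    """
    site_edges = list(site_edges)
    indegree = Counter(dst for _, dst in site_edges)    # inbound ties
    outdegree = Counter(src for src, _ in site_edges)   # outbound ties
    return indegree.most_common(k), outdegree.most_common(k)
```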
[ Table 6 about here]
The apparent decline of the Web 1.0 presence of pro-life groups identified above is reinforced by   Table 6 that point to general changes in the web that have occurred over the past decade.
For example, two sites that were popular for hosting small websites run by individuals and groups (geocities.com, aol.com) were in the top-20 in 2005 but are no longer providing this service in 2015 (for more on Geocities, see Milligan 2016). It is also notable that in 2005 the second ranked site was adobe.com but in 2015 this site does not make the top-20 as PDFs are ubiquitous and website owners no longer feel the need to provide a link to the Adobe PDF reader.
10 The reader may wonder why, in Table 6, facebook.com and youtube.com are classified as "neutral" while twitter.com is classified as "unknown". The reason is that pages from Facebook and YouTube appeared in the 2015 Google searches, and these websites were classified as "neutral" since the companies hosting the sites are not participants in the abortion debate. In contrast, Twitter did not feature in the Google search results (but it was picked up from the web crawl), and hence it was not classified.
[ Table 7 about here]
Finally, Table 8 shows the top-20 sites on the basis of outdegree in the full network and it is apparent that while pro-life sites have declined relatively in terms of numbers of sites, they are still active in terms of their linking behaviour, with half of the sites in the top-10 being pro-life (in 2015 6 of the top-10 sites were pro-life). From this we can surmise that the relative decline in the visibility of pro-life sites on the web is more due to the decline in numbers of sites, rather than a decline in the number of hyperlinks being created.

<h2>5.3 Text analysis
Text analysis further deepens our understanding of the patterns described above. The text analysis presented here only involves manifest content (we do not attempt to discern latent content). We focus on what text content is prevalent on abortion-related websites (frequency analysis) and whether these keywords or terms are related to the type of organisation behind the website (pro-choice or pro-life).
The text analysis involves two types of text extracted from the web pages: "meta words" are words extracted from the page meta data (keywords, title, description), and "page words" are words extracted from the body of the web page. In the case of meta words, if a website owner used a pair of words in the meta keyword section of the web page (for example, "abortion clinic") then the pair of words is treated as a single term (i.e. it will appear as "abortion_clinic" in the text analysis). However, with the page words, only single words are used in the analysis, that is, "abortion clinic" would be split into two words, "abortion" and "clinic". The other thing to note is that for the analysis the words "abortion" and "australia" were excluded since they were likely to appear on all of the sites, given the search query, and hence do not add to the analysis. 11 Two types of visualisations are used. 12 Word clouds are a random placement of the words, with word size reflecting the number of times the word appeared across all of the sites in the group (pro-choice or pro-life). Comparison clouds provide a means of comparing across groups, by placing the word clouds for both groups on the same page and, importantly, they only display the words that are unique to each group.
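The term preparation described above can be sketched as follows. The visualisations in this chapter were produced in R, so this Python sketch is only an illustration of the logic: the function names, the three-word stop list and the regular expression used to split page text are our assumptions.

```python
import re
from collections import Counter

EXCLUDED = {"abortion", "australia"}   # query terms, dropped as described above
STOP_WORDS = {"and", "but", "the"}     # tiny illustrative stop list
DROPPED = EXCLUDED | STOP_WORDS


def meta_terms(keywords):
    """Treat each meta keyword phrase as one term, joined by underscores."""
    terms = ["_".join(phrase.lower().split()) for phrase in keywords]
    return [t for t in terms if t and t not in DROPPED]


def page_terms(body_text):
    """Split page text into single lower-case words."""
    words = re.findall(r"[a-z]+", body_text.lower())
    return [w for w in words if w not in DROPPED]


def unique_terms(group_a, group_b):
    """Term counts for terms that appear in one group only."""
    a, b = Counter(group_a), Counter(group_b)
    only_a = {t: c for t, c in a.items() if t not in b}
    only_b = {t: c for t, c in b.items() if t not in a}
    return only_a, only_b
```

Under these assumptions, the meta keyword "abortion clinic" is kept as the term "abortion_clinic", whereas the same phrase in page text contributes only "clinic", because "abortion" is excluded.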
In 2005 there was a noticeable difference in the meta words used by pro-life and pro-choice websites (Figures 3 & 4). The word cloud for pro-choice meta words is dominated by the words health, women, pregnancy, clinic, rights, information, whereas the word cloud for the pro-life meta words is dominated by Catholic, life, prolife, Christian, human, news, family. This shows the obvious association with religion and religious pages linked with the pro-life movement. In 2015 the difference in the type of words still exists (Figures 5 & 6). However, the websites of both the pro-life and pro-choice sides are using fewer meta words. This likely reflects a change in behaviour of webmasters in response to the fact that meta keywords are no longer as important as they used to be for ensuring appropriate search engine ranking, since search engines now make use of page text (and indeed, other information such as click-through behaviour in search results), in addition to meta words.
[Figures 3, 4, 5 and 6 about here]
11 We also used a word "stop list" to ensure that commonly used words (e.g. "and", "but", "the") were not included in the analysis.
12 The visualisations were created using the tm and wordcloud packages in the R statistical software.
For reasons of space, the word clouds for the page words are not displayed, but they follow a similar pattern to what was found with meta keywords, in terms of the comparison between pro-choice and pro-life sites. The pro-choice page words emphasise the service and health nature of pregnancy termination (services, access, public, safe, women, right, health). On the other hand, the pro-life page words are more focused on the individual (will, women, children, life, human, child, time). The overall number of page words in the word clouds does not decrease between 2005 and 2015, unlike what we found for meta keywords, and this supports our contention that the reduction of meta keywords was a response by webmasters who no longer saw them as being necessary for good search results.
As noted above, the comparison clouds highlight the differences between the language of the two sides of the abortion debate by only displaying the words that are different on each side of the debate. [...] was still yet to occur (Figure 9). The unique pro-life page words in 2005 include religious references, death, babies, human and cancer. The comparison cloud in 2015 shows an even greater focus of pro-choice sites on services relating to abortion while, as was found for the meta keywords, the pro-life sites present a more diffuse set of words with no apparent major themes (Figure 10).
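The idea behind the comparison, as described in this chapter (showing only the words that appear on one side of the debate), can be sketched in Python as follows. This is a simplified illustration and not the R implementation used to draw the figures; the function name and the word counts are hypothetical.

```python
from collections import Counter


def unique_terms(group_a, group_b):
    """Return the terms (with counts) that occur in one group but not the other."""
    only_a = Counter({t: c for t, c in group_a.items() if t not in group_b})
    only_b = Counter({t: c for t, c in group_b.items() if t not in group_a})
    return only_a, only_b


# Hypothetical word counts (not data from the study).
pro_choice = Counter({"services": 5, "health": 4, "women": 3})
pro_life = Counter({"life": 6, "women": 2, "human": 3})

only_choice, only_life = unique_terms(pro_choice, pro_life)
print(only_choice.most_common())  # [('services', 5), ('health', 4)]
print(only_life.most_common())    # [('life', 6), ('human', 3)]
```

The shared word "women" drops out of both lists, so what remains is the vocabulary that distinguishes each group.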
[Figures 9 and 10 about here]
The harvesting of meta- and page-words provides the opportunity to add a depth of understanding of the differences between types of organisation on the web that cannot be gained with hyperlink analysis on its own. The analysis here shows that pro-life and pro-choice groups use different words and have a different focus in the content of their web sites. Pro-choice sites are dominated by information about services, whereas pro-life sites focus on religious beliefs about abortion. A qualitative analysis of these web sites has not been conducted and would be a fruitful endeavour, but one that is beyond the scope of this chapter. We do note, however, that our results are similar in nature to those found in the US and Germany in an analysis of newspaper text (Ferree et al., 2002) and in Australia (McLaren, 2013) through an analysis of the use of foetal images.
Additionally, the analysis shows the decreasing use of meta words over time as organisations change their web behaviour in light of changing search engine technology. In general, we discerned that the pro-life "message" became relatively more diffuse over the past 10 years.

<h1>6. Discussion and conclusions
Ten years is a long time, especially on the web. Over the past 10 years there have been changes in the social issue in Australia that we have focused on (abortion), but there have been even greater changes in the technological space which is the source of data for our analysis. It is a challenge for us to be able to distinguish changes that originate in the behaviour of the actors we are studying (participants in the abortion debate) and changes that relate to the technological space in which they operate (the web).
One of the marked changes in the web over the past 10 years is that it has become even more commercially oriented, and this is reflected in the number of commerce-related sites appearing in the Google search results. In 2005, abortion drugs had not been approved for use in Australia and so the Google search in that year tended to return pages and sites that were focusing on abortion as a social and policy issue. In contrast, after 10 years of legal access to abortion drugs and services, the 2015 search results returned many more sites that were providing access to these services and drugs (and during this period, there has been a marked commercialisation of the web). We also noticed a marked increase in the number of spam and attack pages in the Google results. The second potential reason why pro-life presence has declined on Web 1.0 is purely related to the social issue of abortion. In Australia the abortion debate has largely been won by the pro-choice side, with abortion services legally and widely available. We contend that for this reason, many pro-life groups have largely left the abortion battleground and are instead focusing their attention on current social issues that are still in policy contention, such as marriage equality. Meanwhile, many pro-choice groups are still active on Web 1.0 because they are involved with service provision, for example, and require a (Web 1.0) web presence for those activities.
Another notable finding is the decline in political parties and individual politicians as a presence in the abortion debate on Web 1.0. As with the above, it is difficult to ascertain the reason why political party and politician web pages did not appear in the Google search in 2015. Is it because in 2005 abortion was a policy issue and hence parties and politicians were making statements while in 2015, abortion is no longer a political issue? Or is it because of technological change, with parties (and in particular politicians) in 2015 focusing on cheaper (and potentially more effective) social media channels rather than Web 1.0 websites?
Our final comments are about methodology. Historical analysis of Web 1.0 hyperlink networks is challenging because in order to construct large-scale hyperlink networks from web archives, it is necessary that these archives allow crawlers or else provide publicly available application programming interfaces (APIs) so that the hyperlink network data can be programmatically extracted at scale. No Australian web archive with such capabilities exists, and hence we could not have conducted the research presented in this chapter without having crawled the live web at both time points (2005 and 2015), i.e. effectively creating a purpose-built archive of hyperlink and website text data. Thus, historical hyperlink network analysis typically requires researchers to collect snapshots from the live web over time.
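As an illustration of what "programmatically extracting" hyperlink network data involves, the following minimal Python sketch turns the HTML of one crawled page into site-to-site links. It is not the crawler used in this research; the class and function names are hypothetical, and reducing a URL to a "site" by stripping a leading "www." from the host name is a simplifying assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def site_of(url):
    """Reduce a URL to a site name (assumption: host name without leading 'www.')."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def site_edges(page_url, html):
    """Return the set of (source_site, target_site) links found on one page."""
    parser = LinkExtractor()
    parser.feed(html)
    source = site_of(page_url)
    edges = set()
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript: and similar
        target = site_of(absolute)
        if target and target != source:  # keep links to other sites only
            edges.add((source, target))
    return edges


# Hypothetical page content (not data from the study).
html = (
    '<a href="/about">About</a> '
    '<a href="http://www.clinic.example/info">Clinic</a> '
    '<a href="mailto:someone@clinic.example">Email</a>'
)
print(site_edges("http://www.prolife.example/index.html", html))
```

Running such an extraction over every page collected at each time point, and storing the resulting edges alongside the crawl date, yields the kind of purpose-built snapshot described above.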
Conducting comparable web research 10 years apart is not straightforward, even when (as in this case) one of the authors is the lead developer of the web crawl software that was used. One technical challenge we faced was that while in 2005 it was possible to use the Google API to find all 500 pages that mentioned our search criteria, the 2015 version of the Google API only allows one to return the first 100 search results (even if one is prepared to pay for API access). So that meant we needed to manually copy and paste the Google search results.
Finally, the fact that we use Google also needs noting. Google was the dominant search engine in 2005 and it is still the dominant search engine in 2015. However, Google is not synonymous with the web (even Web 1.0) and so our finding that, for example, the composition of the sites related to abortion in Australia became markedly more commercial over the past 10 years could simply reflect a change in Google's ranking algorithm (i.e. it is promoting more commercial websites), and not that the web itself has become more commercial. However, we regard the latter as being likely to be true.
Note: pro-life - red, pro-choice - blue. Node size is proportional to indegree.