Scanning the Science-Society Horizon
Abstract
Science communication approaches have evolved over time, gradually
placing more importance on understanding the context of the
communication and the audience.
The increase in people participating in social media on the
Internet offers a new resource for monitoring what people are
discussing. People self-publish their views on social media,
which provides a rich source of everyday, every-person thinking.
This introduces the possibility of using passive monitoring of
this public discussion to find information useful to science
communicators, to allow them to better target their
communications about different topics.
This research study is focussed on understanding what open source
intelligence, in the form of public tweets on Twitter, reveals
about the contexts in which the word 'science' is used by the
English-speaking public. By conducting a series of studies based
on simpler questions, I gradually build up a view of who is
contributing on Twitter, how often, and what topics are being
discussed that include the keyword 'science'.
An open source data gathering tool for Twitter data was
developed and used to collect a dataset from Twitter with the
keyword 'science' during 2011. After collection was completed,
the data was prepared for analysis by removing unwanted tweets. The
size of the dataset (12.2 million tweets by 3.6 million users, or
authors) required the use of mainly quantitative approaches,
even though the dataset represents only a very small proportion,
about 0.02%, of the total tweets per day on Twitter.
Fourier analysis was used to create a model of the underlying
temporal pattern of tweets per day and revealed a weekly pattern.
The number of users per day followed a similar pattern, and most
of these users did not use the word 'science' often on Twitter.
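
As a rough illustration of how Fourier analysis can expose a weekly cycle in daily counts, the following minimal sketch computes a discrete Fourier transform in plain Python and reports the period of the strongest component. The function name `dominant_period` and the daily counts are invented for illustration; this is not the thesis's code, model, or data.

```python
import cmath
import math


def dominant_period(counts):
    """Return the period (in samples) of the strongest non-constant DFT component."""
    n = len(counts)
    mean = sum(counts) / n
    # Subtract the mean so the constant (zero-frequency) term does not dominate.
    centred = [c - mean for c in counts]
    best_k, best_power = 0, 0.0
    # Only frequencies up to n/2 are distinct for a real-valued series.
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centred)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    if best_k == 0:
        raise ValueError("series is constant; no periodic component found")
    return n / best_k


# Synthetic daily tweet counts (made up): eight weeks with a weekend dip.
weekly_shape = [100, 110, 115, 112, 105, 70, 65]
daily_counts = weekly_shape * 8

period = dominant_period(daily_counts)
print(f"Strongest cycle repeats every {period:.1f} days")
```

On real data the series would also contain trend and noise, so the strongest component is a starting point for modelling rather than a conclusion in itself.
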
An investigation of types of tweets suggests that people using
the word 'science' were engaged in more sharing, of both links
and other people's tweets, than is usual on Twitter.
Consideration of word frequency and bigrams in the text of the
tweets found that while word frequencies were not particularly
effective when trying to understand such a large dataset, bigrams
were able to give insight into the contexts in which 'science' is
being used in up to 19.19% of the tweets.
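
To make the bigram approach concrete, here is a minimal sketch that counts adjacent word pairs within each tweet and keeps those containing 'science'. The tokenisation rule, the function name `bigram_counts`, and the sample tweets are assumptions made for illustration; the thesis may have prepared its text differently.

```python
import re
from collections import Counter


def bigram_counts(tweets):
    """Count adjacent word pairs, never pairing words across two tweets."""
    counts = Counter()
    for text in tweets:
        # Assumed tokenisation: lowercase runs of letters, digits and apostrophes.
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Invented sample tweets, not drawn from the thesis dataset.
sample_tweets = [
    "Science fiction is my favourite genre",
    "New computer science course starts today",
    "I love science fiction films",
]

counts = bigram_counts(sample_tweets)
# Keep only the bigrams in which 'science' is one of the two words.
science_bigrams = {pair: n for pair, n in counts.items() if "science" in pair}

for pair, n in sorted(science_bigrams.items(), key=lambda item: -item[1]):
    print(" ".join(pair), n)
```

Pairs such as 'science fiction' and 'computer science' point to distinct contexts of use in a way that single-word frequencies cannot.
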
The final study used Latent Dirichlet Allocation (LDA) topic
modelling to identify the contexts in which 'science' was being
used and gave a much richer view of the whole corpus than the
bigram analysis.
Although the thesis has focused on the single keyword 'science',
the techniques developed should be applicable to other keywords
and so be able to provide science communicators with a
near-real-time source of information about what issues the public
is concerned about, what they are saying about those issues, and
how that is changing over time.