The Elon Musk paradox: Quantifying the Presence and Impact of Twitter Bots on Altmetrics with Focus in Social Sciences
Wenceslao Arroyo-Machado*, Enrique Herrera-Viedma** and Daniel Torres-Salinas*
*wences@ugr.es; torressalinas@ugr.es
0000-0001-9437-8757; 0000-0001-8790-3314
Department of Information and Communication Sciences, University of Granada, Spain
**viedma@ugr.es
0000-0002-7922-4984
Department of Computer Science and Artificial Intelligence, Andalusian Research Institute of Data Science and Computational Intelligence, DaSCI, University of Granada, Spain
Abstract
With the rise of Twitter bots in social and political spheres, their implications in scientific communication and altmetrics have become a concern. However, there are no large-scale studies that identify the population of bots and their impact on altmetrics. This quantitative study aims to analyse the presence and impact of Twitter bots in the dissemination of Social Science papers on Twitter and to explore the specific case of Information Science & Library Science (ISLS) as a case study. The overall presence of bots discussing Social Science papers has been found to account for 3.61% of users and 3.85% of tweets. However, this presence and impact is uneven across disciplines, highlighting Criminology & Penology with 12.4% of the mentions made by bots. In the specific case of ISLS, it has been determined by Kendall's correlation that mentions of bots have no impact on altmetrics.
In the legal battle that Elon Musk triggered when he initiated the acquisition of Twitter, bots became one of the main points of contention (Conger, 2022). While Twitter maintained that the prevalence of such accounts was low, below 5%, for Musk they posed a serious threat and reached a presence of 33%. This figure has in fact been studied before, and a previous estimate places it between Twitter's and Musk's calculations, at 15% (Varol et al., 2017). Despite the disagreements about the presence of bots and Elon Musk's concerns about them, since the purchase of Twitter in October 2022 there have been only vague announcements via Twitter about the measures put in place to deal with this serious problem.
What is clear is that the problem of bots is as complex, especially in the social and political sphere, as the concept is broad. A Twitter bot is not a dangerous or malicious tool in itself; in fact, its use is permitted, and Twitter even offers a label so that these accounts can be marked as carrying out automated activity. There are therefore different types of accounts that share the automation of their activity, among them social bots, spambots, trolls and cyborgs (Gorwa & Guilbeault, 2020). Their implications are accordingly very varied, the most negative and extreme cases being notorious, especially the dissemination of fake news, even influencing electoral processes (Bovet & Makse, 2019; Shao et al., 2018).
Bots have a strong presence in scientific dissemination and communication. For example, bots that post alerts about scientific publications in a particular research area are common (Robinson-Garcia et al., 2019; Ye & Na, 2020). Despite the negative connotations of bots, in this domain there is no clear harmful activity around science comparable to the aforementioned cases. Even so, one of the classic concerns about the use of these mentions as metrics in evaluative contexts is the ease with which they can be manipulated and thus distort the measurement of social attention (Thelwall, 2014). This is why the implications of bots have been addressed, warning of the risk of inflation of these mentions and differentiating between bots and cyborgs (Haustein et al., 2016; Robinson-Garcia et al., 2017). While the first type carries out non-selective activity by indiscriminately disseminating scientific publications, the second type combines automation with human behaviour, for example by making comments. Despite the known presence of bots in science and their specific characteristics, there are no large-scale studies that identify the population of bots and their impact on altmetrics. Taking all of the above into account, from the point of view of scientific communication and altmetrics, the question is: is it possible to determine the effect of bots, and what influence do they have?
In order to answer these questions, the main objective of this quantitative study is to analyse and measure the presence of bots in the dissemination of Social Science articles on Twitter. The specific objectives established to carry out this study are the following:
Objective 1. To quantify the overall presence of bots on Twitter that mention Social Science papers and the volume of mentions they represent.
Objective 2. To determine the presence and impact of Twitter bots for each of the Social Science disciplines.
Objective 3. To quantify the presence and influence of bots in Information Science & Library Science as a case study.
This is a first approach to the global study of bots in scientific communication on Twitter, using all the mentions concerning one of the major research areas. Ultimately, we offer a methodological framework for the identification of bots on Twitter and the study of their intervention in science, in order to understand how this discussion is being affected by the participation of automated accounts.
The collection of bibliographic records and their Twitter mentions was carried out on 5 September 2022. All papers published between 2017 and 2021 and indexed in the Science Citation Index Expanded (SCIE), Social Sciences Citation Index (SSCI) and Arts & Humanities Citation Index (AHCI) were retrieved from Web of Science, a total of 9,141,593 papers. These publications were filtered to the 31 subject categories belonging to the Social Sciences, General field of the ESI (Essential Science Indicators), following the scheme proposed by Arroyo-Machado & Torres-Salinas (2021), which reduced the set to 500,696 papers. The DOIs of these papers were then queried on Altmetric.com to retrieve all the mentions they had received on Twitter. The final dataset comprises 265,999 papers that received a total of 4,944,663 Twitter mentions from 802,363 unique Twitter accounts.
For the Twitter bot identification process, the BotometerLite API was used, a machine learning tool that employs a lighter model than the main Botometer API (V4) and is designed for predictions on large volumes of data. This model reduces the features used to classify accounts to basic elements such as the number of tweets, the number of friends or whether the profile has been personalised, together with features derived from these, such as the frequency of tweets or the ratio of followers to friends, and obtains good results in the identification of bots at a lower computational cost (Yang et al., 2020). On 26 December, its API was queried with the identifiers of the Twitter accounts that mention Social Science papers. For each of the 802,363 Twitter accounts, the botscore was obtained, a score indicating to what extent the account behaves like a human or like a bot. The botscore ranges from 0 to 1, where 0 reflects human-like behaviour and 1 reflects bot-like behaviour. The main problem with the botscore lies in establishing the threshold above which an account is considered a bot (Yang et al., 2022). There are multiple proposals for setting it, although all of them relate to the main Botometer model. In our case, we analysed the distribution of the botscore across accounts and set the threshold at the upper whisker limit of the boxplot (Q3 + 1.5*IQR), a value of 0.52, so that all the upper outliers are classified as bots (Figure 1). This value is also very close to the 0.5 recommended by the developers of the tool and used in previous studies (Shao et al., 2018; Vosoughi et al., 2018). For example, the @BotArxiv account, with a botscore of 0.74, exceeds the established threshold and is therefore tagged as a bot.
Figure 1: Botscore density distribution of Twitter accounts mentioning Social Science papers and threshold for recognising bots.
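The thresholding rule described above (Q3 + 1.5*IQR on the botscore distribution) can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names `upper_fence` and `flag_bots` are hypothetical, the quartile convention (`method="inclusive"`) is an assumption because the text does not state which one was applied, and every score except the 0.74 of @BotArxiv is invented toy data. The toy fence therefore differs from the 0.52 obtained on the full set of accounts.

```python
import statistics


def upper_fence(scores):
    """Return Q3 + 1.5 * IQR, the upper whisker limit of a boxplot."""
    # Assumption: "inclusive" quartiles; other conventions shift the fence slightly.
    q1, _median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


def flag_bots(botscores, threshold):
    """Map each account to True when its botscore exceeds the threshold."""
    return {account: score > threshold for account, score in botscores.items()}


# Toy botscores: only @BotArxiv (0.74) comes from the text, the rest are hypothetical.
toy_botscores = {
    "@human_a": 0.05, "@human_b": 0.10, "@human_c": 0.10,
    "@human_d": 0.15, "@human_e": 0.20, "@human_f": 0.20,
    "@human_g": 0.25, "@human_h": 0.30, "@BotArxiv": 0.74,
}

toy_threshold = upper_fence(list(toy_botscores.values()))
toy_flags = flag_bots(toy_botscores, toy_threshold)
bots_found = sorted(account for account, is_bot in toy_flags.items() if is_bot)
```

Deriving the cut-off from the data ties it to the observed distribution of the accounts under study, which is why the text then compares the resulting value with the fixed 0.5 used elsewhere.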
We first carried out an exploratory analysis to provide a general overview of the presence and impact of bots in scientific communication on Twitter. The volume of bots and their tweets was identified and analysed for the Social Sciences as a whole, and a profile of the presence and impact of bots was then obtained for each of the 31 subject categories of the Social Sciences. The Information Science & Library Science category was selected as a case study to analyse this phenomenon in greater detail, by identifying the main accounts detected as bots and by studying whether their mentions have a relevant impact, using a Kendall correlation between the total mentions and the mentions not made by bots across all the papers.
Overall, a total of 28,961 accounts have been identified as bots (3.61% of the accounts), mentioning an average of 6.58 papers (±115.17). This means that a total of 190,443 tweets (3.85% of the tweets) have been made by these automated accounts. Similarly, although retweets predominate over original tweets in the overall mentions to Social Science publications, accounting for 67.1% of all tweets, the situation is reversed in the case of bots, with 57.2% of the mentions being original tweets.
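The shares reported above follow directly from the dataset totals given earlier in the text. The short check below is only a worked calculation under that reading; the variable names are ours. It also suggests that the 6.58 average coincides with the number of bot tweets divided by the number of bot accounts.

```python
# Totals taken from the text.
accounts_total = 802_363   # unique Twitter accounts mentioning Social Science papers
tweets_total = 4_944_663   # Twitter mentions of Social Science papers
bot_accounts = 28_961      # accounts above the botscore threshold
bot_tweets = 190_443       # mentions made by those accounts

bot_account_share = 100 * bot_accounts / accounts_total  # percentage of accounts
bot_tweet_share = 100 * bot_tweets / tweets_total        # percentage of tweets
tweets_per_bot = bot_tweets / bot_accounts               # mean mentions per bot

print(f"{bot_account_share:.2f}% of accounts, "
      f"{bot_tweet_share:.2f}% of tweets, "
      f"{tweets_per_bot:.2f} mentions per bot")
```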
However, as can be seen in Figure 2, this situation is not homogeneous for all the subject categories of Social Sciences. On average, 4.1% of mentions are attributed to bots, with Criminology & Penology standing out with a percentage of 12.4% of mentions made by bots and, to a lesser extent, Social Sciences, Mathematical Methods with 8.4%. In both cases, the number of bots is small, 3.5% and 3.8% of users respectively. However, this activity is particularly concentrated in the case of Criminology & Penology, where only one of the 1762 bots, the @CrimPapers account, is responsible for 63.85% of the bot mentions. Political Science is in a completely opposite position to those mentioned above, being the category with the highest number of mentions, 711,070 mentions, and only 2.7% of them made by bots. It can thus be seen that, in general, both the presence of bots and the impact of their mentions are low in the Social Sciences, especially in the case of the subject categories that have the most mentions on Twitter.
Figure 2: Presence of bots and their mentions as a percentage of total mentions to Social Science papers by subject categories. The dashed lines reflect the average values for each axis.
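The per-category profile behind Figure 2 combines three quantities: the share of mentions made by bots, the share of users that are bots, and how concentrated bot activity is in a single account. A minimal sketch of that aggregation is given below. The function name `bot_profile`, the input layout (one `(category, account)` pair per mention) and all the toy mentions are assumptions for illustration; only the account name @CrimPapers is taken from the text, and the toy figures do not reproduce the reported percentages.

```python
from collections import Counter, defaultdict


def bot_profile(mentions, bots):
    """Summarise bot presence per category.

    mentions: iterable of (category, account) pairs, one per tweet.
    bots: set of accounts classified as bots.
    """
    totals = Counter()
    users = defaultdict(set)
    bot_counts = defaultdict(Counter)
    for category, account in mentions:
        totals[category] += 1
        users[category].add(account)
        if account in bots:
            bot_counts[category][account] += 1

    profile = {}
    for category, total in totals.items():
        bot_mentions = sum(bot_counts[category].values())
        top = bot_counts[category].most_common(1)
        profile[category] = {
            "bot_mention_share": bot_mentions / total,
            "bot_user_share": len(users[category] & bots) / len(users[category]),
            "top_bot": top[0][0] if top else None,
            # Share of the category's bot mentions made by its most active bot.
            "top_bot_share": top[0][1] / bot_mentions if top else 0.0,
        }
    return profile


# Hypothetical toy mentions.
crim = "Criminology & Penology"
pol = "Political Science"
toy_mentions = (
    [(crim, "@CrimPapers")] * 3 + [(crim, "@bot_b")]
    + [(crim, "@human_1")] * 4 + [(crim, "@human_2")] * 2
    + [(pol, "@human_1")] * 2 + [(pol, "@human_3")]
)
toy_bots = {"@CrimPapers", "@bot_b"}
toy_profile = bot_profile(toy_mentions, toy_bots)
```

Reporting the top account's share alongside the category totals makes it possible to tell a category with many moderately active bots from one dominated by a single alert account.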
In the case of Information Science & Library Science, the presence of bots and the impact of their activity on the volume of mentions are slightly above average, with 5220 mentions (4.6% of the total produced in this category) made by 1920 bots (4% of the total number of users in this category). This activity is also widely distributed, and no single account stands out in this respect.
Analysing the top 10 bots with the most mentions in ISLS illustrates this, as well as the potential and limitations of BotometerLite for bot identification (Table 1). The bot with the highest impact is @arxiv_cscl, an account that automatically posts Computation and Language papers published on arXiv. This account makes 133 mentions of ISLS papers, although its activity is not concentrated in this category, as these mentions account for 39% of all its mentions of Social Science papers. Bots of this type, aimed at posting academic news, are one of the clearest examples of bots, and further cases are @OpenSciTalk, @HealthLitUpdate, @M157q_News_RSS, @ComputerPapers and @OpenScienceR, some of which declare themselves as bots in their Twitter profiles. However, automated identification as a bot is not always so clear-cut, as the journal Twitter accounts @ORMS_IEOM and @ijcscl show. The doubts are greatest in the case of the @v_i_o_l_a account, whose activity pattern and profile are rated as very similar to those of a bot, with a botscore of 0.72, but whose classification is doubtful when the account is examined manually.
Table 1. Top 10 bots by number of mentions of ISLS papers, with their Twitter profile data.
User name | Tweets | Followers | Friends | Botscore | ISLS mentions | % Social Sciences | ISLS mentioned |
---|---|---|---|---|---|---|---|
@arxiv_cscl | 181,950 | 6121 | 1 | 0.83 | 133 | 39% | 30 |
@ORMS_IEOM | 3711 | 126 | 63 | 0.55 | 127 | 87% | 89 |
@JOLIS45 | 200 | 446 | 301 | 0.59 | 89 | 100% | 84 |
@v_i_o_l_a | 133,496 | 2463 | 3133 | 0.72 | 88 | 73% | 82 |
@OpenSciTalk | 145,655 | 4280 | 1 | 0.66 | 87 | 22% | 67 |
@ijcscl | 107 | 339 | 0 | 0.62 | 81 | 50% | 47 |
@HealthLitUpdate | 6401 | 398 | 0 | 0.77 | 76 | 16% | 64 |
@M157q_News_RSS | 530,660 | 1098 | 0 | 0.64 | 73 | 41% | 54 |
@ComputerPapers | 42,275 | 613 | 4 | 0.88 | 68 | 62% | 68 |
@OpenScienceR | 4620 | 3037 | 2488 | 0.58 | 65 | 67% | 60 |
At the publication level, a total of 2694 papers (26.2%) receive at least one mention from bots. The most extreme case in this regard is the paper by Zhang & Zhang (2021), all 11 of whose mentions are made by three bots (@arxiv_cshc, @arxiv_cscl and @arxiv_cs_cl). To determine whether mentions by bots have a considerable impact in the ISLS case, a Kendall correlation was calculated between the total number of mentions of each paper and its number of mentions excluding bots. The result is a very strong Kendall's tau-b coefficient of 0.95 (p-value < 0.05) between the two counts, which indicates that removing bot mentions barely changes the ranking of papers and, therefore, that bot mentions do not interfere considerably with this metric.
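To make the comparison concrete, Kendall's tau-b between total mentions and mentions excluding bots can be sketched in plain Python as below. This is an illustrative implementation, not the code used in the study: the function name `kendall_tau_b` and the toy counts are hypothetical, the pairwise loop is quadratic and intended only for small inputs, and no p-value is computed. The toy data are deliberately tiny, so a single paper mentioned only by bots pulls the coefficient far below the 0.95 reported for ISLS.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences (O(n^2) pair counting)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from both denominator terms
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denominator = sqrt((untied + ties_x_only) * (untied + ties_y_only))
    if denominator == 0:
        return float("nan")
    return (concordant - discordant) / denominator


# Hypothetical per-paper counts; the first paper is mentioned only by bots.
total_mentions = [11, 5, 3, 8, 1]
bot_mentions = [11, 0, 1, 0, 0]
non_bot_mentions = [t - b for t, b in zip(total_mentions, bot_mentions)]

toy_tau = kendall_tau_b(total_mentions, non_bot_mentions)
```

A rank correlation suits this question because it asks whether papers keep their relative order once bot mentions are removed, and the tau-b variant corrects for the many ties expected in mention counts.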
This study has identified the presence of bots and their mentions in the dissemination of Social Science papers on Twitter. The volume of bots and of the mentions made by this type of account is small, although it is uneven across subject categories. Analysing the specific case of Information Science & Library Science highlights the potential of this methodology for detecting bots, as well as the reduced impact that bots have on Twitter mentions as a metric of social attention. The results are very much in line with those previously reported by other studies, which have estimated in other case studies that the mentions made by bots amount to between 3% and 9% (Haustein et al., 2016; Ye & Na, 2020). A contextualised study is thus necessary to better understand this phenomenon, whose incidence has been shown in our case to vary across subject categories.
In this sense, the impact of automated bots that benevolently disseminate the publications of a specific area as alerts has been highlighted in the specific case of Criminology & Penology. The Elon Musk paradox serves as an example of the complexity of the bot problem, but with the methodological framework provided by this study, further research can be conducted to better understand the intervention of automated accounts in scientific communication. This work is the preliminary result of a much larger research project that is still in development. These results will therefore be extended in the future by incorporating all ESI fields as well as Twitter user metadata, to provide a more comprehensive analysis of the presence of bots in the dissemination of papers on Twitter.
Open science practices
The results of this paper are part of a research project that is still under development. That is why the data and codes have not yet been shared but will be shared openly, respecting the privacy policies of the data providers, once the research concludes. This statement reflects a responsible and ethical approach to research data sharing.
Author contributions
Wenceslao Arroyo-Machado: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Visualization, Writing—original draft.
Enrique Herrera-Viedma: Project administration, Supervision.
Daniel Torres-Salinas: Conceptualization, Funding acquisition, Investigation, Methodology, Resources, Validation, Writing—review & editing.
Competing interests
The authors declare their participation and leadership in the altmetric project InfluScience.
Funding
This work has been funded by the Spanish Ministry of Science and Innovation, grant number PID2019-109127RB-I00/SRA/10.13039/501100011033 (InfluScience). Wenceslao Arroyo-Machado has an FPU Grant (FPU18/05835) from the Spanish Ministry of Universities.
References
Arroyo-Machado, W., & Torres-Salinas, D. (2021). Web of Science categories (WC, SC, main categories) and ESI disciplines mapping. https://doi.org/10.6084/m9.figshare.14695176.v2
Bovet, A., & Makse, H. A. (2019). Influence of fake news in Twitter during the 2016 US presidential election. Nature Communications, 10(1), 7. https://doi.org/10.1038/s41467-018-07761-2
Conger, K. (2022, August 4). Musk Says Twitter Committed Fraud in Dispute Over Fake Accounts. The New York Times. https://www.nytimes.com/2022/08/04/technology/musk-twitter-fraud.html
Gorwa, R., & Guilbeault, D. (2020). Unpacking the Social Media Bot: A Typology to Guide Research and Policy. Policy & Internet, 12(2), 225–248. https://doi.org/10.1002/poi3.184
Haustein, S., Bowman, T. D., Holmberg, K., Tsou, A., Sugimoto, C. R., & Larivière, V. (2016). Tweets as impact indicators: Examining the implications of automated “bot” accounts on Twitter. Journal of the Association for Information Science and Technology, 67(1), 232–238. https://doi.org/10.1002/asi.23456
Robinson-Garcia, N., Arroyo-Machado, W., & Torres-Salinas, D. (2019). Mapping social media attention in Microbiology: Identifying main topics and actors. FEMS Microbiology Letters, 366(7). https://doi.org/10.1093/femsle/fnz075
Robinson-Garcia, N., Costas, R., Isett, K., Melkers, J., & Hicks, D. (2017). The unbearable emptiness of tweeting—About journal articles. PLOS ONE, 12(8), e0183551. https://doi.org/10.1371/journal.pone.0183551
Shao, C., Ciampaglia, G. L., Varol, O., Yang, K.-C., Flammini, A., & Menczer, F. (2018). The spread of low-credibility content by social bots. Nature Communications, 9(1), 4787. https://doi.org/10.1038/s41467-018-06930-7
Thelwall, M. (2014). A brief history of altmetrics. Research Trends, 1(37), 2.
Varol, O., Ferrara, E., Davis, C., Menczer, F., & Flammini, A. (2017). Online Human-Bot Interactions: Detection, Estimation, and Characterization. Proceedings of the International AAAI Conference on Web and Social Media, 11(1), 280–289. https://doi.org/10.1609/icwsm.v11i1.14871
Vosoughi, S., Roy, D., & Aral, S. (2018). The spread of true and false news online. Science, 359(6380), 1146–1151. https://doi.org/10.1126/science.aap9559
Yang, K.-C., Ferrara, E., & Menczer, F. (2022). Botometer 101: Social bot practicum for computational social scientists. Journal of Computational Social Science, 5(2), 1511–1528. https://doi.org/10.1007/s42001-022-00177-5
Ye, Y. E., & Na, J.-C. (2020). Profiling Bot Accounts Mentioning COVID-19 Publications on Twitter. In E. Ishita, N. L. S. Pang, & L. Zhou (Eds.), Digital Libraries at Times of Massive Societal Transition (Vol. 12504, pp. 297–306). Springer International Publishing. https://doi.org/10.1007/978-3-030-64452-9_27
Zhang, Y., & Zhang, C. (2021). Enhancing keyphrase extraction from microblogs using human reading time. Journal of the Association for Information Science and Technology, 72(5), 611–626. https://doi.org/10.1002/asi.24430
Arroyo-Machado, W., Herrera-Viedma, E. & Torres-Salinas, D. (2023). The Elon Musk Paradox: Quantifying the Presence and Impact of Twitter Bots on Altmetrics with Focus in Social Sciences [preprint]. 27th International Conference on Science, Technology and Innovation Indicators (STI 2023).