Do original tweets and retweets differ in indicating research impact across various subject areas in multidisciplinary papers published in PLoS?

Twitter is a popular platform to discuss and share scientific articles. Earlier altmetrics studies have often focused on investigating whether the number of tweets mentioning scientific articles could be used as an indicator of scientific impact or attention, with results showing weak to moderate correlations with citation counts and some disciplinary differences. However, not all tweets may be equal, as original tweets and retweets may reflect different levels of engagement and, with that, impact. This research analyzed whether the correlation between citations and original tweets differs from that between citations and retweets, and whether there are any disciplinary differences between the two. For this purpose, the relationship between original tweets, retweets and Scopus citations was analyzed for a total of 330,022 PLoS publications and compared over time and across subject fields. The findings showed that the correlations were strongest between citations and original tweets, and that the relationship was stronger in Social Science and Humanities subject fields than in Natural Science, Engineering and Medicine. The results showed that tweets and retweets are very different, and thus they should be considered two different metrics and analyzed separately.


Introduction
Twitter is a popular social media platform where users (often called tweeters) can publish and share content with their network of followers. Through retweeting, tweeters can easily disseminate content that someone else originally published. While creating an original tweet takes some effort, retweeting requires only a click or tap on a button, so it seems fair to say that retweeting requires less effort than tweeting. For that reason, we can also argue that retweeting signals less engagement than creating and publishing an original tweet. In altmetrics, i.e., the measurement of the engagement or attention that scientific outputs have received online, Twitter is one of the main data sources, as there is significant activity around scientific articles on the platform (Costas et al., 2015; Haustein et al., 2015). Altmetrics research often counts tweets and retweets as one measure, without making any distinction between them. We argue that because the two acts are fundamentally different, indicating different levels of engagement and possibly attention or impact, combining them in statistical analyses may lead to misleading results. The goal of this research is to investigate whether this is the case, and whether original tweets and retweets should be analyzed separately in altmetrics research.

Background
Much of early altmetrics research focused on examining whether altmetrics could be an alternative to traditional citation-based measures of impact. This research tested for correlations between tweets and citation counts, with mixed results: large-scale studies (e.g., Barthel et al., 2015; Costas et al., 2014, 2015) showed lower correlations between tweets and citations than studies with more focused, journal- or discipline-specific samples (e.g., Eysenbach, 2011; Shuai et al., 2012). Earlier research has also discovered disciplinary differences in how scientific articles get tweeted, as articles from the social sciences and the biomedical and health sciences tend to attract more attention on Twitter than articles from mathematics and computer science, and natural sciences and engineering (Costas et al., 2015; Haustein et al., 2015). Other characteristics, such as the length of the article (Haustein et al., 2015), OA status (Holmberg et al., 2020), and research funding (Didegah et al., 2018), may also influence the attention scientific articles receive on Twitter. A more recent study investigated how different types of user engagement behaviors on Twitter, i.e., liking, retweeting, quoting, and replying, were used in connection with scholarly content (Fang, Costas, & Wouters, 2022). The results showed that while likes (44%) and retweets (36%) were frequently used, quotes (9%) and replies (7%) were less frequent. While earlier research has already shown disciplinary differences in the uptake of scientific articles on Twitter (e.g., Haustein, Costas, & Larivière, 2015) and in how researchers use Twitter (Holmberg & Thelwall, 2014), the results by Fang, Costas, and Wouters (2022) showed that there are also disciplinary differences in the ways in which users engage with scientific content on Twitter. But do the disciplinary differences extend to both tweeting and retweeting? Or are the possible differences evened out if tweets and retweets are treated as the same? This research investigates possible disciplinary differences between tweeting and retweeting, as well as whether there are any differences in how citation counts correlate with the number of tweets and retweets.

Data
A total of 330,022 PLoS publications published between 2003 and 2023 were extracted from Scopus in April 2023. The extracted publications were published in nine PLoS journals and eight proceedings, with the majority of the papers (94%) being journal articles. Altmetric.com was used to extract separate datasets of 1) all tweets and 2) original tweets, which were then used to count the number of retweets for each paper.
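One plausible way to derive per-paper retweet counts from the two extracted datasets is to subtract the number of original tweets from the number of all tweets for each paper. The sketch below assumes this subtraction and a hypothetical record layout of (paper identifier, tweet identifier) pairs; neither the layout nor the function names come from the study or from altmetric.com.

```python
from collections import Counter


def count_by_paper(records):
    """Count tweet records per paper identifier (for example a DOI).

    `records` is an iterable of (paper_id, tweet_id) pairs; this layout
    is an assumption made for illustration only.
    """
    return Counter(paper_id for paper_id, _tweet_id in records)


def retweet_counts(all_tweets, original_tweets):
    """Estimate retweets per paper as (all tweets) minus (original tweets).

    Assumes every original tweet also appears in the all-tweets dataset;
    the result is clamped at zero in case the two extracts disagree.
    """
    all_counts = count_by_paper(all_tweets)
    original_counts = count_by_paper(original_tweets)
    return {
        paper_id: max(total - original_counts.get(paper_id, 0), 0)
        for paper_id, total in all_counts.items()
    }
```

Papers that appear in neither dataset are simply absent from the result and would be treated as having zero tweets and zero retweets in later steps.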

Subject fields
As all PLoS papers are classified only as multidisciplinary in Scopus, we used the classification used by altmetric.com (Australian and New Zealand Standard Research Classification 2020, ANZSRC) to assign subject fields to each article. For the analysis we used: (1) the first subject field of each paper (no duplicates); and (2) all publications in a subject field (duplicates included between subject fields). Table 1 shows the number of publications when counting only the first/primary subject field and when counting all publications within a subject field. The first 11 fields in Table 1 are from Natural Science, Engineering and Medical and Health Sciences (STEM) and the second 11 fields are from Social Science and Humanities (SS&H). Of all the publications, about 19% have not been assigned to a field; these were mostly errata and non-tweeted publications.
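The two counting schemes described above, first-assigned field only versus all assigned fields, can be illustrated with a minimal sketch. The input mapping from paper identifier to an ordered list of field names is a hypothetical structure chosen for the example, as is the function name.

```python
from collections import Counter


def field_counts(paper_fields):
    """Count publications per subject field under both schemes.

    `paper_fields` maps a paper identifier to an ordered list of field
    names (hypothetical layout). Returns a tuple of
    (first-field counts, all-field counts, number of unassigned papers).
    """
    first_field = Counter()
    all_fields = Counter()
    unassigned = 0
    for fields in paper_fields.values():
        if not fields:
            unassigned += 1  # paper has no subject field at all
            continue
        first_field[fields[0]] += 1  # scheme (1): no duplicates
        for field in set(fields):  # scheme (2): paper counted in each field
            all_fields[field] += 1
    return first_field, all_fields, unassigned
```

Under scheme (1) the field totals sum to the number of assigned papers, while under scheme (2) the totals can exceed it because a paper is counted once in every field it belongs to.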

Analysis
To analyze the possible relationship between citations and all tweets, original tweets and retweets, comparisons across fields and over time were conducted. For this purpose, the proportion non-zero and the geometric mean of citations, tweets and retweets were calculated and normalized for comparisons between subject fields and with the world average (here, all PLoS publications). The data was first prepared (Thelwall, 2017) and then the calculations were conducted with Webometric Analyst (lexiurl.wlv.ac.uk).
(a) Normalized proportion non-zero was used as an estimate of the share of publications with non-zero Scopus citations, tweets and retweets, with a 95% confidence interval.
(b) World normalised proportion non-zero of metrics (EMNPC) was used for comparisons. EMNPC values for fields are compared against the world average (=1).
(c) Geometric mean was calculated based on the logarithm of raw metric counts plus one, ln(1 + raw data), as proposed by Thelwall (2017), with 95% confidence intervals.
(d) World normalised mean metrics (MNLCS) were calculated on the log-transformed data, ln(1 + raw data), with 95% confidence intervals. MNLCS values are likewise compared against the value one, which represents the world average.
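The indicators listed above can be sketched in a few lines. This is a simplified illustration, not the Webometric Analyst implementation: it omits confidence intervals and any per-year normalization, it assumes the geometric mean is back-transformed as exp(mean of ln(1 + x)) - 1, and it treats world normalization as a plain ratio of the field value to the value for all publications. All function names are hypothetical.

```python
import math


def proportion_nonzero(counts):
    """Share of publications whose metric count is above zero."""
    return sum(1 for c in counts if c > 0) / len(counts)


def mean_log(counts):
    """Arithmetic mean of ln(1 + count)."""
    return sum(math.log(1 + c) for c in counts) / len(counts)


def geometric_mean(counts):
    """Geometric mean with a +1 offset: exp(mean of ln(1 + x)) - 1.

    The back-transformation is an assumption about the convention used.
    """
    return math.exp(mean_log(counts)) - 1


def world_normalised(field_value, world_value):
    """Ratio of a field-level value to the world value; 1 = world average."""
    return field_value / world_value


# Toy data: tweet counts for one field and for the whole set ("world").
field_tweets = [0, 3, 5, 0, 1]
world_tweets = [0, 3, 5, 0, 1, 0, 0, 2, 0, 0]

# Simplified counterparts of indicators (b) and (d).
normalised_proportion = world_normalised(
    proportion_nonzero(field_tweets), proportion_nonzero(world_tweets)
)
normalised_mean_log = world_normalised(
    mean_log(field_tweets), mean_log(world_tweets)
)
```

In this toy example both normalised values come out above 1, which would be read as the field sitting above the world average on that metric.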

Normalized Proportion Cited
Figure 1 shows that the total number of PLoS publications increased substantially from 87 in 2003 to just below 35,000 in 2013, after which the level dropped and has remained at around 20,000 annually. The proportion of publications with non-zero citations shows a cumulative increase over time, while the proportion of publications mentioned in tweets rose from about 20% in 2010 (about the time when altmetric.com started to collect tweets) to 76% in 2016 and then fell to about 65% in 2022.
The proportion non-zero retweets shows a delayed rise from 2013, reaching 40% by 2018 and levelling off after that, while the proportion tweeted has dropped slightly in the same period. Turning to the normalized proportion non-zero of metrics, Figure 2 shows that on average 92% of publications in STEM fields had been cited, while only 82% in SS&H fields had received citations. On average, 75% of articles in STEM had been tweeted, compared to 85% in SS&H, while only 35% of STEM articles and 50% of SS&H articles had been retweeted.

World normalised proportion non-zero for metrics or EMNPC
Figure 3 shows that after world normalization of proportions non-zero, both tweets and retweets appear significantly above the world average in SS&H fields, while for STEM the results are mixed, falling both below and above the world average. Mathematical Science, Earth Science, Environmental Science and Information and Computing Sciences all show EMNPC > 1 for tweets and > 1.5 for retweets, while all the other STEM fields remain below the world average. The results also showed that the deviation from the world average is of greater magnitude for retweets than for tweets across all fields: for above-world-average counts, the proportion non-zero retweets was significantly higher than for tweets, and for below-world-average counts, the proportion non-zero retweets was significantly lower than for tweets. This may suggest a greater discrepancy across fields in terms of retweeting behaviour.

Geometric mean Citations vs. Original tweets and Retweets
Figure 4 illustrates the changes in geometric mean metrics over time, showing that the geometric mean for citations peaked at about 49 in 2008 before gradually dropping over the years. The trend is, however, almost reversed for the metrics from Twitter, showing a slow drop between 2003 and 2009 (<1) before rising to about 3 for total tweets in 2018 (about 2 for original tweets and 1.25 for retweets), soon after which they too start to fall. The results also show that the ratio of retweets to original tweets has been over 1 since 2017, suggesting that in the past six years, a majority of tweets mentioning scientific articles have in fact been retweets rather than original tweets. The average geometric mean of citations is 14 across STEM fields and about 9 across SS&H fields. In contrast, the average geometric means of all tweets, original tweets and retweets across STEM fields (3, 2 and 1, respectively) are approximately half those of the SS&H fields (6, 4, and 2).

World normalised mean metrics or MNLCS
The means of the world-normalized ln(1 + raw metric values) metrics from Twitter mentions indicate subject bias (Figure 6). In STEM fields, such as Chemical Science, the results are below the world average for tweets and retweets, while slightly above it for citations, but the case is very different for the SS&H fields. A majority of SS&H fields perform below the world average in citations, but significantly above the world average in tweets and original tweets (up to 1.5 times the world average) and at 3.5 times the world average in retweets (e.g., History and archaeology, and Creative arts and writing).

Spearman's Correlation between Citations and different types of tweets
The correlation coefficients between citations and all tweet metrics were stronger when the zeros, i.e., articles with no citations or tweets, were included in the calculation (Figure 7). The correlations were weak but significant across the board. The strength of the relationship between citations and tweets, however, first increased over time and then started to fall from 2019. Furthermore, the correlation coefficients with all tweets were slightly stronger than those with original tweets from 2014 (r = .282 > .280, respectively) through 2018 (r = .330 > .329), while from 2019 (r = .343 < .346) until 2022 the relationship between citations and original tweets appears to be slightly stronger than that with all tweets in both the zero-included and non-zero datasets. Figure 8 presents a heatmap of the correlation coefficients between Scopus citations and the three metrics of all tweets, original tweets, and retweets across subject fields for the zero-included (Z) and non-zero (-) datasets, and for the first-assigned subject field (F) and all publications in a field (A). The findings suggest that the median correlation coefficient of Scopus citations across fields is highest with original tweets (median r = .310), while remaining weak but significant with retweets (median r = .087), when the first-assigned subject fields were used. Using all publications in a subject field led to even weaker correlations than with first-assigned subjects. However, for the first-assigned SS&H subject fields the median correlation coefficients between citations and original tweets were at a medium level (median r = .409), in contrast to the weak correlation in STEM subject fields (median r = .162).
Including zero metric counts in the datasets resulted in stronger correlation coefficients between citations and all the other metrics in SS&H fields (median r with original tweets = 0.409 in the zero-included dataset, 0.329 in the non-zero dataset), but weaker ones in STEM subject fields (median r = 0.175 in the non-zero dataset, r = 0.149 in the zero-included dataset). It would appear that tweets are moderately likely to align with traditional research impact in Social Science and Humanities, but they indicate only a weak relationship and limited usage in STEM subject fields.
Note: FZ: Raw metric counts with zeros (Z) in first (F) assigned subject field; F: Non-zero raw metric counts in first assigned subject field; AZ: Raw metric counts with zeros in all (A) assigned publications to field; A: Non-zero raw metric counts in all assigned publications to field.
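The comparison between zero-included and non-zero datasets can be illustrated with a small, dependency-free sketch of Spearman's rank correlation. The filtering rule is an assumption: here a paper is dropped from the non-zero dataset when either of its two values is zero, which may differ from the rule used in the study. The function names are hypothetical, and the code does not compute significance levels.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def drop_zeros(x, y):
    """Keep only pairs where both values are above zero (assumed rule)."""
    pairs = [(a, b) for a, b in zip(x, y) if a > 0 and b > 0]
    return [a for a, _ in pairs], [b for _, b in pairs]


# Toy data: citations and original tweets for the same eight papers.
citations = [0, 0, 2, 5, 9, 0, 14, 3]
original_tweets = [0, 1, 0, 2, 4, 0, 3, 6]

rho_with_zeros = spearman(citations, original_tweets)
rho_nonzero = spearman(*drop_zeros(citations, original_tweets))
```

Because many papers share the value zero, tie handling matters in the zero-included case, which is why the ranks are averaged over ties rather than assigned in arbitrary order.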

Discussion
The current study compared citations, original tweets, and retweets as measures for impact assessment. The results were in line with some of the findings of earlier research (e.g., Costas et al., 2015; Haustein et al., 2015). The results showed clear disciplinary differences in how scientific articles had been mentioned and shared on Twitter, and it was also discovered that scientific articles in Social Science and Humanities receive up to 2 to 3 times as many retweets as the world average, whereas Natural Science and Engineering fields were below the world average. The results also showed that the correlations between citations and original tweets were clearly stronger than those between citations and retweets, and that the correlations overall were stronger for SS&H subject fields than for STEM subject fields. The results clearly point to the differences between original tweets and retweets, confirming that the two reflect different types of actions and should therefore be treated separately, at least when it comes to altmetrics research.

Figure 1: Frequency of total publications, publications cited, tweeted and retweeted, and normalized proportion non-zero in the metrics

Table 1. Number of PLoS papers according to first/primary subject field assigned and total number of publications in a subject field (including duplicates).