How researchers collaborate across disciplines? Patterns of interdisciplinary collaboration based on a dual-perspective framework

With a combination of new data sources and mixed methods including bibliometrics, machine learning and network analysis, this study puts forward a new framework for categorizing interdisciplinary collaboration patterns from a discipline-contribution perspective. From 20,542 research articles published in the PLoS series of journals in 2018, 14,744 articles with interdisciplinary collaborations (ICAs) are identified. Six indicators that measure the variety, similarity and balance of authors' disciplines and their contribution roles are established, and agglomerative hierarchical clustering then divides the ICAs into four categories. A fine-grained analysis of the structural and correlation characteristics of authors' disciplines and contributions in the clusters reveals four interdisciplinary collaboration patterns: sheep flock, bee colony, intercropping and rainforest. Our results may contribute to developing new methodologies and theories of interdisciplinary collaboration, and enrich the understanding of interdisciplinary collaboration as well as relevant policies.


Introduction
Challenges facing humanity are becoming increasingly diverse and comprehensive. A news report in Nature (2015) stated that, to solve the grand challenges facing society (energy, water, climate, food, health), scientists and social scientists must work together. The implication is twofold: scientific collaboration has become indispensable to achieving major scientific breakthroughs and addressing complex social problems; and there is a growing need for interdisciplinary collaboration, especially between the natural and social sciences (Barthel & Seidl, 2017). In recent years, research funding agencies worldwide have acted to support interdisciplinary collaboration, for example through the establishment of the Department of Interdisciplinary Sciences of the National Natural Science Foundation of China in 2020, and the European Innovation Council Pathfinder Pilot in 2021 for interdisciplinary cutting-edge science collaborations that underpin technological breakthroughs. In this context, scholars' enthusiasm for breaking down knowledge walls and conducting research across disciplines has reached an unprecedented high.
Along with the increasing demand for evidence-based policymaking, policy formulation and implementation depend more and more on scientific evidence derived from empirical data and quantitative methods. Although interdisciplinary collaboration has become prevalent in research practice, research on it lags far behind its development in reality. Specifically, previous studies mostly rely on one of two methodologies: bibliometric analysis driven by large-scale scientific data, or social surveys with data about a group of individuals. The former tends to reveal phenomena and characteristics of interdisciplinary collaboration at the individual (Abramo, D'Angelo, & Di Costa, 2018), group (Bellotti, Kronegger, & Guadalupi, 2016; Lungeanu, Huang, & Contractor, 2014), organizational (Pessoa Junior, Dias, Silva et al., 2020; Zuo & Zhao, 2018) and disciplinary (Benz & Rossier, 2022; Urbano & Ardanuy, 2020) levels. However, these studies mostly lack in-depth investigation and explanation, and so do little to deepen the understanding of intrinsic patterns. In comparison, the latter focuses on the underlying mechanisms of interdisciplinary collaboration, including the division of labour (Horwitz, 2003), specific processes (Haythornthwaite, 2006) and influencing factors (Bishop, Huck, Ownley et al., 2014; Gardner, 2013). However, the limited data make it hard for these studies to lay a general and solid foundation for decision making in policy practice.
As more open data, such as author contribution statements, become accessible, deeper exploration of interdisciplinary collaboration has become much easier. Our study integrates multi-source data and various methods to build a dual-perspective framework for categorizing and profiling interdisciplinary collaboration patterns based on authors' disciplines and their contribution roles. Two questions are addressed: (1) How can articles with different interdisciplinary collaboration patterns be distinguished from each other? (2) How do authors from different disciplines contribute in each interdisciplinary collaboration pattern? The study provides a new perspective on, and a methodological implementation for, research on interdisciplinary collaboration, and discusses policy implications.

Data
This study involves two data sources: the website of the Public Library of Science (PLoS), a non-profit publisher of open-access journals in science, technology, and medicine and other scientific literature; and OpenAlex, a fully open catalogue of the global research system launched in January 2022. The former adopts the Contributor Role Taxonomy (CRediT) to label authors' contribution roles for each publication. The latter has a six-layer discipline classification system and labels each author with a distribution that reflects research interests based on all of the author's publications. This study adopts the 19 root-level concepts to represent authors' discipline backgrounds.
We retrieved the basic information of 20,542 research articles published in the PLoS series of journals in 2018, and crawled the author contribution statement of each article from its detail page on the PLoS website. With the digital object identifier (DOI) of each article, the discipline background of each author of each article can be retrieved from OpenAlex. The information of 19,589 (about 95%) articles could be matched accurately in OpenAlex.

Methodology
Based on the dataset described above, we developed a three-step framework to answer the two research questions put forward in the Introduction (see Figure 1): (a) the recognition of articles with interdisciplinary collaborations (ICAs) is the basis of the whole analysis; (b) the clustering analysis based on dual-perspective indicators divides ICAs into different categories; (c) fine-grained analysis based on authors' specific disciplines and roles helps to understand and summarize the different interdisciplinary collaboration patterns. Interdisciplinary collaboration in this study refers to a research mode in which researchers from different disciplines collaborate to conduct scientific research with research articles as outcomes. For each article, if there exist two authors A and B, with discipline assembles D_A and D_B respectively, such that at least one discipline in D_A is not in D_B and at least one discipline in D_B is not in D_A, then the collaboration between the two authors is an interdisciplinary collaboration, and the article is an ICA. Note that the author discipline assembles provided by OpenAlex are distributions. Since disciplines with lower scores in the distribution may not be suitable to be regarded as part of an author's knowledge base, we selected disciplines with scores greater than the average as the author's major disciplines. In what follows, disciplines always refer to authors' major disciplines.
Among the 19,589 articles, 14,744 (79%) ICAs are recognized. Because individual roles and responsibilities become ambiguous as team size grows, large-scale co-authorships are excluded from our analysis. In our dataset, about 95% of articles have fewer than 15 authors, consistent with previous research (Paul-Hus, Mongeon, Sainte-Marie et al., 2017), so we limited the number of co-authors to 2-14.
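The recognition rule above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names `major_disciplines` and `is_ica` are hypothetical, each author is assumed to be given as a mapping from discipline name to OpenAlex score, and "the average" is assumed to be the mean over the disciplines listed for that author.

```python
from itertools import combinations


def major_disciplines(scores):
    """Return the disciplines whose score exceeds the author's mean score.

    `scores` maps discipline name -> score (assumed input format).
    """
    if not scores:
        return set()
    mean = sum(scores.values()) / len(scores)
    return {d for d, s in scores.items() if s > mean}


def is_ica(author_scores):
    """True if some pair of authors A, B has major-discipline sets D_A, D_B
    such that D_A contains a discipline missing from D_B and vice versa."""
    majors = [major_disciplines(s) for s in author_scores]
    return any(a - b and b - a for a, b in combinations(majors, 2))


# Toy article with two authors (scores are made up for illustration).
article = [
    {"Biology": 0.6, "Medicine": 0.3, "Chemistry": 0.1},
    {"Computer science": 0.7, "Biology": 0.2, "Mathematics": 0.1},
]
print(is_ica(article))
```

Note that, under this rule, an author pair in which one set of major disciplines is a subset of the other does not count as interdisciplinary.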

Clustering based on dual-perspective indicators
To depict the features of authors' disciplines and their roles in ICAs quantitatively, six indicators are established to measure variety, similarity and balance for disciplines and for roles. Based on these indicators, an agglomerative hierarchical clustering algorithm can be applied to divide ICAs into different clusters. We use this algorithm because it adopts a bottom-up strategy, which allows us to select the most appropriate number of clusters with the assistance of a clustering tree. The rest of this subsection introduces the dual-perspective indicators in detail.
The variety derives from the concept of biodiversity in biology (McCann, 2000), and is used to quantify the number of differentiated individuals. Stirling's three-dimensional framework (variety-balance-disparity) for measuring diversity in science, technology and society lays a solid foundation for our work (Stirling, 2007). Suppose an article is authored by m researchers, and author A_i (i = 1, 2, …, m) has n_i disciplines or roles. Then the discipline/role variety of an article is calculated as follows:
The similarity measures the disparity among authors' disciplines and their roles for each article. Suppose an article is authored by m researchers, and the 0-1 distribution of author A_i (i = 1, 2, …, m) over the 19 disciplines or 14 roles is the vector n_i. The average cosine similarity of author A_i with the other m-1 authors is s_i. Then the discipline/role similarity of an article is calculated as follows:
The concept of balance originates from the Gini coefficient in economic theory (Gini, 1912), and is used to reflect the overall degree of concentration of disciplines or roles for each article. Suppose an article is authored by m researchers. Count the frequencies with which the 19 disciplines or 14 roles occur in the authors' discipline/role assembles in the article, which gives a frequency distribution of disciplines or roles. Then normalize the frequencies and sort the items in ascending order. The discipline/role balance is calculated as follows: where x is the value after sorting, n is the number of items in the list, and i is the index of x.
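The three indicators can be illustrated with a small Python sketch. The exact formulas are not reproduced in the text, so every aggregation below is an assumption: variety is taken as the mean of n_i over the m authors, similarity as the mean of s_i, and balance as one minus a standard Gini coefficient computed over the sorted, normalized frequencies of the whole vocabulary (including items with zero frequency). All function names are hypothetical.

```python
import math
from collections import Counter


def variety(author_items):
    """Assumed form: mean number of disciplines (or roles) per author."""
    return sum(len(items) for items in author_items) / len(author_items)


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity(author_items, vocabulary):
    """Assumed form: mean over authors of s_i, the average cosine similarity
    between author i's 0-1 vector and those of the other m-1 authors."""
    vecs = [[1 if t in items else 0 for t in vocabulary] for items in author_items]
    m = len(vecs)  # the study keeps articles with 2-14 authors, so m >= 2
    s = [
        sum(cosine(vecs[i], vecs[j]) for j in range(m) if j != i) / (m - 1)
        for i in range(m)
    ]
    return sum(s) / m


def gini(values):
    """Standard Gini coefficient, with x sorted ascending and i = 1..n."""
    x = sorted(values)
    n, total = len(x), sum(x)
    if total == 0:
        return 0.0
    return sum((2 * i - n - 1) * xi for i, xi in enumerate(x, start=1)) / (n * total)


def balance(author_items, vocabulary):
    """Assumed form: 1 - Gini of the normalized item frequencies."""
    counts = Counter(t for items in author_items for t in items)
    total = sum(counts[t] for t in vocabulary)
    return 1.0 - gini([counts[t] / total for t in vocabulary])


# Toy article: three authors, a four-discipline vocabulary (made-up data).
authors = [{"Biology", "Medicine"}, {"Biology"}, {"Physics"}]
vocab = ["Biology", "Medicine", "Physics", "Chemistry"]
print(variety(authors), similarity(authors, vocab), balance(authors, vocab))
```

Taking balance as one minus the Gini coefficient makes higher values mean a more even distribution, which matches how "balance" is read in the results; if the original indicator is the Gini coefficient itself, the interpretation is simply reversed.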

Fine-grained profiling of different clusters
After the ICAs are divided into clusters based on the dual-perspective indicators, a fine-grained analysis of authors' disciplines and their roles is conducted to better understand the key characteristics of each cluster, so that the clusters can be named and the different interdisciplinary collaboration patterns derived.
Specifically, in an article, each author is labelled with a discipline assemble, which is composed of several disciplines, and a contribution role assemble, which is composed of several roles. As shown in Figure 1, this structure reflects how authors from different disciplines play different roles in a study. Two sets of questions are investigated further. Firstly, what are the structural characteristics of authors' disciplines and their roles? Namely, how do different disciplines or roles combine at the author level? Secondly, what are the correlation characteristics of authors' disciplines and their roles? Namely, do the tendencies toward different roles vary among authors from different disciplines? For co-authors playing a certain role in an article, what disciplines are they from? Mixed methods of network analysis and statistical analysis are applied to answer these questions. The Salton method (Salton & Bergmark, 1979) is used to normalize values when depicting the relationships between disciplines, between roles, or between disciplines and roles, so as to reduce the interference of the scale of the disciplines or roles themselves.
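As an illustration of the normalization step, the sketch below builds co-occurrence weights from a list of assembles and normalizes them with Salton's cosine measure, c_ij / sqrt(c_i * c_j), where c_ij is the number of assembles containing both items and c_i, c_j are the items' own occurrence counts. That the study uses exactly this form of the Salton method is an assumption, and the function name `salton_cooccurrence` is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def salton_cooccurrence(assembles):
    """Return {(item_a, item_b): Salton-normalized co-occurrence weight}.

    `assembles` is an iterable of item collections, one per author
    (disciplines or roles). Pairs are stored in alphabetical order.
    """
    occurrences = Counter()
    cooccurrences = Counter()
    for items in assembles:
        unique = sorted(set(items))
        occurrences.update(unique)
        cooccurrences.update(combinations(unique, 2))
    return {
        (a, b): c / math.sqrt(occurrences[a] * occurrences[b])
        for (a, b), c in cooccurrences.items()
    }


# Made-up discipline assembles of four authors.
weights = salton_cooccurrence([
    {"Biology", "Medicine"},
    {"Biology", "Medicine"},
    {"Biology", "Physics"},
    {"Medicine"},
])
print(weights)
```

Dividing by the geometric mean of the two occurrence counts keeps very frequent items, such as Biology and Medicine in this dataset, from dominating the network purely because of their size.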

Results and discussion
Following the research framework in Figure 1, this section presents the results of the clustering and the fine-grained analysis. The former shows how we distinguish articles with different interdisciplinary collaboration patterns, while the latter discloses the core characteristics of each pattern based on authors' disciplines and contributions. Our interpretation of these patterns is discussed last.

Clustering the ICAs based on dual-perspective indicators
After the six indicators were calculated for each ICA, they were first normalized so that differences in scale would not affect comparisons among them. The correlations between indicators were then tested, and information redundancy was found (see Figure 2 (a)). Principal component analysis was therefore applied to reduce dimensionality and simplify the data. On this basis, an agglomerative hierarchical clustering algorithm with Ward linkage was adopted to divide the 14,744 ICAs into clusters. According to the clustering tree in Figure 2 (b), four distinct clusters emerge, with 3710, 3891, 2350 and 4973 articles respectively.
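The pipeline described above (normalization, principal component analysis, Ward-linkage agglomerative clustering) might look roughly as follows with NumPy and SciPy. This is a sketch under stated assumptions rather than the authors' implementation: z-score normalization, two retained components, and the function name `cluster_icas` are all illustrative choices that the text does not specify.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_icas(indicators, n_components=2, n_clusters=4):
    """Cluster articles from an (n_articles, 6) indicator matrix.

    Returns (labels, tree): cluster labels starting at 1, and the SciPy
    linkage matrix that a dendrogram can be drawn from.
    """
    X = np.asarray(indicators, dtype=float)

    # Assumed normalization: z-score each indicator column.
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    Z = (X - X.mean(axis=0)) / std

    # PCA via SVD of the standardized (hence centered) data.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ vt[:n_components].T

    # Bottom-up clustering with Ward linkage, cut into n_clusters groups.
    tree = linkage(scores, method="ward")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return labels, tree
```

The returned linkage matrix can be passed to `scipy.cluster.hierarchy.dendrogram` to draw a clustering tree comparable to Figure 2 (b), which is where the choice of four clusters would be made by inspection.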
To confirm cluster validity, the Silhouette coefficient was calculated for each ICA in each cluster. In Figure 2 (c), the vertical axis represents the number of ICAs, the horizontal axis represents the Silhouette coefficient of each ICA, and the dashed line represents the average Silhouette coefficient. Overall, most values in the four clusters lie to the right of the vertical line at zero, indicating that the clusters are well differentiated from each other. The six indicators of ICAs are further compared across the four clusters (see Figure 3), and the significance of the difference is labelled for each cluster pair. The differences are always significant except for the "role balance" between clusters 1 and 3.
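For reference, the per-article Silhouette coefficient is s = (b - a) / max(a, b), where a is the mean distance from a point to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The NumPy sketch below computes it directly; Euclidean distance and the function name `silhouette_per_sample` are assumptions, and the full pairwise distance matrix makes it suitable only for small samples.

```python
import numpy as np


def silhouette_per_sample(X, labels):
    """Silhouette coefficient of every point, using Euclidean distance."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Full pairwise distance matrix (memory grows quadratically).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # common convention: a singleton cluster scores 0
        # a: mean distance to the other members of the point's own cluster
        a = dist[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            dist[i, labels == k].mean()
            for k in np.unique(labels)
            if k != labels[i]
        )
        scores[i] = (b - a) / max(a, b)
    return scores


# Two tight, well-separated groups give values close to 1.
points = [[0, 0], [0, 1], [10, 0], [10, 1]]
print(silhouette_per_sample(points, [0, 0, 1, 1]))
```

Values near 1 indicate a point that sits well inside its cluster, values near 0 a point on the border between clusters, and negative values a point that is closer on average to another cluster. For a dataset of this study's size, `sklearn.metrics.silhouette_samples` in scikit-learn would be the more practical choice.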
In terms of authors' disciplines, the variety and balance of ICAs in clusters 1 and 2 are low and the similarity is high, while the opposite is true for clusters 3 and 4. The difference implies that authors of ICAs in the former two clusters have relatively monotonous, homogeneous and unbalanced discipline backgrounds, whereas those in the latter two clusters tend to have more diverse discipline backgrounds.
In terms of authors' roles, the variety and balance of ICAs in clusters 1 and 3 are higher, while the opposite is true for clusters 2 and 4. As for similarity, however, clusters 1 and 3 behave in opposite ways: although authors' roles are on the whole varied and evenly distributed in ICAs from both clusters, roles in cluster 1 differ more across authors, whereas roles in cluster 3 are more homogeneous across authors.

Profiling the dual-perspective features of different clusters
To clarify the differences among clusters more comprehensively, this section conducts a fine-grained analysis based on authors' disciplines and roles. Specifically, we examine how different disciplines or roles combine and how disciplines and roles relate to each other.

Structural characteristics of disciplines and roles
Co-occurrence networks are used to show how different disciplines or roles combine (see Figures 4 and 5).
In terms of authors' disciplines, Biology and Medicine dominate the networks, with large node sizes and strong connections. The characteristics of the other nodes and links vary among the four clusters. In clusters 1 and 2, co-occurrence of nodes of the same type is relatively prominent, such as Biology-Medicine, Chemistry-Physics-Engineering and Art-Philosophy. However, connections between nodes of different types are weak, which means that the discipline backgrounds of authors of ICAs in these clusters are somewhat monotonous. In comparison, authors of ICAs in clusters 3 and 4 tend to have more diverse discipline backgrounds.

Figure 4: Co-occurrence networks of authors' disciplines
In terms of authors' roles, more authors tend to play roles like Writing - review & editing. Although the overall structure of the networks appears more balanced than that of the discipline co-occurrence networks, the networks vary considerably across clusters. In clusters 1 and 3, authors tend to play roles of various types. In contrast, authors' roles in ICAs in clusters 2 and 4 are more monotonous. Roles like Resources and Validation, or T-type roles, seldom co-occur with other roles in an author's role assemble.

Table 1. Distribution of discipline/role assembles

Correlation characteristics between disciplines and roles
Figure 6 reveals the relationship between authors' disciplines and their roles. The upper subplots show which roles authors from each discipline tend to play. BM-type disciplines dominate nearly all roles in clusters 1 and 2, while authors with non-BM-type disciplines participate more in C-type or I-type roles, which involve substantive contribution, in clusters 3 and 4. The lower subplots show which role assembles authors with different discipline assembles tend to have. The correlation between BM, BM+E1 and C+I+L+R, C+I+R, I in cluster 2, and that between BM, BM+E1 and C+I+L+R, C+T+I+L+R in cluster 3, are especially strong. The tendency of different discipline assembles toward different role assembles in clusters 1 and 4 is relatively balanced.
Since tendencies toward different roles (assembles) vary across authors with different disciplines (assembles), we further examine which disciplines the co-authors playing a certain role in an article come from. As shown in Table 2, for C-type and I-type roles, co-authors from BM+E1 disciplines dominate in clusters 1 and 2, whereas more authors with other-type disciplines participate in clusters 3 and 4. For L-type and T-type roles, co-authors in clusters 1 and 2 always come from BM-type disciplines only.
In particular, the discipline backgrounds of co-authors with the Writing - review & editing role are always more diverse.

Table 2. Distribution of discipline assembles of authors playing different roles

Interpreting interdisciplinary collaboration patterns
Based on the above analysis of both the indicators and the specific contents from the discipline-contribution perspective, four interdisciplinary collaboration patterns can be identified (see Figure 7), corresponding to clusters 1 to 4 respectively.
Bee Colony describes a pattern in which authors with a common knowledge base fully divide their labour and duties, just like a bee colony with queen, drone and worker bees, which belong to the same species but maintain a multi-functional living system. In this pattern, BM-type disciplines dominate, and authors' roles are heterogeneous and evenly distributed across most roles.
Sheep Flock describes a pattern in which authors with a common knowledge base make shared contributions, just like sheep in a flock resembling and following each other's thoughts, emotions and actions, albeit with limited functions and outputs. In this pattern, BM-type disciplines dominate, and authors' roles are homogeneous but concentrated on several roles.
Intercropping describes a pattern in which authors with diverse knowledge bases make shared contributions, just like cultivating two or more crops simultaneously in the same field to jointly produce a greater yield on a given piece of land by making use of resources or ecological processes. In this pattern, multi-type disciplines dominate, and authors' roles are homogeneous and evenly distributed across most roles.
Rainforest describes a pattern in which authors with diverse knowledge bases fully divide their labour and duties, just like a rainforest with a great variety of species with different functions, which together form a comprehensive system whose main functions are absorbing carbon dioxide, a greenhouse gas, and increasing local humidity. In this pattern, multi-type disciplines dominate, and authors' roles are heterogeneous but concentrated on several roles.

Conclusion
This study has examined interdisciplinary collaboration closely and established a new framework for categorizing and profiling interdisciplinary collaboration patterns. With the combination of new data sources and multiple methods, four interdisciplinary collaboration patterns (bee colony, sheep flock, intercropping and rainforest) are found, each with distinct characteristics of authors' disciplines and contributions. However, limited by space, we have not specified how this dual-perspective framework can be further applied. The multi-dimensional features of the outcomes of these four patterns will be analysed and compared in future work, so as to serve as a reference for relevant policymaking.

Open science practices
The data used in this study are openly available. The basic information of the 20,542 research articles and their author contribution statements (our original dataset) can be retrieved from the official website of the Public Library of Science (PLoS) (https://plos.org/). With the digital object identifier (DOI) of each article, the discipline background of each author in 19,589 (about 95%) articles can be retrieved from OpenAlex (https://docs.openalex.org/). As for software and source code, we combined Python, Gephi, R and similar tools to conduct our analysis and visualize the results. All tools can be downloaded from the Internet, and all code is self-developed (mostly applications of existing packages). If required, please contact us via caozhe@whu.edu.cn.