Distance matters: The causal effect of coauthor mobility on scientific collaboration

This study examines the impact of geographic distance on scientific collaboration, with a focus on the mobility of coauthors as a proxy for changes in distance. While advancements in information, communication, and transportation technologies have facilitated collaboration across distances, the extent to which these means of communication compensate for the lack of face-to-face interaction remains a subject of ongoing debate. By utilizing a matching method to construct two groups of scientist pairs and comparing the changes in their collaboration performance before and after mobility, the causal effect of geographic change on collaboration is estimated. The findings indicate that when collaborators who previously resided in the same city relocate to different cities, the productivity and citation impact of their co-publications are significantly reduced. These results suggest that physical proximity still plays a crucial role in scientific collaboration despite the availability of technological means for remote collaboration.


Introduction
Scientific collaboration has emerged as the dominant approach to knowledge production in both the natural and social sciences (Wuchty et al., 2007). The escalation of complexity and magnitude of scientific problems (Milojević, 2014), the narrowness of expertise due to the burden of knowledge (Jones, 2009), and the dependence on equipment, facilities, and other resources (Catalini et al., 2020) have necessitated collaborative research. By engaging in joint problem-solving, learning, and resource sharing, scientists are better able to address complex issues. Numerous studies have demonstrated that scientific collaboration stimulates innovation and enhances impact (Wuchty et al., 2007; Ding, 2011).
Several factors may hinder the realization of the benefits of scientific collaboration. One of the main hindrances is geographical distance, which can lead to communication difficulties and increased travel expenses. However, the development of information and communication technologies (ICTs) and transportation has made remote collaboration more accessible and cost-effective. As a result, scientific collaboration is increasingly transcending geographical boundaries (Jones et al., 2008). In fact, research indicates that scientific teams are becoming more dispersed (Adams et al., 2005; Hoekman et al., 2010). Some scholars have even proposed the "death of distance" hypothesis (Cairncross et al., 1997), which suggests that distance no longer has a significant impact on collaboration.
However, some scholars have questioned the validity of the "death of distance" hypothesis. Although ICTs can mitigate the inconvenience associated with distance and facilitate communication to some extent, the benefits of in-person, offline interaction cannot be entirely replaced by virtual, remote interaction. Face-to-face interactions foster a more flexible environment that encompasses not only language but also behavior, which can better facilitate the transfer of tacit knowledge (Olson & Olson, 2000; Werker & Ooms, 2020).
Despite the advancement of ICTs and transportation, it remains inconclusive whether geographic distance still affects scientific collaboration. This study aims to explore the causal effect of geographic change on collaboration. Changes in the geographic proximity of collaborators due to scientific mobility, such as cross-country or cross-region mobility, can affect the original collaborative relationship by introducing new collaborators and increasing the cost of working with the original collaborators (Jonkers & Tijssen, 2008). We used the move of a collaborator from the same city to another city as a proxy for geographic change and examined its effect on the collaboration productivity and quality of scientist pairs.

Spatial characteristics of collaboration
The development of ICTs and transportation has lowered the cost and improved the convenience of travel within and between cities and nations (Catalini et al., 2020). Additionally, instant messaging tools have made long-distance collaboration more accessible, allowing scientists to work together online without face-to-face meetings. Technological advances have transformed the spatial characteristics of scientific collaboration in two main ways. First, globalization has led to an increase in the average distance between collaborators in recent decades (Hoekman et al., 2010; Waltman et al., 2011). However, longer distances have been observed to decrease collaboration between scientists (Hoekman et al., 2010; Luo et al., 2018).

Geographic distance and collaboration
The impact of geographic distance on collaboration has been discussed for years (Katz, 1994). Studies have found that geographic distance can hinder collaboration and negatively affect knowledge spillovers (Fernández et al., 2016; van der Wouden & Youn, 2023). Geographic proximity is particularly important for collaborations between institutions with diverse backgrounds (Ponds et al., 2007). Offline meetings serve as a starting point for most collaborations (Freeman et al., 2014). Face-to-face interactions are crucial for establishing new partnerships or maintaining long-distance collaborations. The impact of geographic distance on collaboration is primarily attributed to the friction of distance, which results in higher communication and search costs when there is a lack of face-to-face interaction between collaborators (Hoekman et al., 2010). To meet, collaborators have to bear the time, financial, and energy costs. Several studies have estimated the effect of travel costs on scientific collaboration through special event shocks. For instance, Catalini et al. (2018) conducted a quasi-experiment and found that the introduction of low-cost airlines increased the quantity and quality of collaboration. Similarly, Hu et al. (2022) examined the impact of direct flights between China and the United States on scientific collaboration and found that it boosted influential papers in China.
However, counterarguments suggest that long-distance collaboration may lead to improved academic outcomes. Long-distance collaboration enables the combination of diverse skills and resources, which can foster innovation (Fleming, 2001). Furthermore, providing dissemination opportunities to remote research centers can improve the citation impact of research papers (Abramo et al., 2020). Nomaler et al. (2013) examined international co-authored papers in Europe and found that the citation impact of papers was positively correlated with the geographic distance between collaborating countries.

Scientific mobility and collaboration
Scientific mobility has been demonstrated to have an impact on both the productivity and collaboration of scientists, partly due to changes in geographic distance. After a scientist relocates to a new institution, the lack of frequent communication and face-to-face interaction caused by the greater distance makes it difficult to maintain the connections and other social capital established at the original institution (Groysberg et al., 2008; Jaffe et al., 1993). However, evidence from several studies suggests that scientists can not only form new collaborations after mobility but also sustain previous collaborations (Liu & Hu, 2022; Zhao et al., 2022). This phenomenon is attributed to advancements in ICTs that have facilitated communication and coordination (Frenken et al., 2009).

Data collection
To track the mobility of scientists, we used the ORCID snapshot of February 10th, 2022. The ORCID platform provides CVs and unique identifiers of researchers, effectively addressing the issue of name disambiguation among them. We extracted the education and employment records of individual researchers. Records lacking affiliation or with missing starting and ending times were excluded, culminating in a total of 13,971,340 records for 4,344,940 scientists. We utilized co-authored papers to capture collaboration between scientists. The ORCID identifiers of scientists were used for matching articles. The core Web of Science dataset was used for obtaining articles from January 2008 to December 2021 since the ORCID identifiers date back to 2008 in the database.
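The record-filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the field names (`orcid`, `organization`, `start_year`, `end_year`) are hypothetical, the sample records are made up, and the sketch assumes a record is dropped when either date is missing, which the text does not spell out.

```python
def is_usable(record: dict) -> bool:
    """Keep a record only if it names an affiliation and has both dates.

    Assumption: a record missing EITHER the start or the end date is dropped.
    """
    return (
        bool(record.get("organization"))
        and record.get("start_year") is not None
        and record.get("end_year") is not None
    )


# Made-up records for illustration only; the ORCID iDs are placeholders.
records = [
    {"orcid": "0000-0000-0000-0001", "organization": "Institute A",
     "start_year": 2010, "end_year": 2015},
    {"orcid": "0000-0000-0000-0001", "organization": None,
     "start_year": 2015, "end_year": 2018},
    {"orcid": "0000-0000-0000-0002", "organization": "Institute B",
     "start_year": 2012, "end_year": None},
    {"orcid": "0000-0000-0000-0002", "organization": "Institute C",
     "start_year": 2016, "end_year": 2020},
]

usable = [r for r in records if is_usable(r)]
scientists = {r["orcid"] for r in usable}
print(f"{len(usable)} usable records for {len(scientists)} scientists")
```

Counting distinct identifiers after filtering mirrors how the text reports both a record total and a scientist total.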

Research design
In this study, we focused on a set of scientist pairs consisting of scientists who have not moved for a long time and their collaborators who moved during the collaboration. We estimated the impact of geographic change on scientific collaboration by comparing the variations in the collaborative performance of these pairs before and after the collaborator's mobility. We used whether the collaborators moved to other cities as a proxy for geographic change. Because of differences in geographical distance, administrative systems, transportation facilities, and other factors, traveling between cities is often more costly than traveling within the same city (Tang et al., 2022). The scientists and collaborators included in the sample were required to satisfy the following three criteria:
(1) The scientist has not moved to another institution for a specified period.
(2) The scientist had a collaborator in a different institution in the same city who moved to another institution in the same city during this period. The pair had co-published papers prior to the mobility.
(3) The scientist had a collaborator in a different institution in the same city who moved to another city during this period. The pair had co-published papers prior to the mobility.
Based on these criteria, the pairs of scientists were identified and divided into two groups. Figure 1 provides a schematic illustration of our identification and matching process. Specifically, S1 (a scientist without mobility), S2 (a collaborator of S1 with same-city mobility), and S3 (a collaborator of S1 with cross-city mobility) were originally affiliated with different institutions located in City A. S2 moved to another institution within City A at time T1, while S3 moved to an institution situated in City B at time T2. To limit the potential influence of time-varying factors, we restricted the time difference between the mobility of S2 and S3 to a maximum of three years. The S1-S2 pairs were classified as Group 1, which is used to evaluate the effect of same-city mobility of collaborators by calculating the difference in collaboration performance before and after the mobility. The S1-S3 pairs were classified as Group 2, which is used to evaluate the impact of cross-city mobility. The effect of the geographic change is captured by the difference between the two effects.
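One way to picture the grouping and matching step is the sketch below. It is only an illustration under stated assumptions: the `Move` record and its fields are hypothetical, the sample moves are made up, and the sketch pairs every same-city mover with every cross-city mover of the same immobile scientist, whereas the text does not say whether matching is one-to-one.

```python
from dataclasses import dataclass
from itertools import product

MAX_GAP_YEARS = 3  # maximum difference between the two collaborators' move years


@dataclass(frozen=True)
class Move:
    stayer: str        # the immobile scientist (S1)
    collaborator: str  # the mobile collaborator (S2 or S3)
    from_city: str
    to_city: str
    year: int


def group_of(move: Move) -> int:
    """Return 1 for same-city mobility (S1-S2), 2 for cross-city mobility (S1-S3)."""
    return 1 if move.from_city == move.to_city else 2


def matched_pairs(moves, max_gap=MAX_GAP_YEARS):
    """Match same-city and cross-city movers that share the same S1.

    Assumption: all combinations within the allowed year gap are kept.
    """
    by_stayer = {}
    for m in moves:
        by_stayer.setdefault(m.stayer, {1: [], 2: []})[group_of(m)].append(m)
    matches = []
    for groups in by_stayer.values():
        for same, cross in product(groups[1], groups[2]):
            if abs(same.year - cross.year) <= max_gap:
                matches.append((same, cross))
    return matches


# Made-up moves for illustration only.
moves = [
    Move("S1", "S2", "City A", "City A", 2012),
    Move("S1", "S3", "City A", "City B", 2014),
    Move("S1", "S4", "City A", "City C", 2019),
]
matches = matched_pairs(moves)
for same, cross in matches:
    print(same.collaborator, "matched with", cross.collaborator)
```

Here S4 also moved across cities, but its move is seven years after that of S2, so only the S2 and S3 moves are matched.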

Variables

Dependent variables
We used collaboration performance as the dependent variable and quantified it in two aspects: productivity and citation impact. Productivity was measured by the number of publications co-authored by the pair in the three years before or after the collaborator's mobility (# Publication), and impact was measured by the average citations of the co-publications in the three years before or after the mobility. Citations were counted within windows of three years (C3) and five years (C5).
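The windowed measures described above could be computed along these lines. This is a hedged sketch: the function name, the tuple layout, and the sample papers are all made up for illustration, and two choices are assumptions not stated in the text, namely that papers published in the move year itself are excluded and that the average is left undefined when a window holds no papers.

```python
from statistics import mean

WINDOW_YEARS = 3  # three years before or after the move, as in the text


def window_metrics(papers, move_year, window=WINDOW_YEARS):
    """Summarise a pair's co-publications before and after the move.

    papers: list of (publication_year, citations) tuples, where citations
    is the count within a fixed citation window (for example C3).
    Assumption: papers published in the move year are left out.
    """
    before = [c for year, c in papers if move_year - window <= year < move_year]
    after = [c for year, c in papers if move_year < year <= move_year + window]

    def summarise(citations):
        return {
            "n_publications": len(citations),
            # Assumption: no papers in the window means no average.
            "avg_citations": mean(citations) if citations else None,
        }

    return {"before": summarise(before), "after": summarise(after)}


# Made-up co-publications for one pair: (year, three-year citations).
papers = [(2010, 12), (2011, 8), (2013, 4), (2016, 30)]
metrics = window_metrics(papers, move_year=2012)
print(metrics)
```

In this toy case the 2016 paper falls outside the three-year window after a 2012 move and is not counted.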

Independent variables
The primary independent variable is whether the pair collaborates after the geographic change (AfterMoveFar). It is a dichotomous variable that takes a value of one when the collaboration took place after the collaborator's cross-city mobility and zero otherwise. There are two other independent variables: MoveFar indicates whether the pair has undergone a geographic change, and After indicates whether the collaboration took place before or after the mobility.
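The three indicators can be coded as in the short sketch below. The helper name `indicators` is hypothetical; the coding follows the definitions in the text, under which AfterMoveFar equals one only when both MoveFar and After equal one, so it can be written as their product.

```python
def indicators(cross_city: bool, after_move: bool) -> dict:
    """Code the three dummy variables for one pair-period observation."""
    move_far = int(cross_city)   # 1 if the collaborator moved to another city
    after = int(after_move)      # 1 if the period is after the move
    return {
        "MoveFar": move_far,
        "After": after,
        "AfterMoveFar": move_far * after,  # 1 only after a cross-city move
    }


# All four combinations of group and period.
cells = [indicators(c, a) for c in (False, True) for a in (False, True)]
for cell in cells:
    print(cell)
```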

Control variables
To mitigate the influence of other factors that could affect collaboration performance, we included a set of control variables: (1) the number of publications that the mobile collaborator published before the pair began to collaborate (CollaboratorProductivity), to control for the collaborator's past productivity (Fernandez-Zubieta et al., 2013); and (2) the average number of authors on the papers published by the mobile collaborator (AvgTeamsize), to control for collaboration propensity (Lariviere et al., 2014; Abramo et al., 2014).

Estimation strategy
We estimate the effect using an OLS linear model that includes an S1 fixed effect:

$$\text{CollaborationPerformance} = \alpha + \beta_0\,\text{AfterMoveFar} + \beta_1\,\text{MoveFar} + \beta_2\,\text{After} + \beta_3\,\text{Controls} + \gamma + \varepsilon, \tag{1}$$

where AfterMoveFar is an indicator variable that takes the value of one after the collaborators move cross-city; MoveFar refers to whether the mobility is cross-city or in the same city; After refers to whether the collaboration is conducted after the collaborators' mobility; Controls denotes the control variables; γ is an S1 fixed effect to control for unobservable, time-invariant differences of scientists who do not move to other cities (S1 in Figure 1); and ε is an error term.
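To build intuition for the coefficient on AfterMoveFar, the sketch below computes a difference-in-differences from group means. It is a simplified stand-in, not the estimation used in the study: it omits the control variables and the S1 fixed effect, and the observations are made up. In that stripped-down case (intercept, MoveFar, After, and their product only), the OLS coefficient on the product term equals the difference of cell means computed here.

```python
from statistics import mean

# Made-up observations for illustration: (MoveFar, After, outcome).
observations = [
    (0, 0, 4.0), (0, 0, 6.0),  # same-city movers, before the move
    (0, 1, 4.0), (0, 1, 5.0),  # same-city movers, after the move
    (1, 0, 5.0), (1, 0, 7.0),  # cross-city movers, before the move
    (1, 1, 2.0), (1, 1, 4.0),  # cross-city movers, after the move
]


def cell_mean(rows, move_far, after):
    """Average outcome in one group-by-period cell."""
    return mean(y for m, a, y in rows if m == move_far and a == after)


def diff_in_diff(rows):
    """Change for cross-city pairs minus change for same-city pairs."""
    cross_change = cell_mean(rows, 1, 1) - cell_mean(rows, 1, 0)
    same_change = cell_mean(rows, 0, 1) - cell_mean(rows, 0, 0)
    return cross_change - same_change


estimate = diff_in_diff(observations)
print(f"difference-in-differences estimate: {estimate:+.2f}")
```

The same-city group serves as the baseline, so any change common to both groups after a move cancels out and only the extra change associated with moving to another city remains.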

Results
Table 1 displays the descriptive statistics for the variables. The dependent variables are # Publication, C3, and C5. # Publication ranges from 0 to 247, with a mean of 3 and a standard deviation of 13. The average values of C3 and C5 are 25 and 55, respectively. The geographical distance changed for 45.9% of the scientist pairs, while 54.1% of the pairs remained in the same city.
Table 2 presents the estimated effects of geographic change on collaboration performance. Column 1 shows the results when only the independent variables were included in the model. The coefficient for the variable AfterMoveFar is negative and statistically significant. We sequentially added all the control variables and the S1 fixed effect in Column 2 and Column 3, respectively. The coefficient for the variable AfterMoveFar is -0.290 (p<0.01) in Column 3, suggesting that the geographic change leads to a 29% decrease in the number of co-publications by the pairs. Columns 4 and 5 provide the regression results using C3 and C5 as the dependent variables, respectively. Both coefficients for AfterMoveFar are negative and significant at the 0.01 level, indicating that the geographic change results in a decrease of 35.4% and 47.2% in the three-year and five-year citations of the co-publications by the pairs.
To test the robustness of the results, we restricted the time difference between the mobility of S2 and S3 to one year and two years and re-ran the benchmark model. The results shown in Table 3 are consistent with the benchmark model. The coefficients of AfterMoveFar are negative and statistically significant at the 0.01 level, and the magnitude of the coefficient does not change significantly.
Table 3. Model estimation results with a stricter time difference between the collaborators' moving years.
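As a reading aid for the percentage interpretation of the AfterMoveFar coefficient, the sketch below compares two common conversions. It rests on an assumption the text does not state: that the dependent variable enters the model in logarithms. Under that assumption, multiplying the coefficient by 100 is the usual first-order approximation, while exp(β) − 1 gives the exact proportional change. The function name is hypothetical.

```python
import math


def percent_change_from_log_coef(beta: float) -> float:
    """Exact percent change implied by a coefficient on a log outcome.

    Assumption: the outcome is log-transformed, which the text does not state.
    """
    return (math.exp(beta) - 1.0) * 100.0


beta = -0.290  # AfterMoveFar coefficient reported for # Publication
approximate = beta * 100.0
exact = percent_change_from_log_coef(beta)
print(f"approximate: {approximate:.1f}%  exact: {exact:.1f}%")
```

The two figures diverge as the coefficient grows in magnitude, so the choice of conversion matters more for the citation outcomes than for the publication count.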

Discussion and Conclusions
This study examines the influence of geographic change on scientific collaboration due to the mobility of scientist collaborators. We focused on pairs of scientists, where one scientist has not relocated for a prolonged period, and their collaborator has either relocated within or across cities. For each immobile scientist, a collaborator with cross-city mobility is matched with another collaborator with same-city mobility. By comparing the changes in collaboration performance before and after the mobility, we estimated the causal effect of geographic change on collaboration. The results indicate that geographic change has a negative effect on scientific collaboration. Specifically, the number and citation impact of co-publications by scientist pairs decreased after collaborators moved to other cities.
The geographic mobility of collaborators affects scientific collaboration in two distinct ways. First, collaboration productivity decreases after the cross-city mobility of collaborators. Geographic proximity is associated with collaboration opportunities (Freeman et al., 2014). Long-distance collaboration brings high communication and travel costs, which significantly reduce collaboration productivity. Second, long-distance collaboration lowers the quality of collaboration. Transmission of tacit knowledge, in contrast to codified knowledge, relies heavily on face-to-face interaction, making it challenging to disseminate effectively through digital means (Hu et al., 2022). Despite advancements in virtual communication, face-to-face collaboration is difficult to replicate, which hinders mutual learning among scientists (van der Wouden & Youn, 2023) and the generation of creative ideas (Brucks & Levav, 2022; Horvát & Uzzi, 2022). Conversely, more frequent offline meetings can improve the citation impact of co-publications (Catalini et al., 2020; Hu et al., 2022).
The contribution of this study lies in providing causal evidence for the argument that distance plays a crucial role in collaboration. The findings have substantial implications for scientific collaboration, as they raise awareness among scientists of the significance of geographical proximity.
The study has several limitations. First, the ORCID data used in this study have inherent limitations. The registered ORCID users are typically younger, and the distribution of countries is uneven (Bohannon, 2017), which may introduce some biases in our findings. Second, we used whether scientists move to a different city as a proxy for geographic change. However, different cities cover different geographic areas. In future work, we will utilize geographic coordinates to calculate the average geographic distance. Third, we did not account for the flow direction of collaboration, which may have varying impacts on collaboration performance due to changes in the research environment and institutional reputation (Deville et al., 2014).
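For the coordinate-based distance mentioned as future work, one standard option is the haversine great-circle formula, sketched below. This is an illustration of a common technique, not a method described in the study: the function name is hypothetical, the Earth is treated as a sphere with a mean radius of 6371 km, and how institution coordinates would be obtained is left open.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# A quarter of the equator: from (0, 0) to (0, 90).
quarter_equator = haversine_km(0.0, 0.0, 0.0, 90.0)
print(f"{quarter_equator:.1f} km")
```

A continuous distance of this kind would distinguish a move to a neighbouring city from a move across a continent, which the binary same-city versus cross-city proxy cannot do.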

Open science practices
ORCID (Open Researcher and Contributor ID) is an open science data source that provides researchers with a unique identifier to distinguish them from other researchers. The ORCID platform provides a snapshot of all public data in the Registry each year. The 2022 ORCID dataset we used was downloaded from https://orcid.figshare.com/articles/dataset/ORCID_Public_Data_File_2022/21220892/6.

Figure 1: Illustration of sample identification and matching process. S1 (in green) refers to the scientist who stays in institution I1 (in green) for a long period of time. S2 (in blue) is a collaborator of S1, who moves from institution I2 (in blue) in City A (light gray rectangle) to institution I3 (in light blue) in the same city at time T1. S3 (in orange) is another collaborator of S1, who moves from institution I4 (in orange) in City A to institution I5 (in light orange) in City B (yellow rectangle) at time T2. Both the pairs S1-S2 and S1-S3 published co-publications before the mobility.

Table 1. Descriptive statistics of the variables.

Table 2. The effect of geographical distance changes on collaboration performance.
* p < 0.1, ** p < 0.05, *** p < 0.01

Further, we calculated the number of co-publications and the citations per article in the four and five years before and after the mobility of collaborators and re-ran the model by replacing the dependent variables. The results shown in Table 4 are still consistent with those of the benchmark model.