A Large Scale Perspective on Open Access Publishing: Examining Gender and Scientific Disciplines in 38 OECD countries

Gender inequality is a persistent issue in scientific publishing, but recent studies suggest that Open Access (OA) publishing can increase the visibility and impact of female scientists' research. To provide a comprehensive, large-scale view of OA publishing considering gender and individual scientific disciplines, we analyzed an OpenAlex dataset. Our final sample contained over 20 million publications when publications with closed OA status are included, and over 8 million when they are excluded. Using the Genderize.io tool, we determined the gender of the authors. Results showed that female scientists publish more in OA mode, especially in Gold OA. Male scientists, on the other hand, tend to use Green OA. Our results also revealed a decreasing trend of Bronze OA and a growing trend of Gold and Hybrid OA. Biology, Physics, and Mathematics have the largest share of OA publications. Overall, our study provides a global perspective on gender and OA publishing and contributes to efforts to promote gender equality and inclusiveness in science and research.


Introduction
The current state of OA has been shaped by many initiatives, policies, and funder conditions. OA is now a common instrument for sharing results, and not only for projects supported by funders and programmes such as Horizon Europe, Research Councils UK, the US National Science Foundation or the Wellcome Trust.
The principle of open access to research and development (R&D) results is also being incorporated into initiatives that aim to change the evaluation of science and research, such as the San Francisco Declaration on Research Assessment (DORA, 2023), the Hong Kong Principles for assessing researchers: Fostering research integrity (Moher et al., 2020), and the Coalition for Advancing Research Assessment (COARA, 2022). Both funders and initiatives aiming to change the evaluation of R&D also advocate gender equality, equal opportunities and inclusiveness. However, far too little attention has been paid to the study of gender and OA, despite the presence of gender inequality in the modern scientific publishing system (Aksnes et al., 2019; Astegiano et al., 2019; Cui et al., 2022; Kataeva et al., 2023), with gender differences favoring males in citations and authorship positions (Kim & Grofman, 2019; Koffi, 2021; Vranas et al., 2020). This can adversely affect females' scientific careers and lead to their scientific impact being undervalued in promotion and funding decisions. However, Nguyen (2021) and Wilson et al. (2022) argue that OA publishing can make female scientists more visible and improve their impact by making published research highly accessible to academia and the public. Murphy et al. (2020) and Wilson et al. (2022) propose that open science has the potential to facilitate greater diversity and inclusiveness. Nguyen et al. (2022) show, for Vietnamese Social Sciences and Humanities, that female participation is positively associated with the probability of OA publishing, but that this effect is negated by a high ratio of female researchers in a publication. Similar results, namely that females publish more in OA mode, were also reported by Ruggieri, Pecoraro and Luzi (2021) in Italy, by Wilson et al. (2022) in Australia, and by dos Santos Costa, Weitzel, and Leta (2020) in Brazil. However, we must point out that the female lead over males amounted to only a few percentage points.
Publications with mixed-gender authorship were most likely published under Gold OA (Nguyen et al., 2021). Although some research has been conducted on OA and gender, no studies have been found with a large-scale approach that would provide a global perspective on gender and OA.

Open Access and gender
This research aims to provide a comprehensive, large-scale view of OA publishing considering gender and individual scientific disciplines.

Methodology
For the purpose of the study, a dataset of OpenAlex (Priem, Piwowar and Orr, 2022) works in MAG format was used. The database was downloaded from the Registry of Open Data on AWS (https://registry.opendata.aws/openalex/) to our own AWS S3 cloud storage. The Databricks platform was used to retrieve aggregated results on a cluster of i3.xlarge nodes (30.5 GB memory, 4 cores; 2-8 workers). Our sample included works from the last 30 years that belonged to the following genres: book section, monograph, journal article, book, proceedings article, book chapter and proceedings, and for which the document type was Book, Book Chapter, Conference or Journal. We subsequently refer to all of them as publications (N journal articles = 18,941,403, N proceedings articles = 1,175,116, N others = 629,249). The publications came from 38 OECD countries (due to limitations in the number of possible gender determinations for other countries). Each publication had the following variables: year, type, genre, OA status, field, type of collaboration by team size, and type of collaboration by gender. Our final sample contained 20,745,768 publications including closed OA status publications and 8,153,315 publications excluding closed OA status publications.

Determination of publication field
Each publication has been accompanied by a list of fields from MAG's hierarchical fields structure at level zero (level-0; based on the "PaperFieldsOfStudy" file). It includes 19 major disciplines: Art, Biology, Business, Chemistry, Computer science, Economics, Engineering, Environmental science, Geography, Geology, History, Materials science, Mathematics, Medicine, Philosophy, Physics, Political science, Psychology, Sociology. The distribution is shown in Table 1.

Determination of author gender
The files "Authors" and "PaperAuthorAffiliations" and the Genderize.io tool were used to determine the authors' gender. Each author of each paper was assigned a unique identifier and a normalized value containing all first names (values without special characters, dots, or nationality-specific characters). The first sub-word of the normalized value was then selected, on the assumption that it represented the author's first name. For the final sample, the number of unique author names was determined, accounting both for authors from OECD countries and for their collaborators from abroad (N unique names = 949,533). Then, for each name, the Genderize.io tool was queried; it returns the name, the gender, a probability and the number of observations on which Genderize.io based the probability calculation. Using this tool, we were able to retrieve a gender for 638,717 names (around 67%) with a probability greater than or equal to 0.5. To increase confidence in the results, we only accepted an author's gender if the probability was greater than or equal to 0.85 (463,707 names; around 72.6% of names with a determined gender). This threshold was applied because it provides high evaluation metrics for similar tools (Elsevier, 2020).
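The name preparation and the acceptance rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the exact normalization rules are not given in the text, so the accent stripping and the character filter below are assumed, and the helper names `first_name_token` and `accepted_gender` are hypothetical. The response is assumed to be a dictionary with the fields named in the text (name, gender, probability and a count of observations); no request is sent to Genderize.io here.

```python
import re
import unicodedata

MIN_PROBABILITY = 0.85  # acceptance threshold stated in the text


def first_name_token(raw_name):
    """Return the first sub-word of a normalized author name, or None.

    Assumed normalization: strip diacritics, drop dots and other
    non-letter characters, lowercase. Characters that have no ASCII
    decomposition are simply dropped by this sketch.
    """
    decomposed = unicodedata.normalize("NFKD", raw_name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^A-Za-z\s]", " ", ascii_only).lower()
    tokens = cleaned.split()
    return tokens[0] if tokens else None


def accepted_gender(response, min_probability=MIN_PROBABILITY):
    """Return 'male' or 'female' if the response clears the threshold, else None.

    `response` is assumed to be a dict with 'gender' and 'probability' keys.
    """
    gender = response.get("gender")
    probability = response.get("probability") or 0.0
    if gender in ("male", "female") and probability >= min_probability:
        return gender
    return None
```

With this rule, a name whose gender is returned with probability 0.7 counts among the names with a determined gender (probability of at least 0.5) but is not accepted for the analysis.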

Determination of type of collaboration by team size and gender collaboration type
Knowing the gender of the individual authors in a team, we proceeded to assign the type of collaboration. Two types of collaboration by team size were distinguished: (1) single-authorship, when the publication had one author, and (2) collaborative, when the team had two or more authors. Next, the gender type of the collaboration was determined. For single-authorship publications, two possibilities were included: (1) male and (2) female.
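A minimal sketch of this assignment step follows. The team-size rule comes directly from the text; the gender categories for collaborative teams are not listed in the text, so the labels "male-only", "female-only" and "mixed", as well as the "undetermined" label for teams with an author whose gender was not accepted, are assumptions made for illustration. Both function names are hypothetical.

```python
def team_size_type(n_authors):
    """Classify a publication by team size, as described in the text."""
    if n_authors < 1:
        raise ValueError("a publication needs at least one author")
    return "single-authorship" if n_authors == 1 else "collaborative"


def gender_collaboration_type(genders):
    """Classify a publication by the genders of its authors.

    `genders` holds one entry per author: 'male', 'female', or None
    when no gender was accepted. Labels for collaborative teams are
    assumed, not taken from the study.
    """
    if not genders:
        raise ValueError("a publication needs at least one author")
    if any(g is None for g in genders):
        return "undetermined"  # assumed handling of unknown genders
    distinct = set(genders)
    if len(genders) == 1:
        return genders[0]  # 'male' or 'female' for single authorship
    if distinct == {"male"}:
        return "male-only"
    if distinct == {"female"}:
        return "female-only"
    return "mixed"
```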

OA status choice
We adopted the following classification of OA status from Piwowar et al. (2018):
• "Gold OA: Published in an OA journal that is indexed by the DOAJ.
• Green OA: Toll-access on the publisher landing page, but there is a free copy in an OA repository.
• Hybrid OA: Free under an open license in a toll-access journal.
• Bronze OA: Free to read on the publisher landing page, but without any identifiable license.
• Closed OA: All other articles".
From the set of publications, only those for which the OA status value belonged to one of the following types were selected: bronze (N pubs = 2,668,320; around 32.73% excluding closed), green (N pubs = 2,194,554; around 26.92% excluding closed), gold (N pubs = 2,438,966; around 29.91% excluding closed), hybrid (N pubs = 841,475; around 10.44% excluding closed) and additionally closed (N pubs = 12,602,453). The closed status was included for the purpose of analyzing the change in the share of open access publications among all publications over time.
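The definitions quoted from Piwowar et al. (2018) can be read as a small decision procedure. The sketch below only illustrates that reading; in the study the OA status was taken as a ready-made value from the dataset, and the function name and its four boolean inputs are hypothetical.

```python
def oa_status(in_doaj_journal, free_on_publisher_page, open_license, repository_copy):
    """Map four yes/no properties of an article to an OA status label.

    The order of the checks follows the quoted definitions: Gold takes
    precedence, then the publisher-hosted free versions (Hybrid with an
    open license, Bronze without), then Green for repository copies of
    otherwise toll-access articles.
    """
    if in_doaj_journal:
        return "gold"
    if free_on_publisher_page:
        return "hybrid" if open_license else "bronze"
    if repository_copy:
        return "green"
    return "closed"
```

Under this reading, an article that is free on the publisher page and also deposited in a repository is counted as Hybrid or Bronze rather than Green, because Green is defined as toll-access on the publisher landing page.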

Country selection
For the purpose of the study, publications were selected in which at least one co-author was from a country that belonged to the group of OECD countries (Australia, Austria, Belgium, Canada, Chile, Colombia, Costa Rica, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Israel, Italy, Japan, Korea, Latvia, Lithuania, Luxembourg, Mexico, Netherlands, New Zealand, Norway, Poland, Portugal, Slovak Republic, Slovenia, Spain, Sweden, Switzerland, Türkiye, United Kingdom, United States).
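The selection rule amounts to a membership test over the countries of a publication's co-authors. The following sketch assumes that each publication carries a list of country names spelled as in the list above; the actual representation of affiliations in the dataset may differ (for example, country codes), and the function name is hypothetical.

```python
OECD_COUNTRIES = frozenset({
    "Australia", "Austria", "Belgium", "Canada", "Chile", "Colombia",
    "Costa Rica", "Czech Republic", "Denmark", "Estonia", "Finland",
    "France", "Germany", "Greece", "Hungary", "Iceland", "Ireland",
    "Israel", "Italy", "Japan", "Korea", "Latvia", "Lithuania",
    "Luxembourg", "Mexico", "Netherlands", "New Zealand", "Norway",
    "Poland", "Portugal", "Slovak Republic", "Slovenia", "Spain",
    "Sweden", "Switzerland", "Türkiye", "United Kingdom", "United States",
})


def has_oecd_author(author_countries):
    """True if at least one co-author is affiliated with an OECD country."""
    return any(country in OECD_COUNTRIES for country in author_countries)
```

A publication with authors from Brazil and Poland would be kept, while one with authors only from Brazil and India would be dropped.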

Results
The results show that 39.3 percent of the publications in our dataset were freely available in some OA form. Within the OA category, the distribution of publishing options was as follows: 32.73 percent were published in the Bronze OA model, 26.92 percent in Green OA, 29.91 percent in Gold OA, and 10.44 percent in Hybrid OA. The remaining 60.7 percent were non-OA publications with a closed status. The inclusion of closed status was necessary to track changes in the share of open access publications over time. In the following sections, we describe the evolution of OA publishing and its relationship to gender and scientific discipline. Figure 1 shows the share of each OA status among OA publications by gender type over time. It can be seen from Figure 1 that females have been more likely than males to choose Gold OA in recent years. Males, on the other hand, use the Green OA option more than females (although the trend for Green OA is decreasing across our results). When interpreting Green OA, we have to be careful and take into account embargo periods and the possibility of preprints that are not linked to the postprint version. We have not considered these factors in our study, and they may be the subject of further research. At the same time, we can observe a decreasing trend of Bronze OA and a growing trend of Hybrid OA. Similar patterns can be seen when comparing the share of each OA status among all publications by gender type over time (Figure 2). Even more evident from the data in Figure 2 is the long-term growth of Gold OA and the growth of Hybrid OA in recent years.
Figure 2: Share of Open Access Status among all publications by team size collaboration and by gender
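The two kinds of shares discussed here, among OA publications only and among all publications, differ only in whether closed publications enter the denominator. The sketch below shows that calculation on a handful of invented records. It is an illustration, not the aggregation code used in the study: the record layout `(year, gender_type, oa_status)` and the function name `oa_shares` are assumptions.

```python
from collections import Counter, defaultdict


def oa_shares(records, include_closed=False):
    """Share of each OA status per (year, gender type) group.

    `records` is an iterable of (year, gender_type, oa_status) tuples.
    With include_closed=False the shares are taken among OA publications
    only; with include_closed=True they are taken among all publications.
    """
    counts = defaultdict(Counter)
    for year, gender_type, status in records:
        if status == "closed" and not include_closed:
            continue
        counts[(year, gender_type)][status] += 1
    shares = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        shares[key] = {status: n / total for status, n in counter.items()}
    return shares


# Invented toy records, for illustration only.
toy_records = [
    (2020, "female", "gold"),
    (2020, "female", "gold"),
    (2020, "female", "green"),
    (2020, "female", "closed"),
]
among_oa = oa_shares(toy_records)
among_all = oa_shares(toy_records, include_closed=True)
```

In the toy data, Gold accounts for two of the three OA publications but only two of the four publications overall, which is why the same status can show different levels in the two views.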

Open Access Status, Gender and Discipline
The results also show the proportions of OA publications across disciplines. Disciplines such as Economics and Physics have had a large proportion of Green OA, and, as noted above, males in particular use Green OA (see Figure 3 and Figure 4). Green OA shows a downward trend in all disciplines except Mathematics, Economics and Physics, where it is increasing.
The finding that females in the humanities (Arts and Philosophy) largely publish in Gold OA also provides an interesting perspective. Overall, humanities disciplines publish more in Gold OA than other disciplines.

Conclusion
This study provides a comprehensive, large-scale view of Open Access publishing, with a particular focus on gender and individual scientific disciplines. Our results are consistent with earlier findings (Nguyen et al., 2021; Ruggieri et al., 2021; Wilson et al., 2022) that females are more likely to publish in OA. Future research should examine why females are more likely to publish in Gold OA. Considering the disciplines, we can see that Economics and Physics publish the most in Green OA, which can probably be explained by the existence of established repositories in both disciplines where researchers can deposit their publications. However, these propositions need deeper grounding, which we will work on in future publications.
The study provides a global perspective on gender and OA that could contribute to efforts to promote gender equality and inclusiveness in research. The findings may have implications for funding agencies, universities, and scientific publishers. We hope that they will contribute to efforts to create a more equitable and accessible scientific publishing system.
API, open-source code) source of scholarly metadata, OpenAlex has potential to improve the transparency of research evaluation, navigation, representation, and discovery…". We are not releasing our source code yet because we intend to continue working on the study and developing its potential.