Green technologies’ reliance on science: Investigating the science knowledge base via scholarly work cited in patents

The objective of this study is to understand the relationships between science and technology in order to help synergize and foster green technology innovation. Given the ongoing reality of climate change, green technologies have been recognised as a useful means of mitigating its effects. Alongside this, knowledge diffusion has been seen as key to developing green technologies and promoting innovation. This paper examines the interconnections between science and technology in green technologies by investigating the scholarly work cited in patents. In doing so, the study can yield valuable insights into the knowledge base of nascent technologies, helping to determine present-day policy priorities and a future roadmap for research.


Introduction
Science has long been seen to have a significant impact on innovation and economic growth (Poege et al., 2019). Several studies have pointed out the complex relationship between science and technology, which have been described as distinct but interconnected bodies of knowledge (Narin, 1976; Dasgupta and David, 1994). Modern technology's reliance on science has become not only obvious but highly complex, and capturing the constantly evolving relationship between science and technology remains a challenge. This is particularly important in areas of social relevance, such as the development of technologies for climate change mitigation. Climate change is one of the most persistent and pressing global issues, and there has been growing recognition that green technologies can serve as a means to mitigate its destructive effects (Azadi et al., 2020). Studies have demonstrated how green technologies such as solar panels, wind turbines, and hybrid vehicles can effectively reduce carbon emissions and enhance energy efficiency (Azadi et al., 2020; Jafari et al., 2021). What we do not know is how the scientific knowledge base informed and enabled these developments.
In this paper, I want to shed light on this question by investigating the linkages between science and green technologies. To do so, I consider how green technology patents link to a scientific knowledge base. Patent citations to scholarly work are acknowledged as a proxy for a technology's closeness to the scientific frontier (Mowery et al., 1996; Shibata et al., 2008). Citations to non-patent literature (NPL) in patents have been recognized as a scientific knowledge resource that is commonly integrated into the research and development (R&D) process (Marx and Fuegi, 2020). NPLs in the backward citations of patents can thus be considered the science knowledge base of those patents (Glänzel & Meyer, 2003; Bhattacharya et al., 2003; Lyu et al., 2020).
The present study also aims to explore the extent of indirect dependence on NPLs by examining whether patents refer to other patents that in turn cite NPLs. Through this approach, the investigation will yield insights into the nature and degree of both direct and indirect reliance on NPLs in patents.
The results of this paper can provide insight into areas of green research that warrant further exploration and can help in keeping abreast of current policy agendas. In this context I also try to carve out how emerging knowledge takes shape in knowledge-diffusing networks, as the results could chart a course for future research aimed at developing crucial scientific and technological advancements.

2.1. The links between Science and Technology
The relationship between science and technology has been the subject of extensive research and discussion. Scholars have explored how scientific research influences technological development, how scientific knowledge is translated into technological innovation, and how technological advancements inform scientific inquiry. One major theme that emerges from the literature is the idea that science and technology are deeply intertwined. In particular, scientific knowledge has been seen as an important factor in shaping technological trajectories and creating breakthrough innovations (Meyer, 2000; Cohen et al., 2002; Adner and Kapoor, 2010). This does not hold equally for all areas of technological development, however. Ahmadpoor and Jones (2017), for instance, demonstrated that in the medical field, technological advances were driven by scientific knowledge. This echoed the work of Pavitt (1984), which argued that certain industries, such as chemicals and pharmaceuticals, are more likely to innovate through the application of scientific knowledge, while others, such as machinery and textiles, rely more heavily on technical knowledge. The distinction drawn between science-based and engineering-based industries has significant implications for innovation policy. In particular, it implies that policymakers and funding agencies may need to prioritize and allocate resources for research and development differently across these two industry types. Depending on the specific goals and objectives of innovation policy, the emphasis may need to be placed on one type of industry over the other.
There have been further debates on the links between science and technology. While Dasgupta and David (1994) argued that technological change is driven by the accumulation of scientific knowledge, with new discoveries and innovations building on earlier ones, Gibbons et al. (1994), in their discussion of "mode 2", emphasized the importance of interdisciplinary collaboration between scientists and engineers. Similarly, Poege et al. (2019) analyzed the impact of interdisciplinary knowledge on technological innovation, suggesting that the integration of knowledge across multiple fields could lead to more significant technological breakthroughs. These scholars have shown that the links between science and technology are complex and multifaceted, and that understanding these links requires a nuanced and interdisciplinary approach.
Overall, past research suggests that science plays a critical role in shaping technological development, and it highlights the need for interdisciplinary collaboration to foster innovation and address societal challenges. This could be of particular relevance in the domain of green technologies, which are emerging technologies that may require substantial scientific advances (Rotolo et al., 2015).

3.1. Data
The research data were gathered from two databases, namely Lens.org and "Reliance on Science" (Marx and Fuegi, 2020; Marx and Fuegi, 2022). The data include various details pertaining to citations to scholarly work, such as the year of publication, the title of the paper, and the subject categories assigned to it by both the OECD and Web of Science. The data cover a time frame of roughly 30 years, commencing on January 1, 1990 and concluding on December 31, 2020 (1990-01-01 to 2020-12-31), with the earliest priority date assigned to each patent serving as the basis for its inclusion in the analysis. The final dataset comprises 382,500 patent records with 32 variables and 720,133 NPLs.
Green technology patents are classified according to the Cooperative Patent Classification (CPC) system, which has been widely used in research studies as a means of identifying and categorizing patents related to green technologies. In the context of green technologies, the Y02E patent classification is particularly relevant as it specifically relates to "reduction of greenhouse gas [GHG] emissions, climate change mitigation technologies or the use of renewable energy sources" (USPTO, n.d.).
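The inclusion rule based on the earliest priority date can be illustrated with a minimal Python sketch. The patent identifiers, dates and function names below are hypothetical and are not taken from the Lens.org or Reliance on Science data; the snippet only shows the rule as described, under the assumption that each patent carries a list of priority dates.

```python
from datetime import date

# Study window as stated in the text.
WINDOW_START = date(1990, 1, 1)
WINDOW_END = date(2020, 12, 31)


def earliest_priority(priority_dates):
    """Return the earliest priority date claimed by a patent."""
    return min(priority_dates)


def in_study_window(priority_dates):
    """True if the earliest priority date falls inside the study window."""
    return WINDOW_START <= earliest_priority(priority_dates) <= WINDOW_END


# Hypothetical patents, each with one or more priority dates.
patents = {
    "P1": [date(1989, 11, 3), date(1990, 2, 1)],   # earliest date precedes 1990
    "P2": [date(2005, 6, 15)],
    "P3": [date(2020, 12, 31), date(2021, 3, 1)],  # earliest date is the last day
}

selected = sorted(pid for pid, dates in patents.items() if in_study_window(dates))
print(selected)  # ['P2', 'P3']
```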

3.2. Non-patent literature (NPLs)
Methodologically, patents and their citations to prior scientific research can provide valuable insights into the connections between science and technology. Under this logic we treat prior art such as patent citations and NPLs as knowledge sources or technological building blocks. The earlier work by Narin (1976) and Narin and Noma (1985), which examined the connections between science and technology through patent citations, indicated that scientific research played an essential role in technological innovation and found a positive correlation between the two (Narin et al., 1997).
Non-patent literature (NPL) refers to cited sources other than patents, such as scientific articles, conference proceedings, and technical reports (Haeussler & Harhoff, 2010). NPLs offer a potential way of uncovering the various academic fields in a technological domain, and they embody the knowledge flows between science and technology (Narin et al., 1997; OECD, 2009; Verbeek et al., 2002). NPL can be used in combination with patent data to create a more comprehensive picture of innovation activity (Harhoff & Hoisl, 2007). There has been a clear trend of increasing NPL citations in international patents in the field of biotechnology, with NPLs reported at more than 55% of citations (OECD, 2009). Prior work such as Nagaoka (2007) has highlighted that when firms are innovative and focus on science, there is a strong possibility that these firms are developing high-tech products. Studies have shown that NPL is cited in a significant proportion of patent applications, with some estimates suggesting that up to 70% of patent applications cite NPL (Haeussler & Harhoff, 2010). Marx and Fuegi (2020) further show that patents by university scientists contain more citations to NPL than firm or government patents.
The OECD Patent Statistics Manual reported that NPLs account for 15% of all citations in patents (the rest citing other patents), and that in biotechnology and fine organic chemistry this share is larger than 50% (OECD, 2009). Recent years have also seen a growth in patent citations to scholarly work (Marx and Fuegi, 2022). It seems possible that the prior art of patents in emerging and interdisciplinary areas is no longer dominated by patent-to-patent citations (Verbeek et al., 2002). Some NPL databases can be searched in conjunction with patent data (Harhoff & Hoisl, 2007). For example, the RoS (Reliance on Science) database is a scholarly work database developed by Marx and Fuegi (2020) to support research on science, technology, and innovation. This database mainly connects patents and scholarly work via Microsoft Academic Graph (MAG) identifiers, which allows users to search for NPL by topic, author, publication type, and contextual information. Additionally, several studies have explored the use of NPL in patent citation analysis to gain insights into the diffusion of knowledge and the impact of scientific research on innovation (Haeussler & Harhoff, 2010).
In conclusion, non-patent literature (NPL) is a valuable source of information for patent examination and innovation research, as it can provide additional context about the state of the art and the knowledge available to the inventor at the time of invention.

Methodology
The methodology for this study involves several steps to investigate the reliance on non-patent literature (NPL) in patent data. Firstly, the study aims to determine the percentage of patents that cite NPLs by using Python to match patent data with NPL data. To gain a more in-depth understanding of the relationship between patents and NPLs, the study examines the number of patents and patent families that have NPLs. Furthermore, the study analyzes which CPC Y02E categories rely most on science by examining the share of patents with NPLs by CPC and the average number of NPLs per patent by CPC.
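As a minimal sketch of the matching step and the two per-CPC indicators, the following Python snippet joins patents to their cited papers and computes, for each CPC code, the share of patents with at least one NPL and the mean number of NPLs per patent. The record layout, identifiers and function name are hypothetical, and the mean is taken over all patents in a class (including those without NPLs), which is an assumption rather than a choice stated in the text.

```python
from collections import defaultdict

# Hypothetical patent records: patent id -> CPC codes assigned to it.
patent_cpc = {
    "P1": ["Y02E10/50"],
    "P2": ["Y02E10/50", "Y02E50/10"],
    "P3": ["Y02E50/10"],
    "P4": ["Y02E60/10"],
}

# Hypothetical patent-to-paper citation pairs (patent id, paper id).
npl_citations = [
    ("P1", "W1"), ("P1", "W2"),
    ("P2", "W2"),
    ("P3", "W3"), ("P3", "W4"), ("P3", "W5"),
]


def npl_stats_by_cpc(patent_cpc, npl_citations):
    """Share of patents with NPLs and mean NPLs per patent, per CPC code."""
    npl_per_patent = defaultdict(set)
    for patent_id, paper_id in npl_citations:
        npl_per_patent[patent_id].add(paper_id)

    totals = defaultdict(int)    # patents per CPC code
    with_npl = defaultdict(int)  # patents citing at least one NPL
    npl_sum = defaultdict(int)   # total distinct NPLs cited
    for patent_id, codes in patent_cpc.items():
        n_npl = len(npl_per_patent.get(patent_id, ()))
        for code in set(codes):
            totals[code] += 1
            with_npl[code] += int(n_npl > 0)
            npl_sum[code] += n_npl

    return {
        code: {
            "patents": totals[code],
            "share_with_npl": with_npl[code] / totals[code],
            "mean_npl_per_patent": npl_sum[code] / totals[code],
        }
        for code in totals
    }


stats = npl_stats_by_cpc(patent_cpc, npl_citations)
for code in sorted(stats):
    print(code, stats[code])
```

A patent assigned to several CPC codes is counted once in each of them, so the per-code totals can sum to more than the number of patents.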
This study aims to uncover the nature of the knowledge base in the field of green technology by identifying key concepts within a corpus of non-patent literature spanning the past three decades. The initial set of analyses utilized the TF-IDF (term frequency-inverse document frequency) method, which is widely recognized as a useful tool for identifying significant concepts in a given corpus of text (Juršič et al., 2012; Qaiser and Ali, 2018). The inverse document frequency component gives more weight to terms that appear in few documents, so that infrequently used words may be deemed important if they possess high semantic relevance to the corpus as a whole (Jalilifard et al., 2021).
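To make the weighting concrete, the sketch below computes TF-IDF over a few invented paper titles in plain Python. It assumes one common textbook variant, tf(t, d) = count(t, d) / |d| and idf(t) = ln(N / df(t)); the text does not state which variant or library was used, and the stop-word list and titles are purely illustrative.

```python
import math
from collections import Counter

# Illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the"}


def tokenize(title):
    """Lower-case a title, split on whitespace and drop stop words."""
    return [word for word in title.lower().split() if word not in STOP_WORDS]


def tf_idf(titles):
    """Return one {term: weight} dict per title (unsmoothed TF-IDF)."""
    docs = [tokenize(title) for title in titles]
    n_docs = len(docs)
    # Document frequency: number of titles in which each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical paper titles.
titles = [
    "control of microgrids",
    "energy storage for microgrids",
    "energy storage in batteries",
]
weights = tf_idf(titles)
print({term: round(w, 3) for term, w in weights[0].items()})
# {'control': 0.549, 'microgrids': 0.203}
```

In the first title, "control" outweighs "microgrids" because it occurs in only one of the three titles. Under this unsmoothed variant, a term that occurs in every title receives a weight of zero.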
In addition to these analyses, the study will examine the diversity of journals cited in patent literature and the total number of different journals cited. To further investigate the relationship between patents and NPLs, network graphs will be created to identify how different fields cluster based on their appearance on patents. Additionally, the study will investigate indirect reliance on NPLs by examining whether patents cite other patents that in turn cite NPLs. This will provide insight into both direct and indirect reliance on NPLs in patent literature.
Overall, in order to capture the links between various scientific fields, both citation and text analysis will be utilized. This methodology aims to provide a comprehensive understanding of the reliance on NPLs and of the relationship between patents and NPLs in different fields.
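The distinction between direct and indirect reliance can be sketched as a simple classification over two citation maps. The snippet below is illustrative only: the identifiers and the function name are hypothetical, and "indirect" is assumed to mean exactly one step (a patent without NPLs that cites at least one patent with NPLs); longer citation chains are not followed.

```python
def classify_reliance(patent_to_patent, patent_to_npl):
    """Label each patent as 'direct', 'indirect' or 'none'.

    direct:   the patent itself cites at least one NPL
    indirect: no NPL of its own, but it cites a patent that cites an NPL
    none:     neither of the above (one-step look-up only)
    """
    all_patents = set(patent_to_patent) | set(patent_to_npl)
    for cited in patent_to_patent.values():
        all_patents.update(cited)

    labels = {}
    for patent in sorted(all_patents):
        if patent_to_npl.get(patent):
            labels[patent] = "direct"
        elif any(patent_to_npl.get(c) for c in patent_to_patent.get(patent, ())):
            labels[patent] = "indirect"
        else:
            labels[patent] = "none"
    return labels


# Hypothetical citation data.
patent_to_patent = {"P1": ["P2"], "P3": ["P1"], "P4": ["P5"]}
patent_to_npl = {"P2": ["W1"]}

labels = classify_reliance(patent_to_patent, patent_to_npl)
print(labels)
# {'P1': 'indirect', 'P2': 'direct', 'P3': 'none', 'P4': 'none', 'P5': 'none'}
```

P3 is labelled "none" because it reaches an NPL only through two steps (P3 to P1 to P2), which this one-step definition deliberately excludes.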
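One way to prepare the field network is to count how often two scientific fields appear together among the papers cited by the same patent; the resulting weighted pairs can serve as the edge list of a graph. The sketch below assumes this co-occurrence definition, which the text does not specify, and uses hypothetical patents; the field names are borrowed from categories mentioned in the results for illustration only.

```python
from collections import Counter
from itertools import combinations


def field_cooccurrence(patent_fields):
    """Count, for each pair of fields, the patents whose NPLs cover both."""
    edges = Counter()
    for fields in patent_fields.values():
        # Deduplicate and sort so each pair has one canonical orientation.
        for field_a, field_b in combinations(sorted(set(fields)), 2):
            edges[(field_a, field_b)] += 1
    return edges


# Hypothetical mapping: patent id -> fields of the papers it cites.
patent_fields = {
    "P1": ["Electrochemistry", "Physical Chemistry"],
    "P2": ["Physical Chemistry", "Electrochemistry", "Multidisciplinary Chemistry"],
    "P3": ["Multidisciplinary Chemistry"],
}

edges = field_cooccurrence(patent_fields)
for (field_a, field_b), weight in sorted(edges.items()):
    print(f"{field_a} -- {field_b}: {weight}")
```

Each edge weight is the number of patents linking the two fields, so heavier edges indicate fields that are more often drawn on together.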

Overview of green patents over time
Table 1 presents a breakdown of patent document types by count and percentage. The majority of patent documents in the sample are patent applications, accounting for 222,768 or 58.27% of the total. Granted patents make up 41.56%, with a total count of 158,906. A small percentage of documents fall into the categories of amended applications and amended patents, with 452 (0.12%) and 200 (0.05%) documents respectively. The table thus highlights the prevalence of patent applications as the dominant document type.
The line graph presented in Figure 1 illustrates the trend in patent applications and granted patents over time. The dataset has been organized chronologically based on the priority year. Both lines show a consistent increase in numbers, with a peak occurring around 2010. Following the peak, however, a slight drop in the numbers is observed. The graph thus shows a period of significant growth followed by a subtle decline in patenting.
Figure 1: The count of patent data over time.

How many patents have NPLs and which CPC areas are more reliant on science
Figure 2 displays the overall number of successful patent matches for each technological field. It is important to note that a larger number of patent matches with NPLs does not necessarily indicate a greater influence from science, as this also depends on the size of the denominator. If the denominator is large, the influence will be relatively small. For instance, technologies for the production of fuel of non-fossil origin (Y02E10/10) have a total of 2,584 patents, with 1,003 successful matches and 1,581 unmatched patents. The successful match rate is roughly 38.82%, while the unmatched rate is approximately 61.18%. In terms of NPL match count and successful match rate, there are noticeable variations among technological categories, despite the overall similarity in levels.
Table 2 presents a breakdown of the number of patents linked to NPL data against the total number of patents in each category. The highest successful match rate is observed in Biofuels (Y02E50/10), indicating a stronger dependence on scientific knowledge for technological innovation. Fuel of non-fossil origin (Y02E50/00), which has the highest number of matches, ranks second in terms of successful match rate. Remarkably, Microcrystalline silicon PV cells (Y02E10/545) rank third in successful match rate. In numerical terms, biofuels have the highest successful match rate at 94.93%, followed closely by technologies for the production of non-fossil fuel at 94.08%. Microcrystalline silicon PV cells have a lower successful match rate of 90.88%. These top patent categories therefore each have a successful match rate of over 90 percent.
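The role of the denominator can be checked with a short calculation. The sketch below recomputes the matched and unmatched rates from the counts reported for the 2,584-patent example; the function name is hypothetical, and the second call uses invented counts purely to show that the same number of matches yields a much higher rate in a smaller category.

```python
def match_rates(matched, unmatched):
    """Return (matched %, unmatched %) rounded to two decimals."""
    total = matched + unmatched
    return round(100 * matched / total, 2), round(100 * unmatched / total, 2)


# Counts reported in the text: 1,003 matched and 1,581 unmatched patents.
print(match_rates(1003, 1581))  # (38.82, 61.18)

# Hypothetical smaller category with the same number of matches.
print(match_rates(1003, 97))    # (91.18, 8.82)
```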

The TF-IDF matrix of NPLs
The graphical representation in Figure 4 indicates that the most relevant concept in NPLs is microgrids, with a TF-IDF value of 0.84988. Microgrids are clusters of loads and microsources, interfaced through power electronics, that operate as a single controllable system (Lasseter, 2002). Additionally, the word "storage" (TF-IDF = 0.52697) holds significant importance in describing various forms of energy storage, including electrochemical and nucleic acid storage (Wan et al., 2010; Yang et al., 2011). The remaining words in the graph have a TF-IDF value of zero, suggesting that the knowledge within NPLs is diverse, but the prominence of these two concepts, which are related to the top CPC class of Biofuels, cannot be ignored. Upon the removal of stop words and vectorization, the resulting TF-IDF matrix consisted of 175,595 instances and 63,084 features.
Figure 5 narrows the focus to patents filed within the last five years. During this time, a greater diversity of keywords is evident in the data. The Y02E patents have predominantly drawn on the Natural Sciences, with significant contributions from Chemical Sciences, Biological Sciences, and Environmental Biotechnology. Notably, approximately two-thirds of the patents in this dataset are related to Physical Chemistry, Electrochemistry, and Multidisciplinary Chemistry. One interesting observation is that the keywords used in these patents are highly specialized and domain-specific, resembling technical jargon. Consequently, the knowledge contained within each patent category is distinct and not easily comparable, making it challenging to identify interdisciplinary phenomena.
Figure 5: TF-IDF analysis on top 6 categories of paper titles over the last five years.

Conclusion
The visual representations of patent matches with NPLs reveal noticeable variations among technological categories. This suggests that the concept of interdisciplinarity remains relevant to my research, as its complexity and diversity align with the research elements.
Whilst it is imperative to understand the components that constitute the knowledge base, green technologies draw on a broad and diverse range of knowledge, and making informed decisions about research investment and resource allocation could therefore be a challenge.
There remain several puzzles that require further investigation. For example, the scholarly work cited by each Y02E patent category is specific to its domain, and there is a need for a comprehensive analysis that establishes the connections between the various categories. While social network analysis is an effective approach to accomplish this, it may not be the only feasible method.
As I focus on the emerging area of green technology, which is vibrant and dynamic, I am still exploring the optimal means to capture this dynamism and track the interplay between science and technology.

Figure 2: The distribution of NPL counts across technological fields.

Figure 3: The count of NPLs by CPC Y02E technological categories.

Figure 4: Bar chart of TF-IDF values for the entire set of NPLs.

Table 1: Patent document types.

Table 2: Match rate statistics of NPLs by technological category.