Platform logo
Explore Communities
Chemistry logo
ChemistryCommunity hosting publication
You are watching the latest version of this publication, Version 1.
review article

Best practices in machine learning for chemistry. Article review

11/10/2023| By
Yuliia Yuliia Pavlovska
0 Views
0 Comments
Disciplines
Keywords
Abstract

Chemistry has experienced a fundamental shift toward the use of statistical modeling and analysis in the constantly changing world of scientific research. These technologies have proven to be priceless resources that go beyond the limitations of human intuition and help researchers discover brand-new, elusive links. Chemspace offers a comprehensive service that combines DNA-encoded libraries (DEL), machine-learning models, and large chemical spaces to enhance early Drug Discovery. This approach, initially introduced by Kevin McCloskey et al., has shown promising results in obtaining active compounds for screening. Chemspace’s full-service approach includes DEL screening, machine-learning model development, and the provision of potentially active compounds, making it ideal for projects with limited structural or activity data.

Preview automatically generated form the publication file.

Chemistry has experienced a fundamental shift toward the use of statistical modeling and analysis in the constantly changing world of scientific research. These technologies have proven to be priceless resources that go beyond the limitations of human intuition and help researchers discover brand-new, elusive links. Chemspace offers a comprehensive service that combines DNA-encoded libraries, machine-learning models, and large chemical spaces to enhance early Drug Discovery. This approach, initially introduced by Kevin McCloskey et al., has shown promising results in obtaining active compounds for screening. Chemspace's full-service approach includes DEL screening, machine-learning model development, and the provision of potentially active compounds, making it ideal for projects with limited structural or activity data.

Before embarking on a journey through the world of statistical learning, a solid foundation must be laid. In the article ‘Best practices in Machine Learning in Chemistry’, they have proposed a set of ‘best practice guidelines and a corresponding checklist that can be used as a compass when navigating the complex landscape of chemical Machine learning. The researchers' goal was to maximize the validity and reproducibility of the conclusions and models derived from these best practices.

Several suggestions for using Machine Learning as a source of data:

The quality, quantity, and variety of accessible information have a significant impact on the efficacy of machine-learning models in chemistry. Data can be gained by guided experiments or calculations, or they can be static and come from well-known chemical databases. The methodology used to generate the data or the setting in which a dataset was compiled are only two examples of the many elements that might lead to bias in data sources. Bias must be acknowledged and dealt with to produce clear and reliable models.

Additionally, databases in this discipline change over time, making version control systems and ongoing access to earlier dataset versions necessary to guarantee study reproducibility. It is crucial to list data sources, explain data selection methods, and give information about access times or version numbers.

Several suggestions for using Machine Learning as a model choice:

In chemistry, Machine Learning offers a wide range of options, from traditional techniques like support-vector machines to modern approaches like deep neural networks, especially for graph-based chemical representations. However, model complexity doesn't always mean better results. For instance, a simple model with just two parameters performed as well as a complex neural network with over 13,000 parameters. This underscores the importance of choosing models wisely based on the specific problem.

Baseline comparisons, like selecting the most common class or comparing them to simpler models such as the 1-nearest-neighbor approach, play a crucial role in model selection, especially when benchmarked against state-of-the-art models.

Providing a software implementation of the chosen model is also recommended to facilitate training and testing with new data.

In summary, the field of Machine Learning in chemistry offers a range of options, but selecting the right model involves considering the complexity, conducting baseline comparisons, and ensuring accessibility through software implementations.

Several suggestions for using Machine Learning as a code and reproducibility:

The text analyzes the research reproducibility crisis, focusing on concerns such as selective reporting of good results, data dredging (p-hacking), and hypothesizing after results are known (HARKing). It emphasizes the significance of making research data and codes public to increase trust in science.

Explaining choices while training models are not adequate for reproducibility in Machine Learning. Hyperparameters and software versions are important considerations. As a result, the article suggests storing the whole code or workflow in a public repository for long-term replication and enhancement. A script or electronic notebook with all parameters should be provided as a bare minimum.

The most important message is the importance of transparency and accessibility in addressing the reproducibility challenge and bolstering scientific legitimacy.

To summarize, the development of Machine Learning has led to astounding improvements that have revolutionized our ability to understand and control chemical processes. However, to address the challenges of reproducibility and transparency, we all need to pursue an open science strategy. The recommendations in this article provide a roadmap for building robust machine-learning models and improving the readability of methods in chemistry.

For a more comprehensive understanding of this topic, readers are encouraged to explore the full article by following the provided link.

Submitted by11 Oct 2023
Download Publication

No reviews to show. Please remember to LOG IN as some reviews may be only visible to specific users.