Scientists use the language of molecules to accelerate material and drug discovery

Predicting molecular properties and generating new molecules are critical to material and drug discovery, and advances in machine learning (ML) have led researchers to apply it to both tasks.

However, training ML models for material and drug discovery often requires extensive datasets, which can be expensive and time-consuming to create.

Now, a team of researchers from the Massachusetts Institute of Technology (MIT) has built a unified framework that can predict molecular properties and generate new molecules while being trained on a relatively small dataset.

The team was led by Minghao Guo, a graduate student at MIT, who is also the study's first author. Their system is more efficient than traditional deep-learning approaches. 

"Our goal with this project is to use some data-driven methods to speed up the discovery of new molecules, so you can train a model to do the prediction without all of these cost-heavy experiments," said Guo in a press release. 

The language of molecules

Traditional methods rely on ML models that learn from large datasets that aren't domain-specific. As a result, these models tend to perform poorly on specialized tasks.

The research team took a different approach by relying on the language of molecules. Atoms obey physical rules that dictate how they interact with one another to form molecules. The researchers used this molecular grammar to train their system.

The system can produce new compounds and anticipate their attributes in a data-efficient manner by learning this language and identifying the similarities between molecular structures.

"Once we have this grammar as a representation for all the different molecules, we can use it to boost the process of property prediction," explained Guo in the press release. 

The team used reinforcement learning to train the system on the production rules of molecular grammar. They simplified the learning process by breaking the molecular grammar into two components: a general metagrammar and a molecule-specific grammar.

Combined with reinforcement learning, this hierarchical approach accelerated learning and empowered the system to generate viable molecules and make accurate predictions about their properties.
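
As a rough illustration of what production rules split into a shared part and a molecule-specific part can look like, here is a minimal Python sketch. It is a toy string grammar, not the graph grammar learned in the study: the rule tables `META_RULES` and `SPECIFIC_RULES`, the function `derive`, and the single-atom fragments are all assumptions invented for this example, and the rule choices are supplied by hand rather than learned with reinforcement learning.

```python
from typing import Dict, List

Rules = Dict[str, List[List[str]]]

# Hypothetical "metagrammar": generic rules for growing a chain of units.
META_RULES: Rules = {
    "CHAIN": [["UNIT"], ["UNIT", "CHAIN"]],
}

# Hypothetical "molecule-specific" rules: which fragment each unit becomes.
SPECIFIC_RULES: Rules = {
    "UNIT": [["C"], ["O"], ["N"]],
}

# The full toy grammar is the union of both rule tables.
GRAMMAR: Rules = {**META_RULES, **SPECIFIC_RULES}


def derive(start: str, rules: Rules, choices: List[int]) -> str:
    """Expand the leftmost non-terminal once per entry in `choices`.

    A symbol counts as a non-terminal when it is a key in `rules`.
    Raises ValueError if non-terminals remain after all choices are used.
    """
    symbols = [start]
    for choice in choices:
        index = next((i for i, s in enumerate(symbols) if s in rules), None)
        if index is None:
            break  # nothing left to expand
        symbols[index:index + 1] = rules[symbols[index]][choice]
    if any(s in rules for s in symbols):
        raise ValueError("derivation is incomplete")
    return "".join(symbols)


# Two derivations that share their first two rule choices.
print(derive("CHAIN", GRAMMAR, [1, 0, 0, 1]))        # CO
print(derive("CHAIN", GRAMMAR, [1, 0, 1, 0, 0, 1]))  # CCO
```

In this toy setting, two outputs whose derivations share many rule choices are structurally close, which is one simple way to picture how a grammar can expose similarity between structures.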

Making predictions

The researchers tested their system and found that it outperformed several state-of-the-art ML approaches at generating viable molecules and polymers, as well as at predicting their properties. This held even when the model was trained on domain-specific datasets of only a few hundred samples.

Some prior approaches also needed costly pretraining, which their system avoids. Their system performed remarkably well at predicting physical properties of polymers, such as the glass transition temperature. These properties are hard to determine experimentally, requiring very high pressures and temperatures.

In a further test, the researchers cut one training set by more than half, to only 94 samples, and still achieved comparable results.

"This grammar-based representation is very powerful. And because the grammar itself is a very general representation, it can be deployed to different kinds of graph-form data. We are trying to identify other applications beyond chemistry or material science," Guo said in the press release.

The researchers aim to extend their research to incorporate 3D geometry to study polymer chain interactions. They are also working on an interface that displays the learned grammar rules and gathers user feedback to improve accuracy.

Their findings were published in the Proceedings of the 40th International Conference on Machine Learning.

Abstract:

The prediction of molecular properties is a crucial task in the field of material and drug discovery. The potential benefits of using deep learning techniques are reflected in the wealth of recent literature. Still, these techniques are faced with a common challenge in practice: Labeled data are limited by the cost of manual extraction from literature and laborious experimentation. In this work, we propose a data-efficient property predictor by utilizing a learnable hierarchical molecular grammar that can generate molecules from grammar production rules. Such a grammar induces an explicit geometry of the space of molecular graphs, which provides an informative prior on molecular structural similarity. The property prediction is performed using graph neural diffusion over the grammar-induced geometry. On both small and large datasets, our evaluation shows that this approach outperforms a wide spectrum of baselines, including supervised and pre-trained graph neural networks. We include a detailed ablation study and further analysis of our solution, showing its effectiveness in cases with extremely limited data.
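
To give a feel for how known property values can spread across a graph of related molecules, the following Python sketch uses plain label propagation, a much simpler stand-in for the graph neural diffusion described in the abstract. Everything here is an assumption for illustration: the function `diffuse_labels`, the four-node path graph, and the property values are invented, and nodes are imagined to be molecules linked when their structures are close.

```python
from typing import Dict, List


def diffuse_labels(
    neighbors: Dict[int, List[int]],
    known: Dict[int, float],
    steps: int = 200,
) -> Dict[int, float]:
    """Repeatedly average each unlabeled node over its neighbors.

    Nodes listed in `known` keep their measured value on every step.
    """
    # Start unlabeled nodes at the mean of the known values.
    start = sum(known.values()) / len(known)
    values = {node: known.get(node, start) for node in neighbors}
    for _ in range(steps):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in known:
                updated[node] = known[node]
            elif nbrs:
                updated[node] = sum(values[m] for m in nbrs) / len(nbrs)
            else:
                updated[node] = values[node]  # isolated node: nothing to average
        values = updated
    return values


# Hypothetical graph: four molecules in a chain, only the two ends measured.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
predicted = diffuse_labels(graph, {0: 0.0, 3: 3.0})
print(predicted)
```

On this path graph, the two unlabeled molecules settle at values interpolated between the measured ends, which shows why a good notion of structural neighborhood can help when only a few labels are available.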
