
Publication Information


Title
Japanese:
English: Compound virtual screening by learning-to-rank with gradient boosting decision tree and enrichment-based cumulative gain
Authors
Japanese: 古井 海里, 大上 雅史.
English: Kairi Furui, Masahito Ohue.
Language English
Journal/Book title
Japanese:
English: In Proceedings of The 19th IEEE International Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB 2022)
Volume, issue, pages
Publication date August 26, 2022
Publisher
Japanese:
English: IEEE
Conference name
Japanese:
English: 19th IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (IEEE CIBCB2022)
Venue
Japanese: オタワ
English: Ottawa
Official link https://doi.org/10.48550/arXiv.2205.02169
 
DOI https://doi.org/10.1109/CIBCB55180.2022.9863032
Abstract Learning-to-rank, a machine learning technique widely used in information retrieval, has recently been applied to the problem of ligand-based virtual screening to accelerate the early stages of new drug development. Ranking prediction models learn based on ordinal relationships, making them suitable for integrating assay data from various environments. Existing studies of rank prediction in compound screening have generally used a learning-to-rank method called RankSVM. However, they have not been compared with or validated against the gradient boosting decision tree (GBDT)-based learning-to-rank methods that have gained popularity recently. Furthermore, although the ranking metric called Normalized Discounted Cumulative Gain (NDCG) is widely used in information retrieval, it only determines whether the predictions are better than those of other models. In other words, NDCG cannot recognize when a prediction model produces worse than random results. Nevertheless, NDCG is still used in the performance evaluation of compound screening using learning-to-rank. This study used the GBDT model with ranking loss functions, called lambdarank and lambdaloss, for ligand-based virtual screening; results were compared with existing RankSVM methods and GBDT models using regression. We also proposed a new ranking metric, Normalized Enrichment Discounted Cumulative Gain (NEDCG), aiming to evaluate the goodness of ranking predictions properly. In addition, the results showed that the GBDT model with learning-to-rank outperformed existing regression methods using GBDT and RankSVM on diverse datasets. Finally, NEDCG showed that the predictions by regression were comparable to random predictions in multi-assay, multi-family datasets, demonstrating its usefulness for a more direct assessment of compound screening performance.
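
Note: The abstract's remark that NDCG cannot flag worse-than-random predictions can be illustrated with a minimal sketch. It assumes the common textbook form of NDCG (gain 2^rel - 1, discount log2(rank + 1)), which may differ from the exact variant used in the paper. The activity grades and function names are hypothetical. NEDCG is not defined in this record, so it is not reproduced here.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain using the common gain 2^rel - 1 (an assumption)."""
    # enumerate() is 0-based, so the discount for the first position is log2(2) = 1.
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(relevances))

def ndcg(relevances_in_predicted_order):
    """DCG of the predicted order divided by the DCG of the ideal order."""
    ideal = dcg(sorted(relevances_in_predicted_order, reverse=True))
    return dcg(relevances_in_predicted_order) / ideal if ideal > 0 else 0.0

# Hypothetical activity grades for six compounds (higher = more active).
grades = [3, 2, 2, 1, 0, 0]

best = ndcg(sorted(grades, reverse=True))  # perfect ranking
worst = ndcg(sorted(grades))               # fully inverted ranking

print(f"best={best:.3f} worst={worst:.3f}")
```

In this sketch even the fully inverted ranking scores well above zero, so an NDCG value on its own does not show whether a model beats a random ordering.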
