Compound virtual screening by learning-to-rank with gradient boosting decision tree and enrichment-based cumulative gain

Kairi Furui; Masahito Ohue

doi:10.1109/CIBCB55180.2022.9863032

Publication information

Title

English:	Compound virtual screening by learning-to-rank with gradient boosting decision tree and enrichment-based cumulative gain

Authors

Japanese:	古井海里, 大上雅史.
English:	Kairi Furui, Masahito Ohue.

Language

English

Journal/Book title

English:	In Proceedings of The 19th IEEE International Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB 2022)

Publication date

August 26, 2022

Publisher

English:	IEEE

Conference name

English:	19th IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (IEEE CIBCB2022)

Location

Japanese:	オタワ
English:	Ottawa

Official link

https://doi.org/10.48550/arXiv.2205.02169

DOI

https://doi.org/10.1109/CIBCB55180.2022.9863032

Abstract

Learning-to-rank, a machine learning technique widely used in information retrieval, has recently been applied to ligand-based virtual screening to accelerate the early stages of new drug development. Because ranking prediction models learn from ordinal relationships, they are well suited to integrating assay data obtained in different experimental settings. Existing studies of rank prediction in compound screening have generally used a learning-to-rank method called RankSVM. However, these studies have not been compared with or validated against the gradient boosting decision tree (GBDT)-based learning-to-rank methods that have recently gained popularity. Furthermore, although the ranking metric Normalized Discounted Cumulative Gain (NDCG) is widely used in information retrieval, it can only indicate whether one model's predictions are better than another's. In other words, NDCG cannot detect when a prediction model performs worse than random. Nevertheless, NDCG is still used to evaluate compound screening with learning-to-rank. In this study, we applied GBDT models with the ranking loss functions lambdarank and lambdaloss to ligand-based virtual screening and compared the results with existing RankSVM methods and with GBDT regression models. We also proposed a new ranking metric, Normalized Enrichment Discounted Cumulative Gain (NEDCG), designed to evaluate the quality of ranking predictions more appropriately. The results showed that GBDT with learning-to-rank outperformed both GBDT regression and RankSVM on diverse datasets. Finally, NEDCG showed that the predictions by regression were comparable to random predictions on multi-assay, multi-family datasets, demonstrating its usefulness for a more direct assessment of compound screening performance.
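The claim that NDCG cannot reveal worse-than-random predictions can be illustrated with a small, self-contained Python sketch. It assumes the common exponential-gain form of DCG, in which the item at rank r contributes (2^rel - 1) / log2(r + 1), and a hypothetical toy list of graded activity labels for a single assay; the function names `dcg`, `ndcg` and `mean_random_ndcg` are illustrative and are not taken from the paper. The sketch compares the ideal ordering, the fully reversed (worst) ordering, and the average over random shuffles. For this toy list, even the worst possible ordering scores roughly 0.4 and random shuffles average about 0.6, so an NDCG value on its own does not say whether a model beats random ranking. NEDCG is not implemented here, since its definition is not given in this abstract.

```python
import math
import random


def dcg(relevances, k=None):
    """Discounted cumulative gain, assuming the exponential gain 2^rel - 1."""
    # relevances[:None] returns the whole list, so k=None means "no cutoff".
    return sum(
        (2 ** rel - 1) / math.log2(pos + 2)  # pos is 0-based, so the discount is log2(rank + 1)
        for pos, rel in enumerate(relevances[:k])
    )


def ndcg(relevances_in_predicted_order, k=None):
    """DCG of the predicted order divided by DCG of the ideal (sorted) order."""
    ideal_dcg = dcg(sorted(relevances_in_predicted_order, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant items: NDCG is undefined, report 0 (one common convention)
    return dcg(relevances_in_predicted_order, k) / ideal_dcg


def mean_random_ndcg(relevances, k=None, trials=10_000, seed=0):
    """Average NDCG over random orderings, i.e. a random-ranking baseline."""
    rng = random.Random(seed)
    order = list(relevances)
    total = 0.0
    for _ in range(trials):
        rng.shuffle(order)
        total += ndcg(order, k)
    return total / trials


if __name__ == "__main__":
    # Hypothetical graded activity labels for 10 compounds in one assay (3 = most active).
    labels = [3, 2, 2, 1, 1, 0, 0, 0, 0, 0]
    best = ndcg(sorted(labels, reverse=True))  # perfect ranking
    worst = ndcg(sorted(labels))               # most active compounds ranked last
    rand = mean_random_ndcg(labels)            # random-ranking baseline
    print(f"ideal={best:.3f}  worst={worst:.3f}  random={rand:.3f}")
```

Because NDCG is normalized only by the ideal ordering, its lower end is not anchored at random performance; a metric that also accounts for the random baseline is what the abstract motivates with NEDCG.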
