Machine Learning-Based Prediction of EGFR Bioactivity Using Molecular Fingerprints
by Kandula Siri Chandana, Vanitha Kakollu
Published: May 1, 2026 • DOI: 10.51244/IJRSI.2026.1304000079
Abstract
Drug discovery is a complex, time-consuming, and costly process. The epidermal growth factor receptor (EGFR) has become one of the main targets in oncology research, and finding new drugs requires identifying compounds that are active against it. This work uses machine learning to predict bioactivity from molecular fingerprints derived from a compound's SMILES string. The dataset was drawn from the ChEMBL database and preprocessed into binary bioactivity classes. We trained several machine learning models: Random Forest, Support Vector Machine, Logistic Regression, Gradient Boosting, and XGBoost. Random Forest performed best among the tested models, with an accuracy of 87%. The model was implemented using the Streamlit web framework.
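To illustrate how a SMILES string can be turned into a fixed-length binary feature vector for a classifier, the sketch below hashes short substrings of the string into bit positions. This is a simplified stand-in, not the fingerprinting method used in this work, which the abstract does not specify; in practice a cheminformatics toolkit would normally compute substructure-based fingerprints from the parsed molecule. The function name `smiles_fingerprint`, the 64-bit length, the substring length of 3, and the two example molecules (ethanol and benzene) are all assumptions chosen for illustration and are not taken from the dataset.

```python
import hashlib
from typing import List


def smiles_fingerprint(smiles: str, n_bits: int = 64, max_len: int = 3) -> List[int]:
    """Hash every substring of length 1..max_len into a fixed-length bit vector.

    Hypothetical helper: a toy, text-level fingerprint. It works on the raw
    SMILES characters, not on the molecular graph.
    """
    bits = [0] * n_bits
    for size in range(1, max_len + 1):
        for start in range(len(smiles) - size + 1):
            fragment = smiles[start:start + size]
            # Use a stable hash so the same fragment always maps to the same bit.
            digest = hashlib.sha256(fragment.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % n_bits
            bits[index] = 1
    return bits


# Illustrative molecules only (ethanol, benzene); not from the ChEMBL dataset.
example_smiles = ["CCO", "c1ccccc1"]

# One row per molecule, one column per fingerprint bit.
features = [smiles_fingerprint(s) for s in example_smiles]

for smiles, row in zip(example_smiles, features):
    print(smiles, "bits set:", sum(row))
```

Each row of `features` has the same length regardless of the size of the molecule, which is what lets models such as Random Forest or Logistic Regression consume the fingerprints directly alongside a binary activity label.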