Machine Learning-Driven QSAR and Docking Pipeline for Identification of Amyloid Beta-A4 Inhibitors in Alzheimer’s Disease

Date:

conf Background: Alzheimer’s disease (AD) is a neurodegenerative disease that usually starts slowly and progressively worsens. It is the cause of 60–70% of cases of dementia. The goal of this study is to use machine learning-driven QSAR and docking techniques to discover lead compounds that target the amyloid beta-a4 protein as a potential target for Alzheimer’s. Methods: QSAR modeling is used in this study to compare different machine learning models to predict the potency (pIC50) of candidate compounds, which are then validated by molecular docking based on their binding affinity. A non-redundant dataset consisting of 1241 compounds for amyloid beta-a4 was retrieved from the ChEMBL database. 880 substructure fingerprints were used to define these compounds, followed by multiple machine learning models building and comparison using LazyPredict. Results: The Histogram-based Gradient Boosting Regression Tree achieved the root mean square error (RMSE) of 0.65, R-squared value of 0.73, and time efficiency of 0.78. RFR-derived Gini index revealed the importance of the compound’s features, SubFP23, SubFP405, and SubFP577. The Kennard–Stone algorithm was used to select a diverse set of 10 compounds from the active inhibitors list for testing. The API was developed and deployed using Streamlit. Later, molecular docking was performed using PyRX to evaluate binding affinity and generated visualizations using ChimeraX. Conclusion: The lead compound (CHEMBL4177507) with pIC50 7.25M and Binding affinity of -9.2kcal/mol was identified. This ML-based QSAR Modeling and docking approach is an effective strategy for accelerating drug discovery. YouTube