A peek behind the paper – a strategy for global QSAR modeling
We take a peek behind the paper with first author Jonna Stålring; can a quantitative structure–activity relationship model predicting human liver microsomal stability be developed and confidently applied to drug discovery?
Jonna Stålring earned her PhD in 2001 with a thesis focused on method development for multi-configurational electronic wavefunctions. During her PhD she co-authored the quantum chemistry software package MOLCAS.
For the past 15 years, Stålring’s major research interest has been the application of machine learning algorithms within chemometrics, image analysis and ADMET predictions in drug discovery. Stålring has striven to advance QSAR methodology, both through her own ideas and by introducing new methods in collaboration with machine learning scientists. In 2011, Stålring published a paper on the QSAR platform AZOrange, which has been publicly available ever since to support modeling and method development within the QSAR area.
Stålring has conducted her research in various scientific positions at AstraZeneca (Gothenburg, Sweden) and Medivir (Huddinge, Sweden), and is currently a Principal Scientist at LEO Pharma (Ballerup, Denmark).
Can you give us a short summary of your recent research article, ‘Confident application of a global human liver microsomal activity QSAR’?
The paper describes a strategy for global QSAR modeling, which is particularly applicable to limited datasets. The strategy entails two methods: one for efficient use of data, termed project-specific hold-out sampling, and another for the selection and validation of a prediction confidence algorithm that accounts for the different sources of poor predictivity, termed holistic prediction confidence (HPC).
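The paper itself defines project-specific hold-out sampling in full; the idea of holding out compounds by their project of origin, rather than at random, can be sketched roughly as below. This is a minimal illustration under my own assumptions (the function name, the toy data and the tuple format are all hypothetical), not the authors' implementation.

```python
def project_holdout_split(compounds, holdout_projects):
    """Split a dataset so that all compounds from selected projects
    form the hold-out set, mimicking prediction on unseen chemistry.

    compounds: list of (compound_id, project_id) tuples (hypothetical format)
    holdout_projects: set of project ids to hold out entirely
    Returns (train_ids, holdout_ids).
    """
    train, holdout = [], []
    for cid, project in compounds:
        (holdout if project in holdout_projects else train).append(cid)
    return train, holdout

# Toy data: compound ids tagged with the project they came from.
data = [("c1", "A"), ("c2", "A"), ("c3", "B"), ("c4", "B"), ("c5", "C")]
train_ids, holdout_ids = project_holdout_split(data, holdout_projects={"C"})
```

In contrast to a random split, every compound from project "C" ends up in the hold-out set, so the test accuracy reflects extrapolation to a project absent from training.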
Despite the limited size of the dataset (2000 compounds), the methodology makes it possible to develop a global human liver microsomal (HLM) QSAR model that can be confidently applied in new drug discovery (DD) projects. In my opinion, the methodology is also useful for datasets containing tens of thousands of compounds as, relative to the size of chemical space, all QSAR datasets are limited.
What inspired you to carry out this research?
I believe that global QSAR models are often misused in DD, because the importance of the applicability domain (AD) is generally underestimated. The situation illustrates the challenges associated with transferring methods between disciplines. Within the machine learning field, most research is based on the assumption of independently and identically distributed (i.i.d.) data. Hence, in general, machine learning methods do not address violations of the i.i.d. assumption.
Due to the very nature of DD, new parts of chemical space are continuously explored, and these parts are most often neither independent of, nor identically distributed with, the training set used during model development. These differences in distribution are overlooked when the generalization accuracy of a model is quantified solely by randomly sampled test sets, including cross-validation. The resulting accuracy can be completely misleading as an indication of performance in the future applied setting. In my opinion, a QSAR model and its confidence metrics should be evaluated with several test sets: one randomly sampled from the training set and others representing future variations, such as new congeneric series.
Many scientists define the AD or prediction confidence in some way. However, the validation of the AD methodology, like the validation of the model accuracy, is often overlooked, resulting in disappointing accuracy of model predictions in DD projects. The lack of validation of the AD method, in conjunction with an overoptimistic quantification of the generalization accuracy resulting from random test sampling, has given global QSAR models a poor reputation, which I would like to change by providing a method to address this challenge.
What were the key conclusions?
The study shows that by using HPC, a global HLM model can also be confidently applied in DD projects lacking any compounds in the training set of the model. For projects with very dissimilar compounds, all compounds are likely to be considered outside of the AD. However, by updating the model with as few as tens of compounds, the model rapidly becomes useful for new chemistry. The results show the importance of internal model development and updating for prediction of HLM activity.
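The claim above — that dissimilar compounds fall outside the AD until the model is updated with a handful of measured compounds — can be illustrated with a similarity-based AD heuristic. This is only a sketch under my own assumptions (set-based fingerprints, a Tanimoto threshold of 0.5, and the helper names are all hypothetical); the paper's HPC algorithm is more involved.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def in_domain(query, training_fps, threshold=0.5):
    """Count a compound as inside the AD if it is sufficiently similar
    to at least one training compound (a simple, common AD heuristic)."""
    return any(tanimoto(query, fp) >= threshold for fp in training_fps)

# Hypothetical fingerprints: existing training chemistry vs a new series.
training = [{1, 2, 3, 4}, {2, 3, 4, 5}]
new_series = [{10, 11, 12, 13}, {10, 11, 12, 14}]

# Initially the whole new series sits outside the AD...
outside_before = [not in_domain(fp, training) for fp in new_series]

# ...but after updating the training set with one measured compound
# from the series, its close analogs fall inside the AD.
training.append(new_series[0])
inside_after = in_domain(new_series[1], training)
```

The same mechanism explains why updating a model with even tens of compounds from a new project can rapidly bring that project's chemistry into the applicability domain.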
What challenges did you come across?
For a computational scientist it is always challenging to understand and interpret biological data. My co-authors Anna-Karin Sohlenius-Sternbeck and Ylva Terelius were a great help in defining and pruning the dataset. I’d also like to acknowledge the stimulating discussions with Kevin Parkes.
What work are you hoping to do next in this area?
In this study, HPC and project-specific hold-out sampling have only been applied to one dataset. I would like to extend the study to multiple datasets to demonstrate that the approach can generally increase the confidence in global QSAR predictions and thereby transform the application of global QSAR models in DD.