Active search for computer-aided drug design
The growth in data arising from drug discovery efforts and the current excitement around artificial intelligence are prompting new developments; we summarize our recent work in this area and its wider context.
Jonathan D Hirst1
1School of Chemistry, University of Nottingham, Nottingham, UK
Our work arises from a collaboration between computational chemists and computer scientists at the University of Nottingham (UK) and medicinal chemists at GSK (Stevenage, UK). It is part of a wider drug discovery project, where University of Nottingham undergraduate chemists synthesize new molecules, which are then assayed by GSK. This collaboration presents an excellent and relatively rare opportunity for us to investigate how new machine learning algorithms can enhance the drug discovery process.
Many projects within pharma are too time-constrained to allow fundamental algorithm development; academia, on the other hand, rarely has the level of access to drug discovery infrastructure that our strong strategic partnership provides.
In our research, we push forward the development of machine learning methods, focused on the specific demands of drug discovery and on the specific challenges of a process which involves learning from previous experiments. Many existing methods have been built on assumptions that are often not strictly appropriate in this context. By starting from a stronger theoretical foundation, we expect to derive algorithms that are more successful and more generally applicable than earlier work.
Idiopathic pulmonary fibrosis
The disease of interest is idiopathic pulmonary fibrosis (IPF), which leads to the formation of scar tissue in the lungs. The disease affects approximately 50 per 100,000 people and is on the increase. Death often occurs within two to five years. The best current treatment, lung transplantation, is available to only 5% of patients, and recently approved drugs slow the disease but do not reverse or cure it.
The target protein is one of a family of proteins known as the integrins. They are transmembrane receptor proteins involved in cell–cell interactions and interactions between cells and the extracellular matrix. Integrins play a key role in immune responses, clotting and scar formation. Antagonism of αVβ6, the receptor on which we focus, is one promising avenue for the development of a novel therapeutic treatment of IPF. The natural substrate is transforming growth factor β1 (TGF-β1), which has a tri-peptide binding motif, arginine-glycine-aspartic acid, or RGD (in the one-letter code for amino acids).
We consider a series of compounds derived from an RGD mimetic, with a naphthyridine fragment as an arginine mimic, an alkyl chain with an amide bond as a glycine mimic and a carboxyl group as an aspartate mimic. We start with a particular integrin antagonist compound as the parent, and our algorithm explores substitutions at five possible points on an aryl ring. As a proof of principle, we consider a virtual space of approximately 200,000 compounds.
A variety of possible substituents were considered: H, F, Cl, Br, methyl, ethyl, propyl, isopropyl, cyclopropyl, methoxy, hydroxyl, CF3, OCF3, SO2Me, nitrile and several heterocycles: imidazole, pyrazole and triazole (with possible substituents of H, methyl or ethyl).
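To make the idea of a rule-defined design space concrete, the following is a minimal sketch, assuming every listed substituent is allowed at each of the five aryl positions and that each heterocycle carries exactly one of H, methyl or ethyl. The names `substituent_options` and `sample_design` are illustrative, not taken from the published work. Under this reading the unrestricted product of options is far larger than the roughly 200,000 compounds quoted above, so the actual virtual space presumably applies further restrictions that are not described here.

```python
import itertools
import random

# Substituent labels as listed in the text.
SIMPLE = ["H", "F", "Cl", "Br", "methyl", "ethyl", "propyl", "isopropyl",
          "cyclopropyl", "methoxy", "hydroxyl", "CF3", "OCF3", "SO2Me", "nitrile"]
HETEROCYCLES = ["imidazole", "pyrazole", "triazole"]
RING_SUBSTITUENTS = ["H", "methyl", "ethyl"]  # substituent carried by a heterocycle
N_POSITIONS = 5  # substitution points on the aryl ring


def substituent_options():
    """All options for one position (assumption: same options at every position)."""
    options = list(SIMPLE)
    options += [f"{ring}({sub})"
                for ring, sub in itertools.product(HETEROCYCLES, RING_SUBSTITUENTS)]
    return options


def sample_design(rng):
    """Draw one design from the rules, without ever listing the whole space."""
    options = substituent_options()
    return tuple(rng.choice(options) for _ in range(N_POSITIONS))


options = substituent_options()
# Size of the unrestricted product space under the assumptions above.
unconstrained_size = len(options) ** N_POSITIONS

rng = random.Random(0)
example = sample_design(rng)
print(len(options), unconstrained_size, example)
```

The point of the sketch is that designs are generated from rules on demand, so the space never has to be stored or enumerated.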
Searching 'intensionally' defined design spaces
Chemical space is large, to the point of precluding its explicit enumeration. Thus, it represents a so-called intensionally defined design space: one specified by rules for constructing its members rather than by a list of them. Search strategies for intensionally defined spaces are a current area of interest in machine learning.
We have implemented and applied an active search algorithm in the form of a data-driven adaptive Markov chain. At the core of the search is the OpenEye molecular docking program, FRED, which uses a rigid ligand approach, where a large number of conformations are generated and each of them is docked in turn. The process is made significantly more efficient by using a probabilistic surrogate, a maximum entropy model, at every iteration of the search; the surrogate is updated by relatively few calls to the molecular docking routine.
The algorithm is designed to propose compounds that maximally increase the known information. This is achieved by accepting new compounds according to a Metropolis criterion based on the current model's estimate of the probability that a compound is a hit.
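The accept-and-update loop described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: `toy_oracle` is a hypothetical stand-in for an expensive docking call (it is not FRED), `CountSurrogate` is a simple smoothed frequency model rather than the maximum entropy model used in the published work, and designs are tuples of integer substituent indices. It shows only the general pattern: propose a neighbouring design, accept it with a Metropolis probability computed from the surrogate, and call the expensive oracle only for accepted, previously unseen designs.

```python
import math
import random

N_POSITIONS = 5
N_OPTIONS = 4  # toy alphabet of substituent indices (assumption)


def toy_oracle(design):
    """Hypothetical expensive evaluation: a hit if at least 3 positions carry option 0."""
    return sum(1 for s in design if s == 0) >= 3


class CountSurrogate:
    """Toy surrogate: smoothed per-position hit frequencies."""

    def __init__(self):
        # counts[pos][option] = [hits, trials], starting from a prior of 1 hit in 2 trials
        self.counts = [[[1.0, 2.0] for _ in range(N_OPTIONS)]
                       for _ in range(N_POSITIONS)]

    def hit_probability(self, design):
        # Geometric mean of the per-position hit rates, so the value lies in (0, 1).
        log_p = sum(math.log(self.counts[i][s][0] / self.counts[i][s][1])
                    for i, s in enumerate(design))
        return math.exp(log_p / N_POSITIONS)

    def update(self, design, is_hit):
        for i, s in enumerate(design):
            self.counts[i][s][0] += 1.0 if is_hit else 0.0
            self.counts[i][s][1] += 1.0


def propose(design, rng):
    """Symmetric proposal: resample the substituent at one random position."""
    i = rng.randrange(N_POSITIONS)
    new = list(design)
    new[i] = rng.randrange(N_OPTIONS)
    return tuple(new)


def active_search(n_steps, seed=0):
    rng = random.Random(seed)
    surrogate = CountSurrogate()
    current = tuple(rng.randrange(N_OPTIONS) for _ in range(N_POSITIONS))
    evaluated = {}  # design -> oracle outcome
    for _ in range(n_steps):
        candidate = propose(current, rng)
        # Metropolis criterion on the surrogate's estimated hit probabilities.
        ratio = surrogate.hit_probability(candidate) / surrogate.hit_probability(current)
        if rng.random() < min(1.0, ratio):
            current = candidate
            if current not in evaluated:
                # Only accepted, unseen designs trigger the expensive call.
                evaluated[current] = toy_oracle(current)
                surrogate.update(current, evaluated[current])
    return evaluated


results = active_search(200)
hits = [d for d, is_hit in results.items() if is_hit]
print(f"{len(results)} oracle calls, {len(hits)} hits")
```

Because the surrogate is refitted after each oracle call, the chain adapts as data accumulate, which is the sense in which such a search is data-driven.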
Summary and perspective
The study is a “proof of principle” and we are currently developing considerable refinements to the algorithm. Nonetheless, from a medicinal chemistry and drug discovery perspective, the molecules suggested for synthesis are promising. Many of the proposed compounds are consistent with both unpublished and published work; the approach discovered 19 of the 24 compounds known to be active from previous biological assays.
Our algorithm: (i) is soundly based in machine learning, (ii) proposes structures from an implicitly defined space of potential designs, (iii) is guaranteed to converge, and (iv) achieves a large structural variety of proposed target structures, some of which provoke significant interest from a medicinal chemistry perspective. We are embarking on the synthesis and characterization of some of the more promising proposed compounds.
- Oglic D, Oatley SA, Macdonald SJF, McInally T, Garnett R and Gärtner T. Active search for computer-aided drug design. Mol. Inform. 37(1–3), 1700130 (2018)
- McInally T and Macdonald SJF. Unusual undergraduate training in medicinal chemistry in collaboration between academia and industry. J. Med. Chem. 60(19), 7958–7964 (2017)
- Hatley RJD, Macdonald SJF, Slack RJ et al. An αv-RGD integrin inhibitor toolbox: drug discovery insight, challenges and opportunities. Angew. Chem. Intl. Ed. 57(13), 3298–3321 (2018)
- Adams J, Anderson EC, Blackham EE et al. Structure activity relationships of αv integrin antagonists for pulmonary fibrosis by variation in aryl substituents. ACS Med. Chem. Lett. 5(11), 1207–1212 (2014)
- Oglic D, Garnett R and Gärtner T. Active search in intensionally specified structured spaces. Proc. 31st AAAI Conf. Artif. Intell. 2449 (2017)
- McGann M. FRED pose prediction and virtual screening accuracy. J. Chem. Inf. Model. 51(3), 578–596 (2011)