Bridging the gap between discovery and development
We spoke to Connor Coley, PhD candidate at MIT, to learn more about the research presented at the RSC-BMCS/RSC-CICAG Artificial Intelligence in Chemistry conference (London, UK; 15 June).
Connor Coley is a PhD candidate with professors Klavs F Jensen and William H Green at Massachusetts Institute of Technology (MIT; MA, USA), working on computer assistance and automation for organic synthesis, including both computational planning and experimental execution. He received his BS in Chemical Engineering from Caltech (CA, USA) in 2014 and MS in Chemical Engineering Practice from MIT in 2016. His recent work under the DARPA Make-It program and the newly formed Machine Learning for Pharmaceutical Discovery and Synthesis (MLPDS) Consortium focuses on the development of a data-driven synthesis planning program and in silico strategies for predicting the outcomes of organic reactions.
Please can you tell us about your research?
I’ve always been fascinated with the scale and complexity of the chemical industry. We take it for granted, but almost everything man-made relies on – at some level – our ability to understand and control chemical processes. This led me to study Chemical Engineering in college and ultimately decide to continue on to a PhD.
Since joining Massachusetts Institute of Technology (MIT; MA, USA), my research has focused on two aspects of chemical synthesis with an emphasis on small molecule chemistries. The first is experimental: how can we use laboratory automation to avoid rote manual tasks and obtain the data (or compounds) we need as efficiently as possible? The second is computational, drawing on techniques from data science and machine learning: how do we fully utilize the enormous amount of knowledge contained in the chemical literature to make decisions related to chemical synthesis?
What is the DARPA Make-It program? How is this being applied to drug discovery?
The DARPA Make-It program, to put it very briefly, is meant to systematize small molecule synthesis. A primary goal is to develop methodologies and platforms for automatically identifying, optimizing and executing synthetic routes to small molecules of interest. We have a particular interest in flow chemistry, as this helps with reproducibility and scalability. If we properly generalize prior knowledge, we can find viable synthetic routes to new molecules and better synthetic routes to known molecules, all without human intervention. There is value in celebrating human creativity and individuality when talking about long total syntheses of complex natural products, but many of the molecules we make don’t necessarily warrant that.
Make-It is about small molecule synthesis more broadly, but an obvious application is to drug discovery. I think there are two main aspects where these tools would be most useful. The first is in accelerating the classic design-make-test cycle, where time and labor constraints may force the selection of compounds perceived as fast to synthesize, rather than those that would be most informative to assay. With these new techniques, we will be able to make more objective decisions about synthesizability and minimize any negative effects of human bias.
The second aspect is in bridging the gap between discovery and development. In most companies, there is a major disconnect between the synthetic routes used in discovery efforts and those developed by process chemists, which is consistent with the two having disparate goals. However, if process considerations, such as optimizing for yield or selecting solvents from preferred FDA classes, can be accounted for earlier, then the lead time for compound scale-up can be reduced.
Can you give us an overview of the MLPDS consortium?
While working on the Make-It program, we realized that the opportunities for machine learning apply much more broadly to pharmaceutical discovery. In May and September 2017, we held two pre-consortium meetings and invited about a dozen chemical and pharmaceutical companies to come to MIT to both hear about our ongoing research and talk about open challenges in the field. Based on the excitement around the topic, we formed the Machine Learning for Pharmaceutical Discovery and Synthesis (MLPDS) Consortium, which kicked off in May 2018.
This is a joint effort between the Departments of Chemical Engineering, Chemistry and Computer Science, and our industrial collaborators. There are currently eight member companies: Amgen (CA, USA), BASF (Ludwigshafen, Germany), Bayer (Leverkusen, Germany), Eli Lilly (IN, USA), Novartis (Basel, Switzerland), Pfizer (CT, USA), Sunovion (MA, USA) and WuXi AppTec (Shanghai, China), with several other organizations considering or in the process of joining.
How can artificial intelligence aid drug discovery?
There are so many ways!
The ones that are easiest to envision are those that take the current paradigm of drug discovery and improve individual stages within it. Machine learning techniques are already routinely used for QSAR, ADMET and binding affinity prediction tasks. There are now dozens of companies starting to work on candidate compound selection to make each design-make-test loop more meaningful – whether that’s through focused de novo molecular generation, high-throughput virtual screening or active learning strategies. I’ve mentioned a few ways that we can try to alleviate the synthesis bottleneck, but we can also think more generally about questions of molecular design, experimental prioritization and information capture from electronic lab notebooks and the literature.
Where do you see this field heading in the next 5–10 years?
The biggest uncertainty in my mind is the future of data. At least from the academic perspective, our ability to develop and validate new methodologies is limited by the data we have access to. There is a lot that can be done with the current tools we have, but getting better access to data and establishing precompetitive data-sharing mechanisms will help us continue to make progress.
However, I think it’s realistic to say that in 5–10 years there will be several drugs in clinical trials that were developed with minimal human intervention between hit identification and preclinical status. That’s not to say that there shouldn’t be humans in the process – the most successful discovery programs will almost certainly be a combination of artificial intelligence techniques and human input – but as we get better at designing algorithms to suggest compounds, and at designing experiments compatible with automated synthesis platforms, the role of the medicinal chemist and the composition of project teams are going to evolve.