Computational chemical biology: the intersection between chemical biology and computational chemistry
In this editorial, J.B. Brown (Kyoto University) and Jürgen Bajorath (Rheinische Friedrich-Wilhelms-Universität) discuss the many applications of computational chemistry to chemical biology.
J. B. Brown1 and J. Bajorath2
1Life Science Informatics Research Unit, Laboratory for Molecular Biosciences, Kyoto University Graduate School of Medicine, Kyoto, Sakyo, Japan.
2Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, D-53115 Bonn, Germany.
Key words: Targets; small molecular probes; selectivity; target-ligand interactions; cellular phenotypes; computational methods; data analytics; machine learning; predictive modeling
There is no doubt that interdisciplinary science at the interfaces between chemistry and biology continues to be on the rise, and it is not trivial to comprehensively describe or define the still rapidly evolving field of chemical biology. Of course, the use of small molecules to probe biological functions represents a central task of this multifaceted discipline, and this task comes with a number of requirements.
First and foremost, small molecular probes are typically required to achieve a high level of target selectivity – or even specificity – such that firm conclusions can be drawn about direct links between target engagement and specific biological effects or phenotypes. Further, it has been recommended that a quality probe be guaranteed for open use by the research community. Desirable characteristics of chemical probes set requirements for chemical biology apart from medicinal chemistry and drug discovery, where multi-target activities of compounds and ensuing polypharmacology are also desirable for certain therapeutic applications.
Moreover, in contrast to the open science attitude promoted in the field of chemical biology, drug discovery data are typically not disclosed. Further expanding the use of molecular probes for target deconvolution from phenotypic assays or the systematic study of ligand–target interactions closely connects chemical biology and chemogenomics, and the boundaries between these fields become rather fluid.
Similar to other data-rich scientific fields – not even mentioning ‘big data’ issues in the life sciences – chemical biology also benefits from computational efforts, not only for data analysis but also for modeling. There is a wide spectrum of computational methods that are relevant for chemical biology, giving rise to the sub-discipline of ‘computational chemical biology’ (CCB), which has been increasingly discussed in recent years. For CCB, there currently is no definition written in stone but the immediate relevance of a variety of computational approaches – often originating from different areas, such as chemical informatics or medicinal chemistry – for chemical biology is undeniable.
If one wanted to delineate the methodological spectrum of CCB, any computational approach addressing the target selectivity of small molecules, directly or indirectly, is immediately relevant. This directly relates to machine learning and probabilistic methodologies. Furthermore, molecular similarity methods to infer targets on the basis of compound similarity relationships certainly fall into the CCB spectrum; such methods are often applied for target deconvolution from phenotypic assays.
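The similarity-based target inference mentioned above can be illustrated with a minimal sketch. It assumes compounds are represented as sets of fingerprint feature indices and that a small reference set of target-annotated compounds is available; the function names, the fingerprints, and the target labels (`kinase_A`, `gpcr_B`) are hypothetical and serve only as an illustration, not as a method described in this editorial.

```python
from collections import defaultdict


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) coefficient of two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def infer_targets(query, references, k=2):
    """Rank candidate targets by the summed similarity of the k
    annotated reference compounds most similar to the query."""
    scored = sorted(
        ((tanimoto(query, fp), target) for fp, target in references),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for similarity, target in scored:
        votes[target] += similarity
    return sorted(votes.items(), key=lambda item: item[1], reverse=True)


# Hypothetical reference compounds: (fingerprint features, annotated target)
REFERENCES = [
    (frozenset({1, 2, 3, 4}), "kinase_A"),
    (frozenset({1, 2, 3, 9}), "kinase_A"),
    (frozenset({7, 8, 9, 10}), "gpcr_B"),
]

# A phenotypic-screen hit whose target is unknown
ranking = infer_targets(frozenset({1, 2, 3, 5}), REFERENCES)
```

In this toy case the two nearest neighbors both carry the `kinase_A` annotation, so that target is ranked first. In practice, the choice of fingerprint, similarity threshold, and neighbor count would strongly affect the result.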
Moreover, computational concepts to systematically analyze and predict ligand–target interactions, often more narrowly defined as chemogenomics modeling (vide supra), are highly relevant. To these ends, a variety of machine learning approaches are applied, also including methods that have become en vogue recently, such as active learning and deep learning. Comparing these approaches is intellectually stimulating because they utilize either minimally informative (active learning) or maximally sized (deep learning) data sets for model building.
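To make the active learning side of this contrast concrete, the following sketch shows one common selection rule, uncertainty sampling: from a pool of untested compounds, the next experiments are those for which a model's predicted probability of activity is closest to 0.5. The probabilities, the compound names, and the helper `select_most_uncertain` are assumptions made for illustration; the editorial does not prescribe a particular selection strategy.

```python
def select_most_uncertain(predicted_active_prob, batch_size=1):
    """Return the names of the pool compounds whose predicted
    probability of activity lies closest to 0.5."""
    ranked = sorted(
        predicted_active_prob,
        key=lambda name: abs(predicted_active_prob[name] - 0.5),
    )
    return ranked[:batch_size]


# Hypothetical model output for four untested compound-target pairs
POOL = {"cpd_1": 0.95, "cpd_2": 0.52, "cpd_3": 0.10, "cpd_4": 0.40}

next_to_test = select_most_uncertain(POOL, batch_size=2)
```

After the selected compounds are measured, they would be added to the training set and the model retrained, so that the data set grows only by the examples expected to be most informative.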
For the prediction of ligand–target interactions, structure-based approaches are also of interest, such as ‘inverse docking’, which attempts to predict targets of active small molecules (probes). There are more computational approaches one could – or should – consider, including, among others, the design of target-focused libraries for probe identification or of probe libraries with general target deconvolution potential. Both are critical for narrowing the initial pools of statistically significant differentially expressed genes obtained from transcriptomic and/or proteomic analysis; the discussion could go on.
As mentioned above, many computational approaches falling into the CCB spectrum have originated from different areas and have been further adapted for chemical biology. Going forward, it will also be interesting to attempt the development of conceptually new methodologies specifically for CCB, for example, computational methods for the prediction or design of target-specific compounds, which currently are still in their infancy.
In support of this need for new developments framed specifically with CCB in mind, several public repositories of chemical probe data are now available, as recently reviewed by Schwarz and Gestwicki. These databases support CCB method development through high-volume, expert-curated and/or industry-donated pre-competitive probe ensembles and ensuing perspectives.
To further develop and sharpen its scientific profile, CCB will benefit from increased awareness and from the showcasing of new developments, for example, through special publication initiatives or conferences such as the 7th International Chemical Biology Society Conference in Vancouver (BC, Canada), which will feature a dedicated session on CCB.
The immediate application of existing statistical pattern recognition techniques to chemical probe data sets will provide a baseline assessment of the transferability of existing chemoinformatic and bioinformatic methods. However, the identification of aspects where existing methodologies fail to capture relationships in selectivity versus phenotypical aspects of probes will be the essential step for laying the groundwork of CCB-specific data mining methodologies.
Whereas computational chemogenomics often focuses solely on a ligand–target matrix and its analysis, CCB methodologies will require expansion to incorporate parameters affecting cellular conditions and phenotypical readouts; the challenge here will be compensating for additional sparsity induced by aggregation of heterogeneous chemical biology data sets. Developing ideas for systematically representing the intersection of chemical, biological, and pharmacological parameter and outcome spaces, which may include time-dependent processes, and the subsequent extraction of interpretable knowledge through computational means represent a critical direction for CCB and a challenge going forward.
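The additional sparsity mentioned above can be illustrated with a small calculation. The sketch below compares the fraction of filled cells in a ligand–target matrix with the fraction filled once a cellular-condition axis is added. The measurement records and the dimensions (4 compounds, 3 targets, 2 cell lines) are invented for illustration only.

```python
def density(measured_cells, shape):
    """Fraction of cells in an array of the given shape that hold
    at least one measurement."""
    total = 1
    for size in shape:
        total *= size
    return len(set(measured_cells)) / total


# Hypothetical measurements: (compound, target, cellular condition)
MEASUREMENTS = [
    ("cpd_1", "target_A", "cell_X"),
    ("cpd_1", "target_A", "cell_Y"),
    ("cpd_2", "target_A", "cell_X"),
    ("cpd_2", "target_B", "cell_X"),
    ("cpd_3", "target_C", "cell_Y"),
    ("cpd_4", "target_B", "cell_Y"),
]

# Ligand-target matrix: conditions are collapsed (4 compounds x 3 targets)
matrix_density = density([(c, t) for c, t, _ in MEASUREMENTS], shape=(4, 3))

# Adding the condition axis (4 x 3 x 2) spreads the same data more thinly
tensor_density = density(MEASUREMENTS, shape=(4, 3, 2))
```

With these toy records, the matrix is about 42% filled, whereas the three-way array is 25% filled, even though no data were removed. Each further axis, such as time points or readout types, would dilute coverage in the same way.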