A peek behind the paper – Entering the ‘big data’ era in medicinal chemistry: molecular promiscuity analysis revisited
In this interview, take a look behind the scenes of our paper "Entering the 'big data' era in medicinal chemistry: molecular promiscuity analysis revisited", a data-driven look at the future of big data and its associated issues for the medicinal chemist, recently published open access in Future Science OA.
What inspired you to write this article?
It was our intention to raise awareness of big data issues in medicinal chemistry (and chemoinformatics) and to help avoid misunderstandings in handling and judging big data (i.e., there is more to it than large volumes). Therefore, we placed the discussion of big data in medicinal chemistry in a larger context. In addition, I felt it might be interesting to present an article a little off the beaten path, deliberately combining a perspective on big data (including personal views) with what the big data challenge is all about: data analysis and knowledge extraction. This is why we selected data-driven research on compound promiscuity to complement the perspective.
What were the key conclusions?
Big data associated with small molecules will play an increasingly important role in medicinal chemistry. For example, drug discovery programs can no longer afford to ignore relevant data that are already available in the public domain. However, accessing such data and learning from them is not a trivial exercise; it is greatly complicated by their intrinsic complexity, heterogeneity across different sources, and variability, to name just a few features. As far as promiscuity analysis in the context of polypharmacology is concerned, the conservative picture of multi-target activities of small molecules that we promote in the article departs from, or even contradicts, subjective views and expectations that are frequently put forward. The key point is that the results we report are based on a rigorous assessment of what currently available high-confidence data can tell us, not on what we might believe. That is an important aspect of large-scale data analysis, as is being aware of its limitations. I might add that the conclusions drawn from our analysis are also well in accord with recent findings that many compounds from screening decks in the pharmaceutical industry are rarely, if ever, active in the high-throughput screens in which they are assayed.
What challenges did you come across?
Planning this 'hybrid' paper was much more fun than a challenge, given that we had pretty firm views about its content and goals and had already obtained most of the results. Needless to say, striking the right balance between review components and personal views in a perspective is always a bit challenging. In this case, balancing the manuscript was further complicated because we also aimed to present original research within the same analysis context. All in all, however, I'm quite content with the outcome.
What work are you hoping to do next in this area?
Research on what we call 'big compound data' will continue to be of high interest to me. For example, we might be able to learn more about therapeutic targets and their potential relationships by systematically analyzing these data. At a 'meta level', it will also be interesting to develop computational tools to make the results of big data analysis accessible to the practice of medicinal chemistry and drug design. I'm sure we'll find more things to do here.
What is your vision for the future of big data in medicinal chemistry?
As stated above, big data in medicinal chemistry is here to stay and will continue to evolve, no doubt. I anticipate that there will soon be a need to integrate data scientists into early-phase discovery programs. Specialists will most likely be required to make meaningful use of big data and to support interdisciplinary research. It will also be increasingly important to determine on a large scale how specific experimental settings influence activity data, which is a critical issue for conclusions drawn from compound profiling. Data complexity, heterogeneity, and variability will increase further. In computational medicinal chemistry, we will need to better understand to what extent the availability of big data might improve computational modeling and activity predictions. In any event, it will be exciting to work in this area.