Tales of Similarity
The concept of molecular similarity is central to many applications in chemoinformatics and medicinal chemistry. While similarity is an intrinsically subjective concept, we attempt to quantify it by comparing molecular representations.
First and foremost, this is attempted by applying the famous (or infamous, as one or the other medicinal chemist might say) Tanimoto similarity metric. To avoid reiterating the literature (e.g., Maggiora et al. Molecular similarity in medicinal chemistry. J Med Chem 57, 3186, 2014), I'd like to draw attention to only one of the key issues in molecular similarity analysis (perhaps the most important one): we typically try to predict active compounds on the basis of similarity calculations. Simply put, if a test compound is found to be 'similar' to known active molecules (e.g., on the basis of Tanimoto similarity calculations), we assume that the test compound also has a high probability of being active. As has been pointed out at least a few times in the literature, there are structure-activity relationship (SAR) caveats to consider when judging similarity relationships in this way. Nonetheless, essentially all computational approaches currently employed for ligand-based virtual screening operate on the basis of this similarity argument.
What should be stressed, however, and this is the point of this post, is that there is currently no reliable way to correlate calculated molecular similarity with observed similarity in biological activity, regardless of how molecular similarity is quantified. In fact, I would consider our current inability to infer activity similarity from calculated molecular similarity (and to generalize activity predictions) to be one of the major unsolved problems in computational medicinal chemistry. Importantly, this is not only an academic issue but one with profound practical implications. Just consider that database rankings produced by virtual screening generally contain many false positives at high ranks, a problem that practitioners in computational compound screening face on a daily basis. Tackling this problem scientifically is a challenging, interesting, and rewarding task.
At the very least, I consider raising awareness of this problem to be important for the medicinal chemistry field.
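For readers who have not seen the calculation spelled out, here is a minimal sketch of the Tanimoto coefficient and of the similarity ranking that ligand-based virtual screening relies on. The fingerprints are represented as sets of "on" bit positions; the compound names and bit sets are invented toy data, and the function names `tanimoto` and `rank_by_similarity` are my own, not taken from any particular toolkit. Returning 0.0 for two empty fingerprints is just a convention assumed here.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    if not a and not b:
        return 0.0  # assumed convention for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(reference: set, candidates: dict) -> list:
    """Sort candidate compounds by decreasing similarity to a reference."""
    scored = [(name, tanimoto(reference, fp)) for name, fp in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy fingerprints (sets of on-bit indices), not real compounds.
reference = {1, 2, 3, 4, 5}  # stands in for a known active molecule
candidates = {
    "cpd_A": {1, 2, 3, 4, 6},
    "cpd_B": {1, 2, 7, 8},
    "cpd_C": {1, 2, 3, 4, 5, 9},
}

ranking = rank_by_similarity(reference, candidates)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
# cpd_C: 0.83
# cpd_A: 0.67
# cpd_B: 0.29
```

Note what the ranking does and does not say: cpd_C is the most similar to the reference under this particular representation, but nothing in the calculation tells us whether it is active. That gap is exactly the problem discussed above.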