ReplaceR

 

In the course of a medicinal chemistry project, molecules progress through a series of changes that often involve replacement of R groups (or linkers). A key insight is that there is a certain order to these changes; in general, R groups that are simpler (in size, or in ease of synthesis, or just cheaper) will be made before those are more complex. It would be unusual to make the bromide, without having already tried chloride; if you see a tetrazole, then a carboxylic acid was likely already made.

Given true time series data from projects, we could work out this information. However, even without this, we can infer similar information from co-occurrence data in matched molecular series across publications and patents in ChEMBL (see this 2019 talk and blogpost of mine for more details). To give a sense of the method, consider that butyl co-occurs with propyl in matched series in 801 papers, but given that propyl occurs in 2765 series versus butyl's 1905 we infer that butyl is typically made after propyl (if at all). Groups in the 'after' panel are sorted by co-occurence with the query. Groups in the 'before' panel are instead sorted by co-occurrence/frequency - if we didn't do this, hydrogen or methyl would always be top of the list!

Note that no biological activity data or physicochemical property data (e.g. MWs) is used in this analysis. The suggestions are not based on bioisosteres, SAR transfer or QSAR models, but are simply a reflection of historical choices made by medicinal chemists across a large number of publications. I note that the 2021 papers by Mahendra Awale et al and Kosuke Takeuchi et al touch on many of the same issues.

This webapp may be useful to medicinal chemists. But in particular, computational chemists and cheminformaticians will find the app and the data behind it useful to support automatic enumeration and virtual screening of likely next steps. Note that the webapp just lists the 'nearest' 12 groups before and after. The full dataset is available on Zenodo. You can find me at baoilleach@gmail.com or file an issue on GitHub.

Cheminformatics Credits: EMBL-EBI Chemical Biology Services Team for ChEMBL data, John Mayfield et al for Chemistry Development Kit and CDKDepict, Geoff Hutchison et al for Open Babel, Greg Landrum et al for RDKit, Chen Jiang for openbabel.js, Michel Moreau for rdkit.js, Peter Ertl and Bruno Bienfait for JSME, Noel O'Boyle (me) for ReplaceR.